Tuesday, September 12, 2017

MD5 vs SHA1 vs RSA

Difference between MD5 or SHA1 or RSA

MD5 is a cryptographic hash function. "SHA" is the name for a family of hash functions; first a short-lived "SHA" which was renamed "SHA-0", then "SHA-1" was defined (a best seller). Later on, new members of the family were added, collectively designated as "SHA-2", and consisting of SHA-224, SHA-256, SHA-384 and SHA-512. Recently, a new SHA generation was designed, called "SHA-3" but also "Keccak" (this was an open competition, Keccak being the codename of one of the candidates, who ultimately won).

A cryptographic hash function is a fully defined, deterministic function which uses no secret key. It takes as input a message of arbitrary length (a stream of bits, any bits) and produces a fixed-size output. Output size depends on the function; it is 128 bits for MD5, 160 bits for SHA-1, 256 bits for SHA-256... Everybody can compute a given hash function on a given input, and they all get the same results. Hash functions are also called digests because they somehow produce a kind of "checksum" or "summary" of the input. Robust hash functions must be such that nobody knows how to "invert" them or even find two distinct inputs which yield the same output. The latter is called a collision and it is a mathematical necessity that collisions exist (since the function can accept many more distinct inputs than it can produce distinct outputs), but we require that it is unfeasible to find even one collision.

MD5 turned out to be very broken with regards to collisions (we can produce a collision in a few seconds of work on a PC) and SHA-0 is also broken in that respect; SHA-1 is a bit flaky; the rest of the SHA family appears to be robust so far. How a hash function achieves collision resistance is a bit of a miracle since the whole function is completely known, with no secret value; it just mixes the data too much for the best cryptographers to unravel the process.

RSA is two algorithms: an asymmetric encryption algorithm and a digital signature algorithm. Although both algorithms build on the same kind of mathematics, they are quite distinct (a lot of people describe signatures as "encryption with the private key", which is a flaw analogy and at best confusing, so don't do that). Both algorithms uses keys, i.e. pieces of data which must be kept secret. It so happens that for RSA signatures, what is signed is not directly a given message (a sequence of bits) but a hash of the message: the message is first processed with a cryptographic hash function like SHA-256, and the hash value is then used. This is done that way because the mathematics of RSA can handle only values of moderate size, a few hundred bits at best. Cryptographic hash functions are such that signing the hash is as good as signing the original data.
That way, RSA and cryptographic hash functions are often used together; but they are not the same thing at all.
Hexadecimal is a way to represent a sequence of bits into a sequence of characters: hexadecimal uses digits and letters from 'A' to 'F'; each character encodes exactly four bits ('0' encodes '0000', '7' encodes '0111', 'D' encodes '1101'...). Any sequence of bits (and in particular the output of a hash function) can be converted to hexadecimal and back. Hexadecimal is popular because human eyes and brain are good at reading characters, and not at reading bits. Command-line tools which compute cryptographic hash functions on files traditionally output hexadecimal characters for that reason. However, there is nothing which intrinsically ties hash functions with hexadecimal: everything which fits in a computer one way or another is a sequence of bits, and so is amenable to hexadecimal; and a hash function output is just a sequence of bits, which can be encoded in various ways, hexadecimal being just the "traditional" way (although Base64 is also often encountered, especially when dealing with databases).


No comments:

Post a Comment