Introduction
Disclaimer
This book is not written by experts in cryptography and cryptanalysis. We take no liability in the security of your systems inspired by the text within. We will however link to sources for any claims made within the book.
What is this guide
So, you want to roll-your-own-crypto, but are told not to? You're in the right place. This guide will cover the primitives and standard constructions, as well as listing common attacks against crypto. Before getting started on your ideas, I would recommend reading everything in this guide.
Primitives
Crypto primitives are the building blocks of any good crypto system. Do not try to write your own primitive. Use one of the well-known implementations for your language. Many primitives are vulnerable to implementation quirks such as timing attacks or cache-based key leaking.
Cryptographic Hash Functions
A hash function takes an input set of bytes and outputs a 'digest' set of bytes. The digest is always the same if the input is the same. For a cryptographic hash, the reverse operation from digest to input is not possible.
Randomness
The digest of a cryptographic hash function is pseudo-random.
Uniqueness
A cryptographic hash digest is not guaranteed to be unique, but up to a certain number of inputs, it can be assumed with a high probability. If we assume a 256-bit digest, it would take 480 octillion different hash calculations to have even a 1 in 1 quintillion chance of having a collision. For the breakdown of the mathematics, read about Birthday attacks.
This uniqueness assumption relies on the cryptographic hash still being considered secure. MD5 and SHA1 both have practical collision attacks.
SHA
Standing for "Secure Hash Algorithm", this is a family of cryptographic hash functions standardized by the NSA.
SHA-1
Do not use SHA-1.
SHA-2
SHA-2 is a good cryptographic hash function. Because it is such a commonly used function, it has several instructions in x86 SSE and many modern ARMv8 chips. On an M2 Max, SHA-256 can hash about 2.4GB/s, and SHA-512 can hash about 1.5GB/s
There are many forms of SHA-2:
SHA-224- 256 bit state/224 bit outputSHA-256- 256 bit state/256 bit outputSHA-512/224- 512 bit state/224 bit outputSHA-512/256- 512 bit state/256 bit outputSHA-384- 512 bit state/384 bit outputSHA-512- 512 bit state/512 bit output
Some of them truncate the state, which reduces the effectiveness of certain length-extension attacks.
SHA-3 (Keccak)
SHA-3 is a very different construction to SHA-2. It is considered to have far better security, especially in a post-quantum world. However, it does not yet have common hardware support and thus is considerably slower.
BLAKE
Message Authentication Codes
Message Authentication Codes are similar to hash functions. They take arbitrary bytes and output a random but deterministic value, however, MACs use a key to authenticate that the code was produced by a trusted party.
Length Extensions
Naive MACs using a hash function are vulnerable to length extension attacks. For example,
let's say you have a construction like HASH(key || message). Unfortunately, with SHA-256,
it is trivial to extend the message with arbitrary data and calculate a new forge signature.
Example code
HMAC
HMAC stands for "Hash-based Message Authentication Code". It is very simple, and allows for a generic hash function.
- If the key is too large, perform
key = hash(key) - Calculate
inner = hash((key ^ ipad) || message) - Calculate
digest = hash((key ^ opad) || digest)
And that is it.
BLAKE
KMAC
Key Derivation Functions
HKDF
BLAKE3
Argon2
Cipher
Ciphers represent symmetrical encryption. They take some input data (plaintext), a key, and an initialisation value and they turn the data into a random looking blob (ciphertext) of the same length. The IV and random ciphertext can be safely made public, and only the owner of the key is able to turn the ciphertext back into plaintext.
Stream Cipher
Stream ciphers are an extremely versatile and secure form of cipher.
All stream ciphers can be represented by the following algorithm
#![allow(unused)] fn main() { let state = init(key, iv); for block in message { // determine the block key from the state let block_key = hash(state); // progress the state forward state = forward(state); // apply the key to the block by xor block ^= block_key } }
Stream ciphers allow for encrypting data that can be streamed (hence the name). This allows you to decrypt data without first having decrypted the block before.
The enigma machine is a famous example of a stream cipher, however instead of XOR it used substitution.
Theory
If the hash function is a strong pseudo random number generator, then the key stream will be a high entropy random key. Since a one-time-pad (OTP) is provably secure with a perfectly random key, then the stream cipher key stream will imitate that.
Insecurities lie in the deterministic factor of the key stream. Given a known
block plaintext, the block key is easily determined. If there's a way to reverse the
let block_key = hash(state) step, then an attacker can decrypt all following blocks.
This was the failure of the enigma machine. The state was directly related to the substitutions.
If the substitutions could be determined, then the state was trivially brute-forcable.
Salsa/ChaCha
Salsa and ChaCha are two popular and secure stream ciphers. They achieve high performance even in software implementations. They are configurable, allowing you to change the number of rounds the state goes through to get the block key.
Standard configurations are
- Salsa8
- Salsa12
- Salsa20
- ChaCha8
- ChaCha12
- ChaCha20
Because the number defines how many rounds the state goes through to get the key, the performance is inversely proportional to the number of rounds.Eg Salsa8 is 2.5x faster than Salsa20.
The implementations of Salsa and ChaCha are almost identical, so they have equivalent speeds but the ChaCha variant is considered more secure as it's hash function is said to mix the internal state more than the Salsa hash function.
There is also the "extended" XSalsa and XChaCha algorithms which allow you to use a larger initialisation value.
| Algorithm | Key Size | Initialisation Value Size |
|---|---|---|
| Salsa | 256 bit | 64 bit |
| XSalsa | 256 bit | 192 bit |
| ChaCha | 256 bit | 96 bit |
| XChaCha | 256 bit | 192 bit |
Salsa20 is well known for being the cipher behind NaCl's secretbox. ChaCha20 is well known for being a stream cipher in TLS 1.2/1.3