Introduction

Disclaimer

This book is not written by experts in cryptography and cryptanalysis. We take no liability in the security of your systems inspired by the text within. We will however link to sources for any claims made within the book.

What is this guide

So, you want to roll-your-own-crypto, but are told not to? You're in the right place. This guide will cover the primitives and standard constructions, as well as listing common attacks against crypto. Before getting started on your ideas, I would recommend reading everything in this guide.

Primitives

Crypto primitives are the building blocks of any good crypto system. Do not try to write your own primitive. Use one of the well-known implementations for your language. Many primitives are vulnerable to implementation quirks such as timing attacks or cache-based key leaking.

Cryptographic Hash Functions

A hash function takes an input set of bytes and outputs a 'digest' set of bytes. The digest is always the same if the input is the same. For a cryptographic hash, the reverse operation from digest to input is not possible.

Randomness

The digest of a cryptographic hash function is pseudo-random.

Uniqueness

A cryptographic hash digest is not guaranteed to be unique, but up to a certain number of inputs, it can be assumed with a high probability. If we assume a 256-bit digest, it would take 480 octillion different hash calculations to have even a 1 in 1 quintillion chance of having a collision. For the breakdown of the mathematics, read about Birthday attacks.

This uniqueness assumption relies on the cryptographic hash still being considered secure. MD5 and SHA1 both have practical collision attacks.

SHA

Standing for "Secure Hash Algorithm", this is a family of cryptographic hash functions standardized by the NSA.

SHA-1

Do not use SHA-1.

SHA-2

SHA-2 is a good cryptographic hash function. Because it is such a commonly used function, it has several instructions in x86 SSE and many modern ARMv8 chips. On an M2 Max, SHA-256 can hash about 2.4GB/s, and SHA-512 can hash about 1.5GB/s

There are many forms of SHA-2:

  • SHA-224 - 256 bit state/224 bit output
  • SHA-256 - 256 bit state/256 bit output
  • SHA-512/224 - 512 bit state/224 bit output
  • SHA-512/256 - 512 bit state/256 bit output
  • SHA-384 - 512 bit state/384 bit output
  • SHA-512 - 512 bit state/512 bit output

Some of them truncate the state, which reduces the effectiveness of certain length-extension attacks.

SHA-3 (Keccak)

SHA-3 is a very different construction to SHA-2. It is considered to have far better security, especially in a post-quantum world. However, it does not yet have common hardware support and thus is considerably slower.

BLAKE

Message Authentication Codes

Message Authentication Codes are similar to hash functions. They take arbitrary bytes and output a random but deterministic value, however, MACs use a key to authenticate that the code was produced by a trusted party.

Length Extensions

Naive MACs using a hash function are vulnerable to length extension attacks. For example, let's say you have a construction like HASH(key || message). Unfortunately, with SHA-256, it is trivial to extend the message with arbitrary data and calculate a new forge signature. Example code

HMAC

HMAC stands for "Hash-based Message Authentication Code". It is very simple, and allows for a generic hash function.

  1. If the key is too large, perform key = hash(key)
  2. Calculate inner = hash((key ^ ipad) || message)
  3. Calculate digest = hash((key ^ opad) || digest)

And that is it.

BLAKE

KMAC

Key Derivation Functions

HKDF

BLAKE3

Argon2

Cipher

Ciphers represent symmetrical encryption. They take some input data (plaintext), a key, and an initialisation value and they turn the data into a random looking blob (ciphertext) of the same length. The IV and random ciphertext can be safely made public, and only the owner of the key is able to turn the ciphertext back into plaintext.

Stream Cipher

Stream ciphers are an extremely versatile and secure form of cipher.

All stream ciphers can be represented by the following algorithm

#![allow(unused)]
fn main() {
let state = init(key, iv);
for block in message {
    // determine the block key from the state
    let block_key = hash(state);
    // progress the state forward
    state = forward(state);
    // apply the key to the block by xor
    block ^= block_key
}
}

Stream ciphers allow for encrypting data that can be streamed (hence the name). This allows you to decrypt data without first having decrypted the block before.

The enigma machine is a famous example of a stream cipher, however instead of XOR it used substitution.

Theory

If the hash function is a strong pseudo random number generator, then the key stream will be a high entropy random key. Since a one-time-pad (OTP) is provably secure with a perfectly random key, then the stream cipher key stream will imitate that.

Insecurities lie in the deterministic factor of the key stream. Given a known block plaintext, the block key is easily determined. If there's a way to reverse the let block_key = hash(state) step, then an attacker can decrypt all following blocks. This was the failure of the enigma machine. The state was directly related to the substitutions. If the substitutions could be determined, then the state was trivially brute-forcable.

Salsa/ChaCha

Salsa and ChaCha are two popular and secure stream ciphers. They achieve high performance even in software implementations. They are configurable, allowing you to change the number of rounds the state goes through to get the block key.

Standard configurations are

  • Salsa8
  • Salsa12
  • Salsa20
  • ChaCha8
  • ChaCha12
  • ChaCha20

Because the number defines how many rounds the state goes through to get the key, the performance is inversely proportional to the number of rounds.Eg Salsa8 is 2.5x faster than Salsa20.

The implementations of Salsa and ChaCha are almost identical, so they have equivalent speeds but the ChaCha variant is considered more secure as it's hash function is said to mix the internal state more than the Salsa hash function.

There is also the "extended" XSalsa and XChaCha algorithms which allow you to use a larger initialisation value.

AlgorithmKey SizeInitialisation Value Size
Salsa256 bit64 bit
XSalsa256 bit192 bit
ChaCha256 bit96 bit
XChaCha256 bit192 bit

Salsa20 is well known for being the cipher behind NaCl's secretbox. ChaCha20 is well known for being a stream cipher in TLS 1.2/1.3

Block Cipher

AES

Crypto Systems

Public Key Encryption