...And you will know me by the trail of bits
Can you pass The Rekt Test?
Audits from Trail of Bits give organizations ways to fix their current issues and provide them with a roadmap to prevent the same security problems from occurring in the future. However, product teams don’t always implement the recommendations. This may be due to a lack of resources, a lack of institutional motivation, or several other internal reasons. Even when action is taken, progress can be difficult to measure.
We now have a way organizations can measure that progress. The Rekt Test provides a framework for an informed conversation that, if moderated by experts, can allow an organization to get a fuller picture of the risk presented by a blockchain project. Modeled after The Joel Test, the Rekt Test asks an organization to answer ‘Yes’ to as many of the questions below as possible before going public with a new blockchain venture. The more questions an organization can answer ‘Yes’ to, the more stable its security foundation should be.
The Rekt Test
- Do you have all actors, their roles, and privileges documented?
- Do you employ a multi-factor human and physical key management system?
- Do you have a written and tested incident response plan?
- Do you perform identity verification and background checks on all employees?
- Do you have a team member that has security defined in their role?
- Do you require hardware security keys for production systems?
- Do you use the best automated tools for discovering security issues in your code?
- Do you define key invariants for your system and test them on every commit?
- Do you undergo external audits and maintain a vulnerability disclosure or bug bounty program?
- Do you keep documentation of all the external services, contracts, and oracles you rely on?
- Do you document the best ways to attack your own system?
- Do you identify and address potential avenues for users who abuse your system?
1. Do you have all actors, their roles, and privileges documented? It’s crucial to understand who is operating in your environment — users, developers, miners, validators, and other stakeholders — and the breakdown of their unique responsibilities, tasks, permissions, and other duties that may be performed. This should be documented in a clear and concise manner, so an organization can easily identify when a security incident is happening.

2. Do you employ a multi-factor human and physical key management system? The keys to your wallets are always going to have a target on them. For protection, we recommend implementing a two-factor authentication system for users, using both physical and digital backups, and applying a multi-signature system for transactions. For multi-signatures, we suggest a consensus requirement for a sensitive action, like spending funds out of a wallet. This mitigates the risks of any single party’s judgment overruling others, any single party’s mistake causing a failure, and any single credential compromise causing a failure.

3. Do you have a written and tested incident response plan? While security in the blockchain space is different from that of traditional enterprises, an incident response plan is a proven practice that will enable your organization to be resilient in the face of a security incident. The plan should include steps to identify, contain, and remediate the incident, through both automated and manual procedures. All team members should be familiar with the plan, including both internal and external communication channels. The plan should be regularly tested to ensure that it is up to date and effective, especially given how quickly the blockchain security world can change. As an example of what this traditionally looks like, Carnegie Mellon University has posted its plan online.

4. Do you perform identity verification and background checks on all employees? This is where your human resources officers can help with your security posture. While organizations should be conducting the standard checks — criminal background, employment history, credit check, reference check — a social media background check can unearth behavior that could be problematic from a security perspective. A little OSINT can go a long way.

5. Do you have a team member that has security defined in their role? Answering “yes” to this isn’t as simple as hiring a CISO. (Although that’s better than nothing!) Whether it’s developing something new or updating something that already exists, security cannot be a side gig tacked onto someone’s job responsibilities. There needs to be a dedicated person in the organization who can take the lead on making sure a product or service is as secure as possible. This person should also be deeply involved in getting the organization to “yes” on the other questions in this list, particularly incident response plans and using the best tools currently on the market.

6. Do you require hardware security keys for production systems? Let’s not mince words: if your internal credentials are leaked or stolen, the fallout will be harsh. Using SMS codes for two-factor authentication is not good enough for the credentials guarding your code. There are several options that support a wide range of authentication protocols and will make your business-critical code repositories highly resistant to breach attempts. YubiKey and Google Titan are two good choices for hardware keys.

7. Do you use the best automated tools for discovering security issues in your code? If organizations are going to be constantly testing their security, they are going to need tools that can keep up with the industry’s best practices. Trail of Bits has a wide suite of tools and documentation that will allow you to stay current in your security posture (a short scripting sketch follows this list):
- Echidna, a smart contract fuzzer
- Slither, a static analyzer for Solidity
- Circomspect, a static analyzer and linter for the Circom programming language
- Amarna, a static analyzer and linter for the Cairo programming language
- Building Secure Contracts, guidelines and best practices to write secure smart contracts
- ZKDocs, comprehensive, detailed, and interactive documentation on zero-knowledge proof systems and related primitives.
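As referenced above, these tools can be scripted as well as run from the command line. Here is a minimal sketch of driving Slither from Python to enumerate externally reachable functions for review; the target path and the follow-up checks are placeholders, a compatible solc must be available (for example via solc-select), and the exact API surface can vary between Slither versions.

# Minimal sketch: enumerate externally reachable functions with Slither's Python API.
# The target path is a placeholder; a compatible solc must be installed (e.g., via solc-select).
from slither.slither import Slither

slither = Slither("contracts/Vault.sol")  # hypothetical target contract

for contract in slither.contracts:
    print(f"Contract: {contract.name}")
    for function in contract.functions:
        # Externally reachable functions deserve the closest review.
        if function.visibility in ("public", "external"):
            print(f"  reachable: {function.full_name}")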
8. Do you define key invariants for your system and test them on every commit? For those unfamiliar, an invariant is a condition or value that remains unchanged throughout a program’s execution. It is used to ensure that the program behaves as expected and that the results are consistent. As you develop new features, your organization should be testing whether the new code upholds these invariants. These could be tied to things like cryptographic hash functions, consensus mechanisms, transaction validation, transaction privacy, or other network security functions. By continually testing, you are constantly given the opportunity to find issues in your system before an attacker does.
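To make this concrete, here is a minimal sketch of an invariant (“total supply never changes on transfer”) expressed as a Python property-based test with Hypothesis. SimpleLedger is a hypothetical stand-in for on-chain accounting logic; in practice you would express the same property in Solidity, hand it to a fuzzer like Echidna, and wire it into CI so it runs on every commit.

# Minimal sketch: an invariant expressed as a property-based test.
# SimpleLedger is a hypothetical stand-in for on-chain accounting logic.
from hypothesis import given, strategies as st

class SimpleLedger:
    def __init__(self, balances):
        self.balances = dict(balances)

    def transfer(self, src, dst, amount):
        if self.balances.get(src, 0) < amount:
            raise ValueError("insufficient balance")
        self.balances[src] -= amount
        self.balances[dst] = self.balances.get(dst, 0) + amount

@given(amount=st.integers(min_value=0, max_value=2_000))
def test_total_supply_is_invariant(amount):
    ledger = SimpleLedger({"alice": 1_000, "bob": 1_000})
    before = sum(ledger.balances.values())
    try:
        ledger.transfer("alice", "bob", amount)
    except ValueError:
        pass  # a rejected transfer is fine; the invariant must still hold
    assert sum(ledger.balances.values()) == before  # total supply is unchanged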
9. Do you undergo external audits and maintain a vulnerability disclosure or bug bounty program? Getting an external audit will positively compound your security posture, both internally and externally. An audit will allow experts with a fresh perspective to search for and fix vulnerabilities that an internal security practice may not be aware of or have the expertise to uncover. Whether the audit covers the actual smart contract or the components supporting the contract, it will also provide assurances to the public that your organization is taking the necessary steps to independently verify that your security is as good as it can possibly be. A vulnerability disclosure or bug bounty program is an additional way to enhance your security posture. Leading blockchain and web3 companies have set up these programs as a way to tap into the public’s security expertise and address potential issues before they become a problem. Some examples are:
- Uniswap has a bug bounty program that incentivizes responsible bug disclosure and offers rewards of up to 2,250,000 USDC.
- Primitive has a program with ImmuneFi that rewards up to $1 million for vulnerabilities that impact loss of user funds or governance funds.
- MakerDAO has a $10 million program with ImmuneFi that has a large library of smart contract code for researchers to look over.
10. Do you keep documentation of all the external services, contracts, and oracles you rely on? Incidents at the third parties you depend on can quickly become your own incidents:
- Gemini customers were the target of phishing campaigns that were the result of an incident at a third-party vendor.
- Cryptocurrency hardware wallet provider Trezor saw its users targeted with phishing attempts after a security incident at MailChimp.
- Tens of millions of dollars’ worth of assets have been taken in incidents where attackers targeted data oracles.
12. Do you identify and address potential avenues for users who abuse your system? Mitigations to consider include the following (a small on-chain monitoring sketch follows this list):
- Implement a KYC process to verify user identity
- Consider a reputation system to track user behavior
- Establish escrow accounts to ensure funds are held securely in the event of a security incident
- Monitor relevant chains for suspicious activity and shut down users in the event of bad behavior
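To illustrate the last recommendation, here is a deliberately simplified monitoring sketch that polls new blocks and flags unusually large ETH transfers. The RPC endpoint, threshold, and alert handling are placeholders, and the calls assume the web3.py v6 API; a production monitor would watch contract events and many more signals than transfer size.

# Minimal sketch: poll new blocks and flag unusually large ETH transfers.
# RPC URL, threshold, and alerting are placeholders; assumes web3.py v6.
import time
from web3 import Web3

w3 = Web3(Web3.HTTPProvider("https://example-rpc.invalid"))  # placeholder endpoint
LARGE_TRANSFER_ETH = 500  # arbitrary example threshold

def scan_block(block_number: int) -> None:
    block = w3.eth.get_block(block_number, full_transactions=True)
    for tx in block.transactions:
        value_eth = Web3.from_wei(tx["value"], "ether")
        if value_eth >= LARGE_TRANSFER_ETH:
            # Replace with real alerting (pager, chat webhook, case ticket).
            print(f"large transfer: {value_eth} ETH from {tx['from']} to {tx['to']}")

last_seen = w3.eth.block_number
while True:
    head = w3.eth.block_number
    for number in range(last_seen + 1, head + 1):
        scan_block(number)
    last_seen = head
    time.sleep(12)  # roughly one Ethereum block interval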
Codex (and GPT-4) can’t beat humans on smart contract audits
By Artem Dinaburg, Chief Technology Officer; Josselin Feist, Principal Engineer; and Riccardo Schirone, Security Engineer
Is artificial intelligence (AI) capable of powering software security audits? Over the last four months, we piloted a project called Toucan to find out. Toucan was intended to integrate OpenAI’s Codex into our Solidity auditing workflow. This experiment went far beyond writing “where is the bug?” in a prompt and expecting sound and complete results.
Our multi-functional team, consisting of auditors, developers, and machine learning (ML) experts, put serious work into prompt engineering and developed a custom prompting framework that worked around some frustrations and limitations of current large language model (LLM) tooling, such as working with incorrect and inconsistent results, handling rate limits, and creating complex, templated chains of prompts. At every step, we evaluated how effective Toucan was and whether it would make our auditors more productive or slow them down with false positives.
The technology is not yet ready for security audits for three main reasons:
- The models are not able to reason well about certain higher-level concepts, such as ownership of contracts, re-entrancy, and fee distribution.
- The software ecosystem around integrating large language models with traditional software is too crude, and everything about it is cumbersome; there are virtually no developer-oriented tools, libraries, or type systems that work with uncertainty (see the sketch after this list).
- There is a lack of development and debugging tools for prompt creation. To develop the libraries, language features, and tooling that will integrate core LLM technologies with traditional software, far more resources will be required.
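To make the second point concrete, here is a minimal sketch (not Toucan code) of the kind of typed “uncertain value” wrapper we kept wishing existed off the shelf. The Uncertain class, its confidence field, and the 0.9 threshold are illustrative assumptions, not an existing library API.

# Minimal sketch: a typed wrapper for probabilistic answers from an LLM.
# Uncertain, confidence, and the threshold below are illustrative assumptions.
from dataclasses import dataclass
from typing import Generic, TypeVar

T = TypeVar("T")

@dataclass
class Uncertain(Generic[T]):
    value: T
    confidence: float  # 0.0 to 1.0, e.g., derived from token logprobs

    def expect(self, threshold: float) -> T:
        # Return the value only if the model's confidence clears the threshold.
        if self.confidence < threshold:
            raise ValueError(f"{self.value!r} only has confidence {self.confidence:.2f}")
        return self.value

# An LLM-backed detector would return Uncertain[bool] rather than a bare bool,
# forcing callers to decide how much confidence they require.
is_reentrant = Uncertain(value=True, confidence=0.62)
try:
    verdict = is_reentrant.expect(0.9)
except ValueError as err:
    print(f"needs human review: {err}")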
Figure 1: True positives (green) and false positives (red) found by Slither, Toucan, and both on some simple re-entrancy tests. The Toucan results are not encouraging.

We tried looking at the true/false positive sets of Slither and Toucan, and found that each tool had a different set of false positives/false negatives, with some overlap (Figure 1). Codex was not able to effectively reduce the false positive rate from a prototype Slither integer overflow detector. Overall, we noticed a tendency to reply affirmatively to our questions, increasing the number of positives discovered by Toucan. Codex can perform basic static analysis tasks, but the rate of failure is too high to be useful and too difficult to characterize. This capability to perform successful analysis, even on short program fragments, is very impressive and should not be discounted! For languages that Codex understands but for which no suitable tooling exists, this capability could be extremely valuable—after all, some analysis could be much better than nothing. But the benchmark for Solidity is not nothing; we already have existing static analysis tooling that works very well.

How we framed our framework
During Toucan’s development, we created a custom prompting framework, a web-based front end, and rudimentary debugging and testing tools to evaluate prompts and to aid in unit and integration tests. The most important of these was the prompting framework.

Prompting framework
If we were making Toucan today, we’d probably just use LangChain. But at the time, LangChain did not have the features we needed. Frustratingly, neither OpenAI nor Microsoft offered an official, first-party prompting framework. This led us to develop a custom framework, with the goal that it should be possible for auditors to create new prompts without ever modifying Toucan’s code.

requires = ["emit-ownership-doc", "emit-target-contract"]
name = "Contract Ownership"
scope = "contract"
instantiation_condition = "any('admin' in s.name.lower() or 'owner' in s.name.lower() for s in contract.state_variables)"

[[questions]]
name = "can-change"
query = "Is it possible to change the `{{ contract | owner_variable }}` variable by calling a function in the `{{ contract.name }}` contract without aborting the transaction? Think through it step by step, and answer as 'Yes', 'No', or 'Unknown'. If 'Yes', please specify the function."
is_decision = true

[[questions]]
name = "who-can-call"
runtime_condition = "questions['can-change'].is_affirmative()"
query = """To reason about ownership:
1) First, carefully consider the code of the function
2) Second, reason step by step about the question.
Who can call the function successfully, that is, without aborting or reverting the transaction?"""
answer_start = """1) First, carefully consider the code of the function:"""

[[questions]]
name = "can-non-owner-call"
runtime_condition = "questions['can-change'].is_affirmative()"
query = "Can any sender who is not the current owner call the function without reverting or aborting?"
is_decision = true
finding_condition = "question.is_affirmative()"

Figure 2: Sample question chain asking about contract ownership. Before questions are emitted, the prompting framework also emits a specific explanation of what ownership means, with examples and information about the target contract.

Our framework supported chaining multiple questions together to support Chain of Thought and similar prompting techniques (Figure 2). Since GPT models like Codex are multi-shot learners, our framework also supported adding background information and examples before forming a prompt. The framework also supported filtering on a per-question basis, as some questions are relevant only to specific kinds of contracts (say, only ERC-20 tokens), and other questions may have a specific scope (e.g., a contract, function, or file scope). Finally, each question could be optionally routed to a different model.
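As a rough illustration of how a configuration-driven chain like the one above can be evaluated, here is a simplified sketch. It is not Toucan’s actual implementation: ask_model() returns a canned answer, the owner_variable filter and the contract metadata are stand-ins for data a real run would derive from static analysis, and only the first question of the chain is shown.

# Simplified sketch of evaluating a configuration-driven question chain.
# Not Toucan's actual code: ask_model(), the owner_variable filter, and the
# contract metadata are stand-ins. Requires Python 3.11+ (tomllib) and Jinja2.
import tomllib
from jinja2 import Environment

CHAIN_TOML = """
name = "Contract Ownership"
scope = "contract"

[[questions]]
name = "can-change"
query = "Is it possible to change the `{{ contract | owner_variable }}` variable by calling a function in the `{{ contract.name }}` contract without aborting the transaction? Answer 'Yes', 'No', or 'Unknown'."
is_decision = true
"""

def ask_model(prompt: str) -> str:
    # Stand-in for a Codex/GPT completion call; returns a canned answer.
    return "Yes, the transferOwnership() function can change it."

def is_affirmative(answer: str) -> bool:
    return answer.strip().lower().startswith("yes")

env = Environment()
# Hypothetical filter naming the owner-like state variable of the target contract.
env.filters["owner_variable"] = lambda contract: contract["owner_variable"]

# Stand-in for target-contract metadata that would come from static analysis.
contract = {"name": "Vault", "owner_variable": "owner"}

answers = {}
for question in tomllib.loads(CHAIN_TOML)["questions"]:
    prompt = env.from_string(question["query"]).render(contract=contract)
    answers[question["name"]] = ask_model(prompt)
    # Decision questions gate the rest of the chain; this is a simplified
    # version of the runtime_condition checks in the full configuration.
    if question.get("is_decision") and not is_affirmative(answers[question["name"]]):
        break

print(answers)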
The prompting framework also went to great lengths to abide by OpenAI’s API limitations, including batching questions into one API invocation and keeping track of both the token count and API invocation rate limits. We hit these limits often and were very thankful the Codex model was free while in beta.

Test data
One of our development goals was that we would never compromise customer data by sending it to an OpenAI API endpoint. We had a strict policy of running Toucan only against open-source projects on GitHub (which would already have been indexed by Codex) with published reports, like those on our Publications page. We were also able to use the rather extensive test set that comes with Slither, and our Building Secure Contracts reference materials as additional test data. It is important to note that some of these tests and reference materials may have been a part of the Codex training set, which explains why we saw very good results on smaller test cases.

The missing tools
The lack of tooling from both OpenAI and Microsoft has been extremely disappointing, although that looks to be changing: Microsoft has a prompting library, and OpenAI recently released OpenAI Evals. The kinds of tools we’d have loved to see include a prompt debugger; a tree-graph visualization of tokens in prompts and responses with logprobs of each token; tools for testing prompts against massive data sets to evaluate quality; ways to ask the same question and combine results from counterexamples; and some plugins to common unit testing frameworks. Surely someone is thinking of the developers and making these tools?

Current programming languages lack the facilities for interfacing with neural architecture computers like LLMs or similar models. A core issue is the lack of capability to work with nondeterminism and uncertainty. When using LLMs, every answer has some built-in uncertainty: the outputs are inherently probabilistic, not discrete quantities. This uncertainty should be handled at the type system level so that one does not have to explicitly deal with probabilities until it is necessary. A pioneering project from Microsoft Research called Infer.NET does this for .NET-based languages, but there seem to be few concrete examples and no real tooling to combine this with LLMs.

Prompt engineering, and surrounding tooling, are still in their infancy. The biggest problem is that you never know when you are done: even now, it is always possible that we were just one or two prompts away from making Toucan a success. But at some point, you have to give up in the face of costs and schedules. With this in mind, the $300K salary for a fantastic prompt engineer does not seem absurd: if the only difference between a successful LLM deployment and a failure is a few prompts, the job quickly pays for itself. Fundamentally, though, this reflects a lack of tooling to assess prompt quality and evaluate responses. There is no particularly good way to determine if one prompt is better than another or if you’re on the right track. Similarly, when a prompt fails against an input, it is frustratingly difficult to figure out why and to determine, programmatically, which prompts are merely returning the wrong result versus completely hallucinating and misbehaving. Unit tests are also problematic; the results are not guaranteed to be the same across runs, and newer models may not provide the same results as prior ones. There is certainly a solution here, but again, the tooling developers expect just wasn’t present. OpenAI Evals is likely going to improve this situation.

Overall, the tooling ecosystem is lacking, and surprisingly, the biggest names in the field have not released anything substantial to improve the adoption and integration of LLMs into real software projects that people use. However, we are excited that the open-source community is stepping up with really cool projects like LangChain and LlamaIndex.

Humans still reign supreme
OpenAI’s Codex is not yet ready to take over the job of software security auditors. It lacks the ability to reason about the proper concepts and produces too many false positives for practical usage in audit tasks. However, there is clearly a nascent capability to perform interesting analysis tasks, and the underlying models should quickly get more capable. We are very excited to keep using the technology as it improves. For example, the new larger context window with GPT-4 may allow us to provide enough context and direction to handle complex tasks.

Even though Codex (and GPT-4) do not currently match mature algorithmic tools, LLM-based tools—even those of lower quality—may have interesting uses. For languages for which no analysis tooling exists, developers can bootstrap something from LLMs relatively quickly. The ability to provide some reasonable analysis where none previously existed may be considerably better than nothing at all.

We hope the ability to integrate language models into existing programs improves quickly, as there is currently a severe lack of languages, libraries, type systems, and other tooling for the integration of LLMs into traditional software. Disappointingly, the main organizations releasing LLMs have not released much tooling to enable their use. Thankfully, open-source projects are filling the gap. There is still enormous work to be done, and whoever can make a wonderful developer experience working with LLMs stands to capture developer mindshare.

LLM capability is rapidly improving, and if it continues to do so, the next generation of LLMs may serve as capable assistants to security auditors. Before developing Toucan, we used Codex to take an internal blockchain assessment occasionally used in hiring. It didn’t pass—but if it were a candidate, we’d ask it to take some time to develop its skills and return in a few months. It did return—we had GPT-4 take the same assessment—and it still didn’t pass, although it did better. Perhaps the large context window version with proper prompting could pass our assessment. We’re very eager to find out!