Random Number Generator Algorithm in CryptoSys Products


This document describes in detail the latest deterministic random number generator (RNG) algorithm used in our CryptoSys range of products since 2007: CryptoSys API and the CryptoSys PKI Toolkit.

The RNG has been implemented to conform to NIST Special Publication 800-90† Recommendation for Random Number Generation Using Deterministic Random Bit Generators [SP80090], first published June 2006, revised March 2007. Entropy is accumulated in "Fortuna" pools as described in Ferguson and Schneier, Practical Cryptography, chapter 10, "Generating Randomness" [FERG03].

Note 2013-09-21: Our implementation does not use the Dual EC_DRBG component of NIST 800-90 that has been revealed to contain an NSA backdoor.

This supersedes the old RNG implementation used prior to 2007 which complied with ANSI X9.31 Appendix A.2.4 [AX931] (which replaced X9.17 Appendix C). We assert that the NIST method as implemented here is at least as secure as the ANSI X9.31-compliant method under any circumstances.

This document describes the implementation for the Windows® operating system. The Linux implementation is similar but does not use thread-safe techniques and the personalisation string uses fewer parameters.

This information is published for peer review and comment. Please send us a message.

Contents

Definitions

Random Number:
1. A number selected from a known set of numbers in such a way that each number in the set has the same probability of occurrence. 2. A number obtained by chance. 3. One of a sequence of numbers considered appropriate for satisfying certain statistical tests or believed to be free from conditions that might bias the result of a calculation.
(Federal Standard 1037C).
Random Number Generator:
Random Number Generators (RNGs) used for cryptographic applications typically produce a sequence of zero and one bits that may be combined into sub-sequences or blocks of random numbers. There are two basic classes: deterministic and nondeterministic. A deterministic RNG consists of an algorithm that produces a sequence of bits from an initial value called a seed. A nondeterministic RNG produces output that is dependent on some unpredictable physical source that is outside human control. There are no FIPS Approved nondeterministic random number generators. [FIPS140]
Random Bit Generator (RBG):
A device or algorithm that outputs a sequence of binary bits that appears to be statistically independent and unbiased. An RBG is either a Deterministic RNG (DRBG) or a Non-deterministic RBG (NRBG). [SP80090]
Pseudo-Random Number Generator:
A pseudo-random number generator, or PRNG, is a random number generator that produces a sequence of values based on a seed and a current state. Given the same seed, a PRNG will always output the same sequence of values. [LAVA]
Randomness:
Random data that is unpredictable to the attacker, even if he is taking active steps to defeat our randomness. [FERG03]
Entropy:
A measure of the disorder, randomness or variability in a closed system. The entropy of X is a mathematical measure of the amount of information provided by an observation of X. As such, entropy is always relative to an observer and his or her knowledge prior to an observation. [SP80090]
The average number of bits you would need to specify the value if you could use an ideal compression algorithm. [FERG03]

Terminology

We use the term RNG in this document to mean a cryptographically-secure PRNG, deterministic RNG or deterministic RBG (DRBG). All these terms mean the same thing for our purposes.

RNG ≡ RBG ≡ DRBG ≡ PRNG

In particular, the terms random number generator (RNG) and random bit generator (RBG) are interchangeable. The output from a RNG or RBG is a sequence of zero and one bits. In our case, the output is always in 8-bit blocks (bytes, octets).

Design Principles

Our objective for our RNG is to produce, on request, a sequence of the required number of random bits. The RNG should be in compliance with FIPS 140-2 and NIST SP800-90, and any issues must be documented. The RNG must be compatible with a general-purpose cryptographic library which must be usable on any 32-bit variant of the Windows® operating system in a thread-safe manner. It must not interfere with the operation of the library unless it fatally fails.

Background Theory

We use two basic references for the background theory: NIST Special Publication 800-90 Recommendation for Random Number Generation Using Deterministic Random Bit Generators [SP80090] and Ferguson and Schneier, Practical Cryptography, chapter 10, "Generating Randomness" [FERG03]. Ferguson and Schneier is much easier to read and we have drawn on several sections of their book here (because they can write about the subject much better than we can).

Types of PRNG

In broad terms, there are three levels of PRNG.

1. Statistically random
A statistically-random PRNG will pass various statistical tests of randomness, for example, the old FIPS 140-2 tests (as in the October 2001 update) [FIPS140]. Almost any output from repeated application of a cryptographic hash or encryption function will pass these tests. The FIPS 140-2 statistical tests are no longer in the Standard (they were withdrawn in December 2002), but they still provide a useful check that you are not accidentally producing non-random garbage.
2. Cryptographically strong
If an attacker sees a lot of the random data generated by the PRNG, she should not be able to predict anything about the rest of the output of the PRNG [FERG03].
3. Security strength of n bits
A PRNG with an intended security strength of n bits means that the amount of work (that is, the number of operations) required to break the system is in the order of 2n.

A statistically-random PRNG is not necessarily cryptographically-secure. The concept of security strength is an attempt to quantify just how cryptographically secure it is.

PRNG mechanism

PRNGs work by keeping an internal state. Typically this is a seed and a key, which are kept secret. When a consumer requests random data, a cryptographic algorithm operates on the seed and the key to produce pseudo-random output. The internal state is then updated so that the next request does not produce the same data. Ferguson and Schneier [FERG03] describe a simple generator using AES-256 and a 128-bit counter. NIST SP800-90 [SP80090] specifies a whole smorgasbord of generators using message digest hashes, HMACS, block ciphers and even elliptic curves. The idea is that designers can use whichever cryptographic function is already available to them.

Some typical pseudo-code for a PRNG generator might be:

INPUT:  (Key, Seed)
OUTPUT: random_data, (Key', Seed')
   random_data = F(Key, Seed)
   Key' = F(Key, Seed+1)
   Seed' = F(Key', Seed)
   return random_data

where F is a cryptographic function. All the generators are essentially some variant of this.

Backtracking resistance and prediction resistance

NIST SP800-90 formalises the resistance to attacks with the concepts of Backtracking Resistance and Prediction Resistance. (In the following, remember that PRNG, RBG and DRBG all mean the same thing.)

Backtracking Resistance
The assurance that the output sequence from an RBG remains indistinguishable from an ideal random sequence even to an attacker who compromises the RBG in the future, up to the claimed security strength of the RBG. For example, an RBG that allowed an attacker to "backtrack" from the current working state to generate prior outputs would not provide backtracking resistance. [SP80090]
Prediction Resistance
Assurance that a compromise of the DRBG internal state has no effect on the security of future DRBG outputs. That is, an adversary who is given access to all of the output sequence after the compromise cannot distinguish it from random; if the adversary knows only part of the future output sequence, he cannot predict any bit of that future output sequence that he has not already seen. [SP80090]

Backtracking resistance is provided by ensuring that the DRBG generator algorithm is a one-way function. This is easy: all the DRBG mechanisms in NIST SP800-90 provide backtracking resistance.

Prediction resistance depends on the Reseed process; that is, the ability to effectively reseed after a compromise but before the next request. This is more difficult.

Attack Models for a PRNG

Source: [FERG03]

  1. The attacker attempts to reconstruct the internal state from the output. This is a classic cryptographic attack, and rather easy to counter using cryptographic techniques.
  2. The attacker is at some point able to acquire the internal state. If no further entropy is added, then the attacker can follow all the outputs and all the updates of the internal state. This means that if the the PRNG is ever attacked successfully, then it can never recover to a secure state.

    To solve: mix in entropy from truly-random events into the internal state.

  3. If the entropy added is only in small amounts - as it most likely will be - the attacker makes frequent requests for random data from the PRNG. As long as the total amount of entropy added between two such requests is limited to say, 30 bits, then the attacker can simply try all possibilities for the random inputs and recover the new state after the mixing.

    The best defence against this particular attack is to pool the incoming events that contain entropy. You collect entropy until you have enough to mix into the internal state without the attacker being able to guess the pooled data. How much is enough? You want to have 128 bits of entropy. But here is the real problem: making any kind of estimate of the amount of entropy is extremely difficult, if not impossible. It depends heavily on how much the attacker knows or can know, but that information is not available to the developers during the design phase.

Fortuna Accumulation Pools

Source: [FERG03]

To reseed the generator, we need to pool events in a pool large enough that the attacker can no longer enumerate the possible values for the events in the pool. Fortuna solves the problem of how many events to collect in a pool before using it to reseed the generator.

There are 32 pools: P0,P1,...,P31. Each pool conceptually contains a string of bytes of unbounded length but in practice contains the partly-computed hash of the string as it is assembled in the pool.

Each source distributes its random events over the pools in a cyclical fashion. This ensures that the entropy from each source is distributed more or less evenly over the pools. Each random event is appended to the string in the pool in question.

We reseed the generator every time pool P0 is long enough. Reseeds are numbered 1,2,3,.... Depending on the reseed number r, one or more pools are included in the reseed. Pool Pi is included if 2i is a divisor of r. Thus P0 is used in every reseed, P1 every other reseed, P2 every fourth reseed, etc. After a pool is used in a reseed, it is reset to the empty string.

Irrespective of how many fake random events the attacker generates, or how many of the events he knows, as long as there is at least one source of random events he can't predict, there will always be a pool that collects enough entropy to defeat him.

To prevent the attacker injecting so many events that even if pool P31 does not contain enough randomness between reseeds to recover from a compromise, we limit the speed of the reseeds. A reseed will only be performed if the previous reseed was more than 100 milliseconds ago, so it will take more than 13 years before P32 would have been used, had it existed.

Measuring Randomness

Source: [FERG03]

The measure of randomness is called entropy. Entropy measures how uncertain you are about the value. You can think of entropy as the average number of bits you would need to specify the value if you could use an ideal compression algorithm. The more you know about a value, the smaller its entropy is.

Mathematically, the definition of entropy, H(X), for a variable X is

entropy formula

where P[X=x] is the probability that the variable X takes on the value x.

For example, if you have a value consisting of a sequence of 16 bytes that are completely random; that is, each of the sequence of 128 bits (16 bytes x 8 bits/byte = 128 bits) has been chosen in an unbiased manner, then the 16-byte value has 128 bits of entropy. You would say that the security strength of the value is 128 bits, or the amount of work required to break the security is 2128 operations. This is computationally infeasable. (The hard part is to select the bytes in an unbiased manner.)

But if you know that each byte has been chosen from the set of, say, the two values {0x00, 0xFF} (so the sequence is made up of bytes that are either 0x00 or 0xFF in some random order), then you only have 16 bits of entropy. Each byte in the sequence has entropy of only 1 bit, so the sequence has 16 bits. Thus the work required to break the security has been reduced to 216 operations: a mere 65,000 guesses.

At the far extreme, if an attacker knows exactly what these 16 bytes are, then you have zero bits of entropy.

So the amount of entropy can be anything between zero and the actual size of the value in bits depending on how much the attacker knows. It's relative to an observer and his knowledge prior to an observation.

The Standards

From FIPS PUB 140-2 Annex C Approved Random Number Generators for FIPS PUB 140-2 [FIPS140XC], January 24, 2007:

ANNEX C: APPROVED RANDOM NUMBER GENERATORS

Annex C provides a list of the FIPS Approved random number generators applicable to FIPS PUB 140-2
(snip)...
6. National Institute of Standards and Technology, Recommendation for Random Number Generation Using Deterministic Random Bit Generators, Special Publication 800-90, June 2006.

From FIPS PUB 140-2 Section 4.9.2 Self-Tests - Conditional Tests [FIPS140]:

Continuous random number generator test.

If a cryptographic module employs Approved or non-Approved RNGs in an Approved mode of operation, the module shall perform the following continuous random number generator test on each RNG that tests for failure to a constant value.

1. If each call to a RNG produces blocks of n bits (where n > 15), the first n-bit block generated after power-up, initialization, or reset shall not be used, but shall be saved for comparison with the next n-bit block to be generated. Each subsequent generation of an n-bit block shall be compared with the previously generated block. The test shall fail if any two compared n-bit blocks are equal.

Our Design

Our PRNG functions use the HMAC_DRBG mechanism as specified in Section 10.1.2 of NIST SP800-90 with SHA-1 as the underlying hash function.

Specification

Highest supported security strength = 128
Output block length (outlen) = 160 bits
Maximum number of bits per request = 2^19 bits (i.e. 65536 bytes)
Reseed interval = 2^30 (< 2^48 max for HMAC_DRBG) or as dictated by the Fortuna algorithm
Maximum length of personalization string = 160 bits
Maximum length of entropy input = 5120 bits (32 pools x 160 bits)
Maximum additional input length = 2^34 bits (2^31 bytes) 

Each thread has its own Generator in Thread Local Storage. Each process has one Accumulator accessed by all Generators and protected by a Critical Section when accessed. The accumulator has 32 "Fortuna" accumulation pools with the minimum pool size before a reseed set to 32 bytes.

The "personalization string" used on instantiation in each thread is derived by hashing the current time, process ID, thread ID, user name and computer name, and so is almost certain to be different each time. The instantiation nonce is a 32-bit value derived from the current time which is incremented on the instantiation of any new Generator in a different thread.

The random events polled by the accumulator include the system time, the clock count, the memory status, the position and class name of each window, the free disk space, and other system parameters. Every event is time-stamped to the accuracy of the system clock, which means that, in the worst case, we have a cumulative hash of the times of every event polled since power up.

In the absolute worst case, if no seed file is used and an attacker can call the GenerateRandomData function before any entropy has been generated by the system, the output is effectively a "strong" hash of the current time and personalization string with good backtracking resistance. This is at least equivalent to an X9.31-compliant generator.

Our Reseed Process

We do not make available to the consumer either a reseed_required_flag or a prediction_resistance_request. This is by design to prevent a clash with the Fortuna accumulation system.

Reseeds are carried out when either:

  1. The Fortuna algorithm decides; or
  2. The NIST DRBG mechanism dictates.

The Fortuna algorithm will reseed every time pool P0 is long enough (in our case, when its length is 32 bytes or more) AND the time-since-last-reseed is greater than 100 milliseconds. This is not directly under the consumer's control, although he can force it eventually by repeatedly calling the library functions known to trigger entropy accumulation.

The NIST DRBG mechanism reseeds on either (a) first use; or (b) at the end of the seed life. The seed life of the DRBG mechanism is deliberately set high to reduce the risk of an attacker forcing a reseed by repeatedly requesting random data.

If the DRBG mechanism requires a reseed, then it requests entropy from the Fortuna pools, which is provided unconditionally.

Health Check

A random number generator Health Check is carried out on power up and every time a new RNG generator is instantiated in a new thread. It can also be carried out on demand. The health check performs self-tests to obtain assurance that the DRBG continues to operate as designed and implemented according to section 11.3 of [SP80090] including Known Answer Testing, Testing the Instantiate Function, Testing the Generate Function, Testing the Reseed Function and Testing the Uninstantiate Function.

Continuous RNG Test

We use a 64-bit value for continuous checks as required in Section 4.9.2 of FIPS 140-2. Every time the consumer requests a set of random data, we generate an extra 64-bit value which is not output but is stored for comparison purposes. If this matches the first 64 bits of the next about-to-be-output data, then we throw a catastrophic error. If the length of the requested random data is less than 64 bits, then we pad the about-to-be-output data to 64 bits with zeroes before comparison. On start-up, we generate a 64-bit block that is not used for output but is saved for comparison with the next request.

In pseudo-code:

INPUT:  nbits, previous_block
OUTPUT: random_data, previous_block'
if (previous_block != Null) 
{
   random_data = GenerateRandomData(nbits)
   if (nbits < 64)
      tocompare = random_data || 0^(64 - nbits)
   else
      tocompare = LeftMostBits(random_data, 64)
   if (tocompare == previous_block)
      return "catastrophic error"
}
previous_block' = GenerateRandomData(64)
return random_data

A strict reading of FIPS 140-2 would seem to require a check of every successive 64-bit block generated by the DRBG_Generate function where the requested number of bits is greater than 64. We assert that this pointless unless the HMAC-SHA-1 function is corrupted. Furthermore, and far more serious, storing every generated block to compare with the next would expose a huge security hole.

Why we did it the way we did

We chose the HMAC_DRBG mechanism with SHA-1 because:

Potential Issues

We can see at least the following potential problems. We'd be happy to discuss them if you have some constructive comments.

Contradictions in the reseed process between NIST 800-90 and Fortuna
The NIST 800-90 algorithm is specific in requiring a reseed of a certain security strength when necessary, especially at start up. The Fortuna algorithm avoids having to measure entropy by removing the potential for an attacker to compromise the pools. These two concepts may conflict. In particular, if the NIST algorithm requires entropy, we will provide it unconditionally from the Fortuna pools, regardless of how much entropy has been collected. We have to do this to prevent the RNG functions blocking the operation of the library. The user is advised of this issue and recommended to use a seed file on start up. The NIST reseed interval is deliberately kept at a high value to prevent this being a point of attack.
Applications that use multiple threads
Some applications, especially .NET ones, fire off threads at a huge rate. You initialize with a strong seed file in one thread and this adds entropy and security to the generator in that thread. Later - perhaps even just one instruction later - you call RNG_Generate but the system decides to call that function in another thread. In that case, the generator in the new thread will be initialized again without the extra entropy from the seed file. It will only have the basic nonce and personalisation string plus a small amount of entropy from the pools that may have been collected in the interim. How can you tell this has happened? You can't. Update: The algorithm in all products was updated in 2010 to mix seed and user-generated entropy from all threads into the pools.

The Source Code

Actually, we don't do any of this. We just use the rand() function. Here is the source code. We know nobody ever reads this far :-)

#include <stdlib.h>
#include <time.h>

void rng_bytes(unsigned char *buf, int nbytes)
{
   static unsigned int seed;
   int i;

   if (seed == 0)
   {
      seed = (unsigned)time(NULL) ^ (unsigned)clock();
      srand(seed);
   }
   for (i = 0; i < nbytes; i++)
      buf[i] = rand() & 0xFF;
}

References

Contact

For more information or to comment on this page, please send us a message.

This page last updated 4 June 2014