BotGadget: What is Entropy and How Do I Get More of It?

Let’s start off with one of my favorite quotes from John von Neumann: “Any one who considers arithmetical methods of producing random digits is, of course, in a state of sin. For, as has been pointed out several times, there is no such thing as a random number — there are only methods to produce random numbers, and a strict arithmetic procedure of course is not such a method.”

What von Neumann is getting at is that the “pseudo” in pseudorandom number generator (PRNG) is really a synonym for “not at all”. Granted, if you come in the middle of a good PRNG sequence, guessing the next number is nearly impossible. But if you know, or can guess, the seed that started the PRNG off, you know all past and future values nearly instantly; it’s a purely deterministic mathematical function. This shouldn’t be taken as a rant against PRNGs, but merely as a reminder that when you use one, the un-guessability of the numbers that it spits out is only as un-guessable as the seed. And while “un-guessability” isn’t a well-defined mathematical concept, or even a real word, entropy is.

That’s why entropy matters to you. Almost anything that your computer wants to keep secret will require the generation of a secret random number at some point, and any series of “random” numbers that a computer generates will have only as much entropy, and thus un-guessability, as the seed used. So how does a computer, a deterministic machine, harvest entropy for that seed in the first place? And how can you make sure you’ve got enough? And did you know that your Raspberry Pi can be turned into a heavy-duty source of entropy? Read on!

Everything You Needed to Know About Entropy…

But first, to hammer von Neumann’s point home, here’s an algorithm for a PRNG that will pass any statistical test you throw at it: start counting up integers from one, hash that counter value, increase the counter by one, and append the new counter value to the previous hash and hash it again. The result will be a hash chain, a “random” looking series of numbers that is entirely predictable — that’s the “pseudo” in PRNG. If I knew you were doing this, and that you’d only used up 100 or so numbers so far, it wouldn’t take me long to do the same thing and figure out where you were in the series. I’d have a much harder time cracking your series if I only knew that you were using the same algorithm but started somewhere between 1 and 1,000,000 instead of at 1.

Claude Shannon‘s concept of entropy is essentially this: count up the minimum number of yes/no questions it would take to figure out whatever your secret is. That number is the same as the base-2 logarithm of the number of possible secrets that could have been drawn, or equivalently the number of digits it would take to write out the number of possible secrets in binary.

Let’s say you had started your counter in the hash chain at 1, and had used no more than 128 values so far. There are only 128 possible states that your PRNG machine can be in. Expressing 128 in binary requires seven bits, $\log_2 128 = 7$ , and if I started asking you if the number was greater than 64, and subdividing the remaining interval each time, it would only take me seven guesses.

Entropy is a measure of the number of possible choices from which our secret value could have been drawn, and it’s a way to measure hardness-to-guess, strength of passwords, and it’s what people mean when they say that some procedure generates “kinda random” versus “very random” numbers. A random number between 1 and 1,000,000 has just under 20 bits of entropy. (The English language has around 1,000,000 words — you should never lose at “twenty questions”, or use a single word as a password.) A good long-term password should probably have in excess of 128 bits of entropy.

Quick quiz: If you generate a 256-bit random number from a good cryptographic PRNG, it will have 256 bits of entropy, right? Wrong! If you didn’t pick a seed completely randomly from at least 2²⁵⁶ possibilities, your number will have fewer than 256 bits of entropy. So let’s go harvest some entropy.

The Deep End of the Entropy Pool

Turn on your computer, or fire up a microcontroller. How much uncertainty is there in the system? It’s running exactly the same BIOS as last time you booted up, and probably all of the same OS and higher code. A computer is a deterministic machine running predictable code. Maybe the wall-clock time is different, but that’s pretty easily guessable. This means that the most reliable source of randomness exists just where any sysadmin would tell you it does — between the chair and the keyboard.

Any Linux system, on a single-board computer or on your high-powered workstation, pulls entropy from the last few digits in the timestamp of interrupts fired off by keyboard, mouse, disk drive, or other hardware events. Assuming you can’t press keys repeatably with microsecond accuracy, this is probably pretty random. These different sources of new data get hashed together with old data, and the amount of entropy in the pot increases slightly with each stirring.

Estimating how much additional entropy is added to the pot each time is difficult, bordering on impossible, but the Linux kernel has been shown to make a conservative underestimate. (This used to be the way it was done — anyone know if this is still in place?) Functionally, however, each keystroke or mouse update adds about one or two bits of entropy to the pool. These add up over time, and the system also keeps track of how many bytes you pull out. The result is like a gas gauge for entropy.

What’s My Entropy?

Anyway, it’s easy to see how much entropy you have available, and you can learn a lot by watching it go. Type cat /proc/sys/kernel/random/entropy_avail to see how many bits of entropy your computer has stored up right now. If all’s well, it should be somewhere near 3000 but below the maximum value of 4096 bits — anything greater than 256 is probably fine unless you’re going to be generating long SSH keys right now.

Now let’s watch entropy accumulation in real time. Type:

watch -n 1 cat /proc/sys/kernel/random/entropy_avail

to see your entropy estimate every second. Don’t do anything at first and you’ll see the value increasing slowly, if at all. That’s probably disk access or other system interrupts. Now type or move your mouse and watch the numbers jump. For instance, I had 2886 bits at the start of this sentence, but now I’ve got 3078, just from typing.

Now let’s see if we can use of some of that entropy. The first, and easiest way to burn up your entropy pool is just to dump whatever’s in /dev/random, the kernel’s random number generator. Type:

cat /dev/random > random_bits.bin or hexdump /dev/random

in another window and watch your entropy available drop to zero as random values are output to the terminal. Press ctrl-C to stop the madness, and watch how moving your mouse or typing on the keyboard will rebuild up the entropy. That’s it.

You should probably never do this in practice. Indeed, the whole point of gathering up entropy was to seed a good PRNG. Once that is done, there’s little point in pulling more entropy out of the pool unless you have reason to believe that your system is compromised. This brings us to /dev/urandom.

/dev/urandom

Now open up a browser to http://ift.tt/2z7hf9D and watch the entropy available. The entropy probably won’t fall, even though I asserted that the SSL connection would require a secret random number to be generated. That’s because any modern browser uses /dev/urandom instead of /dev/random for its random numbers. urandom is a PRNG that’s periodically re-seeded from the system’s entropy pools when they contain enough estimated entropy.

Since the output of /dev/urandom is a PRNG, it’s only as good as the seed, but since the seed is drawn from the system entropy pool when it’s got more than 256 bits of entropy (as of kernel 4.8) you could hardly ask for more. And since this PRNG is re-seeded periodically when the entropy pool has enough in it, it’s very hard to exploit even if your computer is taken over by baddies.

/dev/random versus /dev/urandom is a bit of a holy war in the history of the Linux kernel, where the main difference to the end-user is that /dev/random will wait to deliver numbers until it has enough entropy, while /dev/urandom is a free-running PRNG that’s already been seeded from /dev/random, ideally with enough entropy. The one time that the difference between the two is really crucial is at boot-up, before there is enough entropy in the system to properly seed the PRNG in /dev/urandom. The rest of the time, just use /dev/urandom.

Raspberry Pi and the HW RNG

But consider the fate of a standalone, headless server (or a microcontroller for that matter) with no human typing or mousing around, and no spinning iron drive providing mechanical irregularity. Where does it get entropy after it starts up? What if an attacker, or bad luck, forces periodic reboots? This is a real problem.

The easiest answer is a built-in hardware random number generator. Once the purview of the paranoid, on-chipset hardware to generate random bits is now relatively common. Recent Intel chipsets support the RDRAND instruction natively, and even the Raspberry Pis have a hardware RNG onboard. Let’s test it out!

As above, you can check on your RPi’s available entropy with cat /proc/sys/kernel/random/entropy_avail, but let’s do something more dramatic. Try to get 1024 bytes of randomness from /dev/random, which only buffers up 4096 bits, or 512 bytes. Type sudo dd if=/dev/random of=/dev/null bs=1024 count=1 iflag=fullblock and wait a while. Then press Ctrl-C because you’re going to be waiting for a long time, even if you’re wiggling the mouse. Human entropy is good entropy, but it takes a long time to get 512 bytes’ worth. This is the situation on every reboot. You shouldn’t be generating SSH keys for a while.

Now let’s get the hardware RNG contributing to system entropy: sudo apt-get install rng-tools. You can tweak configurations in the /etc/default/rng-tools file. In particular, for a RPi2, you might need to uncomment the HRNGDEVICE=/dev/hwrng line. Otherwise, you should be good to go. Retry that command to dump 1024 bytes of randomness: I get 135 kB/s throughput on my RPi3 — it’s done in eight milliseconds.

Even with the hardware random number generator in place, the system will mix hardware and human entropy, which is great news if you don’t fully trust the silicon black box. In the config file mentioned above, you can tweak the relative percentages of entropy from different sources. Cool. And remember, the main point of something like this in a headless system is just making sure that it has sufficient entropy to seed the PRNG in /dev/urandom before programs start asking it for random numbers.

You can test the quality of the random numbers that come out of the hardware RND easily enough: rngtest -c 1000 < /dev/hwrng and waiting 45 seconds or so will run the FIPS 140-2 test suite on 2,000,000 bits of random data. (You will need to be root to access /dev/hwrng directly like this.)

 
root@raspberrypi:/home/pi# rngtest -c 1000 < /dev/hwrng 
rngtest 2-unofficial-mt.14 
Copyright (c) 2004 by Henrique de Moraes Holschuh 
This is free software; see the source for copying conditions. 
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
rngtest: starting FIPS tests... 
rngtest: bits received from input: 20000032 
rngtest: FIPS 140-2 successes: 1000 
rngtest: FIPS 140-2 failures: 0 
rngtest: FIPS 140-2(2001-10-10) Monobit: 0 
rngtest: FIPS 140-2(2001-10-10) Poker: 0 
rngtest: FIPS 140-2(2001-10-10) Runs: 0 
rngtest: FIPS 140-2(2001-10-10) Long run: 0 
rngtest: FIPS 140-2(2001-10-10) Continuous run: 0 
rngtest: input channel speed: (min=476.082; avg=899.493; max=907.881)Kibits/s 
rngtest: FIPS tests speed: (min=7.080; avg=14.217; max=14.352)Mibits/s 
rngtest: Program run time: 23057351 microseconds

It surely looks like statistically random data. Of course, so does a counter fed into a hash chain. But more to the point, it verifies that the hardware RNG is up and running, is not obviously broken, and is spitting out over 1 kB of random numbers every second. That should get the Pi up over the initial 256-bit hurdle at startup within a millisecond. What to do with all of the leftover entropy? If you’ve got a Raspberry Pi on your home network, you’ve got an entropy server for any other computers that might be running out. Or post the data to an MQTT topic on your home network, and you’ve got your own freaky numbers station. Need a one-time pad? dd if=/dev/hwrng of=hwrng-test-data.bin bs=1024 count=1024 will dump 1 MB of sweet randomy goodness to a file.

Takeaways

Pseudorandom number generators are deterministic functions, and they can only spit out random numbers with as much entropy as their seed. Getting 256 bits of entropy can be difficult, especially at boot time or on a headless computer with no human interaction. Almost all of the time, however, the Linux kernel has your back. It’s watching what you do, and adding your human randomness into the mix, and that’s probably the best source of randomness there is. This is what should be (and is) seeding PRNGs on your system.

For devices like the Raspberry Pi, there is also a hardware random number generator that can be used to accumulate entropy to partially make up for lack of human interaction. If you’re doing anything security-relevant on your Pi, install rng-tools and get it running!

Filed under: Hackaday Columns, linux hacks, Raspberry Pi

BotGadget

Thursday, November 2

What is Entropy and How Do I Get More of It?