BotGadget: True Random Number Generator for a True Hacker

How can you generate random bits? Some people think it’s not easy, others will tell you that it’s pretty damn hard, and then there are those who wonder if it is possible at all. Of course, it is easy to create a very long pseudorandom sequence in software, but even the best PRNG (Pseudorandom Number Generator) needs a good random seed, as we don’t want to get the same sequence each time we switch on the unit, do we? That’s why we need a TRNG (True Random Number Generator), but that requires special hardware.

Some high-end microprocessors are equipped with an internal hardware TRNG, but it is, unfortunately, not true for most low-cost microcontrollers. There are a couple of tricks hackers use to compensate. They usually start the internal free running counter and fetch its contents when some external event occurs (user presses a button, or so). This works, but not without disadvantages. First, there is the danger of “locking” those two events, as a timer period may be some derivative of input scan routine timing. Second, the free running time (between switching on and the moment the unit requests a random number) is often too short, resulting in the seed being too close to the sequence start, and thus predictable. In some cases even, there is no external input before the unit needs a random seed!

Despite what has already been discussed, microcontrollers do have a source of true randomness inside them. While it might not be good enough for crypto applications, it still generates high enough entropy for amusement games, simulations, art gadgets, etc.

Uninitialized RAM

Hackers often make use of hardware resources beyond their initial means. Here we will use volatile RAM, not as a memory unit, but as a source of entropy.

When power is applied to the MCU, its internal volatile RAM has unknown contents. Each flip-flop will be preset to a 0 or 1 state – a consequence of the imperfection of internal circuits, power supply glitches, surrounding current flow, or thermal (or even quantum) noise. That’s why the content of RAM is different each time it is powered on.

09.Cover4.Final A few years ago, Intel came up with a new way of generating true random bits, using a tuned flip-flop which was forced to a metastable state, and then released to a stable 0 or 1 state at 3 MHz rate. They announced that it will be embedded in a new generation of processors.

We don’t have tuned flip-flops in the MCU, but we have a lot of flip-flops and we can expect some of them to act as tuned. How many flip-flops do we really need to create a good random number using volatile RAM content? It surely depends on MCU RAM performance, so we should go through with some experiments to see how many bits are unpredictable after powering on the MCU. This instability occurs only in flip-flops which are highly symmetric, with well balanced transistors. We need those unpredictable bits to harvest entropy, so we should use as much data as we have – the whole data memory, before it is initialized to some known values. We don’t have control lines which could force RAM flip-flops to a metastable state, so we can use it only once – after powering on, and before initializing RAM.

Experiments

Let’s see the contents of a typical uninitialized MCU RAM. The following picture represents an initial state of one part (20480 bits) of data memory in PIC18F2525 MCU. There are 256 bits (32 bytes) in each row. The ones are red, and the zeros are yellow.

mcu_ram_1

Obviously, there are less 1’s than 0’s in this MCU RAM, but it doesn’t really matter. There are two other matters of interest here: first, will this pattern be different if we use other MCUs (even from the same production line), and how many bits will be changed after the next power on, with the same MCU?

You probably guessed the answer to the first question – yes, each MCU has a completely different initial RAM pattern. The second question is more important, and the answer is that not many bits are different (unpredictable) after every power on. Yet there are enough to generate several random bytes.

Here is an experimental result of switching off and on PIC18F2525 twice. Gray pixels are both times read as 1’s, white are both 0’s, red ones are “0, then 1″ and blue ones are “1, then 0″.

mcu_ram_2

After some experimenting, you could see that generally the same bits in one MCU are unpredictable after switching on. They came from “good” flip-flops and they represent our source of entropy. I tested Microchip’s PIC18 and PIC24E MCUs only (the results are listed below), but any other MCU should act in a similar fashion. Examples shown here are generated with short Vdd rise time (high slew rate), as the on-off switch was between DC supply and the MCU. If you switch the primary side of the power supply which has a low slew rate, you will notice some patterns in memory contents, but somehow the number of unstable bits will actually increase.

Software

So, if you need a few random bytes (for PRNG seed or some magic number), you can use uninitialized RAM contents to generate them. You have to create a special routine, which will XOR (or ADD MOD 256) every byte in some block of uninitialized RAM to generate the first random byte, then the next block for the next byte, and so on. For instance, if your MCU contains 4K of RAM (like PIC18F2525, if you include SFR area), and you need a 32-bit seed, you can split RAM into four 1024-byte blocks and use them to generate four random bytes. Even if small parts of some blocks are already initialized (SFRs or variables for a routine loop), it should not affect the global entropy significantly. I used this technique about a decade ago for LED curtains, which simulated a waterfall in a casino. I built a lot of single chip MCU controllers for that project, and each of them was dimming 256 addressable LEDs. The trick with RAM random seed and 32-bit PRNG created beautiful chaos with a minimum of hardware.

You can also add more scrambling later, for instance you can call your PRNG routine from an existing timing loop (instead of NOP looping), so it will be invoked, supposedly, a random number of times until your program asks for PRNG output. This principle would not create a good random number sequence if used alone, but it can not cause the TRNG performance degradation, as the entropy can only grow.

The major limitation of this approach is that it can’t be used if there is battery backup, or if SLEEP mode is used instead of the ON-OFF switch. However, for some less sensitive applications (like digital arts or amusement games), you still have the long PRNG sequence to cover the period until battery replacement and a new Reset, when the seed will be randomized. You can also let the peripheral counter run in sleep mode and use it later in PRNG seed scrambling, supposing that the extra current in sleep mode does not affect battery life significantly.

We have to keep in mind that RAM contents must be uninitialized and used as-is, immediately after power up. It is also recommended to avoid high capacity Vdd decoupling, as CMOS RAM retention voltage can be pretty low. The worst case scenario is if you switch off the unit for a very short period of time, so that the voltage at the decoupling capacitors drops enough to activate Brown-Out Reset, but not to scramble the RAM contents. Luckily, it is very unlikely that the contents of RAM will ever be exactly the same, as there is always some “housekeeping” part of RAM (or at least MCU registers) which will be affected by program flow. Additional safety can be achieved by a multi-byte freerunning binary counter in RAM, even if the program doesn’t need it and never reads its state.

There are a few design approaches which you can use to enhance the quality of random numbers harvested from the volatile RAM. Never initialize the whole RAM with zeros or any other content, but only the necessary portion. If you don’t use sleep mode (but switch Vdd off instead), then don’t use too high decoupling capacity, or at least use an extra resistor in parallel with MCU supply. It will increase the supply current, but not by too much. Even 5% of total MCU supply current through this resistor will be enough to prevent voltage locking, when Vdd drops so that Idd is close to zero, or even when some external component sources current to the MCU (a beautiful example is described in The Mystery Of Zombie Ram article) . This resistor guarantees that RAM will not remember anything from its previous “reincarnations”, even if MCU was in sleep mode before switching power off.

Searching the Web for similar ideas and experiences, I stumbled upon a discussion with the topic Why don’t we use CPU/RAM-usage for “true” random generation. The author of the idea was heavily criticized, mostly because he suggested using initialized RAM for crypto applications. I also found the patent US5214423, which is based on a similar idea (used to reduce the possibility of repeated bus access collisions). Such patent applications scare every design engineer, as it makes the design process look like walking through a mine field – you never know when you might get sued for your ideas. The good news is that the legal status of this patent is “expired due to failure to pay maintenance fee”, so, hopefully, I have nothing to worry about – at least this time.

Experimental Units

The first experimental results encouraged me to build some units which use uninitialized RAM contents to create permanent random data stream. So I named the described idea (using uninitialized RAM contents for seed creation) “zero concept”, and then built three units to test three new concepts.

Two PIC18F2525s with 8-bit one-way parallel communication, supported by simple handshaking. Slave MCU generates 16 random bytes, each of them based on 248 bytes of uninitialized internal Data RAM. Master MCU (shown in the middle above) receives those bytes and sends them to a desktop PC through the RS232 port. After receiving a 16-byte string, master CPU uses the PNP transistor to power off the slave MCU for 200 ms, which will supposedly create new random contents of RAM (according to my experiments, the “Zombie Ram” effect in 8-bit PIC MCUs takes place if the unit was switched off for less than 85 ms). Then the whole process is repeated 625,000 times, for a total of 10 MB of binary data, needed for randomness tests. It takes about 50 hours (the average rate is 56 bytes/sec).
For the second test the same master MCU is used, but this time the slave is a 16-bit MCU 24EP512GP202. All MCUs are on the same proto board (that’s why they are on the same schematic diagram), and the program in the master MCU decides which slave MCU will be used. The left one is significantly faster (70 MIPS vs. 4 MIPS) and has much more internal Data RAM (48 K instead of 4 K bytes). One string of random data contains 192 bytes, so the 10 Mbyte file was created in 5.5 hours.
And finally, only one MCU is used (24EP256GP710A) with 4 Mbytes of external SRAM (CY62177EV30). I had hardware leftovers from some old project, otherwise I would have used the low cost I2C SRAM and a small MCU. SRAM is powered off-on, and then read in a similar manner. The only advantage is high amount of RAM which resulted in high speed, so it took only 19 minutes for a 10 Mbyte file. It could be much faster, but the RS232 port is too slow to keep up with the fast random data stream creation.

Randomness Tests

Two batteries of tests were used, Diehard and ENT. The first concept (with two 18F2525s) had an excellent result – all 15 Diehard and 6 ENT tests passed! The remaining two concepts passed all Diehard tests, but failed one ENT test, which is the most sensitive indicator of randomness, roughly defined as “rate at which Chi square distribution would be randomly exceeded”. The required rate is between 10% and 90% (ideally 50%), but most TRNGs and PRNGs fall flat on this test, showing less than 0.01% or more than 99.99%. Fourmilab published an example of a well rated commercial TRNG which uses radioactive decay events – the ENT page linked above puts that hardware at 40.98%.

I am not a mathematician, but during the experiments I had a feeling that passing all Diehard tests should be quite easy (I don’t know why design engineers are afraid of it), as well as getting good values on almost all of the ENT tests – but this one is a nightmare! It fluctuates significantly during file growth, and tends to stabilize as the file becomes quite large, but even with 10 Mbyte files it is still not stable enough. This test is so sensitive that it is extremely hard to get values which are not saturated below 0.01% or above 99.99%, so even if it fluctuates somewhere between, it’s good news.

The first test run for my first concept gave a surprisingly good rating of 47.47%! The other two concepts failed (<0.01%), but I still had a secret weapon, called whitening transformation. Every hardware TRNG has some kind of a randomness extractor (mainly implemented in software), to minimize the bias and enhance the uniformity of data distribution. The simplest way to do this is to merge somehow (e.g. XOR) raw random data from the hardware TRNG with a software PRNG. So I added a simple 32-bit linear congruential PRNG routine to the MCU firmware (for concepts 2 and 3 only) – even though it might not seem like the best of ideas, it got the job done, as all results were perfectly in range. Here are ENT test results:

Entropy source	1. (8-bit MCU raw data)	2. (16-bit MCU with PRNG support)	3. (ext RAM with PRNG support)
Entropy (bits/byte)	7.999982	7.999982	7.999982
Compression reduced the size by	0%	0%	0%
Chi square distribution randomly would be exceeded at	47.47%	34.29%	42.24%
Arithmetic mean value (127.5 = random)	127.4936	127.5478	127.4937
Error for Monte Carlo Pi value	0.03	0.05	0.00
Serial correlation coefficient (0.0 = uncorrelated)	0.000206	0.000434	0.000104

Diehard test results require much more presentation room, you can find them at http://ift.tt/1RLF08j, together with binary files which are generated in this experiment and all source files in PIC’s assembly language.

Conclusion

At the end, all tested projects passed all Diehard and ENT tests. Even the commercial TRNGs (including the very expensive models) sometimes fail tests, and those $10 DIY units passed all of them!

It is important to keep in mind that this principle is based on undocumented characteristics of microcontrollers, and that it is impossible to reach the degree of safety required for crypto-grade applications. So, what about our high score? Well, at the very least, we proved that the uninitialized RAM can offer high quality random numbers, and that we can use it for seed generation in a lot of MCU applications. It’s simple, it’s free, it takes no extra room on the PCB and it consumes no extra current.

For those that want a random seed without building their own hardware, check out [Elliot’s] article on the NIST Randomness Beacon.

[Illustration by Bob Zivkovic]

Voja Antonic works as a freelance microcontroller engineer in Belgrade. His first microprocessor projects, based on Z80, date back to 1977, just a few years after the appearance of the first Intel’s 4004. He assembled the firmware manually, by pen and paper. In 1983, he published his original DIY microcomputer project called Galaksija, which was built by around 8000 enthusiasts in the former Yugoslavia. To date he has published more than 50 projects, mostly based on microcontrollers, and released all of them in the public domain.

Filed under: Featured, Microcontrollers, security hacks, slider

BotGadget

Monday, June 29

True Random Number Generator for a True Hacker