With more and more embedded systems being connected, sending state information from one machine to another has become more common. However, sending large packets of data around on the network can be bad both for bandwidth consumption and for power usage. Sure, if you are talking between two PCs connected with a gigabit LAN and powered from the wall, just shoot that 100 Kbyte packet across the network 10 times a second. But if you want to be more efficient, you may find this trick useful.
As a thought experiment, I’m going to posit a system that has a database of state information that has 1,000 items in it. It looks like an array of RECORDs:
typedef struct { short topic; int data; } RECORD;
It doesn’t really matter what the topics and the data are. In fact, it doesn’t matter if your state information looks like this at all; this is just an example. Given that it is state information, though, we are going to make an important assumption: most of the data doesn’t change frequently. What “most” and “frequently” mean could be debated, of course. But the idea is that if I’m sending data every half second or so, not much is going to change between one send and the next.
Compression
The most obvious answer is to use some type of compression. If you have the time to compress the data on one end and decompress it on the other within your timing window, then that’s probably the answer, or at least part of it. The problem, of course, is that compressors take time to run. Another problem is that some data doesn’t compress well; once you add the overhead of the compression scheme, you might not see any reduction at all, and you can even end up sending more data than you started with.
I’m going to propose a very simple run-length encoding scheme for our system. It treats the entire state array as a sequence of bytes and sends a series of packets, each consisting of a count and a payload. If the count is positive, that’s how many bytes are in the payload, and they are simply copied to the output. If the count is negative, the payload is a single byte, and the magnitude of the count is one less than the number of times that byte repeats (that is, -1 means a repeat count of 2). I arbitrarily limit the repeat size to 256. A count of 0 marks the end of the state information.
As compression schemes go, this is pretty poor. My state array has 8,000 bytes in it (1,000 entries). If I use the compression algorithm on a random state array, it fails badly, taking about 10-12% more bytes to send than just sending the whole thing over.
The Fix
By now you are laughing and thinking what good is a compression algorithm that fails by 10% most of the time? But what if there was one simple thing you could do to fix it? In particular, the reason the algorithm fails is that not enough data repeats to make this compression worthwhile. Even if you use a “real” compression algorithm, they will almost always perform better if the data to compress has more repeating sequences. This is one reason why file formats that are already compressed don’t compress well — any repeating sequences were already removed.
So how do we get more repeating data? After all, the state is what it is, right? You can’t arrange for temperature, humidity, and position data to all line up in repeating sequences. If you think of the array as a snapshot fixed in time, that’s probably true. But if you start thinking about how it changes over time, maybe not.
Remember how I said I’m assuming the state array doesn’t change much between transmissions? That’s the key. Here are the steps:
- Send the first state array compressed. It will probably be larger than the rest.
- Before sending the second state array, XOR it with the previous one. Since not much changed, this turns most of the state array into zeros. Repeating zeros.
- Compress and send that XORed array.
- Repeat for the third, fourth, and all subsequent arrays.
If you are used to using programs like rsync, this is the same idea. Don’t send everything, just the parts that changed. On the receive side, you simply reverse the XOR operation using the current state array. Even if you use another compression algorithm, changing your data to be mostly zeros is going to help the compressor.
There are a few caveats, though. First, unless your connection is perfect, you need a way to tell the sender you didn’t get the last state update, so it needs to send a full array again. You could also add some overhead to mark when you send a full frame (and maybe send that uncompressed). You could also make the sender periodically send a full array to aid in synchronization.
A Simulation
To test this idea, I wrote a simple simulation you can find on GitHub. Since the sender and the receiver are really the same program, I didn’t simulate doing an ack/nack type handshake nor did I make provisions for resynchronizing. You’d want to add that in real life.
I wrote the simulation to use a test driver which — in a fit of originality — I named test. However, there are two test drivers, and you can select one by setting TEST_MANUAL to 0 or 1. When set to 0, the code uses two different sets of state data that you can set in the setstate0 and setstate1 functions. This is great for trying out a specific case for debugging. To keep things simple, I only populated the first four elements, plus the 800th element and the last element of the state array. This lets me keep debugging output short while still being able to tell that things work. The first four are just handy. The one at 800 shows that expanding data doesn’t get out of sync and the last one is good for catching corner cases where you overflow the buffer.
However, I also wanted to test on a broad set of conditions, so I wrote the other test driver. It randomly makes changes to the state array over 100 iterations (no more than 20 changes per iteration). You can supply a seed on the command line to get the same run of random values for testing.
In addition to a seed, you can prefix the seed with a caret (^) to suppress the XOR processing. If you don’t want to do the XOR, you won’t have a lot of runs in the data and your percentage will be very high — possibly more than 100%, indicating you sent more data than just sending the bytes in the regular fashion. For example, here’s a run that took almost 11% more bytes to send than normal without using the XOR.
alw@enterprise:~/tmp/state$ ./state ^143
TX/RX=8000000/8863392 (110.79)
Done
With the XOR in place, the same run came in at about 2%, meaning the data sent was 98% smaller than the raw state data.
You can read the code to see how the above ideas are implemented. There are a few key data structures and functions you should know:
- state – The current state to transmit
- target – The current receiver state
- tx_prev – The previous state used by the transmitter
- rx_prev – The previous state used by the receiver (see below)
- receive – The simulated receiver
- xmit – The simulated physical transmitter (that is, all code that wants to transmit to the receiver calls this function)
- transmit – Send current state to the receiver
I debated using the rx_prev buffer. It holds the previous state so you can XOR it with incoming data upon reception. You could instead XOR the data in place and save the buffer. The only problem is that then you couldn’t use efficient calls like memcpy. Granted, even after a memcpy you still have to do the XOR if you are doing the XOR at all, so it is probably as broad as it is long. If you were implementing this in a real system, it would be something to think about.
Both test drivers use memcmp to validate that the receiver’s buffer is the same as the transmitter’s. The random test driver also computes the percentage of data sent, which is the bytes received divided by the total bytes sent, times 100.
Limitations
As I mentioned, I don’t account for resynchronizing in the simulation. I also don’t do any integrity checking like a checksum or CRC. Those could be added in real code. It is easy to imagine a receiver computing a CRC, comparing it, and sending an acknowledge or a negative acknowledge. It would also be easy to imagine the transmitter sending some sort of sync header with a flag to show whether the contents are XOR’d or not. You could even use that mark to indicate the sent data is not compressed at all, which would let you send the initial reference state uncompressed. That should make the algorithm even more efficient in most cases.
Results
The randomize function changes a random number of items on each pass, up to 20 at once. For 1,000 records, that is at most a 2% change. Unsurprisingly, then, the amount of data actually sent is about 2% of the raw state data, as you can see in the above graph. The correlation isn’t perfect because the data is random, but in general the algorithm will do well until so much of the state array changes each time that the overhead kills any benefit.
Still, many projects won’t have state that changes very often, and this is a great way to send only the changes. It works for any sort of data structure. Alternatively, you could track which topics have changed and work out a way to send only the changed topics. Note, however, that in my example the topics are just more data, and they change too, so that would be harder in this particular case. My point is that there are many ways you could approach this problem.
Sometimes it seems like there is nothing XOR can’t do. There are lots of compression tricks out there, including squozing, if your data can survive it.