Joined: Sun Oct 01, 2006 9:26 pm
Title: All in a day's work.
After some serious consideration, it's apparent that a VLI "format" just isn't enough; the data still needs to be worked with and manipulated. To this end, I started looking at the options available for arbitrary precision math libs. My first choice was GMP
, since it's fairly ubiquitous, and supports just about every operation you could ever possibly want.
So, the first bit of research I did while looking into GMP was to find its "native" serialized data output format. What I found was mpz_out_raw()
, which does exactly that: outputs a "raw" serialized byte stream that represents the number. I thought that was pretty awesome until I read:
The size is 4 bytes ...
That means the smallest possible number (0) will be represented in 5 bytes; a 4-byte "size" header, and 1 byte of data. That's just unacceptable for SRDP. The original thought of using VLIs in the first place was to have a super-compact byte stream for all of the serialized data. If all addresses and registers, etc. require a 4-byte size indicator, that just destroys the whole advantage of using any arbitrary precision numbering system.
GMP is too big for my purposes, anyway. SRDP doesn't need support for rational numbers, floating point numbers, natural numbers, or the random number functions that GMP provides. So I started looking for other options, and I came across LibTomMath
At first glance, LibTomMath looks seriously bad ass. It's lightweight, "decently optimized", and he includes a whole book (in PDF format) in the source code archive that details the library, its uses, and rationale. The library is also in the public domain, so that's a huge win.
The research stage begins again: 1) serializing is done with mp_to_unsigned_bin()
. 2) the byte stream size is gathered with mp_unsigned_bin_size()
. It's an interesting approach; the byte stream size and data are separated.
So a size header is still required. But 4 bytes is serious overkill. How about just a single byte? The properties of a single-byte size header go something like this:
- Approximately 1 in every 256 bytes will be serialized to a single byte. 0 = 0x00.
- A number up to 255-bytes in length will be serialized to 256 bytes or less.
The special case for the number 0 is represented as a single byte: a size header of "0" indicating zero data bytes. Where other numbers like 1 would have the size header, and a single data byte = two bytes total.
But, let's stop there, for a moment. What do I mean by "a number up to n-bytes in length?" I'm talking about a number serialized in base-256. That means, each byte is a single digit, and each digit has 256 states. (Think base-10: each digit has 10 possible states.) If we have a 2-byte number, the highest possible value it can store is 256^2 - 1 = 65,535. And a 255-byte number can store a maximum value of 256^255 - 1 =Broken up for readability:
Perhaps you can appreciate the scope of a single-byte size header. Now just imagine the number that maxes out GMP's 4-byte size header. I will leave that as an exercise for you, dear reader, since bc
refuses to calculate 256^4294967295 - 1.
This seems like a solid plan, assuming no one needs numbers any larger than that. A pretty safe assumption, I'll bet.
Since arbitrarily large numbers like this will be so rare, I'm planning an API to convert these VLIs into 64-bit integers. The API will fetch the VLI data, deserialize it, and return a 64-bit (un)signed integer if the number will fit. Or else it will return an error, and the number can still be processed/handled using the LibTomMath functions.
This theory fits elegantly with the "Scalable" part of SRDP. I'm still open to thoughts and opinions...
I have to return some video tapes.
Feed me a stray cat.