Until recently I thought that the currently popular scripting languages, which mostly evolved over the last 10 years or so, must allow for easier portability across different platforms compared to ye good olde C/C++.
After all, their development started a few decades after C, so its notorious caveats are all well-known and should be easy to avoid when designing a new language, right?
However, PHP just brought me a new definition of “portable” – and that was when working with… integers.
PHP is not able to handle unsigned integers, and converts values over 2^31 to signed. So if your IDs go slightly over 2 billion, and PHP decides to treat them as integers, you’re in trouble.
Oh wait, no – that’s on 32-bit platforms only! PHP int size is platform-dependent, and it seems to be 8 bytes on our 64-bit boxes. Yes, the very same ones where C/C++ int is 4 bytes, you know.
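To see which side of that divide a given box is on, something like the following sketch can help. It assumes a PHP build recent enough to define the PHP_INT_SIZE and PHP_INT_MAX constants (older releases lack them); the cast-based probe at the end does not need them. The variable names are made up for illustration.

```php
<?php
// Report the native integer width and show what happens just past 2^31 - 1.
// Assumption: PHP_INT_SIZE and PHP_INT_MAX are defined on this build.
$intBits  = PHP_INT_SIZE * 8;
$justOver = 2147483647 + 1;   // 2^31

printf("native int: %d bits, PHP_INT_MAX = %d\n", $intBits, PHP_INT_MAX);
var_dump($justOver);          // int on 64-bit builds, float on 32-bit builds

// Cast-based probe that does not rely on the constants: 2^32 survives
// an (int) cast only when int is wider than 32 bits.
$wideInt = ((int)4294967296) != 0;
var_dump($wideInt);
```

The last probe is the same trick the library code further down uses to pick its fast path.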
That was the easy part. It was mostly documented.
Now, there’s a function called unpack() which essentially allows you to convert different types of data from binary strings to PHP variables. What if you try to unpack an unsigned 32-bit big-endian integer (format code “N”)? Let’s check the doc:
Having read the doc, I personally blatantly relied upon it and expected that large unsigned 32-bit numbers would be converted to float, or string, or something, but handled properly. However, a couple of weeks or so ago the following notice suddenly appeared:
How sweet. No, it just could not behave as documented and convert a 32-bit unsigned value to float on x32 or keep it an integer on x64 – you now suddenly have to care about value size yourself. Ah, and by the way, there’s no way to find out the int size that works on every PHP version out there (the PHP_INT_SIZE constant is missing from older releases).
To make things even better, 5.2.1 introduced a nice bug in unpack(), which f..ed unpacking less-than-16-bit values on x64. (I assume you understand that “f..ed” means “fixed”). It took some time and several tries to convince the PHP team that x64 has enough bits to hold a 16-bit unpacked value, but thankfully it’s now acknowledged and assigned.
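Given breakage like this, a cheap startup self-check might be worth having. The sketch below is not a reproduction of the actual bug report; it merely round-trips a few 16-bit values through pack() and unpack() with the "n" format (unsigned 16-bit big-endian) and reports whether they survive. The function name unpack16_is_sane is hypothetical.

```php
<?php
// Hypothetical self-check: do 16-bit values survive a pack()/unpack() round trip?
function unpack16_is_sane()
{
    $samples = array(0, 1, 255, 256, 32767, 32768, 65535);
    foreach ($samples as $expected) {
        // unpack() returns an array indexed from 1, hence the empty first slot
        list(, $got) = unpack("n", pack("n", $expected));
        if ($got !== $expected) {
            return false;
        }
    }
    return true;
}

var_dump(unpack16_is_sane());
```

A library could run such a check once and fall back to manual byte arithmetic with ord() if it fails.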
To summarize, if you need to unpack an unsigned 32bit int from binary stream, you have to:
Most people could probably learn all that, and then use sprintf("%u",$id), work with string IDs everywhere, avoid 5.2.1 and be happy.
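A minimal sketch of that string-ID approach, assuming the value arrives as 4 big-endian bytes and that PHP_INT_SIZE is defined (it is not on older releases). The helper name unpack_u32_be is made up; on 64-bit builds it masks the value instead of trusting the sign, and on 32-bit builds it relies on sprintf("%u").

```php
<?php
// Hypothetical helper: read a big-endian unsigned 32-bit value from a
// 4-byte binary string and return it as a decimal string.
function unpack_u32_be($bytes)
{
    list(, $v) = unpack("N", $bytes);

    if (PHP_INT_SIZE >= 8) {
        // int is wide enough; the mask strips any stray sign extension
        return (string)($v & 0xFFFFFFFF);
    }

    // 32-bit int: values of 2^31 and above come back negative,
    // and "%u" prints them as their unsigned equivalent
    return sprintf("%u", $v);
}

var_dump(unpack_u32_be("\xff\xff\xff\xff")); // string(10) "4294967295"
```

Returning a string on both platforms keeps the calling code identical everywhere, at the cost of having to do any arithmetic on the IDs some other way.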
Unfortunately, my final goal was to have support for 64-bit document IDs…
Let’s do a bit of time travel. Integer types in C/C++ have always been a pain, but back in 1999 the ISO committee ratified the ISO/IEC 9899:1999 standard, also known as ISO C99, which guarantees that the “long long int” integer type must be at least 64 bits in size. By now, most compilers support that part perfectly.
However, the designers of the PHP 5 type system (PHP 5 was released in 2004) were either not aware of this change, or decided not to rely on a standard which had been out for “only” 5 years by then, or just thought that 31 (no typo) bits and 640K should be enough for everybody.
Long story short, it’s 2007 now but there’s no native 64-bit integer type in PHP. Let me remind you that the built-in “int” might be 64-bit, but then again it might not be, and there’s no way to tell that is guaranteed to work on every PHP version.
This time, there’s a number of routes one could take – either use ints (and pray that the app is never run on x32, and that the “platform dependent” size does not change to 4 in the next version); or use the GMP or bcmath extensions if they are available.
Fine, so 99.999% of the world would hit that, compile in bcmath, and be happy again.
Unfortunately, I needed to develop a library which could be deployed in any environment – and still work, and produce reasonable results. The worst case is x32, and neither GMP nor bcmath available.
And this is how the following code was born.
```php
/// portably build 64bit id from 32bit hi and lo parts
function _Make64 ( $hi, $lo )
{
    // on x64, we can just use int
    if ( ((int)4294967296)!=0 )
        return (((int)$hi)<<32) + ((int)$lo);

    // workaround signed/unsigned braindamage on x32
    $hi = sprintf ( "%u", $hi );
    $lo = sprintf ( "%u", $lo );

    // use GMP or bcmath if possible
    if ( function_exists("gmp_mul") )
        return gmp_strval ( gmp_add ( gmp_mul ( $hi, "4294967296" ), $lo ) );

    if ( function_exists("bcmul") )
        return bcadd ( bcmul ( $hi, "4294967296" ), $lo );

    // compute everything manually, in chunks of 5 decimal digits
    $a = substr ( $hi, 0, -5 );
    $b = substr ( $hi, -5 );
    $ac = $a*42949; // hope that float precision is enough
    $bd = $b*67296;
    $adbc = $a*67296+$b*42949;
    $r4 = substr ( $bd, -5 ) + substr ( $lo, -5 );
    $r3 = substr ( $bd, 0, -5 ) + substr ( $adbc, -5 ) + substr ( $lo, 0, -5 );
    $r2 = substr ( $adbc, 0, -5 ) + substr ( $ac, -5 );
    $r1 = substr ( $ac, 0, -5 );
    // propagate carries; >= so that a chunk of exactly 100000 carries too
    while ( $r4>=100000 ) { $r4-=100000; $r3++; }
    while ( $r3>=100000 ) { $r3-=100000; $r2++; }
    while ( $r2>=100000 ) { $r2-=100000; $r1++; }

    // glue the chunks together and strip leading zeroes
    $r = sprintf ( "%d%05d%05d%05d", $r1, $r2, $r3, $r4 );
    $l = strlen($r);
    $i = 0;
    while ( $r[$i]=="0" && $i<$l-1 )
        $i++;
    return substr ( $r, $i );
}

list(,$a) = unpack ( "N", "\xff\xff\xff\xff" );
list(,$b) = unpack ( "N", "\xff\xff\xff\xff" );
$q = _Make64($a,$b);
var_dump($q);
```
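The manual branch of _Make64 is plain schoolbook long multiplication in base 10^5. As a worked restatement (the symbols a, b, c, d are only labels for the pieces the code computes, with c and d being the two halves of 4294967296):

```latex
\begin{aligned}
hi &= a \cdot 10^{5} + b, \qquad 2^{32} = 4294967296 = c \cdot 10^{5} + d,
      \quad c = 42949,\ d = 67296 \\
hi \cdot 2^{32} &= ac \cdot 10^{10} + (ad + bc) \cdot 10^{5} + bd \\
ad + bc &\le 42949 \cdot 67296 + 99999 \cdot 42949 = 7185152955 < 10^{10}
\end{aligned}
```

The variables $ac, $adbc and $bd hold exactly these three partial products. Each of them is then cut into 5-digit chunks, the chunks of lo are added at the two lowest positions, and the carry loops push any overflow from $r4 up to $r1. The largest intermediate value no longer fits into a 32-bit int, but it stays far below 2^53, so a float still represents it exactly, which is what the "hope that float precision is enough" comment is betting on.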
For reference, this is what the equivalent C/C++ snippet would look like:
```c
typedef unsigned long long myuint64; // just for brevity
unsigned int a = 0xffffffffULL;
unsigned int b = 0xffffffffULL;
myuint64 c = ((myuint64)a<<32) + (myuint64)b; // C-style casts compile as both C and C++
printf ( "%llu", c );
```
Portability in year 2007.