October 23, 2014

PHP vs. BIGINT vs. float conversion caveat

Sometimes you need to work with big numbers in PHP (gulp). For example, 32-bit identifiers may not be enough, and you have to use 64-bit BIGINT ids, e.g. if you are encoding additional information like the server ID into the high bits of the ID.

I have already written about the mess that 64-bit integers are in PHP. But if the numbers you use do not cover the 64-bit range fully, floats might save the day. The trick is that PHP floats are in fact doubles, i.e. double-precision 64-bit numbers. They store 52 bits of mantissa plus one implicit leading bit, so integer values up to 2^53-1 can be stored exactly. So if you’re using up to 53 bits, you’re OK with floats.
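A quick way to convince yourself of the 53-bit limit is the minimal sketch below. The variable names are mine, and it assumes nothing beyond standard IEEE 754 doubles; the values are built with pow() on a float base so that no integer overflow is involved on 32-bit builds.

```php
<?php
// 2^53 as a double (float base, so this never touches PHP integers).
$limit = pow(2.0, 53);

// Below the limit, neighbouring integers are still distinct doubles.
$a = $limit - 1.0;   // 2^53 - 1
$b = $limit - 2.0;   // 2^53 - 2
var_dump($a - $b === 1.0);           // bool(true)

// At the limit, adding 1 is lost: 2^53 + 1 has no double representation.
var_dump($limit + 1.0 === $limit);   // bool(true)
```

So arithmetic on IDs stays exact as long as every intermediate value fits into 53 bits.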

However, there’s a conversion caveat you should be aware of.

On different systems, float is converted to string differently. (I spent a bit of time fighting with it today.) Consider the following script:

It prints the following (normally expected) output on x64 box running Linux with PHP 5.1.6 or 5.2.2, and also with PHP 4.4.2 on SPARC64 under NetBSD 3.0:

However 32-bit FreeBSD with PHP 4.4.7 produces different results:

The same 32-bit FreeBSD box with PHP 5.2.5 starts to emit all digits but still incorrectly:

And PHP 5.2.2 on Windows produces yet another variant:

As you can see, float to string conversion is not portable across systems and PHP versions. So if you’re handling large numbers stored as float (especially on 32-bit systems, where 32-bit int overflow will implicitly convert to float) and getting strange bugs, remember that string conversion might let you down. sprintf(), on the other hand, is always a friend when it comes to fixing PHP “subtleties” in numeric value handling: it can be used to work around signed vs. unsigned int issues, and it helps with float formatting. Always a saviour.
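To make the difference concrete, here is a small illustrative sketch. It is not the original test script; $id is a hypothetical 50-bit identifier, and the exact text produced by the implicit conversion is deliberately not shown because it depends on your build and settings.

```php
<?php
// Hypothetical 50-bit ID, built from floats so it also works on 32-bit builds.
$id = 65536.0 * 65536.0 * 65536.0 * 4.0;   // 2^50

// Implicit conversion: the result depends on the PHP configuration.
$implicit = "$id";

// Explicit formatting: fixed notation with no fractional digits.
$explicit = sprintf('%.0f', $id);

echo $implicit, "\n";
echo $explicit, "\n";   // 1125899906842624
```

Whenever such a value ends up in a SQL query, a URL, or a log line, the sprintf() form is the one to use.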

UPDATE: as Jakub Vrana pointed out in the comments (thanks!), it’s the “precision” option set from php.ini which affects the conversion. I’ve played with it a bit; the compiled-in and php.ini-dist values seem to vary across architectures and versions, but setting precision=16 (enough to hold 2^53 in decimal without sign) helped in all cases.

However, at the moment the option is neither mentioned in the online documentation nor explained clearly in php.ini. So this adds another fixing route (i.e. you could just set a higher precision), but the caveat is still there.
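As a rough sketch of that second route, the setting can also be changed at runtime with ini_set(). The value 12 below is only an example of a too-low setting, and $id is a hypothetical 50-bit identifier.

```php
<?php
$id = pow(2.0, 50);            // hypothetical 50-bit ID stored as a float

// Too few significant digits: the string form cannot hold all 16 digits.
ini_set('precision', '12');
$short = (string)$id;

// 16 significant digits are enough for any integer up to 2^53.
ini_set('precision', '16');
$full = (string)$id;

echo $short, "\n";
echo $full, "\n";              // 1125899906842624
```

Note that this changes how every float in the script is printed, which is one reason to prefer sprintf() for the specific values that must come out exactly.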

About Andrew Aksyonoff

Andrew is the creator of the Sphinx full-text search engine, which powers Craigslist, Slashdot, Dealnews, and many other large-scale websites.

Comments

  1. Skye says:

    Good post. Saved me some headaches with large ints using sprintf.

  2. Well, that’s just mad.

    Why on earth do people continue to use that infernal language?!

  3. The differences probably have to do with the libraries used on those different platforms (I can’t test since I only run linux here).

    If you know that you’re going to overflow (or even if you just think you might overflow), you should probably be using the BCMath functions

    e.g.,

  4. Huh, code example snipped. Probably because I was using PRE tags. Ah well, it’s trivial and obvious after a look at the docs, and it’s not clear to me how to post code here since I don’t see a list of allowed tags (nor have I looked very deeply for that list since it’s no big deal :-)

  5. I think you may find this is more version than platform dependent. PHP4, which is at end of life in 25 days, handles doubles differently: on my MacBook Pro, PHP 4.4.7 does the same as your x64 box but 5.2.4 works as your FreeBSD box.

    Your windows test could be platform dependent however and I guess it depends on how it is compiled to some degree.

  6. peter says:

    The problem with PHP (or its documentation) is intimate behavior details are not described and assumptions can be wrong. If conversion of floats to strings is not designed to be predictable it would be good to say it is platform and/or version dependent.

  7. Zeroes in s1 are caused most probably by ‘precision’ configuration directive. Set it to high enough value to get consistent results.

  8. For those who don’t understand how floating point numbers work, please read http://php.net/float , especially the BIG RED WARNING box.

  9. peter says:

    Antony,

    The precision is one thing the conversion to string is completely different question.

    Indeed when you have one expression result of 1/3 and another expression result of 1/3 comparing these with “=” will quite likely cause them to be not equivalent.

    However what Andrew is speaking about is completely different thing – conversion to string. The double format on the platforms checked is the same and there is little reason this should differ if same code is applied.

    Furthermore check examples closer – we’re speaking about conversion for small enough integer numbers which can be preserved exactly by the double IEEE type.

  10. k says:

    3. $f = 65536*65536*65536*4; // 2^52

    2^16 * 2^16 * 2^16 * 2^2 = 2^(16*3+2) = 2^(48+2)

    Or am I indeed completely confused?

  11. peter says:

    Right it is 2^50

    Which just means the problem happens at even lower values :)

  12. shodan says:

    k, you are right, the comment slipped there from one of the intermediate test script versions. I’ve edited it; thanks!

  13. Hello Peter.

    >However what Andrew is speaking about is completely different thing – conversion from string.

    First of all, I don’t see any strings there.
    I can see a float converted to a string in 2 different ways: using “$var” syntax and sprintf(“%.0f”).

    Second, in both cases float is expected to lose its precision and the result you get is most likely different from the value it really holds, because that’s the way floats work (surprise!).

    And the third thing: even though PHP tries to handle this in a crossplatform way (because it uses its own float-to-string and string-to-float utilities since 5.2 or so, I don’t remember exactly), the resulting value still depends on your system – compiler, CPU etc and may differ.

    To be honest, the whole issue described in this post looks kinda weird: isn’t it obvious that you can’t use 64bit integer values on 32bit platform without a special library/extension?
    There are at least 3 (three) extensions I know of designed especially for this purpose.
    Not enough for the author?

    Oh, yeah, surely “designers of PHP 5 (released in 2004) type system were either not aware of” long long.
    As well as designers of MySQL (released in 1995) never heard of Unicode (released in 1991), eh?

  14. shodan says:

    Antony,

    > First of all, I don’t see any strings there.

    Peter just made a typo. I explicitly mentioned conversion TO string, in bold.

    > in both cases float is expected to lose its precision

    No, it’s not. It must not lose precision in THIS case. Refer to IEEE 754 section on 64-bit doubles (it’s a bit more trusted source than php.net/float by the way). Doubles have 52 bits allocated for mantissa, which means that 52-bit (actually 53-bit) INTEGER values MUST be stored without ANY precision loss.

    > PHP tries to handle this in a crossplatform way

    Even though it tries, it still fails. That failure is unexpected, and undocumented. For me, this is enough of a reason to make a blog post to warn the readers about such caveat.

    > To be honest, the whole issue described in this post looks kinda weird: isn’t it obvious that you can’t use 64bit integer values on 32bit platform without a special library/extension?

    No, it isn’t (other environments somehow manage to nicely support 64-bit values), but that’s out of scope.

    One should be able to manipulate 52-bit integer values stored as doubles without any precision loss. That actually works, and might be a perfectly valid solution in some scenarios. It’s only the float to string conversion which suddenly, and unexpectedly, fails.

  15. peter says:

    Thanks Andrew I’ve now modified the comment.

  16. peter says:

    Antony,

    Regarding other libraries to deal with big numbers in PHP they have one serious issue – they are rather slow.
    Sometimes it does not matter but sometimes it does.

  17. shodan says:

    Jakub,

    you were right, it’s the default settings of “precision” affecting this in ALL cases. Setting it to precision=16 (enough to hold 2^53 w/o sign) helps everywhere. The default values (both php.ini-dist and compiled-in) seem to vary but none of these reaches 16. I’ll update the post.

    It’s not mentioned at php.net/float and the embedded description in php.ini-dist is, well, not quite clear (“The number of significant digits displayed in floating point numbers”). I believe it’s kept at default 12 at most if not all sites.

  18. Pierre says:

    Try http://pecl.php.net/package/big_int

    It is relatively fast (given that you have a php function call on each op). About php and large integers, it never supported anything else than 32bits integer. Whether the integer was stored in the system default integer (64bits on amd64 for example) did not change this problem. It is annoying but once you know it :)

    I have something about this topic on my todo, to expose the openssl math functions. They are not the fastest on earth but they can then be available by default (as almost all setups have openssl installed).

  19. shodan says:

    Pierre,
    thanks for the link, but the post point is a bit different: it’s not about what you can use at this or that setup (there’s plenty of extensions); it’s only about the conversion issue you might run into.

  20. SMAK says:

    Awesome, thanks you soo much for this post.

  22. Michael says:

    Great post!

    If you are using Mysql BIGINT(20) data type, read this twice.

  23. Kevork says:

    Just a note on the “precision” option in php.ini. Setting the value to anything above 14 (the default) will give you odd results when doing normal arithmetic with numbers.

    For example, you may divide two numbers and get this as a result: 52.0000000001

    Even if you execute the division inside the round() function, the rounding will be ignored, and the display of the floating point number will be incorrect.

    The way around this is to use the bcmath module in projects when using large numbers.
