October 2, 2014

Idea: Couple of more string types

MySQL has a lot of string data types – CHAR, VARCHAR, BLOB, TEXT, ENUM and bunch of variants such as VARBINARY but I think it is not enough :)

I would also like to see type HEXCHAR which would be able to store hex strings, such as those returned as MD5() and SHA1() efficiently. With little modification it could work for UUID() as well (it adds some dashes). Currently it is quite inconvenient to deal with strings like that in MySQL. Either you store them as strings and waste space or you spend them as binary and deal with inconvenience of having not readable strings in the table OR adding UNHEX() everywhere – which also adds overhead.

Another one I would like to see is zBLOB or zTEXT (or call them BLOB COMPRESSED/ TEXT COMPRESSED) which would transparently compress the blobs when they are inserted and retrieved from the database – this would allow to avoid having COMPRESS()/UNCOMPRESS() everywhere which clobbers things or compressing/uncompressing on the client.

It would be best if last one is optimized so if BLOB is not used in any WHERE clause (HAVING, GROUP BY etc) you could actually transparently decompress it on the client and compress bad. Though this is likely to require more significant changes in MySQL so I would not expect to happen quickly. The basic support should not be that hard though.

About Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

Comments

  1. MD5 and SHA1 aren’t really hexchar datatypes.

    They’re just encoded as base16 for historical/portability reasons.

    The most efficient storage would be binary and to store 128 and 160 bits respectively.

    What is REALLY needed in MySQL is a mechanism to take binary data and encode it when using the MySQL command line application or any other user level application or console.

    Specifically, base16 is NOT space efficient. Base64+filesafe is the most efficient and portable storage format IMO.

    We store binary data in MySQL and it’s very annoying when my console is screwed up because control characters are printed.

    Kevin

  2. moo says:

    i thought of this before, but instead of making different field types, why not just add intput/output filter to field property? like inputfilter=compress outputfilter=uncompress

    not all functions take 1 param though

  3. peter says:

    Kevin,

    Indeed MD5 and SHA1 are binary strings 128 and 160 bits appropriately. Using hex strings would indeed allow to STORE them as such, You can get binary presentation of them now using unhex(sha1(”)) for example but storing this in the database makes things non printable which adds a lot of inconvenience. For example copy-paste query from the query log quite likely would not work, your console can be trashed by processlist etc.

    If is is not clear by hex string I mean – string containing 0-f digits which is packed two characters per byte for storage.

  4. peter says:

    moo,

    I do not care about syntax to be honest. Just would like to see it and would like it to work fast. If it can be done efficient in general enough way it is always good.

  5. jim says:

    if the server did not mangle so many strings and declare them binary even though they obviously aren’t, it would be easy for the client to properly encode binary fields instead of trying to display them.

    in fact, maybe we should make the client do that so the server team would finally fix all the stupid cases where it flags non-binary strings as binary strings.

  6. peter says:

    Jim,

    BINARY in MySQL means different. This means the string is processed as byte string rather than character string – so for example it is case sensitive and there are also differences end spaces handling.

  7. Peter,

    > BINARY in MySQL means different. This means the string is processed as byte string rather
    > than character string – so for example it is case insensitive and there are also
    > differences end spaces handling.

    Binary strings are indeed case sensitive. I assume, it was just a typo :)

  8. peter says:

    Thanks for good catch. Fixed.

    Indeed it means case sensitive among other things

  9. Bill says:

    Yeah, I was complaining about this on friday. Mysqldump has the –hex-blob option to encode binary as hex. I never understood why the client didn’t.

  10. Jared says:

    ANSI Sql says you should be able to store hex data using

    INSERT INTO foo VALUES(md5) (X’0123456789abcdef’);

    Its just a pain none of the prepared APIs (in PHP) do the conversion, which means have to build the sql by hand. :/

  11. peter says:

    Jared,

    for prepared statements you can use unhex(?) though the main problem is still selecting the data

  12. krteQ says:

    We are using UNHEX() to store + HEX() to retrieve md5 data. Yes, it’s not as comfortable as it could be, but still far better than VARCHAR(32)…

Speak Your Mind

*