

Actual table size decrease


  • Actual table size decrease

    Hey,

    I'm looking at a table here that is kind of big, but it doesn't need to be. I'm wondering if I should suggest changing how it's stored: inserts would gain an added layer of complexity, but the queries should be faster.

    A big, big parameter table with:

    parameter_id int
    parameter_type_id int (points to the type of parameter here, e.g. int, string, date, large string, etc.)
    parameter_value varchar(64000)?

    Data fills this table from XML; the parameter value could be as small as the int '1' or as big as a large string containing log data.
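    For reference, the current catch-all layout described above could be sketched as DDL like this (a sketch only: the column names come from the post, while the key layout and the 333-character index prefix mentioned later in the thread are assumptions):

    ```sql
    -- Sketch of the existing single-table layout (types assumed).
    CREATE TABLE parameter (
        parameter_id      INT NOT NULL AUTO_INCREMENT,
        parameter_type_id INT NOT NULL,        -- e.g. int, string, date, large string
        parameter_value   VARCHAR(64000),      -- every value stored as text
        PRIMARY KEY (parameter_id),
        KEY idx_value (parameter_value(333))   -- prefix index on the value
    );
    ```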

    If a new row gets inserted and it's only an int, will it still take up the space of a VARCHAR(64000)?
    Will doing a search for the int '1' be doing a text conversion (fulltext) etc.?

    *Should* I split this table into a table per parameter_type?
    As in one table for int values, one table for datetime values, one table for small strings, one table for mediumtexts, etc.

    something like

    main table -
    parameter_id int
    parameter_type_id int (points to the type of parameter here, e.g. int, string, date, large string, etc.)

    int table -
    int_parameter_id int
    parameter_id int (same as maintable)
    parameter_value int

    datetime table -
    datetime_parameter_id int
    parameter_id int (same as maintable)
    parameter_value datetime

    large string table -
    largestring_parameter_id int
    parameter_id int (same as maintable)
    parameter_value varchar(64000)

    medium string table -
    mediumstring_parameter_id int
    parameter_id int (same as maintable)
    parameter_value varchar(333)
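    The split-table idea above, as a rough DDL sketch (names taken from the post; the int table stands in for the datetime and string tables, which follow the same pattern):

    ```sql
    -- Main table: one row per parameter; the typed value lives in a per-type table.
    CREATE TABLE maintable (
        parameter_id      INT NOT NULL AUTO_INCREMENT,
        parameter_type_id INT NOT NULL,
        PRIMARY KEY (parameter_id)
    );

    -- One value table per type (int shown; datetime/string tables are analogous).
    CREATE TABLE inttable (
        int_parameter_id INT NOT NULL AUTO_INCREMENT,
        parameter_id     INT NOT NULL,     -- same as maintable.parameter_id
        parameter_value  INT,
        PRIMARY KEY (int_parameter_id),
        KEY idx_parameter (parameter_id)   -- needed for the joins
    );
    ```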

    so doing a query would be

    select it.parameter_value, dt.parameter_value, st.parameter_value, mst.parameter_value
    from maintable mt
    left outer join inttable it on mt.parameter_id = it.parameter_id
    left outer join datetable dt on mt.parameter_id = dt.parameter_id
    left outer join large_strtable st on mt.parameter_id = st.parameter_id
    left outer join medium_strtable mst on mt.parameter_id = mst.parameter_id
    where mt.parameter_id = 3

    if the parameter is an int, then dt.parameter_value, st.parameter_value, and mst.parameter_value will be null

    I know the extra tables will add to the cost of the query, but for the drastic reduction in storage it might be worth it?

    This table is HUUGGGEEEE, disk-size-wise, compared to the number of rows in it; the index file for the table is huge, with indexes on the primary key and the value.

    Inserts have an added layer of complexity too,
    but I'm looking at things like ' http://doc.dev.md/mysql-4.1/optimization.html#data-size '

    Does anyone have a nicer way to optimize a table with different data types for its values? I know what datatype the values are; it just seems a huge waste dumping them all in the same table, and good luck trying to do a query on date ranges as it is!

    Thanks :)

    /Bal

  • #2
    er


    (select it.parameter_value
     from maintable mt
     inner join inttable it on mt.parameter_id = it.parameter_id
     where mt.parameter_id = 3)
    union
    (select dt.parameter_value
     from maintable mt
     inner join datetable dt on mt.parameter_id = dt.parameter_id
     where mt.parameter_id = 3)
    union
    (select lst.parameter_value
     from maintable mt
     inner join large_strtable lst on mt.parameter_id = lst.parameter_id
     where mt.parameter_id = 3)
    union
    (select mst.parameter_value
     from maintable mt
     inner join medium_strtable mst on mt.parameter_id = mst.parameter_id
     where mt.parameter_id = 3)
    limit 1


    might make more sense as a query; I'm also wondering if LIMIT 1 would stop the query from completing the remaining unions once it found a value in the first table

    I know this is complex and a bit awkward, but does it make any sense? Should I just avoid doing this and stick to 1 table? Is there any nicer way to organise the table?



    • #3
      Hmm

      InnoDB might not have any issue with storage here?
      What about indexes though?

      MyISAM pads everything though?



      • #4
        It sounds a bit odd that you have anything from a one-character string '1' to 64000 characters of XML in the same column. Could it be that these parameters are mostly all present, so that you should instead have four different columns in this table (one int, one date, etc.) that contain the data? That way you can also have different indexes on them: for example, on the large VARCHAR() you might want a fulltext index, while on the other columns you want normal indexes.
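        The "one column per type" alternative suggested here might look like this (a sketch only; all column names are illustrative, not from the thread):

        ```sql
        -- One nullable column per datatype; exactly one is populated per row.
        CREATE TABLE parameter (
            parameter_id      INT NOT NULL AUTO_INCREMENT,
            parameter_type_id INT NOT NULL,
            int_value         INT NULL,
            date_value        DATETIME NULL,
            string_value      VARCHAR(333) NULL,
            text_value        MEDIUMTEXT NULL,   -- candidate for a FULLTEXT index
            PRIMARY KEY (parameter_id)
        );
        ```

        Each column can then carry an index suited to its type: a normal B-tree index on the int/date columns, FULLTEXT on the large text column.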


        Doing your 4-table join strategy is something I would definitely advise against (based on your description, it sounds like you are just going to have a lot of work and no direct benefit).

        Quote:


        Data fills this table from XML; the parameter value could be as small as the int '1' or as big as a large string containing log data.


        Technically it wouldn't be an "int", since it would be the string '1'; so as long as you still query with "WHERE yourVarCharColumn = '1'", there is no implicit conversion.

        But your main concern seems to be that you think this table will be excessively large, and it sounds like you think a VARCHAR(64000) will be very large even when it only contains a small string.
        That is not so: the size of a VARCHAR(64000) value is stringLength + 2 bytes.
        So the string '1' will only occupy _3_ bytes of storage.

        As for indexes (if you have used a normal index), you can use the yourVarCharColumn(x) syntax in your index creation, where you say that only the first x bytes of the string should be part of the index.
        This is because you want to keep the index small: usually the first 20-30 bytes are enough to locate the matching records, and the rest of the data is just payload, which is better fetched from the table. This way the index itself can be kept small and fast.
        But this is not true for fulltext indexes, which are a totally different beast.
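        The prefix-index syntax described above could look like this (a sketch; the table and index names are illustrative):

        ```sql
        -- Index only the first 20 characters of the value column; the index stays
        -- small, and the full value is fetched from the table when needed.
        CREATE INDEX idx_value_prefix ON parameter (parameter_value(20));
        ```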



        • #5
          Hey Sterin,

          Thanks for the reply :)

          sterin wrote on Wed, 08 September 2010 14:59

          It sounds a bit odd that you have anything from a one-character string '1' to 64000 characters of XML in the same column. Could it be that these parameters are mostly all present, so that you should instead have four different columns in this table (one int, one date, etc.) that contain the data? That way you can also have different indexes on them: for example, on the large VARCHAR() you might want a fulltext index, while on the other columns you want normal indexes.



          This is a device/parameter_value table, and a device could have 0-5000 different parameters, so one device might have 2000 parameters whose values I have to store, while another device might have 3000 parameters.

          If a completely new parameter is discovered, I have to deal with it and store the parameter name and value.

          A device might tell me it has a parameter
          with value "mylogin"
          then an instant later tell me it's got a parameter
          with value "this is a really long log...."

          I've no control over the parameters on the devices. I think 64000 was an arbitrary value when this table was created, or just the largest it could handle at the time without moving to blobs/text.

          sterin wrote on Wed, 08 September 2010 14:59


          Doing your 4-table join strategy is something I would definitely advise against (based on your description, it sounds like you are just going to have a lot of work and no direct benefit).



          True, the join would create problems on the app side of things, but the union might be OK? It'd return one value and the data type would be known. It should be extremely quick, since it's all on a small indexed field, but it's looking like I shouldn't need to split it anyhow.

          sterin wrote on Wed, 08 September 2010 14:59


          Quote:


          Data fills this table from XML; the parameter value could be as small as the int '1' or as big as a large string containing log data.


          Technically it wouldn't be an "int", since it would be the string '1'; so as long as you still query with "WHERE yourVarCharColumn = '1'", there is no implicit conversion.



          Ah, true again, but dates would be an issue to search on, as would querying values below a certain threshold. Basically all searches are restricted to strings, or have to do a conversion/function call on the data?

          e.g. search for all devices with parameter name 'uptime' and value > 20000, or last_restarted_date > (NOW() - INTERVAL 1 MONTH)
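          Such searches are possible on a single VARCHAR column, but only via per-row conversions. A hedged sketch (CAST and STR_TO_DATE are standard MySQL; the parameter_type_id values and the stored date format are assumptions):

          ```sql
          -- Numeric range search: cast the text value on the fly.
          -- The cast defeats the value index, so this scans rows of the matching type.
          SELECT parameter_id
          FROM parameter
          WHERE parameter_type_id = 1   -- assumed id for the int type
            AND CAST(parameter_value AS UNSIGNED) > 20000;

          -- Date range search: parse the text into a DATETIME first.
          SELECT parameter_id
          FROM parameter
          WHERE parameter_type_id = 2   -- assumed id for the datetime type
            AND STR_TO_DATE(parameter_value, '%Y-%m-%d %H:%i:%s')
                > NOW() - INTERVAL 1 MONTH;
          ```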

          sterin wrote on Wed, 08 September 2010 14:59


          But your main concern seems to be that you think this table will be excessively large, and it sounds like you think a VARCHAR(64000) will be very large even when it only contains a small string.
          That is not so: the size of a VARCHAR(64000) value is stringLength + 2 bytes.
          So the string '1' will only occupy _3_ bytes of storage.


          This is only InnoDB? Currently all the tables in this lovely mess are MyISAM; doesn't this work a bit differently?

          Ahh, never mind, I found:

          The character data types

          CHAR
          • Storage: fixed-length string with space padding on the right.
          • Example: a CHAR(5) column containing the value 'A' looks like:
            hexadecimal 41 20 20 20 20 – (length = 5, value = 'A    ')

          VARCHAR
          • Storage: variable-length string with a preceding length.
          • Example: a VARCHAR(7) column containing 'A' looks like:
            hexadecimal 01 41 – (length = 1, value = 'A')

          (Chapter 9: MyISAM Record Structure)

          from http://faemalia.net/mysqlUtils/mysql-internals.pdf

          As it is, the index was on the first 333 characters of the parameter value, but it could easily drop to 20.

          The sizing issue with huge varchars is all in my head, I think; I'd just been looking at hints and tips and I might have been assuming too much, like 'don't use large values for PRIMARY indexes'. I can manage the size of the indexes as you said above.

          It's also in my head that there has to be a nicer way to organise the data; it's too generic as it is. But having varchar(64000) shouldn't be an issue of any sort (apart from not being able to do datatype-specific searches on the data). The size was my concern for now: I was thinking we were just wasting space, and that it'd have an effect on searches and insert speeds as well as disk space.


          /Bal

