Announcement

Announcement Module
Collapse
No announcement yet.

performance with varchar vs tinytext

Page Title Module
Move Remove Collapse
X
Conversation Detail Module
Collapse
  • Filter
  • Time
  • Show
Clear All
new posts

  • performance with varchar vs tinytext

    Hey guys, I have tried searching for difference between varchar(255) and tinytext in terms of performance but I haven't found a straight answer to this. Is it because there is no significant difference in performance? I know the difference in terms of spaces being trimmed and case sensitivity.

    I have a table with about 200+ columns. Most of the small text types are tinytext rather than varchar. Is there a good reason to be using one type over the other in terms for performance? Would the order by be faster with varchar than tinytext when the table has a big row length? I read somewhere that varchar is stored right along with the row data on disk while tinytext is stored else where on disk and such(I'm not sure about that).

    Any thoughts on this? Thanks.

  • #2
    The problem is with TMP tables those with TEXT/BLOB type columns will always be disk based, and not fast memory based, thus taking a performance hit.

    In MySQL versions 5.0.3+ you can have CHAR types upto 65535 in length.

    Comment


    • #3
      Thanks for the response. I have been messing around this a bit more. Have a column called "region" and it was tinytext. Also created an index for it region(100). But I couldn't get mysql to use the index for a simple order by query (even with force index):-

      select * from table order by region

      The explain for it always used filesort. Am i missing something here? Is there a way you can force mysql to use the index that I created for tinytext type?

      So then I duplicated that column and made it a varchar(100) instead and created another index region_varchar(100) on that new column. and issued the query again:-

      select * from table order by region_varchar

      Again it did not use the index and resorted to filesort. But it did work when i specified force index. Then I modified the index to use only first 10 characters region_varchar(10). Now again I couldn't get mysql to use the index even with force index.

      The only other way I could force it to use the index was to put in a where clause on that column.

      So I'm missing something in the way mysql uses indexes and am not sure what that is...I'm kind of new to this stuff so if anyone could point out what I'm missing, its much appreciated.

      Thanks.

      Comment


      • #4
        Sounds to me like a cardinality problem.

        Basically meaning there's not enough variance in the index justifying MySQL to use it.

        For example, you have a region column and 99% of the values is "England" then MySQL will chose full table scan.

        By using a INDEX prefix of only 10 CHARS you're further decreasing variables in the index tree.

        Example (using INDEX(6) to make the point easier):

        'United Kingdom' INDEX(6) => 'United',
        'United States' INDEX(6) => 'United',
        'United Arab Emirates' INDEX(6) => 'United'

        No variance there, why use the index?

        Comment


        • #5
          Sorry the name "region" is bit misleading but thats the way it was when I got it. It has a bunch of codes and stuff embedded in it but it has a high cardinality (show index shows something 140K cardinality for first 10 char index out of a total of 150K rows in the table).

          Comment


          • #6
            Hmm,

            This doesn't really make sense to me then, if there is high cardinality then it would be very efficient to resort to an index lookup over that of a full table scan.

            Posting the table schema, a sample result and a sample query would help give a better responce.

            Comment


            • #7
              First of all thanks for looking into this.
              Alright, to keep things simple here's the sample table I'm using to test :-


              CREATE TABLE `foo` ( `id` int(11) unsigned NOT NULL auto_increment, `name` tinytext NOT NULL, `name_varchar` varchar(100) NOT NULL default '', PRIMARY KEY (`id`), UNIQUE KEY `name_index` (`name`(100)), KEY `name_varchar_index` (`name_varchar`(7))) ENGINE=MyISAM DEFAULT CHARSET=latin1 ROW_FORMAT=FIXED;


              name and name_varchar contains the same data which are 7 digit numbers(in the form of strings).


              mysql> select * from foo limit 5;+----+---------+--------------+| id | name | name_varchar |+----+---------+--------------+| 1 | 1550620 | 1550620 || 2 | 1554011 | 1554011 || 3 | 1554864 | 1554864 || 4 | 1571814 | 1571814 || 5 | 1582609 | 1582609 |+----+---------+--------------+5 rows in set (0.00 sec)mysql> select count(*) from foo;+----------+| count(*) |+----------+| 131951 |+----------+1 row in set (0.00 sec)mysql> show index from foo;+-------+------------+--------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |+-------+------------+--------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+| foo | 0 | PRIMARY | 1 | id | A | 131951 | NULL | NULL | | BTREE | || foo | 0 | name_index | 1 | name | A | 131951 | 100 | NULL | | BTREE | || foo | 1 | name_varchar_index | 1 | name_varchar | A | 131951 | 7 | NULL | | BTREE | |+-------+------------+--------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+3 rows in set (0.00 sec)mysql> explain select * from foo order by name;+----+-------------+-------+------+---------------+------+---------+------+--------+----------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |+----+-------------+-------+------+---------------+------+---------+------+--------+----------------+| 1 | SIMPLE | foo | ALL | NULL | NULL | NULL | NULL | 131951 | Using filesort |+----+-------------+-------+------+---------------+------+---------+------+--------+----------------+1 row in set (0.00 sec)mysql> explain select * from foo order by name_varchar;+----+-------------+-------+------+---------------+------+---------+------+--------+----------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |+----+-------------+-------+------+---------------+------+---------+------+--------+----------------+| 1 | SIMPLE | foo | ALL | NULL | NULL | NULL | NULL | 131951 | Using filesort |+----+-------------+-------+------+---------------+------+---------+------+--------+----------------+1 row in set (0.00 sec)mysql> explain select * from foo force index(name_varchar_index) order by name_varchar;+----+-------------+-------+------+---------------+------+---------+------+--------+----------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |+----+-------------+-------+------+---------------+------+---------+------+--------+----------------+| 1 | SIMPLE | foo | ALL | NULL | NULL | NULL | NULL | 131951 | Using filesort |+----+-------------+-------+------+---------------+------+---------+------+--------+----------------+1 row in set (0.00 sec)


              I have no idea why its using filesort.

              Also like I mentioned earlier the only way i got it to use the index is by using a where clause


              mysql> explain select * from foo force index(name_varchar_index) where name_varchar='blah' order by name_varchar;+----+-------------+-------+------+--------------------+--------------------+---------+-------+------+-------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |+----+-------------+-------+------+--------------------+--------------------+---------+-------+------+-------------+| 1 | SIMPLE | foo | ref | name_varchar_index | name_varchar_index | 7 | const | 1 | Using where |+----+-------------+-------+------+--------------------+--------------------+---------+-------+------+-------------+1 row in set (0.00 sec)

              Comment


              • #8
                short answer from what i understand is:

                mysql sorts by creating tmp tables

                tmp tables dont support text/blob columns

                mysql will then use a filesort for those queries

                using a where does not require a tmp table to reorder the data, and therefor would not trigger a filesort.


                That's my understanding. milage may vary

                Comment


                • #9
                  myrddin,

                  Are all values numbers?

                  Are they within the integer range? (0 to 2^32 - 1)

                  Comment


                  • #10
                    Quote:


                    mysql sorts by creating tmp tables

                    tmp tables dont support text/blob columns

                    mysql will then use a filesort for those queries



                    So if any of the columns in the table is of text/blob it will use filesort (regardless of whether the sort by column is of type varchar and has been indexed?). Well I tried creating a table with only varchar type columns and I still got the same result.

                    Quote:


                    Are all values numbers?

                    Are they within the integer range? (0 to 2^32 - 1)


                    Yes. However after you posted that, I altered the test table with:-
                    update foo set name_varchar=concat('a',name_varchar);
                    So its alpha-numeric. However I still got the same results as before. Maybe you were thinking of another issue?

                    Comment


                    • #11
                      In that case is it absolutely neccessary to use a CHAR type over int type?

                      Int type will require 4 bytes for each int.

                      Say you have 123456789, as int type, 4 bytes, as varchar 9 bytes.

                      Also, problems occur on comparisons, e.g.

                      WHERE id=3

                      Will cause problems for the SQL interpreter, you have to be aware of the required use of quotes on VAR/CHAR columns, e.g.

                      WHERE id='3'

                      Comment


                      • #12
                        yea I know there are differences between the numeric/text types but I just used numbers just for that test table. I will have various types varchar/int/text etc in the actual table but for this test case I'm just using this so I can figure out why mysql isn't using the indexes for order by.

                        Also I just did this:-

                        mysql> explain select * from foo order by id ;+----+-------------+-------+------+---------------+------+---------+------+--------+----------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |+----+-------------+-------+------+---------------+------+---------+------+--------+----------------+| 1 | SIMPLE | foo | ALL | NULL | NULL | NULL | NULL | 131951 | Using filesort |+----+-------------+-------+------+---------------+------+---------+------+--------+----------------+1 row in set (0.00 sec)


                        Now id is an auto-increment primary key so should be indexed and its still using filesort? This is suggesting to me that mysql always uses filesort for order by. Can you maybe verify this on some table on your machine? If it is the case that mysql always uses filesort for order by then thats probably what I missed. But that would be quite odd...

                        Comment


                        • #13
                          Update: Ignore the above post about the id column. I created another column of type int and created an index on that and it worked ok.


                          mysql> explain select * from foo force index(name_int_index) order by name_int;+----+-------------+-------+-------+---------------+----------------+---------+------+--------+-------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |+----+-------------+-------+-------+---------------+----------------+---------+------+--------+-------+| 1 | SIMPLE | foo | index | NULL | name_int_index | 5 | NULL | 131951 | |+----+-------------+-------+-------+---------------+----------------+---------+------+--------+-------+1 row in set (0.00 sec)


                          So...it would seem the indexes aren't being used for simple order by when dealing with columns of type VARCHAR and TEXT. I still dont understand why it doesn't use it...since text type cols are quite common.

                          Comment


                          • #14
                            The problem is basically a combination of
                            1. Choosing all columns with a "SELECT * "
                            2. Choosing all rows since you don't have a WHERE clause.

                            Compare these:

                            mysql> explain select * from foo order by id;+----+-------------+-------+------+---------------+------+---------+------+------+----------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |+----+-------------+-------+------+---------------+------+---------+------+------+----------------+| 1 | SIMPLE | foo | ALL | NULL | NULL | NULL | NULL | 2000 | Using filesort |+----+-------------+-------+------+---------------+------+---------+------+------+----------------+1 row in set (0.00 sec)mysql> explain select id from foo order by id;+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+| 1 | SIMPLE | foo | index | NULL | PRIMARY | 4 | NULL | 2000 | Using index |+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+1 row in set (0.00 sec)

                            Notice that the second query with only id column uses index while the first don't.

                            And these:

                            mysql> explain select * from foo order by id;+----+-------------+-------+------+---------------+------+---------+------+------+----------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |+----+-------------+-------+------+---------------+------+---------+------+------+----------------+| 1 | SIMPLE | foo | ALL | NULL | NULL | NULL | NULL | 2000 | Using filesort |+----+-------------+-------+------+---------------+------+---------+------+------+----------------+1 row in set (0.00 sec)mysql> explain select * from foo where id < 100 order by id;+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+| 1 | SIMPLE | foo | range | PRIMARY | PRIMARY | 4 | NULL | 98 | Using where |+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+1 row in set (0.00 sec)

                            Notice how when we limit the amount of rows returned that it starts to use a index even though I still have SELECT *.


                            The reason for this is that using an index is actually expensive in it's own way.

                            If you are scanning an index and all columns that you want to return is part of the index it means that it can return the data right away and you have no extra cost (performance wise).

                            But if you are selecting columns that are _not_ part of the index that is used. It means that the DBMS has to read the index and then jump to the correct row in the table to return the data from the missing columns.
                            So each row read = one jump to the table.
                            And the big problem with this is that random reads against disk devices is very expensive compared to sequential reads (the whole table in one go).

                            So somewhere there is a trade off where there is more expensive to use an index than it is to read in the entire table, sort it in RAM memory and return it.

                            It's hard to really say where this limit is depending on your DB size and hardware. But as a rule of thumb I usually say that when you return more than somewhere around 5-10% of the rows you should start to think about it.

                            So that is why you get strange results with your explain.

                            BTW filesort doesn't necessarily mean that it is written to disk. That depends on the size of your sort_buffer and the size of the result set returned by the query.
                            Only when the sort_buffer is full it will start to write temporary files on disk.

                            Comment


                            • #15
                              Quote:


                              The problem is basically a combination of
                              1. Choosing all columns with a "SELECT * "


                              That is actually the first thing I tried (specifying the reqd cols only) after I started seeing this issue but:-

                              mysql> explain select name_varchar FROM foo force index (name_varchar_index) order by name_varchar;+----+-------------+-------+------+---------------+------+---------+------+--------+----------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |+----+-------------+-------+------+---------------+------+---------+------+--------+----------------+| 1 | SIMPLE | foo | ALL | NULL | NULL | NULL | NULL | 131951 | Using filesort |+----+-------------+-------+------+---------------+------+---------+------+--------+----------------+1 row in set (0.00 sec)

                              It doesn't seem to matter if I specify the columns or use *, I get the same result for explain and its using filesort. However it does seem to work for INT columns which is what "id" is. As before mysql seems to use the index for int cols and not varchar and text.

                              Quote:


                              BTW filesort doesn't necessarily mean that it is written to disk. That depends on the size of your sort_buffer and the size of the result set returned by the query.
                              Only when the sort_buffer is full it will start to write temporary files on disk.


                              Quote:


                              From the ini:-
                              # This buffer is allocated when MySQL needs to rebuild the index in
                              # REPAIR, OPTIMZE, ALTER table statements as well as in LOAD DATA INFILE
                              # into an empty table. It is allocated per thread so be careful with
                              # large settings.
                              sort_buffer_size=64M

                              From the mysql manual:-
                              If you want to increase ORDER BY speed, check whether you can get MySQL to use indexes rather than an extra sorting phase. If this is not possible, you can try the following strategies:

                              Increase the size of the sort_buffer_size variable.

                              Increase the size of the read_rnd_buffer_size variable.

                              Change tmpdir to point to a dedicated filesystem with large amounts of empty space. This option accepts several paths that are used in round-robin fashion. Paths should be separated by colon characters (‘:’) on Unix and semicolon characters (‘;’) on Windows, NetWare, and OS/2. You can use this feature to spread the load across several directories. Note: The paths should be for directories in filesystems that are located on different physical disks, not different partitions on the same disk.



                              From what I understood, that variable is only used if mysql can't use the index that I created and thats whats puzzling me as to why it isn't using the index for varchar and text cols. And if my table is really big, it cant quite fit all the indexes in memory and it will most likely start writing to disk and in any case the index rebuilding will take time until its cached and hence the first query will always be slow.

                              If you have more suggestions on why this is happening please post away. I'm all ears at this point to understand how this thing works.

                              Comment

                              Working...
                              X