  • load data overhead

    I'm trying to restore data from a text file (generated with 'select ... into outfile ...') on Percona Server version 5.5.28-29.3.

    I launched the query 'load data infile '...' ignore into table tbl_new', but it has already been running for 5 days and I can't understand what is happening.

    The 'old' table takes 38.6 GiB (according to phpMyAdmin) and the data file is 18.7 GiB, but the new table already takes 46.5 GiB and the query is still in progress!

    In addition, I changed the schema of the new table: I dropped the auto-increment primary key (int(11)) and made the field `hash_d` (bigint(20)) the primary key instead. The old table has both fields: `id_d`, an auto-increment int(11), and `hash_d`, a bigint(20). The table has about 220M rows.


    CREATE TABLE `tbl_old` (
      `id_d` int(11) unsigned NOT NULL AUTO_INCREMENT,
      `hash_d` bigint(20) NOT NULL DEFAULT '0',
      `name_d` varchar(300) NOT NULL,
      `id_z` smallint(5) unsigned NOT NULL,
      `hash_p` bigint(20) NOT NULL DEFAULT '0',
      `ins_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
      `st_b` smallint(5) unsigned NOT NULL,
      `s_type` int(11) NOT NULL DEFAULT '0',
      `id_st` int(11) DEFAULT '0',
      `r_code` smallint(3) NOT NULL DEFAULT '-1',
      PRIMARY KEY (`id_d`),
      UNIQUE KEY `idx_hd` (`hash_d`),
      KEY `id_st` (`id_st`),
      KEY `id_z` (`id_z`),
      KEY `idx_st_b` (`st_b`),
      KEY `idx_p` (`hash_p`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

    CREATE TABLE `tbl_new` (
      `hash_d` bigint(20) NOT NULL DEFAULT '0',
      `name_d` varchar(300) NOT NULL,
      `id_z` smallint(5) unsigned NOT NULL,
      `hash_p` bigint(20) NOT NULL DEFAULT '0',
      `ins_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
      `st_b` smallint(5) unsigned NOT NULL,
      `s_type` int(11) NOT NULL DEFAULT '0',
      `id_st` int(11) DEFAULT '0',
      `r_code` smallint(3) NOT NULL DEFAULT '-1',
      PRIMARY KEY (`hash_d`),
      KEY `id_st` (`id_st`),
      KEY `id_z` (`id_z`),
      KEY `idx_st_b` (`st_b`),
      KEY `idx_p` (`hash_p`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

  • #2
    Can you show us the output of SHOW ENGINE INNODB STATUS and SHOW FULL PROCESSLIST?

    If you have changed the PK from INT to BIGINT, then the table will be bigger. INT is 4 bytes and BIGINT is 8 bytes. Also, take into account that in InnoDB every secondary key has the value of the PK attached to it. So now every KEY has a hidden second column holding an 8-byte BIGINT.
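    A back-of-envelope estimate of that effect (row count and the four secondary keys are taken from this thread; per-entry overheads are ignored, so this is a rough lower bound, not an exact figure):

```python
ROWS = 220_000_000                  # approximate row count from the post
INT_BYTES, BIGINT_BYTES = 4, 8      # storage sizes of INT and BIGINT
SECONDARY_KEYS = 4                  # id_st, id_z, idx_st_b, idx_p in tbl_new

# each secondary index entry now carries an 8-byte PK copy instead of 4 bytes
extra_per_row = (BIGINT_BYTES - INT_BYTES) * SECONDARY_KEYS
extra_gib = ROWS * extra_per_row / 2**30
print(f"~{extra_gib:.1f} GiB of extra secondary-index space")  # ~3.3 GiB
```

    So the wider PK alone accounts for a few GiB of growth, before page-level overhead is even considered.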



    • #3
      Thanks a lot!
      I did not know that fact about the hidden column in the index. Unfortunately, I can't show the engine status because the query has already finished. But when I tested different data types and structures, I got contradictory results. The structures above are a good example of your explanation of the key structure, but when I tried other structures, like

      CREATE TABLE `tbl_int_key` (
        `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
        `field_1` bigint(20) NOT NULL DEFAULT '0',
        `field_2` bigint(20) NOT NULL DEFAULT '0',
        PRIMARY KEY (`id`),
        UNIQUE KEY `idex_fiealds` (`field_1`,`field_2`),
        KEY `idx_domainTo` (`field_2`,`field_1`)
      ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

      CREATE TABLE `tbl_d_bigint_key` (
        `field_1` bigint(20) NOT NULL DEFAULT '0',
        `field_2` bigint(20) NOT NULL DEFAULT '0',
        PRIMARY KEY (`field_1`,`field_2`),
        KEY `idx_domainTo` (`field_2`,`field_1`)
      ) ENGINE=InnoDB DEFAULT CHARSET=utf8;


      The first table (tbl_int_key) was almost 40% bigger than the second one (tbl_d_bigint_key), even though the primary key must be bigger in the second table (tbl_d_bigint_key).

      Does the answer lie in the number of keys in the tables?



      • #4
        Yes, the more secondary indices, the larger the difference.

        But also take the page fill rate into account. The whole database and its indices are stored in 16 kB pages. Row updates and insertions in the middle of the table cause a less-than-ideal fill rate (the ideal is close to 15 kB per 16 kB page). The table is sorted on the primary key, so your first table has a very good fill rate, since new rows are always inserted at the end of the table (due to your auto_increment column).
        The size of your tbl_d_bigint_key table may be temporarily reduced by running ALTER TABLE tbl_d_bigint_key ENGINE = InnoDB; this causes a table rebuild in which the pages get an ideal fill rate. But as soon as you start inserting new rows, pages get split again and the table size slowly grows back to its current size.
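        A rough sketch of why the fill rate matters for on-disk size (the fill rates below are illustrative assumptions, not measurements; the 18.7 GiB is the raw data size from the original post):

```python
PAGE_BYTES = 16 * 1024              # InnoDB default page size
DATA_GIB = 18.7                     # raw data size from the original post

# on-disk size is roughly logical data divided by the page fill rate
for fill in (15 / 16, 0.75, 0.50):  # assumed fill rates, for illustration
    on_disk_gib = DATA_GIB / fill
    print(f"fill {fill:.0%}: ~{on_disk_gib:.1f} GiB on disk")
```

        The same logical data can occupy nearly twice the disk space at a 50% fill rate as it does right after a rebuild.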



        • #5
          But the tbl_d_bigint_key table wasn't altered. Both tables were filled with load data infile, and the query times were quite different too: loading data into tbl_d_bigint_key was twice as fast as loading into tbl_int_key. This may be explained by table size: tbl_d_bigint_key is about equal to my memory size, while tbl_int_key is 10 GiB larger than my memory size.
          Do I understand correctly that it is better for every table to have an auto-increment primary key? Or does it depend on the structure?



          • #6
            How was the data in the file ordered? If it was ordered on field_1, field_2, you still insert at maximum efficiency.

            What is preferable depends on data size, number of indices, etc. Usually a small PK and smaller tables are better, but a long PK may also be advantageous if queries use the primary key, as compared to queries that use a secondary key that is not a covering key. I'd say it's try-and-see.
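            A toy model of the covering-key point (the fanout is an assumed, illustrative number, not a measured one): a query answered entirely from a secondary index traverses one B-tree, while a non-covering secondary lookup traverses the secondary tree and then the primary-key tree for each matching row.

```python
import math

ROWS = 220_000_000   # row count from the thread
FANOUT = 1_000       # assumed index entries per B-tree node (illustrative)

# approximate tree height: levels needed to address ROWS entries
height = math.ceil(math.log(ROWS, FANOUT))

covering_seeks = height          # secondary index alone answers the query
non_covering_seeks = 2 * height  # secondary seek plus a PK lookup per row
print(height, covering_seeks, non_covering_seeks)
```

            Under these assumptions a non-covering lookup does roughly twice the page visits per row, which is why a long PK only pays off when queries actually hit the primary key.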



            • #7
              Thanks a lot! You helped me understand MySQL indexes a little more. From this moment on, I will review all my queries and table structures.

