Big table, 340+ million rows, 42G table, SELECT has taken days,


    Hi,

    Any idea how I can improve the query performance in this situation?

    I have a big table with 340 million rows, MyISAM, denormalized, as below:
    Code:
    +---------------------+--------------------------------------------+------+-----+------------------+-------+
    | Field               | Type                                       | Null | Key | Default          | Extra |
    +---------------------+--------------------------------------------+------+-----+------------------+-------+
    | message_id          | mediumint(8)                               | NO   | PRI | -1               |       |
    | message_id_binary   | binary(16)                                 | NO   | PRI | 0                |       |
    | UDH                 | varbinary(16)                              | NO   | PRI |                  |       |
    | mo_mt               | enum('MT','MO','MSF','MCB','PSF','NONSUB') | NO   |     | MT               |       |
    | shortcode           | varchar(16)                                | NO   |     |                  |       |
    | msisdn              | varchar(32)                                | NO   |     |                  |       |
    | operator_code       | varchar(32)                                | NO   |     |                  |       |
    | operator_name       | varchar(32)                                | NO   |     |                  |       |
    | gateway             | enum('CM','Globway','MIF','Jet')           | NO   |     | CM               |       |
    | content_type        | varchar(20)                                | NO   |     |                  |       |
    | tariff              | mediumint(8)                               | NO   |     | 0                |       |
    | country             | varchar(2)                                 | NO   | MUL |                  |       |
    | send_or_received_dt | datetime                                   | NO   |     | NULL             |       |
    | scenario_id         | varchar(32)                                | NO   |     |                  |       |
    | status_code         | mediumint(8)                               | NO   |     | 0                |       |
    | final_status_dt     | datetime                                   | NO   |     | NULL             |       |
    +---------------------+--------------------------------------------+------+-----+------------------+-------+
    With indexes on the following fields:
    Code:
    +-------+------------+-----------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
    | Table | Non_unique | Key_name  | Seq_in_index | Column_name       | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
    +-------+------------+-----------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
    | t     |          0 | UNIQ      |            1 | message_id        | A         |       18476 |     NULL | NULL   |      | BTREE      |         |               |
    | t     |          0 | UNIQ      |            2 | message_id_binary | A         |    68476650 |     NULL | NULL   |      | BTREE      |         |               |
    | t     |          0 | UNIQ      |            3 | UDH               | A         |   342383253 |     NULL | NULL   |      | BTREE      |         |               |
    | t     |          1 | extractor |            1 | country           | A         |        NULL |     NULL | NULL   |      | BTREE      |         |               |
    | t     |          1 | extractor |            2 | final_status_dt   | A         |        NULL |     NULL | NULL   |      | BTREE      |         |               |
    +-------+------------+-----------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
    The following query has been running for 697,554 seconds (over eight days), and it's still in the "Sending data" stage:
    Code:
    SELECT
        t.shortcode                             AS shortcode,
        ''                                      AS keyword,
        ''                                      AS service_name,
        t.operator_code                         AS operator_code,
        t.operator_name                         AS operator_name,
        t.gateway                               AS gateway_name,
    
        'Undefined'                             AS frequency_type,
        'Undefined'                             AS content_type,
    
        IF(t.tariff = 0, 'Free', 'Premium')     AS billing_status,
        t.tariff                                AS tariff_code,
        UPPER(t.mo_mt)                          AS transaction_type,
        'Content MT'                            AS message_type,
    
        t.send_or_received_dt                   AS send_dt,
        t.final_status_dt                       AS update_dt,
        CASE
            WHEN (t.status_code = 20)            THEN 'Delivered'
            WHEN (t.status_code IN (19, 37, 22)) THEN 'Pending'
            ELSE 'Failed'
        END                                     AS dnstatus_type,
    
        t.msisdn                                AS customer_code,
        990000                                  AS etl_code,
        0                                       AS subscription_id,
        CONCAT(t.message_id, '_', HEX(t.message_id_binary), '_', HEX(t.UDH)) AS `hash`
    FROM
        cmlog.t
    WHERE
        t.country = 'ES' AND
        t.final_status_dt > '2012-01-01 00:00:00'
    ORDER BY
        t.final_status_dt ASC
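    Since each experiment takes days, it may be worth checking the estimated plan first: EXPLAIN only plans the query without executing it, so it returns immediately and shows which index the optimizer picks and roughly how many rows it expects to examine. A sketch against the same query (column list trimmed for brevity):
    Code:
    EXPLAIN
    SELECT t.shortcode  -- plus the rest of the column list
    FROM cmlog.t
    WHERE t.country = 'ES'
      AND t.final_status_dt > '2012-01-01 00:00:00'
    ORDER BY t.final_status_dt ASC;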
    This is MySQL 5.6, running on CentOS, with the following memory stats:
    Code:
    free -m
                 total       used       free     shared    buffers     cached
    Mem:         22137      21950        187          0        151      10533
    -/+ buffers/cache:      11265      10872
    Swap:         8191       1091       7100
    and this is /etc/my.cnf:
    Code:
    # General
    port                            = 3306
    socket                          = /var/lib/mysql/mysql.sock
    datadir                         = /var/lib/mysql
    character-set-server            = utf8
    user                            = mysql
    symbolic-links                  = 0
    back_log                        = 50
    max_connections                 = 100
    max_connect_errors              = 999999999
    table_open_cache                = 4096
    max_allowed_packet              = 32M
    thread_cache_size               = 128
    open_files_limit                = 32768
    sql_mode            = NO_ENGINE_SUBSTITUTION
    default-storage-engine        = MyISAM
    skip-name-resolve
    slave-net-timeout        = 300
    
    # Threading
    thread_handling                 = one-thread-per-connection
    thread_stack                    = 256K
    
    # Query
    query_cache_type                = 0
    query_cache_size                = 0M
    query_cache_limit               = 1M
    query_cache_min_res_unit        = 4K
    
    # MYISAM
    key_buffer_size                 = 10240M
    read_buffer_size                = 1M
    read_rnd_buffer_size            = 2M
    bulk_insert_buffer_size         = 32M
    join_buffer_size                = 1M
    sort_buffer_size                = 4M
    myisam_sort_buffer_size         = 8192M
    tmp_table_size                  = 1024M
    max_heap_table_size        = 1024M
    myisam_repair_threads           = 8
    myisam-recover                  = DEFAULT
    The data on disk:
    Code:
    -rw-rw----. 1 mysql mysql  42G May  4 15:30 t.MYD
    -rw-rw----. 1 mysql mysql  22G May  4 15:30 t.MYI
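    The key cache (key_buffer_size = 10240M) holds less than half of that 22G index file, so it is worth checking whether the cache is thrashing; the hit ratio can be read cheaply from the server status counters:
    Code:
    SHOW GLOBAL STATUS LIKE 'Key_read%';
    -- Key_reads / Key_read_requests well above ~1% suggests the
    -- 10G key cache is too small for the 22G .MYI file.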
    I googled around for ways to improve this. First, by preloading the index into memory using:
    Code:
    load index into cache t ignore leaves;
    Or by simply using IGNORE INDEX, since 90% of the data in this table belongs to country="ES". But I cannot try these experiments one by one, because each will take days, so I'm looking for more advice before going on.
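    The IGNORE INDEX variant would look roughly like this (a sketch; since ~90% of the rows match, a sequential scan of the 42G data file may beat hundreds of millions of random row lookups through the index, but the result still has to be sorted):
    Code:
    SELECT t.shortcode  -- plus the rest of the column list
    FROM cmlog.t IGNORE INDEX (extractor)
    WHERE t.country = 'ES'
      AND t.final_status_dt > '2012-01-01 00:00:00'
    ORDER BY t.final_status_dt ASC;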

    Any idea how to improve this?

  • #2
    Try recreating the extractor index with the country and final_status_dt columns swapped.
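    Assuming the names from the post, that would be something like the following (dropping and re-adding in one ALTER so the 42G table is rebuilt only once; on a table this size the rebuild itself will take a while):
    Code:
    ALTER TABLE cmlog.t
        DROP INDEX extractor,
        ADD INDEX extractor (final_status_dt, country);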
