simple performance question

  • Filter
  • Time
  • Show
Clear All
new posts

  • #16
    it' still running


    • #17
      Thats not entirely surprising. Ensure you don't run out of disk space on the server while it rebuilds the table.

      This would be a good time to look over the documentation I referenced a few posts up.


      • #18
        Thanks for your help jcesario. I will let you know how it goes.


        • #19
          It's now 4 and a half hours to try to add the index to a table with ~300 million rows. You have to admit that it's kind of ridiculous. Databases are designed to handle large amounts of data.


          • #20
            The idea of performance can be relative and somewhat amorphous at times....

            Based on your current schema: I'd expect
            SELECT id1,id2 from table where id1=4;
            to perform very well.

            While I would expect
            SELECT COUNT(*) FROM tablea
            to perform relatively poorly.

            I think you can expect some significant performance benefits with some work.
            Ill try to offer an explanation for the problematic responsiveness and some recommendations on how to remedy them:

            An ALTER TABLE can take 4+ hours because MySQL (and Innodb), except where online operations are allowed, will rebuild the entire table. Rewriting the entire 100GB back to disk and then swapping it in, there's some other work involved as well.

            For this reason, when dealing with large amounts of data, it pays dividends to plan your access patterns before the data is actually populated.

            That operation is most likely going to be bottlenecked by the IO subsystem. Examine the performance of your disks and RAID controller, ensuring the RAID controller is taking advantage of its BBU (assuming it has one).

            Tools like iostat and pt-diskstats can help you measure current IO load, while benchmarking suites like sysbench and iozone can help measure theoretical maximums.

            Your SELECT `Distance`...GROUP BY `Distance` is going to perform poorly until it has an index to run on. Relational databases depend heavily on indexes to accelerate read operations. Indexes do come at a cost though, so they must be applied intelligently. When indexes are not available, displayed by the explain output, rows will most likely be read from disk; again, limited by IO subsystem capacity.

            The SELECT COUNT(*) query unfortunately exposes a weakness in B-Trees, which Innodb uses to organize and store its data. Getting a total count requires walking and counting the entire tree, other databases and storage engines have different properties. MyISAM can perform this operation very quickly, but comes with its own set of drawbacks.
            There are multiple ways to mitigate this, most of them at a variance of cost of speed, accuracy, and complexity. I believe the book "High Performance MySQL" describes some of them in detail.

            Also, there are operational techniques to mitigate the impact of DDL on tables themselves. For example, applying the indexes on a slave or using pt-online-schema-change, documented in more detail in the book mentioned above.


            • #21
              Why is distance a varchar and not a numerical type? It would make index creation much faster.