November 1, 2014

Why MySQL could be slow with large tables?

If you've been reading enough database-related forums, mailing lists, or blogs, you have probably heard complaints about MySQL being unable to handle more than 1,000,000 (or pick any other number) rows. On the other hand, it is well known that, with customers like Google, Yahoo, LiveJournal, and Technorati, MySQL has installations with many billions of rows and delivers great performance. What could be the reason?

The reason is usually table design and not understanding the inner workings of MySQL. If you design your data wisely, considering what MySQL can do and what it can't, you will get great performance; if not, you might become upset and become one of those bloggers. Note that any database management system is different in some respects, and what works well for Oracle, MS SQL, or PostgreSQL may not work well for MySQL, and the other way around. Even storage engines have very important differences which can affect performance dramatically.

The three main issues you should be concerned with when dealing with very large data sets are buffers, indexes, and joins.

Buffers
The first thing you need to take into account is that the situations where data fits in memory and where it does not are very different. If you started from an in-memory data size and expect a gradual performance decrease as the database grows, you may be surprised by a severe drop in performance. This especially applies to index lookups and joins, which we cover later. As everything usually slows down a lot once it no longer fits in memory, a good solution is to make sure your data fits in memory as well as possible. This can be done with data partitioning (i.e. old and rarely accessed data stored on different servers), multi-server partitioning to use combined memory, and a lot of other techniques which I should cover at some later time.
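
As a minimal sketch of the first idea – keeping rarely accessed data out of the hot table – old rows can simply be moved into an archive table on the same or another server (table and column names here are assumptions, purely for illustration):

    -- move rarely accessed historical rows out of the active table
    INSERT INTO orders_archive
        SELECT * FROM orders WHERE order_date < '2006-01-01';
    DELETE FROM orders WHERE order_date < '2006-01-01';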

So that you understand how much having data in memory changes things, here is a small example with numbers. If you have your data fully in memory, you can perform over 300,000 random lookups per second from a single thread, depending on system and table structure. Now if your data is fully on disk (both data and index), you would need 2+ IOs to retrieve each row, which means you get about 100 rows/sec. Note that multiple drives do not really help a lot here, as we're speaking about a single thread/query. So the difference is 3,000 times! That might be a bit of an exaggeration, as there are few completely uncached workloads, but a 100+ times difference is quite frequent.

Indexes
What everyone knows about indexes is that they are good for speeding up access to the database. Some people will also remember that whether indexes are helpful or not depends on index selectivity – how large a proportion of rows matches a particular index value or range. What is often forgotten is that, depending on whether the workload is cached or not, a different selectivity is needed for an index to be a win. In fact, even the MySQL optimizer currently does not take this into account. For an in-memory workload, index access might be faster even if 50% of the rows are accessed, while for a disk-IO-bound workload we might be better off doing a full table scan even if only a few percent of the rows are accessed.

Let's do some computations again. Consider a table which has 100-byte rows. With a decent SCSI drive we can get 100MB/sec read speed, which gives us about 1,000,000 rows per second for fully sequential access to tightly packed rows – quite a possible scenario for MyISAM tables. Now if we take the same hard drive for a fully IO-bound workload, it will be able to provide just about 100 row lookups by index per second. The difference is 10,000 times for our worst-case scenario. It might not be that bad in practice, but again it is not hard to reach a 100 times difference.

Here is a little illustration. I created a table with over 30 million rows. The “val” column in this table has 10,000 distinct values, so the range 1..100 selects about 1% of the table. The times for a full table scan vs. a range scan by index:
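
For reference, a hypothetical table and the two query shapes being compared could look like this (names and structure are assumptions, not the exact test schema):

    CREATE TABLE t30m (
        id  INT UNSIGNED NOT NULL AUTO_INCREMENT,
        val INT UNSIGNED NOT NULL,
        pad VARCHAR(100) NOT NULL,
        PRIMARY KEY (id),
        KEY val (val)
    ) ENGINE=MyISAM;

    -- full table scan (the index on val is ignored on purpose)
    SELECT COUNT(pad) FROM t30m IGNORE INDEX (val) WHERE val BETWEEN 1 AND 100;
    -- range scan by the val index
    SELECT COUNT(pad) FROM t30m WHERE val BETWEEN 1 AND 100;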

Also remember – not all indexes are created equal. Some indexes may be stored in sorted order, while others may have their pages placed in random locations – this can affect index scan/range scan speed dramatically. The rows referenced by an index may also be located sequentially, or may require random IO when index ranges are scanned. There are also clustered keys in InnoDB, which combine index access with data access, saving you IO for completely disk-bound workloads.

There are certain optimizations in the works which would improve the performance of index accesses and index scans. For example, retrieving index values first and then accessing rows in sorted order can help a lot for big scans. This will reduce the gap, but I doubt it will be closed.

Joins
Joins are used to compose a complex object which was previously normalized into several tables, or to perform complex queries finding relationships between objects. A normalized structure and a lot of joins is the right way to design your database, as the textbooks teach you… but when dealing with large data sets it can be a recipe for disaster. The problem is not the data size – normalized data is normally smaller – but the dramatically increased number of index lookups, which may be random accesses. This problem exists for all kinds of applications; however, for OLTP applications with queries examining only a few rows, it is less of a problem. Data retrieval, search, DSS, and business intelligence applications, which need to analyze a lot of rows and run aggregates, are where this problem is most dramatic.

Some joins are also better than others. For example, if you have a star join with small dimension tables, it will not slow things down too much. On the other hand, a join of a few large tables which is completely disk-bound can be very slow.

One of the reasons this problem is magnified in MySQL is the lack of advanced join methods at this point (the work is on its way) – MySQL can't do a hash join or a sort-merge join – it can only do the nested loops method, which requires a lot of index lookups which may be random.

Here is a good example. As we saw, my 30 million row (12GB) table was scanned in less than 5 minutes. Now if we do an equality join of that table to another 30 million row table, the access will be completely random. We would need to perform 30 million random row reads, which gives us 300,000 seconds at a 100 rows/sec rate. So we would go from 5 minutes to almost 4 days if we need to do the join. Some people assume a join would cost close to two full table scans (as 60 million rows need to be read) – this is way off.

Do not take this as me being against normalization or joins. Normalization is a great principle and should be used when possible. Just do not forget about the performance implications when designing the system, and do not expect joins to be free.

Finally, I should mention one more MySQL limitation which requires you to be extra careful when working with large data sets. In MySQL, a single query runs as a single thread (with the exception of MySQL Cluster), and MySQL issues IO requests one by one during query execution, which means that if single-query execution time is your concern, many hard drives and a large number of CPUs will not help. Sometimes it is a good idea to manually split the query into several, run them in parallel, and aggregate the result sets.
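
A sketch of the manual splitting idea, with a hypothetical table: the pieces are run on separate connections in parallel and the application adds the counts up.

    -- instead of a single aggregate over the whole table:
    SELECT COUNT(*) FROM hits WHERE site_id = 7;

    -- split by primary key range and run the pieces in parallel:
    SELECT COUNT(*) FROM hits WHERE site_id = 7 AND id BETWEEN 1 AND 10000000;
    SELECT COUNT(*) FROM hits WHERE site_id = 7 AND id BETWEEN 10000001 AND 20000000;
    SELECT COUNT(*) FROM hits WHERE site_id = 7 AND id > 20000000;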

So if you're dealing with large data sets and complex queries, here are a few tips.

Try to fit the data set you're working with in memory – Processing in memory is so much faster, and you have a whole bunch of problems solved just by doing so. Use multiple servers to host portions of the data set. Store the portion of data you're going to work with in a temporary table, etc. (see the sketches after these tips).

Prefer full table scans to index accesses – For large data sets, full table scans are often faster than range scans and other types of index lookups. Even if you look at 1% of the rows or less, a full table scan may be faster.

Avoid joins to large tables – Joining large data sets using nested loops is very expensive. Try to avoid it. Joins to smaller tables are OK, but you might want to preload them into memory before the join so there is no random IO needed to populate the caches.
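
Two minimal sketches related to the first and last tips, with hypothetical table names: copying the portion you are actively working with into its own smaller table, and warming up a small lookup table before joining to it.

    -- keep the working portion of the data in a separate, smaller table
    CREATE TEMPORARY TABLE working_set ENGINE=MyISAM
        SELECT * FROM events WHERE event_date >= '2006-06-01';

    -- preload a small table before the join so its lookups are served from memory
    LOAD INDEX INTO CACHE countries;    -- pulls the MyISAM index into the key cache
    SELECT COUNT(*) FROM countries;     -- touches the data so the OS caches it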

With proper application architecture and table design you can build applications operating with very large data sets based on MySQL.

About Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

Comments

  1. L says:

    Great article, gave me some good pointers.

    Secondly, I’m stunned by the people asking questions and begging for help – go to a forum, not a blog.

    Thanks again

  2. Good post Peter..

    just a couple of questions to clarify some things.

    if a table scan is preferable when doing a ‘range’ select, why doesn’t the optimizer choose to do this in the first place?

    is there some sort of rule of thumb here.. use an index when you expect your queries to only return X% of the data? otherwise put a hint in your SQL to force a table scan?

    what changes are in 5.1 which change how the optimizer parses queries..

    does running optimize table regularly help in these situations? or just when you have a large change in your data distribution in your table?

    oh.. one tip for your readers.. always run ‘explain’ on a fully loaded database to make sure your indexes are being used

  3. peter says:

    Ian,

    Right. In theory the optimizer should know this and select it automatically. In practice it is not smart enough. It also simply does not have the data available – is a given index (range) currently in memory or will it need to be read from disk? How fragmented is the index? How random would the accesses be to retrieve the rows? Until the optimizer takes this and much more into account, you will need to help it sometimes. I did not mention it in the article, but there is the IGNORE INDEX() hint to force a full table scan.

    There is no rule of thumb. As you could see in the article, in my test a range covering 1% of the table was 6 times slower than a full table scan… which means the table scan remains preferable down to about 0.2%. In other cases, especially for cached workloads, it can be as much as 30-50%.

    In MySQL 5.1 there are tons of little changes. Which change are you speaking about?

    OPTIMIZE helps for certain problems – i.e. it sorts the indexes themselves and removes row fragmentation (all for MyISAM tables). It cannot, however, make row retrieval by index sequential. One more hint: if you do all your range scans by a specific key, ALTER TABLE … ORDER BY key would help a lot.
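
    For illustration, the two statements mentioned above might look like this (table and key names are hypothetical):

    OPTIMIZE TABLE hits;                -- defragments rows and sorts MyISAM indexes
    ALTER TABLE hits ORDER BY user_id;  -- rewrites the rows in user_id order, so range
                                        -- scans by user_id read them sequentially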

    Yes, running EXPLAIN is a good idea. Also do not forget to try it out for different constants – plans are not always the same.

  4. Vadim says:

    Ian,
    as I wrote in http://www.mysqlperformanceblog.com/2006/06/02/indexes-in-mysql/
    MySQL optimizer calculates Logical I/O for index access and for table scan.
    One big mistake here, I think, is that MySQL assumes 100 key comparisons
    (like ” if (searched_key == current_key)”) are equal to 1 Logical I/O.
    But I believe on modern boxes the constant 100 should be much bigger.

  5. Alexey says:

    For most workloads you’ll always want to provide enough memory to key cache so its hit ratio is like 99.9%. Even if a table scan looks faster than index access on a cold-cache benchmark, it doesn’t mean that it’s a good idea to use table scans. It can easily hurt overall system performance – by trashing OS disk cache, and if we compare table scan on data cached by OS and index scan on keys cached by MySQL, table scan uses more CPU (because of syscall overhead and possible context switches due to syscalls).
    Just an opinion. :)

  6. peter says:

    Alexey,

    You probably misunderstood this article. I'm writing about working with large data sets – the case when your tables and your working set do not fit in memory. This is when a full table scan will actually require less IO than using indexes. And this is when you can't get a 99.99% key cache hit rate. Not to mention that the key cache hit rate is only part of the problem – you also need to read the rows, which might be much larger and therefore not so well cached.

    As you have probably seen from the article, my first advice is to try to get your data to fit in cache. If that is possible, you instantly have half of the problems solved. And yes, if data is in memory, indexes are preferred even at lower selectivity than in the case of disk bound workloads.

  7. Aaron says:

    We've got 20,000,000 bank loan records we query against all sorts of tables. Everything is really, really slow. We're using LAMP. I could send the table structures and queries/PHP code that tends to bog down.

  8. peter says:

    Hi Aaron,

    Please feel free to send it to me at pz at mysql performance blog.com. Just do not forget the EXPLAIN for your queries, and if you have PHP code to load it up with some “random” data which is similar to yours, that would be great.

    We should take a look at your queries to see what could be done.

  9. Paul says:

    I implemented a simple logging of all my web sites access to make some statistics (sites access per day, ip address, search engine source, search queries, user text entries, …) but most of my queries went way too slow to be of any use last year. The problem started when I got to around 600,000 rows (table size: 290MB).

    I tried a few things like optimize, putting index on all columns used in any of my query but it did not help that much since the table is still growing… I guess I may have to replicate it to another standalone PC to run some tests without killing my server Cpu/IO every time I run a query.

    I am not using any join, I will try the ‘explain’ and the ‘IGNORE INDEX()’ when I have a chance although I don’t think it will help since I added indexes after I saw the problem.

    The index does make it very fast for one of my table on another project (list of all cities in the world: 3 million rows). I guess it’s all about memory vs hard disk access.

    Now my question is for a current project that I am developing. You can think of it as a webmail service like google mail, yahoo or hotmail.

    Right now I am wondering if it would be faster to have one table per user for messages instead of one big table with all the messages and two indexes (sender id, recipient id). This could mean millions of tables, so it is not easy to test.

    I will probably write a random users/messages generator to create a million user with a thousand message each to test it but you may have already some information on this so it may save me a few days of guess work.

    Any suggestions would be appreciated.

  10. peter says:

    Paul,

    A 300MB table is tiny. What kind of query are you trying to run, and what does the EXPLAIN output look like for that query? Also, what is your MySQL version?

    There are some other tricks which you need to consider – for example, if you do GROUP BY and the number of resulting rows is large, you might get pretty poor speed because a temporary table is used and it grows large. Using SQL_BIG_RESULT helps to make it use sort instead.
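
    A minimal sketch of that hint, with hypothetical table and column names:

    SELECT SQL_BIG_RESULT owner_id, COUNT(*)
    FROM messages
    GROUP BY owner_id;
    -- SQL_BIG_RESULT tells MySQL the result set is large, so it prefers sorting
    -- over building a keyed temporary table for the GROUP BY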

    Speaking about webmail – depending on the number of users you're planning for, I would go with a table per user, or with multiple users per table and multiple tables.

    Besides having your tables more manageable, you would get your data clustered by message owner, which will speed up operations a lot. Also this means that once a user logs in and views messages, they will be cached in the OS cache or MySQL buffers, speeding up further work dramatically.

  11. Apachez says:

    8. peter: Please (if possible) keep the results public (like in this blog thread, or create a new blog thread) since the findings might be interesting for others, to learn what to avoid and what the problem was in this case.

    11. peter: However, with one table per user you might run out of file descriptors (open_tables limit), which should be taken into consideration for designs where you would like to have “one table per user”.

  12. peter says:

    Apachez – yes this is the point.

    You get free answers to your questions by asking them in this blog (or at MySQL Forums) but other people can benefit from the answers as well.

    Speaking about “table per user” – it does not mean you will run out of file descriptors. “table_cache” is what defines how many tables will be kept open, and you can configure it independently of the number of tables you're using. You do however want to keep the value high in such a configuration to avoid constant table reopens. With InnoDB tables you also have all tables kept open permanently, which can waste a lot of memory, but that is a different problem. Speaking about “open_file_limit”, which limits the number of files MySQL can use at the same time – on modern operating systems it is safe to set it to rather high values. I have used MySQL with over 100,000 files opened at the same time with no problems. Shutdown can be long in such a case though.

  13. I have a table with 35 mil records. When I wanted to add a column (alter table) it would take about 2 days. Writing my own program in
    C# that prepared a file for import shortened this task to about 4 hours.

  14. peter says:

    Hm. I'm actually quite surprised. ALTER TABLE normally rebuilds indexes by sort, as does LOAD DATA INFILE (assuming we're speaking about a MyISAM table), so such a difference is quite unexpected. It might be that for some reason ALTER TABLE was doing the index rebuild by keycache in your tests; that would explain it.

    ALTER TABLE and LOAD DATA INFILE should however look at the same settings to decide which method to use.

  15. Daniel says:

    I’m just dealing with the same issue with a message system.

    At the moment I have one table (myisam/mysql4.1) for users inbox and one for all users sent items.
    Now the inbox table holds about 1 million rows with nearly 1 gigabyte total.
    The sent items table is about half that.
    At this point it is working well with over 700 concurrent users.
    Now I’m doing a recode and there should be a lot more functions like own folders etc. and the queries will be a lot more complex.
    That’s why I’m now thinking about useful possibilities of designing the message table and about whats the best solution for the future.

    Is it really useful to have a separate message table for every user?

    Thanks a lot for any suggestions.

    System: It’s now on a 2xDualcore Opteron with 4GB Ram/Debian/Apache2/MySQL4.1/PHP4/SATA Raid1)
    In near future I will have the Apache on a dedicated machine and the Mysql Server too (and the next step will be a Master/Slave server setup for the database).

  16. peter says:

    Daniel,

    I would surely go with multiple tables. Whether it should be a table per user or not depends on the number of users. For 1000 users that would work, but for 100,000 it would be too many tables.

    I would have a many-to-many mapping from users to tables, so you can decide later how many users you put per table, and I would also use composite primary keys if you're using InnoDB tables, so the data is clustered by user.
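
    A sketch of the kind of composite primary key meant here (hypothetical schema); with InnoDB the data is clustered by the primary key, so all messages of one user end up stored together:

    CREATE TABLE messages_0 (
        user_id    INT UNSIGNED NOT NULL,
        message_id BIGINT UNSIGNED NOT NULL,
        subject    VARCHAR(200) NOT NULL,
        body       TEXT NOT NULL,
        PRIMARY KEY (user_id, message_id)
    ) ENGINE=InnoDB;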

  17. Sergey says:

    Peter,

    I think you can give me some advise. I am running MySQL 4.1 on RedHat Linux. The box has 2GB of RAM, it has dual 2.8GHz Xeon processors, and /etc/my.cnf file looks like this.


    [mysqld]
    ...
    key_buffer = 512M
    max_allowed_packet = 8M
    table_cache = 512
    sort_buffer_size = 32M
    read_buffer_size = 32M
    read_rnd_buffer_size = 128M
    myisam_sort_buffer_size = 256M
    thread_cache = 32
    query_cache_size = 256M

    I am running data mining process that updates/inserts rows to the table (i.e. I do multifield select on indexed fields, and if row is found, I update the data, if not I insert new row). Up to about 15,000,000 rows (1.4GB of data) the procedure was quite fast (500-1000 rows per second), and then it started to slow down. Now I have about 75,000,000 rows (7GB of data) and I am getting about 30-40 rows per second.

  18. peter says:

    Sergey, Would you mind posting your case on our forums instead at
    http://forum.mysqlperformanceblog.com and I'll reply there. This way more users will benefit from your question and my reply.

  19. Sergey says:
  20. D.ZHU says:

    Hi peter,

    I’m doing a coding project that would result in massive amounts of data (will reach somewhere like 9billion rows within 1 year). supposing i’m completely optimized…

    * If i run a ‘select from where…’ query, how long is the query likely to take?
    * also how long would an insert take?

    I see you have in the example above, 30 millions of rows of data and a select took 29mins!

    * and how would i estimate such performance figures?

  21. peter says:

    D Zhu,

    The answer depends to a large extent on selectivity, as well as on whether the WHERE clause is matched by an index or a full scan is performed.

    You also need to consider how wide are rows – dealing with 10 byte rows is much faster than 1000 byte rows.

    Regarding how to estimate, I would do some benchmarks and match them against what you would expect and what you're looking for. There are many design and configuration alternatives to deliver what you're looking for.

  22. Labus says:

    Normally MySQL is rather fast loading data into a MyISAM table, but there is an exception, which is when it can't rebuild indexes by sort but builds them
    row by row instead. This can happen due to wrong configuration (i.e. too small myisam_max_sort_file_size or myisam_max_extra_sort_file_size) or
    it could be just lack of optimization, if you have large (does not fit in memory) PRIMARY or UNIQUE indexes.

    Below is the internal letter I’ve sent out on this subject which I guessed would be good to share

    Today on my play box I tried to load data into MyISAM table (which was
    previously dumped as mysqldump --tab)

    The data was some 1.3G, 15.000.000 rows, 512MB memory on the box.
    MySQL 4.1.8

    One could call it a trivially fast task; unfortunately I had a
    unique key on varchar(128) as part of the schema.

    The load took some 3 hours before I aborted it finding out it was just
    about 20% done.

    The problem is – unique keys are always rebuilt using key_cache, which
    means we’re down to some 100-200 rows/sec as soon as index becomes
    significantly larger than memory.

    You can’t go away with ALTER TABLE DISABLE KEYS as it does not affect
    unique keys.

  23. peter says:

    Labus,

    Yes, that is the problem. As I mentioned, sometimes if you want to have a quick build of a unique/primary key you need to do an ugly hack – create the table without the index, load the data, replace the .MYI file from an empty table of exactly the same structure but with the indexes you need, and call REPAIR TABLE.

    REPAIR TABLE builds all indexes by sort.
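
    A rough sketch of that hack, with hypothetical table names and file paths; the file copy in step 3 happens on the file system while the tables are closed:

    -- 1. load into a table that has no unique/primary key
    CREATE TABLE t_noidx (
        url  VARCHAR(128) NOT NULL,
        hits INT UNSIGNED NOT NULL
    ) ENGINE=MyISAM;
    LOAD DATA INFILE '/tmp/t.txt' INTO TABLE t_noidx;

    -- 2. create an empty table with the same structure plus the indexes you need
    CREATE TABLE t_idx (
        url  VARCHAR(128) NOT NULL,
        hits INT UNSIGNED NOT NULL,
        UNIQUE KEY url (url)
    ) ENGINE=MyISAM;

    -- 3. FLUSH TABLES, copy t_idx.MYI over t_noidx.MYI on disk, then rebuild by sort
    FLUSH TABLES;
    REPAIR TABLE t_noidx;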

  24. Len says:

    Google may use Mysql but they don’t necessarily have billions of rows – just because google uses MySQL doesn’t mean they actually use it for their search engine results. They have many little sections in their website you know. Laughably they even used PHP for one project. Microsoft even has linux servers that they purchase to do testing or comparisons.

  25. peter says:

    The rumors are that Google is using MySQL for AdSense. Yahoo uses MySQL for just about everything – of course not for full-text searching itself, as that just does not map well to a relational database.

  26. Jay Pipes says:

    @Len: not quite sure what you’re getting at…other than being obtuse.

    If you’d like to know how and what Google uses MySQL for (yes, AdSense, among other things), come to the Users Conference in April (http://mysqlconf.com). 4 Googlers are speaking there, as is Peter. Also, I don’t understand your aversion to PHP… what about using PHP is laughable? Sounds to me you are just flame-baiting…

  27. Jasper says:

    I have made an online dictionary using a MySQL query I found online. It has been working pretty well until today. Now the page loads quite slowly. I wonder how I can optimize my table. The things you wrote here are kind of difficult for me to follow.
    My table has 545 rows, id column, dutch words column, Thai characters, phonetics, is dynamic, has 59 bytes per row, auto-indexes, has a collation: utf8 unicode, data: 25,820 bytes, index: 6,144 bytes and total: 31,964 bytes of data. Should I split up the data to load it faster or use a different structure? Thanks for your suggestions.

  28. peter says:

    Jasper,

    Your table is not large by any means. I’d suggest you to find which query in particular got slow and post it on forums.

  29. Bart R says:

    Hello peter,

    I have a table with a unique key on two columns (STRING, URL). The table contains 36 million rows (Data size 5GB, Index size 4GB).

    I run the following query, which takes 93 seconds !

    SELECT TITLE FROM GRID WHERE STRING = 'sport';

    When I run the query below, it only takes 0.1 seconds :

    SELECT COUNT(*) FROM GRID WHERE STRING = 'sport';

    So while the where-clause is the same, the first query takes much more time.
    The slow part of the query is thus the retrieving of the data. I came to this
    conclusion also because the query took longer the more rows were retrieved.

    Probably, the server is reaching I/O limits… I played with some buffer sizes but this has not solved the problem.. Has anyone experience with table size this large ? Is MySQL able to handle tables (MyIsam) this large ?

    Also, is it an option to split this big table in 10 smaller tables ?

    CREATE TABLE GRID (
    ID bigint(20) NOT NULL auto_increment,
    STRING varchar(100) character set utf8 collate utf8_unicode_ci NOT NULL default '',
    URL varchar(230) character set utf8 collate utf8_unicode_ci NOT NULL default '',
    LANGUAGE char(2) NOT NULL default 'EN',
    COUNTRY char(2) NOT NULL,
    TITLE varchar(255) character set utf8 collate utf8_unicode_ci NOT NULL default '',
    DESCRIPTION text character set utf8 collate utf8_unicode_ci,
    POINTS decimal(10,2) NOT NULL default '0.00',
    PRIMARY KEY (ID),
    UNIQUE KEY string (STRING,URL)
    );

  30. peter says:

    Why are you surprised? The COUNT(*) query is index-covered, so it is expected to be much faster, as it only touches the index and does a sequential scan of it.
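
    For illustration, the difference should be visible in EXPLAIN: the COUNT(*) query is expected to show “Using index” in the Extra column (it reads only the (STRING, URL) index), while the TITLE query has to fetch every matching row as well:

    EXPLAIN SELECT COUNT(*) FROM GRID WHERE STRING = 'sport';
    EXPLAIN SELECT TITLE FROM GRID WHERE STRING = 'sport';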

  31. Nick says:

    Peter,
    I have a similar situation to the message system, only my data set would be even bigger. I have several data sets, and each of them would be around 90,000,000 records, but each record has just a pair of IDs as a composite primary key and a text – just 3 fields.
    The number of IDs would be between 15,000 ~ 30,000 depending on the data set. I have tried setting up one big table for one data set; the query is very slow, taking up to an hour, where ideally I would need a few seconds. But if I make tables based on IDs, that would not only create very many tables, but also duplicate records, since information is shared between 2 IDs.
    What would be the best way to do it?

  32. Jacob says:

    Hello,

    I am building a statistics app that will house 9-12 billion rows. I am trying to use MySQL Cluster, i.e. the ndbcluster engine. Hardware is not an issue, that is to say I can get whatever hardware I need to do the job. Select times are reasonable, but insert times are very, very, very slow. As an example, in a basic config using MyISAM tables I am able to insert 1 million rows in about 1-2 min. The hardware servers I am testing on are 2.4G Xeon CPUs with 1GB RAM and a Gig network. However, with ndbcluster the exact same inserts are taking more than 15 min. Anyone have any ideas on how I can make this faster?

  33. Martin says:

    Im working on a project which will need some tables with about 200-300 million rows.
    Each row consists of 2x 64 bit integers.
    The table structure is as follows:
    CREATE TABLE z_chains_999 (
    startingpoint bigint(8) unsigned NOT NULL,
    endingpoint bigint(8) unsigned NOT NULL,
    PRIMARY KEY (startingpoint,endingpoint)
    ) ENGINE=MyISAM DEFAULT CHARSET=utf8 ROW_FORMAT=FIXED;

    My problem is, as more rows are inserted, the longer time it takes to insert more rows.
    I insert rows in batches of 1.000.000 rows.
    First insertion takes 10 seconds, next takes 13 seconds, 15, 18, 20, 23, 25, 27 etc. I fear when it comes up to 200 million rows.
    Removing the PRIMARY KEY stops this problem, but i NEED IT.. Any suggestions what to do?
    my key_buffer is set to 1000M, but this problem already begins long before the memory is full.

  34. peter says:

    How large is the index when it becomes slower? Going from 25 to 27 sec is likely to happen because the index BTREE becomes larger. What is important is to have it (the working set) in memory; if it does not fit, you can get into severe problems.

    What queries are you going to run on it? Maybe MERGE tables or partitioning will help.

  35. Martin says:

    It gets slower and slower for every 1 million rows i insert. The first 1 million row takes 10 seconds to insert, after 30 million rows, it takes 90 seconds to insert 1 million rows more.
    I need to do 2 queries on the table. 1st one (which is used the most) is “SELECT COUNT(*) FROM z_chains_999″, the second, which should only be used a few times is “SELECT * FROM z_chains_999 ORDER BY endingpoint ASC”

    I found that setting delay_key_write to 1 on the table stops this from happening. Now it remains on a steady 12 seconds every time i insert 1 million rows. What exactly is it this option does?
    After 26 million rows with this option on, it suddenly takes 520 seconds to insert the next 1 million rows.. Any idea why? Maybe the memory is full?

  36. Will says:

    Peter, I just stumbled upon your blog by accident. Very good info!

    I’m building an evaluation system with about 10-12 normalized tables. There are two main output tables that most of the querying will be done on. I filled the tables with 200,000 records and my query won’t even run. It’s losing connection to the db server. I tried SQL_BIG_RESULT, analyze table, etc… nothing seems to help

    Here’s my query. It does lot of counting. Does this look like a performance nightmare waiting to happen?

    Thanks,
    Will

    SELECT
    Q.questionID,
    Q.questionsetID,
    Q.question,
    Q.questioncatid,
    QAX.questionid,
    QAX.answersetid,
    ASets.answersetid,
    ASets.answersetname,
    ASAX.answersetid,
    ASAX.answerid,
    A.answerID,
    A.answername,
    A.answervalue,
    COUNT(DISTINCT e3.evalanswerID) AS totalforthisquestion,
    COUNT(DISTINCT e1.evalanswerID) AS totalforinstructor,
    (COUNT(DISTINCT e3.evalanswerID)/COUNT(DISTINCT e1.evalanswerID)*100)
    AS answerpercentage
    FROM tblquestions Q
    INNER JOIN tblquestionsanswers_x QAX USING (questionid)
    INNER JOIN tblanswersets ASets USING (answersetid)
    INNER JOIN tblanswersetsanswers_x ASAX USING (answersetid)
    INNER JOIN tblanswers A USING (answerid)
    LEFT JOIN (tblevalanswerresults e1 INNER JOIN tblevaluations e2 ON
    e1.evalid = e2.evalid
    AND e2.InstructorID = '1021338'

    ) ON e1.questionid = Q.questionID
    LEFT JOIN (tblevalanswerresults e3 INNER JOIN tblevaluations e4 ON
    e3.evalid = e4.evalid
    AND e4.InstructorID = '1021338'

    ) ON e3.questionid = Q.questionID AND
    e3.answerID = A.answerID

    GROUP BY
    Q.questionID,
    Q.questionsetID,
    Q.question,
    ASets.answersetid,
    Q.questioncatid,
    A.answerID,
    A.answername,
    A.answervalue
    HAVING Q.questioncatid = 1

    UNION
    /**The following query is just for the totals, and does not include the
    group columns**/
    SELECT
    9999,
    NULL,
    'Totals',
    Q.questionID,
    Q.questionsetID,
    Q.question,
    Q.questioncatid,
    QAX.questionid,
    ASets.answersetid,
    ASets.answersetname,
    A.answerID,
    A.answername,
    A.answervalue,
    COUNT(DISTINCT e3.evalanswerID) AS totalforthisquestion,
    COUNT(DISTINCT e1.evalanswerID) AS totalforinstructor,
    (COUNT(DISTINCT e3.evalanswerID)/COUNT(DISTINCT e1.evalanswerID)*100)
    AS answerpercentage
    FROM tblquestions Q
    INNER JOIN tblquestionsanswers_x QAX USING (questionid)
    INNER JOIN tblanswersets ASets USING (answersetid)
    INNER JOIN tblanswersetsanswers_x ASAX USING (answersetid)
    INNER JOIN tblanswers A USING (answerid)
    LEFT JOIN (tblevalanswerresults e1 INNER JOIN tblevaluations e2 ON
    e1.evalid = e2.evalid
    AND e2.InstructorID = '1021338'

    ) ON e1.questionid = Q.questionID
    LEFT JOIN (tblevalanswerresults e3 INNER JOIN tblevaluations e4 ON
    e3.evalid = e4.evalid
    AND e4.InstructorID = '1021338'

    ) ON e3.questionid = Q.questionID AND
    e3.answerID = A.answerID

    GROUP BY Q.questioncatid, ASets.answersetname,A.answerID,A.answername,A.answervalue

    HAVING Q.questioncatid = 1

  37. Adesh says:

    I am having one query

    SELECT DISTINCT spp.provider_profile_id, sp.provider_id, sp.business_name, spp.business_phone, spp.business_address1, spp.business_address2, spp.city, spp.region_id, spp.state_id, spp.rank_number, spp.zipcode, sp.sic1, sp.approved
    FROM service_provider sp
    INNER JOIN service_provider_profile spp ON sp.provider_id = spp.provider_id
    WHERE sp.approved = 'Y'
    AND spp.master_status = '0'
    ORDER BY sp.business_name ASC
    LIMIT 0 , 100

    In all three tables there are more than 7 lakh records. My query doesn't work at all.
    Any solution?
    I'm using PHP 5 and MySQL 4.1.

  38. Andreas says:

    Hi, I'm working professionally with PostgreSQL and MSSQL, and at home I'm using MySQL for my leisure projects. Currently I'm working on a project with about 150.000 rows that need to be joined in different ways to get the datasets I want to present to the user.

    In my profession I'm used to joining together all the data in the query (MSSQL) before presenting it to the client. Is this wise, i.e.

    in MSSQL the best performance, if you have a complex dataset, is achieved by joining 25 different tables rather than returning each one, getting the desired key and selecting from the next table using that key.

    But as I understand it, in MySQL it's best not to join too much. Is this correct?

  39. Fernando says:

    Hello Guys
    This article puzzles a bit. I have a project I have to implement with open-source software. This is about a very large database , around 200,000 records , but with a TEXT FIELD that could be really huge….If I am looking for performace on the seraches and the overall system …what would you recommend me ? (In terms of Software and hardware configuration)

    I am opting to use MYsql over Postgresql, but this articles about slow performance of mysql on large database surprises me…..

    By the way….on the other hard, Does Mysql support XML fields ?

  40. peter says:

    This article is not about MySQL being slow with large tables. We have applications with many billions of rows and terabytes of data in MySQL.

    This article is about the typical mistakes people make that get their MySQL running slow with large tables.

    MySQL can store XML in a TEXT field.

  41. andreas says:

    Hi again. Indeed, this article is about common misconfigurations that people make .. including me .. I'm used to MS SQL Server, which out of the box is extremely fast .. But for my MySQL server I'm having performance issues, so my question remains: what is the best route – joins and complex queries, or several simple queries? What I'm asking is what MySQL does best: lookups and indexes and returning data.

  42. peter says:

    Andreas,

    You can't answer this question that easily. Depending on the type of joins, they may be slow in MySQL or may work well. A lot of simple queries generally works well, but you should not abuse it.

    In general you need to spend some time experimenting with your particular tasks – basing the DBMS choice on rumors you've read somewhere is a bad idea.

  43. Andreas Thörn says:

    Hi. I guess I have to experiment a bit. Does anyone have a good newbie tutorial for configuring MySQL? My server isn't the fastest in the world, so I was hoping to enhance performance by tweaking some parameters in the conf file, but as everybody knows, tweaking without any clue of how different parameters work together isn't a good idea ..

  44. Jim says:

    Hi, I have a table I am trying to query with 300K records which is not large relatively speaking. My problem is some of my queries take up to 5 minutes and I can’t seem to put my finger on the problem. I have tried indexes and that doesn’t seem to be the problem. My query is based on keywords. I get the keyword string then look up the id. I then use the id of the keyword to lookup the id of my record. I then build a SELECT query. My SELECT statement looks something like
    SELECT * FROM table_name WHERE (year > 2001) AND (id = 345 OR id = 654 ….. OR id = 90)
    The second set of parenthesis could have 20k+ conditions. Is there another way to approach this?
    I am running MYSQL 5.0

  45. Will says:

    Take the * out of your select, and name the columns you need. That should improve it somewhat.

    Will

  46. Jim says:

    Will,
    I’m not using an * in my actual statement
    my actual statement looks more like
    SELECT id FROM table_name WHERE (year > 2001) AND (id = 345 OR id = 654 ….. OR id = 90)

    Even the count(*) takes over 5 minutes on some queries.

  47. Jim says:

    To answer my own question I seemed to find a solution.
    I used the IN clause and it sped my query up considerably.
    To use my example from above

    SELECT id FROM table_name WHERE (year > 2001) AND (id IN( 345,654,…, 90))

  48. Erick says:

    This article is BS. MySQL sucks on big databases, period. The big sites such as Slashdot and so forth have to use massive clusters and replication. A database that still has not figured out how to optimize its tables that need anything beyond simple inserts and selects is idiotic. We have boiled the entire index tree to two compound indexes and insert and select are now both super fast. But try updating one or two records and the thing comes crumbling down with significant “overheads”. InnoDB is suggested as an alternative. The select speed on InnoDB is painful and requires huge hardware and memory to be meaningful. Basically: we’ve moved to PostgreSQL, which is a real database and with version 8.x is fantastic with speed as well.

  49. Eric says:

    Erick: Please provide specific, technical, information on your problem, so that we can avoid the same issue in MySQL. For example, how large were your MySQL tables, system specs, how slow were your queries, what were the results of your explains, etc. Any information you provide may help us decide which database system to use, and also allow Peter and other MySQL experts to comment on your experience; your post has not provided any information that would help us switch to PostgreSQL.
    Eric

  50. Erick says:

    Ok, here are specifics from one system. It has exactly one table, a MyISAM table with the following activity:

    Per day:

    1. INSERTS: 1,000
    2. UPDATES: 200
    3. SELECTS: 1 million

    Since this is a predominantly SELECTed table, I went for MYISAM. My my.cnf variables were as follows on a 4GB RAM system, Red Hat Enterprise with dual SCSI RAID:

    [mysqld]
    long_query_time=5

    skip-bdb
    skip-innodb
    skip-locking

    query_cache_limit=1M
    query_cache_size=32M
    query_cache_type=1
    max_connections=1500
    interactive_timeout=25
    wait_timeout=10
    connect_timeout=5
    thread_cache_size=60
    key_buffer=750M
    join_buffer=10M

    max_heap_table_size=50M
    tmp_table_size=64M

    max_allowed_packet=16M
    table_cache=1800
    record_buffer=10M
    sort_buffer_size=24M
    read_buffer_size=9M
    max_connect_errors=10
    thread_concurrency=4
    myisam_sort_buffer_size=950M
    character-set-server=utf8
    default-collation=utf8_unicode_ci
    set-variable=max_connections=1500
    log_slow_queries=/var/log/mysql-slow.log
    sql-mode=TRADITIONAL
    concurrent_insert=2
    low_priority_updates=1

    The problem was that at about 3pm GMT the SELECTs from this table would take about 7-8 seconds each on a very simple query such as this:

    SELECT column2, column3 FROM table1 WHERE column1 = id;

    The index is on column1. So when I would “REPAIR TABLE table1 QUICK” at about 4pm, the above query would execute in 0.00 seconds. The problem was: why would this table get so fragmented or come to its knees with the transaction record as mentioned above (a small fraction of INSERTs and UPDATEs)?

    We explored a bunch of issues including questioning our hardware and our system administrators :) When we switched to PostgreSQL, there was no such issue. We do a “VACUUM” every *month* or so and we're fine.

    MySQL, I have come to realize, is as good as a file system on steroids and nothing more. InnoDB doesn’t cut it for me if the backup and all of that is so very cumbersome (mysqlhotcopy is not available, for instance) and eking performance out of an InnoDB table for raw SELECT speed will take a committee of ten PhDs in RDBMS management. Not kosher.

    The above example is based on one very simple website. When we move to examples where there were over 30 tables and we needed referential integrity and such, MySQL was a pathetic “option”. Yes, 5.x has included triggers, stored procedures, and such, but they're a joke. A simple AFTER INSERT trigger takes about 7 seconds. What gives? Integrity checks don't work – try making a check on a column NOT NULL to include NOT EMPTY (i.e., no blank space can be entered, which as you know, is different from NULL). We will have to do this check in the application.

    Just my experience. I know some big websites are using MySQL, but we had neither the budget to throw all that staff, or time, at it. PostgreSQL solved it for us.

    Sorry for mentioning this on a mysql performance blog. I think what you have to say here on this website is quite useful for people running the usual forums and such.

  51. Erick says:

    I may add that this one table had 3 million rows, and growing pretty slowly given the insert rate.

  52. lotso says:

    I’ve read SHE-DBA’s blog on using MySQL as a data-warehousing platform and where it _can_ be used as such provided the design is good and the queries are optimised.

    I work with MSSQL and multi-GB tables (up to 1 TB) with some very ridiculous normalisation and an absurd number of columns (some tables have > 900!!), and the numbers of rows are also in the couple of millions.

    I do data-mining on these data and I'm (we, actually) getting reprimanded for sending queries that take a long time to finish (30 min up to 2-3 hours). My queries are complex and involve quite a few joins (due to the normalisation) and multiple subqueries (due to the nature of the data). (This is on an 8x Intel box w/ multi-GB RAM.)

    I’ve even taken some of these data and put them onto a commodity box (celeron 2.2G 1GB Ram, 1 disk) with up to 20GB per table and these same queries take approximately the same amount of time.

    Due to the usage of subqueries, I think this may be the main cause of the slowness. Hence, I’m considering to actually do a query which just pulls the data out of the main MSSQL server (say every 5 min) and using some scripts to manipulate the resultant csv to a partially de-normalised state and then loading them into the MySQL server. I suspect that this may make things faster. (At least I don’t have to join so many tables to get the data)

    The other thing which I’m considering is to just pull these data direct into users’ PC, dumping it into a sqlite DB and do the joins there, thereby minimising load on the main server or even the mysql server. (running on 5.0, so I can’t even play with partitioning)

    Do you have any thoughts on this? Seems like we’re going in circles with these.

    Everything on Mainframe. –> Processed data to the users PC
    pull from server(select a,b,c from table) -> process data at users PC (due to increase in computational power of current PC) (select a, join b, where c…)

    eg:
    select
    a.a,
    b.x,
    etc..

    from
    (select

    ..
    ) a
    left outer join
    (select

    ) b
    on b.a = a.a
    etc..

    (in total I have like close to 11 subqueries joining a min of 4 tables at each instance)

    Ps : reading Eric/k statement, perhaps PostGres could better handle things?

    As it is, right now, I’m exploring the option of dumping the data into users’s PC into SQLite and doing the processing there so that users PC will be un-disposed and not the server.

    My current issue is how to get it out of the MSSQL server and into the SQLite DB. Anyway… not the correct forum/blog

  53. Alain says:

    Hi, I'm having a similar challenge with a table that is storing files for a web application. The table structure is as follows, and it stores uploaded files in many portions (fragments) of max. 4MB each:
    CREATE TABLE files (
    fragmentid int(10) unsigned NOT NULL default '0',
    data mediumblob NOT NULL,
    pid int(10) unsigned NOT NULL default '0'
    ) ENGINE=MyISAM DEFAULT CHARSET=latin1;

    The “fragmentid” specifies a part of the file (data), and the pid specifies the id of the related record in another table with some meta data. The “data” attribute contains the binary fragments. Building the entire file at the end consists of just putting all the fragments together in the right order. I should also add that once a file has been retrieved on request, the content is cached on the file system, so that calling that file afterwards does not require a query, unless the cached file is deleted.

    Meanwhile the table has a size of 1.5GB. The queries that were taking less than 1 second some time ago are now taking at least 20 to 30 seconds. I've read the different comments from this and other forums. It seems like it would be more efficient to split the table, i.e. in a manner that the table size remains in a good range for fast queries.

    Can anybody here advise me on how to proceed? Maybe someone who has already experienced this. Tnx.

  55. Ray says:

    I am having a problem with updating records in a table. We have approx 14,000,000 records using over 16 gigs of storage. This table stores (among other things) ID, Cat, Description, LastModified. The engine is InnoDB.

    I am having a problem when I try to “prune” old data. This query takes about 45 minutes to execute (DELETE FROM Data WHERE Cat='1021' AND LastModified < '2007-08-31 15:48:00')

    Each Cat may or may not be updated. I am trying to prune a Cat after updating the Cat to clear out any records that were not updated (hence deleted from our files). There are approx 900 different Cat values (all integers).

    We’ve tried to put a combined index on Cat and LastModified. This did not seem to speed it up any.

    I attempted to use Sphinx to pull the ID’s of all records in a Cat, then lookup the LastModified from MySQL. This did not seem to help anything. (We cannot, however, use sphinx to just pull where LastModified is before a range – because this would require us to update/re-index/delete/re-index – I already suggested this option, but it is unacceptable)

    We are at a loss here. Any help would be appreciated!

  56. Jim Burke says:

    Indexes end up becoming a liability when updating a table. The indexes need to be rewritten after each update, slowing the process down. Indexes help speed up the lookup phase of a record(s). Try minimizing the number of indexes while running a big process like this one, then reapplying the indexes afterward. How long does it take to get a SELECT COUNT(*) using the conditions used in your DELETE statement? My guess is not 45 minutes.

  57. hari says:

    This is Hari. Here is my situation: there are many machines (lathe, crushing) operated at different places with PLC boards.
    Some operators control the machines by varying the values on the PLC board. We need to collect those values from the machines via wireless communication and store them in the database server, so that at the server we can observe whether the operator is running the machines correctly or not. The problem is how to create the database for dynamic data: the operator can change his entire table of data (values) at any point in time, and with that change the database also needs to be updated (that is, the old table of values needs to be replaced with the new values whenever a change occurs at the machine). Please note that we need to replace an entire table of values with other values, not a single row. Please give me a structure to represent this database (dynamic data).

  58. Chandan says:

    Good article.

  59. Hugo Rafael says:

    Hi!
    I'm currently working on a web project using MySQL, Apache and PHP.
    The database has a relatively acceptable size, not only in number of tables, but also in some table sizes.
    The problem I have is regarding some specific tables in the database, which I use for a couple of months' duration, mining them with detailed data of a particular task. After those months pass, I'll drop all those tables and rebuild them once again for another couple of months of work.

    I'm currently at design time, and at half the specified month period, and so far the tables are about 10GB in size each (so far, 6 in total).
    SELECTing data from the tables is not a problem, and it's quite fast (<1 sec.), which is what is intended, but INSERTing into them is a nightmare.

    I use multiple-record INSERTs (150 each time), and the first INSERT usually takes 600 secs (when the task starts). After this one is INSERTed I can easily INSERT 100.000 (in groups of 150) in less than 10 minutes.

    The question I have is why this is happening, and if there is any kind of query I can perform in order to “focus” the DBMS's “attention” on the particular table (in context), since SELECTing data is always faster than INSERTing it.

    The table spec is as follows:
    CREATE TABLE IF NOT EXISTS TableName (
    A INT(10) UNSIGNED NOT NULL,
    B INT(10) UNSIGNED NOT NULL,
    C TINYINT(3) UNSIGNED NOT NULL DEFAULT '0',
    D TINYINT(3) UNSIGNED NOT NULL DEFAULT '1',
    E TINYINT(3) UNSIGNED NOT NULL DEFAULT '0',
    a TINYINT(3) UNSIGNED NOT NULL DEFAULT '0',
    b TINYINT(3) UNSIGNED NOT NULL DEFAULT '0',
    c TINYINT(3) UNSIGNED NOT NULL DEFAULT '0',
    d TINYINT(3) NOT NULL DEFAULT '0',
    e TINYINT(3) UNSIGNED NOT NULL DEFAULT '0',
    PRIMARY KEY (A,B,C,D),
    KEY (E),
    CONSTRAINT key_A FOREIGN KEY (A) REFERENCES ATable(A) ON DELETE NO ACTION ON UPDATE NO ACTION,
    CONSTRAINT key_B FOREIGN KEY (B) REFERENCES BTable(B) ON DELETE NO ACTION ON UPDATE NO ACTION
    ) ENGINE=InnoDB;

    The capital letters are INDEXes and some reference columns in other tables, and the small letters are just data columns that are returned but never used as filters (WHERE/JOIN/ORDER clauses).
    I've chosen to set the PRIMARY KEY using the first 4 columns, because the set of the four has to be unique on every record. I've used a different KEY for the last capital-letter column, because I use that column to filter in the table.
    I use a group of these tables in the system, and perform simple SELECTs on them (not joining them with other tables).
    Once again, I use multiple inserts into the table, with sets of 150 records each:
    INSERT INTO T (col1,col2) VALUES (1,1),(1,2),(1,3)…(1,150);

    Is this the best design for the table, taking into consideration what I want from it?!
    Could the INSERTs be slow due to the size of the PRIMARY KEY?! Could it be faster if I'd just assigned a different [FOREIGN] KEY for every capital-letter column, and a different AUTO_INCREMENT column as PRIMARY, or even no PRIMARY KEY at all?!

    Cheers for your time in advance.

  60. Jim says:

    In general, without looking at your ini file and having no knowledge of your system, it would be your indexes. The more indexes you have, the faster SELECT statements are, but the slower INSERTs and DELETEs. This is because the indexes have to be rewritten every time you make a change. You can tweak memory usage in your ini file, or add memory or processors to your computer. The type of DB you are using for the job can be a huge contributing factor, for example InnoDB vs MyISAM. You need to weigh the pros and cons of all these factors against your situation and then make your own educated decision.

  61. Hugo Rafael says:

    In your opinion, is it better policy to have a separate index for each column I search, or a combined index over all the columns?

    Cheers.

  62. Jim Burke says:

    It depends on your searches. MySQL will not combine indexes; it only chooses the best one it sees fit. Therefore, if you do a search on 2 columns a lot, then I would have a separate index to speed up that query. Indexes are great, and the more the better, but if you want to bulk upload records then you will suffer the consequences. Sometimes if I have many indexes and need to do bulk inserts or deletes, then I will kill the indexes, run my process and then recreate my indexes afterward. Hope this helps.

  63. Hugo Rafael says:

    I didn't say I wanted to combine indexes, I was talking about a combined index.
    Is it better to have:
    INDEX ('col1','col2','col3');
    or:
    INDEX1 ('col1'), INDEX2 ('col2'), INDEX3 ('col3')
    ?!

    The reason I’m asking is that I’ll be inserting loads of data at the same time, and the insert has to be relatively quick. On the other hand, I’ll need to select the data even quicker. The table has hundreds of millions of records.

    Cheers.

  64. Jim says:

    Sorry for the confusion, but this is what I was talking about also. If you need to search on col1, col2, col3 then create an index(col1,col2,col3). What I was saying is that if you have index(col1), index(col2), and index(col3) and you run a query, the MySQL engine will not combine them to give you index(col1,col2,col3) for a more optimized query. It will only pick index(col1) or index(col2) or index(col3), or none of the above. It doesn't take any longer to create any one of these indexes, but I'm sure you know all this. The trick I use, because I have one database that is fairly large with many indexes, is to drop the indexes, then do the bulk upload, and then recreate the indexes.
    I hope this helps.
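
    A minimal sketch of the drop-then-load-then-recreate pattern described above (table, file, and index names are hypothetical):

    ALTER TABLE facts DROP INDEX idx_col2, DROP INDEX idx_col3;
    LOAD DATA INFILE '/tmp/facts.txt' INTO TABLE facts;
    ALTER TABLE facts ADD INDEX idx_col2 (col2), ADD INDEX idx_col3 (col3);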

  65. Hugo Rafael says:

    It does help, cheers.
    Dropping the indexes is out of the question, since dropping and creating them takes far too much time; it is quicker to just let them be.
    I think the answer to this is to just drop the PK and FKs, and create a normal index on the two main searchable columns. Although this index seems to be a bit slower, I think it might be quicker on large inserts into the table.
    I suppose that I'll have to break the table up as well, in order to have all the data in smaller tables with smaller indexes.

    I was having indexes almost the size of the complete table (+/- 5GB), which made the whole table around 10GB. I have been playing with different indexes, and at this time I have managed to get the indexes' size down to 1.5GB, which is much more acceptable. Although the selects now take 25% more time to perform, it's still around 1 second, so it seems quite acceptable to me, since there are more than 100 million records in the table – especially if it means that the inserts are faster.

    Cheers for your help.

  66. xeeshaan says:

    Best practice for dealing with large DBs is to use a partitioning scheme on your DB after doing a thorough analysis of your queries and your application requirements. Since 5.1 supports data partitioning, I am using this scheme over a huge DB of call detail records, which is growing by about 20M records (approximately 2.5GB in size) per day, and I have found it an appropriate solution to my large-DB issues.

    As for joins, it is always best practice not to use joins over large tables. Instead, use alternate sub-queries to joins where possible, and with the use of partitioning make subsets of your data, bring them into heap tables rather than storing them on disk, and perform your operations by making sub-tasks of the task.

    Remember, when an anaconda eats a deer it always takes time to get it right in its stomach. So give your anaconda small pieces of meat rather than the full deer all at once.
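
    For reference, a range-partitioning layout of the kind described above might look like this in MySQL 5.1 (table and column names are assumptions, not the actual schema):

    CREATE TABLE call_details (
        call_id   BIGINT UNSIGNED NOT NULL,
        call_date DATE NOT NULL,
        duration  INT UNSIGNED NOT NULL
    ) ENGINE=MyISAM
    PARTITION BY RANGE (TO_DAYS(call_date)) (
        PARTITION p200806 VALUES LESS THAN (TO_DAYS('2008-07-01')),
        PARTITION p200807 VALUES LESS THAN (TO_DAYS('2008-08-01')),
        PARTITION pmax    VALUES LESS THAN MAXVALUE
    );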

  67. Hugo Rafael says:

    That seems to be the solution. I had already found that solution on the MySQL web site, although 5.1 is still not stable according to them, and 5.0 doesn't support partitioning.
    I'll have to do it like that, and even partition over more than one disk in order to distribute disk usage.

    Cheers.

  68. YC Wee says:

    Hi guys,
    I have recently written an article, “Full Text Partitioning – The Ultimate Scalability”, and my article can be found at the url below:

    http://www.addedworth.com/index.php/2008/06/03/full-text-partitioning-the-ultimate-scal

    This article describes the steps to take when a database spills over to more than a single server. Do come by my site and let me know your opinion.

    Thanks,
    YC Wee

  69. Vladimir says:

    Normally MySQL is rather fast at loading data into a MyISAM table, but there is an exception, which is when it can’t rebuild indexes by sort and builds them
    row by row instead. That can happen due to wrong configuration (i.e. too small myisam_max_sort_file_size or myisam_max_extra_sort_file_size), or
    it could be just a lack of optimization if you have large (not fitting in memory) PRIMARY or UNIQUE indexes.
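
    If you suspect this is what you’re hitting, the sort-related settings are easy to check and raise at runtime; the values below are purely illustrative:

    -- If the temporary file needed to sort an index would exceed this limit,
    -- MyISAM falls back to building the index through the key cache, row by row.
    SHOW VARIABLES LIKE 'myisam_max_sort_file_size';
    SET GLOBAL myisam_max_sort_file_size = 10 * 1024 * 1024 * 1024;  -- 10GB, illustrative
    SET GLOBAL myisam_sort_buffer_size   = 256 * 1024 * 1024;        -- 256MB, illustrative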

  70. trent says:

    So – if you have a table with millions of rows with lots of updates and reads happening, InnoDB would be the way to go from what I read. But what if you want to use MySQL full-text search, which can only be used on MyISAM?

    Would duplicating data on inserts and updates be an option? That would mean having two copies of the same table: one InnoDB table for the main reading purposes and one MyISAM table for searching with full-text search, and every time you do an update you actually update both tables, etc. (Forget about the duplicated data, as storage is cheap enough – I want speed.)

    Or would using only 1 table, MyISAM, be faster, by not having to duplicate the UPDATE, INSERT and DELETE calls etc. every time data is modified?

    Or is there another way to do this, like running a cron every hour to keep the MyISAM table in sync or something, meaning search would be out of date by one hour – which I can live with in my situation.

  71. arpit says:

    Hello, please suggest a solution to my problem.
    I retrieve records from 4 tables which are quite large in size using joins, but it takes a lot of time to execute. How can I speed up the query?

  72. sqi says:

    ok

  73. Vladimir says:

    >>Use multiple servers to host portions of data set

    Where can I find out more about this comment? I’d like to partition large tables on several servers. Does it work on 4.1 (we use both 4.1.2 and 5.1)?

  74. BraBo says:
  75. Michael says:

    I have the following scenario: I have many (several 100K) lists that contain 50 items each. From these lists I want to create a table that counts any “co-occurring” items (items are co-occurring if they occur in the same list). So, I want to count how many lists contain both item1 and item2, item1 and item3, etc. The universe of items is huge (several millions). Thus, it is definitely not an option to save all possible pairs of items (as most of them would have count 0). My strategy is to go through the lists one by one, and, for each list, insert the 50 * 49 / 2 pairs into a table with 3 columns (id1, id2, cnt) (where the pair id1, id2 forms the primary key) using INSERT … ON DUPLICATE KEY UPDATE cnt = cnt + 1.

    Obviously, the resulting table becomes large (example: approx. 200M rows for 300K lists), and I noticed that inserts become really slow. I guess this is due to index maintenance. Therefore, I wonder how I can best restructure the DB to maintain acceptable insert performance. Which are the most relevant parameters I should look into (to keep as much as possible in memory, improve index maintenance performance, etc.)? Could it, for example, help to increase “key_buffer_size”? Also, which storage engine (MyISAM? InnoDB?) is supposed to work faster in this scenario (and which parameters are relevant for the different storage engines)? Might it be a good idea to split the table into several smaller tables of equal structure and select the table to insert into by calculating a hash value on (id1, id2)? Is there maybe built-in functionality to do such splitting? Or does partitioning only help if I have several servers (or hard drives) available? Could I maybe disable some concurrency-related checks (as I have exactly one process inserting into the table, which I could even stop when I want to read data)?

    Any hints are greatly appreciated!
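
    For readers following along, the insert pattern described above looks roughly like this (table and value names are hypothetical):

    CREATE TABLE pair_counts (
      id1 INT UNSIGNED NOT NULL,
      id2 INT UNSIGNED NOT NULL,
      cnt INT UNSIGNED NOT NULL DEFAULT 1,
      PRIMARY KEY (id1, id2)          -- store pairs with id1 < id2
    ) ENGINE=MyISAM;

    -- One of the 50 * 49 / 2 pairs produced from a single list:
    INSERT INTO pair_counts (id1, id2, cnt)
    VALUES (1234, 5678, 1)
    ON DUPLICATE KEY UPDATE cnt = cnt + 1;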

  76. Jim Burke says:

    Why do you have so many short lists? Why don’t you have 1 list with a field for list type? This could make your reporting much quicker, but it’s hard to say when I don’t have all the details of the project. I would normally say you could somehow use joins/subqueries, but with 100K lists I don’t think that would be a suitable or even possible solution.

  77. Michael says:

    Thanks for the prompt answer! The lists are actually constantly coming in, kind of in a stream. I just buffer them for processing, but in the end I am not interested in the lists, but only in the mentioned statistics (i.e. the co-occurrence counts of item-pairs). Obviously I would like to be able to process as many lists as possible and thus I would like to have the insertion process as fast as possible. I guess storing, say, 100K lists and then applying the appropriate join and updating the “statistics-table” (i.e. the one with the co-occurrence counts) is indeed not really an option, as this would probably result in a temporary table with more than 100M rows that then has to be inserted into the “statistics-table”.

    I would expect an O(log N) increase in insertion time (due to the growing index), but the time rather seems to increase linearly (O(N)). I thus guess my database is not well configured for this kind of insert at the moment (i.e. it does not keep the right things (primary key, …) in memory, etc.). Any ideas what the reasons could be? Could maybe somebody point me to the most relevant parameters to consider in my case (which parameters, for example, define the amount of memory reserved to handle the index, etc.)? Also, are there any suggestions about which storage engine to use? Thanks a lot!

  78. John-Marc says:

    >I would have many to many mapping from users to tables so you can decide how many users you put per table later and I would also use composite primary keys if you’re using Innodb tables so data is clustered by user.

    How do you feel InnoDB composite keys work as opposed to separate tables? For example, with a table of 30 million records and a composite key starting with userID, could one expect better performance by splitting into 30 tables of 1 million records each, or are composite keys that efficient?

  79. peter says:

    John,

    When the number is just 30, going with separate tables may be easier and good for performance.

    However, if you have 1,000,000 users, some with only a few records, you need to have many users in each table.

  80. John Marc says:

    Peter,
    Thanks. Yes, my omission. Of course, I am not trying to get one user per table.

    So I guess a better question is,
    30 million records, 1 million users with 30 records each
    You mentioned (and I have experienced) that a composite key starting with userid in Innodb is indeed very efficient and fast.

    So I am wondering, in your experience, is InnoDB so efficient that splitting into 30 tables with 1 million records and 300,000 users per table would even be noticeable?
    Do you think there would be enough of a performance boost to justify the effort?

  81. peter says:

    John,

    It depends on the operations a lot. Many tables typically makes it easier to move data across many servers (sharding) as well as to do maintenance operations – ALTER TABLE, etc.
    If you designed everything right, 1 table should not be slower than 30 smaller tables for normal OLTP operations.

  82. John Marc says:

    OK, good to know. We contracted you to help us a couple of years back, and it has been VERY stable and quick. I am now looking to optimize further, and it seems I am hearing I would probably do better to look closer at my schema and possibly at ‘sharding’.

  83. Paola says:

    I need to optimize the MySQL server to manage a big table (now about 78MB, growing by about 1MB/day).


    I have used a MyISAM table; info from phpMyAdmin:
    Collation: utf8_general_ci
    Row length: 122
    Avg row size: 194 bytes
    Rows: 411,069

    actual main phpmyadmin usage stat:

    QueryType # dim/hour %
    change db 2,103k 3,262,61 61,29%
    select 933k 1,447,66 27,20%
    insert 358k 555,08 10,43%
    update 30k 47,19 0,89%
    set option 1,189 1,84 0,03%

    I have a gallery with select query with ORDER BY, LIMIT and with paging

    Handler_read_rnd 12 M
    Handler_read_rnd_next 4,483 M
    Created_tmp_disk_tables 5,270
    Created_tmp_tables 5,274
    Created_tmp_files 37 k
    Key_reads 4,226
    Key_write_requests 380 k
    Key_writes 367 k
    Sort_merge_passes 18 k
    Sort_rows 12 M


    Actual my.cnf:
    [mysqld]
    datadir = /var/lib/mysql
    socket = /var/lib/mysql/mysql.sock
    user=mysql
    # Default to using old password format for compatibility with mysql 3.x
    # clients (those using the mysqlclient10 compatibility package).
    old_passwords=1
    big-tables

    [mysqld_safe]
    log-error=/var/log/mysqld.log
    pid-file=/var/run/mysqld/mysqld.pid


    Questions:

    – I have Webmin installed, but when I change MySQL variables and restart the server, my settings are not applied – the MySQL defaults are applied, and my.cnf is not modified. Why? Where are these settings written?

    – should I switch the table from MyISAM to InnoDB (and if yes, how do I configure InnoDB for best performance)?

    – what parameters do I need to set manually in my.cnf for best performance and low disk usage?

    – may I remove old_passwords=1 and big-tables?

    Web Server with Apache 2.2.
    MySQL ver. 5.0.45 via socket on x86_64 CentOS 5.2, 1 CPU (4 cores) with 4GB RAM, 3TB SATA disk space
    Load avg: 0.81 (1 min) 0.68 (5 mins) 0.73 (15 mins)
    Real memory 3.86 GB total, 1.38 GB used
    Virtual memory 4 GB total, 288 kB used

    thanks,
    Paola

  84. Mertkan says:

    Hi,

    I had a problem with COUNT(*) when I ran it from my application. Whenever I ran "SELECT COUNT(*) FROM MYTABLE WHERE status=1" it took only milliseconds from a MySQL interface (on 120.000 records). The table has the necessary indexes…

    But whenever I ran this query in my application over an ADODB connection, it took 8-10 seconds. I tried using "SELECT COUNT(id) FROM z USE INDEX(id) WHERE status=1" but no luck, it was taking too much time again. Then I changed my query to "SELECT * FROM (SELECT COUNT(*) FROM MYTABLE WHERE status=1) AS derived" and it took milliseconds again.

    Strange behavior…

  85. Mertkan says:

    Correction: "I tried using 'SELECT COUNT(id) FROM MYTABLE USE INDEX(id) WHERE status=1' but no luck, it was taking too much time again."

    Btw, when I run "SELECT * FROM MYTABLE WHERE status=1" from my application, I get the result in milliseconds… Only COUNT(*) under the same conditions (same WHERE clause) was taking too much time…

  86. Erick says:

    Mertkan, if you insist on using MySQL, be prepared to see these whimsical nuances. When I finally got tired of it, I moved to PG and have never looked back. It’s even faster than MySQL now so that argument is out the window. Only excuse to use MySQL these days is that it’s easy to set up.

    Anyway, in your specific case, the weirdness may be coming from ADODB and not only the database.

  87. Erick says:

    A count will always take more time. In real DBs like PG, you can set an index on the count too, as well as several “partial indexes”.
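
    For anyone unfamiliar with the term, a “partial index” in PostgreSQL covers only the rows matching a predicate; a hypothetical example (PostgreSQL syntax, table and column names invented – MySQL does not support this):

    -- Index only the rows the hot query actually touches.
    CREATE INDEX idx_mytable_status1 ON mytable (id) WHERE status = 1;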

  88. Morten Simonsen says:

    An extreme case!! Let me show you something:

    # Query_time: 1 Lock_time: 0 Rows_sent: 1 Rows_examined: 150000
    SELECT COUNT(DISTINCT(u.unit_id))
    FROM unit u
    RIGHT JOIN
    (SELECT up1.unit_id
    FROM unit_param up1
    WHERE up1.unit_type_param_id = 24
    AND up1.value = 'ServiceA'
    ) nmp0
    ON u.unit_id = nmp0.unit_id;

    This query is fine. No problem. Now, I have to make a selection from the unit_param table which, for which I need to do another JOIN in addition to the previous query. The first section of the query is exactly the same as the previous query. Beware:

    # Query_time: 1138 Lock_time: 0 Rows_sent: 0 Rows_examined: 2271979789
    SELECT COUNT(DISTINCT(u.unit_id))
    FROM unit u
    RIGHT JOIN
    (SELECT up1.unit_id
    FROM unit_param up1
    WHERE up1.unit_type_param_id = 24
    AND up1.value = 'ServiceA'
    ) nmp0
    ON u.unit_id = nmp0.unit_id
    RIGHT JOIN
    (SELECT up1.unit_id
    FROM unit_param up1
    WHERE up1.unit_type_param_id = 23
    AND up1.value = 'Bigland'
    ) nmp1
    ON u.unit_id = nmp1.unit_id;

    This query never responded; I had to cancel it (but not before it had run for 20 min at 100% CPU!). However, the number of rows with “Bigland” was just 75K compared to the rows with “ServiceA”, which was 50K. The initial table (unit) was 100K rows. Each row record is approx. 500B. So adding ONE extra JOIN, with an additional 75K rows to join, the query went from OK to a DISASTER!

    The number of rows involved went from 150K to 2.2G!!! How on earth could this happen? Has the JOIN thing gone completely crazy? What can I do about this? I have tried adjusting the join_buffer_size (128M and 16M) but to no avail.

    I must admit I am getting desperate now. Maybe I can rewrite the SQL, since it seems like MySQL handles ONE JOIN, but no way it handles TWO JOINS.

    Sincerly grateful for any input here!

    Morten Simonsen

  89. peter says:

    Morten,

    The problem is you’re joining “derived tables”, which causes MySQL to create temporary tables without indexes, which in turn causes very slow joins. It’s a known feature limitation, though an annoying one.

  90. Jari says:

    I worked on a project containing 5 tables and a realtime search (AJAX). First I split the searchable data into two tables and did a LEFT JOIN to get the results. I noticed that when there were a few million records in both tables, things got extremely sluggish. Then I merged the two tables and tested with 35 million records with no performance problems.

    “Avoid LEFT JOIN if possible” is the best tip you can get when handling large tables. You might even consider duplicating your data into two (or more) tables; for example, the table meant for searching data could duplicate columns from the table meant for viewing the selected data.

  91. Morten Simonsen says:

    In the end I changed the SQL… as suggested. For those who are interested, it came out like this instead:

    SELECT COUNT(DISTINCT(u.unit_id)) FROM (
    SELECT u1.unit_id
    FROM unit u1, unit_param up1
    WHERE u1.unit_id = up1.unit_id AND up1.unit_type_param_id = 24 AND up1.value = 'ServiceA') u2, unit_param up2
    WHERE u2.unit_id = up2.unit_id AND up2.unit_type_param_id = 23 AND up2.value = 'Bigland'

    …a nested select. This query works “fine”… some seconds to perform. Nothing to be impressed by.

    All in all, pretty annoying, since the previous query worked fine in Oracle (and I am making a cross-database app). Furthermore: if I can’t trust JOINs… doesn’t that go against the whole point of relational databases, 4th normal form and all that? Any hope that this issue will be fixed any time soon?

  92. tulip says:

    hi All

    I am using MySQL.
    I had a problem joining tables where each table has more than 2 lakh (200,000) records, so when I run the query from my application it takes too much time to execute.
    I have tables like education_details, professional_details, contact_details, etc., where each table has different fields and only uid and resume_id are common to all tables, with a unique row per id.

    When I retrieve data from one table it works properly, but
    when I join multiple tables (inner join, left join, etc.)
    it does not perform well and takes more than 10 minutes to return rows.

    So please, can anybody help me fetch rows from multiple tables having more than two lakh records?

    Thanks in advance

  93. anon says:

    Question 1
    I’m just wondering what you mean by “keeping data in memory”?

    “the good solution is to make sure your data fits in memory as good as possible”

    Do you mean ensuring SELECTs return less data than the system’s RAM?

    Question 2
    Big joins are bad. But, do you have any suggestions on how to circumvent them?
    I’m thinking of doing a number of queries to SELECT subsets of data into smaller TEMPORARY TABLES then doing a JOIN on them.
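
    A rough sketch of that approach, with invented table and column names, might look like this:

    -- Pull the subsets into small in-memory temporary tables first...
    CREATE TEMPORARY TABLE tmp_orders ENGINE=MEMORY AS
      SELECT id, customer_id, total
      FROM orders
      WHERE order_date >= '2009-01-01';

    CREATE TEMPORARY TABLE tmp_customers ENGINE=MEMORY AS
      SELECT id, name
      FROM customers
      WHERE region = 'EU';

    -- ...then join the much smaller temporary tables.
    SELECT c.name, SUM(o.total) AS order_total
    FROM tmp_orders o
    JOIN tmp_customers c ON c.id = o.customer_id
    GROUP BY c.name;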

  94. jadfreak says:

    hello peter,

    i wanted to know your insight about my problem.

    QUERY USED:
    SELECT DISTINCT MachineName FROM LogDetails WHERE NOT MachineName IS NULL AND MachineName != '' ORDER BY MachineName

    SETUP A:
    We have a web application that uses an MS SQL database. When invoking a SELECT statement on the LogDetails table (having approx. 4 million rows), the execution time is more or less 30 seconds.

    SPECS of SETUP A:
    OS: Windows XP Prof
    Memory: 512MB

    SETUP B:
    It was decided to use MySQL instead of MS SQL. It took approx. 2.5-3 minutes to invoke the same query used in SETUP A.

    SPECS of SETUP B:
    OS: Red Hat Linux 4
    Memory: 512MB

    QUESTION:
    1) Why does MS SQL perform faster when both setups have the same specs, though with different OSes?
    2) I know the memory can affect performance, but why has it not affected MS SQL as much as MySQL?
    3) Any suggestions on how to improve SETUP B?

  95. rich says:

    Peter,

    I am relatively new to working with databases and have a table that is about 7 million rows. It seems to be taking forever (2+ hrs) if I need to make any changes to the table (i.e. adding columns, changing column names, etc.) Is there an easy way to make these operations go faster? Thanks!

  96. Harutyun says:

    Hello Peter

    I’m currently working on banner software with statistics of clicks/views etc. I’m testing with a table with ~10,000,000 randomly generated rows. In the first table I store all events with all information IDs (browser id, platform id, country/ip interval id etc.) along with the time when the event happened. The most common query in such cases is to get the top N results for browsers/platforms/countries etc. in any time period.

    The main event table definition is
    CREATE TABLE IF NOT EXISTS stats (
    id int(11) unsigned NOT NULL AUTO_INCREMENT,
    banner_id int(11) unsigned NOT NULL,
    location_id tinyint(3) unsigned NOT NULL,
    url_id int(11) unsigned NOT NULL,
    page_id int(11) unsigned NOT NULL,
    dateline int(11) unsigned NOT NULL,
    ip_interval int(11) unsigned NOT NULL,
    browser_id tinyint(3) unsigned NOT NULL,
    platform_id tinyint(3) unsigned NOT NULL,
    PRIMARY KEY (id),
    KEY bannerid (banner_id),
    KEY dateline (dateline),
    KEY ip_interval (ip_interval)
    ) ENGINE=MyISAM DEFAULT CHARSET=latin1 PACK_KEYS=1 ROW_FORMAT=FIXED AUTO_INCREMENT=10100001 ;

    The country codes stored in different table named iplist
    CREATE TABLE IF NOT EXISTS iplist (
    id int(11) unsigned NOT NULL AUTO_INCREMENT,
    code varchar(2) NOT NULL,
    code_3 varchar(3) NOT NULL,
    name varchar(255) NOT NULL,
    start int(11) unsigned NOT NULL,
    end int(11) unsigned NOT NULL,
    PRIMARY KEY (id),
    KEY code (code)
    ) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=91748 ;

    So the query to get top 10 countries will be

    SELECT iplist.code, COUNT(stat.ip_interval) AS count
    FROM stats AS stat
    LEFT JOIN iplist AS iplist ON (iplist.id=stat.ip_interval)
    WHERE stat.dateline>=1243382400 AND dateline<1243466944
    GROUP BY code
    ORDER BY count DESC
    LIMIT 0, 10

    This query takes ~13 seconds to run (2GHZ Dual Core CPU, 2GB RAM).

    EXPLAIN this query shows that it uses JOIN perfectly.

    id select_type table type possible_keys key key_len ref rows Extra
    1 SIMPLE stat range dateline dateline 4 NULL 277483 Using where; Using temporary; Using filesort
    1 SIMPLE iplist eq_ref PRIMARY PRIMARY 4 vb38.stat.ip_interval 1

    So do you have any idea how this query can be optimized further, or is that a normal time for such a query?

  97. adnan says:

    I’m running my own custom ad server and saving all the impressions and clicks data in one single table. When the entries go beyond 1 million, the whole system gets too slow. The site and the ads load very slowly.

    Any suggestion?

    Thanks
    Adnan

  98. Roy says:

    Hypothetical:

    I have 500,000 user records.
    Each user is going to have a score based on values from another table. (the average of 30 scores for each user)

    How should I create table structure so I can do rankings for each user based on their
    average score?

    So rank 1 through to rank 500,000. I want to be able to view someone’s rank quickly via SQL at any time.
    Is this possible?

  99. Robin says:

    We encountered the performance problem when we join two large tables. However, the problem is quite tricky:

    We only select a range of records in one large table to join to another large table. The join fields are indexed and the selection of the records for the join uses the primary key.
    The time for retrieving records between 1-20000, 20000-40000, … is quite stable (about 5 seconds for each range).

    However, when it came to about 560000-580000 and above, the time became significantly longer (more than 50 seconds). I don’t know why this happens and whether anyone else has also had this problem.

  100. Marc says:

    Is there a point at which adding CSV values to an IN(val1, val2,…) clause starts to make an index lose its efficiency? My situation:
    Large MyISAM table: 30 million recs, data: 1.2 GB
    Data columns are like so: (INT val #1, INT val #2, INT val #3, VARCHAR val #4)

    WHERE clause searches are being done in this manner
    Val #1: always = one single integer value
    Val #2: IN(1 to 4 CSV integer values)
    Val #3: IN(unknown number of CSV integer values, probably max of 50)
    Val #4: LIKE ‘string%’

    I actually have not added the column/data for Val #3 yet. Just doing searches as above on (Val #1, #2, #4) is very fast. I’m worried that when I add Val #3, things will get quite slow. So I’m wondering, is there a certain number of CSV values that will make the IN() search actually slow down? I’m not worried if I only have a few in there. But if I need to have “Val #3 IN (1,2,3,4,5,6…50)” will that make index access super slow?

    Thanks.

  101. Marc says:

    Sorry, I should say that the current BTREE index is in the same order as the columns (Val #1, Val #2, Val #3, Val #4).

  102. David Stone says:

    Hello all,

    I have a very large table, 3 billion rows, 300GB in size. It’s of ip traffic logs. Select queries were slow until I added an index onto the timestamp field. It took about 6 hours to create the index, as I tweaked some buffer sizes to help it along. Previous attempts at creating the index took days. The index file is about 28GB in size now. The server has 4GB of RAM, dual Core 2 2.6GHz processors. All it does is process these logs, and handle the occasional query we need to do when we run a report for someone, maybe once a week.

    Adding the index really helped our reporting, BUT now the inserts are taking forever. Each file we process is about 750MB in size, and we insert 7 of them nightly. It used to take about 9 hours to insert all the files, and now it takes upwards of 15 hours, which is becoming a problem.

    My original insert script used a mysqli prepared statement to insert each row as we iterate through the file, using the getcsv() function. I did some reading and found some instances where mysqli can be slow, so yesterday I modified the script to use regular mysql functions, using an insert statement with multiple VALUES to insert 50 records at a time. There is no appreciable performance gain.

    I know I will most likely have to break this very large table down into tables by week. My question is what my plan of attack should be to get the best insert performance?

    MERGE tables?
    PARTITION tables?
    Simply break up my big table into smaller ones?

    I think the root of my issue is that the indexes don’t fit into RAM.
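
    For reference, the MyISAM MERGE option mentioned above looks roughly like this; the weekly table names and columns are invented for illustration:

    -- Identical MyISAM tables, one per week, plus a MERGE table over them.
    CREATE TABLE logs_2009w01 (
      ts     INT UNSIGNED NOT NULL,
      src_ip INT UNSIGNED NOT NULL,
      dst_ip INT UNSIGNED NOT NULL,
      bytes  INT UNSIGNED NOT NULL,
      KEY (ts)
    ) ENGINE=MyISAM;
    CREATE TABLE logs_2009w02 LIKE logs_2009w01;

    CREATE TABLE logs_all (
      ts     INT UNSIGNED NOT NULL,
      src_ip INT UNSIGNED NOT NULL,
      dst_ip INT UNSIGNED NOT NULL,
      bytes  INT UNSIGNED NOT NULL,
      KEY (ts)
    ) ENGINE=MERGE UNION=(logs_2009w01, logs_2009w02) INSERT_METHOD=LAST;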

  103. David Stone says:

    I forgot to add that while the server is inserting the logs, I see very LOW disk throughput — 1.5Mb/s! And very low CPU usage as well! I was hoping to see the machine either disk- or CPU-bound to help troubleshoot the problem, but this is not the case.

    Thanks all!

  104. zia says:

    HI,

    What are the causes of a lost MySQL index? Can MySQL lose an index during high traffic load?

    Best Regards
    Zia Ullah Butt

  105. Experiment on nested queries and join queries: http://vpslife.blogspot.com/2009/03/mysql-nested-query-tweak.html

  106. Mike says:

    Great article and interesting viewpoints from everyone.
    Guess no one here would freak out if I said I have a perfectly operational MySQL 5.1.44 database with 21.4 GB of data in a single table and no problems whatsoever running quite hefty queries on it.

    I simply put the search index in a separate table of some 80 MB, and the binary/blobs in another.
    No bloating :) works like a charm. The only bottleneck is gathering the data by key, but it’s only an INT to go by and no searching is required.

    When you really work up an amount of data to store, this could possibly help someone else too.

  107. Sypqseuth says:

    To the author,
    Thank you very much for the post. It is what I am concerned with now: I use MySQL to process just around a million rows with a condition to join on two columns, and I have spent a whole day with no result yet. I have two tables, A and B; each table has 20 columns. I have to find records having the same value in column5, and where another value in column10 of table A equals column6 of table B. I use an inner join. Following this post I tried to divide the work into smaller tables, running one SQL statement at a time, but it is still not faster. Could you please advise me? I need this urgently.

    Best regards

    Seuth

  108. Frank Qaxy says:

    “One of the reasons elevating this problem in MySQL is lack of advanced join methods at this point (the work is on a way) – MySQL can’t do hash join or sort merge join – it only can do nested loops method which requires a lot of index lookups which may be random.”

    Are the advanced join methods available now?

  109. Eddie says:

    Hello,

    I need to delete all 300,000 records found in one table from a table that has 20,000,000 records, and neither the subquery I wrote nor the join I wrote gives me any result at all in over 12 hours. This is being done locally on a laptop with 2 GB of RAM and a dual-core 1.86 GHz CPU – while nothing else is happening. The large table has 2 indexes on it and totals 3 GB – more than the RAM in the machine – and this is done on Ubuntu 10. Is this normal – for a delete involving 2 tables to take so long? Is there something special about a delete that makes it MUCH MUCH slower than a select? And will it improve performance to remove the indexes so the larger table’s size is less than 2 GB? (There are only 3 fields of one character each in the table, plus one field of 40 characters, which is indexed and is the field being used in the SQL statement relating both tables.) MySQL works so fast on some things that I wonder if I have a real problem in my Ubuntu OS or MySQL 5.1 or PHP 5.1.x? Can a real expert please comment on whether these are realistic times or not, and offer advice on how to improve performance without adding RAM – which I will be able to do when I can afford new hardware. Thanks very much.

  110. Erick says:

    Eddie, this depends on

    1. Your my.cnf settings
    http://www.notesbit.com/index.php/web-mysql/mysql/mysql-tuning-optimizing-my-cnf-file/

    2. The type of table it is — is it MyISAM or InnoDB? For InnoDB, you may also tune the InnoDB-specific settings.

  111. nilesh says:

    I think MySQL does support hash joins and merge joins (merge on primary, hash otherwise). Been a while since I’ve worked on MySQL. Please correct me if I am wrong. And update the current status on the blog itself. Would be a great help to readers.
    Thanks

  112. Alex says:

    Hi

    We have a small Data Warehouse with a 50 million fact table (around 12GB). It is partitioned by month. (we are using InnoDB 1.0.6)

    Some of our queries need to access the entire table (full table scan); these queries are deadly slow. Our I/O system offers around 60MB/sec, but before this limit is reached the I/O system is flooded by a very high number of IOPS (we have observed around 1200 IOPS).

    It seems to me that the limitation is how MySQL (or InnoDB) reads data: it is not capable of doing scattered reads, every disk read is only 16K, and reading a large table this way basically screws up the I/O system.

    The only solution we found is to increase memory and try to cache the table, but that doesn’t seem to me a REAL solution in the long term.

    Anything we can do with MYSQL and InnoDB configuration?

    Thanks

  113. Rag says:

    I have around 900,000 user records. I have problems with login, which is very slow (terribly slow). All I have is PHP/MySQL on a VPS with 768MB RAM. The total size of the MySQL data is just around 650 MB.

    What I am using for the login check is this simple query: "SELECT useremail, password FROM USERS WHERE useremail='" . $_REQUEST['USER_EMAIL'] . "' AND password='" . $_REQUEST['USER_PASSWORD'] . "'";

    Sometimes it takes a few minutes, or else it times out.

    My Max script execution time in PHP is set to 30 Secs.

    Any idea to improve?

    Thanks in Advance

  114. John M Ventimiglia says:

    While your question is better suited elsewhere – may I put my ESP cap on and suggest you add indexes? There are many very good articles on optimizing queries on this site. You may want to read them.

  115. Mohammed says:

    Peter,

    I have a MySQL database performance issue and I have posted it on the MySQL Performance forum at the link below.

    Please provide your view on this – it’s very urgent and critical.

    http://forum.percona.com/index.php/t/1639/

    Thanks

    Mohammed.

  116. peter says:

    Mohammed,

    If you require urgent assistance for a project of critical importance, the forum is not the right way to seek help, as it is only looked at in spare time.
    In such cases commercial services work much better: http://www.percona.com/contact/sales/

  117. Hi ,

    I am using this MySQL query:
    select contentid, SheetName, languageName from contentapplicationview where sheetName in (select tblapplicationauthorized.date from tblapplicationauthorized where applicationid='" + applicationId
    + "' and authscrname='" + screenName + "') and fkApplicationId='" + applicationId + "' order by SheetName, languageName

    I have 40,000 rows in the database, and whenever I fire this query, MySQL takes too much time to return the data.

    What should I do about this?

    thanks,

    subrahmanyam k

  118. Andrew says:

    In response to Rag

    What I am using for the login check is this simple query: "SELECT useremail, password FROM USERS WHERE useremail='" . $_REQUEST['USER_EMAIL'] . "' AND password='" . $_REQUEST['USER_PASSWORD'] . "'";

    Probably down to the way your MySQL table is set up. I’d be more concerned about your login, though; I hope a bit further up the script you have a

    $_REQUEST['USER_PASSWORD'] = mysql_real_escape_string($_REQUEST['USER_PASSWORD']);

    otherwise some little script kiddy is going to cause you an even bigger problem in the future.

  119. L says:

    Great article, gave me some good pointers.

    Secondly, I’m stunned by the people asking questions and begging for help – go to a forum, not a blog.

    Thanks again

  120. Shashidhar says:

    Dear author & readers

    Whatever post/blog I read related to MySQL performance, I definitely hear “fit your data in memory”, but nobody ever tries to explain it a bit. What exactly does “fitting the data in memory” mean?

    Could you at least explain it briefly?

    Cheers in advance,

  121. I think indexes can fix the problem.

  122. joe says:

    You need a lot of work on your technical writing skills. Most of your sentences don’t pass as “sentences”.

    “So you understand how much having data in memory changed things here is small example with numbers.” -OMG

    How you obtained a masters degree is beyond me.

  123. Greg says:

    “fit your data into memory” in a database context means “have a big enough buffer pool to have the table/db fit completely in RAM”. In InnoDB, have innodb_buffer_pool_size > the size of your database (or at least your table). In MyISAM, it’s a bit trickier, but make sure key_buffer_size is big enough to hold your indexes and there’s enough free RAM to load the base table storage into the file-system cache.

    Obviously, this gets expensive with huge databases, but you still want to have a good percentage of the db in RAM for good performance.
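
    A quick way to sanity-check this on a running server – illustrative statements, not tuning advice:

    -- How big is the data + index, per schema?
    SELECT table_schema,
           ROUND(SUM(data_length + index_length) / 1024 / 1024) AS total_mb
    FROM information_schema.tables
    GROUP BY table_schema;

    -- What do the relevant buffers currently allow?
    SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
    SHOW VARIABLES LIKE 'key_buffer_size';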

  124. ankit says:

    I have a 10GB MyISAM table. My website gets about 1 million hits daily. Under such a heavy load the SELECTs and INSERTs get slow. I have the following solutions in mind:
    1. Move to the InnoDB engine (but I fear my selects would get slower, as the percentage of selects is much higher in my application)
    2. Upgrade to 5.0+ (currently I am on 4.0)
    3. Partition my tables (i.e. break my table data into multiple smaller tables, but this would make things very difficult for me to handle)

    Can anybody help me figure out a solution to my problem?

  125. @ankit: replication? You could use your master for write queries like UPDATE or INSERT and the slave for SELECTs. That should increase the speed dramatically.

  126. Toko Bunga says:

    Processing in memory is so much faster and you have whole bunch of problems solved just doing so. Use multiple servers to host portions of data set. Store portion of data you’re going to work with in temporary table etc.

  127. Toko Bunga says:

    Is the reason normally table design and understanding the inner workings of MySQL?

  128. Michael says:

    So joins are good if selecting only a few items, but probably not good if selecting nearly an entire table (A) to then be joined with nearly all of another table (B) in a one-to-many join?

    I’m considering doing this with a 1 min cron. But would this (nearly-full table joins within a script that runs frequently) be a case where it’d be better to store a small amount of data from table B (the “many”; five or six items; thresholds for notifications actually) as *serialized* data within table A?

    Are huge joins or serialization the greater evil?

    (I’ve been struggling with this one for about a week now.)

  129. francis says:

    I also have problems with my queries. I tried to optimize MySQL using EXPLAIN and I got a 1-row result per table except for the master table, which has 13,000 records. I joined 6 tables and the result using EXPLAIN is:

    Tables – Result
    master – 130000, sorting, temporary…
    subtable1 – 1 (overall records: 20)
    subtable2 – 100 (overall records: 100)
    subtable3 – 1 (overall records: 13,0000)
    subtable4 – 1 (overall records: more than 100 thousand)

    Running the query with subtable4 took 36-59 secs;
    running the query without it took 6 secs.

    I want each query to run in at most 1 sec. Any help is very much appreciated, thanks.

  130. Cristina Fon says:

    Need help! I was on a Windows server with MySQL, which was faster, and I moved to a Linux machine with 24 GB of memory. It is very slow: a process that took 1 hour on the other machine takes 6 hours here. The tables are large, and when I select from a table it is locked for inserts. Could someone help me, perhaps with the my.cnf configuration? The server was set up with the sample my.cnf for large tables. Thank you

  131. chet says:

    I have a 90GB db; my selects are pretty slow on tables over 500,000 rows.

    Guess I will have to partition, as I have used up the maximum 40 indexes and it’s not speeding things up.

    I think the max rows per table should be 50-100k rows.

  132. mallikarjun says:

    Hi All,
    Can anyone please help me solve a performance issue in a MySQL database?
    The problem is I have a table with 10 lakh (1,000,000) records, and I am selecting from the table through Java Hibernate.
    The first 2 lakh (200,000) records are fetched fast; after that it takes a lot of time to fetch, returning only some 100 records very slowly.
    So please guide me on which parameters to set to overcome this issue.
    My my.cnf file contains the following:
    [mysqld]
    innodb_buffer_pool_size = 2G
    innodb_log_buffer_size=5m
    innodb_flush_log_at_trx_commit=2
    innodb_lock_wait_timeout=120
    datadir=/var/lib/mysql
    socket=/var/lib/mysql/mysql.sock
    user=mysql
    init_connect=’SET collation_connection = utf8_general_ci; SET NAMES utf8;’
    default-character-set=utf8
    character-set-server=utf8
    collation-server=utf8_general_ci
    [client]
    default-character-set=utf8
    set-variable = max_allowed_packet=32M
    [mysqld_safe]
    log-error=/var/log/mysqld.log
    pid-file=/var/run/mysqld/mysqld.pid

    Please give me a reply about which parameters to change to solve the performance issue.

    Thanks,
    Mallikarjun Reddy.

  133. Muktesh says:

    Performance is very important with any application. If your database tables have millions of records then a simple SQL query can take 3-4 minutes, but the ideal time for a query should be at most 5 seconds. Now the question becomes: how can you improve performance with large databases?
    See this article: http://techathon.mytechlabs.com/performance-tuning-while-working-with-large-database/

  134. Wajira says:

    What happens when I create a view? When I get data from the view, is the system slow, or what?

  135. Hi Wajira, this post is from 2006 – the best place to ask your question is on the Percona MySQL discussion forums here: http://www.percona.com/forums/ I’m the community manager and invite you to join the conversations there. :)

  136. Saji Antony says:

    I noticed that MySQL is highly unpredictable in the time it takes to return records from a large table (mine has about 100 million records in one table), despite having all the necessary indexes. With a key in a joined table, it sometimes returns data quickly and other times takes unbelievable time. From my comparable experience with SQL Server and Oracle, I noticed MySQL’s ISAM tables are terribly weak in their algorithms. I am finally resorting to a small-snapshots approach. So in my application, I decided to write either a program function or a stored procedure that takes a small dataset initially and then uses the individual records from that dataset to further expand the search in other tables. That somehow works better.

  137. sargon says:

    Hi all
    I love this article. I’m actually looking for a solution to my website’s slow MySQL:
    Ecommerce Local

    I have the following issue:
    when I click on small-value pages, e.g. page number 6 http://www.ecommercelocal.com/pages.php?pi=6
    the site loads quickly,
    but on other pages, e.g. page number 627500 http://www.ecommercelocal.com/pages.php?pi=627500
    the site loads very slowly, sometimes with an error as below:

    Internal Server Error

    The server encountered an internal error or misconfiguration and was unable to complete your request.

    Please contact the server administrator, webmaster@ecommercelocal.com and inform them of the time the error occurred, and anything you might have done that may have caused the error.

    More information about this error may be available in the server error log.

    Additionally, a 404 Not Found error was encountered while trying to use an ErrorDocument to handle the request.

    ## any idea what to do? ##

    I don’t want to split the database. Is there any setting to be changed, e.g. cache size…?
    Any help will be appreciated.

  138. Hi sargon, I’m Percona’s community manager. Thanks for the comment/question but this post is from 2006 so it’s not likely that Peter will see this to respond. I suggest instead sharing it on our MySQL discussion forums – so that the entire community can offer some ideas to help. Here’s a url: http://www.percona.com/forums/questions-discussions/mysql-and-percona-server

    That category is for generic MySQL issues as well as those associated with Percona Server, which is an enhanced, drop-in MySQL replacement. So feel free to post there and I also invite you to join our community, too.
    -Tom

  139. Ghanshyam says:

    You would always build properly normalized tables to track things like this.

    For instance, a possible schema:

    CREATE TABLE user_attributes (
    id INT PRIMARY KEY NOT NULL AUTO_INCREMENT,
    user_id INT NOT NULL,
    attribute_name VARCHAR(255) NOT NULL,
    attribute_value VARCHAR(255),
    UNIQUE INDEX index_user_attributes_name(user_id, attribute_name)
    );
    This is the basic key-value store pattern where you can have many attributes per user.

    Although the storage requirements for this is higher than a fixed-columns arrangement with the perpetually frustrating names like attribute1, the cost is small enough in the age of terabyte-sized hard-drives that it’s rarely an issue.

    Generally you’d create a single table for this data until insertion time becomes a problem. So long as your inserts are fast, I wouldn’t worry about it. At that point you would want to consider a sharding strategy to divide this data into multiple tables with an identical schema, but only if it’s required.

    I would imagine that would be at the ~10-50 million rows stage, but could be higher if the amount of insert activity in this table is relatively low.
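
    To make the pattern concrete, typical reads and writes against the schema above would look something like this (the values are invented):

    -- Insert or update one attribute for a user.
    INSERT INTO user_attributes (user_id, attribute_name, attribute_value)
    VALUES (42, 'country', 'DE')
    ON DUPLICATE KEY UPDATE attribute_value = VALUES(attribute_value);

    -- Fetch a single attribute; served by index_user_attributes_name.
    SELECT attribute_value
    FROM user_attributes
    WHERE user_id = 42 AND attribute_name = 'country';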

  140. jitendra Kumar says:

    Hi All,
    I have the following issue:
    When I join a table which has approx. 10,00,00,00 records with another table which has approx. 1,00,00,000 records, it takes more than two hours to give a result. So please suggest how I can reduce this time. Any help will be appreciated.
    -Thanks,
    Jitendra

  141. Leon says:

    If I need time-difference calculations on the same datetime field between 4 types of events related to the same thing (i.e. books), and the number of these events goes up to 2 million per year, what is better:
    A) 1 big table with correctly indexed book_id and event_id and possible partitioning by book type (up to 50 types)
    B) 4 tables, one for each event, indexed with the same foreign key that holds the relation, and possibly partitioning all 4 tables by book type

    Queries need to search at most 2 years in the past; once a year, a full search over the data.

    After 5 years it will be 10 million events, so what is better: 10 million rows, or 4 x 2.5 million rows in a 1-to-1 relationship?

    And what if one or more events happen more than once for the same book?

    Table type MyISAM or InnoDB.

    Thanks in advance
