Selecting where id in (list of 1000 ids)

  • #1

    Hi,

    I'm trying to do the following select:
    SELECT * FROM table WHERE id IN
    (1,2,3..999,1000); (with other id numbers of course)

    However, this query takes about 2 seconds. Is there any way to speed this up?

  • #2
    Does nobody know a solution?

    SELECT * FROM table WHERE entries.id IN (8631, ..., 351485) LIMIT 1000
    Takes: 14 sec

    SELECT * FROM table LIMIT 1000
    Takes: 0.8 sec


    • #3
      Do you have a unique index on id?


      • #4
        Yep. It is the primary key.


        • #5
          Some questions:

          How many entries did you have in the 14 second query?

          Why do you need to single out and select so many entries?

          How large is the database?

          And how much memory have you reserved for the key cache (key_buffer_size, if MyISAM) or for innodb_buffer_pool_size if you are using InnoDB tables?


          • #6
            sterin wrote on Tue, 17 April 2007 00:06

            Some questions:

            1. How many entries did you have in the 14 second query?

            2. Why do you need to single out and select so many entries?

            3. How large is the database?

            4. And how much memory have you reserved for the key cache (key_buffer_size, if MyISAM) or for innodb_buffer_pool_size if you are using InnoDB tables?



            1. 1000 ids (entries) were selected.
            2. I get the fulltext results from SphinxSearch, which returns only the ids. I have to go over at least 1000 of them to get a good result.
            3. Half a million rows.
            4. The table is MyISAM. key_buffer_size = 128M

            Any suggestions?


            • #7
              1.
              No, your query _returned_ 1000 rows because you have a LIMIT 1000 at the end.
              What I wanted to know is how many entries you had in the IN(...) expression.

              3.
              How many MB do the indexes total in the DB?


              And what kind of hardware are you running this on?


              You should avoid using LIMIT and instead try to limit the number of rows in the WHERE clause.
              The reason is that the LIMIT is the last thing performed in the execution of a query, which means that during execution it has to work on all records that match the WHERE part before it can LIMIT away the surplus rows.


              • #8
                Hi, sorry, the LIMIT is unnecessary; there are exactly 1000 ids in the IN(...) part.

                I'm running on a dual Xeon with 1 GB memory.

                3. How do I check this? What do you mean? The table is 82,000 KB; the index (which has a lot of fulltext indexes and indexes on many columns) is 262,000 KB. This might be a little out of proportion.
                Does this matter? Does MySQL put the COMPLETE index in memory??


                • #9
                  No, MySQL doesn't put the complete index in memory.
                  But it will try to use as much of the cache as possible (up to the limit that you have set).
                  And you want it to use as much as possible.

                  The ideal situation is when the entire DB, both index and data, fits into RAM, because as soon as it has to read from disk everything slows down by orders of magnitude.

                  So the best solution is if you can fit the entire index into key_buffer_size while still keeping about 75% of memory free for the OS cache for the tables.

                  So RAM is generally considered a DBMS's best friend.
                  As much as possible is the rule. :)


                  • #10
                    First, using Sphinx you should not retrieve all 1000 rows; use SetLimits() to retrieve only the ids you're going to show on the page.

                    Second, the difference is expected: the query needs 1000 random I/O operations for data this large, so it takes what it takes.

                    The other query does a full table scan, so it can find 1000 rows and stop pretty fast.
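                    The paging idea can be sketched like this (a minimal Python illustration of offset/limit paging, not actual Sphinx API code; the function name page_slice and the page size are made up):

```python
def page_slice(ids, page, per_page):
    """Return only the ids for the requested page (1-based),
    mirroring what offset/limit paging in Sphinx gives you."""
    offset = (page - 1) * per_page
    return ids[offset:offset + per_page]

# With paging you fetch e.g. 20 ids per page instead of all 1000,
# so the SQL IN(...) list shrinks from 1000 entries to 20.
all_ids = list(range(1, 1001))
page2 = page_slice(all_ids, page=2, per_page=20)  # ids 21..40
```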


                    • #11
                      oordopjes wrote on Tue, 17 April 2007 14:46

                      Hi, sorry, the LIMIT is unnecessary; there are exactly 1000 ids in the IN(...) part.

                      I'm running on a dual Xeon with 1 GB memory.

                      3. How do I check this? What do you mean? The table is 82,000 KB; the index (which has a lot of fulltext indexes and indexes on many columns) is 262,000 KB. This might be a little out of proportion.
                      Does this matter? Does MySQL put the COMPLETE index in memory??


                      I don't know what Sphinx is, but if it were my table and my query I might try removing any indexes that weren't needed. It sounds like you may have too many. Ideally you can optimize away any with very low cardinality (phpMyAdmin shows this), or any with high null counts. The more unique the value, the better the column is to index.

                      RAM is cheap. You could always buy more.

                      You might try changing your query so it says SELECT * FROM BLAH WHERE id BETWEEN lowest_value AND highest_value AND id IN (list of ids)

                      I have no idea if that will help or not, but logically, it seems like if MySQL did a table scan for each of the ids in the IN (list), then you could conceivably make that list smaller. Since those are primary keys, though, it may just be redundant code and could possibly even slow things down. Tinkering is best. I still think it sounds like you have too many indexes.
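                      The BETWEEN-plus-IN idea above can be sketched like this (a Python illustration with made-up ids; the function name between_and_in_query is invented, and real code should use parameterized queries rather than string building):

```python
def between_and_in_query(ids):
    """Build the suggested WHERE clause: a BETWEEN range to narrow the
    scan, plus the IN list to pick the exact ids."""
    ordered = sorted(ids)
    id_list = ", ".join(str(i) for i in ordered)
    return ("SELECT * FROM blah WHERE id BETWEEN {} AND {} AND id IN ({})"
            .format(ordered[0], ordered[-1], id_list))

q = between_and_in_query([42, 7, 99])
# "SELECT * FROM blah WHERE id BETWEEN 7 AND 99 AND id IN (7, 42, 99)"
```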


                      • #12
                        I'm not a MySQL expert, but I'm not a newbie, either.

                        As said before, I'd first reduce your indexes and only have them where you need them. Additionally, knowing that MySQL will only use ONE index per table per query, you should be able to examine your queries and decide which indexes should go (if any).

                        Second... essentially you have 1,000 OR statements. I hate OR statements, almost as much as I refuse to use LEFT JOINs (I just use standard joins 100%).

                        I know how to speed it up (assuming you're doing PHP):

                        1) All those IDs you're searching for -- store them in an array, with the key as the ID *and* the value as the ID:
                        for ($a = 0; $a < $numberOfIDsToSearch; $a++) {
                            $val = $myOrigIDArray[$a];
                            $newIDArray[$val] = $val;
                        }

                        2) Before executing the query, get the lowest ID and the highest ID. Store them in $min and $max variables. It's easy to grab them from an array through PHP's built-in min() and max() functions.

                        3) Now we'll make your SQL statement get all IDs between your lowest and highest ($min and $max). You'll fetch every row where those 1,000 ids could lie, and you have them all in memory!

                        Then, just decide which ones to keep. Here's the code from the SELECT statement onward (do you really need *? -- that'll slow it down):

                        $getidData = "SELECT id, info1, info2, info3 FROM table WHERE id >= $min AND id <= $max";
                        $result = mysql_query($getidData, $db_conn) or die("I screwed up");
                        while ($row = mysql_fetch_array($result)) {
                            $id = $row[0];
                            if ($newIDArray[$id] == $id) {
                                // ... I found it! I can build a new array of found ones here, to process later!
                            }
                        } // end while
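                        The same key-as-value lookup can be sketched in Python (hypothetical ids and rows; a set gives the same constant-time membership test that the PHP array keys provide):

```python
# The ids Sphinx returned (hypothetical values)
wanted_ids = {8631, 42, 351485}

# Rows as (id, info1) tuples, standing in for the BETWEEN-range fetch
rows = [(8631, "a"), (500, "b"), (351485, "c"), (42, "d")]

# One hash lookup per row, just like checking $newIDArray[$id] in PHP
found = [row for row in rows if row[0] in wanted_ids]
# found == [(8631, "a"), (351485, "c"), (42, "d")]
```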


                        • #13
                          JGilbert wrote on Tue, 08 May 2007 15:31


                          I don't know what Sphinx is, but if it were my table and my query I might try removing any indexes that weren't needed. It sounds like you may have too many



                          But removing excessive indexes would not speed up SELECTs. For write operations, sure: choose them carefully to make sure there's no I/O wasted.

                          toddjnsn wrote on Tue, 27 May 2008 01:59


                          I hate OR statements, almost as much as I refuse to use LEFT JOINs (I just use standard joins 100%).



                          I don't understand this. If your query only needs an INNER JOIN then great, but I can think of many queries where a LEFT JOIN is necessary and desirable.

                          Don't be afraid of LEFT JOIN. It is more important to return the right results, and *then* worry about performance, than to have a fast query that misses out some records.

                          toddjnsn wrote on Tue, 27 May 2008 01:59


                          3)Now, we'll make your SQL Statement getting all IDs in between your highest and lowest ($max and $min)



                          I think the IN (1,2,3..999) he pasted was an example. :)
                          The real IDs are unlikely to be a nice contiguous range, since they are document ids returned by the Sphinx search daemon.

                          I agree with Peter: try to limit the number of ids that Sphinx is returning. The Sphinx PHP API (and all the other Sphinx APIs) has support for paging, and asking for 100 records from Sphinx and then fetching those 100 from the database should be much faster than using 1000. Unless you really *need* 1000 results?

                          There are a few options you can consider.

                          If you're querying Sphinx for 1000 results and then doing a JOIN to filter those results down further, you may be better off considering SphinxSE, which integrates with MySQL.
                          The bad news is that it's not as scalable, and Sphinx must run on the same box as MySQL. Because of this, I'd choose to avoid SphinxSE personally, but sometimes it's a valid choice.

                          Secondly, could any of the content be cached, either to the filesystem or to memcached? Typically most search features have a subset of basic queries which are run frequently; you may be able to cache these to alleviate the load.
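                          The caching idea can be sketched like this (a minimal Python illustration with an in-process dict standing in for memcached; the TTL value and the function name cached_search are made up):

```python
import time

cache = {}   # query -> (timestamp, result); memcached would play the same role
TTL = 60     # seconds to keep a result; an assumed value

def cached_search(query, run_search):
    """Serve frequent queries from the cache, re-running the real
    search only when no fresh cached copy exists."""
    now = time.time()
    if query in cache and now - cache[query][0] < TTL:
        return cache[query][1]
    result = run_search(query)
    cache[query] = (now, result)
    return result
```

                          With memcached the shape is the same: try a get(), and on a miss run the search and store the result with an expiry.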

                          I can't think of much else, really. You're essentially selecting a 'random' set of records where the ID is the primary key, so there's not much scope for improvement.

                          Maybe you need to alter your table schema, or hive some fields off into another table.

                          You could also consider InnoDB, which in my experience provides vastly improved performance for selects.


                          • #14
                            "I don't understand this. If your query only needs an INNER JOIN then great, but I can think of many queries where a LEFT JOIN is necessary and desirable.

                            Don't be afraid of LEFT JOIN. It is more important to return the right results, and *then* worry about performance, than to have a fast query that misses out some records."

                            I avoid doing SLOW (left) joins when I'd have to do them on my system a lot. So I denormalize things and put redundant data on pieces that WON'T change after writing (i.e. linking the rows by association). That way, I trade a once-in-a-while write for a million queries with avoided left joins. I'd never do such a thing if I had to put a redundant piece somewhere and then have to change it after the fact.

                            As far as my example -- I think it'd solve his problem just fine. I ran into situations where I had to do matching: 100,000+ IDs to find matches across multiple tables with 100,000+ IDs. I couldn't do an OR statement -- that'd be ridiculous.

                            So I did what I showed by example. It turned 30-45 minutes of query time into about 20-30 seconds (it swallowed RAM during the run, but only for those 20-30 seconds).


                            • #15
                              Quote:


                              Second... essentially you have 1,000 OR statements. I hate OR statements, almost as much as I refuse to use LEFT JOINs (I just use standard joins 100%).



                              ORs are indeed bad, because MySQL can't optimize them very well, which means that it can't use an index to solve them.
                              But IN(...) is not bad, because MySQL can use an index to solve IN()s.
                              So you should not be afraid of using IN() if you need to.
                              Reading all IDs into a PHP array is not usually the answer, because reading the values from the DB into a PHP array also takes time and CPU (sometimes a lot).

                              Quote:


                              I avoid doing SLOW (left) joins when I'd have to do them on my system a lot. So I denormalize things and put redundant data on pieces that WON'T change after writing (ie linking the rows by association). That way, I avoid the left join by a once-in-a-while write, with a million queries w/ avoided left joins. I'd never do such a thing if I had to put a redundant piece somewhere and have to change it after the fact.


                              You can't really assume:
                              1. that all LEFT JOINs are slow
                              Yes, you are forcing the join order with a LEFT JOIN, but the execution time for an INNER JOIN and a LEFT JOIN that are performed in the same table order is the same.
                              Unless you add extra rows, as in your suggestion; those will then require extra reading for values that could implicitly be set to NULL in a normal LEFT JOIN query.

                              2. that writes are only once in a while
                              For web servers this is often true, but on a lot of other servers it is not; writes can be a very large part of the load on the machine, and on these servers every extra write is bad.

                              3. that there are more matching records than non-matching records in the child table
                              If most of the records in your master table don't have a corresponding value in the child table, it is a waste of space and speed to add data that doesn't even need to be there.
                              It will use up CPU cycles and cache memory that could be better spent on something else.

                              Quote:


                              As far as my example -- I think it'd solve his problem just fine. I ran into situations where I had to do matching -- 100,000+ IDs to find matches of multiple tables w/ 100,000+ IDs. I couldn't do an OR statement -- that'd be ridiculous.


                              Yes, it solves the problem, but if the range between the min and max values is very large, then it will take a lot of CPU to read all that data and a lot of RAM to store it before you throw it away.

                              The rule for DBMS optimization is to always think about how you can throw away surplus data as early as possible, so you don't need to shuffle it around in any way.

                              In this case an IN() is the much better choice.
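                              The point about throwing away surplus data early can be sketched with a toy comparison (hypothetical ids and a made-up 1,000-row table; dict lookups stand in for primary-key index lookups):

```python
# Simulated table: id -> row, standing in for lookups via the primary key
table = {i: "row-%d" % i for i in range(1, 1001)}
wanted = [977, 12, 503]
wanted_set = set(wanted)

# IN()-style: the index takes us straight to each row (3 lookups)
via_in = [table[i] for i in wanted]

# min/max-range style: read every row between min and max, then discard
touched = list(range(min(wanted), max(wanted) + 1))
via_range = [table[i] for i in touched if i in wanted_set]

# Same rows either way, but the range scan touched 966 rows to keep 3
```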

                              I understand that you come from a programming background, and yes, some problems are solved faster in an external language.

                              But once you have understood SQL, indexes, and how the optimizer works in a DBMS, you can solve most things much faster than by performing loops in your external programming language.
