GET 24/7 LIVE HELP NOW

Announcement

Announcement Module
Collapse
No announcement yet.

What's the difference in performance (why)?

Page Title Module
Move Remove Collapse
X
Conversation Detail Module
Collapse
  • Filter
  • Time
  • Show
Clear All
new posts

  • What's the difference in performance (why)?

    Hi All,

    The 2 queries below are the same except the "ON" clause is on 2 different sets of fields.

    The first ON clause is based on the "email" field in both tables. In both tables the email field is a "unique" index.

    The second ON clause is based on the "user_id" field which is a primary key in the users table (id) and only indexed in the subscribers table.

    Why would the "email" query take such a long time vs. the "id" query when both sets of fields are indexed? Is it the text comparison vs. the integer comparison?

    This query takes a really long time.

    SELECT M.* FROM users AS M
    LEFT JOIN subscribers AS N
    ON M.email = N.email
    WHERE
    N.id IS NULL AND
    M.block=0 AND
    M.registerDate<>'0000-00-00 00:00:00'

    This query takes a very short time.

    SELECT M.* FROM users AS M
    LEFT JOIN subscribers AS N
    ON M.id = N.user_id
    WHERE
    N.id IS NULL AND
    M.block=0 AND
    M.registerDate<>'0000-00-00 00:00:00'

  • #2
    What does EXPLAIN report? Storage engine? Time taken per query (really long / very short is not very descriptive)? Any multi-column indices?

    Very wild guess: the index on e-mail is too large to be fully kept in memory.

    Comment


    • #3
      Thanks for the response.

      Here are additional details:

      1. Explain on the "fast" query comes up with 2 rows:
      On the users table (M), it ends up using a "where" clause and looking at 153,000 records. On the subscribers table (N), it ends up looking at 1 row using "where, not exists" but also relying on the "user_id" index of "N".

      2. Explain on the "slow" query comes up with 2 rows:
      On the users table (M), it ends up using a "where" clause and looking at 153,000 records. On the subscribers table (N), it ends up using a "where, not exists" clause and looking at 152,000 records without using an index, this is probably a big part of the problem.

      3. Storage engine for both tables is: MyISAM.

      4. The fast query takes about 3 to 4 seconds, the slow query is over 50 seconds and essentially holds up the dedicated server we are using.

      5. There are no "multi-field" indexes in either table.

      Any more feedback you could provide would be really appreciated.

      Thanks

      Comment


      • #4
        The problem is that it is not using an index to look up the emailaddrsss in the N table.

        What's the data type of the e-mail column? And what's the length of the email index?

        If M.block=0 AND M.registerDate<>'0000-00-00 00:00:00' is very restrictive (filters at least 95%), add a multi-column index on (block,registerDate).

        Comment


        • #5
          Thanks for the quick response.

          Each table has an email address unique index, field type is varchar(100). I'm including the table information for both below. As I'm writing this, I noticed that the charset of users is set as utf8 and the charset of subscribers is set as latin1. Would that matter?

          CREATE TABLE `users` (
          `id` int(11) NOT NULL auto_increment,
          `name` varchar(255) NOT NULL default '',
          `username` varchar(150) NOT NULL default '',
          `email` varchar(100) NOT NULL default '',
          `password` varchar(100) NOT NULL default '',
          `usertype` varchar(25) NOT NULL default '',
          `block` tinyint(4) NOT NULL default '0',
          `sendEmail` tinyint(4) default '0',
          `gid` tinyint(3) unsigned NOT NULL default '1',
          `registerDate` datetime NOT NULL default '0000-00-00 00:00:00',
          `lastvisitDate` datetime NOT NULL default '0000-00-00 00:00:00',
          `activation` varchar(100) NOT NULL default '',
          `params` text NOT NULL,
          PRIMARY KEY (`id`),
          UNIQUE KEY `email` (`email`),
          UNIQUE KEY `username` (`username`),
          KEY `usertype` (`usertype`),
          KEY `idx_name` (`name`),
          KEY `gid_block` (`gid`,`block`)
          ) ENGINE=MyISAM AUTO_INCREMENT=155071 DEFAULT CHARSET=utf8;



          CREATE TABLE `subscribers` (
          `id` int(11) NOT NULL auto_increment,
          `user_id` int(11) NOT NULL default '0',
          `name` varchar(64) character set latin1 collate latin1_general_ci NOT NULL default '',
          `email` varchar(100) character set latin1 collate latin1_general_ci NOT NULL default '',
          `receive_html` tinyint(1) NOT NULL default '1',
          `confirmed` tinyint(1) NOT NULL default '0',
          `blacklist` tinyint(1) NOT NULL default '0',
          `timezone` time NOT NULL default '00:00:00',
          `language_iso` varchar(10) character set latin1 collate latin1_general_ci NOT NULL default 'eng',
          `subscribe_date` datetime NOT NULL default '0000-00-00 00:00:00',
          `params` text character set latin1 collate latin1_general_ci,
          PRIMARY KEY (`id`),
          UNIQUE KEY `email` (`email`),
          KEY `subscribe_date` (`subscribe_date`),
          KEY `user_id` (`user_id`),
          KEY `name` (`name`)
          ) ENGINE=MyISAM AUTO_INCREMENT=171388 DEFAULT CHARSET=latin1;

          Comment


          • #6
            Yes, character set is your problem. After you alter it, the second query should be as fast.

            Please read http://www.xaprb.com/blog/tag/vadim-tkachenko/ after "Generosity can be Unwise".

            Usertype could probably an enum-field, and I doubt an index helps much for that field.

            Comment


            • #7
              Thanks!

              I actually changed the character set of both email fields to "ascii". That took care of the problem.

              Thanks!

              Comment

              Working...
              X