Announcement

Announcement Module
Collapse
No announcement yet.

performance of query for normalized DB

Page Title Module
Move Remove Collapse
X
Conversation Detail Module
Collapse
  • Filter
  • Time
  • Show
Clear All
new posts

  • performance of query for normalized DB

    I'm trying to build a webapplication where users can search for a person having a particular preference for color and material. To store this information I use the following structure (a MySQL dump can be found at the end of this post):
    *table person with fields:
    -persid: autoincrement id
    -name: name of the person
    *table material with fields:
    -materialid: autoincrement id
    -material: name of the material eg "wood"
    *table color with fields:
    -colorid: autoincrement id
    -color: name of the color eg "green"
    *table persmaterial with fields:
    -persmatid: autoincrement id
    -persid: link to table person
    -materialid: link to table material
    *table perscolor with fields:
    -perscolorid: autoincrement id
    -persid: link to table person
    -colorid: link to table color

    In the webapplication the search can be entered by the users as a kind of pseudo query:
    (color=red OR color=blue) AND color=green AND material=iron

    My question is: how can I automatically transform this pseudo query into an efficient MySQL query?
    I have tried out some different options:


    Option 1:
    (SELECT p.persid FROM person p, perscolor pc, persmaterial pm WHERE p.persid=pc.persid AND (pc.colorid=1 OR pc.colorid=2) AND p.persid=pm.persid AND pm.materialid=2 GROUP BY p.persid HAVING (count(DISTINCT pc.colorid)=2 AND count(DISTINCT pm.materialid)=1)) UNION
    (SELECT p.persid FROM person p, perscolor pc, persmaterial pm WHERE p.persid=pc.persid AND (pc.colorid=2 OR pc.colorid=3) AND p.persid=pm.persid AND pm.materialid=2 GROUP BY p.persid HAVING (count(DISTINCT pc.colorid)=2 AND count(DISTINCT pm.materialid)=1))
    Remarks:
    *I do not see how to turn a general pseudo query into a query like the one in option 1, except for turning the pseudo query into a sum of products form where the sulms would correspond to the UNIONs. IS there a clever way to obtain such a sum of products form from an arbitrary pseudo query?


    Option 2:
    SELECT persid FROM person p WHERE
    (EXISTS(SELECT * FROM perscolor pc WHERE pc.colorid=1 AND p.persid=pc.persid)
    OR
    EXISTS(SELECT * FROM perscolor pc WHERE pc.colorid=3 AND p.persid=pc.persid))
    AND
    EXISTS(SELECT * FROM perscolor pc WHERE pc.colorid=2 AND p.persid=pc.persid)
    AND
    EXISTS(SELECT * FROM persmaterial pm WHERE pm.materialid=2 AND p.persid=pm.persid)
    Remarks:
    *very easy to get from pseudo query to MySQL query but what about performance?

    Option 3:
    SELECT p.persid FROM person p, perscolor pc, persmaterial pm WHERE
    p.persid=pc.persid
    AND
    (pc.colorid=1 OR pc.colorid=2 OR pc.colorid=3)
    AND p.persid=pm.persid
    AND pm.materialid=2
    GROUP BY p.persid HAVING
    sum(case when pc.colorid in ('1','3') then 1 else 0 end) >= 1
    AND
    sum(case when pc.colorid='2' then 1 else 0 end)>=1
    AND
    sum(case when pm.materialid='2' then 1 else 0 end)>=1
    Remarks:
    *this option requires the pseudo query to be turned into a product of sums form; again is their a clever way to obtain such a form;

    Option 4
    SELECT DISTINCT pc1.persid FROM perscolor pc1
    INNER JOIN perscolor pc2
    ON pc1.persid=pc2.persid AND pc2.colorid=2
    INNER JOIN persmaterial pm1
    ON pc1.persid=pm1.persid AND pm1.materialid=2
    LEFT OUTER JOIN perscolor pc3
    ON pc1.persid=pc3.persid AND pc3.colorid=1
    LEFT OUTER JOIN perscolor pc4
    ON pc1.persid=pc4.persid AND pc4.colorid=3
    WHERE COALESCE(pc3.persid,pc4.persid) IS NOT NULL
    Remarks:
    *this option requires the pseudo query to be turned into a product of sums form

    Option 5:
    SELECT p.persid FROM person p, persmaterial pm,perscolor pc1,perscolor pc2,perscolor pc3 WHERE p.persid=pm.persid AND p.persid=pc1.persid AND p.persid=pc2.persid AND p.persid=pc3.persid AND (pc1.colorid=1 OR pc2.colorid=3) AND pc3.colorid=2 AND pm.materialid=2 GROUP BY p.persid
    Remarks:
    *very easy to get from pseudo query to MySQL query but what about performance?



    -- phpMyAdmin SQL Dump
    -- version 2.6.1
    -- http://www.phpmyadmin.net
    --
    -- Host: localhost
    -- Generation Time: Oct 19, 2006 at 01:13 PM
    -- Server version: 4.1.9
    -- PHP Version: 4.3.10
    --
    -- Database: `aston`
    --

    -- --------------------------------------------------------

    --
    -- Table structure for table `color`
    --

    CREATE TABLE `color` (
    `colorid` int(11) NOT NULL auto_increment,
    `color` varchar(30) NOT NULL default '',
    PRIMARY KEY (`colorid`)
    ) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=5 ;

    --
    -- Dumping data for table `color`
    --

    INSERT INTO `color` VALUES (1, 'red');
    INSERT INTO `color` VALUES (2, 'green');
    INSERT INTO `color` VALUES (3, 'blue');
    INSERT INTO `color` VALUES (4, 'yellow');

    -- --------------------------------------------------------

    --
    -- Table structure for table `material`
    --

    CREATE TABLE `material` (
    `materialid` int(11) NOT NULL auto_increment,
    `material` varchar(30) NOT NULL default '',
    PRIMARY KEY (`materialid`)
    ) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=3 ;

    --
    -- Dumping data for table `material`
    --

    INSERT INTO `material` VALUES (1, 'wood');
    INSERT INTO `material` VALUES (2, 'iron');

    -- --------------------------------------------------------

    --
    -- Table structure for table `perscolor`
    --

    CREATE TABLE `perscolor` (
    `perscolorid` int(11) NOT NULL auto_increment,
    `persid` int(11) NOT NULL default '0',
    `colorid` int(11) NOT NULL default '0',
    PRIMARY KEY (`perscolorid`)
    ) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=7 ;

    --
    -- Dumping data for table `perscolor`
    --

    INSERT INTO `perscolor` VALUES (1, 1, 1);
    INSERT INTO `perscolor` VALUES (2, 1, 2);
    INSERT INTO `perscolor` VALUES (3, 2, 1);
    INSERT INTO `perscolor` VALUES (5, 3, 3);
    INSERT INTO `perscolor` VALUES (6, 3, 2);

    -- --------------------------------------------------------

    --
    -- Table structure for table `persmaterial`
    --

    CREATE TABLE `persmaterial` (
    `persmatid` int(11) NOT NULL auto_increment,
    `persid` int(11) NOT NULL default '0',
    `materialid` int(11) NOT NULL default '0',
    PRIMARY KEY (`persmatid`)
    ) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=6 ;

    --
    -- Dumping data for table `persmaterial`
    --

    INSERT INTO `persmaterial` VALUES (1, 1, 1);
    INSERT INTO `persmaterial` VALUES (2, 1, 2);
    INSERT INTO `persmaterial` VALUES (3, 2, 1);
    INSERT INTO `persmaterial` VALUES (5, 3, 2);

    -- --------------------------------------------------------

    --
    -- Table structure for table `person`
    --

    CREATE TABLE `person` (
    `persid` int(11) NOT NULL auto_increment,
    `name` varchar(30) NOT NULL default '',
    PRIMARY KEY (`persid`)
    ) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=4 ;

    --
    -- Dumping data for table `person`
    --

    INSERT INTO `person` VALUES (1, 'john');
    INSERT INTO `person` VALUES (2, 'emily');
    INSERT INTO `person` VALUES (3, 'liz');

  • #2
    I added some more persons and preferences to the different tables so that there where about 10000 persons with random color and material preferences in the database; the example query resulted in 1900 persons; here are the timing results in seconds:
    option 1: 0.2
    option 2: 0.005
    option 3: 0.003
    option 4: 0.007
    option 5: 0.006
    option 6: 0.002

    where option 6 is a newly added option:
    SELECT pm.persid FROM persmaterial pm WHERE pm.materialid=2 AND pm.persid IN (SELECT pc.persid FROM perscolor pc WHERE pc.colorid=2 AND pc.persid IN (SELECT pc.persid FROM perscolor pc WHERE pc.colorid=1 OR pc.colorid=3))

    I was a bit surprised by the good timing results for option 2; is there any one who can explain the way MySQL tackles the query in option 2 (I suppose MySQL is not going over all records of table "person" launching a subquery for each record of table "person"?);

    I tend to conclude that option 2 is the way to go since it is so easy to translate the pseudo query into the MySQL query and on top of that the resulting MySQL query is rather fast!

    Comment


    • #3
      I added some more persons and preferences to the different tables so that there where about 10000 persons with random color and material preferences in the database; the example query resulted in 1900 persons; here are the timing results in seconds:
      option 1: 0.2
      option 2: 0.005
      option 3: 0.003
      option 4: 0.007
      option 5: 0.006
      option 6: 0.002

      where option 6 is a newly added option:
      SELECT pm.persid FROM persmaterial pm WHERE pm.materialid=2 AND pm.persid IN (SELECT pc.persid FROM perscolor pc WHERE pc.colorid=2 AND pc.persid IN (SELECT pc.persid FROM perscolor pc WHERE pc.colorid=1 OR pc.colorid=3))

      I was a bit surprised by the good timing results for option 2; is there any one who can explain the way MySQL tackles the query in option 2 (I suppose MySQL is not going over all records of table "person" launching a subquery for each record of table "person"?);

      I tend to conclude that option 2 is the way to go since it is so easy to translate the pseudo query into the MySQL query and on top of that the resulting MySQL query is rather fast!

      Comment

      Working...
      X