Dear Srs,
I have a table with near 1.000.000 rows, and I want to select random rows, but without repeating records.
If I don't prevent repeating records MySQL works fine, this is a PHP code fragment:
$result = mysql_query("SELECT MIN(id) AS min , MAX(id) AS max FROM tabla");$rango = mysql_fetch_object($result);$rand_id = mt_rand($rango->min, $rango->max);$result = mysql_query("SELECT * FROM tabla WHERE id >= $rand_id LIMIT 0,1");
This is soo fast, MySQL rocks
For preventing repeating rows, I'm making this:
1) There's another column to mark if a record was selected previosly or not
2) Select rows not market at selected
3) Get the record and mark it as selected
4) Go to 1)
The step 2) has a WHERE statement that slows de query to more than 10 seconds..
This is the table definition:
mysql> SHOW CREATE TABLE ip_assigned \G*************************** 1. row *************************** Table: ip_assignedCreate Table: CREATE TABLE `ip_assigned` ( `id` int(11) NOT NULL auto_increment, `ip` varchar(256) NOT NULL, `assigned_time` timestamp NOT NULL default '0000-00-00 00:00:00', `crawler` int(11) NOT NULL, PRIMARY KEY USING BTREE (`id`), KEY `index_ip` (`ip`(16)), KEY `index_crawler` (`crawler`,`ip`)) ENGINE=MyISAM AUTO_INCREMENT=871539 DEFAULT CHARSET=utf81 row in set (0.00 sec)mysql>
This is the query to get not market records:
// Get COUNT(*) FROM cache, cache->total_ips$sql = "SELECT val AS total_ips FROM cache WHERE param = 'total_ips'";$result = mysql_query($sql);$cache = mysql_fetch_object($result);// Calculate a random offset for the LIMIT$offset = mt_rand(1, $cache->total_ips);// Get $total_requested_ips random IPs$sql = "SELECT ip FROM ip_assigned WHERE crawler = 0 LIMIT $offset,$total_requested_ips";$result = mysql_query($sql);
This last query is very slow, because MySQL needs to scan a lot of rows, this is the output of EXPLAIN command:
mysql> DESCRIBE SELECT ip FROM ip_assigned WHERE crawler = 0 LIMIT 705607,10240 ;+----+-------------+-------------+------+---------------+---------------+---------+-------+--------+-------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |+----+-------------+-------------+------+---------------+---------------+---------+-------+--------+-------------+| 1 | SIMPLE | ip_assigned | ref | index_crawler | index_crawler | 4 | const | 813308 | Using index | +----+-------------+-------------+------+---------------+---------------+---------+-------+--------+-------------+1 row in set (0.00 sec)
What can I do to improve this query? perhaps another method/idea to prevent repeating random registers? Thanks!
Regards,
I have a table with near 1.000.000 rows, and I want to select random rows, but without repeating records.
If I don't prevent repeating records MySQL works fine, this is a PHP code fragment:
$result = mysql_query("SELECT MIN(id) AS min , MAX(id) AS max FROM tabla");$rango = mysql_fetch_object($result);$rand_id = mt_rand($rango->min, $rango->max);$result = mysql_query("SELECT * FROM tabla WHERE id >= $rand_id LIMIT 0,1");
This is soo fast, MySQL rocks

For preventing repeating rows, I'm making this:
1) There's another column to mark if a record was selected previosly or not
2) Select rows not market at selected
3) Get the record and mark it as selected
4) Go to 1)
The step 2) has a WHERE statement that slows de query to more than 10 seconds..
This is the table definition:
mysql> SHOW CREATE TABLE ip_assigned \G*************************** 1. row *************************** Table: ip_assignedCreate Table: CREATE TABLE `ip_assigned` ( `id` int(11) NOT NULL auto_increment, `ip` varchar(256) NOT NULL, `assigned_time` timestamp NOT NULL default '0000-00-00 00:00:00', `crawler` int(11) NOT NULL, PRIMARY KEY USING BTREE (`id`), KEY `index_ip` (`ip`(16)), KEY `index_crawler` (`crawler`,`ip`)) ENGINE=MyISAM AUTO_INCREMENT=871539 DEFAULT CHARSET=utf81 row in set (0.00 sec)mysql>
This is the query to get not market records:
// Get COUNT(*) FROM cache, cache->total_ips$sql = "SELECT val AS total_ips FROM cache WHERE param = 'total_ips'";$result = mysql_query($sql);$cache = mysql_fetch_object($result);// Calculate a random offset for the LIMIT$offset = mt_rand(1, $cache->total_ips);// Get $total_requested_ips random IPs$sql = "SELECT ip FROM ip_assigned WHERE crawler = 0 LIMIT $offset,$total_requested_ips";$result = mysql_query($sql);
This last query is very slow, because MySQL needs to scan a lot of rows, this is the output of EXPLAIN command:
mysql> DESCRIBE SELECT ip FROM ip_assigned WHERE crawler = 0 LIMIT 705607,10240 ;+----+-------------+-------------+------+---------------+---------------+---------+-------+--------+-------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |+----+-------------+-------------+------+---------------+---------------+---------+-------+--------+-------------+| 1 | SIMPLE | ip_assigned | ref | index_crawler | index_crawler | 4 | const | 813308 | Using index | +----+-------------+-------------+------+---------------+---------------+---------+-------+--------+-------------+1 row in set (0.00 sec)
What can I do to improve this query? perhaps another method/idea to prevent repeating random registers? Thanks!
Regards,
Comment