Hi,
Here is an easy way to run the subset sum check from SQL, which you can then distribute with Shard-Query:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
CREATE TABLE `the list` ( `id` bigint(20) NOT NULL AUTO_INCREMENT, `val` bigint(20) NOT NULL DEFAULT '0', PRIMARY KEY (`id`), KEY `id` (`id`) ) ENGINE=MyISAM; SELECT val as `val`, COUNT(DISTINCT (id)) as `cd` FROM test.data as d WHERE val in (-2,-3,-10,15,15,16) GROUP BY val; +-----+----------+----------+ | val | cd | CNT | +-----+----------+----------+ | -10 | 1 | 1 | | -3 | 1 | 1 | | -2 | 1 | 1 | | 15 | 35417088 | 35417088 | +-----+----------+----------+ 5 rows in set (40.20 sec) |
Notice there is no 16 in the list. We did not pass the check. There are enough 15s though. The distinct value count for each item in the output set, must at least match the cardinality of each item in the input set).
Notice that I have a lot of numbers in my list:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
mysql> select val, count(*) from data group by val; +-----+----------+ | val | count(*) | +-----+----------+ | -10 | 1 | | -3 | 1 | | -2 | 1 | | 0 | 250000 | | 4 | 8000000 | | 7 | 14680064 | | 14 | 14680064 | | 15 | 35417088 | +-----+----------+ 8 rows in set (20.47 sec) |
Now insert a value which will cause our check to pass:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
mysql> insert into data (val) values (16); Query OK, 1 row affected (0.01 sec) mysql> SELECT val as `val`, COUNT(DISTINCT (id)) as `cd` FROM test.data as d WHERE val in (-2,-3,-10,15,15,16) GROUP BY val; +-----+----------+ | val | cd | +-----+----------+ | -10 | 1 | | -3 | 1 | | -2 | 1 | | 15 | 35417088 | | 16 | 1 | <-- now the check passes +-----+----------+ 5 rows in set (40.01 sec) |
Of course, I can use a materialized view and check the expression in subsecond.