At Users Conference Heikki did good presentation about Innodb planned features. I did not see some of big and tiny wishes listed so I was making notes. Here is what I’d like to see

  1. Packed indexes. In many cases then difference in Performace with MyISAM and Innodb is huge for read only workload it is caused by the fact Innodb does not have prefix compression for its indexes, as MyISAM has which in certain cases makes them 10 times + larger. Heikki plans to have gzip like page compression but it would only give up to 2 times space compression.
  2. Insert buffer for delete. For Insert operation Innodb uses nice tecnique called Insert Buffer which speeds up inserts for non-unique indexes a lot. However deletes are not handled so they can be 10 times+ slower, which is especially bad for rollback of failed inserts which may take too long to complete
  3. Index build by sort. This is in ToDo but so important I’ll still list it. It would allow data load to be much faster as well as ALTER TABLE (expecially together with ability to create index without table rebuilding)
  4. Buffer pool improvements. I would like to see couple of things here. To have kind of middle point insertion so full table scan do not wipe it off as well as different priorities for different kind of data, for example flushing of undo pages might be avoided if possible as we know they are going to be purged.
  5. Long Rows. One ugly thing Innodb currently does is storing first 700 bytes or something like that from any blob/string in the page. Which gives you the nice surprise of row being unable to fit to the page if many columns are long strings. I would like to see it at least doing it for indexed columns only or even better have option similar to PostgreSQL to specify where this column should be stored – with the row in the page or outside of it. The other item with blobs – currently if I’m not mistaken blobs are stored each in its own “segment” which means having many blobs in the same row is going to be slow. For medium sized blobs it would be much better to merge them and store on the same segment and ie only store blobs over 1M in their own segments.
  6. Clustering. I love Innodb’s ability to cluster data by primary key it is so helpful in many cases, but in other cases you would like to cluster data by index which is not unique – for example cluster data by customer or by timestamp. Creating artificial second column is ugly and inconvenient. Having independent CLUSTER=key option would be great, it also would help to avoid penalty of very long primary key – you can simply select not to cluster by it instead of creating fake primary key and promoting real one to unique.
  7. Tablespaces. To call “innodb_file_per_table” option tablespaces is a joke. It is better than nothing but needs to be improved allowing many to many map of objects to the tablespaces. Especially being able to map different indexes to different tablespaces is important. This could be helpful in certain workloads. Yes you could simply use RAID and forget about tablespaces by spreading the load, but you often can do it better manually 🙂
  8. I/O On Linux you’re currently limited to innodb_file_io_threads=4. Some people do not know it but setting it to different values does not have any effect 🙂 In certain cases multiple threads for some tasks would be helpful, especially for flushing dirty buffers. I would actually like to see it separated to different options – for example it does not make much sense to have more than one log flush thread. The other big lacking (especially as patch for it existed for years) is Async IO support in Linux. It should be very helpful with direct IO as well as with innodb_file_per_table=1 allowing to flush all of modified tablespaces in the parallel.
  9. Multiple page sizes. One size does not fit all and for certain workloads 16K is too much while for some others too little. Being able to specify page size per object or at least per tablespace would be great. This will complicate buffer management a bit but Innodb is already to deal with 2 page sizes by supporting compressed pages so it might be not that bad. Also other guys are able to handle it – even MyISAM has 1K-4KB key blocks so it is well possible
  10. Scalability It is actually bug but it is so bad I will list it. Basically Innodb has problems scaling both with large number of active threads and with large number of CPUs (in this bug you can see it is too bad even with 4 concurrent queries). The problem seems to be at least partially due to global mutexes in many places so it is not easy to fix. But I guess Heikki will have to do it otherwise with raise of MultiCores Innodb will become unusable for more and more types of workloads

What would your Innodb feature wishes be ?

28 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
letenky

In my list FULLTEXT would be #1

Paul Wieland

Fulltext indexing on InnoDB tables. I’ve been dying to have this for years!!

pudo

(I have to use MySQL 4.0 at work and I’m not sure this is not already there in a later version, but): I’d love to see FULLTEXT indices in InnoDB. MyISAM has it. InnoDB needs it.

Otherwise, that’s a neat list.

Joe

I 3rd fulltext. It would solve a number of problems for me.

n

Why bother fixing a inherently broken database-system when PostgreSQL and Firebird are freely available?

Justin Mecham

FULLTEXT support would definitely be useful in InnoDB.

Francesco Simi

We need Unicode (utf8) support in MyODBC driver for Windows.

Broken for 4 years now… killing the windows platform for MySQL.

Damon

Free hot backup

Gokhan Demir

1. MySQL to open interface fulltext search and innoDB implement it.
2. Descending indexes.
3. Better index usage on range queries.

Gokhan Demir

select count(*)
from Statement T
join CurrDailyVal Q on Q.cur1 = T.cur1
and Q.cur2 = ‘USD’
and Q.xdate = ( select max(xdate) from CurrDailyVal
where cur1 = Q.cur1
and cur2 = ‘USD’
and xdate

Gokhan Demir

< = T.xdate )

This is a select from an income statement table to report incomes in a different currency. CurrDailyVal table holds the exchange values. Its primary key is cur1, cur2, xdate.

This select is an example to my previous post’s item 2 and 3.

Item 2: I believe descending indexes would be helpful here since in the inner select i find a max date when approaching the nearest exchange value.

Item 3: MySQL does not use xdate in the inner select. It just uses cur1 and cur2 and narrows the potential result set and then make a scan thru the remaining rows to find the max date.

Gokhan Demir

According to the documentation at MySQL site, Peter is right. MySQL should use all three columns in the inner select. However, I have faced performance problems in the real world which made me think MySQL does not use the range indexes efficiently.

For the example query I previously attached here, I have prepared a test case, where you can see what I mean. To better illustrate what I mean, I have put rather big amounts of data to both tables. Statement table has 172806 rows, and CurrDailyVal has 638 rows. You can download the test case at http://igonline.biz/gdemir/data.tar.bz2 and try for yourself. The test case file itself is 1367455 bytes (~1.3 MB).

Gokhan Demir

This same test case takes 30 seconds to complete with Sybase ASE 15. Showplan output clearly states the usage of all three columns in the correlated subquery. Now, I have a production system with this kind of select and the data is growing day by day, I don’t know what to do.

Gokhan Demir

Why can Sybase use the three columns and MySQL cannot?

Gokhan Demir

Hi Peter,

Thanks for the suggestion. Before submitting this case as a bug, I made a search through the bug system and found that bug, most probably very similar to this.

http://bugs.mysql.com/bug.php?id=18465

Therefore, I decided not to submit this case as a bug.

By the way, especially nowadays, it is getting too much time for mysql developers to fix the bugs. There are even some severity 1 bugs open for six months or so, and no response from MySQL at all.

Best Regards,
Gokhan

Dmitry

simple “SELECT COUNT(*) FROM table” takes minutes/hours to run on tables with xxM records, can’t InnoDB calculate it based on Primary Key instantly?

Marcin

Endless ibdata grow… #1 place on my list