Recently I had a chance to take a look at the Redis project, which is a semi-persistent in-memory database with an idea somewhat similar to memcache but with a richer feature set.
Redis has a simple single-process, event-driven design, which means it does not have to deal with locks, which are a performance killer for a lot of applications. This, however, limits its scalability to a single core. Still, at 100K+ operations a second, this single-core performance will be good enough for many applications. Also, nothing stops you from running many Redis instances on a single server to take advantage of multiple cores.
I call Redis semi-persistent because it does not store data on disk immediately but rather dumps its whole database every so often – you can configure the time and the number of updates between database dumps. Because the dump is basically a serial write, Redis does not need an expensive IO subsystem. Also, because the dump runs in the background, it does not affect read/write performance against the in-memory database. In the tests I've done I've seen Redis writing some 4MB/sec for probably 50% of the test duration, where Innodb had to write 50MB/sec for about a third of the throughput, doing a lot of random IO as it went. This is, among other things, because Innodb has to flush full 16K pages when it flushes.
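The dump schedule is controlled by `save` directives in redis.conf; as a sketch (the exact default values may differ between versions), multiple lines combine so that a dump is triggered when any of the conditions is met:

```
# redis.conf – dump the dataset to disk if at least <changes> keys
# changed within <seconds> seconds
save 900 1      # after 900 sec if at least 1 key changed
save 300 10     # after 300 sec if at least 10 keys changed
save 60 10000   # after 60 sec if at least 10000 keys changed
```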
The background flush in Redis is designed the following way: the Redis process forks, and the child just dumps the database in the background. Unix copy-on-write takes care of making another copy of pages as they are modified, which keeps the overhead rather low. The database is dumped to a temporary file, which is renamed only after fsync, which means if you crash during the dump you simply discard the partial file.
I also liked the full pipelining support in the protocol – you can send multiple commands at once – any commands – and the Redis server will process them in order and return the results to you. This allows not only multi-get and multi-set but any batch of commands to be submitted. The API support for this is relatively weak, but the features are there.
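To make the pipelining idea concrete, here is a minimal sketch in Python that encodes several commands in the Redis multi-bulk wire format and concatenates them for a single write; the encoder is my own illustration, not an official client:

```python
def encode_command(*args):
    """Encode one command in the Redis multi-bulk wire format."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode()
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)

# Pipelining: concatenate any commands and send them in one write;
# the server executes them in order and returns the replies in order.
pipeline = b"".join([
    encode_command("SET", "counter", 1),
    encode_command("INCR", "counter"),
    encode_command("GET", "counter"),
])
# sock.sendall(pipeline)  # then read the three replies back, in order
```

Because replies come back in the same order as the commands, the client only needs one round trip for the whole batch.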
When it comes to data types, Redis supports simple key-value storage just as memcache does, but it also adds support for Lists, which are similar to the linked list data type you would expect, as well as Sets, which store sets of strings with support for various set operations.
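As a rough illustration of the semantics, here is a toy in-memory model in Python – not the real client or server, just command names mirroring Redis with simplified behavior:

```python
class ToyRedis:
    """Toy in-memory model of a few Redis list/set commands (illustrative only)."""
    def __init__(self):
        self.data = {}

    # Lists: push to head/tail, fetch a range (like LPUSH/RPUSH/LRANGE)
    def lpush(self, key, value):
        self.data.setdefault(key, []).insert(0, value)
    def rpush(self, key, value):
        self.data.setdefault(key, []).append(value)
    def lrange(self, key, start, stop):
        lst = self.data.get(key, [])
        return lst[start:] if stop == -1 else lst[start:stop + 1]

    # Sets: add members, intersect two sets (like SADD/SINTER)
    def sadd(self, key, member):
        self.data.setdefault(key, set()).add(member)
    def sinter(self, key1, key2):
        return self.data.get(key1, set()) & self.data.get(key2, set())

r = ToyRedis()
r.rpush("queue", "job1"); r.rpush("queue", "job2"); r.lpush("queue", "job0")
print(r.lrange("queue", 0, -1))      # ['job0', 'job1', 'job2']
r.sadd("tags:a", "x"); r.sadd("tags:a", "y"); r.sadd("tags:b", "y")
print(r.sinter("tags:a", "tags:b"))  # {'y'}
```

The list commands make Redis usable as a simple queue, and the set operations (intersection, union, difference) run server-side rather than in the application.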
I kind of miss support for something like an associative array/hash/map among the data types, but I guess there is nothing in the architecture that would stop it from being added later.
Redis also has support for “databases”, which are basically key spaces inside the server. This should allow the server to be used by different applications, different versions, for testing, or for other purposes. Though you’ve got to be careful with these – there is only simple per-instance password-based authentication in place, so if an application can talk to the instance, it can access all databases.
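You switch databases with the SELECT command; an illustrative redis-cli session (database 0 is the default):

```
redis> SELECT 1
OK
redis> SET foo bar
OK
redis> SELECT 0
OK
redis> GET foo
(nil)
```

A key set in database 1 is invisible from database 0, but nothing prevents a connected client from issuing SELECT itself.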
I am also a bit surprised that databases are numbered instead of named. Naming the databases would make it simpler to avoid unwanted conflicts, etc.
Redis also supports master/slave replication out of the box, and it is extremely simple. You just specify which node to replicate from, and that is it. It is even simpler than with MySQL, as you do not need to deal with snapshots or binary log positions. Replication is asynchronous and low overhead – Redis will perform a database dump and accumulate the commands run against the data since the start of the process. The slave gets the dump (which is basically the set of commands to populate the database itself) and then receives the stream of commands from the master as they come. I did not benchmark the replication capacity, but I'd expect it to be close to the 100K writes/sec a single instance can handle.
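Setting up a slave is a single directive in the slave's redis.conf (the host and port here are placeholders):

```
# redis.conf on the slave – replicate from the given master
slaveof 192.168.0.10 6379
```

The same thing can also be done at runtime by sending the SLAVEOF command to a running instance.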
The benchmarks I've done were for an application which is very update-intensive, with updates being pretty much random single-row updates which are hard to batch. With MySQL/Innodb I got the server handling some 30,000 updates/sec on a 16-core server, with replication able to handle 10,000 updates/sec. This was using about 5 cores, so you could probably run 4 MySQL instances on this server and get up to 100K updates/sec, with up to 40K updates/sec able to replicate.
With Redis I got about 3 times more updates/sec – close to 100,000 updates/sec, with about 1.5 cores being used. I have not tried running multiple instances, and I'm not sure the network and TCP stack would scale linearly in this case, but either way we're speaking about hundreds of thousands of updates/sec.
I think Redis can be a great piece of architecture for a number of applications. You can use it as a database or as a cache (it supports data expiration too).
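Expiration is set per key; an illustrative redis-cli session (the key name is made up, and the exact TTL reported may vary by a second):

```
redis> SET session:42 "some data"
OK
redis> EXPIRE session:42 60
(integer) 1
redis> TTL session:42
(integer) 60
```

Once the timeout passes, the key is gone, which is what makes the cache use case work without an external eviction job.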
I have not benchmarked it against memcache in terms of performance and memory usage. This may be another project to look at.