Looking at Redis


Recently I had a chance to take a look at the Redis project, a semi-persistent in-memory database with an idea somewhat similar to memcache but with a richer feature set.

Redis has a simple single-process, event-driven design, which means it does not have to deal with any locks, which are a performance killer for a lot of applications. This, however, limits its scalability to a single core. Still, with 100K+ operations a second, this single-core performance will be good enough for many applications. Also, nothing stops you from running many Redis instances on a single server to take advantage of multiple cores.
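To make the multi-instance idea concrete, here is a minimal sketch of how a client might spread keys over several instances on one box. The instance list and its ports are hypothetical, and hashing with CRC32 is just one simple choice, not something Redis prescribes.

```python
import zlib

# Hypothetical local instances, one per core; addresses are illustrative only.
INSTANCES = [
    ("127.0.0.1", 6379),
    ("127.0.0.1", 6380),
    ("127.0.0.1", 6381),
    ("127.0.0.1", 6382),
]


def instance_for(key, instances=INSTANCES):
    """Map a key to one instance by hashing it.

    The same key always lands on the same instance, so reads find
    what earlier writes stored.
    """
    digest = zlib.crc32(key.encode("utf-8"))
    return instances[digest % len(instances)]
```

The application would open a connection to whichever address `instance_for(key)` returns. Note that with plain modulo hashing, changing the number of instances remaps most keys.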

I call Redis semi-persistent because it does not store the data on disk immediately but rather dumps its whole database every so often; you can configure the time and number of updates between database dumps. Because the dump is basically a serial write, Redis does not need an expensive IO subsystem. Also, because this dump happens in the background, it does not affect read/write performance of the database, which is in memory. In the tests I’ve done I’ve seen Redis writing some 4MB/sec for probably 50% of the test duration, where Innodb had to write 50MB/sec for about a third of the throughput, doing a lot of random IO as it did so. This is, among other things, because Innodb has to flush full 16K pages when flushing.
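The "time and number of updates" rule can be sketched as a small policy object. This is an illustration of the idea rather than Redis code: the class name, method names and the thresholds in the usage note are all made up for the example, and time is passed in explicitly to keep it deterministic.

```python
class SavePolicy:
    """Decide when a dump is due.

    Each save point is a (seconds, changes) pair: a dump is due once at
    least `seconds` have passed since the last dump AND at least
    `changes` updates have happened in that time.
    """

    def __init__(self, save_points, now=0.0):
        self.save_points = save_points
        self.dirty = 0          # updates since the last dump
        self.last_save = now    # time of the last dump

    def record_update(self):
        self.dirty += 1

    def dump_due(self, now):
        elapsed = now - self.last_save
        return any(
            elapsed >= seconds and self.dirty >= changes
            for seconds, changes in self.save_points
        )

    def mark_dumped(self, now):
        self.dirty = 0
        self.last_save = now
```

With `SavePolicy([(60, 1000), (900, 1)])`, a busy server would dump about once a minute, while a mostly idle one would still dump within 15 minutes of a single change.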

The background flush in Redis is designed the following way: the Redis process forks and the child just dumps the database it has in the background. Unix copy-on-write takes care of getting another copy of pages as they are modified. This keeps overhead rather low. The database is dumped to a temporary file, which is renamed only after fsync, which means that if you crash during the dump you simply discard the partial file.
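The fork, dump, fsync, rename sequence can be sketched in a few lines. This is a rough illustration on a POSIX system, not how Redis is written: the function names are hypothetical, and JSON stands in for the real dump format.

```python
import json
import os
import tempfile


def dump_atomically(data, path):
    """Write data to a temporary file, fsync it, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix="temp-dump-")
    with os.fdopen(fd, "w") as f:
        json.dump(data, f)
        f.flush()
        os.fsync(f.fileno())      # make sure the bytes are on disk first
    os.replace(tmp_path, path)    # rename only after the fsync


def background_dump(data, path):
    """Fork a child that dumps `data`; the parent returns immediately.

    After the fork the child sees the data as it was at fork time;
    the parent can keep modifying its own copy.
    """
    pid = os.fork()
    if pid == 0:                  # child process
        status = 1
        try:
            dump_atomically(data, path)
            status = 0
        finally:
            os._exit(status)      # never return into the parent's code
    return pid                    # parent: child's pid, to be waited on later
```

If the process dies mid-dump, the file at `path` is still the previous complete dump; only a stray temporary file is left behind.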

I also liked the full pipelining support in the protocol: you can send multiple commands at once (any commands) and the Redis server will process them in order and return the results to you. This allows not only for multi-get and multi-set but for any batch of commands to be submitted. The API support for this is relatively weak but the features are there.
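As a sketch of what pipelining looks like on the wire, the helper below builds one payload containing several commands. It assumes the length-prefixed request format used by current Redis versions, which may differ from the protocol of the version I looked at; the function names are my own.

```python
def encode_command(*args):
    """Encode one command as a length-prefixed array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode("utf-8")
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def encode_pipeline(commands):
    """Concatenate several encoded commands into a single payload."""
    return b"".join(encode_command(*cmd) for cmd in commands)


# Three unrelated commands batched into one network write.
payload = encode_pipeline([
    ("SET", "counter", 10),
    ("INCR", "counter"),
    ("GET", "counter"),
])
```

A client would send `payload` with a single socket write and then read three replies back, in the same order as the commands.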

When it comes to data types, Redis supports simple key-value storage just as memcache does, but it also adds support for lists, which are similar to a linked list data type, as well as sets, which store sets of strings and support various set operations.
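For readers who have not used these types, here is a rough analogy using plain Python structures. It is not a Redis client; the key names are invented, and the command names in the comments are the ones current Redis uses for the corresponding operations.

```python
from collections import deque

# A list: ordered, with cheap pushes at either end.
recent = deque()
recent.appendleft("job:1")      # compare LPUSH recent job:1
recent.appendleft("job:2")      # compare LPUSH recent job:2
recent.append("job:0")          # compare RPUSH recent job:0
first_two = list(recent)[:2]    # compare LRANGE recent 0 1

# Sets: unordered collections of unique strings with set algebra.
tag_redis = {"post:1", "post:2", "post:3"}
tag_mysql = {"post:2", "post:3", "post:4"}
both = tag_redis & tag_mysql    # compare SINTER tag_redis tag_mysql
either = tag_redis | tag_mysql  # compare SUNION tag_redis tag_mysql
```

The difference in Redis is that these operations run inside the server, so only the result crosses the network.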

I kind of miss support for something like an associative array/hash/map among the data types, but I guess there is nothing in the architecture that would stop it from being added later.

Redis also has support for “databases”, which are basically key spaces inside the server. This should allow the server to be used by different applications, different versions, testing or some other features. You have to be careful with these, though: there is only simple per-instance password-based authentication in place, so if an application can talk to the instance it can access all databases.

I am also a bit surprised that databases are numbered instead of named. Naming the databases would make it simpler to avoid unwanted conflicts etc.

Redis also supports master/slave replication out of the box and it is extremely simple. You just specify which node to replicate from and that is it. It is even simpler than with MySQL, as you do not need to deal with a snapshot or binary log position. Replication is asynchronous and low overhead: Redis will perform the database dump and store the commands applied to the data since the start of that process. The slave can get the data (which is basically a set of commands to populate the database itself) and then get the data from the master as it comes. I did not benchmark the replication capacity but I’d expect it to be close to the 100K writes/sec a single instance can handle.
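The snapshot-plus-buffered-commands flow described above can be modelled in a few lines. This is a toy model of the idea only, with invented class and method names; it says nothing about how Redis implements replication internally.

```python
class Master:
    """Toy master: keeps data and buffers writes while a sync is running."""

    def __init__(self):
        self.data = {}
        self.backlog = None       # None means no sync in progress

    def set(self, key, value):
        self.data[key] = value
        if self.backlog is not None:
            self.backlog.append((key, value))

    def start_sync(self):
        """Begin a sync: start buffering and hand out a snapshot."""
        self.backlog = []
        return dict(self.data)

    def finish_sync(self):
        """Return the writes that arrived since the snapshot was taken."""
        pending, self.backlog = self.backlog, None
        return pending


class Slave:
    """Toy slave: loads a snapshot, then applies commands as they come."""

    def __init__(self):
        self.data = {}

    def load_snapshot(self, snapshot):
        self.data = dict(snapshot)

    def apply(self, commands):
        for key, value in commands:
            self.data[key] = value
```

The slave never needs a log position: it loads the snapshot, replays whatever was buffered meanwhile, and from then on applies each new command as the master forwards it.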

The benchmarks I’ve done were for an application which is very update intensive, with updates being pretty much random single-row updates which are hard to batch. With MySQL/Innodb I got the server handling some 30,000 updates/sec on a 16-core server, with replication able to handle 10,000 updates/sec. This was using about 5 cores, so you could probably run 4 MySQL instances on this server and get up to 100K updates/sec, with up to 40K updates/sec being able to replicate.

With Redis I got about 3 times more updates/sec, close to 100,000 updates/sec, with about 1.5 cores being used. I have not tried running multiple instances and I’m not sure the network and TCP stack would scale linearly in this case, but either way we’re talking about hundreds of thousands of updates/sec.

I think Redis can be a great piece of architecture for a number of applications. You can use it as the database or as a cache (it supports data expiration too).

I have not benchmarked it against memcache in terms of performance and memory usage. This may be another project to look at.


Comments

  1. says

    I would think that, rather than using the ‘databases’ feature, one would run multiple redis servers so that each “database” could have full access to a CPU, since the application is single threaded.

  2. says

    Justin,

    Yes. It is just a different angle. If you have many applications which all have very light load, maintaining multiple instances can be a pain. But yes… this is surely another option.

  3. says

    As for associative data types, can’t you work around that using key/value pairs:

    key1:1 = blah
    key1:2 = blah2
    key1:3 = blah3
    key1:7 = blah7
    key1:apple = bologna

    key1:map = 1:2:3:7:apple — used for getting all keys with the key1 prefix

  4. says

    Justin,

    Not really. Think about arrays which are being modified very actively, where one thread wants to add an element to the map and another wants to remove it?

    Actually in redis it is a bit easier, as you can keep the names of the keys in a LIST which you can add values to. Still, I think something like an associative/sparse array would be very convenient.

  5. says

    Justin,

    It is single threaded for internal processing but it serves multiple clients at once so there could be other clients served between requests from the single client.

  6. says

    One thing you forgot to mention (I think), is that you *must* have enough memory to store your dataset in. There is no support for paging to disk.

    I’m still using it for various small projects, mostly because the set operations are hot.

  7. says

    peter,

    Just wanted to be clear. If you compare it to Tokyo Tyrant or something similar, it’s something to keep in mind.

  8. says

    Hello! Thanks for the review. I’m near to be back to work (this year I was able to stop the full August exceptionally…) and among the issues I’m going to solve there is the problem (for some kind of app is a problem, for other domains it is a strength IMHO) of Redis not supporting paging and being all in memory.

    Currently it’s not clear to me if I’ll implement an abstract interface in order to implement “storage engines” where the current one will just be the memory-storage-engine, followed by a new one called disk-storage-engine, or if I’ll try to implement paging in a way similar to operating systems, that is Redis objects that were not accessed recently will get swapped on disk and reloaded if needed. I think I’ll go for the first solution of the different storage engines in order to avoid reinventing the OS wheel.

    Probably the disk-storage-engine will not support every operation supported by the memory one in the first release, and it is possible even to have multiple storage engines for the disk and memory targets, for instance I (or somebody else) can write a memory storage engine that is slower but uses less memory.

    The first thing to address is the release of Redis 1.0 that is mostly a matter of small patches and documentation.

    Thanks again for this great review!

  9. Didier Spezia says

    One point I don’t like much with Redis is the event loop implementation, which is based only on select (whatever the platform). I do not expect it to scale well when the number of clients increases. This could be changed in the future though. Its competitors use libevent (memcached family), or at least a customized epoll implementation (Tokyo Tyrant).

  10. says

    Hello Didier,

    to change this is very simple, but the benchmark, even with a lot of clients, does not show a noticeable performance problem due to select(2), at least up to 50/100 clients. Anyway, Redis already uses an abstract interface (ae.c) to implement the event loop, so it’s just a matter of implementing epoll in ae.c once this becomes a problem. This issue was not addressed before because Redis is already one of the fastest kv stores available and there were other priorities, but after 1.0 stable it will be time to fix this problem.

  11. Dimitri says

    Hi Peter,

    I’m curious – did you try to run any tests with NDB engine?.. Because NDB is way more mature and already integrated within MySQL :-)

    Several years ago we already reached 1.5M(!) TPS (yes, per second!) with NDB. And currently the NDB team claims to be able to do much better :-)

    Rgds,
    -Dimitri

  12. says

    Antirez,

    I did not notice select, but this is indeed a problem as soon as you have to deal with many connections. 50-100 clients is a trivial amount if you use persistent connections. There are many memcache installations which are working with 10K+ connections. Some, though, just use UDP.

  13. says

    Dimitri,

    I think it is apples and oranges in this case. I looked at Redis as a persistent and more feature-rich memcache rather than anything else. I also wanted, and liked, the extreme simplicity. Getting master-slave redis up and running and accessing it from PHP took me 15 minutes.

  14. Istvan Podor says

    Hey Peter,

    Thanks for sharing. A few days ago we ran into a challenge where we thought the only way to serve the load was to get a bunch of servers. But now, with redis+nginx+varnish, we can serve 15k dynamic content with one single machine (not php sites, just dynamic content).

    Thanks for it.

  15. Andy says

    Peter,

    The one thing that strikes me is the performance penalty of MySQL replication. In your test replication reduces MySQL performance by 67% (from 30K to 10K updates/sec).

    67% is a very large performance degradation. Is it typical for replication to cause such a big slowdown? What is it about replication that causes such a huge performance drop? Is it the binlog? innodb_support_xa?

    What if I have a MySQL server without replication, but still have binlog & innodb_support_xa enabled for data recovery purpose? What kind of performance penalty would that incur? Hopefully a lot less than 67%?

    Andy

  16. says

    Andy,

    The overhead of replication is really rather small. The problem is that replication is single threaded, so slave capacity is pretty much limited to the performance you get when all queries are run by a single thread.

  17. Andy says

    Peter,

    I understand that the single-threadedness of the slave will slow down the slave, even cause it to lag behind the master. But that shouldn’t have any effect on the master, right? So what caused the master’s performance to drop from 30k updates/sec to 10k updates/sec when replication was turned on in your test?

    Andy

  18. Dainel, Wu says

    Hi Peter,

    1: I am looking at redis to see whether there is a chance to use it to replace the mysql memory engine. As there are some inserts, the table lock limits the throughput. One insert could touch/set many columns, say 10 columns, so if inserts run at 1000/second in mysql, then for redis that’s 1000*10=10,000 sets/second, as redis can’t set many columns in one command as mysql can (not sure whether I am right, but I looked through the doc and can’t find a way to do so). For a query on many columns, such as 10 columns, redis could return a list, so one query could be mapped to one get. So does that mean that if an app has lots of inserts/updates which touch many columns, then redis is not a good choice?

  19. says

    Daniel,

    There are multiple choices. The most common is to store serialized documents in Redis: if you change one of the columns, you basically change the whole document. This works well if your update is basically a replace. If it is something like an increment, you may want to store the column as a separate value.

  20. Dainel, Wu says

    Thanks Peter,

    Your test shows redis could do 300,000 updates/second. If using serialized documents, the entire row will be replaced. If the row size is 1,000 (in our case, the row size could go up to 8,000, and the row has about 50 columns), then every second 300,000 * 1,000=300M of data will be transferred over the network, which the network card can hardly support. So if I could operate on one column, that would save lots of network bandwidth.

  21. Dainel, Wu says

    Thanks for your reply, Peter,

    In our mysql table, there are about 20 columns, and the row size could go up to 2,000. Even if Redis could support 100,000 updates/second, the network bandwidth is a problem, as 100,000 * 2,000=200M per second if using serialized documents as you said.

  22. says

    Hello,

    just in order to add some context.

    Redis 1.1 (currently in beta, download it from Git) supports MSET (multi-set) and MGET (already in 1.0). So it’s possible to set multiple fields in a single operation.
    Redis 1.2 will support a Hash type.
    Redis 1.1 supports an append-only journal for better durability, with three different fsync() policies (never, every second, after every write).

    Redis 1.1 is 10x faster with operations like LRANGE or SMEMBERS or MGET involving more than 1 Kbyte of data. This is very important in your scenario if your objects are > 1k.

    For any info please drop a message into the Redis Google Group and we’ll try to help.

    Cheers,
    Salvatore (author of Redis)

  23. says

    Hi sir,

    i am new to the Grails and Redis .

    i have to build a project by using above technologies.

    i feel comfort with Grails but i coming to Redis i know basic command now to run it thats it i dont know more than that.

    can u help me out .

    give me the basic stuff which need to practice and i am using STS tool.give me the sample project which build on both Grails and Redis .

    and pls explain me how to configure Redis in config.groovy.

    i am waiting for ur help and suggestions .

    if possible please mail me to ” sivakotiuday@gmail.com” .

    thank you.

    Your Uday.
