October 22, 2014

Mail clients and Databases

I get a lot of mail and I prefer to keep it for a long time, if not forever. With modern hard disk sizes that should not be a problem at all, but because of how mail programs are written it causes a lot of problems.

I’ve tried a lot of programs (KMail, Evolution and Thunderbird on Linux, Outlook and The Bat! on Windows) and they all seem to have the same problem: they assume that the mail messages, or at least some portion of them, will fit in memory.

For example, at this point I got tired of Thunderbird handling my 1GB inbox (in fact my Inbox holds fewer than 1,000 emails; the rest are “Deleted”, but Thunderbird still keeps them in the same file), so I decided to move some 70,000 messages to a specially created “archive” folder. This makes Thunderbird consume about 2GB of memory, and I’m not sure it will be able to complete the operation at all, as it is already running low on virtual memory.

This is not my only problem with these programs. The second one is crash recovery: after corruption caused by a power loss or a full disk, I see the index being rebuilt, which is far from enjoyable with large amounts of data.

So what has always puzzled me is why these mainstream solutions do not use some form of database, which would handle both recovery and memory consumption, since databases are usually designed to handle large data sizes with a limited amount of memory. MySQL in its embedded version could be cool, but if not there are a bunch of others such as BDB, SQLite, or even JET if we count Microsoft solutions.
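To make the idea a bit more concrete, here is a minimal sketch of a header index kept in SQLite, using Python's standard `sqlite3` module. The schema, the function names and the `body_offset` column are all made up for illustration; this is not how any of the clients above actually work. The point is only that listing a folder or moving messages between folders becomes a query or a transactional update on small rows, without loading message bodies into memory.

```python
import sqlite3

# Hypothetical schema: only the header fields needed for the message list
# are indexed; bodies stay in the mail file and are located by offset.
conn = sqlite3.connect(":memory:")  # a real client would use an on-disk file
conn.execute("""
    CREATE TABLE headers (
        id          INTEGER PRIMARY KEY,
        folder      TEXT    NOT NULL,
        sender      TEXT    NOT NULL,
        subject     TEXT    NOT NULL,
        sent_at     INTEGER NOT NULL,  -- Unix timestamp
        body_offset INTEGER NOT NULL   -- where the full message starts on disk
    )
""")
conn.execute("CREATE INDEX headers_folder_date ON headers (folder, sent_at)")


def add_message(folder, sender, subject, sent_at, body_offset):
    """Record one message's headers in the index."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT INTO headers (folder, sender, subject, sent_at, body_offset) "
            "VALUES (?, ?, ?, ?, ?)",
            (folder, sender, subject, sent_at, body_offset),
        )


def move_messages(src, dst, before):
    """Move every message in src sent before the given timestamp to dst."""
    with conn:  # one transaction: either all rows move or none do
        cur = conn.execute(
            "UPDATE headers SET folder = ? WHERE folder = ? AND sent_at < ?",
            (dst, src, before),
        )
    return cur.rowcount


def list_folder(folder, limit=50):
    """Return the newest messages of a folder, reading header rows only."""
    return conn.execute(
        "SELECT sender, subject, sent_at FROM headers "
        "WHERE folder = ? ORDER BY sent_at DESC LIMIT ?",
        (folder, limit),
    ).fetchall()
```

In this sketch archiving old mail is a single `move_messages("Inbox", "Archive", cutoff)` call, and an interrupted move leaves the index in its previous state instead of forcing a rebuild.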

Seriously, the only part you really need in memory to quickly show the list of messages, sort them and so on is the subject, the author and a few more fields from the header. That is no more than 200 bytes per message, which should allow handling folders with 1,000,000 messages in something like 200MB of memory.
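The arithmetic behind that estimate is simple enough to check. The helper below is just an illustration (the function name is made up), taking 200 bytes per message as the assumed upper bound and counting 1MB as 10^6 bytes:

```python
def header_cache_mb(messages, bytes_per_message=200):
    """Rough memory needed to keep one header record per message, in MB."""
    return messages * bytes_per_message / 10**6


print(header_cache_mb(1_000_000))  # 200.0, the million-message folder
print(header_cache_mb(70_000))     # 14.0, the 70,000 messages being archived
```

By the same estimate, the headers of the 70,000 messages I was moving would need only about 14MB.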

Interestingly enough, if we look at hosted solutions there are some with a database backend, such as Zimbra or DBMail.

About Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

Comments

  1. Sencer says:

    I strongly recommend you check out Claws Mail (formerly Sylpheed Claws), to which I switched from Thunderbird just recently (hence the evangelism, I guess. Sorry, I can’t help it). It is a lot more conservative with resource usage, and it is a lot more responsive than Thunderbird. You can import mbox files (which TB uses) individually, or there is also a script available if you want to migrate a whole folder hierarchy from Thunderbird. I wrote down my experience and have a few links if you follow the link to my site.

    Other than that, yes, I agree that it is odd that so few mail programs make use of SQL-based storage. I think at least partly it has to do with history (a lot of the better-known mail programs have a history that goes a long way back).

  2. evan says:

    Apple’s Mail.app uses SQLite. It doesn’t do any vacuuming or maintenance of the db file if it becomes fragmented though. But SQLite does seem ideal for this task.

  3. peter says:

    Sencer,

    I guess I should. I looked into it a couple of years ago and had some problems with it; it was something silly like ugly fonts or something similar.

  4. peter says:

    Evan,

    Why do all good things happen on the Mac?

    I probably should try to waste money on a Mac next time I upgrade my laptop :)

  5. thenexus6 says:

    If we look at hosted solutions we can see the 10-year-old Cyrus IMAP server :)
    It stores mail headers in BDB, so it is amazingly fast on large mailboxes. I’m keeping my 1GB mailbox on IMAP plus several mailing list archives (more than 1GB) in the shared folders, and Sylpheed, Thunderbird, Evolution and Outlook Express all work fast. Very fast.

  6. I totally agree with you on the subject and I have asked myself the question many times.

    I guess part of the problem lies in the fact that most databases have to be installed as a service on the computer and hence create a rather large footprint for the software. I guess that SQLite and solutions like it will change that in the future.

    Are there other small database solutions that would fit this kind of use?

  7. ChristianS says:

    It’s a bit off-topic, peter:

    > I got tired of Thunderbird handling my 1GB inbox (In fact my Inbox
    > holds less than 1000 of emails rests are “Deleted” but Thunderbird
    > still keeps it in the same file)

    Ever tried “File -> Compact Folders”, which reorganizes your mailbox folders and purges deleted messages?

  8. peter says:

    Sure. But the point is I do not really want to remove deleted mail from the “Deleted” box, and if you do not do that, purging does not help, as Deleted really seems to be stored in the Inbox file.

    Compacting helps if you really have few messages and a large number of ones you permanently deleted.

  9. Apachez says:

    Use GMail, problem solved… NEXT! ;-)

  10. peter says:

    Regarding IMAP: I’ve tried it many times but always went back to downloading email, because I have not found how to set up the system so that it really caches ALL email locally and works without an Internet connection, like on a plane.

  11. peter says:

    Apachez,

    I want to have my mail locally so I can use it without being connected to the Internet, so Gmail would not work.
    Not to mention I do not want to use another domain for my mail.

  12. Andrisi says:

    There are several other types of applications that could use a database under the hood. Maybe someone should support this explicitly so developers realize the potential. I think a database should be a part of the OS. Store all preferences, configurations, etc. The Win32 registry is a nice start, but try to list the available file types: you’ll see that a relational schema and some SQL would be much better for this. (Someday I’ll make it happen :-)

  13. Just for the record, M2 (Opera’s email client) uses a database to store its messages. I’ve never used it properly (too many different identities which it doesn’t seem to handle well) but it seems very interesting.

    There was a lot more innovation a couple of years back – see my roundup at http://peter.mapledesign.co.uk/weblog/archives/most-email-clients-are-obsolete. Sadly most that interested me have since gone out of production.

  14. Peter, I use Thunderbird on both Linux (Fedora Core 5/6/7) and Windows, and while I have to agree with most of your complaints, you CAN store your e-mail locally while using an IMAP server, and check your messages while offline. I do this by going File -> Offline -> Download/Synch Now. It may take a while to download 1GB of e-mail (that’s about how much I have, after deleting spam and messages I don’t need to keep [e.g., forum post notifications]).

    I can’t say it works flawlessly, because if you check your e-mail from multiple locations, like I do, you may have to do this routine frequently to keep everything synched, but I haven’t run into too many issues with it, and as a bonus, I can access old e-mail quickly, without a round-trip to the server to download it.
