I get a lot of mail and I prefer to keep it for a long time, if not forever. With modern hard disk sizes that should not be a problem at all, but because of how mail programs are written it causes a lot of problems.
I’ve tried a lot of programs (KMail, Evolution and Thunderbird on Linux, Outlook and The Bat! on Windows) and they all seem to have the same problem: they appear to assume that the mail messages, or at least some portion of them, will fit in memory.
At this point, for example, I got tired of Thunderbird handling my 1GB inbox (in fact my Inbox holds fewer than 1,000 emails; the rest are “Deleted”, but Thunderbird still keeps them in the same file), so I decided to move some 70,000 messages to a specially created “archive” folder. This makes Thunderbird consume about 2GB of memory, and I’m not sure it will be able to complete the operation at all, as it is already running low on virtual memory.
This is not my only problem with these programs. The second one is crash recovery: after corruption due to a power failure or lack of disk space I see an index rebuild being done, which is far from enjoyable on large data sizes.
So what has always been interesting to me is why these mainstream solutions do not use some form of database, which would handle both recovery and memory consumption, as databases are usually designed to handle large data sizes with a limited amount of memory. MySQL in its embedded version could be cool, but if not there are a bunch of others such as BDB, SQLite, or even JET if we count Microsoft solutions.
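To make the idea a bit more concrete, here is a rough sketch of what a database-backed mail store could look like, using SQLite through Python's standard sqlite3 module. The table layout and the names open_store, add_message and move_folder are my own assumptions for illustration, not how any of the programs above actually work. The point is that moving messages to an archive folder becomes a single transactional UPDATE instead of rewriting a huge mailbox file.

```python
import sqlite3

def open_store(path=":memory:"):
    """Open (or create) a hypothetical mail store; the schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        " id INTEGER PRIMARY KEY,"
        " folder TEXT NOT NULL,"
        " subject TEXT,"
        " sender TEXT,"
        " date_utc INTEGER,"
        " raw BLOB NOT NULL)"
    )
    # Index so that listing a folder by date does not scan every message.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_folder_date"
        " ON messages(folder, date_utc)"
    )
    return conn

def add_message(conn, folder, subject, sender, date_utc, raw):
    # "with conn" commits on success and rolls back if an exception is raised.
    with conn:
        conn.execute(
            "INSERT INTO messages(folder, subject, sender, date_utc, raw)"
            " VALUES (?, ?, ?, ?, ?)",
            (folder, subject, sender, date_utc, raw),
        )

def move_folder(conn, src, dst, older_than):
    """Move messages older than a timestamp from src to dst in one transaction."""
    with conn:
        cur = conn.execute(
            "UPDATE messages SET folder = ? WHERE folder = ? AND date_utc < ?",
            (dst, src, older_than),
        )
    return cur.rowcount  # number of messages moved
```

With something like this, the message bodies never have to be loaded to move them: the database only updates the folder column, and if the machine loses power in the middle, the transaction is rolled back on the next open rather than leaving an index to rebuild.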
Seriously, the only part you really need in memory to quickly show a list of messages, sort them, etc. is the message subject, author and a few more fields from the header. That is no more than 200 bytes per message, which should allow handling folders with 1,000,000 messages in something like 200MB of memory.
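The arithmetic behind that estimate can be checked with a small sketch. The fixed 200-byte record layout below (date, size, flags, 64 bytes of sender, 120 bytes of subject) is purely an assumption of mine to show that the numbers are plausible; a real mail client would choose its own fields and would have some extra overhead on top.

```python
import struct

# Hypothetical fixed-width header record, little-endian, no padding:
# 8-byte date, 4-byte size, 4-byte flags, 64-byte sender, 120-byte subject.
HEADER_RECORD = struct.Struct("<qII64s120s")

def pack_header(date_utc, size, flags, sender, subject):
    # The "s" format pads short strings with NUL bytes and truncates long ones.
    return HEADER_RECORD.pack(
        date_utc, size, flags,
        sender.encode("utf-8"), subject.encode("utf-8"),
    )

def unpack_header(record):
    date_utc, size, flags, sender, subject = HEADER_RECORD.unpack(record)
    # Strip the NUL padding; ignore a multi-byte character cut by truncation.
    return (
        date_utc, size, flags,
        sender.rstrip(b"\x00").decode("utf-8", errors="ignore"),
        subject.rstrip(b"\x00").decode("utf-8", errors="ignore"),
    )

def index_bytes(message_count):
    """Memory needed for a packed in-memory index of this many messages."""
    return message_count * HEADER_RECORD.size

print(HEADER_RECORD.size)      # 200 bytes per message
print(index_bytes(1_000_000))  # 200000000 bytes, i.e. about 200MB
```

Packed into one contiguous buffer, a million such records really do come to roughly 200MB, and the full message bodies can stay on disk until one is actually opened.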
Interestingly enough, if we look at hosted solutions there are some with a database backend, such as Zimbra or DBMail.