In a previous blog post we described the pipelining performance optimizations we made in MailSite 10 to improve IMAP and TLS connections. That wasn't all we did in MailSite 10: we also decided to tackle a long-standing protocol flaw in IMAP, namely the lack of an efficient way to query for changes to a folder. This matters especially for modern mail folders, which can easily contain tens of thousands of email messages.
When you select a folder, IMAP by default is able to efficiently tell you how many messages exist in the folder, how many are recent, how many are unseen, and what the next UID is. This allows you to fetch all the new messages in the folder since you last asked, but it can't tell you what has been deleted or modified since you last asked. This leads to clients making frequent requests for the UID and FLAGS of every message in the folder. The client can then check the results against its local cache to work out what has been deleted and modified since it last asked.
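The client-side tally described above can be sketched roughly as follows. This is a minimal illustration, not code from any particular mail client: the function name `diff_against_cache` and the dictionary-of-flags representation are assumptions made for the example.

```python
from typing import Dict, FrozenSet, Set, Tuple

Flags = FrozenSet[str]


def diff_against_cache(
    cached: Dict[int, Flags], server: Dict[int, Flags]
) -> Tuple[Set[int], Set[int], Set[int]]:
    """Compare a full UID+FLAGS listing with a local cache.

    Both arguments map UID -> set of flags (a hypothetical representation).
    Returns (added, deleted, modified) sets of UIDs.
    """
    cached_uids = set(cached)
    server_uids = set(server)
    added = server_uids - cached_uids      # on the server, not yet cached
    deleted = cached_uids - server_uids    # cached, but gone from the server
    modified = {
        uid for uid in cached_uids & server_uids if cached[uid] != server[uid]
    }
    return added, deleted, modified


# Example: UID 2 was expunged, UID 3 gained \Seen, UID 4 is new.
cached = {1: frozenset({"\\Seen"}), 2: frozenset(), 3: frozenset({"\\Flagged"})}
server = {
    1: frozenset({"\\Seen"}),
    3: frozenset({"\\Flagged", "\\Seen"}),
    4: frozenset({"\\Recent"}),
}
added, deleted, modified = diff_against_cache(cached, server)
```

The cost is that the server has to produce the full listing on every request, even when only a handful of messages have changed, which is the inefficiency the rest of this post is about.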
RFC 5162 "IMAP4 Extensions for Quick Mailbox Resynchronization" addresses this flaw:
"This document defines an IMAP4 extension, which gives an IMAP client the ability to quickly resynchronize any previously opened mailbox as part of the SELECT command, without the need for server-side state or additional client round-trips. This extension also introduces a new response that allows for a more compact representation of a list of expunged messages (and always includes the Unique Identifiers (UIDs) expunged)."
The extension allows you to pass in synchronization tokens when you select a folder. The server is then able to respond with which messages have been deleted and which have been modified since those tokens were issued. Unfortunately, client support for this extension is currently poor, so we have no choice but to respond to the client's full UID and FLAGS request as quickly as we possibly can.
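For readers who haven't seen the extension, the following sketch shows roughly what such a SELECT looks like on the wire. The parameter layout (UIDVALIDITY, last known mod-sequence, optional known UID set) follows the RFC 5162 grammar, and the sample values mirror the style of the RFC's own examples. The helper `qresync_select` is hypothetical, omits the command tag, and does no quoting of mailbox names.

```python
def qresync_select(
    mailbox: str, uidvalidity: int, modseq: int, known_uids: str = ""
) -> str:
    """Build a SELECT command carrying QRESYNC synchronization tokens.

    Hypothetical helper for illustration only. A real client would also
    need to issue ENABLE QRESYNC first and quote the mailbox name.
    """
    params = f"{uidvalidity} {modseq}"
    if known_uids:
        # Optional set of UIDs the client already knows about.
        params += f" {known_uids}"
    return f"SELECT {mailbox} (QRESYNC ({params}))"


command = qresync_select(
    "INBOX", 67890007, 20050715194045000, "41,43:211,214:541"
)
```

A server supporting the extension can answer with VANISHED responses for expunged UIDs and FETCH responses for changed flags, so the client never has to request the full listing.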
Given that the vast majority of flags don't change, and that the flags that do change are likely to belong to the most recently received messages, we decided to extend the work we did in 9.6 to cache the UID+FLAGS responses for a folder. In that earlier work, we stored a file on the protocol server containing the UID+FLAGS response for a folder. When a client re-issued a UID+FLAGS request, we tested the previously cached data to find out whether its contents were still valid, and if they were, we streamed the response from disk rather than regenerating it from other data stores.
The problem with this approach was that if a single new message was delivered into the folder, the cache was no longer valid and had to be regenerated. If you then downloaded that new message, its recent flag was removed, requiring the cache to be regenerated again, and when you flagged the message as seen it was regenerated yet again. This meant that for folders whose content rarely changed the cache performance was excellent, but for busy archive folders, or for a user who never filed any mail out of their inbox, the cache performance was poor.
So in MailSite 10 we broke the UID+FLAGS cache file down into response volumes. Each volume stores the UID+FLAGS response for a range of 1000 UIDs. For example, the protocol server cache for folder 33 might hold a series of such volume files starting at UID 583686435.
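As a rough illustration of how a UID could map onto its volume, here is a minimal sketch. It assumes volumes are aligned to the first UID in the folder cache (583686435 in the example above) and are exactly 1000 UIDs wide. The function name `volume_range` and that alignment choice are assumptions, not a description of MailSite's actual file layout.

```python
from typing import Tuple

VOLUME_SIZE = 1000  # UIDs per cache volume, as described in the text


def volume_range(uid: int, base_uid: int, size: int = VOLUME_SIZE) -> Tuple[int, int]:
    """Return the (first_uid, last_uid) range of the volume holding `uid`.

    Assumes volumes are laid out back to back starting at `base_uid`.
    """
    if uid < base_uid:
        raise ValueError("uid precedes the first cached volume")
    index = (uid - base_uid) // size
    first = base_uid + index * size
    return first, first + size - 1


base = 583686435
first_volume = volume_range(583686435, base)
same_volume = volume_range(583687434, base)   # last UID of the first volume
next_volume = volume_range(583687435, base)   # first UID of the second volume
```

Because each volume covers a fixed UID range, a change to one message only ever touches one volume, which is what makes partial invalidation possible.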
When a UID+FLAGS request arrives, we test the validity of these cache volumes before deciding whether any of them can be used. To confirm validity, we use the last modified time of each volume file to issue a ChangesSince() call against the source folder. If any of the UIDs returned in the added, modified, or deleted sets falls within the range of that cache volume, we conclude the file is out of date and delete it to force regeneration. In the most common case, every volume except the one holding the responses for the highest, and therefore most recent, messages is still valid, so we only need to regenerate responses for a small subset of the target folder.
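The validity check can be sketched along these lines. The `Volume` class, the `changes_since` callable, and the in-memory change log are all stand-ins invented for this example. In particular, the real ChangesSince() reports added, modified, and deleted UIDs separately, whereas the stand-in here simply returns one flat list of changed UIDs.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Volume:
    first_uid: int
    last_uid: int
    modified_at: float  # last modified time of the cache file


def is_stale(volume: Volume, changes_since: Callable[[float], Iterable[int]]) -> bool:
    """A volume is stale if any UID changed after the file was written
    falls inside the volume's UID range."""
    changed = changes_since(volume.modified_at)
    return any(volume.first_uid <= uid <= volume.last_uid for uid in changed)


def stale_volumes(
    volumes: Iterable[Volume], changes_since: Callable[[float], Iterable[int]]
) -> List[Volume]:
    """Return the volumes that must be deleted and regenerated."""
    return [v for v in volumes if is_stale(v, changes_since)]


# Hypothetical change log of (timestamp, uid) pairs standing in for the folder.
change_log = [(150.0, 2500), (250.0, 2999)]


def fake_changes_since(timestamp: float) -> List[int]:
    return [uid for when, uid in change_log if when > timestamp]


volumes = [Volume(1000, 1999, 100.0), Volume(2000, 2999, 200.0)]
stale = stale_volumes(volumes, fake_changes_since)
```

In this example only the second volume is reported stale, since the one change made after it was written (UID 2999) falls in its range. The first volume has seen no changes in its own range and can still be streamed from disk.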
As a result of these changes, we saw an 88% improvement in the time taken to select large sample folders.