One of the major new features in MailSite 10 was an upgraded ActiveSync server supporting version 14 of the protocol. This allowed us to take advantage of the new ActiveSync account features in Outlook 2013 and offer full native end-to-end calendar and contact sync.
After performance tuning our ActiveSync V14 support in MailSite 10.0, we found to our surprise that syncing a large account was dramatically faster than syncing it against our IMAP server. In both cases the vast majority of the data transferred is identical message data, so the sync times ought to be very similar. This led us to set up an experiment to examine our IMAP performance in more detail; an exercise which turned out to be very fruitful indeed.
When we send a packet of data in our native services, we do so asynchronously using I/O Completion Ports.
"I/O completion ports provide an efficient threading model for processing multiple asynchronous I/O requests on a multiprocessor system. When a process creates an I/O completion port, the system creates an associated queue object for threads whose sole purpose is to service these requests. Processes that handle many concurrent asynchronous I/O requests can do so more quickly and efficiently by using I/O completion ports in conjunction with a pre-allocated thread pool than by creating threads at the time they receive an I/O request."
This means that after a thread asks the OS to send data for one connection, it can go off and process other connections while the thread pool waits for a completion packet indicating that the I/O is complete for that connection. At that point a worker thread receives the completion packet, prepares the next data to send to the client, and releases or re-uses the buffers used for the initial send.
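The shape of this pattern can be modelled in portable Python (a toy sketch only: `CompletionPort`, `post_completion` and the other names here are ours for illustration, not MailSite's classes or the Win32 API, and a `queue.Queue` stands in for the OS-managed completion queue):

```python
import queue
import threading

class CompletionPort:
    """Toy model of an I/O completion port: a queue of completion
    packets drained by a pre-allocated pool of worker threads."""
    def __init__(self, worker_count=4):
        self._queue = queue.Queue()
        self._handled = []
        self._lock = threading.Lock()
        for _ in range(worker_count):
            threading.Thread(target=self._worker, daemon=True).start()

    def post_completion(self, connection_id, bytes_sent):
        # The "OS" posts a packet here when an async send finishes.
        self._queue.put((connection_id, bytes_sent))

    def _worker(self):
        while True:
            connection_id, bytes_sent = self._queue.get()
            # A real worker would now release/re-use the send buffers
            # and prepare the next data for this connection.
            with self._lock:
                self._handled.append((connection_id, bytes_sent))
            self._queue.task_done()

    def drain(self):
        self._queue.join()          # wait until every packet is serviced
        with self._lock:
            return list(self._handled)

port = CompletionPort()
for conn in range(3):
    port.post_completion(conn, 1024)    # simulate three completed sends
handled = port.drain()
print(sorted(handled))
```

The point of the pattern is visible even in the toy: the thread that posts work never blocks on any single connection, and a small fixed pool services completions for all of them.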
Our code assumed that once our data had been accepted by the local OS, it would post a completion packet to us and the network layer would get on with sending the data in the background while we prepared the next thing to say. Examining our event traces showed that this was not in fact the case: the delay in receiving the completion packet was very close to twice the ping time between the client and the server. We concluded that the I/O completion packet indicated not only that the data had been sent, but that it had been ack'ed by the network layer on the client.
If you have a high-latency connection, the ping time between client and server might be 150ms. Were you to sync 1000 emails across such a connection, with each message sent in one transaction, the connection would sit idle on the server for 1000 * 150 * 2 ms = 5 minutes. We predicted that it ought to be possible to remove almost all of this delay by pipelining our data sends, meaning we'd concurrently send new data while waiting for the acks of previous sends.
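The idle-time figure follows directly from the numbers above (a quick sanity check; the helper name is ours):

```python
def idle_time_minutes(messages, ping_ms, round_trips_per_send=2):
    """Time a serial sender spends waiting for acks, in minutes:
    each send costs ~2x the ping time before the next can start."""
    return messages * ping_ms * round_trips_per_send / 1000 / 60

# 1000 one-transaction sends over a 150ms-ping link
print(idle_time_minutes(1000, 150))  # → 5.0
```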
Implementing pipelining to increase concurrency presents a few challenges; we need to keep every send buffer until that specific send has been ack'ed, and the connection comes under greater contention as we assemble the next data to send concurrently with receiving the acks of previous sends. We addressed these challenges in MailSite 10.1 by amending our CFetchCommand to spawn multiple CFetchProcesses, each tasked with assembling the response for a particular set of messages into "fetch response volumes" on disk. The files would then be passed back to the CFetchCommand to order and send. When the ack was received for each volume, we'd be able to clean up its resources.
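The buffer-lifetime rule is the heart of the scheme, and it can be sketched in a few lines (again a toy model under our own names — `PipelinedSender` is illustrative, not MailSite's implementation):

```python
from collections import OrderedDict

class PipelinedSender:
    """Toy model of pipelined sends: volumes go out without waiting
    for acks, and each send buffer is retained until its ack arrives."""
    def __init__(self, max_in_flight=8):
        self.max_in_flight = max_in_flight
        self.in_flight = OrderedDict()   # volume id -> retained buffer
        self.sent_order = []
        self.freed = []

    def send_volume(self, volume_id, buffer):
        if len(self.in_flight) >= self.max_in_flight:
            raise RuntimeError("window full: wait for an ack first")
        self.in_flight[volume_id] = buffer  # must keep until acked
        self.sent_order.append(volume_id)   # wire order is preserved

    def on_ack(self, volume_id):
        # Ack (completion packet) received: the buffer may now be freed.
        self.freed.append(self.in_flight.pop(volume_id))

sender = PipelinedSender()
for vid in range(3):
    sender.send_volume(vid, b"volume-%d" % vid)  # three concurrent sends
sender.on_ack(0)
print(sender.sent_order, len(sender.in_flight), sender.freed)
```

The bounded window is the design trade-off: it caps how much buffer memory a single slow client can pin, while still keeping several sends in flight at once.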
Our TLS code also needed amending; it was written such that when you send clear-text data, the TLS code encrypts it into potentially several cipher-text buffers and sends each one in turn. This code too suffered from the latency delay, awaiting the I/O completion packet for each buffer before sending the next. In MailSite 10.1 we addressed this by issuing each buffer's send concurrently while still awaiting the I/O completion packets for previous sends.
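The before/after difference is easiest to see in a sketch (entirely illustrative: `encrypt` here just splits data into fixed-size chunks to stand in for TLS record framing, which in reality also adds headers and MACs):

```python
def encrypt(clear_text, record_size=16):
    """Stand-in for TLS encryption: one clear-text send becomes
    several cipher-text records (buffers) to put on the wire."""
    return [clear_text[i:i + record_size]
            for i in range(0, len(clear_text), record_size)]

def send_all(buffers, issue_send):
    """New scheme: issue every buffer's send up front and await the
    completions afterwards, instead of waiting ~2x ping per buffer."""
    return [issue_send(buf) for buf in buffers]  # all in flight at once

issued = []
records = encrypt(b"x" * 40)   # 40 bytes -> three records (16, 16, 8)
send_all(records, lambda buf: issued.append(len(buf)) or len(buf))
print(len(records), issued)
```

With the old scheme, a single clear-text send that produced three records paid the round-trip delay three times; with sends issued concurrently it pays it roughly once.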
With the above changes complete, we were able to sync 1.4GB of messages spanning 40K messages over a 160ms-latency connection in 1 hour. Previously we'd have spent at least 40,000 * 160 * 2 ms ≈ 3.5 hours idly waiting for packets to be ack'ed, in addition to the time taken on the server to produce those packets. Had we been using TLS, this idle time would have been considerably greater, perhaps as large as 10.5 hours.
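Checking those figures (the TLS estimate corresponds to roughly three cipher-text buffers per message — that multiplier is our illustrative assumption, not a measured value):

```python
MS_PER_HOUR = 3600 * 1000

def serial_idle_hours(messages, ping_ms, sends_per_message=1):
    """Idle hours for the old serial scheme: ~2x ping per send."""
    return messages * sends_per_message * ping_ms * 2 / MS_PER_HOUR

print(round(serial_idle_hours(40_000, 160), 2))     # plain sends → 3.56
print(round(serial_idle_hours(40_000, 160, 3), 2))  # ~3 TLS buffers/message
```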
In a later blog post we'll talk about further optimizations we made in MailSite 10 to improve the speed of accessing large folders.