MemCachier Growing Pains

MemCachier is a new service. Since May we’ve been growing more rapidly than we anticipated, and we’ve been struggling to keep up. Some of you may have noticed brief periods of downtime every few days.

First, let me apologize for this downtime. We’re working hard to resolve the issue, and we’re not proud that it’s taken us this long to get there. In the interest of being honest and upfront, I want to explain what’s going on and how we’re trying to address it.

The root cause of our downtime lies in how we handle new TCP connections. Existing TCP connections are not affected, which is why most of you have not noticed the downtime. Every new TCP connection that comes in needs to be authenticated. Because we’ve grown much more rapidly than we had anticipated, we’ve seen a huge increase in new TCP connections, which has been stressing our authentication logic. We’ve learned that this logic isn’t executing quickly enough, which causes a small number of new TCP connections to time out. We’re profiling our authentication logic and deploying changes very often. Each change makes the code a little faster and more stable.
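
To illustrate why slow authentication hurts only some new connections, here is a toy model in Python. It is not MemCachier’s actual code: the single-worker queue, the function name, and all of the timing numbers are assumptions chosen purely for illustration.

```python
def count_timeouts(arrivals_ms, auth_ms, client_timeout_ms):
    """Count new connections whose authentication finishes too late.

    Toy model (an assumption, not the real service): one worker
    authenticates new connections in arrival order, so each connection
    waits for every earlier one to finish.
    """
    worker_free_at = 0
    timed_out = 0
    for arrived in arrivals_ms:
        start = max(arrived, worker_free_at)   # wait if the worker is busy
        finished = start + auth_ms
        worker_free_at = finished
        if finished - arrived > client_timeout_ms:
            timed_out += 1                     # client gave up before auth completed
    return timed_out


# Hypothetical burst: 50 new connections arriving 10 ms apart,
# with clients that give up after 100 ms.
burst = [i * 10 for i in range(50)]

fast = count_timeouts(burst, auth_ms=5, client_timeout_ms=100)   # 0
slow = count_timeouts(burst, auth_ms=12, client_timeout_ms=100)  # 5
```

In this sketch, authentication that is only slightly slower than the arrival rate lets a backlog build up, and the last few connections in the burst time out. Connections that are already open never enter the queue, so they are unaffected.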

I want to apologize to those of you who have been experiencing issues. We know that stability is one of the most important qualities of a cache, and we’re working hard to get MemCachier to a rock-solid state. We’re sorry about the downtime, and we thank you for using MemCachier. Stay tuned: we’ll post updates on our blog and Twitter as soon as we have news.

For real-time status updates on MemCachier’s availability, visit status.memcachier.com.