Friday, September 05, 2008

The Achilles heel of the CAP theorem

In my last post I discussed the theoretical proof of the CAP theorem. Both the theorem and the proof have a limitation that might very well render them not-so-universal as assumed.

The limitation of the CAP proof

The limitation of the CAP proof (as formulated by Lynch et al) is the following: it assumes that - for the purpose of availability - requests are to be served even when there is a partition in the cluster.

A way around the limitation

There is a way around this limitation - although it may sound exotic: just make sure that there are no partitions when requests are served.

How? By simply doing the following:

  • Queue requests (e.g., in JMS).

  • Only process requests when there is no partition problem.

  • Send responses asynchronously, for instance via email.

Since no partition (hopefully) lasts forever, this solution does not lead to livelock.

Also, note that quorum solutions exist to avoid that the complete cluster has to be up at the same time.

Is this the capitulation of CAP? Who knows...


Blogger PetrolHead said...

Have you read:

"Impossibility of distributed consensus with one faulty process"?

This is an important result and has significance to your comments and the CAP theorem. Essentially one can't tell the difference between a genuine failure and a slow running machine or busy network.

Thus your solution might work for a very small number of machines all in a single data-centre but for larger installations, failure of machines, routers, switches, cables etc will happen several times a day and thus quorums and clusters become considerably less practical and loose consistency more attractive.

Note also that the theorem isn't just about clustered services in the traditional sense but also services that run across multiple data-centres.

I also have a specific observation:

"....note that quorum solutions exist to avoid that the complete cluster has to be up at the same time."

This is true but they are limited by a number of factors practically:

(1) The assumption that you will have a majority - seemingly this is straightforward but a partition plus a loss of a machine can leave you without a majority.

(2) Getting all members back into sync. Can require all sorts of special admin involvement and it can go wrong.

(3) Performance - quorum protocols especially across enough nodes to ensure survival can be slow.

(4) Ensuring that clients don't continue to make use of the minority during a partition e.g. reporting out-of-date information.

(5) You can have a cluster capable of achieving consensus but you can't reach it because the network is broken between cluster and clients.



9:11 PM  
Blogger Guy said...

Hi Dan,

Sure have I read "Impossibility of distributed consensus with one faulty process" - it is at the basis of the heuristic exceptions in all two-phase commit solutions (including Atomikos).

However, what I am saying is that the failure usually only lasts for so long, and afterward things can move on. Exploiting the right tools to do that can help availability.

That is the main advantages of (persistent) queues and that is all I am saying. Lynch et al do not seem to exploit it as much as they could...


9:30 PM  
Blogger PetrolHead said...

Indeed the failure only lasts so long, alas I should have made this clearer:

"Essentially one can't tell the difference between a genuine failure and a slow running machine or busy network."

That means you cannot reliably tell when you have a partition and when you don't. Thus you might pause processing (and incur large backlogs) un-necessarily and that might upset your customers (this is somewhat why Amazon built Dynamo, always writable so people can fill their baskets even when underlying systems are broken).

Lynch et al don't discuss queue's because that's outside of what they're trying to prove/test. They're providing underpinning theory. It's up to others (such as yourself) to propose solutions.

Queue's are helpful in that they can retain state for you (but they have nasty failure modes of their own) however they tend to live in a single-data-center, lose that and your queue is useless. That's one of the reasons Amazon built SQS and I note of late that something similar was built by MS for Windows Live. They're queues of a sort but nothing like for example JMS.

Note that asynchronous structures in general are good for handling failure, scaling and latency. Dan Pritchett has some good scribblings on this (as does his cohort Randy Schoup).

Lastly, of course we don't all have to worry about multiple data-centres so some of the above can be ignored in some cases.

Nice chatting with you,


10:36 PM  
Anonymous Anonymous said...

If you're happy to wait when there is a partition, then that's fine but the system isn't "available", is it?

All you seem to be doing is taking the CAP theorem and inventing a "CP" theorem where availability isn't necessary.

5:11 PM  
Blogger Guy said...


Please define "available".

8:16 PM  
Anonymous Anonymous said...

In terms of the CAP theorem, I think available means available all the time.

In particular, waiting for one of the other aspects (consistency or waiting for a partition to recover) would definitely count as "not available".

1:55 PM  
Blogger Guy said...


If you want a real-time system then I agree. However, if you just want a response within some reasonable time frame then the approach I suggest would work. Whether or not you are waiting for a partition to resolve, or for an email to come in with your order confirmation makes little different IMHO, and is definitely acceptable in most cases I think.

In general, I think there is series of application classes that fall into distinct requirements/characteristics wrt what can be tolerated. I think there is some promising research in that area...


5:35 PM  
Anonymous Anonymous said...

> if you just want a response within some reasonable time frame then the approach I suggest would work.

Yes, it will work in a few cases (waiting for an order email). No, it will not work in general. (How do people add items to their order during an outage? You can't show a price because you don't know if it's authorative or not.)

You are trying to re-define the definition of "Available". If it's not taking reads and writes, then it's not Avaliable. And serving stale info in some cases isn't Consistent.

12:29 AM  

Post a Comment

<< Home