Monday, October 08, 2007

Atomikos Offers 3rd Generation TP Monitors

This post on InfoQ was made by Arjuna, one of our (ex) competitors after JBoss (and then Red Hat) bought their transaction technology.

More interesting than the referred paper are the comments, which I would like to discuss here. Most posts seem to rule out transactions as something that doesn't scale. None of these comments I agree with.

The main complaints uttered seem to fall into these categories:

  1. Transaction managers are supposedly centralized.

  2. Transaction managers are accused of overhead for two-phase commit and synchronization.

I will now show that both these statements are a misconception, claiming that the 3rd generation transaction monitor already exists. Moreover, I will show that 3rd generation transaction managers are better than (or at least as good as) the alternatives - when used correctly.

The product I am talking about is Atomikos ExtremeTransactions, including its JTA/XA open source edition named TransactionsEssentials. Let me now outline why none of the above objections are actually accurate:

  1. Atomikos ExtremeTransactions is a peer-to-peer system for transactions. Whenever two or more processes are involved in the same transaction, the transaction manager component (library) in each process will collaborate with its peer counterpart in the other process. This is how it is done. Consequently, there is no centralized component nor bottleneck. Our studies have shown that this gives you linear (i.e., perfect) scalability. This invalidates the first criticism above.

  2. While two-phase commit does incur some synchronization, the same is true for any other solution (assuming that you want to push operations to one or more backends). A simple example to illustrate my point: many people think that queuing is a way to avoid the need for transactions (and two-phase commit). Is it? Hardly: even if we neglect the resulting risk in message loss (see then you have to realize that most queueing systems use two-phase commit internally anyway. This invalidates the second criticism above.

  3. The often-heard criticism that transactions may block your data is not fair either.
    There is some interesting theoretical work done by Nancy Lynch (MIT) et al - I believe it is this one. Basically, this is mathematics that proves that you cannot have a non-blocking (read: perfect) solution for distributed agreement in realistic scenarios.
    In practice, this means that a queued operation may not make it if the connection to the receiver is down too long. So your system is 'blocked' in the queue, even though you don't use transactions. This is the equivalent of the perceived 'blocking' but now placed in a non-transactional scenario.

  4. Again on the perceived synchronization overhead: if you don't keep track of "what" you have done and "where" (by synchronizing) then you end up with an error-prone process. This is especially true for many critical applications that consume messages and insert the results in a database. If you don't use transactions then you will find yourself implementing duplicate message detection and/or duplicate elimination, none of which are safe without the proper commit ordering. Basically, you are implementing a transaction manager yourself (yuk!).

Am I saying that transactions and two-phase commit don't block? Not exactly - especially if you use XA then things can block. However, Atomikos avoids this in two ways:

  • Very strong heuristic support: unilateral decision are encouraged both in the backend and in the Atomikos transaction manager. If a transaction takes too long, it is terminated anyhow. Where classical scenarios would block, Atomikos enforces a unilateral termination by either party. The resulting anomaly is reflected in the transaction logs, so the transaction manager can track problem cases (instead of letting you chase different systems to find out what happened - the alternative without transactions). Ironically, we have seen more blocks caused by non-XA transactions: if your database does not support an internal timeout mechanism for non-XA (which seems to be so in the most commonly used DBMS) then it will be non-XA transactions that cause the blocking!. I can go on for hours about this - but that is another post.

  • Atomikos also offers local transactions with compensation instead of rollback: you can use our TCC (Try-Cancel/Confirm) API to handle overall rollback. This allows you to use non-XA, local transactions. It never blocks your application, ever! TCC is similar to WS-BA, only better because we have been working on it for much longer than anybody else in the world. See for more on TCC.

Summing up then: do I recommend two-phase commit? Yes, if needed. In the past, this need arose out of legacy integration. In the present and future, that need arises out of up-front requirements. The most typical use cases are:

  • Processing persistent messages with exactly-once guarantees. There is no substitute for the reliability and ease of Atomikos ExtremeTransactions here. Note that this can be done intra-process!

  • Across processes/services if you have a reservation model inherent in your business process. Our TCC technology will make sure that your database never blocks.

More information about Atomikos products can be found here