As Web services mature and implementations multiply, discussions around related technologies and protocols such as security, transactions and reliability are becoming more relevant. This article focuses on transactions for Web services, and particularly on the concept of long-running transactions and what they mean for Web service providers. We will look at the ways in which transactions and compensation fit the business model of a Web service provider, and at the architectural support that is needed.
Before we get started, let's make sure we agree on some vital terminology.
Transaction: for the purpose of this article, a transaction is a set of related interactions (or operations) that may need to be cancelled AFTER they were executed. These operations can be at different geographical locations, like different Web services on the Internet.
Long-running transaction (or business transaction): for the purpose of this article, a long-running transaction is one where the total duration exceeds the duration of an individual interaction by several orders of magnitude. Consequently, a long-running transaction is a transaction where the cancellation of an interaction may happen a relatively long time after the interaction was executed.
Example: Figure 1 (Booking a flight at two Web services) shows a trip reservation. Scheduling a trip from Brussels to Toronto can consist of booking a flight from Brussels to Washington, and then booking a connecting flight from Washington to Toronto, the final destination. Suppose the second flight is with a different airline, in a different reservation system.
If the second booking fails then I am stuck with a ticket from Brussels to Washington (which can certainly be interesting, but it defeats the purpose of my trip planning). In that case, cancelling the trip to Washington is a valid option. However, before I decide to do so, I may look for alternatives, such as another connecting flight two hours later. This can potentially lead to a very long-running overall transaction.
Note that the crucial point here is that the reservation has already been made at the time I decide to cancel. That is, the airline reservation system has accepted and verified my first reservation, and is keeping my seat for at least some time. If there is no cancel event then either I will be billed for a ticket I don't need, or the airline is going to lose money on an unneeded reservation. Either way somebody loses money. Consequently, there are strong drivers for the existence of a cancel event.
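The trip example can be sketched in a few lines of Java. This is only an illustration under stated assumptions: ReservationService and TripBooking are hypothetical names invented here, the "services" are in-memory stand-ins rather than real Web services, and the decision to cancel is taken immediately instead of after a long search for alternatives. What the sketch shows is that the first reservation is already confirmed when the cancel event arrives.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for one airline's reservation system.
class ReservationService {
    private final String airline;
    private final boolean hasSeats;
    private final Map<String, String> confirmed = new HashMap<>();
    private int nextId = 1;

    ReservationService(String airline, boolean hasSeats) {
        this.airline = airline;
        this.hasSeats = hasSeats;
    }

    // Confirms a seat right away and returns a reservation id,
    // or returns null when no seat is available.
    String reserve(String from, String to) {
        if (!hasSeats) {
            return null;
        }
        String id = airline + "-" + nextId++;
        confirmed.put(id, from + "->" + to);
        return id;
    }

    // The cancel event: undoes a reservation that was already confirmed.
    boolean cancel(String reservationId) {
        return confirmed.remove(reservationId) != null;
    }

    int confirmedCount() {
        return confirmed.size();
    }
}

class TripBooking {
    // Books both legs. If the second leg fails, the first leg is
    // cancelled after the fact so that nobody pays for an unused seat.
    static boolean bookTrip(ReservationService first, ReservationService second) {
        String leg1 = first.reserve("Brussels", "Washington");
        if (leg1 == null) {
            return false;
        }
        String leg2 = second.reserve("Washington", "Toronto");
        if (leg2 == null) {
            first.cancel(leg1); // leg1 was confirmed; this is a later cancellation
            return false;
        }
        return true;
    }
}
```

In a real long-running transaction the call to cancel could come hours after reserve returned, which is exactly why the first airline has to offer cancellation as an operation of its own.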
Speaking purely technically, the cancellation of different interactions across different locations is nothing new: it has been done for decades by something called a transaction manager or transaction service. CORBA's OTS (Object Transaction Service) is one example of such a technology; J2EE's Java Transaction API and Java Transaction Service (JTA/JTS) are another.
However, these technologies are all based on the concept of ACID transactions. ACID technology works as follows: all data that a transaction accesses are locked against other concurrent transactions until the transaction either commits (saves changes) or rolls back (cancels changes). This is illustrated in Figure 2 (Lock time for ACID transactions).
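The locking behaviour just described can be made concrete with a small sketch. It is not how OTS or JTA/JTS are implemented; LockingStore is a hypothetical in-memory class invented for this illustration, it locks only on writes (a real ACID resource manager would typically lock reads as well), and it identifies transactions by a plain string id.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified store: a key stays locked by the transaction
// that wrote it until that transaction commits or rolls back.
class LockingStore {
    private final Map<String, Integer> committed = new HashMap<>();
    private final Map<String, String> lockOwner = new HashMap<>();
    private final Map<String, Map<String, Integer>> pending = new HashMap<>();

    // Returns false when another transaction still holds the lock on key.
    boolean write(String txId, String key, int value) {
        String owner = lockOwner.get(key);
        if (owner != null && !owner.equals(txId)) {
            return false;
        }
        lockOwner.put(key, txId);
        pending.computeIfAbsent(txId, t -> new HashMap<>()).put(key, value);
        return true;
    }

    // Commit: save the pending changes, then release the locks.
    void commit(String txId) {
        Map<String, Integer> writes = pending.remove(txId);
        if (writes != null) {
            committed.putAll(writes);
        }
        releaseLocks(txId);
    }

    // Rollback: discard the pending changes, then release the locks.
    void rollback(String txId) {
        pending.remove(txId);
        releaseLocks(txId);
    }

    private void releaseLocks(String txId) {
        lockOwner.values().removeIf(txId::equals);
    }

    // Only committed values are visible; returns null for an unknown key.
    Integer read(String key) {
        return committed.get(key);
    }
}
```

If a transaction "tx1" writes the key "seat" and then waits, every write to "seat" by another transaction is refused until "tx1" calls commit or rollback. The time between the first write and that call is the lock time, and for a long-running transaction it can become arbitrarily long.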
For long-running transactions this is far from ideal, because cancellation would only be possible for as long as all the data were kept locked. Keeping locks on behalf of Web service interactions may be undesirable for several reasons, such as:
- A long time may pass before a Web service interaction is cancelled, especially if human decision times are involved. This would imply long lock times, and equally long data unavailability.
- Web services may involve parties that don't know or don't trust each other. A provider that holds locks on behalf of such parties becomes vulnerable to denial-of-service (DoS) attacks at the database level.
So if ACID and rollback are not going to help us, then what is?