Sunday, August 7, 2011

Cloud Integration Architecture: The complementary roles of Data Distribution and Application Eventing

When discussing the importance of a distributed data fabric in modern application architecture, I often have to explain the new role of application messaging and how it complements data fabric within the context of Cloud Integration Architecture.

Message Oriented Middleware has largely been misused in the past as a workaround for distributing large amounts of data within the enterprise, owing to the lack of partitioning support in many standard RDBMS offerings. This is why expensive and complex centralized distributed transaction coordination has sometimes found its way into enterprise application designs: to solve artificial problems that stove-piped relational database and message oriented middleware products created. The end result has been a higher degree of coupling at the application tier, not only between the application and its underlying infrastructure, but also between applications themselves, as most implementations of this kind use point-to-point messaging in their designs. Message Bus and Business Process Management products evolved to loosen the coupling of these solutions, but they still require applications to share data and operate in a unified manner in response to a set of common business requests. These solutions, like the relational databases they complement, are implemented as centralized servers that require shared storage for high availability, limiting architects to vertical-only scalability models that are not optimized for cloud-style deployment.

A distributed data fabric, such as VMware vFabric GemFire, supports the partitioning and replication of big data by combining database and messaging semantics.

The data fabric supports ACID-style consistency and high availability through the automated management of redundant copies of data partitions across multiple local servers. Redundant local copies are synchronized in parallel, so raising the level of availability of a distributed solution costs the application architect nothing in latency. When a local server is lost, one of the redundant copies takes over as the new primary for its data, and the redundancy SLAs are re-established across the fabric. This means that with a redundancy SLA of n copies, n + 1 servers (the primary plus all n redundant copies) would have to be lost simultaneously before an availability issue could occur within a data center. It also means the solution can easily scale horizontally within the data center: servers can be added to or removed from the local fabric dynamically to serve more (or fewer) application clients.
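The failover behavior described above can be sketched in a few lines of Python. This is an illustrative model only, not the GemFire API: each partition keeps one primary copy plus a configurable number of redundant copies on distinct servers, and when a server is lost, a surviving copy becomes the primary and the redundancy SLA is re-established on a remaining server.

```python
import random

class Fabric:
    """Toy model of partition redundancy (hypothetical, not GemFire's API)."""

    def __init__(self, servers, partitions, redundancy=1):
        self.servers = set(servers)
        self.redundancy = redundancy
        # copies[partition] = [primary, backup1, ...] on distinct servers
        self.copies = {
            p: random.sample(sorted(self.servers), redundancy + 1)
            for p in partitions
        }

    def lose_server(self, server):
        self.servers.discard(server)
        for hosts in self.copies.values():
            if server in hosts:
                # Dropping the lost host implicitly promotes the next copy:
                # hosts[0] is always treated as the current primary.
                hosts.remove(server)
                # Re-establish the redundancy SLA on a surviving server.
                spare = self.servers - set(hosts)
                if spare:
                    hosts.append(sorted(spare)[0])

    def available(self, partition):
        return len(self.copies[partition]) > 0
```

With redundancy of n, a partition keeps n + 1 copies, so losing any n servers leaves at least one copy alive; only n + 1 simultaneous failures can make data unavailable.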

A data fabric also supports eventual consistency and further high availability through the automated management of redundant copies of data partitions across multiple data centers over a wide area network. Redundant distributed copies are maintained asynchronously, allowing updates to be batched before being sent over the WAN and optimizing the use of this more expensive network resource. A distributed data fabric allows data to be globally consistent within tenths of a second to seconds, as opposed to the tens of minutes or hours typical of traditional log-shipping solutions.
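The batching behind WAN distribution can be sketched as follows. This is a minimal, hypothetical sketch of the idea rather than any product's implementation: updates accumulate locally and are shipped to the remote site in batches, trading a little latency for fewer, larger transfers over the expensive link.

```python
class WanSender:
    """Illustrative batcher for asynchronous WAN replication (not a real API)."""

    def __init__(self, batch_size, send):
        self.batch_size = batch_size
        self.send = send      # callable that ships one batch over the WAN
        self.pending = []     # updates not yet sent to the remote site

    def update(self, key, value):
        self.pending.append((key, value))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        # Ship everything queued so far as a single batch.
        if self.pending:
            self.send(self.pending)
            self.pending = []
```

A real implementation would also flush on a timer so that a trickle of updates still converges within seconds, which is what keeps global consistency in the sub-second-to-seconds range rather than log-shipping timescales.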

Each server within the distributed data fabric uses “shared nothing” parallel disk persistence to manage both its primary and redundant data. Reads are served by all copies, while writes are served only by the primary. The built-in messaging queues underlying the WAN distribution mechanism of the data fabric are governed by the same redundancy SLA and backed by the same shared-nothing parallel disk persistence. In this way, architects no longer need either shared storage or distributed transactions to effectively manage data underneath distributed applications optimized for cloud deployment.

So what does this all mean for application messaging?

The future of application messaging is founded in event-driven architecture. Distributed application components publish events asynchronously to a message broker as they process data. Those same components can also voluntarily subscribe to the broker in order to consume the messages they are interested in for further processing.

Modern message brokers, such as VMware vFabric RabbitMQ, are designed to handle very high throughput, employing horizontal scalability and availability characteristics similar to those of their complementary data fabric solutions. All messages are published to exchanges (or topics), which are shared across multiple brokers. All messages are consumed from queues, which are local to a specific broker. New brokers can be added to or removed from the cluster to serve more (or fewer) application clients. Brokers are backed by persistence to local disk, eliminating the need for shared storage.
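The exchange/queue split described above can be shown with a tiny in-process sketch. This is illustrative only, not RabbitMQ's API: publishers address a named exchange, every queue bound to that exchange receives its own copy of the message, and consumers read from a specific queue.

```python
from collections import defaultdict, deque

class Broker:
    """Minimal in-process sketch of exchange-based routing (not RabbitMQ)."""

    def __init__(self):
        self.bindings = defaultdict(list)   # exchange name -> bound queue names
        self.queues = defaultdict(deque)    # queue name -> pending messages

    def bind(self, exchange, queue):
        self.bindings[exchange].append(queue)

    def publish(self, exchange, message):
        # Each bound queue gets its own copy; consumers never see the exchange.
        for queue in self.bindings[exchange]:
            self.queues[queue].append(message)

    def consume(self, queue):
        return self.queues[queue].popleft() if self.queues[queue] else None
```

Because publishers only know exchange names and consumers only know queue names, neither side knows about the other, which is exactly the decoupling that event-driven architecture relies on.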

Since the distributed application components also share a distributed data fabric, the business events shared at the application messaging tier don't need to contain all of the data in the model. In fact, modern application frameworks, such as Spring Integration, support the Claim Check pattern for this very reason. The Claim Check pattern allows an architect to persist a complex object model to a shared data store before a message is sent. The shared data store returns a claim check, a unique id by which the data can be retrieved if and when needed. In this way, the message payload for the event need only contain the claim check for the data.
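The Claim Check pattern itself is small enough to sketch directly. The names below are hypothetical (this is not the Spring Integration API): the payload is parked in the shared store, and the event carries only the returned id.

```python
import uuid

class ClaimCheckStore:
    """Illustrative Claim Check store; in practice this would be the
    shared data fabric, not an in-memory dict."""

    def __init__(self):
        self._store = {}

    def check_in(self, payload):
        # Park the full object model and hand back a small claim check.
        claim = str(uuid.uuid4())
        self._store[claim] = payload
        return claim  # this id, not the payload, travels in the event

    def check_out(self, claim):
        # Any consumer holding the claim check can retrieve the data later.
        return self._store[claim]
```

The event payload stays tiny regardless of how large the object model grows, and consumers that never need the full data never pay to move it.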

With a distributed data fabric underneath the application tier, architects are now free to use application eventing ubiquitously within an application architecture. No longer must we obsess over the proper level of granularity across our distributed application components, because modern application frameworks, such as Spring Integration, support an abstract concept of the channel used to communicate between those components. It is only a matter of external configuration to change application components from collaborating locally within a single process to collaborating remotely across multiple distributed processes; nothing in the application code itself is aware of the change.
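The channel abstraction can be illustrated with a short sketch (hypothetical names, not Spring Integration's API): components depend only on a send/receive interface, so which transport backs the channel is purely a configuration decision.

```python
from queue import Queue

class LocalChannel:
    """In-process channel; a remote (e.g. AMQP-backed) channel would
    expose the same send/receive interface."""

    def __init__(self):
        self._q = Queue()

    def send(self, msg):
        self._q.put(msg)

    def receive(self):
        return self._q.get()

def producer(channel, events):
    # Knows nothing about where the channel delivers to.
    for e in events:
        channel.send(e)

def consumer(channel, count):
    # Knows nothing about where the messages came from.
    return [channel.receive() for _ in range(count)]
```

Swapping `LocalChannel` for a broker-backed implementation with the same interface would leave `producer` and `consumer` untouched, which is the point: distribution becomes configuration, not code.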

Looking ahead to the not-so-distant future, it will be possible for cloud application platforms to manage the distribution of an application in real time, in direct response to load. Under low-load conditions an application may be configured to run entirely within one process. Cloud application Platform as a Service (aPaaS) solutions, such as VMware Cloud Foundry, can already dynamically scale individual processes in response to real-time load characteristics. With support for the Control Bus pattern in modern application frameworks going forward, aPaaS solutions will also be able to automatically distribute applications across multiple processes and scale those processes independently of each other.