Sunday, August 7, 2011
Cloud Integration Architecture: The complementary roles of Data Distribution and Application Eventing
Message Oriented Middleware has largely been misused in the past as a workaround for distributing large amounts of data within the enterprise, owing to the lack of partitioning support in many standard RDBMS offerings. This is why expensive and complex centralized distributed transaction coordination has sometimes found its way into enterprise application designs: to solve artificial problems that stove-piped relational database and message oriented middleware products created. The end result of all this has been a higher degree of coupling at the application tier, not only between the application and its underlying infrastructure, but also between applications themselves, as most implementations of this type use point-to-point messaging in their designs. Message Bus and Business Process Management products evolved to loosen the coupling of these solutions, but they still require applications to share data and operate in a unified manner in response to a set of common business requests. These solutions, like the relational databases they complement, are implemented as centralized servers requiring shared storage for high availability, limiting architects to vertical-only scalability models that are not optimized for cloud-style deployment.
A distributed data fabric, such as VMware vFabric GemFire, supports the partitioning and replication of big data by combining database and messaging semantics.
The data fabric supports ACID-style consistency and high availability through the automated management of redundant copies of data partitions across multiple local servers. Redundant local data copies are synchronized in parallel, so raising the availability of a distributed solution costs the application architect virtually nothing in latency. When a local server is lost, one of the redundant copies takes over as the new primary for its data, and the redundancy SLAs are re-established across the fabric. This means that with a redundancy SLA of n copies, n + 1 servers would have to be lost simultaneously to cause an availability issue within a data center. It also means the solution can easily scale horizontally within the data center, with servers added to or removed from the local fabric dynamically to serve more (or fewer) application clients.
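By way of illustration, here is a minimal sketch of bootstrapping a fabric member and creating a partitioned region with a redundancy SLA of one extra copy, assuming GemFire's public Java caching API; the region name and value type are hypothetical:

    import com.gemstone.gemfire.cache.Cache;
    import com.gemstone.gemfire.cache.CacheFactory;
    import com.gemstone.gemfire.cache.Region;
    import com.gemstone.gemfire.cache.RegionFactory;
    import com.gemstone.gemfire.cache.RegionShortcut;

    public class FabricMember {
        public static void main(String[] args) {
            // Join (or start) this member of the local fabric.
            Cache cache = new CacheFactory().create();

            // PARTITION_REDUNDANT spreads the data across the fabric and keeps
            // one redundant copy of each partition on another member;
            // PARTITION_REDUNDANT_PERSISTENT would add shared-nothing disk
            // persistence on top of the in-memory redundancy.
            RegionFactory<String, Object> factory =
                cache.createRegionFactory(RegionShortcut.PARTITION_REDUNDANT);
            Region<String, Object> orders = factory.create("orders");

            // Writes go to the primary copy; redundant copies are
            // synchronized in parallel before the call returns.
            orders.put("order-1", "...");
        }
    }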
A data fabric also supports eventual consistency, and a further layer of availability, through the automated management of redundant copies of data partitions across multiple data centers over a wide area network. Redundant distributed data copies are maintained asynchronously, allowing updates to be batched before being sent over the WAN and optimizing the use of this more expensive network resource. A distributed data fabric allows data to be globally consistent within tenths of a second to seconds, as opposed to the tens of minutes or hours typical of traditional log-shipping solutions.
Each server within the distributed data fabric uses "shared nothing" parallel disk persistence to manage both its primary and redundant data. Reads are then served by all copies, while writes are served only by the primary. The built-in messaging queues underlying the WAN distribution mechanism of the data fabric are managed by the same redundancy SLA and backed by the same shared-nothing parallel disk persistence. In this way, architects no longer need either shared storage or distributed transactions to effectively manage data underneath distributed applications optimized for cloud deployment.
So what does this all mean for application messaging?
The future of application messaging is founded on event-driven architecture. Distributed application components publish events asynchronously to a message broker as they process data. Those same components can also voluntarily subscribe to the broker in order to consume the messages they are interested in for further processing.
Modern message brokers, such as VMware vFabric RabbitMQ, are designed to handle very high throughput, employing horizontal scalability and availability characteristics similar to those of their complementary data fabric solutions. All messages are published to exchanges (or topics), which are shared across multiple brokers. All messages are consumed from queues, which are local to a specific broker. New brokers can be added to or removed from the cluster to serve more (or fewer) application clients. Brokers are backed by persistence to local disk, eliminating the need for shared storage.
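To make the publish/consume split concrete, here is a minimal sketch using the RabbitMQ Java client; the exchange, queue, and routing key names are hypothetical:

    import com.rabbitmq.client.Channel;
    import com.rabbitmq.client.Connection;
    import com.rabbitmq.client.ConnectionFactory;
    import com.rabbitmq.client.MessageProperties;

    public class EventPublisher {
        public static void main(String[] args) throws Exception {
            Connection connection = new ConnectionFactory().newConnection();
            Channel channel = connection.createChannel();

            // Publishers see only the exchange, which is shared across brokers.
            channel.exchangeDeclare("business.events", "topic", true);
            channel.basicPublish("business.events", "order.created",
                    MessageProperties.PERSISTENT_TEXT_PLAIN, "order-1".getBytes());

            // Consumers read from a durable queue local to a specific broker.
            channel.queueDeclare("order.processor", true, false, false, null);
            channel.queueBind("order.processor", "business.events", "order.*");

            channel.close();
            connection.close();
        }
    }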
Since the distributed application components also share a distributed data fabric, the business events being shared at the application messaging tier don't need to contain all of the data in the model. In fact, modern application frameworks, such as Spring Integration, support the Claim Check pattern for this very reason. The Claim Check pattern allows an architect to persist a complex object model to a shared data store before the message is sent. The shared data store returns a claim check, or unique id, by which the data can be retrieved if/when needed. In this way, the message payload for the event need only contain the claim check for the data.
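Stripped to its essentials, the pattern looks something like the following sketch, where an in-memory map stands in for the shared data fabric and all names are hypothetical:

    import java.util.Map;
    import java.util.UUID;
    import java.util.concurrent.ConcurrentHashMap;

    public class ClaimCheckStore {
        private final Map<String, Object> store = new ConcurrentHashMap<String, Object>();

        // Check the full payload in before sending; the returned id is
        // all the event message needs to carry.
        public String checkIn(Object payload) {
            String claimCheck = UUID.randomUUID().toString();
            store.put(claimCheck, payload);
            return claimCheck;
        }

        // Check the payload out if/when a consumer actually needs it.
        public Object checkOut(String claimCheck) {
            return store.get(claimCheck);
        }
    }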
With a distributed data fabric underneath the application tier, architects are now free to use application eventing ubiquitously within an application architecture. No longer must we obsess over the proper level of granularity across our distributed application components, because modern application frameworks, such as Spring Integration, support an abstract concept of the channel used to communicate between those components. It is only a matter of external configuration to change my application components from collaborating locally within a single process to collaborating remotely across multiple distributed processes; nothing in my application code itself is aware of the change.
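For example, a hypothetical Spring Integration configuration might swap a local channel for an AMQP-backed one without touching component code; this sketch assumes the spring-integration-amqp namespace and a rabbitConnectionFactory bean:

    <!-- in-process collaboration: a simple local channel -->
    <int:channel id="orders"/>

    <!-- distributed collaboration: the same channel backed by AMQP;
         the producing and consuming components are unchanged -->
    <int-amqp:channel id="orders" connection-factory="rabbitConnectionFactory"/>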
Looking ahead to the not-so-distant future, it will be possible for cloud application platforms to manage the distribution of an application in real time, in direct response to load. Under low-load conditions an application may be configured to run entirely within one process. Cloud application Platform as a Service (aPaaS) solutions, such as VMware Cloud Foundry, can already dynamically scale individual processes in response to real-time load characteristics. With support for the Control Bus pattern in modern application frameworks going forward, aPaaS solutions will also be able to automatically distribute applications across multiple processes and scale those processes independently of each other.
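Spring Integration supports the Control Bus pattern today; in a sketch like the one below (channel and bean names hypothetical), sending the message payload "@ordersAdapter.stop()" to the control channel stops a managed component at runtime, which is exactly the kind of lever an aPaaS could pull automatically:

    <int:channel id="controlChannel"/>
    <int:control-bus input-channel="controlChannel"/>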
Sunday, August 8, 2010
There are two data persistence requirements that are fairly unique to integration architectures:
- High-Write - The primary reason for having integration as a separate tier is to avoid the loss of any in-flight messages. Ensuring this while a message is validated, transformed, enriched, and routed within the integration tier means a lot of writes to an underlying persistence store. Since messages are typically passed by value in most integration architectures today, it also means very few reads.
- Transient Data - The actual message data is really only of interest to the integration tier while a message is in-flight. After message processing is complete, the message data is of no further use to the integration application. Sure, integration solutions do typically provide tracking of messages for historical purposes, but auditing is a tangential concern for the integration tier.
The transient data requirement makes you wonder whether a traditional database is even the right solution for an integration tier, since traditional databases are meant for long-term, static storage of data. Certainly, an RDBMS makes sense for auditing of historical processing within the integration tier, but not necessarily for real-time online transaction processing of in-flight message data.
So what other persistence store can handle high-write transient data and scale out effectively to take better advantage of cloud deployment architecture? The answer: an in-memory distributed data cache (a data fabric). It is with this argument that I firmly believe highly distributed cloud enterprise integration solutions must be based on a data fabric capable of high-write transactions within multiple data partitions, providing high availability, and offering configurable synchronous / asynchronous persistence to disk. Persistence to an underlying RDBMS for long-term historical auditing can be done through an asynchronous write-behind mechanism that clears completed transactions from the cache on a scheduled basis.
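A minimal sketch of such a write-behind mechanism in plain Java follows (names hypothetical; a real data fabric provides this as a built-in, fault-tolerant listener):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.Executors;
    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class WriteBehindAuditor {
        private final BlockingQueue<String> completed = new LinkedBlockingQueue<String>();
        private final ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();

        // Called by the cache when a message finishes processing.
        public void messageCompleted(String messageId) {
            completed.offer(messageId);
        }

        // Drain completed messages to the audit RDBMS on a schedule,
        // off the critical path of in-flight message processing.
        public void start() {
            scheduler.scheduleAtFixedRate(new Runnable() {
                public void run() {
                    List<String> batch = new ArrayList<String>();
                    completed.drainTo(batch);
                    if (!batch.isEmpty()) {
                        writeAuditBatch(batch); // e.g., one batched JDBC insert
                    }
                }
            }, 5, 5, TimeUnit.SECONDS);
        }

        private void writeAuditBatch(List<String> batch) {
            // Hypothetical: batched insert into the historical audit store,
            // after which the entries can be cleared from the cache.
        }
    }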
Moving up the stack from the data tier into the integration tier, we now must take a close look at traditional Message Oriented Middleware (MOM) solutions. Those of you who have an "I love M.O.M." tattoo on your arms should probably not read further. M.O.M. evolved not from Eve, but from the traditional Enterprise Application Integration (EAI) solutions that were a fad back in the late '90s. These EAI solutions were, as you would expect, highly centralized hub-and-spoke architectural approaches to enterprise integration. After Y2K, the ISV industry dusted them off, re-labeled them, and began to sell them to you for twice the price.
Enterprise Service Bus and Business Process Management middleware are server-based approaches to middleware built on JMS. JMS, like JEE, is a specification, and there is a "J" at the beginning of it for a reason. Vendors build server support for the specification and then compete on features that go beyond it in order to lock you in. Eventually, two or three competitors get the same new features into their servers, so they come together and agree on a new specification ... and the cycle renews itself.
AMQP is an open internet protocol, designed to be asynchronous (unlike IIOP) and reliable (unlike SMTP). Open internet protocols are proven to outlast the lifetime of the average software company (e.g., HTTP). AMQP is an open standard with client support for Java, .NET (WCF), Python, C, Perl, and Ruby, among others.
If you've been building your enterprise integration solutions on Spring Integration, as I've blogged about in the past, then you are in a great place ... as Spring Integration gives your application a portable abstraction over the transport layer called the channel. Making the switch from JMS to AMQP is simply a configuration change to the channel, with no effect on your application. Also, if you've been moving away from ESB/BPM server-centric architectures towards a highly distributed event-driven architecture, as I've blogged about in the past, then you will be able to truly take advantage of the horizontal scale that ubiquitous messaging with AMQP gives you. Think of it as "twitter-style" application-to-application messaging for the business.
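As a sketch of what that configuration change might look like (channel and bean names hypothetical, assuming the Spring Integration JMS and AMQP namespaces):

    <!-- before: a JMS-backed channel -->
    <int-jms:channel id="orders" queue-name="orders.queue"
                     connection-factory="jmsConnectionFactory"/>

    <!-- after: the same channel backed by AMQP; application code is untouched -->
    <int-amqp:channel id="orders" connection-factory="rabbitConnectionFactory"/>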
Moving further up the stack from the integration tier into the application tier, we now must take a close look at traditional JEE application servers. Those of you in application development who have been developing on the Spring Framework for years will already agree that you hardly make use of any of the runtime features of a full-stack JEE server. Spring released you from EJBs and gave you the portability you desired without the high cost to your creativity or productivity.
So why do we as an industry still hold on to the JEE application server when we know that our application developers don't really use it? The answer has to do with Reliability, Availability, Scalability, and Performance. The full-stack JEE application server gives Java developers RASP on physical hardware. RASP is not an easy thing to make simple, and folks who know how to tune a specific application server for RASP are hard to find, harder to recruit, and almost impossible to pay enough to keep for very long. Just like Oracle DBAs or folks who know the inner workings of WebSphere Message Broker, these folks command the highest salaries because they have dedicated their professional careers to learning all the buttons, levers, and switches that need to be set when tuning an application server for RASP.
The way to free yourself from all of this is to look for a different approach to RASP ... one that is consistent across all tiers of your application infrastructure, and one that is well known to the many operations folks you already have running that infrastructure on a daily basis. The answer, of course, is virtualization. Virtualization is a proxy that provides RASP to application infrastructure through the very same Inversion of Control pattern that you've already come to love in the Spring Framework. Virtualization is capable of providing RASP to a database, a message broker, or an application server in a consistent and predictable manner that is well known to a large percentage of your operations staff. VMware's virtualization technology bases its approach to RASP on its VMotion capability, which allows virtual machines to be quickly moved from one physical host to another, whether due to an outage of the physical host or just a spike in application load, without having to take those virtual machines down.
I hope I have convinced you that it is time to "re-think" the server. Cloud Integration Architecture requires tearing down the monolithic server-bound architectures we've spent the past 20 years building: database servers, middleware servers, JEE application servers. The cloud is now the server and solid application architecture again takes its rightful position as king. Re-commit yourself to the craft of software engineering and you will embrace the future of IT.
Saturday, October 4, 2008
The real challenge of adopting SOA is to change the way you and your organization think about enterprise architecture, not to change your information technology infrastructure yet again to continue the cold-war-like arms race against your competitors that the software vendors have sold you on (at a great profit to them, I might add).
Why is SOA such a challenge? Simple: because it puts the focus of enterprise architecture on software architecture, not hardware and network architecture. The IT industry is in the midst of a handoff between its first generation of infrastructure architects and a new generation of software architects. Businesses, which have come to rely heavily on their architects, are struggling to understand a new software-centric view of their technology portfolio. The old days were (somewhat) easy: let the IT guys handle the infrastructure, and we'll handle the business. Investments in technology were relatively simple and straightforward to both understand and manage, and the results were tangible assets that had an expected life and could be depreciated with comfortable precision.
The first generation of architects were extremely good at what they were asked to do: connect hardware through ever-growing and ever-speedier networks. Software, at least the software they intended to run the mission-critical elements of the business on, was considered a commodity, just like the hardware and network components they were used to implementing. Their view of software from an enterprise perspective was through the hardware nodes the software was to be deployed on (this box is for General Ledger, that box is for Accounts Payable, this box is for email, that box is for our website, and so on ...). Rack it, stack it, connect it up, and turn it on.
The new generation of architects don't think of software only in terms of how it is deployed. Disrupting forces like the Internet, Mobile Computing, Virtualization, and Cloud Computing are making physical hardware and networks a ubiquitous (and somewhat abstract) concept. With the changes brought about by these forces, it is not likely that the IT group of the future will even continue to manage physical technology assets within business-owned data centers anymore.
So if IT departments are no longer managing hardware and networks for the business, what will that leave them with? Will IT cease to exist as a business-critical function within the enterprise? The answer, of course, is no (or else I'd be learning a new trade instead of blogging about this one). The answer is that IT departments will "move up the food-chain" within the business - becoming newly responsible for managing and securing its intellectual property (IP).
IP is the beating heart of today's business. It is composed of the knowledge and the processes that form the foundation of a business and give it its unique competitive differentiation within the marketplace. Knowledge and processes are modeled as software, not hardware, and those models can no longer be confined to the physical boundaries of hardware and networks. Look no further than Amazon, Google, and the other major internet-based companies that survived the dot-com bubble to form the new guard of today's business for proof that this paradigm shift is real.
So how does this all tie back to SOA? SOA is simply a better way of managing the knowledge and processes that form the IP of your business. SOA is a better way because it is a software architecture that, like the knowledge and processes it manages, is not confined to the physical boundaries of hardware and networks.
So, if I shouldn't approach SOA the way I've approached technology investments in the past (by going out and buying some components from a vendor, connecting them up, and turning them on), how should I approach it? That will be the subject of my next blog, Part II in this series.
Friday, June 20, 2008
- effectively captured during requirements analysis and
- efficiently delivered by the technology traced to that requirement.
- the expected impact of the requirement on the enterprise architecture,
- the planning input for the project required to deliver it, and
- the expected impact on IT operations to support it once it has been delivered.
Monday, May 19, 2008
Most commercial enterprises would agree that their Enterprise Architecture has organically grown over the years, much the way the Amish sew together a patchwork quilt. This is the problem of "shopping for your EA solution".
On the left side of the "shopping for an EA" continuum, you have customers that consider themselves early adopters and are willing to try new things to "get an edge" on their competition. [insert your favorite market guru here] tells these customers that they should have a portal, and that Plumtree is one of the best point solutions out there, so they buy that. [insert your favorite market guru here] tells them they should have a framework, and that SilverStream is one of the best point solutions out there, so they buy that. [insert your favorite market guru here] tells them they need an ESB solution, and that Cape Clear is one of the best point solutions out there, so they buy that. And so on. These folks may be fast out of the gate, but they lose momentum over time due to the inefficiencies caused by a lack of integration along the way and the rapid turnover of products (and vendors) caused by the fickleness of the commodity software industry.
On the right side of the "shopping for an EA" continuum, you have customers that consider themselves more conservative and place a high value on staying with a single vendor. These folks are still "shopping for an EA", except that they wait to be told what to buy and when to buy it by IBM or Microsoft. These folks are always dealing with repressed feelings of frustration and doubt caused by their inability to keep up with software market innovation, because their chosen vendor isn't getting them there fast enough.
In my opinion, customers at either end of this continuum can be refactored toward a planned Enterprise Architecture, highly customized to their specific needs, that can deal with change in a "systemic" or "repeatable" fashion through the combination of:
- the strategic use of Open Source software to get at the core problems within the IT portfolio, by either better gluing the pieces together or more rapidly extending the functionality of the monolith (with either approach based on open standards), making the IT side of these commercial customers more agile while also providing a forward-looking context for better supporting their ingrained spending habits, and
- the tactical use of Governance solutions to give the business side of these commercial customers visibility into the measurable (metrics-driven) progress they are making towards the goals that define their reasons for investing in IT to begin with.
To summarize, I like the "ReFactoring your Enterprise Architecture" angle because it deals with the heterogeneity (or lack thereof) that likely exists in most commercial enterprises while still sending the "we aren't here to change you ... just make you better" message that Enterprise Modernization sends.