Making SOA Leaner and Cleaner to Maximize Profits (Part I of II)



Introduction



Walking through the IKEA store a few weeks back I noticed a poster highlighting IKEA's green credentials and the trio "Reduce, Reuse and Recycle." I started thinking, as any SOA practitioner should, about the implications of the green movement and its parallels in how we build SOA solutions. Some of the reasons we see customers adopt SOA include interoperability, ease of development and maintenance, visibility, and, of course, reuse. At a superficial level, SOA is somewhat green because it encourages reuse of existing capabilities as services. What about recycling? And what about reducing waste and inefficiencies? Reducing the impact on the surrounding environment and systems?

A successful SOA project usually results in driving more traffic to backend systems, which then need to be scaled out, consuming more hardware resources, not less. In some sense, a service consumer is naturally akin to a greedy consumer that hogs the resources of backend systems, driving so much traffic through the CRM, finance applications, legacy systems and databases that they are pushed to the edge of their capacity -- a reincarnation of the free-rider problem, if you like.

So where does the wastefulness come from in SOA, and what can be done to manage it?

As with the green movement in general, the solution boils down to being a responsible consumer. We are all being taught to be more conscious about consumption every day in the real world. In the case of SOA, the things being consumed are the backend capabilities that provide services. Just because you can call a service a hundred times doesn't mean you should. Conversely, just because a specific service consumer calls your service a hundred times only for you to return the same result, it doesn't mean that you should do the same amount of work for each call. What is beautiful about the solution we discuss here is that, in addition to taking out the waste, it reduces the operational risk associated with doing SOA.


Reduce: Where do the inefficiencies come from?

If a service consumer is being wasteful, then the obvious question is where. As you look into the way applications are built in a SOA environment, you start to see the answer. We recently looked at a wireless telco's Web portal where customers can view and print bills, make payments, and add services to their accounts. This system is linked to billing, CRM, provisioning and legacy systems. Although the application was built using SOA principles, we noticed that during a typical customer interaction the billing system alone was called an average of three times. This is a big deal.

Additionally, the Web portal manages changes to the subscriber's services not in the mid-tier (for example, as an EJB), but as state data in an existing backend database. Finally, we observed that a provisioning process kicked off as a result of changes to services ended up pulling or updating the customer profile information from the Home Subscriber Server (HSS) multiple times. The Web front end also pulled the profile information from the HSS at least once.

The HSS is an operational system with one hand in the IT world and the other in the network world, so accessing it is only for the privileged!

Taking a high-level view of the application unearthed the following characteristics that led to wasteful consumerism:

1. Multiple Calls to Legacy Systems -- Multiple service requests were made to the legacy billing system for information that was very unlikely to change for the duration of the Web interaction. Accessing legacy systems costs time (a few seconds per interaction) and money. The latter is not much of an issue if you enjoy paying big-iron vendors for more mainframe MIPS; otherwise, it is a real concern.

2. Hitting Backend Data Stores with Constantly Updating State Information -- An existing database was used to store transient state data related to the Web session. The state could have been managed in the mid-tier, e.g. with EJBs, but the customer was looking for a scale-out solution, so they decided to use the transactional database as a repository for transient state data.

3. Multiple Read/Write Interactions with Services -- Multiple read and update requests were made to the user profile in the HSS, both in the Web tier and in the backend provisioning logic, which was implemented using a BPEL orchestration engine.

All these wasteful interactions put undue stress on the backend systems. The cure for these ills is a mid-tier data caching / grid solution: it lets you save on service invocations by caching results, and it lets you manage state in the mid-tier in a scalable way.
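To make the first of these points concrete, here is a minimal cache-aside sketch in Java. It assumes a JCache (JSR-107)-style cache handed out by the grid; the BillingService and BillingSummary types are hypothetical stand-ins for the legacy billing integration, not details taken from the telco application described above.

import javax.cache.Cache;

// Hypothetical stand-ins for the legacy billing integration.
interface BillingService {
    BillingSummary fetchSummary(String accountId);
}

class BillingSummary implements java.io.Serializable {
    // balance, plan details, last invoice, etc.
}

public class BillingLookup {

    private final Cache<String, BillingSummary> cache;  // mid-tier data grid view
    private final BillingService billingService;        // wrapper around the legacy system

    public BillingLookup(Cache<String, BillingSummary> cache, BillingService billingService) {
        this.cache = cache;
        this.billingService = billingService;
    }

    public BillingSummary getSummary(String accountId) {
        // Look in the grid first; only fall through to the legacy billing
        // system on a miss, so the second and third requests within the same
        // Web interaction never reach the mainframe.
        BillingSummary summary = cache.get(accountId);
        if (summary == null) {
            summary = billingService.fetchSummary(accountId);  // the expensive call
            cache.put(accountId, summary);
        }
        return summary;
    }
}

The same pattern addresses all three characteristics above: cache what is unlikely to change, keep transient state in the grid rather than the transactional database, and read the HSS profile once instead of many times.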

What is a mid-tier data grid solution?

A critical part of a grid-enabled SOA environment is a mid-tier grid layer. This layer provides a JCache-compliant, in-memory data grid solution for state data that is being used by services in a distributed service-oriented application.

The mid-tier distributed caching layer balances the in-memory storage of service instance data across the other machines in the grid. This effectively provides a distributed shared memory pool that can be scaled linearly across a heterogeneous grid of machines (which can include both high-powered, big-memory boxes and lower-cost commodity hardware).

In a data grid-enabled, stateful service-oriented environment (one that makes use of this mid-tier caching layer), all the data objects that an application or service puts into the grid are automatically available to, and accessible by, all other applications and services in the grid, and none of those data objects will be lost in the event of a server failure. The fact that the application has access to all data through one API, irrespective of where that data lives, is known as a single system image. To support this, a group of constantly cooperating caching servers coordinates updates to shared data objects, as well as their backups, using cluster-wide concurrency control.

The key to this approach is ensuring that each piece of data is always managed by a primary owner and an additional backup server. This data can be anything from simple variables to complex objects or even large XML documents. From the perspective of the developer, the service is simply performing operations against a programmatic interface to a collection, such as a Java map or a .NET dictionary. At any point in time, all of the data grid nodes know exactly where the primary and backup copies of all the data are stored, so the application doesn't have to! This eliminates the massive overhead of each node having to find out where a piece of data resides.
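As a sketch of what this looks like from the developer's seat, the Java fragment below works against a JCache-style Cache handed to it by the grid. The SessionState class, its key scheme and the stored values are illustrative assumptions, not details from the telco application.

import javax.cache.Cache;

public class SessionState {

    // A map-like view of a named cache provided by the data grid.
    private final Cache<String, String> pendingChanges;

    public SessionState(Cache<String, String> pendingChanges) {
        this.pendingChanges = pendingChanges;
    }

    public void recordPendingChange(String subscriberId, String change) {
        // The service neither knows nor cares which grid node holds the
        // primary copy and which holds the backup; it simply uses the
        // collection-style interface.
        pendingChanges.put(subscriberId, change);
    }

    public String pendingChangeFor(String subscriberId) {
        return pendingChanges.get(subscriberId);
    }
}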

As shown in Figure 1, the request to put data into the map is taken over by the data grid and transported across a highly efficient networking protocol to the grid node P, which owns the primary instance data. The primary node in turn copies the updated value to the secondary node B for backup, and then returns control to the service once the proper acknowledgments are handled.


Figure 1: Primary/backup synchronization of data access across the SOA grid

Conversely, when the instance data needs to be accessed by a service, the data grid calls the primary node which owns that instance data, and routes the data back to the service or business object that is requesting it. A range of operations is supported in this manner, including parallel processing of queries, events, and transactions. In a more advanced implementation, an entire collection of data can be put to the grid as a single operation, and the grid can disperse the contents of the collection across multiple primary and backup nodes in order to scale.
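A minimal sketch of that bulk operation, again assuming a JCache (JSR-107)-style cache; the subscriber keys and profile payloads are made up for illustration.

import java.util.HashMap;
import java.util.Map;
import javax.cache.Cache;

public class BulkProfileStore {

    // Put an entire collection to the grid in a single operation; the grid
    // is free to spread the entries across multiple primary and backup nodes.
    static void storeProfiles(Cache<String, String> profiles) {
        Map<String, String> batch = new HashMap<String, String>();
        batch.put("subscriber-42", "<profile>...</profile>");
        batch.put("subscriber-43", "<profile>...</profile>");
        batch.put("subscriber-44", "<profile>...</profile>");
        profiles.putAll(batch);  // one call; partitioning is handled by the grid
    }
}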

One question you may ask is how to ensure that the cached data within the data grid stays up to date, especially when the data grid front-ends a database or transactional system. While a general discussion of this issue is beyond the scope of this paper, you can keep data fresh through policy-based cache expiry rules defined in the data grid, or through an event-driven mechanism in which changes to data are propagated to the data grid at the same time as they are pushed through to the backend databases.
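As one example of the policy-based approach, the sketch below configures a cache through the standard JCache (JSR-107) API so that entries expire five minutes after they are created. The cache name, key/value types and duration are illustrative choices, not recommendations from this case study.

import javax.cache.Cache;
import javax.cache.CacheManager;
import javax.cache.Caching;
import javax.cache.configuration.MutableConfiguration;
import javax.cache.expiry.CreatedExpiryPolicy;
import javax.cache.expiry.Duration;

public class ExpiryConfigExample {
    public static void main(String[] args) {
        // Obtain a cache manager from whatever JCache-compliant grid
        // provider is on the classpath.
        CacheManager manager = Caching.getCachingProvider().getCacheManager();

        // Entries created in this cache are treated as stale five minutes
        // after creation, forcing a re-fetch from the backend on next access.
        MutableConfiguration<String, String> config = new MutableConfiguration<String, String>()
                .setTypes(String.class, String.class)
                .setExpiryPolicyFactory(CreatedExpiryPolicy.factoryOf(Duration.FIVE_MINUTES));

        Cache<String, String> billingSummaries = manager.createCache("billing-summary", config);
        billingSummaries.put("subscriber-42", "balance=103.50");
    }
}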

What's the big deal with data grids? The data grid simultaneously provides performance, scalability and reliability for in-memory data. So how does that relate to what we want to deliver in a SOA environment, namely reducing the impact on backend systems? Check back next week and find out.

(Check back next week for Part II)

About the Authors

Mohamad Afshar, PhD, is Vice President, Product Management within the Oracle Fusion Middleware development organization. His main focus includes vision, strategy, and architecture, with an emphasis on SOA, EDA and next-generation middleware. He works closely with customers, advising them on their SOA roadmaps, and has spearheaded Oracle's SOA Methodology. Prior to Oracle he was a co-founder of Apama, an algorithmic trading platform (sold to Progress). Mohamad is a frequent speaker at industry events and is a contributing author to business-oriented, technical, and academic journals. He holds a PhD in Parallel Database Systems from Cambridge.


Dave Chappell is vice president and chief technologist for SOA at Oracle Corporation, where he is driving the vision for Oracle's SOA Grid initiative. Chappell is well known worldwide for his writings and public lectures on Service Oriented Architecture (SOA), the SOA Grid, the enterprise service bus (ESB), message-oriented middleware (MOM), and enterprise integration, and he is a co-author of many advanced Web Services standards. As author of several books, including the O'Reilly Enterprise Service Bus book, Dave has had tremendous impact on redefining the shape and definition of SOA infrastructure. In his more than 20 years of experience in the industry, he has built enterprise infrastructure ranging across client-server, Web application servers, enterprise messaging systems, and ESBs. Chappell and his works have received many industry awards, including the "Java™ Technology Achievement Award" from JavaPro magazine for "Outstanding Individual Contribution to the Java Community" and CRN Magazine's "Top 10 IT leaders" award for "casting a larger-than-life shadow over the industry".
