There has been a spate of customer success stories over the last few weeks, and one feature they prominently highlight is the focus on data. This is hardly surprising: once you get beyond technology integration, a major component of every integration problem is bridging between data models. Within a SOA, the data models are exposed through the service definitions. A requesting application must either formulate its request using the data model of the service provider or, more commonly, rely on an intermediary such as an ESB to bridge the gap (something I refer to as mediation).
Eric Peebles, Chief Application Architect for the City of Chicago, recently highlighted the role of data within SOA. Speaking about why Chicago moved to SOA using iWay’s products, he pointed out:
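As a minimal sketch of what mediation means in practice, the snippet below translates a request expressed in the requester's data model into the provider's data model. The two record types, their fields and the `mediate` function are hypothetical and exist only for illustration; a real ESB would normally express this mapping in its own transformation tooling rather than in hand-written code.

```python
from dataclasses import dataclass


@dataclass
class CrmCustomer:
    """Requester's view of a customer (hypothetical model)."""
    full_name: str
    postcode: str


@dataclass
class BillingParty:
    """Service provider's view of the same customer (hypothetical model)."""
    first_name: str
    last_name: str
    postal_code: str


def mediate(request: CrmCustomer) -> BillingParty:
    """Bridge the requester's model to the provider's model."""
    # Split on the first space only; anything after it is treated as the last name.
    first, _, last = request.full_name.partition(" ")
    return BillingParty(
        first_name=first,
        last_name=last,
        # Normalise the postcode to the provider's assumed format.
        postal_code=request.postcode.replace(" ", "").upper(),
    )


party = mediate(CrmCustomer(full_name="Ada Lovelace", postcode="sw1a 1aa"))
print(party)
```

Keeping the translation in the intermediary means neither application has to know the other's field names or formats.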
city departments and even major citywide applications are separate islands of information about people, companies, land, and buildings. We needed a strategy to integrate these applications and data in order to drive business improvements more effectively.
There are two fundamental approaches to dealing with data integration within SOA: build a global data model for your business, which each business unit must bridge into and out of in order to integrate with other units, or build multiple data models. Taken to the extreme, each approach has major problems:
Attempting to build a single global data model becomes harder as the size of the organization grows and the level of change increases. It requires an excellent understanding of the business at the start, and of how change will affect it, to successfully wire the data model for change. Adopting a public standard (for instance FpML for derivatives in financial services) is not the solution either, as there will be significant divergences between the standard and what your business needs: it can be too complex, too simple or just plain wrong! That is not to say building a global data model is impossible, simply that it is never as simple as it appears at the start!
Allowing each unit to build its own data model and then providing bridging capabilities between the business units builds cost and complexity into the integration layer, which can slow down each project if the intermediary (such as an ESB) is not sufficiently powerful. It can also reduce the reuse, across business units, of knowledge about building data models for your organization.
The solution is of course a balance between the two approaches, and the precise balance will depend on the industry and the organization. For instance:
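One rough way to see the trade-off between the two extremes is to count the translations that have to be built and maintained. The sketch below assumes the worst case, in which every unit exchanges data with every other unit and each direction needs its own mapping; the function names are illustrative, and the count deliberately ignores the cost of designing and governing the global model itself.

```python
def per_unit_mappings(units: int) -> int:
    """Directed mappings when every unit bridges directly to every other unit."""
    return units * (units - 1)


def global_model_mappings(units: int) -> int:
    """Mappings when each unit bridges into and out of one shared model."""
    return 2 * units


for n in (3, 5, 10, 20):
    print(f"{n} units: per-unit={per_unit_mappings(n)}, global={global_model_mappings(n)}")
```

Under these assumptions the two counts are equal at three units and diverge quickly after that. In practice few organizations need every pair of units connected, which is part of why the balance point differs from one organization to the next.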
If your architecture requires a central data exchange through which multiple organizations combine to complete business transactions, it may make sense to veer towards a global data model for this exchange.
If your organization uses a lot of outsourcing or provides outsourcing to other organizations, it may make sense to veer towards a per-unit data model as you will have limited control over the data model of the other parties.
Finally, the approach to building data models within a SOA context must be sensitive to the types of integration problems that need to be solved and specifically to the data models that your model must integrate with. I have seen models which were theoretically perfect but, because of their complexity and the mismatch between the model (exposed through service definitions) and what it needed to interact with, were almost impossible to use. As with so much in integration, where possible keep it simple and build in the ability to change.










What about information as a service or data virtualization? These are key concepts for data and SOA. See post...
http://blogs.ittoolbox.com/eai/business/archives/information-fabric-and-soa-9055
While data integration is one aspect to be considered in SOA, I strongly believe that new persistence models have to be considered for SOA to scale. Relying on data flows that originate and end in the actual data sources will cripple most SOA deployments (and is perhaps why most EII deployments suffer from poor performance and scalability: the slowest link dictates overall system health). Distributed, in-memory caching/persistence models are certainly one effective way to deal with the distributed, loosely coupled nature of SOA. Services deal with the operational subset of data that matters to them and can transact on it, without being bottlenecked by backend data sources.
Bharath,
I totally agree with your points on new persistence models: as SOA scales, it becomes essential that caching occurs in the middle, with the understanding that all the usual issues around caching data (synchronization etc.) have to be handled. Caching will have to become an important capability provided by intermediary platforms such as ESBs, or at least plugged into those platforms.
Somewhat tangentially, I see the current interest in Enterprise Web 2.0 potentially driving the same requirement ahead of SOA, as a proliferation of RSS feeds will kill any back-end system attempting to cope with 1,000 corporate desktops requesting the business intelligence reports at 5pm on Friday evening!
Ronan