Leveraging Information and Intelligence

David Linthicum

Lack of Focus on Data Killing SOA


Those of you who have been following me know that I'm very much an advocate of SOA. The architectural pattern of SOA is helpful in defining an enterprise architecture that is much more agile, and thus pays for itself once the business has to shift and needs IT to follow.

SOA, however, is complex and requires that the architect understand all aspects of the "as is" architecture before moving to the "to be." This means decomposing the existing architecture down to a primitive state, and rebuilding it again as sets of services, with a process configuration or composite applications layer to define and redefine business functions. I think most get that.

What's missing within most typical SOA projects is the focus on the data, and that is killing SOA. Since the "S" in SOA means service, most architects focus on the service definition, abstracting the existing data into collections of services, but don't pay much attention to the data within the architecture. Not good.

The truth is that the foundation of a healthy and functional SOA is the data, and you have to deal with the underlying data first, understand it, perhaps reorganize and abstract it, before defining the services that will sit on top of the data. While this is architecture 101, the fact is that those driving SOAs these days have little understanding of the importance of understanding and defining the data, and thus the architecture ends up being a bunch of well-defined services that sit on top of very dysfunctional data. The end result is performance issues, data integrity issues, and even a lack of the agility that is the reason you build SOAs in the first place.

The truth is that most failed SOA projects can be traced to the lack of a data-level understanding, and why this is still an issue in this day and age is beyond me. There are many technologies and tools out there to assist you, and we've been doing data for a long, long time. Nothing new here, just data. However, if you ignore it, your SOA will be stillborn.

15 Comments

Completely agree David. I think data from a SOA standpoint is extremely critical for several reasons:
- reducing errors/rework in automated business processes that leverage enterprise data services
- reducing the need for semantic and syntactic transformations as data moves from application tiers to the underlying data stores, especially legacy systems
- achieving effective decoupling of legacy systems from service contracts (no point exposing services that don't encapsulate underlying data models/physical implementation details)

So true. One fortunate aspect though is that SOA is a golden opportunity for getting the benefits from data quality tools that we haven’t been able to achieve so much with the technology and approaches seen until now.

Data Quality functionality deployed as SOA components has a lot to offer:

• Reuse ensures the same data quality rules applied to every entry point of the same sort of data and thereby helps with consistency.
• Interoperability will make it possible to deploy data quality error prevention as close to the root as possible.
• Composability makes it possible to combine functionality with different advantages – e.g. combining internal checks with external reference data.

More here:

http://liliendahl.wordpress.com/2009/07/07/service-oriented-data-quality/
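
To make the reuse and composability points in the comment above concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the rule names, the `KNOWN_COUNTRIES` reference set, and the two entry-point functions are all hypothetical and do not come from any particular data quality product.

```python
from typing import Callable

# A rule takes a record and returns a list of problems (empty means it passed).
Rule = Callable[[dict], list[str]]


def require_field(name: str) -> Rule:
    """Internal check: the field must be present and non-blank."""
    def rule(record: dict) -> list[str]:
        value = str(record.get(name, "")).strip()
        return [] if value else [f"missing {name}"]
    return rule


def in_reference_set(name: str, allowed: set[str]) -> Rule:
    """External check: the value must appear in a reference data set."""
    def rule(record: dict) -> list[str]:
        value = record.get(name)
        return [] if value in allowed else [f"unknown {name}: {value!r}"]
    return rule


def compose(*rules: Rule) -> Rule:
    """Composability: combine several rules into one rule."""
    def rule(record: dict) -> list[str]:
        problems: list[str] = []
        for r in rules:
            problems.extend(r(record))
        return problems
    return rule


# Hypothetical reference data; a real set would come from an external source.
KNOWN_COUNTRIES = {"DK", "US", "GB"}

# One composed rule, defined once.
validate_customer = compose(
    require_field("name"),
    in_reference_set("country", KNOWN_COUNTRIES),
)


# Reuse: every entry point for customer data calls the same rule.
def web_form_entry(record: dict) -> list[str]:
    return validate_customer(record)


def batch_import_entry(records: list[dict]) -> list[list[str]]:
    return [validate_customer(r) for r in records]
```

Because both entry points go through `validate_customer`, a change to the rule set is picked up everywhere the same kind of data enters.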

I completely agree with David's assertion and have seen the effects of ignoring the data aspects of the service-infrastructure. Alternatively, having a holistic view of the data landscape and the metadata that supports it will allow you to construct a service-layer that is simpler and ultimately more agile.

More here: http://adjoovo.com/cms/insights/white-paper.html

I also agree with the analysis of the problem.
I think it is more general than for SOA. I will explain why and come back to SOA.

With the current fashion of describing the system requirements in use cases, the role of the use case is totally changed from the original UML intention, which was to identify all the user (actor) tasks and outline the interface between the actor concerned and the system. It is very easy to produce a functional spec for each use case, where the interface is expanded into complex descriptions of how the system processes each step. The use cases become unwieldy (unreadable even), so someone decides that they need to summarise them as a Requirements Register, often a list of a few thousand testable statements that have lost all context. The bodges that follow this are sometimes unbelievable.

I start with the mission statement outlined as a domain structure. It identifies the key data classes involved at a high level and pins the most significant business rules to them. These are only Business Classes, and the associated business rules can now include a non-functional rule that says that an SOA architecture is required. This is truly a Business Analyst's concern. It describes, at the right level, the key business requirements, and if SOA is one of them, then it is a proper BA concern; it actually should strongly push the BA into business-level discussions with the architect. It is where the SOA architecture should start to come from.

Now, the use cases can be drawn up, without any step-by-step descriptions, and viewed in conjunction with the outline domain classes. These use cases should almost fall out of the 'to be' analysis. If SOA is a business requirement, then the emphasis on the use cases may change a few of them; NOT necessarily their existence, but their goals.

Only then should the use cases be expanded out step by step, almost at a realisation stage. The domain classes need expanding to accommodate the 'business rules (constraints)' which are now no longer in the use cases. How the system processes a step in the use case, if it is a business requirement, gets recorded in the appropriate process of the appropriate class, and so on. The use cases and business objects in the domain diagram get rationalised with one another by a sequence diagram to ensure consistency.

Sorry, it's a long comment, but I feel it is very important to both SOA and non-SOA systems.

Gil

This is the hard reality. The abstraction between the Service layer and the data layer is important not just from a Services point of view but also from an Enterprise application integration point of view. Whenever we try to address a new requirement in an enterprise application, it changes the underlying data structures, sometimes at the physical level and sometimes at the logical level, as a result of changes in the data structure modelling. The intersection of the Business and Data Services layers should make sure that any ripples caused at the Data Services layer do not reach the Business Services layer and, in turn, affect existing Services. To me it boils down to the data architecture of any given application which is part of the SOA Architecture. If the Data architecture is robust enough, then it is more likely that the Services which are built on top of it will also be robust.
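
As a rough illustration of keeping data-layer ripples away from business services, consider the Python sketch below. It assumes a hypothetical `Customer` contract and two made-up physical layouts (`CUST_NAME` versus `first_name`/`last_name`); none of these names come from the comment above.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Customer:
    """Logical view of a customer: the stable contract business services see."""
    customer_id: str
    full_name: str


class CustomerData(Protocol):
    """Contract of the data services layer (hypothetical)."""
    def get(self, customer_id: str) -> Customer: ...


class LegacyCustomerData:
    """Data service over an old physical layout with a single name column."""
    def __init__(self, rows: dict[str, dict]) -> None:
        self._rows = rows

    def get(self, customer_id: str) -> Customer:
        row = self._rows[customer_id]
        return Customer(customer_id, row["CUST_NAME"])


class RestructuredCustomerData:
    """Same contract after the name was split into two columns."""
    def __init__(self, rows: dict[str, dict]) -> None:
        self._rows = rows

    def get(self, customer_id: str) -> Customer:
        row = self._rows[customer_id]
        return Customer(customer_id, f"{row['first_name']} {row['last_name']}")


def greeting(data: CustomerData, customer_id: str) -> str:
    """Business service: depends only on the logical Customer, not on columns."""
    return f"Dear {data.get(customer_id).full_name},"


old = LegacyCustomerData({"c1": {"CUST_NAME": "Ann Lee"}})
new = RestructuredCustomerData({"c1": {"first_name": "Ann", "last_name": "Lee"}})
print(greeting(old, "c1"))
print(greeting(new, "c1"))
```

The physical change is absorbed inside the data service, so `greeting` is untouched. This is only one way to draw the boundary; the point is that the business layer never names a column.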

The optimal Data Architecture is of great importance. But that said, even with the finest Data Architecture, the quality of the data contained within may kill SOA and any other architecture. The killing is just especially brutal with SOA.

If we have a look at EA frameworks and consider that SOA works with EA, the data issue is not so critical.
The stack of EA is the following:
• Business architecture,
• Information systems architecture, often subdivided into data architecture and application architecture,
• and technical architecture.


For instance the architecture types supported by the architecture framework TOGAF 9 from Open Group are:
• Business architecture
• Data (or information) architecture
• Application architecture
• Technology architecture

So, an architect that uses an EA architecture framework must think of data (metadata) and data organization before defining applications.

Now, if SOA works with EA, what happens? SOA is a modular architecture, the modularity coming from services that are component abstractions defined by contracts that hide the component implementation. With SOA, applications become collections of services that work together.
Since a service is a piece of an application, architects that use EA frameworks together with SOA will think of data and data organization before defining services.

David

Isn't it really "Information of value to Business", meaning aggregated or processed data, that is the key for SOA, especially for the payload and orchestration, rather than raw data?
Thanks.

David,

I would like to suggest that your article does not go far enough. I realize you are limited in how much you can say in 5 paragraphs, but I think this topic merits a series of articles. IMHO, people need to think about more than abstracting application specifics and consider that SOA without MDM is problematic. Additionally, the existence of MDM informs which services to use. If SOA is used with MDM, then the services can become independent of the applications that serve the data. Clients can treat unified enterprise data somewhat like a virtual database. In this case, the services might become more generic, with business processes encoded in an orchestration layer instead of in application-level services.

The issue with "Information of value to Business" arises when the same raw data is consumed for several purposes within the enterprise. This is very often the case with Master Data. Aligning the business need for several different purposes is not trivial. But with SOA in particular, reusability is a core principle, and it requires uniqueness, consistency and precision in the raw data.

More about the data quality related to multi purpose use of data here:

http://liliendahl.wordpress.com/2009/07/05/fit-for-what-purpose/

I am glad to read this article.

This is an area that I have been talking about since XML and SOA were introduced. I spent over 10 years on DQ and DM. When XML and SOA were introduced, I wrote one of the first papers in the industry discussing the importance of data management for XML and SOA and why SOA will fail if data is not addressed. I also discussed the importance of an XML/SOA Governance Framework.

Without a data management strategy for SOA, SOA is nothing but "SILO ORIENTED ARCHITECTURE", as you end up with tight coupling of data interfaces between SOA services. A successful implementation of SOA requires:
- Loose coupling of the physical service interfaces (the plumbing)
- Loose coupling of the data interfaces (payload in a service)
- SOA/XML governance program

Data or information is the key for any integration. Most of the time we ignore this. When you look at the evolution of integration, data integration is often ignored. To me, integration evolution is a classic example of "INSANITY", i.e., attempting to solve the integration issues by different means without addressing the fundamental issue, which is "data".

No matter how smart your integration patterns are, no matter how smart your IT system is, no matter how smart your business processes are, if the quality of data used by all these is poor and not managed, what you are going to get out of these is going to be poor. Your IT systems are only as good as the quality of the data they process.

This is a great article. Adding a few more artifacts would make it even better.
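
One common way to approach the "loose coupling of the data interfaces (payload in a service)" requirement listed in the comment above is for a consumer to read only the fields it needs and tolerate everything else. The Python sketch below illustrates that idea; the `order_id` and `currency` field names and the `"USD"` default are assumptions made for the example, not something the commenter specified.

```python
import json


def read_order(payload: str) -> dict:
    """Consumer that reads only the fields it needs from a service payload.

    Unknown fields are ignored and an optional field gets a default, so the
    provider can extend the payload without breaking this consumer.
    """
    message = json.loads(payload)
    return {
        "order_id": str(message["order_id"]),        # required field
        "currency": message.get("currency", "USD"),  # optional; default is an assumption
    }


# Two versions of the same hypothetical payload.
v1 = '{"order_id": 7}'
v2 = '{"order_id": 7, "currency": "EUR", "loyalty_tier": "gold"}'
print(read_order(v1))
print(read_order(v2))
```

The consumer keeps working when the provider adds `loyalty_tier`, which is the kind of payload-level decoupling that a governance program would then need to police.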

David,

I've come late to this post as I only stumbled upon it recently.

Yes, SOA has been autistic re. Structured Linked Meta-Data. This affliction has cost it and the companies that took this path to IT modernization dearly :-(

If we cannot do the following re. data we have nothing:

1. Identify Data at the Datum (Data Item) level
2. Use HTTP URIs for Datum Identifiers
3. Refer to Data Items using their Identifiers (think " * " in 'C')
4. Retrieve Data Items (e.g. for manipulation) using their HTTP Addresses i.e., URLs (think " & " in 'C')
5. Implicitly bind Data Items to their Metadata (what you get gratis via HTTP URIs)
6. Metadata should follow an Entity-Attribute-Value + Classes & Relationships (EAV/CR) model based on Entities, Attributes, and Values (optionally) endowed with HTTP URIs.

Kingsley
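
The list above can be sketched in a few lines of Python. This is a toy, in-memory stand-in: the `http://example.com/` URIs are hypothetical, nothing is actually fetched over HTTP, and `describe` and `follow` are illustrative helper names rather than part of any standard.

```python
from collections import defaultdict

# Hypothetical base for the HTTP URIs used as identifiers in this example.
EX = "http://example.com/"

# Entity-Attribute-Value statements; entities, attributes, and classes are URIs.
triples = [
    (EX + "customer/42", EX + "schema/type", EX + "schema/Customer"),
    (EX + "customer/42", EX + "schema/name", "Ann Lee"),
    (EX + "customer/42", EX + "schema/placedOrder", EX + "order/7"),
    (EX + "order/7", EX + "schema/type", EX + "schema/Order"),
    (EX + "order/7", EX + "schema/total", 99.5),
]


def describe(entity_uri: str, statements: list[tuple]) -> dict:
    """Collect every attribute and value recorded for one entity URI."""
    description = defaultdict(list)
    for entity, attribute, value in statements:
        if entity == entity_uri:
            description[attribute].append(value)
    return dict(description)


def follow(entity_uri: str, attribute_uri: str, statements: list[tuple]) -> list[dict]:
    """Follow a relationship: describe each entity that the attribute points to."""
    targets = describe(entity_uri, statements).get(attribute_uri, [])
    return [describe(t, statements) for t in targets if isinstance(t, str)]


orders = follow(EX + "customer/42", EX + "schema/placedOrder", triples)
print(orders)
```

Here `describe` plays the role of retrieving a data item by its address, and the `schema/type` statement is what binds each item to its class.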

David,

Your comments on the data foundations of successful SOA projects are right on the money. And it is worth asking why this situation has come to pass.

You would think that it makes good sense that normalized business semantics should underpin good software. But the semantic chaos which confronts the SOA or BPM architect is sometimes too much complexity to handle.

Doing SOA without a good base of data is like trying to write a report without a fundamental agreement as to the set of letters which constitute the alphabet -- and the report writer must contend with six different alphabets, only part of which are easily mapped or normalized. It would be a nightmare -- or is a nightmare.

I heard a senior scientist from OntologyWorks give an address to a conference, all the way back in 2001. His comments were basically that XML was a disaster waiting to happen -- XML being the return of hierarchical data structures which had been superseded by relational technologies. His predictions have only come true now.

Business leaders -- and their senior technology executives -- will continue to pay a price for a willingness to ignore the challenge of foundational business semantics. MDM is at least a recognition of the problem, but with a tinker-toy approach as a solution. ERP is an attempt to solve the semantic problem -- and SAP's leadership has invested millions in normalizing their data models.

But the problem remains. A generation from now we may look back on these days as a time of technological barbarism. The problem of semantics generates a lot of consulting dollars -- but some organizations pay a high price for chaos -- and in the case of medical records, some individuals pay a high price for chaos.

Is anyone to blame? My own view is that the source of the problem is both a willful and even macho ignorance, but that the attitude is compounded by the sheer size of the problem.

JHM

For those who don't know about SOA, here is a great video about it: http://www.videorolls.com/watch/What-is-Service-Oriented-Architecture-SOA

Industry expert Dave Linthicum tells you what you need to know about building efficiency into the information management infrastructure

David Linthicum

David Linthicum is the CTO of Blue Mountain Labs, and an internationally known distributed computing and application integration expert.
