Leveraging Information and Intelligence

David Linthicum

Metadata and Data Integration

Understanding the metadata for your data integration problem domain is key to the project's success. However, most data integration projects don't focus on understanding metadata as much as they should, and the solution ultimately suffers. This becomes most evident when the problem domain expands, as is now the case with the extension of data integration to cloud computing: the metadata quickly becomes complex and difficult to manage.

This is not to say that the lack of metadata understanding is solely the fault of data integration architects; data integration technology vendors are to blame as well. Clearly, we need tools and technologies that are better able to manage metadata. While most integration servers understand input and output schemas, few go much beyond the simple notion of structure.
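To make the gap between "structure" and "meaning" concrete, here is a minimal sketch, assuming each field can carry optional semantic attributes (a unit and a business definition) alongside its physical type. The `Field` class and both check functions are hypothetical illustrations, not any vendor's API. A structure-only check declares the two schemas compatible, while the semantic check flags that the same column means different things in each system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Field:
    name: str
    dtype: str                     # structural metadata: the physical type
    unit: Optional[str] = None     # semantic metadata (hypothetical): unit of measure
    meaning: Optional[str] = None  # semantic metadata (hypothetical): business definition


def structurally_compatible(source: list[Field], target: list[Field]) -> bool:
    """What a structure-only integration server checks: same names, same types."""
    src = {f.name: f.dtype for f in source}
    return all(src.get(f.name) == f.dtype for f in target)


def semantic_conflicts(source: list[Field], target: list[Field]) -> list[str]:
    """Report fields whose structure matches but whose declared meaning differs."""
    src = {f.name: f for f in source}
    conflicts = []
    for t in target:
        s = src.get(t.name)
        if s is None or s.dtype != t.dtype:
            continue  # a structural mismatch, which the structural check already catches
        if (s.unit, s.meaning) != (t.unit, t.meaning):
            conflicts.append(f"{t.name}: {s.unit}/{s.meaning} -> {t.unit}/{t.meaning}")
    return conflicts


# Two hypothetical systems with the "same" column.
crm = [Field("amount", "decimal", unit="USD", meaning="order total incl. tax")]
erp = [Field("amount", "decimal", unit="EUR", meaning="order total excl. tax")]

print(structurally_compatible(crm, erp))  # True: the schemas "match"
print(semantic_conflicts(crm, erp))
```

The second call reports one conflict on `amount`, which is exactly the kind of problem a purely schema-driven mapping would pass straight through to production.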

The real value, however, lies in an eventual move to an active metadata management paradigm. With this type of technology, we'll have the power to analyze the metadata local to many diverse systems and bring all of that disparate metadata into one unified metadata model. Moreover, we'll be able to detect changes to metadata in source or target systems as they happen and update our enterprise metadata automatically. Taking this one step further, we could also determine which integration processes are bound to which metadata and how those processes must change as the metadata changes, perhaps automating that process within the integration server.
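The three capabilities above (unifying local metadata, detecting changes, and tracing which integration processes are bound to the changed metadata) can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not a description of any existing product: each system's metadata is reduced to a plain `{field: type}` dictionary, and `MetadataRepository`, `harvest`, `bind`, and `apply_change` are hypothetical names. A real active metadata manager would harvest from catalogs, DDL, or service contracts and would be triggered by change events rather than called by hand.

```python
from dataclasses import dataclass, field

Schema = dict[str, str]  # field name -> type, as harvested from one system


@dataclass
class MetadataRepository:
    """Hypothetical unified (enterprise) metadata model."""
    model: dict[str, str] = field(default_factory=dict)          # "system.field" -> type
    bindings: dict[str, set[str]] = field(default_factory=dict)  # process -> elements it uses

    def harvest(self, system: str, schema: Schema) -> None:
        """Pull one system's local metadata into the unified model."""
        for name, dtype in schema.items():
            self.model[f"{system}.{name}"] = dtype

    def bind(self, process: str, *elements: str) -> None:
        """Record that an integration process depends on these metadata elements."""
        self.bindings.setdefault(process, set()).update(elements)

    def apply_change(self, system: str, new_schema: Schema) -> dict[str, list[str]]:
        """Diff a fresh snapshot of one system against the model, update the
        model, and return the changed elements plus the processes bound to them."""
        prefix = f"{system}."
        old = {k[len(prefix):]: v for k, v in self.model.items() if k.startswith(prefix)}
        # An element changed if it was added, removed, or retyped.
        changed = sorted(
            f"{prefix}{n}" for n in old.keys() | new_schema.keys()
            if old.get(n) != new_schema.get(n)
        )
        # Replace this system's slice of the unified model with the new snapshot.
        for n in old:
            del self.model[f"{prefix}{n}"]
        self.harvest(system, new_schema)
        # Impact analysis: which processes touch any changed element?
        affected = sorted(p for p, els in self.bindings.items() if els & set(changed))
        return {"changed": changed, "affected_processes": affected}


repo = MetadataRepository()
repo.harvest("crm", {"customer_id": "int", "email": "varchar(100)"})
repo.harvest("erp", {"cust_no": "int"})
repo.bind("crm_to_erp_sync", "crm.customer_id", "erp.cust_no")
repo.bind("marketing_feed", "crm.email")

# The CRM team changes its key type; only the sync process is affected.
print(repo.apply_change("crm", {"customer_id": "uuid", "email": "varchar(100)"}))
```

Keeping the bindings next to the unified model is the design choice that makes the last step possible: once the repository knows what each process depends on, a metadata change becomes a query rather than a manual investigation.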

Understanding the metadata within your application integration problem domain is ugly work, but if you skip the deep dive into metadata, you'll never get your data integration problem under control. Never.

1 Comment

David, as always, some good and invaluable points.

I have often been cynical about "Data Integration" in the sense that data, in and of itself, is often useless.

Start from the bottom.

Binary zeros and ones are next to useless without business rules (bytes, perhaps) to describe their purpose (except when the purpose IS a zero, a one, or a flag).

Bytes are next to useless without business rules to describe their intent (9 ones will NOT make a byte!).

And so on up the stack: databases, tables, columns, and the data they contain MUST have business rules to describe their intent; otherwise, they are often useless. Multiple intents are common when working with partners (compliance, rules, regulation).

And herein lies the dilemma that might help explain this paradox. Integration is hard because each time an integrator tackles the data with a different set of business rules (as will continue forever), the data gets harder and harder to integrate for the next consumer. Sure, without business rules (or should I say logic, or man-decades of code), we'd have simple data integration.

Life is not like that, though. Add enough business logic around something as clean as SOA, and you are still in the same predicament a week, a month, or a year from now when the business rules change! Reuse, I hear you say? Nope: the SOA service will be copied, modified, and put into production. Two sets of rules on the same service = legacy.

And so the pendulum swings. Data Integration? Mmm, it's hard. Maybe never to be solved. Eek.

Industry expert Dave Linthicum tells you what you need to know about building efficiency into the information management infrastructure

David Linthicum

David Linthicum is the CTO of Blue Mountain Labs, and an internationally known distributed computing and application integration expert.

