Leveraging Information and Intelligence

David Linthicum

Coding for data integration means getting it wrong 100 times, before you get it right once.


Rick Sherman had a nice post entitled "The Trial-and-Error Method for Data Integration."   Clearly, Rick and I are kindred spirits, and I thought he made some great points.   

He identifies the larger problems as:

·         "Not Developing an Overall Architecture and Workflow"

·         "Thinking that Data Quality is a Product Rather than a Process"

·         "Assuming Custom Coding is Faster than ETL Development"

"The usual development approach for data integration is to gather the data requirements, determine what data is needed from source systems, create the target databases such as a data warehouse, and then code. This is an incomplete, bottom-up approach. It needs to be coupled with a top-down approach that emphasizes an overall data integration architecture and workflow."

This is a huge issue that I see as I wander the data integration universe. There is little or no architectural thinking around data integration, and those charged with creating the solution simply attack the problem: code or buy technology first, ask questions later. The end result is a data integration architecture that has to be adjusted five times to meet the needs of the problem domain. That is not cost effective, and it's just not smart.

"People often assume that data quality problems are simply data errors or inconsistencies in the transactional systems that can be fixed with data quality products. They overlook and, therefore, don't try to prevent the fact that problems arise when you integrate data from disparate source systems into a data warehouse."

Data quality is not a problem you solve by just tossing technology at it. It's really a matter of understanding the issues, and then bringing the right technology to the party. Most people hear "data quality issue," Google "data quality," and then ask the first three vendors to come in and solve their issue. Start with the process and understand your issues first. The technology is important, but it comes after we do our homework.
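To make "doing our homework" a bit more concrete, here is a minimal sketch of profiling two source extracts before any product gets picked. Everything in it is an assumption for illustration: the record layouts, the sample rows, and the helper name `profile_integration_issues` are hypothetical and do not come from Rick's post. The point is only that the interesting problems show up when the sources are compared, not when each is inspected alone.

```python
from collections import Counter

# Hypothetical extracts from two source systems (illustrative data only).
crm_customers = [
    {"id": "1001", "email": "Ann@Example.com", "country": "US"},
    {"id": "1002", "email": "bob@example.com", "country": "USA"},
    {"id": "1003", "email": None, "country": "DE"},
]
billing_customers = [
    {"id": "1001", "email": "ann@example.com", "country": "US"},
    {"id": "1002", "email": "bob@example.com", "country": "US"},
    {"id": "1004", "email": "dee@example.com", "country": "FR"},
]


def profile_integration_issues(left, right, key="id"):
    """Count the kinds of problems that appear only when two sources are combined."""
    left_by_key = {row[key]: row for row in left}
    right_by_key = {row[key]: row for row in right}

    issues = Counter()
    # Records that exist in one system but not the other.
    issues["missing_in_right"] = len(left_by_key.keys() - right_by_key.keys())
    issues["missing_in_left"] = len(right_by_key.keys() - left_by_key.keys())

    # Field-level disagreements for records present in both systems.
    for record_key in left_by_key.keys() & right_by_key.keys():
        for field in left_by_key[record_key]:
            if field == key:
                continue
            left_value = left_by_key[record_key].get(field)
            right_value = right_by_key[record_key].get(field)
            if left_value is None or right_value is None:
                issues[f"{field}_null"] += 1
            elif left_value != right_value:
                issues[f"{field}_conflict"] += 1
    return dict(issues)


report = profile_integration_issues(crm_customers, billing_customers)
print(report)
```

A report like this tells you which issues you actually have (unmatched records, inconsistent codes, differing formats), which is the conversation to have before the vendors come in.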

"While most large enterprises have embraced ETL development as a best practice, the reality is that custom coding is still prevalent, ..."

I wrote the EAI book back in the day because I saw too much custom coding around integration. In essence, putting some good architectural thinking around data integration, and then finding the right tools for the job, is not as popular as you might think. Everyone thinks that they can code their way to success here, when in fact it typically means that your ongoing maintenance costs, and also the cost of inefficiencies, are off the charts. In this day and age, unless there are specific and unique needs, custom coding your data integration solution is never a good idea.


Industry expert Dave Linthicum tells you what you need to know about building efficiency into the information management infrastructure


David Linthicum is the CTO of Bick Group, and an internationally known distributed computing and application integration expert.
