Rick Sherman had a nice post entitled "The Trial-and-Error Method for Data Integration." Clearly, Rick and I are kindred spirits, and I thought he made some great points.
He identifies the larger problems as:
· "Not Developing an Overall Architecture and Workflow"
· "Thinking that Data Quality is a Product Rather than a Process"
· "Assuming Custom Coding is Faster than ETL Development"
"The usual development approach for data integration is to gather the data requirements, determine what data is needed from source systems, create the target databases such as a data warehouse, and then code. This is an incomplete, bottom-up approach. It needs to be coupled with a top-down approach that emphasizes an overall data integration architecture and workflow."
This is a huge issue that I see as I wander the data integration universe. There is little or no architectural thinking around data integration, and those charged with creating the solution simply attack the problem: code or buy technology first, ask questions later. The end result is a data integration architecture that has to be adjusted five times to meet the needs of the problem domain. That is not cost effective, and just not smart.
"People often assume that data quality problems are simply data errors or inconsistencies in the transactional systems that can be fixed with data quality products. They overlook and, therefore, don't try to prevent the fact that problems arise when you integrate data from disparate source systems into a data warehouse."
Data quality is not a problem you solve by just tossing technology at it. It's really a matter of understanding the issues, and then bringing the right technology to the party. Most people hear "data quality issue," then Google "data quality," and then ask the first three vendors they find to come in and solve their issue. Start with the process and understand your issues first. The technology is important, but it comes after we do our homework.
"While most large enterprises have embraced ETL development as a best practice, the reality is that custom coding is still prevalent..."
I wrote the EAI book back in the day because I saw too much custom coding around integration. In essence, putting some good architectural thinking around data integration, and then finding the right tools for the job, is not as popular as you might think. Everyone thinks that they can code their way to success here, when in fact doing so typically means that your ongoing maintenance costs, and also the cost of inefficiencies, are off the charts. In this day and age, unless there are specific and unique needs, custom coding your data integration solution is never a good idea.