Integration on the Edge: Data Explosion & Next-Gen Integration

Hollis Tibbetts

Our World of Sick Software and Dirty Data - what to do?


In survey after survey, about half of IT executives agree that data quality and data consistency are among the biggest roadblocks to getting full value from their data.

This has been consistently true all the way back to Greek and Roman days. I suspect it will be true in 100 years.

Similarly, many experts estimate that HALF the money spent on developers goes towards "software repair".

So we're living in a world of sick software and dirty data. A fine mess we've gotten ourselves into. And the cost of all this is staggering.

I've long been a proponent of rapid, iterative, and continuous testing. This concept holds true for both data and applications.

This rapid, iterative, continuous testing model has measurably improved the quality of software development. Evangelists such as Kent Beck have had a huge impact on this. I recently posted a freely downloadable white paper on this topic.

But where are the evangelists for data quality? Where is an open source "JUnit for Data"? And if it's out there, why isn't everyone using it?

As data are added or integrated, they should be tested. Profiling is a simple, fast, relatively easy-to-implement and highly effective way to eliminate significant volumes of defective data.

When developers write a new application for the input of some new data, it's normal for input fields to be "validated" - a simple "hard coded" form of profiling. Month number needs to be between 1-12. Not rocket science. And it's universally done.
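As a minimal sketch of the "hard coded" validation described above, the month check might look like the following in Python. The function name `is_valid_month` is hypothetical, chosen only for illustration.

```python
def is_valid_month(value):
    """Return True if value is a whole month number between 1 and 12."""
    # Work on the text form so that input from a form field ("07")
    # and an integer (7) are handled the same way.
    text = str(value).strip()
    # Reject anything that is not plain ASCII digits: "", "Jan", "7.5", "-1".
    if not (text.isascii() and text.isdigit()):
        return False
    return 1 <= int(text) <= 12
```

A caller would reject the input, or flag the record, whenever the function returns False.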

Yet people have far fewer reservations about integrating data from here, there and everywhere - often not checking for even the most egregious data errors, and thereby polluting the organizational drinking water.

Data profiling engines are a great technology for quickly improving the quality of data as it is integrated from one system into another. At the highest level, a profiling engine scans data and applies easily definable rules to data elements, such as formats, ranges and allowable values. It can also evaluate relationships between different fields.
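To make the four kinds of rule concrete, here is a rough sketch in Python of what a very small rule-driven profiler could look like. It is not modeled on any particular commercial engine. The field names (`zip`, `month`, `status`, `order_date`, `ship_date`) and the rules themselves are assumptions invented for the example.

```python
import re
from datetime import date

def check_format(pattern):
    """Build a rule: value must be a string fully matching the pattern."""
    regex = re.compile(pattern)
    return lambda v: isinstance(v, str) and regex.fullmatch(v) is not None

def check_range(low, high):
    """Build a rule: value must be a number within [low, high]."""
    return lambda v: isinstance(v, (int, float)) and low <= v <= high

def check_allowed(values):
    """Build a rule: value must be one of the allowable values."""
    allowed = frozenset(values)
    return lambda v: v in allowed

# Hypothetical single-field rules, keyed by field name.
FIELD_RULES = {
    "zip": check_format(r"\d{5}"),
    "month": check_range(1, 12),
    "status": check_allowed({"active", "closed"}),
}

def ship_not_before_order(record):
    """Cross-field rule: a shipment cannot precede its order."""
    return record["ship_date"] >= record["order_date"]

# Hypothetical rules that look at relationships between fields.
RECORD_RULES = {"ship_after_order": ship_not_before_order}

def profile_record(record):
    """Return the names of every rule the record violates."""
    failures = [f for f, rule in FIELD_RULES.items() if not rule(record.get(f))]
    for name, rule in RECORD_RULES.items():
        try:
            ok = rule(record)
        except (KeyError, TypeError):
            ok = False  # missing or incomparable fields count as a failure
        if not ok:
            failures.append(name)
    return failures
```

Keeping the rules in plain dictionaries means a new check is one more entry, not a change to the scanning code. That is roughly the "easily definable" property described above.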

Furthermore, these engines can also be used to analyze existing data stores very rapidly and generate "exceptions files" for manual, or semi-automated remediation (if anyone can find a totally automated data remediation system, I'd love to know about it). So they can be used in "continuous testing" or "batch testing" mode.
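The "batch testing" mode could be sketched along these lines: scan an existing CSV extract and write every failing row, with its reasons, to an exceptions file for later remediation. The column names (`customer_id`, `month`) and the two rules are hypothetical, and a real engine would do far more.

```python
import csv
import io

def find_violations(row):
    """Apply two hypothetical rules to one row; return the problems found."""
    problems = []
    month = row.get("month") or ""
    if not (month.isascii() and month.isdigit() and 1 <= int(month) <= 12):
        problems.append("month out of range")
    if not (row.get("customer_id") or "").strip():
        problems.append("missing customer_id")
    return problems

def write_exceptions(source, sink):
    """Copy failing rows from a CSV stream to an exceptions CSV stream.

    Each output row carries the source line number and the list of
    violations, followed by the original fields. Returns the number of
    exception rows written.
    """
    reader = csv.DictReader(source)
    fields = list(reader.fieldnames or [])
    writer = csv.writer(sink)
    writer.writerow(["line", "violations"] + fields)
    count = 0
    for row in reader:
        problems = find_violations(row)
        if problems:
            writer.writerow([reader.line_num, "; ".join(problems)]
                            + [row[f] for f in fields])
            count += 1
    return count
```

For "continuous testing", the same `find_violations` check could be called on each record as it arrives, not on a whole file after the fact.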

I've never understood why these engines haven't been more popular. There is no "JUnit for data" as far as I know. But commercial solutions are available - they're not terribly expensive and rapidly pay for themselves.

On the other hand, I've never understood why organizations are so tolerant of bad, dirty data. They waste millions directly because of it (and untold quantities of money in "wasted opportunities"), but are reluctant to spend $15,000 to help fix a significant portion of it.


1 Comment


I'd like to encourage readers to read an excellent article by Loraine Lawson that is highly relevant to this topic.

Her articles are very well-written and well-researched.

http://www.itbusinessedge.com/cm/blogs/lawson/why-business-users-should-do-data-profiling/?cs=48413


This blog offers an informed and informative perspective on next-generation integration and the ongoing explosion of technologies, data and applications. The ultimate goal: turning the problems caused by this explosion into assets and competitive advantages.

Hollis Tibbetts

Hollis has established himself as a successful software marketing and technology expert. His various strategy, marketing and technology articles are read nearly 50,000 times a month. He is currently Director for Software Strategy in the Mergers & Acquisitions organization of Dell, Inc.

Hollis has developed substantial expertise in middleware, SaaS, Cloud, data management and distributed application technologies, with over 20 years of experience in marketing, technical, product management, product marketing and business development roles at leading companies such as Pervasive, Aruna (acquired by Progress Software), Sybase (now SAP), webMethods (now Software AG), M7 Corporation (acquired by BEA/Oracle), OnDisplay (acquired by Vignette) and KIVA Software (acquired by Netscape).

He has established himself as an industry expert, having authored a large number of technology white papers, as well as published media articles and book contributions.

Hollis is a regularly featured blogger at Sys-Con Media. He is also a featured author on Social Media Today ("The World's Best Thinkers on Social Media") and maintains a blog focused on creating great software: Software Marketing 2011.

He tweets actively as @SoftwareHollis

Additional information is available at HollisTibbetts.com
