By Alex Gorelik, Co-Founder and Chief Technology Officer, Exeros
The exponential growth of data in large enterprises has produced more data than can be consumed and used efficiently. While more than enough data is often available, it’s even more likely that crucial data is hidden, and it’s often nearly impossible to determine which data to rely upon for specific needs. Redundant data sources – and worse, data sources that differ in small but key ways – need to be reconciled in order to turn reams of data into information that can be used effectively to accelerate decision-making.
As the need grows to migrate data from legacy systems, and as state and federal laws require stricter control of information and its flow, businesses are compelled to rationalize and organize huge volumes of data. Certain industries rely on reference data for critical business functions: for example, the securities industry has major reference data issues, which reduce efficiency and increase operational risk and costs. Research shows that the cost of repairing a trade is $6 if the problem is identified at the order entry stage, but $50 if it is identified at settlement. Poor data integrity and lack of consistency between externally sourced equity index feeds are the reason 30 percent of all trades fail. (Tower Group, Global STP Study, 2002.)
What are the options?
There are many business reasons to defeat data inconsistency, including better data security, a lower risk of compliance errors and, most simply, better business decisions. Getting to that point, however, is not as simple as it sounds. Plenty of obstacles work against a data analyst hacking his or her way through the data jungle. Data can be old, missing or in a language no one (computer or human) can understand, or there may simply be too much of it, so far beyond human scale that rationalizing it by hand is a pipe dream. Technology has been thrown at this problem again and again, with decidedly mixed results. And the consequences of errors are real: rogue data, inconsistency and inaccessibility are expensive for any enterprise. Government regulations and possible fines for non-compliance with HIPAA, Sarbanes-Oxley, Basel II and other requirements can cripple a business.
Basic strategy can go in the wrong direction if a manager cannot answer a simple series of questions:
What data do I have and where is it located?
How does it relate to all the other data in my possession?
How does it flow through my organization?
Where are the data inconsistencies that cause bad business decisions?
When there is data overlap between data sets, which one do I trust in order to make good business decisions?
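To make the last two questions concrete, here is a minimal sketch in Python of what finding overlap and inconsistency between two data sets can look like. It is an illustration only, not a description of any particular product: the feed names, ticker symbols, field names and the find_inconsistencies function are all hypothetical, and it assumes both sources already share a common record key, which is often not the case in practice.

```python
from typing import Dict, List, Tuple

# A record maps field names to values; a source maps record keys to records.
Record = Dict[str, str]
Source = Dict[str, Record]


def find_inconsistencies(source_a: Source, source_b: Source) -> List[Tuple[str, str, str, str]]:
    """Return (key, field, value_in_a, value_in_b) for every record key
    present in both sources whose shared fields hold different values."""
    conflicts = []
    # Only keys present in both sources can overlap.
    for key in sorted(source_a.keys() & source_b.keys()):
        rec_a, rec_b = source_a[key], source_b[key]
        # Compare only the fields that both records define.
        for field in sorted(rec_a.keys() & rec_b.keys()):
            if rec_a[field] != rec_b[field]:
                conflicts.append((key, field, rec_a[field], rec_b[field]))
    return conflicts


# Hypothetical reference data from two feeds (illustrative values only).
feed_a: Source = {
    "ACME": {"currency": "USD", "exchange": "NYSE"},
    "GLOBEX": {"currency": "USD", "exchange": "NASDAQ"},
}
feed_b: Source = {
    "ACME": {"currency": "USD", "exchange": "NYSE"},
    "GLOBEX": {"currency": "EUR", "exchange": "NASDAQ"},
    "INITECH": {"currency": "USD", "exchange": "NYSE"},
}

for key, field, value_a, value_b in find_inconsistencies(feed_a, feed_b):
    print(f"{key}: {field} is {value_a!r} in feed A but {value_b!r} in feed B")
```

A comparison like this only reports where two sources disagree. Deciding which source to trust remains a business judgment, and at enterprise scale the harder part is usually discovering which records and fields correspond in the first place.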