
Leveraging Information and Intelligence

David Linthicum

The Differences between Combined and Abstract Data


In the world of business intelligence, most of those charged with building BI systems set up data warehouses as the place where decision support data is deposited after being cleansed, aggregated, and restructured. This seemed like the best approach for years, considering that the data we needed to leverage for business intelligence was very different from operational data. Therefore, most of those doing BI had very expensive and maintenance-intensive data warehouses and data marts. Is that a good thing? Sometimes it is.

Truth be told, this more traditional approach, while seeming a bit antediluvian, is more often a fit than not. I call that data combining, or combined data.

The magic of combined data is that you have complete control over the data before it's mined through a BI tool. In essence, you're creating instances of data that are leveraged by BI, and thus you can move from instance to instance as you see fit, placing an instance of combined data online only when it's of the proper content and quality. Thus, the advantages of combined data are control and quality, while the downside is latency: you could be mining data that's weeks, perhaps months, old.
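To make the "cleansed, aggregated, and restructured" idea concrete, here is a minimal sketch of combined data using Python's built-in sqlite3 module. The table and column names are my own illustration, not from the post: rows are pulled from a stand-in operational store, cleansed (null amounts dropped) and aggregated, and only then loaded into a separate warehouse table that BI would query.

```python
import sqlite3

# Stand-in operational database and stand-in data warehouse.
# All names here are hypothetical, for illustration only.
ops = sqlite3.connect(":memory:")
dw = sqlite3.connect(":memory:")

ops.execute("CREATE TABLE orders (region TEXT, amount REAL)")
ops.executemany("INSERT INTO orders VALUES (?, ?)",
                [("east", 100.0), ("east", 50.0), ("west", None), ("west", 75.0)])

# Cleanse (drop NULL amounts) and aggregate (sum by region) before loading,
# so the warehouse only ever holds an instance of the proper quality.
rows = ops.execute(
    "SELECT region, SUM(amount) FROM orders "
    "WHERE amount IS NOT NULL GROUP BY region"
).fetchall()

dw.execute("CREATE TABLE sales_by_region (region TEXT, total REAL)")
dw.executemany("INSERT INTO sales_by_region VALUES (?, ?)", rows)

print(dw.execute("SELECT * FROM sales_by_region ORDER BY region").fetchall())
# → [('east', 150.0), ('west', 75.0)]
```

The point of the copy step is exactly the trade-off described above: the warehouse table is under your control and of known quality, but it is only as fresh as the last load.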

Rising in popularity as a way to do BI is the notion of abstraction, or abstract data. This means that instead of creating another physical database, or data warehouse, we leverage the data where it exists, binding it to a virtual database schema that exists only in middleware. Pretty slick.

Data abstraction has been leveraged for years. Its advantages are avoiding the maintenance of a separate physical database, as well as the ability to run BI against data that is near real time, considering that it's abstracted out of operational databases. However, the disadvantages of data abstraction are that in some instances data consistency and data quality can be an issue; moreover, abstracted data typically cannot be aggregated in real time or near real time.
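A minimal sketch of the abstract approach, for contrast with the combined example: nothing is copied into a warehouse; instead, a virtual view queries the operational sources in place at request time and binds them to one logical schema. The two source systems and all names are assumptions made up for the illustration.

```python
import sqlite3

# Two stand-in operational systems, queried where the data lives.
crm = sqlite3.connect(":memory:")
erp = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
crm.execute("INSERT INTO customers VALUES (1, 'Acme')")
erp.execute("CREATE TABLE invoices (customer_id INTEGER, total REAL)")
erp.execute("INSERT INTO invoices VALUES (1, 250.0)")

def virtual_customer_totals():
    """Bind live rows from both systems to one logical view on the fly.

    No physical warehouse table is created; the result is always as
    fresh as the sources, but only as consistent as they are right now.
    """
    names = dict(crm.execute("SELECT id, name FROM customers"))
    totals = {}
    for cid, total in erp.execute("SELECT customer_id, total FROM invoices"):
        totals[names[cid]] = totals.get(names[cid], 0.0) + total
    return totals

print(virtual_customer_totals())   # → {'Acme': 250.0}
```

Every call re-reads the sources, which is where both the near-real-time advantage and the consistency risk described above come from.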

The approach you take is really a function of your requirements, as usual. I would not get too religious around either way of managing data in the context of BI. Hope this helps.

1 Comment

Historically, there has always been a challenge when traditional business intelligence meets transactional processing models. “Real-time” BI has been both a dream and an oxymoron for years in the data warehousing marketplace. One key reason for data warehousing is the fact that transactional systems were inappropriate or inadequate for historical data analysis. Siloed application data didn’t play well with other siloed data, and creating BI-like analysis in a transactional application would grind operational performance to a crawl or create glacial query response.

This is why there is a BI market in the first place, and for years the two architectural camps have tried to convince each other that their approach was superior for the ultimate integration of real-time, historical, and analytical reporting. Transactional systems have added support for historical snapshots, versioning, and some form of time-series analysis, while warehouse approaches have added transactional “real-time” structures to traditional data warehouse designs.

For years BI companies have offered the ability to “drill into” operational systems, so there is nothing new here, but we are still looking for the ultimate build-it-once, serve-everybody solution.

“Real-time” applications consisting of monitors and dashboards that have a single or limited data integration model are being labeled “Operational BI.” Transactional systems and integrated ERP and ESB vendors have tried for years to sell the concept of “real-time BI,” but it has been hard to find a universal use case. For instance, take a multinational company that wants to know what its worldwide sales are right now. For this global company, supporting multiple financial systems around the globe, all in different states of closure and reconciliation, it just might prove to be an impractical request; and forget about getting hourly profitability, it’s not going to happen.

This is not just a latency issue; it is a process synchronization issue as well. When dealing with operational systems, it is not just about accessing the stored data; it’s about accessing the right information in the right state of process. Sometimes real time is the wrong time for both data quality and accuracy.
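The worldwide-sales example above can be sketched as a process-state check. This is my own illustration of the point, with made-up region names and a hypothetical `period_closed` flag: before aggregating, skip any regional system whose books are not yet closed, rather than mixing reconciled and unreconciled figures into one "real-time" number.

```python
# Hypothetical state of three regional financial systems, each in a
# different stage of closure and reconciliation.
regions = [
    {"name": "EMEA", "sales": 1200.0, "period_closed": True},
    {"name": "APAC", "sales": 900.0,  "period_closed": False},  # still reconciling
    {"name": "AMER", "sales": 1500.0, "period_closed": True},
]

# Aggregate only systems in the right state of process; report the rest
# as pending instead of silently folding unreconciled numbers in.
closed = [r for r in regions if r["period_closed"]]
total = sum(r["sales"] for r in closed)
pending = [r["name"] for r in regions if not r["period_closed"]]

print(total, pending)   # → 2700.0 ['APAC']
```

The freshest possible answer here would be the wrong answer; gating on process state trades a little latency for accuracy.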

A real-time metadata management system, which would be at the heart of any fully abstracted or federated model, is an enormous undertaking. But if we try to solve individual application issues with a federated approach in cloud computing’s Web 2.0 technology mashup scenario, things do get interesting.

OrderMotion, a PivotLink OEM partner, is a real-time application that optimizes marketing campaigns, which are updated with inventory and shipment information from multiple suppliers and manufacturers in “real time.” PivotLink provides the historical data series that feed OrderMotion’s real-time predictive models and also allows historical and real-time views to be “mashed on the glass” in an integrated dashboard.

Unfortunately, most traditional data from disparate systems will not “mash up” as tidily as in this example, or in the enterprise service bus product demonstrations I have seen over the years.

BI professionals are painfully aware of data integration challenges. Database administrators lose sleep over “data quality.” The entire “data stewardship and data governance” movement exists to assure that the single version of the truth is, in fact, close to the truth. These checks and balances are needed because building a data warehouse is, by nature, about combining data that was not designed to be combined. The modern data warehouse is still the antithesis of the traditional single-source, real-time transactional processing system.

But outside or alongside the traditional BI “validated and blessed” data warehouse domain, the world is changing and moving to a more federated data model. The promise of low/no-latency, metadata-driven “process to process” integration, where all data is accessed, aggregated, and reported (or visualized) in real time, is stunning.

For me, the best demo of the integration of historical information, predictive analytics, and real-time data analysis is still the opening scene of Minority Report, but that application remains “Mission: Impossible” for most for-profit companies outside of Hollywood.

Application-specific data integration with an MDM-aware framework is where I think we are heading, and that is a good thing. Will it replace all traditional DW practices? No. Is it worth considering how it might be added to the BI process? Absolutely.

Here are some areas that might prove interesting to investigate when considering abstract data:
• Integration with external data sources for mashups by means of web services (Web 2.0), such as geospatial integration (think Google Maps)
• Encapsulated query access to transactional apps/tables, as long as the queries are isolated to the point where they will not negatively impact transactional performance
• Streaming data analysis, exception monitors on dashboards, and email alerts

Industry expert Dave Linthicum tells you what you need to know about building efficiency into the information management infrastructure

David Linthicum

David Linthicum is the CTO of Blue Mountain Labs and an internationally known distributed computing and application integration expert.

