Bill Inmon made an interesting observation in this post entitled "The Elusive Virtual Data Warehouse." In essence, he pushed back on the idea that data warehouses can reside in virtual data layers. While I typically agree with Bill, in this case he may be missing some key issues.
"Why then is the virtual data warehouse such a supremely bad idea? There are actually lots of reasons for the vacuity of virtue manifested by the virtual data warehouse. Some of those reasons are:
- A query that has to access a lot of databases simultaneously uses a lot of system resources. In the best of circumstances, query performance is a real problem.
- A query that has to access a lot of databases simultaneously requires resources every time it is executed. If the query is run many times at all, the system overhead is very steep.
- A query that has to access a lot of databases simultaneously is stopped dead in its tracks when it runs across a database that is down or otherwise unavailable.
- A query that has to access a lot of databases simultaneously shuffles a lot of data around the system that otherwise would not need to be moved. The impact on the network can become very burdensome.
- A query that has to access a lot of databases simultaneously is limited to the data found in the databases. If there is only a limited amount of historical data in the databases, the query is limited to whatever historical data is found there. For a variety of reasons, many application databases do not have much historical data to begin with."
Core to this assertion is the claim that a virtual data warehouse consumes too many resources. Such a warehouse typically uses data abstraction software to query the back-end databases and present a single abstracted view. I've found that good abstraction layers built on sound federated query techniques typically don't have the resource impact that Bill describes above. In fact, the virtual approach tends to be operationally simpler, since you're not managing another physical database to create your data warehouse.
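To make the idea of a single abstracted view over several back-end databases concrete, here is a minimal sketch in Python. It is not how any particular data abstraction product works: it assumes SQLite's ATTACH DATABASE as a stand-in for a federation layer, and the database names (crm, billing), the tables, and the view name customer_revenue are all hypothetical.

```python
import sqlite3


def build_virtual_view() -> sqlite3.Connection:
    """Attach two separate (hypothetical) source databases and expose one view."""
    conn = sqlite3.connect(":memory:")

    # Assumption: two independent in-memory databases stand in for the
    # operational systems a real abstraction layer would query remotely.
    conn.execute("ATTACH DATABASE ':memory:' AS crm")
    conn.execute("ATTACH DATABASE ':memory:' AS billing")

    conn.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE billing.invoices (customer_id INTEGER, amount REAL)")

    conn.executemany(
        "INSERT INTO crm.customers VALUES (?, ?)",
        [(1, "Acme"), (2, "Globex")],
    )
    conn.executemany(
        "INSERT INTO billing.invoices VALUES (?, ?)",
        [(1, 100.0), (1, 50.0), (2, 75.0)],
    )

    # The "virtual" layer: a TEMP view joins across both sources.
    # No data is copied; the join runs each time the view is queried.
    conn.execute(
        """
        CREATE TEMP VIEW customer_revenue AS
        SELECT c.name AS name, SUM(i.amount) AS revenue
        FROM crm.customers AS c
        JOIN billing.invoices AS i ON i.customer_id = c.id
        GROUP BY c.id, c.name
        """
    )
    return conn


def revenue_by_customer(conn: sqlite3.Connection) -> list:
    """Query the abstracted view as if it were a single warehouse table."""
    return conn.execute(
        "SELECT name, revenue FROM customer_revenue ORDER BY name"
    ).fetchall()
```

Because the view is evaluated at query time, a row added to a source table shows up in the next query with no load step in between. The same property is behind Bill's objection: the join cost is paid on every execution, and the query fails if a source is unavailable.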
Indeed, I recommend that my clients always consider the use of a virtual data warehouse, and the use of database abstraction software, in the context of their overall enterprise architecture strategy for the following reasons:
- The database abstraction software is well tested, and if it is tuned and implemented correctly, performance and resource consumption are not issues.
- The virtual approach is almost always more cost effective when modeled over time.
- ETL routines are problematic and difficult to manage.
- Those leveraging data warehouses are looking for up-to-the-minute information, and that's much more difficult with traditional data warehousing approaches.
I get Bill's point, but in practice I've not found the resource problems he describes.

David -
I don't understand why people are so polarized on this question of virtual data federation vs. physical data consolidation. Maybe it is the term "data warehouse" that throws people off.
Perhaps if data store, data mart or some other noun were used as the object of the virtualization, people would be more willing to apply common sense to this question, as you have in this blog.
Both approaches have their merits as well as limitations, and often they are used together to provide the best solution to an end-user's needs.
Dogma aside, the wise data architect or BI development team leader pragmatically uses a full palette of tools and techniques, as you advise.
-Bob