Enterprise data warehousing is supposed to reduce the incidents of stovepiped BI approaches seen in spreadsheets and spreadmarts. But many EDW sites may get so complex and far from its original design that end-users end up running back to spreadsheets, warns Dan Linstedt.
Dan recently pointed to the opportunities and challenges with the next generation of data warehousing, which he refers to as "Enterprise Data Warehousing 2.0." (Just as there's now a trend identified as "BI 2.0".)
The problem with enterprise data warehouses as they have existed is that they have been expensive to build, usually requiring expensive consultants. The issues aren't necessarily seen in the first iteration, but later on as more business units want to get on board.
As Dan explains it, the company selects "star-schema" modeling as the way to build its EDW. "Then, they select conformed dimensions, and shared fact tables. The first implementation costs the business 90 days and maybe 5 consultants, and maybe $250,000 USD. If your lucky, it might be $150,000."
So far, so good, Dan says. "The business unit that this is built for becomes very happy, with quick delivery, apparently low cost, and super fast access to dimensional information that meets their business needs... But then, reality sets in... Other business units see this success, and want 'one of their own' built."
No biggie. "Building a second or even a third star schema and then federating these together doesn't seem to be such a big deal," Dan says. "The cost may increase only slightly to maybe $180k or $275k, and the number of days to implement may increase only slightly to maybe 110-120 days."
However, by its fifth or sixth iteration, things are getting too complex, and the original designs of the EDW become lost or distorted, Dan explains. "IT (because of business needs) takes existing dimensions and begins to add different and loosely affiliated information to the same 'dimension,' thus, apparently attempting to 'conform' it." As this process continues, and IT gets in to the 5th or 6th project, "the conformity of the dimensions becomes lost in the fray," he says. "Too many different kinds of data are added to the dimension 'to conform it to the enterprise.' which distorts it's original purpose." In fact, if done improperly, "each time IT increases the size of this monster, it always creeps in to higher cost, and longer implementation timeframes."
Agility is lost, and "a simple 'change' that the business has to make (that used to cost $150k and take 90 days) now costs well in to the $350k range and takes six months or more. What was a conformed dimension now becomes a "deformed" dimension, and has trouble meeting the business needs." As a result, the business users do their own workarounds of this clumsy enterprise beast -- which means going back to relying on spreadsheets.
DW 2.0, Dan relates, "comes with the standard definitions that the industry has lacked over the years, finally and at last we have standards, definitions, and frameworks to follow." Plus, an essential piece of DW 2.0, Dan believes, is putting the right data model in place.
Dan is also the designer and advocate of the "Data Vault" modeling approach, also known as "Common Foundational Integrated Data Model Architecture." (Data Vault sounds better). The Data Vault modeling architecture is a "hybrid architecture consisting of the best of breed data modeling techniques used in both third normal form, and Star Schema - except it is a foundationally based architecture with standards, which if adhered to can steer your enterprise common data model in the right direction."
The Data Vault model has been under development since 1990, and has been available for free since 2000. Data Warehouse guru Bill Inmon said that “The Data Vault is the optimal choice for modeling the EDW in the DW 2.0 framework."
Dan provides a technical overview of Data Vault here.