When it comes to data, the days of equal rights are over.
The traditional thinking around business intelligence and analytics is that every piece of data must undergo thorough transformation and cleansing before entering the data warehouse. Today, this line of thinking has changed.
As BI tools have become increasingly available to all employees for many purposes, the rigid data transformation processes of the past have become overkill for some analyses. The concept of "just-in-time" data allows for the use of native data in a growing number of circumstances. Adopting this more flexible approach to BI can help organizations achieve broader adoption and faster results from their systems.
Let's start by taking a closer look at how companies have approached BI in the past.
The notion of ETL
Extract, transform and load (ETL) and data-quality processes have been used for years to bring disparate data together into a consistent format that supports end-user dashboards and analysis. The transformation aspect of ETL involves reformatting the data, cleansing it to remove duplicate copies and inconsistencies, and integrating it on a common platform. This requires a significant amount of time from skilled developers, plus daily processing schedules so that reporting systems are up to date the next day. ETL is required for many types of analyses, but it's expensive and creates delays for all users, perhaps unnecessarily.
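To make the transform step concrete, here is a minimal sketch of what it typically does. The source rows, field names and formats are hypothetical, and an in-memory SQLite database stands in for a real warehouse; a production ETL tool would do far more, but the reformat-cleanse-integrate pattern is the same.

```python
import sqlite3

# Hypothetical rows as they might arrive from two operational systems.
# Formats disagree: ID casing and date formats differ, and one record
# is a duplicate of another.
source_rows = [
    ("C001", "ACME corp", "2010-03-01", 1250.00),
    ("c001", "Acme Corp", "01/03/2010", 1250.00),  # same sale, different format
    ("C002", "Globex", "2010-03-02", 980.50),
]

def transform(rows):
    """Reformat, cleanse and integrate rows into one consistent shape."""
    seen, cleaned = set(), []
    for cust_id, name, date, amount in rows:
        cust_id = cust_id.upper()      # reformat: consistent IDs
        name = name.title()            # reformat: consistent name casing
        if "/" in date:                # reformat: normalize DD/MM/YYYY dates
            d, m, y = date.split("/")
            date = f"{y}-{m}-{d}"
        key = (cust_id, date, amount)
        if key in seen:                # cleanse: drop duplicate records
            continue
        seen.add(key)
        cleaned.append((cust_id, name, date, amount))
    return cleaned

# Load the cleansed rows into a warehouse table (in-memory stand-in here).
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sales (cust_id TEXT, name TEXT, sale_date TEXT, amount REAL)")
warehouse.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)",
                      transform(source_rows))

print(warehouse.execute("SELECT COUNT(*) FROM sales").fetchone()[0])  # 2
```

Even in this toy form, the cost driver is visible: every new source format means more developer-written transformation rules before a single report can run.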
Some limitations of ETL include:
BI for the few: To support these additional processes, BI has traditionally been owned and controlled by IT workers who oversaw the sourcing of data and creation of reports, and/or by a handful of analysts whose full-time jobs were to prepare reports for management. This means that most employees don't benefit from better decision-making because they can't easily access data or create their own reports. Companies then wonder why they don't have a stronger ROI from their BI systems.
Volumes of data, slow analysis: Many organizations possess total data volumes in the hundreds of terabytes, some approaching the petabyte mark. Yet it can take analysts hours, or even days depending on the backlog of requests, to deliver useful reports. These days, that's too slow for many competitive industries.
As Chuck Schweiger, business analyst with retailer Timbuk2, recently remarked in a case study: "Our [enterprise resource planning] system has a wealth of data, but there really was no easy way to get that data out and interact with it. Moreover, our prior reporting and analytics process was so full of handoffs that it was not a workable solution for today's on-demand, flexible, global business environment." Timbuk2 took a Software-as-a-Service BI approach to overcome this barrier.
Lost data, lost opportunities: Time and budget may not be available to transform all data, and ETL tools have a tendency to transform data beyond analytic needs. In other words, the actionable qualities observed in the data's original state may be lost as it becomes sanitized and rationalized for a higher-level database.
How just-in-time data comes into play
Data quality is always important. But for many daily decisions, such as helping an airline customer rebook a flight, rerouting a shipment or analyzing same-store sales, there's got to be a faster, less risky way to get the best answers. The need to clear all data through an ETL process often creates needless overhead for companies and time lags for the people who need information quickly.
A 2010 survey of 212 BI practitioners and solution providers found that tight budgets have forced BI teams to devise faster, cheaper and more innovative ways to work. According to the survey, conducted by The Data Warehousing Institute (TDWI), this new approach to BI includes a focus on "self-service BI" (66% of respondents) and "a network of super-users in each department" (65%), rather than a traditional top-down BI infrastructure.
Data that doesn't require extensive cleansing (as data used for auditing and compliance purposes does, for example) may include last-minute situational updates, supply-chain metrics or on-the-fly analytics. Just-in-time data can merge into a reporting system directly from the source databases.
When should you rely upon just-in-time data rather than thoroughly cleansed data? Here are a few examples:
Company mergers and acquisitions: Data from different companies, with incompatible formats, must come together for immediate insight on the new business environment.
Rapid reporting from ERP systems: Decision-makers often require quick views of where the business stands at a particular moment in terms of product shipped, inventory turns and sales by region or store.
Combined data from different subdivisions or departments: Executives and managers need to see current changes in product quality levels, customer-retention levels and market share.
Cloud interactions: The rise of external and internal cloud computing requires that data be easily accessible to people throughout the enterprise without first being extracted and then integrated.
Combined external and internal data: Critical data sources for decision-making may come from outside the firewall. It's optimal to keep this data out of the corporate data warehouse but still have it available to combine with warehouse data.
Combined data from multiple operational sources: Many data sources are not within data warehouses, but instead come from disparate platforms and different formats. A BI system should be able to incorporate that data quickly, without always forcing ETL processes.
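The last scenario above can be sketched briefly. In this illustrative example, two in-memory SQLite databases stand in for an ERP system and a store-sales system; the region names and figures are invented. The point is that the join happens in the reporting layer at query time, against data in its native shape, with no staging or transformation step in between.

```python
import sqlite3

# Two hypothetical operational sources, neither of which has been
# through ETL into a warehouse.
erp = sqlite3.connect(":memory:")
erp.execute("CREATE TABLE shipments (region TEXT, units INTEGER)")
erp.executemany("INSERT INTO shipments VALUES (?, ?)",
                [("West", 120), ("East", 95), ("West", 40)])

stores = sqlite3.connect(":memory:")
stores.execute("CREATE TABLE sales (region TEXT, revenue REAL)")
stores.executemany("INSERT INTO sales VALUES (?, ?)",
                   [("West", 18000.0), ("East", 14250.0)])

def just_in_time_report():
    """Combine both sources at query time, in their native formats."""
    shipped = dict(erp.execute(
        "SELECT region, SUM(units) FROM shipments GROUP BY region"))
    revenue = dict(stores.execute("SELECT region, revenue FROM sales"))
    # Join in the reporting layer rather than in a transformed warehouse.
    return {r: {"units": shipped.get(r, 0), "revenue": revenue.get(r, 0.0)}
            for r in sorted(set(shipped) | set(revenue))}

report = just_in_time_report()
print(report["West"])  # {'units': 160, 'revenue': 18000.0}
```

The trade-off is exactly the one this article describes: the report is available immediately, but nothing here has deduplicated or audited the source data, which is why compliance-grade reporting still belongs in the ETL tier.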
Global competition and the exponential growth in data volumes within companies mean that it's no longer practical or affordable to surface every piece of data through ETL. When it comes to compliance, legal, employee, financial and some types of government reporting and analytics, ETL is still necessary because these analytics have more stringent reporting and auditing requirements.
Yet organizations shouldn't be hampered by a one-size-fits-all data strategy. Just-in-time data can rapidly satisfy decision makers' demands for relevant business information without risking the quality of the decision. Companies that can adopt a two-tiered data quality approach will gain a far higher return from their BI efforts and open the doors to true self-service reporting.
About the Author
Claudia Imhoff is a popular speaker and internationally recognized expert on data warehousing, BI, analytics and the architectures supporting these initiatives. She co-authored five books on these subjects and writes articles for technical and business magazines. She is a faculty member for The Data Warehousing Institute (TDWI) and received the title of TDWI Fellow in 2006. She is the President of Intelligent Solutions Inc. and the founder of the Boulder BI Brain Trust (BBBT).