Making Sense of Business Information

Scott Cleveland

What is Data Virtualization?


From SearchOracle.com...

The typical enterprise today runs multiple types of database management systems (DBMSs), such as Oracle and SQL Server, which weren't designed to play well together.
Combine that with the unprecedented rise in the amount of data enterprises store, driven in part by the proliferation of data-retention regulations like the Sarbanes-Oxley Act, and you've got one heck of a data integration challenge on your hands.

"Data integration is getting harder all the time, and we believe [one of the causes] of that is that data volumes are continuing to grow," Yuhanna said. "[But] you really need data integration because it represents value to the business, to the consumers and to the partners. They want quality data to be able to make better business decisions."

My Thoughts...

I like this definition of data virtualization: 'Aggregate data from disparate sources to create a logical, single virtual view of information for use by front-end solutions such as applications, dashboards, portals, etc.'
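To make that definition concrete, here is a minimal sketch in Python (standard library only; the sources, tables and field names are invented for illustration) of presenting one virtual view over two separate systems without copying their data into a new repository:

    import sqlite3

    # Two independent systems of record, stand-ins for, say, Oracle and SQL Server.
    crm = sqlite3.connect(":memory:")
    crm.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    crm.execute("INSERT INTO customers VALUES (1, 'Acme Corp')")

    billing = sqlite3.connect(":memory:")
    billing.execute("CREATE TABLE invoices (customer_id INTEGER, amount REAL)")
    billing.execute("INSERT INTO invoices VALUES (1, 2500.0)")

    def customer_view(customer_id):
        """A 'virtual view': aggregates both sources at query time, stores nothing."""
        name = crm.execute(
            "SELECT name FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()[0]
        total = billing.execute(
            "SELECT SUM(amount) FROM invoices WHERE customer_id = ?", (customer_id,)
        ).fetchone()[0]
        return {"id": customer_id, "name": name, "total_billed": total}

    print(customer_view(1))  # {'id': 1, 'name': 'Acme Corp', 'total_billed': 2500.0}

The key property is that the aggregation happens at query time; the "view" owns no storage of its own.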

Why the push for data virtualization? Forrester says that business users want fast, real-time and reliable information to make business decisions, while IT wants to lower costs, minimize complexity and improve operational efficiency.

Potential Benefits include:


  • Increase revenues through improved product/service offerings, improved customer support, faster market response, etc.

  • Decrease costs through fewer physical repositories; lower hardware, software and facilities costs; reliable operation; and greater collaboration.

  • Reduce risk through access to more complete, fresher data delivered through a web user interface in real time.

Your Thoughts...

Has your company addressed these issues?

4 Comments


Once an enterprise implements virtual infrastructure and gets over the hump of trusting cloud security, the next frontier in cloud adoption is simple: what can I actually deploy into the cloud? Tremendous capacity was created by the elasticity of the cloud and by dynamically allocating server CPU, memory, network and bandwidth resources to apps. Once the entire enterprise stack has been virtualized, data virtualization comes naturally. That said, data virtualization is a bit different in nature from the rest of the infrastructure stack; the main idea behind it is object globalization across disparate data silos, with access control as already defined in each silo. Some hot new start-ups have emerged accomplishing data virtualization for apps in the enterprise (queplix.com), completing the virtualization infrastructure and actually delivering its benefits. DV is the real enabler of the cloud adoption cycle.

In addition to the industries Scott mentioned in his article, BI and enterprise search are the two main areas starting to use DV and seeing direct results: not only do they now use a single source of virtual data, but, for example, the Queplix product also merges similar data structures and presents single virtual entities to the BI and search tools. Google GSA already works with Queplix DV, and most of the BI tools are now compatible with it. I believe, and adoption numbers confirm, that DV is a required stage for any enterprise cloud strategy.

Scott -

Thank you for raising the question, "what is data virtualization?"

Data virtualization is 100% of what my company, Composite Software, provides.

And our website, www.compositesw.com, includes over 50 customer use case descriptions that provide rich insight into how these enterprises answer the "what is data virtualization" question for themselves.

Enjoy, Bob Eve, Composite Software

I am currently working with one of the data integration companies, and I can attest that it's been a great year for data virtualization, while it was somewhat tough for data integration companies generally. I saw a number of pilots fail with ETL and MDM tools, while at the same time the new kid on the block, data virtualization, got a lot of cred here. The appeal of "advanced data virtualization vendors" is in their architecture (it fits our environment), but also in the fact that our business owners and LOB folks can now use these products, as opposed to SQL jockeys (we don't have a large IT staff or consultants). We have been using the Queplix VDM product for everything from simple data integration to decommissioning our legacy Vantive CRM in favor of Salesforce, and we even implemented our first MDM rollout based on the Queplix virtual metadata catalog. We could not afford Informatica, so we utilized the VMDM capabilities of advanced data virtualization; our MDM strategy was done in (hold your breath...) 2 weeks!

Here is what Wikipedia has to say about data virtualization. It is important to note that basic data virtualization has been around for many years; you can even get an open source implementation (Teiid): http://en.wikipedia.org/wiki/TEIID. Why buy a basic data virtualization product when you can get it for free?

Advanced data virtualization has substantially more capability, doesn't require the use of SQL and can scale to integrate multiple applications. Software blades substantially reduce the effort required to implement connectivity to major enterprise applications. Virtual master data management is also supported by Advanced Data Virtualization.

Check it all out here:

From Wikipedia, the free encyclopedia

Data virtualization is the process of abstraction of data contained within a variety of information sources such as relational databases, data sources exposed through web services, XML repositories and others so that they may be accessed without regard to their physical storage or heterogeneous structure. This concept and software is commonly used within data integration, master data management, cloud computing, business intelligence systems and enterprise search.
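(A rough illustration of the phrase "accessed without regard to their physical storage or heterogeneous structure": in the hedged Python sketch below, with invented source formats and record shapes, each source type is wrapped behind one common interface, so a consumer never sees whether a record came from XML or from relational rows.)

    import xml.etree.ElementTree as ET

    class XmlSource:
        """Adapter for an XML repository (hypothetical schema)."""
        def __init__(self, xml_text):
            self.root = ET.fromstring(xml_text)
        def records(self):
            for p in self.root.findall("product"):
                yield {"sku": p.get("sku"), "price": float(p.get("price"))}

    class RowSource:
        """Adapter for rows as they might come back from a SQL cursor."""
        def __init__(self, rows):
            self.rows = rows
        def records(self):
            for sku, price in self.rows:
                yield {"sku": sku, "price": price}

    sources = [
        XmlSource('<catalog><product sku="A1" price="9.99"/></catalog>'),
        RowSource([("B2", 19.99)]),
    ]

    # Consumers iterate one logical record stream; storage details stay hidden.
    for source in sources:
        for record in source.records():
            print(record)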

The History of Data Virtualization

Entry-level data virtualization capability mostly came from vendors in a space called Enterprise Information Integration (EII). EII is a basic integration architecture that allows taking a small collection of data types from a variety of sources and presenting it to a consuming application, report or dashboard. EII was also referred to as data federation, since it basically provided the means to create relational joins and views, sometimes referred to in this context as "data mashups". Denise Draper of Microsoft Corporation wrote: "The value of EII will be realized as one technology in the portfolio of a general data platform, rather than as an independent product. The ETL approach to the problem cannot be the final answer because it simply is not possible to copy all potentially relevant data to some single repository."[1] Advanced data virtualization builds on knowledge and concepts developed in the EAI and EII industries and presents itself as the next-generation consolidated technology platform designed to replace EII, EAI and ETL point solutions.

Consistent with the relational nature of the data federation concept, the entire structure of EII was completely relational, and it worked best with tables and rows. EII was also designed to create point-to-point integrations, or "pipes", with an opening at each end. Once a set of two applications is integrated and you want to extend the integration to another application, you start again: adding a third application to an existing integration brings no benefit, only liability, so you lay another "set of pipes" and do it all over. A good analogy is to think of 1,000 homes that need phone lines to each other. The EII or basic data virtualization approach would be for each home to run a line to every other home.
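The scaling problem behind that analogy is easy to quantify: full point-to-point wiring needs one pipe per pair of endpoints, n(n-1)/2 in total, while a shared layer needs only one connection per endpoint. A quick back-of-the-envelope check in Python:

    def point_to_point(n):
        # one dedicated "pipe" for every pair of applications
        return n * (n - 1) // 2

    def shared_layer(n):
        # one connection from each application to a common layer
        return n

    for n in (3, 10, 1000):
        print(n, point_to_point(n), shared_layer(n))
    # 1,000 homes would need 499,500 direct lines, but only 1,000 via an exchange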

EII was never focused on the bulk movement of heavy data volumes or on cloud computing.[2] The focus of EII was on the viewing of data, which brings with it the movement of only small amounts of data. For this reason, architecturally, EII was not designed to scale. Only recently, using web services and other technologies, did the notion of two-way information flow become established for these basic "data virtualization" technologies.
Basic Data Virtualization

Basic data virtualization includes, at minimum, the movement of small amounts of data and the federation of data. There is no metadata abstraction available to the user for entire data sources. Point A and point B are abstracted, by a view, directly to point C. It's a "virtual view", and hence data virtualization. These federated views are as close as you get, and they are tailored very specifically to the point use case. Most of these products have basic automation and a very basic set of application program interfaces.
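In concrete terms, that A-plus-B-to-C pattern is a join bound to two specific schemas. A minimal sketch (Python, with SQLite's ATTACH standing in for a federation engine; all table and column names are hypothetical) shows why such a view is tailored to a single point use case:

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("ATTACH ':memory:' AS src_a")   # point A
    con.execute("ATTACH ':memory:' AS src_b")   # point B
    con.execute("CREATE TABLE src_a.orders (id INTEGER, sku TEXT)")
    con.execute("CREATE TABLE src_b.stock (sku TEXT, on_hand INTEGER)")
    con.execute("INSERT INTO src_a.orders VALUES (1, 'A1')")
    con.execute("INSERT INTO src_b.stock VALUES ('A1', 42)")

    # The "virtual view": a join that hard-codes both source schemas (A + B -> C).
    # Pointing it at a third source means writing a new query from scratch.
    rows = con.execute("""
        SELECT o.id, o.sku, s.on_hand
        FROM src_a.orders AS o JOIN src_b.stock AS s ON o.sku = s.sku
    """).fetchall()
    print(rows)  # [(1, 'A1', 42)]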

All basic data virtualization tools are designed for technical users. The predominant UI model is the wire diagram: a screen where you draw or project "lines" and links between small relational tables and/or taxonomies. These wire diagrams can be very complex, and they require an SQL specialist with detailed knowledge of the internal schema of every application being considered for integration. You need to draw lines between a multitude of trees of database tables and web services components, which requires knowing the intimate details of the relational database schemas you are connecting. These are not solutions designed for use by business-level users. Template-based designs can decrease the time and complexity of the wiring diagrams, but when you need to connect the same application to a new one, you must "lay a new pipe" and use the tool again; all the links and administration need to be re-established.

At a fundamental level, first-generation basic data virtualization products are held back by their architecture, which was designed for point-to-point solutions. These architectures are wrapped around SQL modeling: the level of abstraction is logically a view, which is a fundamentally more cumbersome approach than having a true layer of abstraction. In some cases, basic data virtualization solutions can even be detrimental, federating data in a way that lets the data flow bypass the data management policies that monitor data quality and cleansing.[3]

Advanced Data Virtualization

The next generation of data virtualization, advanced data virtualization, is based on the premise of abstracting the data contained within a variety of data sources (databases, applications, file repositories, etc.) for the purpose of providing single-point access to that data. Its architecture is based on object-oriented modeling combined with a semantic abstraction layer, as opposed to relational data modeling with limited-visibility semantic metadata confined to a single data source. Advanced data virtualization has emerged as the new technology to complete the virtualization stack in the enterprise.

The challenges this new data virtualization architecture addresses have been present in the enterprise for a long time: uniform holistic access, data access security, performance, and the political and cultural barriers that keep data owners from sharing the data they own and are responsible for. Several other technologies were designed to solve them in the past: master data management (MDM), data warehouse solutions, extract-transform-load (ETL) technologies and data aggregation. With the advent of cloud computing, advanced data virtualization technology was designed to exploit the advantages of the cloud platform and resolve the above problems in a more effective and efficient way than legacy technologies. Advanced data virtualization also led to the creation of a new category, virtual master data management. Advanced data virtualization technology combines elements of data integration for enterprise applications running in the cloud and on premises, master data management, data governance and data quality, built on the same virtual catalog platform.[4]
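One rough way to picture the object-plus-semantic-layer idea described above (an invented sketch, not any particular vendor's design) is a catalog that maps a single logical entity onto several physical sources, so consumers address the entity rather than the silos:

    # Hypothetical semantic catalog: one logical entity, many physical mappings.
    CATALOG = {
        "Customer": [
            # (source name, fetch function, physical-to-logical field mapping)
            ("crm",     lambda: [{"cust_id": 1, "cust_name": "Acme"}],
                        {"cust_id": "id", "cust_name": "name"}),
            ("billing", lambda: [{"acct": 1, "owed": 120.0}],
                        {"acct": "id", "owed": "balance"}),
        ],
    }

    def fetch_entity(entity):
        """Single-point access: merge every source mapped to one logical entity."""
        merged = {}
        for _, fetch, mapping in CATALOG[entity]:
            for row in fetch():
                logical = {mapping[field]: value for field, value in row.items()}
                merged.setdefault(logical["id"], {}).update(logical)
        return list(merged.values())

    # One merged record combining name (from crm) and balance (from billing):
    print(fetch_entity("Customer"))

Because the mapping lives in the catalog rather than in per-pair views, adding another source extends the entity instead of starting a new integration.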


Scott Cleveland's thoughts on Data Virtualization.

Scott Cleveland

Scott Cleveland is a technical, innovative and creative marketing manager with more than 25 years of experience in marketing, marketing management, sales, sales management and business process consulting aimed at high-tech companies. His areas of expertise include: product marketing, solutions marketing, solution selling, sales management, business process management, business process improvement and process optimization. Reach him at RScottCleveland[at]gmail.com.
