Comparing ETL and EAI

INTRODUCTION



This document compares two widely used integration technologies: EAI and ETL.

Although ETL and EAI look surprisingly similar from an architectural view (so-called adapters, or connectors, provide access to systems and data sources; transformations standardize proprietary formats; routing capabilities move packets of data), they serve fundamentally different purposes from an information management perspective.

ETL is typically used to move bulk data from application systems to data warehouses or data marts, usually on a strict schedule so that high-volume transfers do not create bottlenecks. EAI is the technology of choice for connecting systems for business process management and workflow.

EAI AND ETL DEFINITIONS

EAI: An acronym for Enterprise Application Integration. EAI integrates incompatible business applications within and beyond the enterprise so that they can talk to each other seamlessly and share data in real time.

EAI is usually deployed to allow real-time system users to access the data and functionality in legacy systems in a way that ensures consistency of access and update regardless of the data source. For example, an EAI product (such as Vitria or SAP XI) would be used to propagate a single logical operation, such as a preferred address change, to the many systems that hold preferred addresses for the customer in question.
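
The address-change example can be sketched in a few lines of Python. This is only an illustration of the idea, not how Vitria or SAP XI work internally; the class names, system names and customer ID below are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CustomerSystem:
    """Hypothetical stand-in for one application that stores preferred addresses."""
    name: str
    addresses: dict = field(default_factory=dict)

    def update_address(self, customer_id: str, address: str) -> None:
        self.addresses[customer_id] = address


class IntegrationBroker:
    """Toy broker: forwards one logical operation to every subscribed system."""

    def __init__(self) -> None:
        self.subscribers = []

    def subscribe(self, system: CustomerSystem) -> None:
        self.subscribers.append(system)

    def publish_address_change(self, customer_id: str, address: str) -> int:
        # Every subscriber applies the same change, so all copies stay consistent.
        for system in self.subscribers:
            system.update_address(customer_id, address)
        return len(self.subscribers)


broker = IntegrationBroker()
systems = [CustomerSystem("billing"), CustomerSystem("crm"), CustomerSystem("shipping")]
for system in systems:
    broker.subscribe(system)

# One logical operation reaches all three systems.
updated = broker.publish_address_change("C-1001", "12 Harbour Road")
```

The caller issues a single operation and never needs to know how many systems hold a copy of the address.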

Put simply, EAI exists to allow client applications to operate on data from across the business without regard to its location, encapsulating technology or format.

ETL: Extraction, Transformation and Loading (ETL) are three database functions combined into one tool (such as Informatica) to pull data out of source databases and place it into target databases. ETL is used to migrate data from one database to another, to build data marts and data warehouses, and to convert databases from one format or type to another.
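
As a rough illustration, the three functions can be shown as three small Python functions using the standard sqlite3 module. The table names, columns and sample rows are assumptions made up for this sketch; a real ETL tool adds scheduling, metadata management and error handling on top.

```python
import sqlite3

# Hypothetical source (operational) and target (warehouse) databases, both in memory.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")

source.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount_cents INTEGER)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, " alice ", 1250), (2, "BOB", 900), (3, "Alice", 4750)],
)
target.execute("CREATE TABLE fact_orders (order_id INTEGER, customer TEXT, amount REAL)")


def extract(conn):
    """Extract: pull the raw rows out of the source database."""
    return conn.execute("SELECT id, customer, amount_cents FROM orders").fetchall()


def transform(rows):
    """Transform: normalize customer names and convert cents to currency units."""
    return [(oid, name.strip().title(), cents / 100) for oid, name, cents in rows]


def load(conn, rows):
    """Load: write the cleaned rows into the target table."""
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


loaded = load(target, transform(extract(source)))

# The warehouse can now answer a reporting query the source rows could not.
totals = dict(
    target.execute(
        "SELECT customer, SUM(amount) FROM fact_orders GROUP BY customer"
    ).fetchall()
)
```

After the run, the inconsistent spellings of the same customer in the source are collapsed into one name in the target.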

ETL is used within the business domain of management reporting for the process of collating business data from many sources and loading it into a form from which ad hoc queries and reports can be generated.

Put simply, ETL exists to allow client applications to query data from across the business without regard to its location, encapsulating technology or format.

COMPARING ETL AND EAI

While the common thread between ETL and EAI may be data integration from disparate systems, outside of this commonality they are fundamentally different approaches. Both technologies rely on the concept of a unified view and the definition of a mapping that allows data from many disparate sources to be "projected" onto that view. What differs is the purpose, speed, direction and amount of data that is transformed and placed within the unified view from the external sources.
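
The idea of a mapping that projects source data onto a unified view can be sketched as follows. The source names and field names are hypothetical and chosen only for illustration.

```python
# Hypothetical mappings: unified field name -> field name in each source system.
MAPPINGS = {
    "crm": {"customer_id": "cust_no", "name": "full_name", "city": "town"},
    "billing": {"customer_id": "acct_holder", "name": "holder_name", "city": "bill_city"},
}


def project(source_name: str, record: dict) -> dict:
    """Project one source record onto the unified view using that source's mapping."""
    mapping = MAPPINGS[source_name]
    return {unified: record[src] for unified, src in mapping.items()}


crm_row = {"cust_no": "C-1001", "full_name": "A. Rao", "town": "Pune"}
billing_row = {"acct_holder": "C-1001", "holder_name": "A. Rao", "bill_city": "Pune"}

# Two differently shaped records end up in the same unified shape.
unified_from_crm = project("crm", crm_row)
unified_from_billing = project("billing", billing_row)
```

Both ETL and EAI need a mapping of this kind; they differ in when it is applied and to how much data.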

EAI is the technology solution that gives disparate applications in the enterprise access to the many other applications and databases that exist there. The need for EAI arises from the growing business need to connect multiple systems in real time.

The need for ETL arises from a separate business challenge: the need to offload data from the many OLTP systems into a common data warehousing environment for analysis and reporting. This separate data warehousing environment must be able to extract data from all of the disparate source systems and run large volumes of this data through the mappings that have been developed, with minimal impact on the operational systems.

EAI is the connector between multiple systems, whereas ETL is a data warehousing process performed in an environment that is separate from the systems themselves.

EAI systems retrieve small amounts of data from many source systems in one operation and produce an up-to-date snapshot of a small portion of the business data (say a single customer, his accounts, payments and so on). The speed with which this can be achieved is critical, as it is usually performed in response to a live, online request - say from a Web browser. Thus performance optimization within an EAI system is aimed at reducing the response time for a single user request or update.
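
A minimal sketch of such a single-customer snapshot is shown below. It assumes three hypothetical source systems, represented here by in-memory dictionaries, and queries them in parallel so that response time is governed by the slowest call rather than the sum of all calls.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for three operational systems.
PROFILES = {"C-1001": {"name": "A. Rao"}}
ACCOUNTS = {"C-1001": ["SAV-1", "CUR-7"]}
PAYMENTS = {"C-1001": [120.0, 35.5]}


def fetch_profile(customer_id):
    return PROFILES.get(customer_id, {})


def fetch_accounts(customer_id):
    return ACCOUNTS.get(customer_id, [])


def fetch_payments(customer_id):
    return PAYMENTS.get(customer_id, [])


def customer_snapshot(customer_id: str) -> dict:
    """Gather a small, current slice of data for one customer from every source."""
    fetchers = {
        "profile": fetch_profile,
        "accounts": fetch_accounts,
        "payments": fetch_payments,
    }
    with ThreadPoolExecutor(max_workers=len(fetchers)) as pool:
        # Submit all lookups first, then collect the results.
        futures = {key: pool.submit(fn, customer_id) for key, fn in fetchers.items()}
        return {key: future.result() for key, future in futures.items()}


snapshot = customer_snapshot("C-1001")
```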

In contrast, in ETL systems, large batches of data from the source systems are transformed individually at relatively infrequent timed intervals (batch), to produce a consolidated historical record of the entire business. The time it takes to produce the record and the time taken to query it are generally not an issue, and neither is expected to be real time.

Another way of understanding the differences between ETL and EAI is from the perspective of user involvement. This is useful in showing that ETL is a process that is designed by users, whereas EAI is simply a technology solution that enables systems to communicate.

ETL requires extensive user involvement. User teams must:

  • define the decision support requirements
  • perform the data modeling to emulate these requirements
  • identify the sources of the data that will need to be extracted into a data warehouse environment
  • develop mappings that transform this data into a form suitable for analytics and reporting

In contrast, EAI:

  • requires no user involvement
  • is, once implemented, a technology solution that is transparent to the end users

Finally, the major difference between EAI and ETL is that ETL is a one-way process, creating a historical record from the source data, whereas the main purpose of EAI is to ensure a bi-directional flow of data between the source and target systems.

The table below highlights the EAI and ETL characteristics:

ETL and EAI Characteristics

Characteristic | ETL | EAI
Focus | Data Integration (Data Warehousing) | Application Integration (Operational Apps)
Primary Technology | Database | Application
Timing | Batch | Real-time
Data | Historical | Transactional
Volume | Size: days or weeks of data; records per min (GB) | Throughput: single transactions; messages/second (KB)
Integration Initiation | Pull, query-driven | Push, pull, event-driven
Flow Control | Meta-data driven, complex data flow | Business-rule driven, workflow oriented
Validation | Strong data profiling and cleansing capabilities | Limited data validations
Transactional | Limited transaction and messaging capabilities | Strong transaction control and recovery. Guaranteed message delivery with two-phase commit
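
The two-phase commit mentioned in the Transactional row can be sketched as follows. This is a simplified, in-process illustration under stated assumptions: the Participant class and its names are hypothetical, and a real EAI product also handles timeouts, logging and crash recovery.

```python
class Participant:
    """Hypothetical resource (for example, one target system) in a two-phase commit."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # Phase 1: the participant votes on whether it is able to commit.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"


def two_phase_commit(participants):
    """Commit on all participants only if every one of them votes yes."""
    # Phase 1: collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit everywhere, or roll back everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False


ok_run = [Participant("orders"), Participant("inventory")]
committed = two_phase_commit(ok_run)

bad_run = [Participant("orders"), Participant("inventory", can_commit=False)]
aborted = two_phase_commit(bad_run)
```

In the second run a single "no" vote is enough to roll back every participant, which is what keeps the connected systems consistent.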

CONCLUSION

EAI and ETL are fundamentally different technologies, even though the two are converging. As vendors acquire one another, combined platform offerings with integrated EAI/ETL capabilities are expected to enter the market gradually.

According to META Group, “Distinct data integration technologies (e.g., ETL, EAI) will converge (2007), ultimately surviving only as various subsets of intermediary capabilities in the service-oriented architecture (2009).”

About the Author

Somnath Basu is an EAI Project Manager who has worked with Infosys Technologies Limited (http://www.infosys.com) for the last 8 years. Comments are welcome at somnath_b@infosys.com.

About Infosys Technologies

Infosys Technologies Ltd. (NASDAQ: INFY) provides consulting and IT services to clients globally — as partners to conceptualize and realize technology driven business transformation initiatives. With over 25,000 employees worldwide, we use a low-risk Global Delivery Model (GDM) to accelerate schedules with a high degree of time and cost predictability.

We provide solutions for a dynamic environment where business and technology strategies converge. Our approach focuses on new ways of business combining IT innovation and adoption while also leveraging an organization's current IT assets. We work with large global corporations and new generation technology companies - to build new products or services and to implement prudent business and technology strategies in today's dynamic digital environment.