ETL with WSO2 Enterprise Middleware Platform
Prabath Abeysekara - Associate Technical Lead
Outline
● A Classic Use Case
● What’s ETL and How Is It Interpreted In The Modern World?
● Why ETL?
● Challenges In Implementing ETL Solutions
● Why Are Traditional Standalone ETL Products Considered Dead In The Modern World?
● What Factors Should Be Considered When Implementing ETL While Re-Architecting A System?
Outline (contd.)
● Impact Of Tooling
● Reference Architecture
  ○ How to build an “efficient, robust, scalable, auditable, performant and maintainable” ETL solution with WSO2 EMP?
● Demo - Data Mapping With WSO2 Developer Studio
● Summary
● Q&A
A Classic Use Case - Financial Sector
[Diagram: Flat files, RDBMS and XML/Web Services sources feed the ETL Process, which loads the Enterprise Data Warehouse; Financial Reporting, Revenue Predictions and Other Analytics & BI fronts consume the warehouse data.]
What’s ETL? - Traditional Interpretation
● Extract
● Transform
● Load
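The three classic stages can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the CSV extract, its column names and the `fact_transactions` table are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source extract: a flat file of transactions (amounts in cents).
SOURCE_CSV = """txn_id,branch,amount_cents
1, colombo ,1500
2,Kandy,2750
3,COLOMBO,500
"""

def extract(raw: str) -> list[dict]:
    """Extract: read rows from the flat-file source."""
    return list(csv.DictReader(io.StringIO(raw)))

def transform(rows: list[dict]) -> list[tuple]:
    """Transform: normalize branch names and convert cents to a decimal amount."""
    return [
        (int(r["txn_id"]), r["branch"].strip().title(), int(r["amount_cents"]) / 100)
        for r in rows
    ]

def load(conn: sqlite3.Connection, records: list[tuple]) -> None:
    """Load: write the transformed records into the (hypothetical) target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_transactions "
        "(txn_id INTEGER PRIMARY KEY, branch TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO fact_transactions VALUES (?, ?, ?)", records)
    conn.commit()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the data warehouse
    load(conn, transform(extract(SOURCE_CSV)))
    for row in conn.execute(
        "SELECT branch, SUM(amount) FROM fact_transactions GROUP BY branch ORDER BY branch"
    ):
        print(row)
```

Because the transform step normalizes " colombo " and "COLOMBO" to the same value, the warehouse aggregates them as one branch, which is the kind of consistency the load target expects.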
What’s ETL? - Modern Interpretation
● Extract
● Monitor
● Profile/Audit
● Analyze
● Cleanse
● Transform
● Load
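The extra stages in the modern interpretation can be sketched as a small Python pipeline. This is a hypothetical illustration, not a WSO2 API: profiling counts missing values, "analyze" is assumed to be a simple threshold gate that rejects a batch when any field is missing too often, cleansing drops records without an id and de-duplicates them, and monitoring records how many records enter and leave each stage.

```python
from collections import Counter
from typing import Callable

def profile(batch: list[dict]) -> dict[str, int]:
    """Profile/Audit: count missing (None or empty) values per field."""
    missing: Counter = Counter()
    for rec in batch:
        for field, value in rec.items():
            if value in (None, ""):
                missing[field] += 1
    return dict(missing)

def analyze(report: dict[str, int], total: int, max_missing_ratio: float = 0.5) -> bool:
    """Analyze: accept the batch only if no field is missing too often (assumed rule)."""
    return total > 0 and all(n / total <= max_missing_ratio for n in report.values())

def cleanse(batch: list[dict]) -> list[dict]:
    """Cleanse: drop records without an id and remove duplicate ids (first wins)."""
    seen, clean = set(), []
    for rec in batch:
        rid = rec.get("id")
        if rid in (None, "") or rid in seen:
            continue
        seen.add(rid)
        clean.append(rec)
    return clean

def transform(batch: list[dict]) -> list[dict]:
    """Transform: normalize names to trimmed upper case."""
    return [{**rec, "name": (rec.get("name") or "").strip().upper()} for rec in batch]

def run_pipeline(
    batch: list[dict],
    stages: list[Callable[[list[dict]], list[dict]]],
    load: Callable[[list[dict]], None],
) -> list[tuple[str, int, int]]:
    """Profile and analyze the extracted batch, run the stages, then load it.

    Monitor: returns (stage name, records in, records out) for every stage.
    """
    report = profile(batch)
    if not analyze(report, len(batch)):
        raise ValueError(f"batch rejected by profiling: {report}")
    metrics = []
    for stage in stages:
        before = len(batch)
        batch = stage(batch)
        metrics.append((stage.__name__, before, len(batch)))
    load(batch)
    return metrics

if __name__ == "__main__":
    warehouse: list[dict] = []  # stand-in for the load target
    extracted = [
        {"id": 1, "name": " alice "},
        {"id": 1, "name": "alice"},   # duplicate id
        {"id": None, "name": "bob"},  # missing id
        {"id": 2, "name": "carol"},
    ]
    print(run_pipeline(extracted, [cleanse, transform], warehouse.extend))
    print(warehouse)
```

The per-stage counts are the raw material for monitoring: a sudden drop between the input and output of `cleanse` is an early signal that a source system has started sending bad data.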
Why ETL?
● Generally, to build and maintain data repositories holding a “single version of the truth”, consolidated from the many heterogeneous data sources scattered across an organization or a business domain.
● Business users can then use that data for:
  ○ Predictive Analysis
  ○ Revenue predictions and comparisons
  ○ Monitoring the overall growth of an organization
  ○ Business Policies
  ○ Strategic Decisions
Challenges
● Data definition establishment
● Need for expert knowledge
● Scalability and performance
● Business user acceptance and seamless support for a wide range of business use cases
● Maintenance and data archival
● Real-time or near real-time data synchronization
Why Standalone ETL Products Are Dead?
● Modern-day organizations are evolving faster than ever before.
● The tendency to adopt architectural patterns such as SOA, to reduce IT costs and make business processes more flexible, is rapidly increasing.
● Organizations are increasingly focused on “Connected businesses”.
● Thus, it is very likely that an organization already has an IT infrastructure in place.
Why Standalone ETL Products Are Dead?
● Adopting a standalone ETL product? Possible, but is it worthwhile?
● Such products generally offer less support for open standards. Extension points? Connectors? More custom code!
● They usually rely on proprietary data integration patterns, driving up maintenance costs.
● Additional licensing costs and the need for separate expert/operational assistance push maintenance costs up further.
Why Standalone ETL Products Are Dead?
● Organizations tend to use in-house, reusable business components, leveraging the benefits of SOA.
● Lower operational costs
● Scalability is a main focus nowadays.
● Implementing ETL as a process on the same platform enables horizontal scalability at different layers as the need arises.
Re-Architecting A System’s Data Integration Layer (DIL)?
● Data integration is always cumbersome.
● Need to ensure policy compliance of data in its target containers (usually Enterprise Data Warehouses, central MDM repositories, etc.)
● Flexibility
● Ensuring acceptable performance
● What about reliability?
Re-Architecting A System’s DIL?
● How to deal with the freshness of data?
● When to synchronize?
● Need for tuning the system to meet various SLAs
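One common way to address data freshness is incremental extraction with a high-water mark: each run pulls only the rows changed since the previous run, so syncs can be scheduled as often as the SLA demands without re-reading the whole source. The Python sketch below assumes a hypothetical `accounts` table with an ISO-8601 `updated_at` column (ISO strings sort chronologically) and uses SQLite as a stand-in source.

```python
import sqlite3

def extract_changes(conn: sqlite3.Connection, last_watermark: str) -> tuple[list[tuple], str]:
    """Return rows modified after last_watermark, plus the watermark for the next run."""
    rows = conn.execute(
        "SELECT id, balance, updated_at FROM accounts "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()
    # Advance the watermark to the newest change seen; keep it if nothing changed.
    new_watermark = rows[-1][2] if rows else last_watermark
    return rows, new_watermark

if __name__ == "__main__":
    src = sqlite3.connect(":memory:")  # stand-in for a source system
    src.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL, updated_at TEXT)")
    src.executemany(
        "INSERT INTO accounts VALUES (?, ?, ?)",
        [(1, 100.0, "2024-01-01T09:00:00"), (2, 250.0, "2024-01-01T10:30:00")],
    )
    rows, wm = extract_changes(src, "1970-01-01T00:00:00")  # first run: full extract
    print(len(rows), wm)
    src.execute(
        "UPDATE accounts SET balance = 80.0, updated_at = '2024-01-01T11:00:00' WHERE id = 1"
    )
    rows, wm = extract_changes(src, wm)  # next run: only the changed row
    print(rows, wm)
```

The strict `>` comparison can miss a row committed later with a timestamp equal to the stored watermark; production designs often use a monotonically increasing change number or re-read a small overlap window instead.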
Impact Of Tooling
[Diagram: Scripts, XSLT, Custom Code]
Impact Of Tooling
● Numerous ETL solutions fail because of a lack of tooling.
● Developers/solution composers are left hand-coding XSLT, custom mappers, etc.
● Not scalable!
● A powerful, flexible tooling platform is often required, particularly as the system grows and matures.
Reference Architecture
Reference Architecture - Big Picture
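[Diagram components: WSO2 BAM (Business Activity Monitor), WSO2 ESB (Enterprise Service Bus), two WSO2 MB (Message Broker) instances, two WSO2 DSS (Data Services Server) instances, DS and the Enterprise DW]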
Reference Architecture - Reliable Extraction
[Diagram components: ESB, MB, DSS with Scheduled Tasks, DS]
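To make the reliable-extraction idea concrete, here is a minimal Python sketch of scheduled extraction runs. It is not WSO2 code and all names are hypothetical: the `fetch` callable stands in for a call to a data service (such as one exposed through DSS), `run_scheduled` is a toy fixed-interval scheduler, and `queue.Queue` is an in-process stand-in for a Message Broker queue (unlike a real broker, it is not durable).

```python
import queue
import time
from typing import Callable

def fetch_with_retry(fetch: Callable[[], list], attempts: int = 3, base_delay: float = 0.1) -> list:
    """Call the source, retrying transient failures with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fetch()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # give up after the last attempt
            time.sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")

def extraction_task(fetch: Callable[[], list], broker: queue.Queue) -> int:
    """One scheduled run: pull a batch from the source and publish each record."""
    records = fetch_with_retry(fetch)
    for rec in records:
        broker.put(rec)
    return len(records)

def run_scheduled(task: Callable[[], int], interval_s: float, runs: int) -> list[int]:
    """Toy fixed-interval scheduler standing in for a real scheduled task."""
    results = []
    for i in range(runs):
        results.append(task())
        if i < runs - 1:
            time.sleep(interval_s)
    return results

if __name__ == "__main__":
    broker: queue.Queue = queue.Queue()
    calls = {"n": 0}

    def flaky_source() -> list:
        calls["n"] += 1
        if calls["n"] == 1:  # first call simulates a temporary outage
            raise ConnectionError("source temporarily unavailable")
        return [{"id": calls["n"], "amount": 10 * calls["n"]}]

    counts = run_scheduled(lambda: extraction_task(flaky_source, broker), 0.05, runs=2)
    print(counts, broker.qsize())
```

Publishing to a queue decouples the extraction schedule from downstream transformation and loading, so a slow or unavailable consumer does not block the source.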
Reference Architecture - Validate & Transform
[Diagram: Input Data Model (Data Model X) → WSO2 Data Mapper on the ESB → Output Data Model (Data Model Y)]
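The validate-and-transform step can be sketched in plain Python to show what a data mapper ultimately encodes. This is a hand-written illustration, not output of the WSO2 Data Mapper: `CustomerX` and `CustomerY` are hypothetical versions of Data Model X and Data Model Y, and the validation rules are assumed.

```python
from dataclasses import dataclass

@dataclass
class CustomerX:
    """Hypothetical input record (Data Model X)."""
    cust_no: str
    full_name: str
    balance_cents: int

@dataclass
class CustomerY:
    """Hypothetical output record (Data Model Y)."""
    customer_id: int
    first_name: str
    last_name: str
    balance: float

def validate(x: CustomerX) -> list[str]:
    """Return validation errors for a source record (an empty list means valid)."""
    errors = []
    if not x.cust_no.isdigit():
        errors.append(f"cust_no {x.cust_no!r} is not numeric")
    if len(x.full_name.split()) < 2:
        errors.append(f"full_name {x.full_name!r} needs a first and last name")
    return errors

def map_x_to_y(x: CustomerX) -> CustomerY:
    """Field-by-field mapping rules of the kind a visual data mapper captures."""
    first, *rest = x.full_name.split()
    return CustomerY(
        customer_id=int(x.cust_no),
        first_name=first,
        last_name=" ".join(rest),
        balance=x.balance_cents / 100,
    )

def validate_and_transform(records: list[CustomerX]) -> tuple[list[CustomerY], list[tuple]]:
    """Split input into mapped records and rejected records with their errors."""
    mapped, rejected = [], []
    for rec in records:
        errs = validate(rec)
        if errs:
            rejected.append((rec, errs))
        else:
            mapped.append(map_x_to_y(rec))
    return mapped, rejected

if __name__ == "__main__":
    mapped, rejected = validate_and_transform([
        CustomerX("0042", "Jane Q Doe", 12345),
        CustomerX("A7", "Prince", 0),
    ])
    print(mapped)
    for rec, errs in rejected:
        print("rejected:", rec.cust_no, errs)
```

Keeping rejected records together with their error messages, instead of silently dropping them, gives the auditing layer something concrete to report on.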
Reference Architecture - Auditing
[Diagram components: ESB, BAM, Data Policy Compliance Reports/Dashboards, Data Quality Reports/Dashboards]
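For auditing, each processed batch can be summarized into an event that a monitoring server such as BAM could turn into compliance and quality dashboards. The Python sketch below only builds and prints such an event; it does not use any BAM API. The policy rules in `RULES` and the event fields are hypothetical.

```python
import json
from datetime import datetime, timezone
from typing import Callable

# Hypothetical policy rules: name -> predicate that a compliant record satisfies.
RULES: dict[str, Callable[[dict], bool]] = {
    "currency_is_iso_code": lambda r: (
        isinstance(r.get("currency"), str)
        and len(r["currency"]) == 3
        and r["currency"].isupper()
    ),
    "card_number_masked": lambda r: str(r.get("card_number", "")).startswith("****"),
}

def audit_batch(batch_id: str, records: list[dict]) -> dict:
    """Build an audit event summarizing policy-rule violations for one batch."""
    violations = {name: 0 for name in RULES}
    compliant = 0
    for rec in records:
        passed = True
        for name, rule in RULES.items():
            if not rule(rec):
                violations[name] += 1
                passed = False
        compliant += passed
    return {
        "batch_id": batch_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "records": len(records),
        "compliant_records": compliant,
        "violations": violations,
    }

if __name__ == "__main__":
    batch = [
        {"currency": "LKR", "card_number": "****1234"},
        {"currency": "usd", "card_number": "1234567812345678"},
    ]
    # A monitoring server would receive this event; here it is just printed.
    print(json.dumps(audit_batch("batch-001", batch), indent=2))
```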
Reference Architecture - Reliable Loading
[Diagram components: ESB, MB, DSS, Enterprise DW]
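Reliable loading usually combines two ideas: consume from a queue and only consider messages done once the warehouse transaction commits, and make the write idempotent so redelivered messages do not create duplicates. The Python sketch below assumes a hypothetical `fact_payments` table, uses SQLite as a stand-in warehouse and `queue.Queue` as a stand-in for a broker queue; putting messages back on failure approximates a broker's redelivery.

```python
import queue
import sqlite3

def load_from_queue(broker: queue.Queue, dw: sqlite3.Connection, batch_size: int = 100) -> int:
    """Drain up to batch_size messages and upsert them in one transaction.

    Upserting by primary key makes the load idempotent: if a message is
    redelivered after a failure, loading it again does not duplicate rows.
    """
    batch = []
    while len(batch) < batch_size:
        try:
            batch.append(broker.get_nowait())
        except queue.Empty:
            break
    if not batch:
        return 0
    try:
        with dw:  # commits on success, rolls back on exception
            dw.executemany(
                "INSERT OR REPLACE INTO fact_payments (payment_id, amount) "
                "VALUES (:payment_id, :amount)",
                batch,
            )
    except sqlite3.Error:
        for msg in batch:  # return messages to the queue so a later run can retry
            broker.put(msg)
        raise
    return len(batch)

if __name__ == "__main__":
    dw = sqlite3.connect(":memory:")  # stand-in for the Enterprise DW
    dw.execute("CREATE TABLE fact_payments (payment_id INTEGER PRIMARY KEY, amount REAL)")
    broker: queue.Queue = queue.Queue()
    for msg in (
        {"payment_id": 1, "amount": 10.0},
        {"payment_id": 2, "amount": 25.0},
        {"payment_id": 1, "amount": 10.0},  # redelivered duplicate
    ):
        broker.put(msg)
    print(load_from_queue(broker, dw))
    print(dw.execute("SELECT COUNT(*) FROM fact_payments").fetchone()[0])
```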
Tooling - Smooks Editor
Tooling - WSO2 Data Mapper
Demo
● Building a transformation between two simple data models using the Smooks Editor shipped with WSO2 Developer Studio.
Summary
● ETL plays a pivotal role in any business organization.
● Implementing a proper ETL process within an organization often requires considerable effort.
● Standalone ETL solutions can be costly.
● Re-architecting data models is made easy with the WSO2 Enterprise Middleware Platform.
References
[1] How to use the Smooks Editor shipped with WSO2 Developer Studio. http://wso2.com/library/tutorials/2011/06/perform-data-mapping-smookseditor-wso2-carbon-studio/
Q&A