DATA VIRTUALIZATION & INFORMATION AS A SERVICE (IAAS)
By Anil Allewar
Senior Solutions Architect - Synerzip
About Me!!
• Anil Allewar
• Senior Solutions Architect @ Synerzip
• Technology Evangelist & speaker
• Core interests: JEE, EAI, EII
Agenda
• Use cases
• What does it mean?
• Architecture explained
• Implementation Frameworks
• Demo
• Questions?
Why does it make sense?
Use Cases
[Diagram: enterprise data landscape. Labeled elements: Data Warehouse, Data Mart, Financial Data, OLTP Data, 3rd Party Data, Legacy Data, Web Service 1, Web Service 2, Custom Program and Excel files, linked by ETL jobs.]
Traditional Data Integration
[Diagram: source systems are loaded through ETL into an Enterprise Information System, which the business applications use.]
Problems with ETL
• More than 1 copy of data for staging
• Intermediate data => Errors
• Lead time to add new source
• Domain knowledge for mapping
• Batch process => No real-time data
Problems with DBMS consolidation
• Alternate approach => Single EIS (say RDBMS)
• Extensive changes to existing apps
• Might not satisfy everyone’s requirements
Data Virtualization & Federation
• Single API to access data
• Only metadata stored at the virtualization layer
• Real-time access without copying/moving data
• Federate data across heterogeneous/homogeneous sources
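
The points above can be illustrated with a minimal Java sketch. It is not taken from the deck or from any product: `FederationSketch`, `Source`, `FederatedView` and `row` are hypothetical names, and two in-memory lists stand in for an RDBMS table and a document store. The view keeps only metadata (which sources, which join key) and joins the rows each time it is queried, so nothing is copied or staged.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from Teiid or any other product.
class FederationSketch {

    /** One physical source; rows stay where they are and are read on demand. */
    interface Source {
        List<Map<String, Object>> scan();
    }

    /** A view holds only metadata: the two sources and the column that joins them. */
    static final class FederatedView {
        private final Source left;
        private final Source right;
        private final String joinKey;

        FederatedView(Source left, Source right, String joinKey) {
            this.left = left;
            this.right = right;
            this.joinKey = joinKey;
        }

        /** Inner-joins the sources at query time (assumes unique keys on the right side). */
        List<Map<String, Object>> query() {
            Map<Object, Map<String, Object>> index = new HashMap<>();
            for (Map<String, Object> r : right.scan()) {
                index.put(r.get(joinKey), r);
            }
            List<Map<String, Object>> out = new ArrayList<>();
            for (Map<String, Object> l : left.scan()) {
                Map<String, Object> match = index.get(l.get(joinKey));
                if (match != null) {
                    Map<String, Object> merged = new HashMap<>(l);
                    merged.putAll(match);
                    out.add(merged);
                }
            }
            return out;
        }
    }

    /** Builds a row from alternating column names and values. */
    static Map<String, Object> row(Object... kv) {
        Map<String, Object> m = new HashMap<>();
        for (int i = 0; i < kv.length; i += 2) {
            m.put((String) kv[i], kv[i + 1]);
        }
        return m;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> customers = new ArrayList<>(); // stands in for an RDBMS table
        List<Map<String, Object>> balances = new ArrayList<>();  // stands in for a document store
        customers.add(row("id", 1, "name", "Asha"));
        balances.add(row("id", 1, "balance", 250));

        FederatedView view = new FederatedView(() -> customers, () -> balances, "id");
        System.out.println(view.query());

        // A change at the source is visible on the next query; there is no batch reload.
        balances.get(0).put("balance", 300);
        System.out.println(view.query());
    }
}
```

A real virtualization layer would also push filters down to the sources and plan the join; the sketch only shows the "metadata only, read on demand" idea.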
Architecture
[Diagram: a user application calls the Common Access API. Behind it, the runtime & query engine hosts the virtual database and reaches the sources through Translator 1 / Connector 1 and Translator 2 / Connector 2.]
Vendors
• Commercial Products
  • Composite Software: http://www.compositesw.com/data-virtualization/
  • Denodo: http://www.denodo.com/en/product/overview.php?n=h
  • IBM: http://www-03.ibm.com/software/products/en/ibminfofedeserv
  • Informatica: http://www.informatica.com/us/data-virtualization/
  • Red Hat: http://www.redhat.com/products/jbossenterprisemiddleware/data-virtualization/
• Open Source
  • JBoss Teiid: http://teiid.jboss.org/
Selected Platform – JBoss Teiid
• Open Source
• Supports a number of relational/NoSQL/ERP/CRM data stores
• Integrated with JEE standards
• Add custom EIS support using JEE components
• Active & responsive community
• Synerzip contribution: defect discovery, root cause analysis, feature verification
Teiid Components
• Virtual Database
  • Container for components used to integrate data from multiple data sources
• Source Models
  • Structure and characteristics of physical data sources
• View Models
  • Structure and characteristics of abstract structures you want to expose to your applications
• Teiid Designer
  • Eclipse-based UI to dynamically discover data source objects and apply data federation
  • Generate a virtual database from 1 or more sources
Teiid Components
• Translator
  • Provides an abstraction layer between the Teiid Query Engine and the source system
  • Converts Teiid SQL commands to source-specific execution commands
  • Converts result data from the source system to the Teiid-specific format
• Resource Adapter
  • Provides connectivity to the physical data source
  • Integration provided through the Java Connector Architecture (JCA) API
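
The translator role can be sketched in a few lines of Java. This is an illustration only and does not use Teiid's actual translator API: `TranslatorSketch`, `Query`, `SqlTranslator` and `MongoTranslator` are hypothetical names. One engine-neutral query is turned into a SQL string for a relational source and into a MongoDB shell-style `find` for a document source. Converting results back to the engine's format is left out.

```java
// Hypothetical sketch: class and method names are illustrative, not Teiid's translator API.
class TranslatorSketch {

    /** Engine-neutral request: all columns of one table, filtered on one column. */
    static final class Query {
        final String table;
        final String column;
        final String value;

        Query(String table, String column, String value) {
            this.table = table;
            this.column = column;
            this.value = value;
        }
    }

    /** Turns the engine-neutral query into a command the source understands. */
    interface Translator {
        String translate(Query q);
    }

    /** Relational source: emits a SQL statement. */
    static final class SqlTranslator implements Translator {
        @Override
        public String translate(Query q) {
            // String concatenation keeps the sketch short; real code would use bind parameters.
            return "SELECT * FROM " + q.table + " WHERE " + q.column + " = '" + q.value + "'";
        }
    }

    /** Document source: emits a MongoDB shell-style find command. */
    static final class MongoTranslator implements Translator {
        @Override
        public String translate(Query q) {
            return "db." + q.table + ".find({\"" + q.column + "\": \"" + q.value + "\"})";
        }
    }

    public static void main(String[] args) {
        Query q = new Query("customer", "city", "Pune");
        System.out.println(new SqlTranslator().translate(q));
        System.out.println(new MongoTranslator().translate(q));
    }
}
```

In this picture the resource adapter would own the connection that these commands are sent over, which is why the two components are listed separately.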
Teiid – Supported EIS
• Amazon SimpleDB
• Apache Accumulo
• Apache SOLR
• Cassandra
• File
• Google Spreadsheet
• JPA
• LDAP
• Excel – as file
• SalesForce
• JDBC
  • MS Access, DB2, Derby, excel-odbc, Greenplum, H2, Hive (for accessing Hadoop), Oracle, Teradata and most RDBMS
• MongoDB
• Object
• OData
• OLAP
• Web Services
• SAP Netweaver Gateway
Performance Characteristics
• Access same data using Oracle and Teiid drivers
  • Retrieval times comparable when accessing tables having no Blobs
[Chart: No. of rows vs. time (ms), no Blobs. Series: Oracle-JDBC, Teiid-JDBC. Axis ticks: 0 to 25,000.]
Performance Characteristics
• Teiid slower when accessing Blob data
  • Can be tuned
[Chart: No. of rows vs. time (ms), Blobs. Series: Oracle-JDBC, Teiid-JDBC. Axis ticks: 0 to 30,000. Labels recovered from the chart: 0, 0, 2, 42, 21,804, 32,531, 185,454.]
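Demo
[Diagram: a JDBC client calls the Teiid runtime & query engine through the JDBC API. The engine hosts the federated VDB and reaches MySQL through the MySQL translator and an RDBMS resource adapter, and MongoDB through the MongoDB translator and a MongoDB resource adapter.]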
Demo-Steps
• Pre-requisites
  • MySQL server 5.5+ installed
  • MongoDB 2.4.x+ installed
• Steps
  • Load the MySQL and MongoDB databases with sample data
  • Set up the environment: JBoss, Eclipse
  • Create a Teiid project in Eclipse using Teiid Designer
  • Import the source model using JDBC
  • Create the virtual model and federate data from the source model
  • Create a virtual database (VDB) and deploy it to JBoss
  • Access data using a JDBC client or through a browser using OData
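
The last step, reading the deployed VDB from a JDBC client, could look roughly like the sketch below. It assumes the Teiid JDBC driver jar is on the classpath and that the server listens on Teiid's default JDBC port 31000. The VDB name `FederatedVDB`, the credentials and the view name `CustomerOrders` are placeholders, not values from the demo repository. `teiidUrl` and `printFirstColumn` are hypothetical helper names.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Sketch of a plain JDBC client for a deployed VDB; all concrete values are placeholders.
class TeiidJdbcClientSketch {

    /** Builds a Teiid JDBC URL of the form jdbc:teiid:<vdb>@mm://<host>:<port>. */
    static String teiidUrl(String vdb, String host, int port) {
        return "jdbc:teiid:" + vdb + "@mm://" + host + ":" + port;
    }

    /** Runs one query against the VDB and prints the first column of each row. */
    static void printFirstColumn(String url, String user, String password, String sql)
            throws SQLException {
        try (Connection c = DriverManager.getConnection(url, user, password);
             Statement st = c.createStatement();
             ResultSet rs = st.executeQuery(sql)) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        String url = teiidUrl("FederatedVDB", "localhost", 31000); // assumed VDB name and port
        System.out.println(url);
        if (args.length < 2) {
            // Without credentials the sketch only prints the URL and does not connect.
            System.out.println("usage: TeiidJdbcClientSketch <user> <password>");
            return;
        }
        // CustomerOrders is a placeholder for a view defined in the view model.
        printFirstColumn(url, args[0], args[1], "SELECT * FROM CustomerOrders");
    }
}
```

To the client the federated view is just another table: the query does not mention MySQL or MongoDB, which is the point of the common access API.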
Demo – Scenario
[Diagram: Federated Data]
Demo – Connection Profile
Demo – Source Model
Demo – Source Model Generation
Demo – Map Source To View
Demo – Association
Demo – Data Federation
Demo – Source Code
• Source code
  • https://github.com/anilallewar/JBoss-Teiid
• Contains
  • Configuration files
  • Instructions
  • “How-to” videos
  • VDBs, source models and view models
Conclusion
• Data virtualization and federation is a rapidly emerging technology that solves traditional BI/ETL problems.
• It shortens time to market, distributes data across the enterprise as a service and provides real-time access to enterprise data.


Editor's Notes

  • #8 (Problems with ETL)
      • Requires more than 1 copy of data for staging.
      • Creating, storing and manipulating this intermediate data can lead to errors in data quality.
      • Lead time is required to add data from new sources.
      • Depends on domain knowledge for mapping entities between different data sources.
      • Batch processing: information lags behind real-time data.
  • #9 (Problems with DBMS consolidation)
      • The alternate approach is to move all enterprise data to a common Enterprise Information System (typically an RDBMS).
      • Requires extensive changes to existing applications, resulting in end user impact.
      • Might not satisfy every group’s requirements. Say group 1 has partitioned data but the target RDBMS doesn’t support partitioning.
  • #11 (Data Virtualization & Federation)
      • Single API to access data from heterogeneous sources.
      • Only metadata is stored at the virtualization layer.
      • Real-time access to data without copying/moving it from the source Enterprise Information System (EIS).
      • Federate data across multiple heterogeneous/homogeneous sources.
      • An enterprise information system (EIS) is any kind of information system that improves an enterprise's business processes through integration. An EIS could use a database, web service, flat files or any other custom system for storing this information.
  • #17 (Selected Platform – JBoss Teiid)
      • Open source.
      • Supports a number of relational and non-relational data sources.
      • Integrated with the JBoss Application Server and JEE architecture.
      • Ability to add custom data sources using standard JEE components.
      • Very active and responsive community.
  • #20 (Teiid – Supported EIS)
      • Amazon SimpleDB - web service for running queries on structured data in real time
      • Apache Accumulo - sorted, distributed key-value store
      • Apache SOLR - search system for indexing data/services
      • Cassandra - NoSQL database
      • File - exposes stored procedures to leverage file system resources
      • JPA - reverses a JPA object model into a relational model
      • LDAP - exposes an LDAP directory tree relationally
      • MongoDB - NoSQL database
      • Object - reads Java objects from external sources (i.e., Infinispan Cache or Map cache)
      • OData - consumes OData web services and also acts as a web server to expose a VDB as an OData service
      • OLAP - online analytical processing, exposing data as multidimensional arrays called cubes
      • SalesForce - CRM product
      • SAP Netweaver Gateway - web service calls to SAP
      • Web Services - exposes stored procedures for calling web services