This webinar covers data virtualization – concept, architecture, and a demo using Teiid, MongoDB, and MySQL technologies.
Data virtualization enables abstraction, transformation, federation and delivery of data from a variety of heterogeneous data sources as if they were a single virtual data source, without the need to physically copy the data for integration.
Read more at https://www.synerzip.com/webinar/data-virtualization-and-information-as-a-service-webinar/
10. www.synerzip.comwww.synerzip.com
Data Virtualization & Federation
Confidential 10
• Single API to access data
• Only metadata stored at virtualization layer
• Real-time access without copying/moving data
• Federate data across heterogeneous/homogeneous sources
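As a rough illustration of these points (not Teiid itself), the Python sketch below joins two unlike sources at query time without copying either into a staging store. Everything in it is an assumption made for the example: an in-memory SQLite table stands in for a relational source such as MySQL, a list of dictionaries stands in for a document store such as MongoDB, and the names `customers`, `order_documents` and `customer_orders` are hypothetical.

```python
import sqlite3

# Hypothetical relational source (stands in for MySQL): a customers table.
relational = sqlite3.connect(":memory:")
relational.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
relational.executemany(
    "INSERT INTO customers (id, name) VALUES (?, ?)",
    [(1, "Asha"), (2, "Ben")],
)

# Hypothetical document source (stands in for MongoDB): order documents.
order_documents = [
    {"customer_id": 1, "item": "laptop", "amount": 900},
    {"customer_id": 2, "item": "mouse", "amount": 20},
    {"customer_id": 1, "item": "monitor", "amount": 250},
]


def customer_orders():
    """Virtual view: read both sources on demand and join them in memory."""
    # Look up customer names from the relational source at call time.
    names = dict(relational.execute("SELECT id, name FROM customers"))
    # Join each order document to its customer; nothing is persisted.
    return [
        {"customer": names[doc["customer_id"]], "item": doc["item"], "amount": doc["amount"]}
        for doc in order_documents
        if doc["customer_id"] in names
    ]


rows = customer_orders()
```

Callers see a single function (the "single API"), and only the join logic lives in the middle layer; the data stays in the two sources.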
Vendors
• Commercial Products
– Composite Software
• http://www.compositesw.com/data-virtualization/
– Denodo
• http://www.denodo.com/en/product/overview.php?n=h
– IBM
• http://www-03.ibm.com/software/products/en/ibminfofedeserv
– Informatica
• http://www.informatica.com/us/data-virtualization/
– Red Hat
• http://www.redhat.com/products/jbossenterprisemiddleware/data-virtualization/
• Open Source
– JBoss Teiid
• http://teiid.jboss.org/
Selected Platform – JBoss Teiid
• Open Source
• Supports a number of relational/NoSQL/ERP/CRM data stores
• Integrated with JEE standards
• Add custom EIS support using JEE components
• Active & responsive community
• Synerzip contribution: defect discovery, root cause analysis, feature verification
Teiid Components
• Virtual Database
– Container for components used to integrate data from multiple data sources
• Source Models
– Structure and characteristics of physical data sources
• View Models
– Structure and characteristics of the abstract structures you want to expose to your applications
• Teiid Designer
– Eclipse-based UI to dynamically discover data source objects and apply data federation
– Generates a virtual database from one or more sources
Teiid Components
• Translator
– Provides an abstraction layer between the Teiid Query Engine and the source system
– Converts Teiid SQL commands to source-specific execution commands
– Converts result data from the source system to a Teiid-specific format
• Resource Adapter
– Provides connectivity to the physical data source
– Integration is provided through the Java EE Connector Architecture (JCA) API
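The translator's two jobs (command conversion and result conversion) can be sketched in a few lines of Python. This is an illustrative model only, not Teiid's translator API: the `Query` class and the functions `to_sql`, `to_mongo` and `rows_from_documents` are hypothetical names, and the dictionary returned by `to_mongo` is just a convenient way to show a MongoDB-style filter and projection.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Query:
    """Hypothetical engine-neutral query: SELECT columns FROM table WHERE column = value."""
    table: str
    columns: tuple
    where_column: str
    where_value: Any


def to_sql(query):
    """Translate to a parameterized SQL command for a relational source."""
    sql = f"SELECT {', '.join(query.columns)} FROM {query.table} WHERE {query.where_column} = ?"
    return sql, (query.where_value,)


def to_mongo(query):
    """Translate to a MongoDB-style find request (collection, filter, projection)."""
    return {
        "collection": query.table,
        "filter": {query.where_column: query.where_value},
        "projection": {column: 1 for column in query.columns},
    }


def rows_from_documents(query, documents):
    """Convert source documents back into engine-neutral rows (tuples in column order)."""
    return [tuple(doc.get(column) for column in query.columns) for doc in documents]


q = Query(table="orders", columns=("item", "amount"), where_column="customer_id", where_value=1)
sql_command = to_sql(q)
mongo_command = to_mongo(q)
rows = rows_from_documents(q, [{"_id": "x", "item": "laptop", "amount": 900}])
```

The same `Query` yields a different command per source type, while the rows handed back to the engine share one shape. Opening the actual connection would be the resource adapter's job and is left out here.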
Teiid – Supported EIS
• Amazon SimpleDB
• Apache Accumulo
• Apache SOLR
• Cassandra
• File
• Google Spreadsheet
• JPA
• LDAP
• Excel – as file
• SalesForce
• JDBC
– MS Access, DB2, Derby, excel-odbc, Greenplum, H2, Hive (for accessing Hadoop), Oracle, Teradata and most RDBMS
• MongoDB
• Object
• OData
• OLAP
• Web Services
• SAP Netweaver Gateway
Performance Characteristics
• Accessed the same data using the Oracle and Teiid JDBC drivers
– Retrieval times are comparable when accessing tables that have no Blobs
[Chart: No. of rows vs. time (ms), no Blobs; series: Oracle-JDBC and Teiid-JDBC]
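A comparison like this can be reproduced with a small timing harness. The Python sketch below is an assumption about how such a measurement might be set up, not the harness used for the chart: an in-memory SQLite table with no BLOB columns stands in for the data source, and `time_fetch` is a hypothetical helper. To compare two drivers, the same function would be called once per connection.

```python
import sqlite3
import time


def time_fetch(connection, sql, row_count):
    """Return (rows_fetched, elapsed_ms) for fetching up to row_count rows."""
    start = time.perf_counter()
    cursor = connection.execute(sql)
    rows = cursor.fetchmany(row_count)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return len(rows), elapsed_ms


# Stand-in data source: a table with no BLOB columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, label TEXT)")
conn.executemany("INSERT INTO t (label) VALUES (?)", [(f"row-{i}",) for i in range(5000)])

# One measurement per row count, mirroring the chart's "No. of rows vs. time" axes.
results = [time_fetch(conn, "SELECT id, label FROM t", n) for n in (1000, 2500, 5000)]
for fetched, elapsed in results:
    print(f"{fetched} rows in {elapsed:.2f} ms")
```

Timings from a single run are noisy; repeating each measurement and averaging gives a fairer comparison.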
Demo-Steps
• Prerequisites
– MySQL server 5.5+ installed
– MongoDB 2.4.x+ installed
• Steps
– Load the MySQL and MongoDB databases with sample data
– Set up the environment – JBoss, Eclipse
– Create a Teiid project in Eclipse using Teiid Designer
• Import the source model using JDBC
• Create the virtual model and federate data from the source model
• Create a virtual database (VDB) and deploy it to JBoss
– Access data using a JDBC client or through a browser using OData
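For the final step, a client needs the address of the deployed VDB. The Python sketch below only builds the two addresses; it does not connect to anything. The URL shapes and default ports (31000 for JDBC, 8080 for HTTP) are written from memory of Teiid 8.x conventions and should be checked against the documentation for the installed version. The VDB, model and table names (`CustomerOrders`, `Views`, `customer_orders`) are hypothetical.

```python
def teiid_jdbc_url(vdb, host="localhost", port=31000):
    """Build a JDBC URL of the assumed form jdbc:teiid:<vdb>@mm://<host>:<port>."""
    return f"jdbc:teiid:{vdb}@mm://{host}:{port}"


def teiid_odata_url(vdb, model, table, host="localhost", port=8080):
    """Build an OData URL of the assumed form http://<host>:<port>/odata/<vdb>/<model>.<table>."""
    return f"http://{host}:{port}/odata/{vdb}/{model}.{table}"


# Hypothetical names for the VDB, view model and view created in the demo.
jdbc_url = teiid_jdbc_url("CustomerOrders")
odata_url = teiid_odata_url("CustomerOrders", "Views", "customer_orders")
print(jdbc_url)
print(odata_url)
```

A JDBC client would pass the first URL to its driver manager along with the Teiid driver and credentials; the second can be opened in a browser once the server's OData support is enabled.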
Conclusion
• Data virtualization and federation is a rapidly emerging technology that addresses traditional BI/ETL problems such as staging copies of data, batch latency, and long lead times for adding new sources.
• It shortens time to market, delivers data across the enterprise as a service, and provides real-time access to enterprise data.
Synerzip in a Nutshell
1. Software product development partner for small/mid-sized technology companies
• Exclusive focus on small/mid-sized technology companies, typically venture-backed companies in growth phase
• By definition, all Synerzip work is the IP of its respective clients
• Deep experience in full SDLC – design, dev, QA/testing, deployment
2. Dedicated team of high caliber software professionals for each client
• Seamlessly extends client’s local team, offering full transparency
• Stable teams with very low turn-over
• NOT just “staff augmentation”, but provide full mgmt support
3. Actually reduces risk of development/delivery
• Experienced team - uses appropriate level of engineering discipline
• Practices Agile development – responsive, yet disciplined
4. Reduces cost – dual-shore team, 50% cost advantage
5. Offers long term flexibility – allows (facilitates) taking offshore team captive – aka “BOT” option
Traditional ETL requires more than one copy of the data for staging
Creating, storing and manipulating this intermediate data can introduce data quality errors
Adding data from new sources requires lead time
Integration depends on domain knowledge for mapping entities between different data sources
Batch processing means information lags behind real-time data
An alternative approach is to move all enterprise data to a common Enterprise Information System (typically an RDBMS)
This requires extensive changes to existing applications, which impacts end users
It might not satisfy every group’s requirements: for example, group 1 has partitioned data but the target RDBMS doesn’t support partitioning
Single API to access data from heterogeneous sources
Only metadata is stored at the virtualization layer
Real-time access to data without copying/moving it from the source Enterprise Information System (EIS)
Federate data across multiple heterogeneous/homogeneous sources
An enterprise information system (EIS) is any kind of information system that improves the functions of an enterprise's business processes through integration. An EIS could use a database, a web service, flat files, or any other custom system to store this information.
JBoss Teiid
Open Source
Supports a number of relational and non-relational data sources
Integrated with the JBoss Application Server and JEE architecture
Custom data sources can be added using standard JEE components
Very active and responsive community
Amazon SimpleDB - web service for running queries on structured data in real time
Apache Accumulo - sorted, distributed key value store
Apache SOLR - search system for indexing data/services
Cassandra - NoSQL database
File - exposes stored procedures to leverage file system resources
JPA - reverse a JPA object model into a relational model
LDAP - exposes an LDAP directory tree relationally
MongoDB - NoSQL database
Object - reads Java objects from external sources (e.g., Infinispan Cache or Map cache)
OData - consumes OData web services and also acts as a server to expose a VDB as an OData service
OLAP - online analytical processing, exposing data as multidimensional arrays called cubes
SalesForce - CRM product
SAP Netweaver Gateway - Web service calls to SAP
Web Services - exposes stored procedures for calling web services