Data Virtualization is any approach to data management that allows an application to retrieve and
manipulate data without requiring technical details about the data, such as how it is formatted or
where it is physically located.
Unlike the traditional extract, transform, load ("ETL") process, the data remains in place, and real-
time access is given to the source system for the data, thus reducing the risk of data errors and
reducing the workload of moving data around that may never be used.
Unlike Data Federation it does not attempt to impose a single data model on the data
(heterogeneous data). The technology also supports the writing of transaction data updates back to
the source systems.
To resolve differences in source and consumer formats and semantics, various abstraction and
transformation techniques are used.
This concept and software is a subset of data integration and is commonly used within business
intelligence, service-oriented architecture data services, cloud computing, enterprise search, and
master data management.
Data Virtualization software is an enabling technology which provides some or all of the following
• Abstraction – Abstract the technical aspects of stored data, such as location, storage
structure, API, access language, and storage technology.
• Virtualized Data Access – Connect to different data sources and make them accessible
from a common logical data access point.
• Transformation – Transform, improve quality, reformat, etc. source data for consumer
• Data Federation – Combine results sets from across multiple source systems.
• Data Delivery – Publish result sets as views and/or data services executed by client
application or users when requested.
Data virtualization software may include functions for development, operation, and/or
Reduce risk of data errors
• Reduce systems workload through not moving data around
• Increase speed of access to data on a real-time basis
• Significantly reduce development and support time
• Increase governance and reduce risk through the use of policies
• Reduce data storage required
May impact Operational systems response time, particularly if under-scaled to cope with
unanticipated user queries or not tuned early on
• Does not impose a heterogeneous data model, meaning the user has to interpret the data,
unless combined with Data Federation and business understanding of the data
• Requires a defined Governance approach to avoid budgeting issues with the shared services
• Not suitable for recording the historic snapshots of data - data warehouse is better for this
• Change management "is a huge overhead, as any changes need to be accepted by all
applications and users sharing the same virtualization kit"
Red Hat JBoss Data Virtualization (Teiid)
JBoss Data Virtualization is a lean, virtual data integration solution that turns fragmented data into
actionable information at business speed. It aggregates data spread across physically diverse
systems, such as multiple databases, XML files, and Hadoop systems, and makes them appear as a
set of tables in a local database.
-Build Virtual Database Stores
Complete data provisioning, federation, integration, and management through the creation of
virtual logical data models.
-Access via Standard Means
Developers can use JBoss Developer Studio, DDL based Virtual Database deﬁnitions, and native
queries to access data.
-Supports Most Database Types
Support for Apache Hadoop, NoSQL, JBoss Data Grid, MongoDB as well as a variety of data
services like SAP and Salesforce.co
Why Red Hat JBoss Data Virtualization (Teiid)?
1. Familiar Interface: JDBC
Teiid has a very familiar interface: JDBC! Every Java developer is familiar with JDBC access to
data sources. Now leverage your knowledge of the JDBC standard to access all your data sources.
• JDBC 4.0 API
• DML SQL-92 support (with select SQL-99 and later features)
• Support for standard JDBC scalar functions
2. Familiar Query Language: SQL
Want to query non-SQL sources in the same way you do with SQL sources? With Teiid, you can!
You can access data from any types of sources, and interact with those sources using a single
of SQL - even if the native sources do not understand SQL!
• DML SQL-92 support (with select SQL-99 and later features)
• Issue SQL to any data source -- see currently supported sources
• Level the data access playing field, using one version of SQL dialect, scalar functions,
3. Multiple Sources Look Like One
With Teiid, you can join and union data that resides in very dissimilar data sources. Multiple
sources suddenly look like a single source to your application.
• Joins across data sources
• Unions across data sources
4. Easy To Deploy
The Teiid query engine is a Java component - it plugs right into your application, like any other
library. Deployment is simple.
• Embed in plain old Java app
• Deploy to app servers
• Available as a stand-alone server in JBoss Enterprise Data Services Platform
5. Eliminate Hand-coded Data Access Logic
Real applications often access more than one data source. We know that. Teiid technology from
MetaMatrix has been in the business of enterprise data integration since 1999. Many of you have
built your own frameworks to handle integrating multiple sources, and have realized the difficulty
of doing that in a generic manner that performs and scales well under real use conditions. Now you
can retire your custom frameworks and hand-coded logic, and use a dedicated query component
all your data access needs. This lets you focus on the logic on top of the data access layer rather
than the nuts and bolts of accessing heterogeneous data uniformly.
• Cheaper - than hand-coding and maintaining hand-coded integration, and re-inventing
integration logic on every project
• Better - than non-optimized integration logic that does not make use of a real query
• Faster - to implement your projects, leveraging the integration logic already built into
and reusing that logic on other projects
6. Battle Tested - and Improving
You don't want to be a guinea pig for someone's "product" experiments. Don't worry - with Teiid,
you won't have to. Teiid is a component form of the query engine that is the heart of the JBoss
Enterprise Data Services Platform (JBEDSP), which is used by large commercial organizations,
independent software vendors, and many federal agencies, including intelligence agencies
responsible for protecting citizens in the U.S. and other countries. These are organizations that
cannot and do not play with toys, so you can have confidence that our products have been put
through the ringer a number of times.
• Used by Fortune 500 companies and Government Intel agencies
• Used by independent software vendors
• Large data sets, small data sets, data sets with quirky characteristics
• Relational data, XML data, and data from sources you've never even heard of!
Part of being battle-tested is operating at expected levels of performance in a wide variety of
enterprise solutions. Teiid accounts for the unique requirements of integrating information across
disparate data sources.
• Cost-based optimizer
• Accounts for federating data across heterogeneous systems
• Caches result sets for user queries and queries to sources
8. Scriptable Integration
Teiid comes with an administrative shell that allows programatic access to administrative features.
9. Works Like a Charm - Fast
Your time is precious - we know that. You can't waste your time investigating every newfangled
product and solution marketed to you. With Teiid, you don't have to. In 30 minutes, you can
demonstrate to yourself that you can issue federated queries against 2 of your own databases.
• 30 minutes to get started
10. Tip of the Iceberg
Still not convinced? What if we told you that all this was merely the tip of the iceberg? That's right
there's more! Not only can you do more with the Teiid query engine, but everything you do can be
leveraged and extended with the Teiid Server and JBoss Enterprise Data Services Platform.
With Teiid Designer, you get the following additional functionality:
• Data abstraction through an Eclipse-based modeling tool
• Relational views - of any type of data
• XML views of non-XML data (XSD-compliant)
• Data Services - rapid design and deploy
• For Web services architectures
• For general services-oriented architectures (SOAs)
Moving up to the JBoss Enterprise Data Services Platform suite enables you to take advantage of
the following enterprise-level features:
• Extensive connectivity to enterprise sources
• Support for packaged applications such as SAP
• Authentication and authorization (entitlements)
• Integration of external authentication/user systems
• Model management
• Searchable metadata for dependency and impact analyses
• Monitoring and administration
• Enterprise administration and monitoring console
Teiid is a data
use data from
and services for
abstraction and federation, data is accessed and integrated in real-time across distributed data
copying or otherwise moving data from its system of record.
Query Engine The heart of Teiid is a high-performance query engine that processes
relational, XML, XQuery and procedural queries from federated
datasources. Features include support for homogeneous schemas,
heterogeneous schemas, transactions, and user defined functions.
Embedded An easy-to-use JDBC Driver that can embed the Query Engine in any
Server An enterprise ready, scalable, managable, runtime for the Query Engine
that runs inside JBoss AS that provides additional security, fault-
tolerance, and administrative features.
Connectors Teiid includes a rich set of Translators and Resource Adapters that
enable access to a variety of sources, including most relational databases,
web services, text files, and ldap. Need data from a different source? A
custom translators and resource adaptors can easily be developed.
Tools • Create - Use Teiid Designer to define virtual databases
containing views, procedures or even dynamic XML documents.
• Monitor & Manage - Use the Teiid Web Console with just the
AS or the Teiid RHQ plugin to control
any number of servers.
• Script - Use the Teiid AdminShell to automate administrative
and testing tasks.
The Virtual Database
A virtual database (or VDB) is a container for components used to integrate data from multiple
sources, so that they can be accessed in an integrated manner through a single, uniform API.
A VDB contains models, which define the structural characteristics of data sources, views, and
VDB Creation and Validation
There are two types VDBs available. Dynamic VDB is defined using a simple XML file. This
file defines the sources it is trying to integrate and then provides access through JDBC where user
queries can be written against this VDB using all the sources defined as if they are in single
Dynamic VDB does not offer view/abstact layers.
Teiid Designer, a Eclipse-based GUI tool can be used to create VDBs. This Eclipse-based tool lets
you not only define source models and import metadata and statistics from them, but also allows
you to define relational and XML views on top of those sources. This allows you to abstract the
structure of the information you expose to and use in your applications from the underlying
VDBs can contain one or more models representing the information to be integrated and exposed
consuming applications. Models must be in a valid state in order for the VDB to be used for data
access. Validation of a single model means that it must be in a self-consistent and complete state,
meaning that there are no "missing pieces" and no references to non-existent entities. Validation of
multiple models checks that all inter-model dependencies are present and resolvable.
A VDB must always be in a complete state, meaning that all information is contained within the
VDB itself -- there are no external dependencies.
Deploying a VDB for Data Access
After a VDB is defined, it must be deployed to the Teiid runtime to be accessed.
• The VDB needs to be deployed to a Teiid Server, if there are no errors during deployment
and underlying data sources are configured correctly, then VDB will be accessible to your
Accessing Multiple Sources Through a VDB
Once VDB is deployed, your VDB can be accessed through JDBC-SQL, SOAP (Web Services),
SOAP-SQL, or Xquery.
DBs, Translators and Resource Adaptors
VDBs contain two primary varieties of model types - Source and View models. Source models
represent the structure and characteristics of physical data sources, whereas view models represent
the structure and characteristics of abstract structures you want to expose to your applications.
Source models must be associated with a Translator and a Resource Adaptor. A Translator provides
a abstraction layer between Teiid Query Engine and physical data source, that knows how to
convert Teiid issued query commands into source specific commands and execute them using the
Resource Adaptor. It also have smarts to convert the result data that came from the physical source
into a form that Teiid Query engine is expecting.
A Resouce Adaptor provides the connectivity to the physical data source. This also provides way
natively issue commands and gather results. A Resource Adaptor can be a RDBMS data source,
Web Service, text file, connection to main frame etc. This is often is JCA Connector.
You can define configuration for Translators and Resource Adaptors in Teiid Designer. Once
defined, Translator information along with the JNDI name of the Resource Adaptor is stored with a
VDB, so that when a VDB is exchanged, the existing settings can be used.
Typically Resource Adaptor configuration information contains user-ids, passwords, URLs to the
physical data sources. This information is not stored with the VDB. These are automatically
by Designer for development purposes, however user need to migrate or create new ones for the
production environment themselfs using the provided tools like Admin Console.
VDB Execution in Teiid Designer
VDBs can be tested in Teiid Designer by issuing SQL queries in the SQL Explorer perspective. In
this way, you can iterate between defining your integration models and testing them out to see if
they are yielding the expected results.
Your VDB must define its Translator and Resource Adapter with all source models in order to be
VDB File Formats
VDBs are stored in an archive file format, similar to a standard Java JAR format.
Dynamic VDBs are XML files. The schema for the XML file can be found in the Teiid documents.
A model is a representation of a set of information constructs. A familiar model is the relational
model, which defines tables composed of columns and containing records of data. Another
model is the XML model, which defines hierarchical data sets.
In Teiid, models are used to define the entities, and relationships between those entities, required
fully define the integration of information sets so that they may be accessed in a uniform manner
using a single API and access protocol.
Source models define the structural and data characteristics of the information contained in data
sources. Teiid uses the information in source models to access the information in multiple sources,
so that from a user's viewpoint these all appear to be in a single source.
In addition to source models, Teiid provides the ability to define a variety of view models. These
can be used to define a layer of abstraction above the physical layer, so that information can be
presented to end users and consuming applications in business terms rather than as it is physically
stored. These business views can be in a variety of forms: relational, XML, or Web services. Views
are defined using transformations between models.
Types of Models
Teiid Designer can be used to model a variety of classes of models. Each of these represent a
conceptually different classification of models.
• Relational, which model data that can be represented in table – columns and records –
Relational models can represent structures found in relational databases,
spreadsheets, text files, or simple Web services.
• XML, which model the basic structures of XML documents. These can be “backed” by
XML Schemas. XML models represent nested structures, including recursive
• XML Schema, the W3C standard for formally defining the structure and constraints of
XML documents, as well as the datatypes defining permissible values in XML
• Web Services, which define Web service interfaces, operations, and operation input and
output parameters (in the form of XML Schemas).
• Model Extensions, for defining property name/value extensions to other model classes.
VDBs contain two primary varieties of model types - source and view. Source models represent
structure and characteristics of physical data sources, whereas view models represent the structure
and characteristics of abstract structures you want to expose to your applications.
Models and VDBs
Models used for data integration are packaged into a virtual database (VDB). The models must be
in a complete and consistent state when used for data integration. That is, the VDB must contain
the models and all resources they depend upon.
Models contained within a VDB can be imported into the Teiid Designer. In this way, VDBs can
used as a way to exchange a set of related models.
Models and Translators, Resource Adaptors
Source models must be configured with a Translator and a Resource Adaptor with them before a
VDB is tested in Designer or deployed for data access.
It is possible that multiple models may use the same settings, but each model must define these
Models must be in a valid state in order to be used for data access. Validation of a single model
means that it must be in a self-consistent and complete state, meaning that there are no "missing
pieces" and no references to non-existent entities. Validation of multiple models checks that all
inter-model dependencies are present and resolvable.
Models must always be validated when they are deployed in a VDB for data access purposes.
Model Execution in Teiid Designer
Models can be tested in the Teiid Designer by issuing SQL queries in the SQL Explorer
In this way, you can iterate between defining your integration models and testing them out to see if
they are yielding the expected results.
Models are stored in XML format, using the XMI syntax defined by the OMG.
Model files should never be modified "by hand". While it is possible to do so, there is the
possibility that you may corrupt the file such that it cannot be used within the JBoss Enterprise
Dynamic VDBs and Models
The information in this artical applies to the VDBs that are built using the Teiid Designer. If you
building Dynamic VDBs, much of the information does not apply in that case. However, even
Dynamic VDBs have models but they only define configuration for importing metadata and
Translators and Resource Adaptors.
Translators and Resource Adaptors
A Translator provides an abstraction layer between Teiid Query Engine and physical data source,
that knows how to convert Teiid issued query commands into source specific commands and
execute them using the Resource Adaptor. It also have smarts to convert the result data that came
from the physical source into a form that Teiid Query engine is expecting.
Teiid provides various pre-built translators for sources like Oracle, DB2, SQL Server, MySQL,
PostgreSQL, XML, File etc.
A Translator also defines the capabilities of a perticular source, like whether it can natively support
query joins (inner joins, cross joins etc) or support criteria.
A Transaltor along with its Resource Adaptor is always must be configured on a Source Model.
Cross-source queries issued against a VDB running in Teiid result in source queries being issued to
translator, which interact with the physical data sources.
A Translator is defined by using one of the default pre-built ones, or you can override the default
properties of the pre-built ones to define your own. The tooling will provide mechanisms to define
Check out "Developer's Guide" on how to create a custom Translator that works with your
A Resouce Adaptor provides the connectivity to the physical data source. This also provides way
natively issue commands to the source and gather results. A Resource Adaptor can be a RDBMS
data source, Web Service, text file, connection to main frame or to a custom source you defined.
This is often is JCA Connector, however there is no restriction how somebody provides the
connection semantics to the Translator.
However, if your source needs participate in distributed XA transactions, then this must be a JCA
connector. Other than providing transactions, JCA defines how to do configuration, packaging and
deployment. This also provides a standard interaction model with the Container, connection pools
etc. It can be used for more than just Teiid data integration purposes.
A instance of resouce adaptor is created by defining a "-ds.xml" file in the JBoss AS. This is same
operation that is used to create Data Sources in JBoss AS.
Check out the "Developer's Guide" on how to create a custom Resource Adaptor.
translator capabilities define what processing each translator/source combination can perform. For
example, most relational sources can process joins and unions, whereas when processing delimited
text files these operations cannot be performed by the resource adaptor or the "source" (in this
the file system).
Capabilities are used by the Teiid query engine to determine what subsets of the overall federated
query plan can be pushed down to each source involved in the query.
Translator capabilities define the capabilities of a source in terms of language features (joins,
criteria, functions, unions, sorts, etc). In addition, the source model defined in a virtual database
may specify additional constraints at the metadata level, such as whether a column can be used in
exact match or wildcard string match, whether tables and columns can be updated, etc. In
combination, these features can be used to more narrowly constrain how users access a source.
Resource Adaptors and Security
It is possible to use the security system of individual data sources if this is desired. When the
resource adapter is JCA connector, they can be configured with separate "security-domain" in their
"-ds.xml" files in the JBoss AS. However, calling thread need to login into the context before they
In Teiid, Translators and Resource adaptors can be configured and monitored using the Teiid
Console, or using the Teiid Server Administrative API.
A data service is a standards-based, uniform means of accessing information in a form useful to
Since data is rarely in a form required by applications and services, and is often not even in a
data source, a key requirement for data services is that they abstract the data from its physical
persistence structure, presenting it in a form that is closer to the needs of the using application.
effectively decouples consuming applications from the structure of the underlying data.
Hand-in-hand with abstraction, a federated query engine is required to execute the transformations
defining the abstraction layers in an efficient manner, and to expose the abstracted structures
through uniform and standard APIs.
The two key components of a data services architecture, then, are:
• Modeling environment, to define the abstraction layers -- views and Web services
• Execution environment, to actualize the abstract structures from the underlying data, and
expose them through standard APIs. A query engine is a required part of the execution
environment, to optimally federate data from multiple disparate sources.
See SOAs and Data Services for more information on the role data services play in an SOA.
Technical and Business Viewpoints
Data services can be viewed from both a technology vantage point, or from a business viewpoint.
The Technology Viewpoint
Teiid provides a suite of projects that provide data services to business applications. That is, Teiid
provides a means to access integrated data from multiple data sources, through your preferred
standards-based API. Teiid provides access to federated information through JDBC (SQL or
XQuery), ODBC (SQL or XQuery), and SOAP (Web services).
The Business Viewpoint
A more business- or user-centric view of data services is that they are information representations
required by business applications. From this perspective, data services are defined and designed by
business analysts, modelers, and developers to represent the information structures required by
business applications. Often, a key design goal is one of interoperability - the requirement that
systems work together seamlessly, including when exchanging data. Teiid provides graphical and
other tools for defining these interoperable data services, essentially relational and XML views
can be used by business applications in a semantically-meaningful manner.
These two viewpoints roughly correspond to the Execution and Modeling components of a data
services solution, respectively.
Data Services - An Essential Part of an SOA
Data services are a key part of a service-oriented architecure, or SOA. They provide the necessary
interface to data for all business services.
• Expose all data through a single uniform interface
• Provide a single point of access to all business services in the system
• Expose data using the same paradigm as business services - as "data services"
• Expose legacy data sources as data services
• Provide a uniform means of exposing/accessing metadata
• Provide a searchable interface to data and metadata
• Expose data relationships and semantics
• Provide uniform access controls to information
Service-Oriented Architectures and Data Services
Service-oriented architectures are all the rage these days, and for good reason. The guiding
principles of SOAs are based on lessons well-learned over the brief history of computing, most
notably that of decoupling of system components. It is these same principles that motivate the use
of data services in an SOA.
SOAs and Abstraction
Decoupling is the key concept in SOAs and is achieved through abstraction based on service
interfaces. Business processes in an SOA represent a formalized, executable form of the actual
enterprise's processes, but offer a layer of abstraction above the physical processes, be they
automated or manual. Business processes are composed of business services. Just as business
processes in an SOA represent an abstraction from their real-world counterparts, so do business
services offer an abstraction of actual physical services. Decoupling through abstraction imbues
SOAs with immense potential to model business operations independent of the IT infrastructure du
SOAs, as their name makes clear, are architectures. These architectures, as we've seen, involve
business processes composed of business services. Business processes and services both make use
of business information, which is likely resident in many different types and instances of databases
and files. This information can be exposed to business services using the same service-oriented
paradigm - as data services.
Just as business processes and services in an SOA represent abstractions - albeit executable ones -
of their real-world counterparts, so too do data services represent an abstraction of underlying
enterprise information. Data services expose information to business services in a form and
an interface amenable to those services.
The form is generally some representation of business objects to be manipulated by business
services and passed between services by business processes. Business objects may be simple
structures or complex nested structures. Almost always, though, they must be composed from
information residing in more than one data source, often in different persistence formats. So a key
requirement of data services is that they:
• expose integrated information in one or more desired formats, even if the original data are
The desired interface is dependent on the architecture being used. A Web service-based SOA will
provide a SOAP or REST-based interface to XML-formatted business objects. A more traditional
Java or C-language RCP-based architecture will require JDBC or ODBC access to tabular
information, obtained from multiple data sources. So, a second key requirement of data services is
• expose information through one or more consistent, standard interfaces, even if the
data are accessed through different interfaces.
These two key requirements of data services are achieved by two different technologies:
• modeling to define the required format of data, integrated from the underlying sources;
• a query engine for processing these abstract definitions efficiently, exposing the
information through one or more interfaces.
Together these form the basis for a data services architecture underpinning a robust SOA, making
data available to business processes and services in the required format and through consistent,