Data Virtualization
Published in: Technology
Transcript of "Data virtualization"

Data Virtualization
Data Virtualization is any approach to data management that allows an application to retrieve and manipulate data without requiring technical details about the data, such as how it is formatted or where it is physically located.

Unlike the traditional extract, transform, load ("ETL") process, the data remains in place, and real-time access is given to the source system for the data, thus reducing the risk of data errors and reducing the workload of moving data around that may never be used. Unlike Data Federation, it does not attempt to impose a single data model on the data (heterogeneous data). The technology also supports the writing of transaction data updates back to the source systems.

To resolve differences in source and consumer formats and semantics, various abstraction and transformation techniques are used. This concept and software is a subset of data integration and is commonly used within business intelligence, service-oriented architecture data services, cloud computing, enterprise search, and master data management.

Functionality
Data Virtualization software is an enabling technology which provides some or all of the following capabilities:
• Abstraction - Abstract the technical aspects of stored data, such as location, storage structure, API, access language, and storage technology.
• Virtualized Data Access - Connect to different data sources and make them accessible from a common logical data access point.
• Transformation - Transform, improve quality, reformat, etc. source data for consumer use.
• Data Federation - Combine result sets from across multiple source systems.
• Data Delivery - Publish result sets as views and/or data services executed by client applications or users when requested.
Data virtualization software may include functions for development, operation, and/or management.

Benefits
• Reduce risk of data errors
• Reduce systems workload through not moving data around
• Increase speed of access to data on a real-time basis
• Significantly reduce development and support time
• Increase governance and reduce risk through the use of policies
• Reduce data storage required

Drawbacks
• May impact the response time of operational systems, particularly if under-scaled to cope with unanticipated user queries or not tuned early on
• Does not impose a heterogeneous data model, meaning the user has to interpret the data, unless combined with Data Federation and business understanding of the data
• Requires a defined governance approach to avoid budgeting issues with the shared services
• Not suitable for recording historic snapshots of data; a data warehouse is better for this
• Change management "is a huge overhead, as any changes need to be accepted by all applications and users sharing the same virtualization kit"

Red Hat JBoss Data Virtualization (Teiid)
JBoss Data Virtualization is a lean, virtual data integration solution that turns fragmented data into actionable information at business speed. It aggregates data spread across physically diverse systems, such as multiple databases, XML files, and Hadoop systems, and makes them appear as a set of tables in a local database.
• Build Virtual Database Stores: Complete data provisioning, federation, integration, and management through the creation of virtual logical data models.
• Access via Standard Means: Developers can use JBoss Developer Studio, DDL-based Virtual Database definitions, and native queries to access data.
• Supports Most Database Types: Support for Apache Hadoop, NoSQL, JBoss Data Grid, and MongoDB, as well as a variety of data services like SAP and Salesforce.com.

Why Red Hat JBoss Data Virtualization (Teiid)?
1. Familiar Interface: JDBC
Teiid has a very familiar interface: JDBC! Every Java developer is familiar with JDBC access to data sources. Now leverage your knowledge of the JDBC standard to access all your data sources.
• JDBC 4.0 API
• DML SQL-92 support (with select SQL-99 and later features)
• Support for standard JDBC scalar functions
2. Familiar Query Language: SQL
Want to query non-SQL sources in the same way you do with SQL sources? With Teiid, you can!
You can access data from any type of source, and interact with those sources using a single flavor of SQL - even if the native sources do not understand SQL!
• DML SQL-92 support (with select SQL-99 and later features)
• Issue SQL to any data source - see currently supported sources
• Level the data access playing field, using one version of SQL dialect, scalar functions, and datatypes
3. Multiple Sources Look Like One
With Teiid, you can join and union data that resides in very dissimilar data sources. Multiple sources suddenly look like a single source to your application.
• Joins across data sources
• Unions across data sources
4. Easy To Deploy
The Teiid query engine is a Java component - it plugs right into your application, like any other Java library. Deployment is simple.
• Embed in plain old Java app
• Deploy to app servers
• Available as a stand-alone server in JBoss Enterprise Data Services Platform
5. Eliminate Hand-coded Data Access Logic
Real applications often access more than one data source. We know that. Teiid technology from MetaMatrix has been in the business of enterprise data integration since 1999. Many of you have built your own frameworks to handle integrating multiple sources, and have realized the difficulty of doing that in a generic manner that performs and scales well under real use conditions. Now you can retire your custom frameworks and hand-coded logic, and use a dedicated query component for all your data access needs. This lets you focus on the logic on top of the data access layer rather than the nuts and bolts of accessing heterogeneous data uniformly.
• Cheaper - than hand-coding and maintaining hand-coded integration, and re-inventing integration logic on every project
• Better - than non-optimized integration logic that does not make use of a real query engine
• Faster - to implement your projects, leveraging the integration logic already built into Teiid, and reusing that logic on other projects
6. Battle Tested - and Improving
You don't want to be a guinea pig for someone's "product" experiments. Don't worry - with Teiid, you won't have to.
Teiid is a component form of the query engine that is the heart of the JBoss Enterprise Data Services Platform (JBEDSP), which is used by large commercial organizations, independent software vendors, and many federal agencies, including intelligence agencies responsible for protecting citizens in the U.S. and other countries. These are organizations that cannot and do not play with toys, so you can have confidence that our products have been put through the wringer a number of times.
• Used by Fortune 500 companies and Government Intel agencies
• Used by independent software vendors
• Large data sets, small data sets, data sets with quirky characteristics
• Relational data, XML data, and data from sources you've never even heard of!
7. Optimized
Part of being battle-tested is operating at expected levels of performance in a wide variety of enterprise solutions. Teiid accounts for the unique requirements of integrating information across disparate data sources.
• Cost-based optimizer
• Accounts for federating data across heterogeneous systems
• Caches result sets for user queries and queries to sources
8. Scriptable Integration
Teiid comes with an administrative shell that allows programmatic access to administrative features.
9. Works Like a Charm - Fast
Your time is precious - we know that. You can't waste your time investigating every newfangled product and solution marketed to you. With Teiid, you don't have to. In 30 minutes, you can demonstrate to yourself that you can issue federated queries against 2 of your own databases.
• 30 minutes to get started
10. Tip of the Iceberg
Still not convinced? What if we told you that all this was merely the tip of the iceberg? That's right - there's more! Not only can you do more with the Teiid query engine, but everything you do can be leveraged and extended with the Teiid Server and JBoss Enterprise Data Services Platform.
With Teiid Designer, you get the following additional functionality:
• Data abstraction through an Eclipse-based modeling tool
• Relational views - of any type of data
• XML views of non-XML data (XSD-compliant)
• Data Services - rapid design and deploy
  • For Web services architectures
  • For general service-oriented architectures (SOAs)
Moving up to the JBoss Enterprise Data Services Platform suite enables you to take advantage of the following enterprise-level features:
• Extensive connectivity to enterprise sources
  • Support for packaged applications such as SAP
• Security
  • Authentication and authorization (entitlements)
  • Integration of external authentication/user systems
• Model management
  • Searchable metadata for dependency and impact analyses
• Monitoring and administration
  • Enterprise administration and monitoring console
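
As a rough illustration of the JDBC and SQL access described in the numbered list above, the following Java sketch builds a Teiid-style JDBC URL and runs one query that joins two models. It is a sketch under stated assumptions, not a reference implementation: the VDB name (Portfolio), the model, table and column names, the host, and the port are all hypothetical, and the URL layout (jdbc:teiid:<vdb>@mm://<host>:<port>) should be checked against the Teiid client documentation for the version in use.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

/** Sketch of JDBC access to a deployed VDB. All names below are hypothetical. */
class TeiidJdbcSketch {

    // Hypothetical federated query: Accounts and Orders are assumed to be two
    // models in the same VDB, each backed by a different physical source.
    static final String FEDERATED_QUERY =
        "SELECT c.name, o.total "
      + "FROM Accounts.customer AS c "
      + "JOIN Orders.purchase AS o ON c.id = o.customer_id";

    /** Builds a URL of the assumed form jdbc:teiid:<vdb>@mm://<host>:<port>. */
    static String buildUrl(String vdb, String host, int port) {
        return "jdbc:teiid:" + vdb + "@mm://" + host + ":" + port;
    }

    /** Runs a query over any JDBC connection and returns the first column as strings. */
    static List<String> firstColumn(Connection conn, String sql) throws SQLException {
        List<String> values = new ArrayList<>();
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                values.add(rs.getString(1));
            }
        }
        return values;
    }

    /** Connects and runs the federated query; needs the Teiid driver and a running server. */
    static List<String> customerNames(String url, String user, String password)
            throws SQLException {
        try (Connection conn = DriverManager.getConnection(url, user, password)) {
            return firstColumn(conn, FEDERATED_QUERY);
        }
    }

    public static void main(String[] args) {
        // Only prints what would be used; no connection is opened here.
        System.out.println(buildUrl("Portfolio", "localhost", 31000));
        System.out.println(FEDERATED_QUERY);
    }
}
```

The application code is ordinary JDBC: the join across two sources is expressed in the SQL text, and nothing in the client indicates that the tables live in different systems.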

Teiid is a data virtualization system that allows applications to use data from multiple, heterogeneous data stores. Teiid is composed of tools, components and services for creating and executing bidirectional data services. Through abstraction and federation, data is accessed and integrated in real time across distributed data sources without copying or otherwise moving data from its system of record.

Teiid Parts
Query Engine
The heart of Teiid is a high-performance query engine that processes relational, XML, XQuery and procedural queries from federated data sources. Features include support for homogeneous schemas, heterogeneous schemas, transactions, and user-defined functions.
Embedded
An easy-to-use JDBC driver that can embed the Query Engine in any Java application.
Server
An enterprise-ready, scalable, manageable runtime for the Query Engine that runs inside JBoss AS and provides additional security, fault-tolerance, and administrative features.
Connectors
Teiid includes a rich set of Translators and Resource Adapters that enable access to a variety of sources, including most relational databases, web services, text files, and LDAP. Need data from a different source? Custom translators and resource adapters can easily be developed.
Tools
• Create - Use Teiid Designer to define virtual databases containing views, procedures or even dynamic XML documents.
• Monitor & Manage - Use the Teiid Web Console with just the AS or the Teiid RHQ plugin to control any number of servers.
• Script - Use the Teiid AdminShell to automate administrative and testing tasks.

Virtual Databases
The Virtual Database
A virtual database (or VDB) is a container for components used to integrate data from multiple data sources, so that they can be accessed in an integrated manner through a single, uniform API. A VDB contains models, which define the structural characteristics of data sources, views, and Web services.

VDB Creation and Validation
There are two types of VDBs available. A Dynamic VDB is defined using a simple XML file. This XML file defines the sources to be integrated and provides access through JDBC, so that user queries can be written against the VDB using all the defined sources as if they were in a single source. A Dynamic VDB does not offer view/abstract layers.
The other type is created with Teiid Designer, an Eclipse-based GUI tool. This tool lets you not only define source models and import metadata and statistics from them, but also define relational and XML views on top of those sources. This allows you to abstract the structure of the information you expose to and use in your applications from the underlying physical data structures.
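
To make the Dynamic VDB file more concrete, the sketch below holds an illustrative VDB definition in a Java string and lists, for each model, the translator and the JNDI name of the connection it points at. The element and attribute names (vdb, model, source, translator-name, connection-jndi-name) are an assumption about the file layout and should be verified against the schema in the Teiid documentation; the VDB, model, and JNDI names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Sketch: reads an illustrative Dynamic VDB definition and lists its sources. */
class DynamicVdbSketch {

    // Illustrative definition only; names and layout are assumptions.
    static final String SAMPLE_VDB =
        "<vdb name=\"Portfolio\" version=\"1\">"
      + "<model name=\"Accounts\">"
      + "<source name=\"accounts\" translator-name=\"oracle\""
      + " connection-jndi-name=\"java:/accounts-ds\"/>"
      + "</model>"
      + "<model name=\"Prices\">"
      + "<source name=\"prices\" translator-name=\"file\""
      + " connection-jndi-name=\"java:/prices-file\"/>"
      + "</model>"
      + "</vdb>";

    /** Returns one line per source: "<model> -> <translator> via <JNDI name>". */
    static List<String> describeSources(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a readable VDB definition", e);
        }
        List<String> lines = new ArrayList<>();
        NodeList models = doc.getElementsByTagName("model");
        for (int i = 0; i < models.getLength(); i++) {
            Element model = (Element) models.item(i);
            NodeList sources = model.getElementsByTagName("source");
            for (int j = 0; j < sources.getLength(); j++) {
                Element source = (Element) sources.item(j);
                lines.add(model.getAttribute("name") + " -> "
                    + source.getAttribute("translator-name") + " via "
                    + source.getAttribute("connection-jndi-name"));
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        describeSources(SAMPLE_VDB).forEach(System.out::println);
    }
}
```

Note that the definition carries only names: which translator to use and where to look up the connection. Credentials and URLs for the physical sources live in the server-side connection configuration, not in the VDB file.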

VDBs can contain one or more models representing the information to be integrated and exposed to consuming applications. Models must be in a valid state in order for the VDB to be used for data access. Validation of a single model means that it must be in a self-consistent and complete state, meaning that there are no "missing pieces" and no references to non-existent entities. Validation of multiple models checks that all inter-model dependencies are present and resolvable. A VDB must always be in a complete state, meaning that all information is contained within the VDB itself - there are no external dependencies.

Deploying a VDB for Data Access
After a VDB is defined, it must be deployed to the Teiid runtime to be accessed.
• The VDB needs to be deployed to a Teiid Server. If there are no errors during deployment and the underlying data sources are configured correctly, the VDB will be accessible to your client application.

Accessing Multiple Sources Through a VDB
Once a VDB is deployed, it can be accessed through JDBC-SQL, SOAP (Web Services), SOAP-SQL, or XQuery.

VDBs, Translators and Resource Adapters
VDBs contain two primary varieties of model types - Source and View models. Source models represent the structure and characteristics of physical data sources, whereas view models represent the structure and characteristics of abstract structures you want to expose to your applications.
Source models must be associated with a Translator and a Resource Adapter. A Translator provides an abstraction layer between the Teiid Query Engine and the physical data source. It knows how to convert query commands issued by Teiid into source-specific commands and execute them using the Resource Adapter. It also knows how to convert the result data that comes back from the physical source into the form that the Teiid Query Engine expects.
A Resource Adapter provides the connectivity to the physical data source. It also provides a way to natively issue commands and gather results. A Resource Adapter can be an RDBMS data source, a Web service, a text file, a connection to a mainframe, etc. It is often a JCA connector.
You can define the configuration for Translators and Resource Adapters in Teiid Designer. Once defined, the Translator information, along with the JNDI name of the Resource Adapter, is stored with the VDB, so that when a VDB is exchanged, the existing settings can be used.
Typically, Resource Adapter configuration information contains user IDs, passwords, and URLs for the physical data sources. This information is not stored with the VDB. Designer creates these configurations automatically for development purposes; for the production environment, however, users need to migrate them or create new ones themselves using the provided tools, such as the Admin Console.

VDB Execution in Teiid Designer
VDBs can be tested in Teiid Designer by issuing SQL queries in the SQL Explorer perspective. In this way, you can iterate between defining your integration models and testing them out to see if they are yielding the expected results. Every source model in the VDB must have its Translator and Resource Adapter defined for the VDB to be executable.

VDB File Formats
VDBs are stored in an archive file format, similar to a standard Java JAR format. Dynamic VDBs are XML files. The schema for the XML file can be found in the Teiid documents.

Models
A model is a representation of a set of information constructs. A familiar model is the relational model, which defines tables composed of columns and containing records of data. Another familiar model is the XML model, which defines hierarchical data sets. In Teiid, models are used to define the entities, and relationships between those entities, required to fully define the integration of information sets so that they may be accessed in a uniform manner using a single API and access protocol.
Source models define the structural and data characteristics of the information contained in data sources. Teiid uses the information in source models to access the information in multiple sources, so that from a user's viewpoint these all appear to be in a single source.
In addition to source models, Teiid provides the ability to define a variety of view models. These can be used to define a layer of abstraction above the physical layer, so that information can be presented to end users and consuming applications in business terms rather than as it is physically stored. These business views can be in a variety of forms: relational, XML, or Web services. Views are defined using transformations between models.

Types of Models
Teiid Designer can be used to model a variety of classes of models. Each of these represents a conceptually different classification of models.
• Relational, which model data that can be represented in table form (columns and records). Relational models can represent structures found in relational databases, spreadsheets, text files, or simple Web services.
• XML, which model the basic structures of XML documents. These can be "backed" by XML Schemas. XML models represent nested structures, including recursive hierarchies.
• XML Schema, the W3C standard for formally defining the structure and constraints of XML documents, as well as the datatypes defining permissible values in XML documents.
• Web Services, which define Web service interfaces, operations, and operation input and output parameters (in the form of XML Schemas).
• Model Extensions, for defining property name/value extensions to other model classes.
VDBs contain two primary varieties of model types - source and view. Source models represent the structure and characteristics of physical data sources, whereas view models represent the structure and characteristics of abstract structures you want to expose to your applications.

Models and VDBs
Models used for data integration are packaged into a virtual database (VDB). The models must be in a complete and consistent state when used for data integration. That is, the VDB must contain all the models and all resources they depend upon.
Models contained within a VDB can be imported into the Teiid Designer. In this way, VDBs can be used as a way to exchange a set of related models.

Models and Translators, Resource Adapters
Source models must be configured with a Translator and a Resource Adapter before a VDB is tested in Designer or deployed for data access. Multiple models may use the same settings, but each model must define these configurations.

Model Validation
Models must be in a valid state in order to be used for data access. Validation of a single model means that it must be in a self-consistent and complete state, meaning that there are no "missing pieces" and no references to non-existent entities. Validation of multiple models checks that all inter-model dependencies are present and resolvable. Models must always be validated when they are deployed in a VDB for data access purposes.

Model Execution in Teiid Designer
Models can be tested in the Teiid Designer by issuing SQL queries in the SQL Explorer perspective. In this way, you can iterate between defining your integration models and testing them out to see if they are yielding the expected results.
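
The multi-model part of the validation rule described under Model Validation (every inter-model dependency must be present and resolvable inside the VDB) can be sketched in a few lines of Java. This is a deliberately simplified illustration, not how Teiid Designer implements validation: the Model class and the model names are hypothetical, and a real model carries far more than a list of references.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Sketch of a completeness check across the models packaged in one VDB. */
class ModelValidationSketch {

    /** Hypothetical, much-simplified model: a name plus the models it refers to. */
    static final class Model {
        final String name;
        final List<String> references;

        Model(String name, String... references) {
            this.name = name;
            this.references = Arrays.asList(references);
        }
    }

    /** Returns "<model> -> <missing model>" for every reference that cannot be resolved. */
    static List<String> unresolvedReferences(List<Model> vdbModels) {
        Set<String> present = new HashSet<>();
        for (Model model : vdbModels) {
            present.add(model.name);
        }
        List<String> problems = new ArrayList<>();
        for (Model model : vdbModels) {
            for (String reference : model.references) {
                if (!present.contains(reference)) {
                    problems.add(model.name + " -> " + reference);
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // CustomerView depends on Orders, which is not packaged in this VDB.
        List<Model> incomplete = Arrays.asList(
            new Model("Accounts"),
            new Model("CustomerView", "Accounts", "Orders"));
        System.out.println(unresolvedReferences(incomplete)); // prints [CustomerView -> Orders]
    }
}
```

An empty result corresponds to the "complete state" the text requires: everything the models refer to is contained in the VDB itself.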

Model Files
Models are stored in XML format, using the XMI syntax defined by the OMG. Model files should never be modified "by hand". While it is possible to do so, you may corrupt the file such that it cannot be used within the JBoss Enterprise Data Services Platform.

Dynamic VDBs and Models
The information in this article applies to VDBs that are built using the Teiid Designer. If you are building Dynamic VDBs, much of the information does not apply. Even Dynamic VDBs have models, but these only define the configuration for importing metadata and for Translators and Resource Adapters.

Translators and Resource Adapters
Translators
A Translator provides an abstraction layer between the Teiid Query Engine and the physical data source. It knows how to convert query commands issued by Teiid into source-specific commands and execute them using the Resource Adapter. It also knows how to convert the result data that comes back from the physical source into the form that the Teiid Query Engine expects.
Teiid provides various pre-built translators for sources like Oracle, DB2, SQL Server, MySQL, PostgreSQL, XML, File, etc.
A Translator also defines the capabilities of a particular source, such as whether it can natively support query joins (inner joins, cross joins, etc.) or criteria.
A Translator, along with its Resource Adapter, must always be configured on a Source Model. Cross-source queries issued against a VDB running in Teiid result in source queries being issued to translators, which interact with the physical data sources.
A Translator is defined by using one of the default pre-built ones, or you can override the default properties of the pre-built ones to define your own. The tooling will provide mechanisms to define override translators.
Check out the "Developer's Guide" on how to create a custom Translator that works with your Resource Adapter.
Resource Adapters
A Resource Adapter provides the connectivity to the physical data source. It also provides a way to natively issue commands to the source and gather results. A Resource Adapter can be an RDBMS data source, a Web service, a text file, a connection to a mainframe, or a connection to a custom source you have defined. It is often a JCA connector, although there is no restriction on how the connection semantics are provided to the Translator. If your source needs to participate in distributed XA transactions, however, it must be a JCA connector.
Besides providing transactions, JCA defines how to do configuration, packaging and deployment. It also provides a standard interaction model with the container, connection pools, etc. It can be used for more than just Teiid data integration purposes.
An instance of a resource adapter is created by defining a "-ds.xml" file in JBoss AS. This is the same operation that is used to create Data Sources in JBoss AS.
Check out the "Developer's Guide" on how to create a custom Resource Adapter.

Translator Capabilities
Translator capabilities define what processing each translator/source combination can perform. For example, most relational sources can process joins and unions, whereas when processing delimited text files these operations cannot be performed by the resource adapter or the "source" (in this case, the file system). Capabilities are used by the Teiid query engine to determine what subsets of the overall federated query plan can be pushed down to each source involved in the query.
Translator capabilities define the capabilities of a source in terms of language features (joins, criteria, functions, unions, sorts, etc.). In addition, the source model defined in a virtual database may specify additional constraints at the metadata level, such as whether a column can be used in an exact match or wildcard string match, whether tables and columns can be updated, etc. In combination, these features can be used to more narrowly constrain how users access a source.

Resource Adapters and Security
It is possible to use the security system of individual data sources if this is desired. When the resource adapter is a JCA connector, it can be configured with a separate "security-domain" in its "-ds.xml" file in JBoss AS. However, the calling thread needs to log in to the context before it uses Teiid.

Administering
In Teiid, Translators and Resource Adapters can be configured and monitored using the Teiid Console, or using the Teiid Server Administrative API.
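
The role of translator capabilities in query planning, as described under Translator Capabilities above, can be illustrated with a small Java sketch: given the language features a query needs and the features a source declares, the part the source supports is pushed down and the rest is left to the engine. This is a toy model under stated assumptions, not the Teiid planner; the Feature enum and the capability sets are hypothetical, and the real optimizer is cost-based and works on whole plan fragments rather than on flat feature sets.

```java
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Toy model of capability-based pushdown; not the actual Teiid planner. */
class PushdownSketch {

    /** Hypothetical language features a source may or may not support. */
    enum Feature { SELECT, CRITERIA, JOIN, UNION, ORDER_BY }

    /**
     * Splits the features a query needs into those the source can execute
     * ("source") and those the federating engine must perform itself ("engine").
     */
    static Map<String, Set<Feature>> plan(Set<Feature> required, Set<Feature> capabilities) {
        EnumSet<Feature> pushedDown = EnumSet.noneOf(Feature.class);
        pushedDown.addAll(required);
        pushedDown.retainAll(capabilities);

        EnumSet<Feature> inEngine = EnumSet.noneOf(Feature.class);
        inEngine.addAll(required);
        inEngine.removeAll(capabilities);

        Map<String, Set<Feature>> result = new LinkedHashMap<>();
        result.put("source", pushedDown);
        result.put("engine", inEngine);
        return result;
    }

    public static void main(String[] args) {
        Set<Feature> query = EnumSet.of(Feature.SELECT, Feature.CRITERIA, Feature.JOIN);

        // Assumed capability sets: a relational source supports everything,
        // a delimited text file can only return rows.
        Set<Feature> relational = EnumSet.allOf(Feature.class);
        Set<Feature> textFile = EnumSet.of(Feature.SELECT);

        System.out.println("relational: " + plan(query, relational));
        System.out.println("text file:  " + plan(query, textFile));
    }
}
```

For the text file, only the row retrieval reaches the source, and the filtering and joining fall to the engine, which matches the delimited-file example in the text.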

Data Services
A data service is a standards-based, uniform means of accessing information in a form useful to business applications. Since data is rarely in a form required by applications and services, and is often not even in a single data source, a key requirement for data services is that they abstract the data from its physical persistence structure, presenting it in a form that is closer to the needs of the using application. This effectively decouples consuming applications from the structure of the underlying data.
Hand-in-hand with abstraction, a federated query engine is required to execute the transformations defining the abstraction layers in an efficient manner, and to expose the abstracted structures through uniform and standard APIs.
The two key components of a data services architecture, then, are:
• Modeling environment, to define the abstraction layers - views and Web services
• Execution environment, to actualize the abstract structures from the underlying data, and expose them through standard APIs. A query engine is a required part of the execution environment, to optimally federate data from multiple disparate sources.
See SOAs and Data Services for more information on the role data services play in an SOA.

Technical and Business Viewpoints
Data services can be viewed from either a technology vantage point or a business viewpoint.
The Technology Viewpoint
Teiid provides a suite of projects that provide data services to business applications. That is, Teiid provides a means to access integrated data from multiple data sources, through your preferred standards-based API. Teiid provides access to federated information through JDBC (SQL or XQuery), ODBC (SQL or XQuery), and SOAP (Web services).
The Business Viewpoint
A more business- or user-centric view of data services is that they are information representations required by business applications.
From this perspective, data services are defined and designed by business analysts, modelers, and developers to represent the information structures required by business applications. Often, a key design goal is one of interoperability - the requirement that systems work together seamlessly, including when exchanging data. Teiid provides graphical and other tools for defining these interoperable data services, essentially relational and XML views that can be used by business applications in a semantically meaningful manner.
These two viewpoints roughly correspond to the Execution and Modeling components of a data services solution, respectively.

Data Services - An Essential Part of an SOA
Data services are a key part of a service-oriented architecture, or SOA. They provide the necessary interface to data for all business services.
• Expose all data through a single uniform interface
• Provide a single point of access to all business services in the system
• Expose data using the same paradigm as business services - as "data services"
• Expose legacy data sources as data services
• Provide a uniform means of exposing/accessing metadata
• Provide a searchable interface to data and metadata
• Expose data relationships and semantics
• Provide uniform access controls to information

Service-Oriented Architectures and Data Services
Service-oriented architectures are all the rage these days, and for good reason. The guiding principles of SOAs are based on lessons well-learned over the brief history of computing, most notably that of decoupling of system components. It is these same principles that motivate the use of data services in an SOA.

SOAs and Abstraction
Decoupling is the key concept in SOAs and is achieved through abstraction based on service interfaces. Business processes in an SOA represent a formalized, executable form of the actual enterprise's processes, but offer a layer of abstraction above the physical processes, be they automated or manual. Business processes are composed of business services. Just as business processes in an SOA represent an abstraction from their real-world counterparts, so do business services offer an abstraction of actual physical services. Decoupling through abstraction imbues SOAs with immense potential to model business operations independent of the IT infrastructure du jour.
SOAs, as their name makes clear, are architectures. These architectures, as we've seen, involve business processes composed of business services. Business processes and services both make use of business information, which is likely resident in many different types and instances of databases and files. This information can be exposed to business services using the same service-oriented paradigm - as data services.

Data Services
Just as business processes and services in an SOA represent abstractions - albeit executable ones - of their real-world counterparts, so too do data services represent an abstraction of underlying enterprise information. Data services expose information to business services in a form and through an interface amenable to those services.
The form is generally some representation of business objects to be manipulated by business services and passed between services by business processes. Business objects may be simple tabular structures or complex nested structures. Almost always, though, they must be composed from information residing in more than one data source, often in different persistence formats. So a key requirement of data services is that they:
• expose integrated information in one or more desired formats, even if the original data are in different formats.
The desired interface is dependent on the architecture being used. A Web service-based SOA will provide a SOAP or REST-based interface to XML-formatted business objects. A more traditional Java or C-language RPC-based architecture will require JDBC or ODBC access to tabular information, obtained from multiple data sources. So, a second key requirement of data services is that they:
• expose information through one or more consistent, standard interfaces, even if the original data are accessed through different interfaces.
These two key requirements of data services are achieved by two different technologies:
• modeling to define the required format of data, integrated from the underlying sources; and
• a query engine for processing these abstract definitions efficiently, exposing the integrated information through one or more interfaces.
Together these form the basis for a data services architecture underpinning a robust SOA, making data available to business processes and services in the required format and through consistent, standard interfaces.
