Data services in a Service Oriented
Architecture

 Syed M Shaaf
 Red Hat


 @sshaaf
 20th April 2012
Problem: Data Challenges

  Tremendous value in existing information assets, but...

  Time consuming and costly to implement new applications that leverage this information



Challenges
      Different physical structure
      Different terminology and meaning
      Different interfaces
      May need to federate/integrate
      May be “locked in” to database
      Must ensure performance
      Maintain/Improve security

[Figure labels: Data Gap; Data Warehouse; Operational Data Stores; Packaged Applications]
EDS – Open source solution



EDS is a data virtualization
 system that allows applications
 to use data from multiple,
 heterogeneous data stores.
What is EDS?
●   EDS is an open source solution for scalable
    information integration through a relational
    abstraction.
●   EDS focuses on:
    ●   Real-time integration performance
    ●   Full-featured integration via SQL/Procedures/XQuery
    ●   Providing JDBC access
●   EDS enables:
    ●   Data Services / SOA
    ●   Legacy / JPA integration
Turn the data you have into the
     information you need
Architecture
Query Plan

●   Parsing
●   Resolving
●   Validating
●   Rewriting
●   Logical plan optimization
●   Processing plan conversion
Query Plan
 SELECT e.title, e.lastname FROM Employees AS e JOIN
 Departments AS d ON e.dept_id = d.dept_id WHERE year(e.birthday) >= 1970
 AND d.dept_name = 'Engineering'
Optimizations




  Access patterns declare the criteria a
  source requires, and pushdown delegates
  query work to the source database
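
As a rough illustration of criteria pushdown, the sketch below takes the AND-ed predicates of the Employees/Departments query from the Query Plan slide and assigns each one to the single source it references, leaving cross-source predicates to the integration engine. The function `split_pushdown` and its string-matching approach are hypothetical and are not EDS code; a real planner works on a parsed plan and also checks whether the source supports each function (for example `year`).

```python
import re

def split_pushdown(conjuncts, aliases):
    """Assign each AND-ed predicate to the one alias it references.

    Predicates that touch more than one alias (such as join criteria)
    stay with the integration engine. Hypothetical helper, not an EDS API.
    """
    pushed = {alias: [] for alias in aliases}
    engine = []
    for predicate in conjuncts:
        # An alias counts as "used" when it appears as a qualifier: "e." or "d."
        used = {a for a in aliases
                if re.search(rf"\b{re.escape(a)}\.", predicate)}
        if len(used) == 1:
            pushed[used.pop()].append(predicate)
        else:
            engine.append(predicate)
    return pushed, engine

conjuncts = [
    "year(e.birthday) >= 1970",
    "d.dept_name = 'Engineering'",
    "e.dept_id = d.dept_id",
]
pushed, engine = split_pushdown(conjuncts, ["e", "d"])
```

Each source then only returns rows that already satisfy its own criteria, so less data crosses the network before the join.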
Optimizations




  Making use of dependent joins and
  optional joins
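
A dependent join uses the values produced by one side of the join to restrict the query sent to the other side. The sketch below is a minimal in-memory illustration, assuming plain Python lists stand in for the two sources; `dependent_join`, `fetch_employees`, and the sample rows are hypothetical names and data, not part of EDS.

```python
def dependent_join(independent_rows, key, fetch_dependent):
    """Join two sources, restricting the second by keys found in the first."""
    # Collect the distinct join keys from the independent (usually smaller) side.
    keys = sorted({row[key] for row in independent_rows})
    # Ask the dependent source only for rows with those keys.
    by_key = {}
    for row in fetch_dependent(keys):
        by_key.setdefault(row[key], []).append(row)
    return [
        {**left, **right}
        for left in independent_rows
        for right in by_key.get(left[key], [])
    ]

EMPLOYEES = [
    {"lastname": "Ito", "dept_id": 10},
    {"lastname": "Berg", "dept_id": 20},
    {"lastname": "Cruz", "dept_id": 10},
]
requested = []  # records the key lists sent to the dependent source

def fetch_employees(dept_ids):
    # Stands in for: SELECT ... FROM Employees WHERE dept_id IN (...)
    requested.append(list(dept_ids))
    wanted = set(dept_ids)
    return [e for e in EMPLOYEES if e["dept_id"] in wanted]

departments = [{"dept_id": 10, "dept_name": "Engineering"}]
joined = dependent_join(departments, "dept_id", fetch_employees)
```

Only the employees of department 10 are requested from the second source, rather than the whole Employees table.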
Cursors and batching



  By default, all results are cursored and
  delivered in batches. Tune the processor
  batch size, the connector batch size, or
  the fetch size via JDBC
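
The idea of delivering a cursored result in fixed-size batches can be sketched as follows. This is a generic illustration, assuming any Python iterable stands in for the cursor; `batches` and `batch_size` are hypothetical names and do not correspond to EDS configuration properties.

```python
from itertools import islice

def batches(rows, batch_size):
    """Yield successive lists of at most batch_size rows from a cursor-like iterable."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

result = list(batches(range(7), 3))
```

A larger batch size means fewer round trips but more memory held per batch, which is the trade-off the batch and fetch size settings control.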
Buffer Management




  Buffers are stored in memory and/or
  on disk
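
One way to picture buffers that live in memory and/or on disk is a store that keeps a bounded number of batches in memory and writes the least recently used ones to temporary files. The class below is a simplified sketch under that assumption; `SpillingBuffer` and `max_in_memory` are hypothetical names, and the actual buffer manager is considerably more sophisticated.

```python
import os
import pickle
import tempfile
from collections import OrderedDict

class SpillingBuffer:
    """Keep at most max_in_memory batches in memory; spill the rest to disk."""

    def __init__(self, max_in_memory):
        self.max_in_memory = max_in_memory
        self.memory = OrderedDict()  # batch_id -> batch, least recently used first
        self.on_disk = {}            # batch_id -> path of the spilled file
        self.directory = tempfile.mkdtemp()

    def put(self, batch_id, batch):
        self.memory[batch_id] = batch
        self.memory.move_to_end(batch_id)
        # Passivate the least recently used batches once the limit is exceeded.
        while len(self.memory) > self.max_in_memory:
            old_id, old_batch = self.memory.popitem(last=False)
            path = os.path.join(self.directory, f"{old_id}.bin")
            with open(path, "wb") as handle:
                pickle.dump(old_batch, handle)
            self.on_disk[old_id] = path

    def get(self, batch_id):
        if batch_id in self.memory:
            self.memory.move_to_end(batch_id)
            return self.memory[batch_id]
        # Load a spilled batch back into memory, possibly evicting another one.
        path = self.on_disk.pop(batch_id)
        with open(path, "rb") as handle:
            batch = pickle.load(handle)
        os.remove(path)
        self.put(batch_id, batch)
        return batch

buffer = SpillingBuffer(max_in_memory=2)
for batch_id in range(3):
    buffer.put(batch_id, [batch_id] * 2)
spilled_before = sorted(buffer.on_disk)
restored = buffer.get(0)
spilled_after = sorted(buffer.on_disk)
```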
Processing



 By default, joins are performed as
 merge-sort joins, and the sorting
 algorithm is a multi-pass merge sort
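
The general shape of a merge-sort join is sketched below: sort both inputs on the join key, then advance through them in step. This is a textbook version for illustration only; it sorts in memory with Python's built-in `sorted` rather than a multi-pass merge sort, and `merge_join` is a hypothetical name rather than an EDS function.

```python
def merge_join(left, right, key):
    """Inner-join two row lists on key(row) by sorting, then merging."""
    left = sorted(left, key=key)
    right = sorted(right, key=key)
    joined = []
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, right_key = key(left[i]), key(right[j])
        if left_key < right_key:
            i += 1
        elif left_key > right_key:
            j += 1
        else:
            # Find the run of right rows that share this key.
            j_end = j
            while j_end < len(right) and key(right[j_end]) == left_key:
                j_end += 1
            # Pair every left row with that key against the whole run.
            while i < len(left) and key(left[i]) == left_key:
                for right_row in right[j:j_end]:
                    joined.append((left[i], right_row))
                i += 1
            j = j_end
    return joined

employees = [(2, "b"), (1, "a"), (2, "c")]
departments = [(2, "x"), (3, "y"), (1, "z")]
pairs = merge_join(employees, departments, key=lambda row: row[0])
```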
Handling Load
●   Memory Usage – the BufferManager acts as a memory
    manager for batches (with passivation) to ensure that
    memory will not be exhausted.
●   Non-blocking source queries – rather than waiting for
    source query results, processor threads detach from the
    plan and pick up a plan that has work.
●   Time slicing – plans produce batches for a time slice
    before re-queuing and allowing their thread to do other
    work (preemptive control only between batches)
●   Caching – ResultSets, processing plans, internal
    materialized views, etc.
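
The time-slicing bullet above can be pictured as cooperative scheduling: a plan runs for one slice, hands back control between batches, and is re-queued if it still has work. The sketch below uses a fixed number of batches per slice as a stand-in for a time limit so that the result is deterministic; `plan`, `run`, and `batches_per_slice` are hypothetical names, not EDS APIs.

```python
from collections import deque

def plan(name, batch_count):
    """A stand-in for a processing plan that yields one batch at a time."""
    for n in range(batch_count):
        yield f"{name}-batch{n}"

def run(plans, batches_per_slice):
    """Run plans round-robin; control changes hands only between batches."""
    queue = deque(plans)
    produced = []
    while queue:
        current = queue.popleft()
        for _ in range(batches_per_slice):
            try:
                produced.append(next(current))
            except StopIteration:
                break  # plan finished, do not re-queue it
        else:
            queue.append(current)  # slice used up, let other plans run
    return produced

order = run([plan("A", 3), plan("B", 1)], batches_per_slice=2)
```

Plan B gets its turn after A's first slice instead of waiting for A to finish, which is the point of re-queuing.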
More on Caching

●   See the caching guide and
    http://community.jboss.org/wiki/AHowToGuideForMaterializationcachingViewsInTeiid

●   Admins can primarily control prepared plan and result
    set caching. Procedure plans are also automatically
    cached in the plan cache.
●   Scoping of cache entries is determined automatically
●   Internal materialization leverages EDS temp tables,
    which are in turn backed by the buffer manager.
●   Canonical value caching is dynamically used to cut
    down on the memory profile – can be disabled.
●   Internal caching of metadata at various levels.
Transactions
●   Three scopes
    ●   Global (through XAResource)
    ●   Local (autocommit = false)
    ●   Command (autocommit = true)
●   All scopes are handled by JBoss Transactions
    JTA
●   Command scope behavior is handled through
    txnAutoWrap={ON|OFF|DETECT}
●   Isolation level is set on a per connector basis.
Demo
Where to find it

●   http://access.redhat.com
