JBoss Enterprise Data Services (Data Virtualization)
1. 1
JBoss Enterprise Data Services
Peter Larsen
JBoss Solutions Architect, Red Hat Inc.
plarsen@redhat.com
2. 2
Agenda
• Motivation for Data Services
• EDS in the community / History
• Positioning Data Services in an Enterprise Architecture
• Use Case Domains / Customer Examples
• Technical Architecture
• Demo
3. 3
The Business Inflexibility Trap
Inflexibility: the essential business problem.
With agility, all problems are solvable.
“With enough eyes, all bugs are shallow”
4. 4
Agility is Key
Because change is the only constant
Technologies
Requirements
Regulations
Standards
Models
Processes
???
What is the single most important problem preventing Agility?
5. 5
Problem: Data Challenges
Challenges
Different physical structure
Different terminology and meaning
Different interfaces
May need to federate/integrate
May be “locked in” to database
Must ensure performance
Maintain/Improve security
Tremendous value in existing information assets, but...
Time consuming and costly to implement new applications
that leverage this information
Diagram: data warehouses, packaged applications, and operational data stores sit across a “data gap” from the new applications that need them.
6. 6
Problem: Data Challenges – Alternatives
Hard Code:
Time consuming – difficult/costly
No re-use of data logic
Any changes break the application
Replicate/Data Mart:
Data not fresh
Costly – additional licenses
More copies of data = more silos
Governance/security
7. 7
Solution: JBoss Enterprise Data Services
JBoss Enterprise
Data Services
Diagram: multiple data services exposed to consumers over SQL and Web Services.
● Access to multiple data stores in real time
● Standards-based read/write access
● Speeds application development
● Transform data structure and semantics
● Consolidates data into a “single view”
● Centralized access control
Enterprise-proven – flexible, scalable,
high-performance
Turns the Data You Have Into the Information You Need
8. 8
Data Services Platform – Where it fits
JBoss Enterprise Data Services Platform
Diagram: data services feeding portal, ESB, and SOA platforms from JBoss or other vendors.
10. 11
Data Services Platform Common Use Cases
Service Oriented Architecture
– Federate/transform data for efficient use by higher-level services
– Insulate business processes from data access details
Business Intelligence, Operational Analytics, Reporting
– Consolidated financial reports/dashboards/KPIs
– Virtual data marts
Information Consolidation, Reference Data Management
– Single/360 view of Customer
– Single/360 view of Supplier
– Single/360 view of Employee
Regulatory Compliance
– Provide common security, central access and auditing of data
– VISA PCI, Sarbanes-Oxley, Basel II, HIPAA
11. 12
JDBC/ODBC
Query Engine
Data Virtualization, Federation
JBoss Enterprise Data Services
JBoss
ModeShape
Repository Services
JBoss jBPM JBoss Rules
JBoss Enterprise SOA Platform
JBossESB
JBoss Enterprise Application Platform
Red Hat Enterprise Linux
Windows, UNIX, other Linux
Turns the data you have into the
information you want
Augments and extends SOA Platform to
address data access, integration and
abstraction.
• SOA Patterns, best practices
• Reporting/Analytics enablement
• Information Consolidation, Data Mgmt
• Data Governance, Compliance
Real-time read/write access to
heterogeneous data stores
Speeds application development by
simplifying access to distributed data
Centralized access control, auditing
JBoss Enterprise Data Services
14. 15
Query Performance & Optimization
• Minimal overhead for simpler requests
• Control
– enforce mandatory criteria with certain requests
– enforce time and size limitations on requests
• Rule-based optimization
– use criteria to avoid unnecessary fields and records
– removal of unnecessary joins across data sources
– merge all transformation logic for a single source
• Cost-based optimization
– join algorithms (nested loop, merge, dependent, hash)
– cost profile of each data source
• Data caching and staging (materialized views)
• Manage dataflow – buffer management
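The payoff of rule-based optimization can be sketched in a few lines. This is an illustrative toy (hypothetical function names, not the engine's planner): pushing criteria and field projection down into the source query means only matching records cross the wire, instead of filtering centrally after a full fetch.

```python
# Toy comparison of a naive plan vs. criteria pushdown. The "source" is a
# plain list; in a real federation engine the pushed-down plan would become
# a source-specific query (e.g. SQL WHERE clause plus column projection).

def naive_plan(source_rows, predicate, fields):
    """Fetch everything, then filter and project centrally."""
    fetched = list(source_rows)                      # full table shipped
    result = [{f: r[f] for f in fields} for r in fetched if predicate(r)]
    return result, len(fetched)

def pushed_down_plan(source_rows, predicate, fields):
    """Source applies criteria and projection; only matches are shipped."""
    shipped = [{f: r[f] for f in fields} for r in source_rows if predicate(r)]
    return shipped, len(shipped)

orders = [{"id": i, "region": "EU" if i % 4 else "US", "total": i * 10}
          for i in range(1000)]
us_only = lambda r: r["region"] == "US"

naive_result, naive_shipped = naive_plan(orders, us_only, ["id", "total"])
pushed_result, pushed_shipped = pushed_down_plan(orders, us_only, ["id", "total"])

assert naive_result == pushed_result       # identical answer
print(naive_shipped, pushed_shipped)       # → 1000 250
```

The same idea underlies removing unnecessary joins: a branch of the plan that contributes no requested fields or criteria never needs to execute at all.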
15. 16
Designer Tooling
Virtual Models
Physical Models representing
actual data sources
• Shows structural
transformations
• Defines
transformations
with
– Selects
– Joins
– Criteria
– Functions
– Unions
– User Defined
16. 18
Semantic Mediation/Integration
Diagram: multiple internal/external information sources (Claims, Billing, Policies, …) pass through transformations; authoritative sources are mapped to a logical view, and applications see relational, XML, and Java views of the information via Web Services, workflow/ESB, and business applications. Source field names such as bldg_id, SITENUM, Facility_ID, Location_ID, bldg_type, Depot_Number, and Location_Type are reconciled across systems.
Semantic Data Services
Data Dictionary:
• Based on logical data model or XML schema
• Support for multiple COIs
• Support for multiple versions
17. 19
Data Services and the ESB
User-facing Logic
(Service
Consumers)
Business Logic
Data Logic
Process and Other
Integration Logic
Rich or Thin
Desktop
Process, Integration Services
Business Services
ESB
Direct
ODBC,
JDBC
ODBC, JDBC
WSDL, SOAP, MOM, other
WSDL, SOAP, MOM, other
Process Orchestration Services
Data Services
18. 20
JBoss Enterprise SOA Platform
• Enables Business Process Automation
by integrating and orchestrating
application components and services
running on JBoss Enterprise Middleware
and/or any other standards-based AS
• Single distribution that integrates JBoss
ESB, jBPM, JBoss Rules, and the Enterprise
Application Platform
• Enables multiple integration styles: SOA
integration, EAI, EDA, process and business
rules technologies to automate business
processes to improve business productivity
• Certified Platform for Service Integration
and Orchestration
• Simple, Flexible, and Scalable
• Light footprint, simple installation
• JBoss ON platform management and
services monitoring
• Scalable clustering to support high
transaction volumes
A flexible, standards-based platform
to integrate applications, SOA services,
business events and
automate business processes.
Red Hat Enterprise Linux
Windows, UNIX, other Linux
Workflow Rules
JBoss Enterprise SOA Platform
JBossESB
Transformation, Routing, Registry
JBoss Enterprise Application Platform
Container services, Hibernate, Web Services stack, Seam, Clustering, Cache,
Messaging, Transactions
19. 22
Enterprise Data Services 5.2
EDS 5.2 – Released December 2011
Tighter data services/ESB tooling integration
Performance tweaks
LOB handling
Cost-based optimizer enhancements
Programmatic view creation
Repository enhancements
Versioning support
More artifact types
Cloud-based data sources
Fixes, minor enhancements, additional platform certs
20. 23
Business Value of Enterprise Data Services
✔ Greater agility, faster
time to solution
✔ Increased ROA
✔ Improved
organizational
performance
✔ Better control of
information
Improved utilization of data assets
Derive more value from existing investments
Complements existing systems
Jumpstart Your SOA Initiatives!
Better/faster than hand coding
Faster, less costly than data replication
Data virtualization provides loose coupling
The right data at the right time to the right people
Decision support, BI with a complete view of
information across the enterprise
Powerful security, Auditing, Data Firewall
Avoid data silo proliferation
Central data access and policy, Compliance
21. 24
A Comprehensive Middleware Portfolio
JBoss Enterprise
Data Services Platform
JBoss Enterprise
SOA Platform
JBoss Enterprise Application Platform
JBoss Enterprise Web Platform
JBoss Enterprise Web Server
Red Hat Enterprise Messaging
JBoss Enterprise Portal Platform
JBoss Enterprise
Business Rules
Management System
JBoss Developer
Studio
Seam
Hibernate
Web Framework
Kit
JBoss
Operations
Network
Red Hat Services
Cloud Strategy & Selection, Cloud Implementation, Cloud Governance
Private: VMWare, Microsoft Hyper-V, Red Hat Enterprise Virtualization
Public: Amazon EC2, Other
RHEL, Unix, Windows
22. 25
Where did Teiid come from?
• Project lineage is from MetaMatrix starting in ~1999.
– Teiid - http://www.jboss.org/teiid
– Teiid Designer -
http://www.jboss.org/teiiddesigner
– DNA - http://www.jboss.org/dna/
• MetaMatrix was the leader in Enterprise Information
Integration (EII) – hence Teiid.
• Red Hat acquired MetaMatrix in 2007.
• Last major MetaMatrix product release, 5.5.4 – 11/09
23. 26
Project Status (March 2011)
• Open source 2/2009 – heavily refactored from 5.5
line
• 7.0 Initial release 6/2010
• 7.1 Teiid / Teiid Designer release 8/2010
– Basis for EDS 5.1 release – with hundreds of
issues resolved and targeted enhancements
• 7.4 Coming soon! More source integration (MDX via
XMLA, Ingres), expanded function support, etc.
– should be picked up, along with the 7.1–7.3 work, in
the next service pack release.
24. 27
Community Version
• Community web site: www.jboss.org/teiid
• Teiid sub-projects: Teiid Runtime, Teiid Designer
• Teiid 7 is built for AS7
27. 30
The context
Organizations
Significant assets already deployed or otherwise in use
Applications, databases, services, spreadsheets, file
extracts, manual processes, tribal knowledge
Not realizing full benefit
Mandate
Remove business impediments, improve status quo
Control/reduce costs
– Derive greater value from the assets you already have
29. 32
Common Challenges
Data
Data sprawl
Tied up in silos
Not reconciled/integrated
Not easily usable
Decision making
Inflexible systems
Manual processes
30. 33
Common Challenges
Data
Decision making
Insufficiently informed
Missing key information
Stale or out-of-context information
Inflexible systems
Manual processes
31. 34
Common Challenges
Data
Decision making
Inflexible systems
Logic hard-coded into applications
Redundant logic, not standardized or shared
Changes require development cycle, resources, time
Unable to react quickly to business, market, IT changes
Manual processes
32. 35
Common Challenges
Data
Decision making
Inflexible systems
Manual processes
Business processes are manual
Data entry, swivel-chair integration
Overly dependent on individuals
Inconsistent, prone to error, difficult to govern
33. 36
Common Challenges
Data
Decision making
Inflexible systems
Manual processes
But...
These data, systems, applications, decision-making processes,
business processes and logic are your current assets – waiting to
be improved and put to better, more effective use.
How?
34. 37
Solution Patterns
1. Pattern: Data Foundation
2. Pattern: Information Delivery
3. Pattern: Externalize Knowledge
4. Pattern: Automate Decision Making
5. Pattern: Codify Business Processes
35. 38
Solution Patterns: Data Foundation
● Liberate, integrate, mediate, transform data
● Tap silos, gain control over data sprawl
● Create foundation data layer through data virtualization
Diagram: existing sources and silos of data (databases, warehouses, spreadsheets, services, XML documents, files, applications, …) are virtualized into an integrated set of canonical data objects such as CRM, Employee, SupplyChain, and Logistics.
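The Data Foundation pattern above can be sketched concretely. This is an illustrative toy (invented tables and names, two in-memory SQLite databases standing in for a CRM silo and a billing silo): one canonical view federates both sources in real time, without copying data into a new store.

```python
# Two "silos" with different physical structure and terminology, exposed
# through one canonical customer view. No replication: each call queries
# the live sources.
import sqlite3

crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE cust (cust_id INTEGER, cust_name TEXT)")
crm.executemany("INSERT INTO cust VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])

billing = sqlite3.connect(":memory:")
billing.execute("CREATE TABLE inv (CUSTNUM INTEGER, amt REAL)")  # different terminology
billing.executemany("INSERT INTO inv VALUES (?, ?)", [(1, 250.0), (1, 99.5), (2, 10.0)])

def customer_view():
    """Canonical 'single view': federate and reconcile both sources."""
    rows = []
    for cid, name in crm.execute("SELECT cust_id, cust_name FROM cust ORDER BY cust_id"):
        (total,) = billing.execute(
            "SELECT COALESCE(SUM(amt), 0) FROM inv WHERE CUSTNUM = ?", (cid,)
        ).fetchone()
        rows.append({"customer_id": cid, "name": name, "total_billed": total})
    return rows

print(customer_view())
# → [{'customer_id': 1, 'name': 'Acme', 'total_billed': 349.5},
#    {'customer_id': 2, 'name': 'Globex', 'total_billed': 10.0}]
```

Because consumers only see the canonical shape, either underlying source can be restructured or swapped without breaking them.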
36. 39
Solution Patterns: Information Delivery
• Provide consistent information in the form required by different information
consuming applications, processes, services.
• Ensure complete information through all delivery modes/formats.
Forms:
Relational Tables/Views
Star schema
Procedures
Schema-compliant XML
Access Modes:
JDBC, ODBC
SOAP Web Services
POJO
XML over HTTP, JMS
Diagram: canonical data objects (CRM, Employee, SupplyChain, Logistics) are delivered under WSDL contracts – via O/R mapping, JDBC/ODBC, and SOAP/JMS – to custom apps, business processes, packaged apps, reports/dashboards, and data warehouses.
37. 40
Solution Patterns: Externalize Knowledge
• Externalize key business logic from application code
• Isolate and standardize rules that govern business decisions and
operations
• Enable business analysts and development to collaborate in defining
functional behavior
Rule sets possibilities:
Pricing
Fraud detection
Regulatory compliance
Productivity/Efficiency
Control systems
Product configuration
...
Insurance
Rules:
Age
Sex
Health
Occupation
= $ Price
38. 41
Solution Patterns: Automate Decision Making
• Move beyond reports to active analysis and decision making
• Extend rule sets to analyze information provided through earlier patterns.
• Process information on a scheduled basis or dynamically as data flows
through applications and over the bus.
• Raise alerts, initiate corrective actions, seize opportunities
39. 42
Solution Patterns: Codify Business Processes
• Codify the processes actually followed by your organization
• Create standardized, reusable workflows/orchestrations
• Eliminate unnecessary manual steps, keep human tasks only where
appropriate.
• Identify common business patterns – both standard “normal” processes
and exception remediation processes
• Extend automated decision making with business processes and vice
versa
40. 43
Solution Patterns
1. Pattern: Data Foundation
2. Pattern: Information Delivery
3. Pattern: Externalize Knowledge
4. Pattern: Automate Decision Making
5. Pattern: Codify Business Processes
41. 44
How technologies map to patterns
JDBC/ODBC
Data Virtualization
Data Access, Federation
JBoss Enterprise Data Services
Metadata
Repository
Repository Services
Workflow Rules
JBossESB
Transformation, Routing, Registry
JBoss Enterprise Application Platform
Container services, Hibernate, Web Services stack, Seam, Clustering,
Cache, Messaging, Transactions
Red Hat Enterprise Linux
Windows, UNIX, other Linux
JBoss Enterprise SOA Platform
1. Data Foundation
2. Information Delivery
3. Externalize Knowledge
4. Automate Decision Making
5. Codify Business Processes
47. 50
Architecture
• Socket transport and query engine have separate work
queues and thread pools
• Deep integration with JBoss AS
– MC, Profile Service, JCA, JTA, Web Services (consume and
produce), JAAS, standard logging
48. 51
Teiid Connector Architecture
• Teiid splits connectivity concerns into:
– Data Sources – standard JCA based pooled resources
configured on the server
– Translators – a Teiid specific CCI (common client interface)
that accesses a particular Data Source and is configured as
part of the VDB
• Extended metadata from the translator directs the optimizer's
source query formation.
• In addition to out of the box offerings, our JDBC translator is
easily extended.
• Can be thought of as a JDBC/ODBC toolkit since the end
result is consumable through JDBC/ODBC
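The Data Source / Translator split described above can be sketched conceptually. This is not Teiid's actual SPI — the class and method names here are invented for illustration — but it shows the separation of concerns: the data source owns connectivity (in the real system, a JCA-managed pool configured on the server), while the translator owns source-specific query formation and is configured per VDB.

```python
# Illustrative split of connectivity concerns (hypothetical names):
# DataSource = pooled connection to one system; Translator = the logic
# that turns an engine request into a source-specific access.

class DataSource:
    """Stands in for a server-configured, pooled connection resource."""
    def __init__(self, tables):
        self.tables = tables
    def fetch(self, table):
        return self.tables[table]

class DictTranslator:
    """Source-specific access logic: forms the 'query' for this source."""
    def __init__(self, source):
        self.source = source
    def execute(self, table, columns):
        # "Query formation": project only the requested columns.
        return [tuple(row[c] for c in columns) for row in self.source.fetch(table)]

ds = DataSource({"emp": [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Lin"}]})
tx = DictTranslator(ds)
assert tx.execute("emp", ["name"]) == [("Ada",), ("Lin",)]
```

Swapping sources then means swapping translators, while everything above this layer keeps issuing the same logical queries.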
49. 52
Teiid Clustering
• Clustering is enabled in the SOA production/all profile
• Teiid does not require clustering, but will use it when available
• Clients will re-authenticate as needed in load-balancing/failover
scenarios
– The default strategy for determining cluster members is
just the URL.
• Deployments and jar updates need to happen on all nodes.
Farming should help with this.
• The result set cache and internal materialized views can be
replicated.
50. 53
Other Extension Points
• Logging (Log4j), specific contexts for audit and commands
• Configurable security domains for admin/query access
– Can utilize any container supported LoginModule
• User defined functions – both source specific and for
source/runtime execution via a Java method
• Groovy scripting through AdminShell
• Client discovery of Teiid instances
• Customizable WARs generated for Web Service access
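The user-defined-function extension point can be illustrated with SQLite's real `create_function` API standing in for registering a Java method in Teiid: a custom scalar function becomes callable directly from SQL.

```python
# A custom scalar function registered with the engine and invoked from SQL.
# (sqlite3 here is only a stand-in for the UDF concept the slide describes.)
import sqlite3

def mask_card(number):
    """Custom function: keep only the last four digits."""
    return "****" + number[-4:]

conn = sqlite3.connect(":memory:")
conn.create_function("mask_card", 1, mask_card)  # name, arg count, callable
(masked,) = conn.execute("SELECT mask_card('4111111111111111')").fetchone()
assert masked == "****1111"
```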
53. 56
Credit Suisse: Derivatives Trading Dashboard
Challenge
● Monitor derivatives security trades to prevent rogue
trades and financial loss
● Trading data spread across many databases/systems
Solution
● Consolidate all trading data into “single view”
● Real-time access
● Transformation of data differences
Business Benefit
● Prevent financial loss, lower risk
● Saved time and cost to develop
● Easier to manage data changes
Data Services Platform
Dashboard
Data Sources
Data Service
One of many projects – part of “data layer”
54. 57
Smith Barney: Unified Customer View
Challenge
Branch Managers’ account notes in two very
different applications: centralized DB2 on a
mainframe, and distributed SQL Server (600
servers)
Cannot access extended account information
from other offices. Cannot manage “customer”
only individual accounts.
Two years behind schedule in making all notes
avail in one application
Solution
Enable CRM application to easily find customer
information across all databases
Real-time access
Business Benefit
Better management of “customer”, improved
customer service
Data Services Platform
Brokerage
CRM
Application
600 MS SQL DBs -
geographically distributed
Data Service
Single view of customer – key component in new data architecture
55. 58
Large Bank: Data Security/Governance
Challenge
VISA PCI mandates protection of card holder info
Difficult to maintain common security policy across
multiple data stores
Solution
Create “data firewall” across many data sources
Federate rather than replicate
Common access policy and common data definitions
across sources
Audit trail
Business Benefit
Single, central set of data security policies
Prove to auditors and regulators that data protection
requirements are being met.
Data Services Platform
WebFocus Portal
Data Sources
Data Service
“Data Firewall” to protect and govern use of data
56. 59
DISA GCSS-J: Unified Logistics Portal
Challenge
Combatant commanders need timely logistics
Data spread across many databases/systems;
each system owned & managed by different agency
Solution
Provide a single capability to monitor and manage
personnel, equipment, and supplies – across all
databases
Real-time access
Networked environment allows DoD users to access
shared data & applications regardless of location
Business Benefit
Single portal for integrated logistics
Isolation/abstraction from “silos”
Easier to manage units, personnel, equipment
Data Services Platform
Multiple
Logistics
Tracking
Applications
Data Sources
Data Service
Consolidated Logistics Information – Deployed in Theater
59. 62
Global Insurer: SOA Data Services Layer
Challenge
Deploying SOA reference architecture
Want common data model across sources
Don't want tightly bound data sources
Need to change sources without breaking applications
Solution
Data is accessed via data services
DSP provides federation and consistent logical data
model
Data model exposed through Web Services and SQL
Business Benefit
All applications get the same data through use of
common model
Easier to consume data with new applications
Easier to change/add data sources to architecture
Data Services Platform
Applications
Data Sources
Data Service
Service-enabled, consistent data model for SOA
Data Service
Common Data Model
SOA Platform
60. 63
DISA ADNET: Anti-Drug Network
Challenge
Counter-narcotics and counter-narcoterrorism
Statutory detection and monitoring
Data is heterogeneous & on multiple systems
Solution
MetaMatrix provides an abstracted view across
multiple State/Local Law enforcement agencies.
The virtual Database enables BI tools to get a
complete picture of a "person of interest" from any
history, warrants, jail, crimes, vehicles, etc...
Also, MM is used as a federated search layer looking
for possible persons of interest given general details
(cars, addresses, license, aliases, etc...)
Benefit
Enable ADNET to deliver on its mission
Data Services Platform
BI tools, Portal,
Federated Search
Data Service
Disparate, heterogeneous State/Local databases
61. 64
HQ/Langley: MDM and SOA Enablement
Challenge
Need to find Person of Interest among disparate
systems
Adherence/mapping to common schema
Data integration for SOA enablement
Solution
Created an abstracted view of an Enterprise Schema
focused on Master Data Entities (Domains) like
Person, Organizations, etc.
Provide data services layer of the SOA stack, feeding
the ESB
ESB facilitates sync/async capabilities and provides
integrated enterprise data efficiently and rapidly to
multiple consumers
Benefit
Simplified data access and decoupled services and
apps from the underlying complex data infrastructure
Single view of data enables migration of external
sources into the Enterprise repository seamlessly and
without application impact
Data Services Platform
Portal, ESB,
Federated Search
Data Service
Disparate, heterogeneous data sources with varying
schemas/representations
62. 65
Intelligence Agency: Signal Analysis
Portal
Challenge
Intelligence analysts have to navigate multiple
systems to try to assess SIGINT
Underlying data consists of both managed data
assets and live feeds
Security is mission-critical
Solution
Provide a single capability to monitor and analyze
signal intelligence data across databases
Metadata repository allows for metadata-aware/driven
application
Business Benefit
Single portal for analysis – end of swivel-chair
integration
Federated data also put on DCGS ESB
Data Services Platform
Metadata-driven
Analyst Portal
Data Service
Consolidated SIGINT Data - Deployed
Over a dozen unique
geospatial DBs – mix of live
& managed data
63. 66
Intel Architecture – Data Services and the
ESB
Diagram: the DCGS-A services network, built around an enterprise service bus. Infrastructure services (metadata discovery, publishing, and catalog; alert subscription; workflow engine; service management; NCES service discovery; transformation engine) link intelligence services – SIGINT gateway, ISR data listener, HUMINT data services (HDWS/CHAMS), event assessment, weather effect (IMETS/IWEDA), E-Space services, and BC gateway/force tracking (MIP Blue Force Tracking) – exchanging metadata, metadata searches, alerts/events, alert criteria, filters, async “callbacks”, EW data, weather effects, and map/coverage data with consumers such as Google Earth, rich clients, and handhelds.
66. 69
ResultSet Caching
• Caching of user query results.
• Scoping of results is automatically determined to be either
VDB (replicated) or session level.
• Configurable number of cache entries and time to live.
• Caching of XML document model results.
• Administrative clearing.
67. 70
CodeTable Caching
• Short cut to creating an internal materialized view table via the
lookup function
• Way to get a value out of a table when a key value is
provided.
• Example: lookup('ISOCountryCodes', 'CountryName',
'CountryCode', 'US')
• Limitations (why use Materialized Views):
– No option to use the lookup function and not perform
caching.
– No mechanism is provided to refresh code tables.
68. 71
Materialized Views
• Transformations are pre-computed and stored just like a
regular table
• When queries are issued against the views, the cached
results are used
• Improve Performance/Cost of accessing all the underlying
data sources and re-computing the view transforms each time
a query is executed
• Supports no-cache queries (fresh data – full or partial)
– SELECT * FROM vg1, vg2, vg3 WHERE ... OPTION
NOCACHE
• Internal materialization creates Teiid temporary tables to hold
the materialized table
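The mechanics above can be sketched as a toy (illustrative class, not the engine's implementation): the view transform is computed once and stored like a regular table, queries are served from the stored copy, and a fresh-data request — analogous to OPTION NOCACHE — forces recomputation against the sources.

```python
# Toy internal materialization: precompute the view transform, serve
# queries from the stored copy, recompute only on a no-cache request.

class MaterializedView:
    def __init__(self, transform):
        self.transform = transform          # expensive view computation
        self.table = None                   # materialized copy
        self.computations = 0

    def query(self, nocache=False):
        if nocache or self.table is None:
            self.table = self.transform()   # recompute against sources
            self.computations += 1
        return self.table

mv = MaterializedView(lambda: sorted({"b": 2, "a": 1}.items()))
assert mv.query() == [("a", 1), ("b", 2)]
assert mv.query() == [("a", 1), ("b", 2)]   # served from the stored copy
assert mv.computations == 1
mv.query(nocache=True)                      # fresh-data request recomputes
assert mv.computations == 2
```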
69. 72
When to Use Materialized Views?
• Underlying data does not change rapidly
• It is acceptable to retrieve data that is "stale" within some
period of time
• Access staged data rather than placing additional query
load on operational sources.
70. 73
Cache Hints – How They Are Used
• Indicate that a user query is eligible for result set caching
• Set the result set query cache entry memory preference or
time to live
• Set the materialized view memory preference, time to live, or
updatability
/*+ cache[([pref_mem] [ttl:n] [updatable])] */
– pref_mem - if present indicates that the cached results
should prefer to remain in memory
– ttl:n - if present, n is the time-to-live in milliseconds
– updatable - if present indicates that the cached results can
be updated
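The ttl behavior a cache hint requests can be sketched as a toy cache (hypothetical class, not the engine's result-set cache): an entry is served until its time-to-live elapses, after which the query is recomputed.

```python
# Toy result-set cache keyed by query text, honoring a per-entry ttl.
import time

class TtlResultCache:
    def __init__(self):
        self.entries = {}                    # sql -> (expiry, rows)

    def get(self, sql, compute, ttl_ms):
        now = time.monotonic()
        hit = self.entries.get(sql)
        if hit and hit[0] > now:
            return hit[1], True              # within ttl: cache hit
        rows = compute()                     # expired or absent: recompute
        self.entries[sql] = (now + ttl_ms / 1000.0, rows)
        return rows, False

cache = TtlResultCache()
rows, hit = cache.get("SELECT 1", lambda: [1], ttl_ms=50)
assert not hit                               # first execution: a miss
rows, hit = cache.get("SELECT 1", lambda: [1], ttl_ms=50)
assert hit                                   # within ttl: served from cache
time.sleep(0.06)
rows, hit = cache.get("SELECT 1", lambda: [1], ttl_ms=50)
assert not hit                               # ttl elapsed: recomputed
```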
71. 74
Why JBoss Enterprise Data Services
Platform
Data Virtualization Technology
Real-time integration of diverse data, federation
Break down existing data silos, avoid creating new ones
Decouple applications from data stores through data services
Maintain control and security of information
Value
Maximize ROA - Return on Assets - get the most out of your existing
information and data stores.
Faster route to deployment, rapid prototyping, little/no coding
Savings in long-tail maintenance costs
Leverage skills/knowledge widely available in the industry (SQL/Eclipse)
Open source community
Available through JBoss subscription, includes JBoss SOA Platform
72. 75
Why Data Services Platform – cont'd
Flexibility
Support for standards like JDBC, ODBC, SOAP make it easy to integrate
with existing COTS applications and IT infrastructures.
Numerous extension points available to meet varying customer needs:
Connector API
Custom User-defined Functions, language extensions
Administrative API
Maturity
Based on MetaMatrix technology acquired by Red Hat in 2007. Industry
leader in the space
Technology under development for over 11 years. Many iterations,
improvements, refinements
Deployed in demanding production environments
73. 76
Repository: Metadata and more
ModeShape
● Data Service metadata
● Rules repository
● SOA repository
Includes:
● JCR Engine
● RESTful service
● WebDAV service
● JDBC driver
● Eclipse plug-in
● JBoss AS/EAP kit
● Sequencers
● JON plugin
● DB or file system storage
74. 77
Customer-Related Data
“Single customer view” requires unified view of Customers and Accounts
And Services and Transactions and Reference Data
Chart: customer-related data plotted by data volume (low to high) and frequency of change/complexity (static to dynamic): customer master data, customer organization, customer demographics, customer contacts, customer accounts, customer documents (text, image), claims transactions, call-center transactions, transaction history, benchmark data, product/service catalog, market prices, and reference data – unified by customer data services into a 360° view of the customer relationship.
75. 78
Map Data Sources to XML and Deploy
• Model XML
Docs, Schemas
• Build XML Doc.
models from
XML Schemas
• Map XML Doc.
models to other
data models
• Enable data
access via XML
Designer Tooling for XML-centric Data Services
Editor's Notes
Pretty much follow the slide for the script. The business challenge is in the red box. “Applications” is any project initiative that makes use of existing data/information assets – could be business intelligence, dashboards, composite applications, CRM, business processes, higher level services in a SOA, etc.
The Challenges list is a more granular breakdown of tactical data challenges faced when addressing the business challenge.
You should use it (and the next slide) however to drill down into questions about their specific environment and integration challenges.
One point worth mentioning is the dynamic nature of the data environment. Particularly the need to incorporate new or changed data sources into the application infrastructure. Business drivers can be M&A activities, reorganization, infrastructure modernization, etc. Keeping production applications running and business users productive in this environment is challenging.
Script: Organizations may attack the data challenge in various ways.
Most frequently they choose to hard code the data access, transformation, and manipulation into their application directly. This has significant drawbacks:
It's time consuming, costly and resource-intensive.
Problems need to be solved again and again for each application and project. Logic is not standardized and is not reusable.
The solution is brittle – changes to data sources or incorporating new data sources break the application or, at best, require the application to be modified and retested and redeployed. Result = tight coupling.
Another common approach is to replicate or copy the data and create data marts. This is, in general a “better” solution than hard coding and is appropriate in some scenarios. However it does have drawbacks:
Costly – requires purchase of software for the data mart (additional database licenses) and may require purchase of ETL tools, which are predominantly proprietary, plus the related ongoing maintenance and support costs.
Copied data is not current/fresh – it becomes a historical view. Not a good fit when applications need access to current data.
More copies of data creates more data silos, exacerbating the original problem. Furthermore, more copies of data means less control of that data and potential compliance issues.
Finally, changes in business requirements or data sources require the copy scripts to be reworked and redeployed. This again is labor-intensive and time consuming. Result = lack of agility.
Both of these approaches also do little to assist with data security and standardization of data security policies – particularly hard coding.
Notes: Most large enterprises will use a combination of these two approaches. They will be familiar with (and possibly sensitive to) their limitations. Don’t condemn these methods as “out of date”, as there will remain use cases where each make sense (particularly data warehousing/data marts).
Script: The solution to these challenges is Data Virtualization through the use of data services. JBoss Enterprise Data Services Platform enables you to “turn the data you have into the data you want”.
Data virtualization enables you to bring together and integrate data from multiple, disparate, even distributed sources and makes the data appear as if it is available in one single source, rather than where it is actually stored. Data virtualization has evolved from Enterprise Information Integration (EII) and may be associated with other terms like information fabric, information-as-a-service, data federation, virtual data layer. In this approach, data is not copied. Rather, requests for data are automatically routed to the underlying data stores in real time and a single unified response is returned. This technology was pioneered by MetaMatrix; acquired by Red Hat in 2007. MetaMatrix is the technology basis for the JBoss Enterprise Data Services Platform.
Data services present integrated data through standards-based read/write views tailored to the needs of client applications accessing that data. Data Services and Data Virtualization allow you to produce reusable data views with the structure and terminology you want (even if the actual sources of data do not conform to that structure or terminology). Data Services provision data through SQL-based relational views (tables and columns, procedures) as well as XML and Web Services enabling the broadest set of client applications to easily access the data. In many cases, the Data Services Platform can be seamlessly inserted between an application and its data sources to add a buffer layer of flexibility and loose-coupling.
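The basic federation idea – route requests to the underlying stores in real time and return one unified response, with no copies made – can be sketched in miniature. This is a hypothetical illustration only: plain Python with two in-memory sqlite3 databases standing in for disparate sources, and all table, column and customer names invented.

```python
import sqlite3

# Two "disparate sources" with different physical structure and terminology.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE cust (cust_id INTEGER, full_name TEXT)")
crm.execute("INSERT INTO cust VALUES (1, 'Ada Lovelace'), (2, 'Alan Turing')")

billing = sqlite3.connect(":memory:")
billing.execute("CREATE TABLE acct (customer INTEGER, balance REAL)")
billing.execute("INSERT INTO acct VALUES (1, 120.50), (2, 75.00)")

def customer_view():
    """A 'single view' federating both sources live -- no data is copied
    into a mart; each call reaches back to the underlying stores."""
    balances = dict(billing.execute("SELECT customer, balance FROM acct"))
    return [(cid, name, balances.get(cid))
            for cid, name in crm.execute("SELECT cust_id, full_name FROM cust")]

print(customer_view())  # [(1, 'Ada Lovelace', 120.5), (2, 'Alan Turing', 75.0)]
```

A real data services platform does this declaratively through view metadata rather than hand-written glue code, but the shape of the request/response flow is the same.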
Common, consistent security can be applied to the Data Services so that all data assets accessed through the DSP adhere to a common security policy. Through auditing, both successful and failed attempts to access data are logged so administrators can monitor data security, react if necessary and demonstrate security enforcement after the fact.
This is proven technology, as you'll see in later slides.
Notes: Yes, you can update (write) through data services as well even when there are multiple backend data sources involved. If the data sources are XA-compliant, the writes can participate in a distributed transaction to ensure data consistency.
Script:
Like the rest of JBoss products, the Data Services Platform is hardware, operating system, and middleware-agnostic. It can sit on top of all the resources in your enterprise, serving up integrated views via industry-standard protocols (SQL, XML, Hibernate) to:
Leftmost box: Other JBoss Enterprise middleware (naturally, we'd love it if you were a pure RH shop ;-) ). For example, making integrated data available to higher-level services in SOA Platform or providing multi-source data federation underneath Hibernate.
Center box: Other vendor platforms (IBM, Oracle, webMethods, etc.) – providing a set of data services that form a foundational data layer in an SOA.
Right side of diagram: Directly to an application layer, such as operational applications, business reporting tools and analytical applications such as Business Objects and MicroStrategy. Such tools can connect directly to MetaMatrix just as they would any other database. You can thus perform BI on live operational data that is being integrated on the fly from multiple disparate systems.
There are few overlaps among the three “E” integration technologies (EAI, ETL, EII).
In reality, they are more complementary than competitive
EAI won’t replace ETL or EII
ETL won’t replace EAI or EII
EII won’t replace EAI or ETL
The characteristics of the technologies are as indicated. EAI is best suited for process integration, while ETL and EII focus on data integration – ETL for batch extracts from sensitive operational systems, EII/data services for real-time integration of disparate sources.
If you think of the enterprise as having two sides of the house with respect to information processing, the production side and the consumption side, data services often fit more readily on the consumption side. This is where there tends to be a proliferation of potentially unnecessary data marts that exist for the sole purpose of supporting a single application. These physical marts can be replaced/supplanted by a virtual mart.
A data services platform does introduce a small amount of latency (10-20 msec) to a query, as compared with a direct database call. Hence it is not recommended for highly transactional, high volume information processing (every credit card swipe or e-commerce transaction). Note though that it can also strategically reach back to production systems to combine historical data with live data for real-time decision making.
Data you have:
- information resides in multiple database sources, multiple vendor technologies
- source data typically looks different than the format required by the consumer
Data you want:
- MetaMatrix provides query engine to access data from multiple sources and combine it appropriately (federation)
- MetaMatrix provides design tool to capture/define transformation logic (metadata) to translate from source to target
Benefits:
- tooling (+metadata-driven approach) provides a more efficient way of developing applications that require integrated data
- federated approach to data integration provides a single view of multiple sources, without having to replicate the database, or build another data mart
Reusable data services
Enterprise-wide data abstraction layer
Integrated views of data from multiple sources
Metadata-driven
Optimized performance
Interoperable security
Complements other tools (ETL, EAI, ESB, DQ, BI)
- Dynamic rule set means inappropriate optimizations are not considered
- Optimizations currently applied:
  - Plan across virtual groups as if they were not present
  - Push specified criteria to the bottom of the plan to reduce data flow
    - Through virtual groups
    - Through unions
    - Through expressions
  - Apply specified criteria to other sources if a join specifies an equivalence
  - Minimize columns flowing through the plan
  - Use logical axioms to simplify criteria
  - Dependent join – subquery determined on the fly
  - Push processing to the data source where possible – this is often faster as it takes advantage of data source capabilities and reduces data flow
    - Criteria evaluation
    - Join evaluation
  - Join plan optimization – ordering of joins
- Costing to decide when to apply optimizations:
  - Static costing – based on user-supplied metadata
  - Dynamic costing – based on dynamic statistics
JBoss Enterprise SOA Platform enables business process automation by integrating and orchestrating (via jBPM) application components and services running on JBoss Enterprise Middleware, other JEE platforms, .NET and other applications exposed as standards-based web services. At right, we see the main components of the SOA Platform. The JBoss Enterprise Application Platform is there to provide infrastructure support for the ESB such as JMS messaging, the web services stack and clustering. Additionally, SOA services may be hosted directly on the SOA Platform. The ESB delivers the integration fabric including event/message listening and routing, data transformation and an SOA service registry based on the UDDI v2 standard. jBPM provides workflow capabilities for Java developers and is also used by the ESB itself to help provide the business event-driven architecture, and the rules engine is included for advanced content-based routing as well as business rules execution. JBoss Enterprise BRMS will support the rules engine in SOA Platform v5 in CY 2010.
The one-liner summary is: “The JBoss Enterprise SOA Platform is a flexible, standards-based platform to integrate applications, SOA services and business events, and to automate business processes”.
Sprawl: applications, databases, file extracts, spreadsheets, point databases like MS Access.
Silos: Different lines of business or different business units. Different groups have ownership of problem domains and budget to address them.
- They may have been implemented as point solutions over the course of time to address particular business challenges or opportunities.
- The use of packaged applications, SaaS or other COTS software may create silos.
- Mergers and acquisitions can cause application portfolios to bloat introducing yet more silos into your IT environment.
- Also creates ownership silos/fiefdoms.
Not integrated/reconciled: using data from various sources requires the data to be reconciled or rationalized. Doing this at point of use or consumption leads to duplication and inconsistency.
Decision making is often insufficiently informed, or worse, misinformed due to lack of useful information, despite the sheer volume of data in most organizations.
Reports and dashboards are incomplete, missing key information. Or that information may be stale or out of context.
Getting the right information, at the right time to the right people, processes and applications is vital but often unrealized.
Here I propose several approaches or patterns to consider.
Each of these and more are possible and supported through various technologies in the JBoss Middleware portfolio. We'll look at the specific technologies that apply to these patterns shortly.
The patterns may be used together or independently, though there's greater benefit to be gained when combining them.
We'll take a look at each of these in turn.
Expose information directly to applications, as standalone information services or as data services on the enterprise service bus.
Reports, while useful for visualizing information, do not always highlight actionable information. That's left to the interpretation of the report reader or analyst. This is further complicated by the static nature of reports and the loss of context that can arise.
Rules, however, can be configured to fire based on patterns in information, events and data points collected and occurring in real time. These indicate something has happened (or is happening), good or bad, and requires action – maybe human intervention, maybe an automated response, or both.
Quick recap of the 5 solution patterns we just ran through.
These are just a few examples of approaches you can take to derive more value from your existing assets. There are many possible permutations to these patterns. Applicability for your initiatives will be governed by your specific circumstances, goals, priorities, etc.
Let's take a look at the JBoss technologies that enable these patterns and why Red Hat is uniquely positioned to help you tackle your challenges.
Though not shown here, tooling in the form of JBoss Developer Studio and management and monitoring via JBoss ON are also included.
Script:
Credit Suisse uses Data Services Platform in roughly 10 different production applications.
In this use case, Credit Suisse wanted to monitor derivative trades as they happened – clearly not a good use case for a data mart. The data also comes from multiple systems. They created a consolidated view of trade information in Data Services Platform and fed those integrated views to a custom portal, which is instrumented with alerts and notifications based on the real-time data.
The benefits to their business include meeting the primary goal of reducing risk and preventing financial loss/exposure. They were also able to achieve very fast deployment: the initial effort ran 5 months from project start to production deployment and resulted in better performance than the custom solution it replaced. Finally, the solution left Credit Suisse better able to manage subsequent data changes.
100K transactions per night, on 4-year-old boxes
a dozen data sources
4 years without a production outage
Script:
Every Smith Barney brokerage office has its own database, so there was no way to market to high-net-worth individuals who had multiple accounts at multiple branches. Citi created a single view of the customer in DSP which can access account information from each branch where an account exists. A master index was created of unique customer IDs mapped against account IDs. Thus, any query joins against this table so only those branches that actually have account data are queried.
This solution has been in daily use by 15,000 financial analysts.
Another important point to make with respect to data migration: SB knows this is not an optimal architecture going forward. They will migrate to a new topology to replace the one-db-per-office scheme. As they do this migration, all of the applications, programs and services built on the virtual integrated layer presented by DSP will not need to change. Simple updates to mappings and connection properties are made in the Data Services middleware. The applications themselves continue to run unmodified.
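The master-index routing described above can be sketched with plain Python. Everything here is invented for illustration (branch names, customer IDs, balances); in the real deployment each branch is a separate database reached over its own connection, but the routing logic is the same: join against the index first, then query only the branches that actually hold accounts.

```python
# Hypothetical master index: customer ID -> branches that hold accounts.
master_index = {"C100": ["NY", "SF"], "C200": ["NY"]}

# One "database per branch office" (sample in-memory stand-ins).
branches = {
    "NY": {"C100": 5000.0, "C200": 1200.0},
    "SF": {"C100": 800.0},
    "LA": {"C300": 60.0},
}

def accounts_for(customer):
    """Join against the master index so that only branches which actually
    have account data for this customer are queried; LA is never touched."""
    hits = master_index.get(customer, [])
    return {branch: branches[branch][customer] for branch in hits}

print(accounts_for("C100"))  # {'NY': 5000.0, 'SF': 800.0}
```

When the one-db-per-office scheme is eventually replaced, only the index and connection mappings change; consumers of `accounts_for` keep working unmodified, which is the loose-coupling point of the slide.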
Script:
This large bank based on the east coast needed to comply with VISA PCI, a set of regulatory policies for the protection of credit card holder data that VISA requires. Credit card data must be filtered/isolated, and any copy (to a new datamart or even a spreadsheet) must carry all the governance/protections with it.
DSP does not replicate the data; it can ensure that only authorized parties have access to data (and can filter data depending on role). DSP is deployed as a Data Firewall – a term coined by the customer. All access to the protected data is through the data service facade, where it is authenticated, authorized, filtered, and even audited and logged so that the enterprise can demonstrate to regulators that the data is being protected.
In addition, the bank went a step further and used DSP to provide a common data dictionary for this sensitive customer information, so that applications would consistently be able to use this valuable data.
DSP is being used in conjunction with a third-party BI tool (IBI's WebFOCUS), and security is integrated with the bank's single-sign-on solution such that only appropriate information is served up to the reporting user.
Note that VISA PCI regulations also apply to large retailers, thus this use case is not restricted to card issuers.
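The data-firewall pattern – authenticate/authorize, filter or mask by role, and audit every access – can be sketched as a hypothetical facade. This is illustrative Python only, not DSP configuration; the role names, masking rule and audit log are all invented.

```python
audit = []  # every access through the facade is logged for later review

def card_view(rows, role):
    """Data-firewall style facade over card data: authorize, mask by role,
    and record the access. Roles and masking rule are invented examples."""
    audit.append(role)
    if role == "auditor":
        return list(rows)  # full, unmasked access for this role only
    if role == "analyst":
        # Analysts never see full card numbers -- only the last four digits.
        return [(name, "****" + pan[-4:]) for name, pan in rows]
    raise PermissionError("role not authorized for card data")

cards = [("Ada", "4111111111111111")]
print(card_view(cards, "analyst"))  # [('Ada', '****1111')]
```

Because every path to the protected data goes through the one facade, the audit trail is complete – which is exactly what lets the bank demonstrate enforcement to regulators after the fact.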
This fairly complex slide shows a structured approach to data services design on a large project – the GCSS-J project at DISA, the Defense Information Systems Agency.
Along the bottom are the various data sources and content providers. Along the top are the “public” or “consumer” views of logistics info seen by decision support applications.
Middle layers, in the box labeled “Private Data and Metadata”, contain multiple, incremental layers of abstraction and transformation – detail that is kept private from data consumers at the top. This multi-layer approach facilitates change management by insulating higher layers from changes in lower layers.
Taking those middle layers one by one from the bottom, we start with the physical layer (blue). This is a nearly direct representation/mapping of the underlying physical information stores. The next layer up, the Virtual Base Layer, is where the first level of transformation occurs. Here, standardized terminology and data types are introduced to conform to canonical definitions. The next layer is the Virtual Mid Layer. Here, domain-specific information entities are constructed. These represent the conceptual data objects that higher levels of the architecture expect – effectively the logical data model.
Lastly, back at the top, the Public Data views are independent of the logical data model. A logical entity can be exposed through the means most appropriate for the intended data consumer – e.g. relational tables for a reporting application, or web services for integration with higher-level processes.
Also note the ability of MMX to put controls on the queries ultimately issued to the physical sources. This can be used to help negotiate access to systems that might not be within the domain/under the control of the business unit or agency doing the federation. In this case, DISA did not own or have authority on any of the dozen systems that they needed to access.
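The layered approach can be sketched with sqlite3 views standing in for DSP's virtual layers. All table, column and view names below are invented; the point is only that each layer refers solely to the layer beneath it, so a physical change touches one view definition, not the consumers.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Physical layer: a near-direct image of a source system (names invented).
db.execute("CREATE TABLE src_log_itm (itm_no TEXT, qty INTEGER, loc TEXT)")
db.execute("INSERT INTO src_log_itm VALUES ('A1', 4, 'DEPOT7'), ('B2', 9, 'DEPOT7')")

# Virtual base layer: standardized terminology and types.
db.execute("""CREATE VIEW base_item AS
              SELECT itm_no AS item_id, qty AS quantity, loc AS location
              FROM src_log_itm""")

# Virtual mid layer: the domain entity higher levels expect.
db.execute("""CREATE VIEW supply_position AS
              SELECT location, SUM(quantity) AS on_hand
              FROM base_item GROUP BY location""")

# Public layer: what consumers query; lower layers can change beneath it.
print(db.execute("SELECT * FROM supply_position").fetchall())  # [('DEPOT7', 13)]
```

If `src_log_itm` were renamed or restructured, only `base_item` would need editing – `supply_position` and every public consumer above it stay untouched, which is the change-management benefit the slide describes.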
Script:
[Not AIG thankfully, although this customer is not renewing – we need a new example here, although the principles still apply]
The company's term for what DSP provides is Information Integration Services (IIS).
- IIS provides a standardized data access layer, providing Create, Read, Update, and Delete (CRUD) data transaction services for “Enterprise Data”
- IIS defines data in terms of interfaces used by information consumers and producers
- Interfaces are represented in a common, standard business nomenclature independent of the source systems (e.g., Beneficiary, Participant, Plan, Sponsor, Policy, etc.)
- IIS makes enterprise data shareable across multiple systems within a line of business (LOB) and across LOBs
- IIS provides a single point of entry for enterprise data:
- Secure access to data – role based authorization to classified data
- Audit of data transactions
- Access metering for usage
- Two standards-based technology interfaces (APIs) are supported:
- Web Services – In support of service oriented applications
- SQL – In support of reporting and analytics applications (e.g., Business Objects)
- DSP is complementary to other third-party technologies deployed in the company's environment (Business Objects for BI, Informatica for ETL, webMethods for EAI/ESB)
Note: Key takeaway from previous slides is that the product is being used in real, substantial production customer deployments
Walk through the bullets. For Data Virtualization Technology, this should be a rehash of things already covered.
For Value –
Return on Assets is a good model for thinking about the technology. Leveraging their existing information and infrastructure is itself a cost-conscious approach.
Faster deployment speaks to the efficiencies gained using EDSP vs. hard coding or setting up heavy ETL infrastructure.
Similarly with maintenance savings: as application needs and data changes occur, it's far easier to manage the change in a centralized data layer than in brittle application logic or data-movement batch scripts.
SQL and Eclipse are common technologies understood by a broad population of technical contributors.
Standard value prop/pitch re open source and subscription.
Note: while you don't want to lead with price, EDSP is considerably less expensive than competing commercial products. We don't have to be shy about that.
Script:
The Data Services Platform is designed with flexibility in mind; no one-size-fits-all solution exists. By supporting well-known standards, EDSP is easy to integrate with common business applications, tooling and IT/SOA infrastructures. Through well-documented extension points, customers, partners and/or Red Hat services professionals can tweak the product to meet the broadest set of requirements. These extension points include, but are not limited to, the Connector API for developing connectors to data sources not supported out of the box, user-defined functions for encapsulating custom transformation logic, a membership SPI for integrating with custom security infrastructures, and an administrative API for scripting or integration with IT management infrastructures.
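The user-defined-function extension point – encapsulating custom transformation logic so it can be invoked from any query – can be illustrated with sqlite3's own UDF hook standing in for the product's mechanism. The function, the country-code mapping and all names are invented for this sketch.

```python
import sqlite3

def to_iso_country(code):
    """Hypothetical custom transformation: normalize legacy country codes
    to ISO 3166 alpha-2 (mapping invented for illustration)."""
    return {"UK": "GB", "GER": "DE"}.get(code, code)

db = sqlite3.connect(":memory:")
# Register the transformation as a UDF callable from SQL, much as a
# data services UDF makes custom logic reusable across all views.
db.create_function("to_iso", 1, to_iso_country)

db.execute("CREATE TABLE parties (name TEXT, country TEXT)")
db.execute("INSERT INTO parties VALUES ('Acme', 'UK'), ('Globex', 'DE')")
rows = db.execute("SELECT name, to_iso(country) FROM parties").fetchall()
print(rows)  # [('Acme', 'GB'), ('Globex', 'DE')]
```

The benefit mirrors the slide's point: the logic lives in one registered function rather than being re-implemented in every consuming application.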
Lastly, this technology is mature. It came to Red Hat in 2007 with the MetaMatrix acquisition. MetaMatrix was a pioneer and leader in the information integration space – they developed, improved and refined the product functionality since the first release in 2000. Since then it has been deployed in demanding production environments from financial services and insurance to intelligence and the Department of Defense, helping these organizations leverage their information assets, break down silos, and modernize and secure their data infrastructure.