The document describes an operational data store (ODS) implemented to integrate data from two banks, Velká česká banka and Nová česká banka, following their integration. APIs, ETL workflows, and data transformations populate the ODS with consolidated customer, account, and transaction data from both banks for operational reporting. It also details the data domains integrated into the ODS and the growth in API usage over time as more systems accessed the shared ODS.
This presentation covers the definition of a data warehouse, data, warehouse, data modeling, data warehouse architecture and its types, and the single-tier, two-tier, and three-tier architectures.
Data warehousing is a data architecture that separates reporting and analytics needs from operational transaction systems. This presentation is an introduction to traditional data warehousing architectures and how to determine whether your environment requires a data warehouse.
In computing, a data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis, and is considered a core component of business intelligence.[1] DWs are central repositories of integrated data from one or more disparate sources. They store current and historical data in a single place and are used for creating analytical reports for knowledge workers throughout the enterprise.
Designing Fast Data Architecture for Big Data using Logical Data Warehouse a... (Denodo)
Companies such as Autodesk are fast replacing the once tried-and-true physical data warehouses with logical data warehouses/data lakes. Why? Because they are able to accomplish the same results in one-sixth of the time and with one-quarter of the resources.
In this webinar, Autodesk's Platform Lead, Kurt Jackson, will describe how they designed a modern fast data architecture as a single unified logical data warehouse/data lake using data virtualization and contemporary big data analytics like Spark.
A logical data warehouse/data lake is a virtual abstraction layer over the physical data warehouse, big data repositories, cloud, and other enterprise applications. It unifies both structured and unstructured data in real time to power analytical and operational use cases.
Designing a High-Performance Data Warehouse (Uday Kothari)
Just when the world of "Data 1.0" showed some signs of maturing, "outside-in" demands seem to have already initiated some of the disruptive changes to the data landscape. Parallel growth in the volume, velocity, and variety of data, coupled with the incessant push to find newer insights and value from data, has posed a big question: is your data warehouse relevant?
In short, the surrounding changes happening in real time are the new "Data 2.0". It is characterized by feeding ever-hungry minds with sharper insights, whether related to regulation, finance, corporate action, risk management, or purely aimed at improving operational efficiencies. The source in this new "Data 2.0" has to be commensurate with the outside-in demands from customers, regulators, stakeholders, and business users; hence, you need a high "relformance" (relevance + performance) data warehouse that is relevant to your business ecosystem and has the power to scale exponentially.
This webinar starts by giving the audience a sneak preview of what happened in the Data 1.0 world and which characteristics are shaping the new Data 2.0 world. It then delves into the challenges that growing data volumes have posed to data warehouse teams, and presents some practical, proven methodologies to address these performance challenges. Finally, it highlights some thought-provoking ways to turbocharge your data warehouse initiatives by leveraging newer technologies like Hadoop. Overall, the webinar will educate audiences on building high-performance, relevant data warehouses capable of meeting newer demands while significantly driving down the total cost of ownership.
These are the slides from my talk at Data Day Texas 2016 (#ddtx16).
The world of data warehousing has changed! With the advent of Big Data, Streaming Data, IoT, and The Cloud, what is a modern data management professional to do? It may seem to be a very different world with different concepts, terms, and techniques. Or is it? Lots of people still talk about having a data warehouse or several data marts across their organization. But what does that really mean today in 2016? How about the Corporate Information Factory (CIF), the Data Vault, an Operational Data Store (ODS), or just star schemas? Where do they fit now (or do they)? And now we have the Extended Data Warehouse (XDW) as well. How do all these things help us bring value and data-based decisions to our organizations? Where do Big Data and the Cloud fit? Is there a coherent architecture we can define? This talk will endeavor to cut through the hype and the buzzword bingo to help you figure out what part of this is helpful. I will discuss what I have seen in the real world (working and not working!) and a bit of where I think we are going and need to go in 2016 and beyond.
Are You Killing the Benefits of Your Data Lake? (Denodo)
Watch the full webinar on-demand here: https://goo.gl/RL1ZSa
Data lakes are centralized data repositories. Data needed by data scientists is physically copied to a data lake, which serves as a single storage environment. This way, data scientists can access all the data from one entry point – a one-stop shop to get the right data. However, such an approach is not always feasible for all the data, and it limits its use to solely data scientists, making it a single-purpose system.
So, what’s the solution?
A multi-purpose data lake allows a broader and deeper use of the data lake without minimizing the potential value for data science and without making it an inflexible environment.
Attend this session to learn:
• Disadvantages and limitations that are weakening or even killing the potential benefits of a data lake.
• Why a multi-purpose data lake is essential in building a universal data delivery system.
• How to build a logical multi-purpose data lake using data virtualization.
Do not miss this opportunity to make your data lake project successful and beneficial.
Learn tips about the importance of data warehousing, data cleansing, extraction, and more. For more details visit: http://www.skylinecollege.com/business-analytics-course
Partner Enablement: Key Differentiators of Denodo Platform 6.0 for the Field (Denodo)
If you're a Denodo Partner, this presentation is for you. Learn how to gain a competitive edge in the marketplace with Denodo Platform 6.0, and leverage the new features and functionality.
This presentation is part of the Fast Data Strategy Conference, and you can watch the video here goo.gl/Qh8MeX.
Traditional BI vs. Business Data Lake – A Comparison (Capgemini)
Traditional Business Intelligence (BI) systems provide various levels and kinds of analyses on structured data but they are not designed to handle unstructured data.
For these systems Big Data brings big problems because the data that flows in may be either structured or unstructured. That makes them hugely limited when it comes to delivering Big Data benefits.
The way forward is a complete rethink of the way we use BI - in terms of how the data is ingested, stored and analyzed.
More information: http://www.capgemini.com/big-data-analytics/pivotal
The Pivotal Business Data Lake provides a flexible blueprint to meet your business's future information and analytics needs while avoiding the pitfalls of typical EDW implementations. Pivotal’s products will help you overcome challenges like reconciling corporate and local needs, providing real-time access to all types of data, integrating data from multiple sources and in multiple formats, and supporting ad hoc analysis.
How to Place Data at the Center of Digital Transformation in BFSI (Denodo)
Watch full webinar here: https://bit.ly/3j7E9Jo
Consumers are increasingly using digital banking tools and insurance models, and these numbers will only continue to grow. Financial and insurance organizations have to adapt to the new and always changing situation while complying with new regulations, such as IFRS17, and embracing ESG criteria.
At the heart of any digital transformation is data. Therefore, it is not a stretch to say that data management and analytics strategies differentiate many of the leaders from the laggards in the banking, financial services and insurance (BFSI) industry. BFSI organizations still relying on slow, traditional systems and data management processes will find themselves falling behind their competition. In addition, as many adopt cloud strategies, these traditional approaches fill the cloud modernization process with downtime and end user frustration. In fact, according to a McKinsey article, cloud combined with distributed data infrastructure will define how consumers and providers adopt digital insurance models for the next decade.
Hear how the BFSI industry is leveraging data virtualization to deploy data fabric or data mesh architectures for enterprise-wide digital transformation.
Join this webinar to learn:
- The latest trends in BFSI for 2023 and how data and analytics is reshaping the industry
- How a logical data architecture can help you capitalize on your data
- How Denodo customers digitally transformed themselves using the Denodo Platform
Building an Effective Data Warehouse Architecture (James Serra)
Why use a data warehouse? What is the best methodology to use when creating a data warehouse? Should I use a normalized or dimensional approach? What is the difference between the Kimball and Inmon methodologies? Does the new Tabular model in SQL Server 2012 change things? What is the difference between a data warehouse and a data mart? Is there hardware that is optimized for a data warehouse? What if I have a ton of data? During this session James will help you to answer these questions.
An Introduction to Data Virtualization in 2018 (Denodo)
Watch full webinar on demand here: https://goo.gl/Rdrc1w
"Through 2020, 50% of enterprises will implement some form of data virtualization as one enterprise production option for data integration" according to Gartner. It is clear that data virtualization has become a driving force for companies to implement an agile, real-time and flexible enterprise data architecture.
Attend this session to learn:
• What data virtualization actually means and how it differs from traditional data integration approaches
• The all-important use cases and key patterns of data virtualization
• What to expect in the upcoming sessions in the Packed Lunch Webinar Series, which will take a deeper dive into various challenges solved by data virtualization in big data analytics, cloud migration and various other scenarios
Agenda:
• Introduction & benefits of DV
• Summary & next steps
• Q&A
This seminar is about data warehousing. It discusses what data warehousing is, a comparison between a database and a data warehouse, different data warehouse models, data marts, and the disadvantages of data warehousing.
Watch Paul's session from Fast Data Strategy on-demand here: https://goo.gl/3veKqw
"Through 2020, 50% of enterprises will implement some form of data virtualization as one enterprise production option for data integration" according to Gartner. It is clear that data virtualization has become a driving force for companies to implement an agile, real-time and flexible enterprise data architecture.
Attend this session to learn:
• What data virtualization actually means and how it differs from traditional data integration approaches
• The most important use cases and key patterns of data virtualization
• The benefits of data virtualization
3. Prague Data Management Meetup
Data Management
Data acquisition
Data storage
Data processing
Data interpretation
Data use
• An open professional interest group
• Everyone is welcome (whether in a passive or active role)
• There are never enough topics
• We aim for regular monthly meetups
• Running since September 2015
4. History
Date         Topic
10. 9. 2015  Data Management
14. 10. 2015 Data Lake
23. 11. 2015 Dark Data (without Dark Energy and Dark Force)
12. 1. 2016  Data Lake (again)
7. 3. 2016   Sad Stories About DW Modeling (sad stories only)
23. 3. 2016  Self-service BI Street Battle
27. 4. 2016  Let's explore the new Microsoft PowerBI!
22. 9. 2016  Data Management for Beginners
17. 10. 2016 Small Big Data
22. 11. 2016 DW Modeling Basics
23. 1. 2017  Data Warehouse Components
28. 2. 2017  Operational Data Store
8. Operational Database vs. Data Warehouse
Characteristic      | Operational Database | Data Warehouse
Time focus          | Current              | Historical
Details level       | Individual           | Individual and summary
Orientation         | Process              | Subject
Records per request | Few                  | Thousands
Normalization level | Mostly normalized    | Normalization relaxed
Update level        | Highly volatile      | Mostly refreshed (non-volatile)
Data model          | Relational (3NF)     | Relational (star schemas, hybrid, 3NF) and multidimensional (data cubes)
Source: Coursera
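The contrast in access patterns above can be sketched with an in-memory SQLite database. The table name, account keys, and amounts below are invented for illustration; they are not taken from the slides:

```python
import sqlite3

# A tiny illustrative schema: an operational table (one row per transaction)
# queried by key, versus a warehouse-style aggregate over many rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE txn (id INTEGER PRIMARY KEY, account TEXT, amount REAL, day TEXT)")
rows = [(i, f"acc{i % 10}", 10.0, f"2017-01-{(i % 28) + 1:02d}") for i in range(1000)]
conn.executemany("INSERT INTO txn VALUES (?, ?, ?, ?)", rows)

# Operational access: few records per request, current, individual detail.
one_txn = conn.execute("SELECT amount FROM txn WHERE id = ?", (42,)).fetchone()

# Warehouse access: summary over thousands of records, subject-oriented.
per_account = conn.execute(
    "SELECT account, SUM(amount) FROM txn GROUP BY account ORDER BY account"
).fetchall()

print(one_txn[0])      # 10.0
print(per_account[0])  # ('acc0', 1000.0)
```

The point lookup touches one row via the primary key; the aggregate scans the whole table, which is why warehouses relax normalization and optimize for scans instead.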
9. Inmon, Imhoff & Battas ODS Definition
• Features:
  • Subject-oriented (like a data warehouse)
  • Made up of integrated data (standard, consistent data formats)
  • Volatile (changes as often as the source system)
  • Current (low-latency data capture; no historical detail)
• Defined in the mid-1990s
• Later adopted by Gartner, Inc.
• When limited in scope to customer or product data, the canonical ODS is similar to master data management (MDM).
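The "volatile, current" features above can be sketched as a store that simply overwrites a record on each change and keeps no history. The class and field names below are invented for illustration:

```python
# Minimal sketch of a volatile, current-valued ODS: each change coming
# from a source system overwrites the record in place; no history kept.
class SimpleODS:
    def __init__(self):
        self._records = {}  # subject key -> current integrated record

    def apply_change(self, key, record):
        # Volatile: the latest source-system state wins; prior detail is lost.
        self._records[key] = record

    def get(self, key):
        return self._records.get(key)

ods = SimpleODS()
ods.apply_change("cust-1", {"name": "Alice", "segment": "retail"})
ods.apply_change("cust-1", {"name": "Alice", "segment": "premium"})
print(ods.get("cust-1")["segment"])  # premium
```

A data warehouse would instead append a new versioned row here; overwriting in place is exactly what makes the ODS current rather than historical.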
10. Adastra Business Intelligence Reference Architecture
[Architecture diagram: an ODS for operational reporting sits alongside an Enterprise DWH and a Big Data Platform (Data Lake, event processing, Hadoop/NoSQL). Shared layers cover information management (data workflow orchestration, data transformation/processing, data management, event ingestion, complex event processing, notifications, BI/application integration), semantic models, advanced analytics (machine learning, in-database data mining, R, perceptual/cognitive intelligence, recognition of human interaction and intent), and BI/data delivery (reports, dashboards and visualizations, real-time dashboards, self-service BI, mobile BI, OLAP). Underlying technologies include SMP and MPP, in-memory and in-memory columnar engines; inputs span relational/structured data, unstructured data, and streaming from an IoT network via a field gateway.]
11. Architecture Reasons for ODS
• Copy vs. reference - why copy data into an ODS?
• Performance issues
  • Faster local data access
  • Load distribution (operational and reporting)
• Time issues
  • Less granularity of the secondary system
  • History
• Availability issues
  • e.g. primary 10x5, secondary 24x7
• Consolidation issues
  • e.g. consolidated client, product
• Security issues
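The load-distribution reason above can be sketched as a simple router that sends reporting traffic to the ODS copy and transactional traffic to the primary system. The workload names and routing rule are illustrative assumptions, not from the slides:

```python
# Illustrative sketch: route heavy reporting reads to the ODS copy so they
# do not compete with operational transactions on the primary system.
def route_query(workload: str) -> str:
    reporting_workloads = {"report", "dashboard", "export"}
    return "ods" if workload in reporting_workloads else "primary"

print(route_query("report"))   # ods
print(route_query("payment"))  # primary
```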
12. ODS Possible Roles in Architecture
• ODS as data store for operational processes (PDI/CDI)
• ODS as DWH stage
• ODS as operational reporting data source
• ODS as data exchange component
• ODS as data cache for other systems
• ODS as MDM solution
• ODS as replacement of legacy system
• ODS as DWH data load type (near-real-time DWH)
13. Truth in data
[Diagram: primary data, primary data from another system, secondary data, and consolidated data, with a noise generator between the data and the truth.]
• Independent truth in data does not exist
• Truth depends on the business and data architect's definitions
14. Inmon ODS Classes
• Class I (Real-Time ODS)
  • Transactions are moved to the ODS in an immediate manner from applications, within a range of one to two seconds from the moment the transaction was executed in the operational environment until the transaction arrives at the ODS. In this case, the end user can hardly tell the difference between an activity that occurred in the operational environment and the same activity as transmitted to the ODS environment.
• Class II (Near Real-Time ODS)
  • Activities that occur in the operational environment are stored and forwarded to the ODS every four hours or so. In this case, there is a noticeable lag between the original execution of the transaction and the reflection of that transaction in the ODS environment. However, this class of ODS is much easier to build and operate than a Class I ODS.
• Class III (Daily ODS)
  • The time lag between execution in the operational environment and reflection in the ODS is overnight, so there is a noticeable delay between the execution of the transaction and its reflection in the ODS environment. This type of ODS is relatively easy to build.
• Class IV (Data Warehouse ODS)
  • A Class IV ODS is fed from the data warehouse, from analysis created by the DSS analyst in the data warehouse environment and condensed down to a point where the results of the analytical processing fit comfortably in the ODS. The input to the ODS can be either regular or irregular. This class of ODS is very easy to build as long as the data warehouse has already been constructed.
• (Class V)
  • A highly integrated and aggregated data source for reporting
15. Alternative ODS Typology (Execution MiH)
• TYPE I (Data Cache)
• Online data store, used for transaction execution and system interface purposes
• These data stores have source system data replicated in the central data store. The source systems exchange data with other systems through this data store instead of exchanging point-to-point interface files
• Another application of this kind of data store architecture is to provide a common database for source systems to refer to directly. For example, you can have the source systems updating and referring to the sanitized master tables existing in the ODS (we will refer to this in our Master Data Management Section, which is still under authoring). There are situations where the source system directly refers to or updates a table in an ODS.
• TYPE II (CDI/PDI)
• Online data stores, used for Servicing and Relationship
• This is a similar application to the one above; however, the focus is limited to getting a single customer, process, and master-data view for the sake of stakeholder servicing (customer, employee, and vendor servicing). Examples are a customer-relationship single view or a customer touch-point single view. You can retrieve this single view during your in-bound or out-bound interactions with customers. This online operational access gives you the benefit of risk management, cross-sell, up-sell, etc.
• TYPE III (Operational Reporting)
• For reporting
• Technically it is not an ODS, but people use the term for this application as well. You can have a reporting data store to churn out your operational reporting. It has a replica of selected data from the source systems and generally involves low-intervention transformation.
15
16. Microsoft: DWH vs. ODS
• The purpose of the Data Warehouse (DWH) in the overall Business Intelligence Architecture is to integrate corporate data from different heterogeneous data sources in order to facilitate historical and trend analysis reporting. It acts as a central repository and contains the "single version of truth" for the organization that has been carefully constructed from data stored in disparate internal and external operational database systems.
• The purpose of the Operational Data Store (ODS) is to integrate corporate data from different heterogeneous data sources in order to facilitate real-time or near real-time operational reporting. Often data in the ODS will be structured similarly to the source systems, although integration can involve data cleansing and de-duplication and can apply business rules to ensure data integrity. An ODS is mainly intended to integrate data quite frequently at the lowest granular level for operational reporting in a close to real-time data integration scenario. Normally, an ODS will not be optimized for historical and trend analysis on huge sets of data.
• Let's summarize the differences between an ODS and DW:
• An ODS is meant for operational reporting and supports current or near real-time reporting requirements whereas a
DW is meant for historical and trend analysis reporting on a large volume of data
• An ODS is targeted for low-granularity queries whereas a DW is used for complex queries against summary-level or aggregated data
• An ODS provides information for operational, tactical decisions about current or near real-time data acquisition
whereas a DW delivers feedback for strategic decisions leading to overall system improvements
• In an ODS the frequency of data loads could be hourly or daily whereas in a DW the frequency of data loads could be daily, weekly, monthly or quarterly
16
18. Adastra ODS Principles
• Integrated and consolidated data
• Subject-oriented data
• Master data focus (business entities)
• Changing data (actual data)
• Limited history data (transactions)
• Low-level data granularity (no aggregations)
• A mix between OLTP and DWH: "the best from both worlds"
18
19. ODS Features
• One version of truth (with different process presentations)
• Single customer view across all systems / businesses
• Customer Data Integration
• Product Data Integration
• Data cleansing and consolidation (MDM platform)
• Integrated data for other systems or applications (data cache)
• Online access (read and write)
• Quick access to actual data (operational reporting)
• A component of an SOA architecture (among other roles)
• Efficient common information exchange among businesses or systems
• One platform for all businesses and IT systems (online and offline processes)
• Data sets from many sources
• Support or replacement for legacy systems
19
20. ODS Benefits
Business Benefits
• Real-time consolidated and integrated data for any purpose
• More reliable mission critical processes
• Reduce costs on IT solutions
• Single customer view
• Integrated product data
• Enabling multichannel and efficient campaign management
• Data for credit risk management
• Integrated communication across all channels
• Economical network analysis
• Faster collection processes
• Online fraud detection
• Near-real time operational reporting
• Data monetization
Technical Benefits
• One version of truth (with different process presentation)
• Single customer view across all systems / businesses
• Customer Data Integration (CDI)
• Product Data Integration (PDI)
• Data cleansing and consolidation (MDM platform)
• Integrated data for other systems or applications (data cache)
• Online access (read and write)
• Quick access to actual data (operational reporting)
• A central component of an SOA architecture
• Efficient common information exchange among businesses or systems
• One platform for all businesses and IT systems (online and offline processes)
• Data sets from many sources
• Support or replacement for legacy systems
20
21. ODS vs. ADS (DWH or EDW)
ODS (online world):
1. Focus on operational processes
2. Online read and write 24/7
3. For other IT systems / processes
4. Limited data set
5. Very limited history
6. Focus on current data
7. Low data granularity
8. Integration with ADS
ADS (offline world):
1. Focus on analytic tasks
2. Offline batch processing
3. For end-users
4. Large data set
5. Long history
6. Focus on all data
7. Many levels of data granularity
8. Data marts and data aggregates
21
22. ODS Data Refresh Time Period
• Real-time
• Near-real time
• Many times per day
• Daily
• Monthly
• Ad-hoc
• Hybrid
23. ODS Data Transformations
• Batch Processing
• ETLs
• Extract, Transform, Load
• Transform data from source table / tables to one target table
• Transformation ETLs, Synchronization ETLs
• Advanced data processing
• Batch data cleansing and unification
• Advanced calculations
• Online Processing
• APIs
• Read APIs
• Write APIs
• Change Data Capture (CDC)
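A minimal sketch of the "transformation ETL" pattern above: extract rows from two source tables, apply a cleansing transformation, and load the result into one target ODS table at the lowest granularity. All table and column names here are invented for illustration; they are not from the deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical source tables replicated from two systems.
cur.execute("CREATE TABLE src_crm_customer (id INTEGER, name TEXT)")
cur.execute("CREATE TABLE src_core_customer (id INTEGER, name TEXT)")
cur.executemany("INSERT INTO src_crm_customer VALUES (?, ?)",
                [(1, " alice "), (2, "Bob")])
cur.executemany("INSERT INTO src_core_customer VALUES (?, ?)",
                [(2, "BOB"), (3, "Carol")])

# Target ODS table: one row per source record, no aggregation.
cur.execute("CREATE TABLE ods_customer (source TEXT, id INTEGER, name TEXT)")

def etl(source_table, source_tag):
    """Extract, transform (trim and title-case names), load into ods_customer."""
    rows = cur.execute(f"SELECT id, name FROM {source_table}").fetchall()
    cleaned = [(source_tag, i, n.strip().title()) for i, n in rows]
    cur.executemany("INSERT INTO ods_customer VALUES (?, ?, ?)", cleaned)

etl("src_crm_customer", "CRM")
etl("src_core_customer", "CORE")
print(cur.execute("SELECT COUNT(*) FROM ods_customer").fetchone()[0])  # 4
```

A synchronization ETL or CDC feed would follow the same shape, but would move only rows changed since the last run instead of full extracts.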
24. System independency – Reason for API
[Diagram: an external data consumer reading the database directly blurs the boundary between the database provider's competency and the consumer's competency; putting an interface (API) layer between the database and its external data consumers concentrates the transformation logic in one place, enables enterprise-level impact analysis, and keeps external workload consumers off the database]
24
25. Service level agreement (SLA)
• A definition of services
• Availability (99.99%)
• Open hours (24x7, 10x5)
• Performance
• Problem management
• Security
• Disaster recovery
• Termination of agreement
25
Availability % | Downtime per year
98%            | 7.30 days
99%            | 3.65 days
99.5%          | 1.83 days
99.9%          | 8.76 hours
99.99%         | 52.6 min
99.999%        | 5.26 min
99.9999%       | 31.5 s
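The downtime column follows directly from the availability percentage and a 365-day year; a quick sketch of the arithmetic:

```python
# Derive the "downtime per year" column from an availability percentage.
def downtime_per_year(availability_pct, year_hours=365 * 24):
    """Return allowed downtime in hours for a given availability %."""
    return (1 - availability_pct / 100) * year_hours

print(round(downtime_per_year(99.0) / 24, 2))   # days for 99%     -> 3.65
print(round(downtime_per_year(99.9), 2))        # hours for 99.9%  -> 8.76
print(round(downtime_per_year(99.99) * 60, 1))  # minutes for 99.99% -> 52.6
```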
30. Data Domains
• Third-party products
• Debt relief
• ETM
• Offers
• Applications
• Consents
• Classification
• Economic groups
• Campaigns
• Products
• Segmentation
• Behavioral data
• External data
• Client identification
• Signing authorizations
• Contact details
• Unification
• Other
30
33. ODS Core Tables (ABDM)
[Model diagram with entities: Instance Party, Unified Party, Located Address, Instance Address, Instance Phone, Unified Phone, Instance Email, Instance ID Card, Account, Account Role, Account Balance Fact, Product Instance, Product Instance Party Role, Product Instance Relationship, Business Product Type, Loan Instance, Facility Instance, Card Instance, ... Instance, Application, Application Detail]
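The Instance Party / Unified Party pair among the core tables reflects a common CDI pattern: each source system contributes an instance record, and unification links those instances to one golden record. A minimal sketch, with all column names invented for illustration (this is not the actual ABDM model):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE unified_party (
    unified_party_id INTEGER PRIMARY KEY,
    display_name     TEXT
);
CREATE TABLE instance_party (
    instance_party_id INTEGER PRIMARY KEY,
    source_system     TEXT,    -- which system the record came from
    source_key        TEXT,    -- the record's key in that source system
    unified_party_id  INTEGER REFERENCES unified_party
);
""")
conn.execute("INSERT INTO unified_party VALUES (1, 'Jan Novak')")
conn.executemany("INSERT INTO instance_party VALUES (?, ?, ?, ?)",
                 [(10, 'CRM', 'C-77', 1), (11, 'CORE', '000123', 1)])

# Single customer view: all source instances behind one unified party.
n = conn.execute("SELECT COUNT(*) FROM instance_party "
                 "WHERE unified_party_id = 1").fetchone()[0]
print(n)  # 2
```

The same split generalizes to the address, phone, and email entities in the diagram: per-source instance rows plus a unified, cleansed record.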
34. Benefits
Business Benefits
• Real-time consolidated and integrated data for any purpose
• More reliable mission critical processes
• Reduce costs on IT solutions
• Single customer view
• Integrated product data
• Enabling multichannel and efficient campaign management
• Data for credit risk management
• Integrated communication across all channels
• Economical network analysis
• Faster collection processes
• Online fraud detection
• Near-real time operational reporting
• Data monetization
Technical Benefits
• One BI version of truth (with different process presentations)
• Single customer view across all systems / businesses
• Customer Data Integration (CDI)
• Product Data Integration (PDI)
• Data cleansing and consolidation (MDM platform)
• Integrated data for other systems or applications (data cache)
• Online access (read and write)
• Quick access to actual data (operational reporting)
• A central component of an SOA architecture
• Efficient common information exchange among businesses or systems
• One platform for all businesses and IT systems (online and offline processes)
• Data sets from many sources
• Support or replacement for legacy systems
34
38. Data Domains
• Current accounts / Deposits
• Loans
• Cards
• Insurance
• Services
• Third-party products (energy, telco, ...)
• Transactions
• Reservations / Blocks
• Clients
• Product applications
• Process-handling requests
• Contacts
• Collateral
• Events
38
39. Benefits
• Consolidation of data from many BEs
• Offloading the middleware
• Faster responses for front-end applications
• Ensuring high service availability
• Online interface for the DWH
• Event detection
• A data signpost into the BEs
• Shorter time and less effort to deliver requirements
• No complex process integration
• Outside of accounting close periods, data write-through is really fast
41.
[Architecture diagram: the Navision source database runs on SQL Server 2012 on its own disk volume; a disk-array agent takes a volume snapshot, detaches it, and attaches it to the ODS server, where SQL Server Agent runs the ETL between "Start ETL" and "End of ETL"; the ODS on MS SQL Server 2012 contains an L0 layer, an L1 layer, metadata, and a directory, and exposes data through web services and OLE DB interfaces to downstream systems such as CRM, eShop, and Navision]
41
42. Benefits
• Offloading the primary system
• Integration of e-shops
• Support for a loyalty program
• Easier integration of new systems
• Clearer data flows
• One version of truth for downstream systems and for customers on the web
• Direct access to data via database snapshots
• Web services:
  • methods with online access
  • methods for data synchronization
42
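A minimal sketch of the two web-service method families on this slide, an online-access method and a data-synchronization method, over a hypothetical ODS table (all names invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ods_product (id INTEGER, name TEXT, updated_at TEXT)")
conn.executemany("INSERT INTO ods_product VALUES (?, ?, ?)",
                 [(1, "Account A", "2024-01-01"),
                  (2, "Card B", "2024-03-15")])

def get_product(product_id):
    """Online-access method: read one current record by key."""
    return conn.execute("SELECT id, name FROM ods_product WHERE id = ?",
                        (product_id,)).fetchone()

def changed_since(ts):
    """Synchronization method: all records updated after a timestamp."""
    return conn.execute("SELECT id FROM ods_product WHERE updated_at > ?",
                        (ts,)).fetchall()

print(get_product(2)[1])                 # Card B
print(len(changed_since("2024-02-01")))  # 1
```

In a real deployment these functions would sit behind a web-service endpoint; the split keeps cheap point reads separate from bulk synchronization pulls.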