1
Webinar:
Automation of Data Warehouses / Lakes / Stores
About the speaker
Our competencies
Company stats
Petr Hájek
Senior Advisor
Information Management
SOFTWARE DEVELOPMENT
APPLICATION OUTSOURCING
ENTERPRISE INTEGRATION
BUSINESS INTELLIGENCE/DWH
BIG DATA AND DATA SCIENCE
22+ yrs.
On the
tech market
since 1998.
Prague
Headquarters
at the centre of
Europe.
500+
Experienced
and enthusiastic
professionals.
Top 3
CAD company
in Czech Republic
(IDC study).
28M €
Company
revenue in
2019.
Multiple areas
Clients from
Finance, Insurance
and Telco industry.
– Profinit Data Management
Competency Senior Advisor
– 20+ years of professional
experience in Data Management,
Data Warehousing and Business
Intelligence
– Data Architect, Solution Architect,
Management Consultant
– Citigroup, KPMG
Petr Hájek February 3, 2021
Webinar:
Automation of Data Warehouses /
Data Lakes / Data Stores
About the Lecturer
› Profinit Data Management
Competency Senior Advisor
› 20+ years of professional
experience in Data Management,
Data Warehousing and Business
Intelligence
› Data Architect, Solution Architect,
Management Consultant
› Citigroup, KPMG
Petr Hájek
Seasoned and Windswept
Information Management
Professional
4
About Profinit
ISO 9000 ISO 27000 ISO 20000
Our competencies
Company stats
SOFTWARE DEVELOPMENT
APPLICATION OUTSOURCING
ENTERPRISE INTEGRATION
BUSINESS INTELLIGENCE/DWH
BIG DATA AND DATA SCIENCE
22+ yrs.
On the
tech market
since 1998.
Prague
Headquarters
at the centre of
Europe.
500+
Experienced
and enthusiastic
professionals.
Top 3
CAD company
in Czech Republic
(IDC study).
28M €
Company
revenue in
2019.
Multiple areas
Clients from
Finance, Insurance
and Telco industry.
Certifications, culture & quality
50+
We serve
many prominent
world clients.
A long history of technical engineering excellence has led Western
companies to rely heavily on skills and expertise from the Czech
Republic. We are proud of the quality of our services and of the certificates
ISO 9001, ISO 27001, ISO 20000 and PRINCE2, which underpin our
commitment to providing high-quality, sustainable services.
5
What are the specifics of “Data Oriented Solutions”?
› Data Warehouses (DWH)
› Operational Data Stores (ODS)
› Data Lakes (DL)
› Accumulation of Large and
Historical Data from
Heterogeneous Sources
› High Complexity & Robustness
6
Data oriented solutions are usually:
UNDOCUMENTED FRAGILE
COMPLICATED
7
What are the major challenges?
› Manual coding
› Too many people to
be organized
› Lack of transparency
8
Reduce the complexity: elementary building blocks
Data Oriented Solutions are:
orchestrated steps of transformations and storage of data
Data → Transformation → Data → Transformation → Data → Transformation → Data
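The chain of data and transformations above can be illustrated with a minimal sketch. It assumes in-memory rows and two made-up steps (`trim_names`, `drop_inactive`); none of these names come from the webinar, and a real solution would run such steps inside a database or data platform.

```python
from typing import Callable, Iterable

Row = dict
Step = Callable[[list], list]


def run_pipeline(data: list, steps: Iterable[Step]) -> list:
    """Apply each transformation in order; every intermediate result is data again."""
    for step in steps:
        data = step(data)
    return data


def trim_names(rows: list) -> list:
    # Hypothetical transformation: strip whitespace from the "name" field.
    return [{**r, "name": r["name"].strip()} for r in rows]


def drop_inactive(rows: list) -> list:
    # Hypothetical transformation: keep only rows flagged as active.
    return [r for r in rows if r["active"]]


source = [
    {"name": " Alice ", "active": True},
    {"name": "Bob", "active": False},
]
result = run_pipeline(source, [trim_names, drop_inactive])
```

Because every step takes data and returns data, steps can be reordered, reused and generated independently of one another.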
9
Decomposition to Metadata / Rules / Templates
Metadata Templates
Rules
10
Generating scripts like printing envelopes
Everything that
is variable (dynamic)
will be written in the
form of metadata.
Everything that
is repeated (static)
will be prepared in
templates.
Metadata Templates
Rules
Generator
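A minimal sketch of the envelope-printing idea, assuming a single full-load pattern and Python's standard `string.Template`. The template text, the table names and the `generate` function are illustrative assumptions, not the toolset presented in the webinar.

```python
from string import Template

# Static part: one template per load pattern (a hypothetical full-load pattern).
FULL_LOAD_TEMPLATE = Template(
    "INSERT INTO $target ($columns)\n"
    "SELECT $columns\n"
    "FROM $source;"
)

# Dynamic part: one metadata record per table (all names are made up).
METADATA = [
    {"source": "stage.customer", "target": "dwh.customer",
     "columns": ["id", "name"]},
    {"source": "stage.account", "target": "dwh.account",
     "columns": ["id", "customer_id", "balance"]},
]


def generate(template: Template, metadata: list) -> list:
    """Fill the template once per metadata record, like a mail merge."""
    return [
        template.substitute(
            source=m["source"],
            target=m["target"],
            columns=", ".join(m["columns"]),
        )
        for m in metadata
    ]


scripts = generate(FULL_LOAD_TEMPLATE, METADATA)
print(scripts[0])
```

Adding a table then means adding one metadata record; changing the load logic means changing one template and regenerating every script.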
11
Framework
FRAMEWORK DATA SOLUTION
Metadata Templates
Rules
Smart
Automation
Reverse
engineering
12
Investment into the framework
Chart labels: TIME & COSTS; SCOPE, QUANTITY, COMPLEXITY, ROBUSTNESS; FRAMEWORK SETUP
Gartner: 200% to 400% productivity increase
Source: Automating Data Warehouse Development, 2020, G00465794
13
Transformation Patterns
Technical
› Surrogate key assignment
› Consolidation (Deduplication)
› Historization
Mapping
› 1:1
› Column level mapping
› Join & Lookup
› Aggregation
› Union
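To make two of the technical patterns concrete, here is a minimal in-memory sketch of historization combined with surrogate key assignment. It assumes half-open validity intervals, a far-future date as the "still valid" marker, and the hypothetical column names `sk`, `valid_from` and `valid_to`. Deleted source rows are not handled, and a generated solution would normally express this in SQL.

```python
from datetime import date

OPEN_END = date(9999, 12, 31)  # assumed marker for "still valid"


def historize(history: list, snapshot: list, load_date: date, key: str = "id") -> list:
    """Close changed versions and append new ones; returns a new history list."""
    # Currently valid version per business key.
    current = {r[key]: r for r in history if r["valid_to"] == OPEN_END}
    # Surrogate key assignment: continue after the highest key used so far.
    next_sk = max((r["sk"] for r in history), default=0) + 1
    result = [dict(r) for r in history]  # copy so the input is not mutated

    for row in snapshot:
        old = current.get(row[key])
        if old is not None and all(old[c] == v for c, v in row.items()):
            continue  # nothing changed, keep the open version
        if old is not None:
            # Close the previous version at the load date.
            for r in result:
                if r["sk"] == old["sk"]:
                    r["valid_to"] = load_date
        result.append(
            {**row, "sk": next_sk, "valid_from": load_date, "valid_to": OPEN_END}
        )
        next_sk += 1
    return result


h1 = historize([], [{"id": 1, "city": "Prague"}], date(2021, 1, 1))
h2 = historize(h1, [{"id": 1, "city": "Brno"}], date(2021, 2, 1))
```

After the second load, `h2` holds a closed Prague version and an open Brno version of the same business key, each with its own surrogate key.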
14
Maintain the framework and automate the lifecycle!
BUSINESS ANALYST
TECHNICAL ANALYST
FRAMEWORK ENGINEER
OPERATIONS & SUPPORT
BUSINESS USERS
Business specs
Prepare METADATA
15
Enable agile development
SQUADS 1 to 5, each with: BUSINESS ANALYST, TECHNICAL ANALYST, BUSINESS USER
Shared by all squads: FRAMEWORK ENGINEER, OPERATIONS & SUPPORT
16
Case Study – DWH for Gambling Industry Regulation
› Data from ~80 public gambling companies loaded every 8 hours
› Maintaining over 25 TB of data in >100,000 database tables
› First version delivered in 5 months by a team of 5 engineers
› 99.9% of SQL code automatically generated
› "DATA_FRAME" - DWH Automation methodology & toolset
17
Advantages of Smart Data Automation
We do the analysis by reusing pre-defined patterns
The results of the analysis are always both human- and machine-readable
We foster prototyping
We enjoy the “license to make mistakes” at almost no cost
We provide immediate feedback in case of errors in the designed metadata
We eliminate manual coding
We streamline the whole development lifecycle and minimize time-to-market
We enable data lineage tracking even before the solution is implemented
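Two of the advantages above, immediate feedback on metadata errors and lineage before implementation, follow from the metadata being machine-readable. A minimal sketch, assuming column-level mappings stored as a plain dictionary; the column names and the functions `validate` and `upstream` are hypothetical.

```python
# Hypothetical column-level mapping metadata: target column -> source columns.
MAPPINGS = {
    "dwh.customer.full_name": ["stage.person.first_name", "stage.person.last_name"],
    "mart.report.customer": ["dwh.customer.full_name"],
}
# Columns that exist in the source systems (assumed to be known up front).
KNOWN_SOURCES = {"stage.person.first_name", "stage.person.last_name"}


def validate(mappings: dict, known_sources: set) -> list:
    """Return an error message for every input column that nothing defines."""
    defined = known_sources | set(mappings)
    return [
        f"{target}: unknown input {src}"
        for target, sources in mappings.items()
        for src in sources
        if src not in defined
    ]


def upstream(column: str, mappings: dict) -> set:
    """All columns that feed `column`, directly or indirectly."""
    seen = set()
    stack = [column]
    while stack:
        for src in mappings.get(stack.pop(), []):
            if src not in seen:  # also guards against cyclic metadata
                seen.add(src)
                stack.append(src)
    return seen


errors = validate(MAPPINGS, KNOWN_SOURCES)
lineage = upstream("mart.report.customer", MAPPINGS)
```

Both checks run on the metadata alone, so they are available before any SQL has been generated or deployed.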
18
Traditional Approach vs. Smart Automation
Traditional Approach: UNDOCUMENTED, FRAGILE, COMPLICATED
Smart Automation: TRANSPARENT, AGILE, ORGANIZED
19
Questions?
20
Webinar:
Automation of Data Oriented Solutions
We need your help to be better!
The webinar
has ended.
Thank you
very much for
attending!
Since you are here, please help us
improve our events and webinars by taking
a look at our short survey. We appreciate
your interest in helping us grow.
Contacts
www.bigdataforbanking.com
linkedin.com/company/profinit
petr.hajek@profinit.eu
www.profinit.eu
Petr Hájek
Senior Advisor
Information Management
Profinit EU, s.r.o.
Tychonova 2, 160 00 Prague 6 | Phone + 420 224 316 016
Web
www.profinit.eu
LinkedIn
linkedin.com/company/profinit
Twitter
twitter.com/Profinit_EU
Facebook
facebook.com/Profinit.EU
Youtube
Profinit EU
Thank you
for your attention
