Big Data Expo Utrecht – Thursday September 20th 2018
Is there a light at the end of the tunnel for the
information miner digging for data gold?
By Dave Vanhoudt
introduction
about me
competences & experiences gained through various positions, from employee to consultant
to business owner, in a variety of environments for a variety of customers
active in Data & Analytics since early 2000
likes to be a jack of all trades
connecting the dots between business & technology
helps organizations
improve their overall decision-making process and drive change through the smart
use of D&A technologies, methodologies and the latest trends and innovations
Big Data Expo Utrecht – The role of automation in a data infrastructure strategy - September 20th 2018
agenda
01 introduction: drowning in data, flooded by technology and still starving for knowledge
02 the necessity of automation: to solve the problem, we need a different kind of thinking
03 getting started: about the pain & gains of implementing a data warehouse solution
04 summary: key messages
Drowning in data and flooded
by technology, but still
starving for knowledge...
- John Naisbitt
Maybe George Orwell was right?
society is unprepared for knowledge extraction and
the demands for faster decision-making that our
customers and markets, our planet, will require for
high performance, competitive advantage and likely
survival in the future
- James Canton
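data heaven
what would it be like?
[figure: a quadrant spanned by the axes control and agility, with the zones data dictatorship, data desert, data democracy and data anarchy; data democracy is highlighted with the remark "I like this spot, and you?"]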
we need diversity of thought in the world to
face the new challenges
- Tim Berners-Lee
data warehouse automation
a definition & the general business case
What?
the process of accelerating & automating the
data warehouse development cycles
more than simply automating the development process (source system analysis,
design, development, testing, deployment, operations, impact analysis, change
management, documentation)
while assuring quality & consistency
Why?
aimed at improving productivity, reducing costs
and improving overall quality
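Because a data warehouse automation tool stores every data flow as metadata, a task such as impact analysis can be answered by walking the recorded lineage instead of reading code. The Python sketch below illustrates the idea under stated assumptions: the `LINEAGE` dictionary, the object names and the `downstream_impact` function are hypothetical and do not reflect WhereScape's actual repository or API.

```python
from collections import deque

# Hypothetical lineage metadata: each object maps to the objects loaded from it.
LINEAGE: dict[str, list[str]] = {
    "src_wms.orders": ["stg_orders"],
    "stg_orders": ["edw_order_fact"],
    "src_wms.items": ["stg_items"],
    "stg_items": ["edw_item_dim"],
    "edw_item_dim": ["edw_order_fact", "mart_stock"],
    "edw_order_fact": ["mart_sales"],
}


def downstream_impact(changed: str, lineage: dict[str, list[str]]) -> list[str]:
    """Return every object that directly or indirectly depends on `changed`,
    in breadth-first order (nearest dependents first), each listed once."""
    seen: set[str] = set()
    order: list[str] = []
    queue = deque(lineage.get(changed, []))
    while queue:
        obj = queue.popleft()
        if obj in seen:
            continue  # already reached via another path
        seen.add(obj)
        order.append(obj)
        queue.extend(lineage.get(obj, []))
    return order


if __name__ == "__main__":
    print(downstream_impact("src_wms.items", LINEAGE))
```

Changing `src_wms.items` flags the staging table, the dimension built from it, and every fact table or mart that reads that dimension, so the blast radius of a change is known before deployment.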
data warehouse automation
purple positioning to enable agility but remain in control (data democracy)
dimensions: people, knowledge, business, technology
DWA: prototype-friendly, re-usable, DWH platform leverage, scalable
Traditional ETL & DWH (Informatica, SSIS, ...): control, enterprise-aligned, good data quality, not fast, not cheap
Desktop ETL (e.g. Alteryx): agile / flexible, desktop spread, data quality @ risk, fast, relatively cheap
data warehouse automation
are classic ETL tools soon to be something of the past?
the need to integrate data has never been this high, and debugging &
maintaining code is as important as ever
the link with traditional sources remains vital, as they provide key business reference points
however impressive smart ETL jobs may be, a data warehouse
automation tool can do this more efficiently
and its use of native code allows easy integration with many different
languages & technologies
data warehouse automation frees & elevates the thinking to where
it needs to be (solving the critical business questions)
focus on source & target models
out-of-the-box living documentation
free from constraints & requirements imposed by the system architecture
It only serves to show what sort of person a man must be who can’t
even get testimonials. No, if a man brings references, it proves nothing.
But if he can’t, it proves a great deal.
- Joseph Pulitzer
getting started
the challenges
requirements are rapidly changing
data sources become more diverse & complex
more demanding business-facing capabilities
the process is way too costly
it takes too long to build a data warehouse
changes (after deployment) are very hard to make
with the implementation of WhereScape, we no
longer want to throw bodies at the problem; instead we want to use
brains and automate as much as possible
getting started
no bird soars too high if he soars with his own wings
– The project will perform a “lift and shift” of the SQL Server
architecture onto Teradata, using WhereScape instead of SSIS
as the ETL engine
– The first iteration will tackle the WMS system for an NDC
– Purchase of WhereScape during summer 2015
– Kick-off in October 2015
– Deployment process to UA operational as of January 4th 2016
getting started
work managed in a consistent and robust manner
getting started
coping with change
getting started
a future-proof approach
methodology: inflow factory idea
– Repeatable, metadata-driven configuration of individual data flows
based on predefined patterns
– The core deliverable is metadata, which is interpreted by WhereScape
– New or updated functionality is versioned & deployed from
Development to Acceptance to Production using a fully automated
process (based on the DevOps philosophy)
– The underlying database is used not only as the data storage platform
but also as the data processing engine, providing fully auditable code
& total control
re-usability: leverage of past work & investments
– We throw away all the code but keep the metadata
– Install & tune the new cloud templates
– The metadata needs some one-off treatments (e.g. move from the old
repository to the new repository, convert from on-prem DB SQL to
cloud-DB-compliant SQL, link the metadata to the new templates, ...)
– Generate new native code using the interface
flexibility: switching to another platform
– This inflow factory can easily be adapted to target an alternative
(on-prem / cloud) platform with the same inflow architecture but
different technical choices (e.g. loading patterns, data structures, ...)
– All design patterns are abstracted for end user(s) and implemented
with templates
– A one-off exercise is to convert the templates (e.g. staging loading,
EDW loading, new data structures)
– The same process as before is used to configure business rules &
design data flows
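The "throw away the code, keep the metadata" idea can be sketched as follows: the metadata describing a data flow stays the same, and only the template that renders it into native SQL is swapped when the target platform changes. This is a minimal Python illustration, assuming a hypothetical `LoadMetadata` structure and two hypothetical templates (delete-and-insert for on-prem, `CREATE OR REPLACE TABLE ... AS SELECT` for cloud); it is not how WhereScape templates are actually written, and real SQL dialects differ in detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadMetadata:
    """Hypothetical metadata for one staging data flow: the deliverable that is kept."""
    target: str
    source: str
    columns: tuple[str, ...]


# Hypothetical templates, one per target platform; only these change on a platform switch.
TEMPLATES: dict[str, str] = {
    "on_prem": (
        "DELETE FROM {target};\n"
        "INSERT INTO {target} ({cols})\n"
        "SELECT {cols} FROM {source};"
    ),
    "cloud": (
        "CREATE OR REPLACE TABLE {target} AS\n"
        "SELECT {cols} FROM {source};"
    ),
}


def generate(meta: LoadMetadata, platform: str) -> str:
    """Render native SQL for `platform` from the unchanged metadata."""
    return TEMPLATES[platform].format(
        target=meta.target,
        source=meta.source,
        cols=", ".join(meta.columns),
    )


# Hypothetical flow definitions (the metadata repository).
flows = [LoadMetadata("stg_orders", "src_wms.orders", ("order_id", "item_id", "qty"))]

if __name__ == "__main__":
    for platform in TEMPLATES:
        for flow in flows:
            print(f"-- {platform}\n{generate(flow, platform)}")
```

Under these assumptions, moving to a new platform amounts to a one-off template conversion, after which every flow is regenerated from its existing metadata.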
+
01 increased development speed at high quality to safeguard resource allocation
02 better business insights through intelligent automation that improves BI effectiveness
03 core DNA knowledge secured
04 a clear & consistent way of working that improves communication & collaboration
05 documentation, auditability & impact analysis
-
01 capex investment (buy versus build)
02 learning curve (size / complexity of project & team setup)
03 evaluate best fit for purpose
04 re-think (organization-specific) best practices
05 new technology doesn’t match the old way of thinking
key message
a solution for cumbersome, labor-intensive & low value work
one machine can do the work of 50 ordinary
(wo)men – no machine can do the work of 1
extraordinary (wo)man
key message
it is an instrument & facilitator of change
thank you

Bicos - Hear how a top sportswear company produced cutting-edge data infrastructures
