Big Data, Analytics and
4th Generation Data Warehousing
Martyn Jones
Big Data Spain 2015
agenda
∙ Imperatives.
∙ Data value chains.
∙ Resources.
∙ 4th Generation Data Warehousing.
∙ Analytics Data Store / Big Data.
∙ Information Supply Framework.
Friday 16th 12:30 - 13:15
#BDS15
Room 25 – Technical
Quote, Unquote
“It is not the consciousness of men that
determines their being, but, on the
contrary, their social being that
determines their consciousness.”
Karl Marx
business background
Media presence
Twitter @GoodStratTweet
http://www.goodstrat.com http://www.linkedin.com/grp/home?gid=8338976
http://www.itworld.com/blog/it-circus/
Quote, Unquote
“Do Big Data initiatives require a business case? If so, have you ever seen one?” –
Joseph Cotter, UK
“Big data - reinventing the wheel every day with a new and slightly different value for
Pi.” – Karl Snowsill, Australia
“The Big Data Contrarians - A place where you can find a way to cut through BIG Bull…”
– Sanjay Pandey, Canada
“If you had all the answers in the world...what would your question be?” – Yves de
Hondt, Belgium
“Big Data in bite size sessions - walk this way !!” – Steve Scholes, MBA, UK
“The only sane spot in the Big Data asylum.” – Dominic Vincent Ligot, Philippines
“Enforcing strict limits on koolaid consumption” – Gary Anderson, USA
the ages of data
B.C., Life of Brian, A.D.
Change
Insight
Potentially useful
Simplicity
Abundant
Volume, Velocity, Variety
framework
Obtain, Integrate, Analyse, Present
DATA
the road to Big Data success…
Strategic
Tactical
Operational Analytics
Architected
Managed
Integration
Data
scope
BIZ DATA
DW
BIG DATA
STATS
PRES
Business Imperatives
A good place to start
what’s important to business?
BE NOTICED
CASH FLOW
what else is important to business?
Market share
Differentiation
Ability to execute
Liquidity
Profitability
Time and place utility
React to competitive threats
Enhance service scope
Improving customer service
Respond to price pressure
Segmentation of n
Addressing short-term attention spans
Ability to respond to irrationality
Be noticed
Cash flow
Risk
Legislation
No press
Bad press
Customer centricity
Front office empowerment
Excellence
Channel excellence
Operational excellence
Product excellence
Cultures
IT business value
Base protection
Expansion
Diversification
Consolidation
Augmented Competitive Forces
Competition from within the industry
Suppliers
Buyers
Replacements
Potential entrants
Threat of replacement product or service
Threat of new entrants
Bargaining power of suppliers
Bargaining power of buyers
Sources: Michael Porter; Martyn R Jones and others
Rivalry with existing competitors
Pressure groups
Media
Government
Power to change the game
Exposure
McKinsey 7S Framework
Culture
differentiated capabilities
operating models
Customer segments
Channels
Products
Services
Organisational design
Processes
Data & information
Physical assets
Development
Deployment
Organisational design
Performance management
Information technology
Business model
Operating model
People model
Customers
Systems
People
Processes
Organisation
objectives
1. Information awareness corresponding to areas of operation and spheres of control
2. Comprehensive data and information supply framework
3. Continually seek to maintain and then improve data’s contribution to business
Business data everywhere
Where, when, what, who, why... how?
Data
Internal, External, Shared
Past, Present, Future
Data
Operational, Big Data, Dark Data
Online, Archived, Unmanaged
Data
Archives, Documents, Media
Social Media, Machine Log, Sensor
Business Applications
Data Storage
Public Web
Activities, Abstractions and Relations
Velocity
Volume
Variety
Adequacy
Ambiguity
Small
Precision
Availability
Accuracy
Relevance
Persistence
Reliability
Value
Obtuseness
Smart
Complexity
Utility
Descriptiveness
Big
Data
Facets of Big Data
Facets of Data
BIG DATA
Internet of Things
CLOUD
Statistics
Data Warehousing
Presentation
Data Supply Framework
The Data Warehouse
25 years... of sometimes getting it right
Enterprise Data Warehousing – AS IS
Subject oriented
Strategic decision making
Integrated
Time variant
Non-volatile
Operational Systems: Purchasing, HR, Credit, Order Processing, Marketing, Sales, Logistics, Billing
Data Warehouse: Arrangements, Products, Party, Time, Geography, Transactions
Subject oriented
Operational Systems:
Euro Account Customer: Village Bank GmbH, Country code: D
Mutual Fund Customer: Village Bankers, Region: Westphalia
NTIP Customer: Village Bank International, Country: Germany
Data Warehouse:
Account:
Number Customer Type
230956 441353 Euro
010555 441353 MF
291284 441353 NTIP
Party:
Number: 100441353
Name: Village Bank GmbH
Country: Germany
Integrated
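The integration shown on this slide, where three product systems each hold their own version of one customer and the warehouse holds a single party, can be sketched roughly as follows. This is only an illustration: the deck does not say how the records are matched, so the curated cross-reference table `XREF` and the helper `resolve_party` are assumptions, not the author's method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Party:
    number: int
    name: str
    country: str


# Hypothetical cross-reference: (source system, source customer name) -> party number.
# A hand-maintained lookup is assumed here; the slides do not describe the matching rule.
XREF = {
    ("Euro Account", "Village Bank GmbH"): 100441353,
    ("Mutual Fund", "Village Bankers"): 100441353,
    ("NTIP", "Village Bank International"): 100441353,
}

# The single integrated party record, as shown on the warehouse side of the slide.
PARTIES = {100441353: Party(100441353, "Village Bank GmbH", "Germany")}


def resolve_party(system: str, customer_name: str) -> Party:
    """Return the one warehouse party behind a source-system customer record."""
    return PARTIES[XREF[(system, customer_name)]]


# All three source records resolve to the same party.
resolved = {resolve_party(system, name) for (system, name) in XREF}
print(resolved)
```

The point of the sketch is that downstream queries only ever see `Party`, whichever operational system the record came from.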
Operational Systems Data Warehouse
Trading Activity Snapshots:
Date Security Amount
2006.09.01 MartyBank 79.000.000
2006.09.02 MartyBank 92.000.000
2006.09.03 MartyBank 44.000.000
2006.09.04 MartyBank 39.000.000
2006.09.05 MartyBank 80.000.000
Trading Activity: MartyBank
Time variant
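A time-variant store keeps every dated snapshot rather than overwriting the latest value, which makes "as of" questions answerable. The sketch below uses the snapshot rows from the slide; the function name `amount_as_of` and the rule "take the latest snapshot on or before the requested date" are assumptions made for illustration.

```python
from bisect import bisect_right
from datetime import date

# Snapshot rows from the slide: (snapshot date, security, amount).
SNAPSHOTS = [
    (date(2006, 9, 1), "MartyBank", 79_000_000),
    (date(2006, 9, 2), "MartyBank", 92_000_000),
    (date(2006, 9, 3), "MartyBank", 44_000_000),
    (date(2006, 9, 4), "MartyBank", 39_000_000),
    (date(2006, 9, 5), "MartyBank", 80_000_000),
]


def amount_as_of(security, as_of):
    """Return the amount from the latest snapshot on or before as_of, or None."""
    rows = sorted((d, amount) for d, sec, amount in SNAPSHOTS if sec == security)
    dates = [d for d, _ in rows]
    i = bisect_right(dates, as_of)  # number of snapshots dated <= as_of
    return rows[i - 1][1] if i else None


print(amount_as_of("MartyBank", date(2006, 9, 3)))
```

Because history is never overwritten, the same question asked about any past date keeps returning the same answer.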
Operational Systems: Order Processing (Create, Replace, Update, Delete)
Data Warehouse: Orders (Write, Read)
Non-volatile
Strategic decision support
Supporting strategy formulation,
choice and execution
Data Warehousing 2.0
Data Sources: Internal (Structured Data, Unstructured)
ETL: Extract, Transform, Load
Staging: Staged Data
ODS
EDW: Structured Data, Unstructured
Data Marts: Structured Data, Unstructured
Report Repository: Reports & Extracts, Stats
Service: Data selection and representation, Data analytics, Report set and extract creation, Push/Pull Technology, Visualisation, Annotation
Users: Internal, Clients, Other stakeholders
Metadata, Workflow/Process Control and CIW Management: Metadata, Process, EDW Management
The Data Warehouse
25 years... of sometimes getting it right… and wrong
Enterprise Data Warehousing – AS A BODGE
Get data
Wonder why it’s not meeting expectations
Dump data
Query data
Visualise data
Enterprise Data Warehousing – AS A BODGE
DW BODGER TEAM: We built a data dog house using Oracle and IBM technology and we called it a data warehouse
HADOOP TEAM: We can do data warehousing too and it will be cheaper, faster and smarter
Data Supply Framework
A data architecture for data sourcing,
transformation, integration, storage, search,
analysis and presentation
Data Supply Framework
Operational Data Store
Data Warehouse
Business Intelligence
Data logistics
Operational applications
Published by goodstrat.com Martyn Richard Jones 2015 – martynjones.eu Cambriano Energy 2015 - http://www.cambriano.es
All information and data consumers
All information consumers
All digital data
All data processing, enrichment and information creation
Internal digital data
Data Supply Framework
External digital data
Data logistics
Operational Data Store
Data Warehouse
Analytics Data Store
Data Marts
Statistical Analysis
Business Intelligence
Scenarios
Primary data flow
Secondary data flow
Operational applications
Data Supply Framework
Data Sources, 4th Generation Data Warehousing, Core Statistics
Diagram labels: OLTP, Staging, ODS, ETL, T/ETL, TL, EDW, DM, Complex data, Event Data, Infrastructure Data, Event Appliance, Signal Appliance, Message Adapter, Message Queue, Staging & Reduction, ET(A)L, ADS, Statistical analysis, Scenario 1, Scenario 2, Scenario 3, Write back
Cambriano Energy 2015
Core Data Sourcing
Comprehensive data acquisition and
transformation
DW 3.0 Information Supply Framework
Core Data Sourcing
• Most business data is highly structured
• Most business Big Data is web related
• There is a growing collection of tools for capturing, transforming and moving both
• The closer to the money that your data is, the higher its potential value
4th Generation
Data Warehousing
Providing a solid foundation for strategic,
tactical and operational decision making
Enterprise Data Warehousing – 4 GEN
Subject oriented
Strategic, tactical & operational support
Integrated
Time variance & time perspectives
Constrained volatility
Classification schema
Rule based transformation
4th Generation EDW
Interpretation
Prediction
Diagnosis
Design
Planning
Monitoring
Debugging
Repairing
Instruction
Control
Strategy
Tactics
Operations
Using, applying and measuring
Big Data
Predictive Analytics
Outcomes
EDW 4.0
E(A)TL
Using, applying and measuring
Big Data
Predictive analytics
Select predictions
Define trackable actions
Apply outcomes and actions to EDW 4
Accumulate campaign Big Data
Descriptive analytics
Select findings
Combine with trackable actions
Apply outcomes and actions to EDW 4
Run campaign
Analyse campaign and performance of Big Data analytics
Forecasts and results – from all perspectives
Chart: Cambriano Big Data Campaign 2015-2016 (monthly, 01/15 to 06/16; series: Forecast, Actual, Strategy, BD Costs, Benefit)
Values, Relativity, Dimensions, Hierarchies, Structures
Past, Future
Using, applying and measuring
• Combining Big Data analytics with Data Warehousing 4.0
• Planning and managing initiatives
• Measuring, analysing and reporting the effectiveness of business initiatives
• Measuring, analysing and reporting the tangible contribution of the Big Data analytics process to the creation of business value
Big Data and Core Statistics
A multi-faceted data theatre for ad-hoc,
speculative and immediate operational
analytics
DSF 4.0 Data Value Chains
DATA              INFORMATION              KNOWLEDGE
Requires context  Requires interpretation  Requires wisdom
Relevant          Correct                  Usable
Irrelevant        Incorrect                Useless
Meaningless       Misleading               Wrong
Value?            Value?                   Value?
DSF 4.0 Data Assets in MOSCOW
RISK, ASSET, SECURE, BAU, Assurance
Highest  High    Medium/Low  Very low/None
MUST     SHOULD  COULD       WON’T
Yes      Yes     Maybe       Maybe/No
Yes      Yes     Yes         Maybe/No
Yes      Yes     Yes         Maybe/No
DSF 4.0 Data Supply Framework
OLTP Applications
‘What if’ analysis
MIS / Reporting
Visualisation
Publication
Statistics
Data Science
Big Data
Small Data
Smart Data
This Data
That Data
That department
Messing with data
Map Fatten
Retrospect
Reports
Alerts
Visualisation
Analytics
This department
The other department
Map Reduce
DSF 4.0 Data Supply Framework
Data Sources – This element covers all the current sources, varieties and
volumes of data available that may be used to support the processes of
'challenge identification', 'option definition' and decision making, including
statistical analysis and scenario generation.
Core Data Warehousing – This is a suggested evolution path of the DW 2.0
model. It extends the Inmon paradigm to include not only unstructured and
complex data but also the information and outcomes derived from statistical
analysis performed outside of the 4th generation Data Warehousing
landscape.
Core Statistics – This element covers the core body of statistical competence,
especially but not only with regard to evolving data volumes, data velocity,
data quality and data variety.
INTO THE ZONE!
Complex Data – This is unstructured data, or data with a highly
complex structure, contained in documents and other complex
data artefacts, such as multimedia documents.
Event Data – This is an aspect of Enterprise Process Data,
typically at a fine-grained level of abstraction. Here are
the business process logs, the internet web activity logs and
other similar sources of event data. The volumes generated
by these sources will tend to be higher than other volumes
of data, and they are the ones currently associated with the
term Big Data, covering as it does the masses of
information generated by tracking even the most minor
piece of 'behavioural data' from, for example, someone
casually surfing a web site.
Infrastructure Data – This aspect includes data which could
well be described as signal data: continuous high-velocity
streams of potentially highly volatile data that might be
processed through complex event correlation and analysis
components.
Event Appliance – This puts the dynamic data collation,
selection and reduction functionality as close to the point of
event data generation as physically possible.
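As a rough illustration of collation, selection and reduction close to the source, the sketch below turns raw click events into per-session counts before anything is shipped onward. The event fields (`session`, `type`) and the function `reduce_events` are hypothetical; the deck does not prescribe an event format or a particular reduction rule.

```python
from collections import Counter
from typing import Iterable, Set


def reduce_events(events: Iterable[dict], keep_types: Set[str]) -> Counter:
    """Collate raw events into (session, type) counts, dropping unselected types."""
    counts = Counter()
    for event in events:
        if event["type"] in keep_types:  # selection
            counts[(event["session"], event["type"])] += 1  # collation and reduction
    return counts


# Hypothetical raw events as they might appear in a web activity log.
raw = [
    {"session": "s1", "type": "page_view"},
    {"session": "s1", "type": "mouse_move"},
    {"session": "s1", "type": "page_view"},
    {"session": "s2", "type": "purchase"},
]

summary = reduce_events(raw, keep_types={"page_view", "purchase"})
print(summary)
```

Only the summary needs to travel to staging, which is the reason for placing this step next to the point where the events are generated.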
Signal Appliance – This puts the dynamic data collation,
selection and reduction functionality as close to the point of
continuous streaming data generation as physically possible.
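One simple way to reduce a continuous signal at the source is a deadband filter, which forwards a reading only when it has moved far enough from the last value sent. This is an assumed technique chosen for illustration, not something the deck specifies; the name `deadband` and the threshold value are hypothetical.

```python
from typing import Iterable, Iterator


def deadband(readings: Iterable[float], threshold: float) -> Iterator[float]:
    """Yield a reading only when it differs from the last emitted one by more than threshold."""
    last = None
    for value in readings:
        if last is None or abs(value - last) > threshold:
            last = value
            yield value


# Hypothetical sensor stream; small fluctuations are suppressed.
stream = [20.0, 20.1, 20.05, 21.5, 21.6, 19.0]
reduced = list(deadband(stream, threshold=0.5))
print(reduced)
```

Because the filter is a generator, it can sit in front of an unbounded stream without buffering it.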
Distributed Inter-Process Communication – Different forms of
messaging allow high volumes of data to be transmitted in
near real time.
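The message adapter and message queue pairing can be pictured with a small producer and consumer. The sketch below is in-process only, using Python's standard `queue` and `threading` modules as a stand-in; a real deployment would use a distributed message broker, and the names `producer`, `consumer` and the `None` end-of-stream marker are assumptions.

```python
import queue
import threading


def producer(q, messages):
    """Adapter side: push messages onto the queue, then signal the end of the stream."""
    for message in messages:
        q.put(message)
    q.put(None)  # sentinel marking the end of the stream


def consumer(q, sink):
    """Receiving side: drain the queue into a sink until the sentinel arrives."""
    while True:
        message = q.get()
        if message is None:
            break
        sink.append(message)


q = queue.Queue(maxsize=100)  # bounded queue, so a slow consumer applies back-pressure
received = []
worker = threading.Thread(target=consumer, args=(q, received))
worker.start()
producer(q, [{"seq": i} for i in range(1000)])
worker.join()
print(len(received))
```

The bounded queue decouples the rate at which data is produced from the rate at which it is staged, which is the property the framework relies on for near real-time transfer.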
Staging and Reduction – Traditional data staging combined
with in-line data reduction.
ET(A)L – Extending ETL to include data analytics components
tightly integrated into parallel ETL job streams.
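A minimal sketch of the idea, assuming the analytics step is a simple outlier flag computed between transform and load. The step names, the sample rows and the z-score rule are illustrative assumptions, and the sketch runs sequentially rather than in the parallel job streams the slide describes.

```python
from statistics import mean, pstdev


def extract():
    # Hypothetical source rows; in practice these would come from a source system.
    return [
        {"id": 1, "amount": "100"},
        {"id": 2, "amount": "110"},
        {"id": 3, "amount": "90"},
        {"id": 4, "amount": "500"},
    ]


def transform(rows):
    """Conventional transform step: cast amounts to numbers."""
    return [{"id": r["id"], "amount": float(r["amount"])} for r in rows]


def analyse(rows, z_limit=1.5):
    """The (A) step: flag rows whose amount is more than z_limit standard deviations from the mean."""
    amounts = [r["amount"] for r in rows]
    mu, sigma = mean(amounts), pstdev(amounts)
    for r in rows:
        z = (r["amount"] - mu) / sigma if sigma else 0.0
        r["outlier"] = abs(z) > z_limit
    return rows


def load(rows, target):
    """Load step: append the enriched rows to the target store."""
    target.extend(rows)
    return target


warehouse = load(analyse(transform(extract())), [])
print([r["id"] for r in warehouse if r["outlier"]])
```

The design point is that the analytic result arrives in the target as an ordinary column, so it can be queried like any other loaded attribute.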
ADS – The Analytics Data Store.
1. Statistics oriented
2. Integrated by focus area
3. Variable volatility
4. Time variant
Statistical Analysis – Qualitative analysis. Diagnostic analysis,
predictive analysis, speculative analysis, data mining, data
exploration, modelling.
Scenarios and outcomes –
1. Snapshots of the outcomes of scenario analysis, the process of analysing
possible future events by generating alternative possible outcomes.
2. Captured outcomes of statistical analysis.
Write back – The ability to append, update and enrich data
within the Analytics Data Store, and to provide scenario data
to Core Data Warehousing.
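The write-back operations named here (append, update, enrich, and handing scenario data to the warehouse) can be sketched with a toy in-memory store. The class `AnalyticsDataStore`, its method names and the `scenario` field are hypothetical; a real ADS would sit on one of the database technologies listed later in the deck.

```python
class AnalyticsDataStore:
    """Toy in-memory stand-in for an ADS, keyed by record identifier."""

    def __init__(self):
        self.rows = {}

    def append(self, key, record):
        """Add a new record; refuse to overwrite an existing one."""
        if key in self.rows:
            raise KeyError(f"{key!r} already exists")
        self.rows[key] = dict(record)

    def update(self, key, **changes):
        """Change fields of an existing record in place."""
        self.rows[key].update(changes)

    def enrich(self, key, field, fn):
        """Derive a new field from the existing record."""
        self.rows[key][field] = fn(self.rows[key])

    def scenario_rows(self):
        """Return the records tagged as scenario data, for hand-off to the warehouse."""
        return [dict(r, key=k) for k, r in self.rows.items() if r.get("scenario")]


ads = AnalyticsDataStore()
ads.append("c1", {"spend": 120.0})
ads.update("c1", spend=150.0)
ads.enrich("c1", "segment", lambda r: "high" if r["spend"] > 100 else "low")
ads.append("s1", {"scenario": "Scenario 1", "forecast": 42.0})
to_warehouse = ads.scenario_rows()
print(to_warehouse)
```

Unlike the non-volatile warehouse, the store here is deliberately mutable, which matches the "variable volatility" property given for the ADS.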
DSF 4.0 – Core Statistics: Analytics Data Store
DSF 4.0 – Analytics Data Store
Distributed File System
Non-relational distributed file storage / NoSQL
DFS (Including ‘refractoring’ of Unix primitives)
Unix File Store
POSIX compliant
Document DBMS
Graph DBMS
Key-Value DBMS
In-memory Column Oriented Relational DBMS
Relational DBMS (MPP/SMP/Hybrid)
Object DBMS
POSIX compliant Unix / Linux primitives
Relational DBMS
DSF 4.0 – Analytics Data Store - Technologies
DSF 4.0 – What’s important?
Data Warehouse
Business Intelligence
Operational Data Store
Analytics Data Store
Statistical Analysis
Dark Data
Big Data
Internet of Things
Knowledge Management
Structured Intellectual Capital
Cloud
Summary
A good place to end, for now
DSF 4.0 Perspectives
• Look back
  • From now
  • From then
  • From before
  • From the future
• Look at now
• Look at near +/-
• Look forward
  • From now
  • From before
  • From the future
• Multiple worlds and universes
DSF 4.0 Perspectives
• What we got right
• What we can do better
• What we can retry at another time
• What we can drop
DSF 4.0 Perspectives – Look Back
Timeline: 2009 to 2020
From now
From the future
From then
From before
Dimensions
Classification
Data
Summary
• Never open up too many data fronts at the same time
• Iterate and take baby steps
• Use agile where it makes sense
• Keep everything as close to the business as possible
• Involve the business – continuously
Summary
• Consider everything
• Question everything
• Never stop hypothesising
• Never stop testing
• For every initiative have a business imperative
• Make continuous engagement and involvement a goal
Muchas gracias
Many thanks
Big Data Spain 2015
Big Data, Analytics and
4th Generation Data Warehousing
Big Data Spain 2015

Big data, Analytics and 4th Generation Data Warehousing

  • 1.
    Big Data, Analyticsand 4th Generation Data Warehousing Martyn Jones Big Data Spain 2015
  • 2.
    agenda ∙ Imperatives. ∙ Datavalue chains. ∙ Resources. ∙ 4th Generation Data Warehousing. ∙ Analytics Data Store / Big Data. ∙ Information Supply Framework. Friday 16th 12:30 - 13:15 #BDS15 Room 25 – Technical 0 5 10 15 20 25 30 35 40 45#BDS15
  • 3.
    Quote, Unquote "It isnot consciousness of men that determines their being, but, on the contrary, their social being that determines their consciousness.“ Karl Marx
  • 4.
  • 5.
    Media presence Twitter @GoodStratTweet http://www.goodstrat.comhttp://www.linkedin.com/grp/home?gid=8338976 http://www.itworld.com/blog/it-circus/
  • 6.
    Quote, Unquote “Do BigData initiatives require a business case? If so, have you ever seen one?” – Joseph Cotter, UK “Big data - reinventing the wheel every day with a new and slightly different value for Pi.” – Karl Snowsill, Australia “The Big Data Contrarians - A place where you can find a way to cut through BIG Bull…” – Sanjay Pandey, Canada “"If you had all the answers in the world...what would your question be?“ - Yves de Hondt – Belgium “Big Data in bite size sessions - walk this way !!” – Steve Scholes, MBA, UK “The only sane spot in the Big Data asylum.” – Dominic Vincent Ligot, Phillipines “Enforcing strict limits on koolaid consumption” – Gary Anderson, USA
  • 7.
    the ages ofdata B . C . L i f e o f B r i a n A . D .
  • 8.
    C h an g eI n s i g h t P o t e n t i a l l y u s e f u l Simplicity A b u n d a n t V o l u m e V e l o c i t y V a r i e t y
  • 9.
    framework O b ta i n I n t e g r a t e A n a l y s e P r e s e n t D A T A D A T A D A T A
  • 10.
    the road toBig Data success… S t r a t e g i c T a c t i c a l O p e r a t i o n a l A n a l y t i c s A r c h i t e c t e d M a n a g e d I n t e g r a t i o n D a t a
  • 11.
  • 12.
  • 13.
    what’s important tobusiness? BE NOTICED CASH FLOW BE NOTICED CASH FLOW BE NOTICED CASH FLOW
  • 14.
    what else isimportant to business? Market share Differentiation Ability to execute Liquidity Profitability Time and place utility React to competitive threats Enhance service scope Improving customer service Respond to price pressure Segmentation of n Addressing short-term attention spans Ability to respond to irrationality Be noticed Cash flow Risk Legislation No pressBad press Customer centricity Front office empowerment Excellence Channel excellence Operational excellence Product excellence Cultures IT business value Base protection Expansion Diversification Consolidation
  • 15.
    Augmented Competitive Forces Competitionfrom within the industry Suppliers Buyers Replacements Potential entrants Threat of replacement product or service Threat of new entrants Bargaining power Bargaining power Sources: Michael Porter;Martyn R Jones and others Rivalry with existing competitors Pressure groups Media Government Power to change the game Exposure
  • 16.
  • 17.
  • 18.
    operating models Customer segments Channels Products Services Organsationaldesign Processes Data & information Physical assets Development Deployment Organsational design Performance management Information technology Business model Operating model People model Customers Systems People Processes Organisation
  • 19.
    objectives 1. Information awarenesscorresponding to areas of operation and spheres of control 2. Comprehensive data and information supply framework 3. Continually seek to maintain and then improve data’s contribution to business
  • 20.
    Business data everywhere Where,when, what, who, why... how?
  • 21.
    Data I n te r n a l P a s t E x t e r n a l P r e s e n t S h a r e d F u t u r e
  • 22.
    Data O p er a t i o n a l O n l i n e B i g D a t a A r c h i v e d D a r k D a t a U n m a n a g e d
  • 23.
    Data Archives Social Media Documents Machine Log Media Sensor Business Applications Data Storage Public Web
  • 24.
  • 25.
  • 27.
    BIG DATA Internet of Things CLOUD Statistics Data Warehousing Presentation Data Supply Framework
  • 28.
    The Data Warehouse 25 years... of sometimes getting it right
  • 29.
    Enterprise Data Warehousing – AS IS Subject oriented Strategic decision making Integrated Time variant Non-volatile
  • 30.
    Operational Systems Data Warehouse Purchasing HR Credit Order Processing Marketing Sales Logistics Billing Arrangements Products Party Time Geography Transactions Subject oriented
  • 31.
    Operational Systems Data Warehouse Euro Account Customer: Customer: Village Bank GmbH Country code: D Mutual Fund Customer: Customer: Village Bankers Region: Westphalia NTIP Customer: Customer: Village Bank International Country: Germany Account: Number Customer Type 230956 441353 Euro 010555 441353 MF 291284 441353 NTIP Party: Number: 100441353 Name: Village Bank GmbH Country: Germany Integrated
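
The "Integrated" idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's implementation: the row layout, the `Party` class and the `integrate` helper are hypothetical, and matching on a shared customer number is an assumed rule. Only the sample values come from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Party:
    """One integrated customer record in the warehouse."""
    number: int
    name: str
    country: str
    accounts: list = field(default_factory=list)


# Source rows: (system, account number, customer number, name held locally).
# The values are taken from the slide; the tuple layout is an assumption.
SOURCE_ROWS = [
    ("Euro", "230956", 441353, "Village Bank GmbH"),
    ("MF", "010555", 441353, "Village Bankers"),
    ("NTIP", "291284", 441353, "Village Bank International"),
]


def integrate(rows, party_number, name, country):
    """Attach every account of one customer to a single Party."""
    customer_numbers = {row[2] for row in rows}
    if len(customer_numbers) != 1:
        raise ValueError("rows belong to more than one customer")
    party = Party(party_number, name, country)
    for system, account, _customer, _local_name in rows:
        party.accounts.append((account, system))
    return party


party = integrate(SOURCE_ROWS, 100441353, "Village Bank GmbH", "Germany")
print(party.name, len(party.accounts))
```

The three locally held names collapse into one agreed name and country on the Party, while the accounts keep their source type.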
  • 32.
    Operational Systems Data Warehouse Trading Activity Snapshots: Date Security Amount 2006.09.01 MartyBank 79.000.000 2006.09.02 MartyBank 92.000.000 2006.09.03 MartyBank 44.000.000 2006.09.04 MartyBank 39.000.000 2006.09.05 MartyBank 80.000.000 [Chart: Trading Activity: MartyBank] Time variant
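
A time-variant store keeps every snapshot, so a question can be answered "as of" any date. The sketch below is a minimal illustration using the snapshot values from the slide. The function name `amount_as_of` and the rule of taking the latest snapshot on or before the requested date are assumptions.

```python
from datetime import date

# Daily snapshots from the slide: (date, security, amount)
SNAPSHOTS = [
    (date(2006, 9, 1), "MartyBank", 79_000_000),
    (date(2006, 9, 2), "MartyBank", 92_000_000),
    (date(2006, 9, 3), "MartyBank", 44_000_000),
    (date(2006, 9, 4), "MartyBank", 39_000_000),
    (date(2006, 9, 5), "MartyBank", 80_000_000),
]


def amount_as_of(snapshots, security, as_of):
    """Return the latest snapshot amount on or before `as_of`, or None."""
    candidates = [
        (snap_date, amount)
        for snap_date, sec, amount in snapshots
        if sec == security and snap_date <= as_of
    ]
    if not candidates:
        return None
    # Pick the candidate with the most recent snapshot date.
    return max(candidates, key=lambda c: c[0])[1]


print(amount_as_of(SNAPSHOTS, "MartyBank", date(2006, 9, 3)))
```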
  • 33.
    Operational Systems Data Warehouse Order Processing Create Replace Update Delete Orders Read Read Read Read Write Read Non-volatile
  • 34.
    Strategic decision support Supporting strategy formulation, choice and execution
  • 35.
    Data Warehousing 2.0 Data Sources Structured Data ETL Extract Transform Load Internal ODS ODS EDW ETL Extract Transform Load Data Marts Structured Data Unstructured Data Mart Data Mart Report Repository Reports & Extracts Stats Data selection and representation Data analytics Report set and extract creation Service Push/Pull Technology Visualisation Annotation Users Internal Clients Other stakeholders Metadata, Workflow/Process Control and CIW Management Metadata Process EDW Management Staging Staged Data EDW Unstructured EDW Data Mart Structured Data Unstructured
  • 36.
    The Data Warehouse 25 years... of sometimes getting it right… and wrong
  • 37.
    Enterprise Data Warehousing – AS A BODGE Get data Wonder why it’s not meeting expectations Dump data Query data Visualise data
  • 38.
    Enterprise Data Warehousing – AS A BODGE DW BODGER TEAM: “We built a data dog house using Oracle and IBM technology and we called it a data warehouse.” HADOOP TEAM: “We can do data warehousing too and it will be cheaper, faster and smarter.”
  • 39.
    Data Supply Framework A data architecture for data sourcing, transformation, integration, storage, search, analysis and presentation
  • 40.
    Data Supply Framework Operational Data Store Data Warehouse Business Intelligence Data logistics Operational applications All information and data consumers All information consumers All digital data All data processing, enrichment and information creation Published by goodstrat.com Martyn Richard Jones 2015 – martynjones.eu Cambriano Energy 2015 - http://www.cambriano.es
  • 41.
    Internal digital data Data Supply Framework External digital data Data logistics Operational Data Store Data Warehouse Analytics Data Store Data Marts Statistical Analysis Business Intelligence Scenarios Data logistics Primary data flow Secondary data flow Operational applications
  • 42.
    EDW ADS DM DM DM Statistical analysis ETL T/ETL ET(A)L Staging & Reduction Signal Appliance Message Adapter Message Queue Infrastructure Data Write back Message Adapter Message Queue OLTP Staging ODS ETL T/ETL Complex data Event Data Event Appliance Scenario 1 Scenario 2 Scenario 3 TL Data Supply Framework Data Sources 4th Generation Data Warehousing Data Sources Core Statistics
  • 43.
    Core Data Sourcing Comprehensive data acquisition and transformation
  • 44.
    ADS Statistical analysis ET(A)L Staging & Reduction Signal Appliance Message Adapter Message Queue Infrastructure Data Write back Complex data Event Data Event Appliance Scenario 1 Scenario 2 Scenario 3 DW 3.0 Information Supply Framework Core Data Warehousing Core Statistics Data Sources Message Adapter
  • 45.
    Core Data Sourcing • Most business data is highly structured • Most business Big Data is web related • There is a growing collection of tools for capturing, transforming and moving both • The closer to the money that your data is, the higher its potential value
  • 47.
    4th Generation Data Warehousing Providing a solid foundation for strategic, tactical and operational decision making
  • 48.
    Enterprise Data Warehousing – 4 GEN Subject oriented Strategic, tactical & operational support Integrated Time variance & time perspectives Constrained volatility Classification schema Rule based transformation
  • 49.
  • 50.
    Using, applying and measuring Big Data Big Data Big Data Predictive Analytics Predictive Analytics Outcomes EDW 4.0 EDW 4.0 E(A)TL
  • 51.
    Using, applying and measuring Big Data Predictive analytics Select predictions Define trackable actions Apply outcomes and actions to EDW 4 Accumulate campaign Big Data Descriptive analytics Select findings Combine with trackable actions Apply outcomes and actions to EDW 4 Run campaign Analyse campaign and performance of Big Data analytics
  • 52.
    Forecasts and results – from all perspectives [Chart: Cambriano Big Data Campaign 2015-2016; monthly axis 01/15 to 06/16; value axis -400 to 500; legend: Forecast, Actual] Strategy BD Costs Benefit Values Relativity Dimensions Hierarchies Structures Past Future
  • 53.
    Using, applying and measuring • Combining Big Data analytics with Data Warehousing 4.0 • Planning and managing initiatives • Measuring, analysing and reporting the effectiveness of business initiatives • Measuring, analysing and reporting the tangible contribution of the Big Data analytics process to the creation of business value
  • 54.
    Big Data and Core Statistics A multi-faceted data theatre for ad-hoc, speculative and immediate operational analytics
  • 56.
    DSF 4.0 Data Value Chains DATA INFORMATION KNOWLEDGE Requires context Requires interpretation Requires wisdom Relevant Correct Usable Irrelevant Incorrect Useless Meaningless Misleading Wrong Value? Value? Value?
  • 57.
    DSF 4.0 Data Assets in MOSCOW RISK ASSET SECURE BAU Assurance Highest High Medium/Low Very low/None MUST SHOULD COULD WON’T Yes Yes Maybe Maybe/No Yes Yes Yes Maybe/No Yes Yes Yes Maybe/No
  • 59.
    DSF 4.0 Data Supply Framework External digital data Data logistics Operational Data Store Data Warehouse Analytics Data Store Data Marts Statistical Analysis Business Intelligence Scenarios Data logistics Primary data flow Secondary data flow Operational applications OLTP Applications ‘What if’ analysis MIS / Reporting Visualisation Publication All digital data
  • 60.
    Internal digital data DSF 4.0 Data Supply Framework External digital data Data logistics Operational Data Store Data Warehouse Analytics Data Store Data Marts Statistical Analysis Business Intelligence Scenarios Data logistics Primary data flow Secondary data flow Operational applications All information consumers All digital data
  • 61.
    Internal digital data External digital data Primary data flow Secondary data flow Statistics Data Science Big Data Small Data Smart Data This Data That Data That department Messing with data Map Fatten Retrospect Reports Alerts Visualisation Analytics This department The other department Map Reduce DSF 4.0 Data Supply Framework
  • 62.
    DSF 4.0 Data Supply Framework Operational Data Store Data Warehouse Business Intelligence Data logistics Operational applications All information and data consumers All information consumers All digital data All data processing, enrichment and information creation
  • 63.
    DSF 4.0 Data Supply Framework EDW ADS DM DM DM Statistical analysis ETL T/ETL ET(A)L Staging & Reduction Signal Appliance Message Adapter Message Queue Infrastructure Data Write back Message Adapter Message Queue OLTP Staging ODS ETL T/ETL Complex data Event Data Event Appliance Scenario 1 Scenario 2 Scenario 3 TL Core Data Warehousing Core Statistics Data Sources Message Adapter Message Adapter
  • 65.
    DSF 4.0 Data Supply Framework. Data Sources – This element covers all the current sources, varieties and volumes of data available which may be used to support processes of 'challenge identification', 'option definition' and decision making, including statistical analysis and scenario generation.
  • 66.
    DSF 4.0 Data Supply Framework. Core Data Warehousing – This is a suggested evolution path of the DW 2.0 model. It extends the Inmon paradigm to not only include unstructured and complex data but also the information and outcomes derived from statistical analysis performed outside of the 4th generation Data Warehousing landscape.
  • 67.
    DSF 4.0 Data Supply Framework. Core Statistics – This element covers the core body of statistical competence, especially but not only with regard to evolving data volumes, data velocity and speed, data quality and data variety.
  • 68.
    DSF 4.0 Data Supply Framework INTO THE ZONE! ADS Statistical analysis ET(A)L Staging & Reduction Signal Appliance Message Adapter Message Queue Infrastructure Data Write back Complex data Event Data Event Appliance Scenario 1 Scenario 2 Scenario 3 Core Data Warehousing Core Statistics Data Sources Message Adapter
  • 69.
    DSF 4.0 Data Supply Framework. Complex Data – This is unstructured data, or data with a highly complex structure, contained in documents and other complex data artefacts, such as multimedia documents.
  • 70.
    DSF 4.0 Data Supply Framework. Event Data – This is an aspect of Enterprise Process Data, typically at a fine-grained level of abstraction. Here are the business process logs, the internet web activity logs and other similar sources of event data. The volumes generated by these sources tend to be higher than other volumes of data, and are those currently associated with the term Big Data, covering as it does the masses of information generated by tracking even the most minor piece of 'behavioural data' from, for example, someone casually surfing a web site.
  • 71.
    DSF 4.0 Data Supply Framework. Infrastructure Data – This aspect includes data which could well be described as signal data: continuous high-velocity streams of potentially highly volatile data that might be processed through complex event correlation and analysis components.
  • 72.
    DSF 4.0 Data Supply Framework. Event Appliance – This puts the dynamic data collation, selection and reduction functionality as close to the point of event data generation as physically possible.
  • 73.
    DSF 4.0 Data Supply Framework. Signal Appliance – This puts the dynamic data collation, selection and reduction functionality as close to the point of continuous streaming data generation as physically possible.
  • 74.
    DSF 4.0 Data Supply Framework. Distributed Inter Process Communication – Different forms of messaging allow high volumes of data to be transmitted in near real time.
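
The message adapter and message queue pairing can be sketched with the Python standard library. This is an in-process stand-in only: `queue.Queue` and two threads play the role of a real distributed messaging layer, and the event shape and the `None` sentinel are assumptions made for the example.

```python
import queue
import threading


def producer(q, events):
    """Message adapter on the source side: push events onto the queue."""
    for event in events:
        q.put(event)
    q.put(None)  # sentinel: no more messages


def consumer(q, received):
    """Message adapter on the receiving side: drain the queue as data arrives."""
    while True:
        message = q.get()
        if message is None:
            break
        received.append(message)


message_queue = queue.Queue(maxsize=100)  # bounded, so a slow consumer applies back-pressure
events = [{"id": i} for i in range(5)]
received = []

t_consumer = threading.Thread(target=consumer, args=(message_queue, received))
t_producer = threading.Thread(target=producer, args=(message_queue, events))
t_consumer.start()
t_producer.start()
t_producer.join()
t_consumer.join()

print(len(received))
```

With a single producer and a single consumer the messages arrive in the order they were sent. A real deployment would replace the in-memory queue with a network message broker.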
  • 75.
    DSF 4.0 Data Supply Framework. Staging and Reduction – Traditional data staging combined with in-line data reduction.
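
In-line reduction during staging might look like the sketch below. It is illustrative only: the event fields, the `stage_and_reduce` name and the choice of "filter by type, then count per user" as the reduction are assumptions, not something the slides specify.

```python
from collections import Counter


def stage_and_reduce(events, keep_types):
    """Stage raw events while reducing them in-line.

    Selection: drop event types that are not needed downstream.
    Reduction: collapse what remains into per-(user, type) counts.
    """
    counts = Counter()
    for event in events:
        if event["type"] in keep_types:
            counts[(event["user"], event["type"])] += 1
    return dict(counts)


# Hypothetical raw web events.
raw_events = [
    {"user": "u1", "type": "page_view"},
    {"user": "u1", "type": "page_view"},
    {"user": "u1", "type": "mouse_move"},
    {"user": "u2", "type": "purchase"},
]

staged = stage_and_reduce(raw_events, keep_types={"page_view", "purchase"})
print(staged)
```

Four raw events become two staged rows, which is the point of reducing before the data reaches the Analytics Data Store.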
  • 76.
    DSF 4.0 Data Supply Framework. ET(A)L – Extending ETL to include data analytics components tightly integrated into parallel ETL job streams.
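
One way to read ET(A)L is as an ETL job with an analytics step between transform and load. The sketch below assumes exactly that and nothing more: the sample rows, the z-score outlier rule and the 1.5 threshold are hypothetical, and the parallel job streams mentioned on the slide are left out for brevity.

```python
from statistics import mean, pstdev


def extract():
    # Hypothetical source rows: (customer, order value as text)
    return [("a", "100"), ("b", "120"), ("c", "110"), ("d", "900")]


def transform(rows):
    """Convert the raw text values to numbers."""
    return [(customer, float(value)) for customer, value in rows]


def analyse(rows, threshold=1.5):
    """Analytics step: attach a z-score and an outlier flag to each row."""
    values = [value for _, value in rows]
    mu, sigma = mean(values), pstdev(values)
    analysed = []
    for customer, value in rows:
        z = 0.0 if sigma == 0 else (value - mu) / sigma
        analysed.append(
            {"customer": customer, "value": value, "z": z, "outlier": abs(z) > threshold}
        )
    return analysed


def load(rows, target):
    """Append the analysed rows to the target store (a list here)."""
    target.extend(rows)
    return target


warehouse = load(analyse(transform(extract())), [])
print([row["customer"] for row in warehouse if row["outlier"]])
```

The loaded rows carry the analytic result with them, so downstream consumers do not need to recompute it.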
  • 77.
    DSF 4.0 Data Supply Framework. ADS – The Analytics Data Store. 1. Statistics oriented 2. Integrated by focus area 3. Variable volatility 4. Time variant
  • 78.
    DSF 4.0 Data Supply Framework. Statistical Analysis – Qualitative analysis. Diagnostic analysis, predictive analysis, speculative analysis, data mining, data exploration, modelling.
  • 79.
    DSF 4.0 Data Supply Framework. Scenarios and outcomes – 1. Snapshots of the outcomes of scenario analysis, that is, the process of analysing possible future events by generating alternative possible outcomes. 2. Captured outcomes of statistical analysis.
  • 80.
    DSF 4.0 Data Supply Framework. Write back – The ability to append data, update data and enrich data within the Analytics Data Store, and to provide scenario data to Core Data Warehousing.
  • 81.
    DSF 4.0 – Core Statistics: Analytics Data Store
  • 82.
    DSF 4.0 – Analytics Data Store Distributed File System Non-relational distributed file storage / NoSQL DFS (Including ‘refactoring’ of Unix primitives) Unix File Store POSIX compliant Document DBMS Graph DBMS Key-Value DBMS In-memory Column Oriented Relational DBMS Relational DBMS (MPP/SMP/Hybrid) Object DBMS POSIX compliant Unix / Linux primitives Relational DBMS
  • 83.
    DSF 4.0 – Analytics Data Store - Technologies
  • 84.
    DSF 4.0 – What’s important? Data Warehouse Business Intelligence Operational Data Store Analytics Data Store Statistical Analysis Dark Data Big Data Internet of Things Knowledge Management Structured Intellectual Capital Cloud
  • 85.
    Summary A good place to end, for now
  • 87.
    DSF 4.0 Perspectives Look back  From now  From then  From before  From the future  Look at now  Look at near +/-  Look foward  From now  From before  From the future  Multiple worlds and universes
  • 88.
    DSF 4.0 Perspectives What we got right  What we can do better  What we can retry at another time  What we can drop
  • 89.
    DSF 4.0 Perspectives – Look Back [Timeline: 2009 to 2020] From now From the future From then Dimensions Classification From before Data
  • 90.
    Summary • Never open up too many data fronts at the same time • Iterate and take baby steps • Use agile where it makes sense • Keep everything as close to the business as possible • Involve the business – continuously
  • 91.
    Summary • Consider everything • Question everything • Never stop hypothesising • Never stop testing • For every initiative have a business imperative • Make continuous engagement and involvement a goal
  • 92.
  • 93.
    Big Data, Analytics and 4th Generation Data Warehousing Big Data Spain 2015

Editor's Notes

  • #2 Make sure this is visible on the laptop screen!
  • #4 Our knowledge, and especially our experience, influences how we see things, what interests us and what we try to communicate
  • #5 Who we work for, who we work with, what we do… will have significance
  • #6 In my case this has resulted in curious initiatives that have been enabled by the age of mass communication and a plethora of available data and information. I blog, tweet and run a small Big Data community on LinkedIn called the Big Data Contrarians... Not so much contrary to Big Data, but contrary to the worst excesses of hype and the application of the popular MIUAYGA methodology (also known as Make It Up As You Go Along)
  • #7 Here are some quotes from my fellow members, solicited in fact for this very event... “Suggestions wanted! I'm doing a talk next week at Big Data Spain 2015 (it's on .. well laters). I would like to mention The Big Data Contrarians. Would anyone like to give me some supporting quotes for our group? The more outrageous the better... But minimalist cynicism can also do it. Let me know before Friday. Thanks and best regards, Martyn”
  • #8 B.C.: Before computers... and all that jazz. Life of Brian: the age of the electronic brain. A.D.: All data, everywhere... like it or not.
  • #9 How do we go from abundance to change worth wanting? How do we harness, measure and manage the ever more volatile and changing environment with the abundance of data (some of it of value) that we have at our disposal?
  • #10 Integration of data from heterogeneous sources. Architecture and management. A cohesive, coherent and realisable framework. The business importance of the well-architected and well-managed integration of data from heterogeneous data sources into a cohesive, coherent and realisable information supply framework. Data integration, not just data affinity. A coherent, cohesive and realisable approach to bringing Big Data into the mainstream of business data architecture, data management and decision support.
  • #11 The key to the success of Big Data is in the well-architected and managed integration of Big Data into mainstream strategic, tactical and operational analytics and decision support.
  • #17 The model is based on the theory that, for an organization to perform well, these seven elements need to be aligned and mutually reinforcing. So, the model can be used to help identify what needs to be realigned to improve performance, or to maintain alignment (and performance) during other types of change. Usage: improve the performance of a company; examine the likely effects of future changes within a company; align departments and processes during a merger or acquisition; determine how best to implement a proposed strategy.
  • #21 It’s also about the data. Where do we want to go? When do we want to get there? What do we want to achieve? How do we reach our goals? How do we want to reach our goals?
  • #23 Operational data – ERP, Legacy OLTP, departmental server databases. Items on the dark data ticket include: email; instant messages; documents; SharePoint content; content of collaboration databases; ZIP files; log files; archived sensor and signal data; archived web content; aged audit trails; operational database backups – full and incremental; roll-back, redo and spooled data files; sunsetted applications (code and documentation); partially developed and then abandoned applications; and code snippets.
  • #45 For the sake of simplicity there are three explicitly named data sources in the diagram: Complex data, Event data and Infrastructure data. There can of course be more, and the Enterprise Data Warehouse or its dependent Data Marts may also act as a data source, but for the purpose of this blog piece I have limited the number to three. Complex Data – This is unstructured data, or data with a highly complex structure, contained in documents and other complex data artefacts, such as multimedia documents. Event Data – This is an aspect of Enterprise Process Data, typically at a fine-grained level of abstraction. Here are the business process logs, the internet web activity logs and other similar sources of event data. The volumes generated by these sources tend to be higher than other volumes of data, and are those currently associated with the term Big Data, covering as it does the masses of information generated by tracking even the most minor piece of 'behavioural data' from, for example, someone casually surfing a web site. Infrastructure Data – This aspect includes data which could well be described as signal data: continuous high-velocity streams of potentially highly volatile data that might be processed through complex event correlation and analysis components.