OUT OF THE DATA WAREHOUSES
INTO THE STREAMS AND LAKES
BIG DATA LDN
ABOUT ME
▸ Valo Product Manager
▸ Engineering background
▸ Big data for utilities
BIG DATA FOR UTILITIES
▸ Triggered by smart meter roll-out
▸ Quarterly meter reads become half-hourly
▸ Previously 80 million reads per year
▸ Now 350 billion reads per year
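The jump from 80 million to 350 billion reads follows directly from the change in read frequency. A rough check, assuming the meter count can be inferred from 80 million quarterly reads per year (the number of meters is not stated on the slide):

```python
# Rough check of the read volumes quoted above.
# Assumption: the meter count is inferred from 80 million quarterly reads per year.
QUARTERLY_READS_PER_YEAR = 80_000_000
READS_PER_METER_QUARTERLY = 4            # one read per quarter
READS_PER_METER_HALF_HOURLY = 48 * 365   # 48 half-hour slots per day

meters = QUARTERLY_READS_PER_YEAR // READS_PER_METER_QUARTERLY
half_hourly_reads_per_year = meters * READS_PER_METER_HALF_HOURLY

print(f"{meters:,} meters")
print(f"{half_hourly_reads_per_year:,} reads per year")
```

Under that assumption the result is roughly 350 billion reads per year, a growth factor of 4,380.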
WHAT TO DO WITH THE DATA?
TRADITIONAL DATA WAREHOUSING
Use a database running batch analytics on usage patterns:
‣ Is a dip in consumption due to a holiday or fraud?
‣ How to model region or cohort demand?
‣ How can consumption be nudged to fit supply patterns?
BIG DATA FOR FINANCE
▸ Infrastructure monitoring: 50 metrics × thousands of machines
▸ Server and application logs
▸ FX: $3.5 trillion per year
▸ High-frequency trading (HFT)
DATA WAREHOUSE ISSUES
▸ Time criticality of analytics
▸ Unbalanced resource use
▸ Unbounded data means moving goalposts
▸ Data from disparate sources
DATA STREAMS
▸ Analyse unbounded data “in motion”
▸ Seriously fast
▸ Often achieved using delta & approximation functions
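One reading of "delta functions" is an incremental update: each new value adjusts a running result, so the stream never has to be stored or rescanned. A minimal sketch using Welford's method for a running mean and variance; the class name and the sample readings are illustrative assumptions, not taken from the talk:

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Running mean and variance, updated one reading at a time (Welford's method)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the current mean

    def update(self, value: float) -> None:
        # Apply the delta between the new value and the current mean.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    @property
    def variance(self) -> float:
        # Population variance of everything seen so far.
        return self.m2 / self.count if self.count else 0.0


stats = RunningStats()
for reading_kwh in [0.4, 0.6, 0.5, 0.7]:  # made-up half-hourly readings
    stats.update(reading_kwh)

print(stats.mean, stats.variance)
```

Memory use stays constant however long the stream runs, which is what makes this style of function suitable for unbounded data.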
DATA WAREHOUSE ISSUES
▸ Time criticality of analytics
▸ Unbalanced resource use
▸ Unbounded data means moving goalposts
▸ Data from disparate sources
DATA LAKES
▸ Flexible storage for huge amounts of data
▸ Structured, semi-structured and unstructured
▸ Achieve speed through indexing
▸ No need to define schemas or data dictionaries up front
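Not defining schemas up front is often called schema-on-read: records are stored as they arrive and a structure is applied only when a query needs it. A minimal sketch of the idea, assuming newline-delimited records in an in-memory list; the field names and the `read_with_schema` helper are hypothetical:

```python
import json

# Raw records are stored as they arrive; no schema is declared up front.
raw_lake = [
    '{"meter_id": "m1", "kwh": 0.42, "ts": "2016-11-03T10:00:00Z"}',
    '{"host": "web-01", "level": "ERROR", "msg": "timeout"}',
    'free-text note from a field engineer',
]


def read_with_schema(lines, required_fields):
    """Parse at read time and yield only records that carry the requested fields."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # unstructured text stays in the lake; this query just skips it
        if isinstance(record, dict) and all(f in record for f in required_fields):
            yield record


meter_reads = list(read_with_schema(raw_lake, ["meter_id", "kwh"]))
print(meter_reads)
```

A different query could apply a different set of fields to the same raw records without any migration step.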
DATA WAREHOUSE ISSUES
▸ Unbalanced resource use
▸ Unbounded data means moving goalposts
▸ Time criticality of analytics
▸ Data from disparate sources
VALO
▸ Open lambda-style architecture
▸ Multiple repositories
▸ Real-time or historical analysis
▸ AP system (available and partition-tolerant, in CAP terms)
[Architecture diagram: analytics engine; time-series repository (TSR) and semi-structured repository (SSR); stream processing]
P2P CLUSTER
‣ Scale up or scale out
‣ Perform the analysis where the data is
SOUNDS EASY?
Query language
Data atomicity
Distributed execution engine
Distributed CRDTs
Distributed algorithms
Time semantics
Back pressure
Vector clocks
Cluster management
Transports
Distributed joins
Expression trees
Runtime code-gen
Off-heap memory
Gossip protocols
Consistent hashing
Statistical models
Actor systems
Real-time queries
Elasticity
Semi-structured repository
Query rewriting and optimisation
Data distribution
REST API
SDKs
Gap filling
Time-series repository
KV store
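Several of the items above are standard distributed-systems building blocks. As one illustration, consistent hashing decides which node owns a piece of data so that adding a node moves only a fraction of the keys. This is a generic sketch, not Valo's implementation; the node names, key names and replica count are assumptions:

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable hash so that every node computes the same ring positions.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class HashRing:
    """Consistent-hash ring with virtual nodes (replicas) per physical node."""

    def __init__(self, nodes, replicas=100):
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        # The owner is the first ring point clockwise from the key's hash.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["node-a", "node-b", "node-c"])
bigger = HashRing(["node-a", "node-b", "node-c", "node-d"])

keys = [f"meter-{i}" for i in range(1000)]
moved = [k for k in keys if ring.node_for(k) != bigger.node_for(k)]
print(f"{len(moved)} of {len(keys)} keys moved after adding node-d")
```

Only keys that now belong to the new node change owner; everything else stays where it was, which keeps rebalancing cheap when a cluster scales out.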
LAKES & STREAMS SIMPLIFIED
DATA LAKES AND STREAMS WITH VALO
▸ Streaming analytics on time-critical data
▸ Real-time and historical analysis
▸ Different repositories for different kinds of data
▸ Can ingest anything
THANK YOU
For more information, come and speak to us at stand 320

Big Data LDN 2016: Out of the Data Warehouses, and into the Data Lakes and Streams
