Analyzing petabytes of smart meter data using
Cloud Bigtable, Cloud Dataflow, and BigQuery
Edwin Poot & Erik van Wijk, Energyworx
Max Luebbe, Google
2
ENERGY TRANSITION IN PROGRESS
3
● rise of renewable energy sources
● regulation & market demands
● competition & increased costs
● intelligent devices in the home or along the utilities infrastructure (“Internet of Things”)
● two-way flow of information instead of one-way
● increase of consumption
4
Top 5 industry challenges
1. increasing density brings increasing data quality problems
2. strict regulations for safeguarding user privacy
3. redistribution of economic power and energy demand
4. rising competition between distributed and central
5. innovation outpaces regulation
www.energyworx.com
China: 435 M
USA: 132 M
Japan: 58.7 M
France: 35 M
UK: 53 M
NL: 8 M
Italy: 32 M
Ontario: 4.7 M
British Columbia: 1.2 M
Quebec: 3.8 M
Germany: 50 M
5
conventional utility systems cannot cope with this data diversity and endless
stream of all types, shapes and sizes
smart meters
smart grid equipment
sensors
home automation
multichannel customer interactions
consumers’ usage behavior
weather
social
spatial
creating a single, centralized view of data, accessible to many and for many use cases, is the key to success
6
“We enable the energy evolution by
uncovering and monetizing the hidden
value of your data!”
ingest, process, analyze & learn
7
8
Enabling data-driven business models for the Energy & Utility industry since 2012
Offices in the Netherlands and in the United States
Delivering a revolutionary data management & intelligence cloud service disrupting the global Energy & Utilities market
Pushing out established vendors using pure-play SaaS
Creating actionable information, sparking new business concepts and models
Crunching data without being limited by scale, speed and obsolete pricing models
9
energyworx and the energy value chain
Value chain: generation, transmission, trading, distribution, supply
Meter Data Management
Renewable Energy Management
Social Energy
Consumer Engagement
imbalances
settlements
Energy insights for wholesale
connections
10
ENERGY INTELLIGENCE
ENERGY PROSUMERS
& RETAILERS
Demand Response (price)
Energy Insights
Demand Response (load)
Grid Insights
Renewables Engagement
Gamification Benchmarking
Balancing Congestion
Optimization Anomalies
MARKETS & SOLUTIONS
ENERGY DATA MANAGEMENT
Meter Data Management Energy Data Hub
ENERGY SYSTEM
OPERATORS
11
OUR ADVANTAGES
● Always supporting the latest IoT products and/or equipment
● Protocol agnostic data ingestion and limitless computation capacity
● Cloud Machine learning to support new business concepts and models
● Pay as you grow SaaS model, so no large upfront investments
12
Our platform
13
PLATFORM EVOLUTION HIGHLIGHTS
2012 2013 2014 2015 2016
- batched data
- temporal aggregations
- VEE
- utility connectivity
- API
- multi-tenancy
- permissions
- custom querying
- grouping
- tag properties
- datalabs (EDA)
- Machine learning
- CloudML
- (A)DR
- streaming data
- pseudonymisation
- tagging
- analytics
- dynamic profiling
- PayPerUse model
- IoT devices
- many new adapters
- performance
- web console
- Sheets addon
Data ingestion & management
Insights & analysis
Intelligence & IoT control
14
DELIVER A DATA MANAGEMENT & ANALYTICS SERVICE FOR ENERGY & UTILITY COMPANIES
PUBLIC & PRIVATE CLOUD
15
16
Big Data Challenges at Google
17
Google's mission to "organize the world’s information" presents new challenges.
18
Big Data technologies invented at Google
2002 2004 2006 2008 2010 2012 2013
GFS
MapReduce
Bigtable
Colossus
Dremel
Flume
Millwheel
19
How do we … ?
22
… build a 100TB+ filesystem?
Need: Google was building enormous data sets, and needed an
abstracted way to store and access at scale.
Solution: GFS (replaced by higher-scale Colossus in 2010)
Google Cloud Storage
25
… build a petabyte database?
Need: Massive data index files took weeks to rebuild. We needed random read/write access.
Solution: Bigtable (internal service launched 2006)
Google Cloud Bigtable
28
… query a trillion rows in seconds?
Need: Ad hoc queries over massive quantities of data, in just seconds.
Solution: Dremel
Google BigQuery
31
… build data-processing at Google scale?
Need: Process petabytes of static and streaming data, quickly.
Solution: MapReduce, Flume, and Millwheel
Google Cloud Dataflow
32
Imagine what one can build...
33
... when scale is a solved problem.
34
Google Cloud Platform is the same infrastructure
Cloud Storage
BigQuery
Cloud Dataflow
Cloud Bigtable
35
Cloud Bigtable is the same service Google uses
Cloud Bigtable
Bigtable Service
36
What is Cloud Bigtable?
NoSQL database for large datasets / large throughput
Supports sequential scans
Auto-adjusts to access patterns
37
How does Cloud Bigtable work?
Clients: multiple clients
Processing: Bigtable Nodes
Storage: Colossus Filesystem
38
Cloud Bigtable learns access patterns...
Clients: multiple clients
Processing: Nodes
Storage: Filesystem (A, B, C, D, E)
A B C D E
39
… and rebalances data accordingly
Clients: multiple clients
Processing: Nodes
Storage: Filesystem (A, B, C, D, E, with B and C repeated)
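The slides only state that Cloud Bigtable learns access patterns and rebalances data across nodes; they do not describe the algorithm. As a rough illustration of the idea, and not Bigtable's actual method, the sketch below spreads data ranges over nodes with a simple greedy rule: the busiest range goes to the currently least-loaded node. The function name `rebalance` and the load figures are hypothetical.

```python
import heapq


def rebalance(tablet_load: dict[str, float], node_count: int) -> dict[int, list[str]]:
    """Greedy illustration: assign the heaviest ranges first, each to the
    node with the lowest total load so far. Not Bigtable's real algorithm."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # Min-heap of (total load, node index) so the lightest node pops first.
    heap = [(0.0, node) for node in range(node_count)]
    heapq.heapify(heap)
    assignment: dict[int, list[str]] = {node: [] for node in range(node_count)}
    # Sort by load descending, then by name for a deterministic result.
    for tablet, load in sorted(tablet_load.items(), key=lambda kv: (-kv[1], kv[0])):
        total, node = heapq.heappop(heap)
        assignment[node].append(tablet)
        heapq.heappush(heap, (total + load, node))
    return assignment


# Hypothetical observed load per data range (for example, requests per second).
observed = {"A": 10, "B": 50, "C": 40, "D": 10, "E": 10}
layout = rebalance(observed, node_count=3)
```

With these made-up loads the two hot ranges B and C each get a node to themselves, while A, D and E share the third.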
40
Throughput can be controlled by node count
Chart: QPS (0 to 80,000) against Bigtable Nodes (0 to 8)
41
Throughput can be controlled by node count
Chart: QPS (0 to 400,000) against Bigtable Nodes (0 to 40)
42
Throughput can be controlled by node count
Chart: QPS (0 to 4,000,000) against Bigtable Nodes (0 to 400)
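Read together, the axes of the three throughput charts suggest roughly linear scaling of about 10,000 QPS per node (80,000 QPS at 8 nodes, 4,000,000 QPS at 400 nodes). A minimal sizing sketch under that assumption follows. The per-node figure is read off the chart axes and is not a guaranteed rate; `nodes_for_qps` and its `headroom` parameter are hypothetical names.

```python
import math

# Assumption: about 10,000 QPS per node, read off the chart axes
# (80,000 QPS at 8 nodes). Real throughput depends on the workload.
QPS_PER_NODE = 10_000


def nodes_for_qps(target_qps: int, headroom: float = 0.0) -> int:
    """Estimate the node count for a target QPS, assuming linear scaling.

    headroom is an optional fraction of spare capacity (0.25 means 25% extra).
    """
    if target_qps < 0 or headroom < 0:
        raise ValueError("target_qps and headroom must be non-negative")
    required = target_qps * (1.0 + headroom) / QPS_PER_NODE
    # Never size below one node.
    return max(1, math.ceil(required))
```

For example, `nodes_for_qps(400_000)` gives 40 nodes, matching the upper end of the second chart.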
43
Years of engineering to...
Teach Bigtable to configure itself
Isolate performance from “noisy neighbors”
React automatically to new patterns, splitting and balancing
Cloud Bigtable
44
Google has had an internal cloud for over a decade
The same engineering that has made our internal services better makes our Cloud better:
Simpler control planes
Multi-tenancy
Adapts to large, new patterns
45
Why we chose Google
46
Why did we choose Google?
● Fastest with consistent performance
● Competitive and transparent pricing
● Autoscale to millions of users (and back)
● Unlimited flexible storage and caching
● Big Data & Machine Learning capabilities
● Development SDK & tools
● 24/7 access to expert support resources
47
5 things we’ve learned along the way
1. SKILLS, KNOWLEDGE & TRAINING REQUIRED: understand all PaaS possibilities and components to prevent reinventing what already exists and to speed up implementation & migration
2. IMPLEMENTATION TIME: shorter release cycles require smaller feature sets per release; adapt your software development & release management method
3. CODE ABSTRACTION USING APIS: to be cloud agnostic you need code abstraction layers per PaaS service you use
4. PAAS SANDBOX: design and modify your software architecture to fit the PaaS sandbox
5. IMPACT ON BUSINESS MODEL: adapt your business model to the PaaS cost model
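The third lesson, a code abstraction layer per PaaS service, can be sketched as an interface that application code depends on, with one implementation per backing service. This is an illustrative sketch only: `TimeSeriesStore`, `InMemoryStore` and `daily_total` are hypothetical names, and the in-memory class stands in for a real cloud-backed implementation.

```python
from abc import ABC, abstractmethod


class TimeSeriesStore(ABC):
    """Hypothetical abstraction over whichever storage service is in use."""

    @abstractmethod
    def write(self, series_id: str, timestamp: int, value: float) -> None:
        ...

    @abstractmethod
    def read(self, series_id: str, start: int, end: int) -> list[tuple[int, float]]:
        """Return (timestamp, value) pairs with start <= timestamp < end."""


class InMemoryStore(TimeSeriesStore):
    """Stand-in implementation; a cloud-specific class would replace it."""

    def __init__(self) -> None:
        self._data: dict[str, dict[int, float]] = {}

    def write(self, series_id: str, timestamp: int, value: float) -> None:
        self._data.setdefault(series_id, {})[timestamp] = value

    def read(self, series_id: str, start: int, end: int) -> list[tuple[int, float]]:
        points = self._data.get(series_id, {})
        return sorted((t, v) for t, v in points.items() if start <= t < end)


def daily_total(store: TimeSeriesStore, series_id: str, day_start: int) -> float:
    """Application code sees only the interface, never a vendor SDK."""
    return sum(v for _, v in store.read(series_id, day_start, day_start + 86_400))
```

Swapping providers then means writing another `TimeSeriesStore` subclass, while callers such as `daily_total` stay unchanged.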
48
Our service architecture
49
INGEST PROCESS ANALYZE STORE
App Engine
Cloud PubSub
App Engine
Cloud Storage
Datastore
Bigtable
BigQuery
Cloud SQL
Dataflow
Dataproc CloudML
Datalab
BigQuery
API
Events
Devices
Validate
Aggregate
Calculate
Timeseries
Metadata
Tags
Insights
Predict
Decide
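The "Validate" and "Aggregate" steps named in the architecture can be illustrated on a small list of interval readings. This is a minimal sketch under stated assumptions, not the Energyworx implementation: the validation rule (drop missing or negative values) and the hourly bucket size are assumptions chosen for the example, and `validate` and `aggregate_hourly` are hypothetical names.

```python
from collections import defaultdict
from typing import Optional

# A reading is (epoch seconds, kWh); the value may be missing.
Reading = tuple[int, Optional[float]]


def validate(readings: list[Reading]) -> list[tuple[int, float]]:
    """Assumed rule for illustration: keep only present, non-negative values."""
    return [(ts, v) for ts, v in readings if v is not None and v >= 0]


def aggregate_hourly(readings: list[tuple[int, float]]) -> dict[int, float]:
    """Sum readings into buckets keyed by the start of each hour."""
    totals: dict[int, float] = defaultdict(float)
    for ts, value in readings:
        totals[ts - ts % 3600] += value
    return dict(sorted(totals.items()))


# Hypothetical 15-minute readings, including one gap and one bad value.
raw = [(0, 0.25), (900, 0.25), (1800, None), (2700, -1.0), (3600, 0.5)]
hourly = aggregate_hourly(validate(raw))
```

In a production pipeline each step would run as a stage in the processing service; the plain functions here only show the data transformation.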
50
Data Ingestion Process
IoT Equipment
Cloud PubSub
Dataflow
Bigtable
BigQuery
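The ingestion slide names the components but not the data handling. As a hedged sketch of the fan-out it implies, the code below decodes JSON messages and writes each reading to two sinks: a dict standing in for Bigtable and a list standing in for BigQuery. The message fields, the `row_key` layout and the function names are all assumptions made for illustration, and no cloud client library is used.

```python
import json


def row_key(meter_id: str, timestamp: int) -> str:
    # Hypothetical key layout: meter id, then a zero-padded timestamp so
    # that keys for one meter sort chronologically.
    return f"{meter_id}#{timestamp:010d}"


def ingest(messages: list[bytes], timeseries_store: dict, analytics_rows: list) -> int:
    """Decode each message and write it to both sinks.

    Returns the number of accepted messages; malformed ones are skipped.
    """
    accepted = 0
    for raw in messages:
        try:
            msg = json.loads(raw)
            meter_id = str(msg["meter_id"])
            timestamp = int(msg["timestamp"])
            value = float(msg["value"])
        except (ValueError, KeyError, TypeError):
            continue  # not valid JSON, or a required field is missing
        timeseries_store[row_key(meter_id, timestamp)] = value
        analytics_rows.append(
            {"meter_id": meter_id, "timestamp": timestamp, "value": value}
        )
        accepted += 1
    return accepted
```

A real deployment would replace the dict and list with the respective service clients and decide separately what to do with rejected messages.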
51
Use cases
“Creating actionable insights - sparking new business
concepts and models. Crunching data without being
limited by scale, speed and obsolete pricing models.”
52
53
Uncovering hidden value from data
54
Exploratory Data Analysis with Energyworx
Uncover hidden value from your data!
• Classification
• Clustering
• Regression
• Anomaly detection
• Prediction/forecasting
• Motif discovery
• Association rules
Features:
- part of Energyworx SaaS
- autoscaling with demand
- notebook development environment
- private & public models
- Energyworx shared models
55
Demo: Clustering time series data from Smart Meters
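The transcript does not record which clustering method the demo used. As one possible illustration, the sketch below groups short load profiles with a plain k-means loop in pure Python. The profiles are made-up numbers, the initialisation (first k profiles as starting centroids) is a simplification, and `kmeans` is a hypothetical helper rather than the demo's code.

```python
def _dist2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(profiles: list[list[float]]) -> list[float]:
    """Element-wise mean of a non-empty list of profiles."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]


def kmeans(profiles: list[list[float]], k: int, iterations: int = 20) -> list[int]:
    """Return a cluster label per profile.

    Simplification: the first k profiles seed the centroids, which keeps
    the example deterministic but is not a robust initialisation.
    """
    if not 1 <= k <= len(profiles):
        raise ValueError("k must be between 1 and the number of profiles")
    centroids = [list(p) for p in profiles[:k]]
    labels = [0] * len(profiles)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every profile.
        labels = [
            min(range(k), key=lambda c: _dist2(p, centroids[c])) for p in profiles
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(profiles, labels) if label == c]
            if members:
                centroids[c] = _mean(members)
    return labels


# Hypothetical four-point daily profiles: two peak late, two peak early.
profiles = [[1, 1, 5, 5], [5, 5, 1, 1], [1, 2, 5, 4], [4, 5, 1, 2]]
labels = kmeans(profiles, k=2)
```

On this toy input the late-peaking and early-peaking profiles end up in separate clusters.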
56
Q & A
57
Thank you!

 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 

Recently uploaded (20)

Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 

Analyzing petabytes of smart meter data using Cloud Bigtable, Cloud Dataflow, and BigQuery

  • 1. Analyzing petabytes of smart meter data using Cloud Bigtable, Cloud Dataflow, and BigQuery. Edwin Poot & Erik van Wijk, Energyworx; Max Luebbe, Google
  • 3. ● rise of renewable energy sources ● regulation & market demands ● competition & increased costs ● intelligent devices in the home or along the utilities infrastructure (“Internet of Things”) ● two-way flow of information instead of one-way ● increase of consumption
  • 4. Top 5 industry challenges: 1. increasing density brings increasing data quality problems 2. strict regulations for safeguarding user privacy 3. redistribution of economic power and energy demand 4. rising competition between distributed and central 5. innovation outpaces regulation
  • 5. www.energyworx.com: CHINA 435 M, USA 132 M, JAPAN 58.7 M, FRANCE 35 M, UK 53 M, NL 8 M, Italy 32 M, Ontario 4.7 M, British Columbia 1.2 M, Quebec 3.8 M, Germany 50 M
  • 6. Conventional utility systems cannot cope with this data diversity and endless stream of all types, shapes and sizes: smart meters, smart grid equipment, sensors, home automation, multichannel customer interactions, consumers’ usage behavior, weather, social, spatial. Creating a single, centralized view of data, accessible to many and for many use cases, is the key to success.
  • 7. “We enable the energy evolution by uncovering and monetizing the hidden value of your data!” Ingest, process, analyze & learn
  • 8. Enabling data-driven business models for the Energy & Utility industry since 2012. Offices in The Netherlands and in the United States. Delivering a revolutionary data management & intelligence cloud service disrupting the global Energy & Utilities market. Pushing out established vendors using pure play SaaS. Creating actionable information - sparking new business concepts and models. Crunching data without being limited by scale, speed and obsolete pricing models.
  • 9. energyworx and the energy value chain: generation, transmission, trading, distribution, supply. Meter Data Management; Renewable Energy Management; Social Energy Consumer Engagement; imbalances settlements; Energy insights for wholesale connections
  • 10. MARKETS & SOLUTIONS: ENERGY INTELLIGENCE ENERGY PROSUMERS & RETAILERS Demand Response (price) Energy Insights Demand Response (load) Grid Insights Renewables Engagement Gamification Benchmarking Balancing Congestion Optimization Anomalies ENERGY DATA MANAGEMENT Meter Data Management Energy Data Hub ENERGY SYSTEM OPERATORS
  • 11. OUR ADVANTAGES ● Always supporting the latest IoT products and/or equipment ● Protocol agnostic data ingestion and limitless computation capacity ● Cloud Machine learning to support new business concepts and models ● Pay as you grow SaaS model, so no large upfront investments
  • 13. PLATFORM EVOLUTION HIGHLIGHTS, 2012 to 2016 (phases: Data ingestion & management; Insights & analysis; Intelligence & IoT control): batched data, temporal aggregations, VEE, utility connectivity, API, multi-tenancy, permissions, custom querying, grouping, tag properties, datalabs (EDA), Machine learning, CloudML, (A)DR, streaming data, pseudonymisation, tagging, analytics, dynamic profiling, PayPerUse model, IoT devices, many new adapters, performance, web console, Sheets addon
  • 15. DELIVER A DATA MANAGEMENT & ANALYTICS SERVICE FOR ENERGY & UTILITY COMPANIES. PUBLIC & PRIVATE CLOUD
  • 17. Google's mission to "organize the world’s information" presents new challenges.
  • 18. Big Data technologies invented at Google (timeline axis: 2002, 2004, 2006, 2008, 2010, 2012, 2013): GFS, MapReduce, Bigtable, Colossus, Dremel, Flume, Millwheel
  • 19. How do we … ?
  • 20-22. … build a 100TB+ filesystem? Need: Google was building enormous data sets, and needed an abstracted way to store and access at scale. Solution: GFS (replaced by higher-scale Colossus in 2010). Cloud counterpart: Google Cloud Storage.
  • 23-25. … build a petabyte database? Need: Massive data index files took weeks to rebuild. We needed random read/write access. Solution: Bigtable (internal service launched 2006). Cloud counterpart: Google Cloud Bigtable.
  • 26-28. … query a trillion rows in seconds? Need: Ad hoc queries over massive quantities of data, in just seconds. Solution: Dremel. Cloud counterpart: Google BigQuery.
  • 29-31. … build data-processing at Google scale? Need: Process petabytes of static and streaming data, quickly. Solution: MapReduce, Flume, and Millwheel. Cloud counterpart: Google Cloud Dataflow.
  • 32. Imagine what one can build...
  • 33. ... when scale is a solved problem.
  • 34. Google Cloud Platform is the same infrastructure: Cloud Storage, BigQuery, Cloud Dataflow, Cloud Bigtable
  • 35. Cloud Bigtable is the same service Google uses (diagram labels: Cloud Bigtable, Bigtable Service)
  • 36. What is Cloud Bigtable? NoSQL database for large datasets / large throughput. Supports sequential scans. Auto-adjusts to access patterns.
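
The "sequential scans" point follows from Bigtable keeping rows sorted by row key. Below is a minimal sketch of how one meter's readings for a time window become a single contiguous key range. The `meter_id#timestamp` key layout, the `SortedTable` class, and the sample values are all hypothetical illustrations, not taken from the slides, and a sorted in-memory list stands in for a real table.

```python
import bisect
from datetime import datetime, timezone


def row_key(meter_id: str, ts: datetime) -> str:
    """Hypothetical key layout: meter id, then a fixed-width UTC timestamp.

    Fixed-width timestamps sort lexicographically in time order, so all
    readings of one meter form one contiguous, time-ordered key range.
    """
    return f"{meter_id}#{ts.astimezone(timezone.utc):%Y%m%d%H%M%S}"


class SortedTable:
    """In-memory stand-in for a table whose rows are kept sorted by key."""

    def __init__(self):
        self._keys = []
        self._values = {}

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan(self, start_key, end_key):
        """Return (key, value) pairs with start_key <= key < end_key."""
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, end_key)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


def utc(year, month, day, hour=0):
    return datetime(year, month, day, hour, tzinfo=timezone.utc)


table = SortedTable()
table.put(row_key("meter-001", utc(2016, 3, 1, 0)), 0.42)
table.put(row_key("meter-002", utc(2016, 3, 1, 0)), 1.10)
table.put(row_key("meter-001", utc(2016, 3, 1, 1)), 0.40)
table.put(row_key("meter-001", utc(2016, 3, 2, 0)), 0.55)

# All readings of meter-001 on 2016-03-01, read as one sequential scan.
day = table.scan(row_key("meter-001", utc(2016, 3, 1)),
                 row_key("meter-001", utc(2016, 3, 2)))
```

Putting the meter id first keeps each meter's history together; a real schema would also have to weigh how writes spread across the key space.
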
  • 37. How does Cloud Bigtable work? (diagram: clients; a processing layer of Bigtable nodes; a storage layer on the Colossus filesystem)
  • 38. Cloud Bigtable learns access patterns... (diagram: clients, processing nodes, filesystem storage; data ranges A, B, C, D, E)
  • 39. … and rebalances data accordingly (diagram: same layout; data ranges A to E, with B and C repositioned)
  • 40. Throughput can be controlled by node count (chart: QPS against Bigtable nodes; axes run to 8 nodes and 80,000 QPS)
  • 41. Throughput can be controlled by node count (chart: QPS against Bigtable nodes; axes run to 40 nodes and 400,000 QPS)
  • 42. Throughput can be controlled by node count (chart: QPS against Bigtable nodes; axes run to 400 nodes and 4,000,000 QPS)
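
The three throughput charts share one proportion: the axis maxima (80,000 QPS at 8 nodes, 400,000 at 40, 4,000,000 at 400) all work out to about 10,000 QPS per node. The sketch below turns that into a rough sizing estimate. It assumes the plotted line is linear and reaches those axis maxima, which the transcript does not confirm; the function name and the `headroom` parameter are hypothetical.

```python
import math

# Assumption read off the chart axes (80,000 QPS at 8 nodes, 400,000 at 40,
# 4,000,000 at 400); not a guaranteed per-node figure.
ASSUMED_QPS_PER_NODE = 10_000


def nodes_for_target_qps(target_qps: float,
                         qps_per_node: float = ASSUMED_QPS_PER_NODE,
                         headroom: float = 0.0) -> int:
    """Estimate the node count for a target QPS under linear scaling.

    headroom is an optional safety margin, e.g. 0.5 adds 50% capacity.
    """
    if target_qps < 0 or qps_per_node <= 0 or headroom < 0:
        raise ValueError("target_qps and headroom must be >= 0, qps_per_node > 0")
    required = target_qps * (1.0 + headroom) / qps_per_node
    # Round up, and never go below a single node.
    return max(1, math.ceil(required))


nodes_small = nodes_for_target_qps(80_000)    # 8 under the assumed rate
nodes_large = nodes_for_target_qps(4_000_000)  # 400 under the assumed rate
```

Actual throughput depends on row size, read/write mix, and storage type, so an estimate like this is only a starting point for load testing.
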
  • 43. Years of engineering to... teach Bigtable to configure itself; isolate performance from “noisy neighbors”; react automatically to new patterns, splitting and balancing
  • 44. Google has had an internal cloud for over a decade. The same engineering that has made our internal services better makes our Cloud better: simpler control planes; multi-tenancy; adapts to large, new patterns
  • 46. Why did we choose Google Cloud Platform? ● Fastest with consistent performance ● Competitive and transparent pricing ● Autoscale to millions of users (and back) ● Unlimited flexible storage and caching ● Big Data & Machine Learning capabilities ● Development SDK & tools ● 24/7 access to expert support resources
  • 47. 5 things we’ve learned along the way: (1) SKILLS, KNOWLEDGE & TRAINING REQUIRED: understand all PaaS possibilities and components to prevent reinventing what already exists and to speed up implementation & migration; (2) IMPLEMENTATION TIME: shorter release cycles require smaller feature sets per release, so adapt your software development & release management method; (3) CODE ABSTRACTION USING APIs: to be cloud agnostic you need code abstraction layers per PaaS service you use; (4) PAAS SANDBOX: design and modify your software architecture to fit the PaaS sandbox; (5) IMPACT ON BUSINESS MODEL: adapt your business model to the PaaS cost model
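
The third lesson, an abstraction layer per PaaS service, can be sketched as an interface that application code depends on, with one implementation per backend. Everything named here (`TimeSeriesStore`, `InMemoryStore`, `total_consumption`) is hypothetical and only illustrates the pattern; it is not Energyworx's code, and a cloud-backed implementation would wrap the provider's own client library behind the same two methods.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple

Reading = Tuple[int, float]  # (unix timestamp in seconds, value)


class TimeSeriesStore(ABC):
    """Provider-neutral interface that application code is written against."""

    @abstractmethod
    def write(self, series_id: str, timestamp: int, value: float) -> None:
        ...

    @abstractmethod
    def read(self, series_id: str, start: int, end: int) -> List[Reading]:
        """Return readings with start <= timestamp < end, in time order."""


class InMemoryStore(TimeSeriesStore):
    """Local implementation, useful for tests; a cloud one would sit beside it."""

    def __init__(self) -> None:
        self._data: Dict[str, Dict[int, float]] = {}

    def write(self, series_id: str, timestamp: int, value: float) -> None:
        self._data.setdefault(series_id, {})[timestamp] = value

    def read(self, series_id: str, start: int, end: int) -> List[Reading]:
        points = self._data.get(series_id, {})
        return sorted((t, v) for t, v in points.items() if start <= t < end)


def total_consumption(store: TimeSeriesStore, series_id: str,
                      start: int, end: int) -> float:
    """Business logic that only knows the interface, not the backend."""
    return sum(value for _, value in store.read(series_id, start, end))


store = InMemoryStore()
store.write("meter-001", 0, 0.5)
store.write("meter-001", 900, 0.25)
store.write("meter-001", 3600, 1.0)
first_hour = total_consumption(store, "meter-001", 0, 3600)
```

Swapping backends then means adding another `TimeSeriesStore` subclass, while `total_consumption` and its callers stay unchanged.
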
  • 49. INGEST PROCESS ANALYZE STORE. App Engine, Cloud PubSub, App Engine, Cloud Storage, Datastore, Bigtable, BigQuery, Cloud SQL, Dataflow, Dataproc, CloudML, Datalab, BigQuery. API, Events, Devices; Validate, Aggregate, Calculate; Timeseries, Metadata, Tags; Insights, Predict, Decide
  • 50. Data Ingestion Process (diagram labels): Cloud PubSub, Dataflow, IoT Equipment, Bigtable, BigQuery
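
The architecture slide names Validate and Aggregate among the processing steps. As a rough illustration of what those two steps might do to a batch of meter readings, here is a plain-Python sketch. The slides place this work on Cloud Dataflow and do not show the actual rules, so the record fields (`meter_id`, `ts`, `kwh`), the validity checks, and the hourly bucketing are all assumptions.

```python
from collections import defaultdict
from datetime import datetime, timezone


def is_valid(reading: dict) -> bool:
    """Hypothetical validation rule: id present, datetime stamp, kWh >= 0."""
    kwh = reading.get("kwh")
    return (
        bool(reading.get("meter_id"))
        and isinstance(reading.get("ts"), datetime)
        and isinstance(kwh, (int, float))
        and not isinstance(kwh, bool)
        and kwh >= 0
    )


def aggregate_hourly(readings):
    """Drop invalid readings, then sum kWh per (meter, hour) bucket."""
    totals = defaultdict(float)
    for reading in readings:
        if not is_valid(reading):
            continue
        hour = reading["ts"].replace(minute=0, second=0, microsecond=0)
        totals[(reading["meter_id"], hour)] += reading["kwh"]
    return dict(totals)


def at(hour, minute):
    return datetime(2016, 3, 1, hour, minute, tzinfo=timezone.utc)


readings = [
    {"meter_id": "meter-001", "ts": at(0, 0), "kwh": 0.25},
    {"meter_id": "meter-001", "ts": at(0, 15), "kwh": 0.5},
    {"meter_id": "meter-001", "ts": at(0, 30), "kwh": -1.0},  # rejected: negative
    {"meter_id": "meter-001", "ts": at(1, 0), "kwh": 0.75},
    {"meter_id": "meter-002", "ts": at(0, 45), "kwh": None},  # rejected: missing
]
hourly = aggregate_hourly(readings)
```

In a streaming pipeline the same two functions would map onto a filter step and a windowed per-key sum.
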
  • 52. “Creating actionable insights - sparking new business concepts and models. Crunching data without being limited by scale, speed and obsolete pricing models.”
  • 54. Exploratory Data Analysis with Energyworx. Uncover hidden value from your data! • Classification • Clustering • Regression • Anomaly detection • Prediction/forecasting • Motif discovery • Association rules. Features: - part of Energyworx SaaS - autoscaling with demand - notebook development environment - private & public models - Energyworx shared models
  • 55. Demo: Clustering time series data from Smart Meters
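
The transcript does not say which algorithm the demo used. As one common way to cluster smart meter time series, the sketch below runs a small k-means over daily load profiles. The choice of k-means, the four-value profiles (night, morning, afternoon, evening, in kWh), and the first-k initialisation are all assumptions made for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(profiles, k, iterations=20):
    """Plain k-means with a fixed iteration count.

    Initialisation is deterministic: the first k profiles are the starting
    centroids. Real use would want a better seeding strategy.
    """
    centroids = [list(p) for p in profiles[:k]]
    labels = [0] * len(profiles)
    for _ in range(iterations):
        # Assignment step: each profile goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in profiles
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(profiles, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical daily load profiles: [night, morning, afternoon, evening] kWh.
profiles = [
    [0.2, 0.5, 0.4, 1.8],  # evening peak
    [0.3, 1.5, 1.6, 0.4],  # daytime use
    [0.3, 0.6, 0.5, 2.0],  # evening peak
    [0.2, 1.4, 1.7, 0.3],  # daytime use
]
labels, centroids = kmeans(profiles, k=2)
```

With these sample profiles the two evening-peak households end up in one cluster and the two daytime households in the other.
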