SlideShare a Scribd company logo
1 of 25
Download to read offline
Data Engineering
At British Gas Connected Homes
1Josep Casals | @jcasals | CassandraSummit, 2015 Santa Clara CA
2Josep Casals | @jcasals | CassandraSummit, 2015 Santa Clara CA
British Gas / Connected Homes
• British Gas is a 200 year old company
• Connected Homes is BG’s IoT “startup”
• Leader in the UK’s connected home market
Data Sources
• Gas and electricity meter readings
• Thermostat temperature data
• Connected boiler data
• Real time energy consumption data
• Introducing motion sensors, window
and door sensors, etc.
3Josep Casals | @jcasals | CassandraSummit, 2015 Santa Clara CA
Meter Data
• Millions of gas and electricity
customers
• ~600k smart meters
• Readings every 30 minutes from
smart meters
4Josep Casals | @jcasals | CassandraSummit, 2015 Santa Clara CA
Machine Learning
applied to Meter Data
• Energy disaggregation
• Similar homes comparison
• Smart meters used in indirect
algorithms for non-smart customers
5Josep Casals | @jcasals | CassandraSummit, 2015 Santa Clara CA
Connected Thermostats
• > 200k Connected Thermostats
• Temperature data time series
6Josep Casals | @jcasals | CassandraSummit, 2015 Santa Clara CA
Connected Boilers
• Proactive maintenance
• Failure detection
7Josep Casals | @jcasals | CassandraSummit, 2015 Santa Clara CA
In Home Displays
in a mobile App
• Data every 10 seconds
• Still needs an access device
connected to the router
• Allows real time mobile alerts
8Josep Casals | @jcasals | CassandraSummit, 2015 Santa Clara CA
Technologies we use
Technologies we are trying
Our Engineering process
• Two points of friction at the
intersection between teams
• Sharing datasets is problematic
• Real infrastructure too different from
real environments
• New technologies too hard to
deploy
• Time to production > 6 months
10Josep Casals | @jcasals | CassandraSummit, 2015 Santa Clara CA
Solution #1:
Data Ops
• Data oriented DevOps instead of
service oriented DevOps:
• Stateful instead of stateless
• Jobs instead of config
• Resource management instead of
resource partitioning
11Josep Casals | @jcasals | CassandraSummit, 2015 Santa Clara CA
Solution #1: Data Ops
• Ansible and Docker:
1. Smooth transition from development testing to production
2. blue / green deployments
3. swarm / mesos + docker = better use of infrastructure
• Time to production down to < 2 months :-|
12Josep Casals | @jcasals | CassandraSummit, 2015 Santa Clara CA
Future Solution #2: Data Science Environment
• Ideally Data Science models should be plug and play
• Python and R dataframes in Spark are promising but data scientists don’t
feel the need of Spark
• Data scientists prefer to work with relational DBs
• We need to find a way to make production datasets available to them
13Josep Casals | @jcasals | CassandraSummit, 2015 Santa Clara CA
Future Solution #2:
Data Science Environment
• Possible solutions we are investigating
are:
• Automated exports into a data
science relational DB
• Spark SQL server
• Automatically generated environment
images
• Objective is to reduce implementation
time for new features to < 1 month
14Josep Casals | @jcasals | CassandraSummit, 2015 Santa Clara CA
Use Case
High Consumption Alerts
• The red dot on top is what we want
to detect
• The green bottom dots are the
baseline plus the fridge
15Josep Casals | @jcasals | CassandraSummit, 2015 Santa Clara CA
High Consumption Alerts
Data Ingest
• Very high volume of messages (every
10 seconds)
• Kafka partitions help us cope with
volume
• (experimental) we’re trying Samza for
quick sliding-window type
transformations
• Often we miss reads, the Samza job
also does basic interpolation
16Josep Casals | @jcasals | CassandraSummit, 2015 Santa Clara CA
High Consumption Alerts
Spark Streaming with Cassandra
• Real time data comes from Kafka
• Cassandra stores historical usage
information
• A Spark Streaming job combines
both and applies a machine
learning algorithm to generate high
usage alerts
17Josep Casals | @jcasals | CassandraSummit, 2015 Santa Clara CA
High Consumption Alerts
Overall Architecture
• Getting the partitions right is very
important for scalability
• Spark-Cassandra connector keeps
C* partitions
• It’s important to match Kafka
partitioning to CassandraRDD
partitioning
18Josep Casals | @jcasals | CassandraSummit, 2015 Santa Clara CA
High Consumption alerts | Main Spark loop
19Josep Casals | @jcasals | CassandraSummit, 2015 Santa Clara CA
Data Partitioning
• Data systems like Cassandra or
Kafka scale by partitioning data
• Given enough partitions, any
technology can work
• We need a simple hashing algorithm
that works the same in many
languages and across technologies
20Josep Casals | @jcasals | CassandraSummit, 2015 Santa Clara CA
Cassandra data modelling with buckets
• Using a hashing function that is uniform and deterministic we can cope
time series data of any amount of customers
• One of our preferred strategies is to use buckets
21Josep Casals | @jcasals | CassandraSummit, 2015 Santa Clara CA
h(k) = ⌊m * frac(kA)⌋
• Multiplicative hashing is our preferred simple partitioning algorithm
• m= Number of partitions
• A≈(√5−1)/2 = 0.6180339887... (Golden Ratio)
• Online example: jsfiddle.net/joscas/yfp72fq5
22Josep Casals | @jcasals | CassandraSummit, 2015 Santa Clara CA
Summary
• Increase in productivity with portable environments (Ansible, Docker,
Mesos)
• Getting partitions straight is essential
• Using a simple common hashing algorithm across technologies and
languages is very helpful
23Josep Casals | @jcasals | CassandraSummit, 2015 Santa Clara CA
Summary
• Streaming technologies are rapidly evolving
• Spark streaming is complex but with many advantages (Spark’s excellent
integration with Cassandra, Spark’s ML libraries, etc.)
• Kafka ticks a lot of boxes for large scale distributed real time data
systems
24Josep Casals | @jcasals | CassandraSummit, 2015 Santa Clara CA
Thank you
josep.casals@bgch.co.uk
@jcasals
25Josep Casals | @jcasals | CassandraSummit, 2015 Santa Clara CA

More Related Content

What's hot

Using druid for interactive count distinct queries at scale
Using druid for interactive count distinct queries at scaleUsing druid for interactive count distinct queries at scale
Using druid for interactive count distinct queries at scaleItai Yaffe
 
Data Pipelines with Spark & DataStax Enterprise
Data Pipelines with Spark & DataStax EnterpriseData Pipelines with Spark & DataStax Enterprise
Data Pipelines with Spark & DataStax EnterpriseDataStax
 
Reltio: Powering Enterprise Data-driven Applications with Cassandra
Reltio: Powering Enterprise Data-driven Applications with CassandraReltio: Powering Enterprise Data-driven Applications with Cassandra
Reltio: Powering Enterprise Data-driven Applications with CassandraDataStax Academy
 
The evolution of the big data platform @ Netflix (OSCON 2015)
The evolution of the big data platform @ Netflix (OSCON 2015)The evolution of the big data platform @ Netflix (OSCON 2015)
The evolution of the big data platform @ Netflix (OSCON 2015)Eva Tse
 
Tsinghua University: Two Exemplary Applications in China
Tsinghua University: Two Exemplary Applications in ChinaTsinghua University: Two Exemplary Applications in China
Tsinghua University: Two Exemplary Applications in ChinaDataStax Academy
 
Webinar | How Clear Capital Delivers Always-on Appraisals on 122 Million Prop...
Webinar | How Clear Capital Delivers Always-on Appraisals on 122 Million Prop...Webinar | How Clear Capital Delivers Always-on Appraisals on 122 Million Prop...
Webinar | How Clear Capital Delivers Always-on Appraisals on 122 Million Prop...DataStax
 
Big data serving: Processing and inference at scale in real time
Big data serving: Processing and inference at scale in real timeBig data serving: Processing and inference at scale in real time
Big data serving: Processing and inference at scale in real timeItai Yaffe
 
Case Study: Troubleshooting Cassandra performance issues as a developer
Case Study: Troubleshooting Cassandra performance issues as a developerCase Study: Troubleshooting Cassandra performance issues as a developer
Case Study: Troubleshooting Cassandra performance issues as a developerCarlos Alonso Pérez
 
Building A Diverse Geo-Architecture For Cloud Native Applications In One Day
Building A Diverse Geo-Architecture For Cloud Native Applications In One DayBuilding A Diverse Geo-Architecture For Cloud Native Applications In One Day
Building A Diverse Geo-Architecture For Cloud Native Applications In One DayVMware Tanzu
 
Lessons Learned - Monitoring the Data Pipeline at Hulu
Lessons Learned - Monitoring the Data Pipeline at HuluLessons Learned - Monitoring the Data Pipeline at Hulu
Lessons Learned - Monitoring the Data Pipeline at HuluDataWorks Summit
 
Tuning Java Driver for Apache Cassandra by Nenad Bozic at Big Data Spain 2017
Tuning Java Driver for Apache Cassandra by Nenad Bozic at Big Data Spain 2017Tuning Java Driver for Apache Cassandra by Nenad Bozic at Big Data Spain 2017
Tuning Java Driver for Apache Cassandra by Nenad Bozic at Big Data Spain 2017Big Data Spain
 
C* Capacity Forecasting (Ajay Upadhyay, Jyoti Shandil, Arun Agrawal, Netflix)...
C* Capacity Forecasting (Ajay Upadhyay, Jyoti Shandil, Arun Agrawal, Netflix)...C* Capacity Forecasting (Ajay Upadhyay, Jyoti Shandil, Arun Agrawal, Netflix)...
C* Capacity Forecasting (Ajay Upadhyay, Jyoti Shandil, Arun Agrawal, Netflix)...DataStax
 
Our journey with druid - from initial research to full production scale
Our journey with druid - from initial research to full production scaleOur journey with druid - from initial research to full production scale
Our journey with druid - from initial research to full production scaleItai Yaffe
 
Build Real-Time Applications with Databricks Streaming
Build Real-Time Applications with Databricks StreamingBuild Real-Time Applications with Databricks Streaming
Build Real-Time Applications with Databricks StreamingDatabricks
 
Lambda architecture @ Indix
Lambda architecture @ IndixLambda architecture @ Indix
Lambda architecture @ IndixRajesh Muppalla
 
Scaling monitoring with Datadog
Scaling monitoring with DatadogScaling monitoring with Datadog
Scaling monitoring with Datadogalexismidon
 
The Last Pickle: Distributed Tracing from Application to Database
The Last Pickle: Distributed Tracing from Application to DatabaseThe Last Pickle: Distributed Tracing from Application to Database
The Last Pickle: Distributed Tracing from Application to DatabaseDataStax Academy
 
"Introduction to Kx Technology", James Corcoran, Head of Engineering EMEA at ...
"Introduction to Kx Technology", James Corcoran, Head of Engineering EMEA at ..."Introduction to Kx Technology", James Corcoran, Head of Engineering EMEA at ...
"Introduction to Kx Technology", James Corcoran, Head of Engineering EMEA at ...Dataconomy Media
 
"Source Code Abstracts Classification Using CNN", Vadim Markovtsev, Lead Soft...
"Source Code Abstracts Classification Using CNN", Vadim Markovtsev, Lead Soft..."Source Code Abstracts Classification Using CNN", Vadim Markovtsev, Lead Soft...
"Source Code Abstracts Classification Using CNN", Vadim Markovtsev, Lead Soft...Dataconomy Media
 

What's hot (20)

Using druid for interactive count distinct queries at scale
Using druid for interactive count distinct queries at scaleUsing druid for interactive count distinct queries at scale
Using druid for interactive count distinct queries at scale
 
Data Pipelines with Spark & DataStax Enterprise
Data Pipelines with Spark & DataStax EnterpriseData Pipelines with Spark & DataStax Enterprise
Data Pipelines with Spark & DataStax Enterprise
 
Reltio: Powering Enterprise Data-driven Applications with Cassandra
Reltio: Powering Enterprise Data-driven Applications with CassandraReltio: Powering Enterprise Data-driven Applications with Cassandra
Reltio: Powering Enterprise Data-driven Applications with Cassandra
 
The evolution of the big data platform @ Netflix (OSCON 2015)
The evolution of the big data platform @ Netflix (OSCON 2015)The evolution of the big data platform @ Netflix (OSCON 2015)
The evolution of the big data platform @ Netflix (OSCON 2015)
 
Tsinghua University: Two Exemplary Applications in China
Tsinghua University: Two Exemplary Applications in ChinaTsinghua University: Two Exemplary Applications in China
Tsinghua University: Two Exemplary Applications in China
 
Webinar | How Clear Capital Delivers Always-on Appraisals on 122 Million Prop...
Webinar | How Clear Capital Delivers Always-on Appraisals on 122 Million Prop...Webinar | How Clear Capital Delivers Always-on Appraisals on 122 Million Prop...
Webinar | How Clear Capital Delivers Always-on Appraisals on 122 Million Prop...
 
Big data serving: Processing and inference at scale in real time
Big data serving: Processing and inference at scale in real timeBig data serving: Processing and inference at scale in real time
Big data serving: Processing and inference at scale in real time
 
Case Study: Troubleshooting Cassandra performance issues as a developer
Case Study: Troubleshooting Cassandra performance issues as a developerCase Study: Troubleshooting Cassandra performance issues as a developer
Case Study: Troubleshooting Cassandra performance issues as a developer
 
Building A Diverse Geo-Architecture For Cloud Native Applications In One Day
Building A Diverse Geo-Architecture For Cloud Native Applications In One DayBuilding A Diverse Geo-Architecture For Cloud Native Applications In One Day
Building A Diverse Geo-Architecture For Cloud Native Applications In One Day
 
Lambda architecture
Lambda architectureLambda architecture
Lambda architecture
 
Lessons Learned - Monitoring the Data Pipeline at Hulu
Lessons Learned - Monitoring the Data Pipeline at HuluLessons Learned - Monitoring the Data Pipeline at Hulu
Lessons Learned - Monitoring the Data Pipeline at Hulu
 
Tuning Java Driver for Apache Cassandra by Nenad Bozic at Big Data Spain 2017
Tuning Java Driver for Apache Cassandra by Nenad Bozic at Big Data Spain 2017Tuning Java Driver for Apache Cassandra by Nenad Bozic at Big Data Spain 2017
Tuning Java Driver for Apache Cassandra by Nenad Bozic at Big Data Spain 2017
 
C* Capacity Forecasting (Ajay Upadhyay, Jyoti Shandil, Arun Agrawal, Netflix)...
C* Capacity Forecasting (Ajay Upadhyay, Jyoti Shandil, Arun Agrawal, Netflix)...C* Capacity Forecasting (Ajay Upadhyay, Jyoti Shandil, Arun Agrawal, Netflix)...
C* Capacity Forecasting (Ajay Upadhyay, Jyoti Shandil, Arun Agrawal, Netflix)...
 
Our journey with druid - from initial research to full production scale
Our journey with druid - from initial research to full production scaleOur journey with druid - from initial research to full production scale
Our journey with druid - from initial research to full production scale
 
Build Real-Time Applications with Databricks Streaming
Build Real-Time Applications with Databricks StreamingBuild Real-Time Applications with Databricks Streaming
Build Real-Time Applications with Databricks Streaming
 
Lambda architecture @ Indix
Lambda architecture @ IndixLambda architecture @ Indix
Lambda architecture @ Indix
 
Scaling monitoring with Datadog
Scaling monitoring with DatadogScaling monitoring with Datadog
Scaling monitoring with Datadog
 
The Last Pickle: Distributed Tracing from Application to Database
The Last Pickle: Distributed Tracing from Application to DatabaseThe Last Pickle: Distributed Tracing from Application to Database
The Last Pickle: Distributed Tracing from Application to Database
 
"Introduction to Kx Technology", James Corcoran, Head of Engineering EMEA at ...
"Introduction to Kx Technology", James Corcoran, Head of Engineering EMEA at ..."Introduction to Kx Technology", James Corcoran, Head of Engineering EMEA at ...
"Introduction to Kx Technology", James Corcoran, Head of Engineering EMEA at ...
 
"Source Code Abstracts Classification Using CNN", Vadim Markovtsev, Lead Soft...
"Source Code Abstracts Classification Using CNN", Vadim Markovtsev, Lead Soft..."Source Code Abstracts Classification Using CNN", Vadim Markovtsev, Lead Soft...
"Source Code Abstracts Classification Using CNN", Vadim Markovtsev, Lead Soft...
 

Viewers also liked

Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store
Azure + DataStax Enterprise (DSE) Powers Office365 Per User StoreAzure + DataStax Enterprise (DSE) Powers Office365 Per User Store
Azure + DataStax Enterprise (DSE) Powers Office365 Per User StoreDataStax Academy
 
Apache Cassandra at Narmal 2014
Apache Cassandra at Narmal 2014Apache Cassandra at Narmal 2014
Apache Cassandra at Narmal 2014DataStax Academy
 
Cassandra Summit 2014: META — An Efficient Distributed Data Hub with Batch an...
Cassandra Summit 2014: META — An Efficient Distributed Data Hub with Batch an...Cassandra Summit 2014: META — An Efficient Distributed Data Hub with Batch an...
Cassandra Summit 2014: META — An Efficient Distributed Data Hub with Batch an...DataStax Academy
 
Cassandra Summit 2014: Cassandra in Large Scale Enterprise Grade xPatterns De...
Cassandra Summit 2014: Cassandra in Large Scale Enterprise Grade xPatterns De...Cassandra Summit 2014: Cassandra in Large Scale Enterprise Grade xPatterns De...
Cassandra Summit 2014: Cassandra in Large Scale Enterprise Grade xPatterns De...DataStax Academy
 
Cassandra Summit 2014: Social Media Security Company Nexgate Relies on Cassan...
Cassandra Summit 2014: Social Media Security Company Nexgate Relies on Cassan...Cassandra Summit 2014: Social Media Security Company Nexgate Relies on Cassan...
Cassandra Summit 2014: Social Media Security Company Nexgate Relies on Cassan...DataStax Academy
 
Cassandra Summit 2014: A Train of Thoughts About Growing and Scalability — Bu...
Cassandra Summit 2014: A Train of Thoughts About Growing and Scalability — Bu...Cassandra Summit 2014: A Train of Thoughts About Growing and Scalability — Bu...
Cassandra Summit 2014: A Train of Thoughts About Growing and Scalability — Bu...DataStax Academy
 
Introduction to Dating Modeling for Cassandra
Introduction to Dating Modeling for CassandraIntroduction to Dating Modeling for Cassandra
Introduction to Dating Modeling for CassandraDataStax Academy
 
Cassandra Summit 2014: Apache Cassandra at Telefonica CBS
Cassandra Summit 2014: Apache Cassandra at Telefonica CBSCassandra Summit 2014: Apache Cassandra at Telefonica CBS
Cassandra Summit 2014: Apache Cassandra at Telefonica CBSDataStax Academy
 
Coursera's Adoption of Cassandra
Coursera's Adoption of CassandraCoursera's Adoption of Cassandra
Coursera's Adoption of CassandraDataStax Academy
 
Production Ready Cassandra (Beginner)
Production Ready Cassandra (Beginner)Production Ready Cassandra (Beginner)
Production Ready Cassandra (Beginner)DataStax Academy
 
Cassandra Summit 2014: Monitor Everything!
Cassandra Summit 2014: Monitor Everything!Cassandra Summit 2014: Monitor Everything!
Cassandra Summit 2014: Monitor Everything!DataStax Academy
 
Battery Ventures: Simulating and Visualizing Large Scale Cassandra Deployments
Battery Ventures: Simulating and Visualizing Large Scale Cassandra DeploymentsBattery Ventures: Simulating and Visualizing Large Scale Cassandra Deployments
Battery Ventures: Simulating and Visualizing Large Scale Cassandra DeploymentsDataStax Academy
 
DataStax: Old Dogs, New Tricks. Teaching your Relational DBA to fetch
DataStax: Old Dogs, New Tricks. Teaching your Relational DBA to fetchDataStax: Old Dogs, New Tricks. Teaching your Relational DBA to fetch
DataStax: Old Dogs, New Tricks. Teaching your Relational DBA to fetchDataStax Academy
 
DataStax: How to Roll Cassandra into Production Without Losing your Health, M...
DataStax: How to Roll Cassandra into Production Without Losing your Health, M...DataStax: How to Roll Cassandra into Production Without Losing your Health, M...
DataStax: How to Roll Cassandra into Production Without Losing your Health, M...DataStax Academy
 
Cassandra Summit 2014: The Cassandra Experience at Orange — Season 2
Cassandra Summit 2014: The Cassandra Experience at Orange — Season 2Cassandra Summit 2014: The Cassandra Experience at Orange — Season 2
Cassandra Summit 2014: The Cassandra Experience at Orange — Season 2DataStax Academy
 
Cassandra 3.0 Awesomeness
Cassandra 3.0 AwesomenessCassandra 3.0 Awesomeness
Cassandra 3.0 AwesomenessJon Haddad
 
Crash course intro to cassandra
Crash course   intro to cassandraCrash course   intro to cassandra
Crash course intro to cassandraJon Haddad
 
DataStax & O'Reilly Media: Large Scale Data Analytics with Spark and Cassandr...
DataStax & O'Reilly Media: Large Scale Data Analytics with Spark and Cassandr...DataStax & O'Reilly Media: Large Scale Data Analytics with Spark and Cassandr...
DataStax & O'Reilly Media: Large Scale Data Analytics with Spark and Cassandr...DataStax Academy
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core ConceptsJon Haddad
 

Viewers also liked (20)

Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store
Azure + DataStax Enterprise (DSE) Powers Office365 Per User StoreAzure + DataStax Enterprise (DSE) Powers Office365 Per User Store
Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store
 
Apache Cassandra at Narmal 2014
Apache Cassandra at Narmal 2014Apache Cassandra at Narmal 2014
Apache Cassandra at Narmal 2014
 
Cassandra Summit 2014: META — An Efficient Distributed Data Hub with Batch an...
Cassandra Summit 2014: META — An Efficient Distributed Data Hub with Batch an...Cassandra Summit 2014: META — An Efficient Distributed Data Hub with Batch an...
Cassandra Summit 2014: META — An Efficient Distributed Data Hub with Batch an...
 
Cassandra Summit 2014: Cassandra in Large Scale Enterprise Grade xPatterns De...
Cassandra Summit 2014: Cassandra in Large Scale Enterprise Grade xPatterns De...Cassandra Summit 2014: Cassandra in Large Scale Enterprise Grade xPatterns De...
Cassandra Summit 2014: Cassandra in Large Scale Enterprise Grade xPatterns De...
 
Cassandra Summit 2014: Social Media Security Company Nexgate Relies on Cassan...
Cassandra Summit 2014: Social Media Security Company Nexgate Relies on Cassan...Cassandra Summit 2014: Social Media Security Company Nexgate Relies on Cassan...
Cassandra Summit 2014: Social Media Security Company Nexgate Relies on Cassan...
 
Cassandra Summit 2014: A Train of Thoughts About Growing and Scalability — Bu...
Cassandra Summit 2014: A Train of Thoughts About Growing and Scalability — Bu...Cassandra Summit 2014: A Train of Thoughts About Growing and Scalability — Bu...
Cassandra Summit 2014: A Train of Thoughts About Growing and Scalability — Bu...
 
Introduction to Dating Modeling for Cassandra
Introduction to Dating Modeling for CassandraIntroduction to Dating Modeling for Cassandra
Introduction to Dating Modeling for Cassandra
 
Cassandra Summit 2014: Apache Cassandra at Telefonica CBS
Cassandra Summit 2014: Apache Cassandra at Telefonica CBSCassandra Summit 2014: Apache Cassandra at Telefonica CBS
Cassandra Summit 2014: Apache Cassandra at Telefonica CBS
 
Coursera's Adoption of Cassandra
Coursera's Adoption of CassandraCoursera's Adoption of Cassandra
Coursera's Adoption of Cassandra
 
Production Ready Cassandra (Beginner)
Production Ready Cassandra (Beginner)Production Ready Cassandra (Beginner)
Production Ready Cassandra (Beginner)
 
Cassandra Summit 2014: Monitor Everything!
Cassandra Summit 2014: Monitor Everything!Cassandra Summit 2014: Monitor Everything!
Cassandra Summit 2014: Monitor Everything!
 
Battery Ventures: Simulating and Visualizing Large Scale Cassandra Deployments
Battery Ventures: Simulating and Visualizing Large Scale Cassandra DeploymentsBattery Ventures: Simulating and Visualizing Large Scale Cassandra Deployments
Battery Ventures: Simulating and Visualizing Large Scale Cassandra Deployments
 
DataStax: Old Dogs, New Tricks. Teaching your Relational DBA to fetch
DataStax: Old Dogs, New Tricks. Teaching your Relational DBA to fetchDataStax: Old Dogs, New Tricks. Teaching your Relational DBA to fetch
DataStax: Old Dogs, New Tricks. Teaching your Relational DBA to fetch
 
DataStax: How to Roll Cassandra into Production Without Losing your Health, M...
DataStax: How to Roll Cassandra into Production Without Losing your Health, M...DataStax: How to Roll Cassandra into Production Without Losing your Health, M...
DataStax: How to Roll Cassandra into Production Without Losing your Health, M...
 
New features in 3.0
New features in 3.0New features in 3.0
New features in 3.0
 
Cassandra Summit 2014: The Cassandra Experience at Orange — Season 2
Cassandra Summit 2014: The Cassandra Experience at Orange — Season 2Cassandra Summit 2014: The Cassandra Experience at Orange — Season 2
Cassandra Summit 2014: The Cassandra Experience at Orange — Season 2
 
Cassandra 3.0 Awesomeness
Cassandra 3.0 AwesomenessCassandra 3.0 Awesomeness
Cassandra 3.0 Awesomeness
 
Crash course intro to cassandra
Crash course   intro to cassandraCrash course   intro to cassandra
Crash course intro to cassandra
 
DataStax & O'Reilly Media: Large Scale Data Analytics with Spark and Cassandr...
DataStax & O'Reilly Media: Large Scale Data Analytics with Spark and Cassandr...DataStax & O'Reilly Media: Large Scale Data Analytics with Spark and Cassandr...
DataStax & O'Reilly Media: Large Scale Data Analytics with Spark and Cassandr...
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core Concepts
 

Similar to British Gas Connected Homes: Data Engineering

20160331 sa introduction to big data pipelining berlin meetup 0.3
20160331 sa introduction to big data pipelining berlin meetup   0.320160331 sa introduction to big data pipelining berlin meetup   0.3
20160331 sa introduction to big data pipelining berlin meetup 0.3Simon Ambridge
 
AWS re:Invent 2016: Case Study: How Startups Like Smartsheet and Quantcast Ac...
AWS re:Invent 2016: Case Study: How Startups Like Smartsheet and Quantcast Ac...AWS re:Invent 2016: Case Study: How Startups Like Smartsheet and Quantcast Ac...
AWS re:Invent 2016: Case Study: How Startups Like Smartsheet and Quantcast Ac...Amazon Web Services
 
Webinar: SQL for Machine Data?
Webinar: SQL for Machine Data?Webinar: SQL for Machine Data?
Webinar: SQL for Machine Data?Crate.io
 
Webinar: The Future of SQL
Webinar: The Future of SQLWebinar: The Future of SQL
Webinar: The Future of SQLCrate.io
 
An efficient data mining solution by integrating Spark and Cassandra
An efficient data mining solution by integrating Spark and CassandraAn efficient data mining solution by integrating Spark and Cassandra
An efficient data mining solution by integrating Spark and CassandraStratio
 
mParticle's Journey to Scylla from Cassandra
mParticle's Journey to Scylla from CassandramParticle's Journey to Scylla from Cassandra
mParticle's Journey to Scylla from CassandraScyllaDB
 
C* Summit 2013: Optimizing the Public Cloud for Cost and Scalability with Cas...
C* Summit 2013: Optimizing the Public Cloud for Cost and Scalability with Cas...C* Summit 2013: Optimizing the Public Cloud for Cost and Scalability with Cas...
C* Summit 2013: Optimizing the Public Cloud for Cost and Scalability with Cas...DataStax Academy
 
(BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Da...
(BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Da...(BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Da...
(BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Da...Amazon Web Services
 
Architecture et modèle de données Cassandra
Architecture et modèle de données CassandraArchitecture et modèle de données Cassandra
Architecture et modèle de données CassandraClaude-Alain Glauser
 
Introduction to Cassandra and datastax DSE
Introduction to Cassandra and datastax DSEIntroduction to Cassandra and datastax DSE
Introduction to Cassandra and datastax DSEUlises Fasoli
 
Apache Cassandra: NoSQL in the enterprise
Apache Cassandra: NoSQL in the enterpriseApache Cassandra: NoSQL in the enterprise
Apache Cassandra: NoSQL in the enterprisejbellis
 
Infinitely Scalable Clusters - Grid Computing on Public Cloud - New York
Infinitely Scalable Clusters - Grid Computing on Public Cloud - New YorkInfinitely Scalable Clusters - Grid Computing on Public Cloud - New York
Infinitely Scalable Clusters - Grid Computing on Public Cloud - New YorkHentsū
 
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...DataStax
 
Webinar: Introducing the MongoDB Connector for BI 2.0 with Tableau
Webinar: Introducing the MongoDB Connector for BI 2.0 with TableauWebinar: Introducing the MongoDB Connector for BI 2.0 with Tableau
Webinar: Introducing the MongoDB Connector for BI 2.0 with TableauMongoDB
 
Using Apache Cassandra and Apache Kafka to Scale Next Gen Applications
Using Apache Cassandra and Apache Kafka to Scale Next Gen ApplicationsUsing Apache Cassandra and Apache Kafka to Scale Next Gen Applications
Using Apache Cassandra and Apache Kafka to Scale Next Gen ApplicationsData Con LA
 

Similar to British Gas Connected Homes: Data Engineering (20)

20160331 sa introduction to big data pipelining berlin meetup 0.3
20160331 sa introduction to big data pipelining berlin meetup   0.320160331 sa introduction to big data pipelining berlin meetup   0.3
20160331 sa introduction to big data pipelining berlin meetup 0.3
 
AWS re:Invent 2016: Case Study: How Startups Like Smartsheet and Quantcast Ac...
AWS re:Invent 2016: Case Study: How Startups Like Smartsheet and Quantcast Ac...AWS re:Invent 2016: Case Study: How Startups Like Smartsheet and Quantcast Ac...
AWS re:Invent 2016: Case Study: How Startups Like Smartsheet and Quantcast Ac...
 
Couchbase 3.0.2 d1
Couchbase 3.0.2  d1Couchbase 3.0.2  d1
Couchbase 3.0.2 d1
 
Webinar: SQL for Machine Data?
Webinar: SQL for Machine Data?Webinar: SQL for Machine Data?
Webinar: SQL for Machine Data?
 
Webinar: The Future of SQL
Webinar: The Future of SQLWebinar: The Future of SQL
Webinar: The Future of SQL
 
Stratio big data spain
Stratio   big data spainStratio   big data spain
Stratio big data spain
 
An efficient data mining solution by integrating Spark and Cassandra
An efficient data mining solution by integrating Spark and CassandraAn efficient data mining solution by integrating Spark and Cassandra
An efficient data mining solution by integrating Spark and Cassandra
 
mParticle's Journey to Scylla from Cassandra
mParticle's Journey to Scylla from CassandramParticle's Journey to Scylla from Cassandra
mParticle's Journey to Scylla from Cassandra
 
C* Summit 2013: Optimizing the Public Cloud for Cost and Scalability with Cas...
C* Summit 2013: Optimizing the Public Cloud for Cost and Scalability with Cas...C* Summit 2013: Optimizing the Public Cloud for Cost and Scalability with Cas...
C* Summit 2013: Optimizing the Public Cloud for Cost and Scalability with Cas...
 
(BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Da...
(BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Da...(BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Da...
(BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Da...
 
Architecture et modèle de données Cassandra
Architecture et modèle de données CassandraArchitecture et modèle de données Cassandra
Architecture et modèle de données Cassandra
 
Introduction to Cassandra and datastax DSE
Introduction to Cassandra and datastax DSEIntroduction to Cassandra and datastax DSE
Introduction to Cassandra and datastax DSE
 
Camunda and Apache Cassandra
Camunda and Apache CassandraCamunda and Apache Cassandra
Camunda and Apache Cassandra
 
Spark meets Smart Meters
Spark meets Smart MetersSpark meets Smart Meters
Spark meets Smart Meters
 
Apache Cassandra: NoSQL in the enterprise
Apache Cassandra: NoSQL in the enterpriseApache Cassandra: NoSQL in the enterprise
Apache Cassandra: NoSQL in the enterprise
 
Infinitely Scalable Clusters - Grid Computing on Public Cloud - New York
Infinitely Scalable Clusters - Grid Computing on Public Cloud - New YorkInfinitely Scalable Clusters - Grid Computing on Public Cloud - New York
Infinitely Scalable Clusters - Grid Computing on Public Cloud - New York
 
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
 
Webinar: Introducing the MongoDB Connector for BI 2.0 with Tableau
Webinar: Introducing the MongoDB Connector for BI 2.0 with TableauWebinar: Introducing the MongoDB Connector for BI 2.0 with Tableau
Webinar: Introducing the MongoDB Connector for BI 2.0 with Tableau
 
Using Apache Cassandra and Apache Kafka to Scale Next Gen Applications
Using Apache Cassandra and Apache Kafka to Scale Next Gen ApplicationsUsing Apache Cassandra and Apache Kafka to Scale Next Gen Applications
Using Apache Cassandra and Apache Kafka to Scale Next Gen Applications
 
Cassandra 2.0 (Introduction)
Cassandra 2.0 (Introduction)Cassandra 2.0 (Introduction)
Cassandra 2.0 (Introduction)
 

More from DataStax Academy

Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftForrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftDataStax Academy
 
Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseDataStax Academy
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraDataStax Academy
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsDataStax Academy
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingDataStax Academy
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackDataStax Academy
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache CassandraDataStax Academy
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready CassandraDataStax Academy
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonDataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1DataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2DataStax Academy
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First ClusterDataStax Academy
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with DseDataStax Academy
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraDataStax Academy
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseDataStax Academy
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraDataStax Academy
 

More from DataStax Academy (20)

Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftForrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
 
Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph Database
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart Labs
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data Modeling
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stack
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache Cassandra
 
Coursera Cassandra Driver
Coursera Cassandra DriverCoursera Cassandra Driver
Coursera Cassandra Driver
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready Cassandra
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First Cluster
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with Dse
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache Cassandra
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core Concepts
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax Enterprise
 
Bad Habits Die Hard
Bad Habits Die Hard Bad Habits Die Hard
Bad Habits Die Hard
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache Cassandra
 
Advanced Cassandra
Advanced CassandraAdvanced Cassandra
Advanced Cassandra
 

Recently uploaded

The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 

Recently uploaded (20)

The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 

British Gas Connected Homes: Data Engineering

  • 1. Data Engineering At British Gas Connected Homes 1Josep Casals | @jcasals | CassandraSummit, 2015 Santa Clara CA
  • 2. 2Josep Casals | @jcasals | CassandraSummit, 2015 Santa Clara CA British Gas / Connected Homes • British Gas is a 200 year old company • Connected Homes is BG’s IoT “startup” • Leader in the UK’s connected home market
  • 3. Data Sources • Gas and electricity meter readings • Thermostat temperature data • Connected boiler data • Real time energy consumption data • Introducing motion sensors, window and door sensors, etc. 3Josep Casals | @jcasals | CassandraSummit, 2015 Santa Clara CA
  • 4. Meter Data • Millions of gas and electricity customers • ~600k smart meters • Readings every 30 minutes from smart meters 4Josep Casals | @jcasals | CassandraSummit, 2015 Santa Clara CA
  • 5. Machine Learning applied to Meter Data • Energy disaggregation • Similar homes comparison • Smart meters used in indirect algorithms for non-smart customers 5Josep Casals | @jcasals | CassandraSummit, 2015 Santa Clara CA
  • 6. Connected Thermostats • > 200k Connected Thermostats • Temperature data time series 6Josep Casals | @jcasals | CassandraSummit, 2015 Santa Clara CA
  • 7. Connected Boilers • Proactive maintenance • Failure detection 7Josep Casals | @jcasals | CassandraSummit, 2015 Santa Clara CA
  • 8. In Home Displays in a mobile App • Data every 10 seconds • Still needs an access device connected to the router • Allows real time mobile alerts 8Josep Casals | @jcasals | CassandraSummit, 2015 Santa Clara CA
  • 10. Our Engineering process • Two points of friction at the intersection between teams • Sharing datasets is problematic • Real infrastructure too different from real environments • New technologies too hard to deploy • Time to production > 6 months 10Josep Casals | @jcasals | CassandraSummit, 2015 Santa Clara CA
  • 11. Solution #1: Data Ops • Data oriented DevOps instead of service oriented DevOps: • Stateful instead of stateless • Jobs instead of config • Resource management instead of resource partitioning 11Josep Casals | @jcasals | CassandraSummit, 2015 Santa Clara CA
  • 12. Solution #1: Data Ops • Ansible and Docker: 1. Smooth transition from development testing to production 2. blue / green deployments 3. swarm / mesos + docker = better use of infrastructure • Time to production down to < 2 months :-| 12Josep Casals | @jcasals | CassandraSummit, 2015 Santa Clara CA
  • 13. Future Solution #2: Data Science Environment • Ideally Data Science models should be plug and play • Python and R dataframes in Spark are promising but data scientists don’t feel the need of Spark • Data scientists prefer to work with relational DBs • We need to find a way to make production datasets available to them 13Josep Casals | @jcasals | CassandraSummit, 2015 Santa Clara CA
  • 14. Future Solution #2: Data Science Environment • Possible solutions we are investigating are: • Automated exports into a data science relational DB • Spark SQL server • Automatically generated environment images • Objective is to reduce implementation time for new features to < 1 month 14Josep Casals | @jcasals | CassandraSummit, 2015 Santa Clara CA
  • 15. Use Case High Consumption Alerts • The red dot on top is what we want to detect • The green bottom dots are the baseline plus the fridge 15Josep Casals | @jcasals | CassandraSummit, 2015 Santa Clara CA
  • 16. High Consumption Alerts Data Ingest • Very high volume of messages (every 10 seconds) • Kafka partitions help us cope with volume • (experimental) we’re trying Samza for quick sliding-window type transformations • Often we miss reads, the Samza job also does basic interpolation 16Josep Casals | @jcasals | CassandraSummit, 2015 Santa Clara CA
  • 17. High Consumption Alerts Spark Streaming with Cassandra • Real time data comes from Kafka • Cassandra stores historical usage information • A Spark Streaming job combines both and applies a machine learning algorithm to generate high usage alerts 17Josep Casals | @jcasals | CassandraSummit, 2015 Santa Clara CA
  • 18. High Consumption Alerts Overall Architecture • Getting the partitions right is very important for scalability • Spark-Cassandra connector keeps C* partitions • It’s important to match Kafka partitioning to CassandraRDD partitioning 18Josep Casals | @jcasals | CassandraSummit, 2015 Santa Clara CA
  • 19. High Consumption alerts | Main Spark loop 19Josep Casals | @jcasals | CassandraSummit, 2015 Santa Clara CA
  • 20. Data Partitioning • Data systems like Cassandra or Kafka scale by partitioning data • Given enough partitions, any technology can work • We need a simple hashing algorithm that works the same in many languages and across technologies 20Josep Casals | @jcasals | CassandraSummit, 2015 Santa Clara CA
  • 21. Cassandra data modelling with buckets • Using a hashing function that is uniform and deterministic we can cope time series data of any amount of customers • One of our preferred strategies is to use buckets 21Josep Casals | @jcasals | CassandraSummit, 2015 Santa Clara CA
  • 22. h(k) = ⌊m * frac(kA)⌋ • Multiplicative hashing is our preferred simple partitioning algorithm • m= Number of partitions • A≈(√5−1)/2 = 0.6180339887... (Golden Ratio) • Online example: jsfiddle.net/joscas/yfp72fq5 22Josep Casals | @jcasals | CassandraSummit, 2015 Santa Clara CA
  • 23. Summary • Increase in productivity with portable environments (Ansible, Docker, Mesos) • Getting partitions straight is essential • Using a simple common hashing algorithm across technologies and languages is very helpful 23Josep Casals | @jcasals | CassandraSummit, 2015 Santa Clara CA
  • 24. Summary • Streaming technologies are rapidly evolving • Spark streaming is complex but with many advantages (Spark’s excellent integration with Cassandra, Spark’s ML libraries, etc.) • Kafka ticks a lot of boxes for large scale distributed real time data systems 24Josep Casals | @jcasals | CassandraSummit, 2015 Santa Clara CA
  • 25. Thank you josep.casals@bgch.co.uk @jcasals 25Josep Casals | @jcasals | CassandraSummit, 2015 Santa Clara CA