IoT at Google Scale

IoT Analytics at Google Scale with James Chittenden: Using Pub/Sub, Dataflow, and BigQuery to Capture Millions of Connected Devices

There is the potential for 50 billion connected devices by 2020. Google Cloud Platform gives you the tools to scale connections, gather and make sense of data, and provide the reliable customer experiences that hardware devices require. Google’s Cloud Platform provides the infrastructure to handle streams of data fed from millions of intelligent devices.

In this meetup, we'll explore the IoT architecture of one of the world's largest appliance manufacturers, along with Google's partner Archipelago, and drill into how they are leveraging Google's massive infrastructure in their solution. We'll explore what Google provides for IoT, including Pub/Sub for messaging, Dataflow for data processing, and BigQuery for large-scale analytics, as well as best practices for real-time stream processing that account for the ingest, processing, storage, and analysis of hundreds of millions of events per hour.

Published in: Technology

IoT at Google Scale

  1. IoT @ Google Scale. James Chittenden, Google Cloud Platform Solutions Engineer, jameschi@google.com
  2. +James Chittenden (Big Data Cloud Engineer), jameschi@google.com
  3. Big Data at Google, a.k.a. Data at Google
  4. Manage the Entire Lifecycle of Big Data
     Diagram labels: Capture, Store, Process, Analyze, Batch, Stream
     Products shown: Cloud Logs, Google App Engine, Google Analytics Premium, Cloud Pub/Sub, BigQuery Storage (tables), Cloud Bigtable (NoSQL), Cloud Storage (files), Cloud Datastore, Cloud Dataflow, BigQuery Analytics (SQL), Cloud Monitoring, Real-time analytics and alerts
  5. End to End View of the GCP IoT Architecture
  6. Device to Device Protocols
     ● Device Discovery
     ● Device to Device authentication
     ● Device Configuration
     ● Protocol Routing
  7. Machine Learning: Pattern Detection and Prediction
     ● Subscribers scan real-time streams and feed data into the Machine Learning Recognition algorithm
     ● Dataflow orchestrates streaming algorithms which compare data streams against the Experience Database
     ● Correlators detect known patterns and publish alerts using Cloud Pub/Sub
  8. Cloud Storage Archival and Retrieval
     ● Data is periodically unloaded from Bigtable and stored in Cloud Storage for archival
     ● Data in Cloud Storage can be quickly reloaded into Bigtable should it need to be reprocessed
  9. Cloud Pub/Sub: Real-time and reliable messaging with Pub/Sub
  10. Messaging is a shock-absorber (Images by Connie Zhou)
      Availability
      • Buffer new requests during outages
      • Prevent overloads that cause outages
      • Redirect requests to recover from outages
      Throughput
      • Smooth out spikes in new request rate
      • Balance load across multiple workers
      • Balance arrival rate with service rate
      Latency
      • Accept requests closer to the network edge
      • Optimize message flow across regions
      • Leverage shared efforts to improve protocols
  11. Pub/Sub is a change-absorber (Images by Connie Zhou)
      Sources
      • New data sources can plug into old data flows
      • New data sources can use new schemas
      • Common security policies for all sources
      Sinks
      • Data can be sent to new destinations
      • Push and Pull delivery are both available
      • Spans organizational boundaries
      Transforms
      • Select subsets of messages that matter
      • Helps manage schema and version changes
      • Can merge streams into new topics
  12. Pub/Sub at Google
      • Chat & Mobile: Every time your Gmail box pops up a new message, it's because of a push notification to your browser or mobile device.
      • Ads & Budgets: One of the most important real-time information streams in the company is advertising revenue. We use Pub/Sub to broadcast budgets to our entire fleet of search engines.
      • Push Notifications: Google Cloud Messaging for Android delivers billions of messages a day, reliably and securely, for Google's own mobile apps and the entire developer community.
      • Instant Search: Updating search results as you type is a feat of real-time indexing that depends on Pub/Sub to update caches with breaking news.
  13. [Diagram] Pub/Sub System: Publisher, Topic, Subscription. Delivery labels: Webhook Delivery, HTTP Server Subscriber; HTTP Push Delivery, Google App Engine; Google RPC Delivery, Pull Subscriber; Cloud Dataflow. Environment labels: On-Prem/Cloud, Any Environment.
  14. [Diagram] Push Subscription and Pull Subscription. Labels: Subscriber, Pub/Sub System, Msg, Ack, RPC Send, RPC Return.
  15. Google Dataflow: "We don't really run MapReduce at Google anymore" - Urs Hoelzle
  16. Google Technologies. Timeline axis: 2002, 2004, 2006, 2008, 2010, 2012, 2013, 2014+. Technologies shown: GFS, MapReduce, Bigtable, Dremel, FlumeJava, Colossus, MillWheel, Spanner, More!
  17. Dataflow Goodies
      • Autoscaling mid-job
      • Fully managed - No-Ops
      • Intuitive Data Processing Framework
      • Batch and Stream Processing in one
      • Liquid sharding mid-job
  18. Dataflow Goodies (same five items), with a batch pipeline:
        Pipeline p = Pipeline.create();
        p.begin()
         .apply(TextIO.Read.from("gs://…"))          // read input text from Cloud Storage
         .apply(ParDo.of(new ExtractTags()))         // ExtractTags: user-defined step, not shown on the slide
         .apply(Count.create())                      // count occurrences
         .apply(ParDo.of(new ExpandPrefixes()))      // ExpandPrefixes: user-defined step, not shown on the slide
         .apply(Top.largestPerKey(3))                // keep the three largest values per key
         .apply(TextIO.Write.to("gs://…"));          // write results to Cloud Storage
        p.run();
  19. Dataflow Goodies (same five items): Deploy, Schedule & Monitor
  20. Dataflow Goodies (same five items): 800 RPS, 1200 RPS, 5000 RPS, 50 RPS
  21. Dataflow Goodies (same five items)
  22. Dataflow Goodies (same five items), with the batch pipeline from slide 18 and these streaming lines shown alongside it:
         .apply(PubsubIO.Read.from("input_topic"))                 // read from a Pub/Sub topic
         .apply(Window.<Integer>by(FixedWindows.of(5, MINUTES)))   // group elements into five-minute windows
         .apply(PubsubIO.Write.to("output_topic"));                // publish results to a Pub/Sub topic
  23. Unified Model
  24. Unified Model
  25. Pub/Sub + Dataflow + BigQuery Demo
  26. Life of a Pipeline
  27. Enterprise Big Data Architecture on Google. Diagram labels: Your Data, PubSub, Cloud Storage, BigTable, Dataflow, Fast ETL, Regex, JSON, UDFs, BigQuery, Spreadsheets, BI Tools, Coworkers, Applications + Reports
  28. Why Dataflow?
      • All the benefits of Hadoop-on-Google
      • Plus a Fully-Managed Service
      • Plus New, Intuitive Framework
      • Plus True Stream Processing
      • Plus Autoscaling and per-minute billing
  29. Questions?
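
The pull subscription slide (Subscriber, Msg, Ack) can be illustrated with a small in-memory model. The sketch below is not the Cloud Pub/Sub client API; the class `PullSubscriptionSketch` and its methods are hypothetical names invented for illustration. It assumes the usual at-least-once behavior: a pulled message stays outstanding until the subscriber acknowledges it, and an unacknowledged message is redelivered once its ack deadline passes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory model of a pull subscription (hypothetical, not the real client library). */
public class PullSubscriptionSketch {
    // Messages published but not yet handed to a subscriber.
    private final Deque<String> pending = new ArrayDeque<>();
    // Messages handed out and still waiting for an ack, keyed by ack ID.
    private final Map<Integer, String> outstanding = new LinkedHashMap<>();
    private int nextAckId = 0;

    /** Publisher side: the message is buffered until a subscriber pulls it. */
    public void publish(String message) {
        pending.addLast(message);
    }

    /** Subscriber side: returns up to max messages, each keyed by a fresh ack ID. */
    public Map<Integer, String> pull(int max) {
        Map<Integer, String> batch = new LinkedHashMap<>();
        while (batch.size() < max && !pending.isEmpty()) {
            int ackId = nextAckId++;
            String message = pending.removeFirst();
            outstanding.put(ackId, message);
            batch.put(ackId, message);
        }
        return batch;
    }

    /** Acknowledges one message; returns false if the ack ID is unknown or expired. */
    public boolean ack(int ackId) {
        return outstanding.remove(ackId) != null;
    }

    /** Simulates the ack deadline passing: unacked messages return to the front of the queue. */
    public void expireAckDeadline() {
        List<String> unacked = new ArrayList<>(outstanding.values());
        outstanding.clear();
        for (int i = unacked.size() - 1; i >= 0; i--) {
            pending.addFirst(unacked.get(i));
        }
    }

    public static void main(String[] args) {
        PullSubscriptionSketch subscription = new PullSubscriptionSketch();
        subscription.publish("a");
        subscription.publish("b");

        System.out.println(subscription.pull(2)); // prints {0=a, 1=b}
        subscription.ack(0);                      // "a" is done; "b" is never acked
        subscription.expireAckDeadline();
        System.out.println(subscription.pull(2)); // prints {2=b}
    }
}
```

The redelivered message gets a new ack ID in this model, so a late ack with the old ID is rejected; a subscriber therefore has to tolerate seeing the same message more than once.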

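The streaming lines on slide 22 group elements into fixed five-minute windows. As a rough illustration of what fixed windowing means, the sketch below assigns event timestamps to window start times and counts events per window in plain Java. It does not use the Dataflow SDK; `FixedWindowSketch`, `windowStart`, and `countPerWindow` are hypothetical names, and aligning windows to the epoch is an assumption made for simplicity.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java illustration of fixed (tumbling) windows; not the Dataflow SDK. */
public class FixedWindowSketch {

    /** Returns the start, in epoch milliseconds, of the fixed window containing the timestamp. */
    public static long windowStart(long timestampMillis, long windowSizeMillis) {
        // floorDiv keeps the result correct for negative timestamps as well.
        return Math.floorDiv(timestampMillis, windowSizeMillis) * windowSizeMillis;
    }

    /** Counts events per window, keyed by window start and sorted by time. */
    public static Map<Long, Integer> countPerWindow(List<Long> timestamps, long windowSizeMillis) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (long timestamp : timestamps) {
            counts.merge(windowStart(timestamp, windowSizeMillis), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        long fiveMinutes = 5 * 60 * 1000L;
        // Hypothetical event times in milliseconds since the epoch.
        List<Long> eventTimes = Arrays.asList(0L, 60_000L, 299_999L, 300_000L, 650_000L);
        System.out.println(countPerWindow(eventTimes, fiveMinutes));
        // prints {0=3, 300000=1, 600000=1}
    }
}
```

Each event belongs to exactly one window here, which is what distinguishes fixed windows from sliding ones, where an event can fall into several overlapping windows.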