Big Data Analytics with Google Cloud Platform
Speaker: William Vambenepe, Lead Product Manager for Big Data, Google Cloud Platform
Twitter: @vambenepe
Published in: Technology

  1. Big Data Analytics with Google Cloud Platform
  2. Speaker: William Vambenepe, Lead Product Manager for Big Data, Google Cloud Platform. Twitter: @vambenepe
  3. Google ran into “Big Data” problems in the process of building its business...
  4. Big data at Google scale
     ● How much video is uploaded to YouTube every minute? 100 hours
     ● How many active Gmail users are there? 500M+
     ● How large is Google's web search index? 100PB+ (over 100,000 TB)
     ● How long does it take for Google to respond to a search query? 0.25 seconds
  5. ...now that the technology is available, its usage is spreading to all industries.
  6. Conventional Methods for Data & Analytics: Focused on optimizing core business processes and capturing details about high interactions/transactions
  7. Conventional Methods for Data & Analytics: Stopped collecting data at the nearest boundary of business process:
     ● Not instrumented for tracking of product usage by customer
     ● Instrumentation for tracking supplier performance based on supplier-specific situation
     Business decision makers fill data gaps with gut feel...
  8. Rise of Big Data enabled by Tech Innovation: It is possible to instrument data capture at any level of business process...
  9. ...and to store and process it easily and efficiently.
  10. Big Data enabled by Tech Innovation: Highly scalable, performant and cost-effective computational capabilities in the cloud make it possible to store vast amounts of data from many sources
  11. For the past 15 years, Google has been building out the world’s fastest, most powerful, highest-quality cloud infrastructure.
  12. Cloud Platform is built on the same infrastructure that powers Google.
  13. Google's Platform: "[Google's] ability to build, organize, and operate a huge network of servers and fiberoptic cables with an efficiency and speed that rocks physics on its heels. This is what makes Google Google: its physical network, its thousands of fiber miles, and those many thousands of servers that, in aggregate, add up to the mother of all clouds." - Wired, 'Google Throws Open Doors To Its Top Secret Data Center', October 2012. Images by Connie Zhou.
  14. Investing In Our Cloud: $2.9B in additional data center investments worldwide
  15. Google Innovations in Software (timeline with axis marks at 2002, 2004, 2006, 2008, 2010 and 2012): MapReduce, GFS, Dremel, Bigtable, Colossus, Spanner
  16. Google Cloud Platform
      ● Compute: Compute Engine, App Engine
      ● Storage: Cloud Storage, Cloud SQL, Cloud Datastore, Persistent Disk
      ● App Services: BigQuery, Cloud Endpoints, Caching, Queues
  17. Google Compute Engine and Google Cloud Storage
  18. Google Compute Engine: virtual machine hosting
      ● Compute: Launch virtual machines on demand
      ● Network: Connect your VMs together to form powerful clusters
      ● Storage: Store on persistent disk, local disk or Cloud Storage
      ● Tooling: Control your VMs via REST API or command line
      Scale, Performance, Value
  19. Google Compute Engine: “Google Compute Engine is not just fast. It’s Google fast. In fact, it’s a class of fast that enables new service architectures entirely.” - Sebastian Stadil, Scalr
  20. Google Compute Engine: MapR on GCE breaks the TeraSort, then MinuteSort records
      ● sorted 15 billion 100-byte records (1.5TB) in 59 seconds
      ● used 2,103 instances (n1-standard-4-d)
      ● 8,412 cores
  21. Computing in the Cloud with GCE
      ● Cancer Research: “Google Compute Engine is just part of what we see as a whole new way for scientists around the world to work more effectively in the cloud. This is a paradigm shift that Google services can help bring about.” - Hector Rovira, senior software engineer at the Institute for Systems Biology
      ● High Performance Computing: “And we have been very satisfied with the way Google Compute Engine performs. One of the great things with Google Compute Engine in our experience is that the performance is really reliable, so there is not a lot of variability in the performance we see.” - Joe Masters Emison, Founder and VP R&D, BuildFax
  22. Google Cloud Storage: lets you store and access data on Google's reliable infrastructure
      SPEED
      ● Global Network: Lowest latency for rapid access
      ● Data Center Efficiency: Maximal service at critical need
      RELIABILITY
      ● World-Class Reliability: 99.9% SLA
      ● Availability of your Data: Read-your-write consistency
      SCALABILITY
      ● Unlimited Objects: There is no limit to the number of objects
      ● Big Object Size: Up to 5 terabytes per object
      USES: Content Delivery & File Sharing, Active Archiving, Application Storage, Computation
  23. Specialized Big Data services of Google Cloud Platform
  24. Big Data example: app logs. Diagram labels: User requests; Application servers.
  25. Big Data example: app logs. Diagram labels: User requests; Application servers; App persistence (Bigtable / GFS); App logging; Logs cluster. Note: diagram is dramatically simplified.
  26. Big Data example: app logs. Diagram labels: User requests; Application servers; App persistence (Bigtable / GFS); App logging; Logs cluster; MapReduce (shown twice); Dremel. Note: diagram is dramatically simplified.
  27. Big Data example: app logs. Diagram labels: User requests; Application servers; App persistence (Bigtable / GFS); App logging; Logs cluster; MapReduce (shown twice); Dremel; SQL; Analysts & reporting. Note: diagram is dramatically simplified.
  28. Big Data example: app logs. Diagram labels: User requests; Application servers; App persistence (Bigtable / GFS); App logging; Logs cluster; MapReduce (shown twice); Dremel; SQL; Analysts & reporting. Note: diagram is dramatically simplified.
  29. Google Innovations in Software (timeline with axis marks at 2002, 2004, 2006, 2008, 2010 and 2012): MapReduce, GFS, Dremel, Bigtable, Colossus, Spanner
  30. Google Cloud Platform Services: BigQuery
  31. Google BigQuery
      ● Query billions of rows in seconds, with no index
      ● Uses a SQL-style query syntax
      ● Bulk and stream data ingestion
      ● As a service:
        ○ zero setup/management
        ○ accessed by a
          ■ RESTful API
          ■ console
          ■ partner tools
        ○ Pay as you go (per query)
  32. How Google Approaches Analytics: Dremel, an ad-hoc query system for terabyte datasets
      ● Query execution tree
      ● Column-oriented records
      ● ...and a lot of nodes
  33. Google BigQuery: a fully-managed data analytics service in the cloud. Unlimited storage. Interactive analysis on multi-terabyte datasets.
      ● Diagram steps: Store all your data in the cloud; Analyze interactively; Mash it up; Securely share/distribute the results
      ● Other diagram labels: AdWords; DCLK; Analytics; Other; Corporate data; 3rd party data; Scalable Storage; SQL API; BI Tools; Google Spreadsheets; App Engine App; Co-Workers
  34. Big Data Processing Pipeline
      ● Diagram stages: Store; Extract & Transform; Analyze interactively; Serve
      ● Other diagram labels: Application level code; Log data; Logstore; Backends + MapReduce; Hadoop; Custom logic & 3rd party libraries; Structured data; Unstructured data; Cloud Storage; Datastore; BigQuery; SQL API; BI tools; Google Spreadsheets; Interactive Dashboards + apps
  35. Big Data example: app logs. Diagram labels: User requests; Application servers; App persistence (Bigtable / GFS); App logging; Logs cluster; MapReduce (shown twice); Dremel; SQL; Analysts & reporting. Note: diagram is dramatically simplified.
  36. Sample BigQuery use cases
      ● Display ads analytics (global top-5 media agency): Analyze global campaign performance for F500 clients. 1 client = 20GB/day of DoubleClick impressions logs
      ● Ads network reporting (3rd party mobile ads): Deliver x-platform performance analytics dashboards. 1B events/day x 100s of ads customers
      ● Fleet reservations (online travel operations): Monitor customer demand vs supply shortfalls. 10,000 routes x 1000s customers = millions of daily events
      ● Mobile app statistics (online reading vendor): Usage analysis on 60M installs; 10M active users. 2B API requests/day, 20GB log data/day
      ● Revenue optimization (holiday/travel properties): Correlate marketing effectiveness vs global reservations. 10MBs/day from multiple data warehouses
  37. https://cloud.google.com/resources/starterpack/
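
Slides 25 to 28 place MapReduce between the logs cluster and the analysts, without showing what such a job looks like. The following is a minimal single-process sketch of the map, shuffle and reduce steps applied to application logs. It is an illustration only, not Google's implementation: the log line layout ("timestamp status path"), the sample lines and all function names are assumptions made for this example.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical log lines; the "timestamp status path" layout is an assumption.
LOG_LINES = [
    "2013-10-01T12:00:00 200 /search",
    "2013-10-01T12:00:01 500 /upload",
    "2013-10-01T12:00:02 200 /search",
    "2013-10-01T12:00:03 404 /missing",
    "2013-10-01T12:00:04 200 /upload",
]


def map_phase(line):
    """Map step: emit one (status, 1) pair per log line."""
    _timestamp, status, _path = line.split()
    yield status, 1


def shuffle(pairs):
    """Shuffle step: group emitted values by key between the two phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single count."""
    return key, sum(values)


def count_by_status(lines):
    """Run map, shuffle and reduce in sequence over the given log lines."""
    pairs = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(count_by_status(LOG_LINES))
```

In a real cluster the map and reduce calls would run on many machines and the shuffle would move data over the network; here everything runs in one process so the three steps stay visible.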
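
Slide 32 names two ideas behind Dremel: column-oriented records and a query execution tree. The sketch below is a toy illustration of both, assuming a query of the form SUM(bytes) grouped by path over sharded log data. It is not how Dremel or BigQuery is actually implemented; the sample rows, the field names and the functions `to_columns`, `leaf_scan`, `merge` and `run_query` are all hypothetical.

```python
from collections import Counter

# Hypothetical log records; every row is assumed to carry the same fields.
ROWS = [
    {"path": "/search", "status": 200, "bytes": 120},
    {"path": "/upload", "status": 500, "bytes": 40},
    {"path": "/search", "status": 200, "bytes": 80},
    {"path": "/upload", "status": 200, "bytes": 300},
    {"path": "/search", "status": 404, "bytes": 10},
]


def to_columns(rows):
    """Pivot row-oriented records into one list per field (column-oriented)."""
    columns = {}
    for row in rows:
        for field, value in row.items():
            columns.setdefault(field, []).append(value)
    return columns


def leaf_scan(shard, group_col, sum_col):
    """Leaf node: read only the two needed columns of one shard."""
    partial = Counter()
    for key, value in zip(shard[group_col], shard[sum_col]):
        partial[key] += value
    return partial


def merge(partials):
    """Inner or root node: combine partial aggregates from child nodes."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def run_query(shards, group_col, sum_col, fanout=2):
    """Evaluate SUM(sum_col) GROUP BY group_col through a tree of merges."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    level = [leaf_scan(shard, group_col, sum_col) for shard in shards]
    while len(level) > 1:
        level = [merge(level[i:i + fanout]) for i in range(0, len(level), fanout)]
    return dict(level[0]) if level else {}


# Three shards stand in for data spread over many nodes.
SHARDS = [to_columns(ROWS[0:2]), to_columns(ROWS[2:4]), to_columns(ROWS[4:5])]

if __name__ == "__main__":
    print(run_query(SHARDS, "path", "bytes"))
```

The leaf scan never touches the `status` column, which is the point of a columnar layout: a query pays only for the columns it reads. The merge levels stand in for the intermediate servers of an execution tree, each combining results from a few children before passing them up.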