Big Data Analytics with Google Cloud Platform

1,759 views
1,410 views

Published on

Big Data Analytics with Google Cloud Platform

Published in: Technology
0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
1,759
On SlideShare
0
From Embeds
0
Number of Embeds
36
Actions
Shares
0
Downloads
157
Comments
0
Likes
1
Embeds 0
No embeds

No notes for slide

Big Data Analytics with Google Cloud Platform

  1. 1. Big Data Analytics with Google Cloud Platform
  2. 2. Speaker William Vambenepe Google Cloud Platform Lead Product Manager for Big Data Twitter: @vambenepe
  3. 3. Google ran into “Big Data” problems in the process of building its business...
  4. 4. Big data at Google scale How much video is uploaded to YouTube every minute? How many active Gmail users are there? How large is Google's web search index? How long does it take for Google to respond to a search query? 100 hours 500M+ 100PB+ (over 100,000 TBs) 0.25 seconds
  5. 5. ...now that the technology is available, its usage is spreading to all industries.
  6. 6. Conventional Methods for Data & Analytics Focused on optimizing core business processes and capture details about high interactions/transactions
  7. 7. Conventional Methods for Data & Analytics Stopped collecting data at the nearest boundary of business process: ● ● Not instrumented for tracking of product usage by customer Instrumentation for tracking supplier performance based on supplier specific situation Businesses decision makers fill data gaps with gut feel...
  8. 8. Rise of Big Data enabled by Tech Innovation It is Possible to instrument data capture at any level of business process...
  9. 9. ...and to store and process it easily and efficiently.
  10. 10. Big Data enabled by Tech Innovation Highly scalable, performant and cost effective computational capabilities in the cloud makes it possible to store vast amounts of data from many sources
  11. 11. For the past 15 years, Google has been building out the world’s fastest, most powerful, highest quality cloud infrastructure on the planet.
  12. 12. Cloud Platform is built on the same infrastructure that powers Google.
  13. 13. Google's Platform "[Google's] ability to build, organize, and operate a huge network of servers and fiberoptic cables with an efficiency and speed that rocks physics on its heels. This is what makes Google Google: its physical network, its thousands of fiber miles, and those many thousands of servers that, in aggregate, add up to the mother of all clouds." - Wired Wired, 'Google Throws Open Doors To Its Top Secret Data Center', October 2012 Images by Connie Zhou
  14. 14. Investing In Our Cloud $2.9B in additional data center investments worldwide
  15. 15. Google Innovations in Software MapReduce 2002 2004 GFS Dremel 2006 Big Table 2008 Colossus 2010 Spanner 2012
  16. 16. Google Cloud Platform Compute Compute Engine App Engine Storage App Services Cloud Storage BigQuery Cloud SQL Cloud Endpoints Cloud Datastore Caching Persistent Disk Queues
  17. 17. Google Compute Engine and Google Cloud Storage
  18. 18. Google Compute Engine Compute Engine Virtual Machine Hosting Compute Network Storage Tooling Launch Virtual Machines on demand Connect your VMs together to form powerful clusters Store on persistent disk, local disk or Cloud Storage Control your VMs via REST API or command line Scale Performance Value
  19. 19. Google Compute Engine “Google Compute Engine is not just fast. It’s Google fast. In fact, it’s a class of fast that enables new service architectures entirely.” - Sebastian Stadil, Scalr
  20. 20. Google Compute Engine MapR on GCE breaks the TeraSort, then MinuteSort records ● sorted 15 billion 100-byte records (1.5TB) in 59 seconds ● used 2,103 instances (n1-standard-4-d) ● 8,412 cores
  21. 21. Computing in the Cloud with GCE Cancer Research "Google Compute Engine is just part of what we see as a whole new way for scientists around the world to work more effectively in the cloud. This is a paradigm shift that Google services can help bring about.” – Hector Rovira, senior software engineer at the Institute for Systems Biology High Performance Computing “And we have been very satisfied with the way Google Compute Engine performs. One of the great things with Google Compute Engine in our experience is that the performance is really reliable, so there is not a lot of variability in the performance we see.” — Joe Masters Emison, Founder and VP R&D, BuildFax Google Confidential, Do not Copy or Distribute
  22. 22. Google Cloud Storage Google Cloud Storage lets you store and access data on Google's reliable infrastructure SPEED RELIABILITY SCALABILITY Global Network World-Class Reliability Unlimited Objects Lowest latency for rapid access 99.9% SLA There is no limit to # objects Data Center Efficiency Availability of your Data Big Object Size Maximal service at critical need Read-Your-Write-Consistency Up to 5 Terabytes per object USES Content Delivery & File Sharing Active Archiving Application Storage Computation
  23. 23. Specialized Big Data services of Google Cloud Platform
  24. 24. Big Data example: app logs User requests Application servers
  25. 25. Big Data example: app logs App persistence BigTable / GFS App logging User requests Application servers Logs cluster Note: diagram is dramatically simplified
  26. 26. Big Data example: app logs MapReduce App persistence BigTable / GFS Dremel MapReduce App logging User requests Application servers Logs cluster Note: diagram is dramatically simplified
  27. 27. Big Data example: app logs MapReduce App persistence SQL BigTable / GFS Dremel MapReduce Analysts & reporting App logging User requests Application servers Logs cluster Note: diagram is dramatically simplified
  28. 28. Big Data example: app logs MapReduce App persistence SQL BigTable / GFS Dremel MapReduce Analysts & reporting App logging User requests Application servers Logs cluster Note: diagram is dramatically simplified
  29. 29. Google Innovations in Software MapReduce 2002 2004 GFS Dremel 2006 Big Table 2008 Colossus 2010 Spanner 2012
  30. 30. Google Cloud Platform Services: BigQuery
  31. 31. Google BigQuery ● Query billions of rows in seconds, with no index ● Uses a SQL-style query syntax ● Bulk and stream data ingestion ● As a service: ○ zero setup/management ○ accessed by a ■ RESTful API ■ console ■ partner tools ○ Pay as you go (per query)
  32. 32. How Google Approaches Analytics Dremel Ad-hoc query system for terabyte datasets ● ● ● Query execution tree Column Oriented records ...and a lot of nodes
  33. 33. Google BigQuery BigQuery: a fully-managed data analytics service in the cloud. Unlimited storage. Interactive analysis on multi-terabyte datasets. Adwords DCLK Analytics Other BI Tools Google Spreadsheets Scalable Storage SQL API Corporate data 3rd party data Store all your data in the cloud App Engine App Co-Workers Analyze interactively Mash it up Securely Share/ distribute the results
  34. 34. Big Data Processing Pipeline Application level code Log data BI tools Logstore Backends + MapReduce Structured data BigQuery Datastore SQL API Interactive Dashboards + apps Hadoop Unstructured data Cloud Storage Store Extract & Transform Custom logic & 3rd party libraries Analyze interactively Google Spreadsheets Serve
  35. 35. Big Data example: app logs MapReduce App persistence SQL BigTable / GFS Dremel MapReduce Analysts & reporting App logging User requests Application servers Logs cluster Note: diagram is dramatically simplified
  36. 36. Sample BigQuery use cases Display ads analytics (global top-5 media agency) Ads network reporting (3rd party mobile ads) Fleet reservations (online travel operations) Mobile app statistics (online reading vendor) Revenue optimization (holiday/travel properties) Analyze global campaign performance for F500 clients 1 client = 20GB/day of DoubleClick impressions logs Deliver x-platform performance analytics dashboards 1B events/day x 100s of ads customers Monitor customer demand vs supply shortfalls 10,000 routes x 1000s customers = millions of daily events Usage analysis on 60M installs; 10M active users 2B API requests/day, 20GB log data/day Correlate marketing effectiveness vs global reservations 10MBs / day from multiple data warehouses
  37. 37. https://cloud.google.com/resources/starterpack/

×