GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs

Google is expanding our storage products by introducing Google Storage for Developers. It offers a RESTful API for storing and accessing data at Google. Developers can take advantage of the performance and reliability of Google's storage infrastructure, as well as the advanced security and sharing capabilities. We will demonstrate key functionality of the product as well as customer use cases. Google relies heavily on data analysis and has developed many tools to understand large datasets. Two of these tools are now available on a limited sign-up basis to developers: (1) BigQuery: interactive analysis of very large data sets and (2) Prediction API: make informed predictions from your data. We will demonstrate their use and give instructions on how to get access.

Usage Rights

© All Rights Reserved

GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs Presentation Transcript

  • 1. Google Developer Day 2010 Friday, October 29, 2010
  • 2. Google Storage, Bigquery and Prediction APIs. Patrick Chanezon, Developer Advocate, Cloud (@chanezon, chanezon@google.com). Sao Paulo, October 29th 2010
  • 3. Mobile Agenda for GDD: http://bit.ly/mgddbr
  • 4. Agenda
    • Google Storage for Developers
    • Prediction API
    • BigQuery
  • 5. What is cloud computing? Software… Platform… Infrastructure… … as a Service
  • 6. What is cloud computing? Software as a Service (SaaS), Platform as a Service (PaaS), Infrastructure as a Service (IaaS)
  • 7. Google's Cloud Offerings
    • SaaS: 1. Google Apps 2. Third party Apps: Google Apps Marketplace 3. ________
    • PaaS: Google App Engine
    • IaaS: Google Storage, Prediction API, BigQuery
  • 8. Google's Cloud Offerings: Your Apps
    • SaaS: 1. Google Apps 2. Third party Apps: Google Apps Marketplace 3. ________
    • PaaS: Google App Engine
    • IaaS: Google Storage, Prediction API, BigQuery
  • 9. Google Storage for Developers: Store your data in Google's cloud
  • 10. What Is Google Storage?
    • Store your data in Google's cloud
      o any format, any amount, any time
    • You control access to your data
      o private, shared, or public
    • Access via Google APIs or 3rd party tools/libraries
  • 11. Sample Use Cases
    • Static content hosting, e.g. static html, images, music, video
    • Backup and recovery, e.g. personal data, business records
    • Sharing, e.g. share data with your customers
    • Data storage for applications, e.g. used as storage backend for Android, App Engine, Cloud based apps
    • Storage for Computation, e.g. BigQuery, Prediction API
  • 12. Google Storage Benefits
    • High Performance and Scalability: Backed by Google infrastructure
    • Strong Security and Privacy: Control access to your data
    • Easy to Use: Get started fast with Google & 3rd party tools
  • 13. Google Storage Technical Details
    • RESTful API
      o Verbs: GET, PUT, POST, HEAD, DELETE
      o Resources: identified by URI
      o Compatible with S3
    • Buckets
      o Flat containers, i.e. no bucket hierarchy
    • Objects
      o Any type
      o Size: 100 GB / object
    • Access Control for Google Accounts
      o For individuals and groups
    • Two Ways to Authenticate Requests
      o Sign request using access keys
      o Web browser login
  • 14. Performance and Scalability
    • Objects of any type and 100 GB / Object
    • Unlimited numbers of objects, 1000s of buckets
    • All data replicated to multiple US data centers
    • Leveraging Google's worldwide network for data delivery
    • Only you can use bucket names with your domain names
    • “Read-your-writes” data consistency
    • Range Get
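
The "Range Get" bullet presumably refers to standard HTTP byte-range requests, which let a client fetch a large object in pieces. The following is a minimal sketch of computing the `Range` header values, assuming plain HTTP/1.1 range semantics; the helper name `byte_ranges` and the sizes are illustrative and not part of the Google Storage API.

```python
def byte_ranges(total_size, chunk_size):
    """Yield HTTP Range header values that cover total_size bytes in chunks."""
    for start in range(0, total_size, chunk_size):
        # The end offset in a Range header is inclusive.
        end = min(start + chunk_size, total_size) - 1
        yield "bytes=%d-%d" % (start, end)


# Hypothetical 250-byte object fetched in 100-byte pieces.
ranges = list(byte_ranges(250, 100))
for value in ranges:
    print({"Range": value})
```

Each value would be sent as the `Range` header of a separate GET on the same object URI.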
  • 15. Security and Privacy Features
    • Key-based authentication
    • Authenticated downloads from a web browser
    • Sharing with individuals
    • Group sharing via Google Groups
    • Access control for buckets and objects
    • Set Read/Write/List permissions
  • 16. Tools
    • Google Storage Manager
    • gsutil
  • 17. Google Storage usage within Google
    • Google BigQuery
    • Google Prediction API
    • Haiti Relief Imagery
    • USPTO data
    • Partner Reporting
  • 18. Some Early Google Storage Adopters
  • 19. Google Storage - Pricing
    o Storage: $0.17/GB/Month
    o Network
      Upload: $0.10/GB
      Download: $0.30/GB APAC, $0.15/GB Americas / EMEA
    o Requests
      PUT, POST, LIST: $0.01 / 1,000 Requests
      GET, HEAD: $0.01 / 10,000 Requests
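
To make the price list concrete, here is a rough worked estimate of one month's bill. The prices are copied from the slide; the usage figures (50 GB stored, 10 GB uploaded, 20 GB downloaded in the Americas / EMEA tier, 100,000 writes, 1,000,000 reads) and the function name `monthly_cost` are made up for illustration, and the free preview allowance is ignored.

```python
from decimal import Decimal

# Prices in USD, as listed on the pricing slide.
STORAGE_PER_GB_MONTH = Decimal("0.17")
UPLOAD_PER_GB = Decimal("0.10")
DOWNLOAD_PER_GB = {"apac": Decimal("0.30"), "americas_emea": Decimal("0.15")}
WRITE_PER_1000 = Decimal("0.01")   # PUT, POST, LIST
READ_PER_10000 = Decimal("0.01")   # GET, HEAD


def monthly_cost(stored_gb, upload_gb, download_gb, region, writes, reads):
    """Estimate one month's bill from assumed usage figures."""
    return (
        STORAGE_PER_GB_MONTH * stored_gb
        + UPLOAD_PER_GB * upload_gb
        + DOWNLOAD_PER_GB[region] * download_gb
        + WRITE_PER_1000 * Decimal(writes) / 1000
        + READ_PER_10000 * Decimal(reads) / 10000
    )


# Hypothetical usage for one month.
total = monthly_cost(50, 10, 20, "americas_emea", 100_000, 1_000_000)
print("Estimated monthly cost: $%s" % total.quantize(Decimal("0.01")))
```

Using `Decimal` rather than floats keeps the cents exact, which matters when the per-request prices are this small.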
  • 20. Google Storage - Availability
    • Limited preview in US* currently
      o 100GB free storage and network per account
      o Sign up for wait list at http://code.google.com/apis/storage/
    * Non-US preview available on case-by-case basis
  • 21. Google Storage Summary
    • Store any kind of data using Google's cloud infrastructure
    • Easy to Use APIs
    • Many available tools and libraries
      o gsutil, Google Storage Manager
      o 3rd party: Boto, CloudBerry, CyberDuck, JetS3t, …
  • 22. Google Prediction API: Google's prediction engine in the cloud
  • 23. Introducing the Google Prediction API
    • Google's sophisticated machine learning technology
    • Available as an on-demand RESTful HTTP web service
  • 24. How does it work?
    • 1. TRAIN: The Prediction API finds relevant features in the sample data during training.
      o "english": The quick brown fox jumped over the lazy dog.
      o "english": To err is human, but to really foul things up you need a computer.
      o "spanish": No hay mal que por bien no venga.
      o "spanish": La tercera es la vencida.
    • 2. PREDICT: The Prediction API later searches for those features during prediction.
      o ?: To be or not to be, that is the question.
      o ?: La fe mueve montañas.
  • 25. A virtually endless number of applications...
    • Customer Sentiment, Transaction Risk, Species Identification, Message Routing, Diagnostics
    • Churn Prediction, Legal Docket Classification, Suspicious Activity, Work Roster Assignment, Inappropriate Content
    • Recommend Products, Political Bias, Uplift Marketing, Email Filtering, Career Counseling
    • ... and many more ...
  • 26. A Prediction API Example: Automatically categorize and respond to emails by language
    • Customer: ACME Corp, a multinational organization
    • Goal: Respond to customer emails in their language
    • Data: Many emails, tagged with their languages
    • Outcome: Predict language and respond accordingly
  • 27. Using the Prediction API: A simple three step process...
    • 1. Upload: Upload your training data to Google Storage
    • 2. Train: Build a model from your data
    • 3. Predict: Make new predictions
  • 28. Step 1: Upload. Upload your training data to Google Storage
    • Training data: outputs and input features
    • Data format: comma separated value format (CSV), result in first column
        "english","To err is human, but to really ..."
        "spanish","No hay mal que por bien no venga."
        ...
    • Upload to Google Storage
        gsutil cp ${data} gs://yourbucket/${data}
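
The training file layout described above (label in the first column, input text after it, every field quoted) can be produced with Python's standard `csv` module. This is a small sketch with two made-up training rows; the variable names are illustrative, and writing to an in-memory buffer stands in for writing the file that `gsutil cp` would then upload.

```python
import csv
import io

# Hypothetical labelled examples: (language label, text).
examples = [
    ("english", "To err is human, but to really foul things up you need a computer."),
    ("spanish", "No hay mal que por bien no venga."),
]

buffer = io.StringIO()
# QUOTE_ALL matches the fully quoted rows shown on the slide.
writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
for label, text in examples:
    writer.writerow([label, text])  # the label goes in the first column

training_csv = buffer.getvalue()
print(training_csv)
```

Letting the `csv` module do the quoting also takes care of texts that themselves contain commas or double quotes.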
  • 29. Step 2: Train. Create a new model by training on data
    • To train a model:
        POST prediction/v1.1/training?data=mybucket%2Fmydata
    • Training runs asynchronously. To see if it has finished:
        GET prediction/v1.1/training/mybucket%2Fmydata
        {"data":{ "data":"mybucket/mydata", "modelinfo":"estimated accuracy: 0.xx"}}
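
Note that the bucket and object name travel inside the URL with the slash percent-encoded (`mybucket%2Fmydata`). A minimal sketch of building the two request lines from the slide follows; the helper names are hypothetical, and only the path shapes shown above are assumed.

```python
from urllib.parse import quote

API_PREFIX = "prediction/v1.1"  # path prefix as shown on the slide


def model_id(bucket, obj):
    """Percent-encode 'bucket/object' so the slash becomes %2F."""
    return quote("%s/%s" % (bucket, obj), safe="")


def training_request(bucket, obj):
    """Request line that starts training on the uploaded data."""
    return "POST", "%s/training?data=%s" % (API_PREFIX, model_id(bucket, obj))


def status_request(bucket, obj):
    """Request line that checks whether training has finished."""
    return "GET", "%s/training/%s" % (API_PREFIX, model_id(bucket, obj))


print(training_request("mybucket", "mydata"))
print(status_request("mybucket", "mydata"))
```

Passing `safe=""` matters here: by default `quote` leaves `/` unescaped, which would change the path structure.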
  • 30. Step 3: Predict. Apply the trained model to make predictions on new data
        POST prediction/v1.1/query/mybucket%2Fmydata/predict
        { "data":{ "input": { "text" : [ "J'aime X! C'est le meilleur" ]}}}
  • 31. Step 3: Predict. Apply the trained model to make predictions on new data
        POST prediction/v1.1/query/mybucket%2Fmydata/predict
        { "data":{ "input": { "text" : [ "J'aime X! C'est le meilleur" ]}}}
    Response:
        { "data" : {
            "kind" : "prediction#output",
            "outputLabel" : "French",
            "outputMulti" : [
              {"label":"French", "score": x.xx},
              {"label":"English", "score": x.xx},
              {"label":"Spanish", "score": x.xx}]}}
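
A client would typically decode this reply and act on the best-scoring label. The sketch below assumes the response shape shown on the slide; the numeric scores are invented placeholders (the slide only shows `x.xx`), and `best_label` is a hypothetical helper.

```python
import json

# Reply in the shape shown on the slide; the scores are made-up placeholders.
reply_text = """
{ "data": {
    "kind": "prediction#output",
    "outputLabel": "French",
    "outputMulti": [
      {"label": "French", "score": 0.92},
      {"label": "English", "score": 0.05},
      {"label": "Spanish", "score": 0.03}
    ]}}
"""


def best_label(reply):
    """Return the highest-scoring entry from the outputMulti list."""
    return max(reply["data"]["outputMulti"], key=lambda entry: entry["score"])


reply = json.loads(reply_text)
best = best_label(reply)
print(best["label"], best["score"])
```

Reading `outputMulti` rather than only `outputLabel` lets the caller apply its own confidence threshold before responding.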
  • 32. Step 3: Predict. Apply the trained model to make predictions on new data
        import httplib
        # put new data in JSON format
        params = { ... }
        header = {"Content-Type" : "application/json"}
        conn = httplib.HTTPConnection("www.googleapis.com")
        conn.request("POST", "/prediction/v1.1/query/mybucket%2Fmydata/predict", params, header)
        # read and print the response body
        print conn.getresponse().read()
  • 33. Prediction API Capabilities
    • Data
      o Input Features: numeric or unstructured text
      o Output: up to hundreds of discrete categories
    • Training
      o Many machine learning techniques
      o Automatically selected
      o Performed asynchronously
    • Access from many platforms:
      o Web app from Google App Engine
      o Apps Script (e.g. from Google Spreadsheet)
      o Desktop app
  • 34. Prediction API v1.1 - new features
    • Updated Syntax
    • Multi-category prediction
      o Tag entry with multiple labels
    • Continuous Output
      o Finer grained prediction rankings based on multiple labels
    • Mixed Inputs
      o Both numeric and text inputs are now supported
    • Can combine continuous output with mixed inputs
  • 35. Google BigQuery: Interactive analysis of large datasets in Google's cloud
  • 36. Introducing Google BigQuery
    – Google's large data ad hoc analysis technology
      • Analyze massive amounts of data in seconds
    – Simple SQL-like query language
    – Flexible access
      • REST APIs, JSON-RPC, Google Apps Script
  • 37. Why BigQuery? Working with large data is a challenge
  • 38. Many Use Cases ...
    • Spam
    • Trends Detection
    • Interactive Tools
    • Web Dashboards
    • Network Optimization
  • 39. Key Capabilities of BigQuery
    • Scalable: Billions of rows
    • Fast: Response in seconds
    • Simple: Queries in SQL
    • Web Service
      o REST
      o JSON-RPC
      o Google Apps Script
  • 40. Using BigQuery: Another simple three step process...
    • 1. Upload: Upload your raw data to Google Storage
    • 2. Import: Import raw data into BigQuery table
    • 3. Query: Perform SQL queries on table
  • 41. Writing Queries
    • Compact subset of SQL
      o SELECT ... FROM ... WHERE ... GROUP BY ... ORDER BY ... LIMIT ...;
    • Common functions
      o Math, String, Time, ...
    • Additional statistical approximations
      o TOP
      o COUNT DISTINCT
  • 42. BigQuery via REST
        GET /bigquery/v1/tables/{table name}
        GET /bigquery/v1/query?q={query}
    Sample JSON Reply:
        { "results": {
            "fields": { [ {"id":"COUNT(*)","type":"uint64"}, ... ] },
            "rows": [
              {"f":[{"v":"2949"}, ...]},
              {"f":[{"v":"5387"}, ...]},
              ... ] } }
    Also supports JSON-RPC
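
As a rough illustration of the REST call above, the sketch below URL-encodes a query into the `q` parameter and pulls the cell values out of a reply. The table name `mytable`, the query text, and the helper names are hypothetical; the reply is a simplified stand-in in which `fields` is a plain list, which is an assumption, since the slide's sample is abbreviated.

```python
import json
from urllib.parse import urlencode


def query_path(sql):
    """Build the GET path for a query, URL-encoding the SQL text."""
    return "/bigquery/v1/query?" + urlencode({"q": sql})


def cell_values(reply):
    """Flatten each row's "f" list into a list of its "v" values."""
    return [[cell["v"] for cell in row["f"]] for row in reply["results"]["rows"]]


# Hypothetical query against a hypothetical table.
path = query_path("SELECT COUNT(*) FROM mytable GROUP BY year;")
print(path)

# Simplified reply modelled on the slide's sample (values are placeholders).
reply = json.loads("""
{ "results": {
    "fields": [ {"id": "COUNT(*)", "type": "uint64"} ],
    "rows": [ {"f": [{"v": "2949"}]}, {"f": [{"v": "5387"}]} ] } }
""")
values = cell_values(reply)
print(values)
```

Values arrive as strings in the sample, so a caller would convert them according to the `type` declared for each field.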
  • 43. Security and Privacy
    • Standard Google Authentication
      o Client Login
      o OAuth
      o AuthSub
    • HTTPS support
      o protects your credentials
      o protects your data
    • Relies on Google Storage to manage access
  • 44. Large Data Analysis Example: Wikimedia Revision History. Wikimedia Revision history data from: http://download.wikimedia.org/enwiki/latest/enwiki-latest-pages-meta-history.xml.7z
  • 45. Large Data Analysis Example: Wikimedia Revision History. Wikimedia Revision history data from: http://download.wikimedia.org/enwiki/latest/enwiki-latest-pages-meta-history.xml.7z
  • 46. Using BigQuery Shell: Python DB API 2.0 + B. Clapper's sqlcmd, http://www.clapper.org/software/python/sqlcmd/
  • 47. BigQuery from a Spreadsheet
  • 48. BigQuery from a Spreadsheet
  • 49. Prediction API and BigQuery Demo: Tagger
    • Input Data: http://delic.io.us/chanezon
      – 6000 urls, 14000 tags in 6 years
    • Analyze my delicious tags
      – use delicious API to get all tagged urls
      – cleanup data, resize (100Mb limit)
      – PUT data in Google storage
      – Define table
      – analyze
    • Predict how I would tag a technology article
      – input is tag,url,text
      – send new url and text
      – get predicted tag
  • 50. Guessing Subreddits with Prediction API
    • Nick Johnson’s blog
      – http://blog.notdot.net/2010/06/Trying-out-the-new-Prediction-API
      – 42,753 submissions, for a week
      – 63% accuracy, to categorize new submissions
  • 51. Recap
    • Google Storage
      o High speed data storage on Google Cloud
    • Prediction API
      o Google's machine learning technology able to predict outcomes based on sample data
    • BigQuery
      o Interactive analysis of very large data sets
      o Simple SQL query language access
  • 52. More information
    • Google Storage for Developers
      o http://code.google.com/apis/storage
    • Prediction API
      o http://code.google.com/apis/prediction
    • BigQuery
      o http://code.google.com/apis/bigquery
  • 53. Mobile Agenda for GDD: http://bit.ly/mgddbr
  • 54. Google Developer Day 2010 Friday, October 29, 2010