Google Storage, BigQuery and Prediction APIs
Patrick Chanezon, Developer Advocate, Cloud
@chanezon, chanezon@google.com
Sao Paulo, Friday, October 29, 2010
Google Developer Day 2010
GDD Brazil 2010 - Google Storage, BigQuery and Prediction APIs

Google is expanding our storage products by introducing Google Storage for Developers. It offers a RESTful API for storing and accessing data at Google. Developers can take advantage of the performance and reliability of Google's storage infrastructure, as well as the advanced security and sharing capabilities. We will demonstrate key functionality of the product as well as customer use cases. Google relies heavily on data analysis and has developed many tools to understand large datasets. Two of these tools are now available on a limited sign-up basis to developers: (1) BigQuery: interactive analysis of very large data sets and (2) Prediction API: make informed predictions from your data. We will demonstrate their use and give instructions on how to get access.

Transcript of "GDD Brazil 2010 - Google Storage, BigQuery and Prediction APIs"

  1. Google Storage, BigQuery and Prediction APIs. Patrick Chanezon, Developer Advocate, Cloud (@chanezon, chanezon@google.com). Sao Paulo, October 29th 2010. Google Developer Day 2010.
  2. Mobile Agenda for GDD: http://bit.ly/mgddbr
  3. Agenda
      • Google Storage for Developers
      • Prediction API
      • BigQuery
  4. What is cloud computing? Infrastructure… Platform… Software… as a Service
  5. What is cloud computing? Infrastructure (IaaS)… Platform (PaaS)… Software (SaaS)… as a Service
  6. Google's Cloud Offerings (IaaS, PaaS, SaaS)
      • Google Storage, Prediction API, BigQuery
      • Google App Engine
      • 1. Google Apps 2. Third-party Apps: Google Apps Marketplace 3. ________
  7. Google's Cloud Offerings (IaaS, PaaS, SaaS)
      • Google Storage, Prediction API, BigQuery
      • Your Apps
      • Google App Engine
      • 1. Google Apps 2. Third-party Apps: Google Apps Marketplace 3. ________
  8. Google Storage for Developers: store your data in Google's cloud
  9. What Is Google Storage?
      • Store your data in Google's cloud
        o any format, any amount, any time
      • You control access to your data
        o private, shared, or public
      • Access via Google APIs or third-party tools/libraries
  10. Sample Use Cases
      • Static content hosting, e.g. static HTML, images, music, video
      • Backup and recovery, e.g. personal data, business records
      • Sharing, e.g. share data with your customers
      • Data storage for applications, e.g. as a storage backend for Android, App Engine and cloud-based apps
      • Storage for computation, e.g. BigQuery, Prediction API
  11. Google Storage Benefits
      • High performance and scalability: backed by Google infrastructure
      • Strong security and privacy: control access to your data
      • Easy to use: get started fast with Google and third-party tools
  12. Google Storage Technical Details
      • RESTful API
        o Verbs: GET, PUT, POST, HEAD, DELETE
        o Resources: identified by URI
        o Compatible with S3
      • Buckets
        o Flat containers, i.e. no bucket hierarchy
      • Objects
        o Any type
        o Size: up to 100 GB per object
      • Access control for Google Accounts
        o For individuals and groups
      • Two ways to authenticate requests
        o Sign requests using access keys
        o Web browser login
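The "sign requests using access keys" option can be sketched as an HMAC over a canonical description of the request. This is a minimal sketch that assumes an S3-style string-to-sign (verb, Content-MD5, Content-Type, Date and resource, one per line) and an `Authorization: GOOG1 <access key>:<signature>` header value; both layouts are assumptions made for illustration rather than details taken from the slide, and real requests may need additional canonical headers.

```python
import base64
import hashlib
import hmac


def string_to_sign(verb, content_md5, content_type, date, resource):
    # Assumed S3-style layout: one field per line, canonical resource last.
    return "\n".join([verb, content_md5, content_type, date, resource])


def sign_request(secret_key, verb, resource, date, content_md5="", content_type=""):
    """Return the base64-encoded HMAC-SHA1 of the (assumed) string-to-sign."""
    message = string_to_sign(verb, content_md5, content_type, date, resource)
    digest = hmac.new(secret_key.encode("utf-8"), message.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def authorization_header(access_key, secret_key, verb, resource, date):
    # Hypothetical header value: "GOOG1 <access key>:<signature>".
    return "GOOG1 {}:{}".format(access_key, sign_request(secret_key, verb, resource, date))
```

For example, `authorization_header("MYACCESSKEY", "my-secret", "GET", "/mybucket/photo.jpg", "Fri, 29 Oct 2010 12:00:00 GMT")` yields a header value whose signature changes if any signed field changes, which is what lets the server detect tampering.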
  13. Performance and Scalability
      • Objects of any type, up to 100 GB per object
      • Unlimited number of objects, 1000s of buckets
      • All data replicated to multiple US data centers
      • Leverages Google's worldwide network for data delivery
      • Only you can use bucket names with your domain names
      • "Read-your-writes" data consistency
      • Range GET
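A "Range GET" fetches only part of an object by sending an HTTP `Range` header. The sketch below builds, but does not send, such a request with Python's standard library; the host `commondatastorage.googleapis.com` and the helper name `range_get_request` are assumptions for illustration, and the request carries no authentication.

```python
import urllib.request

# Assumed endpoint for illustration; the slide does not name a host.
STORAGE_HOST = "https://commondatastorage.googleapis.com"


def range_get_request(bucket, obj, first_byte, last_byte):
    """Build (but do not send) a GET for bytes first_byte..last_byte, inclusive."""
    if first_byte < 0 or last_byte < first_byte:
        raise ValueError("invalid byte range")
    url = "{}/{}/{}".format(STORAGE_HOST, bucket, obj)
    req = urllib.request.Request(url, method="GET")
    req.add_header("Range", "bytes={}-{}".format(first_byte, last_byte))
    return req
```

HTTP byte ranges are inclusive, so `range_get_request("mybucket", "video.mp4", 0, 1023)` asks for the first 1,024 bytes of the object.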
  14. Security and Privacy Features
      • Key-based authentication
      • Authenticated downloads from a web browser
      • Sharing with individuals
      • Group sharing via Google Groups
      • Access control for buckets and objects
      • Set Read/Write/List permissions
  15. Tools: Google Storage Manager, gsutil
  16. Google Storage usage within Google: Haiti Relief Imagery, USPTO data, Partner Reporting, Google BigQuery, Google Prediction API
  17. Some Early Google Storage Adopters
  18. Google Storage - Pricing
      • Storage: $0.17/GB/month
      • Network
        o Upload: $0.10/GB
        o Download: $0.30/GB APAC, $0.15/GB Americas/EMEA
      • Requests
        o PUT, POST, LIST: $0.01 per 1,000 requests
        o GET, HEAD: $0.01 per 10,000 requests
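The price list turns into a monthly estimate with simple arithmetic. This sketch hard-codes the slide's 2010 preview rates; the function name, the region keys and the example usage figures are hypothetical, and the free preview allowance is ignored.

```python
# Rates copied from the slide (USD, 2010 preview pricing).
STORAGE_PER_GB_MONTH = 0.17
UPLOAD_PER_GB = 0.10
DOWNLOAD_PER_GB = {"APAC": 0.30, "Americas/EMEA": 0.15}
WRITE_PER_1000 = 0.01   # PUT, POST, LIST
READ_PER_10000 = 0.01   # GET, HEAD


def monthly_cost(stored_gb, upload_gb, download_gb, region, writes, reads):
    """Estimate one month's bill in USD, rounded to cents."""
    storage = stored_gb * STORAGE_PER_GB_MONTH
    network = upload_gb * UPLOAD_PER_GB + download_gb * DOWNLOAD_PER_GB[region]
    requests = writes / 1000 * WRITE_PER_1000 + reads / 10000 * READ_PER_10000
    return round(storage + network + requests, 2)
```

For 50 GB stored, 10 GB uploaded, 20 GB downloaded in the Americas, 100,000 writes and 1,000,000 reads, `monthly_cost(50, 10, 20, "Americas/EMEA", 100_000, 1_000_000)` gives 8.50 + 1.00 + 3.00 + 1.00 + 1.00 = 14.50 USD.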
  19. Google Storage - Availability
      • Limited preview in the US* currently
        o 100 GB free storage and network per account
        o Sign up for the wait list at http://code.google.com/apis/storage/
      * Non-US preview available on a case-by-case basis
  20. Google Storage Summary
      • Store any kind of data using Google's cloud infrastructure
      • Easy-to-use APIs
      • Many available tools and libraries
        o gsutil, Google Storage Manager
        o Third-party: Boto, CloudBerry, CyberDuck, JetS3t, …
  21. Google Prediction API: Google's prediction engine in the cloud
  22. Introducing the Google Prediction API
      • Google's sophisticated machine learning technology
      • Available as an on-demand RESTful HTTP web service
  23. How does it work?
      1. TRAIN: the Prediction API finds relevant features in the sample data during training.
         "english": The quick brown fox jumped over the lazy dog.
         "english": To err is human, but to really foul things up you need a computer.
         "spanish": No hay mal que por bien no venga.
         "spanish": La tercera es la vencida.
      2. PREDICT: the Prediction API later searches for those features during prediction.
         ?: To be or not to be, that is the question.
         ?: La fe mueve montañas.
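To make the train/predict split concrete, here is a toy bag-of-words naive Bayes classifier trained on the four labelled sentences from the slide. It is a stand-in chosen for illustration only: the slide does not say which techniques the Prediction API uses, and a real model needs far more training data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercased word tokens; \w matches accented letters such as "ñ".
    return re.findall(r"\w+", text.lower())


def train(examples):
    """examples: list of (label, text). Returns per-label word counts, doc counts, vocabulary."""
    word_counts, doc_counts = {}, Counter()
    for label, text in examples:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return the label with the highest log-probability (Laplace-smoothed naive Bayes)."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        denom = sum(counts.values()) + len(vocab)
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            score += math.log((counts[word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


EXAMPLES = [
    ("english", "The quick brown fox jumped over the lazy dog."),
    ("english", "To err is human, but to really foul things up you need a computer."),
    ("spanish", "No hay mal que por bien no venga."),
    ("spanish", "La tercera es la vencida."),
]
MODEL = train(EXAMPLES)
```

With this toy model, `predict(MODEL, "To be or not to be, that is the question.")` returns `"english"` and `predict(MODEL, "La fe mueve montañas.")` returns `"spanish"`, driven by shared words such as "to", "the" and "la".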
  24. A virtually endless number of applications: Customer Sentiment, Transaction Risk, Species Identification, Message Routing, Legal Docket Classification, Suspicious Activity, Work Roster Assignment, Recommend Products, Political Bias, Uplift Marketing, Email Filtering, Diagnostics, Inappropriate Content, Career Counseling, Churn Prediction ... and many more
  25. A Prediction API Example: automatically categorize and respond to emails by language
      • Customer: ACME Corp, a multinational organization
      • Goal: respond to customer emails in their language
      • Data: many emails, tagged with their languages
      • Outcome: predict language and respond accordingly
  26. Using the Prediction API: a simple three-step process
      1. Upload: upload your training data to Google Storage
      2. Train: build a model from your data
      3. Predict: make new predictions
  27. Step 1: Upload your training data to Google Storage
      • Training data: outputs and input features
      • Data format: comma-separated values (CSV), result in the first column
        "english","To err is human, but to really ..."
        "spanish","No hay mal que por bien no venga."
        ...
      • Upload to Google Storage:
        gsutil cp ${data} gs://yourbucket/${data}
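The training file is plain CSV with the label in the first column. A minimal sketch using Python's `csv` module; quoting every field mirrors the sample rows above but is a choice made here, not a documented requirement, and `to_training_csv` is a hypothetical helper name.

```python
import csv
import io


def to_training_csv(rows):
    """rows: iterable of (label, text). Label goes in the first column."""
    buf = io.StringIO()
    # QUOTE_ALL wraps every field in quotes and doubles any embedded quotes.
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for label, text in rows:
        writer.writerow([label, text])
    return buf.getvalue()
```

Write the returned text to a file and upload it with the `gsutil cp` command shown on the slide.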
  28. Step 2: Train. Create a new model by training on data
      To train a model:
        POST prediction/v1.1/training?data=mybucket%2Fmydata
      Training runs asynchronously. To see if it has finished:
        GET prediction/v1.1/training/mybucket%2Fmydata
        {"data": {"data": "mybucket/mydata", "modelinfo": "estimated accuracy: 0.xx"}}
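The model is addressed by its bucket/object name, with the slash URL-encoded as `%2F`. This sketch builds the two URLs from the slide and checks a status reply. The `https://www.googleapis.com` prefix follows the host used in slide 31 (with HTTPS assumed), and treating an `estimated accuracy` string as "training finished" is an assumption, not documented behavior.

```python
import json
import urllib.parse

API_ROOT = "https://www.googleapis.com/prediction/v1.1"  # assumed prefix


def model_id(bucket, obj):
    # Encode the "/" between bucket and object as %2F, as on the slide.
    return urllib.parse.quote("{}/{}".format(bucket, obj), safe="")


def training_url(bucket, obj):
    return "{}/training?data={}".format(API_ROOT, model_id(bucket, obj))


def status_url(bucket, obj):
    return "{}/training/{}".format(API_ROOT, model_id(bucket, obj))


def training_finished(status_body):
    # Heuristic (assumption): treat the job as done once modelinfo reports an accuracy.
    info = json.loads(status_body).get("data", {}).get("modelinfo", "")
    return info.startswith("estimated accuracy")
```

A client would POST to `training_url(...)` once, then poll `status_url(...)` with GET until `training_finished` returns True.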
  29. Step 3: Predict. Apply the trained model to make predictions on new data
        POST prediction/v1.1/query/mybucket%2Fmydata/predict
        {"data": {"input": {"text": ["J'aime X! C'est le meilleur"]}}}
  30. Step 3: Predict. Apply the trained model to make predictions on new data
      Request:
        POST prediction/v1.1/query/mybucket%2Fmydata/predict
        {"data": {"input": {"text": ["J'aime X! C'est le meilleur"]}}}
      Response:
        {"data": {"kind": "prediction#output",
                  "outputLabel": "French",
                  "outputMulti": [{"label": "French", "score": x.xx},
                                  {"label": "English", "score": x.xx},
                                  {"label": "Spanish", "score": x.xx}]}}
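Once a reply like the one above comes back, the scores in `outputMulti` can be ranked on the client. A minimal sketch, assuming the reply has exactly the shape shown on the slide (scores may arrive as numbers or as strings); `top_labels` is a hypothetical helper name.

```python
import json


def top_labels(reply_body, n=3):
    """Return up to n (label, score) pairs from outputMulti, highest score first."""
    data = json.loads(reply_body)["data"]
    scored = [(item["label"], float(item["score"])) for item in data.get("outputMulti", [])]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]
```

With made-up scores of 0.8, 0.15 and 0.05 for French, English and Spanish, `top_labels(reply, n=1)` would return `[("French", 0.8)]`, matching `outputLabel`.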
  31. Step 3: Predict. Apply the trained model to make predictions on new data (Python 2):
        import httplib
        import json
        # put new data in JSON format
        params = { ... }
        header = {"Content-Type": "application/json"}
        conn = httplib.HTTPConnection("www.googleapis.com")
        # httplib expects the request body as a string, so serialize params first
        conn.request("POST", "/prediction/v1.1/query/mybucket%2Fmydata/predict", json.dumps(params), header)
        # read() returns the JSON reply body rather than the response object
        print conn.getresponse().read()
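In Python 3, `httplib` became `http.client`; the same request can also be built with `urllib.request`. This sketch only constructs the request. The slide shows no authentication, so whatever credentials the service requires would still have to be added; HTTPS and the helper name `prediction_request` are assumptions.

```python
import json
import urllib.request

PREDICT_URL = "https://www.googleapis.com/prediction/v1.1/query/{}/predict"


def prediction_request(model, text):
    """Build (but do not send) the slide's predict POST; no auth headers are added."""
    body = json.dumps({"data": {"input": {"text": [text]}}}).encode("utf-8")
    return urllib.request.Request(
        PREDICT_URL.format(model),  # model is the URL-encoded "bucket%2Fobject" id
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it; the reply body is JSON in the shape shown on slide 30.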
  32. Prediction API Capabilities
      • Data
        o Input features: numeric or unstructured text
        o Output: up to hundreds of discrete categories
      • Training
        o Many machine learning techniques
        o Automatically selected
        o Performed asynchronously
      • Access from many platforms
        o Web app from Google App Engine
        o Apps Script (e.g. from a Google Spreadsheet)
        o Desktop app
  33. Prediction API v1.1 - new features
      • Updated syntax
      • Multi-category prediction
        o Tag an entry with multiple labels
      • Continuous output
        o Finer-grained prediction rankings based on multiple labels
      • Mixed inputs
        o Both numeric and text inputs are now supported
      • Continuous output can be combined with mixed inputs
  34. Google BigQuery: interactive analysis of large datasets in Google's cloud
  35. Introducing Google BigQuery: Google's technology for ad hoc analysis of large data
      • Analyze massive amounts of data in seconds
        o Simple SQL-like query language
        o Flexible access: REST APIs, JSON-RPC, Google Apps Script
  36. Why BigQuery? Working with large data is a challenge
  37. Many Use Cases: Spam, Trends Detection, Web Dashboards, Network Optimization, Interactive Tools, ...
  38. Key Capabilities of BigQuery
      • Scalable: billions of rows
      • Fast: response in seconds
      • Simple: queries in SQL
      • Web service
        o REST
        o JSON-RPC
        o Google Apps Script
  39. Using BigQuery: another simple three-step process
      1. Upload: upload your raw data to Google Storage
      2. Import: import raw data into a BigQuery table
      3. Query: perform SQL queries on the table
  40. Writing Queries
      • Compact subset of SQL
        o SELECT ... FROM ... WHERE ... GROUP BY ... ORDER BY ... LIMIT ...;
      • Common functions
        o Math, String, Time, ...
      • Additional statistical approximations
        o TOP
        o COUNT DISTINCT
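The query shape above is ordinary SQL, so it can be tried locally before touching BigQuery. This sketch uses an in-memory SQLite database as a stand-in, with a hypothetical `revisions` table of toy rows; SQLite is not BigQuery, and BigQuery's approximate TOP and COUNT DISTINCT have no equivalent here.

```python
import sqlite3

# Hypothetical toy table standing in for revision-history data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revisions (title TEXT, contributor TEXT)")
conn.executemany(
    "INSERT INTO revisions VALUES (?, ?)",
    [("Google", "alice"), ("Gmail", "alice"), ("Go", "bob"),
     ("Python", "bob"), ("Goat", "carol"), ("Gecko", "alice")],
)

# SELECT ... FROM ... WHERE ... GROUP BY ... ORDER BY ... LIMIT, as on the slide.
query = """
    SELECT contributor, COUNT(*) AS edits
    FROM revisions
    WHERE title LIKE 'G%'
    GROUP BY contributor
    ORDER BY edits DESC, contributor
    LIMIT 2;
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('alice', 3), ('bob', 1)]
```

The secondary `ORDER BY contributor` breaks the tie between bob and carol so that `LIMIT 2` returns a deterministic result.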
  41. BigQuery via REST
        GET /bigquery/v1/tables/{table name}
        GET /bigquery/v1/query?q={query}
      Sample JSON reply:
        {"results": {
           "fields": [{"id": "COUNT(*)", "type": "uint64"}, ...],
           "rows": [{"f": [{"v": "2949"}, ...]},
                    {"f": [{"v": "5387"}, ...]},
                    ...]}}
      Also supports JSON-RPC.
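Calling the query endpoint means URL-encoding the SQL into `q`, and reading the reply means unwrapping the `f`/`v` nesting. A minimal sketch, assuming the `https://www.googleapis.com` host and a `fields` list laid out as in the sample reply; authentication is omitted and the helper names are hypothetical.

```python
import json
import urllib.parse

QUERY_ENDPOINT = "https://www.googleapis.com/bigquery/v1/query"  # host assumed


def query_url(sql):
    # URL-encode the whole SQL string into the q parameter.
    return QUERY_ENDPOINT + "?q=" + urllib.parse.quote(sql, safe="")


def parse_reply(reply_body):
    """Return (column ids, rows) from a reply shaped like the sample above."""
    results = json.loads(reply_body)["results"]
    columns = [field["id"] for field in results["fields"]]
    rows = [[cell["v"] for cell in row["f"]] for row in results["rows"]]
    return columns, rows
```

Values come back as strings (for example `"2949"`), so numeric columns still need converting with the type listed in `fields`.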
  42. Security and Privacy
      • Standard Google authentication
        o ClientLogin
        o OAuth
        o AuthSub
      • HTTPS support
        o protects your credentials
        o protects your data
      • Relies on Google Storage to manage access
  43. Large Data Analysis Example: Wikimedia Revision History
      Wikimedia revision history data from: http://download.wikimedia.org/enwiki/latest/enwiki-latest-pages-meta-history.xml.7z
  45. Using BigQuery Shell: Python DB API 2.0 + B. Clapper's sqlcmd (http://www.clapper.org/software/python/sqlcmd/)
  46. BigQuery from a Spreadsheet
  48. Prediction API and BigQuery Demo: Tagger
      • Input data: http://delic.io.us/chanezon
        o 6000 URLs, 14000 tags in 6 years
      • Analyze my delicious tags
        o use the delicious API to get all tagged URLs
        o clean up the data, resize it (100Mb limit)
        o PUT the data in Google Storage
        o define the table
        o analyze
      • Predict how I would tag a technology article
        o input is tag, url, text
        o send new url and text
        o get predicted tag
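The "clean up the data, resize it (100Mb limit)" step can be sketched as writing `tag,url,text` CSV rows until a byte budget is reached. This assumes "100Mb" means 100 megabytes and that dropping the remaining rows is acceptable; `rows_within_limit` is a hypothetical helper name.

```python
import csv
import io

# "100Mb" on the slide, read here as 100 megabytes (assumption).
LIMIT_BYTES = 100 * 1024 * 1024


def rows_within_limit(rows, limit=LIMIT_BYTES):
    """Serialize (tag, url, text) rows as CSV, stopping before the output exceeds limit bytes.

    Returns the CSV text and the number of rows kept.
    """
    out = io.StringIO()
    used = 0
    kept = 0
    for tag, url, text in rows:
        line = io.StringIO()
        csv.writer(line, lineterminator="\n").writerow([tag, url, text])
        encoded = line.getvalue()
        size = len(encoded.encode("utf-8"))  # budget counts UTF-8 bytes, not characters
        if used + size > limit:
            break
        out.write(encoded)
        used += size
        kept += 1
    return out.getvalue(), kept
```

Stopping at the first row that would overflow keeps the file under the limit; sorting rows by importance beforehand would decide which ones survive.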
  49. Guessing Subreddits with Prediction API
      • Nick Johnson's blog: http://blog.notdot.net/2010/06/Trying-out-the-new-Prediction-API
        o 42,753 submissions, collected over one week
        o 63% accuracy when categorizing new submissions
  50. Recap
      • Google Storage
        o High-speed data storage on Google's cloud
      • Prediction API
        o Google's machine learning technology, able to predict outcomes based on sample data
      • BigQuery
        o Interactive analysis of very large data sets
        o Simple SQL query language access
  51. More information
      • Google Storage for Developers: http://code.google.com/apis/storage
      • Prediction API: http://code.google.com/apis/prediction
      • BigQuery: http://code.google.com/apis/bigquery