1. Introduction to Google’s Cloud Technologies
Chris Schalk
Google Developer Advocate
OpenCF Summit
Monday Feb 21st, 2010
2. Google Cloud Technologies at a Glance
Existing
• Google App Engine
• Google App Engine for Business (new)
New!
• Google Storage
• Google Prediction API
• Google BigQuery
3. Agenda
• Part I - Intro to App Engine
• App Engine Details
• Development Tools
• App Engine for Business
• Part II – Google’s new cloud technologies
• Google Storage
• Prediction API
• BigQuery
• Part III - Dabbling with CFML on App Engine
4. Part I – Intro to App Engine
Topics covered
• App Engine as a PaaS
• App Engine usage/customers
• App Engine Technical Details
13. Cloud Development in a Box
• Downloadable SDK
• Application runtimes
• Java, Python
• Local development tools
• Eclipse plugin, AppEngine Launcher
• Specialized application services
• Cloud based dashboard
• Ready to scale
• Built in fault tolerance, load balancing
14. Specialized Services
• Memcache
• Datastore
• URL Fetch
• Mail
• XMPP
• Task Queue
• Images
• Blobstore
• User Service
25. Two+ years in review
Apr 2008: Python launch
May 2008: Memcache, Images API
Jul 2008: Logs export
Aug 2008: Batch write/delete
Oct 2008: HTTPS support
Dec 2008: Status dashboard, quota details
Feb 2009: Billing, larger files
Apr 2009: Java launch, DB import, cron support, SDC
May 2009: Key-only queries
Jun 2009: Task queues
Aug 2009: Kindless queries
Sep 2009: XMPP
Oct 2009: Incoming email
Dec 2009: Blobstore
Feb 2010: Datastore cursors, Appstats
Mar 2010: Read policies, IPv6
May 2010: App Engine for Business
Jun 2010: Task queue increases, Python pre-compilation…
Jul 2010: Mapper API
Aug 2010: Multi-tenancy, high-performance image serving, custom error pages
Oct 2010: Instances Console, Delete Kind/App Data
26. App Engine 1.4 Release New Features
1. Channel API
– Allows for Server Push (Comet) to browser - http://code.google.com/appengine/docs/java/channel/
2. Always On
3. Warm Up Requests
– Enabled by default for Java apps
– Can turn off in appengine-web.xml via:
<warmup-requests-enabled>false</warmup-requests-enabled>
27. App Engine 1.4 Release New Features
4. Hard Limit Updates
– No more 30 second limit for background work -> up to 10 minutes
– Response size limits for URLFetch have been raised from 1MB to 32MB
– Memcache batch get/put can now also do up to 32MB requests
– Image API request and response size limits have been raised from 1MB to 32MB
– Mail API outgoing attachments have been increased from 1MB to 10MB
28. Introducing App Engine for Business
App Engine for Business
Same scalable cloud platform, but designed for the Enterprise
29. Google App Engine for Business Details
• Enterprise application management
– Centralized domain console (preview available)
• Enterprise reliability and support
– 99.9% Service Level Agreement
– Direct support
• Hosted SQL
– Relational SQL database in the cloud (preview available)
• SSL on your domain
• Extremely Secure by default
– Integrated Single Sign On (SSO)
• Pricing that makes sense
– Apps cost $8 per user, up to $1000 max per month
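The pricing bullet above amounts to a capped linear formula. A minimal sketch, assuming the $8 rate and the $1000 cap apply per application per month as the slide states; the function name and parameters are hypothetical and not part of any Google tool.

```python
def monthly_app_cost(users, per_user=8, cap=1000):
    """Return the monthly cost in dollars for one app, capped at `cap`."""
    if users < 0:
        raise ValueError("users must be non-negative")
    return min(users * per_user, cap)


print(monthly_app_cost(50))   # 50 users * $8 = $400
print(monthly_app_cost(200))  # 200 users * $8 = $1600, capped at $1000
```

Under these assumptions the cap is reached at 125 users, so larger deployments pay a flat $1000 per app.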
30. App Engine for Business Roadmap
Enterprise Administration Console: Preview (signups available)
Direct Support: Preview (signups available)
Hosted SQL: Preview (signups available)
Service Level Agreement: Preview (Draft published)
Custom Domain SSL: Limited Release Q1 2011
31. App Engine Resources
Get started with App Engine
• http://code.google.com/appengine
Read up on App Engine for Business and become a trusted tester
• http://code.google.com/appengine/business
32. App Engine Demos
• App Engine
• Getting started
• App Engine for Business
• Domain Console
• Guestbook on SQL on GAE4B
33. Part II - Google’s new Cloud Technologies
Topics covered
• Google Storage for Developers
• Prediction API (machine learning)
• BigQuery
35. What Is Google Storage?
• Store your data in Google's cloud
o any format, any amount, any time
• You control access to your data
o private, shared, or public
• Access via Google APIs or 3rd party tools/libraries
36. Sample Use Cases
Static content hosting
e.g. static html, images, music, video
Backup and recovery
e.g. personal data, business records
Sharing
e.g. share data with your customers
Data storage for applications
e.g. used as storage backend for Android, AppEngine, Cloud based apps
Storage for Computation
e.g. BigQuery, Prediction API
37. Google Storage Benefits
High Performance and Scalability
Backed by Google infrastructure
Strong Security and Privacy
Control access to your data
Easy to Use
Get started fast with Google & 3rd party tools
38. Google Storage Technical Details
• RESTful API
o Verbs: GET, PUT, POST, HEAD, DELETE
o Resources: identified by URI
o Compatible with S3
• Buckets
o Flat containers
• Objects
o Any type
o Size: 100 GB / object
• Access Control for Google Accounts
o For individuals and groups
• Two Ways to Authenticate Requests
o Sign request using access keys
o Web browser login
39. Performance and Scalability
• Objects of any type and 100 GB / Object
• Unlimited numbers of objects, 1000s of buckets
• All data replicated to multiple US data centers
• Utilizes Google's worldwide network for data delivery
• Only you can use bucket names with your domain names
• Read-your-writes data consistency
• Range Get
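"Range Get" refers to fetching part of an object with a standard HTTP Range header. A minimal sketch of building such a header; the helper name is hypothetical, and nothing here is specific to Google Storage beyond what HTTP itself defines.

```python
def range_header(first_byte, last_byte=None):
    """Build an HTTP Range header for an inclusive byte range."""
    if first_byte < 0 or (last_byte is not None and last_byte < first_byte):
        raise ValueError("invalid byte range")
    end = "" if last_byte is None else str(last_byte)
    return {"Range": "bytes=%d-%s" % (first_byte, end)}


print(range_header(0, 1023))  # the first 1024 bytes of an object
print(range_header(4096))     # from byte 4096 to the end of the object
```

The resulting dictionary can be passed as the headers of a GET request, which is useful for resuming downloads of large objects.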
41. Google Storage - Availability
• Preview in US currently
o 100GB free storage and network from Google per account
o Sign up for waitlist at http://code.google.com/apis/storage/
• Note: Non US preview available on case-by-case basis
• http://bit.ly/dKm770 (for Storage, BigQuery, Prediction)
42. Demo
• Tools:
o GS Manager
o GSUtil
• Upload / Download
44. Introducing the Google Prediction API
• Google's sophisticated machine learning technology
• Available as an on-demand RESTful HTTP web service
45. How does it work?
The Prediction API finds relevant features in the sample data during training.
"english" The quick brown fox jumped over the lazy dog.
"english" To err is human, but to really foul things up you need a computer.
"spanish" No hay mal que por bien no venga.
"spanish" La tercera es la vencida.
The Prediction API later searches for those features during prediction.
? To be or not to be, that is the question.
? La fe mueve montañas.
46. A virtually endless number of applications...
• Customer Sentiment
• Transaction Risk
• Species Identification
• Message Routing
• Diagnostics
• Churn Prediction
• Legal Docket Classification
• Suspicious Activity
• Work Roster Assignment
• Inappropriate Content
• Recommend Products
• Political Bias
• Uplift Marketing
• Email Filtering
• Career Counselling
... and many more ...
47. Using the Prediction API
A simple three step process...
1. Upload: Upload your training data to Google Storage
2. Train: Build a model from your data
3. Predict: Make new predictions
48. Step 1: Upload
Upload your training data to Google Storage
• Training data: outputs and input features
• Data format: comma separated value format (CSV)
"english","To err is human, but to really ..."
"spanish","No hay mal que por bien no venga."
...
Upload to Google Storage
gsutil cp ${data} gs://yourbucket/${data}
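The training file itself can be produced with any CSV writer. A minimal sketch, assuming the layout shown on this slide (output label first, then the text feature, every field quoted); the function name is hypothetical.

```python
import csv
import io


def training_csv(examples):
    """Serialize (label, text) pairs as CSV with every field quoted."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for label, text in examples:
        writer.writerow([label, text])
    return buf.getvalue()


rows = [
    ("english", "To err is human, but to really foul things up you need a computer."),
    ("spanish", "No hay mal que por bien no venga."),
]
print(training_csv(rows), end="")
```

Using the csv module rather than string concatenation keeps commas and quotation marks inside the text from breaking the file.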
49. Step 2: Train
Create a new model by training on data
To train a model:
POST prediction/v1.1/training?data=mybucket%2Fmydata
Training runs asynchronously. To see if it has finished:
GET prediction/v1.1/training/mybucket%2Fmydata
{"data":{
"data":"mybucket/mydata",
"modelinfo":"estimated accuracy: 0.xx"}}
50. Step 3: Predict
Apply the trained model to make predictions on new data
POST prediction/v1.1/query/mybucket%2Fmydata/predict
{ "data":{
"input": { "text" : [
"J'aime X! C'est le meilleur" ]}}}
51. Step 3: Predict
Apply the trained model to make predictions on new data
POST prediction/v1.1/query/mybucket%2Fmydata/predict
{ "data":{
"input": { "text" : [
"J'aime X! C'est le meilleur" ]}}}
{ "data" : {
"kind" : "prediction#output",
"outputLabel":"French",
"outputMulti" :[
{"label":"French", "score": x.xx},
{"label":"English", "score": x.xx},
{"label":"Spanish", "score": x.xx}]}}
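Once the reply arrives, picking out the labels is plain JSON handling. A minimal sketch, assuming a reply with the same shape as the one on this slide; the scores below are made up for illustration because the slide elides them, and the function name is hypothetical.

```python
import json

# Hypothetical reply: same shape as the slide, with made-up scores.
reply_text = """
{"data": {
  "kind": "prediction#output",
  "outputLabel": "French",
  "outputMulti": [
    {"label": "French", "score": 0.7},
    {"label": "English", "score": 0.2},
    {"label": "Spanish", "score": 0.1}]}}
"""


def ranked_labels(reply):
    """Return (label, score) pairs sorted by descending score."""
    multi = reply["data"]["outputMulti"]
    return sorted(((m["label"], m["score"]) for m in multi),
                  key=lambda pair: pair[1], reverse=True)


reply = json.loads(reply_text)
print(reply["data"]["outputLabel"])
print(ranked_labels(reply))
```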
52. Step 3: Predict
Apply the trained model to make predictions on new data
An example using Python
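import http.client
import json

header = {"Content-Type": "application/json"}
# Put the new data in JSON format in the params variable
params = json.dumps({"data": {"input": {"text": ["J'aime X! C'est le meilleur"]}}})
conn = http.client.HTTPConnection("www.googleapis.com")
conn.request("POST", "/prediction/v1.1/query/mybucket%2Fmydata/predict", params, header)
# Read and print the response body
print(conn.getresponse().read())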
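In every request path shown in these steps, the model is named by its bucket and object with the slash encoded as %2F. A minimal sketch of building those paths, assuming the v1.1 layout shown on the slides; the helper names are hypothetical.

```python
from urllib.parse import quote

API_PREFIX = "/prediction/v1.1"


def model_id(bucket, obj):
    """URL-encode "bucket/object" so the slash becomes %2F."""
    return quote("%s/%s" % (bucket, obj), safe="")


def training_path(bucket, obj):
    """Path for the POST that starts training."""
    return "%s/training?data=%s" % (API_PREFIX, model_id(bucket, obj))


def status_path(bucket, obj):
    """Path for the GET that checks whether training has finished."""
    return "%s/training/%s" % (API_PREFIX, model_id(bucket, obj))


def predict_path(bucket, obj):
    """Path for the POST that asks the trained model for a prediction."""
    return "%s/query/%s/predict" % (API_PREFIX, model_id(bucket, obj))


print(training_path("mybucket", "mydata"))
print(status_path("mybucket", "mydata"))
print(predict_path("mybucket", "mydata"))
```

Passing safe="" to quote matters here: by default it leaves the slash unencoded, which would change the meaning of the path.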
53. Prediction API Capabilities
Data
• Input Features: numeric or unstructured text
• Output: up to hundreds of discrete categories
Training
• Many machine learning techniques
• Automatically selected
• Performed asynchronously
Access from many platforms:
• Web app from Google App Engine
• Apps Script (e.g. from Google Spreadsheet)
• Desktop app
54. Prediction API v1.1 - features
• Updated Syntax
• Multi-category prediction
o Tag entry with multiple labels
• Continuous Output
o Finer grained prediction rankings based on multiple labels
• Mixed Inputs
o Both numeric and text inputs are now supported
Can combine continuous output with mixed inputs
55. Prediction API Demos
• Creating training data – recipes.csv
• Simple REST access
• Training the prediction engine
• Start predicting!
• A Java Web example
57. Introducing Google BigQuery
• Google's technology for ad hoc analysis of large data
o Analyze massive amounts of data in seconds
• Simple SQL-like query language
• Flexible access
o REST APIs, JSON-RPC, Google Apps Script
58. Why BigQuery?
Working with large data is a challenge
59. Many Use Cases ...
• Interactive Tools
• Trends
• Spam Detection
• Web Dashboards
• Network Optimization
60. Key Capabilities of BigQuery
• Scalable: Billions of rows
• Fast: Response in seconds
• Simple: Queries in SQL
• Web Service
o REST
o JSON-RPC
o Google Apps Script
61. Using BigQuery
Another simple three step process...
1. Upload: Upload your raw data to Google Storage
2. Import: Import raw data into BigQuery table
3. Query: Perform SQL queries on table
62. Writing Queries
Compact subset of SQL
o SELECT ... FROM ...
WHERE ...
GROUP BY ... ORDER BY ...
LIMIT ...;
Common functions
o Math, String, Time, ...
Statistical approximations
o TOP
o COUNT DISTINCT
63. BigQuery via REST
GET /bigquery/v1/tables/{table name}
GET /bigquery/v1/query?q={query}
Sample JSON Reply:
{
"results": {
"fields": { [
{"id":"COUNT(*)","type":"uint64"}, ... ]
},
"rows": [
{"f":[{"v":"2949"}, ...]},
{"f":[{"v":"5387"}, ...]}, ... ]
}
}
Also supports JSON-RPC
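The query endpoint and the reply's row structure can be handled with the standard library. A minimal sketch, assuming the query text is passed URL-encoded in the q parameter and that rows follow the "f"/"v" layout of the sample reply; the table name and helper names are hypothetical.

```python
import json
from urllib.parse import urlencode


def query_path(sql):
    """Build the GET path for a query, URL-encoding the SQL text."""
    return "/bigquery/v1/query?" + urlencode({"q": sql})


def row_values(reply):
    """Extract each row's cell values from the "f"/"v" structure."""
    return [[cell["v"] for cell in row["f"]]
            for row in reply["results"]["rows"]]


# Hypothetical reply containing only the "rows" part of the sample.
sample = json.loads(
    '{"results": {"rows": [{"f": [{"v": "2949"}]}, {"f": [{"v": "5387"}]}]}}'
)
print(query_path("SELECT COUNT(*) FROM mytable;"))
print(row_values(sample))
```

Values arrive as strings in the sample, so numeric columns would still need converting according to the type reported in "fields".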
64. Security and Privacy
Standard Google Authentication
• Client Login
• OAuth
• AuthSub
HTTPS support
• protects your credentials
• protects your data
Relies on Google Storage to manage access
65. Large Data Analysis Example
Wikimedia Revision History
Wikimedia revision history data from:
http://download.wikimedia.org/enwiki/latest/enwiki-latest-pages-meta-history.xml.7z
66. Using BigQuery Shell
Python DB API 2.0 + B. Clapper's sqlcmd
http://www.clapper.org/software/python/sqlcmd/
69. Further info available at:
• Google Storage for Developers
o http://code.google.com/apis/storage
• Prediction API
o http://code.google.com/apis/predict
• BigQuery
o http://code.google.com/apis/bigquery
70. Recap
• Google App Engine
o Google’s PaaS cloud development platform
• Google App Engine for Business
o New enterprise version of App Engine
• Google Storage
o New high speed data storage on Google Cloud
• Prediction API
o New machine learning technology able to predict outcomes based on sample data
• BigQuery
o New service for interactive analysis of very large data sets using SQL
71. Part III – Dabbling with CFML on App Engine
• How?
– Answer: Open BlueDragon
• OpenBD is a Java CFML runtime engine
• Has Google App Engine port
• Easy installation
– Just copy an example war directory to a new GAE Java Web App
– Make sure to merge the jar files from the new WEB-INF/lib with the existing WEB-INF/lib