Gae icc fall2011
1. Building webapps for the Cloud
with Python and Google App Engine
Juan Gomez
PythonKC Co-founder
October 29th, 2011
2. Agenda
1. Intro to Google App Engine (GAE for short)
• Brief intro to Python
• Structure of a Python webapp
• Setting up your dev environment with GAE
• App Engine Architecture
• The App Engine datastore and GQL
2. Brief intro to Django
• Using templates with GAE
3. Quick demo of a sample webapp
4. Beyond the basics
• Scalability & Security
• Quotas
• Using the Google Data Services API
5. Summary
• Where to go from here?
• References
14. A Python Code Sample
x = 34 - 23            # A comment.
y = "Hello"            # Another one.
z = 3.45
if z != 3.46 or y == "Hello":
    x = x + 1
    y = y + " World"   # String concat.
print x
print y
15. Enough to Understand the Code
• First assignment to a variable creates it
• Assignment is = and comparison is == (is tests object identity, not equality)
• For numbers + - * / % are as expected
• Special use:
• + for string concatenation
• % for string formatting (as in C’s printf)
• Logical operators are words (and, or,
not) not symbols (&&, ||, !).
• The basic printing command is print
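A short sketch of the operators listed above. The variable names are illustrative only, and print is written with one parenthesised argument so the snippet runs on both Python 2 and Python 3.

```python
# Illustrative values only; none of these names come from the slides.
name = "World"
count = 3

greeting = "Hello, " + name                    # + concatenates strings
summary = "%s has %d items" % (name, count)    # % formats, as in C's printf

# Logical operators are words, not symbols.
is_small = count > 0 and not count > 10

print(greeting)
print(summary)
print(is_small)
```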
16. Comments
• Start comments with #, rest of line is ignored
• Can include a “documentation string” as the
first line of a new function or class you define
• Development environments, debugger, and
other tools use it: it’s good style to include one
def my_function(x, y):
    """This is the docstring. This
    function does blah blah blah."""
    # The code would go here...
17. Python and Types
• Everything is an object!
• “Dynamic Typing”-> Data types determined automatically.
• “Strong Typing” -> Enforces them after it figures them out.
x = "the answer is " # Decides x is string.
y = 23 # Decides y is integer.
print x + y # Python will complain: TypeError (cannot concatenate str and int).
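As a small illustration of strong typing, the sketch below catches the error and converts the integer explicitly. The variable message is introduced here for illustration.

```python
x = "the answer is "
y = 23

try:
    message = x + y          # str + int is rejected with a TypeError
except TypeError:
    message = x + str(y)     # convert explicitly instead

print(message)
```

Python never converts the integer silently; the conversion has to be spelled out with str().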
18. Basic Datatypes
• Integers (default for numbers)
•z = 5 / 2 # Answer 2, integer division
• Floats
•x = 3.456
• Strings
• Can use "" or '' to specify, with "abc" == 'abc'
• Unmatched can occur within the string: "matt's"
• Use triple double-quotes for multi-line strings or strings
that contain both ' and " inside of them:
"""a'b"c"""
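A brief sketch of the datatypes above. Note that 5 / 2 gives 2 only on Python 2 (the version these slides target); on Python 3 it gives 2.5. The // operator floors on both, so it is used here. All variable names are illustrative.

```python
floor_result = 5 // 2      # floor division: 2 on Python 2 and Python 3
float_result = 5 / 2.0     # a float operand gives true division: 2.5

same_string = "abc" == 'abc'    # quote style does not matter
mixed = """a'b"c"""             # triple quotes allow both ' and " inside

print(floor_result)
print(float_result)
print(mixed)
```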
19. Whitespace
Whitespace is meaningful in Python: especially
indentation and placement of newlines
• Use a newline to end a line of code
  Use \ when you must continue onto the next line prematurely
• No braces {} to mark blocks of code; use
consistent indentation instead
• First line with less indentation is outside of the block
• First line with more indentation starts a nested block
• A colon starts a new block in many constructs,
e.g. function definitions and if clauses
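The whitespace rules above can be seen in a few lines. This is a sketch with a made-up function name (describe), not code from the talk.

```python
def describe(n):
    # The colon above opens the function body; indentation delimits it.
    if n % 2 == 0:
        kind = "even"        # nested block: more indentation
    else:
        kind = "odd"
    return kind              # back at the function-body level

# A backslash continues a statement onto the next line.
total = 1 + 2 + \
        3 + 4

print(describe(total))
```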
20. Assignment
• You can assign to multiple names at the
same time
>>> x, y = 2, 3
>>> x
2
>>> y
3
This makes it easy to swap values
>>> x, y = y, x
• Assignments can be chained
>>> a = b = x = 2
21. A Python Code Sample
x = 34 - 23            # A comment.
y = "Hello"            # Another one.
z = 3.45
if z != 3.46 or y == "Hello":
    x = x + 1
    y = y + " World"   # String concat.
print x
print y
22. Side by Side with Java
Java (C#)

public class Employee
{
    private String myEmployeeName;
    private int myTaxDeductions = 1;
    private String myMaritalStatus = "single";

    public Employee(String employeeName)
    {
        this(employeeName, 1);
    }

    public Employee(String employeeName, int taxDeductions)
    {
        this(employeeName, taxDeductions, "single");
    }

    public Employee(String employeeName,
                    int taxDeductions,
                    String maritalStatus)
    {
        this.myEmployeeName = employeeName;
        this.myTaxDeductions = taxDeductions;
        this.myMaritalStatus = maritalStatus;
    }
}

Python

class Employee():
    def __init__(self,
                 employeeName,
                 taxDeductions=1,
                 maritalStatus="single"):
        self.employeeName = employeeName
        self.taxDeductions = taxDeductions
        self.maritalStatus = maritalStatus
23. Life is Short
(You Need Python)
- Bruce Eckel (Thinking in C++)
24. Things to read through:
The Quick Python Book, 2nd Ed
http://amzn.to/lXKzH5
“Learn Python The Hard Way”
http://learnpythonthehardway.org
Python 101 – Beginning Python
http://www.rexx.com/~dkuhlman/python_101/python_101.html
25. Things to refer to:
The Python Standard Library
by Example
http://amzn.to/sx3It1
Programming Python, 4th Ed
http://amzn.to/kWjaW2
The Official Python Tutorial
http://www.python.org/doc/current/tut/tut.html
The Python Quick Reference
http://rgruet.free.fr/PQR2.3.html
28. App Engine Does One Thing Well
• App Engine handles HTTP(S) requests, nothing else
– Think RPC: request in, processing, response out
– Works well for the web and AJAX; also for other services
• App configuration is dead simple
– No performance tuning needed
• Everything is built to scale
– “infinite” number of apps, requests/sec, storage capacity
– APIs are simple, stupid
29. And that allows it to:
• Serve static files
• Serve dynamic requests
• Store data
• Call web services
• Authenticate against Google’s user database
• Send e-mail, process images, use memcache
30. Scaling
• 5 million page views a month at the free quota
• Low-usage apps: many apps per physical host
• High-usage apps: multiple physical hosts per app
• Stateless APIs are trivial to replicate
• Memcache is trivial to shard
• Datastore built on top of Bigtable; designed to scale well
– Abstraction on top of Bigtable
– API influenced by scalability
• No joins
• Recommendations: denormalize schema; precompute joins
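The "denormalize; precompute joins" recommendation can be sketched without the datastore at all. Below, plain dicts stand in for entities, and the author and post data are hypothetical. The point is only that the join is done once, at write time.

```python
# Hypothetical data; plain dicts stand in for datastore entities.
authors = {1: {"name": "Ada"}, 2: {"name": "Guido"}}

def make_post(post_id, author_id, title):
    """Build a post record, copying the author's name in at write time."""
    return {
        "id": post_id,
        "author_id": author_id,
        "author_name": authors[author_id]["name"],  # denormalized copy
        "title": title,
    }

posts = [make_post(10, 1, "Notes"), make_post(11, 2, "PEPs")]

# Reading needs no second lookup (no join): each record is self-contained.
lines = ["%s by %s" % (p["title"], p["author_name"]) for p in posts]
print(lines)
```

The trade-off is that a renamed author means rewriting every copied name, so this suits data that is read far more often than it changes.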
31. Python webapps
• App Engine includes a simple web application
framework called webapp.
• webapp is a WSGI-compatible framework.
• You can use webapp or any other WSGI
framework with GAE (like web.py, CherryPy,
Tornado, Django)
• Basic apps need Config file + webapp CGI
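To show what "WSGI-compatible" means, here is a minimal sketch of a bare WSGI application using only the standard library's wsgiref module. It is not webapp itself; hello_app and fake_start_response are names made up for this example.

```python
from wsgiref.util import setup_testing_defaults

def hello_app(environ, start_response):
    """A bare WSGI application: a callable taking environ and start_response."""
    body = b"Hello, WSGI World!"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]

# Call the app directly with a fake request, the way a server would.
captured = {}

def fake_start_response(status, headers):
    captured["status"] = status
    captured["headers"] = headers

environ = {}
setup_testing_defaults(environ)   # fills in a minimal request environment
response_body = b"".join(hello_app(environ, fake_start_response))
print(captured["status"])
```

Any framework that exposes a callable with this shape can be served by a WSGI server, which is why several frameworks can run on the same platform.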
32. Config file (No XML!)
• A webapp specifies runtime configuration, including
versions and URLs, in a file named app.yaml.
application: myapp
version: 1
runtime: python
api_version: 1

handlers:
- url: /admin/.*
  script: admin.py
  login: admin

- url: /index.html
  script: home.py

- url: /(.*\.(gif|png|jpg))
  static_files: static/\1
  upload: static/(.*\.(gif|png|jpg))
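The static-file handler maps a URL to a file through a regular expression and a group reference. The sketch below reproduces that mapping with Python's re module. It assumes the dot in the pattern is escaped, that the replacement refers to the first group as \1, and that the pattern must match the whole path. resolve_static is a hypothetical helper, not an App Engine function.

```python
import re

# Pattern and replacement as assumed for the static-file handler.
url_pattern = r"/(.*\.(gif|png|jpg))"
static_files = r"static/\1"

def resolve_static(path):
    """Return the file path for a URL, or None if the handler does not match."""
    match = re.match("^" + url_pattern + "$", path)  # assumed: whole path must match
    if match is None:
        return None
    return match.expand(static_files)

print(resolve_static("/images/logo.png"))
print(resolve_static("/index.html"))
```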
33. Structure of a Python webapp
from google.appengine.ext import webapp
from google.appengine.ext.webapp.util import run_wsgi_app

class MainPage(webapp.RequestHandler):
    def get(self):
        self.response.headers['Content-Type'] = 'text/plain'
        self.response.out.write('Hello, webapp World!')

application = webapp.WSGIApplication([('/', MainPage)],
                                     debug=True)

def main():
    run_wsgi_app(application)

if __name__ == "__main__":
    main()
34. Setting up your dev environment with GAE
• The dev environment is really, really nice
• Download the (open source) SDK
– http://code.google.com/appengine/downloads.html
– Google_App_Engine_SDK_for_Python
• a full simulation of the App Engine environment
• dev_appserver.py myapp for a local
webserver
• appcfg.py update myapp to deploy to the
cloud
• You get a GUI on OS X and Windows.
35. App Engine Architecture
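[Diagram: App Engine architecture. Requests and responses pass through the app process, which runs in a Python VM with the Python stdlib and a read-only filesystem. Stateless APIs: urlfetch, mail, images. Stateful APIs: datastore, memcache.]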
36. The Datastore
• Based on “BigTable”
• Schemaless
• Scales infinitely
• NoSQL, with SQL type queries (GQL)
– No joins (they do have “reference fields”)
– No aggregate queries - not even count()!
– Hierarchy affects sharding and transactions
– All queries must run against an existing index
• Maybe the best part of App Engine!
37. Hierarchical Datastore
• Entities have a Kind, a Key, and Properties
– Entity -> Record -> Python dict -> Python class instance
– Key -> structured foreign key; includes Kind
– Kind -> Table -> Python class
– Property -> Column or Field; has a type
• Dynamically typed: Property types are recorded per Entity
• Key has either id or name
– the id is auto-assigned; alternatively, the name is set by app
– A key can be a path including the parent key, and so on
• Paths define entity groups which limit transactions
– A transaction locks the root entity (parentless ancestor key)
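The idea of keys as paths can be sketched with plain tuples. This is a stand-in model, not the datastore's Key class: make_key, entity_group_root and the "Checkin" kind are all hypothetical.

```python
# Hypothetical stand-in: a key is a tuple of (kind, id_or_name) pairs,
# ordered from the root ancestor down to the entity itself.
def make_key(kind, id_or_name, parent=None):
    return (parent or ()) + ((kind, id_or_name),)

def entity_group_root(key):
    """The root of a key's entity group is its first (parentless) element."""
    return key[:1]

user_key = make_key("FoursquareUser", "juan")            # name set by the app
checkin_key = make_key("Checkin", 42, parent=user_key)   # numeric id
other_key = make_key("FoursquareUser", "ana")

same_group = entity_group_root(checkin_key) == entity_group_root(user_key)
different_group = entity_group_root(other_key) != entity_group_root(user_key)
print(checkin_key)
```

Two entities can share a transaction only when their keys lead back to the same root, which is why the choice of parent matters at write time.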
38. Creating an Entity in the Datastore
from google.appengine.ext import db
...
class FoursquareUser(db.Model):
    """Creation of an entity of the kind 'FoursquareUser'"""
    created = db.DateTimeProperty(auto_now_add=True)
    name = db.TextProperty()
    email = db.TextProperty()
    description = db.StringProperty(multiline=True)
39. GQL
• GQL is a SQL-like language for retrieving entities or keys
from the datastore.
• GQL's features are different from a query language for a
traditional relational database.
• GQL syntax is (very) similar to that of SQL
• SELECT [* | __key__] FROM <kind>
    [WHERE <condition> [AND <condition> ...]]
    [ORDER BY <property> [ASC | DESC] [, <property> [ASC | DESC] ...]]
    [LIMIT [<offset>,]<count>]
    [OFFSET <offset>]
  <condition> := <property> {< | <= | > | >= | = | != } <value>
  <condition> := <property> IN <list>
  <condition> := ANCESTOR IS <entity or key>
40. Querying my FoursquareUser Entity
from google.appengine.ext import db
...
# I can query the datastore using the Query API
query = db.Query(FoursquareUser)
# Or the methods inherited from db.Model
model_query = FoursquareUser.all()
# Or using GQL
GQL_query = db.GqlQuery("SELECT * FROM FoursquareUser")
# I can print the results to the HTTP response object
for user in query:
    # Model properties are read as attributes, not dict keys
    self.response.out.write("User name: %s" % user.name)
    self.response.out.write("User email: %s" % user.email)
41. Creating the Views
• HTML embedded in code is messy and
difficult to maintain.
• It's better to use a templating system.
– HTML is kept in separate files, with special syntax to indicate
where the data from the application appears in the view.
• There are many templating systems for Python: EZT,
Cheetah, ClearSilver, Quixote, and Django are just a
few.
• You can use your template engine of choice by bundling
it with your application code.
• Or you can use Django out of the box!
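The separation of HTML from code can be illustrated with the standard library's string.Template. This is a stand-in chosen to keep the example self-contained, not Django's template engine, and the page text and render function are made up.

```python
from string import Template

# In a real app this text would live in its own .html file.
page_template = Template(
    "<html><body><h1>Hello, $name!</h1>"
    "<p>You have $count new messages.</p></body></html>"
)

def render(name, count):
    """Fill the placeholders; the handler code contains no HTML."""
    return page_template.substitute(name=name, count=count)

html = render("Juan", 3)
print(html)
```

string.Template does no HTML escaping, which is one reason a full templating system is the better choice for user-supplied data.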
43. Why use Django over webapp?
• Django templating is one of the best in its
class
• Django has easy cookies and custom 500
errors
• Django is less verbose
• Django middleware is really handy
• URL patterns
– mapping between URL patterns (RegEx) to callback
functions (views).
– URLconf
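The mapping from URL patterns to view callbacks can be sketched in a few lines of plain Python. This imitates the idea behind a URLconf and is not Django's implementation; the routes, views and dispatch function are hypothetical.

```python
import re

def home(request):
    return "home page"

def user_detail(request, user_id):
    return "user %s" % user_id

# Hypothetical routes: (regular expression, view function) pairs.
url_patterns = [
    (re.compile(r"^$"), home),
    (re.compile(r"^users/(\d+)/$"), user_detail),
]

def dispatch(path, request=None):
    """Call the first view whose pattern matches; None means no match."""
    for pattern, view in url_patterns:
        match = pattern.match(path)
        if match:
            # Captured groups become positional arguments to the view.
            return view(request, *match.groups())
    return None

print(dispatch("users/7/"))
```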
44. Using templates with GAE
from django.conf.urls.defaults import *
from django.http import HttpResponse

def hello(request):
    return HttpResponse("Hello, World!")

urlpatterns = patterns('', (r'^$', hello),)
# You have to write even less code!
46. Automatic Scaling to Application Needs
• You don’t need to configure your resource needs
• One CPU can handle many requests per second
• Apps are hashed (really mapped) onto CPUs:
– One process per app, many apps per CPU
– Creating a new process is a matter of cloning a generic “model”
process and then loading the application code (in fact the
clones are pre-created and sit in a queue)
– The process hangs around to handle more requests (reuse)
– Eventually old processes are killed (recycle)
• Busy apps (many QPS) get assigned to multiple CPUs
– This automatically adapts to the need
• as long as CPUs are available
47. Security
• Prevent the bad guys from breaking (into) your app
• Constrain direct OS functionality
– no processes, threads, dynamic library loading (use Task Queue)
– no sockets (use urlfetch API)
– can’t write files (use datastore)
– disallow unsafe Python extensions (e.g. ctypes)
• Limit resource usage
– Limit 10,000 files per app, adding up to 32 MB
– Hard time limit of 60 seconds per request
– Daily Max of 6.50 CPU hours (CPU cycles on 1.2 GHz Intel x86)
– Hard limit of 32 MB on request and response size, API call size,
etc.
– Quota system for number of requests, API calls, emails sent, etc
48. Preserving Fairness Through Quotas
• Everything an app does is limited by quotas.
• If you run out of quota that particular operation is
blocked
• Free quotas are tuned so that a well-written app (light
CPU/datastore use) can survive a moderate
“slashdotting”
• The point of quotas is to be able to support a very large
number of small apps.
• Large apps need raised quotas ($$$)
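A rough model of how a quota blocks one operation once it runs out. This is a toy sketch, not App Engine's quota system: the Quota class, OverQuotaError and the limit of 2 are all assumptions made for illustration.

```python
class OverQuotaError(Exception):
    """Hypothetical error raised when an operation's quota is exhausted."""

class Quota(object):
    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    def consume(self, amount=1):
        """Record usage, or raise if the operation would exceed the limit."""
        if self.used + amount > self.limit:
            raise OverQuotaError("quota exhausted")
        self.used += amount

    def replenish(self):
        """Reset usage, as a periodic refill would."""
        self.used = 0

emails = Quota(limit=2)
emails.consume()
emails.consume()
try:
    emails.consume()      # third call exceeds the limit
    blocked = False
except OverQuotaError:
    blocked = True

emails.replenish()
emails.consume()          # allowed again after the refill
print(blocked)
```

Because each operation has its own counter, running out of one quota blocks only that operation and leaves the rest of the app running.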
52. Using the Google Data Service API
import gdata.docs.service

# Create a client object which will make HTTP requests to the
# Google Docs server.
client = gdata.docs.service.DocsService()

# Authenticate using your Google Docs email address and
# password.
client.ClientLogin('joe@gmail.com', 'password')

# Query the server for an Atom feed containing a list of your
# documents.
documents_feed = client.GetDocumentListFeed()

# Loop through the feed and extract each document entry.
for document_entry in documents_feed.entry:
    # Display the title of the document on the command line.
    print document_entry.title.text
54. What about vendor Lock-in?
• Use Django-nonrel
– independent branch of Django that adds NoSQL
database support to the ORM
http://www.allbuttonspressed.com/projects/django-nonrel
– No JOINs! :-(
• Port existing Django projects with very few
changes!
• Switch from Bigtable to other NoSQL DBs
– MongoDB, SimpleDB. (soon Cassandra, CouchDB)
• Move freely between Django hosting providers
– ep.io, gondor.io, Heroku or your own VPS
55. Where to go from here?
• Learn Python!
• Download the App Engine SDK
• Build Apps...Many Apps
• Learn Django! (it's awesome)
• Join a local Python User Group
– http://www.pyowa.org or @pyowa on Twitter.
– http://pythonkc.com/ or @pythonkc on Twitter.
56. Resources
• Read the Articles on the Google App Engine site.
– http://code.google.com/appengine/docs/python/gettingstarted/
– http://code.google.com/appengine/docs/python/overview.html
– http://appengine-cookbook.appspot.com/
• Keep up with new releases through the Blog
– http://googleappengine.blogspot.com/
• Read the source code from the SDK
– http://code.google.com/p/googleappengine/
– Maybe you will find an undocumented API.
57. Things to read through:
Programming Google App
Engine. (Python & Java)
http://amzn.to/ovzX4q
MAC APPLICATION FOR UPLOADING APPS
COMMAND-LINE UTILITY FOR WINDOWS, LINUX, AND MAC
THAT’S IT, THIS APP WOULD BE SERVING LIVE FROM GOOGLE DATACENTERS
ABLE TO SCALE TO MILLIONS OF USERS
MEMCACHE
SIMPLE COHERENT CACHING LAYER
SPEEDS UP RESPONSE TIMES FOR COMMON PAGES + QUERIES
A webapp application consists of three parts:
• one or more RequestHandler classes (described in Request Handlers)
• a WSGIApplication object that maps URLs to RequestHandler classes
• a main routine that runs the WSGIApplication using a CGI adaptor
for example:
request count, bandwidth used, CPU usage, datastore call count, disk space used, emails sent, even errors!
(raising an exception) for a while (~10 min) until replenished