Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
From Spark Plug to Drive Train:
Life of an App Engine Request
Patrick Chanezon @chanezon
Ignacio Blanco @blanconet
11/16/0...
Designing for Scale and Reliability
Agenda

 Designing for Scale and Reliability

 App Engine: Design Motivations

 Life of a Request
    Request for static c...
Google App Engine
LiveJournal circa 2007
From Brad Fitzpatrick's USENIX '07 talk:
"LiveJournal: Behind the Scenes"
LiveJournal circa 2007
 From Brad Fitzpatrick's USENIX '07 talk:
 "LiveJournal: Behind the Scenes"        Memcache
       ...
Basic LAMP

Linux, Apache, MySQL,
Programming Language
Scalable?
Shared machine for database
and webserver
Reliable?
Singl...
Dedicated Database

Database running on a
separate server
Requirements
Another machine plus
additional management
Scalable...
Multiple Web Servers




Benefits:
Grow traffic beyond the capacity of one
webserver
Requirements:
More machines
Set up lo...
Multiple Web Servers
Load Balancing: DNS Round Robin




Register list of IPs with DNS
Statistical load balancing
DNS reco...
Multiple Web Servers
Load Balancing: DNS Round Robin




Register list of IPs with DNS
Statistical load balancing
DNS reco...
Multiple Web Servers
  Load Balancing: DNS Round Robin




    Scalable?
Add more webservers as necessary
Still I/O bound ...
Reverse Proxy




  Benefits:
Custom Routing
      Specialization
      Application-level load balancing
  Requirements:
M...
Reverse Proxy
  Scalable?
Add more web servers
Specialization
Bound by
      Routing capacity of reverse proxy
      One d...
Master-Slave Database




    Benefits:
Better read throughput
Invisible to application
    Requirements:
Even more machin...
Master-Slave Database

   Scalable?
Scales read rate with # of servers
      But not writes




  What happens eventually?
Master-Slave Database

  Reliable?
Master is SPOF for writes
Master may die before replication
Partitioned Database




Benefits:               Requirements:
Increase in both read   Even more machines
and write throug...
The App Engine Stack
App Engine:
Design Motivations
Design Motivations


  Build on Existing Google Technology

  Provide an Integrated Environment

  Encourage Small Per-Req...
Life of a Request:
Request for Static Content
Request for Static Content on Google Network




 Routed to the nearest Google datacenter
 Travels over Google's network
 ...
Request for Static Content
 Routing at the Front End




   Google App Engine Front Ends
Load balancing
Routing

   Fronte...
Request for Static Content
 Static Content Servers




   Google Static Content Serving
Built on shared Google Infrastruct...
Request for Static Content
What content is static?

Java Runtime: appengine-web.xml
...
<static>
  <include path="/**.png"...
Request For Static Content
Response to the user




  Back to the Front End and out to the user

  Specialized infrastruct...
Life of a Request:
Request for Dynamic Content
Request for Dynamic Content: New Components
 App Servers and App Master



   App Servers
Serve dynamic requests
Where you...
Request for Dynamic Content: Appservers
What do they do?




 Many applications
 Many concurrent requests
     Smaller app...
Request For Dynamic Content
Routing at the Frontend




 Front Ends route dynamic
 requests to App Servers
Request for Dynamic Content
 App Server




1. Checks for cached runtime
      If it exists, no initialization
2. Execute ...
Life of a Request:
Requests accessing APIs
API Requests
App Server

1.   App issues API call
2.   App Server accepts
3.   App Server blocks runtime
4.   App Server i...
APIs
Memcacheg
A more persistent in-memory cache




 Distributed in-memory cache
 memcacheg
     Also written by Brad Fitzpatr...
The App Engine Datastore
Persistent storage
Bigtable in one slide
Scalable structured storage




          Is not...                  Is...

   database             ...
Datastore Under the Covers




  Scalability does not come for free
  Datastore built on top of Bigtable
  Queries
  Index...
Bigtable


                        Sort



           Machine #1




           Machine #2




           Machine #3
Bigtable
Operations

Read              Write                   Delete




             Single row transaction (atomic)



...
Bigtable
Operations




   Scan: prefix b   Scan: range b to d
Entities table




    Primary table
    Every entity in every app
    Generic, schemaless
    Row name -> entity key
    ...
Entity keys
Based on parent entities: root entity, then child, child, ...


   class Grandparent(db.Model): pass
   class ...
Queries
GQL < SQL

     Restrict by kind
     Filter by property values
     Sort on properties
     Ancestor

      SELEC...
Indexes

     Dedicated Bigtable tables
     Map property values to entities
     Each row = index data + entity key


Con...
Query planning
Convert query to dense index scan


      1. Pick index
      2. Compute prefix or range
      3. Scan!
Kind index

    SELECT * FROM Grandparent

  1. Kind index
  2. Prefix = Grandparent
  3. Scan!

           Row name      ...
Single-property index
Queries on single property (one ASC one DESC)

     WHERE name = 'John'
     ORDER BY name DESC
    ...
Single-property index
Queries on single property (one ASC one DESC)



                  Row name                      Key
Single-property index

     WHERE name = 'John'
  1. Prefix = Parent name John (fully specified)
  2. Scan!
Single-property index

     ORDER BY name DESC
  1. Prefix = Parent name
  2. Scan!
Single-property index

     WHERE name >= 'B' and name < 'C' ORDER BY name
  1. RANGE = [Parent name B, Parent name C)
  2...
The App Engine Datastore
Persistent storage




   Your data is already partitioned
      Use Entity Groups
   Explicit In...
Other APIs


             GMail       Gadget API




             Google      Tasks Queue
             Accounts



       ...
App Engine:
Design Motivations, Recap
Build on Existing Google Technology




               creative commons licensed photograph: http://www.flickr.com/photos/...
Provide an Integrated Environment

    Why?
Manage all apps together
    What it means for you:
Follow best practices
Some...
Encourage Small Per-Request Footprints

   Why?
Better utilization of App Servers
Fairness
   What it means for your app:
...
Encourage Fast Requests

   Why?
Better utilization of appservers
Fairness between applications
Routing and scheduling agi...
Maintain Isolation Between Apps

   Why?
Safety
Predictability
   What it means for your app:
Certain system calls unavail...
Encourage Statelessness and Specialization

   Why?
App Server performance
Load balancing
Fault tolerance
   What this mea...
Require Partitioned Data Model

   Why?
The Datastore
   What this means for your app:
Data model + Indexes
Reads are fast...
Google App Engine:
The Numbers
Google App Engine

 Currently, over 100K applications

 Serving over 250M pageviews per day

 Written by over 200K develop...
App Engine
Open For Questions
 The White House's "Open For Questions" application accepted
 100K questions and 3.6M votes in under 48...
18 months in review
Apr 2008   Python launch
May 2008   Memcache, Images API
Jul 2008   Logs export
Aug 2008   Batch write...
Roadmap

Storing/serving large files
Mapping operations across data
sets
Datastore cursors
Notification alerts for applica...
Wrap up
Always free to get started
~5M pageviews/month
  6.5 CPU hrs/day
  1 GB storage
  650K URL Fetch calls
  2,000 recipients ...
Purchase additional resources *




* free monthly quota of ~5 million page views still in full
effect
Thank you

Read more
    http://code.google.com/appengine/
Questions?
Post your questions for this talk on Twitter #devfest09 #appengine
Thanks
 To Alon Levi, Fred Sauer for most of these slides
Google Devfest 2009 Argentina - Intro to Appengine
Upcoming SlideShare
Loading in …5
×

Google Devfest 2009 Argentina - Intro to Appengine

3,072 views

Published on

Google Devfest 2009 Argentina - Intro to Appengine

Published in: Technology
  • Be the first to comment

Google Devfest 2009 Argentina - Intro to Appengine

  1. 1. From Spark Plug to Drive Train: Life of an App Engine Request Patrick Chanezon @chanezon Ignacio Blanco @blanconet 11/16/09 Post your questions for this talk on Twitter #devfest09 #appengine
  2. 2. Designing for Scale and Reliability
  3. 3. Agenda Designing for Scale and Reliability App Engine: Design Motivations Life of a Request Request for static content Request for dynamic content Requests that use APIs App Engine: Design Motivations, Recap App Engine: The Numbers
  4. 4. Google App Engine
  5. 5. LiveJournal circa 2007 From Brad Fitzpatrick's USENIX '07 talk: "LiveJournal: Behind the Scenes"
  6. 6. LiveJournal circa 2007 From Brad Fitzpatrick's USENIX '07 talk: "LiveJournal: Behind the Scenes" Memcache Application Frontends Servers Storage Static File Servers
  7. 7. Basic LAMP Linux, Apache, MySQL, Programming Language Scalable? Shared machine for database and webserver Reliable? Single point of failure (SPOF)
  8. 8. Dedicated Database Database running on a separate server Requirements Another machine plus additional management Scalable? Up to one web server Reliable? Two single points of failure
  9. 9. Multiple Web Servers Benefits: Grow traffic beyond the capacity of one webserver Requirements: More machines Set up load balancing
  10. 10. Multiple Web Servers Load Balancing: DNS Round Robin Register list of IPs with DNS Statistical load balancing DNS record is cached with Time To Live (TTL) TTL may not be respected
  11. 11. Multiple Web Servers Load Balancing: DNS Round Robin Register list of IPs with DNS Statistical load balancing DNS record is cached with Time To Live (TTL) TTL may not be respected Now wait for DNS changes to propagate :-(
  12. 12. Multiple Web Servers Load Balancing: DNS Round Robin Scalable? Add more webservers as necessary Still I/O bound on one database Reliable? Cannot redirect traffic quickly Database still SPOF
  13. 13. Reverse Proxy Benefits: Custom Routing Specialization Application-level load balancing Requirements: More machines Configuration and code for reverse proxies
  14. 14. Reverse Proxy Scalable? Add more web servers Specialization Bound by Routing capacity of reverse proxy One database server Reliable? Agile application-level routing Specialized components are more robust Multiple reverse proxies requires network-level routing DNS Round Robin (again) Fancy network routing hardware Database is still SPOF
  15. 15. Master-Slave Database Benefits: Better read throughput Invisible to application Requirements: Even more machines Changes to MySQL
  16. 16. Master-Slave Database Scalable? Scales read rate with # of servers But not writes What happens eventually?
  17. 17. Master-Slave Database Reliable? Master is SPOF for writes Master may die before replication
  18. 18. Partitioned Database Benefits: Requirements: Increase in both read Even more machines and write throughput Lots of management Re-architect data model Rewrite queries
  19. 19. The App Engine Stack
  20. 20. App Engine: Design Motivations
  21. 21. Design Motivations Build on Existing Google Technology Provide an Integrated Environment Encourage Small Per-Request Footprints Encourage Fast Requests Maintain Isolation Between Applications Encourage Statelessness and Specialization Require Partitioned Data Model
  22. 22. Life of a Request: Request for Static Content
  23. 23. Request for Static Content on Google Network Routed to the nearest Google datacenter Travels over Google's network Same infrastructure other Google products use Lots of advantages for free
  24. 24. Request for Static Content Routing at the Front End Google App Engine Front Ends Load balancing Routing Frontends route static requests to specialized serving infrastructure
  25. 25. Request for Static Content Static Content Servers Google Static Content Serving Built on shared Google Infrastructure Static files are physically separate from code files How are static files defined?
  26. 26. Request for Static Content What content is static? Java Runtime: appengine-web.xml ... <static> <include path="/**.png" /> <exclude path="/data/**.png /> </static> ... Python Runtime: app.yaml ... - url: /images static_dir: static/images OR - url: /images/(.*) static_files: static/images/1 upload: static/images/(.*) ...
  27. 27. Request For Static Content Response to the user Back to the Front End and out to the user Specialized infrastructure App runtimes don't serve static content
  28. 28. Life of a Request: Request for Dynamic Content
  29. 29. Request for Dynamic Content: New Components App Servers and App Master App Servers Serve dynamic requests Where your code runs App Master Schedules applications Informs Front Ends
  30. 30. Request for Dynamic Content: Appservers What do they do? Many applications Many concurrent requests Smaller app footprint + fast requests = more apps Enforce Isolation Keeps apps safe from each other Enforce statelessness Allows for scheduling flexibility Service API requests
  31. 31. Request For Dynamic Content Routing at the Frontend Front Ends route dynamic requests to App Servers
  32. 32. Request for Dynamic Content App Server 1. Checks for cached runtime If it exists, no initialization 2. Execute request 3. Cache the runtime System is designed to maximize caching Slow first request, faster subsequent requests Optimistically cache data in your runtime!
  33. 33. Life of a Request: Requests accessing APIs
  34. 34. API Requests App Server 1. App issues API call 2. App Server accepts 3. App Server blocks runtime 4. App Server issues call 5. Returns the response Use APIs to do things you don't want to do in your runtime, such as...
  35. 35. APIs
  36. 36. Memcacheg A more persistent in-memory cache Distributed in-memory cache memcacheg Also written by Brad Fitzpatrick adds: set_multi, get_multi, add_multi Optimistically cache for optimization Very stable, robust and specialized
  37. 37. The App Engine Datastore Persistent storage
  38. 38. Bigtable in one slide Scalable structured storage Is not... Is... database sharded, sorted array sharded database distributed hashtable
  39. 39. Datastore Under the Covers Scalability does not come for free Datastore built on top of Bigtable Queries Indexes
  40. 40. Bigtable Sort Machine #1 Machine #2 Machine #3
  41. 41. Bigtable Operations Read Write Delete Single row transaction (atomic) Operate
  42. 42. Bigtable Operations Scan: prefix b Scan: range b to d
  43. 43. Entities table Primary table Every entity in every app Generic, schemaless Row name -> entity key Column (only 1) serialized entity
  44. 44. Entity keys Based on parent entities: root entity, then child, child, ... class Grandparent(db.Model): pass class Parent(db.Model): pass class Child(db.Model): pass Felipe = Grandparent() Carlos = Parent(parent=Felipe) Ignacio = Child(parent=Carlos) my family /Grandparent:Felipe /Grandparent:Felipe/Parent:Carlos /Grandparent:Felipe/Parent:Carlos/Child:Ignacio
  45. 45. Queries GQL < SQL Restrict by kind Filter by property values Sort on properties Ancestor SELECT * FROM Person ... WHERE name = 'John'; ORDER BY name DESC; WHERE city = 'Sonoma' AND state = 'CA' AND country = 'USA'; WHERE ANCESTOR IS :Felipe ORDER BY name;
  46. 46. Indexes Dedicated Bigtable tables Map property values to entities Each row = index data + entity key Convert queries to scans No filtering in memory No sorting in memory
  47. 47. Query planning Convert query to dense index scan 1. Pick index 2. Compute prefix or range 3. Scan!
  48. 48. Kind index SELECT * FROM Grandparent 1. Kind index 2. Prefix = Grandparent 3. Scan! Row name Row name
  49. 49. Single-property index Queries on single property (one ASC one DESC) WHERE name = 'John' ORDER BY name DESC WHERE name >= 'B' AND name < 'C' ORDER BY name Row name Key
  50. 50. Single-property index Queries on single property (one ASC one DESC) Row name Key
  51. 51. Single-property index WHERE name = 'John' 1. Prefix = Parent name John (fully specified) 2. Scan!
  52. 52. Single-property index ORDER BY name DESC 1. Prefix = Parent name 2. Scan!
  53. 53. Single-property index WHERE name >= 'B' and name < 'C' ORDER BY name 1. RANGE = [Parent name B, Parent name C) 2. Scan!
  54. 54. The App Engine Datastore Persistent storage Your data is already partitioned Use Entity Groups Explicit Indexes make for fast reads But slower writes Replicated and fault tolerant On commit: ≥3 machines Geographically distributed Bonus: Keep globally unique IDs for free
  55. 55. Other APIs GMail Gadget API Google Tasks Queue Accounts Picasaweb Google Talk
  56. 56. App Engine: Design Motivations, Recap
  57. 57. Build on Existing Google Technology creative commons licensed photograph: http://www.flickr.com/photos/cote/54408562/
  58. 58. Provide an Integrated Environment Why? Manage all apps together What it means for you: Follow best practices Some restrictions Use our tools Benefits: Use our tools Admin Console All of your logs in one place No machines to configure or manage Easy deployment
  59. 59. Encourage Small Per-Request Footprints Why? Better utilization of App Servers Fairness What it means for your app: Less Memory Usage Limited CPU Benefits: Better use of resources
  60. 60. Encourage Fast Requests Why? Better utilization of appservers Fairness between applications Routing and scheduling agility What it means for your app: Runtime caching Request deadlines Benefits: Optimistically share state between requests Better throughput Fault tolerance Better use of resources
  61. 61. Maintain Isolation Between Apps Why? Safety Predictability What it means for your app: Certain system calls unavailable Benefits: Security Performance
  62. 62. Encourage Statelessness and Specialization Why? App Server performance Load balancing Fault tolerance What this means for you app: Use API calls Benefits: Automatic load balancing Fault tolerance Less code for you to write Better use of resources
  63. 63. Require Partitioned Data Model Why? The Datastore What this means for your app: Data model + Indexes Reads are fast, writes are slower Benefits: Design your schema once No need to re-architect for scalability More efficient use of cpu and memory
  64. 64. Google App Engine: The Numbers
  65. 65. Google App Engine Currently, over 100K applications Serving over 250M pageviews per day Written by over 200K developers
  66. 66. App Engine
  67. 67. Open For Questions The White House's "Open For Questions" application accepted 100K questions and 3.6M votes in under 48 hours
  68. 68. 18 months in review Apr 2008 Python launch May 2008 Memcache, Images API Jul 2008 Logs export Aug 2008 Batch write/delete Oct 2008 HTTPS support Dec 2008 Status dashboard, quota details Feb 2009 Billing, larger files Apr 2009 Java launch, DB import, cron support, SDC May 2009 Key-only queries Jun 2009 Task queues Aug 2009 Kindless queries Sep 2009 XMPP Oct 2009 Incoming email
  69. 69. Roadmap Storing/serving large files Mapping operations across data sets Datastore cursors Notification alerts for application exceptions Datastore dump and restore facility
  70. 70. Wrap up
  71. 71. Always free to get started ~5M pageviews/month 6.5 CPU hrs/day 1 GB storage 650K URL Fetch calls 2,000 recipients emailed 1 GB/day bandwidth N tasks
  72. 72. Purchase additional resources * * free monthly quota of ~5 million page views still in full effect
  73. 73. Thank you Read more http://code.google.com/appengine/
  74. 74. Questions? Post your questions for this talk on Twitter #devfest09 #appengine
  75. 75. Thanks To Alon Levi, Fred Sauer for most of these slides

×