Google Devfest 2009 Argentina - Intro to Appengine


Published on

Google Devfest 2009 Argentina - Intro to Appengine

Published in: Technology
  • Be the first to comment

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

Google Devfest 2009 Argentina - Intro to Appengine

  1. 1. From Spark Plug to Drive Train: Life of an App Engine Request Patrick Chanezon @chanezon Ignacio Blanco @blanconet 11/16/09 Post your questions for this talk on Twitter #devfest09 #appengine
  2. 2. Designing for Scale and Reliability
  3. 3. Agenda Designing for Scale and Reliability App Engine: Design Motivations Life of a Request Request for static content Request for dynamic content Requests that use APIs App Engine: Design Motivations, Recap App Engine: The Numbers
  4. 4. Google App Engine
  5. 5. LiveJournal circa 2007 From Brad Fitzpatrick's USENIX '07 talk: "LiveJournal: Behind the Scenes"
  6. 6. LiveJournal circa 2007 From Brad Fitzpatrick's USENIX '07 talk: "LiveJournal: Behind the Scenes" Memcache Application Frontends Servers Storage Static File Servers
  7. 7. Basic LAMP Linux, Apache, MySQL, Programming Language Scalable? Shared machine for database and webserver Reliable? Single point of failure (SPOF)
  8. 8. Dedicated Database Database running on a separate server Requirements Another machine plus additional management Scalable? Up to one web server Reliable? Two single points of failure
  9. 9. Multiple Web Servers Benefits: Grow traffic beyond the capacity of one webserver Requirements: More machines Set up load balancing
  10. 10. Multiple Web Servers Load Balancing: DNS Round Robin Register list of IPs with DNS Statistical load balancing DNS record is cached with Time To Live (TTL) TTL may not be respected
  11. 11. Multiple Web Servers Load Balancing: DNS Round Robin Register list of IPs with DNS Statistical load balancing DNS record is cached with Time To Live (TTL) TTL may not be respected Now wait for DNS changes to propagate :-(
  12. 12. Multiple Web Servers Load Balancing: DNS Round Robin Scalable? Add more webservers as necessary Still I/O bound on one database Reliable? Cannot redirect traffic quickly Database still SPOF
  13. 13. Reverse Proxy Benefits: Custom Routing Specialization Application-level load balancing Requirements: More machines Configuration and code for reverse proxies
  14. 14. Reverse Proxy Scalable? Add more web servers Specialization Bound by Routing capacity of reverse proxy One database server Reliable? Agile application-level routing Specialized components are more robust Multiple reverse proxies requires network-level routing DNS Round Robin (again) Fancy network routing hardware Database is still SPOF
  15. 15. Master-Slave Database Benefits: Better read throughput Invisible to application Requirements: Even more machines Changes to MySQL
  16. 16. Master-Slave Database Scalable? Scales read rate with # of servers But not writes What happens eventually?
  17. 17. Master-Slave Database Reliable? Master is SPOF for writes Master may die before replication
  18. 18. Partitioned Database Benefits: Requirements: Increase in both read Even more machines and write throughput Lots of management Re-architect data model Rewrite queries
  19. 19. The App Engine Stack
  20. 20. App Engine: Design Motivations
  21. 21. Design Motivations Build on Existing Google Technology Provide an Integrated Environment Encourage Small Per-Request Footprints Encourage Fast Requests Maintain Isolation Between Applications Encourage Statelessness and Specialization Require Partitioned Data Model
  22. 22. Life of a Request: Request for Static Content
  23. 23. Request for Static Content on Google Network Routed to the nearest Google datacenter Travels over Google's network Same infrastructure other Google products use Lots of advantages for free
  24. 24. Request for Static Content Routing at the Front End Google App Engine Front Ends Load balancing Routing Frontends route static requests to specialized serving infrastructure
  25. 25. Request for Static Content Static Content Servers Google Static Content Serving Built on shared Google Infrastructure Static files are physically separate from code files How are static files defined?
  26. 26. Request for Static Content What content is static? Java Runtime: appengine-web.xml ... <static> <include path="/**.png" /> <exclude path="/data/**.png /> </static> ... Python Runtime: app.yaml ... - url: /images static_dir: static/images OR - url: /images/(.*) static_files: static/images/1 upload: static/images/(.*) ...
  27. 27. Request For Static Content Response to the user Back to the Front End and out to the user Specialized infrastructure App runtimes don't serve static content
  28. 28. Life of a Request: Request for Dynamic Content
  29. 29. Request for Dynamic Content: New Components App Servers and App Master App Servers Serve dynamic requests Where your code runs App Master Schedules applications Informs Front Ends
  30. 30. Request for Dynamic Content: Appservers What do they do? Many applications Many concurrent requests Smaller app footprint + fast requests = more apps Enforce Isolation Keeps apps safe from each other Enforce statelessness Allows for scheduling flexibility Service API requests
  31. 31. Request For Dynamic Content Routing at the Frontend Front Ends route dynamic requests to App Servers
  32. 32. Request for Dynamic Content App Server 1. Checks for cached runtime If it exists, no initialization 2. Execute request 3. Cache the runtime System is designed to maximize caching Slow first request, faster subsequent requests Optimistically cache data in your runtime!
  33. 33. Life of a Request: Requests accessing APIs
  34. 34. API Requests App Server 1. App issues API call 2. App Server accepts 3. App Server blocks runtime 4. App Server issues call 5. Returns the response Use APIs to do things you don't want to do in your runtime, such as...
  35. 35. APIs
  36. 36. Memcacheg A more persistent in-memory cache Distributed in-memory cache memcacheg Also written by Brad Fitzpatrick adds: set_multi, get_multi, add_multi Optimistically cache for optimization Very stable, robust and specialized
  37. 37. The App Engine Datastore Persistent storage
  38. 38. Bigtable in one slide Scalable structured storage Is not... Is... database sharded, sorted array sharded database distributed hashtable
  39. 39. Datastore Under the Covers Scalability does not come for free Datastore built on top of Bigtable Queries Indexes
  40. 40. Bigtable Sort Machine #1 Machine #2 Machine #3
  41. 41. Bigtable Operations Read Write Delete Single row transaction (atomic) Operate
  42. 42. Bigtable Operations Scan: prefix b Scan: range b to d
  43. 43. Entities table Primary table Every entity in every app Generic, schemaless Row name -> entity key Column (only 1) serialized entity
  44. 44. Entity keys Based on parent entities: root entity, then child, child, ... class Grandparent(db.Model): pass class Parent(db.Model): pass class Child(db.Model): pass Felipe = Grandparent() Carlos = Parent(parent=Felipe) Ignacio = Child(parent=Carlos) my family /Grandparent:Felipe /Grandparent:Felipe/Parent:Carlos /Grandparent:Felipe/Parent:Carlos/Child:Ignacio
  45. 45. Queries GQL < SQL Restrict by kind Filter by property values Sort on properties Ancestor SELECT * FROM Person ... WHERE name = 'John'; ORDER BY name DESC; WHERE city = 'Sonoma' AND state = 'CA' AND country = 'USA'; WHERE ANCESTOR IS :Felipe ORDER BY name;
  46. 46. Indexes Dedicated Bigtable tables Map property values to entities Each row = index data + entity key Convert queries to scans No filtering in memory No sorting in memory
  47. 47. Query planning Convert query to dense index scan 1. Pick index 2. Compute prefix or range 3. Scan!
  48. 48. Kind index SELECT * FROM Grandparent 1. Kind index 2. Prefix = Grandparent 3. Scan! Row name Row name
  49. 49. Single-property index Queries on single property (one ASC one DESC) WHERE name = 'John' ORDER BY name DESC WHERE name >= 'B' AND name < 'C' ORDER BY name Row name Key
  50. 50. Single-property index Queries on single property (one ASC one DESC) Row name Key
  51. 51. Single-property index WHERE name = 'John' 1. Prefix = Parent name John (fully specified) 2. Scan!
  52. 52. Single-property index ORDER BY name DESC 1. Prefix = Parent name 2. Scan!
  53. 53. Single-property index WHERE name >= 'B' and name < 'C' ORDER BY name 1. RANGE = [Parent name B, Parent name C) 2. Scan!
  54. 54. The App Engine Datastore Persistent storage Your data is already partitioned Use Entity Groups Explicit Indexes make for fast reads But slower writes Replicated and fault tolerant On commit: ≥3 machines Geographically distributed Bonus: Keep globally unique IDs for free
  55. 55. Other APIs GMail Gadget API Google Tasks Queue Accounts Picasaweb Google Talk
  56. 56. App Engine: Design Motivations, Recap
  57. 57. Build on Existing Google Technology creative commons licensed photograph:
  58. 58. Provide an Integrated Environment Why? Manage all apps together What it means for you: Follow best practices Some restrictions Use our tools Benefits: Use our tools Admin Console All of your logs in one place No machines to configure or manage Easy deployment
  59. 59. Encourage Small Per-Request Footprints Why? Better utilization of App Servers Fairness What it means for your app: Less Memory Usage Limited CPU Benefits: Better use of resources
  60. 60. Encourage Fast Requests Why? Better utilization of appservers Fairness between applications Routing and scheduling agility What it means for your app: Runtime caching Request deadlines Benefits: Optimistically share state between requests Better throughput Fault tolerance Better use of resources
  61. 61. Maintain Isolation Between Apps Why? Safety Predictability What it means for your app: Certain system calls unavailable Benefits: Security Performance
  62. 62. Encourage Statelessness and Specialization Why? App Server performance Load balancing Fault tolerance What this means for you app: Use API calls Benefits: Automatic load balancing Fault tolerance Less code for you to write Better use of resources
  63. 63. Require Partitioned Data Model Why? The Datastore What this means for your app: Data model + Indexes Reads are fast, writes are slower Benefits: Design your schema once No need to re-architect for scalability More efficient use of cpu and memory
  64. 64. Google App Engine: The Numbers
  65. 65. Google App Engine Currently, over 100K applications Serving over 250M pageviews per day Written by over 200K developers
  66. 66. App Engine
  67. 67. Open For Questions The White House's "Open For Questions" application accepted 100K questions and 3.6M votes in under 48 hours
  68. 68. 18 months in review Apr 2008 Python launch May 2008 Memcache, Images API Jul 2008 Logs export Aug 2008 Batch write/delete Oct 2008 HTTPS support Dec 2008 Status dashboard, quota details Feb 2009 Billing, larger files Apr 2009 Java launch, DB import, cron support, SDC May 2009 Key-only queries Jun 2009 Task queues Aug 2009 Kindless queries Sep 2009 XMPP Oct 2009 Incoming email
  69. 69. Roadmap Storing/serving large files Mapping operations across data sets Datastore cursors Notification alerts for application exceptions Datastore dump and restore facility
  70. 70. Wrap up
  71. 71. Always free to get started ~5M pageviews/month 6.5 CPU hrs/day 1 GB storage 650K URL Fetch calls 2,000 recipients emailed 1 GB/day bandwidth N tasks
  72. 72. Purchase additional resources * * free monthly quota of ~5 million page views still in full effect
  73. 73. Thank you Read more
  74. 74. Questions? Post your questions for this talk on Twitter #devfest09 #appengine
  75. 75. Thanks To Alon Levi, Fred Sauer for most of these slides