Scaling From Zero to 6 Million Mobile Users
KW Justin Leung, Banjo

November 15, 2013

© 2013 Amazon.com, Inc. and its aff...
Cloud Connected Devices On A Global Scale
Bryant Eastham, Panasonic
November 15, 2013

© 2013 Amazon.com, Inc. and its aff...
Cloud Connected Devices on a Global Scale (CPN303) | AWS re:Invent 2013


Increasingly, mobile and other connected devices are leveraging the scalability and capabilities of the cloud to deliver services to end users. However, connecting these devices to the cloud presents unique challenges. Resource constraints make it impossible to use many common frameworks and transport restrictions make it difficult to use dynamic cloud resources. In this session, learn how you can develop and deploy highly-scalable global solutions using Amazon Web Services (Amazon Virtual Private Cloud, Elastic IP addresses, Amazon Route 53, Auto Scaling) and tools like Puppet. Hear how Panasonic and Banjo architect their cloud infrastructure from both a start-up and enterprise perspective.


Transcript of "Cloud Connected Devices on a Global Scale (CPN303) | AWS re:Invent 2013"

  1. Scaling From Zero to 6 Million Mobile Users
     KW Justin Leung, Banjo
     November 15, 2013
     © 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
  2. Banjo
     • Real-time location meets social data
     • An engineering-focused company
     • Events recommendation, alert, & discovery
     • Top Developer and Editor’s Choice in Google Play
     • Named Top 10 World Innovator in Local - Fast Company
  3. Growth Factors
     • Grew from 0 to 5+ Million in 2 years
     • Indexed over 700 Million profiles
     • Processing Billions of location-based social posts
     • Geospatial indexing for 500K+ posts per hour
     • Categorized 1000’s of event types
     • Over 50 Million background jobs processed daily
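The daily and hourly figures on the Growth Factors slide can be turned into rough per-second rates. This is a worked conversion only, assuming load is spread evenly over the day; real traffic is burstier, so peak rates would be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_HOUR = 60 * 60


def per_second(count: int, window_seconds: int) -> float:
    """Average rate per second for `count` events in a window."""
    return count / window_seconds


# Figures taken from the slide; the even-load assumption is ours.
jobs_per_second = per_second(50_000_000, SECONDS_PER_DAY)
posts_per_second = per_second(500_000, SECONDS_PER_HOUR)

print(f"{jobs_per_second:.0f} background jobs/s on average")
print(f"{posts_per_second:.0f} geospatially indexed posts/s on average")
```

That is roughly 579 background jobs and 139 indexed posts per second on average.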
  4. The Stack
     • Amazon EC2 / Elastic Load Balancing / Amazon S3 / Elastic Beanstalk / Heroku
     • Ruby on Rails
     • MongoDB
     • Redis
     • Memcached
     • Sidekiq
     • NewRelic / PagerDuty / Papertrail / Graphite
  5. First 9 Months, from 0 to 1 Million
     • Amazon EC2 deployment with Rubber
     • No background jobs, frontend instances only
     • Hosted MongoDB clusters
     • 0 DevOp
  6. Challenges @ 1M Users
     • Limited engineering resources
     • Not too agile with Rubber
     • Outgrew hosted MongoDB limit
     • No DevOp
  7. Growing to 2M+ Users
     • Migrated from EC2 instances to Heroku
     • Delayed jobs: GirlFriday -> Qu -> Sidekiq
     • In-house MongoDB clusters on EC2
     • Social graph increased to 300M+ profiles
     • 1 DBA / DevOp
  8. Challenges @ 2M Users
     • Explosion of social graph size
     • Cost to process background jobs
     • Latency to poll external social feeds
  9. Banjo @ 4M+ Users
     • 100x Heroku workers
     • Social graph increased to 400M+ profiles
     • Indexed one month of global location-based posts
     • 10 Millions of background jobs processed daily
     • Still 1 DBA / DevOp
  10. Challenges @ 4M Users
      • Heroku Dynos limited to 512MB of memory, slow CPU
      • Heroku routing latency becomes obvious
      • Bloated codebase, limited forking for concurrency
      • Power users with large social graphs churn data
  11. Now, 6 Million Users
      • Social graph increased to 700M profiles
      • Heroku -> Elastic Beanstalk
      • Service-oriented architecture
      • Unicorn -> Elastic Beanstalk with Nginx + Passenger
      • 50 Million background jobs processed daily
      • Hundreds of EC2 instances
      • And... still 1 DBA/Dev-Op
  12. Heroku PROS / CONS
      • The Pros:
        – Brainless deploy / rollback flow
        – Instant availability of dynos and workers
        – Zero setup & maintenance cost
        – No Dev-Op need
      • The Cons:
        – Limited memory & CPU make it hard for concurrency
        – Routing layer latency
        – No built-in auto-scaling, limited available zones (US/EU)
        – Not enough control, limited access when there are platform issues
  13. Elastic Beanstalk - PROS / CONS
      • The Pros:
        – Choice of instance types, Availability Zones
        – Increased concurrency with Passenger / Nginx, support for auto-scaling
        – Low latency with Amazon Route 53 & Elastic Load Balancing
        – Cost efficient
      • The Cons:
        – Initial setup cost for Beanstalk containers and environments
        – Slow container updates - currently Ruby 1.9 / Passenger 3.0.17 + Nginx 1.23
        – Time to spin up new instances for seamless deploys
        – There’s some learning curve to Elastic Beanstalk scripts
  14. Managed EC2 Instances
      • MongoDB instances (DBA)
      • Elastic Beanstalk managed environments (Eng)
      • Heroku managed services (Eng)
      • Elastic Beanstalk + Heroku can easily be managed by a small, agile engineering team
  15. Recommendation for startups:
      • Start prototyping on small scale PaaS services
      • Add-ons are really helpful: Papertrail, NewRelic, Hosted MongoDB/Redis/Memcached/Metrics
      • Pager alerts with ScoutApp, Pingdom, PagerDuty
      • Make use of health & metrics dashboards
      • Deploy frequently & scale up along the way
  16. Q&A
      justin@teambanjo.com
      @girak
      http://ban.jo
  18. Cloud Connected Devices On A Global Scale
      Bryant Eastham, Panasonic
      November 15, 2013
      © 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
  19. Two roads diverged in a wood, and I—
      I took the one less traveled by …
      Robert Frost
  20. Understanding “Small” And “Cloud”
      • What is “small”?
        – Production scale that required minimal cost
        – Devices that Moore forgot
        – Speed in MHz, memory in KB, solution designed around resources
      • What is “cloud”?
        – HTML/XML for transport
        – SSL for security
        – Solutions typically don’t consider resource-constrained devices
  21. Implications Of Small Devices
      • Support for whitelisting
        – Yes, it is still done
        – No, “open up all outbound traffic” is not acceptable
      • One-stop connectivity
        – Not every protocol uses TCP
        – UDP is still great for some things, and required for others (NTP)
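NTP is the slide's example of a protocol that requires UDP. The sketch below shows how small such an exchange is: it builds a minimal SNTP client request and parses the transmit timestamp out of a reply. It does no network I/O so it runs offline; a real device would send the 48 bytes to UDP port 123 of a whitelisted server. The function names and the fabricated reply are illustrative, not from the talk.

```python
import struct

NTP_PACKET_SIZE = 48
# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_UNIX_EPOCH_DELTA = 2_208_988_800


def build_sntp_request(version: int = 4) -> bytes:
    """Minimal client request: leap indicator 0, given version, mode 3 (client)."""
    first_byte = (0 << 6) | (version << 3) | 3
    return bytes([first_byte]) + bytes(NTP_PACKET_SIZE - 1)


def parse_transmit_time(packet: bytes) -> float:
    """Return the server transmit timestamp (bytes 40..47) as Unix seconds."""
    if len(packet) < NTP_PACKET_SIZE:
        raise ValueError("short NTP packet")
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_UNIX_EPOCH_DELTA + fraction / 2**32


request = build_sntp_request()

# Fabricated reply for illustration: only the transmit timestamp is filled in.
fake_reply = bytes(40) + struct.pack(
    "!II", 1_384_473_600 + NTP_UNIX_EPOCH_DELTA, 2**31
)
print(len(request), parse_transmit_time(fake_reply))
```

A fixed 48-byte datagram in each direction is a good fit for devices with memory measured in KB, which is one reason dropping UDP support is not an option.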
  22. Cloud System Requirements
      • Support whitelisting (fixed IP address set)
      • Support UDP as well as TCP
      • Support Auto Scaling and Elastic Load Balancing
      • Off-instance logging and monitoring
      • AWS gets us 90% there – the last 10% is our focus today
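The whitelisting requirement means the network in front of a device only permits traffic to a fixed, pre-agreed set of addresses, which is why the cloud side cannot hand out arbitrary dynamic IPs. A minimal sketch of that check, assuming a hypothetical whitelist built from documentation-only addresses (RFC 5737):

```python
import ipaddress

# Hypothetical whitelist; these are documentation-range addresses, not real EIPs.
WHITELIST = frozenset(
    ipaddress.ip_address(a)
    for a in ("203.0.113.10", "203.0.113.11", "198.51.100.7")
)


def is_allowed(destination: str) -> bool:
    """True if outbound traffic to `destination` would pass the whitelist."""
    return ipaddress.ip_address(destination) in WHITELIST


print(is_allowed("203.0.113.10"))  # address in the fixed set
print(is_allowed("192.0.2.1"))     # any other address is dropped
```

Because the set is fixed on the customer side, every address in it has to stay reachable no matter how many instances are running.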
  23. Connectivity Using Elastic IP Addresses, With A Configuration Detour
  24. Meeting The Requirements
      (Diagram label: IP1 .. IPn)
      • Reuse when possible, invent when necessary
      • Reuse = Amazon Web Services:
        – Amazon VPC
        – EIP addresses
        – Amazon Route 53
  25. Application-managed IP Addresses
      • “Standard” EIP are not enough
        – All addresses must be active all the time
        – Addresses must move to adapt to scale changes
        – Support multiple addresses per instance for low-scale periods
      • Application-managed IP addresses fill the gaps
        – All addresses can be active (assigned to an instance)
        – API control of EIP assignment provides migration during scaling, and this can be done “cleanly” by the application
        – However, only VPC instances allow multiple EIP assignments
  26. Chickens And Eggs
      • Managed IP addresses require multiple EIPs and configuration
      • VPC is required to allow multiple EIP management
      • Configuration requires Puppet and AWS access
      • Puppet and AWS require access to the network (from the VPC)
      • Network access requires instance configuration and a VPC bridge
      • Instance configuration is part of application configuration (managed IP address information)
      • Rinse, lather, repeat…
  27. Breaking The Cycle
      • Each VPC requires a bridge for network access
      • Putting a Puppet Master on this bridge breaks the cycle of network access/configuration
        – Allows the use of VPC security groups to control access
        – Use a lightweight instance
        – Assign any EIP to allow external access
        – Configure to support VPC/Internet bridging
      • All VPC instances are configured to use the Puppet Master as their gateway (initially)
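The chicken-and-egg problem and its fix can be pictured as a dependency graph. The sketch below is a simplified model, not the speaker's tooling: the node names are our own labels for the bullets on the two slides, and we assume the bridge/Puppet Master supplies network access without needing per-instance configuration first.

```python
from graphlib import CycleError, TopologicalSorter


def has_cycle(depends_on: dict[str, set[str]]) -> bool:
    """True if the dependency graph cannot be ordered (it contains a cycle)."""
    try:
        tuple(TopologicalSorter(depends_on).static_order())
        return False
    except CycleError:
        return True


# Each key depends on the items in its set (labels are illustrative).
chicken_and_egg = {
    "managed_ips": {"multiple_eips", "configuration"},
    "multiple_eips": {"vpc"},
    "configuration": {"puppet", "aws_access"},
    "puppet": {"network"},
    "aws_access": {"network"},
    "network": {"instance_configuration", "vpc_bridge"},
    "instance_configuration": {"configuration"},
}

# Assumed effect of the bridge: instances use the Puppet Master as their
# initial gateway, so network access no longer waits on instance configuration.
with_bridge = dict(chicken_and_egg)
with_bridge["network"] = {"vpc_bridge"}

print(has_cycle(chicken_and_egg), has_cycle(with_bridge))
```

Removing that single edge is enough to make the graph orderable, which mirrors the slide's point that one pre-configured bridge per VPC lets everything else bootstrap.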
  28. Too Many Puppet Masters
      [Diagram: multiple VPCs (one per AZ), each with its own address set (IP1 .. IPn) and its own Bridge/Puppet Master]
  29. What About All Those Masters?
      • World-wide support requires many VPCs
        – Multiple Availability Zones
        – Multiple regions
      • Each VPC requires network access
        – Each VPC requires a bridge
      • We solved one problem, and introduced another
        – Puppet Master configuration
  30. Mastering The Puppet Masters
      • Amazon S3 is an excellent choice for Puppet Master configuration
        – Global, highly available
        – Excellent security (access control) and logging
        – Sharable between accounts
      • One-way synchronization from Amazon S3 to distributed Puppet Masters solves the configuration problem
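One-way synchronization means the central store is the only source of truth: each Puppet Master copies from it and never writes back, so local drift is overwritten. The sketch below illustrates that idea with two local directories standing in for the S3 bucket and a Puppet Master's configuration directory; the function names and file names are hypothetical, and the talk does not say which sync tool was actually used.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def digest(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def sync_one_way(source: Path, replica: Path) -> list[str]:
    """Make `replica` match `source`; return the relative paths that changed."""
    changed = []
    source_files = {p.relative_to(source) for p in source.rglob("*") if p.is_file()}
    # Copy anything new or different from the source. The source is never written.
    for rel in sorted(source_files):
        target = replica / rel
        if not target.exists() or digest(target) != digest(source / rel):
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copyfile(source / rel, target)
            changed.append(rel.as_posix())
    # Remove replica files that no longer exist in the source.
    for path in sorted(replica.rglob("*")):
        if path.is_file() and path.relative_to(replica) not in source_files:
            path.unlink()
            changed.append(path.relative_to(replica).as_posix())
    return changed


with tempfile.TemporaryDirectory() as tmp:
    bucket, master = Path(tmp, "bucket"), Path(tmp, "master")
    (bucket / "manifests").mkdir(parents=True)
    master.mkdir()
    (bucket / "manifests" / "site.pp").write_text("node default {}\n")
    (master / "stale.pp").write_text("# local edit that should not survive\n")
    first_pass = sync_one_way(bucket, master)
    second_pass = sync_one_way(bucket, master)

print(first_pass, second_pass)
```

The second pass reports no changes, which is the property that makes it safe to run the same sync on every Puppet Master on a timer.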
  31. What About Performance?
      • We cannot stop here – we don’t want all traffic to always go through a bridge
      • So we do not stop here, we only configure here
        – Access to the Puppet Master and the network allows access to our configuration
        – Our configuration includes information about our EIP pool, as well as whether we need to acquire additional EIPs
        – If we don’t need an EIP, then we continue to use the bridge/Puppet Master
  32. Tools For Instance Configuration
      • Instance metadata
        – Instance ID, user data (always available)
        – AWS (requires Internet access)
      • Remember, instance data uses AWS API calls
      • Puppet
        – Configuration rules
        – Unsecured files
      • Amazon S3
        – Secured files (use role-based API authentication)
  33. EIP Management
      • DNS (Amazon Route 53) for address configuration
        – Configure a master name that contains all EIPs (for configuration)
        – Configure host name for regional EIPs (latency-based)
      • Each instance knows the master name
      • Use EIP APIs to intersect the master list with the EIP list of the instance’s VPC
      • Instances find their neighbors and share the EIPs
      • Each instance periodically checks itself
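The intersect-and-share step can be sketched with plain sets. Here the master list stands for the addresses resolved from the DNS master name and the VPC list for what the EIP API reports for this VPC; both are passed in as hypothetical data rather than fetched. The round-robin split is our assumption: the slide only says that instances share the addresses.

```python
def usable_eips(master_list: set[str], vpc_eips: set[str]) -> list[str]:
    """Addresses that are both published in DNS and allocated to this VPC."""
    return sorted(master_list & vpc_eips)


def share_out(eips: list[str], instances: list[str]) -> dict[str, list[str]]:
    """Deal addresses round-robin over instances in a fixed order, so every
    instance that runs this computes the same assignment."""
    ordered = sorted(instances)
    if not ordered:
        return {}
    shares: dict[str, list[str]] = {name: [] for name in ordered}
    for index, eip in enumerate(eips):
        shares[ordered[index % len(ordered)]].append(eip)
    return shares


# Hypothetical data using documentation-range addresses.
master = {"203.0.113.1", "203.0.113.2", "203.0.113.3", "198.51.100.9"}
vpc = {"203.0.113.1", "203.0.113.2", "203.0.113.3", "192.0.2.50"}

pool = usable_eips(master, vpc)
shares = share_out(pool, ["i-b", "i-a"])
print(pool, shares)
```

Sorting both the pool and the instance list gives each instance a deterministic view without a central coordinator, which fits the slide's "each instance periodically checks itself".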
  34. EIP Pseudo-Code – Startup
      Get a Primary Public IP – repeat until successful
      Allocate Network Interfaces and Private IPs (based on instance type)
      Notify application of all Public IPs acquired with a Network Interface
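The first startup step, "repeat until successful", is a plain retry loop. A minimal sketch, assuming a hypothetical `try_acquire` callable that returns an address or `None`; in practice it would wrap the EIP association call, which is not shown here. The sleep function is injectable so the loop can be exercised without waiting.

```python
import time
from typing import Callable, Optional


def acquire_primary_ip(
    try_acquire: Callable[[], Optional[str]],
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call `try_acquire` until it returns an address, pausing between attempts."""
    while True:
        address = try_acquire()
        if address is not None:
            return address
        sleep(delay_seconds)


# Stand-in allocator that fails twice, then succeeds (illustrative only).
attempts = iter([None, None, "203.0.113.1"])
pauses: list[float] = []
primary_ip = acquire_primary_ip(lambda: next(attempts), sleep=pauses.append)
print(primary_ip, pauses)
```

An unbounded loop matches the slide: an instance without a primary public IP cannot do anything useful, so there is nothing to fall back to.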
  35. EIP Pseudo-Code – Periodic
      Use DNS, EIP APIs to determine current pool for my region, intersect
      Validate my Primary Public IP – get one if required
      Validate configured Public IPs – release if no longer configured
      Check Scale Group, determine address count per instance (ROUND UP)
      Determine Public IP changes, and allocate/release with application’s help
      Release a Public IP if I have too many (application determines which)
      Allocate all required Public IPs if I have too few
      If there are nodes without an address, give one up
      Instances are ordered, and know who will give up an address
      Application picks the least used address
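The "address count per instance (ROUND UP)" step and the release/allocate decision can be sketched as simple arithmetic. This covers only those two lines of the pseudo-code; the "give one up" negotiation between ordered instances is not modeled, and the function names are ours.

```python
import math


def addresses_per_instance(pool_size: int, instance_count: int) -> int:
    """Target address count per instance, rounded up as on the slide."""
    if instance_count <= 0:
        raise ValueError("instance_count must be positive")
    return math.ceil(pool_size / instance_count)


def address_delta(held: int, pool_size: int, instance_count: int) -> int:
    """Positive: addresses to try to allocate. Negative: addresses to release."""
    return addresses_per_instance(pool_size, instance_count) - held


# A pool of 5 addresses while the scale group grows from 1 to 2 instances.
print(address_delta(held=5, pool_size=5, instance_count=2))  # old instance
print(address_delta(held=0, pool_size=5, instance_count=2))  # new instance
```

Rounding up means the targets can add up to more than the pool (3 + 3 for 5 addresses here), so allocation is best effort: the new instance asks for 3 but only 2 become free, and every address still ends up assigned to some instance.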
  36. EIP Pseudo-Code – Shutdown
      Release all additional Network Interfaces
      Release all Public IPs except my Primary Public IP (for logging)
      The instance then terminates, freeing the Primary Public IP
  37. EIP Management
      [Diagram labels: IP1 .. IPn, VPC (per AZ), User Data, Instance Data, Route 53, Bridge/Puppet Master, Amazon S3]
  38. Global Scale, Global Services
  39. Adding Back The 90%
      • Our configured instances play nice with AWS
        – Bootstrapping through AMI or Cloud Init
        – Auto Scaling groups set user and instance data
        – Load balancers managed with Auto Scaling groups
        – Latency-based Route 53 address for TCP/HTTP
        – Latency-based Route 53 address for UDP and whitelists
      • Internet access for remote logging
      • Amazon CloudWatch for monitoring
  40. Goal Achieved
      [Diagram labels: AutoScale (per region), Instance Data, Route 53, VPC (per AZ), User Data, IP1 .. IPn, Bridge/Puppet Master, Amazon S3, Elastic Load Balancing, CloudWatch, papertrail, Route 53 Latency-based Lookups]