Scaling and Distributing

Building and scaling distributed infrastructure. Presented at OdessaPy on Dec 7, 2013.

Transcript

  • 1. Scaling and Distributing. Dima Nedbaylo, Dima Malenko
  • 2. So, you want to tell us about distributed systems?
  • 3. OpenOffice.org on iPad
  • 4. Systems are distributed for a reason. Our reasons were the following: • low latency for end-user connections • horizontal scalability • availability
  • 6. Story number one: Practical
  • 7. We have to be close to the user. Every 5 ms of ping between the user and the server counts.
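One way to act on this is to measure latency from the user's vantage point and route to the closest location. A minimal Python sketch, assuming each location exposes a TCP endpoint; `measure_connect_rtt` and `pick_closest` are hypothetical helpers, and TCP connect time (which here also includes DNS resolution) is only a rough stand-in for ping, since ICMP needs raw sockets:

```python
import socket
import time


def measure_connect_rtt(host, port=443, timeout=2.0):
    """Return the TCP connect time to host:port in milliseconds, or None on failure."""
    start = time.perf_counter()
    try:
        # Resolves the host name and completes the TCP handshake, then closes.
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError:
        return None
    return (time.perf_counter() - start) * 1000.0


def pick_closest(rtts_ms):
    """Return the location with the lowest RTT, ignoring locations that failed (None)."""
    reachable = {loc: rtt for loc, rtt in rtts_ms.items() if rtt is not None}
    if not reachable:
        raise RuntimeError("no location is reachable")
    return min(reachable, key=reachable.get)
```

For example, `pick_closest({loc: measure_connect_rtt(host) for loc, host in endpoints.items()})`, where `endpoints` is a hypothetical mapping of location names to host names.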
  • 8. A computing power provider with multiple locations and great management capabilities?
  • 10. [Only] 8 locations. [Just] $0.6/hour for a c3.2xlarge (14 ECU, 15 GB RAM, 2 × 80 GB SSD): $0.6 × 720 h = $432/month
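The slide's arithmetic as a small helper, assuming flat hourly on-demand billing and a 720-hour (30-day) month as on the slide; `monthly_cost` is a hypothetical name:

```python
HOURS_PER_MONTH = 720  # 30 days x 24 hours, as on the slide


def monthly_cost(hourly_rate, servers=1, hours=HOURS_PER_MONTH):
    """On-demand cost (in dollars) of running `servers` instances around the clock."""
    return hourly_rate * hours * servers
```

With one such server in each of the 8 locations, `monthly_cost(0.6, servers=8)` comes to 8 × $432 = $3,456/month.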
  • 11. So, you’ve got a new server… • Minimal setup of Ubuntu 12.04 • A magic Fabric script turns minimal Ubuntu into a rollApp app server • Works great and gets a server up and running within a couple of hours
  • 12. Now you’ve got 10 servers… • And need to update one of your components • Or run a maintenance procedure • Or install a couple of new packages • Or fix a config
  • 13. Ansible (http://www.ansibleworks.com) • Learned in just 1 hour • In 2 hours we had a script to set up a new app server • Way better and more reliable than Fabric for the same purpose • Way easier than Chef
  • 14. Ansible vs. Fabric • Requires a hosts inventory database • If you need something very custom, you have to code it in Python
  • 15. Ansible Ad-hoc Mode. Ansible has a lot of modules for ad-hoc mode: • downloading/uploading files • managing packages, users, services • launching EC2 instances and other AWS resources • database and DB user operations
  • 16. Inventory Example
        [appservers_eu]
        appsrv-007.rollapp.com ansible_ssh_user=guess_who ansible_ssh_private_key_file=...
        [appservers:children]
        appservers_eu
        [appservers:vars]
        root_password='password'
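The `:children` and `:vars` sections are what make the inventory more than a host list: a play that targets `appservers` reaches every host in `appservers_eu`. A toy parser for just this subset of the INI format illustrates the resolution; it is a hypothetical sketch, not Ansible's own parser, and it ignores host variables, host ranges, and group cycles:

```python
from collections import defaultdict


def parse_inventory(text):
    """Parse a tiny subset of the Ansible INI inventory format.

    Returns (hosts, children, group_vars):
      hosts[group]      -> host names listed directly under [group]
      children[group]   -> child groups listed under [group:children]
      group_vars[group] -> key=value pairs listed under [group:vars]
    """
    hosts, children, group_vars = defaultdict(list), defaultdict(list), defaultdict(dict)
    group, kind = None, ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        if line.startswith("[") and line.endswith("]"):
            # "[appservers:children]" -> group "appservers", kind "children"
            group, _, kind = line[1:-1].partition(":")
            continue
        if kind == "children":
            children[group].append(line)
        elif kind == "vars":
            key, _, value = line.partition("=")
            group_vars[group][key.strip()] = value.strip().strip("'\"")
        else:
            # First token is the host name; per-host variables are ignored here.
            hosts[group].append(line.split()[0])
    return hosts, children, group_vars


def resolve_hosts(group, hosts, children):
    """All hosts of a group, including hosts of its child groups (assumes no cycles)."""
    result = list(hosts.get(group, []))
    for child in children.get(group, []):
        for host in resolve_hosts(child, hosts, children):
            if host not in result:
                result.append(host)
    return result
```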
  • 17. Playbooks • Plain YAML files • Contain server configuration • In essence, just a set of tasks that invoke Ansible modules
  • 18. Simple Playbook
        ---
        - include: playbooks/timezone.yml
        - include: playbooks/ntp.yml
        - hosts: appservers
          sudo: yes
          vars:
            prefix_dir: /opt
          tasks:
            # user: dnedbaylo
            - name: create dnedbaylo user
              user: name=dnedbaylo groups=admin shell=/bin/zsh
            - name: authorized keys for dnedbaylo
              authorized_key: user=dnedbaylo key="ssh-rsa …."
  • 19. Ansible vs. Fabric: playbook mode • Plain language (YAML) for playbooks • Lots of ready-to-use modules (creating EC2 instances, user management, apt repository management, etc.) • No need to worry about details (does the user already exist?) • Playbooks are “idempotent”
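Idempotency means that running the same playbook twice leaves the server in the same state, and the second run reports no changes. A minimal Python sketch of the idea, similar in spirit to Ansible's `lineinfile` module but not its implementation; `ensure_line` is a hypothetical helper:

```python
import os


def ensure_line(path, line):
    """Make sure `line` is present in the file at `path`.

    Returns True if the file was changed and False if it was already in the
    desired state, mirroring the "changed"/"ok" status Ansible reports per task.
    """
    content = ""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            content = f.read()
    if line in content.splitlines():
        return False  # already converged: touch nothing
    # Avoid gluing the new line onto a last line that lacks a trailing newline.
    prefix = "" if content == "" or content.endswith("\n") else "\n"
    with open(path, "a", encoding="utf-8") as f:
        f.write(prefix + line + "\n")
    return True
```

The check-then-act structure is what makes a Fabric-style "just run the command" script safe to rerun.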
  • 20. Ansible vs. Chef
  • 21. Ansible vs. Chef • No need to learn Chef • No need to learn Ruby • No weird Ruby requirements (it is not so easy to install Chef on Linux Mint) • No need for additional tools to make life with Chef Solo easier (hello, knife-solo)
  • 22. Story number two: Instructive
  • 23. The difficulty with distributed systems is that they are … distributed
  • 24. “My First Law of Distributed Object Design: Do not distribute your objects” (Martin Fowler, http://martinfowler.com/bliki/FirstLaw.html)
  • 25. At all times, protect the integrity of the system… at all costs
  • 26. Put on your oxygen mask first before helping others
  • 27. [Diagram: Web and App Servers 1, 2, 3, …, N]
  • 28. Web keeps track of consecutive errors for each app server [Diagram: Web and App Servers 1, 2, 3, …, N]
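A sketch of the Web-side bookkeeping, assuming a fixed threshold of consecutive failures; `AppServerPool` and its methods are hypothetical names, and random choice stands in for whatever balancing the real Web tier used:

```python
import random


class AppServerPool:
    """Web-side view of the app servers, tracking consecutive errors per server.

    A server drops out of rotation after `max_consecutive_errors` failures in a
    row; any success resets its counter to zero.
    """

    def __init__(self, servers, max_consecutive_errors=3):
        self.max_errors = max_consecutive_errors
        self.errors = {server: 0 for server in servers}

    def healthy(self):
        return [s for s, n in self.errors.items() if n < self.max_errors]

    def pick(self):
        candidates = self.healthy()
        if not candidates:
            raise RuntimeError("no healthy app servers left")
        return random.choice(candidates)

    def report_success(self, server):
        self.errors[server] = 0

    def report_error(self, server):
        self.errors[server] += 1
```

A production version would also re-probe an excluded server after a cool-down period (the "half-open" state of a circuit breaker) instead of excluding it forever.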
  • 29. Web keeps track of consecutive errors for each app server; each app server monitors its internal state and deactivates itself if bad things happen [Diagram: Web and App Servers 1, 2, 3, …, N]
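The app-server side of the same idea is a self-check over internal state rather than a bare ping. A minimal sketch; the metrics, thresholds, and class names below are hypothetical placeholders, and how the state is actually gathered (request counters, pool statistics, RSS) depends on the stack:

```python
from dataclasses import dataclass


@dataclass
class InternalState:
    in_flight_requests: int
    free_db_connections: int
    rss_mb: float


@dataclass
class Limits:  # hypothetical thresholds; tune per deployment
    max_in_flight: int = 50
    min_free_db_connections: int = 1
    max_rss_mb: float = 12_000.0


class SelfMonitor:
    """Deactivates the app server when its own internal state looks bad."""

    def __init__(self, limits=None):
        self.limits = limits or Limits()
        self.active = True
        self.reasons = []

    def check(self, state):
        lim = self.limits
        reasons = []
        if state.in_flight_requests > lim.max_in_flight:
            reasons.append("too many in-flight requests")
        if state.free_db_connections < lim.min_free_db_connections:
            reasons.append("DB connection pool exhausted")
        if state.rss_mb > lim.max_rss_mb:
            reasons.append("memory usage too high")
        self.reasons = reasons
        self.active = not reasons  # reactivates once all checks pass again
        return self.active
```

A status endpoint could then answer, for example, HTTP 503 while `active` is false, so the Web tier stops routing to this server.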
  • 30-33. [Diagram frames: Web and App Servers 1, 2, 3, …, N]
  • 34. What Happened?
  • 35. [Diagram: Web and App Servers 1, 2, 3, …, N]
  • 36. OOM killer engaged • pings work • simple status checks work(!) [Diagram: Web and App Servers 1, 2, 3, …, N]
  • 37. Requests still come in, but never actually get processed [Diagram: Web and App Servers 1, 2, 3, …, N]
  • 38. The DB connection pool got saturated: old requests hung, new requests fail [Diagram: Web and App Servers 1, 2, 3, …, N]
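When the pool is saturated, a request that waits forever for a connection is worse than one that fails fast. A minimal sketch of a pool gate with an acquire timeout built on `threading.BoundedSemaphore`; `BoundedPool` and `PoolExhausted` are hypothetical, and the talk does not say which pool implementation was involved:

```python
import threading
from contextlib import contextmanager


class PoolExhausted(Exception):
    """No connection slot became free within the acquire timeout."""


class BoundedPool:
    """Limits concurrent DB connections and fails fast instead of hanging."""

    def __init__(self, size, acquire_timeout):
        self._slots = threading.BoundedSemaphore(size)
        self._timeout = acquire_timeout

    @contextmanager
    def connection(self):
        # acquire() returns False if no slot frees up within the timeout.
        if not self._slots.acquire(timeout=self._timeout):
            raise PoolExhausted("no DB connection within %.2fs" % self._timeout)
        try:
            yield None  # a real pool would hand out a connection object here
        finally:
            self._slots.release()
```

This only bounds the wait for a free slot; the hung old requests also need query or statement timeouts so that their slots are eventually returned.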
  • 39. Irregular errors and failures because of resource starvation [Diagram: Web and App Servers 1, 2, 3, …, N]
  • 40. “You can’t trust anyone, sometimes not even yourself… But you can trust me!”
  • 41. Lessons Learned • Timeouts on all connections to other components • Monitoring beyond just vital signs • Keep track of “in-progress” requests to prevent cascading errors
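Tracking “in-progress” requests can be as simple as a counter with a cap. A sketch under the assumption that shedding load (answering 503) is preferable to queueing without limit; `InFlightLimiter` and `handle` are hypothetical names:

```python
import threading


class InFlightLimiter:
    """Counts requests currently being processed and rejects new ones above a cap."""

    def __init__(self, max_in_flight):
        self.max_in_flight = max_in_flight
        self.in_flight = 0
        self._lock = threading.Lock()

    def try_enter(self):
        with self._lock:
            if self.in_flight >= self.max_in_flight:
                return False  # reject now rather than pile up behind stuck requests
            self.in_flight += 1
            return True

    def leave(self):
        with self._lock:
            self.in_flight -= 1


def handle(limiter, work):
    """Run `work()` if there is capacity; otherwise return an overload response."""
    if not limiter.try_enter():
        return 503, "overloaded"
    try:
        return 200, work()
    finally:
        limiter.leave()
```

The same counter doubles as a monitoring signal that goes beyond vital signs.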
  • 42. Things to Remember • [Almost] all modern applications are distributed • Never trust any external interface • Be ready to sacrifice a part to keep the entire system afloat • Monitor each interface from both sides and on different layers (not just pinging)
  • 43. Never Trusting Is Not Easy • No (!) out-of-the-box solutions for controlling response timeouts – requests only has connection timeouts – if you are on gevent or the like, you are good to go with greenlet timeouts • [Almost] always opt for aggressive health-control parameters – request processing time – max address space – max number of queued requests
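For context, the requests documentation states that `timeout` is not a limit on the entire response download, only on how long the socket may go without receiving bytes. One way to add an overall budget without gevent (where wrapping the call in `gevent.Timeout` does the job) is to read the body in chunks and check a deadline between them. A sketch; `read_with_deadline` and `DeadlineExceeded` are hypothetical:

```python
import time


class DeadlineExceeded(Exception):
    """The whole response took longer than the allowed budget."""


def read_with_deadline(read, total_timeout, chunk_size=64 * 1024, clock=time.monotonic):
    """Read a body chunk by chunk, aborting once `total_timeout` seconds have passed.

    `read(n)` must return at most n bytes and b"" at end of stream. The deadline
    is only checked between chunks, so each individual read still needs its own
    socket-level timeout.
    """
    deadline = clock() + total_timeout
    chunks = []
    while True:
        if clock() > deadline:
            raise DeadlineExceeded("response exceeded %.1fs" % total_timeout)
        chunk = read(chunk_size)
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)
```

With requests, this could wrap `resp.raw.read` of a response fetched with `stream=True` and a per-socket `timeout`; note that `resp.raw.read` returns the raw, possibly compressed, bytes.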
  • 44. Story number three: Unexpected
  • 45. Task: optimize application startup time
  • 46. A screenshot ought to be here, but it isn’t
  • 47. A screenshot ought to be here, but it isn’t. Always (I mean ALWAYS) take screenshots when you come across something interesting
  • 48. Application Startup [Diagram: User Browser, Web, App Server; steps labeled “preparing to launch”, “connect to application”, “launch application”]
  • 49. Application Startup [Diagram: User Browser, Web, App Server; labels “Close to one another”, “Faster”, “Slower”]
  • 50. Application Startup [Diagram: User Browser, Web, App Server]
  • 51. Do not take anything for granted
  • 52. Any questions? Now, and later: dnedbaylo@rollapp.com; @dmalenko, dmalenko@rollapp.com, www.dmalenko.org