Scaling and Distributing
 


Building and scaling distributed infrastructure. Presented at OdessaPy on Dec 7, 2013.

    Scaling and Distributing: Presentation Transcript

    • Scaling and Distributing. Dima Nedbaylo, Dima Malenko
    • So, you want to tell us about distributed systems?
    • OpenOffice.org on iPad
    • Systems are distributed for a reason. Our reasons were the following: • low latency for end-user connections • horizontal scalability • availability
    • Story number one: Practical
    • We have to be close to the user. Every 5 ms of ping between the user and the server counts
    • Computing power provider with multiple locations and great management capabilities?
    • [Only] 8 locations • [Just] $0.6/hour • c3.2xlarge: 14 ECU, 15 GB RAM, 2 × 80 GB SSD • $0.6 × 720 = $432/mo
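The arithmetic on this slide can be written as a quick sketch. The 720 hours correspond to a 30-day month; the function name and the ten-server fleet are illustrative assumptions, not figures from the talk.

```python
HOURS_PER_MONTH = 24 * 30  # 720 hours, i.e. a 30-day month as on the slide


def monthly_cost(hourly_rate_usd, servers=1):
    """On-demand cost in USD per month for `servers` always-on instances."""
    return round(hourly_rate_usd * HOURS_PER_MONTH * servers, 2)


print(monthly_cost(0.6))      # one instance at $0.6/hour
print(monthly_cost(0.6, 10))  # a hypothetical fleet of ten
```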
    • So, you’ve got a new server… • Minimal setup of Ubuntu 12.04 • Magic Fabric script to turn minimal Ubuntu into a rollApp app server • Works great and gets a server up and running within a couple of hours
    • Now you’ve got 10 servers… • And need to update one of your components • Or run a maintenance procedure • Or install a couple of new packages • Or correct a config
    • Ansible http://www.ansibleworks.com • Learned in just 1 hour • In 2 hours we had a script to set up a new app server • Way better and more reliable than Fabric for the same purpose • Way easier than Chef
    • Ansible vs. Fabric • Requires a hosts inventory database • If you need something very custom, you have to code in Python
    • Ansible Ad-hoc. Ansible has a lot of modules for ad-hoc mode: • downloading/uploading files • managing packages, users, services • launching EC2 instances and other AWS stuff • database and DB user operations
    • Inventory Example
        [appservers_eu]
        appsrv-007.rollapp.com ansible_ssh_user=guess_who ansible_ssh_private_key_file=...

        [appservers:children]
        appservers_eu

        [appservers:vars]
        root_password='password'
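To make the group structure in this inventory concrete, here is a minimal Python sketch that reads a small subset of the INI-style format and resolves which hosts belong to a parent group. It is not Ansible's own parser: `parse_inventory` and `hosts_in` are hypothetical helpers, per-host variables are ignored, and the sample omits the elided key file setting.

```python
from collections import defaultdict

INVENTORY = """\
[appservers_eu]
appsrv-007.rollapp.com ansible_ssh_user=guess_who

[appservers:children]
appservers_eu

[appservers:vars]
root_password='password'
"""


def parse_inventory(text):
    """Parse a tiny subset of the INI inventory format into plain dicts."""
    hosts = defaultdict(list)       # group -> hosts listed directly in it
    children = defaultdict(list)    # group -> names of child groups
    group_vars = defaultdict(dict)  # group -> variables from [group:vars]
    section, kind = None, "hosts"
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            name, _, suffix = line[1:-1].partition(":")
            section, kind = name, (suffix or "hosts")
            continue
        if kind == "hosts":
            hosts[section].append(line.split()[0])  # host variables are ignored
        elif kind == "children":
            children[section].append(line)
        elif kind == "vars":
            key, _, value = line.partition("=")
            group_vars[section][key.strip()] = value.strip().strip("'\"")
    return hosts, children, group_vars


def hosts_in(group, hosts, children):
    """All hosts of a group, including those inherited from child groups."""
    found = list(hosts.get(group, []))
    for child in children.get(group, []):
        found.extend(hosts_in(child, hosts, children))
    return found


hosts, children, group_vars = parse_inventory(INVENTORY)
print(hosts_in("appservers", hosts, children))
print(group_vars["appservers"])
```

The point of the `children` section is visible in `hosts_in`: a play that targets `appservers` reaches every host in `appservers_eu` without listing it twice.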
    • Playbooks • Plain YAML file • Contains server configuration • In essence, it is just a set of tasks that invoke Ansible modules
    • Simple Playbook
        ---
        - include: playbooks/timezone.yml
        - include: playbooks/ntp.yml

        - hosts: appservers
          sudo: yes
          vars:
            prefix_dir: /opt
          tasks:
            # user: dnedbaylo
            - name: create dnedbaylo user
              user: name=dnedbaylo groups=admin shell=/bin/zsh
            - name: authorized keys for dnedbaylo
              authorized_key: user=dnedbaylo key="ssh-rsa …."
    • Ansible vs. Fabric: playbook mode • Plain language (YAML) for playbooks • A lot of modules ready to use (like creating EC2 instances, user management, apt repository management, etc.) • No need to worry about details (does the user already exist?) • Playbooks are “idempotent”
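What "idempotent" means here can be sketched in a few lines of Python: a task describes the desired state, changes the system only if it differs, and reports whether anything changed. The `ensure_user` function and the in-memory `users` dictionary are hypothetical stand-ins, not Ansible code.

```python
def ensure_user(system_users, name, shell="/bin/sh"):
    """Make sure `name` exists with `shell`; return True only if something changed."""
    if system_users.get(name) == {"shell": shell}:
        return False  # already in the desired state: nothing to do
    system_users[name] = {"shell": shell}
    return True


users = {}  # stand-in for the real system state
print(ensure_user(users, "dnedbaylo", "/bin/zsh"))  # first run changes state
print(ensure_user(users, "dnedbaylo", "/bin/zsh"))  # second run is a no-op
```

Running the same task any number of times leaves the system in the same state, which is why the "does the user already exist?" question no longer has to be handled by hand.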
    • Ansible vs. Chef • No need to learn Chef • No need to learn Ruby • No weird Ruby requirements (it is not so easy to install Chef on Linux Mint) • No need for additional tools to make life with Chef Solo easier (hello, knife-solo)
    • Story number two: Instructive
    • The difficulty with distributed systems is that they are … distributed
    • My First Law of Distributed Objects Design: Do not distribute your objects http://martinfowler.com/bliki/FirstLaw.html
    • At all times protect the integrity of the system… at all costs
    • Put on your oxygen mask first before helping others
    • [Diagram: Web connected to App Server 1, App Server 2, App Server 3, …, App Server N]
    • Web keeps track of consecutive errors for each app server
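A minimal sketch of this bookkeeping on the Web side, assuming a simple rule: a server is skipped once it has failed a fixed number of times in a row, and any success resets its count. The `ErrorTracker` class, the threshold of 3, and the server names are illustrative assumptions, not the talk's implementation.

```python
class ErrorTracker:
    """Counts consecutive failures per app server and skips the unhealthy ones."""

    def __init__(self, max_consecutive_errors=3):
        self.max_consecutive_errors = max_consecutive_errors
        self._errors = {}  # server name -> current streak of failures

    def record_success(self, server):
        self._errors[server] = 0  # any success resets the streak

    def record_failure(self, server):
        self._errors[server] = self._errors.get(server, 0) + 1

    def is_available(self, server):
        return self._errors.get(server, 0) < self.max_consecutive_errors

    def available(self, servers):
        return [s for s in servers if self.is_available(s)]


tracker = ErrorTracker(max_consecutive_errors=3)
servers = ["app1", "app2", "app3"]
for _ in range(3):
    tracker.record_failure("app2")  # three failures in a row: taken out
tracker.record_failure("app3")
tracker.record_success("app3")      # an isolated failure does not count
print(tracker.available(servers))
```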
    • Each app server monitors its internal state and deactivates itself if bad things happen
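Self-deactivation can be sketched as a check against a couple of limits, using two of the health parameters named later in the talk (request processing time and number of queued requests). The class names and threshold values below are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class HealthLimits:
    max_request_seconds: float = 5.0  # assumed threshold
    max_queued_requests: int = 100    # assumed threshold


class AppServerHealth:
    """Tracks whether this app server should keep accepting work."""

    def __init__(self, limits):
        self.limits = limits
        self.active = True

    def check(self, slowest_request_seconds, queued_requests):
        """Deactivate (and stay deactivated) once any limit is exceeded."""
        if (slowest_request_seconds > self.limits.max_request_seconds
                or queued_requests > self.limits.max_queued_requests):
            self.active = False
        return self.active


health = AppServerHealth(HealthLimits())
print(health.check(0.2, 3))   # within limits
print(health.check(12.0, 3))  # a request took too long: deactivate
print(health.check(0.2, 3))   # stays deactivated until someone intervenes
```

Staying deactivated after the first violation is a deliberate choice in this sketch: it favors the integrity of the whole system over keeping one questionable server in rotation.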
    • What Happened?
    • OOM killer engaged • pings work • simple status checks work(!)
    • Requests still come in, but never actually get processed
    • DB connection pool got saturated. Old requests hang, new requests fail
    • Irregular errors and failures because of resource starvation
    • You can’t trust anyone, sometimes not even yourself… Me, you can trust!
    • Lessons Learned • Timeouts on all connections to other components • Monitoring beyond just vital signs • Keep track of “in progress” requests to prevent cascading errors
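One way to keep track of "in progress" requests is to cap their number and reject new work immediately once the cap is reached, so that a hung dependency cannot pile up requests without bound. The following is a sketch under that assumption; `InFlightLimiter`, `Overloaded`, and `handle` are hypothetical names.

```python
import threading


class Overloaded(Exception):
    """Raised when a request is rejected because too many are in progress."""


class InFlightLimiter:
    """Caps the number of requests being processed at the same time."""

    def __init__(self, max_in_flight):
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def try_acquire(self):
        # Non-blocking: fail fast instead of queueing behind stuck requests.
        return self._slots.acquire(blocking=False)

    def release(self):
        self._slots.release()


def handle(limiter, work):
    """Run `work()` if a slot is free; otherwise raise Overloaded right away."""
    if not limiter.try_acquire():
        raise Overloaded("too many requests in progress")
    try:
        return work()
    finally:
        limiter.release()  # always free the slot, even if work() fails


limiter = InFlightLimiter(max_in_flight=2)
print(handle(limiter, lambda: "ok"))
```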
    • Things to Remember • [Almost] all modern applications are distributed • Never trust any external interface • Be ready to sacrifice a part to keep the entire system afloat • Monitor each interface from both sides on different layers (not just pinging)
    • Never Trusting Is Not Easy • No (!) out-of-the-box solutions for controlling response timeouts: requests only has connection timeouts; if you are on gevent or the like, you are good to go with greenlet timeouts • [Almost] always opt for aggressive health control parameters: request processing time, max address space, max number of queued requests
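The slide recommends greenlet timeouts under gevent. As a swapped-in illustration that needs only the standard library, the sketch below enforces an overall deadline on a call by running it in a thread pool and giving up on the result after a fixed time. `call_with_deadline` and `slow_backend` are hypothetical names, and the durations are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

_pool = ThreadPoolExecutor(max_workers=4)


def call_with_deadline(func, deadline_seconds, *args, **kwargs):
    """Return func's result, or raise FutureTimeout once the deadline passes.

    The worker thread is not interrupted on timeout; it keeps running in the
    background, so the pool size still bounds how many calls can be stuck.
    """
    future = _pool.submit(func, *args, **kwargs)
    return future.result(timeout=deadline_seconds)


def slow_backend():
    time.sleep(0.5)  # stand-in for a component that hangs
    return "late"


try:
    call_with_deadline(slow_backend, 0.05)
except FutureTimeout:
    print("gave up waiting")
```

Unlike a connection timeout, the deadline here covers the whole call, which is the kind of control the slide says is missing out of the box.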
    • Story number three: Unexpected
    • Optimizing application startup time
    • Here ought to be a screenshot, but it is not here. Always (I mean ALWAYS) take screenshots when you come across something interesting
    • Application Startup [diagram: User Browser, Web, App Server; steps: preparing to launch, launch application, connect to application]
    • Application Startup [diagram: User Browser, Web, App Server; labels: close to one another, faster, slower]
    • Application Startup [diagram: User Browser, Web, App Server]
    • Do not take anything for granted
    • Any questions? Now, and later: dnedbaylo@rollapp.com, dmalenko@rollapp.com, @dmalenko, www.dmalenko.org