Automating Cloud Applications Using Open Source

With the proliferation of tools, frameworks, and libraries, it's now easier than ever to build cloud-based systems. However, while each tool is designed to solve a specific pain point, gaps exist when it comes to a holistic approach to managing the cloud-based software lifecycle. Using real-world examples, BrightTag engineers explain how they helped design a highly scalable platform and automated zero-downtime deploys using primarily off-the-shelf open source software. The talk focuses on the software lifecycle, broken into three high-level areas: design, deployment, and monitoring. The session reviews considerations for designing applications to take advantage of cloud-based deployment and demonstrates how to leverage existing open source tools such as Fabric, HAProxy, libcloud, and Graphite to create a scalable and flexible infrastructure.

Presentation Transcript

    • Automating Life in the Cloud: Joshua Buss, Matthew Kemp & Cody Ray
    • Designing for the Cloud (Use Case 0: Scalability and Reliability): "Add more features!" "This widget is too slow!" "No more downtime!" "We're losing potential customers in Asia!"
    • Scalability (Use Case 0: Scalability and Reliability): Focus on scaling applications horizontally.
    • Service Oriented Architecture (Use Case 0: Scalability and Reliability): Wikipedia definition: "SOA as an architecture relies on service-orientation as its fundamental design principle. If a service presents a simple interface that abstracts away its underlying complexity, users can access independent services without knowledge of the service's platform implementation." In layman's terms: a complex system is broken into simple components that are able to interact with each other (and possibly with outside sources).
    • What is a Service in SOA? (Use Case 0: Scalability and Reliability): An independent unit that's composable with other components. [Diagram: a service's layers: presentation (web, API, etc.), business logic, data access, and data stores]
    • Services at BrightTag (Use Case 0: Scalability and Reliability): [Diagram: the ui, stathub, tagserve, and datahub services, plus two databases]
    • Service Division of Labor (Use Case 0: Scalability and Reliability): When should you split services up?
    • Design for Failure (Use Case 0: Scalability and Reliability): Keep failures self-contained. Release It! by Nygard is a great resource for stability patterns.
    • Redundancy at BrightTag (Use Case 0: Scalability and Reliability): Run a full stack in each region. [Diagram: two identical stacks, each with ui, stathub, tagserve, datahub, and their databases]
    • Load Balancers (Use Case 0: Scalability and Reliability): Services are exposed over HTTP, so we can use standard tools and components without extra effort.
    • Backwards Compatibility (Use Case 0: Scalability and Reliability): Changes need to be allowed, but compatibility needs to be maintained.
    • Cross-Region Data Replication (Use Case 1: Inter-Region Communication): Some data needs to be available in all regions, but inter-region communication should be kept to a minimum.
    • What is Cassandra? (Use Case 1: Inter-Region Communication): Google's Bigtable data model on Amazon's Dynamo infrastructure.
    • Cassandra Token Ring (Use Case 1: Inter-Region Communication): [Diagram: East and West rings with these token ranges]
        East: cassandra01 [0-63], cassandra02 [64-127], cassandra03 [128-191], cassandra04 [192-255]
        West: cassandra01 [1-64], cassandra02 [65-128], cassandra03 [129-192], cassandra04 [193-0]
        Key hashes to 157?
    • How Cassandra Writes (Use Case 1: Inter-Region Communication): [Diagram: the same East and West rings, with the write highlighted] Write goes here.
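The two ring slides can be worked through with a toy model. This is a minimal sketch assuming the slide's simplified 0 to 255 token space and one replica per region; real Cassandra uses a partitioner with a far larger token space and configurable replica placement, so treat this as an illustration of range ownership only.

```python
# Toy token rings copied from the slides: inclusive ranges, token space 0-255.
# Assumption: a simplified model, not Cassandra's actual partitioner.
RINGS = {
    "East": [("cassandra01", 0, 63), ("cassandra02", 64, 127),
             ("cassandra03", 128, 191), ("cassandra04", 192, 255)],
    "West": [("cassandra01", 1, 64), ("cassandra02", 65, 128),
             ("cassandra03", 129, 192), ("cassandra04", 193, 0)],  # wraps past 255
}


def owner(ring, token):
    """Return the node whose inclusive range contains token."""
    for node, start, end in ring:
        if start <= end:
            if start <= token <= end:
                return node
        elif token >= start or token <= end:  # range wraps around to 0
            return node
    raise ValueError(f"token {token} is not covered by the ring")


def replicas(token):
    """Assumed placement: one replica on the owning node in each region."""
    return {region: owner(ring, token) for region, ring in RINGS.items()}


print(replicas(157))  # {'East': 'cassandra03', 'West': 'cassandra03'}
```

Offsetting the West ranges by one token is what the slide shows; with this lookup, each region still has exactly one owner for every key.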
    • Cross-Region Messaging (Hiveway) (Use Case 1: Inter-Region Communication): Cross-region messaging over HTTPS with compression. [Diagram: messages flow from the local hiveway to the remote hiveway]
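Hiveway's wire format is not shown in the slides. The sketch below illustrates only the general idea of "HTTPS with compression" using the standard library: a gzip-compressed JSON batch in a POST request. The URL, JSON batch shape, and headers are hypothetical, not Hiveway's actual protocol.

```python
import gzip
import json
import urllib.request


def build_batch_request(messages, url):
    """Build a gzip-compressed POST carrying a batch of messages.

    Hypothetical: the JSON list format and the header choices are assumptions.
    """
    body = gzip.compress(json.dumps(messages).encode("utf-8"))
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Content-Encoding": "gzip"},
    )


def decode_batch(body):
    """Receiving side: decompress and parse the batch."""
    return json.loads(gzip.decompress(body).decode("utf-8"))


req = build_batch_request([{"key": "abc", "value": 1}], "https://hiveway.example/messages")
# urllib.request.urlopen(req, timeout=10) would send it; not called here.
```

Batching many messages into one compressed request is one way to keep inter-region traffic small, which matches the goal stated for cross-region replication.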
    • Smooth Code Pushes (Use Case 2: Zero Downtime Builds)
    • Mirror Environment Cutover (Use Case 2: Zero Downtime Builds): Easy migrations and upgrade path. Can be more expensive.
    • Rolling Deploy (Use Case 2: Zero Downtime Builds): More complicated migrations and upgrades. Longer deploy window. Usually cheaper.
    • Fabric Pseudocode (Use Case 2: Zero Downtime Builds):
        for region in regions:
            for app in apps:
                for server in region:
                    if app on server:
                        maintenance app
                        scp new code to <deployment_tag> dir
                        symlink app/current to app/<deployment_tag>
                        restart app
                        wait for healthy
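The pseudocode can be made concrete in plain Python. This is a minimal sketch, not BrightTag's Fabric file: the remote operations are passed in as callables (with Fabric 2 they could wrap `Connection(host).run` and `Connection(host).put`), and the `/opt/<app>` layout and the `bin/maint` and `bin/restart` commands are hypothetical.

```python
import time


def rolling_deploy(regions, apps, inventory, tag, artifact, run, put, healthy,
                   timeout=60.0, interval=1.0):
    """Deploy each app one server at a time, waiting for health before moving on.

    Hypothetical interfaces: inventory maps region -> {host: set_of_apps};
    run(host, cmd) and put(host, local, remote) do the remote work;
    healthy(host, app) polls the app's health check.
    """
    for region in regions:
        for app in apps:
            for host, host_apps in inventory[region].items():
                if app not in host_apps:
                    continue  # this server doesn't run the app
                base = f"/opt/{app}"
                release = f"{base}/{tag}"
                run(host, f"{base}/bin/maint on")  # drain from the load balancer
                run(host, f"mkdir -p {release}")
                put(host, artifact, f"{release}/")  # copy new code into the tag dir
                run(host, f"ln -sfn {release} {base}/current")  # repoint current
                run(host, f"{base}/bin/restart")
                deadline = time.monotonic() + timeout
                while not healthy(host, app):
                    if time.monotonic() > deadline:
                        # Stop the rollout; remaining servers keep the old code.
                        raise RuntimeError(f"{app} on {host} did not become healthy")
                    time.sleep(interval)
```

Aborting on the first unhealthy server is what makes a rolling deploy safe: at most one server per app is ever out of rotation, and a bad build never reaches the rest of the fleet.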
    • Health Checks at BrightTag (Use Case 2: Zero Downtime Builds): Standardized health checks across services.
        $ curl -si http://service/bthc
        HTTP/1.1 204 No Content

        $ curl -si http://service/bthc?action=maint
        HTTP/1.1 500 Internal Server Error
        Connection: close
        Content-Length: 5

        MAINT
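A standard-library sketch of an endpoint with the behaviour shown above: `204 No Content` when healthy, `500` with body `MAINT` in maintenance mode. It assumes `?action=maint` switches the process into maintenance mode until restart so the load balancer drains it; the slides show only the responses, not how the state is stored.

```python
from http.server import BaseHTTPRequestHandler
from urllib.parse import parse_qs, urlparse

# Hypothetical process-wide flag; the storage of maintenance state is an assumption.
STATE = {"maint": False}


def health_response(query, state):
    """Map a /bthc query string to (status, body)."""
    if parse_qs(query).get("action") == ["maint"]:
        state["maint"] = True  # assumption: stays in maint mode until restart
    if state["maint"]:
        return 500, b"MAINT"
    return 204, b""


class HealthHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # match the HTTP/1.1 status lines on the slide

    def do_GET(self):
        url = urlparse(self.path)
        if url.path != "/bthc":
            self.send_error(404)
            return
        status, body = health_response(url.query, STATE)
        self.send_response(status)
        if status == 204:
            self.end_headers()  # 204 has no body
            return
        self.send_header("Connection", "close")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To serve it, something like `HTTPServer(("", 8080), HealthHandler).serve_forever()` from `http.server` would work (the port is arbitrary). Keeping the decision logic in `health_response` makes it testable without opening a socket.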
    • Keeping an Eye on the Pulse (Use Case 2: Zero Downtime Builds): At-a-glance environment health.
    • Runtime Controls (Use Case 2: Zero Downtime Builds): Provide multiple modes of operation.
    • Connectivity (Use Case 3: Generating /etc/hosts)
    • What is Zerg? (Use Case 3: Generating /etc/hosts) [Diagram not preserved]
    • Flask and libcloud Working Together (Use Case 3: Generating /etc/hosts):
        from flask import Flask
        from libcloud.compute.providers import get_driver
        from libcloud.compute.types import Provider

        app = Flask(__name__)

        DRIVER_MAPPING = {
            "dev": {
                "office": get_driver(Provider.EUCALYPTUS)(
                    DEV_ID, secret=DEV_KEY, host="openmaster", port=8773,
                    secure=False, path="/services/Cloud")
            },
            "prod": {
                "us-east-1": get_driver(Provider.EC2)(PROD_ID, PROD_KEY, region="us-east-1"),
                "eu-west-1": get_driver(Provider.EC2)(PROD_ID, PROD_KEY, region="eu-west-1")
            }
        }

        @app.route("/hosts/<env>/<region>")
        def hosts(env, region):
            nodes = DRIVER_MAPPING[env][region].list_nodes()
            return "\n".join(node.extra["private_dns"] for node in nodes)
    • The Zerg Code (Use Case 3: Generating /etc/hosts):
        from flask import Response, render_template

        @app.route("/etchosts/<env>/<region>")
        def etchosts(env, region):
            driver = DRIVER_MAPPING[env][region]
            sorted_nodes = sorted((node.name, node.private_ips, node.public_ips)
                                  for node in driver.list_nodes())
            hosts = [{"private_ip": private_ips[0], "name": name, "public_ip": public_ips[0]}
                     for (name, private_ips, public_ips) in sorted_nodes]
            response = render_template("etc_hosts.txt", hosts=hosts)
            return Response(response, content_type="text/plain")

      Template (etc_hosts.txt):
        # The following lines are desirable for IPv6 capable hosts
        ::1 ip6-localhost ip6-loopback
        {% for host in hosts %}
        {{ "%-21s%-21s# External: %s"|format(host.private_ip, host.name, host.public_ip) }}
        {%- endfor %}
    • The Zerg HTTP Response (Use Case 3: Generating /etc/hosts):
        $ curl -s http://zerg/etchosts/prod/eu-west-1
        # The following lines are desirable for IPv6 capable hosts
        ::1 ip6-localhost ip6-loopback
        10.0.0.10 server01 # External: 123.123.123.123
        10.0.0.11 server02 # External: 123.123.123.124
        10.0.0.12 server03 # External: 123.123.123.125
        10.0.0.13 server04 # External: 123.123.123.126
        10.0.0.14 server05 # External: 123.123.123.127
        10.0.0.15 server06 # External: 123.123.123.128
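The template's formatting can be checked without Flask or a cloud account. This plain-Python sketch mirrors the Jinja template's `"%-21s%-21s# External: %s"` line format. Like the original view, it assumes every node has at least one private and one public IP, and it takes `(name, private_ips, public_ips)` tuples like those the view builds from libcloud nodes.

```python
def render_etc_hosts(nodes):
    """Render /etc/hosts text from (name, private_ips, public_ips) tuples.

    Mirrors the Jinja template: a fixed IPv6 header, then one padded line per
    node, sorted by name. Assumes each node has at least one IP of each kind.
    """
    lines = [
        "# The following lines are desirable for IPv6 capable hosts",
        "::1 ip6-localhost ip6-loopback",
    ]
    for name, private_ips, public_ips in sorted(nodes):
        # %-21s left-justifies each column to 21 characters.
        lines.append("%-21s%-21s# External: %s" % (private_ips[0], name, public_ips[0]))
    return "\n".join(lines) + "\n"


print(render_etc_hosts([
    ("server02", ["10.0.0.11"], ["123.123.123.124"]),
    ("server01", ["10.0.0.10"], ["123.123.123.123"]),
]))
```

Sorting by name keeps the output stable between requests, which matters when a client decides whether to apply the file by diffing it against the current one.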
    • The bash update_hosts.sh script (Use Case 3: Generating /etc/hosts):
        # Set variables
        read -r -d '' STATIC_HOSTS << 'static_hosts'
        # The following lines are included by default
        127.0.0.1       localhost

        # DO NOT EDIT THIS COMMENT - everything after this line is managed by zerg!
        static_hosts
        cp /etc/hosts "${TMPDIR}/old_hosts"
        grep -B 5000000 "# DO NOT" "${TMPDIR}/old_hosts" >> "${TMPDIR}/static_hosts"
        cp "${TMPDIR}/static_hosts" "${TMPDIR}/new_hosts"
        wget -qO- "http://${ZERG_IP}/etchosts/${E}/${R}" >> "${TMPDIR}/new_hosts" &&
            if [[ $(diff "${TMPDIR}/new_hosts" /etc/hosts | wc -l | awk '{print $1}') -lt 7 || "${FORCE}" == "--force" ]]; then
                cp "${TMPDIR}/new_hosts" /etc/hosts
            fi
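The script's two ideas are: keep the hand-maintained part of `/etc/hosts` above the marker comment, and only auto-apply small changes unless forced. Here is a Python sketch of that logic. It counts added and removed lines with `difflib`, so its threshold is not directly comparable to `diff | wc -l`. The `max_changes` default and the fallback header used when no marker exists are assumptions.

```python
import difflib

MARKER = "# DO NOT EDIT THIS COMMENT"
# Assumed fallback when /etc/hosts has no marker yet (mirrors the script's heredoc).
DEFAULT_STATIC = (
    "# The following lines are included by default\n"
    "127.0.0.1\tlocalhost\n"
    "\n"
    + MARKER + " - everything after this line is managed by zerg!\n"
)


def static_part(current):
    """Everything up to and including the marker line (like `grep -B 5000000`)."""
    lines = current.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if line.startswith(MARKER):
            return "".join(lines[: i + 1])
    return DEFAULT_STATIC


def changed_lines(old, new):
    """Count added/removed lines, ignoring the ---/+++ file headers."""
    diff = difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm="")
    return sum(1 for d in diff
               if d.startswith(("+", "-")) and not d.startswith(("+++", "---")))


def plan_update(current, generated, force=False, max_changes=6):
    """Return (new_contents, ok_to_apply) for a freshly generated host list."""
    new = static_part(current) + generated
    return new, force or changed_lines(current, new) <= max_changes
```

The size check is a cheap guard: if Zerg returns an empty or wildly different list (for example, during a cloud API outage), the hosts file is left alone until someone forces the update.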
    • Configuring Load Balanced Services (Use Case 4: Generating Load Balancer Configuration): Update timing is tricky to get right. Too important to leave completely autonomous.
    • Consistency > * (Use Case 4: Generating Load Balancer Configuration): Need a rock-solid foundation to deploy onto.
    • Single Puppet Master (Use Case 4: Generating Load Balancer Configuration): Set the environment per instance in /etc/puppet/puppet.conf. On the master, symlink entries in /etc/puppet/environments/ to various git checkouts of the source:
        $ cd /etc/puppet/environments
        $ ln -s ~/src/puppet/prod_stable prod_stable
        $ ln -s ~/src/puppet/dev_stable dev_stable
        $ ln -s ~/src/puppet/dev_test dev_test
      Use cron to keep all branches up to date.
    • Source Controlled Puppet Configs (Use Case 4: Generating Load Balancer Configuration): Each environment has its own branch. Make a new branch for every new feature. Merge into a test branch to test, then merge into stable.
    • The App Definitions in Zerg (Use Case 4: Load Balancer Configs):
        APP_DEFS = {
            "zerg": {"type": "http", "healthcheck": {"port": 19999, "resource": "/zerghealth"}},
            "awesome": {"type": "http", "healthcheck": {"port": 20000, "resource": "/ahc"},
                        "frontend": "10080"},
            "haproxy_awesome": {"type": "http", "healthcheck": {"port": 20001, "resource": "/"}},
            "foo": {"type": "http", "healthcheck": {"port": 20002, "resource": "/"},
                    "frontend": "10081"},
            "mashed_potatoes": {"type": "http", "healthcheck": {"port": 20003, "resource": "/"},
                                "frontend": "10082"},
            "haproxy_foo": {"type": "http", "healthcheck": {"port": 20004, "resource": "/hc"}},
            "thehardproblem": {"type": "http", "healthcheck": {"port": 20006, "resource": "/"}},
            "redis": {"type": "tcp", "healthcheck": {"port": 20007, "resource": "/rhc"}},
            "dataserver": {"type": "http", "healthcheck": {"port": 20008, "resource": "/"},
                           "frontend": "10083"},
            "itshards": {"type": "http", "healthcheck": {"port": 20009, "resource": "/"}},
            "devnull": {"type": "http", "healthcheck": {"port": 20010, "resource": "/hc"}},
        }
    • The Zerg Code (Use Case 4: Load Balancer Configs):
        @app.route("/haproxy/<env>/<region>/<type>")
        def haproxy(env, region, type):
            instances = get_region_manifest(region)
            apps = {}
            for app in APP_DEFS[env]:
                if "frontend" in APP_DEFS[env][app]:
                    app_object = {
                        "servers": [],
                        "backend_port": APP_DEFS[env][app]["healthcheck"]["port"],
                        "frontend_port": APP_DEFS[env][app]["frontend"],
                    }
                    for server in instances:
                        if app in instances[server]["roles"]:
                            app_object["servers"].append({"name": server, "details": instances[server]})
                    apps[app] = app_object
            return render_template("haproxy_%s_%s_%s.txt" % (env, region, type), vips=apps)
    • The Zerg Flask Template (Use Case 4: Load Balancer Configs):
        global
            blah
            blah

        defaults
            blah
            blah

        frontend dataserver_vip
            bind *:{{ vips.dataserver.frontend_port }}
            default_backend dataserver

        frontend mashed_potatoes_vip
            bind *:{{ vips.mashed_potatoes.frontend_port }}
            default_backend mashed_potatoes

        backend dataserver
            balance roundrobin
            {%- for server in vips.dataserver.servers %}
            server {{ server.name }} {{ server.details["private ip"] }}:{{ vips.dataserver.backend_port }} check
            {%- endfor %}

        backend mashed_potatoes
            balance roundrobin
            {%- for server in vips.mashed_potatoes.servers %}
            server {{ server.name }} {{ server.details["private ip"] }}:{{ vips.mashed_potatoes.backend_port }} check
            {%- endfor %}
    • The Zerg HTTP Response (Use Case 4: Load Balancer Configs):
        $ curl -s http://zerg/haproxy/<env>/<region>/<type>
        globals and defaults blah blah

        frontend dataserver_vip
            bind *:10083
            default_backend dataserver

        frontend mashed_potatoes_vip
            bind *:10082
            default_backend mashed_potatoes

        backend dataserver
            blah blah options
            server dataserv01 10.0.0.28:20008 check
            server dataserv02 10.0.0.29:20008 check

        backend mashed_potatoes
            blah blah options
            server taters01 10.0.0.30:20003 check
            server taters02 10.0.0.31:20003 check
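The generation step can be sketched end to end without Flask or Jinja. This example takes a trimmed copy of three entries from the slide's `APP_DEFS` and a hypothetical manifest shaped as `{server: {"private_ip": ..., "roles": [...]}}` (the real `get_region_manifest` output is not shown). It reproduces the frontend/backend stanzas of the response above, minus the elided global, default, and option lines.

```python
# Trimmed copy of three entries from the slide's APP_DEFS.
APP_DEFS = {
    "dataserver": {"type": "http", "healthcheck": {"port": 20008, "resource": "/"}, "frontend": "10083"},
    "mashed_potatoes": {"type": "http", "healthcheck": {"port": 20003, "resource": "/"}, "frontend": "10082"},
    "redis": {"type": "tcp", "healthcheck": {"port": 20007, "resource": "/rhc"}},
}


def build_vips(app_defs, manifest):
    """Collect frontend port, backend port, and servers for apps with a frontend."""
    vips = {}
    for app, spec in app_defs.items():
        if "frontend" not in spec:
            continue  # only apps with a frontend port get a VIP
        servers = sorted((name, details["private_ip"])  # hypothetical manifest keys
                         for name, details in manifest.items()
                         if app in details["roles"])
        vips[app] = {"frontend_port": spec["frontend"],
                     "backend_port": spec["healthcheck"]["port"],
                     "servers": servers}
    return vips


def render_haproxy(vips):
    """Emit HAProxy frontend and backend sections, mirroring the template."""
    out = []
    for app in sorted(vips):
        out += [f"frontend {app}_vip",
                f"    bind *:{vips[app]['frontend_port']}",
                f"    default_backend {app}", ""]
    for app in sorted(vips):
        out += [f"backend {app}", "    balance roundrobin"]
        out += [f"    server {name} {ip}:{vips[app]['backend_port']} check"
                for name, ip in vips[app]["servers"]]
        out.append("")
    return "\n".join(out)
```

Sorting apps and servers keeps the generated file deterministic, so the git diff shown in the update_haproxy.sh output reflects real membership changes rather than reordering.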
    • The Config Workflow (Use Case 4: Load Balancer Configs): [Diagram of the workflow: Zerg (generation); Git (ops); Script (human); large changes to templates (human); Git (puppet); servers]
    • The bash update_haproxy.sh script (Use Case 4: Load Balancer Configs):
        ./update_haproxy.sh <env> <region> <service>
        ** Git is clean and in sync with origin.. now waiting for zerg http response..
        [prod_stable 012345] [puppet] Haproxy Auto-Commit for <env> <region> <service>
         1 files changed, 2 insertions(+), 2 deletions(-)
        ** Template pulled and committed
        ** Here is the diff from origin to the new version:
        diff --git a/modules/haproxy/templates/haproxy_<env>_<region>_<service>_cfg.erb b/modules/haproxy/templates/haproxy_<env>_<region>_<service>_cfg.erb
        --- a/modules/haproxy/templates/haproxy_prod_us-east-1_tagserve_cfg.erb
        +++ b/modules/haproxy/templates/haproxy_prod_us-east-1_tagserve_cfg.erb
        -      server oldandslow01 10.0.0.23:20003 check
        -      server oldandslow02 10.0.0.24:20003 check
        +      server taters01 10.0.0.30:20003 check
        +      server taters02 10.0.0.31:20003 check
        ** Do you want to push this change? (y/n) y
        blah blah successful git push message
        ** Commit successfully pushed to origin
        ** All done!
    • What's really going on? (Use Case 5: Dashboards & Alerting): Alerting, monitoring, and visualization.
    • What to monitor? (Use Case 5: Dashboards & Alerting): Identify metrics that act as signals. Add alerts after every incident.
    • Metric Polling at BrightTag (Use Case 5: Dashboards & Alerting): [Diagram: mpoller processes poll datahub, redis, tagserve, cassandra, and haproxy, and feed carbon and graphite; the layout is duplicated for two stacks]
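The mpoller code is not shown. As a hedged illustration of the last hop, here is how a poller could hand values to Carbon using its plaintext protocol: one `<metric.path> <value> <timestamp>` line per datapoint, sent over TCP to Carbon's default plaintext port 2003. The metric names and the `prefix` scheme are hypothetical.

```python
import socket
import time


def carbon_lines(prefix, metrics, timestamp=None):
    """Format metrics in Carbon's plaintext protocol: '<path> <value> <timestamp>\\n'."""
    ts = int(time.time() if timestamp is None else timestamp)
    return "".join(f"{prefix}.{name} {value} {ts}\n"
                   for name, value in sorted(metrics.items()))


def send_to_carbon(payload, host, port=2003):
    """Open a TCP connection to Carbon's plaintext listener and send the lines."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload.encode("ascii"))


payload = carbon_lines("us-east-1.haproxy01", {"req_rate": 12, "errors": 0}, 1700000000)
# send_to_carbon(payload, "carbon.example")  # hypothetical host; not called here
```

Putting the region first in the metric path is one way to support the per-region aggregation and alerting described in the following slides.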
    • Graphite (Use Case 5: Dashboards & Alerting): Storage of historical metrics allows for trending and comparisons. Aggregation is performed on data retrieval via the webapp.
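"Aggregation on retrieval" means the combining happens in the query, not at write time. This sketch builds a Graphite render API URL that sums matching series when the data is fetched. It uses the render API's `target`, `from`, and `format` parameters and the `sumSeries` function. The host name and metric path are hypothetical.

```python
from urllib.parse import urlencode


def render_url(base, targets, since="-1h", fmt="json"):
    """Build a Graphite /render URL; functions like sumSeries run at query time."""
    params = [("target", t) for t in targets] + [("from", since), ("format", fmt)]
    return f"{base.rstrip('/')}/render?{urlencode(params)}"


# Hypothetical metric path: total requests across all tagserve hosts in one region.
url = render_url("http://graphite", ["sumSeries(us-east-1.tagserve*.requests)"])
```

Because raw per-host series are stored and combined only when queried, the same data can later be re-sliced (per host, per region, or in total) without re-ingesting anything.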
    • Branches and Leaves (Use Case 5: Dashboards & Alerting): Expose a "metrics" service per region. This enables a flexible topology.
    • Metric Aggregation at BrightTag (Use Case 5: Dashboards & Alerting): [Diagram: datahub, redis, tagserve, cassandra, and haproxy feed per-region metrics services, which in turn feed a dashboard]
    • Realtime Numbers Across Regions (Use Case 5: Dashboards & Alerting): Requests are farmed out to each region's metrics service.
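Farming a request out to every region's metrics service is a scatter-gather. The following is a minimal sketch with `concurrent.futures`. `fetch(region)` is a hypothetical callable standing in for an HTTP call to that region's metrics service, and it is expected to enforce its own network timeout. A region that fails is reported as `None` so that one unreachable region does not blank the whole dashboard.

```python
from concurrent.futures import ThreadPoolExecutor


def fan_out(regions, fetch):
    """Query every region in parallel; return per-region values and their total.

    fetch(region) is a hypothetical callable (e.g. an HTTP GET with a timeout).
    Failed regions map to None and are left out of the total.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(regions))) as pool:
        futures = {region: pool.submit(fetch, region) for region in regions}
    results = {}
    for region, future in futures.items():
        try:
            results[region] = future.result()
        except Exception:
            results[region] = None  # keep the dashboard up if one region fails
    total = sum(v for v in results.values() if v is not None)
    return results, total
```

This follows the deck's "design for failure" advice: the dashboard degrades to partial numbers instead of failing outright.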
    • Visualization (Use Case 5: Dashboards & Alerting): Different visualizations tell you different things.
    • Alerting (Use Case 5: Dashboards & Alerting): Tattle allows us to alert on any metric in Graphite. Alerting is done per region.
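The core of threshold alerting on Graphite data can be sketched in a few lines. This is not Tattle's implementation. It is a toy evaluator over Graphite's JSON render format (a list of `{"target": ..., "datapoints": [[value, timestamp], ...]}`), applied separately to each region's results. Using the latest non-null value and treating higher values as worse are both assumptions.

```python
def evaluate(series, warn, crit):
    """Classify one Graphite JSON series by its latest non-null datapoint."""
    values = [value for value, _ts in series["datapoints"] if value is not None]
    if not values:
        return "UNKNOWN"  # no data is itself worth surfacing
    latest = values[-1]
    if latest >= crit:
        return "CRITICAL"
    if latest >= warn:
        return "WARNING"
    return "OK"


def evaluate_regions(render_json_by_region, warn, crit):
    """Apply the same thresholds independently to each region's series."""
    return {region: {s["target"]: evaluate(s, warn, crit) for s in series_list}
            for region, series_list in render_json_by_region.items()}


data = {
    "us-east-1": [{"target": "errors", "datapoints": [[1, 100], [None, 160]]}],
    "eu-west-1": [{"target": "errors", "datapoints": [[7, 100], [12, 160]]}],
}
status = evaluate_regions(data, warn=5, crit=10)
```

Evaluating each region independently means an incident in one region raises its own alert instead of being averaged away by healthy regions.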
    • Fabric vs Puppet (Deployment): Fabric is push; Puppet is pull. Businesses don't move as fast as infrastructure changes, but configs have to stay up to date all the time. [Diagram: a spectrum from Puppet to Fabric: /etc/hosts (real-time up-to-date); systempoller.py and mashed_potatoes.env (moderately up-to-date); dataserver.war (weekly)]
    • Virtual Machines (Designing for the Cloud): You have to go with what the cloud provider offers. Not always ideal for every workload.
    • There are no silver bullets (but if you find one, let us know)
    • Questions?