A closer look to Locaweb IaaS

Presentation on Locaweb's IaaS at RubyConf Brazil 2012

Published in: Technology

  1. A closer look to Locaweb IaaS
      Gleicon Moraes, Engineering Manager, PaaS/IaaS
      @gleicon - http://blog.7co.cc
  11. Agenda
      • Engineering Team
      • IaaS
      • Virtual/Physical servers
      • Architecture
      • OSS
      • Provisioning
      • CMDB/Closed Loop
      • Resource usage gathering
      • Software defined networks
  12. Engineering Team
      • We aim to be efficient
      • DC and IaaS automation
      • IaaS and PaaS products
      • Email and domain registration products
      • Coffee/psychological help/counseling
      • 40-person team (devs, architects, 1 master devops)
  13. IaaS - NIST definition
      “The capability provided to the consumer is to provision processing, storage, networks, and other fundamental computing resources where the consumer is able to deploy and run arbitrary software, which can include operating systems and applications. The consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, and deployed applications; and possibly limited control of select networking components (e.g., host firewalls).”
      * http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf
  14. IaaS - Wikipedia
      “In this most basic cloud service model, cloud providers offer computers, as physical or more often as virtual machines, and other resources. The virtual machines are run as guests by a hypervisor, such as Xen or KVM. Management of pools of hypervisors by the cloud operational support system leads to the ability to scale to support a large number of virtual machines. Other resources in IaaS clouds include images in a virtual machine image library, raw (block) and file-based storage, firewalls, load balancers, IP addresses, virtual local area networks (VLANs), and software bundles. IaaS cloud providers supply these resources on demand from their large pools installed in data centers. For wide area connectivity, the Internet can be used or, in carrier clouds, dedicated virtual private networks can be configured.”
      * http://en.wikipedia.org/wiki/Infrastructure_as_a_service#Service_models
  15. IaaS - tl;dr
      “Automate infrastructure so that the customer does not need to know the underlying details, does not manage them, and can provision services automagically.”
  16. IaaS - building blocks
      • Servers: virtual and physical
      • Storage area
      • Network devices: firewalls, switches, load balancers
  17. IaaS - High Level
      • Automation
      • Resource Management
      • Install, Uninstall, Migrate
      • High Availability, Scalability, Capacity Planning
  18. IaaS at Locaweb
      • 3 DCs, 6k physical servers, 1k storage units with 6 PB of storage area, 12k network devices/ports, > 100 km of cables
      • 10k VMs, 3.2M email accounts, 250k hosting customers, ~500k sites, ~600k databases
      • 130 people in the day-to-day 24/7 Operations team (from DC basics to managing apps and platforms), < 40 sysadmins
      • Currently ~18 people from the Engineering team taking care of IaaS
  20. Virtual and Physical
      • Single tenant per Physical Server
      • Single tenant per VM
      • Multiple tenants per VM
      • Multiple tenants per Physical Server
      • Multiple VMs per Physical Server = Cloud
  21. Cloud
      • Refer back to the NIST definition
      • Hypervisor + set of servers + set of storages + network = time sharing
      • Capacity planning differs from that of physical servers
      • Flexible configuration options
      • Vertical scaling
      • Horizontal scaling
  22. Architecture - Cloud
      [Diagram; labels: Internet, Main Network, Network Gear, Physical Servers, Firewall, hypervisor, ovs, Simplestack, SimpleNet/Quantum]
  23. Architecture - Physical
      Why not?
      [Diagram; labels: Internet, Main Network, Network Gear, Physical Servers, Firewall, Simplestack, SimpleNet/Quantum]
  29. OSS
      • Ruby, Rails, Python, CFEngine, PostgreSQL, MySQL, Cassandra, Redis, Xen, KVM, Haskell, Cyclone.io, bottle.py, Quantum, R, ejabberd, Resque, lots of gems and eggs
      • Up-to-date technology
      • No lock-ins
      • Vendor neutral
      • We contribute back
  30. Our projects: http://locaweb.github.com
  39. Our projects
      • Leela - Data collection monster
      • SimpleStack - Provisioning made easy
      • SimpleNet - OVS and FW controller
      • NET/L2 - Controller/Inventory for network equipment
      • BrickLayer - Packaging for normal people
      • Logix - Graylog2 message bus for log streams
      • xenapi-ruby - Xen API bindings for Ruby
      • otto, debundler, bpmachine and more each week
  44. Our Contributions
      • Contributed to Quantum, from OpenStack
      • Snorby/Snort contributions
      • ModSecurity for Nginx, and help with IIS
      • Hired consulting from the grsecurity and Dovecot teams: we support OSS companies
  49. Bricklayer
      • First open source project from Locaweb
      • Package builder (deb + rpm) straight from git
      • 150+ projects, 500+ builds/day
      • Tag your project, and the packages get built and published to the repositories
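
The "tag your project, get the packages" flow can be sketched as below. This is only an illustration, not Bricklayer's actual code: the tag pattern, the function names and the job dictionary layout are all assumptions made for the example.

```python
import re

# Hypothetical tag convention ("v1.2.3" or "1.2.3"); Bricklayer's real rules may differ.
TAG_RE = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def parse_version(tag):
    """Return the tag as an (int, int, int) tuple, or None if it is not a release tag."""
    match = TAG_RE.match(tag)
    return tuple(int(part) for part in match.groups()) if match else None


def tags_to_build(repo_tags, built_tags):
    """Return release tags that have not been built yet, oldest version first."""
    pending = [
        tag for tag in repo_tags
        if tag not in built_tags and parse_version(tag) is not None
    ]
    return sorted(pending, key=parse_version)


def build_jobs(project, repo_tags, built_tags, formats=("deb", "rpm")):
    """Expand every pending tag into one build job per package format."""
    return [
        {"project": project, "tag": tag, "format": fmt}
        for tag in tags_to_build(repo_tags, built_tags)
        for fmt in formats
    ]
```

A scheduler would call `build_jobs` after each fetch of the repository and hand the resulting jobs to the builders; tags that do not look like releases are simply ignored.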
  54. Logix
      • We have lots of logs. Everything broke.
      • 26,753,205,474 log lines/day
      • Highly distributed: local syslog daemon to RabbitMQ
      • Elasticsearch + Graylog2 to store, search and filter
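
A minimal sketch of the first hop (local syslog line to message bus), assuming classic `<PRI>` prefixed syslog lines. It is not Logix's code: the `Forwarder` class and the JSON layout are hypothetical, and a standard-library queue stands in for RabbitMQ so the example runs on its own.

```python
import json
import queue
import re

# "<PRI>message": PRI encodes facility * 8 + severity in classic syslog.
SYSLOG_RE = re.compile(r"^<(\d{1,3})>(.*)$", re.DOTALL)


def parse_syslog(line, host):
    """Split a raw syslog line into facility, severity and message."""
    match = SYSLOG_RE.match(line)
    if not match:
        # Keep unparseable lines instead of dropping them.
        return {"host": host, "facility": None, "severity": None, "message": line}
    pri = int(match.group(1))
    return {
        "host": host,
        "facility": pri // 8,
        "severity": pri % 8,
        "message": match.group(2),
    }


class Forwarder:
    """Hypothetical forwarder: serializes each line and puts it on the bus."""

    def __init__(self, bus):
        self.bus = bus

    def forward(self, line, host):
        self.bus.put(json.dumps(parse_syslog(line, host), sort_keys=True))


bus = queue.Queue()  # stand-in for a RabbitMQ exchange (assumption)
Forwarder(bus).forward("<34>su: auth failure", "web01")
```

Consumers on the other side of the bus would index the JSON documents for search and filtering.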
  55. Provisioning
      • Ruby: Panel, Control panel, Scheduler
      • Python: Provisioning, Server management, Metric collection
      • REST APIs to Hypervisor, Network, Firewall, XMPP
  56. Provisioning - Cloud
      [Diagram; labels: Internet, Cloud Control Panel, API, Sales, Main Network, Network Gear, Physical Servers, Firewall, hypervisor, ovs, Provisioner, Simplestack, SimpleNet/Quantum]
  57. Provisioning - Servers
      [Diagram; labels: Dedicated Servers Control Panel, API, Closed Loop, Racked Servers, Sales, Internet, Main Network, Network Gear, Physical Servers, Firewall, hypervisor, ovs, Simplestack, SimpleNet/Quantum]
  58. Provisioning - Managed Servers
      [Diagram; labels: Managed Servers Control Panel, Dedicated Servers Control Panel, Cloud Control Panel, PaaS, API, Sales, Closed Loop, Provisioner, Racked Servers, Internet, Main Network, Network Gear, Physical Servers, Firewall, hypervisor, ovs, Simplestack, SimpleNet/Quantum]
  59. Cloud provisioner
      [Diagram; label text as extracted: Jobs DHCP CMDB FW API quantum/ Control Panel core Resque simplenet Sales simplestack Notifications Leela console]
  60. Closed loop
      The closed loop process
      [Diagram; labels: CMDB, API, Conductor, Network, Futurama, Cobbler, Hardware]
  66. Closed loop
      • All servers get racked, wired, tested and configured
      • Power management discovery
      • Network configuration
      • OS install: Windows, Linux and OpenSolaris aware
      • Server life cycle: once deactivated, a server goes back to the pool to be used again
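
The server life cycle described above can be read as a small state machine. The sketch below is one possible reading: the state names and allowed transitions are assumptions chosen to match the bullets, not the actual closed loop implementation.

```python
from enum import Enum


class State(Enum):
    # Hypothetical state names derived from the slide's bullets.
    RACKED = "racked"
    TESTED = "tested"
    POOL = "pool"
    INSTALLING = "installing"
    ACTIVE = "active"
    DEACTIVATED = "deactivated"


# Allowed transitions; DEACTIVATED -> POOL is what closes the loop.
TRANSITIONS = {
    State.RACKED: {State.TESTED},
    State.TESTED: {State.POOL},
    State.POOL: {State.INSTALLING},
    State.INSTALLING: {State.ACTIVE},
    State.ACTIVE: {State.DEACTIVATED},
    State.DEACTIVATED: {State.POOL},
}


class Server:
    def __init__(self, name):
        self.name = name
        self.state = State.RACKED
        self.history = [State.RACKED]

    def move_to(self, new_state):
        """Apply a transition, rejecting anything the table does not allow."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(
                f"{self.name}: {self.state.value} -> {new_state.value} not allowed"
            )
        self.state = new_state
        self.history.append(new_state)
```

Keeping the transitions in one table makes it easy for a CMDB to audit where each machine is and to refuse out-of-order steps, such as installing an OS on a server that was never tested.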
  67. CMDB
      [Diagram; label text as extracted: Futurama Power audit Ops NET/L2 Frontend Controllers Database API Product Resque provisioners IP provisioning Server provisioning IT chg SAP management]
  68. Futurama
      [Diagram; label text as extracted: Management CFEngine Server side Planet Express Leela- CF-Agent Server Conductor- CMDB Leela-agent bkp-agent audit Cegonha Asdrubal CFTools]
  69. Resource Metering and Monitoring - Leela
      [Diagram; label text as extracted: Cassandra Leela- Cassandra Cassandra Lasergun Leela-agent Leela- Reader Cassandra Cassandra Cassandra API]
  79. Resource Metering and Monitoring - Leela
      • 18k writes/sec
      • 6 TB total per cluster
      • 13 baseline metrics + 68 distinct metrics
      • ~600 GB/month
      • 1M keys (~5k servers)
      • Write latency: 15 µs
      • Read latency: 1 s to read 1 month's worth of data
      • Down to minute resolution
      • http://leela.readthedocs.org/en/latest/intro/archnut.html
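
A quick back-of-the-envelope check of how these figures relate to each other. The inputs are the numbers from the slide; the interpretations are assumptions (one sample per key per minute, decimal units, constant growth, replication and compaction ignored), so treat the results as rough orders of magnitude only.

```python
WRITES_PER_SEC = 18_000      # from the slide
KEYS = 1_000_000             # from the slide
SERVERS = 5_000              # "~5k servers" on the slide
CLUSTER_GB = 6_000           # 6 TB, assuming 1 TB = 1000 GB
GROWTH_GB_PER_MONTH = 600    # "~600GB/mo" on the slide
SECONDS_PER_DAY = 86_400

# Total writes handled in one day at the quoted rate.
writes_per_day = WRITES_PER_SEC * SECONDS_PER_DAY

# Average number of metric keys tracked per server.
keys_per_server = KEYS // SERVERS

# Assumption: every key receives one sample per minute.
writes_per_sec_at_minute_resolution = KEYS / 60

# Assumption: constant growth, no replication overhead.
months_to_fill_cluster = CLUSTER_GB / GROWTH_GB_PER_MONTH
```

Under these assumptions one sample per key per minute works out to roughly 16.7k writes/sec, which is in the same range as the quoted 18k writes/sec.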
  84. Resource Metering and Monitoring - Leela
      • Map/Reduce with an SQL-like interface:
        - SELECT mov_avg_samples = 7 (function)
        - FROM cpro9559.cpu.cpu8.idle (metric)
        - WHERE timestamp >= 1346279003 (timeframe)
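
What the query above asks for, a 7-sample moving average over one metric from a given timestamp onwards, can be sketched in a few lines. This is not Leela's implementation; `moving_average` and `query` are hypothetical helpers, and the series is assumed to be a time-ordered list of `(timestamp, value)` pairs.

```python
from collections import deque


def moving_average(samples, window=7):
    """Yield (timestamp, mean of the last `window` values) once the window is full."""
    buf = deque(maxlen=window)
    total = 0.0
    for ts, value in samples:
        if len(buf) == window:
            total -= buf[0]  # value about to be evicted by append()
        buf.append(value)
        total += value
        if len(buf) == window:
            yield ts, total / window


def query(series, since, window=7):
    """SELECT moving average FROM series WHERE timestamp >= since."""
    selected = ((ts, value) for ts, value in series if ts >= since)
    return list(moving_average(selected, window))
```

For example, `query(series, 1346279003)` returns one averaged point per input sample, starting with the seventh sample at or after that timestamp.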
  88. Resource Metering and Monitoring - Leela
      • Create charts
        - var widget = LEELA.widget(jQuery("#target"));
        - jQuery.ajax("/v1/pastweek/cpro9559.cpu.cpu8.idle", {dataType: "jsonp", success: widget.render});
  92. Software defined network
      • Traditional equipment: local config and controller
      • SDN: flows (commands), OpenFlow 1.0, central controller, distributed data plane
      • Abstraction over VLANs with ACLs, tunnels or even VLAN QinQ
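
The split between a central controller and a distributed data plane can be illustrated with a toy flow table. This is a simplified model, not OpenFlow itself and not SimpleNet: the `Flow`, `Switch` and `Controller` classes, the match field names and the action strings are all made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Flow:
    match: dict        # header fields that must all be equal, e.g. {"vlan": 100}
    action: str        # what the switch does with a matching packet
    priority: int = 0  # higher priority flows are checked first


@dataclass
class Switch:
    name: str
    flows: list = field(default_factory=list)

    def install(self, flow):
        """Add a flow and keep the table ordered by descending priority."""
        self.flows.append(flow)
        self.flows.sort(key=lambda f: -f.priority)

    def handle(self, packet):
        """Data plane: apply the first matching flow, locally and without the controller."""
        for flow in self.flows:
            if all(packet.get(k) == v for k, v in flow.match.items()):
                return flow.action
        return "send_to_controller"  # table miss: ask the controller what to do


class Controller:
    """Control plane: one place that pushes flows to every switch."""

    def __init__(self, switches):
        self.switches = switches

    def push(self, flow):
        for switch in self.switches:
            switch.install(flow)
```

The point of the design is that policy lives in one place (the controller), while each switch only needs to look packets up in its own table.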
  93. Software defined network
      [Diagram; labels: Switch Vendor A and Switch Vendor B, each with Data path (hardware), Control path and Openflow; API; Controller; OpenVSwitch]
  94. Software defined network
      [Diagram; labels: Cisco, Force 10, HP, OpenVSwitch, Firewalls, API, Net/L2, Quantum, CMDB]
  95. Ruby @ Locaweb
      Not only for front-end
  96. ?
  97. Thanks!
