Let's build a PaaS platform, how hard could it be?

Presentation given by Błażej Kasperczyk at Pykonik meetup in Kraków.

How many applications, and where do we put them? Why is our system so bad at keeping up with what the users want? What to do in case of a noisy neighbour?

When you aim to provide a platform where developers can launch an application easily, without worrying about configuring the system, someone will have to write that platform sooner or later. As with most seemingly simple concepts, it presents a plethora of challenges.

Published in: Software

  1. Hey, let's build a PaaS Cloud! ...it's easy, right? Błażej Kasperczyk, Kraków, 05.10.2017
  2. Team PaaS... • DevOps team • Develop and maintain the Platform • Backend-oriented • Python 3.x, Tornado • "With the friends you have in your team, you don't really need enemies!"
  3. ...And our little cloud – and what runs on it • Approx. 2300 VMs of varying sizes • 1400 active applications, 600 of them in Python 3.x • 9000+ running instances • ...a third of it is Python, Tornado-based applications
  4. The PaaS layer • Push-button deployment • Scaling by available resources and the number of applications • Quick application installation with our build system • Communications bus between applications
  5. The slow start • Work started in 2011 • Python 2.7 + gevent • Works over SSH • Push model • Hard limit: 10 applications on each VM • ...and it works!
  6. The inevitable • Approx. 150 VMs max • 300 VMs becomes a hard limit that cannot be bypassed • A single point of failure. While the panel was primitive, "papyrus" was a top trending colour!
  7. A moment of reinvention: What if we use our cloud to scale our cloud? • In place of the old orchestrator – a table of states and a coordinator • An API that exposes what needs to be done to reach the desired state • A daemon running on the VM handles the rest • It's 2013 - let's be modern, let's do it in Python 3!
  8. Scoreboard • Coordinates cloud management • PostgreSQL backend • Responsible for provisioning • Supports over 2000 machines... • ...each querying multiple times every minute... • ...currently. • It can rebuild itself in case of a database failure
  9. Agent daemon • Runs on the VM it manages • Automatically launched with each new VM • Launches and maintains applications • Reports statistics for monitoring purposes • Allows the developer to remotely shut the application down
  10. Density problems • Over-taxing VMs causes performance issues • As it is, the allocation is hit and miss.
  11. Weight balancing • Each VM has a capacity limit • Each application declares its size: Light (White/Green), Medium (Yellow), Heavy (Red) • ...that should do it, right?
  12. Oversized cats • A worker can spike to 100% CPU usage while averaging only 10%. • An application can declare high usage but be harmless.
  13. The R&D • Docker? • LXC? • ...CGroups?
  14. Docker • Requires a major overhaul of our application building and deployment... • ...and would only give us what we already have.
  15. LXC • The current architecture requires that there be no network translation between the Agent and the Application... • ...and that caused issues when launching applications
  16. CGroups! • The same mechanism that is used by most containers • Automatic cleanup • Simplicity of the solution
  17. Everything is now in a box... • Applications in the cloud no longer exceed their assigned resources • CPU is limited for each instance • The OOM killer kicks in for memory-heavy applications that try to exceed their limits
  18. ...time to relax, right? • Time does not stop, or that time we went Xenial and got eaten by systemd • The sword of Damocles called "Impending Knapsack Problem" • Autoscaling • ...and a few other things. As a side effect, we actually made a sane frontend.
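
The weight-balancing idea from the slides (each VM has a capacity limit, each application declares a size) is a bin-packing problem, which is also why the knapsack problem looms at the end of the talk. Below is a minimal sketch of one common heuristic, first-fit decreasing. It is not the scheduler described in the presentation: the numeric weights for the size classes, the capacity units, and the names `VM`, `SIZE_WEIGHT` and `place` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical weights: the slides name the size classes but not their values.
SIZE_WEIGHT = {"light": 1, "medium": 2, "heavy": 4}


@dataclass
class VM:
    name: str
    capacity: int  # hypothetical capacity units, not a figure from the talk
    apps: list = field(default_factory=list)
    used: int = 0

    def fits(self, weight: int) -> bool:
        """Return True if an application of this weight still fits on the VM."""
        return self.used + weight <= self.capacity


def place(apps, vms):
    """First-fit decreasing placement.

    apps: iterable of (app_name, size_class) pairs.
    vms:  list of VM objects, mutated in place.
    Returns the names of applications that did not fit anywhere.
    """
    unplaced = []
    # Heaviest applications first; sorted() is stable, so ties keep input order.
    ordered = sorted(apps, key=lambda a: SIZE_WEIGHT[a[1]], reverse=True)
    for app_name, size in ordered:
        weight = SIZE_WEIGHT[size]
        # Pick the first VM that still has room for this application.
        target = next((vm for vm in vms if vm.fits(weight)), None)
        if target is None:
            unplaced.append(app_name)
            continue
        target.apps.append(app_name)
        target.used += weight
    return unplaced
```

A short usage example under the same assumptions:

```python
vms = [VM("vm-a", 4), VM("vm-b", 4)]
left_over = place(
    [("web", "light"), ("worker", "heavy"), ("api", "medium"), ("cron", "medium")],
    vms,
)
print(left_over)    # applications that need a new VM
print(vms[0].apps, vms[1].apps)
```

As the "Oversized cats" slide points out, declared sizes can be far from real usage, so a placement like this only bounds what applications claim, not what they consume; enforcing actual limits is what the cgroups-based approach addresses.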
