Hans Kristian Flaatten
• Support for Node.js and other projects.
• Which requires:
• Wide platform coverage
• High availability of build farms.
• Automation and documentation to reduce bus factor
• We have no 24/7 on-call staff!
• Problem: How do we give people the confidence to fix
machines, in architectures they’re unfamiliar with?
• Solution: One-click “destroy and reprovision machine”
• Currently acknowledged on the Build WG README.
• Want to do something like
• Only basic html knowledge required! nodejs/nodejs.org#1257
• Get involved!
My name is Gibson, and today I’m going to talk about the Node Build Working Group. Where we are today, where we’re trying to get to, and how you can help.
If you have any questions, criticisms, can’t understand my accent, or spot any spelling mistakes during this talk, feel free to put up a hand at any point.
Apologies in advance for my voice, Stephen and Myles had some banging beats last night.
Let’s start with my favourite part of the talk, talking about me.
My name is Gibson Fahnestock. I work for IBM in the Runtimes Node team. We do a lot of work in the community, and we also ship our own build of Node.js. We recently released a build of Node that runs on z/OS, and we’re working on getting that upstreamed into the community. So if any of you happen to have a mainframe at home, feel free to try running Node on it!
On the community side I’m a core collaborator, and I’m also involved in several of the working groups. If you’re not familiar with Node.js Working Groups, they’re basically specialised task forces that focus on key areas.
I’m in Build, Release, Moderation, and CitGM. If you want to learn more about any of these, or anything else, feel free to find me online, or even in real life.
The talk is split into three easy-to-digest sections, and I’ve put little numbers at the top, in case you’re counting down the minutes.
The first is The Road so Far: how we currently operate and what we can do today.
The second is the Road Ahead, what we’re progressing towards, and all the stuff we want to do.
The last is how you can get involved. The world of devops can seem forbidding, but it’s actually a really great way to get involved with Open Source, and it’s easier than you think.
Unless you already think it’s easy, in which case … (feel free)
(in which case) … feel free to use this angry tweet template. I’ve actually included sample tweets, that could also be used as comments on the Hacker News, for easy flaming.
So this rogues gallery is the current membership of the Build WG.
There are two different groups of people in the team, the first … (is the oldtimers)
(the first) … is the oldtimers, these people have been in the build working group for aeons, they built the Node infrastructure up with their bare hands, and they are comfortable getting into the bowels of a machine and digging around to fix problems manually.
The second is the newer members. Everyone here joined the team in the last year. I’ve barely worked out where half the machines are so far.
One of the key changes we want to make is to reduce the barrier to entry, and make it easier for people to get involved.
The mission of the WG is to give the rest of the Node foundation everything they need to make sure Node runs everywhere. Kinda like Q branch in a Bond movie, we give you the gadgets, and you go save the world.
We provide the infra which allows Node core, and other top level projects like libuv, node-gyp, and llnode, to compile, run test suites and get benchmark results on a bunch of different platforms.
We also host nodejs.org, which contains some great node binaries, and there are also some docs or something.
There are 570 people in the node foundation, and over 100 collaborators. When you have this many users, high availability is pretty important. Giving a wider group the power to fix issues is key to maintaining an open source build farm, and part of this talk is about how you can manage that.
These are the amazing people who provide infrastructure for the foundation. Let’s just take a moment to thank everyone who helps us make sure Node runs everywhere.
We also have a bunch of Raspberry Pis that were donated by generous community members. It’s great to know that if you’re willing to donate, you too could have a Pi in Rod’s basement
And there they are, one day one of these could have your name on it
Our sponsors allow us to rack up some pretty impressive stats: we currently have over 165 build machines! We also cover a bunch of different platforms. Every PR is tested against almost 45 platforms!
Of course that sounds cooler when at least one of the builds is actually green.
All a new collaborator needs to know about running CI is that you go to a URL and fill in a form. However they can also dive into the individual jobs for more control.
There are three things we rely on for high availability. Jenkins
You know what, there’s actually a problem with this slide. Jenkins is just not inspiring enough.
Okay, that’s scary
Cute, but not crazy enough
Okay, getting good
There we are, that’s what Jenkins should look like.
So we use Jenkins for job management, Cloud provisioning for super-fast machine creation,
and Ansible for machine configuration.
The first part is Jenkins. I’m just going to leave that guy there. If you maintain an open-source project you probably use Travis, and Travis is great.
But when you need more manual control, and when you want to support a wider range of machines, sometimes you need the raw power and Java heap space errors that only Jenkins can offer.
We have two Jenkins instances, a public one anyone can go see, and a private one where we do all the top-secret stuff I’m not allowed to talk about.
Oh, and naturally the most important feature is the UI, so no expense was spared on the theme.
Keeping lists of people co-ordinated is a pain, so we just use the GitHub teams we already have to give each Node group access to their own jobs.
When you’re running build and test 45 times for each PR, and you’re getting 30-50 builds a day, that’s around 2,000 runs a day, so it’s pretty important to be quick.
If a machine goes down and we don’t have a spare, everything stops, leading to complaints.
One thing that really makes a difference is to cache everything you can. Cache your downloads and git clones, and use this great tool called ccache to cache compilation results, so if you compile something that is 99% the same, you only have to recompile what you changed.
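To make the caching idea concrete, here’s a minimal sketch of a content-addressed compile cache, loosely in the spirit of ccache. It is not how ccache is implemented: the `CompileCache` class and the `compiler` callback are hypothetical names for illustration, and the real tool handles far more cases. The point is just that if the inputs hash to the same key, you can reuse the old result instead of compiling again.

```python
import hashlib


class CompileCache:
    """Toy compile cache: results are keyed by a hash of source + flags."""

    def __init__(self):
        self._store = {}  # key (hex digest) -> cached compile result
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(source, flags):
        # Hash the flags and the source together, so a change to either
        # one produces a different key and forces a real compile.
        digest = hashlib.sha256()
        digest.update("\0".join(flags).encode())
        digest.update(b"\0")
        digest.update(source.encode())
        return digest.hexdigest()

    def compile(self, source, flags, compiler):
        """Return the cached result, or call `compiler` and remember it."""
        key = self._key(source, flags)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = compiler(source, flags)  # the slow step we want to avoid
        self._store[key] = result
        return result
```

In a build with hundreds of source files, only the files whose contents or flags changed would miss the cache, which is why a nearly identical rebuild can finish so much faster.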
The other thing we do … (is fanning).
(The other thing we do) … is fanning, cause computers get hot too.
Okay, mandatory GIF out the way, it is pronounced JIF by the way, I’m happy to debate that with anyone afterwards.
So fanning is when you split out a build and test to run on multiple machines in parallel. So yes, your CI runs may finish really quickly on your mainframes, but on your 1st generation Raspberry Pis they can take a bit longer.
But hey, Pis are cheap, and Rod’s basement is large, so we can have plenty of them.
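Fanning can be sketched in a few lines. This is an illustration under stated assumptions, not the Build WG’s actual Jenkins setup: `shard`, `run_shard` and `fan_out` are hypothetical names, and threads stand in for the separate machines that would each run their share of the tests.

```python
from concurrent.futures import ThreadPoolExecutor


def shard(tests, workers):
    """Deal tests out round-robin so each worker gets a similar share."""
    return [tests[i::workers] for i in range(workers)]


def run_shard(shard_tests):
    # Placeholder: a real worker would run these tests on its own machine.
    # Here every test simply "passes" so the sketch stays self-contained.
    return {name: "pass" for name in shard_tests}


def fan_out(tests, workers):
    """Run all shards in parallel and merge the per-shard results."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(run_shard, shard(tests, workers)):
            results.update(partial)
    return results
```

With this shape, total wall-clock time is roughly the time of the slowest shard rather than the sum of all tests, which is what makes slow hardware workable if you have enough of it.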
We also want to make the onboarding easier for new Build team members (beards are not required by the way).
And one of the key ways we do this is with Infrastructure as Code. If you haven’t used Ansible before, it’s a way of automating machine setup and configuration. Basically it’s like the set of bash scripts you have to set up your machines, but much much more complicated and full-featured.
The other great thing about this is that anyone can set up their own machines to build and run Node with the same scripts. I mentioned that we do our own builds of Node at IBM, well we’re working on making our machine configuration use the community one.
And now we come to the second part of our talk, the quest.
We want to reduce the barrier to entry, and increase the pool of people who can fix Node infra issues.
So, this is the job that builds Node releases. I'll give you time to read through this. Editing this file is a bit like coding in the dark, you’re wandering around a job that goes on … (and on)
(goes on) … and on.
Another problem with Jenkins was that you had to enter the configuration information in the job itself. Everything is stored in a giant XML file, which is pretty hard to read, and pretty much impossible to edit.
As a general rule, if it’s not code stored in a Git repo with Pull Requests, it’s invisible (and it rots).
Fortunately the folks at Jenkins have been working hard on a solution.
It’s called pipelines. A pipeline allows you to put all the configuration in a Jenkinsfile stored in Git, basically like a travis.yml on steroids.
It also allows us to open up our jobs for anyone to contribute to.
Visibility is important, take this code for example. Can you spot the issue with it?
I’ll give you a hint, it works fine now, but it’s going to start to cause problems around April next year.
The issue is that the NODE_VERSION code just takes the first character from the Node version, so Node 10.0.0 will become Node 1.x
The point of this isn’t to shame the person who wrote it, the point is that with enough eyeballs, all bugs are shallow.
Also there’s no git blame, so I can’t find out who wrote it to shame them.
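The slide itself isn’t reproduced in these notes, so here is a minimal reconstruction of that kind of bug. It is an assumption about the shape of the problem, not the real job’s code: both function names are hypothetical. The buggy version keeps only the first character of the version string, and the fixed version keeps everything before the first dot.

```python
def major_line_buggy(version):
    """Mimic the bug: keep only the first character of the version."""
    return version.lstrip("v")[0] + ".x"


def major_line_fixed(version):
    """Keep everything before the first dot, so multi-digit majors survive."""
    major = version.lstrip("v").split(".", 1)[0]
    return major + ".x"


print(major_line_buggy("8.9.0"))   # 8.x  (looks fine for single-digit majors)
print(major_line_buggy("10.0.0"))  # 1.x  (the problem described above)
print(major_line_fixed("10.0.0"))  # 10.x
```

The buggy version passes every check right up until the major version gains a second digit, which is exactly why it is hard to spot without more people reading the code.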
So, the problem with Jenkins pipelines is that they’re new technology, and no-one in the current build team has much experience with them. So this is where you come in. If you’ve used pipelines before, or you’re willing to learn, then please come and get involved.
Shout-out to Jon, who magically showed up and raised this issue a week after we decided in a meeting that we should probably look at this pipeline stuff. Help with this would be really amazing, so come talk to us.
Imagine that you’re a node collaborator like Brian, and you get this error when you try to run CI on your Pull Request. What do you do?
If you’re an experienced sysadmin you would ssh into the machine and … (do stuff)
(and) … do stuff until it’s fixed.
But what if there was a better, no prior knowledge solution?
Ansible scripts, especially when run through a graphical interface like Ansible Tower, allow you to simplify most problems down to “Click button to reprovision machine”
Not all the stuff we do is managing machines. One of the things we want to do is have a really nice Sponsors page on the website, to properly thank our sponsors. Being in the Build WG readme is pretty great, but how many people have actually seen the build WG readme?
This is something where a frontend developer could probably make something really nice in five minutes, whereas I’d flail around for half an hour and make some monstrosity.
This is an example of something another open-source project did. It looks really professional, and it’d be great to have something like that.
By the way, if you haven’t tried out the new superfast Firefox Nightly, I recommend checking it out.
So, to wrap up there’s loads of really exciting stuff we’re doing at the Build working group, and we’d really like your help.