From hello world to goodbye code

Kim Moir, Release Engineer at Mozilla Corporation

At some point, the code you write today will be deleted and replaced with something new. This talk will discuss the life cycle of a large code base, and how to manage it over time to accommodate rewrites, giving examples from a major rewrite of the Firefox build and release pipeline over the last two years. You'll learn how to replace components of a running distributed system while keeping it operational, the proverbial replacing the wing of an airplane in flight.

From Hello World to
Goodbye Code
Kim Moir, Staff Release Engineer, Mozilla, @kmoir
Bonjour à toutes et à tous, hello. I’m very happy to see you all this morning. Je suis
très heureuse de vous voir tous ce matin. My name is Kim Moir and I’m a staff release
engineer at Mozilla. Montreal in January is only slightly colder than Ottawa in January
where I live, so I was not scared off by the weather.
I’ve been paid to work full-time in open source since 2001. Before that I worked in
government, education, and at other tech companies. Before that I was a student just
like you. We didn’t have email on our phones, in fact, we barely had email. I’ve been
working longer than most of you have been alive. But that’s okay. If I can survive 20+
years in the tech industry, so can you.
Mozilla is best known for building Firefox, the web browser, as well as for its
mission to make the internet open and accessible to all. I don’t work on the Firefox
code base itself. As a release engineer, I write tools to scale our large build and
release pipeline that transforms the Firefox code into a shippable product. This
pipeline is a large distributed system. We are constantly optimizing this system to be
more scalable and more resilient to failure, and modifying the services it provides.
Outside of work, I like baking and running long distances. I have an amazing family
too! I put these pictures up here to show you that as a developer you can have a life
outside of work. Our industry tends to glamourize long hours at the keyboard at the
expense of everything else but it doesn’t have to be that way.
Firefox logo
https://blog.mozilla.org/blog/2017/11/14/fast-for-good-launching-the-new-firefox-into-the-world/
Today’s agenda
● The life cycle of code
● Distributed systems
● Replacing components of a running distributed system (in the context of
the Firefox pipeline rewrite)
● You can try it too!
“And as everyone knows, the best kind of laughter is
laughter born of a shared memory.”
― Mindy Kaling, Why Not Me?
Let’s create some memories and talk about distributed systems and deleting code!
Hands up, how many of you have worked with a completely new code base in a work
context? How many of you have worked with an existing code base?
I’ve mentored a number of interns over the years, and one thing that I notice is that
many school assignments are based on a completely new code base. I understand
that this is done because everyone is learning language semantics, UI or testing
frameworks for the course curriculum.
In most companies, you will be looking at an existing code base. Even if you start your
own company, you will probably use existing open source or language specific
libraries, or call existing APIs. So a really important skill is learning how to work with an
existing code base.
Photo by Markus Spiske on Unsplash
Photo by Francesco Gallarotti on Unsplash
Often an existing code base is like a very large, well established forest that you need
to walk around in for a few hours, days or even a few weeks, just to understand how
it all works.
Photo by Koen Eijkelenboom on Unsplash
It’s also good to talk to other people that have wandered in the code before. What do
they know? What can you learn from them? Asking lots of questions as a software
engineer is one of the most important skills you can learn.
Healthy code bases and their teams
● Documented shipping and deployment processes that work
● Ship new binaries or provide updates on a regular basis
These are things that I look for when I look at a new code base. As a release
engineer, I’m biased toward these qualities because I really care about shipping.
Is the process documented on how to ship?
Can more than one person ship the product or is this a magical set of steps that only
one person knows how to execute?
How often do you deploy or update users?
Healthy code bases and their teams (cont’d)
● Readable code
● Tested code - correctness, integration, performance
● Feedback mechanism between developers and users
Is it readable code or is there dead code and dead tests?
Are there tests with a reasonable level of code coverage?
Where do you report bugs? Or request new features?
Is there telemetry that reports failures in the product automatically?
Healthy code bases and their teams (cont’d)
● Code ownership and review is shared among multiple people
● Ownership = responsibility for change
● This doesn’t mean that you have to do everything yourself
● You can serve as a code reviewer and mentor new people
● People need to CARE about the code and the people who use and maintain it
When I worked in the Eclipse community many years ago, the project I worked
on didn’t have a code review process in place until a few weeks before each
release. The problem with this approach was that only a limited number of people
understood each component. And when they decided to leave the
community, the expertise left with them. (This process has since changed and they do
have more code review in place.)
At Mozilla, we have the concept of module ownership and a robust code review
process. This helps a larger group of people understand components of the code
base because people are required to evaluate contributions. It also reduces the risk of a
low bus factor when people leave.
● Photo by John Baker on Unsplash
● Examples of old code bases actively being updated
○ Voyager space probes (~40 years)
○ Airplanes (~30 year service lifetime)
○ Industrial robots (~20 years)
○ The first Firefox release was over 15 years ago. I’m not sure how much
of the original code base remains. I often think that large code bases
are like the cells in a human body, over time, much will be replaced by
new, but eventually it will die.
Industrial robots
https://www.bastiansolutions.com/blog/index.php/2015/04/30/increase-life-span-of-industrial-robot/
Voyager https://www.nasa.gov/mission_pages/voyager/index.html
Social implications of old code
Updating Voyager software
https://www.quora.com/Was-the-opportunity-to-update-the-Voyager-spacecraft-firmware-ever-considered-If-there-are-plans-to-launch-another-Voyager-could-we-keep-updating-its-Earth-information-content
NASA retiring Voyager engineer
http://www.popularmechanics.com/space/a17991/voyager-1-voyager-2-retiring-engineer/
There are also social issues in maintaining old code bases. For instance, last year
NASA was looking for a new developer to maintain its code base for the Voyager
space probes because the last of the original team members were getting ready to retire.
Firefox continuous integration
[Diagram: Land code → Decision graph → Builds x N platforms → Sign builds → Unit tests and performance tests]
This is a very simplified diagram of the process that occurs when a developer lands
code on our build pipeline. With her commit, a decision graph is generated that lists
all the jobs that need to run. Then we build for four platforms - Linux, Mac, Windows
and Android. These builds are then signed, and we run unit tests and performance
tests so the developers can see the results of their commit. Did the tests fail? Or are
there performance regressions they need to address?
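To make the idea of a decision graph concrete, here is a minimal sketch in Python. It is not the real Firefox or Taskcluster implementation: the task names, the `build_decision_graph` helper and the per-platform layout are assumptions for illustration only. It maps each task to the tasks it depends on and then asks the standard library for an order in which dependencies always run first.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# The four platforms named in the talk; the identifiers are illustrative.
PLATFORMS = ["linux", "mac", "windows", "android"]


def build_decision_graph(platforms):
    """Map each (hypothetical) task name to the set of tasks it depends on."""
    graph = {}
    for platform in platforms:
        build = f"build-{platform}"
        sign = f"sign-{platform}"
        graph[build] = set()                      # builds depend on nothing
        graph[sign] = {build}                     # signing needs the build
        graph[f"unit-tests-{platform}"] = {sign}  # tests need the signed build
        graph[f"perf-tests-{platform}"] = {sign}
    return graph


def run_order(graph):
    """Return one valid execution order in which dependencies come first."""
    return list(TopologicalSorter(graph).static_order())


graph = build_decision_graph(PLATFORMS)
order = run_order(graph)
print(len(graph), "tasks to run")
for task in order:
    print(task)
```

A real scheduler would run independent tasks in parallel rather than one after another; the point here is only that the graph is generated first, so a failure at that step means nothing else is scheduled.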
Pipeline Metrics
● Constraints - it needs to be up and running all the time for developer
productivity
● ~500 commits a day
● 140K jobs a day
The build and release pipeline for Firefox is a large distributed system. Here are
some metrics about it.
● Developers love to ship. In order to ship, they need feedback on their
patches. Can I ship this? Or is there a regression that needs to be
backed out? Developers are happier when they can see the results of their work more
quickly.
Photo by Uroš Jovičić on Unsplash
End to end times - This is the time from when a developer lands a commit until we are able
to ship the finished product.
Why are they important?
1. Landing small incremental patches reduces risk. On a high velocity team with a huge
number of commits, it is otherwise too difficult to figure out what went wrong.
2. Zero-days - we need to be able to get security patches to our users quickly. For
instance, last week we shipped five releases to address the recent Meltdown
and Spectre vulnerabilities.
This is a picture of the Firefox release engineering pipeline from 3 years ago that
Selena Deckelmann created. At that time it took (optimistically) around 11 hours to
ship a release, from the time a developer landed a commit to builds being available to
users. You don’t have to understand or read all the components of this diagram, only
understand that it was scary and had many single points of failure and scalability
issues.
It now takes 4-5 hours from developer commit to builds we can ship.
http://www.chesnok.com/daily/2014/05/02/release-engineering-a-draft-of-an-architecture-diagram/
Why did we rewrite?
● Developer autonomy
● Fail faster
● Better local and pipeline testing
● Change technology stack (Docker, microservices, graph generation,
optimization and transformation, task parallelization)
● Learn new things!
● So we decided to rewrite our existing pipeline to be more resilient and scalable
● Any developer can make changes to build and test configuration; previously,
releng was a blocker for these changes
● With every push to a repo, a decision graph is generated automatically.
Basically, it contains a list of the tasks that need to run for that push and all
their dependencies. If graph generation fails, the builds aren’t run, which saves
resources
● Developers can also test these changes locally or on the build pipeline
● Photo by ARTHUR YAO on Unsplash
Reasons not to rewrite?
● Failure is highly likely
● Really expensive
● May lose people on your team who aren’t interested in working on a new
technology stack
You have to defer other project work because you are heads down on the rewrite.
There is also usually a huge learning curve if you are moving to a new technology
stack, not just for developers but for operations folks as well.
“A system that spans more than one physical
location and uses the related concepts of copying
and decoupling to improve operational efficiency
(speed, resilience) and, more recently, developer
efficiency (team productivity).”
-Anne Currie
Distributed system
If your system spans more than one location you can make it more resilient.
For instance, our pipeline uses Amazon instances to run builds and tests, and we run
these jobs in multiple Amazon regions which correspond to different geographic
areas.
Copying data means that it is available in more than one location, which is
another way to make the system more resilient. For instance, when we release
Firefox we release it from multiple CDNs.
Decoupling means that you have services that can operate on their own without
depending on other services being available.
Decoupled services usually communicate with each other via APIs.
This allows you to change a service’s internal implementation without other services
having to change the way they interact with it.
In this approach you can also stop, start and replace parts of the system. With a
monolith, this is more difficult to do.
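As a small illustration of decoupling behind an API, the sketch below defines an interface and two interchangeable implementations. Everything in it is hypothetical (the `Signer` interface, both signer classes and `release_build` are made up for this example and are not part of Mozilla's pipeline). The caller only depends on the interface, so the implementation can be replaced without the caller changing.

```python
from typing import Protocol


class Signer(Protocol):
    """The API other services rely on (hypothetical)."""

    def sign(self, artifact: str) -> str: ...


class OldSigner:
    """Stands in for the implementation being replaced."""

    def sign(self, artifact: str) -> str:
        return f"{artifact}.signed-by-old"


class NewSigner:
    """Stands in for the replacement; same API, different internals."""

    def sign(self, artifact: str) -> str:
        return f"{artifact}.signed-by-new"


def release_build(signer: Signer, artifact: str) -> str:
    # The caller only knows about the Signer API, not which class is behind it.
    return signer.sign(artifact)


print(release_build(OldSigner(), "firefox-build"))
print(release_build(NewSigner(), "firefox-build"))
```

Swapping `OldSigner` for `NewSigner` requires no change to `release_build`, which is what makes it possible to stop, start and replace one part of the system at a time.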
This approach also allows team members to work on different parts of the system
without everyone contending for the same resources.
Another reason that we use a distributed system is that it allows us to scale up capacity
incrementally by instantiating copies of existing services. For instance, with our
migration we ran many more services in parallel, which allowed the end to end time for
releases to drop significantly.
They also allow us to provide a reasonable level of service to clients.
Availability means we can always provide a predictable service to clients. Even if
there are issues like network problems, the system can appear available.
Why Use Distributed Systems? Resilience, Performance & Availability
http://container-solutions.com/use-distributed-systems-resilience-performance-availability/
How to approach migration
● Incremental portions of pool
● Communication
● Checklist
● Monitor capacity and wait times
● Monitor state after migration
● Rollback plan
● Decommission old
● Migrate more
● This is in the context of a large migration that we did at Mozilla where we
migrated components of our build and release pipeline to a new microservices
architecture and Docker
● Communicate - open an issue.
● Let people know via mailing list, Slack/IRC of timeframes for deletion
● Update issue tracker with plan and time
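The migration steps above can be sketched as a loop: move a small portion of the pool, check its health, and roll back that portion if something looks wrong. This is only an illustration under assumptions. The `Pool` class, the worker names and the `is_healthy` callback (which stands in for monitoring capacity and wait times) are hypothetical and not the tooling we used.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Pool:
    """Workers still on the old system and workers already migrated."""

    old: list[str]
    new: list[str] = field(default_factory=list)


def migrate_batch(pool: Pool, batch_size: int) -> list[str]:
    """Move the next batch of workers from the old system to the new one."""
    batch = pool.old[:batch_size]
    pool.old = pool.old[batch_size:]
    pool.new.extend(batch)
    return batch


def rollback(pool: Pool, batch: list[str]) -> None:
    """Return a batch to the old system if the new one is unhealthy."""
    pool.new = [worker for worker in pool.new if worker not in batch]
    pool.old = batch + pool.old


def migrate_incrementally(
    pool: Pool, batch_size: int, is_healthy: Callable[[Pool], bool]
) -> bool:
    """Migrate batch by batch; stop and roll back the last batch on failure."""
    while pool.old:
        batch = migrate_batch(pool, batch_size)
        if not is_healthy(pool):
            rollback(pool, batch)
            return False
    return True  # nothing left on the old system, so it can be decommissioned


pool = Pool(old=[f"worker-{i}" for i in range(5)])
finished = migrate_incrementally(pool, batch_size=2, is_healthy=lambda p: True)
print(finished, pool)
```

Keeping the batch small limits how much has to be rolled back when the health check fails.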
Strangler Application - Martin Fowler
From Jez Humble’s Continuous delivery page
https://continuousdelivery.com/implementing/architecture/
“One pattern that is particularly valuable in this context is the strangler application. In
this pattern, we iteratively replace a monolithic architecture with a more
componentized one by ensuring that new work is done following the principles of a
service-oriented architecture, while accepting that the new architecture may well
delegate to the system it is replacing. Over time, more and more functionality will be
performed in the new architecture, and the old system being replaced is “strangled”.”
In Mozilla releng, we recently migrated from an old build job scheduling system called
Buildbot to one called Taskcluster.
One of the things that really helped us with this transition was an application
called buildbot bridge. This allowed us to schedule jobs on Taskcluster, but continue
to run them on Buildbot. This is similar to the dispatcher function shown in the
diagram above.
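The dispatcher idea behind the strangler pattern can be sketched in a few lines. This is not the buildbot bridge code. The class names, job types and `migrate` method are assumptions made up for illustration. Callers always talk to the dispatcher, which delegates to the old system until a given job type has been migrated.

```python
class LegacyRunner:
    """Stands in for the old system (hypothetical)."""

    def run(self, job: str) -> str:
        return f"legacy ran {job}"


class NewRunner:
    """Stands in for the new system (hypothetical)."""

    def run(self, job: str) -> str:
        return f"new ran {job}"


class Dispatcher:
    """Single entry point that routes each job type to the old or new system."""

    def __init__(self, legacy, new, migrated=()):
        self.legacy = legacy
        self.new = new
        self.migrated = set(migrated)

    def migrate(self, job_type: str) -> None:
        # Flip one job type over to the new system.
        self.migrated.add(job_type)

    def run(self, job_type: str, job: str) -> str:
        runner = self.new if job_type in self.migrated else self.legacy
        return runner.run(job)


dispatcher = Dispatcher(LegacyRunner(), NewRunner())
print(dispatcher.run("linux-build", "job-1"))  # still handled by the old system
dispatcher.migrate("linux-build")
print(dispatcher.run("linux-build", "job-2"))  # now handled by the new system
print(dispatcher.run("mac-build", "job-3"))    # not migrated yet
```

Once every job type is in `migrated`, nothing reaches the old system any more and it can be deleted.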
What have we learned?
● Incrementalism - change one thing, evaluate, then change
another
● Expectations change. The faster we build, the faster other
groups expect to be able to ship
● Staging environment is important to test new automation
● Communication
● Organizational changes
● Consider the operational side, not just landing code
This is an excellent talk on code rewrites as well
So you want to rewrite that - Camille Fournier
https://www.youtube.com/watch?v=PhYUvtifJXk
How to delete code
● Communicate, note in issue tracker
● Delete. Don’t comment it out.
● Update or delete relevant tests
● Look at dependencies - can they also be updated or removed?
● Celebrate!
I’ve looked at a lot of code bases in the past where people are afraid to delete code,
so they comment it out. This makes the code really unreadable for future
maintainers. Or they leave the tests in place that are no longer relevant.
It’s 2018 and version control is your friend. If you need to look and see why the code
was deleted, you can bisect the history.
Hard to open up that door
When you're not sure what you're going for
But we've got to grow
We've got to try
Though it's hard so hard
We've got to say goodbye
―Beyoncé
Sometimes it’s hard to delete code. You get emotionally attached to it. You spent so
much time working on it. It’s okay, there will be something new to learn about!
From WOCintechchat stock photos License Creative Commons Attribution 2.0
Generic (CC BY 2.0)
How can you apply these principles yourself?
When you work on a new project, think about the lifecycle of the code.
What is the update strategy? Mobile or web? With desktop apps you can’t ship 1.0
until you have an update strategy for 2.0.
What is your deployment strategy?
How will you find out if your users are unhappy?
How can you distribute code ownership?
In conclusion, as you embark upon your careers in engineering, it has been my
experience that people matter more than code.
We are hiring - check out
https://careers.mozilla.org/
Thank you!
Also I have a couple hundred Firefox and Mozilla stickers, please see me afterwards
if you are interested
Additional Reading
● Camille Fournier: So you want to rewrite that, GOTO conference, Chicago,
2014 https://www.youtube.com/watch?v=PhYUvtifJXk
● Caitie McCaffrey: Resources for Getting started with distributed systems
https://caitiem.com/2017/09/07/getting-started-with-distributed-systems/
● Anne Currie:
○ What is a Distributed system? https://container-solutions.com/what-is-a-distributed-system/
○ Why is a Single-Threaded Application like a Distributed System?
http://container-solutions.com/single-threaded-application-like-distributed-system/
○ Why Use Distributed Systems? Resilience, Performance, and Availability
http://container-solutions.com/use-distributed-systems-resilience-performance-availability/
● Lin Clark: Entering the Quantum Era—How Firefox got fast again and where
it’s going to get faster
https://hacks.mozilla.org/2017/11/entering-the-quantum-era-how-firefox-got-fast-again-and-where-its-going-to-get-faster/

Upgrading Incident Management with Icinga - Icinga Camp Milan 2023
Icinga38 views
Advanced API Mocking Techniques by Dimpy Adhikary
Advanced API Mocking TechniquesAdvanced API Mocking Techniques
Advanced API Mocking Techniques
Dimpy Adhikary19 views
DSD-INT 2023 Wave-Current Interaction at Montrose Tidal Inlet System and Its ... by Deltares
DSD-INT 2023 Wave-Current Interaction at Montrose Tidal Inlet System and Its ...DSD-INT 2023 Wave-Current Interaction at Montrose Tidal Inlet System and Its ...
DSD-INT 2023 Wave-Current Interaction at Montrose Tidal Inlet System and Its ...
Deltares9 views
DSD-INT 2023 Next-Generation Flood Inundation Mapping for Taiwan - Delft3D FM... by Deltares
DSD-INT 2023 Next-Generation Flood Inundation Mapping for Taiwan - Delft3D FM...DSD-INT 2023 Next-Generation Flood Inundation Mapping for Taiwan - Delft3D FM...
DSD-INT 2023 Next-Generation Flood Inundation Mapping for Taiwan - Delft3D FM...
Deltares7 views
2023-November-Schneider Electric-Meetup-BCN Admin Group.pptx by animuscrm
2023-November-Schneider Electric-Meetup-BCN Admin Group.pptx2023-November-Schneider Electric-Meetup-BCN Admin Group.pptx
2023-November-Schneider Electric-Meetup-BCN Admin Group.pptx
animuscrm13 views
Geospatial Synergy: Amplifying Efficiency with FME & Esri ft. Peak Guest Spea... by Safe Software
Geospatial Synergy: Amplifying Efficiency with FME & Esri ft. Peak Guest Spea...Geospatial Synergy: Amplifying Efficiency with FME & Esri ft. Peak Guest Spea...
Geospatial Synergy: Amplifying Efficiency with FME & Esri ft. Peak Guest Spea...
Safe Software412 views
Consulting for Data Monetization Maximizing the Profit Potential of Your Data... by Flexsin
Consulting for Data Monetization Maximizing the Profit Potential of Your Data...Consulting for Data Monetization Maximizing the Profit Potential of Your Data...
Consulting for Data Monetization Maximizing the Profit Potential of Your Data...
Flexsin 15 views
Roadmap y Novedades de producto by Neo4j
Roadmap y Novedades de productoRoadmap y Novedades de producto
Roadmap y Novedades de producto
Neo4j50 views
A first look at MariaDB 11.x features and ideas on how to use them by Federico Razzoli
A first look at MariaDB 11.x features and ideas on how to use themA first look at MariaDB 11.x features and ideas on how to use them
A first look at MariaDB 11.x features and ideas on how to use them
Federico Razzoli45 views
Cycleops - Automate deployments on top of bare metal.pptx by Thanassis Parathyras
Cycleops - Automate deployments on top of bare metal.pptxCycleops - Automate deployments on top of bare metal.pptx
Cycleops - Automate deployments on top of bare metal.pptx
BushraDBR: An Automatic Approach to Retrieving Duplicate Bug Reports by Ra'Fat Al-Msie'deen
BushraDBR: An Automatic Approach to Retrieving Duplicate Bug ReportsBushraDBR: An Automatic Approach to Retrieving Duplicate Bug Reports
BushraDBR: An Automatic Approach to Retrieving Duplicate Bug Reports
Elevate your SAP landscape's efficiency and performance with HCL Workload Aut... by HCLSoftware
Elevate your SAP landscape's efficiency and performance with HCL Workload Aut...Elevate your SAP landscape's efficiency and performance with HCL Workload Aut...
Elevate your SAP landscape's efficiency and performance with HCL Workload Aut...
HCLSoftware6 views

From hello world to goodbye code

  • 1. From Hello World to Goodbye Code Kim Moir, Staff Release Engineer, Mozilla, @kmoir Bonjour à toutes et à tous, hello. I’m very happy to see you all this morning. Je suis très heureuse de vous voir tous ce matin. My name is Kim Moir and I’m a staff release engineer at Mozilla. Montreal in January is only slightly colder than Ottawa in January where I live, so I was not scared off by the weather. I’ve been paid to work full-time in open source since 2001. Before that I worked in government, education, and at other tech companies. Before that I was a student just like you. We didn’t have email on our phones, in fact, we barely had email. I’ve been working longer than most of you have been alive. But that’s okay. If I can survive 20+ years in the tech industry, so can you. Mozilla is most well known for building Firefox the web browser. As well as for their mission to make the internet open and accessible to all. I don’t work on the Firefox code base itself. As a release engineer, I write tools to scale our large build and release pipeline that transforms the Firefox code into a shippable product. This pipeline is a large distributed system. We are constantly optimizing this system to be more scalable, more resilient to failure and modifying the services it provides.
  • 2. Outside of work, I like baking and running long distances. I have an amazing family too! I put these pictures up here to show you that as a developer you can have a life outside of work. Our industry tends to glamourize long hours at the keyboard at the expense of everything else, but it doesn’t have to be that way. Firefox logo https://blog.mozilla.org/blog/2017/11/14/fast-for-good-launching-the-new-firefox-into-the-world/
  • 3. Today’s agenda ● The life cycle of code ● Distributed systems ● Replacing components of a running distributed system (in the context of Firefox pipeline rewrite) ● You can try it too!
  • 4. “And as everyone knows, the best kind of laughter is laughter born of a shared memory.” ― Mindy Kaling, Why Not Me? Let’s create some memories and talk about distributed systems and deleting code!
  • 5. Hands up, how many of you have worked with a completely new code base in a work context? How many of you have worked with an existing code base? I’ve mentored a number of interns over the years, and one thing that I notice is that many school assignments are based on a completely new code base. I understand that this is done because everyone is learning language semantics, UI or testing frameworks for the course curriculum. In most companies, you will be looking at an existing code base. Even if you start your own company, you will probably use existing open source or language-specific libraries, or call existing APIs. So a really important skill is learning how to work with an existing code base. Photo by Markus Spiske on Unsplash
  • 6. Photo by Francesco Gallarotti on Unsplash Often an existing code base is like a very large, well established forest that you need to walk around in for a few hours, days or even a few weeks. Just to understand how it all works.
  • 7. Photo by Koen Eijkelenboom on Unsplash It’s also good to talk to other people that have wandered in the code before. What do they know? What can you learn from them? Asking lots of questions as a software engineer is one of the most important skills you can learn.
  • 8. Healthy code bases and their teams ● Documented shipping and deployment processes that work ● Ship new binaries or provide updates on a regular basis These are things that I look for when I look at a new code base. As a release engineer, I’m biased to these qualities because I really care about shipping. Is the process documented on how to ship? Can more than one person ship the product or is this a magical set of steps that only one person knows how to execute? How often do you deploy or update users?
  • 9. Healthy code bases and their teams (cont’d) ● Readable code ● Tested code - correctness, integration, performance ● Feedback mechanism between developers and users Is it readable code or is there dead code and tests? Are there tests with a reasonable level of code coverage? Where do you report bugs? Or request new features? Is there telemetry that reports failures in the product automatically?
  • 10. Healthy code bases and their teams (cont’d) ● Code ownership and review is shared among multiple people ● Ownership = responsibility for change ● This doesn’t mean that you have to do everything yourself ● You can serve as a code reviewer and mentor new people ● People need to CARE about the code and the people who use and maintain it When I used to work in the Eclipse community many years ago, the project I worked on didn’t have a code review process in place until a few weeks before each release. The problem with this approach was that only a limited number of people understood the different components. And when they decided to leave the community, the expertise left with them. (This process has since changed and they do have more code review in place.) At Mozilla, we have the concept of module ownership and a robust code review process. This helps a larger group of people understand components of the code base because people are required to evaluate contributions. It reduces the bus factor as well when people leave.
  • 11. ● Photo by John Baker on Unsplash ● Examples of old code bases actively being updated ○ Voyager space probes (~40 years) ○ Airplanes (~30 year service lifetime) ○ Industrial robots (~20 years) ○ The first Firefox release was over 15 years ago. I’m not sure how much of the original code base remains. I often think that large code bases are like the cells in a human body: over time, much will be replaced by new, but eventually it will die. Industrial robots https://www.bastiansolutions.com/blog/index.php/2015/04/30/increase-life-span-of-industrial-robot/ Voyager https://www.nasa.gov/mission_pages/voyager/index.html Social implications of old code. Updating Voyager software https://www.quora.com/Was-the-opportunity-to-update-the-Voyager-spacecraft-firmware-ever-considered-If-there-are-plans-to-launch-another-Voyager-could-we-keep-updating-its-Earth-information-content
  • 12. NASA’s retiring Voyager engineer http://www.popularmechanics.com/space/a17991/voyager-1-voyager-2-retiring-engineer/
  • 13. There are also social issues to maintaining old code bases. For instance, last year NASA was looking for a new developer to maintain its code base for the Voyager space probes because the last of the original team members was getting ready to retire.
  • 14. Firefox continuous integration Land code Unit tests Decision graph Builds x N platforms Performance tests Sign Builds This is a very simplified diagram of the process that occurs when a developer lands code on our build pipeline. With her commit, a decision graph is generated that lists all the jobs that need to run. Then we build for four platforms - Linux, Mac, Windows and Android. These builds are then signed, and we run unit tests and performance tests so the developers can see the results of their commit. Did the tests fail? Or are there performance regressions they need to address?
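To make the decision graph idea concrete, here is a minimal sketch in Python. It is not Mozilla's implementation: the task names, the `build_decision_graph` and `run_order` helpers, and the assumption that tests depend on signed builds are all hypothetical, chosen only to mirror the simplified diagram described on this slide.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# The four platforms named in the talk.
PLATFORMS = ["linux", "mac", "windows", "android"]


def build_decision_graph(platforms):
    """Map each (hypothetical) task name to the set of tasks it depends on."""
    graph = {}
    for platform in platforms:
        build = f"build-{platform}"
        sign = f"sign-{platform}"
        graph[build] = set()                          # builds have no prerequisites
        graph[sign] = {build}                         # a build must exist before signing
        graph[f"unit-tests-{platform}"] = {sign}      # assumption: tests run on signed builds
        graph[f"perf-tests-{platform}"] = {sign}
    return graph


def run_order(graph):
    """Return one valid execution order: every task appears after its dependencies."""
    return list(TopologicalSorter(graph).static_order())


graph = build_decision_graph(PLATFORMS)
order = run_order(graph)
```

Because the platforms do not depend on each other in this sketch, a real scheduler could run the four build chains in parallel; the topological order only guarantees that no task starts before its dependencies.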
  • 15. Pipeline Metrics ● Constraints - it needs to be up and running all the time for developer productivity ● ~500 commits a day ● 140K jobs a day The build and release pipeline for Firefox is a large distributed system. Here are some metrics about it. ● Developers love to ship. In order to ship, they need feedback on their patches. Can I ship this? Or is there a regression that needs to be backed out? It improves happiness if they can see the results of their work more quickly.
  • 16. Photo by Uroš Jovičić on Unsplash End to end times - This is the time from when a developer lands a commit until we are able to ship the finished product. Why are they important? 1. Landing small incremental patches reduces risk. It is too difficult to figure out what went wrong on a high velocity team with a huge number of commits. 2. Zero-days - we need to be able to get security patches to our users quickly. For instance, last week we shipped five releases to address the recent Meltdown and Spectre vulnerabilities.
  • 17. This is a picture of the Firefox release engineering pipeline from 3 years ago that Selena Deckelmann created. At that time it took (optimistically) around 11 hours to ship a release, from the time a developer landed a commit to builds being available to users. You don’t have to understand or read all the components of this diagram, only understand that it was scary and had many single points of failure and scalability issues. It now takes 4-5 hours from developer commit to builds we can ship. http://www.chesnok.com/daily/2014/05/02/release-engineering-a-draft-of-an-architecture-diagram/
  • 18. Why did we rewrite? ● Developer autonomy ● Fail faster ● Better local and pipeline testing ● Change technology stack (Docker, microservices, graph generation, optimization and transformation, task parallelization) ● Learn new things! ● So we decided to rewrite our existing pipeline to be more resilient and scalable ● Any developer can make changes to build and test configuration; before, releng was a blocker for these changes ● With every push to a repo, a decision graph is generated automatically. Basically it contains a list of the tasks that need to run for that push and all their dependencies. If it fails, the builds aren’t run, which saves resources ● Developers can also test these changes locally or on the build pipeline ● Photo by ARTHUR YAO on Unsplash
  • 19. Reasons not to rewrite? ● Failure is highly likely ● Really expensive ● May lose people on your team who aren’t interested in working on a new technology stack You have to defer other project work because you are heads down on a rewriting project. There is also usually a huge learning curve if you are moving to a new technology stack, not just for developers but for operations folks as well.
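As a rough sense of scale, the two figures on this slide can be combined with some back-of-the-envelope arithmetic. The sketch below assumes load is spread evenly across the day, which is a simplification and not something the talk states; the variable names are illustrative only.

```python
# Figures quoted on the slide.
COMMITS_PER_DAY = 500
JOBS_PER_DAY = 140_000

SECONDS_PER_DAY = 24 * 60 * 60

# Average number of jobs triggered by a single commit.
jobs_per_commit = JOBS_PER_DAY / COMMITS_PER_DAY      # 280.0

# Average job start rate, assuming (unrealistically) an even spread over 24 hours.
jobs_per_second = JOBS_PER_DAY / SECONDS_PER_DAY      # roughly 1.62
```

In other words, each landed commit fans out into a few hundred jobs on average, which is why the pipeline has to be treated as a distributed system in its own right.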
  • 20. “A system that spans more than one physical location and uses the related concepts of copying and decoupling to improve operational efficiency (speed, resilience) and, more recently, developer efficiency (team productivity).” -Anne Currie Distributed system If your system spans more than one location you can make it more resilient. For instance, our pipeline uses Amazon instances to run builds and tests, and we run these jobs in multiple Amazon regions which correspond to different geographic areas. Copying data means that it is available in more than one location, which is another way to make the system more resilient. For instance, when we release Firefox we release it from multiple CDNs. Decoupling means that you have services that can operate on their own without depending on other services being available. Decoupled services usually communicate with each other via APIs. This allows you to change the internal implementation without the other services having to change the way they interact with the service. In this approach you can also stop, start and replace parts of the system. With a monolith, this is more difficult to do. This approach also allows team members to work on different parts of the system without everyone contending for the same resources. Another reason that we use distributed systems is that they allow us to scale up capacity incrementally by instantiating copies of existing services. For instance with our
  • 21. migration we ran many more services in parallel to allow the end to end time for releases to drop significantly. They also allow us to provide a reasonable level of service to clients. Availability means we can always provide a predictable service to clients. Even if there are issues like network problems, the system can appear available. Why do we use distributed systems http://container-solutions.com/use-distributed-systems-resilience-performance-availability/ Resilience, Performance & Availability
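The ideas of copying and decoupling described above can be illustrated with a small sketch. Everything here is hypothetical: the `Mirror` interface, the `InMemoryMirror` class and the region names stand in for real CDNs or cloud regions, and the failover logic is only one simple way a client could appear to stay available when one copy is unreachable.

```python
from typing import Protocol


class Mirror(Protocol):
    """The only thing callers rely on: an API, not an implementation."""

    def fetch(self, name: str) -> bytes: ...


class InMemoryMirror:
    """Hypothetical stand-in for one copy of the release files in one location."""

    def __init__(self, region, files, available=True):
        self.region = region
        self.files = files
        self.available = available

    def fetch(self, name):
        if not self.available:
            raise ConnectionError(f"{self.region} is unreachable")
        return self.files[name]


def fetch_with_failover(mirrors, name):
    """Try each copy in turn so one failed location does not fail the request."""
    for mirror in mirrors:
        try:
            return mirror.fetch(name)
        except ConnectionError:
            continue
    raise ConnectionError(f"no mirror could serve {name}")


files = {"firefox.tar.bz2": b"release bits"}
mirrors = [
    InMemoryMirror("region-a", files, available=False),  # simulated outage
    InMemoryMirror("region-b", files),
]
payload = fetch_with_failover(mirrors, "firefox.tar.bz2")
```

Because callers only depend on `fetch`, any mirror could be swapped for a different implementation without the calling code changing, which is the point of decoupling through an API.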
  • 22. How to approach migration ● Incremental portions of pool ● Communication ● Checklist ● Monitor capacity and wait times ● Monitor state after migration ● Rollback plan ● Decommission old ● Migrate more ● This is in the context of a large migration that we did at Mozilla where we migrated components of our build and release pipeline to a new microservices architecture and Docker ● Communicate - open an issue. ● Let people know via mailing list or Slack/IRC of timeframes for deletion ● Update issue tracker with plan and time
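The checklist above (migrate an incremental portion of the pool, monitor, roll back if needed, then migrate more) can be sketched as a loop. This is an illustration under stated assumptions, not the tooling used at Mozilla: `migrate_in_batches`, the worker names and the health check are all hypothetical.

```python
def migrate_in_batches(pool, batch_size, migrate, is_healthy, rollback):
    """Move workers to the new system a batch at a time.

    After each batch the health check runs (standing in for monitoring
    capacity and wait times). If it fails, only the current batch is rolled
    back and the migration stops. Returns (migrated, not_migrated).
    """
    done = []
    for start in range(0, len(pool), batch_size):
        batch = pool[start:start + batch_size]
        for worker in batch:
            migrate(worker)
        if not is_healthy():
            for worker in reversed(batch):
                rollback(worker)
            return done, pool[start:]
        done.extend(batch)
    return done, []


# Simulated pool: every worker starts on the old system.
state = {f"worker-{i}": "old" for i in range(6)}


def migrate(worker):
    state[worker] = "new"


def rollback(worker):
    state[worker] = "old"


def is_healthy():
    # Simulated monitoring: pretend worker-4 misbehaves on the new system.
    return state["worker-4"] == "old"


done, remaining = migrate_in_batches(list(state), 2, migrate, is_healthy, rollback)
```

In this simulated run the first two batches stay migrated, the third is rolled back, and the remaining workers are left on the old system until the problem is understood.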
  • 23. Strangler Application - Martin Fowler From Jez Humble’s Continuous Delivery page https://continuousdelivery.com/implementing/architecture/ “One pattern that is particularly valuable in this context is the strangler application. In this pattern, we iteratively replace a monolithic architecture with a more componentized one by ensuring that new work is done following the principles of a service-oriented architecture, while accepting that the new architecture may well delegate to the system it is replacing. Over time, more and more functionality will be performed in the new architecture, and the old system being replaced is “strangled”.” In Mozilla releng, we recently migrated from an old build job scheduling system called Buildbot to one called Taskcluster. One of the things that really helped us achieve this transition was an application called Buildbot bridge. This allowed us to schedule jobs on Taskcluster, but continue to run them on Buildbot. This is similar to the dispatcher function shown in the diagram above.
  • 24. What have we learned? ● Incrementalism - change one thing, evaluate, then change another ● Expectations change. The faster we build, the faster other groups expect to be able to ship ● Staging environment is important to test new automation ● Communication ● Organizational changes ● Consider the operational side, not just landing code This is an excellent talk on code rewrites as well: So you want to rewrite that - Camille Fournier https://www.youtube.com/watch?v=PhYUvtifJXk
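A minimal sketch of the strangler idea follows: all jobs are scheduled through one new entry point, which delegates to the legacy system until a given job type has been migrated. The `Bridge` class, the runner functions and the job names are hypothetical and are not the actual Buildbot bridge code.

```python
class Bridge:
    """Single scheduling entry point that can delegate to the system being replaced."""

    def __init__(self, legacy_runner, new_runner):
        self.legacy_runner = legacy_runner
        self.new_runner = new_runner
        self.migrated = set()  # job types the new system now runs itself

    def mark_migrated(self, job_type):
        self.migrated.add(job_type)

    def schedule(self, job_type, revision):
        # Callers always schedule here; where the job runs is an internal detail.
        runner = self.new_runner if job_type in self.migrated else self.legacy_runner
        return runner(job_type, revision)


def legacy_runner(job_type, revision):
    return f"legacy ran {job_type} for {revision}"


def new_runner(job_type, revision):
    return f"new ran {job_type} for {revision}"


bridge = Bridge(legacy_runner, new_runner)
before = bridge.schedule("linux-build", "abc123")   # still delegated to the old system
bridge.mark_migrated("linux-build")
after = bridge.schedule("linux-build", "abc123")    # now handled by the new system
```

Once every job type has been marked as migrated, nothing calls the legacy runner any more and it can be deleted, which is the "strangled" end state the quote describes.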
  • 25. How to delete code ● Communicate, note in issue tracker ● Delete. Don’t comment it out. ● Update or delete relevant tests ● Look at dependencies - can they also be updated or removed? ● Celebrate! I’ve looked at a lot of code bases in the past where people are afraid to delete code, so they comment it out. This makes the code really unreadable for future maintainers. Or they leave the tests in place that are no longer relevant. It’s 2018 and version control is your friend. If you need to look and see why the code was deleted, you can bisect the code.
  • 26. Hard to open up that door When you're not sure what you're going for But we've got to grow We've got to try Though it's hard so hard We've got to say goodbye ―Beyoncé Sometimes it’s hard to delete code. You get emotionally attached to it. You spent so much time working on it. It’s okay, there will be something new to learn about!
  • 27. From WOCintechchat stock photos License Creative Commons Attribution 2.0 Generic (CC BY 2.0) How can you apply these principles yourself? When you work on a new project, think about the lifecycle of the code. What is the update strategy? Mobile or web? With desktop apps you can’t ship 1.0 until you have an update strategy for 2.0. What is your deployment strategy? How will you find out if your users are unhappy? How can you distribute code ownership?
  • 28. In conclusion, as you embark upon your careers in engineering, it has been my experience that people matter more than code.
  • 29. We are hiring - check out https://careers.mozilla.org/ Thank you! Also I have a couple hundred Firefox and Mozilla stickers, please see me afterwards if you are interested
  • 30. Additional Reading ● Camille Fournier: So you want to rewrite that, GOTO conference, Chicago, 2014 https://www.youtube.com/watch?v=PhYUvtifJXk ● Caitie McCaffrey: Resources for Getting started with distributed systems https://caitiem.com/2017/09/07/getting-started-with-distributed-systems/ ● Anne Currie: ○ What is a Distributed system? https://container-solutions.com/what-is-a-distributed-system/ ○ Why is a Single-Threaded Application like a Distributed System? http://container-solutions.com/single-threaded-application-like-distributed-system/ ○ Why Use Distributed Systems? Resilience, Performance, and Availability http://container-solutions.com/use-distributed-systems-resilience-performance-availability/
  • 31. Additional Reading ● Lin Clark: Entering the Quantum Era—How Firefox got fast again and where it’s going to get faster https://hacks.mozilla.org/2017/11/entering-the-quantum-era-how-firefox-got-fast-again-and-where-its-going-to-get-faster/