This talk is geared towards infrastructure and systems engineers who are interested in learning how to structure a large monorepo codebase consisting of multiple microservices that share many dependencies. It introduces Pants as a build system for such a large monolithic codebase and shows how it ties in with the principles of today’s container ecosystem.
https://pycon.sg/schedule/presentation/73/
4. Code Organization
Service A: Project, Repo A
Service B: Project, Repo B
Service C: Project, Repo C
Shared code
Does not scale well for a large number of microservices
Complex method of sharing libraries (publishing artifacts, versioning hell)
5. Code Organization
Libraries repository
Service A
Service B
Service C
Libraries as Code Units
Single Lint, Build, Test and Release process
Easy to coordinate changes across modules
Easier to set up a development environment
Tests across modules are run together
Promotes the idea of writing shareable code
Monorepo
- A repository with a defined structure for organizing reusable components of code
6. Pain Points
Managing dependencies for Python projects with virtualenv is painful. Need something simpler.
Need easier code sharing amongst projects. Fixing a bug in a function should not require changing versions in downstream projects.
Need standardization in the testing and build process
7. Pants
Build system for managing targets sharing a single repository
Dependencies are managed in BUILD files that live alongside the code.
History - Used to be a Python wrapper around the Ant build tool, which generated build.xml files and handed them to Ant. (Python + Ant = Pants)
Later rewritten to be an independent build tool with main support for JVM languages and Python.
8. Pants
Define source tree - src/<lang> e.g. src/python/
BUILD files define targets at each leaf node in the source tree.
BUILD files use a DSL that invokes Python constructors in the background
Targets can be either a binary (e.g. PEX for Python, JAR for Java) or a library which can be referenced by other targets.
9. PEX
PEX files - Python Executables, similar in idea to a virtual environment.
Generate immutable artifacts that will run on any server
Run targets locally, without maintaining complex virtual environments
Easier debugging through standardized versioning of 3rdparty dependencies
I am Angad and today I am going to tell you about build systems for large codebases.
First, about me: I graduated from NUS after attending many tutorials and lectures in this very room. After graduating, I joined Twitter as a Site Reliability Engineer. I came back to Singapore in 2014 to work with Viki as an infrastructure and devops lead.
I am going to talk about code organization for large codebases first, and then about some of the pain points that need to be solved to improve developer productivity. We will go over Pants as a build tool and how it generates easily portable PEX files. We will then go over some examples of Pants usage.
I will be talking about Pants here because I am most familiar with it and it is written in Python, but a build system for large codebases is a common concept amongst large companies. There are other open source tools as well, such as Buck by Facebook and Bazel by Google, and I urge you to explore them too. In the end, using a build tool is far better than not using one.
Let's start with code organization. When you are a small team or a startup, a very basic and intuitive way to organize code is to have a project repository for each of your microservices.
Microservices are a great idea if your team is growing and you need separation of responsibilities. But the popularization of microservices has also led to the growth of some bad practices. Microservices do not mean micro-repositories. Splitting a monolithic service into microservices is the trend of the day, and it is very tempting to start by splitting the source into multiple micro-repositories.
With GitHub offering unlimited free private repositories, this sounds like a logical and tempting solution.
Now this is good because all your developers can create as many repositories as they want and start writing a lot of code for your company.
But this does not scale well for a large company. The success of a large team depends heavily on building on top of the knowledge and code of other people.
When there is a lot of shared code, juggling a multi-module project over multiple repositories can be quite painful. You end up using some form of artifact sharing and land in a versioning hell where each service uses a different version of the shared library. And if you want to fix a bug in a shared library, you have to update the version in all downstream repositories. That is a productivity nightmare.
So instead of thinking of units of code as projects, think of libraries as the units of code. Now each of the services can just be composed of these library units.
We can have a single lint, build, test and release process. We can easily enforce a style guide as all the code is in one place. It is also much easier to set up a development environment.
This promotes the idea of writing code that is shareable and reusable from the beginning.
This is quite a popular concept amongst large companies. It started with Google, and now Facebook, Twitter and a bunch of other companies have a monorepo: a large repository with all the code, with a defined structure for organizing reusable components.
So we now know that there is some benefit in thinking of code as libraries which can be reused. In summary, we need a system which can solve the following pain points. First, we need an easier way of sharing code.
Next, we need standardization in the testing and build process.
Finally, I have spent a fair share of time managing virtual environments for Python projects, and that is quite painful if you have multiple Python projects. We need something more automated.
So here comes Pants. Pants is a build tool written in Python, with support for multiple languages. It was developed at Twitter and Foursquare to manage multiple build targets in a single repository.
Dependencies in Pants are managed in BUILD files that live alongside the code.
You might be wondering why it is called Pants, which is a weird name for a build tool. Pants started as a helper for Ant, a Java build tool. It was a simple tool that spewed out hundreds of build.xml files and then invoked Ant to work on them. Hence the name Pants, from Python + Ant.
It was later rewritten as an independent build tool with main support for JVM-based languages and Python.
Let's go over some basic concepts of Pants.
You start by defining a source tree which is organized by language. For every leaf node in this source tree, you place a BUILD file which defines the targets. BUILD files look like Python but are written in a Pants-specific DSL, which essentially invokes Python constructors in Pants.
Targets can be either a library or a binary. Binaries can be put in a Docker container and run on any server, and libraries can be referenced by any other target.
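To make the library/binary distinction concrete, here is a minimal conceptual sketch in Python. It is not how Pants is implemented. The target names (shared_lib, auth_lib, service_a, service_b) and the TARGETS table are hypothetical and only illustrate that a binary pulls in every library it references, directly or indirectly.

```python
# Conceptual model only: the target names and this table are hypothetical.
# Each target maps to its kind and to the targets it depends on directly.
TARGETS = {
    "shared_lib": {"kind": "library", "deps": []},
    "auth_lib": {"kind": "library", "deps": ["shared_lib"]},
    "service_a": {"kind": "binary", "deps": ["auth_lib"]},
    "service_b": {"kind": "binary", "deps": ["shared_lib"]},
}


def transitive_deps(name, targets=TARGETS):
    """Return every target that `name` needs, directly or indirectly."""
    seen = set()
    stack = list(targets[name]["deps"])
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(targets[dep]["deps"])
    return sorted(seen)


def dependents(name, targets=TARGETS):
    """Return every target that is affected when `name` changes."""
    return sorted(t for t in targets if name in transitive_deps(t, targets))


if __name__ == "__main__":
    print(transitive_deps("service_a"))  # libraries that service_a is built from
    print(dependents("shared_lib"))      # targets that pick up a fix in shared_lib
```

Because all of these targets live in one repository, a fix in shared_lib reaches both services the next time they are built, without any version bump in between.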
One important aspect of Pants is its ability to generate PEX files. PEX files are special Python executables. You can think of them as the equivalent of a statically compiled Go binary or a fat Java JAR file. They are similar in idea to a virtual environment, but everything is packed together and made executable.
A PEX file is a zip file with a Python interpreter directive (a shebang line) and a special __main__.py that allows you to interact with the PEX runtime.
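The same layout (a zip file, an interpreter directive, and a __main__.py) can be tried out with Python's standard-library zipapp module. This is only an illustration of the file layout, not a PEX: it does not bundle third-party dependencies or include the PEX runtime. The directory name, the file contents and the build_demo_archive helper are made up for the example.

```python
import subprocess
import sys
import tempfile
import zipapp
import zipfile
from pathlib import Path


def build_demo_archive(workdir):
    """Build a tiny executable zip archive in workdir and return its path."""
    src = Path(workdir) / "app"
    src.mkdir()
    # __main__.py is the entry point Python runs when it is given a zip archive.
    (src / "__main__.py").write_text('print("hello from inside the archive")\n')
    target = Path(workdir) / "app.pyz"
    # The interpreter argument becomes the "#!" line at the start of the file.
    zipapp.create_archive(src, target, interpreter="/usr/bin/env python3")
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        archive = build_demo_archive(tmp)
        # First line of the file: the interpreter directive.
        print(archive.read_bytes().splitlines()[0])
        # The rest is an ordinary zip archive.
        with zipfile.ZipFile(archive) as zf:
            print(zf.namelist())
        # Run the archive with the current interpreter.
        subprocess.run([sys.executable, str(archive)], check=True)
```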
This follows the same philosophy that we see today in Docker containers, where you develop immutable containers and create or destroy them as needed. Similarly, PEX files are immutable artifacts. You package your application so that it can run on any server that can run Python.
You can also run all targets locally without having to maintain complex virtual environments. This also helps in debugging as your projects can use standard versions of 3rdparty dependencies.
Let's go over an example of a BUILD file.
As mentioned, this is the Pants DSL, but it is essentially a function call.
python_library will create a library named :shared_lib that can be used by other targets. You define its dependencies in a simple array and list its source files.
Now we want to use this library to create a CLI binary. We specify the dependencies for the CLI and include the shared library as one of the dependencies.
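The BUILD listing shown on the slide is not part of this transcript, so the following is only a sketch of what such a file might look like, written in Pants 1.x-style syntax. The file names (lib.py, cli.py), the directory src/python/example and the requests dependency are hypothetical. To keep the snippet runnable without Pants, python_library and python_binary are replaced by small stand-in functions that just record their arguments. That also illustrates the "essentially a function call" point, but it is not how Pants itself evaluates BUILD files.

```python
# Stand-ins for the Pants target constructors so this runs without Pants.
# They only record their keyword arguments; real Pants does much more.
REGISTRY = {}


def python_library(name, sources=(), dependencies=()):
    REGISTRY[name] = {
        "type": "python_library",
        "sources": list(sources),
        "dependencies": list(dependencies),
    }


def python_binary(name, source, dependencies=()):
    REGISTRY[name] = {
        "type": "python_binary",
        "source": source,
        "dependencies": list(dependencies),
    }


# Hypothetical contents of src/python/example/BUILD.
BUILD_TEXT = """
python_library(
    name='shared_lib',
    sources=['lib.py'],
    dependencies=[
        '3rdparty/python:requests',
    ],
)

python_binary(
    name='cli',
    source='cli.py',
    dependencies=[
        ':shared_lib',
    ],
)
"""

# Evaluating the file amounts to calling the two functions above.
exec(BUILD_TEXT, {"python_library": python_library,
                  "python_binary": python_binary})

if __name__ == "__main__":
    for name, target in REGISTRY.items():
        print(name, target["type"], target["dependencies"])
```

In a real repository only the text inside BUILD_TEXT would live in the BUILD file. The binary would then be addressed as src/python/example:cli, for instance with the Pants 1.x binary goal (./pants binary src/python/example:cli) to produce a PEX.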
This is a very simple introduction to Pants, and there are some open source repositories using Pants. Let's go over a simple project that I created for this talk.