© 2017 Bloomberg Finance L.P. All rights reserved.

Using Anaconda to Build a Custom Data Science Distribution at Bloomberg | AnacondaCON 2017


I will present the experiences our team has had using the conda package manager to build a custom distribution of free software packages combined with Bloomberg-specific libraries for data retrieval and analytics, targeting other teams within the company as consumers. Conda helps solve one of our central challenges: avoiding inadvertently breaking our users' code while providing an up-to-date Python software stack. The talk will also discuss the architecture of our infrastructure, which is based on conda-forge and buildbot.

Presented at AnacondaCON 2017 by Armin Burgmeier, Bloomberg.

Published in: Data & Analytics


  1. Leveraging the Anaconda Platform to Build a Custom Data Science Distribution at Bloomberg
     AnacondaCON 2017
     Armin Burgmeier <aburgmeier@bloomberg.net>, Senior Software Engineer
     February 9, 2017
  2. Outline
     • Motivation
     • Overview of the Anaconda Platform
     • The Bloomberg Distribution
     • Building and Deployment
     • Lessons Learned
     • Wishlist
  3. Motivation
     • Bloomberg is first and foremost a data company
     • Many teams augment, ingest and analyze financial data
     • Provide a modern data science platform for such teams
     • “Replace Excel with Jupyter notebooks”
     • Combining the Python scientific stack with Bloomberg data and services
  4. Requirements
     • Deployment to the desktop
     • Windows only
     • Support only a limited combination of packages
     • Reproducible runtime environments
     • Facilitate sharing of projects
     Differences from Anaconda:
     • Different set of standard packages (financial domain)
     • Automatic management of packages and environments
     • Allow updates of stable environments for security fixes and adaptations to backend changes
  5. The Anaconda Platform
     • Conda: a cross-platform binary package and environment manager
     • Anaconda: Conda + a set of commonly used Python packages
     • Anaconda Cloud: share notebooks, environments, packages, …
     A conda package in a nutshell: matplotlib-1.5.3-np111py35_1.tar.bz2
     • Name: matplotlib
     • Version: 1.5.3
     • Build string: np111py35_1
     • Tarball: .tar.bz2
     Set of files to install + metadata:
     • Name, Version, Build string
     • Dependencies
     • Platform
     • License
     • …
     (Conda logo from http://conda.pydata.org)
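The naming scheme on slide 5 can be illustrated with a small parser. This is a minimal sketch that assumes the `<name>-<version>-<build>.tar.bz2` layout shown on the slide; the function and class names are hypothetical and not part of conda's API.

```python
from typing import NamedTuple


class CondaPackageName(NamedTuple):
    name: str
    version: str
    build: str


def parse_package_filename(filename: str) -> CondaPackageName:
    """Split '<name>-<version>-<build>.tar.bz2' into its three fields."""
    suffix = ".tar.bz2"
    if not filename.endswith(suffix):
        raise ValueError(f"not a .tar.bz2 conda package: {filename!r}")
    stem = filename[: -len(suffix)]
    # Package names may contain '-', so split from the right: the last two
    # '-'-separated fields are the version and the build string.
    parts = stem.rsplit("-", 2)
    if len(parts) != 3:
        raise ValueError(f"expected name-version-build, got {stem!r}")
    return CondaPackageName(*parts)


example = parse_package_filename("matplotlib-1.5.3-np111py35_1.tar.bz2")
print(example.name, example.version, example.build)
```

Splitting from the right is a deliberate choice here, since a hyphen can appear in the package name but, in this layout, not in the version or build string.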
  6. Building a conda package
     • Packages are built from “recipes” with conda-build
     • Recipes are meant to describe a reproducible build environment
     • A recipe consists of:
       • Name, Version, Build string
       • Dependencies needed to build (such as Python and setuptools for Python packages)
       • Build scripts (might invoke a C/C++ compiler)
       • Tests for the package
     • conda-build ensures binary compatibility
  7. conda-forge
     • Community-driven effort to provide conda recipes and build infrastructure
     • One git repository (“feedstock”) per recipe
     • Each recipe gets built by the infrastructure
     • https://conda-forge.github.io/
     • Packages are available through a separate “channel”
     • Some packages are available through both anaconda and conda-forge
     • Others are exclusive to either anaconda or conda-forge
     • Mix and match
     (conda-forge logo from https://github.com/conda-forge)
  8. The Package Manifest
     • Pin versions of all packages in the manifest
     • Keep a separate manifest of “intended” packages
     • Both manifests are conda packages
     “Locked” manifest:
       python   3.5.2    1
       numpy    1.11.1   py35_2
       pandas   0.19.1   np111_py35_0
       …
     “Intended” manifest:
       python 3.5*
       pandas
       …
     • Similar in idea to the difference between Cargo.lock and Cargo.toml in Rust
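The relationship between the two manifests on slide 8 can be sketched as a consistency check: every entry of the “intended” manifest should be satisfied by a pin in the “locked” manifest. The dictionaries below mirror the slide's example; the data layout, the function name, and the use of glob matching for `3.5*` are assumptions made for illustration, not the format Bloomberg uses.

```python
from fnmatch import fnmatchcase

# "Intended" manifest: package -> version pattern (None means any version).
intended = {"python": "3.5*", "pandas": None}

# "Locked" manifest: package -> (exact version, build string).
locked = {
    "python": ("3.5.2", "1"),
    "numpy": ("1.11.1", "py35_2"),
    "pandas": ("0.19.1", "np111_py35_0"),
}


def unsatisfied(intended, locked):
    """Return a list of human-readable problems; empty if locked satisfies intended."""
    problems = []
    for name, pattern in intended.items():
        if name not in locked:
            problems.append(f"{name}: missing from locked manifest")
            continue
        version, _build = locked[name]
        # Treat the intended spec as a glob over the version string (assumption).
        if pattern is not None and not fnmatchcase(version, pattern):
            problems.append(f"{name}: locked {version} does not match {pattern}")
    return problems


print(unsatisfied(intended, locked))
```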
  9. Platform Versioning
     • Assign a semantic version number to each manifest
     Diagram:
     • Distribution 1: Deprecated
     • Distribution 2: Supported
     • Distribution 3: Supported
     • Distribution 4: Preview
  10. Picking a Platform Version
      • Every version is installed into its own environment
      • Creating a new notebook:
        • Choose the current default major version
      • Opening an existing notebook:
        • Same major version as the one the notebook was created with
        • Latest minor version
      • Deprecating a major version:
        • Refuse to open notebooks with that major version
      • Upgrading an existing notebook:
        • Always a conscious action
        • Run the notebook in the new environment
        • Testing and verification by the developer
      Flow: User action → Determine major version → Find latest minor version → (maybe) Create environment → (maybe) Launch NB server → Open file in NB server
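The selection rule on slide 10 for opening an existing notebook can be sketched as follows. This is a minimal illustration that assumes platform versions are represented as `(major, minor)` tuples; the function name and the exception types are hypothetical.

```python
def pick_environment(available, notebook_major, deprecated_majors):
    """Return the (major, minor) version to open an existing notebook with.

    available: iterable of (major, minor) platform versions.
    notebook_major: major version the notebook was created with.
    deprecated_majors: set of major versions that are no longer opened.
    """
    if notebook_major in deprecated_majors:
        # Deprecated major versions are refused rather than silently upgraded.
        raise ValueError(f"major version {notebook_major} is deprecated")
    candidates = [v for v in available if v[0] == notebook_major]
    if not candidates:
        raise LookupError(f"no platform version with major {notebook_major}")
    # Same major version, latest minor version.
    return max(candidates)


available = [(1, 0), (1, 1), (2, 0), (2, 1), (2, 2), (3, 0)]
print(pick_environment(available, 2, deprecated_majors={1}))
```

Upgrading to a new major version is deliberately absent from this function, matching the slide's point that an upgrade is always a conscious action.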
  11. Garbage Collection
      • Need a way to remove unused packages and environments
      • Observation: we never run a version for which a more recent version exists in the same stable series (same major version)
      • Remove all deprecated versions
      • Remove all versions whose major version is no longer supported
      • Remove all packages no longer installed in any environment
      • Beware of concurrent operations!
      (Diagram: Distribution 1, Distribution 2, Distribution 3, Distribution 4)
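The garbage collection rules on slide 11 can be sketched as a planning function. The sketch assumes each environment is keyed by a `(major, minor)` tuple and maps to the set of package files it uses, and it reads “deprecated versions” as versions superseded by a newer minor release in the same major series; both are assumptions, and all names are hypothetical. It only computes what to remove and does not address the concurrency caveat.

```python
def plan_garbage_collection(environments, supported_majors):
    """Return (versions_to_remove, packages_to_remove), both sorted lists.

    environments: dict mapping (major, minor) -> set of package file names.
    supported_majors: set of major versions that are still supported.
    """
    # Keep only the latest minor version of every supported major version.
    latest_minor = {}
    for major, minor in environments:
        if major in supported_majors:
            latest_minor[major] = max(minor, latest_minor.get(major, minor))
    keep = {(major, minor) for major, minor in latest_minor.items()}

    remove_versions = sorted(set(environments) - keep)

    # A package is garbage if no kept environment still uses it.
    used = set()
    for version in keep:
        used |= environments[version]
    all_packages = set()
    for packages in environments.values():
        all_packages |= packages
    return remove_versions, sorted(all_packages - used)


envs = {
    (1, 0): {"a-1", "b-1"},
    (2, 0): {"a-1", "b-2"},
    (2, 1): {"a-2", "b-2"},
    (3, 0): {"a-2", "b-3"},
}
print(plan_garbage_collection(envs, supported_majors={2, 3}))
```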
  12. Build System
      • Inspired by conda-forge
      • Feedstock repositories separate from upstream code
      • Continuous Integration
        • Buildbot builds the recipe on every PR and every push to master
        • Upload to internal Bloomberg channel
      • Works great for C# codebases as well
      Flow: PR on feedstock repo → Automatic build → Upload to separate channel → Manual testing if needed → Merge PR → Upload to main channel
      (buildbot logo from http://buildbot.net/about.html)
  13. Customization
      Need for customization of an upstream package:
      • The matplotlib conda package depends on Qt
        • Not needed in a Jupyter notebook-based environment
        • No notion of “optional” dependencies in conda
      • Fork the conda-forge matplotlib-feedstock repo
      • Make the customization and add a “noqt” feature to the build
      • The created package is matplotlib-1.5.3-np111py35_noqt_0
        • Avoids collision with packages from other channels
      • Tracking the “noqt” feature in our environment makes conda prefer our customization over the default package
  14. Deployment
      • All builds end up in an internal (“dev”) channel
      • When a new platform version is ready for a wider audience, propagate the platform package and all packages it contains into a production channel
      Diagram:
      • “dev” channel: 1.0, 1.1, 1.2, 2.0
      • “prod” channel: 1.0, 1.1, 1.2
  15. Lessons Learned: Install Order
      • We are using packages from both conda-forge and anaconda
      • Sometimes they don’t play well together
      • bqplot needs ipywidgets installed at install time for its post-install script
      • Circular dependencies are handled fine by the conda solver
      • But there is no guarantee about installation order!
      (Diagram: bqplot, ipywidgets and _nb_ext_conf, labeled with their source channels, conda-forge and anaconda)
      • Workaround: prefer conda-forge over anaconda
      • https://github.com/conda-forge/bqplot-feedstock/issues/11
  16. Lessons Learned: Channel Pinning
      • One package that we pinned is mpmath-0.19-py35_1
      • Originally it was available in the anaconda channel
      • “Suddenly” it became available in conda-forge
        • With different dependencies
      • Build fails because the new dependencies are not pinned
      • Ideally we could pin the channel as well
        • In addition to version and build string
      • Workaround: upload mpmath from anaconda to the Bloomberg channel
      • Ultimate channel priority: Bloomberg -> conda-forge -> anaconda
      (Diagram: mpmath-0.19-py35_1 as published in anaconda and in conda-forge, with dependencies python, mpir, mpfr, gmpy)
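The channel priority from slide 16 and the effect of the workaround can be modeled in a few lines. This is a deliberately simplified model, not conda's solver: it assumes a package identified by its full name-version-build string is taken from the first channel in the priority list that offers it. The channel contents and function name are hypothetical.

```python
CHANNEL_PRIORITY = ["bloomberg", "conda-forge", "anaconda"]


def pick_channel(package, channel_index, priority=CHANNEL_PRIORITY):
    """Return the highest-priority channel offering `package`, or None."""
    for channel in priority:
        if package in channel_index.get(channel, set()):
            return channel
    return None


pinned = "mpmath-0.19-py35_1"

# Before the workaround: the same name/version/build exists in two channels,
# so the higher-priority conda-forge build shadows the anaconda one.
before = {"conda-forge": {pinned}, "anaconda": {pinned}}

# After the workaround: the anaconda build is re-uploaded to the internal channel.
after = {"bloomberg": {pinned}, "conda-forge": {pinned}, "anaconda": {pinned}}

print(pick_channel(pinned, before), pick_channel(pinned, after))
```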
  17. Lessons Learned: Reproducible Builds
      • Build-time dependencies are not pinned
        • Hard to enforce with current conda tools
      • A build that works today might no longer work tomorrow
        • e.g. pandas 0.17.1 changed merge behavior, which broke one of our packages
      • Possible solution:
        • After a successful build, “freeze” the dependency resolution and add it to the recipe
        • On subsequent builds, use the “frozen” dependency resolution
        • Make it an explicit action to re-resolve dependencies
        • Would need separate resolutions for different features, platforms, py/np versions
  18. Wishlist: conda download
      • A command that downloads packages but does not install them
      • Makes it possible to work around the lack of build dependency pinning:
        • Download the build dependencies for a package
        • Add them to a local channel
        • Build the package with dependencies only from that channel
        • Conda is forced to resolve dependencies with the previously downloaded packages
      • Makes it possible to ship packages so they can be installed later without connectivity to the original channels
      • http://github.com/conda/conda/issues/1150
  19. Wishlist: Parallelize Install Steps
      • Optimizing the install time of the first environment is crucial in our scenario
      • Creating a conda environment takes three steps:
        • Download package tarballs (network I/O-bound)
        • Extract package tarballs (CPU-bound)
        • Install packages into the environment (disk I/O and/or CPU-bound)
      • Download size is on the order of 200 MiB
      • The first two steps could be (easily?) parallelized
      (Diagram: timeline in which downloads of packages A to D overlap with extraction of packages A to D)
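The overlap of the first two steps on slide 19 can be sketched with a thread pool in which each package is extracted as soon as its own download finishes. This is an illustration of the idea, not how conda is implemented; `download` and `extract` are caller-supplied stand-ins, and the fake versions below only manipulate strings.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_and_extract_all(packages, download, extract, max_workers=4):
    """Run download-then-extract per package concurrently.

    Results are returned in the same order as `packages`.
    """
    def pipeline(package):
        # Extraction of one package can overlap with downloads of others.
        return extract(download(package))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(pipeline, packages))


def fake_download(package):
    # Stand-in for a network fetch: returns the tarball name.
    return f"{package}.tar.bz2"


def fake_extract(tarball):
    # Stand-in for unpacking: returns the extracted directory name.
    return tarball[: -len(".tar.bz2")] + "/"


results = fetch_and_extract_all(["A", "B", "C", "D"], fake_download, fake_extract)
print(results)
```

Threads suit the network-bound download step well; how much the CPU-bound extraction step gains from threads depends on whether the decompression code releases the interpreter lock, so a process pool is a possible alternative.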
  20. Wishlist: .xz conda packages
      • LZMA has a better compression ratio and better decompression speed than bzip2
      • Would significantly improve the time to download packages and create an environment
      Test object: win-64/mkl-11.3.3-1.tar.bz2
        Method   Size (MiB)   Decompression time   Decompression memory
        bz2      110.0        18.6 s               4M
        xz       73.3         9.6 s                8M
        xz -9    59.0         8.3 s                64M
      • Drawbacks:
        • Higher memory requirements at decompression
        • Extra dependency in Python 2.7 (backports.lzma)
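A comparison in the spirit of slide 20 can be run with the standard library's `bz2` and `lzma` modules. The sketch below measures compressed sizes only, on a synthetic payload chosen for illustration; it does not reproduce the slide's measurements, and results depend heavily on the input.

```python
import bz2
import lzma


def compressed_sizes(payload: bytes) -> dict:
    """Return the compressed size in bytes for each method."""
    return {
        "bz2": len(bz2.compress(payload)),
        "xz": len(lzma.compress(payload)),  # default preset
        "xz -9": len(lzma.compress(payload, preset=9)),
    }


# Synthetic, highly repetitive payload (assumption: stands in for a package).
payload = b"conda package payload " * 10000

for method, size in compressed_sizes(payload).items():
    print(f"{method:6s} {size} bytes")

# Both formats must round-trip the data unchanged.
assert bz2.decompress(bz2.compress(payload)) == payload
assert lzma.decompress(lzma.compress(payload)) == payload
```

To compare against a real package, the payload can be replaced with the uncompressed tar contents of that package.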
  21. Conclusions
      • The Anaconda Platform together with conda-forge is a great ecosystem for creating Python distributions
      • Bloomberg builds its own provisioning of environments around it
        • Automatic management of environments
        • Long-term support for existing notebooks
        • Allow minor updates to stable environments for continued maintenance
      • Mixing anaconda and conda-forge has some quirks
      • Pinning of packages
        • “Intended” set of packages vs. “frozen” set of packages
  22. Thank you!
      AnacondaCON 2017
      Armin Burgmeier <aburgmeier@bloomberg.net>, Senior Software Engineer
  23. Lessons Learned: Conflict Hints
      • A conflict occurs when dependency constraints cannot be satisfied
        • e.g. ipywidgets=5.2.2 widgetsnbextension=1.2.3
        • ipywidgets depends on widgetsnbextension >= 1.2.6
      • conda creates “hints” on how to resolve a conflict:
          The following specifications were found to be in conflict:
            - alabaster 0.7.8 py35_0
            - widgetsnbextension 1.2.3 py35_1
      • alabaster just happens to be the first entry in the list of dependencies
      • http://github.com/conda/conda/issues/1859
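The conflict in the example on slide 23 can be reproduced with a small check that names the two packages actually involved. This sketch is not conda's solver and supports only `>=` constraints on dotted numeric versions; the data layout and function names are assumptions.

```python
def parse_version(text):
    """Turn '1.2.6' into (1, 2, 6) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


# Pinned specs from the slide's example.
pins = {"ipywidgets": "5.2.2", "widgetsnbextension": "1.2.3"}

# package -> {dependency: minimum version}, i.e. "dependency >= minimum".
requirements = {"ipywidgets": {"widgetsnbextension": "1.2.6"}}


def find_conflicts(pins, requirements):
    """Return (package, dependency, minimum, pinned) for every violated constraint."""
    conflicts = []
    for package, deps in requirements.items():
        if package not in pins:
            continue
        for dep, minimum in deps.items():
            pinned = pins.get(dep)
            if pinned is not None and parse_version(pinned) < parse_version(minimum):
                conflicts.append((package, dep, minimum, pinned))
    return conflicts


for package, dep, minimum, pinned in find_conflicts(pins, requirements):
    print(f"{package} needs {dep} >= {minimum}, but {dep} is pinned to {pinned}")
```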
  24. Environment extensions
      • What if you need a package not included in the platform?
      • Add an IPython extension “%install”
        • Runs the equivalent of
          pip install -t some-directory <name>==<version> --no-deps --only-binary=:all:
        • Adds it to sys.path
      • Pros:
        • Reproducible
        • Sources the Python package archive
      • Cons:
        • No dependency resolution
        • No installation of data files (such as JavaScript for IPython widgets)
        • Only works for wheels (no custom code execution at install time)
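The pip invocation on slide 24 can be sketched as a helper that builds the argument list and a second helper that makes the target directory importable. This is not the actual “%install” extension, whose implementation the slides do not show; the function names, the package, and the directory below are hypothetical. The command is only constructed here, not executed.

```python
import sys


def build_install_command(name, version, target_dir):
    """Build the pip command from the slide for an exact version, wheels only."""
    return [
        sys.executable, "-m", "pip", "install",
        "-t", target_dir,            # install into a separate directory
        f"{name}=={version}",        # exact pin keeps the install reproducible
        "--no-deps",                 # no dependency resolution
        "--only-binary=:all:",       # wheels only, no code run at install time
    ]


def activate(target_dir, path=None):
    """Append target_dir to the import path (sys.path by default) if missing."""
    path = sys.path if path is None else path
    if target_dir not in path:
        path.append(target_dir)
    return path


# Hypothetical package, version and directory, chosen for illustration.
command = build_install_command("requests", "2.12.4", "some-directory")
print(" ".join(command[1:]))
```

The resulting list could be passed to `subprocess.run(command, check=True)` before calling `activate` on the same directory.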
  25. Environment extensions (cont.)
      • Alternative: use conda
      • User specifies extra requirements, for example pandas >=0.19.1
      • When creating the environment:
        • Install packages from the platform
        • Then install the extra requirements
        • “Freeze” the list of additional packages installed (possibly replacing platform packages)
      • When re-creating the environment:
        • Install the packages from the recorded (“frozen”) package list
        • Might create a conflict when the minor platform version has changed:
          • Conda should be able to solve it by downgrading some packages in the platform
          • Provide an option to re-resolve requirements at a later point
