SlideShare a Scribd company logo
1 of 43
Download to read offline
Effectively using Open Source
with conda
Travis E. Oliphant, PhD	

Continuum Analytics, Inc
The Opportunity
• Millions of
projects that can
be used in the
enterprise	

• Not enough to
just adopt once
— these projects
change rapidly 	

• Effective use
requires a plan
for managing
updates
The Challenge
Separation of Concerns leads to granular libraries with
often deep dependencies
The Challenge
• Different “entry-points” (end-user applications or
scripts) can have different dependencies. Often many
of the dependencies are shared but a few applications
need different versions of some packages.	

• Not specific to any particular language or ecosystem.
Python, Ruby, Node.Js, C/C++, .NET, Java, all have the
same problem: How do you manage software life-cycle
effectively?	

• Production deployments need stability. IT managers
want ease of deployment and testing. Developers want
agility and ease of development.
The Challenge
How can developers and domain
experts in an organization quickly and
easily take advantage of the latest
software developments yet still have
stable production deployments of
complex software?
You cannot take full advantage of the
pace of open-source development if
you don’t address this!
Case Study: SciPy
There was this thing called the
Internet and one could make a
web-page and put code up on
it and people started using it ...	

Facebook for Hackers
I started SciPy in 1999 while I was in grad-
school at the Mayo Clinic	

(it was called Multipack back then)
Case Study: SciPy
Packaging circa 1999: Source tar ball and
make file (users had to build)
SciPy is basically a bunch of C/C++/Fortran routines
with Python interfaces
Observation: Popularity of Multipack (Early SciPy)
grew significantly when Robert Kern made pre-
built binaries for Windows
Case Study: SciPy
• Difficulty of producing binaries plus the desire to avoid
the dependency chain and lack of broad packaging
solutions led to early SciPy being a “distribution” instead
of separate inter-related libraries.	

• There were (and are) too many different projects in
SciPy (projects need 1-5 core contributors for
communication dynamic reasons related to team-sizes)
Case Study: NumPy
I started writing NumPy in 2005 while I
was teaching at BYU (it was a merger
of Numeric and Numarray)
NumPy ABI has not changed
“officially” since 1.0 came out in 2006
Presumably extension modules (SciPy, scikit-learn, matplotlib,
etc.) compiled against NumPy 1.0 will still work on NumPy 1.8.1
This was not a design goal!!!
Case Study: NumPy
This was a point of some contention and
community difficulty when date-time was
added in version 1.4 (impossible without
changing the ABI in some way) but not
really settled until version 1.7
The fundamental reason was a user-driven
obsession with keeping ABI compatibility.
Windows users lacked useful packaging
solution in face of NumPy-Stack
NumPy Stack (cry for conda...)
NumPy
SciPy Pandas Matplotlib
scikit-learnscikit-image statsmodels
PyTables
OpenCV
Cython
Numba SymPy NumExpr
astropy BioPython GDALPySAL
... many many more ...
Fundamental Principles
•Complex things are built out of simple things	

•Fundamental principle of software engineering is
“separation of concerns” (modularity)	

•Reusability is enhanced when you “do one thing
and do it well”	

•But, to deploy you need to bring the pieces back
together.	

!
•This means you need a good packaging system for
binary artifacts — with multiple-environments.
Continuum Solutions (Free)
Conda
binstar.org Anaconda
Free all-in-one distribution of Python for
Analytics andVisualization	

• numpy, scipy, ipython	

• matplotlib, bokeh, 	

• pandas, statsmodels, scikit-learn	

• many, many more… 100+
Miniconda
Python + conda — with these you can
install exactly what you want…
• Binary repository of packages (public)	

• Multiple package types 	

• Free public build queue	

• Current focus on:	

• Python pypi-compatible packages
(source distributions)	

• conda packages (binary distributions)
$ conda install anaconda
• Cross-platform package manager	

• Dependency management (uses SAT
solver to resolve all dependencies)	

• System-level virtual environments (more
flexible than virtualenv)
Continuum Solutions (Premium)
Anaconda	

Server
• Binary repository for private package
Premium features:	

• hosting of private packages (public
packages are free)	

• access to priority build queue	

• $10 / month (individuals)	

• 25 private packages	

• 5 GB disk space	

• $50 / month (organizations)	

• 200 private packages	

• 30 GB disk space	

• right to have private packages in
organizations	

• $1500 / year	

• unlimited private packages	

• 100 GB of disk space
binstar.org
• Internal mirror of public repositories	

• Mix private internal packages with public
repositories	

• Build customized versions of Anaconda
installers	

• Environment to .exe and .rpm tools	

• Comprehensive licensing	

• Comprehensive support	

• On-premise version of binstar.org
System Packaging solutions
yum (rpm)	

apt-get (dpkg)
Linux OSX
macports 	

homebrew
Windows
chocolatey	

npackd
Cross-platform
conda
With virtual environments conda provides a modern, cross-
platform, system-level packaging and deployment solution
Conda Features
• Excellent support for “system-level” environments (like
having miniVMs but much lighter weight than docker.io)	

• Minimizes code-copies (uses hard/soft links if possible)	

• Dependency solver using fast satisfiability solver (SAT
solver)	

• Simple format binary tar-ball + meta-data 	

• Meta-data allows static analysis of dependencies	

• Easy to create multiple “channels” which are repositories
for binary packages	

• User installable (no root privileges needed)	

• Can still use tools like pip --- conda fills in where they
fail.
Examples
Setup a test environment
$ conda update conda
$ conda create -n test python pip
$ source activate test
Install another package
(test)$ conda install scikit-learn
$ activate test
Windows
First steps
$ conda create -n py3k python=3.3
$ source activate py3k
Create an environment
Install IPython notebook
(py3k) $ conda install ipython-notebook
$ conda create -n py3k python=3.3 ipython-notebook
$ source activate py3k
All in One
Anaconda installation
ROOT_DIR!
The directory that Anaconda was installed into; for
example, /opt/Anaconda or C:Anaconda!
/pkgs!
Also referred to as PKGS_DIR. This directory contains
exploded packages, ready to be linked in conda
environments. Each package resides in a subdirectory
corresponding to its canonical name.!
/envs!
The system location for additional conda environments to be
created.!
!
the default, or root, environment!
/bin!
/include!
/lib!
/share
Look at conda package --- a simple .tar.bz2
http://docs.continuum.io/conda/intro.html
Anatomy of unpacked conda package
/lib	

/include	

/bin	

/man
/info	

files	

index.json
bzipped tarfile of all the files comprising
the package at the full-paths they would
be installed to relative to a “system”
install or “chroot jail”
an environment is just a “union” of these paths
All conda packages have this info directory
which contains meta-data for tracked files,
dependency information, etc.
Environments
One honking great idea!	

Let’s do more of those!
Easy to make	

Easy to throw away
Uses:	

• Testing (python 2.6, 2.7, 3.3)	

• Development	

• Trying new packages from PyPI	

• Separating deployed apps with
different dependency needs	

• Trying new versions of Python	

• Reproducing someone’s work conda create -h
conda info -e
Getting System information
Basic info
conda info
Named-environment info
conda info --all
System info
conda info --system
conda install -n py3k scipy pip
http://repo.continuum.io/pkgs/dev
Experimental or developmental versions of packages
http://repo.continuum.io/pkgs/gpl
GPL licensed packages
http://repo.continuum.io/pkgs/free
non GPL open source packages
Default package repositories (configurable)
Installing packages
How it works
Channel 1
Channel 2
Channel N
metadata
metadata
metadata
conda
merged	

metadata
l
l
l
Create channels
• Create a directory of conda packages	

• Run conda index <dirname>	

• Either use file:///path/to/dir in .condarc or
use simple web server on the /path/to/dir
Option 1
Option 2
Use binstar.org (also available as on-premise solution	

with Anaconda Server)
Binstar.org — channels (request invite)
conda install -c
<channel name>
<pkg name> 	

!
will install from
binstar channel	

!
or you can add
channel to your
config file
free for public packages
conda list also includes packages installed via pip!
List Installed packages
conda create -n py3k scipy pip	

source activate py3k	

pip install pint
$ conda list
# packages in environment at /Users/travis/anaconda/envs/py3k:
#
numpy 1.8.1 py27_0
openssl 1.0.1g 0
pint 0.4.2 <pip>
pip 1.5.4 py27_0
python 2.7.6 1
readline 6.2 2
scipy 0.13.3 np18py27_0
setuptools 3.1 py27_0
sqlite 3.7.13 1
tk 8.5.13 1
wsgiref 0.1.2 <pip>
zlib 1.2.7 1
Output
Update a package to latest
conda update pandas
get the latest pandas from the 	

channels you are subscribed to
conda update anaconda
change to the latest released anaconda
including its specific dependencies
this can downgrade packages if
they are newer than those in
the “released” Anaconda
conda update --all
To update all the packages in an
environment to the latest
versions use the --all option
conda search <regex>Search for a package
Find packages and channels they are in
conda search --outdated sympy
Only show packages matching regex
that are installed but outdated
conda search typo
typogrify * 2.0.0 py27_0 http://conda.binstar.org/travis/osx-64/
2.0.0 py33_1 http://conda.binstar.org/asmeurer/osx-64/
2.0.0 py26_1 http://conda.binstar.org/asmeurer/osx-64/
sympy 0.7.1 py27_0 defaults
!
0.7.4 py26_0 defaults
0.7.4.1 py33_0 defaults
* 0.7.4.1 py27_0 defaults
0.7.4.1 py26_0 defaults
0.7.5 py34_0 defaults
0.7.5 py33_0 defaults
l
l
l
l
l
l
conda remove -n py3k scipy matplotlib
Removing files and environments
Removing Packages
Removing Environment
conda remove -n py3k --all
Note: packages are
just “unlinked” from
environment. All the
files are still available
unpacked in a
package cache.
Removing unused packages
conda clean -t	

conda clean -p
Remove unused tarballs
Remove unused directories
conda package -u	

conda package --pkg-name bulk --pkg-version 0.1
Untracked Files
Easy way to install into an environment using
anything (pip, make, setup.py, etc.) and then package
up all of it into a binary tar-ball deployable via 	

conda install <pkg-name>.tar.bz2
!
pickle for binary code!
# This is a sample .condarc file
!
# channel locations. These override conda defaults, i.e., conda will
# search *only* the channels listed here, in the order given. Use "default" to
# automatically include all default channels.
!
channels:
- defaults
- http://some.custom/channel
!
# Proxy settings
# http://[username]:[password]@[server]:[port]
proxy_servers:
http: http://user:pass@corp.com:8080
https: https://user:pass@corp.com:8080
!
envs_dirs:
- /opt/anaconda/envs
- /home/joe/my-envs
!
pkg_dirs:
- /home/joe/user-pkg-cache
- /opt/system/pkgs
!
changeps1: False
!
# binstar.org upload (not defined here means ask)
binstar_upload: True
Conda configuration
Scripting interface
conda config —add KEY VALUE
conda config —remove-key KEY
conda config —get KEY
conda config —set KEY BOOL
conda config —remove KEY VALUE
conda skeleton pypi <pypi-name>
Building new packages
conda build <recipe-dir>
Option 1
Option 2
conda pipbuild <pypi-name>
conda install conda-build
Conda Recipe is a directory
build.sh BASH build commands (POSIX)	

bld.bat CMD build commands (Win)	

meta.yaml extended yaml declarative meta-data
Required
Optional
run_test.py will be executed during test phase	

*.patch patch-files for the source 	

* any other resources needed by build but not included	

in sources described in meta.yaml file
Recipe MetaData
package:
name: # name of package
version: # version of package
about:
home: # home-page
license: # license
!
# All optional from here....
source:
fn: # filename of source
url: # url of source
md5: # hash of source
# or from git:
git_url:
git_tag:
patches: # list of patches to source
- fix.patch
build:
entry_points: # entry-points (binary commands or scripts)
- name = module:function
number: # defaults to 0
requirements: # lists of requirements
build: # requirements for build (as a list)
run: # requirements for running (as a list)
test:
requires: # list of requirements for testing
commands: # commands to run for testing (entry-points)
imports: # modules to import for testing
http://docs.continuum.io/conda/build.html
Converting to another platform
Conda packages are specific to a particular
platform. However, if there are no platform-
specific binary files in a package, it can be
converted automatically to a package that can
be installed on another platform.
conda convert --output-dir win32 --platform win-32 <package-file>
Example
Binstar.org (request invite)
Once you
have built a
conda
package, you
can share it
with the
world on
binstar.org	

!
conda install
-c <name>	

<pkgname>
free for public packages
Binstar
$ conda config --add channels
'http://conda.binstar.org/travis'
$ conda config --add channels
'http://conda.binstar.org/asmuerer'
Adding channels
Uploading packages
binstar upload /full/path/to/package.tar.bz2
binstar register /full/path/to/package.tar.bz2
if package never uploaded before
Binstar Package Types
Permissions Description
Private
Only people given permission can see this
package.
Personal
Everyone will be able to see this package in
your user repository.
Publish
This package will be published in the global
public repository.
Useful aliases
workon=‘source activate’	

workoff=‘source deactivate’
• Cross-platform Tested and Supported Python
Distribution	

• Enterprise Python Deployment	

• Private, Secure On-premise package repository	

• Comprehensive Licensing	

• Customized Installers and Mirrors	

• Additional Products	

• Enhanced Support	

• Optional, On-premise binstar.org
Thanks!
Aaron Meurer	

conda and binstar developer
Sean Ross-Ross (principal binstar.org)
BryanVan deVen (original conda author)
Ilan Schnell (principal conda developer)

More Related Content

What's hot

Introducing GitLab (September 2018)
Introducing GitLab (September 2018)Introducing GitLab (September 2018)
Introducing GitLab (September 2018)Noa Harel
 
Introducing GitLab (June 2018)
Introducing GitLab (June 2018)Introducing GitLab (June 2018)
Introducing GitLab (June 2018)Noa Harel
 
Intro to Kubernetes & GitOps Workshop
Intro to Kubernetes & GitOps WorkshopIntro to Kubernetes & GitOps Workshop
Intro to Kubernetes & GitOps WorkshopWeaveworks
 
Gitlab ci, cncf.sk
Gitlab ci, cncf.skGitlab ci, cncf.sk
Gitlab ci, cncf.skJuraj Hantak
 
Jenkins Introduction
Jenkins IntroductionJenkins Introduction
Jenkins IntroductionPavan Gupta
 
Introduction to Kubernetes and Google Container Engine (GKE)
Introduction to Kubernetes and Google Container Engine (GKE)Introduction to Kubernetes and Google Container Engine (GKE)
Introduction to Kubernetes and Google Container Engine (GKE)Opsta
 
GITS Class #16: CI/CD (Continuous Integration & Continuous Deployment) with G...
GITS Class #16: CI/CD (Continuous Integration & Continuous Deployment) with G...GITS Class #16: CI/CD (Continuous Integration & Continuous Deployment) with G...
GITS Class #16: CI/CD (Continuous Integration & Continuous Deployment) with G...GITS Indonesia
 
Github Case Study By Amil Ali
Github Case Study By Amil AliGithub Case Study By Amil Ali
Github Case Study By Amil AliAmilAli1
 
Introduction to github slideshare
Introduction to github slideshareIntroduction to github slideshare
Introduction to github slideshareRakesh Sukumar
 
Introduction to GitHub
Introduction to GitHubIntroduction to GitHub
Introduction to GitHubNishan Bose
 
Introduction to openshift
Introduction to openshiftIntroduction to openshift
Introduction to openshiftMamathaBusi
 

What's hot (20)

Github
GithubGithub
Github
 
Introducing GitLab (September 2018)
Introducing GitLab (September 2018)Introducing GitLab (September 2018)
Introducing GitLab (September 2018)
 
Introducing GitLab (June 2018)
Introducing GitLab (June 2018)Introducing GitLab (June 2018)
Introducing GitLab (June 2018)
 
Intro to Kubernetes & GitOps Workshop
Intro to Kubernetes & GitOps WorkshopIntro to Kubernetes & GitOps Workshop
Intro to Kubernetes & GitOps Workshop
 
Gitlab ci, cncf.sk
Gitlab ci, cncf.skGitlab ci, cncf.sk
Gitlab ci, cncf.sk
 
GitHub Basics - Derek Bable
GitHub Basics - Derek BableGitHub Basics - Derek Bable
GitHub Basics - Derek Bable
 
Qt programming-using-cpp
Qt programming-using-cppQt programming-using-cpp
Qt programming-using-cpp
 
Introducing GitLab
Introducing GitLabIntroducing GitLab
Introducing GitLab
 
Jenkins Introduction
Jenkins IntroductionJenkins Introduction
Jenkins Introduction
 
Introduction to Kubernetes and Google Container Engine (GKE)
Introduction to Kubernetes and Google Container Engine (GKE)Introduction to Kubernetes and Google Container Engine (GKE)
Introduction to Kubernetes and Google Container Engine (GKE)
 
Gitlab, GitOps & ArgoCD
Gitlab, GitOps & ArgoCDGitlab, GitOps & ArgoCD
Gitlab, GitOps & ArgoCD
 
GITS Class #16: CI/CD (Continuous Integration & Continuous Deployment) with G...
GITS Class #16: CI/CD (Continuous Integration & Continuous Deployment) with G...GITS Class #16: CI/CD (Continuous Integration & Continuous Deployment) with G...
GITS Class #16: CI/CD (Continuous Integration & Continuous Deployment) with G...
 
Gitlab CI/CD
Gitlab CI/CDGitlab CI/CD
Gitlab CI/CD
 
GitOps w/argocd
GitOps w/argocdGitOps w/argocd
GitOps w/argocd
 
Jenkins
JenkinsJenkins
Jenkins
 
Github Case Study By Amil Ali
Github Case Study By Amil AliGithub Case Study By Amil Ali
Github Case Study By Amil Ali
 
Openshift
Openshift Openshift
Openshift
 
Introduction to github slideshare
Introduction to github slideshareIntroduction to github slideshare
Introduction to github slideshare
 
Introduction to GitHub
Introduction to GitHubIntroduction to GitHub
Introduction to GitHub
 
Introduction to openshift
Introduction to openshiftIntroduction to openshift
Introduction to openshift
 

Viewers also liked

Anaconda and PyData Solutions
Anaconda and PyData SolutionsAnaconda and PyData Solutions
Anaconda and PyData SolutionsTravis Oliphant
 
Python as the Zen of Data Science
Python as the Zen of Data SciencePython as the Zen of Data Science
Python as the Zen of Data ScienceTravis Oliphant
 
IPython from 30,000 feet
IPython from 30,000 feetIPython from 30,000 feet
IPython from 30,000 feettakluyver
 
Continuum Analytics and Python
Continuum Analytics and PythonContinuum Analytics and Python
Continuum Analytics and PythonTravis Oliphant
 
Lambda Architecture Using SQL
Lambda Architecture Using SQLLambda Architecture Using SQL
Lambda Architecture Using SQLSATOSHI TAGOMORI
 
Lambda Architecture and open source technology stack for real time big data
Lambda Architecture and open source technology stack for real time big dataLambda Architecture and open source technology stack for real time big data
Lambda Architecture and open source technology stack for real time big dataTrieu Nguyen
 
Jupyter notebook 이해하기
Jupyter notebook 이해하기 Jupyter notebook 이해하기
Jupyter notebook 이해하기 Yong Joon Moon
 
Big Data and Fast Data - Lambda Architecture in Action
Big Data and Fast Data - Lambda Architecture in ActionBig Data and Fast Data - Lambda Architecture in Action
Big Data and Fast Data - Lambda Architecture in ActionGuido Schmutz
 

Viewers also liked (11)

Anaconda and PyData Solutions
Anaconda and PyData SolutionsAnaconda and PyData Solutions
Anaconda and PyData Solutions
 
Python as the Zen of Data Science
Python as the Zen of Data SciencePython as the Zen of Data Science
Python as the Zen of Data Science
 
IPython from 30,000 feet
IPython from 30,000 feetIPython from 30,000 feet
IPython from 30,000 feet
 
Continuum Analytics and Python
Continuum Analytics and PythonContinuum Analytics and Python
Continuum Analytics and Python
 
Power point
Power pointPower point
Power point
 
Lambda Architecture Using SQL
Lambda Architecture Using SQLLambda Architecture Using SQL
Lambda Architecture Using SQL
 
Lambda Architecture and open source technology stack for real time big data
Lambda Architecture and open source technology stack for real time big dataLambda Architecture and open source technology stack for real time big data
Lambda Architecture and open source technology stack for real time big data
 
Bids talk 9.18
Bids talk 9.18Bids talk 9.18
Bids talk 9.18
 
Code with style
Code with styleCode with style
Code with style
 
Jupyter notebook 이해하기
Jupyter notebook 이해하기 Jupyter notebook 이해하기
Jupyter notebook 이해하기
 
Big Data and Fast Data - Lambda Architecture in Action
Big Data and Fast Data - Lambda Architecture in ActionBig Data and Fast Data - Lambda Architecture in Action
Big Data and Fast Data - Lambda Architecture in Action
 

Similar to Effectively using Open Source with conda

Installing Anaconda Distribution of Python
Installing Anaconda Distribution of PythonInstalling Anaconda Distribution of Python
Installing Anaconda Distribution of PythonJatin Miglani
 
Docker based-Pipelines with Codefresh
Docker based-Pipelines with CodefreshDocker based-Pipelines with Codefresh
Docker based-Pipelines with CodefreshCodefresh
 
Docker based-pipelines
Docker based-pipelinesDocker based-pipelines
Docker based-pipelinesDevOps.com
 
2018 ABRF Tools for improving rigor and reproducibility in bioinformatics
2018 ABRF Tools for improving rigor and reproducibility in bioinformatics2018 ABRF Tools for improving rigor and reproducibility in bioinformatics
2018 ABRF Tools for improving rigor and reproducibility in bioinformaticsStephen Turner
 
O365Con19 - Sharing Code Efficiently in your Organisation - Elio Struyf
O365Con19 - Sharing Code Efficiently in your Organisation - Elio StruyfO365Con19 - Sharing Code Efficiently in your Organisation - Elio Struyf
O365Con19 - Sharing Code Efficiently in your Organisation - Elio StruyfNCCOMMS
 
Conda: A Cross-Platform Package Manager for Any Binary Distribution (SciPy 2014)
Conda: A Cross-Platform Package Manager for Any Binary Distribution (SciPy 2014)Conda: A Cross-Platform Package Manager for Any Binary Distribution (SciPy 2014)
Conda: A Cross-Platform Package Manager for Any Binary Distribution (SciPy 2014)Aaron Meurer
 
Fluo CICD OpenStack Summit
Fluo CICD OpenStack SummitFluo CICD OpenStack Summit
Fluo CICD OpenStack SummitMiguel Zuniga
 
The Latest Status of CE Workgroup Shared Embedded Linux Distribution Project
 The Latest Status of CE Workgroup Shared Embedded Linux Distribution Project The Latest Status of CE Workgroup Shared Embedded Linux Distribution Project
The Latest Status of CE Workgroup Shared Embedded Linux Distribution ProjectYoshitake Kobayashi
 
Undine: Turnkey Drupal Development Environments
Undine: Turnkey Drupal Development EnvironmentsUndine: Turnkey Drupal Development Environments
Undine: Turnkey Drupal Development EnvironmentsDavid Watson
 
Managing Software Dependencies and the Supply Chain_ MIT EM.S20.pdf
Managing Software Dependencies and the Supply Chain_ MIT EM.S20.pdfManaging Software Dependencies and the Supply Chain_ MIT EM.S20.pdf
Managing Software Dependencies and the Supply Chain_ MIT EM.S20.pdfAndrew Lamb
 
Building distribution packages with Docker
Building distribution packages with DockerBuilding distribution packages with Docker
Building distribution packages with DockerBruno Cornec
 
Opening words at DockerCon Europe by Ben Golub
Opening words at DockerCon Europe by Ben Golub Opening words at DockerCon Europe by Ben Golub
Opening words at DockerCon Europe by Ben Golub Docker, Inc.
 
Leverage the power of Open Source in your company
Leverage the power of Open Source in your company Leverage the power of Open Source in your company
Leverage the power of Open Source in your company Guillaume POTIER
 
Survey of Container Build Tools
Survey of Container Build ToolsSurvey of Container Build Tools
Survey of Container Build ToolsMichael Ducy
 
Leonid Vasilyev "Building, deploying and running production code at Dropbox"
Leonid Vasilyev  "Building, deploying and running production code at Dropbox"Leonid Vasilyev  "Building, deploying and running production code at Dropbox"
Leonid Vasilyev "Building, deploying and running production code at Dropbox"IT Event
 
2022.03.23 Conda and Conda environments.pptx
2022.03.23 Conda and Conda environments.pptx2022.03.23 Conda and Conda environments.pptx
2022.03.23 Conda and Conda environments.pptxPhilip Ashton
 
Ben keynote 5
Ben keynote 5Ben keynote 5
Ben keynote 5Ben Golub
 

Similar to Effectively using Open Source with conda (20)

Installing Anaconda Distribution of Python
Installing Anaconda Distribution of PythonInstalling Anaconda Distribution of Python
Installing Anaconda Distribution of Python
 
Docker based-Pipelines with Codefresh
Docker based-Pipelines with CodefreshDocker based-Pipelines with Codefresh
Docker based-Pipelines with Codefresh
 
Conda environment system & how to use it on CSUC machines
Conda environment system & how to use it on CSUC machinesConda environment system & how to use it on CSUC machines
Conda environment system & how to use it on CSUC machines
 
Docker based-pipelines
Docker based-pipelinesDocker based-pipelines
Docker based-pipelines
 
2018 ABRF Tools for improving rigor and reproducibility in bioinformatics
2018 ABRF Tools for improving rigor and reproducibility in bioinformatics2018 ABRF Tools for improving rigor and reproducibility in bioinformatics
2018 ABRF Tools for improving rigor and reproducibility in bioinformatics
 
O365Con19 - Sharing Code Efficiently in your Organisation - Elio Struyf
O365Con19 - Sharing Code Efficiently in your Organisation - Elio StruyfO365Con19 - Sharing Code Efficiently in your Organisation - Elio Struyf
O365Con19 - Sharing Code Efficiently in your Organisation - Elio Struyf
 
Conda: A Cross-Platform Package Manager for Any Binary Distribution (SciPy 2014)
Conda: A Cross-Platform Package Manager for Any Binary Distribution (SciPy 2014)Conda: A Cross-Platform Package Manager for Any Binary Distribution (SciPy 2014)
Conda: A Cross-Platform Package Manager for Any Binary Distribution (SciPy 2014)
 
Fluo CICD OpenStack Summit
Fluo CICD OpenStack SummitFluo CICD OpenStack Summit
Fluo CICD OpenStack Summit
 
Bioconda and the Conda Package Manager
Bioconda and the Conda Package ManagerBioconda and the Conda Package Manager
Bioconda and the Conda Package Manager
 
The Latest Status of CE Workgroup Shared Embedded Linux Distribution Project
 The Latest Status of CE Workgroup Shared Embedded Linux Distribution Project The Latest Status of CE Workgroup Shared Embedded Linux Distribution Project
The Latest Status of CE Workgroup Shared Embedded Linux Distribution Project
 
Undine: Turnkey Drupal Development Environments
Undine: Turnkey Drupal Development EnvironmentsUndine: Turnkey Drupal Development Environments
Undine: Turnkey Drupal Development Environments
 
Managing Software Dependencies and the Supply Chain_ MIT EM.S20.pdf
Managing Software Dependencies and the Supply Chain_ MIT EM.S20.pdfManaging Software Dependencies and the Supply Chain_ MIT EM.S20.pdf
Managing Software Dependencies and the Supply Chain_ MIT EM.S20.pdf
 
Building distribution packages with Docker
Building distribution packages with DockerBuilding distribution packages with Docker
Building distribution packages with Docker
 
Opening words at DockerCon Europe by Ben Golub
Opening words at DockerCon Europe by Ben Golub Opening words at DockerCon Europe by Ben Golub
Opening words at DockerCon Europe by Ben Golub
 
Leverage the power of Open Source in your company
Leverage the power of Open Source in your company Leverage the power of Open Source in your company
Leverage the power of Open Source in your company
 
R reproducibility
R reproducibilityR reproducibility
R reproducibility
 
Survey of Container Build Tools
Survey of Container Build ToolsSurvey of Container Build Tools
Survey of Container Build Tools
 
Leonid Vasilyev "Building, deploying and running production code at Dropbox"
Leonid Vasilyev  "Building, deploying and running production code at Dropbox"Leonid Vasilyev  "Building, deploying and running production code at Dropbox"
Leonid Vasilyev "Building, deploying and running production code at Dropbox"
 
2022.03.23 Conda and Conda environments.pptx
2022.03.23 Conda and Conda environments.pptx2022.03.23 Conda and Conda environments.pptx
2022.03.23 Conda and Conda environments.pptx
 
Ben keynote 5
Ben keynote 5Ben keynote 5
Ben keynote 5
 

More from Travis Oliphant

Array computing and the evolution of SciPy, NumPy, and PyData
Array computing and the evolution of SciPy, NumPy, and PyDataArray computing and the evolution of SciPy, NumPy, and PyData
Array computing and the evolution of SciPy, NumPy, and PyDataTravis Oliphant
 
SciPy Latin America 2019
SciPy Latin America 2019SciPy Latin America 2019
SciPy Latin America 2019Travis Oliphant
 
Keynote at Converge 2019
Keynote at Converge 2019Keynote at Converge 2019
Keynote at Converge 2019Travis Oliphant
 
Standardizing arrays -- Microsoft Presentation
Standardizing arrays -- Microsoft PresentationStandardizing arrays -- Microsoft Presentation
Standardizing arrays -- Microsoft PresentationTravis Oliphant
 
Scaling Python to CPUs and GPUs
Scaling Python to CPUs and GPUsScaling Python to CPUs and GPUs
Scaling Python to CPUs and GPUsTravis Oliphant
 
PyData Barcelona Keynote
PyData Barcelona KeynotePyData Barcelona Keynote
PyData Barcelona KeynoteTravis Oliphant
 
Python for Data Science with Anaconda
Python for Data Science with AnacondaPython for Data Science with Anaconda
Python for Data Science with AnacondaTravis Oliphant
 
Fast and Scalable Python
Fast and Scalable PythonFast and Scalable Python
Fast and Scalable PythonTravis Oliphant
 
Scaling PyData Up and Out
Scaling PyData Up and OutScaling PyData Up and Out
Scaling PyData Up and OutTravis Oliphant
 
Scale up and Scale Out Anaconda and PyData
Scale up and Scale Out Anaconda and PyDataScale up and Scale Out Anaconda and PyData
Scale up and Scale Out Anaconda and PyDataTravis Oliphant
 
Blaze: a large-scale, array-oriented infrastructure for Python
Blaze: a large-scale, array-oriented infrastructure for PythonBlaze: a large-scale, array-oriented infrastructure for Python
Blaze: a large-scale, array-oriented infrastructure for PythonTravis Oliphant
 
Numba: Array-oriented Python Compiler for NumPy
Numba: Array-oriented Python Compiler for NumPyNumba: Array-oriented Python Compiler for NumPy
Numba: Array-oriented Python Compiler for NumPyTravis Oliphant
 

More from Travis Oliphant (18)

Array computing and the evolution of SciPy, NumPy, and PyData
Array computing and the evolution of SciPy, NumPy, and PyDataArray computing and the evolution of SciPy, NumPy, and PyData
Array computing and the evolution of SciPy, NumPy, and PyData
 
SciPy Latin America 2019
SciPy Latin America 2019SciPy Latin America 2019
SciPy Latin America 2019
 
PyCon Estonia 2019
PyCon Estonia 2019PyCon Estonia 2019
PyCon Estonia 2019
 
Keynote at Converge 2019
Keynote at Converge 2019Keynote at Converge 2019
Keynote at Converge 2019
 
Standardizing arrays -- Microsoft Presentation
Standardizing arrays -- Microsoft PresentationStandardizing arrays -- Microsoft Presentation
Standardizing arrays -- Microsoft Presentation
 
Scaling Python to CPUs and GPUs
Scaling Python to CPUs and GPUsScaling Python to CPUs and GPUs
Scaling Python to CPUs and GPUs
 
PyData Barcelona Keynote
PyData Barcelona KeynotePyData Barcelona Keynote
PyData Barcelona Keynote
 
Python for Data Science with Anaconda
Python for Data Science with AnacondaPython for Data Science with Anaconda
Python for Data Science with Anaconda
 
Fast and Scalable Python
Fast and Scalable PythonFast and Scalable Python
Fast and Scalable Python
 
Scaling PyData Up and Out
Scaling PyData Up and OutScaling PyData Up and Out
Scaling PyData Up and Out
 
Scale up and Scale Out Anaconda and PyData
Scale up and Scale Out Anaconda and PyDataScale up and Scale Out Anaconda and PyData
Scale up and Scale Out Anaconda and PyData
 
London level39
London level39London level39
London level39
 
PyData Boston 2013
PyData Boston 2013PyData Boston 2013
PyData Boston 2013
 
Blaze: a large-scale, array-oriented infrastructure for Python
Blaze: a large-scale, array-oriented infrastructure for PythonBlaze: a large-scale, array-oriented infrastructure for Python
Blaze: a large-scale, array-oriented infrastructure for Python
 
Numba: Array-oriented Python Compiler for NumPy
Numba: Array-oriented Python Compiler for NumPyNumba: Array-oriented Python Compiler for NumPy
Numba: Array-oriented Python Compiler for NumPy
 
Numba lightning
Numba lightningNumba lightning
Numba lightning
 
PyData Introduction
PyData IntroductionPyData Introduction
PyData Introduction
 
Numba
NumbaNumba
Numba
 

Recently uploaded

Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusZilliz
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...apidays
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 

Recently uploaded (20)

Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 

Effectively using Open Source with conda

  • 1. Effectively using Open Source with conda Travis E. Oliphant, PhD Continuum Analytics, Inc
  • 2. The Opportunity • Millions of projects that can be used in the enterprise • Not enough to just adopt once — these projects change rapidly • Effective use requires a plan for managing updates
  • 3. The Challenge Separation of Concerns leads to granular libraries with often deep dependencies
  • 4. The Challenge • Different “entry-points” (end-user applications or scripts) can have different dependencies. Often many of the dependencies are shared but a few applications need different versions of some packages. • Not specific to any particular language or ecosystem. Python, Ruby, Node.Js, C/C++, .NET, Java, all have the same problem: How do you manage software life-cycle effectively? • Production deployments need stability. IT managers want ease of deployment and testing. Developers want agility and ease of development.
  • 5. The Challenge How can developers and domain experts in an organization quickly and easily take advantage of the latest software developments yet still have stable production deployments of complex software? You cannot take full advantage of the pace of open-source development if you don’t address this!
  • 6. Case Study: SciPy There was this thing called the Internet and one could make a web-page and put code up on it and people started using it ... Facebook for Hackers I started SciPy in 1999 while I was in grad- school at the Mayo Clinic (it was called Multipack back then)
  • 7. Case Study: SciPy Packaging circa 1999: Source tar ball and make file (users had to build) SciPy is basically a bunch of C/C++/Fortran routines with Python interfaces Observation: Popularity of Multipack (Early SciPy) grew significantly when Robert Kern made pre- built binaries for Windows
  • 8. Case Study: SciPy • Difficulty of producing binaries plus the desire to avoid the dependency chain and lack of broad packaging solutions led to early SciPy being a “distribution” instead of separate inter-related libraries. • There were (and are) too many different projects in SciPy (projects need 1-5 core contributors for communication dynamic reasons related to team-sizes)
  • 9. Case Study: NumPy I started writing NumPy in 2005 while I was teaching at BYU (it was a merger of Numeric and Numarray) NumPy ABI has not changed “officially” since 1.0 came out in 2006 Presumably extension modules (SciPy, scikit-learn, matplotlib, etc.) compiled against NumPy 1.0 will still work on NumPy 1.8.1 This was not a design goal!!!
  • 10. Case Study: NumPy This was a point of some contention and community difficulty when date-time was added in version 1.4 (impossible without changing the ABI in some way) but not really settled until version 1.7 The fundamental reason was a user-driven obsession with keeping ABI compatibility. Windows users lacked useful packaging solution in face of NumPy-Stack
  • 11. NumPy Stack (cry for conda...) NumPy SciPy Pandas Matplotlib scikit-learnscikit-image statsmodels PyTables OpenCV Cython Numba SymPy NumExpr astropy BioPython GDALPySAL ... many many more ...
  • 12. Fundamental Principles •Complex things are built out of simple things •Fundamental principle of software engineering is “separation of concerns” (modularity) •Reusability is enhanced when you “do one thing and do it well” •But, to deploy you need to bring the pieces back together. ! •This means you need a good packaging system for binary artifacts — with multiple-environments.
  • 13. Continuum Solutions (Free) Conda binstar.org Anaconda Free all-in-one distribution of Python for Analytics andVisualization • numpy, scipy, ipython • matplotlib, bokeh, • pandas, statsmodels, scikit-learn • many, many more… 100+ Miniconda Python + conda — with these you can install exactly what you want… • Binary repository of packages (public) • Multiple package types • Free public build queue • Current focus on: • Python pypi-compatible packages (source distributions) • conda packages (binary distributions) $ conda install anaconda • Cross-platform package manager • Dependency management (uses SAT solver to resolve all dependencies) • System-level virtual environments (more flexible than virtualenv)
  • 14. Continuum Solutions (Premium) Anaconda Server • Binary repository for private package Premium features: • hosting of private packages (public packages are free) • access to priority build queue • $10 / month (individuals) • 25 private packages • 5 GB disk space • $50 / month (organizations) • 200 private packages • 30 GB disk space • right to have private packages in organizations • $1500 / year • unlimited private packages • 100 GB of disk space binstar.org • Internal mirror of public repositories • Mix private internal packages with public repositories • Build customized versions of Anaconda installers • Environment to .exe and .rpm tools • Comprehensive licensing • Comprehensive support • On-premise version of binstar.org
  • 15. System Packaging solutions yum (rpm) apt-get (dpkg) Linux OSX macports homebrew Windows chocolatey npackd Cross-platform conda With virtual environments conda provides a modern, cross- platform, system-level packaging and deployment solution
  • 16. Conda Features • Excellent support for “system-level” environments (like having miniVMs but much lighter weight than docker.io) • Minimizes code-copies (uses hard/soft links if possible) • Dependency solver using fast satisfiability solver (SAT solver) • Simple format binary tar-ball + meta-data • Meta-data allows static analysis of dependencies • Easy to create multiple “channels” which are repositories for binary packages • User installable (no root privileges needed) • Can still use tools like pip --- conda fills in where they fail.
  • 17. Examples Setup a test environment $ conda update conda $ conda create -n test python pip $ source activate test Install another package (test)$ conda install scikit-learn $ activate test Windows
  • 18. First steps $ conda create -n py3k python=3.3 $ source activate py3k Create an environment Install IPython notebook (py3k) $ conda install ipython-notebook $ conda create -n py3k python=3.3 ipython-notebook $ source activate py3k All in One
  • 19. Anaconda installation ROOT_DIR! The directory that Anaconda was installed into; for example, /opt/Anaconda or C:Anaconda! /pkgs! Also referred to as PKGS_DIR. This directory contains exploded packages, ready to be linked in conda environments. Each package resides in a subdirectory corresponding to its canonical name.! /envs! The system location for additional conda environments to be created.! ! the default, or root, environment! /bin! /include! /lib! /share
  • 20. Look at conda package --- a simple .tar.bz2 http://docs.continuum.io/conda/intro.html
  • 21. Anatomy of unpacked conda package /lib /include /bin /man /info files index.json bzipped tarfile of all the files comprising the package at the full-paths they would be installed to relative to a “system” install or “chroot jail” an environment is just a “union” of these paths All conda packages have this info directory which contains meta-data for tracked files, dependency information, etc.
  • 22. Environments One honking great idea! Let’s do more of those! Easy to make Easy to throw away Uses: • Testing (python 2.6, 2.7, 3.3) • Development • Trying new packages from PyPI • Separating deployed apps with different dependency needs • Trying new versions of Python • Reproducing someone’s work conda create -h
  • 23. conda info -e Getting System information Basic info conda info Named-environment info conda info --all System info conda info --system
  • 24. conda install -n py3k scipy pip http://repo.continuum.io/pkgs/dev Experimental or developmental versions of packages http://repo.continuum.io/pkgs/gpl GPL licensed packages http://repo.continuum.io/pkgs/free non GPL open source packages Default package repositories (configurable) Installing packages
  • 25. How it works Channel 1 Channel 2 Channel N metadata metadata metadata conda merged metadata l l l
  • 26. Create channels • Create a directory of conda packages • Run conda index <dirname> • Either use file:///path/to/dir in .condarc or use simple web server on the /path/to/dir Option 1 Option 2 Use binstar.org (also available as on-premise solution with Anaconda Server)
  • 27. Binstar.org — channels (request invite) conda install -c <channel name> <pkg name> ! will install from binstar channel ! or you can add channel to your config file free for public packages
  • 28. conda list also includes packages installed via pip! List Installed packages conda create -n py3k scipy pip source activate py3k pip install pint $ conda list # packages in environment at /Users/travis/anaconda/envs/py3k: # numpy 1.8.1 py27_0 openssl 1.0.1g 0 pint 0.4.2 <pip> pip 1.5.4 py27_0 python 2.7.6 1 readline 6.2 2 scipy 0.13.3 np18py27_0 setuptools 3.1 py27_0 sqlite 3.7.13 1 tk 8.5.13 1 wsgiref 0.1.2 <pip> zlib 1.2.7 1 Output
  • 29. Update a package to latest conda update pandas get the latest pandas from the channels you are subscribed to conda update anaconda change to the latest released anaconda including its specific dependencies this can downgrade packages if they are newer than those in the “released” Anaconda conda update --all To update all the packages in an environment to the latest versions use the --all option
  • 30. conda search <regex>Search for a package Find packages and channels they are in conda search --outdated sympy Only show packages matching regex that are installed but outdated conda search typo typogrify * 2.0.0 py27_0 http://conda.binstar.org/travis/osx-64/ 2.0.0 py33_1 http://conda.binstar.org/asmeurer/osx-64/ 2.0.0 py26_1 http://conda.binstar.org/asmeurer/osx-64/ sympy 0.7.1 py27_0 defaults ! 0.7.4 py26_0 defaults 0.7.4.1 py33_0 defaults * 0.7.4.1 py27_0 defaults 0.7.4.1 py26_0 defaults 0.7.5 py34_0 defaults 0.7.5 py33_0 defaults l l l l l l
  • 31. conda remove -n py3k scipy matplotlib Removing files and environments Removing Packages Removing Environment conda remove -n py3k --all Note: packages are just “unlinked” from environment. All the files are still available unpacked in a package cache. Removing unused packages conda clean -t conda clean -p Remove unused tarballs Remove unused directories
  • 32. conda package -u conda package --pkg-name bulk --pkg-version 0.1 Untracked Files Easy way to install into an environment using anything (pip, make, setup.py, etc.) and then package up all of it into a binary tar-ball deployable via conda install <pkg-name>.tar.bz2 ! pickle for binary code!
  • 33. # This is a sample .condarc file ! # channel locations. These override conda defaults, i.e., conda will # search *only* the channels listed here, in the order given. Use "default" to # automatically include all default channels. ! channels: - defaults - http://some.custom/channel ! # Proxy settings # http://[username]:[password]@[server]:[port] proxy_servers: http: http://user:pass@corp.com:8080 https: https://user:pass@corp.com:8080 ! envs_dirs: - /opt/anaconda/envs - /home/joe/my-envs ! pkg_dirs: - /home/joe/user-pkg-cache - /opt/system/pkgs ! changeps1: False ! # binstar.org upload (not defined here means ask) binstar_upload: True Conda configuration Scripting interface conda config —add KEY VALUE conda config —remove-key KEY conda config —get KEY conda config —set KEY BOOL conda config —remove KEY VALUE
  • 34. conda skeleton pypi <pypi-name> Building new packages conda build <recipe-dir> Option 1 Option 2 conda pipbuild <pypi-name> conda install conda-build
  • 35. Conda Recipe is a directory build.sh BASH build commands (POSIX) bld.bat CMD build commands (Win) meta.yaml extended yaml declarative meta-data Required Optional run_test.py will be executed during test phase *.patch patch-files for the source * any other resources needed by build but not included in sources described in meta.yaml file
  • 36. Recipe MetaData package: name: # name of package version: # version of package about: home: # home-page license: # license ! # All optional from here.... source: fn: # filename of source url: # url of source md5: # hash of source # or from git: git_url: git_tag: patches: # list of patches to source - fix.patch build: entry_points: # entry-points (binary commands or scripts) - name = module:function number: # defaults to 0 requirements: # lists of requirements build: # requirements for build (as a list) run: # requirements for running (as a list) test: requires: # list of requirements for testing commands: # commands to run for testing (entry-points) imports: # modules to import for testing http://docs.continuum.io/conda/build.html
  • 37. Converting to another platform Conda packages are specific to a particular platform. However, if there are no platform- specific binary files in a package, it can be converted automatically to a package that can be installed on another platform. conda convert --output-dir win32 --platform win-32 <package-file> Example
  • 38. Binstar.org (request invite) Once you have built a conda package, you can share it with the world on binstar.org ! conda install -c <name> <pkgname> free for public packages
  • 39. Binstar $ conda config --add channels 'http://conda.binstar.org/travis' $ conda config --add channels 'http://conda.binstar.org/asmuerer' Adding channels Uploading packages binstar upload /full/path/to/package.tar.bz2 binstar register /full/path/to/package.tar.bz2 if package never uploaded before
  • 40. Binstar Package Types Permissions Description Private Only people given permission can see this package. Personal Everyone will be able to see this package in your user repository. Publish This package will be published in the global public repository.
  • 42. • Cross-platform Tested and Supported Python Distribution • Enterprise Python Deployment • Private, Secure On-premise package repository • Comprehensive Licensing • Customized Installers and Mirrors • Additional Products • Enhanced Support • Optional, On-premise binstar.org
  • 43. Thanks! Aaron Meurer conda and binstar developer Sean Ross-Ross (principal binstar.org) BryanVan deVen (original conda author) Ilan Schnell (principal conda developer)