3. Who am I?
A Physicist - MPhys from University of Southampton
A Data Scientist at Adthena
A PyData meetup and conference co-organiser
Data Science Advisor for untangleconsulting.io
Programming in Python for nearly 10 years
Using Docker for 3 months
8. Then I decided to contribute to sklearn.
$ git clone git://github.com/scikit-learn/scikit-learn.git
$ cd scikit-learn
$ python setup.py build_ext --inplace
Followed by many build errors.
The quickest solution - disable MacPorts.
10. With my environment in tatters...
...and faced with re-installing from scratch, I decided there must be a better way.
What about:
Homebrew
Boxen
virtualenv
anaconda
npm, rpm
vagrant, chef, puppet, ansible
VirtualBox, VMware Fusion
Docker, CoreOS Rocket
fig, dokku, flynn, deis
Surely one of these would help?
11. What do I want in a solution?
Trivial to wipe the slate clean and recreate
Portable (home laptop env == work laptop state)
Easy to share
Configure once, use everywhere
Remote databases, servers etc.
Customisation (sublime, .vimrc, .bashrc etc.)
Installation quirks
No system-wide backup required
Compatible with deployment to servers
OS X-centric
12. Introducing Docker!!
boot2docker - a single virtual machine running on VirtualBox (OS X or Windows)
docker daemon running on the boot2docker OS
docker containers running in partial isolation inside the same boot2docker virtual machine
docker client running on the host (OS X) to simplify the issuing of commands
docker images as templates for containers
docker images -> .iso-style templates
docker containers -> lightweight virtual machines intended to run just one process each
13. What do you get with docker?
Run multiple environments independently
Run services independently of environments, e.g. databases
Permit an environment to interact with a specific subset of the host files
Share a pool of resources between all environments
A single container can consume 100% of CPU, RAM and HDD
Quotas for when resources are busy
14. What can be problematic
Trust - processes are given read-write access to your files
stick to trusted builds and automated builds
not really different to installing any software
Resources are limited to the VM allocation
Lots to learn
Managing containers (starting, stopping etc.)
21. What do all these arguments do?
-d: run as a daemon
-i -t --rm: run interactively and auto-remove on exit
-e: set an env variable
-p: map a port, like -p host:container
-v: map a host volume into the container
--link: automatically link containers, particularly databases
-w: set the working directory
--volumes-from: map all the volumes from the named container
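To see how these flags combine into one command, here is a minimal Python sketch that assembles a docker run argument list from keyword options. The docker_run_command helper and its parameter names are hypothetical (they are not part of docker or any library); only the flags themselves come from the list above. It only builds the command and does not execute it.

```python
import shlex


def docker_run_command(image, detach=False, interactive=False, remove=False,
                       env=None, ports=None, volumes=None, links=None,
                       workdir=None, volumes_from=None, command=None):
    """Build the argv for `docker run` (hypothetical helper, nothing is run)."""
    argv = ["docker", "run"]
    if detach:
        argv.append("-d")                      # run as a daemon
    if interactive:
        argv += ["-i", "-t"]                   # interactive terminal
    if remove:
        argv.append("--rm")                    # auto-remove on exit
    for name, value in (env or {}).items():
        argv += ["-e", "{}={}".format(name, value)]
    for host_port, container_port in (ports or {}).items():
        argv += ["-p", "{}:{}".format(host_port, container_port)]
    for host_path, container_path in (volumes or {}).items():
        argv += ["-v", "{}:{}".format(host_path, container_path)]
    for container, alias in (links or {}).items():
        argv += ["--link", "{}:{}".format(container, alias)]
    if workdir:
        argv += ["-w", workdir]
    for container in (volumes_from or []):
        argv += ["--volumes-from", container]
    argv.append(image)                         # the image always follows the options
    argv += list(command or [])                # optional command to run inside
    return argv


cmd = docker_run_command(
    "ipython-dev-env",
    interactive=True,
    remove=True,
    env={"PASSWORD": "MyPass"},
    ports={443: 8888},
)
print(shlex.join(cmd))  # needs Python 3.8+ for shlex.join
```

Keeping the command as a list rather than a single string means it could be handed to subprocess.run without any extra shell quoting.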
23. Where do images come from?
The trusted builds on Docker Hub (like ubuntu, postgres, node etc.)
Open source providers with automated builds (like ipython, julia etc.)
Public images uploaded in a built state (quite opaque)
Private images (built locally or via docker login)
24. How do I build my own images?
Write a Dockerfile.
Either build and run it locally like:
$ docker build -t calvingiles/magic-image .
$ docker run calvingiles/magic-image
Or upload it to GitHub and have Docker Hub build it for you automatically:
$ git push
Wait for build...
$ docker run calvingiles/magic-image
25. What is this Dockerfile?
FROM ipython/scipyserver
MAINTAINER Calvin Giles <calvin.giles@gmail.com>
# Create install folder
RUN mkdir /install_files
# Install postgres libraries and python dev libraries
# so we can install psycopg2 later
# -y answers the install prompt, which would otherwise abort a non-interactive build
RUN apt-get update && apt-get install -y libpq-dev python-dev
# install python requirements
COPY requirements.txt /install_files/requirements.txt
RUN pip2 install -r /install_files/requirements.txt
RUN pip3 install -r /install_files/requirements.txt
# Set the working directory to /notebooks
WORKDIR /notebooks
26. Components of a Dockerfile
FROM: another image to build upon (ubuntu, debian, ipython...)
RUN: execute a command in the container and write the results into the image
COPY: copy a file from the build filesystem to the image
WORKDIR: change the working directory (the container starts in the last WORKDIR)
ENV: set an env variable
EXPOSE: open up a port to linked containers (use -p to publish it to the host)
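Every line of a Dockerfile has the same shape: an instruction keyword followed by its arguments. As a rough illustration, the Python sketch below splits a Dockerfile into (instruction, arguments) pairs. This is not how docker itself parses the file: it assumes one instruction per line and ignores backslash continuations. The dockerfile_instructions helper and the DATA_DIR variable are made up for the example.

```python
def dockerfile_instructions(text):
    """Return (INSTRUCTION, arguments) pairs, skipping blanks and comments.

    Simplified sketch: one instruction per line, no line continuations.
    """
    steps = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)            # keyword, then the rest of the line
        arguments = parts[1] if len(parts) > 1 else ""
        steps.append((parts[0].upper(), arguments))
    return steps


# DATA_DIR is a hypothetical variable used only for illustration.
example = """\
FROM ipython/scipyserver
# Set an environment variable and the working directory
ENV DATA_DIR /data
EXPOSE 8888
WORKDIR /notebooks
"""

for instruction, arguments in dockerfile_instructions(example):
    print(instruction, "->", arguments)
```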
27. So how do I actually use docker?
Find an image to start your environment off (ubuntu, ipython/scipystack, rocker/rstudio)
Create a Dockerfile containing only a FROM line:
FROM ipython/scipystack
Build and run
28. Let's start with the ipython notebook server with scipystack:
$ echo 'FROM ipython/scipyserver' > Dockerfile
$ docker build -t ipython-dev-env .
$ docker run -i -t --rm -e PASSWORD=MyPass -p 443:8888 ipython-dev-env
Find your boot2docker ip:
$ boot2docker ip
Navigate to https://your-ip:443 and sign in with the PASSWORD
36. How do I get a database?
$ docker run -d --name dev-postgres postgres
$ docker run -d \
    -e PASSWORD=MyPass \
    -p 443:8888 \
    --link dev-postgres:dev-postgres \
    ipython-dev-env
You will get the IP and PORTS to connect to as env variables in the ipython-dev-env container
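Inside the ipython-dev-env container, the link's connection details can be read from the environment. The sketch below assumes the legacy link naming pattern <ALIAS>_PORT_<port>_<PROTO>_ADDR and <ALIAS>_PORT_<port>_<PROTO>_PORT, with the alias upper-cased and hyphens turned into underscores. linked_service_address is a hypothetical helper, 5432 is assumed to be the port the postgres image exposes, and the address in the example is invented so the snippet runs outside a container.

```python
import os


def linked_service_address(alias, port, proto="tcp", environ=None):
    """Look up the host and port of a linked container (hypothetical helper).

    Assumes the legacy --link variables, e.g. DEV_POSTGRES_PORT_5432_TCP_ADDR.
    """
    environ = os.environ if environ is None else environ
    prefix = "{}_PORT_{}_{}".format(
        alias.upper().replace("-", "_"), port, proto.upper()
    )
    host = environ[prefix + "_ADDR"]           # KeyError if the link is missing
    port_number = int(environ[prefix + "_PORT"])
    return host, port_number


# Invented values standing in for what docker would inject into the container.
fake_env = {
    "DEV_POSTGRES_PORT_5432_TCP_ADDR": "172.17.0.5",
    "DEV_POSTGRES_PORT_5432_TCP_PORT": "5432",
}

host, port = linked_service_address("dev-postgres", 5432, environ=fake_env)
print(host, port)
```

In a real container you would drop the environ argument so the helper reads os.environ, then pass the host and port to your database driver.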
37. What about my data?
$ docker run -d \
    -v "$HOME/Google Drive/data:/data" \
    --name gddata \
    busybox echo
$ docker run -d \
    -e PASSWORD=MyPass \
    -p 443:8888 \
    --volumes-from gddata \
    ipython-dev-env
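One detail worth knowing when mapping host folders: the shell does not expand ~ inside double quotes, so a quoted host path has to be expanded before docker sees it. A minimal sketch of doing that in Python when building the -v argument follows; volume_argument is a hypothetical helper, not part of docker.

```python
import os


def volume_argument(host_path, container_path):
    """Return a host:container string for -v, with ~ expanded on the host side."""
    return "{}:{}".format(os.path.expanduser(host_path), container_path)


arg = volume_argument("~/Google Drive/data", "/data")
print(["docker", "run", "-d", "-v", arg, "--name", "gddata", "busybox", "echo"])
```

Passing the result as a single list element keeps the space in "Google Drive" intact without any shell quoting.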
39. Git push?
In Docker Hub, create new and select an Automated Build.
Point it to your GitHub or Bitbucket repo
Wait for the build to complete
$ docker pull calvingiles/data-science-environment
$ docker run calvingiles/data-science-environment
42.
FROM ipython/scipyserver
MAINTAINER Calvin Giles <calvin.giles@gmail.com>
# Create install folder
RUN mkdir /install_files
# Update the apt package index
RUN apt-get update
# Install software
RUN apt-get install -y git
# Make ssh dir
RUN mkdir /root/.ssh/
## Authenticate with github
# Copy over private key, and set permissions
COPY id_rsa /root/.ssh/id_rsa
RUN chmod 600 /root/.ssh/id_rsa
# Create known_hosts
RUN touch /root/.ssh/known_hosts
# Add github key
RUN ssh-keyscan github.com >> /root/.ssh/known_hosts
## install pyodbc so we can talk to MS SQL
# install unixodbc and freetds
RUN apt-get -y install unixodbc unixodbc-dev freetds-dev tdsodbc
# configure Adthena database with read-only permissions
COPY freetds.conf.suffix /install_files/freetds.conf.suffix
RUN cat /install_files/freetds.conf.suffix >> /etc/freetds/freetds.conf
COPY odbcinst.ini /etc/odbcinst.ini
COPY odbc.ini /etc/odbc.ini
# Install pyodbc from source
RUN pip2 install https://pyodbc.googlecode.com/files/pyodbc-3.0.7.zip
RUN pip3 install https://pyodbc.googlecode.com/files/pyodbc-3.0.7.zip
43.
# install python requirements
COPY requirements.txt /install_files/requirements.txt
RUN pip2 install -r /install_files/requirements.txt
RUN pip3 install -r /install_files/requirements.txt
# Clone wayside into the docker container
RUN mkdir -p /repos/wayside
WORKDIR /repos/wayside
RUN git clone git@github.com:Adthena/wayside.git .
RUN python2 setup.py develop
RUN python3 setup.py develop
# Remove the ssh key from the final filesystem now repos have been cloned
# (it still exists in the earlier COPY layer, so keep this image private)
RUN rm /root/.ssh/id_rsa
# Put the working directory back to notebooks at the end
WORKDIR /notebooks
44. Sum up
Find a base image
Run a container and trial run your install steps
Create a Dockerfile to perform those steps consistently
My environments
my public development environment - https://github.com/calvingiles/data-science-environment
my public docker images - https://hub.docker.com/u/calvingiles/
docker run -it --rm calvingiles/<image>
build upon with FROM calvingiles/<image>
fork (in GitHub) if you need things a little different