Using docker for data science - part 2


A lightning talk for PyData London on using Docker and fig to manage your data science development environment.

Published in: Software

  2. RECAP
  3. WHY DOCKER
      Portable environment
      Isolated between projects
      Stateless
      Fast local file access
      Heterogeneous
  4. GET DOCKER
      boot2docker .dmg or .exe
      apt-get install ...
  5. RUN SCIPYSERVER
      $ docker run -d -e "PASSWORD=YourPassword?" ipython/scipyserver
      $ docker run -d -e "PASSWORD=YourPassword?" --name dev_notebook -p 443:8888 ipython/scipyserver
      https://localhost:443
      https://{boot2docker ip}:443
  6. CREATE DATA-ONLY CONTAINERS
      $ docker run -d -v ~/notebooks:/notebooks --name notebooks_container ubuntu echo notebooks
      $ docker run -d -v ~/data:/data --name data_container ubuntu echo
  7. MOUNT DATA-ONLY CONTAINERS
      $ docker stop dev_notebook
      $ docker rm dev_notebook
      $ docker run -d -e "PASSWORD=YourPassword?" --name dev_notebook -p 443:8888 \
          --volumes-from data_container --volumes-from notebooks_container \
          ipython/scipyserver
  8. CREATE A DOCKERFILE
      FROM ipython/scipyserver
      MAINTAINER Calvin Giles <>
      COPY requirements.txt /requirements.txt
      RUN pip2 install -r /requirements.txt
      RUN pip3 install -r /requirements.txt
      $ docker build -t calvingiles/ds-notebook .
      $ docker run -d -e "PASSWORD=YourPassword?" --name dev_notebook -p 443:8888 \
          --volumes-from data_container --volumes-from notebooks_container \
          calvingiles/ds-notebook
  9. THIS TIME
      Creating and connecting to local database containers
      Tweaking the boot2docker VM memory from 2GB to 8GB (or more...)
      Automated builds with GitHub linking
      Forget everything and use fig
  10. CREATE LOCAL DATABASE CONTAINERS
      $ docker run -d -v /var/lib/postgresql/data --name=pg_data ubuntu
      $ docker run -d --name=dev_postgres postgres
      $ docker run -d --name=dev_mongo mongo
      $ docker run -d -e "PASSWORD=YourPassword?" \
          --link dev_postgres:dev_postgres --link dev_mongo:dev_mongo \
          --name dev_notebook -p 443:8888 \
          --volumes-from data_container --volumes-from notebooks_container \
          calvingiles/ds-notebook
  11. TWEAK THE MEMORY IN YOUR VM ABOVE 2GB
      Either:
      $ boot2docker delete
      $ boot2docker init -m 5555
      ... lots of output ...
      $ boot2docker info
      { ... "Memory":5555 ...}
      Or (doesn't lose data persisted inside the VM):
      $ boot2docker stop
      $ VBoxManage modifyvm boot2docker-vm --memory 5555
      $ boot2docker start
      $ boot2docker info
      { ... "Memory":5555 ...}
  12. AUTOMATED BUILDS WITH GITHUB LINKING
      Commit Dockerfile, requirements.txt etc. to a GitHub repo
      Add an "Automated Build" on Docker Hub
      Select the repo and accept defaults
      Check the "Build Details" for your repo and wait for the build to finish
      $ docker run <dockername>/<reponame>
  13. FORGET EVERYTHING AND USE FIG
      $ curl -L 1.0.1/fig-`uname -s`-`uname -m` > ~/bin/fig
      $ chmod +x ~/bin/fig
  14. FIG.YML -- DATA
      notebooks:
        command: echo created
        image: busybox
        volumes:
          - "~/Google Drive/notebooks:/notebooks/analysis"
      data:
        command: echo created
        image: busybox
        volumes:
          - "~/Google Drive/data:/data/analysis"
      ...
  15. FIG.YML -- POSTGRES
      ...
      devpostgresdata:
        command: echo created
        image: busybox
        volumes:
          - /var/lib/postgresql/data
      devpostgres:
        environment:
          - POSTGRES_PASSWORD
        image: postgres
        links:
        ports:
          - "5432:5432"
        volumes_from:
          - devpostgresdata
      ...
  16. FIG.YML -- NOTEBOOK SERVER
      ...
      ds_server:
        environment:
          - PASSWORD
        image: calvingiles/data-science-environment
        links:
          - devpostgres:postgres
        ports:
          - "443:8888"
        volumes_from:
          - notebooks
          - data
  17. FIG UP
      In the same directory as fig.yml:
      $ fig rm
      $ PASSWORD=MyPass POSTGRES_PASSWORD=PGPass fig up -d
  18. HERE'S ONE I MADE EARLIER
      $ curl -L > fig.yml
      $ PASSWORD=MyPass POSTGRES_PASSWORD=PGPass fig up -d
  19. NEXT TIME
      Linking to private git repositories
      Lessons learnt from using fig
      Resizing boot2docker volume (to fix "no space left on device")
      Fixing "Error response from daemon: client and server don't have same version"
      TLS and CA certs to fix "Your connection is not private"
      Whatever other pain I have had to deal with before then
      Whatever pain you feel -- let me know @calvingiles
  20. MORE?
      Docker:
      Fig:
      ipython docker images:
      my docker image:
      fig.yml gist:
  21. ABOUT ME
      Calvin Giles
      Data Scientist at Adthena
      PyData Meetup Organiser
      @calvingiles on twitter, github, docker hub (and many more)
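
The start-up sequence from the recap slides (create the data-only containers, then run the notebook server with both volumes mounted) can be collected into one script. What follows is a minimal sketch and not part of the original talk: the `run` helper, the `start_environment` function and the `DRY_RUN` variable are hypothetical names introduced here, while the container and image names are the ones used on the slides. By default it only prints the commands, so it can be read and checked without a Docker daemon.

```shell
#!/bin/sh
# Sketch only: DRY_RUN, run() and start_environment() are illustrative helpers,
# not part of docker, boot2docker or the talk itself.
DRY_RUN="${DRY_RUN:-1}"                  # 1 = print commands, 0 = execute them
PASSWORD="${PASSWORD:-YourPassword?}"    # notebook password, as on the slides

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"                        # show the command instead of running it
    else
        "$@"
    fi
}

start_environment() {
    # Data-only containers: they exit straight away but keep their volumes.
    run docker run -d -v "$HOME/notebooks:/notebooks" --name notebooks_container ubuntu echo notebooks
    run docker run -d -v "$HOME/data:/data" --name data_container ubuntu echo data
    # Notebook server with both data-only volumes mounted.
    run docker run -d -e "PASSWORD=$PASSWORD" --name dev_notebook -p 443:8888 \
        --volumes-from data_container --volumes-from notebooks_container \
        calvingiles/ds-notebook
}

start_environment
```

Running it with `DRY_RUN=0 PASSWORD=MyPass` would execute the three `docker run` commands for real; the dry-run default is a deliberate choice so that nothing is created by accident.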