Using docker for data science - part 2

1,579 views

Published on

A lightning talk for PyData London (http://www.meetup.com/PyData-London-Meetup/) on using docker and fig to manage your data science development environment.

Published in: Software

Using docker for data science - part 2

  1. 1. USING DOCKER FOR DATA SCIENCE
  2. 2. RECAP
  3. 3. WHY DOCKER Portable environment Isolated between projects Stateless Fast local file access Hetrogenous
  4. 4. GET DOCKER https://docs.docker.com/installation/ boot2docker .dmg or .exe apt-get install docker.io ...
  5. 5. RUN SCIPYSERVER $ docker run -d -e "PASSWORD=YourPassword?" ipython/scipyserver $ docker run -d -e "PASSWORD=YourPassword?" --name dev_notebook -p 443:8888 ipython/scipyserver https://localhost:443 https://{boot2docker ip}:443
  6. 6. CREATE DATA-ONLY CONTAINERS $ docker run -d -v ~/notebooks:/notebooks --name notebooks_container ubuntu echo notebooks $ docker run -d -v ~/data:/data --name data_container ubuntu echo
  7. 7. MOUNT DATA-ONLY CONTAINERS $ docker stop dev_notebook $ docker rm dev_notebook $ docker run -d -e "PASSWORD=YourPassword?" --name dev_notebook -p 443:8888 --volumes-from data_container --volumes-from notebooks_container ipython/scipyserver
  8. 8. CREATE A DOCKERFILE FROM ipython/scipyserver MAINTAINER Calvin Giles <calvin.giles@gmail.com> COPY requirements.txt /requirements.txt RUN pip2 install -r /requirements.txt RUN pip3 install -r /requirements.txt $ docker build -t calvingiles/ds-notebook . $ docker run -d -e "PASSWORD=YourPassword?" --name dev_notebook -p 443:8888 --volumes-from data_container --volumes-from notebooks_container calvingiles/ds-notebook
  9. 9. THIS TIME Creating and connecting to local database containers Tweaking the boot2docker vm memory from 2GB to 8 (or more...) Automated builds with github linking Forget everything and use fig
  10. 10. CREATE LOCAL DATABASE CONTAINERS $ docker run -d -v /var/lib/postgresql/data --name=pg_data ubuntu $ docker run -d --name=dev_postgres postgres $ docker run -d --name=dev_mongo mongo $ docker run -d -e "PASSWORD=YourPassword?" --link dev_postgres:dev_postgres --link dev_mongo:dev_mongo --name dev_notebook -p 443:8888 --volumes-from data_container --volumes-from notebooks_container calvingiles/ds-notebook
  11. 11. TWEAK YOU MEMORY IN YOUR VM ABOVE 2GB Either: $ boot2docker delete $ boot2docker init -m 5555 ... lots of output ... $ boot2docker info { ... "Memory":5555 ...} Or (doesn't loose non-host data persistence): $ VBoxManage modifyvm boot2docker-vm --memory 5555 $ boot2docker stop $ boot2docker start $ boot2docker info { ... "Memory":5555 ...}
  12. 12. AUTOMATED BUILDS WITH GITHUB LINKING Commit Dockerfile, requirements.txt etc. to a github repo Add an "Automated Buld" on docker hub Select the repo and accept defaults Check the "Build Details" for your repo build to finish $ docker run <dockername>/<reponame>
  13. 13. FORGET EVERYTHING AND USE FIG http://www.fig.sh/install.html $ curl -L https://github.com/docker/fig/releases/download/ 1.0.1/fig-`uname -s`-`uname -m` > ~/bin/fig $ chmod +x ~/bin/fig
  14. 14. FIG.YML -- DATA notebooks: command: echo created image: busybox volumes: - "~/Google Drive/notebooks:/notebooks/analysis" data: command: echo created image: busybox volumes: - "~/Google Drive/data:/data/analysis" ...
  15. 15. FIG.YML -- POSTGRES ... devpostgresdata: command: echo created image: busybox volumes: - /var/lib/postgresql/data devpostgres: environment: - POSTGRES_PASSWORD image: postgres links: ports: - "5432:5432" volumes_from: - devpostgresdata ...
  16. 16. FIG.YML -- NOTEBOOK SERVER ... ds_server: environment: - PASSWORD image: calvingiles/data-science-environment links: - devpostgres:postgres ports: - "443:8888" volumes_from: - notebooks - data
  17. 17. FIG UP In the same directory as fig.yml: $ fig rm $ PASSWORD=MyPass POSTGRES_PASSWORD=PGPass fig up -d
  18. 18. HERE'S ONE I MADE EARLIER $ curl -L http://goo.gl/rW47v3 > fig.yml $ PASSWORD=MyPass POSTGRES_PASSWORD=PGPass fig up -d
  19. 19. NEXT TIME Linking to private git repositories Lessons learnt from using fig Resizing boot2docker volume (to fix "no space left on device") Fixing "Error response from daemon: client and server don't have same version" TLS and CA certs to fix "Your connection is not private" Whatever other pain I have had to deal with before then Whatever pain you feel -- let me know @cavingiles
  20. 20. MORE? Docker: http://docs.docker.com/userguide/ http://docs.docker.com/reference/commandline/cli/ Fig: http://www.fig.sh/ ipython docker images: https://registry.hub.docker.com/repos/ipython/ my docker image: https://github.com/calvingiles/data-science-environment https://registry.hub.docker.com/u/calvingiles/data-science-environment/ fig.yml gist: http://goo.gl/rW47v3
  21. 21. ABOUT ME Calvin Giles Data Scientist at Adthena PyData Meetup Organiser untangleconsulting.io calvin.giles@gmail.com @calvingiles on twitter, github, docker hub (and many more)

×