Using docker for data science - part 2 - Calvin Giles
A lightning talk for PyData London (http://www.meetup.com/PyData-London-Meetup/) on using docker and fig to manage your data science development environment.
Using python and docker for data science - Calvin Giles
PyData London meetup group lightning talk slides on getting an ipython notebook with the scipy stack and custom packages running in a notebook server in 5 minutes.
Docker is a very useful tool in every data scientist's toolbox. In this talk I present motivations for using Docker and give live demos of typical data science tools such as RStudio, Jupyter Notebook and Elasticsearch.
Invited to introduce Docker to the Dept. for Information and Communication Services (Informations- und Kommunikationsdienste - IuK) at the University of Rostock.
Drupal Camp Brighton 2015: Ansible Drupal Medicine show - George Boobyer
In this session we are going to look at the latest craze amongst developers with some Sysadmin responsibilities - Ansible.
As with all trending technologies you can be led to believe that it is the new wonder drug (multi-purpose in a jar - if you ain't ill it will fix your car). But in this case we will look at some of the key ways that automated provisioning, configuration and state management can actually cure some of the critical headaches you face securing and managing production infrastructure and Drupal sites (as with all such wonder drugs, seek the advice of your GP before radically changing your lifestyle). Also, a warning: once you start delving deeper into the world of web security you'll need a pretty thick skin - denial was a comfortable place to be. We won't be covering Ansible for use in local development with systems such as VLAD - that will hopefully be the subject of other presentations.
Critically we are going to look at Ansible in a Drupal context with a focus on security and hopefully encourage participation in the development of tighter integration with Drupal site deployment and management as well as security defence measures.
By the end of the session we hope to have convinced you that by adopting Ansible you will feel more secure, more efficient and more relaxed about managing your infrastructure and sites, and to show how the principles of collaboration common within the Drupal community can transpose with great effect to the Ansible community. Code examples will be provided to support the topics covered.
This presentation shows how to quickly use and benefit from Docker, which commands to use and which features are available.
sfPot de Lille - 15 January 2015
Shared Object images in Docker: What you need is what you want. - Workhorse Computing
Docker images require the appropriate shared object files (".so") to run. Rather than assume Ubuntu has the correct libs, use ldd to get a list and install only the ones you know you need. This can shrink the underlying images from GB to a few MB.
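The ldd-driven workflow this abstract describes can be sketched as follows; `/bin/sh` is only a stand-in for whatever binary you actually intend to ship:

```shell
# List the shared-object (".so") dependencies of a binary and keep only the
# resolved library paths. Copying exactly these files into a minimal base
# image, instead of a full Ubuntu userland, is what shrinks images to a few MB.
ldd /bin/sh | awk '/=>/ { print $3 }' | sort -u
```

Each printed path can then be COPYed next to the binary in a near-empty base image.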
Raphaël Pinson's talk on "Configuration surgery with Augeas" at PuppetCamp Geneva '12. Video at http://youtu.be/H0MJaIv4bgk
Learn more: www.puppetlabs.com
A bit of history, frustration-driven development, and why and how we started looking into Puppet at Opera Software. What we're doing, successes, pain points and what we're going to do with Puppet and Config Management next.
Training delivered in 2009 for a compute cluster customer in Calcutta, India. I honestly have no idea what I was thinking. There is no possible audience who would have been pleased with this talk.
Walter Heck, founder of OlinData, presented a step-by-step guide on how to set up a proper puppet repository, complete with the brand new PuppetDB, exported resources and usage of open source modules.
Do you know what your reverse-proxy server can do? Do you think that to build clever routing / authentication / authorisation (delete as applicable) between services you have to write it in Java or as a module in C? And what if firing up a JVM just to glue one header onto every HTTP request is a cannon aimed at a sparrow? Especially since you almost certainly pass nginx somewhere along the way anyway... Let me invite you into the world of the perfect symbiosis of nginx and Lua.
PuppetCamp SEA 1 - Puppet Deployment at OnApp - Walter Heck
Wai Keen Woon, CTO CDN Division OnApp Malaysia, gave an interesting overview of what the Puppet architecture at OnApp looks like. The CDN division at OnApp is a large provider of CDN services, and as such makes a very interesting candidate for a case study.
Using Docker for GPU Accelerated Applications - NVIDIA
Build and run Docker containers leveraging NVIDIA GPUs. Containerizing GPU applications provides several benefits, among them:
* Reproducible builds
* Ease of deployment
* Isolation of individual devices
* Run across heterogeneous driver/toolkit environments
* Requires only the NVIDIA driver to be installed
* Enables "fire and forget" GPU applications
* Facilitate collaboration
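A minimal sketch of the workflow described above, assuming a host with the NVIDIA driver and the NVIDIA Container Toolkit installed (the image tag is illustrative; at the time of this talk the `nvidia-docker` wrapper played the role that `--gpus` plays in current Docker):

```shell
# Run nvidia-smi inside a CUDA base image, exposing all host GPUs
# to the container.
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```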
Using Docker Containers to Improve Reproducibility in Software and Web Engine... - Vincenzo Ferme
The ability to replicate and reproduce scientific results has become an increasingly important topic for many academic disciplines. In computer science and, more specifically, software and Web engineering, contributions of scientific work rely on developed algorithms, tools and prototypes, quantitative evaluations, and other computational analyses. Published code and data come with many undocumented assumptions, dependencies, and configurations that are internal knowledge and make reproducibility hard to achieve. This tutorial presents how Docker containers can overcome these issues and aid the reproducibility of research artefacts in software engineering and discusses their applications in the field.
Cite us: http://link.springer.com/chapter/10.1007/978-3-319-38791-8_58
Data Day Texas 2017: Scaling Data Science at Stitch Fix - Stefan Krawczyk
At Stitch Fix we have a lot of Data Scientists. Around eighty at last count. One reason why I think we have so many is that we do things differently. To get their work done, Data Scientists have access to whatever resources they need (within reason), because they're end-to-end responsible for their work; they collaborate with their business partners on objectives and then prototype, iterate, productionize, monitor and debug everything and anything required to get the output desired. They're full data-stack data scientists!
The teams in the organization do a variety of different tasks:
- Clothing recommendations for clients.
- Clothes reordering recommendations.
- Time series analysis & forecasting of inventory, client segments, etc.
- Warehouse worker path routing.
- NLP.
… and more!
They’re also quite prolific at what they do -- we are approaching 4500 job definitions at last count. So one might be wondering now, how have we enabled them to get their jobs done without getting in the way of each other?
This is where the Data Platform team comes into play. With the goal of lowering the cognitive overhead and engineering effort required on the part of the Data Scientist, the Data Platform team tries to provide abstractions and infrastructure to help the Data Scientists. The relationship is a collaborative partnership, where the Data Scientist is free to make their own decisions and thus choose the way they do their work, and the onus then falls on the Data Platform team to convince Data Scientists to use their tools; the easiest way to do that is by designing the tools well.
In regard to scaling Data Science, the Data Platform team has helped establish some patterns and infrastructure that help alleviate contention. Contention on:
Access to Data
Access to Compute Resources:
Ad-hoc compute (think prototype, iterate, workspace)
Production compute (think where things are executed once they’re needed regularly)
For the talk (and this post) I focused only on how we reduced contention on Access to Data and Access to Ad-hoc Compute to enable Data Science to scale at Stitch Fix. With that, I invite you to take a look through the slides.
Deep Neural Networks that talk (Back)… with style - Roelof Pieters
Talk at Nuclai 2016 in Vienna
Can neural networks sing, dance, remix and rhyme? And most importantly, can they talk back? This talk will introduce Deep Neural Nets with textual and auditory understanding and some of the recent breakthroughs made in these fields. It will then show some of the exciting possibilities these technologies hold for "creative" use and explorations of human-machine interaction, where the main theorem is "augmentation, not automation".
http://events.nucl.ai/track/cognitive/#deep-neural-networks-that-talk-back-with-style
Explore Data: Data Science + Visualization - Roelof Pieters
Talk on Data Visualization for Data Scientists at the Stockholm NLP Meetup, June 2015: http://www.meetup.com/Stockholm-Natural-Language-Processing-Meetup/events/222609869/
Video recording at https://www.youtube.com/watch?v=3Li_xIQ1K84
Slides from my DockerCon EU 2017 Talk.
Find the abstract below:
"In this talk, we'll discover how Docker comes to the rescue of the Ops Team, while rebuilding from scratch our monitoring infrastructure. We'll start by quickly describing the challenges, to focus on why and how using docker saved the project. From fixing dependencies and isolation issues, implementing rolling upgrades and new features hot addition, to building a completely modular, scalable and resilient infrastructure, we'll talk about why CI/CD workflows, docker tooling and Docker Swarm were the key to success."
The perl on most linux distros is a mess. Docker makes it easier to build and package a local perl and applications. The problem is that Docker's manuals produce a mess of their own.
Distributing perl on top of Gentoo's stage3 distro, busybox, or nothing at all made good alternatives. This talk includes basics of setting up docker, building a local perl for it, and packaging perl or applications into images for use in containers.
Title: Introduction to Docker
Abstract:
In the year since its inception, Docker has changed our perception of OS-level virtualization, also called containers.
At this workshop we will introduce the concept of Linux containers in general and Docker specifically. We will guide the participants through a practical exercise that includes various Docker commands and the setup of a functional WordPress/MySQL system running in two containers that communicate with each other using Serf.
Topics:
Docker installation (in case it is missing)
Boot2Docker
Docker commands
- basic commands
- different types of containers
- Dockerfiles
Serf
Wordpress Exercise
- setting up Serf cluster
- deploying MySQL
- deploying Wordpress and connecting to MySQL
Prerequisites:
Working installation of Docker
On Mac - https://docs.docker.com/installation/mac/
On Windows - https://docs.docker.com/installation/windows/
Other Platforms - https://docs.docker.com/installation/#installation
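The WordPress/MySQL exercise above can be sketched in its simplest form without Serf; container names, the published port and the password are illustrative, and the env variables assume the official `mysql` and `wordpress` images:

```shell
# Start MySQL first, then WordPress linked to it.
docker run -d --name db -e MYSQL_ROOT_PASSWORD=secret mysql:5.7
docker run -d --name blog --link db:mysql -p 8080:80 \
    -e WORDPRESS_DB_HOST=db -e WORDPRESS_DB_PASSWORD=secret wordpress
```

The workshop then layers Serf on top so the containers discover each other instead of relying on `--link`.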
distribute and pip, as replacements for setuptools and easy_install, open up many new possibilities for developing and deploying Python applications when combined with virtualenv. In this talk I briefly introduce all of these tools and show how they can be used together.
Deep dive into Verdaccio - NodeTLV 2022 - Israel - Juan Picado
In this talk, you will gain a deep understanding of how a Node.js registry works, advanced features that will help boost your registry productivity, and what's new in the next major release.
Get hands-on with security features and best practices to protect your containerized services. Learn to push and verify signed images with Docker Content Trust, and collaborate with delegation roles. Intermediate to advanced level Docker experience recommended, participants will be building and pushing with Docker during the workshop.
Led By Docker Security Experts:
Riyaz Faizullabhoy
David Lawrence
Viktor Stanchev
Experience Level: Intermediate to advanced level Docker experience recommended
Minimum Viable Docker: our journey towards orchestration - Outlyer
While Kubernetes and Mesos are all the rage, you don't necessarily need a complex orchestration layer to start using and benefiting from Docker. We will present how Babylon Health is running its dockerised AI microservices in production, pros and cons, and what we have in store for the future.
Talk given at Devoxx Belgium 2018
Spring Boot is awesome. Docker is awesome. Together you can do great things. But, are you doing it the right way? We'll walk you through, in detail, the optimal way to structure Docker images for Spring Boot applications for iterative development. Structuring your Docker images correctly is really important for teams doing continuous integration and continuous delivery. Using Docker best practices, we'll show you the code and the technologies used to optimize Docker images for Spring Boot apps!
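One common way to structure such an image (a sketch of the general technique, not the talk's exact code; jar path and base image are assumptions) is a multi-stage build that unpacks the fat jar and copies rarely-changing dependencies into a lower layer than the application classes, so iterative CI builds only rebuild the top layer:

```dockerfile
# Stage 1: unpack the Spring Boot fat jar so its pieces can be layered.
FROM eclipse-temurin:17-jre AS builder
WORKDIR /app
COPY target/app.jar app.jar
RUN jar -xf app.jar

# Stage 2: dependencies first (cache-friendly), application classes last.
FROM eclipse-temurin:17-jre
WORKDIR /app
COPY --from=builder /app/BOOT-INF/lib ./BOOT-INF/lib
COPY --from=builder /app/org ./org
COPY --from=builder /app/META-INF ./META-INF
COPY --from=builder /app/BOOT-INF/classes ./BOOT-INF/classes
ENTRYPOINT ["java", "org.springframework.boot.loader.JarLauncher"]
```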
Bartosz Tkaczewski - Adventures with Docker, continued
http://www.tsh.io
Docker is noticeable almost everywhere by now. In this presentation you will see a working development environment, pick up a few tricks for handling it well and working with it effectively, and see how quickly a simple project can be enriched with advanced application stacks (using ELK as an example). I will also try to explain how to tame this little monster in production.
Presentation from Uszanowanko Programowanko #16 - http://www.meetup.com/Uszanowanko-Programowanko/events/234826115/
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2... - pchutichetpong
M Capital Group ("MCG") expects demand, and the changing evolution of supply, to be shaped by institutional investment rotating out of offices and into work from home ("WFH"), while the need for data storage keeps expanding as global internet usage grows, with experts predicting 5.3 billion users by 2023. These market factors will be underpinned by technological changes, such as progressing cloud services and edge sites, allowing the industry to see strong expected annual growth of 13% over the next 4 years.
Whilst competitive headwinds remain, represented through the recent second bankruptcy filing of Sungard, which blames “COVID-19 and other macroeconomic trends including delayed customer spending decisions, insourcing and reductions in IT spending, energy inflation and reduction in demand for certain services”, the industry has seen key adjustments, where MCG believes that engineering cost management and technological innovation will be paramount to success.
MCG reports that the more favorable market conditions expected over the next few years, helped by the winding down of pandemic restrictions and a hybrid working environment will be driving market momentum forward. The continuous injection of capital by alternative investment firms, as well as the growing infrastructural investment from cloud service providers and social media companies, whose revenues are expected to grow over 3.6x larger by value in 2026, will likely help propel center provision and innovation. These factors paint a promising picture for the industry players that offset rising input costs and adapt to new technologies.
According to M Capital Group: “Specifically, the long-term cost-saving opportunities available from the rise of remote managing will likely aid value growth for the industry. Through margin optimization and further availability of capital for reinvestment, strong players will maintain their competitive foothold, while weaker players exit the market to balance supply and demand.”
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. for more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ... - Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and expected to be non-issue when the computation is performed on massive graphs.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... - John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
3. Who am I?
A Physicist - MPhys from University of Southampton
A Data Scientist at Adthena
A PyData meetup and conference co-organiser
Data Science Advisor for untangleconsulting.io
Programming in python for nearly 10 years
Using docker for 3 months
8. Then I decided to contribute to sklearn.
Followed by many build errors.
The quickest solution - disable macports.
$ git clone git://github.com/scikit-learn/scikit-learn.git
$ python setup.py build_ext --inplace
10. With my environment in tatters...
...and faced with re-installing from scratch, I decided there must be a better way.
What about:
Homebrew
Boxen
virtualenv
anaconda
npm, rpm
vagrant, chef, puppet, ansible
VirtualBox, VMware Fusion
Docker, CoreOS Rocket
fig, dokku, flynn, deis
Surely one of these would help?
11. What do I want in a solution?
Trivial to wipe the slate clean and recreate
Portable (home laptop env == work laptop state)
Easy to share
Configure once, use everywhere
Remote databases, servers etc.
Customisation (sublime, .vimrc, .bashrc etc.)
Installation quirks
No system-wide backup required
Compatible with deployment to servers
OS X-centric
12. Introducing Docker!!
boot2docker - a single virtual machine running on VirtualBox (OS X or Windows)
docker daemon running on the boot2docker OS
docker containers running in partial isolation inside the same boot2docker virtual machine
docker client running on the host (OS X) to simplify the issuing of commands
docker images as templates for containers
docker images -> .iso style templates
docker containers -> lightweight virtual machines intended to run just one process each
13. What do you get with docker?
Run multiple environments independently
Run services independently of environments, e.g. databases
Permit an environment to interact with a specific subset of the host files
Share a pool of resources between all environments
A single container can consume 100% of CPU, RAM and HDD
Quotas for when resources are busy
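The quota point on this slide can be sketched with today's explicit resource flags (these flags postdate parts of the talk; the limits and image name are illustrative):

```shell
# Cap a container at 1.5 CPUs and 2 GB of RAM so one noisy environment
# cannot starve the others sharing the same VM.
docker run -d --cpus=1.5 --memory=2g ipython-dev-env
```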
14. What can be problematic
Trust - processes are given read-write access to your files
stick to trusted builds and automated builds
not really different to installing any software
Resources are limited to the VM allocation
Lots to learn
Managing containers (starting, stopping etc.)
21. What do all these arguments do?
-d run as daemon
-i -t --rm run interactively and auto-remove on exit
-e set an env variable
-p map a port like -p host:container
-v map a host volume into the container
--link automatically link containers, particularly databases
-w set the working directory
--volumes-from map all the volumes from the named container
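Putting these arguments together, a single (illustrative) invocation might read:

```shell
# Interactive, auto-removed container with an env variable, a published port,
# a mounted host volume, a linked database container and a working directory.
docker run -i -t --rm \
    -e PASSWORD=MyPass \
    -p 443:8888 \
    -v "$HOME/data:/data" \
    --link dev-postgres:dev-postgres \
    -w /notebooks \
    ipython-dev-env
```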
23. Where do images come from?
The trusted builds on dockerhub (like ubuntu, postgres, node etc.)
Open source providers with automated builds (like ipython, julia etc.)
Public images uploaded in a built state (quite opaque)
Private images (built locally or via docker login)
24. How do I build my own images?
Write a Dockerfile.
Either build and run it locally like:
Or upload it to github and have the dockerhub build it for you automatically:
Wait for build...
$ docker build -t calvingiles/magic-image .
$ docker run calvingiles/magic-image
$ git push
$ docker run calvingiles/magic-image
25. What is this Dockerfile?
FROM ipython/scipyserver
MAINTAINER Calvin Giles <calvin.giles@gmail.com>
# Create install folder
RUN mkdir /install_files
# Install postgres libraries and python dev libraries
# so we can install psycopg2 later
RUN apt-get update
RUN apt-get install -y libpq-dev python-dev
# install python requirements
COPY requirements.txt /install_files/requirements.txt
RUN pip2 install -r /install_files/requirements.txt
RUN pip3 install -r /install_files/requirements.txt
# Set the working directory to /notebooks
WORKDIR /notebooks
26. Components of a Dockerfile
FROM: another image to build upon (ubuntu, debian, ipython...)
RUN: execute a command in the container and write the results into the image
COPY: copy a file from the build filesystem to the image
WORKDIR: change the working directory (the container starts in the last WORKDIR)
ENV: set an env variable
EXPOSE: open up a port to linked containers and the host
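A minimal Dockerfile touching each of these instructions; the base image, file name and port are illustrative, and `requirements.txt` is assumed to exist in the build context:

```dockerfile
# FROM: another image to build upon
FROM debian:stable-slim
# ENV: set an env variable (usable in later instructions)
ENV APP_HOME=/srv/app
# COPY: copy a file from the build filesystem into the image
COPY requirements.txt $APP_HOME/
# RUN: execute a command and bake the result into the image
RUN echo "built" > /opt/build.log
# WORKDIR: containers built from this image start here
WORKDIR $APP_HOME
# EXPOSE: open up a port to linked containers and the host
EXPOSE 8888
```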
27. So how do I actually use docker?
Find an image to start your environment off (ubuntu, ipython/scipystack, rocker/rstudio)
Create a Dockerfile containing only a FROM line:
build and run
FROM ipython/scipystack
28. Let's start with the ipython notebook server with scipystack:
Find your boot2docker ip:
Navigate there (https://your-ip:443) and sign in with the PASSWORD
$ echo 'FROM ipython/scipyserver' > Dockerfile
$ docker build -t ipython-dev-env .
$ docker run -i -t --rm -e PASSWORD=MyPass -p 443:8888 ipython-dev-env
$ boot2docker ip
36. How do I get a database?
You will get the IP and PORTS to connect to as env variables in the ipython-dev-env container
$ docker run -d --name dev-postgres postgres
$ docker run -d \
    -e PASSWORD=MyPass \
    -p 443:8888 \
    --link dev-postgres:dev-postgres \
    ipython-dev-env
37. What about my data?
$ docker run -d \
    -v "$HOME/Google Drive/data:/data" \
    --name gddata \
    busybox echo
$ docker run -d \
    -e PASSWORD=MyPass \
    -p 443:8888 \
    --volumes-from gddata \
    ipython-dev-env
39. Git push?
In Dockerhub, create a new repository and select an Automated Build.
Point it to your github or bitbucket repo
Wait for the build to complete
$ docker pull calvingiles/data-science-environment
$ docker run calvingiles/data-science-environment
42. FROM ipython/scipyserver
MAINTAINER Calvin Giles <calvin.giles@gmail.com>
# Create install folder
RUN mkdir /install_files
# Update aptitude with new repo
RUN apt-get update
# Install software
RUN apt-get install -y git
# Make ssh dir
RUN mkdir /root/.ssh/
## Authenticate with github
# Copy over private key, and set permissions
COPY id_rsa /root/.ssh/id_rsa
RUN chmod 600 /root/.ssh/id_rsa
# Create known_hosts
RUN touch /root/.ssh/known_hosts
# Add github key
RUN ssh-keyscan github.com >> /root/.ssh/known_hosts
## install pyodbc so we can talk to MS SQL
# install unixodbc and freetds
RUN apt-get -y install unixodbc unixodbc-dev freetds-dev tdsodbc
# configure Adthena database with read-only permissions
COPY freetds.conf.suffix /install_files/freetds.conf.suffix
RUN cat /install_files/freetds.conf.suffix >> /etc/freetds/freetds.conf
COPY odbcinst.ini /etc/odbcinst.ini
COPY odbc.ini /etc/odbc.ini
# Install pyodbc from source
RUN pip2 install https://pyodbc.googlecode.com/files/pyodbc-3.0.7.zip
RUN pip3 install https://pyodbc.googlecode.com/files/pyodbc-3.0.7.zip
43. # install python requirements
COPY requirements.txt /install_files/requirements.txt
RUN pip2 install -r /install_files/requirements.txt
RUN pip3 install -r /install_files/requirements.txt
# Clone wayside into the docker container
RUN mkdir -p /repos/wayside
WORKDIR /repos/wayside
RUN git clone git@github.com:Adthena/wayside.git .
RUN python2 setup.py develop
RUN python3 setup.py develop
# Get rid of ssh key from image now repos have been cloned
RUN rm /root/.ssh/id_rsa
# Put the working directory back to notebooks at the end
WORKDIR /notebooks
44. Sum up
Find a base image
Run a container and trial-run your install steps
Create a Dockerfile to perform those steps consistently
My environments
my public development environment - github.com/calvingiles/data-science-environment (https://github.com/calvingiles/data-science-environment)
my public docker images - hub.docker.com/u/calvingiles/ (https://hub.docker.com/u/calvingiles/)
docker run -it --rm calvingiles/<image>
build upon with FROM calvingiles/<image>
fork (in github) if you need things a little different