
Reproducibility of your development environment



"Reproducible" is not a new word in science, but it can be a surprise for newcomers that they also need to have reproducible software environments. What was working yesterday may not work today, but there is a way to prevent that and that's what we are going to explore. We will take a sneak peek at different open source solutions to help maintain a reproducible, shareable environment.

Published in: Engineering


  1. Reproducibility of your development environment, by Marcos Vanetta (@malev), Continuum Analytics. PyData NYC 2015, 11/20/2015.
  2. Marcos Vanetta (@malev). Powered by tacos. Sponsored by Continuum Analytics.
  3. Reproducibility: Reproducibility is the ability of an entire experiment or study to be duplicated, either by the same researcher or by someone else working independently.
  4. Development Environment: A computer system in which a computer program or software component is deployed and executed. A development environment is a collection of procedures and tools for developing, testing and debugging an application or program. A development environment contains everything required by a team to build and deploy software-intensive systems.
  5. Components: Method, Tools, Enablement, Organization, Infrastructure, Adoption.
  6. Components: Method, Tools, Enablement, Organization, Infrastructure, Adoption.
  7. Method: Roles, work products, tasks, and processes. Standards, guidelines, checklists, templates, and examples. Deployment topology.
  8. Tools: Development tools and their integrations. Development tool configurations and installation scripts. Deployment topology, which considers the software and hardware required.
  9. Infrastructure: A development environment considers infrastructure in terms of both hardware and software. Locations, nodes, and connectivity. Software (such as operating systems, database management systems, board-level controls, and test harnesses).
  10. How do we work with data? Everything is production. Everything is NOT production. Multi-language. Local | Cloud | both. Data ~GB | Data ~TB | ...
  11. What do we want to reproduce? Coding and documentation styles. Software dependencies (libraries, databases, etc.). Configuration files and environment variables. Data (dummy data and real data). Keys (AWS, SSH, etc.).
  12. Coding and documentation styles: Coding styles. Linter (pep8, flake8, YAPF, AirBnB JS Styleguide, etc). IDE configuration: EditorConfig. Documentation: Sphinx, MkDocs.
  13. Dependencies: Database engines, installation instructions, schema, configuration, dummy data. Docker or Vagrant, Makefiles or bash scripts, SaaS, migrations. Automate.
  14. Dependencies: libraries
          pip: lots of packages; ~ multi-platform; not so fast; included in Anaconda.
          conda: mostly data packages; multi-platform; fast; included in Anaconda.
      Consider tools like pipreqs or defrost.
  15. Exporting your dependencies with pip
          $ pip freeze > requirements.txt
          $ cat requirements.txt
          requests==2.8.1
          virtualenv==13.0.1
          wheel==0.26.0
      Reusing an environment
          $
          $
          (my-env)$ pip install -r requirements.txt
      Keep it simple
  16. Exporting your dependencies with conda
          $ conda env export -n please-work -f environment.yml
          $ cat environment.yml
          name: my-project
          dependencies:
          - bokeh=0.8.0=np19py27_0
          - colorama=0.3.3=py27_0
          - pip:
            - flask
      Reusing with conda
          $ conda env create ...
          $ source activate my-project
          discarding /Users/mvanetta/miniconda/bin from PATH
          prepending /Users/mvanetta/miniconda/envs/my-project/bin to PATH
          (my-project)$
      Keep it simple
  17. Using pill
          $ conda install pill deps -c malev
          $ pill init
          $ source pill in
          $ deps install
          $ source pill out
          $ rm -rf .pill
  18. Working with notebooks
          $ conda create -n project
          $ conda install -y bokeh pandas jupyter
          $ ipython notebook iris.ipynb
          $ conda env attach -n iris iris.ipynb
          $ anaconda notebook upload iris.ipynb
      Reusing your notebook
          $ anaconda notebook download malev/iris
          $ conda env create iris.ipynb
          $ source activate iris
          $ ipython notebook iris.ipynb
  19. Configuration files and environment variables: Essential part of configuration management. yaml, ini, json files. Generally stored in programmers' brains. Attach to the repo, document, use tools like autoenv and automate.
  20. Keys: Security concerns. Don't put them in the repo. Talk to your IT department.
  21. Conclusions: We still have some work to do. We have a lot of manual work. It is an expensive process.
  22. Questions
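
Slide 15 exports exact versions with pip freeze, so every line of requirements.txt has the form name==version. A minimal sketch of a check that flags lines that are not pinned that way might look like the following. The function name unpinned_requirements and the regular expression are illustrative assumptions, not part of the talk, and the check only understands simple name==version lines (no URLs, extras, or environment markers).

```python
import re

# Assumption: a "pinned" line is exactly name==version, as pip freeze prints it.
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*==[A-Za-z0-9.!+_-]+$")


def unpinned_requirements(text):
    """Return the requirement lines in `text` that are not pinned with '=='."""
    loose = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue  # skip blank and comment-only lines
        if not PINNED.match(line):
            loose.append(line)
    return loose


example = """\
requests==2.8.1
virtualenv==13.0.1
wheel==0.26.0
flask
"""

print(unpinned_requirements(example))  # prints ['flask']
```

Running a check like this before committing requirements.txt is one way to notice a dependency that would resolve to a different version tomorrow.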
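
Slide 19 notes that configuration often lives only in programmers' heads. One way to move it into the repository is to declare the expected environment variables in code and fail loudly when a required one is missing. The sketch below assumes hypothetical variable names (DATABASE_URL, DEBUG, DATA_DIR) and a hypothetical load_settings helper; none of these come from the talk.

```python
import os


class MissingSetting(Exception):
    """Raised when a required environment variable is not set."""


# Hypothetical names, chosen only for illustration.
REQUIRED = ["DATABASE_URL"]
DEFAULTS = {"DEBUG": "false", "DATA_DIR": "./data"}


def load_settings(environ=None):
    """Build a settings dict from environment variables.

    Required variables must be present; optional ones fall back to
    documented defaults, so the expected configuration is visible in code.
    """
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED if name not in environ]
    if missing:
        raise MissingSetting("missing environment variables: " + ", ".join(missing))
    settings = {name: environ[name] for name in REQUIRED}
    for name, default in DEFAULTS.items():
        settings[name] = environ.get(name, default)
    return settings


# Usage: pass a plain dict in tests, or nothing to read the real environment.
print(load_settings({"DATABASE_URL": "sqlite:///dev.db", "DEBUG": "true"}))
```

Accepting the environment as a parameter keeps the function testable without touching the real process environment.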
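
Slide 20 advises keeping keys out of the repository. A rough sketch of a scan that could run before a commit is shown below. It is a heuristic under stated assumptions: the two patterns (an AWS-style access key ID beginning with AKIA, and a PEM private-key header) catch only obvious cases, and find_suspected_keys is a hypothetical helper, not a tool mentioned in the talk.

```python
import re

# Assumption: AWS access key IDs look like "AKIA" plus 16 uppercase letters/digits.
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")
# PEM headers such as "-----BEGIN RSA PRIVATE KEY-----".
PRIVATE_KEY = re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----")


def find_suspected_keys(text):
    """Return 1-based line numbers that appear to contain a key."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if AWS_KEY_ID.search(line) or PRIVATE_KEY.search(line):
            hits.append(number)
    return hits


sample = (
    'name = "demo"\n'
    'aws_key = "AKIAIOSFODNN7EXAMPLE"\n'
    "-----BEGIN RSA PRIVATE KEY-----\n"
)

print(find_suspected_keys(sample))  # prints [2, 3]
```

A scan like this complements, rather than replaces, the slide's advice to talk to your IT department about how keys should be stored and shared.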