
PEARC17: Reproducibility and Containers: The Perfect Sandwich


Vanessa Sochat, Research Software Engineer, Stanford Research Computing Center, and Gregory Kurtzer, High Performance Computing Services group at Lawrence Berkeley National Lab, present on their work with Singularity and Singularity Hub.

Dear reader, how should you disseminate your software? If you want your recipe to come out just right, we encourage you to put it in a container. One such container, Singularity, is the first of its kind to be securely deployed internationally on more than 40 shared cluster resources. Its registry, Singularity Hub, further supports reproducible science by building and making containers accessible to any user of the software. In this talk, Vanessa will review the primary use cases for both Singularity and Singularity Hub, and how both have been designed to support modern, common workflows. (Greg will participate remotely.) She will discuss current and future challenges for building, capturing metadata for, and organizing the exploding landscape of containers, and present novel work for assessing reproducibility of such containers. Containers are changing scientific computing, and this is something to be excited about.


  1. 1. SINGULARITY CONTAINERS FOR SCIENCE Vanessa Sochat, PhD Research Software Engineer Research Computing Stanford University
  2. 2. THE PERFECT SANDWICH
  3. 3. The Perfect Sandwich 1. Peanut Butter 2. Jelly 3. Bread 4. Spread on Bread 5. Eat
  7. 7. IS IT THE SAME SANDWICH?
  8. 8. SAME SAME, BUT DIFFERENT
  10. 10. WE COULD HAVE DONE WORSE...
  11. 11. Why does it taste different?
  12. 12. 1. Our recipe was not reproducible
  13. 13. 1. Our recipe was not reproducible 2. We had missing dependencies
  14. 14. 1. Our recipe was not reproducible 2. We had missing dependencies 3. The perfect sandwich might never be made again
  15. 15. 1. Our recipe was not reproducible 2. We had missing dependencies 3. The perfect sandwich might never be made again no ability to easily distribute or validate work
  16. 16. Introducing Singularity
  17. 17. Introducing Singularity Give them the sandwich.
  18. 18. Container: encapsulation of system environment
  19. 19. LIFE’S WORK Container: encapsulation of system environment
  20. 20. Why not Docker?
  21. 21. DOCKER IS (STILL) GREAT!
  22. 22. DOCKER IS (STILL) GREAT! Docker Well-known container platform
  23. 23. DOCKER IS (STILL) GREAT! Docker Well-known container platform Micro-service virtualization
  24. 24. DOCKER IS (STILL) GREAT! Docker Well-known container platform Micro-service virtualization Create + distribute containers
  25. 25. DOCKER IS (STILL) GREAT! Docker Well-known container platform Micro-service virtualization Create + distribute containers Reproducible
  26. 26. DOCKER IS (STILL) GREAT! Docker Well-known container platform Micro-service virtualization Create + distribute containers Reproducible Easy to use, well documented
  27. 27. WHY NOT DOCKER? Docker is not designed for,
  28. 28. WHY NOT DOCKER? Docker is not designed for, efficient for,
  29. 29. WHY NOT DOCKER? Docker is not designed for, efficient for, or even compatible with
  30. 30. WHY NOT DOCKER? Docker is not designed for, efficient for, or even compatible with traditional HPC architectures
  31. 31. WHY NOT DOCKER? Docker is not designed for, efficient for, or even compatible with traditional HPC architectures No centers run Docker on their traditional HPC
  32. 32. + HPC HPC ADMIN
  33. 33. HPC USER
  34. 34. scientists need containers too
  35. 35. 1. Singularity, three ways 2. Singularity Hub 3. Reproducible Science
  36. 36. Singularity ...three ways
  37. 37. How do I use it
  38. 38. THE SINGULARITY FLOW
  39. 39. Image Creation
          $ singularity create ubuntu.img
          $ singularity import ubuntu.img docker://ubuntu:14.04
  41. 41. Image Creation
          $ singularity create ubuntu.img
          $ singularity import ubuntu.img docker://ubuntu:14.04
          $ singularity pull docker://ubuntu:14.04 ubuntu-14.04.img
  42. 42. Image Bootstrap
          $ singularity create ubuntu.img
          $ sudo singularity bootstrap ubuntu.img Singularity
  44. 44. Bootstrap: docker From: python:latest Singularity
  45. 45. Bootstrap: docker From: python:latest %post apt-get update apt-get install -y vim wget mkdir /cave Singularity
  46. 46. Bootstrap: docker From: python:latest %post apt-get update apt-get install -y vim wget mkdir /cave %labels MAINTAINER vanessasaurus Singularity
  47. 47. Bootstrap: docker From: python:latest %post apt-get update apt-get install -y vim wget mkdir /cave %labels MAINTAINER vanessasaurus %files /home/vanessa/Desktop/rawr.sh /cave/rawr.sh Singularity
  48. 48. Bootstrap: docker From: python:latest %post apt-get update apt-get install -y vim wget mkdir /cave %labels MAINTAINER vanessasaurus %files /home/vanessa/Desktop/rawr.sh /cave/rawr.sh %environment DINOSAUR_HOME=/cave export DINOSAUR_HOME Singularity
  49. 49. Singularity
          Bootstrap: docker
          From: python:latest
          %post
              apt-get update
              apt-get install -y vim wget
              mkdir /cave
          %labels
              MAINTAINER vanessasaurus
          %files
              /home/vanessa/Desktop/rawr.sh /cave/rawr.sh
          %environment
              DINOSAUR_HOME=/cave
              export DINOSAUR_HOME
          %runscript
              exec /bin/bash /cave/rawr.sh "$@"
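The recipe on the slide above consists of header lines (Bootstrap, From) followed by sections that each begin with a %-prefixed name. As a rough illustration of that layout, the Python sketch below splits a recipe string into its header and its sections. It is not Singularity's own parser: the helper name `split_recipe` and the embedded `RECIPE` string are assumptions made for this example.

```python
from collections import OrderedDict

# Recipe text copied from the slide; indentation is cosmetic.
RECIPE = """\
Bootstrap: docker
From: python:latest

%post
    apt-get update
    apt-get install -y vim wget
    mkdir /cave

%labels
    MAINTAINER vanessasaurus

%files
    /home/vanessa/Desktop/rawr.sh /cave/rawr.sh

%environment
    DINOSAUR_HOME=/cave
    export DINOSAUR_HOME

%runscript
    exec /bin/bash /cave/rawr.sh "$@"
"""


def split_recipe(text):
    """Split a recipe into header key/value pairs and %sections.

    Hypothetical helper for illustration; it is not part of Singularity.
    """
    header = OrderedDict()
    sections = OrderedDict()
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("%"):
            # A new section starts; its name follows the percent sign.
            current = line[1:].split()[0]
            sections[current] = []
        elif current is None:
            # Header lines look like "Key: value"; split on the first colon only.
            key, _, value = line.partition(":")
            header[key.strip()] = value.strip()
        else:
            sections[current].append(line)
    return header, sections


header, sections = split_recipe(RECIPE)
for name, lines in sections.items():
    print(name, len(lines))
```

Splitting on the first colon only keeps a value such as python:latest intact.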
  50. 50. Where does it live?
  51. 51. open source https://octodex.github.com/
  52. 52. client (bash)
  53. 53. client (bash) src (C)
  54. 54. client (bash) src (C) helper (python)
  55. 55. /home/vanessa/.singularity
          ├── docker
          ├── metadata
          └── shub
          /usr/local/var/singularity/
          └── mnt
              ├── container
              ├── overlay
              └── session
  56. 56. /usr/local/
          ├── bin
          ├── etc
          ├── include
          ├── lib
          ├── libexec
          └── var
          ./configure --prefix=/usr/local
  57. 57. client /usr/local/ ├── bin ├── etc ├── include ├── lib ├── libexec └── var ./configure --prefix=/usr/local
  58. 58. /usr/local/ ├── bin ├── etc ├── include ├── lib ├── libexec └── var src
  59. 59. /usr/local/ ├── bin ├── etc ├── include ├── lib ├── libexec └── var python
  60. 60. /usr/local/ ├── bin ├── etc ├── include ├── lib ├── libexec └── var mount
  61. 61. /usr/local/ ├── bin ├── etc ├── include ├── lib ├── libexec └── var config
  62. 62. How does it work?
  63. 63. Installation
          git clone https://www.github.com/singularityware/singularity.git
          cd singularity
          ./autogen.sh
          ./configure --prefix=/usr/local
          make
          sudo make install
  64. 64. Customizable by the HPC Admin SINGULARITY.CONF - bind/mount points - permissions - overlayfs
  65. 65. Customizable by the HPC Admin SINGULARITY.CONF - bind/mount points - permissions - overlayfs - config file must be root owned
  66. 66. Customizable by the HPC Admin SINGULARITY.CONF - bind/mount points - permissions - overlayfs - config file must be root owned - controls what user can/not do
  67. 67. Customizable by the HPC Admin SINGULARITY.CONF - bind/mount points - permissions - overlayfs - config file must be root owned - controls what user can/not do - dis/allow different devices
  68. 68. Customizable by the HPC Admin SINGULARITY.CONF - bind/mount points - permissions - overlayfs - config file must be root owned - controls what user can/not do - dis/allow different devices - paths, session dirs all controlled
  69. 69. If you want to be root inside the container, you must be root outside the container.
  70. 70. contained processes exit all namespaces collapse ...leaving a cleaned system
  71. 71. The Singularity Command singularity --debug run --contain sandwich.img
  72. 72. The Singularity Command singularity --debug run --contain sandwich.img <action>
  73. 73. The Singularity Command singularity --debug run --contain sandwich.img [global options]
  74. 74. The Singularity Command singularity --debug run --contain sandwich.img [command options]
  75. 75. The Singularity Command singularity --debug run --contain sandwich.img <image>
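The five slides above label the parts of one command line: global options, action, command options, and image. A minimal Python sketch of that anatomy follows. The function `split_command` is hypothetical, and it assumes that options never take a separate value, which holds for the slide's example but not necessarily for every real option.

```python
def split_command(argv):
    """Split a singularity-style argument list into the parts named on the slides.

    Hypothetical helper; assumes every option is a bare flag with no value.
    """
    if not argv or argv[0] != "singularity":
        raise ValueError("expected a command starting with 'singularity'")
    rest = argv[1:]

    # Global options are the flags that come before the action.
    i = 0
    while i < len(rest) and rest[i].startswith("-"):
        i += 1
    if i == len(rest):
        raise ValueError("no action found")
    action = rest[i]
    after = rest[i + 1:]

    # Command options are the flags between the action and the image.
    j = 0
    while j < len(after) and after[j].startswith("-"):
        j += 1

    return {
        "global_options": rest[:i],
        "action": action,
        "command_options": after[:j],
        "image": after[j] if j < len(after) else None,
        "container_args": after[j + 1:],
    }


parts = split_command("singularity --debug run --contain sandwich.img".split())
print(parts)
```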
  76. 76. share sandwich?
  77. 77. Singularity Hub
  78. 78. 1. Singularity, three ways 2. Singularity Hub 3. Reproducible Science
  79. 79. WHERE IS THE BOTTLENECK?
  80. 80. SINGULARITY HUB: CONTAINER REGISTRY
  81. 81. COLLECTIONS
  82. 82. COLLECTION
  83. 83. COLLECTION commit
  84. 84. CONTAINER BUILD
  85. 85. CONTAINER BUILD LOG
  86. 86. ESTIMATED OPERATING SYSTEMS
  87. 87. How does it work? :>
  88. 88. 1. Add bootstrap specification file to GitHub repo base
  89. 89. 1. Add bootstrap specification file to GitHub repo base 2. “Turn build on” in Singularity Hub
  90. 90. 1. Add bootstrap specification file to GitHub repo base 2. “Turn build on” in Singularity Hub 3. Commits are built automatically on Google Cloud
  91. 91. 1. Add bootstrap specification file to GitHub repo base 2. “Turn build on” in Singularity Hub 3. Commits are built automatically on Google Cloud 4. Accessible via command line
  92. 92. 1. Singularity, three ways 2. Singularity Hub 3. Reproducible Science
  93. 93. 1. Singularity, three ways 2. Singularity Hub 3. Reproducible Science What happens next?
  94. 94. container... predictions! Change in the movement of information
  95. 95. Change in the movement of information bits
  96. 96. Change in the movement of information bits file
  97. 97. Change in the movement of information bits file folder
  99. 99. Change in the movement of information bits file folder software
  101. 101. Change in the movement of information bits file folder software apt-get install -y party-animal pip install party-animal
  102. 102. I’m missing dependencies. I didn’t get the same result What version of Python did you use? It doesn’t compile on my system!
  103. 103. The unit of information isn’t good enough.
  104. 104. Change in the movement of information bits file folder software os
  105. 105. Change in the movement of information bits file folder software os containers
  106. 106. container... predictions! Change in the movement of information: put stuff in containers
  107. 107. Too many containers!
  108. 108. Which containers do genomic analysis?
  109. 109. Which containers do genomic analysis? Which containers do it best? How do we define best?
  110. 110. Which containers do genomic analysis? Which containers do it best? How do we define best? Which ones have the most varying result? Why?
  111. 111. I don’t know how to measure that.
  112. 112. Our representation of containers isn’t good enough
  113. 113. Expectation: This container makes the perfect sandwich!
  114. 114. Reality:
  115. 115. container... predictions! Change in the movement of information: put stuff in containers Change in the representation of containers: reproducibility metrics
  116. 116. 1. Singularity, three ways 2. Singularity Hub 3. Reproducibility Metrics
  117. 117. How is container C1 similar to container C2 ? C1 C2
  118. 118. C1 C2
  119. 119. Intersection of sets C1 and C2 C1 C2 Total sum of files in C1 and C2
  120. 120. C1 C2 Is container C1 similar to container C2 ?
  121. 121. C1 C2 Is container C1 similar to container C2 ? It depends who is asking
  122. 122. SONIC MADE IT THROUGH REPRODUCIBILITY LEVEL REPLICATE!
  123. 123. C1
  124. 124. Levels of Reproducibility Identical: the exact same image file
  125. 125. Levels of Reproducibility Identical: the exact same image file Replicate: the same image built at different times
  126. 126. Levels of Reproducibility Identical: the exact same image file Replicate: the same image built at different times Base: the core os is estimated to be the same
  127. 127. Levels of Reproducibility Identical: the exact same image file Replicate: the same image built at different times Base: the core os is estimated to be the same Runscript: the content of the runscript is the same Environment: the environments are the same Labels: the container labels are the same
  128. 128. What is a level of reproducibility? A set of files between containers that are compared via content hash
  129. 129. Intersection of sets C1 and C2 C1 C2 Total sum of files in C1 and C2
  130. 130. C1 C2 Reproducibility Assessment Algorithm “Hash Content Comparison” Intersection of sets C1 and C2 Total sum of files in C1 and C2
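The slide above describes the score as the intersection of the two file sets over the total number of files in both containers, with files compared by content hash. The Python sketch below is one way to compute such a score. It makes several assumptions the slides do not state: SHA-256 as the hash, a file counting as shared only when both path and hash match, and a factor of 2 in the numerator (the Sørensen-Dice form) so that a container compared to itself scores 1.0. The names `content_hashes` and `similarity` are illustrative.

```python
import hashlib


def content_hashes(files):
    """Map each path to the SHA-256 digest of its bytes (hash choice is an assumption)."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


def similarity(c1, c2):
    """Score two containers given as {path: bytes} dictionaries.

    Returns 2 * |C1 ∩ C2| / (|C1| + |C2|); the doubling is an assumption
    made so that identical containers score 1.0.
    """
    h1 = set(content_hashes(c1).items())
    h2 = set(content_hashes(c2).items())
    total = len(h1) + len(h2)
    if total == 0:
        # Two empty containers are treated as identical (arbitrary choice).
        return 1.0
    return 2.0 * len(h1 & h2) / total


# Toy containers: same paths, one file with different content.
c1 = {"/bin/sh": b"shell", "/etc/os-release": b"ubuntu 14.04", "/cave/rawr.sh": b"echo rawr"}
c2 = {"/bin/sh": b"shell", "/etc/os-release": b"ubuntu 14.04", "/cave/rawr.sh": b"echo RAWR"}

print(similarity(c1, c1))
print(similarity(c1, c2))
```

Restricting the dictionaries to a subset of paths before calling `similarity` would be one way to express the different levels of reproducibility as different file sets.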
  131. 131. Do the levels behave as I would expect? Compare an image to itself
  132. 132. Do the levels behave as I would expect? Compare an image to itself - At step 1, start with the image compared to its full self
  133. 133. Do the levels behave as I would expect? Compare an image to itself - At step 1, start with the image compared to its full self - Subtract one file from the second image, recalculate, until empty
  134. 134. Do the levels behave as I would expect? Compare an image to itself - At step 1, start with the image compared to its full self - Subtract one file from the second image, recalculate, until empty a. Remove more recent files first
  135. 135. Do the levels behave as I would expect? Compare an image to itself - At step 1, start with the image compared to its full self - Subtract one file from the second image, recalculate, until empty a. Remove more recent files first Do this across all levels
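The check described above (compare an image to itself, then remove one file at a time from the second copy, most recent first, and recalculate until it is empty) can be sketched in Python as follows. This covers a single level only. The input format (path mapped to a content hash and a modification time), the function names, and the Dice-style score with a factor of 2 are all assumptions for illustration.

```python
def dice(a, b):
    """Return 2 * |a ∩ b| / (|a| + |b|), or 1.0 when both sets are empty."""
    total = len(a) + len(b)
    return 2.0 * len(a & b) / total if total else 1.0


def subtraction_curve(files):
    """Score an image against shrinking copies of itself.

    files maps path -> (content_hash, mtime). Hypothetical input format.
    """
    full = {(path, digest) for path, (digest, _) in files.items()}
    # Most recently modified files are removed first.
    order = sorted(files, key=lambda p: files[p][1], reverse=True)
    remaining = set(full)
    scores = [dice(full, remaining)]  # step 1: image vs. its full self
    for path in order:
        remaining.discard((path, files[path][0]))
        scores.append(dice(full, remaining))
    return scores


# Toy image with three files; hashes and timestamps are made up.
image = {
    "/bin/sh": ("aaa", 100),
    "/etc/os-release": ("bbb", 200),
    "/cave/rawr.sh": ("ccc", 300),
}
scores = subtraction_curve(image)
print(scores)
```

Plotting the returned list against the step number gives one curve per level, which is the kind of graph the takeaways slides refer to.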
  136. 136. Reproducibility Metrics: Takeaways 1. “Operating system science” needs to be a thing
  137. 137. Reproducibility Metrics: Takeaways 1. “Operating system science” needs to be a thing 2. Definitions of levels important
  138. 138. Reproducibility Metrics: Takeaways 1. “Operating system science” needs to be a thing 2. Definitions of levels important 3. I learned things about the OS just looking at the graphs
  139. 139. Reproducibility Metrics: Takeaways 1. “Operating system science” needs to be a thing 2. Definitions of levels important 3. I learned things about the OS just looking at the graphs 4. A way to derive features for an operating system?
  140. 140. thinking about the future
  141. 141. How can containers support reproducible science?
  142. 142. How can the HPC community support containers?
  143. 143. container sharing integration incentives
  145. 145. This is not optimized for scaled building!
  146. 146. Singularity Registry
  147. 147. Singularity Registry a local registry for a cluster resource
  148. 148. Challenges - Most resources can’t support web download links - How to share images? manifests? - Storage (for most) is a file system - No Docker for orchestration - Permissions? - Integration with Singularity Hub? - Management?
  149. 149. storage - file system
  150. 150. storage - file system builders - job queue - build node - virtual machines
  151. 151. storage - file system builders - job queue - build node - virtual machines manager - singularity image - command line - web interface
  152. 152. /usr/local/libexec/sregistry
            ├── cli
            │   ├── help.database
            │   ├── help.init
            │   ├── sregistry.build
            │   ├── sregistry.database
            │   ├── sregistry.help
            │   └── sregistry.init
            ├── helpers
            │   ├── args
            │   ├── update
            │   └── utils
            └── singularity.registry
            sudo ./install.sh --prefix=/usr/local
  153. 153. /opt/shub/ builder/ templates/ recipes/ .git/ .travis.yml storage/ containers/ sudo sregistry init --base /opt/shub
  154. 154. /opt/shub/builder recipes/ .git/ tensorflow/ tensorflow/ Singularity Singularity.gpu
  155. 155. /opt/shub/builder recipes/ .git/ tensorflow/ tensorflow/ ← collection tensorflow/tensorflow Singularity Singularity.gpu
  156. 156. /opt/shub/builder recipes/ .git/ tensorflow/ tensorflow/ ← collection tensorflow/tensorflow Singularity shub://tacc/tensorflow/tensorflow Singularity.gpu
  157. 157. /opt/shub/builder recipes/ .git/ tensorflow/ tensorflow/ ← collection tensorflow/tensorflow Singularity shub://tacc/tensorflow/tensorflow Singularity.gpu registry
  158. 158. /opt/shub/builder recipes/ .git/ tensorflow/ tensorflow/ ← collection tensorflow/tensorflow Singularity shub://tacc/tensorflow/tensorflow Singularity.gpu container name
  159. 159. /opt/shub/builder recipes/ .git/ tensorflow/ tensorflow/ ← collection tensorflow/tensorflow Singularity shub://tacc/tensorflow/tensorflow:tag Singularity.gpu tag
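The preceding slides annotate the address shub://tacc/tensorflow/tensorflow:tag piece by piece: registry, collection, container name, and tag. The Python sketch below splits such an address into those four fields. It reflects one reading of the slide annotations; `parse_shub_uri` and `ContainerRef` are hypothetical names, and a missing tag is returned as None because the slides do not state a default.

```python
from collections import namedtuple

ContainerRef = namedtuple("ContainerRef", "registry collection name tag")


def parse_shub_uri(uri):
    """Split shub://registry/collection/name[:tag] into its fields.

    Hypothetical helper; the field layout follows the slide annotations.
    """
    prefix = "shub://"
    if not uri.startswith(prefix):
        raise ValueError("not a shub:// address: %r" % uri)
    # Everything after the first colon (past the prefix) is the tag, if present.
    path, sep, tag = uri[len(prefix):].partition(":")
    parts = path.split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError("expected registry/collection/name in %r" % uri)
    registry, collection, name = parts
    return ContainerRef(registry, collection, name, tag if sep else None)


ref = parse_shub_uri("shub://tacc/tensorflow/tensorflow:gpu")
print(ref)
```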
  160. 160. registry: container collection corresponds to a folder in repository
  161. 161. registry: container collection corresponds to a folder in repository Individual user: container collection corresponds to an entire GitHub repo
  162. 162. registry: container collection corresponds to a folder in repository Individual user: container collection corresponds to an entire GitHub repo both build multiple tags for one collection from within same repository
  163. 163. Connect to Singularity Hub
  164. 164. Connect to Singularity Hub ...permission to build, granted!
  165. 165. ...build away, Merrill.
  166. 166. 1. If set up to build locally: launches local build job
            2. If set up to build only on Singularity Hub: pings Singularity Hub
            3. Both: launches local build job; successful builds ping Singularity Hub
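The three cases on the slide above amount to a small decision table. A minimal Python sketch, assuming the registry's configuration can be reduced to two booleans, is shown below. The function `plan_build` and its step strings are hypothetical and are not taken from the sregistry code.

```python
def plan_build(build_locally, build_on_hub):
    """Return the ordered steps for one build request.

    Hypothetical helper mirroring the three cases on the slide.
    """
    steps = []
    if build_locally:
        steps.append("launch local build job")
        if build_on_hub:
            # Case 3: the Hub is notified only after a successful local build.
            steps.append("on success, ping Singularity Hub")
    elif build_on_hub:
        # Case 2: the registry builds nothing itself.
        steps.append("ping Singularity Hub")
    return steps


for local, hub in [(True, False), (False, True), (True, True)]:
    print(local, hub, plan_build(local, hub))
```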
  167. 167. 1. run a build command
            sudo sregistry build tensorflow
            sudo sregistry build tensorflow/tensorflow
            sudo sregistry build tensorflow/tensorflow:gpu
  168. 168. /opt/shub/builder templates/ recipes/ .git/ tensorflow/ tensorflow/ Singularity Singularity.gpu
  169. 169. container sharing integration incentives
  171. 171. THE CLOUD HPC RESOURCE
  172. 172. How can we work together?
  173. 173. Local Development Environment
  175. 175. Local Development Environment Testing
  176. 176. Local Development Environment Testing Deploy and Share
  177. 177. Run! Local Development Environment Testing Deploy and Share Run! Run!
  178. 178. Run! Local Development Environment Testing Deploy and Share Run! Run! Result
  179. 179. Run! Creation Testing Publication Run! Run! Reproduce
  181. 181. How can we work together?
  182. 182. How can we work together? Just try.
  183. 183. Thank you Google!
  184. 184. container sharing integration incentives
  185. 185. Academic Layer Cake scientists
  186. 186. Academic Layer Cake scientists staff
  187. 187. scientist “I need a custom tool”
  188. 188. scientist “I need a custom tool” staff “I offer resources”
  189. 189. scientist “I need a custom tool” staff “I offer resources” scientist “I’ll do it myself”
  192. 192. Why can’t we do better?
  193. 193. Academic Layer Cake scientists staff software engineers
  194. 194. Lessons from Software Engineering 1. Continuous Integration (testing) 2. Version Control 3. Documentation 4. Logging, Handling Errors 5. Databases, Organization, and Storage
  195. 195. We need incentives and support for Research Software Engineers
  196. 196. scientist “I need a custom tool”
  197. 197. scientist “I need a custom tool” software engineer “I can help with that”
  198. 198. containers-ftw
  199. 199. containers-ftw crowdsourcing science with competitive containers
  200. 200. containers-ftw crowdsourcing science with competitive containers - package your challenge
  201. 201. containers-ftw crowdsourcing science with competitive containers - package your challenge - define metric of success
  202. 202. containers-ftw crowdsourcing science with competitive containers - package your challenge - define metric of success - share it
  203. 203. containers-ftw crowdsourcing science with competitive containers - package your challenge - define metric of success - share it - ...may the best container win!
  204. 204. SHOW ME WHAT YOU GOT
  205. 205. SHOW ME WHAT YOU GOT - Dave Godlove, NIH - Stefan Kombrink https://singularity-hub.org/demos/6/
  206. 206. 1. Singularity, three ways 2. Singularity Hub 3. Reproducible Science
  207. 207. [ Singularity ]
  208. 208. [ Singularity ] reproducible tools
  209. 209. [ Singularity ] reproducible tools are a group effort
  210. 210. [ Singularity ] reproducible tools are a group effort grow out of need
  211. 211. [ Singularity Hub ]
  212. 212. [ Singularity Hub ] reproducible practices
  213. 213. [ Singularity Hub ] reproducible practices sharing containers, data, software
  214. 214. [ Singularity Hub ] reproducible practices sharing containers, data, software working across lines
  215. 215. [ Reproducibility Metrics ]
  216. 216. [ Reproducibility Metrics ] representation for understanding
  217. 217. [ Reproducibility Metrics ] representation for understanding container transparency
  218. 218. [ Incentives ]
  219. 219. [ Incentives ] Research software engineering
  220. 220. Build for how you want the world to be
  221. 221. Party on, party dinosaur
  222. 222. [ the perfect sandwich ]
  223. 223. HPC Admin, Developers, and Scientists
  224. 224. http://singularityware.github.io https://www.singularity-hub.org https://www.github.com/singularityhub
  225. 225. Got messy code? Need to use a node? ...jokes, help, for free! #SRCC
  226. 226. Only the best for your analysis mess SRCC
  227. 227. vsochat@stanford.edu
