The document argues that reproducibility is what puts the science back into data science. The presenter demonstrates this with R and Pachyderm, an open source distributed processing framework, building a reproducible machine learning workflow and pipeline for an iris classification model. It closes with resources for joining the Pachyderm Slack channel and for accessing the documentation, where readers can learn more about building reproducible data science workflows at scale with Pachyderm.
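
As a rough illustration of what "reproducible" means for a workflow like the one summarized above, the sketch below fingerprints the input data and fixes the random seed, so that two runs on the same data yield the same model fingerprint. It is a minimal stand-in written in plain Python with only the standard library. It does not use R or any Pachyderm API, and it is not the presenter's pipeline. The names `fingerprint`, `train_centroids`, `run`, and `TOY_ROWS` are hypothetical, the nearest-centroid model is a swapped-in simplification, and the rows are invented placeholder values, not the real iris data set.

```python
import hashlib
import json
import random


def fingerprint(obj):
    """Return a SHA-256 hex digest of obj in a canonical JSON encoding."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def train_centroids(rows, seed):
    """Fit per-class feature means on a seeded 75% sample of the rows."""
    rng = random.Random(seed)  # local RNG, so global random state cannot leak in
    shuffled = rows[:]
    rng.shuffle(shuffled)
    train = shuffled[: max(1, int(len(shuffled) * 0.75))]

    sums, counts = {}, {}
    for features, label in train:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def run(rows, seed=42):
    """Train once and record everything needed to check a later rerun."""
    model = train_centroids(rows, seed)
    return {
        "data_sha256": fingerprint(rows),
        "seed": seed,
        "model_sha256": fingerprint(model),
    }


# Invented placeholder rows (two features, one label); NOT real iris measurements.
TOY_ROWS = [
    ([5.0, 3.4], "setosa"),
    ([4.8, 3.1], "setosa"),
    ([6.1, 2.9], "versicolor"),
    ([5.9, 2.7], "versicolor"),
    ([6.8, 3.0], "virginica"),
    ([7.1, 3.2], "virginica"),
    ([5.1, 3.5], "setosa"),
    ([6.4, 2.8], "versicolor"),
]

if __name__ == "__main__":
    first = run(TOY_ROWS)
    second = run(TOY_ROWS)
    print("identical reruns:", first == second)
```

Recording the data fingerprint and seed alongside the model fingerprint is one simple way to tell whether a changed result came from changed data or from changed code. A framework that versions data and pipeline steps together would track this at a much larger scale than this toy does.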