This was a talk that I gave at CERN at the Inter-experimental Machine Learning (IML) Working Group Meeting in April 2017 about language-agnostic (or polyglot) analysis workflows. I show how it is possible to work in multiple languages and switch between them without leaving the workflow you started. Additionally, I demonstrate how an entire workflow can be encapsulated in a markdown file that is rendered to a publishable paper with cross-references and a bibliography (and with a raw LaTeX file produced as a by-product) in a simple process, making the whole analysis workflow reproducible. For experimental particle physics, ROOT is the ubiquitous data analysis tool, and has been for the last 20 years, so I also talk about how to exchange data to and from ROOT.
Language-agnostic data analysis workflows and reproducible research
1. Language-agnostic data analysis workflows and
reproducible research
Andrew John Lowe
Wigner Research Centre for Physics,
Hungarian Academy of Sciences
28 April 2017
2. Overview
This talk: language-agnostic (or polyglot) analysis workflows
I’ll show how it is possible to work in multiple languages and
switch between them without leaving the workflow you started
Additionally, I’ll demonstrate how an entire workflow can be
encapsulated in a markdown file that is rendered to a
publishable paper with cross-references and a bibliography (and
with a raw LaTeX file produced as a by-product) in a simple
process, making the whole analysis workflow reproducible
3. Which tool/language is best?
TMVA, scikit-learn, h2o, caret, mlr, WEKA, Shogun, . . .
C++, Python, R, Java, MATLAB/Octave, Julia, . . .
Many languages, environments and tools
People may have strong opinions
A false polychotomy?
Some tools are better suited to specific problems than others
Be a polyglot!
6. Flat files
Language agnostic
Formats like text, CSV, and JSON are well-supported
Breaks workflow
Data types may not be preserved (e.g., datetime, NULL)
New binary format Feather solves this
7. Feather
Feather: A fast on-disk format for data frames for R and Python,
powered by Apache Arrow, developed by Wes McKinney and Hadley
Wickham
# In R:
library(feather)
path <- "my_data.feather"
write_feather(df, path)
# In Python:
import feather
path = 'my_data.feather'
df = feather.read_dataframe(path)
Other languages, such as Julia or Scala (for Spark users), can read
and write Feather files without knowledge of details of Python or R
8. RootTreeToR
RootTreeToR allows users to import ROOT data directly into R
Written by Adam Lyon (Fermilab), presented at useR! 2007
cdcvs.fnal.gov/redmine/projects/roottreetor
Requires ROOT to be installed, but no need to run ROOT
Can export R data.frames to ROOT trees
# Open and load ROOT tree:
rt <- openRootChain("TreeName", "FileName")
N <- nEntries(rt) # number of rows of data
# Names of branches:
branches <- RootTreeToR::getNames(rt)
# Read in a subset of branches (varsList), M rows:
df <- toR(rt, varsList, nEntries=M)
# Use writeDFToRoot to write a data.frame to ROOT
9. root_numpy
root_numpy is a Python extension module that provides an interface
between ROOT and NumPy. Example from root_numpy homepage:
import ROOT
from root_numpy import root2array, root2rec, tree2rec
from root_numpy.testdata import get_filepath
filename = get_filepath('test.root')
# Convert a TTree in a ROOT file into a NumPy
# structured array:
arr = root2array(filename, 'tree')
# Convert a TTree in a ROOT file into a NumPy
# record array:
rec = root2rec(filename, 'tree')
# Get the TTree from the ROOT file:
rfile = ROOT.TFile(filename)
intree = rfile.Get('tree')
# Convert the TTree into a NumPy record array:
rec = tree2rec(intree)
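The exchange also works in the opposite direction, from NumPy back to ROOT; a minimal sketch using array2root (the array contents, file name, and tree name here are just placeholders):
import numpy as np
from root_numpy import array2root
# Build a small structured array and write it into a TTree:
arr = np.array([(1.5, 2), (3.0, 4)],
dtype=[('x', 'f8'), ('n', 'i4')])
array2root(arr, 'output.root', treename='tree')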
15. Jupyter Notebook
Each notebook has one main language
Cells can contain other languages through “magic”
For example: ggplot2 in Jupyter Notebook
Many Jupyter kernels available
ROOTbooks: Notebooks running ROOT Jupyter kernel
Assume that most people here have already seen these
16. Beaker Notebook
Each notebook can contain any language
Many languages are supported
Auto translation of data (copied)
http://beakernotebook.com/
Figure 1: “A universal translator for data scientists”
17. Beaker Notebook
Beaker’s individual cells support different languages within the same
notebook and allow you to pass data from one cell to another —
e.g. Python to R to JavaScript — seamlessly. This autotranslate
functionality allows you to use the language best suited to the
problem.
Figure 2: “A polyglot workflow in Beaker.”
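As I recall from the Beaker documentation, autotranslation is driven by a shared beaker object; a sketch of that (assumed, unverified) API:
# In a Python cell:
beaker.x = [1, 2, 3]
# In a later R cell, the same data is available:
mean(beaker$x)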
19. Typical research pipeline
1. Measured data → (processing code) → analysis data
2. Analysis data → (analysis code) → computational results
3. Computational results → (presentation code) → plots, tables, numbers
4. Plots, tables, numbers + text → (manual editing) → article
5. (Maybe) put the data and code on the web somewhere (seldom done)
20. Challenges with this approach
Onus is on researchers to make their data and code available
Authors may need to undertake considerable effort to put data
and code on the web
Readers must download data and code individually and piece
together which data go with which code sections, etc.
Typically, authors just put stuff on the web
Readers just download the data and try to figure it out, piece
together the software and run it
Authors/readers must manually interact with websites
There is no single document to integrate data analysis with
textual representations; i.e. data, code, and text are not linked
Your experiment may impose restrictions on what kind of data
can be shared publicly. Nevertheless, making an analysis
reproducible benefits both your collaboration colleagues and
your future self!
21. Literate Statistical Programming
Original idea comes from Donald Knuth
An article is a stream of text and code
Analysis code is divided into text and code “chunks”
Presentation code formats results (tables, figures, etc.)
Article text explains what is going on
Literate programs are weaved to produce human-readable
documents and tangled to produce machine-readable
documents
22. knitr
knitr is a system for literate (statistical) programming
Uses R as the programming language, but others are allowed
Benefits: text and code are all in one place, results
automatically updated to reflect external changes, workflow
supports reproducibility from the outset
Almost no extra effort required by the researcher after the
article has been written
Develop your analysis interactively, like in a Jupyter Notebook
When finished, do rmarkdown::render("my_paper.Rmd") to
get PDF/HTML/doc
Like Notebooks, but output is publication quality, and you get
the raw LaTeX also as a freebie
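The output format, the bibliography, and the LaTeX by-product are all declared in the document's YAML header; a minimal sketch (the title, file names, and fields here are placeholders):
---
title: "My reproducible analysis"
author: "A. Physicist"
output:
  bookdown::pdf_document2:
    keep_tex: true
bibliography: refs.bib
---
Here keep_tex: true is what preserves the raw LaTeX file, and the bookdown variant of the PDF format provides the cross-referencing used later.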
23. R Markdown
This is an R Markdown presentation. Markdown is a simple
formatting syntax for authoring HTML, PDF, and MS Word
documents. For more details on using R Markdown see
http://rmarkdown.rstudio.com
It looks like LaTeX, or to be more precise, it’s a Beamer
presentation, but this slide was created like:
## R Markdown
This is an R Markdown presentation. Markdown is a simple
formatting syntax for authoring HTML, PDF, and MS Word
documents. For more details on using R Markdown see
<http://rmarkdown.rstudio.com>
It looks like LaTeX, or to be more precise, it's a
**Beamer** presentation, but this slide was created like:
24. R Markdown (continued)
The Markdown syntax has some enhancements. For example, you
can include LaTeX equations, like this:
\( i \gamma^\mu \partial_\mu \psi - mc\psi = 0 \)  (1)
We can also add tables and figures, just as we would do in LaTeX:
Table 1: Effectiveness of Insect Sprays. The mean counts of insects in
agricultural experimental units treated with different insecticides.
Spray   Count
A       14.500000
B       15.333333
C        2.083333
D        4.916667
E        3.500000
F       16.666667
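Such a table is produced by a code chunk rather than typed by hand; a small sketch using knitr::kable and R's built-in InsectSprays data, which appears to be the source of the numbers above:
library(knitr)
# Mean insect count per spray, rendered as a captioned table:
kable(aggregate(count ~ spray, data = InsectSprays, FUN = mean),
      caption = "Effectiveness of Insect Sprays.")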
25. Example code chunk
A code chunk using R is defined like this:
```{r}
print('hello world!')
```
A code chunk using some other execution engine is defined like this:
```{<execution-engine>}
print('hello world!')
```
For example:
```{python}
x = 'hello, python world!'
print(x.split(' '))
```
26. Example R code chunk
# Needs ggplot2 and the gapminder example data:
library(ggplot2)
library(gapminder)
ggplot(
data = gapminder, aes(x = lifeExp, y = gdpPercap)) +
geom_point(
aes(color = continent, shape = continent)) +
scale_y_log10()
[Figure: scatter plot of gdpPercap (log scale) versus lifeExp from the gapminder data, with points coloured and shaped by continent: Africa, Americas, Asia, Europe, Oceania.]
27. Code execution engines
Although R is the default code execution engine and a
first-class citizen in the knitr system, many other code
execution engines are available; this is the current list: awk,
bash, coffee, gawk, groovy, haskell, lein, mysql, node, octave,
perl, psql, python, Rscript, ruby, sas, scala, sed, sh, stata, zsh,
highlight, Rcpp, tikz, dot, c, fortran, fortran95, asy, cat, asis,
stan, block, block2, js, css, sql, go
I’ve already shown examples of R and Python code chunks
earlier in this talk
Except for R, all chunks are executed in separate sessions, so
the variables cannot be directly shared. If we want to make use
of objects created in previous chunks, we usually have to write
them to files (as side effects). For the bash engine, we can use
Sys.setenv() to export variables from R to bash.
Code chunks can be read from external files, in the event that
you don’t want to put everything in a monster-length
markdown document (use bookdown)
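For example, a minimal sketch of that R-to-bash hand-off (the variable name is illustrative):
```{r}
Sys.setenv(DATAFILE = "flights.feather")
```
```{bash}
echo "Will process $DATAFILE"
```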
28. FORTRAN
Define a subroutine:
C     Fortran test
      subroutine fexp(n, x)
      double precision x
C     output
      integer n, i
C     input value
      do 10 i = 1, n
         x = dexp(dcos(dsin(dble(float(i)))))
   10 continue
      return
      end
29. FORTRAN (continued)
Call it from R:
# Call the function from R:
res = .Fortran("fexp", n=100000L, x=0)
str(res)
## List of 2
## $ n: int 100000
## $ x: num 2.72
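The fortran engine takes care of compilation behind the scenes, but the manual equivalent is a useful mental model; a sketch, assuming the subroutine above is saved as fexp.f:
# Compile to a shared library and load it into the R session:
system("R CMD SHLIB fexp.f")
dyn.load(paste0("fexp", .Platform$dynlib.ext))
res <- .Fortran("fexp", n = 100000L, x = 0)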
31. SQL
db = dbConnect(RSQLite::SQLite(), dbname = ":memory:")
DROP TABLE IF EXISTS packages
CREATE TABLE packages (id INTEGER, name TEXT)
INSERT INTO packages VALUES (1, 'readr'), (2, 'tm')
SELECT * FROM packages
/* Can direct query results to an R data frame */
Table 2: 2 records
id name
1 readr
2 tm
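In the R Markdown source, these statements live in sql chunks bound to the connection created above; a minimal sketch, where output.var captures the query result as an R data frame:
```{sql connection=db, output.var="pkgs"}
SELECT * FROM packages
```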
32. C
C code is compiled automatically:
void square(double *x) {
*x = *x * *x;
}
// Compiler now runs...
## gcc -std=gnu99 -I/usr/share/R/include -DNDEBUG -fpi
## gcc -std=gnu99 -shared -L/usr/lib/R/lib -Wl,-Bsymbolic-f
Test the square() function:
.C('square', 9)
## [[1]]
## [1] 81
33. Rcpp
The C++ code is compiled through the Rcpp package; here we
show how we can write a function in C++ that can be called from R:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector timesTwo(NumericVector x) {
return x * 2;
}
# In R:
print(timesTwo(3.1415926))
## [1] 6.283185
34. cat
A special engine cat can be used to save the content of a code
chunk to a file using the cat() function, defined like this:
```{cat engine.opts=list(file='source.cxx')}
// Some code here...
```
The contents can be anything, but perhaps the most interesting use
is as a container for source code that you compile and run later.
35. BASH
echo hello bash!
echo 'a b c' | sed 's/ /|/g'
# We can also compile the source.cxx file that we
# created in the previous code chunk; maybe compile
# against ROOT libraries, etc.
# Then run the application
## hello bash!
## a|b|c
We can run anything that we can run from the command line,
including stuff we built in previous code chunks
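Concretely, compiling the source.cxx written by the earlier cat chunk against the ROOT libraries might look like this (a sketch that assumes root-config is on the PATH):
g++ source.cxx $(root-config --cflags --libs) -o myapp
./myapp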
36. Data exchange
Since the Python engine executes code in an external process,
exchanging data between R chunks and Python chunks is done via
the file system. If you are exchanging data frames, you can use the
Feather package for very high performance transfer of even large
data frames between Python and R:
import pandas
import feather
# Read flights data and select flights to O'Hare
flights = pandas.read_csv("flights.csv")
flights = flights[flights['dest'] == "ORD"]
# Select carrier and delay columns and drop rows with
# missing values
flights = flights[['carrier','dep_delay','arr_delay']]
flights = flights.dropna()
print(flights.head(10))
# Write to feather file for reading from R
feather.write_dataframe(flights, "flights.feather")
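The matching R chunk then reads the file straight back into a data frame (a short sketch):
library(feather)
flights <- read_feather("flights.feather")
head(flights, 10)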
38. Data exchange (summary)
ROOT ↔ R: RootTreeToR
ROOT ↔ Python: root_numpy
Python ↔ R: Feather
Scala, Julia and other languages supported
ROOT → Octave/MATLAB: I wrote an interface to do this
39. Knitron
The knitron package, available on GitHub but not yet on
CRAN, is intended to allow users the ability to use
IPython/Jupyter and matplotlib in R Markdown code chunks
and render them with knitr
According to the author’s website, “Knitron works by lazily
starting a global IPython kernel the first time a code chunk
gets rendered by knitr and this kernel is reused for all
consecutive chunks. This way all the computation done in any
previous chunk is available in the current chunk, providing
R-like behaviour for Python”.
40. Knitron example
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
for c, z in zip(['r', 'g', 'b', 'y'], [30, 20, 10, 0]):
    xs = np.arange(20)
    ys = np.random.rand(20)
    cs = [c] * len(xs)
    cs[0] = 'c'
    ax.bar(xs, ys, zs=z, zdir='y', color=cs, alpha=0.8)
ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')
plt.savefig("myplot.pdf")
## GLib-GIO-Message: Using the 'memory' GSettings backend.
41. Knitron example (continued)
Then plot the figure using a standard knitr mechanism, for example:
cat("![My matplotlib plot.](myplot.pdf)")
[Figure: the 3D bar chart produced by the matplotlib code above, with axes labelled X, Y, and Z.]
42. Knitron example (continued)
Alternatively, we can use the following recipe for JPEG/PNG files to
get a figure caption and allow cross-referencing and image resizing:
require(png); require(grid)
img <- readPNG("myplot.png")
grid.raster(img)
Figure 4: This figure has been produced from matplotlib in a Python
code chunk using the knitron R package.
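A simpler alternative, which also respects chunk options such as fig.cap and out.width for captions and resizing, is knitr's include_graphics (a brief sketch):
knitr::include_graphics("myplot.png")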
43. ROOT
In the R tutorial that I gave at the IML Workshop last month, I
spoke about the possibility of using code execution engines
other than R, but noted then that there is no way to use
ROOT as a code execution engine
Well, now there is!
I have written an interface to allow ROOT to be used as a
code execution engine
Simple proof-of-concept at the moment; nothing too
sophisticated at this stage
Runs the chunk contents as a macro via root -q -b macro.C
If not sufficient, it’s always possible to run more complicated
analyses, perhaps using stand-alone C++ code using ROOT
libraries, or using your experiment’s software framework, in a
UNIX shell code chunk
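Custom engines are registered through knitr's standard extension point, knitr::knit_engines$set(); the actual implementation isn't shown here, but a minimal sketch of such an engine could look like:
knitr::knit_engines$set(root = function(options) {
  # Write the chunk code to a temporary macro, run it in
  # batch mode, and return the captured output:
  macro <- tempfile(fileext = ".C")
  writeLines(options$code, macro)
  out <- system(paste("root -q -b", macro), intern = TRUE)
  knitr::engine_output(options, options$code, out)
})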
44. ROOT code execution engine example
// Builds a graph with errors, displays it and saves it
// as an image. First, include some header files (within
// CINT, these will be ignored).
#include "TCanvas.h"
#include "TROOT.h"
#include "TGraphErrors.h"
#include "TF1.h"
#include "TLegend.h"
#include "TArrow.h"
#include "TLatex.h"
void macro1(){
// The values and the errors on the Y axis
const int n_points=10;
double x_vals[n_points]=
{1,2,3,4,5,6,7,8,9,10};
45. Figure 5: This is the graph generated by the ROOT macro: "Measurement XYZ" (length [cm] versus Arb. Units), showing the experimental points, the theoretical law, and the maximum deviation.
46. ROOT + markdown
ROOT + Jupyter Notebook = ROOTbook
Develop your analysis interactively
Work collaboratively
Reproducible document
Contains the plots and numeric results
Version control is difficult
Not immediately fit for publication in an academic journal
ROOT + R markdown = “ROOTdown”?
Interactivity not currently available for ROOT (needs work)
Plots and numeric results only in the rendered document
Plain text format → version control is simple
Immediately fit for publication in an academic journal
47. Example paper output
See toy_example_paper.pdf to see example output from knitr
This Beamer presentation, but rendered as a paper instead
To demonstrate that the necessary infrastructure works
Contains (hyperlinked) cross-references and bibliography
Note that I wouldn’t have to do anything special to make this
happen for a real paper; I ran a single command to run all the
“analysis code” and generate the document with all the plots,
tables, numerical results, etc.
Raw LaTeX (to submit to journal) generated as a by-product
48. Messages to take away
It is already possible to write a reproducible analysis in your
favourite programming language and have your paper rendered
fit for publication
You can mix and match programming languages in a single
uninterrupted workflow
Enabling an analysis to be performed in a single unbroken
workflow greatly facilitates reproducibility
There are ways to exchange data between code chunks that are
written in different programming languages
ROOT, Python, R, . . .
Works nicely with version control (it’s just plain text!)
You can now embed your ROOT analysis code in a
reproducible academic publication