Reproducible Research and R - Alec Zwart

  • Markup – codes added to plain text to control formatting and layout. Examples: HTML, LaTeX.
  • Knuth – highly distinguished and respected computer scientist. Author of the TeX typesetting system (documents with beautiful math!). Author of ‘The Art of Computer Programming’, up to 4 volumes (so far), a classic of computer science theory. 1984 – presented ‘WEB’, a system for combining plain text with code to produce nicely formatted documents, plus runnable code (code documentation, tutorials, etc.).
  • Weave – process into a formatted document (via TeX). Tangle – extract the code sections into runnable/compilable code documents.
  • Regarding Knuth – overly enthusiastic? Although I am beginning to like knitr + markdown for my code: stream of consciousness captured tidily. Regarding reports, articles, etc.: via LaTeX or other typesetting systems, including easy webpage generation. If you are a decent typist and are familiar with the syntax: less mousework, fast. Need to make changes? Just reprocess to update!
  • Provide, with publications, all data, code and documentation needed to reproduce results (figures, tables, simulations, whatever), so that others can check your working, better learn from your work (including all the gory details), and more easily use and build on your work. Promotes increased openness, clarity, rigour and (re)usability of the science.
  • For RR, may need to provide: dynamic document files; extra code files; extra text processing files (e.g. LaTeX style files, etc.?); data files; instructions/documentation. Place all of this in a suitable container – the Compendium: a folder with subfolders, or an R package! GolubRR package – Gentleman 2005, ‘Reproducible Research: A Bioinformatics Case Study’.
  • Yihui Xie. Knitr – scratching Sweave itches? Greater functionality: better R output capture, better code formatting, built-in caching, better graphics handling, source R code from scripts, more customizable. Multiple programming languages (R, Python, AWK) and alternative text processing systems (LaTeX, markdown, reStructuredText & more). Markdown – text formatting system, not nearly as powerful as LaTeX, but simple. Knitr + markdown great for producing quick reports in HTML. Incorporated into RStudio – see RStudio pages for docs. Knitr webpage: http://yihui.name/knitr. Documentation for code chunk options: http://yihui.name/knitr/options
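The Weave and Tangle steps described in the notes can be illustrated with a minimal sketch. It is not Knuth's WEB: it assumes a simplified noweb-style chunk syntax (a line `<<name>>=` opens a code chunk, a lone `@` closes it), and its weave step only indents code as plain text rather than typesetting via TeX. The names `SOURCE`, `split_chunks`, `tangle` and `weave` are hypothetical, chosen for this illustration.

```python
"""Toy tangle/weave over a simplified noweb-style source (illustration only)."""

# Hypothetical literate source: prose mixed with two named code chunks.
SOURCE = """\
We first define a helper that squares a number.
<<square>>=
def square(x):
    return x * x
@
Then we use it.
<<main>>=
result = square(7)
@
"""


def split_chunks(source):
    """Split the source into ("text" | "code", lines) pairs, in document order."""
    chunks = []
    kind, lines = "text", []
    for line in source.splitlines():
        if kind == "text" and line.startswith("<<") and line.endswith(">>="):
            # A chunk header closes the current text chunk and opens a code chunk.
            if lines:
                chunks.append((kind, lines))
            kind, lines = "code", []
        elif kind == "code" and line.strip() == "@":
            # A lone "@" closes the code chunk and returns to prose.
            chunks.append((kind, lines))
            kind, lines = "text", []
        else:
            lines.append(line)
    if lines:
        chunks.append((kind, lines))
    return chunks


def tangle(source):
    """Extract only the code chunks, concatenated into runnable code."""
    code_lines = []
    for kind, lines in split_chunks(source):
        if kind == "code":
            code_lines.extend(lines)
    return "\n".join(code_lines) + "\n"


def weave(source):
    """Produce a readable document: prose kept as-is, code indented four spaces."""
    out = []
    for kind, lines in split_chunks(source):
        if kind == "code":
            out.append("")
            out.extend("    " + line for line in lines)
            out.append("")
        else:
            out.extend(lines)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    namespace = {}
    exec(tangle(SOURCE), namespace)  # the tangled code runs without the prose
    print(namespace["result"])
    print(weave(SOURCE))
```

Both outputs come from the same single source file, which is the point of the approach: the prose and the code cannot drift apart.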
Transcript of "Reproducible Research and R - Alec Zwart"

    1. Reproducible Research and R. Alec Zwart, 21 November 2012. CSIRO Mathematics, Informatics and Statistics. Image: Fomel, S. & Claerbout, J. F. Guest Editors’ Introduction: Reproducible Research. Computing in Science & Engineering 11, 5–7 (2009).
    2. Preliminary: Markup (slide footer: Reproducible Research and R | Alec Zwart)
    3. Donald Knuth - Literate Programming
       • DE Knuth - The Computer Journal, 1984. ‘Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.’
    4. WEB: ‘Weave’ and ‘Tangle’
    5. Weaving (modern version) [diagram]. Boxes: text w/ markup; code blocks; weaving tool (CWEB, noweb, Sweave, knitr…); language translator (R, Python…); code block outputs; text, code & outputs w/ markup; text markup processor (LaTeX, Web browser, Markdown processor); formatted output.
    6. Why?
       • Knuth – a way to program
       • Automatic report generation (web services)
       • Reports, articles, program documentation/tutorials
       • Reproducible research
    7. Reproducible Research
       • Promoted by Jon F. Claerbout, Stanford University (1990s?).
       • Early publication: Wavelab and Reproducible Research, Buckheit & Donoho 1995.
       • ‘When we publish articles containing figures which were generated by computer, we also publish the complete software environment which generates the figures’.
       • Special issue, Computing in Science & Engineering, Vol. 11, No. 1, 2009.
       Image: Fomel, S. & Claerbout, J. F. Guest Editors’ Introduction: Reproducible Research. Computing in Science & Engineering 11, 5–7 (2009).
    8. Reproducible Research in Statistics
       • Gentleman & Temple Lang 2004, ‘Statistical Analysis and Reproducible Research’. ‘It is important, if not essential, to integrate the computations and code used in data analyses, methodological descriptions, simulations, and so on with the documents that describe and rely on them.’
    9. Gentleman and Temple Lang: The Compendium
    10. Literate programming systems in R
       • CRAN: Task Views – ReproducibleResearch
       • Sweave (R + LaTeX, standard for vignette production)
       • Knitr (various + various)
       • Other possibilities (ascii, odfWeave, brew, etc.)
    11. Knitr + Markdown – Yihui Xie
    12. Publish on…
    13. Thank you. CSIRO Mathematics, Informatics & Statistics. Alec Zwart, t +61 2 6216 7010, e alec.zwart@csiro.au
    14. Reproducible Research – again, why?
       • Anil Potti - Duke University, North Carolina
          • Personalised medicine for cancer patients
          • Microarray work
       • Statisticians Keith Baggerly and Kevin Coombes, intrigued by results from Potti’s research, decide to investigate:
          • Found errors, including lots of simple ones – mislabelled samples, mismatched gene names, etc.
       • To date: 10 retractions, 7 corrections, 1 partial retraction. Anil Potti resigned.
       • Dishonesty? Ignorance, incompetence + wishful thinking? Unclear.
    15. B & D: Reproducible Research – why?
       • Buckheit & Donoho – anecdotes:
          • Which of these printouts was the right version of the figure? – Arrgh!
          • Stolen briefcase – loss of irreplaceable figures.
          • Limitations of oral communication of software & algorithms.
          • Documentation – returning to old work.
          • Er – can’t remember what parameter values gave this result – not to worry…
       • ‘An article about computational science in a scientific publication is NOT the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures’ - Buckheit & Donoho
    16. Knitr + markdown
       • Markdown – text formatting system, not nearly as powerful as LaTeX, but simple
       • Knitr + markdown great for producing quick reports in HTML
       • Incorporated into RStudio – see RStudio pages for docs.
       • Knitr webpage: http://yihui.name/knitr
       • Documentation for code chunk options: http://yihui.name/knitr/options
    18. Weaving (modern version) [diagram repeated from slide 5]
    20. Knitr + Markdown – Yihui Xie
    21. Knitr + Markdown – Yihui Xie
    22. G & TL – the Compendium
       • For RR, may need to provide:
          • Dynamic document files
          • Extra code files
          • Extra text processing files (e.g. LaTeX style files, etc.?)
          • Data files
          • Instructions/documentation
       • Place all of this in a suitable container – the Compendium
          • A folder with subfolders
          • An R package!
          • GolubRR package – Gentleman 2005, ‘Reproducible Research: A Bioinformatics Case Study’.
    23. WEB CWEB WEB ‘Tangle’
    24. Knitr
       • Yihui Xie
       • Scratching Sweave itches?
       • Greater functionality
          • better R output capture,
          • better code formatting,
          • built-in caching,
          • better graphics handling,
          • source R code from scripts,
          • more customizable.
       • Multiple programming languages (R, Python, AWK), and alternative text processing systems (LaTeX, markdown, reStructuredText & more)
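
The ‘Weaving (modern version)’ pipeline on slides 5 and 18 differs from Knuth's original in one respect: the code blocks are actually run, and their output is spliced back into the document before the markup processor formats it. The sketch below illustrates that idea only; it is not how knitr or Sweave are implemented. It assumes a toy noweb-style chunk syntax (`<<>>=` opens a code chunk, a lone `@` closes it), uses Python's own `exec` as the ‘language translator’, and uses a trivial HTML renderer in place of a real markup processor. All function names are hypothetical.

```python
"""Toy 'modern weave': run each code chunk and splice its output into the text."""
import contextlib
import html
import io

# Hypothetical dynamic document: prose around one executable chunk.
SOURCE = """\
The mean of our three measurements is computed below.
<<>>=
values = [2, 4, 9]
print(sum(values) / len(values))
@
Re-weaving the document regenerates the number above.
"""


def split_chunks(source):
    """Split the source into ("text" | "code", string) pairs, in document order."""
    chunks = []
    kind, lines = "text", []
    for line in source.splitlines():
        if kind == "text" and line.startswith("<<") and line.endswith(">>="):
            if lines:
                chunks.append((kind, "\n".join(lines)))
            kind, lines = "code", []
        elif kind == "code" and line.strip() == "@":
            chunks.append((kind, "\n".join(lines)))
            kind, lines = "text", []
        else:
            lines.append(line)
    if lines:
        chunks.append((kind, "\n".join(lines)))
    return chunks


def run_chunk(code, namespace):
    """'Language translator' step: execute the chunk and capture what it prints.

    exec is acceptable here only because the document is our own, trusted input.
    """
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, namespace)
    return buffer.getvalue()


def weave(source):
    """Return prose, code, and captured output as (kind, text) pieces."""
    namespace = {}  # shared, so later chunks can use variables from earlier ones
    pieces = []
    for kind, text in split_chunks(source):
        pieces.append((kind, text))
        if kind == "code":
            pieces.append(("output", run_chunk(text, namespace)))
    return pieces


def render_html(pieces):
    """'Markup processor' step: a trivial stand-in that emits HTML."""
    parts = []
    for kind, text in pieces:
        escaped = html.escape(text.strip("\n"))
        if kind == "text":
            parts.append(f"<p>{escaped}</p>")
        else:
            parts.append(f'<pre class="{kind}">{escaped}</pre>')
    return "\n".join(parts)


if __name__ == "__main__":
    print(render_html(weave(SOURCE)))
```

Because the number in the report is produced by running the chunk, changing the data and re-weaving updates the document, which is the ‘just reprocess to update’ benefit mentioned in the notes.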