
# Scientific Software Development

Introduction to proper software development practices in scientific computing -- revision control, unit testing in R, code reviews, reproducibility, and replicability.

Published in: Technology
• Every good programmer I know uses revision control; most bad ones I know don’t.
• You can use Git without GitHub. GitHub is one of the options for hosting Git repositories.
• Overview and list of projects; public vs. private; show commits; show the diff of a commit; show comments/discussion on a commit; show tags; show wiki (devtools: https://github.com/hadley/devtools/); show issues; show pull requests.
• The only true way to achieve replicability in a project under development is to use a revision control system.
• Spot two problems with this function: (1) negatives, (2) decimals.
• What would our new tests look like?
• New tests:
      expect_that(square(2.5), equals(6.25))
      expect_that(square(-2), equals(4))
  Function and full test block:
      square <- function(x) {
        sq <- 0
        for (i in 1:x) {
          sq <- sq + x
        }
        return(sq)
      }
      test_that("Square function works on various input types", {
        expect_that(square(3), equals(9))
        expect_that(square(5), equals(25))
        expect_that(square(2.5), equals(6.25))
        expect_that(square(-2), equals(4))
      })
• Fixed function with the same test block:
      square <- function(x) {
        sq <- x * x
        return(sq)
      }
      test_that("Square function works on various input types", {
        expect_that(square(3), equals(9))
        expect_that(square(5), equals(25))
        expect_that(square(2.5), equals(6.25))
        expect_that(square(-2), equals(4))
      })
• Lightweight: email your code and have a peer or a more experienced programmer look through it and suggest improvements (demo: GitHub). Formal: schedule a meeting with a handful of other programmers to audit the code you’ve written. Keep it under 500 LOC per meeting, target around 200 LOC per hour, and selectively pick the sections of code to review formally.
• Refactored function with the same test block:
      square <- function(x) {
        x ^ 2
      }
      test_that("Square function works on various input types", {
        expect_that(square(3), equals(9))
        expect_that(square(5), equals(25))
        expect_that(square(2.5), equals(6.25))
        expect_that(square(-2), equals(4))
      })

### Scientific Software Development

1. Avoiding Big Mistakes in Scientific Computing
   Or: How to Write Code That Doesn’t Jeopardize Your Professional Reputation or Patients’ Lives
   Jeff Allen, Quantitative Biomedical Research Center, UT Southwestern Medical Center
   BSCI5096 - 3.26.2013
2. Motivation
   • Anil Potti scandal at Duke
     – A genomic signature was reported that would identify the best chemotherapy based on a patient’s genes.
     – Over 100 patients were enrolled in clinical trials.
     – Gross mishandling of data and invalidating bugs in the software were later discovered.
     – Data were allegedly manipulated.
     – Watch: lecture from Keith Baggerly
3. Outline
   • Revision Control
   • Reproducibility and Replicability
   • Ensuring Code Quality
   • Resources
4. Outline
   • Revision Control
     – Introduction & Concepts
     – Git & GitHub
   • Reproducibility and Replicability
   • Ensuring Code Quality
   • Resources
5. Revision Control
   • Tracks changes to files over time
   • Keeps a complete log of all changes ever made to any file in a project
   • Supports more collaboration on projects
     – Provides an authoritative repository for the code
     – Gracefully catches and handles conflicts in files
   • Various systems in use today, including Mercurial, Git, and Subversion
6. Git
   • Modern distributed revision control system
     – “Distributed” means you have the entire history of the project on your local machine.
     – You don’t have to be online to develop.
   • Improves on the performance and usability of earlier systems
   • Open source and free
7. GitHub
   • A website that hosts Git repositories
   • You can “push” your own Git repositories to the site to gain:
     – A web interface: an easier way to view your files and track changes
     – Control over who has access to which projects
     – Project organization: hosts documentation, bug tracking, etc.
     – A social platform: the “Facebook” of coding
     – A client-side graphical user interface
8. GITHUB DEMONSTRATION
9. GitHub Client - GUI
   • Only works with GitHub
   • Much easier to use and navigate
   • Mac and Windows versions
   • On campus: open Git Shell and run:
         git config --global http.proxy http://proxy.swmed.edu:3128
10. GitHub Client
11. GITHUB CLIENT DEMO
12. Use Cases
    • “This function used to work.”
      – Look at the changes made to that file since it last worked.
    • “Please send me the code used in this publication.”
      – Revert the project back to any point in its history.
    • “I found a bug and fixed it.”
      – (Optionally) Allow others to contribute to your projects.
13. Outline
    • Revision Control
    • Reproducibility and Replicability
      – Replicability
      – Reproducibility
    • Ensuring Code Quality
    • Resources
14. “‘Replicable’ means ‘other people get exactly the same results when doing exactly the same thing’, while ‘reproducible’ means ‘something similar happens in other people’s hands.’ The latter is far stronger, in general, because it indicates that your results are not merely some quirk of your setup and may actually be right.”
    C. Titus Brown, http://ivory.idyll.org/blog/replication-i.html
15. Replicability
    • For an analysis to be replicable, another researcher must have access to:
      – The exact same code you used
      – The exact same data you used
    • Any change to your code or data relative to what you provide (including bug fixes and other corrections) will make your results irreplicable.
      – You must track these changes in a revision control system.
16. Reproducibility
    • Requires much more time and effort
    • Independently arrive at the same conclusions
      – Potentially using the same data
      – Using different techniques and parameters
    • May take as much time to reproduce results as it did to produce them the first time
    • Should be done in high-stakes (e.g. clinical) applications
17. Recommended Practices
    a. Use a revision control system such as Git, hosted for example on GitHub.
    b. To ensure replicability, clone your repository on another computer and re-run all your analysis. Ensure you get the same results.
       • This is a good test of replicability.
       • Knowing you’ll have to do this will make you write better-organized code.
    c. If it’s really important, ask a colleague to reproduce your results.
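The re-run check in practice (b) can be rehearsed on a small scale before moving to a second machine. The sketch below is only an illustration in base R: `run_analysis` is a hypothetical stand-in for a real analysis script, and the fixed seed and the saved session information are assumptions about how a project might control randomness and record its environment, not steps taken from the slides.

```r
# Hypothetical stand-in for a real analysis script; not from the slides.
run_analysis <- function(seed) {
  set.seed(seed)                     # fix the random number stream
  samples <- rnorm(100, mean = 10)   # simulated "data"
  c(mean = mean(samples), sd = sd(samples))
}

first_run  <- run_analysis(seed = 2013)
second_run <- run_analysis(seed = 2013)

# A replicable analysis gives exactly the same numbers on every run.
stopifnot(identical(first_run, second_run))

# Record the R version and loaded packages next to the results so that a
# colleague can recreate the environment. A temporary directory is used
# here; a real project would keep this file in the repository.
info_file <- file.path(tempdir(), "session_info.txt")
writeLines(capture.output(sessionInfo()), info_file)
```

If the two runs ever differ, something outside the code and the seed (an unrecorded input file, a package update) is influencing the result, which is exactly what cloning onto another computer is meant to expose.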
18. Outline
    • Revision Control
    • Reproducibility and Replicability
    • Ensuring Code Quality
      – Automated Testing
      – Code reviews
    • Resources
19. Automated Testing
    • Unit testing
      – Very specific target
      – May have multiple tests per function
    • Many unit testing frameworks
      – In R: testthat and RUnit
    Setup:
        install.packages("testthat")
        library(testthat)
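To make "very specific target" and "multiple tests per function" concrete, here is a minimal sketch using testthat. The function `celsius_to_fahrenheit` is a hypothetical example that does not appear in the slides, and the tests use `expect_equal()`, the newer testthat spelling of `expect_that(..., equals(...))`.

```r
# install.packages("testthat")  # one-time setup, as on the slide
library(testthat)

# Hypothetical function under test; not from the slides.
celsius_to_fahrenheit <- function(celsius) {
  celsius * 9 / 5 + 32
}

# Each test has one specific target; a function can have several tests.
test_that("known reference points convert correctly", {
  expect_equal(celsius_to_fahrenheit(0), 32)
  expect_equal(celsius_to_fahrenheit(100), 212)
})

test_that("vectors are converted element by element", {
  expect_equal(celsius_to_fahrenheit(c(-40, 37)), c(-40, 98.6))
})
```

Splitting the checks into two named tests means a failure report points directly at the behaviour that broke.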
20. Testing Example - Square
    Code:
        square <- function(x) {
          sq <- 0
          for (i in 1:x) {
            sq <- sq + x
          }
          return(sq)
        }
21. Testing Example - Square
    Code:
        square <- function(x) {
          sq <- 0
          for (i in 1:x) {
            sq <- sq + x
          }
          return(sq)
        }
    Tests:
        expect_that(square(3), equals(9))      # Passes
22. Testing Example - Square
    Code:
        square <- function(x) {
          sq <- 0
          for (i in 1:x) {
            sq <- sq + x
          }
          return(sq)
        }
    Tests:
        expect_that(square(3), equals(9))      # Passes
        expect_that(square(5), equals(25))     # Passes
23. Test-Driven Development (TDD)
    • If you see a bug:
      1. Write a test that fails
      2. Fix the bug
      3. Show that the test now passes
      4. Commit to revision control
25. Testing Example - Square
    Code:
        square <- function(x) {
          sq <- 0
          for (i in 1:x) {
            sq <- sq + x
          }
          return(sq)
        }
    Tests:
        expect_that(square(3), equals(9))      # Passes
        expect_that(square(5), equals(25))     # Passes
        expect_that(square(2.5), equals(6.25)) # Fails
26. Testing Example - Square
    Code:
        square <- function(x) {
          sq <- 0
          for (i in 1:x) {
            sq <- sq + x
          }
          return(sq)
        }
    Tests:
        expect_that(square(3), equals(9))      # Passes
        expect_that(square(5), equals(25))     # Passes
        expect_that(square(2.5), equals(6.25)) # Fails
        expect_that(square(-2), equals(4))     # Fails
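The two failures come from how R's `:` operator builds the loop sequence. The sketch below uses only base R and renames the slide's function to `square_loop` (a name chosen here, not in the deck) so that the sequences and the wrong results can be printed side by side.

```r
# The loop-based square() from the slides, renamed here for clarity.
square_loop <- function(x) {
  sq <- 0
  for (i in 1:x) {
    sq <- sq + x
  }
  return(sq)
}

# 1:x counts in whole steps, so it stops before a decimal bound ...
print(1:2.5)             # 1 2
print(square_loop(2.5))  # 5, not 6.25 (2.5 is added only twice)

# ... and a negative bound makes the sequence count down through zero.
print(1:-2)              # 1 0 -1 -2
print(square_loop(-2))   # -8, not 4 (-2 is added four times)
```

Repeated addition only equals multiplication when `x` is a positive whole number, which is why the first two tests pass and give a false sense of safety.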
28. Testing Example - Square
    Code:
        square <- function(x) {
          sq <- x * x
          return(sq)
        }
31. Testing Example - Square
    Code:
        square <- function(x) {
          sq <- x * x
          return(sq)
        }
    Tests:
        expect_that(square(3), equals(9))      # Passes
        expect_that(square(5), equals(25))     # Passes
        expect_that(square(2.5), equals(6.25)) # Passes
        expect_that(square(-2), equals(4))     # Passes
33. Test-Driven Development (TDD)
    • Advantages
      – Ensures that problematic areas are well tested
      – Regression testing: ensures old bugs don’t ever come back
      – Lets you confidently approach old code
      – Makes you more assured in handling someone else’s code
      – Saves you time over manual testing
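The four TDD steps and the regression-testing benefit can be traced end to end in a few lines. This is a sketch under stated assumptions: `negative_input_test` is a hypothetical base-R stand-in for a testthat expectation, used so that the failing step can be shown without stopping the script, and the names `square_buggy` and `square_fixed` are chosen here rather than taken from the slides.

```r
# Stand-in for a testthat expectation, kept in base R so that the failing
# step does not stop the script. Returns TRUE when the test passes.
negative_input_test <- function(square_fn) {
  isTRUE(all.equal(square_fn(-2), 4))
}

# Step 1: write a test that fails against the current (buggy) code.
square_buggy <- function(x) {
  sq <- 0
  for (i in 1:x) {
    sq <- sq + x
  }
  return(sq)
}
stopifnot(!negative_input_test(square_buggy))

# Step 2: fix the bug.
square_fixed <- function(x) {
  x * x
}

# Step 3: show that the test now passes.
stopifnot(negative_input_test(square_fixed))

# Step 4: commit the test together with the fix. Because the test stays in
# the suite, any later change that reintroduces the bug fails immediately.
```

Keeping the test in the repository after the fix is what turns a one-off bug report into a permanent regression check.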
34. Code Reviews
    • Get more than one set of eyes on your code
    • Lightweight
      – Email to get quick feedback
      – GitHub is great for this
    • Formal
      – Have a meeting to audit
      – Less than 500 LOC per meeting
35. Extreme – Pair Programming
    • Two programmers share a single workstation
    • Both participate, though only one can type
    • Significant learning opportunities for both
    • Can strategically pair:
      – Senior with junior: mentoring
      – Statistician with developer: mutual learning
    • Improvements in code quality compensate for the short-term efficiency loss: fewer bugs, easier code to maintain
37. Testing Example - Square
    Code:
        square <- function(x) {
          x^2
        }
    Tests:
        expect_that(square(3), equals(9))      # Passes
        expect_that(square(5), equals(25))     # Passes
        expect_that(square(2.5), equals(6.25)) # Passes
        expect_that(square(-2), equals(4))     # Passes
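Because the existing tests still pass, the refactoring to `x^2` can be accepted with confidence. One further input type that might be worth covering is a vector, since `^` works element by element in R. The extra test below is an addition for illustration and is not part of the original deck; it uses `expect_equal()`, the newer testthat spelling of `expect_that(..., equals(...))`.

```r
library(testthat)

# Final version of the function from the slides.
square <- function(x) {
  x^2
}

# Extra test, not in the original deck: vector input.
test_that("square works element-wise on vectors", {
  expect_equal(square(c(-2, 0, 2.5)), c(4, 0, 6.25))
})
```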
38. Outline
    • Revision Control
    • Reproducibility and Replicability
    • Ensuring Code Quality
    • Resources
39. Resources
    • Software Carpentry
      – www.software-carpentry.org
      – Volunteer organization focused on teaching these topics to scientific audiences
      – Contact us (Jeffrey.Allen@UTSouthwestern.edu) if you’d be interested in attending a local Boot Camp
    • GitHub Documentation
      – https://help.github.com/
      – Great documentation on how to use Git and/or GitHub
40. Resources
    • Unit Testing in R
      – http://cran.r-project.org/web/packages/RUnit/index.html
      – http://cran.r-project.org/web/packages/testthat/index.html
      – http://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf
41. Suggested Next Steps
    • Watch the lecture from Keith Baggerly
    • Register for a GitHub account (free) and explore
    • Write an R function and cover it with unit tests using the testthat framework
      – Then check it into a public GitHub repo