Avoiding Big Mistakes
in Scientific Computing
Or: How to Write Code That Doesn’t Jeopardize
Your Professional Reputation or Patients’ Lives




                                                Jeff Allen
                  Quantitative Biomedical Research Center
                          UT Southwestern Medical Center
                                    BSCI5096 - 3.26.2013
Motivation


• Anil Potti scandal at Duke
  – Genomic signature claimed to identify the best
    chemotherapy based on a patient’s genes.
  – Over 100 patients enrolled in clinical trials.
  – Gross mishandling of data and software bugs that
    invalidated the results were discovered later
  – Alleged manipulation of data
  – Watch: Lecture from Keith Baggerly
Outline




•   Revision Control
•   Reproducibility and Replicability
•   Ensuring Code Quality
•   Resources
Outline



• Revision Control
  – Introduction & Concepts
  – Git & GitHub
• Reproducibility and Replicability
• Ensuring Code Quality
• Resources
Revision Control


• Tracks changes to files over time
• Keeps a complete log of all changes ever
  made to any file in a project
• Supports collaboration on projects
  – Provides an authoritative repository for the code
  – Gracefully catches and handles conflicting edits to files
• Several systems are in use today, including
  Mercurial, Git, and Subversion
Git


• Modern distributed revision control system
  – “Distributed” means you have the entire history of
    the project on your local machine.
  – You don’t have to be online to develop.
• Improves on the performance and usability of
  earlier systems.
• Open source and free
GitHub

• A website that hosts Git repositories.
• You can “push” your own Git repositories to
  their site to gain:
  – A web interface – easier way to view your files and
    track changes
  – Control who has access to which projects
  – Project organization – hosts documentation, bug-
    tracking, etc.
  – Social platform – the “Facebook” of coding
  – Client-Side graphical user interface
GITHUB DEMONSTRATION
GitHub Client - GUI



•   Only works with GitHub.
•   Much easier to use and navigate.
•   Mac and Windows versions.
•   On campus: Need to open Git Shell and run:
    git config --global http.proxy http://proxy.swmed.edu:3128
GitHub Client
GITHUB CLIENT DEMO
Use Cases

• “This function used to work.”
  – Look at the changes made to that file since it last
    worked.
• “Please send me the code used in this
  publication.”
  – Revert the project back to any point in its history
• “I found a bug and fixed it.”
  – (Optionally) Allow others to contribute to your
    projects.
Outline



• Revision Control
• Reproducibility and Replicability
  – Replicability
  – Reproducibility
• Ensuring Code Quality
• Resources
“‘Replicable’ means ‘other people get exactly
the same results when doing exactly the same
thing’, while ‘reproducible’ means ‘something
similar happens in other people’s hands.’ The
latter is far stronger, in general, because it
indicates that your results are not merely some
quirk of your setup and may actually be right.”
                     C. TITUS BROWN
                         http://ivory.idyll.org/blog/replication-i.html
Replicability


• In order for analysis to be replicable, another
  researcher must have access to:
  – The exact same code you used
  – The exact same data you used
• Any changes (including bug-fixes and other
  corrections) in your code or data from what
  you provide will make your results irreplicable.
  – So track both code and data in a revision control system
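
One way to make "the exact same code and data" checkable is to store a small provenance record next to each result. The sketch below is illustrative only: the data file, the seed value, and the bootstrap step are hypothetical placeholders, not part of the original analysis.

```r
# Sketch: record what another researcher needs to replicate a run.
# The data file and the analysis step are hypothetical placeholders.
data_file <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:10, y = (1:10)^2), data_file, row.names = FALSE)

# Fingerprint the exact data used; any later change alters the checksum.
data_md5 <- unname(tools::md5sum(data_file))

# Fix the random seed so stochastic steps give identical output each run.
set.seed(42)
d <- read.csv(data_file)
boot_mean <- mean(sample(d$y, replace = TRUE))

# Keep the provenance together with the result.
provenance <- list(
  data_md5  = data_md5,
  seed      = 42,
  r_version = R.version.string,
  result    = boot_mean
)
str(provenance)
```

If the checksum or the R version differs on another machine, that is a first hint as to why the numbers might not match.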
Reproducibility


• Requires much more time and effort
• Independently arrive at the same conclusions
  – Potentially using the same data
  – Using different techniques and parameters
• May take as much time to reproduce results
  as it did to produce them the first time
• Should be done in high-stakes (e.g., clinical)
  applications
Recommended Practices

a. Use a revision control system such as Git
   (hosted, for example, on GitHub)
b. To ensure replicability, clone your repository
   on another computer and re-run your entire
   analysis. Ensure you get the same results.
  •   This is a good test of replicability.
  •   Knowing you’ll have to do this will make you write
      better organized code.
c. If it’s really important, ask a colleague to
   reproduce it independently.
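
The check in step (b) can be automated. This is a minimal sketch, assuming the original results were saved to a file that travels with the repository; `run_analysis()` is a hypothetical stand-in for a project's real pipeline.

```r
# Sketch: check that a fresh run matches results saved from the original run.
# run_analysis() is a hypothetical stand-in for a project's real pipeline.
run_analysis <- function() {
  set.seed(2013)                      # fixed seed keeps the run deterministic
  x <- rnorm(100)
  c(mean = mean(x), sd = sd(x))
}

# On the original machine: save the results alongside the code.
reference_file <- tempfile(fileext = ".rds")
saveRDS(run_analysis(), reference_file)

# On the fresh clone: re-run everything and compare with the saved results.
fresh <- run_analysis()
reference <- readRDS(reference_file)

# all.equal() tolerates tiny floating-point differences between machines;
# isTRUE() is needed because it returns a description when values differ.
replicated <- isTRUE(all.equal(fresh, reference))
if (!replicated) stop("Results differ from the saved reference")
```

Using `all.equal()` rather than `identical()` is a design choice: bit-for-bit equality can fail across platforms even when the analysis itself is unchanged.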
Outline



• Revision Control
• Reproducibility and Replicability
• Ensuring Code Quality
  – Automated Testing
  – Code reviews
• Resources
Automated Testing


• Unit testing
   – Very specific target
   – May have multiple tests per function
• Many unit testing frameworks
   – In R: testthat and RUnit

install.packages("testthat")
library(testthat)
Testing Example - Square

Code

square <- function(x){
  sq <- 0
  for (i in 1:x){
     sq <- sq + x
  }
  return(sq)
}
Testing Example - Square

Code

square <- function(x){
  sq <- 0
  for (i in 1:x){
    sq <- sq + x
  }
  return(sq)
}

Tests

expect_that(square(3), equals(9)) #Passes
Testing Example - Square

Code

square <- function(x){
  sq <- 0
  for (i in 1:x){
    sq <- sq + x
  }
  return(sq)
}

Tests

expect_that(square(3), equals(9)) #Passes
expect_that(square(5), equals(25)) #Passes
Test-Driven Development (TDD)



• If you see a bug:
  1.   Write a test that fails
  2.   Fix the bug
  3.   Show that the test now passes
  4.   Commit to revision control
Testing Example - Square

Code

square <- function(x){
  sq <- 0
  for (i in 1:x){
    sq <- sq + x
  }
  return(sq)
}

Tests

expect_that(square(3), equals(9)) #Passes
expect_that(square(5), equals(25)) #Passes
expect_that(square(2.5), equals(6.25)) #Fails
Testing Example - Square

Code

square <- function(x){
  sq <- 0
  for (i in 1:x){
    sq <- sq + x
  }
  return(sq)
}

Tests

expect_that(square(3), equals(9)) #Passes
expect_that(square(5), equals(25)) #Passes
expect_that(square(2.5), equals(6.25)) #Fails
expect_that(square(-2), equals(4)) #Fails
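
Why the last two tests fail comes down to how R expands `1:x`. The snippet below reproduces the loop-based function from the slide and inspects the sequences directly; nothing here goes beyond base R.

```r
# The loop-based square() from the slide, reproduced for illustration.
square <- function(x) {
  sq <- 0
  for (i in 1:x) {
    sq <- sq + x
  }
  return(sq)
}

# 1:x only means "x iterations" when x is a positive whole number.
1:2.5    # 1 2        -> two iterations, so 2.5 is added twice
1:-2     # 1 0 -1 -2  -> four iterations, so -2 is added four times

square(2.5)  # 5, not 6.25
square(-2)   # -8, not 4
```

The whole-number cases pass only because `1:3` and `1:5` happen to have exactly 3 and 5 elements.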
Testing Example - Square

Code




square <- function(x){
  sq <- x * x
  return(sq)
}
Testing Example - Square

Code

square <- function(x){
  sq <- x * x
  return(sq)
}

Tests

expect_that(square(3), equals(9)) #Passes
expect_that(square(5), equals(25)) #Passes
expect_that(square(2.5), equals(6.25)) #Passes
expect_that(square(-2), equals(4)) #Passes
Test-Driven Development (TDD)


• Advantages
  – Ensure that problematic areas are well-tested
  – Regression testing – ensure old bugs don’t ever
    come back
  – Confidently approach old code
  – More assured in handling someone else’s code
  – Saves you time over manual testing
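
As a sketch of what a regression test can look like in practice, the expectations from the Square example can be grouped into named `test_that()` blocks. This assumes a recent testthat release, where `expect_equal(a, b)` is the current spelling of `expect_that(a, equals(b))`; the test descriptions are illustrative.

```r
library(testthat)

# The corrected function from the Square example.
square <- function(x) {
  x * x
}

# One named test per behaviour; the second block pins down the old bugs
# so that they are caught immediately if they ever come back.
test_that("square handles positive whole numbers", {
  expect_equal(square(3), 9)
  expect_equal(square(5), 25)
})

test_that("square handles fractional and negative input (regression)", {
  expect_equal(square(2.5), 6.25)
  expect_equal(square(-2), 4)
})
```

In a package or project these blocks would typically live in a file under `tests/testthat/` and be re-run after every change.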
Code Reviews


• Get more than one set of eyes on your code
• Lightweight
  – Email to get quick feedback
  – GitHub is great for this
• Formal
  – Have a meeting to audit
  – Fewer than 500 lines of code (LOC) per meeting
Extreme – Pair Programming

•   Two programmers share a single workstation
•   Both participate, though only one can type
•   Significant learning opportunities for both
•   Can strategically pair:
    – Senior with Junior, mentoring
    – Statistician with Developer, mutual learning
• Improvements in code quality
  compensate for short-term efficiency loss
    – fewer bugs, easier code to maintain
Testing Example - Square

Code

square <- function(x){
  sq <- x * x
  return(sq)
}

Tests

expect_that(square(3), equals(9)) #Passes
expect_that(square(5), equals(25)) #Passes
expect_that(square(2.5), equals(6.25)) #Passes
expect_that(square(-2), equals(4)) #Passes
Testing Example - Square

Code

square <- function(x){
  x^2
}

Tests

expect_that(square(3), equals(9)) #Passes
expect_that(square(5), equals(25)) #Passes
expect_that(square(2.5), equals(6.25)) #Passes
expect_that(square(-2), equals(4)) #Passes
Outline



•   Revision Control
•   Reproducibility and Replicability
•   Ensuring Code Quality
•   Resources
Resources

• Software Carpentry
  – www.software-carpentry.org
  – Volunteer organization focused on teaching these
    topics to scientific audiences
  – Contact us (Jeffrey.Allen@UTSouthwestern.edu) if
    you’d be interested in attending a local Boot Camp
• GitHub Documentation
  – https://help.github.com/
  – Great documentation on how to use Git and/or
    GitHub
Resources



• Unit Testing in R
  – http://cran.r-project.org/web/packages/RUnit/index.html
  – http://cran.r-project.org/web/packages/testthat/index.html
  – http://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf
Suggested Next Steps



• Watch Lecture from Keith Baggerly
• Register for a GitHub account (free), explore
• Write an R function and cover it with unit tests
  using the testthat framework
  • Then push it to a public GitHub repository

Scientific Software Development

  • 1. Avoiding Big Mistakes in Scientific Computing. Or: How to Write Code That Doesn’t Jeopardize Your Professional Reputation or Patients’ Lives. Jeff Allen, Quantitative Biomedical Research Center, UT Southwestern Medical Center. BSCI5096, 3.26.2013
  • 2. Motivation • Anil Potti scandal at Duke – A genomic signature was reported that would identify the best chemotherapy based on a patient’s genes. – Over 100 patients were enrolled in clinical trials. – Gross mishandling of data and invalidating bugs in the software were discovered later. – Data manipulation was alleged. – Watch: Lecture from Keith Baggerly
  • 3. Outline • Revision Control • Reproducibility and Replicability • Ensuring Code Quality • Resources
  • 4. Outline • Revision Control – Introduction & Concepts – Git & GitHub • Reproducibility and Replicability • Ensuring Code Quality • Resources
  • 5. Revision Control • Tracks changes to files over time • Keeps a complete log of all changes ever made to any file in a project • Supports closer collaboration on projects – Provides an authoritative repository for the code – Gracefully catches and handles conflicts in files • Various systems are in use today, including Mercurial, Git, and Subversion
  • 6. Git • Modern distributed revision control system – “Distributed” means you have the entire history of the project on your local machine. – You don’t have to be online to develop. • Improves on the performance and usability of earlier systems. • Open source and free
  • 7. GitHub • A website that hosts Git repositories. • You can “push” your own Git repositories to their site to gain: – A web interface – easier way to view your files and track changes – Control who has access to which projects – Project organization – hosts documentation, bug tracking, etc. – Social platform – the “Facebook” of coding – Client-side graphical user interface
  • 9. GitHub Client - GUI • Only works with GitHub. • Much easier to use and navigate. • Mac and Windows versions. • On campus: Need to open Git Shell and run: git config --global http.proxy http://proxy.swmed.edu:3128
  • 12. Use Cases • “This function used to work.” – Look at the changes made to that file since it last worked. • “Please send me the code used in this publication.” – Revert the project back to any point in its history • “I found a bug and fixed it.” – (Optionally) Allow others to contribute to your projects.
  • 13. Outline • Revision Control • Reproducibility and Replicability – Replicability – Reproducibility • Ensuring Code Quality • Resources
  • 14. “‘Replicable’ means ‘other people get exactly the same results when doing exactly the same thing’, while ‘reproducible’ means ‘something similar happens in other people's hands.’ The latter is far stronger, in general, because it indicates that your results are not merely some quirk of your setup and may actually be right.” C. TITUS BROWN http://ivory.idyll.org/blog/replication-i.html
  • 15. Replicability • In order for analysis to be replicable, another researcher must have access to: – The exact same code you used – The exact same data you used • Any changes (including bug-fixes and other corrections) in your code or data from what you provide will make your results irreplicable. – Must track in a revision control system
  • 16. Reproducibility • Requires much more time and effort • Independently arrive at the same conclusions – Potentially using the same data – Using different techniques and parameters • May take as much time to reproduce results as it did to produce them the first time • Should be done in high-stakes (e.g. clinical) applications
  • 17. Recommended Practices a. Use a revision control system such as Git, hosted on a service such as GitHub b. To ensure replicability, clone your repository on another computer and re-run all of your analysis. Ensure you get the same results. • This is a good test of replicability. • Knowing you’ll have to do this will make you write better-organized code. c. If it’s really important, ask a colleague to reproduce your results.
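One way to make the "ensure you get the same results" step in practice (b) mechanical is to compare checksums of the output files from the original run and from the re-run on a fresh clone. The following is only a sketch: the helper name `same_results`, the temporary files, and the made-up gene scores are illustrative stand-ins, not part of the original slides.

```r
# Hypothetical helper: TRUE when two result files are byte-for-byte identical.
same_results <- function(path_a, path_b) {
  unname(tools::md5sum(path_a)) == unname(tools::md5sum(path_b))
}

# Stand-ins for the output of the original run and of the re-run on a
# fresh clone. The data here is invented purely for illustration.
results <- data.frame(gene = c("A", "B"), score = c(0.12, 0.87))
original <- tempfile(fileext = ".csv")
rerun <- tempfile(fileext = ".csv")
write.csv(results, original, row.names = FALSE)
write.csv(results, rerun, row.names = FALSE)

print(same_results(original, rerun))
```

If an analysis involves random numbers, a byte-for-byte match would additionally require fixing the seed (for example with `set.seed()`) in the analysis script.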
  • 18. Outline • Revision Control • Reproducibility and Replicability • Ensuring Code Quality – Automated Testing – Code reviews • Resources
  • 19. Automated Testing • Unit testing – Very specific target – May have multiple tests per function • Many unit testing frameworks – In R: testthat and RUnit • Getting started with testthat in R: install.packages("testthat"), then library(testthat)
  • 20. Testing Example - Square
    Code:
      square <- function(x){
        sq <- 0
        for (i in 1:x){
          sq <- sq + x
        }
        return(sq)
      }
  • 21. Testing Example - Square
    Code:
      square <- function(x){
        sq <- 0
        for (i in 1:x){
          sq <- sq + x
        }
        return(sq)
      }
    Tests:
      expect_that(square(3), equals(9)) #Passes
  • 22. Testing Example - Square
    Code:
      square <- function(x){
        sq <- 0
        for (i in 1:x){
          sq <- sq + x
        }
        return(sq)
      }
    Tests:
      expect_that(square(3), equals(9)) #Passes
      expect_that(square(5), equals(25)) #Passes
  • 23. Test-Driven Development (TDD) • If you see a bug: 1. Write a test that fails 2. Fix the bug 3. Show that the test now passes 4. Commit to revision control
  • 25. Testing Example - Square
    Code:
      square <- function(x){
        sq <- 0
        for (i in 1:x){
          sq <- sq + x
        }
        return(sq)
      }
    Tests:
      expect_that(square(3), equals(9)) #Passes
      expect_that(square(5), equals(25)) #Passes
      expect_that(square(2.5), equals(6.25)) #Fails
  • 26. Testing Example - Square
    Code:
      square <- function(x){
        sq <- 0
        for (i in 1:x){
          sq <- sq + x
        }
        return(sq)
      }
    Tests:
      expect_that(square(3), equals(9)) #Passes
      expect_that(square(5), equals(25)) #Passes
      expect_that(square(2.5), equals(6.25)) #Fails
      expect_that(square(-2), equals(4)) #Fails
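To see why the two new tests fail, it can help to look at the sequence the loop iterates over. The sketch below reuses the loop-based `square` from the slides and uses only base R; the printed values are what the `:` operator yields for non-integer and negative endpoints.

```r
# Loop-based implementation from the slides (intentionally buggy).
square <- function(x) {
  sq <- 0
  for (i in 1:x) {
    sq <- sq + x
  }
  return(sq)
}

# 1:x stops at the last whole step not beyond x, so 2.5 gives two iterations.
print(1:2.5)        # 1 2
# With a negative endpoint the sequence counts down: four iterations.
print(1:-2)         # 1 0 -1 -2

print(square(2.5))  # 5, not 6.25
print(square(-2))   # -8, not 4
```

The loop only happens to work for positive whole numbers, which is why the first two tests passed.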
  • 28. Testing Example - Square
    Code:
      square <- function(x){
        sq <- x * x
        return(sq)
      }
  • 31. Testing Example - Square
    Code:
      square <- function(x){
        sq <- x * x
        return(sq)
      }
    Tests:
      expect_that(square(3), equals(9)) #Passes
      expect_that(square(5), equals(25)) #Passes
      expect_that(square(2.5), equals(6.25)) #Passes
      expect_that(square(-2), equals(4)) #Passes
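The four expectations can also be grouped into a single named test, as the editor's notes do with `test_that()`. The sketch below assumes the testthat package is installed and uses `expect_equal()`, which newer testthat releases favor over the older `expect_that(..., equals(...))` form shown on the slides; the test description string is illustrative.

```r
library(testthat)

# Fixed implementation from the slides.
square <- function(x) {
  x * x
}

# One named test grouping the cases that exposed the original bugs.
test_that("square handles whole, fractional, and negative inputs", {
  expect_equal(square(3), 9)
  expect_equal(square(5), 25)
  expect_equal(square(2.5), 6.25)
  expect_equal(square(-2), 4)
})
```

Keeping the fractional and negative cases in the suite is what turns them into regression tests: if a later change reintroduces the loop, the test fails immediately.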
  • 33. Test-Driven Development (TDD) • Advantages – Ensures that problematic areas are well tested – Regression testing – ensures old bugs don’t ever come back – Lets you confidently approach old code – Makes you more assured in handling someone else’s code – Saves you time over manual testing
  • 34. Code Reviews • Get more than one set of eyes on your code • Lightweight – Email to get quick feedback – GitHub is great for this • Formal – Have a meeting to audit – Less than 500 LOC per meeting
  • 35. Extreme – Pair Programming • Two programmers share a single workstation • Both participate, though only one can type • Significant learning opportunities for both • Can strategically pair: – Senior with Junior, mentoring – Statistician with Developer, mutual learning • Improvements in code quality compensate for short-term efficiency loss – fewer bugs, easier code to maintain
  • 37. Testing Example - Square
    Code:
      square <- function(x){
        x^2
      }
    Tests:
      expect_that(square(3), equals(9)) #Passes
      expect_that(square(5), equals(25)) #Passes
      expect_that(square(2.5), equals(6.25)) #Passes
      expect_that(square(-2), equals(4)) #Passes
  • 38. Outline • Revision Control • Reproducibility and Replicability • Ensuring Code Quality • Resources
  • 39. Resources • Software Carpentry – www.software-carpentry.org – Volunteer organization focused on teaching these topics to scientific audiences – Contact us (Jeffrey.Allen@UTSouthwestern.edu) if you’d be interested in attending a local Boot Camp • GitHub Documentation – https://help.github.com/ – Great documentation on how to use Git and/or GitHub
  • 40. Resources • Unit Testing in R – http://cran.r-project.org/web/packages/RUnit/index.html – http://cran.r-project.org/web/packages/testthat/index.html – http://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf
  • 41. Suggested Next Steps • Watch Lecture from Keith Baggerly • Register for a GitHub account (free), explore • Write an R function and cover it with unit tests using the testthat framework • Then check it into a public GitHub repo

Editor's Notes

  1. Every good programmer I know uses revision control; most bad ones I know don’t.
  2. You can use Git without GitHub. GitHub is one of the options for hosting Git repositories.
  3. Overview, list of projects. Public v Private. Show commits. Show diff of a commit. Show comments/discussion on commit. Show tags. Show wiki – devtools - https://github.com/hadley/devtools/. Show issues. Show pull requests.
  4. Only true way to achieve replicability in a project under development is to use a revision control system
  5. Spot two problems with this function: 1. negatives 2. decimals
  6. What would our new tests look like?
  7. expect_that(square(2.5), equals(6.25))
     expect_that(square(-2), equals(4))

     square <- function(x){
       sq <- 0
       for (i in 1:x){
         sq <- sq + x
       }
       return(sq)
     }

     test_that("Square function works on various input types", {
       expect_that(square(3), equals(9))
       expect_that(square(5), equals(25))
       expect_that(square(2.5), equals(6.25))
       expect_that(square(-2), equals(4))
     })
  8. square <- function(x){
       sq <- x * x
       return(sq)
     }

     test_that("Square function works on various input types", {
       expect_that(square(3), equals(9))
       expect_that(square(5), equals(25))
       expect_that(square(2.5), equals(6.25))
       expect_that(square(-2), equals(4))
     })
  9. Lightweight: Email your code and have a peer or more experienced programmer look through the code and suggest improvements. Demo GitHub. Formal: Schedule a meeting with a handful of other programmers to audit the code you’ve written. Should be less than 500 LOC per meeting. Target around 200 LOC per hour. Selectively pick sections of code to review formally.
  10. Demo GitHub code comments
  11. square <- function(x){
        x ^ 2
      }

      test_that("Square function works on various input types", {
        expect_that(square(3), equals(9))
        expect_that(square(5), equals(25))
        expect_that(square(2.5), equals(6.25))
        expect_that(square(-2), equals(4))
      })