Software Carpentry @ Arizona!
Instructors
• Titus Brown
• Karen Cranston
• Rich Enbody

• Deren, Chas, Katie, Nirav
What do scientists care about?
1. Correctness
2. Reproducibility and provenance
3. Efficiency
What do scientists actually care about?
1. Efficiency
2. Correctness
3. Reproducibility and provenance
Our concern
• As we become more reliant on computational
  inference, does more of our science become wrong?
• “Big Data” increasingly requires sophisticated
  computational pipelines…
• We know that simple computational errors have gone
  undetected for many years
   – a sign error => retraction of 3 Science, 1 Nature, and 1 PNAS papers
   – rejection of grants and publications!
   http://boscoh.com/protein/a-sign-a-flipped-structure-and-a-scientific-flameout-of-epic-proportions
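To make the sign-error concern concrete, here is a minimal sketch of how a small automated check could flag this kind of mistake. It is a hypothetical illustration, not the code involved in the retractions: the function names, the made-up conversion step, and the test points are all assumptions chosen for the example. The idea is that flipping the sign of one coordinate mirrors a structure, which reverses the sign of a signed volume, so a one-line check can catch it.

```python
# Hypothetical illustration only; none of these names come from the
# software involved in the retractions mentioned above.

def signed_volume(a, b, c, d):
    """Six times the signed volume of the tetrahedron a, b, c, d.

    The sign encodes handedness: mirroring the points flips it.
    """
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    w = [d[i] - a[i] for i in range(3)]
    # Scalar triple product u . (v x w), expanded along u.
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))


def convert_buggy(points):
    """A made-up processing step with a sign error in the z column."""
    return [(x, y, -z) for (x, y, z) in points]


def convert_fixed(points):
    """The same made-up step without the sign error."""
    return [(x, y, z) for (x, y, z) in points]


def preserves_handedness(convert, points):
    """True if the conversion keeps the sign of the signed volume."""
    before = signed_volume(*points)
    after = signed_volume(*convert(points))
    return before * after > 0


# Four points forming a right-handed reference tetrahedron.
POINTS = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]

if __name__ == "__main__":
    print("fixed step preserves handedness:",
          preserves_handedness(convert_fixed, POINTS))
    print("buggy step preserves handedness:",
          preserves_handedness(convert_buggy, POINTS))
```

Running the check on every change to a pipeline is cheap, and it turns a silent error into a visible failure.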
Our central thesis
With only a little bit of training and effort,
• computational scientists can become more
  efficient and effective at getting their work
  done,
• while considerably improving the correctness and
  reproducibility of their code.
Automation
Why Python, and not R?
In my opinion,

• Python is a more general purpose language, while R is
  mostly about data analysis.

• Everyone will need to learn multiple languages; R and
  Python are pretty dominant in bio right now.

• Luckily, once you get the hang of it, new languages are not
  so difficult to pick up.

• Ultimately, we’re trying to teach process, not details.
Administrivia
• Asking for help

• Using the Web site

• Sticky notes: ok? Not ok?

• Minute cards: at the end of every session, write
  down
     • One thing you learned
     • One thing you are confused about

2013 arizona-swc
