Software Carpentry and the Hydrological Sciences

Aron Ahmadia
Coastal and Hydraulics Lab
US Army ERDC

This Talk is a Fork

Dark Matter, Public Health, and Scientific Computing
Greg Wilson
http://www.slideshare.net/gvwilson/dark-matter-publichealth-and-scientific-computing

“Dark Matter Developers”
Scott Hanselman (March 2012)

[We] hypothesize that there is another kind
of developer than the ones we meet all the
time. We call them Dark Matter Developers.
They don't read a lot of blogs, they never write blogs,
they don't go to user groups, they don't tweet or
facebook, and you don't often see them at large
conferences... [They] aren't chasing the latest beta
or pushing...limits, they are just producing.

http://www.hanselman.com/blog/DarkMatterDevelopersTheUnseen99.aspx

We’re Not That Optimistic
• 90% of hydrologists have their heads down
  - Doing hydrology instead of talking about using Charm++ to asynchronously distribute their CUDA-accelerated heterogeneous N-body Universe simulations
• Not because they don't want to do all that Star Trek stuff
• But because the majority of the field isn’t there yet.

Shiny Toys

Grimy Reality

So Here We Are
Us
Them

Surely You're Exaggerating
1. How many graduate students write shell scripts to analyze each new data set instead of running those analyses by hand?
2. How many of them use version control to keep track of their work and collaborate with colleagues?
3. How many routinely break large problems into pieces small enough to be
   - comprehensible,
   - testable, and
   - reusable?
   And how many know those are the same things?

We've Left the Majority Behind
Other than Googling for things, the majority of scientists do not use computers more effectively today than they did 28 years ago.

“Not My Problem”
Actually your biggest problem
• Application: Yesterday’s models are solving today’s million-dollar problems, because nobody knows how to update them.
• Collaboration: Hundreds of person-years of effort are redundantly combining to create a year’s worth of results.
• Credibility: Many published results can’t be trusted, because the scientific code isn’t modular, can’t be tested, and sometimes isn’t even released.

And Hydrology Loses

Your code will outlive your supercomputer. Most successful supercomputers last 5 years. The most successful codes may reach 5 decades. There are currently 40-year-old modeling codes still being used.

Jeff Hammond
Argonne Leadership Computing Facility

Where Are Your Goalposts?
A hydrologist is computationally competent if she can build, use, validate, and share software to:

• Manage software and data
• Tell if it's working correctly
• Find and fix problems when it isn’t
• Keep track of what she’s done
• Share work with others
• Do all of these things efficiently

A Driver's License Exam
• Developed with the Software Sustainability Institute for the DiRAC Consortium
• Formative assessment
  - Do you know what you need to know in order to get the most out of this gear?
• Pencils ready?

Note: actual exam allows for several different programming languages, version control systems, etc.

A Driver's License Exam
1. Check out a working copy of the exam materials from a git repository.

Could do it easily: 1
Could struggle through: 0.5
Wouldn’t know where to start: 0
Don’t understand the question: -1

A Driver's License Exam
2. Use find and grep in a pipe to create a list of all .dat files in the working copy, redirect the output to a file, and add that file to the repository.

A Driver's License Exam
3. Write a shell script that runs a legacy program for each parameter in a set.

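One way to approach Question 3, sketched in Python rather than shell since the exam allows several languages: loop over the parameter set and run the program once per value. The legacy program is not shown in these slides, so `LEGACY_COMMAND` below is a hypothetical stand-in (a one-line Python command that squares its argument), and `run_for_each` is an illustrative name.

```python
import subprocess
import sys

# Hypothetical stand-in for the legacy program: a one-line Python command
# that squares its single command-line argument. Replace it with the real
# executable and its real arguments.
LEGACY_COMMAND = [sys.executable, "-c", "import sys; print(float(sys.argv[1]) ** 2)"]


def run_for_each(parameters, command=LEGACY_COMMAND):
    """Run the command once per parameter and collect its standard output."""
    results = {}
    for value in parameters:
        completed = subprocess.run(
            command + [str(value)],
            capture_output=True,
            text=True,
            check=True,  # raise if the program exits with a non-zero status
        )
        results[value] = completed.stdout.strip()
    return results
```

Calling `run_for_each([0.5, 1.0, 2.0])` returns a dictionary that maps each parameter to whatever the program printed for it. Passing `check=True` means a failing run stops the loop instead of silently producing an incomplete set of results.
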
A Driver's License Exam
4. Edit the Makefile provided so that if any .dat file changes, analyze.py is re-run to create the corresponding .out file.

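The rule Question 4 asks for rests on one check that make performs: a target is rebuilt when it is missing or older than the file it depends on. The sketch below is not a Makefile; it only illustrates that check in Python, assuming each `name.dat` maps to `name.out` in the same directory. `is_stale` and `stale_outputs` are hypothetical names, and the command line of `analyze.py` is not given in the slides, so the sketch stops at listing what would need to be re-run.

```python
from pathlib import Path


def is_stale(source, target):
    """Return True if target is missing or older than source."""
    source, target = Path(source), Path(target)
    if not target.exists():
        return True
    return source.stat().st_mtime > target.stat().st_mtime


def stale_outputs(directory):
    """List (dat, out) path pairs whose .out file needs to be regenerated."""
    pairs = []
    for dat in sorted(Path(directory).glob("*.dat")):
        out = dat.with_suffix(".out")  # assumed naming: name.dat -> name.out
        if is_stale(dat, out):
            pairs.append((dat, out))
    return pairs
```

Comparing modification times is what lets a build tool skip work that is already up to date, which matters when each analysis run is slow.
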
A Driver's License Exam
5. Write four unit tests to exercise a function that calculates running sums. Explain why your four tests are most likely to uncover bugs in the function.

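A minimal sketch of what an answer to Question 5 might look like, in Python. The function `running_sum` and the choice of tests are assumptions; the exam's actual function is not shown in these slides. The tests are written as plain functions that a runner such as pytest would collect.

```python
def running_sum(values):
    """Return the list of cumulative sums of values."""
    totals = []
    total = 0
    for value in values:
        total += value
        totals.append(total)
    return totals


def test_empty_input():
    # Boundary case: nothing to sum, so nothing should come back.
    assert running_sum([]) == []


def test_single_value():
    # Boundary case: the loop body runs exactly once.
    assert running_sum([5]) == [5]


def test_several_values():
    # Typical case: catches off-by-one errors and a forgotten accumulation.
    assert running_sum([1, 2, 3, 4]) == [1, 3, 6, 10]


def test_negative_values():
    # Sign handling: totals can fall as well as rise.
    assert running_sum([3, -5, 2]) == [3, -2, 0]
```

The reasoning behind this particular set: two tests probe the boundaries where loops most often go wrong, one checks ordinary behaviour, and one checks an input class (negative numbers) that a careless implementation might mishandle.
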
A Driver's License Exam
6. Explain when and how a function could pass your tests, but still fail on real data.

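One hedged illustration for Question 6: tests tend to use small, clean inputs, while real records contain values the tests never tried. The sketch assumes a `running_sum` function like the one Question 5 describes and a made-up record in which a missing reading is stored as NaN; neither comes from the exam materials.

```python
import math


def running_sum(values):
    """Return the list of cumulative sums of values."""
    totals = []
    total = 0.0
    for value in values:
        total += value
        totals.append(total)
    return totals


# A test on small, clean integers passes.
assert running_sum([1, 2, 3]) == [1.0, 3.0, 6.0]

# Hypothetical record with one missing reading stored as NaN.
record = [1.5, float("nan"), 2.0, 0.5]
totals = running_sum(record)

# The first total is fine, but every total from the gap onward is NaN.
print(totals[0], [math.isnan(t) for t in totals[1:]])

# Decimal fractions also accumulate rounding error: ten additions of 0.1
# do not give exactly 1.0, so an exact-equality check would fail.
decimal_total = running_sum([0.1] * 10)[-1]
print(decimal_total == 1.0, math.isclose(decimal_total, 1.0))
```

Neither failure is a bug in the arithmetic itself. Both come from properties of the data that the tests did not represent, which is the point the question is probing.
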
A Driver's License Exam
7. Do a code review of the legacy program from Q3 (about 50 lines long) and list the four most important improvements you would make.

How Are We Doing?
a) How well did you do?
b) How well do you think most hydrologists would do?
c) Do you think a hydrologist could use your software without having these skills?
d) Or a grasp of the principles behind them?

What Has Worked
• Target graduate students
• Intensive short courses (2 days to 2 weeks)
• Focus on practical skills

http://software-carpentry.org

What We Teach
• Unix shell
  how to automate repetitive tasks
• Git and GitHub
  how to use distributed version control to manage, collaborate, and share work
• Python or R
  how to grow a program in a structured, modular, testable, reusable way
• Databases
  the difference between structured and unstructured data

What We Teach
• Unix shell
• Python
• Version control
• Testing
• Array computing, image processing, SQL, ...

“That's Merely Useful”
• None of this is publishable any longer
  - Which means it's ineffective career-wise for academic computer scientists
• But it works
  - Two independent assessments have validated the impact of what we do

Host a Workshop

Teach a Workshop
• All our materials are CC-BY licensed
• We will help train you

Shine Some Light
• Include a few lines about your software stack and working practices in your scientific papers
• Ask about them when reviewing
  - Where's the repository containing the code?
  - What's the coverage of your unit tests?
  - How did you track provenance?

Links!
Find this talk on Twitter: @ahmadia

Me:
http://aron.ahmadia.net

Software Carpentry:
http://software-carpentry.org

US Army Engineer Research and Development Center:
http://www.erdc.usace.army.mil/

Software Carpentry and the Hydrological Sciences @ AGU 2013

  • 1. Software Carpentry and the Hydrological Sciences ! Aron Ahmadia Coastal and Hydraulics Lab US Army ERDC
  • 3. “Dark Matter Developers” Scott Hanselman (March 2012) ! [We] hypothesize that there is another kind of developer than the ones we meet all the time. We call them Dark Matter Developers. They don't read a lot of blogs, they never write blogs, they don't go to user groups, they don't tweet or facebook, and you don't often see them at large conferences... [They] aren't chasing the latest beta or pushing...limits, they are just producing. ! ! http://www.hanselman.com/blog/DarkMatterDevelopersTheUnseen99.aspx !3
  • 4. We’re Not That Optimistic l 90% of hydrologists have their heads down − l l Doing hydrology instead of talking about using Charm++ to asynchronously distribute their CUDAaccelerated heterogeneous N-body Universe simulations Not because they don't want to do all that Star Trek stuff But because the majority of the field isn’t there yet. !4
  • 7. So Here We Are: Us / Them
  • 8. Surely You're Exaggerating. 1. How many graduate students write shell scripts to analyze each new data set instead of running those analyses by hand?
  • 9. Surely You're Exaggerating. 2. How many of them use version control to keep track of their work and collaborate with colleagues?
  • 10. Surely You're Exaggerating. 3. How many routinely break large problems into pieces small enough to be comprehensible, testable, and reusable?
  • 11. Surely You're Exaggerating. 3. How many routinely break large problems into pieces small enough to be comprehensible, testable, and reusable? And how many know those are the same things?
  • 12. We've Left the Majority Behind
  • 13. We've Left the Majority Behind. Other than Googling for things, the majority of scientists do not use computers more effectively today than they did 28 years ago.
  • 14. “Not My Problem” (actually your biggest problem). Application: yesterday’s models are solving today’s million-dollar problems, because nobody knows how to update them. Collaboration: hundreds of person-years of effort are redundantly combined to create a year’s worth of results. Credibility: many published results can’t be trusted, because the scientific code isn’t modular, can’t be tested, and sometimes isn’t even released. And hydrology loses.
  • 15. Jeff Hammond (Argonne Leadership Computing Facility): Your code will outlive your supercomputer. Most successful supercomputers last 5 years. The most successful codes may reach 5 decades. There are currently 40-year-old modeling codes still being used.
  • 16. Where Are Your Goalposts? A hydrologist is computationally competent if she can build, use, validate, and share software to: manage software and data; tell if it's working correctly; find and fix problems when it isn’t; keep track of what she’s done; share work with others; do all of these things efficiently.
  • 17. A Driver's License Exam. Developed with the Software Sustainability Institute for the DiRAC Consortium. Formative assessment: do you know what you need to know in order to get the most out of this gear? Pencils ready? Note: actual exam allows for several different programming languages, version control systems, etc.
  • 18. A Driver's License Exam, question 1 of 7: Check out a working copy of the exam materials from a git repository. Scoring: could do it easily (1), could struggle through (0.5), wouldn’t know where to start (0), don’t understand the question (-1).
  • 19. A Driver's License Exam, question 2 of 7: Use find and grep in a pipe to create a list of all .dat files in the working copy, redirect the output to a file, and add that file to the repository. Scoring: could do it easily (1), could struggle through (0.5), wouldn’t know where to start (0), don’t understand the question (-1).
  • 20. A Driver's License Exam, question 3 of 7: Write a shell script that runs a legacy program for each parameter in a set. Scoring: could do it easily (1), could struggle through (0.5), wouldn’t know where to start (0), don’t understand the question (-1).
  • 21. A Driver's License Exam, question 4 of 7: Edit the Makefile provided so that if any .dat file changes, analyze.py is re-run to create the corresponding .out file. Scoring: could do it easily (1), could struggle through (0.5), wouldn’t know where to start (0), don’t understand the question (-1).
  • 22. A Driver's License Exam, question 5 of 7: Write four unit tests to exercise a function that calculates running sums. Explain why your four tests are most likely to uncover bugs in the function. Scoring: could do it easily (1), could struggle through (0.5), wouldn’t know where to start (0), don’t understand the question (-1).
  • 23. A Driver's License Exam, question 6 of 7: Explain when and how a function could pass your tests, but still fail on real data. Scoring: could do it easily (1), could struggle through (0.5), wouldn’t know where to start (0), don’t understand the question (-1).
  • 24. A Driver's License Exam, question 7 of 7: Do a code review of the legacy program from Q3 (about 50 lines long) and list the four most important improvements you would make. Scoring: could do it easily (1), could struggle through (0.5), wouldn’t know where to start (0), don’t understand the question (-1).
  • 25. How Are We Doing? a) How well did you do?
  • 26. How Are We Doing? a) How well did you do? b) How well do you think most hydrologists would do?
  • 27. How Are We Doing? a) How well did you do? b) How well do you think most hydrologists would do? c) Do you think a hydrologist could use your software without having these skills? d) Or a grasp of the principles behind them?
  • 28. What Has Worked: target graduate students; intensive short courses (2 days to 2 weeks); focus on practical skills.
  • 29. What Has Worked: target graduate students; intensive short courses (2 days to 2 weeks); focus on practical skills. http://software-carpentry.org
  • 31. What We Teach. Unix shell: how to automate repetitive tasks. Git and GitHub: how to use distributed version control to manage, collaborate and share work. Python or R: how to grow a program in a structured, modular, testable, reusable way. Databases: the difference between structured and unstructured data.
  • 32. What We Teach: Unix shell; Python; version control; testing; array computing, image processing, SQL, ...
  • 33. “That's Merely Useful”. None of this is publishable any longer, which means it's ineffective career-wise for academic computer scientists.
  • 34. “That's Merely Useful”. None of this is publishable any longer, which means it's ineffective career-wise for academic computer scientists. But it works: two independent assessments have validated the impact of what we do.
  • 36. Teach a Workshop. All our materials are CC-BY licensed. We will help train you.
  • 37. Shine Some Light. Include a few lines about your software stack and working practices in your scientific papers. Ask about them when reviewing: Where's the repository containing the code? What's the coverage of your unit tests? How did you track provenance?
  • 38. Links! Find this talk on Twitter: @ahmadia; Me: http://aron.ahmadia.net; Software Carpentry: http://software-carpentry.org; US Army Engineer Research and Development Center: http://www.erdc.usace.army.mil/
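
As an illustration of the exam item on slide 22 (four unit tests for a function that calculates running sums), here is a minimal sketch in Python. The function name `running_sum`, its list-in/list-out interface, and the choice of tests are assumptions made for this example; the slides do not show the actual exam code or a model answer.

```python
from itertools import accumulate


def running_sum(values):
    """Hypothetical function under test: return the cumulative sums of `values`."""
    return list(accumulate(values))


# Each test targets a place where running-sum bugs tend to hide.

def test_empty_input():
    # Boundary case: no data should give no sums, not an error.
    assert running_sum([]) == []


def test_single_value():
    # Catches off-by-one mistakes in how the accumulator is started.
    assert running_sum([5]) == [5]


def test_mixed_signs():
    # Negative values catch code that assumes the sums only ever grow.
    assert running_sum([1, -2, 3]) == [1, -1, 2]


def test_input_not_modified():
    # Catches implementations that accumulate in place and overwrite the data.
    data = [1, 2, 3]
    running_sum(data)
    assert data == [1, 2, 3]
```

A test runner such as pytest would collect and run the `test_` functions. The follow-up item on slide 23 applies directly: all four tests pass, yet on real data the sketch can still mislead. For example, the last element of `running_sum([0.1] * 10)` is not exactly `1.0` because of floating-point rounding, and a missing value encoded as NaN would contaminate every later sum.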