• Share
  • Email
  • Embed
  • Like
  • Save
  • Private Content
Dark Matter, Public Health, and Scientific Computing
 

Dark Matter, Public Health, and Scientific Computing

on

  • 1,198 views

Talk given at the 8th IEEE International Conference on eScience in Chicago on October 10, 2012.

Talk given at the 8th IEEE International Conference on eScience in Chicago on October 10, 2012.

Statistics

Views

Total Views
1,198
Views on SlideShare
1,197
Embed Views
1

Actions

Likes
1
Downloads
8
Comments
0

1 Embed 1

https://si0.twimg.com 1

Accessibility

Categories

Upload Details

Uploaded via as OpenOffice

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

    Dark Matter, Public Health, and Scientific Computing Dark Matter, Public Health, and Scientific Computing Presentation Transcript

    • Dark Matter,Public Health, and Scientific Computing Greg Wilson
    • “Dark Matter Developers”Scott Hanselman (March 2012)[We] hypothesize that there is another kindof developer than the ones we meet all thetime. We call them Dark Matter Developers.They dont read a lot of blogs, they never write blogs,they dont go to user groups, they dont tweet orfacebook, and you dont often see them at largeconferences... [They] arent chasing the latest betaor pushing...limits, they are just producing.http://www.hanselman.com/blog/DarkMatterDevelopersTheUnseen99.aspx 2
    • Im Not That Optimistic● 95% of scientists have their heads down – Doing science instead of talking about using GPU clouds to personalize collaborative management of reproducible peta-scale workflows● Not because they dont want to do all that Star Trek stuff● But because e-science is out of reach 3
    • Shiny Toys 4
    • Grimy Reality 5
    • So Here We AreYou Them 6
    • Surely Youre Exaggerating1. How many graduate students write shell scripts to analyze each new data set instead of running those analyses by hand? 7
    • Surely Youre Exaggerating2. How many of them use version control to keep track of their work and collaborate with colleagues? 8
    • Surely Youre Exaggerating3. How many routinely break large problems into pieces small enough to be - comprehensible, - testable, and - reusable? 9
    • Surely Youre Exaggerating3. How many routinely break large problems into pieces small enough to be - comprehensible, - testable, and - reusable? And how many know those are the same things? 10
    • “Not My Problem” Actually your biggest problemIf someone is chronically malnourished as achild, giving them a CT scanner when theyre 20 has no effect on their wellbeing. 11
    • Their Biggest Problem Too Them Us “I dont know much “Lets talk about about this, but Id brain scans.like some clean water.” Theyre cool.” 12
    • Weve Left the Majority Behind 13
    • Weve Left the Majority Behind Other than Googling for things, the majority of scientists do not use computers more effectively today than they did 28 years ago. 14
    • The GoalpostsA scientist is computationally competent if shecan build, use, validate, and share software to:● Manage and process data● Tell if its been processed correctly● Find and fix problems when it hasnt been● Keep track of what shes done● Share work with others● Do all of these things efficiently 15
    • A Drivers License Exam● Developed with the Software Sustainability Institute for the DiRAC Consortium● Formative assessment – Do you know what you need to know in order to get the most out of this gear?● Pencils ready? Note: actual exam allows for several different programming languages, version control systems, etc. 16
    • A Drivers License Exam1. Check out a working copy of the exam2. materials from Subversion.3.4.5.6.7. Could do it easily 1.0 Could struggle through 0.5 Wouldnt know where to start 0.0 Dont understand the question -1.0 17
    • A Drivers License Exam1. Use find and grep in a pipe to create a2. list of all .dat files in the working copy,3. redirect the output to a file, and add that4. file to the repository.5.6.7. Could do it easily 1.0 Could struggle through 0.5 Wouldnt know where to start 0.0 Dont understand the question -1.0 18
    • A Drivers License Exam1. Write a shell script that runs a legacy2. program for each parameter in a set.3.4.5.6.7. Could do it easily 1.0 Could struggle through 0.5 Wouldnt know where to start 0.0 Dont understand the question -1.0 19
    • A Drivers License Exam1. Edit the Makefile provided so that if any2. .dat file changes, analyze.py is re-run to3. create the corresponding .out file.4.5.6.7. Could do it easily 1.0 Could struggle through 0.5 Wouldnt know where to start 0.0 Dont understand the question -1.0 20
    • A Drivers License Exam1. Write four unit tests to exercise a function2. that calculates running sums. Explain why3. your four tests are most likely to uncover4. bugs in the function.5.6.7. Could do it easily 1.0 Could struggle through 0.5 Wouldnt know where to start 0.0 Dont understand the question -1.0 21
    • A Drivers License Exam1. Explain when and how a function could2. pass your tests, but still fail on real data.3.4.5.6.7. Could do it easily 1.0 Could struggle through 0.5 Wouldnt know where to start 0.0 Dont understand the question -1.0 22
    • A Drivers License Exam1. Do a code review of the legacy program2. from Q3 (about 50 lines long) and list3. the four most important improvements4. you would make.5.6.7. Could do it easily 1.0 Could struggle through 0.5 Wouldnt know where to start 0.0 Dont understand the question -1.0 23
    • How Are We Doing?a) How did you do? 24
    • How Are We Doing?a) How did you do?b) How do you think most scientists do? 25
    • How Are We Doing?a) How did you do?b) How do you think most scientists do?c) Do you think a scientist could use a peta-scale GPU provenance cloud without these skills? 26
    • How Are We Doing?a) How did you do?b) How do you think most scientists do?c) Do you think a scientist could use debug a peta-scale GPU provenance cloud without these skills? 27
    • How Are We Doing?a) How did you do?b) How do you think most scientists do?c) Do you think a scientist could use debug a peta-scale GPU provenance extend cloud without these skills? 28
    • So Why Is This Your Problem?If youre only helping the small minoritylucky enough to have acquired the skillsthat mastering your shiny toy depends on...your potential user base is many timessmaller than it could be. 29
    • It Is Therefore Obvious That...● Put more computing courses in the curriculum! – Except the curriculum is already full 30
    • It Is Therefore Obvious That...● Put a little computing in every course! – Still adds up: 5 minutes/lecture = 4 courses/degree – First thing cut when running late – The blind leading the blind 31
    • It Is Therefore Obvious That...● Move all our teaching online! – Where do I start? – Signal vs. noise – “Its not a MOOC, its a MESS” Everyone who understands something correctly understands it the same way. Everyone who misunderstands something misunderstands differently. 32
    • What Has Worked● Target graduate students 33
    • What Has Worked● Target graduate students● Intensive short courses (2 days to 2 weeks) 34
    • What Has Worked● Target graduate students● Intensive short courses (2 days to 2 weeks)● Focus on practical skills 35
    • What Has Worked● Target graduate students● Intensive short courses (2 days to 2 weeks)● Focus on practical skills● Follow up with 6-8 weeks of once-a-week online tutorials and help – Still tuning this part 36
    • 37
    • What We Teach● Unix shell● Python● Version control● Testing● Matrix programming, image processing, SQL, ... 38
    • What We Teach● Unix shell● Python● Version control● Testing● Matrix programming, image processing, SQL, ... 39
    • “Thats Merely Useful”● None of this is publishable any longer – Which means its ineffective career-wise for academic computer scientists 40
    • “Thats Merely Useful”● None of this is publishable any longer – Which means its ineffective career-wise for academic computer scientists● But it works – Two independent assessments have validated the impact of what we do for 1/3-2/3 of learners – Thats at least a six-fold increase in the number of scientists who can get work done without karōshi 41
    • Majestic EqualityAnatole France (1844-1924)“The law, in its majestic equality, forbidsthe rich and poor alike to sleep underbridges, to beg in the streets, and tosteal bread.” 42
    • Majestic EqualityAnatole France (1844-1924)“The law, in its majestic equality, forbidsthe rich and poor alike to sleep underbridges, to beg in the streets, and tosteal bread.”Thanks to modern computers,every scientist can now devote her working lifeto installation and configuration issues. 43
    • If you build a man a fire, youll keep him warm for a night. If you set a man on fire,youll keep him warm for the rest of his life. — Terry Pratchett 44
    • Host a Workshop 45
    • Teach a Workshop● All our materials are CC-BY licensed● We will help train you 46
    • Shine Some Light● Include a few lines about your software stack and working practices in your scientific papers● Ask about them when reviewing – Wheres the repository containing the code? – Whats the coverage of your unit tests? – How did you track provenance? 47
    • Be the Change You Want to See Would you rather increase the productivity of the top 5% of scientists ten-fold? Or double the productivity of the other 95%? 48
    • Be the Change You Want to See Would you rather increase the productivity of the top 5% of scientists ten-fold? Or double the productivity of the other 95%? 49
    • Be the Change You Want to See Would you rather see your best ideas left on the shelf because theyre out of reach? Or raise a generation who can all do the things we think are exciting? 50
    • http://software-carpentry.org 51