Democratizing data skills to
advance data-driven discovery
Tracy K. Teal, PhD
@tracykteal, @datacarpentry
https://github.com/tracykteal/datacarpentry-presentations
Our increasing capacity to collect data is
changing how we can look at the world
Software and tools allow us to turn
data into information.
People turn information into
knowledge.
We can unleash the potential of data
by empowering people
It’s challenging to manage &
analyze this data
Researchers view the major limiting factor in
research progress as a lack of expertise in how
to handle and analyze data
http://braembl.org.au/news/braembl-community-survey-report-2013
Current unmet needs
Barone L, Williams J and Micklos D. Unmet Needs for Analyzing Biological Big Data:
A Survey of 704 NSF Principal Investigators (2017)
How do we scale data skills along with
data production?
Build local communities and capacity
Software and Data
Carpentry
Building community teaching
universal data literacy
Grassroots training effort
- Developed by practitioners for practitioners
- Identify skills and best practices needed in software
development, data management and analysis in given
domains
- Collaboratively and iteratively developed openly licensed
(CC-BY) training materials (all available on github)
- Organize and deliver two-day, intensive hands-on
workshops in fundamental data analysis skills using a pool
of volunteer helpers and instructors
Software and Data
Carpentry
• Core skills for effective research computing
• Two-day hands-on workshops
• Collaboratively developed, openly licensed
lesson materials
• Over 800 trained volunteer instructors on 6
continents
Software & Data Carpentry workshops
We know we can’t teach everything in two days,
but the goal is to teach foundational skills to
reduce the activation energy for getting started
and for people to know what’s possible.
Workshop goals
• Primary goal is increased confidence in a learners ability to
do computational work and continue to learn more.
• Lower the activation barrier for people to get started
working with data. Often people don’t know where to start.
• The skills we teach should be ones they can immediately
apply to their research.
• People should learn the things we’re teaching.
• We should see a shift in perspective in the value of skills
like scripting for better research and reproducible research.
• People should have a positive workshop experience that
empowers them to do better work with data.
Hands-on intensive workshops
• Two days
• Hands-on
• Qualified instructors
• Helpers
• Post-it notes!
• Friendly learning environment
Collaboratively developed curriculum
• Focused on data - teaches how to manage and analyze data
in an effective and reproducible way.
• Initial focus is on workshops for novices - there are no
prerequisites, and no prior knowledge computational
experience is assumed.
• Domain specific by design – currently have lessons in
ecology, in genomics developed with iPlant, in geospatial
data developed with NEON, and on APIs with rOpenSci
Curriculum
• Project and data organization
• Data cleaning and quality control
• Data analysis and visualization in R or Python
Domain dependent
• Introduction to cloud computing
• Command line
• Text mining
Curriculum
• Command line
• Software development best practices in R, Python or
MATLAB
• Github for version control
• Focused on better scientific programming
practices, for writing software or analysis
scripts
• Novice and more advanced levels
• Domain agnostic
Volunteer instructors
The Software Carpentry Foundation runs the Instructor Training program that
teaches volunteers teaching pedagogy and specific strategies for teaching
computational skills in a workshop format.
Training is a missing piece between
data collection & data-driven discovery
Training
If you want to go fast, go alone.
If you want to go far, go together.
Get Involved
 Request a workshop
 Be a helper at a workshop
 Become a partner or affiliate and train
local instructors
 Contribute to lesson development
Guiding the Carpentries
Software Carpentry Steering Committee:
Karin Lagenstrom (Norway)
Rayna Harris (UT Austin)
Susan McClatchy (Jackson Labs)
Kate Hertweck (UT Tyler)
Christina Koch (University of Wisconsin)
Mateusz Kuzak (eScience Netherlands)
Belinda Weaver (Australia)
Data Carpentry Steering Committee:
Karen Cranston (Duke / OpenTree of Life)
Hilmar Lapp (Duke)
Aleksandra Pawlik (New Zealand eScience)
Karthik Ram (rOpenSci / Berkeley Institute of Data Science Fellow)
Ethan White (University of Florida / Moore DDD Investigator)
Support
Data Carpentry
Software Carpentry
http://software-carpentry.org/scf/partners/
Jason Williams at iPlant

Data carpentry replicathon_2017-03-24

  • 1.
    Democratizing data skillsto advance data-driven discovery Tracy K. Teal, PhD @tracykteal, @datacarpentry https://github.com/tracykteal/datacarpentry-presentations
  • 2.
    Our increasing capacityto collect data is changing how we can look at the world
  • 3.
    Software and toolsallow us to turn data into information.
  • 4.
    People turn informationinto knowledge.
  • 5.
    We can unleashthe potential of data by empowering people
  • 6.
    It’s challenging tomanage & analyze this data
  • 7.
    Researchers view themajor limiting factor in research progress as a lack of expertise in how to handle and analyze data http://braembl.org.au/news/braembl-community-survey-report-2013
  • 8.
    Current unmet needs BaroneL, Williams J and Micklos D. Unmet Needs for Analyzing Biological Big Data: A Survey of 704 NSF Principal Investigators (2017)
  • 9.
    How do wescale data skills along with data production?
  • 10.
  • 11.
    Software and Data Carpentry Buildingcommunity teaching universal data literacy
  • 12.
    Grassroots training effort -Developed by practitioners for practitioners - Identify skills and best practices needed in software development, data management and analysis in given domains - Collaboratively and iteratively developed openly licensed (CC-BY) training materials (all available on github) - Organize and deliver two-day, intensive hands-on workshops in fundamental data analysis skills using a pool of volunteer helpers and instructors
  • 13.
    Software and Data Carpentry •Core skills for effective research computing • Two-day hands-on workshops • Collaboratively developed, openly licensed lesson materials • Over 800 trained volunteer instructors on 6 continents
  • 14.
    Software & DataCarpentry workshops We know we can’t teach everything in two days, but the goal is to teach foundational skills to reduce the activation energy for getting started and for people to know what’s possible.
  • 15.
    Workshop goals • Primarygoal is increased confidence in a learners ability to do computational work and continue to learn more. • Lower the activation barrier for people to get started working with data. Often people don’t know where to start. • The skills we teach should be ones they can immediately apply to their research. • People should learn the things we’re teaching. • We should see a shift in perspective in the value of skills like scripting for better research and reproducible research. • People should have a positive workshop experience that empowers them to do better work with data.
  • 17.
    Hands-on intensive workshops •Two days • Hands-on • Qualified instructors • Helpers • Post-it notes! • Friendly learning environment
  • 18.
  • 19.
    • Focused ondata - teaches how to manage and analyze data in an effective and reproducible way. • Initial focus is on workshops for novices - there are no prerequisites, and no prior knowledge computational experience is assumed. • Domain specific by design – currently have lessons in ecology, in genomics developed with iPlant, in geospatial data developed with NEON, and on APIs with rOpenSci Curriculum • Project and data organization • Data cleaning and quality control • Data analysis and visualization in R or Python Domain dependent • Introduction to cloud computing • Command line • Text mining
  • 20.
    Curriculum • Command line •Software development best practices in R, Python or MATLAB • Github for version control • Focused on better scientific programming practices, for writing software or analysis scripts • Novice and more advanced levels • Domain agnostic
  • 21.
    Volunteer instructors The SoftwareCarpentry Foundation runs the Instructor Training program that teaches volunteers teaching pedagogy and specific strategies for teaching computational skills in a workshop format.
  • 22.
    Training is amissing piece between data collection & data-driven discovery Training
  • 28.
    If you wantto go fast, go alone. If you want to go far, go together.
  • 29.
    Get Involved  Requesta workshop  Be a helper at a workshop  Become a partner or affiliate and train local instructors  Contribute to lesson development
  • 30.
    Guiding the Carpentries SoftwareCarpentry Steering Committee: Karin Lagenstrom (Norway) Rayna Harris (UT Austin) Susan McClatchy (Jackson Labs) Kate Hertweck (UT Tyler) Christina Koch (University of Wisconsin) Mateusz Kuzak (eScience Netherlands) Belinda Weaver (Australia) Data Carpentry Steering Committee: Karen Cranston (Duke / OpenTree of Life) Hilmar Lapp (Duke) Aleksandra Pawlik (New Zealand eScience) Karthik Ram (rOpenSci / Berkeley Institute of Data Science Fellow) Ethan White (University of Florida / Moore DDD Investigator)
  • 31.

Editor's Notes

  • #6 By making data accessible and putting the data skills and the perspectives in the hands of all, we allow people to answer their own questions and capture their passion and expert knowledge.
  • #7 Once you get the data back though, often instead of being seen as an exciting opportunity, it’s seen as an insurmountable challenge how to manage, store, analyze or even view the data.
  • #8 , but for instance … Biomedical researchers in the UK had similar sentiments in a survey https://storify.com/biocrusoe/elixir-uk-node-meeting-2014-10-14 And an internal study at PLOS showed a lack of data management expertise has a dramatic impact on the ability of researchers to share their data
  • #9 Also we see in our workshops. Fill within hours. For instance a recent workshop at the NIH. Had 20 spots. Had a wait list of 84 people.
  • #13 Deliverables are lessons and workshops
  • #15 More than a focus on particular analysis techniques, workshops are focused on data management and what might be called data wrangling, gettting your data in to and out of analysis programs, formatting the data for analysis, conducting visualization We want to teach not only skills but affect attitudes – emphasizing importance and value of these approaches for effective and reproducible research
  • #23 Researchers need to be trained in how to effectively manage and use this data. Researchers don’t know how to address the questions they want to ask or aren’t even aware of the types of questions they can ask with this data. Story about genomics researcher It’s no longer the case where we’re saying ‘researchers should’. Researcher themselves want this training, feel limited by their own current data analysis skills. You likely have anecdotal evidence of this in your own community or domain
  • #31 Truly collaborative effort