Data science education resources for everyone
Nicole Vasilevsky, Jackie Wirz, Bjorn Pederson, Ted Laderas, Shannon McWeeney,
William Hersh, David A. Dorr, Melissa Haendel
Oregon Health & Science University
MLA/PNC 2016
The problem
Major challenge: how to manage, analyze
and interpret vast amounts of data being
generated in biomedical research
One goal of NIH Big Data to Knowledge
(BD2K) initiative: provide training for
students and researchers to address this
Research team in the Library and
Department of Medical Informatics and
Clinical Epidemiology (DMICE) is
developing skills courses and open
educational resources (OERs)
http://whelf.ac.uk/activity-data-delivering-benefits-from-the-data-deluge/
Our Approach
Skills courses and OERs connect the dots that help researchers understand how to apply
data science techniques in the context of their whole research life cycle
Skills courses and OER topics aim to fill specific gaps
BD2K Skills Courses
Taught by BD2K faculty, post-docs and staff
In-person format
Targeted to a variety of students
Defining The Problem
• Problems amenable to analytics
• Importance of question
• Team definitions
• Scope
• When we do this wrong: methods don't match
Data Identification And Resources
• Finding the right data
• Search methods
• Use of metadata
Wrangling Data
• Data management
• Exploratory Data Analysis
• Data Dictionary
• As you touch data, what can go wrong?
Methods, Tools And Analysis
• Visualization
• Matching algorithms to problems
…
Scientific Communication
• Reporting Findings and Limitations
• Giving “Elevator Speech” on ideas of how to approach problem
• Critique of related problem
Course Offerings
Course | Length | When | Who | What
Intro Course | Week-long course (~40 hrs) | July 2015 | Interns and undergraduates | Taught basics of data science in the context of the research life cycle
Data After Dark | 2-evening course (4 hrs/night) | January 2016 | OHSU students, staff and faculty | Emerging data science activities/research impact
Data and Donuts | 2-morning course (3 hrs/day) | June 2016 | OHSU summer interns | Basics of data science
Advanced Course | 4-evening course (2 hrs/night) | May 2016 | OHSU students, staff and faculty | Hands-on data viz / data wrangling
Data and Donuts West | 4-hour course | July 2016 | OHSU summer interns (West Campus) | Basics of data science
Think like a data scientist - the Data and Donuts workshop will provide an introduction to data science for those new to research. Summer interns encouraged to attend!
Topics covered will include
• What is Big Data?
• Asking the right question and getting the right answers from your data
• Finding data resources in the real world
• Data handling 101
• Ethics of data
• Communicating your science for maximal impact
June 28 & 29 | 9 - 12 PM | Donuts!
Free Workshop!
DataAndDonuts
Interested?
Register at http://bit.ly/1sfDeXz or email wirzj@ohsu.edu
www.ohsu.edu/bd2k
FREE OHSU BD2K ADVANCED DATA AFTER DARK WORKSHOP
Hands-on! Learn by Doing!
Join us for a 4 evening workshop:
· Data Wrangling with Python and Pandas
· Interactive visualization with R/Shiny
· Supervised Learning Algorithms + Kaggle Challenge
Familiarity with R and Git is required. Bring your laptop!
May 23-26th
5-7pm
Register at http://bit.ly/1pFVvLv
Department of Medical Informatics + Clinical Epidemiology + OHSU Library
Funding: NIH 5R25EB020379
For more information, e-mail bd2k@ohsu.edu
Evaluation of Skills Courses
[Figure: two charts, "Evaluation Summary from Beginner Students" and "Evaluation Summary from Advanced Students". Each shows, on a 0% to 100% axis, the percentage of responses rated 6 & 7, rated 3, 4 & 5, and rated 1 & 2 for three statements:]
• The instructors clearly presented the skills to be learned
• The instructors presented content in an organized manner
• The instructors effectively presented concepts and techniques
OER Modules
01 | Biomedical Big Data Science
02 | Introduction to Big Data in Biology and Medicine
03 | Ethical Issues in Use of Big Data
04 | Clinical Standards Related to Big Data
05 | Basic Research Data Standards
06 | Public Health and Big Data
07 | Team Science
08 | Secondary Use (Reuse) of Clinical Data
09 | Publication and Peer Review
10 | Information Retrieval
11 | Version Control and Identifiers
12 | Data annotation and curation
13 | Data Tools and Landscape
14 | Ontologies 101
15 | Data metadata and provenance
16 | Semantic data interoperability
17 | Choice of Algorithms and Algorithm Dynamics
18 | Visualization and Interpretation
19 | Replication, Validation and the spectrum of Reproducibility
20 | Regulatory Issues in Big Data for Genomics and Health
21 | Hosting data dissemination and data stewardship workshops
22 | Guidelines for Reporting, Publications, and Data Sharing
23 | Terminology of Biomedical, Clinical, and Translational Research
24 | Computing Concepts for Big Data
25 | Data modeling
26 | Semantic Web data
27 | Context-based selection of data
28 | Translating the Question
29 | Implications of Provenance and Pre-processing
30 | Data tells a story
31 | Statistical Significance, P-hacking and Multiple-testing
32 | Displaying Confidence and Uncertainty
https://dmice.ohsu.edu/bd2k/topics.html
What is available in the modules?
• Module overview
• Online viewing
• PowerPoint files
• Audio files
• Exercises
• References
• Resources
MLA - Professional Competencies For Health Sciences Librarians
https://dmice.ohsu.edu/bd2k/mapping_MLA.html
Competency #1
Understand the health sciences and
health care environment and the policies,
issues, and trends that impact that
environment
BDK02 - Introduction
To Big Data In Biology
And Medicine
BDK03 - Ethical Issues
In Use Of Big Data
BDK07 - Team Science
Competency #3
Understand the principles and practices
related to providing information services
to meet users' needs
BDK10 - Information
Retrieval
BDK22 - Guidelines For
Reporting, Publications,
And Data Sharing
Competency #4
Have the ability to manage health
information resources in a broad range of
formats
BDK09 - Publication And Peer
Review
BDK12 - Data Annotation And
Curation
BDK14 - Ontologies 101
BDK15 - Data Metadata And
Provenance
Competency #5
Understand and use technology and
systems to manage all forms of
information
BDK10 - Information Retrieval
BDK12 - Data Annotation And
Curation
BDK13 - Data and tools
landscape
BDK14 - Ontologies 101
BDK26 - Introduction to
Semantic Web data
Competency #6
Understand curricular design and
instruction and have the ability to teach
ways to access, organize, and use
information
BDK21 - Hosting Data
Dissemination And Data
Stewardship Workshops
Competency #7
Understand scientific research methods
and have the ability to critically examine
and filter research literature from many
related disciplines
BDK07 - Team Science
BDK18 - Visualization And
Interpretation
BDK19 - Replication,
Validation And The
Spectrum Of Reproducibility
BDK01 - Biomedical Big Data
Science
BDK04 - Clinical Data And Standards
Related To Big Data
BDK05 - Basic Research Data Standards
BDK04 - Clinical Data And Standards
Related To Big Data
BDK05 - Basic Research Data Standards
Challenges
Scope: How to scope generic curricula for different levels of users
Style: How to translate diverse teaching styles into general materials
Dissemination: How to maximize dissemination while protecting intellectual property
Images: How to incorporate images and other copyrighted materials into open resources
Who are these resources for? EVERYONE!
thenounproject.com
Undergraduate Students
Graduate Students
Clinicians
Post-docs
Librarians
Staff
Faculty
Help review our modules:
https://dmice.ohsu.edu/bd2k/topics.html
Acknowledgements
Bill Hersh, PI
Melissa Haendel, PI
Shannon McWeeney, PI
David Dorr, PI
Ted Laderas, Instructor
Jackie Wirz, Instructor
Nicole Vasilevsky, Instructor
Bjorn Pederson, Instructional Designer
This work is supported by NIH Grants 1R25EB020379-01 and 1R25GM114820-01.
You can find me at:
@n_vasilevsky
vasilevs@ohsu.edu
Thanks!


Editor's Notes

  • #12 I won’t read all the content on this slide; the point is that we mapped the MLA professional competencies to the BD2K modules. For 6 of the 7 MLA professional competencies, there are BD2K modules that could help train librarians in these areas.