Teaching Data Science to Undergraduate Students
Nicole Vasilevsky
Oregon Health & Science University Library
From Evidence to Scholarship Conference
Reed College
March 16, 2018
The Data Deluge
http://jooinn.com/the-big-data-deluge.html
Issues for data producers and data users
• How do we ask the right research question? (producers ✔, users ✔)
• How do we best manage, utilize, and interpret new knowledge from all this data? (producers ✔, users ✔)
• How do we ensure our data is reusable and reproducible? (producers ✔)
• How do we access and reuse data from other sources? (users ✔)
Data science is an emerging interdisciplinary field where researchers are trained to extract knowledge from data, to make it more structured and reusable, and to garner new insights
https://www.quora.com/Will-learning-Tableau-help-me-become-a-better-data-scientist
Data Science is interdisciplinary
Addressing the need for data science training
• Data and Donuts: 1-2 day workshops offered to OHSU summer interns
• OHSU Library Data Science Institute: 3-day workshop offered to researchers and librarians
• Open Educational Resources: freely available, online training materials
Student demographics
• OHSU is a biomedical university:
  • Medicine
  • Nursing
  • Public Health
  • Basic science research
  • etc.
• The OERs were funded by NIH and focus on biomedical approaches
Open Educational Resources
NIH Big Data to Knowledge (BD2K) Program
One goal of the NIH BD2K initiative is to provide training for students and researchers to address challenges in managing, analyzing, and interpreting big data.
A research team in the Library and the Department of Medical Informatics and Clinical Epidemiology (DMICE) developed open educational resources (OERs) and skills courses.
Our Approach
OERs and skills courses connect the dots, helping researchers understand how to apply data science techniques in the context of their whole research life cycle.
Modules aim to fill specific gaps in the research process:
• Finding resources/data
• Introduction to Big Data
• Managing data and applying data standards
• Ethics and regulatory issues
• Methods for analysis, visualization, and interpretation
• Collaboration and team science
• Sharing and dissemination
https://github.com/OHSUBD2K/
Data and Donuts
Think like a data scientist - the Data and Donuts workshop
will provide an introduction to data science for those new
to research. Summer interns encouraged to attend!
Topics covered will include:
• What is Big Data?
• Asking the right question and getting the right answers from your data
• Finding data resources in the real world
• Data handling 101
• Ethics of data
• Communicating your science for maximal impact
June 28 & 29 | 9-12 PM | Donuts!
Free Workshop!
DataAndDonuts
Interested? Register at http://bit.ly/1sfDeXz or email wirzj@ohsu.edu
www.ohsu.edu/bd2k
Lessons learned
1. Teaching research data management (RDM) is challenging
2. Interactive exercises
3. Games
4. Donuts are a hit
OHSU Library Data Science Institute
Structure of the institute/schedule (Day 1, Day 2, Day 3)
• Introduction to Command line/GitHub
• Data Exploration and Statistics
• Data description
• Research Data Standards
• Data Sharing and Reuse
• Mixed methods: Quantitative and Qualitative research
• Analyzing textual data
• Web scraping
• Mapping and Geospatial Visualization with QGIS
• Data visualization
Lessons learned
1. Train the trainer
2. Targeted audience
3. Be adaptable
4. Coffee!
http://sites.nationalacademies.org/deps/bmsa/deps_180066
Conclusion and Future Work
Conclusion
• We have successfully offered short trainings to undergraduates, OHSU students, researchers, librarians, and others
• We learned many lessons along the way
• There is high demand for data science training
• Training sessions should be hands-on and interactive
Next steps
• Data and Donuts again this summer for OHSU summer interns
• Plan a future OHSU Library Data Science Institute?
• BioData Club
• Data Jamborees
• Encourage use of our Open Educational Resources
• We are open to new collaborations
GCC/BOSC 2018, Reed College, June 25-30, 2018
https://galaxyproject.org/events/gccbosc2018/
Resources
• OHSU Library Data Science Institute: https://ohsulibrary-datascienceinstitute.github.io/
• BioData Club: https://biodata-club.github.io/
• Data Jamboree: http://www.ohsu.edu/xd/education/schools/school-of-medicine/departments/computational-biology/events/data-jamboree.cfm
• Open Educational Resources: https://github.com/OHSUBD2K/
https://github.com/OHSUBD2K/Presentations
Acknowledgements
The OHSU Library Data Science Institute was supported by NNLM PNR under the National Library of Medicine
(NLM), National Institutes of Health (NIH) cooperative agreement number UG4LM012343. Data and Donuts and the
Open Educational Resources were supported by NIH Grants 1R25EB020379-01 and 1R25GM114820-01.
Bill Hersh, Melissa Haendel, David Dorr, Shannon McWeeney, Bjorn Pederson, Jackie Wirz, Robin Champieux, Letisha Wyatt, Laura Zeigan, Ted Laderas
You can find me at:
@n_vasilevsky
vasilevs@ohsu.edu
Thanks!
https://github.com/OHSUBD2K/Presentations


Editor's Notes

  • Image credits for the module icons:
    • Big data: Josh, The Noun Project (https://thenounproject.com/jkdubb/)
    • Finding resources: creative outlet, The Noun Project (https://thenounproject.com/creativeoutlet/)
    • Scale: Ralf Schmitzer, The Noun Project (https://thenounproject.com/ralfschmitzer/)
    • Information download: Vectors Market
    • Analysis: Yamini Ahluwalia, GB
    • Share: Anand A Nair
    • Team: Rockicon, The Noun Project (https://thenounproject.com/rockicon/)