SlideShare a Scribd company logo
1 of 40
Designing a Synergistic Relationship Between
Undergraduate Data Science Education and
Usability of Biodiversity Databases
Ciera Martinez, PhD
Berkeley Institute of Data Science
Mozilla Foundation
1. Motivation
2. Project set-up
2. Results of the project
3. End questions
Outline
1. Motivation
2. Project set-up
2. Results of the project
3. End questions
Outline
How do
organisms get
their shape?
Evolution!
Motivation
Learning to perform computational research is hard
- Attempting to gain the correct skills is frustrating
- They don’t teach the skills you need in classes
- PIs don’t teach these skills to their students
How I learned
- Using online tutorials and documentation
- Talking with peers, largely online
- Teaching others
Motivation
Now committed to programming education
Chitwood et al, 2014, Martinez et al, 2019, Martinez et al, 2016a, Martinez et al., 2016b,
Utilizing data made my research faster and better
Motivation
fieldmuseum.org
Motivation
ag.com/smithsonian-institution/story-behind-those-jaw-dropping-photos-collections-natur
Motivation
Motivation
Ect
Yay!
So much cool data!
Motivation
Ect
Yay!
So much cool data!
Nay! It’s hard to:
Navigate these databases
Understand what is in these databases
Access data within these databases
Motivation
Data Science
Education
Using Biodiversity data
To answer research questions+ =
Curiosity Data Project
Motivation
1. Motivation
2. Project set-up
2. Results of the project
3. End questions
Outline
Guided by mentors, students are given free rein to explore a
database of their choosing with the end goal of creating a
tutorial outlining their process.
Curiosity Data Project
To use biodiversity databases and inquiry based learning to teach
data literacy skills to undergraduate and graduate students.
Project Overview
Project set-up
Synergistic Relationship
Project set-up
1.Got team together
• Recruited eight undergraduates from pool of 30
• Some programming knowledge (R or Python)
• Majors: CS, Stats, Biology
• Recruited mentor help: 1 PhD student and 1
industry data scientist
Project set-up
2. The students were asked to pick one
database and document how they access,
clean, and analyze the data from that database
Project set-upProject set-up
2. The students were asked to pick one
database and document how they access,
clean, and analyze the data from that database
List of prompt questions to begin exploration
• What features you find most useful?
• What kind of questions can we ask when using this database?
• What tools are available to use this database?
• What skills are required to use this database?
• How do you access the data?
• What tutorials help you access the database?
• What are the challenges to using this data?
• What are the rules for using this database?
Project set-upProject set-up
• We are giving back to the community by teaching others
• They get over their fears of posting their code online
• Expose them to open science philosophy and
computational reproducibility
• Focus on code readability
• Allow them to learn how to present a story
• Teaches them computational research notebook
management
• Regularly use Git and Github
• Start online presence and portfolio
3. Student end project goal is a tutorial that will
be posted online
Project set-up
Reasoning
Project set-up
4. We met for 20 min individually each week
to discuss what they worked on, challenges,
and plan next weeks work. (~3 hours a
week)
Project set-upProject set-up
1. Motivation
2. Project set-up
2. Results of the project
3. End questions
Outline
I am so impressed with their work!
Results of the project
curiositydata.org
Ask how animals interact with network graph visualization
Yikang Li
GloBI
Results of the project
curiositydata.org
Tracking animal movement in time and space
Caryn
MoveBank
Results of the project
curiositydata.org
Mapping dinosaur and plant fossil data in time and
space
Paleobiology Database (PBDB)
Results of the project
curiositydata.org
Coming Soon: Explore 3D CT Scans using open source tools
Morphosource
Results of the project
Samantha
curiositydata.org
Exploring the data of natural history
Results of the project
What worked?
What didn’t work?
What did the students like about the project?
What did the students dislike about the project?
Results of the project
- They all thrived in self directed inquiry
-These tutorials can be coopted for workshops AND be
remixed for lesson modules
-Feedback works if directly in contact with developer
- They all enjoyed it and stuck with it, I didn’t lose one
student
- Students became extremely interested in the data
What worked?
Results of the project
- The students skills did improve!
What worked?
Results of the project
-They sometimes floundered with so much freedom
-It was hard to assess how much work they did each week
-They could not create tools like I originally planned
-The limiting factor is me, hard to scale.
-Reviewing the code
-Dealing with notebook to website
-Individual meetings
Results of the project
What didn’t work?
- “Ability to take the project in whatever direction I
wanted”
- “I love manipulating data programmatically and making
plots with the data.”
-“I enjoyed learning new skills outside of the classroom and
actually being able to create a tutorial using data that I was
really interested in!”
-“Exploring different data visualization methods and tailor
them to fit the data set”
Results of the project
What did they like about the project?
Results of the project
What did they dislike about the project?
-“Debugging code and figuring out how to make an
interesting…story out of the data”
-“Downloading data using API and storing big data.”
-"Lack of structure”
-“Trying to figure out how to solve the problems like dealing
with missing and "bad" data”
-“That some visualization methods/packages are difficult to
install and use “
-“Debugging code and figuring out how to make an
interesting…story out of the data”
-“Downloading data using API and storing big data.”
-"Lack of structure”
-“Trying to figure out how to solve the problems like dealing
with missing and "bad" data”
-“That some visualization methods/packages are difficult to
install and use “
But everything they listed is normal frustrations about
working with code, so they learned what is common
frustrations with the code
Results of the project
What did they dislike about the project?
CURIOSITY
DATA
TEAM
Caryn
CieraWinifred
Samantha Shirley
Sara
Zoe
Yikang OllieLynn
*Not pictured Yihui
1. Motivation
2. Project set-up
2. Results of the project
3. End questions
Outline
- Is there something here with gender and this data? Why? How do
we leverage for bridging gender gap in STEM?
- Can this scale as an academic course?
- Can this scale in the online community?
- Do you need to know how to program to run a similar project?
- Will people use these tutorials?
- Where is an appropriate place to house these tutorials?
End questions
Thank You!
Neotoma, PBDB, GloBI, Movebank, Google Earth,
iDigBio, BHL, Morphosource
All the museums, collections, and universities that hosted me to visit, and
provided insightful discussion and critical feedback
Berkeley Museum of Vertebrate and Zoology, New York Botanical Garden,
National Museum of Denmark, Kew Gardens, Natural History Museum of Utah,
American Museum of Natural History, Cooper-Hewitt Design Museum, UT
Austin Vertebrate Paleontology Laboratory Collection, Digimorph Facility, UT
Austin Ichthyology Collection, Lewis and Clark College, Natural History
Museum (London)
Special Thanks: Deb Paul
Learning Objectives
Link to exact document
- identify assumptions about data
- formulate research questions
- tell a story with data
- visualize data
- build data science portfolio
- write collaborative code
- employ version control
- apply data / file management
- think with reproducibility in mind
- handle and merge data
- clean messy data
Overlapping Data Literacy Skills
1. Manipulation of data frames
• subset
• merge
• transform long vs wide
• apply formulas to values
2. Map points onto a geographic map, understand assumptions
3. Clean taxonomic names, understand assumptions
1. Species concept
2. Spelling errors
3. Species synonyms
4. Hierarchical classification
5. Species databases available and differences between them
4. Time stamp data formats
1. Time collected vs time measured
2. Time formats
3. Visualization techniques for time and space
5. Look into biology for sanity checks
6. Understand the value of plotting data and which visualization is important
Minimum

More Related Content

What's hot

A Blind Date With (Big) Data: Student Data in (Higher) Education
A Blind Date With (Big) Data: Student Data in (Higher) EducationA Blind Date With (Big) Data: Student Data in (Higher) Education
A Blind Date With (Big) Data: Student Data in (Higher) EducationUniversity of South Africa (Unisa)
 
22nd Academic Design Management Conference DMI
22nd Academic Design Management Conference DMI 22nd Academic Design Management Conference DMI
22nd Academic Design Management Conference DMI Brigitte Borja de Mozota
 
Working with Social Media Data: Ethics & good practice around collecting, usi...
Working with Social Media Data: Ethics & good practice around collecting, usi...Working with Social Media Data: Ethics & good practice around collecting, usi...
Working with Social Media Data: Ethics & good practice around collecting, usi...Nicola Osborne
 
The Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture DataThe Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture DataPaul Groth
 
How to Execute A Research Paper
How to Execute A Research PaperHow to Execute A Research Paper
How to Execute A Research PaperAnita de Waard
 
Towards An Improvement Community Platform for Service Innovation
Towards An Improvement Community Platform for Service InnovationTowards An Improvement Community Platform for Service Innovation
Towards An Improvement Community Platform for Service InnovationJack Park
 
Chapter1 introduction
Chapter1 introductionChapter1 introduction
Chapter1 introductionDinesh K
 
Executing the Research Paper
Executing the Research PaperExecuting the Research Paper
Executing the Research PaperAnita de Waard
 
Web analytics webinar
Web analytics webinarWeb analytics webinar
Web analytics webinarJim Jansen
 
googlization of information
googlization of informationgooglization of information
googlization of informationrajat00001in
 
Text-mining and Automation
Text-mining and AutomationText-mining and Automation
Text-mining and Automationbenosteen
 
2015 balti-and-bioinformatics
2015 balti-and-bioinformatics2015 balti-and-bioinformatics
2015 balti-and-bioinformaticsc.titus.brown
 
The New e-Science (Bangalore Edition)
The New e-Science (Bangalore Edition)The New e-Science (Bangalore Edition)
The New e-Science (Bangalore Edition)David De Roure
 
Experiments in Educational Research & Practice
Experiments in Educational Research & PracticeExperiments in Educational Research & Practice
Experiments in Educational Research & PracticeJoseph Jay Williams
 
Studying archives of online behavior
Studying archives of online behaviorStudying archives of online behavior
Studying archives of online behaviorJames Howison
 
Research Objects for FAIRer Science
Research Objects for FAIRer Science Research Objects for FAIRer Science
Research Objects for FAIRer Science Carole Goble
 

What's hot (17)

A Blind Date With (Big) Data: Student Data in (Higher) Education
A Blind Date With (Big) Data: Student Data in (Higher) EducationA Blind Date With (Big) Data: Student Data in (Higher) Education
A Blind Date With (Big) Data: Student Data in (Higher) Education
 
22nd Academic Design Management Conference DMI
22nd Academic Design Management Conference DMI 22nd Academic Design Management Conference DMI
22nd Academic Design Management Conference DMI
 
Working with Social Media Data: Ethics & good practice around collecting, usi...
Working with Social Media Data: Ethics & good practice around collecting, usi...Working with Social Media Data: Ethics & good practice around collecting, usi...
Working with Social Media Data: Ethics & good practice around collecting, usi...
 
The Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture DataThe Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture Data
 
How to Execute A Research Paper
How to Execute A Research PaperHow to Execute A Research Paper
How to Execute A Research Paper
 
Towards An Improvement Community Platform for Service Innovation
Towards An Improvement Community Platform for Service InnovationTowards An Improvement Community Platform for Service Innovation
Towards An Improvement Community Platform for Service Innovation
 
Chapter1 introduction
Chapter1 introductionChapter1 introduction
Chapter1 introduction
 
Executing the Research Paper
Executing the Research PaperExecuting the Research Paper
Executing the Research Paper
 
Web analytics webinar
Web analytics webinarWeb analytics webinar
Web analytics webinar
 
Context Aware Harassment Detection in Social Media [Overview]
Context Aware Harassment Detection in Social Media [Overview]Context Aware Harassment Detection in Social Media [Overview]
Context Aware Harassment Detection in Social Media [Overview]
 
googlization of information
googlization of informationgooglization of information
googlization of information
 
Text-mining and Automation
Text-mining and AutomationText-mining and Automation
Text-mining and Automation
 
2015 balti-and-bioinformatics
2015 balti-and-bioinformatics2015 balti-and-bioinformatics
2015 balti-and-bioinformatics
 
The New e-Science (Bangalore Edition)
The New e-Science (Bangalore Edition)The New e-Science (Bangalore Edition)
The New e-Science (Bangalore Edition)
 
Experiments in Educational Research & Practice
Experiments in Educational Research & PracticeExperiments in Educational Research & Practice
Experiments in Educational Research & Practice
 
Studying archives of online behavior
Studying archives of online behaviorStudying archives of online behavior
Studying archives of online behavior
 
Research Objects for FAIRer Science
Research Objects for FAIRer Science Research Objects for FAIRer Science
Research Objects for FAIRer Science
 

Similar to Designing a synergistic relationship between undergraduate Data Science education and usability of Biodiversity databases

Technology Integration
Technology IntegrationTechnology Integration
Technology Integrationlxshelby
 
Talk at OHSU, September 25, 2013
Talk at OHSU, September 25, 2013Talk at OHSU, September 25, 2013
Talk at OHSU, September 25, 2013Anita de Waard
 
Organizing and Securing Ethnographic Field Materials.pptx
Organizing and Securing Ethnographic Field Materials.pptxOrganizing and Securing Ethnographic Field Materials.pptx
Organizing and Securing Ethnographic Field Materials.pptxCelia Emmelhainz
 
Immersive informatics - research data management at Pitt iSchool and Carnegie...
Immersive informatics - research data management at Pitt iSchool and Carnegie...Immersive informatics - research data management at Pitt iSchool and Carnegie...
Immersive informatics - research data management at Pitt iSchool and Carnegie...Keith Webster
 
Ms paul s_biology_class1
Ms paul s_biology_class1Ms paul s_biology_class1
Ms paul s_biology_class1liscricket
 
/Users/jesseurban/desktop/ms paul s_biology_class1
/Users/jesseurban/desktop/ms paul s_biology_class1/Users/jesseurban/desktop/ms paul s_biology_class1
/Users/jesseurban/desktop/ms paul s_biology_class1liscricket
 
Research Data Management in the Humanities and Social Sciences
Research Data Management in the Humanities and Social SciencesResearch Data Management in the Humanities and Social Sciences
Research Data Management in the Humanities and Social SciencesCelia Emmelhainz
 
Big Data Curricula at the UW eScience Institute, JSM 2013
Big Data Curricula at the UW eScience Institute, JSM 2013Big Data Curricula at the UW eScience Institute, JSM 2013
Big Data Curricula at the UW eScience Institute, JSM 2013University of Washington
 
Loanable equipment supporting creation and dissemination for the campus commu...
Loanable equipment supporting creation and dissemination for the campus commu...Loanable equipment supporting creation and dissemination for the campus commu...
Loanable equipment supporting creation and dissemination for the campus commu...Shawna Sadler
 
Data carpentry replicathon_2017-03-24
Data carpentry replicathon_2017-03-24Data carpentry replicathon_2017-03-24
Data carpentry replicathon_2017-03-24tracykteal
 
Designing and delivering an international MOOC on Research Data Management an...
Designing and delivering an international MOOC on Research Data Management an...Designing and delivering an international MOOC on Research Data Management an...
Designing and delivering an international MOOC on Research Data Management an...Robin Rice
 
Looking for Data: Finding New Science
Looking for Data: Finding New ScienceLooking for Data: Finding New Science
Looking for Data: Finding New ScienceAnita de Waard
 
Research Life Cycle for GeoData 2014
Research Life Cycle for GeoData 2014Research Life Cycle for GeoData 2014
Research Life Cycle for GeoData 2014Carly Strasser
 
Pemanfaatan Big Data Dalam Riset 2023.pptx
Pemanfaatan Big Data Dalam Riset 2023.pptxPemanfaatan Big Data Dalam Riset 2023.pptx
Pemanfaatan Big Data Dalam Riset 2023.pptxelisarosa29
 
Start making sense: Talking data management with researchers
Start making sense: Talking data management with researchersStart making sense: Talking data management with researchers
Start making sense: Talking data management with researchersIncremental Project
 
DMPTool Webinar 8: Data Curation Profiles and the DMPTool (presented by Jake ...
DMPTool Webinar 8: Data Curation Profiles and the DMPTool (presented by Jake ...DMPTool Webinar 8: Data Curation Profiles and the DMPTool (presented by Jake ...
DMPTool Webinar 8: Data Curation Profiles and the DMPTool (presented by Jake ...University of California Curation Center
 
20170222 ku-librarians勉強会 #211 :海外研修報告:英国大学図書館を北から南へ巡る旅
20170222 ku-librarians勉強会 #211 :海外研修報告:英国大学図書館を北から南へ巡る旅20170222 ku-librarians勉強会 #211 :海外研修報告:英国大学図書館を北から南へ巡る旅
20170222 ku-librarians勉強会 #211 :海外研修報告:英国大学図書館を北から南へ巡る旅kulibrarians
 
What is eScience, and where does it go from here?
What is eScience, and where does it go from here?What is eScience, and where does it go from here?
What is eScience, and where does it go from here?Daniel S. Katz
 

Similar to Designing a synergistic relationship between undergraduate Data Science education and usability of Biodiversity databases (20)

Technology Integration
Technology IntegrationTechnology Integration
Technology Integration
 
Talk at OHSU, September 25, 2013
Talk at OHSU, September 25, 2013Talk at OHSU, September 25, 2013
Talk at OHSU, September 25, 2013
 
Organizing and Securing Ethnographic Field Materials.pptx
Organizing and Securing Ethnographic Field Materials.pptxOrganizing and Securing Ethnographic Field Materials.pptx
Organizing and Securing Ethnographic Field Materials.pptx
 
Immersive informatics - research data management at Pitt iSchool and Carnegie...
Immersive informatics - research data management at Pitt iSchool and Carnegie...Immersive informatics - research data management at Pitt iSchool and Carnegie...
Immersive informatics - research data management at Pitt iSchool and Carnegie...
 
Ms paul s_biology_class1
Ms paul s_biology_class1Ms paul s_biology_class1
Ms paul s_biology_class1
 
/Users/jesseurban/desktop/ms paul s_biology_class1
/Users/jesseurban/desktop/ms paul s_biology_class1/Users/jesseurban/desktop/ms paul s_biology_class1
/Users/jesseurban/desktop/ms paul s_biology_class1
 
2016 davis-biotech
2016 davis-biotech2016 davis-biotech
2016 davis-biotech
 
Research Data Management in the Humanities and Social Sciences
Research Data Management in the Humanities and Social SciencesResearch Data Management in the Humanities and Social Sciences
Research Data Management in the Humanities and Social Sciences
 
Big Data Curricula at the UW eScience Institute, JSM 2013
Big Data Curricula at the UW eScience Institute, JSM 2013Big Data Curricula at the UW eScience Institute, JSM 2013
Big Data Curricula at the UW eScience Institute, JSM 2013
 
Data Science and Urban Science @ UW
Data Science and Urban Science @ UWData Science and Urban Science @ UW
Data Science and Urban Science @ UW
 
Loanable equipment supporting creation and dissemination for the campus commu...
Loanable equipment supporting creation and dissemination for the campus commu...Loanable equipment supporting creation and dissemination for the campus commu...
Loanable equipment supporting creation and dissemination for the campus commu...
 
Data carpentry replicathon_2017-03-24
Data carpentry replicathon_2017-03-24Data carpentry replicathon_2017-03-24
Data carpentry replicathon_2017-03-24
 
Designing and delivering an international MOOC on Research Data Management an...
Designing and delivering an international MOOC on Research Data Management an...Designing and delivering an international MOOC on Research Data Management an...
Designing and delivering an international MOOC on Research Data Management an...
 
Looking for Data: Finding New Science
Looking for Data: Finding New ScienceLooking for Data: Finding New Science
Looking for Data: Finding New Science
 
Research Life Cycle for GeoData 2014
Research Life Cycle for GeoData 2014Research Life Cycle for GeoData 2014
Research Life Cycle for GeoData 2014
 
Pemanfaatan Big Data Dalam Riset 2023.pptx
Pemanfaatan Big Data Dalam Riset 2023.pptxPemanfaatan Big Data Dalam Riset 2023.pptx
Pemanfaatan Big Data Dalam Riset 2023.pptx
 
Start making sense: Talking data management with researchers
Start making sense: Talking data management with researchersStart making sense: Talking data management with researchers
Start making sense: Talking data management with researchers
 
DMPTool Webinar 8: Data Curation Profiles and the DMPTool (presented by Jake ...
DMPTool Webinar 8: Data Curation Profiles and the DMPTool (presented by Jake ...DMPTool Webinar 8: Data Curation Profiles and the DMPTool (presented by Jake ...
DMPTool Webinar 8: Data Curation Profiles and the DMPTool (presented by Jake ...
 
20170222 ku-librarians勉強会 #211 :海外研修報告:英国大学図書館を北から南へ巡る旅
20170222 ku-librarians勉強会 #211 :海外研修報告:英国大学図書館を北から南へ巡る旅20170222 ku-librarians勉強会 #211 :海外研修報告:英国大学図書館を北から南へ巡る旅
20170222 ku-librarians勉強会 #211 :海外研修報告:英国大学図書館を北から南へ巡る旅
 
What is eScience, and where does it go from here?
What is eScience, and where does it go from here?What is eScience, and where does it go from here?
What is eScience, and where does it go from here?
 

Recently uploaded

BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdfBUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdfWildaNurAmalia2
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxpriyankatabhane
 
Evidences of Evolution General Biology 2
Evidences of Evolution General Biology 2Evidences of Evolution General Biology 2
Evidences of Evolution General Biology 2John Carlo Rollon
 
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxTHE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxNandakishor Bhaurao Deshmukh
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naJASISJULIANOELYNV
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PPRINCE C P
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxpriyankatabhane
 
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPirithiRaju
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Patrick Diehl
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxkessiyaTpeter
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPirithiRaju
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfSwapnil Therkar
 
Transposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptTransposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptArshadWarsi13
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxEran Akiva Sinbar
 
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |aasikanpl
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)riyaescorts54
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentationtahreemzahra82
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trssuser06f238
 
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxmalonesandreagweneth
 

Recently uploaded (20)

BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdfBUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptx
 
Evidences of Evolution General Biology 2
Evidences of Evolution General Biology 2Evidences of Evolution General Biology 2
Evidences of Evolution General Biology 2
 
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxTHE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by na
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C P
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
 
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdf
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
 
Transposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptTransposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.ppt
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptx
 
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentation
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 tr
 
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
 

Designing a synergistic relationship between undergraduate Data Science education and usability of Biodiversity databases

  • 1. Designing a Synergistic Relationship Between Undergraduate Data Science Education and Usability of Biodiversity Databases Ciera Martinez, PhD Berkeley Institute of Data Science Mozilla Foundation
  • 2. 1. Motivation 2. Project set-up 2. Results of the project 3. End questions Outline
  • 3. 1. Motivation 2. Project set-up 2. Results of the project 3. End questions Outline
  • 4. How do organisms get their shape? Evolution! Motivation
  • 5. Learning to perform computational research is hard - Attempting to gain the correct skills is frustrating - They don’t teach the skills you need in classes - PIs don’t teach these skills to their students How I learned - Using online tutorials and documentation - Talking with peers, largely online - Teaching others Motivation Now committed to programming education
  • 6. Chitwood et al, 2014, Martinez et al, 2019, Martinez et al, 2016a, Martinez et al., 2016b, Utilizing data made my research faster and better Motivation
  • 10. Ect Yay! So much cool data! Motivation
  • 11. Ect Yay! So much cool data! Nay! It’s hard to: Navigate these databases Understand what is in these databases Access data within these databases Motivation
  • 12. Data Science Education Using Biodiversity data To answer research questions+ = Curiosity Data Project Motivation
  • 13. 1. Motivation 2. Project set-up 2. Results of the project 3. End questions Outline
  • 14. Guided by mentors, students are given free rein to explore a database of their choosing with the end goal of creating a tutorial outlining their process. Curiosity Data Project To use biodiversity databases and inquiry based learning to teach data literacy skills to undergraduate and graduate students. Project Overview Project set-up
  • 16. 1.Got team together • Recruited eight undergraduates from pool of 30 • Some programming knowledge (R or Python) • Majors: CS, Stats, Biology • Recruited mentor help: 1 PhD student and 1 industry data scientist Project set-up
  • 17. 2. The students were asked to pick one database and document how they access, clean, and analyze the data from that database Project set-upProject set-up
  • 18. 2. The students were asked to pick one database and document how they access, clean, and analyze the data from that database List of prompt questions to begin exploration • What features you find most useful? • What kind of questions can we ask when using this database? • What tools are available to use this database? • What skills are required to use this database? • How do you access the data? • What tutorials help you access the database? • What are the challenges to using this data? • What are the rules for using this database? Project set-upProject set-up
  • 19. • We are giving back to the community by teaching others • They get over their fears of posting their code online • Expose them to open science philosophy and computational reproducibility • Focus on code readability • Allow them to learn how to present a story • Teaches them computational research notebook management • Regularly use Git and Github • Start online presence and portfolio 3. Student end project goal is a tutorial that will be posted online Project set-up Reasoning Project set-up
  • 20. 4. We met for 20 min individually each week to discuss what they worked on, challenges, and plan next weeks work. (~3 hours a week) Project set-upProject set-up
  • 21. 1. Motivation 2. Project set-up 2. Results of the project 3. End questions Outline
  • 22. I am so impressed with their work! Results of the project
  • 23. curiositydata.org Ask how animals interact with network graph visualization Yikang Li GloBI Results of the project
  • 24. curiositydata.org Tracking animal movement in time and space Caryn MoveBank Results of the project
  • 25. curiositydata.org Mapping dinosaur and plant fossil data in time and space Paleobiology Database (PBDB) Results of the project
  • 26. curiositydata.org Coming Soon: Explore 3D CT Scans using open source tools Morphosource Results of the project Samantha
  • 27. curiositydata.org Exploring the data of natural history Results of the project
  • 28. What worked? What didn’t work? What did the students like about the project? What did the students dislike about the project? Results of the project
  • 29. - They all thrived in self directed inquiry -These tutorials can be coopted for workshops AND be remixed for lesson modules -Feedback works if directly in contact with developer - They all enjoyed it and stuck with it, I didn’t lose one student - Students became extremely interested in the data What worked? Results of the project
  • 30. - The students skills did improve! What worked? Results of the project
  • 31. -They sometimes floundered with so much freedom -It was hard to assess how much work they did each week -They could not create tools like I originally planned -The limiting factor is me, hard to scale. -Reviewing the code -Dealing with notebook to website -Individual meetings Results of the project What didn’t work?
  • 32. - “Ability to take the project in whatever direction I wanted” - “I love manipulating data programmatically and making plots with the data.” -“I enjoyed learning new skills outside of the classroom and actually being able to create a tutorial using data that I was really interested in!” -“Exploring different data visualization methods and tailor them to fit the data set” Results of the project What did they like about the project?
  • 33. Results of the project What did they dislike about the project? -“Debugging code and figuring out how to make an interesting…story out of the data” -“Downloading data using API and storing big data.” -"Lack of structure” -“Trying to figure out how to solve the problems like dealing with missing and "bad" data” -“That some visualization methods/packages are difficult to install and use “
  • 34. -“Debugging code and figuring out how to make an interesting…story out of the data” -“Downloading data using API and storing big data.” -"Lack of structure” -“Trying to figure out how to solve the problems like dealing with missing and "bad" data” -“That some visualization methods/packages are difficult to install and use “ But everything they listed is normal frustrations about working with code, so they learned what is common frustrations with the code Results of the project What did they dislike about the project?
  • 36. 1. Motivation 2. Project set-up 2. Results of the project 3. End questions Outline
  • 37. - Is there something here with gender and this data? Why? How do we leverage for bridging gender gap in STEM? - Can this scale as an academic course? - Can this scale in the online community? - Do you need to know how to program to run a similar project? - Will people use these tutorials? - Where is an appropriate place to house these tutorials? End questions
  • 38. Thank You! Neotoma, PBDB, GloBI, Movebank, Google Earth, iDigBio, BHL, Morphosource All the museums, collections, and universities that hosted me to visit, and provided insightful discussion and critical feedback Berkeley Museum of Vertebrate and Zoology, New York Botanical Garden, National Museum of Denmark, Kew Gardens, Natural History Museum of Utah, American Museum of Natural History, Cooper-Hewitt Design Museum, UT Austin Vertebrate Paleontology Laboratory Collection, Digimorph Facility, UT Austin Ichthyology Collection, Lewis and Clark College, Natural History Museum (London) Special Thanks: Deb Paul
  • 39. Learning Objectives Link to exact document - identify assumptions about data - formulate research questions - tell a story with data - visualize data - build data science portfolio - write collaborative code - employ version control - apply data / file management - think with reproducibility in mind - handle and merge data - clean messy data
  • 40. Overlapping Data Literacy Skills 1. Manipulation of data frames • subset • merge • transform long vs wide • apply formulas to values 2. Map points onto a geographic map, understand assumptions 3. Clean taxonomic names, understand assumptions 1. Species concept 2. Spelling errors 3. Species synonyms 4. Hierarchical classification 5. Species databases available and differences between them 4. Time stamp data formats 1. Time collected vs time measured 2. Time formats 3. Visualization techniques for time and space 5. Look into biology for sanity checks 6. Understand the value of plotting data and which visualization is important Minimum

Editor's Notes

  1. So this is the outline of my talk. Today I am going to talk about a project I that has been the culmination of over ten years of interests of mine. So to start, I am going to take a few minutes to explain my motivations for building this project and how my background led me here.
  2. The short answer is evolution. I am obsessed with thinking about how organisms get their shape. The moment I learned that there are people who can devote their careers to asking such questions, is the exact moment I set a trajectory to become an evolutionary biologist. I vowed to attack this question, How do organism get their shape, to where ever it led me.
  3. Where it led me, is where almost every modern scientist is led. To a lot of data and a lot of frustrations on how to use that data. Learning to program is hard. It took many years to figure out I was not alone with this frustration, And now I am very committed to teaching programming and the pedagogy behind learning these programming and datasceince skills.
  4. And overtime I stopped being so hard on myself. And as my programming skills improved, my research questions became bigger and I started uncovering surprising patterns in my research that would not have been possible without programming. Describe slide. Instead of being frustrated by large amounts of data I became craving it
  5. Now I am hooked, I am currently a postdoc studying comparitive genomics at Berkeley. I consider myself a data scientist, just as much as I consider myself an evolutionary biologist and I began thinking about where the coolest data in the world is to answer my evolutionary questions, and it hit me, duh, Natural History Museums. I got my start in research at the Field Museum in Chicago (maybe you were there last week?),
  6. but from working there I knew the sheer magnitude of specimens not open to the public and I knew Natural History Museum Collections and Botanical Gardens must have amazing data.
  7. But lucky for me, I don’t have to literally go to these museums. I was thrilled to learn about the digitization efforts that this community in the past fifteen years. I could not believe how many databases their were. How did I not know about these before? So then I started devouring as much information I could about how to use this data. and realized THIS is hard. Navigating these databases is hard. Understanding what is in these databases is hard. Accessing the data is hard. But still, I could plainly see, the data from this community is extraordinary and immense, I just wish there was more information on how to use this data!
  8. But lucky for me, I don’t have to literally go to these museums. I was thrilled to learn about the digitization efforts that this community in the past fifteen years. I could not believe how many databases their were. How did I not know about these before? So then I started devouring as much information I could about how to use this data. and realized THIS is hard. Navigating these databases is hard. Understanding what is in these databases is hard. Accessing the data is hard. But still, I could plainly see, the data from this community is extraordinary and immense, I just wish there was more information on how to use this data!
  9. So this year, with the help of the Mozilla foundation, I decided to dive head first into my new obsession - biodiversity and natural history data, while also incorporating my second passion of teaching programming, data science, and computational reproducibility I call the project The Curiosity Data Project, After Cabinet of Curiosities.
  10. So I wanted to help not only undergraduates, but also increase the usability of these databases. So while the databases provide the data and point of contact for questions, The students
  11. I knew from working with undergraduates on my research that they are extremely well trained and hungry for projects that allows them to handle real data.
  12. They all started the same list of questions about what to ask about the data. This structure was built so that it will be useful for 1.Future users of the database and 2. Can be used as critical feedback for database maintainers. These are the sorts of questions that every database should have somewhere on their site, but often take days to figure out.
  13. Start online presence and portfolio (which is instrumental in them getting a job or get into graduate school) Regularly use Git and Github if you have ever tried to teach or learn git, you know that you can only learn with regular practice. A short course or learning module is boring and nearly impossible way to learn git this way
  14. - even with sometimes very little knowledge of biology. They do what we all do, Google. - Feedback works if directly in contact with developer (for example GloBI maintainer Jorrit Polean worked directly with Yikang Lee discussing problems she had. Gtihub issues were created and database improved) - Students became extremely interested in the data. Meeting with them was a delight, they would tell me all about dinosaurs, how fossils are collected, California fires, sea start diseases, whale migratory patterns. It was a delight.
  15. I should have limited them to the scope of their questions early on. because exploratory data analysis is not linear Understanding and cleaning the data was full time.
  16. Lastly I want to thank my team. Notice anything? Gender: 6 female, 1 male, 1 non-binary gendered student You might be thinking “oh, but maybe they were attracted to a female led data science team”. Not the case, I also have a team of undergraduates with my comparative genomics project: which is 5 male 2 females. The majority of applicants for this project was female - 24 out of the 30 applicant that applied were females. This is something that I have noticed about this community in general, a lot of females are attracted to this type of data. Why? I have some theories, but overall, I think it is data itself,
  17. So I end with a list of questions that I would love your opinion on. Please come find me if you have any thoughts on these questions or data science education in general. I can take any questions not tag
  18. They all started in the same place, and from there, they had to follow their own curiosity (the ultimate motivation).