DATA SCIENCE AND THE LIBRARY AT UC SAN DIEGO
STEPHANIE LABOU
DATA SCIENCE LIBRARIAN
MARCH 18, 2020
NISO WEBINAR
SOME CONTEXT
• University of California San Diego
• R1 university
• ~39,000 students
• Data Science Librarian = Data Librarian +
• I don’t have a library degree or any previous library experience
• But I do have lots of experience
TODAY’S TOPIC
“This roundtable discussion will focus on the on-going need for information professionals to be well-versed in data science skills in order to successfully support the work of students, scholars and other professionals.”
“…additional tools or support are needed for information professionals as they extract, wrangle, analyze and present data?”
WHAT IS DATA SCIENCE?
LET’S TALK SEMANTICS
• Data science = artificial intelligence (AI), deep learning, machine learning (ML), neural networks, high performance computing (HPC)
• Data science = data cleaning and manipulation, using code to automate data tasks, data at “big enough” scale
DATA SCIENCE AS A SUPPORT AREA
MY ROLE
• Questions about:
• 1) Looking for specific data about X
• What does data science – and other domains leveraging data science methodologies – need? Data!
• 2) Have data – now what?
• Makes up the vast majority of support
THE COMMON THREAD: COMPUTING
• Questions about using data in compute-heavy ways
• Reading in and formatting data in R/Python
• Working with non-traditional data formats
• API access, web scraping
• Access to additional resources for large (TB) datasets
• Data & GIS Lab
• Using other platforms related to coding, like GitHub, Jupyter
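A minimal sketch of the kind of "reading in and formatting" question described above, assuming a small nested JSON export that needs to become a flat CSV. The sample records, field names, and the helpers `flatten` and `to_csv` are hypothetical and use only the Python standard library; real support questions will involve different data and often other tools.

```python
import csv
import io
import json

# Hypothetical sample: nested JSON records, roughly as an API might return them.
RAW = """
[
  {"id": 1, "title": " Ocean Temps ", "creator": {"name": "A. Rivera"}, "year": "2019"},
  {"id": 2, "title": "Kelp Survey", "creator": {"name": "B. Chen"}, "year": null}
]
"""

FIELDS = ["id", "title", "creator", "year"]


def flatten(record):
    """Turn one nested record into a flat dict suitable for a CSV row."""
    year = record.get("year")
    return {
        "id": record["id"],
        "title": record["title"].strip(),           # trim stray whitespace
        "creator": record.get("creator", {}).get("name", ""),
        "year": int(year) if year else "",          # leave missing years blank
    }


def to_csv(rows):
    """Write flat dicts to CSV text (in memory here; a file handle works the same way)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [flatten(r) for r in json.loads(RAW)]
csv_text = to_csv(rows)
print(csv_text)
```

The same pattern (parse, flatten each record, write out) carries over to R or to pandas; the point is the approach rather than the specific functions.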
WHAT SKILLS DO I NEED FOR THIS?
• Data life cycle 101 (find, manage, analyze, preserve, etc.)
• For data science support, need knowledge of at least one programming language
• Concepts transfer between languages
• My path: self-taught!
• Cons: this is the long and rocky path
• Pros: forced early on to develop excellent problem-solving skills
DO WE ALL NEED TO LEARN “DATA SCIENCE”?
• In my opinion: no (but it depends)
• What are the support needs?
• Knowing “enough” goes a long way
• A handful of functions for a subset of topics (mostly data cleaning and manipulation in an automated platform) goes a long way
• More important to know where to find help, think through how to approach a problem
SO WHAT SHOULD WE DO?
• Skilling up existing employees
• Library Carpentry, etc.
• “Know just enough to be dangerous”
• Hiring non-library for new/adapted roles
• Aka, my experience
• In-the-field skillset is valuable; higher level of support
• Outsourcing – collaborating with other groups on campus
• IT, other computing groups
DATA SCIENCE WITHIN THE LIBRARY
EXAMPLE PROJECTS
• Things we’ve done
• Python scripts to automate parts of metadata ingest into system
• OpenRefine for metadata cleaning
• What we’d like to do
• Automate scraping DataCite
• Perhaps APIs?
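As an illustration of what scripted metadata cleaning can look like, here is a rough sketch of key-based clustering, similar in spirit to the fingerprint clustering offered by tools like OpenRefine. This is a simplified stand-in, not OpenRefine's actual implementation and not the library's own scripts; the names `fingerprint` and `cluster` and the sample values are hypothetical.

```python
import string
from collections import defaultdict


def fingerprint(value):
    """Simplified key: lowercase, drop punctuation, sort the unique tokens."""
    cleaned = value.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values sharing a key; keep only groups that contain variant spellings."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return {key: vals for key, vals in groups.items() if len(set(vals)) > 1}


# Hypothetical creator names as they might appear in messy metadata.
names = ["Rivera, Ana", "Ana Rivera", "rivera  ana", "Scripps Archive"]
clusters = cluster(names)

for key, variants in clusters.items():
    print(key, "->", variants)
```

Each cluster is a candidate for review by a person, who decides which form is authoritative; the script only surfaces likely duplicates.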
GUIDING PRINCIPLES
• Look for problems where data science methodologies could be the solution
• Could this manual process be automated? (coding)
• Could we better assess our metrics? (analytics)
• Could we better display this info for findability? (visualization)
• Not “fancy solution in search of a problem”
• Data science for the sake of data science is just more work for everyone
OTHER POPULAR TOPICS
• Collections as data
• Making existing collections more accessible for data science topics
• Text mining, natural language processing, etc.
• Data as collections
• Once again: What does data science need? Data!
• Data collections/guides as high value, high use
LESSONS LEARNED
• Adaptability/flexibility
• Software changes but best practices remain (and get better)
• This is a natural fit for the library!
• Building infrastructure today that will handle tomorrow’s needs
• Collaboration is key
• Within-library and campus partners
Contact me:
slabou@ucsd.edu
THANK YOU!
QUESTIONS?

More Related Content

What's hot

Big Data for Library Services (2017)
Big Data for Library Services (2017)Big Data for Library Services (2017)
Big Data for Library Services (2017)
Albert Anthony Gavino, MBA
 
Putnam Data Quality and the IR
Putnam Data Quality and the IRPutnam Data Quality and the IR
The liaison librarian: connecting with the qualitative research lifecycle
The liaison librarian: connecting with the qualitative research lifecycleThe liaison librarian: connecting with the qualitative research lifecycle
The liaison librarian: connecting with the qualitative research lifecycle
Celia Emmelhainz
 
RDAP14: Learning to Curate Panel
RDAP14: Learning to Curate Panel RDAP14: Learning to Curate Panel
RDAP14: Learning to Curate Panel
ASIS&T
 
Slides | Targeting the librarian’s role in research services
Slides | Targeting the librarian’s role in research servicesSlides | Targeting the librarian’s role in research services
Slides | Targeting the librarian’s role in research services
Library_Connect
 
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
National Information Standards Organization (NISO)
 
Research Data Services at the University of Utah
Research Data Services at the University of UtahResearch Data Services at the University of Utah
Research Data Services at the University of Utah
Rebekah Cummings
 
Springer "The Research Data Landscape: Beyond Filling Gaps"
Springer "The Research Data Landscape: Beyond Filling Gaps"Springer "The Research Data Landscape: Beyond Filling Gaps"
Springer "The Research Data Landscape: Beyond Filling Gaps"
National Information Standards Organization (NISO)
 
Rscd 2017 bo f data lifecycle data skills for libs
Rscd 2017 bo f data lifecycle data skills for libsRscd 2017 bo f data lifecycle data skills for libs
Rscd 2017 bo f data lifecycle data skills for libs
SusanMRob
 
Lee "Supporting Research Data is a Group Effort"
Lee "Supporting Research Data is a Group Effort"Lee "Supporting Research Data is a Group Effort"
Lee "Supporting Research Data is a Group Effort"
National Information Standards Organization (NISO)
 
SLIDES | 12 time-saving tips for research support
SLIDES | 12 time-saving tips for research supportSLIDES | 12 time-saving tips for research support
SLIDES | 12 time-saving tips for research support
Library_Connect
 
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
National Information Standards Organization (NISO)
 
Think like a Digital Curator
Think like a Digital CuratorThink like a Digital Curator
Think like a Digital Curator
DigitalLibraryServices
 
Linked Open Data for Digital Humanities
Linked Open Data for Digital HumanitiesLinked Open Data for Digital Humanities
Linked Open Data for Digital Humanities
Christophe Guéret
 
Strasser "Effective data management and its role in open research"
Strasser "Effective data management and its role in open research"Strasser "Effective data management and its role in open research"
Strasser "Effective data management and its role in open research"
National Information Standards Organization (NISO)
 
RDAP 15 Data Management Outreach for the Humanities: A University of Illinois...
RDAP 15 Data Management Outreach for the Humanities: A University of Illinois...RDAP 15 Data Management Outreach for the Humanities: A University of Illinois...
RDAP 15 Data Management Outreach for the Humanities: A University of Illinois...
ASIS&T
 
RDAP 16 Lightning: Data Practices and Perspectives of Atmospheric and Enginee...
RDAP 16 Lightning: Data Practices and Perspectives of Atmospheric and Enginee...RDAP 16 Lightning: Data Practices and Perspectives of Atmospheric and Enginee...
RDAP 16 Lightning: Data Practices and Perspectives of Atmospheric and Enginee...
ASIS&T
 
Washington Linked Data Authority Service at University of Houston
Washington Linked Data Authority Service at University of HoustonWashington Linked Data Authority Service at University of Houston
Washington Linked Data Authority Service at University of Houston
National Information Standards Organization (NISO)
 
Day in the life of a data librarian [presentation for ANU 23Things group]
Day in the life of a data librarian [presentation for ANU 23Things group]Day in the life of a data librarian [presentation for ANU 23Things group]
Day in the life of a data librarian [presentation for ANU 23Things group]
Jane Frazier
 
Data 101: A Gentle Introduction
Data 101: A Gentle IntroductionData 101: A Gentle Introduction
Data 101: A Gentle Introduction
Hamilton Public Library
 

What's hot (20)

Big Data for Library Services (2017)
Big Data for Library Services (2017)Big Data for Library Services (2017)
Big Data for Library Services (2017)
 
Putnam Data Quality and the IR
Putnam Data Quality and the IRPutnam Data Quality and the IR
Putnam Data Quality and the IR
 
The liaison librarian: connecting with the qualitative research lifecycle
The liaison librarian: connecting with the qualitative research lifecycleThe liaison librarian: connecting with the qualitative research lifecycle
The liaison librarian: connecting with the qualitative research lifecycle
 
RDAP14: Learning to Curate Panel
RDAP14: Learning to Curate Panel RDAP14: Learning to Curate Panel
RDAP14: Learning to Curate Panel
 
Slides | Targeting the librarian’s role in research services
Slides | Targeting the librarian’s role in research servicesSlides | Targeting the librarian’s role in research services
Slides | Targeting the librarian’s role in research services
 
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
 
Research Data Services at the University of Utah
Research Data Services at the University of UtahResearch Data Services at the University of Utah
Research Data Services at the University of Utah
 
Springer "The Research Data Landscape: Beyond Filling Gaps"
Springer "The Research Data Landscape: Beyond Filling Gaps"Springer "The Research Data Landscape: Beyond Filling Gaps"
Springer "The Research Data Landscape: Beyond Filling Gaps"
 
Rscd 2017 bo f data lifecycle data skills for libs
Rscd 2017 bo f data lifecycle data skills for libsRscd 2017 bo f data lifecycle data skills for libs
Rscd 2017 bo f data lifecycle data skills for libs
 
Lee "Supporting Research Data is a Group Effort"
Lee "Supporting Research Data is a Group Effort"Lee "Supporting Research Data is a Group Effort"
Lee "Supporting Research Data is a Group Effort"
 
SLIDES | 12 time-saving tips for research support
SLIDES | 12 time-saving tips for research supportSLIDES | 12 time-saving tips for research support
SLIDES | 12 time-saving tips for research support
 
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
 
Think like a Digital Curator
Think like a Digital CuratorThink like a Digital Curator
Think like a Digital Curator
 
Linked Open Data for Digital Humanities
Linked Open Data for Digital HumanitiesLinked Open Data for Digital Humanities
Linked Open Data for Digital Humanities
 
Strasser "Effective data management and its role in open research"
Strasser "Effective data management and its role in open research"Strasser "Effective data management and its role in open research"
Strasser "Effective data management and its role in open research"
 
RDAP 15 Data Management Outreach for the Humanities: A University of Illinois...
RDAP 15 Data Management Outreach for the Humanities: A University of Illinois...RDAP 15 Data Management Outreach for the Humanities: A University of Illinois...
RDAP 15 Data Management Outreach for the Humanities: A University of Illinois...
 
RDAP 16 Lightning: Data Practices and Perspectives of Atmospheric and Enginee...
RDAP 16 Lightning: Data Practices and Perspectives of Atmospheric and Enginee...RDAP 16 Lightning: Data Practices and Perspectives of Atmospheric and Enginee...
RDAP 16 Lightning: Data Practices and Perspectives of Atmospheric and Enginee...
 
Washington Linked Data Authority Service at University of Houston
Washington Linked Data Authority Service at University of HoustonWashington Linked Data Authority Service at University of Houston
Washington Linked Data Authority Service at University of Houston
 
Day in the life of a data librarian [presentation for ANU 23Things group]
Day in the life of a data librarian [presentation for ANU 23Things group]Day in the life of a data librarian [presentation for ANU 23Things group]
Day in the life of a data librarian [presentation for ANU 23Things group]
 
Data 101: A Gentle Introduction
Data 101: A Gentle IntroductionData 101: A Gentle Introduction
Data 101: A Gentle Introduction
 

Similar to Labou "Data Science and the Library at UC San Diego"

Intro to dh data management
Intro to dh data management Intro to dh data management
Intro to dh data management
Rachel Di Cresce
 
How to crack Big Data and Data Science roles
How to crack Big Data and Data Science rolesHow to crack Big Data and Data Science roles
How to crack Big Data and Data Science roles
UpXAcademy
 
From SQL to Python - A Beginner's Guide to Making the Switch
From SQL to Python - A Beginner's Guide to Making the SwitchFrom SQL to Python - A Beginner's Guide to Making the Switch
From SQL to Python - A Beginner's Guide to Making the Switch
Rachel Berryman
 
Feb.2016 Demystifying Digital Humanities - Workshop 3
Feb.2016 Demystifying Digital Humanities - Workshop 3Feb.2016 Demystifying Digital Humanities - Workshop 3
Feb.2016 Demystifying Digital Humanities - Workshop 3
Paige Morgan
 
01-Introduction.pdf
01-Introduction.pdf01-Introduction.pdf
01-Introduction.pdf
ngVnThng12
 
Lecture 3 - Exploratory Data Analytics (EDA), a lecture in subject module Sta...
Lecture 3 - Exploratory Data Analytics (EDA), a lecture in subject module Sta...Lecture 3 - Exploratory Data Analytics (EDA), a lecture in subject module Sta...
Lecture 3 - Exploratory Data Analytics (EDA), a lecture in subject module Sta...
Maninda Edirisooriya
 
Love Your Data Locally
Love Your Data LocallyLove Your Data Locally
Love Your Data Locally
Erin D. Foster
 
Guy avoiding-dat apocalypse
Guy avoiding-dat apocalypseGuy avoiding-dat apocalypse
Guy avoiding-dat apocalypse
ENUG
 
NYC Open Data Meetup-- Thoughtworks chief data scientist talk
NYC Open Data Meetup-- Thoughtworks chief data scientist talkNYC Open Data Meetup-- Thoughtworks chief data scientist talk
NYC Open Data Meetup-- Thoughtworks chief data scientist talk
Vivian S. Zhang
 
Lunch & Learn Intro to Big Data
Lunch & Learn Intro to Big DataLunch & Learn Intro to Big Data
Lunch & Learn Intro to Big Data
Melissa Hornbostel
 
00-01 DSnDA.pdf
00-01 DSnDA.pdf00-01 DSnDA.pdf
00-01 DSnDA.pdf
SugumarSarDurai
 
Presentation on Big Data Analytics
Presentation on Big Data AnalyticsPresentation on Big Data Analytics
Presentation on Big Data Analytics
S P Sajjan
 
How Oracle Uses CrowdFlower For Sentiment Analysis
How Oracle Uses CrowdFlower For Sentiment AnalysisHow Oracle Uses CrowdFlower For Sentiment Analysis
How Oracle Uses CrowdFlower For Sentiment Analysis
CrowdFlower
 
Reproducible Research with R, The Tidyverse, Notebooks, and Spark
Reproducible Research with R, The Tidyverse, Notebooks, and SparkReproducible Research with R, The Tidyverse, Notebooks, and Spark
Reproducible Research with R, The Tidyverse, Notebooks, and Spark
Adaryl "Bob" Wakefield, MBA
 
“Filling the digital preservation gap” an update from the Jisc Research Data ...
“Filling the digital preservation gap”an update from the Jisc Research Data ...“Filling the digital preservation gap”an update from the Jisc Research Data ...
“Filling the digital preservation gap” an update from the Jisc Research Data ...
Jenny Mitcham
 
POWRR Tools: Lessons learned from an IMLS National Leadership Grant
POWRR Tools: Lessons learned from an IMLS National Leadership GrantPOWRR Tools: Lessons learned from an IMLS National Leadership Grant
POWRR Tools: Lessons learned from an IMLS National Leadership Grant
Lynne Thomas
 
02-Lifecycle.pptx
02-Lifecycle.pptx02-Lifecycle.pptx
02-Lifecycle.pptx
Shree Shree
 
Databases, Web Services and Tools For Systems Immunology
Databases, Web Services and Tools For Systems ImmunologyDatabases, Web Services and Tools For Systems Immunology
Databases, Web Services and Tools For Systems Immunology
Yannick Pouliot
 
Open Sesame (and other open movements)
Open Sesame (and other open movements)Open Sesame (and other open movements)
Open Sesame (and other open movements)
Dorothea Salo
 
Data Day Seattle 2015: Sarah Guido
Data Day Seattle 2015: Sarah GuidoData Day Seattle 2015: Sarah Guido
Data Day Seattle 2015: Sarah Guido
Bitly
 

Similar to Labou "Data Science and the Library at UC San Diego" (20)

Intro to dh data management
Intro to dh data management Intro to dh data management
Intro to dh data management
 
How to crack Big Data and Data Science roles
How to crack Big Data and Data Science rolesHow to crack Big Data and Data Science roles
How to crack Big Data and Data Science roles
 
From SQL to Python - A Beginner's Guide to Making the Switch
From SQL to Python - A Beginner's Guide to Making the SwitchFrom SQL to Python - A Beginner's Guide to Making the Switch
From SQL to Python - A Beginner's Guide to Making the Switch
 
Feb.2016 Demystifying Digital Humanities - Workshop 3
Feb.2016 Demystifying Digital Humanities - Workshop 3Feb.2016 Demystifying Digital Humanities - Workshop 3
Feb.2016 Demystifying Digital Humanities - Workshop 3
 
01-Introduction.pdf
01-Introduction.pdf01-Introduction.pdf
01-Introduction.pdf
 
Lecture 3 - Exploratory Data Analytics (EDA), a lecture in subject module Sta...
Lecture 3 - Exploratory Data Analytics (EDA), a lecture in subject module Sta...Lecture 3 - Exploratory Data Analytics (EDA), a lecture in subject module Sta...
Lecture 3 - Exploratory Data Analytics (EDA), a lecture in subject module Sta...
 
Love Your Data Locally
Love Your Data LocallyLove Your Data Locally
Love Your Data Locally
 
Guy avoiding-dat apocalypse
Guy avoiding-dat apocalypseGuy avoiding-dat apocalypse
Guy avoiding-dat apocalypse
 
NYC Open Data Meetup-- Thoughtworks chief data scientist talk
NYC Open Data Meetup-- Thoughtworks chief data scientist talkNYC Open Data Meetup-- Thoughtworks chief data scientist talk
NYC Open Data Meetup-- Thoughtworks chief data scientist talk
 
Lunch & Learn Intro to Big Data
Lunch & Learn Intro to Big DataLunch & Learn Intro to Big Data
Lunch & Learn Intro to Big Data
 
00-01 DSnDA.pdf
00-01 DSnDA.pdf00-01 DSnDA.pdf
00-01 DSnDA.pdf
 
Presentation on Big Data Analytics
Presentation on Big Data AnalyticsPresentation on Big Data Analytics
Presentation on Big Data Analytics
 
How Oracle Uses CrowdFlower For Sentiment Analysis
How Oracle Uses CrowdFlower For Sentiment AnalysisHow Oracle Uses CrowdFlower For Sentiment Analysis
How Oracle Uses CrowdFlower For Sentiment Analysis
 
Reproducible Research with R, The Tidyverse, Notebooks, and Spark
Reproducible Research with R, The Tidyverse, Notebooks, and SparkReproducible Research with R, The Tidyverse, Notebooks, and Spark
Reproducible Research with R, The Tidyverse, Notebooks, and Spark
 
“Filling the digital preservation gap” an update from the Jisc Research Data ...
“Filling the digital preservation gap”an update from the Jisc Research Data ...“Filling the digital preservation gap”an update from the Jisc Research Data ...
“Filling the digital preservation gap” an update from the Jisc Research Data ...
 
POWRR Tools: Lessons learned from an IMLS National Leadership Grant
POWRR Tools: Lessons learned from an IMLS National Leadership GrantPOWRR Tools: Lessons learned from an IMLS National Leadership Grant
POWRR Tools: Lessons learned from an IMLS National Leadership Grant
 
02-Lifecycle.pptx
02-Lifecycle.pptx02-Lifecycle.pptx
02-Lifecycle.pptx
 
Databases, Web Services and Tools For Systems Immunology
Databases, Web Services and Tools For Systems ImmunologyDatabases, Web Services and Tools For Systems Immunology
Databases, Web Services and Tools For Systems Immunology
 
Open Sesame (and other open movements)
Open Sesame (and other open movements)Open Sesame (and other open movements)
Open Sesame (and other open movements)
 
Data Day Seattle 2015: Sarah Guido
Data Day Seattle 2015: Sarah GuidoData Day Seattle 2015: Sarah Guido
Data Day Seattle 2015: Sarah Guido
 

More from National Information Standards Organization (NISO)

Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"
Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"
Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"
National Information Standards Organization (NISO)
 
Benner "Expanding Pathways to Publishing Careers"
Benner "Expanding Pathways to Publishing Careers"Benner "Expanding Pathways to Publishing Careers"
Benner "Expanding Pathways to Publishing Careers"
National Information Standards Organization (NISO)
 
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
National Information Standards Organization (NISO)
 
Mattingly "AI & Prompt Design: Limitations and Solutions with LLMs"
Mattingly "AI & Prompt Design: Limitations and Solutions with LLMs"Mattingly "AI & Prompt Design: Limitations and Solutions with LLMs"
Mattingly "AI & Prompt Design: Limitations and Solutions with LLMs"
National Information Standards Organization (NISO)
 
Mattingly "AI and Prompt Design: LLMs with Text Classification and Open Source"
Mattingly "AI and Prompt Design: LLMs with Text Classification and Open Source"Mattingly "AI and Prompt Design: LLMs with Text Classification and Open Source"
Mattingly "AI and Prompt Design: LLMs with Text Classification and Open Source"
National Information Standards Organization (NISO)
 
Mattingly "AI and Prompt Design: LLMs with NER"
Mattingly "AI and Prompt Design: LLMs with NER"Mattingly "AI and Prompt Design: LLMs with NER"
Mattingly "AI and Prompt Design: LLMs with NER"
National Information Standards Organization (NISO)
 
Mattingly "AI & Prompt Design: Named Entity Recognition"
Mattingly "AI & Prompt Design: Named Entity Recognition"Mattingly "AI & Prompt Design: Named Entity Recognition"
Mattingly "AI & Prompt Design: Named Entity Recognition"
National Information Standards Organization (NISO)
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
National Information Standards Organization (NISO)
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
National Information Standards Organization (NISO)
 
Bazargan "NISO Webinar, Sustainability in Publishing"
Bazargan "NISO Webinar, Sustainability in Publishing"Bazargan "NISO Webinar, Sustainability in Publishing"
Bazargan "NISO Webinar, Sustainability in Publishing"
National Information Standards Organization (NISO)
 
Rapple "Scholarly Communications and the Sustainable Development Goals"
Rapple "Scholarly Communications and the Sustainable Development Goals"Rapple "Scholarly Communications and the Sustainable Development Goals"
Rapple "Scholarly Communications and the Sustainable Development Goals"
National Information Standards Organization (NISO)
 
Compton "NISO Webinar, Sustainability in Publishing"
Compton "NISO Webinar, Sustainability in Publishing"Compton "NISO Webinar, Sustainability in Publishing"
Compton "NISO Webinar, Sustainability in Publishing"
National Information Standards Organization (NISO)
 
Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"
National Information Standards Organization (NISO)
 
Hazen, Morse, and Varnum "Spring 2024 ODI Conformance Statement Workshop for ...
Hazen, Morse, and Varnum "Spring 2024 ODI Conformance Statement Workshop for ...Hazen, Morse, and Varnum "Spring 2024 ODI Conformance Statement Workshop for ...
Hazen, Morse, and Varnum "Spring 2024 ODI Conformance Statement Workshop for ...
National Information Standards Organization (NISO)
 
Mattingly "AI & Prompt Design" - Introduction to Machine Learning"
Mattingly "AI & Prompt Design" - Introduction to Machine Learning"Mattingly "AI & Prompt Design" - Introduction to Machine Learning"
Mattingly "AI & Prompt Design" - Introduction to Machine Learning"
National Information Standards Organization (NISO)
 
Mattingly "Text and Data Mining: Building Data Driven Applications"
Mattingly "Text and Data Mining: Building Data Driven Applications"Mattingly "Text and Data Mining: Building Data Driven Applications"
Mattingly "Text and Data Mining: Building Data Driven Applications"
National Information Standards Organization (NISO)
 
Mattingly "Text and Data Mining: Searching Vectors"
Mattingly "Text and Data Mining: Searching Vectors"Mattingly "Text and Data Mining: Searching Vectors"
Mattingly "Text and Data Mining: Searching Vectors"
National Information Standards Organization (NISO)
 
Mattingly "Text Mining Techniques"
Mattingly "Text Mining Techniques"Mattingly "Text Mining Techniques"
Mattingly "Text Mining Techniques"
National Information Standards Organization (NISO)
 
Mattingly "Text Processing for Library Data: Representing Text as Data"
Mattingly "Text Processing for Library Data: Representing Text as Data"Mattingly "Text Processing for Library Data: Representing Text as Data"
Mattingly "Text Processing for Library Data: Representing Text as Data"
National Information Standards Organization (NISO)
 
Carpenter "Designing NISO's New Strategic Plan: 2023-2026"
Carpenter "Designing NISO's New Strategic Plan: 2023-2026"Carpenter "Designing NISO's New Strategic Plan: 2023-2026"
Carpenter "Designing NISO's New Strategic Plan: 2023-2026"
National Information Standards Organization (NISO)
 

More from National Information Standards Organization (NISO) (20)

Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"
Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"
Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"
 
Benner "Expanding Pathways to Publishing Careers"
Benner "Expanding Pathways to Publishing Careers"Benner "Expanding Pathways to Publishing Careers"
Benner "Expanding Pathways to Publishing Careers"
 
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
 
Mattingly "AI & Prompt Design: Limitations and Solutions with LLMs"
Mattingly "AI & Prompt Design: Limitations and Solutions with LLMs"Mattingly "AI & Prompt Design: Limitations and Solutions with LLMs"
Mattingly "AI & Prompt Design: Limitations and Solutions with LLMs"
 
Mattingly "AI and Prompt Design: LLMs with Text Classification and Open Source"
Mattingly "AI and Prompt Design: LLMs with Text Classification and Open Source"Mattingly "AI and Prompt Design: LLMs with Text Classification and Open Source"
Mattingly "AI and Prompt Design: LLMs with Text Classification and Open Source"
 
Mattingly "AI and Prompt Design: LLMs with NER"
Mattingly "AI and Prompt Design: LLMs with NER"Mattingly "AI and Prompt Design: LLMs with NER"
Mattingly "AI and Prompt Design: LLMs with NER"
 
Mattingly "AI & Prompt Design: Named Entity Recognition"
Mattingly "AI & Prompt Design: Named Entity Recognition"Mattingly "AI & Prompt Design: Named Entity Recognition"
Mattingly "AI & Prompt Design: Named Entity Recognition"
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Bazargan "NISO Webinar, Sustainability in Publishing"
Bazargan "NISO Webinar, Sustainability in Publishing"Bazargan "NISO Webinar, Sustainability in Publishing"
Bazargan "NISO Webinar, Sustainability in Publishing"
 
Rapple "Scholarly Communications and the Sustainable Development Goals"


Labou "Data Science and the Library at UC San Diego"

  • 1. DATA SCIENCE AND THE LIBRARY AT UC SAN DIEGO STEPHANIE LABOU DATA SCIENCE LIBRARIAN MARCH 18, 2020 NISO WEBINAR
  • 2. SOME CONTEXT • University of California San Diego • R1 university • ~39,000 students • Data Science Librarian = Data Librarian + • I don’t have a library degree or any previous library experience • But I do have lots of experience
  • 3. TODAY’S TOPIC “This roundtable discussion will focus on the on-going need for information professionals to be well-versed in data science skills in order to successfully support the work of students, scholars and other professionals.” “…additional tools or support are needed for information professionals as they extract, wrangle, analyze and present data?”
  • 4. WHAT IS DATA SCIENCE?
  • 5. LET’S TALK SEMANTICS • Data science = artificial intelligence (AI), deep learning, machine learning (ML), neural networks, high performance computing (HPC) • Data science = data cleaning and manipulation, using code to automate data tasks, data at “big enough” scale
  • 6. DATA SCIENCE AS A SUPPORT AREA
  • 7. MY ROLE • Questions about: • 1) Looking for specific data about X • What does data science – and other domains leveraging data science methodologies – need? Data! • 2) Have data – now what? • Makes up the vast majority of support
  • 8. THE COMMON THREAD: COMPUTING • Questions about using data in compute-heavy ways • Reading in and formatting data in R/Python • Working with non-traditional data formats • API access, web scraping • Access to additional resources for large (TB) datasets • Data & GIS Lab • Using other platforms related to coding, like GitHub, Jupyter
  • 9. WHAT SKILLS DO I NEED FOR THIS? • Data life cycle 101 (find, manage, analyze, preserve, etc.) • For data science support, need knowledge of at least one programming language • Concepts transfer between languages • My path: self-taught! • Cons: this is the long and rocky path • Pros: forced early on to develop excellent problem-solving skills
  • 10. DO WE ALL NEED TO LEARN “DATA SCIENCE”? • In my opinion: no (but it depends) • What are the support needs? • Knowing “enough” goes a long way • A handful of functions for a subset of topics (mostly data cleaning and manipulation in an automated platform) goes a long way • More important to know where to find help, think through how to approach a problem
  • 11. SO WHAT SHOULD WE DO? • Skilling up existing employees • Library Carpentry, etc. • “Know just enough to be dangerous” • Hiring non-library for new/adapted roles • Aka, my experience • In-the-field skillset is valuable; higher level of support • Outsourcing – collaborating with other groups on campus • IT, other computing groups
  • 12. DATA SCIENCE WITHIN THE LIBRARY
  • 13. EXAMPLE PROJECTS • Things we’ve done • Python scripts to automate parts of metadata ingest into system • OpenRefine for metadata cleaning • What we’d like to do • Automate scraping DataCite • Perhaps APIs?
  • 14. GUIDING PRINCIPLES • Look for problems where data science methodologies could be the solution • Could this manual process be automated? (coding) • Could we better assess our metrics? (analytics) • Could we better display this info for findability? (visualization) • Not “fancy solution in search of a problem” • Data science for the sake of data science is just more work for everyone
  • 15. OTHER POPULAR TOPICS • Collections as data • Making existing collections more accessible for data science topics • Text mining, natural language processing, etc. • Data as collections • Once again: What does data science need? Data! • Data collections/guides as high value, high use
  • 16. LESSONS LEARNED • Adaptability/flexibility • Software changes but best practices remain (and get better) • This is a natural fit for the library! • Building infrastructure today that will handle tomorrow’s needs • Collaboration is key • Within-library and campus partners

Editor's Notes

  1. Hello, my name is Stephanie Labou and today I’m going to be talking with you about data science and the library at UC San Diego.
  2. I want to start with some background information, since it will help contextualize my perspective. I’m at UC San Diego, which is a large R1 university – meaning we are doctoral granting with very high research activity. We have a medical school and a business school and about 39,000 students. UCSD had a Data Librarian for decades, but the position was recently rebranded as Data Science Librarian. I’ve been here almost two years and I’m the inaugural “data science librarian”. I do want to mention: I don’t have a library degree and this job is my first job working in a library. But! I do have a master’s degree and before this job, I worked for 3 years as a data manager and research assistant with a large interdisciplinary environmental research group. So I have lots of experience working with data and – crucially – with scientific programming.
  3. So this is today’s topic. I wanted to highlight a few phrases in particular because these guided how I put together this talk. Of all the topics, I want to focus on two in particular: First, how can information professionals be well-versed in data science skills in order to fill a support role, and second, what tools do information professionals need to work with data moving forward? I’ve split these into the two components – outward vs inward – because I think the skill sets are complementary, but not necessarily the same.
  4. To start, I want to take a step back because all this is predicated on the term “data science”. So what is “data science”? Well, it’s got a lot going on. You’ve maybe seen one of these types of data science Venn diagrams and you can see that data science is a large, often poorly defined, multidisciplinary, and ever-evolving field. https://www.kdnuggets.com/2016/10/battle-data-science-venn-diagrams.html
  5. To be blunt, there’s a lot of hype about data science. I see two flavors of data science: the “hard core” stuff like AI and deep learning and the aspects of data science that are really like computer science engineering. And then there’s the rest of data science – which, to be clear, I love! – which is what I think of more as modern data literacy skills. This is being able to work with data. You may have heard the statistic that 80% of a data scientist’s work is data cleaning, and it’s true. I don’t mind the term “data science” for these kinds of skills. I think they’re incredibly important and they are, in a real sense, the “science of data”. This is what I’m going to focus on in this talk. I wanted to touch on this briefly because I also think the term “data science” has a lot of baggage. It can intimidate people. It can be stressful to think “oh no, now I’ve got to learn machine learning and AI and it seems like such a big leap”. And I think being clear about what we mean when we say “how are libraries using data science, what should librarians know about data science” is not just helpful, but necessary in terms of setting expectations and goals.
  6. So let’s start by talking about data science as a support area – how can we support students and researchers.
  7. For this, I’ll talk about my role specifically. What kinds of questions does the data science librarian get? Well, a lot of them are about finding data. Because what do data science methodologies need? Data! For the other type of question, it’s really about how do we extend the support we provide for data, to data science.
  8. And the common thread here is computing. These questions – coming from pretty much all disciplines on campus – are about using data in compute-heavy ways. They’re about working with data in R or Python, or working with non-traditional data formats like JSON or HTML or netCDF, which was mentioned in the webinar last week – formats that are new to people or disciplines that are used to working with spreadsheets. It’s questions about API use and web scraping – about accessing and leveraging the massive amounts of data that are available out there. It also means that sometimes, a traditional laptop isn’t going to cut it. The library at UCSD has a Data & GIS Lab and I see it as filling the space between “I can do what I need on my laptop” and “I need a supercomputer”. This is for large – even TB – datasets and our computers have more memory and processing power than a laptop. So sometimes the “support” is providing hardware, as well as software and software help. It is this niche that’s not computer science, but is about reproducible research, research automation, scientific programming. Which means I also spend a lot of time talking about best practices and how to get started with these things, because usually people aren’t coming at this with a background in computer science. Often, they’ve never taken a single programming class, or maybe they’ve taken a quick bootcamp.
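As an illustration of the "non-traditional format" questions described in note 8, here is a minimal Python sketch of turning nested JSON into spreadsheet-style rows. The sample records, field names, and helper names (`flatten`, `json_to_csv`) are all hypothetical and not from the talk; it is one possible approach under those assumptions, using only the standard library.

```python
import csv
import io
import json


def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into a single level with dotted keys.

    Lists are kept as JSON-encoded strings so each record stays on one row.
    """
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        elif isinstance(value, list):
            flat[full_key] = json.dumps(value)
        else:
            flat[full_key] = value
    return flat


def json_to_csv(json_text):
    """Convert a JSON array of (possibly nested) objects to CSV text."""
    records = [flatten(r) for r in json.loads(json_text)]
    # Collect the union of column names, in first-seen order.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)  # missing keys are written as empty cells
    return buffer.getvalue()


# Hypothetical sample data, made up for this sketch.
SAMPLE = """[
  {"id": 1, "site": {"name": "Pier", "depth_m": 5}, "tags": ["temp", "salinity"]},
  {"id": 2, "site": {"name": "Canyon"}}
]"""

if __name__ == "__main__":
    print(json_to_csv(SAMPLE))
```

The point of the sketch is the workflow rather than the syntax: decide what one row should represent, flatten each record to that shape, then write it out in a format spreadsheet users already know.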
  9. Ok! So, considering that that’s my purview, what skills do I need to do this? Obviously, I needed to have a strong grasp of the entire data life cycle. It has also been crucial for me that I had deep knowledge of at least one programming language. I consider myself quite experienced with the software program R, and I know enough to be dangerous in Python and Stata. But, a lot of that additional platform knowledge entails a lot of Googling – I know what I want to do, and I understand the conceptual framework or order of operations to get there, in a programming sense, but I may not know the exact syntax. But I can Google that. And I’m self-taught when it comes to programming. I took a stats class in grad school that used R, but everything else I learned on-the-job in my previous position. There’s nothing like getting thrown in the deep end to make you learn fast. Plus, learning project-by-project also meant that I picked up the skills that were most useful first, rather than starting from fundamentals. Whether this was a good thing is debatable – I know I’m missing some building blocks of basic computer science knowledge – but it rarely causes problems.
  10. So, do we all need to learn data science to be able to provide data science support, in this sense? Well, probably not, but it depends on what level of support you want to be able to provide, which in turn depends on what your patron needs are. We’re a large, STEM-heavy campus, with a data science institute and major, so it makes sense for us to be able to provide this deep level of support, which includes code support from someone who is not only conversant, but experienced with at least one programming language. But, being conversant is often enough. The important thing is the conceptual framework of how to approach a problem. For instance, if I need to present data, how can I get data from format and structure of type X to type Y? Breaking this down into steps: first, I need to convert my character dates to date formats, I want to have columns of ABC which are currently in row format, etc. Thinking through the workflow is what helps the most. And this is something that comes from experience working in a programming language but a little goes a long way. Long vs wide data, the concept of grouping data, etc. – a lot of basic data literacy, but in this programming/data science context. And for other librarians, I would say that this is just one more reference area. As a reference area, it’s more about knowing where/what to search: knowing the names of some common platforms, maybe the names of some packages, and where to find help.
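The workflow sketched in note 10 (character dates to real dates, rows to columns, grouping) might look something like the following in Python. The data, column names, and function names are invented for illustration. In practice a library such as pandas, or the tidyverse in R, would usually do this in fewer lines, but the standard-library version keeps each conceptual step visible.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical long-format records: one row per (date, variable) pair.
LONG_ROWS = [
    {"date": "03/01/2020", "variable": "A", "value": 2.0},
    {"date": "03/01/2020", "variable": "B", "value": 5.0},
    {"date": "03/02/2020", "variable": "A", "value": 4.0},
    {"date": "03/02/2020", "variable": "B", "value": 7.0},
]


def parse_dates(rows, fmt="%m/%d/%Y"):
    """Step 1: convert character dates into real date objects."""
    return [
        {**row, "date": datetime.strptime(row["date"], fmt).date()}
        for row in rows
    ]


def long_to_wide(rows):
    """Step 2: pivot long data so each variable becomes its own column."""
    wide = defaultdict(dict)
    for row in rows:
        wide[row["date"]][row["variable"]] = row["value"]
    return [{"date": day, **columns} for day, columns in sorted(wide.items())]


def mean_by_group(rows, key="variable"):
    """Step 3: group rows by a key and average the values in each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row["value"])
    return {name: sum(values) / len(values) for name, values in groups.items()}


parsed = parse_dates(LONG_ROWS)
wide = long_to_wide(parsed)    # one row per date, with columns A and B
means = mean_by_group(parsed)  # average value per variable

if __name__ == "__main__":
    for row in wide:
        print(row)
    print(means)
```

Each function corresponds to one step a patron would need to name before searching for syntax, which is the conceptual framework the note is describing.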
  11. So, this is the big question and again, depends: we don’t all need to learn everything, but it is helpful to know a bit. And the level of support your organization can provide will increase with more in-house knowledge. The first option is, of course, skilling up existing employees. I just finished saying that we don’t all need to become full-fledged data scientists, and I believe it! But, I do think having the exposure to it can definitely help provide more in-depth support for patrons. This is a popular topic, not just within data science, but across all domains. So how can libraries scale up their support of data, to support for data science? The Carpentries organization – an international organization with free curriculum online – has a curriculum for Library Carpentry, which is these kinds of skills for librarians. It covers a variety of different platforms for automating tasks, including R and Python, as well as command line and others, as well as how to get into that mindset of working with data at scale in an automated fashion. We’ve run one of these workshops here recently and it was quite popular. Another option is my experience: bringing a non-library trained professional in. It has been an amazing fit for me – libraries is clearly where I belong – but I know this isn’t always a popular option. But until MLIS curriculum changes to incorporate more of this – and if it should is another conversation; no one program can prepare one person with every skill – this is a definite option to hit the ground running with providing a deeper level of support. I had that in-the-field skillset so I can work with students and faculty at a higher level, from recommending specific packages, to teaching workshops, to reviewing code. The third option I see is outsourcing these duties to another group, likely one on campus. This may be the IT group, or a research facilitator group, or another group. 
This does assume though, that these groups (a) exist and (b) are willing to take this on, which may not be the case. I’m fortunate that at UCSD we have a vibrant campus-wide collaboration for all things data science and we can each tackle a component that fits most naturally. For instance, central IT runs an online virtual machine, in essence, that students can use to run GPU-intensive calculations. And honestly, I’m relieved that that’s not something I need to worry about – it exists and it’s not my department. But it does exist!
  12. Ok, so moving on to data science within the library. This is less about supporting patrons and more about using data science to enhance our own work.
  13. There’s been a lot of talk about data science methodologies in libraries, but – and correct me if I’m wrong – I haven’t yet seen more than case studies from most university libraries. That is, no one has yet gone 100% data science and revamped their entire workflows. But, please let me know if I’m wrong here! What I have seen other groups do, and what our library has done, is implement “data science” methodologies for certain projects. So for example we’ve created Python scripts to automate parts of metadata ingest into our internal system for certain projects. The cataloguers seem quite keen on using OpenRefine, which is an open source platform for automated data cleaning. We’ve talked about how we would like to figure out how to automate scraping DataCite and maybe leverage API capabilities more than we currently do. We’re definitely still in the early stages – we have some folks in-house who know Python and can do these types of things, and I’ve also had some of the students in the Data & GIS Lab work on some Python scripts for us as well. I’d love to get some data science majors hired in at some point, but that’s for the future and for specific projects.
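To make the "automate scraping DataCite, perhaps via APIs" idea in note 13 more concrete, here is a hedged Python sketch. It is not the UCSD library's script. It assumes the public DataCite REST API endpoint at `https://api.datacite.org/dois`, a `query` parameter with `page[size]` and `page[number]` paging, and a JSON:API-style response whose records carry `attributes` such as `doi`, `titles`, and `publicationYear`. These details should be checked against the current DataCite documentation. The sample response, including its DOI, is made up.

```python
from urllib.parse import urlencode

# Assumed endpoint; verify against the current DataCite API documentation.
DATACITE_DOIS = "https://api.datacite.org/dois"


def build_query_url(query, page_size=25, page_number=1):
    """Build a search URL (parameter names are assumptions, see lead-in)."""
    params = {
        "query": query,
        "page[size]": page_size,
        "page[number]": page_number,
    }
    return f"{DATACITE_DOIS}?{urlencode(params)}"


def summarize(response):
    """Pull DOI, first title, and year from a JSON:API-style response dict."""
    rows = []
    for item in response.get("data", []):
        attrs = item.get("attributes", {})
        titles = attrs.get("titles") or [{}]
        rows.append({
            "doi": attrs.get("doi", item.get("id")),
            "title": titles[0].get("title"),
            "year": attrs.get("publicationYear"),
        })
    return rows


# Made-up response in the assumed shape; not real DataCite data.
SAMPLE_RESPONSE = {
    "data": [
        {
            "id": "10.1234/example",
            "type": "dois",
            "attributes": {
                "doi": "10.1234/example",
                "titles": [{"title": "Example ocean temperature dataset"}],
                "publicationYear": 2019,
            },
        }
    ]
}

if __name__ == "__main__":
    print(build_query_url("ocean temperature"))
    print(summarize(SAMPLE_RESPONSE))
```

In a real script the URL would be fetched (for example with `urllib.request.urlopen`) and decoded with `json.load` before being passed to `summarize`. The sketch stays offline so it can be run and tested without network access.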
  14. The main take away, I think, from what we’ve done so far, is that using data science in libraries should be a case of using new methodologies because they solve a problem faster or better. Not because they’re flashy and everyone is talking about it and the higher ups are suddenly expecting us to “do data science”. It’s really been about automating manual processes, or talking about how we could better assess our own metrics, or better display information. It is targeted and it is on a case-by-case basis by people who feel comfortable with it and are eager to learn and implement these projects.
  15. And from other libraries and universities, I’ve seen some great “collections as data” projects, which entail making existing collections more accessible for data science topics. So using a sprinkle of data science and FAIR principles – findable, accessible, interoperable, and reusable – to make existing collections more likely to be used for these data science types of projects. Especially for natural language processing or text mining, this is a huge opportunity. I also think we should be thinking a lot about data as collections. I said it before and I’ll say it again – what do data science methodologies need, in every domain? They need data! We have a research data collection here at UCSD, as well as data we have purchased or licensed, and I want to see those collections, or guides, as high value and high use, even more so than they currently are. This is a chance for reuse of data to really speed up in certain fields and I want the library to be at the center of it.
  16. I want to close with a few takeaways. Adaptability and flexibility will be key, in terms of data science and libraries moving forward. There’s a lot of hype and focus on specific platforms, or languages, or packages, or what have you, but these things will absolutely change. What won’t change is best practices. How to manage data and code and information. How to structure data and projects. Where to find data, how to cite it, when to reuse or not reuse data. These are all areas that absolutely fall within the library’s purview and where the library can excel. This is modern data literacy and by focusing on those aspects, the library can not only remain relevant, but provide much-needed guidance. This also fits when talking about infrastructure: hardware, software, and people. We’ll have to adapt and we’ll have to grow. It can be scary. But it can also be a good opportunity. So planning forward when talking about what spaces we want, what staff we want, and what hardware/software capacity and capability we want. Finally, collaboration. I’m the point of contact for data and data science, but I work closely with our subject libraries, other groups in the library, and our library IT. I work with campus IT, our research facilitators, and the Halıcıoğlu Data Science Institute here. Data science is way too big – not just in terms of TB scale and hardware, but also that it’s now in basically every domain across campus – for any one person or group to handle. Remember the Venn diagram? It’s got a lot going on! Collaboration is key and I think a big part of the reason my position in particular has been so successful is that there was a network of people I could work with, which has been invaluable.
  17. Thank you very much for listening. I’d be happy to take questions now and you can of course email me with questions about this or anything else related to data science. My email is here: slabou@ucsd.edu. Thanks again!