A Case Study in User Needs for Text Analysis
Jessica Smith
Manager, Discovery and Research at JSTOR Labs
January 26, 2022
We are a not-for-profit with a mission to improve
access to knowledge and education for people
around the world. We believe education is key to
the well-being of individuals and society, and we
work to make it more effective and affordable.
Text analysis/mining at ITHAKA
JSTOR + Portico =
● ~40,000 journal runs
● ~300,000 books
Started in 2017, ramped up in 2019
Background Research
● Contextual inquiry-esque interviews with librarians, publishers, and
researchers
● Customer service conversations about JSTOR DfR (Data for
Research)
● Blogs from experts like Ted Underwood and Andrew Goldstone
Some Findings at This Stage
Users span a range of abilities and work in completely different ways
● Experts who mine text with relative ease
● Learners who can mine text, usually with 1-on-1 support
● Absolute beginners who don’t know where to start
. . . and a tension between the groups
The Ivory Tower vs. the Silver Platter
Workflow - Sourcing
● “Really frustrating to get text”
● “Data access is the most limiting thing”
● “I don’t have time for JSTOR yet”
● This step can take “more than a year”
● Often not free - might need grant funding
Workflow - Structuring
● I’ve got an “unknown problem” with an “unknown solution”
● “You’ve got to expect that manipulations will be required . . . always
bespoke”
● “It’s complicated”
● “There’s no single path”
● “We need interoperability and simplicity”
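To make the "manipulations will be required" point concrete, here is a minimal Python sketch of one structuring step: turning raw extracted text into word counts. It is an illustration only, not a method from the talk or from any JSTOR tool. The function name structure_text and the sample string are hypothetical, and real sources usually need further bespoke cleanup.

```python
import re
from collections import Counter


def structure_text(raw: str) -> Counter:
    """Turn raw extracted text into lowercase word counts (hypothetical helper)."""
    # Rejoin words hyphenated across line breaks, e.g. "analy-\nsis" -> "analysis".
    joined = re.sub(r"-\s*\n\s*", "", raw)
    # Lowercase the text and keep runs of letters (accented letters included) as tokens.
    tokens = re.findall(r"[^\W\d_]+", joined.lower())
    return Counter(tokens)


# Made-up sample text, standing in for one page of OCR or PDF output.
sample = "Text analy-\nsis is hard. Text mining is bespoke."
counts = structure_text(sample)
print(counts.most_common(3))
```

Even this small step embeds choices (dropping digits, ignoring case, joining every line-end hyphen) that a different corpus or research question might need to reverse.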
Workflow - Mining and Interpreting
In all the steps, it’s harder if you don’t speak this language, and TDM so far
is Anglocentric
To publish, they may need preservation of original data and steps to
recreate results
Scholars don’t feel comfortable publishing without understanding, and
being able to defend, the algorithms used
Meanwhile, beginners . . .
● “Time and anxiety keep me from learning it”
● “Faculty are too busy to do something new”
● “DH is a club and it isn’t inclusive”
● “The real loss isn’t the answers you don’t get, but the questions you
don’t even know to ask”
● “You can only ask research questions you can imagine the answer to”
There are nice clickable GUI tools like Voyant for analysis, but they meet
only a few needs
And teachers say . . .
● “For the disciplines I work in, people are very unfamiliar with numbers
and can’t even look at a spreadsheet”
● “Most people give up before they start”
● “It wasn’t a trainwreck. It was worse than that”
● The naive view is to think this is going to be “like Tableau”
When teaching someone, ask
1. Do they have the dataset in mind?
2. What’s their tech skill? (e.g., can they scrape?)
3. What format do they want the data in?
4. What - if any - are their methods, tools, and research strategy?
So, what did we learn?
We Learned
Sourcing and structuring are the hardest steps
Learning text analysis is hard, and Python and R are key
Can humanists learn? Do they want to?
Hard to teach a text mining class at scale
Beginners don’t know what’s possible
The term digital humanities has been popular precisely because it
promises that all those projects can still be contained in the
humanities. The implicit pitch is something like this: ‘You won’t
need a whole statistics course. Come to our two-hour workshop on
topic models instead. You can always find a statistician to
collaborate with.’
I understand why digital humanists said that kind of thing eight
years ago. We didn’t want to frighten people away. If you write
‘Learn Another Discipline’ on your welcome mat, you may not get
many visitors. But a deceptively gentle welcome mat, followed
by a trapdoor, is not really more welcoming.
Ted Underwood, January 2018
Then, what did we build?
Constellate
Jessica Smith
Manager, Discovery and Research
+1 734 780 2499
jessica.smith@ithaka.org
Thank you
301 E. Liberty
Suite 250
Ann Arbor, MI 48104
labs.jstor.org

More Related Content

Similar to Smith "A Case Study in User Needs for Text Analysis"

Peer-to-peer: involving a student researcher to uncover students’ real inform...
Peer-to-peer: involving a student researcher to uncover students’ real inform...Peer-to-peer: involving a student researcher to uncover students’ real inform...
Peer-to-peer: involving a student researcher to uncover students’ real inform...
IL Group (CILIP Information Literacy Group)
 
CSSE Coding with Scratch presentation June 2019
CSSE Coding with Scratch presentation June 2019CSSE Coding with Scratch presentation June 2019
CSSE Coding with Scratch presentation June 2019
Michael Nantais
 
build@mercari-week7-mark-talk
build@mercari-week7-mark-talkbuild@mercari-week7-mark-talk
build@mercari-week7-mark-talk
Mark Hahn
 
LaTICE 2016: Learner-Centered Design of Computing Education for All
LaTICE 2016: Learner-Centered Design of Computing Education for AllLaTICE 2016: Learner-Centered Design of Computing Education for All
LaTICE 2016: Learner-Centered Design of Computing Education for All
Mark Guzdial
 
Howard, H., Phillips, M., Wang, J. & Zwicky, D. Workplace Information Literac...
Howard, H., Phillips, M., Wang, J. & Zwicky, D. Workplace Information Literac...Howard, H., Phillips, M., Wang, J. & Zwicky, D. Workplace Information Literac...
Howard, H., Phillips, M., Wang, J. & Zwicky, D. Workplace Information Literac...
IL Group (CILIP Information Literacy Group)
 
Getting your first job in Data Science By Imaad Mohamed Khan Founder-in-Resid...
Getting your first job in Data Science By Imaad Mohamed Khan Founder-in-Resid...Getting your first job in Data Science By Imaad Mohamed Khan Founder-in-Resid...
Getting your first job in Data Science By Imaad Mohamed Khan Founder-in-Resid...
Analytics India Magazine
 
Dmdh workshop 5 slides
Dmdh   workshop 5 slidesDmdh   workshop 5 slides
Dmdh workshop 5 slides
Paige Morgan
 
Highlights from Just Enough Research by Erika Hall - User Experience Abu Dhab...
Highlights from Just Enough Research by Erika Hall - User Experience Abu Dhab...Highlights from Just Enough Research by Erika Hall - User Experience Abu Dhab...
Highlights from Just Enough Research by Erika Hall - User Experience Abu Dhab...
Jonathan Steingiesser
 
Digital Art History: From Practice to Publication
Digital Art History: From Practice to PublicationDigital Art History: From Practice to Publication
Digital Art History: From Practice to Publication
Susan Edwards
 
ASLD Presentation 13 October 2011
ASLD Presentation 13 October 2011ASLD Presentation 13 October 2011
ASLD Presentation 13 October 2011
tpgoddard
 
Nsdc zen and the art of ppt short presentation
Nsdc zen and the art of ppt short presentationNsdc zen and the art of ppt short presentation
Nsdc zen and the art of ppt short presentation
Angela Peery
 
Wu Jiajin UXID2014 Researching User’ Experience
Wu Jiajin UXID2014 Researching User’ ExperienceWu Jiajin UXID2014 Researching User’ Experience
Wu Jiajin UXID2014 Researching User’ Experience
UX Indonesia
 
Preserving the Craft of Thinking
Preserving the Craft of ThinkingPreserving the Craft of Thinking
Preserving the Craft of Thinking
Jodi Leo
 
AI as a career skill.MCU AI conference slides.may 11.2024.pptx
AI as a career skill.MCU AI conference slides.may 11.2024.pptxAI as a career skill.MCU AI conference slides.may 11.2024.pptx
AI as a career skill.MCU AI conference slides.may 11.2024.pptx
Nigel Daly
 
Data fluency for the 21st century
Data fluency for the 21st centuryData fluency for the 21st century
Data fluency for the 21st century
MartinFrigaard
 
Career in Data Science (July 2017, DTLA)
Career in Data Science (July 2017, DTLA)Career in Data Science (July 2017, DTLA)
Career in Data Science (July 2017, DTLA)
Thinkful
 
A picture is worth a thousand words_Mathilda Eloff
A picture is worth a thousand words_Mathilda EloffA picture is worth a thousand words_Mathilda Eloff
A picture is worth a thousand words_Mathilda Eloff
Mathilda Eloff
 
Tech
TechTech
2013 LIANZA Keynote: River's End
2013 LIANZA Keynote: River's End2013 LIANZA Keynote: River's End
2013 LIANZA Keynote: River's End
gnat
 
Technology Challenges for Computer Science Students
Technology Challenges for Computer Science StudentsTechnology Challenges for Computer Science Students
Technology Challenges for Computer Science Students
shenbagavallijanarth
 

Similar to Smith "A Case Study in User Needs for Text Analysis" (20)

Peer-to-peer: involving a student researcher to uncover students’ real inform...
Peer-to-peer: involving a student researcher to uncover students’ real inform...Peer-to-peer: involving a student researcher to uncover students’ real inform...
Peer-to-peer: involving a student researcher to uncover students’ real inform...
 
CSSE Coding with Scratch presentation June 2019
CSSE Coding with Scratch presentation June 2019CSSE Coding with Scratch presentation June 2019
CSSE Coding with Scratch presentation June 2019
 
build@mercari-week7-mark-talk
build@mercari-week7-mark-talkbuild@mercari-week7-mark-talk
build@mercari-week7-mark-talk
 
LaTICE 2016: Learner-Centered Design of Computing Education for All
LaTICE 2016: Learner-Centered Design of Computing Education for AllLaTICE 2016: Learner-Centered Design of Computing Education for All
LaTICE 2016: Learner-Centered Design of Computing Education for All
 
Howard, H., Phillips, M., Wang, J. & Zwicky, D. Workplace Information Literac...
Howard, H., Phillips, M., Wang, J. & Zwicky, D. Workplace Information Literac...Howard, H., Phillips, M., Wang, J. & Zwicky, D. Workplace Information Literac...
Howard, H., Phillips, M., Wang, J. & Zwicky, D. Workplace Information Literac...
 
Getting your first job in Data Science By Imaad Mohamed Khan Founder-in-Resid...
Getting your first job in Data Science By Imaad Mohamed Khan Founder-in-Resid...Getting your first job in Data Science By Imaad Mohamed Khan Founder-in-Resid...
Getting your first job in Data Science By Imaad Mohamed Khan Founder-in-Resid...
 
Dmdh workshop 5 slides
Dmdh   workshop 5 slidesDmdh   workshop 5 slides
Dmdh workshop 5 slides
 
Highlights from Just Enough Research by Erika Hall - User Experience Abu Dhab...
Highlights from Just Enough Research by Erika Hall - User Experience Abu Dhab...Highlights from Just Enough Research by Erika Hall - User Experience Abu Dhab...
Highlights from Just Enough Research by Erika Hall - User Experience Abu Dhab...
 
Digital Art History: From Practice to Publication
Digital Art History: From Practice to PublicationDigital Art History: From Practice to Publication
Digital Art History: From Practice to Publication
 
ASLD Presentation 13 October 2011
ASLD Presentation 13 October 2011ASLD Presentation 13 October 2011
ASLD Presentation 13 October 2011
 
Nsdc zen and the art of ppt short presentation
Nsdc zen and the art of ppt short presentationNsdc zen and the art of ppt short presentation
Nsdc zen and the art of ppt short presentation
 
Wu Jiajin UXID2014 Researching User’ Experience
Wu Jiajin UXID2014 Researching User’ ExperienceWu Jiajin UXID2014 Researching User’ Experience
Wu Jiajin UXID2014 Researching User’ Experience
 
Preserving the Craft of Thinking
Preserving the Craft of ThinkingPreserving the Craft of Thinking
Preserving the Craft of Thinking
 
AI as a career skill.MCU AI conference slides.may 11.2024.pptx
AI as a career skill.MCU AI conference slides.may 11.2024.pptxAI as a career skill.MCU AI conference slides.may 11.2024.pptx
AI as a career skill.MCU AI conference slides.may 11.2024.pptx
 
Data fluency for the 21st century
Data fluency for the 21st centuryData fluency for the 21st century
Data fluency for the 21st century
 
Career in Data Science (July 2017, DTLA)
Career in Data Science (July 2017, DTLA)Career in Data Science (July 2017, DTLA)
Career in Data Science (July 2017, DTLA)
 
A picture is worth a thousand words_Mathilda Eloff
A picture is worth a thousand words_Mathilda EloffA picture is worth a thousand words_Mathilda Eloff
A picture is worth a thousand words_Mathilda Eloff
 
Tech
TechTech
Tech
 
2013 LIANZA Keynote: River's End
2013 LIANZA Keynote: River's End2013 LIANZA Keynote: River's End
2013 LIANZA Keynote: River's End
 
Technology Challenges for Computer Science Students
Technology Challenges for Computer Science StudentsTechnology Challenges for Computer Science Students
Technology Challenges for Computer Science Students
 

More from National Information Standards Organization (NISO)

Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"
Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"
Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"
National Information Standards Organization (NISO)
 
Benner "Expanding Pathways to Publishing Careers"
Benner "Expanding Pathways to Publishing Careers"Benner "Expanding Pathways to Publishing Careers"
Benner "Expanding Pathways to Publishing Careers"
National Information Standards Organization (NISO)
 
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
National Information Standards Organization (NISO)
 
Mattingly "AI & Prompt Design: Limitations and Solutions with LLMs"
Mattingly "AI & Prompt Design: Limitations and Solutions with LLMs"Mattingly "AI & Prompt Design: Limitations and Solutions with LLMs"
Mattingly "AI & Prompt Design: Limitations and Solutions with LLMs"
National Information Standards Organization (NISO)
 
Mattingly "AI and Prompt Design: LLMs with Text Classification and Open Source"
Mattingly "AI and Prompt Design: LLMs with Text Classification and Open Source"Mattingly "AI and Prompt Design: LLMs with Text Classification and Open Source"
Mattingly "AI and Prompt Design: LLMs with Text Classification and Open Source"
National Information Standards Organization (NISO)
 
Mattingly "AI and Prompt Design: LLMs with NER"
Mattingly "AI and Prompt Design: LLMs with NER"Mattingly "AI and Prompt Design: LLMs with NER"
Mattingly "AI and Prompt Design: LLMs with NER"
National Information Standards Organization (NISO)
 
Mattingly "AI & Prompt Design: Named Entity Recognition"
Mattingly "AI & Prompt Design: Named Entity Recognition"Mattingly "AI & Prompt Design: Named Entity Recognition"
Mattingly "AI & Prompt Design: Named Entity Recognition"
National Information Standards Organization (NISO)
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
National Information Standards Organization (NISO)
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
National Information Standards Organization (NISO)
 
Bazargan "NISO Webinar, Sustainability in Publishing"
Bazargan "NISO Webinar, Sustainability in Publishing"Bazargan "NISO Webinar, Sustainability in Publishing"
Bazargan "NISO Webinar, Sustainability in Publishing"
National Information Standards Organization (NISO)
 
Rapple "Scholarly Communications and the Sustainable Development Goals"
Rapple "Scholarly Communications and the Sustainable Development Goals"Rapple "Scholarly Communications and the Sustainable Development Goals"
Rapple "Scholarly Communications and the Sustainable Development Goals"
National Information Standards Organization (NISO)
 
Compton "NISO Webinar, Sustainability in Publishing"
Compton "NISO Webinar, Sustainability in Publishing"Compton "NISO Webinar, Sustainability in Publishing"
Compton "NISO Webinar, Sustainability in Publishing"
National Information Standards Organization (NISO)
 
Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"
National Information Standards Organization (NISO)
 
Hazen, Morse, and Varnum "Spring 2024 ODI Conformance Statement Workshop for ...
Hazen, Morse, and Varnum "Spring 2024 ODI Conformance Statement Workshop for ...Hazen, Morse, and Varnum "Spring 2024 ODI Conformance Statement Workshop for ...
Hazen, Morse, and Varnum "Spring 2024 ODI Conformance Statement Workshop for ...
National Information Standards Organization (NISO)
 
Mattingly "AI & Prompt Design" - Introduction to Machine Learning"
Mattingly "AI & Prompt Design" - Introduction to Machine Learning"Mattingly "AI & Prompt Design" - Introduction to Machine Learning"
Mattingly "AI & Prompt Design" - Introduction to Machine Learning"
National Information Standards Organization (NISO)
 
Mattingly "Text and Data Mining: Building Data Driven Applications"
Mattingly "Text and Data Mining: Building Data Driven Applications"Mattingly "Text and Data Mining: Building Data Driven Applications"
Mattingly "Text and Data Mining: Building Data Driven Applications"
National Information Standards Organization (NISO)
 
Mattingly "Text and Data Mining: Searching Vectors"
Mattingly "Text and Data Mining: Searching Vectors"Mattingly "Text and Data Mining: Searching Vectors"
Mattingly "Text and Data Mining: Searching Vectors"
National Information Standards Organization (NISO)
 
Mattingly "Text Mining Techniques"
Mattingly "Text Mining Techniques"Mattingly "Text Mining Techniques"
Mattingly "Text Mining Techniques"
National Information Standards Organization (NISO)
 
Mattingly "Text Processing for Library Data: Representing Text as Data"
Mattingly "Text Processing for Library Data: Representing Text as Data"Mattingly "Text Processing for Library Data: Representing Text as Data"
Mattingly "Text Processing for Library Data: Representing Text as Data"
National Information Standards Organization (NISO)
 
Carpenter "Designing NISO's New Strategic Plan: 2023-2026"
Carpenter "Designing NISO's New Strategic Plan: 2023-2026"Carpenter "Designing NISO's New Strategic Plan: 2023-2026"
Carpenter "Designing NISO's New Strategic Plan: 2023-2026"
National Information Standards Organization (NISO)
 

More from National Information Standards Organization (NISO) (20)

Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"
Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"
Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"
 
Benner "Expanding Pathways to Publishing Careers"
Benner "Expanding Pathways to Publishing Careers"Benner "Expanding Pathways to Publishing Careers"
Benner "Expanding Pathways to Publishing Careers"
 
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
 
Mattingly "AI & Prompt Design: Limitations and Solutions with LLMs"
Mattingly "AI & Prompt Design: Limitations and Solutions with LLMs"Mattingly "AI & Prompt Design: Limitations and Solutions with LLMs"
Mattingly "AI & Prompt Design: Limitations and Solutions with LLMs"
 
Mattingly "AI and Prompt Design: LLMs with Text Classification and Open Source"
Mattingly "AI and Prompt Design: LLMs with Text Classification and Open Source"Mattingly "AI and Prompt Design: LLMs with Text Classification and Open Source"
Mattingly "AI and Prompt Design: LLMs with Text Classification and Open Source"
 
Mattingly "AI and Prompt Design: LLMs with NER"
Mattingly "AI and Prompt Design: LLMs with NER"Mattingly "AI and Prompt Design: LLMs with NER"
Mattingly "AI and Prompt Design: LLMs with NER"
 
Mattingly "AI & Prompt Design: Named Entity Recognition"
Mattingly "AI & Prompt Design: Named Entity Recognition"Mattingly "AI & Prompt Design: Named Entity Recognition"
Mattingly "AI & Prompt Design: Named Entity Recognition"
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Bazargan "NISO Webinar, Sustainability in Publishing"
Bazargan "NISO Webinar, Sustainability in Publishing"Bazargan "NISO Webinar, Sustainability in Publishing"
Bazargan "NISO Webinar, Sustainability in Publishing"
 
Rapple "Scholarly Communications and the Sustainable Development Goals"
Rapple "Scholarly Communications and the Sustainable Development Goals"Rapple "Scholarly Communications and the Sustainable Development Goals"
Rapple "Scholarly Communications and the Sustainable Development Goals"
 
Compton "NISO Webinar, Sustainability in Publishing"
Compton "NISO Webinar, Sustainability in Publishing"Compton "NISO Webinar, Sustainability in Publishing"
Compton "NISO Webinar, Sustainability in Publishing"
 
Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"
 
Hazen, Morse, and Varnum "Spring 2024 ODI Conformance Statement Workshop for ...
Hazen, Morse, and Varnum "Spring 2024 ODI Conformance Statement Workshop for ...Hazen, Morse, and Varnum "Spring 2024 ODI Conformance Statement Workshop for ...
Hazen, Morse, and Varnum "Spring 2024 ODI Conformance Statement Workshop for ...
 
Mattingly "AI & Prompt Design" - Introduction to Machine Learning"
Mattingly "AI & Prompt Design" - Introduction to Machine Learning"Mattingly "AI & Prompt Design" - Introduction to Machine Learning"
Mattingly "AI & Prompt Design" - Introduction to Machine Learning"
 
Mattingly "Text and Data Mining: Building Data Driven Applications"
Mattingly "Text and Data Mining: Building Data Driven Applications"Mattingly "Text and Data Mining: Building Data Driven Applications"
Mattingly "Text and Data Mining: Building Data Driven Applications"
 
Mattingly "Text and Data Mining: Searching Vectors"
Mattingly "Text and Data Mining: Searching Vectors"Mattingly "Text and Data Mining: Searching Vectors"
Mattingly "Text and Data Mining: Searching Vectors"
 
Mattingly "Text Mining Techniques"
Mattingly "Text Mining Techniques"Mattingly "Text Mining Techniques"
Mattingly "Text Mining Techniques"
 
Mattingly "Text Processing for Library Data: Representing Text as Data"
Mattingly "Text Processing for Library Data: Representing Text as Data"Mattingly "Text Processing for Library Data: Representing Text as Data"
Mattingly "Text Processing for Library Data: Representing Text as Data"
 
Carpenter "Designing NISO's New Strategic Plan: 2023-2026"
Carpenter "Designing NISO's New Strategic Plan: 2023-2026"Carpenter "Designing NISO's New Strategic Plan: 2023-2026"
Carpenter "Designing NISO's New Strategic Plan: 2023-2026"
 

Recently uploaded

ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
PECB
 
The basics of sentences session 6pptx.pptx
The basics of sentences session 6pptx.pptxThe basics of sentences session 6pptx.pptx
The basics of sentences session 6pptx.pptx
heathfieldcps1
 
How to deliver Powerpoint Presentations.pptx
How to deliver Powerpoint  Presentations.pptxHow to deliver Powerpoint  Presentations.pptx
How to deliver Powerpoint Presentations.pptx
HajraNaeem15
 
ZK on Polkadot zero knowledge proofs - sub0.pptx
ZK on Polkadot zero knowledge proofs - sub0.pptxZK on Polkadot zero knowledge proofs - sub0.pptx
ZK on Polkadot zero knowledge proofs - sub0.pptx
dot55audits
 
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
Nguyen Thanh Tu Collection
 
Advanced Java[Extra Concepts, Not Difficult].docx
Advanced Java[Extra Concepts, Not Difficult].docxAdvanced Java[Extra Concepts, Not Difficult].docx
Advanced Java[Extra Concepts, Not Difficult].docx
adhitya5119
 
math operations ued in python and all used
math operations ued in python and all usedmath operations ued in python and all used
math operations ued in python and all used
ssuser13ffe4
 
Chapter wise All Notes of First year Basic Civil Engineering.pptx
Chapter wise All Notes of First year Basic Civil Engineering.pptxChapter wise All Notes of First year Basic Civil Engineering.pptx
Chapter wise All Notes of First year Basic Civil Engineering.pptx
Denish Jangid
 
BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...
BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...
BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...
Nguyen Thanh Tu Collection
 
Temple of Asclepius in Thrace. Excavation results
Temple of Asclepius in Thrace. Excavation resultsTemple of Asclepius in Thrace. Excavation results
Temple of Asclepius in Thrace. Excavation results
Krassimira Luka
 
Pengantar Penggunaan Flutter - Dart programming language1.pptx
Pengantar Penggunaan Flutter - Dart programming language1.pptxPengantar Penggunaan Flutter - Dart programming language1.pptx
Pengantar Penggunaan Flutter - Dart programming language1.pptx
Fajar Baskoro
 
What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...
What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...
What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...
GeorgeMilliken2
 
Beyond Degrees - Empowering the Workforce in the Context of Skills-First.pptx
Beyond Degrees - Empowering the Workforce in the Context of Skills-First.pptxBeyond Degrees - Empowering the Workforce in the Context of Skills-First.pptx
Beyond Degrees - Empowering the Workforce in the Context of Skills-First.pptx
EduSkills OECD
 
Your Skill Boost Masterclass: Strategies for Effective Upskilling
Your Skill Boost Masterclass: Strategies for Effective UpskillingYour Skill Boost Masterclass: Strategies for Effective Upskilling
Your Skill Boost Masterclass: Strategies for Effective Upskilling
Excellence Foundation for South Sudan
 
MARY JANE WILSON, A “BOA MÃE” .
MARY JANE WILSON, A “BOA MÃE”           .MARY JANE WILSON, A “BOA MÃE”           .
MARY JANE WILSON, A “BOA MÃE” .
Colégio Santa Teresinha
 
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdfANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
Priyankaranawat4
 
The History of Stoke Newington Street Names
The History of Stoke Newington Street NamesThe History of Stoke Newington Street Names
The History of Stoke Newington Street Names
History of Stoke Newington
 
Gender and Mental Health - Counselling and Family Therapy Applications and In...
Gender and Mental Health - Counselling and Family Therapy Applications and In...Gender and Mental Health - Counselling and Family Therapy Applications and In...
Gender and Mental Health - Counselling and Family Therapy Applications and In...
PsychoTech Services
 
Constructing Your Course Container for Effective Communication
Constructing Your Course Container for Effective CommunicationConstructing Your Course Container for Effective Communication
Constructing Your Course Container for Effective Communication
Chevonnese Chevers Whyte, MBA, B.Sc.
 
NEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptx
NEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptxNEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptx
NEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptx
iammrhaywood
 

Recently uploaded (20)

ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
 
The basics of sentences session 6pptx.pptx
The basics of sentences session 6pptx.pptxThe basics of sentences session 6pptx.pptx
The basics of sentences session 6pptx.pptx
 
How to deliver Powerpoint Presentations.pptx
How to deliver Powerpoint  Presentations.pptxHow to deliver Powerpoint  Presentations.pptx
How to deliver Powerpoint Presentations.pptx
 
ZK on Polkadot zero knowledge proofs - sub0.pptx
ZK on Polkadot zero knowledge proofs - sub0.pptxZK on Polkadot zero knowledge proofs - sub0.pptx
ZK on Polkadot zero knowledge proofs - sub0.pptx
 
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...

Smith "A Case Study in User Needs for Text Analysis"

  • 1. A Case Study in User Needs for Text Analysis. Jessica Smith, Manager, Discovery and Research at JSTOR Labs. January 26, 2022
  • 2. We are a not-for-profit with a mission to improve access to knowledge and education for people around the world. We believe education is key to the well-being of individuals and society, and we work to make it more effective and affordable.
  • 3. Text analysis/mining at ITHAKA
      JSTOR + Portico =
      ● ~40,000 journal runs
      ● ~300,000 books
      Started in 2017, ramped up in 2019
  • 4. Background Research
      ● Contextual inquiry-esque interviews with librarians, publishers, and researchers
      ● Customer service conversations about JSTOR DfR (Data for Research)
      ● Blogs from experts like Ted Underwood and Andrew Goldstone
  • 6. Some Findings at This Stage
      A range of abilities who work completely differently
      ● Experts who mine text with relative ease
      ● Learners who can mine text with, usually, 1-on-1 support
      ● Absolute beginners who don’t know where to start
      . . . and a tension between the groups: The Ivory Tower vs. the Silver Platter
  • 7. Workflow - Sourcing
      ● “Really frustrating to get text”
      ● “Data access is the most limiting thing”
      ● “I don’t have time for JSTOR yet”
      ● This step can take “more than a year”
      ● Often not free - might need grant funding
  • 8. Workflow - Structuring
      ● I’ve got an “unknown problem” with an “unknown solution”
      ● “You’ve got to expect that manipulations will be required . . . always bespoke”
      ● “It’s complicated”
      ● “There’s no single path”
      ● “We need interoperability and simplicity”
  • 9. Workflow - Mining and Interpreting
      ● In all the steps, it’s harder if you don’t speak this language, and TDM so far is Anglocentric
      ● To publish, they may need preservation of original data and steps to recreate results
      ● Scholars don’t feel comfortable publishing without understanding, and being able to defend, the algorithms used
  • 10. Meanwhile, beginners . . .
      ● “Time and anxiety keep me from learning it”
      ● “Faculty are too busy to do something new”
      ● “DH is a club and it isn’t inclusive”
      ● “The real loss isn’t the answers you don’t get, but the questions you don’t even know to ask”
      ● “You can only ask research questions you can imagine the answer to”
      There are nice clickable GUI tools like Voyant for analysis, but they meet only a few needs
  • 12. And teachers say . . .
      ● “For the disciplines I work in, people are very unfamiliar with numbers and can’t even look at a spreadsheet”
      ● “Most people give up before they start”
      ● “It wasn’t a trainwreck. It was worse than that”
      ● The naive view is to think this is going to be “like Tableau”
      When teaching someone, ask:
      1. Do they have the dataset in mind?
      2. What’s their tech skill? (e.g. can they scrape)
      3. What format do they want the data in?
      4. What - if any - are their methods, tools, and research strategy?
  • 13. So, what did we learn?
  • 14. We Learned
      ● Sourcing and structuring are the hardest steps
      ● Learning text analysis is hard, and Python and R are key
      ● Can humanists learn? Do they want to?
      ● Hard to teach a text mining class at scale
      ● Beginners don’t know what’s possible
  • 15. “The term digital humanities has been popular precisely because it promises that all those projects can still be contained in the humanities. The implicit pitch is something like this: ‘You won’t need a whole statistics course. Come to our two-hour workshop on topic models instead. You can always find a statistician to collaborate with.’ I understand why digital humanists said that kind of thing eight years ago. We didn’t want to frighten people away. If you write ‘Learn Another Discipline’ on your welcome mat, you may not get many visitors. But a deceptively gentle welcome mat, followed by a trapdoor, is not really more welcoming.” (Ted Underwood, January 2018)
  • 16. Then, what did we build?
  • 20. Thank you
      Jessica Smith, Manager, Discovery and Research
      +1 734 780 2499, jessica.smith@ithaka.org
      301 E. Liberty, Suite 250, Ann Arbor, MI 48104
      labs.jstor.org