Data for the Humanities
February 21, 2017
Rafia Mirza
Digital Humanities Librarian
rafia@uta.edu @librarianrafia
Peace Ossom Williamson
Director of Research Data Services
peace@uta.edu @123POW
Learning Outcomes
• Understand the use of data in answering humanities research
questions
• Understand descriptive metadata and the rationale for its use
• Recognize areas of potential bias and ambiguous or misleading
representation in reporting
What are data?
“All content in digital formats can
be characterized as structured or
unstructured data.”
Introduction to Digital Humanities: Concepts, Methods, and Tutorials
Examples:
•Audio
•Notes
•Geospatial
•Textual
Data are more than numbers
https://www.lib.umn.edu/datamanagement/whatdata
What is data literacy?
the ability to read,
create, utilize,
communicate, and
criticize data.
Data Literacy
data quality
accessibility, usability,
and
understandability on
the basis of context,
provenance, and
metadata
Data Literacy
data structure
of different objects in
a way that works to
evaluate developing
hypotheses
Data Literacy
recognize
Research potential
be aware of
Research methods
understand
Context and provenience
Humanities Data Literacy
“Humanists have data, and
they need data skills.”
Digital Humanities Data Curation
Data in the Humanities
Types of Humanities Data
• Scholarly editions
• Text corpora
• Text with markup
• Thematic research collections
• Data with accompanying analysis or annotation
• Finding aids and other information maps, such as
bibliographies
Digital Humanities Data Curation Introduction
Big Data Digital Humanities vs.
Small Data Digital Humanities
• “Research in Big Data Digital Humanities focuses on large or dense
cultural datasets, which call for new processing and interpretation
methods”
• “..Small Data Digital Humanities regroup more focused works that do
not use massive data processing..”
• A map for big data research in digital humanities, Frédéric Kaplan
1. research the context:
know the data about the data (so meta!)
How to understand data
Data versus Metadata
Big? Smart? Clean? Messy? Data in the Humanities, Christof Schöch
Metadata Metadata Metadata Metadata
data data data data
data data data data
data data data data
data data data data
About this dataset:
Title: Metadata
Date Created: Metadata
Creator: Metadata
Methods Used: Metadata
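One way to picture the split between data and metadata is a small record type that stores the "About this dataset" fields next to the rows they describe. This is a minimal sketch: the class name, field names, and example values are all hypothetical and are not taken from any metadata standard.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """A dataset that carries its own descriptive metadata."""
    title: str          # metadata
    date_created: str   # metadata
    creator: str        # metadata
    methods_used: str   # metadata
    rows: list = field(default_factory=list)  # the data itself

    def describe(self) -> str:
        """Return the 'About this dataset' summary, one field per line."""
        return "\n".join([
            "About this dataset:",
            f"Title: {self.title}",
            f"Date Created: {self.date_created}",
            f"Creator: {self.creator}",
            f"Methods Used: {self.methods_used}",
        ])


# Hypothetical example values, for illustration only.
letters = Dataset(
    title="Sample letters corpus",
    date_created="2017-02-21",
    creator="Example project team",
    methods_used="Manual transcription",
    rows=[{"id": 1, "text": "Dear friend"}, {"id": 2, "text": "My dear sister"}],
)

print(letters.describe())
```

Keeping the description and the rows in one object means the context travels with the data when it is shared.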
2. research who the data is about
How to understand data
What are historical
contexts around their
language and style?
A note on data ethics.
Zine Librarians Code of Ethics
• “Zines are not like mass-distributed books. They are often self-
published and self-distributed, and sometimes printed in very
small runs, intended for a small audience. In addition, perzines
are by definition “personal”, and zinesters may feel different
about having their zines distributed in print than they would
about having them openly available on the internet or print.
This can be especially true in the case of “historical” zines in
library collections — for example, a teen girl writing a zine for
her close friends in 1994 may not want her zine distributed
online or in print 20 years later.”
• Via Zinelibraries
Ethics
• Choosing tools:
• Omeka CMS vs Mukurtu CMS
• Collecting data:
• Boston College Oral Histories
3. investigate the source
How to understand data
Recognizing uncertainty and bias
Data on killings in the Syrian conflict.
https://responsibledata.io/reflection-stories/uncertainty-statistics/
Let’s investigate the source…
Recognizing uncertainty and bias
Sources include
• Syrian government
• Syrian Center for Statistics and Research
• Syrian Network for Human Rights
• Syrian Observatory for Human Rights
and many more.
https://responsibledata.io/reflection-stories/uncertainty-statistics/
there are lots of human decisions that go into
creating these statistics
without knowing how these deaths have
been coded, it’s difficult to trust in the
figures
4. highlight un/common data entries to gain
rough insights
How to understand data
Descriptive analysis
i.e., description of the
data from a sample
Quick descriptive statistics
•frequency
•rank from lowest to highest
•average (mean, median, mode)
•variability
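The four quick statistics listed above can be computed with Python's standard library. This is a rough sketch; the sentence lengths are made-up values chosen only to illustrate the calls.

```python
from collections import Counter
from statistics import mean, median, mode, stdev

# Hypothetical data: number of words in each sentence of a short passage.
sentence_lengths = [12, 7, 12, 20, 9, 12, 15]

# Frequency: how often each value occurs.
frequency = Counter(sentence_lengths)

# Rank from lowest to highest.
ranked = sorted(sentence_lengths)

# Average: three different ways to describe the "typical" value.
average = {
    "mean": mean(sentence_lengths),
    "median": median(sentence_lengths),
    "mode": mode(sentence_lengths),
}

# Variability: the range and the sample standard deviation.
value_range = max(sentence_lengths) - min(sentence_lengths)
spread = stdev(sentence_lengths)

print("frequency:", dict(frequency))
print("ranked:", ranked)
print("average:", average)
print("range:", value_range, "standard deviation:", round(spread, 2))
```

When the mean and the median differ noticeably, a few unusually large or small entries are usually pulling the mean, which is one way to spot the un/common entries worth a closer look.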
Bivariate descriptive statistics
fancy way of saying
we are looking at two
variables at once
            Hamlet   Macbeth   Othello
Similes         50         9        59
Metaphors       20        38        58
Total           70        47       117
Evaluating Comparison Methods
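The table of similes and metaphors can be checked and compared with a few lines of Python. This sketch assumes the "Total" row holds the per-play column totals and treats the counts as illustrative figures from the slide, not as verified counts from the plays.

```python
# Counts copied from the table: figure of speech -> play -> count.
counts = {
    "Similes": {"Hamlet": 50, "Macbeth": 9, "Othello": 59},
    "Metaphors": {"Hamlet": 20, "Macbeth": 38, "Othello": 58},
}
plays = ["Hamlet", "Macbeth", "Othello"]

# Column totals: all figures of speech counted for each play.
totals = {play: sum(row[play] for row in counts.values()) for play in plays}

# Share of similes within each play, so plays of different size can be compared.
simile_share = {play: counts["Similes"][play] / totals[play] for play in plays}

for play in plays:
    print(f"{play}: total={totals[play]}, similes={simile_share[play]:.0%}")
```

Comparing shares and not raw counts is one possible comparison method: it separates "this play uses more figures of speech overall" from "this play prefers similes to metaphors".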
Correlation
most common way to
describe a relationship
between two measures
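As a rough illustration, the Pearson correlation coefficient (one common measure; the slide does not name a specific one) can be written out by hand. The function name and the sample values below are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # How the two measures vary together, and how each varies on its own.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    if variation_x == 0 or variation_y == 0:
        raise ValueError("correlation is undefined when a measure never varies")
    return co_variation / sqrt(variation_x * variation_y)


# Hypothetical measures: pages per chapter and illustrations per chapter.
pages = [1, 2, 3, 4]
illustrations = [2, 4, 6, 8]
r = pearson_r(pages, illustrations)
print(round(r, 3))
```

A value near 1 or -1 indicates a strong linear relationship and a value near 0 a weak one; either way, correlation describes the relationship and does not by itself show that one measure causes the other.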
Finding Data
What type of data are you looking for?
List of Data Repositories
DH Toychest: Data Collections and Datasets
• Texts: HathiTrust Digital Library
• Spatial or numeric datasets: Data.gov
• Images: British Library Images
• Hybrid data sets: Digital Public Library of America
Via
What if the dataset you need
does not exist?
How to data
1. Determine what to say
2. Find/collect/create the data
you need
3. Wrangle!
4. Clean!
5. Do it many more times.
ID     Religion     Income    Age  Q1   Q2  Q3
26371  Jewish       <$10K     19   Yes  6   20
26372  Atheist      $50-75K   24   -    4   21
26373  Catholic     $75-100K  56   Yes  3   21
26374  Withheld     $75-100K  33   No   6   21
26375  Pentecostal  withheld  49   Yes  8   20
26376  Jewish       $40-50K   29   Yes  5   19
26377  Catholic     $20-30K   37   No   4   22
http://vita.had.co.nz/papers/tidy-data.pdf
Tidy Data
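The sample table above records missing answers in three different ways ("Withheld", "withheld", and "-"). A minimal sketch of the "Clean!" step, assuming all three markers mean "no answer given", maps them to a single missing value before any counting is done. The helper name is hypothetical.

```python
# The survey table from the slide, one respondent per line.
RAW = """\
ID Religion Income Age Q1 Q2 Q3
26371 Jewish <$10K 19 Yes 6 20
26372 Atheist $50-75K 24 - 4 21
26373 Catholic $75-100K 56 Yes 3 21
26374 Withheld $75-100K 33 No 6 21
26375 Pentecostal withheld 49 Yes 8 20
26376 Jewish $40-50K 29 Yes 5 19
26377 Catholic $20-30K 37 No 4 22
"""

# Assumption: these markers all stand for a missing answer.
MISSING_MARKERS = {"withheld", "-"}


def clean_cell(value):
    """Return None for any missing-answer marker, otherwise the value itself."""
    return None if value.lower() in MISSING_MARKERS else value


header, *lines = RAW.strip().splitlines()
columns = header.split()
records = [
    dict(zip(columns, (clean_cell(cell) for cell in line.split())))
    for line in lines
]

# Count the missing answers in each column.
missing_per_column = {
    column: sum(record[column] is None for record in records)
    for column in columns
}
print(missing_per_column)
```

Without this step, a frequency count of Religion would treat "Withheld" as if it were a religion, and "withheld" in Income as if it were an income bracket.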
Most common problems
• Column headers are values, not variable names.
• Multiple variables are stored in one column.
• Variables are stored in both rows and columns.
• Multiple types of observational units are stored in the same table.
• A single observational unit is stored in multiple tables
http://vita.had.co.nz/papers/tidy-data.pdf
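The first problem in the list, column headers that are values, can be fixed by reshaping the table so each row holds one observation. The sketch below uses only plain Python; the `melt` helper is a hypothetical stand-in for what data libraries provide, and the counts are invented for illustration.

```python
# Hypothetical "wide" table: the income brackets are column headers,
# but they are really values of a variable called "income".
wide = [
    {"religion": "Catholic", "<$10K": 3, "$10-20K": 5},
    {"religion": "Jewish", "<$10K": 1, "$10-20K": 2},
]


def melt(rows, id_column, var_name, value_name):
    """Turn every non-id column into its own row: one observation per row."""
    tidy_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue
            tidy_rows.append({
                id_column: row[id_column],
                var_name: column,
                value_name: value,
            })
    return tidy_rows


tidy = melt(wide, id_column="religion", var_name="income", value_name="count")
for row in tidy:
    print(row)
```

After reshaping, every column is a variable (religion, income, count) and every row is a single observation, which makes filtering, grouping, and plotting more uniform.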
if you torture data long enough,
it will confess to anything
How can a
visualization be
misleading?
What’s wrong?
A little less
dramatic
than you thought.
http://www.visualisingdata.com/2014/04/the-fine-line-between-confusion-and-deception/
https://thesyriacampaign.org/
Open Data: Things to Consider
http://www.slideshare.net/libereurope/humanities-data-literacy-student-perspective-on-digital-cultural-heritage-collections?qid=70bd86f2-10c5-43a6-b053-56d264ca28ab&v=&b=&from_search=1
Recommended Reading / Viewing
“Numbers are Only Human” – Brian Root
“Ethical Principles of Psychologists and Code of Conduct” –
American Psychological Association
“On Not Looking: Ethics and Access in the Digital Humanities” –
Kimberly Christen Withey
Upcoming Workshops and Events
library.uta.edu/scholcomm
Rafia Mirza
rafia@uta.edu @librarianrafia
Peace Ossom Williamson
peace@uta.edu @123POW


Editor's Notes

  1. RAFIA
  2. Data are anything which is used or created to generate new knowledge and interpretations. “Anything” may be objective or subjective; physical or emotional; persistent or ephemeral; personal or public; explicit or tacit; and is consciously or unconsciously referenced by the researcher at some point during the course of their research. Research data may or may not lead to a research output, which regardless of method of presentation, is a planned public statement of new knowledge or interpretation. (Garrett, 2012)
  3. Involves knowledge of quantitative (statistical) methods, metadata standards, and the data curation lifecycle. But also the understanding of
  4. Identifying problems that a dataset can answer
  5. Recognizing research potential of an existing heritage collection, or identifying ways to answer questions or problems. Develop a hypothesis based on the data. Becoming aware of the data features that enable new quantitative methods (including access, format, systematicity, and metadata) Understanding the context and provenience of a collection (including extension, representativeness, openness, and copyright and privacy issues)
  6. “As the materials and analytical practices of research become increasingly digital, the theoretical knowledge and practical skills of information science, librarianship, and archival science will become ever more vital to humanists and to anyone working with cultural heritage.”
  7. Heritage collections
  8. “Another important distinction is between data and metadata. Here, the term ‘data’ refers to the part of a file or dataset which contains the actual representation of an object of inquiry, while the term ‘metadata’ refers to data about that data: metadata explicitly describes selected aspects of a dataset, such as the time of its creation, or the way it was collected, or what entity external to the dataset it is supposed to represent.” Read papers and study accompanying documentation.
  9. Need to know your data. What is the background of the person, time period, or language you are studying? What elements are represented and how were they obtained? What elements are missing or misrepresented? What are other questions you can ask or ways you can find out answers?
  10. How does this reflect itself in writing? How does one show their background? How does one signify?
  11. The present moment is filled with DH practitioners creating visualizations of ‘big data,’ mapping connections between people and ancient cities, and building archives dedicated to long-dead authors. These worthwhile academic and practical pursuits point us to the center of the digital humanities landscape. But, if we move to the margins and begin to look at the projects and tools that emerge from indigenous communities, archivists and cultural specialists, we see a different pattern: images are purposely removed, archives are not ‘open to the public,’ maps of sacred sites are consciously not created, defined or linked to. How do we integrate these varied practices and philosophies into the possibilities offered by digital humanities scholars? It is one thing to call attention to difference, it is another to alter our display practices, question access parameters, and redefine our own ways of knowing based on systems of accountability that define an ethical field of visually based on not looking. If seeing is believing and a picture is worth a thousand words, what can we learn from the act of not looking, or perhaps, more specifically, not seeing? 
  12. PEACE Talk to subject experts, read papers, and study accompanying documentation
  13. More often than not, it is not the writer that is twisting the numbers but the numbers themselves twisting up the writer. Manipulation of the facts or of the reader is usually not intentional.
  14. More often than not, it is not the writer that is twisting the numbers but the numbers themselves twisting up the writer. Manipulation of the facts or of the reader is usually not intentional.
  15. Estimations from the Syrian Observatory for Human Rights. Imagine the decision that might have to be made to categorize a typical citizen with no military training, who has picked up a gun shortly before his death. Perhaps the coder might have a bias to continue calling this person a civilian. But this person took up arms against the government, did they not? How would you code a Syrian army defector now fighting with an opposition group? … Without some sort of standard protocol, rigorously followed, the coding of affiliation allows for a degree of subjectivity
  16. Talk to subject experts, read papers, and study accompanying documentation
  17. What is a numerical way to describe data?
  18. Frequency – how often a value occurs (e.g., a name or gender): 50 women, 38 men, 2 other, 10 unknown. Rank – ordering values from lowest to highest or highest to lowest (e.g., listing albums by their use of personal pronouns, high to low). Average – a measure of central tendency. Variability – how different or similar scores are to each other (range, standard deviation).
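The four summaries in note 18 can be sketched in a few lines of Python. This is only a minimal illustration using the standard library: the gender counts come from the note itself, while the album names and pronoun counts are hypothetical values invented for the example.

```python
from collections import Counter
import statistics

# Frequency: how often each value occurs (counts taken from note 18).
genders = ["women"] * 50 + ["men"] * 38 + ["other"] * 2 + ["unknown"] * 10
frequency = Counter(genders)

# Hypothetical data: personal-pronoun counts per album (invented for illustration).
pronouns_per_album = {"Album A": 120, "Album B": 85, "Album C": 240, "Album D": 85}

# Rank: order albums from highest to lowest pronoun use.
ranked = sorted(pronouns_per_album.items(), key=lambda item: item[1], reverse=True)

counts = list(pronouns_per_album.values())

# Average: two common measures of central tendency.
mean_count = statistics.mean(counts)
median_count = statistics.median(counts)

# Variability: how spread out the values are.
value_range = max(counts) - min(counts)
std_dev = statistics.stdev(counts)  # sample standard deviation

print(frequency.most_common())
print(ranked)
print(mean_count, median_count, value_range, round(std_dev, 1))
```

In this made-up example the mean is well above the median because one album has far more pronouns than the rest, which is a reminder that a single "average" can hide the shape of the data.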
  19. Contingency table, a type of summary table. This shows frequency distribution.
  20. Contingency table, a type of summary table. This shows frequency distribution.
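A contingency table like the one described in notes 19 and 20 can be built by counting how often each pair of categories occurs together. The sketch below assumes a small set of hypothetical (gender, role) records; the category names, the records, and the helper `contingency_table` are all invented for illustration.

```python
from collections import Counter

# Hypothetical records: (gender, role) pairs for people named in a collection.
records = [
    ("women", "author"), ("women", "author"), ("women", "editor"),
    ("men", "author"), ("men", "editor"), ("men", "editor"),
    ("unknown", "author"),
]

# Count each (row, column) combination.
cell_counts = Counter(records)
rows = sorted({gender for gender, _ in records})
cols = sorted({role for _, role in records})

def contingency_table(cell_counts, rows, cols):
    """Return a nested dict of counts, with each row's sum stored under 'total'."""
    table = {}
    for r in rows:
        table[r] = {c: cell_counts[(r, c)] for c in cols}
        table[r]["total"] = sum(table[r][c] for c in cols)
    return table

table = contingency_table(cell_counts, rows, cols)

# Print a plain-text version of the table.
print(f"{'':10}" + "".join(f"{c:>8}" for c in cols + ["total"]))
for r in rows:
    print(f"{r:10}" + "".join(f"{table[r][c]:>8}" for c in cols + ["total"]))
```

Each cell is a frequency, so the table shows at a glance how one variable is distributed across the categories of another.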
  21. HathiTrust: more than 2 million volumes are in the public domain and freely viewable on the Web. British Library Images: millions of images from the pages of 17th-, 18th- and 19th-century books, digitized. DPLA: "brings together the riches of America’s libraries, archives, and museums, and makes them freely available to the world"; API enabled.
  22. Data literacy is important so that we can tell compelling stories that others are more likely to repeat, remember, and act on, and so that we can determine when someone else is trying to mislead using visualizations.
  23. Clean: OpenRefine
  24. Identifier; independent information / fixed variables; dependent variables / measured variables
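One way to picture the three kinds of columns named in note 24 is a small table in which each row is one observation. The sketch below is an assumption about how such a layout might look, not the presenters' own dataset; every field name and value is hypothetical.

```python
import csv
import io

# Hypothetical dataset: one row per letter in an imagined correspondence collection.
raw = """letter_id,author,year,word_count,pronoun_count
L001,Author A,1851,412,37
L002,Author A,1853,655,52
L003,Author B,1851,298,19
"""

IDENTIFIER = "letter_id"                               # uniquely names each observation
FIXED_VARIABLES = ["author", "year"]                   # independent information, known in advance
MEASURED_VARIABLES = ["word_count", "pronoun_count"]   # dependent values, obtained by measuring

rows = list(csv.DictReader(io.StringIO(raw)))

# Convert measured variables from text to numbers so they can be analyzed.
for row in rows:
    for name in MEASURED_VARIABLES:
        row[name] = int(row[name])

# A basic structural check: every identifier should appear exactly once.
ids = [row[IDENTIFIER] for row in rows]
assert len(ids) == len(set(ids)), "identifiers must be unique"

for row in rows:
    print(row[IDENTIFIER],
          [row[v] for v in FIXED_VARIABLES],
          [row[v] for v in MEASURED_VARIABLES])
```

Keeping the identifier, the fixed variables, and the measured variables in separate, clearly named columns makes it easier to group, filter, and summarize the data later.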
  25. This is similar to when you got that mosquito bite and were sure you were getting Zika Virus or West Nile, only to realize your arm itched for 24 hours.
  26. Business Insider published the chart but sought permission to reverse it, keeping the same design but turning it upside down for their readers.
  27. It’s true though, that images such as the visualisation above draw attention to some important issues. Though they state their data source (the Syria Network for Human Rights) what we’ve explored here so far makes it clear that this data has flaws. We can’t know for sure the extent of those flaws, though, and some might argue that as long as the main message is transmitted, the details don’t matter so much.