OSUL and Digital Humanities
Dealing with Data Problems
◦ While the Library licenses the content via a content provider, access to the underlying data for aggregated research isn’t supported.
◦ In this case, access to content is limited both by our subscriptions and by the newspaper publishers themselves.
◦ For this project, many of the sources David and Patrick were interested in working with required licensing fees of ~$25,000-50,000 per newspaper.
Big “little” data
We worry a lot about big research data in the library and how this information will be preserved and made accessible into the future
◦ But equally concerning is big “little” data
Big “little” data has very specific problems:
1. Acquisition of the data can be really difficult
2. Storage tends to be inefficient and difficult
3. It’s incredibly hard to move around
4. For purposes of aggregation, it limits the types of tools that can be used for evaluation
5. When the data is closed, finding undocumented inconsistencies is hard
Sample Data Set
NewsPaper Processing tool
Data processing methodology
Created two data sets:
1. The first data set focused on any digital object (excluding classifieds) that included references to public housing
2. The second data set focused on any digital object (excluding classifieds) that included references to public housing and 4 agreed-upon synonyms for public housing
One benefit of the resources we used was that there was very little article duplication across them (i.e., very little reliance on the Associated Press), so little data filtering was needed to account for duplicate data across newspapers.
Data processing methodology
From these sets, I wrote a suite of tools in C# that measured:
1) Presence of positive terms
2) Presence of negative terms
3) Neutral terms
4) Frequency of negative and positive terms
5) Proximity to positive and negative terms, to provide weight
These tools used stemming so that they could capture different forms of a word.
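As a rough illustration of the measurements listed above, here is a minimal sketch. It is written in Python rather than the C# the original tools used, and it is not the author's implementation. The term lists, the `TARGET` word, the suffix-stripping `stem` function (a crude stand-in for a real stemmer), and the 1/distance proximity weighting are all assumptions made for the example.

```python
import re

# Hypothetical vocabularies; the slides do not list the actual terms used.
POSITIVE = {"safe", "improve", "clean"}
NEGATIVE = {"crime", "danger", "decay"}
NEUTRAL = {"resident", "build"}
TARGET = "housing"  # assumed stand-in for the search phrase


def stem(word):
    """Crude suffix stripper; a real stemmer (e.g. Porter) would replace this."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    if word.endswith("e") and len(word) > 3:
        word = word[:-1]
    return word


# Stem the vocabularies once so they match stemmed tokens.
POS_STEMS = {stem(w) for w in POSITIVE}
NEG_STEMS = {stem(w) for w in NEGATIVE}
NEU_STEMS = {stem(w) for w in NEUTRAL}
TARGET_STEM = stem(TARGET)


def score(text, window=10):
    """Count term classes and weight each hit by its distance to the target."""
    tokens = [stem(t) for t in re.findall(r"[a-z]+", text.lower())]
    targets = [i for i, t in enumerate(tokens) if t == TARGET_STEM]
    result = {"positive": 0, "negative": 0, "neutral": 0, "weighted": 0.0}
    for i, tok in enumerate(tokens):
        if tok in POS_STEMS:
            sign, key = 1, "positive"
        elif tok in NEG_STEMS:
            sign, key = -1, "negative"
        elif tok in NEU_STEMS:
            result["neutral"] += 1
            continue
        else:
            continue
        result[key] += 1
        if targets:
            # Assumed weighting: closer to the target word counts for more.
            dist = max(min(abs(i - p) for p in targets), 1)
            if dist <= window:
                result["weighted"] += sign / dist
    return result


sample = ("Residents said the housing project improved safety, "
          "but crimes rose near housing.")
print(score(sample))
```

A positive `weighted` value suggests positive terms sit closer to the target word than negative ones; any real analysis would need vetted term lists and a proper stemmer.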
One thing this work highlighted, however, was the limitations in the data due to data quality. These resources are OCR'd representations of a particular newspaper article, classified, etc., and OCR quality varies significantly across the titles. A secondary research project that I’ve begun uses these data sets to test the OCR quality of the set by using word frequency to map unique words across a digital object.
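One way to picture the word-frequency idea is sketched below in Python. The slides do not define the metric, so the `unique_word_profile` function and its "singleton share" (the fraction of tokens that occur only once in an object) are assumptions chosen for illustration. The intuition is that OCR errors tend to produce one-off misspellings, which inflates the number of unique words.

```python
import re
from collections import Counter


def unique_word_profile(text):
    """Summarize how many distinct and one-off words a digital object holds."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    total = len(words)
    singletons = sum(1 for c in counts.values() if c == 1)
    return {
        "total": total,
        "unique": len(counts),
        # Assumed signal: a higher share hints at noisier OCR.
        "singleton_share": singletons / total if total else 0.0,
    }


clean = "the housing plan the housing vote"
noisy = "tbe housing p1an the hovsing vote"  # simulated OCR errors
print(unique_word_profile(clean))
print(unique_word_profile(noisy))
```

Short texts naturally have a high share of one-off words, so a comparison like this is only meaningful between objects of similar length or against a baseline for the same title.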
Just Public Housing: Cleveland
[Chart: Cleveland Call Post; series: More Positive, More Negative; x-axis: 1930 to 2000]
Just Public Housing: Cleveland
[Chart: Article Content: Positive Over Negative; x-axis: 1930 to 2000]
Extended Terms: Cleveland
[Chart: Cleveland Call Post; series: More Positive, More Negative; x-axis: 1930 to 2000]
Extended Terms: Cleveland
[Chart: Article Content: Positive Over Negative; x-axis: 1930 to 2000]
Public Housing vs Extended Terms
[Two charts: Article Content: Positive Over Negative; x-axis: 1930 to 2000]
Public Housing vs Extended Terms: NY
[Two charts: Article Content: Positive Over Negative; x-axis: 1910 to 2000]
Data processing methodology
Potential additional areas of inquiry:
• Representation of public housing in:
  • Letters to the editor
  • Editorials
  • Featured articles

Reframing Public Housing: Visualization and Data Analytics in History
Terry Reese
Levi Shapiro
 
The basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptxThe basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptx
heathfieldcps1
 
PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.
Dr. Shivangi Singh Parihar
 

Recently uploaded (20)

Top five deadliest dog breeds in America
Top five deadliest dog breeds in AmericaTop five deadliest dog breeds in America
Top five deadliest dog breeds in America
 
Pride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School DistrictPride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School District
 
A Survey of Techniques for Maximizing LLM Performance.pptx
A Survey of Techniques for Maximizing LLM Performance.pptxA Survey of Techniques for Maximizing LLM Performance.pptx
A Survey of Techniques for Maximizing LLM Performance.pptx
 
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptxChapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
 
Thesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.pptThesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.ppt
 
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdfANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
 
Digital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental DesignDigital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental Design
 
Lapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdfLapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdf
 
Your Skill Boost Masterclass: Strategies for Effective Upskilling
Your Skill Boost Masterclass: Strategies for Effective UpskillingYour Skill Boost Masterclass: Strategies for Effective Upskilling
Your Skill Boost Masterclass: Strategies for Effective Upskilling
 
South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)
 
What is the purpose of studying mathematics.pptx
What is the purpose of studying mathematics.pptxWhat is the purpose of studying mathematics.pptx
What is the purpose of studying mathematics.pptx
 
clinical examination of hip joint (1).pdf
clinical examination of hip joint (1).pdfclinical examination of hip joint (1).pdf
clinical examination of hip joint (1).pdf
 
Aficamten in HCM (SEQUOIA HCM TRIAL 2024)
Aficamten in HCM (SEQUOIA HCM TRIAL 2024)Aficamten in HCM (SEQUOIA HCM TRIAL 2024)
Aficamten in HCM (SEQUOIA HCM TRIAL 2024)
 
Natural birth techniques - Mrs.Akanksha Trivedi Rama University
Natural birth techniques - Mrs.Akanksha Trivedi Rama UniversityNatural birth techniques - Mrs.Akanksha Trivedi Rama University
Natural birth techniques - Mrs.Akanksha Trivedi Rama University
 
How to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP ModuleHow to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP Module
 
Advanced Java[Extra Concepts, Not Difficult].docx
Advanced Java[Extra Concepts, Not Difficult].docxAdvanced Java[Extra Concepts, Not Difficult].docx
Advanced Java[Extra Concepts, Not Difficult].docx
 
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat  Leveraging AI for Diversity, Equity, and InclusionExecutive Directors Chat  Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
 
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
 
The basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptxThe basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptx
 
PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.
 

Reframing Public Housing: Visualization and Data Analytics in History

  • 1. OSUL and Digital Humanities
  • 2. Dealing with Data Problems
      ◦ While the Library licenses the content via a content provider, access to the underlying data for aggregated research is only partially supported.
      ◦ In this case, access to content is limited both by our subscriptions and by the newspaper publishers themselves.
      ◦ For this project, many of the sources David and Patrick were interested in working with required licensing fees of ~$25,000-50,000 per newspaper.
  • 3. Big “little” data
      We worry a lot about big research data in the library and how this information will be preserved and made accessible into the future.
      ◦ But equally concerning is big “little” data.
      Big “little” data has very specific problems:
      1. Acquisition of the data can be really difficult
      2. Storage tends to be inefficient and difficult
      3. It’s incredibly hard to move around
      4. For purposes of aggregation, it limits the types of tools that can be used for evaluation
      5. When the data is closed, finding undocumented inconsistencies is hard
  • 6. Data processing methodology
      Created two data sets:
      1. The first data set covered any digital object (excluding classifieds) that included references to public housing
      2. The second data set covered any digital object (excluding classifieds) that included public housing or any of 4 agreed-upon synonyms for public housing
      One benefit of the resources we used was that there was very little article duplication across them (i.e., very little reliance on the Associated Press), so little filtering was needed to account for duplicate data across newspapers.
  • 7. Data processing methodology
      From these sets, I wrote a suite of tools in C# that measured:
      1) Presence of positive terms
      2) Presence of negative terms
      3) Neutral terms
      4) Frequency of negative and positive terms
      5) Proximity to positive and negative terms, to provide weight
      These tools used stemming so that they could capture the different forms of a word.
      One thing this work highlighted, however, was the limitation that data quality places on the data. These resources are OCR'd representations of a particular newspaper article, classified, etc., and OCR quality varies significantly across the titles.
      A secondary research project that I’ve begun uses these data sets to test the OCR quality of the set, using word frequency to map unique words across a digital object.
  • 8. [Chart] Just Public Housing: Cleveland. Cleveland Call Post, More Positive vs. More Negative, 1930-2000 (y axis 0 to 45).
  • 9. [Chart] Just Public Housing: Cleveland. Article Content: Positive Over Negative, 1930-2000 (y axis -15 to 25).
  • 10. [Chart] Extended Terms: Cleveland. Cleveland Call Post, More Positive vs. More Negative, 1930-2000 (y axis 0 to 50).
  • 11. [Chart] Extended Terms: Cleveland. Article Content: Positive Over Negative, 1930-2000 (y axis -15 to 25).
  • 12. [Charts] Public Housing vs Extended Terms. Two panels, each Article Content: Positive Over Negative, 1930-2000 (y axis -15 to 25).
  • 13. [Charts] Public Housing vs Extended Terms: NY. Two panels, each Article Content: Positive Over Negative, 1910-2000 (y axes -10 to 35 and -15 to 30).
  • 14. Data processing methodology
      Potential additional areas of inquiry:
      • Representation of public housing in:
          • Letters to the editor
          • Editorials
          • Featured articles
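
The methodology on slides 6 and 7 (skip classifieds, keep objects that mention public housing or a synonym, count stemmed positive and negative terms, weight them by proximity) and the "Positive Over Negative" charts could be sketched roughly as follows. This is a minimal Python sketch, not the C# tools themselves. The word lists, the second search phrase, the crude `stem` function, the `1 / (1 + distance)` weighting, and the per-decade netting are all assumptions made for illustration; the slides do not publish the actual term lists, the four synonyms, the stemmer, or the weighting formula. Neutral terms are not modelled here.

```python
import re
from collections import Counter

# Illustrative lists only (assumptions): the real term lists and the four
# agreed-upon synonyms for "public housing" are not given in the slides.
SEARCH_PHRASES = ["public housing", "housing project"]
POSITIVE_WORDS = ["improve", "safe", "clean", "modern"]
NEGATIVE_WORDS = ["crime", "slum", "decay", "danger"]


def stem(word):
    """Crude suffix stripper standing in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "es", "s", "e"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stems_of(text):
    """Lowercase the text, split it into alphabetic tokens, stem each one."""
    return [stem(w) for w in re.findall(r"[a-z]+", text.lower())]


POSITIVE = {stem(w) for w in POSITIVE_WORDS}
NEGATIVE = {stem(w) for w in NEGATIVE_WORDS}
PHRASES = [stems_of(p) for p in SEARCH_PHRASES]


def phrase_positions(stems):
    """Indices of every token that is part of a matched search phrase."""
    hits = set()
    for phrase in PHRASES:
        n = len(phrase)
        for i in range(len(stems) - n + 1):
            if stems[i : i + n] == phrase:
                hits.update(range(i, i + n))
    return sorted(hits)


def score(text):
    """Count sentiment terms and weight each by closeness to a housing mention."""
    stems = stems_of(text)
    anchors = phrase_positions(stems)
    result = {
        "mentions_housing": bool(anchors),
        "positive_hits": 0,
        "negative_hits": 0,
        "positive_weight": 0.0,
        "negative_weight": 0.0,
    }
    for i, s in enumerate(stems):
        kind = "positive" if s in POSITIVE else "negative" if s in NEGATIVE else None
        if kind is None:
            continue
        result[kind + "_hits"] += 1
        if anchors:
            # Assumed weighting: terms nearer a housing mention count for more.
            distance = min(abs(i - a) for a in anchors)
            result[kind + "_weight"] += 1.0 / (1 + distance)
    if result["positive_weight"] > result["negative_weight"]:
        result["label"] = "more positive"
    elif result["negative_weight"] > result["positive_weight"]:
        result["label"] = "more negative"
    else:
        result["label"] = "neutral"
    return result


def positive_over_negative(objects):
    """Net count per decade: more-positive objects minus more-negative ones."""
    net = Counter()
    for obj in objects:
        if obj["type"] == "classified":
            continue  # classifieds are excluded from both data sets
        result = score(obj["text"])
        if not result["mentions_housing"]:
            continue  # only objects that mention a search phrase are kept
        decade = obj["year"] - obj["year"] % 10
        if result["label"] == "more positive":
            net[decade] += 1
        elif result["label"] == "more negative":
            net[decade] -= 1
    return dict(sorted(net.items()))


sample = [
    {"year": 1938, "type": "article",
     "text": "The new public housing was clean and safe. Years later crimes rose."},
    {"year": 1972, "type": "article",
     "text": "Crime and decay spread through the housing projects."},
    {"year": 1975, "type": "classified",
     "text": "Safe, clean rooms near public housing."},
    {"year": 1976, "type": "article",
     "text": "The council improved the park."},
]
print(positive_over_negative(sample))
```

In this sketch, stemming lets "crimes" match the list entry "crime", and the classified and the article with no housing mention drop out before any netting. A production version would presumably swap in an established stemmer and tolerate OCR errors in the matched phrases.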

Editor's Notes

  1. David Staley was the first faculty member outside of OSUL that I met when I first moved to Ohio, so when he and Patrick approached me with this particular problem I was definitely interested. I approached the content provider, and they allowed us to grandfather this project into a data pilot. More researchers want this data, and the provider's current system doesn’t make the process easy. So, to support researcher requests, the content provider has been testing a program in which all data is loaded to Amazon and researchers can then be granted access to these files, for a nominal fee, for processing. Based on the Library’s subscriptions and publisher license data, the content provider was able to make available content from ~1880 to the present for the 8 historical African American newspapers. I let David and Patrick know, and they tweaked their initial project scope, with the idea that we could evaluate the data we had and maybe expand to other resources for later comparison.
  2. Big data: astrometric data, physics data, etc. Big “little” data has a number of problems particular to it. Getting the data can be a real challenge: in our case, the data needed to be downloaded, one by one, from the content provider (1 month). It is difficult to move around (our data set takes 3 weeks to do a full copy). There are a lot of great Python libraries for text mining and evaluation, and they simply wouldn’t work over the data set.