MAPPING IMPLICIT PROCESSES: 
EXTRACTING SOCIAL NETWORKS FROM DIGITAL CORPORA 
M. H. Beals 
Sheffield Hallam University
@mhbeals 
Overview 
Understanding Scissors-and-Paste Journalism in Georgian Britain 
Computer-Aided Identification of Reprints and Memes 
Understanding Dissemination Pathways 
Manual Construction of Social Networks 
Computer-Aided Ordering of Dissemination Pathways 
Future Plans
Scissors-and-Paste Journalism in Georgian Britain 
Proliferation of Colonial and Provincial Presses 
Spread of Journeyman Printers 
Reduction of Stamp Duty 
New Profit Models 
Entertaining and Literary Content 
Adverts to Attract Readers to Sell to Advertisers 
Manual Dissemination of News 
Limited Number of “Specials” 
Postal Exchange, Subscriptions, Correspondence 
No Telegraph until 1840s and Not Used for Miscellany
Computer-Aided Identification of Reprints & Memes 
Promise 
Large-Scale Digitisation Efforts 
Keyword Searching 
nGram Matching (WCopyFind) 
Edition Tracking (Juxta) 
Viral Texts Project (Cordell, Dillon, and Smith) 
Large-Scale Corpus of Nineteenth Century Newspapers 
Extensive, Automatic Repair of OCR Errors 
Identification of Highly Reprinted Materials (Memes) 
Discussion and Exploration of Meme Traits and Patterns
Perils
Discrete Digital Corpora (Paywalls)
Offline Penumbra (Curation) 
Lost Nodes (Incomplete Data) 
OCR Variability (50-80%)
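The n-gram matching idea above can be sketched as a simple overlap score. This is a minimal illustration, not the method used by WCopyfind or the Viral Texts Project: the function names, the Jaccard measure, and the invented OCR errors in the sample text are all assumptions made for the example.

```python
import re


def tokenize(text):
    """Lowercase the text and keep only alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def ngram_set(words, n):
    """Return the set of n-word phrases (as tuples) found in a word list."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_ngram_ratio(text_a, text_b, n=3):
    """Jaccard overlap of the two texts' n-gram sets (0.0 to 1.0)."""
    a = ngram_set(tokenize(text_a), n)
    b = ngram_set(tokenize(text_b), n)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


# Same sentence, with two invented OCR misreadings in the second copy.
clean = "The Indians have killed and scalped a great number of persons"
noisy = "The Indians have killcd and scalped a great numbcr of persons"
ratio = shared_ngram_ratio(clean, noisy, n=3)
print(ratio)
```

Two misread words out of eleven are enough to break most of the three-word phrases, which is one reason OCR variability makes any fixed matching threshold hard to choose.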
Computer-Aided Identification of Reprints & Memes 
# concordanceset.py
# List every distinct n-word phrase in combine.txt and record which of the
# numbered text files (<id>.txt) contain it.
import re


def replace_words(text, word_dic):
    """Replace every key of word_dic found in text with its value."""
    rc = re.compile('|'.join(map(re.escape, word_dic)))

    def translate(match):
        return word_dic[match.group(0)]

    return rc.sub(translate, text)


def getNGrams(wordlist, n):
    """Return every run of n consecutive words in wordlist."""
    return [wordlist[i:i+n] for i in range(len(wordlist)-(n-1))]


basenumber = input('What is the first id number? ')
basenumberend = input('What is the last id number? ')
endnumber = int(basenumberend)
ngram = input('How many words should be in a phrase? ')
ngrams = int(ngram)

# Read the combined text and split it into words.
combifile = 'combine.txt'
with open(combifile, "r") as listopen:
    wordlist = listopen.read()
splitlist = wordlist.split()
ngramslist = getNGrams(splitlist, ngrams)

# Sort the phrases, then walk backwards deleting adjacent duplicates.
if ngramslist:
    ngramslist.sort()
    last = ngramslist[-1]
    for i in range(len(ngramslist)-2, -1, -1):
        if last == ngramslist[i]:
            del ngramslist[i]
        else:
            last = ngramslist[i]

# Build one CSV row per phrase: the phrase, then the id of each file containing it.
tidystring = ''
for item in ngramslist:
    numberint = int(basenumber)
    lineitem = " ".join(item)
    print(lineitem)
    tidystring += '\n' + lineitem + ','
    while numberint <= endnumber:
        number = str(numberint)
        filename = number + ".txt"
        with open(filename, 'r') as fin:
            text = fin.read()
        if lineitem in text:
            tidystring += number + ','
        numberint += 1

# Write the table to a CSV file that can be opened in Excel.
excel_file = "ngramcompiled.csv"
with open(excel_file, "w") as fout:
    fout.write(tidystring)
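The script above reopens every numbered file once per phrase. A possible variant, sketched here under stated assumptions, reads each text once and builds an inverted index from phrase to text ids. The function names and the three sample texts are hypothetical, and unlike the script it compares whole-word phrases rather than raw substrings.

```python
def build_concordance(texts, n):
    """Map each n-word phrase to the set of text ids that contain it."""
    concordance = {}
    for text_id, text in texts.items():
        words = text.split()
        for i in range(len(words) - n + 1):
            phrase = " ".join(words[i:i + n])
            concordance.setdefault(phrase, set()).add(text_id)
    return concordance


def shared_phrases(concordance, minimum=2):
    """Keep only the phrases found in at least `minimum` different texts."""
    return {
        phrase: sorted(ids)
        for phrase, ids in concordance.items()
        if len(ids) >= minimum
    }


# Hypothetical sample texts keyed by id, standing in for 1.txt, 2.txt, 3.txt.
texts = {
    "1": "the station was preserved by three young men",
    "2": "we hear the station was preserved at last",
    "3": "prices of corn rose again this week",
}
shared = shared_phrases(build_concordance(texts, 3))
for phrase, ids in sorted(shared.items()):
    print(phrase, ids)
```

Phrases that appear in only one text drop out, leaving the candidates for reprinted passages.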
Computer-Aided Identification of Reprints & Memes
Understanding Dissemination Pathways 
Meme Identification 
Courtesy of Viral Texts Project, http://www.viraltexts.org/
Understanding Dissemination Pathways 
Chronological Spread 
Courtesy of Viral Texts Project, https://www.youtube.com/watch?v=YwDlyt7jhMs
Understanding Dissemination Pathways 
Genealogical Model
Manual Construction of Social Networks 
The Glasgow Advertiser, 7 October 1793, p. 5 
Knoxville, May 11. 
IT is shocking to describe the bloody scenes that 
have lately taken place in this district. The 
Indians have killed and scalped a great number of 
persons, among whom is Colonel Isaac Bledose, 
who was massacred within 150 yards of his own 
house. 
On the 27th instant a body of Indians attacked 
Greenfield station: they killed John Jervis, and 
a negro fellow, belonging to Mrs. Tarker. By 
the bravery of three young men, viz. William Neely,
William Wilson, and William Hall, the station 
was preserved; they killed two Indians, wounded 
several others, and put them to flight. It is to be 
remembered, that Neely and Hall had each lost a 
father and two brothers, and Wilson a brother, by 
the savages. Men are now in pursuit of the Indians.
Full Discussion of Dissemination Pathway Available at: http://prezi.com/in4_bqvgmanr/
Manual Construction of Social Networks 
Derived from 
Glasgow News Archive, British Library 19th Century Newspapers, 
NewspaperArchive.com, Readex Early American Newspapers, Newspapers.com, and the University of Kentucky
Computer-Aided Ordering of Dissemination Pathways 
Binary Computer Model 
Arbitrary Tolerance Levels 
Reference to Additional Tables 
Bypassing Missing Nodes 
Flexibility 
Difficult to Recreate Human Instinct… 
…But is That a Bad Thing?
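One way to picture a binary model with a tolerance level is sketched below: each version is linked to the most similar earlier version, but only if the similarity clears a cut-off, and otherwise it is left without a parent (a possible missing node). This is an illustration only. The paper names, dates, texts, the 0.8 tolerance, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions, not the model described in the talk.

```python
from datetime import date
from difflib import SequenceMatcher


def likely_parents(versions, tolerance=0.8):
    """Link each version to its most similar earlier version.

    versions: list of (title, publication_date, text) tuples.
    Returns {title: parent_title or None}. None means no earlier version
    met the tolerance, so the true source may be missing from the corpus.
    """
    ordered = sorted(versions, key=lambda v: v[1])
    parents = {}
    for idx, (title, _, text) in enumerate(ordered):
        best_title, best_score = None, tolerance
        for earlier_title, _, earlier_text in ordered[:idx]:
            score = SequenceMatcher(None, earlier_text, text).ratio()
            # On a tie, the more recent of the earlier versions wins.
            if score >= best_score:
                best_title, best_score = earlier_title, score
        parents[title] = best_title
    return parents


# Hypothetical versions of one paragraph, plus an unrelated item.
versions = [
    ("Paper A", date(1793, 5, 11),
     "the station was preserved by the bravery of three young men"),
    ("Paper B", date(1793, 7, 2),
     "the station was preserved by the bravery of three men"),
    ("Paper C", date(1793, 10, 7),
     "the station was preserved by the bravery of three men"),
    ("Paper D", date(1793, 10, 9),
     "prices of corn at the market this week"),
]
parents = likely_parents(versions)
print(parents)
```

The tolerance is arbitrary in exactly the sense the slide raises: moving it changes which links exist, not just how strong they are.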
Computer-Aided Ordering of Dissemination Pathways 
Phylogenetic Model 
Image Courtesy of Fred Hsu (Wikipedia:User:Fredhsu on en.wikipedia) 
CC-BY-SA-3.0 via Wikimedia Commons
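A phylogenetic-style grouping can be approximated with average-linkage clustering (UPGMA-style), which repeatedly merges the two closest groups of texts into a nested tree. The sketch below is an assumption-laden illustration: the labels and texts are invented, the distance is a character-level `difflib` ratio, and real phylogenetic software uses more careful models.

```python
from difflib import SequenceMatcher
from itertools import combinations


def distance(a, b):
    """Character-level dissimilarity between two texts (0.0 = identical)."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def average_distance(texts, left, right):
    """Mean distance between every pair drawn from two groups of labels."""
    total = sum(distance(texts[x], texts[y]) for x in left for y in right)
    return total / (len(left) * len(right))


def cluster_tree(texts):
    """Merge the closest groups until one nested tuple (the tree) remains.

    texts: non-empty {label: text}. Each cluster is (subtree, member_labels).
    """
    clusters = [(label, [label]) for label in sorted(texts)]
    while len(clusters) > 1:
        first, second = min(
            combinations(clusters, 2),
            key=lambda pair: average_distance(texts, pair[0][1], pair[1][1]),
        )
        clusters = [c for c in clusters if c is not first and c is not second]
        clusters.append(((first[0], second[0]), first[1] + second[1]))
    return clusters[0][0]


# Hypothetical witnesses: A and B differ by one letter, C is unrelated.
texts = {
    "A": "it is shocking to describe the bloody scenes",
    "B": "it is shocking to describe the bloody scene",
    "C": "men are now in pursuit",
}
tree = cluster_tree(texts)
print(tree)
```

The tree groups texts by similarity only; it says nothing about direction of copying, which still has to come from dates and attributions.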
Future Plans 
Computer Program 
OCR Clean-up Processes 
Division into Likely Meme Groupings 
Variety of Relatedness Scores 
Textual Integrity 
Prefixes and Suffixes 
Chronological Separation 
Chronological-Geographical Feasibility 
Well-Worn Path Modifier 
Modeling of Relatedness Factors 
Manual Corrections 
Directional Social Network Database 
Raw Data to Inform Additional Research 
Direct Attributions 
Parsing Compilations 
Initial Discovery of Well-Worn Paths 
Inclusion of Offline Materials 
www.mhbeals.com/cnd
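The planned relatedness scores could be combined along the following lines. This is only a sketch of the idea: the formula, the 30-day decay, the 1.25 well-worn-path multiplier, the 0.5 threshold, and the sample papers are all hypothetical placeholders rather than values from the project.

```python
def relatedness(textual_integrity, days_apart, min_travel_days, well_worn=False):
    """Combine the relatedness factors into one score between 0.0 and 1.0.

    textual_integrity: share of the source text preserved (0.0 to 1.0).
    days_apart: days between the two printings.
    min_travel_days: fastest plausible transit between the two towns.
    well_worn: True if this route is already known from other reprints.
    """
    if days_apart < min_travel_days:
        return 0.0  # the copy could not have arrived in time
    # Assumed decay: the score halves once the gap exceeds transit by 30 days.
    decay = 1.0 / (1.0 + (days_apart - min_travel_days) / 30.0)
    score = textual_integrity * decay
    if well_worn:
        score = min(1.0, score * 1.25)
    return score


def network_edges(candidates, threshold=0.5):
    """Turn scored candidate pairs into directed (source, target, score) edges."""
    edges = []
    for source, target, integrity, days, travel, worn in candidates:
        score = relatedness(integrity, days, travel, worn)
        if score >= threshold:
            edges.append((source, target, score))
    return edges


# Hypothetical candidate links between three papers.
candidates = [
    ("Paper A", "Paper B", 0.9, 12, 10, True),
    ("Paper A", "Paper C", 0.9, 5, 10, False),
    ("Paper B", "Paper C", 0.6, 70, 10, False),
]
edges = network_edges(candidates)
print(edges)
```

Keeping the factors as separate arguments leaves room for the manual corrections step, since a researcher can override one factor without recomputing the others.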
WWW.MHBEALS.COM

More Related Content

Viewers also liked

Try Harder: Archival Research in the Digital Age
Try Harder: Archival Research in the Digital AgeTry Harder: Archival Research in the Digital Age
Try Harder: Archival Research in the Digital AgeM. H Beals
 
Lies, Damned Lies and Statistics: History and the Impact Agenda
Lies, Damned Lies and Statistics: History and the Impact AgendaLies, Damned Lies and Statistics: History and the Impact Agenda
Lies, Damned Lies and Statistics: History and the Impact AgendaM. H Beals
 
Historical TEI: Developing a Portfolio of Common Practice
Historical TEI: Developing a Portfolio of Common PracticeHistorical TEI: Developing a Portfolio of Common Practice
Historical TEI: Developing a Portfolio of Common PracticeM. H Beals
 
Evolutionary Plagiarism: Tracing Dissemination Pathways in 19th-Century Reprints
Evolutionary Plagiarism: Tracing Dissemination Pathways in 19th-Century ReprintsEvolutionary Plagiarism: Tracing Dissemination Pathways in 19th-Century Reprints
Evolutionary Plagiarism: Tracing Dissemination Pathways in 19th-Century ReprintsM. H Beals
 
A Series of Small Things: The Case Study in the Age of Big Data
A Series of Small Things: The Case Study in the Age of Big DataA Series of Small Things: The Case Study in the Age of Big Data
A Series of Small Things: The Case Study in the Age of Big DataM. H Beals
 
Mennons and MacGillivray: Scotland and the North American Frontier, 1790-1795
Mennons and MacGillivray: Scotland and the North American Frontier, 1790-1795Mennons and MacGillivray: Scotland and the North American Frontier, 1790-1795
Mennons and MacGillivray: Scotland and the North American Frontier, 1790-1795M. H Beals
 
Interactive Character Assassination: The Ethics of Historical Video Game Design
Interactive Character Assassination: The Ethics of Historical Video Game DesignInteractive Character Assassination: The Ethics of Historical Video Game Design
Interactive Character Assassination: The Ethics of Historical Video Game DesignM. H Beals
 
Teaching and Learning in History: A 20-Minute Survival Guide
Teaching and Learning in History: A 20-Minute Survival GuideTeaching and Learning in History: A 20-Minute Survival Guide
Teaching and Learning in History: A 20-Minute Survival GuideM. H Beals
 

Viewers also liked (9)

Try Harder: Archival Research in the Digital Age
Try Harder: Archival Research in the Digital AgeTry Harder: Archival Research in the Digital Age
Try Harder: Archival Research in the Digital Age
 
Lies, Damned Lies and Statistics: History and the Impact Agenda
Lies, Damned Lies and Statistics: History and the Impact AgendaLies, Damned Lies and Statistics: History and the Impact Agenda
Lies, Damned Lies and Statistics: History and the Impact Agenda
 
Historical TEI: Developing a Portfolio of Common Practice
Historical TEI: Developing a Portfolio of Common PracticeHistorical TEI: Developing a Portfolio of Common Practice
Historical TEI: Developing a Portfolio of Common Practice
 
Evolutionary Plagiarism: Tracing Dissemination Pathways in 19th-Century Reprints
Evolutionary Plagiarism: Tracing Dissemination Pathways in 19th-Century ReprintsEvolutionary Plagiarism: Tracing Dissemination Pathways in 19th-Century Reprints
Evolutionary Plagiarism: Tracing Dissemination Pathways in 19th-Century Reprints
 
A Series of Small Things: The Case Study in the Age of Big Data
A Series of Small Things: The Case Study in the Age of Big DataA Series of Small Things: The Case Study in the Age of Big Data
A Series of Small Things: The Case Study in the Age of Big Data
 
Mennons and MacGillivray: Scotland and the North American Frontier, 1790-1795
Mennons and MacGillivray: Scotland and the North American Frontier, 1790-1795Mennons and MacGillivray: Scotland and the North American Frontier, 1790-1795
Mennons and MacGillivray: Scotland and the North American Frontier, 1790-1795
 
Crisitunity!
Crisitunity!Crisitunity!
Crisitunity!
 
Interactive Character Assassination: The Ethics of Historical Video Game Design
Interactive Character Assassination: The Ethics of Historical Video Game DesignInteractive Character Assassination: The Ethics of Historical Video Game Design
Interactive Character Assassination: The Ethics of Historical Video Game Design
 
Teaching and Learning in History: A 20-Minute Survival Guide
Teaching and Learning in History: A 20-Minute Survival GuideTeaching and Learning in History: A 20-Minute Survival Guide
Teaching and Learning in History: A 20-Minute Survival Guide
 

Similar to Mapping Implicit Processes: Extracting Social Networks from Digital Corpora

European Journal of Cultural Studies2015, Vol. 18(4-5) 395 –
European Journal of Cultural Studies2015, Vol. 18(4-5) 395 –European Journal of Cultural Studies2015, Vol. 18(4-5) 395 –
European Journal of Cultural Studies2015, Vol. 18(4-5) 395 –BetseyCalderon89
 
How to follow actors through their traces. Exploiting digital traceability
How to follow actors through their traces. Exploiting digital traceabilityHow to follow actors through their traces. Exploiting digital traceability
How to follow actors through their traces. Exploiting digital traceabilityINRIA - ENS Lyon
 
The Digital Library from Information Superhighway to the Semiotic Web
The Digital Library from Information Superhighway to the Semiotic WebThe Digital Library from Information Superhighway to the Semiotic Web
The Digital Library from Information Superhighway to the Semiotic WebMartin Kalfatovic
 
A MeMber of the Perseus books Gr ou Pwww.westviewpress.com.docx
A MeMber of the Perseus books Gr ou Pwww.westviewpress.com.docxA MeMber of the Perseus books Gr ou Pwww.westviewpress.com.docx
A MeMber of the Perseus books Gr ou Pwww.westviewpress.com.docxransayo
 
Data versus Text: 30 years of confrontation
Data versus Text: 30 years of confrontationData versus Text: 30 years of confrontation
Data versus Text: 30 years of confrontationLou Burnard
 
FREE 13 Abstract Writing Samples And Templates I
FREE 13 Abstract Writing Samples And Templates IFREE 13 Abstract Writing Samples And Templates I
FREE 13 Abstract Writing Samples And Templates IWendy Berg
 
Digital Scholarship Seminar: Implications of Data for the 21st-century Humanist
Digital Scholarship Seminar: Implications of Data for the 21st-century HumanistDigital Scholarship Seminar: Implications of Data for the 21st-century Humanist
Digital Scholarship Seminar: Implications of Data for the 21st-century HumanistRebecca Davis
 
The Digital Library from Information Superhighway to the Semiotic Web
The Digital Library from Information Superhighway to the Semiotic WebThe Digital Library from Information Superhighway to the Semiotic Web
The Digital Library from Information Superhighway to the Semiotic WebMartin Kalfatovic
 
Legal Analytics Course - Class 11 - Network Analysis and Law - Professors Dan...
Legal Analytics Course - Class 11 - Network Analysis and Law - Professors Dan...Legal Analytics Course - Class 11 - Network Analysis and Law - Professors Dan...
Legal Analytics Course - Class 11 - Network Analysis and Law - Professors Dan...Daniel Katz
 
ICPSR - Complex Systems Models in the Social Sciences - Lecture 2 - Professor...
ICPSR - Complex Systems Models in the Social Sciences - Lecture 2 - Professor...ICPSR - Complex Systems Models in the Social Sciences - Lecture 2 - Professor...
ICPSR - Complex Systems Models in the Social Sciences - Lecture 2 - Professor...Daniel Katz
 
All the world exists to end up in a dictionary
All the world exists to end up in a dictionaryAll the world exists to end up in a dictionary
All the world exists to end up in a dictionaryRossellaDH
 
PPT slides
PPT slidesPPT slides
PPT slidesbutest
 
Module 1 Introduction to Big and Smart Data- Online
Module 1 Introduction to Big and Smart Data- Online Module 1 Introduction to Big and Smart Data- Online
Module 1 Introduction to Big and Smart Data- Online caniceconsulting
 
How To Write Analysis Paper. Online assignment writing service.
How To Write Analysis Paper. Online assignment writing service.How To Write Analysis Paper. Online assignment writing service.
How To Write Analysis Paper. Online assignment writing service.Angela Lovett
 
Digital Scholarship Intersection
Digital Scholarship IntersectionDigital Scholarship Intersection
Digital Scholarship IntersectionDavid De Roure
 
Figures of the Many - Quantitative Concepts for Qualitative Thinking
Figures of the Many - Quantitative Concepts for Qualitative ThinkingFigures of the Many - Quantitative Concepts for Qualitative Thinking
Figures of the Many - Quantitative Concepts for Qualitative ThinkingBernhard Rieder
 
Kult divinity lost deep dark net
Kult divinity lost   deep dark netKult divinity lost   deep dark net
Kult divinity lost deep dark netANDAZIELITETV
 
Pencil Border Back To School Bulletin Board Writing Pape
Pencil Border Back To School Bulletin Board Writing PapePencil Border Back To School Bulletin Board Writing Pape
Pencil Border Back To School Bulletin Board Writing PapeMichelle Meienburg
 
Computer Technology Essay
Computer Technology EssayComputer Technology Essay
Computer Technology EssayDonna Harvey
 

Similar to Mapping Implicit Processes: Extracting Social Networks from Digital Corpora (20)

European Journal of Cultural Studies2015, Vol. 18(4-5) 395 –
European Journal of Cultural Studies2015, Vol. 18(4-5) 395 –European Journal of Cultural Studies2015, Vol. 18(4-5) 395 –
European Journal of Cultural Studies2015, Vol. 18(4-5) 395 –
 
How to follow actors through their traces. Exploiting digital traceability
How to follow actors through their traces. Exploiting digital traceabilityHow to follow actors through their traces. Exploiting digital traceability
How to follow actors through their traces. Exploiting digital traceability
 
Topical_Facets
Topical_FacetsTopical_Facets
Topical_Facets
 
The Digital Library from Information Superhighway to the Semiotic Web
The Digital Library from Information Superhighway to the Semiotic WebThe Digital Library from Information Superhighway to the Semiotic Web
The Digital Library from Information Superhighway to the Semiotic Web
 
A MeMber of the Perseus books Gr ou Pwww.westviewpress.com.docx
A MeMber of the Perseus books Gr ou Pwww.westviewpress.com.docxA MeMber of the Perseus books Gr ou Pwww.westviewpress.com.docx
A MeMber of the Perseus books Gr ou Pwww.westviewpress.com.docx
 
Data versus Text: 30 years of confrontation
Data versus Text: 30 years of confrontationData versus Text: 30 years of confrontation
Data versus Text: 30 years of confrontation
 
FREE 13 Abstract Writing Samples And Templates I
FREE 13 Abstract Writing Samples And Templates IFREE 13 Abstract Writing Samples And Templates I
FREE 13 Abstract Writing Samples And Templates I
 
Digital Scholarship Seminar: Implications of Data for the 21st-century Humanist
Digital Scholarship Seminar: Implications of Data for the 21st-century HumanistDigital Scholarship Seminar: Implications of Data for the 21st-century Humanist
Digital Scholarship Seminar: Implications of Data for the 21st-century Humanist
 
The Digital Library from Information Superhighway to the Semiotic Web
The Digital Library from Information Superhighway to the Semiotic WebThe Digital Library from Information Superhighway to the Semiotic Web
The Digital Library from Information Superhighway to the Semiotic Web
 
Legal Analytics Course - Class 11 - Network Analysis and Law - Professors Dan...
Legal Analytics Course - Class 11 - Network Analysis and Law - Professors Dan...Legal Analytics Course - Class 11 - Network Analysis and Law - Professors Dan...
Legal Analytics Course - Class 11 - Network Analysis and Law - Professors Dan...
 
ICPSR - Complex Systems Models in the Social Sciences - Lecture 2 - Professor...
ICPSR - Complex Systems Models in the Social Sciences - Lecture 2 - Professor...ICPSR - Complex Systems Models in the Social Sciences - Lecture 2 - Professor...
ICPSR - Complex Systems Models in the Social Sciences - Lecture 2 - Professor...
 
All the world exists to end up in a dictionary
All the world exists to end up in a dictionaryAll the world exists to end up in a dictionary
All the world exists to end up in a dictionary
 
PPT slides
PPT slidesPPT slides
PPT slides
 
Module 1 Introduction to Big and Smart Data- Online
Module 1 Introduction to Big and Smart Data- Online Module 1 Introduction to Big and Smart Data- Online
Module 1 Introduction to Big and Smart Data- Online
 
How To Write Analysis Paper. Online assignment writing service.
How To Write Analysis Paper. Online assignment writing service.How To Write Analysis Paper. Online assignment writing service.
How To Write Analysis Paper. Online assignment writing service.
 
Digital Scholarship Intersection
Digital Scholarship IntersectionDigital Scholarship Intersection
Digital Scholarship Intersection
 
Figures of the Many - Quantitative Concepts for Qualitative Thinking
Figures of the Many - Quantitative Concepts for Qualitative ThinkingFigures of the Many - Quantitative Concepts for Qualitative Thinking
Figures of the Many - Quantitative Concepts for Qualitative Thinking
 
Kult divinity lost deep dark net
Kult divinity lost   deep dark netKult divinity lost   deep dark net
Kult divinity lost deep dark net
 
Pencil Border Back To School Bulletin Board Writing Pape
Pencil Border Back To School Bulletin Board Writing PapePencil Border Back To School Bulletin Board Writing Pape
Pencil Border Back To School Bulletin Board Writing Pape
 
Computer Technology Essay
Computer Technology EssayComputer Technology Essay
Computer Technology Essay
 

More from M. H Beals

Georgian Pingbacks: Launch Event
Georgian Pingbacks: Launch EventGeorgian Pingbacks: Launch Event
Georgian Pingbacks: Launch EventM. H Beals
 
Georgian Pingbacks: Mapping Attribution Networks in a 19th-Century Newspaper ...
Georgian Pingbacks: Mapping Attribution Networks in a 19th-Century Newspaper ...Georgian Pingbacks: Mapping Attribution Networks in a 19th-Century Newspaper ...
Georgian Pingbacks: Mapping Attribution Networks in a 19th-Century Newspaper ...M. H Beals
 
Boutique Big Data: Reintegrating Close and Distant Reading of 19th-Century N...
Boutique Big Data: Reintegrating Close and Distant Reading of 19th-Century N...Boutique Big Data: Reintegrating Close and Distant Reading of 19th-Century N...
Boutique Big Data: Reintegrating Close and Distant Reading of 19th-Century N...M. H Beals
 
Georgian Pingbacks: Mapping Attribution Networks in a 19th-Century Newspaper ...
Georgian Pingbacks: Mapping Attribution Networks in a 19th-Century Newspaper ...Georgian Pingbacks: Mapping Attribution Networks in a 19th-Century Newspaper ...
Georgian Pingbacks: Mapping Attribution Networks in a 19th-Century Newspaper ...M. H Beals
 
Boutique Big Data: Understanding 19th-Century Reprint Culture With Plagiarism...
Boutique Big Data: Understanding 19th-Century Reprint Culture With Plagiarism...Boutique Big Data: Understanding 19th-Century Reprint Culture With Plagiarism...
Boutique Big Data: Understanding 19th-Century Reprint Culture With Plagiarism...M. H Beals
 
Imagining Communities: The Glasgow Advertiser and the Kentucky Frontier, 1790...
Imagining Communities: The Glasgow Advertiser and the Kentucky Frontier, 1790...Imagining Communities: The Glasgow Advertiser and the Kentucky Frontier, 1790...
Imagining Communities: The Glasgow Advertiser and the Kentucky Frontier, 1790...M. H Beals
 
Boutique Big Data: Reintegrating Close and Distant Reading of 19th-Century Ne...
Boutique Big Data: Reintegrating Close and Distant Reading of 19th-Century Ne...Boutique Big Data: Reintegrating Close and Distant Reading of 19th-Century Ne...
Boutique Big Data: Reintegrating Close and Distant Reading of 19th-Century Ne...M. H Beals
 
Slow Down: Teaching Students to Encode their Close Reading
Slow Down: Teaching Students to Encode their Close ReadingSlow Down: Teaching Students to Encode their Close Reading
Slow Down: Teaching Students to Encode their Close ReadingM. H Beals
 
Promoting Peer-to-Peer Teaching, On and Offline
Promoting Peer-to-Peer Teaching, On and OfflinePromoting Peer-to-Peer Teaching, On and Offline
Promoting Peer-to-Peer Teaching, On and OfflineM. H Beals
 

More from M. H Beals (9)

Georgian Pingbacks: Launch Event
Georgian Pingbacks: Launch EventGeorgian Pingbacks: Launch Event
Georgian Pingbacks: Launch Event
 
Georgian Pingbacks: Mapping Attribution Networks in a 19th-Century Newspaper ...
Georgian Pingbacks: Mapping Attribution Networks in a 19th-Century Newspaper ...Georgian Pingbacks: Mapping Attribution Networks in a 19th-Century Newspaper ...
Georgian Pingbacks: Mapping Attribution Networks in a 19th-Century Newspaper ...
 
Boutique Big Data: Reintegrating Close and Distant Reading of 19th-Century N...
Boutique Big Data: Reintegrating Close and Distant Reading of 19th-Century N...Boutique Big Data: Reintegrating Close and Distant Reading of 19th-Century N...
Boutique Big Data: Reintegrating Close and Distant Reading of 19th-Century N...
 
Georgian Pingbacks: Mapping Attribution Networks in a 19th-Century Newspaper ...
Georgian Pingbacks: Mapping Attribution Networks in a 19th-Century Newspaper ...Georgian Pingbacks: Mapping Attribution Networks in a 19th-Century Newspaper ...
Georgian Pingbacks: Mapping Attribution Networks in a 19th-Century Newspaper ...
 
Boutique Big Data: Understanding 19th-Century Reprint Culture With Plagiarism...
Boutique Big Data: Understanding 19th-Century Reprint Culture With Plagiarism...Boutique Big Data: Understanding 19th-Century Reprint Culture With Plagiarism...
Boutique Big Data: Understanding 19th-Century Reprint Culture With Plagiarism...
 
Imagining Communities: The Glasgow Advertiser and the Kentucky Frontier, 1790...
Imagining Communities: The Glasgow Advertiser and the Kentucky Frontier, 1790...Imagining Communities: The Glasgow Advertiser and the Kentucky Frontier, 1790...
Imagining Communities: The Glasgow Advertiser and the Kentucky Frontier, 1790...
 
Boutique Big Data: Reintegrating Close and Distant Reading of 19th-Century Ne...
Boutique Big Data: Reintegrating Close and Distant Reading of 19th-Century Ne...Boutique Big Data: Reintegrating Close and Distant Reading of 19th-Century Ne...
Boutique Big Data: Reintegrating Close and Distant Reading of 19th-Century Ne...
 
Slow Down: Teaching Students to Encode their Close Reading
Slow Down: Teaching Students to Encode their Close ReadingSlow Down: Teaching Students to Encode their Close Reading
Slow Down: Teaching Students to Encode their Close Reading
 
Promoting Peer-to-Peer Teaching, On and Offline
Promoting Peer-to-Peer Teaching, On and OfflinePromoting Peer-to-Peer Teaching, On and Offline
Promoting Peer-to-Peer Teaching, On and Offline
 

Recently uploaded

Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfMr Bounab Samir
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxGrade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxChelloAnnAsuncion2
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxRaymartEstabillo3
 
Planning a health career 4th Quarter.pptx
Planning a health career 4th Quarter.pptxPlanning a health career 4th Quarter.pptx
Planning a health career 4th Quarter.pptxLigayaBacuel1
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxEyham Joco
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomnelietumpap1
 
Romantic Opera MUSIC FOR GRADE NINE pptx
Romantic Opera MUSIC FOR GRADE NINE pptxRomantic Opera MUSIC FOR GRADE NINE pptx
Romantic Opera MUSIC FOR GRADE NINE pptxsqpmdrvczh
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersSabitha Banu
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...Nguyen Thanh Tu Collection
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 

Recently uploaded (20)

Raw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptxRaw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptx
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
Rapple "Scholarly Communications and the Sustainable Development Goals"
Rapple "Scholarly Communications and the Sustainable Development Goals"Rapple "Scholarly Communications and the Sustainable Development Goals"
Rapple "Scholarly Communications and the Sustainable Development Goals"
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxGrade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
 
Planning a health career 4th Quarter.pptx
Planning a health career 4th Quarter.pptxPlanning a health career 4th Quarter.pptx
Planning a health career 4th Quarter.pptx
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptx
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choom
 
Romantic Opera MUSIC FOR GRADE NINE pptx
Romantic Opera MUSIC FOR GRADE NINE pptxRomantic Opera MUSIC FOR GRADE NINE pptx
Romantic Opera MUSIC FOR GRADE NINE pptx
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginners
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 

Mapping Implicit Processes: Extracting Social Networks from Digital Corpora

  • 1. MAPPING IMPLICIT PROCESSES: EXTRACTING SOCIAL NETWORKS FROM DIGITAL CORPORA. M. H. Beals, Sheffield Hallam University, @mhbeals
  • 2. Overview: Understanding Scissors-and-Paste Journalism in Georgian Britain; Computer-Aided Identification of Reprints and Memes; Understanding Dissemination Pathways; Manual Construction of Social Networks; Computer-Aided Ordering of Dissemination Pathways; Future Plans
  • 3. Scissors-and-Paste Journalism in Georgian Britain: Proliferation of Colonial and Provincial Presses; Spread of Journeyman Printers; Reduction of Stamp Duty; New Profit Models; Entertaining and Literary Content; Adverts to Attract Readers to Sell to Advertisers; Manual Dissemination of News; Limited Number of “Specials”; Postal Exchange, Subscriptions, Correspondence; No Telegraph until 1840s and Not Used for Miscellany
  • 4. Computer-Aided Identification of Reprints & Memes. Promise: Large-Scale Digitisation Efforts; Keyword Searching; nGram Matching (WCopyFind); Edition Tracking (Juxta); Viral Texts Project (Cordell, Dillon, and Smith); Large-Scale Corpus of Nineteenth Century Newspapers; Extensive, Automatic Repair of OCR Errors; Identification of Highly Reprinted Materials (Memes); Discussion and Exploration of Meme Traits and Patterns. Perils: Discrete Digital Corpora (Paywalls); Offline Penumbra (Curation); Lost Nodes (Incomplete Data); OCR Variability (50-80%)
  • 5. Computer-Aided Identification of Reprints & Memes

      # concordanceset.py
      import re

      # Replace every key of word_dic found in text with its value
      def replace_words(text, word_dic):
          rc = re.compile('|'.join(map(re.escape, word_dic)))
          def translate(match):
              return word_dic[match.group(0)]
          return rc.sub(translate, text)

      # Return every run of n consecutive words in wordlist
      def getNGrams(wordlist, n):
          return [wordlist[i:i+n] for i in range(len(wordlist)-(n-1))]

      basenumber = input('What is the first id number? ')
      number = str(basenumber)
      numberint = int(basenumber)
      basenumberend = input('What is the last id number? ')
      endnumber = int(basenumberend)
      ngram = input('How many words should be in a phrase? ')
      ngrams = int(ngram)

      # Read the combined text and split it into words
      combifile = 'combine.txt'
      listopen = open(combifile, "r")
      wordlist = listopen.read()
      splitlist = wordlist.split()
      listopen.close()

      # Build the phrase list, then sort it and drop duplicates
      ngramslist = getNGrams(splitlist, ngrams)
      if ngramslist:
          ngramslist.sort()
          last = ngramslist[-1]
          for i in range(len(ngramslist)-2, -1, -1):
              if last == ngramslist[i]:
                  del ngramslist[i]
              else:
                  last = ngramslist[i]

      # For each phrase, list the numbered files (<id>.txt) that contain it
      tidystring = ''
      for item in ngramslist:
          number = str(basenumber)
          numberint = int(basenumber)
          lineitem = " ".join(item)
          print(lineitem)
          tidystring += str('\n' + lineitem + ',')
          while (numberint <= endnumber):
              file = str(number + ".txt")
              fin = open(file, 'r')
              text = fin.read()
              fin.close()
              if lineitem in text:
                  tidystring += str(number + ',')
              numberint = int(number)
              numberint += 1
              number = str(numberint)

      # create an excelfile for this example
      excel_file = "ngramcompiled.csv"
      fout = open(excel_file, "w")
      fout.write(tidystring)
      fout.close()
  • 7. Understanding Dissemination Pathways: Meme Identification. Courtesy of Viral Texts Project, http://www.viraltexts.org/
  • 8. Understanding Dissemination Pathways: Chronological Spread. Courtesy of Viral Texts Project, https://www.youtube.com/watch?v=YwDlyt7jhMs
  • 10. Manual Construction of Social Networks. The Glasgow Advertiser, 7 October 1793, p. 5: Knoxville, May 11. IT is shocking to describe the bloody scenes that have lately taken place in this district. The Indians have killed and scalped a great number of persons, among whom is Colonel Isaac Bledose, who was massacred within 150 yards of his own house. On the 27th instant a body of Indians attacked Greenfield station: they killed John Jervis, and a negro fellow, belonging to Mrs. Tarker. By the bravery of three young men, viz. William Neely, William Wilson, and William Hall, the station was preserved; they killed two Indians, wounded several others, and put them to flight. It is to be remembered, that Neely and Hall had each lost a father and two brothers, and Wilson a brother, by the savages. Men are now in pursuit of the Indians. Full Discussion of Dissemination Pathway Available at: http://prezi.com/in4_bqvgmanr/
  • 11. Manual Construction of Social Networks Derived from Glasgow News Archive, British Library 19th Century Newspapers, NewspaperArchive.com, Readex Early American Newspapers, Newspapers.com, and the University of Kentucky
  • 12. Computer-Aided Ordering of Dissemination Pathways: Binary Computer Model; Arbitrary Tolerance Levels; Reference to Additional Tables; Bypassing Missing Nodes; Flexibility; Difficult to Recreate Human Instinct… …But is That a Bad Thing?
  • 13. Computer-Aided Ordering of Dissemination Pathways: Phylogenetic Model. Image Courtesy of Fred Hsu (Wikipedia:User:Fredhsu on en.wikipedia) CC-BY-SA-3.0 via Wikimedia Commons
  • 14. Future Plans: Computer Program; OCR Clean-up Processes; Division into Likely Meme Groupings; Variety of Relatedness Scores; Textual Integrity; Prefixes and Suffixes; Chronological Separation; Chronological-Geographical Feasibility; Well-Worn Path Modifier; Modeling of Relatedness Factors; Manual Corrections; Directional Social Network Database; Raw Data to Inform Additional Research; Direct Attributions; Parsing Compilations; Initial Discovery of Well-Worn Paths; Inclusion of Offline Materials; www.mhbeals.com/cnd
  • 15. MAPPING IMPLICIT PROCESSES: EXTRACTING SOCIAL NETWORKS FROM DIGITAL CORPORA. M. H. Beals, Sheffield Hallam University, @mhbeals, www.mhbeals.com
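
The slides list "Textual Integrity", "Chronological Separation" and "Arbitrary Tolerance Levels" as ingredients for ordering dissemination pathways, but do not show how they might be combined. The following is only a minimal Python sketch under stated assumptions, not the author's program. The function names, the use of Jaccard overlap on word n-grams as a stand-in for textual integrity, the tolerance value of 0.2, and the sample records ("Paper A", "Paper B", "Paper C") are all hypothetical and chosen for illustration.

```python
from datetime import date


def ngrams(words, n):
    """Return the set of n-word tuples found in a list of words."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def textual_overlap(text_a, text_b, n=3):
    """Jaccard overlap of word n-grams (an assumed proxy for textual integrity)."""
    a = ngrams(text_a.lower().split(), n)
    b = ngrams(text_b.lower().split(), n)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def likely_source(reprint, candidates, n=3, tolerance=0.2):
    """Pick the earlier candidate whose text overlaps most with the reprint.

    Each record is a dict with 'title', 'date' and 'text' keys (hypothetical
    layout). Candidates dated on or after the reprint are skipped, and any
    score below the (arbitrary) tolerance is ignored.
    """
    best, best_score = None, 0.0
    for cand in candidates:
        if cand["date"] >= reprint["date"]:
            continue  # chronological separation: a source must come first
        score = textual_overlap(reprint["text"], cand["text"], n)
        if score >= tolerance and score > best_score:
            best, best_score = cand, score
    return best, best_score


# Hypothetical sample records, invented for illustration only
original = {"title": "Paper A", "date": date(1793, 5, 11),
            "text": "by the bravery of three young men the station was preserved"}
partial = {"title": "Paper B", "date": date(1793, 7, 1),
           "text": "by the bravery of three young men the place was saved"}
reprint = {"title": "Paper C", "date": date(1793, 10, 7),
           "text": "by the bravery of three young men the station was preserved"}

source, score = likely_source(reprint, [original, partial])
print(source["title"], score)
```

A sketch like this only ranks candidates that survive in the corpus; it says nothing about lost nodes or offline materials, which the slides flag as open problems.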