Introduction for skills seminar on Search and Data Mining, Master of European History, University of Luxembourg, 11 December 2014

Gerben Zaagsma
Research fellow at Lichtenberg-Kolleg, Georg-August-Universität Göttingen
Search & Data Mining
SKILLS SEMINAR
Master of European History, University of Luxembourg, 11 December 2014
Gerben Zaagsma
Lichtenberg-Kolleg, Georg-August-Universität Göttingen
Overview
1. Introduction: search & data mining
2. Tools
3. Practical exercises
1. Introduction search & data mining
Code yourself… …or use existing tools
Why historians should be interested:
Old → New (CHANGE)
Analogue resources → Digital resources
Small data → Big data (SCALE)
Close reading → Distant reading (TECHNOLOGY)
The Big Data revolution?
Big data and claims about a paradigm change in the humanities
culturomics and Google ngrams
Data-driven history
Patterns and structures: a new essentialism?
Based upon changes of scale and method, the humanities are supposedly becoming more ‘scientific’: results can be checked and replicated. But can they? And what about interpretation?
Politics: funding & valorisation
“One of the problems confronting data enthusiasts in the humanities is that we feel a need to convince our more old-fashioned colleagues about what can be done. But our role as advocates of data shouldn't mean that we lose our critical sense as scholars. [...] there is a risk that we look more carefully at the technical components of the datasets than the historical context of the information that they represent.”
Andrew Prescott, ‘The Deceptions of Data’, Digital Riffs (13 January 2013).
Frédéric Clavert, ‘Lecture des sources historiennes à l’ère numérique’ [Reading historians’ sources in the digital age] (14 November 2012)
Integrate approaches & methods / hybridity
1. SEARCH
General web search:
Google/ Bing/ Yahoo
there is much more than Google
filter bubble? Take a look at: http://dontbubble.us
http://www.langreiter.com/exec/yahoo-vs-google.html
http://yometa.com
filter bubble? 
http://www.thefilterbubble.com
Web search round-up 
differences between search engines 
filter bubble 
deep web versus visible web
Searching digital libraries & archives…
composition of resources, selection…
example of Compactmemory: a great resource on German-Jewish history
“The collection comprises the 110 most important Jewish newspapers and periodicals of the German-speaking area from the years 1806-1938. The periodicals represent the entire religious, political, social, literary or scholarly spectrum of the Jewish community.”
but be aware of selection: the focus is on elites and organisations that highlight German Jewry’s process of emancipation:
• a classical view in the historiography on German Jewry?
• reinforcement of existing master narratives?
mind the context…
Processing and searching data on your own computer…
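A minimal sketch of what searching sources on your own computer can look like: a keyword-in-context (KWIC) search in Python. It assumes the sources are already available as UTF-8 `.txt` files in a single folder; the function names `kwic` and `search_folder` are hypothetical and not part of any tool mentioned in this seminar.

```python
import re
from pathlib import Path


def kwic(text, keyword, width=30):
    """Return keyword-in-context lines: each hit with `width` characters on either side."""
    # Word boundaries (\b) stop "Jew" from matching inside "Jewish"; the search ignores case.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    results = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()].replace("\n", " ")
        right = text[match.end():match.end() + width].replace("\n", " ")
        # Right-align the left context so the keywords line up in one column.
        results.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return results


def search_folder(folder, keyword, width=30):
    """Run kwic() over every .txt file in a (hypothetical) local folder of sources."""
    hits = {}
    for path in sorted(Path(folder).glob("*.txt")):
        lines = kwic(path.read_text(encoding="utf-8"), keyword, width)
        if lines:
            hits[path.name] = lines
    return hits


if __name__ == "__main__":
    sample = "The newspaper reported on emancipation. Emancipation was debated again in 1848."
    for line in kwic(sample, "emancipation", width=15):
        print(line)
```

On OCR'd newspaper pages such a search only finds what the OCR recognised correctly, so an empty result is not proof that a term is absent from the source.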
1. DATA MINING
data? 
data = computer-processable information
Example of structured data
Many digital libraries/archives: un-/semi-structured data
Digital editions: bridging the gap with XML
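To illustrate how XML markup turns running text into something a computer can query, here is a sketch using Python's standard `xml.etree.ElementTree`. The fragment is invented and only loosely modelled on TEI conventions (`persName`, `placeName`, a `date` with a `when` attribute); real TEI files also declare a namespace, which is left out here to keep the example short.

```python
import xml.etree.ElementTree as ET

# Invented, simplified TEI-style fragment (no namespace declaration).
EDITION = """
<text>
  <body>
    <p>Letter written in <placeName>Berlin</placeName> on
       <date when="1869-05-02">2 May 1869</date> by <persName>N.N.</persName>.</p>
  </body>
</text>
"""


def extract_entities(xml_string):
    """Collect the tagged persons, places and normalised dates from the markup."""
    root = ET.fromstring(xml_string.strip())
    return {
        "persons": [el.text for el in root.iter("persName")],
        "places": [el.text for el in root.iter("placeName")],
        # The 'when' attribute holds a machine-readable form of the human-readable date.
        "dates": [el.get("when") for el in root.iter("date")],
    }


if __name__ == "__main__":
    print(extract_entities(EDITION))
    # {'persons': ['N.N.'], 'places': ['Berlin'], 'dates': ['1869-05-02']}
```

The text stays readable for humans, while the tags give the computer the structure that plain OCR text lacks.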
http://eculture.cs.vu.nl/europeana/session/search 
• Google/ Bing/ Yahoo
• there is much more ...
• results differ per search engine
• and there is a filter bubble
• --> in short: knowing what you are looking for, and having a search strategy, are crucial
Semantic web and linking data
Some definitions of data mining:
At its simplest, data mining is the process of extracting new knowledge (usually in terms of previously unknown patterns) from sets of data already in existence.
Jonathan Hagood
Data mining (the analysis step of the "Knowledge Discovery in Databases" process, or KDD), an interdisciplinary subfield of computer science, is the computational process of discovering patterns in large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use.
Wikipedia
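As a very elementary illustration of extracting patterns from existing data, the sketch below counts the most frequent words in a text after removing common function words. This is only a first step towards an "understandable structure", not data mining in the full KDD sense; the stopword list and the sample sentence are made up for the example.

```python
import re
from collections import Counter

# Tiny, hypothetical stopword list; real projects use longer, language-specific lists.
STOPWORDS = {"the", "a", "an", "and", "of", "in", "on", "to", "was", "is"}


def top_terms(text, k=5):
    """Tokenise, lowercase, drop stopwords and return the k most frequent terms."""
    # [^\W\d_]+ matches runs of letters, including umlauts and other accented letters.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(k)


if __name__ == "__main__":
    sample = "The Jewish press reported on the debate. The debate on emancipation was long."
    print(top_terms(sample, 3))
```

Which words count as "noise" is itself a modelling decision, so the stopword list shapes the result as much as the text does.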
Examples of projects and techniques
An n-gram is a contiguous sequence of n items from a given sequence of text or speech.
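A short Python sketch of the n-gram definition, plus a toy version of the kind of measure plotted by Google ngrams: for each year, the share of all n-grams of that length that match a given phrase. The yearly corpus is hypothetical, and splitting on whitespace is a deliberately naive tokenisation (punctuation stays attached to words).

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous sequence of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def relative_frequency(corpus_by_year, phrase):
    """For each year, the share of all n-grams (n = words in `phrase`) equal to `phrase`."""
    target = tuple(phrase.lower().split())
    n = len(target)
    result = {}
    for year, text in sorted(corpus_by_year.items()):
        grams = ngrams(text.lower().split(), n)  # naive whitespace tokenisation
        counts = Counter(grams)
        result[year] = counts[target] / len(grams) if grams else 0.0
    return result


if __name__ == "__main__":
    print(ngrams("the history of the press".split(), 2))
    # Hypothetical toy corpus: one short "text" per year.
    corpus = {1900: "the press and the jewish press", 1910: "the press"}
    print(relative_frequency(corpus, "the press"))  # {1900: 0.2, 1910: 1.0}
```

Normalising by the total number of n-grams per year matters because the amount of digitised text differs strongly between years; raw counts would mostly reflect corpus size.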
Topic Modeling Martha Ballard’s Diary
data?
data & data mining ≠ neutral
“What is too often forgotten, though, is that our digital helpers are full of ‘theory’ and ‘judgement’ already. As with any methodology, they rely on sets of assumptions, models, and strategies. Theory is already at work on the most basic level when it comes to defining units of analysis, algorithms, and visualisation procedures.”
Bernhard Rieder and Theo Röhle, ‘Digital Methods: Five Challenges’ in: David M Berry ed., Understanding Digital Humanities (Houndmills: Palgrave Macmillan, 2012) 67-85, 70.
2. TOOLS
3. Practical exercises
Overview of exercises 
http://goo.gl/72fCn7
Tools & workflows 
Voyant Tools 
Voyant Tools Documentation 
Programming Historian 
DIRT: Digital Research Tools 
Turkel, William J., Kevin Kee, and Spencer Roberts, ‘A Method for Navigating the Infinite Archive’ in: Toni Weller ed., History in the Digital Age (London; New York: Routledge, 2013).
William J. Turkel: How To
Further reading
Special issue on Digital History, BMGN - Low Countries Historical Review, 128/4 (2013).
Haber, Peter, Digital Past: Geschichtswissenschaft im digitalen Zeitalter (München: Oldenbourg Verlag, 2011).
Boonstra, Onno, Leen Breure, and Peter Doorn, Past, Present and Future of Historical Information Science (Amsterdam: NIWI-KNAW, 2004).
Ciravegna, Fabio, Mark Greengrass, Tim Hitchcock, Sam Chapman, Jamie McLaughlin, and Ravish Bhagdev, ‘Finding Needles in Haystacks: Data-Mining in Distributed Historical Datasets’ in: Mark Greengrass and Lorna M Hughes eds., The Virtual Representation of the Past (Ashgate, 2008).
Cohen, D, F Gibbs, T Hitchcock, G Rockwell, J Sander, R Shoemaker, S Sinclair, S Takats, W J Turkel, and C Briquet, ‘Data Mining with Criminal Intent’, final white paper (2011).
Hagood, Jonathan, ‘A Brief Introduction to Data Mining Projects in the Humanities’, Bulletin of the American Society for Information Science and Technology 38/4 (2012).
Hitchcock, Tim, ‘Big Data for Dead People: Digital Readings and the Conundrums of Positivism’ (9 December 2013).
Leonard, Peter, ‘Mining Large Datasets for the Humanities’, IFLA WLIC 2014.
Dr. Gerben Zaagsma 
http://gerbenzaagsma.org 
de.linkedin.com/in/gerbenzaagsma/ 
https://twitter.com/gerbenzaagsma 
https://uni-goettingen.academia.edu/GerbenZaagsma 
https://www.researchgate.net/profile/Gerben_Zaagsma 
https://www.slideshare.net/gerbenzaagsma
Image credits 
The Field Museum Library, Hall 37 Geology overview. URL: https://www.flickr.com/photos/field_museum_library/3333920156/in/set-72157614881700424.
The U.S. National Archives, Photograph of Card Catalog in Central Search Room, 1942. URL: http://www.flickr.com/photos/usnationalarchives/3873932255/.
Witch computer 1951: Wolverhampton and Staffordshire College of Technology in 1961, The National Computing Museum and Computer Conservation Society/UKAEA/Wolverhampton Express and Star, via: http://www.wired.com/2009/09/britan-oldest-computer/.
Code: https://www.flickr.com/photos/lord_james/4696338852/.
Tools: Flickr Commons
The droids we're googling for: https://www.flickr.com/photos/st3f4n/3951143570/.
Jaws (Steven Spielberg) original movie poster: https://en.wikipedia.org/wiki/File:JAWS_Movie_poster.jpg
Structured/unstructured data: http://www.emc.com/collateral/demos/microsites/emc-digital-universe-2011/index.htm
Macbook Data Mining: http://www.flickr.com/photos/17208993@N00/442531562/.
Topic Modeling Martha Ballard’s Diary: http://www.cameronblevins.org/posts/topic-modeling-martha-ballards-diary/.
Boolean operators: http://uksourcers.co.uk/2012/capital-letters-the-key-to-boolean-success/
Miami University students in laboratory classroom 1908: https://www.flickr.com/photos/muohio_digital_collections/3199691495/
Narration lesson plan
TARIQ KHAN75 views
Creative Restart 2023: Leonard Savage - The Permanent Brief: Unearthing unobv... by Taste
Creative Restart 2023: Leonard Savage - The Permanent Brief: Unearthing unobv...Creative Restart 2023: Leonard Savage - The Permanent Brief: Unearthing unobv...
Creative Restart 2023: Leonard Savage - The Permanent Brief: Unearthing unobv...
Taste55 views
Six Sigma Concept by Sahil Srivastava.pptx by Sahil Srivastava
Six Sigma Concept by Sahil Srivastava.pptxSix Sigma Concept by Sahil Srivastava.pptx
Six Sigma Concept by Sahil Srivastava.pptx
Sahil Srivastava44 views

Introduction for skills seminar on Search and Data Mining, Master of European History, University of Luxembourg, 11 December 2014

  • 1. Search & Data Mining. Skills seminar, Master of European History, University of Luxembourg, 11 December 2014. Gerben Zaagsma, Lichtenberg-Kolleg, Georg-August-Universität Göttingen
  • 3. Overview: 1. Introduction search & data mining 2. T 3. Practical exercises
  • 4. Code yourself… …or use existing tools
  • 6. Why historians should be interested: from old to new. Analogue resources → digital resources; small data → big data; close reading → distant reading. Key dimensions: change, scale, technology.
  • 14. the Big Data revolution? Big data and claims about a paradigm change in the humanities Data driven history Patterns and structures: a new essentialism? Based upon changes of scale & method: humanities supposedly becoming more ‘scientific’ > results can be checked and replicated, but can they? Interpretation. Politics: funding & valorisation
  • 15. “One of the problems confronting data enthusiasts in the humanities is that we feel a need to convince our more old-fashioned colleagues about what can be done. But our role as advocates of data shouldn't mean that we lose our critical sense as scholars. [...] there is a risk that we look more carefully at the technical components of the datasets than the historical context of the information that they represent.” Andrew Prescott, ‘The Deceptions of Data’, Digital Riffs (13 January 2013).
  • 16. Frédéric Clavert, ‘Lecture des sources historiennes à l’ère numérique’ (14 November 2012). Integrate approaches & methods / hybridity
  • 18. Google / Bing / Yahoo: there is much more ...
  • 19–21. General internet search: Google. There is much more than Google. Filter bubble? Have a look at: http://dontbubble.us, http://www.langreiter.com/exec/yahoo-vs-google.html and http://yometa.com
  • 25. Web search round-up: differences between search engines; filter bubble; deep web versus visible web
  • 28. Example: Compactmemory, a great resource on German-Jewish history
  • 29. The collection comprises the 110 most important Jewish newspapers and periodicals of the German-speaking area from the years 1806-1938. The periodicals represent the entire religious, political, social, literary or scholarly range of the Jewish community. But be aware of selection: the focus is on elites and organisations that highlight German Jewry’s process of emancipation: • a classical vision in the historiography on German Jewry? • reinforcement of existing master narratives?
  • 34. Processing and searching data on your own computer…
  • 40. data? data = computer-processable information
  • 43. Many digital libraries/archives: un-/semi-structured data
  • 44. Digital editions: bridging the gap with XML
  • 47. Semantic web and linking data: http://eculture.cs.vu.nl/europeana/session/search • Google / Bing / Yahoo • there is much more ... • results differ per search engine • and there is a filter bubble • → in short: knowing what you are looking for, and having a search strategy, is crucial
  • 50. Some definitions of data mining:
  • 51. At its simplest, data mining is the process of extracting new knowledge (usually in terms of previously unknown patterns) from sets of data already in existence. Jonathan Hagood
  • 52. Data mining (the analysis step of the "Knowledge Discovery in Databases" process, or KDD), an interdisciplinary subfield of computer science, is the computational process of discovering patterns in large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use. Wikipedia
  • 53. Examples of projects and techniques
  • 55. an n-gram is a contiguous sequence of n items from a given sequence of text or speech
  • 58. Topic Modeling Martha Ballard’s Diary
  • 59. data? data & data mining ≠ neutral
  • 60. “What is too often forgotten, though, is that our digital helpers are full of ‘theory’ and ‘judgement’ already. As with any methodology, they rely on sets of assumptions, models, and strategies. Theory is already at work on the most basic level when it comes to defining units of analysis, algorithms, and visualisation procedures.” Bernhard Rieder and Theo Röhle, ‘Digital Methods: Five Challenges’ in: David M Berry ed., Understanding Digital Humanities (Houndmills: Palgrave Macmillan, 2012) 67-85, 70.
  • 63. Overview of exercises http://goo.gl/72fCn7
  • 64. Tools & workflows: Voyant Tools; Voyant Tools Documentation; Programming Historian; DIRT: Digital Research Tools; Turkel, William J., Kevin Kee, and Spencer Roberts, ‘A Method for Navigating the Infinite Archive’ in: Toni Weller ed., History in the Digital Age (London; New York: Routledge, 2013); William J. Turkel: How To
  • 65. Further reading: Special issue on Digital History, BMGN - Low Countries Historical Review, 128/4 (2013). Haber, Peter, Digital Past: Geschichtswissenschaft im digitalen Zeitalter (München: Oldenbourg Verlag, 2011). Boonstra, Onno, Leen Breure, and Peter Doorn, Past, Present and Future of Historical Information Science (Amsterdam: NIWI-KNAW, 2004). Ciravegna, Fabio, Mark Greengrass, Tim Hitchcock, Sam Chapman, Jamie McLaughlin, and Ravish Bhagdev, ‘Finding Needles in Haystacks: Data-Mining in Distributed Historical Datasets’ in: Mark Greengrass and Lorna M Hughes eds., The Virtual Representation of the Past (Ashgate, 2008). Cohen, D, F Gibbs, T Hitchcock, G Rockwell, J Sander, R Shoemaker, S Sinclair, S Takats, W J Turkel, and C Briquet, ‘Data Mining with Criminal Intent’, final white paper (2011). Hagood, Jonathan, ‘A Brief Introduction to Data Mining Projects in the Humanities’, Bulletin of the American Society for Information Science and Technology 38/4 (2012). Hitchcock, Tim, ‘Big Data for Dead People: Digital Readings and the Conundrums of Positivism’ (9 December 2013). Leonard, Peter, ‘Mining Large Datasets for the Humanities’, IFLA WLIC 2014.
  • 66. Dr. Gerben Zaagsma http://gerbenzaagsma.org de.linkedin.com/in/gerbenzaagsma/ https://twitter.com/gerbenzaagsma https://uni-goettingen.academia.edu/GerbenZaagsma https://www.researchgate.net/profile/Gerben_Zaagsma https://www.slideshare.net/gerbenzaagsma
  • 67. Image credits: The Field Museum Library, Hall 37 Geology overview. URL: https://www.flickr.com/photos/field_museum_library/3333920156/in/set-72157614881700424. The U.S. National Archives, Photograph of Card Catalog in Central Search Room, 1942. URL: http://www.flickr.com/photos/usnationalarchives/3873932255/. Witch computer 1951: Wolverhampton and Staffordshire College of Technology in 1961, The National Computing Museum and Computer Conservation Society/UKAEA/Wolverhampton Express and Star, via: http://www.wired.com/2009/09/britan-oldest-computer/. Code: https://www.flickr.com/photos/lord_james/4696338852/. Tools: Flickr Commons. The droids we're googling for: https://www.flickr.com/photos/st3f4n/3951143570/. Jaws (Steven Spielberg) original movie poster: https://en.wikipedia.org/wiki/File:JAWS_Movie_poster.jpg. Structured/unstructured data: http://www.emc.com/collateral/demos/microsites/emc-digital-universe-2011/index.htm. Macbook Data Mining: http://www.flickr.com/photos/17208993@N00/442531562/. Topic Modeling Martha Ballard’s Diary: http://www.cameronblevins.org/posts/topic-modeling-martha-ballards-diary/. Boolean operators: http://uksourcers.co.uk/2012/capital-letters-the-key-to-boolean-success/. Miami University students in laboratory classroom 1908: https://www.flickr.com/photos/muohio_digital_collections/3199691495/
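The web-search slides stress three points. Results differ per search engine. There is a filter bubble. Knowing what you are looking for, and having a search strategy, is crucial. Boolean operators are the core of such a strategy.

The following is a minimal sketch, assuming a toy in-memory collection. An inverted index maps each word to the documents that contain it. AND, OR and NOT then become set intersection, union and difference. The three "documents" and the helper names `build_index` and `docs_with` are hypothetical. Real search engines add ranking, stemming and much more on top of this.

```python
import re
from collections import defaultdict

# Hypothetical mini-collection standing in for an archive or search-engine index.
DOCS = {
    1: "Jewish newspapers in Berlin after emancipation",
    2: "Emancipation debates in the Dutch press",
    3: "Berlin newspapers and the 1848 revolution",
}


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Inverted index: each lower-cased word maps to the ids of documents containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    return dict(index)  # plain dict, so lookups of unknown words never add keys


def docs_with(index: dict[str, set[int]], term: str) -> set[int]:
    """Documents containing `term` (case-insensitive); empty set if the word never occurs."""
    return index.get(term.lower(), set())


if __name__ == "__main__":
    idx = build_index(DOCS)
    print(docs_with(idx, "berlin") & docs_with(idx, "newspapers"))      # berlin AND newspapers
    print(docs_with(idx, "emancipation") | docs_with(idx, "revolution"))  # emancipation OR revolution
    print(docs_with(idx, "newspapers") - docs_with(idx, "jewish"))      # newspapers NOT jewish
```

Even in this toy case, changing one operator changes which documents are returned. That is why the slides treat search strategy as part of the method rather than a technicality.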
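Slide 34 ("Processing and searching data on your own computer…") can be illustrated with a keyword-in-context (KWIC) search. This is the concordance-style view that many text-analysis tools offer.

The following is a minimal sketch, assuming your sources are plain-text UTF-8 files in one folder. The folder path and the function names `kwic` and `search_folder` are hypothetical.

```python
import re
from pathlib import Path


def kwic(text: str, keyword: str, width: int = 30) -> list[str]:
    """Return keyword-in-context lines: each match with `width` characters on either side."""
    lines = []
    # \b word boundaries so that e.g. "data" does not match inside "database"
    for match in re.finditer(rf"\b{re.escape(keyword)}\b", text, flags=re.IGNORECASE):
        start, end = match.span()
        left = text[max(0, start - width):start].rjust(width)
        right = text[end:end + width].ljust(width)
        lines.append(f"{left} [{match.group(0)}] {right}")
    return lines


def search_folder(folder: str, keyword: str) -> dict[str, list[str]]:
    """Run kwic() over every .txt file in a (hypothetical) folder of transcribed sources."""
    results = {}
    for path in sorted(Path(folder).glob("*.txt")):
        hits = kwic(path.read_text(encoding="utf-8"), keyword)
        if hits:
            results[path.name] = hits
    return results


if __name__ == "__main__":
    sample = "The archive holds letters. Letters from 1848 mention the revolution."
    for line in kwic(sample, "letters", width=15):
        print(line)
```

Seeing each hit in its immediate context keeps the close-reading check close at hand, even when the search itself runs over many files.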
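Slide 44 ("Digital editions: bridging the gap with XML") rests on the idea that markup makes a text machine-readable. The following is a minimal sketch using Python's standard-library `xml.etree.ElementTree`.

The letter fragment is invented. Its element names follow TEI conventions: `persName`, `placeName`, and `date` with a `when` attribute. The TEI namespace is left out for brevity. Once names, places and dates are tagged, they can be extracted reliably rather than guessed from running text.

```python
import xml.etree.ElementTree as ET

# A hypothetical letter fragment with TEI-style markup (namespace omitted for brevity).
FRAGMENT = """<text>
  <body>
    <p>On <date when="1848-03-18">18 March</date> <persName ref="#p1">Anna</persName>
       wrote from <placeName>Berlin</placeName> to
       <persName ref="#p2">her brother</persName> in <placeName>Vienna</placeName>.</p>
  </body>
</text>"""


def extract_entities(xml_string: str) -> dict[str, list]:
    """Collect every tagged person, place and date, in document order."""
    root = ET.fromstring(xml_string)
    return {
        # itertext() joins any nested text; iter() walks the whole tree
        "persons": ["".join(el.itertext()) for el in root.iter("persName")],
        "places": ["".join(el.itertext()) for el in root.iter("placeName")],
        # the machine-readable date sits in the @when attribute, not the display text
        "dates": [el.get("when") for el in root.iter("date")],
    }


if __name__ == "__main__":
    print(extract_entities(FRAGMENT))
```

Note that the display text "18 March" and the normalised value "1848-03-18" live side by side. The markup holds the editor's interpretation without altering the transcribed text.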
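The n-gram definition on slide 55 translates almost directly into code: slide a window of length n over the tokens and keep every contiguous run.

The following is a minimal sketch. The crude tokenizer, which lower-cases and keeps runs of word characters, is an assumption. Counting such n-grams per year across a large book corpus is, in essence, what the Google Books Ngram Viewer charts.

```python
import re
from collections import Counter


def tokenize(text):
    """Deliberately simple: lower-case, then keep runs of word characters."""
    return re.findall(r"\w+", text.lower())


def ngrams(tokens, n):
    """Every contiguous sequence of n tokens, overlapping, in text order."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


if __name__ == "__main__":
    tokens = tokenize("To be, or not to be: that is the question.")
    print(ngrams(tokens, 2)[:3])
    print(Counter(ngrams(tokens, 2)).most_common(1))
```

A text of k tokens yields k - n + 1 n-grams. Bigrams such as ("to", "be") can therefore recur and be counted, which single-word counts cannot capture.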
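Topic modelling, as in Cameron Blevins's work on Martha Ballard's diary (slide 58), usually means Latent Dirichlet Allocation (LDA). LDA treats each document as a mixture of topics and each topic as a distribution over words.

Projects like this normally rely on dedicated toolkits. What follows is only a compact sketch of one standard way to fit LDA, collapsed Gibbs sampling, so that the idea is visible. The toy "diary entries", the hyperparameters `alpha` and `beta`, and the function names are illustrative assumptions. A corpus this small will not produce meaningful topics.

```python
import random
from collections import defaultdict

# Invented, pre-tokenised "diary entries"; a real corpus would have thousands.
TOY_DIARY = [
    ["snow", "cold", "wind", "snow"],
    ["birth", "midwife", "delivered", "baby"],
    ["cold", "wind", "snow", "ice"],
    ["midwife", "birth", "baby", "delivered"],
]


def lda_gibbs(docs, n_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return (topic-word, doc-topic) count tables."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]                # tokens in doc d assigned to topic k
    topic_word = [defaultdict(int) for _ in range(n_topics)]  # times word w is assigned to topic k
    topic_total = [0] * n_topics                              # tokens assigned to topic k overall
    assignments = []

    def update(d, w, k, delta):
        doc_topic[d][k] += delta
        topic_word[k][w] += delta
        topic_total[k] += delta

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        assignments.append([rng.randrange(n_topics) for _ in doc])
        for w, k in zip(doc, assignments[d]):
            update(d, w, k, +1)

    # Resample each token's topic given all other current assignments.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, assignments[d][i], -1)
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                new_k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = new_k
                update(d, w, new_k, +1)
    return topic_word, doc_topic


def top_words(topic_word, k, n=3):
    """The n most frequent words currently assigned to topic k."""
    counts = {w: c for w, c in topic_word[k].items() if c > 0}
    return sorted(counts, key=counts.get, reverse=True)[:n]


if __name__ == "__main__":
    topic_word, doc_topic = lda_gibbs(TOY_DIARY, n_topics=2, seed=1)
    for k in range(2):
        print(f"topic {k}:", top_words(topic_word, k))
```

The number of topics, `alpha` and `beta` are all chosen by the researcher, and the "topics" are word lists that still have to be named and interpreted. This is exactly the point about interpretation raised on slide 14.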
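Rieder and Röhle argue on slide 60 that theory is at work "when it comes to defining units of analysis". This can be seen even in a simple word count.

The following is a minimal sketch with an invented two-sentence text. Two choices change the counts before any "mining" starts: whether words are lower-cased, and which stop words are dropped. The function name and the sample text are hypothetical.

```python
import re
from collections import Counter

# Invented sample text.
TEXT = "Emancipation came late. The emancipation debate shaped the press."


def word_counts(text, lowercase=True, stopwords=frozenset()):
    """Count words; every parameter is a methodological choice, not a neutral default."""
    # [^\W\d_]+ keeps runs of letters only (digits and underscores are dropped).
    words = re.findall(r"[^\W\d_]+", text)
    if lowercase:
        words = [w.lower() for w in words]
    # Stop words are matched after the case decision above, so "The" survives
    # when lowercase=False and stopwords={"the"}.
    return Counter(w for w in words if w not in stopwords)


if __name__ == "__main__":
    print(word_counts(TEXT))                     # "emancipation" counted twice
    print(word_counts(TEXT, lowercase=False))    # "Emancipation" and "emancipation" split
    print(word_counts(TEXT, stopwords={"the"}))  # "the" removed altogether
```

The same text yields different "most frequent words" depending on settings that are easy to overlook. Data and data mining are not neutral (slide 59).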