Introduction for skills seminar on Search and Data Mining, Master of European History, University of Luxembourg, 11 December 2014
These are the slides for the introductory lecture that I gave as part of a skills seminar on Search and Data Mining (Luxembourg, 11 December 2014). The slides are rather visual and for the most part don’t include notes, yet I believe the gist of the talk will be clear. At the end links are included for tools, further reading and a link to the exercises we did.
4.
Why historians should be interested:
Old → New
Analogue resources → Digital resources
Small data → Big data
Close reading → Distant reading
CHANGE, SCALE, TECHNOLOGY
10.
the Big Data revolution?
Big data and claims about a paradigm change in the
humanities
Data driven history
Patterns and structures: a new essentialism?
Based upon changes of scale & method: humanities
supposedly becoming more ‘scientific’ > results can be
checked and replicated, but can they? Interpretation.
Politics: funding & valorisation
11.
“One of the problems confronting data enthusiasts in
the humanities is that we feel a need to convince our
more old-fashioned colleagues about what can be done.
But our role as advocates of data shouldn't mean that
we lose our critical sense as scholars.
[...] there is a risk that we look more carefully at the
technical components of the datasets than the
historical context of the information that they represent.”
Andrew Prescott, ‘The Deceptions of Data’, Digital Riffs (13
January 2013).
12.
Frédéric Clavert, ‘Lecture des sources historiennes à l’ère
numérique’ (14 November 2012)
Integrate approaches & methods / hybridity
16.
Searching the Internet in general:
Google
there is much more than Google
filter bubble? Take a look at: http://dontbubble.us
http://www.langreiter.com/exec/yahoo-vs-google.html
17.
Searching the Internet in general:
Google
there is much more than Google
filter bubble? Take a look at: http://dontbubble.us
http://yometa.com
23.
example of Compactmemory: a great resource on
German-Jewish history
24.
The collection comprises the 110 most important Jewish
newspapers and periodicals of the German-speaking world
from the years 1806-1938. The periodicals represent the
entire religious, political, social, literary or
scholarly range of the Jewish community.
but be aware of selection: focus on elites and organisations that
highlight German Jewry’s process of emancipation:
• classical vision in historiography on German Jewry?
• reinforcement of existing master narratives?
32.
http://eculture.cs.vu.nl/europeana/session/search
• Google / Bing / Yahoo
• there is much more ...
• results differ per search engine
• and there is a filter bubble
• --> in short: knowing what you are looking for and having a search strategy are crucial
Semantic web and linking data
36.
At its simplest, data mining is the process of extracting
new knowledge (usually in terms of previously unknown
patterns) from sets of data already in existence.
Jonathan Hagood
37.
Data mining (the analysis step of the "Knowledge Discovery in
Databases" process, or KDD), an interdisciplinary subfield of
computer science, is the computational process of discovering
patterns in large data sets involving methods at the intersection
of artificial intelligence, machine learning, statistics, and
database systems.
The overall goal of the data mining process is to extract
information from a data set and transform it into an
understandable structure for further use.
Wikipedia
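The two definitions above can be made concrete with a very small sketch. The example below is not taken from the lecture. It assumes an invented three-sentence corpus, an arbitrary stop-word list, and two hypothetical helper functions (`tokenize`, `term_frequencies`), and it shows only the simplest kind of "pattern": which terms recur across a set of texts.

```python
from collections import Counter
import re

# Hypothetical toy corpus, invented for illustration only.
documents = [
    "The archive holds letters and newspapers.",
    "Newspapers report on the archive and its letters.",
    "Historians read letters closely.",
]

# Small, arbitrary stop-word list (an assumption of this sketch);
# a real project would choose and document this list deliberately.
STOPWORDS = {"the", "and", "on", "its"}


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(docs):
    """Count how often each non-stop-word occurs across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(t for t in tokenize(doc) if t not in STOPWORDS)
    return counts


frequencies = term_frequencies(documents)
top_terms = frequencies.most_common(3)
print(top_terms)
```

Real data mining projects use far larger collections and richer methods (topic modelling, clustering, named entity recognition), but the basic move is the same: turn texts into countable units and look for regularities in the counts.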
42.
“What is too often forgotten, though, is that our
digital helpers are full of ‘theory’ and ‘judgement’
already. As with any methodology, they rely on sets
of assumptions, models, and strategies. Theory is
already at work on the most basic level when it
comes to defining units of analysis, algorithms, and
visualisation procedures.”
Bernhard Rieder and Theo Röhle, ‘Digital Methods: Five
Challenges’ in: David M Berry ed., Understanding Digital
Humanities (Houndmills: Palgrave Macmillan, 2012) 67-85,
70.
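A minimal sketch of the point about "defining units of analysis", assuming an invented sentence and a hypothetical helper `count_tokens`: the same text yields different counts depending on whether upper and lower case are treated as the same word. Neither choice is neutral, which is one small instance of theory being built into a tool.

```python
from collections import Counter
import re

# Hypothetical sentence, invented for illustration only.
text = "The Jewish press and the jewish community: press, Press, PRESS."


def count_tokens(text, lowercase):
    """Count word tokens, optionally folding case before counting."""
    if lowercase:
        text = text.lower()
    return Counter(re.findall(r"[A-Za-z]+", text))


# Two defensible definitions of "a word" give two different results.
case_sensitive = count_tokens(text, lowercase=False)
case_folded = count_tokens(text, lowercase=True)

print(case_sensitive["press"], case_folded["press"])
```

Comparable decisions (stop-word lists, stemming, how OCR errors are handled) are usually made inside a tool before the historian sees any result.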
46.
Tools & workflows
Voyant Tools
Voyant Tools Documentation
Programming Historian
DIRT: Digital Research Tools
Turkel, William J., Kevin Kee, and Spencer Roberts, ‘A
Method for Navigating the Infinite Archive’ in: Toni
Weller ed., History in the Digital Age (London; New
York: Routledge, 2013).
William J. Turkel: How To
47.
Further reading
Special issue on Digital History, BMGN - Low Countries Historical Review, 128/4 (2013).
Haber, Peter, Digital Past: Geschichtswissenschaft im digitalen Zeitalter (München:
Oldenbourg Verlag, 2011).
Boonstra, Onno, Leen Breure, and Peter Doorn, Past, Present and Future of Historical
Information Science (Amsterdam: NIWI-KNAW, 2004).
Ciravegna, Fabio, Mark Greengrass, Tim Hitchcock, Sam Chapman, Jamie McLaughlin,
and Ravish Bhagdev, ‘Finding Needles in Haystacks: Data-Mining in Distributed
Historical Datasets’ in: Mark Greengrass and Lorna M Hughes eds., The Virtual
Representation of the Past (Ashgate, 2008).
Cohen, D, F Gibbs, T Hitchcock, G Rockwell, J Sander, R Shoemaker, S Sinclair, S Takats,
W J Turkel, and C Briquet. "Data Mining with Criminal Intent." Final white paper (2011).
Hagood, Jonathan, "A Brief Introduction to Data Mining Projects in the Humanities."
Bulletin of the American Society for Information Science and Technology 38/4 (2012).
Hitchcock, Tim, "Big Data for Dead People: Digital Readings and the Conundrums of
Positivism." (9 December 2013).
Leonard, Peter, "Mining Large Datasets for the Humanities", IFLA WLIC 2014.
48.
Dr. Gerben Zaagsma
http://gerbenzaagsma.org
de.linkedin.com/in/gerbenzaagsma/
https://twitter.com/gerbenzaagsma
https://uni-goettingen.academia.edu/GerbenZaagsma
https://www.researchgate.net/profile/Gerben_Zaagsma
https://www.slideshare.net/gerbenzaagsma
49.
Image credits
The Field Museum Library, Hall 37 Geology overview. URL: https://www.flickr.com/photos/field_museum_library/3333920156/in/set-72157614881700424.
The U.S. National Archives, Photograph of Card Catalog in Central Search Room, 1942. URL: http://www.flickr.com/photos/usnationalarchives/3873932255/.
Witch computer 1951: Wolverhampton and Staffordshire College of Technology in 1961, The National Computing Museum and Computer Conservation Society/UKAEA/Wolverhampton Express and Star, via: http://www.wired.com/2009/09/britan-oldest-computer/.
Code: https://www.flickr.com/photos/lord_james/4696338852/.
Tools: Flickr Commons
The droids we're googling for: https://www.flickr.com/photos/st3f4n/3951143570/.
Jaws (Steven Spielberg) original movie poster: https://en.wikipedia.org/wiki/File:JAWS_Movie_poster.jpg
Structured/unstructured data: http://www.emc.com/collateral/demos/microsites/emc-digital-universe-2011/index.htm
Macbook Data Mining: http://www.flickr.com/photos/17208993@N00/442531562/.
Topic Modeling Martha Ballard’s Diary: http://www.cameronblevins.org/posts/topic-modeling-martha-ballards-diary/.
Boolean operators: http://uksourcers.co.uk/2012/capital-letters-the-key-to-boolean-success/
Miami University students in laboratory classroom 1908: https://www.flickr.com/photos/muohio_digital_collections/3199691495/