Introduction for skills seminar on Search and Data Mining, Master of European History, University of Luxembourg, 11 December 2014

These are the slides for the introductory lecture that I gave as part of a skills seminar on Search and Data Mining (Luxembourg, 11 December 2014). The slides are rather visual and for the most part don’t include notes, yet I believe the gist of the talk will be clear. At the end links are included for tools, further reading and a link to the exercises we did.

  1. Search & Data Mining. SKILLS SEMINAR. Master of European History, University of Luxembourg, 11 December 2014. Gerben Zaagsma, Lichtenberg-Kolleg,
  2. Overview: 1. Introduction search & data mining; 2. Tools; 3. Practical exercises
  3. Code yourself… …or use existing tools
  4. Why historians should be interested. Old vs. new: analogue resources vs. digital resources; small data vs. big data; close reading vs. distant reading. Keywords: CHANGE, SCALE, TECHNOLOGY
  5. The Big Data revolution? Big data and claims about a paradigm change in the humanities
  6. Culturomics and Google ngrams
  7. The Big Data revolution? Big data and claims about a paradigm change in the humanities. Data-driven history
  8. The Big Data revolution? Big data and claims about a paradigm change in the humanities. Data-driven history. Patterns and structures: a new essentialism?
  9. The Big Data revolution? Big data and claims about a paradigm change in the humanities. Data-driven history. Patterns and structures: a new essentialism? Based upon changes of scale & method: humanities supposedly becoming more ‘scientific’ > results can be checked and replicated, but can they? Interpretation.
  10. The Big Data revolution? Big data and claims about a paradigm change in the humanities. Data-driven history. Patterns and structures: a new essentialism? Based upon changes of scale & method: humanities supposedly becoming more ‘scientific’ > results can be checked and replicated, but can they? Interpretation. Politics: funding & valorisation
  11. “One of the problems confronting data enthusiasts in the humanities is that we feel a need to convince our more old-fashioned colleagues about what can be done. But our role as advocates of data shouldn't mean that we lose our critical sense as scholars. [...] there is a risk that we look more carefully at the technical components of the datasets than the historical context of the information that they represent.” Andrew Prescott, ‘The Deceptions of Data’, Digital Riffs (13 January 2013).
  12. Frédéric Clavert, ‘Lecture des sources historiennes à l’ère numérique’ (14 November 2012). Integrate approaches & methods / hybridity
  13. 1. SEARCH
  14. Google / Bing / Yahoo: there is much more ...
  15. Searching the Internet in general: Google. There is much more than Google. Filter bubble? Take a look at: http://dontbubble.us
  16. Searching the Internet in general: Google. There is much more than Google. Filter bubble? Take a look at: http://dontbubble.us http://www.langreiter.com/exec/yahoo-vs-google.html
  17. Searching the Internet in general: Google. There is much more than Google. Filter bubble? Take a look at: http://dontbubble.us http://yometa.com
  18. Filter bubble? http://www.thefilterbubble.com
  19. Filter bubble? http://www.thefilterbubble.com
  20. Web search round-up: differences between search engines; filter bubble; deep web versus visible web
  21. Searching digital libraries & archives…
  22. Composition of resources, selection…
  23. Example of Compactmemory: a great resource on German-Jewish history
  24. “The collection comprises the 110 most important Jewish newspapers and periodicals of the German-speaking world from the years 1806-1938. The periodicals represent the entire religious, political, social, literary or scholarly range of the Jewish community.” But be aware of selection: the focus is on elites and organisations that highlight German Jewry’s process of emancipation: • classical vision in historiography on German Jewry? • reinforcement of existing master narratives?
  25. Mind the context…
  26. Processing and searching data on your own computer…
  27. 1. DATA MINING
  28. Data? Data = computer-processable information
  29. Example of structured data
  30. Many digital libraries/archives: un-/semi-structured data
  31. Digital editions: bridging the gap with XML
  32. http://eculture.cs.vu.nl/europeana/session/search • Google / Bing / Yahoo • there is much more ... • results differ per search engine • and there is a filter bubble • --> in short: knowing what you are looking for and having a search strategy are crucial. Semantic web and linking data
  33. • Google / Bing / Yahoo • there is much more ... • results differ per search engine • and there is a filter bubble • --> in short: knowing what you are looking for and having a search strategy are crucial. cs.vu.nl/europeana/session/search
  34. • Google / Bing / Yahoo • there is much more ... • results differ per search engine • and there is a filter bubble • --> in short: knowing what you are looking for and having a search strategy are crucial
  35. Some definitions of data mining:
  36. “At its simplest, data mining is the process of extracting new knowledge (usually in terms of previously unknown patterns) from sets of data already in existence.” (Jonathan Hagood)
  37. “Data mining (the analysis step of the "Knowledge Discovery in Databases" process, or KDD), an interdisciplinary subfield of computer science, is the computational process of discovering patterns in large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use.” (Wikipedia)
  38. Examples of projects and techniques
  39. An n-gram is a contiguous sequence of n items from a given sequence of text or speech
  40. Topic Modeling Martha Ballard’s Diary
  41. Data? Data & data mining ≠ neutral
  42. “What is too often forgotten, though, is that our digital helpers are full of ‘theory’ and ‘judgement’ already. As with any methodology, they rely on sets of assumptions, models, and strategies. Theory is already at work on the most basic level when it comes to defining units of analysis, algorithms, and visualisation procedures.” Bernhard Rieder and Theo Röhle, ‘Digital Methods: Five Challenges’ in: David M Berry ed., Understanding Digital Humanities (Houndmills: Palgrave Macmillan, 2012) 67-85, 70.
  43. 2. TOOLS
  44. 3. Practical exercises
  45. Overview of exercises: http://goo.gl/72fCn7
  46. Tools & workflows: Voyant Tools; Voyant Tools Documentation; Programming Historian; DIRT: Digital Research Tools; Turkel, William J., Kevin Kee, and Spencer Roberts, ‘A Method for Navigating the Infinite Archive’ in: Toni Weller ed., History in the Digital Age (London; New York: Routledge, 2013); William J. Turkel: How To
  47. Further reading: Special issue on Digital History, BMGN - Low Countries Historical Review, 128/4 (2013). Haber, Peter, Digital Past: Geschichtswissenschaft im digitalen Zeitalter (München: Oldenbourg Verlag, 2011). Boonstra, Onno, Leen Breure, and Peter Doorn, Past, Present and Future of Historical Information Science (Amsterdam: NIWI-KNAW, 2004). Ciravegna, Fabio, Mark Greengrass, Tim Hitchcock, Sam Chapman, Jamie McLaughlin, and Ravish Bhagdev, ‘Finding Needles in Hay-Stacks: Data-Mining in Distributed Historical Datasets’ in: Mark Greengrass and Lorna M Hughes eds., The Virtual Representation of the Past (Ashgate, 2008). Cohen, D, F Gibbs, T Hitchcock, G Rockwell, J Sander, R Shoemaker, S Sinclair, S Takats, W J Turkel, and C Briquet, ‘Data Mining with Criminal Intent’, Final white paper (2011). Hagood, Jonathan, ‘A Brief Introduction to Data Mining Projects in the Humanities’, Bulletin of the American Society for Information Science and Technology 38/4 (2012). Hitchcock, Tim, ‘Big Data for Dead People: Digital Readings and the Conundrums of Positivism’ (9 December 2013). Leonard, Peter, ‘Mining Large Datasets for the Humanities’, IFLA WLIC 2014.
  48. Dr. Gerben Zaagsma: http://gerbenzaagsma.org de.linkedin.com/in/gerbenzaagsma/ https://twitter.com/gerbenzaagsma https://uni-goettingen.academia.edu/GerbenZaagsma https://www.researchgate.net/profile/Gerben_Zaagsma https://www.slideshare.net/gerbenzaagsma
  49. Image credits: The Field Museum Library, Hall 37 Geology overview. URL: https://www.flickr.com/photos/field_museum_library/3333920156/in/set-72157614881700424. The U.S. National Archives, Photograph of Card Catalog in Central Search Room, 1942. URL: http://www.flickr.com/photos/usnationalarchives/3873932255/. Witch computer 1951: Wolverhampton and Staffordshire College of Technology in 1961, The National Computing Museum and Computer Conservation Society/UKAEA/Wolverhampton Express and Star, via: http://www.wired.com/2009/09/britan-oldest-computer/. Code: https://www.flickr.com/photos/lord_james/4696338852/. Tools: Flickr Commons. The droids we're googling for: https://www.flickr.com/photos/st3f4n/3951143570/. Jaws (Steven Spielberg) original movie poster: https://en.wikipedia.org/wiki/File:JAWS_Movie_poster.jpg. Structured/unstructured data: http://www.emc.com/collateral/demos/microsites/emc-digital-universe-2011/index.htm. Macbook Data Mining: http://www.flickr.com/photos/17208993@N00/442531562/. Topic Modeling Martha Ballard’s Diary: http://www.cameronblevins.org/posts/topic-modeling-martha-ballards-diary/. Boolean operators: http://uksourcers.co.uk/2012/capital-letters-the-key-to-boolean-success/. Miami University students in laboratory classroom 1908: https://www.flickr.com/photos/muohio_digital_collections/3199691495/
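
The slide on n-grams defines an n-gram as a contiguous sequence of n items from a given sequence of text or speech. As a rough illustration of that definition in the spirit of "code yourself", here is a minimal Python sketch. It assumes that the items are words obtained by simple whitespace splitting; the function name `ngrams` and the sample sentence are made up for this example and do not come from the seminar material or its exercises.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple, in order."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # If n is larger than the number of tokens, the range is empty
    # and an empty list is returned.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical sample sentence (an assumption, not a historical source).
text = "the archive of the city and the archive of the state"

# Naive tokenization: lowercase and split on whitespace.
tokens = text.lower().split()

bigrams = ngrams(tokens, 2)
counts = Counter(bigrams)

# Show the three most frequent word pairs with their counts.
print(counts.most_common(3))
```

Counting n-grams per year across a large corpus is, in essence, what an ngram viewer plots. Real sources would need more careful tokenization (punctuation, OCR errors, spelling variation) than the whitespace split assumed here.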
