Introduction for skills seminar on Search and Data Mining, Master of European History, University of Luxembourg, 11 December 2014

These are the slides for the introductory lecture that I gave as part of a skills seminar on Search and Data Mining (Luxembourg, 11 December 2014). The slides are rather visual and for the most part don’t include notes, yet I believe the gist of the talk will be clear. At the end, links are included to tools, further reading and the exercises we did.

  1. Search & Data Mining SKILLS SEMINAR. Master of European History, University of Luxembourg, 11 December 2014. Gerben Zaagsma, Lichtenberg-Kolleg,
  2. Overview: 1. Introduction: search & data mining; 2. Tools; 3. Practical exercises
  3. Code yourself… …or use existing tools
  4. Why historians should be interested (CHANGE, SCALE, TECHNOLOGY): old vs. new: analogue resources vs. digital resources; small data vs. big data; close reading vs. distant reading
  5. The Big Data revolution? Big data and claims about a paradigm change in the humanities.
  6. Culturomics and Google ngrams
  7. The Big Data revolution? Big data and claims about a paradigm change in the humanities. Data-driven history.
  8. The Big Data revolution? Big data and claims about a paradigm change in the humanities. Data-driven history. Patterns and structures: a new essentialism?
  9. The Big Data revolution? Big data and claims about a paradigm change in the humanities. Data-driven history. Patterns and structures: a new essentialism? Based upon changes of scale & method: the humanities supposedly becoming more ‘scientific’ → results can be checked and replicated, but can they? Interpretation.
  10. The Big Data revolution? Big data and claims about a paradigm change in the humanities. Data-driven history. Patterns and structures: a new essentialism? Based upon changes of scale & method: the humanities supposedly becoming more ‘scientific’ → results can be checked and replicated, but can they? Interpretation. Politics: funding & valorisation.
  11. “One of the problems confronting data enthusiasts in the humanities is that we feel a need to convince our more old-fashioned colleagues about what can be done. But our role as advocates of data shouldn't mean that we lose our critical sense as scholars. [....] there is a risk that we look more carefully at the technical components of the datasets than the historical context of the information that they represent.” Andrew Prescott, ‘The Deceptions of Data’, Digital Riffs (13 January 2013).
  12. Frédéric Clavert, ‘Lecture des sources historiennes à l’ère numérique’ (14 November 2012). Integrate approaches & methods / hybridity
  13. 1. SEARCH
  14. Google / Bing / Yahoo: there is much more ...
  15. Searching the Internet in general: Google. There is much more than Google. Filter bubble? Take a look at: http://dontbubble.us
  16. Searching the Internet in general: Google. There is much more than Google. Filter bubble? Take a look at: http://dontbubble.us http://www.langreiter.com/exec/yahoo-vs-google.html
  17. Searching the Internet in general: Google. There is much more than Google. Filter bubble? Take a look at: http://dontbubble.us http://yometa.com
  18. Filter bubble? http://www.thefilterbubble.com
  19. Filter bubble? http://www.thefilterbubble.com
  20. Web search round-up: differences between search engines; the filter bubble; the deep web versus the visible web
  21. Searching digital libraries & archives…
  22. Composition of resources, selection…
  23. Example of Compactmemory: a great resource on German-Jewish history
  24. The collection comprises the 110 most important Jewish newspapers and periodicals of the German-speaking area from the years 1806-1938. The periodicals represent the entire religious, political, social, literary or scholarly spectrum of the Jewish community. But be aware of selection: a focus on elites and organisations that highlight German Jewry’s process of emancipation: • a classical vision in historiography on German Jewry? • reinforcement of existing master narratives?
  25. Mind the context…
  26. Processing and searching data on your own computer…
  27. 1. DATA MINING
  28. Data? Data = computer-processable information
  29. Example of structured data
  30. Many digital libraries/archives: un-/semi-structured data
  31. Digital editions: bridging the gap with XML
  32. http://eculture.cs.vu.nl/europeana/session/search • Google / Bing / Yahoo • there is much more ... • results differ per search engine • and there is a filter bubble • --> in short: knowing what you are looking for and having a search strategy are crucial. Semantic web and linking data
  33. • Google / Bing / Yahoo • there is much more ... • results differ per search engine • and there is a filter bubble • --> in short: knowing what you are looking for and having a search strategy are crucial. cs.vu.nl/europeana/session/search
  34. • Google / Bing / Yahoo • there is much more ... • results differ per search engine • and there is a filter bubble • --> in short: knowing what you are looking for and having a search strategy are crucial.
  35. Some definitions of data mining:
  36. “At its simplest, data mining is the process of extracting new knowledge (usually in terms of previously unknown patterns) from sets of data already in existence.” Jonathan Hagood
  37. “Data mining (the analysis step of the "Knowledge Discovery in Databases" process, or KDD), an interdisciplinary subfield of computer science, is the computational process of discovering patterns in large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use.” Wikipedia
  38. Examples of projects and techniques
  39. An n-gram is a contiguous sequence of n items from a given sequence of text or speech.
  40. Topic Modeling Martha Ballard’s Diary
  41. Data? Data & data mining ≠ neutral
  42. “What is too often forgotten, though, is that our digital helpers are full of ‘theory’ and ‘judgement’ already. As with any methodology, they rely on sets of assumptions, models, and strategies. Theory is already at work on the most basic level when it comes to defining units of analysis, algorithms, and visualisation procedures.” Bernhard Rieder and Theo Röhle, ‘Digital Methods: Five Challenges’ in: David M Berry ed., Understanding Digital Humanities (Houndmills: Palgrave Macmillan, 2012) 67-85, 70.
  43. 2. TOOLS
  44. 3. Practical exercises
  45. Overview of exercises: http://goo.gl/72fCn7
  46. Tools & workflows: • Voyant Tools • Voyant Tools Documentation • Programming Historian • DIRT: Digital Research Tools • Turkel, William J., Kevin Kee, and Spencer Roberts, ‘A Method for Navigating the Infinite Archive’ in: Toni Weller ed., History in the Digital Age (London; New York: Routledge, 2013). • William J. Turkel: How To
  47. Further reading: • Special issue on Digital History, BMGN - Low Countries Historical Review, 128/4 (2013). • Haber, Peter, Digital Past: Geschichtswissenschaft im digitalen Zeitalter (München: Oldenbourg Verlag, 2011). • Boonstra, Onno, Leen Breure, and Peter Doorn, Past, Present and Future of Historical Information Science (Amsterdam: NIWI-KNAW, 2004). • Ciravegna, Fabio, Mark Greengrass, Tim Hitchcock, Sam Chapman, Jamie McLaughlin, and Ravish Bhagdev, ‘Finding Needles in Haystacks: Data-Mining in Distributed Historical Datasets’ in: Mark Greengrass and Lorna M Hughes eds., The Virtual Representation of the Past (Ashgate, 2008). • Cohen, D, F Gibbs, T Hitchcock, G Rockwell, J Sander, R Shoemaker, S Sinclair, S Takats, W J Turkel, and C Briquet, ‘Data Mining with Criminal Intent’, final white paper (2011). • Hagood, Jonathan, ‘A Brief Introduction to Data Mining Projects in the Humanities’, Bulletin of the American Society for Information Science and Technology 38/4 (2012). • Hitchcock, Tim, ‘Big Data for Dead People: Digital Readings and the Conundrums of Positivism’ (9 December 2013). • Leonard, Peter, ‘Mining Large Datasets for the Humanities’, IFLA WLIC 2014.
  48. Dr. Gerben Zaagsma: http://gerbenzaagsma.org de.linkedin.com/in/gerbenzaagsma/ https://twitter.com/gerbenzaagsma https://uni-goettingen.academia.edu/GerbenZaagsma https://www.researchgate.net/profile/Gerben_Zaagsma https://www.slideshare.net/gerbenzaagsma
  49. Image credits: The Field Museum Library, Hall 37 Geology overview. URL: https://www.flickr.com/photos/field_museum_library/3333920156/in/set-72157614881700424. The U.S. National Archives, Photograph of Card Catalog in Central Search Room, 1942. URL: http://www.flickr.com/photos/usnationalarchives/3873932255/. Witch computer 1951: Wolverhampton and Staffordshire College of Technology in 1961, The National Computing Museum and Computer Conservation Society/UKAEA/Wolverhampton Express and Star, via: http://www.wired.com/2009/09/britan-oldest-computer/. Code: https://www.flickr.com/photos/lord_james/4696338852/. Tools: Flickr Commons. The droids we're googling for: https://www.flickr.com/photos/st3f4n/3951143570/. Jaws (Steven Spielberg) original movie poster: https://en.wikipedia.org/wiki/File:JAWS_Movie_poster.jpg. Structured/unstructured data: http://www.emc.com/collateral/demos/microsites/emc-digital-universe-2011/index.htm. Macbook Data Mining: http://www.flickr.com/photos/17208993@N00/442531562/. Topic Modeling Martha Ballard’s Diary: http://www.cameronblevins.org/posts/topic-modeling-martha-ballards-diary/. Boolean operators: http://uksourcers.co.uk/2012/capital-letters-the-key-to-boolean-success/. Miami University students in laboratory classroom 1908: https://www.flickr.com/photos/muohio_digital_collections/3199691495/
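Slide 6 refers to culturomics and Google ngrams. Tools such as the Ngram Viewer chart how often a word occurs in each year relative to the total amount of text for that year, rather than raw counts. Raw counts mostly reflect how much was printed. The sketch below illustrates only that normalisation idea and is not the Ngram Viewer's actual implementation. It assumes a hypothetical toy corpus that is already split into words per year.

```python
from collections import Counter


def relative_frequency(term, tokens_by_year):
    """Return, per year, the share of all tokens that equal `term` (case-insensitive)."""
    result = {}
    for year, tokens in sorted(tokens_by_year.items()):
        counts = Counter(token.lower() for token in tokens)
        total = sum(counts.values())
        # Guard against empty years so we never divide by zero.
        result[year] = counts[term.lower()] / total if total else 0.0
    return result


# Hypothetical toy corpus: one list of whitespace-split words per year.
corpus = {
    1900: "the war and the peace".split(),
    1910: "war war and more war".split(),
}
print(relative_frequency("war", corpus))  # {1900: 0.2, 1910: 0.6}
```

Because both years contain five words, the normalised shares and the raw counts tell the same story here. With real corpora, where the volume of text per year varies a lot, they can diverge sharply.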
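Slide 26 concerns processing and searching data on your own computer, and the image credits mention Boolean operators. The sketch below shows one common way to run AND / OR / NOT searches over a small local collection. It builds an inverted index that maps each word to the documents it occurs in and then combines the resulting sets. The file names and texts are hypothetical placeholders, and real search tools add much more, such as phrase search, stemming and ranking.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Inverted index: map each lower-cased word to the names of the documents containing it."""
    index = defaultdict(set)
    for name, text in docs.items():
        # \w+ is Unicode-aware in Python 3, so words such as "Jüdische" stay intact.
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(name)
    return index


def search_and(index, *terms):
    """Boolean AND: documents that contain every term."""
    hits = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*hits) if hits else set()


def search_or(index, *terms):
    """Boolean OR: documents that contain at least one term."""
    return set().union(*(index.get(term.lower(), set()) for term in terms))


def search_not(index, include, exclude):
    """Boolean NOT: documents with `include` but without `exclude`."""
    return index.get(include.lower(), set()) - index.get(exclude.lower(), set())


# Hypothetical file names and contents; in practice you would read your own text files.
docs = {
    "doc1.txt": "Emancipation and the Jewish press in Berlin",
    "doc2.txt": "The press in Vienna",
    "doc3.txt": "Emancipation debates in Vienna",
}
index = build_index(docs)
print(sorted(search_and(index, "press", "Vienna")))        # ['doc2.txt']
print(sorted(search_not(index, "emancipation", "vienna")))  # ['doc1.txt']
```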
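Slide 31 presents XML as the bridge between unstructured text and structured data in digital editions. Once persons, places and dates are tagged, they can be extracted like database fields. The sketch below uses Python's standard `xml.etree.ElementTree` on a hypothetical, TEI-inspired fragment with placeholder names. Real TEI files declare the namespace `http://www.tei-c.org/ns/1.0`, so tag lookups on them would need that namespace, which this simplified sample omits.

```python
import xml.etree.ElementTree as ET

# Hypothetical, TEI-inspired fragment with placeholder names; it is not a
# complete or validated TEI document and deliberately omits the TEI namespace.
SAMPLE = """
<text>
  <body>
    <p>On <date when="1900-01-01">1 January 1900</date> <persName>Person A</persName>
       wrote from <placeName>Berlin</placeName>.</p>
    <p><persName>Person B</persName> replied from <placeName>Vienna</placeName>.</p>
  </body>
</text>
"""


def extract_entities(xml_string):
    """Collect tagged persons, places and normalised dates from an XML string."""
    root = ET.fromstring(xml_string.strip())
    return {
        "persons": [el.text for el in root.iter("persName")],
        "places": [el.text for el in root.iter("placeName")],
        # The machine-readable date sits in the @when attribute, not the display text.
        "dates": [el.get("when") for el in root.iter("date")],
    }


print(extract_entities(SAMPLE))
```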
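Slide 39 defines an n-gram as a contiguous sequence of n items from a sequence of text. A minimal Python sketch of that definition follows. It assumes the text has already been split into words on whitespace, and a real project would need to decide how to handle case, punctuation and spelling variants first.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous sequence of n items in `tokens`, as tuples."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "the history of the history of europe".split()
bigram_counts = Counter(ngrams(tokens, 2))
print(bigram_counts[("the", "history")])  # 2
```

Counting n-grams in this way, and then comparing the counts across documents or years, underlies tools like the one mentioned on slide 6.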
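The quotation on slide 42 notes that theory is already at work when “defining units of analysis”. A tiny illustration follows, using a made-up sentence. Two equally plausible tokenisation choices give different counts for the same word. The second choice also has its own built-in assumption: the pattern `[a-z]+` would break up words with non-ASCII letters such as German umlauts.

```python
import re
from collections import Counter

# A made-up example sentence, not taken from any source.
text = "War! The war ended, but war-time memories of the War remained."

# Choice 1: split on whitespace only; case and punctuation stay attached to words.
naive = Counter(text.split())
# Choice 2: lower-case and keep runs of letters a-z only, so "war-time" yields "war".
normalised = Counter(re.findall(r"[a-z]+", text.lower()))

print(naive["war"], normalised["war"])  # 1 4
```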