Ihr june15-evans

Digital History Seminar and Archives and Society Seminar
Institute of Historical Research
23 June 2015
http://ihrdighist.blogs.sas.ac.uk/2015/06/15/23-june-2015-exploring-big-and-small-historical-datasets-reflections-on-two-recent-projects/

  1. NLP and Data Mining: From ChartEx to Traces Through Time and beyond. Dr Roger Evans, Natural Language Technology Group & Cultural Informatics Research Group, University of Brighton
  2. One man, two guvnors: ChartEx; TTT; ‘Deep’ processing
  3. Two men, two guvnors: ChartEx; TTT; Natural language processing; Data mining
  4. Two men, two guvnors: ChartEx; TTT; Natural language processing; Data mining; Brighton; Leiden
  5. ChartEx Architecture (diagram labels): 1000’s of charters; Virtual workbench; Data mining; Natural language processing; DM development; NLP development; 5-10 charters; Markup scheme; Expert elicitation; 100-200 charters; Marked-up charters; Manual markup; ChartEx repository; VWB development; VWB requirements; Repository development
  6. ChartEx Architecture (labels as slide 5, plus): Runtime architecture
  7. TTT architecture (diagram labels): Record Linkage; Visualisation; Shallow language processing; Extract content; Deep language processing; Documents; Optimisation/statistics
  8. Comparison (diagram labels): Record Linkage; Visualisation; Shallow language processing; Extract content; Deep language processing; Documents; Optimisation/statistics; 1000’s of charters; Virtual workbench; Data mining; Natural language processing; ChartEx repository
  9. Comparison (labels as slide 8, plus): Range of data: Medieval charters; English and Latin; Early and modern; Free text; Text and data
  10. Comparison (labels as slide 9, plus): Analytic Complexity: Focus on people; Detailed view; Focus on places; Broad relational view
  11. Comparison (labels as slide 10, plus): Target users: ‘Researchers’; Controlled environment; Web users; Less control
  12. Comparison (labels as slide 11, plus): (Heritage) Enterprise; Bespoke
  13. What can Computer Science do?
      • State of the art is broadly based on statistics
      • Answers are always only approximate
      • Different kinds of approximation:
        • Precision: focus on making sure answers are right (but may miss some)
        • Recall: focus on getting as many right answers as possible (but may give some wrong answers too)
  14. Precision and recall
  15. What does Digital Humanities want?
      • Perfect results?
        • How do you respond if we say we can’t do that?
      • Control over tradeoff?
        • How easy is it to understand what control you have?
        • Does this help you interpret the results you get?
  16. Where are we now, and where are we going?
      • Human in the loop
        • Tools always require human interpretation of results
        • Is this really just a cop out by computer scientists?
        • Or just a pragmatic expression of the state of the art?
      • Deskilling
        • Do we really mean an expert in the loop?
      • Conversations
        • Are we really only just at the point of negotiating what is possible and what is required?
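
The precision and recall slides, and the question about control over the tradeoff, can be illustrated with a small sketch. It is not taken from ChartEx or TTT: the link names, confidence scores and set of correct links below are invented for illustration, and treating an empty answer set as having precision 1.0 is just one common convention. Precision is the share of returned answers that are right, TP / (TP + FP); recall is the share of right answers that are returned, TP / (TP + FN).

```python
from typing import Dict, Iterable, List, Set, Tuple


def precision_recall(predicted: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Return (precision, recall) for a set of returned answers."""
    true_positives = len(predicted & relevant)
    # Convention (an assumption): with no answers returned, nothing is wrong,
    # so precision is reported as 1.0; likewise recall when nothing is relevant.
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / len(relevant) if relevant else 1.0
    return precision, recall


def answers_above(scored: Dict[str, float], threshold: float) -> Set[str]:
    """Keep only the candidate answers whose score reaches the threshold."""
    return {item for item, score in scored.items() if score >= threshold}


def tradeoff(
    scored: Dict[str, float], relevant: Set[str], thresholds: Iterable[float]
) -> List[Tuple[float, float, float]]:
    """Return (threshold, precision, recall) for each threshold."""
    return [
        (t, *precision_recall(answers_above(scored, t), relevant))
        for t in thresholds
    ]


# Hypothetical confidence scores for candidate links between records.
scored_links = {
    "link_a": 0.95,
    "link_b": 0.80,
    "link_c": 0.60,
    "link_d": 0.40,
    "link_e": 0.20,
}
# Hypothetical ground truth, as an expert might have marked it up by hand.
correct_links = {"link_a", "link_b", "link_d"}

if __name__ == "__main__":
    for t, p, r in tradeoff(scored_links, correct_links, [0.9, 0.5, 0.1]):
        print(f"threshold={t:.1f}  precision={p:.2f}  recall={r:.2f}")
```

With these invented numbers, a strict threshold returns only answers that are right but misses most of the correct links, while a lenient one finds every correct link at the cost of some wrong answers. The threshold is the kind of control a user could be given over the tradeoff.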
