
07 verheul texcavator

KB symposium on historical newspapers as big data,
The Hague, 24 March 2015

Published in: Government & Nonprofit

  1. Toine Pieters and Jaap Verheul, Utrecht University, The Netherlands. Texcavator: Text Mining Historical Newspapers
  2. Overview
     • Translantis research project
       • Concept of reference cultures
       • Digital humanities
     • Texcavator tool
       • Requirements
       • Features
       • Configuration
     • Texcavator use cases
     • Future ambitions
       • Challenges
       • Cultural Text Mining
     KB Big Data Conference, 24 March 2015
  3. Translantis research project (translantis.nl)
  4. Translantis
     • Topic: emergence of the United States in public discourse in the Netherlands, 1890-1990
     • Concept: transnational reference cultures
     • Method: digital humanities, text mining
     • Translantis.nl
  5. Culture Mining
     • Culture: ideas, knowledge, practices
     • Public sphere: public opinion, citizens engaging in enlightened debate
     • Public media: periodicals, radio, TV, internet
     • Digitized newspapers: sample of 10% of all printed newspapers
     • Mediation
  6. Texcavator (translantis.nl)
  7. Texcavator
     • Generic tool for cultural text mining and big data research
     • Enables scholars to systematically search very large quantities of textual data in a reliable and reproducible way
     • Able to support exploration and contextualization
     • Serves multiple user groups
       • Wide community of historians using big data
       • Translantis team (NWO-funded)
       • Asymmetrical Encounters team (HERA-funded)
  8. Features
     • Direct access to big data repository
     • Integrated text-mining tools
       • Boolean search
       • Named Entity Recognition
       • Sentiment mining
       • Stemming
     • Real-time visualization of search results
       • Dynamic word clouds (and export of underlying data)
       • Timelines (normalized, bursts)
     • Input-output storage
     • Close and distant reading
  9. Current configuration
     • Digitized newspapers (National Library), 9 million pages
     • Texcavator interface
     • Elasticsearch (500 GB)
     • xTAS
  10. Current configuration
     • Digitized newspapers (National Library), 9 million pages
     • Texcavator interface
     • Elasticsearch (500 GB): real-time, scalable indexing
     • xTAS: eXtensible Text Analysis Suite
  11. Use cases: Buffalo Bill, Coca-Cola, Taylorism
  12. Records and word cloud
  13. Timeline and word cloud of one “burst” (1965); normalized timeline
  14. Access to original
  15. Configuration
  16. Visualizing historical change
  17. Soft drinks
     • References to both Coca-Cola and America in advertisements
     • Explain the peaks and troughs
  18. Soft drinks
     • References to Coca-Cola without America in advertisements
     • Explain the peak
  19. Topic modeling and GIS
  20. Taylorism
     • Voyant word cloud of the “wetenschappelijke bedrijfsleiding” (scientific management) dataset
     • References over time within the “wetenschappelijke bedrijfsleiding” dataset to “Taylor”, “taylor-stelsel”, “Taylor-systeem”
  21. Ambitions: challenges and opportunities
  22. Challenges
     • Software development
       • Stable version of Texcavator
       • Intuitive interface
       • Additional features
     • Technological
       • Processor and server capacity
       • Data exchange and standardization (metatags)
       • OCR
     • Scientific
       • Combining close and distant reading
       • Reproducibility
  23. Cultural Text Mining
     • Mining of cultural aspects of entities and events
       • Concepts, mentalities, ideas, utopias, etc.
     • Mining for Meaning
       • Towards digital conceptual history or digital history of mentalities
     • Address macro-historical questions:
       • Trends, patterns, structures in debates
       • Circulation of knowledge
       • Emergence of transnational reference cultures
  24. Thank you!
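
The slides name Boolean search, normalized timelines and "bursts" but do not show how they are computed. The following is a minimal Python sketch of one way these ideas could fit together, using the Coca-Cola example from the Soft drinks slides. Everything in it is an assumption made for illustration: the toy `ADS` corpus is invented, the substring matching stands in for a real Boolean query against the search index, and flagging a burst as "more than `k` standard deviations above the mean" is a simple stand-in, not Texcavator's actual method.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical toy corpus of (year, advertisement text) pairs.
# These records are invented for illustration only.
ADS = [
    (1950, "Coca-Cola, de frisdrank uit Amerika"),
    (1950, "Verse melk elke ochtend"),
    (1960, "Coca-Cola verfrist"),
    (1960, "Nieuwe fietsen te koop"),
    (1960, "Radio's en televisies"),
    (1960, "Koffie van de beste kwaliteit"),
    (1965, "Coca-Cola: ijskoud het lekkerst"),
    (1965, "Drink Coca-Cola"),
    (1965, "Coca-Cola, bekend uit Amerika"),
    (1965, "Wasmachines in de aanbieding"),
]


def matches(text, must, must_not=()):
    """Boolean AND / AND NOT over case-insensitive substrings."""
    lowered = text.lower()
    has_all = all(term in lowered for term in must)
    has_none = not any(term in lowered for term in must_not)
    return has_all and has_none


def hits_per_year(docs, must, must_not=()):
    """Count matching documents per year."""
    return Counter(year for year, text in docs if matches(text, must, must_not))


def normalized_timeline(hits, totals):
    """Divide hits by the total number of documents in each year,
    so that years with more digitized material do not dominate."""
    return {year: hits.get(year, 0) / totals[year] for year in sorted(totals)}


def bursts(timeline, k=1.0):
    """Return years whose value exceeds mean + k * standard deviation."""
    values = list(timeline.values())
    threshold = mean(values) + k * pstdev(values)
    return [year for year, value in timeline.items() if value > threshold]


totals = Counter(year for year, _ in ADS)

# Coca-Cola AND America
with_america = normalized_timeline(
    hits_per_year(ADS, must=("coca-cola", "amerika")), totals
)
# Coca-Cola AND NOT America
without_america = normalized_timeline(
    hits_per_year(ADS, must=("coca-cola",), must_not=("amerika",)), totals
)

print(with_america)
print(without_america)
print(bursts(without_america))
```

Normalizing by the yearly total matters because the number of digitized pages varies from year to year; raw hit counts would otherwise reflect the size of the archive as much as the prominence of the term.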
