Why language technology can’t handle Game of Thrones (yet)


Natural language processing (NLP) tools are commonly used in many day-to-day applications such as Siri and Google, but the effectiveness of these technologies is not thoroughly understood. I will present joint work with colleagues from the Vrije Universiteit Amsterdam in which we perform a thorough evaluation of four different name recognition tools on 40 popular novels (including A Game of Thrones). I will highlight why literary texts are so difficult for NLP tools, as well as solutions for improving their performance.

Published in: Technology

  1. Why Language Technology Can’t Handle Game of Thrones (yet). Marieke van Erp (merpeltje). Joint work with: Niels Dekker & Tobias Kuhn. Image source: https://anibundel.files.wordpress.com/2015/04/jonsnow-leaves-ygritte.jpg
  2. This talk • NLP 101 • Recognising named entities in fiction • Digital Humanities @KNAW HuC. Digital Humanities Lab. Image source: https://vignette.wikia.nocookie.net/pirates/images/3/3c/MediterraneanProfile.jpg/revision/latest?cb=20120312215230
  3. Image source: http://www.jvwmoergestel.nl/site/wp-content/uploads/2016/12/KerstWoordzoeker.jpg
  4. NLP 101. Image source: https://i.ytimg.com/vi/iuumnjJWFO4/maxresdefault.jpg
  5. NLP 101: What is Text Mining? • Extracting knowledge and information from texts in natural language: • Metadata for a text: author, publisher, time of publication, topic, language, URL, URLs to and from a web text • People mentioned in the text, but also companies, organisations, places, dates → links to Wikipedia, Wikification of text • Amounts: prices, age, size, distance, weight • Facts (statements), concepts (terms) and relations between concepts • Sentiment (positive/negative), opinions • Emotions, purpose, intention, humour, sarcasm, irony, threats, style (formal, informal), genre (blog, news, science, tax form)
  6. Types of Knowledge to Extract • Conceptual relations: define possible relations between concepts in an ontology, e.g. what things have weight, size, age, get born, eat, drink, get an education, work, marry, do sports, live and die. • Factual relations: actual instantiations of concepts and relations that are the case in some world (time and place), e.g. Barack Obama was born on August 4, 1961, in Honolulu, Hawaii. • Factual relations need to fit the ontological model, but the ontology does not predict actual facts, only the possible facts! • Opinions: epistemic and modal relations (believe, wish, hope, fear, expect) between source and target expressed as a private state of the source, e.g. I am a fan of Barack Obama, I believe Barack Obama will help people.
  7. Text Mining pipeline • Analysis starts at token level • Moves up to phrases, sentences and documents • Performance goes down as the analysis becomes deeper • Statistical methods are mostly used, but hybrid methods are a promising research topic. Diagram: Input text → Tokenisation → Lexical Analysis → Syntactic Analysis → Semantic Analysis → Pragmatic Analysis → Speaker's intended meaning
  8. Companies want text mining • From click logs they can see what people looked at on their site • To know what people think about it, they need to mine reviews, tweets etc.: text mining • To stay ahead of their competitors, they need to obfuscate their patents, and find relevant patents from competitors: text mining • To aid their information departments, they need access to relevant information: text mining
  9. Humanities researchers want text mining • To evaluate gender bias in large corpora: http://literaryquality.huygens.knaw.nl/ • To trace concepts through time: https://www.esciencecenter.nl/project/evidence • To detect and model populist movements on social media: https://www.meertens.knaw.nl/cms/en/research/projects/259-het-dagelijks-leven/145541-populisme-social-media-en-religie • To analyse church registers, letters, ship journals etc.
  10. State-of-the-art • POS tagging: 97% • Sentiment Analysis: 95% (document level) / 54% (fine-grained sentence level) • Named Entity Recognition: 90% • Temporal information extraction: 77%. Note: these figures hold for English and for standardised datasets
  11. Image source: https://memegenerator.net/img/instances/56709008/are-we-there-yet.jpg
  12. Image source: https://i.redd.it/dmnouc4hip521.jpg
  13. Recognising named entities in fiction. Image source: https://wp-media.patheos.com/blogs/sites/1186/2019/04/mauricio-santos-503880-unsplash.jpg
  14. Background • Characters and relations are the backbone of stories • Computational methods allow for scaling up network extraction and analysis • This relies on named entity recognition • Most work thus far focuses on 19th and early 20th century novels • Research question: how do these tools perform on modern science fiction/fantasy novels? Image source: https://newleftreview.org/system/dragonfly/production/2019/03/09/9rcllsj7us_3020501.gif
  15. Experimental setup • Collect 20 ‘old’ and 20 ‘new’ novels • Annotate first chapters for entities and relationships between entities (gold standard) • Run 4 named entity recognisers on the sets of ‘old’ and ‘new’ novels • Compare system outputs to gold standard annotations • Bonus: compare network structures. Image sources: delpher.nl; https://cdn-images-1.medium.com/max/2400/1*QbCo9uE7jPbt1ttnMsqOog.jpeg
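The comparison of system output against the gold standard can be sketched as a precision/recall/F1 computation. This is a minimal illustration, not the study's evaluation script: it assumes that each annotation is a (sentence number, character name) pair and that only exact matches count, neither of which is stated on the slides. The function name and the sample data are hypothetical.

```python
def precision_recall_f1(gold, predicted):
    """Score predicted (sentence_id, name) pairs against gold pairs.

    Assumption: only exact matches on both sentence and name count as correct.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical annotations for three sentences.
gold = {(1, "Alice"), (2, "Alice"), (2, "Rabbit"), (3, "Dinah")}
system = {(1, "Alice"), (2, "Rabbit"), (3, "Duchess")}

precision, recall, f1 = precision_recall_f1(gold, system)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Scoring each of the four recognisers per book this way makes it possible to see the spread within the ‘old’ and ‘new’ sets rather than only an average.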
  16. 19th and early 20th century novels, based on The Guardian’s Top 100 Classic novels + availability through Project Gutenberg + used in earlier studies
  17. ‘New’ Science Fiction and Fantasy novels, based on list from BestFantasyBooks.com
  18. Data preprocessing • All books converted to plain text format • Ensure all texts have the same character encoding • Pro tip: check that there are no odd or inconsistent quotation marks in your documents • Appendices, glossaries and reviews were removed manually. Image source: https://www.dataentryoutsourced.com/blog/wp-content/uploads/2015/03/Post-091-640x200.jpg
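The encoding and quotation-mark checks can be sketched as below. This is an illustrative helper, not the preprocessing used in the study: the choice of UTF-8, NFC normalisation, and the particular set of quote characters mapped to ASCII are all assumptions, and the function names are hypothetical.

```python
import unicodedata
from collections import Counter

# Typographic quotes mapped to plain ASCII quotes (an illustrative choice).
QUOTE_MAP = str.maketrans({
    "\u2018": "'", "\u2019": "'", "\u201a": "'", "\u201b": "'",
    "\u201c": '"', "\u201d": '"', "\u201e": '"', "\u00ab": '"', "\u00bb": '"',
})
QUOTE_CHARS = set("'\"\u2018\u2019\u201a\u201b\u201c\u201d\u201e\u00ab\u00bb")


def count_quote_marks(text):
    """Count each quotation character so mixed styles become visible."""
    return Counter(ch for ch in text if ch in QUOTE_CHARS)


def normalise_text(raw_bytes, encoding="utf-8"):
    """Decode a book, normalise Unicode to NFC, and unify quotation marks."""
    text = raw_bytes.decode(encoding)
    text = unicodedata.normalize("NFC", text)
    return text.translate(QUOTE_MAP)


sample = "\u201cOh dear!\u201d said Alice. 'Curiouser,' she thought, \u2018and curiouser.\u2019"
print(count_quote_marks(sample))               # several quote styles in one text
print(normalise_text(sample.encode("utf-8")))  # only ASCII quotes remain
```

Note that mapping the right single quote to an ASCII apostrophe also changes apostrophes inside names, so the counts are worth inspecting before normalising.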
  19. Gold standard annotations • Chapter lengths varied from 84 to 1,442 sentences • An average of 300 sentences close to a chapter boundary was selected • e.g. the third chapter in Alice in Wonderland ended after sentence 315, so for that book the first three chapters were annotated • 2 annotators (not the authors of the study). Image source: https://panmacmillan.azureedge.net/pmk11/panmacmillan/files/media/panmacmillan/blogs/tws/august%202017/alice-in-wonderland-knowledge-quiz-header.png
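One way to read the selection rule is "annotate up to the chapter boundary nearest to 300 sentences". The sketch below implements that reading; the slides do not spell out the exact rule, so the nearest-boundary criterion is an assumption, and the per-chapter lengths are made up so that the third chapter ends at sentence 315 as in the Alice in Wonderland example.

```python
from itertools import accumulate


def chapters_to_annotate(chapter_lengths, target=300):
    """Return (number of chapters, total sentences) for the chapter
    boundary whose cumulative sentence count is closest to the target.

    Assumption: 'close to a chapter boundary' means smallest absolute distance.
    """
    ends = list(accumulate(chapter_lengths))
    best = min(range(len(ends)), key=lambda i: abs(ends[i] - target))
    return best + 1, ends[best]


# Hypothetical chapter lengths; only the boundary at 315 comes from the slide.
alice_chapters = [100, 105, 110, 120]
print(chapters_to_annotate(alice_chapters))
```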
  20. Annotation Instructions • For each sentence: • Identify all characters in it • Identify anaphoric references (e.g. she refers to Alice) • To speed up the process, annotators were provided with a list of characters derived automatically • Missing characters could be added to the list • Ignore generic pronouns, exclamations, generic noun phrases, non-human named characters (Buckbeak). Image source: https://vignette.wikia.nocookie.net/p__/images/3/35/Erich_Mueller_and_Shannon_McGrath_are_glued_together_back_to_back_with_Tree_Resin.jpeg/revision/latest?cb=20170331180847&path-prefix=protagonist
  21. Named Entity Recognisers: BookNLP • NLP pipeline modified to deal with books • POS tagging, dependency parsing, NER, character name clustering, quotation speaker identification, pronominal coreference resolution, supersense tagging • NER module based on Stanford NER, with some modifications • We focus on NER, character name clustering and pronominal character resolution modules in our evaluation • https://github.com/dbamman/book-nlp Image source: https://cdn.aarp.net/content/dam/aarp/money/budgeting_savings/2016/04/1140-yeager-sell-your-used-books.imgcache.rev6feda141288df73e8fd100822bb375ea.jpg
  22. Named Entity Recognisers: Stanford NER • State-of-the-art CRF NER system • Trained on CoNLL 2003 data (Reuters newswire articles from 1996-08-20 to 1997-08-19) • Cited 2,720 times • F1 = 86.31 on CoNLL 2003 test set • https://nlp.stanford.edu/software/CRF-NER.html
  23. Named Entity Recognisers: Illinois Tagger • Perceptron-based classifier • Includes contextual information • 10,146 downloads • F1 = 90.57 on CoNLL 2003 test set • https://cogcomp.org/page/software_view/NETagger Image source: delpher.nl
  24. Named Entity Recognisers: IXA-Pipe-NERC • Perceptron model • Additional background information from Brown clusters • F1 = 91.36 on CoNLL 2003 test set • https://github.com/ixa-ehu/ixa-pipe-nerc
  25. [Figure: network graph whose node labels are names from A Game of Thrones, e.g. Ned Stark, Arya Stark, Robb Stark, Daenerys Targaryen, Castle Black]
  26. [Figure: smaller network graph whose node labels are names from A Game of Thrones, e.g. Daenerys Targaryen, Dany, Khaleesi, Dothraki, Robb Stark]
  27. Image source: https://i.pinimg.com/originals/30/25/20/302520dbb49bb4a01b5687a7e6c6bf60.jpg
  28. Discussion • No difference between ‘old’ and ‘new’ books • Within categories, great variety in entity distributions and results • If a central entity is missed, the performance suffers greatly (e.g. Brave New World) • Coreference resolution particularly difficult in this domain. Image source: https://www.nuffoodsspectrum.in/uploads/articles/quarterly_results_bg-4192.jpg
  29. Why is fiction hard for NLP? • Fiction writers don’t have to abide by conventions: they can use language more creatively than newspaper journalists • mix languages • make up languages • use nicknames • Narratives written from first-person perspective confuse the software. Image source: https://steamuserimages-a.akamaihd.net/ugc/859477733475369907/F34770D6EFEC30A70A84BEFE93C2C522C0B4A902/
  30. [Figure: network graph whose node labels are names from The Three Musketeers, e.g. M. Bonacieux, John Felton, Rochefort, Louis XIII, M. d’Artagnan] The Three Musketeers: F1 32 - 48
  31. [Figure: network graph whose node labels are names from The Three Musketeers, e.g. M. Bonacieux, John Felton, Milady de Winter, M. Dartagnan] The Three Musketeers after rewriting d’Artagnan to Dartagnan
  32. Image source: https://static.boredpanda.com/blog/wp-content/uploads/2015/10/funny-game-of-thrones-memes-fb__700.jpg
  33. Performance fixes • Replace word names with generic names • Remove apostrophes from names • But: • Requires manual intervention • Doesn’t scale
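Both fixes amount to rewriting names in the text before running the recognisers, as in the d’Artagnan to Dartagnan rewrite. The sketch below shows the idea with a hand-made replacement table. The table contents are hypothetical, and reading "word names" as names that double as ordinary words (such as the nickname Toad) is an assumption. The need to build such a table by hand for every book is exactly why the approach does not scale.

```python
import re

# Hypothetical, manually compiled replacement table for illustration only.
NAME_FIXES = {
    "d\u2019Artagnan": "Dartagnan",  # typographic apostrophe removed
    "d'Artagnan": "Dartagnan",       # ASCII apostrophe removed
    "Toad": "Thomas",                # ordinary-word name swapped for a generic name
}


def apply_name_fixes(text, fixes):
    """Replace whole-word occurrences of each original name with its substitute."""
    for original, substitute in fixes.items():
        pattern = r"\b" + re.escape(original) + r"\b"
        # A function replacement avoids backslash interpretation in substitutes.
        text = re.sub(pattern, lambda match, s=substitute: s, text)
    return text


print(apply_name_fixes("M. d\u2019Artagnan greeted Toad.", NAME_FIXES))
```

A replacement like "Toad" would also rewrite a capitalised mention of the animal, which is one reason each table needs manual checking.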
  34. Where to go from here? • More robust NLP tools are necessary to better understand novels (and other non-newspaper texts) • Background knowledge can help (e.g. GoT Wiki lists all Daenerys’ nicknames) • But: not all books are that popular • Also: different names are used in different contexts, you may not want to collapse them! • Always: don’t just assume it works, look into your data! • Full paper at: http://peerj.com/articles/cs-189 Image source: https://news.images.itv.com/image/file/1232718/stream_img.jpg
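Background knowledge of this kind can be sketched as an alias table that links a nickname to a canonical character while keeping the name as it appeared in the text, so context-specific names are not collapsed. The table below is a hypothetical, hand-written stand-in for a resource such as a fan wiki, and the function name is illustrative.

```python
# Hypothetical alias table; a real one might be compiled from a fan wiki.
ALIASES = {
    "Dany": "Daenerys Targaryen",
    "Khaleesi": "Daenerys Targaryen",
    "Ned Stark": "Eddard Stark",
}


def link_mentions(mentions, aliases):
    """Pair each surface mention with its canonical name.

    The surface form is kept so that analyses can still distinguish
    which name was used in which context.
    """
    return [(mention, aliases.get(mention, mention)) for mention in mentions]


for surface, canonical in link_mentions(["Dany", "Khaleesi", "Bronn"], ALIASES):
    print(f"{surface} -> {canonical}")
```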
  35. Digital Humanities Lab • History, Literary Studies, History of Science & Scholarship • Social History • Dutch Language & Culture • https://huc.knaw.nl/
  36. Slide by Antal van den Bosch
  37. Slide by Antal van den Bosch
  38. Cultural Artificial Intelligence: Making AI culturally aware • Appreciate the user • Being contextually appropriate • Understand the issues • What do you get when you invert “Digital Humanities”? Slide by Antal van den Bosch
  39. Applications of Cultural AI: Filters and flags • Toxicity • Protective filters (like spam filters and ad blockers) • Gender • Linguistic filters and helpers • Fake news • Meme detectors, explanations. Slide by Antal van den Bosch
  40. Theory of Cultural AI: Understanding & nuance • Understanding concepts • Changes over time • Perspectives • Evolution • Knowing the origins of digital stories • Understanding viral potential • Language is “social and cultural data” (Nguyen, 2017). Slide by Antal van den Bosch
  41. Some DHLab projects • Food culture via newspaper recipes (Meertens and IISH) • Analysing online debates: refugee vs migrant (with EUR) • Amsterdam Time Machine (with many partners) • Tracing 18th century career trajectories (with HuC-DI & Huygens Institute) • Analysing the concept ‘violence’ through time (with NLeSc, OU & NIOD)
  42. Debates on the refugee crisis • From 2015 on, wider use of both ‘European refugee crisis’ and ‘European migrant crisis’ in the news and social media • “Framing labels” (Knoll, Redlawsk, & Sanborn, 2011) imply two different frames: • ‘Refugee’: people fleeing conflict or persecution • ‘Migrant’: people improving their economic situation • Mixed usage and mislabeling have implications for refugees, e.g., negative influence on perceptions of host countries
  43. DHLab@HuC: Advancing the humanities through digital methods • DHLabHuC / adinanerghes / melvinwevers / merpeltje • https://dhlab.nl (under construction) • Melvin Wevers, Adina Nerghes, Marieke van Erp
  44. Thank you for your attention!