Building the PoliMedia search system; data- and user-driven

Presentation at the eHumanities group at the Meertens Institute (Amsterdam) on Thursday 18 April 2013.

Analysing media coverage across several types of media outlets is a challenging task for (media) historians. A specific example of media coverage research investigates the coverage of political debates and how the representation of topics and people changes over time. The PoliMedia project (http://www.polimedia.nl) aims to showcase the potential of cross-media analysis for research in the humanities by 1) curating automatically detected semantic links between four data sets of different media types, and 2) developing a demonstrator application that allows researchers to deploy such an interlinked collection for quantitative and qualitative analysis of media coverage of debates in the Dutch parliament.

These two goals reflect the two perspectives on the development of a search system such as PoliMedia: data-driven and user-driven. In this presentation, Laura Hollink (VU) will present the data-driven perspective of linking different datasets and the research questions that arise in achieving this linkage: how can different types of datasets be combined, and what kinds of research questions does the data make possible? Max Kemman (EUR) will present the user-driven perspective: what benefits can scholars gain from linking these datasets? What are the user requirements for the PoliMedia search system, and how was the system evaluated with scholars in an eye-tracking study?

Speaker notes:
• Create explicit links.
• Go to archives, look up original data, decide whether there is a link to a debate.
• Many systems; cross-media analysis is difficult.
• Debates.
• Used to check models, summarize the corpus, and guide exploration of its contents.
• Manual evaluation of the relevance of media items to a political speech: ? = unsure about relevance, 0 = not relevant, 1 = partially relevant, 2 = relevant.
• Context metadata: roles of people, links to external databases, types of documents, types of presentation (dramatic, humorous, etc.)

1. www.polimedia.nl: Building the PoliMedia system; data- and user-driven
2. Who are we?
Laura Hollink
• Assistant professor at VU
• Modeling, linking and enrichment of data
• Data-driven research
• @laurahollink
Max Kemman
• Junior researcher at EUR
• Human-Computer Interaction
• User-driven research
• @MaxJ_K
PoliMedia team: Henri Beunders (EUR), Jaap Blom (NISV), Laura Hollink (VU), Geert-Jan Houben (TU Delft), Damir Juric (TU Delft), Max Kemman (EUR), Martijn Kleppe (EUR), Johan Oomen (NISV)
Funded by CLARIN-NL
eHumanities group - PoliMedia
3. Linking Politics to Media
4. The research questions
• How is a person, subject or process covered & visualised by the media?
• How do debates and arguments develop over a longer period of time?
• Analysing the changing ideas, arguments and presentation in different media
5. Issues with current approach
6. Issues with current approach
7. Goal: explicit links to different media types in one system
8. PoliMedia system
• PoliMedia Portal
  – Browse: debate and date
  – Search: debate and person
• Linked sources: Newspapers (KB), Television (Sound and Vision), Radio (KB), Staten-Generaal Digitaal (KB)
• Data-driven (Laura) & user-driven (Max)
9. Data
10. Debate data
Handelingen der Staten-Generaal (Dutch Hansard), 1945-1995
Some provenance:
  1. Transcripts are made of the complete debates of the Dutch parliament.
  2. They are published online by the government on http://www.statengeneraaldigitaal.nl/ (1818-1995) and http://officielebekendmakingen.nl/ (from 1995).
  3. The PoliticalMashup project has converted the government PDF and TXT files into XML, including URIs as identifiers; see http://politicalmashup.nl/
  4. We build on that.
11. Structure of the debate data
Diagram elements: Debate; Metadata; Topic 1; Topic 2; Speaker 1 / Content; Speaker 2 / Content; Speaker 3 / Content; Speaker 1 / Content
Including:
• who, when, what
• identifiers for subparts of the debate
• chronological order of speakers
12. Media data
• Newspaper articles
  – at the National Library of the Netherlands
  – Many newspapers, 1950-1995
  – Text + images of newspaper layout
• Radio bulletins
  – Transcripts of ANP news
• Newscasts
  – in the Academia collection of the Netherlands Institute for Sound and Vision
13. Semantic model (diagram)
Example debate nl.proc.sgd.d.194519460000002, with parts nl.proc.sgd.d.194519460000002.1 and nl.proc.sgd.d.194519460000002.2, and subparts nl.proc.sgd.d.194519460000002.1.1, nl.proc.sgd.d.194519460000002.1.2 and nl.proc.sgd.d.194519460000002.1.3.
• Classes: Debate, PartOfDebate, DebateContext, Speech
• Properties: rdf:type, dc:id, dc:source, dc:publisher, dc:language, dc:date, hasPart, hasSubsequent, hasSubsequentSpeech, hasText, hasSpokenText, sem:hasActor, coveredIn
• Metadata values shown: http://resolver.politicalmashup.nl/nl.proc.sgd.d.194519460000002, http://statengeneraaldigitaal.nl/, http://resolver.kb.nl/resolve?urn=sgd:mpeg21:19451946:0000002:pdf, nl.proc.sgd.d.19720000002, "Handelingen Verenigde Vergadering...", Dutch, 1945-11-20
• Text values shown: "De voorzitter opent de vergadering…" (The chairman opens the session…) and "Mijnheer de Voorzitter, de Commissie van …" (Mr. Chairman, the Committee of …)
• A speech is coveredIn a newspaper article, e.g. http://resolver.kb.nl/resolve?urn=ddd:011198136:mpeg21:a0525:ocr
14. Semantic model (diagram, continued)
• Nodes: Speaker_00064, Party_kvp, member_of_parliament, and a Politician with dc:source http://resolver.politicalmashup.nl/nl.m.00064
• Properties: sem:hasActor, hasSpeaker, hasParty, hasRole, rdf:type, hasAcronym, hasFullName, foaf:firstName, foaf:lastName, rdfs:label
• Party_kvp is a Party with acronym KVP and full name Katholieke Volkspartij
• The politician's first name is Joannes Antonius James and last name Barge; label: Barge
• Reuse of vocabularies: Simple Event Model (SEM), Dublin Core, FOAF, links to ISOCAT data categories.
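The semantic model can be made concrete with a minimal sketch in plain Python, without an RDF library. It stores a few triples as tuples and walks from a party to the newspaper articles that cover its members' speeches. The identifiers are copied from the slides. The exact edges, for instance that Speaker_00064 hasParty Party_kvp and that speech .1.2 is coveredIn the KB article, are an assumed, simplified reading of the diagram. The helper functions are hypothetical and are not PoliMedia code.

```python
# Minimal sketch: part of the PoliMedia semantic model as (subject, predicate,
# object) tuples. Identifiers come from the slides; the edges are an ASSUMED,
# simplified reading of the diagram, not the project's actual data.
Triple = tuple[str, str, str]

DEBATE = "nl.proc.sgd.d.194519460000002"
ARTICLE = "http://resolver.kb.nl/resolve?urn=ddd:011198136:mpeg21:a0525:ocr"

TRIPLES: list[Triple] = [
    (DEBATE, "rdf:type", "Debate"),
    (DEBATE, "dc:date", "1945-11-20"),
    (DEBATE, "hasPart", DEBATE + ".1"),
    (DEBATE + ".1", "rdf:type", "PartOfDebate"),
    (DEBATE + ".1", "hasPart", DEBATE + ".1.2"),
    (DEBATE + ".1.2", "rdf:type", "Speech"),
    (DEBATE + ".1.2", "sem:hasActor", "Speaker_00064"),
    (DEBATE + ".1.2", "coveredIn", ARTICLE),
    ("Speaker_00064", "hasParty", "Party_kvp"),
    ("Party_kvp", "rdf:type", "Party"),
    ("Party_kvp", "hasAcronym", "KVP"),
]


def subjects(predicate: str, obj: str) -> set[str]:
    """All subjects s such that (s, predicate, obj) is stored."""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}


def objects(subject: str, predicate: str) -> set[str]:
    """All objects o such that (subject, predicate, o) is stored."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def articles_covering_party(acronym: str) -> list[str]:
    """Follow party <- hasParty <- speaker <- sem:hasActor <- speech -> coveredIn."""
    parties = subjects("hasAcronym", acronym)
    speakers = {sp for party in parties for sp in subjects("hasParty", party)}
    speeches = {sp for spk in speakers for sp in subjects("sem:hasActor", spk)}
    return sorted({a for sp in speeches for a in objects(sp, "coveredIn")})


print(articles_covering_party("KVP"))
```

Against the published linked data, the same traversal would more naturally be written as a SPARQL query.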
15. Linked Data
• Data openly accessible in a Semantic Web standard
• Easy to combine with other Semantic Web data
• E.g. DBpedia data on politicians and parties
16. Linking debates to newspaper articles that cover them
• Challenges:
  – How to link documents that are so different in nature?
  – Can we use the structure of the debates: people, chronological order of speeches, introductions to each new topic, etc.?
  – How can we do this efficiently, using the access mechanisms of the archives?
17. Linking approach
Diagram: Debates → detect topics in speeches (Topics) and detect Named Entities in speeches (Named Entities) → create queries, also using the name of the speaker and the date of the debate (Queries) → search newspaper archive (Candidate articles) → rank candidate articles → links between speeches and articles
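The linking pipeline can be sketched end to end as follows. This is an illustrative stand-in, not the PoliMedia implementation, and it makes several assumptions:
• Named entities and topics are precomputed.
• The archive is an in-memory list rather than the KB search interface.
• Candidates must appear within a seven-day window after the debate.
• Ranking simply counts shared words.

```python
import re
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Speech:
    speaker: str           # name of speaker
    debate_date: date      # date of debate
    entities: list[str]    # output of a (hypothetical) named-entity step
    topics: list[str]      # output of a (hypothetical) topic-model step


@dataclass
class Article:
    url: str
    published: date
    text: str


def tokens(text: str) -> set[str]:
    """Lower-cased word tokens; a deliberately naive tokenizer."""
    return set(re.findall(r"\w+", text.lower()))


def create_query(speech: Speech) -> set[str]:
    """Bag of query words from entities, topics and the speaker's name."""
    return tokens(" ".join(speech.entities + speech.topics + [speech.speaker]))


def search_archive(archive: list[Article], query: set[str],
                   debate_date: date, window_days: int = 7) -> list[Article]:
    """Candidates: published within window_days after the debate and sharing a word."""
    end = debate_date + timedelta(days=window_days)
    return [a for a in archive
            if debate_date <= a.published <= end and query & tokens(a.text)]


def rank_candidates(candidates: list[Article], query: set[str]) -> list[Article]:
    """Order by the number of distinct query words each article contains."""
    return sorted(candidates, key=lambda a: len(query & tokens(a.text)), reverse=True)


def link_speech(speech: Speech, archive: list[Article], top_k: int = 3) -> list[str]:
    """Run the pipeline and return the URLs of the top-ranked articles."""
    query = create_query(speech)
    candidates = search_archive(archive, query, speech.debate_date)
    return [a.url for a in rank_candidates(candidates, query)[:top_k]]
```

A real system would use a proper NER tool and the archive's own relevance ranking. The point here is only the data flow from the diagram.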
18. Detect topics
The MALLET topic model package
• Unsupervised analysis of text
• "A topic consists of a cluster of words that frequently occur together" [see http://mallet.cs.umass.edu/topics.php]
• Input:
  – Text
  – Number of iterations
  – Number of topics (clusters)
• Output:
  – Words that cluster around one topic
• Example:
  – Text: a speech in a debate from 1975
  – Number of iterations: 2000
  – Number of topics: 1
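MALLET itself is a Java toolkit, and the sketch below does not call it. It only illustrates why the slide's example (one topic, 2000 iterations) yields, in principle, the speech's most frequent content words. With a single topic, every token is assigned to that topic, so the topic's word ranking reduces to word frequency. The Dutch stopword list and the tokenizer are assumptions; MALLET's own preprocessing will differ.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption; MALLET ships its own lists).
STOPWORDS = {"de", "het", "een", "en", "van", "in", "is", "dat", "op",
             "te", "voor", "die", "met", "niet", "zijn"}


def single_topic_keys(text: str, n_words: int = 5) -> list[str]:
    """Top words of a one-topic model.

    With K = 1 every token belongs to the only topic, so the estimated
    topic-word distribution is proportional to (smoothed) word counts and
    its top keys are simply the most frequent non-stopword tokens.
    """
    words = [t for t in re.findall(r"\w+", text.lower()) if t not in STOPWORDS]
    return [w for w, _ in Counter(words).most_common(n_words)]


speech = "De woningbouw en de huren: woningbouw is nodig, huren stijgen, woningbouw."
print(single_topic_keys(speech, 3))
```

With more topics, the words are spread over several clusters and this shortcut no longer applies. The iteration count then matters for the sampler's convergence.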
19. Create queries
Query ingredients (diagram): Named Entities from the speech; Named Entities from the debate intro; Topics from the speech; Topics from the debate intro; Name of speaker; Date of debate
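A hypothetical query builder for the three evaluation experiments (NEs in speech; NEs + topics in speech; NEs + topics in speech and debate) can be sketched as follows. The OR-joined quoted terms and the from/until date range are assumed syntax, not the KB search API. Always including the speaker's name and the debate date follows the diagram; the seven-day window is an assumption.

```python
from datetime import date, timedelta


def build_query(experiment: int, speaker: str, debate_date: date,
                speech_entities: list[str], speech_topics: list[str],
                intro_entities: list[str], intro_topics: list[str],
                window_days: int = 7) -> dict[str, str]:
    """Assemble a hypothetical archive query for experiment 1, 2 or 3."""
    if experiment not in (1, 2, 3):
        raise ValueError("experiment must be 1, 2 or 3")
    terms = [speaker, *speech_entities]          # experiment 1: NEs in speech
    if experiment >= 2:
        terms += speech_topics                   # experiment 2: + topics in speech
    if experiment == 3:
        terms += intro_entities + intro_topics   # experiment 3: + debate intro
    unique = list(dict.fromkeys(terms))          # drop duplicates, keep order
    until = debate_date + timedelta(days=window_days)
    return {
        "query": " OR ".join(f'"{t}"' for t in unique),
        "from": debate_date.isoformat(),
        "until": until.isoformat(),
    }


print(build_query(2, "Barge", date(1945, 11, 20), ["KVP"], ["onderwijs"], [], []))
```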
20. Evaluation
• Experiment 1: named entities (NEs) in speech
• Experiment 2: NEs + topics in speech
• Experiment 3: NEs + topics in speech and debate
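The speaker notes describe a manual graded judgment of each link: ? = unsure, 0 = not relevant, 1 = partially relevant, 2 = relevant. The sketch below turns such judgments into a lenient and a strict precision per experiment. The talk does not say how the scores were aggregated. Dropping "unsure" judgments and the two thresholds are assumptions, and the judgments in the example are made up.

```python
from typing import Optional, Sequence


def precision(judgments: Sequence[Optional[int]], min_grade: int = 1) -> float:
    """Share of judged links with grade >= min_grade.

    Grades: 0 = not relevant, 1 = partially relevant, 2 = relevant.
    None stands for '?' (unsure) and is left out of the denominator.
    """
    judged = [g for g in judgments if g is not None]
    if not judged:
        return 0.0
    return sum(g >= min_grade for g in judged) / len(judged)


# Made-up judgments for one experiment, for illustration only.
example = [2, 1, 0, None, 2]
print(precision(example), precision(example, min_grade=2))
```

Comparing the three experiments would then mean computing both numbers on the judged links of each.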
21. Results
• A linked open data set of Dutch parliamentary debates,
• with links to URLs of newspaper articles and radio bulletins at the Royal Library.
• A system that supports researchers in finding the data to answer their questions.
22. User-driven: what do scholars want?
• Why user research?
• Understanding the user [1, 2]
  – Acceptance
  – Performance
  – Capabilities
  – Weaknesses
• Goal
  – Creating a system that is intuitive and helpful to the users
[1] Y. Liu, A. Osvalder, and M. Karlsson, "Considering the importance of user profiles in interface design," May 2010.
[2] J. Preece, Y. Rogers, and H. Sharp, "Interaction Design: Beyond Human-Computer Interaction," Design, vol. 18, no. 1, pp. 68-68, 2002.
23. User research in the development process
• Examine search behaviour of users
  – Survey regarding search strategies
  – Interviews
• User wishes → user requirements
• Wireframes → prototype
• Evaluation → new prioritization of remaining user requirements
• Final version
24. Survey: general search strategies
• N=294
• Popular search engines (chart): frequency of use (Very often, Often, Regularly, Sometimes, Never, Don't know it) for Google, Google Images, Google Scholar, YouTube, JSTOR, KB, Flickr, EBSCO, Nationaal Archief, Web of Knowledge, Uitzending Gemist, Yahoo!, Bing, Academia.nl, Europeana, Scopus, Microsoft Academic Search, EUscreen, Arkyves
25. Survey: general search strategies (score)
  1. Keywords: 4.75
  2. Advanced search: 3.36
  3. Related terms: 2.52
  4. Boolean: 2.42
  5. Browsing subject categories: 2.29
  6. Filters: 2.19
  7. Thesaurus: 1.87
  8. Visualization: 1.22
26. Survey: conclusions
• Google is the dominant search engine
• This has two consequences:
  1. People compare other search systems to their experience with Google
  2. The search task is mainly performed by using keywords
27. Interviews
• N=5
• Quantitative (n=2) as well as qualitative (n=4)
• Main themes
  – How do people search currently?
  – What could be improved about current search systems?
  – What should PoliMedia offer, given its goals?
• Results
  – 39 user wishes
  – Prioritized internally:
    • 19 user wishes deemed out of scope
    • 20 user requirements
28. Interviews: findings
• The key issue is to provide a good overview of the data
  – Why are search results retrieved?
  – How are search results ranked?
• Assumptions of relevance
  – Does a higher frequency of keywords indicate higher relevance to the query?
  – Do longer segments (speeches and articles) indicate higher importance?
• Many more or less out-of-scope wishes to make current research easier
  – Sentiment metadata
  – Context metadata
  – Ability to export to own software
29. Wireframes: search interface
• Clear and immediate keyword search
• Support for Booleans and (some) Google search operators
• Separate advanced search
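Support for Booleans and some Google-style operators could work roughly as below. The sketch handles quoted phrases, a leading minus for exclusion, and OR between adjacent terms, with implicit AND otherwise. The operator subset and the whole-word matching are assumptions, not the PoliMedia query parser.

```python
import re

# Tokens: an optionally negated "quoted phrase", or any run of non-spaces.
TOKEN = re.compile(r'-?"[^"]+"|\S+')


def parse(query: str) -> tuple[list[list[str]], list[str]]:
    """Return (required groups, excluded terms).

    Each required group is a list of alternatives joined by OR; every group
    must match. A leading '-' excludes a word or phrase.
    """
    required: list[list[str]] = []
    excluded: list[str] = []
    pending_or = False
    for tok in TOKEN.findall(query):
        if tok == "OR":
            pending_or = True
            continue
        negate = tok.startswith("-")
        term = tok.lstrip("-").strip('"').lower()
        if not term:
            continue
        if negate:
            excluded.append(term)
        elif pending_or and required:
            required[-1].append(term)    # alternative for the previous term
        else:
            required.append([term])      # new AND-ed requirement
        pending_or = False
    return required, excluded


def contains(text: str, term: str) -> bool:
    """Case-insensitive whole-word (or whole-phrase) match."""
    return re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE) is not None


def matches(query: str, text: str) -> bool:
    required, excluded = parse(query)
    return (all(any(contains(text, t) for t in group) for group in required)
            and not any(contains(text, t) for t in excluded))


print(matches('"den uyl" woningbouw -sport', "Kabinet Den Uyl over woningbouw"))
```

Whole-word matching means `uyl` does not match "Uylstraat". Whether users expect that is exactly the kind of question the evaluation should answer.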
30. Wireframes: search results
• Keyword search remains prominent
• User-chosen ranking of results
• Keyword highlighting
• Overview of related media
• Support for filtering
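The interviews questioned whether keyword frequency signals relevance, so the wireframe leaves ranking to the user. The sketch below shows keyword highlighting, a party facet, and a user-selectable ranking by keyword frequency or by date. The Result fields, the `<mark>` tags and both ranking options are hypothetical.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Result:
    title: str
    published: date
    text: str
    party: str  # hypothetical facet field


def keyword_pattern(keywords: list[str]) -> re.Pattern:
    """Case-insensitive, whole-word pattern matching any of the keywords."""
    alternatives = "|".join(re.escape(k) for k in keywords)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


def highlight(text: str, keywords: list[str],
              start: str = "<mark>", end: str = "</mark>") -> str:
    """Wrap every keyword occurrence in start/end markers, keeping its case."""
    if not keywords:
        return text
    return keyword_pattern(keywords).sub(lambda m: start + m.group(0) + end, text)


def search(results: list[Result], keywords: list[str],
           rank_by: str = "frequency", party: Optional[str] = None) -> list[Result]:
    """Keep results containing a keyword, apply the party facet, then rank."""
    if not keywords:
        return []
    pattern = keyword_pattern(keywords)

    def frequency(r: Result) -> int:
        return len(pattern.findall(r.text))

    hits = [r for r in results if frequency(r) > 0]
    if party is not None:                 # facet filter
        hits = [r for r in hits if r.party == party]
    if rank_by == "frequency":            # most keyword occurrences first
        hits.sort(key=frequency, reverse=True)
    elif rank_by == "date":               # oldest first
        hits.sort(key=lambda r: r.published)
    else:
        raise ValueError(f"unknown ranking: {rank_by!r}")
    return hits


print(highlight("Woningbouw en huren", ["woningbouw"]))
```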
31. Wireframes: debate page
• Keyword search remains prominent
• Overview of people in debate
• Easy access to related material
32. Prototype v1.0
33. Evaluation
• Eye tracking evaluation of the search system
  – The search system was still in development
• N=24
  – History
  – Political communication
• Goals
  – Gain understanding of the distribution of attention
  – Collect general feedback on the interface
34. Evaluation: eye tracking
• Viewing duration
• The search bar received little attention after search results were displayed
• Facets received a lot of attention
• Page search (CTRL+F) mainly received attention in the debate page view

Share of viewing duration per area of interest:
Task        | Search bar | Facets | Search results | Page search
Known item  | 17%        | 22%    | 60%            | 2%
Exploratory | 6%         | 12%    | 80%            | 2%
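The percentages in the eye-tracking table are presumably each area of interest's share of the total viewing duration per task. The sketch below computes such shares from dwell times. Each share is rounded separately, so a row need not sum to exactly 100%; that may explain why the Known item row adds up to 101%. The dwell times in the example are made up.

```python
def viewing_shares(dwell_ms: dict[str, float]) -> dict[str, int]:
    """Percentage of total viewing time per area of interest (AOI), rounded."""
    total = sum(dwell_ms.values())
    if total <= 0:
        raise ValueError("total viewing time must be positive")
    return {aoi: round(100 * ms / total) for aoi, ms in dwell_ms.items()}


# Made-up dwell times (milliseconds) for one participant and one task.
shares = viewing_shares({"Search bar": 1, "Facets": 1, "Search results": 1})
print(shares, sum(shares.values()))  # three equal AOIs: rounded shares sum to 99
```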
35. Evaluation: usability feedback
• The ranking of search results was an issue for users
• The year filter should be a slider
• The debate page should be greatly improved
  – Better identification of speaker, party, topic, relevance to query
  – Provide filters on the debate page as well
36. Prototype v2.0
37. Prototype v2.0: query
38. Prototype v2.0: filter speaker
39. Prototype v2.0: filter role
40. Prototype v2.0: debate
41. Prototype v2.0: highlight speech
42. Prototype v2.0: link newspaper
43. Prototype v2.0: newspaper
44. Prototype v2.0: link radio
45. Conclusion
• PoliMedia: data- or user-driven?
• Continuous interplay
  – Users gave input on the usefulness of links
  – The data limits which features we can offer to users
• Collection quality and usability are both critical to users [3]
[3] I. Xie, "Evaluation of digital libraries: Criteria and problems from users' perspectives," Library & Information Science Research, vol. 28, no. 3, pp. 433-452, 2006. doi:10.1016/j.lisr.2006.06.002