Multi-step Classification Approaches to Cumulative Citation Recommendation

Presented at the Open Research Areas in Information Retrieval (OAIR 2013) conference.

Transcript

  • 1. Multi-step Classification Approaches to Cumulative Citation Recommendation. Krisztian Balog, University of Stavanger; Naimdjon Takhirov, Heri Ramampiaro, Kjetil Nørvåg, Norwegian University of Science and Technology. Open Research Areas in Information Retrieval (OAIR) 2013 | Lisbon, Portugal, May 2013
  • 2. Motivation
    - Maintain the accuracy and high quality of knowledge bases
    - Develop automated methods to discover (and process) new information as it becomes available
  • 3. TREC 2012 KBA track
  • 4. Central?
  • 5. Task: Cumulative Citation Recommendation
    - Filter a time-ordered corpus for documents that are highly relevant to a predefined set of entities
    - For each entity, provide a ranked list of documents based on their "citation-worthiness"
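
To make the task shape concrete, here is a minimal Python sketch of a CCR filtering loop. It is not the authors' implementation: the stream is walked in time order and every target entity is checked against every document, with `mention_filter` and `centrality_scorer` as assumed callables standing in for the components described later in the talk.

    # Minimal sketch of the CCR loop (illustrative, not the authors' code).
    # `mention_filter` and `centrality_scorer` are assumed callables.
    def ccr(stream, entities, mention_filter, centrality_scorer):
        """Yield (entity, stream_id, score) for citation-worthy documents."""
        for doc in stream:                     # documents arrive in time order
            for entity in entities:            # predefined target entities
                if not mention_filter(doc, entity):
                    continue                   # cheap filter before any scoring
                score = centrality_scorer(doc, entity)  # confidence in [0, 1000]
                if score > 0:
                    yield entity, doc["stream_id"], score
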
  • 6. Collection and topics
    - KBA stream corpus
      - Oct 2011 - Apr 2012, split into training and testing periods
      - Three sources: news, social, linking
      - Raw data 8.7TB; cleansed version 1.2TB (270GB compressed)
      - Stream documents uniquely identified by stream_id
    - Test topics ("target entities")
      - 29 entities from Wikipedia (27 persons, 2 organizations)
      - Uniquely identified by urlname
  • 7. Annotation matrix
    - Each document-entity pair is judged on whether the document contains a mention of the entity (yes/no) and assigned one of four grades: garbage (G), neutral (N), relevant (R), central (C)
    - G and N count as non-relevant; R and C count as relevant
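
As a reading aid, a small sketch of how these four grades might collapse into binary targets for the two evaluation settings used later (Central vs. Central+Relevant); the function and setting names are my own, not from the slides.

    # Assumed mapping from 4-way annotation grades to binary targets.
    GRADES = ("garbage", "neutral", "relevant", "central")   # G, N, R, C

    def binary_label(grade, setting="central"):
        """setting="central": only C is positive; "central+relevant": R and C are positive."""
        positives = {"central"} if setting == "central" else {"relevant", "central"}
        return grade in positives
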
  • 8. Scoring
    - Example run for the target entity Aharon Barak (urlname Aharon_Barak): a list of documents (stream_id) ranked by confidence score, from 1000 down to 263
    - A cutoff on the score separates the documents returned as positive from those left as negative
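
A small illustration of the cutoff idea; the scores and ids below are made up, not taken from the Aharon Barak example.

    # Illustrative ranked run: (stream_id, confidence score), sorted by score.
    ranked = [("doc-a", 1000), ("doc-b", 500), ("doc-c", 480), ("doc-d", 315)]

    def apply_cutoff(ranked_docs, cutoff):
        """Documents scoring at or above the cutoff are returned as positives."""
        return [doc_id for doc_id, score in ranked_docs if score >= cutoff]

    print(apply_cutoff(ranked, 450))   # ['doc-a', 'doc-b', 'doc-c']
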
  • 9. Approach
  • 10. Overview
    - "Is this document central for this entity?"
    - Binary classification task
    - Multi-step approach
      - Classifying every document-entity pair is not feasible
      - First step to decide whether the document contains the entity
      - Subsequent step(s) to decide centrality
  • 11. 2-step classification: Mention? (N: discard / Y: continue) → Central? (N: discard / Y: assign a score between 0 and 1000)
  • 12. 3-step classification: Mention? → Relevant? → Central?; only pairs answered Y at every step receive a score between 0 and 1000
  • 13. Components: in both pipelines the Mention? step is mention detection, while the Relevant? and Central? steps use supervised learning
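
Read as code, the two cascades might look like the sketch below; the per-step decisions are passed in as callables (mention detection for the first step, trained classifiers for the rest), and the structure is an assumption rather than the authors' implementation.

    # Assumed cascade structure; each step is a callable returning True/False,
    # and `score` maps a surviving document-entity pair to a confidence in [0, 1000].
    def two_step(doc, entity, has_mention, is_central, score):
        if not has_mention(doc, entity):    # step 1: mention detection
            return None
        if not is_central(doc, entity):     # step 2: supervised classifier
            return None
        return score(doc, entity)

    def three_step(doc, entity, has_mention, is_relevant, is_central, score):
        if not has_mention(doc, entity):    # step 1: mention detection
            return None
        if not is_relevant(doc, entity):    # step 2: relevant vs. non-relevant
            return None
        if not is_central(doc, entity):     # step 3: central vs. merely relevant
            return None
        return score(doc, entity)
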
  • 14. Identifying entity mentions
    - Goals: high recall, low false positive rate, efficiency
    - Detection based on known surface forms of the entity
      - urlname (i.e., Wikipedia title)
      - Name variants from DBpedia
      - DBpedia-loose: only last names for people
    - No disambiguation
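
A minimal sketch of surface-form based mention detection in the spirit of this slide, assuming the surface forms (urlname, DBpedia name variants, loose last names) are already collected into a list; plain word-boundary substring matching is my simplification, not necessarily what the authors used.

    import re

    def contains_mention(text, surface_forms):
        """True if any known surface form of the entity occurs in the text.

        No disambiguation is attempted, which favours recall over precision.
        """
        lowered = text.lower()
        for form in surface_forms:
            # \b keeps a loose form like "Barak" from matching inside other words.
            if re.search(r"\b" + re.escape(form.lower()) + r"\b", lowered):
                return True
        return False

    # Example surface forms: Wikipedia title variant plus a loose last name.
    forms = ["Aharon Barak", "Barak"]
    print(contains_mention("Justice Aharon Barak retired in 2006.", forms))  # True
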
  • 15. Features
    1. Document (5)
      - Length of document fields (body, title, anchors)
      - Type (news/social/linking)
    2. Entity (1)
      - Number of related entities in the KB
    3. Document-entity (28)
      - Occurrences of the entity in the document
      - Number of related entity mentions
      - Similarity between the document and the entity's Wikipedia page
  • 16. Features (continued)
    4. Temporal (38)
      - Wikipedia pageviews: average pageviews, change in pageviews, bursts
      - Mentions in the document stream: average volume, change in volume, bursts
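
To make the feature groups tangible, here is a sketch of a heavily simplified feature dict for one document-entity pair; the field names, helpers, and the particular features shown are illustrative assumptions and nowhere near the full 72-feature set from slides 15-16.

    def extract_features(doc, entity, pageview_stats, stream_stats):
        """Return a simplified feature dict for one document-entity pair."""
        mentions = doc["body"].lower().count(entity["name"].lower())
        return {
            # 1. Document features
            "doc_body_len": len(doc["body"]),
            "doc_title_len": len(doc["title"]),
            "doc_is_news": int(doc["source"] == "news"),
            # 2. Entity features
            "entity_num_related": len(entity["related"]),
            # 3. Document-entity features
            "num_mentions": mentions,
            # 4. Temporal features
            "avg_pageviews": pageview_stats["avg"],
            "pageview_change": pageview_stats["change"],
            "avg_stream_volume": stream_stats["avg"],
        }
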
  • 17. Results
  • 18. Identifying entity mentions: results on the testing period
    - urlname: 41.2K document-entity pairs, recall 0.842, false positive rate 0.559
    - DBpedia: 70.4K document-entity pairs, recall 0.974, false positive rate 0.701
    - DBpedia-loose: 12.5M document-entity pairs, recall 0.994, false positive rate 0.998
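
Recall and false positive rate here can be read as the usual contingency-table quantities over document-entity pairs; a small helper sketch (mine, not from the slides):

    def recall_and_fpr(retrieved, relevant, universe):
        """All arguments are sets of document-entity pairs; `universe` is every candidate pair."""
        tp = len(retrieved & relevant)
        fp = len(retrieved - relevant)
        fn = len(relevant - retrieved)
        tn = len(universe) - tp - fp - fn
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        fpr = fp / (fp + tn) if (fp + tn) else 0.0
        return recall, fpr
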
  • 19. End-to-end task: F1 score with a single cutoff. Bar charts compare the 2-step and 3-step approaches with J48 and RF classifiers against TREC, for the Central and Central+Relevant settings.
  • 20. End-to-end task: F1 score with a per-entity cutoff. Same comparison (2-step and 3-step, J48 and RF, TREC) for the Central and Central+Relevant settings.
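
The two settings differ only in whether one score cutoff is chosen across all entities or a separate cutoff per entity. A rough sketch of choosing the F1-maximizing cutoff follows; the data layout (lists of (score, is_positive) pairs) and the cutoff grid are assumptions.

    def f1_at_cutoff(pairs, cutoff):
        """pairs: (score, is_positive) tuples; F1 of returning everything at/above cutoff."""
        tp = sum(1 for s, pos in pairs if s >= cutoff and pos)
        fp = sum(1 for s, pos in pairs if s >= cutoff and not pos)
        fn = sum(1 for s, pos in pairs if s < cutoff and pos)
        if tp == 0:
            return 0.0
        precision, recall = tp / (tp + fp), tp / (tp + fn)
        return 2 * precision * recall / (precision + recall)

    def best_cutoff(pairs, cutoffs=range(0, 1001, 50)):
        return max(cutoffs, key=lambda c: f1_at_cutoff(pairs, c))

    # Single cutoff: pool all entities' pairs and call best_cutoff once.
    # Per-entity cutoff: call best_cutoff separately on each entity's pairs.
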
  • 21. Summary
    - Cumulative Citation Recommendation task @ TREC 2012 KBA
    - Two multi-step classification approaches
    - Four groups of features
    - Differentiating between relevant and central is difficult
  • 22. Classification vs. Ranking [Balog & Ramampiaro, SIGIR’13]
    - Approach CCR as a ranking task
    - Learning-to-rank: pointwise, pairwise, listwise
    - Pointwise LTR outperforms the classification approaches using the same set of features
    - http://krisztianbalog.com/files/sigir2013-kba.pdf
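
Pointwise learning-to-rank treats each document-entity pair on its own and learns to predict a graded relevance value that is then used to sort documents. Below is a minimal sketch with a scikit-learn random forest regressor and synthetic data; the choice of learner and the data are illustrative, not the setup of the SIGIR'13 paper.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    # Synthetic features and graded labels (0=garbage ... 3=central), for illustration only.
    rng = np.random.default_rng(0)
    X_train = rng.random((200, 10))
    y_train = rng.integers(0, 4, size=200)

    # Pointwise LTR: fit a regressor on individual (pair, grade) examples...
    model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

    # ...then rank unseen documents for an entity by their predicted grade.
    X_test = rng.random((5, 10))
    print(np.argsort(-model.predict(X_test)))   # indices in descending predicted relevance
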
  • 23. KBAAA: Knowledge Base Acceleration Assessment & Analysis, http://research.idi.ntnu.no/wislab/kbaaa
  • 24. Questions? Contact | @krisztianbalog | krisztianbalog.com