Concept-Based Information Retrieval using Explicit Semantic Analysis
OFER EGOZI, SHAUL MARKOVITCH, and EVGENIY GABRILOVICH...
  1. Concept-Based Information Retrieval using Explicit Semantic Analysis
     OFER EGOZI, SHAUL MARKOVITCH, and EVGENIY GABRILOVICH
     Technion-Israel Institute of Technology
  2. Content
     • Information Retrieval
     • Keyword-Based Retrieval: Bag of Words (BOW)
     • Irrelevant Data
     • Concept-Based Retrieval
     • Explicit Semantic Analysis
     • MORAG System
     • Conclusion
  3. Information Retrieval Systems
     [Diagram labels: Query, IR, Recall, Precision]
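Precision and recall, the two measures named on this slide, can be computed for a single query as in the minimal sketch below. The document IDs and the function name `precision_recall` are made up for illustration and are not taken from the talk.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant retrieved / all retrieved
    recall    = relevant retrieved / all relevant
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical result list and relevance judgments.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"])
print(p, r)
```

Two of the four retrieved documents are relevant, so precision is 0.5. Two of the three relevant documents were found, so recall is 2/3.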
  4. Keyword-Based Retrieval
     [Diagram labels: Query, IR, Bag of Words (BOW)]
  5. Irrelevant Data?
     • Vocabulary problems
       - Synonymy
       - World knowledge
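The vocabulary problem can be seen with a plain bag-of-words match. The sketch below uses the query and part of the news snippet from the shipwreck example later in the deck. The tokenizer and the unweighted cosine similarity are simplifying assumptions, not the retrieval model used by the authors.

```python
import math
import re
from collections import Counter


def bow(text):
    """Lowercase the text, split on non-letters, and count the terms."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


query = "salvaging shipwreck treasure"
doc = ("Divers have recovered artifacts lying underwater for more than "
       "2,000 years in the wreck of a Roman ship")

score = cosine(bow(query), bow(doc))
print(score)  # 0.0
```

The document is clearly about the query topic, yet the score is zero because "wreck" and "ship" are different tokens from "shipwreck" and no other query term appears.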
  6. Concept-Based IR
     • Transform to a domain of concepts (not to a domain of words)
     • Less dependent on specific terms
  7. Explicit Semantic Analysis
  8. Wikipedia-Based ESA
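As a rough illustration of the idea, ESA maps a text to a weighted vector of Wikipedia concepts by summing, for each term in the text, that term's association weights with each concept. The sketch below is a toy version under stated assumptions: the table `TERM_TO_CONCEPTS` and all of its weights are invented for illustration (a real index would derive them from Wikipedia article text, typically with TF-IDF), and raw term frequency stands in for the term weight in the input text.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical inverted index: term -> {concept: association weight}.
# Concept names are borrowed from the slides; the weights are made up.
TERM_TO_CONCEPTS = {
    "shipwreck": {"SHIPWRECK": 0.9, "WRECK DIVING": 0.5},
    "salvaging": {"MARINE SALVAGE": 0.8, "SHIPWRECK": 0.3},
    "treasure": {"TREASURE": 0.9, "SPANISH TREASURE FLEET": 0.6},
    "wreck": {"SHIPWRECK": 0.7, "WRECK DIVING": 0.8},
    "divers": {"WRECK DIVING": 0.6, "SCUBA DIVING": 0.9},
    "artifacts": {"MARITIME ARCHAEOLOGY": 0.7, "TREASURE": 0.2},
    "ship": {"SHIPWRECK": 0.4},
}


def esa_vector(text):
    """Map text to a sparse concept vector: sum of tf * term-concept weight."""
    vec = defaultdict(float)
    term_counts = Counter(re.findall(r"[a-z]+", text.lower()))
    for term, tf in term_counts.items():
        for concept, weight in TERM_TO_CONCEPTS.get(term, {}).items():
            vec[concept] += tf * weight
    return dict(vec)


def cosine(a, b):
    """Cosine similarity between two sparse concept vectors."""
    dot = sum(a[c] * b[c] for c in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


q_vec = esa_vector("salvaging shipwreck treasure")
d_vec = esa_vector("Divers have recovered artifacts in the wreck of a Roman ship")
sim = cosine(q_vec, d_vec)
print(sorted(q_vec, key=q_vec.get, reverse=True))
print(round(sim, 3))
```

The query and the document share no words, but both activate concepts such as SHIPWRECK and WRECK DIVING, so their similarity in concept space is greater than zero.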
  9. ESA-Based Data Retrieval: Example
     Query: salvaging shipwreck treasure
     ESA concepts for the query:
     • SHIPWRECK
     • TREASURE
     • MARITIME ARCHAEOLOGY
     • MARINE SALVAGE
     • HISTORY OF THE BRITISH VIRGIN ISLANDS
     • WRECKING (SHIPWRECK)
     • KEY WEST, FLORIDA
     • FLOTSAM AND JETSAM
     • WRECK DIVING
     • SPANISH TREASURE FLEET
     Document: "ANCIENT ARTIFACTS FOUND. Divers have recovered artifacts lying underwater for more than 2,000 years in the wreck of a Roman ship that sank in the Gulf of Baratti, 12 miles off the island of Elba, newspapers reported Saturday."
     ESA concepts for the document:
     • SCUBA DIVING
     • WRECK DIVING
     • RMS TITANIC
     • USS HOEL (DD-533)
     • SHIPWRECK
     • UNDERWATER ARCHAEOLOGY
     • USS MAINE (ACR-1)
     • MARITIME ARCHAEOLOGY
     • TOMB RAIDER II
     • USS MEADE (DD-602)
  10. Irrelevant Docs
      Query: Estonia Economy
      ESA concepts for the query:
      • ESTONIA
      • ECONOMY OF ESTONIA
      • ESTONIA AT THE 2000 SUMMER OLYMPICS
      • ESTONIA AT THE 2004 SUMMER OLYMPICS
      • ESTONIA NATIONAL FOOTBALL TEAM
      • ESTONIA AT THE 2006 WINTER OLYMPICS
      • BALTIC SEA ??
      • EUROZONE
      • TIIT VÄHI
      • MILITARY OF ESTONIA ??
      Document: "Olympic News In Brief: Cycling win for Estonia. Erika Salumae won Estonia's first Olympic gold when retaining the women's cycling individual sprint title she won four years ago in Seoul as a Soviet athlete."
      ESA concepts for the document:
      • ESTONIA AT THE 2000 SUMMER OLYMPICS
      • ESTONIA AT THE 2004 SUMMER OLYMPICS
      • 2006 COMMONWEALTH GAMES
      • ESTONIA AT THE 2006 WINTER OLYMPICS
      • 1992 SUMMER OLYMPICS
      • ATHLETICS AT THE 2004 SUMMER OLYMPICS
      • 2000 SUMMER OLYMPICS
      • 2006 WINTER OLYMPICS
      • CROSS-COUNTRY SKIING 2006 WINTER OLYMPICS
      • NEW ZEALAND AT THE 2006 WINTER OLYMPICS
  11. Selecting Query Features
      • Selection could remove noisy ESA concepts
      • However, the IR task provides no training data…
      • Utility function U(+|-) requires target measure >> training set
      [Diagram labels: f = ESA(q), Filter, U, f']
      • Focus on query concepts: the query is short and noisy, while feature selection (FS) at indexing lacks context
  12. Pseudo-Relevance Feedback
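Pseudo-relevance feedback substitutes for missing training data by treating the output of an initial retrieval run as labels. A minimal sketch, assuming the top of the ranking serves as pseudo-positive examples and the bottom as pseudo-negative ones, is shown below. The function name, the cut-offs `k_pos` and `k_neg`, and the document IDs are hypothetical.

```python
def pseudo_relevance_examples(ranked_doc_ids, k_pos=3, k_neg=3):
    """Split a ranked result list into pseudo-positive and pseudo-negative examples.

    The first k_pos documents are assumed relevant and the last k_neg
    documents are assumed irrelevant. No human judgments are involved.
    """
    positives = ranked_doc_ids[:k_pos]
    # On short lists, never let the negatives overlap the positives.
    neg_start = max(k_pos, len(ranked_doc_ids) - k_neg)
    negatives = ranked_doc_ids[neg_start:]
    return positives, negatives


# Hypothetical ranking from a first retrieval pass, best match first.
ranking = ["d4", "d1", "d9", "d2", "d7", "d3", "d8"]
pos, neg = pseudo_relevance_examples(ranking, k_pos=2, k_neg=2)
print(pos, neg)
```

Here `pos` holds the two top-ranked documents and `neg` the two lowest-ranked ones. These two sets can then play the role of the positive and negative examples used for feature selection.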
  13. ESA Feature Selection Methods
      • IG: calculate each feature's Information Gain in separating positive and negative examples, take best performing features
      • IIG: add concepts in the positive examples to candidate features, and re-weight all features based on their weights in examples
      • RV: find subset of features that best separates positive and negative examples, employing heuristic search
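The IG method above can be sketched as follows. This is a simplified illustration, not the authors' implementation: each example is reduced to the set of concepts it contains, a feature is the binary test "concept is present", and the example sets are invented. Only the concept names come from the Estonia slide.

```python
import math


def entropy(pos, neg):
    """Binary entropy (in bits) of a set with pos positive and neg negative items."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def information_gain(concept, positives, negatives):
    """Information gain of the binary feature 'concept is present'."""
    total = len(positives) + len(negatives)
    if total == 0:
        return 0.0
    pos_with = sum(concept in ex for ex in positives)
    neg_with = sum(concept in ex for ex in negatives)
    pos_without = len(positives) - pos_with
    neg_without = len(negatives) - neg_with
    h_before = entropy(len(positives), len(negatives))
    h_after = ((pos_with + neg_with) / total * entropy(pos_with, neg_with)
               + (pos_without + neg_without) / total
               * entropy(pos_without, neg_without))
    return h_before - h_after


def select_by_ig(query_concepts, positives, negatives, top_n):
    """Keep the top_n query concepts ranked by information gain."""
    ranked = sorted(query_concepts,
                    key=lambda c: information_gain(c, positives, negatives),
                    reverse=True)
    return ranked[:top_n]


# Hypothetical pseudo-positive and pseudo-negative examples (concept sets).
positives = [
    {"ESTONIA", "ECONOMY OF ESTONIA", "EUROZONE"},
    {"ESTONIA", "ECONOMY OF ESTONIA", "ESTONIA AT THE 2000 SUMMER OLYMPICS"},
]
negatives = [
    {"ESTONIA", "ESTONIA AT THE 2000 SUMMER OLYMPICS"},
    {"ESTONIA", "2000 SUMMER OLYMPICS"},
]
query_concepts = ["ESTONIA", "ECONOMY OF ESTONIA",
                  "ESTONIA AT THE 2000 SUMMER OLYMPICS"]

selected = select_by_ig(query_concepts, positives, negatives, top_n=1)
print(selected)
```

In this toy data ECONOMY OF ESTONIA occurs in every positive example and in no negative one, so it separates the two sets perfectly and is the one concept kept. ESTONIA occurs everywhere and the Olympics concept occurs equally on both sides, so both carry no information. Plain information gain also gives a high score to a concept that occurs only in the negative examples. The sketch does not handle that case.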
  14. MORAG System
  15. MORAG Evaluation
  16. Conclusion
      • MORAG: a new methodology for concept-based information retrieval
      • Documents and query are enhanced by Wikipedia concepts
      • Informative features are selected using pseudo-relevance feedback
      • The generated features improve the performance of BOW-based systems
  17. Thank You
  18. Q & A
