A Research Literature Search Engine With Abbreviation Recognition

Published in: Technology

  1. A research literature search engine with abbreviation recognition. Cheng-Tao Chu, Pei-Chin Wang
  2. Outline
     • Features
     • Demo
     • Issues involved
     • Implementation
       • Tailored Edit Distance
       • Probabilistic Model
         • Translation Model
       • Score Combination
     • Evaluation
     • Q&A
  3. Features
     • Given a query containing authors, proceedings, or title keywords, return the relevant papers
     • Retrieve the desired papers even when author or proceedings names are abbreviated
     • Web interface for querying and for user evaluation
  4. Demo
     • It’s show time
  5. Issues involved
     • Tag the terms of an arbitrary query as author, proceedings, or other keyword fields
     • Recognize the author
       • P. Raghavan -> Prabhakar Raghavan
       • P. Raghavan -> Padma Raghavan
       • P. Raghavan -> … Raghavan
       • Probability of each possible candidate
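
The author-recognition step above can be illustrated with a minimal sketch that expands an abbreviated name into the compatible full names. The function name `expand_author`, the matching rule (same last name, given names that start with the typed initials), and the third entry in the sample author list are assumptions made for illustration; the slides do not say how candidates are generated.

```python
def expand_author(query, known_authors):
    """Return known full names compatible with a query such as 'P. Raghavan'.

    Assumed rule (not from the slides): the last token is the last name,
    and every earlier token is an initial or prefix of a given name.
    """
    parts = query.replace(".", " ").split()
    if not parts:
        return []
    *given, last = parts
    matches = []
    for full in known_authors:
        tokens = full.split()
        if not tokens:
            continue
        *full_given, full_last = tokens
        if full_last.lower() != last.lower():
            continue
        if len(given) > len(full_given):
            continue
        # Each typed initial must prefix the given name in the same position.
        if all(f.lower().startswith(g.lower()) for g, f in zip(given, full_given)):
            matches.append(full)
    return matches


# The first two names come from the slide; the third is a hypothetical distractor.
authors = ["Prabhakar Raghavan", "Padma Raghavan", "Raghu Ramakrishnan"]
candidates = expand_author("P. Raghavan", authors)
```

Each surviving candidate would still need a probability, which is what the translation model later in the deck is for.
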
  6. Issues involved (cont.)
     • Recognize the proceedings name
       • Needs more than a look-up table
       • IJCAI -> International Joint Conference of AI
       • IJCAI -> IJCAI Workshop
     • How to combine the weights of each candidate
       • Score from Lucene
       • Score for a possible author
       • Score for a possible proceedings name
  7. Implementation
     (Architecture diagram. Labels: Database, XML Parser, Tagger, DBLP, Search Engine, Browser, Probabilistic Model, Tailored Edit Distance, Query, Retrieved Documents)
  8. Tailored Edit Distance
     • Heuristics
       • Reward consecutive matches
       • Reward matches on capitalized characters
       • Penalize substitution more heavily than insertion or deletion
     • Probabilistic representation
       • Transform the edit distance cost into a probability
       • Normalize the cost
       • Use training data to estimate the distribution
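
The heuristics on this slide can be sketched as a weighted edit distance computed by dynamic programming. All four constants, the length-based normalization, and the function names below are assumptions chosen for illustration; the slides give neither the actual costs nor the normalization used.

```python
# Hypothetical costs and rewards; the deck does not state the real values.
SUB_COST = 0.75    # substitution is penalized more than a single insertion/deletion
INDEL_COST = 0.5   # insertion or deletion
CAP_BONUS = 0.5    # reward (negative cost) for matching an uppercase character
RUN_BONUS = 0.25   # reward when a match directly follows another match


def tailored_distance(abbr, full):
    """Cheapest alignment cost between an abbreviation and a full name."""
    n, m = len(abbr), len(full)
    inf = float("inf")
    # cost[i][j][k]: best cost aligning abbr[:i] with full[:j];
    # k is 1 when the last operation was a match, else 0.
    cost = [[[inf, inf] for _ in range(m + 1)] for _ in range(n + 1)]
    cost[0][0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            for k in (0, 1):
                c = cost[i][j][k]
                if c == inf:
                    continue
                if i < n:  # delete abbr[i]
                    cost[i + 1][j][0] = min(cost[i + 1][j][0], c + INDEL_COST)
                if j < m:  # insert full[j]
                    cost[i][j + 1][0] = min(cost[i][j + 1][0], c + INDEL_COST)
                if i < n and j < m:
                    if abbr[i] == full[j]:  # case-sensitive match
                        step = 0.0
                        if abbr[i].isupper():
                            step -= CAP_BONUS
                        if k == 1:
                            step -= RUN_BONUS
                        cost[i + 1][j + 1][1] = min(cost[i + 1][j + 1][1], c + step)
                    else:  # substitution
                        cost[i + 1][j + 1][0] = min(cost[i + 1][j + 1][0], c + SUB_COST)
    return min(cost[n][m])


def normalized_distance(abbr, full):
    """Assumed normalization: divide by the combined length of both strings."""
    total = len(abbr) + len(full)
    return tailored_distance(abbr, full) / total if total else 0.0


near = normalized_distance("IJCAI", "International Joint Conference on Artificial Intelligence")
far = normalized_distance("IJCAI", "International Conference on Machine Learning")
```

Without normalization the raw cost grows with the length of the full name, so a short unrelated venue can score lower than the long correct one; dividing by length is one simple way to offset that.
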
  9. Conceptual Histogram
  10. Probabilistic Model
     • Translation Model
       • Use the tailored edit distance to estimate the distribution
       • Return a distribution over candidate names (assuming independence between the full name and its abbreviation given the evidence)
     • Network Structure
       (Diagram nodes: First Ini., Mid. Ini., Last Ini., First Name, Middle Name, Last Name, Full Name)
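
One way to picture the step from edit distance costs to a distribution over candidate names is a softmax over negated costs, sketched below. This is a stand-in, not the authors' method: the slides say the distribution is estimated from training data, and the function name, the `temperature` parameter, and the sample costs are all hypothetical.

```python
import math


def candidate_distribution(distances, temperature=1.0):
    """Map {candidate: cost} to {candidate: probability}; lower cost, higher probability.

    Softmax is an assumption used for illustration only.
    """
    if not distances:
        return {}
    lowest = min(distances.values())
    # Subtracting the lowest cost keeps the exponentials numerically stable.
    weights = {
        name: math.exp(-(cost - lowest) / temperature)
        for name, cost in distances.items()
    }
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Hypothetical costs for the two candidates named on the earlier slide.
probs = candidate_distribution({"Prabhakar Raghavan": 3.0, "Padma Raghavan": 2.0})
```
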
  11. Score Combination
     • Lucene score formula
     • Assign a weight to each candidate
     • Combination score
       • Set idf(t) to (weight of that term + original idf(t))
       • Assign a boost value to each term in the query
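
The two combination options can be sketched in plain Python without calling Lucene itself. The idf formula below, 1 + log(numDocs / (docFreq + 1)), is the one documented for Lucene's classic TF-IDF similarity in older releases and is assumed here; the field name `author`, the function names, and the candidate weights are hypothetical.

```python
import math


def classic_idf(doc_freq, num_docs):
    """idf as documented for Lucene's classic TF-IDF similarity (assumed version)."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def adjusted_idf(candidate_weight, doc_freq, num_docs):
    """Option 1 from the slide: weight of the term plus its original idf."""
    return candidate_weight + classic_idf(doc_freq, num_docs)


def build_boosted_query(field, weighted_candidates):
    """Option 2 from the slide: one boosted clause per candidate.

    Uses Lucene query-parser syntax: field:"phrase"^boost.
    """
    clauses = [
        f'{field}:"{name}"^{weight:.2f}'
        for name, weight in weighted_candidates.items()
    ]
    return " OR ".join(clauses)


# Hypothetical candidate weights, e.g. produced by the probabilistic model.
query = build_boosted_query("author", {"Prabhakar Raghavan": 0.7, "Padma Raghavan": 0.3})
```

The boosted-query form leaves the similarity function untouched, which may be easier to maintain than overriding idf, at the price of less direct control over the final score.
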
  12. Evaluation
     • Test data construction
     • Evaluation on the test data
       • Precision
     • User evaluation
       • Comparison with Google Scholar
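
Precision on test data can be computed as the fraction of returned papers that are relevant. The cutoff `k` and the function name are assumptions; the slide only names the metric.

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved items that appear in the relevant set."""
    top = retrieved[:k]
    if not top:
        return 0.0
    hits = sum(1 for doc in top if doc in relevant)
    return hits / len(top)


# Hypothetical document ids for illustration.
score = precision_at_k(["a", "b", "c", "d"], {"a", "c"}, 4)
```
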
  13. Q&A
