Entity linking in advertisements

Team: Rounak Patni, Kumar Rishabh, Rohit Jain, Siva kumar
Mentor: Pulkit Goel

Goals
•Identify important entities within advertisements.
•Link them to the corresponding Wikipedia pages.
•Identify relevant concepts in order to disambiguate each entity.

Benefits of Wikipedia
•An ever-expanding number of pages in the Wikipedia corpus.
•A rigorous structure but with low coverage, which emulates real-world data very well.
•A large number of entities, including proper names unlikely to be found in any other collection.
•Redirect pages and disambiguation pages.

Process Overview
•Parser Module - Parses the given web page and produces two documents: the advertisement itself, and the document that is used in the final steps to disambiguate the results of the search module.
•Tokenizer Module - Converts the advertisement into a list of tokens.
•POS Tagger Module - Marks up each word in an ad with its part of speech.
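As a rough illustration of the tokenizer and POS tagger steps, the sketch below splits an ad into tokens and tags them by dictionary lookup. The names `TOY_LEXICON`, `tokenize`, and `pos_tag` are hypothetical, and the lookup table is only a stand-in: the slides do not say which tagger was used, and a real system would rely on a trained one.

```python
import re

# Hypothetical toy lexicon (assumption); a real system would use a trained POS tagger.
TOY_LEXICON = {
    "an": "DT", "a": "DT", "the": "DT",
    "keeps": "VBZ", "away": "RB",
    "apple": "NN", "day": "NN", "doctor": "NN",
}

def tokenize(ad_text):
    """Split an advertisement into word tokens and single punctuation tokens."""
    return re.findall(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?|[^\sA-Za-z0-9]", ad_text)

def pos_tag(tokens):
    """Tag each token by lexicon lookup; unknown words default to NN."""
    tagged = []
    for tok in tokens:
        if not tok[0].isalnum():
            tagged.append((tok, "PUNCT"))
        else:
            tagged.append((tok, TOY_LEXICON.get(tok.lower(), "NN")))
    return tagged

tokens = tokenize("An Apple a day keeps the doctor away")
tagged = pos_tag(tokens)
```

The output of `pos_tag` is a list of `(word, tag)` pairs, which is the form the later noun phrase steps would consume.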

Process Overview
•Parsing Module - Returns the advertisement in tree format.
•Noun Phrase Extraction Module - Extracts noun phrases (NPs) from the tree generated in the previous step.
•Noun Phrase Ranking - Ranks the NPs using a heuristic function.
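The extraction and ranking steps can be sketched as follows. The parse tree is written by hand as nested tuples, and the scoring rule in `score` is a hypothetical heuristic chosen for illustration (capitalised content words count double); the slides do not specify the actual heuristic function.

```python
# A parse tree as nested tuples: (label, child, child, ...); leaves are strings.
# This tree is hand-written for illustration, not produced by a real parser.
TREE = ("S",
        ("NP", ("DT", "An"), ("NNP", "Apple")),
        ("NP", ("DT", "a"), ("NN", "day")),
        ("VP", ("VBZ", "keeps"),
               ("NP", ("DT", "the"), ("NN", "doctor")),
               ("ADVP", ("RB", "away"))))

DETERMINERS = {"a", "an", "the"}

def leaves(node):
    """Return the words under a node, left to right."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:
        words.extend(leaves(child))
    return words

def noun_phrases(node):
    """Collect the words of every NP subtree, in left-to-right order."""
    if isinstance(node, str):
        return []
    found = [leaves(node)] if node[0] == "NP" else []
    for child in node[1:]:
        found.extend(noun_phrases(child))
    return found

def score(np_words):
    """Hypothetical heuristic: 2 points per capitalised content word, 1 otherwise."""
    content = [w for w in np_words if w.lower() not in DETERMINERS]
    return sum(2 if w[0].isupper() else 1 for w in content)

def rank(nps):
    """Sort noun phrases from highest to lowest score."""
    return sorted(nps, key=score, reverse=True)

nps = noun_phrases(TREE)
ranked = rank(nps)
```

Under this assumed heuristic the phrase containing the capitalised word "Apple" ranks first, so it would be the one passed on to entity and keyword extraction.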

Process Overview
•Entity/Keyword Extraction Module - Probable entities and keywords are extracted from the highest-ranked NP.
•Search Module - Returns a list of relevant documents. The search module is essentially an inverted index of the Wikipedia dump, built from only the title and summary of each page.
•Filtering of Results - Finds the most likely (closest) Wikipedia page.
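A minimal sketch of the search and filtering steps, assuming a tiny in-memory stand-in for the Wikipedia dump. The page titles and summaries in `PAGES`, the stop-word list, and the word-overlap rule in `best_page` are all assumptions made for illustration; the slides only state that titles and summaries are indexed and that the closest page is selected.

```python
import re
from collections import defaultdict

# Hypothetical miniature stand-in for (title, summary) pairs from a wiki dump.
PAGES = {
    "Apple (fruit)": "An apple is an edible fruit produced by an apple tree.",
    "Apple (company)": "Apple is a technology company that designs consumer products.",
    "Doctor (title)": "Doctor is an academic title.",
}

# Small illustrative stop-word list (assumption).
STOP_WORDS = {"a", "an", "the", "is", "to", "by", "that"}

def words(text):
    """Lower-case a text and return its content words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]

def build_index(pages):
    """Inverted index: word -> set of page titles whose title or summary contains it."""
    index = defaultdict(set)
    for title, summary in pages.items():
        for w in words(title + " " + summary):
            index[w].add(title)
    return index

def search(index, keywords):
    """Return every title that matches at least one keyword."""
    hits = set()
    for kw in words(" ".join(keywords)):
        hits |= index.get(kw, set())
    return hits

def best_page(pages, candidates, context):
    """Filtering step: pick the candidate sharing the most words with the context."""
    ctx = set(words(context))
    return max(sorted(candidates),
               key=lambda t: len(ctx & set(words(t + " " + pages[t]))))

index = build_index(PAGES)
candidates = search(index, ["apple"])
ad = "Apple innovates relentlessly to make great products, buy an apple"
chosen = best_page(PAGES, candidates, ad)
```

Here the keyword "apple" retrieves both Apple pages, and the shared word "products" tips the filter toward the company page. Ties are broken alphabetically by title, which is an arbitrary choice in this sketch.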

Entity Detection
•The basic technique for entity detection is chunk detection via shallow parsing.
•This technique reduces the number of keywords to be searched in the corpus, improving performance and accuracy.
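Chunk detection over POS-tagged tokens can be sketched as below. The chunk pattern (any determiners or adjectives followed by one or more nouns) and the function name `chunk_noun_phrases` are assumptions; the tags in the example are written by hand rather than produced by a tagger.

```python
def chunk_noun_phrases(tagged):
    """Group runs of determiners/adjectives followed by one or more nouns into chunks."""
    chunks, current = [], []
    has_noun = False
    for word, tag in tagged:
        if tag.startswith("NN"):
            # Nouns always extend the current chunk.
            current.append(word)
            has_noun = True
        elif tag in ("DT", "JJ") and not has_noun:
            # Modifiers may only appear before the first noun of a chunk.
            current.append(word)
        else:
            # Any other tag closes the chunk; keep it only if it contains a noun.
            if has_noun:
                chunks.append(current)
            current, has_noun = [], False
            if tag in ("DT", "JJ"):
                current.append(word)
    if has_noun:
        chunks.append(current)
    return chunks

# Hand-tagged example (assumption) based on one of the ads in the slides.
tagged_ad = [("Apple", "NNP"), ("innovates", "VBZ"), ("relentlessly", "RB"),
             ("to", "TO"), ("make", "VB"), ("great", "JJ"), ("products", "NNS")]
chunks = chunk_noun_phrases(tagged_ad)
```

In this example seven tokens shrink to two chunks, "Apple" and "great products", which illustrates how chunking cuts down the number of keywords sent to the search module.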

Evaluation and Results
•Advertisement: An Apple a day keeps the doctor away
Wiki Page: Apple (fruit)
•Advertisement: Apple innovates relentlessly to make great products, buy an apple
Wiki Page: Apple Corporation
•Advertisement: Royal Stag, its your life make it large
Wiki Page: Royal Stag

Conclusions
•NLP techniques can be used to narrow down the list of words to be searched in the search engine.
•Context can be extracted from the advertisement itself using NLP techniques.
•The search module gives satisfactory results on a simple inverted index created from page titles and summaries.

