IST 441: Progress Report
Query Formulation for Similarity
Student: Nitish Upreti
Customer: Kyle Williams
What is Similarity Search?
• Given a sample document and a standard Web search engine, the goal is to find similar documents on the Web.
• Applications of similarity search include:
Plagiarism Detection
Locating passages in a suspicious document that were plagiarized from sources on the Web.
Research Paper Recommendation
Finding relevant documents for a given research paper.
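Both applications require a final comparison step: once the search engine returns candidate pages, each candidate has to be scored against the sample document. The slides do not say which measure the project uses. The following is a minimal sketch that assumes plain term-frequency vectors and cosine similarity, with hypothetical helper names (`tokenize`, `rank_by_similarity`).

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(sample, candidates):
    """Return (score, candidate) pairs sorted from most to least similar."""
    sample_vec = Counter(tokenize(sample))
    scored = [(cosine_similarity(sample_vec, Counter(tokenize(c))), c) for c in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


sample = "topic models find clusters of words in unlabeled text"
candidates = [
    "a recipe for banana bread",
    "topic models cluster words that occur together in text",
]
for score, doc in rank_by_similarity(sample, candidates):
    print(f"{score:.3f}  {doc}")
```

A real system would likely weight terms (for example with TF-IDF) and drop stopwords before computing the cosine. Raw counts keep the sketch short.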
Query Formulation Approach
(Automatic extraction of relevant terms from a given corpus)
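One simple way to turn a sample document into a search-engine query is to keep its highest-scoring terms relative to a background corpus. The sketch below uses TF-IDF as a stand-in scorer, since the slides do not name the project's actual scoring function. The stopword list, the smoothed IDF formula, and the `formulate_query` helper are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "to", "for", "is", "are", "that", "with", "on"}


def tokenize(text):
    """Lowercase, split into alphabetic tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def formulate_query(document, corpus, k=3):
    """Join the k terms of `document` with the highest TF-IDF against `corpus`."""
    tf = Counter(tokenize(document))
    n_docs = len(corpus)
    doc_freq = Counter()
    for other in corpus:
        doc_freq.update(set(tokenize(other)))

    def tfidf(term):
        # Smoothed IDF: terms missing from the corpus get the largest weight.
        idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1
        return tf[term] * idf

    # Highest score first; ties broken alphabetically so the query is stable.
    ranked = sorted(tf, key=lambda term: (-tfidf(term), term))
    return " ".join(ranked[:k])


corpus = [
    "the web is a large collection of documents",
    "search engines index documents on the web",
    "plagiarism detection compares a suspicious document to sources",
]
document = "topic models and topic inference for web documents"
print(formulate_query(document, corpus))
```

Terms that are common across the corpus ("web", "documents") are pushed down, while terms specific to the sample ("topic") rise to the top of the query.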
Java Automatic Term Extraction (JATE) toolkit
A library of state-of-the-art term extraction algorithms and a framework for developing term extraction algorithms.
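C-value is one of the classic term extraction measures that JATE implements. It favors frequent multi-word candidates but discounts those that mostly appear nested inside longer candidates. The code below is an independent Python sketch of the formula and does not use JATE's Java API. It assumes a simplified candidate filter (n-grams that do not start or end with a stopword) in place of the part-of-speech patterns usually used, and `c_value` and `candidate_ngrams` are hypothetical names. For a candidate $a$ with frequency $f(a)$ and the set $T_a$ of longer candidates containing it, the score is $\log_2|a| \cdot f(a)$ when $T_a$ is empty, and $\log_2|a| \cdot \big(f(a) - \frac{1}{|T_a|}\sum_{b \in T_a} f(b)\big)$ otherwise.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list used instead of a part-of-speech filter.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "to", "for", "is", "from", "on"}


def candidate_ngrams(text, max_n=3):
    """Yield word n-grams (2 <= n <= max_n) that do not start or end with a stopword."""
    words = re.findall(r"[a-z]+", text.lower())
    for n in range(2, max_n + 1):
        for i in range(len(words) - n + 1):
            gram = tuple(words[i:i + n])
            if gram[0] not in STOPWORDS and gram[-1] not in STOPWORDS:
                yield gram


def is_nested(short, long):
    """True if `short` occurs as a contiguous sub-sequence of the longer `long`."""
    k = len(short)
    return k < len(long) and any(long[i:i + k] == short for i in range(len(long) - k + 1))


def c_value(corpus, max_n=3):
    """Return (term, C-value) pairs, highest score first."""
    freq = Counter()
    for text in corpus:
        freq.update(candidate_ngrams(text, max_n))
    scores = {}
    for term, f in freq.items():
        # Longer candidates that contain this term: the set T_a in the formula.
        containers = [other for other in freq if is_nested(term, other)]
        if containers:
            adjusted = f - sum(freq[b] for b in containers) / len(containers)
        else:
            adjusted = f
        scores[" ".join(term)] = math.log2(len(term)) * adjusted
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


corpus = [
    "automatic term extraction finds terms in a corpus",
    "term extraction algorithms rank candidate terms",
    "automatic term extraction is a core step",
]
for term, score in c_value(corpus)[:3]:
    print(f"{score:.2f}  {term}")
```

Here "automatic term" scores 0 because every occurrence sits inside "automatic term extraction", while "term extraction" keeps a positive score because it also appears in other contexts.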
• Topic models provide a simple way to analyze
large volumes of unlabeled text.
• A "topic" consists of a cluster of words that
frequently occur together.
• Using contextual clues, topic models can connect words with similar meanings and distinguish between uses of words with multiple meanings.
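Latent Dirichlet allocation (LDA), the topic model behind tools such as MALLET, is commonly fit with collapsed Gibbs sampling. Each word token is repeatedly reassigned to a topic in proportion to how often that topic appears in its document and how often that word appears in that topic. This is a minimal pure-Python sketch of the technique and not MALLET's optimized Java implementation. The hyperparameter values `alpha` and `beta`, the iteration count, and the toy corpus are assumptions.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling; `docs` is a list of token lists."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]                # n_{d,k}
    topic_word = [defaultdict(int) for _ in range(n_topics)]  # n_{k,w}
    topic_total = [0] * n_topics                              # n_k
    assignments = []
    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Remove this token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalized full conditional p(z = t | all other assignments).
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][w] + beta) / (topic_total[t] + V * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    # Point estimate of each document's topic proportions (theta).
    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    return theta, topic_word, vocab


docs = [
    "apple banana fruit apple banana".split(),
    "banana fruit apple smoothie".split(),
    "engine wheel car engine".split(),
    "car wheel engine brake".split(),
]
theta, topic_word, vocab = lda_gibbs(docs, n_topics=2)
for k, counts in enumerate(topic_word):
    top = sorted(counts, key=lambda w: (-counts[w], w))[:3]
    print(f"topic {k}: {top}")
```

The "cluster of words" that makes up a topic corresponds to the words with the highest counts in `topic_word[k]`. On a corpus this small, the split between topics can vary with the seed.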
MALLET + MAUI
• The MALLET topic model package includes an extremely fast implementation of Gibbs sampling, efficient methods for document-topic hyperparameter optimization, and tools for inferring topics for new documents given trained models.
• MAUI uses candidate generation algorithms to identify candidate topics in a given document and then filters them, analyzing the properties (features) of each candidate topic to select the most relevant ones.
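The two-stage shape described above, where candidates are generated and then filtered on their features, can be sketched in a few lines. Maui trains a machine-learned classifier for the filtering step. The following substitutes a fixed, hand-weighted score over two features of the kind used in this family of tools (TF×IDF and position of first occurrence). The weights `w_tfidf` and `w_first`, the stopword list, and the helper names are hypothetical, and the code does not use Maui's API.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list for candidate generation.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "to", "for", "is", "on", "with"}


def generate_candidates(words, max_n=3):
    """Stage 1: yield (position, phrase) for n-grams that don't start/end with a stopword."""
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            if gram[0] not in STOPWORDS and gram[-1] not in STOPWORDS:
                yield i, " ".join(gram)


def extract_topics(document, corpus, top_k=3, w_tfidf=1.0, w_first=0.5):
    """Stage 2: score every candidate by its features and keep the top_k."""
    words = re.findall(r"[a-z]+", document.lower())
    freq, first_pos = Counter(), {}
    for pos, phrase in generate_candidates(words):
        freq[phrase] += 1
        first_pos.setdefault(phrase, pos)
    corpus_phrases = [
        {p for _, p in generate_candidates(re.findall(r"[a-z]+", text.lower()))}
        for text in corpus
    ]
    scores = {}
    for phrase, f in freq.items():
        df = sum(phrase in phrases for phrases in corpus_phrases)
        tfidf = (f / len(words)) * math.log((1 + len(corpus)) / (1 + df))
        first = 1 - first_pos[phrase] / len(words)  # 1.0 means it opens the document
        scores[phrase] = w_tfidf * tfidf + w_first * first
    return sorted(scores, key=lambda p: (-scores[p], p))[:top_k]


document = "topic indexing assigns topics to a document. topic indexing uses a vocabulary."
corpus = ["a document is a text", "a vocabulary lists terms", "indexing is useful"]
print(extract_topics(document, corpus))
```

The output keeps overlapping phrases such as "topic" and "topic indexing". A trained filter like Maui's can learn from labeled examples to prefer one over the other, which a fixed linear score cannot.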