Introduction
• SUCHE: the German word for "search"
• Project is a fully functioning extensible Search Engine.
• Also has:
– Auto Completion
– Spell Correction
– Query Language/Grammar Parsing
– Authentication
– Relevant Suggestions
– AJAX Based Querying
– Extensible via Plug-ins
Overall structure
Methodology
User Management
• Django Authentication framework:
– User model
– validation
– authentication
– Authorized access
Spell Correction
• Runs continuously in Background
• Reads words through the provided interfaces:
– DB
– Named Pipes
• Words with counts loaded at startup
Spell Correction
• Based on Bayes' theorem of conditional probability.
• We use: argmax_c P(w|c) P(c) / P(w)
Where:
– P(c), the probability that a proposed
correction c stands on its own.
– P(w|c), the probability that w would be typed in
a text when the author meant c.
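A minimal sketch of the argmax above, using invented toy probabilities (the numbers and the function name best_correction are assumptions for illustration, not values from the project). Because P(w) is the same for every candidate c, it can be left out of the comparison:

```python
def best_correction(word, candidates, prior, likelihood):
    """Return the candidate c that maximizes P(word | c) * P(c).

    P(word) is identical for all candidates, so dividing by it
    would not change which candidate wins.
    """
    return max(
        candidates,
        key=lambda c: likelihood.get((word, c), 0.0) * prior.get(c, 0.0),
    )


# Hypothetical numbers, chosen only to show the trade-off:
# "thaw" explains the typo better, but "the" is far more common.
prior = {"the": 0.05, "thaw": 0.0001}                         # P(c)
likelihood = {("thw", "the"): 0.001, ("thw", "thaw"): 0.01}   # P(w | c)

print(best_correction("thw", ["the", "thaw"], prior, likelihood))
```

With these numbers the product for "the" is larger than the product for "thaw", so the common word wins even though its error probability is lower.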
Spell Correction
• Process:
– Read the word
– Calculate possible words by deletion,
transposition, insertion, etc.
– Check whether each candidate word is known and
find its occurrence probability.
– Return the most probable word.
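The process above could be sketched as follows. This is one common way to realize these steps, not necessarily the project's code: it assumes every single-character edit is equally likely (so P(w|c) is uniform and only P(c) decides), and the names edits1, correct and COUNTS are hypothetical.

```python
import string
from collections import Counter


def edits1(word):
    """All strings one deletion, transposition, replacement or insertion away."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right
                for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def correct(word, counts):
    """Return the known word with the highest occurrence probability."""
    if word in counts:              # already a known word: keep it
        return word
    candidates = [w for w in edits1(word) if w in counts]
    if not candidates:              # nothing known within one edit
        return word
    total = sum(counts.values())
    return max(candidates, key=lambda w: counts[w] / total)


# Toy word counts standing in for the counts loaded at startup.
COUNTS = Counter({"search": 50, "engine": 30, "starch": 2})

print(correct("serch", COUNTS))
print(correct("engien", COUNTS))
```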
Plugin Support
• Easy extension of required features by the users.
• Emphasizes Selective Implementation
• Plug-in designers can design and submit plug-ins for approval.
• Separate Deployment
• Private-Key based Verification
Plugin Support
• Grammar/Language Parsing
– Each Plug-in has a specific grammar
• E.g. <temperature|temp><?for|of><$query>
• This is used for: temperature of kathmandu
• Returns ‘kathmandu’ to the Temperature Plugin
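One way such a grammar could be matched is by translating it into a regular expression. The sketch below assumes semantics inferred from the single example above: "|" separates alternatives, a leading "?" marks an optional token, and "$name" captures the rest of the query. The function compile_grammar is hypothetical, and the sketch only handles a "$" token in final position.

```python
import re


def compile_grammar(grammar):
    """Translate a plug-in grammar string into a compiled regular expression."""
    parts = []
    for token in re.findall(r"<([^<>]+)>", grammar):
        if token.startswith("$"):
            # Capture token: everything that remains, under the given name.
            parts.append(r"(?P<%s>.+)" % token[1:])
            continue
        optional = token.startswith("?")
        words = token.lstrip("?").split("|")
        alternatives = "(?:%s)" % "|".join(re.escape(w) for w in words)
        if optional:
            parts.append(r"(?:%s\s+)?" % alternatives)
        else:
            parts.append(alternatives + r"\s+")
    return re.compile("^" + "".join(parts) + "$", re.IGNORECASE)


pattern = compile_grammar("<temperature|temp><?for|of><$query>")
match = pattern.match("temperature of kathmandu")
print(match.group("query"))
```

Under these assumptions the short form "temp kathmandu" matches as well, while a query that does not start with one of the keywords does not match at all.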
Plugin Support
• Process:
– Read Corrected Query
– Format words, i.e. remove unwanted
spaces and symbols.
– Match Stored Grammar
– Call Corresponding Results
– Return Result
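A minimal sketch of this pipeline, assuming each stored grammar is already available as a compiled regular expression. The names normalize, handle, PLUGINS and temperature_plugin are hypothetical, and the plug-in here just returns a placeholder string instead of real data.

```python
import re


def normalize(query):
    """Lower-case, replace symbols with spaces, collapse repeated whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", query.lower())
    return " ".join(cleaned.split())


def temperature_plugin(place):
    # Placeholder result; a real plug-in would look the value up.
    return "temperature lookup for " + place


# (compiled grammar, plug-in callable) pairs; a single entry for illustration.
PLUGINS = [
    (re.compile(r"^(?:temperature|temp)\s+(?:(?:for|of)\s+)?(?P<query>.+)$"),
     temperature_plugin),
]


def handle(query):
    """Return the first matching plug-in's result, or None if none matches."""
    text = normalize(query)
    for pattern, plugin in PLUGINS:
        match = pattern.match(text)
        if match:
            return plugin(match.group("query"))
    return None  # no grammar matched; the caller falls back to normal search


print(handle("  Temperature   of Kathmandu?! "))
print(handle("python tutorial"))
```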
Crawler
• Process:
– Read scheduled URLs
– Visit URLs for fresh content
– Save complete page
– Schedule another crawl date
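The crawl loop above can be sketched with a priority queue ordered by due time. This is an illustration under assumptions: the fixed re-crawl interval, the in-memory pages dictionary, and the injected fetch function are hypothetical stand-ins (a real crawler would download over HTTP and persist the pages).

```python
import heapq


def crawl_due(schedule, pages, fetch, now, interval):
    """Visit every URL whose crawl time has arrived, then reschedule it.

    schedule: heap of (due_time, url) tuples
    pages:    dict mapping url -> saved page content
    fetch:    callable returning the complete page for a url
    interval: positive delay until the next crawl of the same url
    """
    visited = []
    while schedule and schedule[0][0] <= now:
        _, url = heapq.heappop(schedule)                   # read scheduled URL
        pages[url] = fetch(url)                            # visit and save page
        heapq.heappush(schedule, (now + interval, url))    # next crawl date
        visited.append(url)
    return visited


def fake_fetch(url):
    # Stand-in for a real download, so the sketch runs offline.
    return "<html>" + url + "</html>"


schedule = [(0, "a.example"), (5, "b.example"), (100, "c.example")]
heapq.heapify(schedule)
pages = {}
visited = crawl_due(schedule, pages, fake_fetch, now=10, interval=60)
print(visited)
```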
Indexer
• Process:
– Read Unprocessed websites
– Undo result of previous content
– Analyze content
– Create Reverse Index
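A sketch of the indexing step, assuming a simple in-memory word-to-pages mapping with term counts. The class InvertedIndex and its tokenization rule are illustrative assumptions. The point it shows is the "undo" step: before a re-crawled page is indexed again, its old postings are removed so stale words do not linger.

```python
import re
from collections import defaultdict


class InvertedIndex:
    def __init__(self):
        self.postings = defaultdict(dict)  # word -> {url: term count}
        self.doc_words = {}                # url -> set of words indexed for it

    def remove(self, url):
        """Undo the effect of the previously indexed content of url."""
        for word in self.doc_words.pop(url, set()):
            del self.postings[word][url]
            if not self.postings[word]:
                del self.postings[word]

    def add(self, url, text):
        """Index the current content of url, replacing any older version."""
        self.remove(url)
        words = re.findall(r"[a-z0-9]+", text.lower())
        for word in words:
            self.postings[word][url] = self.postings[word].get(url, 0) + 1
        self.doc_words[url] = set(words)


index = InvertedIndex()
index.add("u1", "Search engine search")
index.add("u2", "Engine room")
index.add("u1", "Crawler")  # u1 was re-crawled with new content
print(dict(index.postings))
```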
Search
• Process:
– Read query from user
– Pass to plugin Handler
– Search for each word in query
– Combine and rank the result
– Display the final result
– Uses the PageRank algorithm
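A compact sketch of the ranking part, under stated assumptions: PageRank is computed by plain power iteration with a damping factor of 0.85, per-word results are combined by intersection (AND semantics), and the link graph and index are toy data. None of these choices is confirmed by the slides.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping page -> list of outlinks."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            # A page without outlinks spreads its rank evenly over all pages.
            targets = links.get(page) or pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


def search(query, index, rank):
    """Pages containing every query word, best PageRank first."""
    words = query.lower().split()
    if not words:
        return []
    hits = set(index.get(words[0], ()))
    for word in words[1:]:
        hits &= set(index.get(word, ()))
    return sorted(hits, key=lambda url: rank.get(url, 0.0), reverse=True)


# Toy link graph and word index, invented for illustration.
links = {"a": ["b"], "c": ["b"], "b": ["a"]}
index = {"search": {"a", "b", "c"}, "engine": {"a", "b"}}
rank = pagerank(links)
print(search("search engine", index, rank))
```

Page "b" is linked from both other pages, so it ends up with the highest rank and is listed first.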
Autocompletion
• Process:
– Get current incomplete query
– Search for query in cache
– Complete the query using language models
– Return the various alternatives
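The steps above could look roughly like the sketch below. It assumes the cache is a frequency table of past queries and the language model is a simple bigram (next-word) model built from those same queries; the slides do not say which models the project uses, and the class Autocompleter is hypothetical.

```python
from collections import Counter, defaultdict


class Autocompleter:
    def __init__(self, past_queries):
        self.cache = Counter(past_queries)      # query -> times seen
        self.bigrams = defaultdict(Counter)     # word -> counts of next words
        for query in past_queries:
            words = query.split()
            for first, second in zip(words, words[1:]):
                self.bigrams[first][second] += 1

    def complete(self, partial, limit=3):
        partial = partial.lower()
        # 1. Cached queries that start with the typed text, most frequent first.
        cached = [q for q, _ in self.cache.most_common()
                  if q.startswith(partial) and q != partial]
        if cached:
            return cached[:limit]
        # 2. Otherwise extend the last typed word with its likeliest successors.
        words = partial.split()
        if not words:
            return []
        followers = self.bigrams.get(words[-1], Counter())
        return [partial.rstrip() + " " + word
                for word, _ in followers.most_common(limit)]


ac = Autocompleter([
    "weather in kathmandu",
    "weather in kathmandu",
    "weather in pokhara",
    "hotels in kathmandu",
])
print(ac.complete("weather"))
print(ac.complete("flights in"))
```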
Further recommendations
• Image search/classification
• Video search
• Knowledge extraction
• Improved NLP

Search Engine Project Presentation
