Enterprise Search Using Apache Solr
 


Apache Solr is an enterprise search engine. It indexes large numbers of documents of any size and provides robust search techniques. This presentation gives a brief introduction to it.

Usage Rights

© All Rights Reserved


    Enterprise Search Using Apache Solr: Presentation Transcript

    • Enterprise Search using Apache Solr Sagar Chaturvedi
    • Agenda
      • What is Solr?
      • Features of Solr
      • High-level architecture of Solr
      • How do things work in Solr?
      • What is fuzzy search?
      • How does Solr perform?
    • Solr - Introduction
      • An open source enterprise search platform from Apache.
      • A full-text search server that runs in web (servlet) containers such as Tomcat or Jetty.
      • Indexes input files and provides various search facilities over them.
      • Uses the Lucene Java search library at its core.
      Type of tool: Search and index
      API documentation: http://lucene.apache.org/solr/4_3_0/
      License type: Apache License 2.0
      Last release date: 6 May 2013 (4.3.0)
      Release frequency: approximately monthly
      Mailing list/community support: http://lucene.apache.org/solr/discussion.html
      Major applications/users: Instagram, AOL, the Guardian, Shopper.com, SourceForge, eBay
      Stability: stable (version 4.3.0)
    • Solr - Features
      • Faceted search.
      • Accepts input as XML, CSV, or JSON files, and from databases.
      • Using Apache Tika, supports more than 25 input formats, such as PDF and MS Word.
      • Output formats: JSON, XML, PHP, Ruby, Python, and a custom Java binary format.
      • Scales out in the form of SolrCloud.
      • Supports 32 major languages, including Chinese, Korean, Japanese, and Arabic.
      • Boosting of results.
      • Extensible plugin architecture.
      • HTML administration interface.
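To make the faceting and output-format features concrete, here is a minimal sketch that builds a search URL for Solr's `/select` handler. It only constructs the URL and sends nothing. The host, port, core name (`collection1`), query, and the `category` facet field are assumptions chosen for illustration and do not come from the slides.

```python
from urllib.parse import urlencode


def build_select_url(base_url, core, query, facet_field=None, wt="json", rows=10):
    """Build a URL for Solr's /select search handler (no request is sent)."""
    # q = the query, wt = the response writer (output format), rows = page size
    params = [("q", query), ("wt", wt), ("rows", rows)]
    if facet_field is not None:
        # Faceting is switched on per request and needs at least one field
        params.append(("facet", "true"))
        params.append(("facet.field", facet_field))
    return f"{base_url}/{core}/select?{urlencode(params)}"


# Hypothetical server, core, and field names
url = build_select_url(
    "http://localhost:8983/solr", "collection1", "title:solr", facet_field="category"
)
print(url)
```

Changing `wt` to `xml`, `python`, `ruby`, or `php` selects one of the other output formats listed on the slide.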
    • Solr - Architecture
    • Solr – Query Processing
    • Solr - Indexing
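The diagram for this slide is not part of the transcript. As a rough illustration of what an indexing request can contain, the sketch below builds a message in Solr's XML update format (`<add>`, `<doc>`, `<field>`). The field names `id` and `title` are assumptions; a real index would use the fields defined in its own schema.

```python
import xml.etree.ElementTree as ET


def build_add_message(docs):
    """Serialize a list of dicts into a Solr XML <add> update message."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # Each field becomes <field name="...">value</field>
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


# Hypothetical document with assumed field names
message = build_add_message(
    [{"id": "1", "title": "Enterprise Search Using Apache Solr"}]
)
print(message)
```

Such a message would typically be posted to the core's `/update` handler and followed by a commit so the documents become searchable.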
    • Solr – Fuzzy Search
      • Fuzzy search is the technique of finding strings that match a pattern approximately rather than exactly.
      • It finds documents that contain words spelled similarly to the search term. For example, a search for "appple" also returns documents containing the term "apple".
      • It is used in spell checking, spam filtering, and OCR scanning.
      • Solr's standard query parser supports fuzzy searches based on the Levenshtein distance (edit distance) algorithm.
      • The closeness of a match is measured by the edit distance between the two words: the number of single-character edits required to convert one word into the other.
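The edit distance described above can be sketched in a few lines. This is a plain dynamic-programming version of Levenshtein distance meant to illustrate the metric; it is not how Solr or Lucene compute fuzzy matches internally. The vocabulary list and the `fuzzy_matches` helper are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    # prev[j] holds the distance between the processed prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep if equal)
            ))
        prev = curr
    return prev[-1]


def fuzzy_matches(term, vocabulary, max_edits=2):
    """Return the vocabulary words within max_edits of term (hypothetical helper)."""
    return [w for w in vocabulary if edit_distance(term, w) <= max_edits]


matches = fuzzy_matches("appple", ["apple", "apply", "maple", "banana"], max_edits=1)
print(matches)
```

In the standard query parser, a fuzzy term is written with a trailing tilde and an optional maximum edit distance, for example `appple~1`.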
    • Solr - Performance
      • Indexing: the time taken to index depends on:
        • Size and number of fields in each document
        • Number of fields to be indexed
        • Type of fields
        • Machine capabilities (CPU, memory)
        With documents of about 1 KB each, indexing 100 million documents should take a few hours.
      • Query: with 100 million documents spread over 10 Solr nodes (10 million documents each), the average search response time is about 1 second.
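The figures on this slide imply some simple arithmetic, worked through below. The slide only says "a few hours", so the 3-hour value is an assumption used to get a rough throughput number, and sizes use decimal units (1 GB = 1,000,000 KB).

```python
def implied_rate(num_docs, hours):
    """Documents per second needed to index num_docs in the given time."""
    return num_docs / (hours * 3600)


num_docs = 100_000_000  # from the slide
doc_size_kb = 1         # from the slide (approximate)
nodes = 10              # from the slide

total_gb = num_docs * doc_size_kb / 1_000_000  # raw input size in GB
docs_per_node = num_docs // nodes              # documents held by each node
rate_3h = implied_rate(num_docs, hours=3)      # 3 hours is an assumed value

print(f"raw input: {total_gb:.0f} GB")
print(f"per node:  {docs_per_node:,} documents")
print(f"indexing rate for 3 hours: {rate_3h:,.0f} docs/s")
```

Under these assumptions the raw input is about 100 GB and the cluster would need to sustain roughly 9,000 documents per second in total.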
    • Thank You !!