Develop open source search engine

  1. DEVELOP OPEN SOURCE SEARCH ENGINE, 26th Feb 2012, Ritesh Ambastha, CEO, iWillStudy.com
  2. Open source search engines: Lucene, Datapark, Sphinx Search, Zettair, YaCy, Xapian, SWISH-E, Seeks, Recoll, OpenFTS, Nutch, Namazu
  3. Platform ideas! (Credits: http://zooie.wordpress.com)
  4. Comparison (Credits: http://zooie.wordpress.com)
  5. Comparison (Credits: http://zooie.wordpress.com)
  6. We are going to talk about Sphinx and Apache Solr.
  7. Sphinx: Sphinx is an open source full-text search server. It's written in C++ and works on Linux (RedHat, Ubuntu, etc.), Windows, MacOS, Solaris, FreeBSD, and a few other systems. Sphinx lets you batch index and search data stored in an SQL database, NoSQL storage, or plain files, quickly and easily.
  8. Sphinx: text processing features. Searching via SphinxAPI is as simple as 3 lines of code, and querying via SphinxQL is even simpler. Sphinx clusters scale up to billions of documents and tens of millions of search queries per day, powering top websites such as Craigslist, DailyMotion, NetLog, etc.
  9. Performance and scalability. Indexing performance: Sphinx indexes up to 10-15 MB of text per second per CPU core. Searching performance: searching through the 1,000,000-document, 1.2 GB text collection that the Sphinx team uses for everyday development and testing runs at 500+ queries/sec on a 2-core desktop machine with 2 GB of RAM. Scalability: the biggest known Sphinx cluster indexes almost 5 billion documents, resulting in over 6 TB of data. The busiest known one is, unsurprisingly, Craigslist, a top-10 website in the US that serves 50+ million search …
  10. Key features: batch and real-time full-text indexes; non-text attributes support; SQL database indexing; non-SQL storage indexing; easy application integration; advanced full-text searching syntax; rich database-like querying features; better relevance ranking; flexible text processing; distributed searching.
  11. http://lucene.apache.org/solr/
  12. Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project.
  13. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search.
  14. Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Tomcat.
  15. Solr features: advanced full-text search capabilities; optimized for high-volume web traffic; standards-based open interfaces (XML, JSON and HTTP); comprehensive HTML administration interfaces; server statistics exposed over JMX for monitoring; scalability through efficient replication to other Solr search servers; flexible and adaptable with XML configuration; extensible plugin architecture.
  16. What is it all about?
  17. Solr is based on Lucene.
  18. More about Lucene
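
Slide 8 says querying via SphinxQL is even simpler than the API. SphinxQL speaks the MySQL wire protocol, so an ordinary MySQL client library can send it queries. The sketch below is a minimal illustration under stated assumptions: the index name `articles` is hypothetical, and the DB-API cursor is passed in so the code does not depend on one particular driver.

```python
from typing import Any, List, Tuple

def build_sphinxql(index: str) -> str:
    """Return a parameterized SphinxQL full-text query against `index`.

    The index name is interpolated directly, so it is restricted to a safe
    identifier; the search text and limit stay as %s placeholders that the
    DB-API driver fills in and escapes.
    """
    if not index.replace("_", "").isalnum():
        raise ValueError(f"unsafe index name: {index!r}")
    return f"SELECT id FROM {index} WHERE MATCH(%s) LIMIT %s"

def sphinx_search(cursor: Any, index: str, text: str, limit: int = 10) -> List[Tuple]:
    """Run a full-text query through any DB-API 2.0 cursor connected to searchd."""
    cursor.execute(build_sphinxql(index), (text, limit))
    return cursor.fetchall()
```

With a MySQL driver such as PyMySQL this might be called as `sphinx_search(pymysql.connect(host="127.0.0.1", port=9306).cursor(), "articles", "open source")`, assuming searchd has a SphinxQL (`mysql41`) listener on port 9306 as in Sphinx's sample configuration; treat the connection details as an assumption to check against your own `sphinx.conf`.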
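
The figures on slide 9 can be turned into quick back-of-the-envelope estimates. This sketch uses only the numbers quoted on the slide and treats 1.2 GB as 1,200 MB (decimal units, an assumption). Because the indexing rate is an "up to" figure, the computed time is a best case for a single core.

```python
# Back-of-the-envelope estimates derived only from the figures quoted on slide 9.
COLLECTION_MB = 1200          # the 1.2 GB test collection, taking 1 GB = 1000 MB
INDEX_RATE_MB_S = (10, 15)    # quoted indexing speed range per CPU core
QPS = 500                     # quoted search throughput on a 2-core desktop
SECONDS_PER_DAY = 24 * 60 * 60

def indexing_seconds(size_mb: float, rate_mb_s: float) -> float:
    """Best-case single-core indexing time for `size_mb` of text."""
    return size_mb / rate_mb_s

def queries_per_day(qps: float) -> float:
    """Queries served in a day at a sustained rate of `qps`."""
    return qps * SECONDS_PER_DAY

if __name__ == "__main__":
    slow, fast = (indexing_seconds(COLLECTION_MB, r) for r in INDEX_RATE_MB_S)
    print(f"single core: {fast:.0f}-{slow:.0f} s")      # single core: 80-120 s
    print(f"{queries_per_day(QPS):,.0f} queries/day")    # 43,200,000 queries/day
```

So the whole test collection indexes in roughly one and a half to two minutes on one core, and 500 queries/sec sustained around the clock is about 43 million queries a day, which is in line with the "tens of millions of search queries per day" mentioned on slide 8.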
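
Slide 10 lists distributed searching as a key feature. In a distributed setup, each shard returns its own best hits and the coordinating node merges them into one ranked list. The sketch below shows only that merge step as a generic technique; it is not Sphinx's internal implementation, and it assumes relevance scores from different shards are directly comparable (in practice per-shard term statistics can make them drift).

```python
import heapq
from typing import Iterable, List, Tuple

# A hit is (relevance score, document id); higher scores rank first.
Hit = Tuple[float, int]

def merge_shard_results(shard_hits: Iterable[List[Hit]], k: int) -> List[Hit]:
    """Combine per-shard hit lists into a single global top-k list.

    Every shard's hits are considered, so the global top k is correct as long
    as each shard returned at least its own top k.
    """
    all_hits = (hit for hits in shard_hits for hit in hits)
    return heapq.nlargest(k, all_hits)  # sorted by score, descending
```

For example, merging `[(0.9, 1), (0.5, 2)]` from one shard with `[(0.8, 10), (0.7, 11)]` from another and asking for the top 3 gives `[(0.9, 1), (0.8, 10), (0.7, 11)]`.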
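
Slides 13 and 15 mention hit highlighting, faceted search and an HTTP interface. Both features are switched on through ordinary query-string parameters on Solr's `/select` handler (`q`, `wt`, `rows`, `hl`, `hl.fl`, `facet`, `facet.field`). A minimal URL builder, assuming the example server at `http://localhost:8983/solr` and hypothetical field names `title`, `category` and `author`:

```python
from urllib.parse import urlencode

def solr_select_url(base: str, query: str, rows: int = 10,
                    highlight_field: str = "", facet_fields: tuple = ()) -> str:
    """Build a Solr /select URL that asks for JSON results.

    A list of pairs is used instead of a dict because `facet.field`
    may legitimately appear more than once.
    """
    params = [("q", query), ("wt", "json"), ("rows", str(rows))]
    if highlight_field:
        params += [("hl", "true"), ("hl.fl", highlight_field)]
    if facet_fields:
        params.append(("facet", "true"))
        params += [("facet.field", f) for f in facet_fields]
    return f"{base.rstrip('/')}/select?{urlencode(params)}"

url = solr_select_url("http://localhost:8983/solr/", "open source",
                      highlight_field="title", facet_fields=("category", "author"))
print(url)
```

On a multi-core Solr installation the base URL would include the core name (for example `.../solr/<core>`), so adjust `base` to match your deployment.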
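
Slide 15 highlights Solr's open XML, JSON and HTTP interfaces. With `wt=json`, results come back under `response` (`numFound`, `docs`), and facet counts under `facet_counts.facet_fields`, where Solr's default flat layout alternates term and count. A small parser sketch, assuming that default layout:

```python
import json
from typing import Any, Dict, List, Tuple

def parse_solr_json(body: str) -> Tuple[int, List[Dict[str, Any]], Dict[str, List[Tuple[str, int]]]]:
    """Return (numFound, docs, facets) from a Solr JSON response body."""
    data = json.loads(body)
    response = data["response"]
    facets: Dict[str, List[Tuple[str, int]]] = {}
    for field, flat in data.get("facet_counts", {}).get("facet_fields", {}).items():
        # Flat layout: [term1, count1, term2, count2, ...] -> [(term1, count1), ...]
        facets[field] = list(zip(flat[0::2], flat[1::2]))
    return response["numFound"], response["docs"], facets
```

Over plain HTTP, the body could be fetched with `urllib.request.urlopen(url).read().decode("utf-8")` for a `/select` URL that includes `wt=json`, then passed to `parse_solr_json`.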
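
Slides 17 and 18 point to Lucene as the engine underneath Solr. At its core, Lucene builds an inverted index: a map from each term to the documents that contain it, so a query only touches the documents listed for its terms. The toy sketch below illustrates that idea only. It is not Lucene's API or data layout, and the regex tokenizer and AND-only matching are simplifying assumptions.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

class InvertedIndex:
    """Toy term -> document-id index (illustration only, not Lucene's structures)."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[int]] = defaultdict(set)

    @staticmethod
    def tokenize(text: str) -> List[str]:
        # Lower-case and split on runs of non-word characters, dropping empties.
        return [t for t in re.split(r"\W+", text.lower()) if t]

    def add(self, doc_id: int, text: str) -> None:
        """Record `doc_id` in the posting set of every term in `text`."""
        for term in self.tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query: str) -> List[int]:
        """Return sorted ids of documents containing every query term (boolean AND)."""
        terms = self.tokenize(query)
        if not terms:
            return []
        # .get avoids inserting empty posting sets for unknown terms.
        matches = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(matches)

idx = InvertedIndex()
idx.add(1, "Solr is based on Lucene")
idx.add(2, "Sphinx is written in C++")
print(idx.search("is"), idx.search("lucene solr"))
```

Real engines add much more on top of this structure (term positions for phrase queries, relevance scoring, compressed on-disk segments), but the term-to-documents lookup is the part that makes full-text search fast.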
