Indexing with Solr search server and Hadoop framework
Why combine Hadoop and Solr, two cutting-edge open source technologies.

Published in: Technology

  1. Indexing with Solr search server and Hadoop framework
  2. Indexing
     • Indexing collects, parses, and stores data to facilitate fast and accurate information retrieval.
     • The purpose of storing an index is to optimize speed and performance in finding documents.
     • Without an index, the search engine would have to scan every document for each query.
     • The trade-off: the index needs additional storage and makes updates considerably slower, in exchange for the time saved during information retrieval.
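The trade-off described above can be illustrated with a minimal inverted index. This is a sketch only, not how Solr stores its index internally; the function names and the sample documents are hypothetical.

```python
from collections import defaultdict
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents containing every query token."""
    tokens = tokenize(query)
    if not tokens:
        return []
    postings = [index.get(token, set()) for token in tokens]
    return sorted(set.intersection(*postings))


# Hypothetical sample documents, for illustration only.
docs = {
    1: "Hadoop stores large data sets in HDFS",
    2: "Solr provides full-text search",
    3: "Solr can index data stored in Hadoop",
}
index = build_index(docs)
```

Building the index costs extra memory and must be repeated or updated when documents change, but `search(index, "solr hadoop")` then only looks up two posting sets instead of scanning every document.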
  3. Why Hadoop + Solr?
     • A data set can outgrow the storage capacity of a single physical machine.
     • Distributed filesystems are more complex than regular disk filesystems.
     • One of the biggest challenges is making the filesystem tolerate node failure without suffering data loss.
     • Hadoop comes with a distributed filesystem called HDFS.
     • HDFS is built around the idea that the most efficient data processing pattern is write-once, read-many-times.
     • Hadoop doesn't require expensive, highly reliable hardware to run on.
  4. Why Hadoop + Solr? (continued)
     • A program written in other frameworks may require large amounts of refactoring when scaling from ten to one hundred or one thousand machines.
     • This can mean rewriting the program several times.
     • Hadoop is specifically designed to have a very flat scalability curve.
     • In Hadoop, very little work, if any, is required for that same program to run on a much larger amount of hardware.
     • The Hadoop platform manages the data and hardware resources and provides dependable performance growth proportionate to the number of machines available.
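One reason the same program can run on more hardware is Hadoop's MapReduce model, which the slides do not name explicitly: the program only defines a map step and a reduce step, and the framework decides where they run. The following is an in-process sketch of that model, assuming a simple word count; it does not use the Hadoop API, and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in-process; on a cluster they run on many nodes."""
    pairs = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Neither `map_phase` nor `reduce_phase` refers to a machine count, which is the property the slide describes: scaling out changes how the framework distributes the calls, not the program logic.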
  5. Why Hadoop + Solr? (continued)
     • HDFS is highly fault-tolerant.
     • It is suitable for applications with large data sets.
     • An HTTP browser can be used to browse the files of an HDFS instance.
     • Detecting faults and recovering from them quickly and automatically is a core architectural goal of HDFS.
  6. Solr
     • Advanced full-text search capabilities
     • Optimized for high-volume web traffic
     • Standards-based open interfaces: XML, JSON, and HTTP
     • Comprehensive HTML administration interfaces
     • Linearly scalable, with automatic index replication, failover, and recovery
     • Near real-time indexing
     • Flexible and adaptable, with XML configuration
     • Extensible plugin architecture
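The HTTP and JSON interfaces listed above mean a query is just a URL. A minimal sketch follows, assuming a Solr server on the default port 8983 and a hypothetical collection named `articles` with a `title` field; the helper name is also hypothetical.

```python
from urllib.parse import urlencode


def solr_select_url(base_url, collection, query, rows=10):
    """Build a URL for Solr's /select handler that asks for a JSON response."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base_url}/solr/{collection}/select?{params}"


# Hypothetical host and collection name, for illustration only.
url = solr_select_url("http://localhost:8983", "articles", "title:hadoop")
```

With a server actually running at that address, the URL could be fetched with `urllib.request.urlopen(url)` or pasted into a browser; the sketch stops at building it so that it runs without a Solr instance.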
  7. SolrCloud
     • New in Solr 4.0
     • Easier scaling
     • Centralized configuration
     • Fault-tolerant indexing and querying
     • Uses Apache ZooKeeper as a registry
  8. SolrCloud (diagram; labels: master, three slaves, three Solr servers, ZooKeeper)
  9. Technology and Platform
     • Technology: Hadoop, Solr
     • Front end: Solr
     • Back end: Hadoop framework, Solr search server
  10. Thank you
