
The Search Is Over: Integrating Solr and Hadoop in the Same Cluster to Simplify Big Data Analytics


Presented by M.C. Srivas, MapR. See the conference video: http://www.lucidimagination.com/devzone/events/conferences/lucene-revolution-2012

This session addresses the biggest issue facing Big Data: search, discovery, and analytics need to be integrated. Creating and maintaining separate Solr and Hadoop clusters is time-consuming and error-prone, and the two are difficult to keep in sync, yet most Hadoop installations do not integrate with Solr within the same cluster. Find out how to easily integrate these capabilities into a single cluster. The session will also touch on some of the technical aspects of Big Data search, including how to protect against the silent index corruption that permeates large distributed clusters, overcome the shard distribution problem by leveraging Hadoop to ensure accurate distributed search results, and provide real-time indexing for distributed search, including support for streaming data capture. Srivas will also share relevant experiences from his days at Google, where he ran one of the major search infrastructure teams and where GFS, BigTable, and MapReduce were used extensively.



  • David Jeske (davidjeske), 8 months ago: On "11. Sharded text indexing": it looks like only shard_count workers are doing the text-index inversion. Why would it be done this way? The MapReduce job should be fed all documents. The Map stage maps each term in a document to an output line like (file:shardid)(key:termid)(key:docid). The reducer then compresses those lines into one index per shard. This puts the greatest number of workers on the expensive part (the index inversion).
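
The map-then-reduce inversion the comment describes can be sketched in plain Python. This is a minimal single-process illustration of the idea, not the presenter's implementation and not Hadoop code. The names `shard_for`, `map_document`, `reduce_postings`, and `build_index` are hypothetical, as are the assumptions that documents are assigned to shards by a CRC32 hash of their ID, that terms are lowercased whitespace tokens, and that the term string itself stands in for a term ID.

```python
import zlib
from collections import defaultdict


def shard_for(doc_id, shard_count):
    """Assumption: a document's shard is a stable hash of its ID."""
    return zlib.crc32(doc_id.encode("utf-8")) % shard_count


def map_document(doc_id, text, shard_count):
    """Map stage: emit one ((shard, term), doc_id) pair per distinct term.

    Every document is mapped independently, so the number of map
    workers is not limited by shard_count.
    """
    shard = shard_for(doc_id, shard_count)
    for term in set(text.lower().split()):
        yield (shard, term), doc_id


def reduce_postings(pairs):
    """Reduce stage: group pairs by (shard, term) into sorted posting lists."""
    grouped = defaultdict(set)
    for key, doc_id in pairs:
        grouped[key].add(doc_id)

    index = defaultdict(dict)
    for (shard, term), doc_ids in grouped.items():
        index[shard][term] = sorted(doc_ids)
    return dict(index)


def build_index(docs, shard_count):
    """Run the map stage over every document, then reduce per shard."""
    pairs = [
        pair
        for doc_id, text in docs.items()
        for pair in map_document(doc_id, text, shard_count)
    ]
    return reduce_postings(pairs)


if __name__ == "__main__":
    docs = {"d1": "Solr Hadoop", "d2": "Hadoop MapReduce"}
    for shard, postings in sorted(build_index(docs, 2).items()):
        print(shard, postings)
```

In a real MapReduce job the grouping by (shard, term) would be done by the framework's shuffle rather than by an in-memory dictionary, and the reducer would write a compressed index file per shard.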

Presentation Transcript