Chicago Solr Meetup - June 10th: Exploring Hadoop with Search
  • We’ve all seen this.
    You see search showing up there, but what does that really mean?
    --Is it push or is it pull?
    Well, we have multiple options.
  • --Directly from ingestion, you can send to Solr with the respective serializer classes.
    --HBase is interesting. It’s the table-style (NoSQL) store on top of HDFS.
    --Notice that all of these are pushes. I haven’t included pull yet, but pull options do exist.
    --One thing to note, however, is that HBase does have a web access layer where you can make RESTful calls to grab data.
  • Complementary
    = Intelligence system of large textual data sets
  • --HBase is the table-style (NoSQL) store on HDFS
    --It is distributed, with a Master and RegionServers
    --There is an open source project called the HBase Indexer that creates a façade

    Most importantly, you can store data in HDFS and search it with Solr without storing it in Solr, taking advantage of the strengths of both.
  • This is what the architecture of this setup looks like.
    --Our data source is Twitter.
    --Flume is serializing it and writing directly to HBase
    --HBase is set up with a replication façade that, behind the scenes, is an indexer to Solr
    --Then we are using SiLK (i.e. Banana) to visualize the data that comes through
  • You can apply this type of architecture to many use cases …
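The notes above mention that HBase exposes a web access layer for RESTful lookups, so a document ID returned by Solr can be used to fetch the full record from HBase. A minimal sketch of that lookup in Python follows. The base URL, the table name `tweets`, and the column name `tweet:text` are assumptions for illustration and do not come from the talk. The sketch relies on the HBase REST server returning JSON in which row keys, column names, and cell values are base64-encoded.

```python
import base64
import json
from urllib.parse import quote
from urllib.request import Request, urlopen


def row_url(base_url, table, row_id):
    """Build the REST URL for one row. base_url and table are assumed names."""
    return f"{base_url.rstrip('/')}/{quote(table, safe='')}/{quote(row_id, safe='')}"


def decode_row(payload):
    """Convert HBase REST JSON (base64 keys, columns, values) into plain dicts."""
    def b64(text):
        return base64.b64decode(text).decode("utf-8")

    rows = {}
    for row in payload.get("Row", []):
        # Each cell carries its column name in "column" and its value in "$".
        cells = {b64(cell["column"]): b64(cell["$"]) for cell in row.get("Cell", [])}
        rows[b64(row["key"])] = cells
    return rows


def fetch_row(base_url, table, row_id):
    """Fetch one row by the ID that Solr returned, asking the server for JSON."""
    request = Request(row_url(base_url, table, row_id),
                      headers={"Accept": "application/json"})
    with urlopen(request) as response:
        return decode_row(json.load(response))
```

With a REST server running, a call such as `fetch_row("http://localhost:8080", "tweets", doc_id)` would return the stored columns for the Solr hit; the host, port, and table here are placeholders.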

Presentation Transcript

  • Exploring Hadoop with Search
    Pritesh Patel, Principal Architect, Search and Big Data Analytics @ Avalon Consulting, LLC
  • Hadoop Ecosystem
  • Possible Integration Points
  • Why Search + Big Data?
    What Hadoop is good at      | What Search is good at
    Distributed file storage    | Free text retrieval
    Store large data sets       | Index large data sets
    Distributed processing      | Textual analysis
                                | Filtering and sorting
    = Intelligence Discovery System of large textual data sets
  • How we Integrated Search and Big Data
      • HBase Replication Facade
      • Take advantage of the results of analytical Pig and Hive jobs in Hadoop to make retrieval more intelligent
      • Done with built-in replication, and it scales
      • Fast access since it is in memory
      • Push architecture, so it’s near real time
      • CRUD
      • Store in HDFS and search in LW/Solr
      • Gives a reference to the source when integrated this way
      • HBase has a RESTful API to retrieve data given the ID that Solr would have after replication/indexing
  • Our Demo Architecture
    Diagram by Varun Rao @ Avalon Consulting, LLC
  • A Use Case of this Architecture
      • Monitor tweets with the words “Hadoop”, “Lucidworks”, and “Big Data”
      • Automatically extract URLs mentioned when talking about these terms
      • In near real time, visualize which URLs are mentioned with these terms
      • Discover URLs that are becoming the most popular when mentioned with the topics “Big Data”, “Lucidworks”, and “Hadoop”; those might be URLs you want to read
  • Demo
      • Anyone want to send a tweet? Just use one or more of the words “Hadoop”, “Lucidworks”, “Big Data”
      • Add any URL to the tweet that you’d like to share. Try: www.avalonconsult.com or www.lucidworks.com
  • So much potential
      • You can apply this to so many things.
      • Do intelligent entity extraction to discover topics with Solr’s UIMA integration
      • Do similar analysis of popular mentions and people for the topics of your choice
      • Endless …
      • Any questions?
  • Team
      • Client Implementation done by Kevin Risden @ Avalon (risdenk@avalonconsult.com)
      • Demo Architecture Team
          • Varun Rao @ Avalon (raov@avalonconsult.com)
          • Pritesh Patel @ Avalon (patelp@avalonconsult.com)
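
The tweet-monitoring use case described in the slides (track the three terms, extract URLs, surface the most mentioned ones) can be illustrated with a small in-memory sketch in Python. This is not the demo's implementation, which runs through Flume, HBase, and Solr; the function names, the URL regular expression, and the sample tweets are assumptions made for illustration only.

```python
import re
from collections import Counter

# Terms named on the use-case slide, lowercased for case-insensitive matching.
TRACKED_TERMS = ("hadoop", "lucidworks", "big data")

# Simplified pattern (an assumption): anything starting with http(s):// or www.
URL_PATTERN = re.compile(r"(?:https?://|www\.)[^\s]+", re.IGNORECASE)


def mentions_tracked_term(text):
    """Return True if the tweet text contains at least one tracked term."""
    lowered = text.lower()
    return any(term in lowered for term in TRACKED_TERMS)


def extract_urls(text):
    """Return URLs found in the text, with trailing punctuation stripped."""
    return [url.rstrip(".,;:!?)") for url in URL_PATTERN.findall(text)]


def popular_urls(tweets, top_n=5):
    """Count URLs across tweets that mention a tracked term, most common first."""
    counts = Counter()
    for text in tweets:
        if mentions_tracked_term(text):
            counts.update(extract_urls(text))
    return counts.most_common(top_n)


if __name__ == "__main__":
    sample = [
        "Loving Hadoop, see www.avalonconsult.com!",
        "Big Data talk at http://www.lucidworks.com",
        "hadoop again: www.avalonconsult.com",
    ]
    print(popular_urls(sample))
```

In the architecture from the talk, the counting step would more naturally be a facet over an indexed URL field in Solr, visualized in Banana, rather than a `Counter` in application code.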