SolrCloud on Hadoop

An overview of building and serving Lucene indexes on a Hadoop cluster with Solr for text and parametric searching, as presented at Cleveland Hadoop User Group on 13 January 2014.

Published in: Technology, Education

  1. SolrCloud on Hadoop • Cleveland Big Data and Hadoop User Group, January 2014 • Alex Moundalexis • @technmsg
  2. Disclaimer • Technologies, not products • Cloudera builds software • most donated to Apache • some closed-source • I will likely mention “Cloudera Something” • Cloudera “products” I reference are open source • Apache Licensed • Source code is on GitHub • https://github.com/cloudera
  3. What This Talk Isn’t About • Deploying • Puppet, Chef, Ansible, homegrown scripts, intern labor • Sizing & Tuning • Depends heavily on data and workload • Coding • Unless you count XML or CSV • Algorithms
  4. The Apache Hadoop Ecosystem • Quick and dirty, more time for use cases.
  5. Why “Ecosystem?” • In the beginning, just Hadoop • HDFS • MapReduce • Today, dozens of interrelated components • I/O • Processing • Specialty Applications • Configuration • Workflow
  6. Partial Ecosystem • [Diagram labels: Hadoop; external system; RDBMS / DWH; web server; device logs; API access; log collection; DB table import; batch processing; machine learning; DB table export; user; BI tool + JDBC/ODBC; Search; SQL]
  7. HDFS • Distributed, highly fault-tolerant filesystem • Optimized for large streaming access to data • Based on Google File System • http://research.google.com/archive/gfs.html
  8. Lots of Commodity Machines • Image: Yahoo! Hadoop cluster [OSCON ’07]
  9. MapReduce (MR) • Programming paradigm • Batch oriented, not realtime • Works well with distributed computing • Lots of Java, but other languages supported • Based on Google’s paper • http://research.google.com/archive/mapreduce.html
  10. Under the Covers
  11. You specify map() and reduce() functions.
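
To make the last slide concrete, here is a minimal single-process sketch of the map() and reduce() idea, using word counting as the example. It is not Hadoop code and does not use any Hadoop API: the names `map_fn`, `reduce_fn`, and `run_job` are hypothetical, and the in-memory grouping step stands in for the shuffle that a real cluster performs across machines.

```python
from collections import defaultdict


def map_fn(_key, line):
    """Hypothetical mapper: emit a (word, 1) pair for every word in a line."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Hypothetical reducer: sum all counts collected for one word."""
    yield word, sum(counts)


def run_job(records, mapper, reducer):
    """Toy driver (an assumption, not a Hadoop API): map, group by key, reduce."""
    # Map phase: apply the mapper to each input record and group the
    # emitted values by key. On a cluster this grouping is the shuffle.
    grouped = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            grouped[out_key].append(out_value)

    # Reduce phase: call the reducer once per key, in sorted key order.
    results = []
    for key in sorted(grouped):
        results.extend(reducer(key, grouped[key]))
    return results


# Input records are (line number, line text) pairs.
lines = ["the quick brown fox", "the lazy dog", "The fox"]
word_counts = dict(run_job(enumerate(lines), map_fn, reduce_fn))
```

The point of the split is that the author writes only the two small functions; everything in `run_job` is the part a framework would take over and distribute.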
