Search in the Apache Hadoop Ecosystem: Thoughts from the Field
This presentation describes the Hadoop ecosystem and gives examples of how these open source tools are combined to solve specific and sometimes very complex problems. Drawing on case studies from the field, Mr. Moundalexis demonstrates that rigid, one-size-fits-all traditional systems fall short, and that combinations of tools in the Apache Hadoop ecosystem provide a versatile and flexible platform for integrating, finding, and analyzing information.

Published in: Technology, Education

  1. Search in the Apache Hadoop Ecosystem: Thoughts from the Field. Open Source Search Conference, November 2013. Alex Moundalexis, @technmsg
  2. Thoughts of a Former SA
  3. Thoughts of a Former SA Field Guy
  4. Disclaimer
     • Technologies, not products
     • Cloudera builds software
     • most donated to Apache
     • some closed-source
     • I will likely mention “Cloudera Something”
     • Cloudera “products” I reference are open source
     • Apache Licensed
     • Source code is on GitHub
     • https://github.com/cloudera
  5. What This Talk Isn’t About
     • Deploying
     • Puppet, Chef, Ansible, homegrown scripts, intern labor
     • Sizing & Tuning
     • Depends heavily on data and workload
     • Coding
     • Algorithms
  6. “The answer to most Hadoop questions is it depends.”
  7. The Apache Hadoop Ecosystem: quick and dirty, more time for use cases.
  8. Why “Ecosystem?”
     • In the beginning, just Hadoop
     • HDFS
     • MapReduce
     • Today, dozens of interrelated components
     • I/O
     • Processing
     • Specialty Applications
     • Configuration
     • Workflow
  9. Partial Ecosystem [diagram; labels: Hadoop; external system; RDBMS / DWH; web server; device logs; API access; log collection; DB table import; batch processing; machine learning; external system; API access; user; RDBMS / DWH; DB table export; BI tool + JDBC/ODBC; Search; SQL]
  10. HDFS
      • Distributed, highly fault-tolerant filesystem
      • Optimized for large streaming access to data
      • Based on Google File System
      • http://research.google.com/archive/gfs.html
  11. Lots of Commodity Machines [Image: Yahoo! Hadoop cluster, OSCON ’07]
  12. MapReduce (MR)
      • Programming paradigm
      • Batch oriented, not realtime
      • Works well with distributed computing
      • Lots of Java, but other languages supported
      • Based on Google’s paper
      • http://research.google.com/archive/mapreduce.html
  13. Under the Covers
  14. You specify map() and reduce() functions.
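
To make the last two slides concrete, here is a minimal sketch of the map/reduce paradigm in plain Python. It is not the Hadoop API and not code from the talk: the names `map_words`, `reduce_counts`, and `run_job` are hypothetical, the word-count task is an assumed example, and everything runs in a single process, whereas Hadoop distributes the same three phases (map, shuffle, reduce) across many machines.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """map(): emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """reduce(): combine all values emitted for one key into a total."""
    return word, sum(counts)


def run_job(
    lines: Iterable[str],
    mapper: Callable[[str], Iterator[Tuple[str, int]]],
    reducer: Callable[[str, Iterable[int]], Tuple[str, int]],
) -> Dict[str, int]:
    """Hypothetical single-process driver standing in for the framework."""
    # Shuffle phase: group every mapped value under its key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    # Reduce phase: call the reducer once per key.
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


if __name__ == "__main__":
    sample = ["Hadoop stores data in HDFS", "MapReduce processes data in batch"]
    print(run_job(sample, map_words, reduce_counts))
```

The user supplies only the two small functions; grouping by key and scheduling are the framework's job, which is why the paradigm suits batch work over large, distributed data sets.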
