2014 feb 24_big_datacongress_hadoopsession1_hadoop101

A hands-on introduction to Hadoop using the Hortonworks Sandbox

Published in: Technology
  1. Adam Muise – Solution Architect, Hortonworks. HADOOP 101: AN INTRODUCTION TO HADOOP WITH THE HORTONWORKS SANDBOX
  2. Who are we?
  3. Who is … ?
  4. 100% Open Source – Democratized Access to Data. The leaders of Hadoop's development. We do Hadoop. Drive innovation in the platform – we lead the roadmap. Community driven, Enterprise focused.
  5. We do Hadoop successfully. Support. Training. Professional Services.
  6. Enter the Hadoop. ……… http://www.fabulouslybroke.com/2011/05/ninja-elephants-and-other-awesome-stories/
  7. Hadoop was created because traditional technologies never cut it for the Internet properties like Google, Yahoo, Facebook, Twitter, and LinkedIn.
  8. Traditional architecture didn't scale enough… [diagram: repeated silos of App, DB, and SAN tiers]
  9. Databases can become bloated and useless.
  10. $upercomputing. Traditional architectures cost too much at that volume… $/TB. $pecial Hardware.
  11. So what is the answer?
  12. If you could design a system that would handle this, what would it look like?
  13. It would probably need a highly resilient, self-healing, cost-efficient, distributed file system… [diagram: grid of Storage nodes]
  14. It would probably need a completely parallel processing framework that took tasks to the data… [diagram: Processing co-located with Storage on each node]
  15. It would probably run on commodity hardware, virtualized machines, and common OS platforms. [diagram: Processing and Storage nodes]
  16. It would probably be open source so innovation could happen as quickly as possible.
  17. It would need a critical mass of users.
  18. Apache Hadoop: Tez, Storm, YARN, Pig, HDFS, MapReduce, HCatalog, Hive, HBase, Ambari, Knox, Sqoop, Falcon, Flume
  19. Hortonworks Data Platform: Storm, Tez, Pig, YARN, HDFS, MapReduce, HCatalog, Hive, HBase, Ambari, Knox, Sqoop, Falcon, Flume
  20. We are going to learn how to work with Hadoop in less than an hour.
  21. To do this, we need to install Hadoop, right?
  22. Nope.
  23. Enter the Sandbox.
  24. The Sandbox is 'Hadoop in a Can'. It contains one copy of each of the Master and Worker node processes used in a cluster, only in a single virtual node. [diagram: multi-node cluster collapsed into one Linux VM]
  25. Getting started with the Sandbox VM:
     - Pick your flavor of VM at… http://www.hortonworks.com/sandbox
     - Start the sandbox VM
     - Find the IP displayed
     - Go to… http://172.16.130.131
     - Register
     - Click on 'Start Tutorials'
     - On the left-hand nav, click on 'HCatalog, Basic Pig & Hive Commands'
  26. In this tutorial we will:
     - Land files in HDFS
     - Assign metadata with HCatalog
     - Use SQL with Hive
     - Learn to process data with Pig
  27. Try the other tutorials.
  28. Hadoop is the new Modern Data Architecture for the Enterprise.
  29. There is NO second place. Hortonworks …the Bull Elephant of Hadoop Innovation. © Hortonworks Inc. 2012: DO NOT SHARE. CONTAINS HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION
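The "resilient, self-healing, distributed file system" the deck describes is HDFS. As a rough sketch of its core idea (plain Python for illustration, not actual HDFS code; the 128 MB block size and replication factor of 3 mirror common HDFS defaults, and real HDFS placement is rack-aware and far more involved), a file is split into fixed-size blocks and each block is copied to several distinct nodes, so losing one machine loses no data:

```python
# Illustrative HDFS-style block placement: split a file into fixed-size
# blocks and place each block on `replication` distinct nodes.

def place_blocks(file_size, nodes, block_size=128 * 1024 * 1024, replication=3):
    """Return {block_index: [node, ...]} for a file of `file_size` bytes."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    num_blocks = -(-file_size // block_size)  # ceiling division
    placement = {}
    for b in range(num_blocks):
        # Round-robin so the replicas of one block land on distinct nodes.
        placement[b] = [nodes[(b + r) % len(nodes)] for r in range(replication)]
    return placement

# A 300 MB file at 128 MB blocks -> 3 blocks, each stored on 3 distinct nodes.
layout = place_blocks(file_size=300 * 1024 * 1024,
                      nodes=["node1", "node2", "node3", "node4"])
for block, replicas in sorted(layout.items()):
    print(block, replicas)
```

Because every block exists on three machines, a failed node only reduces the replica count, and the NameNode can re-replicate affected blocks from the surviving copies.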
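The "parallel processing framework that took tasks to the data" is classic MapReduce. A toy word count in plain Python (no Hadoop involved; the function names are this sketch's own) shows the map, shuffle, and reduce phases that Hadoop runs in parallel, with each map task scheduled on the node holding its input split:

```python
from collections import defaultdict

# Toy word count illustrating MapReduce's three phases. On a real cluster
# each map task runs on a node that stores its input split.

def map_fn(line):
    # Map: emit a (word, 1) pair for every word in one line of input.
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework does
    # between the map and reduce phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_fn(key, values):
    # Reduce: sum the counts for one word.
    return key, sum(values)

lines = ["hadoop takes tasks to the data",
         "the data stays where it is"]
mapped = [pair for line in lines for pair in map_fn(line)]
counts = dict(reduce_fn(k, v) for k, v in shuffle(mapped).items())
print(counts["the"])  # "the" appears twice across the two lines
```

Each phase only needs its local slice of the data, which is why the same program scales from this two-line example to a cluster-sized corpus.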
