Big Data Congress, Feb 24 2014 – Hadoop Session 1: Hadoop 101
A hands-on introduction to Hadoop using the Hortonworks Sandbox

Presentation Transcript

  • Adam Muise – Solution Architect, Hortonworks. HADOOP 101: AN INTRODUCTION TO HADOOP WITH THE HORTONWORKS SANDBOX
  • Who are we?
  • Who is Hortonworks?
  • 100% Open Source – democratized access to data. The leaders of Hadoop's development. We do Hadoop. We drive innovation in the platform – we lead the roadmap. Community driven, enterprise focused.
  • We do Hadoop successfully: Support, Training, Professional Services.
  • Enter the Hadoop. http://www.fabulouslybroke.com/2011/05/ninja-elephants-and-other-awesome-stories/
  • Hadoop was created because traditional technologies never cut it for Internet properties like Google, Yahoo, Facebook, Twitter, and LinkedIn
  • Traditional architecture didn't scale enough… [diagram: app, database, and SAN stacks duplicated as load grows]
  • Databases can become bloated and useless
  • $upercomputing: traditional architectures cost too much at that volume… [diagram: $/TB climbs with special hardware]
  • So what is the answer?
  • If you could design a system that would handle this, what would it look like?
  • It would probably need a highly resilient, self-healing, cost-efficient, distributed file system… [diagram: many storage nodes]
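The "resilient, self-healing" storage idea above can be illustrated with a toy block-placement routine. This is a minimal Python sketch, not HDFS's actual policy: real HDFS placement is rack-aware, while the simple round-robin here (and the `place_blocks` helper and node names, which are mine) only shows why keeping three copies of every block on distinct machines makes the loss of any single node survivable.

```python
def place_blocks(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes.

    Toy stand-in for HDFS-style placement: real HDFS also considers
    rack topology; round-robin over distinct nodes is enough to show
    that every block keeps copies on several machines.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
        for b in range(num_blocks)
    }

placement = place_blocks(num_blocks=4,
                         nodes=["node1", "node2", "node3", "node4", "node5"])
# Every block has 3 replicas on distinct nodes, so losing any one
# machine never loses data: two copies of each block survive.
for replicas in placement.values():
    assert len(set(replicas)) == 3
```

With more nodes than the replication factor, the cluster can also "self-heal": when a node dies, the surviving copies of its blocks can simply be re-copied to other nodes.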
  • It would probably need a completely parallel processing framework that took tasks to the data… [diagram: processing colocated with storage on every node]
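The "take the task to the data" framework is MapReduce. Below is a single-process Python simulation of the map → shuffle → reduce flow for the classic word count; it is a sketch of the programming model only (the function names are mine, and on a real cluster each map task runs on the node that stores its input split, with the framework shuffling intermediate pairs between nodes).

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    # map task: emit a (word, 1) pair for every word in one input line
    for word in line.split():
        yield (word.lower(), 1)

def reducer(word, counts):
    # reduce task: sum all the counts emitted for one word
    return word, sum(counts)

def map_reduce(lines):
    """One-process simulation of map -> shuffle -> reduce.

    Sorting the intermediate pairs by key plays the role of the
    shuffle phase that a real cluster performs across the network.
    """
    mapped = [pair for line in lines for pair in mapper(line)]
    mapped.sort(key=itemgetter(0))  # the "shuffle": bring equal keys together
    return dict(
        reducer(word, (count for _, count in group))
        for word, group in groupby(mapped, key=itemgetter(0))
    )

counts = map_reduce(["hadoop takes tasks to the data",
                     "the data stays put"])
# counts["the"] == 2, counts["data"] == 2, counts["hadoop"] == 1
```

Because each map call only sees one line and each reduce call only sees one key, both phases parallelize naturally across machines.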
  • It would probably run on commodity hardware, virtualized machines, and common OS platforms [diagram: processing and storage paired on each node]
  • It would probably be open source so innovation could happen as quickly as possible
  • It would need a critical mass of users
  • Apache Hadoop: HDFS, YARN, MapReduce, Tez, Storm, Pig, Hive, HCatalog, HBase, Ambari, Knox, Sqoop, Falcon, Flume
  • Hortonworks Data Platform: the same stack – HDFS, YARN, MapReduce, Tez, Storm, Pig, Hive, HCatalog, HBase, Ambari, Knox, Sqoop, Falcon, Flume
  • We are going to learn how to work with Hadoop in less than an hour.
  • To do this, we need to install Hadoop, right?
  • Nope.
  • Enter the Sandbox.
  • The Sandbox is 'Hadoop in a Can'. It contains one copy of each of the Master and Worker node processes used in a cluster, only in a single virtual node. [diagram: a full cluster's processing and storage collapsed into one Linux VM]
  • Getting started with the Sandbox VM:
    - Pick your flavor of VM at http://www.hortonworks.com/sandbox
    - Start the sandbox VM and find the IP displayed
    - Go to that IP in a browser (e.g. http://172.16.130.131)
    - Register
    - Click on 'Start Tutorials'
    - On the left-hand nav, click on 'HCatalog, Basic Pig & Hive Commands'
  • In this tutorial we will:
    - Land files in HDFS
    - Assign metadata with HCatalog
    - Use SQL with Hive
    - Learn to process data with Pig
  • Try the other tutorials.
  • Hadoop is the new Modern Data Architecture for the Enterprise
  • There is NO second place. Hortonworks …the Bull Elephant of Hadoop Innovation. © Hortonworks Inc. 2012