Introduction to Hadoop
Lynx Consultants training about Hadoop

Transcript

  • 1. What's behind Big Data (Marc Cluet, Lynx Consultants)
  • 2. What we'll cover
      • Understand Hadoop components
      • Understand the different technologies involved
      • Embrace Big Data!
    Lynx Consultants © 2013
  • 3. What is Big Data?
  • 4. What is Big Data?
      • SQL has a limited ability to process changing data
          • The SQL schema is the truth; data has to fit it
  • 5-9. What is Big Data?
      • Big Data is the solution!
          • Data can be truly dynamic
          • Designed to handle terabytes of data
          • Designed for fault tolerance and securing data
          • Designed around exploiting hardware to the fullest
          • Designed around Map/Reduce
  • 10-21. Who runs Big Data?
      • A few small companies
  • 22. What is Hadoop?
  • 23-25. What is Hadoop?
      • Hadoop is one of the big players for Big Data
          • Developed as an open-source implementation of the ideas in Google's MapReduce and Google File System papers (Google BigTable is the model for HBase)
          • Mainly developed at Yahoo!
          • Current companies behind it: Hortonworks and Cloudera
  • 26. What are the features of Hadoop?
      • HDFS: Hadoop Distributed File System
          • HDFS is a filesystem distributed across many nodes
          • Keeps several copies of your data (default: 3)
          • If a node goes down, HDFS re-replicates the blocks it held so that every block is back to its full number of copies
  • 27. What are the features of Hadoop?
      • HDFS: Hadoop Distributed File System
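The replication behaviour described on slide 26 can be sketched in a few lines of Python. This is a toy model, not HDFS code: the node names, block ids, and the `place_blocks` and `handle_node_failure` helpers are all made up for illustration, and real HDFS placement also takes racks and disk usage into account.

```python
import random

REPLICATION = 3  # the default replica count quoted on the slide


def place_blocks(block_ids, nodes, replication=REPLICATION, seed=0):
    """Assign each block to `replication` distinct nodes (toy placement)."""
    rng = random.Random(seed)
    return {block: set(rng.sample(nodes, replication)) for block in block_ids}


def handle_node_failure(placement, live_nodes, failed,
                        replication=REPLICATION, seed=0):
    """Forget the failed node, then copy under-replicated blocks elsewhere."""
    rng = random.Random(seed)
    survivors = [n for n in live_nodes if n != failed]
    for holders in placement.values():
        holders.discard(failed)
        # Only nodes that do not already hold the block can take a new copy.
        candidates = [n for n in survivors if n not in holders]
        while len(holders) < replication and candidates:
            target = rng.choice(candidates)
            candidates.remove(target)
            holders.add(target)
    return survivors


# Hypothetical five-node cluster holding four blocks.
nodes = ["node1", "node2", "node3", "node4", "node5"]
placement = place_blocks(["blk_1", "blk_2", "blk_3", "blk_4"], nodes)
nodes = handle_node_failure(placement, nodes, failed="node2")

for block, holders in sorted(placement.items()):
    print(block, sorted(holders))
```

After the simulated failure every block is again held by three live nodes, which is the property the slide refers to as "rebalancing".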
  • 28. What are the features of Hadoop?
      • HDFS: Hadoop Distributed File System
      • HBase: Hadoop NoSQL database
          • Schemaless key-value storage
          • All data exportable as JSON
  • 29. What are the features of Hadoop?
      • HDFS: Hadoop Distributed File System
      • HBase: Hadoop NoSQL database
  • 30. What are the features of Hadoop?
      • HDFS: Hadoop Distributed File System
      • HBase: Hadoop NoSQL database
      • Map/Reduce: the key to it all
          • Invented by Google
          • Given a dataset, we Map every record that matches a criterion
          • Then we Reduce the mapped records to a result
  • 31. What are the features of Hadoop?
      • Map/Reduce: the key to it all
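As a rough illustration of the Map and Reduce steps from slide 30, here is the classic word count written in plain Python. It runs in a single process and only mimics the shape of a job; the function names `map_phase`, `shuffle`, and `reduce_phase` and the sample input are assumptions for this sketch, not Hadoop APIs.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values for one key into a single result."""
    return key, sum(values)


lines = ["big data is big", "hadoop handles big data"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
print(counts)
```

On a real cluster the map calls run in parallel on the nodes that hold the input blocks, and the grouping step moves data between nodes before the reducers run.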
  • 32. What are the features of Hadoop?
      • HDFS: Hadoop Distributed File System
      • HBase: Hadoop NoSQL database
      • Map/Reduce: the key to it all
      • Hive: SQL for NoSQL
          • Hive provides a SQL-like language called HiveQL
          • A good entry point for SQL users :)
  • 33. What are the features of Hadoop?
      • HDFS: Hadoop Distributed File System
      • HBase: Hadoop NoSQL database
      • Map/Reduce: the key to it all
      • Hive: SQL for NoSQL
      • Pig: Map/Reduce made easy
          • Produces results from scripts written in a small data-flow language (Pig Latin)
          • Reinvents SQL, in a way
  • 34. What are the features of Hadoop?
      • Hive
  • 35. What are the features of Hadoop?
      • Pig
  • 36. What are the features of Hadoop?
      • HDFS: Hadoop Distributed File System
      • HBase: Hadoop NoSQL database
      • Map/Reduce: the key to it all
      • Hive: SQL for NoSQL
      • Pig: Map/Reduce made easy
      • Flume: fault-tolerant transport
  • 37-39. What are the features of Hadoop?
      • Flume
          • Divided into Sources, Channels and Sinks
          • You can have several of each, which makes it fault tolerant
          • Many sources!
              • Avro, Exec, JMS, Syslog, HTTP, NetCat, your own (Java)
          • Many channels!
              • Memory, File, your own (Java)
          • Many sinks!
              • Avro, HDFS, Logger, IRC, File, HBase, ElasticSearch, S3, community sinks, your own (Java)
  • 40. What are the features of Hadoop?
      • Flume
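The source, channel, and sink split from slides 37-39 can be pictured with a small Python model. This is not Flume's API or configuration format (Flume itself is configured through properties files and extended in Java); `MemoryChannel`, `run_source`, and `drain` are hypothetical names used only to show how events flow.

```python
from collections import deque


class MemoryChannel:
    """Buffers events between a source and a sink (toy in-memory queue)."""

    def __init__(self):
        self._events = deque()

    def put(self, event):
        self._events.append(event)

    def take(self):
        """Return the oldest buffered event, or None if the channel is empty."""
        return self._events.popleft() if self._events else None


def run_source(lines, channels):
    """A source copies every incoming event into each attached channel."""
    for line in lines:
        for channel in channels:
            channel.put(line)


def drain(channel, sink):
    """A sink pulls events from its channel until the channel is empty."""
    while (event := channel.take()) is not None:
        sink(event)


# One source feeding two channels, each drained by its own sink.
to_store, to_log = MemoryChannel(), MemoryChannel()
stored, logged = [], []

run_source(["event-1", "event-2"], [to_store, to_log])
drain(to_store, stored.append)
drain(to_log, logged.append)
print(stored, logged)
```

Because the channel sits between the two sides, a slow or failed sink only delays delivery from its own channel, which is one way to read the "multiple of everything" point on the slide.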
  • 41-44. What Hadoop looks like in a data center (DC)
      • Components
          • Primary NameNode
              • Controls the whole cluster and knows where the data resides
              • Runs the JobTracker, which keeps track of Map/Reduce jobs
              • Biggest single point of failure; shadowing it is an option
          • Secondary NameNode
              • Performs housekeeping: it periodically merges the NameNode's edit log into a checkpoint of the filesystem metadata; it is not a hot standby
          • DataNode
              • Stores the actual data
              • Runs Map/Reduce tasks
  • 45. What Hadoop looks like in a data center (DC)
      • Components
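The division of labour between the NameNode and the DataNodes can be sketched as follows. It is a simplified model under stated assumptions: the classes, the `register` helper, and the path `/logs/app.log` are invented for illustration, and in real HDFS clients write blocks to DataNodes directly rather than through the NameNode.

```python
class DataNode:
    """Holds the actual block contents."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block id -> data


class NameNode:
    """Holds metadata only: which blocks make up a file and where they live."""

    def __init__(self):
        self.file_blocks = {}      # path -> ordered list of block ids
        self.block_locations = {}  # block id -> list of DataNodes

    def register(self, path, block_id, data, datanodes):
        """Toy shortcut: store the block on each DataNode and record it."""
        for node in datanodes:
            node.blocks[block_id] = data
        self.file_blocks.setdefault(path, []).append(block_id)
        self.block_locations[block_id] = list(datanodes)

    def locate(self, path):
        """Return (block id, holders) pairs for a file, in file order."""
        return [(b, self.block_locations[b]) for b in self.file_blocks[path]]


def read_file(namenode, path):
    """Ask the NameNode where the blocks are, then read them from DataNodes."""
    chunks = []
    for block_id, holders in namenode.locate(path):
        chunks.append(holders[0].blocks[block_id])
    return "".join(chunks)


dn1, dn2, dn3 = DataNode("dn1"), DataNode("dn2"), DataNode("dn3")
nn = NameNode()
nn.register("/logs/app.log", "blk_1", "hello ", [dn1, dn2, dn3])
nn.register("/logs/app.log", "blk_2", "hadoop", [dn2, dn3, dn1])
text = read_file(nn, "/logs/app.log")
print(text)
```

The NameNode object never stores file contents, only the lookup tables, which is why losing it makes the data unreachable even though the blocks still exist on the DataNodes.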
  • 46. Questions?