Day 1 big data & hadoop By SoApt

  1. 1. Big Data & Hadoop
  2. 2. ❑ LIVE online classes ❑ Class recordings available for a lifetime ❑ Quizzes and assignments at the end of each chapter ❑ Technical support ❑ Project work ❑ Assessment and certification ❑ Post-training guidance and support ❑ Assistance in finding a relevant job
  3. 3. Day 1 / Day 2. Week 1: Understanding Big Data, Hadoop Architecture, Hadoop Cluster, Data Loading Techniques. Week 2: Basic MapReduce, Advanced MapReduce, YARN 2.0. Week 3: Pig Latin, Hive. Week 4: NoSQL Databases, HBase and ZooKeeper, Project Work.
  5. 5. NYSE generates about one terabyte of new trade data per day, which is used to perform stock trading analytics that determine trends for optimal trades.
  6. 6. Volume Variety Velocity Big Data
  14. 14. Storage - Backup / Read - Write | Processing (ETL) | Usage / Visualization
  15. 15. OLTP, RDBMS, Social, Logs. Expensive storage and processing. Lot of data discarded. Storage spread across; not easily accessible. Limited storage capacity. Reports.
  16. 16. OLTP, RDBMS, Social, Logs. Lot of data discarded. Reports (Batch). Hadoop. DW. Reports.
  19. 19. 1 machine: 4 I/O channels, each channel 100 MBps. 100 machines: 4 I/O channels each, each channel 100 MBps.
  20. 20. 1 machine: 4 I/O channels, each channel 100 MBps. 100 machines: 4 I/O channels each, each channel 100 MBps. Reading 1 TB of data: 45 minutes on 1 machine, 0.45 minutes on 100 machines.
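The read-time figures on slide 20 can be checked with a short calculation. The sketch below assumes that "MBps" means megabytes per second, that 1 TB is taken as 1,000,000 MB, and that all channels on all machines read in parallel with no overhead; the function name `read_time_minutes` is made up for this illustration. Under those assumptions the result is roughly 42 minutes for one machine, which the slide rounds to 45, and one hundredth of that for 100 machines.

```python
# Back-of-the-envelope read time for the scenario on slide 20.
# Assumptions (not stated on the slide): 1 TB = 1,000,000 MB,
# "MBps" = megabytes per second, perfectly parallel reads.

def read_time_minutes(data_mb: float, machines: int,
                      channels_per_machine: int = 4,
                      mb_per_sec_per_channel: float = 100.0) -> float:
    """Minutes needed to read data_mb when every channel reads in parallel."""
    aggregate_mb_per_sec = machines * channels_per_machine * mb_per_sec_per_channel
    return data_mb / aggregate_mb_per_sec / 60.0

ONE_TB_MB = 1_000_000

single = read_time_minutes(ONE_TB_MB, machines=1)
cluster = read_time_minutes(ONE_TB_MB, machines=100)

print(f"1 machine:    {single:.1f} minutes")
print(f"100 machines: {cluster:.2f} minutes")
```

The point of the slide survives the rounding: aggregate I/O bandwidth grows with the number of machines, so read time shrinks by the same factor.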
  21. 21. Story of Hadoop
  25. 25. Characteristics of Hadoop: Reliable, Economical, Scalable, Fault Tolerant
  27. 27. Hadoop Core Components
  29. 29. NameNode: Keeps track of the overall file directory structure and the placement of data blocks. NameNode (stores metadata only). METADATA: /user/doug/hinfo -> 1 3 5; /user/doug/pdetail -> 4 2
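The metadata example on slide 29 can be pictured as two lookup tables. This is a toy sketch, not how the NameNode is actually implemented: the file paths and block numbers come from the slide, while the DataNode names (`dn1` to `dn4`), the `block_locations` table, and the `locate` helper are hypothetical.

```python
# Toy model of NameNode metadata. Paths and block IDs are from the slide;
# everything else is a hypothetical illustration.

# File path -> ordered list of block IDs that make up the file.
namespace = {
    "/user/doug/hinfo": [1, 3, 5],
    "/user/doug/pdetail": [4, 2],
}

# Block ID -> DataNodes assumed to hold a replica of that block.
block_locations = {
    1: ["dn1", "dn2", "dn3"],
    2: ["dn2", "dn3", "dn4"],
    3: ["dn1", "dn3", "dn4"],
    4: ["dn1", "dn2", "dn4"],
    5: ["dn2", "dn3", "dn4"],
}

def locate(path):
    """Return (block_id, datanodes) pairs for a file, in block order."""
    return [(block_id, block_locations[block_id]) for block_id in namespace[path]]

for block_id, nodes in locate("/user/doug/hinfo"):
    print(f"block {block_id} -> {', '.join(nodes)}")
```

The NameNode answers "which blocks, on which nodes" from tables like these; the block contents themselves live on the DataNodes.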
  30. 30. NameNode: Edit Logs, FSImage
  31. 31. NameNode, Secondary NameNode, File System Metadata. "It's been an hour?"
  32. 32. Quiz
  33. 33. Quiz
  35. 35. Quiz: When the NameNode fails, the Secondary NameNode takes over instantly and prevents cluster failure: ❑ TRUE ❑ FALSE. Answer: False. The Secondary NameNode is used for creating NameNode checkpoints. The NameNode can be manually recovered using the 'edits' and 'FSImage' files stored on the Secondary NameNode.
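The checkpoint idea behind the quiz answer can be sketched as replaying an edit log onto a namespace snapshot. This is a simplified illustration only: the real FSImage and edit log are Hadoop-internal formats with many more operation types, and the `create`/`delete` tuples and the function names `apply_edit` and `checkpoint` are assumptions made for this example.

```python
# Toy checkpoint: merge an edit log into an FSImage-like snapshot.
# The data structures and operation names here are hypothetical.

def apply_edit(image, edit):
    """Apply one logged namespace change to the in-memory image."""
    op, path, *args = edit
    if op == "create":
        image[path] = list(args[0])   # args[0] is the file's block ID list
    elif op == "delete":
        image.pop(path, None)
    else:
        raise ValueError(f"unknown operation: {op}")

def checkpoint(fsimage, edit_log):
    """Return a new image with all edits applied, plus an emptied edit log."""
    new_image = {path: list(blocks) for path, blocks in fsimage.items()}
    for edit in edit_log:
        apply_edit(new_image, edit)
    return new_image, []

fsimage = {"/user/doug/hinfo": [1, 3, 5]}
edits = [
    ("create", "/user/doug/pdetail", [4, 2]),
    ("delete", "/user/doug/hinfo"),
]

new_image, new_log = checkpoint(fsimage, edits)
print(new_image)
```

Because a checkpoint is only as fresh as the last merge, a copy of it helps with manual recovery but cannot stand in for a running NameNode, which is why the quiz answer is False.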
  36. 36. JobTracker
  37. 37. JobTracker (contd.)
  38. 38. JobTracker (contd.)
  39. 39. Quiz
  40. 40. Quiz
  41. 41. Rack 1, Rack 2, Rack 3. Block A, Block B, Block C. The topology script is set with the property topology.script.file.name in core-site.xml.
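A topology script of the kind referenced on slide 41 might look like the sketch below. Hadoop runs the configured script with one or more node addresses as arguments and reads one rack path per address from standard output. The IP prefixes and rack names in `RACKS` and the `resolve` helper are hypothetical; a real cluster would use its own addressing plan.

```python
#!/usr/bin/env python3
# Minimal rack-awareness topology script (illustrative sketch).
# The prefix-to-rack table below is a made-up example mapping.
import sys

RACKS = {
    "10.0.1.": "/rack1",
    "10.0.2.": "/rack2",
    "10.0.3.": "/rack3",
}
DEFAULT_RACK = "/default-rack"  # fallback for addresses not in the table

def resolve(host):
    """Map a node address to a rack path by matching its address prefix."""
    for prefix, rack in RACKS.items():
        if host.startswith(prefix):
            return rack
    return DEFAULT_RACK

if __name__ == "__main__":
    # Print one rack path per argument, in the same order as the arguments.
    print(" ".join(resolve(host) for host in sys.argv[1:]))
```

The script would be made executable and its path set as the value of topology.script.file.name in core-site.xml.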
  44. 44. Green - GA versions; Black - not yet released by Apache; Red - commercial
  45. 45. ❏ http://blog.cloudera.com/blog/2012/01/an-update-on-apache-hadoop-1-0/ ❏ https://blogs.apache.org/bigtop/entry/all_you_wanted_to_know ❏ https://hadoop.apache.org/releases.html ❏ http://hortonworks.com/blog/apache-hadoop-2-is-ga/
  47. 47. Class 2 Pre-work ❏ Set up the Hadoop environment using the documents provided on Google Drive ❏ CDH3 (recommended) or CDH4 ❏ Execute basic Linux commands ❏ Execute the hands-on HDFS commands ❏ Attempt the class-1 assignment
  48. 48. Thank you! See you in the next class.
