Day 1 big data & hadoop By SoApt

This document provides an overview of a training program on Big Data and Hadoop. The training includes live online classes, recorded class materials, quizzes and assignments. Key topics covered include Hadoop architecture, MapReduce, YARN, Pig Latin, Hive, HBase and project work. The training aims to help students understand Big Data challenges, see how Hadoop addresses them, and gain the skills required for Big Data jobs.

❑ Live online classes
❑ Class recordings made available for a lifetime
❑ Quizzes and assignments at the end of each chapter
❑ Technical support
❑ Project work
❑ Assessment and certification
❑ Post-training guidance and support
❑ Assistance in finding a relevant job

Schedule by week (each week has a Day 1 and a Day 2 session)
Week 1: Understanding Big Data; Hadoop Architecture; Hadoop Cluster; Data Loading Techniques
Week 2: Basic MapReduce; Advanced MapReduce; YARN 2.0
Week 3: Pig Latin; Hive
Week 4: NoSQL Databases, HBase and ZooKeeper; Project Work

NYSE generates about one terabyte of new trade data per day, which is analyzed to determine trends for optimal trades.

Storage (backup, read/write)
Processing (ETL)
Usage / Visualization

OLTP, RDBMS, Social, Logs
Expensive storage and processing
Lot of data discarded
Storage spread across; not easily accessible
Limited storage capacity
Reports

OLTP, RDBMS, Social, Logs
Lot of data discarded
Reports (Batch)
Hadoop
DW Reports

Reading 1 TB of data
1 machine, 4 I/O channels, 100 MBps per channel: 45 minutes
100 machines, 4 I/O channels each, 100 MBps per channel: 0.45 minutes
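
The arithmetic behind these figures can be checked with a short sketch. It assumes 1 TB = 1,000,000 MB and that every channel on every machine reads in parallel at full speed with no coordination overhead; the function and constant names are illustrative, not from the slides. Under those assumptions the result is roughly 42 minutes for one machine and roughly 0.42 minutes for 100 machines, in line with the rounded 45 and 0.45 quoted above.

```python
def read_time_minutes(data_mb: float, machines: int,
                      channels_per_machine: int = 4,
                      mb_per_sec_per_channel: float = 100.0) -> float:
    """Idealized time to read data_mb when all channels read in parallel."""
    aggregate_mb_per_sec = machines * channels_per_machine * mb_per_sec_per_channel
    return data_mb / aggregate_mb_per_sec / 60.0


ONE_TB_IN_MB = 1_000_000  # assumption: decimal terabyte

single = read_time_minutes(ONE_TB_IN_MB, machines=1)
cluster = read_time_minutes(ONE_TB_IN_MB, machines=100)

print(f"1 machine:    {single:.1f} minutes")
print(f"100 machines: {cluster:.2f} minutes")
```

The point of the comparison is that read time falls linearly with the number of machines only when the data is already spread across them, which is the property a distributed file system provides.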

Characteristics of Hadoop
Hadoop
Reliable
Economical
Scalable
Fault Tolerant

NameNode (stores metadata only):
Keeps track of the overall file directory structure and the placement of data blocks.
METADATA:
/user/doug/hinfo -> 1 3 5
/user/doug/pdetail -> 4 2
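
As a rough illustration of what "metadata only" means, the mapping above can be modeled as a dictionary from file path to an ordered list of block IDs. This is a toy sketch, not how the NameNode is implemented: the two file entries come from the slide, while the DataNode names and the block placement table are invented for the example.

```python
# Toy model of NameNode metadata: file path -> ordered list of block IDs.
# The two entries below are taken from the slide.
namespace = {
    "/user/doug/hinfo": [1, 3, 5],
    "/user/doug/pdetail": [4, 2],
}

# Hypothetical placement table: block ID -> DataNodes holding a replica.
# The node names dn1..dn3 are assumptions made for illustration only.
block_locations = {
    1: ["dn1", "dn2"],
    2: ["dn2", "dn3"],
    3: ["dn1", "dn3"],
    4: ["dn1", "dn2"],
    5: ["dn2", "dn3"],
}


def locate(path: str) -> list[tuple[int, list[str]]]:
    """Return (block_id, datanodes) pairs a client would need to read path."""
    return [(block_id, block_locations[block_id]) for block_id in namespace[path]]


for block_id, nodes in locate("/user/doug/hinfo"):
    print(block_id, nodes)
```

Note that the model holds no file contents at all: the lookup only tells a client which blocks to fetch and where they live.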

NameNode
Secondary NameNode
File System Metadata
It's been an hour?

Quiz
When the NameNode fails, Secondary NameNode takes over instantly and
prevents Cluster Failure:
❑ TRUE
❑ FALSE

False. The Secondary NameNode only creates NameNode checkpoints; it does not take over when the NameNode fails. The NameNode can be recovered manually using the 'edits' and 'FSImage' files stored on the Secondary NameNode.
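
The idea of a checkpoint can be sketched as replaying the edit log on top of the last namespace snapshot. The example below is a deliberately simplified model: the real FSImage and edits files are on-disk formats managed by Hadoop, and the operation names ("create", "delete") and the tuple layout used here are assumptions made for illustration.

```python
def checkpoint(fsimage: dict, edits: list) -> dict:
    """Replay the edit log on top of the last snapshot and return a new snapshot."""
    merged = dict(fsimage)  # copy so the previous snapshot is left untouched
    for op, path, blocks in edits:
        if op == "create":
            merged[path] = blocks
        elif op == "delete":
            merged.pop(path, None)
        else:
            raise ValueError(f"unknown edit operation: {op}")
    return merged


# Last snapshot of the namespace (toy FSImage).
fsimage = {"/user/doug/hinfo": [1, 3, 5]}

# Changes recorded since that snapshot (toy edit log).
edits = [
    ("create", "/user/doug/pdetail", [4, 2]),
    ("create", "/tmp/scratch", [6]),
    ("delete", "/tmp/scratch", None),
]

new_fsimage = checkpoint(fsimage, edits)
print(new_fsimage)
```

After a checkpoint the edit log can start again from empty, which keeps recovery fast; the merged snapshot is what a manual recovery would start from.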

Rack 1, Rack 2, Rack 3
Block A, Block B, Block C
Topology script: set the property topology.script.file.name in core-site.xml
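
A topology script is an executable that Hadoop calls with one or more host names or IP addresses as arguments and that prints one rack path per argument. A minimal sketch in Python might look like the following; the IP addresses, rack names and the in-code lookup table are hypothetical, and a real cluster would more likely load the mapping from a file maintained by the administrator.

```python
#!/usr/bin/env python3
import sys

# Hypothetical host-to-rack table, invented for this example.
RACK_BY_HOST = {
    "10.0.1.11": "/rack1",
    "10.0.2.11": "/rack2",
    "10.0.3.11": "/rack3",
}
DEFAULT_RACK = "/default-rack"


def resolve_racks(hosts):
    """Map each host or IP argument to a rack path, falling back to a default."""
    return [RACK_BY_HOST.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    # Hosts arrive as command-line arguments; print one rack per host.
    print(" ".join(resolve_racks(sys.argv[1:])))
```

With rack information available, block replicas can be spread across racks so that losing a single rack does not lose every copy of a block.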

Green - GA versions
Black - Not released by Apache yet
Red - Commercial

❏ http://blog.cloudera.com/blog/2012/01/an-update-on-apache-hadoop-1-0/
❏ https://blogs.apache.org/bigtop/entry/all_you_wanted_to_know
❏ https://hadoop.apache.org/releases.html
❏ http://hortonworks.com/blog/apache-hadoop-2-is-ga/

Class 2 Pre-work
❏ Set up the Hadoop environment using the documents provided on Google Drive
❏ CDH3 (recommended) or CDH4
❏ Execute basic Linux commands
❏ Execute the HDFS hands-on commands
❏ Attempt the class-1 assignment

1. Big Data & Hadoop

2. ❑ Live online classes ❑ Class recordings made available for a lifetime ❑ Quizzes and assignments at the end of each chapter ❑ Technical support ❑ Project work ❑ Assessment and certification ❑ Post-training guidance and support ❑ Assistance in finding a relevant job

3. Schedule by week (each week has a Day 1 and a Day 2 session). Week 1: Understanding Big Data; Hadoop Architecture; Hadoop Cluster; Data Loading Techniques. Week 2: Basic MapReduce; Advanced MapReduce; YARN 2.0. Week 3: Pig Latin; Hive. Week 4: NoSQL Databases, HBase and ZooKeeper; Project Work.

5. NYSE generates about one terabyte of new trade data per day, which is analyzed to determine trends for optimal trades.

6. Big Data: Volume, Variety, Velocity

17. Storage (backup, read/write); Processing (ETL); Usage / Visualization

18. OLTP, RDBMS, Social, Logs. Expensive storage and processing. Lot of data discarded. Storage spread across; not easily accessible. Limited storage capacity. Reports.

19. OLTP, RDBMS, Social, Logs. Lot of data discarded. Reports (Batch). Hadoop. DW Reports.

27. Reading 1 TB of data. 1 machine, 4 I/O channels, 100 MBps per channel: 45 minutes. 100 machines, 4 I/O channels each, 100 MBps per channel: 0.45 minutes.

28. Story of Hadoop

32. Characteristics of Hadoop: Reliable, Economical, Scalable, Fault Tolerant

36. Hadoop Core Components

40. NameNode (stores metadata only): keeps track of the overall file directory structure and the placement of data blocks. METADATA: /user/doug/hinfo -> 1 3 5; /user/doug/pdetail -> 4 2

41. NameNode Edit Logs FSImage

42. NameNode; Secondary NameNode; File System Metadata; It's been an hour?

43. Quiz

44. Quiz

46. Quiz: When the NameNode fails, Secondary NameNode takes over instantly and prevents Cluster Failure: ❑ TRUE ❑ FALSE. Answer: False. The Secondary NameNode only creates NameNode checkpoints; it does not take over when the NameNode fails. The NameNode can be recovered manually using the 'edits' and 'FSImage' files stored on the Secondary NameNode.

47. JobTracker

48. JobTracker (contd.)

49. JobTracker (contd.)

50. Quiz

51. Quiz

54. Rack 1, Rack 2, Rack 3. Block A, Block B, Block C. Topology script: set the property topology.script.file.name in core-site.xml

60. Green - GA Versions Black - Not Released by Apache yet Red - Commercial

61. ❏ http://blog.cloudera.com/blog/2012/01/an-update-on-apache-hadoop-1-0/ ❏ https://blogs.apache.org/bigtop/entry/all_you_wanted_to_know ❏ https://hadoop.apache.org/releases.html ❏ http://hortonworks.com/blog/apache-hadoop-2-is-ga/

63. Class 2 Pre-work ❏ Set up the Hadoop environment using the documents provided on Google Drive ❏ CDH3 (recommended) or CDH4 ❏ Execute basic Linux commands ❏ Execute the HDFS hands-on commands ❏ Attempt the class-1 assignment

64. Thank you! See you in the next class.
