Big Data and Hadoop Ecosystem

This deck introduces big data and Hadoop. Big data is characterized by the increasing volume, variety, and velocity of data. Hadoop evolved as a cost-effective way to process large amounts of unstructured and semi-structured data across distributed systems built from commodity hardware. It provides scalable, parallel processing via MapReduce, and its distributed file system, HDFS, stores data across a cluster with redundancy and failover. Key Hadoop projects include HDFS, MapReduce, HBase, Hive, Pig, and ZooKeeper.

Big Data and Hadoop

Presenter
Rajkumar Singh
http://rajkrrsingh.blogspot.com/
http://in.linkedin.com/in/rajkrrsingh

Big Data and Hadoop Introduction

Volume
• Facebook
• Google Plus
• Twitter
• LinkedIn
• Stock Exchange
• Healthcare
• Telecom

Variety
• Structured, semi-structured, and unstructured data

Velocity
• Facebook
• Stock Exchange
• Healthcare
• Telecom
• Mobile Devices
• GPS
• Security Infrastructure

The Solution (Hadoop Evolution)
Traditional Approach

GB->TB->PB--ZB
so the processing with RDBMS is Impossible

Challenges in Big Data
• Storage: petabytes (PB) of data
• Processing: in a timely manner
• Variety of data: structured, semi-structured, and unstructured
• Cost

To Overcome Big Data Challenges, Hadoop Evolves
• Cost effective: runs on commodity hardware
• Big cluster (1000 nodes): provides both storage and processing
• Parallel processing: MapReduce
• Big storage: usable capacity = storage per node * number of nodes / RF (replication factor)
• Failover mechanism: automatic failover
• Data distribution
• MapReduce framework
• Moving code to the data
• Heterogeneous hardware (IBM, HP, AIX, Oracle machines of any memory and CPU configuration)
• Scalable
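
The "Big Storage" formula above can be sketched in a few lines of Python. This is only an illustration: the function name and the example cluster (1000 nodes with 12 TB of disk each) are assumptions made here, not figures from the slides, and a real cluster also reserves space for non-HDFS use.

```python
def usable_capacity_tb(storage_per_node_tb: float,
                       num_nodes: int,
                       replication_factor: int = 3) -> float:
    """Approximate usable capacity: raw cluster storage divided by RF."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    raw_capacity_tb = storage_per_node_tb * num_nodes
    return raw_capacity_tb / replication_factor


if __name__ == "__main__":
    # Hypothetical cluster: 1000 nodes, 12 TB of disk per node, RF = 3.
    print(usable_capacity_tb(12, 1000, 3))  # 4000.0
```

With RF = 3, every block is stored three times, so only about one third of the raw disk space holds unique data.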

What is Hadoop
• A Java framework to process enormous amounts of data

Hadoop Core
• HDFS
• Programming Construct (Map Reduce)
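
To make the MapReduce programming construct concrete, here is a minimal single-process sketch of the classic word count in Python. It only imitates the map, shuffle, and reduce steps in memory; it is not the Hadoop Java API, and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big cluster", "data distribution"]))
    # {'big': 2, 'data': 2, 'cluster': 1, 'distribution': 1}
```

On a real cluster the map calls run in parallel on the nodes that hold the input blocks, and the framework performs the shuffle over the network.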

Hadoop Sub-Projects
• Hadoop Common: The common utilities that support the other Hadoop subprojects.
• Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.
• Hadoop MapReduce: A software framework for distributed processing of large data sets on compute clusters.
Other Hadoop-related projects at Apache include:
• Avro™: A data serialization system.
• Cassandra™: A scalable multi-master database with no single points of failure.
• Chukwa™: A data collection system for managing large distributed systems.
• HBase™: A scalable, distributed database that supports structured data storage for large tables.
• Hive™: A data warehouse infrastructure that provides data summarization and ad hoc querying.
• Mahout™: A scalable machine learning and data mining library.
• Pig™: A high-level data-flow language and execution framework for parallel computation.
• ZooKeeper™: A high-performance coordination service for distributed applications.

HDFS
• A distributed file system (DFS), based on GFS (Google File System)
• Example: a 1 TB file is spread across four nodes that hold 250 GB each

HDFS : Use Cases

Well suited for:
• Very large files
• Streaming data access: read data in large volumes, write once and read many times

Not needed, or not well suited for:
• Expensive hardware (HDFS is designed to run on commodity hardware)
• Low-latency access
• Lots of small files
• Parallel writes / arbitrary reads

HDFS Building Blocks
Default block size
• 64 MB (Hadoop 1.x default)
• 128 MB (Hadoop 2.x default)

1 GB file = 1024 MB / 128 MB = 8 blocks

For small files
A 100 MB file is smaller than the block size (128 MB). HDFS optimizes for storage: the file occupies 1 HDFS block that takes up only 100 MB.
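
The block arithmetic above can be checked with a short Python sketch. The function name is hypothetical; the 128 MB default and the 1 GB and 100 MB examples come from the slide.

```python
from typing import List


def split_into_blocks(file_size_mb: int, block_size_mb: int = 128) -> List[int]:
    """Return the size in MB of each block a file of the given size occupies."""
    if file_size_mb <= 0:
        return []
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    blocks = [block_size_mb] * full_blocks
    if remainder:
        # The last block only takes as much space as the data it holds.
        blocks.append(remainder)
    return blocks


if __name__ == "__main__":
    print(len(split_into_blocks(1024)))  # 8
    print(split_into_blocks(100))        # [100]
```

The second call shows the small-file case: one block is allocated, but it uses 100 MB rather than the full 128 MB.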

HDFS Daemon Services
• NameNode
• Secondary NameNode
• DataNode

Like GFS, HDFS follows a master/slave architecture: the NameNode is the master and the DataNodes are the slaves.
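
The division of labour between the daemons can be illustrated with a toy in-memory model in Python. It assumes the usual split, where the master keeps only metadata (which blocks make up a file and where they live) and the data nodes keep the block contents. The class and method names are hypothetical and do not mirror Hadoop's real classes.

```python
from typing import Dict, List, Tuple


class DataNode:
    """Slave: stores the actual block contents."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.blocks: Dict[str, bytes] = {}

    def store(self, block_id: str, data: bytes) -> None:
        self.blocks[block_id] = data

    def fetch(self, block_id: str) -> bytes:
        return self.blocks[block_id]


class NameNode:
    """Master: stores metadata only, never file contents."""

    def __init__(self) -> None:
        self.file_blocks: Dict[str, List[str]] = {}      # path -> block ids
        self.block_nodes: Dict[str, List[str]] = {}      # block id -> node names

    def register(self, path: str, block_id: str, node_names: List[str]) -> None:
        self.file_blocks.setdefault(path, []).append(block_id)
        self.block_nodes[block_id] = list(node_names)

    def lookup(self, path: str) -> List[Tuple[str, List[str]]]:
        return [(b, self.block_nodes[b]) for b in self.file_blocks[path]]


def read_file(path: str, name_node: NameNode, data_nodes: Dict[str, DataNode]) -> bytes:
    """Client read: ask the master where the blocks are, then fetch from slaves."""
    data = b""
    for block_id, node_names in name_node.lookup(path):
        data += data_nodes[node_names[0]].fetch(block_id)
    return data


if __name__ == "__main__":
    nodes = {name: DataNode(name) for name in ("D1", "D2")}
    master = NameNode()
    for block_id, chunk in (("blk_0", b"hello "), ("blk_1", b"hadoop")):
        for node in nodes.values():
            node.store(block_id, chunk)
        master.register("/demo.txt", block_id, list(nodes))
    print(read_file("/demo.txt", master, nodes))  # b'hello hadoop'
```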

HDFS Write
• Block size: 128 MB
• RF (replication factor) = 3
• Data nodes: D1, D2, D3, D4
• File 1 is stored on D1, D2, D4
• File 2 is stored on D1, D2, D3
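
A simplified Python sketch of how replicas might be spread over the data nodes with RF = 3. It uses plain round-robin placement purely for illustration, so the nodes it picks differ from the slide's example. HDFS itself uses a rack-aware placement policy that is not modelled here. Function names are hypothetical.

```python
from typing import Dict, List


def place_replicas(block_ids: List[str],
                   data_nodes: List[str],
                   replication_factor: int = 3) -> Dict[str, List[str]]:
    """Assign each block to `replication_factor` distinct data nodes (round-robin)."""
    if replication_factor > len(data_nodes):
        raise ValueError("not enough data nodes for the requested replication factor")
    placement: Dict[str, List[str]] = {}
    for i, block_id in enumerate(block_ids):
        placement[block_id] = [
            data_nodes[(i + k) % len(data_nodes)] for k in range(replication_factor)
        ]
    return placement


def surviving_replicas(placement: Dict[str, List[str]],
                       failed_node: str) -> Dict[str, List[str]]:
    """Show which replicas remain readable after one data node fails."""
    return {
        block_id: [node for node in nodes if node != failed_node]
        for block_id, nodes in placement.items()
    }


if __name__ == "__main__":
    placement = place_replicas(["file1-blk0", "file2-blk0"], ["D1", "D2", "D3", "D4"])
    print(placement)
    # {'file1-blk0': ['D1', 'D2', 'D3'], 'file2-blk0': ['D2', 'D3', 'D4']}
    print(surviving_replicas(placement, "D2"))
    # {'file1-blk0': ['D1', 'D3'], 'file2-blk0': ['D3', 'D4']}
```

Because each block lives on three different nodes, losing any single node still leaves at least two readable copies.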

Copying Data from One Cluster to Another Cluster

Source: UAT cluster. Destination: Prod cluster.

Parallel copying using distcp:

hadoop distcp hdfs://uat:54311/user/rajkrrsingh/input hdfs://prod:54311/user/rajkrrsingh/input
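
If the copy needs to be scripted, the command can be assembled in Python as sketched below. The helper name is hypothetical, and the optional `-m` argument (distcp's cap on the number of simultaneous map tasks) is an addition not shown on the slide. The script only prints the command; running it requires a machine with the Hadoop client installed.

```python
import shlex
from typing import List, Optional


def build_distcp_command(source: str,
                         destination: str,
                         max_maps: Optional[int] = None) -> List[str]:
    """Build the argument list for a `hadoop distcp` invocation."""
    command = ["hadoop", "distcp"]
    if max_maps is not None:
        # -m limits how many map tasks copy data in parallel.
        command += ["-m", str(max_maps)]
    command += [source, destination]
    return command


if __name__ == "__main__":
    cmd = build_distcp_command(
        "hdfs://uat:54311/user/rajkrrsingh/input",
        "hdfs://prod:54311/user/rajkrrsingh/input",
    )
    print(shlex.join(cmd))
    # To run the copy for real (assumes a configured Hadoop client):
    # import subprocess; subprocess.run(cmd, check=True)
```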

Additional slides (diagram only):
• The Problem, e.g. Stock Market
• Typical Hadoop Infrastructure
• Processing Framework (MapReduce)
• Hadoop Ecosystem
• HDFS File System Commands
• HDFS Federation
• High Availability
