Krishnendu P
CONTENTS:
● Data and Big Data
● Problems with Big Data
● Hadoop
● Small History of Hadoop
● What problems can Hadoop solve?
● Components of Hadoop - HDFS, MapReduce
● Hadoop Cluster
● High Level Architecture of Hadoop
● Hadoop Core Components
● Features of Hadoop
● Limitations of Hadoop
● Users of Hadoop
● Conclusion
● References
Data:
➔ Any real-world symbol (a character, a number, a special
character) or group of symbols is data.
➔ It may be visual, audio, textual, etc.
Big Data
Big data is a collection of datasets so large that
they cannot be processed using conventional
database management tools or traditional
computing techniques.
Big Data
Big data is characterized by huge volume, high velocity,
and wide variety. The data falls into three types:
Structured data: relational data.
Semi-structured data: XML data.
Unstructured data: Word, PDF, and text files.
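The three types can be told apart by how much structure a program can rely on when reading them. The sketch below is only an illustration using Python's standard library; the sample records and names are made up and are not from the slides.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: fixed columns, as in a relational table (sample rows are made up).
structured = io.StringIO("id,name\n1,Asha\n2,Ravi\n")
rows = list(csv.DictReader(structured))

# Semi-structured: self-describing tags, but no fixed schema.
semi_structured = "<user id='1'><name>Asha</name><tag>admin</tag></user>"
user = ET.fromstring(semi_structured)

# Unstructured: free text with no fields at all; the most we can do is tokenize it.
unstructured = "Asha uploaded forty photos today."
words = unstructured.split()

print(rows[0]["name"], user.find("name").text, len(words))
```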
Problems with Big Data:
➔ About 0.5 petabytes of updates, including 40 million
photos, are made to Facebook every day.
➔ Every day, YouTube receives enough new video to play
continuously for one year.
➔ Large data sets impose limitations in many areas,
including genomics, complex physics simulations,
and biological and environmental research.
Cont...
➔ They also affect Internet search, finance, and
business informatics.
➔ The challenges include capture, retrieval,
storage, search, sharing, analysis, and
visualization.
What could be the solution for
Big Data?
Hadoop
What is Hadoop?
➔ Hadoop is an open-source, Java-based
programming framework developed by Doug
Cutting and Mike Cafarella in 2005.
➔ It is part of the Apache project sponsored by the
Apache Software Foundation.
Cont...
➔ It is designed to scale up from single servers to
thousands of machines, each offering local computation
and storage.
➔ It is used for distributed storage and distributed
processing of very large data sets on computer
clusters built from commodity hardware.
Small History
➔ Hadoop was inspired by Google's MapReduce, a
software framework in which an application is
broken down into numerous small parts.
➔ Any of these parts (also called fragments or blocks)
can be run on any node in the cluster.
➔ Doug Cutting, Hadoop's creator, named the
framework after his child's stuffed toy elephant.
Small History
➔ Started with building a web search engine
- Nutch in 2002.
- The aim was to index billions of pages.
- The architecture could not support billions of pages.
➔ Google's GFS in 2003 solved the storage problem.
- Nutch Distributed File System in 2004.
➔ Google's MapReduce in 2004.
- MapReduce implemented in 2005.
[Photos: Doug Cutting with Hadoop; Mike Cafarella]
2005: Doug Cutting and Mike Cafarella developed Hadoop
to support distribution for the Nutch search engine project.
The project was funded by Yahoo.
2006: Yahoo gave the project to the Apache
Software Foundation.
Apache Hadoop is now a registered trademark of the
Apache Software Foundation.
What problems can Hadoop solve?
The Hadoop platform was designed to solve problems
where you have a lot of data (perhaps a mixture of
complex and structured data) that doesn't fit well
into tables.
Components of Hadoop
Hadoop consists of MapReduce, the Hadoop
Distributed File System (HDFS), and a number of
related projects such as Apache Hive, HBase, and
ZooKeeper.
[Diagram: Hadoop = HDFS + MapReduce]
HDFS (Hadoop Distributed File System)
➔ The Hadoop Distributed File System (HDFS) is a
distributed file system designed to run on
commodity hardware.
➔ It is a sub-project of the Apache Hadoop project.
➔ HDFS is highly fault-tolerant and is designed to
be deployed on low-cost hardware.
Cont...
➔ HDFS provides high-throughput access to
application data and is suitable for applications
that have large data sets.
➔ HDFS takes care of storing and managing the
data within the Hadoop cluster.
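HDFS stores a large file by splitting it into fixed-size blocks and keeping several copies of each block on different machines. The following is a minimal sketch of that idea in plain Python, not the HDFS API. The 128 MB block size and three replicas mirror common HDFS defaults but are assumptions here, the node names are hypothetical, and the round-robin placement is a simplification of HDFS's real rack-aware policy.

```python
def split_into_blocks(file_size, block_size):
    """Return the sizes of the blocks needed to store a file of file_size bytes."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin (simplified)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + i) % len(nodes)] for i in range(replication)]
        for block in range(num_blocks)
    }


MB = 1024 * 1024
blocks = split_into_blocks(300 * MB, 128 * MB)   # two full blocks plus a 44 MB tail
nodes = ["node1", "node2", "node3", "node4"]     # hypothetical data nodes
placement = place_replicas(len(blocks), nodes)

for block, holders in placement.items():
    print(block, blocks[block] // MB, "MB ->", holders)
```

Because every block lives on more than one node, losing a single machine does not lose any data.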
MapReduce
➔ MapReduce is a programming model, with an
associated implementation, for processing and
generating large data sets.
➔ Programs written in this functional style are
automatically parallelized and executed on a large
cluster of commodity machines.
MapReduce
A MapReduce program executes in two stages: the
map stage and the reduce stage.
Map stage:
The mapper's job is to process the input data.
Generally the input data is a file or directory
stored in the Hadoop Distributed File System (HDFS).
The input file is passed to the mapper function
line by line. The mapper processes the data and
creates several small chunks of intermediate data.
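The map stage can be sketched with the classic word-count task. This is a plain-Python illustration, not Hadoop's Java Mapper API. The function name `mapper` and the two sample lines are assumptions made for the example, and the list of lines stands in for a file read from HDFS.

```python
def mapper(line):
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield (word, 1)


# Stand-in for an input file stored in HDFS, passed to the mapper line by line.
lines = ["Hadoop stores data", "Hadoop processes data"]
pairs = [pair for line in lines for pair in mapper(line)]

print(pairs)
```

Each line is handled independently of the others, which is what lets the framework run many mappers at once on different parts of the input.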
MapReduce
Reduce stage:
The reducer's job is to process the data that
comes from the mapper. After processing, it
produces a new set of output, which is stored in
HDFS.
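A matching sketch of the reduce stage for word count follows. Between the two stages the framework groups the mapper output by key, often called the shuffle. The `shuffle` and `reducer` functions below are illustrative plain Python, not Hadoop's API, and the `mapped` list is made-up mapper output.

```python
from collections import defaultdict


def shuffle(pairs):
    """Group mapper output by key, as the framework does between the two stages."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Combine all values seen for one key into a single output record."""
    return (key, sum(values))


# Made-up mapper output for a word-count job.
mapped = [("hadoop", 1), ("stores", 1), ("data", 1),
          ("hadoop", 1), ("processes", 1), ("data", 1)]
counts = dict(reducer(key, values) for key, values in sorted(shuffle(mapped).items()))

print(counts)
```

In a real job the reducer output would be written back to HDFS rather than printed.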
Hadoop Core Components
                MASTER NODE     SLAVE NODE
Storage node    Name node       Data node
Compute node    Job tracker     Task tracker
Cont...
Node:
A machine or computer that is part of a
cluster.
Daemon:
A background process running on a
Linux machine.
Cont...
➔ The master node is responsible for running the
Name node and Job tracker daemons.
➔ The slave nodes are responsible for running the
Data node and Task tracker daemons.
Cont...
➔ The Name node and Data node are responsible
for storing and managing the data, and they
are commonly referred to as storage nodes.
➔ The Job tracker and Task tracker are
responsible for processing and computing the
data, and they are commonly referred to as
compute nodes.
Cont...
➔ Usually the Name node and Job tracker are
configured on a single machine.
➔ The Data node and Task tracker are
configured on multiple machines, with
instances running on more than one
machine at the same time.
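The layout described above can be written down as a small data model. This is only a sketch: the class `Machine`, the function `build_cluster`, and the machine names are hypothetical, and the daemon names follow the spelling used on these slides.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    name: str
    daemons: list = field(default_factory=list)


def build_cluster(num_slaves):
    """One master running the Name node and Job tracker, plus num_slaves slaves."""
    master = Machine("master", ["Name node", "Job tracker"])
    slaves = [
        Machine(f"slave{i}", ["Data node", "Task tracker"])
        for i in range(1, num_slaves + 1)
    ]
    return [master] + slaves


cluster = build_cluster(3)
data_nodes = [m.name for m in cluster if "Data node" in m.daemons]

for machine in cluster:
    print(machine.name, machine.daemons)
```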
Hadoop Cluster
➔ A cluster is any set of loosely or tightly
connected computers that work together as a single
system.
➔ In simple words, a computer cluster used for Hadoop
is called a Hadoop cluster.
Hadoop Cluster
A Hadoop cluster is a special type of computational
cluster designed for storing and analyzing vast
amounts of unstructured data in a distributed
computing environment. These clusters run on
low-cost commodity computers.
Hadoop Cluster
➔ Hadoop clusters are often referred to as "shared
nothing" systems because the only thing that is
shared between nodes is the network that connects
them.
➔ Clustering improves the system's availability to
users.
Hadoop Cluster
A real-world example:
Yahoo runs Hadoop on more than 10,000 machines
and stores nearly 1 petabyte of user data.
[Picture: Yahoo's Hadoop cluster]
Features of Hadoop
● Scalability:
Scalability refers to the ability to add or
remove nodes without bringing down or
affecting cluster operation.
Features of Hadoop
● Cost effective:
Hadoop does not require expensive,
specialized hardware. It can be implemented
on simple, inexpensive machines, known as
commodity hardware.
Features of Hadoop
● Large cluster of nodes:
A Hadoop cluster can be made up of
hundreds or thousands of nodes. One of the
main advantages of a large cluster is that it
offers clients more computing power and a
huge storage system.
Features of Hadoop
● Parallel processing of data:
The data can be processed
simultaneously across all the nodes
within the cluster, saving a lot
of time.
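The idea of splitting data into chunks, processing the chunks at the same time, and combining the partial results can be sketched on a single machine. This is only an analogy: the worker threads below stand in for cluster nodes, and the chunk size and the `process_chunk` function are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def process_chunk(chunk):
    """Work done independently on one chunk (here, just a sum)."""
    return sum(chunk)


data = list(range(1, 101))
# Split the data into four chunks of 25 items, one per worker.
chunks = [data[i:i + 25] for i in range(0, len(data), 25)]

# Each worker thread plays the role of one node in the cluster.
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(process_chunk, chunks))

total = sum(partials)
print(partials, total)
```

On a real cluster the chunks would live on different machines, so the work is moved to where the data already is.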
Features of Hadoop
● Automatic failover management:
If any node within the cluster fails,
the Hadoop framework replaces that
machine with another machine.
Features of Hadoop
● Flexible:
Hadoop is schema-less and can
absorb any type of data, structured or not,
from any number of sources.
● Fault-tolerant:
When you lose a node, the system
redirects work to another location of the
data and continues processing without
missing a beat.
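Redirecting work to "another location of the data" relies on each piece of data being stored on more than one node. The sketch below illustrates that with a made-up replica table. The function `pick_node` and the node names are hypothetical, and this is not how Hadoop's scheduler is actually written.

```python
def pick_node(block, placement, live_nodes):
    """Return the first live node that holds a replica of `block`."""
    for node in placement[block]:
        if node in live_nodes:
            return node
    raise RuntimeError(f"no live replica for block {block}")


# Made-up replica table: block id -> nodes holding a copy of that block.
placement = {0: ["node1", "node2", "node3"], 1: ["node2", "node3", "node4"]}
live = {"node1", "node2", "node3", "node4"}

before = pick_node(0, placement, live)   # node1 is alive, so it gets the work
live.discard("node1")                    # simulate losing node1
after = pick_node(0, placement, live)    # work is redirected to another replica

print(before, after)
```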
Limitations of Hadoop
● Security concerns
● Vulnerable by nature
● Not a good fit for small data
● Potential stability issues
What is Hadoop used for?
● Search
– Yahoo, Amazon, Zvents
● Log processing
– Facebook, Yahoo, ContextWeb, Joost, Last.fm
● Recommendation systems
– Facebook
● Data warehouse
– Facebook, AOL (America Online)
● Video and image analysis
– New York Times, Eyealike
Conclusion
➔ Hadoop has been very effective for companies
dealing with petabytes of data.
➔ It has solved many industry problems
related to managing huge data sets and
distributed systems.
➔ Because it is open source, it has been widely
adopted by companies.
References
● www.dezyre.com/Big-Data-and-Hadoop
● www.cloudera.com/content/www/...hadoop/hdfs-mapreduce-yarn.html
● www.ufaber.com/hadoop/bigbata/free
● www.psgtech.edu/yrgcc/attach/haoop_architecture.ppt