GANDHI INSTITUTE FOR TECHNOLOGICAL
ADVANCEMENT, BHUBANESWAR
TECHNICAL SEMINAR ON
HADOOP
GUIDED BY-
PROF. KUNDAN CHANDRA PATRA
PROF. SWOGAT KUMAR JENA
PROF. SAROJ KUMAR MOHANTY
PRESENTED BY-
NAME- ABHIJEET RAJ
BRANCH- CSE(1)
REG NO.- 1301287529
CONTENTS -
1. INTRODUCTION TO HADOOP
2. HADOOP-HISTORY AND ORIGIN
3. BIG DATA ANALYTICS AND CHALLENGES
4. HADOOP ECOSYSTEM
5. HDFS ARCHITECTURE
6. HADOOP VS RDBMS
7. MAP REDUCE
8. PIG AND HIVE
9. CONCLUSION
Abhijeet raj,131001
INTRODUCTION-
• What is Hadoop-
• Apache Hadoop is an open-source software
framework for distributed storage and
processing of large data sets
• Written in Java
• Its design is based on Google's published
Google File System (GFS) and MapReduce papers
Continued...
• It is designed to scale up from single servers to
thousands of machines, each offering local
computation and storage.
• The Hadoop framework consists of two main layers
• HDFS (storage)
• MapReduce (processing)
History and Origin
• Doug Cutting was trying to build an open-source
search engine (Nutch) in 2003
• Google published papers describing its
distributed systems, the Google File System (GFS, 2003)
and MapReduce (2004), which powered the Google
search engine
Continued...
• Doug Cutting took these ideas and started to
work on an open-source implementation
• In 2006 he joined Yahoo!, and the distributed
system was named Hadoop
• Yahoo! open-sourced it through the Apache
Software Foundation
Organizations using Hadoop
• Amazon
• Adobe
• Cloudspace
• eBay
• Facebook
• Google
• IBM
• LinkedIn
• Yahoo!
Big data analytics and
challenges
• There is no formal size threshold for Big Data; it is
commonly taken to start at around 1 terabyte.
• The 4 V's of Big Data:
1. VOLUME- The scale of data
2. VARIETY- Different forms of data
3. VELOCITY- Analysis of streaming data
4. VERACITY- Uncertainty of data
Challenges for Big Data
processing
• Meeting the need for speed
• Scale
• Continuous Availability
• Displaying meaningful results
• Workload diversity
• Data security
• Cost
• Manageability
Hadoop vs traditional RDBMS
Factors                 Hadoop                           RDBMS
Size of data            Petabytes                        Gigabytes
Integrity of data       Low                              High
Data schema             Dynamic                          Static
Access method           Batch                            Interactive and batch
Scaling                 Linear                           Non-linear
Data structure          Unstructured/structured          Structured
Normalization of data   Not required                     Required
Query response time     Has latency (due to batch        Can be near immediate
                        processing)
Hadoop Ecosystem
HDFS (Hadoop Distributed File System)
• A distributed file system designed to run on
commodity hardware
• It is suitable for distributed storage and
processing.
• The built-in servers of the namenode and
datanodes help users to easily check the
status of the cluster.
• HDFS provides file permissions and
authentication.
Continued...
Namenode
• The namenode is the node that stores the filesystem
metadata, i.e. which file maps to which blocks
and which blocks are stored on which
datanode.
Datanode
• The datanode is where the actual data resides.
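The split between namenode metadata and datanode storage can be sketched as a toy model in Python. This is only an illustration, not Hadoop's implementation: the class names `ToyNameNode` and `ToyDataNode`, the tiny block size and the round-robin placement are all assumptions made for the example.

```python
# Toy model of HDFS metadata vs. data (illustrative only, not Hadoop code).
BLOCK_SIZE = 4  # bytes; assumed tiny value so the example is easy to follow


class ToyDataNode:
    """Holds the actual block contents, keyed by block id."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}


class ToyNameNode:
    """Holds only metadata: file -> block ids, block id -> datanode name."""

    def __init__(self, datanodes):
        self.datanodes = datanodes
        self.file_to_blocks = {}
        self.block_to_node = {}

    def write(self, path, data):
        block_ids = []
        for i in range(0, len(data), BLOCK_SIZE):
            block_id = f"{path}#blk{i // BLOCK_SIZE}"
            # Assumed placement policy: simple round-robin over datanodes.
            node = self.datanodes[len(self.block_to_node) % len(self.datanodes)]
            node.blocks[block_id] = data[i:i + BLOCK_SIZE]
            self.block_to_node[block_id] = node.name
            block_ids.append(block_id)
        self.file_to_blocks[path] = block_ids

    def read(self, path):
        by_name = {n.name: n for n in self.datanodes}
        # Look up block locations in metadata, then fetch from the datanodes.
        return b"".join(
            by_name[self.block_to_node[b]].blocks[b]
            for b in self.file_to_blocks[path]
        )


nodes = [ToyDataNode("dn1"), ToyDataNode("dn2")]
namenode = ToyNameNode(nodes)
namenode.write("/logs/a.txt", b"hello hadoop")
restored = namenode.read("/logs/a.txt")
```

Note that the namenode object never stores file bytes itself; it only records where each block lives, which is the point the slide makes.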
Continued...
Job tracker
• The primary functions of the job tracker are resource
management, tracking resource availability and
task life cycle management
Task tracker
• Follows the orders of the job tracker and
updates the job tracker with its progress status
periodically.
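The job tracker and task tracker roles described above can be sketched as a small Python model. It is a simplification under stated assumptions: the class names, the slot counts and the single scheduling pass are hypothetical and do not mirror Hadoop's actual classes or scheduler.

```python
# Toy model of the classic job tracker / task tracker split.
# Names and slot counts are assumptions for illustration, not Hadoop APIs.


class ToyTaskTracker:
    def __init__(self, name, slots):
        self.name = name
        self.slots = slots          # how many tasks this node can run at once
        self.running = []

    def heartbeat(self):
        """Periodic status report sent to the job tracker."""
        return {"tracker": self.name,
                "free_slots": self.slots - len(self.running),
                "running": list(self.running)}


class ToyJobTracker:
    def __init__(self, trackers):
        self.trackers = trackers
        self.pending = []
        self.assignments = {}       # task -> tracker name

    def submit(self, tasks):
        self.pending.extend(tasks)

    def schedule(self):
        """Hand pending tasks to trackers that report free slots."""
        for tracker in self.trackers:
            status = tracker.heartbeat()
            for _ in range(status["free_slots"]):
                if not self.pending:
                    return
                task = self.pending.pop(0)
                tracker.running.append(task)
                self.assignments[task] = tracker.name


trackers = [ToyTaskTracker("tt1", slots=2), ToyTaskTracker("tt2", slots=1)]
job_tracker = ToyJobTracker(trackers)
job_tracker.submit(["map-0", "map-1", "map-2", "reduce-0"])
job_tracker.schedule()
```

With three slots in total, one task stays pending until a tracker reports a free slot in a later heartbeat.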
Goals of HDFS
• Fault detection and recovery
• Huge datasets
• Reduced network traffic
• Increased throughput
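Fault detection and recovery can be illustrated with a short Python sketch: nodes that stop sending heartbeats are treated as dead, and blocks that lost a replica are copied to a live node. The timeout, the replication factor and the data layout below are assumed values chosen for the example, not HDFS defaults.

```python
# Toy sketch of heartbeat-based fault detection and re-replication.
# The timeout, replication factor and data layout are assumed values.
HEARTBEAT_TIMEOUT = 30   # seconds without a heartbeat before a node is "dead"
REPLICATION = 2          # desired number of copies of every block


def dead_nodes(last_heartbeat, now):
    """Return the datanodes whose last heartbeat is older than the timeout."""
    return {n for n, t in last_heartbeat.items() if now - t > HEARTBEAT_TIMEOUT}


def re_replicate(block_locations, dead, live):
    """Drop replicas on dead nodes and copy blocks to live nodes as needed."""
    repaired = {}
    for block, nodes in block_locations.items():
        survivors = [n for n in nodes if n not in dead]
        candidates = [n for n in sorted(live) if n not in survivors]
        # A block with no surviving replica cannot be copied, so it is skipped.
        while survivors and len(survivors) < REPLICATION and candidates:
            survivors.append(candidates.pop(0))
        repaired[block] = survivors
    return repaired


last_heartbeat = {"dn1": 100, "dn2": 95, "dn3": 40}
block_locations = {
    "blk0": ["dn1", "dn3"],
    "blk1": ["dn2", "dn3"],
    "blk2": ["dn1", "dn2"],
}

dead = dead_nodes(last_heartbeat, now=100)
live = set(last_heartbeat) - dead
repaired = re_replicate(block_locations, dead, live)
```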
Map Reduce
• MapReduce is a processing technique and a
programming model for distributed computing,
implemented in Java in Hadoop
• Map: input data are broken into tuples
(key/value pairs)
• Reduce: combines those tuples into a smaller
set of tuples
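The map and reduce steps can be shown with the usual word-count example. The sketch below runs in a single Python process and only mimics the model; the function names `map_phase`, `shuffle` and `reduce_phase` are made up for this illustration and are not part of Hadoop's Java API.

```python
from collections import defaultdict


def map_phase(line):
    """Map: break a line of input into (word, 1) key/value pairs."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a smaller result."""
    return key, sum(values)


lines = ["Hadoop stores data", "Hadoop processes data"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

On a real cluster the map calls would run in parallel on the nodes that hold the input blocks, and the shuffle would move data between nodes; here everything happens in memory.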
Advantages of Map Reduce
• Easy to scale data processing over multiple
computing nodes.
• Parallel processing.
• Fast.
• Simple model of programming
HBASE
• Developed by the Apache Software Foundation
• Database for Hadoop.
• Open source
• Non-relational
Continued...
• Distributed
• Written in Java
• Native access is through the Java client API;
JDBC connectivity is available through a SQL
layer such as Apache Phoenix
YARN
• Yet Another Resource Negotiator
• In YARN, the responsibilities of the job tracker
are split between separate daemons: a global
Resource Manager and a per-application
Application Master, with a Node Manager
running on each worker node
YARN ARCHITECTURE
PIG
• A platform for analyzing large data sets that
consists of a high-level language (Pig Latin)
for expressing data analysis programs
• The structure of Pig programs is amenable to
substantial parallelization
Continued...
• Ease of programming
• Optimization opportunities
• Extensibility
HIVE
• Data warehouse software that facilitates querying
and managing large datasets
• Allows traditional map/reduce programmers
to plug in their custom mappers and
reducers
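One way Hive lets programmers plug in a custom mapper is its TRANSFORM clause, which streams rows through an external script as tab-separated lines. The Python sketch below shows what the body of such a script might look like. The (user, url) column layout and the function name `transform` are assumptions for this example; in a real script the rows would be read from standard input and written to standard output.

```python
def transform(lines):
    """Turn tab-separated (user, url) rows into (user, domain) rows."""
    for line in lines:
        user, url = line.rstrip("\n").split("\t")
        # Assumed URL shapes: "scheme://host/path" or "host/path".
        domain = url.split("/")[2] if "://" in url else url.split("/")[0]
        yield f"{user}\t{domain}"


# In-memory stand-in for the rows a streaming script would receive.
rows = ["alice\thttp://example.com/a\n", "bob\texample.org/x\n"]
output = list(transform(rows))
```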
PIG VS HIVE
FACTOR               PIG                       HIVE
TYPE OF FLOW         PROCEDURAL LANGUAGE       DECLARATIVE LANGUAGE
EASE OF USE          COMPLEX                   EASY
NATURE OF USAGE      EFFICIENCY IN COMPUTING   ANALYTICS AREA
TYPE OF DATA         VARIABLES                 TABLES
DEBUGGING FACILITY   DEBUGGED LOCALLY          COMPLEX
MAINTENANCE          MORE                      LESS
DEVELOPMENT TIME     MORE                      LESS
HANDLING BIG DATA    HANDLES MORE DATA         MEMORY OVERFLOW
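The procedural versus declarative distinction in the first row can be illustrated by analogy in Python. This is neither Pig Latin nor HiveQL: the procedural half uses plain Python steps, the declarative half uses SQL through the standard `sqlite3` module, and the sample rows are invented for the example.

```python
import sqlite3

rows = [("alice", 30), ("bob", 17), ("carol", 45)]

# Procedural style (in the spirit of a Pig script): each step names an
# intermediate result and says how to compute it.
adults = [r for r in rows if r[1] >= 18]
names_procedural = sorted(name for name, _ in adults)

# Declarative style (in the spirit of a Hive query): state what is wanted
# and let the engine decide how to compute it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO people VALUES (?, ?)", rows)
names_declarative = [
    name for (name,) in conn.execute(
        "SELECT name FROM people WHERE age >= 18 ORDER BY name"
    )
]
conn.close()
```

Both halves produce the same list of names; the difference is whether the author spells out the intermediate steps or only the desired result.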
Conclusion
• Hadoop has been a very effective solution for
companies dealing with data in petabytes,
i.e. big data.
• It has overcome the limitations of traditional
data storage.
• Being open source, it is widely accepted
