What is Hadoop: An Introduction to Hadoop Components and Use Cases
HTTP://WWW.ASTERIXSOLUTION.COM/BIG-DATA-HADOOP-TRAINING-IN-MUMBAI.HTML
 In the past, handling data was a fairly easy task because professionals had only limited amounts of it to work with. A single processor and a single storage unit were enough to handle it.
 Data was structured and kept in a database that held all of the relevant data.
 SQL queries made it possible to search through large tables with many rows and columns.
 As the years went by and data generation increased, data arrived in higher volumes and in more formats. Multiple processors were therefore needed to process it in a reasonable amount of time.
 However, the single storage unit then became a bottleneck because of the network overhead it generated. This led to giving each processor its own storage unit, which made data access easier.
 This method is known as parallel processing with distributed storage: many computers run the processing, each on its own storage.
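The idea can be illustrated with a small local simulation. In this sketch, which is an assumption for illustration rather than Hadoop code, a dictionary stands in for three machines, each holding its own partition of made-up numbers, and threads from Python's `concurrent.futures` stand in for the separate computers. Each worker touches only its own partition, and the partial results are combined at the end.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical cluster: each "node" owns its own partition (its local storage).
node_storage = {
    "node-1": [4, 8, 15],
    "node-2": [16, 23],
    "node-3": [42, 7, 1],
}


def process_local_partition(records):
    """Work done on one node, using only the data stored on that node."""
    return sum(records)


def parallel_total(storage):
    """Run the per-node work concurrently, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=len(storage)) as pool:
        partial_sums = list(pool.map(process_local_partition, storage.values()))
    return sum(partial_sums)


print(parallel_total(node_storage))  # 116
```

Threads keep the example self-contained; on a real cluster each partition would be processed by a separate machine.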
 Big Data and its Challenges
 Big data refers to data so massive that it cannot be stored, processed and analyzed using traditional methods.
 The main elements of big data are:
 Volume - The massive amount of data generated every second
 Velocity - The speed at which data is generated, collected and analyzed
 Variety - The different types of data: structured, semi-structured and unstructured
 Value - The ability to turn data into useful insights for your business
 Veracity - The trustworthiness of data in terms of quality and accuracy
 Hadoop and its Components
 Hadoop is a framework that uses distributed storage and parallel processing to store and manage big data. It is one of the most widely used software frameworks for handling big data. It has two core components:
 Hadoop HDFS - Hadoop Distributed File System (HDFS) is the storage unit of
Hadoop.
 Hadoop MapReduce - Hadoop MapReduce is the processing unit of Hadoop.
 Hadoop HDFS
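 Data is stored in a distributed manner in HDFS. There are two kinds of HDFS nodes: the name node and the data nodes. While there is only one active name node, there can be multiple data nodes. The name node holds the metadata (which blocks make up each file and where they are stored), while the data nodes store the actual data blocks.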
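A minimal sketch of this split between metadata and data, assuming the HDFS default block size of 128 MB (Hadoop 2 and later). The `NameNode` class, its `add_file` method and the node names are hypothetical; the round-robin placement is a simplification of HDFS's real placement policy, and replication is ignored here.

```python
import math

BLOCK_SIZE_MB = 128  # HDFS default block size in Hadoop 2 and later


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the size of each block a file is split into (the last one may be smaller)."""
    n_blocks = math.ceil(file_size_mb / block_size_mb)
    sizes = [block_size_mb] * n_blocks
    if n_blocks:
        sizes[-1] = file_size_mb - block_size_mb * (n_blocks - 1)
    return sizes


class NameNode:
    """Hypothetical name node: stores only metadata, never the data itself."""

    def __init__(self, data_nodes):
        self.data_nodes = list(data_nodes)
        self.metadata = {}  # file name -> list of (block id, data node, size in MB)
        self._next_node = 0

    def add_file(self, name, size_mb):
        blocks = []
        for i, size in enumerate(split_into_blocks(size_mb)):
            # Simplified round-robin choice of the data node that stores this block.
            node = self.data_nodes[self._next_node % len(self.data_nodes)]
            self._next_node += 1
            blocks.append((f"{name}#blk{i}", node, size))
        self.metadata[name] = blocks
        return blocks


nn = NameNode(["dn1", "dn2", "dn3"])
print(nn.add_file("logs.txt", 300))
# [('logs.txt#blk0', 'dn1', 128), ('logs.txt#blk1', 'dn2', 128), ('logs.txt#blk2', 'dn3', 44)]
```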
 HDFS is designed for storing huge datasets on commodity hardware. An enterprise-grade server costs roughly $10,000 per terabyte, including its processor. If you needed to buy 100 of these enterprise servers, the cost would reach about a million dollars.
 Hadoop instead lets you use inexpensive commodity machines as your data nodes.
 Features of HDFS
 Provides distributed storage
 Can be implemented on commodity hardware
 Provides data security
 Highly fault tolerant - Each data block is replicated on several machines (three by default), so if one machine goes down, the data is still available from the other copies
 Master and slave nodes
 Master and slave nodes form the HDFS cluster. The name node is called
the master and the data nodes are called the slaves.
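The fault tolerance described above comes from replication. The sketch below, under simplified assumptions, places each block on three distinct nodes (the HDFS default replication factor) and, when a node fails, copies the affected blocks onto other live nodes until each has three replicas again. The function names are hypothetical; real HDFS also takes racks into account and detects failures through missed heartbeats from the data nodes.

```python
REPLICATION_FACTOR = 3  # HDFS default replication factor


def place_replicas(block_ids, nodes, factor=REPLICATION_FACTOR):
    """Put each block on `factor` distinct nodes, rotating the starting node."""
    if factor > len(nodes):
        raise ValueError("not enough nodes for the requested replication factor")
    return {block: {nodes[(i + k) % len(nodes)] for k in range(factor)}
            for i, block in enumerate(block_ids)}


def handle_node_failure(placement, failed_node, live_nodes, factor=REPLICATION_FACTOR):
    """Drop the failed node and re-replicate under-replicated blocks onto live nodes."""
    for block, holders in placement.items():
        holders.discard(failed_node)
        if not holders:
            raise RuntimeError(f"all replicas of {block} were lost")
        # Copy from a surviving replica to nodes that do not hold the block yet.
        for candidate in live_nodes:
            if len(holders) >= factor:
                break
            if candidate != failed_node and candidate not in holders:
                holders.add(candidate)
    return placement


nodes = ["dn1", "dn2", "dn3", "dn4"]
placement = place_replicas(["blk0", "blk1"], nodes)
handle_node_failure(placement, "dn2", [n for n in nodes if n != "dn2"])
print({b: sorted(h) for b, h in placement.items()})
# {'blk0': ['dn1', 'dn3', 'dn4'], 'blk1': ['dn1', 'dn3', 'dn4']}
```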
 Hadoop MapReduce
 Hadoop MapReduce is the processing unit of Hadoop. In the MapReduce approach, the processing is done on the slave nodes, where the data is stored, and the partial results are combined into the final output.
 Instead of moving the data to the processing code, MapReduce sends the code to the nodes that hold the data. The code is usually very small compared with the data itself.
 You only need to send a few kilobytes of code to perform heavy-duty processing across many computers.
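Sending the code to the data works because the scheduler tries to run each map task on a node that already stores the corresponding block. The following is a hypothetical, simplified version of that choice: it prefers a node holding a replica of the block and falls back to any node with a free task slot, in which case the block has to be read over the network. Hadoop's real schedulers also consider rack locality, which this sketch ignores.

```python
def schedule_map_tasks(block_locations, free_slots):
    """Assign each block's map task, preferring a node that already stores the block.

    block_locations: block id -> list of nodes holding a replica
    free_slots: node -> number of free task slots (updated in place)
    Returns block id -> (chosen node, True if the data is node-local).
    """
    assignment = {}
    for block, holders in block_locations.items():
        local = [n for n in holders if free_slots.get(n, 0) > 0]
        if local:
            node, is_local = local[0], True
        else:
            # No holder has capacity: run elsewhere and read the block over the network.
            remote = [n for n, free in free_slots.items() if free > 0]
            if not remote:
                raise RuntimeError("no free task slots left")
            node, is_local = remote[0], False
        free_slots[node] -= 1
        assignment[block] = (node, is_local)
    return assignment


locations = {"blk0": ["dn1", "dn2"], "blk1": ["dn2"], "blk2": ["dn2"]}
slots = {"dn1": 1, "dn2": 1, "dn3": 1}
print(schedule_map_tasks(locations, slots))
# {'blk0': ('dn1', True), 'blk1': ('dn2', True), 'blk2': ('dn3', False)}
```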
 The input dataset is first split into chunks. In this example, the input has three lines of text: “bus car train”, “ship ship train” and “bus ship car”.
 Each line becomes one chunk (an input split), and the three chunks are processed in parallel.
 In the map phase, every word in a chunk is emitted as a key with the value 1. For the first chunk, this gives (bus, 1), (car, 1) and (train, 1).
 These key-value pairs are then shuffled and sorted so that all pairs with the same key end up together. In the reduce phase, the values for each key are aggregated to produce the final output: bus 2, car 2, ship 3, train 2.
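The word-count walkthrough above can be reproduced in memory in plain Python. This sketch only mimics the three phases on one machine; a real Hadoop job would implement the mapper and reducer as Java classes or run scripts through Hadoop Streaming, and the framework would perform the shuffle across the network.

```python
from collections import defaultdict
from itertools import chain

lines = ["bus car train", "ship ship train", "bus ship car"]


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word, 1) for word in line.split()]


def shuffle_and_sort(mapped_pairs):
    """Group all values by key and order the keys, as the shuffle/sort step does."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(grouped):
    """Aggregate the values for each key; for word count that is a sum."""
    return {key: sum(values) for key, values in grouped.items()}


mapped = [map_phase(line) for line in lines]  # each split could run on a different node
result = reduce_phase(shuffle_and_sort(chain.from_iterable(mapped)))
print(result)  # {'bus': 2, 'car': 2, 'ship': 3, 'train': 2}
```

Each call to `map_phase` depends only on its own line, which is what allows the splits to be processed on different nodes at the same time.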
 Hadoop YARN
 Hadoop YARN stands for Yet Another Resource Negotiator. It is the resource
management unit of Hadoop and is available as a component of Hadoop
version 2.
 Hadoop YARN acts like an operating system for Hadoop. It is not a file system: it runs on top of HDFS and manages the cluster's compute resources.
 It is responsible for managing cluster resources so that no single machine is overloaded.
 It performs job scheduling to make sure that jobs run in the right place.
 Suppose a client machine wants to run a query or some code for data analysis. This job request goes to the resource manager, the master component of YARN, which is responsible for resource allocation and management.
 In the node section, each node has its own node manager. The node managers manage their nodes and monitor resource usage on them. Resources are handed out as containers, each a collection of physical resources such as RAM, CPU or disk. Whenever a job request comes in, an application master is started for the job and requests containers from the resource manager; the node managers then launch those containers and keep reporting their resource usage back to the resource manager.
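To make container allocation concrete, here is a toy resource manager. Everything in it is hypothetical: the `NodeManager` and `ResourceManager` classes, their fields and the rule of choosing the node with the most free memory are assumptions for illustration, and the application master's request is reduced to a single `allocate` call. Real YARN schedulers, such as the Capacity Scheduler and the Fair Scheduler, use queues and far richer policies.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Hypothetical node manager tracking the free resources on one node."""
    name: str
    free_memory_mb: int
    free_vcores: int
    containers: list = field(default_factory=list)

    def launch(self, container_id, memory_mb, vcores):
        # Reserve the resources and record the running container.
        self.free_memory_mb -= memory_mb
        self.free_vcores -= vcores
        self.containers.append(container_id)


class ResourceManager:
    """Grants a container on the node with the most free memory that fits the request."""

    def __init__(self, nodes):
        self.nodes = nodes
        self._next_id = 0

    def allocate(self, memory_mb, vcores):
        fits = [n for n in self.nodes
                if n.free_memory_mb >= memory_mb and n.free_vcores >= vcores]
        if not fits:
            return None  # the request has to wait until resources are freed
        node = max(fits, key=lambda n: n.free_memory_mb)  # avoid overloading one machine
        container_id = f"container_{self._next_id}"
        self._next_id += 1
        node.launch(container_id, memory_mb, vcores)
        return container_id, node.name


rm = ResourceManager([NodeManager("nm1", 4096, 4), NodeManager("nm2", 8192, 8)])
print(rm.allocate(2048, 1))   # ('container_0', 'nm2')
print(rm.allocate(16384, 1))  # None: no node has 16 GB free
```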
 A Use Case of Hadoop
 In this case study, we will discuss how Hadoop can help combat fraud.
 Let us look at the case of Zions Bancorporation. Its main challenge was applying the Zions security team’s approaches to combat the fraudulent activity taking place. The problem was that the team relied on an RDBMS, which was unable to store and analyze huge amounts of data.
 In other words, they were only able to analyze small amounts of data. With a flood of customers coming in, there were many things they could not keep track of, which left them vulnerable to fraud.
 They began to use parallel processing. However, much of the data was unstructured, and analyzing it this way was not possible. Not only did they have a huge amount of data that would not fit into their databases, but much of it was also unstructured.
 Hadoop enabled the Zions team to pull all of that data together and store it in one place. It also became possible to process and analyze the huge amounts of unstructured data they had. Analysis became faster, and in-depth analysis of various data formats became easier through Hadoop. The Zions team could now detect everything from malware, spear phishing and other phishing attempts to account takeovers.
www.asterixsolution.com
www.plus.google.com/+Asterixsolutionlab
www.facebook.com/asterixsolutionlab
