SlideShare a Scribd company logo
Apache Hadoop
By Ashwin Kumar R
What is Bigdata?
Big Data is a collection of data that is huge in volume, yet growing
exponentially with time. It is a data with so large size and complexity
that none of traditional data management tools can store it or
process it efficiently. Big data is also a data but with huge size
Why we need Big data analytics?
Big data analytics helps organizations harness their data and use it
to identify new opportunities. That, in turn, leads to smarter business
moves, more efficient operations, higher profits and happier
customers.
What is hadoop?
Hadoop is an open-source software framework for storing data and running
applications on clusters of commodity hardware. It provides massive storage
for any kind of data, enormous processing power and the ability to handle
virtually limitless concurrent tasks or jobs.
Use of Hadoop ?
Apache Hadoop is an open source framework that is used to efficiently store
and process large datasets ranging in size from gigabytes to petabytes of
data. Instead of using one large computer to store and process the data,
Hadoop allows clustering multiple computers to analyze massive datasets in
parallel more quickly.
What is DFS?
A Distributed File System (DFS) as the name suggests, is a file system
that is distributed on multiple file servers or multiple locations.
It allows programs to access or store isolated files as they do with the
local ones, allowing programmers to access files from any network or
computer.
What is HDFS?
HDFS is designed to reliably store very large files across machines in a large
cluster. It stores each file as a sequence of blocks; all blocks in a file except
the last block are the same size.
The blocks of a file are replicated for fault tolerance. The block size and
replication factor are configurable per file.
Advantages
● Scalable. Hadoop is a highly scalable storage platform, because it can store
and distribute very large data sets across hundreds of inexpensive servers
that operate in parallel.
● Cost effective. Hadoop also offers a cost effective storage solution for
businesses' exploding data sets.
● Flexible.
● Fast.
● Resilient to failure.
Disadvantages
● Security Concerns.
● Vulnerable By Nature.
● Not Fit for Small Data.
● Potential Stability Issues.
● General Limitations.
Thank you !

More Related Content

Similar to Fundamentals of Apache Hadoop in Bigdata

Similar to Fundamentals of Apache Hadoop in Bigdata (20)

Big Data Hadoop Technology
Big Data Hadoop TechnologyBig Data Hadoop Technology
Big Data Hadoop Technology
 
Seminar ppt
Seminar pptSeminar ppt
Seminar ppt
 
A Glimpse of Bigdata - Introduction
A Glimpse of Bigdata - IntroductionA Glimpse of Bigdata - Introduction
A Glimpse of Bigdata - Introduction
 
Hadoop and Big Data Analytics | Sysfore
Hadoop and Big Data Analytics | SysforeHadoop and Big Data Analytics | Sysfore
Hadoop and Big Data Analytics | Sysfore
 
Hadoop info
Hadoop infoHadoop info
Hadoop info
 
Hire Hadoop Developer
Hire Hadoop DeveloperHire Hadoop Developer
Hire Hadoop Developer
 
Hadoop
HadoopHadoop
Hadoop
 
HDFS
HDFSHDFS
HDFS
 
Hadoop jon
Hadoop jonHadoop jon
Hadoop jon
 
Big Data and Hadoop Basics
Big Data and Hadoop BasicsBig Data and Hadoop Basics
Big Data and Hadoop Basics
 
Hadoop Training in Delhi
Hadoop Training in DelhiHadoop Training in Delhi
Hadoop Training in Delhi
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
 
Managing Big data with Hadoop
Managing Big data with HadoopManaging Big data with Hadoop
Managing Big data with Hadoop
 
Hadoop .pdf
Hadoop .pdfHadoop .pdf
Hadoop .pdf
 
Hadoop map reduce
Hadoop map reduceHadoop map reduce
Hadoop map reduce
 
Hadoop Technology
Hadoop TechnologyHadoop Technology
Hadoop Technology
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
2.1-HADOOP.pdf
2.1-HADOOP.pdf2.1-HADOOP.pdf
2.1-HADOOP.pdf
 
Hadoop basics
Hadoop basicsHadoop basics
Hadoop basics
 
Hadoop in a Nutshell
Hadoop in a NutshellHadoop in a Nutshell
Hadoop in a Nutshell
 

Recently uploaded

Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 

Recently uploaded (20)

PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
КАТЕРИНА АБЗЯТОВА «Ефективне планування тестування ключові аспекти та практ...
КАТЕРИНА АБЗЯТОВА  «Ефективне планування тестування  ключові аспекти та практ...КАТЕРИНА АБЗЯТОВА  «Ефективне планування тестування  ключові аспекти та практ...
КАТЕРИНА АБЗЯТОВА «Ефективне планування тестування ключові аспекти та практ...
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
UiPath New York Community Day in-person event
UiPath New York Community Day in-person eventUiPath New York Community Day in-person event
UiPath New York Community Day in-person event
 
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
НАДІЯ ФЕДЮШКО БАЦ «Професійне зростання QA спеціаліста»
НАДІЯ ФЕДЮШКО БАЦ  «Професійне зростання QA спеціаліста»НАДІЯ ФЕДЮШКО БАЦ  «Професійне зростання QA спеціаліста»
НАДІЯ ФЕДЮШКО БАЦ «Професійне зростання QA спеціаліста»
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 

Fundamentals of Apache Hadoop in Bigdata

  • 2. What is Bigdata? Big Data is a collection of data that is huge in volume, yet growing exponentially with time. It is a data with so large size and complexity that none of traditional data management tools can store it or process it efficiently. Big data is also a data but with huge size
  • 3. Why we need Big data analytics? Big data analytics helps organizations harness their data and use it to identify new opportunities. That, in turn, leads to smarter business moves, more efficient operations, higher profits and happier customers.
  • 4. What is hadoop? Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware. It provides massive storage for any kind of data, enormous processing power and the ability to handle virtually limitless concurrent tasks or jobs.
  • 5. Use of Hadoop ? Apache Hadoop is an open source framework that is used to efficiently store and process large datasets ranging in size from gigabytes to petabytes of data. Instead of using one large computer to store and process the data, Hadoop allows clustering multiple computers to analyze massive datasets in parallel more quickly.
  • 6. What is DFS? A Distributed File System (DFS) as the name suggests, is a file system that is distributed on multiple file servers or multiple locations. It allows programs to access or store isolated files as they do with the local ones, allowing programmers to access files from any network or computer.
  • 7. What is HDFS? HDFS is designed to reliably store very large files across machines in a large cluster. It stores each file as a sequence of blocks; all blocks in a file except the last block are the same size. The blocks of a file are replicated for fault tolerance. The block size and replication factor are configurable per file.
  • 8. Advantages ● Scalable. Hadoop is a highly scalable storage platform, because it can store and distribute very large data sets across hundreds of inexpensive servers that operate in parallel. ● Cost effective. Hadoop also offers a cost effective storage solution for businesses' exploding data sets. ● Flexible. ● Fast. ● Resilient to failure.
  • 9. Disadvantages ● Security Concerns. ● Vulnerable By Nature. ● Not Fit for Small Data. ● Potential Stability Issues. ● General Limitations.