2. Introduction to Hadoop
Hadoop nodes & daemons
Hadoop Architecture
Characteristics
Hadoop Features
3. Hadoop – the technology that empowers Yahoo, Facebook, Twitter, Walmart, and others
4. An open source framework that allows distributed processing of large data sets across a cluster of commodity hardware
5. An Open Source framework that allows distributed processing of large data sets across a cluster of commodity hardware
Open Source: the source code is freely available; it may be redistributed and modified
6. An open source framework that allows Distributed Processing of large data sets across a cluster of commodity hardware
Distributed Processing: data is distributed across multiple nodes/servers, and the machines process the data independently
7. An open source framework that allows distributed processing of large data sets across a Cluster of commodity hardware
Cluster: multiple machines connected together; the nodes are connected via a LAN
8. An open source framework that allows distributed processing of large data sets across a cluster of Commodity Hardware
Commodity Hardware: economical, affordable machines; typically low-performance hardware
9. An open source framework written in Java, inspired by Google's MapReduce programming model as well as its file system (GFS)
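The MapReduce model the slide refers to can be illustrated with a short, self-contained sketch. This is plain Python standing in for a real Hadoop job, not the Hadoop API: the three functions model the map, shuffle, and reduce phases of the classic word-count example.

```python
from collections import defaultdict

# Map phase: each input record (a line of text) is turned into (key, value) pairs.
def map_phase(lines):
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

# Shuffle phase: pairs are grouped by key, as the framework does between map and reduce.
def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# Reduce phase: each key's list of values is aggregated independently.
def reduce_phase(groups):
    return {key: sum(values) for key, values in groups.items()}

lines = ["hadoop stores big data", "hadoop processes big data"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["hadoop"], counts["data"])  # 2 2
```

In a real Hadoop job the map and reduce functions run on many machines in parallel and the shuffle is performed by the framework over the network; the logical flow is the same.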
10. Hadoop History
2002 – Doug Cutting started working on Nutch
2003 – Google published the GFS paper
2004 – Google published the MapReduce paper
2005 – Doug Cutting added DFS & MapReduce to Nutch
2006 – Development of Hadoop started as a Lucene sub-project
2007 – 4 TB of image archives were converted over 100 EC2 instances
2008 – Hadoop became a top-level Apache project and defeated a supercomputer in the terabyte-sort benchmark; Hive launched, bringing SQL support to Hadoop
2009 – Doug Cutting joined Cloudera
16. Open Source
Source code is freely available; it can be redistributed and modified
Benefits: free, affordable, community-driven, transparent, interoperable, no vendor lock-in
17. Data is processed in a distributed manner across the cluster
Multiple nodes in the cluster process the data independently
(Diagram: centralized processing vs. distributed processing)
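The idea of independent processing on multiple nodes can be sketched as follows. This is a toy model, not Hadoop itself: worker threads stand in for cluster nodes, each one processes only its own partition of the data, and the small partial results are combined at the end.

```python
from concurrent.futures import ThreadPoolExecutor

records = list(range(1, 101))

# Split the data into partitions, one per "node" in the cluster.
def partition(data, n_nodes):
    return [data[i::n_nodes] for i in range(n_nodes)]

# Each node processes only its own partition, independently of the others.
def process_on_node(chunk):
    return sum(x * x for x in chunk)

chunks = partition(records, 4)
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_results = list(pool.map(process_on_node, chunks))

# Only the small partial results are combined into the final answer.
total = sum(partial_results)
print(total)  # 338350
```

Because no partition depends on another, the work scales out: adding more "nodes" just means more partitions processed at the same time.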
18. Node failures are recovered automatically
The framework takes care of hardware failures as well as task failures
19. Data is reliably stored on
the cluster of machines
despite machine failures
Failure of nodes doesn’t
cause data loss
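Why node failures do not cause data loss can be shown with a toy model of block replication. HDFS keeps multiple replicas of each block (three by default); the placement below is a simplistic round-robin, whereas real HDFS placement is rack-aware, but the survival property is the same.

```python
import itertools

REPLICATION = 3  # HDFS default replication factor
nodes = ["node1", "node2", "node3", "node4"]
blocks = ["blk_1", "blk_2", "blk_3", "blk_4"]

# Place each block's replicas on REPLICATION distinct nodes
# (round-robin here for simplicity; real HDFS is rack-aware).
placement = {}
node_cycle = itertools.cycle(nodes)
for blk in blocks:
    placement[blk] = {next(node_cycle) for _ in range(REPLICATION)}

# Simulate a node failure: node2 goes down.
alive = set(nodes) - {"node2"}

# Every block still has at least one live replica, so no data is lost.
survivors = {blk: holders & alive for blk, holders in placement.items()}
print(all(survivors.values()))  # True
```

Losing one node removes at most one replica of any given block, so every block remains readable; HDFS would then re-replicate the under-replicated blocks to restore the replication factor.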
20. Data is highly available and accessible despite hardware failures
End-user applications face no downtime due to data unavailability
21. Vertical scalability – new hardware can be added to existing nodes
Horizontal scalability – new nodes can be added on the fly
22. Economic = Open Source + Commodity Hardware
No need to purchase a costly license
No need to purchase costly hardware
24. Move computation to the data instead of data to the computation
Data is processed on the nodes where it is stored
(Diagram: the traditional approach ships data from storage servers to app servers; Hadoop ships the algorithm to the servers holding the data)
25. Every day we generate 2.5 quintillion bytes of data
Hadoop handles huge volumes of data efficiently
Hadoop uses the power of distributed computing
HDFS & YARN are the two main components of Hadoop
It is highly fault tolerant, reliable & available