Hadoop Foundation for Analytics
History of Hadoop
Features of Hadoop
Key Advantages of Hadoop
Why Hadoop
Versions of Hadoop
Eco Projects
Essentials of the Hadoop Ecosystem
RDBMS versus Hadoop
Key Aspects of Hadoop
Components of Hadoop
M. Florence Dayana - Hadoop Foundation for Analytics.pptx
1. Bon Secours College for Women
Accredited with A++ Grade by NAAC in Cycle-II
Recognized by 2(f) and 12(B) Institution, Vilar, Bypass,
Thanjavur.
Dr.M.FLORENCE DAYANA
Assistant Professor
Department of Computer Applications
Year : 2023 – 2024 Class : II-MSc. CS
Semester : III
Course : Big Data Analytics (PP22CSCC31 )
Unit : IV
Hadoop Foundation for Analytics
2. History of Hadoop
Features
Key Advantages of Hadoop
Why Hadoop
Versions of Hadoop
Essentials of the Hadoop Ecosystem
RDBMS versus Hadoop
Key Aspects of Hadoop
Components of Hadoop
Hadoop Foundation for Analytics
3. • Hadoop was created by Doug Cutting and Mike Cafarella in 2005.
• It was originally developed to support distribution for the Nutch search engine project.
• In 2006, Hadoop was released by Yahoo; today it is maintained and distributed by the Apache Software Foundation (ASF).
History of Hadoop
4. Handles massive quantities of structured, semi-structured, and unstructured data using commodity hardware
Has a shared-nothing architecture
Replicates data across multiple computers (replicas)
Optimized for high throughput rather than low latency
Uses batch processing, so response time is not immediate
Complements OLTP and OLAP
Not a replacement for an RDBMS
Not a good fit when work cannot be parallelized
Not a good fit for processing small files
Features
5. Key Advantages of Hadoop
1. Stores data in its native form (HDFS)
No structure is imposed while keying in or storing data
Schema-less
Structure is imposed on the raw data only when the data needs to be processed
2. Scalable
Can store and distribute very large data sets across hundreds of inexpensive servers that operate in parallel
3. Cost Effective
Has a much lower cost per terabyte of storage and processing
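The "schema-less" point above can be illustrated with a minimal Python sketch. It is not Hadoop code: the records in RAW_STORE and the read_clicks function are hypothetical, and the (user, clicks) structure is an assumption chosen only for illustration. Records are stored exactly as they arrive, and structure is imposed only when they are read for processing.

```python
import json

# Hypothetical raw records, kept exactly as they arrived (no schema on write).
RAW_STORE = [
    '{"user": "asha", "clicks": 3}',
    "ravi,5",
    "free-form log line with no structure",
]


def read_clicks(raw_records):
    """Impose a (user, clicks) structure at read time; skip records that do not fit."""
    rows = []
    for record in raw_records:
        # First interpretation: the record is a JSON object.
        try:
            obj = json.loads(record)
            rows.append((obj["user"], int(obj["clicks"])))
            continue
        except (ValueError, KeyError, TypeError):
            pass
        # Second interpretation: the record is a "user,clicks" CSV line.
        parts = record.split(",")
        if len(parts) == 2 and parts[1].strip().isdigit():
            rows.append((parts[0].strip(), int(parts[1])))
    return rows


if __name__ == "__main__":
    print(read_clicks(RAW_STORE))
```

Nothing is rejected at write time; a different job could later read the same raw records with a different structure in mind.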
6. Key Advantages of Hadoop
4. Resilient to Failure
Fault tolerant. Practices replication of data: when data is sent to a node, it is also replicated to other nodes in the cluster.
5. Flexibility
Works with all types of data, structured or unstructured. Helps derive meaningful information from email, social media, and clickstream data.
Can be put to several purposes, such as log analysis, data mining, recommendation systems, and market campaign analysis.
6. Fast
Processing is fast for large data sets because Hadoop moves code to the data rather than moving data to the code.
8. • Hadoop 1.0
• Data storage framework
• Data processing framework
• Hadoop 2.0
Versions of Hadoop
9. Hadoop 1.0
Data storage framework:
• HDFS is schema-less. It simply stores data files, which can be in just about any format.
• Stores files as close to their original form as possible.
Data processing framework:
• Uses two functions, MAP and REDUCE, to process data.
• “Mappers” take in a set of key-value pairs and generate intermediate data.
• “Reducers” act on this intermediate data to produce the output data.
The two functions work in isolation from one another, which lets the processing be distributed in a highly parallel, fault-tolerant, and scalable way.
Versions of Hadoop
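The MAP and REDUCE functions described above can be sketched in plain Python as a word count. This is a single-process illustration under stated assumptions, not the Hadoop MapReduce API: the names mapper, shuffle, reducer, and word_count are hypothetical, and the shuffle function stands in for the grouping that the framework performs between the two phases.

```python
from collections import defaultdict


def mapper(line):
    """Map step: emit one (word, 1) pair for every word in the line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group intermediate values by key (done by the framework in a real cluster)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Reduce step: combine all values seen for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of text lines."""
    intermediate = [pair for line in lines for pair in mapper(line)]
    grouped = shuffle(intermediate)
    return dict(reducer(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big cluster", "data node"]))
```

Because mapper looks at only one line and reducer looks at only one key, each call is independent of the others. That independence is what allows a cluster to run many mappers and reducers on different nodes at the same time.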
10. Limitations
• Requires MapReduce programming expertise, along with proficiency in a programming language such as Java.
• Supports only batch processing, which suits tasks such as log analysis and large-scale data mining projects.
• Is tightly computationally coupled with MapReduce. Other data management tools had to either rewrite their functionality in MapReduce so that it could be executed in Hadoop, or extract the data from HDFS and process it outside of Hadoop. Neither option was viable, as this led to process inefficiencies caused by data being moved in and out of the Hadoop cluster.
Hadoop 1.0
11. • HDFS continues to be the data storage framework.
• Yet Another Resource Negotiator (YARN) has been added as a resource management framework.
• Any application capable of dividing itself into parallel tasks is supported by YARN.
• YARN coordinates the allocation of the subtasks of the submitted applications, thereby enhancing the flexibility, scalability, and efficiency of the applications.
Hadoop 2.0
12. • YARN works by replacing the JobTracker with a global ResourceManager and a per-application ApplicationMaster; applications run on resources governed by a new NodeManager on each node.
• MapReduce programming expertise is no longer required.
• It supports batch processing as well as real-time processing.
• Data processing functions such as data standardisation and master data management can now be performed in HDFS.
Hadoop 2.0
13. Supports projects that enhance the functionality of the Hadoop core components
The Eco projects
• HIVE
• PIG
• SQOOP
• HBASE
• FLUME
• OOZIE
• MAHOUT
Essentials of the Hadoop Ecosystem
14. Essentials of the Hadoop Ecosystem
The Eco projects are
• HIVE: Enables analysis of large data sets using a language similar to standard ANSI SQL. Provides access to data stored on a Hadoop cluster.
• PIG: An easy-to-understand data flow language that helps with the analysis of large data sets. Data in the Hadoop cluster can be analysed even without proficiency in MapReduce, because PIG scripts are automatically converted into MapReduce jobs by the PIG interpreter.
• SQOOP: Used to transfer bulk data between Hadoop and structured data stores such as an RDBMS.
15. • HBASE: Hadoop’s database, which compares well with an RDBMS. It supports structured data storage for large tables.
• FLUME: A distributed, reliable, and available software for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture.
• OOZIE: A workflow scheduler system to manage Apache Hadoop jobs.
• MAHOUT: A scalable machine learning and data mining library.
Essentials of the Hadoop Ecosystem
16. RDBMS versus HADOOP
PARAMETERS | RDBMS | HADOOP
System | Relational Database Management System | Node-based flat structure
Data | Suitable for structured data | Suitable for structured and unstructured data; supports a variety of data formats in real time, such as XML, JSON, and text-based flat file formats
Processing | OLTP | Analytical, Big Data processing
Choice | When the data needs consistent relationships | Big Data processing, which does not require any consistent relationships between data
Processor | Needs expensive hardware or high-end processors to store huge volumes of data | In a Hadoop cluster, a node requires only a processor, a network card, and a few hard drives
Cost | Around $10,000 to $14,000 per terabyte of storage | Around $4,000 per terabyte of storage
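To see what the per-terabyte figures in the comparison imply at scale, here is a small worked calculation in Python. The rates are the ones quoted in the comparison above; the 100 TB data set size and the storage_cost function are assumptions chosen purely for illustration.

```python
# Per-terabyte storage cost figures quoted in the comparison (US dollars).
RDBMS_COST_PER_TB = (10_000, 14_000)  # low and high end of the quoted range
HADOOP_COST_PER_TB = 4_000


def storage_cost(terabytes):
    """Return the estimated storage cost for a data set of the given size."""
    low_rate, high_rate = RDBMS_COST_PER_TB
    return {
        "rdbms_low": low_rate * terabytes,
        "rdbms_high": high_rate * terabytes,
        "hadoop": HADOOP_COST_PER_TB * terabytes,
    }


if __name__ == "__main__":
    # Hypothetical 100 TB data set.
    for system, cost in storage_cost(100).items():
        print(f"{system}: ${cost:,}")
```

At 100 TB, the quoted rates work out to $1,000,000 to $1,400,000 for the RDBMS against $400,000 for Hadoop, that is, 2.5 to 3.5 times the Hadoop figure.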
17. 1. Open Source Software
• It is free to download, use, and contribute to.
2. Framework
• Everything needed to develop and execute an application is provided: programs, tools, etc.
3. Distributed
• Divides and stores data across multiple computers.
• Computation/processing is done in parallel across multiple connected nodes.
4. Massive Storage
• Stores colossal amounts of data across nodes of low-cost commodity hardware.
5. Faster Processing
• Large amounts of data are processed in parallel, yielding a quick response.
Key Aspects of Hadoop
19. Core Components
HDFS
• Storage component
• Distributes data across several nodes
• Natively redundant
MapReduce
• Computational framework
• Splits a task across several nodes
• Processes data in parallel
Hadoop Ecosystem
• HIVE
• PIG
• SQOOP
• HBASE
• FLUME
• OOZIE
• MAHOUT
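The two HDFS properties listed under the core components (data distributed across several nodes, natively redundant) can be sketched as a toy model in Python. This is not HDFS code: the function names are hypothetical, the 4-byte block size is an assumption that keeps the example readable, and the round-robin placement is a simplification of the rack-aware placement a real cluster uses.

```python
def split_into_blocks(data, block_size):
    """Cut a byte string into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin (toy policy)."""
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def readable_blocks(placement, failed_node):
    """Return the blocks that still have at least one replica on a healthy node."""
    return {
        block
        for block, holders in placement.items()
        if any(node != failed_node for node in holders)
    }


if __name__ == "__main__":
    blocks = split_into_blocks(b"abcdefghij", block_size=4)
    placement = place_replicas(len(blocks), ["n1", "n2", "n3", "n4"])
    print(placement)
    print(readable_blocks(placement, failed_node="n3"))
```

With three replicas per block, losing any single node leaves every block readable, which is the sense in which the storage layer is redundant.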