SlideShare a Scribd company logo
1 of 3
Download to read offline
International Journal of Technical Research and Applications e-ISSN: 2320-8163,
www.ijtra.com Volume 3, Issue 3 (May-June 2015), PP. 114-116
114 | P a g e
ANALYTICS OF DATA USING HADOOP-A
REVIEW
Vineet Sajwan, Vikash Yadav
MGM’s College of Engineering and Technology,
Noida Sector-62
Vin94sajwan@gmail.com, 1109510067@coet.in
ABSTRACT- In this paper, we discuss about the Big Data. We
analyze and reveals the benefits of Big Data. We analyze the
big data challenges and how Hadoop gives solution to it. This
research paper gives the comparison between relational
databases and Hadoop. This research paper also gives reason
of why Big Data and Hadoop.
General Terms
Data Explosion, Big Data, Big Data Analytics, Hadoop, Hadoop
Distributed File System, MapReduce
Keywords
Big Data, Big Data Analytics, Hadoop, Hadoop Distributed File
System, Map Reduce.
I. INTRODUCTION
We live in the age of data .It is not easy to measure the
volume of data. According to the IDC estimate the size of
“Digital World” is about 0.16 zettabytes and forecasting the
“Digital World” will grow by a factor of 300, from 130
Exabyte to 40,000 Exabyte. This flood of the data is coming
from various sources around the world. Society is being
surrounded by gadgets and as a results organization have to
store huge amount of data .To manage and process that data
is big challenge for the organization. The data from the
social websites makes a big opportunities for the
organization for understand the customer needs. This
paradigm is known as Big Data. Big Data, two popular
terms are here today for good reason. There is a big market
in big future for Big Data. Company are starting to realize
there untapped information and unstructured documents
siting across the network. Analytics is the solution that
mines unstructured and semi structured data and gives
insights to the organization from not their privately acquired
data but also from large amount of data is publicly available
from various sources. We have emails, imagine we can data
mine our emails and get meaningful results. The kind of
information we get from our emails, documents, textiles,
spreadsheet, and pdf. All these unstructured data siting
across the network that contains answer which help us to
create new products, refining existing product, discover
trends, improve customer relation, and understand ourselves
and help us to understand even our company better. Hadoop
answer many big data challenges that we see today that is
how to store terabytes and petabytes of information ,how to
access the information quickly , how to work with variety of
data and how to do you do all scalable fault tolerant and
flexible way. There is a saying “More data usually beats
better algorithms.” It means for some problem may be your
algorithm solve that problem but it can be often be beaten by
adding more data simply.
The good news is big data is here and bad news is that we
are struggling for storing and processing it.
II. DATA EXPLOSION
If we sum up the state of data in one word it is a lot.
There’s a lot of data in our world.90% of worlds data is
created in last two years that’s say big data is here .Through
smart phones and tablets one can access data everywhere.
Social media is everywhere Facebook, twitter, Instagram,
tumbler the list goes on and on generating data with
alarming rate which is known as data explosion. Data
explosion is the explosion of unstructured and semi-
structured data. Since the end of 90’s it’s been on terror .It is
shows exponential growth with no sign of slowing down.
Structured data are pretty manageable and predictable
curve.Exmaples of the types of the data are:-
 Unstructured Data: - Things like email, pdfs, documents
on the hard drive, textiles.
 Structured Data:-Everything has schema associated with
it and checked against the schema when we put it into
the database.
 Semi-Structured:- Things that have some form of
hierarchy or associated with table but nonetheless
contain tags. Xml is a tag based format that describe data
inside the tag in its hierarchical so got some structure.
They don’t have scehma associated with it.
III. BIG COMPANIES WITH BIG DATA
A. Google
In starting Google have to make internet searchable for
they need to index a billion pages .So they built the
technology called mapReduce along with GFS(Google File
System) that’s what Hadoop is based on .Doug Cutting in
2003 join Yahoo and yahoo gave him the team of engineers.
They built Hadoop based on those white papers the GFS and
mapReduce. That’s the core technology of Hadoop i.e.
HDFS (Hadoop distributed File System) and mapReduce.
Google index billion pages using the same technology. Now
today 60 Billion pages google indexes to make searchable
for the internet in very less seconds.
B. Facebook
Facebook has the largest Hadoop cluster on this planet.
They have the cluster that contain of the 100 petabytes of
data and they have generated half petabyte of data in one
day. Anything everybody does on Facebook from login to
clicking to liking and sharing something is tracked on
Facebook and thats how they use the data to target views.
They are looking what we are clicking what our friends are
clicking and find the commonalities them something called
collaborative filtering which they use for targeting.
C. Amazon
In Amazon, recommendation system is based on some
elements like what a user had bought in the past, items they
have in their virtual shopping and item they have rated and
liked and what other customer had viewed or purchased.
Behind the recommendation system amazon’s collaborative
filtering algorithm works. This is the reason of Amazon’s
success.
International Journal of Technical Research and Applications e-ISSN: 2320-8163,
www.ijtra.com Volume 3, Issue 3 (May-June 2015), PP. 114-116
115 | P a g e
D. Twitter
400 million a day which equals to 15 terabytes of data
every day. They have thousands of Hadoop clusters setup
they run thousands of mapReduce jobs every night to
discover trends and they sell those trends to the people who
makes product so can they target those trends
E. General Motors
General Motors are car manufacture of America .In
2012, they cut off multibillion dollar contract with HP for
outsourcing .They are building 220000 square feet
warehouses in which Hadoop were installed.
(i) GM combines the Global Information System (GIS)
and data analytics to improve the performance of
their dealer. These analytics shared are shared with
the dealers who now can view local demographics,
location characteristics and customer satisfaction.
(ii) With the marketing budget of $2 billion per year GM
have lots of potential customer. But instead of mass
targeting the public they use big data analytics to
create the customer profile.GM knows who the
buyers are that want luxury cars and where they are.
(iii) GM focuses on sensors and telematics. For GM Big
Data saves 800$ per car in GM telematics .Which
means saving lie in the connected cars that can
communicate with manufacturer via 4G.
IV. 5 V’S OF BIG DATA
It is the 5 big reason we don’t use traditional computing
model which is big expensive trickled down machines with
lots of process, cords, hard drives read enable to the
processes and fault tolerance. It is the hardware solution and
we don’t use hardware solution and also we can’t use the
relational world. Relational world is use to design to handle
gigabyte of data not petabytes of data. The big data breaks
into 5 dimensions that is volume, velocity, variety, value
and veracity.
A. Volume
Volume refers to the data generated every second. We
are not talking about terabytes but Zettabytes. If we take all
the data generated in the world between beginning time and
2008 the same amount of data generated every minute. This
makes most data sets too large to store and analyses using
traditional database technology. Hadoop uses use distributed
system so we can store and analyses data across the database
around anywhere in the world.
B. Velocity
Velocity is the speed which we access the data. In
traditional computing models no matter how fast our
computer is our process is still bound by disk I/O because
our disk transfer is having involved in newly pace of
processing power. So that’s why distributed computing is
more useful and Hadoop increases strength of current
computing world. It brings computation to the data. It
doesn’t brings data to the computation. It can process the
data locally and when we have cluster of nodes to all
working together using and harnessing the power of
processor and reducing network bandwidth and navigating
the weakness of disk transfer. Yahoo break the record of
processing a terabyte of data many times using Hadoop.
C. Variety
The relational model can only handle structure data they
can get semi structured and unstructured data. Some
engineers they have ETL tool to transform and scrub the
data to bring it but that require a lot of extra work .With the
help of Hadoop we can now analyses and bring together
data of different type such as messages and social media
conversations, photos, sensor data, video and voice
recording.
D. Value
Having access to the Big data is no good unless we can
turn it into the value.
Companies are starting to generate amazing value from their
big data.
E. Veracity
Veracity refers to the uncertainty surrounding of data
which is due to the inconsistence and incompleteness of
data. It is the major challenge to keeping organized data.
V. HADOOP
Hadoop is distributed software solution. It is scalable,
fault tolerant distributed system for data storage and
processing. There is two component of Hadoop that is
HDFS which is the storage and mapReduce which is the
retrieval and organising.
 HDFS is self-healing high bandwidth cluster storage. If
we put petabyte file in Hadoop cluster. Hadoop would
break it up in the blocks and distributed across all the
nodes in a cluster where its fault tolerant function
come into play. When we setup HDFS we setup
replication factor ( by default it sets 3).When we put a
file in HDFS it’s going to make sure there are three
copies of every block that make up that file spread
across the node in a cluster. If we lose a node it going
to self heal because it knows where is data on that node
It re-replicate the blocks ever on that node to the rest of
the subversion sites of the cluster.
It has the name node and the data node. Generally one
node is name node in a cluster and rest of them are data
node. Name node is the meta data server it just hold N
memory and knows where are blocks exist on what
node on what rack spread across the cluster inside the
network.
That’s how Hadoop is fault tolerant.
 mapReduce is the two-step process at the service.
Programmer writes the mapper function which will
goes out and tell the cluster what data we want to
retrieve. The reducer takes all the data and aggregate.
Hadoop is batch processing based system. We working
on all of the data. mapReduce is all about working on all of
the data inside cluster. Hadoop is flexible because there is
no need to understand javato get data of cluster. In
Facebook, engineers built Hive because Facebook doesn’t to
force anybody to learn java for data extraction. Now
anybody is familiar with sequel which most of professional
are can pull data out of the cluster.
Hadoop is scalable because we keep adding more data into
the cluster by add more node which increases the overall
performance of cluster.
International Journal of Technical Research and Applications e-ISSN: 2320-8163,
www.ijtra.com Volume 3, Issue 3 (May-June 2015), PP. 114-116
116 | P a g e
VI. PROBLEM DEFINITION
As there exist large amount of data, the various
challenges are faced about the management of such
extensive data like unstructured data, fault tolerance and
issues regarding storage of big amount of data.
A. Problem Description
The problem is simple: the storage capacities of hard
drivers increased massively over the years-the rate at which
data can read from drives have not kept up. In 1990, one
typical drive can store 1,370 MB of data and had transfer
speed of 4.4 MB/s, so you could read the full data in about 5
minutes. 20 years later, one typical hard drives can store
terabytes but speed is around 4.4MB/s so it need half hour to
read full data of the disk.
B. Problem Solution
The obvious way to reduce time is multiple disks at
once. Imagine if we have 100 drives working in parallel we
can read the data in 2 minutes. Only 100 disks is wasteful
.But we can store datasets, each of which is one terabyte and
provide access to them.
C. Problem to Solve
The first problem to solve is hardware failure:-The
chance of failure increases as we connected the drivers. A
common way of avoiding data loss is through replication.
The Hadoop Distributed Filesystem (HDFS).
The second problem to solve is analysis task need to
combine in some way:-The data read to one disk need to be
able to combine from any of 99 disks. Various distribution
system allow this but doing this is very challenging.
MapReduce provides a programming model that abstracts
the model from disk read and writes, transforming it into
computation over sets of keys and values.
VII. HADOOP VS RDBMS
Hadoop is intelligent as we seen this in the computation
of data maximizing the strength of todays computing world
and navigating the weaknesses. In a multi rack environment
there is a switch which is rack aware. It knows what node
belongs to which rack. In a more data locality whenever
receive mapReduce job it find the shortest path to the data as
possible.
Traditional
RDBMS tools
mapReduce
Data Size In Gigabytes In petabytes
Access
Batch and
interactive
Similar
Update
Read and Write
many times
Write once and read
many times
Structure Fixed schema
Unstructured
schema
Latency Low High
Integrity High Low
Language SQL
Procedural
Language
(C++,Java)
VIII. CONCLUSION
We live in the data world. There are various challenges
in big data .there are various issue too. We came into
conclusion that big data is not just huge data it is an
opportunity to find new insights and analysis of our future
data. We have discuss various parameters of big data. We
also have discuss about comparison between RDBMS and
MapReduce. There must be support and research on big data
for getting new results.
MapReduce is good for analyze huge dataset in a batch
function where RDBMS is good for queries and update.
REFERENCES
[1] Shipa, Manjit kaur, “Big Data and Methodology”, 10 Oct,
2013
[2] Pareedpa, A.; Dr. Antony Selvadoss, “Significant Trends of
Big Data”, 8 Aug, 2013
[3] Gurpeet Singh Bedi, Ashima, “Big Data Analysis with
Dataset Scaling in Yet another Resource Negotiator
(YARN)”, 5April, 2013
[4] Hadoop-The Definitive Guide, Tom White, Edition-3, 27Jan,
2012
[5] Mrigank Mridul, Akashdeep Khajuria,Snehasish
Dutta,Kumar N, “Analysis of Big data using Apache Hadoop
and MapReduce”,Volume 4, May 2014
[6] IBM 2012, What is big data: Bring big data to the
enterprise,http://www.01.ibm.com/software/data/bigdata,
IBM
[7] Sam Madden, “From Databases to Big Data”,IEEE
computer society,2012
[8] “Data Mining with BigData” ,Xindong Wu, Xingquan Zhu,
Gong-Qing Wu, Wei Ding , 1041-4347/13/$31.00 © 2013
IEEE
[9] Russom, “Big Data Analytics” , TDWI Research,2011
[10] An Oracle White Paper, “Hadoop and NoSQL Technologies
and the Oracle DataBase”,February 2012

More Related Content

What's hot

An Introduction to Big Data
An Introduction to Big DataAn Introduction to Big Data
An Introduction to Big DataeXascale Infolab
 
Research issues in the big data and its Challenges
Research issues in the big data and its ChallengesResearch issues in the big data and its Challenges
Research issues in the big data and its ChallengesKathirvel Ayyaswamy
 
Big data privacy issues in public social media
Big data privacy issues in public social mediaBig data privacy issues in public social media
Big data privacy issues in public social mediaSupriya Radhakrishna
 
Data minig with Big data analysis
Data minig with Big data analysisData minig with Big data analysis
Data minig with Big data analysisPoonam Kshirsagar
 
Big Data and Classification
Big Data and ClassificationBig Data and Classification
Big Data and Classification303Computing
 
Tools and Methods for Big Data Analytics by Dahl Winters
Tools and Methods for Big Data Analytics by Dahl WintersTools and Methods for Big Data Analytics by Dahl Winters
Tools and Methods for Big Data Analytics by Dahl WintersMelinda Thielbar
 
Introduction to big data
Introduction to big dataIntroduction to big data
Introduction to big dataRichard Vidgen
 
Big Data: The 4 Layers Everyone Must Know
Big Data: The 4 Layers Everyone Must KnowBig Data: The 4 Layers Everyone Must Know
Big Data: The 4 Layers Everyone Must KnowBernard Marr
 
Approaching Big Data: Lesson Plan
Approaching Big Data: Lesson Plan Approaching Big Data: Lesson Plan
Approaching Big Data: Lesson Plan Bessie Chu
 
Big data (word file)
Big data  (word file)Big data  (word file)
Big data (word file)Shahbaz Anjam
 
Introduction of Data Science
Introduction of Data ScienceIntroduction of Data Science
Introduction of Data ScienceJason Geng
 

What's hot (20)

BigData
BigDataBigData
BigData
 
An Introduction to Big Data
An Introduction to Big DataAn Introduction to Big Data
An Introduction to Big Data
 
Research issues in the big data and its Challenges
Research issues in the big data and its ChallengesResearch issues in the big data and its Challenges
Research issues in the big data and its Challenges
 
Big data privacy issues in public social media
Big data privacy issues in public social mediaBig data privacy issues in public social media
Big data privacy issues in public social media
 
Data minig with Big data analysis
Data minig with Big data analysisData minig with Big data analysis
Data minig with Big data analysis
 
Big Data and Classification
Big Data and ClassificationBig Data and Classification
Big Data and Classification
 
Motivation for big data
Motivation for big dataMotivation for big data
Motivation for big data
 
Big data unit i
Big data unit iBig data unit i
Big data unit i
 
Tools and Methods for Big Data Analytics by Dahl Winters
Tools and Methods for Big Data Analytics by Dahl WintersTools and Methods for Big Data Analytics by Dahl Winters
Tools and Methods for Big Data Analytics by Dahl Winters
 
Big data mining
Big data miningBig data mining
Big data mining
 
Big Data: Issues and Challenges
Big Data: Issues and ChallengesBig Data: Issues and Challenges
Big Data: Issues and Challenges
 
Introduction to big data
Introduction to big dataIntroduction to big data
Introduction to big data
 
Big Data: The 4 Layers Everyone Must Know
Big Data: The 4 Layers Everyone Must KnowBig Data: The 4 Layers Everyone Must Know
Big Data: The 4 Layers Everyone Must Know
 
Big data
Big dataBig data
Big data
 
Big data 101
Big data 101Big data 101
Big data 101
 
Approaching Big Data: Lesson Plan
Approaching Big Data: Lesson Plan Approaching Big Data: Lesson Plan
Approaching Big Data: Lesson Plan
 
Big Data
Big DataBig Data
Big Data
 
Big data (word file)
Big data  (word file)Big data  (word file)
Big data (word file)
 
Introduction of Data Science
Introduction of Data ScienceIntroduction of Data Science
Introduction of Data Science
 
Bigdata analytics
Bigdata analyticsBigdata analytics
Bigdata analytics
 

Viewers also liked

Viewers also liked (20)

EFFICIENT EMBEDDED SURVEILLANCE SYSTEM WITH AUTO IMAGE CAPTURING AND EMAIL SE...
EFFICIENT EMBEDDED SURVEILLANCE SYSTEM WITH AUTO IMAGE CAPTURING AND EMAIL SE...EFFICIENT EMBEDDED SURVEILLANCE SYSTEM WITH AUTO IMAGE CAPTURING AND EMAIL SE...
EFFICIENT EMBEDDED SURVEILLANCE SYSTEM WITH AUTO IMAGE CAPTURING AND EMAIL SE...
 
PHYSICO-CHEMICAL AND BACTERIOLOGICAL ASSESSMENT OF RIVER MUDZIRA WATER IN MUB...
PHYSICO-CHEMICAL AND BACTERIOLOGICAL ASSESSMENT OF RIVER MUDZIRA WATER IN MUB...PHYSICO-CHEMICAL AND BACTERIOLOGICAL ASSESSMENT OF RIVER MUDZIRA WATER IN MUB...
PHYSICO-CHEMICAL AND BACTERIOLOGICAL ASSESSMENT OF RIVER MUDZIRA WATER IN MUB...
 
UNDERSTANDING ENTREPRENEURSHIP: IMPACT OF OVERCONFIDENCE ON ENTREPRENEURSHIP
UNDERSTANDING ENTREPRENEURSHIP: IMPACT OF OVERCONFIDENCE ON ENTREPRENEURSHIPUNDERSTANDING ENTREPRENEURSHIP: IMPACT OF OVERCONFIDENCE ON ENTREPRENEURSHIP
UNDERSTANDING ENTREPRENEURSHIP: IMPACT OF OVERCONFIDENCE ON ENTREPRENEURSHIP
 
EFFECT OF DIFFERENT CONCENTRATIONS OF AUXINS AND COMBINATION WITH KINETIN ON ...
EFFECT OF DIFFERENT CONCENTRATIONS OF AUXINS AND COMBINATION WITH KINETIN ON ...EFFECT OF DIFFERENT CONCENTRATIONS OF AUXINS AND COMBINATION WITH KINETIN ON ...
EFFECT OF DIFFERENT CONCENTRATIONS OF AUXINS AND COMBINATION WITH KINETIN ON ...
 
IDENTIFICATION OF SUPPLY CHAIN MANAGEMENT PROBLEMS: A REVIEW
IDENTIFICATION OF SUPPLY CHAIN MANAGEMENT PROBLEMS: A REVIEWIDENTIFICATION OF SUPPLY CHAIN MANAGEMENT PROBLEMS: A REVIEW
IDENTIFICATION OF SUPPLY CHAIN MANAGEMENT PROBLEMS: A REVIEW
 
Ijtra130511
Ijtra130511Ijtra130511
Ijtra130511
 
INVESTIGATION OF STRESSES IN CANTILEVER BEAM BY FEM AND ITS EXPERIMENTAL VERI...
INVESTIGATION OF STRESSES IN CANTILEVER BEAM BY FEM AND ITS EXPERIMENTAL VERI...INVESTIGATION OF STRESSES IN CANTILEVER BEAM BY FEM AND ITS EXPERIMENTAL VERI...
INVESTIGATION OF STRESSES IN CANTILEVER BEAM BY FEM AND ITS EXPERIMENTAL VERI...
 
EFFECT OF PRUNING AND SKIFFING ON GROWTH AND PRODUCTIVITY OF DARJEELING TEA (...
EFFECT OF PRUNING AND SKIFFING ON GROWTH AND PRODUCTIVITY OF DARJEELING TEA (...EFFECT OF PRUNING AND SKIFFING ON GROWTH AND PRODUCTIVITY OF DARJEELING TEA (...
EFFECT OF PRUNING AND SKIFFING ON GROWTH AND PRODUCTIVITY OF DARJEELING TEA (...
 
STUDIES ON THE ANTIBIOTIC RESISTANCE USING URINARY TRACT ISOLATES OF E.COLI.
STUDIES ON THE ANTIBIOTIC RESISTANCE USING URINARY TRACT ISOLATES OF E.COLI.STUDIES ON THE ANTIBIOTIC RESISTANCE USING URINARY TRACT ISOLATES OF E.COLI.
STUDIES ON THE ANTIBIOTIC RESISTANCE USING URINARY TRACT ISOLATES OF E.COLI.
 
FACTORS AFFECTING PROJECT SUSTAINABILITY BEYOND DONOR’S SUPPORT. THE CASE OF ...
FACTORS AFFECTING PROJECT SUSTAINABILITY BEYOND DONOR’S SUPPORT. THE CASE OF ...FACTORS AFFECTING PROJECT SUSTAINABILITY BEYOND DONOR’S SUPPORT. THE CASE OF ...
FACTORS AFFECTING PROJECT SUSTAINABILITY BEYOND DONOR’S SUPPORT. THE CASE OF ...
 
MECHANISMS OF PHOTOPERIOD IN REGULATION OF RICE FLOWERING
MECHANISMS OF PHOTOPERIOD IN REGULATION OF RICE FLOWERING MECHANISMS OF PHOTOPERIOD IN REGULATION OF RICE FLOWERING
MECHANISMS OF PHOTOPERIOD IN REGULATION OF RICE FLOWERING
 
Ijtra150317
Ijtra150317Ijtra150317
Ijtra150317
 
PERFORMANCE EVALUATION OF DIFFERENT CLASSIFIER ON BREAST CANCER
PERFORMANCE EVALUATION OF DIFFERENT CLASSIFIER ON BREAST CANCERPERFORMANCE EVALUATION OF DIFFERENT CLASSIFIER ON BREAST CANCER
PERFORMANCE EVALUATION OF DIFFERENT CLASSIFIER ON BREAST CANCER
 
AN INSIDE LOOK IN THE ELECTRICAL STRUCTURE OF THE BATTERY MANAGEMENT SYSTEM T...
AN INSIDE LOOK IN THE ELECTRICAL STRUCTURE OF THE BATTERY MANAGEMENT SYSTEM T...AN INSIDE LOOK IN THE ELECTRICAL STRUCTURE OF THE BATTERY MANAGEMENT SYSTEM T...
AN INSIDE LOOK IN THE ELECTRICAL STRUCTURE OF THE BATTERY MANAGEMENT SYSTEM T...
 
AN EXPERIMENTAL STUDY ON SEPARATION OF WATER FROM THE ATMOSPHERIC AIR
AN EXPERIMENTAL STUDY ON SEPARATION OF WATER FROM THE ATMOSPHERIC AIRAN EXPERIMENTAL STUDY ON SEPARATION OF WATER FROM THE ATMOSPHERIC AIR
AN EXPERIMENTAL STUDY ON SEPARATION OF WATER FROM THE ATMOSPHERIC AIR
 
USING SOCIAL NETWORKING AS A MECHANISM TO ACHIEVE BRAND RESONANCE: AN EMPIRIC...
USING SOCIAL NETWORKING AS A MECHANISM TO ACHIEVE BRAND RESONANCE: AN EMPIRIC...USING SOCIAL NETWORKING AS A MECHANISM TO ACHIEVE BRAND RESONANCE: AN EMPIRIC...
USING SOCIAL NETWORKING AS A MECHANISM TO ACHIEVE BRAND RESONANCE: AN EMPIRIC...
 
MAPPING REMOTE PLANTS THROUGH REMOTE SENSING TECHNOLOGY AND GIS
MAPPING REMOTE PLANTS THROUGH REMOTE SENSING TECHNOLOGY AND GISMAPPING REMOTE PLANTS THROUGH REMOTE SENSING TECHNOLOGY AND GIS
MAPPING REMOTE PLANTS THROUGH REMOTE SENSING TECHNOLOGY AND GIS
 
IDENTIFICATIONS OF ELECTRICAL AND ELECTRONIC EQUIPMENT POLYMER WASTE TYPES US...
IDENTIFICATIONS OF ELECTRICAL AND ELECTRONIC EQUIPMENT POLYMER WASTE TYPES US...IDENTIFICATIONS OF ELECTRICAL AND ELECTRONIC EQUIPMENT POLYMER WASTE TYPES US...
IDENTIFICATIONS OF ELECTRICAL AND ELECTRONIC EQUIPMENT POLYMER WASTE TYPES US...
 
IMPROVEMENT OF SUPPLY CHAIN MANAGEMENT BY MATHEMATICAL PROGRAMMING APPROACH
IMPROVEMENT OF SUPPLY CHAIN MANAGEMENT BY MATHEMATICAL PROGRAMMING APPROACHIMPROVEMENT OF SUPPLY CHAIN MANAGEMENT BY MATHEMATICAL PROGRAMMING APPROACH
IMPROVEMENT OF SUPPLY CHAIN MANAGEMENT BY MATHEMATICAL PROGRAMMING APPROACH
 
PREVALENCE EVALUATION OF DIABETIC RETINOPATHY
PREVALENCE EVALUATION OF DIABETIC RETINOPATHYPREVALENCE EVALUATION OF DIABETIC RETINOPATHY
PREVALENCE EVALUATION OF DIABETIC RETINOPATHY
 

Similar to ANALYTICS OF DATA USING HADOOP-A REVIEW

Big Data Fundamentals
Big Data FundamentalsBig Data Fundamentals
Big Data FundamentalsSmarak Das
 
The book of elephant tattoo
The book of elephant tattooThe book of elephant tattoo
The book of elephant tattooMohamed Magdy
 
Big Data-Survey
Big Data-SurveyBig Data-Survey
Big Data-Surveyijeei-iaes
 
Big data and Hadoop overview
Big data and Hadoop overviewBig data and Hadoop overview
Big data and Hadoop overviewNitesh Ghosh
 
Hadoop for beginners free course ppt
Hadoop for beginners   free course pptHadoop for beginners   free course ppt
Hadoop for beginners free course pptNjain85
 
Whitepaper: Know Your Big Data – in 10 Minutes! - Happiest Minds
Whitepaper: Know Your Big Data – in 10 Minutes! - Happiest MindsWhitepaper: Know Your Big Data – in 10 Minutes! - Happiest Minds
Whitepaper: Know Your Big Data – in 10 Minutes! - Happiest MindsHappiest Minds Technologies
 
An Overview of BigData
An Overview of BigDataAn Overview of BigData
An Overview of BigDataValarmathi V
 
Lecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.pptLecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.pptalmaraniabwmalk
 
Moving Toward Big Data: Challenges, Trends and Perspectives
Moving Toward Big Data: Challenges, Trends and PerspectivesMoving Toward Big Data: Challenges, Trends and Perspectives
Moving Toward Big Data: Challenges, Trends and PerspectivesIJRESJOURNAL
 
Big Data Summarization : Framework, Challenges and Possible Solutions
Big Data Summarization : Framework, Challenges and Possible SolutionsBig Data Summarization : Framework, Challenges and Possible Solutions
Big Data Summarization : Framework, Challenges and Possible Solutionsaciijournal
 

Similar to ANALYTICS OF DATA USING HADOOP-A REVIEW (20)

Big Data Fundamentals
Big Data FundamentalsBig Data Fundamentals
Big Data Fundamentals
 
Big data
Big dataBig data
Big data
 
The book of elephant tattoo
The book of elephant tattooThe book of elephant tattoo
The book of elephant tattoo
 
Big Data-Survey
Big Data-SurveyBig Data-Survey
Big Data-Survey
 
Big data
Big dataBig data
Big data
 
Big data and Hadoop overview
Big data and Hadoop overviewBig data and Hadoop overview
Big data and Hadoop overview
 
Big Data
Big DataBig Data
Big Data
 
Hadoop for beginners free course ppt
Hadoop for beginners   free course pptHadoop for beginners   free course ppt
Hadoop for beginners free course ppt
 
Whitepaper: Know Your Big Data – in 10 Minutes! - Happiest Minds
Whitepaper: Know Your Big Data – in 10 Minutes! - Happiest MindsWhitepaper: Know Your Big Data – in 10 Minutes! - Happiest Minds
Whitepaper: Know Your Big Data – in 10 Minutes! - Happiest Minds
 
Big data
Big dataBig data
Big data
 
Research paper on big data and hadoop
Research paper on big data and hadoopResearch paper on big data and hadoop
Research paper on big data and hadoop
 
Big data-ppt-
Big data-ppt-Big data-ppt-
Big data-ppt-
 
Big Data Hadoop
Big Data HadoopBig Data Hadoop
Big Data Hadoop
 
Big data
Big dataBig data
Big data
 
An Overview of BigData
An Overview of BigDataAn Overview of BigData
An Overview of BigData
 
Bigdata
Bigdata Bigdata
Bigdata
 
Big data-ppt
Big data-pptBig data-ppt
Big data-ppt
 
Lecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.pptLecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.ppt
 
Moving Toward Big Data: Challenges, Trends and Perspectives
Moving Toward Big Data: Challenges, Trends and PerspectivesMoving Toward Big Data: Challenges, Trends and Perspectives
Moving Toward Big Data: Challenges, Trends and Perspectives
 
Big Data Summarization : Framework, Challenges and Possible Solutions
Big Data Summarization : Framework, Challenges and Possible SolutionsBig Data Summarization : Framework, Challenges and Possible Solutions
Big Data Summarization : Framework, Challenges and Possible Solutions
 

More from International Journal of Technical Research & Application

More from International Journal of Technical Research & Application (20)

STUDY & PERFORMANCE OF METAL ON METAL HIP IMPLANTS: A REVIEW
STUDY & PERFORMANCE OF METAL ON METAL HIP IMPLANTS: A REVIEWSTUDY & PERFORMANCE OF METAL ON METAL HIP IMPLANTS: A REVIEW
STUDY & PERFORMANCE OF METAL ON METAL HIP IMPLANTS: A REVIEW
 
EXPONENTIAL SMOOTHING OF POSTPONEMENT RATES IN OPERATION THEATRES OF ADVANCED...
EXPONENTIAL SMOOTHING OF POSTPONEMENT RATES IN OPERATION THEATRES OF ADVANCED...EXPONENTIAL SMOOTHING OF POSTPONEMENT RATES IN OPERATION THEATRES OF ADVANCED...
EXPONENTIAL SMOOTHING OF POSTPONEMENT RATES IN OPERATION THEATRES OF ADVANCED...
 
POSTPONEMENT OF SCHEDULED GENERAL SURGERIES IN A TERTIARY CARE HOSPITAL - A T...
POSTPONEMENT OF SCHEDULED GENERAL SURGERIES IN A TERTIARY CARE HOSPITAL - A T...POSTPONEMENT OF SCHEDULED GENERAL SURGERIES IN A TERTIARY CARE HOSPITAL - A T...
POSTPONEMENT OF SCHEDULED GENERAL SURGERIES IN A TERTIARY CARE HOSPITAL - A T...
 
STUDY OF NANO-SYSTEMS FOR COMPUTER SIMULATIONS
STUDY OF NANO-SYSTEMS FOR COMPUTER SIMULATIONSSTUDY OF NANO-SYSTEMS FOR COMPUTER SIMULATIONS
STUDY OF NANO-SYSTEMS FOR COMPUTER SIMULATIONS
 
ENERGY GAP INVESTIGATION AND CHARACTERIZATION OF KESTERITE CU2ZNSNS4 THIN FIL...
ENERGY GAP INVESTIGATION AND CHARACTERIZATION OF KESTERITE CU2ZNSNS4 THIN FIL...ENERGY GAP INVESTIGATION AND CHARACTERIZATION OF KESTERITE CU2ZNSNS4 THIN FIL...
ENERGY GAP INVESTIGATION AND CHARACTERIZATION OF KESTERITE CU2ZNSNS4 THIN FIL...
 
POD-PWM BASED CAPACITOR CLAMPED MULTILEVEL INVERTER
POD-PWM BASED CAPACITOR CLAMPED MULTILEVEL INVERTERPOD-PWM BASED CAPACITOR CLAMPED MULTILEVEL INVERTER
POD-PWM BASED CAPACITOR CLAMPED MULTILEVEL INVERTER
 
DIGITAL COMPRESSING OF A BPCM SIGNAL ACCORDING TO BARKER CODE USING FPGA
DIGITAL COMPRESSING OF A BPCM SIGNAL ACCORDING TO BARKER CODE USING FPGADIGITAL COMPRESSING OF A BPCM SIGNAL ACCORDING TO BARKER CODE USING FPGA
DIGITAL COMPRESSING OF A BPCM SIGNAL ACCORDING TO BARKER CODE USING FPGA
 
MODELLING THE IMPACT OF FLOODING USING GEOGRAPHIC INFORMATION SYSTEM AND REMO...
MODELLING THE IMPACT OF FLOODING USING GEOGRAPHIC INFORMATION SYSTEM AND REMO...MODELLING THE IMPACT OF FLOODING USING GEOGRAPHIC INFORMATION SYSTEM AND REMO...
MODELLING THE IMPACT OF FLOODING USING GEOGRAPHIC INFORMATION SYSTEM AND REMO...
 
LI-ION BATTERY TESTING FROM MANUFACTURING TO OPERATION PROCESS
LI-ION BATTERY TESTING FROM MANUFACTURING TO OPERATION PROCESSLI-ION BATTERY TESTING FROM MANUFACTURING TO OPERATION PROCESS
LI-ION BATTERY TESTING FROM MANUFACTURING TO OPERATION PROCESS
 
QUALITATIVE RISK ASSESSMENT AND MITIGATION MEASURES FOR REAL ESTATE PROJECTS ...
QUALITATIVE RISK ASSESSMENT AND MITIGATION MEASURES FOR REAL ESTATE PROJECTS ...QUALITATIVE RISK ASSESSMENT AND MITIGATION MEASURES FOR REAL ESTATE PROJECTS ...
QUALITATIVE RISK ASSESSMENT AND MITIGATION MEASURES FOR REAL ESTATE PROJECTS ...
 
SCOPE OF REPLACING FINE AGGREGATE WITH COPPER SLAG IN CONCRETE- A REVIEW
SCOPE OF REPLACING FINE AGGREGATE WITH COPPER SLAG IN CONCRETE- A REVIEWSCOPE OF REPLACING FINE AGGREGATE WITH COPPER SLAG IN CONCRETE- A REVIEW
SCOPE OF REPLACING FINE AGGREGATE WITH COPPER SLAG IN CONCRETE- A REVIEW
 
IMPLEMENTATION OF METHODS FOR TRANSACTION IN SECURE ONLINE BANKING
IMPLEMENTATION OF METHODS FOR TRANSACTION IN SECURE ONLINE BANKINGIMPLEMENTATION OF METHODS FOR TRANSACTION IN SECURE ONLINE BANKING
IMPLEMENTATION OF METHODS FOR TRANSACTION IN SECURE ONLINE BANKING
 
EFFECT OF TRANS-SEPTAL SUTURE TECHNIQUE VERSUS NASAL PACKING AFTER SEPTOPLASTY
EFFECT OF TRANS-SEPTAL SUTURE TECHNIQUE VERSUS NASAL PACKING AFTER SEPTOPLASTYEFFECT OF TRANS-SEPTAL SUTURE TECHNIQUE VERSUS NASAL PACKING AFTER SEPTOPLASTY
EFFECT OF TRANS-SEPTAL SUTURE TECHNIQUE VERSUS NASAL PACKING AFTER SEPTOPLASTY
 
EVALUATION OF DRAINAGE WATER QUALITY FOR IRRIGATION BY INTEGRATION BETWEEN IR...
EVALUATION OF DRAINAGE WATER QUALITY FOR IRRIGATION BY INTEGRATION BETWEEN IR...EVALUATION OF DRAINAGE WATER QUALITY FOR IRRIGATION BY INTEGRATION BETWEEN IR...
EVALUATION OF DRAINAGE WATER QUALITY FOR IRRIGATION BY INTEGRATION BETWEEN IR...
 
THE CONSTRUCTION PROCEDURE AND ADVANTAGE OF THE RAIL CABLE-LIFTING CONSTRUCTI...
THE CONSTRUCTION PROCEDURE AND ADVANTAGE OF THE RAIL CABLE-LIFTING CONSTRUCTI...THE CONSTRUCTION PROCEDURE AND ADVANTAGE OF THE RAIL CABLE-LIFTING CONSTRUCTI...
THE CONSTRUCTION PROCEDURE AND ADVANTAGE OF THE RAIL CABLE-LIFTING CONSTRUCTI...
 
TIME EFFICIENT BAYLIS-HILLMAN REACTION ON STEROIDAL NUCLEUS OF WITHAFERIN-A T...
TIME EFFICIENT BAYLIS-HILLMAN REACTION ON STEROIDAL NUCLEUS OF WITHAFERIN-A T...TIME EFFICIENT BAYLIS-HILLMAN REACTION ON STEROIDAL NUCLEUS OF WITHAFERIN-A T...
TIME EFFICIENT BAYLIS-HILLMAN REACTION ON STEROIDAL NUCLEUS OF WITHAFERIN-A T...
 
A STUDY ON THE FRESH PROPERTIES OF SCC WITH FLY ASH
A STUDY ON THE FRESH PROPERTIES OF SCC WITH FLY ASHA STUDY ON THE FRESH PROPERTIES OF SCC WITH FLY ASH
A STUDY ON THE FRESH PROPERTIES OF SCC WITH FLY ASH
 
OPEN LOOP ANALYSIS OF CASCADED HBRIDGE MULTILEVEL INVERTER USING PDPWM FOR PH...
OPEN LOOP ANALYSIS OF CASCADED HBRIDGE MULTILEVEL INVERTER USING PDPWM FOR PH...OPEN LOOP ANALYSIS OF CASCADED HBRIDGE MULTILEVEL INVERTER USING PDPWM FOR PH...
OPEN LOOP ANALYSIS OF CASCADED HBRIDGE MULTILEVEL INVERTER USING PDPWM FOR PH...
 
PERFORMANCE ANALYSIS OF MICROSTRIP PATCH ANTENNA USING COAXIAL PROBE FEED TEC...
PERFORMANCE ANALYSIS OF MICROSTRIP PATCH ANTENNA USING COAXIAL PROBE FEED TEC...PERFORMANCE ANALYSIS OF MICROSTRIP PATCH ANTENNA USING COAXIAL PROBE FEED TEC...
PERFORMANCE ANALYSIS OF MICROSTRIP PATCH ANTENNA USING COAXIAL PROBE FEED TEC...
 
OVERVIEW OF TCP PERFORMANCE IN SATELLITE COMMUNICATION NETWORKS
OVERVIEW OF TCP PERFORMANCE IN SATELLITE COMMUNICATION NETWORKSOVERVIEW OF TCP PERFORMANCE IN SATELLITE COMMUNICATION NETWORKS
OVERVIEW OF TCP PERFORMANCE IN SATELLITE COMMUNICATION NETWORKS
 

Recently uploaded

ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbuapidays
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...apidays
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusZilliz
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 

Recently uploaded (20)

ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 

ANALYTICS OF DATA USING HADOOP-A REVIEW

  • 1. International Journal of Technical Research and Applications e-ISSN: 2320-8163, www.ijtra.com Volume 3, Issue 3 (May-June 2015), PP. 114-116 114 | P a g e ANALYTICS OF DATA USING HADOOP-A REVIEW Vineet Sajwan, Vikash Yadav MGM’s College of Engineering and Technology, Noida Sector-62 Vin94sajwan@gmail.com, 1109510067@coet.in ABSTRACT- In this paper, we discuss about the Big Data. We analyze and reveals the benefits of Big Data. We analyze the big data challenges and how Hadoop gives solution to it. This research paper gives the comparison between relational databases and Hadoop. This research paper also gives reason of why Big Data and Hadoop. General Terms Data Explosion, Big Data, Big Data Analytics, Hadoop, Hadoop Distributed File System, MapReduce Keywords Big Data, Big Data Analytics, Hadoop, Hadoop Distributed File System, Map Reduce. I. INTRODUCTION We live in the age of data .It is not easy to measure the volume of data. According to the IDC estimate the size of “Digital World” is about 0.16 zettabytes and forecasting the “Digital World” will grow by a factor of 300, from 130 Exabyte to 40,000 Exabyte. This flood of the data is coming from various sources around the world. Society is being surrounded by gadgets and as a results organization have to store huge amount of data .To manage and process that data is big challenge for the organization. The data from the social websites makes a big opportunities for the organization for understand the customer needs. This paradigm is known as Big Data. Big Data, two popular terms are here today for good reason. There is a big market in big future for Big Data. Company are starting to realize there untapped information and unstructured documents siting across the network. Analytics is the solution that mines unstructured and semi structured data and gives insights to the organization from not their privately acquired data but also from large amount of data is publicly available from various sources. We have emails, imagine we can data mine our emails and get meaningful results. The kind of information we get from our emails, documents, textiles, spreadsheet, and pdf. All these unstructured data siting across the network that contains answer which help us to create new products, refining existing product, discover trends, improve customer relation, and understand ourselves and help us to understand even our company better. Hadoop answer many big data challenges that we see today that is how to store terabytes and petabytes of information ,how to access the information quickly , how to work with variety of data and how to do you do all scalable fault tolerant and flexible way. There is a saying “More data usually beats better algorithms.” It means for some problem may be your algorithm solve that problem but it can be often be beaten by adding more data simply. The good news is big data is here and bad news is that we are struggling for storing and processing it. II. DATA EXPLOSION If we sum up the state of data in one word it is a lot. There’s a lot of data in our world.90% of worlds data is created in last two years that’s say big data is here .Through smart phones and tablets one can access data everywhere. Social media is everywhere Facebook, twitter, Instagram, tumbler the list goes on and on generating data with alarming rate which is known as data explosion. Data explosion is the explosion of unstructured and semi- structured data. Since the end of 90’s it’s been on terror .It is shows exponential growth with no sign of slowing down. Structured data are pretty manageable and predictable curve.Exmaples of the types of the data are:-  Unstructured Data: - Things like email, pdfs, documents on the hard drive, textiles.  Structured Data:-Everything has schema associated with it and checked against the schema when we put it into the database.  Semi-Structured:- Things that have some form of hierarchy or associated with table but nonetheless contain tags. Xml is a tag based format that describe data inside the tag in its hierarchical so got some structure. They don’t have scehma associated with it. III. BIG COMPANIES WITH BIG DATA A. Google In starting Google have to make internet searchable for they need to index a billion pages .So they built the technology called mapReduce along with GFS(Google File System) that’s what Hadoop is based on .Doug Cutting in 2003 join Yahoo and yahoo gave him the team of engineers. They built Hadoop based on those white papers the GFS and mapReduce. That’s the core technology of Hadoop i.e. HDFS (Hadoop distributed File System) and mapReduce. Google index billion pages using the same technology. Now today 60 Billion pages google indexes to make searchable for the internet in very less seconds. B. Facebook Facebook has the largest Hadoop cluster on this planet. They have the cluster that contain of the 100 petabytes of data and they have generated half petabyte of data in one day. Anything everybody does on Facebook from login to clicking to liking and sharing something is tracked on Facebook and thats how they use the data to target views. They are looking what we are clicking what our friends are clicking and find the commonalities them something called collaborative filtering which they use for targeting. C. Amazon In Amazon, recommendation system is based on some elements like what a user had bought in the past, items they have in their virtual shopping and item they have rated and liked and what other customer had viewed or purchased. Behind the recommendation system amazon’s collaborative filtering algorithm works. This is the reason of Amazon’s success.
  • 2. International Journal of Technical Research and Applications e-ISSN: 2320-8163, www.ijtra.com Volume 3, Issue 3 (May-June 2015), PP. 114-116 115 | P a g e D. Twitter 400 million a day which equals to 15 terabytes of data every day. They have thousands of Hadoop clusters setup they run thousands of mapReduce jobs every night to discover trends and they sell those trends to the people who makes product so can they target those trends E. General Motors General Motors are car manufacture of America .In 2012, they cut off multibillion dollar contract with HP for outsourcing .They are building 220000 square feet warehouses in which Hadoop were installed. (i) GM combines the Global Information System (GIS) and data analytics to improve the performance of their dealer. These analytics shared are shared with the dealers who now can view local demographics, location characteristics and customer satisfaction. (ii) With the marketing budget of $2 billion per year GM have lots of potential customer. But instead of mass targeting the public they use big data analytics to create the customer profile.GM knows who the buyers are that want luxury cars and where they are. (iii) GM focuses on sensors and telematics. For GM Big Data saves 800$ per car in GM telematics .Which means saving lie in the connected cars that can communicate with manufacturer via 4G. IV. 5 V’S OF BIG DATA It is the 5 big reason we don’t use traditional computing model which is big expensive trickled down machines with lots of process, cords, hard drives read enable to the processes and fault tolerance. It is the hardware solution and we don’t use hardware solution and also we can’t use the relational world. Relational world is use to design to handle gigabyte of data not petabytes of data. The big data breaks into 5 dimensions that is volume, velocity, variety, value and veracity. A. Volume Volume refers to the data generated every second. We are not talking about terabytes but Zettabytes. If we take all the data generated in the world between beginning time and 2008 the same amount of data generated every minute. This makes most data sets too large to store and analyses using traditional database technology. Hadoop uses use distributed system so we can store and analyses data across the database around anywhere in the world. B. Velocity Velocity is the speed which we access the data. In traditional computing models no matter how fast our computer is our process is still bound by disk I/O because our disk transfer is having involved in newly pace of processing power. So that’s why distributed computing is more useful and Hadoop increases strength of current computing world. It brings computation to the data. It doesn’t brings data to the computation. It can process the data locally and when we have cluster of nodes to all working together using and harnessing the power of processor and reducing network bandwidth and navigating the weakness of disk transfer. Yahoo break the record of processing a terabyte of data many times using Hadoop. C. Variety The relational model can only handle structure data they can get semi structured and unstructured data. Some engineers they have ETL tool to transform and scrub the data to bring it but that require a lot of extra work .With the help of Hadoop we can now analyses and bring together data of different type such as messages and social media conversations, photos, sensor data, video and voice recording. D. Value Having access to the Big data is no good unless we can turn it into the value. Companies are starting to generate amazing value from their big data. E. Veracity Veracity refers to the uncertainty surrounding of data which is due to the inconsistence and incompleteness of data. It is the major challenge to keeping organized data. V. HADOOP Hadoop is distributed software solution. It is scalable, fault tolerant distributed system for data storage and processing. There is two component of Hadoop that is HDFS which is the storage and mapReduce which is the retrieval and organising.  HDFS is self-healing high bandwidth cluster storage. If we put petabyte file in Hadoop cluster. Hadoop would break it up in the blocks and distributed across all the nodes in a cluster where its fault tolerant function come into play. When we setup HDFS we setup replication factor ( by default it sets 3).When we put a file in HDFS it’s going to make sure there are three copies of every block that make up that file spread across the node in a cluster. If we lose a node it going to self heal because it knows where is data on that node It re-replicate the blocks ever on that node to the rest of the subversion sites of the cluster. It has the name node and the data node. Generally one node is name node in a cluster and rest of them are data node. Name node is the meta data server it just hold N memory and knows where are blocks exist on what node on what rack spread across the cluster inside the network. That’s how Hadoop is fault tolerant.  mapReduce is the two-step process at the service. Programmer writes the mapper function which will goes out and tell the cluster what data we want to retrieve. The reducer takes all the data and aggregate. Hadoop is batch processing based system. We working on all of the data. mapReduce is all about working on all of the data inside cluster. Hadoop is flexible because there is no need to understand javato get data of cluster. In Facebook, engineers built Hive because Facebook doesn’t to force anybody to learn java for data extraction. Now anybody is familiar with sequel which most of professional are can pull data out of the cluster. Hadoop is scalable because we keep adding more data into the cluster by add more node which increases the overall performance of cluster.
  • 3. International Journal of Technical Research and Applications e-ISSN: 2320-8163, www.ijtra.com Volume 3, Issue 3 (May-June 2015), PP. 114-116 116 | P a g e VI. PROBLEM DEFINITION As there exist large amount of data, the various challenges are faced about the management of such extensive data like unstructured data, fault tolerance and issues regarding storage of big amount of data. A. Problem Description The problem is simple: the storage capacities of hard drivers increased massively over the years-the rate at which data can read from drives have not kept up. In 1990, one typical drive can store 1,370 MB of data and had transfer speed of 4.4 MB/s, so you could read the full data in about 5 minutes. 20 years later, one typical hard drives can store terabytes but speed is around 4.4MB/s so it need half hour to read full data of the disk. B. Problem Solution The obvious way to reduce time is multiple disks at once. Imagine if we have 100 drives working in parallel we can read the data in 2 minutes. Only 100 disks is wasteful .But we can store datasets, each of which is one terabyte and provide access to them. C. Problem to Solve The first problem to solve is hardware failure:-The chance of failure increases as we connected the drivers. A common way of avoiding data loss is through replication. The Hadoop Distributed Filesystem (HDFS). The second problem to solve is analysis task need to combine in some way:-The data read to one disk need to be able to combine from any of 99 disks. Various distribution system allow this but doing this is very challenging. MapReduce provides a programming model that abstracts the model from disk read and writes, transforming it into computation over sets of keys and values. VII. HADOOP VS RDBMS Hadoop is intelligent as we seen this in the computation of data maximizing the strength of todays computing world and navigating the weaknesses. In a multi rack environment there is a switch which is rack aware. It knows what node belongs to which rack. In a more data locality whenever receive mapReduce job it find the shortest path to the data as possible. Traditional RDBMS tools mapReduce Data Size In Gigabytes In petabytes Access Batch and interactive Similar Update Read and Write many times Write once and read many times Structure Fixed schema Unstructured schema Latency Low High Integrity High Low Language SQL Procedural Language (C++,Java) VIII. CONCLUSION We live in the data world. There are various challenges in big data .there are various issue too. We came into conclusion that big data is not just huge data it is an opportunity to find new insights and analysis of our future data. We have discuss various parameters of big data. We also have discuss about comparison between RDBMS and MapReduce. There must be support and research on big data for getting new results. MapReduce is good for analyze huge dataset in a batch function where RDBMS is good for queries and update. REFERENCES [1] Shipa, Manjit kaur, “Big Data and Methodology”, 10 Oct, 2013 [2] Pareedpa, A.; Dr. Antony Selvadoss, “Significant Trends of Big Data”, 8 Aug, 2013 [3] Gurpeet Singh Bedi, Ashima, “Big Data Analysis with Dataset Scaling in Yet another Resource Negotiator (YARN)”, 5April, 2013 [4] Hadoop-The Definitive Guide, Tom White, Edition-3, 27Jan, 2012 [5] Mrigank Mridul, Akashdeep Khajuria,Snehasish Dutta,Kumar N, “Analysis of Big data using Apache Hadoop and MapReduce”,Volume 4, May 2014 [6] IBM 2012, What is big data: Bring big data to the enterprise,http://www.01.ibm.com/software/data/bigdata, IBM [7] Sam Madden, “From Databases to Big Data”,IEEE computer society,2012 [8] “Data Mining with BigData” ,Xindong Wu, Xingquan Zhu, Gong-Qing Wu, Wei Ding , 1041-4347/13/$31.00 © 2013 IEEE [9] Russom, “Big Data Analytics” , TDWI Research,2011 [10] An Oracle White Paper, “Hadoop and NoSQL Technologies and the Oracle DataBase”,February 2012