SlideShare a Scribd company logo
IOSR Journal of Computer Engineering (IOSR-JCE)
e-ISSN: 2278-0661,p-ISSN: 2278-8727, Volume 18, Issue 3, Ver. IV (May-Jun. 2016), PP 125-129
www.iosrjournals.org
DOI: 10.9790/0661-180304125129 www.iosrjournals.org 125 | Page
I++
Mapreduce: Incremental Mapreduce for Mining the Big Data
Prof. Kanchan Dhomse B.1
1
(IT, METS BKC IOE, NASHIK, India)
Abstract: Data mining is an interdisciplinary area of computer engineering. Incremental processing is a
challenging approach to refreshing mining results and it uses saved states to avoid the cost of re computation
from damage. Here, I propose i2
MapReduce is a novel incremental processing expansion to MapReduce in data
mining. As compared with the state-of-the-art work on Incoop, instead of using task level re-computation the
i2
MapReduce executes the key-value pair level incremental processing . It helps not only one-step computation
but also more sophisticated iterative computation. Then it incorporates a set of novel techniques to decrease
input output level for accessing dedicated fine-grain computation states in the data mining. The final results on
Amazon EC2 describe the significant performance development of i2
MapReduce as compared to iterative and
plain MapReduce performing re computation.
Keywords: Big data, Incremental Processing, Mapreduce, MRBGraph, Recompilation.
I. Introduction
Recently, huge amount of digital data is being assembled in many important areas like social network,
e-commerce, education, finance, health care and environment. It has become increasingly popular to mine such
big data in order to obtain outputs to help business decisions or to give better personalized and higher quality
services in data mining. Recently a huge number of computing frameworks [1-10] have been developed for big
data analysis in data mining. So, among these frameworks the MapReduce [1] with its open-source
implementations, such as Hadoop is the mostly used in production due to its simplicity and generality and
maturity. Here, I focus on improving MapReduce performance. Actually, Big data is constantly increasing new
updates are being collected and the input data of a big data mining algorithm will slowely change. Generally, it
is desirable to periodically updates the mining computation in order to keep the mining results accurate. Here,
some are examples like the PageRank algorithm [11] evaluates ranking scores of web pages based on the web
graph structure for supporting web search in data mining. Here, I explain the nature of the problem and previous
work, purpose, and the contribution of the paper which shows the structure of mapreduce. The incremental
processing utilize the fact that the input data of two subsequent computations are similar . Only a very small
fraction of the input data has changed. Here, system observe and find the realization of this principle in the
context of the MapReduce computing framework in data mining.
Fig.1: computation of Map Reduce
II. Implementation
1. Mapreduce Background
Generally, a MapReduce program is made up of a Map function and a Reduce function [1], as shown in
Fig. 1. Their APIs are as shown below:
Map(k1,v1)-> (<k2,v2>)
Reduce(k2,v2)->(<k3,v3>)
Here, the Map function enters a kv-pair <K1, V1> as input and evaluate zero or more middle kv-pairs
<K2,V2> in mapreduce function. Now all <K2, V2> are grouped by K2 and The Reduce function inserts a K2
and a list of V 2 as input and computes the final output kv-pairs <K3,V3> in mapreduce. So, a MapReduce
I++
Mapreduce: Incremental Mapreduce for Mining The Big Data
DOI: 10.9790/0661-180304125129 www.iosrjournals.org 126 | Page
normally reads the input data of the MapReduce evaluation from and writes the final results to a distributed file
system which distributes a file into equal-sized i.e. 64 MB blocks and collects the blocks across a cluster of
machines in data mining. The mapreduce system executes a JobTracker process on a master node to concentrate
the job progress and a set of Task Tracker processes on worker nodes to implement the actual Map and Reduce
tasks. Then, the Job Tracker starts a Map task per data block and typically assigns it to the TaskTracker on the
machine that holds the corresponding data block in order to decrease communication. At each Map task calls the
map function for every input <K1, V1> pair and stores the middle kv-pairs <K2, V2> on local disks in
mapreduce function. The intermediate results are shuffled to reduce tasks according to a divides the function on
K2 in pair. So, after a reduce task gets and merges middle pair’s results from all map tasks then it invokes the
reduce function on every <K2, V 2> pair to create the final output kv-pairs <K3, V3>among all pairs.
2 Fine- Grain Incremental Processing For One-Step Computation
Here, we start by describing the basic idea of fine-grain incremental processing in the mapreduce
function . it shows the main design including the MRBGraph abstraction and the incremental processing engine
in mapreduce process. Here we divide it into two aspects of the design. i.e. the mechanism that maintain the
fine-grain states and the handling of a special case where the Reduce function performs collection function.
2.1 MRBGraph Abstraction
We use a MRBGraph i.e. Map Reduce Bipartite Graph is the abstraction to model the data flow in
MapReduce function. As shown in Fig. 2(a) every vertex in the Map task represents an single Map function call
instance on a pair of <K1,V1> nodes. At every vertex in the Reduce task represents an individual Reduce
function call instance on a group of <K2; V 2>. The range from a Map instance to a Reduce instance means that
the Map instance creates a <K2,V2> pair that is shuffled to become part of the input to the Reduce instance of
pair. Here, for example the input of reduce instance getting from map instance 0, 2, and 4 respectively.
Therefore, i2
MapReduce will maintain (K2, MK, V 2) for each MRBGraph edge in given function.
Fig. 2. MRB Graph
2.2 Fine-Grain Incremental Processing Engine
Here, figure shows the fine-grain incremental function engine with an example application which
evaluates the addition of in-edge weights for each vertex in a graph of MRBG. So, as shown the input data i.e.
the graph structure includes over time in running process. We present and explains that how the engine
performs incremental processing to update the analysis results in mapreduce techniques.
III. Algorithms
In data mining and analysis the fundamental algorithms form field of data science which includes
default methods to analyze patterns and models for all kinds of data with applications ranging from scientific
discovery to business intelligence and analytics at the time of algorithm selection [8].
3 General-Purpose Support For Iterative Computation
3.1 Analyzing Iterative Computation :
PageRank is a iterative graph algorithm for ranking web pages in data mining. So, it evaluates a ranking score
for each vertex in a graph in algorithm.
I++
Mapreduce: Incremental Mapreduce for Mining The Big Data
DOI: 10.9790/0661-180304125129 www.iosrjournals.org 127 | Page
Algorithm 1: PageRank in MapReduce
Kmeans : The Kmean is a generally used clustering algorithm that partitions points into k clusters .
Fig. 3. Incremental processing for an application
Algorithm 2: kmeans in MapReduce
4 Incremental Iterative Processing
Here, we shows the incremental processing techniques for iterative evaluation. But it is not sufficient
to simply gather the above solutions for incremental one step processing and iterative computation given
below we discuss 3 main aspects that we noticed in order to obtaine an effective design of i2
mapreduce.
Fig. 6. Iterative model of i2MapReduce.
I++
Mapreduce: Incremental Mapreduce for Mining The Big Data
DOI: 10.9790/0661-180304125129 www.iosrjournals.org 128 | Page
IV. Result Analysis
As we described all the algorithms which are helpful for i2
mapreduce Solutions. Actually, our
experiments compare four solutions initially PlainMR recomp re-computation on vanilla Hadoop then iterMR
recomp and re-computation on Hadoop optimized for iterative evaluation. Secondly, hadoop re-computation on
the iterative Mapreduce framework Hadoop which controls mapreduce by providing a structure data caching
mechanism then in i2
Mapreduce our proposed solution is to best of our knowledge the task-level coarse-grain
incremental processing system. Here, Incoop is not normally present that’s why we can’t compare
i2
MapReduce with Incoop technique. So, the statistics describes that without careful data partition almost all
tasks see changes in the experiments and making task-level incremental processing less important than the other.
Fig. 7. Run time of individual stages in PageRank.
Fig. 8. Normalized runtime.
V. Conclusion
In data mining includes mapreduce function described i2
MapReduce technique. A MapReduce based
framework for incremental big data process is used for i2
mapreduce function. I2
MapReduce collects a fine-grain
incremental engine and a general-purpose iterative model also a set of useful techniques for incremental iterative
evaluation. The real-machine experiments describes that i2
MapReduce can significantly decrease the run time
for updating the big data mining results as compared to the recomputation on both plain and iterative
MapReduce.
Acknowledgements
We especially thanks to our department and our Institute for the great support regarding paper and their
all views. I really thankful to my all staffs and our H.O.D. Prof. N.R. kale. who showed me the way of
successful journey of publishing paper.
I++
Mapreduce: Incremental Mapreduce for Mining The Big Data
DOI: 10.9790/0661-180304125129 www.iosrjournals.org 129 | Page
References
Journal Papers
[1]. J. Dean and S. Ghemawat, “ Mapreduce: Simplified data processing on large clusters,” in Proc. 6th Conf. Symp. Opear. Syst.
Des.Implementation, 2004, p. 10.
[2]. M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin, S. Shenker, and I. Stoica, “Resilient distributed
Datasets : A fault-tolerant abstraction for, in-memory cluster computing,” in Proc. 9th USENIX Conf. Netw. Syst. Des.
Implementation,2012, p. 2.
[3]. R. Power and J. Li, “Piccolo: Building fast, distributed programs with partitioned tables,” in Proc. 9th USENIX Conf. Oper. Syst.
Des.Implementation, 2010, pp. 1–14.
[4]. G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski, “Pregel: A system for large-scale
Graph processing,” in Proc. ACM SIGMOD Int. Conf. Manage. Data, 2010, pp. 135–146.
[5]. S. R. Mihaylov, Z. G. Ives, and S. Guha, “Rex: Recursive, deltabased data-centric computation,” in Proc. VLDB Endowment,
2012,vol. 5, no. 11, pp. 1280–1291.
[6]. Y. Low, D. Bickson, J. Gonzalez, C. Guestrin, A. Kyrola, and J. M. Hellerstein, “Distributed graphlab: A framework for machine
Learning and data mining in the cloud,” in Proc. VLDB Endowment, 2012, vol. 5, no. 8, pp. 716–727.
Proceedings Papers
[7]. S. Ewen, K. Tzoumas, M. Kaufmann, and V. Markl, “Spinning fast iterative data flows,” in Proc. VLDB Endowment, 2012, vol.
5,no. 11, pp. 1268–1279.
Books
[8]. Mohammed j. Zaki’s and Wagner Meira jr.’s Data mining and Analysis (Cambridge university , 2014 )

More Related Content

What's hot

Scrutinizing mapreduce mechanism with orientation to refine the productivity
Scrutinizing mapreduce mechanism with orientation to refine the productivityScrutinizing mapreduce mechanism with orientation to refine the productivity
Scrutinizing mapreduce mechanism with orientation to refine the productivity
eSAT Journals
 
IJET-V3I2P24
IJET-V3I2P24IJET-V3I2P24
Scheduling Divisible Jobs to Optimize the Computation and Energy Costs
Scheduling Divisible Jobs to Optimize the Computation and Energy CostsScheduling Divisible Jobs to Optimize the Computation and Energy Costs
Scheduling Divisible Jobs to Optimize the Computation and Energy Costs
inventionjournals
 
Sawmill - Integrating R and Large Data Clouds
Sawmill - Integrating R and Large Data CloudsSawmill - Integrating R and Large Data Clouds
Sawmill - Integrating R and Large Data Clouds
Robert Grossman
 
A Big-Data Process Consigned Geographically by Employing Mapreduce Frame Work
A Big-Data Process Consigned Geographically by Employing Mapreduce Frame WorkA Big-Data Process Consigned Geographically by Employing Mapreduce Frame Work
A Big-Data Process Consigned Geographically by Employing Mapreduce Frame Work
IRJET Journal
 
Reduce Side Joins
Reduce Side Joins Reduce Side Joins
Reduce Side Joins
Edureka!
 
A Brief on MapReduce Performance
A Brief on MapReduce PerformanceA Brief on MapReduce Performance
A Brief on MapReduce Performance
AM Publications
 
Gis rainfallstudy
Gis rainfallstudyGis rainfallstudy
Gis rainfallstudy
steven_uk
 
Enhancing Big Data Analysis by using Map-reduce Technique
Enhancing Big Data Analysis by using Map-reduce TechniqueEnhancing Big Data Analysis by using Map-reduce Technique
Enhancing Big Data Analysis by using Map-reduce Technique
journalBEEI
 
"MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ...
"MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ..."MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ...
"MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ...
Adrian Florea
 
MapReduce: Simplified Data Processing On Large Clusters
MapReduce: Simplified Data Processing On Large ClustersMapReduce: Simplified Data Processing On Large Clusters
MapReduce: Simplified Data Processing On Large Clusters
kazuma_sato
 
IRJET-An Efficient Technique to Improve Resources Utilization for Hadoop Mapr...
IRJET-An Efficient Technique to Improve Resources Utilization for Hadoop Mapr...IRJET-An Efficient Technique to Improve Resources Utilization for Hadoop Mapr...
IRJET-An Efficient Technique to Improve Resources Utilization for Hadoop Mapr...
IRJET Journal
 
An Approach to Reduce Energy Consumption in Cloud data centers using Harmony ...
An Approach to Reduce Energy Consumption in Cloud data centers using Harmony ...An Approach to Reduce Energy Consumption in Cloud data centers using Harmony ...
An Approach to Reduce Energy Consumption in Cloud data centers using Harmony ...
ijccsa
 
PRETZEL: Opening the Black Box of Machine Learning Prediction Serving Systems
PRETZEL: Opening the Black Box of Machine Learning Prediction Serving SystemsPRETZEL: Opening the Black Box of Machine Learning Prediction Serving Systems
PRETZEL: Opening the Black Box of Machine Learning Prediction Serving Systems
NECST Lab @ Politecnico di Milano
 
D0212326
D0212326D0212326
133
133133
PROCESS OF LOAD BALANCING IN CLOUD COMPUTING USING GENETIC ALGORITHM
PROCESS OF LOAD BALANCING IN CLOUD COMPUTING USING GENETIC ALGORITHMPROCESS OF LOAD BALANCING IN CLOUD COMPUTING USING GENETIC ALGORITHM
PROCESS OF LOAD BALANCING IN CLOUD COMPUTING USING GENETIC ALGORITHM
ecij
 
Optimization of energy consumption in cloud computing datacenters
Optimization of energy consumption in cloud computing datacenters Optimization of energy consumption in cloud computing datacenters
Optimization of energy consumption in cloud computing datacenters
IJECEIAES
 
Statistical power consumption analysis and modeling
Statistical power consumption analysis and modelingStatistical power consumption analysis and modeling
Statistical power consumption analysis and modeling
nadikari123
 
A Review on Scheduling in Cloud Computing
A Review on Scheduling in Cloud ComputingA Review on Scheduling in Cloud Computing
A Review on Scheduling in Cloud Computing
ijujournal
 

What's hot (20)

Scrutinizing mapreduce mechanism with orientation to refine the productivity
Scrutinizing mapreduce mechanism with orientation to refine the productivityScrutinizing mapreduce mechanism with orientation to refine the productivity
Scrutinizing mapreduce mechanism with orientation to refine the productivity
 
IJET-V3I2P24
IJET-V3I2P24IJET-V3I2P24
IJET-V3I2P24
 
Scheduling Divisible Jobs to Optimize the Computation and Energy Costs
Scheduling Divisible Jobs to Optimize the Computation and Energy CostsScheduling Divisible Jobs to Optimize the Computation and Energy Costs
Scheduling Divisible Jobs to Optimize the Computation and Energy Costs
 
Sawmill - Integrating R and Large Data Clouds
Sawmill - Integrating R and Large Data CloudsSawmill - Integrating R and Large Data Clouds
Sawmill - Integrating R and Large Data Clouds
 
A Big-Data Process Consigned Geographically by Employing Mapreduce Frame Work
A Big-Data Process Consigned Geographically by Employing Mapreduce Frame WorkA Big-Data Process Consigned Geographically by Employing Mapreduce Frame Work
A Big-Data Process Consigned Geographically by Employing Mapreduce Frame Work
 
Reduce Side Joins
Reduce Side Joins Reduce Side Joins
Reduce Side Joins
 
A Brief on MapReduce Performance
A Brief on MapReduce PerformanceA Brief on MapReduce Performance
A Brief on MapReduce Performance
 
Gis rainfallstudy
Gis rainfallstudyGis rainfallstudy
Gis rainfallstudy
 
Enhancing Big Data Analysis by using Map-reduce Technique
Enhancing Big Data Analysis by using Map-reduce TechniqueEnhancing Big Data Analysis by using Map-reduce Technique
Enhancing Big Data Analysis by using Map-reduce Technique
 
"MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ...
"MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ..."MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ...
"MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ...
 
MapReduce: Simplified Data Processing On Large Clusters
MapReduce: Simplified Data Processing On Large ClustersMapReduce: Simplified Data Processing On Large Clusters
MapReduce: Simplified Data Processing On Large Clusters
 
IRJET-An Efficient Technique to Improve Resources Utilization for Hadoop Mapr...
IRJET-An Efficient Technique to Improve Resources Utilization for Hadoop Mapr...IRJET-An Efficient Technique to Improve Resources Utilization for Hadoop Mapr...
IRJET-An Efficient Technique to Improve Resources Utilization for Hadoop Mapr...
 
An Approach to Reduce Energy Consumption in Cloud data centers using Harmony ...
An Approach to Reduce Energy Consumption in Cloud data centers using Harmony ...An Approach to Reduce Energy Consumption in Cloud data centers using Harmony ...
An Approach to Reduce Energy Consumption in Cloud data centers using Harmony ...
 
PRETZEL: Opening the Black Box of Machine Learning Prediction Serving Systems
PRETZEL: Opening the Black Box of Machine Learning Prediction Serving SystemsPRETZEL: Opening the Black Box of Machine Learning Prediction Serving Systems
PRETZEL: Opening the Black Box of Machine Learning Prediction Serving Systems
 
D0212326
D0212326D0212326
D0212326
 
133
133133
133
 
PROCESS OF LOAD BALANCING IN CLOUD COMPUTING USING GENETIC ALGORITHM
PROCESS OF LOAD BALANCING IN CLOUD COMPUTING USING GENETIC ALGORITHMPROCESS OF LOAD BALANCING IN CLOUD COMPUTING USING GENETIC ALGORITHM
PROCESS OF LOAD BALANCING IN CLOUD COMPUTING USING GENETIC ALGORITHM
 
Optimization of energy consumption in cloud computing datacenters
Optimization of energy consumption in cloud computing datacenters Optimization of energy consumption in cloud computing datacenters
Optimization of energy consumption in cloud computing datacenters
 
Statistical power consumption analysis and modeling
Statistical power consumption analysis and modelingStatistical power consumption analysis and modeling
Statistical power consumption analysis and modeling
 
A Review on Scheduling in Cloud Computing
A Review on Scheduling in Cloud ComputingA Review on Scheduling in Cloud Computing
A Review on Scheduling in Cloud Computing
 

Viewers also liked

A1804010105
A1804010105A1804010105
A1804010105
IOSR Journals
 
Practical Investigation of the Environmental Hazards of Idle Time and Speed o...
Practical Investigation of the Environmental Hazards of Idle Time and Speed o...Practical Investigation of the Environmental Hazards of Idle Time and Speed o...
Practical Investigation of the Environmental Hazards of Idle Time and Speed o...
IOSR Journals
 
F010423541
F010423541F010423541
F010423541
IOSR Journals
 
J010627176.iosr jece
J010627176.iosr jeceJ010627176.iosr jece
J010627176.iosr jece
IOSR Journals
 
F017633538
F017633538F017633538
F017633538
IOSR Journals
 
A017260110
A017260110A017260110
A017260110
IOSR Journals
 
L017136269
L017136269L017136269
L017136269
IOSR Journals
 
Cloud Computing for E-Commerce
Cloud Computing for E-CommerceCloud Computing for E-Commerce
Cloud Computing for E-Commerce
IOSR Journals
 
I1304026367
I1304026367I1304026367
I1304026367
IOSR Journals
 
Leather Quality Estimation Using an Automated Machine Vision System
Leather Quality Estimation Using an Automated Machine Vision SystemLeather Quality Estimation Using an Automated Machine Vision System
Leather Quality Estimation Using an Automated Machine Vision System
IOSR Journals
 
N017117276
N017117276N017117276
N017117276
IOSR Journals
 
I017144954
I017144954I017144954
I017144954
IOSR Journals
 
N010237982
N010237982N010237982
N010237982
IOSR Journals
 
S0102297102
S0102297102S0102297102
S0102297102
IOSR Journals
 
J010126267
J010126267J010126267
J010126267
IOSR Journals
 
D010121832
D010121832D010121832
D010121832
IOSR Journals
 
A Generalization of QN-Maps
A Generalization of QN-MapsA Generalization of QN-Maps
A Generalization of QN-Maps
IOSR Journals
 
C1803061620
C1803061620C1803061620
C1803061620
IOSR Journals
 
Providing The Security Against The DDOS Attack In Mobile Ad Hoc Networks
Providing The Security Against The DDOS Attack In Mobile Ad Hoc NetworksProviding The Security Against The DDOS Attack In Mobile Ad Hoc Networks
Providing The Security Against The DDOS Attack In Mobile Ad Hoc Networks
IOSR Journals
 
Safety Features Evaluation of the Vehicle Interior Using Finite Element Analy...
Safety Features Evaluation of the Vehicle Interior Using Finite Element Analy...Safety Features Evaluation of the Vehicle Interior Using Finite Element Analy...
Safety Features Evaluation of the Vehicle Interior Using Finite Element Analy...
IOSR Journals
 

Viewers also liked (20)

A1804010105
A1804010105A1804010105
A1804010105
 
Practical Investigation of the Environmental Hazards of Idle Time and Speed o...
Practical Investigation of the Environmental Hazards of Idle Time and Speed o...Practical Investigation of the Environmental Hazards of Idle Time and Speed o...
Practical Investigation of the Environmental Hazards of Idle Time and Speed o...
 
F010423541
F010423541F010423541
F010423541
 
J010627176.iosr jece
J010627176.iosr jeceJ010627176.iosr jece
J010627176.iosr jece
 
F017633538
F017633538F017633538
F017633538
 
A017260110
A017260110A017260110
A017260110
 
L017136269
L017136269L017136269
L017136269
 
Cloud Computing for E-Commerce
Cloud Computing for E-CommerceCloud Computing for E-Commerce
Cloud Computing for E-Commerce
 
I1304026367
I1304026367I1304026367
I1304026367
 
Leather Quality Estimation Using an Automated Machine Vision System
Leather Quality Estimation Using an Automated Machine Vision SystemLeather Quality Estimation Using an Automated Machine Vision System
Leather Quality Estimation Using an Automated Machine Vision System
 
N017117276
N017117276N017117276
N017117276
 
I017144954
I017144954I017144954
I017144954
 
N010237982
N010237982N010237982
N010237982
 
S0102297102
S0102297102S0102297102
S0102297102
 
J010126267
J010126267J010126267
J010126267
 
D010121832
D010121832D010121832
D010121832
 
A Generalization of QN-Maps
A Generalization of QN-MapsA Generalization of QN-Maps
A Generalization of QN-Maps
 
C1803061620
C1803061620C1803061620
C1803061620
 
Providing The Security Against The DDOS Attack In Mobile Ad Hoc Networks
Providing The Security Against The DDOS Attack In Mobile Ad Hoc NetworksProviding The Security Against The DDOS Attack In Mobile Ad Hoc Networks
Providing The Security Against The DDOS Attack In Mobile Ad Hoc Networks
 
Safety Features Evaluation of the Vehicle Interior Using Finite Element Analy...
Safety Features Evaluation of the Vehicle Interior Using Finite Element Analy...Safety Features Evaluation of the Vehicle Interior Using Finite Element Analy...
Safety Features Evaluation of the Vehicle Interior Using Finite Element Analy...
 

Similar to T180304125129

A simulation-based approach for straggler tasks detection in Hadoop MapReduce
A simulation-based approach for straggler tasks detection in Hadoop MapReduceA simulation-based approach for straggler tasks detection in Hadoop MapReduce
A simulation-based approach for straggler tasks detection in Hadoop MapReduce
IRJET Journal
 
IJET-V3I1P27
IJET-V3I1P27IJET-V3I1P27
Big Data.pptx
Big Data.pptxBig Data.pptx
Big Data.pptx
NelakurthyVasanthRed1
 
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdf
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdfmodule3part-1-bigdata-230301002404-3db4f2a4 (1).pdf
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdf
TSANKARARAO
 
Generating Frequent Itemsets by RElim on Hadoop Clusters
Generating Frequent Itemsets by RElim on Hadoop ClustersGenerating Frequent Itemsets by RElim on Hadoop Clusters
Generating Frequent Itemsets by RElim on Hadoop Clusters
BRNSSPublicationHubI
 
Mapreduce2008 cacm
Mapreduce2008 cacmMapreduce2008 cacm
Mapreduce2008 cacm
lmphuong06
 
Introduction to map reduce s. jency jayastina II MSC COMPUTER SCIENCE BON SEC...
Introduction to map reduce s. jency jayastina II MSC COMPUTER SCIENCE BON SEC...Introduction to map reduce s. jency jayastina II MSC COMPUTER SCIENCE BON SEC...
Introduction to map reduce s. jency jayastina II MSC COMPUTER SCIENCE BON SEC...
jencyjayastina
 
On Traffic-Aware Partition and Aggregation in Map Reduce for Big Data Applica...
On Traffic-Aware Partition and Aggregation in Map Reduce for Big Data Applica...On Traffic-Aware Partition and Aggregation in Map Reduce for Big Data Applica...
On Traffic-Aware Partition and Aggregation in Map Reduce for Big Data Applica...
dbpublications
 
MapReduce Paradigm
MapReduce ParadigmMapReduce Paradigm
MapReduce Paradigm
NilaNila16
 
Enhancing Performance and Fault Tolerance of Hadoop Cluster
Enhancing Performance and Fault Tolerance of Hadoop ClusterEnhancing Performance and Fault Tolerance of Hadoop Cluster
Enhancing Performance and Fault Tolerance of Hadoop Cluster
IRJET Journal
 
Review on Multiply-Accumulate Unit
Review on Multiply-Accumulate UnitReview on Multiply-Accumulate Unit
Review on Multiply-Accumulate Unit
IJERA Editor
 
Hadoop scheduler with deadline constraint
Hadoop scheduler with deadline constraintHadoop scheduler with deadline constraint
Hadoop scheduler with deadline constraint
ijccsa
 
Paper id 25201467
Paper id 25201467Paper id 25201467
Paper id 25201467
IJRAT
 
ON TRAFFIC-AWARE PARTITION AND AGGREGATION IN MAPREDUCE FOR BIG DATA APPLICAT...
ON TRAFFIC-AWARE PARTITION AND AGGREGATION IN MAPREDUCE FOR BIG DATA APPLICAT...ON TRAFFIC-AWARE PARTITION AND AGGREGATION IN MAPREDUCE FOR BIG DATA APPLICAT...
ON TRAFFIC-AWARE PARTITION AND AGGREGATION IN MAPREDUCE FOR BIG DATA APPLICAT...
I3E Technologies
 
Finalprojectpresentation
FinalprojectpresentationFinalprojectpresentation
Finalprojectpresentation
SANTOSH WAYAL
 
Dynamically Partitioning Big Data Using Virtual Machine Mapping
Dynamically Partitioning Big Data Using Virtual Machine MappingDynamically Partitioning Big Data Using Virtual Machine Mapping
Dynamically Partitioning Big Data Using Virtual Machine Mapping
AM Publications
 
Mapreduce Osdi04
Mapreduce Osdi04Mapreduce Osdi04
Mapreduce Osdi04
Jyotirmoy Dey
 
Optimal Chain Matrix Multiplication Big Data Perspective
Optimal Chain Matrix Multiplication Big Data PerspectiveOptimal Chain Matrix Multiplication Big Data Perspective
Optimal Chain Matrix Multiplication Big Data Perspective
পল্লব রায়
 
Introduction to map reduce
Introduction to map reduceIntroduction to map reduce
Introduction to map reduce
M Baddar
 
Implementation of p pic algorithm in map reduce to handle big data
Implementation of p pic algorithm in map reduce to handle big dataImplementation of p pic algorithm in map reduce to handle big data
Implementation of p pic algorithm in map reduce to handle big data
eSAT Publishing House
 

Similar to T180304125129 (20)

A simulation-based approach for straggler tasks detection in Hadoop MapReduce
A simulation-based approach for straggler tasks detection in Hadoop MapReduceA simulation-based approach for straggler tasks detection in Hadoop MapReduce
A simulation-based approach for straggler tasks detection in Hadoop MapReduce
 
IJET-V3I1P27
IJET-V3I1P27IJET-V3I1P27
IJET-V3I1P27
 
Big Data.pptx
Big Data.pptxBig Data.pptx
Big Data.pptx
 
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdf
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdfmodule3part-1-bigdata-230301002404-3db4f2a4 (1).pdf
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdf
 
Generating Frequent Itemsets by RElim on Hadoop Clusters
Generating Frequent Itemsets by RElim on Hadoop ClustersGenerating Frequent Itemsets by RElim on Hadoop Clusters
Generating Frequent Itemsets by RElim on Hadoop Clusters
 
Mapreduce2008 cacm
Mapreduce2008 cacmMapreduce2008 cacm
Mapreduce2008 cacm
 
Introduction to map reduce s. jency jayastina II MSC COMPUTER SCIENCE BON SEC...
Introduction to map reduce s. jency jayastina II MSC COMPUTER SCIENCE BON SEC...Introduction to map reduce s. jency jayastina II MSC COMPUTER SCIENCE BON SEC...
Introduction to map reduce s. jency jayastina II MSC COMPUTER SCIENCE BON SEC...
 
On Traffic-Aware Partition and Aggregation in Map Reduce for Big Data Applica...
On Traffic-Aware Partition and Aggregation in Map Reduce for Big Data Applica...On Traffic-Aware Partition and Aggregation in Map Reduce for Big Data Applica...
On Traffic-Aware Partition and Aggregation in Map Reduce for Big Data Applica...
 
MapReduce Paradigm
MapReduce ParadigmMapReduce Paradigm
MapReduce Paradigm
 
Enhancing Performance and Fault Tolerance of Hadoop Cluster
Enhancing Performance and Fault Tolerance of Hadoop ClusterEnhancing Performance and Fault Tolerance of Hadoop Cluster
Enhancing Performance and Fault Tolerance of Hadoop Cluster
 
Review on Multiply-Accumulate Unit
Review on Multiply-Accumulate UnitReview on Multiply-Accumulate Unit
Review on Multiply-Accumulate Unit
 
Hadoop scheduler with deadline constraint
Hadoop scheduler with deadline constraintHadoop scheduler with deadline constraint
Hadoop scheduler with deadline constraint
 
Paper id 25201467
Paper id 25201467Paper id 25201467
Paper id 25201467
 
ON TRAFFIC-AWARE PARTITION AND AGGREGATION IN MAPREDUCE FOR BIG DATA APPLICAT...
ON TRAFFIC-AWARE PARTITION AND AGGREGATION IN MAPREDUCE FOR BIG DATA APPLICAT...ON TRAFFIC-AWARE PARTITION AND AGGREGATION IN MAPREDUCE FOR BIG DATA APPLICAT...
ON TRAFFIC-AWARE PARTITION AND AGGREGATION IN MAPREDUCE FOR BIG DATA APPLICAT...
 
Finalprojectpresentation
FinalprojectpresentationFinalprojectpresentation
Finalprojectpresentation
 
Dynamically Partitioning Big Data Using Virtual Machine Mapping
Dynamically Partitioning Big Data Using Virtual Machine MappingDynamically Partitioning Big Data Using Virtual Machine Mapping
Dynamically Partitioning Big Data Using Virtual Machine Mapping
 
Mapreduce Osdi04
Mapreduce Osdi04Mapreduce Osdi04
Mapreduce Osdi04
 
Optimal Chain Matrix Multiplication Big Data Perspective
Optimal Chain Matrix Multiplication Big Data PerspectiveOptimal Chain Matrix Multiplication Big Data Perspective
Optimal Chain Matrix Multiplication Big Data Perspective
 
Introduction to map reduce
Introduction to map reduceIntroduction to map reduce
Introduction to map reduce
 
Implementation of p pic algorithm in map reduce to handle big data
Implementation of p pic algorithm in map reduce to handle big dataImplementation of p pic algorithm in map reduce to handle big data
Implementation of p pic algorithm in map reduce to handle big data
 

More from IOSR Journals

A011140104
A011140104A011140104
A011140104
IOSR Journals
 
M0111397100
M0111397100M0111397100
M0111397100
IOSR Journals
 
L011138596
L011138596L011138596
L011138596
IOSR Journals
 
K011138084
K011138084K011138084
K011138084
IOSR Journals
 
J011137479
J011137479J011137479
J011137479
IOSR Journals
 
I011136673
I011136673I011136673
I011136673
IOSR Journals
 
G011134454
G011134454G011134454
G011134454
IOSR Journals
 
H011135565
H011135565H011135565
H011135565
IOSR Journals
 
F011134043
F011134043F011134043
F011134043
IOSR Journals
 
E011133639
E011133639E011133639
E011133639
IOSR Journals
 
D011132635
D011132635D011132635
D011132635
IOSR Journals
 
C011131925
C011131925C011131925
C011131925
IOSR Journals
 
B011130918
B011130918B011130918
B011130918
IOSR Journals
 
A011130108
A011130108A011130108
A011130108
IOSR Journals
 
I011125160
I011125160I011125160
I011125160
IOSR Journals
 
H011124050
H011124050H011124050
H011124050
IOSR Journals
 
G011123539
G011123539G011123539
G011123539
IOSR Journals
 
F011123134
F011123134F011123134
F011123134
IOSR Journals
 
E011122530
E011122530E011122530
E011122530
IOSR Journals
 
D011121524
D011121524D011121524
D011121524
IOSR Journals
 

More from IOSR Journals (20)

A011140104
A011140104A011140104
A011140104
 
M0111397100
M0111397100M0111397100
M0111397100
 
L011138596
L011138596L011138596
L011138596
 
K011138084
K011138084K011138084
K011138084
 
J011137479
J011137479J011137479
J011137479
 
I011136673
I011136673I011136673
I011136673
 
G011134454
G011134454G011134454
G011134454
 
H011135565
H011135565H011135565
H011135565
 
F011134043
F011134043F011134043
F011134043
 
E011133639
E011133639E011133639
E011133639
 
D011132635
D011132635D011132635
D011132635
 
C011131925
C011131925C011131925
C011131925
 
B011130918
B011130918B011130918
B011130918
 
A011130108
A011130108A011130108
A011130108
 
I011125160
I011125160I011125160
I011125160
 
H011124050
H011124050H011124050
H011124050
 
G011123539
G011123539G011123539
G011123539
 
F011123134
F011123134F011123134
F011123134
 
E011122530
E011122530E011122530
E011122530
 
D011121524
D011121524D011121524
D011121524
 

Recently uploaded

The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
operationspcvita
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
Chart Kalyan
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
Fwdays
 
A Deep Dive into ScyllaDB's Architecture
A Deep Dive into ScyllaDB's ArchitectureA Deep Dive into ScyllaDB's Architecture
A Deep Dive into ScyllaDB's Architecture
ScyllaDB
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
ssuserfac0301
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
Miro Wengner
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
c5vrf27qcz
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
MichaelKnudsen27
 
Leveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and StandardsLeveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and Standards
Neo4j
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
Must Know Postgres Extension for DBA and Developer during Migration
Must Know Postgres Extension for DBA and Developer during MigrationMust Know Postgres Extension for DBA and Developer during Migration
Must Know Postgres Extension for DBA and Developer during Migration
Mydbops
 
Session 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdfSession 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdf
UiPathCommunity
 
Apps Break Data
Apps Break DataApps Break Data
Apps Break Data
Ivo Velitchkov
 
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Pitangent Analytics & Technology Solutions Pvt. Ltd
 
Essentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation ParametersEssentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation Parameters
Safe Software
 
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
DanBrown980551
 
inQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
inQuba Webinar Mastering Customer Journey Management with Dr Graham HillinQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
inQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
LizaNolte
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Tosin Akinosho
 

Recently uploaded (20)

The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
 
A Deep Dive into ScyllaDB's Architecture
A Deep Dive into ScyllaDB's ArchitectureA Deep Dive into ScyllaDB's Architecture
A Deep Dive into ScyllaDB's Architecture
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
 
Leveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and StandardsLeveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and Standards
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
Must Know Postgres Extension for DBA and Developer during Migration
Must Know Postgres Extension for DBA and Developer during MigrationMust Know Postgres Extension for DBA and Developer during Migration
Must Know Postgres Extension for DBA and Developer during Migration
 
Session 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdfSession 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdf
 
Apps Break Data
Apps Break DataApps Break Data
Apps Break Data
 
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
 
Essentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation ParametersEssentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation Parameters
 
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
 
inQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
inQuba Webinar Mastering Customer Journey Management with Dr Graham HillinQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
inQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 

T180304125129

  • 1. IOSR Journal of Computer Engineering (IOSR-JCE) e-ISSN: 2278-0661,p-ISSN: 2278-8727, Volume 18, Issue 3, Ver. IV (May-Jun. 2016), PP 125-129 www.iosrjournals.org DOI: 10.9790/0661-180304125129 www.iosrjournals.org 125 | Page I++ Mapreduce: Incremental Mapreduce for Mining the Big Data Prof. Kanchan Dhomse B.1 1 (IT, METS BKC IOE, NASHIK, India) Abstract: Data mining is an interdisciplinary area of computer engineering. Incremental processing is a challenging approach to refreshing mining results and it uses saved states to avoid the cost of re computation from damage. Here, I propose i2 MapReduce is a novel incremental processing expansion to MapReduce in data mining. As compared with the state-of-the-art work on Incoop, instead of using task level re-computation the i2 MapReduce executes the key-value pair level incremental processing . It helps not only one-step computation but also more sophisticated iterative computation. Then it incorporates a set of novel techniques to decrease input output level for accessing dedicated fine-grain computation states in the data mining. The final results on Amazon EC2 describe the significant performance development of i2 MapReduce as compared to iterative and plain MapReduce performing re computation. Keywords: Big data, Incremental Processing, Mapreduce, MRBGraph, Recompilation. I. Introduction Recently, huge amount of digital data is being assembled in many important areas like social network, e-commerce, education, finance, health care and environment. It has become increasingly popular to mine such big data in order to obtain outputs to help business decisions or to give better personalized and higher quality services in data mining. Recently a huge number of computing frameworks [1-10] have been developed for big data analysis in data mining. So, among these frameworks the MapReduce [1] with its open-source implementations, such as Hadoop is the mostly used in production due to its simplicity and generality and maturity. Here, I focus on improving MapReduce performance. Actually, Big data is constantly increasing new updates are being collected and the input data of a big data mining algorithm will slowely change. Generally, it is desirable to periodically updates the mining computation in order to keep the mining results accurate. Here, some are examples like the PageRank algorithm [11] evaluates ranking scores of web pages based on the web graph structure for supporting web search in data mining. Here, I explain the nature of the problem and previous work, purpose, and the contribution of the paper which shows the structure of mapreduce. The incremental processing utilize the fact that the input data of two subsequent computations are similar . Only a very small fraction of the input data has changed. Here, system observe and find the realization of this principle in the context of the MapReduce computing framework in data mining. Fig.1: computation of Map Reduce II. Implementation 1. Mapreduce Background Generally, a MapReduce program is made up of a Map function and a Reduce function [1], as shown in Fig. 1. Their APIs are as shown below: Map(k1,v1)-> (<k2,v2>) Reduce(k2,v2)->(<k3,v3>) Here, the Map function enters a kv-pair <K1, V1> as input and evaluate zero or more middle kv-pairs <K2,V2> in mapreduce function. Now all <K2, V2> are grouped by K2 and The Reduce function inserts a K2 and a list of V 2 as input and computes the final output kv-pairs <K3,V3> in mapreduce. So, a MapReduce
  • 2. I++ Mapreduce: Incremental Mapreduce for Mining The Big Data DOI: 10.9790/0661-180304125129 www.iosrjournals.org 126 | Page normally reads the input data of the MapReduce evaluation from and writes the final results to a distributed file system which distributes a file into equal-sized i.e. 64 MB blocks and collects the blocks across a cluster of machines in data mining. The mapreduce system executes a JobTracker process on a master node to concentrate the job progress and a set of Task Tracker processes on worker nodes to implement the actual Map and Reduce tasks. Then, the Job Tracker starts a Map task per data block and typically assigns it to the TaskTracker on the machine that holds the corresponding data block in order to decrease communication. At each Map task calls the map function for every input <K1, V1> pair and stores the middle kv-pairs <K2, V2> on local disks in mapreduce function. The intermediate results are shuffled to reduce tasks according to a divides the function on K2 in pair. So, after a reduce task gets and merges middle pair’s results from all map tasks then it invokes the reduce function on every <K2, V 2> pair to create the final output kv-pairs <K3, V3>among all pairs. 2 Fine- Grain Incremental Processing For One-Step Computation Here, we start by describing the basic idea of fine-grain incremental processing in the mapreduce function . it shows the main design including the MRBGraph abstraction and the incremental processing engine in mapreduce process. Here we divide it into two aspects of the design. i.e. the mechanism that maintain the fine-grain states and the handling of a special case where the Reduce function performs collection function. 2.1 MRBGraph Abstraction We use a MRBGraph i.e. Map Reduce Bipartite Graph is the abstraction to model the data flow in MapReduce function. As shown in Fig. 2(a) every vertex in the Map task represents an single Map function call instance on a pair of <K1,V1> nodes. At every vertex in the Reduce task represents an individual Reduce function call instance on a group of <K2; V 2>. The range from a Map instance to a Reduce instance means that the Map instance creates a <K2,V2> pair that is shuffled to become part of the input to the Reduce instance of pair. Here, for example the input of reduce instance getting from map instance 0, 2, and 4 respectively. Therefore, i2 MapReduce will maintain (K2, MK, V 2) for each MRBGraph edge in given function. Fig. 2. MRB Graph 2.2 Fine-Grain Incremental Processing Engine Here, figure shows the fine-grain incremental function engine with an example application which evaluates the addition of in-edge weights for each vertex in a graph of MRBG. So, as shown the input data i.e. the graph structure includes over time in running process. We present and explains that how the engine performs incremental processing to update the analysis results in mapreduce techniques. III. Algorithms In data mining and analysis the fundamental algorithms form field of data science which includes default methods to analyze patterns and models for all kinds of data with applications ranging from scientific discovery to business intelligence and analytics at the time of algorithm selection [8]. 3 General-Purpose Support For Iterative Computation 3.1 Analyzing Iterative Computation : PageRank is a iterative graph algorithm for ranking web pages in data mining. So, it evaluates a ranking score for each vertex in a graph in algorithm.
  • 3. I++ Mapreduce: Incremental Mapreduce for Mining The Big Data DOI: 10.9790/0661-180304125129 www.iosrjournals.org 127 | Page Algorithm 1: PageRank in MapReduce Kmeans : The Kmean is a generally used clustering algorithm that partitions points into k clusters . Fig. 3. Incremental processing for an application Algorithm 2: kmeans in MapReduce 4 Incremental Iterative Processing Here, we shows the incremental processing techniques for iterative evaluation. But it is not sufficient to simply gather the above solutions for incremental one step processing and iterative computation given below we discuss 3 main aspects that we noticed in order to obtaine an effective design of i2 mapreduce. Fig. 6. Iterative model of i2MapReduce.
  • 4. I++ Mapreduce: Incremental Mapreduce for Mining The Big Data DOI: 10.9790/0661-180304125129 www.iosrjournals.org 128 | Page IV. Result Analysis As we described all the algorithms which are helpful for i2 mapreduce Solutions. Actually, our experiments compare four solutions initially PlainMR recomp re-computation on vanilla Hadoop then iterMR recomp and re-computation on Hadoop optimized for iterative evaluation. Secondly, hadoop re-computation on the iterative Mapreduce framework Hadoop which controls mapreduce by providing a structure data caching mechanism then in i2 Mapreduce our proposed solution is to best of our knowledge the task-level coarse-grain incremental processing system. Here, Incoop is not normally present that’s why we can’t compare i2 MapReduce with Incoop technique. So, the statistics describes that without careful data partition almost all tasks see changes in the experiments and making task-level incremental processing less important than the other. Fig. 7. Run time of individual stages in PageRank. Fig. 8. Normalized runtime. V. Conclusion In data mining includes mapreduce function described i2 MapReduce technique. A MapReduce based framework for incremental big data process is used for i2 mapreduce function. I2 MapReduce collects a fine-grain incremental engine and a general-purpose iterative model also a set of useful techniques for incremental iterative evaluation. The real-machine experiments describes that i2 MapReduce can significantly decrease the run time for updating the big data mining results as compared to the recomputation on both plain and iterative MapReduce. Acknowledgements We especially thanks to our department and our Institute for the great support regarding paper and their all views. I really thankful to my all staffs and our H.O.D. Prof. N.R. kale. who showed me the way of successful journey of publishing paper.
  • 5. I++ Mapreduce: Incremental Mapreduce for Mining The Big Data DOI: 10.9790/0661-180304125129 www.iosrjournals.org 129 | Page References Journal Papers [1]. J. Dean and S. Ghemawat, “ Mapreduce: Simplified data processing on large clusters,” in Proc. 6th Conf. Symp. Opear. Syst. Des.Implementation, 2004, p. 10. [2]. M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin, S. Shenker, and I. Stoica, “Resilient distributed Datasets : A fault-tolerant abstraction for, in-memory cluster computing,” in Proc. 9th USENIX Conf. Netw. Syst. Des. Implementation,2012, p. 2. [3]. R. Power and J. Li, “Piccolo: Building fast, distributed programs with partitioned tables,” in Proc. 9th USENIX Conf. Oper. Syst. Des.Implementation, 2010, pp. 1–14. [4]. G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski, “Pregel: A system for large-scale Graph processing,” in Proc. ACM SIGMOD Int. Conf. Manage. Data, 2010, pp. 135–146. [5]. S. R. Mihaylov, Z. G. Ives, and S. Guha, “Rex: Recursive, deltabased data-centric computation,” in Proc. VLDB Endowment, 2012,vol. 5, no. 11, pp. 1280–1291. [6]. Y. Low, D. Bickson, J. Gonzalez, C. Guestrin, A. Kyrola, and J. M. Hellerstein, “Distributed graphlab: A framework for machine Learning and data mining in the cloud,” in Proc. VLDB Endowment, 2012, vol. 5, no. 8, pp. 716–727. Proceedings Papers [7]. S. Ewen, K. Tzoumas, M. Kaufmann, and V. Markl, “Spinning fast iterative data flows,” in Proc. VLDB Endowment, 2012, vol. 5,no. 11, pp. 1268–1279. Books [8]. Mohammed j. Zaki’s and Wagner Meira jr.’s Data mining and Analysis (Cambridge university , 2014 )