SlideShare a Scribd company logo
1 of 5
Download to read offline
IOSR Journal of Computer Engineering (IOSR-JCE)
e-ISSN: 2278-0661,p-ISSN: 2278-8727, Volume 18, Issue 3, Ver. IV (May-Jun. 2016), PP 125-129
www.iosrjournals.org
DOI: 10.9790/0661-180304125129 www.iosrjournals.org 125 | Page
I++
Mapreduce: Incremental Mapreduce for Mining the Big Data
Prof. Kanchan Dhomse B.1
1
(IT, METS BKC IOE, NASHIK, India)
Abstract: Data mining is an interdisciplinary area of computer engineering. Incremental processing is a
challenging approach to refreshing mining results and it uses saved states to avoid the cost of re computation
from damage. Here, I propose i2
MapReduce is a novel incremental processing expansion to MapReduce in data
mining. As compared with the state-of-the-art work on Incoop, instead of using task level re-computation the
i2
MapReduce executes the key-value pair level incremental processing . It helps not only one-step computation
but also more sophisticated iterative computation. Then it incorporates a set of novel techniques to decrease
input output level for accessing dedicated fine-grain computation states in the data mining. The final results on
Amazon EC2 describe the significant performance development of i2
MapReduce as compared to iterative and
plain MapReduce performing re computation.
Keywords: Big data, Incremental Processing, Mapreduce, MRBGraph, Recompilation.
I. Introduction
Recently, huge amount of digital data is being assembled in many important areas like social network,
e-commerce, education, finance, health care and environment. It has become increasingly popular to mine such
big data in order to obtain outputs to help business decisions or to give better personalized and higher quality
services in data mining. Recently a huge number of computing frameworks [1-10] have been developed for big
data analysis in data mining. So, among these frameworks the MapReduce [1] with its open-source
implementations, such as Hadoop is the mostly used in production due to its simplicity and generality and
maturity. Here, I focus on improving MapReduce performance. Actually, Big data is constantly increasing new
updates are being collected and the input data of a big data mining algorithm will slowely change. Generally, it
is desirable to periodically updates the mining computation in order to keep the mining results accurate. Here,
some are examples like the PageRank algorithm [11] evaluates ranking scores of web pages based on the web
graph structure for supporting web search in data mining. Here, I explain the nature of the problem and previous
work, purpose, and the contribution of the paper which shows the structure of mapreduce. The incremental
processing utilize the fact that the input data of two subsequent computations are similar . Only a very small
fraction of the input data has changed. Here, system observe and find the realization of this principle in the
context of the MapReduce computing framework in data mining.
Fig.1: computation of Map Reduce
II. Implementation
1. Mapreduce Background
Generally, a MapReduce program is made up of a Map function and a Reduce function [1], as shown in
Fig. 1. Their APIs are as shown below:
Map(k1,v1)-> (<k2,v2>)
Reduce(k2,v2)->(<k3,v3>)
Here, the Map function enters a kv-pair <K1, V1> as input and evaluate zero or more middle kv-pairs
<K2,V2> in mapreduce function. Now all <K2, V2> are grouped by K2 and The Reduce function inserts a K2
and a list of V 2 as input and computes the final output kv-pairs <K3,V3> in mapreduce. So, a MapReduce
I++
Mapreduce: Incremental Mapreduce for Mining The Big Data
DOI: 10.9790/0661-180304125129 www.iosrjournals.org 126 | Page
normally reads the input data of the MapReduce evaluation from and writes the final results to a distributed file
system which distributes a file into equal-sized i.e. 64 MB blocks and collects the blocks across a cluster of
machines in data mining. The mapreduce system executes a JobTracker process on a master node to concentrate
the job progress and a set of Task Tracker processes on worker nodes to implement the actual Map and Reduce
tasks. Then, the Job Tracker starts a Map task per data block and typically assigns it to the TaskTracker on the
machine that holds the corresponding data block in order to decrease communication. At each Map task calls the
map function for every input <K1, V1> pair and stores the middle kv-pairs <K2, V2> on local disks in
mapreduce function. The intermediate results are shuffled to reduce tasks according to a divides the function on
K2 in pair. So, after a reduce task gets and merges middle pair’s results from all map tasks then it invokes the
reduce function on every <K2, V 2> pair to create the final output kv-pairs <K3, V3>among all pairs.
2 Fine- Grain Incremental Processing For One-Step Computation
Here, we start by describing the basic idea of fine-grain incremental processing in the mapreduce
function . it shows the main design including the MRBGraph abstraction and the incremental processing engine
in mapreduce process. Here we divide it into two aspects of the design. i.e. the mechanism that maintain the
fine-grain states and the handling of a special case where the Reduce function performs collection function.
2.1 MRBGraph Abstraction
We use a MRBGraph i.e. Map Reduce Bipartite Graph is the abstraction to model the data flow in
MapReduce function. As shown in Fig. 2(a) every vertex in the Map task represents an single Map function call
instance on a pair of <K1,V1> nodes. At every vertex in the Reduce task represents an individual Reduce
function call instance on a group of <K2; V 2>. The range from a Map instance to a Reduce instance means that
the Map instance creates a <K2,V2> pair that is shuffled to become part of the input to the Reduce instance of
pair. Here, for example the input of reduce instance getting from map instance 0, 2, and 4 respectively.
Therefore, i2
MapReduce will maintain (K2, MK, V 2) for each MRBGraph edge in given function.
Fig. 2. MRB Graph
2.2 Fine-Grain Incremental Processing Engine
Here, figure shows the fine-grain incremental function engine with an example application which
evaluates the addition of in-edge weights for each vertex in a graph of MRBG. So, as shown the input data i.e.
the graph structure includes over time in running process. We present and explains that how the engine
performs incremental processing to update the analysis results in mapreduce techniques.
III. Algorithms
In data mining and analysis the fundamental algorithms form field of data science which includes
default methods to analyze patterns and models for all kinds of data with applications ranging from scientific
discovery to business intelligence and analytics at the time of algorithm selection [8].
3 General-Purpose Support For Iterative Computation
3.1 Analyzing Iterative Computation :
PageRank is a iterative graph algorithm for ranking web pages in data mining. So, it evaluates a ranking score
for each vertex in a graph in algorithm.
I++
Mapreduce: Incremental Mapreduce for Mining The Big Data
DOI: 10.9790/0661-180304125129 www.iosrjournals.org 127 | Page
Algorithm 1: PageRank in MapReduce
Kmeans : The Kmean is a generally used clustering algorithm that partitions points into k clusters .
Fig. 3. Incremental processing for an application
Algorithm 2: kmeans in MapReduce
4 Incremental Iterative Processing
Here, we shows the incremental processing techniques for iterative evaluation. But it is not sufficient
to simply gather the above solutions for incremental one step processing and iterative computation given
below we discuss 3 main aspects that we noticed in order to obtaine an effective design of i2
mapreduce.
Fig. 6. Iterative model of i2MapReduce.
I++
Mapreduce: Incremental Mapreduce for Mining The Big Data
DOI: 10.9790/0661-180304125129 www.iosrjournals.org 128 | Page
IV. Result Analysis
As we described all the algorithms which are helpful for i2
mapreduce Solutions. Actually, our
experiments compare four solutions initially PlainMR recomp re-computation on vanilla Hadoop then iterMR
recomp and re-computation on Hadoop optimized for iterative evaluation. Secondly, hadoop re-computation on
the iterative Mapreduce framework Hadoop which controls mapreduce by providing a structure data caching
mechanism then in i2
Mapreduce our proposed solution is to best of our knowledge the task-level coarse-grain
incremental processing system. Here, Incoop is not normally present that’s why we can’t compare
i2
MapReduce with Incoop technique. So, the statistics describes that without careful data partition almost all
tasks see changes in the experiments and making task-level incremental processing less important than the other.
Fig. 7. Run time of individual stages in PageRank.
Fig. 8. Normalized runtime.
V. Conclusion
In data mining includes mapreduce function described i2
MapReduce technique. A MapReduce based
framework for incremental big data process is used for i2
mapreduce function. I2
MapReduce collects a fine-grain
incremental engine and a general-purpose iterative model also a set of useful techniques for incremental iterative
evaluation. The real-machine experiments describes that i2
MapReduce can significantly decrease the run time
for updating the big data mining results as compared to the recomputation on both plain and iterative
MapReduce.
Acknowledgements
We especially thanks to our department and our Institute for the great support regarding paper and their
all views. I really thankful to my all staffs and our H.O.D. Prof. N.R. kale. who showed me the way of
successful journey of publishing paper.
I++
Mapreduce: Incremental Mapreduce for Mining The Big Data
DOI: 10.9790/0661-180304125129 www.iosrjournals.org 129 | Page
References
Journal Papers
[1]. J. Dean and S. Ghemawat, “ Mapreduce: Simplified data processing on large clusters,” in Proc. 6th Conf. Symp. Opear. Syst.
Des.Implementation, 2004, p. 10.
[2]. M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin, S. Shenker, and I. Stoica, “Resilient distributed
Datasets : A fault-tolerant abstraction for, in-memory cluster computing,” in Proc. 9th USENIX Conf. Netw. Syst. Des.
Implementation,2012, p. 2.
[3]. R. Power and J. Li, “Piccolo: Building fast, distributed programs with partitioned tables,” in Proc. 9th USENIX Conf. Oper. Syst.
Des.Implementation, 2010, pp. 1–14.
[4]. G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski, “Pregel: A system for large-scale
Graph processing,” in Proc. ACM SIGMOD Int. Conf. Manage. Data, 2010, pp. 135–146.
[5]. S. R. Mihaylov, Z. G. Ives, and S. Guha, “Rex: Recursive, deltabased data-centric computation,” in Proc. VLDB Endowment,
2012,vol. 5, no. 11, pp. 1280–1291.
[6]. Y. Low, D. Bickson, J. Gonzalez, C. Guestrin, A. Kyrola, and J. M. Hellerstein, “Distributed graphlab: A framework for machine
Learning and data mining in the cloud,” in Proc. VLDB Endowment, 2012, vol. 5, no. 8, pp. 716–727.
Proceedings Papers
[7]. S. Ewen, K. Tzoumas, M. Kaufmann, and V. Markl, “Spinning fast iterative data flows,” in Proc. VLDB Endowment, 2012, vol.
5,no. 11, pp. 1268–1279.
Books
[8]. Mohammed j. Zaki’s and Wagner Meira jr.’s Data mining and Analysis (Cambridge university , 2014 )

More Related Content

What's hot

Scrutinizing mapreduce mechanism with orientation to refine the productivity
Scrutinizing mapreduce mechanism with orientation to refine the productivityScrutinizing mapreduce mechanism with orientation to refine the productivity
Scrutinizing mapreduce mechanism with orientation to refine the productivityeSAT Journals
 
Scheduling Divisible Jobs to Optimize the Computation and Energy Costs
Scheduling Divisible Jobs to Optimize the Computation and Energy CostsScheduling Divisible Jobs to Optimize the Computation and Energy Costs
Scheduling Divisible Jobs to Optimize the Computation and Energy Costsinventionjournals
 
Sawmill - Integrating R and Large Data Clouds
Sawmill - Integrating R and Large Data CloudsSawmill - Integrating R and Large Data Clouds
Sawmill - Integrating R and Large Data CloudsRobert Grossman
 
A Big-Data Process Consigned Geographically by Employing Mapreduce Frame Work
A Big-Data Process Consigned Geographically by Employing Mapreduce Frame WorkA Big-Data Process Consigned Geographically by Employing Mapreduce Frame Work
A Big-Data Process Consigned Geographically by Employing Mapreduce Frame WorkIRJET Journal
 
Reduce Side Joins
Reduce Side Joins Reduce Side Joins
Reduce Side Joins Edureka!
 
A Brief on MapReduce Performance
A Brief on MapReduce PerformanceA Brief on MapReduce Performance
A Brief on MapReduce PerformanceAM Publications
 
Gis rainfallstudy
Gis rainfallstudyGis rainfallstudy
Gis rainfallstudysteven_uk
 
Enhancing Big Data Analysis by using Map-reduce Technique
Enhancing Big Data Analysis by using Map-reduce TechniqueEnhancing Big Data Analysis by using Map-reduce Technique
Enhancing Big Data Analysis by using Map-reduce TechniquejournalBEEI
 
"MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ...
"MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ..."MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ...
"MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ...Adrian Florea
 
MapReduce: Simplified Data Processing On Large Clusters
MapReduce: Simplified Data Processing On Large ClustersMapReduce: Simplified Data Processing On Large Clusters
MapReduce: Simplified Data Processing On Large Clusterskazuma_sato
 
IRJET-An Efficient Technique to Improve Resources Utilization for Hadoop Mapr...
IRJET-An Efficient Technique to Improve Resources Utilization for Hadoop Mapr...IRJET-An Efficient Technique to Improve Resources Utilization for Hadoop Mapr...
IRJET-An Efficient Technique to Improve Resources Utilization for Hadoop Mapr...IRJET Journal
 
An Approach to Reduce Energy Consumption in Cloud data centers using Harmony ...
An Approach to Reduce Energy Consumption in Cloud data centers using Harmony ...An Approach to Reduce Energy Consumption in Cloud data centers using Harmony ...
An Approach to Reduce Energy Consumption in Cloud data centers using Harmony ...ijccsa
 
PRETZEL: Opening the Black Box of Machine Learning Prediction Serving Systems
PRETZEL: Opening the Black Box of Machine Learning Prediction Serving SystemsPRETZEL: Opening the Black Box of Machine Learning Prediction Serving Systems
PRETZEL: Opening the Black Box of Machine Learning Prediction Serving SystemsNECST Lab @ Politecnico di Milano
 
PROCESS OF LOAD BALANCING IN CLOUD COMPUTING USING GENETIC ALGORITHM
PROCESS OF LOAD BALANCING IN CLOUD COMPUTING USING GENETIC ALGORITHMPROCESS OF LOAD BALANCING IN CLOUD COMPUTING USING GENETIC ALGORITHM
PROCESS OF LOAD BALANCING IN CLOUD COMPUTING USING GENETIC ALGORITHMecij
 
Optimization of energy consumption in cloud computing datacenters
Optimization of energy consumption in cloud computing datacenters Optimization of energy consumption in cloud computing datacenters
Optimization of energy consumption in cloud computing datacenters IJECEIAES
 
Statistical power consumption analysis and modeling
Statistical power consumption analysis and modelingStatistical power consumption analysis and modeling
Statistical power consumption analysis and modelingnadikari123
 
A Review on Scheduling in Cloud Computing
A Review on Scheduling in Cloud ComputingA Review on Scheduling in Cloud Computing
A Review on Scheduling in Cloud Computingijujournal
 

What's hot (20)

Scrutinizing mapreduce mechanism with orientation to refine the productivity
Scrutinizing mapreduce mechanism with orientation to refine the productivityScrutinizing mapreduce mechanism with orientation to refine the productivity
Scrutinizing mapreduce mechanism with orientation to refine the productivity
 
IJET-V3I2P24
IJET-V3I2P24IJET-V3I2P24
IJET-V3I2P24
 
Scheduling Divisible Jobs to Optimize the Computation and Energy Costs
Scheduling Divisible Jobs to Optimize the Computation and Energy CostsScheduling Divisible Jobs to Optimize the Computation and Energy Costs
Scheduling Divisible Jobs to Optimize the Computation and Energy Costs
 
Sawmill - Integrating R and Large Data Clouds
Sawmill - Integrating R and Large Data CloudsSawmill - Integrating R and Large Data Clouds
Sawmill - Integrating R and Large Data Clouds
 
A Big-Data Process Consigned Geographically by Employing Mapreduce Frame Work
A Big-Data Process Consigned Geographically by Employing Mapreduce Frame WorkA Big-Data Process Consigned Geographically by Employing Mapreduce Frame Work
A Big-Data Process Consigned Geographically by Employing Mapreduce Frame Work
 
Reduce Side Joins
Reduce Side Joins Reduce Side Joins
Reduce Side Joins
 
A Brief on MapReduce Performance
A Brief on MapReduce PerformanceA Brief on MapReduce Performance
A Brief on MapReduce Performance
 
Gis rainfallstudy
Gis rainfallstudyGis rainfallstudy
Gis rainfallstudy
 
Enhancing Big Data Analysis by using Map-reduce Technique
Enhancing Big Data Analysis by using Map-reduce TechniqueEnhancing Big Data Analysis by using Map-reduce Technique
Enhancing Big Data Analysis by using Map-reduce Technique
 
"MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ...
"MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ..."MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ...
"MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ...
 
MapReduce: Simplified Data Processing On Large Clusters
MapReduce: Simplified Data Processing On Large ClustersMapReduce: Simplified Data Processing On Large Clusters
MapReduce: Simplified Data Processing On Large Clusters
 
IRJET-An Efficient Technique to Improve Resources Utilization for Hadoop Mapr...
IRJET-An Efficient Technique to Improve Resources Utilization for Hadoop Mapr...IRJET-An Efficient Technique to Improve Resources Utilization for Hadoop Mapr...
IRJET-An Efficient Technique to Improve Resources Utilization for Hadoop Mapr...
 
An Approach to Reduce Energy Consumption in Cloud data centers using Harmony ...
An Approach to Reduce Energy Consumption in Cloud data centers using Harmony ...An Approach to Reduce Energy Consumption in Cloud data centers using Harmony ...
An Approach to Reduce Energy Consumption in Cloud data centers using Harmony ...
 
PRETZEL: Opening the Black Box of Machine Learning Prediction Serving Systems
PRETZEL: Opening the Black Box of Machine Learning Prediction Serving SystemsPRETZEL: Opening the Black Box of Machine Learning Prediction Serving Systems
PRETZEL: Opening the Black Box of Machine Learning Prediction Serving Systems
 
D0212326
D0212326D0212326
D0212326
 
133
133133
133
 
PROCESS OF LOAD BALANCING IN CLOUD COMPUTING USING GENETIC ALGORITHM
PROCESS OF LOAD BALANCING IN CLOUD COMPUTING USING GENETIC ALGORITHMPROCESS OF LOAD BALANCING IN CLOUD COMPUTING USING GENETIC ALGORITHM
PROCESS OF LOAD BALANCING IN CLOUD COMPUTING USING GENETIC ALGORITHM
 
Optimization of energy consumption in cloud computing datacenters
Optimization of energy consumption in cloud computing datacenters Optimization of energy consumption in cloud computing datacenters
Optimization of energy consumption in cloud computing datacenters
 
Statistical power consumption analysis and modeling
Statistical power consumption analysis and modelingStatistical power consumption analysis and modeling
Statistical power consumption analysis and modeling
 
A Review on Scheduling in Cloud Computing
A Review on Scheduling in Cloud ComputingA Review on Scheduling in Cloud Computing
A Review on Scheduling in Cloud Computing
 

Viewers also liked

Practical Investigation of the Environmental Hazards of Idle Time and Speed o...
Practical Investigation of the Environmental Hazards of Idle Time and Speed o...Practical Investigation of the Environmental Hazards of Idle Time and Speed o...
Practical Investigation of the Environmental Hazards of Idle Time and Speed o...IOSR Journals
 
J010627176.iosr jece
J010627176.iosr jeceJ010627176.iosr jece
J010627176.iosr jeceIOSR Journals
 
Cloud Computing for E-Commerce
Cloud Computing for E-CommerceCloud Computing for E-Commerce
Cloud Computing for E-CommerceIOSR Journals
 
Leather Quality Estimation Using an Automated Machine Vision System
Leather Quality Estimation Using an Automated Machine Vision SystemLeather Quality Estimation Using an Automated Machine Vision System
Leather Quality Estimation Using an Automated Machine Vision SystemIOSR Journals
 
A Generalization of QN-Maps
A Generalization of QN-MapsA Generalization of QN-Maps
A Generalization of QN-MapsIOSR Journals
 
Providing The Security Against The DDOS Attack In Mobile Ad Hoc Networks
Providing The Security Against The DDOS Attack In Mobile Ad Hoc NetworksProviding The Security Against The DDOS Attack In Mobile Ad Hoc Networks
Providing The Security Against The DDOS Attack In Mobile Ad Hoc NetworksIOSR Journals
 
Safety Features Evaluation of the Vehicle Interior Using Finite Element Analy...
Safety Features Evaluation of the Vehicle Interior Using Finite Element Analy...Safety Features Evaluation of the Vehicle Interior Using Finite Element Analy...
Safety Features Evaluation of the Vehicle Interior Using Finite Element Analy...IOSR Journals
 

Viewers also liked (20)

A1804010105
A1804010105A1804010105
A1804010105
 
Practical Investigation of the Environmental Hazards of Idle Time and Speed o...
Practical Investigation of the Environmental Hazards of Idle Time and Speed o...Practical Investigation of the Environmental Hazards of Idle Time and Speed o...
Practical Investigation of the Environmental Hazards of Idle Time and Speed o...
 
F010423541
F010423541F010423541
F010423541
 
J010627176.iosr jece
J010627176.iosr jeceJ010627176.iosr jece
J010627176.iosr jece
 
F017633538
F017633538F017633538
F017633538
 
A017260110
A017260110A017260110
A017260110
 
L017136269
L017136269L017136269
L017136269
 
Cloud Computing for E-Commerce
Cloud Computing for E-CommerceCloud Computing for E-Commerce
Cloud Computing for E-Commerce
 
I1304026367
I1304026367I1304026367
I1304026367
 
Leather Quality Estimation Using an Automated Machine Vision System
Leather Quality Estimation Using an Automated Machine Vision SystemLeather Quality Estimation Using an Automated Machine Vision System
Leather Quality Estimation Using an Automated Machine Vision System
 
N017117276
N017117276N017117276
N017117276
 
I017144954
I017144954I017144954
I017144954
 
N010237982
N010237982N010237982
N010237982
 
S0102297102
S0102297102S0102297102
S0102297102
 
J010126267
J010126267J010126267
J010126267
 
D010121832
D010121832D010121832
D010121832
 
A Generalization of QN-Maps
A Generalization of QN-MapsA Generalization of QN-Maps
A Generalization of QN-Maps
 
C1803061620
C1803061620C1803061620
C1803061620
 
Providing The Security Against The DDOS Attack In Mobile Ad Hoc Networks
Providing The Security Against The DDOS Attack In Mobile Ad Hoc NetworksProviding The Security Against The DDOS Attack In Mobile Ad Hoc Networks
Providing The Security Against The DDOS Attack In Mobile Ad Hoc Networks
 
Safety Features Evaluation of the Vehicle Interior Using Finite Element Analy...
Safety Features Evaluation of the Vehicle Interior Using Finite Element Analy...Safety Features Evaluation of the Vehicle Interior Using Finite Element Analy...
Safety Features Evaluation of the Vehicle Interior Using Finite Element Analy...
 

Similar to T180304125129

A simulation-based approach for straggler tasks detection in Hadoop MapReduce
A simulation-based approach for straggler tasks detection in Hadoop MapReduceA simulation-based approach for straggler tasks detection in Hadoop MapReduce
A simulation-based approach for straggler tasks detection in Hadoop MapReduceIRJET Journal
 
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdf
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdfmodule3part-1-bigdata-230301002404-3db4f2a4 (1).pdf
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdfTSANKARARAO
 
Generating Frequent Itemsets by RElim on Hadoop Clusters
Generating Frequent Itemsets by RElim on Hadoop ClustersGenerating Frequent Itemsets by RElim on Hadoop Clusters
Generating Frequent Itemsets by RElim on Hadoop ClustersBRNSSPublicationHubI
 
Mapreduce2008 cacm
Mapreduce2008 cacmMapreduce2008 cacm
Mapreduce2008 cacmlmphuong06
 
Introduction to map reduce s. jency jayastina II MSC COMPUTER SCIENCE BON SEC...
Introduction to map reduce s. jency jayastina II MSC COMPUTER SCIENCE BON SEC...Introduction to map reduce s. jency jayastina II MSC COMPUTER SCIENCE BON SEC...
Introduction to map reduce s. jency jayastina II MSC COMPUTER SCIENCE BON SEC...jencyjayastina
 
On Traffic-Aware Partition and Aggregation in Map Reduce for Big Data Applica...
On Traffic-Aware Partition and Aggregation in Map Reduce for Big Data Applica...On Traffic-Aware Partition and Aggregation in Map Reduce for Big Data Applica...
On Traffic-Aware Partition and Aggregation in Map Reduce for Big Data Applica...dbpublications
 
MapReduce Paradigm
MapReduce ParadigmMapReduce Paradigm
MapReduce ParadigmNilaNila16
 
Enhancing Performance and Fault Tolerance of Hadoop Cluster
Enhancing Performance and Fault Tolerance of Hadoop ClusterEnhancing Performance and Fault Tolerance of Hadoop Cluster
Enhancing Performance and Fault Tolerance of Hadoop ClusterIRJET Journal
 
Review on Multiply-Accumulate Unit
Review on Multiply-Accumulate UnitReview on Multiply-Accumulate Unit
Review on Multiply-Accumulate UnitIJERA Editor
 
Hadoop scheduler with deadline constraint
Hadoop scheduler with deadline constraintHadoop scheduler with deadline constraint
Hadoop scheduler with deadline constraintijccsa
 
Paper id 25201467
Paper id 25201467Paper id 25201467
Paper id 25201467IJRAT
 
ON TRAFFIC-AWARE PARTITION AND AGGREGATION IN MAPREDUCE FOR BIG DATA APPLICAT...
ON TRAFFIC-AWARE PARTITION AND AGGREGATION IN MAPREDUCE FOR BIG DATA APPLICAT...ON TRAFFIC-AWARE PARTITION AND AGGREGATION IN MAPREDUCE FOR BIG DATA APPLICAT...
ON TRAFFIC-AWARE PARTITION AND AGGREGATION IN MAPREDUCE FOR BIG DATA APPLICAT...I3E Technologies
 
Finalprojectpresentation
FinalprojectpresentationFinalprojectpresentation
FinalprojectpresentationSANTOSH WAYAL
 
Dynamically Partitioning Big Data Using Virtual Machine Mapping
Dynamically Partitioning Big Data Using Virtual Machine MappingDynamically Partitioning Big Data Using Virtual Machine Mapping
Dynamically Partitioning Big Data Using Virtual Machine MappingAM Publications
 
Optimal Chain Matrix Multiplication Big Data Perspective
Optimal Chain Matrix Multiplication Big Data PerspectiveOptimal Chain Matrix Multiplication Big Data Perspective
Optimal Chain Matrix Multiplication Big Data Perspectiveপল্লব রায়
 
Introduction to map reduce
Introduction to map reduceIntroduction to map reduce
Introduction to map reduceM Baddar
 
Implementation of p pic algorithm in map reduce to handle big data
Implementation of p pic algorithm in map reduce to handle big dataImplementation of p pic algorithm in map reduce to handle big data
Implementation of p pic algorithm in map reduce to handle big dataeSAT Publishing House
 

Similar to T180304125129 (20)

A simulation-based approach for straggler tasks detection in Hadoop MapReduce
A simulation-based approach for straggler tasks detection in Hadoop MapReduceA simulation-based approach for straggler tasks detection in Hadoop MapReduce
A simulation-based approach for straggler tasks detection in Hadoop MapReduce
 
IJET-V3I1P27
IJET-V3I1P27IJET-V3I1P27
IJET-V3I1P27
 
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdf
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdfmodule3part-1-bigdata-230301002404-3db4f2a4 (1).pdf
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdf
 
Big Data.pptx
Big Data.pptxBig Data.pptx
Big Data.pptx
 
Generating Frequent Itemsets by RElim on Hadoop Clusters
Generating Frequent Itemsets by RElim on Hadoop ClustersGenerating Frequent Itemsets by RElim on Hadoop Clusters
Generating Frequent Itemsets by RElim on Hadoop Clusters
 
Mapreduce2008 cacm
Mapreduce2008 cacmMapreduce2008 cacm
Mapreduce2008 cacm
 
Introduction to map reduce s. jency jayastina II MSC COMPUTER SCIENCE BON SEC...
Introduction to map reduce s. jency jayastina II MSC COMPUTER SCIENCE BON SEC...Introduction to map reduce s. jency jayastina II MSC COMPUTER SCIENCE BON SEC...
Introduction to map reduce s. jency jayastina II MSC COMPUTER SCIENCE BON SEC...
 
On Traffic-Aware Partition and Aggregation in Map Reduce for Big Data Applica...
On Traffic-Aware Partition and Aggregation in Map Reduce for Big Data Applica...On Traffic-Aware Partition and Aggregation in Map Reduce for Big Data Applica...
On Traffic-Aware Partition and Aggregation in Map Reduce for Big Data Applica...
 
MapReduce Paradigm
MapReduce ParadigmMapReduce Paradigm
MapReduce Paradigm
 
Enhancing Performance and Fault Tolerance of Hadoop Cluster
Enhancing Performance and Fault Tolerance of Hadoop ClusterEnhancing Performance and Fault Tolerance of Hadoop Cluster
Enhancing Performance and Fault Tolerance of Hadoop Cluster
 
Review on Multiply-Accumulate Unit
Review on Multiply-Accumulate UnitReview on Multiply-Accumulate Unit
Review on Multiply-Accumulate Unit
 
Hadoop scheduler with deadline constraint
Hadoop scheduler with deadline constraintHadoop scheduler with deadline constraint
Hadoop scheduler with deadline constraint
 
Paper id 25201467
Paper id 25201467Paper id 25201467
Paper id 25201467
 
ON TRAFFIC-AWARE PARTITION AND AGGREGATION IN MAPREDUCE FOR BIG DATA APPLICAT...
ON TRAFFIC-AWARE PARTITION AND AGGREGATION IN MAPREDUCE FOR BIG DATA APPLICAT...ON TRAFFIC-AWARE PARTITION AND AGGREGATION IN MAPREDUCE FOR BIG DATA APPLICAT...
ON TRAFFIC-AWARE PARTITION AND AGGREGATION IN MAPREDUCE FOR BIG DATA APPLICAT...
 
Finalprojectpresentation
FinalprojectpresentationFinalprojectpresentation
Finalprojectpresentation
 
Dynamically Partitioning Big Data Using Virtual Machine Mapping
Dynamically Partitioning Big Data Using Virtual Machine MappingDynamically Partitioning Big Data Using Virtual Machine Mapping
Dynamically Partitioning Big Data Using Virtual Machine Mapping
 
Mapreduce Osdi04
Mapreduce Osdi04Mapreduce Osdi04
Mapreduce Osdi04
 
Optimal Chain Matrix Multiplication Big Data Perspective
Optimal Chain Matrix Multiplication Big Data PerspectiveOptimal Chain Matrix Multiplication Big Data Perspective
Optimal Chain Matrix Multiplication Big Data Perspective
 
Introduction to map reduce
Introduction to map reduceIntroduction to map reduce
Introduction to map reduce
 
Implementation of p pic algorithm in map reduce to handle big data
Implementation of p pic algorithm in map reduce to handle big dataImplementation of p pic algorithm in map reduce to handle big data
Implementation of p pic algorithm in map reduce to handle big data
 

More from IOSR Journals (20)

A011140104
A011140104A011140104
A011140104
 
M0111397100
M0111397100M0111397100
M0111397100
 
L011138596
L011138596L011138596
L011138596
 
K011138084
K011138084K011138084
K011138084
 
J011137479
J011137479J011137479
J011137479
 
I011136673
I011136673I011136673
I011136673
 
G011134454
G011134454G011134454
G011134454
 
H011135565
H011135565H011135565
H011135565
 
F011134043
F011134043F011134043
F011134043
 
E011133639
E011133639E011133639
E011133639
 
D011132635
D011132635D011132635
D011132635
 
C011131925
C011131925C011131925
C011131925
 
B011130918
B011130918B011130918
B011130918
 
A011130108
A011130108A011130108
A011130108
 
I011125160
I011125160I011125160
I011125160
 
H011124050
H011124050H011124050
H011124050
 
G011123539
G011123539G011123539
G011123539
 
F011123134
F011123134F011123134
F011123134
 
E011122530
E011122530E011122530
E011122530
 
D011121524
D011121524D011121524
D011121524
 

Recently uploaded

AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 

Recently uploaded (20)

AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 

T180304125129

  • 1. IOSR Journal of Computer Engineering (IOSR-JCE) e-ISSN: 2278-0661,p-ISSN: 2278-8727, Volume 18, Issue 3, Ver. IV (May-Jun. 2016), PP 125-129 www.iosrjournals.org DOI: 10.9790/0661-180304125129 www.iosrjournals.org 125 | Page I++ Mapreduce: Incremental Mapreduce for Mining the Big Data Prof. Kanchan Dhomse B.1 1 (IT, METS BKC IOE, NASHIK, India) Abstract: Data mining is an interdisciplinary area of computer engineering. Incremental processing is a challenging approach to refreshing mining results and it uses saved states to avoid the cost of re computation from damage. Here, I propose i2 MapReduce is a novel incremental processing expansion to MapReduce in data mining. As compared with the state-of-the-art work on Incoop, instead of using task level re-computation the i2 MapReduce executes the key-value pair level incremental processing . It helps not only one-step computation but also more sophisticated iterative computation. Then it incorporates a set of novel techniques to decrease input output level for accessing dedicated fine-grain computation states in the data mining. The final results on Amazon EC2 describe the significant performance development of i2 MapReduce as compared to iterative and plain MapReduce performing re computation. Keywords: Big data, Incremental Processing, Mapreduce, MRBGraph, Recompilation. I. Introduction Recently, huge amount of digital data is being assembled in many important areas like social network, e-commerce, education, finance, health care and environment. It has become increasingly popular to mine such big data in order to obtain outputs to help business decisions or to give better personalized and higher quality services in data mining. Recently a huge number of computing frameworks [1-10] have been developed for big data analysis in data mining. So, among these frameworks the MapReduce [1] with its open-source implementations, such as Hadoop is the mostly used in production due to its simplicity and generality and maturity. Here, I focus on improving MapReduce performance. Actually, Big data is constantly increasing new updates are being collected and the input data of a big data mining algorithm will slowely change. Generally, it is desirable to periodically updates the mining computation in order to keep the mining results accurate. Here, some are examples like the PageRank algorithm [11] evaluates ranking scores of web pages based on the web graph structure for supporting web search in data mining. Here, I explain the nature of the problem and previous work, purpose, and the contribution of the paper which shows the structure of mapreduce. The incremental processing utilize the fact that the input data of two subsequent computations are similar . Only a very small fraction of the input data has changed. Here, system observe and find the realization of this principle in the context of the MapReduce computing framework in data mining. Fig.1: computation of Map Reduce II. Implementation 1. Mapreduce Background Generally, a MapReduce program is made up of a Map function and a Reduce function [1], as shown in Fig. 1. Their APIs are as shown below: Map(k1,v1)-> (<k2,v2>) Reduce(k2,v2)->(<k3,v3>) Here, the Map function enters a kv-pair <K1, V1> as input and evaluate zero or more middle kv-pairs <K2,V2> in mapreduce function. Now all <K2, V2> are grouped by K2 and The Reduce function inserts a K2 and a list of V 2 as input and computes the final output kv-pairs <K3,V3> in mapreduce. So, a MapReduce
  • 2. I++ Mapreduce: Incremental Mapreduce for Mining The Big Data DOI: 10.9790/0661-180304125129 www.iosrjournals.org 126 | Page normally reads the input data of the MapReduce evaluation from and writes the final results to a distributed file system which distributes a file into equal-sized i.e. 64 MB blocks and collects the blocks across a cluster of machines in data mining. The mapreduce system executes a JobTracker process on a master node to concentrate the job progress and a set of Task Tracker processes on worker nodes to implement the actual Map and Reduce tasks. Then, the Job Tracker starts a Map task per data block and typically assigns it to the TaskTracker on the machine that holds the corresponding data block in order to decrease communication. At each Map task calls the map function for every input <K1, V1> pair and stores the middle kv-pairs <K2, V2> on local disks in mapreduce function. The intermediate results are shuffled to reduce tasks according to a divides the function on K2 in pair. So, after a reduce task gets and merges middle pair’s results from all map tasks then it invokes the reduce function on every <K2, V 2> pair to create the final output kv-pairs <K3, V3>among all pairs. 2 Fine- Grain Incremental Processing For One-Step Computation Here, we start by describing the basic idea of fine-grain incremental processing in the mapreduce function . it shows the main design including the MRBGraph abstraction and the incremental processing engine in mapreduce process. Here we divide it into two aspects of the design. i.e. the mechanism that maintain the fine-grain states and the handling of a special case where the Reduce function performs collection function. 2.1 MRBGraph Abstraction We use a MRBGraph i.e. Map Reduce Bipartite Graph is the abstraction to model the data flow in MapReduce function. As shown in Fig. 2(a) every vertex in the Map task represents an single Map function call instance on a pair of <K1,V1> nodes. At every vertex in the Reduce task represents an individual Reduce function call instance on a group of <K2; V 2>. The range from a Map instance to a Reduce instance means that the Map instance creates a <K2,V2> pair that is shuffled to become part of the input to the Reduce instance of pair. Here, for example the input of reduce instance getting from map instance 0, 2, and 4 respectively. Therefore, i2 MapReduce will maintain (K2, MK, V 2) for each MRBGraph edge in given function. Fig. 2. MRB Graph 2.2 Fine-Grain Incremental Processing Engine Here, figure shows the fine-grain incremental function engine with an example application which evaluates the addition of in-edge weights for each vertex in a graph of MRBG. So, as shown the input data i.e. the graph structure includes over time in running process. We present and explains that how the engine performs incremental processing to update the analysis results in mapreduce techniques. III. Algorithms In data mining and analysis the fundamental algorithms form field of data science which includes default methods to analyze patterns and models for all kinds of data with applications ranging from scientific discovery to business intelligence and analytics at the time of algorithm selection [8]. 3 General-Purpose Support For Iterative Computation 3.1 Analyzing Iterative Computation : PageRank is a iterative graph algorithm for ranking web pages in data mining. So, it evaluates a ranking score for each vertex in a graph in algorithm.
  • 3. I++ Mapreduce: Incremental Mapreduce for Mining The Big Data DOI: 10.9790/0661-180304125129 www.iosrjournals.org 127 | Page Algorithm 1: PageRank in MapReduce Kmeans : The Kmean is a generally used clustering algorithm that partitions points into k clusters . Fig. 3. Incremental processing for an application Algorithm 2: kmeans in MapReduce 4 Incremental Iterative Processing Here, we shows the incremental processing techniques for iterative evaluation. But it is not sufficient to simply gather the above solutions for incremental one step processing and iterative computation given below we discuss 3 main aspects that we noticed in order to obtaine an effective design of i2 mapreduce. Fig. 6. Iterative model of i2MapReduce.
  • 4. I++ Mapreduce: Incremental Mapreduce for Mining The Big Data DOI: 10.9790/0661-180304125129 www.iosrjournals.org 128 | Page IV. Result Analysis As we described all the algorithms which are helpful for i2 mapreduce Solutions. Actually, our experiments compare four solutions initially PlainMR recomp re-computation on vanilla Hadoop then iterMR recomp and re-computation on Hadoop optimized for iterative evaluation. Secondly, hadoop re-computation on the iterative Mapreduce framework Hadoop which controls mapreduce by providing a structure data caching mechanism then in i2 Mapreduce our proposed solution is to best of our knowledge the task-level coarse-grain incremental processing system. Here, Incoop is not normally present that’s why we can’t compare i2 MapReduce with Incoop technique. So, the statistics describes that without careful data partition almost all tasks see changes in the experiments and making task-level incremental processing less important than the other. Fig. 7. Run time of individual stages in PageRank. Fig. 8. Normalized runtime. V. Conclusion In data mining includes mapreduce function described i2 MapReduce technique. A MapReduce based framework for incremental big data process is used for i2 mapreduce function. I2 MapReduce collects a fine-grain incremental engine and a general-purpose iterative model also a set of useful techniques for incremental iterative evaluation. The real-machine experiments describes that i2 MapReduce can significantly decrease the run time for updating the big data mining results as compared to the recomputation on both plain and iterative MapReduce. Acknowledgements We especially thanks to our department and our Institute for the great support regarding paper and their all views. I really thankful to my all staffs and our H.O.D. Prof. N.R. kale. who showed me the way of successful journey of publishing paper.
  • 5. I++ Mapreduce: Incremental Mapreduce for Mining The Big Data DOI: 10.9790/0661-180304125129 www.iosrjournals.org 129 | Page References Journal Papers [1]. J. Dean and S. Ghemawat, “ Mapreduce: Simplified data processing on large clusters,” in Proc. 6th Conf. Symp. Opear. Syst. Des.Implementation, 2004, p. 10. [2]. M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin, S. Shenker, and I. Stoica, “Resilient distributed Datasets : A fault-tolerant abstraction for, in-memory cluster computing,” in Proc. 9th USENIX Conf. Netw. Syst. Des. Implementation,2012, p. 2. [3]. R. Power and J. Li, “Piccolo: Building fast, distributed programs with partitioned tables,” in Proc. 9th USENIX Conf. Oper. Syst. Des.Implementation, 2010, pp. 1–14. [4]. G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski, “Pregel: A system for large-scale Graph processing,” in Proc. ACM SIGMOD Int. Conf. Manage. Data, 2010, pp. 135–146. [5]. S. R. Mihaylov, Z. G. Ives, and S. Guha, “Rex: Recursive, deltabased data-centric computation,” in Proc. VLDB Endowment, 2012,vol. 5, no. 11, pp. 1280–1291. [6]. Y. Low, D. Bickson, J. Gonzalez, C. Guestrin, A. Kyrola, and J. M. Hellerstein, “Distributed graphlab: A framework for machine Learning and data mining in the cloud,” in Proc. VLDB Endowment, 2012, vol. 5, no. 8, pp. 716–727. Proceedings Papers [7]. S. Ewen, K. Tzoumas, M. Kaufmann, and V. Markl, “Spinning fast iterative data flows,” in Proc. VLDB Endowment, 2012, vol. 5,no. 11, pp. 1268–1279. Books [8]. Mohammed j. Zaki’s and Wagner Meira jr.’s Data mining and Analysis (Cambridge university , 2014 )