ISSN 2348-1196 (print)
International Journal of Computer Science and Information Technology Research ISSN 2348-120X (online)
Vol. 3, Issue 4, pp: (159-161), Month: October - December 2015, Available at: www.researchpublish.com
Page | 159
Research Publish Journals
A Data Aware Caching (Dache) for Big-Data
Applications Using the MapReduce Framework
Utkarsh Honey1, Yogesh More2, Prasad Wandhekar3, Santosh Wayal4, Prof. Jayashree Chaudhari5
1, 2, 3, 4, 5 Dr. D Y Patil School of Engineering, Pune, India 411047
Abstract: Big data refers to large-scale distributed data processing applications that work on
exceptionally large amounts of data. Google's MapReduce and Apache Hadoop, its open-source implementation,
are software systems for big-data applications. An observation about the MapReduce framework is that it
generates a large amount of intermediate data. MapReduce is unable to reuse such data, so it is discarded
after use. We propose Dache, a data-aware cache framework for big-data applications. In Dache, tasks submit
their intermediate results to the cache manager and query the cache manager before executing the actual
computing work. A novel cache description system and a cache request and reply protocol are designed.
Keywords: Big-data, MapReduce, Hadoop, caching.
I. INTRODUCTION
Google MapReduce is a programming model and a software framework for large-scale distributed computing on
large amounts of data. The figure illustrates the high-level workflow of a MapReduce job. Application developers
specify the computation in terms of a map function and a reduce function, and the underlying MapReduce job
scheduling system automatically parallelizes the computation across a cluster of machines. MapReduce has gained
popularity for its simple programming interface and excellent performance when implementing a large spectrum of
applications. Since most such applications take a huge amount of input data, they are called "big-data applications".
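As a rough illustration of the map and reduce functions described above, the following single-process Java sketch counts words. It is not Hadoop code: the class name WordCountSketch and the method run, which stands in for the framework's parallel scheduling and its shuffle-and-sort step, are assumptions made for this example only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-process illustration of the map -> shuffle/sort -> reduce flow.
// Hypothetical names throughout; a real job would use the Hadoop API.
class WordCountSketch {

    // Map function: one input record (a line) -> intermediate (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String record) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : record.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    // Reduce function: all values collected for one key -> one final value.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Stands in for the framework: map every record, group the intermediate
    // pairs by key in sorted order (shuffle and sort), then reduce each group.
    static Map<String, Integer> run(List<String> records) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String record : records) {
            for (Map.Entry<String, Integer> kv : map(record)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints {big=2, cache=1, data=2}
        System.out.println(run(List.of("big data cache", "big data")));
    }
}
```

The developer writes only map and reduce; everything inside run is what the framework would normally provide, distributed across many machines.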
Input data is first divided into splits and then given to workers in the map phase. Individual data items are
called records. The MapReduce system parses the input split given to each worker and produces records. After the
map phase, the intermediate results it generated are shuffled and sorted by the MapReduce system and then fed to
the workers in the reduce phase. Final results are computed by multiple reducers and then written back to disk.
Hadoop is an open-source implementation of the Google MapReduce programming model. Hadoop includes Hadoop
Common, which provides access to the file systems supported by Hadoop. HDFS (Hadoop Distributed File System)
provides distributed file storage and is optimized for large immutable blocks of data. A small Hadoop cluster
includes a single master node and multiple worker nodes, known as slaves. The master node runs multiple processes,
including a Job Tracker and a Name Node. The Job Tracker is responsible for managing running jobs in the Hadoop
cluster, while the Name Node manages HDFS. The Job Tracker and the Name Node are usually collocated on the same
physical machine. The other servers in the cluster run a Task Tracker and a Data Node process. A MapReduce job is
divided into a number of tasks, which are managed by the Task Trackers. The Task Tracker and the Data Node are
collocated on the same servers to provide data locality.
MapReduce provides a standardized framework for implementing large-scale distributed computations, namely
big-data applications. Still, the system has a limitation: inefficiency in incremental processing. Incremental
processing refers to applications that incrementally grow the input data and continuously apply computations to
this input to produce output. This process can perform duplicate computations and operations, and MapReduce has no
mechanism to identify such duplicate computations and accelerate job execution. Motivated by this observation, in
this paper we propose a data-aware cache scheme for big-data applications using the MapReduce framework. It aims at
extending the MapReduce framework and provisioning a cache layer for efficiently identifying and accessing cache
items in a MapReduce job.
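The query-before-compute idea behind such a cache layer can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class CacheManagerSketch, its methods, and the choice to describe a cache item by an input split identifier plus an operation name are hypothetical, and are not the paper's actual cache description scheme or protocol.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical sketch of a cache manager; not the Dache implementation.
class CacheManagerSketch {
    private final Map<String, String> items = new HashMap<>();
    private int hits = 0;
    private int misses = 0;

    // Assumed cache description: which input was processed, and by which operation.
    private static String describe(String origin, String operation) {
        return origin + "#" + operation;
    }

    // Cache request: the reply is the cached result, or empty on a miss.
    Optional<String> request(String origin, String operation) {
        return Optional.ofNullable(items.get(describe(origin, operation)));
    }

    // Tasks submit their intermediate result after computing it.
    void submit(String origin, String operation, String result) {
        items.put(describe(origin, operation), result);
    }

    // Task-side logic: query first, compute only on a miss, then submit.
    String runTask(String origin, String operation, Supplier<String> compute) {
        Optional<String> cached = request(origin, operation);
        if (cached.isPresent()) {
            hits++;
            return cached.get();
        }
        misses++;
        String result = compute.get();
        submit(origin, operation, result);
        return result;
    }

    int hits() { return hits; }
    int misses() { return misses; }

    public static void main(String[] args) {
        CacheManagerSketch cache = new CacheManagerSketch();
        String[] firstRun = {"split-0", "split-1"};
        String[] secondRun = {"split-0", "split-1", "split-2"}; // input grew by one split
        for (String s : firstRun) {
            cache.runTask(s, "wordcount", () -> "result of " + s);
        }
        for (String s : secondRun) {
            cache.runTask(s, "wordcount", () -> "result of " + s);
        }
        // Prints hits=2 misses=3: only the new split is computed in the second run.
        System.out.println("hits=" + cache.hits() + " misses=" + cache.misses());
    }
}
```

In the second run the two splits already seen are answered from the cache, which is the duplicate work an incremental job would otherwise repeat.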
II. OBJECTIVE(S) AND SCOPE
The objectives and scope of the work are as follows:
• Require minimum change to the original MapReduce programming model.
• Require only slight changes to application code in order to utilize Dache.
• Implement Dache in Hadoop by extending the relevant components.
• Show through experiments that Dache can eliminate all the duplicate tasks in incremental MapReduce jobs.
• Reduce execution time and CPU utilization.
Problem Statement:
The current Hadoop MapReduce framework generates a large amount of intermediate data. MapReduce is unable to
save such data, so it is deleted after use. Our system introduces a cache that holds the intermediate results.
Because of this, job execution is faster than in the old system, and both the time consumed and the repetition
of data processing are reduced.
Map Reduce Architecture:
Fig 1: Map Reduce Architecture
2.3.1 Operations:
• Clients submit jobs to the Job Tracker.
• The Job Tracker talks to the Name Node.
• The Job Tracker creates an execution plan.
• The Job Tracker submits work to the Task Trackers.
• The Task Trackers report progress via heartbeats.
• The Job Tracker manages the phases.
• The Job Tracker updates the status.
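The operations listed above can be pictured with a small single-threaded Java simulation. It is a sketch under assumptions, not Hadoop's implementation: the classes JobTrackerSketch and TaskTracker, the round-robin assignment, and the rule that each heartbeat reports one finished task are all invented for illustration, and the Name Node lookup is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-threaded model of the job submission and heartbeat flow.
class JobTrackerSketch {
    enum Status { RUNNING, DONE }

    static class TaskTracker {
        final String name;
        final List<String> assigned = new ArrayList<>();

        TaskTracker(String name) {
            this.name = name;
        }

        // One heartbeat: finish at most one assigned task and report its id,
        // or report null when there is nothing left to do.
        String heartbeat() {
            return assigned.isEmpty() ? null : assigned.remove(0);
        }
    }

    private final List<TaskTracker> trackers;
    private final Map<String, Status> taskStatus = new LinkedHashMap<>();

    JobTrackerSketch(List<TaskTracker> trackers) {
        this.trackers = trackers;
    }

    // Execution plan (assumed): one map task per input split, assigned round-robin.
    void submitJob(List<String> splits) {
        int next = 0;
        for (String split : splits) {
            String taskId = "map-" + split;
            taskStatus.put(taskId, Status.RUNNING);
            trackers.get(next++ % trackers.size()).assigned.add(taskId);
        }
    }

    // Collect one heartbeat from every Task Tracker and update the job status.
    void collectHeartbeats() {
        for (TaskTracker tracker : trackers) {
            String finished = tracker.heartbeat();
            if (finished != null) {
                taskStatus.put(finished, Status.DONE);
            }
        }
    }

    boolean jobDone() {
        return taskStatus.values().stream().allMatch(s -> s == Status.DONE);
    }

    Map<String, Status> status() {
        return taskStatus;
    }

    public static void main(String[] args) {
        List<TaskTracker> trackers = List.of(new TaskTracker("tt-1"), new TaskTracker("tt-2"));
        JobTrackerSketch jobTracker = new JobTrackerSketch(trackers);
        jobTracker.submitJob(List.of("s0", "s1", "s2"));
        int rounds = 0;
        while (!jobTracker.jobDone()) {
            jobTracker.collectHeartbeats();
            rounds++;
        }
        // Prints: 2 heartbeat rounds, status {map-s0=DONE, map-s1=DONE, map-s2=DONE}
        System.out.println(rounds + " heartbeat rounds, status " + jobTracker.status());
    }
}
```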
Software & Hardware Requirement:
Software Requirements:
Operating System : Windows XP and above
Database : MySQL, Hadoop
Language : Java
Hardware Requirements:
System : Pentium IV 2.4 GHz
Hard Disk : 40 GB
Monitor : 15 VGA Color
RAM : 512 MB
III. CONCLUSIONS
This paper presents the design and evaluation of a data-aware cache framework that requires minimum changes to the
original MapReduce programming model in order to provision incremental processing for big-data applications using
the MapReduce model. We propose a data-aware cache description scheme, architecture and protocol. The presented
method requires only a slight modification to the input format processing and task management of the MapReduce
framework. As a result, application code requires only slight changes in order to use Dache. Dache is implemented
in Hadoop by extending the relevant components. In the future, we plan to adapt our framework to more general
application scenarios and to implement the scheme in the Hadoop project.
