Big data is a popular term used to describe the exponential evolution and availability of data, including both structured and unstructured data. The volatile progression of demands on big data processing imposes a heavy burden on computation, communication and storage in geographically distributed data centers. Hence it is necessary to minimize the cost of big data processing, which also includes the fault tolerance cost. Big data processing involves two types of faults: node failure and data loss. Both faults can be recovered from using heartbeat messages. Here heartbeat messages act as acknowledgement messages between two servers. This paper presents a study of node failure and recovery, data replication and heartbeat messages.
IJRET: International Journal of Research in Engineering and Technology is an international peer-reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academicians, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
Cloud computing is widely considered to be the next dominant technology in the IT industry. It offers basic system maintenance and scalable resource management with Virtual Machines (VMs). As an essential technology of cloud computing, the VM has been a hot research topic in recent years. The high overhead of virtualization has been well addressed by hardware extensions in the CPU industry and by software improvements in the hypervisors themselves. However, the high demand on VM image storage remains a difficult problem. Existing systems have made efforts to reduce VM image storage consumption by means of deduplication inside a storage area network (SAN) system. Nevertheless, a SAN cannot satisfy the increasing demand of large-scale VM hosting for cloud computing because of its cost limitations. In this project, we propose SILO, an improved deduplication file system that has been designed particularly for large-scale VM deployment. Its design provides fast VM deployment, with a similarity- and locality-based fingerprint index for data transfer, and low storage consumption by means of deduplication on VM images. We also implement a heartbeat protocol in the Meta Data Server (MDS) to recover data from the data servers. It also provides a comprehensive set of storage features, including a backup server for VM images, on-demand fetching through a network, and caching through local disks by copy-on-read techniques. Experiments show that SILO performs well and introduces only minor performance overhead.
Keywords — Deduplication, Storage area network, Load Balancing, Hash table, Disk copies.
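To illustrate the kind of deduplication the abstract describes, here is a minimal content-addressed sketch (an assumption for illustration only; the names ChunkStore and CHUNK_SIZE and the fixed-size chunking policy are invented here, and SILO itself uses a similarity- and locality-based fingerprint index rather than this naive table):

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size for the illustration

class ChunkStore:
    """Stores each unique chunk once, keyed by its fingerprint."""
    def __init__(self):
        self.chunks = {}  # fingerprint -> chunk bytes

    def put(self, data: bytes) -> list:
        """Split data into chunks, store unique chunks, return the recipe of fingerprints."""
        recipe = []
        for off in range(0, len(data), CHUNK_SIZE):
            chunk = data[off:off + CHUNK_SIZE]
            fp = hashlib.sha1(chunk).hexdigest()
            self.chunks.setdefault(fp, chunk)  # deduplicate: keep only if unseen
            recipe.append(fp)
        return recipe

    def get(self, recipe: list) -> bytes:
        """Reassemble the original data from its fingerprint recipe."""
        return b"".join(self.chunks[fp] for fp in recipe)
```

With this scheme, two nearly identical VM images share most of their chunks, so storing the second image consumes only the space of the chunks that differ.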
MAP REDUCE BASED ON CLOAK DHT DATA REPLICATION EVALUATION - ijdms
Distributed databases and data replication are effective ways to increase the accessibility and reliability of unstructured, semi-structured and structured data in order to extract new knowledge. Replication offers better performance and greater availability of data. With the advent of Big Data, new storage and processing challenges are emerging.
To meet these challenges, Hadoop and DHTs compete in the storage domain, and MapReduce and others in distributed processing, each with their strengths and weaknesses.
We propose an analysis of the circular and radial replication mechanisms of the CLOAK DHT. We evaluate their performance through a comparative study of simulation data. The results show that radial replication is better for storage, unlike circular replication, which gives better search results.
International Journal of Computational Engineering Research (IJCER) is an international online journal in English, published monthly. The journal publishes original research work that contributes significantly to furthering scientific knowledge in engineering and technology.
Hadoop is an open-source implementation of the MapReduce framework in the realm of distributed processing. A Hadoop cluster is a unique type of computational cluster designed for storing and analyzing large datasets across a cluster of workstations. To handle massive-scale data, Hadoop exploits the Hadoop Distributed File System, termed HDFS. HDFS, like most distributed file systems, shares a familiar problem of data sharing and availability among compute nodes, which often leads to decreased performance. This paper is an experimental evaluation of Hadoop's computing performance, made by designing a rack-aware cluster that utilizes Hadoop's default block placement policy to improve data availability. Additionally, an adaptive data replication scheme that relies on access count prediction using Lagrange's interpolation is adapted to fit the scenario. To validate this, experiments were conducted on a rack-aware cluster setup, which significantly reduced the task completion time; however, once the volume of the data being processed increases, there is a considerable cutback in computational speed due to the update cost. Further, the threshold level for the balance between the update cost and the replication factor is identified and presented graphically.
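As a rough illustration of the access-count prediction mentioned above, the snippet below fits a Lagrange polynomial through recent access counts and extrapolates the next one. It is a sketch under assumptions: the function name, the choice of three past observations and the example numbers are invented here, not taken from the paper's exact scheme.

```python
def lagrange_predict(points, x_next):
    """Evaluate the Lagrange interpolating polynomial through `points` at `x_next`.
    points = [(x0, y0), (x1, y1), ...] with distinct x values."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        term = yi
        for j, (xj, _) in enumerate(points):
            if i != j:
                term *= (x_next - xj) / (xi - xj)
        total += term
    return total

# Hypothetical example: access counts of a block over the last three time windows.
history = [(1, 12), (2, 20), (3, 35)]
predicted = lagrange_predict(history, 4)  # extrapolated access count for window 4
# A replica manager could raise the block's replication factor when `predicted`
# crosses a threshold, which is the update-cost trade-off the paper analyzes.
```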
There is a growing trend of applications that need to handle huge volumes of data. However, analysing such huge data is a very challenging problem today. Many techniques can be considered for such data: technologies like Grid Computing, Volunteer Computing and RDBMSs are potential candidates, and the Hadoop tool, still in its growing phase, is also available for this purpose. We survey all of these techniques to find a suitable approach to manage and work with Big Data.
Data Distribution Handling on Cloud for Deployment of Big Data - ijccsa
Cloud computing is a new, emerging model in the field of computer science. For varying workloads, cloud computing presents a large-scale, on-demand infrastructure. The primary usage of clouds in practice is to process massive amounts of data. Processing large datasets has become crucial in research and business environments. The big challenge associated with processing large datasets is the vast infrastructure required. Cloud computing provides vast infrastructure to store and process Big Data. VMs can be provisioned on demand in the cloud to process the data by forming a cluster of VMs. The MapReduce paradigm can be used to process the data, wherein the mapper assigns part of the task to particular VMs in the cluster and the reducer combines the individual outputs from each VM to produce the final result. We have proposed an algorithm to reduce the overall data distribution and processing time. We tested our solution in the Cloud Analyst simulation environment, where we found that our proposed algorithm significantly reduces the overall data processing time in the cloud.
Survey on Cloud Backup Services of Personal Storage - eSAT Journals
Abstract: In the widespread cloud environment, cloud services are growing tremendously due to the large amount of personal computing data. The deduplication process is used to avoid redundant data. A cloud storage environment for data backup on personal computing devices faces various challenges of source deduplication for cloud backup services. The challenges faced in the process of deduplication for cloud backup services are: 1) low deduplication efficiency, due to exclusive access to a large amount of data and the limited system resources of the PC-based client side; 2) low data transfer efficiency, since the deduplicated data transferred from the source to the backup server is typically small but must often cross the WAN. Keywords: Cloud computing, Deduplication, Cloud backup, Application awareness
Survey on Division and Replication of Data in Cloud for Optimal Performance a... - IJSRD
Outsourcing data to third-party administrative control, as is done in cloud computing, gives rise to security concerns. The data may be compromised due to attacks by other users and nodes within the cloud. Therefore, high security measures are required to protect data within the cloud. However, the employed security strategy must also take into account the optimization of the data retrieval time. In this paper, we propose Division and Replication of Data in the Cloud for Optimal Performance and Security (DROPS), which collectively approaches the security and performance issues. In the DROPS methodology, we divide a file into fragments and replicate the fragmented data over the cloud nodes. Each of the nodes stores only a single fragment of a particular data file, which ensures that even in the case of a successful attack, no meaningful information is revealed to the attacker. Moreover, the nodes storing the fragments are separated by a certain distance by means of graph T-coloring, to prevent an attacker from guessing the locations of the fragments. Furthermore, the DROPS methodology does not rely on traditional cryptographic techniques for data security, thereby relieving the system of computationally expensive methods. We show that the probability of locating and compromising all of the nodes storing the fragments of a single file is extremely low. We also compare the performance of the DROPS methodology with ten other schemes. A higher level of security with only slight performance overhead was observed.
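To make the placement idea concrete, here is a minimal sketch that spreads the fragments of one file over cloud nodes while enforcing a minimum hop distance between any two chosen nodes, in the spirit of the T-coloring-based separation described above. It is an illustration under assumed inputs (the greedy strategy, the `dist` function and `min_sep` parameter are invented here), not the DROPS algorithm itself.

```python
def place_fragments(fragments, nodes, dist, min_sep):
    """Greedy placement: assign each fragment to a distinct node such that every
    pair of chosen nodes is at least `min_sep` hops apart.
    dist(a, b) -> hop count between nodes a and b.
    Returns {fragment_id: node} or None if no feasible placement is found."""
    chosen = {}
    for frag in fragments:
        for node in nodes:
            if node in chosen.values():
                continue  # one fragment per node
            if all(dist(node, other) >= min_sep for other in chosen.values()):
                chosen[frag] = node
                break
        else:
            return None  # no node satisfies the separation constraint
    return chosen
```

A greedy pass like this can fail where a smarter ordering would succeed; DROPS derives the separation constraint from graph T-coloring rather than checking pairwise distances directly.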
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT - ijwscjournal
The computer industry is being challenged to develop methods and techniques for affordable data processing on large datasets at optimum response times. The technical challenges in dealing with the increasing demand to handle vast quantities of data are daunting and on the rise. One of the recent processing models that offers a more efficient and intuitive solution for rapidly processing large amounts of data in parallel is called MapReduce. It is a framework defining a template approach to programming for performing large-scale data computation on clusters of machines in a cloud computing environment. MapReduce provides automatic parallelization and distribution of computation across several processors. It hides the complexity of writing parallel and distributed programming code. This paper provides a comprehensive systematic review and analysis of large-scale dataset processing and dataset handling challenges and requirements in a cloud computing environment using the MapReduce framework and its open-source implementation, Hadoop. We defined requirements for MapReduce systems to perform large-scale data processing. We also proposed the MapReduce framework and one implementation of this framework on Amazon Web Services. At the end of the paper, we present an experiment running a MapReduce system in a cloud environment. This paper outlines MapReduce as one of the best techniques for processing large datasets; it can also help developers perform parallel and distributed computation in a cloud environment.
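A minimal, single-process sketch of the map and reduce phases described above (an illustrative word count only; real MapReduce/Hadoop distributes these phases across a cluster and handles shuffling, partitioning and fault tolerance, none of which is shown here):

```python
from collections import defaultdict

def map_phase(record):
    """Map: emit (word, 1) pairs for every word in an input line."""
    return [(word, 1) for word in record.split()]

def reduce_phase(key, values):
    """Reduce: sum the counts emitted for one key."""
    return key, sum(values)

def mapreduce(records):
    # Shuffle: group intermediate values by key, then reduce each group.
    groups = defaultdict(list)
    for record in records:
        for key, value in map_phase(record):
            groups[key].append(value)
    return dict(reduce_phase(k, v) for k, v in groups.items())

print(mapreduce(["big data needs processing", "big clusters process big data"]))
# e.g. {'big': 3, 'data': 2, 'needs': 1, ...}
```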
IJERA (International Journal of Engineering Research and Applications) is an international online, ... peer-reviewed journal. For more details or to submit your article, please visit www.ijera.com
Cloud computing is a new computing paradigm that, just as electricity was first generated at home and evolved to be supplied by a few utility providers, aims to transform computing into a utility. The work presents a mapping strategy that efficiently balances the task load across multiple computational resources in the network, based on the system status, to improve performance. The objective of this research paper is to show the results of Hybrid DEGA, in which GA is implemented after DE.
A DDS-Based Scalable and Reconfigurable Framework for Cyber-Physical Systems - ijseajournal
Cyber-Physical Systems (CPSs) involve the interconnection of heterogeneous computing devices that are closely integrated with the physical processes under control. Often, these systems are resource-constrained and require specific features such as the ability to adapt in a timely and efficient fashion to dynamic environments. They must also support fault tolerance and avoid single points of failure. This paper describes a scalable framework for CPSs based on the OMG DDS standard. The proposed solution allows reconfiguring this kind of system at run time and managing its resources efficiently.
The advent of Big Data has seen the emergence of new processing and storage challenges. These challenges are often solved by distributed processing. Distributed systems are inherently dynamic and unstable, so it is realistic to expect that some resources will fail during use. Load balancing and task scheduling are an important step in determining the performance of parallel applications, hence the need to design load balancing algorithms adapted to grid computing. In this paper, we propose a dynamic and hierarchical load balancing strategy at two levels: intra-scheduler load balancing, in order to avoid the use of the large-scale communication network, and inter-scheduler load balancing, for load regulation of our whole system. The strategy improves the average response time of CLOAK-Reduce application tasks with minimal communication. We first focus on three performance indicators, namely response time, process latency and running time of MapReduce tasks.
New Framework for Improving Bigdata Analysis Using Mobile Agent - Mohammed Adam
Many companies use Hadoop for big data analysis, but Hadoop has some drawbacks. To overcome the drawbacks of Hadoop, a mobile agent under JADE is introduced.
A data center network is a system in which multiple servers are connected to each other to share information and resources. Multiple remote offices or users connect to the data center network and servers for resource or information sharing. Multiple remote offices connect to the data center server via VPN. Multiple ISPs connect each branch, providing failover service using the OSPF routing protocol.
A Comprehensive Study on Big Data Applications and Challenges - ijcisjournal
Big Data has gained much interest from academia and the IT industry. In the digital and computing world, information is generated and collected at a rate that quickly exceeds the boundary range. As information is transferred and shared at light speed over optical fiber and wireless networks, the volume of data and the speed of market growth increase. At the same time, the fast growth rate of such large data generates copious challenges, such as the rapid growth of data, transfer speed, diverse data, and security. Even so, Big Data is still in its early stage, and the domain has not been reviewed in general. Hence, this study expansively surveys and classifies an assortment of attributes of Big Data, including its nature, definitions, rapid growth rate, volume, management, analysis, and security. This study also proposes a data life cycle that uses the technologies and terminologies of Big Data. MapReduce is a programming model for efficient distributed computing. It works well with semi-structured and unstructured data; it is a simple model, yet good for a lot of applications such as log processing and web index building.
Postponed Optimized Report Recovery under LT-Based Cloud Memory - IJARIIT
Fountain-code-based distributed storage systems provide reliable online storage by placing unlabeled subsets of encoded blocks on multiple storage nodes. The Luby Transform (LT) code is one of the prominent fountain codes for storage systems because of its efficient recovery. However, to ensure a high probability of successful decoding, fountain-code-based storage requires the retrieval of additional blocks, and this requirement can introduce extra delay. We propose the idea that multi-stage recovery of blocks is effective in reducing the file retrieval delay. We first develop a delay model for various multi-stage recovery schemes applicable to our considered system. With this model, we focus on optimal recovery schemes given requirements on the probability of successful decoding. Our numerical results suggest a fundamental trade-off between the file retrieval delay and the target probability of successful file decoding, and show that the file retrieval delay can be fundamentally reduced by optimally scheduling block requests in a multi-stage fashion.
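For readers unfamiliar with LT codes, the toy sketch below shows the two core ideas the abstract relies on: encoded symbols are XORs of random source blocks, and decoding proceeds by repeatedly "peeling" symbols that have only one unknown block. It is purely illustrative and not taken from the paper above; in particular, the uniform degree choice is a simplification of the robust soliton distribution that real LT codes use.

```python
import random

def lt_encode(blocks, n_symbols, seed=42):
    """Produce n_symbols LT-coded symbols from integer source blocks.
    Each symbol is (set of source indices, XOR of those source blocks)."""
    rng = random.Random(seed)
    k = len(blocks)
    symbols = []
    for _ in range(n_symbols):
        degree = rng.randint(1, k)  # toy choice; real LT codes draw from a robust soliton distribution
        idx = set(rng.sample(range(k), degree))
        value = 0
        for i in idx:
            value ^= blocks[i]
        symbols.append((idx, value))
    return symbols

def lt_decode(symbols, k):
    """Peeling decoder: repeatedly resolve symbols with exactly one unknown block."""
    recovered = {}
    progress = True
    while progress and len(recovered) < k:
        progress = False
        for idx, value in symbols:
            unknown = idx - set(recovered)
            if len(unknown) == 1:
                i = unknown.pop()
                for j in idx - {i}:
                    value ^= recovered[j]
                recovered[i] = value
                progress = True
    # None means more encoded symbols must be fetched; that extra retrieval is
    # the source of the delay the multi-stage schemes above try to minimize.
    return recovered if len(recovered) == k else None

blocks = [7, 13, 42, 99]
print(lt_decode(lt_encode(blocks, 8), len(blocks)))
```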
Large amounts of data are produced daily in various fields such as science, economics, engineering and health. The main challenge of pervasive computing is to store and analyze this large amount of data. This has led to the need for usable and scalable data applications and storage clusters. In this article, we examine the Hadoop architecture developed to deal with these problems. The Hadoop architecture consists of the Hadoop Distributed File System (HDFS) and the MapReduce programming model, which enable storage and computation on a set of commodity computers. In this study, a Hadoop cluster consisting of four nodes was created. With regard to data size and cluster size, the Pi and Grep MapReduce applications, which show the effect of different data sizes and numbers of nodes in the cluster, were run and their results examined.
Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Replication
IJSRD - International Journal for Scientific Research & Development | Vol. 3, Issue 10, 2015 | ISSN (online): 2321-0613
Fault Tolerance in Big Data Processing using Heartbeat Messages and Data Replication
T. Cowsalya, N. Gomathi, R. Arunkumar
Assistant Professors, Department of Computer Science and Engineering, SVS College of Engineering, Coimbatore, Tamil Nadu, India, Pincode-642109
Abstract— Big data is a popular term used to describe the exponential evolution and availability of data, including both structured and unstructured data. The volatile progression of demands on big data processing imposes a heavy burden on computation, communication and storage in geographically distributed data centers. Hence it is necessary to minimize the cost of big data processing, which also includes the fault tolerance cost. Big data processing involves two types of faults: node failure and data loss. Both faults can be recovered from using heartbeat messages. Here heartbeat messages act as acknowledgement messages between two servers. This paper presents a study of node failure and recovery, data replication and heartbeat messages.
Key words: Big data, Fault Tolerance, Heartbeat Messages,
Node Recovery, Data Replication
I. INTRODUCTION
Big data is a term used to describe collections of both structured and unstructured data that are so large that they are difficult to process using traditional database architectures. Because of this explosive growth, the demands of big data processing impose a heavy burden on computation, communication and storage in geographically distributed data centers. An incoming large data set is broken up into multiple chunks, and each chunk is placed in a different data center with the help of the Volley system. Volley [2] makes use of logs to submit jobs to the data centers, and cloud users rely on it for automatic data placement.
A. Geo-Distributed Data Center:
Data centers located in multiple geographical regions are known as geographically distributed data centers [1]. For example, Google operates 13 data centers in 8 countries on 4 continents.
Fig. 1: Data Center Topology
II. FAULT TOLERANCE
The challenges of big data include analysis, capture, search, sharing, storage, transfer, visualization and privacy violations. Among these, fault tolerance is one of the main challenges. Two kinds of faults can occur while processing big data: first, a data chunk may be lost while the data is transferred to multiple data centers; second, a server may fail or slow down.
A. Heartbeat Messages:
The solution to both problems is heartbeat messages. A heartbeat message is a message sent from an originator to an endpoint so that the endpoint can identify if and when the originator fails or is no longer available. Heartbeat messages are sent non-stop on a periodic, recurring basis from the originator's startup until its shutdown. When the receiver detects a lack of heartbeat messages during an anticipated arrival period, it may determine that the originator has failed, shut down, or is generally no longer available.
The same approach applies to fault recovery in multiprocessor systems, where each processor constantly monitors heartbeat messages from the other processors and can take autonomous recovery action in response to a failure to receive heartbeat messages, advantageously without the overall guidance of an executive processor.
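As a rough illustration of this protocol (not code from the paper; the class name, interval, and timeout values are assumptions), the Python sketch below shows an originator emitting periodic heartbeats and a receiver declaring the originator failed once the anticipated arrival period elapses without one.

import time

HEARTBEAT_INTERVAL = 1.0   # seconds between heartbeats (assumed value)
FAILURE_TIMEOUT = 3.0      # silence longer than this marks the originator as failed

class HeartbeatReceiver:
    """Tracks the most recent heartbeat seen from the originator."""
    def __init__(self):
        self.last_seen = time.monotonic()

    def on_heartbeat(self):
        self.last_seen = time.monotonic()

    def originator_failed(self):
        # The originator is presumed failed, shut down, or unreachable when
        # no heartbeat has arrived within the anticipated arrival period.
        return time.monotonic() - self.last_seen > FAILURE_TIMEOUT

# Originator side: heartbeats are sent on a periodic, recurring basis from
# startup until shutdown (simulated here with direct calls instead of a network).
receiver = HeartbeatReceiver()
for _ in range(3):
    receiver.on_heartbeat()
    time.sleep(HEARTBEAT_INTERVAL)
print(receiver.originator_failed())   # False while heartbeats keep arriving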
B. Data Loss Prevention:
When jobs are transmitted to multiple data centers, there is a chance of data loss. Data loss may occur due to network link failure. Links in the network may vary in transmission rate according to their individual characteristics, for example the distances and optical fiber facilities between data centers. Due to capacity constraints, not all tasks can be placed on the server on which the corresponding data resides, so it is unavoidable that certain data must be downloaded from a remote server. In this case, the routing plan matters for the transmission cost.
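A minimal sketch of this routing consideration, with made-up data center names and per-GB link costs purely for illustration: the task's host picks the remote source that minimizes the transfer cost of the chunk it must download.

def cheapest_source(chunk_size_gb, link_cost_per_gb):
    # Transfer cost = chunk size x per-GB cost of the link to that data center.
    return min(link_cost_per_gb, key=lambda dc: chunk_size_gb * link_cost_per_gb[dc])

link_cost_per_gb = {"dc-east": 0.09, "dc-west": 0.05, "dc-eu": 0.12}
print(cheapest_source(64, link_cost_per_gb))   # "dc-west" gives the lowest transfer cost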
C. Hadoop Architecture:
Hadoop is a software framework used for processing big data in parallel. It consists of two important components: the Hadoop Distributed File System and MapReduce.
1) Hadoop Distributed File System:
The Hadoop Distributed File System (HDFS) [3] is a distributed, highly fault-tolerant file system designed to run on low-cost commodity hardware. HDFS provides high-throughput access to application data and is suitable for applications with large data sets. HDFS consists of two kinds of nodes, the name node and the data nodes. The name node manages file system namespace operations such as opening, closing, and renaming files and directories. It also maps data blocks to data nodes, which handle read and write requests from HDFS clients. Data nodes also create, delete, and replicate data blocks according to instructions from the governing name node, and they send heartbeat messages to the name node to show that they are alive.
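The following Python sketch is only a toy model of this division of labour (the class and method names are hypothetical, not the HDFS API): the name node keeps the namespace and the block-to-data-node mapping, and a client asks it for block locations before reading the blocks from the data nodes directly.

class NameNode:
    """Toy model: namespace plus block-to-data-node mapping (not the HDFS API)."""
    def __init__(self):
        self.namespace = {}        # file path -> list of block ids
        self.block_locations = {}  # block id -> list of data node ids

    def create(self, path, block_ids):
        self.namespace[path] = list(block_ids)

    def add_block_location(self, block_id, datanode_id):
        self.block_locations.setdefault(block_id, []).append(datanode_id)

    def open(self, path):
        # A client asks only for block locations; it then reads the block data
        # directly from the data nodes, since the name node never serves file data.
        return [(b, self.block_locations.get(b, [])) for b in self.namespace[path]]

nn = NameNode()
nn.create("/logs/part-0", ["blk_1", "blk_2"])
nn.add_block_location("blk_1", "dn-a")
nn.add_block_location("blk_2", "dn-b")
print(nn.open("/logs/part-0"))   # [('blk_1', ['dn-a']), ('blk_2', ['dn-b'])]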
2) MapReduce:
MapReduce [12] is a programming model and associated implementation for processing and generating large data sets with a parallel, distributed algorithm on a cluster. MapReduce also involves two kinds of nodes, the job tracker and the task trackers. The job tracker talks to the name node to determine the location of the data, and the task trackers execute the assigned tasks on the data nodes.
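As a concrete illustration, here is the canonical word-count example written as a tiny single-process Python sketch: the map phase emits (word, 1) pairs, a shuffle step groups them by key, and the reduce phase sums the counts. This mimics the programming model only and says nothing about Hadoop's actual distributed execution.

from collections import defaultdict

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in the input record.
    for word in document.split():
        yield word, 1

def reduce_phase(word, counts):
    # Reduce: sum all counts emitted for the same word.
    return word, sum(counts)

documents = ["big data needs fault tolerance",
             "heartbeat messages detect node failure"]

grouped = defaultdict(list)
for doc in documents:
    for word, count in map_phase(doc):
        grouped[word].append(count)        # shuffle: group values by key

results = dict(reduce_phase(w, c) for w, c in grouped.items())
print(results["data"], results["failure"])   # 1 1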
III. FAILURES
One of the major benefits of using Hadoop is its ability to handle such failures and still allow a job to complete.
A. Task Failure:
When the user code in a map or reduce task throws a runtime exception, the child task fails. Another failure mode is the sudden exit of the child JVM; in this case the task tracker notices that the process has exited and marks the attempt as failed. A task may also be killed, which is different from failing.
B. Task Tracker Failure:
If a task tracker fails by crashing or running very slowly, it stops sending heartbeat messages to the job tracker. The job tracker notices that the task tracker has stopped sending heartbeats and removes it from the pool of task trackers on which tasks are scheduled.
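A minimal sketch of that bookkeeping, assuming a hypothetical JobTracker class and timeout value rather than Hadoop's real implementation:

import time

HEARTBEAT_TIMEOUT = 600   # e.g. ten minutes of silence (assumed value)

class JobTracker:
    """Sketch of expiring task trackers that stop sending heartbeats."""
    def __init__(self):
        self.last_heartbeat = {}   # task tracker id -> time of last heartbeat

    def on_heartbeat(self, tracker_id):
        self.last_heartbeat[tracker_id] = time.monotonic()

    def expire_trackers(self):
        now = time.monotonic()
        dead = [t for t, seen in self.last_heartbeat.items()
                if now - seen > HEARTBEAT_TIMEOUT]
        for t in dead:
            # Drop the tracker from the scheduling pool; its in-progress tasks
            # would then be rescheduled on healthy trackers.
            del self.last_heartbeat[t]
        return dead

jt = JobTracker()
jt.on_heartbeat("tracker-1")
print(jt.expire_trackers())   # [] because tracker-1 has just reported in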
C. Job Tracker Failure:
Failure of the job tracker is the most serious failure mode, because it is a single point of failure. This failure mode has a low chance of occurring, since the chance of any particular machine failing is low.
D. Name Node Failure:
The name node is a single point of failure, so if it fails the cluster becomes unusable. Even the secondary name node does not help in this case, since it is only used for checkpoints and not as a backup for the name node. If the name node fails, an administrator has to restart it.
E. Data Node Failure:
A compute node can fail for a variety of reasons, for example broken node hardware, a broken network, software bugs, or inadequate hardware resources.
Fig. 2: Name node Schema
When a compute node fails, all jobs running on that node fail. However, jobs running on other nodes that were not communicating with jobs on the failed node continue to run without a problem.
IV. SOLUTION
A. Data Replication:
An application can specify the number of replicas of a file at the time it is created, and this number can be changed at any time after that [6]. The name node makes all decisions concerning block replication. HDFS uses an intelligent replica placement model for reliability and performance. Optimizing replica placement makes HDFS unique among distributed file systems, and it is facilitated by a rack-aware replica placement policy that uses network bandwidth efficiently.
Fig. 3: Data Replication in DFS
The name node makes all decisions regarding replication of blocks. It periodically receives a heartbeat and a block report from each of the data nodes in the cluster. Receipt of a heartbeat implies that the data node is functioning properly, and a block report contains a list of all blocks on that data node.
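The sketch below illustrates this bookkeeping with hypothetical names and a replication target of 2 chosen only to keep the example small (HDFS defaults to 3): block reports from live data nodes are combined, and any block whose live replica count falls below the target is flagged for re-replication.

TARGET_REPLICAS = 2   # kept small for the example; HDFS defaults to 3

def find_under_replicated(block_reports, live_nodes):
    """block_reports maps each data node id to the set of block ids it reported."""
    replicas = {}
    for node, blocks in block_reports.items():
        if node not in live_nodes:         # ignore nodes whose heartbeats stopped
            continue
        for b in blocks:
            replicas.setdefault(b, set()).add(node)
    return {b: nodes for b, nodes in replicas.items() if len(nodes) < TARGET_REPLICAS}

reports = {"dn-a": {"blk_1", "blk_2"}, "dn-b": {"blk_1"}, "dn-c": {"blk_2"}}
print(find_under_replicated(reports, live_nodes={"dn-a", "dn-c"}))
# Only blk_1 is flagged: with dn-b silent it has a single live replica left.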
B. Replica Selection:
To minimize global bandwidth consumption and read latency, HDFS tries to satisfy a read request from the replica that is closest to the reader. If a replica exists on the same rack as the reader node, that replica is preferred. If an HDFS cluster spans multiple data centers, a replica in the local data center is preferred over any remote replica.
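A small illustrative sketch of such proximity-based selection, assuming a simple node/rack/data-center topology rather than HDFS's actual network distance calculation:

def distance(reader, replica):
    # Smaller is closer: same node, then same rack, then same data center, then remote.
    if replica["node"] == reader["node"]:
        return 0
    if replica["rack"] == reader["rack"]:
        return 1
    if replica["datacenter"] == reader["datacenter"]:
        return 2
    return 3

def closest_replica(reader, replicas):
    return min(replicas, key=lambda r: distance(reader, r))

reader = {"node": "n1", "rack": "r1", "datacenter": "dc1"}
replicas = [
    {"node": "n9", "rack": "r1", "datacenter": "dc1"},   # same rack as the reader
    {"node": "n4", "rack": "r7", "datacenter": "dc2"},   # remote data center
]
print(closest_replica(reader, replicas))   # picks the same-rack replica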
V. CONCLUSION
Thus, failures in big data processing have been studied, with data replication and heartbeat messages used as fault-tolerance mechanisms. In future work, practical setups can be deployed, the computation and communication costs can be measured, and the results can be compared with the cost of data processing when no node fails.
REFERENCES
[1] Data Center Locations, http://www.google.com/about/datacenters/inside/locations/index.html.
[2] S. Agarwal, J. Dunagan, N. Jain, S. Saroiu, A. Wolman, and H. Bhogan, "Volley: Automated Data Placement for Geo-Distributed Cloud Services," in Proceedings of the 7th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2010, pp. 17–32.
[3] L. Rao, X. Liu, L. Xie, and W. Liu, "Minimizing Electricity Cost: Optimization of Distributed Internet Data Centers in a Multi-Electricity-Market Environment," in Proceedings of the 29th International Conference on Computer Communications (INFOCOM). IEEE, 2010, pp. 1–9.
[4] A. Qureshi, R. Weber, H. Balakrishnan, J. Guttag, and B. Maggs, "Cutting the Electric Bill for Internet-scale Systems," in Proceedings of the ACM Special Interest Group on Data Communication (SIGCOMM). ACM, 2009, pp. 123–134.
[5] R. Urgaonkar, B. Urgaonkar, M. J. Neely, and A. Sivasubramaniam, "Optimal Power Cost Management Using Stored Energy in Data Centers," in Proceedings of the International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS). ACM, 2011, pp. 221–232.
[6] X. Fan, W.-D. Weber, and L. A. Barroso, "Power Provisioning for a Warehouse-sized Computer," in Proceedings of the 34th Annual International Symposium on Computer Architecture (ISCA). ACM, 2007, pp. 13–23.
[7] S. Govindan, A. Sivasubramaniam, and B. Urgaonkar, "Benefits and Limitations of Tapping Into Stored Energy for Datacenters," in Proceedings of the 38th Annual International Symposium on Computer Architecture (ISCA). ACM, 2011, pp. 341–352.
[8] P. X. Gao, A. R. Curtis, B. Wong, and S. Keshav, "It's Not Easy Being Green," in Proceedings of the ACM Special Interest Group on Data Communication (SIGCOMM). ACM, 2012, pp. 211–222.
[9] S. A. Yazd, S. Venkatesan, and N. Mittal, "Boosting energy efficiency with mirrored data block replication policy and energy scheduler," SIGOPS Oper. Syst. Rev., vol. 47, no. 2, pp. 33–40, 2013.
[10] J. Cohen, B. Dolan, M. Dunlap, J. M. Hellerstein, and C. Welton, "MAD skills: new analysis practices for big data," Proc. VLDB Endow., vol. 2, no. 2, pp. 1481–1492, 2009.
[11] R. Kaushik and K. Nahrstedt, "T*: A data-centric cooling energy costs reduction approach for Big Data analytics cloud," in 2012 International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2012, pp. 1–11.
[12] J. Dean and S. Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters," Google, Inc.