IOSR Journal of Computer Engineering (IOSR-JCE) is a double-blind peer-reviewed international journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes high-quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high-quality technical notes are invited for publication.
Survey on Performance of Hadoop Map reduce Optimization Methods (paperpublications3)
Abstract: Hadoop is an open-source software framework for storing and processing large-scale datasets on clusters of commodity hardware. Hadoop provides a reliable shared storage and analysis system: storage is provided by HDFS and analysis by MapReduce. MapReduce frameworks are foraying into the domain of high-performance computing with stringent non-functional requirements, namely execution times and throughput. MapReduce provides a simple programming interface with two functions, map and reduce, which are automatically executed in parallel on a cluster without requiring any intervention from the programmer. Moreover, MapReduce offers other benefits, including load balancing, high scalability, and fault tolerance. The challenge arises when the data is dynamically and continuously produced from different geographical locations. For dynamically generated data, an efficient algorithm is needed to guide the timely transfer of data into the cloud. For geo-dispersed datasets, the best data center must be selected to aggregate all data onto, because a MapReduce-like framework is most efficient when the data to be processed is all in one place rather than spread across data centers, due to the enormous overhead of inter-data-center data movement during the shuffle and reduce stages. Recently, many researchers have implemented and deployed data-intensive and/or computation-intensive algorithms on the MapReduce parallel computing framework for high processing efficiency.
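The two-function programming model described above can be sketched in a few lines of Python. This is a single-process illustration of the map/shuffle/reduce flow, not Hadoop's actual implementation; in a real cluster the framework runs the map and reduce calls in parallel and performs the shuffle over the network.

```python
from collections import defaultdict
from itertools import chain

# map function: emit (word, 1) for every word in a line of input
def map_fn(line):
    for word in line.split():
        yield word, 1

# reduce function: sum the counts collected for one key
def reduce_fn(word, counts):
    return word, sum(counts)

def mapreduce(lines):
    # shuffle: group intermediate pairs by key (done across the
    # network by the framework in a real cluster)
    groups = defaultdict(list)
    for key, value in chain.from_iterable(map_fn(l) for l in lines):
        groups[key].append(value)
    return dict(reduce_fn(k, v) for k, v in groups.items())

print(mapreduce(["big data on hadoop", "big clusters"]))
# {'big': 2, 'data': 1, 'on': 1, 'hadoop': 1, 'clusters': 1}
```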
This document describes a proposed architecture for improving data retrieval performance in a Hadoop Distributed File System (HDFS) deployed in a cloud environment. The key aspects are:
1) A web server would replace the map phase of MapReduce to provide faster searching of data. The web server uses multi-level indexing for real-time processing on HDFS.
2) An Apache load balancer distributes requests across backend application servers to improve throughput and scalability.
3) The NameNode is divided into master and slave servers, with the master containing the multi-level index and slaves storing data and lower-level indexes. This allows distributed data retrieval.
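The master/slave index split in point 3) can be sketched as a two-level lookup: the master's index routes a key to the slave responsible for its range, and the slave's lower-level index locates the data. The class and field names here are hypothetical illustrations, not the proposed architecture's actual components.

```python
import bisect

class SlaveIndex:
    """Lower-level index held on a slave server (illustrative)."""
    def __init__(self):
        self.blocks = {}          # key -> data block location

    def lookup(self, key):
        return self.blocks.get(key)

class MasterIndex:
    """Top-level index held on the master server (illustrative)."""
    def __init__(self, boundaries, slaves):
        self.boundaries = boundaries   # sorted upper bounds of key ranges
        self.slaves = slaves           # one SlaveIndex per range

    def lookup(self, key):
        # pick the slave responsible for the key's range, then let the
        # slave's lower-level index locate the block
        i = bisect.bisect_left(self.boundaries, key)
        return self.slaves[i].lookup(key)

s0, s1 = SlaveIndex(), SlaveIndex()
s0.blocks["apple"] = "block-3"
s1.blocks["zebra"] = "block-9"
master = MasterIndex(["m"], [s0, s1])
print(master.lookup("apple"))  # block-3
print(master.lookup("zebra"))  # block-9
```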
Map Reduce Workloads: A Dynamic Job Ordering and Slot Configurations (dbpublications)
MapReduce is a popular parallel computing paradigm for large-scale data processing in clusters and data centers. A MapReduce workload generally contains a set of jobs, each of which consists of multiple map tasks followed by multiple reduce tasks. Because 1) map tasks can only run in map slots and reduce tasks can only run in reduce slots, and 2) the general execution constraint is that map tasks are executed before reduce tasks, different job execution orders and map/reduce slot configurations for a MapReduce workload yield significantly different performance and system utilization. This survey proposes two classes of algorithms to minimize the makespan and the total completion time for an offline MapReduce workload. The first class focuses on job-ordering optimization for a MapReduce workload under a given map/reduce slot configuration. The second class considers the scenario in which the map/reduce slot configuration itself can also be optimized. Simulations as well as experiments on Amazon EC2 show that the proposed algorithms produce results that are 15 to 80 percent better than unoptimized Hadoop, leading to significant reductions in running time in practice.
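The job-ordering problem above is closely related to the classic two-machine flow shop, with the map phase as machine 1 and the reduce phase as machine 2. Johnson's rule solves that simpler problem optimally; the sketch below is an illustrative analogy for why ordering matters, not the surveyed paper's exact algorithm.

```python
# Johnson's rule for the two-machine flow shop: jobs with a short first
# phase go early, jobs with a short second phase go late, which keeps
# both phases busy and minimizes makespan in the two-machine case.
def johnson_order(jobs):
    """jobs: list of (name, map_time, reduce_time)."""
    front, back = [], []
    for name, m, r in sorted(jobs, key=lambda j: min(j[1], j[2])):
        if m <= r:
            front.append(name)    # short map phase: schedule early
        else:
            back.insert(0, name)  # short reduce phase: schedule late
    return front + back

jobs = [("A", 3, 6), ("B", 5, 2), ("C", 1, 2)]
print(johnson_order(jobs))  # ['C', 'A', 'B']
```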
Novel Scheduling Algorithms for Efficient Deployment of Map Reduce Applicatio... (IRJET Journal)
This document discusses novel scheduling algorithms for efficiently deploying MapReduce applications in heterogeneous computing environments. It proposes dynamically allocating computing resources like slots between map and reduce tasks to minimize the completion time (makespan) of MapReduce jobs. The key idea is to leverage task status information from recently completed jobs to dynamically adjust the slot allocation ratio between map and reduce phases. This aims to better pipeline the job stages and reduce makespan, compared to Hadoop's static slot configuration.
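The idea of adjusting the slot split from recently completed jobs can be sketched as follows. This is an assumption-laden illustration of the general approach (slots proportional to observed phase work), not the paper's algorithm; the function name and inputs are hypothetical.

```python
# Dynamically adjust the map/reduce slot split from the observed phase
# durations of recently finished jobs.
def adjust_slots(total_slots, recent_jobs):
    """recent_jobs: list of (map_seconds, reduce_seconds) per finished job."""
    map_work = sum(m for m, _ in recent_jobs)
    reduce_work = sum(r for _, r in recent_jobs)
    # give each phase slots in proportion to its share of observed work,
    # keeping at least one slot per phase
    map_slots = max(1, round(total_slots * map_work / (map_work + reduce_work)))
    return map_slots, max(1, total_slots - map_slots)

print(adjust_slots(10, [(60, 30), (40, 20)]))  # (7, 3)
```

Under Hadoop's static configuration the split stays fixed even when every recent job was map-heavy; the sketch reallocates slots so the map phase no longer bottlenecks the pipeline.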
This document discusses efficient analysis of big data using the MapReduce framework. It introduces the challenges of analyzing large and complex datasets, and describes how MapReduce addresses these challenges through its map and reduce functions. MapReduce allows distributed processing of big data across clusters of computers using a simple programming model.
Resource Aware Scheduling for Hadoop [Final Presentation] (Lu Wei)
The document describes a resource-aware scheduler for Hadoop that aims to improve task scheduling by considering both job resource demands and node resource availability. It captures job and node profiles, estimates task execution times, and applies scheduling policies like shortest job first. Evaluation on word count and Pi estimation workloads showed the estimated task times closely matched the actual times, demonstrating the accuracy of the scheduler's resource modeling and estimations.
This document describes the Pig system, which is a high-level data flow system built on top of MapReduce. Pig provides a language called Pig Latin for analyzing large datasets. Pig Latin programs are compiled into MapReduce jobs. The compilation process involves several steps: (1) parsing and type checking the Pig Latin code, (2) logical optimization, (3) converting the logical plan into physical operators like GROUP and JOIN, (4) mapping the physical operators to MapReduce stages, and (5) optimizing the MapReduce plan. This allows users to write data analysis programs more declaratively without coding MapReduce jobs directly.
A comparative survey based on processing network traffic data using hadoop pi... (ijcses)
Big data analysis has now become an integral part of many computational and statistical departments. Analysis of petabyte-scale data has taken on enhanced importance in the present-day scenario. Big data manipulation is now considered a key area of research in the field of data analytics, and novel techniques are evolving day by day. Thousands of transaction requests are processed every minute by websites related to e-commerce, shopping carts, and online banking. Hence the need for network traffic and weblog analysis, for which Hadoop is a suggested solution: it can efficiently process NetFlow data collected from routers, switches, or even website access logs at fixed intervals.
Python in an Evolving Enterprise System (PyData SV 2013) (PyData)
The document evaluates different solutions for integrating Python with Hadoop to enable data modeling on Hadoop clusters. It tests various frameworks like Native Java, Streaming, mrjob, PyCascading, and Pig using a sample budget aggregation problem. Pig and PyCascading allow complex pipelines to be expressed simply, while Pig is more performant and mature, making it the most viable option for ad-hoc analysis on Hadoop from Python.
Big Data Processing: Performance Gain Through In-Memory Computation (UT, San Antonio)
The document reports on a project comparing the performance of Hadoop MapReduce and Spark in-memory frameworks for big data processing. The authors analyzed execution times for running a PageRank benchmark on various datasets using each framework. They found that Spark performed better for smaller datasets with multiple iterations due to its in-memory caching. However, for larger datasets MapReduce was more efficient due to Spark's memory requirements not being met by the cluster configuration.
This document discusses a Hadoop Job Runner UI Tool that was created to make running Hadoop jobs easier. It allows users to browse input data locally, copy the data and job class to HDFS, run the job, and display results without using command lines. The tool simplifies tasks like distributing data and code, executing jobs, and retrieving output. Background information on Hadoop, MapReduce, and distributed computing environments is also provided.
Cloud batch: a batch job queuing system on clouds with Hadoop and HBase (João Gabriel Lima)
CloudBATCH is a batch job queuing system that allows Hadoop to function like a traditional batch job queuing system with enhanced functionality. It uses HBase tables to store resource management information like user credentials and job/queue details. Job brokers submit "wrappers" (MapReduce programs) as agents to execute jobs in Hadoop. Users can submit and check the status of jobs through a client. CloudBATCH aims to enable Hadoop to manage hybrid computing needs on a cluster by providing traditional management features like user access control, accounting, and customizable job scheduling.
A sql implementation on the map reduce framework (eldariof)
Tenzing is a SQL query engine built on top of MapReduce for analyzing large datasets at Google. It provides low latency queries over heterogeneous data sources at petabyte scale. Tenzing supports a full SQL implementation with extensions and optimizations to achieve high performance comparable to commercial parallel databases by leveraging MapReduce and traditional database techniques. It is used by over 1,000 Google employees to query over 1.5 petabytes of data and serve 10,000 queries daily.
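A SQL aggregate maps naturally onto map and reduce, which is the core idea behind engines like Tenzing. The sketch below shows the general technique for a GROUP BY query; it is an illustration of the mapping, not Tenzing's actual compiler output.

```python
from collections import defaultdict

# How a SQL aggregate such as
#   SELECT dept, SUM(salary) FROM emp GROUP BY dept
# maps onto map and reduce.
def map_row(row):
    yield row["dept"], row["salary"]     # group-by key, aggregated column

def reduce_group(dept, salaries):
    return dept, sum(salaries)

rows = [{"dept": "eng", "salary": 100}, {"dept": "ops", "salary": 80},
        {"dept": "eng", "salary": 120}]
groups = defaultdict(list)
for row in rows:
    for k, v in map_row(row):
        groups[k].append(v)
result = dict(reduce_group(k, v) for k, v in groups.items())
print(result)  # {'eng': 220, 'ops': 80}
```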
Survey of Parallel Data Processing in Context with MapReduce (cscpconf)
MapReduce is a parallel programming model and an associated implementation introduced by Google. In the programming model, a user specifies the computation by two functions, Map and Reduce. The underlying MapReduce library automatically parallelizes the computation and handles complicated issues like data distribution, load balancing, and fault tolerance. The original MapReduce implementation by Google, as well as its open-source counterpart, Hadoop, is aimed at parallelizing computation in large clusters of commodity machines. This paper gives an overview of the MapReduce programming model and its applications. The author describes the workflow of the MapReduce process, studies some important issues, such as fault tolerance, in more detail, and illustrates the working of MapReduce. The data-locality issue in heterogeneous environments can noticeably reduce MapReduce performance. The author addresses the placement of data across nodes in a way that gives each node a balanced data-processing load, stored in a parallel manner. Given a data-intensive application running on a Hadoop MapReduce cluster, the author exemplifies how data placement is done in the Hadoop architecture and the role of MapReduce within it, and explains the amount of data stored in each node to achieve improved data-processing performance.
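Balanced placement in a heterogeneous cluster can be sketched as giving each node a share of blocks proportional to its processing speed, so all nodes finish their local work at roughly the same time. This is an illustrative sketch only; Hadoop's real placement policy also accounts for replication and rack topology, and the function here is hypothetical.

```python
# Assign blocks to nodes in proportion to each node's measured speed.
def place_blocks(num_blocks, node_speeds):
    total = sum(node_speeds.values())
    shares = {n: int(num_blocks * s / total) for n, s in node_speeds.items()}
    # hand out blocks lost to integer rounding, fastest nodes first
    leftover = num_blocks - sum(shares.values())
    for n in sorted(node_speeds, key=node_speeds.get, reverse=True)[:leftover]:
        shares[n] += 1
    return shares

print(place_blocks(10, {"fast": 4, "medium": 2, "slow": 1}))
# {'fast': 6, 'medium': 3, 'slow': 1}
```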
Today’s era is generally treated as the era of data: in every field of computing application, huge amounts of data are generated. Society is increasingly dependent on computers, so a large amount of data is generated every second in structured, unstructured, or semi-structured format. These huge amounts of data are generally treated as big data, and analyzing big data is one of the biggest challenges in the current world. Hadoop is an open-source framework that allows big data to be stored and processed in a distributed environment across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage, and it generally follows horizontal scaling. MapReduce programs generally run over the Hadoop framework and process large amounts of structured and unstructured data. This paper describes different join strategies used in MapReduce programming to combine the data of two files in the Hadoop framework and also discusses the skewness problem associated with it.
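The most general of those join strategies, the reduce-side join, can be sketched as follows: mappers tag each record with its source file, and the reducer for a key combines the records from both sides. This is an illustrative sketch of the technique with hypothetical field names, not the paper's code.

```python
from collections import defaultdict

# Tag each record with its source so the reducer can tell the sides apart.
def map_customers(rec):
    yield rec["cust_id"], ("customer", rec["name"])

def map_orders(rec):
    yield rec["cust_id"], ("order", rec["item"])

def reduce_join(cust_id, tagged):
    names = [v for t, v in tagged if t == "customer"]
    items = [v for t, v in tagged if t == "order"]
    # cross product of the two sides for this join key
    return [(cust_id, n, i) for n in names for i in items]

customers = [{"cust_id": 1, "name": "Ada"}]
orders = [{"cust_id": 1, "item": "disk"}, {"cust_id": 1, "item": "cpu"}]
groups = defaultdict(list)
for rec in customers:
    for k, v in map_customers(rec):
        groups[k].append(v)
for rec in orders:
    for k, v in map_orders(rec):
        groups[k].append(v)
joined = [row for k, v in groups.items() for row in reduce_join(k, v)]
print(joined)  # [(1, 'Ada', 'disk'), (1, 'Ada', 'cpu')]
```

The skew problem the paper discusses shows up here directly: a key with many records on both sides concentrates a large cross product in one reducer.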
This document summarizes a research paper that analyzes and evaluates the performance of processing large data sets using Hadoop. It discusses how Hadoop Distributed File System (HDFS) and MapReduce provide parallel and distributed processing of large structured and unstructured data at scale. The paper also presents the results of experiments conducted on Hadoop to classify and cluster large data sets using machine learning algorithms. The experiments showed that Hadoop can process large data sets more efficiently and reliably compared to processing on a single computer.
International Journal of Computational Engineering Research (IJCER) is an international online journal published monthly in English. The journal publishes original research work that contributes significantly to furthering scientific knowledge in engineering and technology.
Hadoop Mapreduce Performance Enhancement Using In-Node Combiners (ijcsit)
This document summarizes a research paper that proposes using in-node combiners to improve the performance of Hadoop MapReduce jobs. It discusses how MapReduce jobs are I/O intensive and describes two common bottlenecks: during the map phase when data is loaded from disks, and during the shuffle phase when intermediate results are transferred over the network. The paper introduces an in-node combiner approach to optimize I/O by locally aggregating intermediate results within nodes to reduce network traffic between mappers and reducers. It evaluates this approach through an experiment counting word occurrences in Twitter messages.
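The in-node combining idea can be sketched with a word count: each mapper aggregates its own intermediate counts locally before emitting, so far fewer (word, count) pairs cross the network during the shuffle. This is a sketch of the general technique, not the paper's implementation.

```python
from collections import Counter

def map_with_combiner(lines):
    local = Counter()              # node-local aggregation
    for line in lines:
        local.update(line.split())
    yield from local.items()       # emit one pair per distinct word

def reduce_counts(pairs):
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

# two "nodes", each combining locally before the shuffle
node1 = ["to be or not", "to be"]
node2 = ["not to worry"]
pairs = list(map_with_combiner(node1)) + list(map_with_combiner(node2))
print(reduce_counts(pairs))
# {'to': 3, 'be': 2, 'or': 1, 'not': 2, 'worry': 1}
```

Without combining, node1 would emit six pairs for its six words; with it, node1 emits only four, one per distinct word.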
This document discusses a proposed data-aware caching framework called Dache that could be used with big data applications built on MapReduce. Dache aims to cache intermediate data generated during MapReduce jobs to avoid duplicate computations. When tasks run, they would first check the cache for existing results before running the actual computations. The goal is to improve efficiency by reducing redundant work. The document outlines the objectives and scope of extending MapReduce with Dache, provides background on MapReduce and Hadoop, and concludes that initial experiments show Dache can eliminate duplicate tasks in incremental jobs.
Dache is a data-aware cache system for big-data applications using the MapReduce framework. Dache aims to extend the MapReduce framework and provision a cache layer for efficiently identifying and accessing cache items in a MapReduce job.
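The check-cache-before-computing idea can be sketched as follows: a task keys its cache entry on the input split and the operation applied, so identical work in an incremental job is served from the cache. This is an illustrative sketch in the spirit of Dache; the real system keeps cache descriptions in a shared service, not a local dict, and the names here are hypothetical.

```python
cache = {}

def cached_task(input_split, operation):
    # the data plus the operation applied identify the cache item
    key = (input_split, operation.__name__)
    if key in cache:
        return cache[key]              # cache hit: skip the computation
    result = operation(input_split)    # cache miss: compute and store
    cache[key] = result
    return result

def word_count(text):
    return len(text.split())

print(cached_task("alpha beta gamma", word_count))  # 3 (computed)
print(cached_task("alpha beta gamma", word_count))  # 3 (served from cache)
```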
Hybrid Job-Driven Meta Data Scheduling for BigData with MapReduce Clusters an... (dbpublications)
It is cost-efficient for a tenant with a limited budget to establish a virtual Map Reduce cluster by renting multiple virtual private servers (VPSs) from a VPS provider. To provide an appropriate scheduling scheme for this type of computing environment, we propose in this paper a hybrid job-driven scheduling scheme (JoSS for short) from a tenant’s perspective. JoSS provides not only job level scheduling, but also map-task level scheduling and reduce-task level scheduling. JoSS classifies Map Reduce jobs based on job scale and job type and designs an appropriate scheduling policy to schedule each class of jobs. The goal is to improve data locality for both map tasks and reduce tasks, avoid job starvation, and improve job execution performance. Two variations of JoSS are further introduced to separately achieve a better map-data locality and a faster task assignment. We conduct extensive experiments to evaluate and compare the two variations with current scheduling algorithms supported by Hadoop. The results show that the two variations outperform the other tested algorithms in terms of map-data locality, reduce-data locality, and network overhead without incurring significant overhead. In addition, the two variations are separately suitable for different Map Reduce workload scenarios and provide the best job performance among all tested algorithms.
Dache: A Data Aware Caching for Big-Data using Map Reduce framework (Safir Shah)
This document proposes Dache, a data-aware caching system for big data applications using the MapReduce framework. It aims to extend MapReduce by provisioning a cache layer to efficiently identify and access cached items. The proposed system identifies input sources and operations applied to cache items for proper indexing. It describes cache requests and replies for the map and reduce phases. Experimental results show the proposed system eliminates duplicate tasks, reduces execution time and CPU utilization compared to traditional MapReduce.
This document discusses optimization of MapReduce for Hadoop-based big data applications. It notes that while Hadoop is effective for large data volumes, its performance could be improved through MapReduce optimization. Specifically, it proposes a multilayered scheduling approach where an initial MapReduce with sample data generates outputs to optimize a final MapReduce processing. This could reduce execution time for exploring large datasets. The document also suggests optimization through techniques like predictive load scheduling, pre-fetching, pre-shuffling, and uniform load distribution across nodes. Overall improvements to MapReduce scheduling are argued to enhance resource utilization and quality of service for big data applications on Hadoop frameworks.
OPTIMIZATION OF MULTIPLE CORRELATED QUERIES BY DETECTING SIMILAR DATA SOURCE ... (Puneet Kansal)
This research article experimentally shows how multiple queries can be submitted to Hive and their execution time reduced by detecting common expressions and common data sources. TPC-H queries are used for demonstration, and test data is generated in sizes of 2 GB, 5 GB, and 10 GB using the DBGEN software. The technologies used are Hadoop and Hive; Hadoop is configured in single-user mode on the Ubuntu operating system.
Asserting that Big Data is vital to business is an understatement. Organizations have generated more and more data for years, but struggle to use it effectively. Clearly Big Data has more important uses than ensuring compliance with regulatory requirements. In addition, data is being generated with greater velocity, due to the advent of new pervasive devices (e.g., smartphones, tablets, etc.), social Web sites (e.g., Facebook, Twitter, LinkedIn, etc.) and other sources like GPS, Google Maps, heat/pressure sensors, etc.
Web Service QoS Prediction Based on Adaptive Dynamic Programming Using Fuzzy ... (redpel dot com)
The document proposes a novel approach for predicting quality of service (QoS) metrics for cloud services. The approach combines fuzzy neural networks and adaptive dynamic programming (ADP) for improved prediction accuracy. Specifically, it uses an adaptive-network-based fuzzy inference system (ANFIS) to extract fuzzy rules from QoS data and employ ADP for online parameter learning of the fuzzy rules. Experimental results on a large QoS dataset demonstrate the prediction accuracy of this approach. The approach also provides a convergence proof to guarantee stability of the neural network weights during training.
This thesis investigates developing a new model called the Swarm Routing Problem (SRP) for multi-objective UAV mission planning using evolutionary computation. The SRP modifies the Vehicle Routing Problem with Time Windows (VRPTW) to remove the constraint of a single vehicle per target and instead models cooperative routing of a UAV swarm across multiple targets. The author develops a multi-objective evolutionary algorithm solution approach and implements it in C++ using an open source library. Experiments apply the algorithm to benchmark problems for both the VRPTW and newly developed SRP to validate the SRP as a more realistic UAV mission modeling problem.
A comparative survey based on processing network traffic data using hadoop pi...ijcses
Big data analysis has now become an integral part of many computational and statistical departments.
Analysis of peta-byte scale of data is having an enhanced importance in the present day scenario. Big data
manipulation is now considered as a key area of research in the field of data analytics and novel
techniques are being evolved day by day. Thousands of transaction requests are being processed in every
minute by different websites related to e-commerce, shopping carts and online banking. Here comes the
need of network traffic and weblog analysis for which Hadoop comes as a suggested solution. It can
efficiently process the Netflow data collected from routers, switches or even from website access logs at
fixed intervals.
Python in an Evolving Enterprise System (PyData SV 2013)PyData
The document evaluates different solutions for integrating Python with Hadoop to enable data modeling on Hadoop clusters. It tests various frameworks like Native Java, Streaming, mrjob, PyCascading, and Pig using a sample budget aggregation problem. Pig and PyCascading allow complex pipelines to be expressed simply, while Pig is more performant and mature, making it the most viable option for ad-hoc analysis on Hadoop from Python.
Big Data Processing: Performance Gain Through In-Memory ComputationUT, San Antonio
The document reports on a project comparing the performance of Hadoop MapReduce and Spark in-memory frameworks for big data processing. The authors analyzed execution times for running a PageRank benchmark on various datasets using each framework. They found that Spark performed better for smaller datasets with multiple iterations due to its in-memory caching. However, for larger datasets MapReduce was more efficient due to Spark's memory requirements not being met by the cluster configuration.
This document discusses a Hadoop Job Runner UI Tool that was created to make running Hadoop jobs easier. It allows users to browse input data locally, copy the data and job class to HDFS, run the job, and display results without using command lines. The tool simplifies tasks like distributing data and code, executing jobs, and retrieving output. Background information on Hadoop, MapReduce, and distributed computing environments is also provided.
Cloud batch a batch job queuing system on clouds with hadoop and h-baseJoão Gabriel Lima
CloudBATCH is a batch job queuing system that allows Hadoop to function like a traditional batch job queuing system with enhanced functionality. It uses HBase tables to store resource management information like user credentials and job/queue details. Job brokers submit "wrappers" (MapReduce programs) as agents to execute jobs in Hadoop. Users can submit and check the status of jobs through a client. CloudBATCH aims to enable Hadoop to manage hybrid computing needs on a cluster by providing traditional management features like user access control, accounting, and customizable job scheduling.
A sql implementation on the map reduce frameworkeldariof
Tenzing is a SQL query engine built on top of MapReduce for analyzing large datasets at Google. It provides low latency queries over heterogeneous data sources at petabyte scale. Tenzing supports a full SQL implementation with extensions and optimizations to achieve high performance comparable to commercial parallel databases by leveraging MapReduce and traditional database techniques. It is used by over 1,000 Google employees to query over 1.5 petabytes of data and serve 10,000 queries daily.
Survey of Parallel Data Processing in Context with MapReduce cscpconf
MapReduce is a parallel programming model and an associated implementation introduced by
Google. In the programming model, a user specifies the computation by two functions, Map and Reduce. The underlying MapReduce library automatically parallelizes the computation, and handles complicated issues like data distribution, load balancing and fault tolerance. The original MapReduce implementation by Google, as well as its open-source counterpart,Hadoop, is aimed for parallelizing computing in large clusters of commodity machines.This paper gives an overview of MapReduce programming model and its applications. The author has described here the workflow of MapReduce process. Some important issues, like fault tolerance, are studied in more detail. Even the illustration of working of Map Reduce is given. The data locality issue in heterogeneous environments can noticeably reduce the Map Reduce performance. In this paper, the author has addressed the illustration of data across nodes in a way that each node has a balanced data processing load stored in a parallel manner. Given a data intensive application running on a Hadoop Map Reduce cluster, the auhor has exemplified how data placement is done in Hadoop architecture and the role of Map Reduce in the Hadoop Architecture. The amount of data stored in each node to achieve improved data-processing performance is explained here.
Today’s era is often called the era of data: huge amounts of data are generated in every field of computing. Society is increasingly dependent on computers, so large volumes of data are produced every second in structured, unstructured, or semi-structured format. Such data is generally treated as big data, and analyzing it is one of the biggest challenges in the current world. Hadoop is an open-source framework that allows storing and processing big data in a distributed environment across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage, and it generally follows horizontal scaling. MapReduce programs typically run over the Hadoop framework and process large amounts of structured and unstructured data. This paper describes the different joining strategies used in MapReduce programming to combine the data of two files in the Hadoop framework and also discusses the skewness problem associated with them.
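One joining strategy such a paper typically covers, the reduce-side (repartition) join, can be sketched as follows. The user/order schema is purely illustrative; the point is that mappers tag records from both files with the join key, the shuffle groups both sides under that key, and the reducer pairs them:

```python
from collections import defaultdict

def reduce_side_join(users, orders):
    """Sketch of a reduce-side join on user_id: group records from
    both input files by the join key, then cross-pair per key."""
    groups = defaultdict(lambda: {"users": [], "orders": []})
    for user_id, name in users:            # map over file 1
        groups[user_id]["users"].append(name)
    for user_id, item in orders:           # map over file 2
        groups[user_id]["orders"].append(item)
    joined = []
    for user_id, g in groups.items():      # reduce: pair both sides per key
        for name in g["users"]:
            for item in g["orders"]:
                joined.append((user_id, name, item))
    return joined

users = [(1, "asha"), (2, "ravi")]
orders = [(1, "disk"), (1, "ram"), (3, "cpu")]
print(reduce_side_join(users, orders))
# [(1, 'asha', 'disk'), (1, 'asha', 'ram')]
```

Every record for one join key lands on a single reducer, which is exactly where the skewness problem arises when one key dominates the data.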
This document summarizes a research paper that analyzes and evaluates the performance of processing large data sets using Hadoop. It discusses how Hadoop Distributed File System (HDFS) and MapReduce provide parallel and distributed processing of large structured and unstructured data at scale. The paper also presents the results of experiments conducted on Hadoop to classify and cluster large data sets using machine learning algorithms. The experiments showed that Hadoop can process large data sets more efficiently and reliably compared to processing on a single computer.
Hadoop MapReduce Performance Enhancement Using In-Node Combiners (ijcsit)
This document summarizes a research paper that proposes using in-node combiners to improve the performance of Hadoop MapReduce jobs. It discusses how MapReduce jobs are I/O intensive and describes two common bottlenecks: during the map phase when data is loaded from disks, and during the shuffle phase when intermediate results are transferred over the network. The paper introduces an in-node combiner approach to optimize I/O by locally aggregating intermediate results within nodes to reduce network traffic between mappers and reducers. It evaluates this approach through an experiment counting word occurrences in Twitter messages.
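The local-aggregation idea can be sketched as follows; the function name and input are illustrative, and a real in-node combiner would share aggregation state across all of a node's map tasks rather than within a single call:

```python
from collections import Counter

def map_with_combiner(lines):
    """Mapper with local (in-node) combining: instead of emitting one
    (word, 1) pair per occurrence, aggregate counts in memory and emit
    one (word, n) pair per distinct word. Only these aggregated pairs
    cross the network in the shuffle phase."""
    local = Counter()
    for line in lines:
        local.update(line.split())
    return list(local.items())

node_input = ["to be or not to be", "to do"]
pairs = map_with_combiner(node_input)
print(sorted(pairs))
# [('be', 2), ('do', 1), ('not', 1), ('or', 1), ('to', 3)]
```

Here eight word occurrences shrink to five shuffled pairs; on skewed inputs like word counts over Twitter messages, the reduction in network traffic between mappers and reducers is much larger.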
This document discusses a proposed data-aware caching framework called Dache that could be used with big data applications built on MapReduce. Dache aims to cache intermediate data generated during MapReduce jobs to avoid duplicate computations. When tasks run, they would first check the cache for existing results before running the actual computations. The goal is to improve efficiency by reducing redundant work. The document outlines the objectives and scope of extending MapReduce with Dache, provides background on MapReduce and Hadoop, and concludes that initial experiments show Dache can eliminate duplicate tasks in incremental jobs.
Dache - a data aware cache system for big-data applications using the MapReduce framework.
Dache aims to extend the MapReduce framework by provisioning a cache layer for efficiently identifying and accessing cache items in a MapReduce job.
Hybrid Job-Driven Meta Data Scheduling for BigData with MapReduce Clusters an... (dbpublications)
It is cost-efficient for a tenant with a limited budget to establish a virtual Map Reduce cluster by renting multiple virtual private servers (VPSs) from a VPS provider. To provide an appropriate scheduling scheme for this type of computing environment, we propose in this paper a hybrid job-driven scheduling scheme (JoSS for short) from a tenant’s perspective. JoSS provides not only job level scheduling, but also map-task level scheduling and reduce-task level scheduling. JoSS classifies Map Reduce jobs based on job scale and job type and designs an appropriate scheduling policy to schedule each class of jobs. The goal is to improve data locality for both map tasks and reduce tasks, avoid job starvation, and improve job execution performance. Two variations of JoSS are further introduced to separately achieve a better map-data locality and a faster task assignment. We conduct extensive experiments to evaluate and compare the two variations with current scheduling algorithms supported by Hadoop. The results show that the two variations outperform the other tested algorithms in terms of map-data locality, reduce-data locality, and network overhead without incurring significant overhead. In addition, the two variations are separately suitable for different Map Reduce workload scenarios and provide the best job performance among all tested algorithms.
Dache: A Data Aware Caching for Big-Data using Map Reduce framework (Safir Shah)
This document proposes Dache, a data-aware caching system for big data applications using the MapReduce framework. It aims to extend MapReduce by provisioning a cache layer to efficiently identify and access cached items. The proposed system identifies input sources and operations applied to cache items for proper indexing. It describes cache requests and replies for the map and reduce phases. Experimental results show the proposed system eliminates duplicate tasks, reduces execution time and CPU utilization compared to traditional MapReduce.
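A minimal sketch of the cache-request/reply idea, with hypothetical names: before a task runs, it asks whether the same (input, operation) pair has already been computed. Dache itself indexes cache items by input source and the operations applied, and runs a separate cache manager service rather than an in-process dictionary:

```python
cache = {}  # cache item index: (input_split_id, operation_name) -> result

def run_task(split_id, op_name, compute):
    """Check the cache before computing; store the result on a miss.
    Returns (result, hit) so the caller can see whether work was skipped."""
    key = (split_id, op_name)
    if key in cache:                 # cache reply: hit, skip the computation
        return cache[key], True
    result = compute()               # cache miss: run the actual task
    cache[key] = result
    return result, False

calls = []
def expensive():
    calls.append(1)                  # count how often real work happens
    return 42

r1, hit1 = run_task("part-00000", "wordcount", expensive)
r2, hit2 = run_task("part-00000", "wordcount", expensive)
print(r1, hit1, r2, hit2, len(calls))
# 42 False 42 True 1
```

The second, duplicate task never executes `expensive()`, which is the redundant-work elimination the experiments on incremental jobs demonstrate.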
This document discusses optimization of MapReduce for Hadoop-based big data applications. It notes that while Hadoop is effective for large data volumes, its performance could be improved through MapReduce optimization. Specifically, it proposes a multilayered scheduling approach where an initial MapReduce with sample data generates outputs to optimize a final MapReduce processing. This could reduce execution time for exploring large datasets. The document also suggests optimization through techniques like predictive load scheduling, pre-fetching, pre-shuffling, and uniform load distribution across nodes. Overall improvements to MapReduce scheduling are argued to enhance resource utilization and quality of service for big data applications on Hadoop frameworks.
OPTIMIZATION OF MULTIPLE CORRELATED QUERIES BY DETECTING SIMILAR DATA SOURCE... (Puneet Kansal)
This research article experimentally shows how multiple queries can be provided to Hive and their execution time reduced by searching for common expressions and common data sources. The TPC-H queries are used for demonstration, and test data is generated in variations of 2 GB, 5 GB, and 10 GB using the DBGEN software. The technologies used are Hadoop and Hive; Hadoop is configured in single-user mode on the Ubuntu operating system.
Asserting that Big Data is vital to business is an understatement. Organizations have generated more and more data for years, but struggle to use it effectively. Clearly Big Data has more important uses than ensuring compliance with regulatory requirements. In addition, data is being generated with greater velocity, due to the advent of new pervasive devices (e.g., smartphones, tablets, etc.), social Web sites (e.g., Facebook, Twitter, LinkedIn, etc.) and other sources like GPS, Google Maps, heat/pressure sensors, etc.
Over flow multi site aware big data management for scientific workflows on clouds (ieeepondy)
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce Framework (Mahantesh Angadi)
This seminar presentation provides an overview of scheduling methods in the Hadoop MapReduce framework. It begins with motivations for using distributed computing for big data and introduces Hadoop and MapReduce. The presentation then surveys several proposed scheduling methods, including static methods like FIFO and adaptive methods like the Fair Scheduler. It summarizes five research papers on scheduling, discussing proposed approaches like optimizing job completion time and learning node capabilities for heterogeneous clusters. The presentation concludes that scheduling algorithms should improve data locality and use prediction to efficiently schedule jobs on heterogeneous Hadoop clusters.
Survey on Job Schedulers in Hadoop Cluster (IOSR Journals)
Abstract: Hadoop MapReduce is a powerful parallel processing technique for big data analysis on large distributed commodity hardware clusters such as clouds. For job scheduling, Hadoop provides a default FIFO scheduler in which jobs are scheduled in FIFO order, but this scheduler might not be a good choice for some jobs, and in such situations one should choose an alternate scheduler. In this paper we conduct a study of various schedulers. We hope this study will help other designers and experienced end users understand the details of particular schedulers, enabling them to make the best choices for their particular research interests.
Keywords: Cloud Computing, Hadoop, HDFS, MapReduce, schedulers
Cache mechanism to avoid duplication of same thing in hadoop system to speed ... (eSAT Journals)
This document proposes mechanisms to improve the efficiency of the Hadoop distributed file system and MapReduce framework. It suggests using locality-sensitive hashing to colocate related files on the same data nodes, which would improve data locality. It also proposes implementing a cache to store the results of MapReduce tasks, so that duplicate computations can be avoided when the same task is run again on the same data. Implementing these mechanisms could help speed up execution times in Hadoop by reducing unnecessary data transmission and repetitive task executions.
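The colocation idea can be illustrated with a simple hash placement sketch. The prefix-as-dataset rule and node names here are assumptions for illustration; true locality-sensitive hashing maps similar (not just identical) keys to the same bucket:

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]

def placement(filename):
    """Colocate related files: hash a shared key (here, the dataset name
    before the first '-') so all parts of one dataset land on the same
    data node, improving data locality for jobs that read them together."""
    dataset = filename.split("-")[0]
    digest = int(hashlib.md5(dataset.encode()).hexdigest(), 16)
    return NODES[digest % len(NODES)]

print(placement("logs-2014-part1") == placement("logs-2014-part2"))
# True
```

Because both files hash on the same dataset key, a MapReduce job joining or scanning them avoids the cross-node transfers the document identifies as the cost of scattered placement.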
Hourglass: a Library for Incremental Processing on Hadoop (Matthew Hayes)
Hadoop enables processing of large data sets through its relatively easy-to-use semantics. However, jobs are often written inefficiently for tasks that could be computed incrementally due to the burdensome incremental state management for the programmer. This paper introduces Hourglass, a library for developing incremental monoid computations on Hadoop. It runs on unmodified Hadoop and provides an accumulator-based interface for programmers to store and use state across successive runs; the framework ensures that only the necessary subcomputations are performed. It is successfully used at LinkedIn, one of the largest online social networks, for many use cases in dashboarding and machine learning. Hourglass is open source and freely available.
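The accumulator idea can be sketched as a monoid merge over daily partitions (illustrative names; Hourglass itself expresses this through mapper and accumulator interfaces on Hadoop and persists the state between runs):

```python
def incremental_count(prev_state, new_day_events):
    """Hourglass-style accumulator sketch: merge yesterday's saved
    per-user counts (a monoid: counters under addition) with only the
    new day's partition, instead of reprocessing all history."""
    state = dict(prev_state)          # saved state from the previous run
    for user in new_day_events:       # only the new subcomputation runs
        state[user] = state.get(user, 0) + 1
    return state

day1 = incremental_count({}, ["u1", "u2", "u1"])
day2 = incremental_count(day1, ["u2", "u3"])
print(day2)
# {'u1': 2, 'u2': 2, 'u3': 1}
```

The second run touches only the new day's events, which is the saving the library provides over a naive job that rescans every historical partition.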
A popular programming model for running data-intensive applications on the cloud is MapReduce. In Hadoop, jobs are scheduled in FIFO order by default, yet many MapReduce applications require strict deadlines, and the Hadoop framework does not implement a scheduler with deadline constraints. Existing schedulers do not guarantee that a job will be completed by a specific deadline; some address the issue of deadlines but focus more on improving system utilization. We have proposed an algorithm that lets the user specify a job's deadline and evaluates whether the job can be finished before it. The scheduler with deadlines for Hadoop ensures that only jobs whose deadlines can be met are scheduled for execution. If a submitted job cannot satisfy the specified deadline, physical or virtual nodes can be added dynamically to complete the job within the deadline [8].
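A toy version of the admission decision might look like this; the linear cost model and parameter names are illustrative assumptions, not the paper's formula:

```python
def feasible_before_deadline(input_size_mb, map_rate_mb_s, reduce_rate_mb_s,
                             free_map_slots, free_reduce_slots,
                             deadline_s, now_s=0.0):
    """Toy admission test in the spirit of a deadline scheduler: estimate
    the minimum map and reduce phase times from job size and currently free
    slots, and admit the job only if the estimate finishes by the deadline."""
    map_time = input_size_mb / (map_rate_mb_s * free_map_slots)
    reduce_time = input_size_mb / (reduce_rate_mb_s * free_reduce_slots)
    return now_s + map_time + reduce_time <= deadline_s

# 1 GB job, 4 map slots at 8 MB/s, 2 reduce slots at 16 MB/s: 32 s + 32 s
print(feasible_before_deadline(1024, 8, 16, 4, 2, deadline_s=100))  # True
print(feasible_before_deadline(1024, 8, 16, 4, 2, deadline_s=50))   # False
```

When the test fails, the scheduler described above would either reject the job or add nodes (raising the free-slot counts) until the estimate fits within the deadline.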
DESIGN ARCHITECTURE-BASED ON WEB SERVER AND APPLICATION CLUSTER IN CLOUD ENVI... (cscpconf)
The cloud has become a computational and storage solution for many data-centric organizations. The problem those organizations face today is searching cloud data in an efficient manner: a framework is required to distribute the work of searching and fetching across thousands of computers, since the data in HDFS is scattered and takes a long time to retrieve. The major idea is to design a web server in the map phase using the Jetty web server, which gives a fast and efficient way of searching data in the MapReduce paradigm. For real-time processing on Hadoop, a searchable mechanism is implemented in HDFS by creating a multilevel index in the web server with multi-level index keys. The web server is used to handle traffic throughput, and web clustering technology improves application performance. To keep the workload manageable, the load balancer should automatically distribute load to newly added nodes in the server.
Design Issues and Challenges of Peer-to-Peer Video on Demand System (cscpconf)
P2P media streaming and file downloading are among the most popular applications on the Internet. These systems reduce server load and provide scalable content distribution. P2P networking is a new paradigm for building distributed applications. This paper describes the design requirements for P2P media streaming and compares live and video-on-demand systems based on their system architecture. It studies the traditional approaches to P2P streaming systems, along with the design issues, challenges, and current approaches for providing P2P VoD services.
GROUPING BASED JOB SCHEDULING ALGORITHM USING PRIORITY QUEUE AND HYBRID ALGOR... (ijgca)
Grid computing provides a computing platform in which heterogeneous computing resources, connected by a network across dynamic and geographically dispersed organizations, form a distributed high-performance computing infrastructure. Grid computing solves complex, large-scale computational problems across multiple machines in a high-performance computing environment. The main emphasis in grid computing is on resource management and the job scheduler, whose goal is to maximize resource utilization and minimize the processing time of jobs. Existing approaches to grid scheduling do not give much emphasis to the scheduler's processing-time performance: schedulers typically allocate resources to jobs using the first-come-first-serve algorithm. In this paper, we optimize the scheduler's queue using various scheduling methods such as Shortest Job First, First In First Out, and Round Robin. The job scheduling system is responsible for selecting the most suitable machines in a grid for user jobs; the management and scheduling system generates job schedules for each machine by taking static restrictions and dynamic parameters of jobs and machines into consideration. The main purpose of this paper is to develop an efficient job scheduling algorithm that maximizes resource utilization and minimizes the processing time of jobs. Queues can be optimized using various scheduling algorithms depending on the performance criteria to be improved, e.g., response time and throughput. The work has been done in MATLAB using the Parallel Computing Toolbox.
GROUPING BASED JOB SCHEDULING ALGORITHM USING PRIORITY QUEUE AND HYBRID ALGOR... (ijgca)
This document describes a proposed grouping based job scheduling algorithm for grid computing that aims to maximize resource utilization and minimize job processing times. It discusses related work on job scheduling algorithms and then presents the steps of the proposed algorithm. The algorithm uses shortest job first, first-in first-out, and round robin scheduling to process jobs in groups. The algorithm is evaluated experimentally in MATLAB and shown to reduce total job processing time compared to using only first-in first-out scheduling. Graphs demonstrate the processing time improvements achieved by the combined scheduling approach.
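The benefit of reordering a queue by job size, as the proposed algorithm does with Shortest Job First inside each group, can be seen in a small sketch; the job list is illustrative:

```python
def avg_wait(ordered_jobs):
    """Average waiting time for jobs run back-to-back in the given order."""
    waiting, elapsed = 0, 0
    for _, runtime in ordered_jobs:
        waiting += elapsed           # this job waited for all previous ones
        elapsed += runtime
    return waiting / len(ordered_jobs)

jobs = [("j1", 24), ("j2", 3), ("j3", 3)]           # FIFO arrival order
print(avg_wait(jobs))                                # 17.0 (FIFO)
print(avg_wait(sorted(jobs, key=lambda j: j[1])))    # 3.0  (Shortest Job First)
```

Running the two short jobs before the long one cuts the average waiting time from 17.0 to 3.0 in this example, which is the processing-time improvement the grouped SJF/FIFO/Round Robin combination targets.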
This document discusses a Hadoop Job Runner UI Tool that was developed to provide a graphical user interface for running Hadoop jobs. The tool allows users to browse input data locally, copy the data to HDFS, copy Java classes to remote servers, run Hadoop jobs, and copy results back from HDFS to display outputs and job statistics. The document also provides background on Hadoop and MapReduce, including an overview of how MapReduce works and how it enables distributed and parallel processing of large datasets.
Big Data Analysis and Its Scheduling Policy – HadoopIOSR Journals
This document discusses scheduling policies in Hadoop for big data analysis. It describes the default FIFO scheduler in Hadoop as well as alternative schedulers like the Fair Scheduler and Capacity Scheduler. The Fair Scheduler was developed by Facebook to allocate resources fairly between jobs by assigning them to pools with minimum guaranteed capacities. The Capacity Scheduler allows multiple tenants to securely share a large cluster while giving each organization capacity guarantees. It also supports hierarchical queues to prioritize sharing unused resources within an organization.
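A sketch of fair sharing with minimum guarantees: each pool first receives its guaranteed minimum (capped by its demand), and leftover slots are split among pools that still need them. This loop is a simplification of the real Fair Scheduler, which also supports pool weights and task preemption:

```python
def fair_allocate(total_slots, pools):
    """Allocate task slots to pools given (minimum share, current demand)
    per pool: grant minimums first, then share the remainder one slot at
    a time among pools with unmet demand."""
    alloc = {p: min(m, d) for p, (m, d) in pools.items()}
    left = total_slots - sum(alloc.values())
    needy = [p for p, (m, d) in pools.items() if alloc[p] < d]
    while left > 0 and needy:
        for p in list(needy):        # round-robin one slot per needy pool
            if left == 0:
                break
            alloc[p] += 1
            left -= 1
            if alloc[p] >= pools[p][1]:
                needy.remove(p)      # demand satisfied
    return alloc

# pools: name -> (minimum guaranteed share, current demand)
print(fair_allocate(10, {"prod": (4, 6), "adhoc": (0, 10)}))
# {'prod': 6, 'adhoc': 4}
```

The production pool's minimum of 4 is honored immediately, yet the ad-hoc pool still receives a fair portion of the surplus, illustrating the guaranteed-capacity-plus-sharing behavior both schedulers aim for.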
LOAD BALANCING LARGE DATA SETS IN A HADOOP CLUSTER (ijdpsjournal)
With the interconnection of ever more computers online, the data shared by users is multiplying daily. As a result, the amount of data to be processed by dedicated servers rises very quickly, and this instantaneous increase in data volume runs up against latency during processing. A model is therefore required to manage the distribution of tasks across several machines. This article presents a study of load balancing for large data sets on a cluster of Hadoop nodes. In this paper, we use MapReduce to implement parallel programming and YARN to monitor task execution and submission in a node cluster.
This document discusses leveraging MapReduce with Hadoop to analyze weather data. It proposes building a data analytical engine using MapReduce on Hadoop to process massive amounts of temperature data from sensors. The document describes implementing MapReduce jobs to analyze National Climatic Data Center temperature data, with mappers filtering and assigning data to key-value pairs and reducers calculating averages, maximums, and minimums on the data. Overall, the document examines using Hadoop and MapReduce to scalably process large volumes of sensor weather data.
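The mapper/reducer pairing described there can be sketched as follows; the record layout and years are illustrative, not the NCDC data format:

```python
from collections import defaultdict

def weather_map(record):
    """Mapper: filter out malformed readings and emit (year, temperature)."""
    year, temp = record
    if temp is not None:
        yield (year, temp)

def weather_reduce(year, temps):
    """Reducer: compute min, max, and average temperature for one year."""
    return year, {"min": min(temps), "max": max(temps),
                  "avg": sum(temps) / len(temps)}

records = [(1950, 22.0), (1950, None), (1950, 30.0), (1951, 25.0)]
groups = defaultdict(list)                 # stand-in for the shuffle phase
for rec in records:
    for year, temp in weather_map(rec):
        groups[year].append(temp)
stats = dict(weather_reduce(y, t) for y, t in groups.items())
print(stats[1950])
# {'min': 22.0, 'max': 30.0, 'avg': 26.0}
```

Because each year's readings reduce independently, the same pair of functions scales from this toy list to the massive sensor archives the document targets.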
Leveraging Map Reduce With Hadoop for Weather Data Analytics (iosrjce)
A NOVEL METHODOLOGY FOR TASK DISTRIBUTION IN HETEROGENEOUS RECONFIGURABLE COM... (ijesajournal)
Modern embedded systems are being modeled as Heterogeneous Reconfigurable Computing Systems (HRCS), in which reconfigurable hardware, i.e. a Field Programmable Gate Array (FPGA), and soft-core processors act as computing elements, so an efficient task distribution methodology is essential for obtaining high performance. In this paper, we present a novel task distribution methodology called the Minimum Laxity First (MLF) algorithm, which takes advantage of the runtime reconfiguration of the FPGA in order to effectively utilize the available resources. The MLF algorithm is a list-based dynamic scheduling algorithm that uses attributes of tasks as well as computing resources as the cost function for distributing the tasks of an application to the HRCS. An on-chip HRCS computing platform is configured on a Virtex 5 FPGA using Xilinx EDK. The real-time applications JPEG and OFDM transmitters are represented as task graphs, and the tasks are then distributed, statically as well as dynamically, to the HRCS platform in order to evaluate the performance of the designed task distribution model. Finally, the performance of the MLF algorithm is compared with existing static scheduling algorithms; the comparison shows that the MLF algorithm outperforms them in terms of efficient utilization of on-chip resources and also speeds up application execution.
the performance of MLF algorithm is compared with existing static scheduling algorithms. The comparison
shows that the MLF algorithm outperforms in terms of efficient utilization of resources on chip and also
speedup an application execution.
A novel methodology for task distributionijesajournal
Modern embedded systems are being modeled as Heterogeneous Reconfigurable Computing Systems
(HRCS) where Reconfigurable Hardware i.e. Field Programmable Gate Array (FPGA) and soft core
processors acts as computing elements. So, an efficient task distribution methodology is essential for
obtaining high performance in modern embedded systems. In this paper, we present a novel methodology
for task distribution called Minimum Laxity First (MLF) algorithm that takes the advantage of runtime
reconfiguration of FPGA in order to effectively utilize the available resources. The MLF algorithm is a list
based dynamic scheduling algorithm that uses attributes of tasks as well computing resources as cost
function to distribute the tasks of an application to HRCS. In this paper, an on chip HRCS computing
platform is configured on Virtex 5 FPGA using Xilinx EDK. The real time applications JPEG, OFDM
transmitters are represented as task graph and then the task are distributed, statically as well dynamically,
to the platform HRCS in order to evaluate the performance of the designed task distribution model. Finally,
the performance of MLF algorithm is compared with existing static scheduling algorithms. The comparison
shows that the MLF algorithm outperforms in terms of efficient utilization of resources on chip and also
speedup an application execution.
IOSR Journal of Computer Engineering (IOSR-JCE)
e-ISSN: 2278-0661, p-ISSN: 2278-8727, Volume 17, Issue 6, Ver. V (Nov – Dec. 2015), PP 64-75
www.iosrjournals.org
DOI: 10.9790/0661-17656475
Map-Reduce Synchronized and Comparative Queue Capacity
Scheduler in Hadoop for Extensive Data
Sukhmani Goraya¹, Vikas Khullar²
Department of Computer Science and Engineering, CT Institute of Engineering, Management and Technology, Shahpur, Jalandhar, India.
Abstract: Map-Reduce is drawing the attention of both industry and academia for the processing of big data. In this paper, we focus on a core scheduler of Hadoop, the Capacity Scheduler, which assigns tasks to resources and shares the resources of a cluster between concurrent jobs. We have performed experiments with the Map-Reduce benchmark applications TestDFSIO, Pi, and Word Count to analyze and enhance the performance of the Capacity Scheduler in terms of both execution time and capacity. We propose a method, the Synchronized and Comparative Queue Capacity Scheduler (SCQ), that achieves better resource utilization, which is reflected in enhanced performance in terms of execution time, throughput, and average IO rate.
Keywords: Map-Reduce, Capacity Scheduler, Fair Scheduler, HDFS, TestDFSIO, Word Count, Pi,
Synchronized and Comparative Queue (SCQ)
I. Introduction
The Map-Reduce model has been used successfully by various industries to perform large computations. After being strongly promoted by Google, it has also been implemented by the open source community through the Hadoop project [1], maintained by the Apache Foundation and supported by Yahoo! and even by Google itself. Since 82% of information is unstructured, it must be formatted in a manner that makes it suitable for subsequent analysis and data processing [2]. Hadoop is a platform for converting such data into a structured form, which helps in the analysis of big data. Hadoop has its origins in Apache Nutch, an open source web search engine, itself a part of the Lucene project [2][4]. Large volumes of data are being generated, and to process such data in less time, scheduling is needed. The role of the scheduler is to schedule Map and Reduce tasks so as to minimize job completion time and maximize resource utilization. The goal of Hadoop is to provide efficient, high-performance processing of big data applications [5].
A. Overview of Hadoop
Hadoop consists of two main components, Hadoop Map-Reduce and HDFS [6][7].
Fig. 1.1: Components of Hadoop (Map-Reduce and HDFS)
Figure 1.1 shows the components of Hadoop, described as follows:
HDFS: The Hadoop Distributed File System (HDFS), modeled on the Google File System (GFS) [8], is the storage system of Hadoop. For processing data, Hadoop uses the Map-Reduce engine, which runs on top of HDFS as shown in Figure 1.1. The engine follows a master-slave architecture: there is one master node and many slave nodes. The master node runs the JobTracker and the NameNode, whereas each slave node runs a TaskTracker and a DataNode. The JobTracker keeps track of all the TaskTrackers, and jobs are automatically parallelized across them. The NameNode holds information about all the DataNodes. Task scheduling [9], one of the key technologies of the Hadoop platform, controls the order in which tasks run and the allocation of computing resources, and is directly related to the overall performance of the Hadoop platform and to system resource utilization. DataNodes also perform block creation, deletion, and replication upon instruction
from the Name Node [10]. Existing research [11], [12] shows that the performance of Map-Reduce applications
depends on the cluster configuration, job configuration settings and input data.
Map-Reduce: Map-Reduce was developed by Google in 2004. In Hadoop, applications consist of map and reduce tasks that operate on data stored in the HDFS file system [3].
Fig. 1.2: Map-Reduce Framework
In Figure 1.2, the job is divided by Hadoop's Map phase into fixed-size pieces called input splits. The Reduce phase collects the intermediate output and reduces it into a single output. The main component of Map-Reduce is the job scheduler, which schedules Map and Reduce tasks so that the job completes in minimum time. In Hadoop version 1, the scheduler assumes that resources such as memory, CPU, and network are homogeneous, whereas in Hadoop version 2 (Hadoop YARN), resource-aware job schedulers for Map-Reduce have been introduced. The paper is organized as follows: Section II describes related research work; Section III presents the experimental setup and work done; Section IV gives results and discussions; Section V concludes.
II. Related Research Work
The JobTracker plays an important role in deciding where a job will be executed in the cluster. Hadoop YARN provides resource management and a platform to deliver consistent security and operations; it was created by separating the processing and resource-management capabilities of Map-Reduce. Resource management can help in predicting the behavior of resources on the cluster.
Hadoop has three core schedulers: the FIFO Scheduler, the Fair Scheduler, and the Capacity Scheduler.
B. FIFO Scheduler
In the earliest Hadoop Map-Reduce computing architecture, the typical job type was the large batch job submitted by a single user, so early Hadoop used a first-in, first-out rule in its scheduling algorithm [13].
FIFO is the default Hadoop scheduler [14]. Scheduling is based on arrival time; heterogeneity in the system is ignored. Experience from deploying Hadoop in large systems shows that basic scheduling algorithms like FIFO can cause severe performance degradation, particularly in systems that share data among multiple users [15]. The scheduler has a single queue and jobs are executed sequentially. FIFO is designed for only a single type of job, so when multiple users run jobs on the cluster, performance degrades.
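The arrival-time-only policy described above can be sketched as a toy model (the job names are invented; this is not Hadoop code):

```python
from collections import deque

class FIFOScheduler:
    """Single-queue scheduler: jobs run strictly in arrival order."""
    def __init__(self):
        self.queue = deque()

    def submit(self, job):
        self.queue.append(job)

    def next_job(self):
        # Heterogeneity (job size, user) is ignored; earliest arrival wins.
        return self.queue.popleft() if self.queue else None

sched = FIFOScheduler()
for job in ["big-batch-job", "small-job-A", "small-job-B"]:
    sched.submit(job)
print(sched.next_job())  # big-batch-job: the small jobs must wait behind it
```

The last line shows the degradation the text describes: a long batch job at the head of the queue delays every small job behind it.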
C. Fair Scheduler
Fair Sharing is a Hadoop scheduler introduced to address the shortcomings of FIFO when dealing with small jobs and user heterogeneity [16]. The scheduler allocates resources to pools and is responsible for defining the pool for each user. Its aim is fair sharing of resources among users: resources are allocated to jobs such that every job gets, on average, an equal share of resources over time [17]. It lets short jobs complete within a reasonable time while not starving long jobs [18]. The objective of the fair scheduling algorithm is an equal distribution of compute resources among the users/jobs in the system [1]. Fair scheduling gives better performance for large clusters in comparison to FIFO scheduling, and it can limit the number of concurrently running tasks per pool [19]. Tao et al. introduced an improved fair scheduling algorithm that takes job characteristics and data locality into consideration, decreasing both data transfer and the execution time of jobs. However, this algorithm does not consider the job weight of each node, which is an important disadvantage.
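The equal-share idea can be sketched as a small simulation (an illustrative max-min-style allocation, not the actual Fair Scheduler implementation; the pool names and demands are invented):

```python
def fair_share(total_slots, pools):
    """Split cluster slots evenly across pools; each pool takes at most
    its demand, and leftover slots go back to still-hungry pools."""
    alloc = {p: 0 for p in pools}
    remaining = total_slots
    active = {p for p, demand in pools.items() if demand > 0}
    while remaining > 0 and active:
        share = max(remaining // len(active), 1)
        for p in sorted(active):
            take = min(share, pools[p] - alloc[p], remaining)
            alloc[p] += take
            remaining -= take
            if alloc[p] == pools[p]:     # demand satisfied: drop out
                active.discard(p)
            if remaining == 0:
                break
    return alloc

# A short job needing 2 slots completes fully; long jobs split the rest
print(fair_share(12, {"short": 2, "long-1": 10, "long-2": 10}))
```

The short job gets its full demand quickly instead of starving behind the long jobs, which is exactly the behavior the text attributes to fair scheduling.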
D. Capacity Scheduler
The Capacity Scheduler [20], originally developed at Yahoo, addresses a usage scenario where the number of users is large and there is a need to ensure a fair allocation of computation resources among them. The fair and capacity scheduling algorithms are very similar, but the Capacity Scheduler uses queues instead of job pools, with each queue assigned to an organization. Resources are allocated to the queues, and there is FIFO scheduling within each queue. The Capacity Scheduler places jobs into various queues, or a hierarchy of queues, according to configured conditions, and allocates a bounded share of system capacity to every queue. It supports hierarchies of queues in which resources are shared among the sub-queues, and each user is limited to some percentage of the resources [25]. If a queue is under heavy load, the scheduler seeks unallocated resources and distributes these redundant resources equally among the queue's jobs [4][22]. Once a job placed in a queue has completed its running tasks, the resources are given back to the main queue. The scheduler also allows priority-based scheduling of jobs within an organization's queue [21][23]. Queues are not created automatically in Hadoop; the user needs knowledge of the system's configuration [26], after which the capacity of each queue can be set, defining a bound on the elasticity of the queue. A single job does not use more resources than its queue provides, but the capacity of a queue can be increased through the elasticity property: queues are monitored by the resource manager and, if needed, are assigned free resources beyond their configured capacity. Being the most advanced of the three schedulers is also an important drawback of the capacity algorithm [24], and the Capacity Scheduler imposes additional limits on whether it can accept a job for a queue.
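The elasticity behavior described above, where a queue borrows unused capacity from an idle sibling, can be sketched as a toy allocation model (the queue names, capacity fractions, and demands are invented; this is not the actual scheduler logic, which also enforces maximum-capacity limits):

```python
def allocate(total_slots, queues, demand):
    """Give each queue its guaranteed share first, then lend any
    unused capacity to still-hungry queues (elasticity)."""
    # Guaranteed share, limited by what the queue actually asks for
    alloc = {q: min(int(total_slots * frac), demand[q])
             for q, frac in queues.items()}
    spare = total_slots - sum(alloc.values())
    for q in queues:                      # lend leftovers to overloaded queues
        extra = min(spare, demand[q] - alloc[q])
        alloc[q] += extra
        spare -= extra
    return alloc

# 100 slots; 'prod' is guaranteed 70%, 'dev' 30%
queues = {"prod": 0.7, "dev": 0.3}
print(allocate(100, queues, {"prod": 20, "dev": 90}))
# dev grows past its 30% guarantee because prod is idle
```

When prod later needs its slots back, the real scheduler reclaims them as dev's tasks finish; the sketch only shows a single allocation round.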
The Capacity scheduler works according to the following:
1. capacity-scheduler.xml contains the existing configuration of the cluster, including the task scheduler settings; the queues are set up using this information.
2. When a job is submitted to the cluster, the scheduler checks the job-submission limit to evaluate whether the job can be accepted.
3. If the job is accepted, it is assigned to a queue.
4. The JobTracker gets the heartbeat from the TaskTracker and starts the processing.
5. If required, the hierarchy of queues is created and distributed among the various TaskTrackers.
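The queue layout referenced in step 1 lives in capacity-scheduler.xml. A minimal sketch of such a file is shown below; the queue names and percentage values are illustrative only, not the settings used in our experiments:

```xml
<configuration>
  <!-- Declare two sub-queues under root (names are illustrative). -->
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>alpha,beta</value>
  </property>
  <!-- Guaranteed share of cluster capacity, in percent. -->
  <property>
    <name>yarn.scheduler.capacity.root.alpha.capacity</name>
    <value>70</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.beta.capacity</name>
    <value>30</value>
  </property>
  <!-- Elasticity bound: how far a queue may grow beyond its share. -->
  <property>
    <name>yarn.scheduler.capacity.root.alpha.maximum-capacity</name>
    <value>100</value>
  </property>
</configuration>
```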
III. Experimental Setup
The performance of Hadoop depends on how well the available resources are utilized. We studied the Capacity Scheduler, which shares computing resources among the queues, and performed various experiments on the Map-Reduce benchmark applications TestDFSIO, Pi and WordCount. The purpose of this work is to tune the task handling of the scheduler and obtain optimized performance from it.
E. Approach
With the advancement of Hadoop from version 1 to version 2, performance increased in terms of execution time and the capacity of tasks performed. Hadoop version 1 used the fair scheduler, while the latest version of Hadoop uses the capacity scheduler approach. Although the capacity scheduler has enhanced performance, its limitation is that it builds on the approach of the fair scheduler, which can be improved further. There is scope for further increasing the performance of the capacity scheduler by enhancing it with the concept of pipelining: once the tasks are performed in a synchronized way, the performance of the capacity scheduler increases further in terms of both execution time and capacity.
F. Methodology
The methodology used for setting up the system is shown in Figure 2. First, we need to install Java and create a group for Hadoop users; SSH certificates are used in Hadoop for security purposes. Second, we need to install Hadoop. After configuring Hadoop, we add the dynamic scheduler settings to yarn-site.xml.
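For reference, selecting the Capacity Scheduler itself is done through a standard YARN property in yarn-site.xml; the value shown is the stock CapacityScheduler class:

```xml
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
```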
4. Map-Reduce Synchronized and Comparative Queue Capacity Scheduler in Hadoop for Extensive…
DOI: 10.9790/0661-17656475 www.iosrjournals.org 67 | Page
Fig 2: Working of default scheduler
Fig 3: Working of our Proposed Method SCQ
Figures 2 and 3 show the working of the default scheduler and of the scheduler with our proposed method SCQ.
G. Working of our Proposed Method SCQ
1. Once the ResourceManager starts, everything happens in the form of events.
2. The capacity scheduler registers itself for these events and acts on them. When a node is added, ResourceTrackerService registers it with the node manager.
3. When an application or job is added, it is submitted to a queue. We have added the capacity scheduler to the file yarn-site.xml for processing Hadoop files.
4. Once the scheduler starts, the run method of the capacity scheduler is invoked. The container component of the ResourceManager takes responsibility for all the resources, such as disk, CPU and memory. The job is then given to an existing queue and solved in a hierarchical way.
5. Now the queues are created by the method addNewCSQueues.
private void addNewCSQueues(Map<String, CSQueue> queues, Map<String, CSQueue> newQueues) {
    for (Map.Entry<String, CSQueue> x : newQueues.entrySet()) {
        String queueName = x.getKey();
        CSQueue queue = x.getValue();
        if (!queues.containsKey(queueName)) {
            queues.put(queueName, queue);
        }
    }
}
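The behaviour of this merge can be illustrated with a minimal, self-contained sketch; DemoQueue is a hypothetical stand-in for YARN's CSQueue, used here only to show that already-known queue names are skipped and new ones are added:

```java
import java.util.HashMap;
import java.util.Map;

// DemoQueue is a hypothetical stand-in for YARN's CSQueue, for illustration only.
class DemoQueue {
    final String path;
    DemoQueue(String path) { this.path = path; }
}

public class AddQueuesDemo {
    // Mirrors the merge logic above: copy only queues whose names are not already present.
    static void addNewQueues(Map<String, DemoQueue> queues, Map<String, DemoQueue> newQueues) {
        for (Map.Entry<String, DemoQueue> x : newQueues.entrySet()) {
            if (!queues.containsKey(x.getKey())) {
                queues.put(x.getKey(), x.getValue());
            }
        }
    }

    public static void main(String[] args) {
        Map<String, DemoQueue> queues = new HashMap<>();
        queues.put("root", new DemoQueue("root"));

        Map<String, DemoQueue> incoming = new HashMap<>();
        incoming.put("root", new DemoQueue("root"));        // already present: ignored
        incoming.put("batch", new DemoQueue("root.batch")); // new: added

        addNewQueues(queues, incoming);
        System.out.println(queues.size()); // "root" and "batch"
    }
}
```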
6. Comparison is done between the queues, and the capacity allocated to the queues is calculated using the comparator queueComparator.
static final Comparator<CSQueue> queueComparator = new Comparator<CSQueue>() {
    public int compare(CSQueue x1, CSQueue x2) {
        if (x1.getUsedCapacity() < x2.getUsedCapacity()) {
            return -1;
        } else if (x1.getUsedCapacity() > x2.getUsedCapacity()) {
            return 1;
        }
        return x1.getQueuePath().compareTo(x2.getQueuePath());
    }
};
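The effect of this ordering can be shown with a minimal stand-in for CSQueue (the class Q below is hypothetical, for illustration only): the least-used queue comes first, and queues with equal used capacity are ordered by queue path.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Q is a hypothetical stand-in for CSQueue, for illustration only.
class Q {
    final String path;
    final float used;
    Q(String path, float used) { this.path = path; this.used = used; }
}

public class QueueOrderDemo {
    // Same ordering rule as queueComparator above:
    // least used capacity first, queue path as the tie-breaker.
    static final Comparator<Q> byUsedCapacity = new Comparator<Q>() {
        public int compare(Q a, Q b) {
            int c = Float.compare(a.used, b.used);
            return c != 0 ? c : a.path.compareTo(b.path);
        }
    };

    public static void main(String[] args) {
        List<Q> queues = new ArrayList<>();
        queues.add(new Q("root.b", 0.7f));
        queues.add(new Q("root.a", 0.2f));
        queues.add(new Q("root.c", 0.2f));
        queues.sort(byUsedCapacity);

        StringBuilder order = new StringBuilder();
        for (Q q : queues) order.append(q.path).append(' ');
        System.out.println(order.toString().trim()); // least-loaded first, ties by path
    }
}
```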
7. The synchronization between the queues is done using the method SynchroniseCSQueue. While synchronizing, the queues obtain information about the resources and share them accordingly.
We have performed various experiments using the benchmark examples Pi, WordCount and TestDFSIO.
H. System configuration:
1) Configuration of Single-Node Machine
Table 1.1 Configuration of Machine

Hadoop Version   : Hadoop 2.6.0
File System      : Hadoop File System (HDFS)
Bench Program    : Pi, WordCount and TestDFSIO
Operating System : Linux Mint 17.02
Clustered node   : Single node
Processor        : Intel Core i5
CPU              : 4 cores
RAM              : 4 GB

Table 1.1 shows the single-node machine configuration used for examining the performance.
2) Configuration of Cluster Machines
Table 1.2 Configuration of Cluster Machines

Hadoop Version   : Hadoop 2.6.0
File System      : Hadoop File System (HDFS)
Bench Program    : Pi, WordCount and TestDFSIO
Operating System : Linux Mint 17.02
Clustered node   : 4 nodes
Processor        : Intel Core i5
CPU              : 4 cores for each node
RAM              : 4 GB for each node

Table 1.2 shows the configuration of the 4-node cluster used for examining the performance.
IV. Results and Discussions
We have taken two cases for the experiments, using the benchmark examples Pi, WordCount and TestDFSIO. In the case of TestDFSIO, we combined the results of the read and write operations and calculated the throughput, average IO rate and execution time.
I. In the first experiment, using the single-machine configuration from Table 1.1, a comparison is made between the default configuration of Hadoop version 2.6.0 and our proposed method SCQ on a single node running Hadoop version 2.6.0.
For this experiment we created a single-node cluster with Hadoop version 2.6.0 on the Linux Mint 17.02 operating system. Experiments are performed using an Intel Core i5 processor.
1) PI
TABLE 2(a) Single-node execution time of Pi for default Hadoop version 2.6.0 and our proposed method SCQ on Hadoop version 2.6.0

No. of Maps | No. of samples per Map | Execution time (sec), default Hadoop 2.6.0 | Execution time (sec), SCQ on Hadoop 2.6.0 | Estimated value of Pi (both cases)
5  | 5  | 19.965 | 17.086 | 3.68
10 | 10 | 21.157 | 21.08  | 3.2
20 | 20 | 37.27  | 33.211 | 3.17
30 | 30 | 48.321 | 43.252 | 3.137777778
40 | 40 | 59.346 | 55.277 | 3.15
40 | 45 | 95.306 | 79.032 | 3.148888889
50 | 60 | 70.293 | 66.347 | 3.141333333
60 | 70 | 81.905 | 79.816 | 3.14
Table 2(a) shows the execution time of Pi for the default Hadoop version 2.6.0 and for our proposed method SCQ on Hadoop version 2.6.0. "Default Hadoop" denotes the system's default configuration settings, and "Proposed Method" denotes the SCQ system that we have created.
We verified our results while performing the above experiment by estimating the value of Pi: the estimated value in both cases, i.e. default Hadoop version 2.6.0 and our proposed method SCQ on Hadoop version 2.6.0, was the same, but the execution time of our proposed method was lower.
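The Pi benchmark estimates π by a (quasi-)Monte Carlo method: sample points in the unit square and count those that fall inside the inscribed quarter circle. A minimal single-process sketch of the idea follows; this is not the MapReduce implementation itself, which distributes a quasi-random (Halton) point sequence across map tasks:

```java
import java.util.Random;

public class PiEstimate {
    // Estimate pi by sampling points in the unit square and counting those
    // that fall inside the inscribed quarter circle: pi ~= 4 * inside / total.
    static double estimate(long samples, long seed) {
        Random rnd = new Random(seed);
        long inside = 0;
        for (long i = 0; i < samples; i++) {
            double x = rnd.nextDouble();
            double y = rnd.nextDouble();
            if (x * x + y * y <= 1.0) inside++;
        }
        return 4.0 * inside / samples;
    }

    public static void main(String[] args) {
        double est = estimate(1_000_000, 42L);
        // With one million samples the estimate should land within ~0.01 of pi.
        System.out.println(Math.abs(est - Math.PI) < 0.01 ? "ok" : "off");
    }
}
```

This also explains why the estimated value in Table 2(a) converges toward 3.14 as the number of maps and samples grows.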
2) WordCount
WordCount has also been executed with different data sets, i.e. 12.2 MB, 36.5 MB, 300 MB, 500 MB, 938 MB, 1.5 GB and 2 GB.
TABLE 2(b) Single-node execution time of WordCount for default Hadoop version 2.6.0 and our proposed method SCQ on Hadoop 2.6.0

Data Size | Execution time (HH:MM:SS), default Hadoop 2.6.0 | Execution time (HH:MM:SS), SCQ on Hadoop 2.6.0
12.2 MB | 0:00:20 | 0:00:18
36.5 MB | 0:00:24 | 0:00:22
300 MB  | 0:00:42 | 0:00:40
500 MB  | 0:00:47 | 0:00:45
938 MB  | 0:01:38 | 0:01:28
1500 MB | 0:02:15 | 0:01:45
2 GB    | 0:03:43 | 0:02:26
Table 2(b) shows that the execution time of our proposed method SCQ on Hadoop version 2.6.0 is better than that of the default settings of Hadoop version 2.6.0.
3) TestDFSIO
TestDFSIO is used to measure performance of system. The commands read and write file in HDFS that
is useful in measuring system performance. A majority of Map-Reduce workloads are IO bound more than
compute and hence TestDFSIO can provide an accurate initial picture of such scenarios.
TABLE 2(c) Experimental Results of Throughput

Number of files | Total MBs processed | Throughput (MB/sec), default Hadoop 2.6.0 | Throughput (MB/sec), SCQ on Hadoop 2.6.0
2 | 2000 | 60.60363406 | 77.39978595
3 | 3000 | 44.72412292 | 47.87551606
4 | 4000 | 27.05919615 | 33.31398167
5 | 5000 | 20.17321106 | 25.96059815
Table 2(c) shows the throughput of default Hadoop version 2.6.0 and of our proposed method SCQ on Hadoop version 2.6.0.
Fig. 4(a): Throughput of default Hadoop version 2.6.0 and our proposed method SCQ on Hadoop 2.6.0
From Fig. 4(a) we can see that the throughput of our proposed method SCQ on Hadoop version 2.6.0 is better than that of default Hadoop version 2.6.0. As the size of the data increases, the number of bytes to be processed grows; the graph shows that more bytes are processed with our proposed method SCQ.
TABLE 2(d) Experimental Results of Average IO Rate

Number of files | Total MBs processed | Average IO rate (MB/sec), default Hadoop 2.6.0 | Average IO rate (MB/sec), SCQ on Hadoop 2.6.0
2 | 2000 | 60.60419083 | 77.42837143
3 | 3000 | 44.73816299 | 47.88214874
4 | 4000 | 27.0789423  | 33.46420876
5 | 5000 | 20.18930435 | 25.99404621
Table 2(d) shows the Average IO rate of default Hadoop version 2.6.0 and our Proposed Method SCQ
on Hadoop version 2.6.0.
Fig. 4(b): Average IO rate of default Hadoop version 2.6.0 and our proposed method SCQ on Hadoop 2.6.0
Figure 4(b) shows that as the size of the data increases, the average IO rate decreases. As the load increases, our proposed method SCQ shows better results than the default configuration of Hadoop.
TABLE 2(e) Experimental Results of Execution Time

Number of files | Total MBs processed | Test exec time (sec), default Hadoop 2.6.0 | Test exec time (sec), SCQ on Hadoop 2.6.0
2 | 2000 | 100.409 | 83.508
3 | 3000 | 132.31  | 118.066
4 | 4000 | 198.175 | 164.038
5 | 5000 | 289.096 | 203.145
Table 2(e) shows the test execution time (sec) for processing the total number of bytes with default Hadoop version 2.6.0 and with our proposed method SCQ on Hadoop version 2.6.0.
Fig. 4(c) Execution Time of default Hadoop version 2.6.0 and our proposed method SCQ on Hadoop 2.6.0
Figure 4(c) shows that as the size of the data increases, the execution time for processing also increases. From the graph we can see that our proposed method SCQ on Hadoop version 2.6.0 takes less time than default Hadoop version 2.6.0.
TABLE 2(f) CPU performance of default Hadoop version 2.6.0 and our proposed method SCQ on Hadoop 2.6.0 for the write operation

No. of files | Total MBs processed | CPU performance, default Hadoop 2.6.0 | CPU performance, SCQ on Hadoop 2.6.0
2 | 2000 | 25% | 28%
3 | 3000 | 24% | 30%
4 | 4000 | 28% | 50%
5 | 5000 | 38% | 60%
TABLE 2(g) CPU performance of default Hadoop version 2.6.0 and our proposed method SCQ on Hadoop 2.6.0 for the read operation

No. of files | Total MBs processed | CPU performance, default Hadoop 2.6.0 | CPU performance, SCQ on Hadoop 2.6.0
2 | 2000 | 6% | 10%
3 | 3000 | 6% | 10%
4 | 4000 | 9% | 14%
5 | 5000 | 8% | 14%
Tables 2(f) and 2(g) show the CPU performance of default Hadoop version 2.6.0 and of our proposed method SCQ on Hadoop version 2.6.0. Our proposed method SCQ on Hadoop version 2.6.0 shows better resource utilization than the default configuration of Hadoop version 2.6.0.
J. In the second experiment, using the clustered-machine configuration from Table 1.2, a comparison has also been made on a cluster of 4 nodes between Hadoop 2.6.0 with our proposed method SCQ and Hadoop 2.6.0 with the default configuration.
Table 1.2 shows that in the second experiment we used a cluster of 4 nodes with 4 GB of RAM for each node. The cluster uses Intel Core i5 processors.
1) PI
TABLE 3(a) Multi-node execution time of Pi for default Hadoop version 2.6.0 and our proposed method SCQ on Hadoop 2.6.0

No. of Maps | No. of samples per Map | Execution time (sec), default Hadoop 2.6.0 | Execution time (sec), SCQ on Hadoop 2.6.0 | Estimated value of Pi (both cases)
5  | 5  | 20.74  | 19.499 | 3.68
10 | 10 | 23.289 | 20.168 | 3.2
20 | 20 | 36.368 | 32.301 | 3.17
30 | 30 | 51.365 | 44.265 | 3.13777778
40 | 40 | 64.484 | 55.347 | 3.15
40 | 45 | 63.488 | 56.349 | 3.14888889
50 | 60 | 77.53  | 67.526 | 3.14133333
60 | 70 | 94.01  | 79.428 | 3.14
Table 3(a) shows the multi-node execution time of Pi for the default configuration of Hadoop (2.6.0) and for our proposed method SCQ on Hadoop (2.6.0). The execution time of our proposed method SCQ is lower. In this case also we verified our results, as the estimated values in both cases are the same.
2) WordCount
TABLE 3(b) Multi-node execution time of WordCount for default Hadoop version 2.6.0 and our proposed method SCQ on Hadoop 2.6.0

Data size | Execution time (HH:MM:SS), default Hadoop 2.6.0 | Execution time (HH:MM:SS), SCQ on Hadoop 2.6.0
12.2 MB | 0:00:20 | 0:00:19
36.5 MB | 0:00:23 | 0:00:22
300 MB  | 0:00:47 | 0:00:41
500 MB  | 0:00:57 | 0:00:52
938 MB  | 0:01:56 | 0:01:46
1500 MB | 0:02:16 | 0:02:09
2 GB    | 0:03:00 | 0:02:13
Table 3(b) shows the execution time of default Hadoop (2.6.0) and our proposed method SCQ on
Hadoop (2.6.0). Hadoop (2.6.0) with our proposed method SCQ gives better results.
3) TestDFSIO
TABLE 3(c) Experimental Results of Throughput on the Cluster

Number of files | Total MBs processed | Throughput (MB/sec), default Hadoop 2.6.0 | Throughput (MB/sec), SCQ on Hadoop 2.6.0
2 | 2000 | 33.53691306 | 37.17452264
3 | 3000 | 13.35859242 | 15.60451619
4 | 4000 | 14.30819768 | 16.76003192
5 | 5000 | 12.89387136 | 15.05275675
Fig. 5(a) Throughput of default Hadoop (2.6.0) and our proposed method on Hadoop (2.6.0) on cluster
Figure 5(a) and Table 3(c) show that the throughput of Hadoop version 2.6.0 with our proposed method SCQ gives better results even on the cluster, i.e. it processes more bytes than the default configuration of Hadoop version 2.6.0.
TABLE 3(d) Experimental Results of Average IO Rate

Number of files | Total MBs processed | Average IO rate (MB/sec), default Hadoop 2.6.0 | Average IO rate (MB/sec), SCQ on Hadoop 2.6.0
2 | 2000 | 33.53774309 | 37.17828274
3 | 3000 | 13.36210346 | 15.60545063
4 | 4000 | 14.309093   | 16.76963973
5 | 5000 | 12.91100812 | 15.10412097
Fig. 5(b) Average IO rate of default Hadoop (2.6.0) and our proposed method on Hadoop (2.6.0) on cluster
TABLE 3(e) Experimental Results of Execution Time

Number of files | Total MBs processed | Test exec time (sec), default Hadoop 2.6.0 | Test exec time (sec), SCQ on Hadoop 2.6.0
2 | 2000 | 251.586 | 242.818
3 | 3000 | 411.852 | 385.603
4 | 4000 | 516.319 | 465.723
5 | 5000 | 650.049 | 577.913
Fig. 5(c) Execution Time of default Hadoop version 2.6.0 and our proposed method SCQ on Hadoop 2.6.0 on
cluster
Similarly, Table 3(d) and Figure 5(b) show that the average IO rate with our proposed method SCQ on Hadoop (2.6.0) gives better results on the cluster as well. Table 3(e) and Figure 5(c) also show that the execution time taken by Hadoop (2.6.0) with our proposed method SCQ is less than that of default Hadoop 2.6.0 on the cluster.
The execution time on the cluster is higher than on the single node of Hadoop because the cluster performs data replication, due to which the write operation takes more time.
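The replication cost mentioned above comes from the HDFS replication factor, set in hdfs-site.xml; assuming the stock default of 3, every block written on the cluster is copied to three DataNodes, while a single-node setup typically runs with a factor of 1:

```xml
<property>
  <name>dfs.replication</name>
  <!-- Each block is written to this many DataNodes, so cluster
       writes cost more than single-node writes. -->
  <value>3</value>
</property>
```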
TABLE 3(f) CPU performance of default Hadoop version 2.6.0 and our proposed method SCQ on Hadoop 2.6.0 for the write operation

No. of files | Total MBs processed | CPU performance, default Hadoop 2.6.0 | CPU performance, SCQ on Hadoop 2.6.0
2 | 2000 | 26% | 30%
3 | 3000 | 35% | 48%
4 | 4000 | 35% | 42%
5 | 5000 | 44% | 50%
TABLE 3(g) CPU performance of default Hadoop version 2.6.0 and our proposed method SCQ on Hadoop 2.6.0 for the read operation

No. of files | Total MBs processed | CPU performance, default Hadoop 2.6.0 | CPU performance, SCQ on Hadoop 2.6.0
2 | 2000 | 22% | 26%
3 | 3000 | 20% | 24%
4 | 4000 | 20% | 26%
5 | 5000 | 26% | 30%
Tables 3(f) and 3(g) show the CPU performance of the cluster under default Hadoop (2.6.0) and under our proposed method SCQ on Hadoop (2.6.0). Here too we can see that the performance of the cluster with our proposed method SCQ on Hadoop (2.6.0) is better.
V. Conclusion
Our proposed method is implemented on one of the core schedulers of Hadoop, the Capacity Scheduler. In this paper we have performed various experiments using the benchmark examples Pi, WordCount and TestDFSIO. First, analysis was done on a single node, comparing the default configuration of Hadoop (2.6.0) with our proposed method SCQ on the same version. Second, analysis was done on a cluster of 4 nodes, comparing Hadoop (2.6.0) with our proposed method SCQ against Hadoop (2.6.0) with the default configuration. We also verified our results using the benchmark example Pi. Analysis of CPU performance further shows that resource utilization is better in the case of our proposed method.
From the above two comparisons we can see that our proposed SCQ on Hadoop version 2.6.0 shows better results than the default configuration of Hadoop version 2.6.0.
References
[1] The Apache Hadoop Project, http://www.hadoop.org.
[2] Inukollu, V. N., Arsi, S., & Ravuri, S. R. (2014). "Security Issues Associated with Big Data in Cloud Computing". International Journal of Network Security & Its Applications (IJNSA), 6(3).
[3] T. White, Hadoop: The Definitive Guide. O’Reilly Media, Inc., 2009.
[4] Umesh V. Nikam, Anup W. Burange, Abhishek A. Gulhane, “Big Data and HADOOP: A Big Game Changer”, International
Journal of Advance Research in Computer Science and Management Studies, Volume 1, Issue 7, ISSN: 2321-7782, DEC 2013.
[5] White, T., 2012, “Hadoop: The Definitive Guide”, ed. Third, Tokyo: Yahoo press.
[6] Rasooli, Aysan, and Douglas G. Down. "A Hybrid Scheduling Approach For Scalable Heterogeneous Hadoop Systems." In High
Performance Computing, Networking, Storage and Analysis (SCC), 2012 SC Companion: pp. 1284-1291. IEEE, 2012.
[7] Kurazumi, Shiori, Tomoaki Tsumura, Shoichi Saito, and Hiroshi Matsuo. "Dynamic Processing Slots Scheduling for I/O Intensive
Jobs of Hadoop Map-Reduce." In ICNC, pp. 288-292. 2012.
[8] S. Ghemawat, H. Gobioff, and S.-T. Leung, “The google file system,” in 19th ACM Symposium on Operating Systems Principles,
Lake George, NY, Oct. 2003.
[9] Jilan Chen, Dan Wang and Wenbing Zhao,” A Task Scheduling Algorithm for Hadoop Platform” in Journal of Computers , vol. 8,
no. 4, April 2013.
[10] Shvachko, Konstantin, Hairong Kuang, Sanjay Radia, and Robert Chansler. "The hadoop distributed file system." In Mass Storage
Systems and Technologies (MSST), 2010 IEEE 26th Symposium on, pp. 1-10. IEEE, 2010.
[11] H. Herodotou and S. Babu, “Profiling, what-if analysis, and cost-based optimization of Map-Reduce programs,” VLDB Endowment
(PVLDB), vol. 4, no. 11, pp. 1111–1122, 2011.
[12] G. Wang, A. Butt., P. Pandey, and K. Gupta, “A simulation approach to evaluating design decisions in Map-Reduce setups,” in
MASCOTS ’09, London, UK, Sep. 2009, pp. 1–11.
[13] M. Isard, M. Budiu, Y. Yu, “Distributed Data-Parallel Programs from Sequential Building Blocks,” Proceedings of the 2nd ACM
SIGOPS European Conference on Computer Systems, ACM, 59-72. 2007.
[14] J. Dean and S. Ghemawat, “Map-Reduce: simplified data processing on large clusters,” Communications of the ACM, vol. 51, pp.
107–113, January 2008.
[15] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, and M. Zaharia,
“A view of Cloud computing,” Communications of the ACM, vol. 53, no. 4, pp. 50–58, April 2010. [Online]. Available:
http://doi.acm.org/10.1145/1721654.1721672
[16] M. Zaharia, D. Borthakur, J. S. Sarma, K. Elmeleegy, S. Shenker, and I. Stoica, “Delay scheduling: a simple technique for
achieving locality and fairness in cluster scheduling,” in Proceedings of the 5th European conference on Computer systems, Paris,
France, 2010, pp. 265–278.
[17] Hadoop’s Fair Scheduler. https://hadoop.apache.org/docs/r1.2.1/fair_scheduler. [As accessed on 9 Feb. 2015].
[18] B.P Andrews and A. Binu,“Survey on Job Schedulers in Hadoop Cluster”, IOSR Journal of Computer Engineering, Vol.15, NO. 1,
Sep. - Oct. 2013, pp. 46-50.
[19] Y. Tao Y, Q. Zhang, L. Shi and P. Chen. “Job Scheduling Optimization for Multi-user Map-Reduce Clusters”, In: The fourth
International symposium on parallel Architectures, algorithms and programming. IEEE 2011. p. 213–17.
[20] Y. Chen, S. Alspaugh, and R. H. Katz, "Interactive Analytical Processing in Big Data Systems: A Cross-Industry Study of Map-Reduce Workloads", Proc. VLDB Endowment, vol. 5, no. 12, Aug. 2012.
[21] N. Tiwari, "Scheduling and Energy Efficiency Improvement Techniques for Hadoop Map-Reduce: State of Art and Directions for Future Research", Doctoral dissertation.
[22] Umesh V. Nikam, Anup W. Burange, Abhishek A. Gulhane, “Big Data and HADOOP: A Big Game Changer”, International
Journal of Advance Research in Computer Science and Management Studies, Volume 1, Issue 7, ISSN: 2321-7782, DEC 2013.
[23] N. Tiwari, "Scheduling and Energy Efficiency Improvement Techniques for Hadoop Map-Reduce: State of Art and Directions for Future Research", Doctoral dissertation.
[24] http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html.
[25] Sukhmani Goraya, Vikas Khullar, “Comparative Analysis of Core Schedulers of Hadoop”, Multi-Track Conference on Sciences,
ISBN No.978-81-929077-2-7, Volume 1.
[26] Sukhmani Goraya, Vikas Khullar, "Enhancing Dynamic Capacity Scheduler for Data Intensive Jobs", International Journal of Computer Applications (0975-8887), Volume 121, No. 12, July 2015.