Intensive use of reconfigurable hardware is expected in future embedded systems, which means the
system must decide which tasks are best suited to hardware execution. To make efficient use of the
FPGA, it is convenient to choose a device that supports hardware multitasking, implemented through
partial dynamic reconfiguration. One of the challenges for hardware multitasking in embedded
systems is the online management of the single reconfiguration port available on current FPGA devices.
This paper presents online reconfiguration scheduling strategies that assign the reconfiguration
interface using different criteria: workload distribution or task deadline. The presented online
scheduling strategies make fast, efficient decisions based on the information available at each moment.
Experiments were conducted to analyze the performance and suitability of these reconfiguration strategies.
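The deadline-driven port assignment described above can be sketched as a small simulation. The task model (name, arrival, reconfiguration time, deadline) and the earliest-deadline tie-breaking below are assumptions for illustration, not the paper's exact strategy.

```python
import heapq

def schedule_reconfigurations(tasks):
    """Serialize reconfigurations on a single port, earliest deadline first.

    tasks: list of (name, arrival, reconfig_time, deadline) tuples.
    Returns (name, start, finish) tuples in the order the port serves them.
    Assumed model: one reconfiguration at a time; among tasks waiting for
    the port, the one with the smallest deadline goes first.
    """
    events = sorted(tasks, key=lambda t: t[1])   # by arrival time
    ready, schedule = [], []
    clock, i = 0, 0
    while i < len(events) or ready:
        # admit every task that has arrived by `clock`
        while i < len(events) and events[i][1] <= clock:
            name, _arr, rt, dl = events[i]
            heapq.heappush(ready, (dl, name, rt))
            i += 1
        if not ready:                     # port idle: jump to next arrival
            clock = events[i][1]
            continue
        dl, name, rt = heapq.heappop(ready)
        schedule.append((name, clock, clock + rt))
        clock += rt
    return schedule
```

With three tasks where "B" has the tightest deadline, the port serves "A" (only one present at time 0), then "B" before "C".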
A NOVEL METHODOLOGY FOR TASK DISTRIBUTION IN HETEROGENEOUS RECONFIGURABLE COM...
Modern embedded systems are increasingly modeled as Heterogeneous Reconfigurable Computing Systems
(HRCS), in which reconfigurable hardware, i.e. a Field Programmable Gate Array (FPGA), and soft-core
processors act as the computing elements. An efficient task distribution methodology is therefore
essential for obtaining high performance in modern embedded systems. In this paper, we present a novel
methodology for task distribution called the Minimum Laxity First (MLF) algorithm, which takes
advantage of the runtime reconfiguration of the FPGA to utilize the available resources effectively.
The MLF algorithm is a list-based dynamic scheduling algorithm that uses attributes of both tasks and
computing resources as a cost function to distribute the tasks of an application over the HRCS. In this
paper, an on-chip HRCS computing platform is configured on a Virtex-5 FPGA using Xilinx EDK. The
real-time applications JPEG and OFDM transmitters are represented as task graphs, and the tasks are
then distributed, both statically and dynamically, to the HRCS platform in order to evaluate the
performance of the designed task distribution model. Finally, the performance of the MLF algorithm is
compared with existing static scheduling algorithms. The comparison shows that the MLF algorithm
outperforms them in terms of efficient utilization of on-chip resources and also speeds up application
execution.
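The laxity criterion behind MLF is simple to state in code. The sketch below assumes laxity = deadline − current time − remaining execution time; the dict-based task representation is illustrative, not the paper's data structure.

```python
def laxity(task, now):
    """Laxity (slack) = time left until the deadline minus the task's
    remaining execution time, all in the same time unit."""
    return task["deadline"] - now - task["remaining"]

def pick_next(ready_tasks, now):
    """Minimum Laxity First: dispatch the ready task with the least slack,
    i.e. the one that can least afford to wait."""
    return min(ready_tasks, key=lambda t: laxity(t, now))
```

For example, a task with deadline 12 and 6 units of work left (laxity 6) is dispatched before one with deadline 20 and 5 units left (laxity 15).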
Sharing of cluster resources among multiple Workflow Applications
Many computational solutions can be expressed as workflows. A cluster of processors is a shared
resource among several users, hence the need for a scheduler that deals with multi-user jobs
presented as workflows. The scheduler must find the number of processors to be allotted to each
workflow and schedule tasks on the allotted processors. In this work, a new method to find the optimal
and maximum numbers of processors that can be allotted to a workflow is proposed. Regression analysis
is used to find the best possible way to share the available processors among a suitable number of
submitted workflows. An instance of a scheduler is created for each workflow, which schedules tasks
on the allotted processors. Towards this end, a new framework to receive online submissions of
workflows, allot processors to each workflow, and schedule tasks is proposed and evaluated using a
discrete-event simulator. This space-sharing of processors among multiple workflows shows better
performance than other methods found in the literature. Because of space-sharing, an instance of a
scheduler must be used for each workflow within its allotted processors. Since the number of
processors for each workflow is known only at runtime, a static schedule cannot be used. Hence a
hybrid scheduler, which tries to combine the advantages of static and dynamic schedulers, is
proposed. The proposed framework is thus a promising solution to scheduling multiple workflows on a
cluster.
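The space-sharing idea can be illustrated with a toy allotment policy. The proportional scaling below is an assumption standing in for the paper's regression-based method.

```python
def share_processors(requests, available):
    """Space-share `available` processors among workflows.

    requests: {workflow: optimal_processor_count}. Assumed policy: if the
    optima fit, grant them; otherwise scale every request down
    proportionally, keeping at least 1 processor per admitted workflow.
    """
    total = sum(requests.values())
    if total <= available:
        return dict(requests)
    scale = available / total
    return {w: max(1, int(r * scale)) for w, r in requests.items()}
```

Each workflow's own scheduler instance then works only within its granted partition, which is why the allotment can be decided independently of task-level scheduling.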
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI) is an international journal intended for professionals and researchers in all fields of computer science and electronics. IJESI publishes research articles and reviews across the whole field of Engineering, Science and Technology, as well as work on new teaching methods, assessment, validation and the impact of new technologies, and it will continue to provide information on the latest trends and developments in this ever-expanding subject. Papers are selected through double peer review to ensure originality, relevance, and readability. The articles published in the journal can be accessed online.
MULTIPLE DAG APPLICATIONS SCHEDULING ON A CLUSTER OF PROCESSORS
Many computational solutions can be expressed as a Directed Acyclic Graph (DAG), in which
nodes represent tasks to be executed and edges represent precedence constraints among tasks.
A cluster of processors is a shared resource among several users, hence the need for a
scheduler that deals with multi-user jobs presented as DAGs. The scheduler must find the
number of processors to be allotted to each DAG and schedule tasks on the allotted processors. In
this work, a new method to find the optimal and maximum numbers of processors that can be allotted
to a DAG is proposed. Regression analysis is used to find the best possible way to share the
available processors among a suitable number of submitted DAGs. An instance of a scheduler
for each DAG schedules tasks on the allotted processors. Towards this end, a new framework
to receive online submissions of DAGs, allot processors to each DAG, and schedule tasks is
proposed and evaluated using a simulator. This space-sharing of processors among multiple
DAGs shows better performance than other methods found in the literature. Because of space-sharing,
an online scheduler can be used for each DAG within its allotted processors. The use
of an online scheduler overcomes the drawback of static scheduling, which relies on inaccurately
estimated computation and communication costs. The proposed framework is thus a promising
solution for online scheduling of tasks using static DAG information, a kind of hybrid
scheduling.
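The precedence constraints a per-DAG scheduler must respect can be enforced with a standard ready-list traversal (Kahn's algorithm). This sketch shows only the ordering mechanics, not the allotment or cost model from the abstract.

```python
from collections import defaultdict, deque

def ready_order(tasks, edges):
    """List-schedule a DAG: a task becomes ready once all of its
    predecessors have finished. Returns one valid execution order."""
    indeg = {t: 0 for t in tasks}
    succ = defaultdict(list)
    for u, v in edges:            # u must finish before v starts
        succ[u].append(v)
        indeg[v] += 1
    queue = deque(t for t in tasks if indeg[t] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:     # last predecessor done: v is ready
                queue.append(v)
    return order
```

For the diamond DAG a→{b,c}→d this yields an order that starts with a and ends with d, as the edges require.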
Reinforcement learning based multi core scheduling (RLBMCS) for real time sys...
Embedded systems with multi-core processors are increasingly popular because of the diversity of applications that can run on them. In this work, a reinforcement learning based scheduling method is proposed to handle real-time tasks in multi-core systems with effective CPU usage and lower response time. The priority of the tasks is varied dynamically to ensure fairness, using reinforcement learning based priority assignment and a Multi Core MultiLevel Feedback Queue (MCMLFQ) to manage task execution in the multi-core system.
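The feedback-queue mechanics underlying MCMLFQ can be sketched for a single core. The level count, quanta, and demotion rule below are illustrative assumptions, and the reinforcement-learning priority assignment itself is not modeled here.

```python
from collections import deque

class MLFQ:
    """Toy multilevel feedback queue (one core). A task that exhausts its
    quantum drops one level; quanta double per level so long-running
    tasks get larger, less frequent slices."""

    def __init__(self, levels=3, base_quantum=2):
        self.queues = [deque() for _ in range(levels)]
        self.quanta = [base_quantum * (2 ** i) for i in range(levels)]

    def add(self, name, burst):
        self.queues[0].append((name, burst))   # new tasks enter on top

    def run(self):
        finished = []
        while any(self.queues):
            lvl = next(i for i, q in enumerate(self.queues) if q)
            name, left = self.queues[lvl].popleft()
            q = self.quanta[lvl]
            if left <= q:                      # completes within quantum
                finished.append(name)
            else:                              # demote with remaining work
                nxt = min(lvl + 1, len(self.queues) - 1)
                self.queues[nxt].append((name, left - q))
        return finished
```

A short task finishes at the top level, while a longer one is demoted once and completes a level down.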
Cloud computing is an emerging technology. It processes huge amounts of data, so the scheduling
mechanism plays a vital role in cloud computing. Our protocol is therefore designed to minimize
switching time, improve resource utilization, and improve server performance and throughput. The
method is based on scheduling jobs in the cloud so as to resolve the drawbacks of existing protocols.
We assign a priority to each job, which yields better performance, and aim to minimize waiting time
and switching time. Every effort has been made to manage the scheduling of jobs so as to overcome the
drawbacks of existing protocols and also improve the efficiency and throughput of the server.
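A minimal sketch of the priority-first dispatch described above, assuming all jobs are present at time zero and each job is a (name, priority, burst) triple; the single-server model is an illustrative simplification.

```python
import heapq

def run_by_priority(jobs):
    """Dispatch jobs highest-priority first on one server.

    jobs: iterable of (name, priority, burst). Returns (name, start_time)
    pairs; with all jobs submitted at time zero, start time equals
    waiting time, the quantity the protocol tries to minimize.
    """
    heap = [(-prio, name, burst) for name, prio, burst in jobs]
    heapq.heapify(heap)                  # max-priority via negated key
    clock, out = 0, []
    while heap:
        _, name, burst = heapq.heappop(heap)
        out.append((name, clock))
        clock += burst
    return out
```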
NETWORK-AWARE DATA PREFETCHING OPTIMIZATION OF COMPUTATIONS IN A HETEROGENEOU...
The rapid development of diverse computer architectures and hardware accelerators means that the
design of parallel systems faces new problems arising from their heterogeneity. Our implementation of
a parallel system called KernelHive allows applications to run efficiently in a heterogeneous
environment consisting of multiple collections of nodes with different types of computing devices.
The execution engine of the system is open to optimizer implementations focusing on various criteria.
In this paper, we propose a new optimizer for KernelHive that utilizes distributed databases and
performs data prefetching to optimize the execution time of applications that process large input
data. Employing a versatile data management scheme, which allows various distributed data providers
to be combined, we propose using NoSQL databases for this purpose. We support our solution with
experimental results from real executions of our OpenCL implementation of a regular expression
matching application in various hardware configurations. Additionally, we propose a network-aware
scheduling scheme for selecting hardware for the proposed optimizer and present simulations that
demonstrate its advantages.
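The prefetching idea, overlapping the transfer of the next input chunk with computation on the current one, can be sketched with a single background worker. `fetch` and `compute` are caller-supplied placeholders, not KernelHive APIs.

```python
from concurrent.futures import ThreadPoolExecutor

def process_all(chunk_ids, fetch, compute):
    """Pipeline data transfer against computation: while chunk i is being
    computed, chunk i+1 is already being fetched in the background."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        nxt = pool.submit(fetch, chunk_ids[0])      # warm-up fetch
        for i, _cid in enumerate(chunk_ids):
            data = nxt.result()                     # wait for current chunk
            if i + 1 < len(chunk_ids):
                nxt = pool.submit(fetch, chunk_ids[i + 1])   # prefetch next
            results.append(compute(data))           # overlaps the prefetch
    return results
```

When fetch and compute take comparable time, this roughly halves end-to-end latency compared with fetching and computing strictly in sequence.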
DYNAMIC TASK PARTITIONING MODEL IN PARALLEL COMPUTING
Parallel computing systems employ task partitioning strategies in a true multiprocessing
manner. Such systems share the algorithm and processing units as computing resources, which
leads to heavy inter-process communication. The main part of the proposed algorithm is the
resource management unit, which performs task partitioning and co-scheduling. In this paper, we
present a technique for integrated task partitioning and co-scheduling on a privately owned
network. We focus on real-time, non-preemptive systems. A large variety of experiments have been
conducted on the proposed algorithm using synthetic and real tasks. The goal of the computation
model is to provide a realistic representation of the costs of programming. The results show the
benefit of task partitioning. The main characteristics of our method are optimal scheduling and a
strong link between partitioning, scheduling and communication. Some important models for task
partitioning are also discussed in the paper. We target a task partitioning algorithm that improves
inter-process communication between tasks and uses the resources of the system efficiently. The
proposed algorithm contributes to minimizing inter-process communication cost among the executing
processes.
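A toy version of communication-driven partitioning: co-locate the most heavily communicating task pairs until a processor group is full. The greedy pairing rule and the capacity `k` are illustrative assumptions, not the paper's resource management unit.

```python
def partition(comm, k):
    """Greedy partitioner that minimizes cross-group communication.

    comm: {(task_a, task_b): communication_volume}; k: max tasks per
    processor group. Heavier-communicating pairs are placed first, each
    task joining its partner's group while there is room.
    """
    group_of, groups = {}, []
    for (a, b), _vol in sorted(comm.items(), key=lambda kv: -kv[1]):
        for t in (a, b):
            if t in group_of:
                continue
            other = b if t == a else a
            g = group_of.get(other)
            if g is not None and len(groups[g]) < k:
                groups[g].append(t)          # co-locate with its partner
                group_of[t] = g
            else:
                groups.append([t])           # open a new processor group
                group_of[t] = len(groups) - 1
    return groups
```

Pairs kept inside one group cost nothing at runtime; only the volume on edges that end up crossing groups becomes inter-process communication.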
EFFICIENT SCHEDULING STRATEGY USING COMMUNICATION AWARE SCHEDULING FOR PARALL...
In computer science, parallel job scheduling is an important field of research. Finding the
best-suited processor in a high-performance or cluster computing system for user-submitted jobs
plays an important role in overall system performance. A new scheduling technique called
communication aware scheduling is devised, capable of handling serial jobs, parallel jobs, mixed
jobs and dynamic jobs. This work focuses on comparing communication aware scheduling with the
available parallel job scheduling techniques, and the experimental results show that communication
aware scheduling performs better than the available parallel job scheduling techniques.
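One way to make placement communication-aware is to fold an estimated transfer cost into the per-node score. The additive cost model below is an assumption, not the paper's formulation.

```python
def place(job_comm, node_load, link_cost):
    """Pick the node where queueing plus communication is cheapest.

    job_comm:  units of data the job exchanges with its peers.
    node_load: {node: current queue/backlog cost}.
    link_cost: {node: transfer cost per unit of data to that node}.
    """
    def cost(node):
        return node_load[node] + job_comm * link_cost[node]
    return min(node_load, key=cost)
```

A communication-heavy job lands on the well-connected node even if it is busier, while a compute-only job simply goes to the least-loaded node.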
A popular programming model for running data-intensive applications on the cloud is MapReduce. In
Hadoop, jobs are scheduled in FIFO order by default. There are many MapReduce applications that
require strict deadlines, but a scheduler with deadline constraints has not been implemented in the
Hadoop framework. Existing schedulers do not guarantee that a job will be completed by a specific
deadline. Some schedulers address the issue of deadlines but focus more on improving system
utilization. We have proposed an algorithm that lets the user specify a job's deadline and evaluates
whether the job can be finished before that deadline. The scheduler with deadlines for Hadoop ensures
that only jobs whose deadlines can be met are scheduled for execution. If a submitted job cannot
satisfy the specified deadline, physical or virtual nodes can be added dynamically to complete the
job within the deadline [8].
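The deadline admission test can be sketched as a back-of-the-envelope estimate: count map and reduce "waves" given the available slots and per-task cost estimates. This wave model is a simplification for illustration, not the proposed scheduler itself.

```python
import math

def can_meet_deadline(map_tasks, reduce_tasks, map_cost, reduce_cost,
                      slots, now, deadline):
    """Rough admission test for a MapReduce job.

    With `slots` parallel slots per phase, tasks run in ceil(n / slots)
    waves; the job is admitted only if the estimated finish time stays
    within the deadline. Costs are per-task estimates.
    """
    map_waves = math.ceil(map_tasks / slots)
    red_waves = math.ceil(reduce_tasks / slots)
    finish = now + map_waves * map_cost + red_waves * reduce_cost
    return finish <= deadline
```

If the test fails, the abstract's remedy applies: add nodes (more slots) until the estimated finish time fits the deadline.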
A Framework and Methods for Dynamic Scheduling of a Directed Acyclic Graph on...
The data flow model is gaining popularity as a programming paradigm for multi-core processors.
Efficient scheduling of an application modeled by a Directed Acyclic Graph (DAG) is a key issue when
performance is very important. A DAG represents a computational solution in which the nodes represent
tasks to be executed and the edges represent precedence constraints among the tasks. The task
scheduling problem in general is NP-complete [2]. Several static scheduling heuristics have been
proposed, but the major problems in static list scheduling are the inherent difficulty of exactly
estimating task and edge costs in a DAG and its inability to account for the runtime behavior of
tasks. This underlines the need for dynamic scheduling of a DAG. This paper presents how dynamic
scheduling of a DAG can be done in general, and proposes four simple methods for performing it. These
methods have been simulated and evaluated using a representative set of DAG-structured computations
from both synthetic and real problems. The performance of the proposed dynamic scheduler is found to
be comparable with that of static scheduling methods. A performance comparison of the proposed
dynamic scheduling methods is also carried out.
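Dynamic dispatch of a DAG, assigning a task as soon as its predecessors finish and a processor is free, can be simulated in a few lines. Fixed per-task costs stand in here for the runtime measurements a real dynamic scheduler would observe instead of estimating.

```python
import heapq
from collections import defaultdict

def dynamic_schedule(costs, edges, nprocs):
    """Event-driven DAG dispatch on `nprocs` processors.

    costs: {task: run_time}; edges: (pred, succ) pairs. Whenever a
    processor frees up it takes any task whose predecessors have all
    completed. Returns {task: finish_time}.
    """
    indeg = {t: 0 for t in costs}
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    ready = [t for t in costs if indeg[t] == 0]
    running = []                      # heap of (finish_time, task)
    clock, finish_times = 0, {}
    while ready or running:
        while ready and len(running) < nprocs:    # fill idle processors
            t = ready.pop(0)
            heapq.heappush(running, (clock + costs[t], t))
        clock, t = heapq.heappop(running)         # next completion event
        finish_times[t] = clock
        for v in succ[t]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return finish_times
```

With a→{b,c} on two processors, b and c start together the moment a completes, so the makespan is driven by the longer successor.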
A Dynamic Programming Approach to Energy-Efficient Scheduling on Multi-FPGA b...
This paper studies the important issue of energy-efficient scheduling on multi-FPGA systems. The main challenges are integral allocation, reconfiguration overhead and exclusiveness, and energy minimization under a deadline constraint. To tackle these challenges, based on the theory of dynamic programming, we have designed and implemented an energy-efficient scheduler for multi-FPGA systems. In addition, we present an MLPF algorithm for task placement on FPGAs. Finally, the experimental results demonstrate that the proposed algorithm successfully accommodates all tasks without violating the deadline constraint. Additionally, it achieves energy reductions of 13.3% and 26.3% compared with Particle Swarm Optimization and a fully balanced algorithm, respectively.
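The dynamic-programming idea can be illustrated on a simplified version of the problem: pick one (time, energy) execution mode per task so that total time meets the deadline at minimum energy. This generic DP over the time budget is an illustration only, not the paper's MLPF placement algorithm.

```python
def min_energy(tasks, deadline):
    """Minimum total energy to run all tasks within `deadline`.

    tasks: list of mode lists, each mode a (time, energy) pair; exactly
    one mode is chosen per task. dp[t] holds the best energy achievable
    for the tasks seen so far using at most t time units. Returns None
    if no mode combination meets the deadline.
    """
    INF = float("inf")
    dp = [0.0] * (deadline + 1)          # zero tasks cost zero energy
    for modes in tasks:
        ndp = [INF] * (deadline + 1)
        for t in range(deadline + 1):
            for mt, me in modes:
                if mt <= t and dp[t - mt] + me < ndp[t]:
                    ndp[t] = dp[t - mt] + me
        dp = ndp
    return dp[deadline] if dp[deadline] < INF else None
```

Each extra time unit of slack lets the DP swap a fast, power-hungry mode for a slower, cheaper one, which is exactly the energy/deadline trade-off the abstract targets.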
(Paper) Task scheduling algorithm for multicore processor system for minimiz...Naoki Shibata
Shohei Gotoda, Naoki Shibata and Minoru Ito : "Task scheduling algorithm for multicore processor system for minimizing recovery time in case of single node fault," Proceedings of IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2012), pp.260-267, DOI:10.1109/CCGrid.2012.23, May 15, 2012.
In this paper, we propose a task scheduling al-gorithm for a multicore processor system which reduces the
recovery time in case of a single fail-stop failure of a multicore
processor. Many of the recently developed processors have
multiple cores on a single die, so that one failure of a computing
node results in failure of many processors. In the case of a failure
of a multicore processor, all tasks which have been executed
on the failed multicore processor have to be recovered at once.
The proposed algorithm is based on an existing checkpointing
technique, and we assume that the state is saved when nodes
send results to the next node. If a series of computations that
depends on former results is executed on a single die, we need
to execute all parts of the series of computations again in
the case of failure of the processor. The proposed scheduling
algorithm tries not to concentrate tasks to processors on a die.
We designed our algorithm as a parallel algorithm that achieves
O(n) speedup where n is the number of processors. We evaluated
our method using simulations and experiments with four PCs.
We compared our method with an existing scheduling method, and
in the simulation, the execution time including recovery time in
the case of a node failure is reduced by up to 50% while the
overhead in the case of no failure was a few percent in typical
scenarios.
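A minimal sketch of the core placement idea above, spreading a chain of dependent tasks across dies so that a single fail-stop die failure invalidates as little checkpointed work as possible. The round-robin policy and all names here are illustrative assumptions, not the paper's algorithm:

```python
def spread_chain(chain, num_dies, cores_per_die):
    """Place consecutive, dependent tasks on alternating dies (round-robin),
    so no long dependency chain is concentrated on a single die."""
    placement = {}
    for i, task in enumerate(chain):
        die = i % num_dies                     # alternate dies along the chain
        core = (i // num_dies) % cores_per_die # then cycle cores within a die
        placement[task] = (die, core)
    return placement
```

With two dies, any two consecutive tasks in the chain land on different dies, so a failed die costs at most every other checkpointed step.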
LEARNING SCHEDULER PARAMETERS FOR ADAPTIVE PREEMPTIONcscpconf
An operating system scheduler is expected not to let the processor stay idle if any
process is ready or waiting for execution. This problem gains importance as the number
of processes always outnumbers the processors by large margins. It is in this regard that
schedulers are given the ability to preempt a running process, by following some
scheduling algorithm, and so give the illusion of several processes running simultaneously. A
process is allowed to utilize CPU resources for a fixed quantum of time (termed the
timeslice for preemption) and is then preempted in favour of another waiting process. Each of these
process preemptions incurs a considerable overhead of CPU cycles, which are a valuable resource
at runtime. In this work we utilize the historical performance of a scheduler
to predict the nature of the currently running process, thereby trying to reduce the number of
preemptions. We propose a machine-learning module that predicts a better-performing timeslice,
calculated from a static knowledge base and an adaptive reinforcement-learning-based
suggestive module. Results for this "adaptive timeslice parameter" show good
savings in CPU cycles and efficient throughput time.
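A hedged sketch of the reinforcement-learning flavour described above: an epsilon-greedy estimator that picks among candidate timeslices and is rewarded when a choice avoids a preemption. The class name, candidate values and reward signal are all illustrative assumptions, not the paper's module:

```python
import random

class TimesliceLearner:
    """Toy epsilon-greedy learner choosing among candidate timeslices."""

    def __init__(self, slices, epsilon=0.1, seed=0):
        self.slices = slices
        self.eps = epsilon
        self.rng = random.Random(seed)
        self.value = {s: 0.0 for s in slices}  # running reward estimate
        self.count = {s: 0 for s in slices}

    def pick(self):
        # explore occasionally, otherwise exploit the best-known timeslice
        if self.rng.random() < self.eps:
            return self.rng.choice(self.slices)
        return max(self.slices, key=lambda s: self.value[s])

    def update(self, s, reward):
        # incremental mean of the rewards observed for timeslice s
        self.count[s] += 1
        self.value[s] += (reward - self.value[s]) / self.count[s]
```

The reward could, for instance, be positive when the process yielded voluntarily within the slice and negative when it was forcibly preempted.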
AVOIDING DUPLICATED COMPUTATION TO IMPROVE THE PERFORMANCE OF PFSP ON CUDA GPUScsandit
Graphics Processing Units (GPUs) have emerged as powerful parallel compute platforms for various
application domains. A GPU consists of hundreds or even thousands of processor cores and adopts the
Single Instruction, Multiple Threads (SIMT) architecture. Previously, we proposed an approach that
optimizes the Tabu Search algorithm for solving the Permutation Flowshop Scheduling Problem (PFSP)
on a GPU by using a math function to generate all the different permutations, avoiding the need to place all
the permutations in global memory. Building on that result, this paper proposes another
approach that further improves performance by avoiding duplicated computation among threads,
which is incurred whenever two permutations share the same prefix. Experimental results show that the
GPU implementation of our proposed Tabu Search for PFSP runs up to 1.5 times faster than another GPU
implementation proposed by Czapiński and Barnes.
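The "math function" idea, generating a permutation directly from its index so that permutations never need to be materialised in global memory, can be sketched via the factorial number system. This is a standard indexing technique shown here on the CPU for clarity, not the authors' CUDA kernel:

```python
def kth_permutation(elems, k):
    """Return the k-th (0-based) lexicographic permutation of elems.
    Each thread can call this with its own index instead of reading a
    precomputed permutation table from memory."""
    pool = list(elems)
    n = len(pool)
    fact = [1] * n                      # fact[i] = i!
    for i in range(1, n):
        fact[i] = fact[i - 1] * i
    out = []
    for i in range(n - 1, -1, -1):
        idx, k = divmod(k, fact[i])     # digit in the factorial number system
        out.append(pool.pop(idx))
    return out
```

Because each index maps to a distinct permutation, a grid of threads can enumerate the whole neighbourhood with no shared permutation storage.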
An Improved Parallel Activity scheduling algorithm for large datasetsIJERA Editor
Parallel processing can execute a large number of tasks on a multiprocessor in the same time period, and it is one of the emerging concepts. Complex computational problems can be solved efficiently with the help of parallel processing. Parallel processing systems fall into two categories depending on the nature of the tasks: homogeneous and heterogeneous. In a homogeneous environment, the processors executing the different tasks are similar in capacity; in a heterogeneous environment, tasks are allocated to processors of differing capacity and speed. The main objective of parallel processing is to optimize execution speed and shorten the duration of task execution independently of the environment. In this work, an optimized parallel project selection method was implemented to find the optimal resource utilization and project scheduling. With the proposed task scheduling algorithm, allocating different tasks to various processors increases execution speed and decreases the overall average execution time.
IOT SOLUTIONS FOR SMART PARKING- SIGFOX TECHNOLOGYCSEIJJournal
Sigfox technology has been a competitive product in the communication-service-provider market for
approximately a decade. Widely deployed for smart-parking solutions across various European
countries, it has now gained traction in Germany as well. The technology's track record and
reputation in the market demonstrate its effectiveness and reliability in addressing the communication
needs of IoT applications, particularly vehicle parking systems. The study is conducted in the
context of a city such as Berlin, Germany. The major challenge is how to make parking techniques
more user-friendly, cost-effective and less energy-consuming; the questions posed at the beginning
of the paper are answered towards the end via Sigfox and its comparison with related technologies
such as LoRaWAN and Weightless. Future areas of research that are not clearly identified in this
particular research area are also pointed out.
The paper also discusses the pros and cons of adaptive, emerging and existing technologies in terms
of cloud, big data and data analytics, all in tandem with Sigfox.
Reliability Improvement with PSP of Web-Based Software ApplicationsCSEIJJournal
In diverse industrial and academic environments, the quality of the software has been evaluated using
different analytic studies. The contribution of the present work is focused on the development of a
methodology in order to improve the evaluation and analysis of the reliability of web-based software
applications. The Personal Software Process (PSP) was introduced in our methodology for improving the
quality of the process and the product. The Evaluation + Improvement (Ei) process is performed in our
methodology to evaluate and improve the quality of the software system. We tested our methodology in a
web-based software system and used statistical modeling theory for the analysis and evaluation of the
reliability. The behavior of the system under ideal conditions was evaluated and compared against the
operation of the system executing under real conditions. The results obtained demonstrated the
effectiveness and applicability of our methodology.
More Related Content
Similar to Reconfiguration Strategies for Online Hardware Multitasking in Embedded Systems
Reinforcement learning based multi core scheduling (RLBMCS) for real time sys...IJECEIAES
Embedded systems with multi-core processors are increasingly popular because of the diversity of applications that can run on them. In this work, a reinforcement-learning-based scheduling method is proposed to handle real-time tasks in multi-core systems with effective CPU usage and lower response time. Task priorities are varied dynamically to ensure fairness, using reinforcement-learning-based priority assignment and a Multi Core Multi-Level Feedback Queue (MCMLFQ) to manage task execution in the multi-core system.
Cloud computing is an emerging technology that processes huge amounts of data, so the scheduling
mechanism plays a vital role in it. The proposed protocol is designed to minimize switching time,
improve resource utilization, and improve server performance and throughput. It schedules jobs in
the cloud so as to address the drawbacks of existing protocols: a priority is assigned to each job,
which gives better performance and aims to minimize waiting time and switching time, thereby
improving the efficiency and throughput of the server.
NETWORK-AWARE DATA PREFETCHING OPTIMIZATION OF COMPUTATIONS IN A HETEROGENEOU...IJCNCJournal
Rapid development of diverse computer architectures and hardware accelerators means that the design of
parallel systems faces new problems arising from their heterogeneity. Our implementation of a parallel
system called KernelHive allows applications to run efficiently in a heterogeneous environment consisting
of multiple collections of nodes with different types of computing devices. The execution engine of the
system is open to optimizer implementations focusing on various criteria. In this paper, we propose a new
optimizer for KernelHive that utilizes distributed databases and performs data prefetching to optimize the
execution time of applications that process large input data. Employing a versatile data-management
scheme that allows various distributed data providers to be combined, we propose using NoSQL databases
for this purpose. We support our solution with results of experiments with real executions of our OpenCL
implementation of a regular-expression-matching application in various hardware configurations.
Additionally, we propose a network-aware scheduling scheme for selecting hardware for the proposed
optimizer and present simulations that demonstrate its advantages.
DYNAMIC TASK PARTITIONING MODEL IN PARALLEL COMPUTINGcscpconf
Parallel computing systems employ task partitioning strategies in a true multiprocessing
manner. Such systems share the algorithm and the processing units as computing resources, which
leads to highly inter-process-communication-intensive workloads. The main part of the proposed
algorithm is the resource management unit, which performs task partitioning and co-scheduling. In
this paper, we present a technique for integrated task partitioning and co-scheduling on a
privately owned network, focusing on real-time, non-preemptive systems. A large variety of
experiments have been conducted on the proposed algorithm using synthetic and real tasks.
The goal of the computation model is to provide a realistic representation of the costs of programming.
The results show the benefit of the task partitioning. The main characteristics of our method are
optimal scheduling and a strong link between partitioning, scheduling and communication. Some
important models for task partitioning are also discussed in the paper. We target a task
partitioning algorithm that improves inter-process communication between the tasks and uses
the resources of the system efficiently. The proposed algorithm contributes to minimizing the
inter-process communication cost among the executing processes.
EFFICIENT SCHEDULING STRATEGY USING COMMUNICATION AWARE SCHEDULING FOR PARALL...ijdpsjournal
Parallel job scheduling is an important field of research in computer science. Finding the most
suitable processor for user-submitted jobs on a high-performance or cluster computing system plays an
important role in overall system performance. A new scheduling technique called communication-aware
scheduling is devised, capable of handling serial jobs, parallel jobs, mixed jobs and dynamic jobs.
This work compares communication-aware scheduling with the available parallel job
scheduling techniques, and the experimental results show that communication-aware scheduling performs
better than the available parallel job scheduling techniques.
A popular programming model for running data-intensive applications on the cloud is MapReduce. In
Hadoop, jobs are scheduled in FIFO order by default, yet many MapReduce applications require a
strict deadline. The Hadoop framework provides no scheduler with deadline constraints: existing
schedulers do not guarantee that a job will be completed by a specific deadline, and those that do
address deadlines focus more on improving system utilization. We have proposed an algorithm which
lets the user specify a job's deadline and evaluates whether the job can be finished before that
deadline. The scheduler with deadlines for Hadoop ensures that only jobs whose deadlines can be met
are scheduled for execution. If a submitted job cannot satisfy the specified deadline, physical or
virtual nodes can be added dynamically to complete the job within the deadline [8].
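The admission test such a deadline scheduler performs can be sketched roughly as follows; the slot counts, per-wave task costs and the wave-based completion estimate are simplifying assumptions for illustration, not Hadoop's actual scheduler API:

```python
import math

def admit_job(now, deadline, map_tasks, map_cost,
              reduce_tasks, reduce_cost,
              free_map_slots, free_reduce_slots):
    """Admit a job only if its estimated finish time meets its deadline.
    Tasks run in 'waves': each wave fills all currently free slots once."""
    if free_map_slots <= 0 or free_reduce_slots <= 0:
        return False
    map_waves = math.ceil(map_tasks / free_map_slots)
    reduce_waves = math.ceil(reduce_tasks / free_reduce_slots)
    est_finish = now + map_waves * map_cost + reduce_waves * reduce_cost
    return est_finish <= deadline
```

A job failing this check would either be rejected or, as the abstract suggests, trigger the dynamic addition of nodes (more free slots) until the estimate fits.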
A Framework and Methods for Dynamic Scheduling of a Directed Acyclic Graph on...IDES Editor
The data-flow model is gaining popularity as a programming paradigm for multi-core processors. Efficient
scheduling of an application modeled by a Directed Acyclic Graph (DAG) is a key issue when performance
matters. A DAG represents a computational solution in which the nodes represent tasks to be executed and
the edges represent precedence constraints among the tasks. The task scheduling problem is in general
NP-complete [2]. Several static scheduling heuristics have been proposed, but the major problems with
static list scheduling are the inherent difficulty of exactly estimating task and edge costs in a DAG
and its inability to account for the runtime behavior of tasks. This underlines the need for dynamic
scheduling of a DAG. This paper presents how, in general, dynamic scheduling of a DAG can be done, and
proposes four simple methods to perform it. These methods have been simulated and evaluated using a
representative set of DAG-structured computations from both synthetic and real problems. The performance
of the proposed dynamic scheduler is found to be comparable with that of static scheduling methods. A
performance comparison of the proposed dynamic scheduling methods is also carried out.
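A minimal sketch of dynamic list scheduling of a DAG: tasks become ready as their predecessors complete and are dispatched FIFO to the earliest-free identical worker. This illustrates only the general mechanism; the paper's four proposed methods differ in their dispatch policies:

```python
import heapq
from collections import deque

def list_schedule(cost, deps, workers):
    """Greedy dynamic list scheduling of a DAG on identical workers.
    cost: {task: duration}; deps: list of (pred, succ) edges.
    Returns the makespan."""
    indeg = {t: 0 for t in cost}
    succ = {t: [] for t in cost}
    pred = {t: [] for t in cost}
    for a, b in deps:
        indeg[b] += 1
        succ[a].append(b)
        pred[b].append(a)
    ready = deque(t for t in cost if indeg[t] == 0)
    free = [0.0] * workers              # min-heap of worker-available times
    heapq.heapify(free)
    finish = {}
    while ready:
        t = ready.popleft()
        # start when a worker is free AND all predecessors have finished
        start = max([heapq.heappop(free)] + [finish[p] for p in pred[t]])
        finish[t] = start + cost[t]
        heapq.heappush(free, finish[t])
        for s in succ[t]:
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
    return max(finish.values())
```

Because dispatch decisions are made as tasks become ready, runtime variations in task cost are absorbed naturally, which is the advantage over static list scheduling noted above.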
DATA MINING FOR STUDENTS’ EMPLOYABILITY PREDICTIONCSEIJJournal
This study was undertaken to predict student employability. Assessing student employability
provides a way of integrating student abilities with employers' business requirements, which is becoming
an increasingly important concern for academic institutions. Improving student evaluation techniques for
employability can help students better understand business organizations and find the right
one for them. The data for training the classification models is gathered through a survey in which students
fill out a questionnaire indicating their abilities and academic achievement. This information is used to
determine their competency in a variety of skill categories, including soft skills, problem-solving skills
and technical abilities. The goal of this research is to use data mining to predict student employability
by considering different factors, such as the skills students have gained at diploma level and the time
elapsed, relative to the knowledge they have captured, when they expect placement at the end of graduation.
The most relevant skills for each job category were also identified. For the prediction of student
employability, different data mining models such as KNN, Naive Bayes and Decision Tree were evaluated,
and the best model for this institute's student employability prediction was identified. Thus,
classification and association techniques were used and evaluated in this research.
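As a sketch of one of the classifiers mentioned above, a plain k-nearest-neighbours vote can be written in a few lines; the feature encoding and example data here are entirely made up for illustration:

```python
def knn_predict(train, query, k=3):
    """Toy k-nearest-neighbours vote over (features, label) pairs."""
    def dist(a, b):
        # squared Euclidean distance between feature tuples
        return sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = sorted(train, key=lambda fl: dist(fl[0], query))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)   # majority vote
```

In a study like the one described, the feature tuples would encode survey answers (skill scores, grades) and the labels a placement outcome.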
Call for Articles - Computer Science & Engineering: An International Journal ...CSEIJJournal
Computer Science & Engineering: An International Journal (CSEIJ) is a bi-monthly open access peer-reviewed journal that publishes articles which contribute new results in all areas of the Computer Science & Computer Engineering. The journal is devoted to the publication of high quality papers on theoretical and practical aspects of computer science and computer Engineering.
A Complexity Based Regression Test Selection StrategyCSEIJJournal
Software is unequivocally the foremost and indispensable entity in this technologically driven world.
Quality assurance, and in particular software testing, is therefore a crucial step in the software
development cycle. This paper presents an effective test selection strategy that uses a Spectrum of
Complexity Metrics (SCM). Our aim is to increase the efficiency of the testing process by
significantly reducing the number of test cases without a significant drop in test effectiveness. The
strategy makes use of a comprehensive taxonomy of complexity metrics based on the product level (class,
method, statement) and its characteristics. We use a series of experiments based on three applications with
a significant number of mutants to demonstrate the effectiveness of our selection strategy. For further
evaluation, we compare our approach to boundary value analysis. The results show the capability of our
approach to detect the mutants as well as the seeded errors.
XML Encryption and Signature for Securing Web ServicesCSEIJJournal
In this research, we have focused on the most challenging issue Web Services face: how to secure
their information. Web Services security can be guaranteed by employing security standards, which is the
main focus of this research. Every proposed security design should take into account the security
objectives: integrity, confidentiality, non-repudiation, authentication, and authorization. The
proposed model describes SOAP messages and the way to secure their contents. Because the
SOAP message is at the core of information exchange in Web Services, this research has developed a
security model needed to ensure e-business security. The essence of our model is the use of XML Encryption
and XML Signature to encrypt and sign SOAP messages. The proposed model aims to achieve
high transaction speed and a strong level of security without jeopardizing the performance of
transmitting the information.
Performance Comparison of PCA,DWT-PCA And LWT-PCA for Face Image RetrievalCSEIJJournal
This paper compares the performance of face image retrieval systems based on the discrete wavelet
transform and the lifting wavelet transform combined with principal component analysis (PCA). These
techniques are implemented and their performance is investigated using frontal facial images from the
ORL database. Although the discrete wavelet transform is effective at representing image features and
is suitable for face image retrieval, it still encounters implementation problems, e.g. floating-point
operations and decomposition speed. We use the advantages of the lifting scheme, a spatial approach to
constructing wavelet filters, which provides a feasible alternative to the problems facing its classical
counterpart. The lifting scheme has such intriguing properties as convenient construction, simple
structure, integer-to-integer transforms, low computational complexity and flexible adaptivity,
revealing its potential in face image retrieval. Compared to PCA and DWT with PCA, the lifting wavelet
transform with PCA requires less computation, while DWT-PCA gives a higher retrieval rate.
Call for Papers - Computer Science & Engineering: An International Journal (C...CSEIJJournal
Computer Science & Engineering: An International Journal (CSEIJ) is a bi-monthly open access peer-reviewed journal that publishes articles which contribute new results in all areas of the Computer Science & Computer Engineering. The journal is devoted to the publication of high quality papers on theoretical and practical aspects of computer science and computer Engineering.
Data security and privacy are important to prevent the revelation, modification and unauthorized usage of
sensitive information. The introduction of power-constrained devices for the Internet of Things (IoT),
e-commerce, e-payment, and wireless sensor networks (WSNs) has brought a new security challenge due to the
low computation capability of sensors. Therefore, lightweight authenticated key agreement protocols are
important to protect their security and privacy. Several studies on authenticated key agreement have been
published. However, there is a need for lightweight schemes that can fit such resource-constrained devices.
In addition, a malicious key generation center (KGC) can become a threat by watching other users, i.e.
impersonating a user, causing the key escrow problem.
Recommendation System for Information Services Adapted, Over Terrestrial Digi...CSEIJJournal
The development of digital television in Colombia has grown in recent years, especially digital terrestrial
television (DTT), which is an essential part of the projects of the National Ministry of ICT, thanks to the
wide distribution and use of the television network and the Internet in the country. This article explains
how to join different technologies, such as social networks, information adaptation and DTT, to obtain an
application that offers information services to users, based on their data, preferences, inclinations, and
use of and interaction with other users and groups inside the network.
Performance Comparison and Analysis of Mobile Ad Hoc Routing ProtocolsCSEIJJournal
A mobile ad hoc network (MANET) is a wireless network that uses multi-hop peer-to-peer routing instead
of static network infrastructure to provide network connectivity. MANETs have applications in rapidly
deployed and dynamic military and civilian systems. The network topology in a MANET usually changes
with time. Therefore, there are new challenges for routing protocols in MANETs since traditional routing
protocols may not be suitable for MANETs. Using simulations, researchers are designing new MANET routing
protocols and comparing and improving existing MANET routing protocols before any are standardized.
However, the simulation results from different research groups are not consistent with each other. This is
because of a lack of consistency in MANET routing protocol models and application environments, including
networking and user traffic profiles. Therefore, the simulation scenarios are not equitable for all
protocols and conclusions cannot be generalized. Furthermore, it is difficult to choose a proper routing
protocol for a given MANET application. In light of these issues, this paper focuses on MANET routing
protocols. Specifically, the contribution includes the characterization of different routing protocols and
a comparison and analysis of their performance.
Adaptive Stabilization and Synchronization of Hyperchaotic QI SystemCSEIJJournal
The hyperchaotic Qi system (Chen, Yang, Qi and Yuan, 2007) is one of the important models of
four-dimensional hyperchaotic systems. This paper investigates the adaptive stabilization and synchronization
of hyperchaotic Qi system with unknown parameters. First, adaptive control laws are designed to
stabilize the hyperchaotic Qi system to its equilibrium point at the origin based on the adaptive control
theory and Lyapunov stability theory. Then adaptive control laws are derived to achieve global chaos
synchronization of identical hyperchaotic Qi systems with unknown parameters. Numerical simulations
are shown to demonstrate the effectiveness of the proposed adaptive stabilization and synchronization
schemes.
An Energy Efficient Data Secrecy Scheme For Wireless Body Sensor NetworksCSEIJJournal
Data secrecy is one of the key concerns for wireless body sensor networks (WBSNs). Usually, a data
secrecy scheme should accomplish two tasks: key establishment and encryption. WBSNs generally face
more serious limitations than general wireless networks in terms of energy supply. To address this, in this
paper, we propose an energy efficient data secrecy scheme for WBSNs. On one hand, the proposed key
establishment protocol integrates node IDs, seed value and nonce seamlessly for security, then
establishes a session key between two nodes based on one-way hash algorithm SHA-1. On the other hand,
a low-complexity threshold selective encryption technology is proposed. Also, we design a low-complexity
security selection pattern exchange method for the threshold selective encryption. Then, we evaluate the
energy consumption of the proposed scheme. Our scheme shows a great advantage over other existing schemes
in terms of low energy consumption.
To improve the QoS in MANETs through analysis between reactive and proactive ...CSEIJJournal
A Mobile Ad hoc NETwork (MANET) is a self-configuring, infrastructureless network of mobile devices
connected by wireless links. "Ad hoc" is Latin and means "for this purpose". Each device in a MANET is free
to move independently in any direction, and will therefore change its links to other devices frequently.
Each device must forward traffic unrelated to its own use, and therefore be a router. The primary challenge
in building a MANET is equipping each device to continuously maintain the information required to properly
route traffic. QoS is defined as a set of service requirements to be met by the network while transporting a
packet stream from source to destination. Intrinsic to the notion of QoS is an agreement or a guarantee by
the network to provide a set of measurable pre-specified service attributes to the user in terms of delay,
jitter, available bandwidth, packet loss, and so on. The analysis is mainly between proactive or table-driven
protocols like OLSR (Optimized Link State Routing) viz DSDV (Destination Sequenced Distance Vector) &
CGSR (Cluster Head Gateway Switch Routing) and reactive or source initiated routing protocols viz
AODV (Ad hoc on Demand distance Vector) & DSR (Dynamic Source Routing). The QoS analysis of the
above said protocols is simulated on NS2 and results are shown thereby.
This paper introduces Topic Tracking for Punjabi language. Text mining is a field that automatically
extracts previously unknown and useful information from unstructured textual data. It has strong
connections with natural language processing. NLP has produced technologies that teach computers
natural language so that they may analyze, understand and even generate text. Topic tracking is one of the
technologies that has been developed and can be used in the text mining process. The main purpose of topic
tracking is to identify and follow events presented in multiple news sources, including newswires, radio and
TV broadcasts. It collects dispersed information together and makes it easy for users to get a general
understanding. Not much work has been done on topic tracking for Indian languages in general and
Punjabi in particular. We first survey various approaches available for topic tracking, then present our
approach for Punjabi. The experimental results are shown.
EXPLORATORY DATA ANALYSIS AND FEATURE SELECTION FOR SOCIAL MEDIA HACKERS PRED...CSEIJJournal
In machine learning, the intelligence of a developed model is greatly influenced by the dataset used for the
target domain on which the developed model will be deployed. Social media platforms have experienced
more hacker attacks in recent times. There are two possible ways to identify a hacker on a platform: the
first is to use the activities of the user, while the second is to use the details the user registered the
account with. To proactively identify a social media user as a hacker, there are relevant user details,
called features, that can be used to determine whether a user is a hacker or not. In this paper, an
exploratory data analysis was carried out to determine the best features
that can be used by a predictive model to proactively identify hackers on the social media platform. A web
crawler was developed to mine the user dataset on which exploratory data analysis was carried out to
select the best features for the dataset which could be used to correctly identify a hacker on a social media
platform.
About
Indigenized remote control interface card suitable for MAFI system CCR equipment. Compatible for IDM8000 CCR. Backplane mounted serial and TCP/Ethernet communication module for CCR remote access. IDM 8000 CCR remote control on serial and TCP protocol.
• Remote control: Parallel or serial interface.
• Compatible with MAFI CCR system.
• Compatible with IDM8000 CCR.
• Compatible with Backplane mount serial communication.
• Compatible with commercial and Defence aviation CCR system.
• Remote control system for accessing CCR and allied system over serial or TCP.
• Indigenized local Support/presence in India.
• Easy configuration using DIP switches.
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Dr.Costas Sachpazis
Terzaghi's soil bearing capacity theory, developed by Karl Terzaghi, is a fundamental principle in geotechnical engineering used to determine the bearing capacity of shallow foundations. This theory provides a method to calculate the ultimate bearing capacity of soil, which is the maximum load per unit area that the soil can support without undergoing shear failure. The calculation HTML code is included.
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxR&R Consult
CFD analysis is incredibly effective at solving mysteries and improving the performance of complex systems!
Here's a great example: At a large natural gas-fired power plant, where they use waste heat to generate steam and energy, they were puzzled that their boiler wasn't producing as much steam as expected.
R&R and Tetra Engineering Group Inc. were asked to solve the issue with reduced steam production.
An inspection had shown that a significant amount of hot flue gas was bypassing the boiler tubes, where the heat was supposed to be transferred.
R&R Consult conducted a CFD analysis, which revealed that 6.3% of the flue gas was bypassing the boiler tubes without transferring heat. The analysis also showed that the flue gas was instead being directed along the sides of the boiler and between the modules that were supposed to capture the heat. This was the cause of the reduced performance.
Based on our results, Tetra Engineering installed covering plates to reduce the bypass flow. This improved the boiler's performance and increased electricity production.
It is always satisfying when we can help solve complex challenges like this. Do your systems also need a check-up or optimization? Give us a call!
Work done in cooperation with James Malloy and David Moelling from Tetra Engineering.
More examples of our work https://www.r-r-consult.dk/en/cases-en/
Student information management system project report ii.pdfKamal Acharya
Our project describes a student management system. It mainly covers the various actions related to student details, providing ease in adding, editing and deleting student details, as well as a less time-consuming process for viewing, adding, editing and deleting students' marks.
Saudi Arabia stands as a titan in the global energy landscape, renowned for its abundant oil and gas resources. It's the largest exporter of petroleum and holds some of the world's most significant reserves. Let's delve into the top 10 oil and gas projects shaping Saudi Arabia's energy future in 2024.
Hierarchical Digital Twin of a Naval Power SystemKerry Sado
A hierarchical digital twin of a Naval DC power system has been developed and experimentally verified. Similar to other state-of-the-art digital twins, this technology creates a digital replica of the physical system executed in real-time or faster, which can modify hardware controls. However, its advantage stems from distributing computational efforts by utilizing a hierarchical structure composed of lower-level digital twin blocks and a higher-level system digital twin. Each digital twin block is associated with a physical subsystem of the hardware and communicates with a singular system digital twin, which creates a system-level response. By extracting information from each level of the hierarchy, power system controls of the hardware were reconfigured autonomously. This hierarchical digital twin development offers several advantages over other digital twins, particularly in the field of naval power systems. The hierarchical structure allows for greater computational efficiency and scalability while the ability to autonomously reconfigure hardware controls offers increased flexibility and responsiveness. The hierarchical decomposition and models utilized were well aligned with the physical twin, as indicated by the maximum deviations between the developed digital twin hierarchy and the hardware.
Cosmetic shop management system project report.pdfKamal Acharya
Buying new cosmetic products is difficult. It can even be scary for those who have sensitive skin and are prone to skin trouble. The information needed to alleviate this problem is on the back of each product, but it is hard to interpret those ingredient lists unless you have a background in chemistry.
Instead of buying and hoping for the best, we can use data science to help us predict which products may be good fits for us. The project includes various function programs to perform the tasks mentioned above.
Data file handling has been effectively used in the program.
The automated cosmetic shop management system should deal with the automation of the general workflow and administration process of the shop. The main processes of the system focus on customer requests, where the system is able to search for the most appropriate products and deliver them to the customers. It should help the employees to quickly identify the list of cosmetic products that have reached the minimum quantity and also keep track of the expiry date of each cosmetic product. It should help the employees to find the rack number in which a product is placed. It is also a faster and more efficient way.
Reconfiguration Strategies for Online Hardware Multitasking in Embedded Systems
Computer Science & Engineering: An International Journal (CSEIJ), Vol.2, No.6, December 2012
DOI : 10.5121/cseij.2012.2601
RECONFIGURATION STRATEGIES FOR ONLINE
HARDWARE MULTITASKING IN EMBEDDED
SYSTEMS
Marcos Sanchez-Elez and Sara Roman
Department of Arquitectura de Computadores, Universidad Complutense, Madrid, Spain
marcos@fis.ucm.es, sroman@dacya.ucm.es
ABSTRACT
An intensive use of reconfigurable hardware is expected in future embedded systems. This means that the
system has to decide which tasks are more suitable for hardware execution. In order to make efficient use
of the FPGA, it is convenient to choose one that allows hardware multitasking, which is implemented using
partial dynamic reconfiguration. One of the challenges for hardware multitasking in embedded systems is the
online management of the single reconfiguration port of present FPGA devices. This paper presents different
online reconfiguration scheduling strategies which assign the reconfiguration interface using different
criteria: workload distribution or task deadline. The online scheduling strategies presented make fast,
efficient decisions based on the information available at each moment. Experiments have been carried out to
analyze the performance and convenience of these reconfiguration strategies.
KEYWORDS
Embedded systems, multitasking scheduling, reconfigurable architectures
1. INTRODUCTION
The combined need for flexibility and high performance in today's embedded systems clearly points at the
use of reconfigurable devices as the new design paradigm [1]. This expectation is based on the performance
gain achieved by using an FPGA, thanks to the possibility of dynamically reconfiguring parts of the FPGA at
run time without disturbing the execution of the other parts [3].
In order to achieve maximum processing performance in a system that combines reconfigurable hardware and a
microprocessor, either the operating system (OS) or the application designer selects the tasks that are
most suitable for hardware execution and distributes the workload between the FPGA and the microprocessor
[2]. In this context, the reconfiguration time is an aspect of FPGA technology which adds a significant
overhead to hardware execution and becomes very important for an efficient use of present-day FPGAs, as
they only have one reconfiguration port.
In such hybrid embedded designs, the decision on whether a task will be executed in the
microprocessor or the FPGA is taken during execution time. This means that the tasks that are
going to be executed are not known in advance: the multitasking management is an online
problem. Therefore, in this approach, it is not possible to use a known task graph to schedule task
reconfiguration and execution in the FPGA. In addition, once a task begins executing in the
FPGA it is not feasible to interrupt its execution: saving a hardware context is complex and time
consuming for the implementation of an effective real-time computing system. For these reasons
we are using a non-preemptive online model.
As discussed in the Related Work section, some researchers have developed offline algorithms to schedule
the reconfiguration of tasks. This approach is not completely applicable to a real-time multitasking
problem. On the other hand, many researchers have developed online heuristics mainly focused on optimizing
FPGA area use, where the impact of the reconfiguration scheduling on the overall scheduling of tasks is
neglected by implicitly including the reconfiguration time in the task execution time.
Scheduling the execution of tasks on the sole basis of free available space leads to erroneous task
scheduling, as shown in the example in Fig. 1, in which several tasks have been scheduled for execution
following a First Fit area use heuristic. Fig. 1 presents a snapshot of the execution of several tasks
scheduled with an algorithm that considers the reconfiguration time implicitly included in the execution
time of a task (shown as trec + tex). When a task arrives at the FPGA it is scheduled for immediate
execution if there is enough free space in the FPGA. In the example, when task T3 arrives at the FPGA there
is enough free space and this task is scheduled for execution from time units 4 to 17. However T2, which
arrived previously, is being reconfigured from time units 3 to 6, which means that both tasks would be
reconfigured at the same time. This error in the scheduling is caused by the lack of an explicit
distinction between execution and reconfiguration times. This situation is highlighted in the figure with
red rectangles.
Fig. 1a. Task scheduling without reconfiguration scheduling
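The conflict in this example can be checked mechanically. The following sketch (with illustrative time values, not taken from the figure) shows that an area-only scheduler assigns the single reconfiguration port to two tasks at overlapping times, while a port-aware scheduler delays the second task:

```python
def overlaps(a, b):
    """True if half-open time intervals a = (start, end) and b overlap."""
    return a[0] < b[1] and b[0] < a[1]

# Each reconfiguration occupies the single port for trec time units.
t2_port = (3, 6)   # T2 reconfigures from time unit 3 to 6
t3_port = (4, 7)   # area-only scheduling starts T3 at time unit 4

# Area-only scheduling admits T3 immediately -> port conflict:
print(overlaps(t2_port, t3_port))     # True

# A port-aware scheduler delays T3 until the port is free:
t3_delayed = (6, 9)
print(overlaps(t2_port, t3_delayed))  # False
```

The same interval test generalizes to any number of tasks: a schedule is port-correct only if no two reconfiguration intervals overlap.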
This example shows that, in order to provide a correct task scheduling, a real-time algorithm
needs to take into account the free area in the device as well as the availability of the
reconfiguration port, and that the real problem is an online problem. However, in a hardware
multitasking system with a high task frequency rate, this simple scheduling heuristic is not the
most efficient, as proved in this paper. When incoming tasks may not be immediately sent for
execution in the FPGA, one or more waiting queues may be set and other strategies to schedule
the execution of pending tasks in the FPGA may be applied.
The objective of this paper is to present different real-time online reconfiguration strategies and to
compare them in order to select the best one. We have compared a simple strategy, like the one shown in the
example, with another strategy which makes intensive use of the reconfiguration interface.
The rest of this paper is organized as follows: section 2 presents the previous work in this subject,
section 3 describes the system model used for this study, section 4 the basic reconfiguration
algorithm, section 5 presents the Reconfiguration Intensive Use Strategy implemented using two
different priority policies. Section 6 shows the experimental results obtained and section 7
provides conclusions about the different reconfiguration strategies which are the subject of this
study.
2. STATE OF THE ART
We may find numerous papers related to task scheduling in an FPGA. All of them are focused on FPGA area use
optimization. Diessel and Elgindy [4], Bazargan [5], Handa [6] and Ahmadinia [7] present interesting
techniques to deal with task allocation in an FPGA that is being used for HW multitasking, but all of them
neglect the fact that reconfiguration in present-day FPGAs is a key aspect in making hardware multitasking
effective. Another very interesting approach is presented in [9], which deals with the online problem of 2D
task scheduling.
On the other hand, other authors focus on the reconfiguration aspect of the problem. Such is the case of
Resano [10], who presents a technique that allows reducing the reconfiguration overhead. This work solves
an offline problem, that is, the applications that are going to be executed are known, as well as their
execution graph. Thus the reconfiguration overhead may be hidden by conveniently scheduling the
reconfiguration of the tasks in the graph. They also work on several reconfigurable units and not a single
FPGA, and therefore they deal neither with the area use part of the task scheduling problem nor with the
single reconfiguration interface problem.
Götz [11] applies a server-based method and models the reconfiguration activities as aperiodic
tasks. He applies the reconfiguration scheduling heuristics to the parts of the OS that are going to
be executed in the reconfigurable hardware. Redaelli [12] has also dealt with the reconfiguration
overhead problem and has presented techniques that increase task scheduling performance by
including reconfiguration considerations into the task scheduling algorithm. However this author
also solves the offline problem in 1D.
A very interesting work has been presented by Angermeier [13], who points at the importance of focusing on
both area use and reconfiguration port use in the same scheduling algorithm. They model the reconfiguration
problem as a server-parallel machines problem in a very interesting way: they compare the reconfiguration
port to the server and the slots defined in the FPGA to the parallel machines. This author uses the
Erlangen Slot Machine cited in [8] and adds some scheduling algorithms to the system. He presents three
different heuristics, but two of them are only valid for two parallel machines. They also work on the
offline problem and use a 1D dynamic reconfiguration model. The FPGA they use is divided into m equal
slots, and the reconfiguration time is equal for all slots and normalized to 1.
To sum up, previous research in this area has been either focused on making a good FPGA area
use by means of online scheduling heuristics, or focused on the reconfiguration scheduling for
offline problems. This research work aims at solving reconfiguration scheduling issues for 2D
online problems, which means that area availability in the FPGA has to be taken into account
although it is not the main aspect under research.
3. SYSTEM MODEL
Our system is a hybrid general computing system where tasks may be executed in a software processor
(microprocessor) or a hardware processor (FPGA). Tasks arrive at the hardware manager on demand, and it is
never known when the next task will arrive for execution or what this task will be like. Hardware
multitasking is implemented using 2D partial dynamic reconfiguration [27]. Most previous models consider
the FPGA as a homogeneous CLB array. However, real FPGAs have BRAM blocks, multipliers and DSPs in a
certain disposition, and therefore this heterogeneity in the FPGA has to be taken into account for a
realistic task allocation scheduling. This is especially important when the benchmarks being used are based
on real hardware tasks. The restrictions in the possibilities for dynamic partial reconfiguration naturally
point at the division of the FPGA into predefined slots, as suggested by [8], although in our case we are
using this idea in a heterogeneous 2D FPGA.
From the synthetic benchmarks selected for our study, we have observed that there is a wide
variety of FPGA resources required by different tasks, which adds an extra difficulty in task
allocation in the FPGA. As area management was not our main objective, we decided to classify
tasks according to the FPGA resources they need, so that the algorithms tested are focused on
reconfiguration interface use.
Each task Ti belongs to the class ci that best fits the task's resources, thus fulfilling:

Ti ∈ ci if resources(Ti) ≤ resources(ci)    (1)
Classes have been designed taking into account the number of CLBs, RAMs and DSPs available:
(2)
The number of classes is obtained, as pointed out before, from a classification of the tasks expected
for execution in the FPGA. If the sum of the number of resources needed by the classes is clearly
lower than the resources available in the FPGA, then we may say that the differences in
performance are related to the reconfiguration interface management. However, if the sum of the
number of resources needed by the classes is larger than the resources available in the FPGA, this
means that the FPGA is too small to execute the expected workload.
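As a rough illustration of this sanity check, the following sketch (with hypothetical resource vectors for CLBs, BRAMs and DSPs) sums the per-class needs and compares them against the FPGA's capacity:

```python
# Hypothetical resource vectors: (CLBs, BRAMs, DSPs).
FPGA_RESOURCES = (2000, 32, 48)
CLASS_RESOURCES = [(100, 0, 0), (200, 2, 0), (400, 4, 4), (800, 8, 8)]

# Sum the needs of all classes, component by component.
totals = tuple(sum(c[i] for c in CLASS_RESOURCES) for i in range(3))

# If every summed need fits in the FPGA, observed performance
# differences can be attributed to reconfiguration interface management;
# otherwise the FPGA is too small for the expected workload.
workload_fits = all(t <= f for t, f in zip(totals, FPGA_RESOURCES))
print(totals, workload_fits)  # (1500, 14, 12) True
```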
By combining the 2D slotted FPGA idea from [8] with a representative enough classification of
synthetic tasks, we have been able to implement a simple FPGA area management in our
algorithms, that is efficient enough to enhance the differences observed in hardware multitasking
performance, due to the reconfiguration scheduling strategies under study.
FPGA area management is thus implemented by writing incoming tasks into the task queue
associated to the task class, and may be scheduled for execution when a slot in the FPGA
containing the resources needed by the task class becomes free as well as the reconfiguration port.
Tasks are written into queues which group them according to their class (Algorithm 1). Therefore,
finding an empty slot that fulfils the task resource requirements for the task in the FPGA may be
done with a fast and simple algorithm.
The N FPGA classes are named 0 to N-1, where class 0 represents tasks with the smallest resource
restrictions and class N-1 those with the highest FPGA resource needs (not just CLBs, but also DSPs and/or
BRAMs). Subroutine FitQj(Ti) checks if class j contains enough resources for Ti. Therefore Classj is the
smallest class where Ti may be classified, and Ti is written into the waiting queue Qj associated with
class j (Write(Ti,Qj)).
Computer Science & Engineering: An International Journal (CSEIJ), Vol.2, No.6, December 2012
int SelectQueue(Ti)
    j = 0;
    sel = false;
    while ((j < N) and (sel == false)) do
        if (FitQj(Ti) == true) then
            Write(Ti, Qj);
            sel = true;
            return j;
        end if;
        j++;
    end while;
Algorithm 1. Select Queue Algorithm
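As a runnable sketch, the queue-selection step can be written in Python. The class limits are taken from Table 1 below; the tuple layout, variable names and function names are our own illustrative choices, not the paper's implementation.

```python
# Sketch of the SelectQueue step (Algorithm 1): a task is written into the
# queue of the smallest class whose resource limits cover its requirements.
# Class limits are (max CLBs, max BRAMs, max DSPs), following Table 1.

CLASS_LIMITS = [
    (204, 4, 16),     # C0
    (480, 20, 32),    # C1
    (1000, 18, 80),   # C2
    (2400, 90, 160),  # C3
]

queues = [[] for _ in CLASS_LIMITS]  # one waiting queue per class

def fit_q(task, limits):
    """Equation (1)/(2): the task fits if every resource need is within the limits."""
    return all(need <= limit for need, limit in zip(task, limits))

def select_queue(task):
    """Enqueue the task in the smallest fitting class and return its index,
    or None if the task fits no class and cannot be placed."""
    for j, limits in enumerate(CLASS_LIMITS):
        if fit_q(task, limits):
            queues[j].append(task)
            return j
    return None
```

Note that the classes are not totally ordered (C2 allows fewer BRAMs than C1), so the scan simply returns the first class index that covers all three requirements.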
In addition, before reconfiguring task Ti into the FPGA, the system has to check whether the time constraints for Ti (tmaxi) are met:
tstarti + treci + texi ≤ tmaxi (4)
tmaxi: stands for the maximum time unit for the task to have finished execution
treci: stands for the reconfiguration time required by the task Ti
texi: stands for the execution time of the task Ti
tstarti: stands for the time unit when Ti will begin its reconfiguration
The values of tmaxi and texi are characteristic of the task. The value of treci depends on the slot
assigned to the class of the task, as reconfiguration time is proportional to the FPGA area
reconfigured.
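The admission test of equation (4) is a single comparison. A minimal sketch, with invented sample values (the function name is ours):

```python
def meets_deadline(t_start, t_rec, t_ex, t_max):
    """Equation (4): the task can still meet its deadline if reconfiguration
    plus execution, starting at t_start, ends no later than t_max."""
    return t_start + t_rec + t_ex <= t_max

# Invented sample: start at t=4, reconfigure for 2 units, execute for 5 units.
assert meets_deadline(4, 2, 5, 12) is True   # 4 + 2 + 5 = 11 <= 12
assert meets_deadline(4, 2, 5, 10) is False  # 11 > 10: the task would be rejected
```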
4. ORDERLY RECONFIGURATION STRATEGY
The Orderly Strategy is the simplest approach to the problem. This strategy uses a reconfiguration FIFO in order to guarantee that the reconfiguration scheduling is correct. Tasks are written into queues according to their sizes, and their execution scheduling in the FPGA takes into account both the availability of space in the FPGA and the availability of the reconfiguration port. The scheduling algorithm maintains task arrival order when assigning the reconfiguration resource to tasks. This is a very simple algorithm with a low computational cost, which we use as a baseline for evaluating the efficiency of the other algorithms presented.
When a task Ti arrives for execution, an exact calculation for its starting time, tstarti, is made,
which depends on the availability of the reconfiguration interface and the availability of space in
the FPGA:
tstarti = MAX[tfreeICAP(Ti), tfreeA(Ti, Qj), tarri] (5)
In this equation, tfreeICAP(Ti) is the time unit at which the reconfiguration interface (Internal
Configuration Access Port, ICAP) will be free to reconfigure another task and is calculated as
follows:
tfreeICAP(Ti)= tstarti-1+treci-1 (6)
According to the Orderly Strategy, the task which arrived immediately before Ti, Ti-1, is the task
scheduled to reconfigure before Ti. Therefore the reconfiguration interface will be free when Ti-1
finishes its reconfiguration.
In equation 5, tfreeA(Ti, Qj) is the time unit at which there will be a free space for Ti, assigned to class j. This time is calculated by the area assignment strategy chosen for the implementation. In our case, tfreeA(Ti, Qj) indicates the time unit when there is a free slot in the FPGA containing the resources needed by task class j.
It is possible that both the reconfiguration interface and the FPGA be free at the arrival of Ti. In
this case Ti is immediately reconfigured and executed: tstarti = tarri, where tarri stands for the
arrival time of Ti.
The Orderly Strategy works as shown in Algorithm 2. The function Schedule(cj, Ti, tstarti) schedules the reconfiguration of Ti at time unit tstarti; Ti is deleted from queue Qj, associated with class cj, at time tstarti, when its bitmap is loaded into the FPGA. The function Reject(Ti) deletes Ti from the queue and reports that it is not possible to execute it in the FPGA under the present constraints.
tstart1 = 0;
tfreeA(T1) = 0;
tfreeICAP(T1) = 0;
while (tasks in queues)
    j = SelectQueue(Ti);
    tstarti = MAX[tfreeICAP(Ti), tfreeA(Qj), tarri];
    if (tmaxi ≥ tstarti + treci + texi)
        Schedule(cj, Ti, tstarti);
    else
        Reject(Ti);
    end if;
    i++;
end while;
Algorithm 2. Orderly Strategy, reconfiguration scheduling algorithm
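A minimal Python sketch of the Orderly Strategy, under simplifying assumptions of our own: each class has a single slot (a "slot free at" time per class), tasks are given as dictionaries in arrival order, and Schedule()/Reject() are reduced to list appends.

```python
# Sketch of the Orderly Strategy (Algorithm 2). Each start time is the latest
# of: ICAP free time (equation 6), slot free time, and arrival time (equation 5).
# Field names and the one-slot-per-class area model are illustrative only.

def orderly(tasks, slot_free):
    """tasks: list of dicts with keys id, arr, rec, ex, tmax, cls (FIFO order).
    slot_free: dict mapping class -> time its slot becomes free."""
    icap_free = 0
    scheduled, rejected = [], []
    for t in tasks:
        t_start = max(icap_free, slot_free[t["cls"]], t["arr"])
        if t["tmax"] >= t_start + t["rec"] + t["ex"]:        # equation (4)
            scheduled.append((t["id"], t_start))
            icap_free = t_start + t["rec"]                   # equation (6)
            slot_free[t["cls"]] = t_start + t["rec"] + t["ex"]
        else:
            rejected.append(t["id"])
    return scheduled, rejected
```

For example, a task arriving while the only slot of its class is busy must wait for the slot, and is rejected if the resulting start time leaves too little margin before its deadline.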
Therefore, the Orderly Strategy may be applied as soon as a task Ti arrives at the FPGA. Its start time can be calculated accurately, so it is immediately known whether the task can be executed within its time constraints. Tasks are reconfigured at the tstarti time scheduled in the reconfiguration FIFO, which follows the arrival order of tasks. With this strategy there are intervals of time when the reconfiguration interface is free and there is enough free space in the FPGA, but not for the next task scheduled for reconfiguration. These intervals could be used to reconfigure tasks assigned to the free slots, achieving a more efficient use of the reconfiguration interface and FPGA area.
5. RECONFIGURATION PORT INTENSIVE USE STRATEGY
As we have seen, there is only one reconfiguration interface for the whole FPGA, which may
reconfigure any of the first tasks in each queue when the ICAP becomes free. The overall
scheduling problem may be viewed as a multiple stations-one server problem, where the server is
the reconfiguration interface and the stations are the tasks’ queues associated with free FPGA
slots of a given class.
When a new task arrives, the scheduling algorithm assigns it a queue according to its characteristics. In order to make an intensive use of the ICAP, the reconfiguration time for this task will be decided later, so that when the ICAP becomes free any task from the queues' heads may be chosen for reconfiguration, as long as there is enough free space in the FPGA.
Function FreeA(Qc) = (tfreeA, Tj) returns Tj, the task waiting at the head of queue Qc, associated with class c, together with the time unit tfreeA at which an FPGA slot with the characteristics of class c will be free.
Function StrategyRPIU(FreeA(0), FreeA(1), …) is executed each time there are free slots in the FPGA and the reconfiguration interface is available. This function analyzes the values returned by FreeA() for each class and chooses the class with the lowest tfreeA; its associated task will be scheduled next for reconfiguration and execution, but only if equation (4) is fulfilled. Otherwise, that task is rejected and deleted from its queue, and StrategyRPIU() is executed again in search of a new candidate (Algorithm 3).
tstart1 = 0;
tfreeA(T1) = 0;
tfreeICAP(T1) = 0;
if (reconfiguration interface == free)
    [(Qj, Ti), (Qk, Tv), …] = StrategyRPIU(FreeA(0), FreeA(1), …);
    (Qj, Ti) = ICAParbiter((Qj, Ti), (Qk, Tv), …);
    tstarti = MAX[t, tfreeICAP(Tk), tfreeA(Qj)];
    if (tmaxi ≥ tstarti + treci + texi)
        Schedule(Qj, Ti, tstarti);
    else
        Reject(Ti);
    end if;
end if;
Algorithm 3. Reconfiguration Port Intensive Use Strategy, reconfiguration scheduling algorithm
It may happen that more than one class finds available slots in the FPGA at the same time unit (or that the only free slot is suitable for several classes); in such cases, StrategyRPIU returns one task for each class for which there is available space in the FPGA, and ICAParbiter() chooses which task is reconfigured first.
The starting time for Ti is calculated as shown in the algorithm, where t is the current time unit and fulfills t ≥ tarri; tfreeICAP(Tk) = tstartk + treck, where Tk is the task scheduled to reconfigure before Ti; and tfreeA(Qj) = tstartm + trecm + texm is the time when a slot with the resources of class j will be free (Tm being the task scheduled to execute in this slot before Ti).
This strategy makes an intensive use of the reconfiguration port and therefore yields better performance results than the basic Orderly Strategy.
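The core selection rule of StrategyRPIU (pick the class whose slot frees earliest, deferring ties to the arbiter) can be sketched as follows; the dictionary layout and names are our own assumptions:

```python
# Sketch of the RPIU candidate step: when the ICAP is free, look at the head
# of every non-empty queue and keep the class(es) whose slot frees earliest.
# free_a maps class -> (t_free, head_task), mirroring the FreeA() function.

def rpiu_candidates(free_a):
    """Return all (cls, task) pairs tied at the earliest slot-free time,
    leaving the final choice among ties to an ICAP arbiter policy."""
    if not free_a:
        return []
    t_min = min(t for t, _ in free_a.values())
    return [(c, task) for c, (t, task) in free_a.items() if t == t_min]
```

When several classes tie (or one slot suits several classes), the returned list has more than one entry and the ICAP arbiter policy of Section 5.1 or 5.2 breaks the tie.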
We have developed two distinct policies that implement the RPIU strategy with two different priority criteria for selecting the task to be reconfigured and executed first when there is a reconfiguration interface use conflict, that is, two different versions of the ICAParbiter() policy.
5.1 Most Loaded Queue Policy
This algorithm implements the RPIU Strategy with a version of the ICAP arbiter function that
gives priority to tasks in the most loaded queue (algorithm 4).
This version of the ICAParbiter() first checks whether the task at the queue head fits the free slot; it then checks whether this queue is more loaded than the other queues containing tasks that also fit the free slot. When it has reviewed all queues, the function returns the task at the head of the most loaded queue whose head fits the free slot. If no task fits the free slot, the function returns T = null.
Function Fit(free_slot, Ti) checks that the free slot has enough resources for the execution of Ti, as shown in equation (2). Function load(Qj) returns the number of tasks in Qj. The variable old_sel prevents any one class from becoming too greedy and blocking the other classes' access to FPGA resources: it allows this policy to occasionally reconfigure tasks from less loaded queues in between reconfigurations of tasks from the most loaded queue.
function ICAParbiter(Qj, Qk, …)
    sel = null;
    T = null;
    for j = 0 to N-1
        Ti = head(Qj);
        if Fit(free_slot, Ti) == true
            if (sel == null) or (load(sel) < load(Qj))
                if old_sel ≠ Qj
                    sel = Qj;
                    T = Ti;
                end if;
            end if;
        end if;
    end for;
    if sel == null
        old_sel = null; // the only task that fits the free slot is from old_sel
        (sel, T) = ICAParbiter(Qj, Qk, …);
    end if;
    old_sel = sel;
    return sel, T;
Algorithm 4. ICAP Arbiter Function for Most Loaded Queue Policy
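A runnable sketch of this arbiter, under assumptions of our own: queues are a dictionary of lists, the slot-fitting test is passed in as a predicate, and the old_sel anti-starvation guard is handled by a single retry.

```python
# Sketch of the Most Loaded Queue arbiter (Algorithm 4): among queue heads
# that fit the free slot, prefer the longest queue, but skip the queue chosen
# last time (old_sel) so that one class cannot monopolize the ICAP.

def most_loaded_arbiter(queues, fits, old_sel=None):
    """queues: dict cls -> list of waiting tasks; fits(task) -> bool for the
    current free slot. Returns (cls, task) or (None, None) if nothing fits."""
    best = None
    for cls, q in queues.items():
        if q and fits(q[0]) and cls != old_sel:
            if best is None or len(q) > len(queues[best]):
                best = cls
    if best is None and old_sel is not None:
        # only old_sel's head fits (or nothing does): retry without the guard
        return most_loaded_arbiter(queues, fits, old_sel=None)
    return (best, queues[best][0]) if best is not None else (None, None)
```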
5.2 Deadline Policy
This ICAP arbiter function (Algorithm 5) gives priority to the task with the lowest execution margin: it focuses on reconfiguring the tasks that are closest to being rejected.
When tmargin is the same for more than one task, the Deadline Policy chooses the task belonging to the class which, at that time, makes the best use of the free FPGA resources.
function ICAParbiter(Qj, Qk, …)
    sel = null;
    T = null;
    tmargin = 10^15; // initialized to a very large value
    for j = 0 to N-1
        Ti = head(Qj);
        if Fit(free_slot, Ti) == true
            if tmaxi - (t + treci + texi) < tmargin
                sel = Qj;
                T = Ti;
                tmargin = tmaxi - (t + treci + texi);
            else if tmaxi - (t + treci + texi) == tmargin
                if compare_fit(T, Ti, free_slot) == true
                    sel = Qj;
                    T = Ti;
                    tmargin = tmaxi - (t + treci + texi);
                end if;
            end if;
        end if;
    end for;
    return sel, T;
Algorithm 5. ICAP Arbiter Function for Deadline Policy
This version of the ICAParbiter() checks whether more than one task fitting the slot shares the minimum tmargin; in that case, the task that makes the best use of the resources in the slot is selected. This is implemented with the function compare_fit(T, Ti, free_slot), which returns true if Ti makes better use of the free slot than T. When ICAParbiter() has reviewed all queues, it returns the task (among those at the heads of the queues) with the lowest margin that best fits the free slot, together with its associated queue. If no task fits the free slot, the function returns T = null.
The criterion used to define a better use of the resources in the slot is based on the following equation:
pairs(x) = #CLBs·(LUT-flip-flop pairs per CLB) +
#BRAMs·(LUT-flip-flop pairs per BRAM) + (7)
#DSPs·(LUT-flip-flop pairs per DSP)
where x may be either a task or a slot. We use this definition because FPGA vendors provide the correspondence between LUT-flip-flop pairs and the different elements of their FPGAs as a standard FPGA size unit. Therefore, if compare_fit(T, Ti, free_slot) is true, the value of pairs(Ti) is closer to pairs(free_slot) than pairs(T) is.
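A sketch of equation (7) and compare_fit(). The LUT-flip-flop pair equivalences per element below are invented placeholders: the paper takes these figures from the vendor's documentation, which we do not reproduce here.

```python
# Sketch of equation (7) and compare_fit(). PAIRS_PER holds placeholder
# LUT-FF-pair equivalences per CLB, BRAM and DSP (invented values).

PAIRS_PER = {"clb": 8, "bram": 64, "dsp": 32}  # placeholder equivalences

def pairs(x):
    """x = (n_clbs, n_brams, n_dsps); return its size in LUT-FF pairs."""
    n_clbs, n_brams, n_dsps = x
    return (n_clbs * PAIRS_PER["clb"]
            + n_brams * PAIRS_PER["bram"]
            + n_dsps * PAIRS_PER["dsp"])

def compare_fit(t_old, t_new, free_slot):
    """True if t_new fills the free slot more tightly than t_old,
    i.e. its pairs() value is closer to the slot's pairs() value."""
    target = pairs(free_slot)
    return abs(target - pairs(t_new)) < abs(target - pairs(t_old))
```

With any concrete equivalence table the comparison works the same way: the candidate whose pair count sits closest to the slot's pair count wins the tie.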
6. EXPERIMENTAL RESULTS
We have developed several experiments in order to compare the performance of the different
algorithms presented. As was previously explained, the objective of this paper is to discuss the
impact of the reconfiguration strategy on the performance of a hardware multitasking scheduling
algorithm.
We have used information from real applications’ hardware synthesis from Xilinx in order to
develop several synthetic benchmarks that are described in detail in subsection 6.3.
We have also generated artificial sets of tasks with a balanced profile in relation to the classes
defined from the real application data.
6.1 Experiments’ design
For each set of tasks we have performed an offline study and calculated a theoretical average rejection ratio of the tasks in the set, ARR. As we have explained, each task is restricted by the value of its tmax, and the chances of rejecting the task are related to the ratio between the offline estimation of the task's execution end time and its tmax, averaged over the set:
ARR = (1/n) Σi (tendi / tmaxi) (8)
where tendi is the offline estimation of the execution end time of task Ti and n is the number of tasks in the set.
Therefore, the higher the ARR of a task set is, the lower we expect the number of executed tasks to be. Our study includes benchmarks with ARR values over 1 because we are assuming a regularly intensive use of the FPGA, and at some time intervals our system may need to execute even more computational work on the FPGA than it can accommodate. It is therefore very important to understand how the different strategies behave under heavy workloads.
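Under this reading of ARR (the set-average of each task's estimated end time over its deadline), a toy computation with invented values illustrates how an ARR near 1 signals a saturated FPGA:

```python
# Illustrative ARR computation: the average, over the task set, of each
# task's offline-estimated end time divided by its tmax. All values invented.

def arr(end_estimates, tmaxes):
    """Average rejection ratio for paired lists of end estimates and deadlines."""
    return sum(e / m for e, m in zip(end_estimates, tmaxes)) / len(tmaxes)

# Four invented tasks: the closer the end estimates sit to the deadlines,
# the closer ARR is to 1 and the more rejections we expect online.
print(arr([8, 9, 12, 10], [10, 10, 12, 8]))  # 0.9875: near saturation
```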
A set of tasks consists of a file with 52 different tasks arranged according to their arrival times.
This simulates the online arrival of tasks at the hardware manager. The hardware manager reads
one task from the file and does not have any information about the characteristics of the rest of
the tasks in the file (arrival of next task, number of remaining tasks...) as happens in an online
system.
We are using the ARR factor to characterize our experiments. We have obtained sets of tasks with
a low ARR by either leaving long time intervals between the arrival of tasks or by assigning them
a very high tmax value (or a combination of both). As was previously discussed, for the sets with
a low ARR, the impact on the value of tstart due to the reconfiguration interface is very low
because most of the new arriving tasks will find the reconfiguration interface free. In addition, if
the tasks’ deadlines are high, they can afford to wait for a long time until they may be
reconfigured. Then, the sets with a low ARR should be less sensitive to the reconfiguration
strategy used and we would expect to execute 100% of tasks for any of the reconfiguration
strategies used. If either the arrival times of tasks are very close to each other or the tmax value of
the tasks in the set is small (only a little margin for reconfiguration and execution) or a
combination of both, then the set of tasks has a high ARR and it is expected that many tasks will
be rejected. In these situations, a good management of the reconfiguration interface becomes
crucial and therefore these sets of tasks are more sensitive to the reconfiguration strategy used.
This is the reason why we have focused our experiments on sets of tasks with an ARR value
around 1.
6.2 Description of the target architecture
We have used a simulation platform with a Virtex 5 XC5VSX50T FPGA according to the
technical details provided by Xilinx in [26]. This FPGA consists of 4080 CLBs, 132 BRAMs and
288 DSPs.
Four different task classes have been defined according to the analysis of the data collected from Xilinx cores of real applications, as shown in Table 1. In a hardware multitasking system it is not expected that executing tasks will need the whole of the FPGA resources, nor that most of them will need less than 10% of them. We believe that these four task classes are representative enough of the hardware multitasking environment and serve the purpose of our study, which is the impact of reconfiguration scheduling on total system performance. The reconfiguration times associated with each task class have been normalized; thus, the basic time unit used for the experiments is the reconfiguration time of the smallest task in class C0.
Table 1. Task class profiling
Task class max. #CLBs max. #BRAMs max. #DSPs
C0 204 4 16
C1 480 20 32
C2 1000 18 80
C3 2400 90 160
6.3 Description of the benchmarks
We have created three groups of Balanced Artificial benchmarks in which the fractions of the workload corresponding to the task classes (sum of reconfiguration and execution times of the tasks of each class) are equal, so the workload is perfectly balanced with respect to the resources of the FPGA. We have created 10 sets of tasks with different ARR values for each group (see Table 2). The arrival time of the last task in each set is also shown, to highlight whether the differences in ARR are due to an increase in the task arrival rate.
Table 2. Perfect, Semi-perfect and Global Balanced Benchmark Profiling
PB ARR tarr52 SB ARR tarr52 GB ARR tarr52
PB0 0.85 260 SB0 0.85 260 GB0 0.85 260
PB1 0.88 258 SB1 0.88 258 GB1 0.88 258
PB2 0.91 258 SB2 0.91 258 GB2 0.91 248
PB3 0.94 258 SB3 0.94 210 GB3 0.94 200
PB4 0.98 206 SB4 0.98 210 GB4 0.98 200
PB5 1.00 206 SB5 1.00 188 GB5 1.00 188
PB6 1.03 206 SB6 1.03 188 GB6 1.03 188
PB7 1.06 154 SB7 1.06 188 GB7 1.06 198
PB8 1.10 154 SB8 1.10 136 GB8 1.10 136
PB9 1.15 154 SB9 1.15 136 GB9 1.15 126
• Perfect Balance (PB): the arrival order of tasks is uniform in relation to their classes, that
is, a task from C0 arrives first, then a task from C1, then another from C2, etc.
• Semi-perfect Balance (SB): the arrival order of 25% of tasks in the PB task set has been
modified.
• Global Balance (GB): although the total workload is perfectly balanced in terms of tasks’
classes, the arrival order of tasks is random. However, we have checked that no more
than three tasks of the same class arrive consecutively.
We have also created three groups of synthetic benchmarks, S1, S2 and S3, using Xilinx cores for different real applications, available at xilinx.com/support/documentation/ip_documentation/; their composition is described in Table 3. For each group of synthetic benchmarks we have designed three different sets of tasks, whose profiles are shown in Table 4. As these sets of tasks are synthetic, we have not been able to create perfectly balanced profiles with them; therefore we also include information about the workload profiling, where the fraction of tasks of each class in the total workload is represented.
Table 3. Composition of Synthetic Benchmarks
Table 4. Profiles of Synthetic Benchmarks
ARR tarr52 C0 C1 C2 C3
S1A 0.75 242 0.24 0.25 0.24 0.27
S2A 0.75 205 0.21 0.28 0.28 0.26
S3A 0.76 303 0.13 0.31 0.30 0.26
S1B 0.81 187 0.24 0.25 0.24 0.27
S2B 0.80 201 0.21 0.28 0.28 0.26
S3B 0.80 303 0.13 0.31 0.30 0.26
S1C 0.88 184 0.24 0.25 0.24 0.27
S2C 0.87 154 0.21 0.28 0.28 0.26
S3C 0.88 252 0.13 0.31 0.30 0.26
6.4 Simulation Results
In this subsection we present four tables, one for each group of benchmarks. The first column shows the name of the task set, the second column shows the ARR obtained for that set, and the remaining columns, each divided into two sub-columns, show the performance results of the two strategies and their associated policies.
We use two parameters to describe the performance of a scheduling strategy: n, the total number of tasks executed by the strategy, and improvement, which represents the performance gain of the RPIU strategy, with each of its two policies, over the Orderly Strategy (the percentage of tasks executed by each strategy over those executed by the Orderly Strategy).
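One reading of the improvement column that reproduces rows such as PB7 of Table 5 exactly (Orderly 32 tasks, a policy executing 45 tasks gives 25%) is the executed-task difference expressed as a percentage of the 52-task set; this is our interpretation of the tables, not a formula stated in the text, and a few rows appear to differ by rounding.

```python
# Interpretation of the "improvement" column: the extra tasks a policy
# executes over the Orderly Strategy, as a percentage of the 52-task set.

def improvement(n_policy, n_orderly, set_size=52):
    """Percentage-point gain of a policy over Orderly, relative to set size."""
    return 100.0 * (n_policy - n_orderly) / set_size

print(improvement(45, 32))  # 25.0, matching row PB7 of Table 5
```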
Table 5 shows that for the Perfect Balanced benchmarks all algorithms execute all tasks for ARR values up to 0.94. This is due to the uniformity of these sets of tasks, as the balance of their workload is ideal. For higher values of ARR, none of the strategies is able to execute all tasks, as expected, and we observe that RPIU with the Deadline Policy always yields better performance than the other two.
Table 5. Results for Perfect Balanced Benchmarks
ARR
Orderly Deadline Queue Load
n n Improvement n Improvement
PB0 0.85 52 52 0 % 52 0 %
PB1 0.88 52 52 0 % 52 0 %
PB2 0.91 52 52 0 % 52 0 %
PB3 0.94 52 52 0 % 52 0 %
PB4 0.98 50 50 0 % 50 0 %
PB5 1 49 50 1.9 % 50 1.9 %
PB6 1.03 47 47 0 % 48 1.9 %
PB7 1.06 32 45 25 % 46 26.9 %
PB8 1.10 36 44 15.4 % 45 17.3 %
PB9 1.15 30 40 19.2 % 41 21.1 %
Table 6. Results for Semi-Perfect Balanced Benchmarks
ARR
Orderly Deadline Queue Load
n n Improvement n Improvement
SB0 0.85 52 52 0 % 52 0 %
SB1 0.88 51 52 1.9 % 52 1.9 %
SB2 0.91 50 52 2.8 % 52 2.8 %
SB3 0.94 43 52 17.1 % 52 17.1 %
SB4 0.98 43 48 9.5 % 48 9.5 %
SB5 1 38 49 20.9 % 48 20.9 %
SB6 1.03 38 48 19 % 47 17.1 %
SB7 1.06 38 45 13.4 % 44 11.5 %
SB8 1.10 29 40 21.1 % 40 21.1 %
SB9 1.15 29 36 13.4 % 36 13.4 %
From Table 6 we observe that, although these sets of tasks are also balanced, the differences in the performance of the reconfiguration algorithms become noticeable at lower values of ARR than in the previous case. This group of experiments again shows that the RPIU Strategy with the Deadline Policy is better than the others, although for experiment SB5 the number of tasks executed with the Queue Load Policy is slightly higher than with the Deadline Policy. Both policies achieve better results than the Orderly Strategy from an ARR value of 0.91.
Table 7 follows the trend seen in Table 6. For this group of experiments, a high value of ARR has more impact on the number of executed tasks than in the previous experiments. These experiments have been designed with random task arrivals, and therefore there are lapses of time when the workload assigned to the FPGA is largely over the FPGA capacity. The more efficient the reconfiguration strategy is, the lower the impact of these task peaks on global performance. We may see that both the Queue Load and Deadline policies yield good performance for limit ARR values (all 52 tasks are executed with ARR = 0.85), while the Orderly Strategy starts rejecting tasks already at ARR = 0.85.
We would like to remark that Table 8 presents three different synthetic benchmarks with three different ARR values each, grouped in contiguous lines for clarity. These benchmarks are formed by tasks with a wide variety of resource requirements as well as execution times: a Discrete Fourier Transform takes about a hundred times longer to execute than a Color Correction Matrix and uses about 30 times more FPGA resources. In spite of this, both the Deadline Policy and the Queue Load Policy are able to schedule all tasks for execution for an ARR value of 0.85, and fail to schedule 100% of the tasks only for higher values of ARR. In most cases the performance of the RPIU Strategy is well over that of the Orderly Strategy.
Table 7. Results for Global Balanced Benchmarks
ARR
Orderly Deadline Queue Load
n n Improvement n Improvement
GB0 0.85 47 52 5.7 % 52 5.7 %
GB1 0.88 44 51 13.4 % 51 13.4 %
GB2 0.91 43 50 13.4 % 50 13.4 %
GB3 0.94 46 50 7.7 % 50 7.7 %
GB4 0.98 44 50 11.5 % 50 11.5 %
GB5 1 44 48 7.7 % 49 7.7 %
GB6 1.03 41 43 2.8 % 42 1.9 %
GB7 1.06 39 40 1.9 % 40 1.9 %
GB8 1.10 36 40 7.7 % 40 7.7 %
GB9 1.15 30 40 19 % 41 17.3 %
Table 8. Result for Synthetic Benchmarks
ARR
Orderly Deadline Queue Load
n n Improvement n Improvement
S1A 0.85 42 52 19 % 52 19 %
S2A 0.85 49 52 5.7 % 52 5.7 %
S3A 0.86 52 52 0 % 52 0 %
S1B 0.91 41 48 13.4 % 47 11.5 %
S2B 0.91 36 41 9.6 % 40 7.7 %
S3B 0.91 51 51 0 % 51 0 %
S1C 0.98 39 47 15.3 % 46 13.4 %
S2C 0.97 27 34 13.4 % 36 17.1 %
S3C 0.98 43 43 0 % 42 -1.9 %
This group of experiments is the most interesting, not only because it simulates the behavior of real applications, but also because of the difficulties presented by the dispersion in both the execution times and the tasks' sizes; it highlights the importance of a good reconfiguration strategy in hardware multitasking scheduling.
We have calculated the average number of tasks executed for all benchmarks (artificial and
synthetic) grouped by their ARR value, and have represented them in a graph shown in Figure 2.
Conscious that, although we have run a wide range of experiments, we can never cover the infinite possibilities that may be found in a real computing system, we have calculated a linear interpolation of the results in order to find out whether there is a clear difference in the performance tendencies of the strategies under study.
The analysis of this graph clearly points at the RPIU Strategy with Deadline Policy as a better reconfiguration management approach than the Orderly Strategy and RPIU with the Queue Load Policy. In systems where an intensive use of the FPGA is expected (ARR > 0.8), the reconfiguration strategy used is crucial for good performance, and the Deadline Policy is the most suitable of the three strategies analysed.
Figure 2. Linear approximation of the number of tasks executed by the strategies tested
7. CONCLUSION
In this paper we have presented different reconfiguration strategies which have been combined
with a simple heuristic used to find an empty slot in the FPGA.
The Orderly (FIFO) strategy schedules the start time of each task as soon as it arrives, so the FPGA manager may report task rejection at a very early stage.
The RPIU strategy delays this scheduling to the time when a task may be executed in the FPGA, that is, when there is a free slot in the FPGA and the reconfiguration interface is free. Delaying the scheduling decision allows taking into account more of the information available in the system at that later moment than at task launch time. Therefore, RPIU uses a wider scope for the scheduling and yields better results. The performance difference between the two RPIU policies is not as remarkable as that between RPIU and FIFO; the difference lies in the priority criteria used to resolve reconfiguration conflicts (when there are enough FPGA resources available for several tasks at the same time).
The aim of the Queue Load Policy is to keep the workload balance among the queues. In case of
conflict this policy gives priority to the task in the most loaded queue. The Deadline Policy
calculates the execution margin of tasks in conflict and gives priority to the one with the lowest
remaining time.
The experimental results show that when the assigned workload is close to the maximum theoretical FPGA capacity, the RPIU strategy achieves better results. Of the two RPIU policies, the Deadline Policy achieves the more efficient results in most experiments. The
explanation for this is that if in times of conflict there is a queue that is significantly more loaded
than the rest, it is very likely that the tasks in this queue may have been waiting for a long time.
Therefore, the Deadline Policy will launch them for execution, similar to what Queue Load would
do. However, if the tasks in the most loaded queue may wait for a longer time than the others, it is
reasonable to give priority to tasks that are closer to rejection. In addition, as displayed in Fig. 2,
the RPIU Strategy with Deadline Policy has a tendency to make the best use of the FPGA
resources.
REFERENCES
[1] Wong, S., Vassiliadis S., Cotofana S. “Future directions of programmable and reconfigurable
embedded processors” Domain-Specific Processors: Systems, Architectures, Modeling, and
Simulation, pp 149-235, CRC Press 2003.
[2] Steiger, C., Walder, H., Platzner, M., "Operating Systems for Reconfigurable Embedded Platforms: Online Scheduling of Real-Time Tasks", IEEE Transactions on Computers, Vol. 53, No. 11, pp. 1393-1407, November 2004.
[3] Lysaght, P., Blodget, B., Mason, J., "Enhanced Architectures, Design Methodologies and CAD Tools for Dynamic Reconfiguration of Xilinx FPGAs", Proc. International Conference on Field Programmable Logic and Applications (FPL'06), pp. 1-6, Madrid, Spain, August 2006.
[4] Diessel, O. F., Elgindy, H., "On Dynamic Task Scheduling for FPGA-based Systems", International
Journal of Foundations of Computer Science, IJFCS'01, Vol. 12, No. 5, pp. 645-669, 2001
[5] Bazargan, K.; Kastner, R. & Sarrafzadeh, M. “Fast Template Placement for Reconfigurable
Computing Systems”, IEEE Design and Test of Computers, Vol. 17, No. 1, (January-March 2000) pp.
68–83, ISSN: 0740-7475.
[6] Handa, M., Vemuri, R. “An efficient algorithm for finding empty space for online FPGA placement”,
DAC 2004, pp 960-965, 2004
[7] Ahmadinia, A., Bobda, C., Fekete, S., Teich, J., van der Veen, J., “Optimal Free-Space Management
and Routing-Conscious Dynamic Placement for Reconfigurable Devices” IEEE Trans. on Computers,
Vol. 56, nº 5, pages 673-680, May 2007.
[8] Majer, M.,Teich, J., Ahmadinia, A., Bobda, C. “The Erlangen Slot Machine: a dynamically
reconfigurable FPGA-based computer”. Journal of VLSI Signal Processing Systems, vol.47, no. 1, pp
15-31, 2007
[9] Román, S., Mecha, H., Mozos, D., Septién J., “Constant Complexity Scheduling for Hardware
Multitasking in Two Dimensional Reconfigurable Field-Programmable Gate Arrays”. IET Computer
Digital Techniques. Volume 2. Pages 401-412. Nov. 2008
[10] Resano, J., Mozos, D., Verkest, D., Catthoor, F., "A Reconfiguration Manager for Dynamically Reconfigurable Hardware", IEEE Design & Test of Computers, vol. 22, no. 5, pp. 452-460, 2005.
[11] Gotz, M. ; Dittmann, F. ; Pereira, C.E. ;” Deterministic Mechanism for Run-time Reconfiguration
Activities in an RTOS” in Proceedings of IEEE International Conference on Industrial Informatics, pp
693-698, 2006
[12] Redaelli, F.; Santambrogio, M.D.; Sciuto, D.; , "Task Scheduling with Configuration Prefetching and
Anti-Fragmentation techniques on Dynamically Reconfigurable Systems," in proceedings of Design,
Automation and Test in Europe, 2008. DATE '08 , vol., no., pp.519-522, 10-14 March 2008
[13] Angermeier, J.; Teich, J.; , "Heuristics for scheduling reconfigurable devices with consideration of
reconfiguration overheads," In proceedings of Parallel and Distributed Processing, 2008. IPDPS 2008.
IEEE International Symposium on , vol., no., pp.1-8, 14-18 April 2008
[14] Defective Pixel Correction v1.0, Xilinx Corporation, April 2009.
[15] Gamma Correction v1.0, Xilinx Corporation, April 2009.
[16] Image Processing Pipeline v1.0, Xilinx Corporation, April 2009.
[17] Color Correction Matrix v1.0, Xilinx Corporation, April 2009.
[18] Reed-Solomon Encoder v7.0, Xilinx Corporation, June 2009.
[19] Reed-Solomon Decoder v7.0, Xilinx Corporation, June 2009.
[20] Convolutional Encoder v7.0, Xilinx Corporation, June 2009.
[21] Viterbi Decoder v7.0, Xilinx Corporation, June 2009.
[22] 3GPP Turbo Decoder v4.0, Xilinx Corporation, June 2009.
[23] CORDIC v4.0 (cosine), Xilinx Corporation, June 2009.
[24] LogiCORE IP Discrete Fourier Transform v3.1, Xilinx Corporation, December 2009.
[25] LogiCORE IP Fast Fourier Transform v7.1, Xilinx Corporation, April 2009.
[26] Virtex-5 FPGA Configuration Guide, Xilinx Corporation, August 2009.
[27] Gonzalez, A.L., Mecha, H., Septien, J., Roman, S., Mozos, D., "Synthesis of Relocatable Tasks and Implementation of a Task Communication Bus in a General Purpose Hardware System", in Proceedings of the International Conference on Engineering of Reconfigurable Systems and Algorithms, ERSA'08, vol. 1, pp. 307-308, 2008.