TELKOMNIKA, Vol. 15, No. 3, September 2017, pp. 1040 ∼ 1047
ISSN: 1693-6930, accredited A by DIKTI, Decree No: 58/DIKTI/Kep/2013
DOI: 10.12928/telkomnika.v15.i3.6513
Application Profiling and Mapping on
NoC-based MPSoC Emulation Platform on
Reconfigurable Logic
Jia Wei Tang*, Yuan Wen Hau**, Nasir Shaikh-Husin*, and Muhammad Nadzir Marsono*
* Faculty of Electrical Engineering, Universiti Teknologi Malaysia, Johor, Malaysia
** IJN-UTM Cardiovascular Engineering Centre, Faculty of Biosciences and Medical Engineering, Universiti Teknologi Malaysia, Johor, Malaysia
Corresponding author, e-mail: jwtang2@live.utm.my
Abstract
In network-on-chip (NoC) based multi-processor system-on-chip (MPSoC) development, application profiling is one of the most crucial steps during design time for exploring and finding an optimal mapping. Conventional mapping exploration methodologies analyse application-specific graphs by estimating their run-time behaviour using analytical or simulation models. However, the former does not replicate the actual application run-time performance, while the latter requires a significant amount of time for exploration. To map applications on a specific MPSoC platform, the application behaviour on a cycle-accurate emulated platform should be considered to obtain better mapping quality. This paper proposes an application mapping methodology that utilizes an MPSoC prototyped in a Field-Programmable Gate Array (FPGA). Applications are implemented on homogeneous MPSoC cores, and their costs are analysed and profiled on the platform in terms of execution time, intra-core communication delay, and inter-core communication delay. These metrics are utilized in the analytical evaluation of the application mapping. The proposed analytical-based mapping is demonstrated against the exhaustive brute force method. Results show that the proposed method produces mappings of quality comparable to the ground truth solutions, but in shorter evaluation time.
Keywords: Embedded system, application mapping, multi-processor system-on-chips (MPSoCs), throughput constraint.
Copyright © 2017 Universitas Ahmad Dahlan. All rights reserved.
1. Introduction
Multi-processor system-on-chips (MPSoCs) are commonly used in modern embedded systems such as smartphones and tablets, as well as in other diverse fields such as multimedia and networking applications, to perform time-constrained tasks. Mapping of time-critical applications is a crucial step at design time to provide the highest possible system performance. The problem of mapping applications onto an MPSoC is one of the biggest challenges embedded system programmers face nowadays [1–3]. An application is commonly specified as a data-flow graph consisting of tasks that communicate with one another. An MPSoC platform, on the other hand, is described by the number of cores, the type of cores (in a heterogeneous platform), and their communication topology or infrastructure. The design space exploration process takes the application graph as input and assigns all tasks to processing cores on the desired MPSoC platform to find the best-performing mapping.
Methods for evaluating the performance of each mapping in the design space can be categorized into three types [4]: estimation from an analytical model, measurement from a simulation model, and measurement from an implementation prototype. Analytical estimation is the fastest evaluation method but has limited accuracy, as it does not sufficiently replicate the system behaviour. Simulation-based measurement lies between two extremes, i.e. slow with high accuracy or fast but less accurate. In contrast, prototype-based evaluation provides the highest accuracy but requires long development time.
Received February 27, 2017; Revised June 28, 2017; Accepted July 21, 2017
In most cases, a certain application is to be executed on a specific MPSoC platform. As the task execution and communication delays vary across MPSoC platforms, a mapping that depends solely on the costs given in the application specification may not be of high quality once the physical interconnect fabric of the MPSoC is taken into account. The behaviour on the platform must be considered and profiled thoroughly to aid the application mapping exploration process. Hence, profiling an application on a prototype-based platform can improve the precision of analytical-based mapping while retaining its fast evaluation speed.
This paper presents an application mapping methodology that utilizes field-programmable gate array (FPGA) emulation as the evaluation platform to optimize the application mapping for the highest throughput. Applications are analysed and profiled on the emulation platform to estimate their implementation costs in terms of execution, intra-core communication, and inter-core communication delays. The profiled application costs are used to improve the accuracy of analytical-based mapping. The proposed methodology is demonstrated by mapping several test applications on a 3×3 mesh network-on-chip (NoC) based MPSoC prototyped in FPGA.
The rest of this paper is organized as follows. Section 2 presents the related works in application mapping. Section 3 discusses the proposed throughput-aware application mapping methodology, including the execution trace analysis, platform parameter extraction, and mapping exploration steps. Section 4 presents the results of the proposed mapping approach on different test cases, and Section 5 concludes this paper.
2. Related works
Task mapping techniques can be classified into design-time mapping, run-time mapping, and hybrid mapping. In design-time mapping [5–9], the mapping process is performed off-line to facilitate better decisions in using system resources. Design-time mapping is suitable for static workload scenarios with a predefined task graph and a static platform (i.e., neither the applications nor the platform change over time). Run-time mapping [10–13], in contrast, is able to handle the dynamism of run-time workloads, such as the insertion of new applications into the system or their removal from it, as well as dynamic platform behaviour due to run-time faults. However, run-time task mapping is performed using limited on-line processing resources and must make a quick decision without sufficient mapping exploration, and thus may produce low-quality mappings.
In addition, hybrid mapping has been proposed in many works [14–19] by combining design-time and run-time techniques. Hybrid mapping explores the best mapping of tasks during design time for different scenarios (constraint conditions) and chooses the most suitable mapping at run-time. However, these conventional approaches to application mapping assume a specific, well-defined program behaviour, often specified in a task graph, and do not consider platform-dependent parameters such as the changes in communication delay and communication volume during dynamic mapping exploration.
Most mapping methodologies in the literature, e.g. [5–13], assume a specific program execution, while the communication cost is considered constant across platforms. As the practical behaviour of an application may differ from its specified task graph, different methods in the literature extract application run-time metrics prior to the mapping exploration. A trace analysis technique was used in [20] to find the execution behaviour of each mapping and examine their differences. Singh et al. [21] used the application behaviour from trace analysis to optimally map multiple applications on the same MPSoC. To the best of our knowledge, no existing work profiles and maps applications based on execution, intra-core communication, and inter-core communication on an emulation platform.
3. Proposed Application Mapping Methodology
The flow of the proposed application mapping approach is depicted in Figure 1. Unlike conventional mapping methodologies, which depend solely on the costs specified in the application task graph, the proposed approach considers the practical behaviour of the application to better estimate the mapping performance. The proposed methodology for application mapping is as follows:
1. Application Profiling – The predefined application is executed on the targeted platform to measure its practical implementation costs. Applications are profiled in terms of execution and communication costs, expressed as time delays (cycles).
2. Application Mapping – Application mapping design space exploration is performed to optimize the overall throughput. To evaluate the performance of each mapping, execution traces are analysed. An analytical model formulated from the profiled application costs is used to estimate the mapping performance during the design space exploration.
[Figure 1: flow diagram. The application specification (task graph t0 to t4) and the platform specification (MPSoC in FPGA) are emulated in the Application Profiling stage, which produces application profiles: execution x, intra-core communication c_a, and inter-core communication c_e. In the Mapping Design Space Exploration stage, the current mapping undergoes execution trace analysis (execution delays x_i and communication delays c_ij per core over time) and analytic evaluation; if it is not the best mapping, the next mapping is tried, otherwise the optimal mapping is output.]
Figure 1. Proposed application mapping using application profiled in FPGA-emulated MPSoC.
3.1. Application and Platform Model
The task graph of an application can be characterized by a directed graph (DG), G = (T, E), where T = {t_1, t_2, ..., t_n} is the set of tasks in the application and E = {e_1, e_2, ..., e_n} is the set of directed edges representing dependencies between tasks, as shown in Figure 2. Each task t_i carries a value x_i denoting the worst-case execution time of the task, which is assumed to remain fixed during execution. Each edge e_k carries a value v_ij representing the communication volume between tasks t_i and t_j.
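The application model above can be captured in a small data structure. The following is a minimal sketch only: the class name `TaskGraph`, its methods, and the numeric values are illustrative assumptions and are not taken from the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass
class TaskGraph:
    """Directed task graph G = (T, E) with per-task and per-edge annotations."""
    # x_i: worst-case execution time of each task t_i, in cycles
    exec_time: dict[str, int] = field(default_factory=dict)
    # v_ij: communication volume on each directed edge (t_i, t_j)
    volume: dict[tuple[str, str], int] = field(default_factory=dict)

    def add_task(self, name: str, x: int) -> None:
        self.exec_time[name] = x

    def add_edge(self, src: str, dst: str, v: int) -> None:
        if src not in self.exec_time or dst not in self.exec_time:
            raise KeyError("both endpoints must be added as tasks first")
        self.volume[(src, dst)] = v

    def successors(self, name: str) -> list[str]:
        """Tasks that depend directly on `name`, in insertion order."""
        return [dst for (src, dst) in self.volume if src == name]


# Hypothetical three-task application: t0 feeds t1 and t2.
g = TaskGraph()
for name, x in [("t0", 40), ("t1", 25), ("t2", 30)]:
    g.add_task(name, x)
g.add_edge("t0", "t1", 8)
g.add_edge("t0", "t2", 16)
```

Keeping the edge annotations in a dictionary keyed by task pairs makes it straightforward to replace the specified volumes with profiled delays later in the flow.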
As designing the entire NoC-based MPSoC platform is outside the scope of this work, the employed platform is generated using the ProNoC tool [22]. An overview of the NoC-based MPSoC platform used in this work is depicted in Figure 3. Figure 3a illustrates the architecture of each core (tile) in the MPSoC platform. A distributed memory architecture is implemented, where each core consists of a processor, a local memory, and a network interface (NI). Tasks mapped to a core are stored in and executed from its local memory, while cores communicate with one another via their network interfaces. All cores are homogeneous and connected to each other through the NoC interconnect fabric.
Figure 2. Example of an application model containing 6 tasks.
[Figure 3: (a) block diagram of one core (tile); (b) 3×3 mesh of Core 0 to Core 8, each attached to a router (R).]
Figure 3. MPSoC platform model for profiling, mapping and emulating application in FPGA. (a) Each core consists of a processor, local memory, and network interface in the MPSoC platform model. (b) 3×3 mesh-based NoC interconnect.
Mapping of an application is the process of assigning its tasks to the available cores on the MPSoC platform. Cooperative (non-pre-emptive) multi-tasking is employed in each core, so one core can execute more than one task. Communicating tasks use a message passing interface (MPI) protocol to send and receive data and control packets. Hence, communication between tasks mapped to the same core also incurs a message-copying delay, which is considered intra-core communication. Inter-core communication, on the other hand, is defined as communication between tasks mapped to different cores.
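The distinction between intra-core and inter-core communication depends only on where the two endpoint tasks are placed. A minimal sketch of this classification is given below; the function name `classify_edges` and the example mapping are hypothetical.

```python
def classify_edges(edges, mapping):
    """Split edges into intra-core and inter-core lists for a given mapping.

    edges   -- iterable of (src_task, dst_task) pairs
    mapping -- dict from task name to core index
    """
    intra, inter = [], []
    for src, dst in edges:
        if mapping[src] == mapping[dst]:
            intra.append((src, dst))   # both tasks share a core: message copy
        else:
            inter.append((src, dst))   # tasks on different cores: NoC transfer
    return intra, inter


# Hypothetical four-task application mapped onto two cores.
edges = [("t0", "t1"), ("t0", "t2"), ("t1", "t3"), ("t2", "t3")]
mapping = {"t0": 0, "t1": 0, "t2": 1, "t3": 1}
intra, inter = classify_edges(edges, mapping)
```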
3.2. Application Profiling
Application profiling is the process of extracting the practical task execution and communication delays of an application when emulated in FPGA. The aim of application profiling is to measure the effect of task execution and communication on the overall system throughput. The execution time x_i of each task t_i is defined as the time required to complete the task after it receives complete data (or a token) during one execution period. The execution time of a task can be obtained by capturing the time difference between the start and the end of the task's execution. As the execution time on any homogeneous core is the same regardless of the mapping, the maximum execution delays of all tasks can be recorded by mapping all tasks onto one core.
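The execution-time measurement described above can be sketched as follows. The trace record format `(task, start_cycle, end_cycle)` is an assumption made for illustration; the paper does not specify how cycle counts are read out of the emulated platform.

```python
def profile_execution_times(trace):
    """Return the maximum observed execution delay per task.

    trace -- iterable of (task, start_cycle, end_cycle) records collected
             while all tasks run on a single core (hypothetical format)
    """
    worst = {}
    for task, start, end in trace:
        if end < start:
            raise ValueError(f"record for {task} ends before it starts")
        # Keep the largest end-start difference seen for this task.
        worst[task] = max(worst.get(task, 0), end - start)
    return worst


# Two iterations of a two-task application, with made-up cycle counts.
trace = [("t0", 0, 40), ("t1", 40, 62), ("t0", 62, 104), ("t1", 104, 129)]
x = profile_execution_times(trace)
```

Taking the maximum over iterations matches the worst-case interpretation of x_i given in the application model.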
Communication time c_ij, in contrast, is divided into two types represented by a two-tuple <c_a, c_e>, where c_a and c_e denote the intra-core and inter-core delays, respectively. As an application specified in a task graph contains only the communication volume between connected tasks, each pair of communicating tasks has to be profiled to measure the practical communication time. To profile an intra-core communication delay, the two connected tasks are isolated and executed on the same core. The intra-core communication delay is measured from the difference in throughput between executing both tasks independently without communication and with communication. To profile the inter-core communication delay, both tasks are instead mapped to neighbouring cores. After profiling, the application graph is re-constructed with the profiled specification, as illustrated in Figure 4. Although the actual delays of the application may differ between mappings (for example, due to multi-tasking overhead or communication infrastructure delays such as contention), the profiled costs can be used as the basis for estimating the application run-time behaviour.
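The communication profiling step can be sketched in a few lines. This sketch assumes that the delay is taken as the increase in the measured period (the inverse of throughput) when communication is enabled, and that a separate no-communication baseline is measured for the same-core and neighbouring-core placements; the names `CommProfile`, `delay_from_periods`, and `profile_edge` are hypothetical.

```python
from typing import NamedTuple


class CommProfile(NamedTuple):
    ca: int  # intra-core communication delay, in cycles
    ce: int  # inter-core communication delay, in cycles


def delay_from_periods(without_comm: int, with_comm: int) -> int:
    """Communication delay taken as the increase in the measured period."""
    if with_comm < without_comm:
        raise ValueError("period with communication is shorter than without")
    return with_comm - without_comm


def profile_edge(same_core: tuple[int, int],
                 neighbour_cores: tuple[int, int]) -> CommProfile:
    """Each argument is a (period_without_comm, period_with_comm) pair."""
    return CommProfile(
        ca=delay_from_periods(*same_core),
        ce=delay_from_periods(*neighbour_cores),
    )


# Made-up cycle counts for one edge (t0 -> t1).
c01 = profile_edge(same_core=(65, 77), neighbour_cores=(40, 58))
```

One such two-tuple is produced per edge, so the number of profiling runs scales with the size of the task graph rather than with the number of mappings.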
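[Figure 4: edge e_01 from task t_0 to task t_1; after profiling, the tasks are annotated with execution times x_0 and x_1 and the edge with c_01 = <c_a, c_e>_01.]
Figure 4. Application graph re-construction after profiling execution and communication time on emulated FPGA.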
3.3. Mapping Design Space Exploration
After application profiling, analytical-based mapping utilizes the profiled costs to optimize the application throughput. The execution traces of each mapping are analysed to estimate the application run-time behaviour. Each mapping is evaluated using an analytical expression that calculates the throughput from the profiled costs. The mapping exploration process iterates until the best mapping is obtained.
3.3.1. Execution Traces Analysis
Execution traces of a mapped application are analysed to obtain the application throughput. Task parallelism and temporal parallelism are considered in the execution trace analysis. The former allows parallel execution of different tasks, while the latter allows parallel execution of different temporal data on different cores at the same time.
For instance, Figure 5 depicts the execution traces of the 6-task application shown in Figure 2 mapped onto a 4-core system. All 4 cores communicate with each other while executing their assigned tasks in parallel as soon as their data are available. Task parallelism occurs for tasks t_1 and t_2, which take x_1 and x_2 delays to execute on core 1 and core 2, respectively. Temporal (data) parallelism is demonstrated through the execution of different temporal data (e.g. data 1 and data 2) at different time instances. As core 4 requires the longest time to compute one result, the period (inverse of throughput) is determined by the combination of execution delay and communication delay on that core.
[Figure 5: timing diagram of Core 1 to Core 4 over three consecutive data iterations (data 1, data 2, data 3). Each core's row shows the execution delays x_i of its tasks and the communication delays c_ij from task i to task j along the time axis; one period separates successive iterations.]
Figure 5. Execution traces of a 6-task application running on a 4-core system.
Application throughput is the reciprocal of one period (i.e., the average time needed to complete one iteration of the application). Each core uses the period to execute its assigned tasks as well as to communicate with other cores. Since the tasks of an application can be executed in a pipelined and task-parallel fashion on different cores, the overall throughput is determined by the slowest core (i.e., the core that requires the longest time to complete the executions and communications of all its mapped tasks).
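As a small worked illustration with made-up numbers (not measurements from this work), suppose the four cores need 120, 150, 90, and 200 cycles, respectively, to complete the executions and communications of their mapped tasks in one iteration. The slowest core then sets the period and the throughput:

```latex
\[
\text{period} = \max(120,\ 150,\ 90,\ 200) = 200 \text{ cycles},
\qquad
\text{throughput} = \frac{1}{\text{period}} = \frac{1}{200} = 0.005 \text{ iterations per cycle}.
\]
```

Moving work away from any of the three faster cores would not change the throughput in this example; only reducing the load on the 200-cycle core would.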
3.3.2. Analytical Evaluation
Given a graph representation of an application with N_T tasks, mapping exploration in this work is the process of assigning all tasks to the available N_C cores to maximize the system throughput. Based on the execution trace analysis, the throughput of a mapped application is the reciprocal of the period. The period is defined as the maximum total time required (among all cores) for execution and communication, and is formulated mathematically in Equation 1. The total time required by each core is the sum of the profiled execution delays of all tasks assigned to it and all the profiled intra-core and inter-core communication delays associated with it.
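\[
\frac{1}{\text{throughput}} = \text{period}
= \max_{1 \le k \le N_C} \left( X_k + C_k \right)
= \max_{1 \le k \le N_C} \left( \sum_{i=0}^{N_T} x_i \cdot mt_{ik}
+ \sum_{i=0}^{N_T} \sum_{j=0}^{N_T} ca_{ij} \cdot ma_{ijk}
+ \sum_{i=0}^{N_T} \sum_{j=0}^{N_T} ce_{ij} \cdot me_{ijk} \right)
\tag{1}
\]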
where i and j are task identifiers and k is the core identifier. X_k denotes the total execution delay of the tasks mapped to core k, which is the sum of the profiled execution delays x_i of every task t_i mapped to core k (mt_ik = 1). C_k denotes the total communication delay associated with core k, which is the sum of all profiled intra-core communication delays ca_ij for which the intra-core communication between tasks t_i and t_j takes place in core k (ma_ijk = 1), and all profiled inter-core communication delays ce_ij for which the inter-core communication between tasks t_i and t_j is associated with core k (me_ijk = 1).
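Equation 1 can be evaluated directly from the profiled costs. The sketch below is one possible reading, under the assumption that an inter-core transfer is charged to both the sending and the receiving core (the text only states that it is "associated" with core k); the function name `period` and all numeric values are hypothetical.

```python
def period(mapping, x, ca, ce, n_cores):
    """Evaluate Equation 1 for one mapping and return the period in cycles.

    mapping -- dict task -> core index in range(n_cores)
    x       -- dict task -> profiled execution delay x_i
    ca, ce  -- dicts (src, dst) -> profiled intra-/inter-core delay
    """
    total = [0] * n_cores
    for task, core in mapping.items():
        total[core] += x[task]                  # X_k: execution term
    for (src, dst) in ca:
        ks, kd = mapping[src], mapping[dst]
        if ks == kd:
            total[ks] += ca[(src, dst)]         # intra-core term (ma_ijk = 1)
        else:
            # Assumption: the transfer occupies both end cores (me_ijk = 1).
            total[ks] += ce[(src, dst)]
            total[kd] += ce[(src, dst)]
    return max(total)                           # slowest core sets the period


# Made-up profiled costs for a three-task application.
x = {"t0": 40, "t1": 25, "t2": 30}
ca = {("t0", "t1"): 5, ("t0", "t2"): 6}
ce = {("t0", "t1"): 12, ("t0", "t2"): 15}

split = period({"t0": 0, "t1": 0, "t2": 1}, x, ca, ce, n_cores=2)
single = period({"t0": 0, "t1": 0, "t2": 0}, x, ca, ce, n_cores=2)
```

In this toy case the two-core mapping has a period of 85 cycles against 106 cycles when all tasks share one core, so the throughput is 1/85 and 1/106 iterations per cycle, respectively.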
4. Results and Discussion
The proposed analytical-based mapping is demonstrated by employing the exhaustive brute force method to map several random application graphs and find the mapping that produces the best application throughput. The best mappings obtained by the proposed analytical-based exhaustive (AE) method are compared with the emulation-based exhaustive (EE) solutions (i.e., the ground truth), which are obtained by executing all mappings on an FPGA emulation system.
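A brute force exploration of this kind can be sketched as below. The sketch enumerates every assignment of tasks to cores and does not prune symmetric mappings; whether the authors' enumeration does so is not stated. The cost function is passed in as a parameter, and the simple `evaluate` used here (execution delays only) is a placeholder for illustration.

```python
from itertools import product


def exhaustive_search(tasks, n_cores, evaluate):
    """Return (best_mapping, best_period) over all n_cores ** len(tasks) mappings.

    evaluate -- callable taking a dict task -> core and returning the period
    """
    best_mapping, best_period = None, float("inf")
    for cores in product(range(n_cores), repeat=len(tasks)):
        mapping = dict(zip(tasks, cores))
        p = evaluate(mapping)
        if p < best_period:                 # keep the first mapping on ties
            best_mapping, best_period = mapping, p
    return best_mapping, best_period


# Placeholder cost: per-core sum of made-up execution delays, no communication.
x = {"t0": 40, "t1": 25, "t2": 30}


def evaluate(mapping):
    load = {}
    for task, core in mapping.items():
        load[core] = load.get(core, 0) + x[task]
    return max(load.values())


best_mapping, best_period = exhaustive_search(list(x), 2, evaluate)
```

Because the search calls `evaluate` once per mapping, swapping an emulation run for an analytical expression changes only the per-mapping cost, which is where the difference in evaluation time between the two methods arises.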
Figure 6a shows the throughput of all possible mappings for the proposed analytical-based application mapping and the emulation-based mapping. The proposed analytical-based mapping produces results that follow a trend similar to the ground truth solution. Figure 6b compares the emulation-based throughput with the analytical-based throughput. The analytical-based throughput is close to the emulation-based throughput, as the points scatter near the straight line of gradient one.
Table 1 compares the proposed analytical-based exhaustive method with the emulation-based exhaustive method (ground truth) for applications with different random graphs. The best throughput (BT) is obtained by emulating on FPGA the best mapping produced by each method. The evaluation time (ET) is the time taken to perform the exhaustive evaluation of all possible mappings, while the profiling time (PT) is the time taken to profile the tasks with the proposed method. The results show that the evaluation time of both exhaustive methods increases steeply as the number of tasks, and hence the number of possible mappings, grows. The proposed analytical-based method is able to evaluate all mappings in much shorter time despite requiring extra profiling time. Moreover, the profiling time does not increase with the number of mappings, as it depends on the number of profiles required, that is, the total number of tasks and edges in the application graph.
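The saving can be quantified from the values reported in Table 1 by comparing the EE evaluation time with the total AE time (profiling plus evaluation). The short calculation below is an illustrative addition; the speedup definition is an assumption made here and is not a metric reported by the authors.

```python
# (EE evaluation time, AE profiling time, AE evaluation time), in seconds,
# copied from Table 1.
rows = {
    "random1(r7)": (607.61, 252.21, 0.03),
    "random2(r8)": (2912.33, 296.10, 0.15),
    "random4(r2)": (15745.43, 309.34, 0.96),
    "random5(r4)": (16143.52, 276.82, 0.91),
}


def speedup(ee_et, ae_pt, ae_et):
    """Ratio of EE evaluation time to total AE time (profiling + evaluation)."""
    return ee_et / (ae_pt + ae_et)


speedups = {name: round(speedup(*times), 1) for name, times in rows.items()}
for name, s in speedups.items():
    print(f"{name}: {s}x")
```

Under this definition the ratio rises from about 2.4 for the 5-task graph to about 58 for the larger 7-task graph, because the AE total is dominated by a profiling time that stays within a few hundred seconds.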
[Figure 6: (a) line plot of 1/throughput (cycles) against mappings (sorted by EE) for AE and EE; (b) scatter plot of AE 1/throughput (cycles) against EE 1/throughput (cycles).]
Figure 6. Throughput comparison between emulation-based exhaustive (EE) and proposed analytical-based exhaustive (AE) mapping in design space exploration of a 7-task application. (a) Throughput of all possible mappings for the proposed analytical-based exhaustive (AE) and emulation-based exhaustive (EE) mapping (sorted based on EE). (b) Emulation-based versus analytical-based exhaustive throughput.
Table 1. Comparison in throughput and evaluation time between the proposed analytical-based exhaustive (AE) and emulation-based exhaustive (EE) methods for applications with different random graphs
Application   No. of tasks   EE BT   EE ET        AE BT   AE PT      AE ET
random1(r7)   5              87      607.61 s     87      252.21 s   0.03 s
random2(r8)   6              401     2912.33 s    401     296.10 s   0.15 s
random4(r2)   7              106     15745.43 s   106     309.34 s   0.96 s
random5(r4)   7              875     16143.52 s   875     276.82 s   0.91 s
BT: Best Throughput; ET: Evaluation Time; PT: Profiling Time
5. Conclusion
An efficient mapping algorithm produces high-quality solutions in a short evaluation time. This paper presents an application mapping methodology that utilizes FPGA emulation as an evaluation platform to optimize the application throughput. Applications are analysed and profiled in the emulated environment to obtain their practical implementation costs in terms of execution, intra-core communication, and inter-core communication delays. Application mapping utilizes the profiled costs to find the optimal mapping through execution trace analysis and analytical evaluation. Results show that the proposed analytical-based exhaustive method based on the profiling produces a best throughput similar to that of the ground truth solutions, but in shorter evaluation time.
Acknowledgement
This work is supported in part by the Collaborative Research in Engineering, Science &
Technology (CREST) grant P17C114 (UTM vote no. 4B176) and Universiti Teknologi Malaysia
Matching grant (UTM vote no. 00M75).
References
[1] R. Marculescu, U. Y. Ogras, L.-S. Peh, N. E. Jerger, and Y. Hoskote, “Outstanding research problems in NoC design: system, microarchitecture, and circuit perspectives,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 28, no. 1, pp. 3–21, 2009.
[2] A. K. Singh, M. Shafique, A. Kumar, and J. Henkel, “Mapping on multi/many-core systems: survey of current and emerging trends,” in Proceedings of the 50th Annual Design Automation Conference. ACM, 2013, p. 1.
[3] P. K. Sahu and S. Chattopadhyay, “A survey on application mapping strategies for network-on-chip design,” Journal of Systems Architecture, vol. 59, no. 1, pp. 60–76, 2013.
[4] R. Piscitelli and A. D. Pimentel, “Design space pruning through hybrid analysis in system-level design space exploration,” in Design, Automation & Test in Europe Conference & Exhibition (DATE), 2012. IEEE, 2012, pp. 781–786.
[5] J. Hu and R. Marculescu, “Energy- and performance-aware mapping for regular NoC architectures,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 24, no. 4, pp. 551–562, 2005.
[6] K. Srinivasan, K. S. Chatha, and G. Konjevod, “Linear-programming-based techniques for synthesis of network-on-chip architectures,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 14, no. 4, pp. 407–420, 2006.
[7] S. Tosun, “Cluster-based application mapping method for network-on-chip,” Advances in Engineering Software, vol. 42, no. 10, pp. 868–874, 2011.
[8] P. K. Sahu, T. Shah, K. Manna, and S. Chattopadhyay, “Application mapping onto mesh-based network-on-chip using discrete particle swarm optimization,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 22, no. 2, pp. 300–312, 2014.
[9] Y. Z. Tei, Y. W. Hau, N. Shaikh-Husin, and M. N. Marsono, “Network partitioning domain knowledge multiobjective application mapping for large-scale network-on-chip,” Applied Computational Intelligence and Soft Computing, vol. 2014, p. 9, 2014.
[10] E. L. de Souza Carvalho, N. L. V. Calazans, and F. G. Moraes, “Dynamic task mapping for MPSoCs,” IEEE Design & Test of Computers, vol. 27, no. 5, pp. 26–35, 2010.
[11] A. K. Singh, T. Srikanthan, A. Kumar, and W. Jigang, “Communication-aware heuristics for run-time task mapping on NoC-based MPSoC platforms,” Journal of Systems Architecture, vol. 56, no. 7, pp. 242–255, 2010.
[12] M. Fattah, A.-M. Rahmani, T. C. Xu, A. Kanduri, P. Liljeberg, J. Plosila, and H. Tenhunen, “Mixed-criticality run-time task mapping for NoC-based many-core systems,” in 2014 22nd Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP). IEEE, 2014, pp. 458–465.
[13] T. D. Ngo, K. J. Martin, and J.-P. Diguet, “Move based algorithm for runtime mapping of dataflow actors on heterogeneous MPSoCs,” Journal of Signal Processing Systems, pp. 1–18, 2015.
[14] S. Stuijk, M. Geilen, and T. Basten, “A predictable multiprocessor design flow for streaming applications with dynamic behaviour,” in 2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools (DSD). IEEE, 2010, pp. 548–555.
[15] L. Schor, I. Bacivarov, D. Rai, H. Yang, S.-H. Kang, and L. Thiele, “Scenario-based design flow for mapping streaming applications onto on-chip many-core systems,” in Proceedings of the 2012 International Conference on Compilers, Architectures and Synthesis for Embedded Systems. ACM, 2012, pp. 71–80.
[16] C. Lee, S. Kim, and S. Ha, “Efficient run-time resource management of a manycore accelerator for stream-based applications,” in 2013 IEEE 11th Symposium on Embedded Systems for Real-time Multimedia (ESTIMedia). IEEE, 2013, pp. 51–60.
[17] W. Quan and A. D. Pimentel, “A scenario-based run-time task mapping algorithm for MPSoCs,” in Proceedings of the 50th Annual Design Automation Conference. ACM, 2013, p. 131.
[18] A. K. Singh, A. Kumar, and T. Srikanthan, “Accelerating throughput-aware runtime mapping for heterogeneous MPSoCs,” ACM Transactions on Design Automation of Electronic Systems (TODAES), vol. 18, no. 1, p. 9, 2013.
[19] W. Quan and A. D. Pimentel, “A hybrid task mapping algorithm for heterogeneous MPSoCs,” ACM Transactions on Embedded Computing Systems (TECS), vol. 14, no. 1, p. 14, 2015.
[20] A. Goens and J. Castrillon, “Analysis of process traces for mapping dynamic KPN applications to MPSoCs,” in IFIP International Embedded Systems Symposium (IESS), 2015.
[21] A. K. Singh, M. Shafique, A. Kumar, and J. Henkel, “Resource and throughput aware execution trace analysis for efficient run-time mapping on MPSoCs,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 35, no. 1, pp. 72–85, 2016.
[22] ProNoC. (2016) Prototype NoC based MPSoC. OpenCores. [Online]. Available: http://opencores.org/project,an-fpga-implementation-of-low-latency-noc-based-mpsoc
Application Profiling and Mapping on NoC-based MPSoC ... (Jia Wei Tang)

More Related Content

What's hot

Distributed web systems performance forecasting
Distributed web systems performance forecastingDistributed web systems performance forecasting
Distributed web systems performance forecastingIEEEFINALYEARPROJECTS
 
Topological Optimization and Genetic Algorithms Used in a Wheel Project for a...
Topological Optimization and Genetic Algorithms Used in a Wheel Project for a...Topological Optimization and Genetic Algorithms Used in a Wheel Project for a...
Topological Optimization and Genetic Algorithms Used in a Wheel Project for a...Neimar Silva
 
Optimizing Effort Parameter of COCOMO II Using Particle Swarm Optimization Me...
Optimizing Effort Parameter of COCOMO II Using Particle Swarm Optimization Me...Optimizing Effort Parameter of COCOMO II Using Particle Swarm Optimization Me...
Optimizing Effort Parameter of COCOMO II Using Particle Swarm Optimization Me...TELKOMNIKA JOURNAL
 
Knowledge-based Expert System for Route Selection of Road Alignment
Knowledge-based Expert System for Route Selection of Road AlignmentKnowledge-based Expert System for Route Selection of Road Alignment
Knowledge-based Expert System for Route Selection of Road AlignmentUCSI University
 
Test sequences for web service composition using cpn model
Test sequences for web service composition using cpn modelTest sequences for web service composition using cpn model
Test sequences for web service composition using cpn modelAlexander Decker
 
Integrating profiling into mde compilers
Integrating profiling into mde compilersIntegrating profiling into mde compilers
Integrating profiling into mde compilersijseajournal
 
Maximum likelihood estimation-assisted ASVSF through state covariance-based 2...
Maximum likelihood estimation-assisted ASVSF through state covariance-based 2...Maximum likelihood estimation-assisted ASVSF through state covariance-based 2...
Maximum likelihood estimation-assisted ASVSF through state covariance-based 2...TELKOMNIKA JOURNAL
 
THE UNIFIED APPROACH FOR ORGANIZATIONAL NETWORK VULNERABILITY ASSESSMENT
THE UNIFIED APPROACH FOR ORGANIZATIONAL NETWORK VULNERABILITY ASSESSMENTTHE UNIFIED APPROACH FOR ORGANIZATIONAL NETWORK VULNERABILITY ASSESSMENT
THE UNIFIED APPROACH FOR ORGANIZATIONAL NETWORK VULNERABILITY ASSESSMENTijseajournal
 
Integrative Model for Quantitative Evaluation of Selection Telecommunication ...
Integrative Model for Quantitative Evaluation of Selection Telecommunication ...Integrative Model for Quantitative Evaluation of Selection Telecommunication ...
Integrative Model for Quantitative Evaluation of Selection Telecommunication ...TELKOMNIKA JOURNAL
 
STATISTICAL ANALYSIS FOR PERFORMANCE COMPARISON
STATISTICAL ANALYSIS FOR PERFORMANCE COMPARISONSTATISTICAL ANALYSIS FOR PERFORMANCE COMPARISON
STATISTICAL ANALYSIS FOR PERFORMANCE COMPARISONijseajournal
 

What's hot (13)

Distributed web systems performance forecasting
Distributed web systems performance forecastingDistributed web systems performance forecasting
Distributed web systems performance forecasting
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 

Application Profiling and Mapping on NoC-based MPSoC Emulation Platform on Reconfigurable Logic

TELKOMNIKA, Vol. 15, No. 3, September 2017, pp. 1040 ∼ 1047
ISSN: 1693-6930, accredited A by DIKTI, Decree No: 58/DIKTI/Kep/2013
DOI: 10.12928/telkomnika.v15.i3.6513

Jia Wei Tang*, Yuan Wen Hau**, Nasir Shaikh-Husin*, and Muhammad Nadzir Marsono*
* Faculty of Electrical Engineering, Universiti Teknologi Malaysia, Johor, Malaysia
** IJN-UTM Cardiovascular Engineering Centre, Faculty of Biosciences and Medical Engineering, Universiti Teknologi Malaysia, Johor, Malaysia
* Corresponding author, e-mail: jwtang2@live.utm.my

Abstract

In network-on-chip (NoC) based multi-processor system-on-chip (MPSoC) development, application profiling is one of the most crucial steps during design time to search and explore optimal mapping. Conventional mapping exploration methodologies analyse application-specific graphs by estimating their run-time behaviour using analytical or simulation models. However, the former does not replicate the actual application run-time performance while the latter requires a significant amount of time for exploration. To map applications on a specific MPSoC platform, the application behaviour on a cycle-accurate emulated platform should be considered to obtain better mapping quality. This paper proposes an application mapping methodology that utilizes an MPSoC prototyped in a Field-Programmable Gate Array (FPGA). Applications are implemented on homogeneous MPSoC cores and their costs are analysed and profiled on the platform in terms of execution time, intra-core communication and inter-core communication delays. These metrics are utilized in the analytical evaluation of the application mapping. The proposed analytical-based mapping is demonstrated against the exhaustive brute force method. Results show that the proposed method produces mappings of comparable quality to the ground truth solutions in a shorter evaluation time.
Keywords: Embedded system, application mapping, multi-processor system-on-chips (MPSoCs), throughput constraint.

Copyright © 2017 Universitas Ahmad Dahlan. All rights reserved.
Received February 27, 2017; Revised June 28, 2017; Accepted July 21, 2017

1. Introduction

Multi-Processor System-on-Chips (MPSoCs) are commonly used in modern embedded systems such as smartphones and tablets, as well as in other diverse fields such as multimedia and networking applications, to perform time-constrained tasks. Mapping of time-critical applications is a crucial step at design time to provide the highest possible system performance. The problem of mapping applications onto an MPSoC is one of the biggest challenges embedded system programmers face nowadays [1–3]. An application is commonly specified as a data-flow graph consisting of tasks that communicate with each other. An MPSoC platform, on the other hand, is described by the number of cores, the type of cores (in a heterogeneous platform), and their communication topology or infrastructure. The design space exploration process takes the application graph as input and assigns all tasks to processing cores on the desired MPSoC platform to find the mapping with the best performance.

Methods for evaluating the performance of each mapping in the design space can be categorized into three types [4]: estimation from an analytical model, measurement from a simulation model, and measurement from an implementation prototype. Analytical estimation is the fastest evaluation method but has limited accuracy, as it does not sufficiently replicate the system behaviour. Simulation-based measurement lies between two extremes, ranging from slow with high accuracy to fast but less accurate. In contrast, prototype-based evaluation provides the highest accuracy but requires a long development time.
In most cases, a certain application is to be executed on a specific MPSoC platform. As the task execution and communication delays vary between MPSoC platforms, a mapping that depends solely on the costs given in the application specification may not be sufficient to produce a high-quality mapping once the physical interconnect fabric of the MPSoC is taken into account. The behaviour on the platform must be considered and profiled thoroughly to aid the application mapping exploration process. Hence, profiling an application on a prototype-based platform can improve the precision of analytical-based mapping while retaining its fast evaluation speed.

This paper presents an application mapping methodology that utilizes field-programmable gate array (FPGA) emulation as the evaluation platform to optimize the application mapping for the highest throughput. Applications are analysed and profiled on the emulation platform to estimate their implemented costs in terms of execution, intra-core communication and inter-core communication delays. The profiled application costs are used to improve the accuracy of analytical-based mapping. The proposed methodology is demonstrated by mapping several test applications on a 3×3 mesh network-on-chip (NoC) based MPSoC prototyped in FPGA.

The rest of this paper is organized as follows. Section 2 presents the related works in application mapping. Section 3 discusses the proposed throughput-aware application mapping methodology, including the execution trace analysis, platform parameter extraction and mapping exploration steps. Section 4 presents the results of the proposed mapping approach on different test cases and Section 5 concludes this paper.

2. Related works

Task mapping techniques can be classified into design time mapping, run-time mapping and hybrid mapping.
In design time mapping [5–9], the mapping process is performed off-line to facilitate better decisions in using system resources. Design time mapping is suitable for static workload scenarios with a predefined task graph and a static platform (i.e., neither the applications nor the platform change over time). Run-time mapping [10–13], on the contrary, is able to handle the dynamism of run-time workload, such as the insertion of new applications into the system or the removal of applications from it, as well as dynamic platform behaviour due to run-time faults. However, run-time task mapping is performed using limited on-line processing resources and must make a quick decision without sufficient mapping exploration, and thus may produce low-quality mappings. In addition, hybrid mapping, which combines design time and run-time techniques, has been proposed in many works [14–19]. Hybrid mapping explores the best mapping of tasks at design time for different scenarios (constraint conditions) and chooses the most suitable mapping at run-time. However, these conventional approaches to application mapping assume a specific, well-defined program behaviour, often specified as a task graph, and do not consider platform-dependent parameters such as the changes in communication delay and communication volume during dynamic mapping exploration.

Most mapping methodologies in the literature, e.g. [5–13], assume a specific program execution while the communication cost is considered constant across platforms. As the practical behaviour of an application may differ from its specified task graph, different methods in the literature extract application run-time metrics prior to the mapping exploration. A trace analysis technique was used in [20] to find the execution behaviour of each mapping and examine their differences. Singh et al. [21] used the application behaviour from trace analysis to optimally map multiple applications on the same MPSoC.
To the best of our knowledge, no existing work profiles and maps applications based on execution, intra-core and inter-core communication delays on an emulation platform.

3. Proposed Application Mapping Methodology

The flow of the proposed application mapping approach is depicted in Figure 1. Unlike the conventional mapping methodology, which depends solely on the costs specified in the application task graph, the proposed approach considers the practical behaviour of the application to better estimate
the mapping performance. The proposed application mapping methodology consists of two steps:

1. Application Profiling: The predefined application is executed on the targeted platform to measure its practical implementation costs. Applications are profiled in terms of execution and communication costs, expressed as time delays (cycles).
2. Application Mapping: Application mapping design space exploration is performed to optimize the overall throughput. In order to evaluate the performance of each mapping, execution traces are analysed. An analytical model formulated from the profiled application costs is used to estimate the mapping performance during the design space exploration.

Figure 1. Proposed application mapping using application profiled in FPGA-emulated MPSoC.

3.1. Application and Platform Model

The task graph of an application can be characterized by a directed graph (DG), G = (T, E), where T = {t_1, t_2, ..., t_n} is the set of tasks in the application and E = {e_1, e_2, ..., e_n} is the set of directed edges representing the dependencies between tasks, as shown in Figure 2. Each task t_i has an associated x_i denoting the worst-case execution time of the task, which is assumed to remain fixed during execution. Each edge e_k carries v_ij, which represents the communication volume between tasks t_i and t_j.

As designing the entire NoC-based MPSoC platform is out of the scope of this work, the employed platform is generated using the ProNoC tool [22]. An overview of the NoC-based MPSoC platform used in this work is depicted in Figure 3.
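The application model G = (T, E) defined in this subsection can be held in a small data structure. The following is a minimal sketch, not the authors' implementation: the class name `TaskGraph`, its methods, and all numeric values are hypothetical, and the edge list merely follows the communication labels (c01, c02, c13, c23, c24, c35) that appear in Figure 5.

```python
from dataclasses import dataclass, field


@dataclass
class TaskGraph:
    """Directed task graph G = (T, E) (hypothetical helper, not from the paper)."""
    exec_time: dict = field(default_factory=dict)  # task id i -> x_i (cycles)
    volume: dict = field(default_factory=dict)     # edge (i, j) -> v_ij

    def add_task(self, i, x_i):
        self.exec_time[i] = x_i

    def add_edge(self, i, j, v_ij):
        # An edge is only meaningful between two known tasks.
        if i not in self.exec_time or j not in self.exec_time:
            raise KeyError("both endpoint tasks must be added first")
        self.volume[(i, j)] = v_ij

    def successors(self, i):
        return sorted(j for (src, j) in self.volume if src == i)


# Made-up costs for a 6-task application.
g = TaskGraph()
for task, x in enumerate([120, 300, 250, 400, 180, 90]):
    g.add_task(task, x)
for i, j in [(0, 1), (0, 2), (1, 3), (2, 3), (2, 4), (3, 5)]:
    g.add_edge(i, j, v_ij=64)

print(g.successors(0))
```

Keeping x_i on the tasks and v_ij on the edges mirrors the model in the text; the profiled delays described later would replace or extend these values.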
Figure 3a illustrates the architecture of each core (tile) in the MPSoC platform. A distributed memory architecture is implemented, where each core consists of a processor, a local memory, and a network interface (NI). The tasks mapped to each core are stored in and executed from its local memory, while cores communicate with each other via their network interfaces. All cores are homogeneous and connected to each other through the NoC interconnect fabric.
Figure 2. Example of an application model containing 6 tasks.

Figure 3. MPSoC platform model for profiling, mapping and emulating application in FPGA. (a) Each core consists of a processor, local memory, and network interface in MPSoC platform model. (b) 3×3 mesh based NoC interconnect (Core 0 to Core 8, each attached to a router R).

Mapping of an application is the process of assigning its tasks to the available cores on the MPSoC platform. Cooperative (non pre-emptive) multi-tasking is employed in each core, so one core can execute more than one task. Communicating tasks use the message passing interface (MPI) protocol to send and receive data and control packets. Hence, communication between tasks that are mapped to the same core also incurs a message copying delay, which is considered as intra-core communication. Inter-core communication, on the other hand, is defined as the communication between tasks that are mapped to different cores.

3.2. Application Profiling

Application profiling is the process of extracting the practical task execution and communication delays of an application when emulated in FPGA. The aim of application profiling is to measure the effect of task execution and communication on the overall system throughput.

The execution time x_i of each task t_i is defined as the time required to complete the task after it receives complete data (or a token) during one execution period. The task execution time can be obtained by capturing the time difference between the start and the end of a task's execution. As the execution time on any homogeneous core is equal regardless of the mapping, the maximum execution delays of all tasks can be recorded by mapping all tasks onto one core.

The communication time c_ij, in contrast, is divided into two types represented by a two-tuple <c_a, c_e>, where c_a and c_e denote the intra-core and inter-core delays, respectively.
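The execution-time measurement described above (time difference between the start and end of each task execution, keeping the maximum) can be sketched as follows. The record format `(task_id, start_cycle, end_cycle)` and the function name `profile_execution` are assumptions made for illustration; the paper does not specify how the cycle counters are logged.

```python
from collections import defaultdict


def profile_execution(records):
    """Return the maximum observed execution delay x_i (cycles) per task.

    `records` is an iterable of (task_id, start_cycle, end_cycle) tuples,
    a hypothetical log format assumed for this sketch.
    """
    worst = defaultdict(int)
    for task, start, end in records:
        if end < start:
            raise ValueError(f"task {task}: end cycle precedes start cycle")
        worst[task] = max(worst[task], end - start)
    return dict(worst)


# Two iterations of a two-task application, all tasks mapped to one core.
records = [
    (0, 100, 340),    # task 0, iteration 1: 240 cycles
    (1, 340, 900),    # task 1, iteration 1: 560 cycles
    (0, 1000, 1250),  # task 0, iteration 2: 250 cycles
    (1, 1250, 1790),  # task 1, iteration 2: 540 cycles
]
profile = profile_execution(records)
print(profile)  # {0: 250, 1: 560}
```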
As an application specified as a task graph contains only the communication volume between connected tasks, each pair of connected tasks has to be profiled to measure the practical communication time. In order to profile an intra-core communication delay, the two connected tasks are isolated and executed on the same core. The intra-core communication delay is measured from the throughput difference between executing both tasks independently without communication and executing them with communication. To profile the inter-core communication delay, both tasks are instead mapped to neighbouring cores. After profiling, the application graph is re-constructed with the profiled specification, as illustrated in Figure 4. Although the actual delays of the application may differ between mappings (for example, due to multi-tasking overhead or communication infrastructure delay such as contention), the profiled costs can be used as the basis to estimate the application run-time behaviour.

Figure 4. Application graph re-construction after profiling execution and communication time on emulated FPGA. (After profiling, tasks t_0 and t_1 carry x_0 and x_1, and edge e_01 carries c_01 = <c_a, c_e>_01.)

3.3. Mapping Design Space Exploration

After application profiling, analytical-based mapping utilizes the profiled costs to optimize the application throughput. The execution traces of each mapping are analysed to estimate the application run-time behaviour. Each mapping is evaluated using an analytical expression that calculates the throughput from the profiled costs. The mapping exploration process iterates until the best mapping is obtained.

3.3.1. Execution Traces Analysis

The execution traces of a mapped application are analysed to obtain the application throughput. Task parallelism and temporal parallelism are considered in the execution trace analysis. The former is the ability to execute different tasks in parallel, while the latter allows different temporal data to be processed in different cores at the same time.

For instance, Figure 5 depicts the execution traces of an application with 6 tasks, as shown in Figure 2, mapped onto a 4-core system. All 4 cores communicate with each other while executing their assigned tasks in parallel as soon as their data are available. Task parallelism occurs for tasks t_1 and t_2, which take x_1 and x_2 delays to be executed in core 1 and core 2, respectively. Temporal (data) parallelism is demonstrated through the execution of different temporal data (e.g., data 1 and data 2) at different time instances. As core 4 requires the longest time to compute one result, the period (inverse of throughput) is determined by the combined execution and communication delays of that core.
Figure 5. Execution traces of a 6-task application running on a 4-core system. (Legend: x_i is the execution delay of task i; c_ij is the communication delay from task i to task j; traces are shown for data 1, data 2 and data 3 over successive periods.)

Application throughput is the reciprocal of one period (i.e., the average time needed to complete one iteration of the application). Each core uses the period to execute its assigned tasks as well as to communicate with other cores. Since the tasks of an application can be executed in a pipelined and task-parallel fashion on different cores, the overall throughput is determined by the slowest core (i.e., the core that requires the longest time to complete the executions and communications of all its mapped tasks).
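The "slowest core" rule can be sketched in a few lines. This is an illustrative sketch only: the function name `period` and all numbers are made up, and it assumes that an inter-core delay is charged to both the sending and the receiving core while an intra-core delay is charged once to the shared core. The paper does not spell out this accounting, so treat it as one possible reading of the execution traces.

```python
def period(mapping, x, ca, ce, n_cores):
    """Estimate the period (cycles) of one mapping and the per-core loads.

    mapping: task id -> core id
    x:       task id -> profiled execution delay
    ca, ce:  edge (i, j) -> profiled intra-core / inter-core delay
    Assumption: inter-core delay is added to both endpoint cores.
    """
    load = [0] * n_cores
    for task, core in mapping.items():
        load[core] += x[task]
    for (i, j) in ca:  # ca and ce are keyed by the same edges
        ki, kj = mapping[i], mapping[j]
        if ki == kj:
            load[ki] += ca[(i, j)]
        else:
            load[ki] += ce[(i, j)]
            load[kj] += ce[(i, j)]
    return max(load), load


# Made-up profile: three tasks in a chain 0 -> 1 -> 2 on two cores.
x = {0: 100, 1: 200, 2: 150}
ca = {(0, 1): 10, (1, 2): 20}
ce = {(0, 1): 40, (1, 2): 60}
result = period({0: 0, 1: 0, 2: 1}, x, ca, ce, n_cores=2)
print(result)  # (370, [370, 210])
```

Here core 0 carries 100 + 200 cycles of execution, 10 cycles of intra-core copying and 60 cycles of inter-core communication, so it is the slowest core and sets the period.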
3.3.2. Analytical Evaluation

Given a graph representation of an application with N_T tasks, mapping exploration in this work is the process of assigning all tasks to the available N_C cores to maximize the system throughput. Based on the execution trace analysis, the throughput of a mapped application is the reciprocal of the period. The period is defined as the maximum total required time (among all cores) for execution and communication, and is formulated in Equation 1. The total required time of each core is the sum of the profiled execution delays of all tasks assigned to it and all the profiled intra-core and inter-core communication delays associated with it.

\[
\frac{1}{\text{throughput}} = \text{period} = \max_{1 \le k \le N_C} (X_k + C_k)
= \max_{1 \le k \le N_C} \left( \sum_{i=0}^{N_T} x_i \, m_{t,ik} + \sum_{i=0}^{N_T} \sum_{j=0}^{N_T} c_{a,ij} \, m_{a,ijk} + \sum_{i=0}^{N_T} \sum_{j=0}^{N_T} c_{e,ij} \, m_{e,ijk} \right) \qquad (1)
\]

where i and j are task identifiers and k is the core identifier. X_k denotes the total execution delay of the tasks mapped to core k, which is the sum of the profiled execution delays x_i of every task t_i mapped to core k (m_{t,ik} = 1). C_k denotes the total communication delay associated with core k, which is the sum of every profiled intra-core communication delay c_{a,ij} for which intra-core communication between tasks t_i and t_j exists in core k (m_{a,ijk} = 1), and every profiled inter-core communication delay c_{e,ij} for which inter-core communication between tasks t_i and t_j is associated with core k (m_{e,ijk} = 1).

4. Results and Discussion

The proposed analytical-based mapping is demonstrated by employing the exhaustive brute force method to map several random application graphs and find the mapping that produces the best application throughput. The best mapping obtained by the proposed analytical-based exhaustive (AE) method is compared with the emulation-based exhaustive (EE) solution (i.e., the ground truth), which is obtained by executing all mappings on an FPGA emulation system.
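An exhaustive exploration of this kind can be sketched as a loop over every task-to-core assignment that keeps the assignment with the smallest estimated period. The sketch below is not the authors' code: `exhaustive_best` and `exec_only_period` are hypothetical names, the search enumerates all N_C^N_T assignments without any symmetry reduction (which may differ from how the paper counts mappings), and the demonstration cost function considers execution delays only to keep the example short.

```python
from itertools import product


def exhaustive_best(n_tasks, n_cores, evaluate):
    """Try every assignment of tasks to cores; return (best mapping, period).

    `evaluate` maps a tuple (core of task 0, core of task 1, ...) to an
    estimated period in cycles. Any analytical period model can be plugged in.
    """
    best_mapping, best_period = None, float("inf")
    for assignment in product(range(n_cores), repeat=n_tasks):
        p = evaluate(assignment)
        if p < best_period:  # strict '<' keeps the first of equal-cost mappings
            best_mapping, best_period = assignment, p
    return best_mapping, best_period


# Made-up execution delays; communication is ignored in this toy evaluator.
x = [300, 200, 100]


def exec_only_period(assignment, n_cores=2):
    load = [0] * n_cores
    for task, core in enumerate(assignment):
        load[core] += x[task]
    return max(load)


best, p = exhaustive_best(3, 2, exec_only_period)
print(best, p)  # (0, 1, 1) 300
```

Because each candidate is scored by a cheap arithmetic model rather than by an emulation run, the per-mapping cost of such a loop is small, which is the effect the evaluation-time comparison in this section is meant to show.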
Figure 6a shows the throughput of all possible mappings for the proposed analytical-based application mapping and the emulation-based mapping. The proposed analytical-based mapping produces results with a similar trend to the ground truth solution. Figure 6b compares the emulation-based throughput with the analytical-based solution. The throughput of the proposed analytical-based mapping closely matches the emulation-based throughput, as the points are scattered close to the straight line of gradient one.

Figure 6. Throughput comparison between emulation-based exhaustive (EE) and proposed analytical-based exhaustive (AE) mapping in design space exploration of a 7-task application. (a) Throughput of all possible mappings for the proposed analytical-based exhaustive (AE) and emulation-based exhaustive (EE) mapping (sorted based on EE); axes: mappings (sorted with EE) versus 1/throughput (cycles). (b) Emulation-based versus analytical-based exhaustive throughput; axes: EE 1/throughput (cycles) versus AE 1/throughput (cycles).

Table 1 compares the proposed analytical-based exhaustive method with the emulation-based exhaustive method (ground truth) for applications with different random graphs. The best throughput (BT) is obtained by emulating, on FPGA, the best mapping produced by each method. The evaluation time (ET) is the time taken to perform the exhaustive evaluation of all possible mappings, while the profiling time (PT) is the time taken to profile the tasks based on the proposed method. The results illustrate that the evaluation time of both exhaustive methods increases exponentially with the number of possible mappings. The proposed analytical-based method is able to evaluate all mappings in a much shorter time despite requiring extra profiling time. However, the profiling time does not increase with the number of mappings, as it corresponds to the number of profiles required, that is, the total number of tasks and edges in the application graph.

Table 1. Comparison in throughput and evaluation time between proposed analytical-based (AE) and emulation-based exhaustive (EE) for application of different random graphs

Application     No. of tasks   EE BT   EE ET        AE BT   AE PT      AE ET
random1(r7)     5              87      607.61 s     87      252.21 s   0.03 s
random2(r8)     6              401     2912.33 s    401     296.10 s   0.15 s
random4(r2)     7              106     15745.43 s   106     309.34 s   0.96 s
random5(r4)     7              875     16143.52 s   875     276.82 s   0.91 s

BT: Best Throughput; ET: Evaluation Time; PT: Profiling Time

5. Conclusion

An efficient mapping algorithm produces high-quality solutions in a short evaluation time. This paper presents an application mapping methodology that utilizes FPGA emulation as an evaluation platform to optimize the application throughput. Applications are analysed and profiled in the emulated environment to obtain their practical implemented costs in terms of execution, intra-core communication and inter-core communication delays. Application mapping utilizes the profiled costs to find the optimal mapping through execution trace analysis and analytical evaluation. Results show that the proposed analytical-based exhaustive method, based on the profiling, produces a best throughput similar to that of the ground truth solutions in a shorter evaluation time.

Acknowledgement

This work is supported in part by the Collaborative Research in Engineering, Science & Technology (CREST) grant P17C114 (UTM vote no. 4B176) and Universiti Teknologi Malaysia Matching grant (UTM vote no. 00M75).

References

[1] R. Marculescu, U. Y. Ogras, L.-S. Peh, N. E. Jerger, and Y. Hoskote, “Outstanding research problems in NoC design: system, microarchitecture, and circuit perspectives,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 28, no. 1, pp. 3–21, 2009.
[2] A. K. Singh, M. Shafique, A. Kumar, and J. Henkel, “Mapping on multi/many-core systems: survey of current and emerging trends,” in Proceedings of the 50th Annual Design Automation Conference. ACM, 2013, p. 1.
[3] P. K. Sahu and S. Chattopadhyay, “A survey on application mapping strategies for network-on-chip design,” Journal of Systems Architecture, vol. 59, no. 1, pp. 60–76, 2013.
[4] R. Piscitelli and A. D. Pimentel, “Design space pruning through hybrid analysis in system-level design space exploration,” in Design, Automation & Test in Europe Conference & Exhibition (DATE), 2012. IEEE, 2012, pp. 781–786.
[5] J. Hu and R. Marculescu, “Energy- and performance-aware mapping for regular NoC architectures,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 24, no. 4, pp. 551–562, 2005.
[6] K. Srinivasan, K. S. Chatha, and G. Konjevod, “Linear-programming-based techniques for synthesis of network-on-chip architectures,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 14, no. 4, pp. 407–420, 2006.
[7] S. Tosun, “Cluster-based application mapping method for network-on-chip,” Advances in Engineering Software, vol. 42, no. 10, pp. 868–874, 2011.
[8] P. K. Sahu, T. Shah, K. Manna, and S. Chattopadhyay, “Application mapping onto mesh-based network-on-chip using discrete particle swarm optimization,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 22, no. 2, pp. 300–312, 2014.
[9] Y. Z. Tei, Y. W. Hau, N. Shaikh-Husin, and M. N. Marsono, “Network partitioning domain knowledge multiobjective application mapping for large-scale network-on-chip,” Applied Computational Intelligence and Soft Computing, vol. 2014, p. 9, 2014.
[10] E. L. de Souza Carvalho, N. L. V. Calazans, and F. G. Moraes, “Dynamic task mapping for MPSoCs,” IEEE Design & Test of Computers, vol. 27, no. 5, pp. 26–35, 2010.
[11] A. K. Singh, T. Srikanthan, A. Kumar, and W. Jigang, “Communication-aware heuristics for run-time task mapping on NoC-based MPSoC platforms,” Journal of Systems Architecture, vol. 56, no. 7, pp. 242–255, 2010.
[12] M. Fattah, A.-M. Rahmani, T. C. Xu, A. Kanduri, P. Liljeberg, J. Plosila, and H. Tenhunen, “Mixed-criticality run-time task mapping for NoC-based many-core systems,” in 2014 22nd Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP). IEEE, 2014, pp. 458–465.
[13] T. D. Ngo, K. J. Martin, and J.-P. Diguet, “Move based algorithm for runtime mapping of dataflow actors on heterogeneous MPSoCs,” Journal of Signal Processing Systems, pp. 1–18, 2015.
[14] S. Stuijk, M. Geilen, and T. Basten, “A predictable multiprocessor design flow for streaming applications with dynamic behaviour,” in 2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools (DSD). IEEE, 2010, pp. 548–555.
[15] L. Schor, I. Bacivarov, D. Rai, H. Yang, S.-H. Kang, and L. Thiele, “Scenario-based design flow for mapping streaming applications onto on-chip many-core systems,” in Proceedings of the 2012 International Conference on Compilers, Architectures and Synthesis for Embedded Systems. ACM, 2012, pp. 71–80.
[16] C. Lee, S. Kim, and S. Ha, “Efficient run-time resource management of a manycore accelerator for stream-based applications,” in 2013 IEEE 11th Symposium on Embedded Systems for Real-time Multimedia (ESTIMedia). IEEE, 2013, pp. 51–60.
[17] W. Quan and A. D. Pimentel, “A scenario-based run-time task mapping algorithm for MPSoCs,” in Proceedings of the 50th Annual Design Automation Conference. ACM, 2013, p. 131.
[18] A. K. Singh, A. Kumar, and T. Srikanthan, “Accelerating throughput-aware runtime mapping for heterogeneous MPSoCs,” ACM Transactions on Design Automation of Electronic Systems (TODAES), vol. 18, no. 1, p. 9, 2013.
[19] W. Quan and A. D. Pimentel, “A hybrid task mapping algorithm for heterogeneous MPSoCs,” ACM Transactions on Embedded Computing Systems (TECS), vol. 14, no. 1, p. 14, 2015.
[20] A. Goens and J. Castrillon, “Analysis of process traces for mapping dynamic KPN applications to MPSoCs,” in IFIP International Embedded Systems Symposium (IESS), 2015.
[21] A. K. Singh, M. Shafique, A. Kumar, and J. Henkel, “Resource and throughput aware execution trace analysis for efficient run-time mapping on MPSoCs,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 35, no. 1, pp. 72–85, 2016.
[22] ProNoC. (2016) Prototype NoC based MPSoC. Opencore. [Online]. Available: http://opencores.org/project,an-fpga-implementation-of-low-latency-noc-based-mpsoc