The advent of multi-core architectures has strongly influenced the area of high performance computing.
Parallel compilation is an area that could still benefit significantly from this architecture.
Recent research has shown some improvement in the lexical-analysis phase, but it is difficult to apply
the same technique to the parsing phase. This paper highlights some issues related to the implementation
of parallel parsing on multi-core machines.
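The asymmetry can be illustrated with a small sketch: tokenization parallelizes because chunks split on safe boundaries can be scanned independently, whereas a parser needs context from the whole token stream. The sketch below (the token grammar and whitespace-based chunking are illustrative assumptions, not taken from the paper) distributes lexing over worker processes with Python's multiprocessing:

```python
from multiprocessing import Pool
import re

# A toy token grammar: integers, identifiers, single punctuation marks.
TOKEN = re.compile(r"\d+|[A-Za-z_]\w*|[^\s\w]")

def tokenize(chunk):
    # Lexical analysis of one chunk; chunks must be split on safe
    # boundaries (here: whitespace) so no token straddles two chunks.
    return TOKEN.findall(chunk)

def parallel_lex(source, workers=4):
    # Each chunk is lexed independently; results are concatenated
    # in input order, so the output is deterministic.
    chunks = source.split()
    with Pool(workers) as pool:
        return [t for toks in pool.map(tokenize, chunks) for t in toks]

if __name__ == "__main__":
    print(parallel_lex("x = foo(42) + bar1;"))
    # ['x', '=', 'foo', '(', '42', ')', '+', 'bar1', ';']
```

No such chunking trick exists for parsing: a production may span arbitrarily many tokens, so a chunk cannot be parsed without the state left by everything before it.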
Towards high performance computing (HPC) through parallel programming paradigm... (ijpla)
Nowadays, we need to find solutions to huge computing problems very rapidly. This brings in the idea of parallel computing, in which several machines or processors work cooperatively on computational tasks. Over the past decades there have been many shifts in the perceived importance of parallelism in computing machines, and parallel computing has been observed to be a superior solution to many computing limitations, such as speed and density; non-recurring, high costs; and power consumption and heat dissipation. Commercial multiprocessors have emerged at lower prices than mainframe and supercomputer machines. In this article, high performance computing (HPC) through parallel programming paradigms (PPPs) is discussed along with their constructs and design approaches.
Concurrent Matrix Multiplication on Multi-core Processors (CSCJournals)
With the advent of multi-cores, every processor has built-in parallel computational power, which can be fully utilized only if the program in execution is written accordingly. This study is part of ongoing research on the design of a new parallel programming model for multi-core architectures. In this paper we present a simple, highly efficient and scalable implementation of a common matrix multiplication algorithm using a newly developed parallel programming model, SPC3 PM, for general-purpose multi-core processors. Our study finds that matrix multiplication performed concurrently on multi-cores using SPC3 PM requires much less execution time than with current standard parallel programming environments such as OpenMP. Our approach also shows better scalability, more uniform speedup and better utilization of available cores than the same algorithm written using standard OpenMP or similar parallel programming tools. We tested our approach on up to 24 cores with matrix sizes varying from 100 x 100 to 10000 x 10000 elements, and across all these tests the proposed approach showed much improved performance and scalability.
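Neither SPC3 PM nor the authors' code is reproduced here; as a minimal illustration of why matrix multiplication parallelizes so well, the sketch below computes independent rows of the product in parallel with Python's multiprocessing (the function names are our own):

```python
from multiprocessing import Pool

def row_times_matrix(args):
    # Compute one row of the product C = A * B.
    row, B = args
    return [sum(a * B[k][j] for k, a in enumerate(row))
            for j in range(len(B[0]))]

def parallel_matmul(A, B, workers=4):
    # Each worker computes one or more rows of C independently;
    # rows share no output cells, so no locking is needed.
    with Pool(workers) as pool:
        return pool.map(row_times_matrix, [(row, B) for row in A])

if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    print(parallel_matmul(A, B))  # [[19, 22], [43, 50]]
```

Because the row computations are fully independent, speedup is limited mainly by how evenly rows are distributed across cores and by memory bandwidth, which is what makes the algorithm a common scalability benchmark.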
Parallelization of the LBG Vector Quantization Algorithm for Shared Memory Sy... (CSCJournals)
This paper proposes a parallel approach to the Vector Quantization (VQ) problem in image processing. VQ deals with generating a codebook from the input training data set and replacing any arbitrary data with the nearest codevector. Most efforts in VQ have been directed towards designing parallel search algorithms for the codebook, and little has hitherto been done on a parallelized procedure to obtain an optimum codebook. This parallel algorithm addresses the problem of designing an optimum codebook using the traditional LBG type of vector quantization algorithm, for shared memory systems and for the efficient usage of parallel processors. Using the codebook formed from a training set, any arbitrary input data is replaced with the nearest codevector from the codebook. The effectiveness of the proposed algorithm is demonstrated.
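As a rough sketch of the underlying LBG/Lloyd iteration (scalar data for brevity; the names are illustrative and this is not the paper's parallel implementation), note that the per-sample nearest-codevector assignments are mutually independent, which is precisely the step a shared-memory parallelization can distribute across processors:

```python
def nearest(codebook, x):
    # Index of the codevector closest to sample x (squared error).
    return min(range(len(codebook)), key=lambda i: (codebook[i] - x) ** 2)

def lbg_codebook(training, size, iters=20):
    # Classic LBG/Lloyd iteration: assign each training sample to its
    # nearest codevector, then move each codevector to the centroid of
    # its assigned samples. The assignment loop is the parallelizable
    # part: each sample's nearest-neighbour search is independent.
    codebook = training[:size]  # naive initialization for illustration
    for _ in range(iters):
        cells = [[] for _ in codebook]
        for x in training:
            cells[nearest(codebook, x)].append(x)
        codebook = [sum(c) / len(c) if c else codebook[i]
                    for i, c in enumerate(cells)]
    return codebook

# Two natural clusters around 1.0 and 5.0 yield a two-entry codebook:
print(lbg_codebook([1.0, 1.1, 0.9, 5.0, 5.2, 4.8], size=2))
```

Real VQ operates on vectors (image blocks) rather than scalars, but the structure of the iteration, and hence of the parallelization opportunity, is the same.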
Accelerating Real Time Applications on Heterogeneous Platforms (IJMER)
In this paper we describe novel implementations of depth estimation from stereo images using
feature extraction algorithms that run on the graphics processing unit (GPU), which is suitable for
real-time applications like analyzing video in real-time vision systems. Modern graphics cards contain
a large number of parallel processors and high-bandwidth memory for accelerating data computation
operations. In this paper we give a general idea of how to accelerate real-time applications using
heterogeneous platforms. We propose to use the added resources to handle more computationally
involved optimization methods. The proposed approach will indirectly accelerate a database by
producing better plan quality.
IMPERATIVE PROGRAMS BEHAVIOR SIMULATION IN TERMS OF COMPOSITIONAL PETRI NETS (IJCNCJournal)
The article considers a generation mechanism for compositional models simulating the behavior of imperative programs in terms of Petri nets. The mechanism of program model generation consists of two main stages. At the first stage, the structure of the program is prepared using program elements such as libraries, functions, and links between functions. At the second stage, the content of function bodies is generated on the basis of template constructions. The article gives some example templates for semantic constructions of an imperative programming language with their descriptions, and demonstrates the generation of an example program model in terms of Petri nets.
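A toy illustration of the target formalism, assuming nothing about the authors' templates: an imperative sequence "a; b" can be simulated as a two-transition Petri net in a few lines, with places holding tokens and transitions firing when all their input places are marked:

```python
def enabled(marking, transition):
    # A transition is enabled when every input place holds a token.
    inputs, _ = transition
    return all(marking[p] > 0 for p in inputs)

def fire(marking, transition):
    # Consume one token from each input place, produce one in each output.
    inputs, outputs = transition
    m = dict(marking)
    for p in inputs:
        m[p] -= 1
    for p in outputs:
        m[p] += 1
    return m

# A toy net modelling the sequential statement "a; b":
#   start -> [stmt_a] -> mid -> [stmt_b] -> end
marking = {"start": 1, "mid": 0, "end": 0}
stmt_a = (["start"], ["mid"])
stmt_b = (["mid"], ["end"])

marking = fire(marking, stmt_a)   # statement a executes
marking = fire(marking, stmt_b)   # then statement b
print(marking)  # {'start': 0, 'mid': 0, 'end': 1}
```

Branches and loops are modelled the same way, by giving a transition several input or output places; composing such fragments per function body mirrors the two-stage generation the abstract describes.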
Over time, machine learning inference workloads have become more and more demanding in terms of latency and throughput. Moreover, many inference workloads compute predictions from a limited number of models deployed in the system. This scenario leaves large room for runtime and memory optimizations, which current systems fall short of exploiting because they employ a black-box view of ML models and tasks.
In contrast, Pretzel adopts a white-box description of ML models, which allows the framework to perform optimizations over deployed models and running tasks, saving memory and increasing overall system performance. In particular, Pretzel can properly schedule ML jobs on NUMA machines, whose complexities may affect latency and efficiency.
In this talk we will show the motivations behind Pretzel, its current design, and possible future developments.
DYNAMIC TASK PARTITIONING MODEL IN PARALLEL COMPUTING (cscpconf)
Parallel computing systems compose task partitioning strategies in a true multiprocessing
manner. Such systems share the algorithm and processing unit as computing resources, which
leads to high inter-process communication demands. The main part of the proposed algorithm
is the resource management unit, which performs task partitioning and co-scheduling. In this
paper, we present a technique for integrated task partitioning and co-scheduling on a privately
owned network, focusing on real-time and non-preemptive systems. A large variety of
experiments have been conducted on the proposed algorithm using synthetic and real tasks.
The goal of the computation model is to provide a realistic representation of the costs of
programming. The results show the benefit of the task partitioning. The main characteristics
of our method are optimal scheduling and a strong link between partitioning, scheduling and
communication. Some important models for task partitioning are also discussed in the paper.
We target an algorithm for task partitioning that improves inter-process communication
between tasks and uses the resources of the system in an efficient manner. The proposed
algorithm contributes to minimizing the inter-process communication cost amongst the
executing processes.
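The paper's own partitioning algorithm is not reproduced above; as a generic baseline for what task partitioning across processing units looks like, the sketch below uses the classic longest-processing-time greedy heuristic (purely illustrative names, no relation to the proposed resource management unit):

```python
import heapq

def partition_tasks(costs, workers):
    # Greedy LPT heuristic: repeatedly assign the largest remaining
    # task to the currently least-loaded worker. A min-heap keyed on
    # load makes each assignment O(log workers).
    loads = [(0, w, []) for w in range(workers)]
    heapq.heapify(loads)
    for cost in sorted(costs, reverse=True):
        load, w, tasks = heapq.heappop(loads)
        heapq.heappush(loads, (load + cost, w, tasks + [cost]))
    return sorted(loads, key=lambda entry: entry[1])

# Five tasks on two workers balance to loads of 10 and 10:
for load, w, tasks in partition_tasks([7, 5, 4, 3, 1], workers=2):
    print(f"worker {w}: load {load}, tasks {tasks}")
```

A real partitioner like the one described would additionally weigh inter-process communication costs between tasks, not just per-task compute cost.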
High Performance Parallel Computing with Clouds and Cloud Technologies (jaliyae)
Infrastructure services (Infrastructure-as-a-Service), provided by cloud vendors, allow any user to provision a large number of compute instances fairly easily. Whether leased from public clouds or allocated from private clouds, utilizing these virtual resources to perform data- and compute-intensive analyses requires employing different parallel runtimes to implement such applications. Among the many parallelizable problems, most "pleasingly parallel" applications can be run fairly easily using MapReduce technologies such as Hadoop, CGL-MapReduce, and Dryad. However, many scientific applications with complex communication patterns still require the low-latency communication mechanisms and rich set of communication constructs offered by runtimes such as MPI. In this paper, we first discuss large-scale data analysis using different MapReduce implementations, and then present a performance analysis of high performance parallel applications on virtualized resources.
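A minimal illustration of the "pleasingly parallel" MapReduce pattern mentioned above (a word count, not one of the paper's benchmarks): chunks are mapped independently, with no communication between workers, and partial results are merged in a single reduce step:

```python
from multiprocessing import Pool
from collections import Counter
from functools import reduce

def map_count(chunk):
    # Map step: count words within one independent chunk.
    return Counter(chunk.split())

def word_count(chunks, workers=2):
    # Pleasingly parallel: no worker needs another worker's data,
    # so the only synchronization point is the final reduce.
    with Pool(workers) as pool:
        partials = pool.map(map_count, chunks)
    return reduce(lambda a, b: a + b, partials, Counter())

if __name__ == "__main__":
    print(word_count(["map reduce map", "reduce reduce"]))
    # Counter({'reduce': 3, 'map': 2})
```

Applications with tight communication patterns (halo exchanges, collective reductions at every iteration) do not fit this shape, which is why the paper contrasts MapReduce runtimes with MPI.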
Affect of parallel computing on multicore processors (csandit)
The main aim of our research is to find the limit of Amdahl's Law for multicore processors, so
that the number of cores gives more efficiency to the overall architecture of the CMP (Chip
Multi-Processor, a.k.a. multicore processor). This limit is expected to lie either in the
architecture of the multicore processor or in the programming. We surveyed the architectures
of multicore processors from various chip manufacturers, namely INTEL™, AMD™, IBM™, etc., and
the various techniques they follow for improving the performance of multicore processors.
We conducted cluster experiments to find this limit. In this paper we propose an alternative
design of a multicore processor based on the results of our cluster experiments.
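For reference, the Amdahl's Law bound the authors probe can be computed directly: with a serial fraction s of the work, the speedup on n cores is 1/(s + (1-s)/n), which saturates at 1/s no matter how many cores are added. A small sketch:

```python
def amdahl_speedup(parallel_fraction, cores):
    # Amdahl's Law: overall speedup on `cores` processors when a
    # fraction `parallel_fraction` of the work can run in parallel.
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

# With 95% parallelizable work, speedup saturates near 1/0.05 = 20x:
for n in (2, 8, 64, 1024):
    print(n, round(amdahl_speedup(0.95, n), 2))
```

This asymptote is the architectural "limit" in question: past a certain core count, reducing the serial fraction matters far more than adding cores.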
OpenGL Based Testing Tool Architecture for Exascale Computing (CSCJournals)
In the next decade, new high performance computing (HPC) architectures, algorithms, and corrections to existing technologies are expected in pursuit of exascale computing power and speed. Parallelism is becoming a core point of emphasis in achieving HPC, and keeping its advantages in view, the GPU is a unit that provides better performance for achieving HPC in an exascale computing system. So far, many programming models have been introduced to program GPUs, such as CUDA, OpenGL, and OpenCL, and these models still have a number of limitations that require a close look to fix. In order to enhance the performance of GPU programming in OpenGL, we propose an OpenGL-based testing tool architecture for an exascale computing system. This testing architecture detects errors in OpenGL code and enforces writing the code in an accurate way.
Possible Worlds Explorer: Datalog & Answer Set Programming for the Rest of Us (Bertram Ludäscher)
Sahil Gupta, Bertram Ludäscher, Jessica Yi-Yun Cheng.
Datalog 2.0: 3rd Workshop on the Resurgence of Datalog in Academia & Industry. Philadelphia Logic Week. June 3-7, 2019, Philadelphia.
All new computers have multicore processors. To exploit this hardware parallelism for improved
performance, the predominant approach today is multithreading using shared variables and locks. This
approach has potential data races that can create a nondeterministic program. This paper presents a
promising new approach to parallel programming that is both lock-free and deterministic. The standard
forall primitive for parallel execution of for-loop iterations is extended into a more highly structured
primitive called a Parallel Operation (POP). Each parallel process created by a POP may read shared
variables (or shared collections) freely. Shared collections modified by a POP must be selected from a
special set of predefined Parallel Access Collections (PAC). Each PAC has several Write Modes that
govern parallel updates in a deterministic way. This paper presents an overview of a Prototype Library
that implements this POP-PAC approach for the C++ language, including performance results for two
benchmark parallel programs.
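The POP-PAC library itself is C++ and not shown in the abstract; as a language-neutral sketch of the underlying idea (illustrative names, not the paper's API), determinism follows when each parallel iteration writes only to its own slot and partial results are combined in a fixed iteration order, independent of thread scheduling:

```python
from concurrent.futures import ThreadPoolExecutor

def pop_forall(items, body, combine, init):
    # Sketch of a deterministic "parallel operation": each iteration
    # produces its result into its own slot (no shared mutable state,
    # hence no locks), and slots are combined in fixed input order,
    # so the outcome never depends on which thread finished first.
    with ThreadPoolExecutor() as pool:
        slots = list(pool.map(body, items))  # map preserves input order
    acc = init
    for s in slots:
        acc = combine(acc, s)
    return acc

# Deterministic, lock-free parallel sum of squares:
total = pop_forall(range(10), lambda i: i * i, lambda a, b: a + b, 0)
print(total)  # 285
```

The PAC Write Modes described in the paper generalize this: instead of a single ordered reduce, each predefined collection type fixes its own deterministic rule for merging parallel updates.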
SENSOR SIGNAL PROCESSING USING HIGH-LEVEL SYNTHESIS AND INTERNET OF THINGS WI... (pijans)
Sensor routers play a crucial role in Internet of Things applications, where the capacity for transmitting the network signal from cloud systems to sensors, and in the reverse direction, is limited. This paper describes a robust framework with various architected layers to process data using high-level synthesis. It is designed to sense nodes instinctively with the help of the Internet of Things, where the applications arise in cloud systems. In this paper, a new four-layer design framework architecture with embedded PEs is proposed to sense the devices of IoT applications with the support of a high-level synthesis DBMF (database management function) tool.
Performance Analysis of Parallel Algorithms on Multi-core System using OpenMP (IJCSEIT Journal)
Current multi-core architectures have become popular due to their performance and efficient
processing of multiple tasks simultaneously. Today's parallel algorithms focus on multi-core
systems. The design of parallel algorithms and the measurement of their performance are the
major issues in a multi-core environment. If one wishes to execute a single application faster,
then the application must be divided into subtasks or threads to deliver the desired result.
Numerical problems, especially the solution of linear systems of equations, have many
applications in science and engineering. This paper describes and analyzes parallel algorithms
for computing the solution of a dense system of linear equations, and for approximately
computing the value of π using the OpenMP interface. The performance (speedup) of the parallel
algorithms on a multi-core system is presented. The experimental results on a multi-core
processor show that the proposed parallel algorithms achieve good performance (speedup)
compared to their sequential counterparts.
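A common OpenMP-style benchmark of this kind approximates π by midpoint-rule integration of 4/(1+x²) over [0,1]. A Python sketch of the same reduction pattern (not the paper's code; chunk sizes and names are illustrative) splits the index range into disjoint chunks, one partial sum per worker:

```python
from multiprocessing import Pool

N = 100_000  # number of rectangles in the midpoint rule

def partial_pi(bounds):
    # Midpoint-rule partial sum of the integral of 4/(1+x^2) over [0,1],
    # restricted to the index range handled by one worker.
    start, stop = bounds
    h = 1.0 / N
    return h * sum(4.0 / (1.0 + ((i + 0.5) * h) ** 2)
                   for i in range(start, stop))

def parallel_pi(workers=4):
    # Disjoint chunks mirror an OpenMP "parallel for" with a
    # reduction(+:pi) clause: partial sums are combined at the end.
    step = N // workers
    chunks = [(w * step, N if w == workers - 1 else (w + 1) * step)
              for w in range(workers)]
    with Pool(workers) as pool:
        return sum(pool.map(partial_pi, chunks))

if __name__ == "__main__":
    print(parallel_pi())  # close to 3.141592653589793
```

The equivalent OpenMP loop carries no inter-iteration dependencies, which is why this benchmark typically scales almost linearly with the core count.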
Dominant block guided optimal cache size estimation to maximize ipc of embedd... (ijesajournal)
Embedded system software is highly constrained from the viewpoints of performance, memory footprint, energy consumption, and implementation cost. It is always desirable to obtain better Instructions per Cycle (IPC), and the instruction cache makes a major contribution to improving IPC. Cache memories are realized on the same chip where the processor is running, which considerably increases the system cost as well. Hence, a trade-off must be maintained between cache size and the performance improvement offered. The number of cache lines and the cache line size are important parameters in cache design. The design space for caches is quite large, and it is time-consuming to execute a given application with different cache sizes on an instruction set simulator (ISS) to figure out the optimal cache size. In this paper, a technique is proposed to identify the number of cache lines and the cache line size for the L1 instruction cache that will offer the best, or nearly the best, IPC. The cache size is derived, at a higher abstraction level, from basic block analysis in the Low Level Virtual Machine (LLVM) environment. The cache size estimated from the LLVM environment is cross-validated by simulating a set of benchmark applications with different cache sizes in SimpleScalar's out-of-order simulator. The proposed method appears superior in terms of estimation accuracy and/or estimation time compared to existing methods for estimating the optimal cache size parameters (cache line size, number of cache lines).
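To make the trade-off concrete, the effect of the two parameters (number of lines, line size) on hit rate can be modelled with a few lines of direct-mapped cache simulation. This is a generic illustration, unrelated to the paper's LLVM-based estimator:

```python
def simulate_cache(addresses, num_lines, line_size):
    # Count hits for a direct-mapped cache: each byte address maps to
    # line (block_number mod num_lines); a hit occurs when the tag
    # (here, the whole block number) stored in that line matches.
    tags = [None] * num_lines
    hits = 0
    for addr in addresses:
        block = addr // line_size
        line = block % num_lines
        if tags[line] == block:
            hits += 1
        else:
            tags[line] = block  # miss: fill the line
    return hits

# A tight loop over 64 bytes of code, executed twice: the first pass
# fills the cache, the second pass hits on every access.
trace = list(range(0, 64, 4)) * 2
hits = simulate_cache(trace, num_lines=4, line_size=16)
print(hits, "of", len(trace), "accesses hit")  # 28 of 32
```

Sweeping `num_lines` and `line_size` over a trace and plotting the hit rate reproduces, in miniature, the design-space exploration that the paper's estimator aims to shortcut.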
ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIONA... (ijnlc)
With recent developments in the field of Natural Language Processing, there has been a rise in
the use of different architectures for Neural Machine Translation. Transformer architectures are
used to achieve state-of-the-art accuracy, but they are very computationally expensive to train.
Not everyone can afford setups with high-end GPUs and other resources. We train our models on
low computational resources and investigate the results. As expected, transformers outperformed
other architectures, but there were some surprising results: transformers with more encoders and
decoders took more time to train yet achieved lower BLEU scores. The LSTM performed well in the
experiment and took comparatively less time to train than the transformers, making it suitable
for use in situations with time constraints.
ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIO...kevig
Â
With the recent developments in the field of Natural Language Processing, there has been a rise in the use
of different architectures for Neural Machine Translation. Transformer architectures are used to achieve
state-of-the-art accuracy, but they are very computationally expensive to train. Everyone cannot have such
setups consisting of high-end GPUs and other resources. We train our models on low computational
resources and investigate the results. As expected, transformers outperformed other architectures, but
there were some surprising results. Transformers consisting of more encoders and decoders took more
time to train but had fewer BLEU scores. LSTM performed well in the experiment and took comparatively
less time to train than transformers, making it suitable to use in situations having time constraints
ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIONA...kevig
Â
With the recent developments in the field of Natural Language Processing, there has been a rise in the use
of different architectures for Neural Machine Translation. Transformer architectures are used to achieve
state-of-the-art accuracy, but they are very computationally expensive to train. Everyone cannot have such
setups consisting of high-end GPUs and other resources. We train our models on low computational
resources and investigate the results. As expected, transformers outperformed other architectures, but
there were some surprising results. Transformers consisting of more encoders and decoders took more
time to train but had fewer BLEU scores. LSTM performed well in the experiment and took comparatively
less time to train than transformers, making it suitable to use in situations having time constraints.
Benchmarking open source deep learning frameworksIJECEIAES
Â
Deep Learning (DL) is one of the hottest ďŹelds. To foster the growth of DL, several open source frameworks appeared providing implementations of the most common DL algorithms. These frameworks vary in the algorithms they support and in the quality of their implementations. The purpose of this work is to provide a qualitative and quantitative comparison among three such frameworks: TensorFlow, Theano and CNTK. To ensure that our study is as comprehensive as possible, we consider multiple benchmark datasets from different ďŹelds (image processing, NLP, etc.) and measure the performance of the frameworksâ implementations of different DL algorithms. For most of our experiments, we ďŹnd out that CNTKâs implementations are superior to the other ones under consideration.
An octa core processor with shared memory and message-passingeSAT Journals
Â
Abstract This being the era of fast, high performance computing, there is the need of having efficient optimizations in the processor architecture and at the same time in memory hierarchy too. Each and every day, the advancement of applications in communication and multimedia systems are compelling to increase number of cores in the main processor viz., dual-core, quad-core, octa-core and so on. But, for enhancing the overall performance of multi processor chip, there are stringent requirements to improve inter-core synchronization. Thus, a MPSoC with 8-cores supporting both message-passing and shared-memory inter-core communication mechanisms is implemented on Virtex 5 LX110T FPGA. Each core is based on MIPS III (Microprocessor without interlocked pipelined stages) ISA, handling only integer type instructions and having six-stage pipeline with data hazard detection unit and forwarding logic. The eight processing cores and one central shared memory core are inter connected using 3x3 2-D mesh topology based Network-on-chip (NoC) with virtual channel router. The router is four stage pipelined supporting DOR X-Y routing algorithm and with round robin arbitration technique. For verification and functionality test of above fully synthesized multi core processor, matrix multiplication operation is mapped onto the above said. Partitioning and scheduling of multiple multiplications and addition for each element of resultant matrix has been done accordingly among eight cores to get maximum throughput. All the codes for processor design are written in Verilog HDL. Keywords: MPSoC, message-passing, shared memory, MIPS, ISA, wormhole router, network-on-chip, SIMD, data level parallelism, 2-D Mesh, virtual channel
Professional air quality monitoring systems provide immediate, on-site data for analysis, compliance, and decision-making.
Monitor common gases, weather parameters, particulates.
Richard's entangled aventures in wonderlandRichard Gill
Â
Since the loophole-free Bell experiments of 2020 and the Nobel prizes in physics of 2022, critics of Bell's work have retreated to the fortress of super-determinism. Now, super-determinism is a derogatory word - it just means "determinism". Palmer, Hance and Hossenfelder argue that quantum mechanics and determinism are not incompatible, using a sophisticated mathematical construction based on a subtle thinning of allowed states and measurements in quantum mechanics, such that what is left appears to make Bell's argument fail, without altering the empirical predictions of quantum mechanics. I think however that it is a smoke screen, and the slogan "lost in math" comes to my mind. I will discuss some other recent disproofs of Bell's theorem using the language of causality based on causal graphs. Causal thinking is also central to law and justice. I will mention surprising connections to my work on serial killer nurse cases, in particular the Dutch case of Lucia de Berk and the current UK case of Lucy Letby.
Nutraceutical market, scope and growth: Herbal drug technologyLokesh Patil
Â
As consumer awareness of health and wellness rises, the nutraceutical marketâwhich includes goods like functional meals, drinks, and dietary supplements that provide health advantages beyond basic nutritionâis growing significantly. As healthcare expenses rise, the population ages, and people want natural and preventative health solutions more and more, this industry is increasing quickly. Further driving market expansion are product formulation innovations and the use of cutting-edge technology for customized nutrition. With its worldwide reach, the nutraceutical industry is expected to keep growing and provide significant chances for research and investment in a number of categories, including vitamins, minerals, probiotics, and herbal supplements.
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...Scintica Instrumentation
Â
Intravital microscopy (IVM) is a powerful tool utilized to study cellular behavior over time and space in vivo. Much of our understanding of cell biology has been accomplished using various in vitro and ex vivo methods; however, these studies do not necessarily reflect the natural dynamics of biological processes. Unlike traditional cell culture or fixed tissue imaging, IVM allows for the ultra-fast high-resolution imaging of cellular processes over time and space and were studied in its natural environment. Real-time visualization of biological processes in the context of an intact organism helps maintain physiological relevance and provide insights into the progression of disease, response to treatments or developmental processes.
In this webinar we give an overview of advanced applications of the IVM system in preclinical research. IVIM technology is a provider of all-in-one intravital microscopy systems and solutions optimized for in vivo imaging of live animal models at sub-micron resolution. The systemâs unique features and user-friendly software enables researchers to probe fast dynamic biological processes such as immune cell tracking, cell-cell interaction as well as vascularization and tumor metastasis with exceptional detail. This webinar will also give an overview of IVM being utilized in drug development, offering a view into the intricate interaction between drugs/nanoparticles and tissues in vivo and allows for the evaluation of therapeutic intervention in a variety of tissues and organs. This interdisciplinary collaboration continues to drive the advancements of novel therapeutic strategies.
Introduction:
RNA interference (RNAi) or Post-Transcriptional Gene Silencing (PTGS) is an important biological process for modulating eukaryotic gene expression.
It is highly conserved process of posttranscriptional gene silencing by which double stranded RNA (dsRNA) causes sequence-specific degradation of mRNA sequences.
dsRNA-induced gene silencing (RNAi) is reported in a wide range of eukaryotes ranging from worms, insects, mammals and plants.
This process mediates resistance to both endogenous parasitic and exogenous pathogenic nucleic acids, and regulates the expression of protein-coding genes.
What are small ncRNAs?
micro RNA (miRNA)
short interfering RNA (siRNA)
Properties of small non-coding RNA:
Involved in silencing mRNA transcripts.
Called âsmallâ because they are usually only about 21-24 nucleotides long.
Synthesized by first cutting up longer precursor sequences (like the 61nt one that Lee discovered).
Silence an mRNA by base pairing with some sequence on the mRNA.
Discovery of siRNA?
The first small RNA:
In 1993 Rosalind Lee (Victor Ambros lab) was studying a non- coding gene in C. elegans, lin-4, that was involved in silencing of another gene, lin-14, at the appropriate time in the
development of the worm C. elegans.
Two small transcripts of lin-4 (22nt and 61nt) were found to be complementary to a sequence in the 3' UTR of lin-14.
Because lin-4 encoded no protein, she deduced that it must be these transcripts that are causing the silencing by RNA-RNA interactions.
Types of RNAi ( non coding RNA)
MiRNA
Length (23-25 nt)
Trans acting
Binds with target MRNA in mismatch
Translation inhibition
Si RNA
Length 21 nt.
Cis acting
Bind with target Mrna in perfect complementary sequence
Piwi-RNA
Length ; 25 to 36 nt.
Expressed in Germ Cells
Regulates trnasposomes activity
MECHANISM OF RNAI:
First the double-stranded RNA teams up with a protein complex named Dicer, which cuts the long RNA into short pieces.
Then another protein complex called RISC (RNA-induced silencing complex) discards one of the two RNA strands.
The RISC-docked, single-stranded RNA then pairs with the homologous mRNA and destroys it.
THE RISC COMPLEX:
RISC is large(>500kD) RNA multi- protein Binding complex which triggers MRNA degradation in response to MRNA
Unwinding of double stranded Si RNA by ATP independent Helicase
Active component of RISC is Ago proteins( ENDONUCLEASE) which cleave target MRNA.
DICER: endonuclease (RNase Family III)
Argonaute: Central Component of the RNA-Induced Silencing Complex (RISC)
One strand of the dsRNA produced by Dicer is retained in the RISC complex in association with Argonaute
ARGONAUTE PROTEIN :
1.PAZ(PIWI/Argonaute/ Zwille)- Recognition of target MRNA
2.PIWI (p-element induced wimpy Testis)- breaks Phosphodiester bond of mRNA.)RNAse H activity.
MiRNA:
The Double-stranded RNAs are naturally produced in eukaryotic cells during development, and they have a key role in regulating gene expression .
The increased availability of biomedical data, particularly in the public domain, offers the opportunity to better understand human health and to develop effective therapeutics for a wide range of unmet medical needs. However, data scientists remain stymied by the fact that data remain hard to find and to productively reuse because data and their metadata i) are wholly inaccessible, ii) are in non-standard or incompatible representations, iii) do not conform to community standards, and iv) have unclear or highly restricted terms and conditions that preclude legitimate reuse. These limitations require a rethink on data can be made machine and AI-ready - the key motivation behind the FAIR Guiding Principles. Concurrently, while recent efforts have explored the use of deep learning to fuse disparate data into predictive models for a wide range of biomedical applications, these models often fail even when the correct answer is already known, and fail to explain individual predictions in terms that data scientists can appreciate. These limitations suggest that new methods to produce practical artificial intelligence are still needed.
In this talk, I will discuss our work in (1) building an integrative knowledge infrastructure to prepare FAIR and "AI-ready" data and services along with (2) neurosymbolic AI methods to improve the quality of predictions and to generate plausible explanations. Attention is given to standards, platforms, and methods to wrangle knowledge into simple, but effective semantic and latent representations, and to make these available into standards-compliant and discoverable interfaces that can be used in model building, validation, and explanation. Our work, and those of others in the field, creates a baseline for building trustworthy and easy to deploy AI models in biomedicine.
Bio
Dr. Michel Dumontier is the Distinguished Professor of Data Science at Maastricht University, founder and executive director of the Institute of Data Science, and co-founder of the FAIR (Findable, Accessible, Interoperable and Reusable) data principles. His research explores socio-technological approaches for responsible discovery science, which includes collaborative multi-modal knowledge graphs, privacy-preserving distributed data mining, and AI methods for drug discovery and personalized medicine. His work is supported through the Dutch National Research Agenda, the Netherlands Organisation for Scientific Research, Horizon Europe, the European Open Science Cloud, the US National Institutes of Health, and a Marie-Curie Innovative Training Network. He is the editor-in-chief for the journal Data Science and is internationally recognized for his contributions in bioinformatics, biomedical informatics, and semantic technologies including ontologies and linked data.
International Journal of Computer Science, Engineering and Information Technology (IJCSEIT), Vol. 4, No. 5, October 2014
DOI : 10.5121/ijcseit.2014.4505
ISSUES IN IMPLEMENTATION OF PARALLEL
PARSING ON MULTI-CORE MACHINES
Amit Barve1 and Brijendra Kumar Joshi2
1 Asst. Professor, CSE, VIIT Pune, India
2 Professor, MCTE, Mhow, India
ABSTRACT
The advent of multi-core architectures has strongly influenced the field of high performance computing. Parallel compilation is an area that still stands to gain significantly from these architectures. Recent research has shown some improvement in the lexical analysis phase, but it is difficult to apply the same technique to the parsing phase. This paper highlights some issues related to the implementation of parallel parsing on multi-core machines.
KEYWORDS
Syntax Analysis, Parallel Parsing, Multi-core Machines.
1. INTRODUCTION
A compiler is a program that translates a source language into a target language. A compiler is organized as several phases. The first phase is lexical analysis, or scanning; it is the only phase that interacts with the original source code written by the programmer. It takes a stream of characters as input and produces tokens of the form {token name, attribute value} as output. The component that performs this task is called the lexical analyzer, or scanner. Lex [1] and Flex [2] are two popular tools for automatically generating lexical analyzers from specifications.

Information about tokens is saved in a special data structure called the symbol table. The tokens are then forwarded to the next phase, syntax analysis, also known as parsing. Parsing is an important phase of compilation: it takes the stream of tokens produced by the lexical analyzer as input and converts it into parse trees. A parse tree is a structural representation of the grammar being parsed. The tool that performs this task is known as a parser. Parsers can be generated automatically by YACC [3] and Bison [4], which take grammar specifications as input and produce parsers.
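As a small sketch of the {token name, attribute value} pairs described above, the following Python fragment tokenizes a short input; the token categories and regular expressions here are our own illustrative choices, not the output of Lex or Flex:

```python
import re

# Illustrative token specification: each pair is (token name, regex).
# These categories are assumptions for this sketch only.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=]"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))

def tokenize(source):
    """Return a list of (token name, attribute value) pairs."""
    tokens = []
    for match in MASTER.finditer(source):
        kind = match.lastgroup
        if kind != "SKIP":               # whitespace produces no token
            tokens.append((kind, match.group()))
    return tokens

print(tokenize("count = count + 1"))
# [('ID', 'count'), ('OP', '='), ('ID', 'count'), ('OP', '+'), ('NUMBER', '1')]
```

A real scanner would also record each identifier in the symbol table as it is seen; here the attribute value is simply the matched lexeme.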
Interaction of the lexical analyzer and the syntax analyzer is depicted in Fig. 1. The details of
various phases of a compiler can be found in popular texts [5][6][7][8].
Fig. 1. Interaction of Lexical Analyzer with Parser
2. PARSING TECHNIQUES
Parsing algorithms are primarily classified into two categories: top-down parsing and bottom-up parsing. These refer to the order in which the nodes of a parse tree are constructed. In the top-down approach the construction of the tree starts from the root and proceeds towards the leaves, while in the bottom-up approach construction starts with the leaves and proceeds towards the root. Well-known top-down parsing algorithms include recursive descent parsing (also called predictive parsing) and non-recursive predictive parsing. Bottom-up parsing includes algorithms such as Simple LR (SLR) parsing, Canonical LR (CLR) parsing, and Look-Ahead LR (LALR) parsing.
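The top-down idea can be made concrete with a minimal recursive descent sketch for the toy grammar E -> T ('+' T)*, T -> NUMBER | '(' E ')'; the grammar and the tuple node format are our own illustrative choices:

```python
def parse_expr(tokens, pos=0):
    """E -> T ('+' T)* ; builds the parse tree from the root downwards."""
    node, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "+":
        right, pos2 = parse_term(tokens, pos + 1)
        node = ("+", node, right)        # grow the tree towards the leaves
        pos = pos2
    return node, pos

def parse_term(tokens, pos):
    """T -> NUMBER | '(' E ')'"""
    tok = tokens[pos]
    if tok == "(":
        node, pos = parse_expr(tokens, pos + 1)
        assert tokens[pos] == ")", "expected ')'"
        return node, pos + 1
    return int(tok), pos + 1             # NUMBER leaf

tree, _ = parse_expr(["1", "+", "(", "2", "+", "3", ")"])
print(tree)  # ('+', 1, ('+', 2, 3))
```

Each nonterminal of the grammar becomes one function, which is exactly why this style is called recursive descent.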
In LR parsing, the parser reads input from left to right and generates a rightmost derivation in reverse. The name LR(k) parser is also used, where k refers to the number of unconsumed look-ahead input symbols that are used in making parsing decisions. Depending on how the parsing table is created, an LR parser is called an SLR, LALR or CLR parser. LALR parsers have more language recognition power than SLR parsers, and canonical LR parsers have more recognition power than LALR parsers. Table 1 compares these parsing techniques.
Table 1. Comparison of parsing techniques

Parsing    No. of Look-   No. of           Grammar Recognition       Grammar Used
Technique  Ahead Tokens   Iterations       Power
SLR        0              Maximum          Least powerful            Context-free grammar
CLR        1              Less than SLR    Most powerful technique   Context-free grammar
LALR       1              Less than SLR    More powerful than SLR,   Context-free grammar
                                           less than CLR
3. PARALLEL PARSING
Parallel parsing has been attempted by many researchers in the past. Early on, parallel processing was achieved by assigning entirely different user jobs to different processors. Zosel [9] focused on recognizing FORTRAN DO-loops that can be collapsed into vector instructions for CDC 7600 machines. Lincoln [10] first proposed the concept of parallel object code for FORTRAN and COBOL job cards in an environment that consisted of IBM 704 uniprocessors and the CDC 6500 of ILLIAC IV. Mickunas and Schell [11] identified the areas of the compilation process where parallel processing is inherent. They proposed dividing lexical analysis into scanning and screening, and also developed a parallel parsing technique based on LR parsing. Hickey and Katcoff [12] analyzed parsing algorithms for upper bounds on speedup, whereas Cohen and Kolodner [13] estimated the speedup attainable in parallel parsing. Chandwani et al. [14] developed a parallel CKY-parsing algorithm for context-free grammars. Khanna et al. [15] proposed partitioning the grammar to make it suitable for parallel compilation. Object-oriented parsing was proposed by Yonezawa and Ohsawa [16].
4. MACHINE ARCHITECTURES
A processor is the logic circuitry that responds to and processes the basic instructions that drive a computer.

A single-core processor has only one core, so it can start only one operation at a time. In some situations it can, however, start a new operation before the previous one is complete.

A multi-core processor is a processing system composed of two or more independent cores. It can be described as an integrated circuit to which two or more individual processors (called cores in this sense) have been attached. Figs. 2 and 3 give simplified views of single-core and multi-core machines.
Fig. 2 Single Core Machine
Fig. 3 Multi-Core Machine
Multi-core machines have various advantages, such as better resource utilization, efficient data sharing (sharing data through memory is more efficient than message passing) and increased performance [17].

The major challenges in designing a compiler for multi-core machines are program optimization, making parallel programming mainstream, and developing performance models to support the optimization of parallel code. A compiler should also be capable of self-improvement [18].
5. IMPLEMENTATION ISSUES IN PARALLEL PARSING
The efforts cited in references [11]-[16] to develop parallel parsing algorithms are of theoretical significance only. Their practical implementations have not yet been seen in real programming languages on multi-core machines, because of the issues discussed next.
a) Division of code and synchronization: Barve and Joshi [19][20][21] developed algorithms for performing lexical analysis in parallel on multi-core machines. Their approach divides the source code into a number of blocks and runs lexical analysis on the individual blocks, and it works well for that phase. If the same approach is applied to syntax analysis, however, building a common symbol table becomes an issue, because multiple instances of the syntax analyzer would be in action, each generating its own symbol table for the portion of the source code at its disposal.
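The symbol table problem above can be sketched as follows; the block splitting loosely follows the cited divide-and-conquer idea, but the toy lexer, the table format and the merge step are all our own illustrative assumptions:

```python
from concurrent.futures import ThreadPoolExecutor

def lex_block(block):
    """Toy lexer for one block: records each identifier in a local symbol table."""
    table = {}
    for word in block.split():
        if word.isidentifier():
            table.setdefault(word, {"occurrences": 0})
            table[word]["occurrences"] += 1
    return table

def parallel_lex(source, n_blocks=4):
    lines = source.splitlines()
    size = max(1, len(lines) // n_blocks)
    blocks = ["\n".join(lines[i:i + size]) for i in range(0, len(lines), size)]
    with ThreadPoolExecutor() as pool:
        local_tables = list(pool.map(lex_block, blocks))
    # The merge below is the crux of the issue: each worker produced its own
    # table, and a single consistent view must be rebuilt afterwards.
    merged = {}
    for table in local_tables:
        for name, info in table.items():
            merged.setdefault(name, {"occurrences": 0})
            merged[name]["occurrences"] += info["occurrences"]
    return merged

src = "x = y\ny = z\nz = x\nw = x"
print(parallel_lex(src)["x"]["occurrences"])  # 3
```

For syntax analysis the merge is much harder than this counting example suggests, since scope and ordering information must also be reconciled across blocks.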
b) Processor issues: In the past, researchers assumed that if n processors are available, the task is divided into several parts, each assigned to one of the available processors, which do their jobs independently. On multi-core machines this can be achieved through the processor affinity concept [22][23]. To obtain a higher degree of precision in time measurements, the underlying operating environment should be bound to a single processor, leaving the remaining processors for exclusive use by the parallel parsing algorithm. Binding the entire operating system to a single processor, however, is not straightforward.
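Pinning a single process (as opposed to the whole operating system) to a core can be sketched with Python's os.sched_setaffinity, which is available on Linux only; the choice of core 0 here is an arbitrary illustration:

```python
import os

def pin_to_core(core=0):
    """Bind the current process to a single core, if the platform allows it."""
    if not hasattr(os, "sched_setaffinity"):
        return None                      # e.g. macOS/Windows: not supported
    os.sched_setaffinity(0, {core})      # pid 0 means "this process"
    return os.sched_getaffinity(0)       # read back the effective mask

mask = pin_to_core(0)
print(mask)  # {0} on Linux, None where unsupported
```

Note that this only confines one process; confining the operating system itself (so that its housekeeping never disturbs the remaining cores) requires boot-time configuration and is precisely the difficulty noted above.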
c) Threading: Threading is the essential feature of multi-core machines that enables us to achieve parallelism. Runtime libraries such as Pthreads [24], OpenMP [25] and Threading Building Blocks (TBB) [26] are used for this purpose. Threading can also be responsible for performance degradation: sometimes a heavily threaded program takes more time than the serial counterpart of the target program. It is therefore essential that threading be used only where it is required and where it actually results in increased performance.
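The overhead point can be illustrated with a sketch: splitting a tiny task across many threads preserves correctness but adds creation and join costs that easily outweigh the work (the thread count and workload here are arbitrary illustrative choices):

```python
import threading

def sum_with_threads(numbers, n_threads):
    """Split the sum across n_threads; correct, but not necessarily faster."""
    chunks = [numbers[i::n_threads] for i in range(n_threads)]
    partial = [0] * n_threads

    def work(idx):
        partial[idx] = sum(chunks[idx])   # each thread sums its own slice

    threads = [threading.Thread(target=work, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partial)

data = list(range(1000))
assert sum_with_threads(data, 8) == sum(data)
# Correctness is preserved, but for a task this small the thread creation and
# join overhead typically makes this slower than the plain serial sum(data).
```

This is the practical meaning of the point above: threads pay for themselves only when each one carries enough work.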
d) Task distribution: Task distribution is another important factor that affects performance. Tasks should be distributed in such a way that no processor remains idle after finishing its share of the work. Rajan et al. have evaluated the performance of such distributions on High Performance Computing (HPC) clusters [27][28][29].
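One simple heuristic in this spirit is longest-processing-time-first: hand the next-largest task to the least-loaded core so that no core is left with a disproportionate share. The task costs and core count below are illustrative assumptions:

```python
import heapq

def distribute(task_costs, n_cores):
    """Greedy LPT: assign each task (largest first) to the least-loaded core."""
    heap = [(0, core) for core in range(n_cores)]   # (current load, core id)
    heapq.heapify(heap)
    assignment = {core: [] for core in range(n_cores)}
    for cost in sorted(task_costs, reverse=True):
        load, core = heapq.heappop(heap)            # least-loaded core so far
        assignment[core].append(cost)
        heapq.heappush(heap, (load + cost, core))
    return assignment

plan = distribute([7, 5, 4, 3, 2, 2, 1], n_cores=2)
loads = sorted(sum(tasks) for tasks in plan.values())
print(loads)  # [12, 12]
```

A dynamic work queue achieves the same goal at run time when task costs are not known in advance.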
e) Context switching: The system pays a cost whenever a context switch occurs, especially on multi-core systems. Chuanpeng Li et al. have experimentally quantified the indirect cost of context switching using a synthetic workload, and have also measured the impact of program data size and access stride on context switch cost [30].
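On POSIX systems the kernel's context switch counters can be read through Python's resource module, giving a crude sketch of the direct measurement idea; the workload below is an arbitrary stand-in, not the synthetic workload of [30]:

```python
import resource

def switch_counts():
    """Voluntary and involuntary context switch counts for this process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_nvcsw, usage.ru_nivcsw

before = switch_counts()
_ = sum(i * i for i in range(200_000))   # arbitrary CPU-bound workload
after = switch_counts()
print("voluntary:", after[0] - before[0], "involuntary:", after[1] - before[1])
```

Counting switches captures only the direct cost; the indirect cost measured in [30] (cache and TLB pollution after each switch) requires the more careful experiments described there.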
6. CONCLUSION
In this paper, various issues in the implementation of parallel parsing algorithms on multi-core machines were discussed. It is imperative to pay attention to synchronization among threads for shared resources, a point that has been raised many times over the decades. The problem becomes more serious as the number of cores per machine and the clock speed of processors increase. A good amount of dedicated effort is still required to exploit the parallel processing capability inherent in multi-core machines for parsing.
REFERENCES
[1] M. E. Lesk, E. Schmidt; "Lex - A Lexical Analyzer Generator"; Computing Science Technical Report No. 39, Bell Laboratories, Murray Hill, New Jersey, 1975.
[2] http://flex.sourceforge.net/
[3] S. C. Johnson; "YACC: Yet Another Compiler Compiler"; Computing Science Technical Report No. 32, Bell Laboratories, Murray Hill, New Jersey, 1975.
[4] www.gnu.org/s/bison (Last accessed on 05-Aug-2014)
[5] Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman; "Principles of Compiler Design"; Addison Wesley Publishing Company, USA, 1985.
[6] Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman; "Compilers: Principles, Techniques and Tools"; Addison Wesley Publishing Company, USA, 1986.
[7] Jean Paul Tremblay, Paul G. Sorenson; "The Theory and Practice of Compiler Writing"; McGraw-Hill Book Company, USA, 1985.
[8] David Gries; "Compiler Construction for Digital Computers"; John Wiley & Sons Inc., USA, 1971.
[9] M. Zosel; "A Parallel Approach to Compilation"; Conf. Rec. ACM Symposium on Principles of Programming Languages, Boston, MA, pp. 59-70, October 1973.
[10] N. Lincoln; "Parallel Compiling Techniques for Compilers"; ACM SIGPLAN Notices, pp. 18-31, 1970.
[11] M. D. Mickunas, R. M. Schell; "Parallel Compilation in a Multiprocessor Environment"; Proceedings of the Annual Conference of the ACM, Washington, D.C., USA, pp. 241-246, 1978.
[12] Timothy Hickey, Joel Katcoff; "Upper Bounds for Speedup in Parallel Parsing"; Journal of the ACM (JACM), Vol. 29, No. 2, pp. 408-428, 1982.
[13] J. Cohen, Stuart Kolodner; "Estimating the Speedup in Parallel Parsing"; IEEE Transactions on Software Engineering, January 1985.
[14] M. Chandwani, M. Puranik, N. S. Chaudhari; "On CKY-Parsing of Context Free Grammars in Parallel"; Proceedings of the IEEE Region 10 Conference, TENCON 92, Melbourne, Australia, pp. 141-145, 1992.
[15] Sanjay Khanna, Arif Ghafoor, Amrit Goel; "A Parallel Compilation Technique Based on Grammar Partitioning"; Proceedings of the ACM Annual Conference on Cooperation, Washington, D.C., USA, pp. 385-391, 1990.
[16] Akinori Yonezawa, Ichiro Ohsawa; "Object-Oriented Parallel Parsing for Context-Free Grammars"; Proceedings of the 12th Conference on Computational Linguistics, Vol. 2, Budapest, Hungary, pp. 773-778, 1988.
[17] Valeriy Shipunov, Andrey Gavryushenko, Eugene Kuznetsov; "Comparative Analysis of Debugging Tools in Parallel Programming for Multi-core Processors"; CADSM 2007, February 20-24, 2007, Polyana, Ukraine, IEEE.
[18] Mary Hall, David Padua, Keshav Pingali; "Compiler Research: The Next 50 Years"; Communications of the ACM, Vol. 52, No. 2, February 2009.
[19] Amit Barve, Brijendra Kumar Joshi; "A Parallel Lexical Analyzer for Multi-core Machines"; Proceedings of CONSEG-2012, CSI 6th International Conference on Software Engineering, pp. 319-323, 5-7 September 2012, Indore, India.
[20] Amit Barve, Brijendra Kumar Joshi; "Parallel Lexical Analysis on Multi-core Machines Using Divide and Conquer"; NUiCONE-2012, Nirma University International Conference on Engineering, pp. 1-5, 6-8 December 2012, Ahmedabad, India.
[21] Amit Barve, Brijendra Kumar Joshi; "Parallel Lexical Analysis of Multiple Files on Multi-core Machines"; International Journal of Computer Applications, Vol. 96, No. 8, June 2014.
[22] http://www.linuxjournal.com/article/6799?page=0,1
[23] http://www.cyberciti.biz/tips/setting-processor-affinity-certain-task-or-process.html
[24] David R. Butenhof; "Programming with POSIX Threads"; Addison-Wesley Longman Publishing Co., USA, 1997.
[25] http://openmp.org/wp
[26] http://www.threadingbuildingblocks.org
[27] A. Rajan, B. K. Joshi, A. Rawat, R. Jha, K. Bhachavat; "Analysis of Process Distribution in HPC Cluster Using HPL"; 2nd IEEE International Conference on Parallel, Distributed and Grid Computing (PDGC), pp. 85-88, 6-8 December 2012, Solan, India.
[28] A. Rajan, B. K. Joshi, A. Rawat, S. Gupta; "Analytical Study of HPCC Performance Using HPL"; International Journal of Computer Science and its Applications, Vol. 2, No. 1, pp. 47-49, April 2012.
[29] A. Rajan, Brijendra Kumar Joshi, A. Rawat; "Critical Analysis of HPL Performance under Different Process Distribution Patterns"; CSI 6th International Conference on Software Engineering (CONSEG-2012), DAVV, Indore, 5-7 September 2012.
[30] Chuanpeng Li, Chen Ding, Kai Shen; "Quantifying the Cost of Context Switch"; ExpCS '07: Proceedings of the 2007 Workshop on Experimental Computer Science, Article 2, ACM, New York, USA, 2007.
Authors
Mr. Amit Barve is an Assistant Professor in Computer Engineering at Vishwakarma Institute of Information Technology, Pune (Maharashtra), India. He completed his BE in Computer Science and Engineering at MIT Ujjain and his M.Tech. in Computer Engineering at VJTI Mumbai. His research interests are parallel processing, HPC, and compiler design.

Dr. Brijendra Kumar Joshi is a Professor in Electronics & Telecommunication and Computer Engineering at the Military College of Telecommunication Engineering, Mhow (M.P.), India. He obtained his BE in Electronics and Telecommunication Engineering from Govt. Engg. College, Jabalpur; his ME in Computer Science and Engineering from IISc, Bangalore; his M.Tech. in Digital Communication from MANIT, Bhopal; and his Ph.D. in Electronics and Telecommunication Engineering from Rani Durgavati University, Jabalpur. His research interests are programming languages, compiler design, digital communications, mobile ad hoc and wireless sensor networks, software engineering and formal methods.