ISSUES IN IMPLEMENTATION OF PARALLEL PARSING ON MULTI-CORE MACHINES
International Journal of Computer Science, Engineering and Information Technology (IJCSEIT), Vol. 4, No. 5, October 2014
DOI: 10.5121/ijcseit.2014.4505

Amit Barve (Asst. Professor, CSE, VIIT Pune, India)
Brijendra Kumar Joshi (Professor, MCTE, Mhow, India)
ABSTRACT
The advent of multi-core architecture has strongly influenced the area of high performance computing. Parallel compilation is an area that can still benefit significantly from this architecture. Recent research has shown some improvement in the lexical analysis phase, but it is difficult to apply the same technique to the parsing phase. This paper highlights some issues related to the implementation of parallel parsing on multi-core machines.
KEYWORDS
Syntax Analysis, Parallel Parsing, Multi-core Machines.
1. INTRODUCTION
A compiler is a program that translates a source language into a target language. The structure of a compiler is composed of several phases. The first phase is lexical analysis, or scanning. This is the only phase that interacts with the original source code written by the programmer. It takes a stream of characters as input and generates tokens of the form {token name, attribute value} as output. The component that performs this task is called the lexical analyzer, or scanner. Lex [1] and Flex [2] are two popular tools for automatically generating lexical analyzers from specifications.
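As a minimal illustration of this token format (a sketch with hypothetical names, not code from the paper), the C++ fragment below hand-codes a scanner for identifiers, integer literals and single-character operators, emitting {token name, attribute value} pairs:

    #include <cctype>
    #include <iostream>
    #include <string>
    #include <vector>

    // Hypothetical token representation: {token name, attribute value}.
    struct Token {
        std::string name;   // e.g. "ID", "NUM", "OP"
        std::string value;  // the lexeme serves as the attribute value here
    };

    // A minimal hand-written scanner; generated scanners (Lex/Flex) handle
    // many more token classes, but the output format is the same idea.
    std::vector<Token> scan(const std::string& src) {
        std::vector<Token> tokens;
        std::size_t i = 0;
        while (i < src.size()) {
            unsigned char c = src[i];
            if (std::isspace(c)) { ++i; continue; }
            std::size_t start = i;
            if (std::isalpha(c)) {
                while (i < src.size() && std::isalnum(static_cast<unsigned char>(src[i]))) ++i;
                tokens.push_back({"ID", src.substr(start, i - start)});
            } else if (std::isdigit(c)) {
                while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) ++i;
                tokens.push_back({"NUM", src.substr(start, i - start)});
            } else {
                tokens.push_back({"OP", std::string(1, src[i])});
                ++i;
            }
        }
        return tokens;
    }

    int main() {
        for (const Token& t : scan("count = count + 42"))
            std::cout << "{" << t.name << ", " << t.value << "}\n";
    }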
The information about tokens is saved in a special data structure called the symbol table. The tokens are then forwarded to the next phase, syntax analysis, also known as parsing. Parsing is an important phase in compilers. It takes the stream of tokens produced by the lexical analyzer as input and converts it into a parse tree, a structural representation of the input according to the grammar being parsed. The tool that performs this task is known as a parser. Parsers can be generated automatically by YACC [3] and Bison [4], which take grammar specifications as input and produce parsers.
The interaction of the lexical analyzer and the syntax analyzer is depicted in Fig. 1. Details of the various phases of a compiler can be found in popular texts [5][6][7][8].
Fig. 1. Interaction of Lexical Analyzer with Parser
2. PARSING TECHNIQUES
Parsing algorithms are primarily classified into two categories, top-down parsing and bottom-up parsing. These refer to the order in which the nodes of a parse tree are constructed. In the top-down approach, construction of the tree starts from the root and proceeds towards the leaves, while in the bottom-up approach, construction starts with the leaves and proceeds towards the root. Some well-known top-down parsing algorithms are recursive descent parsing (also called predictive parsing) and non-recursive descent parsing. Bottom-up parsing includes algorithms such as Simple LR (SLR) parsing, Canonical LR (CLR) parsing, and Look-Ahead LR (LALR) parsing.
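As a minimal sketch of the top-down approach (illustrative only, not from the paper), the C++ fragment below implements a recursive descent recognizer for the toy grammar E -> T ('+' T)*, T -> digit+, where each nonterminal becomes one function:

    #include <cctype>
    #include <iostream>
    #include <string>

    // Recursive descent recognizer for a toy grammar:
    //   E -> T ('+' T)*
    //   T -> digit+
    // Assumes the input contains no whitespace.
    struct Parser {
        std::string input;
        std::size_t pos = 0;

        bool parseT() {  // T -> digit+
            std::size_t start = pos;
            while (pos < input.size() && std::isdigit(static_cast<unsigned char>(input[pos])))
                ++pos;
            return pos > start;
        }

        bool parseE() {  // E -> T ('+' T)*
            if (!parseT()) return false;
            while (pos < input.size() && input[pos] == '+') {
                ++pos;
                if (!parseT()) return false;
            }
            return true;
        }

        bool accept() { return parseE() && pos == input.size(); }
    };

    int main() {
        Parser p{"12+34+5"};
        std::cout << (p.accept() ? "accepted" : "rejected") << "\n";
    }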
In LR parsing, the parser reads the input from left to right and generates a rightmost derivation in reverse. The name LR(k) parser is also used, where k refers to the number of unconsumed look-ahead input symbols used in making parsing decisions. Depending on how the parsing table is created, an LR parser is called an SLR, LALR, or CLR parser. LALR parsers have more language recognition power than SLR parsers, and canonical LR parsers have more recognition power than LALR parsers. For a comparison of these parsers, refer to Table 1.
Table 1. Comparison of parsing techniques

Parsing Technique | No. of Look-Ahead Tokens | No. of Iterations | Grammar Recognition Power             | Grammar Used
SLR               | 0                        | Maximum           | Least powerful                        | Context-free grammar
CLR               | 1                        | Less than SLR     | Most powerful technique               | Context-free grammar
LALR              | 1                        | Less than SLR     | More powerful than SLR, less than CLR | Context-free grammar
3. PARALLEL PARSING
Parallel parsing has been attempted by many in the past. Early on, parallelism was achieved by assigning entirely different user jobs to different processors. Zosel [9] focused on recognizing FORTRAN DO-loops that can be collapsed into vector instructions for CDC 7600 machines. Lincoln [10] first proposed the concept of parallel object code for FORTRAN and COBOL job cards in an environment that consisted of IBM 704 uniprocessors and the CDC 6500 of ILLIAC IV. Mickunas and Schell [11] identified the areas of the compilation process where parallel processing is inherent; they proposed dividing lexical analysis into scanning and screening, and also developed a parallel parsing technique based on LR parsing. Hickey and Katcoff [12] analyzed parsing algorithms for upper bounds on speedup, whereas Cohen and Kolodner [13] estimated the speedup achievable in parallel parsing. Chandwani et al. [14] developed a parallel algorithm for CKY parsing of context-free grammars. Khanna et al. [15] proposed partitioning the grammar to make it suitable for parallel compilation. Object-oriented parsing was proposed by Yonezawa and Ohsawa [16].
4. MACHINE ARCHITECTURES

A processor is the logic circuitry that responds to and processes the basic instructions that drive a computer.

A single-core processor has only one core, so it can start only one operation at a time. In some situations, however, it can begin a new operation before the previous one is complete.

A multi-core processor is a processing system composed of two or more independent cores. It can be described as an integrated circuit to which two or more individual processors (called cores in this sense) have been attached. Fig. 2 and Fig. 3 give a simplified view of single-core and multi-core machines.
Fig. 2 Single Core Machine
Fig. 3 Multi-Core Machine
Multi-core machines have various advantages, such as better resource utilization, more efficient data sharing (sharing data through memory is more efficient than message passing) and increased performance [17].

The major challenges in designing a multi-core compiler are program optimization, making parallel programming mainstream, and developing performance models to support optimization of parallel code. Compilers should also be capable of self-improvement [18].
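As a small illustration of shared-memory parallelism on such machines (a sketch, not code from the paper), the C++ fragment below queries the number of available cores with std::thread::hardware_concurrency and gives each core-sized worker a slice of a shared array, so data is shared through memory rather than passed as messages:

    #include <iostream>
    #include <numeric>
    #include <thread>
    #include <vector>

    int main() {
        // The query may return 0 if the core count is unknown.
        unsigned cores = std::thread::hardware_concurrency();
        if (cores == 0) cores = 1;

        std::vector<int> data(1000000, 1);          // shared input
        std::vector<long long> partial(cores, 0);   // one slot per worker
        std::vector<std::thread> workers;

        std::size_t chunk = data.size() / cores;
        for (unsigned c = 0; c < cores; ++c) {
            std::size_t begin = c * chunk;
            std::size_t end = (c + 1 == cores) ? data.size() : begin + chunk;
            workers.emplace_back([&, c, begin, end] {
                // Each worker reads its slice of shared memory directly.
                partial[c] = std::accumulate(data.begin() + begin,
                                             data.begin() + end, 0LL);
            });
        }
        for (auto& t : workers) t.join();

        std::cout << cores << " cores, sum = "
                  << std::accumulate(partial.begin(), partial.end(), 0LL) << "\n";
    }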
5. IMPLEMENTATION ISSUES IN PARALLEL PARSING
The efforts cited in references [11]-[16] to develop parallel parsing algorithms are of theoretical significance only. Practical implementations for real programming languages on multi-core machines have not appeared so far because of the issues discussed next.
a) Division of code and synchronization: Barve and Joshi [19][20][21] developed algorithms for parallel lexical analysis on multi-core machines. Their approach divides the source code into a number of blocks and performs lexical analysis on the individual blocks. This works well for lexical analysis, but if the same approach is used for syntax analysis, building a common symbol table becomes an issue: multiple instances of the syntax analyzer would be active, and each would generate its own symbol table for the portion of the source code at its disposal.
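A minimal sketch of this synchronization problem is given below (hypothetical structure, not the authors' implementation): several scanner instances process separate blocks, but all of them must merge their entries into one shared symbol table, which forces locking on every insertion:

    #include <iostream>
    #include <mutex>
    #include <string>
    #include <thread>
    #include <unordered_map>
    #include <vector>

    // One shared symbol table guarded by a mutex; each worker scans its
    // own block but must synchronize on every insertion.
    std::unordered_map<std::string, int> symbolTable;
    std::mutex tableMutex;

    void scanBlock(const std::vector<std::string>& identifiers) {
        for (const std::string& id : identifiers) {
            std::lock_guard<std::mutex> lock(tableMutex);  // contention point
            symbolTable.emplace(id, static_cast<int>(symbolTable.size()));
        }
    }

    int main() {
        // Hypothetical identifier streams from three source-code blocks.
        std::vector<std::vector<std::string>> blocks = {
            {"count", "i", "total"}, {"i", "limit"}, {"total", "count"}};

        std::vector<std::thread> scanners;
        for (const auto& b : blocks) scanners.emplace_back(scanBlock, std::cref(b));
        for (auto& t : scanners) t.join();

        std::cout << "symbol table entries: " << symbolTable.size() << "\n";
    }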
b) Processor issues: In the past, researchers assumed that if n processors are available, the task is divided into several parts, each assigned to one of the available processors, which then do the job independently. On multi-core machines this can be achieved using the processor affinity concept [22][23]. To obtain a higher degree of precision in time measurement, the underlying operating environment should be bound to a single processor, leaving the remaining processors for exclusive use by the parallel parsing algorithm. Binding the entire operating system to a single processor, however, is not straightforward.
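On Linux, per-thread affinity can be set with the GNU extension pthread_setaffinity_np. The sketch below (an illustration under the assumption of a Linux/glibc system with at least two cores, not code from the paper) pins a worker thread to one core so that a parser instance is not migrated between cores:

    #include <pthread.h>
    #include <sched.h>
    #include <iostream>
    #include <thread>

    // Pin the calling thread to a single core (Linux/glibc specific;
    // compile with g++ -pthread).
    bool pinToCore(int coreId) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(coreId, &set);
        return pthread_setaffinity_np(pthread_self(), sizeof(cpu_set_t), &set) == 0;
    }

    int main() {
        std::thread worker([] {
            if (pinToCore(1))  // assumes core 1 exists
                std::cout << "worker pinned to core 1\n";
            // ... a parser instance could run here without migration ...
        });
        worker.join();
    }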
c) Threading: Threading is an essential feature of multi-core machines that enables us to achieve parallelism. Run-time libraries like Pthreads [24], Threading Building Blocks (TBB) [25] and OpenMP [26] are used for this purpose. Threading can, however, also cause performance degradation: sometimes a heavily threaded program takes more time than its serial counterpart. It is therefore essential that threading be used only where it is required and actually yields increased performance.
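OpenMP's if clause offers one concrete way to follow this advice. In the sketch below (illustrative only; the threshold is a hypothetical tuning value), the loop is parallelized only when the input is large enough to amortize thread start-up cost:

    #include <iostream>
    #include <vector>

    // Parallelize only above a size threshold; below it the loop runs
    // serially and avoids the threading overhead. Compile with -fopenmp.
    long long sumSquares(const std::vector<int>& v) {
        long long total = 0;
        const std::size_t threshold = 10000;  // hypothetical tuning value
    #pragma omp parallel for reduction(+ : total) if (v.size() > threshold)
        for (long i = 0; i < static_cast<long>(v.size()); ++i)
            total += static_cast<long long>(v[i]) * v[i];
        return total;
    }

    int main() {
        std::vector<int> small(100, 2), large(1000000, 2);
        std::cout << sumSquares(small) << " " << sumSquares(large) << "\n";
    }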
d) Task distribution: Task distribution is another important factor affecting performance. Tasks should be distributed in such a way that no processor sits idle after finishing its share of the work. Rajan et al. have evaluated the performance of such distributions on High Performance Computing (HPC) clusters [27][28][29].
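One common way to keep every core busy (a sketch of the general idea, not taken from the cited studies) is dynamic task distribution: instead of a fixed split, each thread atomically claims the next unprocessed task as soon as it becomes free:

    #include <algorithm>
    #include <atomic>
    #include <chrono>
    #include <iostream>
    #include <thread>
    #include <vector>

    int main() {
        const int numTasks = 64;
        std::atomic<int> next{0};  // index of the next unclaimed task
        std::atomic<int> done{0};

        // Each worker grabs task indices until the queue is exhausted, so
        // fast workers naturally take on more tasks and no core idles.
        auto worker = [&] {
            for (int i = next.fetch_add(1); i < numTasks; i = next.fetch_add(1)) {
                // Simulate uneven task cost (e.g. source blocks of varying size).
                std::this_thread::sleep_for(std::chrono::milliseconds(i % 5));
                done.fetch_add(1);
            }
        };

        unsigned cores = std::max(1u, std::thread::hardware_concurrency());
        std::vector<std::thread> pool;
        for (unsigned c = 0; c < cores; ++c) pool.emplace_back(worker);
        for (auto& t : pool) t.join();

        std::cout << done.load() << " of " << numTasks << " tasks completed\n";
    }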
e) Context switching: The system pays a cost whenever a context switch occurs, especially on multi-core systems. Chuanpeng Li et al. experimentally quantified the indirect cost of context switching using a synthetic workload, and also measured the impact of program data size and access stride on context switch cost [30].
6. CONCLUSION
In this paper, various issues in the implementation of parallel parsing algorithms on multi-core machines were discussed. It is imperative to pay attention to synchronization among threads for shared resources, a point that has been raised repeatedly over the decades. The problem becomes more serious as the number of cores per machine and the clock speed of processors increase. A good amount of dedicated effort is still required to exploit the parallel processing capability inherent in multi-core machines for parsing.
REFERENCES
[1] M. E. Lesk, E. Schmidt; “Lex - A Lexical Analyzer Generator”; Computing Science Technical Report No. 39, Bell Laboratories, Murray Hill, New Jersey, 1975.
[2] http://flex.sourceforge.net/
[3] S. C. Johnson; “YACC: Yet Another Compiler Compiler”; Computing Science Technical Report No. 32, Bell Laboratories, Murray Hill, New Jersey, 1975.
[4] www.gnu.org/s/bison. (Last accessed on 05-Aug-2014)
[5] Alfred V. Aho, Ravi Sethi, Jeffrey D.Ullman; “Principles of Compiler Design”; Addison Wesley
Publication Company, USA, 1985.
[6] Alfred V. Aho, Ravi Sethi, Jeffrey D.Ullman; “Compilers: Principles, Techniques and
Tools”;Addison Wesley Publication Company, USA, 1986.
[7] Jean Paul Tremblay,Paul G. Sorenson;”The Theory and Practice of Compiler Writing”;McGraw-Hill
Book Company USA 1985
[8] David Gries; “Compiler Construction for digital Computers”; John Wiley & Sons Inc. USA, 1971.
[9] M. Zosel; “A Parallel Approach to Compilation”; Conf. Rec. ACM Symposium on Principles of Programming Languages, Boston, MA, pp. 59-70, October 1973.
[10] N. Lincoln; “Parallel Compiling Techniques for Compilers”; ACM SIGPLAN Notices, 10, pp. 18-31, 1970.
[11] M. D. Mickunas, R. M. Schell; “Parallel Compilation in a Multiprocessor Environment”; Proceedings
of the annual conference of the ACM, Washington, D.C., USA, pp. 241–246, 1978.
[12] Timothy Hickey, Joel Katcoff; “Upper Bounds for Speedup in Parallel Parsing”; Journal of the ACM
(JACM), Vol. 29, No. 2, pp. 408 – 428, 1982.
[13] J. Cohen, Stuart Kolodner; “Estimating the Speed up in Parallel Parsing”; IEEE Transactions on
Software Engineering, January 1985.
[14] M. Chandwani, M. Puranik , N.S. Chaudhari, “On CKY- Parsing of Context Free Grammars in
Parallel”; Proceedings of the IEEE Region 10 Conference, Tencon 92, Melbourne Australia, pp. 141-
145, 1992.
[15] Sanjay Khanna, Arif Ghafoor, Amrit Goel; “A Parallel Compilation Technique Based on Grammar Partitioning”; Proceedings of the ACM Annual Conference on Cooperation, Washington, D.C., USA, pp. 385-391, 1990.
[16] Akinori Yonezawa, Ichiro Ohsawa; “Object-Oriented Parallel Parsing for Context-Free Grammars”; Proceedings of the 12th Conference on Computational Linguistics, Vol. 2, Budapest, Hungary, pp. 773-778, 1988.
[17] Valeriy Shipunov, Andrey Gavryushenko, Eugene Kuznetsov; “Comparative Analysis of Debugging Tools in Parallel Programming for Multi-core Processors”; CADSM 2007, February 20-24, 2007, Polyana, Ukraine, IEEE.
[18] Mary Hall, David Padua and Keshav Pingali; “Compiler Research: The Next 50 Years”; Communications of the ACM, Vol. 52, No. 2, February 2009.
[19] Amit Barve and Brijendra Kumar Joshi; “A Parallel Lexical Analyzer for Multi-core Machines”; Proceedings of CONSEG-2012, CSI 6th International Conference on Software Engineering, pp. 319-323, 5-7 September 2012, Indore, India.
[20] Amit Barve and Brijendra Kumar Joshi; “Parallel Lexical Analysis on Multi-core Machines Using Divide and Conquer”; NUiCONE-2012, Nirma University International Conference on Engineering, pp. 1-5, 6-8 December 2012, Ahmedabad, India.
[21] Amit Barve and Brijendrakumar Joshi; “Parallel lexical analysis of multiple files on multi-core
machines”; International Journal of Computer Applications; Vol. 96, No.8, June 2014.
[22] http://www.linuxjournal.com/article/6799?page=0,1.
[23] http://www.cyberciti.biz/tips/setting-processor-affinity-certain-task-or-process.html
[24] David R. Butenhof, “Programming with POSIX Threads”, Addison-Wesley Longman Publishing Co.,
USA 1997.
[25] http://www.threadingbuildingblocks.org
[26] http://openmp.org/wp
[27] Rajan, A., Joshi, B. K., Rawat, A., Jha, R., Bhachavat, K.; “Analysis of Process Distribution in HPC Cluster Using HPL”; 2nd IEEE International Conference on Parallel, Distributed and Grid Computing (PDGC), pp. 85-88, 6-8 December 2012, Solan, India.
[28] Rajan, A., Joshi, B. K., Rawat, A., Gupta, S.; “Analytical Study of HPCC Performance Using HPL”; International Journal of Computer Science and its Applications, Vol. 2, No. 1, pp. 47-49, April 2012.
[29] Rajan, A., Joshi, B. K., Rawat, A.; “Critical Analysis of HPL Performance under Different Process Distribution Patterns”; CSI 6th International Conference on Software Engineering (CONSEG-2012), DAVV, Indore, 5-7 September 2012.
[30] Chuanpeng Li, Chen Ding, Kai Shen; “Quantifying the Cost of Context Switch”; ExpCS ’07: Proceedings of the 2007 Workshop on Experimental Computer Science, Article 2, ACM, New York, USA, 2007.
Authors
Mr. Amit Barve is an Assistant Professor in Computer Engineering at Vishwakarma Institute of Information Technology, Pune (M.H.), India. He completed his BE in Computer Science and Engineering from MIT Ujjain and M.Tech. in Computer Engineering from VJTI Mumbai. His research interests are parallel processing, HPC, and compiler design.
Dr. Brijendra Kumar Joshi is a Professor in Electronics & Telecommunication and Computer Engineering at Military College of Telecommunication Engineering, Mhow (M.P.), India. He obtained his BE in Electronics and Telecommunication Engineering from Govt. Engg. College, Jabalpur; ME in Computer Science and Engineering from IISc, Bangalore; M.Tech. in Digital Communication from MANIT, Bhopal; and Ph.D. in Electronics and Telecommunication Engineering from Rani Durgavati University, Jabalpur. His research interests are programming languages, compiler design, digital communications, mobile ad hoc and wireless sensor networks, software engineering and formal methods.