Adaptive Weight Computation Processor for Medical Ultrasound Beamformer: VLS... - iosrjce
In earlier times it was difficult to detect and diagnose disease: the human body is sensitive, and there are many obstacles that keep us from understanding a disease and the activity going on inside the body. Scientists have since shown how ultrasound can be useful in the medical field. In this paper we discuss beamformer technology using a VLSI architecture and an FPGA implementation.
Benchmark of common AI accelerators: NVIDIA GPU vs. Intel Movidius - byteLAKE
The document summarizes byteLAKE’s basic benchmark results for two different setups of example edge devices: one with an NVIDIA GPU and one with Intel’s Movidius cards.
Key takeaway: the comparison of Movidius and NVIDIA as two competing accelerators for AI workloads leads to the conclusion that the two are meant for different tasks.
Deep Convolutional Network evaluation on the Intel Xeon Phi - Gaurav Raina
With a sharp decline in camera cost and size along with superior computing power available at increasingly low prices, computer vision applications are becoming ever present in our daily lives. Research shows that Convolutional Neural Networks (ConvNet) can outperform all other methods for
computer vision tasks (such as object detection) in terms of accuracy and versatility.
One of the problems with these Neural Networks, which mimic the brain, is that they can be very demanding on the processor, requiring millions of computational nodes to function. Hence, it is challenging for Neural Network
algorithms to achieve real-time performance on general purpose embedded platforms.
Parallelization and vectorization are very effective ways to ease this problem and make it possible to implement such ConvNets on energy-efficient embedded platforms. This thesis presents the evaluation of a novel ConvNet for road speed-sign detection on a breakthrough 57-core Intel Xeon Phi
processor with 512-bit vector support. This mapping demonstrates that the parallelism inherent in the ConvNet algorithm can be effectively exploited by the 512-bit vector ISA and by utilizing the many-core paradigm.
Detailed evaluation shows that the best mappings require data-reuse strategies that exploit reuse at the cache and register level. These implementations are boosted by the use of low-level vector intrinsics (C-style functions that map directly onto Intel assembly instructions).
Ultimately we demonstrate an approach that can be used to accelerate neural networks on highly parallel many-core processors, with speedups of more than 12x over single-core performance alone.
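The thesis targets the Xeon Phi's own 512-bit vector intrinsics; purely as a rough illustration of the idea (not the author's actual kernels), the sketch below vectorizes the multiply-accumulate inner loop that dominates a ConvNet's convolution layers, using AVX-512-style intrinsics in C. The function and variable names are hypothetical, and the first-generation 57-core Xeon Phi actually exposed a different (IMCI) 512-bit instruction set.

#include <immintrin.h>
#include <stddef.h>

/* Dot product of two float vectors using 512-bit SIMD: 16 multiply-adds per
 * iteration, with a scalar tail loop for the remaining elements. */
float dot_product_512(const float *a, const float *b, size_t n)
{
    __m512 acc = _mm512_setzero_ps();
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m512 va = _mm512_loadu_ps(a + i);
        __m512 vb = _mm512_loadu_ps(b + i);
        acc = _mm512_fmadd_ps(va, vb, acc);   /* acc += va * vb, 16 lanes at once */
    }
    float sum = _mm512_reduce_add_ps(acc);    /* horizontal reduction of the lanes */
    for (; i < n; ++i)                        /* scalar tail */
        sum += a[i] * b[i];
    return sum;
}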
Deep learning @ Edge using Intel's Neural Compute Stick - geetachauhan
Talk @ Intel Global IoT DevFest, Nov 2017
The new generation of hardware accelerators is enabling rich, AI-driven intelligent IoT solutions at the edge.
The talk showcased how to use Intel's latest Neural Compute Stick for accelerating deep learning IoT solutions. It also covered use cases and code details for running deep learning models on the Neural Compute Stick.
In this deck from ATPESC 2019, James Moawad and Greg Nash from Intel present: FPGAs and Machine Learning.
"Neural networks are inspired by biological systems, in particular the human brain. Through the combination of powerful computing resources and novel architectures for neurons, neural networks have achieved state-of-the-art results in many domains such as computer vision and machine translation. FPGAs are a natural choice for implementing neural networks as they can handle different algorithms in computing, logic, and memory resources in the same device. Faster performance comparing to competitive implementations as the user can hardcore operations into the hardware. Software developers can use the OpenCL device C level programming standard to target FPGAs as accelerators to standard CPUs without having to deal with hardware level design."
Watch the video: https://wp.me/p3RLHQ-lnc
Learn more: https://extremecomputingtraining.anl.gov/archive/atpesc-2019/agenda-2019/
and
https://www.intel.com/content/www/us/en/products/programmable/fpga.html
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Recent articles published in VLSI Design & Communication Systems - VLSICS Design
International Journal of VLSI Design & Communication Systems (VLSICS) is a bimonthly open-access peer-reviewed journal that publishes articles contributing new results in all areas of VLSI design & communications. The goal of this journal is to bring together researchers and practitioners from academia and industry to focus on advanced VLSI design & communication concepts and to establish new collaborations in these areas.
Authors are solicited to contribute to this journal by submitting articles that illustrate research results, projects, survey works and industrial experiences describing significant advances in VLSI design & communications.
Smart Data Slides: Emerging Hardware Choices for Modern AI Data Management - DATAVERSITY
Leading edge AI applications have always been resource-intensive and known for stretching the limits of conventional (von Neumann architecture) computer performance. Specialized hardware, purpose built to optimize AI applications, is not new. In fact, it should be no surprise that the very first .com internet domain was registered to Symbolics - a company that built the Lisp Machine, a dedicated AI workstation - in 1985. In the last three decades, of course, the performance of conventional computers has improved dramatically with advances in chip density (Moore’s Law) leading to faster processor speeds, memory speeds, and massively parallel architectures. And yet, some applications - like machine vision for real time video analysis and deep machine learning - always need more power.
Participants in this webinar will learn the fundamentals of the three hardware approaches that are receiving significant investments and demonstrating significant promise for AI applications.
- neuromorphic/neurosynaptic architectures (brain-inspired hardware)
- GPUs (graphics processing units, optimized for AI algorithms), and
- quantum computers (based on principles and properties of quantum-mechanics rather than binary logic).
Note - This webinar requires no previous knowledge of hardware or computer architectures.
Elliptic Curve Cryptography (ECC) is capable of constructing public-key cryptosystems. Specifically, the security of ECC reduces to the hardness of the Discrete Logarithm Problem (DLP) in the group of points of an elliptic curve (ECDLP). ECC based on the ECDLP is on the list of algorithms recommended for use by NIST (National Institute of Standards and Technology) and the NSA (National Security Agency). Given that ECDLP-based cryptosystems are in widespread use, continuously monitoring the effectiveness of new attacks, and of improvements to pre-existing attacks, on the ECDLP over fields of large prime order is a significant part of that effort. This paper aims to provide a secure, effective and flexible method to improve data security in cloud computing. The chapter presents a novel algorithm that uses MapReduce and Pollard's rho method to solve ECDLP instances and to enhance the security level.
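The chapter's actual method (MapReduce plus Pollard's rho on elliptic-curve groups) is not reproduced here; purely as a rough, single-machine illustration of the rho idea, the sketch below solves a discrete logarithm in a small prime-order subgroup of Z_p* using the classic three-way random walk and Floyd cycle detection. The parameters (p = 227, generator g = 4 of order q = 113) and the secret exponent are toy values chosen only for this example.

#include <stdint.h>
#include <stdio.h>

/* Toy parameters for illustration: p = 227 is prime, g = 4 generates the
 * subgroup of prime order q = 113. Real ECDLP instances use elliptic-curve
 * point groups of ~256-bit order; the walk/collision idea is the same. */
#define P 227u
#define Q 113u
#define G 4u

static uint32_t mulmod(uint32_t a, uint32_t b, uint32_t m)
{
    return (uint32_t)(((uint64_t)a * b) % m);
}

static uint32_t powmod(uint32_t base, uint32_t exp, uint32_t m)
{
    uint32_t r = 1u % m;
    base %= m;
    while (exp) {
        if (exp & 1u) r = mulmod(r, base, m);
        base = mulmod(base, base, m);
        exp >>= 1;
    }
    return r;
}

/* One step of the rho walk on (y, a, b), maintaining y = g^a * h^b mod p. */
static void step(uint32_t h, uint32_t *y, uint32_t *a, uint32_t *b)
{
    switch (*y % 3) {
    case 0:  *y = mulmod(*y, *y, P); *a = (2 * *a) % Q; *b = (2 * *b) % Q; break;
    case 1:  *y = mulmod(*y, G, P);  *a = (*a + 1) % Q;                    break;
    default: *y = mulmod(*y, h, P);                     *b = (*b + 1) % Q; break;
    }
}

/* Find x with g^x = h (mod p), x in [0, q), via Floyd cycle detection. */
static int pollard_rho_dlog(uint32_t h, uint32_t *x_out)
{
    for (uint32_t seed = 1; seed < Q; ++seed) {   /* reseed on degenerate collisions */
        uint32_t a1 = seed, b1 = 0, y1 = powmod(G, seed, P);
        uint32_t a2 = a1, b2 = b1, y2 = y1;
        do {
            step(h, &y1, &a1, &b1);               /* tortoise: one step */
            step(h, &y2, &a2, &b2);               /* hare: two steps    */
            step(h, &y2, &a2, &b2);
        } while (y1 != y2);
        uint32_t db = (b1 + Q - b2) % Q;          /* b1 - b2 mod q */
        if (db == 0)
            continue;                             /* useless collision, try again */
        uint32_t da = (a2 + Q - a1) % Q;          /* a2 - a1 mod q */
        *x_out = mulmod(da, powmod(db, Q - 2, Q), Q);  /* x = (a2-a1)/(b1-b2) mod q */
        return 0;
    }
    return -1;
}

int main(void)
{
    uint32_t secret = 57, h = powmod(G, secret, P);
    uint32_t x;
    if (pollard_rho_dlog(h, &x) == 0)
        printf("recovered x = %u (expected %u)\n", x, secret);
    return 0;
}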
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
A SURVEY ON GPU SYSTEM CONSIDERING ITS PERFORMANCE ON DIFFERENT APPLICATIONS - cseij
In this paper we study the NVIDIA graphics processing unit (GPU) along with its computational power and applications. Although these units are specially designed for graphics applications, their computational power can also be employed for non-graphics applications. The GPU offers high parallel processing power, low computation cost and short execution times, and it gives a good performance-per-energy ratio. Deploying the GPU for heavy computation over similar, small sets of instructions plays a significant role in reducing CPU overhead. The GPU has several key advantages over the CPU architecture, as it provides high parallelism, intensive computation and significantly higher throughput. It consists of thousands of hardware threads that execute programs in SIMD fashion; hence the GPU can be an alternative to the CPU in high-performance and supercomputing environments. The bottom line is that GPU-based general-purpose computing is a hot topic of research, and there is much to explore beyond graphics-processing applications.
Best Practices for On-Demand HPC in Enterprises - geetachauhan
Traditionally, HPC has been popular in scientific domains but not in most other enterprises. With the advent of on-demand HPC in the cloud and the growing adoption of deep learning, HPC should now be a standard platform for any enterprise leading with AI and machine learning. This session will cover the best practices for building your own on-demand HPC cluster for enterprise workloads, along with key use cases where enterprises will benefit from an HPC solution.
NUMERICAL STUDIES OF TRAPEZOIDAL PROTOTYPE AUDITORY MEMBRANE (PAM) - IJCSEA Journal
In this research we numerically developed a Prototype Auditory Membrane (PAM) for a fully implantable, self-contained artificial cochlea. The cochlea is one of the important organs for hearing in humans and animals. The prototype and implant material of the PAM is polyvinylidene fluoride (PVDF; Kureha, Japan), fabricated using MEMS and thin-film technologies. An important characteristic of the PAM is not only the conversion of acoustic waves into electrical signals but also frequency selectivity. The thickness, Young’s modulus and density of the PAM are 40 μm, 4 GPa, and 1.79 × 10³ kg/m³, respectively. The PAM is trapezoidal in shape, with its width varying linearly from 2.0 to 4.0 mm along a length of 30 mm. Numerically, the PAM model is developed using the commercial CFD software Fluent 6.3.26 and Gambit 2.4.6. The geometric model of the PAM consists of one-sided blocks of quadrilateral elements for the 2-D model and tetrahedral elements for the 3-D model, respectively. In this study the flow is set as laminar and an unsteady, time-dependent calculation is carried out. The results show that the frequency selectivity of the membrane is detected on the membrane surface.
Healthcare has become one of the most important aspects of everyone's life. Its importance has surged due to the latest outbreaks, and with the latest pandemic it has become imperative that we collaborate to improve everyone's healthcare as soon as possible.
IBM has reacted quickly, sharing not only its knowledge but also its Artificial Intelligence supercomputers all around the world.
Those supercomputers are helping to overcome this outbreak, and will help against future ones.
They have completely different features from the offerings of other players in the supercomputer market.
We take a quick look at the differences between these AI-focused supercomputers and how they can help in the R&D of healthcare solutions for everyone, from those with access to a large IBM AI supercomputer to those with access to only a single small IBM AI-focused server.
Adaptive Computing Using PlateSpin Orchestrate - Novell
Adaptive computing goes beyond just intelligently utilizing available resources; it encompasses quality of service (QoS) targets, fault tolerance (high availability), monitoring, and iterative analysis of the resulting dataset to determine what corrective measures (adaptations) should occur at any given moment. As virtualization becomes widespread in the data center, the need for automating the placement and configuration of workloads (virtual machines) using an adaptive computing model becomes vitally important. This session demonstrates how to use events, introduced in PlateSpin Orchestrate 2.0.2, to create rules that trigger workload provisioning, migration, and other virtual machine lifecycle operations. It will also offer a preview of new functionality included in the upcoming 2.1 release of the product.
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ... - Yahoo Developer Network
Splice Machine is an open-source database that combines the benefits of modern lambda architectures with the full expressiveness of ANSI SQL. Like lambda architectures, it employs separate compute engines for different workloads - some call this an HTAP database (Hybrid Transactional/Analytical Processing). This talk describes the architecture and implementation of Splice Machine V2.0. The system is powered by a sharded key-value store for fast short reads and writes and short range scans (Apache HBase), and an in-memory, cluster data flow engine for analytics (Apache Spark). It differs from most other clustered SQL systems such as Impala, SparkSQL, and Hive because it combines analytical processing with distributed multi-version concurrency control, providing the fine-grained concurrency required to power real-time applications. This talk will highlight the Splice Machine storage representation, transaction engine, and cost-based optimizer, and present the detailed execution of operational queries on HBase and of analytical queries on Spark. We will compare and contrast how Splice Machine executes queries with other HTAP systems such as Apache Phoenix and Apache Trafodion. We will end with some roadmap items under development involving new row-based and column-based storage encodings.
Speakers:
Monte Zweben, is a technology industry veteran. Monte’s early career was spent with the NASA Ames Research Center as the Deputy Chief of the Artificial Intelligence Branch, where he won the prestigious Space Act Award for his work on the Space Shuttle program. He then founded and was the Chairman and CEO of Red Pepper Software, a leading supply chain optimization company, which merged in 1996 with PeopleSoft, where he was VP and General Manager, Manufacturing Business Unit. In 1998, he was the founder and CEO of Blue Martini Software – the leader in e-commerce and multi-channel systems for retailers. Blue Martini went public on NASDAQ in one of the most successful IPOs of 2000, and is now part of JDA. Following Blue Martini, he was the chairman of SeeSaw Networks, a digital, place-based media company. Monte is also the co-author of Intelligent Scheduling and has published articles in the Harvard Business Review and various computer science journals and conference proceedings. He currently serves on the Board of Directors of Rocket Fuel Inc. as well as the Dean’s Advisory Board for Carnegie-Mellon’s School of Computer Science.
October 2016 HUG: Pulsar, a highly scalable, low latency pub-sub messaging s... - Yahoo Developer Network
Yahoo recently open-sourced Pulsar, a highly scalable, low latency pub-sub messaging system running on commodity hardware. It provides simple pub-sub messaging semantics over topics, guaranteed at-least-once delivery of messages, automatic cursor management for subscribers, and cross-datacenter replication. Pulsar is used across various Yahoo applications for large scale data pipelines. Learn more about Pulsar architecture and use-cases in this talk.
Speakers:
Matteo Merli from Pulsar team at Yahoo
The first part of the talk will describe the anatomy of a typical data pipeline and how Apache Oozie meets the demands of large-scale data pipelines. In particular, we will focus on recent advancements in Oozie for dependency management among pipeline stages; incremental and partial processing; combinatorial, conditional and optional processing; priority processing; late processing; and BCP management. The second part of the talk will focus on out-of-the-box support for Spark jobs.
Speakers:
Purshotam Shah is a senior software engineer with the Hadoop team at Yahoo, and an Apache Oozie PMC member and committer.
Satish Saley is a software engineer at Yahoo!. He contributes to Apache Oozie.
Implementing SaaS as Cloud controllers using Mobile Agent based technology wi... - Sunil Rajput
Set up your own cloud for Software as a Service (SaaS) over the existing LAN in your laboratory. In this assignment you have to write your own code for cloud controllers using open-source technologies, without HDFS. The basic operations to implement include uploading files to and downloading files from the cloud in encrypted form.
Concurrent Matrix Multiplication on Multi-core Processors - CSCJournals
With the advent of multi-cores, every processor has built-in parallel computational power, which can be fully utilized only if the program in execution is written accordingly. This study is part of ongoing research into the design of a new parallel programming model for multi-core architectures. In this paper we present a simple, highly efficient and scalable implementation of a common matrix multiplication algorithm using a newly developed parallel programming model, SPC3 PM, for general-purpose multi-core processors. From our study it is found that matrix multiplication performed concurrently on multi-cores using SPC3 PM requires much less execution time than with current standard parallel programming environments such as OpenMP. Our approach also shows better scalability, more uniform speedup and better utilization of the available cores than the same algorithm written using standard OpenMP or similar parallel programming tools. We have tested our approach on up to 24 cores with matrix sizes varying from 100 x 100 to 10000 x 10000 elements, and for all these tests the proposed approach has shown much improved performance and scalability.
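The paper's SPC3 PM model itself is not shown here; purely as a reference point, the sketch below is the standard OpenMP baseline against which such models are usually compared: a naive n x n matrix multiply whose outer loop is split across the available cores.

#include <omp.h>

/* Naive row-major matrix multiply C = A * B, parallelised with OpenMP.
 * Each thread computes a contiguous block of output rows. */
void matmul_omp(const double *A, const double *B, double *C, int n)
{
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            double sum = 0.0;
            for (int k = 0; k < n; ++k)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
    }
}

Compile with an OpenMP flag such as -fopenmp (GCC/Clang) and control the core count with OMP_NUM_THREADS.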
OpenACC and Open Hackathons Monthly Highlights May 2023 - OpenACC
Stay up-to-date on the latest news, research, and resources. This month's edition covers the call for speakers for the Open Accelerated Computing Summit, scheduled Open Hackathons and Bootcamps, an interview with Sunita Chandrasekaran, a call for proposals for the DOE's INCITE program, upcoming webinars, and more!
Stay up-to-date on the latest news, events and resources for the OpenACC community. This month’s highlights covers the newly released PGI 19.7, the upcoming 2019 OpenACC Annual Meeting, GPU Bootcamp at RIKEN R-CCS, a complete schedule of GPU hackathons and more!
HOMOGENEOUS MULTISTAGE ARCHITECTURE FOR REAL-TIME IMAGE PROCESSING - cscpconf
In this article, we present a new multistage architecture oriented toward real-time complex processing applications. Given a set of rules, the proposed architecture allows the use of different communication links (point-to-point link, hardware router, etc.) to connect an unlimited number of parallel computing elements (software processors) in order to follow the increasing complexity of algorithms. In particular, this work presents a parallel implementation of a multi-hypothesis approach for a road-recognition application on the proposed Multiprocessor System-on-Chip (MP-SoC) architecture. This algorithm is usually the main part of lane-keeping applications. Experimental results using images of a real road scene are presented. Using a low-cost FPGA-based System-on-Chip, our hardware architecture is able to detect and recognize the roadsides within a time limit of 60 ms. Moreover, we demonstrate that our multistage architecture may be used to achieve good speed-up in solving automotive applications.
On
“ADAPTIVE COMPUTING”
Submitted By
Mr. Suyog M. Potdar
Under The Guidance Of
Prof. V. S. Gulhane
Department of
COMPUTER SCIENCE AND ENGINEERING
SIPNA SHIKSHAN PRASARAK MANDAL’S
College of Engineering & Technology, Amravati
Sant Gadge Baba Amravati University,
Amravati.
YEAR- 2010-2011
This is to certify that
Mr. Suyog M. Potdar
of final year B.E. (Comp. Sci & Engg) has successfully completed his
seminar titled
“ADAPTIVE COMPUTING”
and submitted the seminar report in partial fulfillment of the degree of
Bachelor of Engineering (Comp. Science & Engg) during the academic year
2010-2011.
Prof. Dr. P. R. Deshmukh (H.O.D., Dept. of Comp Sci & Engg)
Prof. V. S. Gulhane (Guide, Dept. of Comp Sci & Engg)
Department of
COMPUTER SCIENCE AND ENGINEERING
SIPNA SHIKSHAN PRASARAK MANDAL’S
College of Engineering & Technology, Amravati
Sant Gadge Baba Amravati University, Amravati
YEAR-2010-2011
The making of this seminar needed the co-operation and
guidance of a number of people. I therefore consider it my prime
duty to thank all those who helped me through this venture.
It is my immense pleasure to express my
gratitude to Prof. V. S. Gulhane, my guide, who provided me with
constructive and positive feedback during the preparation of this
seminar.
I express my sincere thanks to the head of the
department, Prof. Dr. P. R. Deshmukh (Computer Science and
Engg), and all other staff members of the Computer department for
their kind co-operation.
Last but not least, I am thankful to my friends and
the library staff members, whose encouragement and suggestions helped
me to complete my seminar.
I am also thankful to my parents, whose best
wishes are always with me.
Thanking you.
Mr. Suyog M. Potdar
Final Year - Comp. Sci & Engg.
SIPNA'S College of Engg. & Tech.
Amravati.
1 Abstract
2 Introduction
3 Why... Adaptive Computing?
4 A Glance At Adaptive Computing
4.1 What is Adaptive Computing?
4.2 History of Adaptive Computing
4.3 Current Systems
4.4 Programming of reconfigurable computers
4.5 Mitrionics
4.6 SRC
4.7 National Instruments
4.8 Research Projects
4.9 Comparison of systems
4.9.1 Granularity
4.9.2 Rate of reconfiguration
4.9.3 Host coupling
4.9.4 Routing/interconnects
4.9.5 Tool flow
5 Application with Example
Semantic-based Context-aware Dynamic Service Composition:
an Industrial Application of Adaptive Computing
5.1 Dynamic Service Composition
6 Conclusion
7 References
1. ABSTRACT:
Adaptive Computing is emerging as an important new organizational
structure for implementing computations. It combines the post-fabrication
programmability of processors with the spatial computational style most
commonly employed in hardware designs.
The result changes traditional “hardware” and “software” boundaries,
providing an opportunity for greater computational capacity and density
within a programmable medium. Adaptive Computing must leverage
traditional CAD technology for building spatial designs. Beyond that,
however, reprogrammability introduces new challenges and opportunities
for automation, including binding-time and specialization optimizations,
regularity extraction and exploitation, and temporal partitioning and
scheduling.
Due to its potential to greatly accelerate a wide variety of
applications, reconfigurable computing has become a subject of a great deal
of research. Its key feature is the ability to perform computations in
hardware to increase performance, while retaining much of the flexibility of
a software solution. In this introduction to reconfigurable computing, we
give an overview of the hardware architectures of reconfigurable computing
machines, and the software that targets these machines, such as compilation
tools. Finally, we consider the issues involved in run-time reconfigurable
systems, which re-use the configurable hardware during program execution.
Adaptive computing environments involve a variety of smart devices, which
tend to overload humans with complex or irrelevant interaction. Ambient
Intelligence pushes forward a vision where technology is integrated into
everyday objects with the intent of making users’ interaction with their
surrounding environment simpler and more intuitive. In this paper, we
expose how Ambient Intelligence can benefit from the coupling of a service-
oriented approach and multi-agent systems towards more appropriate
interactions with users. Our approach combines multi-agent techniques with
semantic web services to enable dynamic, context-aware service
composition, thus providing users with relevant high level services
depending on their current context and activity.
[ Ref 4,5]
2. INTRODUCTION:
Today many software programs, devices and resources are deployed as
distributed components. For example, various service providers (e.g.,
Google, Yahoo!) deploy their services (e.g. search, email and map services)
as distributed components through web service Technologies. Various
audio and visual devices also become accessible as distributed components
through networking protocols such as Bluetooth, Universal Plug and Play
(UPnP), HAVi, etc. As networks become more ubiquitous, the number of
distributed components available on the network also grows rapidly.
In the computer and electronics world, we are used to two different
ways of performing computation: hardware and software. Computer
hardware, such as application-specific integrated circuits (ASICs), provides
highly optimized resources for quickly performing critical tasks, but it is
permanently configured to only one application via a multimillion-dollar
design and fabrication effort.
Computer software provides the flexibility to change applications and
perform a huge number of different tasks, but is orders of magnitude worse
than ASIC implementations in terms of performance, silicon area efficiency,
and power usage.
Field-programmable gate arrays (FPGAs) are truly revolutionary devices
that blend the benefits of both hardware and software. They implement
circuits just like hardware, providing huge power, area, and performance
benefits over software, yet can be reprogrammed cheaply and easily to
implement a wide range of tasks. Just like computer hardware, FPGAs
implement computations spatially, simultaneously computing millions of
operations in resources distributed across a silicon chip. Such systems can be
hundreds of times faster than microprocessor-based designs. However,
unlike in ASICs, these computations are programmed into the chip, not
permanently frozen by the manufacturing process. This means that an
FPGA-based system can be programmed and reprogrammed many times.
Sometimes reprogramming is merely a bug fix to correct faulty behavior, or
it is used to add a new feature. Other times, it may be carried out to
reconfigure a generic computation engine for a new task, or even to
reconfigure a device during operation to allow a single piece of silicon to
simultaneously do the work of numerous special-purpose chips.
However, merging the benefits of both hardware and software does come at
a price. FPGAs provide nearly all of the benefits of software flexibility and
development models, and nearly all of the benefits of hardware efficiency—
but not quite. Compared to a microprocessor, these devices are typically
several orders of magnitude faster and more power efficient, but creating
efficient programs for them is more complex. Typically, FPGAs are useful
only for operations that process large streams of data, such as signal
processing, networking, and the like.
Compared to ASICs, they may be 5 to 25 times worse in terms of area,
delay, and performance. However, while an ASIC design may take months
to years to develop and have a multimillion-dollar price tag, an FPGA
design might only take days to create and cost tens to hundreds of dollars.
For systems that do not require the absolute highest achievable performance
or power efficiency, an FPGA’s development simplicity and the ability to
easily fix bugs and upgrade functionality make them a compelling design
alternative. For many tasks, and particularly for beginning electronics
designers, FPGAs are the ideal choice.
[ Ref 2,5,8]
3. WHY... ADAPTIVE COMPUTING?
There are two primary methods in traditional computing for the
execution of algorithms. The first is to use an Application Specific
Integrated Circuit, or ASIC, to perform the operations in hardware. Because
these ASICs are designed specifically to perform a given computation, they
are very fast and efficient when executing the exact computation for which
they were designed. However, after fabrication the circuit cannot be altered.
Microprocessors are a far more flexible solution. Processors execute a set of
instructions to perform a computation. By changing the software
instructions, the functionality of the system is altered without changing the
hardware. However, the downside of this flexibility is that the performance
suffers, and is far below that of an ASIC. The processor must read each
instruction from
memory, determine its meaning, and only then execute it. This results in a
high execution overhead for each individual operation. Reconfigurable
computing is intended to fill the gap between hardware and software,
achieving potentially much higher performance than software, while
maintaining a higher level of flexibility than hardware.
It is well known that the computational power of general-purpose computers
is growing exponentially. Nevertheless, demand for computational power is
growing even faster. This deficit has driven research in new computer
architectures which might overcome some limitations of current
microprocessors. To date, most performance improvements have stemmed
from incremental (though by no means trivial) enhancements of the
theoretical von Neumann Architecture.
All of these designs, including the most recent superscalar
CPUs, still execute a sequenced stream of instructions taken from a fixed
instruction set. The instruction set is a list of all of the operations the
processor can perform, and it is fixed at the time of chip design. In contrast,
it is interesting to explore architectures that do not have this fixed instruction
set limitation — architectures that are reconfigurable or Adaptive.
This is the focus of a field known as Adaptive computing (AC).
Following are some more reasons to emphasize Adaptive Computing:
- Need for speed: modern processors intended for the PC market (e.g. Pentium 4, Core 2 Duo, i3, Athlon, etc.) are very powerful.
- What if we need more speed? Supercomputers are very expensive. Modern supercomputers are built from a large number of (fairly) ordinary processors running in parallel.
- What if we also need low power? Modern embedded CPUs (e.g. the ARM) are quite fast.
- We have to design hardware: implementing an algorithm as hardware is often the best way to achieve the best possible performance with the lowest possible transistor count and power requirements.
- Fortunately, FPGA (Field Programmable Gate Array) technology makes custom hardware far more accessible.
- FPGAs are still 15 to 25 times slower than ASICs and use more power, so they aren't a perfect solution; conventional CPUs can still be faster for some things.
- Now the solution is: Adaptive Computing (also known as Reconfigurable Computing).
[ Ref no. 1,2,3]
4. A GLANCE AT ADAPTIVE COMPUTING
4.1] What is Adaptive Computing?
Adaptive computing, also known as Reconfigurable Computing, is a
computer architecture combining some of the flexibility of software with the
high performance of hardware by processing with very flexible high speed
computing fabrics like field-programmable gate arrays (FPGAs). The
principal difference when compared to using ordinary microprocessors is the
ability to make substantial changes to the datapath itself in addition to the
control flow. On the other hand, the main difference with custom hardware,
i.e. application-specific integrated circuits (ASICs) is the possibility to adapt
the hardware during runtime by "loading" a new circuit on the
reconfigurable fabric.
It also refers to a logic chip that can change its physical circuitry on the fly.
Evolved from programmable architectures such as CPLD and FPGA,
adaptive computing is an order of magnitude faster in rate of reuse (ROR)
and can reconfigure itself in nanoseconds.
4.2] History of Adaptive Computing
The concept of reconfigurable computing has existed since the 1960s, when
Gerald Estrin's landmark paper proposed the concept of a computer made of
a standard processor and an array of "reconfigurable" hardware. The main
processor would control the behavior of the reconfigurable hardware. The
latter would then be tailored to perform a specific task, such as image
processing or pattern matching, as quickly as a dedicated piece of hardware.
Once the task was done, the hardware could be adjusted to do some other
task. This resulted in a hybrid computer structure combining the flexibility
of software with the speed of hardware; unfortunately this idea was far
ahead of its time in needed electronic technology. [ref 2]
In the 1980s and 1990s there was a renaissance in this area of research with
many proposed reconfigurable architectures developed in industry and
academia, such as: COPACOBANA, Matrix, Garp, Elixent, PACT XPP,
Silicon Hive, Montium, Pleiades, Morphosys, PiCoGA. Such designs were
feasible due to the constant progress of silicon technology that let complex
designs be implemented on one chip. The world's first commercial
reconfigurable computer, the Algotronix CHS2X4, was completed in 1991.
It was not a commercial success, but was promising enough that Xilinx (the
inventor of the Field-Programmable Gate Array, FPGA) bought the
technology and hired the Algotronix staff.
4.3] Current systems
Adaptive computers can be categorized into two classes of architectures:
hybrid computers and fully FPGA-based computers. Both architectures are
designed to bring the benefits of reconfigurable logic to large-scale
computing. They can be used in traditional CPU cluster computers and
network infrastructures.
Hybrid computers combine one or a few reconfigurable logic chips (FPGAs)
with a standard microprocessor CPU, for example by exchanging one CPU of a
multi-CPU board with an FPGA (also known as hybrid-core computing), or by
adding a PCI or PCI Express based FPGA expansion card to the computer.
Simplified, they are von Neumann based architectures with an integrated FPGA
accelerator. This architectural compromise reduces the scalability of hybrid
computers and raises their power consumption. They have the same bottlenecks
as all von Neumann based architectures; nevertheless, they enable users to
accelerate their algorithms without losing their standard CPU-based
environment.
A relatively new class is the fully FPGA-based computers. This class
usually contains no CPUs, or uses CPUs only as an interface to the network
environment. Their benefit is to deliver the energy efficiency and
scalability of FPGAs to their users fully and without compromise. Depending on
the architecture of the interconnection between the FPGAs, these machines
are fully scalable, even across single-machine borders. Their bus system and
overall architecture eliminate the bottlenecks of the von Neumann
architecture.
Examples of hybrid computers
A well-known example of a hybrid computer is the XD1, a machine designed by
OctigaBay. The supercomputer company Cray (not affiliated with SRC
Computers) acquired OctigaBay and its reconfigurable computing platform,
which Cray marketed as the XD1 until recently. SGI sells the RASC
platform with their Altix series of supercomputers. SRC Computers, Inc.
has developed a family of reconfigurable computers based on their
IMPLICIT+EXPLICIT architecture and MAP processor. A current example
of a hybrid-core computer is Convey Computer Corporation's HC-1, which
has both an Intel x86 processor and a Xilinx FPGA coprocessor. Another
example is CompactRIO from National Instruments.
Examples of fully FPGA-based computers
Several academic projects have tried to create fully FPGA-based computers
for various markets. One is COPACOBANA, the Cost-Optimized Codebreaker
and Analyzer. SciEngines, a spin-off company of the COPACOBANA project
of the universities of Bochum and Kiel in Germany, currently sells the
fourth generation of fully FPGA-based computers, the COPACOBANA RIVYERA
(Reconfigure Versatally your raw architecture).
4.4] Programming of reconfigurable computers
FPGA reconfiguration can be accomplished either via traditional
hardware description languages (HDLs), which can be written directly or
generated using electronic design automation (EDA) or electronic system
level (ESL) tools, or by employing high-level languages such as the
graphical tool Starbridge Viva or C-based languages such as Impulse C from
Impulse Accelerated Technologies, SystemC, LISA, Handel-C from
Celoxica, DIME-C from Nallatech, C-to-Verilog.com or Mitrion-C from
Mitrionics, or graphical programming languages like LabVIEW.
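The exact syntax of Impulse C, Handel-C or Mitrion-C is not reproduced here; as a plain-C sketch of the kind of streaming kernel such C-to-gates flows accept, consider a small fixed-point FIR filter. The tap count, data types and scaling below are illustrative assumptions; an HLS-style tool would typically unroll the inner loops into a pipelined chain of multiply-accumulate units.

#define TAPS 8

/* Streaming 8-tap FIR filter over 16-bit samples with a 32-bit accumulator.
 * The shift register maps naturally onto a hardware register chain, and the
 * multiply-accumulate loop onto parallel DSP blocks. */
void fir8(const short *in, short *out, int n, const short coeff[TAPS])
{
    short taps[TAPS] = {0};
    for (int i = 0; i < n; ++i) {
        int acc = 0;
        for (int t = TAPS - 1; t > 0; --t)   /* shift samples along the chain */
            taps[t] = taps[t - 1];
        taps[0] = in[i];
        for (int t = 0; t < TAPS; ++t)       /* multiply-accumulate per tap */
            acc += taps[t] * coeff[t];
        out[i] = (short)(acc >> 8);          /* illustrative fixed-point scaling */
    }
}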
4.5] Mitrionics
Mitrionics has developed an SDK that enables software written using a single-
assignment language to be compiled and executed on FPGA-based
computers. The Mitrion-C software language and Mitrion processor enable
software developers to write and execute applications on FPGA-based
computers in the same manner as with other computing technologies, such
as graphical processing units (“GPUs”), cell-based processors, parallel
processing units (“PPUs”), multi-core CPUs, and traditional single-core
CPU clusters.
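Mitrion-C's real syntax is not shown here; the contrast below, written in ordinary C, only illustrates what "single assignment" means: every intermediate value gets exactly one definition, which exposes the independent operations a dataflow compiler can schedule in parallel.

/* Imperative style: one variable is repeatedly overwritten, so the
 * description reads as a strictly ordered sequence of updates. */
int poly_imperative(int x)
{
    int r = x * x;
    r = r + 3 * x;
    r = r - 7;
    return r;
}

/* Single-assignment style: x*x and 3*x have no dependence on each other
 * and can be evaluated by parallel hardware units. */
int poly_single_assignment(int x)
{
    const int x_sq   = x * x;
    const int x_tri  = 3 * x;
    const int summed = x_sq + x_tri;
    return summed - 7;
}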
4.6] SRC
SRC has developed a "Carte" compiler that takes existing high-level
languages such as C or Fortran and, with a few modifications, compiles them
for execution on both the FPGA and microprocessor. According to SRC
literature[citation needed], "...application algorithms are written in a high-
level language such as C or Fortran. Carte extracts the maximum parallelism
from the code and generates pipelined hardware logic that is instantiated in
the MAP. It also generates all the required interface code to manage the
movement of data to and from the MAP and to coordinate the
microprocessor with the logic running in the MAP." (note that SRC also
allows a traditional HDL flow to be used). The SRC systems communicate
via the SNAP memory interface, and/or the (optional) Hi-Bar switch.
4.7] National Instruments
National Instruments has developed a hybrid embedded computing system
called CompactRIO. CompactRIO systems consist of reconfigurable chassis
housing the user-programmable FPGA, hot swappable I/O modules, real-
time controller for deterministic communication and processing, and
graphical LabVIEW software for rapid RT and FPGA programming.
4.8] Research projects
The research community is also acting on the subject with projects like
MORPHEUS in Europe which implements on a single 100 mm² 90 nm chip
an ARM9 processor, an eFPGA from Abound Logic (formerly M2000), a
DREAM PiCoGA and a PACT XPP matrix.
Abound Logic contributes to the MORPHEUS project with an embedded
FPGA, and uses the same architecture to make its very large standard
FPGAs.
The University of Florida also has its own version of a reconfigurable
computer called Novo-G that is currently the world's fastest and most
powerful.
4.9] Comparison of systems
As an emerging field, classifications of reconfigurable architectures are still
being developed and refined as new architectures are developed; no unifying
taxonomy has been suggested to date. However, several recurring
parameters can be used to classify these systems.
4.9.1] Granularity
The granularity of the reconfigurable logic is defined as the size of the
smallest functional unit (configurable logic block, CLB) that is addressed by
the mapping tools. High granularity, which can also be known as fine-
grained, often implies a greater flexibility when implementing algorithms
into the hardware. However, there is a penalty associated with this in terms
of increased power, area and delay due to greater quantity of routing
required per computation. Fine-grained architectures work at the bit-level
manipulation level; whilst coarse grained processing elements
(reconfigurable datapath unit, rDPU) are better optimised for standard data
path applications. One of the drawbacks of coarse-grained architectures is
that they tend to lose some of their utilisation and performance if they need
to perform smaller computations than their granularity provides; for example,
a one-bit add on a four-bit-wide functional unit would waste three bits.
This problem can be solved by having a coarse-grain array (reconfigurable
datapath array, rDPA) and an FPGA on the same chip.
Coarse-grained architectures (rDPA) are intended for the implementation for
algorithms needing word-width data paths (rDPU). As their functional
blocks are optimized for large computations and typically comprise word
wide arithmetic logic units (ALU), they will perform these computations
more quickly and with more power efficiency than a set of interconnected
smaller functional units; this is due to the connecting wires being shorter,
resulting in less wire capacitance and hence faster and lower power designs.
A potential undesirable consequence of having larger computational blocks
is that when the size of the operands does not match the algorithm, inefficient
utilisation of resources can result. Often the type of applications to be run
are known in advance allowing the logic, memory and routing resources to
be tailored (for instance, see KressArray Xplorer) to enhance the
performance of the device whilst still providing a certain level of flexibility
for future adaptation. Examples of this are domain specific arrays aimed at
gaining better performance in terms of power, area, throughput than their
more generic finer grained FPGA cousins by reducing their flexibility.
4.9.2] Rate of reconfiguration
Configuration of these reconfigurable systems can happen at deployment
time, between execution phases or during execution. In a typical
reconfigurable system, a bit stream is used to program the device at
deployment time. Fine grained systems by their own nature require greater
configuration time than more coarse-grained architectures due to more
elements needing to be addressed and programmed. Therefore more coarse-
grained architectures gain from potential lower energy requirements, as less
information is transferred and utilised. Intuitively, the slower the rate of
reconfiguration, the smaller the energy consumption, as the associated energy
cost of reconfiguration is amortised over a longer period of time. Partial re-
configuration aims to allow part of the device to be reprogrammed while
another part is still performing active computation. Partial re-configuration
allows smaller reconfigurable bit streams thus not wasting energy on
transmitting redundant information in the bit stream. Compression of the bit
stream is possible but careful analysis is to be carried out to ensure that the
energy saved by using smaller bit streams is not outweighed by the
computation needed to decompress the data.
4.9.3] Host coupling
Often the reconfigurable array is used as a processing accelerator attached to
a host processor. The level of coupling determines the type of data transfers,
latency, power, throughput and overheads involved when utilising the
reconfigurable logic. Some of the most intuitive designs use a peripheral bus
to provide a coprocessor like arrangement for the reconfigurable array.
However, there have also been implementations where the reconfigurable
fabric is much closer to the processor, some are even implemented into the
data path, utilising the processor registers. The job of the host processor is to
perform the control functions, configure the logic, schedule data and to
provide external interfacing.
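As a concrete picture of the loosely coupled case (an accelerator hanging off a peripheral bus), here is a hedged host-side sketch in C. The register map and polling protocol are entirely hypothetical; a real system would take them from the board's documentation and would usually prefer interrupts or DMA descriptors to busy-waiting.

#include <stdint.h>

/* Hypothetical register offsets of a bus-attached accelerator. */
#define REG_SRC    0x00u   /* physical address of the input buffer  */
#define REG_DST    0x04u   /* physical address of the output buffer */
#define REG_LEN    0x08u   /* number of words to process            */
#define REG_CTRL   0x0Cu   /* bit 0: start                          */
#define REG_STATUS 0x10u   /* bit 0: done                           */

static inline void reg_write(volatile uint32_t *base, uint32_t off, uint32_t val)
{
    base[off / 4] = val;
}

static inline uint32_t reg_read(volatile uint32_t *base, uint32_t off)
{
    return base[off / 4];
}

/* Host-side control loop: the CPU configures the job, starts it, then polls.
 * 'base' points at the accelerator's memory-mapped register window. */
void run_accelerator(volatile uint32_t *base, uint32_t src, uint32_t dst, uint32_t len)
{
    reg_write(base, REG_SRC, src);
    reg_write(base, REG_DST, dst);
    reg_write(base, REG_LEN, len);
    reg_write(base, REG_CTRL, 1u);                 /* kick off the computation */
    while ((reg_read(base, REG_STATUS) & 1u) == 0)
        ;                                          /* busy-wait until done */
}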
4.9.4] Routing/interconnects
The flexibility in reconfigurable devices mainly comes from their routing
interconnect. One style of interconnect made popular by the FPGA vendors
Xilinx and Altera is the island-style layout, where blocks are arranged in
an array with vertical and horizontal routing channels. A layout with
inadequate routing may suffer from poor flexibility and resource
utilisation, and therefore limited performance. If too much interconnect is
provided, the device requires more transistors than necessary, and thus more
silicon area, longer wires and more power consumption.
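
The over-provisioning cost can be made concrete with a back-of-the-envelope calculation. The array size, per-block area and per-track area factor below are invented for illustration; the point is only that routing area grows roughly linearly with channel width in an island-style array.

#include <stdio.h>

/* Toy area estimate (all constants assumed, not taken from the report). */
int main(void)
{
    const int    blocks            = 100 * 100; /* logic blocks in the array (assumed)      */
    const double block_area        = 1.0;       /* normalised area per logic block          */
    const double track_area_factor = 0.05;      /* area per routing track per block (assumed)*/

    for (int channel_width = 4; channel_width <= 32; channel_width *= 2) {
        double routing_area = blocks * channel_width * track_area_factor * block_area;
        double logic_area   = blocks * block_area;
        printf("W = %2d: routing is %.0f%% of logic area\n",
               channel_width, 100.0 * routing_area / logic_area);
    }
    return 0;
}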
4.9.5] Tool flow
Generally, tools for reconfigurable computing systems can be split into two
parts: CAD tools for the reconfigurable array and compilation tools for the
CPU. The front-end compiler is an integrated tool that generates a
structural hardware representation, which is the input to the hardware
design flow. The hardware design flow for a reconfigurable architecture can
be classified by the approach adopted in the three main stages of the design
process: technology mapping, placement and routing. Software frameworks also
differ in the abstraction level of the programming language they accept.
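
To give a feel for one of those stages, the sketch below is a deliberately tiny placement loop: it minimises total Manhattan wirelength between connected blocks by trying random pairwise swaps and keeping only improvements. The netlist and one-dimensional grid are made up for the example; real CAD placers use far more elaborate cost models and annealing schedules.

#include <stdio.h>
#include <stdlib.h>

#define N_BLOCKS 4
static int xpos[N_BLOCKS] = {0, 1, 2, 3};              /* grid positions (assumed)  */
static int nets[][2]      = {{0, 3}, {1, 2}, {0, 2}};  /* connected block pairs     */
#define N_NETS (int)(sizeof nets / sizeof nets[0])

/* Total Manhattan wirelength of the current placement. */
static int wirelength(void)
{
    int total = 0;
    for (int i = 0; i < N_NETS; i++)
        total += abs(xpos[nets[i][0]] - xpos[nets[i][1]]);
    return total;
}

int main(void)
{
    srand(1);
    for (int iter = 0; iter < 1000; iter++) {
        int a = rand() % N_BLOCKS, b = rand() % N_BLOCKS;
        int before = wirelength();
        int t = xpos[a]; xpos[a] = xpos[b]; xpos[b] = t;   /* trial swap        */
        if (wirelength() > before) {                       /* keep improvements */
            t = xpos[a]; xpos[a] = xpos[b]; xpos[b] = t;   /* otherwise undo    */
        }
    }
    printf("final wirelength: %d\n", wirelength());
    return 0;
}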
Some types of reconfigurable computers are microcoded processors in which
the microcode is stored in RAM or EEPROM and can be changed on reboot or on
the fly. This could be done with the AMD 2900-series bit-slice processors
(on reboot) and later with FPGAs (on the fly).
Some dataflow processors are implemented using reconfigurable computing.
A new method of application development for reconfigurable computing is
being developed by MNB Technologies, Inc., under contract to the United
States Air Force Office of Scientific Research (AFOSR). This approach uses
a national repository of generic algorithms, similar to the BLAS and
LAPACK libraries found at netlib.org. In addition to the repository, the
project is developing a tightly integrated suite of expert system based tools
that largely eliminate the need for an application developer to have any in-
depth knowledge of the underlying hardware or how to use the specialized
Verilog and VHDL hardware description languages. The results of this
research will be available without charge to individuals and organizations
based in the United States.
To compare the effect of various implementation choices on runtime and
energy consumption, some tools allow the same piece of C code to be compiled
for a fixed CPU, for a soft processor, or directly to an FPGA
[Ref no. 9, 10].
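
The kernel below is the kind of plain C such flows typically accept as input: a fixed-size loop with no pointers or recursion. It is only an illustrative example (the function name and sizes are invented), and no claim is made about any particular tool's options; the same source could be compiled natively, run on a soft processor, or handed to a C-to-hardware flow.

#include <stdio.h>

#define N 8

/* Multiply-accumulate over fixed-size arrays: a regular loop structure
 * that both software compilers and hardware compilers map well.        */
static int dot_product(const int a[N], const int b[N])
{
    int acc = 0;
    for (int i = 0; i < N; i++)
        acc += a[i] * b[i];
    return acc;
}

int main(void)
{
    int a[N] = {1, 2, 3, 4, 5, 6, 7, 8};
    int b[N] = {8, 7, 6, 5, 4, 3, 2, 1};
    printf("dot product = %d\n", dot_product(a, b));
    return 0;
}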
4.10] Applications of Adaptive Computing
Some of the applications are:
1] Isolation Points.
2] On the use of memory and resources in Minority Games.
3] Semantic-Based Context-Aware Dynamic Service Composition.
4] Exploiting User Location for Load Balancing WLANs and improving Wireless QoS.
5] Self-Adaptive Software: Landscape and Research Challenges. [Ref no. 8]
5] Semantic-Based Context-Aware Dynamic Service Composition: an Industrial
Application of Adaptive Computing
Complex services may be dynamically composed through combining
distributed components on demand (i.e., when requested by a user) in order
to provide new services without preinstallation. Several systems have been
proposed to dynamically compose services. However, they require users to
request services in a manner that is not intuitive to the users. In order to
allow a user to request a service in an intuitive form (e.g., using a natural
language), this paper proposes a semantics-based service composition
architecture. The proposed architecture obtains the semantics of the service
requested in an intuitive form, and dynamically composes the requested
service based on the semantics of the service. To compose a service based
on its semantics, the proposed architecture supports semantic representation
of components [through a component model named Component Service
Model with Semantics (CoSMoS)], discovers components required to
compose a service [through a middleware named Component Runtime
Environment (CoRE)], and composes the requested service based on its
semantics and the semantics of the discovered components [through a
service composition mechanism named Semantic Graph-Based Service
Composition (SeGSeC)].
This paper presents the design, implementation and empirical evaluation of
the proposed architecture.
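
To make the idea of composing a service from component semantics more concrete, here is a toy C sketch. It is not the CoSMoS/CoRE/SeGSeC implementation described in the paper; the registry, concept names and chaining rule are invented, loosely echoing the restaurant-map scenario discussed below. Each component advertises the concept it consumes and the concept it produces, and a request is satisfied by chaining components from the user's input concept to the requested output.

#include <stdio.h>
#include <string.h>

struct component {
    const char *name;
    const char *consumes;   /* semantic concept required as input */
    const char *produces;   /* semantic concept offered as output */
};

/* Hypothetical component registry (invented for illustration). */
static struct component registry[] = {
    {"AddressLookup", "restaurant", "address"},
    {"MapService",    "address",    "map"},
    {"PrinterDriver", "map",        "printout"},
};
#define N_COMPONENTS (int)(sizeof registry / sizeof registry[0])

/* Naive semantic chaining: repeatedly pick a component whose input
 * concept matches what we currently have, until the goal is produced. */
static void compose(const char *have, const char *want)
{
    while (strcmp(have, want) != 0) {
        int found = 0;
        for (int i = 0; i < N_COMPONENTS; i++) {
            if (strcmp(registry[i].consumes, have) == 0) {
                printf("invoke %s: %s -> %s\n",
                       registry[i].name, have, registry[i].produces);
                have = registry[i].produces;
                found = 1;
                break;
            }
        }
        if (!found) { printf("no component chain produces '%s'\n", want); return; }
    }
}

int main(void)
{
    compose("restaurant", "printout");   /* "print a map to the restaurant" */
    return 0;
}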
5.1] Dynamic Service Composition
The recent development of distributed component technologies such as CORBA
and Web Services has made it possible to componentize various software
programs, devices, and resources and to distribute them over a network.
such an environment where a large number of various components are
distributed, it is possible to dynamically compose a service on demand,
i.e., composing a service upon receiving a request from a user, through
discovering, combining and executing necessary components. Composing
services on demand (i.e., dynamic service composition) has various
advantages. For instance, by dynamically composing services on demand,
services do not need to be configured or deployed in advance. In addition, by
composing services based on requests from users, it is possible to customize
the services to individual user profiles.
To illustrate the advantage of dynamic service composition, consider the
following example scenario. Suppose Tom wants to take his family to a new
restaurant he recently found on the web, so he wants to print out a map
showing a direction to the restaurant from his house. Assume that: 1) the
restaurant’s Web server stores the restaurant’s information such as its address
in a structured document (e.g., in XML); 2) Tom’s PC stores his personal
information such as his home address in a database; 3) there is a Web
Service which, given two addresses, generates a map showing a direction
from one address to the other; and 4) Tom has a printer connected to his
home network. In order to print out the map showing the direction from his
home to the restaurant with the technology currently available, Tom has to
manually perform the following steps: a) discover the restaurant’s homepage
and the Web Service that generates a map; b) obtain the addresses of the
restaurant and his home; c) invoke the Web Service using the addresses
obtained; and d) print out the map generated by the Web Service. However,
if Tom is not an experienced PC user, it may be difficult for him to perform
these steps. For instance, he may not know how to input the restaurant’s
address to the Web Service that generates a map. Dynamic service
composition, on the other hand, will automatically compose the direction
printing service, upon Tom’s request, by discovering the four necessary
components, identifying the steps a)–d) and executing them on Tom’s
behalf. Since the direction printing service is composed on demand, it is not
required to be configured or deployed in advance. Also, if Tom’s daughter,
Alice, requests for a direction to the restaurant, and if she carries a PDA with
her, the service may be customized for her such that it shows the map on her
PDA’s display instead of printing it out.
[Ref no. 3,1,8]
6. CONCLUSION:
Reconfigurable computing is becoming an important part of research in
computer architectures and software systems. By placing the
computationally intense portions of an application onto the reconfigurable
hardware, the overall application can be greatly accelerated. This is
because reconfigurable computing combines the benefits of both software and
ASIC
implementations. Like software, the mapped circuit is flexible, and can be
changed over the lifetime of the system or even the execution time of an
application. Similar to an ASIC, reconfigurable systems provide a method to
map circuits into hardware, achieving far greater performance than software
as a result of bypassing the fetch-decode-execute cycle of traditional
microprocessors, and parallel execution of multiple operations.
Reconfigurable hardware systems come in many forms, from a configurable
functional unit integrated directly into a CPU, to a reconfigurable
coprocessor coupled with a host microprocessor, to a multi-FPGA stand-alone
unit. The level of coupling, granularity of computation structures, and form
of routing resources are all key points in the design of reconfigurable
systems. Compilation tools for reconfigurable systems range from simple
tools that aid in the manual design and placement of circuits, to fully
automatic design suites that use program code written in a high-level
language to generate circuits and the controlling software. The variety of
tools available allows designers to choose between manual and automatic
circuit creation for any or all of the design steps. Although automatic
tools greatly simplify the design process, manual creation is still
important for performance-driven applications.
Finally, run-time reconfiguration provides a method to
accelerate a greater portion of a given application by allowing the
configuration of the hardware to change over time. Apart from the benefits
of added capacity through the use of virtual hardware, run-time
reconfiguration also allows for circuits to be optimized based on run-time
conditions. In this manner, the performance of a reconfigurable system can
approach or even surpass that of an ASIC.
Reconfigurable computing systems have shown the ability to greatly
accelerate program execution, providing a high-performance alternative to
software-only implementations. However, no one hardware design has
emerged as the clear pinnacle of reconfigurable design. Although general-
purpose FPGA structures have standardized into LUT-based architectures,
groups designing hardware for reconfigurable computing are currently also
exploring the use of heterogeneous structures and word-width computational
elements. Those designing compiler systems face the task of improving
automatic design tools to the point where they may achieve mappings
comparable to manual design for even high-performance applications.
Within both of these research categories lies the additional topic of run-time
reconfiguration. While some work has been done in this field as well,
research must continue in order to be able to perform faster and more
efficient reconfiguration. Further study into each of these topics is necessary
in order to harness the full potential of reconfigurable computing.
[Ref no. 4, 6, 7]
7. REFERENCES:
1] D. Mennie and B. Pagurek, “An architecture to support dynamic composition
of service components,” in Proc. 5th Int. Workshop Component-Oriented
Program., Sophia Antipolis, France, 2000.
2] Estrin, G. 2002. Reconfigurable computer origins: the UCLA fixed-plus-
variable (F+V) structure computer. IEEE Ann. Hist. Comput. 24, 4 (Oct.
2002), 3–9. DOI=http://dx.doi.org/10.1109/MAHC.2002.1114865
3] F. Casati, S. Ilnicki, L.-J. Jin, V. Krishnamoorthy, and M.-C. Shan,
“Adaptive and dynamic service composition in eFlow,” in Proc. Int.
Conf Advanced Inf. Syst. Eng., Stockholm, Sweden, 2000.
4] K. Compton, S. Hauck, “Configurable Computing: A Survey of Systems
and Software",
Northwestern University, Dept. of ECE Technical Report, 1999.
5] M. Minami, H. Morikawa, and T. Aoyama, “The design and evaluation
of an interface-based naming system for supporting service synthesis
in ubiquitous computing environment,” Trans. Inst. Electron., Inf.
Commun. Eng., vol. J86-B, no. 5, pp. 777–789, May 2003.
6] S. Hauck, “The Roles of FPGAs in Reprogrammable Systems,” Proceedings of
the IEEE, Vol. 86, No. 4, pp. 615-638, April 1998.
7] W. H. Mangione-Smith, B. Hutchings, D. Andrews, A. DeHon
8] ACM Transactions on Autonomous and Adaptive Systems, Vol. 4, No. 2, 2009.
http://www.acm.org/tass
9] http://www.manuals-search-pdf.com/Adaptive_Computing
10] http://www.pdfgeni.com/Reconfigurable_Computing