This document provides an overview of challenges for future embedded systems, including:
1) A performance gap is emerging as transistor scaling slows and instruction-level parallelism improvements stagnate.
2) Power and energy constraints are tightening as leakage power rises and battery capacity does not improve at the rate of Moore's Law.
3) Reusing existing binary code is important for commercial success but forces compatibility with inefficient instruction sets.
4) Yield and manufacturing costs are rising rapidly due to mask costs, lithography costs, and the verification of increasingly complex designs.
The case study highlights Mistral's expertise in the architecture and design of a custom hybrid system that supports two industry-standard bus architectures, VME and VPX, to meet requirements for input/output signals, communication links, processing capability, and data transfer capability.
International Standards: The Challenges for an Interoperable Smart Grid (Schneider Electric)
Building an electric energy Smart Grid involves proper interfacing between existing devices, applications and systems – all likely sourced from many different vendors. The resulting interoperability allows valuable advantages, such as the ability to use distribution system demand response (DSDR) to improve the efficiency of delivered power. Interoperability enables automated switching sequences, for system ‘self-healing’ and improved reliability, along with effective integration of distributed renewable and non-renewable resources that can enable peak shaving. Interoperability also is vital for assimilating emerging automation technologies that will enable the utility to realize these benefits in the future – and protect public and private sector technology investments.
The International Electrotechnical Commission (IEC) defines international standards, recognized globally, that characterize interoperability and security of electrical, electronic and related technologies. These standards are created to assure interoperability within all the major power system objects in an electrical utility enterprise and allow mission critical distribution functions to take advantage of real-time data in a secure manner. The IEC standards also enable reliable exchange of data among utilities and across power pools.
The U.S. National Institute of Standards and Technology (NIST) is incorporating IEC standards, and developing new or revised standards, to be applied in its development of a Smart Grid as a national energy goal. This standards framework aims to eliminate the implementation of technologies that might become obsolete prematurely or be implemented without necessary security measures – and help utilities make the infrastructure decisions that reduce cost and energy loss, improve network reliability and embrace technology innovation.
Get with the system - Rogerio Martins, Schneider Electric discusses the advantages of modern distributed control systems in coal handling preparation plants.
The rush to the edge and new applications around AI are causing a shift in design strategies toward the highest performance per watt, rather than the highest performance or lowest power.
A System on Chip (SoC) is an IC that integrates all the components of an electronic system. This presentation covers current trends and challenges in IP-based SoC design.
Applying Cloud Techniques to Address Complexity in HPC System Integrations (inside-BigData.com)
In this video from the HPC User Forum at Argonne, Arno Kolster from Providentia Worldwide presents: Applying Cloud Techniques to Address Complexity in HPC System Integrations.
"The Oak Ridge Leadership Computing Facility (OLCF) and technology consulting company Providentia Worldwide recently collaborated to develop an intelligence system that combines real-time updates from the IBM AC922 Summit supercomputer with local weather and operational data from its adjacent cooling plant, with the goal of optimizing Summit’s energy efficiency. The OLCF proposed the idea and provided facility data, and Providentia developed a scalable platform to integrate and analyze the data."
Watch the video: https://wp.me/p3RLHQ-kOg
Learn more: http://www.providentiaworldwide.com/
and
http://hpcuserforum.com
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
STUDY OF VARIOUS FACTORS AFFECTING PERFORMANCE OF MULTI-CORE PROCESSORS (ijdpsjournal)
Advances in integrated circuit processing allow for more microprocessor design options. As chip multiprocessors (CMPs) become the predominant topology for leading microprocessors, critical components of the system are now integrated on a single chip. This enables sharing of computation resources that was not previously possible. In addition, the virtualization of these computation resources exposes the system to a mix of diverse and competing workloads. On-chip cache memory is a resource of primary concern, as it can be dominant in controlling overall throughput. This paper analyzes parameters affecting the performance of multi-core architectures: the number of cores, the L2 cache size, and the directory size, which is varied from 64 to 2048 entries on 4-, 8-, 16-, and 64-node chip multiprocessors. This presents an open area of research on multicore processors with private/shared last-level caches, as the future trend seems to be toward tiled architectures executing multiple parallel applications with optimized silicon area utilization and excellent performance.
Performance of State-of-the-Art Cryptography on ARM-based Microprocessors (Hannes Tschofenig)
Position paper for the NIST Lightweight Cryptography Workshop, 20th and 21st July 2015, Gaithersburg, US.
The link to the workshop is available at: http://www.nist.gov/itl/csd/ct/lwc_workshop2015.cfm
Low power network on chip architectures: A survey (CSITiaesprime)
Most communication nowadays is done through system-on-chip (SoC) models, so the network-on-chip (NoC) architecture is the most appropriate solution for better performance. However, one of the major flaws of this architecture is power consumption. To gain high performance from this type of architecture, power consumption must be considered at design time. Power use should be reduced in every region of the network-on-chip architecture, and the remaining power consumption can be lessened through modifications to the network routers and the other devices that form the network. This survey focuses on state-of-the-art methods for designing NoC architectures and on techniques to reduce power consumption in those architectures, covering network architecture, network links between nodes, network design, and routers.
I understand that physics and hardware emmaded on the use of finete .pdf (anil0878)
I understand that physics and hardware demand the use of finite element methods to predict
fluid flow over airplane wings, and that progress is likely to continue. However, in recent
years this progress has been achieved through greatly increased hardware complexity with the
rise of multicore and manycore processors, and this is affecting the ability of application
developers to achieve the full potential of these systems. Currently, performance is measured
on a dense matrix-matrix multiplication test, which has questionable relevance to real
applications, despite the incredible advances in processor technology and all of the
accompanying aspects of computer system design, such as the memory subsystem and networking.
An embedded system combines hardware and software into a single functional unit, and
application developers working with such systems must exploit advances in processor
technology to achieve the systems' full potential.
Hardware
(1) Memory
Advances in memory technology have struggled to keep pace with the phenomenal advances in
processors. This difficulty in improving the main memory bandwidth led to the development of a
cache hierarchy with data being held in different cache levels within the processor. The idea is
that instead of fetching the required data multiple times from the main memory, it is instead
brought into the cache once and re-used multiple times. Intel allocates about half of the chip to
cache, with the largest LLC (last-level cache) being 30MB in size. IBM\'s new Power8 CPU has
an even larger L3 cache of up to 96MB [4]. By contrast, the largest L2 cache in NVIDIA\'s
GPUs is only 1.5MB.These different hardware design choices are motivated by careful
consideration of the range of applications being run by typical users.
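The pay-off of fetching data once and re-using it from the cache can be sketched with a simple average-memory-access-time (AMAT) model. The hit rates and cycle latencies below are illustrative assumptions, not figures for any particular chip:

```python
# Average memory access time (AMAT) for a multi-level cache hierarchy.
# Hit rates and cycle latencies are illustrative assumptions.

def amat(levels, memory_latency_cycles):
    """levels: list of (latency_cycles, hit_rate) tuples, L1 first."""
    # Work inward from main memory: each level's miss traffic pays the
    # cost of everything below it.
    cost = float(memory_latency_cycles)
    for latency, hit_rate in reversed(levels):
        cost = latency + (1.0 - hit_rate) * cost
    return cost

# Three cache levels in front of a ~200-cycle main memory:
with_cache = amat([(4, 0.90), (12, 0.70), (40, 0.50)], 200)
print(f"{with_cache:.1f} cycles on average, vs 200 cycles uncached")
```

Even with modest hit rates at the lower levels, the average access cost drops by more than an order of magnitude, which is why vendors spend so much die area on cache.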
One complication which has become more common and more important in the past few years is
non-uniform memory access. Ten years ago, most shared-memory multiprocessors would have
several CPUs sharing a memory bus to access a single main memory. A final comment on the
memory subsystem concerns the energy cost of moving data compared to performing a single
floating point computation.
(2) Processors
For many years, CPUs had a single processing core, and the increase in performance came partly
from an increase in the number of computational pipelines, but mainly through an increase in
clock frequency. Unfortunately, power consumption is approximately proportional to the cube of
the frequency, and this led to CPUs with a power consumption of up to 250W. CPUs address memory
bandwidth limitations by devoting half or more of the chip to LLC, so that small applications
can be held entirely within the cache. They address the 200-cycle latency issue by using very
complex cores which are capable of out-of-order execution. By contrast, GPUs adopt a very
different design philosophy because of the different needs of the graphical applications they
target. A GPU usually has a number of functional units.
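The cubic power/frequency relationship mentioned above can be sketched numerically. The 100 W, 3 GHz baseline is an assumed example, not a specific product:

```python
# Dynamic power scales roughly as C * V^2 * f, and supply voltage must
# rise with frequency, giving the approximately cubic relationship
# described in the text. Baseline figures are illustrative assumptions.

def scaled_power(base_power_w, base_freq_ghz, new_freq_ghz):
    return base_power_w * (new_freq_ghz / base_freq_ghz) ** 3

# A 20% frequency bump on a hypothetical 100 W, 3 GHz core:
print(f"{scaled_power(100.0, 3.0, 3.6):.0f} W")
```

A 20% frequency gain costs roughly 73% more power, while delivering at best a 20% speedup, which is why frequency scaling stopped being the main lever for performance.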
A New Direction for Computer Architecture Research (dbpublications)
In this paper we suggest a different computing environment as a worthy new direction for computer architecture research: personal mobile computing, where portable devices are used for visual computing and personal communications tasks. Such a device supports, in an integrated fashion, all the functions provided today by a portable computer, a cellular phone, a digital camera and a video game. The requirements placed on the processor in this environment are energy efficiency, high performance for multimedia and DSP functions, and area-efficient, scalable designs. We examine the architectures that were recently proposed for billion-transistor microprocessors. While they are very promising for stationary desktop and server workloads, we discover that most of them are unable to meet the challenges of the new environment and provide the necessary enhancements for multimedia applications running on portable devices.
GraphRAG is All You Need? LLM & Knowledge Graph (Guy Korland)
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Generating a custom Ruby SDK for your web service or Rails API using Smithy (g2nightmarescribd)
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
Key Trends Shaping the Future of Infrastructure.pdf (Cheryl Hung)
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud, and open source: exploring how these areas are likely to mature and develop over the short and long term, and considering how organisations can position themselves to adapt and thrive.
Transcript: Selling digital books in 2024: Insights from industry leaders - T... (BookNet Canada)
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality (Inflectra)
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... (DanBrown980551)
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Essentials of Automations: Optimizing FME Workflows with Parameters (Safe Software)
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
UiPath Test Automation using UiPath Test Suite series, part 3 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 (Albert Hoitingh)
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Accelerate your Kubernetes clusters with Varnish Caching (Thijs Feryn)
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
A.C.S. Beck et al.
these conflicting design constraints in a sustainable fashion, and still allow huge
fabrication volumes. Each challenge is developed in detail throughout the next
chapters, providing an extensive literature review as well as setting a promising
research agenda for adaptability.
1.1 Performance Gap
The possibility of increasing the number of transistors inside an integrated circuit
over the years, according to Moore's Law, has been sustaining performance growth.
However, this law, as known today, will no longer hold in the near future. The reason
is very simple: the physical limits of silicon [11, 19]. Because of that, new
technologies that will completely or partially replace silicon are arising. However,
according to the ITRS roadmap [10], these technologies either have higher density
levels but are slower than traditional scaled CMOS, or entirely the opposite: new
devices can achieve higher speeds, but with a huge area and power overhead, even if
one considers future CMOS technologies.
Additionally, high-performance architectures such as the widespread superscalar machines
are reaching their limits. According to what is discussed in [3, 7], and [17], there
are no novel research results in such systems regarding performance improvements.
The advances in ILP (Instruction Level Parallelism) exploitation are stagnating:
considering Intel's family of processors, the overall efficiency (a comparison of
processors' performance running at the same clock frequency) has not significantly
increased since the Pentium Pro in 1995. The newest Intel architectures follow the
same trend: the Core2 microarchitecture has not presented a significant increase in
its IPC (Instructions per Cycle) rate, as demonstrated in [15].
Performance stagnation occurs because these architectures are approaching some
well-known limits of ILP [21]. Therefore, even small increases in ILP have become
extremely costly. One of the techniques used to increase ILP is the careful choice of
the dispatch width. However, the dispatch width has a serious impact on the overall
circuit area. For example, the register bank area grows cubically with the dispatch
width, considering a typical superscalar processor such as the MIPS R10000 [5].
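The cubic growth of register-bank area with dispatch width can be illustrated with a toy scaling model. The cubic exponent is the relationship quoted from [5]; the numbers are relative areas, not measurements (roughly, port count scales with the width, cell area grows with the square of the port count, and the number of registers also grows with the width):

```python
# Toy model: register-file area relative to a 1-wide machine, under
# the cubic assumption stated in the text. Purely illustrative.

def relative_register_file_area(dispatch_width):
    return dispatch_width ** 3

for w in (1, 2, 4, 8):
    print(f"{w}-wide dispatch: {relative_register_file_area(w)}x area")
```

Doubling the dispatch width costs roughly eight times the register-file area, which is why wide superscalar designs stop paying off quickly.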
In [1], the so-called "Mobile Supercomputers" are discussed: embedded devices that will
need to perform several computationally intensive tasks, such as real-time speech
recognition, cryptography, and augmented reality, besides conventional ones like word
processing and e-mail. Even considering desktop computer processors, new architectures
may not meet the requirements of future, more computationally demanding embedded
systems, giving rise to a performance gap.
1 Adaptability: The Key for Future Embedded Systems
1.2 Power and Energy Constraints
In addition to performance, one should take into account that potentially the
largest problem in embedded systems design is excessive power consumption.
Future embedded systems are expected not to exceed 75 mW, since batteries do not
have an equivalent of Moore's law [1]. Furthermore, leakage power is becoming more
important and, while a system is in standby mode, leakage will be the dominant
source of power consumption. Nowadays, in general-purpose microprocessors, leakage
power dissipation is between 20 and 30 W (considering a total power budget
of 100 W) [18].
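The shift of the power balance toward leakage in standby can be seen with a quick calculation. The 100 W budget and 25 W leakage come from the figures above; the 2 W standby dynamic power is an assumed value:

```python
# Leakage is roughly constant while dynamic power tracks activity, so
# leakage dominates once the chip idles. Standby figure is assumed.

def leakage_share(dynamic_w, leakage_w):
    return leakage_w / (dynamic_w + leakage_w)

active = leakage_share(75.0, 25.0)   # leakage is 25% of the active budget
standby = leakage_share(2.0, 25.0)   # leakage dominates in standby
print(f"active: {active:.0%}, standby: {standby:.0%}")
```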
One can observe that, in order to meet these power constraints, companies are
migrating to chip multiprocessors to take advantage of the extra available area, even
though there is still huge potential to speed up single-threaded software. In essence,
stagnation in the increase of clock frequency, excessive power consumption, and the
higher hardware cost of ILP exploitation, together with the foreseen slower
technologies, are new architectural challenges that must be dealt with.
1.3 Reuse of Existing Binary Code
Among the thousands of products launched by consumer electronics companies, one
can observe those which become a great success and those which completely fail.
The explanation is perhaps not just their quality, but also their standardization in
the industry and the final user's concern about how long the product being acquired
will be subject to updates.
The x86 architecture is one of the major examples. By today's standards, the x86 ISA
(Instruction Set Architecture) itself does not follow the latest trends in processor
architecture. It was developed at a time when memory was considered very expensive
and developers used to compete on who would implement more and different instructions
in their architectures. The x86 ISA is a typical example of a traditional CISC
machine. Nowadays, the newest x86-compatible architectures spend extra pipeline
stages plus a considerable area in control logic and microprogrammable ROM just to
decode these CISC instructions into RISC-like ones. This way, it is possible to
implement deep pipelining and all the other high-performance RISC techniques while
maintaining the x86 instruction set and, consequently, backward software
compatibility.
Although new instructions have been added to the original x86 instruction set, like
the SIMD, MMX, and SSE ones [6], targeting multimedia applications, there is still
support for the original 80 instructions implemented in the very first x86 processor.
This means that any software written for any x86 in the past, even software launched
at the end of the 1970s, can be executed on the latest Intel processor. This is one
of the keys to the success of this family: the possibility of reusing the existing
binary code without any kind of modification. This was one of the main reasons
why this product became the leader in its market segment. Intel could guarantee to
its consumers that their programs would not become obsolete for a long period of
time and that, even when changing to a faster system, they would still be able to
reuse and execute the same software.
Therefore, companies such as Intel and AMD keep implementing ever more
power-consuming superscalar techniques, pushing the increase of operating frequency
to the extreme. Branch predictors with higher accuracy, more advanced algorithms
for parallelism detection, and the use of Simultaneous Multithreading (SMT)
architectures, like Intel Hyperthreading [12], are some of the known strategies.
However, the basic principle used for high-performance architectures is still the
same: superscalarity. As embedded products are more and more based on a huge amount
of software development, the cost of sustaining legacy code will most likely have to
be taken into consideration when new platforms come to the market.
1.4 Yield and Manufacturing Costs
In [16], a discussion is made about the future of fabrication processes using new
technologies. According to the authors, standard cells as they are today will not
exist anymore. As the manufacturing interface is changing, regular fabrics will soon
become a necessity. How much regularity versus how much configurability is necessary
(as well as the granularity of these regular circuits) is still an open question.
Regularity can be understood as the replication of equal parts, or blocks, to compose
a whole. These blocks can be composed of gates, standard cells, or standard blocks,
to name a few. What is almost a consensus is the fact that the freedom of designers,
represented by the irregularity of the project, will be more expensive in the future.
Through the use of regular circuits, a design company can decrease costs, as well as
the possibility of manufacturing defects, since the reliability of printing the
geometries employed today at 65 nm and below is a big issue. In [4] it is claimed
that the main research focus when developing a new system may become reliability
instead of performance.
Nowadays, the amount of resources needed to create an ASIC design of moderately high
volume, high complexity and low power is very large. Some design companies can still
succeed because they have experienced designers, infrastructure and expertise. For
the very same reasons, however, there are companies that simply cannot afford it. For
these companies, a more regular fabric seems the best compromise for using an
advanced process. As an example, there were 11,000 ASIC design starts in 1997; this
number dropped to 1,400 in 2003 [20]. Mask cost appears to be the primary problem:
mask costs for a typical system-on-chip have gone from $800,000 at 65 nm to
$2.8 million at 28 nm [8]. To sustain the same number of ASIC designs, their costs
would need to return to tens of thousands of dollars.
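The amortization argument behind these figures can be made concrete with a short back-of-the-envelope calculation. The mask cost below is the $2.8 million figure cited above [8]; the per-die silicon cost and production volumes are illustrative assumptions:

```python
# Back-of-the-envelope sketch: amortizing a one-time mask cost over
# production volume. Mask cost is the 28 nm figure cited in the text [8];
# the $5 per-die cost and the volumes are assumptions for illustration.

def cost_per_unit(mask_cost, unit_cost, volume):
    """Effective per-chip cost once the fixed mask cost is amortized."""
    return unit_cost + mask_cost / volume

# $2.8M mask set at 28 nm, assumed $5 silicon cost per die
for volume in (10_000, 100_000, 1_000_000):
    c = cost_per_unit(2_800_000, 5.0, volume)
    print(f"{volume:>9} units -> ${c:,.2f} per chip")
```

At ten thousand units the mask cost dominates the per-chip price; only at very high volumes does it become negligible, which is exactly why low-volume designs are being priced out of the latest nodes.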
The costs of the lithography tool chain used to fabricate CMOS transistors are
another major expense. According to [18], the cost of lithography steppers increased
from $10 to $35 million over the past decade; as a result, a modern fabrication
facility costs between $2 and $3 billion. On the other hand, the cost per transistor
keeps decreasing: even though it is more expensive to build a circuit nowadays, far
more transistors are integrated onto a single die.
Moreover, design and verification costs are very likely growing at the same rate,
impacting the final cost even more. For the 0.8 μm technology, non-recurring
engineering (NRE) costs were only about $40,000. With each advance in IC technology,
NRE costs have increased dramatically: around $350,000 for a 0.18 μm design, and over
$1 million at 0.13 μm [20]. This trend is expected to continue at each subsequent
technology node, making it ever more difficult for designers to justify producing an
ASIC in current technologies.
The time it takes for a design to be manufactured at a fabrication facility and
returned to the designers in the form of an initial IC (the turnaround time) is also
increasing. Longer turnaround times lead to higher design costs and may imply a loss
of revenue if the design is late to market.
Because of all these issues, only a limited number of situations can justify
producing designs in the latest IC technology. Already in 2003, fewer than 1,000 out
of every 10,000 ASIC designs had volumes high enough to justify fabrication at
0.13 μm [20]. If design costs and times for producing a high-end IC keep growing,
only a few designs will justify production in the future. The problems of increasing
design costs and long turnaround times become even more severe under growing market
pressure: the time available for a company to introduce a product into the market is
shrinking, so the design of new ICs is increasingly driven by time-to-market
concerns.
Nevertheless, there will be a crossover point: if a company needs a more customized
silicon implementation, it will have to afford the mask and production costs.
Economics, however, is clearly pushing designers toward more regular structures that
can be manufactured in larger quantities. A regular fabric would address the mask
cost along with many other issues, such as printability, extraction, power integrity,
testing, and yield. Customization of a product, however, cannot rely solely on
software programming, mostly for energy efficiency reasons. Hence, some form of
hardware adaptability must be present to ensure that low-cost, mass-produced devices
can still be tuned to different application needs, without redesign and refabrication
costs.
1.5 Memory
Memories have been a concern since the early years of computing systems. Whether
due to size, manufacturing cost, bandwidth, reliability or energy consumption,
special care has always been taken when designing the memory structure of a
system. The historical and ever-growing gap between the access time of memories and
the throughput of processors has also driven the development of very advanced and
large cache memories, with complex allocation and replacement schemes. Moreover, the
growing integration capacity of manufacturing processes has further fueled the use of
large on-chip caches, which occupy a significant fraction of the silicon area in most
current IC designs. Memories thus represent a significant component of the overall
cost, performance and power consumption of most systems, creating the need for
careful design and dimensioning of the memory-related subsystems.
The development of memories for current embedded systems relies mainly on transistor
scaling: the same basic SRAM, DRAM and Flash cells have been used generation after
generation with smaller transistors. While this approach improves latency and
density, it also brings several new challenges. As leakage current does not decrease
at the same pace as density increases, static power dissipation is already a major
concern for memory architectures, prompting joint efforts at all design levels:
research at the device level tries to provide low-leakage cells [23], while research
at the architecture level tries to power off memory banks whenever possible [13, 24].
Moreover, the reduced critical charge increases soft error rates and places greater
pressure on efficient error correction techniques, especially for safety-critical
applications. Reduced feature sizes also increase process variability, leading to
losses in yield. Extensive research is thus required to sustain the performance and
energy improvements expected from the next generations of embedded systems without
jeopardizing yield and reliability.
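As a toy illustration of the single-error correction that such techniques provide, a Hamming(7,4) code can locate and fix one flipped bit per protected word. This sketch is illustrative only, not a production SECDED implementation:

```python
# Hamming(7,4): 3 parity bits protect 4 data bits, so any single bit flip
# (e.g. a soft error) can be located and corrected. Codeword layout is the
# classic one: positions 1,2,4 are parity, positions 3,5,6,7 are data.

def hamming74_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4        # parity over positions 1,3,5,7
    p2 = d1 ^ d3 ^ d4        # parity over positions 2,3,6,7
    p3 = d2 ^ d3 ^ d4        # parity over positions 4,5,6,7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_correct(c):
    """Recompute parity; a nonzero syndrome is the flipped bit's position."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3   # 1-based position of the error
    if syndrome:
        c[syndrome - 1] ^= 1          # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]]   # extract the data bits

word = [1, 0, 1, 1]
code = hamming74_encode(word)
code[4] ^= 1                          # inject a single-bit upset
assert hamming74_correct(code) == word  # corrected transparently
```

Real memory ECC works on wider words (e.g. 64 data bits with 8 check bits) and adds a second parity bit to also detect double errors, but the syndrome mechanism is the same.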
Another great challenge arises from the growing difficulties of CMOS scaling. New
memory technologies are expected to replace both the volatile and the non-volatile
fabrics used today. These technologies should provide low power consumption, low
access latency, high reliability, high density and, most importantly, ultra-low cost
per bit [10]. Since combining all the required features in a new technology is highly
demanding, several contenders have emerged as possible solutions, such as
ferroelectric, nanoelectromechanical, and organic cells [10]. Each memory type has
specific tasks within an MPSoC. Since memory is a large part of any system nowadays,
with obvious cost and energy implications, the challenge is to make its usage as
efficient as possible, possibly exploiting run-time or application-specific
information not available at design time.
1.6 Communication
With the increasing limitations in power consumption and the growing complexity
of improving the current levels of ILP exploitation, the trend towards embedding
multiple processing cores in a single chip has become a reality. While the use
of multiple processors provides more manageable resources, which can be turned
off independently to save power, for instance [9], it is crucial that they are able to
communicate among themselves in an efficient manner, in order to allow actual ac-
celeration with thread level parallelism. From the communication infrastructure one
7. 1 Adaptability: The Key for Future Embedded Systems 7
expects high bandwidth, low latency, low power consumption, low manufacturing
costs, and high reliability, with more or less relevance to each feature depending on
the application. Even though this may be a simple task for a small set of processors,
it becomes increasingly complex for a larger set of processors. Furthermore, aside
from processors, embedded SoCs include heterogeneous components, such as
dedicated accelerators and off-chip communication interfaces, which must also be
interconnected. The number of processing components expected to be integrated
within a single SoC is expected to grow quickly in the next years, exceeding
1,000 components in 2019 [10]. Thus, the need for highly scalable communication
systems is one the most prominent challenges found when creating a multi-
processor system-on-chip (MPSoC).
Since classical approaches such as busses or shared multi-port memories scale poorly,
new communication techniques and topologies are required to meet the demands of new
MPSoCs with many cores and stringent area and power limitations. Among such
techniques, networks-on-chip (NoCs) have received extensive attention over the past
years, since they bring high scalability and high bandwidth as significant assets
[2]. With the rise of NoCs as a promising interconnection for MPSoCs, several related
issues must be addressed, such as the optimum memory organization, routing mechanism,
and thread scheduling and placement. Additionally, as all these design choices are
highly application-dependent, there is great room for adaptability in the
communication infrastructure as well, not only for NoCs but for any scheme covering
the communication fabric.
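As an illustrative sketch of one routing mechanism of the kind mentioned above, deterministic XY routing on a 2D mesh moves a packet along the X dimension first, then along Y. Real NoC routers add virtual channels, arbitration and flow control, all omitted here:

```python
# Deterministic XY routing on a 2D-mesh NoC: route along X first, then Y.
# Deadlock-free on a mesh because the dimension order forbids cyclic
# channel dependencies. Cores are identified by (x, y) grid coordinates.

def xy_route(src, dst):
    """Return the hop-by-hop path of router coordinates from src to dst."""
    x, y = src
    path = [src]
    while x != dst[0]:                 # travel along the X dimension first
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:                 # then along the Y dimension
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

# Route a packet from core (0, 0) to core (2, 3) on a mesh
print(xy_route((0, 0), (2, 3)))
# -> [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (2, 3)]
```

The path length is the Manhattan distance between the cores, which grows only with mesh diameter rather than with the total core count — the scalability asset that makes NoCs attractive over a shared bus.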
1.7 Fault Tolerance
Fault tolerance has gained more attention in the past years due to the intrinsic
vulnerability imposed by deep-submicron technologies. As one gets closer to the
physical limits of current CMOS technology, the impact of physical effects on system
reliability is magnified. This is a consequence of the susceptibility of a very
fragile circuit exposed to many different types of extreme conditions, such as
elevated temperatures and voltages, radioactive particles coming from outer space, or
impurities present in the materials used for packaging or manufacturing the circuit.
Regardless of the agent that causes the fault, predictions about future nanoscale
circuits indicate a major need for fault tolerance solutions to cope with the
expected high fault rates [22].
Fault-tolerant solutions have existed since the 1950s, first for operation in the
hostile and remote environments of military and space missions, and later to meet the
demand for highly reliable mission-critical systems, such as banking, car braking,
airplanes and telecommunications [14]. The main limitation of these solutions is that
they are designed to prevent a fault from affecting the system at any cost, since any
failure could have catastrophic consequences. For this reason, in many cases there is
no concern with the area, power or performance overhead that the fault-tolerant
solution may add to the system.
In this sense, the main challenge is to enable the development of high-performance
embedded systems, considering all the aspects mentioned before, such as power and
energy consumption, applications with heterogeneous behavior, and memory, while still
providing a highly reliable system that can cope with a large assortment of faults.
This ever-increasing need for fault-tolerant, high-performance, low-cost, low-energy
systems leads to an essential question: what is the best fault-tolerant approach for
embedded systems, robust enough to handle high fault rates while causing low impact
on all the other aspects of embedded system design? The answer varies with the
application, the type of task and the underlying hardware platform. Once again, the
key to solving this problem in its different instances lies in adaptive techniques
that reduce cost and sustain performance.
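One classic point in this trade-off space, triple modular redundancy (TMR), can be sketched in a few lines: three replicas plus a majority voter mask any single faulty result, at roughly three times the area and power. This is an illustrative sketch, not tied to any specific platform discussed in the book:

```python
# Triple modular redundancy (TMR): run three replicas of the same
# computation and vote on the result. A single faulty replica is masked;
# the cost is ~3x area/power plus the voter, illustrating why blanket
# fault tolerance is too expensive for many embedded designs.

def tmr(replicas, *args):
    """Execute three replicas and return the majority result."""
    results = [f(*args) for f in replicas]
    for r in results:
        if results.count(r) >= 2:     # majority wins
            return r
    raise RuntimeError("no majority: more than one replica failed")

ok = lambda x: x * x
faulty = lambda x: x * x + 1          # injected fault in one replica

assert tmr([ok, ok, faulty], 4) == 16  # the single fault is masked
```

Adaptive approaches aim to pay this kind of overhead only where and when the fault rate or the criticality of the task demands it, instead of uniformly across the whole system.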
1.8 Software Engineering and Development for Adaptive
Platforms
Adaptive hardware imposes real challenges on software engineering, from requirement
elicitation to the software development phases. The difficulties arise from the high
flexibility and large design space of adaptive hardware platforms. Besides the main
behavior the software implements, i.e. the functional requirements, an adaptive
hardware platform unveils a wide range of non-functional requirements that must be
met by the software under execution and supported by the software engineering
process. Non-functional requirements are a burden to software development even today:
while it is reasonably well known how to control some of the classical ones, such as
performance or latency, the proper handling of those specifically important to the
embedded domain, such as energy and power, is still an open research problem.
Embedded software has changed radically within just a few years. Once highly
specialized to perform just a few tasks at a time, such as decoding voice or managing
a simple phone book in the case of mobile phones, the software found today in any
mainstream smartphone comprises several interconnected APIs and frameworks working
together to deliver a completely different user experience. Embedded software is now
multitasking and runs in parallel, since even mobile devices contain a distinct set
of microprocessors, each dedicated to a certain task, such as speech processing or
graphics. These distinct architectures exist, and are necessary, to save energy:
wasting computational and energy resources is a luxury that resource-constrained
devices cannot afford.
However, this intricate and heterogeneous hardware, which supports more than one
instruction set architecture (ISA), was designed to be resource-efficient, not to
ease software design and production. In addition, since there are potentially many
computing nodes, parallel software designed to efficiently occupy the heterogeneous
hardware is also mandatory to save energy. Needless to say how difficult parallel
software design is. If the software is not well designed to take advantage of and
efficiently use all the available ISAs, the software designer will probably miss the
optimal point of resource utilization, yielding energy-hungry applications. One can
easily imagine several such applications running concurrently, coming from unknown
and distinct software publishers and implementing unforeseen functionalities, to get
the whole picture of how challenging software design and development for these
devices can be.
If adaptive hardware platforms are meant to become programmable commodity devices in
the near future, their software engineering must transparently handle their intrinsic
complexity, removing this burden from the code. In the adaptive embedded systems
arena, software will continue to be the actual source of differentiation between
competing products and of innovation for consumer electronics companies. A whole new
environment of programming languages, software development tools, and compilers may
be necessary to support the development of adaptive software or, at least, a deep
rethink of the existing technologies. Industry uses a myriad of programming and
modeling languages, versioning systems, and software design and development tools, to
name a few key technologies, to keep delivering innovation in its software products.
The big question is how to make those technologies scale in terms of productivity,
reliability, and complexity for the new and exciting software engineering scenario
created by adaptive systems.
1.9 This Book
Industry faces a great number of challenges, at different levels, when designing
embedded systems: designers need to boost performance while keeping energy
consumption as low as possible, they must be able to reuse existing software code,
and at the same time they need to take advantage of the extra logic available on the
chip, represented by multiple processors working together. In this book we present
and discuss several strategies to achieve these conflicting and interrelated goals
through the use of adaptability. We start by discussing the main challenges designers
must handle today and in the future. We then present different hardware solutions
that can cope with some of the aforementioned problems: reconfigurable systems;
dynamic optimization techniques, such as binary translation and trace reuse; new
memory architectures; homogeneous and heterogeneous multiprocessor systems and
MPSoCs; communication issues and NoCs; fault tolerance against fabrication defects
and soft errors; and, finally, how specialized software can improve this new scenario
for embedded systems design, and how this new kind of software must be designed and
programmed.
In Chap. 2, we show, with the help of examples, how the behavior of even
a single thread execution is heterogeneous, and how difficult it is to distribute
heterogeneous tasks among the components in a SoC environment, reinforcing the
need for adaptability.
Chapter 3 gives an overview of adaptive and reconfigurable systems and their basic
functioning. It starts with a classification of reconfigurable architectures,
including coupling, granularity, etc. Several reconfigurable systems are then
presented and, for the most widely used ones, the chapter discusses their advantages
and drawbacks.
Chapter 4 discusses the importance of memory hierarchies in modern embedded systems.
The importance of carefully dimensioning the size and associativity of cache memories
is shown through their impact on access latency and energy consumption. Moreover,
simple benchmark applications show that the optimum memory architecture varies
greatly with software behavior: there is no universal memory hierarchy that delivers
maximum performance with minimum energy consumption for every application. This
property creates room for adaptable memory architectures that aim to get as close as
possible to the optimum configuration for the application at hand. The final part of
Chap. 4 discusses relevant works that propose such architectures.
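The application dependence just described can be illustrated with a toy cache model; the traces and configurations below are assumed for illustration only. A conflict-heavy strided pattern thrashes a direct-mapped cache but fits comfortably in a set-associative cache of the same capacity, while a streaming pattern behaves identically in both, in which case the simpler, lower-energy direct-mapped design is preferable:

```python
# Toy LRU set-associative cache model: same total capacity (64 blocks),
# two organizations, two synthetic access patterns. Illustrative only.
from collections import OrderedDict

def hit_rate(trace, num_sets, ways, block=16):
    """Simulate an LRU set-associative cache; return the fraction of hits."""
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = 0
    for addr in trace:
        tag, idx = divmod(addr // block, num_sets)
        s = sets[idx]
        if tag in s:
            hits += 1
            s.move_to_end(tag)          # refresh LRU order
        else:
            if len(s) == ways:
                s.popitem(last=False)   # evict the least recently used
            s[tag] = True
    return hits / len(trace)

sequential = list(range(0, 1024, 16)) * 4      # streaming working set
strided = [i * 1024 for i in range(8)] * 64    # conflict-heavy pattern

for name, trace in (("sequential", sequential), ("strided", strided)):
    dm = hit_rate(trace, num_sets=64, ways=1)  # direct-mapped
    sa = hit_rate(trace, num_sets=8, ways=8)   # 8-way set-associative
    print(f"{name:>10}: direct-mapped {dm:.2f}, 8-way {sa:.2f}")
```

The strided trace maps every access to the same set, so the direct-mapped cache misses on every reference while the 8-way cache holds the whole working set; on the sequential trace both configurations achieve the same hit rate, and the cheaper one wins.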
In Chap. 5, networks-on-chip are presented, along with several adaptive techniques
that can be applied to them. Chapter 6 shows how dynamic techniques, such as binary
translation and trace reuse, work to sustain adaptability while maintaining binary
compatibility. We also discuss architectures that present some level of dynamic
adaptability, the price to pay for such adaptability, and the kinds of applications
it suits well.
Chapter 7, on fault tolerance, starts with a brief review of some of the most
commonly used concepts in this subject, such as reliability, maintainability, and
dependability, and discusses their impact on yield rates and manufacturing costs.
Several techniques that employ fault tolerance at some level are then demonstrated,
with a critical analysis.
In Chap. 8 we discuss how important the communication infrastructure is for future
embedded systems, which will execute increasingly heterogeneous applications, and how
the communication pattern might change aggressively from application to application,
even with the same set of heterogeneous cores.
Chapter 9 puts adaptive embedded systems at the center of the software engineering
process, making them programmable devices. This chapter presents techniques spanning
software inception, functional and non-functional requirements elicitation,
programming language paradigms, and automatic design space exploration. Adaptive
embedded systems impose harsh burdens on software design and development, requiring
us to devise novel software engineering techniques and methodologies. At the end of
the chapter, a proposed software design flow is presented, which helps to connect the
techniques and methods discussed in the previous chapters and to lay the
technological grounds for a research agenda for adaptive embedded software and
systems.
References
1. Austin, T., Blaauw, D., Mahlke, S., Mudge, T., Chakrabarti, C., Wolf, W.: Mobile supercom-
puters. Computer 37(5), 81–83 (2004). doi:http://dx.doi.org/10.1109/MC.2004.1297253
2. Bjerregaard, T., Mahadevan, S.: A survey of research and practices of network-on-chip. ACM
Comput. Surv. 38(1) (2006). doi:http://doi.acm.org/10.1145/1132952.1132953.
3. Borkar, S., Chien, A.A.: The future of microprocessors. Commun. ACM 54(5), 67–77 (2011).
doi:10.1145/1941487.1941507. http://doi.acm.org/10.1145/1941487.1941507
4. Burger, D., Goodman, J.R.: Billion-transistor architectures: there and back again. Computer
37(3), 22–28 (2004). doi:http://dx.doi.org/10.1109/MC.2004.1273999
5. Burns, J., Gaudiot, J.L.: SMT layout overhead and scalability. IEEE Trans. Parallel Distrib. Syst.
13(2), 142–155 (2002). doi:http://dx.doi.org/10.1109/71.983942
6. Conte, G., Tommesani, S., Zanichelli, F.: The long and winding road to high-performance
image processing with MMX/SSE. In: CAMP ’00: Proceedings of the Fifth IEEE International
Workshop on Computer Architectures for Machine Perception (CAMP’00), p. 302. IEEE
Computer Society, Washington, DC (2000)
7. Flynn, M.J., Hung, P.: Microprocessor design issues: Thoughts on the road ahead. IEEE Micro.
25(3), 16–31 (2005). doi:http://dx.doi.org/10.1109/MM.2005.56
8. Fujimura, A.: All lithography roads ahead lead to more e-beam innovation. In: Future Fab. Int.
(37), http://www.future-fab.com (2011)
9. Isci, C., Buyuktosunoglu, A., Cher, C., Bose, P., Martonosi, M.: An analysis of efficient
multi-core global power management policies: maximizing performance for a given power
budget. In: Proceedings of the 39th annual IEEE/ACM International Symposium on Mi-
croarchitecture, MICRO 39, pp. 347–358. IEEE Computer Society, Washington, DC (2006).
doi:10.1109/MICRO.2006.8
10. ITRS: ITRS 2011 Roadmap. Tech. rep., International Technology Roadmap for Semiconduc-
tors (2011)
11. Kim, N.S., Austin, T., Blaauw, D., Mudge, T., Flautner, K., Hu, J.S., Irwin, M.J., Kandemir,
M., Narayanan, V.: Leakage current: Moore’s law meets static power. Computer 36(12), 68–75
(2003). doi:http://dx.doi.org/10.1109/MC.2003.1250885
12. Koufaty, D., Marr, D.T.: Hyperthreading technology in the netburst microarchitecture. IEEE
Micro. 23(2), 56–65 (2003)
13. Powell, M., Yang, S.H., Falsafi, B., Roy, K., Vijaykumar, T.N.: Gated-Vdd: a circuit technique
to reduce leakage in deep-submicron cache memories. In: Proceedings of the 2000 Interna-
tional Symposium on Low Power Electronics and Design, ISLPED ’00, pp. 90–95. ACM,
New York (2000). doi:10.1145/344166.344526. http://doi.acm.org/10.1145/344166.344526
14. Pradhan, D.K.: Fault-Tolerant Computer System Design. Prentice Hall, Upper Saddle River
(1996)
15. Prakash, T.K., Peng, L.: Performance characterization of SPEC CPU2006 benchmarks on Intel
Core 2 Duo processor. ISAST Trans. Comput. Softw. Eng. 2(1), 36–41 (2008)
16. Rutenbar, R.A., Baron, M., Daniel, T., Jayaraman, R., Or-Bach, Z., Rose, J., Sechen, C.:
(When) will FPGAs kill ASICs? (panel session). In: DAC ’01: Proceedings of the 38th Annual
Design Automation Conference, pp. 321–322. ACM, New York (2001). doi:http://doi.acm.
org/10.1145/378239.378499
17. Sima, D.: Decisive aspects in the evolution of microprocessors. Proc. IEEE 92(12), 1896–1926
(2004)
18. Thompson, S., Parthasarathy, S.: Moore’s law: The future of Si microelectronics. Mater. Today
9(6), 20–25 (2006)
19. Thompson, S.E., Chau, R.S., Ghani, T., Mistry, K., Tyagi, S., Bohr, M.T.: In search of “forever,”
continued transistor scaling one new material at a time. IEEE Trans. Semicond. Manuf. 18(1),
26–36 (2005). doi:10.1109/TSM.2004.841816. http://dx.doi.org/10.1109/TSM.2004.841816
20. Vahid, F., Lysecky, R.L., Zhang, C., Stitt, G.: Highly configurable platforms for embedded
computing systems. Microelectron. J. 34(11), 1025–1029 (2003)
21. Wall, D.W.: Limits of instruction-level parallelism. In: ASPLOS-IV: Proceedings of the
Fourth International Conference on Architectural Support for Programming Languages and
Operating Systems, pp. 176–188. ACM, New York (1991). doi:http://doi.acm.org/10.1145/
106972.106991
22. White, M., Chen, Y.: Scaled CMOS technology reliability users guide. Tech. rep., Jet Propulsion
Laboratory, National Aeronautics and Space Administration (2008)
23. Yang, S., et al.: 28 nm metal-gate high-k CMOS SoC technology for high-performance mobile
applications. In: Custom Integrated Circuits Conference (CICC), 2011 IEEE, pp. 1–5 (2011).
doi:10.1109/CICC.2011.6055355
24. Zhang, C., Vahid, F., Najjar, W.: A highly configurable cache architecture for embedded
systems. In: Proceedings of the 30th Annual International Symposium on Computer Archi-
tecture, ISCA ’03, pp. 136–146. ACM, New York (2003). doi:10.1145/859618.859635. http://
doi.acm.org/10.1145/859618.859635