Embedded System

  • http://www.jpsco.com/site.nsf/key/jps_y2k_embed_system A general-purpose definition of embedded systems is that they are devices used to control, monitor, or assist the operation of equipment, machinery, or plant. 'Embedded' reflects the fact that they are an integral part of the system. In many cases their very embedded nature may be such that their presence is far from obvious to the casual observer, and even the more technically skilled might need to examine the operation of a piece of equipment for some time before being able to conclude that an embedded control device was involved in its functioning. At the other extreme, a general-purpose computer may be used to control the operation of a large, complex processing plant, and its presence will be obvious. All embedded systems are computers. Some of them, however, are very simple devices compared with a PC. The simplest devices consist of a single microprocessor (often called a "chip"), which may itself be packaged with other chips in a hybrid or Application-Specific Integrated Circuit (ASIC). Its input comes from a detector or sensor, and its output goes to a switch or actuator which (for example) may start or stop the operation of a machine or, by operating a valve, may control the flow of fuel to an engine. The very simplest embedded systems are capable of performing only a single function or set of functions to meet a single predetermined purpose. In more complex systems, the functioning of the embedded system is determined by an application program which enables the embedded system to do things for a specific application. The ability to have programs means that the same system can be used for a variety of different purposes. In some cases a microprocessor may be designed in such a way that application software for a particular purpose can be added to the basic software in a second process, after which it is not possible to make further changes: this is sometimes referred to as firmware.
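The "simplest" class of system described above (sensor in, actuator out, one fixed function) can be sketched as a control loop. This is a minimal illustration; the temperature threshold and valve commands are hypothetical, not taken from the source:

```python
# Minimal sketch of a single-function embedded controller:
# read a sensor, compare against a fixed threshold, drive an actuator.
# The 90-degree trip point and valve commands are illustrative only.

OVERHEAT_LIMIT_C = 90.0  # hypothetical trip point

def control_step(temperature_c: float) -> str:
    """One iteration of the fixed control loop: sensor -> decision -> actuator."""
    if temperature_c >= OVERHEAT_LIMIT_C:
        return "CLOSE_FUEL_VALVE"   # stop fuel flow to the engine
    return "OPEN_FUEL_VALVE"        # normal operation

# In a real device this would run forever on a bare microprocessor:
#   while True: actuate(control_step(read_sensor()))
for reading in (25.0, 89.9, 95.2):
    print(reading, "->", control_step(reading))
```

The whole device does exactly this one job; there is no application program to load, which is what distinguishes it from the programmable systems described next.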
  • [http://www-2.cs.cmu.edu/~koopman/iccd96/iccd96.html] An embedded system encompasses the CPU as well as many other resources.
  • In addition to the emphasis on interaction with the external world, embedded systems also provide functionality specific to their applications. Instead of executing spreadsheets, word processing and engineering analysis, embedded systems typically execute control laws, finite state machines, and signal processing algorithms. They must often detect and react to faults in both the computing and surrounding electromechanical systems, and must manipulate application-specific user interface devices.
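A finite state machine of the kind mentioned here can be sketched in a few lines as a transition table; the states and events below (a hypothetical motor controller) are invented for illustration:

```python
# Sketch of a finite state machine as a transition table:
# (state, event) -> next state.  States and events are hypothetical.
TRANSITIONS = {
    ("IDLE",    "start"): "RUNNING",
    ("RUNNING", "fault"): "ERROR",
    ("RUNNING", "stop"):  "IDLE",
    ("ERROR",   "reset"): "IDLE",
}

def step(state: str, event: str) -> str:
    # Unknown events leave the state unchanged (a common defensive choice
    # in embedded code, rather than crashing on unexpected input).
    return TRANSITIONS.get((state, event), state)

state = "IDLE"
for event in ("start", "fault", "reset"):
    state = step(state, event)
print(state)  # start -> fault -> reset brings the machine back to IDLE
```

The fault/reset arc is exactly the "detect and react to faults" behavior the note describes: the fault is an explicit state, not an afterthought.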
  • http://www.engr.uvic.ca/~seng440/Introduction.pdf
  • [http://www-2.cs.cmu.edu/~koopman/iccd96/iccd96.html] Embedded computers typically have tight constraints on both functionality and implementation. In particular, they must guarantee real-time operation reactive to external events, conform to size and weight limits, budget power and cooling consumption, satisfy safety and reliability requirements, and meet tight cost targets. Predicting the worst case may be difficult on complicated architectures, leading to overly pessimistic estimates erring on the side of caution. The Signal Processing and Mission-Critical example systems have a significant requirement for real-time operation in order to meet external I/O and control-stability requirements. These events may be periodic, in which case scheduling of events to guarantee performance may be possible. On the other hand, many events may be aperiodic, in which case the maximum event arrival rate must be estimated in order to accommodate worst-case situations. Most embedded systems have a significant reactive component.
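For the periodic case mentioned above, a classical way to check that scheduling can guarantee performance is the Liu-and-Layland rate-monotonic utilization bound. A sketch, with an invented task set (the bound itself is standard; the numbers are not from the source):

```python
# Rate-monotonic schedulability test: n periodic tasks, each with
# worst-case execution time C_i and period T_i, are schedulable under
# rate-monotonic priorities if sum(C_i/T_i) <= n * (2**(1/n) - 1).

def rm_schedulable(tasks):
    """tasks: list of (C, T) pairs; returns (utilization, bound, ok)."""
    n = len(tasks)
    u = sum(c / t for c, t in tasks)
    bound = n * (2 ** (1.0 / n) - 1)
    return u, bound, u <= bound

# Three hypothetical periodic tasks (execution time, period), e.g. in ms:
u, bound, ok = rm_schedulable([(1, 10), (2, 20), (3, 40)])
print(f"U = {u:.3f}, bound = {bound:.3f}, schedulable: {ok}")
```

For aperiodic events, as the note says, there is no such closed-form guarantee; the designer must bound the maximum arrival rate and treat it as a pessimistic periodic load.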
  • Some systems have obvious risks associated with failure. In mission-critical applications such as aircraft flight control, severe personal injury or equipment damage could result from a failure of the embedded computer. Traditionally, such systems have employed multiply-redundant computers or distributed consensus protocols in order to ensure continued operation after an equipment failure. However, many embedded systems that could cause personal or property damage cannot tolerate the added cost of redundancy in hardware or processing capacity needed for traditional fault tolerance techniques. This vulnerability is often resolved at the system level, as discussed later.
  • Even though embedded computers have stringent requirements, cost is almost always an issue (even, increasingly, for military systems). Although designers of systems large and small may talk about the importance of cost with equal urgency, their sensitivity to cost changes can vary dramatically. A reason for this may be that the effect of computer costs on profitability is more a function of the proportion of cost changes compared to the total system cost than compared to the digital electronics cost alone. For example, in the Signal Processing system, cost sensitivity can be estimated at approximately $1000 (i.e., a designer can make decisions at the $1000 level without undue management scrutiny). However, in the Small system, decisions increasing costs by even a few cents attract management attention, due to the huge multiplier of production quantity combined with the higher percentage of total system cost they represent.
  • In order to be competitive in the marketplace, embedded systems require that the designers take into account the entire system when making design decisions.
  • End-product utility: the utility of the end product is the goal when designing an embedded system, not the capability of the embedded computer itself. Embedded products are typically sold on the basis of capabilities, features, and system cost rather than on which CPU is used in them or the cost/performance of that CPU. One way of looking at an embedded system is that the mechanisms and their associated I/O are largely defined by the application. Then, software is used to coordinate the mechanisms and define their functionality, often at the level of control system equations or finite state machines. Finally, computer hardware is made available as infrastructure to execute the software and interface it to the external world. While this may not be an exciting way for a hardware engineer to look at things, it does emphasize that the total functionality delivered by the system is what is paramount. System safety & reliability: although software doesn't normally "break" in the sense that hardware does, it may be so complex that a set of unexpected circumstances can cause software failures leading to unsafe situations. This is a difficult problem that will take many years to address, and it may not be properly appreciated by the non-computer engineers and managers involved in system design decisions.
  • Controlling physical systems: The usual reason for embedding a computer is to interact with the environment, often by monitoring and controlling external machinery. In order to do this, analog inputs and outputs must be transformed to and from digital signal levels. Additionally, significant current loads may need to be switched in order to operate motors, light fixtures, and other actuators. All these requirements can lead to a large computer circuit board dominated by non-digital components. In some systems, "smart" sensors and actuators (that contain their own analog interfaces, power switches, and small CPUs) may be used to off-load interface hardware from the central embedded computer. This brings the additional advantage of reducing the amount of system wiring and the number of connector contacts by employing an embedded network rather than a bundle of analog wires. However, this change brings with it an additional computer design problem of partitioning the computations among distributed computers in the face of an inexpensive network with modest bandwidth capabilities. Power management: A less pervasive system-level issue, but one that is still common, is a need for power management to either minimize heat production or conserve battery power. While the push to laptop computing has produced "low-power" variants of popular CPUs, significantly lower power is needed in order to run from inexpensive batteries for 30 days in some applications, and up to 5 years in others.
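The analog-to-digital transformation mentioned above usually reduces to a linear scaling from raw converter counts to engineering units. A sketch under invented assumptions (a 10-bit ADC over 0–5 V feeding a 10 mV/°C temperature sensor; neither is from the source):

```python
# Convert a raw ADC reading to a physical quantity.  Assumes a
# hypothetical 10-bit ADC (codes 0..1023) spanning 0..5 V, connected to
# a temperature sensor that outputs 10 mV per degree Celsius.

ADC_MAX = 1023
V_REF = 5.0

def adc_to_volts(raw: int) -> float:
    return (raw / ADC_MAX) * V_REF

def volts_to_celsius(v: float) -> float:
    return v / 0.010  # 10 mV per degree C

raw = 102                       # e.g. a raw code read from the converter
print(f"{volts_to_celsius(adc_to_volts(raw)):.1f} C")
```

A "smart" sensor in the note's sense would run this conversion on its own small CPU and put the engineering-unit value on the embedded network, instead of shipping the analog signal to the central board.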
  • First a need or opportunity to deploy new technology is identified. Then a product concept is developed. This is followed by concurrent product and manufacturing process design, production, and deployment. But in many embedded systems, the designer must see past deployment and take into account support, maintenance, upgrades, and system retirement issues in order to actually create a profitable design. Some of the issues affecting this life-cycle profitability are discussed below.
  • Windows CE requires the underlying CPU to use a flat 32-bit address space, to support kernel- and user-mode operation, to support virtual memory using an MMU, and to be little-endian.
  • Memory is always a precious resource on embedded systems. Embedded operating systems therefore make a large effort to reduce their memory footprint.
  • Embedded operating systems use either a microkernel or a modular architecture so that they can easily be tailored to fit different application requirements. The QNX microkernel implements four services: 1) interprocess communication; 2) low-level network communication; 3) process scheduling; 4) interrupt dispatching. The OS service processes are optional, and users can choose which ones are needed for their applications. This kind of design makes QNX flexible for different kinds of applications which require different kinds of OS services. Embedded Linux takes the Linux kernel and extracts the necessary modules as needed. Within the kernel layer, Linux is composed of five major subsystems: the process scheduler (sched), the memory manager (mm), the virtual file system (vfs), the network interface (net), and the interprocess communication (ipc). Conceptually, the clustering of these components composes the Linux kernel, and each subsystem is an independent component of it. The operating system architecture of Windows CE is hierarchical. At the bottom lie the device drivers and the OAL (OEM Adaptation Layer); they are implemented by the OEM when porting Windows CE. Above them lie the Graphics, Windowing and Events Subsystem (GWES), the kernel itself, and the communication stacks; this layer is implemented by Microsoft. The Remote API (RAPI) capability is built on top of the communication functionality. On top of the kernel lie the database and file system. These are accessed by RAPI calls, and are made available to applications via the Win32 interface. Applications execute in their own address space and interact with the rest of Windows CE via the Win32 system call interface.
  • [Tornado Training Workshop Book 1, April 1998]
  • Memory mapping results in very fast data transfers between cooperating processes. It can be used to dramatically enhance real-time performance.
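A sketch of the idea using POSIX-style memory mapping (here via Python's `mmap` module rather than any RTOS-specific API): two independent mappings of the same underlying region behave like shared memory, so a write through one is visible through the other with no copy through read/write system calls:

```python
# Demonstrate memory-mapped data transfer: two mappings of the same
# file-backed region act as shared memory between cooperating users.
import mmap
import os
import tempfile

SIZE = 4096
fd, path = tempfile.mkstemp()
os.ftruncate(fd, SIZE)          # give the backing file a real size

writer = mmap.mmap(fd, SIZE)    # "producer" mapping
reader = mmap.mmap(fd, SIZE)    # "consumer" mapping of the same region

writer[0:5] = b"hello"          # plain memory store, no write() call
seen = bytes(reader[0:5])       # the second mapping sees the data
print(seen)

writer.close(); reader.close(); os.close(fd); os.remove(path)
```

In a real embedded OS the two mappings would belong to two separate processes; the mechanism (shared pages, no kernel copy per transfer) is what makes it fast.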
  • [Tornado Training Workshop Book 1, April 1998] A set of APIs to support these methods.
  • Most modern conventional OSes use paged virtual memory. The virtual memory page is the unit of protection and memory allocation. The use of virtual memory is also closely related to multiprogramming: strictly speaking, each process operates in its own environment and address space; each process has its own virtual memory mapping, and thus its own page table, and upon a process switch, different page tables are used. The use of processes and memory protection is very important to embedded systems. If a single address space is used for all applications, a software bug in a single application can corrupt memory and lead to the failure of the system. The disadvantage, however, is that memory protection requires the CPU to support an MMU, which results in a more complex CPU, and the overhead of context switching between processes can be high. Unlike conventional OSes, most embedded OSes target simple CPUs which often don't have an MMU, and have limited memory and little or no disk, so they often don't use virtual memory, or use restricted virtual memory, as we'll see below in the examples of WinCE and uClinux.
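The page-based translation described here is just arithmetic on the address bits. A sketch assuming 4 KB pages and a toy page table (the VPN-to-frame mappings are invented):

```python
# Split a virtual address into (virtual page number, offset) and
# translate it through a toy page table.  4 KB pages assumed; the
# VPN -> physical frame mappings below are invented for illustration.
PAGE_SIZE = 4096  # 2**12, so the low 12 bits are the page offset

page_table = {0: 7, 1: 3, 2: 9}  # hypothetical VPN -> frame number

def translate(vaddr: int) -> int:
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn not in page_table:
        # An unmapped page is how the MMU catches a stray pointer
        # before it corrupts another process's memory.
        raise MemoryError(f"page fault at {vaddr:#x}")
    return page_table[vpn] * PAGE_SIZE + offset

print(hex(translate(0x1234)))  # VPN 1, offset 0x234 -> frame 3
```

A process switch amounts to swapping `page_table` for another process's table; on a CPU without an MMU there is no such lookup, which is exactly why a single bug can corrupt the whole system.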
  • Network support is important to embedded systems because it makes it easy for them to communicate with the outside world and easy to upgrade them. Almost all embedded operating systems support network facilities in their kernel.
  • Application requirement impact: 1. The design of an embedded operating system's scheduling, process management, and protection system needs to take into account the attributes of the embedded application (e.g. real-time operation, relatively few coexisting processes or even a single task, a fixed application). 2. The scalability and flexibility of the embedded application is also an important issue that the system designer must consider in choosing the system architecture. Hardware impact: hardware platform features (e.g. lack of hard disks, small memory, no MMU support, low power consumption) force the designer to pay special attention to the efficient usage of system resources.
  • http://www.windriver.com/products/html/virtual_cpu.html
  • [http://www.tm.tue.nl/race/ce/swhwco.html] Software/Hardware Codesign. Software/hardware codesign can be defined as the simultaneous design of both hardware and software to implement a desired function. Successful codesign goes hand in hand with coverification, which is the simultaneous verification of both software and hardware and of the extent to which the combination fits the desired function. In today's world it is necessary to incorporate codesign into the early system design phase and move software-hardware integration downstream, because traditional methodologies are no longer effective. Today we are trying to put in place the foundation to incorporate codesign and coverification into normal product development, especially when products incorporating embedded systems are involved. There are many traditional barriers to effective codesign and coverification, such as organizational structures and old-fashioned paradigms of other companies in the same market, or concepts developed in the past that worked well back then. Suppliers often lack an integrated view of the design process, too. What we need are tools which better estimate the constraints between the boundaries before iterating through a difficult flow. By using simulation models we can find conflicts between top-down constraints, which come from design requirements, and bottom-up constraints, which come from physical data. Bottom-up constraints for software can only be realized in a hardware context, because the abstraction level of software is higher than that of the hardware on which it is executed. It is often the case that the hardware is already available (which is 'physical data'), so it can't be changed by software/hardware codesign. Only the software can be changed, and it should be fitted to this physical data. Therefore a certain modeling strategy is necessary to cover the existing hardware. This modeling isn't easy, and it will never be perfect, because reality is too complex to capture in a perfect model.
In that respect it may seem easier to design both hardware and software, because it is often easier to design two things that have to work together than to design one thing and fit it around another. But if both hardware and software have to be designed, powerful verification is essential, because you are designing two different 'products' which interact with each other, and nothing is 'physical' in either of them. Of course, different techniques have been developed to verify combined hardware-software systems, but each of them has its own limitations. It is possible to run code on models of hardware emulated through dedicated programmable hardware, offering near real-time speed for code execution. Unfortunately, real-time interaction with other hardware and external environments is sometimes required, and then full-speed code execution isn't supported. Hardware-software codesign has existed for several decades. To ensure system capability, designers had to face the realities of combining digital computing with software algorithms, and to verify the interaction between these two prototypes, hardware had to be built. But in the '90s this no longer suffices, because codesign is turning from a good idea into an economic necessity. Predictions for the future point to greater embedded software content in hardware systems than ever before, so something has to be done to speed up and improve traditional software-hardware codesign. Developments in this area include top-down system-level codesign and cosynthesis work at universities, and major advances made by EDA (Electronic Design Automation) companies in high-speed emulation systems. Codesign focuses on the areas of system specification, architectural design, hardware-software partitioning, and iteration between hardware and software as the design progresses. Finally, codesign is complemented by hardware-software integration and testing. Design reuse is being applied more often, too.
Previous- and current-generation ICs are finding their way into new designs as embedded cores in a mix-and-match fashion. This requires greater convergence of methodologies for codesign and coverification and places high demands on system-on-a-chip density, which is why this concept remained elusive for many years, until recently. In the future, the need for tools to estimate the impact of design changes earlier in the design process will increase. To get hold of elusive design errors, quickly applying the right modeling strategy at the right time is essential. It is often necessary to consider multiple models, but how can multiple approaches be fit into a very tight design process? This depends on the goals and constraints of the design project as well as the computational environment and the end use. To find the right approach, iteration is the only way out. Because there is no widely accepted methodology or tool available to help designers create a functional specification, mostly ad-hoc manners are used, relying heavily on informal and manual techniques and exploring only a few possibilities. A hierarchical modeling methodology should be developed to improve this situation. The main concern in such a methodology is precisely specifying the system's functionality and exploring system-level implementations. To create a system-level design, the following steps should be taken: 1. Specification capture: decomposing functionality into pieces by creating a conceptual model of the system. The result is a functional specification, which lacks any implementation detail. 2. Exploration: exploring design alternatives and estimating their quality to find the best-suited one. 3. Specification: the specification from step 1 is now refined into a new description reflecting the decisions made during the exploration of step 2. 4. Software and hardware: for each of the components an implementation is created, using software and hardware design techniques.
5. Physical design: manufacturing data is generated for each component. When the steps above have been run through successfully, an embedded-system design methodology from product conceptualization to manufacturing is roughly defined. This hierarchical modeling methodology enables high productivity, preserving consistency through all levels and thus avoiding unnecessary iteration, which makes the process more efficient and faster. Now let's take a closer look at some of the processes run through in the steps above. To describe a system's functionality, the functionality should first be decomposed and the relationships between the pieces should be described. There are many models for describing a system's functionality; let's name four important ones. Dataflow graph: a dataflow graph decomposes functionality into data-transforming activities and the dataflow between these activities. Finite-State Machine (FSM): in this model the system is represented as a set of states and a set of arcs that indicate transitions of the system from one state to another as a result of certain occurring events. Communicating Sequential Processes (CSP): this model decomposes the system into a set of concurrently executing processes, each of which executes program instructions sequentially. Program-State Machine (PSM): this model combines FSM and CSP by permitting each state of a concurrent FSM to contain actions described by program instructions. Each model has its own advantages and disadvantages. No model is perfect for all classes of systems, so the one matching the characteristics of the system as closely as possible should be chosen. This should be done very carefully, because the choice of model is the most important influence on the ability to understand and define system functionality during system specification. To specify functionality, several languages are commonly used by designers.
VHDL and Verilog are very popular standards because a CSP model is easily described through their process and sequential-statement constructs. Of course other languages are used as well, but none of them directly supports state transitions. Just as some models are better suited to specific systems, some languages are better suited to specific models than others. Finally, it should be noted that codesign is still a very new field and researchers in this area have rapidly evolving interests. Work is in progress aiming at introducing more sophisticated algorithms and features on top of a basic framework as discussed above. Most of the implementation effort is devoted to transformation algorithms and to cost/performance evaluation. Higher levels of automation in optimization, direct user selection, analysis of dataflow connectivity, and resource analysis are currently being researched.
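Of the four models above, the dataflow graph is the easiest to make concrete: nodes transform data, and edges carry values between them. A minimal sketch in which a sample token flows through a chain of invented activities (gain, then saturation):

```python
# Toy dataflow graph: each node is a data-transforming activity and the
# edges carry tokens between them.  Activities are invented:
#   sample -> scale -> clamp -> output

def scale(x: float) -> float:
    """Gain stage: multiply the incoming token by a fixed factor."""
    return x * 2.0

def clamp(x: float) -> float:
    """Saturation stage: limit the token to the range [0, 100]."""
    return min(max(x, 0.0), 100.0)

def run_graph(sample: float) -> float:
    # Evaluate the graph by propagating one token along its edges.
    return clamp(scale(sample))

print([run_graph(v) for v in (10.0, 60.0, -5.0)])
```

An FSM, by contrast, would carry explicit state between tokens; the PSM model would let each FSM state contain a small program like `run_graph`.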
  • http://ptolemy.eecs.berkeley.edu/presentations/01/model-based.pdf
  • http://ptolemy.eecs.berkeley.edu/presentations/01/model-based.pdf Concurrency: the synchrony abstraction; event-driven modeling. Reusability: cell libraries; interface definition. Reliability: leveraging limited abstractions; leveraging verification. Heterogeneity: mixing synchronous and asynchronous designs; resource management.
  • The Furuta pendulum has a motor controlling the angle of an arm, from which a free-swinging pendulum hangs. The objective is to swing the pendulum up and then balance it.
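Once swung up, the pendulum is typically held upright with state feedback. A minimal sketch using a hypothetical PD law on the pendulum angle (the gains are invented and the plant dynamics are omitted):

```python
# PD balance-controller sketch for an inverted pendulum: the motor
# torque opposes the pendulum's deviation from upright.  Gains are
# invented for illustration, not tuned for any real plant.
KP = 25.0   # proportional gain on angle error (per rad)
KD = 3.0    # derivative gain on angular velocity (per rad/s)

def balance_torque(theta: float, omega: float) -> float:
    """theta: angle from upright (rad); omega: angular rate (rad/s)."""
    return -(KP * theta + KD * omega)

# Pendulum leaning to the positive side and falling further over:
print(balance_torque(0.1, 0.5))   # negative torque pushes it back upright
```

The swing-up phase needs a different (energy-based) controller; the embedded software switches between the two laws depending on how close the pendulum is to upright, which is itself a small state machine.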
  • http://www.engr.uvic.ca/~seng440/Introduction.pdf
  • http://diwww.epfl.ch/recherche/epflfiles/codesign.html
  • Reconfigurable hardware (FPGA) integrated onto a robot base allows it to adapt to changing environments
  • http://www.cs.ubc.ca/wccce/program98/micaela/micaela.html Paul Chow and Rob Jeschke, Rapid-Prototyping Board Users Guide, Dept. of Electrical and Computer Engineering, University of Toronto, February 1996, CMC #ICI-068_Users_Guide R/01. G. De Micheli and M. Sami, Eds., Hardware/Software Co-Design, Kluwer Academic Publishers, 1996. D. Gajski, F. Vahid, S. Narayan and J. Gong, Specification and Design of Embedded Systems, Prentice-Hall, 1994. W. Gardner and M. Serra, "Concurrent Simulation of Heterogeneous Multiprocessor Embedded Systems", Proc. of 7th Int. Symp. on IC Technology, Systems & Applications, Sept. 1997. W. Gardner and M. Serra, "An Object-Oriented Layered Approach to Interfaces for Hardware/Software Codesign of Embedded Systems", Proc. Hawaii Int. Conf. on System Sciences, Jan. 1998. S. Kumar, J. Aylor, B. Johnson and W. Wulf, The Codesign of Embedded Systems: A Unified Hardware/Software Representation, Kluwer Academic Publishers, 1996. J. Rozenblit and K. Buchenrieder, Eds., Codesign, IEEE Press, 1995. D. Sharp, W.B. Gardner and M. Serra, "Gizgate: An Object-Oriented Gateway for Hardware/Software Codesign on the CMC Rapid Prototyping Board", Proc. FDP '98, Montreal, June 1998. M. Smith, Application-Specific Integrated Circuits, Addison-Wesley, 1997.
  • Figure 1 represents a more utopian view, where codesign and codesign tools provide an almost automatic framework for producing a balanced and optimized design from some initial high-level specification. The goal of codesign tools and platforms is not to push towards this kind of total automation; designer interaction and continuous feedback are considered essential. The main goal is instead to incorporate in the "black box of codesign tools" the support for shifting functionality and implementation between hardware and software, with effective and efficient evaluation. At the end of the process, either the prototypes or the final products are output, based on currently available platforms (software compilers and commercial hardware synthesis tools). Codesign as an area of research does not aim to reinvent the wheel of system design; however, the necessary flexibility must be effectively incorporated and enhanced. For example, in the design of a real-time system as a graduate project, a sub-path in the figure above may indeed be followed. The difference is that the designers are given predetermined choices of hardware and software allocation and must meet the timing constraints within the specifications. Codesign introduces research into the trade-offs of that allocation, dynamically throughout the entire process.
  • Figure 2 shows graphically the two paths leading to a final system integration, with no reconfiguration choices shown after the initial split (the Model Continuity Problem).
  • It is easy to draw such a picture and assign grandiose labels. Yet here the triangles shown spanning the two paths and covering the integrated substrate do not refer to mere feedback sessions and weekly designers' meetings! They represent, at a minimum, an integrated database with powerful tools which can support the exploration, prototype implementation, and rapid evaluation of the repartitioning of functionality between hardware and software, together with an essential and extremely effective cosimulation platform.
  • The red "interaction and feedback" arrow is the crucial part. Another important aspect is the central "Interface" submodule, which in normal system design is often left on the sideline, causing disastrous effects at the time of integration. Given that many embedded systems which use codesign methodologies are implemented at a very low level of programming and detail (e.g. assembly code), the proper development of an effective interface becomes extremely important, even more so given that any reconfiguration of the design will change the critical interface modules.
  • http://www.sigda.org/Archives/ProceedingArchives/CODES/Codes99/papers/1999/codes99/pdffiles/7_5.pdf http://www.sigda.org/Archives/ProceedingArchives/Dac/Dac97/papers/1997/dac97/pdffiles/15_2.pdf http://www.cornfed.com/doc/prot.pdf
  • http://www.engr.usask.ca/~khan/publications/iccd98.pdf
  • http://www.eventhelix.com/RealtimeMantra/SoftwareFaultTolerance.htm Timeouts: most real-time systems use timers to keep track of feature execution. A timeout generally signals that some entity involved in the feature has misbehaved and a corrective action is required. The corrective action can take two forms. Retry: when the application times out waiting for a response, it can retry the message interaction. You might argue that we do not need to implement application-level retries because lower-level protocols will automatically recover from message loss; keep in mind that message-loss recovery is not the only objective of implementing retries. Retries help in recovering from software faults too. Consider a scenario where a message sent to a task is not processed because of a task restart or processor reboot: an application-level retry will recover from this condition. Abort: in this case a timeout waiting for a response leads to aborting the feature. This might seem too drastic, but in reality aborting a feature might be the simplest and safest way of recovering from the error; the feature might then be retried by the user invoking it. Consider a case where a call has to be cleared because the task originating the call did not receive a response in time. If this condition can happen only in rare scenarios, the simplest action on timeout might be to clear the call; the user would retry the call. The choice between retrying and aborting on timeout is based on several factors; consider all of them before you decide either way. If the feature being executed is important for system stability, it might be better to retry: for example, a system startup feature should not be aborted on one timeout. If the lower-layer protocol is not robust, retry might be a good option: for example, message interactions using an inherently unreliable protocol like slotted ALOHA should always be retried.
The complexity of implementation should also be considered before retrying a message interaction; aborting a feature is the simpler option. More often than not, system designers just default to retrying without even considering the abort option. Keep in mind that a retry implementation complicates the code and the state machine design. If the entity invoking a feature will itself retry it, the simplest action might be to abort the feature and wait for the external retry. Retrying every message in the system will lower system performance because of frequent timer start and stop operations; in many cases, performance can be improved by just running a single timer for the complete feature execution, and on timeout the feature can simply be aborted. For most external interactions the designer might have no choice, as the timeouts and retry actions are generally specified by the external protocols. Many times the two techniques are used together: the task retries a message a certain number of times, and if no response is received after exhausting this limit, the feature is aborted. Audits: most real-time systems comprise software running across multiple processors, which implies that data is also distributed. The distributed data may become inconsistent at run time for reasons such as independent processor reboots, software bugs, race conditions, hardware failures, and protocol failures. The system must behave reliably under all these conditions. A simple strategy for overcoming data inconsistency is to implement audits. An audit is a program that checks the consistency of data structures across processors by performing predefined checks. Audit procedure: the system may trigger audits for several reasons: periodically, on failure of certain features, on processor reboots, on processor switchovers, or in certain cases of resource congestion. Audits perform checks on data and look for data inconsistencies between processors.
Since audits have to run on live systems, they need to filter out conditions where the data inconsistency is caused by transient data updates. On detecting a data inconsistency, audits perform multiple checks to confirm it: an inconsistency is considered valid if and only if it is detected on every iteration of the check. When an inconsistency is confirmed, audits may perform data-structure cleanups across processors. At times audits may not directly clean up inconsistencies; they may instead trigger appropriate feature aborts. Let's consider the Xenon Switching System. If the call occupancy on the system is much less than the maximum that could be handled and calls are still failing due to lack of space-slot resources, the call-processing subsystem will detect this condition and trigger a space-slot audit. The audit will run on the XEN and CAS processors and cross-check whether a space-slot that is busy at CAS actually has a corresponding call at XEN. If no active call is found on XEN for a space-slot, the audit will recheck the condition several times, after a small delay each time. If the inconsistency holds on every attempt, the space-slot resource is marked free at CAS. The audit performs several rechecks to eliminate the scenario in which a space-slot release message is in transit. Exception handling: whenever a task receives a message, it performs a series of defensive checks before processing it. The defensive checks should verify the consistency of the message as well as the internal state of the task. An exception handler should be invoked on defensive-check failure. Depending on the severity, the exception handler can take any of the following actions: log a trace for developer post-processing; increment a leaky-bucket counter for the error condition; trigger an appropriate audit; trigger a task rollback; trigger a processor reboot. Leaky-bucket counters: leaky-bucket counters are used to detect a flurry of error conditions.
To ignore rare error conditions, the counters are periodically leaked, i.e. decremented. If a counter reaches a certain threshold, appropriate exception handling is triggered. Note that the threshold will never be crossed by rare occurrences of the associated error condition; if the error condition occurs rapidly, however, the counter will overflow, i.e. cross the threshold.

Task Rollback

In a complex real-time system, a software bug in one task leading to a processor reboot may not be acceptable. A better option in such cases is to isolate the erroneous task and handle the failure at the task level. The task in turn may decide to roll back, i.e. restart operation from a known or previously saved state. In other cases it may be cheap enough to forget the context entirely, simply deleting the offending task and informing the associated tasks. For example, if the Space Slot Manager on the CAS card encounters an exception condition leading to task rollback, it might resume operation by recovering the space-slot allocation status from the connection memory. On the other hand, an exception in a call task might be handled simply by clearing the call task and releasing all the resources assigned to it. Task rollback may be triggered by any of the following events:

  - Hardware exception conditions like divide-by-zero or illegal address access (bus error).
  - Defensive-check leaky-bucket counter overflow.
  - An audit-detected inconsistency that is to be resolved by task rollback.

Incremental Reboot

Processor reboots can be time consuming, leading to an unacceptable amount of downtime. To reduce the system reboot time, complex real-time systems often implement incremental system-initialization procedures. For example, a typical real-time system may implement three levels of system reboot:

  - Level 1 Reboot: operating-system reboot.
  - Level 2 Reboot: operating-system reboot along with configuration-data download.
  - Level 3 Reboot: code reload, followed by operating-system reboot along with configuration-data download.
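A minimal leaky-bucket counter, as described above, might look like the following sketch. The threshold and leak amount are illustrative; real systems tune them per error condition:

```python
class LeakyBucket:
    """Error counter that is incremented on errors and periodically leaked.

    Rare errors never reach the threshold because the periodic leak keeps
    draining the count; only a rapid flurry of errors overflows the bucket.
    """

    def __init__(self, threshold=3, leak_amount=1):
        self.count = 0
        self.threshold = threshold
        self.leak_amount = leak_amount

    def error(self):
        """Record one error; return True if the threshold has been crossed."""
        self.count += 1
        return self.count >= self.threshold

    def leak(self):
        """Called periodically (e.g. from a timer) to decrement the count."""
        self.count = max(0, self.count - self.leak_amount)
```

A rare error interleaved with leaks never trips the bucket, while a burst of errors between leaks crosses the threshold and triggers the configured exception handling (rollback, reboot, etc.).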
Incremental Reboot Procedure

A defensive-check leaky-bucket counter overflow will typically lead to a rollback of the offending task. In most cases the task rollback will fix the problem. In some cases, however, the problem persists, leading to further rollbacks in quick succession. This causes the task-level rollback counter to overflow, leading to a Level 1 Reboot. Most of the time a Level 1 Reboot will fix the problem, but in some cases the processor may continue to hit Level 1 Reboots repeatedly. This causes the Level 1 Reboot counter to overflow, leading to a Level 2 Reboot. The majority of the time a Level 2 Reboot is able to fix the problem; if it is not, the processor will repeatedly hit Level 2 Reboots, causing the Level 2 Reboot counter to overflow and leading to a Level 3 Reboot.

Voting

Voting is a technique used in mission-critical systems where software failure may lead to loss of human life, e.g. aircraft navigation software. Here, the real-time system software is developed independently by at least three distinct teams, and in the live system all three implementations run simultaneously. All inputs are fed to the three versions of the software, and their outputs are voted on to determine the actual system response. In such a system, a bug in one of the three modules will be voted out by the other two versions.
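The triple-modular voting scheme described above reduces to a majority vote over the three versions' outputs. A minimal sketch, assuming the outputs are directly comparable values:

```python
from collections import Counter

def vote(outputs):
    """Return the majority value among the independent versions' outputs.

    With three versions, a bug in one is outvoted by the other two.
    Returns None if no value has a majority (no agreed system response).
    """
    value, count = Counter(outputs).most_common(1)[0]
    return value if count >= 2 else None

# One faulty version (7) is voted out by the two agreeing versions:
print(vote([42, 42, 7]))   # -> 42
```

Real voters must also handle timing (a hung version must not stall the vote) and value tolerances for analog outputs, which this sketch omits.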
  • http://www.eventhelix.com/RealtimeMantra/HardwareFaultTolerance.htm
  • Bus Cycle Level Synchronization

In this scheme the active and the standby units are locked at the processor bus-cycle level. To keep itself synchronized with the active unit, the standby unit watches each processor instruction performed by the active. It then performs the same instruction in the next bus cycle and compares its output with that of the active unit. If the outputs do not match, the standby may take over and become active. The main disadvantage is that specialized hardware is needed to implement this scheme. Also, bus-cycle-level synchronization introduces wait states into bus-cycle execution, which lowers the overall performance of the processor.

Memory Mirroring

Here the system is configured with two CPUs and two parity-based memory cards. One CPU is active and the other is standby. Both memory cards are driven by the active CPU; no memory is attached to the standby unit. Each memory write by the active unit is made to both memory cards, with the data bits and the parity bits updated individually on each card. On every memory read, the output of both memory cards is compared. If a mismatch is detected, the processor believes the memory card with the correct parity bit; the other card is marked suspect and a fault trigger is generated. The standby unit continuously monitors the health of the active unit via sanity punching or a watchdog mechanism. If a fault is detected, the standby takes over both memory cards. Since the application context is kept in memory, the new active processor inherits the application context. The main disadvantage, again, is that specialized hardware is needed, and memory mirroring introduces wait states into bus-cycle execution, lowering the overall performance of the processor.

Message Level Synchronization

In this scheme, the active unit passes all messages received from external sources to the standby.
The standby performs all the actions as though it were active, with the difference that no output is sent to the external world. The main advantage is that no special hardware is required. The scheme is practical only where the processor takes fairly simple decisions; with complex decisions, synchronization can easily be lost if the two processors take different decisions on the same input message.

Checkpoint Level Synchronization

To some extent this is like message-level synchronization, in that the active unit conveys synchronization information to the standby in messages. The difference is that not all external-world messages are conveyed; information is conveyed only at predefined milestones. For example, in a call-processing system, checkpoints may be passed only when a call reaches conversation or is cleared. If the standby takes over, all calls in conversation are retained, whereas all calls in transient states are lost. Resource information for the transient calls may be retrieved by running software audits with other modules. This scheme is not prone to loss of synchronization under normal conditions. Also, the message traffic to the standby is reduced, improving the overall performance of the active unit.

Reconciliation on Takeover

In this scheme there is no synchronization between the active and the standby. When the standby takes over, it recovers the processor context by requesting information from other modules in the system. The advantage of this scheme lies in its simplicity of implementation; there is also no performance overhead due to mate synchronization. The disadvantage is that the standby takeover may be delayed by the reconciliation requirements.
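Checkpoint-level synchronization can be sketched as follows. The milestone names ("conversation", "cleared") follow the call-processing example above; the class and method names are illustrative assumptions:

```python
class Standby:
    """Standby unit that keeps context only for checkpointed (stable) calls."""

    def __init__(self):
        self.calls = {}                   # call_id -> last stable state

    def on_checkpoint(self, call_id, state):
        """Active unit sends checkpoints only at predefined milestones."""
        if state == "conversation":
            self.calls[call_id] = state   # retained if the standby takes over
        elif state == "cleared":
            self.calls.pop(call_id, None) # call ended; drop its context

standby = Standby()
standby.on_checkpoint(1, "conversation")  # stable call, checkpointed
standby.on_checkpoint(2, "conversation")
standby.on_checkpoint(2, "cleared")
# A call still in a transient (e.g. dialing) state is never checkpointed,
# so it is lost on takeover; its resources are recovered later by audits.
print(standby.calls)                      # -> {1: 'conversation'}
```

The design trade-off is visible here: far fewer messages reach the standby than with message-level synchronization, at the cost of losing transient calls on takeover.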
  • Assume that the system is running with copy-0 as the active unit and copy-1 as the standby. When copy-0 fails, copy-1 detects the fault through one of the fault-detection mechanisms. At this point copy-1 takes over from copy-0 and becomes active. The state of copy-0 is marked suspect, pending diagnostics. The system raises an alarm, notifying the operator that the system is working in a non-redundant configuration. Diagnostics are scheduled on copy-0, including power-on diagnostics and hardware-interface diagnostics. If the diagnostics on copy-0 pass, copy-0 is brought in-service as the standby unit. If the diagnostics fail, copy-0 is marked failed and the operator is notified about the failed card. The operator replaces the failed card and commands the system to bring the card in-service. The system schedules diagnostics on the new card to ascertain that it is healthy. Once the diagnostics pass, copy-0 is marked standby. Copy-0 now starts monitoring the health of copy-1, which is currently the active copy. The system clears the non-redundant-configuration alarm, as redundancy has been restored. The operator can restore the original configuration by switching over the two copies.
  • One of the most important aspects of fault handling is detecting a fault immediately and isolating it to the appropriate unit as quickly as possible. Here are some of the commonly used fault-detection mechanisms.

Sanity Monitoring: A unit monitors the health of another unit by expecting periodic health messages. The unit being monitored should check its own sanity and send periodic health updates to the monitoring unit. The monitoring unit reports a fault if more than a specified number of successive health messages are lost.

Watchdog Monitoring: This is a hardware-based monitoring technique for detecting hung hardware or software modules. The system is configured with a hardware timer that should never be allowed to time out; under normal conditions the software periodically restarts the timer. If the software goes into an infinite loop or a hardware module gets stuck, the watchdog timer expires. This typically leads to a hardware reset of the unit and a hardware signal to the mate unit.

Protocol Faults: If a unit fails, all the units in communication with it will encounter protocol faults. Protocol faults are inherently fuzzy in nature, as they may be due to a failure of any unit in the path from source to destination; further isolation is required to identify the faulty unit.

In-service Diagnostics: Sometimes hardware modules are designed to allow simple diagnostic checks even in the in-service state. These checks are non-destructive, so they do not interfere with the normal functioning of the card. For example, on a digital trunk card, in-service diagnostics may be performed on idle channels. If a diagnostic check fails, a fault trigger is raised.

Transient Leaky-Bucket Counters: When the hardware is in operation, the system may detect many transient faults. Transient faults are typically handled by incrementing a leaky-bucket counter.
If the leaky-bucket counter overflows, a fault trigger is raised. The following are a few examples of transient faults.

Spurious interrupts: If the interrupt service routine is invoked but no device is found to have raised an interrupt, only a leaky-bucket counter is incremented. The hardware unit is suspected only if the counter overflows because spurious interrupts occur repeatedly in a short interval.

Spurious fault triggers: As discussed in the fault-handling lifecycle, when a fault trigger is generated the hardware unit is suspected and diagnostics are run. If the diagnostics pass, the unit is brought back in and a leaky-bucket counter is incremented. If this sequence repeats too often, the hardware unit may actually be faulty but the diagnostics are not exhaustive enough to detect the fault.

Killer trunks: Due to events like lightning or rain, digital trunks might frequently generate fault triggers and then come back in-service. If this happens too often, the digital trunk is marked as a "killer trunk" and is taken out of service, to avoid overloading the system with transient-fault processing.
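The sanity-monitoring mechanism above (declaring a fault after too many successive lost health messages) can be sketched as a small state machine. The class and limit are illustrative assumptions:

```python
class SanityMonitor:
    """Monitor that expects periodic health messages from a mate unit."""

    def __init__(self, max_missed=3):
        self.missed = 0
        self.max_missed = max_missed      # allowed successive losses

    def health_received(self):
        """A health message arrived: the mate is sane, reset the count."""
        self.missed = 0

    def period_elapsed_without_health(self):
        """Called each period a health message fails to arrive.

        Returns True (report a fault) once more than `max_missed`
        successive health messages have been lost.
        """
        self.missed += 1
        return self.missed > self.max_missed
```

Counting *successive* losses, rather than total losses, is what lets the monitor tolerate an occasionally dropped message without raising a false fault trigger.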
  • Timing characteristics of embedded systems: hard, soft and firm systems; fail-safe and fail-operational systems. Guaranteed-response, best-effort, event and time-triggered systems. Timing constraints in embedded systems.

Transcript

  • 1. Embedded System
    • A specialized computer system that is part of a larger system or machine. Typically, an embedded system is housed on a single microprocessor board with the programs stored in ROM. Some embedded systems include an operating system, but many are so specialized that the entire logic can be implemented as a single program.
  • 2. Embedded System Testing Wei-Tek Tsai Department of Computer Science and Engineering Arizona State University Tempe, AZ 85287
  • 3. Embedded System(cont’)
  • 4. Embedded System(cont’)
    • Differences with desktop computing
      • The human interface may be as simple as a flashing light or as complicated as real-time robotic vision.
      • The diagnostic port may be used for diagnosing the system that is being controlled -- not just for diagnosing the computer.
      • Special-purpose field programmable (FPGA), application specific (ASIC), or even non-digital hardware may be used to increase performance or safety.
      • Software often has a fixed function, and is specific to the application.
  • 5. Embedded System(cont’)
  • 6. Embedded System(cont’)
    • Characteristics of Embedded Systems
      • Application specific
        • Jobs are known a priori
        • Static scheduling of tasks and allocation of resources
      • Real time
        • Hardware/software tradeoff
        • Exceptions
      • Reactive
        • Interacts with external environment continuously
      • Hierarchy of behaviours
        • Sequential and concurrent sub-behaviours
  • 7. Embedded System(cont’)
    • Characteristics of digital systems
      • Application domain
        • General purpose
        • Dedicated computing and control systems
        • Emulation and prototyping systems
      • Degree of programmability
        • Application level
        • Instruction level
        • Hardware level
      • Hardware fabrication technology
        • Bipolar versus CMOS
      • Level of integration
        • Discrete components versus integrated
  • 8. Embedded System(cont’)
    • Definitions
      • FPGA(Field Programmable Gate Array)
        • Programmable HW; configurable gate level interconnection of circuits after manufacturing
        • Consists of a matrix of cells: Configurable Logic Blocks(CLBs) and I/O Blocks(IOBs), with programmable switches to provide the desired connections between blocks
        • Slower than non-programmable devices, but allows prototype to be designed quickly(circuit design, implementation, verification on desktop workstations)
  • 9. Embedded System(cont’)
      • ASIC(Application Specific Integrated Circuit)
        • Custom designed chip to implement a digital function/system
        • Hardwired(non-programmable) gives the best performance
        • Must produce in volume to cover non-recurring engineering design cost
  • 10. Embedded System(cont’)
      • ASIP(Application Specific Instruction Processor)
        • A microprocessor with special architecture design, and instruction set chosen for a specific domain of programs
        • Easier to cover non-recurring engineering cost since an ASIP has multiple applications
  • 11. Embedded System(cont’)
    • HW/SW Co-Design
      • Meeting system level objectives by exploiting the synergism of HW and SW through their concurrent design
      • Simultaneously design the software architecture of an application and the HW on which that SW is implemented to meet performance, cost, or reliability goals
  • 12. Embedded System(cont’)
      • HW/SW Co-Simulation
        • The joint simulation of HW and SW components and their interaction
  • 13. Embedded System(cont’)
    • Design requirements
      • Real time/reaction operation
      • Small size, low weight
      • Safe and reliable
      • Harsh environment
      • Cost sensitivity
  • 14. Embedded System(cont’)
      • Real time/reactive operation
        • Real time system operation: the correctness of a computation depends, in part, on the time at which it is delivered
        • Reactive computation: the software executes in response to external events
        • Challenge: Worst case design analyses without undue pessimism in the face of hardware with statistical performance characteristics ( e.g., cache memory [ Philip Koopman, " Perils of the PC Cache ", Embedded Systems Programming, May 1993, 6 (5) 26-34 ]).
  • 15. Embedded System(cont’)
      • Small size, low weight
        • Physically located within some larger artifact, therefore, form factors may be dictated
        • Weight might be critical in transportation and portable systems for fuel economy or human endurance
        • Challenges:
          • Non-rectangular, non-planar geometries.
          • Packaging and integration of digital, analog, and power circuits to reduce size.
  • 16. Embedded System(cont’)
      • Safe and reliable
        • Challenges:
          • Low-cost reliability with minimal redundancy
      • Harsh environment
        • Many embedded systems do not operate in a controlled environment
        • Additional problems can be caused for embedded computing by a need for protection from vibration, shock, lightning, power supply fluctuations, water, corrosion, fire, and general physical abuse
        • Challenges: accurate thermal modeling and de-rating components differently for each design, depending on operating environment
  • 17. Embedded System(cont’)
      • Cost sensitivity
        • Challenge: Variable "design margin" to permit tradeoff between product robustness and aggressive cost optimization
  • 18. Embedded System(cont’)
    • System level requirements for embedded system
      • End-product utility
      • System safety & reliability
      • Controlling physical systems
      • Power management
  • 19. Embedded System(cont’)
      • End-product utility
        • Challenge: Software- and I/O-driven hardware synthesis (as opposed to hardware-driven software compilation/synthesis).
      • System safety & reliability
        • Challenges:
          • Reliable software
          • Cheap, available systems using unreliable components
          • Electronic vs. non-electronic design tradeoffs
  • 20. Embedded System(cont’)
      • Controlling physical systems
        • Challenge: Distributed system tradeoffs among analog, power, mechanical, network, and digital hardware plus software
      • Power management
        • Challenge: Ultra-low power design for long-term battery operation
  • 21. Embedded System(cont’)
    • Embedded system lifecycle
  • 22. Major Embedded OS
    • QNX 4 RTOS
    • Embedded Linux
    • Windows CE
    • VxWorks
  • 23. Major Embedded OS (cont’)
    • Supported processor
      • QNX: all generic x86 based processors(386+)
      • Linux: virtually on every general purpose micro-processor(ARM, StrongARM, MIPS, Hitachi SH, PowerPC, x86)
      • WindowsCE: ( x86, MIPS, Hitachi SH3 and SH4, PowerPC and StrongArm processors)
      • VxWorks: (PowerPc, 68K, CPU32, ColdFire, MCORE, 80x86 and Pentium, i960, ARM and StrongARM, MIPS, SH, SPARC, NECV8xx, M32 R/D, RAD6000, ST 20, TriCore)
  • 24. Major Embedded OS(cont’)
    • Memory constraints
      • QNX is the smallest
      • Windows CE needs 350KB for a minimal system
      • Linux needs 125 – 256 KB for a reasonably configured kernel
      • VxWorks: a few kilobytes for a deeply embedded system
  • 25. Major Embedded OS(cont’)
    • Architecture Comparison
      • QNX: A very small microkernel surrounded by a team of cooperating processes that provide higher level OS services.
      • Linux: a layering structure and comprised of modules.
      • WindowsCE: The operating system architecture of Windows CE is a hierarchical one.
      • VxWork: Individual modules may be used in development and omitted in production systems.
  • 26. Major Embedded OS(Cont’)
    • Process Management
      • QNX:
        • Process manager is not in micro kernel
        • Use message passing primitives to communicate with other processes
        • Scheduling is managed by the micro kernel scheduler
        • Scheduling methods: FIFO, RR, adaptive
        • Fully preemptible
  • 27. Major Embedded OS(cont’)
      • Linux:
        • Implements threads in kernel
        • Three classes of threads
          • Real-time FIFO: having highest priority and not preemptable
          • Real-time RR: same as real-time FIFO but preemptable
          • Time sharing: lowest priority
  • 28. Major Embedded OS(cont’)
      • WindowsCE:
        • Support both processes and threads
        • Full memory protection applied to application processes
        • Thread scheduling is preemptive, using 8 different priority levels
        • A maximum of 32 simultaneous processes
        • A process contains one or more threads
  • 29. Major Embedded OS(cont’)
      • VxWorks:
        • A multitasking kernel
        • Transparently interleave task execution
        • Uses Task Control Blocks (TCBs) to keep track of tasks
        • Priority scheduling and priority based preemption
        • RR scheduling only applies to the tasks of the same priority
  • 30. Major Embedded OS(cont’)
    • Interprocess Communication
      • QNX:
        • Message passing facilities
        • a message delivered from one process to another need not occupy a single, contiguous area in the memory
        • All the system services within QNX are built upon message passing primitives
  • 31. Major Embedded OS(cont’)
      • Linux:
        • Uses original Linux IPC mechanisms: signals, wait queues, file locks, pipes and named pipes, system V IPC, Unix Domain Sockets.
        • An embedded Linux system can choose any of these IPC methods for its particular application.
  • 32. Major Embedded OS(cont’)
      • WindowsCE:
        • Supports message passing between processes
        • Supports memory mapping between processes
  • 33. Major Embedded OS(Cont’)
      • VxWorks:
        • Shared memory
        • Message passing queues
        • Pipes
  • 34. Major Embedded OS(Cont’)
    • Memory management
      • μClinux:
        • Without a MMU, running on a flat memory model(virtual = physical)
        • No paging, no protection, no memory sharing
        • No fork() since no copy-on-write, only limited version of vfork()
  • 35. Major Embedded OS(cont’)
      • WindowsCE:
        • More elaborate memory management
        • Supports paged virtual memory partially
        • Requires CPU to support a TLB but not a full page model
        • Full memory protection applies to application processes
  • 36. Major Embedded OS(cont’)
    • Network support
      • QNX: contains low level network communication in its microkernel
      • Linux: automatically get the most current Internet protocols.
      • WindowsCE: has communication stacks of various kinds at the same level as kernel. Supports IP, HTTP, FTP and so on
      • VxWorks: very good network support of almost all internet protocols.
  • 37. Major Embedded OS(cont’)
    • Factors that impact an embedded operating system
      • Application requirement impact
      • Hardware impact
  • 38. H/S interaction
    • Today’s embedded system designers face the difficult task of integrating and verifying application-specific software and hardware with intellectual property (IP) such as protocol stacks and commercial real-time operating systems (RTOSs). When the system design includes developing application-specific integrated circuits (ASICs), engineers often postpone the integration and verification task until a hardware prototype is available. Waiting until this stage to debug adds unnecessary costs and delays. However, new methodologies allow integration teams to verify their embedded systems applications while meeting design and time-to-market goals.
  • 39. H/S interaction(cont’)
    • Semiconductor manufacturers should not ignore embedded software
    • Software experts are unlikely to solve the embedded software problem on their own
  • 40. H/S interaction(cont’)
    • Software/Hardware Codesign
      • Simultaneous design of both hardware and software to implement a desired function
  • 41. H/S interaction(cont’)
    • Why is embedded software an issue for semiconductor manufacturers?
      • Silicon without software is getting rare
      • Time-to-volume is often dominated by SW development
      • Software requirements affect hardware design
      • Embedded software design is getting harder(networking, complexity)
      • Mainstream SW engineering is not addressing embedded SW well
  • 42. H/S interaction(cont’)
    • Why is embedded SW not just SW on small computers?
      • Interaction with physical processes (sensors, actuators, etc.)
      • Critical properties are not all functional (real-time, fault recovery, power, security, robustness)
      • Heterogeneous (hardware/software, mixed architectures)
      • Concurrent (interacts with multiple processes)
      • Reactive (operating at the speed of the environment)
  • 43. H/S interaction(cont’)
    • Hardware experts have something to teach the SW world
      • Concurrency
      • Reusability
      • Reliability
      • Heterogeneity
  • 44. H/S interaction(cont’)
    • What an embedded program might look like
  • 45. H/S interaction(cont’)
    • Simple example: controlling an inverted pendulum with embedded SW
  • 46. H/S interaction(cont’)
    • Metaphor for:
      • Disk drive controllers
      • Manufacturing equipment
      • Automotive:
        • Drive-by-wire devices
        • Engine control
        • Antilock braking systems, traction control
  • 47. H/S interaction(cont’)
      • Avionics
        • Fly-by-wire devices
        • Navigation
        • Flight control
      • Certain “software radio” functions
      • Printing and paper handling
      • Signal processing(audio, video, radio)
  • 48. H/S interaction(cont’)
    • System HW/SW design methodology
      • Specification capture
        • Create model
        • Select language for specification
      • Exploration
        • Allocate architectural components
        • Partition the specification to the architectural components
  • 49. H/S interaction(cont’)
      • Refinement of specification
      • Hardware and software design
        • Synthesis
        • Simulation
      • Physical design
        • Generate manufacture data for hardware
        • Compile code for instruction sequence
  • 50. H/S interaction(cont’)
    • Co-design is particularly important when designing embedded systems or systems-on-a-chip
    • There are many areas in which the co-design principle can bring product enhancements
    • Massive parallelism, distributed algorithms and special architectures
    • Efficient interfaces are required
    • Low-Power Processors
    • Reconfigurable Systems are capable of adapting to changing environments or to incomplete specifications
    • Parallel I/O on multiple PC's
  • 51. H/S interaction(cont’)
  • 52. H/S interaction(cont’)
    • Hardware/software codesign
      • the cooperative design of hardware and software components;
      • the unification of currently separate hardware and software paths;
      • the movement of functionality between hardware and software;
      • the meeting of system-level objectives by exploiting the synergism of hardware and software through their concurrent design.
  • 53. H/S interaction(cont’)
    • Why is it important?
      • Reconfiguration: exploiting the synergy between hardware and software
  • 54. H/S interaction(cont’)
      • Embedded systems are application specific systems which contain both hardware and software tailored for a particular task and generally part of a larger system
      • Reusability: to provide design approaches that scale up, without a total redesign for a legacy product
  • 55. H/S interaction(cont’)
    • Existing Problems
      • Model Continuity Problem
  • 56. H/S interaction(cont’)
    • Importance of Model Continuity
      • many complex systems do not perform as expected in their operational environment;
      • continuity allows the validation of system level models at all levels of hardware/software implementation;
      • trade-offs are easier to evaluate at several stages
  • 57. H/S interaction(cont’)
    • Consequences of losing such model continuity
      • cost increases and schedule over-runs (due to modifications late in phases);
      • the ability to explore hardware/software trade-offs is restricted (e.g. movement of functionality between, modification of interfaces);
      • state of the art applications require a case-by-case analysis;
      • different design cultures hamper integration
  • 58. H/S interaction(cont’)
    • Solution
      • Unified Design Environment: it is emphasized that hardware design and software design use the same integrated infrastructure, resulting in an improvement of overall system performance, reliability, and cost effectiveness.
  • 59. H/S interaction(cont’)
    • Typical context for co-design process
      • An “ideal” process flow
  • 60. Memory Constraints
    • Memory is usually a critical resource and the memory size is often very restricted
    • Both static and dynamic memory usage within a task and the dynamic memory usage due to communication should be considered
    • Mapping is also a problem
  • 61. Fault-tolerance
    • Allowable system failure probability is 10^-10 per hour
    • Software fault tolerance
    • Hardware fault tolerance
  • 62. Fault-tolerance(cont’)
    • Software fault tolerance
      • Timeouts
      • Audits
      • Exception handling
      • Task roll back
      • Incremental reboot
      • Voting
  • 63. Fault-tolerance(cont’)
    • Hardware fault tolerance
      • Redundancy Schemes
        • One for one redundancy: each hardware module has a redundant hardware module
        • N + X redundancy: if N hardware modules are required to perform system functions, the system is configured with N + X hardware modules; typically X is much smaller than N
        • Load sharing: under zero fault conditions, all the hardware modules that are equipped to perform system functions, share the load
  • 64. Fault-tolerance(cont’)
      • Standby synchronization
        • Bus cycle level synchronization
        • Memory mirroring
        • Message level synchronization
        • Checkpoint level synchronization
        • Reconciliation on takeover
  • 65. Fault-tolerance(cont’)
    • Fault handling techniques
      • Fault handling lifecycle
      • Fault detection
      • Fault isolation
  • 66. Fault-tolerance(cont’)
    • Fault-handling lifecycle
  • 67. Fault-tolerance(cont’)
    • Fault detection
      • Sanity monitoring
      • Watchdog monitoring
      • Protocol faults
      • In-service diagnostics
      • Transient leaky bucket counters
  • 68. Fault-tolerance
    • Fault isolation
    • If a unit is actually faulty, many fault triggers will be generated for that unit. The main objective of fault isolation is to correlate the fault triggers and identify the faulty unit. If the fault triggers are fuzzy in nature, the isolation procedure involves interrogating the health of several units. For example, if a protocol fault is the only fault reported, all the units in the path from source to destination are probed for health.
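The correlation step described above can be sketched as a simple suspect ranking: a fuzzy trigger such as a protocol fault implicates every unit on its path, and the unit implicated by the most triggers is probed first. All names here are illustrative assumptions:

```python
from collections import Counter

def rank_suspects(triggers):
    """Rank units by how many fault triggers implicate them.

    `triggers` is a list of (kind, units_implicated) pairs; a fuzzy
    trigger spreads suspicion over every unit in its path.
    """
    suspects = Counter()
    for _kind, units in triggers:
        for unit in units:
            suspects[unit] += 1
    return [unit for unit, _count in suspects.most_common()]

# Two protocol faults on paths that share unit "B" point at B first:
print(rank_suspects([("protocol", ["A", "B"]),
                     ("protocol", ["B", "C"])]))   # -> ['B', 'A', 'C']
```

In a real system the ranking would only order the health probes; diagnostics on the top suspect confirm or clear it before the next unit is interrogated.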
  • 69. Timing
    • Real time: A real-time system provides specified system services with known timing and latency characteristics, so that applications can be designed and developed that meet prescribed timing constraints
    • Hard real time: In a hard real-time system, the timing constraints have an upper worst-case value, which if exceeded, cause the application to fundamentally fail
    • Soft real time: In a soft real-time system, the timing constraints do not have an upper worst-case value, but meet an acceptable statistical distribution of timings. In this case, occasional longer latencies either do not really cause failures, or the failure rates are acceptable