Redes de sensores sem fio autonômicas: abordagens, aplicações e desafios (Autonomic Wireless Sensor Networks: Approaches, Applications and Challenges)


The main goal of this course is to introduce attendees to wireless sensor networks (WSNs), communication protocols for WSNs, and autonomic computing concepts. Applications in environmental monitoring, precision agriculture, security and defense will also be presented.


Usage Rights

© All Rights Reserved


Document Transcript

  • II Brazilian Conference on Critical Embedded Systems. PUC - Campinas - SP, Brazil, May 21 - 25, 2012
  • II Critical Embedded Systems School (CES-School)
    II Brazilian Conference on Critical Embedded Systems (CBSEC 2012)
    Preface
    The II Brazilian Conference on Critical Embedded Systems (CBSEC 2012) aims to bring academia and industry together to discuss major technical and practical issues in the development of critical embedded systems. The first edition took place in May 2011 in São Carlos (Brazil).
    In this second edition, the emphasis is on aerial and terrestrial autonomous vehicles. The main objective is to boost the capabilities of academia and industry in teaching, training, research and development in the area through paper presentations, short courses, tutorials, a student workshop and an exhibition. A comprehensive display of relevant scientific and technological tools, applications and methodologies with social and economic impact in strategic areas such as agriculture, security and defense, automotive, aviation, satellite and environment protection will be put together and discussed from 20th to 25th of May, 2012, in Campinas (Brazil).
    The II Critical Embedded Systems School (CES-School) is a joint event of the CBSEC. In this edition, we received 12 short course proposals, of which four were selected for presentation. In addition, two advanced courses and one international tutorial were invited. All of them explore themes of interest to academics and professionals involved in the development of critical embedded systems.
    We thank the Pontifícia Universidade Católica de Campinas for hosting the second edition of CES-School within CBSEC. Finally, we welcome the speakers and participants of CES-School 2012. We wish everyone a great conference!
    Ellen Francine Barbosa (ICMC/USP)
    Itana Maria de Souza Gimenes (DIN/UEM)
    CES-School 2012 Chairs
  • Table of Contents
    Tutorial
      • Interaction Control for Contact Robotics. Neville Hogan (MIT-USA)
    Invited Courses
      • The “Why” and “How” of Software Safety Analysis in a Cross-Domain Review. Sören Kemmann (Fraunhofer/IESE)
      • Model-Driven Engineering of Complex Embedded Systems: Concepts and Tools. Flávio R. Wagner, Francisco A. M. Nascimento, Marcio F. S. Oliveira (UFRGS / University of Paderborn)
    Short Courses
      • Introdução ao Desenvolvimento de Software Embarcado (Introduction to Embedded Software Development). Alexandra C. P. Aguiar, Sérgio J. Filho, Felipe G. Magalhães, Fabiano P. Hessel (PUC-RS)
      • Introdução a Sistemas Embarcados e Projeto baseado em Plataformas (Introduction to Embedded Systems and Platform-Based Design). Marcio S. Oyamada, Alexandre A. Giron, João A. Martini (UEM, UNIOESTE)
      • Introdução aos Sistemas Embarcados utilizando FPGAs (Introduction to Embedded Systems Using FPGAs). Edilson R. R. Kato, Emerson C. Pedrino (UFSCar)
      • Redes de Sensores sem Fio Autonômicas: Abordagens, Aplicações e Desafios (Autonomic Wireless Sensor Networks: Approaches, Applications and Challenges). Alex S. R. Pinto, Gustavo M. Araújo, José M. Machado, Adriano Cansian, Carlos Montez (UNESP - Rio Preto)
  • Interaction Control for Contact Robotics
    Neville Hogan
    Sun Jae Professor of Mechanical Engineering
    Professor of Brain and Cognitive Sciences
    Massachusetts Institute of Technology
    Abstract
    Contact robotics (close physical contact and cooperation between robots and humans) requires reliable, robust control of interaction. I will review some of the interesting and perhaps unique challenges of interaction control. Most control theory is permeated by a “signals” perspective: each system component is described as a mathematical operator that unilaterally determines its output (signals) as a function of its input (signals), but not vice versa. Composition of operators is straightforward and the result is modularity: the behavior of a component is essentially unaffected by its assembly into a system, which dramatically simplifies the design of complex machines. Unfortunately, the interactions due to physical contact are usually bilateral: each system affects the other. The “controlled system” blends the robot dynamics with those of the contacted object, which may be poorly or incompletely known. As a result, the “signals” perspective doesn’t work well. I will review the mechanical physics of interaction, define what is meant by a “port” and show its usefulness for establishing impedance or admittance control. Drawing heavily on concepts from physical systems, I will review how a port-based perspective yields simple solutions for stabilizing contact, coping with (and taking advantage of) redundancy and selecting optimal behavior for different tasks.
    Background papers
    Hogan, N. and S. P. Buerger (2004). Impedance and Interaction Control. Robotics and Automation Handbook. T. R. Kurfess, CRC Press: 19-1 to 19-24.
    Fasse, E. D. and N. Hogan (1995). Control of physical contact and dynamic interaction. 7th International Symposium on Robotics Research. Germany.
    Mussa-Ivaldi, F. A. and N. Hogan (1991). "Integrable Solutions of Kinematic Redundancy Via Impedance Control." International Journal Of Robotics Research 10(5): 481-491.
    Hogan, N. (1988). "On The Stability of Manipulators Performing Contact Tasks." IEEE Journal of Robotics and Automation 4(6): 677-686.
    Hogan, N. (1985). "Impedance Control: An Approach to Manipulation." ASME Journal of Dynamic Systems, Measurement and Control 107: 1-24.
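The abstract names impedance control without writing down a control law. As an illustration only, and not material from the tutorial itself, the following is a minimal sketch assuming a one-degree-of-freedom point mass, a spring-damper impedance law F = k(x0 - x) - b v, and a purely elastic wall. The function names and all parameter values are hypothetical. The sketch shows the bilateral nature of contact: the resting position depends on both the commanded stiffness and the stiffness of the environment.

```python
"""Illustrative 1-DOF impedance control sketch (all names and values are assumptions)."""


def simulate_contact(k=100.0, b=20.0, mass=1.0, x0=0.2,
                     k_env=10_000.0, x_wall=0.1, dt=1e-4, t_end=3.0):
    """Integrate a point mass driven by F = k*(x0 - x) - b*v against an elastic wall.

    Returns the final position and the final contact force on the wall.
    """
    x, v = 0.0, 0.0  # start at rest, away from the wall
    for _ in range(round(t_end / dt)):
        f_ctrl = k * (x0 - x) - b * v  # commanded impedance: virtual spring plus damper
        # The wall pushes back only while the mass penetrates it.
        f_env = -k_env * (x - x_wall) if x > x_wall else 0.0
        v += (f_ctrl + f_env) / mass * dt  # semi-implicit Euler: update velocity first
        x += v * dt                        # then position, using the new velocity
    return x, k_env * max(x - x_wall, 0.0)


def predicted_equilibrium(k=100.0, x0=0.2, k_env=10_000.0, x_wall=0.1):
    """Static balance k*(x0 - x) = k_env*(x - x_wall), valid when x0 lies beyond the wall."""
    x_eq = (k * x0 + k_env * x_wall) / (k + k_env)
    return x_eq, k_env * (x_eq - x_wall)


if __name__ == "__main__":
    print("simulated:", simulate_contact())
    print("predicted:", predicted_equilibrium())
```

With these assumed values the static balance gives a contact force of k·k_env·(x0 - x_wall)/(k + k_env), roughly 9.9 N. Lowering the commanded stiffness k lowers the contact force without any model of the wall, which is one reason a stiffness-like behavior can be a convenient quantity to specify for contact tasks.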
  • Model-Driven Engineering of Complex Embedded Systems: Concepts and Tools
    Flávio Rech Wagner∗, Francisco A. M. Nascimento∗ and Marcio F. S. Oliveira∗†
    ∗ Institute of Informatics, Federal University of Rio Grande do Sul (UFRGS), Porto Alegre, Brazil
    ∗† Cooperative Computing and Communication Laboratory (C-LAB), University of Paderborn, Paderborn, Germany
    Abstract: This paper starts by presenting a brief history of engineering methods up to Model-Driven Engineering (MDE). Then, it introduces the basic principles of MDE, including the concepts of models, meta-models, transformation between models, and domain specific languages (DSLs). One can identify two classes of tools. The first one is the framework required to support MDE of any kind. It supports different operations and common tasks, independently of the development domain, and relies strongly on standards. As such, some MDE standard approaches, for example MDA (Model Driven Architecture), Software Factories, and MIC (Model Integrated Computing), are explained, and we provide a short survey of different technologies supporting MDE: MOF and Ecore for metamodeling, UML and DSLs for modeling, and QVT, ATL, Xtend, and Xpand as transformation languages. A second class of tools adopts an MDE framework to provide Domain Specific Engineering Tools (DSET), which aggregate domain specific knowledge to define relations between models and how these models can be refined. Focusing on Complex Embedded Systems, some DSETs for the development of these systems are described. The paper shows in detail a DSET that uses an MDE framework based on the Eclipse Modeling Project and OMG standards. Finally, the paper presents the application of a DSET for Embedded Systems in a complete development flow. For this, we start by defining a sample embedded system and show how the system requirement specification can be refined through different development phases using an MDE approach. The development process relies on different tools, which support multiple semi-automatic or automatic development tasks.
    I. INTRODUCTION
    Nowadays we are surrounded by devices containing hardware and software components. These devices support many different domains, such as telecommunication, avionics, automotive, space, military, medical care, and others. They are part of our daily lives: in cell phones, in cars as controllers for multiple subsystems (e.g., ABS, EPS, etc.), in electronic toys, in blood pressure measurement systems, and so on. In short, they are found everywhere, and they are called Embedded Systems because they are information processing systems embedded into products, where the processing system is not the main goal or functionality of the product [1].
    The ever-growing complexity of modern embedded systems requires more hardware and software components to implement the functions incorporated into a single system. Such increasing functionality leads to a growing design complexity, which must be managed properly because, besides stringent requirements regarding power, performance and cost, time-to-market also constrains the design of embedded systems.
    In order to overcome the difficulty of raising the abstraction level and to improve the automation of the design from the initial specification to the final system, research efforts look for modeling methods, formalisms, and suitable abstractions to specify, analyze, verify, and synthesize embedded systems in a fast and precise way.
    The main motivation for using models in the design of embedded systems is abstraction. Abstraction helps us to understand a complex system by hiding information that is irrelevant to solving a specific problem. However, abstraction alone does not improve the development. Accuracy is required, so that models truly represent a specific system view. A model must clearly communicate its intent and must be easy to understand and to develop in order to be effective [2].
    A prominent effort to use models in order to raise abstraction and automate development tasks gave rise to the Computer Aided Software Engineering (CASE) tools. CASE tools provide graphical representations for fundamental programming concepts and automatically generate implementation code from them. The main purpose of these tools was to reduce the effort of manually coding, debugging and porting programs. However, due to the limited platforms existing at that time, the code to be generated was too complex for the available technology. Moreover, the graphical representations were too generic and poorly customizable, and thus they could not support many application domains. Nowadays, these limitations have been drastically reduced by object-oriented languages and development frameworks, which make the reuse of software components easier. However, these development frameworks and platforms are extremely complex and evolve quickly, causing a fragmented view due to the multiple tool integrations required for developing new applications [3].
    Although models are used in every engineering domain, only recently have models started to play a central role in the development process of software and embedded systems [2]. Model-Driven Engineering (MDE) [4] has been proposed in order to improve complexity management and also the reusability of previously developed/specified artifacts. The MDE method raises the design abstraction level and provides mechanisms to improve the portability, interoperability, maintainability, and reusability of models.
    In our MDE approach, we use only MOF concepts (Meta Object Facility, a standard representation for meta-models and models proposed by OMG [5]) to define our internal representation. Thus, as our metamodels conform to MOF, the
  • representation can take advantage of the concept of transformations between models to implement DSE (Design Space Exploration), formal verification and co-synthesis tasks.
    Our MDE approach defines internal models conforming to MOF-based metamodels proposed to represent applications, capturing functionality by means of processes communicating through ports and channels; platforms, indicating available hardware/software resources; mappings from applications onto platforms; and implementations, oriented to code generation and hardware synthesis. Additional metamodels and transformations extend this infrastructure to perform design specific tasks such as DSE and verification.
    We support a formal verification methodology. By using the MDE approach, we generate a MOF-based representation of a network of timed automata [6] from UML Class and Sequence diagrams. We use the network of timed automata as input to the UPPAAL model checking tool [7], which can validate the desired functional and temporal properties of the embedded system specification. Since the network of timed automata is automatically generated from UML models, the methodology simplifies the debugging and formal validation of the system specification.
    Moreover, we offer an MDE methodology for the co-synthesis problem [8], which is integrated with the formal verification approach. This way, after the formal validation of the desired properties, the validated system specification is used as input to our MDE co-synthesis tool. Therefore, this methodology exploits the MDE approach to automatically generate a correct-by-construction implementation for a specific platform.
    Other approaches do not consider the influence of the structural features of the UML model on the communication behavior of a specified application. Our internal design representation, in turn, also captures the hierarchy and communication structure of the UML model in the form of a graph. This way, we represent the control and data flow dependencies in a form that is convenient for the co-synthesis algorithms.
    During the development of complex embedded systems, a wide range of design alternatives arises from different design activities. The combination of alternative designs and stringent requirements reveals a complex design space, which the design team must evaluate under reduced time-to-market. Design Space Exploration (DSE) consists in systematically searching for different design candidates by mapping an application onto an architectural platform. Each candidate corresponds to a trade-off regarding design requirements and constraints.
    Concerning the DSE process, all methods discussed in Section IV restrict the design space according to the activity to be performed. Moreover, the generation of candidate designs is internally implemented, usually as a function that is programmed directly in the tool. As a result, no extension mechanisms are provided, and multiple tools are required to support the different design activities. Moreover, in most approaches either the constraint set is restricted to constraints previously implemented by the tool or the method supports only limited constraint constructs.
    The method proposed in our MDE approach overcomes those restrictions by defining a design space abstraction using a categorical graph product [9]. Besides the automatic construction of the design space, performed by the product of graphs, this abstraction provides a common representation for multiple design activities. Moreover, the specification of a metamodel using a well-adopted technology allows us to exploit the MDE approach, such that model-to-model transformation rules are used to implement any user constraints, improving the flexibility of the DSE method.
    The remainder of the text is organized as follows. Section II provides the background on MDE. Section III presents technologies developed to support MDE, such as modeling languages, transformation languages, and engines. Section IV presents an overview of MDE approaches for embedded systems design. Sections V and VI present the MDE framework for embedded systems design under development at UFRGS. Section VII presents a case study, which illustrates the methodology described in the previous section. Section VIII, finally, discusses future trends and gives final remarks.
    II. MODEL-DRIVEN ENGINEERING BACKGROUND
    The MDE approach was proposed to overcome the limitations of object technology in raising the abstraction level and dealing with the increasingly complex and rapidly evolving systems we are developing today. By proposing that "Everything is a model", MDE promotes the paradigm shift required for the necessary evolution [10]. Although the central concept of this proposal still has multiple definitions, a consensual definition of model and modeling is presented in [11]:
      "Modeling, in the broadest sense, is the cost-effective use of something in place of something else for some cognitive purpose. It allows us to use something simpler, safer or cheaper than reality instead of reality for some purpose. A model represents reality for the given purpose; the model is an abstraction of reality in the sense that it cannot represent all aspects of reality. This allows us to deal with the world in a simpler manner, avoiding the complexity, danger and irreversibility of reality." [11]
    Since the main principle of MDE is that "Everything is a model", models play a central role in the development process, thus defining the scope of MDE proposed in [4]. The basic concepts supporting the MDE principle are system, model, metamodel, and the relations between them, so that a model represents a system and conforms to a metamodel [10]. Such concepts were organized in 3+1 layers [10] and are illustrated in Figure 1.
    [Fig. 1. Basic concepts and layered organization]
    Formally, a model in MDE is a graph composed of elements (vertices and edges), where each element corresponds to a concept in a reference graph (metamodel), as defined below:
    Definition 1: A directed graph $G = \langle N_G, E_G, \Gamma_G \rangle$ consists of a set of distinct nodes $N_G$; a set of edges $E_G$; and a mapping function $\Gamma_G : E_G \to N_G \times N_G$.
    Definition 2: A model $M = \langle G, \omega, \mu \rangle$ is a tuple where $G = \langle N_G, E_G, \Gamma_G \rangle$ is a directed graph; $\omega$ is itself a model, named the reference model of $M$, associated with a graph $G_\omega = \langle N_\omega, E_\omega, \Gamma_\omega \rangle$; and $\mu : N_G \cup E_G \to N_\omega$ is a function associating elements (nodes and edges) of $G$ with nodes of $G_\omega$ (the metamodel).
    A metamodel is a model that serves as a reference model for other models, so that it defines classes of models that can be produced conforming to it. It is an abstraction that collects the concepts of a certain domain and the relations between these concepts.
    MDE models are operated on through transformations, aiming at the automation of some development activity. Such transformations define clear relationships between models [10] and are usually specified in a specialized language that operates on (graph) models. Following the description in [12], a model transformation means converting one or more source models to a target model, where all models must conform to some metamodel, including the model transformation itself, which is also a model. Figure 2 illustrates the concept of model transformation in the MDE context.
    [Fig. 2. Model transformation in the context of MDE]
    Model transformation plays a key role in MDE and has many applications, as enumerated in [13]:
      • Generating low-level models from high-level ones
      • Generating development artifacts (e.g. configuration files and source code)
      • Mapping and synchronizing models
      • Creating query-based views of a system
      • Model refactoring
      • Reverse engineering
      • Verification, etc.
    Based on their possible applications, model transformations can be classified as:
      • Model-to-Model, when the source and target of the transformation are models, e.g. a transformation from UML to a relational database (RDB) schema or from a Platform Independent Model (PIM) to a Platform Specific Model (PSM);
      • Model-to-System, characterizing a generation from model to system, which can include program code or any other artifact, e.g. UML to Java or Simulink to C++;
      • System-to-Model, meaning reverse engineering, such as from Java code to a UML model or from Java code to a business model.
    A more detailed survey of model transformation approaches is presented in [13].
    III. TECHNOLOGICAL FRAMEWORKS
    Technological frameworks [14] are tools that support different operations and common tasks for MDE independently of the application domain. Such tools rely on standards, such as MDA, MIC, and Software Factories, in order to generalize the manipulation of models, providing facilities such as persistence, repository management, copy, etc. Figure 3 illustrates the relationship between the principles presented in Section II, standards, and tools.
    [Fig. 3. MDE context: principles, standards and tools]
    An overview of some standards and tools is presented in the next subsections.
    A. MDE Standards
    1) Model-Driven Architecture: Model-Driven Architecture (MDA) is a standard proposed by OMG for software development. The main purpose of MDA is the abstraction of platforms, so that the business models can be reused as the technological platform evolves. MDA integrates different OMG standards, such as MOF for metamodeling, UML for system modeling, SPEM for process modeling, and QVT for model transformation. In order to separate business and application models from the underlying platform, MDA advocates three modeling dimensions (viewpoints):
  •   • The Computation Independent Model (CIM) focuses on the required features of the system and on the environment where it must operate;
      • The Platform Independent Model (PIM) focuses on business functionality and behavior, which are unlikely to change from one platform to another;
      • The Platform Specific Model (PSM) describes platform specific details integrated with elements of the PIM.
    The relationship between PIM and PSM in MDA can be established by automatic or semi-automatic mechanisms that specify a mapping between these models. MDA suggests that this mapping can be specified using QVT, so that a transformation engine can perform the transformation from PIM to PSM automatically. The languages used to express these models are defined by means of metamodels using MOF, which are able to represent abstract and concrete syntaxes, as well as the operational semantics of the modeling language. Originally, MDA was proposed for enterprise architectures that use platforms such as Java2EE, CORBA, VisiBroker, and WebSphere. However, since the MDA approach lets system development focus on aspects that do not involve implementation details, many other domains, such as real-time and embedded systems, have started to consider it. Therefore, MDA and the experience with OMG standards are at the origins of MDE.
    2) Model Integrated Computing: Model Integrated Computing (MIC) [15] is an initiative from Vanderbilt University. In this approach, models representing different views capture the designer's understanding of the computer-based system, including information process, physical architecture, and operating environment. A formal specification of the dependencies and constraints among these models allows the generation of tools to solve an entire class of problems. MIC proposes a two-step development process. In the first step, a domain-independent abstraction is used to formally define a domain specific environment and the required models, languages and tools. In the second step, three typical components delivered by the previous phase are used for system engineering:
      • A graphical model builder is used to specify domain specific models. Constraints explicitly defined at meta-level allow model testing.
      • A model database stores domain specific multiview models using a multigraph architecture.
      • Model interpreters are used to synthesize executable programs from the domain specific models and to generate data structures for the tools.
    MIC has a strong influence on the principles of MDE, as it has a wider basis in systems engineering than MDA. Moreover, the two-step process advocated by MIC is close to the idea of Technological Frameworks as a basis for developing the Domain Specific Engineering Tools present in the MDE approach.
    3) Microsoft Software Factories: The main idea behind Software Factories [16] is to introduce patterns of industrialization into software development. A software factory is "a software product line that provides a production facility for the product family by configuring extensible tools using a software template based on a software schema" (footnote 1).
    A Software Factory Schema describes the artifacts that comprise a software product. It is represented by a graph, where vertices are viewpoints and edges are relationships between viewpoints (mappings). Each viewpoint defines the tools and materials required by a concern at a specific abstraction level. Attached to a viewpoint, a micro process is defined for producing the artifacts described in the viewpoint. This process is constrained by preconditions, postconditions and invariants that must hold when the view is stabilized.
    A Software Factory Template is the collection of DSLs, patterns, frameworks and tools described in the Software Factory Schema, which is made available to developers in order to create a specific software product.
    B. MDE Tools
    The MDE approach has practical relevance only if it can produce and transform models with considerably more benefits than current practices. Therefore, to enhance the value of models, they must become tangible artifacts, which can be simulated, verified, transformed, and so on, and the burden of keeping these models synchronized with the produced system must be reduced [4].
    Supporting tools are essential to provide all benefits of MDE. This section describes some MDE tools, focusing on tools supported by the Eclipse Modeling Project (EMP) (footnote 2). EMP provides a unified set of modeling frameworks, tooling, and standards implementations.
    1) Metamodeling/Abstract Syntax: As the model is the most important artifact in MDE, defining the class of models an MDE process must work on is one of the first steps. This is done by metamodeling, which defines the structured data types used to represent a system (abstract syntax). In EMP, metamodels are defined conforming to ECORE, a metametamodel (layer 3 in Figure 1) defined by the Eclipse Modeling Framework (EMF). EMF is a projection of ECORE, and of the models conforming to it, into a Java API. It provides code generation facilities and tools to build model editors and to compare, query, persist and validate models. As most tools in EMP are based on ECORE and EMF, and many other projects make use of EMF, ECORE is a de facto standard.
    Besides the Ecore metametamodel and EMF, other metamodeling tools are available. Kermeta (footnote 3) is based on the OMG standard Essential MOF (EMOF), which was originated from ECORE and KM3, a metametamodel proposed in [17]. MetaGME is a metamodeling tool that implements the metamodeling concepts of MIC. Originally, its metametamodel was called Multigraph Architecture. Newer versions use UML class diagram notation and OCL for metamodeling.
    Footnotes:
      1 http://msdn.microsoft.com/en-us/library/ms954811.aspx
      2 http://www.eclipse.org/modeling/
      3 http://www.kermeta.org/
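The "conforms to" relation that underlies metamodeling can be made concrete with Definitions 1 and 2 from Section II. The following is a minimal sketch, not the representation used by EMF or by the authors' framework: it assumes graphs are stored as plain Python sets and dictionaries, and it checks only what Definition 2 states, namely that the mapping sends every node and edge of the model to a node of the reference graph. The names `Graph`, `conforms`, `metamodel`, `app`, and `mu` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Graph:
    """Definition 1: a node set N and edges E with Gamma: E -> N x N."""
    nodes: set
    # Edge name -> (source node, target node). The keys play the role of E,
    # the values the role of Gamma. Assumes edge and node names never collide.
    edges: dict

    def is_well_formed(self) -> bool:
        # Gamma must send every edge to a pair of existing nodes.
        return all(src in self.nodes and dst in self.nodes
                   for src, dst in self.edges.values())


def conforms(model: Graph, mu: dict, reference: Graph) -> bool:
    """Definition 2: mu maps every node and edge of the model to a reference node."""
    elements = set(model.nodes) | set(model.edges)
    return (model.is_well_formed()
            and reference.is_well_formed()
            and all(mu.get(element) in reference.nodes for element in elements))


# Hypothetical metamodel: processes communicate through channels.
metamodel = Graph(nodes={"Process", "Channel"},
                  edges={"links": ("Channel", "Process")})

# Hypothetical application model: two processes joined by one channel edge.
app = Graph(nodes={"sensor", "filter"},
            edges={"c1": ("sensor", "filter")})

# mu associates each model element with the metamodel concept it instantiates.
mu = {"sensor": "Process", "filter": "Process", "c1": "Channel"}

print(conforms(app, mu, metamodel))                   # True
print(conforms(app, {**mu, "c1": "Bus"}, metamodel))  # False: "Bus" is not a metamodel concept
```

Real metamodeling frameworks check considerably more than this (multiplicities, attribute types, containment), so the sketch should be read as the graph-theoretic core of conformance only.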
  • 2) Concrete Syntax: A concrete syntax for a DSML (Domain Specific Modeling Language) can be defined using the tools of the Eclipse Graphical Modeling Project. It provides tools, such as GMF Notation and Graphiti, to specify the concrete syntax and to generate an editor for expressing models graphically.
    The concrete syntax of languages expressed as text can also be defined, using tools such as Xtext. It provides a simple EBNF language, which is used to define grammars, and a generator that creates a parser, an AST metamodel (implemented in EMF), and an Eclipse text editor for the defined language.
    3) Model Development: For common general purpose and domain specific languages, there is no need to build new editors, as good tools are available, such as MagicDraw, Enterprise Architect and Rhapsody for modeling with UML. Simulink and SCADE are DSMLs commonly used for control engineering and signal processing, and specialized tools for them are also available. The Eclipse Model Development Tools provide model editors for some standards such as UML, XML, and OCL.
    4) Model Transformation: Since model transformation is the key operation in MDE, many transformation engines and languages have been proposed. However, after the experience with the first languages, a discussion on classification [13] and quality metrics [18] is starting to take place in the research agenda, so that a widely adopted standard may emerge.
    EMP had many model-to-model transformation languages, but efforts now concentrate on ATL and on a reference implementation of QVT, QVT Operational. Other languages are provided as Eclipse projects or Eclipse plug-ins, such as VIATRA2 and GReAT.
    Model-to-text (Model-to-System) transformation is provided by EMF through three different template-based languages: Java Emitter Templates (JET); Acceleo, which is an implementation of an OMG standard named MOF to Text Language; and Xpand, which was initially an openArchitectureWare component.
    IV. MODEL DRIVEN ENGINEERING OF COMPLEX EMBEDDED SYSTEMS
    In [14] two classes of MDE tools are identified. The first class is called MDE Technology Framework, which supports the MDE process by providing tools for different operations and common tasks, independently of the development domain, such as metamodeling, transformation engines and languages, debugging, tracing, and other facilities. These tools rely strongly on standards. Some of them were presented in the previous section, such as the tools provided by the Eclipse Modeling Project. The second class of tools adopts an MDE framework to provide Domain Specific Application Development Environments (DSAEs), which aggregate domain specific knowledge for defining relations between models and how these models can be refined. Generalizing this concept, we assume that Domain Specific Model-Driven Engineering Tools (DSMDETs) are those tools that rely on an MDE technology framework to engineer not only software, but entire systems, which may also be composed of hardware, electrical, and mechanical parts. This section presents some DSMDETs for embedded system development.
    The adoption of platform-independent design and executable UML has been widely investigated. For example, xtUML [19] defines an executable and translatable UML subset for embedded real-time systems, allowing the simulation of UML models and C code generation oriented to different microcontroller platforms. The Model Execution Platform (MEP) [20] is another approach based on MDA, oriented to code generation and model execution, as is the Framework for UML Model Behavior Simulation (FUMBeS) [21].
    Other approaches improve the integration of the design tools into an MDE environment by defining meta-models, and the transformations on them include some refinement. These approaches include the DaRT (Data Parallelism to Real Time) project [22], [23], whose evolution produced the Gaspard2 framework. It proposes an MDA-based approach that has many similarities with our approach in terms of meta-modeling concepts. DaRT defines MOF-based metamodels to specify application, architecture, and software/hardware associations, and uses transformations between models to refine an association model. In the Gaspard2 framework [24], UML/MARTE models are used as input and transformed for other tools, providing support for co-synthesis, simulation and formal verification by translating the model into synchronous reactive languages. However, no automated DSE (Design Space Exploration) strategy based on these transformations is implemented, and the main focus is code generation for simulation at TLM (Transaction Level Model) and RT (Register Transfer) levels. In this approach, each candidate solution is simulated at a different abstraction level, thus guiding the designer in the DSE activities.
    The Aspect-oriented Model-Driven Engineering for Real-Time systems (AMoDE-RT) methodology [25] proposes an automated integration of design phases for distributed embedded real-time systems, focusing on automation systems. The proposed approach uses MDE techniques together with Aspect-Oriented Design (AOD) and previously developed (or third party) hardware and software platforms to design the components of distributed embedded real-time systems. AOD concepts allow functional and non-functional requirements to be handled separately, improving the modularization of the produced artifacts. In addition, the methodology is supported by the GenERTiCA code generation tool [25], which uses mapping rules for the automatic transformation of UML models into source code for software and hardware components. This code can be compiled or synthesized by other tools, thus obtaining the realization/implementation of the distributed embedded real-time system. During the generation process, the tool includes the implementation code required to handle the specified aspects for non-functional requirements (model weaving).
    Metropolis [26] is an infrastructure for electronic system design, in which tools are integrated through an API and a common metamodel. Following the platform-based approach [27], the Metropolis infrastructure captures application, architecture and mapping using a proposed UML-platform profile [28]. Furthermore, its infrastructure is general enough to support different Models of Computation and to accommodate new ones. No automatic support for Design Space Exploration is provided by Metropolis, which proposes an infrastructure to integrate different tools. Nevertheless, the simulation and verification tools currently integrated into Metropolis and the proposed refinement process can be used to manually perform some architectural explorations (task mapping, scheduling, hardware/software partitioning) and component configuration. Moreover, the refinement process allows the explicit exploration of application algorithms that implement a higher level specification.
    Koski [29] is a UML-based framework to support MPSoC (Multi-Processor System-on-Chip) design. It is a library-based method, which implements a platform-based design. Koski provides tools for UML system specification, estimation, verification, and system implementation on FPGA. The Koski design flow starts with a requirement analysis, which specifies the application or architecture requirements and design constraints. Following the design flow, the application, architecture and the initial mapping are specified as UML 2.0 models. A UML interface handles these models and generates an internal representation, which is used for architectural exploration.
    [Fig. 4. MODES Development Flow]
    engineer specifies the application independently from the platform using UML as modeling language. MoDES provides the components System Designer and Application, Platform and Implementation Managers, which transform the UML models
The architectural exploration is performed in two into internal models conforming to metamodels proposed tosteps; the first one is static, fast and less accurate; the second represent applications, capturing functionality by means ofone is dynamic. At the end of the design flow, the UML models processes communicating by ports and channels; platforms,are used to generate code and the selected components from indicating available hardware/software resources; mappingsthe platform are linked to build the system. from applications into platforms; and implementations, ori- Other complete environment for design space exploration ented to code generation and hardware synthesis. Additionalis the MILAN [30] framework, with two exploration tools metamodels and transformations extend this infrastructure tocalled DESERT [31] and HiPerE [32]. The focus of MILAN perform design specific tasks such as DSE (Design Spaceis the integrated simulation of embedded systems, so that Exploration) and verification. Figure 4 illustrates the MoDESit evaluates pre-selected candidate solutions. The hierarchi- infrastructure including the models, according to metamodelscal simulation provided by MILAN allows a designer to with the same names, and the flow of transformation be-explore the design space at several abstraction levels, by tween tools, which provides support to DSE (H-SPEX) [34],using the DESERT and HiPerE tools. First, the DESERT estimation (SPEU) [35], formal verification (UPPAAL) [7],tool uses models of aggregated system sub-components and and co-synthesis (System Designer, Application, Platform andconstraints to automatically compose the embedded system Implementation Managers).through Ordered Binary Decision Diagrams (OBDD), based We implemented the MODES framework using the open-on a complete pre-characterization of components. 
Moreover, source Eclipse Modeling Framework (EMF) to define ourthe DESERT tool performs design space pruning, reducing Ecore conformant metamodels, while openArchitectureWarethe number of candidate solutions. After that, HiPerE can be is used to define transformations and workflow between tools.used for accurate system-level estimation, exploring the pruned The UML models can be specified in any editor that providesdesign space. Finally, by using integrated simulation at lower an XMI compatibility with EMF tools such as Eclipse UML2.abstraction levels, the designer can explore the reminder of A. System Modelingthe design space, performing then also platform tuning. The proposed system development methodology adopts V. MODES: A N MDE F RAMEWORK FOR E MBEDDED UML and the MARTE profile together with modeling guide- S YSTEMS D ESIGN lines to specify application, architecture, and mapping. As an example, consider a real-time embedded system dedicated to Our MDE approach to embedded systems design automa- the automation and control of an intelligent wheelchair. Thetion is supported by the Model-based Design for Embedded application structural model is specified using Class Diagrams.System (MoDES) Framework [33]. In this approach, the Figure 5 shows a partial class diagram for the movement
  • control of the automated wheelchair.
Fig. 5. Application Model: UML Class Diagram
The behavioral model is defined using Interaction Diagrams, containing loop and conditional execution, interaction between objects, and dependencies between execution scenarios. An Interaction Overview Diagram identifies and links the scenarios used to evaluate the system during the estimation process. For our example, an Interaction Overview Diagram specifies a parallel composition of three UML sequence diagrams, which are illustrated in Figures 6 (a), (b), and (c).
The allocated architectural components, such as processing units, memories and communication buses, are defined in Composite Diagrams. The Composite Diagram can also be used to define the mapping from application to architecture, for example to specify in which processing unit a software element must execute, as illustrated in Figure 7. The Component Diagrams are considered as constraints during the automatic DSE process.
Fig. 7. Architecture Models: Composite Diagram
Alternatively, a Domain Specific Language (DSL), defined to specify models for application, platform, and implementation, can be used instead of UML. For this purpose we use the Xtext feature of openArchitectureWare, which automatically generates the parser and a text editor for these DSLs as Eclipse plug-ins from an Extended Backus-Naur Form (EBNF) [36] specification.
B. Internal Application Metamodel
To represent an application in a standard way, the model captured from UML is translated into a common application model defined by our Internal Application Metamodel (IAMM), partly depicted in Figure 8.
Fig. 8. Internal Application Metamodel
Conforming to this metamodel, a system specification captures the functionality of the application in terms of a set of Modules. Each Module has ModuleDeclarations and a ModuleBody. ModuleDeclarations are used to specify typed Channels, Ports, Signals, and Variables. These concepts come from hardware description languages, such as VHDL. Channels are used by Processes to send or receive messages. Ports interconnect the Modules. Signals are used to specify shared memories for processes. Variables correspond to the local memories for processes. A ModuleBody consists of Interconnections with other Modules and a ModuleBehavior, as well as sub-modules. The ModuleBehavior is captured in terms of a set of Processes, and each Process has a set of Actions, which represent the occurrence of UML events in the scenarios (UML sequence diagrams), as will be shown in the following.
The behaviors of the processes are associated to a Model of Computation (MoC). This association allows the translation from an abstract behavior description to a specific MoC and the execution of algorithms to automate design tasks. Currently, two MoCs are supported and their metamodels extend the IAMM as described in subsections V-C and V-D.
C. Interaction Graph Metamodel
The control and data flow graph (CDFG) [8] of an application model is defined conforming to the metamodel presented in Figure 9.
Figure 9 represents an InteractionGraph, which consists of a set of IGNodes and IGEdges. Each IGNode can represent different kinds of control flow:
    • IGInitialNode and IGFinalNode indicate the begin and end of an InteractionGraph, respectively;
    • IGForkNode and IGJoinNode represent parallel execution; and
    • IGDecisionNode and IGMergeNode represent conditionals and loops.
There are also two kinds of executable nodes, which are sub-classes of the IGMessageNode class: IGCallNode captures the sending of messages and IGReplyNode represents the reply messages in the UML sequence diagram.
For each UML sequence diagram $SD_m$ there is an InteractionGraph $IG_{SD_m} = \langle V, E, K, L \rangle$, which is a CDFG, where:
    • $V$ is the set of nodes, representing the actions in the behavioral modeling;
    • $E$ is the set of edges, representing the data and control flow between the actions;
    • $K : V \to \{Initial, Final, Fork, Join, Merge, Decision, Call, Reply\}$ is a function that indicates the type of each node; and
    • $L : V \to \{IG_{SD_i}\}$ is a relation that associates an IGCallNode to another InteractionGraph and allows the capture of the behavioral hierarchy of the application.
An Interaction Overview Diagram links multiple Sequence Diagrams, and from this diagram an InteractionGraph $IG_{app} = \langle V, E, K, L \rangle$ is generated, representing the CDFG for the entire application.
Fig. 6. UML Sequence Diagrams identified as: a) SD1; b) SD2; c) SD3
Fig. 9. Interaction Graph metamodel
Therefore, our IAMM captures structural aspects of an application model by using a hierarchy of modules and processes, as well as behavioral aspects by means of the actions of sending and replying messages, where a message may execute some method in the corresponding object.
D. Labeled Timed Automata Metamodel
Additionally, the functional behavior of a UML model is translated into a network of Labeled Timed Automata (LTA) [6], conforming to the LTA metamodel illustrated in Figure 10.
Fig. 10. Labeled Timed Automata metamodel
The LTA metamodel captures all concepts introduced by the UPPAAL model checking tool [7]. According to this metamodel, a system consists of LTADeclarations, which are used to declare variables, functions, and channels, and LTAProcesses, which are instances of LTATemplates. Each LTATemplate corresponds to a timed automaton, which can also have LTADeclarations of local variables and functions. Each timed automaton is represented by a set of LTALocations, corresponding to states of the automaton, and LTATransitions, corresponding to transitions between states, thus having source and target locations. Each transition may have attributes such as:
    • LTASelections, which non-deterministically bind a given identifier to a value in a given range when a transition is taken;
    • LTAGuards: the transition is enabled in a state if and only if the guard evaluates to true;
    • LTASyncronizations: transitions labeled with complementary synchronization actions (send and receive) over a common channel; and
    • LTAUpdates: when the transition is taken, its update expression is evaluated and the side effect of this expression changes the state of the system.
The LTA model is used in the UPPAAL model checker to perform formal verification of specified properties of the system. This feature is very useful for the designer, since
  • the LTA model is automatically generated and can help the designer to debug and validate the specification.
E. Internal Platform Metamodel
In a platform-based design context, a large number of hardware and software components are provided and can be reused in the system development. Such reusable components must be pre-characterized so that their Quality of Service values, such as performance, energy, memory footprint, and others, are acquired. This pre-characterized library dramatically shortens the design phases and reduces the uncertainty about the system properties, thus improving the design productivity. The software component characterization is performed after the component code is compiled for the target architecture, since at this time a simulation/estimation tool can capture architectural information with high accuracy. The characterization of hardware components must be performed from adequate synthesized descriptions, to obtain values that are independent of technology and frequency, such as execution cycles and gate switchings per cycle (a measure for power consumption). In our methodology, the available hardware/software components and the characterization information are stored in a platform repository. Figure 11 shows our Internal Platform Metamodel (IPMM).
Fig. 11. Internal Platform metamodel
In our IPMM, a Platform contains different Components, which offer Services for the application. These Services must be pre-characterized in terms of Quality of Service (QoS), and this information is reused during system development. Our approach uses performance, energy, data memory, and program memory as QoS metrics. However, other metrics could also be used, thus extending the QoS concept.
F. Mapping Metamodel
The Mapping Metamodel is responsible for describing the rules used to transform instances of IAMM and IPMM into an instance of the Implementation Metamodel. Conforming to the Mapping Metamodel, presented in Figure 12, a Mapping model consists of a set of Transformations, whose Rules are specified by leftSides and rightSides. The queries on the leftSide determine the source metamodel elements, which will be manipulated by the Action of the rule side, and the specified action is applied only when the specified Condition evaluates to true. Thus, a transformation rule may also change the elements of the source metamodel. Similarly, the queries on the rightSide determine the target metamodel elements, which will be manipulated by the Action of the rule side, and the specified action is applied only when the specified Condition evaluates to true.
Fig. 12. Mapping metamodel
Instead of defining our own transformation language as a concrete syntax for the Mapping Metamodel, we use the Xtend transformation language. Therefore, transformations in Xtend are considered instances of the Mapping Metamodel. The source models are IAM and IPM, and the target is the Implementation Model. A Mapping model also defines the rules that guide the DSE process and prune the design space. By means of model-to-model transformations, the rules in the Mapping model manipulate instances of the DSE Metamodel to generate candidate designs during the DSE process. The Mapping model provides flexibility to specify constraints that directly handle the concepts of the design, such as processors, tasks, slots, voltage level, and others.
G. Design Space Exploration Metamodel
The DSE Metamodel defines the relevant concepts to perform automated DSE. Figure 13 shows this metamodel.
The root container in this metamodel is DSEDomain, which is a container for all elements related to DSE. It inherits properties from DSEModelElement, as do all other elements in this metamodel. The generalization was omitted to keep the diagram clear. DSEDomain contains DSEProblems, which define a DSE scenario. DSEProblem contains a list of DesignGraphs extracted from Application and Platform Models.
A DesignGraph contains vertices and edges, where vertices are ExplorableElements and Edge represents the dependences between vertices. ExplorableElement is a reference to a design element from which the DesignGraph is generated. This reference is important to hook the ExplorableElements to the design model and allows the metamodel to be attached to multiple models, such as UML, Simulink, and others. Currently, this reference is implemented by holding the name of the design element as a field of ExplorableElement and using queries to find the instance of the design element in the design repository.
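The name-based reference described for ExplorableElement can be pictured with a small Python sketch. This is only an illustration, not the EMF/Ecore implementation used by MoDES: the class names DesignGraph and ExplorableElement come from the metamodel, while `DesignRepository`, its `register` and `find` methods, the `resolve` helper, and the example element names are assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ExplorableElement:
    # Only the *name* of the design element is stored, as the text describes.
    design_element_name: str


@dataclass
class DesignGraph:
    name: str
    vertices: List[ExplorableElement] = field(default_factory=list)
    # Each edge is a dependence between two vertices.
    edges: List[Tuple[ExplorableElement, ExplorableElement]] = field(default_factory=list)


class DesignRepository:
    """Hypothetical stand-in for the design repository (UML, Simulink, ...)."""

    def __init__(self) -> None:
        self._elements: Dict[str, object] = {}

    def register(self, name: str, element: object) -> None:
        self._elements[name] = element

    def find(self, name: str) -> Optional[object]:
        # The "query": look the design element up by its name.
        return self._elements.get(name)


def resolve(graph: DesignGraph, repo: DesignRepository) -> Dict[str, Optional[object]]:
    """Map every vertex of the graph to its design element (None if missing)."""
    return {v.design_element_name: repo.find(v.design_element_name) for v in graph.vertices}


repo = DesignRepository()
repo.register("calcAngle", {"kind": "uml:Operation"})

task_graph = DesignGraph("TASK")
calc_angle = ExplorableElement("calcAngle")
move = ExplorableElement("move")
task_graph.vertices += [calc_angle, move]
task_graph.edges.append((calc_angle, move))

resolved = resolve(task_graph, repo)
```

In this sketch `move` resolves to `None` because it was never registered, which hints at the traceability cost of a purely name-based link: a renamed design element silently stops resolving.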
  • This implementation could be improved, but it is important to evaluate factors such as performance, increase of dependence between metamodels, and traceability of design elements.
Fig. 13. Design Space Exploration metamodel
DSEProblem also contains a list of Objectives, which are the values to be optimized, defined by their name and unit. We represent a DesignSpace as a categorical graph product, as we propose in [37]. DesignDecisions represent vertices in the design space graph, and Alternatives link the allowed DesignDecisions. A DesignDecision is a tuple of n vertices from the DesignGraphs. It contains a GraphToExplorableMap, which contains an instance of DesignGraph as a key and an instance of ExplorableElement as a value, so that it can map a design decision to the ExplorableElements represented in the DesignGraphs.
DSESolution is a sub-graph of the design space and represents candidate designs. A DSESolution has its costs defined in the ObjectiveToCostMap, acquired from an estimation/simulation process. DSESolution also contains a list of decisions, which identifies the DesignDecisions selected from DesignSpace and maps it to an ObjectiveToCostMap.
Our H-SPEX DSE tool invokes the engine that executes the transformations defined by ExplorationRules, which is an instance of the Mapping model required to generate DSESolutions conforming to the DSE Metamodel.
H. Implementation Metamodel
The Implementation Metamodel, presented in Figure 14, represents a model that can implement a system. An Implementation is a list of Resources, which are the Hardware, Software and Communication components required to implement the system.
Fig. 14. Implementation metamodel
The metamodel also represents the association between Hardware and Software and the Communication between these resources. Each Resource may have ImplementationLinks, which are references to artifacts required for its final implementation, such as source code files and scripts. An instance of this metamodel is obtained by the application of mapping rules, which are selected from the mapping model by means of our DSE approach.
VI. DESIGN AUTOMATION TASKS
A. Co-Synthesis Tasks
A co-synthesis design process, from a specification of the system functionality, produces an efficient implementation of the embedded system in terms of: a set of software modules to be executed by hardware components from a given platform; a set of hardware modules, which are specifically designed for the application, in the form of ASICs or FPGAs, with minimum latency and costs; and a set of interface modules to perform the communication between all the elements of the implementation. Thus, the co-synthesis process must include design tasks for the specification of the system functionality and its translation to a representation. This representation must be adequate for the execution of tasks such as hardware/software partitioning, scheduling, allocation and binding, during the design space exploration, and code generation, for obtaining the final implementation of the specified system. The following sections present the co-synthesis tasks of our approach.
1) Capturing Application: Our Application Manager adopts an MDE approach to generate the Internal Application Model (IAM) conforming to our IAMM. It does so by performing model transformations on the UML application
  • model, which are implemented using the Xtend language from the openArchitectureWare framework. To give an idea of the kind of model transformations we define, Figure 15 illustrates part of our model transformations.
Fig. 15. UML to IAM transformation
UML structural constructs
The main transformation is performed in lines 6-10 of Figure 15, where each Package in the UML model is traversed (line 7). After that, the sub-modules are identified (line 8), the processes are built from the sequence diagrams (line 9), and, finally, the InteractionGraphs are built from the sequence diagrams (line 10).
As seen in lines 12-14, which show the function handlePackage(), each Package in the UML model is traversed recursively (see line 14), and each existing UML Class in a package is transformed into a Module class by calling the function mapModule() (line 13). Each UML Attribute of each UML class is transformed into a ModuleDeclaration class, as shown in lines 17-18.
The associations between the UML classes determine the sub-modules of each module. Each UML Class that is part of an aggregation or composition of another UML Class is transformed into a sub-module by calling the function putSubModule() (lines 23-25).
UML behavioral constructs
In Figure 16, we have the Interaction Graph for the sequence diagram SD1 from Figure 6-C. The IGExecutableNodes are shown as circles, the IGControlFlow edges as arrows, and the IGControlNodes as rounded boxes.
Fig. 16. InteractionGraph: CDFG for application
The IGCallNodes cn-m1 and cn-m2 in Figure 16-A represent the message calls for calcAngle() and move() in the SD1 of Figure 16-C, respectively. The IGReplyNodes rn-m1 and rn-m2 represent the corresponding reply messages for calcAngle() and move() in the same SD1, respectively.
The InteractionGraph for the entire application is shown in Figure 16-B, where we have three IGExecutableNodes cn-ig1, cn-ig2, and cn-ig3, which are associated by relation L to the corresponding InteractionGraphs of the sequence diagrams SD1, SD2, and SD3, respectively. The IGForkNode fk-m1 and the IGJoinNode jn-m1 indicate that the three InteractionGraphs are composed in parallel.
2) Capturing the Platform: Our Platform Manager generates the Internal Platform Model (IPM) from a specification of the platform resources. The platform specification is given using a textual DSL (Domain Specific Language) for the IPM.
A parser and an editor for the textual DSL of the IPM were automatically generated using the Xtext feature of openArchitectureWare. From an EBNF specification, openArchitectureWare automatically produces Eclipse plug-ins, which implement the parser and editor for a DSL. Listing 1 shows part of the EBNF for the DSL of the IPM.
Listing 2 presents an example of IPM given in the textual DSL defined by the EBNF of Listing 1.
In the IPM example of Listing 2 we have two platform components. The component Comp1 consists of a processor with memory (component HwComp1) and an interface component InterfComp1, which can implement hardware to hardware and software to software communications. The specified processor has a functional unit fu1, which can implement the operations calcspeed and calcangle with latencies 2 and 1, respectively (as indicated by the qos attributes). The component Comp2 defines a software component (SwComp1), which consists of an API with methods that can also implement the operations calcspeed and calcangle. This platform information is read by the Platform Manager and passed to the System Designer during the co-synthesis process.
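To make the role of the pre-characterized QoS values more concrete, the following Python sketch encodes the services of the Listing 2 example as plain records and lists, for a given operation, the components able to implement it, ordered by delay. The delay values are the ones given in Listing 2; the `Service` record, the dotted provider paths, and the `providers_of` helper are assumptions made for this illustration and are not part of the Platform Manager.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Service:
    operation: str   # operation offered to the application
    provider: str    # hypothetical dotted path to the offering element
    qos_delay: int   # qosDelay value from the platform description


# Services transcribed from the Listing 2 example (hardware and software side).
PLATFORM: List[Service] = [
    Service("calcspeed", "Comp1.HwComp1.Processor1.fu1", 2),
    Service("calcangle", "Comp1.HwComp1.Processor1.fu1", 1),
    Service("move", "Comp1.HwComp1.Processor1.rf1", 1),
    Service("calcspeed", "Comp2.SwComp1.APIComp1.calSpeed", 3),
    Service("calcangle", "Comp2.SwComp1.APIComp1.calAngle", 2),
    Service("move", "Comp2.SwComp1.APIComp1.move", 1),
]


def providers_of(operation: str, services: List[Service]) -> List[Service]:
    """Return every service implementing `operation`, lowest delay first."""
    candidates = [s for s in services if s.operation == operation]
    return sorted(candidates, key=lambda s: s.qos_delay)


speed_options = providers_of("calcspeed", PLATFORM)
angle_options = providers_of("calcangle", PLATFORM)
```

Each operation has one hardware and one software alternative here, which is exactly the kind of choice the design space exploration has to resolve; sorting by delay is only one possible view, since energy and memory are QoS metrics as well.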
  •
    Platform: 'platform' name=ID
      '{' (components+=Component)* '}';
    Component: 'platcomponent' name=ID '{'
      (hardwarecomps+=HardwareComp)*
      (sofwarecomps+=SoftwareComp)*
      (interfacecomps+=InterfaceComp)*
      (compservices+=ComponentService)* '}';
    HardwareComp: 'plathardware' name=ID '{'
      (memories+=MemoryComp)*
      (processors+=ProcessorComp)* '}';
    MemoryComp: 'platmemory' name=ID '{'
      (attributes+=Attribute)* '}';
    Attribute: name=ID '=' Value ';';
    Value: value=STRING | value=INT | value=ID;
    ProcessorComp: 'platprocessor' name=ID '{'
      (attributes+=Attribute)* '}';
    SoftwareComp: 'platsofware' name=ID '{'
      (Oss+=OSComp)* (APIs+=APIComp)* '}';
    OSComp: 'platOS' name=ID '{' (syscalls+=Syscall)* '}';
Listing 1. Xtext grammar for the Platform DSL
    platform TTA1 {
      platcomponent Comp1 {
        plathardware HwComp1 {
          platmemory MemComp1 { Size=4095; Width=32; }
          platprocessor Processor1 { version=1;
            FU fu1 {
              service calcspeed { qosDelay { Value=2; } }
              service calcangle { qosDelay { Value=1; } }
            }
            RF rf1 {
              service move { qosDelay { Value=1; } }
            } ...
          } // Processor1
          platinterface InterfComp1 {
            plathwhw HwHwInterf1 { Width=32;
              service read { qosDelay { Value=1; } }
            }
            platswsw SwSwInterf1 { Width=32;
              service read { qosDelay { Value=1; } }
            }
          } // InterfComp1
        } // HwComp1
      platcomponent Comp2 {
        platsofware SwComp1 {
          platAPI APIComp1 {
            method calSpeed { in1=0; in2=0; out=0;
              service calcspeed { qosDelay { Value=3; } }
            }
            method calAngle { in1=0; in2=0; out=0;
              service calcangle { qosDelay { Value=2; } } }
            method move { in=0; out=0;
              service move { qosDelay { Value=1; } } }
          } /* APIComp1 */
        } /* SwComp1 */ ... }
    } // TTA1
Listing 2. Platform Model for the case study
3) Code Generation: Our Implementation Manager generates the design artifacts for the final implementation determined by the System Designer. The code generation in the Implementation Manager is implemented using the Xpand language of openArchitectureWare.
By using Xpand templates, the Implementation Manager produces HDL descriptions for the application parts mapped to hardware and programs for the application parts mapped to software.
B. Design Space Exploration
1) Design Space Abstraction: Similarly to most DSE approaches, we explicitly define the design space as a mapping of graphs. However, differently from the usual approach, as presented in [38], which is a manual mapping between semantically defined graphs, our approach uses the categorical graph product, automatically generating the mapping between graphs. These graphs are free of any specific semantics from the view of the H-SPEX tool. In the following, we formalize the design space abstraction, which is represented in the DSE Metamodel presented in Subsection V-G.
Consider $G = \langle V, E, \partial_0, \partial_1 \rangle$ as a graph, where $V$ is the set of all vertices of $G$; $E$ is the set of all edges of $G$; $\partial_0 : E \to V$ is the source function of an edge; and $\partial_1 : E \to V$ is the target function of an edge. Let $S$ be the set of graphs, where $G_i = \langle V_i, E_i, \partial_{0,i}, \partial_{1,i} \rangle \in S$, $i \in \{1, \ldots, n\}$, and $n$ is the number of graphs in $S$. This set is formed of graphs, such as a task graph, an architectural graph, and the communication structure of buses, extracted from instances of our internal metamodels. The specific semantics of each graph is not considered during the generation of the design space, for the purpose of design space abstraction. The specific semantics of these graphs is considered in the exploration rules defined in a Mapping model. The design space is a graph $D$ resulting from the categorical graph product of the sequence of terms, which are all graphs in $S$. In this fashion, $D = G_i \times G_{i+1} \times \ldots \times G_n = (V_i \times V_{i+1} \times \ldots \times V_n,\ E_i \times E_{i+1} \times \ldots \times E_n,\ \partial_{0,i} \times \partial_{0,i+1} \times \ldots \times \partial_{0,n},\ \partial_{1,i} \times \partial_{1,i+1} \times \ldots \times \partial_{1,n})$ represents the graph product between $G_i, G_{i+1}, \ldots, G_n$, where $\{\partial_{k,i} \times \partial_{k,i+1} \times \ldots \times \partial_{k,n} \mid k \in \{0, 1\}\}$ are unambiguously induced by the dot product between vertices and edges, considering that any two vertices $(u_i, u_{i+1}, \ldots, u_n)$ and $(v_i, v_{i+1}, \ldots, v_n)$ are adjacent in $D$ if and only if $u_i$ is adjacent with $v_i$ in $G_i$, $u_{i+1}$ is adjacent with $v_{i+1}$ in $G_{i+1}$, ..., and $u_n$ is adjacent with $v_n$ in $G_n$, $i = 1 \ldots n-1$, where $n$ is the number of graphs in $S$.
Each product of the sequence $G_i \times G_{i+1} \times \ldots \times G_n$ that constitutes $D$ represents a design activity, such as task mapping, processor selection, processor allocation, voltage scaling selection, etc., such that vertices in $D$ are design decisions and edges in $D$ are design alternatives available at a specific vertex of $D$. The projection function $p_i = \langle p_{V_i}, p_{E_i} \rangle : G_i \times G_{i+1} \to G_i$ is defined and returns the graph $G_i$ involved in the product. Using this abstraction, a graph $G$ is a sub-graph of $D$ and represents a candidate design.
Using the categorical graph product as abstraction, DSE is performed for multiple design activities simultaneously, as
  • each product represents a design activity. Specific properties of this product, such as a restriction on the adjacencies, reduce the number of available alternatives, since navigation in the design space is performed through the edges. Moreover, this representation overcomes the interdependence between design activities, as one vertex in the design space represents multiple design decisions at the same time. This abstraction also exposes the communication (dependencies) between elements and is well suited to combining the communication in multiple hierarchies, such as classes, tasks, processors, and systems.

2) Design Space Exploration Rules: In our approach, exploration rules are model-to-model transformation rules, which follow the Mapping Metamodel and are specified using the Xtend language. These rules receive an instance of an unconstrained DesignSpace as input and generate a constrained DesignSpace instance as output. They are constraints that guide and prune the available design space, reduce the exploration time, and ensure the feasibility of a candidate solution. The user of our DSE method is expected to define some rules that apply to his/her specific DSEProblem. However, to reduce the user effort, a set of typical rules was implemented and is provided as a library. As an example, a rule named specificTaskMapping is illustrated in Listing 3 and is applied when a Composite diagram such as the one in Figure 7 is specified.

    DesignSpace specificTaskMapping(
        DesignDecision v1, DesignDecision v2, DesignSpace inDesignSpace,
        String task, String processor) :
      let t2 = v2.get("TASK") :
      let p2 = v2.get("PROCESSOR") :
        ((t2 == getTask(task)) && (p2 != getProcessor(processor))) ?
          inDesignSpace.removeEdge(v1, v2) : null
      -> this;

Listing 3. Sample of exploration rules

Other examples of implemented rules are:
    • Multiple Assignments of a Task: Avoids assigning the same task to different processors.
    • Lower / Upper Performance / Power / Memory / Communication Value: Defines the lower or upper values for performance, power, memory, or communication amount for a task.
    • Task Deadline Violation: Removes the candidate from the population if there is a deadline violation.
    • Specific Processor Selection: Defines the processor type that must be selected to implement the candidate design.
    • Specific Task Execution Frequency: Defines the frequency at which a processor must execute for a specific task.
    • Specific Task Mapping: Defines that a task must execute in a specific processor.

3) Design Space Exploration Tool: The DSE method presented in this work extends the H-SPEX tool [39] by implementing the design space abstraction method described in subsection VI-B1. We also implement two other optimization algorithms to improve the optimization step during candidate generation: Crowding Population-based Ant Colony Optimization for Multi-Objective (CPACO-MO) [40] and Random. H-SPEX is not limited to these algorithms, and we are planning to integrate this tool into an optimization framework to improve the optimization support with analysis and graphical features. The optimization is treated as a black-box transformation, which uses an API to communicate information between the transformation engine and the optimization algorithm. In order to evaluate candidate designs, we use an extended version of SPEU [35], a static analysis tool that provides a very fast evaluation step; this matters because evaluation is the bottleneck of the DSE process. However, any other evaluation tool could be used, since the evaluation tool and H-SPEX are integrated by assigning the costs for a DSESolution in the DSE model. The DSESolution is then obtained by means of model-to-model transformations or by using the API generated by the EMF tool from the DSE Metamodel.

C. Formal Verification Based on LTA

One of the important aspects in the design of embedded systems is to ensure that a given system really does what it is intended to do. Nowadays, with the growing complexity of embedded systems, an exhaustive test of all possible system executions, or of at least a set of representative ones, is impractical or even impossible. An alternative to testing is mathematically proving correctness, by specifying precise models of the embedded system and formally verifying logical properties over these models.

An LTA is an extension of the classic finite-state automata concept [6] and captures the behavior of a system by means of states and transitions between states, where timing constraints can be associated with the transitions.

In our approach, by means of model transformations, we generate an LTA from each InteractionGraph, and a set of InteractionGraphs produces a network of intercommunicating LTAs. By using model-to-code transformations, we generate a textual representation for a network of LTAs, which is submitted to the UPPAAL model checking tool. UPPAAL is able to check invariant properties, for example whether a given formula is valid at all reachable states of the LTAs, and reachability properties, that is, whether given states are reachable or not during the execution of the network of LTAs. The generated network of LTAs can also be simulated by UPPAAL, allowing one to visualize specific sequences of state transitions of the specified system and thus to debug possible specification errors.

1) Generating LTA from UML: From the InteractionGraph in Figure 16, we obtain a network of timed automata, where we have an ltaProcess PWheelchair for the entire application and an ltaProcess for each sequence diagram. By using the Xpand language of the openArchitectureWare framework, we implemented model-to-code transformations that generate, from the LTA model, the textual input for the UPPAAL model checker.
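The two kinds of properties named in Section VI-C, invariant and reachability, can be illustrated on an untimed toy model. The sketch below is not UPPAAL and does not model clocks; it is a minimal explicit-state search in plain Java, and the class name ReachabilitySketch, its method names, and the state names are all hypothetical. It treats a deadlock simply as a reachable state with no outgoing transition, which is a simplification of the timed notion.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical illustration only: an untimed transition system given as
// a map from each state to the list of its successor states.
final class ReachabilitySketch {

    /** Returns every state reachable from the initial state (breadth-first search). */
    static Set<String> reachable(Map<String, List<String>> transitions, String initial) {
        Set<String> seen = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        seen.add(initial);
        queue.add(initial);
        while (!queue.isEmpty()) {
            String state = queue.remove();
            for (String next : transitions.getOrDefault(state, List.of())) {
                if (seen.add(next)) {
                    queue.add(next);
                }
            }
        }
        return seen;
    }

    /** Reachability, in the spirit of E<> p: some reachable state satisfies p. */
    static boolean existsEventually(Map<String, List<String>> t, String init, Predicate<String> p) {
        return reachable(t, init).stream().anyMatch(p);
    }

    /** Invariant, in the spirit of A[] p: every reachable state satisfies p. */
    static boolean alwaysGlobally(Map<String, List<String>> t, String init, Predicate<String> p) {
        return reachable(t, init).stream().allMatch(p);
    }

    /** Untimed analogue of A[] not deadlock: every reachable state has a successor. */
    static boolean deadlockFree(Map<String, List<String>> t, String init) {
        return alwaysGlobally(t, init, s -> !t.getOrDefault(s, List.of()).isEmpty());
    }

    public static void main(String[] args) {
        Map<String, List<String>> loop = Map.of(
                "start", List.of("running"),
                "running", List.of("start"));
        System.out.println(existsEventually(loop, "start", s -> s.equals("running"))); // true
        System.out.println(deadlockFree(loop, "start"));                               // true

        // "end" has no outgoing transition, so this model deadlocks.
        Map<String, List<String>> stuck = Map.of("start", List.of("end"));
        System.out.println(deadlockFree(stuck, "start"));                              // false
    }
}
```

A real model checker works on the product of all automata in the network and on clock zones rather than on a hand-written state map, so this only conveys the shape of the two queries.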
  • 2) Verifying Properties: At this point, the designer can specify logical properties using CTL formulae and use UPPAAL to verify them. As examples, we may specify properties to check whether the application model is deadlock-free (using the UPPAAL macro A[] not deadlock) and whether eventually all processes corresponding to the sequence diagrams will be executed in parallel (using the CTL formula E<> startsd1 and startsd2 and startsd3).

VII. CASE STUDY

To illustrate our approach, this section presents a development scenario for a real-time embedded system dedicated to the automation and control of an intelligent wheelchair that helps people with special needs. This wheelchair has several functions, such as movement control, collision avoidance, navigation, target pursuit, battery control, system supervision, task scheduling, and automatic movement.

Our flow starts by modeling the wheelchair system as prescribed in Section V-A. The UML model describes the wheelchair movement control, collision avoidance, and navigation Use Cases, which are essential to the system and incorporate critical hard real-time constraints. It also consists of a Class model, 18 interaction diagrams, and one composite diagram. Some of the models were presented in Section V-A.

The UML model is used as input to our design flow. The Application Manager transforms the UML model into our IM. No user-defined Mapping is required, so we use only rules from our exploration rule library, described in Section VI-B2.

The platform library provides software and hardware components to be reused during the implementation of an application. The components include mathematical functions to solve control equations, algorithms for image filtering, a real-time communication API, and RTOS components. The library also provides different architectures of a Java microcontroller, communication buses, and hardware implementations of algorithms. All components are previously characterized. Software components are simulated on the different microcontroller microarchitectures in order to define their QoS. This platform was previously defined using the Eclipse Editor generated for our Platform DSL, generating an instance of IPMM.

The System Designer coordinates the design automation tools, invoking the H-SPEX tool to perform DSE. The model for the selected candidate is used for the transformation, which generates the LTA Model as input for the UPPAAL tool for formal verification. After verification, the verified Implementation Model is ready to be synthesized by the Implementation Manager. Examples for DSE, formal verification, and synthesis are provided in the next subsections.

A. Design Space Exploration

In the automatic DSE process performed in this scenario, H-SPEX was configured to perform the following design tasks:
    • definition of which objects are active or passive (runnables), among the 17 behaviors defined in the Interaction Graphs;
    • mapping of the active objects to selected processors (up to 6 processors);
    • allocation of the selected processors into a hierarchical bus with two segments;
    • processor voltage scaling with four distinct voltage levels.

Exploring all these activities simultaneously, H-SPEX was configured to optimize the system in terms of performance (cycles), power (Watt), energy (Joules), total memory (bytes), and communication volume (bytes transmitted on the bus). The candidate population was found after 5,000 evaluations and represents the non-dominated set of candidate designs. Figure 17 illustrates these results. The best overall candidate must be selected after a trade-off analysis between the obtained estimations, based on criteria such as weights for the optimized objectives or any other design feature.

Fig. 17. Normalized DSE results with five objectives: performance (+), power (.), total memory (x), energy (*), and communication (o).

The design space in this case study contains 2,064 alternative design decisions (vertices) and 334,080 edges, from which a set containing up to 17 vertices (maximum active task distribution) must be selected to define a candidate design solution (subgraph). The unveiled design space presents more than 5.89 × 10^41 alternative designs, considering an unrestricted design space (fully connected graph). However, in this proposal, edges guide the available alternatives, and constraints, specified as model-to-model transformation rules, are locally applied between the current vertex and its neighbors, thus pruning the design space and speeding up the DSE process.

Consider a task drawn from the wheelchair case study, identified as T15, which implements a stereovision function (in Figure 18, T15 corresponds to the Correlation-based + Median Filters vertex) with heavy image processing algorithms. Figure 7 shows a composite diagram specifying that H-SPEX must map Task 15 onto the DSP processor P0, benefiting from the DSP processor architecture. The resulting exploration rule from the Mapping model is presented in Listing 3.
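Two points from this subsection can be sketched in plain Java. The first is the size of the unrestricted design space: assuming the quoted figure counts the ways to choose 17 of the 2,064 vertices (our interpretation, not something the text states), it is the binomial coefficient C(2064, 17). The second is the local pruning performed by a rule such as the one in Listing 3, here applied to T15 and P0. The class DesignSpaceSketch, its Edge type, and the map-based representation of a design decision are hypothetical stand-ins, not the H-SPEX data model.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; names and data layout are assumptions.
final class DesignSpaceSketch {

    /** Exact binomial coefficient C(n, k), computed with the multiplicative formula. */
    static BigInteger binomial(int n, int k) {
        BigInteger result = BigInteger.ONE;
        for (int i = 1; i <= k; i++) {
            // After step i, result equals C(n - k + i, i), so each division is exact.
            result = result.multiply(BigInteger.valueOf(n - k + i))
                           .divide(BigInteger.valueOf(i));
        }
        return result;
    }

    /** Directed edge between two design decisions; a decision maps an activity name to a choice. */
    static final class Edge {
        final Map<String, String> from;
        final Map<String, String> to;

        Edge(Map<String, String> from, Map<String, String> to) {
            this.from = from;
            this.to = to;
        }
    }

    /** Drops every edge whose target places the given task on a processor other than the given one. */
    static List<Edge> specificTaskMapping(List<Edge> edges, String task, String processor) {
        List<Edge> kept = new ArrayList<>();
        for (Edge e : edges) {
            boolean sameTask = task.equals(e.to.get("TASK"));
            boolean otherProcessor = !processor.equals(e.to.get("PROCESSOR"));
            if (!(sameTask && otherProcessor)) {
                kept.add(e);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Ways to pick 17 of the 2,064 vertices, ignoring edges and constraints.
        System.out.println(binomial(2064, 17));

        Map<String, String> current = Map.of("TASK", "T13", "PROCESSOR", "P1");
        Map<String, String> onDsp = Map.of("TASK", "T15", "PROCESSOR", "P0");
        Map<String, String> offDsp = Map.of("TASK", "T15", "PROCESSOR", "P1");
        List<Edge> edges = List.of(new Edge(current, onDsp), new Edge(current, offDsp));
        // Only the edge leading to T15 on P0 survives the rule.
        System.out.println(specificTaskMapping(edges, "T15", "P0").size()); // prints 1
    }
}
```

Under the stated assumption, the first printed value is a 42-digit number beginning with 589, which is in line with the figure of more than 5.89 × 10^41 quoted above. The pruning half mirrors the condition of Listing 3: an edge is removed only when the neighbouring vertex concerns the named task and selects a different processor.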
  • Fig. 18. Task dependency graph for the wheelchair system.

Fig. 19. Effect of constraints: Sample of a partial design space graph

Consider a vertex of the design space graph given by the tuple ⟨T13, P1, C1, V2⟩, which specifies that task T13 must be mapped to processor P1, while P1 must be allocated to communication bus C1 and execute T13 with voltage level V2. There are 48 alternatives at this vertex. Figure 19 illustrates a partial graph representing the design space at this vertex, which is located at the center. The shaded vertices around the vertex ⟨T13, P1, C1, V2⟩ are pruned nodes, and the white nodes are the alternative designs that satisfy all constraints.

By applying the structural constraints and the sample design constraint defined here, the pruning process reduced the design space by 83% at this specific vertex. This avoids wasting time on unnecessary evaluations and unfeasible designs, and focuses the search for an adequate solution on the most relevant design points.

B. Functional Verification

Once a candidate design has been selected by the DSE process, the System Designer performs the transformation from the InteractionGraphs in the Application Internal model into the LTA model, according to the partition defined in the Implementation model. As an example, consider the Sequence diagrams presented in Figure 6. From them we obtain the network of timed automata shown in Figure 20.

For the sequence diagram SD1, we have: an LTAProcess PSD1 with 6 LTALocations, corresponding to the IGNodes labeled Start-IG-SD1, cn-m1, cn-m2, rn-m1, rn-m2, and end-IG-SD1; and 5 LTATransitions, corresponding to the IGEdges labeled e1, e2, e3, e4, and e5. We also have an LTAProcess PWheelchair for the entire application. Thus, the diagram for the LTA model is very similar to the one for the InteractionGraphs model presented in Figure 16. By using the Xpand language of the openArchitectureWare framework, we implemented model-to-text transformations that generate, from the LTA model, the textual input for the UPPAAL model checker.

At this point, the designer can specify logical properties using CTL formulae and use UPPAAL to verify them. We have specified properties to check: whether the application model is deadlock-free (using the UPPAAL macro A[] not deadlock); and whether eventually all processes corresponding to the sequence diagrams will be executed in parallel (using the CTL formula E<> startsd1 and startsd2 and startsd3).

C. Code Generation and Synthesis

In our approach, the code generation strategy is based on templates. The generation tool uses the EMF API to obtain information from the Implementation Model and to complete these templates, which are specified using the Xpand language from openArchitectureWare. The code generator uses different templates, according to the specified resource in the Implementation Model. In this way, communicating tasks allocated to different processors imply the generation of specific communication directives and/or interconnection components. Likewise, the allocation of several active tasks to the same processor implies the generation of scheduler services, as well as of real-time directives on each active task to specify its activation pattern.

Listing 4 shows part of the software source code that our tool generates for the MovementController class, which includes objects responsible for controlling the wheelchair movement.

The software source code contains two important methods for a RealtimeThread subclass: mainTask() (lines 18-23) and exceptionTask() (lines 24-26). The former represents the task body, i.e. the code executed when the task is activated. This is a periodic task, for which the periodic activation is implemented as a loop whose execution frequency is controlled by calling the waitForNextPeriod() method. This method uses the task release parameters to interact with the scheduler and to control the correct execution of the method. The exceptionTask() method represents the exception handling code that is triggered if the mainTask() method does not finish by the established deadline. We use the Java API for real-time specification described in [41].

Besides the software source code generation, the design flow is also automated by a set of generated scripts, which configure and execute compilers, synthesis tools, and simulators for the generated and assembled components of the Implementation Model. Thus, to perform the entire design flow, a designer can execute a script, such that all design process phases, including
  • automated exploration, compilation, synthesis, simulation, and deployment, will be performed.

Fig. 20. Network of LTA in UPPAAL for InteractionGraphs in the IAM.

    01 public class MovementController extends RealtimeThread
    02 {
    03   private static PriorityParameters
    04     schedParams = new PriorityParameters(
    05       PriorityScheduler.getMaxPriority() - 3);
    06   private static PeriodicParameters
    07     releaseParams = new PeriodicParameters(
    08       null,                  // start time
    09       null,                  // end time
    10       TimeObjects._10_ms,    // period
    11       TimeObjects._4_200_ms, // cost
    12       TimeObjects._10_ms);   // deadline
    13   public static MovementActuator
    14     movementActuator = new MovementActuator();
    15   private int mLastValidSpeedValue = 0;
    16   private int mLastValidAngleValue = 0;
    17
    18   public void mainTask() {
    19     while (isRunning()) {
    20       // ... movement control code
    21       waitForNextPeriod();
    22     }
    23   }
    24   public void exceptionTask() {
    25     // deadline miss handling code
    26   }
    27   // code continue ...
    28 };

Listing 4. Generated source code for MovementController class

VIII. FINAL REMARKS

The pressure to reduce the time-to-market and the ever-growing design difficulties require new research efforts to adopt languages with a high abstraction level and/or new approaches to cope with them. Model Driven Engineering (MDE) is the current bet to raise the design abstraction level and to provide mechanisms to improve the portability, interoperability, maintainability, and reusability of models. In addition, MDE helps to abstract platform complexity and to represent different concerns of the system.

This paper presented an introduction to MDE applied to the development of complex embedded systems. A brief history of MDE was presented and the main concepts, namely models, meta-models, and transformations between models, were introduced.

The technological framework supporting MDE relies on standards. Three of them were identified and described, namely MDA, MIC, and Software Factories. Moreover, the most adopted languages, tools, and technologies for MDE were presented.

A short survey on domain specific engineering tools for embedded systems was presented. The paper also described in detail the MDE framework for embedded systems under development at UFRGS.

In that approach, named MoDES, the fundamental MDE notion of transformation between models is used to generate, from a UML model of an application consisting of class and sequence diagrams, an internal representation model to be used by formal verification and co-synthesis tools. The obtained model captures structural aspects of an application model by using a hierarchy of modules and processes, as well as behavioral aspects by means of a CDFG model.

A new design space abstraction based on the categorical graph product was also proposed. This abstraction overcomes the challenge of dealing with interdependencies between design activities and provides a flexible representation for multiple design activities. A DSE metamodel was defined, so that the design space can be easily handled by MDE transformation engines using their transformation rules. These rules are used to implement design constraints that prune the design space and generate the candidate design, thus improving DSE results. Moreover, UML/MARTE models are used to generate the additional transformation rules, which remove unfeasible designs from the design space, thus saving time that would be spent on unnecessary evaluations.

Observing the history of MDE for embedded systems, it is possible to identify some trends. The first application of MDE for embedded systems consisted in using model-to-model transformation to integrate tools by transforming the output of one tool into the input of another, usually integrating existing co-design tools. Afterwards, domain specific metamodels were proposed to capture the heterogeneous nature of embedded systems, and syntactic transformations were used to generate systems from these metamodels, as in the Gaspard framework. The next steps are the development of smart generators that use transformations based on the semantics of elements, such as GenERTiCA. Additional improvements can be achieved by including domain expertise in model-to-model transformations, such as the design space exploration methodology using MDE concepts implemented by H-SPEX.

ACKNOWLEDGMENT

This work was partially supported by CNPq.

REFERENCES

[1] P. Marwedel, Embedded System Design, 1st ed. Boston, USA: Kluwer Academic Publishers, Oct. 2003. [Online]. Available: http://www.worldcat.org/isbn/1402076908
[2] B. Selic, “The pragmatics of model-driven development,” Software, IEEE, vol. 20, no. 5, pp. 19–25, 2003. [Online]. Available: http://dx.doi.org/10.1109/MS.2003.1231146
[3] D. C. Schmidt, “Guest Editor’s Introduction: Model-Driven Engineering,” Computer, vol. 39, no. 2, pp. 25–31, Feb. 2006. [Online]. Available: http://dx.doi.org/10.1109/MC.2006.58
[4] S. Kent, “Model Driven Engineering,” in Proceedings of the Third International Conference on Integrated Formal Methods, ser. IFM ’02. London, UK: Springer-Verlag, 2002, pp. 286–298. [Online]. Available: http://portal.acm.org/citation.cfm?id=743552
[5] OMG, “Meta Object Facility (MOF) Core Specification Version 2.4,” OMG, Tech. Rep. December, 2010.
[6] R. Alur and D. L. Dill, “A theory of timed automata,” Theoretical Computer Science, vol. 126, no. 2, pp. 183–235, Apr. 1994. [Online]. Available: http://dx.doi.org/10.1016/0304-3975(94)90010-8
[7] K. G. Larsen, P. Pettersson, and W. Yi, “Uppaal in a nutshell,” International Journal on Software Tools for Technology Transfer (STTT), vol. 1, no. 1-2, pp. 134–152, Dec. 1997. [Online]. Available: http://dx.doi.org/10.1007/s100090050010
[8] S. Edwards, L. Lavagno, E. A. Lee, and A. Sangiovanni-Vincentelli, “Design of embedded systems: formal models, validation, and synthesis,” Proceedings of the IEEE, vol. 85, no. 3, pp. 366–390, Mar. 1997. [Online]. Available: http://dx.doi.org/10.1109/5.558710
[9] P. M. Weichsel, “The Kronecker Product of Graphs,” Proceedings of the American Mathematical Society, vol. 13, no. 1, 1962. [Online]. Available: http://dx.doi.org/10.2307/2033769
[10] J. Bézivin, “On the unification power of models,” Software and Systems Modelling, vol. 4, no. 2, pp. 171–188, May 2005.
[11] J. Rothenberg, “The nature of modeling,” in AI, Simulation, and Modeling. John Wiley and Sons, 1989, pp. 75–92. [Online]. Available: http://www.rand.org/pubs/notes/2007/N3027.pdf
[12] D. Gasevic, D. Djuric, and V. Devedzic, Model Driven Engineering and Ontology Development. Berlin, Heidelberg: Springer Berlin Heidelberg, 2009. [Online]. Available: http://dx.doi.org/10.1007/978-3-642-00282-3
[13] K. Czarnecki and S. Helsen, “Feature-based survey of model transformation approaches,” IBM Syst. J., vol. 45, no. 3, pp. 621–645, Jul. 2006. [Online]. Available: http://dx.doi.org/10.1147/sj.453.0621
[14] R. France and B. Rumpe, “Model-driven Development of Complex Software: A Research Roadmap,” in 2007 Future of Software Engineering, ser. FOSE ’07. Washington, DC, USA: IEEE Computer Society, May 2007, pp. 37–54. [Online]. Available: http://dx.doi.org/10.1109/FOSE.2007.14
[15] J. Sztipanovits and G. Karsai, “Model-integrated computing,” Computer, vol. 30, no. 4, pp. 110–111, Apr. 1997. [Online]. Available: http://dx.doi.org/10.1109/2.585163
[16] J. Greenfield, “Software factories: assembling applications with patterns, models, frameworks and tools,” pp. 16–27, 2004. [Online]. Available: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.129.5929
[17] F. Jouault and J. Bézivin, “KM3: a DSL for metamodel specification,” in Proceedings of the 8th IFIP WG 6.1 international conference on Formal Methods for Open Object-Based Distributed Systems, ser. FMOODS’06, vol. 4037. Berlin, Heidelberg: Springer-Verlag, 2006, pp. 171–185. [Online]. Available: http://dx.doi.org/10.1007/11768869_14
[18] M. F. van Amstel, C. F. J. Lange, and M. G. J. van den Brand, “Metrics for Analyzing the Quality of Model Transformations,” 2008.
[19] S. J. Mellor and M. Balcer, Executable UML: A Foundation for Model-Driven Architectures. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 2002. [Online]. Available: http://portal.acm.org/citation.cfm?id=545976
[20] T. Schattkowsky, W. Mueller, and A. Rettberg, “A Generic Model Execution Platform for the Design of Hardware and Software,” in UML for SOC Design, G. Martin and W. Müller, Eds. Springer US, 2005, pp. 63–88. [Online]. Available: http://dx.doi.org/10.1007/0-387-25745-4_4
[21] M. A. Wehrmeister, J. G. Packer, and L. M. Ceron, “Support for early verification of embedded real-time systems through UML models simulation,” SIGOPS Oper. Syst. Rev., vol. 46, no. 1, pp. 73–81, Feb. 2012. [Online]. Available: http://dx.doi.org/10.1145/2146382.2146396
[22] P. Boulet, J. L. Dekeyser, C. Dumoulin, and P. Marquet, “MDA for SoC Design, Intensive Signal Processing Experiment,” in FDL. ECSI, 2003, pp. 309–317. [Online]. Available: http://dblp.uni-trier.de/rec/bibtex/conf/fdl/BouletDDM03
[23] L. Bondé, C. Dumoulin, and J.-L. Dekeyser, “Metamodels and MDA Transformations for Embedded Systems,” in Advances in Design and Specification Languages for SoCs, P. Boulet, Ed. Boston: Springer US, 2005, ch. 8, pp. 89–105. [Online]. Available: http://dx.doi.org/10.1007/0-387-26151-6_8
[24] E. Piel, R. B. Atitallah, P. Marquet, S. Meftali, S. Niar, A. Etien, J. L. Dekeyser, and P. Boulet, “Gaspard2: from marte to systemc simulation,” in DATE’08 Workshop on Modeling and Analysis of Real-Time and Embedded Systems with the MARTE UML profile, 2008.
[25] M. A. Wehrmeister, E. P. Freitas, C. E. Pereira, and F. Rammig, “GenERTiCA: A Tool for Code Generation and Aspects Weaving,” in Proceedings of the 2008 11th IEEE Symposium on Object Oriented Real-Time Distributed Computing, ser. ISORC ’08. Washington, DC, USA: IEEE Computer Society, 2008, pp. 234–238. [Online]. Available: http://dx.doi.org/10.1109/ISORC.2008.67
[26] F. Balarin, Y. Watanabe, H. Hsieh, L. Lavagno, C. Passerone, and A. Sangiovanni-Vincentelli, “Metropolis: an integrated electronic system design environment,” Computer, vol. 36, no. 4, pp. 45–52, Apr. 2003. [Online]. Available: http://dx.doi.org/10.1109/MC.2003.1193228
[27] A. Sangiovanni-Vincentelli and G. Martin, “Platform-based design and software design methodology for embedded systems,” IEEE Design & Test of Computers, vol. 18, no. 6, pp. 23–33, 2001.
[28] R. Chen, M. Sgroi, L. Lavagno, G. Martin, A. S. Vincentelli, and J. Rabaey, UML and platform-based design. Norwell, MA, USA: Kluwer Academic Publishers, 2003, pp. 107–126. [Online]. Available: http://portal.acm.org/citation.cfm?id=886350
[29] T. Kangas, P. Kukkala, H. Orsila, E. Salminen, M. Hännikäinen, T. D. Hämäläinen, J. Riihimäki, and K. Kuusilinna, “UML-based multiprocessor SoC design framework,” ACM Trans. Embed. Comput. Syst., vol. 5, no. 2, pp. 281–320, May 2006. [Online]. Available: http://dx.doi.org/10.1145/1151074.1151077
[30] A. Bakshi, V. K. Prasanna, and A. Ledeczi, “MILAN: A Model Based Integrated Simulation Framework for Design of Embedded Systems,” in Proceedings of the ACM SIGPLAN workshop on Languages, compilers and tools for embedded systems, ser. LCTES ’01. New York, NY, USA: ACM, 2001, pp. 82–93. [Online]. Available: http://dx.doi.org/10.1145/384197.384210
[31] S. Neema, J. Sztipanovits, G. Karsai, and K. Butts, “Constraint-Based Design-Space Exploration and Model Synthesis,” in Embedded Software, ser. Lecture Notes in Computer Science, R. Alur and I. Lee, Eds. Berlin, Heidelberg: Springer Berlin / Heidelberg, 2003, vol. 2855, ch. 19, pp. 290–305. [Online]. Available: http://dx.doi.org/10.1007/978-3-540-45212-6_19
[32] S. Mohanty and V. K. Prasanna, “Rapid system-level performance evaluation and optimization for application mapping onto SoC architectures,” 2002, pp. 160–167. [Online]. Available: http://dx.doi.org/10.1109/ASIC.2002.1158049
[33] F. A. M. Nascimento, M. F. S. Oliveira, and F. R. Wagner, “ModES: Embedded Systems Design Methodology and Tools based on MDE,” in Model-Based Methodologies for Pervasive and Embedded Software, 2007. MOMPES ’07. Fourth International Workshop on, Mar. 2007, pp. 67–76. [Online]. Available: http://dx.doi.org/10.1109/MOMPES.2007.14
[34] M. F. S. Oliveira, E. W. Brião, F. A. M. Nascimento, and F. R. Wagner, “Model driven engineering for MPSOC design space exploration,” in Proceedings of the 20th annual conference on Integrated circuits and systems design, ser. SBCCI ’07. New York, NY, USA: ACM, 2007, pp. 81–86. [Online]. Available: http://dx.doi.org/10.1145/1284480.1284509
[35] M. F. S. Oliveira, L. B. de Brisolara, L. Carro, and F. R. Wagner, “Early Embedded Software Design Space Exploration Using UML-Based Estimation,” in Proceedings of the Seventeenth IEEE International Workshop on Rapid System Prototyping. Washington, DC, USA: IEEE Computer Society, 2006, pp. 24–32. [Online]. Available: http://portal.acm.org/citation.cfm?id=1136925
[36] R. S. Scowen, “Extended BNF - A Generic Base Standard,” in Proceedings of the 1993 Software Engineering Standards Symposium (SESS’93), Aug. 1993. [Online]. Available: http://www.cl.cam.ac.uk/~mgk25/iso-14977-paper.pdf
[37] M. F. S. Oliveira, F. A. M. Nascimento, W. Mueller, and F. R. Wagner, “Design space abstraction and metamodeling for embedded systems design space exploration,” in Proceedings of the 7th International Workshop on Model-Based Methodologies for Pervasive and Embedded Software, ser. MOMPES ’10. New York, NY, USA: ACM, 2010, pp. 29–36. [Online]. Available: http://dx.doi.org/10.1145/1865875.1865880
[38] T. Blickle, J. Teich, and L. Thiele, “System-Level Synthesis Using Evolutionary Algorithms,” Design Automation for Embedded Systems, vol. 3, no. 1, pp. 23–58, Jan. 1998. [Online]. Available: http://dx.doi.org/10.1023/A:1008899229802
[39] M. F. S. Oliveira, E. W. Brião, F. A. Nascimento, and F. R. Wagner, “Model driven engineering for MPSOC design space exploration,” Journal of Integrated Circuits and Systems, vol. 3, no. 1, pp. 13–22, 2008.
[40] D. Angus, “Crowding Population-based Ant Colony Optimisation for the Multi-objective Travelling Salesman Problem,” in IEEE Symposium on Computational Intelligence in Multi-Criteria Decision-Making, Apr. 2007, pp. 333–340. [Online]. Available: http://dx.doi.org/10.1109/MCDM.2007.369110
[41] M. A. Wehrmeister, L. B. Becker, F. R. Wagner, and C. E. Pereira, “An Object-Oriented Platform-based Design Process for Embedded Real-Time Systems,” in Proceedings of the Eighth IEEE International Symposium on Object-Oriented Real-Time Distributed Computing, ser. ISORC ’05. Washington, DC, USA: IEEE Computer Society, 2005, pp. 125–128. [Online]. Available: http://dx.doi.org/10.1109/ISORC.2005.13
  • ¸˜ Introducao ao desenvolvimento de software embarcado Alexandra da Costa Pinto de Aguiar Felipe G. de Magalh˜ es a S´ rgio J. Filho e PUCRS - PPGCC PUCRS - PPGCC PUCRS - PPGCC Email:alexandra.aguiar@pucrs.br Email: felipe.magalhaes@acad.pucrs.br Email:sergio.johann@acad.pucrs.br Fabiano Hessel Oliver Longhi PUCRS - PPGCC PUCRS - PPGCCEmail:fabiano.hessel@pucrs.br Email:oliver.longhi@acad.pucrs.br Resumo—Sistemas embarcados est˜ o presentes na vida da a ¸˜ Sistemas onde existem tais restricoes de tempo s˜ o chamados a e ´maioria das pessoas e a tendˆ ncia e que cada vez mais esses de Sistemas de Tempo Real (do inglˆ s, Real-time Systems, edispositivos sejam essenciais para o nosso dia-a-dia. Ao longo RTS). Em seu contexto, espera-se que uma resposta ou odos anos, a alta convergˆ ncia dos sistemas resultou em um con- estante acr´ scimo de funcionalidades nos dispositivos embarcados, e ¸˜ resultado de uma operacao seja fornecido em um tempo pr´ - e ´ ¸˜principalmente os da industria de comunicacao e entretenimento. ¸˜ ´ determinado e previs´vel, ou seja, uma operacao e considerada ıNesse contexto, o desenvolvimento de sistemas embarcados, onde correta somente se, al´ m de seu resultado l´ gico esperado, ela e o ¸˜o software era projetado especificamente para uma aplicacao tem for finalizada at´ o tempo previsto. edado lugar ao desenvolvimento baseado em plataformas, onde ´ Por conseguinte, e comum que RTSs utilizem um Sistemao software atrelado a um sistema operacional vem ganhandodestaque. O principal problema em se aumentar a relevˆ ncia do a Operacional (do inglˆ s, Operating System, OS) espec´fico, e ısoftware reside no atendimento aos requisitos t´picos dos sistemas ı denominado de Sistema Operacional de Tempo Real (doembarcados, que mesmo com um acr´ scimo nas funcionalidades e inglˆ s, Real-Time Operating System - RTOS) respons´ vel e a ¸˜ainda possuem limitacoes quanto ao tamanho do c´ digo, consumo o por viabilizar seu gerenciamento. 
Al´ m de prover funcional- e e ¸˜de energia, al´ m de restricoes temporais para determinadas idades comuns a OSs, como temporizadores e gerencia- ¸˜aplicacoes. Assim, ferramentas de aux´lio ao desenvolvimento ıde software embarcado tˆ m sido cada vez mais objeto de e ¸˜ mento de interrupcoes e tarefas, uma das principais funcoes ¸˜estudo. Este minicurso apresenta conceitos introdut´ rios sobre o ´ ¸˜ dos RTOSs e garantir que a execucao de uma determinadao desenvolvimento de software embarcado al´ m de fundamentos, e ¸˜ tarefa ocorra conforme a restricao temporal atribu´da a ela ı ¸˜definicoes e, em um car´ ter te´ rico-pr´ tico, exemplos pr´ ticos a o a a [1]. Assim, o escalonador de tarefas de um RTOS n˜ o e a ´de desenvolvimento embarcado utilizando a plataforma Hellfire, respons´ vel apenas por gerenciar a ordem de execucao, mas a ¸˜ ´seguido pelos desafios e oportunidades na area. ¸˜ tamb´ m por assegurar o cumprimento das restricoes temporais e ¸˜ I. I NTRODUC AO de um determinado conjunto de tarefas. Esse mecanismo e ´ implementado com base em uma das diversas pol´ticas de ı ´ Nos ultimos anos, os sistemas embarcados tˆ m aumentado e escalonamento existentes e que se adapte melhor ao tipo desua importˆ ncia na vida das pessoas. Esses sistemas podem a ¸˜ aplicacao alvo. ´ser encontrados nas mais diversas areas, em equipamen- ¸˜ A complexidade computacional e as restricoes de desem-tos m´ dicos, na ind´ stria automotiva e em dispositivos de e u ¸˜ penho em aplicacoes embarcadas tˆ m aumentado nos ultimos e ´ ¸˜entretenimento. Ainda, entre as aplicacoes tradicionalmente ` anos o que traz a tona a consequente necessidade por capaci-conhecidas como sendo embarcadas podem ser citadas as dade de processamento cada vez maior. Ainda, o tempo que ¸˜ ¸˜espaciais, os sistemas de navegacao, de aviacao e rob´ tica. 
o e e ´ esses dispositivos tˆ m at´ chegar ao mercado e cada vez mais u ¸˜ In´ meras vezes, para que aplicacoes embarcadas possam reduzido, diminuindo o chamado time-to-market. ´ser utilizadas em diferentes setores e necess´ rio que estejam a Como principal consequˆ ncia e desse aumento ¸˜adaptadas a uma s´ rie de restricoes impostas pelo ambiente e ¸˜ de exigˆ ncias e restricoes, tem-se a crescente e ¸˜ ¸˜ou pela situacao. Dentre as principais limitacoes destacam-se complexidade do projeto de sistemas embarcados,tamanho final do dispositivo, consumo limitado de energia, principalmente considerando as metodologias existentestempo de funcionamento irrestrito, controle da geracao de¸˜ [2]. Assim, o uso de plataformas altamente configur´ veis, acalor, imunidade a interferˆ ncias e impactos, entre outros. e ¸˜ al´ m do aumento no n´vel de abstracao das especificacoes, e ı ¸˜ ¸˜ Al´ m de restricoes que podem ser consideradas f´sicas, e ı s˜ o propostas para lidar com esses desafios durante o a ¸˜muitas aplicacoes embarcadas possuem uma caracter´stica em ı desenvolvimento de sistemas complexos [3].comum: uma falha temporal em qualquer ponto do sistema ¸˜ Ainda, al´ m de poder apresentar restricoes temporais, e e ´pode causar sua inutilidade (total ou parcial), grandes perdas desej´ vel, por quest˜ es de custo e desempenho, que sistemas a ofinanceiras ou at´ mesmo desastres de grandes proporcoes. e ¸˜ embarcados sejam implementados atrav´ s da integracao de e ¸˜
all the components required for their execution on a single chip, forming a system commonly called a System-on-Chip (SoC). An SoC allows the use of heterogeneous components, such as CPUs, memories, and buses, among others. In addition, a single SoC may contain more than one processing element (PE). SoCs that employ several processing elements on a single chip are called multiprocessor SoCs or MPSoCs (Multiprocessor System-on-Chip).

In this context, it is important to note that some of the more generic characteristics of general-purpose multiprocessor systems can also be observed in MPSoCs, including several challenges previously seen only in those systems. Notable among them are the difficulty of parallel programming, the need for load balancing, the better use of the processing units, and efficient communication mechanisms.

A. Motivation

For many years single-processor systems were used on a large scale, both in industry and in academia. However, while they offered more and more functionality and higher performance, thanks to steady improvements in integrated circuit fabrication and implementation processes, their energy consumption grew proportionally. As a result, a new approach has been adopted in recent years: building systems that use multiple integrated processing elements operating at a lower frequency.

Multiprocessor embedded systems (Multiprocessor System-on-Chip, MPSoC) are therefore present in a large share of the applications that were traditionally handled by single-processor systems. The use of MPSoCs keeps growing, both in academic environments and in industry, and their widespread adoption has become a reality over the last few years [4].

Implementing multiple processing elements that operate at a lower frequency on a single chip helps to solve several problems, such as energy consumption, performance, and parallelism, while introducing new challenges due to the increased architectural complexity. The main challenges include programmability, management, and the optimization and adaptation of such systems to applications with dynamic and real-time characteristics.

One way of dealing with these factors is to use models and tools that allow the environment to be simulated at higher levels of abstraction. In this way, more alternatives can be evaluated at design time, and the effort needed to complete the system can be reduced.

Moreover, MPSoCs are typically built from a few processing elements of medium computing power, where the application is defined and has its tasks placed at design time. Current applications, however, tend to be highly complex and to present a variable load on the processing elements over the lifetime of the system [5]. In addition, their real-time requirements, specified at design time, must be met.

Finally, all of these issues motivate the research and study of new platforms that support embedded software development, which is what this work does.

B. Text Organization

Regarding the organization of the text, the chapter is divided into two main parts: the theoretical background and the case study of the Hellfire platform.

The theoretical background details the items needed to understand an embedded platform as a whole and the constraints usually imposed on these systems. Topics such as the definition and classification of embedded systems are discussed at length, with special emphasis on the main component of embedded software: the Real-Time Operating System. Next, the main factors that distinguish real-time systems from general-purpose and best-effort systems are shown. Task scheduling and the algorithms employed by the system scheduler are also detailed.

The case study, in turn, presents the Hellfire platform in more depth, including its components and practical examples of use, followed by the final remarks of this work. The Hellfire platform includes an ISS (Instruction Set Simulator) that allows the simulation of up to 256 processors. A web-oriented framework was developed to ease the configuration of the main component of the Hellfire platform: HellfireOS. This OS can be customized in several respects in order to optimize the final design, which is made easier by the integration of the simulator and the OS in a single framework.

II. THEORETICAL BACKGROUND

To help the reader, this Section has three main divisions: embedded systems, development platforms, and embedded software. Basic concepts and information pertinent to each of these topics are presented; readers experienced in these subjects may skip to the next Section.

A. Embedded Systems

Embedded systems are everywhere and are part of people's daily lives. This statement is becoming ever more true, since the presence of embedded systems has been growing at a very fast pace thanks to technological advances. Among these advances, the great miniaturization capability provided by innovative chip fabrication technologies stands out [6].

To define embedded systems it is necessary to distinguish between types of computing. There are general-purpose and special-purpose computers. As the name itself suggests, general-purpose
computers are designed and developed so that any kind of task can run on them with some guaranteed minimum performance. Special-purpose computers, as their name also makes clear, are conceived and implemented for a specific end; for that end, their performance (from both the computational and the energy point of view) is possibly higher than that offered by a general-purpose computer.

Classically, special-purpose systems were embedded in some other, larger system, which gave rise to the term embedded systems. Today the meaning of the term has shifted somewhat, mainly because of the strong trend toward convergence among entertainment devices, which offer many functions and can be considered hybrids.

Even so, unlike general-purpose computers such as PCs (Personal Computers), Embedded Systems (ES) perform a set of predefined tasks and usually have specific requirements to be met. In most cases, these systems have very severe constraints on the physical size of the final device, on cost, and on energy consumption, items that tend to increase the complexity of their design even further.

Another aspect of these systems, no less important, is the strong market pressure imposed on companies, which leads to the need for ever more efficient design processes. This means that new systems must be developed within a few months and must guarantee their financial return within equally short periods [7]. Therefore, for the implementation of an embedded system to succeed while respecting the constraints imposed by the short time-to-market currently available, it is essential to carry out a design that considers the system as a whole.

New methodologies have been developed with the goal of increasing designer productivity. Among them, the one described in [8] stands out; it is based on a sequence of techniques, abstraction and clustering, adopted over the years to achieve this goal.

The abstraction technique describes a given object by means of a model in which some low-level details can be ignored. Clustering, in turn, joins a set of models that belong to the same level of abstraction to form a new type of object, which normally has new properties that are not part of the isolated models that constitute it. Through the successive application of these two techniques, digital electronics made system design evolve from layout drawings, to transistor schematics, to logic gate netlists, and finally up to the Register Transfer Level (RTL), as can be seen in Figure 1.

Fig. 1. Abstractions and clusterings in hardware design. Source: adapted from [8].

The use of platforms is also important for the efficient use of the abstraction and clustering techniques. A platform is a single abstract model that hides the details of different possible implementations, such as a cluster of components at a lower level of abstraction [8]. The use of platforms allows design and fabrication costs to be shared among a wider range of potential users, which would not happen if a unique design were developed for each product [8].

A new and higher level of abstraction has been emerging in recent years in response to the growing complexity of integrated circuit design. At this level, objects can be functional descriptions of complex behaviors or architectural specifications of complete hardware platforms. The relation between the elements of a platform and an application is known as mapping. At the system level, mapping is performed between functional objects and platform elements, associating a functional behavior with an architectural element that can implement that behavior.

According to [8], system-level mapping operates on heterogeneous objects and also allows different, orthogonal aspects to be separated, such as:

• computation and communication: important because the refinement of computation is usually done manually or through compilation and scheduling, whereas communication makes use of standards;
• platform implementation and application implementation: often defined and designed independently by different companies or groups; and
• behavior and performance: these should be kept separate because performance information may either represent non-functional requirements or result from an implementation choice.

All of these separations lead to better reuse, since they decouple independent aspects, allowing a reduction in design time and increasing productivity by reducing the time needed to verify the system.

1) Real-Time Systems in Embedded Systems: As noted, embedded systems generally aim to solve very specific problems and, in many
cases, are not directly perceived by the user. In this context, one distinct class of applications has requirements for responses within determined times and does not tolerate failures: the so-called Real-Time Systems (RTS).

RTSs can be defined as computing systems that interact physically with the real world and have timing requirements on those interactions [1]. Typically, the interaction with the real world is carried out through sensors and actuators instead of the keyboard and monitor pair common in general-purpose computers.

Based on this definition of RTS, examples of many kinds can be cited: air bag systems, videoconferencing, air traffic control systems, washing machine controllers, and DVD players, among others. As the use of computing systems proliferates in our society, real-time applications become more common [9]. A brief analysis of these examples shows that failures and delays are tolerated to different degrees among them, which divides RTSs into hard (critical) and soft (non-critical).

The main difference between hard and soft RTSs is the consequence that a delay in the execution of a given task can cause. In hard RTSs a failure is catastrophic and may cause losses and/or pose risks to human life and to the environment in general [1]. Soft RTSs do not have such strong limitations on the delay of a given task, because a missed deadline generally leads only to a degradation of system performance, without causing the losses that a failure in a hard system could [10].

2) Examples of embedded systems: To conclude the introductory Section on embedded systems, we briefly discuss how embedded systems are present in automotive vehicles.

Like many embedded systems, the intelligent subsystems present in automobiles are available in several models, and many are so discreet that even the driver has difficulty noticing them at work. Among these are ABS brake systems, electronic fuel injection, and active suspension.

Systems with disc brakes with ABS (Anti-lock Braking System) and EBD (Electronic Brakeforce Distribution) are efficient: they prevent the wheels from locking and ensure better grip on the road surface. By preventing the wheels from skidding during braking, anti-lock brakes benefit drivers in two ways: (i) the car stops faster; and (ii) the trajectory of the car can be changed while braking. Speed sensors, a pump, valves, and the control unit make up this system.

Electronic fuel injection systems also work without the driver noticing and consist of fuel supply and electronic engine management systems. In this context, their large-scale use is closely related to the automotive industry's need to reduce the emission of polluting gases, acting as a powerful and effective control of the mixture admitted by the engine. This means that engines equipped with electronic injection achieve better fuel economy, since they always work with the ideal ratio of fuel to air in the mixture.

Another system that exemplifies embedded systems in the automotive world is active suspension, a technology that controls the vertical movements of the wheels through an electronic system. Unlike conventional suspension, which simply reacts to the road, active suspension corrects for track imperfections more efficiently. This gives the vehicle more stability and performance in different situations, such as cornering, accelerating, or braking, and makes the vehicle easier to control.

Many other vehicle systems, such as parking assistants, navigation aids, and full entertainment stations with DVD players and other leisure items, are increasingly widespread, and their cost is consequently falling as well. The study of intelligent systems for automobiles is therefore a promising research area, and many initiatives around the world focus on the safety and comfort of drivers through the use of complex embedded systems.

B. Architectures

This Section presents the platforms used for the development and implementation of embedded systems. A development platform can be defined as the infrastructure needed to create and develop a given system. This definition covers issues at different levels of abstraction.

At a lower level are the architectural definitions, while at higher levels is the information about the software to be employed. Although it can be divided into layers according to the level of abstraction, the development platform has constraints that are normally tied to the nature of the embedded system.

1) Single-processor: Single-processor architectures are typically found in a System-on-Chip (SoC), where they are in direct contact with other components, such as memories, decoders, and dedicated circuits, among others, on the same chip. These architectures brought a very large drop in the cost of embedded devices, especially entertainment devices, because they reduce the total number of chips needed to compose the system.

Ongoing technological progress has made it possible to integrate several hardware blocks, such as processors, memory, and peripherals, on the same chip. The integrated circuit (IC) that contains this system is called a System-on-Chip and allows specific tasks to be carried out [11] [12]. One of the main results of this characteristic is
the possibility of increasing the performance and the functionality found in current equipment.

As in the development of ESs, the SoC development process is also subject to market pressure and must therefore be effective and efficient. The application of reuse techniques [13] is essential for the various constraints of this type of design to be respected, since these techniques advocate the standardization of interfaces and the modularization of different components.

Figure 2 shows a typical example of an SoC, which contains:

• one or more microcontrollers, microprocessors, and DSP cores;
• memory blocks (ROM, RAM, EEPROM, and/or Flash);
• oscillators and PLLs (Phase-Locked Loops);
• assorted peripherals;
• external interfaces (analog and digital);
• interconnection means to link the blocks mentioned above.

Fig. 2. Example of a System-on-Chip

Finally, another important point about SoCs is performance. For certain applications, using a single processor responsible for the entire execution of the system can be more costly, in terms of performance and/or energy consumption, than using a system composed of more than one processor [14]. This is the context in which multiprocessor SoCs, or MPSoCs, arise; they are discussed next.

2) Multiprocessor: Multiprocessor architectures are an evolution of single-processor ones and are typically found in the so-called Multiprocessor System-on-Chip (MPSoC), where they also communicate with other devices of the system [4]. These architectures make software development harder, since problems found in general-purpose parallel systems also appear in this new context, but with a much larger and more diverse set of constraints.

The growing demand of several applications, such as multimedia and mobile systems, creates the need for more than one processing element in a single SoC. In this case, the processing elements (homogeneous or heterogeneous), together with the other typical components of an SoC, form an MPSoC [15].

The architecture of a typical MPSoC resembles the well-established multiprocessor architectures. However, the design of an MPSoC has additional constraints on cost and energy consumption, which opens the way for several new studies in this area [15]. A basic example of an MPSoC with processing elements (PE), memory, an I/O interface, and buses for interconnection can be seen in Figure 3.

Fig. 3. Example of an MPSoC

Multiprocessor embedded systems use two or more CPUs with reduced energy consumption, which consequently lowers the computing capacity of each one. Even so, they are still able to carry out complex tasks because they parallelize the computation. One point that requires care when using MPSoCs is that the interconnection medium plays an important role in the overall performance of the system. For example, in a communication-intensive system, if the chosen interconnect cannot sustain many message exchanges, total performance suffers. The performance of the system therefore depends not only on the computing capacity of the processors but also on the communication capacity.

Bus Communication. In this model, N nodes are interconnected by one or more lines responsible for transmitting the exchanged packets. Each node represents a unit of the system and may be composed, for example, of memories and/or processors. This communication structure is widely used, mainly because of its simplicity and efficiency of implementation [16].

In simple (non-hierarchical) buses, only one processing element can use the bus at a time, while all the others must wait for the ongoing transmission to finish and depend on an arbiter to gain access to the bus. Figure 4 shows an example of a bus in which a CPU, a memory, and an I/O unit are connected by a single line.

Fig. 4. Example of a Bus

A more efficient alternative to this type of bus, called the simple bus, is the hierarchical bus.
The main difference between this topology and the simple bus is the existence of several bus levels interconnected by bridges, which are responsible for exchanging packets between the levels. This type of bus allows communications to proceed in parallel; this parallelism is limited, however, and a communication between cores attached to different sub-buses stalls several resources [7]. Figure 5 shows a hierarchical bus composed of four CPUs, a shared memory, and a unit for communication with the outside world. At the upper level, called the master level, two CPUs and the external communication module are linked by a simple bus. At the lower level, called the slave level, the other two CPUs and the memory are connected by another simple bus. Communication between the two levels takes place through a packet-exchange bridge.

Fig. 5. Hierarchical Bus Interconnecting CPUs

Network-on-Chip Communication. The growing demand for communication-intensive systems, combined with the limitations of bus-based models, made it necessary to look for new solutions. One communication model that has been widely explored in recent years is the Network-on-Chip (NoC) [16].

In this model, the approach to communication differs from the one adopted by buses: whereas in buses all elements are linked by a simple, direct communication medium, in NoCs routers manage all traffic and forward the packets in the most suitable way. The efficiency of packet delivery is tied to the routing algorithms present in the network, which can be divided into three main groups:

• static and dynamic routing;
• distributed routing; and
• minimal and non-minimal routing.

Several topologies have been proposed for NoCs, the most widely used being the mesh network, in which all connections have the same length, which simplifies the design. In this model every router is linked to at most four neighboring routers, except for the routers on the border of the network, which have fewer links (only two at the corners); this speeds up packet exchange [16].

An example of a mesh network with 16 processors is shown in Figure 6, where node 0003 can be seen with only two links highlighted in bold, along with node 0201, which has four highlighted links.

Fig. 6. Example of a NoC with Mesh Topology

Figure 7 shows another topology alternative for NoCs. This one is called a torus and is very similar to the mesh topology, the main difference being that the border nodes are connected to the border nodes at the opposite end.

Fig. 7. Torus Network Interconnecting CPUs

3) Virtualized: Finally, a more recent type of embedded system architecture is presented: virtualized platforms. As in general-purpose systems, these platforms are used with several goals, including cost reduction and performance improvement. Virtualization of computing systems consists of creating a logical group of resources that resemble the physical resources offered by a computing environment [17]. This technique has been widely adopted in the enterprise world, especially to exploit the potential of multiprocessor systems, and it offers other advantages as well, such as:

• allowing several operating systems to run on a single machine;
• isolating one virtual machine from another, increasing security;
• increasing the flexibility of the system;
• improving workload management; and
• allowing hardware independence.
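Looking back at the mesh and torus NoC topologies of Figures 6 and 7, the difference in the number of links per router can be checked with a short sketch. It is only an illustration, not part of any platform discussed here: the function names are hypothetical, and reading the node labels as two-digit x and y coordinates (0003 as x=0, y=3 and 0201 as x=2, y=1) is an assumption, since the text does not define the labelling scheme.

```python
def mesh_neighbors(x, y, width, height):
    """Routers directly linked to (x, y) in a 2D mesh (no wrap-around)."""
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    # Links that would leave the grid simply do not exist in a mesh.
    return [(cx, cy) for cx, cy in candidates
            if 0 <= cx < width and 0 <= cy < height]


def torus_neighbors(x, y, width, height):
    """Routers linked to (x, y) in a 2D torus: border links wrap around."""
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    # The modulo connects each border node to the opposite border.
    return [(cx % width, cy % height) for cx, cy in candidates]


if __name__ == "__main__":
    # 4x4 network, i.e. 16 routers as in the mesh example.
    # Assumed label decoding: "0003" -> (0, 3), "0201" -> (2, 1).
    print(len(mesh_neighbors(0, 3, 4, 4)))   # corner router
    print(len(mesh_neighbors(2, 1, 4, 4)))   # inner router
    print(len(torus_neighbors(0, 3, 4, 4)))  # same corner in a torus
```

Under these assumptions the corner router has two links in the mesh and four in the torus, while the inner router has four in both, which matches the description of the two figures.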
On the other hand, virtualization can be considered a technique that demands high computing power, since it normally requires a large amount of disk space and heavy use of RAM, and it inserts an extra management layer: the Virtual Machine Monitor (VMM), also known as the hypervisor. This layer is responsible for allowing instructions issued by the virtual machine to be executed normally by the host machine.

In commercial servers, virtualization allows a single physical server to work as multiple logical servers and to provide multiple instances of different operating systems, such as Windows, Linux, and others. These systems are often deployed on multi-core processors from Intel and AMD, a trend currently followed by most processor manufacturers, whose designs are expected to go beyond four cores in the near future.

As discussed earlier, there is a visible trend toward using multiprocessor platforms in embedded systems as well [4], and devices that use such platforms force a change in the way their developers conceive their systems. This happens mainly because techniques previously seen in general-purpose multiprocessor computing need to be re-evaluated before being applied to ESs [18].

While virtualization makes it possible to run multiple operating system instances on a single processor (single- or multi-core), using it in embedded systems is not trivial, because these are very different from enterprise systems [19]. For virtualization to be used to advantage in embedded systems, considerable effort is needed to understand how to adapt it to the needs and characteristics of ESs, which are normally constrained in energy consumption, amount of memory, timing, and area.

The hypervisor, also called the virtual machine monitor, is responsible, together with the hardware, for handling the instructions coming from the virtual machine and for all control of the virtual machines. It is also worth looking at how this component works. According to [17], there are two different types of hypervisor:

• type 1, known as hardware-level virtualization, where the hypervisor is considered an operating system in its own right, since it is the only software running in kernel mode, as can be seen on the left side of Figure 8. Its main task, besides controlling the real machine, is to provide the notion of virtual machines; and
• type 2, or OS-level virtualization, where the hypervisor is like any other user application and has no direct access to the hardware (it must first go through the operating system of the machine). In this case, one of the main advantages of virtualization is lost, namely the use of different OSs.

Fig. 8. Type 1 and type 2 hypervisors

It is important to stress that, since the virtual machine imitates the real hardware, it must also separate execution into kernel and user modes. In this regard, the classic studies of Popek and Goldberg [17] introduce a classification of the instructions of an ISA (Instruction Set Architecture) into three different groups:

1) privileged instructions: those that cause a trap when executed in user mode but do not cause a trap when executed in kernel mode;
2) control-sensitive instructions: those that attempt to modify the configuration of resources in the system; and
3) behavior-sensitive instructions: those whose behavior or result depends on the configuration of resources (the contents of the relocation register or the processor mode).

Thus, according to Popek and Goldberg [17], for the virtualization of a given machine to be possible, the sensitive instructions (control and behavior) must be a subset of the privileged instructions. This is not the case in many processors, such as those of the Intel x86 family, and the usual solution is then to adopt hardware-level support in the processor. Intel has Intel VT (Virtualization Technology) and AMD has SVM (Secure Virtual Machine).

Hardware support may not be the best solution for embedded systems, since it is desirable for virtualization to cope with the wide variety of existing hardware, especially in order to meet the tight time-to-market.

When hardware support is absent, the most common way of virtualizing a system is known as pure virtualization. In this case, whenever the virtual machine tries to execute a privileged instruction (an I/O request, a memory write, etc.), a trap to the hypervisor occurs. This is normally considered a very inefficient way of applying virtualization, both in general-purpose and in embedded systems [19].

Alternatively, the para-virtualization technique can be used to replace the sensitive instructions of the original code with explicit calls to the hypervisor (hypercalls). In effect, the operating system of the virtual machine acts like a normal user application running on top of a normal operating system, with the difference
that the guest operating system is running on top of the hypervisor. When para-virtualization is adopted, the hypervisor must define an interface made up of system calls that the guest operating system can use. It is also possible to remove all sensitive instructions from the guest OS, forcing it to use only hypercalls. This makes the hypervisor more like a microkernel, which can improve virtualization performance.

C. Embedded software

Having examined the nuances of embedded systems development platforms more closely, this Section introduces the concept of embedded software. This software can exist in several layers and languages and nowadays ranges from the operating system responsible for running the embedded system to the applications installed, or even developed, by the user of a mobile phone, for example.

Historically, embedded systems were specific to one application and, for that reason, software was seen merely as a complement to the system, sometimes an optional one. With constant change and evolution in this market, new devices and new consumer needs caused system complexity to grow in equal proportion. As a result, software that was once seen as optional in some cases became the fundamental element of an embedded system. Nevertheless, tools that explore this space and assist the developer are still scarce.

1) Embedded Real-Time Operating Systems: Among the software layers, the most important one is the operating system. Since a large class of embedded systems has real-time constraints, we highlight Real-Time Operating Systems (RTOS) as the main agents in meeting the critical and non-critical constraints found in these systems.

An RTOS must meet the functional requirements and, above all, the timing requirements that are indispensable to real-time systems (RTSs). In these systems, unlike general-purpose ones, mechanisms such as disk caches and virtual memory are generally not used, because they can make the tasks to be executed less predictable.

An RTOS is therefore a system that typically supports priorities and predictable thread synchronization, and that offers deterministic behavior of the operating system as a whole [20]. For this to be possible, items such as the worst-case execution time (WCET) and the time within which interrupts will be serviced must be known.

Another characteristic of an RTOS is that it can be customized for a given application at compile time, so that it includes only a small subset of the available functionality [20]. This allows underused parts of the RTOS to be discarded for certain applications in order to save resources.

Another interesting point about RTOSs, and one that sets them apart from general-purpose OSs, is that the programmer typically has easier and more direct access to the hardware. The main goal of this approach is to make that access faster and more predictable, although it may allow improper access to devices such as system memory.

Basic concepts in RTOS. A task is defined as one of the small parts that make up a running program, which has its own address space. Tasks can be classified as periodic or aperiodic. A periodic task is one whose activations occur in an infinite sequence at regular intervals, called the period. An aperiodic task is one whose activation corresponds to internal or external events, so that its execution is random. When there is a known minimum interval between two consecutive activations, the task is said to be sporadic.

Another possible classification concerns task preemptibility. Preemptive tasks are those that can be interrupted during their execution, unlike non-preemptive tasks, which must be executed atomically.

Finally, the other basic difference between tasks concerns their priority. Static tasks do not have their priority level changed during execution; it is set by the operating system or by the user. Dynamic tasks start with a static priority value, but that level can change during execution according to several parameters, such as the CPU time reserved for a given task and the CPU time it has consumed.

Another fundamental concept in RTOSs relates to time. There are several definitions concerning the time at which a given event occurs, for example:
• computation or execution time: the time a task takes to complete its work. Special cases of execution time include:
  1) BCET (Best Case Execution Time): the best (shortest) possible execution time of a given task;
  2) ACET (Average Case Execution Time): the average execution time of a given task, and;
  3) WCET (Worst Case Execution Time): the worst (longest) possible execution time of a given task.
• deadline: the maximum time allowed for a task to be executed;
• start time: the instant at which processing of the activated task begins;
• completion time: the instant at which execution of the task completes;
• arrival time: the instant at which
the scheduler becomes aware of an activation of the task, and;
• release time: the instant at which the task is placed in the queue of tasks ready to be executed.

Task Scheduling in RTOS. As explained above, an RTOS must be able to carry out the tasks assigned to it while respecting the timing constraints imposed by the application. For this to be possible, there must be a task scheduler capable of honoring the constraints of these systems. Scheduling is thus defined as the established order of execution of a given set of tasks. How scheduling is performed depends basically on two factors: the type of task (its attributes and constraints) and the scheduling algorithm used.

Although there are many scheduling algorithms, in general they are all based on a basic state machine that represents the possible situations of a task over time. This machine has three states, ready, running, and blocked, and is shown in Figure 9. The possible transitions between these states are also shown:
• ready to running: a task in the ready state is able to receive the CPU at any moment but has not yet received it because another task holds the processor. When that other task is blocked or preempted by the scheduler, the first task in the ready queue takes control of the CPU, which characterizes the transition from ready to running;
• running to ready: when the task is running and requests no service that would block it (such as I/O), it is up to the scheduler to preempt it so that another task can take the CPU. When this happens (the task is preempted), it returns to the ready queue, and a transition from running to ready is said to have occurred;
• running to blocked: when the task is running and requests a blocking service, such as an I/O operation, it moves from running to blocked;
• blocked to ready: when the task is blocked, the requested service completes, and a higher-priority task is running, it returns to the system's ready queue, and;
• blocked to running: when the task is blocked, the requested service completes, and no higher-priority task is running, it returns to the running state.

Fig. 9. Typical state machine of a task

It therefore falls to RTOS schedulers to maintain two characteristics fundamental to these systems: meeting deadlines and predictability in task execution. To meet deadlines, the scheduler must be able to avoid data loss or failure to complete tasks within the required time. Predictability is achieved when the system is deterministic and it is known exactly when certain situations will occur.

In addition, it is important for a scheduler to check whether a given task set is schedulable. To perform this test, a schedulability analysis is carried out which, by means of mathematical formulas, computes total CPU utilization. The optimal case is when utilization can be kept at 100%, but never above it; if it exceeds 100%, the task set is said to be non-schedulable.

Given a schedulable task set, it is up to the scheduler, through a scheduling policy, to decide the order in which the tasks will be executed. This policy can be:
• offline: makes all scheduling decisions before the system runs, storing them in a table. At run time, a structure similar to a task dispatcher is used to activate the tasks according to the generated schedule. This mechanism is based on a hardware timer that signals when another task must be executed. In the clock-driven approach, all tasks have the same processor time available, since the timer has a fixed interval. In the weighted round-robin approach, on the other hand, each task may have a different amount of processor time for its work. Weights are used for this purpose: tasks with greater weight have more processor time available. In this case, scheduling is performed in a round-robin fashion;
• online: makes all scheduling decisions while the system is running. These decisions are based on several parameters that may or may not change at run time. This approach includes priority-driven algorithms, in which a priority is assigned to each task. The scheduler grants the computational resource to the task with the highest priority. This category contains preemptive and non-preemptive methods, and many algorithms can be cited as examples. Among the most widely used are Rate-Monotonic (RM) [21], [22], Deadline Monotonic (DM), Earliest-Deadline-First (EDF) [23], [24], Least-Slack-Time (LST), and Latest-Release-Time (LRT) (Reverse-EDF). Of these algorithms, Rate-Monotonic is the least complex and is widely
used. The EDF algorithm is considered optimal, as are LST and LRT; compared with those two, however, it is computationally the least complex and consequently more widely used than the others.

There are thus many scheduling policies used in RTSs. Even so, two algorithms stand out from the others because they are the most widely used: RM (Rate Monotonic) and EDF (Earliest Deadline First) [25].

D. Programming model and techniques

Because embedded systems have so many peculiarities, the development of their software is affected as well. With the use of single-processor, multiprocessor, and even virtualized platforms, one of the main problems concerns how these resources are programmed [26]. Several abstraction levels therefore exist for designing and developing the software. These levels are intended to ease system development and to reduce the errors found during and after deployment.

"Given the possible complexity of an embedded system's architecture, it is essential that the design be divided into several abstraction levels. The existence of increasingly robust CAD tools that automate the various stages of the methodology ensures a reliable final product, even for designs initially described at the highest abstraction layers [25]."

In short, one can adopt a methodology that covers all the abstraction levels involved in a project of this kind. Generically, the main abstraction levels in the development process are: requirements, specification, architecture, components, and system integration. These levels can be seen in Figure 10, which shows the top-down approach on the left and the bottom-up view on the right. These approaches concern the way in which the system is developed. In the first, planning starts with the definition of the system requirements, followed by the specification, and so on. In the second, the design starts from the components, the next step being the definition of the architecture, and so forth. The phases that make up each of these approaches are described below:
• requirements: the functional and structural requirements of the system as a whole are listed;
• specification: each of the previously listed requirements is detailed so as to describe how the system must behave;
• architecture: the internal details of the system are described, proposing how it can be built. The components to be used are structured at this stage;
• components: once the necessary components have been defined, they must be designed and implemented, including specialized software and hardware modules, and;
• system integration: the overall system is assembled by joining the previously designed components.

Fig. 10. Main abstraction levels in the development process. Source: Adapted from [25]

In addition, the verification and test phase of an embedded system must guarantee its reliability without exceeding the limited time allotted for the design and development of the final product, so that the expected time-to-market is met. For this to be possible, it is important to validate the different components (both software and hardware) separately.

"Thus, the division of the system into abstraction levels is used once again, to allow separate validation of the software components [27]."

Figure 11 illustrates these abstraction layers for a simple application containing three tasks (labeled T1, T2, and T3 in the figure) to be mapped onto an architecture composed of two processors and hardware subsystems. For each level, the figure shows the organization of the software, the interface between software and hardware, and the software development platform that will be used for validation at that level.

According to [27], Figure 11 presents four different abstraction levels (described from the highest to the lowest):
• system architecture level: a set of functions grouped into tasks forms the software. Communication among functions, tasks, and subsystems takes place through abstract communication channels. Simulation at this level is performed, for example, in the Simulink environment, with the aim of validating the functionality of the application.
• virtual architecture level: each task is refined, for example, into C code that contains the final application code and uses the HdS (Hardware Dependent Software) API (Application Programming Interface), whose communication primitives explicitly access the communication components;
• transaction architecture level: the software is bound specifically to an OS (Operating System) and to I/O (Input/Output) software responsible for implementing the communication units. The resulting software uses hardware abstraction level primitives (Hardware Abstraction Level, HAL).
• virtual prototype level: the HAL API and the processor are implemented by means of a HAL software layer and the corresponding part of the processor for each of the software subsystems. Simulation at this level is performed with classic hardware/software co-simulation models.

Fig. 11. Software abstraction levels. Source: Adapted from [27]

E. Application Examples

Some applications are considered classic in embedded systems and are commonly used as benchmarks for testing new proposals.

One of the most widely used applications today, mainly because of the widespread use of embedded multimedia devices, is the H.264 standard. This is a video compression standard originally based on the MPEG-4 format. Systems that use this standard include Brazilian digital TV, DVD players, and video game consoles.

H.264 is divided into profiles and, since its use is very broad (from portable products to high-definition decoders), it is important to optimize its execution as much as possible, in order to reduce items such as manufacturing cost and power and energy consumption, the latter with battery-operated devices in mind.

Well-known standards such as JPEG and MP3 are also among the main embedded applications in everyday use, in products of diverse nature that therefore have equally heterogeneous constraints.

F. Software Virtualization

Software virtualization aims to allow legacy systems to be used alongside current ones, to increase the security of embedded systems and, together with virtualized platforms, to reduce the total cost of the final product and increase its performance. This Section presents the most important details about applying this technique in embedded systems.

One of the most fitting and direct advantages of virtualization in embedded systems is that it allows different OSs to coexist on the same machine. In this case, two different problems can be addressed:
• the use of legacy software, since it is possible to create (or maintain) one operating system compatible with that software and another, more modern one that allows new features to be exploited, and;
• dividing the system into a part to which the user has access, with specific calls known to the user, kept separate from the critical part responsible for keeping the device running. In this case, two operating systems, a user OS and a system OS, can be used simultaneously.

When virtualization is used with these goals, the hypervisor must have full control of the hardware and must create different virtual machines, one per OS. As can be seen in Figure 12, this approach can be used on both single- and multi-core machines. It also improves the quality of software development, since the designer can choose, among several OSs, the one best suited to the application. Furthermore, the time required to develop an application can be reduced drastically, since application reusability grows appreciably [28].

Fig. 12. Hypervisor separating machines with several OSs

This approach also offers the advantage of achieving a unified software architecture that can run on multiple hardware platforms. A current and recurring problem in embedded systems, software portability, can thereby be largely addressed, and designers can more quickly satisfy an increasingly tight time-to-market.

The security of the embedded system is also a strong argument for virtualization, since it can prevent attacks on the system from reaching the main OS, as can be seen in Figure 13. In this case, the malicious code remains confined to the virtual machine without contaminating the rest of the system, because it lacks the knowledge of the hypervisor needed to exploit its weaknesses. The hypervisor can also detect that an attack has occurred and restart the virtual machine without harming the rest of the system.

While embedded virtualization can bring many advantages, it is important to make clear at what cost these benefits can be achieved. Some of the limitations are already
present in general-purpose virtualization, while others arise from its use in environments as severely constrained as embedded systems. For virtualization mechanisms to be implemented, hardware support is needed, which often does not exist or is not feasible in embedded systems. Support for virtualization increases chip area and, consequently, energy consumption and temperature. Some techniques, such as dynamic code translation or emulation, can be used as alternatives. These techniques, however, increase the execution time of the application, and hardware mechanisms, despite their disadvantages, prove more suitable in the great majority of cases [SPE].

Fig. 13. User attack blocked through isolation of the virtual machines

One of the main problems to be tackled concerns the task scheduling performed by the hypervisor. Embedded systems typically have timing constraints, so any slip by the hypervisor can compromise the system.

One can also consider the case in which a given multi-core exhibits asymmetric multiprocessing behavior, with two OSs: a user OS and an RTOS. In this case, each OS is treated as a separate virtual machine and, in embedded systems, it is desirable for the RTOS to be prioritized over the user OS; likewise, real-time tasks that may run in the user OS (such as multimedia applications) should also be given preference. This priority-based scheduling conflicts with the principles of virtual machines, under which all virtual machines should share the real hardware in equal proportions.

Moreover, the heterogeneity typical of embedded systems can pose a major challenge, since the hypervisor must, in theory, be able to communicate with the largest possible number of architectures. Whereas in general-purpose computing the Intel x86 architecture, for example, is widely used, embedded systems employ a great variety of architectures, from DSPs to ARM processors, as well as PowerPC and MIPS architectures.

Finally, the excessive and absolute isolation brought by virtual machines, which increases the levels of security and reliability, can make it difficult for the various embedded subsystems to cooperate with one another, which is highly desirable in embedded systems.

Among the main uses of virtualization in embedded systems, one that stands out is reducing the total number of processors in a system by placing them in several virtual machines on a single processor (whether single- or multi-core).

In another example, the reliability of asymmetric systems, in which each processor of a multiprocessor system has its own OS, can be increased through resource separation, with the ability to restart virtual machines independently. It is also possible to migrate existing systems into a virtual machine and add new functionality to them, thereby providing an opportunity for reuse and innovation. In addition, task migration between virtual machines is made easier.

Finally, it is worth noting that several studies on virtualization in embedded systems have been carried out [29], [30], [31], and there are already several specific systems intended to monitor embedded virtual machines. Notable among these systems are [32], [33], [34], [35], [36], [37].

G. Main Challenges

The challenges in embedded software mainly concern development at higher abstraction levels that still allow simulations with high fidelity to the final model.

Another major challenge is making better use of multiprocessor platforms, both in exploiting the whole of their computational power and in the way software for these platforms is developed. Parallel programming has been a challenge for general-purpose computing over the years, and the embedded world is no different.

From the hardware standpoint, the main attention must be given to the interconnect, which must be efficient enough to serve an ever larger number of processing elements in a single system. Research involving new routing algorithms for NoCs, the mixing of buses and NoCs, and even the use of virtualization as a strategy to reduce communication overhead also appears as a strong trend.

III. CASE STUDY - HELLFIRE SYSTEM

The design of an embedded system involves several constraints, such as an ever shorter time-to-market. In this context, platforms for the development, testing, and simulation of embedded systems, such as [38] and [26], seek to reduce development time by providing resources that speed it up, such as simulators and debugging tools.

This Section highlights the platform used as a case study in the present work: the Hellfire System [39]. The Hellfire System currently consists of three modules:
• OS, which contains the description of HellfireOS. In this module, all the basic functionality of an RTOS is made available for use through an API;
• hardware, formed by the platform prototyped on an FPGA (Field-Programmable Gate Array). In this module, four MIPS processors are interconnected by a simple bus and an image of HellfireOS is instantiated on each processor, and;
• simulation, consisting of an environment in which up to 256 processors running HellfireOS can be simulated.

HellfireOS. This operating system, based on a microkernel architecture, follows the main RTOS concepts presented earlier and provides tools for the development and simulation of real-time embedded applications [39]. The operating system can be configured according to the application to be executed: parameters such as the maximum number of tasks in the system, task stack size, heap memory size (which can be allocated dynamically), scheduling policy, debug options, processor speed, task migration, and hardware error checking can all be customized.

The main goal of this customization is to allow the size of the final binary image¹ of the operating system to be optimized, making it possible to run the system on architectures with a small amount of memory. Some of the functionality made available to the developer includes:
• Preemptive operating system (tasks may optionally cooperate);
• Dynamic task management (add, remove, block, resume, change parameters, fork());
• System calls (information on deadlines, processor usage, memory, energy, task parameters, context switch times);
• Different scheduling policies for tasks with fixed priority (Rate Monotonic and Priority Round Robin) and dynamic priority (Earliest Deadline First);
• Mutual exclusion and semaphores;
• Dynamic memory allocation, release, and management;
• Automatic system integrity checks;
• Customized LibC (with additional functionality such as CRC calculation and random number generation, among others);
• Library for single-precision floating-point emulation (with additional functionality such as conversions, square root calculation, and trigonometric functions);
• Inter-task communication by message passing or shared memory;
• Task migration.

¹ The final binary image of the system is composed of the operating system and the tasks that run on it. This image is loaded into the memory of a processing unit so that, after initialization, the operating system can execute the tasks.

Peripherals are accessed through memory-mapped input and output. The peripheral map can be configured in the hardware abstraction layer (HAL) for a specific hardware solution, which eases porting the operating system to other architectures. Versions currently exist for the MIPS (multiprocessor) and x86 (single-processor) architectures.

Figure 14 shows the structure of the operating system. All architecture-dependent functions are implemented in the HAL (layer 1). The microkernel is implemented on top of this layer (layer 2). Some low-level device drivers are implemented in this layer, where they have privileged access to the system and the hardware. A reduced library of standard C functions (LibC), as well as the operating system API (Application Programming Interface), is implemented on top of the microkernel (layer 3). Both the tasks and the operating system share the standard library, which reduces memory usage. User tasks are implemented in layer 4 and use the API provided. This layer also holds the device drivers that run at user level, which have the same parameters as system tasks; that is, they are governed by the same scheduling policy.

Fig. 14. Layered structure of the Hellfire operating system

Interrupt handling routines and context save and restore routines are architecture dependent and were therefore written in machine language. These routines are part of the hardware abstraction layer. It is important to note that this layer can be easily ported to other architectures, owing to the modularity of the operating system. The execution flow of the operating system follows the pattern of initialization and waiting for events found in other existing systems.

A. Scheduling Measurements

According to [40], a real-time operating system is defined not only by its behavior, that is, its scheduling policy, but also by its temporal properties, which affect how the execution of a task set evolves.

Time Base. Provided by a 32-bit hardware counter that operates at the same frequency as the processing element. This time base is referred to as
tick, and corresponds to the unit of measure used in the task parameter definitions. An appropriate signal (bit) of this hardware counter can be selected, and timer interrupts are generated from that signal. Depending on the selected signal and the operating frequency, different tick periods can be obtained.

The tick period, in seconds, is calculated with the following formula, where a is the selected counter signal and freq is the operating frequency of the processing element, in hertz:

\[ \text{period} = \frac{2^{a}}{\mathit{freq}} \]

Different operating frequencies and counter signal selections define a large set of values for the tick time, allowing the developer to choose the scheduling granularity that suits a given application. Table I presents tick time values for different selected counter signals (bit) and operating frequencies.

Table II lists the number of timer interrupts per second, according to the operating frequency and the selected counter signal. At 100 MHz with signal 15 selected, 3125 interrupts are generated per second, which, for a scheduling algorithm that does not reschedule the task that was just preempted, equals the number of context switches.

An operating frequency of 25 MHz and counter signal 18 were used as defaults, which corresponds to a period of 10.48 ms between interrupts. Approximately 95 context switches are therefore performed per second. These values match the hardware prototype and were adopted as a compromise between operating system response time, ease of prototyping, and context switch overhead.

Operating System Overhead. The Hellfire operating system provides a system call that returns the time spent in context switches, in cycles. This call uses the hardware counter to take the measurement and is therefore independent of software tools.

Given the time spent in context switches (which depends on the scheduling policy and on the compiler used), the number of timer interrupts per second (ticks), and the operating frequency, the overhead can be calculated as:

\[ \text{overhead} = \frac{\mathit{tps} \times \mathit{csl}}{\mathit{freq}} \]

where overhead is a number between 0 and 1, tps is the number of ticks per second, csl is the context switch latency (or time) in cycles, and freq is the operating frequency, in hertz. The time of one context switch is spent every time a timer interrupt occurs. This cost therefore always falls on the progress of the tasks, since processor time slices are distributed according to the scheduling policy in use and the overhead is absorbed at every tick.

As an example, at an operating frequency of 25 MHz and a period of 10.48 ms between interrupts, a scheduled task runs for approximately 262000 cycles (assuming that during this period the task does not request rescheduling and that it is preempted when the tick ends). Considering an operating system latency of 1500 cycles², the observed overhead is approximately 0.57%.

² Estimated value, obtained from tests on the Hellfire operating system compiled with GCC 4.4.2 and running the Rate Monotonic policy. The latency depends on factors such as the compiler, the architecture, and the scheduling policy and its implementation.

Timer interrupts are used to generate the system ticks. Their period must be well balanced: a time slice that is too long can make the system unresponsive (and may fail to honor real-time constraints), while a time slice that is too short can increase the operating system overhead.

B. Implementation of the Task Model

A task τi is defined by the parameters of the tuple (id_i, r_i, WCET_i, D_i, P_i), which denote the identification, release time, worst case execution time, deadline, and period of task τi, respectively.

The behavior of a task is defined as a block of code in the C language, implemented as a function of type void, that is, a function that takes no parameters and returns no value. A task can be understood as a function that iterates forever but can be interrupted at any moment by the operating system (the task is preempted) and have its execution resumed later. A context switch occurs only on a hardware interrupt (which can be masked) or when the task gives up execution voluntarily, allowing the operating system to elect another task to run. Figure 15 presents an implementation example that shows, in general terms, how a task is organized. Local variables are declared in the task body and stored on its stack. The initialization code is a code segment that runs only once. Its use is not mandatory, but it can be used to initialize the task's data structures. The actual task code runs in an infinite loop.

Fig. 15. Body of the description of an example task

Each task in the system is in one of the following states: ready, running, blocked, waiting, or not yet run. A task is considered ready when it has been preempted by the operating system or has voluntarily requested rescheduling.
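The two timing formulas of this section can be checked numerically. The sketch below is not HellfireOS code: the function names tick_period_s, ticks_per_second, and context_switch_overhead are illustrative assumptions, and the functions only evaluate period = 2^a / freq and overhead = tps × csl / freq as defined above.

```c
#include <assert.h>
#include <stdint.h>

/* Tick period in seconds: period = 2^a / freq.
 * `bit` is the selected counter signal (a); `freq_hz` is the operating
 * frequency in hertz. Both names are illustrative, not Hellfire identifiers. */
double tick_period_s(unsigned bit, double freq_hz)
{
    assert(bit < 32 && freq_hz > 0.0);   /* the hardware counter is 32 bits wide */
    return (double)(UINT64_C(1) << bit) / freq_hz;
}

/* Timer interrupts (ticks) per second: the inverse of the tick period. */
double ticks_per_second(unsigned bit, double freq_hz)
{
    return 1.0 / tick_period_s(bit, freq_hz);
}

/* overhead = tps * csl / freq, a fraction between 0 and 1.
 * `csl_cycles` is the context switch latency measured in cycles. */
double context_switch_overhead(double tps, double csl_cycles, double freq_hz)
{
    assert(freq_hz > 0.0);
    return tps * csl_cycles / freq_hz;
}
```

With the default configuration from the text (signal 18 at 25 MHz) and a latency of 1500 cycles, these functions give a period of roughly 10.5 ms, about 95 ticks per second, and an overhead of about 0.57%, in line with the figures quoted above.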
TABLE I. TICK TIME VALUES (ms)

Operating freq. (MHz) | Counter signal (bit): 15 | 16 | 17 | 18 | 19 | 20 | 21
25  | 1.31 | 2.62 | 5.24 | 10.48 | 20.97 | 41.94 | 83.88
33  | 0.99 | 1.98 | 3.97 | 7.94 | 15.88 | 31.77 | 63.55
50  | 0.65 | 1.31 | 2.62 | 5.24 | 10.48 | 20.97 | 41.94
66  | 0.49 | 0.99 | 1.98 | 3.97 | 7.94 | 15.88 | 31.77
100 | 0.32 | 0.65 | 1.31 | 2.62 | 5.24 | 10.48 | 20.97

TABLE II. NUMBER OF CONTEXT SWITCHES (PER SECOND)

Operating freq. (MHz) | Counter signal (bit): 15 | 16 | 17 | 18 | 19 | 20 | 21
25  | 763.36 | 381.68 | 190.84 | 95.42 | 47.69 | 23.84 | 11.92
33  | 1010.1 | 505.05 | 251.89 | 125.94 | 62.97 | 31.48 | 15.74
50  | 1538.46 | 763.36 | 381.68 | 190.84 | 95.42 | 47.69 | 23.84
66  | 2040.82 | 1010.1 | 505.05 | 251.89 | 125.94 | 62.97 | 31.48
100 | 3125 | 1538.46 | 763.36 | 381.68 | 190.84 | 95.42 | 47.69

In the ready state, the task is in the scheduling queue. In the running state, the task is executing and has just been scheduled. A task is in the blocked state when it is ready to run but has been removed from the scheduling queue (voluntarily or not). In the waiting state, the task is waiting on a semaphore and cannot make progress until another task increments the semaphore to the point where the waiting task is released. Initially, all tasks are in the not yet run state. After its first execution, a task that is neither held on a semaphore nor blocked is kept in the ready state until it is scheduled again. If there is no task to schedule, a special task called the idle task, added at system initialization, is scheduled. Only tasks that are in the scheduling queue are executed. The possible states of a task are shown in Figure 16.

Fig. 16. Task states

To guarantee real-time execution, tasks cannot disable interrupts. Masking a timer interrupt, even for a short time, can cause the kernel to miss the interrupt and the schedule to lose its real-time validity.

All information about tasks is stored in a special structure called the TCB (Task Control Block). In this structure, the operating system keeps all the properties of a task: its identification, description, scheduling state, progress information, period, execution time, deadline, processor and memory usage, task context, general-purpose pointers (the stack memory region, for example), and data transmission information.

C. Scheduling Policies

According to [41], policies that use static priorities are better suited to real-time systems because of a factor known as stability. Stability means that, even in an overloaded system where deadlines are missed, the tasks with the highest priority are not affected by the overload. Dynamic-priority scheduling policies, although they achieve higher processor utilization, become unstable as soon as an overload occurs, which is unacceptable for many real-time systems.

The default policy in the Hellfire system is the Rate Monotonic algorithm, in which tasks with shorter periods have priority over other tasks. This policy, however, does not guarantee that all tasks will meet their timing constraints if the task set is not schedulable. Other policies are available, and programmers can define new policies themselves thanks to the high modularity of the system.

D. Inter-Task Communication

Inter-task communication in the Hellfire system follows two different models. The first is shared-memory communication, suited to tasks that run on the same processor. The other is message-passing communication, suited to tasks that run on different processors.

The two models differ in their programming perspective. Shared-memory communication can be implemented by protecting a shared global data structure.
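The Rate Monotonic rule from "C. Scheduling Policies" can be illustrated with a small selection function. This is a minimal sketch under stated assumptions, not the HellfireOS scheduler: the task_info record is a hypothetical, reduced stand-in for the TCB, the names rm_pick and idle_index are invented for illustration, and the sketch assumes that tasks in the "not yet run" state are eligible for scheduling alongside ready tasks.

```c
#include <assert.h>
#include <stddef.h>

/* The five task states named in the text. */
typedef enum {
    TASK_NOT_RUN, TASK_READY, TASK_RUNNING, TASK_BLOCKED, TASK_WAITING
} task_state;

/* Hypothetical, reduced task record (not the real TCB layout). */
typedef struct {
    unsigned char  id;
    unsigned short period;   /* task period, in ticks */
    task_state     state;
} task_info;

/* Rate Monotonic choice: among eligible tasks, the shortest period wins.
 * Returns the index of the chosen task, or `idle_index` when no other
 * task is eligible (the idle task is then scheduled). */
size_t rm_pick(const task_info *tasks, size_t n, size_t idle_index)
{
    size_t best = idle_index;
    assert(tasks != NULL && idle_index < n);
    for (size_t i = 0; i < n; i++) {
        if (i == idle_index)
            continue;
        if (tasks[i].state != TASK_READY && tasks[i].state != TASK_NOT_RUN)
            continue;   /* blocked, waiting, or running tasks are skipped */
        if (best == idle_index || tasks[i].period < tasks[best].period)
            best = i;
    }
    return best;
}
```

Because the priority depends only on the fixed period, the choice never changes with load, which is the stability property the text attributes to static-priority policies.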
The shared data structure can be of any type, for example a struct in the C programming language. It is protected with mutual exclusion primitives or semaphores [42]. This protection is needed to prevent more than one task from accessing the same data structure concurrently³, which would leave the data inconsistent.

³ Tasks are said to access data concurrently when one task modifies a data structure (without completing the modification) and a context switch occurs, after which the newly scheduled task also accesses the structure, corrupting the data (on a write) or reading corrupted data.

Message-passing communication is implemented with dedicated operating system primitives, which can send and receive any type of data. The programmer is responsible for allocating buffers and, when sending, for specifying the unique identification of the task that will receive the data. On reception, the destination task automatically identifies the source task.

The message-passing primitives follow the producer/consumer model. Each task has a local circular receive queue holding messages, which can be removed in order by the appropriate primitive. If the queue is empty, the task is blocked. Likewise, a task that sends data to another task can be blocked if the receiving task has no more space in its receive queue. Inserting data into the receive queue, as well as blocking and releasing tasks, is handled by kernel drivers activated by interrupt.

E. Hellfire System API

This section presents the API of the Hellfire operating system. The interface is quite simple, yet it provides many of the basic services needed to develop real-time embedded applications. The API consists of 6 classes of system calls: task management, system information, mutual exclusion, memory management, communication primitives, and task migration. The API is presented in Table III.

F. Toolchain

For the development of applications and of the operating system, a toolchain was built for the Linux environment, based on the GCC (GNU Compiler Collection) version 4.4.2. The toolchain targets the MIPS instruction set and includes:
  • cross compiler (mips-elf-gcc);
  • assembler (mips-elf-as);
  • linker (mips-elf-ld);
  • binary manipulation tools (mips-elf-objdump, mips-elf-readelf, mips-elf-objcopy).

Building a binary image to be loaded into each processing element takes several steps: (i) assembling, compiling, and customizing the operating system; (ii) compiling the application; (iii) creating an ELF image containing the application and the operating system; (iv) creating a final binary image using the binary manipulation tools. The final image is needed so that it can be loaded directly into the memory of a given processing element.

Standard library functions are not used. The operating system includes a customized version of the standard library, with the goal of reducing memory usage and increasing application performance.

G. Hardware Module

The hardware module is composed of four MIPS processors [43] interconnected by a bus. This module is prototyped on an FPGA, and an image of the OS is instantiated on each processor. The processor used is a Plasma [44] (MIPS architecture).

H. Simulation Module

The simulation module of the Hellfire system is called the N-MIPS MPSoC Simulator and consists of an ISS (instruction set simulator). The simulator was written in C and allows up to 256 HellfireOS images to be simulated. It is important to note that the same image used on the hardware can be simulated, with no change to the source code. In simulations with two or more processors, a simple bus is assumed as the communication medium.

After a simulation, N-MIPS generates several reports on the operation of the system:
  • standard output of each processor;
  • a report listing the Plasma instructions used, how many times each instruction was used, and the usage percentage of each instruction group (arithmetic, logic, ...);
  • a summary of the estimated energy consumption of the system as a whole and of each processor individually, calculated on the basis of [45];
  • a report with the main characteristics of the system, such as the number of missed deadlines and the CPU load;
  • an individual, cycle-by-cycle report on the operation of the processors, which shows all the information contained in the system stack.

I. MPSoC: Partitioning and Initial Mapping

Currently, both the partitioning and the initial mapping of tasks are done manually, that is, good results depend on the designer's experience. The developer is responsible for describing the application and for defining the task groups (partitioning) and the placement of the groups on the respective processing units (mapping). These definitions are made in the application source code, as shown in Figure 17. In the example, four tasks are partitioned between two processing elements. The number of CPUs that make up the architecture is not specified there, since it is a compile-time configuration parameter of the operating system and of the architecture.
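The per-task circular receive queue described under "D. Inter-Task Communication" can be sketched as a fixed-size ring buffer. This is an illustration under stated assumptions, not the HellfireOS implementation: the names rx_queue, rx_put, and rx_get, the queue depth, and the maximum message size are all hypothetical, and where the real system would block the sender or the receiver, the sketch simply returns -1.

```c
#include <assert.h>
#include <string.h>

#define RX_SLOTS    4    /* hypothetical queue depth */
#define RX_MSG_MAX 16    /* hypothetical maximum payload, in bytes */

typedef struct {
    unsigned char  data[RX_SLOTS][RX_MSG_MAX];
    unsigned short size[RX_SLOTS];
    unsigned       head, tail, count;   /* head: next to read, tail: next to write */
} rx_queue;

void rx_init(rx_queue *q)
{
    memset(q, 0, sizeof *q);
}

/* Producer side. Returns 0 on success, or -1 when the message is too large
 * or the queue is full (the point at which a sender would be blocked). */
int rx_put(rx_queue *q, const unsigned char *buf, unsigned short size)
{
    assert(q != NULL && buf != NULL);
    if (size > RX_MSG_MAX || q->count == RX_SLOTS)
        return -1;
    memcpy(q->data[q->tail], buf, size);
    q->size[q->tail] = size;
    q->tail = (q->tail + 1) % RX_SLOTS;
    q->count++;
    return 0;
}

/* Consumer side: removes the oldest message. Returns 0 on success, or -1
 * when the queue is empty (the point at which a receiver would be blocked). */
int rx_get(rx_queue *q, unsigned char *buf, unsigned short *size)
{
    assert(q != NULL && buf != NULL && size != NULL);
    if (q->count == 0)
        return -1;
    *size = q->size[q->head];
    memcpy(buf, q->data[q->head], *size);
    q->head = (q->head + 1) % RX_SLOTS;
    q->count--;
    return 0;
}
```

Messages come out in the order they were inserted, which matches the in-order removal described in the text. A real kernel driver would also have to protect the queue against concurrent access from interrupt context, which this single-threaded sketch leaves out.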
TABLE III. HELLFIRE OPERATING SYSTEM API

System call | Class | Format | Description
OS_BlockTask() | Task management | int OS_BlockTask(unsigned char id) | Block a task
OS_ResumeTask() | Task management | int OS_ResumeTask(unsigned char id) | Resume execution of a blocked task
OS_KillTask() | Task management | int OS_KillTask(unsigned char id) | Remove a task
OS_AddPeriodicTask() | Task management | int OS_AddPeriodicTask(void (*task)(), unsigned short int period, unsigned short int capacity, unsigned short int deadline, char description[], unsigned int energy_t, unsigned char locked) | Add a periodic (real-time) task and configure its parameters
OS_ChangeTaskParameters() | Task management | int OS_ChangeTaskParameters(unsigned char id, unsigned short int period, unsigned short int capacity, unsigned short int deadline, unsigned char locked) | Change parameters (period, execution time, deadline, and whether migration is allowed)
OS_Fork() | Task management | int OS_Fork(void) | Create a copy of the task, with the same parameters
OS_TaskYield() | Task management | void OS_TaskYield(void) | Voluntarily request rescheduling
OS_Start() | Task management | void OS_Start(void) | Start the operating system
OS_TaskDeadlineMisses() | System information | unsigned int OS_TaskDeadlineMisses(unsigned char id) | Number of deadlines missed by a task
OS_TaskCpuUsage() | System information | unsigned int OS_TaskCpuUsage(unsigned char id) | Processor usage of a task
OS_TaskEnergyUsage() | System information | unsigned int OS_TaskEnergyUsage(unsigned char id) | Energy consumption of a task
OS_TaskMemoryUsage() | System information | unsigned int OS_TaskMemoryUsage(unsigned char id) | Memory usage of a task
OS_TaskParameters() | System information | unsigned int OS_TaskParameters(unsigned char id, unsigned short int *period, unsigned short int *capacity, unsigned short int *deadline, unsigned char *locked) | Parameters of a task
OS_TaskTicks() | System information | unsigned int OS_TaskTicks(unsigned char id) | Number of times a task has executed
OS_TaskLastTickTime() | System information | unsigned int OS_TaskLastTickTime(unsigned char id) | Time (in cycles) of the last execution of a task
OS_PacketsSent() | System information | unsigned int OS_PacketsSent(unsigned char id) | Number of packets sent by a task
OS_PacketsReceived() | System information | unsigned int OS_PacketsReceived(unsigned char id) | Number of packets received by a task
OS_LastContextSwitchCycles() | System information | unsigned int OS_LastContextSwitchCycles(void) | Time (in cycles) of the last context switch
OS_TaskIdFromUniqueId() | System information | unsigned char OS_TaskIdFromUniqueId(unsigned short int uid) | Convert a unique identification number into a local id
OS_CurrentTaskId() | System information | unsigned char OS_CurrentTaskId(void) | id of the current task
OS_CurrentTaskUniqueId() | System information | unsigned short int OS_CurrentTaskUniqueId(void) | Unique id of the current task
OS_CurrentCpuId() | System information | unsigned char OS_CurrentCpuId(void) | Number of the current processor
OS_CurrentCpuFrequency() | System information | unsigned int OS_CurrentCpuFrequency(void) | Frequency of the current processor
OS_NCores() | System information | unsigned char OS_NCores(void) | Number of processors in the MPSoC
OS_NTasks() | System information | unsigned char OS_NTasks(void) | Number of tasks on the current processor
OS_CpuUsage() | System information | unsigned int OS_CpuUsage(void) | Total processor usage
OS_EnergyUsage() | System information | unsigned int OS_EnergyUsage(void) | Total energy consumption
OS_MemoryUsage() | System information | unsigned int OS_MemoryUsage(void) | Total memory usage
OS_FreeMemory() | System information | unsigned int OS_FreeMemory(void) | Memory available for dynamic allocation
OS_EnterRegion() | Mutual exclusion | void OS_EnterRegion(mutex *m) | Enter a critical region
OS_LeaveRegion() | Mutual exclusion | void OS_LeaveRegion(mutex *m) | Leave a critical region
OS_SemInit() | Mutual exclusion | void OS_SemInit(semaphore *s, int value) | Initialize a semaphore with a given value
OS_SemWait() | Mutual exclusion | void OS_SemWait(semaphore *s) | Decrement a semaphore, waiting if necessary
OS_SemPost() | Mutual exclusion | void OS_SemPost(semaphore *s) | Increment a semaphore, releasing a waiting task if necessary
OS_Free() | Memory management | void OS_Free(void *ptr) | Free a memory region
OS_Malloc() | Memory management | void *OS_Malloc(unsigned int size) | Allocate a memory region
OS_SendPacket() | Communication | int OS_SendPacket(unsigned short int target_uid, unsigned char buf[], unsigned short size) | Send a data packet to the task with the given unique identification
OS_ReceivePacket() | Communication | int OS_ReceivePacket(unsigned short int *source_uid, unsigned char buf[], unsigned short *size) | Receive a data packet from a task with a unique identification
OS_TaskMigrate() | Task migration | int OS_TaskMigrate(unsigned char source_id, unsigned char target_cpu) | Migrate a local task to another processor
Fig. 17. Manual partitioning and mapping in Hellfire

Task Migration. The operating system currently has a primitive that allows explicit migration of tasks from one CPU to another. The OS_TaskMigrate() primitive performs this function and takes as parameters the identification of a local task and the destination processor.

Every task added to the system can be migrated, provided it was configured with the migration option TASK_CAN_MIGRATE. System tasks, such as the idle task and drivers, are configured with the option TASK_CANNOT_MIGRATE, and the migration primitive is prevented from migrating them.

Figure 18 shows an example of the migration primitive in use. In the example, 2 user tasks are assigned to CPU 0. After running for a time determined by the algorithm, the task migration executes the migration primitive and transfers the task i_am_alive to CPU 1.

Fig. 18. Example of task migration in the Hellfire operating system

Besides transferring a task to another CPU, the migration primitive inserts the global identification of the migrated task into a list. If a message is sent to the migrated task, the communication kernel driver replies with the location to which the task was migrated, so that the task on the source CPU can discover the new destination. If the task is transferred to a CPU where it has already been, its entry in the migration list is removed.

Currently, the operating system performs partial task migration, that is, only the data and the context of the migrated task are transferred to the destination CPU. This model was adopted initially because of factors such as the absence of hardware memory management (MMU) and the lack of toolchain support for generating fully relocatable code. Despite its limitations, partial migration offers advantages in some applications, where the time to migrate the code in addition to the data becomes prohibitive. According to [46], migration time can be reduced significantly with this method.

For partial migration to work, the code of the tasks to be migrated must be present on every CPU that is a likely migration target. In addition, the code must be aligned, that is, the code addresses of the tasks must be exactly the same. Code alignment relies on simple rules that establish the linking order: first, all operating system code whose addresses do not change is linked, then the application tasks, and finally the code that depends on the CPU in question, such as the scheduling policy and kernel drivers.

Design Flow. Bringing together the items explained above, Figure 19 shows the development flow adopted by the Hellfire system. The figure makes clear the division of the system into software configuration and hardware configuration: application development, project creation on the platform, and OS configuration are carried out as part of the final software. Processor settings (architecture and frequency, for example), customization of the available communication medium (bus or NoC), and even the final memory size of the system concern the final hardware, which will constantly interact with the previously configured software.

After this initial configuration of both software and hardware, a binary image containing the OS and the desired application is created and, together with a hardware description, passed to the N-MIPS tool for refinement and simulation. If modifications are needed, the framework itself allows returning to the desired point and redoing it. Once the expected result has been obtained, the same binary image (containing the application and the OS) can be tested on a real platform, provided that requirements such as device area are met.

IV. FINAL CONSIDERATIONS

This chapter presented an extensive review of the state of the art covering the main concepts of embedded systems. In addition to an initial conceptual overview, it offered examples and a description of typical characteristics found in these systems.
Fig. 19. Development flow of the Hellfire platform

Among these characteristics, the chapter highlighted Real-Time Systems, defined as those in which the result of a computation depends not only on its logical correctness but also on the time at which the task was completed. Important concepts of real-time systems were presented, with emphasis on scheduling, which is vital for the reader to understand the proposed system. The use of virtualization in embedded systems was also briefly discussed, highlighting the main advantages, problems, and use cases of this technique.

In the second part of the text, the Hellfire system was presented and its main component, HellfireOS, was described in detail. This system, typically real-time, has several features that help developers increase their productivity without compromising the tasks that carry timing requirements. The implementation of the task model in this operating system was therefore presented. Hellfire can also be regarded as an operating system that is extensible and adaptable to different architectures. The development flow adopted by the Hellfire platform (and all the tools that compose it) was also presented, in order to give an overall view of the system.

We hope that this work helps readers understand the importance of embedded software in current systems and encourages new developers to take on the current and future challenges of this rapidly developing area.

REFERENCES

[1] J.-M. Farines, J. da Silva Fraga, and R. S. de Oliveira, Sistemas de Tempo Real. São Paulo-SP: Second Escola de Computação, IME-USP, 2000.
[2] K. Vivekanandarajah and S. K. Pilakkat, “Task mapping in heterogeneous MPSoCs for system level design,” in ICECCS 2008: Proceedings of the 13th IEEE International Conference on Engineering of Complex Computer Systems. Washington, DC, USA: IEEE Computer Society, Apr. 2008, pp. 56–65. [Online]. Available: http://dx.doi.org/10.1109/ICECCS.2008.18
[3] A. Sangiovanni-Vincentelli, “Quo vadis, SLD? reasoning about the trends and challenges of system level design,” Proceedings of the IEEE, vol. 95, no. 3, pp. 467–506, 2007. [Online]. Available: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4167779
[4] Y. Cho, S. Yoo, K. Choi, and N.-E. Z. A. A. Jerraya, “Scheduler implementation in mp soc design,” in ASP-DAC ’05: Proceedings of the 2005 conference on Asia South Pacific design automation. New York, NY, USA: ACM Press, 2005, pp. 151–156.
[5] G. Marchesan Almeida, G. Sassatelli, and P. Benoit, “An adaptive message passing mpsoc framework,” International Journal of Reconfigurable Computing, vol. 2009, pp. 1–21. [Online]. Available: http://hindawi.com/journals/ijrc/2009/242981.pdf
[6] W. Wolf, “How many system architectures?” Computer, vol. 36, no. 3, pp. 93–95, 2003.
[7] L. Carro and F. R. Wagner, “Sistemas computacionais embarcados,” in Jornadas de Atualização em Informática 2003, 2003, ch. 2.
[8] L. Lavagno and C. Passerone, “Design of embedded systems,” in Embedded Systems Handbook, R. Zurawski, Ed. CRC press, 2005, ch. 3.
[9] R. S. de Oliveira, A. da Silva Carissimi, and S. S. Toscani, “Organização de sistemas operacionais convencionais e de tempo real,” in Jornadas de Atualização em Informática 2002, 2002, ch. 8.
[10] H. Hansson, M. Nolin, and T. Nolte, “Real-time in embedded systems,” in Embedded Systems Handbook, R. Zurawski, Ed. CRC press, 2005, ch. 2.
[11] R. A. Bergamaschi and W. R. Lee, “Designing systems-on-chip using cores,” in DAC ’00: Proceedings of the 37th conference on Design automation. New York, NY, USA: ACM Press, 2000, pp. 420–425.
[12] R. A. Bergamaschi, S. Bhattacharya, R. Wagner, C. Fellenz, M. Muhlada, W. R. Lee, F. White, and J.-M. Daveau, “Automating the design of socs using cores,” IEEE Des. Test, vol. 18, no. 5, pp. 32–45, 2001.
[13] K. K. Fellow, “Ieee transactions on computer-aided design of integrated circuits and systems, vol. 19, no. 12, december 2000 1523 system-level design: Orthogonalization of concerns and platform-based design.” [Online]. Available: citeseer.ist.psu.edu/756855.html
[14] W. Wolf, “How many system architectures?” Computer, vol. 36, no. 3, pp. 93–95, 2003.
[15] A. Jerraya, H. Tenhunen, and W. Wolf, “Multiprocessor systems-on-chips,” Computer, vol. 38, no. 7, pp. 36–40, July 2005.
[16] S. Pasricha and N. Dutt, On-Chip Communication Architectures: System on Chip Interconnect. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2008.
[17] G. J. Popek and R. P. Goldberg, “Formal requirements for virtualizable third generation architectures,” Commun. ACM, vol. 17, no. 7, pp. 412–421, 1974.
[18] G. Martin, “Overview of the mpsoc design challenge,” in DAC ’06: Proceedings of the 43rd annual conference on Design automation. New York, NY, USA: ACM Press, 2006, pp. 274–279.
[19] C. A. Waldspurger, “Memory resource management in vmware esx server,” SIGOPS Oper. Syst. Rev., vol. 36, no. SI, pp. 181–194, 2002.
[20] G. C. Buttazzo, “Real-time operating systems: The scheduling and resource management aspects,” in Embedded Systems Handbook, R. Zurawski, Ed. CRC press, 2005, ch. 12.
[21] C. L. Liu and J. Layland, “Scheduling algorithms for multiprogramming in a hard real-time environment,” Journal of the ACM, vol. 20, no. 1, pp. 46–61, 1973.
[22] J. Lehoczky, L. Sha, and Y. Ding, “The rate monotonic scheduling algorithm: exact characterization and average case behaviour,” IEEE Real-Time Systems Symposium, pp. 166–171, 1989.
[23] W. H. Hesselink and R. M. Tol, “Formal feasibility conditions for earliest deadline first scheduling,” Tech. Rep., 1994.
[24] M. Andrews, “Probabilistic end-to-end delay bounds for earliest deadline first scheduling,” in Proceedings of the IEEE INFOCOM 2000, 2000.
[25] W. Wolf, Computers as components: principles of embedded computing system design. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2001.
[26] S. Yoo, G. Nicolescu, L. Gauthier, and A. Jerraya, “Automatic generation of fast timed simulation models for operating systems in soc design,” in DATE ’02: Proceedings of the conference on Design, automation and test in Europe. Washington, DC, USA: IEEE Computer Society, 2002, p. 620.
[27] K. Popovici, X. Guerin, F. Rousseau, P. S. Paolucci, and A. Jerraya, “Efficient software development platforms for multimedia applications at different abstraction levels,” in Proceedings of the 18th IEEE/IFIP International Workshop on Rapid System Prototyping. Washington, DC, USA: IEEE Computer Society, 2007, pp. 113–122. [Online]. Available: http://portal.acm.org/citation.cfm?id=1263545.1263926
[28] H. Shen and F. Petrot, “Novel task migration framework on configurable heterogeneous mpsoc platforms,” in Design Automation Conference, 2009. ASP-DAC 2009. Asia and South Pacific, Jan. 2009, pp. 733–738.
[29] S. Subar, “Virtualisation to enable next billion devices,” Web, Available at http://www.embeddeddesignindia.co.in/ART_8800576093_2800003_TA_7cb7532e.HTM. Accessed at 10 Feb., 2009.
[30] G. Heiser, “The role of virtualization in embedded systems,” in IIES ’08: Proceedings of the 1st workshop on Isolation and integration in embedded systems. New York, NY, USA: ACM, 2008, pp. 11–16.
[31] A. Aguiar and F. Hessel, “Embedded systems’ virtualization: The next challenge?” Jun. 2010.
[32] A. Aguiar and F. Hessel, “Virtual hellfire hypervisor: Extending hellfire framework for embedded virtualization support,” to appear in Quality Electronic Design (ISQED), 2011 12th International Symposium on, 2011.
[33] XEN.org, “Embedded xen project.” Web, Available at http://www.xen.org/community/projects.html. Accessed at 10 Aug., 2010.
[34] W. River, “Wind river,” Web, Available at http://www.windriver.com/. Accessed at 2 Oct., 2010.
[35] V. VLX, “Real-time virtualization for connected devices,” Web, Available at http://www.virtuallogix.com/. Accessed at 2 Oct., 2010.
[36] Trango, “Trango hypervisor,” Web, Available at http://www.trango.com/. Accessed at 2 Oct., 2010.
[37] XtratuM, “Trango hypervisor,” Web, Available at http://www.trango.com/. Accessed at 2 Oct., 2010.
[38] R. Le Moigne, O. Pasquier, and J.-P. Calvez, “A generic rtos model for real-time systems simulation with systemc,” in Design, Automation and Test in Europe Conference and Exhibition, 2004. Proceedings, vol. 3, Feb. 2004, pp. 82–87.
[39] A. Aguiar, S. J. Filho, F. G. Magalhaes, T. D. Casagrande, and F. Hessel, “Hellfire: A design framework for critical embedded systems’ applications,” in ISQED. IEEE, 2010, pp. 730–737.
[40] R. Le Moigne, O. Pasquier, and J.-P. Calvez, “A generic rtos model for real-time systems simulation with systemc,” in DATE ’04: Proceedings of the conference on Design, automation and test in Europe. Washington, DC, USA: IEEE Computer Society, 2004, p. 30082.
[41] L. Sha, “Rate monotonic analysis for real-time systems,” Computer, vol. 26, pp. 73–74, 1993.
[42] G. L. Peterson, “Myths about the mutual exclusion problem,” Information Processing Letters, vol. 12, no. 3, pp. 115–116, 1981.
[43] G. Kane, MIPS RISC architecture. Upper Saddle River, NJ, USA: Prentice-Hall, Inc., 1988.
[44] O. Cores, “Plasma most mips i(tm) opcodes,” http://www.opencores.org.uk/projects.cgi/web/mips/, Accessed, September 2009, 2007.
[45] S. J. Filho, A. Aguiar, C. A. M. Marcon, and F. P. Hessel, “High-level estimation of execution time and energy consumption for fast homogeneous mpsocs prototyping,” in RSP ’08: Proceedings of the 2008 The 19th IEEE/IFIP International Symposium on Rapid System Prototyping. Washington, DC, USA: IEEE Computer Society, 2008, pp. 27–33.
[46] A. Mehran, A. Khademzadeh, and S. Saeidi, “Dsm: A heuristic dynamic spiral mapping algorithm for network on chip,” IEICE Electronics Express, vol. 5, no. 13, pp. 464–471, 2008.
  • CBSEC 2012 - CES-School paper 96934

Introduction to embedded systems and platform-based design

Marcio Seiji Oyamada, Alexandre Giron, João Angelo Martini

Abstract: Embedded systems design is restricted by functional and non-functional requirements such as performance, power and energy consumption, memory footprint, availability, reliability, cost and design time. These requirements are very different from those usually found in application development for desktop-based systems. Embedded systems are a multi-domain design, composed of digital and analog hardware and software components. Platform-based design aims to decrease design time by using a predefined hardware platform. Consequently, the design effort concentrates on mapping functionalities to the platform components. This short course provides an introduction to embedded systems design. Additionally, the course presents a development board for embedded systems called BeagleBoard. The main component of the BeagleBoard is the OMAP 3530 platform, which has two processors: a general-purpose ARM Cortex A8 and a C64x+ DSP.

Index Terms: embedded systems, platform-based design, requirements

Manuscript received April 30, 2012. This work was supported in part by Fundação Araucária and CNPq.
Marcio Seiji Oyamada, UNIOESTE – Curso de Ciência da Computação, Cascavel, PR, Brasil (e-mail: marcio.oyamada@unioeste.br).
Alexandre Augusto Giron, UNIOESTE – Curso de Ciência da Computação, Cascavel, PR, Brasil (e-mail: alexandre.giron@unioeste.br).
João Angelo Martini, UEM – Departamento de Informática, Maringá, PR, Brasil (e-mail: jangelo@din.uem.br).

I. INTRODUCTION

Nowadays, electronic devices with an embedded processor are increasingly common. These devices, called embedded systems (ES), have some similarities with general-purpose computers, but are usually designed to execute specific tasks. Marwedel [1] defines an embedded system as an information processing system that is integrated inside a product and usually not visible to the user. Cell phones, avionics systems and car navigation systems are examples of embedded systems.

The development of hardware and software components for embedded systems differs from the development of general-purpose desktop software. In ES development, the hardware layer needs to be considered, because it is an important component in meeting requirements such as performance, power and energy consumption, cost and design time.

Performance is directly related to the power and energy requirements. A high-performance processor tends to consume more power. This is a key requirement in embedded system design, since most of the devices are powered by batteries. High power consumption requires frequent recharging, which is inconvenient in devices such as mobile phones.

Fault tolerance is especially important in critical embedded systems, such as train and aircraft controllers and ABS brakes. In such cases, the system must be designed to mask the failure of a component.

The restrictions of the ES must be satisfied in order to ensure a final product that meets the requirements. However, the design of an ES must also be completed in the shortest time possible, because design time is a critical factor that directly affects market acceptance and profit.

Advances in microelectronics have enabled the development of innovative embedded systems. Increases in the integration capacity of transistors have allowed the development of solutions with various components integrated into a single chip, such as processors, memory, and analog and digital interfaces. These solutions are called system-on-chip (SoC), or MPSoC when multiple processors are integrated on the chip. The main advantages of SoCs are [2]:
• Increased operation speed, because the communication between the processor and the other components is performed on chip;
• Reduced power consumption and size;
• Increased reliability compared to solutions with multiple ICs;
• Potentially low-cost solutions.

This paper presents some important topics in embedded system design. Section II describes the key requirements in embedded system design. Section III presents the overall design flow of an embedded system and details the platform-based and IP-based design methodologies. In Section IV, some MPSoCs are described, and Section V describes the OMAP 3530 platform. Section VI presents the conclusions.

II. EMBEDDED SYSTEM REQUIREMENTS

A. Performance and power consumption

The performance requirement depends on the application characteristics. Multimedia applications require higher performance than word processing. Performance is directly related to power consumption, since processors with higher performance tend to consume more power.
In battery-powered devices, both peak power consumption and energy consumption must be taken into account. In order to increase the utilization time of an embedded system, it is necessary to reduce the energy consumed. In addition, since each battery supports only a limited peak power, the maximum power of the system is also an important requirement. The decision between consuming more power for less time, or decreasing the power consumption and increasing the processing time, is not trivial. The energy (E) used by a device results from the power consumed P over the time t, as shown in Equation 1.

E = ∫ P dt    (1)

According to [3], the dynamic power consumption in CMOS (Complementary Metal-Oxide-Semiconductor) circuits can be obtained by means of Equation 2,

P = (C × A × F × V²) / 2    (2)

where C is the switching capacitance, A is the switching activity, F is the operation frequency and V is the voltage. In CMOS circuits the frequency can also be considered linearly proportional to the voltage [4].

Based on Equation 2, the voltage has a quadratic impact on the power consumed by the circuit. Because the frequency is roughly proportional to the voltage, lowering frequency and voltage together reduces power approximately with the cube of the frequency. This has motivated the use of multiprocessor architectures with small cores, where one can get the same performance while reducing the total system power consumption. However, it is important to note that the same performance is obtained only if the division of processing among the processors is perfect. The parallelization of an application involves the communication and synchronization of multiple tasks, which decreases efficiency. If these losses are not significant, the solution with multiple cores can be advantageous due to the reduction in power consumption.

Thus, we can reduce the energy consumption by decreasing the power or by decreasing the processing time. In the case of a processor, one can increase the operating frequency to complete the task faster, resulting in an increase in power consumption. Another solution is to decrease the frequency of the processor, reducing power consumption. However, this choice results in a longer processing time t. This scenario is described in Figure 1, where the line P1 represents a processor with a higher frequency and the line P2 represents a processor with a lower frequency. The choice of approach is made during design space exploration. The designer should check the requirements, because some applications need to be completed quickly, while others must be executed with the best balance between power and processing time.

Fig. 1. Energy consumption as a factor of power and processing time [1]

To improve the tradeoff between performance and power consumption, some processors include features to reduce power consumption, such as DVS (dynamic voltage or frequency scaling), clock gating and power states.

The DVS technique reduces the frequency or voltage of the processor, thus reducing its power consumption as stated in Equation 2. The Intel XScale is an example of an embedded processor with DVS, varying the operating frequency from 150 MHz to 800 MHz with power ranging from 0.1 to 0.9 watts, as shown in Figure 2. However, it is important to note that performance varies with the frequency changes.

Fig. 2. XScale frequency vs. power consumption [5] (chart: power in watts and performance in MIPS at 150, 600 and 800 MHz)

The clock gating method is used in synchronous circuits; the basic idea is to disable portions of the circuit that are not being used. In terms of Equation 2, when clock gating is enabled the activity A of the gated portion becomes zero, and consequently its dynamic power consumption is also zero, resulting in significant power savings.

The use of different operational states is another low-power technique used in embedded processors. This technique defines different states that turn off some of the components when they are not required. The StrongARM processor [6] is an example of the use of this technique. The Run state provides full processor operation. In the Idle state, the CPU is disabled and the interrupts are monitored. In the Sleep state, the CPU is completely turned off and only a few interrupts, such as the real-time clock, are enabled. As shown in Figure 3, the Sleep state offers the greatest energy savings, but it takes more time to return to the Run state.
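The relation between Equations 1 and 2 and the P1/P2 tradeoff can be illustrated with a small calculation. The following is a minimal sketch, not a model of any real processor: the capacitance, activity, cycle count and the two operating points are hypothetical values chosen for illustration, and the voltage is assumed to scale linearly with the frequency.

```python
def dynamic_power(c, a, f, v):
    """Equation 2: P = C * A * F * V^2 / 2, in watts."""
    return c * a * f * v ** 2 / 2.0


def energy(power, seconds):
    """Equation 1 for a constant power draw: E = P * t, in joules."""
    return power * seconds


# Hypothetical workload and circuit parameters (not taken from the text).
CYCLES = 600e6   # clock cycles needed by the task
C = 1e-9         # switching capacitance, in farads
A = 0.5          # switching activity

# Two hypothetical operating points: (frequency in Hz, voltage in V).
# The voltage is assumed to scale linearly with the frequency.
operating_points = {"P1": (800e6, 1.6), "P2": (400e6, 0.8)}

for name, (freq, volt) in operating_points.items():
    p = dynamic_power(C, A, freq, volt)
    t = CYCLES / freq  # processing time for the fixed amount of work
    print(f"{name}: power={p:.3f} W, time={t:.2f} s, energy={energy(p, t):.3f} J")
```

Under these assumptions, halving frequency and voltage together lowers the power by a factor of 8 and the energy by a factor of 4, while the processing time doubles. Setting the activity A to zero, as clock gating does for an unused block, makes `dynamic_power` return zero.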
Fig. 3. Power states in a StrongARM processor [6]

B. Development time, cost and design

Another important requirement in embedded system design is the development time. The appeal of these products usually depends on innovation and novelty, which makes it necessary to decrease design time. Large profits from an innovative product are achieved early in its life cycle, when there is normally no competition from similar products. As the novelty factor fades, profitability decreases and new innovative products must be developed. This pushes the industry to develop new products while reducing the design time available.

Embedded systems also suffer from seasonal factors, for example in the case of video game consoles, whose launches are scheduled before Christmas.

Cost is also an important requirement in the design, because products for the consumer market are very sensitive to the final product price. Thus, any possible optimization that reduces cost is a competitive advantage. In high-volume products, a US$ 1 reduction in product cost has a significant impact.

Embedded systems differ in many aspects from desktop computers, where the design is focused on the user interface. In an embedded system, another factor to be considered is the physical design of the product. Thus, the design must consider the size, shape, material used and input/output interfaces. All these aspects have a direct influence on the hardware and software to be developed.

III. DESIGN METHODOLOGIES

This section presents some aspects related to the design flow of embedded systems. Initially, we present the different abstraction levels that can be used in embedded system design. Additionally, we describe platform-based and component-based design, discussing the differences between these two approaches.

A. Abstraction Levels

Abstract descriptions provide a suitable way to manage design complexity, hiding implementation details that the designer may want to leave out at some point. As a consequence, the description is short and easier to understand.

The design flow must define different abstraction levels and refinement steps that lead to a final solution [7]. Ideally, the designer would have the benefit of automatic refinement from higher abstraction levels to system implementation. In practice, embedded system design is quite complex and the available tools do not cover all design steps.

Fig. 4 shows an ideal embedded system design flow. The system specification describes the behavior of the system under development. Software engineers see the system specification as a document that describes the required functionalities using abstract representations. For instance, using the UML notation, a system may be represented by class and use case diagrams. Usually, in electronic design, an executable specification is used to represent system behaviour. Some languages have been proposed for electronic system specification, such as SystemC [8] and SpecC [9]. These languages are extensions of existing imperative languages (for example, C++) and support hardware-oriented descriptions. However, the research community is now focusing on the use of more abstract specification languages such as UML and Simulink.

Fig. 4. Embedded system design flow [10]

Architecture exploration uses the specification to define the golden architecture that covers application requirements in terms of performance, power consumption, energy consumption, and area, among others.

Due to strict requirements, the design of an embedded system usually involves the development of both hardware and software. Thus, it is necessary to explore the design space to obtain the best configuration of the hardware architecture for a given application. This phase helps designers detect and resolve problems related to architectural design. According to Carro and Wagner [10], the exploration phase answers three questions:
1) How many and which processors and dedicated blocks are needed?
2) What is the ideal mapping of functions onto the hardware components?
3) What is the ideal communication infrastructure to connect the components in the architecture?

From the architecture exploration step, a macroarchitecture
that represents the system in terms of software and hardware components is obtained. Each component is then refined following the traditional hardware and software design flow. For the software, this includes the software and RTOS generation. The hardware side includes the synthesis of specific hardware components. The communication synthesis includes the generation of the hardware and software components needed to implement communication. In some cases communication protocols may require hardware components such as co-processors and channel adapters, which are responsible for adapting the internal component bus to the interconnection network.

The final step includes the hardware and software integration and the validation of the whole solution.

B. Platform-based Design

The design of a new architecture involves non-recurring engineering (NRE) costs that are not negligible in the overall cost of manufacturing and designing a SoC [11]. Due to these costs, developing a new architecture from scratch for each new product becomes unacceptable. Consequently, platforms are proposed to cover an application domain, and then tailored to a specific product.

Platform-based design [12] uses architecture templates to obtain a solution called a derivative, by tailoring the platform for a given application. Architecture templates are domain-specific hardware platforms composed of processors, memories, hardware blocks, and communication structures. Occasionally, these components have some degree of configurability, such as processor cache and memory sizes.

Fig. 5 presents the overall platform-based design flow. The platform is defined from past designs and the requirements of a group of applications or a domain. A solution is obtained by taking the base platform and customizing it according to the user needs. This includes software development, user interface and hardware customizations.

Fig. 5. Platform-based design [12] (diagram: requirements and past designs define the platform; user needs lead from the platform to the product)

Platform-based design provides gains in terms of design time and cost. Application mapping to platform components must be efficient and handled by system-level design tools.

C. Component-based Design

In component-based design the architectural template is implemented by assembling hardware and software IP components available in a library or provided by third-party companies. Components should comply with a given protocol, thus making their integration into the platform possible. The reuse of pre-tested components reduces design time and facilitates the verification of the solution in terms of expected system functionality and requirements.

Component-based design requires a well-defined process involving IP creation, qualification, and classification [13] on the IP provider side. On the client side, IP integration includes the search process, validation, and final integration with the platform. The integration step is highly influenced by the IP distribution form. IP components may be distributed in hard form, when all gates and interconnects are placed and routed; soft form, with only an RTL representation; or firm form, with an RTL description together with physical floorplanning or placement. Using hard IP components has the advantage of yielding more predictable estimations of performance, power, and area. However, they are less flexible and therefore less reusable than adaptable components.

IP integration poses problems due to heterogeneous and hard IP components. The bus-based approach uses a standard interconnection, with which the IP interface must comply, allowing plug-and-play integration. AMBA [14] and CoreConnect [15] are examples of standard buses available in the market. When the source code is available, the IP component may be changed and adapted for the target platform. Another solution is to construct a wrapper around the component that adapts it to the bus or the interconnection network. Software IP components are standardized by the API and target OS. OSEK [16] (for automotive systems) and ITRON [17] (for consumer electronics) are examples of domain-specific APIs.

IV. MPSOC ARCHITECTURES

The increase in performance requirements and the restrictions on power consumption shown in the previous sections call for solutions using multiple processors (homogeneous or heterogeneous) in embedded systems, called MPSoC (multiprocessor system-on-chip) platforms. MPSoC design opens many possible solutions in terms of processor architectures, IP components, and interconnection structure. The next sections present the hardware trade-offs for MPSoCs.

A. Processor

Fig. 6 shows the market share for each type of embedded 32-bit processor. In contrast to personal computer processors, the market here is shared among different architectures and manufacturers. These different architectures provide various options in terms of performance, power consumption, area, and cost.
Fig. 6. Market share of 32-bit embedded processors [18] (pie chart: ARM 57%, Proprietary 16%, MIPS 12%, PowerPC 6%, x86 3%, 68K 3%, SuperH 2%, Other 1%)

Processor microarchitecture design has an important impact on MPSoC quality. Microarchitecture optimization for a given application includes pipeline configuration, branch prediction, and prefetch, among others. Processor data size is another design parameter, since embedded applications require a minimal size. Processor cores are available in different versions of 8, 16, and 32 bits. Currently, most embedded software remains unchanged after product deployment, making it possible to tune architectural parameters according to system requirements.

Application-specific instruction-set processors (ASIP) optimize the architecture by creating new instructions to efficiently execute a given application. Commercial processors like Tensilica [19] are sold with an environment that analyzes the application C code in order to configure and derive the optimized architecture.

The multimedia domain is composed of processing-intensive applications and requires the use of more performance/power efficient architectures, such as digital signal processors (DSP). These processors optimize the execution of DSP algorithms using MAC (multiply and accumulate) units, address generators, and Harvard architecture, among other features. DSP processors efficiently execute digital signal processing algorithms and can run at low frequencies compared to general-purpose processors (GPP), consequently decreasing energy consumption.

Very long instruction word (VLIW) processors also provide an efficient architecture to execute processing-intensive applications, exploiting instruction-level parallelism (ILP) at compilation time. For this reason, VLIW processors do not require the complex dispatch units and speculative techniques used in general-purpose processors, since ILP is statically extracted.

The purpose of multithread architectures is to efficiently execute multithread applications by supporting fast context switch and concurrent execution. Fast context switch provides a way to hide memory latency by executing other threads when a memory access occurs.

Low-power techniques such as frequency/voltage scaling or operation states, as presented in Section II, need the OS or another supervisor component to control their use. Normally, for laptops, the processor dynamically adjusts the frequency/voltage based on application demand. However, these techniques affect processor performance and system response. As a consequence, they require an integrated application and OS design in order not to disturb the real-time behavior that is commonly required for embedded applications.

B. Memory

Memory design has an important impact on processor performance and power consumption. For embedded processors, cache design is important because caches account for about 50% of core power consumption (see Fig. 7).

Fig. 7. Processor power consumption [20]

Fig. 8 presents work done by Zhang et al. [20] showing the influence of cache size on global energy consumption. It can be seen that global energy is directly related to cache size. Initially, when cache size increases, global energy decreases because of fewer memory accesses. However, after a given point, the energy consumed by the cache itself dominates global energy consumption, despite the small number of memory accesses. The same scenario occurs for processor performance [21]. After a given point, an increase in cache size will not result in an increase in performance, because the application reaches the limit of its temporal and spatial locality.
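The saturation effect described above can be reproduced with a toy cache model. The following is a minimal sketch assuming a direct-mapped cache with 16-byte lines and a hypothetical workload that sweeps a 4 KB array 20 times; neither the cache organization nor the workload comes from [20] or [21].

```python
def miss_rate(cache_lines, addresses, line_size=16):
    """Miss rate of a direct-mapped cache with `cache_lines` lines."""
    tags = [None] * cache_lines   # block currently stored in each line
    misses = 0
    for addr in addresses:
        block = addr // line_size       # memory block holding this address
        index = block % cache_lines     # line the block maps to
        if tags[index] != block:        # miss: fetch block, evict old one
            tags[index] = block
            misses += 1
    return misses / len(addresses)


# Hypothetical workload: 20 sequential passes over a 4 KB array,
# reading one 4-byte word at a time.
WORKING_SET = 4096
trace = [a for _ in range(20) for a in range(0, WORKING_SET, 4)]

for size in (1024, 2048, 4096, 8192, 16384):
    lines = size // 16
    print(f"{size:6d} B cache: miss rate = {miss_rate(lines, trace):.4f}")
```

In this sketch the miss rate drops once the cache holds the whole 4 KB working set, and larger caches bring no further improvement, since only the compulsory misses of the first pass remain. A larger cache would still cost more energy per access, which is the effect that dominates the right-hand side of the curve in Fig. 8.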
Fig. 8. Cache size and its influence on system energy consumption [20] (chart: cache, memory and total energy in joules for cache sizes from 1 KB to 1 MB)

Other techniques are available to decrease power consumption and execution time in memory hierarchies. Scratchpad or fast memories are small memories inside the processor core used to decrease access time and power consumption. The main difference from a cache is that their contents are directly loaded by the application, making the programmer responsible for choosing which data and instructions should be placed in fast memory. This technique makes execution time more predictable in comparison to caches, which can be polluted by other tasks. Jain et al. [22] propose a technique to lock cache lines, avoiding undesirable line substitution. In both techniques, knowledge of application behavior is necessary to optimize scratchpad and cache use.

C. Interconnection

SoC interconnection design complexity is increasing due to the number of components and sophisticated communication schemes. Ad-hoc solutions cannot deal with concerns regarding flexibility and design time, which can only be addressed by long-term solutions that can cope with future MPSoC requirements.

Point-to-point connections, shown in Fig. 9(a), enable designs customized in terms of performance and predictability. However, long design time and low reuse make point-to-point interconnections impracticable in future MPSoC designs.

Current MPSoC designs commonly adopt bus-based solutions (see Fig. 9(b)). Due to scalability problems, many variations, such as hierarchical buses and time-sliced arbitration, have been proposed.

The network-on-chip (NoC) approach represents a long-term solution for MPSoC design. A NoC, as shown in Fig. 9(c), provides the scalability and reuse necessary for future MPSoC designs. Predictability and real-time requirements call for NoC solutions with quality-of-service (QoS) capabilities. Currently, NoCs are a subject of intense research. However, few real designs exist, due to high latency and area overhead when compared to other interconnection solutions.

Fig. 9. Communication topologies: (a) point-to-point, (b) bus-based connection, and (c) network-on-chip

To improve reusability, the communication interconnection is provided in the form of an IP component [14] that must be configured for a given application (for example, number of masters in a bus, switch buffer size in a NoC). This requires tools to explore the communication structure and to link application QoS requirements to the real implementation.

D. MPSoC Platforms

MPSoCs with heterogeneous processors have been proposed to reduce energy consumption and increase performance in specific tasks. A heterogeneous MPSoC usually has a general-purpose processor, which runs the operating system, and one or more special-purpose processors such as graphics or digital signal processors.

In this section some MPSoC platforms are presented. Fig. 10 shows the NovaThor [23] platform, targeted at mobile phones and multimedia PDAs. Nova is a multimedia platform composed of a dual-ARM processor and audio and video accelerators. Many I/O interfaces, such as an LCD controller, USB, and flash card, are available.

Fig. 10. NovaThor U9500 architecture

OMAP 1610 [24] is another example of a platform targeted for use in multimedia mobile devices. The platform is composed of two processors: a general-purpose processor (ARM926), used to execute system-level tasks, and a DSP used for multimedia processing (see Figure 11). The SoC also integrates digital interfaces with external devices. An API called OMAPI is provided to access the multimedia resources available in the DSP processor, thus abstracting the hardware architecture. The OMAP platform leaves the programmer responsible for detecting code suitable for execution in the DSP.
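Returning to the scalability argument of the interconnection discussion (Fig. 9), a simple link count shows why dedicated point-to-point wiring does not scale. The following is a rough sketch under two assumptions that are not stated in the text: the point-to-point design connects every pair of components directly, and the NoC is a 2D mesh with one link between neighbouring routers.

```python
def point_to_point_links(n):
    """Dedicated links needed to connect every pair of n components."""
    return n * (n - 1) // 2


def mesh_links(rows, cols):
    """Links between neighbouring routers in a rows x cols 2D mesh."""
    horizontal = rows * (cols - 1)
    vertical = cols * (rows - 1)
    return horizontal + vertical


for side in (2, 4, 8):
    n = side * side
    print(f"{n:3d} components: "
          f"point-to-point={point_to_point_links(n):5d} links, "
          f"mesh NoC={mesh_links(side, side):4d} links")
```

The fully connected design grows quadratically with the number of components, while the mesh grows roughly linearly. A shared bus needs only one medium, but all components then compete for it, which is the scalability problem that hierarchical buses and arbitration schemes try to mitigate.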
Fig. 11. Platform OMAP 1610

Figure 12 shows the Philips Nexperia [25] platform. Nexperia is a heterogeneous platform composed of a general-purpose processor (MIPS), DSP processors, and various application-specific hardware accelerators. The memory controller manages communication and is interconnected with the different buses available in the platform. Bridges share communication among the subsystems, avoiding overload of the memory controller.

Fig. 12. Philips Nexperia PNX8550

Programming models for MPSoC platforms have become a major issue, due to the programming complexity of coordinating platform elements. UHAPI is Nexperia's abstract programming model and is used for home applications based on use cases. UHAPI brings platform programming close to software engineering models such as UML, by providing high-level use cases for the most common needs of home applications. For instance, the API provides use cases to play DVDs, record movies, and so on. This represents an important tendency, because the value of the platform is attributed not only to the hardware solution, but also to the API that is provided.

V. PLATFORM OMAP 3530

The OMAP3530 [26] (Open Multimedia Applications Platform) is an MPSoC for mobile and portable multimedia devices developed by Texas Instruments (TI). Some examples of embedded devices using the OMAP 3 platform are presented in Table I.

TABLE I
EMBEDDED DEVICES CONTAINING THE OMAP 3 PLATFORM

Device               Type                   MPSoC
Touch Book           Netbook                OMAP 3530
Pandora              Portable Video Game    OMAP 3530
DevKit8000           Evaluation Kit         OMAP 3530
BeagleBoard          Evaluation Kit         OMAP 3530
Motorola Milestone   Smartphone             OMAP 3430
Nokia N900           Smartphone             OMAP 3430
Samsung i8910        Smartphone             OMAP 3430
Galaxy S             Smartphone             OMAP 3630
Droid 2              Smartphone             OMAP 3630
Milestone 2          Smartphone             OMAP 3630

The OMAP 3530 is a heterogeneous MPSoC with a general-purpose ARM Cortex A8 processor and a C64x+ digital signal processor (DSP). These two processors are discussed in more detail in the following sections.

A. ARM Cortex A8

ARM (Advanced RISC Machine) is a 32-bit RISC (Reduced Instruction Set Computer) architecture targeting the embedded market [27]. The ARM company holds the rights to the architecture, and other companies wishing to produce ARM processors need to be licensed. ARM is responsible for the evolution and development of new architectures. There are two license types: implementation and architecture. The implementation license provides all information required to produce integrated circuits containing the ARM processor. The architectural license gives the right to develop a processor with an ARM-compatible instruction set.

Several factors make the ARM processor suitable for embedded systems. Its simple RISC architecture requires fewer transistors, consequently decreasing footprint, cost and power consumption.

The ARM architecture has many features of typical RISC architectures, but it is not entirely RISC. When the first RISC processors emerged, their objectives were to reduce the complexity of the instructions and to obtain higher operating frequency and performance through the use of pipelining. Issues such as power consumption, size and low production cost were not the main objectives, although the RISC architecture contributes to some of them because of its simplicity.

In RISC processors, the executable size is larger than in CISC architectures, because more instructions are needed to represent the same behavior. In the case of
embedded devices, where memory requirements are strict, some changes were made in ARM to reduce the memory occupied by the code. Sloss, Symes and Wright [28] describe the main differences between pure RISC processors and ARM:
• Different execution cycles for some instructions: the number of registers involved in execution and the type of memory access (sequential or random) affect the number of cycles of each instruction;
• Shift preprocessing: before reaching the ALU, one operand can be modified using shift operations, expanding the operand before it is used. This reduces the code size;
• 16-bit Thumb instruction set: this increases code density by approximately 30% when compared to fixed-length 32-bit instructions;
• Conditional execution: an instruction can be executed only when a specific condition is satisfied. This avoids explicit branch instructions, improving performance and code density;
• Specific instructions: some instructions were added for digital signal processing, such as the integer multiplication of 16-bit operands with saturation;
• Multiple load/store: memory access instructions can operate on multiple registers.

1) Coprocessors

One feature that makes ARM processors suitable for embedded systems is the possibility of extending the instruction set using coprocessors. Coprocessors are special-purpose processors designed to extend the functionality of the processor or improve performance for a given domain. The ARM core, for example, does not contain floating-point instructions, requiring software emulation of these operations. However, a floating-point coprocessor can be added to the architecture, improving the performance of applications that use floating-point operations. Thus, a device that does not require floating-point calculations can use an ARM processor without the floating-point unit. This reduces the cost, size and power consumption of the processor. On the other hand, a device that performs a large number of floating-point operations may include a floating-point coprocessor.

The ARM Cortex A8 supports up to 16 coprocessors, numbered from CP0 to CP15. The instructions to use the coprocessors are part of the ARM instruction set. As a consequence, all features available in the coprocessors are accessed through assembler instructions, or instructions for the ARM core.

As previously mentioned, the ARM architecture is licensed for manufacturing by third parties. The licensees then produce their ARM processors, adding more or fewer coprocessors to the ARM core according to the application and the target market. By default, the compiler does not use the ARM coprocessors to generate machine code, but the compiler can be told which coprocessors are available.

2) ARM Cortex-A8

The ARM Cortex-A8 architecture is based on ARMv7, and the operating frequency can vary from 600 MHz to 1 GHz, depending on the processor model. One particular element of the ARM Cortex architecture is the NEON coprocessor. It has SIMD (single instruction, multiple data) instructions that operate on 128-bit vectors and can be used to speed up multimedia processing such as audio and video, among others.

The ARM Cortex-A8 pipeline is organized in 3 stages (see Figure 13):
• Fetch stage (3 cycles)
• Decode stage (4 cycles)
• Execution stage (6 cycles)

Fig. 13. Pipeline ARM Cortex A8 [29]

The execution stage of the pipeline has three functional units: two ALUs and one load/store unit (LS). This decreases the occurrence of pipeline stalls and enables superscalar execution.

B. DSP TMS320C64X+

Analog signals represent physical quantities, such as pressure and temperature, that vary continuously over time. To process them computationally, a digital representation is necessary. A digital signal in this case is nothing more than a sequence of discrete states that encodes a message. In general, digital signal processing functions are repetitive and numerically intensive mathematical operations on real-time signals [30].

Digital Signal Processors (DSP) are specialized processors that handle signals of various types (data, video, audio, etc.) and have advantages in terms of performance, cost and power consumption [31, 32]. In contrast to general-purpose processors (GPP), they are developed to efficiently execute DSP algorithms [31]. Operations such as multiply-and-accumulate (MAC) and SIMD (Single Instruction, Multiple Data), which operate in parallel on a set of data, are commonly supported by DSP processors.

Since DSP processors are designed to efficiently execute algorithms for digital signal processing, their architecture is different from that found in GPPs. Older DSP processors were very different from general-purpose processors, using fixed-point arithmetic instead of floating point, and Harvard architecture instead of the traditional Von Neumann architecture [32]. Such differences are less marked in the latest generations of DSPs.

Some DSP processors now implement floating-point arithmetic, such as the TMS320C67x generation from Texas Instruments (TI). On the other hand, Harvard architectures, which have separate caches for data and instructions, are now used by many GPPs. However, most DSP processors still use fixed-point arithmetic, allowing a reduction in hardware complexity and energy consumption. In DSP algorithms this characteristic is not a
  • CBSEC 2012- CES-School paper 96934 9problem because digital signals have a well defined range and on average. The minimum speed up achieved was 1.1 timescan be discretized with little or no loss of information. and the maximum of six times when compared to the The TMS320C64x+ processor, or just c64x+ is a high ARM9[24].performance processor for digital signal processing with fixed- The C64x+ processor has eight functional units (L1, L2, S1,point, manufactured by Texas Instruments (TI). It belongs to S2, M1, M2, D1 and D2) and are divided into two datapathsthe TMS320C6000 generation and has a VLIW with four units each (Figure 14). Each datapath has a set of 32architecture[33]. The specifications for the C64x+ are shown registers, and it is also possible to communicate with the otherin Table II. datapath with a penalty of few cycles. The format of the The C64x+ processor reaches the performance of up to instructions in c64x+ processor are based on RISC and VLIW4160 million instructions per second (MIPS). The architecture architectures. Each instruction has a fixed format of 32 bits,is composed 64 registers (32-bit) and eight general-purpose but the compiler can divide it into two parts of 16-bit wherefunctional units. With the maximum possible parallelization, possible. In a VLIW architecture, the compiler is responsibleeach functional unit will perform one of the eight instructions for the instruction ordering and schedule, in order to maximizein a VLIW instruction every clock cycle. the parallel execution. It is different compared to superscalar Making a comparison with a general-purpose processor that architectures where the processor manages the order of theexecutes one instruction per cycle, the nominal performance instructions and decide when they can execute in parallel dueshows that the DSP processor is eight times faster than the to the data dependencies. It results in savings of hardware andgeneral purpose processor. 
However, in real applications the energy consumption, because all work done by the hardware ingains are no more than four times. superscalar architectures is done by the compiler in VLIW Table III presents the results comparing the performance of architectures. Another advantage of the VLIW architecture isa DSP processor and the ARM9 RISC processor. The DSP that the optimization is performed only once in the compilationprocessor is a C55x and has similar features of the C64x +. and has no time constraint.The performance gains of the DSP processor are three times TABLE II C64X SPECIFICATIONS Clock Frequency 700 MHz Instructions per second Up to 5600 Million (MIPS) Cache L1: 256 Kb L2: 640 Kb Registers 64 General purpose Functional Units 2 multipliers 6 ULA Instructions fetched by cycles 8-14 instructions: 256 bytes Operands fetched by cycle 4-8 operands: 256 bytes Arithmetic Fixed-point arithmetic TABLE III ARM9 VS C55 PERFORMANCE ARM9E1 GPP StrongARM DSP Performance 11001 TMS320C55101 DSP/ARM2 Echo Cancellation 16-bit (32 ms - 8 kHz) 24 39 4 6x Echo Cancellation 32-bit (32 ms - 8 kHz) 37 41 15 2.46x MPEG4/H263 Decoding QCIF @ 15 fps 33 34 17 1.94x MPEG4/H263 Coding QCIF @ 15 fps 179 153 41 3.73x JPEG (QCIF) Decoding 2.1 2.06 1.2 1.71x MP3 Decoding 19 20 17 1.11x Proportional Average cycle to C5510TM 3.1 3 11 Performance in MIPS (millions of instruction pe second)2 DSP speed up compared to the ARM9 processor
  • Fig. 14. C64x architecture overview [34]

The format of the instructions is shown in Figure 15. The p-bit field determines whether an instruction can run in parallel with other instructions. The p-bits are read from right to left. If the p-bit of an instruction I is 1, instruction I+1 can be executed in parallel with instruction I. If the p-bit of an instruction is 0, instruction I+1 must be executed after instruction I. Thus, up to eight instructions can be executed in parallel, as long as each one uses a different functional unit.

Fig. 15. C64x+ instruction format [34]

The pipeline of the C64x+ DSP is divided into three stages [34]:
• Fetch (4 cycles)
• Decode (2 cycles)
• Execution (5 cycles)

In the fetch stage, a packet of eight instructions is fetched from memory. The fetch stage has four phases for all instructions: PG (program address generate), PS (program address send), PW (program access ready wait) and PR (program fetch packet receive). During PG the program address is generated. In PS the program address is sent to memory. In PW the memory read occurs. Finally, in PR the fetched packet is received by the CPU.

In the decode stage, an instruction packet is divided into execute packets. An execute packet consists of one to eight instructions that can be executed in parallel. The decode stage has two phases: DP (instruction dispatch) and DC (instruction decode). During DP the instructions are assigned to functional units, and in DC the registers are decoded for execution inside the functional units.

The execution stage is divided into five phases (E1-E5). Different types of instructions require different numbers of cycles. A 16-bit multiplication requires two cycles. A store instruction requires three cycles, while a load instruction requires five. Instructions may take more cycles because of stalls that occur during execution. For instance, if an operand is not in the cache, it must be fetched from main memory, which consumes more cycles. Figure 16 shows a full C64x+ pipeline without stalls.

Fig. 16. C64x pipeline

The C64x+ also supports SIMD instructions that operate on several 8-bit or 16-bit values packed into a register. Table IV presents some of the SIMD instructions supported by the C64x+. Figure 17 shows the result of executing SADDU4, which adds four byte-sized integers with saturation. The saturation is performed directly in hardware, whereas a software implementation needs at least two instructions per value: a compare and an assignment.

TABLE IV. C64X+ SIMD INSTRUCTIONS
  uint _saddu4(int src1, int src2);     Performs saturated addition between the 8-bit unsigned values in src1 and src2
  double _mpy2(int src1, int src2);     Returns the products of the low and high 16-bit values in src1 and src2
  int _subabs4(int src1, int src2);     Calculates the absolute value of the difference between src1 and src2 for each 8-bit packet
  uint _avgu4(uint src1, uint src2);    Calculates the average of each pair of 8-bit values
  uint _swap4(uint src);                Exchanges the pair of bytes within each 16-bit value

Fig. 17. C64x+ SIMD instructions and saturation example

C. DSP Programming

To obtain the best performance from a DSP processor, the application must be developed with the processor architecture in mind. Performance can be improved with SIMD or VLIW instructions, which execute multiple instructions in parallel on the multiple functional units available in the architecture. There are basically three steps in developing code for a DSP processor.

The first step is to write standard C code and identify the time-consuming parts. In DSP applications these parts are usually loops.

In the second step, the designer optimizes the time-consuming parts by passing as much information as possible to the compiler: for instance, the minimum and maximum number of iterations of a loop, whether data is aligned, and whether a memory location can be addressed through different pointers. With this information the compiler can, for example, unroll a loop, pack four operations into a single SIMD instruction, or determine which instructions can be executed in parallel. The programmer can also use intrinsic functions, which map directly to assembler instructions. An example is _saddu4, which adds four bytes with saturation and is translated into the assembly instruction SADDU4. The transfer between memory/cache and the CPU can also be optimized with intrinsics that read or write up to 8 bytes of memory in a single access.

The last step is to write assembler code for the main functions if the desired performance has not been reached. When coding in assembler, the developer can decide on which functional unit a given instruction will execute. This step relies heavily on the developer's knowledge of the architecture.

On the C64x+ DSP, a compiler flag activates feedback that reports the resources used, the pipeline depth and the number of execution cycles. The optimization makes use of the loop bounds, so that parallel execution can use the resources of the architecture efficiently.

To examine how performance can be improved on the DSP, we analyze three different versions of an 8-bit integer vector addition [35].

The first function is standard C vector addition, presented in Algorithm 1. It can be compiled with any C compiler, and no extra information is passed to the compiler.

    void sum(unsigned char *a, unsigned char *b,
             unsigned char *res, int n)
    {
        int i;
        for (i = 0; i < n; i++) {
            res[i] = a[i] + b[i];
        }
    }

Algorithm 1 - Standard C vector addition

In Algorithm 2, some information is passed to the compiler. The keyword const indicates that the operand vectors a and b are not changed inside the function. The keyword restrict indicates that no other pointer accesses the same location. The pragma MUST_ITERATE tells the compiler the minimum and maximum number of executions of the loop. Its last parameter is the multiple: it indicates that the loop always iterates a multiple of this many times. With this extra information the compiler can safely unroll the loop.

    void sum_info(const unsigned char *restrict a,
                  const unsigned char *restrict b,
                  unsigned char *restrict c,
                  const int n)
    {
        int i;
        #pragma MUST_ITERATE(512, 1024, 8)
        for (i = 0; i < n; i++) {
            c[i] = a[i] + b[i];
        }
    }

Algorithm 2 - Sum with annotations

Algorithm 3 uses intrinsics to perform 8 additions in each loop iteration. In this case, 8 bytes of data from each operand vector are read from memory into registers. Each _saddu4 call adds 4 bytes with saturation, and the two results are packed and stored in memory with a single 8-byte write.

    void sum_intrinsic(const unsigned char *restrict a,
                       const unsigned char *restrict b,
                       unsigned char *restrict c,
                       const int n)
    {
        int i;
        #pragma MUST_ITERATE(512/8, 1024/8, 8)
        for (i = 0; i < n; i += 8) {
            unsigned int a1_a0, a3_a2;
            unsigned int b1_b0, b3_b2;
            unsigned int c1_c0, c3_c2;
            a3_a2 = _hi(_amemd8_const(&a[i]));  /* higher part of 8 bytes */
            a1_a0 = _lo(_amemd8_const(&a[i]));  /* lower part of 8 bytes */
            b3_b2 = _hi(_amemd8_const(&b[i]));
            b1_b0 = _lo(_amemd8_const(&b[i]));
            /* 4-byte saturated sum intrinsics */
            c3_c2 = _saddu4(a3_a2, b3_b2);
            c1_c0 = _saddu4(a1_a0, b1_b0);
            /* packs two integers into a double and stores it in memory */
            _amemd8(&c[i]) = _itod(c3_c2, c1_c0);
        }
    }

Algorithm 3 - Vector addition using intrinsics

The compiler feedback for the three algorithms is presented in Table V. The values are the maximum performance that can be achieved under ideal conditions; for instance, the execution cycles are calculated assuming that the data is always present in the cache. The function sum_intrinsic was about 7.5 times faster than sum_info. Compared to the standard Algorithm 1, sum_info is 2 times faster. Table V also presents the following results:
• Loop unrolling: how many times the original loop was unrolled;
• Minimum and maximum number of iterations: the number of loop executions;
• Total cycles: the number of cycles as a function of the number of iterations.

TABLE V. COMPILER FEEDBACK FOR THE SUM ALGORITHM [35]
                    Sum               Sum_info          Sum_intrinsic
  Loop unrolling    Not               2x                2x
  Min iterations    Unknown           256               32
  Max iterations    Unknown           512               64
  Total cycles      8+iterations*2    8+iterations*3    6+iterations*3
  Cycles n=512      1030              777               192
  Cycles n=1024     2054              1545              198
  • D. DSPBridge: ARM-DSP communication

As described in the previous sections, the ARM and DSP processors have different roles in the OMAP 3530 platform. The first is a general purpose processor that runs the operating system, while the second is dedicated to real-time digital signal processing. The ARM is considered the main processor, or host, and the DSP can be considered a coprocessor. The ARM and the DSP each have their own operating system for resource management, and the connection between the two operating systems is made by DSP Bridge [36].

The DSPBridge [37] driver provides communication with, and control of, the DSP processor. On the ARM side, the DSPBridge API is used for the following tasks:
• start tasks on the DSP processor;
• send and receive messages to/from the DSP;
• create and use streams for data transfer to/from the DSP;
• dynamically map memory into the DSP address space;
• stop, restart and delete DSP tasks.

On the DSP side, the API enables message exchange between the DSP and ARM processors and the use of streams. DSP applications are abstracted as execution nodes. Nodes can be loaded at boot time or at run time using the DSPBridge API.

The DSP node states are presented in Figure 18. The life of a DSP node begins when an ARM task calls the DSPNode_Allocate function. This function allocates the data structures that enable control of, and communication with, the node. In this state the node is allocated only on the ARM side. After this, the function DSPNode_Create creates the node on the DSP side.

When the ARM processor calls DSPNode_Run, the node starts executing on the DSP. A node can be suspended with the function DSPNode_Pause and resumed by calling DSPNode_Run again.

Fig. 18. Life cycle of a DSP node [38]

The node changes to the state Done after the processing is finished or when the ARM calls the DSPNode_Terminate function. To finish the node life cycle, the function DSPNode_Delete releases all the resources on the DSP and ARM sides.

E. BeagleBoard development kit

The BeagleBoard is a development kit that uses the OMAP3530 platform. The board measures approximately 3" x 3", which allows the development of small prototypes. Another feature of the board is its low power consumption, around 2 W. The consumption depends on the number of peripheral devices attached to the USB port.

The BeagleBoard supports multiple operating systems and Linux distributions, such as Angstrom, Debian, Ubuntu and Android, which lets it adapt to different product requirements. Most of these distributions provide repositories of applications and libraries in binary format. The applications include graphics processing, window managers and compilers. This eases the development of new applications and also allows applications already developed for other architectures to be ported. The file system containing the operating system and the applications is stored on the SD card. Figure 19 summarizes the resources of the BeagleBoard C4 version.

Applications for the BeagleBoard can be compiled and tested directly on the board. However, because of its restricted resources, another solution is to use cross-compilers for the ARM processor, which allows development on a desktop.

DSP modules should be developed on a desktop, since they require a C6000 compiler from Texas Instruments. This compiler is free and available for the Linux and Windows operating systems. After the code is compiled, it must be transferred to the SD card.

  Processors                    ARM Cortex-A8, 720 MHz (RISC)
                                TMS320C64x+, 520 MHz (DSP)
  Memory                        256 MB DDR RAM
                                256 MB NAND flash memory
  Peripherals and connections   HDMI, S-Video, USB, stereo audio I/O, RS232, JTAG connector
  Storage                       SD slot

Fig. 19. BeagleBoard C4
  • VI. CONCLUSIONS

This paper presented an introduction to embedded systems, their requirements and main characteristics. We also described the steps involved in embedded system design and some methodologies currently applied.

Embedded applications have heterogeneous characteristics. To cope with the different requirements, chip manufacturers provide heterogeneous multiprocessor platforms. This paper described the OMAP 3530 platform, detailing the key aspects of the ARM and DSP processors and showing the main differences between the development of such systems and the implementation of desktop applications.

REFERENCES

[1] P. MARWEDEL, "Embedded System Design". Netherlands: Springer, 2006.
[2] G. MARTIN and H. CHANG, "System-on-Chip design," in Proc. 4th International Conference on ASIC, pp. 12-17, 2001.
[3] T. GIVARGIS and F. VAHID, "PLATUNE: A Tuning Framework for System-on-a-Chip Platforms," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 21, No. 11, 2002, p. 1317-1327.
[4] T. SIMUNIC, L. BENINI, A. ACQUAVIVA, P. GLYNN and G. DE MICHELI, "Dynamic Voltage Scaling and Power Management for Portable Systems," in Proc. Design Automation Conference, June 2001, Las Vegas.
[5] Intel® XScale™ Microarchitecture Technical Summary, Intel Corporation, 2000.
[6] L. BENINI, A. BOGLIOLO and G. DE MICHELI, "A Survey of Design Techniques for System-Level Dynamic Power Management," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Boston, v. 8, n. 3, p. 299-316, June 2000.
[7] S. EDWARDS, L. LAVAGNO, E. A. LEE and A. SANGIOVANNI-VICENTELLI, "Embedded Systems Design: Formal Models, Validation, and Synthesis," Proceedings of the IEEE, New York, v. 85, n. 3, p. 366-390, Mar. 1997.
[8] SystemC 2.0.1 Language Reference Manual, Open SystemC Initiative, San Jose, CA, 2003.
[9] SpecC Language Reference Manual, Copyright © R. Domer, A. Gerstlauer, D. Gajski, 2002.
[10] F. WAGNER and L. CARRO, "Sistemas Computacionais Embarcados," in XXII Jornadas de Atualização em Informática. Campinas: UNICAMP, 2003, v. 1, p. 45-94.
[11] P. MAGARSHACK and P. PAULIN, "System-on-Chip Beyond the Nanometer Wall," in Proc. Design Automation Conference, DAC, 40., 2003, Anaheim, USA. New York: ACM Press, 2003. p. 419-424.
[12] K. KEUTZER, S. MALIK, R. NEWTON, J. RABAEY and A. SANGIOVANNI-VICENTELLI, "System Level Design: Orthogonalization of Concerns and Platform-Based Design," IEEE Transactions on Computer-Aided Design of Circuits and Systems, New York, v. 19, n. 12, p. 1523-1543, Dec. 2000.
[13] F. R. WAGNER, W. CESARIO, L. CARRO and A. A. JERRAYA, "Strategies for the Integration of Hardware and Software IP Components," Embedded Systems-on-Chip. Integration - the VLSI Journal, Amsterdam, v. 37, n. 4, p. 223-252, Sept. 2004.
[14] AMBA™ Specification (Rev 2.0), ARM Ltd., 1999.
[15] The CoreConnect™ Bus Architecture, IBM Microeletronics, 2006.
[16] OSEK/VDX Operating System Specification 2.2.3, Continental GmbH, 2005.
[17] µITRON 4.0 Specification, TRON Association, 2002.
[18] Embedded processors market, IDC. Available: <http://www.idc.com>. Visited on: June 2007.
[19] Xtensa® Instruction Set Architecture (ISA) Reference Manual For All Xtensa Processor Cores, Tensilica Inc., Santa Clara, CA, 2010.
[20] C. ZHANG, F. VAHID and R. L. LYSECKY, "A Self-Tuning Cache Architecture for Embedded Systems," in Proc. Design Automation and Test in Europe, DATE, 2004, Paris, France. Los Alamitos: IEEE Computer Society Press, 2004. p. 142-147.
[21] J. HENNESSY and D. PATTERSON, Computer Architecture: A Quantitative Approach. 3rd ed. San Francisco: Morgan Kauffman, 2002.
[22] P. JAIN, S. DEVADAS, D. ENGELS and L. RUDOLPH, "Software-assisted cache replacement mechanisms for embedded systems," in Proc. International Conference on Computer Aided Design, ICCAD, 2001, San Jose, USA. New York: ACM Press, 2001. p. 119-126.
[23] NovaThor Platform - U9500. Available: http://www.stericsson.com/products/u9500-novathor.jsp
[24] OMAP™ Technology Overview, White paper, Texas Instruments, Inc., Dallas, TX, 2000.
[25] K. GOOSSENS, et al., "Service-Based Design of Systems on Chip and Networks on Chip," in: VAN DER STOK, P. (Ed.), Dynamic and Robust Streaming in and Between Connected Consumer-Electronics Devices. [S.l.]: Springer, 2005. p. 37-60.
[26] OMAP 3530/25 Applications Processor, White paper, Texas Instruments, Inc., Dallas, TX, 2009.
[27] ARM Holdings Company Profile. Available: http://www.arm.com/about/company-profile/index.php
[28] A. SLOSS, D. SYMES and C. WRIGHT, "ARM System Developer's Guide: Designing and Optimizing System Software". San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2004. ISBN 1558608745.
[29] Architeture and Implementation of the ARM® Cortex™-A8 Microprocessor, White paper, ARM Ltd., Cambridge, UK, 2005.
[30] E. J. TAN and W. B. HEINZELMAN, "DSP architectures: past, present and futures," SIGARCH Comput. Archit. News, ACM, New York, NY, USA, v. 31, p. 6-19, June 2003. ISSN 0163-5964. Available: http://doi.acm.org/10.1145/882105.882108
[31] J. EYRE and J. BIER, "The evolution of DSP processors," Signal Processing Magazine, IEEE, v. 17, n. 2, p. 43-51, Mar. 2000. ISSN 1053-5888.
[32] Y. MOSHE and N. PELEG, "Implementations of h.264/avc baseline decoder on different digital signal processors," in: ELMAR, 2005. 47th International Symposium. [S.l.: s.n.], 2005. p. 37-40.
[33] Fixed-Point Digital Signal Processor, Texas Instruments, Inc., Dallas, TX, 2009.
[34] TMS320C64x/C64x+ DSP: CPU and Instruction Set - Reference Guide, Texas Instruments, Inc., Dallas, TX, 2009.
[35] D. R. HACHMANN, "Distribuição de tarefas em MPSoC Heterogêneo: estudo de caso no OMAP3530". Trabalho de Conclusão de Curso: Ciência da Computação, UNIOESTE, 2011.
[36] Developing Core Software Technologies for TI's OMAP Platform, Texas Instruments, Inc., Dallas, TX, 2002.
[37] DSP/BIOS™ Bridge Integration Document, Texas Instruments, Inc., Dallas, TX, 2006.
  • Introduction to Embedded Systems Using FPGAs

Edilson Reis Rodrigues Kato, Universidade Federal de São Carlos, São Carlos, Brasil, e-mail: kato@dc.ufscar.br
Emerson Calos Pedrino, Universidade Federal de São Carlos, São Carlos, Brasil, e-mail: emerson@dc.ufscar.br

Abstract: The course "Introduction to Embedded Systems Using FPGAs", promoted by CES-School (Critical Embedded Systems School) together with the 2nd CBSEC (Brazilian Conference on Critical Embedded Systems), aims to give students an introductory view of FPGAs (hardware), of how circuits can be programmed (schematics and hardware programming languages), of which circuits can be explored, and of the various ways of embedding a computing system, helping the user weigh the best way to embed the system according to the project being specified. In the laboratory, the Altera DE1 board will be used, allowing the user to implement a simple computing system using SOPC Builder with the NIOS II, a 32-bit soft-core processor embedded in the FPGA.

Keywords: FPGA, Embedded Computing System, hardware

I. INTRODUCTION

An FPGA (Field-Programmable Gate Array) is a reprogrammable or reconfigurable integrated circuit composed of many basic logic-circuit components, plus other more complex circuit blocks such as DSPs (Digital Signal Processors), memories, PLLs (Phase-Locked Loops), etc. These circuits can be seen as standard components that can be configured and connected independently through a matrix of tracks and connections programmable by the user [1,2].

FPGA programming uses a set of software tools associated with the design flow, which gives the developer a level of abstraction that allows concentrating on the algorithm to be implemented. Programming is done in hardware programming languages (HDLs), such as Verilog-HDL and VHDL, or in other system modeling languages, and the FPGA circuit design cycle covers specification, implementation and verification [3,4].

A dedicated system can thus be embedded in an FPGA so that the project resources are optimized and its implementation is highly flexible.

Dedicated computing systems can be implemented on FPGAs in several ways. They can be designed conventionally from digital logic elements in schematic form, establishing the bit width of the processor and the peripherals the project needs; they can be built by adding any element or peripheral to an existing basic processor; or they can use a ready-made basic processor (core).

This course aims to give students an introductory view of FPGAs (hardware), of how circuits can be programmed (schematics and hardware programming languages), of which circuits can be explored, and of the various ways of embedding a computing system, helping the user weigh the best way to embed the system according to the project being specified.

In the laboratory, the Altera DE1 board will be used, allowing the user to implement a simple computing system using SOPC Builder with the NIOS II (Altera), a 32-bit soft-core processor embedded in the FPGA. In-class exercises will also be proposed, which can extend the initial system and improve the student's learning [5,6].

A. Topics covered in the course:
• Introduction to FPGAs
  o Hardware
  o Software
• Systems that can be embedded in FPGAs
  o DSPs
  o Microcontrollers
  o Microcomputers
  o NIOS II
  o Dedicated
• Example implementation (laboratory)
  o Altera DE1 Board
  o NIOS II
  o Assembler
  o Implementation of a simple computer on an FPGA
• Exercises

In this way, the student is expected to be able to design a computing system embedded in an FPGA and configure it as needed.
  • II. CONCEITOS BÁSICOS Na Figura 2, apresenta-se uma ilustração do processo de configuração de um PLD. Na figura, é possível ver todos os A tecnologia de PLDs (Programmable Logic Devices), passos do processo de projeto desde a criação do mesmotais como FPGAs, CPLDs, entre outros dispositivos, é através de diagramas esquemáticos ou linguagens deextremamente poderosa para projeto de sistemas digitais nos descrição de hardware, passando pela geração do arquivodias de hoje. Assim, tal dispositivo pode ser definido netlist, o mapeamento, o posicionamento e o roteamento dobasicamente como sendo um circuito integrado (arranjo de chip, até a geração do arquivo binário de configuração finalportas lógicas) usado para implementar circuitos digitais do dispositivo [7].onde este pode ser configurado e reconfigurado pelo usuáriofinal através de um software específico fornecido pelo seufabricante. Os dispositivos atuais podem lidar com qualquer tarefacomputacional e alguns já possuem no mínimo uma CPUembutida. As Técnicas de programação para essesdispositivos variam de HDLs à linguagens de alto nível taiscomo Handel-C e Streams-C. Também, alguns dispositivosjá possuem Capacidade de Reconfiguração dinâmica. Comoexemplos de aplicações, podem-se citar: processamentodigital de imagens, reconhecimento de padrões, criptografia,experimentos em sala de aula, etc. A tecnologia de programação de PLDs dita se asinterconexões do chip são feitas por transistores comandadospor células SRAMs, transistores EEPROM, fusíveis,multiplexadores, etc. Dependendo da aplicação a serexplorada, o dispositivo poderá ter alta granularidade,contendo LUTs (Lookup Tables), ou poderá ser degranularidade fina, contendo, por exemplo, mais elementos Fig. 2. Processo de projeto de um PLD.de lógica combinacional. Todas as interconexões discutidas gerarão atrasos emrelação a um simples contato metálico utilizado nasinterconexões de um MPGA (Mask Programmable Gate III. IMPLEMENTAÇÃO PRÁTICAArrays), por exemplo. 
Também, em CPLDs os atrasos sãomais previsíveis do que em FPGAs (interconexões A primeira parte da implementação prática consiste emsegmentadas). Em relação aos tamanhos dos blocos, por implementar um processador com poucos periféricosexemplo, um bloco maior implicará em maior desperdício embarcados na FPGA (o sistema inicial contém apenaspara implementar funções mais simples. alguns componentes): um processador, memória e alguns As categorias comerciais de FPGAs são dividadas periféricos simples de entrada e saída.basicamente, independentemente do fabricante, em: arranjo Em seguida serão implementadas e testadas outrassimétrico, baseada em linhas, PLD hierárquico, e mar de funcionalidades no sistema computacional criado de formaportas. Na Figura 1, é apresentado um exemplo dessas que se aproveite os recursos existentes na placa DE1. A partecategorias. prática do curso foi baseada no Tutorial “Introduction to the Altera SOPC Builder” obtido do site da Altera [1]. Para a implementação do sistema computacional será utilizado o SOPC Builder em conjunto com o Quartus® II e o processador NIOS II embarcado na FPGA. Dessa forma o usuário deve ter instalado em seu computador o Quartus II 10.1, o NIOS II 10.1, e o Altera Monitor Program, além de possuir a placa de desenvolvimento DE1.[8-13]. A. Primeira Parte – Sistema Computacional Dedicado Simples Para a implementação de um sistema computacional dedicado simples, o usuário poderá utilizar o processador NIOS II embarcado na FPGA. A partir dele, são conectados Fig. 1. Categorias comerciais de FPGAs. os periféricos desejados para se estabelecer sua
  • functionality. Figure 3 illustrates the system to be implemented.
The Altera Nios II is a 32-bit processor that can be instantiated in an Altera FPGA chip. Three versions of the Nios II processor are available: economy (/e), standard (/s) and fast (/f). The computer system implemented here uses the Nios II /e version. An easy way to start working with the computer system and the Nios II processor is to use a utility called the Altera Monitor Program. This utility provides an easy way to assemble and compile Nios II programs written in either assembly language or the C programming language. The Monitor Program, which can be downloaded from the Altera website, is an application that runs on a host computer connected to the DE1 board. It can be used to control the execution of code on the Nios II, to list (and edit) the contents of the processor registers, to edit the contents of memory on the DE1 board, and to perform similar operations.
In the example system of Figure 3, eight switches, SW7-0, are used to turn the eight green LEDs, LEDG7-0, of the DE1 board on or off. The switches are connected to the Nios II through a parallel I/O interface configured as an input port. The LEDs are driven by the signals of another parallel I/O interface configured as an output port. To perform the desired operation, the eight-bit pattern corresponding to the state of the switches has to be sent to the output port to activate the LEDs. This is done by a program stored in the on-chip memory and executed by the Nios II.
The SOPC Builder is used to design the hardware described in Figure 3. Pins of the Cyclone II FPGA are then assigned to connect the parallel interfaces to the switches and LEDs that act as I/O devices. The designed system is compiled and downloaded to the development board and, finally, the Altera Monitor Program is used to compile, link, download and run the program on the Nios II hardware to perform the desired task.
Figure 3 – Simple computer system to be implemented
The steps of this implementation can be summarized as follows:
      • Use the SOPC Builder to design a Nios II based system
      • Integrate the designed Nios II system into a Quartus II project
      • Implement the designed system on the DE1 board
      • Run an application program on the Nios II processor
A.1 - Altera’s SOPC Builder
The SOPC Builder is a tool used together with the Quartus II CAD software. It allows the user to easily create a system based on the Nios II processor simply by selecting the desired functional units and specifying their parameters. To
  • implement the system in Figure 3, we must instantiate the following functional units:
      • Nios II processor, referred to as a Central Processing Unit (CPU).
      • On-chip memory, which consists of memory blocks in the Cyclone II chip used on the DE1 board; we will specify 4 Kbytes of memory arranged in 32-bit words.
      • Two parallel I/O interfaces.
      • JTAG UART interface for communication with the host computer.
To define the desired system, start the Quartus II software and perform the following steps:
1. Create a new Quartus II project for your system, named pratica1, in a directory called curso_FPGA. Choose the FPGA of the DE1 board, that is, the Cyclone II EP2C20F484C7.
2. Select Tools > SOPC Builder. Enter nios_system as the system name; this will be the name of the system that the SOPC Builder generates. Choose Verilog. Click OK to reach the window in Figure 4.
3. Figure 4 shows the System Contents tab of the SOPC Builder, which is used to add components to the system and to configure the selected components to meet the design requirements. The available components are listed on the left side of the window.
4. The Nios II processor runs under the control of a clock. In this course we use the 50 MHz clock provided on the DE1 board. As shown in Figure 4, the names and frequencies of the clock signals can be specified in the SOPC Builder display. If it is not already included in this list, specify a clock named clk_0 with an external source and a frequency of 50.0 MHz.
5. Next, specify the processor as follows: on the left side of the window in Figure 6, expand Processors, select Nios II Processor and click Add, which leads to the window in Figure 5. Choose Nios II /e, which is the simplest version of the processor. Click Finish to return to the window of Figure 4, which now shows the specified Nios II processor, as indicated in Figure 6. Some warnings or error messages may be displayed in the SOPC Builder window (Messages, at the bottom of the screen) because some parameters have not been specified yet. Ignore these messages; the necessary data will be provided later.
6. To specify the on-chip memory, do the following:
      • Select Memories and Memory Controllers > On-Chip > On-Chip Memory (RAM or ROM) and click Add.
      • In the On-Chip Memory window, shown in Figure 7, set the Data width to 32 bits and the total memory size to 4 Kbytes (4096 bytes), then press Enter.
      • Do not change the other default settings; click Finish.
Figure 4 – System construction interface of the SOPC Builder
  • Figure 5 – Configuration of the Nios II processor.
Figure 6 – Definition of the processor on the DE1 board.
  • Figure 7 – Definition of the on-chip memory
7. Specify the parallel I/O input interface as follows:
      • Select Peripherals > Microcontroller Peripherals > PIO (Parallel I/O) and click Add to open the configuration window of Figure 8.
      • Set the port width to 8 bits and choose the port direction as input, as shown in Figure 8. Click Finish.
Figure 8 – Definition of the parallel input interface.
8. Likewise, specify the parallel I/O output interface:
      • Select Peripherals > Microcontroller Peripherals > PIO (Parallel I/O) and click Add to open the PIO configuration window again.
      • Set the port width to 8 bits and choose the port direction as output.
      • Click Finish to return.
9. We want to connect to a host computer and provide a means of communication between the Nios II system and the host. To do so, instantiate the JTAG UART interface as follows:
      • Select Interface Protocols > Serial > JTAG UART and click Add to open the JTAG UART configuration window of Figure 9.
      • Do not change the default settings.
      • Click Finish to return.
  • Figure 9 – Definition of the JTAG UART interface.
10. The complete system is shown in Figure 10. Note that the SOPC Builder automatically chooses names for the various components. These names are not necessarily descriptive enough to be easily associated with the design, but they can be changed. In Figure 3 we use the names Switches and LEDs for the parallel input and output interfaces, respectively. These names can be used in the implemented system. Right-click the name pio_0 and select Rename. Change the name to Switches. Likewise, change pio_1 to LEDs.
11. The base and end addresses of the components of the designed system can be assigned by the user, but they can also be assigned automatically by the SOPC Builder. We choose the latter option. Select System > Auto-Assign Base Addresses (using the menus at the top of the SOPC Builder window) so that the SOPC Builder sets the addresses as shown in Figure 10.
12. The behavior of the Nios II processor on reset is defined by its reset vector, which is the location in the memory device from which the processor fetches the next instruction when it is reset. Similarly, the exception vector is the memory address the processor jumps to when an interrupt is raised. To specify these two parameters, do the following:
      • Right-click the cpu_0 item and select Edit (Figure 10).
      • Select onchip_memory2_0 as the memory device for both the reset and the exception vectors, as shown in Figure 11.
Figure 10 – The final system specification for the DE1 board.
  • Figure 11 – Definition of the reset and exception vectors.
      • Do not change the offset settings.
      • Click Finish to return to the System Contents tab.
13. Once all the components needed to implement the desired system have been specified, the computer system can be generated. Select the System Generation tab, which leads to the window in Figure 12.
Figure 12 - System generation
  • Turn off Simulation - Create Project simulator files. Click Generate at the bottom of the SOPC Builder window; at this point the design can be saved under the name pratica1. The generation process produces the messages shown in Figure 12. When the message "SUCCESS: SYSTEM GENERATION COMPLETED" appears, click Exit to return to the main Quartus II window.
Changes to the designed system can easily be made at any time by reopening the SOPC Builder. Any component in the System Contents tab can be selected and deleted, a new component can be added, and the system can be generated again.
A.2 - Integration of the Nios II System into a Quartus II Project
To complete the hardware design, we have to:
      • Instantiate the module generated by the SOPC Builder in the Quartus II project.
      • Assign the FPGA pins.
      • Compile the designed circuit.
      • Program and configure the Cyclone II device on the DE1 board.
A.2.1 - Instantiation of the Module Generated by the SOPC Builder
All we need to do is instantiate the Nios II system and connect the inputs and outputs of the parallel I/O ports, as well as the clock and reset inputs, to the appropriate pins of the Cyclone II device.
The Verilog module generated by the SOPC Builder is in the file nios_system.v in the project directory. Note that the name of the Verilog module is the same as the system name specified in the SOPC Builder.
Figure 13 shows the top-level Verilog module that instantiates the Nios II system. This module is named pratica1 because that is the name specified for the project in Quartus II. Note that the input and output ports of the module use specific names: KEY for the pushbuttons, SW for the switches, LEDG for the green LEDs and CLOCK_50 for the 50 MHz clock. This is because we will use a file in which the names of the input and output pins are already specified according to the Altera DE1 board. Type the code of Figure 13 into an editor and save it as pratica1.v. Add this file and all the *.v files produced by the SOPC Builder to the Quartus II project (Project > Add/Remove Files in Project).
To prepare the pins, import the pin map from a file (Assignments > Import Assignments...). The Altera University Program provides one file per board to make the design easier; look in the University Program directories, for your Quartus II version and device type, for the file DE1_pin_assignments.qsf.
Figure 13 – Instantiating the Nios II system in Quartus II.
The SOPC Builder provides an example instantiation file, in our case nios_system_inst.v. It must be deleted or commented out entirely so that the code can be compiled with the newly created file pratica1.v.
After making the necessary settings, compile the code. You may see some warning messages associated with the Nios II system, such as unused signals or mismatched vector lengths; these warnings can be ignored.
A.2.2 - Programming and Configuration
Program and configure the Cyclone II FPGA in JTAG programming mode as follows:
1. Connect the DE1 board to the host computer with a USB cable plugged into the USB-Blaster port. Turn on the power to the DE1 board. Check that the RUN/PROG switch is in the RUN position. If necessary, install the USB-Blaster using the Altera USB-Blaster driver found in the directory Altera > 10.1 > Quartus > Drivers > USB_Blaster.
2. Select Tools > Programmer to open the window of Figure 14.
3. If it is not already chosen by default, select JTAG in the Mode box. Also, if the USB-Blaster is not chosen by default, press the Hardware Setup... button and select the USB-Blaster in the window that appears.
4. The configuration file pratica1.sof should be listed in the window. If the file is not listed, select the file and click Add.
5. Click the Program/Configure box to set the action.
  • Figure 14 – The Programmer window.
6. At this point the window looks like Figure 14. Press Start to configure the FPGA.
A.3 - Running the Application Program
With the necessary hardware configured in the FPGA device, we now need to create and run a program that performs the desired operation. This can be done in Nios II assembly language or in a high-level language such as C. We illustrate programming in assembly.
A.3.1 – Using the Nios II Programming Language
The assembly language program (Figure 15) loads the addresses of the PIO data registers into two registers, r2 and r3. Then, in an infinite loop, it transfers the data from the input PIO, Switches, to the output PIO, LEDs.
Figure 15 - Assembly code to control the lights.
For a detailed explanation of the Nios II assembly language instructions, see the tutorial Introduction to the Altera Nios II Soft Processor [14].
Type this code into a file named pratica1.s and place the file in the working directory. The program has to be assembled and converted into an S-Record file, pratica1.srec, suitable for downloading into the implemented Nios II system.
Altera provides monitor software, called the Altera Monitor Program, for use with the DE-series boards. This software offers a simple means of compiling, assembling and downloading programs into a Nios II system implemented on a DE board. It also allows the user to perform debugging tasks. A description of this software is available in the Altera Monitor Program tutorial.
Open the Altera Monitor Program (Figure 16). This software needs to know the characteristics of the designed system, which are given in a .ptf file, in this case nios_system.ptf. Click File > New Project to open the New Project Wizard window and follow these steps:
1. Enter curso_FPGA as the project directory, either by typing it directly or by using the Browse field.
2. Enter pratica1 as the project name and click Next >.
3. In the Select a System box, select <Custom System>.
4. Click Browse... beside the System Description field to display the file selection window and choose the file nios_system.ptf. Note that this file is in the project directory curso_FPGA.
  • Figure 16 – Altera Monitor Program window.
Figure 17 – Program memory specification window.
  • 5. Specify the .sof file (pratica1.sof) in the Quartus II Programming (SOF) File field so that the system can be downloaded to the board from within the Altera Monitor Program. Click Next >.
6. Select Assembly Program as the program type and click Next >.
7. Click Add... to open the file selection window, choose the file pratica1.s and click Select. Click Next >.
8. Check that the host connection is set to USB-Blaster, the processor is set to cpu_0 and the Terminal Device is set to JTAG UART, then click Next >.
9. The Altera Monitor Program also needs to know where to fetch the application program from. In this case it is the memory block of the FPGA. The SOPC Builder named this block onchip_memory2_0. As shown in Figure 17, the Monitor Program has already selected the correct memory devices.
10. Click Finish to confirm the system configuration.
Next, click Actions > Compile & Load. The Altera Monitor Program invokes the assembler and the linker. Once the program has been downloaded to the board, it is displayed in the Disassembly window of the Altera Monitor Program, as illustrated in Figure 18.
Click Actions > Continue to run the program. With the program running, you can test the design by flipping the switches SW7 to SW0; the LEDs should respond accordingly.
Figure 18 – View of the window after the program has been downloaded to the board.
B. Second Part – DE1 Basic Computer
In this second part we implement a complete basic computer. A block diagram of the DE1 Basic Computer is shown in Figure 19, which presents all the resources found on the DE1 board. Its main components include the Altera Nios II processor, memory for program and data storage, parallel ports connected to switches and lights, a timer module and a serial port.
As shown in Figure 19, the processor and its interfaces to the I/O devices are implemented inside the Cyclone II FPGA chip on the DE1 board.
We use the Altera University Program, specifying the DE1 board and the DE1 Basic Computer system. This package and the complete tutorial can be found on the Altera website [15].
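As a recap of the first part, the switch-to-LED behavior can be checked in software before touching the board. The Python sketch below is only a model, not Nios II code: it mimics the loop of the assembly program in Figure 15, which copies the 8-bit value of the Switches PIO to the LEDs PIO, and it also works out how many 32-bit words fit in the 4-Kbyte on-chip memory. The base addresses and all helper names are hypothetical and chosen for illustration; the real addresses are the ones assigned by the SOPC Builder (Figure 10).

```python
# Software model of the switch-to-LED loop (not Nios II code).
# SWITCHES_BASE and LEDS_BASE are hypothetical addresses chosen for
# illustration; the real ones come from the SOPC Builder address map.
SWITCHES_BASE = 0x1000
LEDS_BASE = 0x1010
ONCHIP_MEMORY_BYTES = 4096   # 4 Kbytes, as specified for the on-chip memory
WORD_BYTES = 4               # 32-bit data width


class SimpleBus:
    """Toy memory-mapped bus holding one 8-bit register per PIO."""

    def __init__(self):
        self.registers = {SWITCHES_BASE: 0, LEDS_BASE: 0}

    def read8(self, address):
        return self.registers[address] & 0xFF

    def write8(self, address, value):
        self.registers[address] = value & 0xFF


def copy_switches_to_leds(bus, iterations=1):
    """Bounded stand-in for the infinite loop of the assembly program."""
    for _ in range(iterations):
        bus.write8(LEDS_BASE, bus.read8(SWITCHES_BASE))


def onchip_memory_words():
    """Number of 32-bit words that fit in the on-chip memory."""
    return ONCHIP_MEMORY_BYTES // WORD_BYTES


bus = SimpleBus()
bus.registers[SWITCHES_BASE] = 0b10100101   # SW7..SW0 as set by the user
copy_switches_to_leds(bus)
print(f"LEDG7-0 = {bus.read8(LEDS_BASE):08b}")
print(f"on-chip memory words = {onchip_memory_words()}")
```

On the real board the loop never terminates and the reads and writes are load and store instructions on the PIO data registers; the bounded loop here exists only so that the model can be run and inspected on a host computer.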
  • Figure 19 - Block diagram of the DE1 Basic Computer.
CONCLUSIONS AND FUTURE WORK
With this introductory course on embedded systems in FPGAs, the student can implement a simple computer system using the SOPC Builder with the Nios II, and can also implement a basic computer. It is thus expected that, in the future, the student will be able to design a computer system embedded in an FPGA and configure it according to his or her needs. As future work, the authors intend to offer more advanced versions of embedded systems courses using FPGAs, DSPs and dedicated computers.
ACKNOWLEDGMENTS
The authors are grateful to the “Instituto Nacional de Ciência e Tecnologia em Sistemas Embarcados Críticos (INCT-SEC)” for the financial support.
REFERENCES
[1] J. O. Hamblen, T. S. Hall, and M. D. Furman, “Altera Rapid Prototyping of Digital Systems SOPC Edition”, Springer, 2008.
[2] Z. Navabi, “Digital Design and Implementation with Field Programmable Devices”, Ed. Kap, 2005.
[3] D. A. Patterson and J. L. Hennessy, “Organização e Projeto de Computadores - A Interface Hardware/Software”, Editora Campus, 2005.
[4] A. S. Tanenbaum, “Organização Estruturada de Computadores”, Pearson: Prentice-Hall, 2007.
[5] W. Stallings, “Arquitetura e Organização de Computadores”, Pearson, 2010.
[6] R. J. Tocci, “Sistemas Digitais - Princípios e Aplicações”, Pearson: Prentice Hall, 1994.
[7] E. C. Pedrino, “Arquitetura pipeline reconfiguravel atraves de instrucoes geradas por programação genetica para processamento morfologico de imagens digitais utilizando FPGAs” (in Portuguese), Doctoral thesis, São Paulo University - USP-EESC, pp. 220, 2008.
[8] “Introduction to the Altera SOPC Builder Using Verilog Designs”, Accessed April 27, 2012. ftp://ftp.altera.com/up/pub/Altera Material/9.1/Tutorials/Verilog/Introduction_to_the_Altera_SOPC_Builder.pdf
[9] J. O. Hamblen, “Altera DE2 Board Resources for Students”, http://users.ece.gatech.edu/~hamblen/DE2/, 2011.
[10] “Altera University Program – IP Cores for Education”, Accessed Oct 14, 2010. [Online]. Available: http://www.altera.com/education/univ/materials/ip-cores/unv-ip-cores.html
[11] “Altera’s Embedded Processors”, Accessed Oct 14, 2010. [Online]. Available: http://www.altera.com/products/ip/processors/nios2/ni2-index.html
[12] “Nios II Community FTP”, Accessed Oct 14, 2010. [Online]. Available: http://www.niosftp.com/pub/
[13] J. O. Hamblen, T. S. Hall, M. D. Furman, “Tutorial IV: Nios II Processor Hardware Design”, in Rapid Prototyping of Digital Systems SOPC Edition, Springer, pp. 352-370, 2008.
[14] “Introduction to the Altera Nios II Soft Processor”, Accessed April 27, 2012. ftp://ftp.altera.com/up/pub/Tutorials/DE2/Computer_Organization/tut_nios2_introduction.pdf
[15] “Basic Computer System for the Altera DE1 Board”, Accessed April 27, 2012. ftp://ftp.altera.com/up/pub/Altera_Material/11.0/Examples/DE1/NiosII_Computer_Systems/DE1_Basic_Computer.pdf
  • Prof. Dr. Edilson Reis Rodrigues Kato
Universidade Federal de São Carlos (UFSCar), Rod. Washington Luis, Km 235 – São Carlos SP
kato@dc.ufscar.br
Associate Professor (Professor Adjunto) in the Department of Computing (DC) of the Universidade Federal de São Carlos, under the exclusive dedication (DE) regime. He holds a degree in Electrical Engineering from the Universidade de São Paulo - USP (1988), a master's degree in Mechanical Engineering from the Universidade de São Paulo - USP (1994), a doctorate in Mechanical Engineering from the Universidade de São Paulo - USP (1999) and a postdoctorate in Automation and Artificial Intelligence from the Universidade Federal de São Carlos - UFSCar (2001). He has experience in Electrical and Computer Engineering, with emphasis on Electronic Automation of Electrical and Industrial Processes, working mainly on the following topics: Modeling of Automated Systems, Intelligent Systems applied to Manufacturing, Artificial Intelligence, Systems Architecture and Programmable Logic Devices.
http://lattes.cnpq.br/8517698122676145
Prof. Dr. Emerson Carlos Pedrino
Universidade Federal de São Carlos (UFSCar), Rod. Washington Luis, Km 235 – São Carlos SP
emerson@dc.ufscar.br
He holds a bachelor's degree in Computational Physics from the Universidade de São Paulo - IFSC (2000), a specialization in Geoprocessing from the Universidade Federal de São Carlos - DECiv (2003), a master's degree in Electrical Engineering from the Universidade de São Paulo - EESC (2003) and a doctorate in Electrical Engineering from the Universidade de São Paulo - EESC (2008). He is currently an Associate Professor (Professor Adjunto) in the Department of Computing of the Universidade Federal de São Carlos. He has experience in Computer Science, Electrical Engineering and Geoprocessing, working mainly on the following topics: development of fast and intelligent architectures for real-time image processing using high-capacity programmable logic devices, microprocessor-based instrumentation, genetic programming, mathematical morphology, remote sensing and robotic vision.
http://lattes.cnpq.br/6481363465527189
  • Autonomic Wireless Sensor Networks
A.R. Pinto1, G.M. Araújo2, J.M. Machado1, Adriano Cansian1, Carlos Montez2
1 State University of São Paulo - UNESP, São José do Rio Preto-SP, Brasil {arpinto,jmachado,adriano}@ibilce.unesp.br
2 PGEAS – Universidade Federal de Santa Catarina – UFSC, Brazil {araujo,montez}@das.ufsc.br
Abstract
Wireless Sensor Networks (WSN) can be used to monitor hazardous and inaccessible areas. In these situations, the power supply (e.g. battery) of each node cannot be easily replaced. One solution is to deploy a large number of sensor nodes, since the lifetime and dependability of the network can be increased through cooperation among nodes. In addition to energy consumption, WSN applications may also have other concerns, such as meeting deadlines and maximizing the quality of information. The large number of nodes and the harsh or inaccessible areas where WSN are generally deployed increase the effort required for WSN management. Thus, autonomic computing approaches are necessary to maintain networks that must remain active for a long period of time.
In this chapter, we present the characteristics of WSN and of autonomic computing. Two autonomic approaches for dense WSN are also presented. The first is a Genetic Machine Learning Algorithm (GMLA) aimed at applications that make use of trade-offs between different metrics. Simulations were performed on random topologies assuming different levels of faults. GMLA showed a significant improvement when compared with the use of the IEEE 802.15.4 protocol. The second approach, called VOA (Variable Offset Algorithm), autonomically provides QoS for dense WSN. Experimental results showed that VOA can significantly improve communication efficiency in dense WSN.
1. Introduction
Wireless sensor network (WSN) is a designation that covers several variations in the composition and deployment of nodes. These networks are composed of small communicating nodes, each of which contains a sensing unit, a wireless communication module, a processor, memory and a power supply, typically a battery [9].
The nodes that compose these networks are able to collect scalars and to communicate with each other. The set of nodes can be homogeneous, or some of the nodes may have special characteristics. Some WSN use a base station that has more computational power than the other nodes. The base station is responsible for collecting, processing and storing the data sent by slave nodes [10].
Resources in WSN technology (processor, memory and battery) are generally restricted. Some networks are deployed in hazardous or inaccessible places where changing batteries is prohibitive [6]. For this reason, multiple research efforts are currently underway to increase the system lifetime, adopting approaches that minimize the duration of processing and communication tasks and that also minimize context switches. Moreover, due to battery depletion, faults in the wireless communication and faults in node hardware, the network topology becomes very dynamic [1].
Some approaches consider a large number of nodes (a dense network) deployed near the phenomenon that needs to be monitored. Sometimes, because the network is deployed quickly and its nodes are scattered randomly over a large area, the position of the nodes cannot be predetermined [6].
The strategy of deploying a large number of unreliable nodes has several advantages: (i) better fault tolerance through distributed operation; (ii) uniform coverage of the monitored environment; (iii) easy deployment; (iv) reduced energy consumption; and (v) longer network lifetime.
The high degree of interaction that a WSN may have with the environment where the sensors are deployed imposes multiple implicit and explicit time constraints. For instance, the concept of data freshness implies that some data in the system has a short time of validity [3]. Likewise, in security applications, whenever someone enters a predetermined room, the system must localize the potential intruder within a maximum period of time (a deadline).
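The two time constraints just mentioned, data freshness and deadlines, can be made concrete with a small sketch. The Python fragment below is an illustration under stated assumptions, not something taken from the chapter: the `Reading` record, its fields and the two helper functions are hypothetical names, and time is expressed in plain seconds.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """Hypothetical sensor reading with a limited time of validity."""
    value: float
    sampled_at: float   # time at which the value was sampled (seconds)
    validity: float     # how long the value stays fresh (seconds)


def is_fresh(reading, now):
    """A reading is fresh while its validity interval has not expired."""
    return now - reading.sampled_at <= reading.validity


def meets_deadline(event_time, response_time, deadline):
    """True when the response comes within `deadline` seconds of the event."""
    return response_time - event_time <= deadline


temperature = Reading(value=23.5, sampled_at=10.0, validity=2.0)
print(is_fresh(temperature, now=11.5))   # still inside the validity interval
print(is_fresh(temperature, now=13.0))   # validity interval has expired
print(meets_deadline(event_time=0.0, response_time=0.8, deadline=1.0))
```

In the intruder example, `event_time` would be the moment someone enters the room and `response_time` the moment the system reports the intruder's location.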
  • Due to the high fault rate, the inherent non-determinism, the surrounding noise and the resource restrictions, it is extremely hard to guarantee real-time properties in WSN. For this reason, applications with hard deadline constraints are generally not considered.
In dense networks, data fusion approaches are used to increase the dependability of sensor readings, to estimate the monitored environment more accurately and to achieve a longer network lifetime [5,6]. In these approaches, sensed scalars are sent to a base station that fuses the data, with the objective of extracting useful information from a set of readings. In this way, dependable information may be generated even in the presence of faulty sensors. This is one of the most important outcomes of data fusion approaches: it is no longer necessary to rely on a single sensor reading when supporting dependable applications.
Even though dense WSN present several advantages, self-management characteristics are required in order to deal with the management of a large number of nodes. Self-management techniques are part of autonomic computing methodologies, which can also be used to manage WSN with conflicting goals (energy efficiency, self-organization, time constraints and fault tolerance). The main goal of self-management is the development of a computing system that does not need human intervention to operate. Such computing systems are able to self-organize and self-optimize, as long as they follow the global objective dictated by a system administrator.
A dense WSN composed of several sensor nodes and a base station in a star network topology conforms to the IEEE 802.15.4 standard, which is becoming a de facto standard in WSN [2]. IEEE 802.15.4 offers support for different kinds of applications, but many issues remain open when the goals are conflicting (for instance, increasing dependability and energy efficiency while meeting time constraints). IEEE 802.15.4 protocols do not seem to be able to deal with such complexities.
For example, when the number of nodes in a network is increased to achieve better reliability, the WPAN may become congested, and fewer messages arrive at the base station on time. To illustrate this situation, we performed experiments using the TrueTime simulator1. Two metrics, called Ef (efficiency) and QoF, were adopted. Efficiency measures the ratio between sent and received messages. QoF represents, roughly, the average number of messages received by the base station per period, and thus provides a quality measurement. Figure 1 shows that when the network density is increased, QoF increases slowly, but communication efficiency quickly decreases.
Figure 1 – IEEE 802.15.4 behavior.
In this chapter, two autonomic approaches are presented. The first is a self-organizing approach for WSN based on a genetic machine learning algorithm. Genetic Machine Learning Algorithms (GMLA) are a machine learning approach based on genetic algorithms (GA).
GA are optimization algorithms based on the natural selection procedures proposed by Charles Darwin. GA are efficient in solving multi-goal optimization problems. However, their overhead may be cumbersome. GMLA approaches, on the other hand, partially reduce this overhead (genotype evolution is performed only after a number of classifier consultations). GMLA tries to achieve a trade-off between communication efficiency (Ef) and quality of fusion (QoF). The second approach, VOA (Variable Offset Algorithm), uses a random offset before each transmission in order to decrease collisions in the wireless medium.
This chapter is organized as follows: section 2 presents WSN challenges and characteristics. The autonomic computing principles are briefly described in section 3. Genetic Machine Learning Algorithms are presented in section 4. Related work is presented in section 5. The system model is presented in section 6. GMLA simulation results are presented in section 7. The VOA approach is presented in section 8 and the VOA experimental tests in section 9. Finally, final remarks are presented in section 10.
2. Wireless Sensor Networks
WSN are generally composed of a large number of tiny nodes that can sense the environment and communicate through a wireless medium. The nodes of a WSN are embedded systems with severe hardware and software constraints [15]. Figure 1 shows a WSN node scheme.
1 It is freely available at http://www.control.lht.se/truetime.
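The two metrics adopted in the TrueTime experiments of the Introduction, Ef and QoF, can be illustrated with a short calculation. The Python sketch below rests on assumptions: Ef is taken as the fraction of sent messages that reach the base station, and QoF as the mean number of messages received per period, which is one reading of the rough definition given in the text. The function names and the message counts are made up for illustration and are not results from the chapter.

```python
def efficiency(sent, received):
    """Ef: fraction of sent messages that reached the base station."""
    if sent == 0:
        return 0.0
    return received / sent


def quality_of_fusion(received_per_period):
    """QoF: mean number of messages received per sampling period."""
    if not received_per_period:
        return 0.0
    return sum(received_per_period) / len(received_per_period)


# Made-up counts for illustration only (not results from the chapter).
sent_total = 200
received_by_period = [12, 10, 14, 12]
ef = efficiency(sent_total, sum(received_by_period))
qof = quality_of_fusion(received_by_period)
print(f"Ef = {ef:.2f}, QoF = {qof:.1f}")
```

With these definitions, adding nodes raises the number of sent messages directly, so Ef drops whenever collisions keep the received count from growing at the same rate, which matches the trend described for the IEEE 802.15.4 behavior figure.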
  • Figure 1 – WSN Node SchemeThe small size and wireless communication of WSNtechnology allow the quick deployment of the networkover a monitoring area. The WSN deployment can be Figure 3: Robotic WSN Deploymentpreviously determined or the sensors can be randomlydeployed. The sensor placement can be manually or WSN can also be dropped by an UAV (unnamed aerialautonomically done, following the pros and cons of each vehicles) or an airplane during the flight (Figure 4).deployment type will be discussed. Thus, sensor nodes will be placed in a completeThe sensors deployment made by human beings is unpredictable location, and it is necessary to deal withshown in Figure 2. The placement can be done in a this uncertainty through special self-organizationpredetermined way or in a random fashion. The main techniques. Moreover, the sensor must be extremelyadvantage of this kind of deployment is the simplicity. cheap due to the fact that many of them can be lost orHowever, the placement precision is lower than the damaged during the aerial deployment. However, thisrobotic one. strategy is suitable in battlefields and contaminated sites (for example, chemical or radiation contamination). Figure 2: Manual WSN DeploymentWSN deployment can also be autonomically done byrobots (Figure 3). There are many advantages behindthis strategy: the WSN topology is precisely formed, it ispossible to deploy a WSN over a harsh or inaccessible Figure 4: WSN nodes dropped by Aerial Vehicles.environment without human lives risk and thedeployment cost is lower than a manual deployment.However, the complexity of robot development and the The sensor nodes can also present mobilitycost of robot management could be prohibitive. capabilities (robots). Thus, each node can move to a specific location in order to form the network. Figure 5 show a WSN formed by mobile sensors. This strategy present several challenges like: group coordination, self- localization and energy consumption (generally the
  • mobility itself consumes more energy than wireless communication). On the other hand, this kind of WSN is more flexible (nodes can move to a specific geographic location when necessary).
Figure 5: Autonomic Deployment
After the deployment, WSN nodes are turned on in order to sense and report environmental values such as temperature, humidity and pH.
Because WSNs are suitable for sensing large areas, there are three main network topologies that can be used to cover the monitoring area and deliver data to the users: star, cluster-tree and mesh. The advantages, disadvantages and characteristics of each topology are discussed below.
The star topology is formed by two main kinds of nodes. The base station is used for collecting data from sensors, WSN management and data fusion. The common sensor nodes just sample data from their sensors and broadcast the information to the base station in one hop. The star topology scheme is shown in Figure 6.
Figure 6: Star Topology
The base station generally has more powerful hardware, special software and a larger energy budget to deal with the WSN management.
The common sensor nodes must be placed in such a way that their distance to the base station is lower than the maximum base station radio range. Thus, only nodes that are covered by the base station can be considered members of the WSN. Figure 7 shows the case where nodes are located outside the base station radio coverage.
Figure 7: Star Topology Problem
There are several advantages in a star topology WSN formation:
Simple network communication: Because communication takes place only between the base station and the sensors (base station to sensors and sensors to base station), sensor nodes can be kept in send mode for long periods of time. Moreover, the base station can mostly just receive messages and switch to send mode only in special situations, such as querying a specific group of nodes or sending checkpoint packets. Because the star topology is based on one-hop communication, routing algorithms are not necessary (these algorithms are generally complex and spend a high level of energy).
Faulty nodes do not affect the network communication: The one-hop communication of the star topology is much simpler than the k-hop approaches used in cluster-tree and mesh WSNs. Thus, faults in nodes do not interrupt the data delivery to the base station (when nodes that are used as routers fail, the entire WSN can be affected). Moreover, faulty nodes can be easily replaced, or another WSN node can assume the faulty node's role.
Simple software implementation: the star communication pattern needs a simpler software implementation than other topologies. Moreover, the development of routing algorithms (which is more complex) is not necessary.
Low cost WSN: base stations are more expensive than common nodes (they must have powerful hardware, software and long-life batteries). Topologies like mesh or cluster-tree may need more than one base station, thus their cost is also higher.
The simple scheme of the star topology can also cause many problems; some of them are listed below.
  • Limited coverage area: the WSN coverage is limited by the radio antenna. Other topologies rely on routing algorithms, which can increase the monitored area.
Larger number of nodes per base station: The nodes of a star topology are arranged in a single cluster. Thus, the number of nodes managed by a single base station is generally larger than in other topologies.
Single Failure Point: The star topology relies on one single base station that receives messages and manages the entire WSN; when this base station fails, the whole WSN collapses.
Heavy wireless traffic: all the information collected by sensor nodes is broadcast to a single node. Thus, the heavy network traffic can cause congestion.
Cluster-tree Topology
The nodes of a cluster-tree topology are divided into clusters. Each cluster has a special node called cluster-head (CH) that is responsible for the management of all the nodes of its cluster. The base station node must manage all WSN nodes and generally receives messages from the CHs. Thus, a cluster-tree WSN is composed of three main actors: base station, sensor nodes and cluster heads (Figure 8).
Figure 8: Cluster-tree Topology
There are several challenges when a cluster-tree topology is used:
Cluster formation: The cluster formation in a randomly or strategically deployed WSN is a complex task. The choice of the nodes of each cluster is generally based on goals such as cluster size, increasing WSN coverage, decreasing the overlapping of CHs (see Figure 9) or minimizing power consumption.
Figure 9: Overlapping Problem
CH Election: The election of a CH in a cluster is based on metrics such as battery charge, geographical location and special hardware characteristics (mobility, wireless module, processing power or storage). Because a CH spends more energy than common nodes, the CH schedule is also used in order to increase the WSN lifetime.
Management of a larger WSN: because nodes are divided into clusters, the monitored area is bigger than in a star topology. Besides, there are more nodes to manage.
Routing Techniques: the unpredictability of wireless media and the low dependability of nodes cause several changes in the WSN topology. Thus, the state of routes must be periodically updated. The high number of exchanged messages also increases the energy consumption. Moreover, routing tables cannot hold all the routes due to the limited memory size of WSN nodes.
The third kind of WSN topology is mesh, which is used when there is no central management node (see Figure 10). Thus, every WSN node can be used as a router. This way, there are several possible paths along the WSN. The main advantage of a mesh WSN is fault tolerance. This characteristic is mainly due to the distributed nature of the mesh network. Because all nodes can route messages and there is no central node, when some node fails another can assume its duties. The main disadvantage is that all nodes must be prepared to deal with uncertainties that are not previously known (for example, route changes). Moreover, mesh WSN approaches must be carefully implemented in order to overcome hardware and software constraints.
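The fault tolerance of a mesh WSN comes from the existence of several paths between nodes. As a minimal sketch, assuming a hypothetical five-node mesh given as an adjacency map and a plain breadth-first search (neither is taken from the original text), the following Python code finds a route to a sink node and shows an alternative route being used when a relay node fails.

```python
from collections import deque

def find_route(links, source, sink, failed=frozenset()):
    """Breadth-first search for a shortest hop-count route that avoids failed nodes.

    links: dict mapping node -> list of neighbour nodes (hypothetical mesh).
    Returns the route as a list of nodes, or None if the sink is unreachable.
    """
    if source in failed or sink in failed:
        return None
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == sink:
            # Walk back through the parents to rebuild the route.
            route = []
            while node is not None:
                route.append(node)
                node = parent[node]
            return route[::-1]
        for neighbour in links.get(node, ()):
            if neighbour not in parent and neighbour not in failed:
                parent[neighbour] = node
                queue.append(neighbour)
    return None

# Hypothetical mesh: A reaches the sink E through B, or through C and D.
MESH = {
    "A": ["B", "C"],
    "B": ["A", "E"],
    "C": ["A", "D"],
    "D": ["C", "E"],
    "E": ["B", "D"],
}

normal_route = find_route(MESH, "A", "E")
backup_route = find_route(MESH, "A", "E", failed={"B"})
```

When node B is marked as failed, the search still reaches the sink through C and D, which is the behaviour a star topology cannot offer once its single base station link is lost.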
  • Figure 10: Mesh WSN
Finally, there are several advantages when WSN technology is used:
Non-intrusive monitoring: the small size of WSN nodes allows non-intrusive environmental monitoring. Moreover, wireless communication decreases the deployment effort.
Low cost technology: WSN technology is much cheaper than wired solutions.
Larger area monitoring: wireless communication and the low cost of WSN nodes allow the deployment of large scale networks. Thus, it is possible to cover larger monitoring areas than with other technologies.
On the other hand, the small size, the wireless media and the severe hardware and software constraints introduce new challenges in the development of WSN approaches:
Energy Consumption: WSNs are often deployed over harsh or inaccessible areas. Thus, battery replacement is generally prohibitive.
Resource Constraints: As noted above, WSNs face severe resource constraints. The main ones are: limited energy budget, restricted CPU clock, memory and network bandwidth. This characteristic imposes the application of new solutions. The fact that WSN topologies are composed of a huge number of nodes is also a new issue that was not usually considered in simple ad-hoc networks. For instance, trade-off approaches that aim at both energy economy and real-time requirements became necessary [20].
Self-*: One of the biggest challenges is how to create a WSN vision in the network application layer. Because WSNs are deployed to operate with little or no human intervention, self-* characteristics like self-organization, self-optimization and self-healing become necessary [18]. These characteristics are easily listed as challenges; however, they are extremely difficult to achieve.
High scale/density: There are several WSN approaches that consider a large number of nodes in order to overcome hardware or software faults; thus, there is a minimum number of nodes necessary to guarantee the WSN service. The main challenges include: processing the large amount of generated data, assuring that the WSN achieves the minimum desirable density, and developing solutions that require the lowest density in order to minimize energy consumption and maximize the WSN lifetime. WSNs based on a huge number of nodes deployed over large areas are considered large scale systems. Due to the high density, these systems are subject to faults, noise (which can sometimes be caused by the WSN itself) and other uncertainties. Moreover, once a WSN is deployed it must be self-operational and self-maintaining, because human intervention is sometimes very expensive or even impossible. Therefore, all these characteristics impose several conflicting goals. These challenges can be increased by the miniaturization trend in the industry (nanometric WSNs are being considered) [17].
Real-time: WSNs operate in the real world, thus real-time features are necessary to guarantee the correct WSN functioning. These systems present implicit real-time constraints. Besides, the response time of the tasks is also important, thus the system tasks must be finished as fast as possible. Several WSNs also present explicit real-time constraints. For example, a structural monitoring application imposes explicit deadlines for the data sensing [19]. However, due to the large number of nodes, non-determinism and noise, it is extremely hard to guarantee real-time properties.
Security: WSNs can be used in critical applications, thus security is an essential issue to be considered. Denial of Service techniques can be easily executed over a WSN. Moreover, coordination and real-time communication approaches do not consider security issues. Thus, an intruder can easily exploit these WSN security faults. The great dilemma is how to implement security techniques that need large computational resources in a technology that has severe hardware constraints.
3. Autonomic Computing Principles
Computer systems have achieved such a high level of complexity that human efforts to keep them operational have become inadequate. A similar problem took place in 1920 in telephony. At that time, human operators were required to operate switchboards by hand. The rapid
  • popularization of the phone caused serious concerns regarding the number of trained operators needed to meet the demand. The introduction of machines that performed the work eliminated the need for human intervention [21].
The term autonomic computing was introduced by IBM in 2001 to describe computer systems able to self-manage [22]. The main "self-x" properties proposed by IBM are: self-configuration, self-optimization, self-healing and self-protection. Each one of them is detailed as follows [21]:
Self-configuration: the system's ability to configure itself according to high-level objectives;
Self-optimization: the system can proactively decide to start a change in itself in order to optimize performance or quality of service;
Self-healing: the system detects and diagnoses problems. The problems here can be either faulty bits in a memory chip or a software error;
Self-protection: the system is able to defend itself against malicious attacks or unauthorized changes.
The idea of autonomic computing is heavily inspired by biological systems. Biological systems are the result of years of evolution and have features that are desirable in autonomic systems [23]; some of these characteristics are mentioned below:
 • Adaptation to environmental changes;
 • Robustness to failures caused by internal or external factors;
 • Ability to achieve complex behaviors, usually based on a limited set of basic rules;
 • Ability to learn and evolve as new conditions are applied;
 • Ability to self-organize in a distributed manner, achieving an effective balance in a collaborative manner;
 • Intelligent management of limited resources through a global intelligence;
 • Survivability in harsh environments.
The features of biological systems mentioned above are presented by evolutionary computation techniques [24]. Moreover, WSNs introduce an explicit need for self-organization [23], especially with the tendency to achieve nanoscale devices [17].
4. Genetic machine learning algorithms
Classifier systems are machine-learning algorithms based on genetic algorithms. These systems are able to learn syntactically simple rules. In this paper we call it Genetic Machine Learning Algorithm (GMLA). Classifier systems are composed of three main components: (i) Rules and Message System; (ii) Apportionment system; and (iii) Genetic Algorithm.
The Rules and Messages System is a computational scheme that uses just simple rules to guide the system in a certain environment. Rules are generally of the following form: if <condition> then <action>
The meaning of this production rule is that the action has to be imposed on the system when the condition is satisfied. Classifiers are generally composed of three characters {0,1,#}, where # is a wildcard: it can mean 0 or 1. A message received by the system can activate one or more classifiers.
Table 1 - Example of classifier population.
Condition   10#01#   11#1#0   0#1111   100001
Action      100      111      001      110
When the system receives a message “101011” from the environment, the first classifier is activated and the action “100” is executed. Classifier systems are able to adapt their classifiers in such a way that actions that enhance the performance of the system are privileged. This way, a classifier system is able to adapt itself to an unknown system.
At the startup of the classifier system, all classifiers receive the same budget. The budget is an adaptation measure of a classifier. When a classifier is chosen by the classifier system, it has to pay a predetermined amount of its budget to the apportionment system. This amount is previously set by the manager system. When more than one classifier satisfies some predefined condition, the one that has the largest budget is chosen. The budget accumulated by the apportionment system is paid to a classifier that improves system performance. On the other hand, if the last classifier did not improve the system, it loses part of its budget as a payment for its bad action. This way, the most adapted classifiers increase their budgets. After a number of queries to the classifier system, the genetic algorithm (GA) evolves the classifier population in order to get better solutions to the problem.
A GA considers a population of answers to some question; in this case, individuals are represented by their genotypes, which are usually a set of bits or characters. This population is evolved by the GA at every cycle of evolution. At each generation of answers, a new set of artificial creatures (sets of characters) is generated. These answers are based on fragments of the most adapted previous individuals. The main focus of GA is robustness. Once a system is more robust, it will not need intervention of programmers or redefinitions. Moreover, such systems will achieve higher levels of adaptation and will be able to execute better and longer.
The main difference between classical GA approaches and the GMLA approach is that GMLA optimizes the answer for the problem on-line, because it gives answers instantly. In contrast, a GA needs more time to achieve a problem solution.
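The rule matching and budget-based selection described for classifier systems can be sketched as follows. This is only an illustration: the conditions and actions come from Table 1, while the dictionary layout, the function names and the equal starting budget of 100 are assumptions made for the example.

```python
def matches(condition, message):
    """True if every condition symbol equals the message bit or is the wildcard '#'."""
    return len(condition) == len(message) and all(
        c == "#" or c == m for c, m in zip(condition, message)
    )

def select_classifier(population, message):
    """Return the matching classifier with the largest budget, or None if none matches."""
    candidates = [c for c in population if matches(c["condition"], message)]
    if not candidates:
        return None
    return max(candidates, key=lambda c: c["budget"])

# Conditions and actions from Table 1; the equal starting budgets are hypothetical.
POPULATION = [
    {"condition": "10#01#", "action": "100", "budget": 100},
    {"condition": "11#1#0", "action": "111", "budget": 100},
    {"condition": "0#1111", "action": "001", "budget": 100},
    {"condition": "100001", "action": "110", "budget": 100},
]

chosen = select_classifier(POPULATION, "101011")
```

For the message "101011" only the first condition matches, so the selected action is "100", in agreement with the example given in the text.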
  • 4.1. GMLA dynamic adjustment of sending probability
The main target of the proposed approach is to dynamically adapt the sending probability Sp. In such a way, there is a trade-off between QoF and Ef. The configuration of the classifiers is shown in Table 2.
Table 2. Classifiers configuration.
Classifier Part   Bits   Meaning
C1                1      0 = decrease, 1 = increase
C2                3      000 = [0%;12%], 001 = (12%;24%], 010 = (24%;36%], 011 = (36%;48%], 100 = (48%;64%], 101 = (64%;72%], 110 = (72%;84%], 111 = (84%;100%]
A1                1      0 = decrease, 1 = increase
A2                3      000 = 12%, 001 = 24%, 010 = 36%, 011 = 48%, 100 = 64%, 101 = 72%, 110 = 84%, 111 = 100%
According to Table 2, a classifier is composed of four parts: C1, C2, A1 and A2. The form of a classifier is <C1+C2>:<A1+A2>. The C1 value indicates whether the efficiency has increased or decreased since the last checkpoint, and C2 indicates the efficiency gain level. The A1 value indicates whether Sp will be increased or decreased, and A2 indicates the level of change in Sp. The overhead imposed by GMLA is much smaller than the overhead of traditional GAs. However, the evolution requires more system execution time. The key of GMLA is that the evolution is done during the execution time, whereas a traditional GA evolves candidate solutions before executing them. This is one of the reasons why we consider GMLA-based solutions more suitable for dynamic systems like WSN applications. The efficiency variation is calculated as follows:
$$\Delta Ef_i = \left( \frac{Ef_{i-1}}{Ef_i} - 1 \right) \times 100 \qquad (1)$$
5. Related work
The proposed model has its roots in previous research work from the authors and also in some existing wireless network standards. The adopted star topology is part of ZigBee technology (based on IEEE 802.15.4). Approaches presented in [4,5,7] use star topologies as well, where sensor nodes reach the base station in just one hop.
The round concept is shown in [8]. The main goal of this concept is to discretize the time intervals at which decisions are made. In that work, the monitoring phase (which is equivalent to our session concept) is divided into rounds of equal duration.
A metric similar to the proposed QoF concept is presented in [3], within the context of real-time databases. The metric is called QoD (Data Base Freshness) and considers the deadline miss ratio and the data freshness (QoD levels) as the relevant metrics.
A parallel data fusion scenario is considered in [5], where the master node is not aware of the number of sensor nodes. The data fusion rule (referred to as the counting rule) imposes that the number of packets must be greater than a pre-defined threshold in order to make a decision.
A serial fusion based on genetic algorithms is presented in [1]. However, a mobile agent approach for target detection is used in order to validate the multi-objective genetic algorithm.
A node grouping approach called H-NAME is presented in [11], where the authors show that such a technique can improve the network throughput, reliability and energy efficiency, while the transfer delay is decreased. The impact of hidden nodes was highlighted through simulation in an IEEE 802.15.4 star topology. Two test beds were also used to demonstrate the performance of the proposed approach. However, when the network is grouped in clusters the network density is decreased. Moreover, the tests considered only an IEEE 802.15.4 network with 18 nodes in beacon mode.
Q-DAP is a QoS data aggregation and processing approach that is executed at the intermediate nodes of a cluster-tree network [12]. Thus, the energy efficiency and network lifetime are increased while end-to-end latency and data loss are decreased. The main effort in Q-DAP is to determine when and where to execute data aggregation based only on local information. Q-DAP was evaluated through simulation and mathematical modeling. The main concern about this approach is that it considers a static cluster-tree topology with predetermined routes. Therefore, scalability issues may hinder the quality of the proposed approach.
MMSPEED is a routing protocol for probabilistic QoS guarantee in WSNs. It provides two quality domains, called timeliness and reliability [13]. MMSPEED guarantees multiple packet delivery speed options in the timeliness domain. It also provides various reliability requirements by multipath forwarding. End-to-end requirements can be guaranteed in a localized way, which is desirable for scalability and adaptability to large scale dynamic sensor networks. Nevertheless, the use of geographic routing requires the nodes to be aware of their positions. For this reason, the authors assumed that WSN nodes have GPS or distributed location services. However, GPS devices are expensive and do not function well indoors, and distributed location services impose extra overhead in packet exchanging (a node must periodically broadcast its location).
5.1. IEEE 802.15.4 standard
IEEE 802.15.4 [2] was proposed in 2003 and is becoming a de facto standard for low power and low rate
  • wireless networks. The physical layer can operate with a maximum transmission rate of 250 Kbps. The MAC supports two types of operational modes that can be selected by a central node called PAN coordinator: (1) beaconless mode, a non-slotted CSMA/CA; and (2) beacon mode, where beacons are sent periodically by the PAN coordinator. In this last case, nodes are synchronized by a superframe structure.
An IEEE 802.15.4 network can enable the use of up to 65,000 nodes, based on its address scheme. Three types of topologies are supported: star, mesh and cluster tree. The star topology is considered the simplest scheme, where nodes are able to communicate with each other in just one hop.
CSMA/CA in beaconless mode is used when the coordinator does not send a periodic beacon. Thus, backoff periods of one device are not related in time to the backoff periods of any other device in the network [2].
Two variables are maintained by each device in beaconless mode: NB, the number of times the CSMA/CA algorithm was required to back off, and BE, the backoff exponent, which is related to how many backoff periods a device shall wait before attempting to assess a channel.
The first step in the algorithm is the initialization of NB and BE. After this step, the MAC sublayer shall delay for a random number of complete backoff periods in the range of 0 to 2^BE – 1 and request the physical layer to perform CCA (Clear Channel Assessment). If the channel is assessed to be busy, the MAC sublayer will increment NB and BE by one (the algorithm must ensure that BE is not greater than macMaxBE). If the value of NB is greater than macMaxCSMABackoffs, the CSMA/CA shall end with a channel access failure status [2].
This way, we may consider three main parameters that influence beaconless CSMA/CA performance: macMaxBE (default value 5), macMaxCSMABackoffs (default value 4) and macMinBE (default value 3) [3].
These default values can decrease battery consumption (because a device tries just 5 times before aborting the transmission); however, when the number of nodes in the network increases, the communication efficiency decreases (see Figure 1). Thus, IEEE 802.15.4 does not seem to be adequate for applications requiring the use of dense networks.
6. Communication model
The communication model used considers one master node (base station) and N slave nodes (Figure 2), where the slave nodes periodically sense scalar data [16]. The signal is considered to be homogeneous in the monitoring area. Data collected by slaves is sent to the master node, which performs the data fusion. All the slave nodes reach the master using just one hop. That is, a parallel data fusion is performed in the master node.
Figure 2 - System Architecture.
The concept of monitoring session is adopted. A monitoring session is a time interval where all slave nodes periodically send sensed data to the master node. A session S is composed of N TS rounds with the length R. Therefore, it is composed of 0,R,2R,3R, ..., (N-1)R rounds. The round concept is used to synchronize nodes, and it also represents the periodicity of the data fusion task. On each round, a slave node can send zero or one message M containing the sensed data to the master node. All slave nodes are synchronized by the WSN round concept.
Each message M sent by a slave node has an absolute deadline D, that is, the maximum time interval within which it must be delivered to the master node. Otherwise, it will no longer be useful for the data fusion task. This absolute deadline is computed based on a relative deadline d. We considered a homogeneous architecture where all slave nodes have the same relative deadline. This relative deadline value is sent by the master node at the beginning of the session. The absolute deadline of a slave node at round n is D=nR+d, where R is the round length.
The master node performs a data fusion operation considering just the messages that arrived on time. In this work, the master node just fuses data that arrived within the same round. That is, the relative deadline of a message sent in round n is always 0<d<R, and consequently, the absolute deadline is nR<D<(n+1)R.
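The beaconless CSMA/CA procedure summarized in Section 5.1 can be sketched in a few lines. This is a simplified illustration and not the standard's reference algorithm: the function name, the `channel_is_busy` callable standing in for the CCA result, and the returned list of delays are assumptions of the example, while the three constants are the default values quoted in the text.

```python
import random

MAC_MIN_BE = 3             # macMinBE default quoted in the text
MAC_MAX_BE = 5             # macMaxBE default quoted in the text
MAC_MAX_CSMA_BACKOFFS = 4  # macMaxCSMABackoffs default quoted in the text

def unslotted_csma_ca(channel_is_busy, rng=random):
    """Sketch of the beaconless CSMA/CA loop.

    channel_is_busy: hypothetical callable returning the CCA result.
    Returns (success, delays), where delays holds the random waits
    (in backoff periods) that preceded each CCA attempt.
    """
    nb, be = 0, MAC_MIN_BE
    delays = []
    while True:
        # Wait a random number of backoff periods in 0 .. 2^BE - 1.
        delays.append(rng.randint(0, 2 ** be - 1))
        if not channel_is_busy():
            return True, delays          # channel idle: transmit
        nb += 1
        be = min(be + 1, MAC_MAX_BE)     # BE never exceeds macMaxBE
        if nb > MAC_MAX_CSMA_BACKOFFS:
            return False, delays         # channel access failure

ok_idle, delays_idle = unslotted_csma_ca(lambda: False, random.Random(1))
ok_busy, delays_busy = unslotted_csma_ca(lambda: True, random.Random(1))
```

With a permanently busy channel the loop performs five channel assessments before reporting failure, which matches the remark that a device tries just 5 times before aborting the transmission.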
  • Figure 3: Network Behavior
Figure 4 - Slave Node Algorithm.
A sending probability (Sp) parameter is considered in the model, and all slave nodes have the same Sp. This parameter controls the messages sent by slave nodes within each round. For instance, if Sp is configured to 0.1, each slave has a 10% probability of sending its message. The signal is considered homogeneous and redundant in the monitoring area, so a well-configured Sp saves network energy, reducing the number of packets in the WSN. The network behavior is presented in Figure 3. The sending probability, the round time and the relative deadline parameters are sent by the master node at the beginning of each session. Some of these parameters may remain valid during the whole monitoring session, or they can be changed at a checkpoint C. A checkpoint is a special round in which the resynchronization of parameters is imposed based on the network condition. Slave nodes do not send messages in a checkpoint round; they just receive new parameter values. The first round of every monitoring session is a checkpoint round, and slave nodes always wait for parameter values in the first round.
Figure 5 - Master Node Algorithm.
The master node calculates performance metrics during a checkpoint round in order to tune the WSN. In the proposed model, two metrics are considered: Quality of Fusion (QoF) and Efficiency (Ef). Ef is the relation between timely received messages (messages received by the master node before the deadline) and sent messages. It is calculated according to:
$$Ef = \frac{\sum_{i=1}^{N} Mr}{EMs} \qquad (2)$$
where N is the number of rounds since the previous checkpoint C, Mr is the number of received messages and EMs is an estimation of the number of messages sent by slave nodes (3). This metric indicates how many messages are used in the data fusion task:
$$EMs = Sp \times De \times N \qquad (3)$$
  • where De is the density of slave nodes in the WSN deployment. Finally, QoF is the average number of messages received by the master node during a monitoring session, which is evaluated according to:
$$QoF = \frac{\sum_{i=1}^{N} Mr}{TS} \qquad (4)$$
The basic idea of the QoF metric is to represent the quality of information in data fusion. A higher number of messages used in the data fusion task results in more reliable information. Figures 4 and 5 present an activity model of the Slave and Master algorithms, respectively.
7. GMLA Simulation results
Our approach was evaluated using the TrueTime simulator. We considered a star topology in IEEE 802.15.4 and fixed the position of the master node in the center of a 70x70 meters square; slave nodes are randomly deployed in this square in such a way that their antennas are able to reach the master node. Therefore, at each experiment, we have a different network topology. Besides this, our fault injection scheme increases the network topology uncertainty.
We used a fault injection scheme to add uncertain behavior to the WSN. This way, at each round some slave nodes have a probability of failures in the communication. Therefore, the network topology may change at each round, which justifies the use of the GMLA.
The main target of this simulation is to check GMLA performance in a randomly deployed IEEE 802.15.4 WSN. The learning capability of GMLA has also been checked. For this purpose, the simulation time was varied (500, 1000, 1500 and 2000 seconds), and we made 33 simulations for each simulation time. Tables 3 and 4 present, respectively, the IEEE 802.15.4 and GMLA parameters.
Table 3 - IEEE 802.15.4 parameters.
Data rate   Transmit power   Receiver signal threshold   Pathloss exponent   Ack timeout   Retry limit
250 kbps    -10 dbm          -90 dbm                     3.5                 0.864 ms      3
Table 4 - GMLA Parameters.
Checkpoint Reposition Evolution Population Action
time rate size tax
2 15 16 100 10
The main goal of the GMLA approach is to improve the communication efficiency in a communication environment where the network topology is unknown to the master node (node that tunes Pe). We can notice in Figure 6 that the communication efficiency maintains the same level when IEEE 802.15.4 is used. However, when GMLA is used, it is possible to notice a gain of almost 10% in communication efficiency.
Figure 6 - Comparison of GMLA and IEEE 802.15.4 (Efficiency (%) versus rounds).
We collected the maximum communication efficiency in 33 simulations. Our approach always achieved higher levels of communication efficiency too. We could notice that IEEE 802.15.4 presents a static behavior, and that it cannot learn better communication patterns when topology changes are faced.
Table 5 presents the average and standard deviation of 33 simulations. GMLA presented higher values of standard deviation than IEEE 802.15.4. This is due to the learning characteristics of GMLA. GMLA has to test different values of Sp in order to get higher levels of Ef. However, IEEE 802.15.4 is not able to optimize Ef, so it maintains the same reduced level of communication efficiency. The highest level of Ef was achieved in the 1000 rounds simulation, where GMLA presented a 39% efficiency (average). Figure 7 presents the maximum Ef achieved in 33 simulations. It is possible to notice that GMLA always achieves a higher level of Ef.
Table 5 - Average of Efficiency.
Rounds   GMLA     IEEE 802.15.4
500      37+/-2   34+/-0.7
1000     39+/-3   34+/-0.5
1500     38+/-2   34+/-0.2
2000     37+/-2   34+/-0.1
Figure 7 - Comparison of Maximum Efficiency (Maximum Efficiency (%) versus rounds).
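The efficiency and QoF metrics behind these results, equations (2) to (4), can be illustrated with a small worked example. The sketch below makes two assumptions that the text does not state explicitly: De is treated as the number of slave nodes, and TS as the number of rounds in the session. The function names and the input numbers are hypothetical.

```python
def estimated_sent(sp, density, rounds):
    """Equation (3): EMs = Sp x De x N."""
    return sp * density * rounds

def efficiency(received_per_round, sp, density):
    """Equation (2): timely received messages over the estimated sent messages."""
    rounds = len(received_per_round)
    return sum(received_per_round) / estimated_sent(sp, density, rounds)

def quality_of_fusion(received_per_round, ts):
    """Equation (4): received messages divided by TS (assumed to be the round count)."""
    return sum(received_per_round) / ts

# Hypothetical numbers: 100 slaves, Sp = 0.1, 10 rounds since the last checkpoint.
received = [4, 3, 5, 4, 4, 3, 4, 5, 3, 4]
ef = efficiency(received, sp=0.1, density=100)
qof = quality_of_fusion(received, ts=len(received))
```

In this example about 100 messages are estimated to have been sent and 39 arrived on time, so Ef is roughly 0.39 and QoF is roughly 3.9 messages per round.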
  • Figure 8 - GMLA Efficiency and QoF values (Efficiency/QoF (%) versus rounds).
An analysis of Figure 8 indicates that QoF maintains almost the same level in all simulations. However, the highest level of Ef was achieved in the 1,000 rounds simulation. This could be explained by GMLA's learning behavior, which tries different Sp values when longer simulation times are run.
Figure 9: Sent (SM) and Received Messages (RM).
The relation between Received Messages (RM) and Sent Messages (SM) is shown in Figure 9. It is possible to notice that GMLA always presents a higher Ef than IEEE 802.15.4. Thus, a WSN running the proposed GMLA approach sends a lower number of messages than an ordinary IEEE 802.15.4 network. This way, we can conclude that the proposed GMLA approach is able to trade off QoF and Ef. Moreover, GMLA expends lower levels of energy than IEEE 802.15.4. However, this approach is only suitable for applications where the signal is homogeneous throughout the monitoring area.
8. Variable Offset Algorithm (VOA)
The VOA algorithm (Variable Offset Algorithm) targets the optimization of the communication efficiency in dense WSNs with star topology. The VOA algorithm can be easily implemented upon IEEE 802.15.4 devices, as it is a light middleware implemented at the application layer. The main target of VOA is the communication efficiency through the use of random offsets before slave nodes' transmissions.
A round is triggered by the master through the broadcasting of a checkpoint message. This message synchronizes the beginning of the session among all slaves, and conveys 4 parameters: a) SL: the session length; b) RL: the round length; c) MO: the maximum offset that a slave can use during a session; and d) K: the number of messages that each slave should transmit in a session.
Therefore, a checkpoint imposes a resynchronization of parameters based on the network condition. If the checkpoint message is not received by a slave, then the device will wait in listen mode until it receives the next checkpoint message (in the next session).
The maximum offset (MO) is a parameter that is used to compute a random delay in the range [0, MO[ using a Uniform distribution. This delay is used later to desynchronize the instants of transmission between different slaves.
During a session each slave should transmit K messages. This QoS requirement is defined by the data fusion application that is executed in the master node.
Message transmissions proceed as follows. After the random delay, each slave should transmit one message every round time, until K messages are transmitted or the round finishes (whichever occurs first). Message transmissions are always acknowledged. The default behavior of IEEE 802.15.4 is assumed regarding medium access, collisions, retransmissions, timeouts, etc.
In conclusion, a K-out-of-N model is proposed where slaves have a QoS requirement of sending K messages during a session (i.e. N rounds). This approach minimizes the probability that several slaves try to transmit a message at the same instant, which reduces the number of collisions and allows the transmission of a higher number of messages when compared with the standard solution (i.e. IEEE 802.15.4). Moreover, this model introduces a trade-off between QoS and energy consumption in the WSN. The tuning of the number of sent messages is enabled with VOA, which increases both the network lifetime and the QoS even in randomly deployed networks.
The master performs the data fusion operation considering just the messages that arrived on time. In this case, the master just fuses data that arrived in the previous session. In order to tune the operation of the network, the master computes the QoF and Efficiency metrics at the end of each session. Efficiency is the relation between timely received messages (messages received by the master in the previous session) and the required messages. It is computed as follows:
$$Ef = \frac{\sum_{i=1}^{N} Mr_i}{EMs} \qquad (5)$$
  • where Mr_i is the number of messages received from slave i and EMs is an estimate of the number of messages sent by the slave nodes (Eq. 6). This metric indicates how many messages are used in the data fusion task.

    $$EMs = K \times N \qquad (6)$$

    where K is the QoS requirement and N is the number of slaves. Quality of Fusion is the average number of messages received by the master node over all the sessions, and is evaluated as in Equation 4.

    The following section gives a detailed description of the VOA algorithm in pseudo-code.

    8.1. VOA Algorithm

    The VOA master and slave algorithms are presented below:

    % Master Algorithm %
    % K  : number of transmissions
    % MO : maximum offset
    % SL : Session length
    % RL : Round length

    If new_Session then
        K = number_of_transmissions()
        MO = maximum_offset()
        end_Session = Start_timer(SL)
        Broadcast (SL, RL, K, MO)
    Endif

    While !end_Session
        msg = Receive_msg()
        Store (msg)
    Endwhile

    If end_Session then
        Calculate_efficiency()
        Calculate_QoF()
        Data_fusion()
        Stop_timers()
        new_Session = True
    Endif

    % Slave Algorithm %
    % nMT : number of messages transmitted
    % SOF : slave offset

    If new_Session then
        If Receive_checkpoint (SL, RL, K, MO) then
            SOF = Slave_offset(MO)
            start_Transmission = Start_timer(SOF)
            end_Session = Start_timer(SL)
            new_Session = False
            nMT = 0
        Else
            Listen()
        Endif
    Endif

    If start_Transmission then
        msg = Sense_data()
        Send_data(msg)
        nMT++
        If !end_Session then
            If nMT < K Then
                start_Transmission = Start_timer(RL)
            Endif
        Else
            Stop_timers()
            new_Session = True
        Endif
    Endif

    A final remark concerns the random number generation. Since this task could be intensive in both computation and time, we adopted a simple approach for its implementation. Instead of performing a direct calculation using a specific algorithm, we used pre-computed values which are stored in the slave ROM.

    9. VOA Experimental results

    The experimental setup is composed of 30 MicaZ nodes [14], featuring an Atmel ATmega128L 8-bit microcontroller with 128 kB of in-system programmable memory and an IEEE 802.15.4 radio. However, the deployment area is now a 1.3 x 1.3 m square (Figure 10). The checkpoint message has 19 bytes, and messages sent by the slaves have 18 bytes. The memory occupied by the VOA code is the following:
    o Master algorithm: 572 bytes (RAM memory) and 15902 bytes (ROM memory);
    o Slave algorithm: 333 bytes (RAM memory) and 12088 bytes (ROM memory).

    Figure 10: Experimental setup.

    As in the simulation, two sets of results were obtained: with and without VOA (Table 6).
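    As a rough illustration of how the Ef values reported in Table 6 relate to Eqs. 5 and 6, the sketch below computes the efficiency from per-slave message counts and, alternatively, from the received and required totals. The function names, the per-slave counts in the first example and the optional scaling by a number of sessions are assumptions made for illustration; they are not part of the VOA implementation described here.

```python
from typing import Sequence


def expected_messages(k: int, n_slaves: int, sessions: int = 1) -> int:
    """Eq. 6 (EMs = K x N); the 'sessions' multiplier is an assumption
    used to aggregate several sessions into one total."""
    return k * n_slaves * sessions


def efficiency(received_per_slave: Sequence[int], k: int, sessions: int = 1) -> float:
    """Eq. 5: timely received messages over expected messages, in percent.

    received_per_slave[i] plays the role of Mr_i for slave i.
    """
    ems = expected_messages(k, len(received_per_slave), sessions)
    if ems <= 0:
        raise ValueError("K, the number of slaves and sessions must be positive")
    return 100.0 * sum(received_per_slave) / ems


def efficiency_from_totals(received_total: int, required_total: int) -> float:
    """Same ratio, starting from the two totals listed in Table 6."""
    if required_total <= 0:
        raise ValueError("required_total must be positive")
    return 100.0 * received_total / required_total


# Hypothetical single session: 29 slaves, K = 4, three slaves lose one message.
received = [4] * 26 + [3] * 3
print(round(efficiency(received, k=4), 1))            # 97.4

# Totals taken from the K = 1 row of Table 6.
print(round(efficiency_from_totals(28825, 29000), 1))  # 99.4
```

    The second call reproduces the Ef value listed for K = 1; the first only shows how a few lost messages in one session lower the metric.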
  • Table 6: Experimental results.

    VOA
    K    Received Messages    Required Messages    Ef (%)    QoF
    1          28,825               29,000          99.4      2.8
    2          57,617               58,000          99.3      5.7
    3          86,260               87,000          99.2      8.6
    4         115,209              116,000          99.3     11.5
    5         139,979              145,000          96.5     13.9
    6         156,088              174,000          89.7     15.6
    7         161,585              203,000          79.6     16.1
    8         159,421              232,000          68.7     15.9
    9         169,653              261,000          65.0     16.9

    IEEE 802.15.4
    K    Received Messages    Required Messages    Ef (%)    QoF
    -          80,127              261,000          30.7      8.0

    The efficiency decreases as K increases. However, VOA obtained good levels of Efficiency even though the round duration is just 0.1 second. Moreover, a decrease in the level of efficiency is only seen when K is higher than 4. When K is in the range 1 to 4, the levels of efficiency achieved were higher than 99%.

    The relationship between received and required messages is presented in Figure 11. There is a gap between estimated and received messages when K > 4. The lowest level of efficiency was obtained when K = 9 (65%).

    Figure 11: Required and received messages.

    The relationship between QoF and Efficiency is presented in Figure 12. When K < 4 the level of Efficiency is almost the maximum. However, when K approaches the number of microcycles, the level of Efficiency decreases and the QoF increases. This is possibly due to the fact that the wireless medium is very busy.

    Figure 12: QoF and Efficiency.

    A second experiment was also performed by varying the number of slaves (Figure 13). The goal was to evaluate the influence of the number of nodes on the Ef and QoF metrics. When compared with VOA, IEEE 802.15.4 presents similar results in just one case: a network with 4 slaves. As the number of slaves increases, the difference between VOA and IEEE 802.15.4 becomes larger. The difference in efficiency between VOA and IEEE 802.15.4 when considering 29 slaves is more than 100%. These results show that VOA has a satisfactory performance and maintains a minimum QoS level even with a high number of slaves.

    [Figure 13 plot: Quality of Fusion (QoF) and Efficiency (%) versus number of nodes (4, 9, 14, 19, 24, 29), with series VOA QoF, IEEE 802.15.4 QoF, VOA Ef and IEEE 802.15.4 Ef.]

    Figure 13: Variable number of slaves.

    10. Final remarks

    In this chapter we have shown the challenges of WSN technology. Moreover, we have shown how autonomic computing can support large scale WSNs. The GMLA approach for WSN data fusion applications was presented as a case study. GMLA achieves higher levels of Ef than IEEE 802.15.4 even facing random topologies
  • and communication faults. Thus, the proposed approach seems to be suitable for non-predictable WSNs. Moreover, GMLA presented a trade-off between QoF and Ef. GMLA presented a gain of almost 13% over IEEE 802.15.4 in the 1,000-round simulation.

    The VOA algorithm was also shown to enhance the communication efficiency in dense wireless sensor networks. A set of comparisons with bare IEEE 802.15.4 nodes showed a marked enhancement in terms of communication efficiency and QoS. The VOA algorithm was assessed with the help of an experimental setup based on MicaZ motes. The obtained results showed a clear improvement of the efficiency attained by the proposed algorithm. Moreover, both the VOA and GMLA algorithms can be implemented as a light middleware at the application layer, thus no network stack modifications are necessary.

    Acknowledgement.

    The authors acknowledge the support granted by CNPq and FAPESP to the INCT-SEC (National Institute of Science and Technology - Critical Embedded Systems - Brazil), processes 573963/2008-9 and 08/57870-9.

    References

    [1] Q. Wu, N.S.V. Rao, J. Barhen, S.S. Iyengar, V.K. Vaishnavi, H. Qi, K. Chakrabarty, On Computing Mobile Agent Routes for Data Fusion in Distributed Sensor Networks, IEEE Trans. on Knowledge and Data Engineering, Vol. 16, No. 6, 2004.
    [2] 802.15.4 Part 15.4: Wireless Medium Access Control (MAC) and Physical Layer (PHY) Specifications for Low-Rate Wireless Personal Area Network (LR-WPAN), IEEE-SA Standards Board 802.15.4, 2006.
    [3] K. Kang, H. S. Son, J.A. Stankovic, Managing Deadline Miss Ratio and Sensor Data Freshness in Real-Time Databases, IEEE Trans. on Knowledge and Data Engineering, Vol. 16, No. 10, 2004.
    [4] K. Morita, K. Watanabe, N. Hayashibara, T. Enokido, M. Takizawa, Efficient Data Transmission in a Lossy and Resource Limited Wireless Sensor-Actuator Network, Proc. of the ISORC07, 2007.
    [5] R. Niu, P.K. Varshney, Q. Cheng, Distributed Detection in a Large Wireless Sensor Network, Science Direct, Information Fusion 7, July 2006.
    [6] G. Werner-Allen, J. Johnson, M. Ruiz, J. Lees, M. Welsh, Monitoring Volcanic Eruptions with a Wireless Sensor Network, Proc. of the Second European Workshop on Wireless Sensor Networks, 2005, 108-120.
    [7] A.A. Somasundara, A. Ramamoorthy, M.B. Srivastava, Mobile Element Scheduling with Dynamic Deadlines, IEEE Trans. on Mobile Computing, Vol. 6, No. 4, 2007.
    [8] T. Yan, T. He, J.A. Stankovic, Differentiated Surveillance for Sensor Networks, Proc. of First Int. Conf. on Embedded Networked Sensor Systems, 2003.
    [9] I.F. Akyildiz, W. Su, Y. Sankarasubramaniam, E. Cayirci, A Survey on Sensor Networks, IEEE Communications Magazine, 2002, 102-114.
    [10] J.A. Stankovic, T.F. Abdelzaher, C. Lu, L. Sha, J.C. Hou, Real-Time Communication and Coordination in Embedded Sensor Networks, Proceedings of the IEEE, Vol. 91, No. 7, Jul. 2003, 1002-1022.
    [11] A. Koubaa, R. Severino, M. Alves, E. Tovar, Improving Quality-of-Service in Wireless Sensor Networks by mitigating "Hidden-Node Collisions", IEEE Transactions on Industrial Informatics, Special Issue on Real-Time and Embedded Networked Systems, Volume 5, Number 3, August 2009.
    [12] J. Zhu, S. Papavassiliou, J. Yang, Adaptive Localized QoS-Constrained Data Aggregation and Processing in Distributed Sensor Networks, IEEE Transactions on Parallel and Distributed Systems, Vol. 17, No. 9, September 2006, 923-933.
    [13] E. Felemban, Chang-Gun Lee, E. Ekici, MMSPEED: Multipath Multi-SPEED Protocol for QoS Guarantee of Reliability and Timeliness in Wireless Sensor Networks, IEEE Transactions on Mobile Computing, Vol. 5, No. 6, June 2006, 738-754.
    [14] MicaZ Mote Datasheets [Online]. Available at: http://www.xbow.com
    [15] I.F. Akyildiz, W. Su, Y. Sankarasubramaniam, E. Cayirci, A Survey on Sensor Networks, IEEE Communications Magazine, 2002, 102-114.
    [16] A. R. Pinto, C. Montez, Autonomic Approaches for Enhancing Communication QoS in Dense Wireless Sensor Networks with Real Time Requirements, 2010 IEEE Int. Test Conference, 2010, 1-10.
    [17] I. Akyildiz, F. Brunetti, C. Blazquez, Nanonetworks: A new communication paradigm, Computer Networks, no. 52, 2008, 2260–2279.
    [18] M.C. Huebscher, J.A. McCann, A survey of autonomic computing degrees, models, and applications, ACM Comput Surveys 2008;40(3):1-28.
    [19] S. Kim, S. Pakzad, D. Culler, J. Demmel, G. Fenves, S. Glaser, M. Turon, Health Monitoring of Civil Infrastructures Using Wireless Sensor.
    [20] J. Yick, B. Mukherjee, D. Ghosal, Wireless sensor network survey, Computer Networks, no. 52, 2008, 2292–2330.
    [21] M.C. Huebscher, J.A. McCann, A survey of autonomic computing degrees, models, and applications, ACM Comput Surveys 2008;40(3):1-28.
    [22] J.O. Kephart, D.M. Chess, The Vision of Autonomic Computing, IEEE Computer, 2003.
    [23] F. Dressler, A study of self-organization mechanisms in ad hoc and sensor networks, Computer Communications, no. 31, 2008, 3018–3029.
    [24] D. Miorandi, L. Yamamoto, F. Pellegrini, A survey of evolutionary and embryogenic approaches, Computer Networks, no. 54, 2010, 944–959.