Dynamic System Configuration using SOA

This thesis applies the principles and patterns of Service Oriented Architecture to make a distributed system dynamically reconfigurable when hardware fails. It provides a case study of a radar chain system. Our solution uses the Service Location Protocol and the (R-)OSGi framework to make the radar chain dynamically reconfigurable.


Saxion Hogeschool Enschede

Thesis
Dynamic System Configuration using SOA
Version 1.0

Contractors:
Jeroen Rosenberg (jeroen.rosenberg@luminis.nl)
Lesley Wevers (lesley.wevers@luminis.nl)

Supervisors:
Richard van der Laan (richard.vanderlaan@luminis.nl)
Ferenc Schopbarteld (ferenc.schopbarteld@nl.thalesgroup.com)
Douwe van Twillert (d.a.vantwillert@saxion.nl)

Abstract
Thales uses a static configuration to map software components to hardware components. In case of hardware failures, this mapping has to be adapted manually to restore the system. This requires the system to be inoperative for a significant amount of time, which is not acceptable in the mission-critical systems Thales builds. Thales feels they were not technologically able to find a solution for this problem in the past, but they now see an opportunity to tackle the problem using the principles and patterns of service oriented architectures (SOA). To recover the system, processes which ran on failed processing nodes could be moved to available processing nodes. A SOA layer has been defined on top of the radar chain model to coordinate the process of restoring the system. This SOA layer is realized using the SOA-based OSGi framework and the R-OSGi extension.

Hengelo, December 22, 2009
Change log

Version   Date         Modifications
0.1       2009-02-09   Initial version
0.2       2009-03-13   Distributed Systems, SOA characteristics
0.3       2009-04-17   Case Study Radar Chain Case, SOA principles and patterns
0.4       2009-05-08   Introduction, Systematic Approach chapter
0.5       2009-05-25   Background, Assignment, Problem Analysis, Solution, Design and Implementation
1.0       2009-06-01   Summary, Conclusion, Retrospective
Samenvatting (translated from Dutch)

This thesis was written as part of a graduation project at luminis in the client context of Thales. Thales uses very large distributed systems to carry out the complicated computations involved in processing radar signals. The mapping of software components onto hardware components in this radar chain is based on a static configuration. When components in the system fail or the configuration changes, this mapping currently has to be adapted manually, leaving the system inoperative for a significant period of time. This is unacceptable in the critical systems Thales uses, such as the radar chain described above. This graduation project therefore investigated the possibilities of service oriented architecture (SOA) for performing (re)configuration dynamically, with a focus on the representative Thales radar chain.

In a distributed system such as the radar chain, problems can occur that are very different from those in a fully local system. A poorly designed distributed system can go down completely because a single component has failed. Components should be available at all times and be affected as little as possible by the failure of other components. The design of a dynamically configurable distributed system has to take this into account.

In designing the system we apply a number of SOA principles and patterns. SOA is an architectural paradigm in software design based on cooperating services that each perform a certain task. A number of SOA patterns solve subproblems we encounter when designing a dynamically configurable distributed system. The lookup pattern helps to find available services; the leasing pattern makes it possible to detect services becoming inactive; and the whiteboard pattern allows the life cycle of components to be managed consistently. In a logical design we define a number of SOA services that enable dynamic configuration.

For the implementation of a proof-of-concept, the SOA-based OSGi framework was used in combination with R-OSGi, an extension of OSGi. OSGi offers a number of the facilities defined in the logical design. The OSGi Module layer makes individual components insensitive to the failure of other components; the Life cycle layer provides dynamic management of the life cycle of components; and the WireAdmin provides dynamic configuration of the connections between components. R-OSGi implements the Service Location Protocol (SLP), which makes it possible to locate and use services on other processing nodes within a network. In addition, R-OSGi offers so-called RemoteEvents, which can be broadcast to notify other services within a network of certain events, such as the loss of a particular service.

The logical design has been translated into a solution within the OSGi model. First, an implementation was made of a simplified representation of the Thales radar chain, in which failure scenarios could be simulated. Next, the logical design was implemented on top of the OSGi framework, so that the system was dynamically configurable within a local machine. Finally, the system was adapted using R-OSGi, which made reconfiguration possible in a distributed environment as well.
Summary

This thesis is written as part of the graduation internship at luminis in the client context of Thales. Thales uses very large distributed systems to perform the complicated computations needed for processing radar signals. The mapping of software components to hardware components in this radar chain is based on a static configuration which has to be adapted manually in case of failures. As a result, the system could be inoperative for a significant amount of time, which is not acceptable in the critical systems Thales uses, such as the so-called radar chain. In this project, the possibilities of service oriented architecture (SOA) for dynamic (re)configuration have been researched, with a focus on the representative Thales radar chain.

A distributed system such as the radar chain can pose quite different problems than a fully local system. A poorly designed distributed system could crash completely if one component fails. Components should be available at all times and be fault-tolerant with regard to failures of other components. These aspects have to be taken into account while designing a dynamically configurable distributed system.

In the design of the system, several principles and patterns of SOA are applied. SOA is an architectural paradigm in software design based on interoperable services which each perform a certain task. Several subproblems we face while designing the system are solved by SOA patterns. The lookup pattern allows finding available services; the leasing pattern provides detection of services becoming inactive; and the whiteboard pattern allows consistent life cycle management of components. In a logical design, a set of SOA services is defined to allow dynamic configuration.

The SOA-based OSGi framework and the R-OSGi extension have been used to implement a proof-of-concept. OSGi provides several capabilities defined in the logical design. The OSGi Module layer allows components to be fault-tolerant; the Life cycle layer provides dynamic life cycle management; and the WireAdmin allows dynamic wiring between components. R-OSGi implements the Service Location Protocol (SLP), which allows finding and using remote services within a network. Additionally, R-OSGi provides RemoteEvents, which can be broadcast to notify other services within a network of certain events, such as the failure of a particular service.

The logical design is translated to a solution within the OSGi model. First, a simplified model of the Thales radar chain was implemented so that failure scenarios could be simulated. Second, the logical design was implemented on top of the OSGi framework, making the system dynamically configurable on a local machine. Finally, the system was adapted using R-OSGi for integration in distributed environments.
Preface

This thesis is written as part of the documentation of the graduation internship of Jeroen Rosenberg and Lesley Wevers. This internship is part of the study in computer science provided by Saxion Hogescholen in Enschede, the Netherlands. The thesis work has been carried out from February 2009 to June 2009 at the Surface Radar Department of Thales Hengelo, and consisted of one paper and a developed prototype as a proof-of-concept. This thesis marks the end of our study.

The following people are acknowledged for their assistance: Ing. D.A. van Twillert, R. van der Laan, F. Schopbarteld and R. van Hees. We wish to take this opportunity to express our gratitude to Ir. J.W.M. Stroet and Mr. dr. H.J.A. Mentink for support, advice and encouragement through ups and downs.

Hengelo, June 2009
Lesley Wevers
Jeroen Rosenberg
Contents

Samenvatting
Summary
Preface
1 Introduction
    1.1 Purpose
    1.2 Client and organization
    1.3 Document structure

I Problem Analysis & Assignment
2 Problem analysis
    2.1 Motivation
    2.2 Case Study: The Thales Radar Chain
        2.2.1 System context
        2.2.2 System components
    2.3 Problem definition
    2.4 Research questions
    2.5 Conclusion
3 Assignment
    3.1 Goal
    3.2 Study scope
        3.2.1 Solution criteria
        3.2.2 Outside the scope of this study
    3.3 Conclusion

II Literature Study
4 Distributed Systems
    4.1 Characteristics
    4.2 Objectives
    4.3 Challenges and Issues
    4.4 Conclusion
5 Service Oriented Architectures
    5.1 Overview
    5.2 Principles
    5.3 Patterns
    5.4 Conclusion
6 OSGi
    6.1 Overview
    6.2 Module Layer and Fault-Tolerance
    6.3 Lifecycle Layer and Dynamic Life Cycle Management
    6.4 Service Layer and Service Discovery
    6.5 Event Handling
    6.6 Wiring of Processes
    6.7 R-OSGi
        6.7.1 Remote Service Discovery
        6.7.2 Using Remote Services through Dynamic Proxies
        6.7.3 Remote Event Handling
    6.8 Conclusion

III Solution Approach & Analysis, System Design & Implementation
7 Solution approach
    7.1 Analysing a naive solution
    7.2 Solution proposal
    7.3 Prototype considerations
    7.4 Conclusion
8 Solution analysis
    8.1 Scenarios
        8.1.1 System instantiation
        8.1.2 Restoring the system
    8.2 Use-case analysis
        8.2.1 Use-case diagram
        8.2.2 Actors
        8.2.3 Administrator use-cases
        8.2.4 Configuration system use-cases
    8.3 Secondary use-cases
    8.4 Conclusion
9 System design
    9.1 Design challenges
    9.2 Service decomposition
    9.3 Service interaction
    9.4 Service capabilities
    9.5 Service descriptions
    9.6 Conclusion
10 System implementation
    10.1 Process and link implementations
        10.1.1 Processes
        10.1.2 Links
        10.1.3 Demonstration
    10.2 Configuration system local implementation
        10.2.1 OSGi
        10.2.2 OSGi service mapping
        10.2.3 OSGi bundles
        10.2.4 Service implementations
        10.2.5 Demonstration
    10.3 R-OSGi integration
    10.4 Conclusion

IV Conclusion & Recommendations
11 Conclusion
    11.1 System recovery
    11.2 Service oriented architectures
    11.3 OSGi and R-OSGi
    11.4 Final conclusion
12 Recommendations
    12.1 Applying the study results to the O2 framework
    12.2 Remove single points of failure
    12.3 Code provisioning
    12.4 Dynamic configuration

References
Glossary

Appendices
A Sequence diagrams
    A.1 Service lifecycle
    A.2 Mapping service start
    A.3 Process service lifecycle
    A.4 Processing node start
    A.5 Processing node goes down
    A.6 Software system specification changed

List of Figures

1 A high-level overview of the Thales radar chain.
2 A more detailed view of the software processing subsystem.
3 OSGi Framework layering
4 Use-case diagram
5 Class diagram
6 Class diagram
7 Service lifecycle
8 Mapping Service start
9 Process Service lifecycle
10 Processing node start
11 Processing node goes down
12 Software system specification changed
1 Introduction

1.1 Purpose
This document is the final report of our study during the graduation period. The main purpose of this study is to validate the following thesis statement:

Thesis statement. The principles and patterns of service oriented architecture contribute to implementing a system which can automatically restore the health of a software system instance after it has become damaged due to processing nodes becoming unavailable.

1.2 Client and organization
The graduation assignment was commissioned by Thales and is performed in an intensive partnership with luminis, under supervision of Ferenc Schopbarteld from Thales and Richard van der Laan from luminis.

luminis is a free-thinking and innovative company which offers a wide range of services in the fields of consulting, coaching, training, application development and software engineering. Richard van der Laan, the project supervisor on behalf of luminis, is part of the Software Development department, one of the six cores of luminis.

Thales operates in several market segments, from marine radar to eTicketing and security. Within Thales, the software section of the Surface Radar / Technical Unit Processing business unit is responsible for the development of software for use in radar and optronic systems for naval and air defense applications.

1.3 Document structure
This document has been structured into four parts which cover different aspects of this study. Each part consists of a number of chapters discussing related matters. Below, a general overview of the document structure is provided, including a brief description of the content of the chapters.

• Part I: Problem Analysis, Assignment & Approach
    – Problem Analysis
      The motivation of the problem is explained, the terminology used in this thesis is defined, the problem is defined and the research questions are introduced.
    – Assignment
      The goal of the project is defined, and the scope of the study is set by defining the solution criteria and the assumptions which have been adopted during the study.

• Part II: Literature Study
    – Distributed Systems
      The main characteristics and common issues of a distributed system are described.
    – Service Oriented Architecture
      The principles and relevant patterns of Service Oriented Architecture are detailed.
    – OSGi
      The OSGi framework is explained and solutions to case-related issues are provided. Furthermore, the additional solutions provided by the R-OSGi framework are detailed.

• Part III: System Analysis, Design & Implementation
    – Solution Approach
      The problem is analysed and a functional solution to the problem is provided.
    – Solution Analysis
      Scenarios for the solution proposed in the previous chapter illustrate the workings of the solution. The scenarios are then analysed using a use-case analysis.
    – System Design
      The design challenges posed by the use-cases are identified. A logical design of cooperating services addresses these challenges.
    – System Implementation
      Describes the mapping of OSGi and R-OSGi to the logical design. Additionally, the implementation of the prototype is explained and motivated.

• Part IV: Conclusion, Recommendations & Retrospective
    – Conclusion
      The main research question is answered, conclusions are drawn regarding the suitability of service oriented architecture for dynamic reconfiguration, and the thesis statement is validated.
    – Recommendations
      Recommendations regarding usage of service oriented architecture and future development are made.
Part I
Problem Analysis & Assignment
2 Problem analysis
This chapter analyses the problem posed by Thales. First, the context of the problem is explained. Next, a case which captures the essence of the problem is analysed and the associated terminology is defined. Finally, based on the case study, the problem is defined along with the main research question.

2.1 Motivation
The Surface Radar department of Thales has developed a generic middleware and service framework named O2. Within the O2 framework, hardware systems and software applications can be modelled using UML diagrams and XML. These models can be read and validated by O2 to generate software components (C or Java) which can run on a multitude of platforms. Besides software components, O2 is also able to generate hardware components (VHDL) for applications demanding high performance.

For processing radar signals, Thales uses distributed systems running their O2 framework. These systems can contain hundreds of hardware boards on which distributed O2 applications can be run. To instantiate the system, a configuration is defined which maps O2 software components to the available hardware boards.

While the system is operational, it is not uncommon for hardware boards to fail. If this happens to a hardware board running a crucial O2 component, the whole system may fail, possibly resulting in significant downtime of mission-critical systems. To repair the system in the current situation, the hardware board needs to be replaced and the mapping has to be adapted to match the new hardware configuration. As this process may take a while, the system may be down for a significant period of time. This is a serious problem in mission-critical situations, so a solution has to be found.

2.2 Case Study: The Thales Radar Chain
Thales has defined a case which captures the essence of the problem as described in the previous section. The case is based on an existing O2-based system in which the output of a physical radar system is processed and transformed into a form which can be displayed on a radar screen. In this section, the case and the terminology used in later chapters are defined.

2.2.1 System context
In a radar chain, a radar system generates data which has to be displayed on a radar screen in a form that is understandable by humans. The complete system can be divided into a number of subsystems. Figure 1 provides a high-level overview of the radar chain, showing the subsystems and the flow of data between them.

Figure 1: A high-level overview of the Thales radar chain.
The first subsystem is the physical radar system itself. The radar system picks up an analog signal of electromagnetic waves and transforms this into a digital signal which can be processed to extract relevant information.

The radar system can generate hundreds of gigabytes of data per second, all of which needs to be processed in real time. At present, this is too much data for software running on general-purpose processors to handle. Thales has therefore chosen to reduce the data stream to a more manageable level before moving to software processing. This first processing step is implemented in hardware, as this allows for much higher processing rates than software implementations.

After this initial processing step, at most a few megabytes of data per second remain to be processed, which allows further processing to be performed in the software domain. Software processing is performed on a distributed system to spread the workload over multiple processing nodes. This part of the system will be referred to as the software processing subsystem.

The final step in the radar chain is to actually do something with the processed data. The data can, for example, be visualized and sent to a screen for an operator to view.

The problem, which is defined in section 2.3, revolves mainly around the software processing subsystem. The details of the other subsystems are outside the scope of this study and are therefore not discussed further. The next section continues by defining the software processing subsystem in more detail.

2.2.2 System components
As noted in the previous section, this case study primarily focuses on the software processing subsystem. This section defines the elements of which this subsystem consists, as well as the relations between these elements. Together, these provide a global overview of the architecture of this subsystem.
Figure 2: A more detailed view of the software processing subsystem.

The software processing subsystem can broadly be divided into two domains: the hardware domain, containing physical hardware, and the software domain, containing the software elements.
Hardware domain

The software processing subsystem’s hardware domain consists of processing nodes which are interconnected by physical connections.

A processing node is a physical hardware board capable of running processes from the software domain. A processing node is considered to be available if it is able to host processes, or is already hosting processes. Otherwise, the processing node is considered to be unavailable. A processing node may become unavailable at any time due to hardware failures or by an administrator turning the system off.

A collection of interconnected processing nodes is called a hardware system. The topology of a hardware system may change during operation of the system when processing nodes become available or unavailable.

For the purpose of this case, it can be assumed that all processing nodes within a hardware system are able to communicate with each other at all times. Processing nodes cannot become isolated by a network failure, and processing nodes are always connected to the network while they are in an available state.

Software domain

The software processing subsystem’s software domain consists of the non-physical elements that make up a system. For the purpose of this case, the software domain consists of processes and links which together form a software system.

• Process
A process is a unit of software which can accept input on an input port and which can produce output on an output port.
For every process in a software system, a process configuration is available. The process configuration contains a name which uniquely identifies the process, and the name of the runtime which implements the functionality of the process. Based on this configuration, a process instance can be instantiated on a processing node. A process is considered to be instantiated if a process instance exists for the process; otherwise the process is considered to be uninstantiated.
For the purpose of this case, it can be assumed that a process has at most one process instance at any time. A process instance can be destroyed to make a process uninstantiated. Also, if a processing node becomes unavailable, the process instances running on it are destroyed and the corresponding processes become uninstantiated.

• Link
A link is a connection between the output port of one process and the input port of another process. A link allows a data stream to be set up between two processes. The process producing the output is defined as the producer of the link, and the process accepting the data is defined as the consumer of the link.
For every link in a software system, a link configuration is available. This link configuration specifies the name of the producer process and the name of the consumer process. If the producer process is instantiated, a link instance can be instantiated by making the producer process send its output to the address of the consumer process’s input port. If a link instance is available for a link, the link is considered to be instantiated; otherwise the link is uninstantiated.
A link can further be considered valid or invalid. If both the producer process and the consumer process of a link are instantiated, the link is valid. If either process of the link is not instantiated, the link is invalid. A link can be instantiated and invalid at the same time. In this case the producer process of the link is still sending its output to the previous location of the consumer process, but the consumer process is no longer instantiated. Furthermore, if an uninstantiated link is valid, it can be instantiated to set up a data stream between the processes.
A link instance is considered to be healthy if the link it represents is valid; otherwise the link instance is considered to be damaged. A healthy link instance becomes damaged if the consumer process becomes uninstantiated.
A link instance can be destroyed to make a link uninstantiated. This is accomplished by making the producer process stop sending its output to the consumer process. Also, a link instance is destroyed if the producer process becomes uninstantiated.

• Software system
A collection of processes and links which make up a processing chain is defined as a software system.
The configuration of a software system is defined by a software system specification consisting of a collection of process configurations and link configurations.
Based on a software system specification, a software system instance can be instantiated by instantiating processes for all process configurations, and instantiating links for all link configurations. If a software system instance exists for a software system, the software system is considered to be instantiated; otherwise it is uninstantiated.
A software system instance is considered to be healthy if all processes and links, as defined in the software system specification, are instantiated, and all link instances are healthy. If this is not the case, the instance is considered to be damaged.
When a software system instance is created, at first all processes and links are uninstantiated. This means that a software system instance always starts in a damaged state. Bringing a software system instance from a damaged state to a healthy state is defined as recovering the software system instance.
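The terminology defined in this section can be summarized in a small data model. The following Python sketch is only an illustration of the definitions above and is not part of the Thales system; all class, field and method names are hypothetical, and a processing node is represented simply by its name.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(frozen=True)
class ProcessConfig:
    name: str     # uniquely identifies the process
    runtime: str  # name of the runtime implementing the process


@dataclass(frozen=True)
class LinkConfig:
    producer: str  # name of the producer process
    consumer: str  # name of the consumer process


@dataclass
class SoftwareSystemInstance:
    processes: List[ProcessConfig]
    links: List[LinkConfig]
    # process name -> name of the processing node hosting its single instance
    process_instances: Dict[str, str] = field(default_factory=dict)
    # links for which a link instance currently exists
    link_instances: Set[LinkConfig] = field(default_factory=set)

    def is_valid(self, link: LinkConfig) -> bool:
        # A link is valid if both its producer and consumer are instantiated.
        return (link.producer in self.process_instances
                and link.consumer in self.process_instances)

    def is_healthy(self) -> bool:
        # Healthy: every process and link is instantiated and every link
        # instance is healthy (i.e. its link is valid).
        processes_ok = all(p.name in self.process_instances for p in self.processes)
        links_ok = all(l in self.link_instances and self.is_valid(l) for l in self.links)
        return processes_ok and links_ok

    def node_unavailable(self, node: str) -> None:
        # Process instances on the lost node are destroyed.
        lost = {name for name, host in self.process_instances.items() if host == node}
        for name in lost:
            del self.process_instances[name]
        # A link instance is destroyed when its producer is lost. When only the
        # consumer is lost, the link instance remains but is damaged.
        self.link_instances = {l for l in self.link_instances if l.producer not in lost}
```

In this sketch a newly created instance has no process or link instances, so `is_healthy()` returns `False`, matching the observation that a software system instance always starts in a damaged state.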
2.3 Problem definition

To instantiate a software system, the processes and links as specified in a software system specification need to be mapped onto available processing nodes. In the current situation, Thales defines this mapping in a static configuration of hardware components and software components. If the configuration of software components or hardware components changes, the mapping has to be adapted manually to match the new configuration.

While a software system instance is operational, the processing nodes it is instantiated on may become unavailable due to hardware failures. This causes any processes running on these processing nodes to become uninstantiated, resulting in a damaged software system instance. To restore the software system instance to a healthy state, its processes and links which have become uninstantiated have to be instantiated again, and link instances which have become damaged have to become healthy again. In the current situation, a failed processing node has to be replaced and configured to perform the tasks of the processing node it is replacing. After the failed processing node has been replaced, the system becomes healthy again.

Thales wants the radar systems to be more reliable in the event of processing nodes becoming unavailable. In the case of the software processing subsystem, this means that downtime of software system instances needs to be minimized when a processing node becomes unavailable. To accomplish this, Thales wants a software system instance to be able to recover automatically when processing nodes become unavailable.
The problem to be solved to get from the current situation to the desired situation can now be defined as follows:

“How can a software system instance be automatically restored to health after it has become damaged due to processing nodes becoming unavailable?”

Thales feels they were not technologically able to handle this problem in the past, so no work has been done yet to solve the problem. They now see an opportunity to tackle the problem by using the principles and patterns of service oriented architectures. A service oriented architecture, or SOA for short, is an architectural style in which related business processes are grouped and packaged as services which can interoperate to coordinate actions. Over the past years, SOA has been widely adopted in the industry, and as a result principles and patterns have started to emerge to solve common design problems. Some of the principles and patterns of SOA might be helpful in solving the problem.

Thales wants to know how a system can be implemented which solves the problem just defined and which incorporates the patterns and principles of SOA. The problem to be solved can therefore be refined as follows:

“How can a system be implemented, based on the principles and patterns of SOA, which can automatically restore the health of a software system instance after it has become damaged due to processing nodes becoming unavailable?”
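To make the desired behaviour concrete, the recovery step described in this section can be sketched as a single function. This is a minimal illustration under stated assumptions and not the solution developed in this thesis: the function name, the data representation (process names, producer and consumer pairs, node names) and the naive round-robin placement are all hypothetical, and the sketch relies on the case assumption that any process can run on any available processing node.

```python
def recover(processes, links, placement, link_instances, available_nodes):
    """Return (new_placement, new_link_instances, rebuilt_links).

    processes:       list of process names from the specification
    links:           list of (producer, consumer) name pairs
    placement:       dict of process name -> node name before the failure
    link_instances:  set of (producer, consumer) pairs that were instantiated
    available_nodes: set of node names that are currently available
    """
    if not available_nodes:
        raise RuntimeError("no available processing node to recover onto")
    nodes = sorted(available_nodes)

    # Process instances on unavailable nodes no longer exist.
    new_placement = {name: node for name, node in placement.items()
                     if node in available_nodes}

    # Re-instantiate every uninstantiated process on an available node.
    # Round-robin is an arbitrary choice made for this sketch only.
    missing = [name for name in processes if name not in new_placement]
    for index, name in enumerate(missing):
        new_placement[name] = nodes[index % len(nodes)]

    # A link instance survives only if neither endpoint had to be moved:
    # a lost producer destroys it, a lost consumer leaves it damaged.
    relocated = set(missing)
    surviving = {link for link in link_instances
                 if link[0] not in relocated and link[1] not in relocated}

    # Instantiate (or re-point) all remaining links. Every link is valid at
    # this point because all processes are instantiated again.
    rebuilt = [link for link in links if link not in surviving]
    return new_placement, surviving | set(rebuilt), rebuilt
```

For example, if the node hosting one process of a three-process chain fails, only that process is placed on another node and only the links touching it are rebuilt; the other process instances stay where they are.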
2.4 Research questions

Now that the problem is defined, additional sub-questions arise. Below is an overview of the sub-questions we’re about to answer in the upcoming chapters, grouped by the part of this document in which they are treated.

Part II
• Which techniques and implementations could contribute to implementing a system to solve the problem?
• What characterizes a distributed system such as the Thales radar chain?
• What is service oriented architecture?
• How can the principles and patterns of service oriented architecture contribute to implementing a system to solve the problem?
• Which existing implementations of service oriented architecture could contribute to implementing a system to solve the problem?

Part III
• How can a dynamic configurable system be realized based on the principles and patterns of SOA?
• What kind of approach could be taken to restore the health of a damaged system?
• Which scenarios can be identified?
• Which use-cases can be identified?
• Which design challenges need to be solved?
• How can the principles and patterns of SOA be applied to solve these design challenges?
• How can a system be designed to implement the solution?
• How can the system design be implemented?

Part IV
• What conclusions can be drawn based on this study?
• In what ways does service oriented architecture contribute to solving the main problem?
• Which recommendations can be made?

2.5 Conclusion

In this chapter, first the context of the problem was defined. Next, the Thales radar chain case was introduced in order to define the problem domain and terminology used in this document. Finally, the problem was defined based on the Thales radar chain case.
3 Assignment

This chapter describes the assignment as given by Thales. First, the goals of this study are defined. Next, the scope of the study is defined by specifying the solution criteria, making assumptions about the problem domain as defined by the Thales radar chain case, and specifying what is outside the scope of this study.

3.1 Goal

The main goal of this study is to determine how a system can be implemented, based on the principles and patterns of SOA, to automatically restore the health of a software system instance after it has become damaged due to processing nodes becoming unavailable.

To reach this goal, the following partial goals have been defined:
1. Determine how a software system instance can be restored to health after it has become damaged due to processing nodes becoming unavailable.
2. Determine how SOA can contribute to implementing a system to automatically restore a software system instance after it has become damaged due to processing nodes becoming unavailable.
3. Design and implement a prototype of a solution based on the principles and patterns of SOA.

3.2 Study scope

3.2.1 Solution criteria

The solution to be found must adhere to the following criteria:
1. The current architecture as described in the case study should be kept intact as much as possible.
2. Every distinct piece of data may be processed only once per process.
3. The design and implementation must be based on the principles and patterns of SOA.

Further, the following assumptions are made:
1. Processes can run on any processing node.
2. If a processing node is available, it is always connected to all other available processing nodes.
3. A processing node which is already running is never connected afterwards to another running processing node.
4. Addresses of processing nodes do not change while the system is operational.
5. All processing nodes have access to all runtimes required by processes.
3.2.2 Outside the scope of this study

This study does not deal with the following aspects:
• Multiple software system specifications.
• Management of software system specifications.
• Connections to the exterior of the software processing subsystem.
• Loss of data which is processed by a software system instance.
• Handling of software failures.
• Optimizing system performance by any means.
• Removing single points of failure from the system.
• Applying the system to or integrating the system with any existing technologies.

3.3 Conclusion

In this chapter the assignment given by Thales was defined. First, the study goals were defined. Next, the study scope was defined by specifying the solution criteria, the assumptions made about the problem, and what is outside the scope of this study.
Part II
Literature Study
4 Distributed Systems

Processing radar signals requires many complicated computations to be performed. To accomplish this, Thales has distributed these computations across hundreds of hardware boards, using a technique called distributed computing. Distributed computing is a form of parallel computing and deals with both hardware and software systems containing more than one processing element, storage element, concurrent process or program. Within distributed computing, a program is divided into parts which can run simultaneously on multiple computers within a network. Such hardware or software systems are called distributed systems.

The subsequent sections provide a more detailed overview of distributed systems, their characteristics and challenges. This chapter thereby attempts to answer the research question:

Research question. What characterizes a distributed system such as the Thales radar chain?

Firstly, the main characteristics regarding distributed computing are detailed. Secondly, (un)handled issues of distributed systems are discussed. These topics are relevant with respect to the Thales radar chain case.

4.1 Characteristics

A distributed system is not just another name for a network of computers. It is an application that executes a collection of protocols to coordinate the actions of multiple processes on a network, such that all components cooperate to perform a single task or a small set of related tasks. Components in networked computers communicate and coordinate their actions only by passing messages. A distributed system is built on top of a network, presenting separated components and multiple computers as if they were a single entity, providing the user, the consumer, with whatever services are required. The main goal of a distributed system is to connect users and resources in a transparent, open (i.e. each subsystem is continually open to interaction with other systems), and scalable way.
Ideally this arrangement is drastically more fault-tolerant and more powerful than many combinations of stand-alone computer systems. To accomplish this goal, a few requirements have to be met:
• The system must be extremely robust. For instance, it’s unacceptable that error messages hold up the entire system until required user input is provided.
• Plug and play capability. Additional hardware or software can be added to the system instantly, without needing to be installed.
• High compatibility. Services and devices can interact with one another without the need for additional configuration.
• Automatic detection of new services or devices (e.g. a camera detects a newly connected printer).
4.2 Objectives

Reliability is an important aspect in distributed computing. Because different subsystems include heterogeneous, overlapping and possibly conflicting information (pluralism), the system has to deal with concurrency and inconsistency. Besides, executed actions or made publications cannot be reverted (monotonicity). To be truly reliable, a distributed system must have certain characteristics, which are summarized in the listing below. [3, 4, 5]

A distributed system needs to be:

Fault-tolerant: It can recover from component failures without performing incorrect actions.
Highly available: It can restore operations, permitting it to resume providing services even when some components have failed.
Recoverable: Failed components can restart themselves and rejoin the system, after the cause of failure has been repaired.
Consistent: The system can coordinate actions by multiple components, often in the presence of concurrency and failure. This underlies the ability of a distributed system to act like a non-distributed system.
Scalable: It can operate correctly even as some aspect of the system is scaled to a larger size (e.g. increasing the size of the network, or the number of users).
Predictable performance: The ability to provide desired responsiveness in a timely manner.
Secure: The system authenticates access to data and services.
Extensible: Interfaces should be cleanly separated and publicly available to enable easy extensions to existing components and the addition of new components.
Interoperable despite heterogeneity: Various entities in the system must be able to interoperate with one another, despite differences in hardware architectures, operating systems, communication protocols, programming languages, software interfaces, security models, and data formats.

4.3 Challenges and Issues

Distributed systems cause problems more frequently than fully local systems.
Moreover, some problem categories aren’t even relevant in local systems, for example (potential) networking problems. In the first place, because processes and their required resources are distributed across the network, the code or the data used by a process needs to be moved over and over again. This requires compilation and installation of the code, or uniformity in data formats, respectively. Secondly, it can take a lot longer to access remote data, due to latency. Therefore the time that it will take to complete an operation cannot be bounded in advance (unbounded determinism). Thirdly, partial failures of the network can be a huge problem if the unavailability of a node can cause disruption of the other nodes.

The characteristics listed in the previous section are high standards, which are challenging to achieve. Probably the most difficult challenge is that a distributed system must be able to continue operating correctly even when components fail. Services have to be highly available and fault-tolerant. A highly available service is one that continues to provide a possibly degraded service despite a certain number and type of process failures and despite disconnected operations. A fault-tolerant service is one that always behaves correctly despite up to a given number and type of failures.

To design a distributed system with the characteristics listed in the previous section, one must design for failure. This implies not making any assumptions about the reliability of the components of a system. Below is a listing of the eight most commonly (yet prematurely) made assumptions, better known as the eight fallacies of distributed computing [1, 2].

Eight Fallacies
1. The network is reliable
2. Latency is zero
3. Bandwidth is infinite
4. The network is secure
5. Topology doesn’t change
6. There’s one administrator
7. Transport costs are zero
8. The network is homogeneous ¹

¹ This fallacy was added six years later by James Gosling (inventor of Java).

4.4 Conclusion

This chapter focused on several aspects of distributed systems which are relevant with regard to the Thales radar chain case. We’ve reviewed some important requirements and objectives for designing a distributed system. For the Thales radar chain case, robustness, fault-tolerance and high availability are the most important among these requirements and objectives. The last part of this chapter focused on important challenges and issues which should be addressed in our system’s design. Especially the eight fallacies of distributed computing should be taken into account.
5 Service Oriented Architectures

The previous chapter focused on common issues and challenges regarding distributed systems which should be taken into account when designing a dynamic reconfigurable system. This chapter details architectural patterns that could be of use while designing such a system. This chapter thereby attempts to answer the research questions:

Research question. What is service oriented architecture?

Research question. How can the principles and patterns of service oriented architecture contribute to implementing a system to solve the problem?

The principles and patterns of service oriented architecture treated in this chapter provide the first step to a logical design.

5.1 Overview

Service oriented architecture, or SOA for short, can essentially be defined as an architectural paradigm in software design which is based on services which interoperate to perform a certain task. There is no official definition of SOA, but a more elaborate one is stated by OASIS (Organization for the Advancement of Structured Information Standards):

“A paradigm for organizing and utilizing distributed capabilities that may be under the control of different ownership domains. It provides a uniform means to offer, discover, interact with and use capabilities to produce desired effects consistent with measurable preconditions and expectations.”

This definition still leaves a lot of gaps to be filled if one wants to implement SOA. As can be seen by studying existing SOA implementations, the vision put forth in these implementations can vary greatly for most aspects of this definition.

This chapter will further explore the field of SOA by first looking at the characteristics which define a SOA. The principles for a good SOA design are explained, followed by patterns for solving common design problems in SOA. Finally, the key elements of some SOA implementations are discussed.
5.2 Principles

Services are the building blocks of SOA applications. They are an embodiment of the separation of concerns theory, which is based on the notion that large problems become easier to handle as they are broken down into smaller problems. In a way, services in SOA are similar to classes in object oriented programming. Using classes to break down a problem into separate concerns works well on small levels, but as a system gets bigger the sheer number of classes can introduce a lot of complexity. Services, however, break problems down on a much more granular level to solve these complexity issues. They provide a collection of related capabilities to service consumers and are as such called service providers. The definition of a service does not place any limits on what kind of capabilities a service can provide. A service could for example provide functionality for user authentication, but it could as well provide access to hardware systems to allow service consumers to perform computations on those systems.

Over the past years, the industry defined a common set of design principles for SOA which should make implementing a SOA more successful. Interoperability is fundamental to every one of these principles and therefore an expected service design characteristic. Moreover, stating that services should exist implies stating that services should be interoperable. Each of the eight common principles supports or contributes to interoperability in some way. These principles and their relation to the overarching principle of interoperability will be discussed in the subsequent sections.

Loose Coupling

Coupling implies some kind of connection or relationship between entities, and thus a level of dependency. There are numerous types of coupling involved in the design of a service within the context of SOA, regarding service contracts, their implementation and service consumers. The principle of loose coupling addresses the reduction (’loosening’) of these dependencies, by the creation of a specific type of relationship within and outside of service boundaries. By making the individual services less dependent on others, they are more accessible for different consumers and interoperability is increased.

Loose coupling could obviously be achieved by detaching the service interface from its underlying implementation, but the appropriate level of coupling requires that practical considerations be balanced against various service design preferences. This includes the independent design and evolution of a service’s logic and implementation while still guaranteeing baseline interoperability with dependent consumers. [19]

Having loosely coupled components also means having a more fault-tolerant system, because dependencies between components are minimized. When a single component fails, the other components could still be in an operational state.
This is very important in the previously described Thales radar chain case.

Service Contracts

A service contract communicates the purpose and capabilities of a service and describes how the service interacts with its consumers. It could be viewed as a composition of functional metadata and a set of policies, such as security constraints, transport and service level agreements. For instance, security requirements may differ when the service is consumed outside a trusted network. Information about services is limited to what is published in service contracts. A service contract consists of the following components:

Header section: includes the name, version, owner and type (e.g. process, data, etc.) of the service. The name should indicate the functionality of the service in general terms. The type helps to distinguish the layer in which the service resides.
Functional section: contains the functional requirements, invocation means (e.g. SOAP, REST, Event Trigger, etc.) including the URL and interface, supported operations, methods and actions of the service. The description should be very accurate.
Non-Functional section: contains security constraints and roles, the service level agreement which determines the amount of latency allowed, and the quality of service which determines the allowable failure rate. Additionally, in case the service is part of a larger transaction, the means to control this should be indicated.

All services within the same repository should use a standardized format for describing a service contract to maximize interoperability. Service contracts enable loose coupling by hiding service-internal details from the outside world behind a facade.

Abstraction

A service should never expose how it goes about its business to meet the requirements of the contract. For example, it doesn’t matter which programming language or platform was used to implement the service, as long as the service sticks to its side of the contract. Abstracting service details limits all interoperation to the service contract.

By obeying these guidelines, the underlying service logic can be exchanged or evolved independently of the components which rely on the service. This increases the long-term consistency of interoperability.

Reusability

Reusability forms the base of key service models. The official definition for this principle states: “Services contain and express agnostic logic and can be positioned as reusable enterprise resources.” Individual service capabilities should be appropriately defined in relation to an agnostic (i.e. asserting the uncertainty of all claims to knowledge) service context. Reusability further requires a high level of interoperability between the service and several potential service consumers.

Autonomy

The underlying service logic requires a certain autonomy with regard to its execution environment and resources to provide its capabilities in a consistent and reliable way.
Increasing this degree of control to a significant level leads to minimization, or at least reduction, of dependencies on shared resources. Moreover, it contributes to making the behaviour of the service more consistently predictable, while simultaneously increasing its reuse potential and thereby its attainable level of interoperability.

Autonomy on a service level distinguishes service boundaries from one another, although the service might still share several underlying resources. This can be illustrated, for instance, by a wrapper service that encapsulates a legacy system which is also utilized independently of the service and still shares resources with other legacy based clients.

Autonomy could also be taken one step further, such that the underlying logic is completely owned by the service. This generally is the case when the supportive service logic has been built from the ground up. On the one hand, this obviously is advantageous with regard to scalability. Besides, it provides a more reliable solution to countering the single point of failure (i.e. a part of a system which, if it fails, will stop the entire system from working) risk. This is particularly relevant in the previously explored Thales radar chain case, which currently contains such a single point of failure due to its static configuration. Increasing service autonomy could decrease the mutual dependency of components of the radar chain and thereby increase the system’s fault-tolerance. On the other hand, this implies the need for rendering and deployment of new service logic, which could increase expenses and efforts.

Statelessness

Services, ideally, are designed to contain state information only when this is explicitly required, because management of this information could compromise their availability and undermine their scalability potential. A stateless design therefore allows services to interoperate more frequently and reliably. In such a design, the adequacy of the surrounding technology architecture to provide state management delegation and deferral options should be taken into account.

Discoverability

The discoverability characteristic of a SOA is meant to help avoid the accidental creation of services that are either redundant or implement redundant logic. Owing to the fact that each particular service operation is meant to provide a potentially reusable piece of automation logic, the metadata that comes attached to a service must sufficiently describe the functionality offered by its individual operations in addition to its overall purpose.

This particular characteristic is distinct from, although largely consistent with, discoverability on an architectural level, in which case the term service discoverability refers to the technology architecture’s ability to provide a mechanism of discovery (e.g. a service directory or registry). Such a mechanism becomes part of the overall infrastructure that is meant to support the implementation of a SOA.

On a service level, the term discoverability refers to the design of an individual service such that discoverability is maximized, regardless of the need for it in its surrounding implementation environment.
Even if there’s no need for a service registry, services should be designed as highly discoverable resources by equipping them with sufficient metadata to properly communicate its purpose and capabilities. This simply allows services to be more easily located by potential consumers. Be- sides, the evolutionary governance can be better managed when the service portfolio increases in size. [10, 11] When looking at the previously posed Thales radar chain case, service discovery could solve the problem of checking whether all required services are still up and running. This is of particularly importance, because all services are essential and if they appear to be down, they should be restarted in some ways. Discovery mechanism To allow service consumers to access their requested services, it’s required for them to know how to find and access the service. To accomplish this, the so-called Offer-Discover-Interact model can be used. This model consists of the following steps: Offer When a service becomes available, it publishes its services by registering it’s interface, so other entities can make use of them. Discover Service consumers can find published services by using a discovery mechanism. Usu- ally, a consumer sends a lookup request to a service registry, which contains all available 27
services and provides a service interface for the consumer to ’communicate’ with.
Interact The service consumer can now use the published services to accomplish its tasks through the service interface. The consumer thereby monitors the progress of the service.

These steps can be accomplished by using a service discovery protocol such as the Service Location Protocol (SLP) or the one provided by the Jini framework. SLP provides a framework that allows discovering the existence, location and configuration of networked services. Jini is an open software architecture that enables developers to create services that are adaptable to changes in the network. Its specification offers a standard lookup service, which can be discovered with a simple API call once running. [15]

The following steps summarize the procedure for using the SLP or Jini lookup service:
1. The address (SLP) or connector stub (Jini) is registered with the lookup service, possibly with additional attributes that qualify the connector and can be used as filters.
2. The client queries the lookup service and retrieves one or more addresses or connector stubs that match the query.
3. Finally, the client either obtains a connector that is connected to the server identified by a retrieved address, or connects directly to the server using the provided connector stub.

Composability
Composability of a service addresses its requirement to be capable of participating as an effective composition member, regardless of whether there is a direct need to be enlisted in a composition. Again, interoperability is an important precondition. In addition, success in meeting the composability requirements often depends on the extent to which services are standardized and data exchange between them is optimized.
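To make the register, query and connect steps of the discovery mechanism described earlier in this section more concrete, the following is a minimal in-memory sketch in plain Java. It is not SLP or Jini code: the names LookupRegistry, register and lookup are hypothetical, a Supplier stands in for a connector stub, and a real lookup service would run as a separate networked process.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical in-memory stand-in for a lookup service (not the SLP or Jini API).
final class LookupRegistry {
    // One registration: a connector stub plus the attributes that qualify it.
    private static final class Entry {
        final Supplier<String> stub;
        final Map<String, String> attributes;

        Entry(Supplier<String> stub, Map<String, String> attributes) {
            this.stub = stub;
            this.attributes = new HashMap<>(attributes);
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    // Step 1: register a stub together with its qualifying attributes.
    void register(Supplier<String> stub, Map<String, String> attributes) {
        entries.add(new Entry(stub, attributes));
    }

    // Step 2: return every stub whose attributes contain all filter pairs.
    List<Supplier<String>> lookup(Map<String, String> filter) {
        List<Supplier<String>> matches = new ArrayList<>();
        for (Entry e : entries) {
            if (e.attributes.entrySet().containsAll(filter.entrySet())) {
                matches.add(e.stub);
            }
        }
        return matches;
    }
}

final class LookupDemo {
    public static void main(String[] args) {
        LookupRegistry registry = new LookupRegistry();
        registry.register(() -> "tracking data", Map.of("type", "tracker", "node", "A"));
        registry.register(() -> "plot data", Map.of("type", "plotter", "node", "B"));

        // Step 3: interact with the service through the stub the query returned.
        for (Supplier<String> stub : registry.lookup(Map.of("type", "tracker"))) {
            System.out.println(stub.get());
        }
    }
}
```

The attribute map plays the role of the filter attributes from step 1; matching on a subset of attributes keeps the consumer independent of where the provider runs.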
5.3 Patterns
As more and more software systems are developed, similar solutions are used to solve recurring problems, which causes patterns to emerge. Just like in object-oriented design, in SOA the same architectural problems arise over and over again. In this section we’ll take a look at a couple of SOA patterns that are relevant to the problems we need to solve.

Lookup
The lookup pattern provides a way of finding and accessing resources, regardless of whether they are local or distributed. [12] A resource could in principle be anything, for instance a piece of data. In the current context, a service is regarded as a resource.

Problem A fundamental problem of resource acquisition is finding the resource concerned (if available) in the first place. Resources could be managed (i.e. added and removed) by resource providers. Such a resource provider could, for example, frequently send broadcast messages offering available resources, so interested consumers become aware of their existence.
Conversely, consumers could send broadcast messages requesting required resources. The consumer could then choose the offered resources it needs from all replying resource providers. Both approaches, however, could hurt efficiency, since lots of messages are sent across the network (in case of a distributed system). An efficient and inexpensive solution requires [12]:

Availability A resource consumer must be aware of available resources in its environment.
Bootstrapping A resource consumer should be able to obtain an initial reference to a resource provider that offers the resource.
Location independence Resource consumers and providers should be able to acquire and provide a resource, respectively, regardless of whether they know each other’s locations.
Simplicity Resource consumers and providers shouldn’t be burdened when finding or publishing resources.

Solution The lookup pattern addresses this problem by using a so-called lookup service as a mediating instance. Via this lookup service, the resource provider publishes resources along with describing properties. In the same way, resource providers also register references to themselves, so consumers can retrieve these, search for required resources using the properties, and finally retrieve and use these resources. [12]

A Jini lookup service contains the service type, IDs and specific attributes of registered services. Consumers search the lookup service for their desired service, based on type, service ID (if they happen to know it) or specific attributes. [16]

Leasing
Leasing solves a lot of the problems inherent in distributed computing. Self-healing addresses one of the primary concerns: distributed systems should function for a long time without needing humans to make repairs or reconfigurations. A second concern is evolvability (e.g. upgrading the system). It is out of the question to take the system down for maintenance.
Moreover, it isn’t guaranteed that every machine is reachable and can be upgraded smoothly without failures. One must be able to evolve the system incrementally.

Problem At a certain point, a resource user may lose interest in using a resource. The resource is then needlessly held, unless the user releases it by explicitly terminating its relationship with the provider. This not only negatively affects the performance of resource user and provider, but may also degrade resource availability for other users. A second problem can occur when dealing with distributed resource users and providers. When the machine of the latter crashes, the resource user, being uninformed about resources becoming unavailable, may continue to reference resources which are no longer available. [12]

Solution The primary idea behind leasing is that a lease holder must continually demonstrate its interest in using some resource, which can be essentially anything, assuming it has been granted access to it in the first place. So, for every resource used by some resource user a lease is introduced. This lease is granted by a grantor and obtained by a holder, typically the resource provider
and the resource user, respectively. Additionally, the lease specifies a time duration for usage of the ’reserved’ resource. [12] If the lease holder fails to demonstrate interest, the lease expires and the resource is released.

By granting a lease, the system guarantees that failures will be detected without requiring any separate component other than the lease grantor. Leasing also guarantees that irrelevant data will simply be forgotten when leases expire; it automatically cleans up after failed components, and the service concerned will be forgotten. This also provides a way to evolve parts of the system in isolation: one is free to run a different version of a ’forgotten’ service and plug it in.

In a Jini system, for instance, the lookup service uses time-based resource reservation, called a lease, for storing service items. The grantor of the lease, the lookup service, decides whether to accept or deny the lease. While a lease is active, the lease holder can cancel it, in which case the corresponding resource is also freed. The holder, the service, can renew the lease. If the lease isn’t renewed within a certain amount of time, the service is presumed to be unavailable and will be ’forgotten’ (i.e. the service item will be cleaned up). [16]

Proxy
The proxy pattern lets resource consumers communicate with a representative, rather than with the resource itself. This straightforward principle serves many purposes, such as providing easier access and protection against unauthorized access. [14]

Problem In many cases it is considered inappropriate to access a component or resource directly. It is undesirable to configure its physical location in a static way, and unrestricted access to it may be inefficient or even insecure. Additional control mechanisms are needed to ensure that access to entities takes place in an efficient, safe and transparent (footnote 2) way.
In addition, a consumer should be able to access any component or resource using the same calling behaviour and syntax.

Solution The solution to the problems stated above is to let a representative, a so-called proxy, offer the interface of the entity concerned. This representative performs additional pre- and postprocessing (e.g. access control, checking or making read-only copies of the original).

In a Jini system, each application uses services through so-called proxies. A proxy allows the program to communicate with the service, but shields its details. Proxies are dynamically downloaded by the consumers of the service. This way, functionality can be extended on the fly. Proxies use the same protocol as the backend portion of the service. Consumers are shielded from this information; all they care about is the functionality the service provides.

One special service, the Lookup Service, keeps track of all the available services and provides access to them. Services publish themselves by storing their specific proxy in the Lookup Service. This publishing process is called join. [16] The Lookup Service then contains a so-called service entry, which consists of a unique service ID, a proxy and a number of attributes which describe the functionality of the service. Consumers query the Lookup Service for available services, and the Lookup Service provides the proxy of the requested service (type).

Footnote 2: Full transparency can obscure cost differences between services.
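The proxy idea described in this section can be illustrated with Java’s standard dynamic proxies (java.lang.reflect.Proxy). This is a sketch only and not Jini code: the RadarService interface, its two methods and the read-only rule are hypothetical, chosen to show a representative that offers the same interface as the real service while adding an access-control step before delegating.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

// Hypothetical service interface, used only for illustration.
interface RadarService {
    String read();
    void configure(String setting);
}

// The real resource that consumers should not reach directly.
final class RealRadarService implements RadarService {
    private String setting = "default";

    public String read() { return "setting=" + setting; }

    public void configure(String setting) { this.setting = setting; }
}

final class ReadOnlyProxy {
    // Returns a representative with the same interface that rejects configure().
    static RadarService wrap(RadarService target) {
        InvocationHandler handler = (proxy, method, args) -> {
            // Pre-processing: access control before delegating to the target.
            if (method.getName().equals("configure")) {
                throw new UnsupportedOperationException("read-only access");
            }
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the target's own exception
            }
        };
        return (RadarService) Proxy.newProxyInstance(
                RadarService.class.getClassLoader(),
                new Class<?>[] {RadarService.class},
                handler);
    }
}

final class ProxyDemo {
    public static void main(String[] args) {
        RadarService service = ReadOnlyProxy.wrap(new RealRadarService());
        System.out.println(service.read());
        try {
            service.configure("long-range");
        } catch (UnsupportedOperationException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The consumer only sees the RadarService interface, so the same calling syntax works whether it holds the real object or the representative.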
Publish-subscribe
Publish-subscribe is an asynchronous messaging paradigm in which senders (publishers) of messages are programmed to characterize the messages into classes before posting them, regardless of which receivers (subscribers) might or might not read them. Subscribers express interest in one or more classes and only receive messages that are of interest, without knowing which publishers posted these messages.

Problem In the traditional tightly coupled client-server paradigm, the client cannot post messages to the server while the server process is not running, nor can the server receive messages unless the client is running. This means that system components need to check whether a specific service is up and running each time they want to send a message to it. This unnecessarily burdens the system.

Solution The solution to the problem stated above is the decoupling of publishers and subscribers, which allows for greater scalability and a more dynamic network topology.

Distributed event-based systems use the publish-subscribe paradigm, in which an event-generating object publishes the types of events that will be available to other objects. These systems are useful for communication between heterogeneous components, and their asynchronous nature allows publishers and subscribers to be decoupled.

Whiteboard pattern
The whiteboard pattern defines a central application manager to handle dependencies between event sources and event listeners. This straightforward principle is of great importance when dealing with dynamic behaviour of system components.

Problem The most relevant but not so obvious issue with the traditional listener pattern is the dependency that is created between the event source and the listener. This is called the life cycle issue: if the event source goes away, the listener must clean up any references it holds, and vice versa. This removal phase is hard to verify.
It is often not handled at all in workstation environments, where an application is started by the user, because management of listeners is a non-issue there: everything is cleaned up when the application exits. However, when dealing with continuously running applications in a dynamic environment, as in the Thales radar chain case, consistent life cycle management is extremely important.

Solution Applying the whiteboard pattern solves the problem stated above. Unlike the listener pattern, the whiteboard pattern leverages a central application manager for handling life cycle management. Instead of having event listeners track event sources and then register themselves with the event source, the whiteboard pattern has event listeners register themselves at a central application manager. When the event source has an event object to deliver, it calls all event listeners registered with this application manager. As a result, both the server (event source) and the application (listener) become simpler, because they reuse the central application manager and can delegate to it the responsibility for managing the details of dependencies between source and listeners.
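A minimal sketch of the whiteboard idea in plain Java follows. The class names Whiteboard and EventSource are hypothetical; in OSGi the role of the central application manager is played by the framework’s service registry, which this sketch only imitates with an in-memory list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical central application manager (the "whiteboard").
final class Whiteboard<E> {
    private final List<Consumer<E>> listeners = new CopyOnWriteArrayList<>();

    // Listeners register here instead of at the event source.
    void register(Consumer<E> listener) { listeners.add(listener); }

    void unregister(Consumer<E> listener) { listeners.remove(listener); }

    // Snapshot of the listeners that are registered right now.
    List<Consumer<E>> currentListeners() { return new ArrayList<>(listeners); }
}

// The event source keeps no listener list and no references to clean up.
final class EventSource {
    private final Whiteboard<String> whiteboard;

    EventSource(Whiteboard<String> whiteboard) { this.whiteboard = whiteboard; }

    // Looks up the listeners at delivery time and calls each of them.
    void fire(String event) {
        for (Consumer<String> listener : whiteboard.currentListeners()) {
            listener.accept(event);
        }
    }
}

final class WhiteboardDemo {
    public static void main(String[] args) {
        Whiteboard<String> whiteboard = new Whiteboard<>();
        EventSource source = new EventSource(whiteboard);

        Consumer<String> printer = event -> System.out.println("received " + event);
        whiteboard.register(printer);
        source.fire("node-started");

        // After unregistering, the source needs no separate clean-up step.
        whiteboard.unregister(printer);
        source.fire("node-stopped");
    }
}
```

Because the source asks the whiteboard for listeners on every delivery, a listener that disappears is simply no longer called, which is the life cycle benefit the pattern aims for.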
5.4 Conclusion
This chapter explained what service oriented architecture (SOA) means. It focused on several relevant SOA principles and patterns which could be applied when designing our system. In particular, the lookup pattern (locating available resources), the leasing pattern (detecting services going down) and the whiteboard pattern (consistent life cycle management) have proven to be very useful. By using these patterns and keeping the SOA principles in mind, a set of interoperable services could be defined for dynamic life cycle management of system components. This way, a logical design is defined for a dynamically reconfigurable system.

The next chapter describes an existing SOA implementation which could provide many of the required facilities.
6 OSGi
This chapter describes an existing SOA implementation named OSGi. Only the aspects of the OSGi framework that relate to the Thales radar chain case are described. For each aspect, we define which problem posed by the Thales radar chain case it solves. This chapter thereby attempts to answer the research question:

Research question. Which existing implementations of service oriented architecture could contribute to implementing a system to solve the problem?

6.1 Overview
OSGi provides a service-oriented, component-based environment for developers and offers standardized ways to manage the software lifecycle. Technically, OSGi is a specification for a service platform framework and service bundles. An OSGi implementation has to implement the framework and can optionally provide service bundles which support basic functionality such as logging.

The OSGi Framework implements a complete and dynamic component model, which doesn’t exist in standalone Java/VM environments. It is a service framework in which services, packaged into software components called bundles, can be installed, updated and removed without restarting the framework. Although it is intended for relatively small embedded devices, it is widely applicable.

The OSGi framework consists of three layers (see figure 3), namely the module layer, the lifecycle layer and the service layer. Each of these layers contributes to solving subproblems posed in the previous chapter.

Figure 3: OSGi Framework layering
6.2 Module Layer and Fault-Tolerance
Because we’re dealing with a distributed system in our case, loose coupling between system components is very important. The system must be fault-tolerant in such a way that when a processing node fails, the rest of the system remains in an operational state. To accomplish this, modularization is an important issue.

The modularization concept in the OSGi Framework is supported by the module-based class loading policy defined by the module layer. Usually, Java applications have a flat class loader architecture. OSGi bundles add a modularization layer to Java which allows modules to declare shared and private class space and controls linking between modules.

A bundle is the central unit of OSGi. It’s a JAR file which contains resources such as Java code or native libraries. Bundles are encapsulated and separated from each other by a name space concept. OSGi applications can consist of several bundles, each loaded by (at least) one private class loader of its own.

Bundles could be used by other applications running on the same platform, but unless Package-Exports are defined, bundle code is private. Package-Imports and Package-Exports (the Import-Package and Export-Package manifest headers) define dependencies between bundle code and are stored as additional entries in the JAR manifest file. Exported packages are public and can be used for resolving imports of other bundles that declare a package import. These bundles resolve the import by consulting the package database and creating a delegation from the importing class loader to the exporting class loader. This allows dynamic runtime linking of bundle code.

When, for instance, a service must import every class within the framework, the import dependencies on packages cannot be determined at compile time. In these cases the DynamicImport mechanism (the DynamicImport-Package header) could be used by defining a wildcard asterisk (*) in the bundle manifest.
This indicates that additional packages might be required.

6.3 Lifecycle Layer and Dynamic Life Cycle Management
The second important issue is dynamic life cycle management. In our case, this means that processing nodes running container services, as defined in the logical design in the previous chapter, can be added to or removed from the system on the fly. The rest of the system should remain in an operational state.

The lifecycle layer introduces this kind of dynamics, which is normally not part of an application. It deploys applications or components as OSGi bundles which can be managed at runtime. Bundles can be remotely installed, started, stopped, updated and uninstalled without the need to reboot the system. They rely on the module layer for class loading but add an API to manage the modules at run time. Each container from our logical design is implemented as an OSGi bundle and can be inserted into the system on the fly.

Bundles have their own Activator class which implements the start and stop methods of the BundleActivator interface. These methods are invoked when a bundle is started or stopped, respectively. A so-called BundleContext object, which is passed to these methods, supports usage of the OSGi framework. In general, bundles hold a public static reference to the BundleContext
object after receiving it in the start method. This allows other classes to interact with the framework.

Bundles are installed by creating at least one new class loader. Uninstallation is achieved by disposing of the private class loaders. Implicitly, all the bundle code is then removed from the system without affecting other bundles. Private code parts of active bundles can be updated at runtime. Exported code can only be updated when the PackageAdmin service forces the framework to reload.

6.4 Service Layer and Service Discovery
In our design we’ve defined several services, each with different responsibilities. Services need to be available to other services or components throughout the system, so some kind of lookup service, as defined in the previous chapters, is required.

OSGi provides these mechanisms in the service layer. Each bundle may provide multiple services by registering service objects using the BundleContext. A service is a Java object which can be used by other bundles. This way, interaction between bundles is decoupled. The service layer maintains a service registry with all provided services, together with an optional set of service attributes that can be passed to the framework during registration. The service registry makes it possible for bundles to detect newly added or removed services.

Bundles can retrieve a service by requesting a service reference for the name of an interface, not knowing whether a service that implements that interface actually exists on the service platform. Service requests can also contain LDAP string filters. These filters are matched against the service attributes (footnote 3) of a candidate.

ServiceFactories are special kinds of service provider classes. For every bundle that requests a service, a new instance of the service object is created. However, the framework caches instances per bundle, so a given bundle may get the same instance on repeated requests.
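The per-bundle caching behaviour of service factories described above can be imitated in a few lines of plain Java. This is a sketch under stated assumptions and not the OSGi ServiceFactory API: the class PerBundleCache and its methods are hypothetical, and bundles are identified by a simple name rather than a Bundle object.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical stand-in for per-bundle caching of factory-created services.
final class PerBundleCache<S> {
    private final Function<String, S> factory;
    private final Map<String, S> instances = new ConcurrentHashMap<>();

    PerBundleCache(Function<String, S> factory) {
        this.factory = factory;
    }

    // The first request from a bundle creates an instance; later requests reuse it.
    S getService(String bundleName) {
        return instances.computeIfAbsent(bundleName, factory);
    }

    // Drops the cached instance, e.g. when the bundle releases the service.
    void ungetService(String bundleName) {
        instances.remove(bundleName);
    }
}

final class PerBundleCacheDemo {
    public static void main(String[] args) {
        PerBundleCache<StringBuilder> cache =
                new PerBundleCache<>(name -> new StringBuilder("service for " + name));

        StringBuilder first = cache.getService("bundle-a");
        StringBuilder again = cache.getService("bundle-a");
        StringBuilder other = cache.getService("bundle-b");

        System.out.println(first == again); // same bundle, same instance
        System.out.println(first == other); // different bundle, different instance
    }
}
```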
To track the lifecycle of bundles that provide services, the ServiceTracker can be used. The ServiceTracker is a utility class (rather than a service itself) that tracks all services matching certain criteria. The ServiceTracker could contribute to solving our problem of detecting services becoming unavailable. When, for instance, a container service becomes unavailable, the ServiceTracker detects this. This way, we could find out that a processing node and all its process runtimes have become unavailable.

Footnote 3: Although every Comparable object can be used as an attribute, only boxed types of the eight basic types, and Vectors and arrays containing them, can be matched safely and unambiguously.

6.5 Event Handling
As stated before, a ServiceTracker could be used to keep track of the status of a certain service. Additionally, a mechanism is required to notify the rest of the system in case of state changes of services or components.

OSGi signals state changes in the framework through events. Bundles can subscribe to certain event types by implementing the corresponding listeners. Events related to lifecycle management
are grouped into FrameworkEvents and BundleEvents. State changes related to services fire ServiceEvents, which are detected by the previously described ServiceTracker. Bundles can also generate their own events.

FrameworkEvents are fed into a so-called EventAdmin service. This service provides a generic framework for interservice communication. Because services dynamically appear and disappear, the EventAdmin uses the publish-subscribe pattern, which reflects loose coupling. It could be seen as a channel between sending and receiving services.

Events are published under a certain topic based on hierarchical name spaces. This topic is stored in the property field EVENT_TOPIC. OSGi services generally use the form fully/qualified/package/ClassName/ACTION. For instance, framework events have the topic org/osgi/framework/FrameworkEvent/STARTED. Similar to service attributes, events can have EventProperties that provide additional information about the event.

Bundles can subscribe to events by registering an EventHandler instance as a service. The EVENT_TOPIC property is set to an array of relevant topics. To solve the problem of detecting services becoming unavailable, a bundle which depends on a service could register for service-down events regarding that service. Again, the asterisk (*) is the wildcard character, which indicates that all events with a matching prefix in the topic name will be handled. As with services, an additional LDAP-style filter string can be assigned to the EVENT_FILTER property to narrow the scope.

To publish events, a bundle has to retrieve the EventAdmin service and invoke sendEvent() for synchronous or postEvent() for asynchronous delivery of events. The former should be used with care due to the risk of deadlocks.
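The topic-based subscription described in this section can be sketched without the OSGi libraries. The class TopicBus below is hypothetical and is not the EventAdmin API; it assumes that the wildcard may only appear as the last segment of a subscription (for example org/osgi/framework/*) and it only models synchronous delivery.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical publish-subscribe channel with hierarchical topics.
final class TopicBus {
    private final Map<String, List<Consumer<String>>> handlers = new LinkedHashMap<>();

    // Assumption: "*" matches everything, "a/b/*" matches any topic below "a/b/".
    static boolean matches(String pattern, String topic) {
        if (pattern.equals("*")) {
            return true;
        }
        if (pattern.endsWith("/*")) {
            String prefix = pattern.substring(0, pattern.length() - 1); // keeps the '/'
            return topic.startsWith(prefix);
        }
        return pattern.equals(topic);
    }

    // Subscribers state the topics they care about, not the publishers.
    void subscribe(String pattern, Consumer<String> handler) {
        handlers.computeIfAbsent(pattern, p -> new ArrayList<>()).add(handler);
    }

    // Synchronous delivery; returns how many handlers received the topic.
    int send(String topic) {
        int delivered = 0;
        for (Map.Entry<String, List<Consumer<String>>> entry : handlers.entrySet()) {
            if (matches(entry.getKey(), topic)) {
                for (Consumer<String> handler : entry.getValue()) {
                    handler.accept(topic);
                    delivered++;
                }
            }
        }
        return delivered;
    }
}

final class TopicBusDemo {
    public static void main(String[] args) {
        TopicBus bus = new TopicBus();
        bus.subscribe("org/osgi/framework/*", t -> System.out.println("framework event: " + t));
        bus.send("org/osgi/framework/FrameworkEvent/STARTED");
    }
}
```

A publisher calling send() never learns who the subscribers are, which is the decoupling that makes the pattern suitable for services that appear and disappear at runtime.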
6.6 Wiring of Processes
In addition to (remotely) starting and stopping container services and their corresponding process runtimes, we need a mechanism for linking the output of a process to the input of another process. In OSGi, this can be accomplished with the so-called WireAdmin Service.

The goal of the OSGi WireAdmin Service is to enable services that generate some sort of data to send it to the services interested in that data. The data can be updated dynamically so that the interested services receive the new values regularly. The WireAdmin Service provides configuration data (in the OSGi ConfigurationAdmin Service) through which new virtual connections (known as wires) can be established when a new service needs to receive the data output. Wires that are no longer needed can easily be removed.

The main advantage of using the WireAdmin Service is that it decreases the need for wired bundles to have context-specific knowledge about the opposite party. They never need to communicate with each other directly, only through the WireAdmin Service.

6.7 R-OSGi
OSGi solves a couple of problems related to detection and tracking of services and other system components on a local machine. Now we need to go one step further, because we’re dealing with a distributed system. This poses a whole range of new problems; for instance, some services
