Chapter Template for Service-Oriented Computing Series
1. Overlord: SOA Governance from JBoss
This document discusses what Overlord will provide within the context of
SOA Governance. The first section will discuss only core requirements and
components. The second section will discuss the roadmap and where
partners may be able to play a role.
Any computer system, whether it is centralized or distributed, needs some
form of governance, i.e., the act of monitoring and managing the system.
Such governance may be as simple as ensuring only authorized users have
access to services, or as complex as guaranteeing the system and its
components maintain a level of availability or reliability in the presence of
failures or increased system load. Managing distributed systems has always
been a critical aspect for users, developers and administrators. As those
distributed systems grew in scope and scalability, spanning multiple
organizations with different infrastructures and trust boundaries, governance
became more difficult but even more important.
Governance deals with the processes by which a system operates. In order to
have successful governance, some form of management, monitoring and
administration is required for these processes. SOA governance is the
discipline of creating policies and communicating and enforcing them.
In order to govern a SOA infrastructure (SOI) such as an ESB, there needs to
be a framework in place that allows policies and Service Level Agreements
(SLAs) to be defined, enforced and audited across multiple security and
identity domains. Such a framework must be able to define policies for
individual services and then either enforce them or provide means by which
they can be managed and enforced by some other component within the
SOA. One aspect of SOA governance that is implied by most definitions but
often overlooked is the necessity to communicate such policies to the users of the services.
Any implementation of governance provided by an SOI should be centered on
the four principles of enterprise architecture: the people involved, the
processes, the technology and services. A good governance implementation
needs to be supported by a hierarchical organizational reporting structure.
This impacts on an SOI in several ways, with the most obvious being that the
different levels in the reporting structure (e.g., developers, business
managers, service sponsors etc.) need different views onto the system as
services are built and deployed. Unfortunately at this stage in the evolution
of the Enterprise Service Bus as a SOA infrastructure, many implementations
present only a single view (the developer view) and organizations must rely
on ad hoc mechanisms to cover the other cases. This inevitably leads to an
impedance mismatch (translation difficulties) as managers try to understand
how to map low-level details onto their expectations. Within Overlord we
believe strongly that all good SOIs must eventually cater for everyone
involved in the SOA development and runtime within the same environment.
In the following sections we shall discuss SOA and governance from the
perspective of JBoss products currently and in the future. It should be
realized that all of what we will outline within this document will eventually
become implemented within the JBoss SOI.
1.1 Infrastructure and tooling support
Low-level infrastructure support for governance will come from JON and
more native support within the ESB and other projects. Importantly our SOA
Infrastructure has always made the registry (UDDI in our case) a central
component, which helps drive some aspects of governance into the minds of
users and developers. Furthermore, all good governance solutions need a
repository for storing artifacts (e.g., service definitions) and we have
development efforts in the Guvnor and DNA projects to fill that important role.
Depending upon the role of the user or component, some governance tools
within Overlord will be Eclipse based, whereas others will be Web based. Plus
there will be some tools that will have representations in both arenas,
because some of the capabilities will need to be available across different
roles in a format that is natural for that role. For instance, sometimes what a
sys admin needs to do is also what a developer needs to do (e.g., inspect a
service contract).
Managing a LAN-based distributed system can be hard enough: imagine
expanding that so it covers different business domains where the developer
or deployer do not control all of the underlying infrastructure and cannot
work on the assumption that people are trustworthy (and possibly live on
different sides of the planet!) With SOA governance there are run-time and
design-time requirements: typically a runtime component executes to ensure
things like Service Level Agreements (SLAs) are maintained, whereas a
tooling-based governance component could be for run-time
monitoring/management or design time service composition.
With Overlord, you'll be able to graphically display:
MTTF/MTTR information on behalf of specific nodes and services on those
nodes, as well as for all nodes and services that are deployed.
throughput for services.
time taken to process specific types of messages (e.g., how long to do
transformations, how long to do transformations on behalf of user X).
number of requests sent to services during the lifetime of the service/node
(in general, it is always important to distinguish between services and the
nodes on which they execute).
number of faults (service/node) in a given duration.
information about where messages are being received.
information about where messages are being sent (responses as well as
requests).
potential dependency tracking data. This can be used to determine sources
of common failure. It can also be used when deciding whether or not (and
where) to migrate services, for improved availability or performance.
what versions of services exist within a specific process (VM).
This includes sending probe messages that can test availability and performance
on request. However, this functionality is also duplicated in the design-time tooling.
All of this information may be obtained periodically from a central (though
more realistically a federated) data store or direct from the services
themselves. However, both sys admins and developers will need to be able to
connect to services (and composites) and inspect their governance criteria at
will, e.g., when was the last time they violated a contract, why and under
what input messages/state: the dynamic factor is incredibly important. This
information needs to be made available across individual services as well as
the entire SOA-P deployment.
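To make the bookkeeping behind these metrics concrete, here is a minimal sketch; the class and method names are hypothetical, not part of Overlord or JON, and only illustrate the per-service counters a monitoring console might aggregate:

```java
import java.util.ArrayList;
import java.util.List;

public class ServiceMetrics {
    private long requests;          // total requests seen by this service
    private long faults;            // total faults raised
    private final List<Long> uptimes = new ArrayList<>();     // ms between failures
    private final List<Long> repairTimes = new ArrayList<>(); // ms spent recovering

    public void recordRequest() { requests++; }

    public void recordFault(long uptimeMs, long repairMs) {
        faults++;
        uptimes.add(uptimeMs);
        repairTimes.add(repairMs);
    }

    // Mean Time To Failure: average uptime between recorded failures.
    public double mttf() { return average(uptimes); }

    // Mean Time To Repair: average time spent recovering from a failure.
    public double mttr() { return average(repairTimes); }

    public double faultRate() { return requests == 0 ? 0 : (double) faults / requests; }

    private static double average(List<Long> xs) {
        if (xs.isEmpty()) return 0;
        long sum = 0;
        for (long x : xs) sum += x;
        return (double) sum / xs.size();
    }
}
```

A federated data store would hold one such record per service per node, which is what makes the service/node distinction above important.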
Within the Overlord project we are working on a separate and dedicated
governance console that is used to receive alarms/warnings when
contracts/SLAs are violated or close to being violated. Obviously the console
is only one such destination for these alerts: sys admin inboxes are just as
important. However, that's where the infrastructure comes into play.
Traditional management tooling (e.g., via JMX) includes:
start/stop a service.
suspend/resume a service.
add/update restriction lists for services. This limits the list of receivers that
a service considers valid and will process messages from. A similar list of
destinations for responses will exist. This plays into the role/relationship
concept because although a developer may not consider the issue of security
(maybe can't, given that services could be deployed into environments that
did not exist when the developer was building the service), the sys admin (or
service container admin) will have to.
migrate services (and potentially dependent services).
inspect service contract.
update service definition.
attach and tune specific service parameters.
Fortunately we'll get a lot of this from close integration with the JBoss Operations Network (JON).
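As a rough illustration of the management operations listed above, the following sketch shows start/stop, suspend/resume and a runtime-updatable restriction list; the ManagedService class and its methods are invented for illustration, not a JBoss API:

```java
import java.util.HashSet;
import java.util.Set;

public class ManagedService {
    public enum State { STOPPED, RUNNING, SUSPENDED }

    private State state = State.STOPPED;
    private final Set<String> allowedSenders = new HashSet<>();

    public void start()   { state = State.RUNNING; }
    public void stop()    { state = State.STOPPED; }
    public void suspend() { if (state == State.RUNNING) state = State.SUSPENDED; }
    public void resume()  { if (state == State.SUSPENDED) state = State.RUNNING; }
    public State state()  { return state; }

    // Sys-admin operation: update the restriction list at runtime.
    public void allowSender(String sender) { allowedSenders.add(sender); }

    // A message is processed only when the service is running and the sender
    // is on the restriction list (an empty list means "accept everyone").
    public boolean accepts(String sender) {
        return state == State.RUNNING
            && (allowedSenders.isEmpty() || allowedSenders.contains(sender));
    }
}
```

Note how the restriction list is an administrative concern layered on after development, exactly as described above.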
Design time tooling from Overlord includes:
defining the service definition/contract, which includes specifying what
message types it allows. This is tied into the service implementation in order
that the SOI can verify incoming messages against this contract for validity.
Part of the contract will also include security and role information, which will
define who can interact with the service (this may be fine grained, based on time
of day, specific message type, etc.). Policies are attached at this level on a per-
service or per-operation basis (if not defined at the operation level, the
service-level policy is taken if defined).
policy definition/construction, tracking and enforcement. Not just part of
the development tool, but also an integral part of the underlying SOI. Policies
need to be shared so that other developers can utilise them in their own
service construction. Typically these will be stored in the repository.
service construction from other services, i.e., composite services. This has
an impact on SLAs and on governance enforcement. In some cases a physical
instance of the service may not exist and the infrastructure becomes
responsible for imposing the abstraction of a service by directing interactions appropriately.
inspecting the registry and repository during design time to locate and
inspect desired services for composition within applications. Also ties into
runtime management so that the user can inspect all running services. This
would also tie into our graphical process ﬂow tool, by allowing a drag-and-
drop approach to application construction.
service development then into service deployment. The tool will allow the
user to view a list of available nodes and processes. The availability,
performance etc. of those nodes will also be displayed (more tooling and
infrastructure support). Then you can drag a service implementation on to
the node and deploy it, either dynamically or statically. This ties into the
runtime management tool that allows the user to view deployed services on each node.
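The contract-verification idea described above can be sketched as follows; the ServiceContract class and its methods are hypothetical, showing only how an SOI might validate an incoming message type against the declared contract before dispatch:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class ServiceContract {
    // operation name -> message types that operation accepts
    private final Map<String, Set<String>> accepted = new HashMap<>();

    // Design-time: declare an allowed message type for an operation.
    public ServiceContract allow(String operation, String messageType) {
        accepted.computeIfAbsent(operation, k -> new HashSet<>()).add(messageType);
        return this;
    }

    // Run-time: the SOI checks each incoming message against the contract.
    public boolean validates(String operation, String messageType) {
        Set<String> types = accepted.get(operation);
        return types != null && types.contains(messageType);
    }
}
```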
Service implementation tooling:
from the client perspective you go to the registry and select the right service
based on requirements (functional as well as non-functional). The associated
contract defines the message format, as mentioned earlier. The tool will
either auto-generate an appropriate stub for the client code or provide a way
of tying state variables (from the application code, or incoming messages
etc.) into the outbound invocations on the service. At the tooling level, we're
really looking at defining client tasks that the developer writes, and defining
the output instance variables that are hooked into the service's input
variables. As far as the client tool is concerned, we are simply tying together
these variables. Capabilities such as security and transactions may be
exposed to the client.
from the service developer perspective, we are defining services as
compositions of tasks, actions, dispatchers etc. In the graphical designer we
specify the input variables that are required for each operation type (defined
as specific messages). This also plays into the contract definition effort
mentioned earlier, since the message formats accepted by a service are
implicitly defined by the requirements on input state variables.
WS-CDL tooling will also be tied into the run-time as well as design-time
governance aspects of Overlord.
the need to be able to deploy services into a virtual environment to allow
them to be tested without affecting a running system. A service has to be
able to be deployed in a test mode. What this means is that at a minimum
the service is not available to arbitrary users. Test services should also not be
deployed into a running process/container that is being used by other (non-
test) services and applications in case they cause it to fail and, worst case
scenario, take the entire process with them.
Very important to our work here was the donation by Thomas Erl of his
1.2 The Registry
Registries have always been seen as one of the key services in the SOA Triad.
However, until recently most SOA infrastructures and ESBs ignored them (JBossESB
was the first SOI to put a registry at its heart). The registry allows users to manage
services deployed within the SOI, based on pre-set policies. It is also possible to
store metadata about the services.
Both design-time and run-time policies are associated with services in the registry.
Developers can define a uniform set of policies that are enforced by the SOI. The
registry plays a key role in helping to enforce policies during the process of
provisioning services within the SOI. At runtime, other components within the SOI or
application may interact with the registry to find services and their run-time policies.
With an SOI that has a complete governance implementation, those policies will then
be enforced during the execution of the service. A service management solution can
also update metadata (e.g., security capabilities) related to a policy in the registry.
Given that a registry is a key enabling technology for a mature SOI, a standardized
approach to registries is important: UDDI (Universal Description, Discovery, and
Integration) provides a data model and standard interfaces for reading from
and writing to the registry. The UDDI standard allows any other system to access
information about services and update that information.
SOA registries may be deployed in a distributed fashion based on organizational
needs, e.g., an enterprise registry, departmental registries and application-specific
registries and testing registries (where services that are only in use for testing
purposes are described).
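As a minimal sketch of the registry's role described above: services are registered with an endpoint and run-time policy metadata that other components can look up. A real SOI would use the UDDI data model and interfaces; the Registry class here is invented purely for illustration:

```java
import java.util.HashMap;
import java.util.Map;

public class Registry {
    private final Map<String, String> endpoints = new HashMap<>();
    private final Map<String, Map<String, String>> policies = new HashMap<>();

    // Provisioning: register a service and give it an empty policy set.
    public void register(String service, String endpoint) {
        endpoints.put(service, endpoint);
        policies.put(service, new HashMap<>());
    }

    // A management solution can update policy metadata after deployment.
    public void setPolicy(String service, String key, String value) {
        policies.get(service).put(key, value);
    }

    // Run-time: components find services and their policies here.
    public String endpointOf(String service) { return endpoints.get(service); }

    public String policyOf(String service, String key) {
        Map<String, String> p = policies.get(service);
        return p == null ? null : p.get(key);
    }
}
```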
1.3 Service Lifecycle
The Enterprise Service Bus as a concrete implementation platform for SOA places
certain rules on the way in which services are developed, deployed and managed.
These rules are often defined in terms of phases and referred to collectively as the
Service Lifecycle and impact directly on the required capabilities of the governance
framework we have outlined previously:
Service lifecycle management concentrates on the development and deployment of
services, whereas the governance aspect brings access control, policies etc. into the
way in which services are used within a business process. Furthermore, as
mentioned before, governance fills the critical management requirement for deployed services.
A service’s lifecycle management is affected by its relationship with other services in
the environment. Irrespective of whether these services are provided by the SOI or
as part of the application, services do not exist in isolation at any stage in their lifecycle.
The model and assemble phases of the service lifecycle are often referred to as the
design-time aspect because they are concerned with the development of the service
prior to its being deployed within the SOI. The first step in deploying a service is
obviously identifying the requirement for the service and from that, what capabilities it
should offer. This process can often take an arbitrary duration as it is typically
iterative, particularly if multiple organizations or developers are involved. Sharing of
services within an SOI at runtime is often a reflection of the sharing of services that
goes on during the development stages of the service lifecycle.
Identifying the needs of a service within an SOI can be based on short-term or long-
term requirements. A successful SOA-based approach would tend to look at the
service from a wider perspective than a single application. Particularly in large scale
SOIs, unless a service offers a very restrictive/specific capability, once it is deployed
it is often difficult to predict how it will be used. Furthermore, with few exceptions,
most SOA deployments are expected to run continuously for long durations and
hence replacing a service may be difficult without having an adverse effect on the
normal execution of the SOI: quiescent periods simply do not exist in many
deployments.
incarnations, re-use is not about physically copying or sharing libraries and running
them within your own execution environment (which can result in security breaches).
Service re-use is simply that: re-using an existing service wherever it is deployed
within the SOI.
As soon as a service is selected for design it is important to think in terms of contract
and policy definition. These will be critical for a number of reasons including service
re-use (“is this service really offering what I want?”) and governance (“is this service
really doing what it said it would?”). Although it is possible to retrofit contracts and
policies after the service has been developed, successful SOI deployments are
frequently based on the approach of doing this as early in the design phase as possible.
During the assembly phase of the lifecycle, the service is to be developed, either
from scratch or through leveraging existing services within the SOI. The latter offers
the ability to more quickly develop and deploy new services, reducing time to market:
one of the critical benefits offered by SOA. Leveraging existing infrastructural
investment is important throughout SOA and an SOI that offers support for this is a
good candidate for a development and deployment environment.
The SOI must facilitate design-time discovery of services and provide the capability
to compose new services from existing services. This is often approached through
the use of Registries and Repositories. Furthermore, it should give sufficient design-
time tool support to structure the flow of business processes into the interactions between services.
1.4 Identity within the SOI
In any computer system, user identity is critical to enforcing security restrictions and
access control over services. Over the years many organizations have centralized
their administration of user identities and the privileges associated with them, e.g.,
whether or not a specific individual is allowed administrator privileges. Centralizing
makes security enforcement easier because there is only a single “site” to protect
from intruders. Furthermore, some government or vertical industry rules and
regulations require centralized management of this kind of information for compliance purposes.
Identity and Access Management products give assurances that individuals are who
they appear to be and that they can be restricted. An SOI must provide some means
by which a user (human or process) can establish its identity (obtain a credential)
and then pass this to a target service in a format it understands. From an
interoperability perspective, standards-based formats are very important, or you will
end up developing an isolated SOA application whose clients and services are tied to
a specific SOI. WS-Security is one of the standards you should expect to see supported.
Where identity is concerned, the SOI must ensure that every intermediary can
authenticate the requesting client (which could be a service) before passing
credentials to the next service. As the credential information flows, it may be
augmented or completely changed by each intermediate service: identity
management must be federated hierarchically in order for it to scale and match the
business domain. For example, although an intermediary service may call another
service on behalf of a client, it may not be possible or legal for the identity of the
original client to be exposed outside of the first service. Obviously the service must
also be able to authenticate the client/service based on credentials or intermediary
evidence before deciding whether it has authorization to have the service do some work.
The SOI should provide support for:
! Digital certificates or tokens to prove identity;
! How credentials can be associated with a message based on a service's requirements;
! How intermediates can use credentials, authenticate them and pass evidence of
that authentication to other services;
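A toy sketch of the federated-credential idea above: an intermediary authenticates its caller and then forwards its own credential, so the original client identity is not exposed beyond the first hop. The class and method names are hypothetical, and a real SOI would use WS-Security tokens rather than plain strings:

```java
import java.util.HashSet;
import java.util.Set;

public class Intermediary {
    private final String identity;
    private final Set<String> trustedCallers = new HashSet<>();

    public Intermediary(String identity) { this.identity = identity; }

    public Intermediary trust(String caller) {
        trustedCallers.add(caller);
        return this;
    }

    // Authenticate the caller, then pass on *this* intermediary's credential:
    // the original client's identity never leaves the first service.
    public String forwardCredential(String callerCredential) {
        if (!trustedCallers.contains(callerCredential)) {
            throw new SecurityException("unauthenticated caller");
        }
        return identity;
    }
}
```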
1.5 The Service Level Agreement
A service level agreement is a document that defines the relationship between two
parties: the provider of the service and the recipient of the service. The SLA
essentially defines a contract that exists between the two parties: the quality of
service that the provider will give to the recipient. SLAs address situations when
compliance with benchmarks must be verified from the perspective of a contractual agreement.
An SLA is an extremely important contractual obligation (usually in the form of a
document in the physical world). In essence it defines the parameters for the delivery
of a service for the benefit of both parties. At a minimum it will define:
! The service(s) to be delivered.
! Performance and how to report deviations from agreed metrics.
! The recipient’s responsibilities.
! Problem management.
Although their origins are in the physical world, SLAs are just as important in
computing environments, and especially so in an SOI. A comprehensive SLA should
always be seen as an essential requirement for both the provision of a service and
the use of that service. It can improve the quality of the development process.
Implementing Service Level Agreements in a SOA requires a process flow that can
define SLAs, measure compliance and act accordingly. This drives some essential requirements:
! The ability to capture any type of service-level-related metric on a per-message basis.
! A flexible authoring environment to create policy logic based on SLA metrics and
other service data.
! A mechanism to verify policy compliance and handle SLA violations or related events.
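A minimal sketch of those three requirements, with names and thresholds that are purely illustrative: a per-message metric is captured, policy logic compares it against the agreed SLA value, and violations are counted for alerting:

```java
public class SlaMonitor {
    private final long maxLatencyMs;   // the agreed metric from the SLA
    private int violations;

    public SlaMonitor(long maxLatencyMs) { this.maxLatencyMs = maxLatencyMs; }

    // Called once per message with its measured processing latency.
    // Returns true when this message violated the SLA.
    public boolean record(long latencyMs) {
        boolean violated = latencyMs > maxLatencyMs;
        if (violated) violations++;   // a real SOI would raise an alert here
        return violated;
    }

    public int violations() { return violations; }
}
```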
Any infrastructure that meets these requirements also has to work with any existing
components that might be part of the overall business process being managed. This
may include identity management systems, application servers, Web servers, portals,
etc. From this perspective, it makes sense to use an intermediary that is not tightly
bound to the underlying infrastructure but can implement SLAs at a standards-based level.
This is where SOA governance starts to become a necessity. For early SOA adoption
this type of governance might not make sense, but as the extent of your SOA grows
and services start to be reused more and more, governance is required and care
must be taken to examine each service's SLA and provide enough information so that
other services can consume it with confidence. SLAs and governance become
powerful allies in analyzing, developing and testing new services.
There are a number of initiatives that are trying to address governance and SLAs,
notably WS-Policy from the W3C. However, at this stage there is no standard for SLAs.
1.6 Policies and Contracts
We have already seen how defining the contract between service and client is an
important aspect of the service lifecycle. In order to do this successfully it is important
for the SOI to offer support for a message driven, contract driven development
approach, such as JBossESB.
An important aspect of any contract is the ability to define policies. A policy
represents a constraint or condition on the use, deployment or description of a
service. Policies are inherently driven by the service's stakeholders, who define
policies about issues that are important to them. Policies need to address the overall impact to the
business of the services that are being created and deployed. They need to create a
strong connection between the business and the SOI infrastructure.
A contract can refer to the service interface, the messages it can accept, their
formats, or even a legal contract entered into when using the service. The difference
between a policy and a contract is that the latter is an agreed policy between service and client.
Policies are only of real value if they can be authenticated or enforced, which is
where other aspects of governance come in. Therefore SOIs that support SOA
governance fall into five categories, which build upon one another:
! No policy support: the SOI has no support for policies or contracts within the
infrastructure. The need for policies must be defined outside of the SOI and
communicated using ad hoc techniques.
! Definition of policies: the SOI supports the capture and creation of policies
at design-time (typically via a graphical interface) and run-time (usually
through an intermediary such as a registry).
! Management of policies: the SOI allows the policies of services to be
viewed (either directly by contacting the running service, or indirectly via
an intermediary) and updated.
! Enforcement: policies are verified and enforced by the SOI.
! Storage: policies that are defined for one service may be useful for another,
e.g., transactional capabilities. As such a library of policy types can be built
up and shared between services and developers. These policies are
typically stored within a repository.
The other metadata and policies (representing service constraints and capabilities)
stored in the registry include:
! Policies that describe configuration/description information for non-functional
capabilities of the service, such as those defined by the WS-Security or WS-TX
policies, for configuring low-level security and transactional aspects of the service;
! Policies that are markers for compliance or compatibility with certain standards or
specifications, such as support for WS-Addressing or compliance with the WS-I Basic Profile;
! Policies that represent constraints that must be fulfilled, such as SLAs or contracts.
An implicit part of a service contract that is often overlooked is that of the service
semantics: essentially what the service is supposed to do. The purpose of a service
is the highest semantic characterization of the service. The service semantics include
such items as the format and structure of any data communicated between the users
of a service interaction.
A successful policy framework within an SOI must fulfill the following requirements:
! A canonical representation (typically XML) for expressing policies at different
levels within the SOI (per service, per operation, per deployment container etc.)
Policies should be composable.
! The ability to secure policies so that they can only be changed by authorized
individuals or components.
! The ability to persist policies (within a repository).
! The ability to create, enforce and manage policies.
! The ability to locate policy definitions within the SOI. For management reasons it
is often the case that policies will be associated with services by reference and
the actual policy definition will need to be fetched from some location (e.g., the
repository) in order to monitor or enforce it.
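The composability requirement can be sketched as follows; the Policy class is hypothetical (not WS-Policy), showing a policy as a named set of assertions, where composition merges them and the more specific policy overrides the more general one:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class Policy {
    private final String name;
    private final Map<String, String> assertions = new LinkedHashMap<>();

    public Policy(String name) { this.name = name; }

    // Add a single assertion (e.g., "security" -> "ws-security").
    public Policy assertThat(String key, String value) {
        assertions.put(key, value);
        return this;
    }

    // Compose: 'other' (e.g., a per-operation policy) overrides this
    // policy's (e.g., a per-service policy's) assertions on conflict.
    public Policy compose(Policy other) {
        Policy merged = new Policy(name + "+" + other.name);
        merged.assertions.putAll(this.assertions);
        merged.assertions.putAll(other.assertions);
        return merged;
    }

    public String get(String key) { return assertions.get(key); }
}
```

A canonical XML serialization of such assertions would satisfy the first requirement above; persisting the serialized form in the repository satisfies the third.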
Policies should not be considered as static entities. As with their counterparts in the
real world, policies change to reflect the environment in which they exist. For
example, a security policy that was initially considered sufficient for inter-corporation
deployments may be deemed too weak if services are eventually exposed beyond the
corporate firewall and need to be improved. Any good SOI will allow this to occur
dynamically without requiring changes to deployed services.
In that case, how are changes to the policies communicated to the
services and the enforcement aspects of SOI governance? Once again, this will
be implementation specific. Components (including human users) that wish to know
about changes to policies, may subscribe to Policy Managers that push changes out
to them when they occur, or require periodic polling if a push approach is not
implemented. In an event-driven architecture, changes to policies constitute an event
and therefore informing interested parties will often be catered for within the SOI
architecture naturally, i.e., there will be no bespoke policy-management/monitoring subsystem.
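The push model described above is essentially the observer pattern; a minimal sketch follows, where PolicyManager is invented for illustration and is not an Overlord component:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class PolicyManager {
    private final List<Consumer<String>> subscribers = new ArrayList<>();

    // Components (or human users' notification channels) register interest.
    public void subscribe(Consumer<String> listener) { subscribers.add(listener); }

    // Push a changed policy's name to every interested party.
    public void policyChanged(String policyName) {
        for (Consumer<String> s : subscribers) s.accept(policyName);
    }
}
```

Where push is not implemented, subscribers would instead poll the manager periodically, as the text notes.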
1.6.1 Formal Methods and WS-CDL
Many people in our industry ignore formal methods or pay lip service to them,
believing they are only of use to theoreticians. Unfortunately until that changes
Computer Science will always be a "soft" science: more an art than anything. That's
not a good thing because it limits efficiency. In a local application (everything on one
machine) you can get away with cutting some corners. But in a distributed system,
particularly one that needs to be fault tolerant, it's worse. For example, how do you
prove the correctness of a system when you cannot reason about the ways in which
the individual components (or services) will act given specific expected (or
unexpected) stimuli? Put another way, how can you ensure that the system behaves
as expected and continues to do so as it executes, especially if it has non-
deterministic properties? As the complexity of your application increases, this
problem approaches being NP-complete.
Rather than just throwing together an "architecture" diagram and developing services
in relative isolation, and trusting to luck, we decided that something better had to
exist for our customers. There are formal ways of doing this using Petri nets, for
example. WS-CDL uses the pi-calculus to help define the structure of your services and
composite application; you can then define the flow of messages between them,
building up a powerful way in which to reason effectively about the resultant system. On
paper the end result is something that can be shown to be provably correct. And this
is not some static, developer-time process either. Because these "contracts" between
services work in terms of messages and endpoints, you can (in theory) develop
runtime monitoring that enhances your governance solution and is (again) provably
correct: not only can you reason successfully about your distributed system when it is
developed and deployed initially, but you can continue to do so as it executes.
A good governance solution could tie into this and be triggered when the contract is
violated, either warning users or preventing the system from making forward
progress (always a good thing if a mission-critical environment is involved).
With the WS-CDL tooling you can define your scenarios (the interactions between
parties). You can then use the tool to define the roles and relationships for your
application and then you can dive down into very specific interactions such as credit
checking or winning the auction.
Since the tooling is all Eclipse based, it should be relatively straightforward to tie this into
our overall tooling strategy as well, providing a uniform approach to system
management and governance. But even without this, what this combination offers is
very important: you can now develop your applications and services and prove they
work before deployment. Furthermore, in the SOA world of service re-use, where you
probably didn't develop everything, a suitable WS-CDL related contract for each
service should allow developers to re-use services in a more formal manner and
prove a priori that the composite application is still correct, rather than doing things in
the ad hoc manner that currently pervades the industry.
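As a toy illustration of runtime conformance monitoring against a choreography contract (real WS-CDL monitoring is far richer; the ChoreographyMonitor class is invented here), the monitor checks that observed messages follow the declared interaction order and flags any deviation as a contract violation:

```java
public class ChoreographyMonitor {
    private final String[] expected;   // the declared interaction order
    private int position = 0;

    public ChoreographyMonitor(String... expected) { this.expected = expected; }

    // Returns false (a contract violation) when a message arrives out of order;
    // a governance solution would raise an alert or halt forward progress here.
    public boolean observe(String message) {
        if (position < expected.length && expected[position].equals(message)) {
            position++;
            return true;
        }
        return false;
    }

    // True once the whole declared interaction has been observed.
    public boolean complete() { return position == expected.length; }
}
```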
1.6.2 Policy Monitoring and Enforcement
In order to govern a SOA infrastructure (SOI) such as an ESB, there needs to be a
framework in place that allows these non-functional policies (attributes) and Service
Level Agreements (SLAs) to be defined, enforced and audited across multiple
consumers and services. Such a framework must be able to define policies for
individual services and then either enforce them or provide means by which they can
be managed and enforced by some other component within the SOA; at a minimum
alerts should be supported to inform users (human or software) that contracts and
policies have been, or are about to be, broken.
Policy enforcement should be provided by the SOI infrastructure through Policy
Enforcement Points (PEP), such as interceptors within the consumer/service protocol
stack or intelligent (dynamic) proxies through whom messages involved in
interactions must pass. Only certain policies (such as security) are applicable to the
proxy pattern if the proxy does not reside within the same address space as the service.
The technical aspects of how policies are enforced will typically be implementation
specific. However, it is important to know that support for policies without
enforcement provides limited utility. When monitoring, management and enforcement
are all in place (a complete governance solution), the SOI should work with the
various types of user (human or software) that need to be informed when policies or
contracts are violated.
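The interceptor-style PEP described above can be sketched as follows. This is a minimal illustration only: the class and policy names are our own invention, not Overlord APIs, and a real PEP would sit inside the protocol stack rather than wrap a Python callable.

```python
# Minimal sketch of a Policy Enforcement Point (PEP) implemented as a
# message interceptor. All names here are illustrative, not Overlord APIs.

class PolicyViolation(Exception):
    pass

class MaxPayloadPolicy:
    """Example policy: reject messages whose payload exceeds a limit."""
    def __init__(self, max_bytes):
        self.max_bytes = max_bytes

    def check(self, message):
        if len(message["payload"]) > self.max_bytes:
            raise PolicyViolation(
                f"payload {len(message['payload'])} > {self.max_bytes} bytes")

class EnforcementPoint:
    """Interceptor in the consumer/service protocol stack. Policies it
    cannot enforce directly could instead raise alerts (not shown)."""
    def __init__(self, policies, next_handler):
        self.policies = policies
        self.next_handler = next_handler

    def handle(self, message):
        for policy in self.policies:
            policy.check(message)          # enforce: refuse on violation
        return self.next_handler(message)  # pass message down the stack

# Usage: wrap a service endpoint with the PEP.
service = lambda msg: {"status": "ok"}
pep = EnforcementPoint([MaxPayloadPolicy(1024)], service)
print(pep.handle({"payload": b"hello"}))  # {'status': 'ok'}
```

Because the PEP holds a list of policy objects, new policy types can be plugged in without touching the interceptor itself, matching the pluggable enforcement components discussed below.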
Obviously the PEP needs the capability to understand, monitor and perhaps enforce
the policies under its control. As well as limiting the types of policies that can be
defined, some SOI implementations will therefore limit the deployment scope for
services such that they can only be deployed within environments (containers,
processes etc.) that support the policies defined for them. Other SOI implementations
define policy enforcement components that are pluggable and network-able, i.e., can
be downloaded from a central Policy Management Repository by a PEP when it
encounters policies for which it has no native monitoring or support capability.
SOIs can be categorized according to how they support policy monitoring and enforcement:
! No monitoring or enforcement: such SOIs are of limited use where service
policies and contracts are concerned. Unless enforcement happens outside of
the SOI, e.g., through direct interaction between producer and consumer, these
implementations should be used with care, particularly as the size and complexity
of deployments increase.
! Monitoring but no enforcement: at least these SOI implementations monitor the
policies and send alerts when violations occur. The form of alerting may be as
limited as simply outputting to a service-local log, or as complex as emailing and
interacting with Business Activity Monitoring (BAM) consoles. But the outcome is
the same: interested parties can be informed when policies are not adhered to.
Obviously whether or not such information is communicated in (relatively) real
time will depend upon the capabilities of the SOI.
! Monitoring and enforcement: where possible, the SOI implementation will enforce
policies, such as ensuring that messages are appropriately secure and if not,
refusing to allow such messages to be sent or received.
1.7 Service Monitoring and Business Activity Monitoring
The term Business Activity Monitoring (BAM), which was originally coined by Gartner,
is used to describe the real-time access to critical business performance metrics in
order to improve the efficiency and effectiveness of business processes. Real-time
process/service monitoring is a common capability supported in many distributed
infrastructures. However, BAM differs in that it draws information from multiple
sources to enable a broader and richer view of business activities. BAM also
encompasses business intelligence as well as network and systems management.
BAM is often weighted toward the business side of the enterprise. As such, there has
recently been a movement for BAM implementations to be closely related to the
business processes they monitor. BAM therefore leverages the technical aspects of
monitoring and alerting, but for business-focused events as well as purely technical
ones. From the perspective of a
business user, the ability to create alerts based on business processes can be crucial
in order to comply with regulations, auditing etc. Although BAM has been around for
many years, it has begun to be associated strongly with the business-driven aspects of SOA.
1.7.1 Why BAM?
To understand why BAM is important, let’s take a real-world example: an online
catalogue shop. In this scenario there will be a continuous production of sales-related
information, as well as logistics information (e.g., are the items currently available or
do they need ordering from elsewhere?) and financial information (e.g., does the user
have the right line of credit?). All successful companies have operational processes
that allow them to
analyze information in real-time, creating alerts when problems occur (or are about to
occur). This is what BAM formalizes and provides to all applications within the IT environment.
Therefore most BAM systems will attempt to provide the following capabilities:
! The ability to monitor business processes and activities in real time, alerting
users (by email, voice, SMS etc.) before problems arise and in some cases
allowing the system to be updated to prevent further problems. The BAM
infrastructure pushes events to the dashboard in real time.
! The ability to create dashboards, which are similar in intent to a car’s
dashboard and are used to present important information in a visual manner
(often on a single screen) that is tailored to a specific role or viewpoint. It is
worth stressing the importance of real-time information processing as far as
BAM is concerned because other approaches to monitoring exist (such as
Business Intelligence) that utilize dashboards. However, these systems are
not real-time based, typically refreshing information periodically.
As we mentioned above, an important aspect of BAM (or the
monitoring/management aspects of any good governance infrastructure) is alerting
users (or other services) about contract violations etc. Within BAM these alerts are
triggered on business events (e.g., is the service meeting its SLA, or are response
times longer than required?), but in general alerts may be triggered for a range of
reasons, e.g., the service has failed and the infrastructure needs to start another instance.
The complexity of alerts that are supported will depend upon the implementation, as
will how alert messages are delivered. While most of the first BAM solutions were
closely linked to Business Process Management (BPM) solutions and therefore
processed events emitted as the process was being orchestrated, this had the
disadvantage of requiring enterprises to invest in BPM before being able to acquire
and use BAM. Fortunately the newer generation of BAM solutions is based on
Complex Event Processing (CEP) technology, and can process high volumes of
underlying technical events to derive higher level Business Events, therefore
severing the dependency on BPM, and providing the benefits of BAM to a wider
audience of customers.
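The core CEP idea, deriving a higher-level business event from a stream of low-level technical events, can be sketched as follows. The detector, event shapes and threshold are assumptions for illustration; a real CEP engine would offer a full query/rule language over many event streams.

```python
# Sketch of the CEP idea: derive a higher-level business event from a
# stream of low-level technical events. Names are illustrative only.
from collections import deque

class SlaBreachDetector:
    """Emits a business-level 'SLA breach' event when the average
    response time over a sliding window exceeds a threshold."""
    def __init__(self, window, threshold_ms, on_event):
        self.samples = deque(maxlen=window)
        self.threshold_ms = threshold_ms
        self.on_event = on_event  # alert sink (dashboard, email, ...)

    def observe(self, response_time_ms):
        self.samples.append(response_time_ms)
        if len(self.samples) == self.samples.maxlen:
            avg = sum(self.samples) / len(self.samples)
            if avg > self.threshold_ms:
                self.on_event({"type": "sla-breach", "avg_ms": avg})

alerts = []
detector = SlaBreachDetector(window=3, threshold_ms=100, on_event=alerts.append)
for t in [50, 120, 150, 160]:   # technical events: raw response times
    detector.observe(t)
# alerts now holds the derived business events
```

Note that no BPM engine is involved: the business event is computed directly from the technical event stream, which is exactly why CEP-based BAM severs the dependency on BPM.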
Any good monitoring/management infrastructure should support at least the following:
! The ability to measure the total elapsed time from the sending of a request to
a service and obtaining a response, i.e., the round-trip time. This is necessary
for determining whether or not service execution times defined within SLAs
are (about to be) violated and is particularly useful in high volume services.
! Failure/fault detection alerts can be critical to business users as well as
infrastructural clients and services for tracking service availability and status.
Some SOIs are very static, requiring services be deployed by system
administrators; if there are failures of services then the administrator needs to
start a new instance. However, the next generation of SOIs is taking a leaf
out of the book of 1990s distributed systems, embracing a more dynamic
approach to service deployment: failures of services can be detected by these
SOIs and new instances automatically deployed. Similar techniques are used
to dynamically redeploy services to ensure a level of availability as well as to
provide load balancing.
! Detecting trends in services (using techniques similar to Data Warehousing,
but with real-time data rather than offline databases). Alerts that
are triggered when thresholds are reached are good when it is important to
know that something has gone awry, but it is often more important to predict
when something is going wrong so that corrective measures can be taken
and prevent the original alert being triggered. Furthermore, threshold triggers
are often absolutes and do not take into account natural perturbations, often
leading to false-positive alerts. A popular approach to this problem is non-
threshold analysis, such as Bayesian Belief Networks (BBNs), which are
based on probabilistic inference, allowing an event to be predicted from
historical information. This can be an extremely powerful approach
for BAM implementations, allowing much more accurate prediction of trends.
We also believe that BBNs will begin to be used more widely within other
areas of SOA governance, particularly in the long-term prediction of faults or failures.
! Monitoring message payload size can be useful in detecting reasons for
bottlenecks, abnormal service request/response times etc.
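The probabilistic-inference idea behind BBNs can be illustrated with a deliberately tiny example: estimating the probability that a machine is overloaded given that high latency is observed, using Bayes' rule over historical samples. A real BBN relates many variables through a graph of conditional dependencies; this sketch and its data are invented for illustration.

```python
# Toy illustration of the probabilistic inference underlying Bayesian
# Belief Networks: predict P(overload | high latency) from historical
# observations via Bayes' rule. A real BBN handles many variables.

def p_overload_given_latency(history):
    """history: list of (high_latency: bool, overloaded: bool) samples."""
    p_over = sum(o for _, o in history) / len(history)
    p_lat = sum(l for l, _ in history) / len(history)
    p_lat_given_over = (sum(1 for l, o in history if l and o)
                        / max(1, sum(o for _, o in history)))
    return p_lat_given_over * p_over / p_lat   # Bayes' rule

# Invented historical samples: (saw high latency?, was overloaded?)
history = [(True, True), (True, False), (False, False), (True, True)]
print(p_overload_given_latency(history))
```

As more history accumulates, the estimate adapts to natural perturbations instead of firing on every absolute threshold crossing, which is precisely the advantage claimed for non-threshold analysis above.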
1.8 Service Activity Monitoring
Although BAM was popularized by BPM, the fundamental basis behind it (monitoring
the activities in an environment and informing interested parties when certain events
are triggered) has been around since the early days of (distributed) system
management and monitoring. BAM specializes this general notion and targets the business domain.
Within a distributed environment (and many local environments) services are
monitored by the infrastructure for a number of reasons, including performance and
fault tolerance, e.g., detecting when services fail so that new instances can be
automatically started elsewhere. Over the years distributed system implementations
have typically provided different solutions to specific monitoring requirements, e.g.,
failure detection (or suspicion) would be implemented differently from that used to
detect performance bottlenecks. However, for some types of event monitoring this
leads to overlap and possible inefficiencies. For instance, some approaches to
detecting (or suspecting) failures may also be used to detect services that are simply
slow, indicating problems with the network or an overloaded machine on which the service is running.
As we saw earlier when discussing Policy Enforcement Points, the general concept
of interceptors (or filters) has existed in many distributed systems architectures over
the past few decades. In fact, Policy Enforcement Points are typically implemented as
a specific type of interceptor built on this more general infrastructure. It
turns out that interceptors are also a good implementation technique for some types
of failure detection.
We are now seeing a merging of many different approaches to entity monitoring
within distributed systems (where an entity could be a service, a machine, a network
link or something else entirely) and particularly SOIs. The emergence of event
processing has also seen an impact on this general entity monitoring, where some
implementations treat failure, slowness to respond etc. as particular events. This
uniform monitoring is termed Service Activity Monitoring (SAM) and typically includes:
! Message throughput (the number of messages a service can process within a
unit of time). This might also include the time taken to process specific types
of messages (e.g., how long to do transformations).
! Service availability (whether or not the service is active).
! Service Mean Time To Failure (MTTF) and Mean Time To Recovery (MTTR).
! Information about where messages are sent.
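Two of the SAM metrics above, throughput and MTTF, can be derived from a simple event log, as the following sketch shows. The event shapes and field names are assumptions for illustration; a real SAM infrastructure would consume live event streams rather than an in-memory list.

```python
# Sketch: deriving two SAM metrics (throughput and MTTF) from a simple
# event log. Event shapes and field names are assumed for illustration.

def throughput(events, window_s):
    """Messages processed per second over the observed window."""
    msgs = [e for e in events if e["type"] == "message"]
    return len(msgs) / window_s

def mttf(events):
    """Mean time to failure: average uptime between 'up' and 'fail'."""
    up_at, uptimes = None, []
    for e in events:                       # events assumed time-ordered
        if e["type"] == "up":
            up_at = e["t"]
        elif e["type"] == "fail" and up_at is not None:
            uptimes.append(e["t"] - up_at)
            up_at = None
    return sum(uptimes) / len(uptimes)

log = [{"type": "up", "t": 0}, {"type": "message", "t": 1},
       {"type": "message", "t": 2}, {"type": "fail", "t": 10},
       {"type": "up", "t": 12}, {"type": "fail", "t": 32}]
# throughput(log, 32.0) and mttf(log) yield the per-window metrics
```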
As the diagram below illustrates, the information is made available to the
infrastructure so that it may be able to take advantage of it for improved QoS, fault
tolerance etc. The streams may be pulled from existing infrastructure, such as
availability probing messages that are typically used to detect machine or service
failures, or may be created specifically for the SAM environment. Furthermore,
streams may be dynamically generated in real-time (and perhaps persisted over
time) or static, pre-defined information, where the SAM can be used to mine the data
over time and based on explicit queries.
With the advent of SAM we are beginning to see some BAM implementations that
are built on it, whereas other implementations continue to be built from scratch
and only target the business activities. The SAM approach may offer more flexibility
and power to monitoring and management, whereas a specific implementation may
be more easily transported to different environments (since it is not tied to a specific SAM implementation).
1.9 Roadmap
The initial implementation must concentrate on the repository. This will be based on
Guvnor and DNA. Until DNA is available, we will use Jackrabbit as the JCR backend.
The use of CDL is a relatively medium-term requirement for users, since it really
comes into play when you have more than a handful of services. However, this is a
significant positive differentiator for JBoss and is innovative. Hence the CDL
integration with Overlord is an ongoing effort, driven by our interactions with partners.
Given everything governance needs to perform, e.g., monitoring of contracts/SLAs
and enforcement, a good SAMM infrastructure is needed in the short-to-medium
term: retrofitting it later will prove problematic and inefficient.
The SAMM infrastructure should be based on CEP and BBNs. Furthermore it should
be developed in a pluggable manner to allow deployers to use their own CEP
implementations if necessary. This does not necessarily mean the wholesale
replacement of whatever CEP implementation we use by default: federation of CEP
should be possible. SAMM also needs to tie into RHQ as one of the event streams.
Tooling for both design-time and run-time governance is a continuous requirement.
Initial tools will concentrate on run-time governance.
The Service Modeler (donated by Thomas Erl) is an ongoing community driven tool.
This ties in with the immediate need for work on defining a service contract definition
language which will be based on WS-Policy and Policy Intents.
The PEP architecture will be developed in the short term. All SOA Platform
projects then need to understand the need for PEPs and consider where PEPs will
occur within their architectures. The implementations of PEPs for all projects will be an
ongoing effort and should also tie into the SAMM implementation when it is available.
In the short term PEPs will be developed in an ad hoc manner in order to provide a
pragmatic solution to immediate problems.
Identity Management and Security are immediate requirements.