OCCI Monitoring Extension
4th ZWAH summer school in Cloud Computing
Augusto Ciuffoletti
University of Pisa
July, 2016
Monitoring cloud resources: admin and user
perspective
• Infrastructure monitoring is traditionally associated with the
administrator
• Gather data from sensors, store and process to generate
alarms
• Keep data in databases for incident diagnosis and later
analysis
• Pervasive coverage: every resource is potentially probed
• Cloud services introduce the same need on the user side
• The NIST definition explicitly mentions this as a property of
a cloud service
• The user is in fact the admin of a virtual infrastructure
• Wants to understand when a problem arises, and why
• Wants to verify that the service meets the expectations
• A single approach may address both, but the two perspectives are kept separate
Nagios as a practical solution
• Nagios has long been an excellent solution for
infrastructure monitoring
• modular software design
• remote probe deployment and control
• highly adaptable using plugins (possibly user-defined)
• separates the collection and the management of data
• But it does not scale sufficiently well
• difficult to modularize the monitoring infrastructure
• awkward to configure dynamically
• it is based on a configuration file
Beyond Nagios: new requirements
• Decompose the monitoring infrastructure into subsystems
• Big-data aware: consider the presence of aggregation
steps in the data path
• Allow the presence of separate planes and data paths for
the user and the admin
• On-demand configuration on both planes
• Address a multi-provider (federated) environment
Cloud monitoring
A cloud user wants functional feedback from cloud-sourced
resources:
• not only to verify service quality, but also to:
• control a scalable resource,
• provide feedback to the users,
• trigger compensating actions
• NIST indicates monitoring as one of the distinctive features
of cloud computing
Our option: on demand monitoring
• Provide monitoring as part of the service
• Give the user wide possibilities to configure a monitoring
infrastructure
• Which metrics are captured and how data are
preprocessed and retrieved
• Scale from simple to complex infrastructures
• Avoid overkill when the use case is simple
• Cope with complex infrastructures
• Resource agnostic
• Basic functionalities plus unlimited pluggable extensions
A monitoring infrastructure
• Adding a monitoring infrastructure:
• probes that collect monitoring data (collectors)
• a device that processes monitoring data (sensor)
A cloud interface
• we need an interface
• an open, extensible standard exists: OCCI
The OCCI framework
• There is an interface between the user of a cloud service
and the cloud service itself
• Data entities that describe the service traverse this
interface during its provisioning
• The protocol used during this conversation follows the
REST paradigm:
• the user plays the role of the client
• the conversation follows the HTTP protocol
• responses are cacheable, as far as possible
• OCCI proposes a minimalistic conceptual framework (or
ontology) for the entities used to describe the service
The OCCI core concepts
• Anything is an entity, and it is identified by a URI
• A relationship between entities is an entity
• We distinguish resource entities and link entities
(relationship)
• There are many kinds of entities, with distinguishing
attributes
• An entity of a certain kind can be integrated with mixins
that carry more attributes or bind existing ones to values
Basic monitoring operation
• Monitoring is made of three basic activities:
• extract operational parameters from a Resource
• gather performance parameters and compute the metric of
interest
• deliver the measurement to the relevant party
• The last two steps consist of the aggregation and
rendering of data
• this makes a candidate for a Resource
• The first step entails the collaboration among resources
• this makes a candidate for a Link
• The resource is named Sensor, and the link Collector
• this is the bare minimum needed to compose a monitoring
infrastructure from standard building blocks
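The split between a Sensor resource and a Collector link can be sketched in a few lines of Ruby. This is only an illustration: the class names Entity, Resource and Link and the "/compute/01g"-style identifiers are hypothetical and are not taken from the OCCI documents.

```ruby
# Hypothetical, simplified model of OCCI entities (not the official schema).
class Entity
  attr_reader :id, :kind, :mixins, :attributes

  def initialize(id, kind, mixins: [], attributes: {})
    @id = id
    @kind = kind
    @mixins = mixins
    @attributes = attributes
  end
end

# A Resource is an entity that stands on its own (compute node, Sensor, ...).
class Resource < Entity; end

# A Link is an entity that relates a source entity to a target entity.
class Link < Entity
  attr_reader :source, :target

  def initialize(id, kind, source:, target:, **options)
    super(id, kind, **options)
    @source = source
    @target = target
  end
end

MONITORING_SCHEMA = "http://schemas.ogf.org/occi/monitoring"

compute = Resource.new("/compute/01g", "http://schemas.ogf.org/occi/docker#generic")
sensor = Resource.new("/sensor/02s", "#{MONITORING_SCHEMA}#sensor",
                      attributes: { "period" => 3 })
collector = Link.new("/collector/03c", "#{MONITORING_SCHEMA}#collector",
                     source: compute, target: sensor)

puts "#{collector.source.id} -> #{collector.target.id}"
```

Because a Link accepts any Entity as its source, a Sensor can feed another Sensor without extra machinery.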
A Sensor
• It is a distinct activity that requires the provision of cloud
resources
• Tightly integrated in cloud infrastructure
• Under control of the provider
• Tuned using user requests
• The user that wants to exert monitoring instantiates (and
pays for) a Sensor
• The role of OCCI here is to describe the monitoring
infrastructure
Describing a Sensor
• Any Sensor has a few generic features
• ...they can be included in a standard definition of a Sensor
• When the sensor operates
• How frequently the sensor produces a new measurement
• They are timing attributes
• Other features are specific for the provider
• ...they are defined as mixins for the sensor
• How data are filtered (low pass, patterns etc.)
• How data are rendered (archive, email, streaming etc.)
• There is no limit to the semantics of the mixins
• however the hooks to connect a Sensor to a Collector must
be defined
A Collector
• Represents a flow of measurements between an OCCI
Resource and a Sensor
• ... yes, the source can be a Sensor in its turn
• The provider has control on the available measurements
• The user has control on the selection and the configuration
of the Collectors
• Cross provider measurements can be implemented
• ... to accommodate the use of several providers with a
single dashboard Sensor
Describing a Collector
• As in the case of the Sensor there are generic attributes of
a collector:
• The sampling period
• The accuracy of the sampling period
• ... again, just timing
• Other attributes are defined by provider-specific mixins
with arbitrary semantics
• ...the metric that is measured (throughput, free space,
temperature etc.)
The overall picture (kinds)
• Two entity kinds
• Sensor aggregates and delivers measurements
• Collector produces measurements
The overall picture (mixins)
• Three mixin types
• Aggregator mixins describe the aggregation activity of a
Sensor
• Publisher mixins describe the rendering activity of a Sensor
• Metric mixins describe the measurement activity of a
Collector
• The two Kinds are associated with the
http://schemas.ogf.org/occi/monitoring schema
• ...they are standard entities
• The three Mixins may be associated with a provider
specific schema
• ...but we do not exclude that some of them may be part of
another standard
Hold them together: input and output hooks
• The designer needs the tools to assemble a monitoring
infrastructure
• we introduce input and output attributes for the Mixins
• This specification falls outside the capabilities of the OCCI
standard
• The provider needs to be able to specify an interface for
the building blocks
• We describe two types of channel attributes for a mixin:
• Input attributes
• Output attributes
• the value of a channel attribute is a label
• Input and Output attributes with matching labels are
connected
• this is useful to describe a flow of data among them
• The scope of a label is limited to a sensor and its adjacent
collectors
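The label-matching rule can be sketched as follows. The attribute names are those used in the demo0 documents later in this deck; which of them count as inputs or outputs is an assumption made for this example, and the `wire` helper is hypothetical.

```ruby
# Assumption: these attribute names act as output and input channel attributes.
OUTPUT_KEYS = %w[out outstream].freeze
INPUT_KEYS = %w[input instream in_msg].freeze

# Mixin attributes of one Sensor and its adjacent Collector (the scope of the labels).
DEMO0_SCOPE = {
  "CPUPercent" => { "out" => "a" },
  "IsReachable" => { "hostname" => "172.217.16.4", "maxdelay" => 1000, "out" => "b" },
  "EWMA" => { "gain" => 16, "instream" => "a", "outstream" => "c" },
  "SendUDP" => { "hostname" => "localhost", "port" => "8888", "input" => "c" },
  "Log" => { "filename" => "/tmp/02s.log", "in_msg" => "b" }
}.freeze

# Returns [producer, label, consumer] triples for every matching pair of labels.
def wire(mixins)
  producers = Hash.new { |hash, label| hash[label] = [] }
  consumers = Hash.new { |hash, label| hash[label] = [] }
  mixins.each do |name, attributes|
    attributes.each do |key, label|
      producers[label] << name if OUTPUT_KEYS.include?(key)
      consumers[label] << name if INPUT_KEYS.include?(key)
    end
  end
  producers.flat_map do |label, sources|
    sources.product(consumers[label]).map { |source, sink| [source, label, sink] }
  end
end

wire(DEMO0_SCOPE).each { |source, label, sink| puts "#{source} --#{label}--> #{sink}" }
```

A label with two consumers yields two triples, which is how a multicast stream would show up in this representation.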
Step by step design of a monitoring infrastructure
An example:
• One sensor collects measurements from two resources
• Results are rendered through two different channels (e.g.,
streaming and database)
• Two distinct meters are applied to each of the two
resources (total four meters)
• We combine a metric from both resources (e.g., average
load)
Step by step design of a monitoring infrastructure (figure sequence)
• Figure: the scope of the Sensor (for metric stream ids)
• Figure: feeding the aggregators: a, b, d are measurement stream ids
• Figure: feeding publisher 2: aggregated (f, g) and raw data (e)
• Figure: feeding publisher 1: measurement stream b is multicast
Now, let’s take stock
• To describe a cloud provision, we also need to describe
a way to monitor its operation
• this is demonstrated by the monitoring options offered by
many providers
• A standard, aligned with the Open Cloud Computing
Interface
• Two basic concepts:
• Sensor aggregates and delivers monitoring data
• Collector produces monitoring data
• Finalized using mixins defined by the provider or by other
documents
• Can be combined to form complex infrastructures
• Next step: let’s implement an engine
How do we do that?
• We want to study the big arrow in the figure
How do we implement a monitoring infrastructure starting from
its OCCI specification?
• ROcMon is a "proof of concept" prototype
ROcMon: from OCCI to a monitoring infrastructure
ROcMon is (R)uby (Oc)ci (Mon)itoring
• ROcMon is an abstract architecture and a prototype
• simplicity and effectiveness are the goals
• The architecture is based on a specialized resource, the
Sensor
• receives monitoring data from other resources
• aggregates, processes the data
• delivers the data to other resources
• The monitoring activity is modeled with a link between
resources, the Collector
• On one edge is a probe that extracts data from one
component
• On the other there is the destination Sensor
• A collector may link two Sensors
• Adopts the Nagios "separation of concerns"
• Introduces multi-stage data processing
• Allows multi-tenancy, federation
• Localizes where data is processed and aggregated
Why does ROcMon have an OCCI interface?
• To allow dynamic configuration we need to provide an
interface
• A dashboard GUI is not a solution
• An API gives full flexibility:
• Software defined monitoring infrastructure
• User transparent when needed
• Automatic control based on feedback
• OCCI-monitoring provides an effective model for the API:
• simple, customizable and expandable
• based on an adopted standard
• A Sensor is a resource entity
• A Collector is a link entity
Architecture of ROcMon: analysis
• Monitoring is by nature split into small components
(remember Nagios)
• monitoring probes are small components, possibly
embedded
• monitoring data crosses a pipe of processors
(anonymization, aggregation etc)
• data is finally published using an endpoint reachable from
the outside (database, web service)
• Each component is supported by a specific technology
• e.g., network monitoring vs storage monitoring
• The on demand nature requires agility in deployment
• the cloud user that obtains a new resource may want to
monitor it
There is a match between
microservices and on demand monitoring
Excursus on microservices
• A design paradigm for distributed systems
• see the book by S. Newman in the O’Reilly "animal series"
• Principles:
• each component (hosted by a container) in the system is
designed to provide one small, well defined service
• each component is a stand alone entity that interacts with
others across a network with a well defined interface
Reasons to adopt the microservices paradigm
• simplifies maintenance
• e.g., upgrade one single container
• agility in deployment
• e.g., to scale up or down
• each container may use a different technology
• e.g., for technical or performance reasons
• simplifies development
• e.g., each container configured by a distinct team
• robustness
• e.g., if one container fails there is a chance that the system
still works
Architecture of ROcMon: sensors as containers
• A sensor represents an autonomous activity
• It is implemented by an autonomous Docker container
• configured with required computing capabilities
• easy to instantiate/destroy on the fly
• allows the implementation of security measures
• interacts with standard Internet protocols
• OCCI attributes for a Sensor.
• only timing and timing accuracy
• specific operation configured with mixins
OCCI monitoring: a reminder
• A Sensor is a subtype of the Resource type
• A Collector is a subtype of the Link type
• Add Mixins to specify the type of activity
• Legend:
• the sensor (red) is an OCCI resource
• the collectors (blue) are OCCI links
• computing boxes and the network are OCCI resources too
Architecture of ROcMon: sensor mixins
• Mixins are one of the tools provided by OCCI for
extensibility
• A mixin is a feature that is dynamically added to an entity
(e.g. a sensor)
• A mixin can be provided by the service provider
• the provider has control on the functions added to the
sensor
• The provider may allow the user to add new mixins
• however the operation is still controlled by the provider
• Two kinds of mixin for the Sensor
• Aggregator – takes data from a collector and processes the
data
• can be used for anonymization, compression, filtering etc.
• Publisher – delivers the results to the outside with given
format and protocol
• can be used for logging, storage, visualization, triggering,
and forwarding to another sensor
Architecture of ROcMon: mixins as threads
• A Sensor mixin is a defined activity (not a passive resource
feature)
• In ROcMon a sensor mixin is represented as a thread that
implements a functionality
• A Sensor may host several mixins, that may be instantiated
on the fly
• Using Ruby reflection mechanisms it is possible to
instantiate a new thread based on a request received
across the API
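A minimal sketch of this idea, assuming hypothetical class names (SketchPublisher, MemoryPublisher) and a simplified constructor that differs from the ROcMon one shown later: the mixin class is resolved by name at run time and its `run` method is started in a thread.

```ruby
# Hypothetical base class for this sketch only.
class SketchPublisher
  def initialize(params, queue)
    @params = params
    @queue = queue
  end
end

# Illustrative publisher: stores what it receives instead of delivering it anywhere.
class MemoryPublisher < SketchPublisher
  attr_reader :seen

  def run
    @seen = []
    while (item = @queue.pop) != :stop
      @seen << item
    end
  end
end

# Resolve the URI fragment to a class (reflection) and run the mixin in a new thread.
# Only subclasses of SketchPublisher are accepted, so arbitrary class names are rejected.
def start_mixin(uri, params, queue)
  name = uri.split("#").last
  klass = Object.const_get(name)
  unless klass.is_a?(Class) && klass <= SketchPublisher
    raise ArgumentError, "#{name} is not a publisher mixin"
  end
  mixin = klass.new(params, queue)
  [mixin, Thread.new { mixin.run }]
end

queue = Queue.new
mixin, thread = start_mixin("http://example.com/occi/monitoring/publisher#MemoryPublisher", {}, queue)
[1, 2, :stop].each { |value| queue.push(value) }
thread.join
p mixin.seen
```

Checking the resolved class against a known base class keeps the provider in control of what a request can instantiate.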
Architecture of ROcMon: channels as WebSockets
• Monitoring consists of:
• the measurement of a number of metrics
• their processing along a pipe of functions that extract
relevant information (sensors)
• the final delivery or utilization
Relevant paper: NISTIR 8063: Primitives and Elements of
Internet of Things Trustworthiness
• The provisioning of communication along the pipe is part of
the monitoring service
• ROcMon uses WebSockets for this:
• Standard protocol
• Uses the same server (Sinatra) used for REST configuration
• Encrypted communication is straightforward (HTTPS)
Architecture of ROcMon: probes as collector mixins
• In the OCCI monitoring document:
• The Collector link represents the communication of
measurements between a probe and a sensor
• The measured metrics are represented as Collector mixins
• One collector edge is a probe
• The probe is implemented as a thread hosting single
measurement tools as sub-threads
• Communication is based on WebSockets:
• The probe end opens a WebSocket with the Sensor
• The WebSocket on Sensor side routes incoming data to
aggregators and publishers
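The routing step on the Sensor side can be sketched without a network. In this illustration in-process queues stand in for the WebSocket, and the ChannelRouter class is hypothetical. It assumes that each incoming message is a JSON object whose keys are channel labels, as produced by the metric example later in this deck.

```ruby
require "json"

# Hypothetical router: one queue per subscription; a label may have several consumers.
class ChannelRouter
  def initialize
    @subscribers = Hash.new { |hash, label| hash[label] = [] }
  end

  # Register a consumer for a label and return the queue it should pop from.
  def subscribe(label)
    queue = Queue.new
    @subscribers[label.to_sym] << queue
    queue
  end

  # Dispatch one message, e.g. '{"a": 12.5}'; returns the number of deliveries.
  def route(json_text)
    delivered = 0
    JSON.parse(json_text).each do |label, value|
      @subscribers.fetch(label.to_sym, []).each do |queue|
        queue.push(value)
        delivered += 1
      end
    end
    delivered
  end
end

router = ChannelRouter.new
ewma_input = router.subscribe(:a)
log_input = router.subscribe(:b)
second_b = router.subscribe(:b) # a second consumer makes stream b multicast
router.route('{"a": 12.5}')
router.route('{"b": 1}')
puts ewma_input.pop, log_input.pop, second_b.pop
```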
ROcMon sandbox
• The VirtualBox VM is our private cloud provider
• Exposes a very basic OCCI interface
• remember: it is designed for testing and educational
purposes
• Two kinds of resources available as Docker images:
• generic: a generic compute resource
• sensor: the sensor resource we know
• One kind of links available:
• collector: the collector link we know
Docker image construction
• The basic Docker image is shipped in the VirtualBox VM
• It needs to be specialized in
• an image for the generic resource
• an image for the sensor resource
• The build.sh script (re)builds the two Docker images
• if you want to modify the code you need this feature
• for instance, to introduce new mixins
ROcMon sandbox operation
• The OCCI-server implements a simplified OCCI server
• accepts POST and GET requests on OCCI entities
• instantiates resources as running Docker containers and
configures them (using their REST interfaces)
• The run.sh wrapper simplifies your life:
• you prepare a directory with files containing the OCCI
documents
• call run.sh passing the directory name as a parameter
• see the demo0 directory for a working example (call with
./run.sh demo0)
Demo0 layout
• A demo infrastructure with one generic resource with a
CPU load and connectivity monitoring
Demo0 directory: content
• Contents:
• an additional run.sh: displays some info and starts a UDP
socket to receive monitoring results
• 01g.json the generic computing resource
• 02s.json the sensor: computes the moving average of the
measurements and sends it to the UDP socket above
• 03c.json the collector: collects two metrics from the generic
resource: CPU load and pings to Google.
• Conventions:
• The prefix digits are used to enforce an ordering in
activation
• The .json suffix is mandatory for the documents to be
uploaded to the OCCI server
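What a wrapper like run.sh has to do with these conventions can be sketched in Ruby. This is not the actual script: the base URL, the "/<id>" path layout and the content type are assumptions, and the helper names are hypothetical. The request is built but not sent.

```ruby
require "net/http"
require "uri"
require "json"

# Documents are uploaded in filename order, so 01g precedes 02s, which precedes 03c.
# Files without the .json suffix (for example run.sh) are ignored.
def upload_order(dir)
  Dir.glob(File.join(dir, "*.json")).sort
end

# Build (but do not send) the POST request for one document.
# Assumption: the server accepts the document at "/<id>" as application/json.
def build_request(base_url, path)
  body = File.read(path)
  id = JSON.parse(body).fetch("id")
  request = Net::HTTP::Post.new(URI.join(base_url, "/#{id}"))
  request["Content-Type"] = "application/json"
  request.body = body
  request
end

# Sending would then look like this (left commented out: it needs a running server):
#   uri = URI("http://localhost:8080")   # hypothetical address
#   upload_order("demo0").each do |path|
#     Net::HTTP.start(uri.host, uri.port) { |http| http.request(build_request(uri.to_s, path)) }
#   end
```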
Demo0 files: the generic computing resource 01g
{
  "id": "01g",
  "kind": "http://schemas.ogf.org/occi/docker#generic",
  "attributes": {
    "occi": {
      "generic": {
        "speed": 2,
        "memory": 4,
        "cores": 2 }
    }
  },
  "links": ["03c"]
}
• the id must match the filename (redundant)
• the kind drives the processing of the document by the
OCCI-server
• the attributes are in fact unused in our sandbox
• the links are a reference to another document
Demo0 files: the sensor resource 02s
{
  "id": "02s",
  "kind": "http://schemas.ogf.org/occi/monitoring#sensor",
  "mixins": [
    "http://example.com/occi/monitoring/publisher#SendUDP",
    "http://example.com/occi/monitoring/aggregator#EWMA",
    "http://example.com/occi/monitoring/publisher#Log"
  ],
  "attributes": {
    "occi": { "sensor": { "period": 3 } },
    "com": { "example": { "occi": { "monitoring": {
      "SendUDP": { "hostname": "localhost", "port": "8888", "input": "c" },
      "EWMA": { "gain": 16, "instream": "a", "outstream": "c" },
      "Log": { "filename": "/tmp/02s.log", "in_msg": "b" }
    } } } } },
  "links": []
}
• the sensor features three mixins: SendUDP, EWMA and Log
• they are defined in a private namespace
• the attributes introduced by the mixins are defined below
• the input, instream, outstream and in_msg attributes define
channels
• hostname, port, gain and filename are functional parameters
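In the document above, the attributes of each mixin sit under a path that mirrors the mixin URI. A sketch of that mapping follows. The rule (reversed host labels, then the path without its last segment, then the fragment) is inferred from this demo document only, and the helper names are hypothetical.

```ruby
require "uri"

# Map a mixin URI to the path of its attributes inside the "attributes" object.
# Assumption: rule inferred from the demo0 sensor document, not from a specification.
def attribute_path(mixin_uri)
  uri = URI.parse(mixin_uri)
  host_labels = uri.host.split(".").reverse           # example.com -> com, example
  segments = uri.path.split("/").reject(&:empty?)     # occi, monitoring, publisher
  host_labels + segments[0...-1] + [uri.fragment]     # drop the mixin type, add the name
end

# Return the attributes of one mixin, or nil when the document does not define them.
def mixin_attributes(document, mixin_uri)
  document.fetch("attributes").dig(*attribute_path(mixin_uri))
end

sensor = {
  "attributes" => {
    "com" => { "example" => { "occi" => { "monitoring" => {
      "EWMA" => { "gain" => 16, "instream" => "a", "outstream" => "c" }
    } } } }
  }
}
p attribute_path("http://example.com/occi/monitoring/aggregator#EWMA")
p mixin_attributes(sensor, "http://example.com/occi/monitoring/aggregator#EWMA")
```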
Demo0 files: the collector link 03c
{
  "id": "03c",
  "kind": "http://schemas.ogf.org/occi/monitoring#collector",
  "mixins": [
    "http://example.com/occi/monitoring/metric#CPUPercent",
    "http://example.com/occi/monitoring/metric#IsReachable"
  ],
  "attributes": {
    "occi": { "collector": { "period": 3 } },
    "com": { "example": { "occi": { "monitoring": {
      "CPUPercent": { "out": "a" },
      "IsReachable": { "hostname": "172.217.16.4", "maxdelay": 1000, "out": "b" }
    } } } } },
  "actions": [],
  "target": "02s",
  "source": "01g"
}
• the collector features two mixins: CPUPercent and IsReachable
• they are defined in a private namespace
• the attributes introduced by the mixins are defined below
• the out attributes define channels
• hostname, maxdelay are functional parameters
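The three documents refer to each other by id (links, source, target), and each id must match its filename. A hypothetical consistency check, not part of the sandbox, could look like this:

```ruby
require "json"

# Returns a list of human-readable problems found in a set of documents.
# docs maps a filename to its JSON text, e.g. { "01g.json" => "{...}" }.
def check_documents(docs)
  parsed = docs.transform_values { |text| JSON.parse(text) }
  ids = parsed.values.map { |doc| doc["id"] }
  problems = []
  parsed.each do |filename, doc|
    expected = File.basename(filename, ".json")
    unless doc["id"] == expected
      problems << "#{filename}: id #{doc['id'].inspect} does not match the filename"
    end
    references = Array(doc["links"]) + [doc["source"], doc["target"]].compact
    references.each do |reference|
      problems << "#{filename}: unknown reference #{reference.inspect}" unless ids.include?(reference)
    end
  end
  problems
end

demo0 = {
  "01g.json" => '{"id": "01g", "links": ["03c"]}',
  "02s.json" => '{"id": "02s", "links": []}',
  "03c.json" => '{"id": "03c", "source": "01g", "target": "02s"}'
}
puts check_documents(demo0).empty? ? "demo0 is consistent" : check_documents(demo0)
```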
ROcMon: code organization (I)
• The VM provided contains the ROcMon source tree (see
Desktop/rocmon/build)
• The directory contains three subdirectories, one for each
OCCI entity: generic, sensor, collector.
• Each of them contains the code embedded in the docker
image of an OCCI resource:
• generic and collector are loaded into the occi_generic image
(a generic compute resource)
• sensor is loaded into the occi_sensor image
• look into the Dockerfiles to see what I mean
ROcMon: code organization (II)
• the generic directory contains
• the REST server for configuration
• a directory containing the metric mixins
• the collector contains
• the implementation of the metric container
• the sensor contains
• the REST server for configuration
• the implementation of the sensor
• two directories containing the aggregator and publisher
mixins
ROcMon: what mixins have in common
• Each mixin implements the abstract methods of the class
corresponding to its tag (Metric, Aggregator or Publisher)
• A hash with the operational parameters is stored in an instance
variable (@variable)
• Input is received through channels
• Hash entries correspond to mixin parameters
• Output is
• returned to the caller (metric)
• forwarded through a channel (aggregator and publisher)
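The pattern can be sketched as follows. The real ROcMon base classes are not shown in these slides, so SketchMetric and FixedValue are hypothetical names and the constructor signature is an assumption.

```ruby
require "json"

# Hypothetical base class standing in for a mixin tag such as Metric.
class SketchMetric
  def initialize(metric_hash)
    @metric_hash = metric_hash # operational parameters, one entry per mixin attribute
  end

  # Abstract method: each concrete mixin finalizes it.
  def measurement
    raise NotImplementedError, "#{self.class} must implement measurement"
  end
end

# Illustrative mixin: returns a fixed value under the label configured in :out.
class FixedValue < SketchMetric
  def measurement
    JSON.generate({ @metric_hash[:out] => @metric_hash[:value] })
  end
end

puts FixedValue.new({ out: "a", value: 42 }).measurement
```

The caller only needs to know the tag class: any subclass answers `measurement` with a JSON object keyed by channel label.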
Example: CPU load
require "json"      # needed by JSON.generate below
require "open3"
require "./Metric"
class CPUPercent < Metric
  def measurement()
    # the first line of `w` carries the load averages
    out, err, st = Open3.capture3('w | head -1')
    if st.exitstatus == 0
      # take the 1-minute load average, accept a decimal comma, scale by 100
      perc = out.split("load average: ")[1].split(", ")[0].gsub(",", ".").to_f * 100
    else
      perc = nil
    end
    return JSON.generate(Hash[@metric_hash[:out] => perc])
  end
end
• CPUPercent is a subclass of Metric (a tag)
• measurement is the name of the parameterless method
• only one attribute is defined, named out
• return value is a JSON object
• object labels correspond to mixin attributes
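The parsing step can be isolated in a pure function that is easy to test without running `w`. This is a sketch with a hypothetical name (`load_percent`) and a made-up sample line. Note that the value is a load average scaled by 100, so it can exceed 100 on a busy or multi-core host.

```ruby
# Extract the 1-minute load average from the first line of `w` (or `uptime`) output
# and scale it by 100. Returns nil when the line does not contain a load average.
def load_percent(line)
  match = line.match(/load averages?:\s*(\d+[.,]\d+)/)
  return nil unless match

  match[1].tr(",", ".").to_f * 100 # accept a decimal comma, as the slide code does
end

sample = " 10:15:01 up 3 days,  2:04,  1 user,  load average: 0.50, 0.30, 0.25"
puts load_percent(sample)
```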
Example: EWMA (I)
require "./Aggregator"
class EWMA < Aggregator
  def initialize(sensor_hash, aggregator_hash, syncChannels)
    super
    begin
      addChannel(@aggregator_hash[:instream].to_sym)
      addChannel(@aggregator_hash[:outstream].to_sym)
    rescue Exception => e
      puts "Problems adding a channel: #{e.message}"
      puts e.backtrace.inspect
    end
  end
• EWMA is a subclass of Aggregator (a tag)
• first execute superclass initialize to fill instance variables
• the initialize method uses channel labels to create
internal queues
• note that instream and outstream are the channel attributes
names
Example: EWMA (II)
  def run()
    output = nil
    begin
      gain = @aggregator_hash[:gain]
      inChannel = getChannelByName("instream")
      outChannel = getChannelByName("outstream")
      loop do
        data = inChannel.pop
        puts "EWMA:\tnew data in (#{data})"
        output ||= data   # the first sample initializes the average
        output = ((output * (gain - 1)) + data) / gain
        outChannel.push(output)
        puts "EWMA:\tdata out (#{output})"
      end
    rescue Exception => e
      puts "Problems during the run of an aggregator: #{e.message}"
      puts e.backtrace.inspect
    end
  end
end   # closes class EWMA, opened on the previous slide
• the gain parameter is retrieved from the instance hash
• as soon as data is available, it is popped out of the input
channel
• data is processed, and pushed in the output channel
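The update rule can be tried in isolation, without channels or threads. This sketch uses hypothetical helper names (`ewma_step`, `ewma`) and converts to Float, because with Integer samples and an Integer gain Ruby's `/` truncates the result.

```ruby
# One EWMA step with the update rule of the slide:
#   new = (old * (gain - 1) + data) / gain
def ewma_step(previous, data, gain)
  previous = data if previous.nil? # the first sample initializes the average
  (previous.to_f * (gain - 1) + data) / gain
end

# Apply the rule to a whole series and return the successive averages.
def ewma(samples, gain)
  average = nil
  samples.map { |sample| average = ewma_step(average, sample, gain) }
end

p ewma([16, 32, 32], 16)
```

With a gain of 16 each new sample contributes 1/16 of its value, so the average moves slowly toward a new level: a larger gain means stronger smoothing.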
Example: SendUDP (I)
require "socket"   # provides UDPSocket, used in run()
class SendUDP < Publisher
  def initialize(sensor_hash, publisher_hash, syncChannels)
    super
    begin
      addChannel(@publisher_hash[:input].to_sym)
      puts "Input channel for SendUDP added"
    rescue Exception => e
      puts "Problems adding a channel: #{e.message}\n"
      puts e.backtrace.inspect
    end
  end
  def run()
• This is analogous to the previous one
• input is the name of the ingress channel
Example: SendUDP (II)
    begin
      socket = UDPSocket.new
      inChannel = getChannelByName(:input)
      loop do
        data = inChannel.pop
        puts "UDPSOCKET:\tnew data received (#{data})"
        begin
          # one datagram per measurement, newline-terminated
          socket.send("data=" + data.to_s + "\n", 0,
                      @publisher_hash[:hostname],
                      @publisher_hash[:port]
                     )
        rescue Exception => e
          puts "Problems sending with UDP(2): #{e.message}"
          puts e.backtrace.inspect
        end
        puts "UDPSOCKET:\tdata sent to socket (#{data})"
      end
    rescue Exception => e
      puts "Problems sending with UDP: #{e.message}"
      puts e.backtrace.inspect
    end
  end
end   # closes class SendUDP, opened on the previous slide
• Slightly more complex than the previous: need to manage
a UDP socket!
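The receiving side of this publisher can be sketched with the standard socket library. This is not the demo0 run.sh script: the helper name `receive_one` is hypothetical, and the example binds to an OS-chosen port on the loopback interface rather than to port 8888 of the demo.

```ruby
require "socket"

# Receive one datagram, waiting at most `timeout` seconds.
# Returns the payload without its trailing newline, or nil on timeout.
def receive_one(socket, timeout = 2)
  return nil unless IO.select([socket], nil, nil, timeout)

  message, _sender = socket.recvfrom(1024)
  message.chomp
end

receiver = UDPSocket.new
receiver.bind("127.0.0.1", 0) # port 0: let the OS pick a free port for this demo
port = receiver.addr[1]

# Stand-in for the SendUDP publisher: same "data=<value>\n" payload format.
sender = UDPSocket.new
sender.send("data=17.0\n", 0, "127.0.0.1", port)

puts receive_one(receiver)
sender.close
receiver.close
```

UDP gives no delivery guarantee, so a receiver should tolerate missing samples. The timeout keeps the loop from blocking forever when the sensor stops publishing.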