A key feature when monitoring and debugging any Cloud infrastructure is the ability to trace, track, and collate all the individual, discrete steps that compose an event. A typical resource action in OpenStack is often a combination of smaller tasks which, given the distributed nature of OpenStack, can fail at unpredictable points in the workflow. By collecting the appropriate events, operators can view all events within Ceilometer, filter on a failed action, and trace back the history of related events to spot anomalies or errors. In this talk, we provide an overview of the recent enhancements made in Ceilometer to support the collection of event notifications from OpenStack services. We will describe how events are processed, transformed, and stored in Ceilometer; how you can derive metrics from events; and how to track the events of a resource and analyse where errors occur.
9. if you’re lucky, you might find the real error!
[instance: e7933ceb-d1e7-42fe-9f37-d275ebd375bd] Instance failed to spawn
Traceback (most recent call last):
...
...
ProcessExecutionError: Unexpected error while running command.
Command: qemu-img convert -O raw /opt/stack/data/nova/instances/_base/7434c85f2968d2cfb05b07d8c769d7d938cec5e8.part /opt/stack/data/nova/instances/_base/7434c85f2968d2cfb05b07d8c769d7d938cec5e8.converted
Exit code: 1
Stdout: u''
Stderr: u'qemu-img: error while reading sector 0: Input/output error\n'
10. Debugging be Hard
• actions consist of multiple steps
• asynchronous calls can cause timing issues
• the distributed nature of OpenStack can make it difficult to debug
• parsing log files is easy -- if you’re a robot
16. Creating an Instance
[Diagram: api, conductor, scheduler, compute manager (build network, build storage, start guest), notification bus]
17. Creating an Instance
[Diagram: api, conductor, scheduler, compute manager (build network, build storage, start guest), notification bus, with one step marked "FAIL HERE"]
18. OpenStack Events
• most services emit notifications for some discrete events
• the content of a notification represents the state of the environment, resource, etc. at that point in time
• notifications are defined by a type to describe content
• nova: compute.instance.create.*, scheduler.create_volume
• neutron: port.create.*, network.create.*
• cinder: volume.detach.*, volume.create.*
• keystone: identity.user.*, identity.project.*
• and a lot more...
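The type patterns listed above can be used to decide which service a notification belongs to. The following is a minimal sketch, assuming shell-style wildcard matching with Python's standard `fnmatch` module; the `PATTERNS` table and the `services_for` helper are hypothetical names for illustration and are not Ceilometer code.

```python
from fnmatch import fnmatchcase

# Event-type patterns copied from the slide, grouped by service as listed there.
PATTERNS = {
    "nova": ["compute.instance.create.*", "scheduler.create_volume"],
    "neutron": ["port.create.*", "network.create.*"],
    "cinder": ["volume.detach.*", "volume.create.*"],
    "keystone": ["identity.user.*", "identity.project.*"],
}


def services_for(event_type):
    """Return the sorted list of services whose patterns match event_type."""
    return sorted(
        service
        for service, patterns in PATTERNS.items()
        if any(fnmatchcase(event_type, pattern) for pattern in patterns)
    )


nova_match = services_for("compute.instance.create.end")
cinder_match = services_for("volume.create.start")
no_match = services_for("image.upload")
```

An event type that matches no pattern simply yields an empty list, which is one way to skip notifications nobody has asked for.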
19. Creating an Instance
[Diagram: api, conductor, scheduler, host manager (build network, build storage, start guest), notification bus, consumer?]
20. Ceilometer
• telemetry project in OpenStack
• notification agent which consumes messages
• listens to the queues of each OpenStack service
• picks specific measurement values from notifications and builds meters
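As a rough illustration of picking a measurement value out of a notification, the sketch below builds a sample dictionary from one payload field. The `build_sample` helper, the sample layout, the payload field names, and the timestamp are assumptions made for this example; they do not reproduce Ceilometer's internal classes.

```python
def build_sample(notification, meter_name, field, unit):
    """Pick one measurement value from a notification payload."""
    payload = notification["payload"]
    return {
        "name": meter_name,
        "volume": payload[field],
        "unit": unit,
        "resource_id": payload["instance_id"],
        "timestamp": notification["timestamp"],
    }


# Illustrative notification; the timestamp and memory value are made up.
notification = {
    "event_type": "compute.instance.create.end",
    "timestamp": "2015-05-19T10:00:00Z",
    "payload": {
        "instance_id": "e7933ceb-d1e7-42fe-9f37-d275ebd375bd",
        "memory_mb": 512,
    },
}

sample = build_sample(notification, "memory", "memory_mb", "MB")
```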
23. Creating an Instance
[Diagram: api, conductor, scheduler, host manager (build network, build storage, start guest), notification bus, ceilometer notification agent, Meters, Events]
24. Ceilometer Events
• initially implemented in Icehouse (part of StackTach integration)
• an Event represents the state of an object in an OpenStack service at a point in time
• built from INFO and ERROR level notifications emitted by all services
• ability to normalise messages by mapping key attributes from notification messages to a common name
25. Ceilometer Event Model
• message id
• event type
• timestamp
• traits
• queryable, indexed attributes
• e.g. payload.x.y.z => attr1
• raw
• full notification
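The event model above can be sketched as a small data structure plus a mapping step that turns dotted paths such as `payload.x.y.z` into named traits. This is only a sketch under stated assumptions: the `Event` class, `dig`, and `to_event` are hypothetical names, and the mapping table stands in for whatever definitions a deployment actually configures.

```python
from dataclasses import dataclass, field


def dig(document, dotted_path):
    """Follow a dotted path through nested dicts; return None if absent."""
    node = document
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


@dataclass
class Event:
    message_id: str
    event_type: str
    timestamp: str
    traits: dict = field(default_factory=dict)  # queryable attributes
    raw: dict = field(default_factory=dict)     # full notification


def to_event(notification, trait_paths):
    """Build an Event, keeping only the traits found in the notification."""
    traits = {}
    for name, path in trait_paths.items():
        value = dig(notification, path)
        if value is not None:
            traits[name] = value
    return Event(
        message_id=notification["message_id"],
        event_type=notification["event_type"],
        timestamp=notification["timestamp"],
        traits=traits,
        raw=notification,
    )


# Placeholder notification mirroring the slide's payload.x.y.z example.
notification = {
    "message_id": "m1",
    "event_type": "example.type",
    "timestamp": "2015-05-19T10:00:00Z",
    "payload": {"x": {"y": {"z": 42}}},
}
event = to_event(notification, {"attr1": "payload.x.y.z", "attr2": "payload.a.b"})
```

Traits whose path is missing are dropped rather than stored as empty values, so queries only see attributes that were actually present.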
26. Ceilometer Event Processing
• all events are forced through pipelines
• events can be published to multiple targets
• database
• file
• queue
• http
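The fan-out to multiple targets can be pictured with a toy pipeline that hands every event to each configured publisher. This is a minimal sketch, not Ceilometer's pipeline implementation: the `Pipeline` class is hypothetical, the database, file, and queue are in-memory stand-ins, and the http target is left out to keep the example self-contained.

```python
import io
import json
from collections import deque


class Pipeline:
    """Pass every event to each configured publisher in turn."""

    def __init__(self, publishers):
        self.publishers = list(publishers)

    def process(self, event):
        for publish in self.publishers:
            publish(event)


database = []             # stand-in for a database target
log_file = io.StringIO()  # stand-in for a file target
queue = deque()           # stand-in for a queue target


def to_file(event):
    """Write the event as one JSON document per line."""
    log_file.write(json.dumps(event, sort_keys=True) + "\n")


pipeline = Pipeline([database.append, to_file, queue.append])
pipeline.process({"event_type": "compute.instance.create.end"})
```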
27. Benefits of Centralised Events
• potential loss of data if logging locally
• normalisation of data
• event flow across services gives context
• individual events mean nothing
• end-to-end flow means something
29. Debugging be Easier
• we wanted a view to show all the events of a given action by a resource
• be able to see any errors
• temporally aware -- order of events
• show the flow and context of events
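The kind of view described above can be approximated by grouping events per request, ordering each group by time, and picking out the first error. The sketch below assumes each event carries a `request_id` field, an ISO 8601 timestamp string in a uniform format (so plain string sorting gives time order), and an error event type ending in `.error`; the helper names and the sample data are made up for illustration.

```python
from collections import defaultdict


def trace_by_request(events):
    """Group events by request id and order each group by timestamp."""
    groups = defaultdict(list)
    for event in events:
        groups[event["request_id"]].append(event)
    return {
        request_id: sorted(group, key=lambda e: e["timestamp"])
        for request_id, group in groups.items()
    }


def first_error(trace):
    """Return the earliest event whose type ends in '.error', or None."""
    return next((e for e in trace if e["event_type"].endswith(".error")), None)


# Made-up sample events, deliberately out of order.
events = [
    {"request_id": "req-1", "timestamp": "2015-05-19T10:00:02Z",
     "event_type": "compute.instance.create.error"},
    {"request_id": "req-1", "timestamp": "2015-05-19T10:00:00Z",
     "event_type": "compute.instance.create.start"},
    {"request_id": "req-2", "timestamp": "2015-05-19T10:00:01Z",
     "event_type": "port.create.end"},
]

traces = trace_by_request(events)
failed = first_error(traces["req-1"])
```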
31. ElasticSearch
• document-oriented, schema free database
• built on top of Apache Lucene
• focused on providing full-text search capabilities
• distributed, highly available, real time db
• Kibana: GUI for the database
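To give a feel for how stored events might be searched, the sketch below builds an Elasticsearch query body that filters on one request and a time range and sorts the hits by time. It only constructs the query as a Python dictionary; the field names `traits.request_id` and `timestamp` are assumptions about how events would be indexed, not something the slides specify.

```python
def events_for_request(request_id, start, end):
    """Build a query body: one request, a time range, oldest event first."""
    return {
        "query": {
            "bool": {
                "filter": [
                    # Assumed field names; adjust to the actual index mapping.
                    {"term": {"traits.request_id": request_id}},
                    {"range": {"timestamp": {"gte": start, "lte": end}}},
                ]
            }
        },
        "sort": [{"timestamp": {"order": "asc"}}],
    }


query = events_for_request(
    "req-1", "2015-05-19T10:00:00Z", "2015-05-19T11:00:00Z"
)
```

The resulting dictionary can be serialised to JSON and sent as the body of a search request.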
37. Extending Events
• there is a lot of data that isn’t published
• the data that is published is disorganised
• extending support in horizon
• drilling down into event to view full raw data
• filter options - time range, events for a specific request
• ceilometer
• alarm on events
• build metrics from events
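The last two ideas, building metrics from events and alarming on them, can be sketched as a count per event type plus a threshold check. The helper names and the threshold are hypothetical, and this is not how Ceilometer's alarming is implemented; it only illustrates the idea.

```python
from collections import Counter


def count_by_type(events):
    """Derive a simple metric: the number of events seen per event type."""
    return Counter(event["event_type"] for event in events)


def should_alarm(events, event_type, threshold):
    """True when at least `threshold` events of the given type were seen."""
    return count_by_type(events)[event_type] >= threshold


# Made-up sample events.
events = [
    {"event_type": "compute.instance.create.error"},
    {"event_type": "compute.instance.create.end"},
    {"event_type": "compute.instance.create.error"},
]

counts = count_by_type(events)
alarm = should_alarm(events, "compute.instance.create.error", threshold=2)
```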