The Anatomy of an Action
Mining the event storm
Vladik Romanovsky, Engineer
Gordon Chung, Engineer
OpenStack is a wonderful place
when you use OpenStack you might see this
WTF???
if you’re lucky, you might find the real error!
[instance: e7933ceb-d1e7-42fe-9f37-d275ebd375bd] Instance failed to spawn
Traceback (most recent call last):
...
...
ProcessExecutionError: Unexpected error while running command.
Command: qemu-img convert -O raw /opt/stack/data/nova/instances/_base/7434c85f2968d2cfb05b07d8c769d7d938cec5e8.part /opt/stack/data/nova/instances/_base/7434c85f2968d2cfb05b07d8c769d7d938cec5e8.converted
Exit code: 1
Stdout: u''
Stderr: u'qemu-img: error while reading sector 0: Input/output error\n'
Debugging be Hard
• actions consist of multiple steps
• asynchronous calls can cause timing issues
• the distributed nature of OpenStack can make it difficult to debug
• parsing log files is easy -- if you’re a robot
Use Case: Creating an Instance
Creating an Instance
[Diagram: api, conductor, scheduler, compute manager (build network, build storage, start guest)]
Creating an Instance
[Same diagram, repeated on three slides, each with a FAIL HERE marker]
Creating an Instance
[Same diagram, with a notification bus added]
Creating an Instance
[Same diagram, with the notification bus and a FAIL HERE marker]
OpenStack Events
• most services emit notifications for certain discrete events
• the content of a notification represents the state of the environment, resource, etc. at that point in time
• each notification has a type that describes its content
  • nova: compute.instance.create.*, scheduler.create_volume
  • neutron: port.create.*, network.create.*
  • cinder: volume.detach.*, volume.create.*
  • keystone: identity.user.*, identity.project.*
  • and a lot more...
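As a rough illustration of how typed notifications can be told apart, the sketch below matches an event type against the wildcard patterns listed above. The patterns come from the slide; the function name, the use of Python's `fnmatch` module, and the sample event types are assumptions made for this example, not how any OpenStack service does it.

```python
from fnmatch import fnmatchcase

# Patterns copied from the slide above. The grouping into a dict is
# an assumption made for this sketch.
PATTERNS = {
    "nova": ["compute.instance.create.*"],
    "neutron": ["port.create.*", "network.create.*"],
    "cinder": ["volume.detach.*", "volume.create.*"],
    "keystone": ["identity.user.*", "identity.project.*"],
}


def owning_service(event_type):
    """Return the first service whose pattern matches, or None."""
    for service, patterns in PATTERNS.items():
        if any(fnmatchcase(event_type, pattern) for pattern in patterns):
            return service
    return None


# Sample event types, assumed for illustration.
print(owning_service("compute.instance.create.start"))
print(owning_service("volume.create.end"))
```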
Creating an Instance
[Diagram: api, conductor, scheduler, host manager (build network, build storage, start guest), notification bus, and a box labelled "consumer?"]
Ceilometer
• telemetry project in OpenStack
• notification agent which consumes messages
• listens to the queues of each OpenStack service
• picks specific measurement values from notifications and builds meters
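A minimal sketch of "picking measurement values" from a notification might look like the following. It is not Ceilometer's code: the function name, the sample layout, and the payload fields (`memory_mb`, `vcpus`, `instance_id`) are assumptions chosen for illustration.

```python
def samples_from_notification(notification, fields=("memory_mb", "vcpus")):
    """Build one meter sample per numeric field found in the payload.

    The field names are assumed; a real deployment would configure
    which payload values become meters.
    """
    payload = notification.get("payload", {})
    return [
        {
            "name": field,
            "volume": payload[field],
            "resource_id": payload.get("instance_id"),
            "timestamp": notification.get("timestamp"),
        }
        for field in fields
        if field in payload
    ]


# Hypothetical notification, trimmed to the parts this sketch reads.
notification = {
    "event_type": "compute.instance.create.end",
    "timestamp": "2015-05-19T10:00:05",
    "payload": {"instance_id": "inst-1", "memory_mb": 512, "vcpus": 1},
}
samples = samples_from_notification(notification)
```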
but wait, there’s more!
every notification is also captured as an Event
Creating an Instance
[Diagram: api, conductor, scheduler, host manager (build network, build storage, start guest), notification bus, ceilometer notification agent, Meters, Events]
Ceilometer Events
• initially implemented in Icehouse (part of the StackTach integration)
• an Event represents the state of an object in an OpenStack service at a point in time
• built from INFO and ERROR level notifications emitted by all services
• can normalise messages by mapping key attributes from notification messages to a common name
Ceilometer Event Model
• message id
• event type
• timestamp
• traits
  • queryable, indexed attributes
  • e.g. payload.x.y.z => attr1
• raw
  • full notification
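The model above can be sketched as a small conversion from a raw notification to an event. This is an illustration under stated assumptions, not Ceilometer's implementation: the helper names, the dotted-path lookup, and the trait paths (`payload.instance_id`, `_context_request_id`, `payload.state`) are all chosen for the example.

```python
def dig(data, dotted_path):
    """Follow a dotted path such as 'payload.x.y.z'; None if a key is missing."""
    current = data
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# Common trait name -> path inside the notification (assumed paths).
TRAIT_PATHS = {
    "instance_id": "payload.instance_id",
    "request_id": "_context_request_id",
    "state": "payload.state",
}


def to_event(notification, trait_paths=TRAIT_PATHS):
    """Normalise a notification into the five-part event model."""
    traits = {}
    for name, path in trait_paths.items():
        value = dig(notification, path)
        if value is not None:
            traits[name] = value
    return {
        "message_id": notification.get("message_id"),
        "event_type": notification.get("event_type"),
        "timestamp": notification.get("timestamp"),
        "traits": traits,        # queryable, indexed attributes
        "raw": notification,     # full notification, kept untouched
    }
```

Keeping `raw` alongside the traits means nothing is lost by normalisation: queries run against the small, uniform trait set, and the full message remains available for drill-down.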
Ceilometer Event Processing
• all events are forced through pipelines
• events can be published to multiple targets
  • database
  • file
  • queue
  • http
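The fan-out idea, one event going to several targets, can be sketched as below. The class names are hypothetical and only two of the listed targets are mimicked (an in-memory list standing in for a database, and a file-like stream). Ceilometer's real pipelines are driven by configuration rather than by code like this.

```python
import io
import json


class ListPublisher:
    """Stand-in for a database target: keeps events in memory."""

    def __init__(self):
        self.events = []

    def publish(self, event):
        self.events.append(event)


class FilePublisher:
    """Writes one JSON document per line to any file-like object."""

    def __init__(self, stream):
        self.stream = stream

    def publish(self, event):
        self.stream.write(json.dumps(event, sort_keys=True) + "\n")


class Pipeline:
    """Hands every event to each configured publisher in turn."""

    def __init__(self, publishers):
        self.publishers = list(publishers)

    def process(self, event):
        for publisher in self.publishers:
            publisher.publish(event)


database = ListPublisher()
log_file = io.StringIO()
pipeline = Pipeline([database, FilePublisher(log_file)])
pipeline.process({"event_type": "compute.instance.create.end", "traits": {}})
```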
Benefits of Centralised Events
• avoids the potential loss of data that comes with logging locally
• normalisation of data
• event flow across services gives context
  • an individual event means nothing
  • the end-to-end flow means something
connecting the dots…
Debugging be Easier
• we wanted a view showing all the events of a given action on a resource
• be able to see any errors
• temporally aware -- order of events
• show the flow and context of events
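One way to picture such a view is to group events by the request that caused them, order each group by time, and pick out the errors. The sketch below assumes events shaped as dicts with `event_type`, `timestamp`, and a `request_id` trait, ISO 8601 timestamps in a single format (so string order equals time order), and error events whose type ends in `.error`. The sample data is invented.

```python
from collections import defaultdict


def timeline_by_request(events):
    """Group events by request id, each group sorted by timestamp."""
    grouped = defaultdict(list)
    for event in events:
        grouped[event["traits"].get("request_id")].append(event)
    return {
        request_id: sorted(group, key=lambda e: e["timestamp"])
        for request_id, group in grouped.items()
    }


def find_errors(timeline):
    """Return the events in a timeline whose type ends in '.error'."""
    return [e for e in timeline if e["event_type"].endswith(".error")]


# Invented sample events, deliberately out of order.
events = [
    {"event_type": "compute.instance.create.error",
     "timestamp": "2015-05-19T10:00:05", "traits": {"request_id": "req-1"}},
    {"event_type": "compute.instance.create.start",
     "timestamp": "2015-05-19T10:00:01", "traits": {"request_id": "req-1"}},
    {"event_type": "port.create.end",
     "timestamp": "2015-05-19T10:00:03", "traits": {"request_id": "req-2"}},
]
timelines = timeline_by_request(events)
```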
postmortem analysis using Elasticsearch
Elasticsearch
• document-oriented, schema-free database
• built on top of Apache Lucene
• focused on providing full-text search capabilities
• distributed, highly available, real-time database
• Kibana: GUI for the database
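For postmortem work, a search for every event belonging to one request, oldest first, could be expressed with a query body like the one below. The `term` query and `sort` clause are standard Elasticsearch query DSL; the field names `traits.request_id` and `timestamp` are assumptions about how events might be indexed, and the sketch only builds the JSON body without contacting a cluster.

```python
import json


def request_timeline_query(request_id, size=100):
    """Build a search body: all events for one request, oldest first.

    Field names are assumed; adjust them to the actual index mapping.
    """
    return {
        "query": {"term": {"traits.request_id": request_id}},
        "sort": [{"timestamp": {"order": "asc"}}],
        "size": size,
    }


body = request_timeline_query("req-1")
print(json.dumps(body, indent=2))
```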
KIBANA!!!
HORIZON!!!
Extending Events
• there is a lot of data that isn’t published
• the data that is published is disorganised
• extending support in Horizon
  • drilling down into an event to view the full raw data
  • filter options: time range, events for a specific request
• Ceilometer
  • alarm on events
  • build metrics from events
thank you
BACKUP
Horizon Events Prototype, by George Peristerakis
https://github.com/enovance/horizon/tree/event-prototype
