Ceilometer lsf-intergration-openstack-summit

Tim Bell

Ceilometer
CERN use case:
● CERN delivers resources both as virtual machines and via traditional
batch and Grid computing
● Individual batch nodes execute payloads from different users and
communities
● Accounting should cover both use cases
● Interesting metrics include
– What is the resource usage of experiment A during December?
– What is the resource usage of user B last year?
● Accounting information has to be reported to Grid bodies (WLCG) per
experiment
Facts:
● Details of users' jobs are already present in the batch accounting database
● It is a large database, with around 400,000 records added every day
Solution:
● Use Ceilometer as the single source of truth for accounting data
● Batch data is put into the Ceilometer database for accounting purposes
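
The two example questions above amount to summing usage over a time window, grouped by experiment or by user. A minimal sketch of that aggregation follows. The record layout (experiment, user, cpu_seconds, end) and the sample values are assumptions made for illustration; they are not the schema of the batch accounting database or of Ceilometer.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical accounting samples; field names and values are illustrative only.
SAMPLES = [
    {"experiment": "A", "user": "B", "cpu_seconds": 3600.0, "end": datetime(2013, 12, 3)},
    {"experiment": "A", "user": "C", "cpu_seconds": 1800.0, "end": datetime(2013, 12, 20)},
    {"experiment": "D", "user": "B", "cpu_seconds": 600.0, "end": datetime(2013, 6, 1)},
    {"experiment": "A", "user": "B", "cpu_seconds": 900.0, "end": datetime(2014, 1, 2)},
]


def usage_by(samples, key, start, end):
    """Sum cpu_seconds per value of `key` for samples ending in [start, end)."""
    totals = defaultdict(float)
    for sample in samples:
        if start <= sample["end"] < end:
            totals[sample[key]] += sample["cpu_seconds"]
    return dict(totals)


# Usage of each experiment during one December.
december = usage_by(SAMPLES, "experiment", datetime(2013, 12, 1), datetime(2014, 1, 1))

# Usage of each user over one calendar year.
yearly = usage_by(SAMPLES, "user", datetime(2013, 1, 1), datetime(2014, 1, 1))
```

With both VM and batch samples in one store, the same grouping can serve the per-experiment reports as well as per-user queries.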

Ceilometer: Current Implementation
[Architecture diagram. Batch specific instances: Ceilometer Agent Central with batch plugin, RabbitMQ-LSF, Ceilometer Collector for batch data, Batch accounting database. IaaS specific instances: Ceilometer Agent Compute, Ceilometer Agent Central, RabbitMQ, Ceilometer Collector. Shared: Ceilometer Database (mongodb), Ceilometer API.]

Ceilometer: Current Implementation
● A ceilometer-agent-central plugin was written that polls
the batch accounting database for unpublished records
● The unpublished records are then pushed to the metering
queue (RabbitMQ)
● The ceilometer-collector instance consumes the
messages from the metering queue and inserts them into
the Ceilometer database (mongodb)
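
The three steps above (poll for unpublished records, push them to the metering queue, consume and store them) can be sketched as follows. This is not the CERN plugin and does not use the Ceilometer plugin API: an in-memory SQLite table stands in for the batch accounting database, a queue.Queue stands in for RabbitMQ, and a Python list stands in for the mongodb collection. The table layout and all function names are hypothetical.

```python
import json
import queue
import sqlite3


def setup_db():
    """Create a stand-in batch accounting table with a 'published' flag."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jobs (job_id INTEGER PRIMARY KEY, user TEXT, "
        "cpu_seconds REAL, published INTEGER DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO jobs (job_id, user, cpu_seconds) VALUES (?, ?, ?)",
        [(1, "alice", 120.0), (2, "bob", 45.5), (3, "alice", 300.0)],
    )
    conn.commit()
    return conn


def poll_and_publish(conn, metering_queue):
    """Read unpublished records, push each to the queue, mark it published."""
    rows = conn.execute(
        "SELECT job_id, user, cpu_seconds FROM jobs "
        "WHERE published = 0 ORDER BY job_id"
    ).fetchall()
    for job_id, user, cpu_seconds in rows:
        message = {"job_id": job_id, "user": user, "cpu_seconds": cpu_seconds}
        metering_queue.put(json.dumps(message))
        conn.execute("UPDATE jobs SET published = 1 WHERE job_id = ?", (job_id,))
    conn.commit()
    return len(rows)


def collect(metering_queue, store):
    """Consume all queued messages and append them to the stand-in database."""
    consumed = 0
    while True:
        try:
            raw = metering_queue.get_nowait()
        except queue.Empty:
            break
        store.append(json.loads(raw))
        consumed += 1
    return consumed
```

Marking a record as published only after it has been queued means a crash between the two steps can lead to a duplicate message rather than a lost one; whether the real plugin makes the same trade-off is not stated in the slides.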

Ceilometer: Current Implementation
● To reduce the load on the OpenStack
messaging server, the batch data is pushed to a
different messaging server from the one that receives other
OpenStack messages (e.g. those from agent-compute)
● This means that there are dedicated instances of
agent-central and collector for VM metering and for batch metering
● The collectors write the data into a single database
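
The split described above, two messaging servers feeding one database, can be illustrated with a small sketch. Two in-memory queues stand in for the two messaging servers and a list stands in for the single database; the source names "iaas" and "batch" and the function names are assumptions for illustration only.

```python
import queue

# Stand-ins for the two messaging servers (hypothetical names).
BROKERS = {"iaas": queue.Queue(), "batch": queue.Queue()}


def publish(source, sample):
    """Route a sample to the messaging server dedicated to its source."""
    BROKERS[source].put(sample)


def drain(source, database):
    """Dedicated collector: move everything from one broker into the shared database."""
    broker = BROKERS[source]
    moved = 0
    while not broker.empty():
        database.append({"source": source, **broker.get()})
        moved += 1
    return moved


database = []
publish("iaas", {"resource": "vm-1", "cpu_seconds": 10.0})
publish("batch", {"resource": "job-1", "cpu_seconds": 20.0})
publish("batch", {"resource": "job-2", "cpu_seconds": 30.0})
batch_moved = drain("batch", database)
iaas_moved = drain("iaas", database)
```

A burst of batch records only fills the batch queue, so the queue carrying VM metering is unaffected, while both collectors still end up writing to the same store.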

Ceilometer: LSF Data Statistics
● The batch plugin runs once per hour, provided the previous
run has finished
● Most runs find no unpublished data, because data in
the batch accounting database arrives in bursts
● Most of each day's data is published to the messaging
server in 2 runs of around 200,000 job records
each
● One such run takes around 5 hours to complete

Ceilometer: Batch Data Statistics
● The average rate of record publishing to the batch
rabbitmq server is 11 Hz. This includes the time to
– read unpublished records,
– push them to the rabbit-server and
– mark the records in the batch accounting database as
published
● Most of this time is spent publishing the records
● The time spent on activities other than publishing is
minuscule
● The growth rate of the mongodb database is about
2 GB/day
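
The 11 Hz figure is consistent with the run statistics quoted earlier: around 200,000 records published over a run of around 5 hours. The short calculation below checks this, and also derives a rough storage cost per record. The per-record estimate rests on two assumptions that the slides do not state: that all of the 2 GB/day growth comes from the roughly 400,000 daily batch records, and that 1 GB is taken as 10^9 bytes.

```python
# Figures quoted in the slides.
RECORDS_PER_RUN = 200_000
RUN_HOURS = 5
RECORDS_PER_DAY = 400_000
GROWTH_BYTES_PER_DAY = 2 * 10**9  # assumes 1 GB = 10^9 bytes

# Publishing rate over one large run, in records per second (Hz).
rate_hz = RECORDS_PER_RUN / (RUN_HOURS * 3600)

# Rough storage per record, assuming batch records account for all growth.
bytes_per_record = GROWTH_BYTES_PER_DAY / RECORDS_PER_DAY

print(f"publishing rate: {rate_hz:.1f} Hz")
print(f"approx. storage per record: {bytes_per_record:.0f} bytes")
```

Two such runs take about 10 hours in total, which fits within a day's hourly schedule because most other runs find nothing to publish.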


2. CERN's idea to use ceilometer
