Simple practices in performance monitoring and evaluation


This document discusses concepts and approaches for performance monitoring and evaluation. It defines key metrics like throughput, latency, concurrency and provides examples for measuring API and system performance. Specific metrics are outlined for services like call centers. Benchmarking quality of services and setting performance SLAs are also covered. The document provides code examples for implementing metrics collection and visualization using tools like JMX, Ganglia and Zabbix. It demonstrates measuring performance for a demo web application.

Simple Practices in
Performance Monitoring and Evaluation
Schubert Zhang
2016.3.24

SLA
Service Level Agreements
https://en.wikipedia.org/wiki/Service-level_agreement
SLAs commonly include segments to address: a definition of services, performance measurement, problem management, customer duties, warranties, disaster recovery, termination of agreement.

• API
IM SLA
• Performance
• Performance
performance oriented SLA

Metrics
SLA Performance SLA
Performance Metrics
e.g.1: API
• (99%)
e.g.2: Call Center
• Abandonment Rate: Percentage of calls abandoned while waiting to be answered.
• ASA (Average Speed to Answer): Average time it takes for a call to be answered by the service desk.
• TSF (Time Service Factor): Percentage of calls answered within a definite timeframe, e.g., 80% in 20 seconds.
• FCR (First-Call Resolution): Percentage of incoming calls that can be resolved without the use of a callback or without having the caller call back the helpdesk to finish resolving the case.
• TAT (Turn-Around Time): Time taken to complete a certain task.
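
As a rough illustration, the call-center metrics above could be computed from a list of call records as sketched below. The `Call` record layout and the function name are hypothetical. Taking TSF and FCR over all incoming calls is an assumption, since conventions for the denominator differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Call:
    """One inbound call (hypothetical record layout, not from the slides)."""
    wait_seconds: float                # time spent waiting in the queue
    answered: bool                     # False if the caller hung up while waiting
    resolved_first_call: bool = False  # only meaningful for answered calls


def call_center_metrics(calls: List[Call], tsf_threshold: float = 20.0) -> Dict[str, Optional[float]]:
    """Compute abandonment rate, ASA, TSF and FCR for a batch of calls."""
    if not calls:
        raise ValueError("no calls to evaluate")
    total = len(calls)
    answered = [c for c in calls if c.answered]
    in_time = sum(1 for c in answered if c.wait_seconds <= tsf_threshold)
    resolved = sum(1 for c in answered if c.resolved_first_call)
    return {
        "abandonment_rate": (total - len(answered)) / total,
        # ASA is undefined when nothing was answered.
        "asa_seconds": (sum(c.wait_seconds for c in answered) / len(answered)) if answered else None,
        # Assumption: TSF and FCR are taken over all incoming calls.
        "tsf": in_time / total,
        "fcr": resolved / total,
    }


calls = [Call(5, True, True), Call(15, True, False), Call(30, True, True), Call(40, False)]
print(call_center_metrics(calls))
```

With these four sample calls, one call is abandoned, two are answered within 20 seconds, and two are resolved on the first call.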

Benchmarking
The quality of a service must be measured, evaluated, … benchmarked.
And we must have a set of approaches for benchmarking.

Throughput
QPS, TPS, CPS
per second, per minute, per hour …
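
A minimal sketch of how throughput over different time windows could be derived from raw operation timestamps. The function name and the sample timestamps are made up for illustration.

```python
from collections import Counter
from typing import Dict, Iterable


def throughput_per_window(timestamps: Iterable[float], window_seconds: float = 1.0) -> Dict[int, float]:
    """Map window index -> operations per second observed in that window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    # Each timestamp (in seconds) falls into the window floor(ts / window).
    counts = Counter(int(ts // window_seconds) for ts in timestamps)
    return {window: n / window_seconds for window, n in sorted(counts.items())}


timestamps = [0.1, 0.5, 0.9, 1.2, 1.8, 61.0]
print(throughput_per_window(timestamps, 1.0))   # {0: 3.0, 1: 2.0, 61: 1.0}
print(throughput_per_window(timestamps, 60.0))  # per-minute windows, still in ops/s
```

The same data looks bursty at one-second resolution and smooth at one-minute resolution, which is why the window size should be stated together with any throughput figure.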

Latency
Response Time, Round-Trip Time (RTT) …
Average, Median, Min., Max., Percentile …

Quantile / Percentile
refer to the Google Sawzall paper
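
The latency statistics listed above can be computed from a batch of samples as sketched below. This uses the simple exact nearest-rank definition of a percentile, which is an illustrative choice and not the approximate streaming quantile described in the Sawzall paper. Function names and sample values are hypothetical.

```python
import math
import statistics
from typing import Dict, Sequence


def percentile(sorted_values: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not sorted_values:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    rank = math.ceil(p * len(sorted_values) / 100)  # 1-based rank
    return sorted_values[rank - 1]


def latency_summary(latencies_ms: Sequence[float]) -> Dict[str, float]:
    """Average, median, min, max and two percentiles of a batch of latencies."""
    values = sorted(latencies_ms)
    return {
        "avg": statistics.fmean(values),
        "median": statistics.median(values),
        "min": values[0],
        "max": values[-1],
        "p90": percentile(values, 90),
        "p99": percentile(values, 99),
    }


samples_ms = [10, 12, 11, 13, 12, 11, 10, 14, 12, 500]  # one slow outlier
print(latency_summary(samples_ms))
```

In this sample a single 500 ms outlier pulls the average to 60.5 ms while the median stays at 12 ms, which is why percentiles are usually reported alongside the average.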

A Summary of these Concepts
[Diagram: clients (Client-1, Client-2, Client-3, … Client-N) send requests to a server running a pool of work threads; labels: Throughput, Latency, Concurrency]

Example-2
Evaluation Report on a NoSQL DB
Cassandra

Benchmark for Write API
[Figure: "Benchmark for Writes", cluster overview; panels: Throughput, Latency]
• Each node runs 6 clients (threads), 54 clients in total.
• Each client generates random CDRs for 50 million users/phone numbers and puts them into DaStor one by one.
– Key space: 50 million
– Size of a CDR: Thrift compact encoding, ~200 bytes
✓ Throughput: average ~80K ops/s; per node: average ~9K ops/s
✓ Latency: average ~0.5 ms
• Bottleneck: network (and memory)

Benchmark for Read API
• Each node runs 8 clients (threads), 72 clients in total.
• Each client randomly picks a user-id/phone-number out of the 50-million space and gets its 20 most recent CDRs (one page) from DaStor.
• All clients read CDRs of the same day/bucket.
[Figure: distribution of read latency; x-axis ticks 1 to 61, labelled "100ms"; y-axis: percentage of read ops, 0% to 25%; the average and the 97% quantile are marked]
✓ Throughput: average ~140 ops/s; per node: average ~16 ops/s
✓ Latency: average ~500 ms, 97% < 2 s (SLA)
• Bottleneck: disk I/O (random seek); CPU load is very low
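
The reported figures can be cross-checked with Little's Law (concurrency = throughput × latency). The sketch below assumes closed-loop clients, each with exactly one outstanding request at a time. The slides do not state this, so treat it as a plausibility check rather than the method used in the report.

```python
def expected_throughput(concurrency: int, avg_latency_seconds: float) -> float:
    """Little's Law for a closed loop: throughput = concurrency / average latency."""
    if avg_latency_seconds <= 0:
        raise ValueError("latency must be positive")
    return concurrency / avg_latency_seconds


# Read benchmark: 72 clients, ~500 ms average latency (reported: ~140 ops/s).
read_estimate = expected_throughput(concurrency=72, avg_latency_seconds=0.5)
# Write benchmark: 54 clients, ~0.5 ms average latency (reported: ~80K ops/s).
write_estimate = expected_throughput(concurrency=54, avg_latency_seconds=0.0005)

print(f"read estimate:  {read_estimate:.0f} ops/s")
print(f"write estimate: {write_estimate:.0f} ops/s")
```

The read estimate of about 144 ops/s is close to the reported ~140 ops/s. The write estimate of about 108K ops/s is only of the same order as the reported ~80K ops/s; one possible reason, offered here as a guess, is client-side time that is not part of the measured latency.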

• On the server side
• Record an operation count and the time cost for every client call.
• At every monitoring interval, pull and push the current throughput and latency to the monitoring tool (Ganglia/Zabbix) or the console.
• Throughput = sum of count / time interval
• Latency = average (sum of latency / sum of count), max, min, quantile …
Code in Gitlab and Gerrit
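
The referenced code is not reproduced in the slides. The following is a minimal sketch of the server-side idea described above, with a hypothetical `OpMetrics` class: every call records its latency, and a periodic `snapshot()` turns the accumulated counters into throughput and latency figures and resets them. Quantiles are left out because they need the raw samples or a histogram.

```python
import threading
import time
from typing import Dict, Optional


class OpMetrics:
    """Accumulates operation count and latency; snapshot() reports and resets."""

    def __init__(self, now: Optional[float] = None) -> None:
        self._lock = threading.Lock()
        self._reset(time.monotonic() if now is None else now)

    def _reset(self, now: float) -> None:
        self._start = now
        self._count = 0
        self._latency_sum = 0.0
        self._latency_min = float("inf")
        self._latency_max = 0.0

    def record(self, latency_seconds: float) -> None:
        """Call once per client request with the measured time cost."""
        with self._lock:
            self._count += 1
            self._latency_sum += latency_seconds
            self._latency_min = min(self._latency_min, latency_seconds)
            self._latency_max = max(self._latency_max, latency_seconds)

    def snapshot(self, now: Optional[float] = None) -> Dict[str, float]:
        """Throughput = count / interval; latency avg = latency sum / count."""
        now = time.monotonic() if now is None else now
        with self._lock:
            interval = now - self._start
            count = self._count
            result = {
                "throughput": count / interval if interval > 0 else 0.0,
                "latency_avg": self._latency_sum / count if count else 0.0,
                "latency_min": self._latency_min if count else 0.0,
                "latency_max": self._latency_max,
            }
            self._reset(now)
        return result


metrics = OpMetrics()
start = time.monotonic()
sum(range(1000))                          # stand-in for handling one client call
metrics.record(time.monotonic() - start)
print(metrics.snapshot())                 # would be pushed to the monitoring tool
```

In a real service, `snapshot()` would be called by a timer at each monitoring interval and its result handed to whatever reporter is in use.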

• Java
• JMX (Java Management Extensions, a simple example at https://github.com/schubertzhang/jsketch)
• javaagent (java -javaagent:jarpath[=options]; the options string is passed to the agent's premain method)
• jmxetric (use JMX and javaagent to display metrics to Ganglia, https://github.com/schubertzhang/jmxetric)
• Ganglia
• Zabbix
• …

Performance Benchmark
Programming
Demo
Test and evaluate the throughput and latency of http://www.fangdd.com

[Demo screenshots: latency charts with markers labelled "Average" and "95%"]
The long tail …
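
The demo program itself is not included in the slides. A hypothetical sketch of such a benchmark client is shown below: a fixed number of client threads call an operation in a loop, and the run is summarised as throughput, average latency, and 95% latency. The `run_benchmark` name is an assumption, and the operation here is a short sleep standing in for an HTTP request.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_benchmark(operation: Callable[[], None], clients: int, ops_per_client: int) -> Dict[str, float]:
    """Run `operation` from `clients` threads and summarise throughput and latency."""

    def client_loop() -> List[float]:
        latencies = []
        for _ in range(ops_per_client):
            start = time.perf_counter()
            operation()
            latencies.append(time.perf_counter() - start)
        return latencies

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=clients) as pool:
        futures = [pool.submit(client_loop) for _ in range(clients)]
        latencies = sorted(x for f in futures for x in f.result())
    elapsed = time.perf_counter() - wall_start

    p95_rank = math.ceil(95 * len(latencies) / 100)  # nearest-rank, 1-based
    return {
        "ops": len(latencies),
        "throughput": len(latencies) / elapsed,
        "latency_avg": sum(latencies) / len(latencies),
        "latency_p95": latencies[p95_rank - 1],
        "latency_max": latencies[-1],
    }


# Stand-in workload; a real test would issue an HTTP GET against the target site.
report = run_benchmark(lambda: time.sleep(0.005), clients=4, ops_per_client=5)
print(report)
```

Comparing `latency_avg` with `latency_p95` and `latency_max` in the report is a quick way to see how long the tail is.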

Statistical Monitoring for Outliers
usually for troubleshooting

Captured from UTStarcom mSwitch R5 system, Guangxi Site, 2004.
The magic matrix:

• Redis, Memcache
• Just add at a point, very low-cost
• Very
• Logs, ELK
