Brian Brazil, an engineer passionate about running software reliably in production, gave a workshop on provisioning and capacity planning. He taught attendees how to estimate spare capacity and runway by measuring the bottleneck resource, calculating utilization, and determining peak traffic. Brian also covered how to provision new machines based on queries per second per machine. While acknowledging real-world complexities, he emphasized the importance of monitoring for making operational decisions.
Prometheus Design and Philosophy by Julius Volz at Docker Distributed System Summit
Prometheus - https://github.com/Prometheus
Liveblogging: http://canopy.mirage.io/Liveblog/MonitoringDDS2016
In the glorious future, cancer will be cured, world hunger will be solved and all because everything was directly instrumented for Prometheus. Until then, however, we need to write exporters. This talk will look at how to go about this and all the tradeoffs involved in writing a good exporter.
Ansible at FOSDEM (Ansible Dublin, 2016) by Brian Brazil
At FOSDEM 2016 we used Ansible for the first time to manage the infrastructure. This talk looks at how we did that, and tips for getting the most out of your Ansible setup.
Better Monitoring for Python: Inclusive Monitoring with Prometheus (Pycon Ire... by Brian Brazil
Monitoring should be part of your solution, not a problem. This lightning talk takes a brief look at the ideas behind Inclusive Monitoring and how to use them with Python.
Systems Monitoring with Prometheus (Devops Ireland April 2015) by Brian Brazil
Monitoring means many things to many people. This talk looks at Systems Monitoring, that is how to keep an eye on a given system and use this as part of overall management of a system. This talk will cover Why one monitors, What to monitor, How to monitor, the general design of a monitoring system and how Prometheus is a good fit for this in terms of instrumentation, consoles, alerts, general system health and sanity.
Prometheus is a next-generation monitoring system publicly announced earlier this year, developed by companies including SoundCloud, locals Boxever and Docker. Since launch there has been wide-spread interest, and many community contributions.
For more information see http://prometheus.io or http://www.boxever.com/tag/monitoring
Provisioning and Capacity Planning (Travel Meets Big Data) by Brian Brazil
Ever worried that you’ll have an outage someday because your production servers can’t handle increased user traffic?
Then this workshop will help put you at ease! Learn the foundations and how to apply them to your services.
At the end of the workshop you will be able to:
– Estimate how much spare capacity you have in less than 5 minutes
– Estimate how much runway that capacity provides
– Determine how many servers you need
– Spot common potential problems as you scale
Prometheus for Monitoring Metrics (Percona Live Europe 2017) by Brian Brazil
From its humble beginnings in 2012, the Prometheus monitoring system has grown a substantial community with a comprehensive set of integrations. This talk will provide an overview of the core ideas behind Prometheus and its feature set.
Prometheus is a next-generation monitoring system. It not only lets you see what your systems look like from the outside, but also gives visibility into the internals and business aspects of your systems. This allows everyone to benefit, including both operations and developers. This talk will look at the concepts behind monitoring with Prometheus, how it's designed, why it's suitable for Cloud Native environments and how you can get involved.
Evaluating Prometheus Knowledge in Interviews (PromCon 2018) by Brian Brazil
With the growth in usage of Prometheus and the increased need to hire those with relevant skills, being able to evaluate Prometheus knowledge is important. In this talk I'll show how standard interview questions from related fields can be applied.
No C-QL (Or how I learned to stop worrying, and love eventual consistency) (N... by Brian Brazil
Traditional relational databases focus on ACID, providing strong semantics that require careful synchronisation between actors, which limits scalability. NoSQL Column Stores such as Cassandra, Riak and Dynamo offer another way: by eschewing strong consistency you can meet your application's needs while also increasing scalability and reliability. This talk will cover how and where to use eventual consistency.
What does "monitoring" mean? (FOSDEM 2017) by Brian Brazil
Monitoring can mean very different things to different people, and this often leads to confusion and misunderstandings. There are many offerings, both free software and commercial, and it's not always clear where each fits in the bigger picture. This talk will look a bit at the history of monitoring, and then into the general categories of Metrics, Logs, Profiling and Distributed tracing and how each of these is important in a Cloud-based environment.
Video: https://www.youtube.com/watch?v=hCBGyLRJ1qo
Microservices and Prometheus (Microservices NYC 2016) by Brian Brazil
If you'd like to learn more about Prometheus, contact us at prometheus@robustperception.io or follow us on twitter at https://twitter.com/RobustPerceiver
Prometheus is a next-generation monitoring system designed for microservices. This talk will look at what's the best way to monitor your microservices, which metrics you should care about, how to have useful alerts and how Prometheus empowers you to do things the right way.
Prometheus is an open-source time series database with a powerful query language designed for operational monitoring.
Contact us at prometheus@robustperception.io
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016) by Brian Brazil
Prometheus is a next-generation monitoring system with a time series database at its core. Once you have a time series database, what do you do with it though? This talk will look at getting data in, and more importantly how to use the data you collect productively.
Contact us at prometheus@robustperception.io
Counting with Prometheus (CloudNativeCon+Kubecon Europe 2017) by Brian Brazil
Counters are one of the two core metric types in Prometheus, allowing for tracking of request rates, error ratios and other key measurements. Learn why they are designed the way they are, how client libraries implement them and how rate() works.
If you'd like more information about Prometheus, contact us at prometheus@robustperception.io
Prometheus: A Next Generation Monitoring System (FOSDEM 2016) by Brian Brazil
A look at how Prometheus's instrumentation, data model, query language, manageability and reliability make it a next generation solution.
Video: https://www.youtube.com/watch?v=cwRmXqXKGtk
Contact us: prometheus@robustperception.io
Monitoring What Matters: The Prometheus Approach to Whitebox Monitoring (Berl... by Brian Brazil
Often what you monitor and get alerted on is defined by your tools, rather than what makes the most sense to you and your organisation. Alerts on metrics such as CPU usage are noisy and rarely spot real problems, while outages go undetected. Monitoring systems can also be challenging to maintain, and overall provide a poor return on investment.
In the past few years several new monitoring systems have appeared with more powerful semantics and which are easier to run, which offer a way to vastly improve how your organisation operates. Prometheus is one such system. This talk will look at the monitoring ideal and how whitebox monitoring with a time series database, multi-dimensional labels and a powerful querying/alerting language can free you from midnight pages.
An Introduction to Prometheus (GrafanaCon 2016) by Brian Brazil
Often what you monitor and get alerted on is defined by your tools, rather than what makes the most sense to you and your organisation. Alerts on metrics such as CPU usage are noisy and rarely spot real problems, while outages go undetected. Monitoring systems can also be challenging to maintain, and overall provide a poor return on investment.
In the past few years several new monitoring systems have appeared with more powerful semantics and which are easier to run, which offer a way to vastly improve how your organisation operates and prepare you for a Cloud Native environment. Prometheus is one such system. This talk will look at the monitoring ideal and how whitebox monitoring with a time series database, multi-dimensional labels and a powerful querying/alerting language can free you from midnight pages.
Labels are at the core of Prometheus's dimensional data model. The Prometheus server and its surrounding ecosystem components all either attach, modify, or act on labels in various ways. In this talk, Brian explains the entire life cycle of labels, including their generation in the client libraries, their transformation in relabeling, as well as their use in service discovery and alerting.
Staleness and Isolation in Prometheus 2.0 (PromCon 2017) by Brian Brazil
The biggest semantic change in Prometheus 2.0 is the new staleness handling. This long-awaited feature means there's no longer a fixed 5 minute staleness. Now time series go stale when they're no longer exposed, and targets that no longer exist don't hang around for a full 5 minutes. Learn about how it works and how to take advantage of it.
Cloud Native Night August 2016, Munich: Talk by Julius Volz (@juliusvolz, Co-founder at Prometheus).
Join our Meetup: www.meetup.com/cloud-native-muc
Abstract: This talk is on monitoring dynamic cloud environments with Prometheus.
What is your application doing right now? An introduction to Prometheus by Matthias Grüter
Slides from my talk at the "DevOps and the search for the Holy Grail" meetup in Stockholm on May 28, 2015
A short introduction on application monitoring & metrics with Prometheus, the monitoring system and time-series database.
The use of Prometheus is illustrated with examples in Docker and Java. We'll use Grafana as a metrics dashboard on top of Prometheus and create a simple integration with Slack for alarm triggering.
Meetup: http://www.meetup.com/DevOps-Stockholm/events/222471316/
Recording: https://youtu.be/Z0LlilNpX1U
We at Preply do our best to ensure that our website loads quickly, as it has a huge impact on the business. In my talk I will explain:
- why the page-load metric is important from a business standpoint and how to measure its impact.
- how we evolved our speed optimization techniques, starting from very basic ones (caching, ORM optimizations) to more advanced ones (replicas, load balancing) and the level where we are now (CDN optimization, microservices etc.)
- front-end and backend optimization, with a focus on the stack we use: AWS, Django/Python, Postgres, Docker.
PuppetConf 2017: Deploying is Only Half the Battle! Operationalizing Applicat... by Puppet
This demo-heavy session led by Puppet Certified Consultant Tim Odom and Tim Carr will focus on common integration patterns for operationalizing Puppet in IaaS deployments. In this session we’ll focus on how to integrate Puppet into IaaS orchestration platforms built from tools like ServiceNow, AWS CloudFormation Templates, Cisco UCS-D, and VMware vRealize Automation. We’ll demonstrate both on-prem and public cloud use cases and address how these integrations differ. Deploying, however, is only a very small part of an object's lifecycle. In the second part of our session we’ll address how we provide feedback of application state change to ServiceNow’s change management system and how that can be leveraged to escalate incident resolution and also automate parts of your compliance workflows. Finally, we’ll show how feedback loops can be leveraged to intelligently scale resources with approval patterns.
Albert Witteveen - With Cloud Computing Who Needs Performance Testing by TEST Huddle
EuroSTAR Software Testing Conference 2013 presentation on With Cloud Computing Who Needs Performance Testing by Albert Witteveen.
See more at: http://conference.eurostarsoftwaretesting.com/past-presentations/
Machine Learning in Production: Manu Mukerji, Strata CA March 2018 Manu Mukerji
Manu Mukerji walks you through Acme Corporation’s machine learning example for universal catalogs, explaining how the training and test sets are generated and annotated; how they were created when there is no public training data available; how the model is pushed to production, automatically evaluated, and used; how Acme Corporation built a Hadoop/Spark pipeline using different types of models predicting various values; production issues that arise when applying ML at scale in production; and lessons learned along the way.
Designing and Running Performance Experiments by J On The Beach
An accurate understanding of how our systems perform is critical for ensuring good customer service, effective capacity planning and managing the process of optimisation.
Sadly, it's all too rare to see good practice when it comes to analysing and testing the performance of systems. In this talk, we see how to approach performance analysis scientifically.
We’ll discuss how to design, construct, execute, verify and analyse performance experiments to answer these four important questions:
How much load can my system handle before it is saturated?
What service can I expect my customers to see at a given load level?
What are the bottlenecks in my application that cause saturation?
How is my performance varying over time?
Attendees will learn how to collate and process large timeseries data using InfluxDB. We'll see how to monitor experiments as they execute, how to analyse the results of each experiment and how to compare results across experiments.
With Cloud Computing, Who Needs Performance Testing? by TEST Huddle
With cloud computing we can add more hardware resources on the fly. Considering how expensive load and stress testing can be, why don't we just add more power when needed?
This presentation will explain why, especially for situations where cloud computing is available, load and stress testing often falls short but is still required. It will also show how the queuing theory can provide a different approach which allows load and stress testers to add real value. Stakeholders and test managers can use the same theory to get a handle on the coverage and depth of the tests.
Key Takeaways:
- Why performance testing so often fails to accomplish what we want
- Why relying on cloud computing alone is not enough
- How the queuing theory can provide a different approach to performance testing
- How the queuing theory can help you understand if the performance tests
www.eurostarconferences.com
www.testhuddle.com
Strata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache Kafka by confluent
The number of deployments of Apache Kafka at enterprise scale has greatly increased in the years since Kafka’s original development in 2010. Along with this rapid growth has come a wide variety of use cases and deployment strategies that transcend what Kafka’s creators imagined when they originally developed the technology. As the scope and reach of streaming data platforms based on Apache Kafka has grown, the need to understand monitoring and troubleshooting strategies has as well.
Dustin Cote and Ryan Pridgeon share their experience supporting Apache Kafka at enterprise-scale and explore monitoring and troubleshooting techniques to help you avoid pitfalls when scaling large-scale Kafka deployments.
Topics include:
- Effective use of JMX for Kafka
- Tools for preventing small problems from becoming big ones
- Efficient architectures proven in the wild
- Finding and storing the right information when it all goes wrong
Visit www.confluent.io for more information.
Monitoring Far Beyond the Operating System - WeOp 2014 by Marcus Vechiato
It discusses various aspects such as the implementation of monitoring solutions, the significance of automation in incident management, integration via APIs, prioritization of incidents, and the involvement of senior team members in the implementation and management processes. Additionally, it stresses the importance of revisiting and adjusting processes periodically to ensure effectiveness and adherence. Overall, it underscores the crucial role of both people and processes in maintaining a robust IT environment.
OpenMetrics: What Does It Mean for You (PromCon 2019, Munich) by Brian Brazil
The OpenMetrics format intends to standardise metric exposition, making it easy for both those developing and operating systems to monitor them. It is however a new format. Will it be supported by your monitoring system? Will you need to rewrite your existing instrumentation? What's needed to transition? What about 3rd party systems you don't control? How does this differ from, expand on, and improve on the existing Prometheus format? This session will cover all of these questions.
Evolution of Monitoring and Prometheus (Dublin 2018) by Brian Brazil
This talk looks at the evolution of monitoring over time, the ways in which you can approach monitoring, where Prometheus fits into all this, and how Prometheus itself has grown over time.
Anatomy of a Prometheus Client Library (PromCon 2018) by Brian Brazil
Prometheus client libraries are notably different from most other options in the space. In order to get the best insights into your applications it helps to know how they are designed, and why they are designed that way. This talk will look at how client libraries are structured, how that makes them easy to use, some tips for instrumentation, and why you should use them even if you aren't using Prometheus.
Prometheus for Monitoring Metrics (Fermilab 2018) by Brian Brazil
From its humble beginnings in 2012, the Prometheus monitoring system has grown a substantial community with a comprehensive set of integrations. This talk will give an overview of the core ideas behind Prometheus, its feature set and how it has grown to meet the challenges of modern cloud-based systems.
Evolving Prometheus for the Cloud Native World (FOSDEM 2018) by Brian Brazil
As the industry moves towards more cloud based and containerised solutions such as Kubernetes, monitoring tools have to keep up. These new environments are far more dynamic than the hand-maintained machines of old, requiring more sophisticated and scalable approaches. This talk will look at how Prometheus has evolved over the past 5 years to be better able to cope with these challenges, including the 2.0 release and practices that we encourage in a cloud native world.
Evolution of the Prometheus TSDB (Percona Live Europe 2017) by Brian Brazil
Prometheus is a monitoring system with a custom time series database at its core. Prometheus 2.0 features the 3rd major iteration of this database. This talk will look at how it has evolved, and how it fits into the goal of doing metrics-based monitoring.
Prometheus: From Berlin to Bonanza (Keynote CloudNativeCon+Kubecon Europe 2017) by Brian Brazil
From its humble beginnings right here in Berlin in 2012, the Prometheus monitoring system has grown a substantial community with a comprehensive set of integrations. This talk will go over the core ideas behind Prometheus, give a brief tour of its end-to-end feature set and show how these combine with other CNCF projects to allow you to scale your systems and culture in a dynamic cloud native world.
If you're looking for help with Prometheus, contact us at prometheus@robustperception.io
An Exploration of the Formal Properties of PromQL by Brian Brazil
Prometheus is often considered in a production sense. But what about the more formal and academic aspects? Is PromQL interesting from a Computer Science standpoint?
Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016) by Brian Brazil
Prometheus is a next-generation monitoring system. Since being publicly announced last year it has seen wide-spread interest and adoption. This talk will look at the concepts behind monitoring with Prometheus, and how to use it with Kubernetes which has direct support for Prometheus.
Multi-cluster Kubernetes Networking- Patterns, Projects and Guidelines by Sanjeev Rampal
Talk presented at Kubernetes Community Day, New York, May 2024.
Technical summary of Multi-Cluster Kubernetes Networking architectures with focus on 4 key topics.
1) Key patterns for Multi-cluster architectures
2) Architectural comparison of several OSS/ CNCF projects to address these patterns
3) Evolution trends for the APIs of these projects
4) Some design recommendations & guidelines for adopting/ deploying these solutions.
2. Who am I?
Engineer passionate about running software reliably in production.
● TCD CS Degree
● Google SRE for 7 years, working on high-scale reliable systems such as Adwords, Adsense, Ad Exchange, Billing, Database
● Boxever TL Systems & Infrastructure, applied processes and technology to allow the company to scale and reduce operational load
● Contributor to many open source projects, including Prometheus, Ansible, Python, Aurora and Zookeeper.
● Founder of Robust Perception, making scalability and efficiency available to everyone
3. Goals
At the end of the workshop you will be able to:
● Estimate how much spare capacity you have in less than 5 minutes
● Estimate how much runway that capacity provides
● Determine how many machines you need
● Spot common potential problems as you scale
This should set you up for your first 1-2 years, if not more
4. Audience
This is an introductory workshop to teach you the basics.
Your company:
● Uses Unix in production
● Has a relatively simple setup/small number of machines
● Operations primarily performed by developers
● Performance has not been a primary consideration in your product
I’m also going to focus on webservices-type systems rather than offline processing
or batch.
6. Estimate your capacity in 3 easy steps!
1. Measure bottleneck resource at peak traffic
2. Divide to get fraction of limit
3. Divide peak traffic by that fraction
7. Estimate your capacity in 3 not so easy steps!
1. What’s your bottleneck? How do you measure it?
2. What’s your bottleneck’s limit?
3. What’s your peak traffic?
8. Step 1: What’s the bottleneck?
The most common bottlenecks:
1. CPU
2. Disk I/O
Less common: network, disk space, external resources, quotas, hardcoded limits,
contention/locking, memory, file descriptors, port numbers, humans
9. Step 1: Where’s the bottleneck?
Look at CPU % and Disk I/O Utilisation on each type of machine.
If you have monitoring, use that.
Failing that:
sudo apt-get install sysstat
iostat -x 5
10. Step 1: Iostat
avg-cpu:   %user   %nice %system %iowait  %steal   %idle
            4.24    0.00    1.18    0.98    0.00   93.60
Device: rrqm/s wrqm/s   r/s   w/s  rkB/s  wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda       0.00   1.40  0.00  3.80   0.00  45.20    23.79     0.00  1.05    0.00    1.05  0.84  0.32
sdb       0.00   1.40  0.00 21.00   0.00 267.20    25.45     0.09  4.11    0.00    4.11  4.11  8.64
sdc       0.00   1.40  0.00 20.00   0.00 267.20    26.72     0.06  3.24    0.00    3.24  3.24  6.48
md0       0.00   0.00  0.00  2.00   0.00   8.00     8.00     0.00  0.00    0.00    0.00  0.00  0.00
The numbers you care about are %idle and %util.
%idle is the amount of CPU not in use. %util is the amount of disk I/O in use; take
the biggest value across devices.
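If you would rather not read the two numbers off by eye, a minimal Python sketch along these lines could pull them out of one iostat report. The name parse_iostat is made up for illustration, and the parsing assumes the column layout shown on this slide; other sysstat versions print different columns, so treat it as a starting point only.

```python
def parse_iostat(report: str) -> tuple[float, float]:
    """Return (CPU % in use, highest disk %util) from one `iostat -x` report.

    Assumes the layout shown on the slide: an `avg-cpu:` header followed by a
    row of values, then a `Device:` header followed by one row per device.
    """
    rows = [line.split() for line in report.splitlines() if line.strip()]
    cpu_used = None
    max_util = 0.0
    for i, fields in enumerate(rows):
        if fields[0] == "avg-cpu:":
            # %idle is the last column of the row after the header.
            cpu_used = 100.0 - float(rows[i + 1][-1])
        elif fields[0].rstrip(":") == "Device":
            util_col = fields.index("%util")
            for device in rows[i + 1:]:
                if len(device) != len(fields):
                    break  # end of the device table
                max_util = max(max_util, float(device[util_col]))
    if cpu_used is None:
        raise ValueError("no avg-cpu section found")
    return cpu_used, max_util


# The sample report from the slide.
SAMPLE = """\
avg-cpu: %user %nice %system %iowait %steal %idle
4.24 0.00 1.18 0.98 0.00 93.60
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 1.40 0.00 3.80 0.00 45.20 23.79 0.00 1.05 0.00 1.05 0.84 0.32
sdb 0.00 1.40 0.00 21.00 0.00 267.20 25.45 0.09 4.11 0.00 4.11 4.11 8.64
sdc 0.00 1.40 0.00 20.00 0.00 267.20 26.72 0.06 3.24 0.00 3.24 3.24 6.48
md0 0.00 0.00 0.00 2.00 0.00 8.00 8.00 0.00 0.00 0.00 0.00 0.00 0.00
"""

cpu_used, max_util = parse_iostat(SAMPLE)
print(f"CPU in use: {cpu_used:.1f}%, busiest disk: {max_util:.2f}%")
```

Remember to feed it a report taken at peak traffic, not the first report iostat prints, which covers the time since boot.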
11. Step 2: What’s the limit?
We now know the CPU and disk I/O usage on each machine at peak.
Which is the bottleneck though?
Need to know the limit. Rules of thumb:
● 80% limit for CPU
● 50% limit for Disk I/O
12. Step 2: Division
Find how full each CPU and disk is.
Say we had a disk 10% utilised, and a CPU 20% utilised (80% idle).
0.1/0.5 = 0.2 => Disk IO is at 20% of limit
0.2/0.8 = 0.25 => CPU is at 25% of limit
CPU is our bottleneck, with 25% of capacity used.
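The same division can be written as a short Python sketch. The function and variable names are illustrative only, and the limits are the rules of thumb from the previous slide rather than hard constants.

```python
# Rule-of-thumb limits from the slides (fractions of the resource).
LIMITS = {"cpu": 0.8, "disk": 0.5}


def find_bottleneck(usage: dict[str, float], limits: dict[str, float] = LIMITS):
    """Return (resource name, fraction of its limit) for the fullest resource."""
    fractions = {name: usage[name] / limits[name] for name in usage}
    name = max(fractions, key=fractions.get)
    return name, fractions[name]


# Disk 10% utilised, CPU 20% utilised (80% idle), as in the example.
name, fraction = find_bottleneck({"disk": 0.1, "cpu": 0.2})
print(f"{name} is the bottleneck at {fraction:.0%} of its limit")
```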
14. Step 3: Peak traffic
Now that we know how full our bottleneck is, we need to know how much capacity
we have.
Figure out how much traffic you were handling around the time you measured CPU
and disk utilisation.
You might do this via monitoring, by parsing logs or, if you’re really stuck, with tcpdump.
15. Step 3: The 2nd division
Let’s say our queries per second (qps) was 10 around peak.
Our CPU was our bottleneck, and about 25% of our limit.
10/0.25 = 40qps
So we can currently handle a maximum traffic of around 40qps
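As a sketch, the second division and the resulting spare capacity look like this in Python. The function name is hypothetical; the numbers are the ones from the example.

```python
def max_capacity_qps(peak_qps: float, fraction_of_limit: float) -> float:
    """Estimate the traffic at which the bottleneck would reach its limit."""
    if fraction_of_limit <= 0:
        raise ValueError("fraction_of_limit must be positive")
    return peak_qps / fraction_of_limit


peak_qps = 10
capacity = max_capacity_qps(peak_qps, 0.25)  # CPU at 25% of its limit
spare = capacity - peak_qps
print(f"capacity ~{capacity:.0f}qps, spare ~{spare:.0f}qps")
```

This assumes resource usage grows roughly linearly with traffic, which is the same simplification the slides make.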
17. Now you can estimate your capacity in 3 easy steps!
1. Measure bottleneck resource at peak traffic
○ Use monitoring or iostat to see how close you are to the limit, say 20% full
2. Divide to get fraction of limit
○ With a limit of 80% for CPU, you’re 20/80 = 25% full
3. Divide peak traffic by that fraction
○ Traffic was 10qps, so 10/0.25 = 40qps capacity
19. How much runway do you have?
You now have a rough idea of how much capacity you have to spare.
In the example here, we’re using 10qps out of 40qps capacity.
How long will that 30qps last you?
The two main factors are new customers and organic growth.
20. New Customers
New customers/partners are your main source of traffic.
Look at your traffic graphs around the time a new customer started using your
system.
If the customer had, say, 1M users and you saw peak traffic increase by 10qps, you
can now predict how much traffic future customers will generate.
Based on sales predictions, you can tell how much capacity you’ll need for new
customers.
21. Organic growth
Over time your existing customers/partners will use the system more and more as
they hire new employees, win new customers of their own, and so on.
Look at your monitoring’s traffic graphs over a few months to see what the trend is
like. Do your best to ignore the impact of launches.
Calculate your % growth month on month.
Starting out, it’s likely that organic growth will not be your main consideration.
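One way to calculate month-on-month growth is sketched below in Python. The monthly peak figures are invented purely for illustration; you would read your own off your traffic graphs, skipping months distorted by launches.

```python
def monthly_growth_rates(peaks: list[float]) -> list[float]:
    """Growth of each month's peak qps relative to the previous month."""
    return [(b - a) / a for a, b in zip(peaks, peaks[1:])]


def average_monthly_growth(peaks: list[float]) -> float:
    """Compound average growth per month between the first and last peak."""
    months = len(peaks) - 1
    return (peaks[-1] / peaks[0]) ** (1 / months) - 1


# Hypothetical monthly peak qps, NOT data from the workshop.
peaks = [10.0, 10.4, 10.9, 11.3]
for rate in monthly_growth_rates(peaks):
    print(f"{rate:.1%}")
print(f"average: {average_monthly_growth(peaks):.1%} per month")
```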
22. Calculating runway
Once again in the example here, we’re using 10qps out of 40qps capacity.
Each 1M user customer generates 10qps of additional traffic.
You also expect a negligible amount of organic growth.
This means you can handle 3M more users’ worth of new customers (30qps spare / 10qps per 1M users).
If you’re signing up one 1M user customer per month, that gives you 3 months.
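The runway arithmetic above can be sketched in Python as follows. The function name is illustrative, and like the example it ignores organic growth.

```python
def runway_months(capacity_qps: float, current_qps: float,
                  qps_per_customer: float, customers_per_month: float) -> float:
    """Months until spare capacity is used up by new customers alone."""
    spare = capacity_qps - current_qps
    # Only whole customers fit; a partial customer would push you over.
    customers_that_fit = spare // qps_per_customer
    return customers_that_fit / customers_per_month


# 10qps used out of 40qps, 10qps per 1M-user customer, one signed per month.
months = runway_months(40, 10, 10, 1)
print(f"runway: {months:.0f} months")
```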
24. Provisioning vs Capacity Planning
Capacity Planning:
In 6 months I will have 7 new customers, and need to be able to handle 100qps in
total
Provisioning:
To handle 100qps I need X frontends and Y databases
25. Provisioning: What can a machine handle?
Continuing our example, let’s say we had 4 machines and each reported being at
CPU 20% (25% of the 80% limit) while dealing with 10qps each.
The key metric is qps per machine.
10qps / 0.2 of a machine = 50qps/machine at 100% CPU
Can only safely use 80% of the machine, so 50 * 0.8 = 40qps
So we can handle 40qps per machine.
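The per-machine calculation, as a small Python sketch. The function name is made up for illustration, and the 80% default is the CPU rule of thumb from earlier.

```python
def safe_qps_per_machine(measured_qps: float, cpu_utilisation: float,
                         cpu_limit: float = 0.8) -> float:
    """Traffic one machine can take before CPU reaches its safe limit."""
    qps_at_full_cpu = measured_qps / cpu_utilisation
    return qps_at_full_cpu * cpu_limit


# Each machine served 10qps at 20% CPU.
per_machine = safe_qps_per_machine(10, 0.2)
print(f"{per_machine:.0f}qps per machine")
```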
26. Provisioning: How many machines do I need?
If we want to handle 100qps, we need 100/40 = 2.5 machines. So 3 machines.
For each type of machine, calculate the incoming external qps it can handle and
how many you need.
Don’t fret about $10/month worth of cost, it’s not worth your time.
28. Review: The Basics
● Estimating capacity:
○ Measure bottleneck at peak
○ Find how near bottleneck is to the limit
○ Calculate spare capacity based on peak traffic
● Keep an eye on new customers/partners and organic growth to track runway
● For provisioning, calculate qps/machine for each type of machine
30. A few wrinkles
I’ve glossed over a lot of detail so you can go away from today’s workshop with
something you can immediately use.
Some questions ye may have:
● Why measure at peak traffic?
● What if I don’t have much traffic?
● Why 80% limit on CPU and 50% on disk?
● What if a machine fails?
● What if things aren’t that simple?
● Doesn’t autoscaling take care of all this for me?
31. Why measure at peak traffic?
As your utilisation increases:
● Latency increases
● Performance decreases
In addition, skew due to a constant background of CPU usage is decreased.
Measuring at peak helps allow for these factors.
Beware the knee.
32. What if I don’t have much traffic?
If you don’t have enough traffic to show up in top or iotop, then these techniques
won’t help you much.
You could loadtest, but that takes time. Or use rules of thumb.
Easier way: Use latency to estimate throughput.
If your queries take 10ms, then you can probably handle 100/s
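The latency rule of thumb is just a reciprocal, sketched here in Python. The concurrency parameter is an assumption added for illustration (for example, the number of queries processed in parallel); the slide's figure corresponds to handling one query at a time.

```python
def throughput_from_latency(latency_seconds: float, concurrency: int = 1) -> float:
    """Rough upper bound on queries per second given per-query latency."""
    return concurrency / latency_seconds


estimate = throughput_from_latency(0.010)  # 10ms per query
print(f"roughly {estimate:.0f} queries/s")
```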
33. Why 80% limit on CPU and 50% on disk?
For CPU, the utilisation/latency curve means you want to avoid running at too high a
utilisation.
If you have the CPU to yourself, 90-95% is safe in a controlled environment with
good loadtesting. This is uncommon, so leave a safety margin for OS processes etc.
For spinning disks the impact of utilisation tends to be more problematic, and
background tasks tend to use a lot of disk.
34. What if a machine fails?
You should generally add 2 extra machines beyond what you need to serve peak
qps. This is commonly known as “n+2”.
This is to allow for one machine failure, and to let you take down a machine to
push a new binary, perform maintenance or whatever.
This also gives you some slack in your capacity. As you grow, more sophisticated
math is required.
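Putting the machine count and n+2 together, a minimal Python sketch might look like this. The function name and the spares parameter are illustrative; the figures reuse the earlier example of 100qps at 40qps per machine.

```python
import math


def machines_needed(target_qps: float, qps_per_machine: float,
                    spares: int = 2) -> int:
    """Machines to serve target_qps, rounded up, plus spare machines (n+2)."""
    return math.ceil(target_qps / qps_per_machine) + spares


serving = machines_needed(100, 40, spares=0)  # machines needed for peak qps
total = machines_needed(100, 40)              # with n+2
print(f"{serving} to serve peak, {total} with n+2")
```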
35. What if things aren’t that simple?
Lots of other issues can throw a spanner in the works.
● Heterogeneous machines
● Varying machine performance
● Varying traffic mixes
● Multiple datacenters
● Multi-tiered services
As a general rule try to keep things simple. A perfect model is brittle and usually
takes more time than it’s worth.
39. Doesn’t autoscaling take care of all this for me?
EC2 Autoscaling can eliminate some of the day-to-day work in provisioning
servers.
There’s operational and complexity overhead, as you have to maintain images and
systems that can be spun up.
You have to wait for instances to spin up, so you can’t rely on it completely for sudden
spikes. You need to do the math to tune it so that it can handle spikes.
You still have to tune everything. Control systems are hard.
41. Monitoring Matters
A common thread through this workshop is that monitoring is what should be
providing you the information you need to make operational decisions.
Make sure you have a good monitoring system.
Logs are not monitoring, though better than nothing.
I recommend Prometheus.io: If it didn’t exist I would have created it.
42. Production Matters
Provisioning and capacity planning is just one aspect of production. There are many
others involved with running your company:
● Deployment
● Change Management
● Configuration Management
● Reliability
● Architecture
● Design Feasibility
● Cost Management
● Performance Tuning
● SLAs
● Contract Sanity Check
● Debugging
● Alerting
● Oncall
● Incident Management
Robust Perception can help you with all of this and more.