This document summarizes the presentation "The Breakup - Logically Sharding a Growing PostgreSQL Database." It walks through the stages involved: diagnosing which tables are largest; evaluating options such as account-based, geographic, and horizontal sharding (plus Oracle RAC); scoping the solution by splitting tables between the main database and a new marks database; implementing the change, including managing transactions and configuration across two databases; releasing it; and cleaning up afterwards. It emphasizes testing rollback processes, managing technical debt, and bringing empathy to understanding legacy code and configurations.
4. The Seven Stages Of Grief Scaling
1. Shock and Denial
2. Pain and Guilt
3. Anger and Bargaining
4. Depression & Reflection
5. The Upward Turn
6. Reconstruction
7. Acceptance and Hope
5. The Seven Stages Of Grief Scaling
1. Shock and Denial
2. Pain and Guilt
3. Anger and Bargaining
4. Depression & Reflection
5. The Upward Turn
6. Reconstruction
7. Acceptance and Hope
1. Monolithic Scaling
2. Hardware is Expensive
3. If We Do It This Way...
4. We Are So *%@#!&
5. Down To 150 Bugs!
6. Release Day
7. Beer & Therapy (beerapy?)
6. The Problem
● The ability to efficiently back up and restore
● The amount of RAM required to keep indexes in memory
● Resource contention causing the query planner to make sub-optimal choices
● Aged data extending query resources and execution time
● Overlap in existing ID spaces
● No account crossover between shards, i.e., Tii-UK and Tii require separate accounts
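The deck does not include the diagnostic query itself, but a common way to see which relations dominate disk, and by extension index and RAM pressure, in PostgreSQL is a catalog query along these lines (a sketch; adjust the schema filter and LIMIT as needed):

-- List the 20 largest tables, including their indexes and TOAST data.
SELECT n.nspname AS schema,
       c.relname AS relation,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(c.oid) DESC
LIMIT 20;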
7. Stage 2: Options
● Account-based sharding
o Difficult to split account usage evenly across shards.
● Geography-based sharding
o Currently have one geographical shard (UK).
o Added deployment, poor resource utilization.
● Oracle RAC ($$$)
o Oracle OpenWorld is Sunday in SF. No bacon there.
● Horizontal sharding
o Move fast-growing tables to separate physical hosts.
o Break relational constraints.
o Good path to a service-oriented architecture node.
14. Stage 2: Options
Three-Part, Two-Year Proposal: Short-, Mid-, and Long-Term Goals.
Short: 3 Months
Query Partition and Refactor
Removal of 'Leaf Service': Marks
15. Stage 2: Options
Three-Part, Two-Year Proposal: Short-, Mid-, and Long-Term Goals.
Mid: 9 Months
ID Reconciliation Between Shards
Table Partitioning
16. Stage 2: Options
Three-Part, Two-Year Proposal: Short-, Mid-, and Long-Term Goals.
Long: 12 Months
Create DAL
Removal of Large Tables
Global Statistics and Reporting
17. Stage 2: Options
Short Term: 12 Months Later
I do not think it means what you think it means.
21. Stage 3: Scoping The Solution - Database
Original: 236 tables
New main database (192 tables)
New marks database (40 tables)
22. Stage 3: Scoping The Solution - Code
Option 1 - Data Access Layer (DAL)
o Separate codebase encapsulating the new set of tables
o Written in Golang as an HTTP-based REST service
o Avoids carrying forward existing technical debt
o Requires detailed knowledge of existing product features
o Unit tests are very helpful, but coverage is never 100%
o 14 years of business logic (dark matter)
o In long-lived web apps, tribal knowledge is authoritative
23. Stage 3: Scoping The Solution - Code
Option 2 - Add additional database handles for the new database
o Perceived as a safer approach (deciding factor: known risks).
o Requires paying interest on existing technical debt.
o Refactoring is less risky than rewriting.
o Take advantage of existing business logic and tribal knowledge.
o Preserve sacred cows.
24. Stage 3: Scoping The Solution - Hardware
"We can use smaller hardware because we are splitting off part of the database"
➢ This is somewhat of a fallacy
➢ You might need smaller storage
➢ You might need slightly less CPU
➢ Stick with close to the same amount of RAM
25. Stage 4: Implementation - Rollback
S: "What if this fails?"
F: "We roll back the code, restore the database, and look for new jobs."
26. Stage 4: Implementation - Rollback
Q: How do you bifurcate a database and roll back without data loss?
A: Slony.
27. Stage 4: Implementation - Rollback
Timelines matter. Prepare in advance.
Split Replication Well In Advance.
Test Process, Then Test It Again.
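The slides name Slony but do not show the configuration. As a rough illustration only (cluster name, node ids, table names, and conninfo strings are placeholders, and exact slonik parameters vary between Slony-I versions), replicating the tables destined for the marks database to their future home might look something like:

cluster name = marks_split;
node 1 admin conninfo = 'dbname=main host=db-main user=slony';
node 2 admin conninfo = 'dbname=marks host=db-marks user=slony';

init cluster ( id = 1, comment = 'current primary' );
store node ( id = 2, comment = 'future marks primary', event node = 1 );
store path ( server = 1, client = 2, conninfo = 'dbname=main host=db-main user=slony' );
store path ( server = 2, client = 1, conninfo = 'dbname=marks host=db-marks user=slony' );

create set ( id = 1, origin = 1, comment = 'tables moving to the marks database' );
set add table ( set id = 1, origin = 1, id = 1, fully qualified name = 'public.gm3_mark' );
set add table ( set id = 1, origin = 1, id = 2, fully qualified name = 'public.gm3_qm_template' );

subscribe set ( id = 1, provider = 1, receiver = 2, forward = no );

Once the subscription has caught up and the cutover succeeds, replication can be dropped; if the release has to be rolled back, the still-replicating original remains authoritative, which is what makes a rollback without data loss possible.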
28. Stage 4: Implementation - Archaeology
● What is this table? That service doesn't exist anymore?
○ Let's Drop it!
● What's that table? It's an old version still in use?
○ Let's Drop it!
● What's that one over there?
○ Let's Drop it!
30. Stage 4: Implementation - Archaeology
● Fourteen years of application development.
● Five major codebases, dozens of support utilities.
● Hundreds of codepoints for database connections.
● A dozen different ORMs.
● Dynamically generated SQL joining tables.
● Technical debt (code with high maintenance costs).
● Best practices of 10 years ago are now liabilities.
31. Stage 4: Implementation - Archaeology
How do you change all of the electrical sockets in an (old) office building?
34. Stage 4: Implementation - Archaeology
James left 8 years ago. The elevator is in the old building. They tore down the old building to build a Target.
# this code is critical to our workflow, don't remove it!!
# for details talk to jamesb <> who sits near the elevator
# $foo = $object->flocculate( key => $cfg->secret_key );
# return $foo;
return;
35. Stage 4: Implementation - Archaeology
Bob is still here, though, and Bob is a little particular about his code (we all are, to some degree).
Now you're in there meddling with Bob's code. How would you feel if you were Bob?
A little empathy goes a long way towards getting Bob to help you port his code to the new dual-database schema.
36. Stage 4: Implementation - Queries
The original query joins tables that now live in the main and marks databases:
SELECT count(m.*)
FROM gm3_mark m, gm3_qm_template qmt
WHERE m.read IN (
    SELECT dgr.id
    FROM m_dg_read dgr
    JOIN m_object_paper mop ON (mop.id = dgr.source AND mop.owner = ?)
    JOIN m_assignment ma ON (ma.id = mop.assignment AND ma.class = ?)
    WHERE reader = ?
)
AND m.qm_template = qmt.id
AND qmt.id = ?
37. Stage 4: Implementation - Queries
Main database - grab the ids to pass to the marks database.
SELECT p.id
FROM m_object_paper p
JOIN m_assignment a ON a.id = p.assignment
WHERE a.class = ?
AND p.owner = ?
38. Stage 4: Implementation - Queries
Marks database - pass the former FK ids to an IN clause.
SELECT count(m.*)
FROM gm3_mark m
JOIN gm3_qm_template qmt ON qmt.id = m.qm_template
JOIN m_dg_read dgr ON dgr.id = m.read
WHERE dgr.source IN (?, ?, ?)
AND qmt.id = ?
AND dgr.reader = ?
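What the deck does not show is the application glue between slides 37 and 38. A minimal sketch, assuming plain DBI handles and placeholder connection details, parameters, and user names (none of this is the production code):

#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Placeholder connections and parameters -- not the real values.
my $main_db  = DBI->connect('dbi:Pg:dbname=main_db',  'app_ro', '', { RaiseError => 1 });
my $marks_db = DBI->connect('dbi:Pg:dbname=marks_db', 'app_ro', '', { RaiseError => 1 });
my ($class_id, $owner_id, $template_id, $reader_id) = @ARGV;

# Step 1: collect the paper ids from the main database (slide 37).
my $ids = $main_db->selectcol_arrayref(
    'SELECT p.id
       FROM m_object_paper p
       JOIN m_assignment a ON a.id = p.assignment
      WHERE a.class = ? AND p.owner = ?',
    undef, $class_id, $owner_id,
);

# Step 2: hand those ids to the marks database as an IN list (slide 38).
my $count = 0;
if (@$ids) {
    my $in = join ', ', ('?') x @$ids;    # one bind placeholder per id
    ($count) = $marks_db->selectrow_array(
        "SELECT count(m.*)
           FROM gm3_mark m
           JOIN gm3_qm_template qmt ON qmt.id = m.qm_template
           JOIN m_dg_read dgr ON dgr.id = m.read
          WHERE dgr.source IN ($in)
            AND qmt.id = ? AND dgr.reader = ?",
        undef, @$ids, $template_id, $reader_id,
    );
}
print "$count marks\n";

The empty-list guard matters: an IN () clause is a syntax error, and a large enough id set is also an argument for batching the ids rather than binding them all at once.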
39. Stage 4: Implementation - Transactions
Single database transactions are easy.
eval {
    $db->do("INSERT INTO foo (name) VALUES ('bar')");
    # fetch the id just generated (assumes the serial column's sequence is foo_id_seq)
    my ($id) = $db->selectrow_array("SELECT currval('foo_id_seq')");
    $db->do("INSERT INTO fee (foo_id) VALUES ($id)");
};
if ($@) {              # catch exception
    $db->rollback;     # roll transaction back
} else {
    $db->commit;       # commit transaction
}
40. Stage 4: Implementation - Transactions
Dual database transactions are harder.
eval {
    # insert into foo in the main db, grab the generated id
    $main_db->do("INSERT INTO foo (name) VALUES ('bar')");
    my ($foo_id) = $main_db->selectrow_array("SELECT currval('foo_id_seq')");
    # insert the foo id into the marks db, grab its generated id
    $marks_db->do("INSERT INTO fee (foo_id) VALUES ($foo_id)");
    my ($fee_id) = $marks_db->selectrow_array("SELECT currval('fee_id_seq')");
};
41. Stage 4: Implementation - Transactions
Roll back both handles on exception, commit both on success.
if ($@) {                  # catch exception
    $main_db->rollback;    # roll main_db back
    $marks_db->rollback;   # roll marks_db back
} else {
    $main_db->commit;      # commit main_db
    $marks_db->commit;     # commit marks_db
}
42. Stage 4: Implementation - Transactions
What if the commit fails?
if ($@) {                  # catch exception
    $main_db->rollback;    # roll main_db back
    $marks_db->rollback;   # roll marks_db back
} else {
    eval { $main_db->commit };
    if ($@) {
        $main_db->rollback;
        $marks_db->rollback;
    }
    eval { $marks_db->commit }; ...
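The slide cuts off here. One way the pattern can be completed (a sketch continuing the $main_db/$marks_db handles above, not the deck's actual code) is to treat a failed second commit as a consistency problem to repair afterwards, which is the trade-off the next slide makes explicit:

if ($@) {                      # the writes themselves failed
    $main_db->rollback;
    $marks_db->rollback;
} else {
    eval { $main_db->commit };
    if ($@) {
        # nothing committed yet, so both sides can still be rolled back
        $main_db->rollback;
        $marks_db->rollback;
    } else {
        eval { $marks_db->commit };
        if ($@) {
            # main_db is already committed and cannot be un-committed;
            # record enough context to reconcile the marks side later
            warn "marks commit failed after main commit: $@";
            $marks_db->rollback;
        }
    }
}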
45. Stage 4: Implementation - Transactions
9 out of 10 users prefer availability
So does customer support.
You can fix consistency.
46. Stage 4: Implementation - ORMs
ORMs are full of pain
● They hide away db connection details.
● They make it hard to break models apart.
● They make writing code easy…
● But debugging is much more difficult.
47. Stage 4: Implementation - ORMs
ORMs are full of pain
Back in my day we used SQL, and we liked it.
$classes = $c->classes->search( $select_hash, {
    '+select' => 'source.id',
    '+as'     => 'src_id',
    'join'    => [ { 'user_rights_class' => { 'user_role' => 'owner' } }, 'source' ],
    'rows'    => 200,
    'page'    => 1,
} );
51. Stage 4: Implementation - Juggling
Main database - Marks database
I think he was talking to me.
52. Stage 4: Implementation - Config
● Main Database: One master, two slaves (2)
● Marks Database: One master, two slaves (2)
● ASP application: write user, read only user (2)
● Catalyst Application: write user, read only user (2)
● REST Application: write user, read only user (2)
● dev, qa, staging, production, sandbox, uk (6)
53. Stage 4: Implementation - Config
● Database hosts and users: 2*5 = 10
● Stages: 10 * 6 = 60
● Config managed in version control, no discovery.
● Config deployed via RPM with application.
● Get one wrong? Start all over again.
● Configuration is full of pain and suffering.
63. Stage 6: Cleanup
Patch Flavors:
How Did That Get There?
That’s a bug.
It worked fine in dev.
64. Stage 6: Cleanup
"Sometimes the query planner does dumb things"
o People forget why you embarked on this effort.
o People forget the successes and risk mitigation.
o People won't forget the visceral reactions to service degradations.
65. Stage 6: Cleanup
How to bring your site to a halt:
1. Start a transaction on database 1
2. Start a transaction on database 2
3. Wait for database 1 to finish