At point A, the entire QuerySet of 2,500,000 Order objects would be loaded into memory. This defeats Django's lazy loading and is extremely inefficient. It's better to use QuerySet methods like update() to perform updates without iterating.
Unbreaking Your Django Application
1. Three Years of Worst Practice
or
Unbreaking Your Django Application
OSCON 2011
Christophe Pettus <cpettus@pgexperts.com>
PostgreSQL Experts, Inc.
2. Welcome!
• Time for the quick skills check:
• Time with Django?
• Database expertise?
3. On the menu.
• Model Design.
• Efficient use of the ORM.
• Transactions and Locking.
• External Tools and Techniques.
• Database-Level Debugging.
4. Batteries included.
• Nothing to buy, nothing to download.
• Stop me to ask questions.
• I’ll get some things wrong…
• So call me on it!
• Duty cycle: 50 minutes on, 10 minutes off.
5. Welcome to my world.
• Lots of clients using Django.
• Lots of clients with minimal database
experience.
• But very skilled Python programmers.
• Lots of emergency calls about failing
applications…
• … applications that worked great in test.
6. The good.
• Django can get a plausible application going
very fast.
• The Django admin is amazingly useful for
quick prototyping.
• Development generally happens on laptops or
virtual servers with small datasets.
• … and then production happens.
7. The bad.
• Django does not encourage scalable
programming techniques.
• It is not unique in this regard, of course.
• Many Django programmers have never built
very large systems before.
• Naïve programming techniques can result in
some really, really bad code.
8. The ugly.
• Major public-facing site.
• Old site in proprietary everything, replaced
by Django and Postgres.
• Worked great in test.
• On launch day…
• 300,000 users…
• … simultaneously.
9. What happened?
• Horribly slow performance (> 2 minutes to
load a single page).
• 25% of requests died with deadlocks.
• Nightly batch update process required 26
hours to run.
• Customers were… somewhat dissatisfied with
the transition to the new site.
10. The stages.
• Denial: “It’s just launch jitters.”
• Anger: “Why is POSTGRES SO FSCKING
SLOW?”
• Bargaining: “All we have to do is make this little
tweak here, and here, and here…”
• Depression: “We’re all going to die. And lose
our jobs. And our customers will feast upon
our corpses. Worse, we need a consultant.”
11. Acceptance.
• Models created locking problems.
• Code made terrible use of the ORM.
• Design of the templates defeated caching.
• Transaction model was not thought through.
• The database was a black box that no one
there could really diagnose.
12. The goal here is to…
• Understand how we get into these bad
situations.
• Avoid the problems at the start of the process,
when they’re easy to fix.
• Laugh painfully in recognition of these
problems in our own code. (Mine too!)
• And talk myself out of a job.
14. The model and you!
• The classes which Django maps into database
tables.
• Includes basic index specifications.
• Includes foreign-key relationships.
• Includes default values and constraints…
• … none of which passed on to the database
for enforcement. Doh.
15. Let’s talk about…
• Good model design.
• Foreign key relationships.
• Many-to-many.
• Indexing strategy.
• Migrations and how to avoid crushing your
application right in the middle of the highest
traffic period.
• Not that you’d do that, of course.
16. Mr Bad Example
class Class(Model):
    class_code = CharField(max_length=5, unique=True)

class Roster(Model):
    which_class = ForeignKey(Class)
    student_name = CharField(max_length=40)
    date_registered = DateTimeField()
    sessions_attended = PositiveIntegerField()
    grade_received = CharField(max_length=2)
    waiting_list_position = PositiveIntegerField(null=True)
18. Don’t use NULL.
• NULL does not do what you expect.
• No matter what you expect.
• SUM() over {1, NULL} is 1, but 1 + NULL is NULL.
• It’s tempting to use as a flag value. Resist.
• It’s not as consistent as None in Python.
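Both behaviors are easy to see for yourself; here is a small sketch using Python's built-in sqlite3 module (Postgres treats these expressions the same way):

```python
import sqlite3

# Aggregates skip NULLs, but scalar arithmetic propagates them.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (None,)])

agg = conn.execute("SELECT SUM(n) FROM t").fetchone()[0]   # NULL is ignored
scalar = conn.execute("SELECT 1 + NULL").fetchone()[0]     # NULL propagates

print(agg, scalar)  # 1 None
```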
19. Separate derived data.
• Data that can be derived from other sources
should be in its own model class.
• Allows recalculation without locking the main
object.
• Allows implementation as a view or a
materialized view.
• At minimum, wrap in a getter to hide the
underlying implementation.
20. Normalize.
• Don’t duplicate fields in derived or related
objects.
• That’s what foreign keys are for.
• Get the data model right, and then get the
templates to work with the model.
21. Primary Keys
• Django will create a primary key for you if you
don’t specify one.
• Sometimes that’s the right idea.
• Sometimes the data has a natural primary key
already.
• Use that if it does.
• Admittedly, Django’s primary key support is
not everything it could be.
22. Separate Application Zones
• Data frequently is coming from different
applications.
• Don’t smash them all into one object.
• #1 cause of deadlocks.
• In the extreme case, you can reduce the main
object to just an identity table.
• Or at least just the most static data.
23. Foreign Keys
• Good news, bad news.
• Keep rows small and focused.
• Better locking characteristics.
• Django doesn’t have a good primitive for
returning an object graph in one call.
• Multiple DB trips required.
24. So, what to do?
• If you almost always return the parent along
with the related object, consider object
inheritance instead.
• That does do a nice join operation.
• Don’t create an insane class graph, though.
• If you frequently use the related object without
reference to the parent object, use a foreign
key.
25. Django hated you.
• Pre-1.3, the only foreign key integrity mode
was CASCADE ON DELETE…
• … done in the ORM.
• Huge source of deadlock problems.
• Fixed in 1.3, yay!
• Worth moving to 1.3 for this alone.
26. Many-to-Many
• Django has a rockin’ built-in
ManyToManyField…
• … which you often shouldn’t use.
• Use it only if the relationship is a pure
many-to-many and has no attributes.
• Django slaps a gratuitous primary key on the
many-to-many field.
• Many-to-manys have combinatoric issues.
27. How do you solve a
problem like Maaaany… ?
• If you use attributed many-to-manys, create the
table outside of Django and use raw SQL.
• Yes, it sounds bad. You’ll thank me later.
• Create a primary key on the two things being
related (usually larger table first).
• More on indexes later.
• Or… denormalize.
28. Wait, you just said to…
• Databases are the art of the possible.
• Consider storing the relationship as an array in
the primary object.
• If “People” like “Restaurants,” People are the
primary object.
• Moral of the story: If the rules are breaking
you, break them back.
29. Indexing.
• Django makes it really easy to create indexes.
• Really, really easy.
• db_index=True and away you go!
• What could possibly go wrong?
30. This could.
• Database size on disk: 157GB.
• Database as a pg_dump (excludes bloat and
indexes): 9GB.
• Total bloat: 200MB.
• Percentage of indexes which had ever been
used since the server was brought up three
months prior: 2%.
31. What is a good index?
• Highly selective.
• Greatly reduces the number of candidate
rows when used.
• Used very regularly by the application code.
• Or required to enforce constraints.
32. What is a bad index?
• Every other index.
• Bad selectivity.
• Rarely used.
• Expense to maintain compared to the
acceleration of queries that use it.
• An unselective index is more expensive than a
sequential scan.
33. Indexing Strategy
• Start with none (except those required by keys
or constraints).
• No, none. Zero. Zip.
• Analyze the queries and look for ones which:
• Query a constant column or set of columns
AND
• Select <10%…25% of the rows in the target
table.
34. Exceptions to the rules.
• There are always some.
• SSDs have different seek characteristics from
spinning disks.
• So do highly virtualized storage mechanisms
like Amazon EBS.
• 50%-75% selectivity may be a win in those
cases.
• Test, test, test. Don’t assume anything.
35. Multi-column indexes.
• Two choices:
• A single index that includes both. (You’ll
need to do this outside of Django in 1.3, but
that’s OK: we’re all grown-ups, right?)
• Two indexes, one for each column.
• The size will be roughly equivalent between the
two.
• So, how to choose?
36. A single composite index…
• … will be somewhat smaller than two indexes.
• … will definitely be faster to update.
• … will accelerate a more restricted range of
queries.
• An index on (A, B) does nothing for queries
just using B.
• But on the queries it does accelerate, it will be
faster.
37. Two indexes…
• … will be somewhat larger.
• … will definitely be slower to update.
• … will help on queries on both columns, but
not as much as a single composite index.
• … will help accelerate queries on either of the
columns.
• … can be expressed directly in Django.
• So, test already.
38. About that testing thing.
• Your database has much to tell you. Let it do
so.
• pg_stat_user_indexes is a wealth of useful
information…
• … like which indexes are actually being used.
• If an index is not being used (or not being used
very often), drop it like a hot potato…
• … or, if you are flabbergasted that it is not
being used, find out why.
39. Migrations
• Django doesn’t have any migration tools out of
the box.
• South is pretty nice.
• So, we have a table that needs a new column.
• Either manually, or using South, or whatever, we
issue the fateful command:
41. What could go wrong?
• Well, nothing, but…
• That table had 65,000,000 rows in it…
• And was a critical, constantly-used table…
• On very slow I/O.
• And that ALTER had to rewrite the entire
table.
• Which meant taking an exclusive lock.
43. Amazingly, no one lost
their jobs.
• Adding a column with a DEFAULT rewrites the
table in Postgres.
• Adding a NULLable column with no default
generally does not rewrite the table.
• No migration tool (that I know of) does
anything particularly helpful in this regard.
• Open-source street cred opportunity!
44. How to handle this?
-- First alternative: Works during table writes.
ALTER TABLE x ADD COLUMN b BOOLEAN NULL;
-- Very fast.
UPDATE x SET b = FALSE;
-- Rewrites table but does not take AccessExclusive.
ALTER TABLE x ALTER COLUMN b SET NOT NULL;
-- Takes AccessExclusive, but just scans the table.
45. How to handle this?
-- Second alternative: Read-only table.
BEGIN;
CREATE TABLE x_new AS SELECT x.*, FALSE AS b FROM x;
-- Duplicate table, adding new column.
ALTER TABLE x_new ALTER COLUMN b SET NOT NULL;
-- Scans column.
DROP TABLE x;
ALTER TABLE x_new RENAME TO x;
COMMIT;
46. How to handle this?
• Or just wait until 3am.
• None of these solutions are without their
flaws.
• In the particular case, compounded by the
terrible I/O speed of their hosting
environment.
• Make sure you test everything in a realistic QA
environment before deployment.
48. A note about hosting.
• Virtual servers tend to have acceptable CPU
performance…
• … and really, really bad I/O performance.
• Like USB thumb drive speeds.
• I/O performance is very expensive compared
to CPU performance.
• Databases need good I/O performance to
perform well.
49. To continue ranting.
• Sudden reboots because of underlying host
problems are often an issue.
• A certain extremely popular provider of
hosting has storage that simply freezes for 3-4
seconds at a time.
• Needless to say, this has less-than-salutary
effects on the performance of any database
that has its write-ahead log on that volume.
• Like, it freezes.
50. The awful truth.
• Carefully test the performance of any
environment (virtualized or non) before putting
it into production.
• Be very clear with any virtualization vendor
what their service level commitment is.
• If you expect your database to grow beyond
the amount of RAM you can get or afford, pay
close attention to I/O performance.
51. Remember…
• The unique selling point of your company is
not who you host with.
• Always be prepared to change hosting
environments if it is not providing what you
need.
• Do not distort your architecture to meet the
needs of a hosting provider.
• Do not fall victim to hosting Stockholm
Syndrome.
53. Let us now praise famous
ORMs.
• The Django ORM:
• Allows for very Python-esque code.
• Handles a wide range of very common
database interactions elegantly.
• Can generate a remarkable range of queries.
54. But.
• There are some things at which it does not
excel.
• The farther one moves from the load-modify-
store paradigm, the wilder the scenery gets.
• Knowing when to step outside the ORM is
important.
• The good news is that Django has great raw
SQL facilities that complement the ORM.
55. The basic rules.
1. Do not iterate over QuerySets.
2. If you think you have to iterate over a
QuerySet, see rule #1.
3. If you are absolutely, certainly, 100% positive
that the only possible solution to your problem
is iterating over a QuerySet, see rules #1 and #2.
56. Real code.
qs = Orders.objects.all()
# There are about 2,500,000 rows in "orders".
for order in qs:
    order.age_in_days += 1
    order.save()
# Astonishingly, this code snippet did not perform
# to the standards the client expected. They wanted
# to know why Postgres was so slow.
57. Iteration: the awful truth
• Defeats Django’s lazy-evaluation mechanism by
dragging everything into memory.
• Hauls data across the wire from the database
just to process it locally.
• Does filtration or summarization in the
application that is much more efficiently done
in the database.
• Databases have decades of experience in this
kind of munging… respect your elders!
58. Alternatives
• QuerySet.update()
• cursor.execute(“UPDATE app_orders ...”)
• Stored procedures.
• The list goes on.
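The QuerySet.update() alternative turns the whole loop from slide 56 into one set-based UPDATE. In ORM terms that is `Orders.objects.update(age_in_days=F('age_in_days') + 1)`; here is a minimal sketch of the same idea using sqlite3 so it runs standalone:

```python
import sqlite3

# One set-based UPDATE instead of millions of fetch/modify/save round trips.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE app_orders (id INTEGER PRIMARY KEY, age_in_days INTEGER)")
conn.executemany("INSERT INTO app_orders (age_in_days) VALUES (?)", [(0,), (5,), (9,)])

# The whole operation runs inside the database; no rows cross the wire.
conn.execute("UPDATE app_orders SET age_in_days = age_in_days + 1")

ages = [r[0] for r in conn.execute("SELECT age_in_days FROM app_orders ORDER BY id")]
print(ages)  # [1, 6, 10]
```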
59. But, what about…
• Complex filtration that cannot be done in the
database (especially on small result sets).
• Pushing data to another application in an ELT-
style operation.
• Legitimately huge result sets (mass emailing).
• All fair, but examine if there is a way to keep
the data out of the Django application.
• Because…
60. How many objects
are in memory at point A?
qs = Orders.objects.all()
# There are about 2,500,000 rows in "orders".
for order in qs:
    order.age_in_days += 1  # POINT A
    order.save()
62. Wait, stop, that can’t be
right!
• Django does lazy evaluation… everyone tells
me so!
• The Django code carefully asks for a slice of
100 objects…
• which trickles down through lots of really
convoluted Python to psycopg2…
• which dutifully asks for 100 rows from
Postgres…
• which sends all 2,500,000 over the wire.
63. For the want of a named
cursor…
• The protocol between the Postgres client and
server only does partial sends when using
named cursors.
• psycopg2 fully supports named cursors.
• Django doesn’t use them.
• So, the first time you ask for any object in a
QuerySet, you get all of them.
• This is a very good reason not to ask for large
result sets.
64. OK, what about LIMIT?
• In Django, qs[:x] adds a LIMIT clause to the
SQL.
• Remember that LIMIT isn’t really useful
without a sort.
• And that the database has to sort the entire
result set before applying the LIMIT.
• An index on the sort key is a superb idea.
65. The perils of OFFSET.
• qs[x:y] does an OFFSET x LIMIT y-x.
• OFFSET has to consider and toss every object
from 1 to x-1.
• Very large OFFSETs are extremely inefficient.
• Much better to use queries on indexed
columns instead.
• For pagination, consider strongly limiting how
deep it can go.
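"Queries on indexed columns" for pagination is usually keyset (seek) pagination: remember the last id you showed and filter past it, instead of making the database count off an OFFSET. A sketch with sqlite3 (table and column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO item VALUES (?)", [(i,) for i in range(1, 11)])

def page(conn, last_id, size):
    # WHERE id > ? walks the primary-key index; OFFSET would read and
    # discard every earlier row instead.
    rows = conn.execute(
        "SELECT id FROM item WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, size),
    ).fetchall()
    return [r[0] for r in rows]

first = page(conn, 0, 3)           # [1, 2, 3]
second = page(conn, first[-1], 3)  # [4, 5, 6]
print(first, second)
```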
66. IN.
• The most abused query operation in Django.
• It looks so innocent, just sitting there, doesn’t
it?
• Thing.objects.filter(id__in=my_little_list)
68. Large INs are a mess.
• Very expensive for the database to parse.
• Very expensive for the database to execute.
• If there are potentially more than 10-15 items
in the list, rework the IN as a JOIN against
whatever the source of the keys is.
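A sketch of that rework, assuming the keys already live in a table of their own (all names here are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE thing (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE wanted (thing_id INTEGER);
INSERT INTO thing VALUES (1,'a'),(2,'b'),(3,'c'),(4,'d');
INSERT INTO wanted VALUES (2),(4);
""")

# Instead of building a large IN (...) list in the application,
# join against the table that is the source of the keys:
rows = conn.execute("""
    SELECT thing.name FROM thing
    JOIN wanted ON wanted.thing_id = thing.id
    ORDER BY thing.id
""").fetchall()
print([r[0] for r in rows])  # ['b', 'd']
```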
69. QuerySet objects.
• Once evaluated, a QuerySet holds on to its
result set.
• Reuse the same result set whenever
possible!
• Remember that each filter you apply to a
QuerySet creates a new query set.
• That time can add up.
• It can be faster to drop to raw SQL for
super-complex queries.
70. Workload separation.
• The question was, “How many of each widget
has been bought to date”?
• The answer was, “Your site has ground to a halt
each time the vendor logs in and tries to find
out this information.”
71. What it looked like.
SELECT COUNT(*) FROM orders WHERE widget_id=3001;
SELECT COUNT(*) FROM orders WHERE widget_id=3013;
SELECT COUNT(*) FROM orders WHERE widget_id=3017;
SELECT COUNT(*) FROM orders WHERE widget_id=3022;
SELECT COUNT(*) FROM orders WHERE widget_id=3045;
SELECT COUNT(*) FROM orders WHERE widget_id=3056;
SELECT COUNT(*) FROM orders WHERE widget_id=3098;
SELECT COUNT(*) FROM orders WHERE widget_id=3104;
SELECT COUNT(*) FROM orders WHERE widget_id=3109;
SELECT COUNT(*) FROM orders WHERE widget_id=3117;
72. The code was, of course…
wgts = Widget.objects.filter(vendor=cur_vend)
total_sold = 0
for widget in wgts:
    total_sold += Orders.objects.filter(widget=widget.id).count()
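That loop issues one COUNT(*) per widget. A single grouped query returns every count at once; the ORM spelling would be roughly `Orders.objects.values('widget').annotate(n=Count('id'))`, sketched here with sqlite3 so it runs standalone:

```python
import sqlite3

# One GROUP BY instead of one COUNT(*) query per widget.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, widget_id INTEGER)")
conn.executemany("INSERT INTO orders (widget_id) VALUES (?)",
                 [(3001,), (3001,), (3013,), (3001,), (3013,)])

counts = dict(conn.execute(
    "SELECT widget_id, COUNT(*) FROM orders GROUP BY widget_id"
))
print(counts)  # {3001: 3, 3013: 2}
```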
73. Well, at least it wasn’t
an IN.
• Each vendor could run this query as much as
they wanted…
• … on the same system that was taking orders.
• Problem 1: Not the best written code, ever.
• Problem 2: Doing reporting on the
transactional system.
74. Separate reporting and
transactions.
• First option: Daily dump/restore of the
transactional system.
• Cheap and cheerful.
• One day lag.
• Impractical as the database grows.
• Second option: A replication solution.
75. Postgres streaming
replication
• New in 9.0!
• Real-time replication from one primary to
multiple secondaries.
• Primary is read/write; secondaries are
read-only.
• Very light load on primary to do replication.
76. Streaming replication:
the good news.
• Comes in the box.
• Very easy to configure and get going.
• Secondaries can be promoted to be primaries
(thus serving as hot standbys).
• Replication is near realtime.
• DDL is automatically distributed to
secondaries.
• Replicas are binary-identical to the primary.
77. Streaming replication:
the bad news.
• All or nothing: cannot replicate only some of
the tables.
• Queries on secondaries may have to be rerun
if a page changes out from under them.
• Promotion of secondaries to primary is not
(yet) an automated process.
79. Trigger-based replication:
the good news
• Highly configurable: replicate only certain
tables, databases, schemas.
• Lots of control over what gets replicated
when.
• Logical-level replication, so fewer mysteries
about query cancelation, vacuum delay, etc.
80. Trigger-based replication:
the bad news.
• Generally fiddly to set up.
• One more piece of software to deal with.
• Limited support for replication of tables
without primary keys (not much of a problem
for Django applications).
• Generally asynchronous.
81. Learn to love the ORM.
• Eschew iteration: let the database do the data
reduction part.
• Limit the size of query sets.
• Avoid problematic constructs, like IN and
OFFSET.
• Don’t be afraid of using raw SQL.
83. Celery: negative calories!
• Very popular Python job queuing system.
• Integrates very nicely with Django
• Used to require Django, in fact.
• It’s very nicely done, but…
• No matter what…
84. Don’t use the database as
the job queue!
• Celery polls the job queue.
• All the time.
• From every worker process.
• Generating hundreds to thousands of queries a
minute.
• Can easily take 100% of the capacity of the
database.
• So, don’t do that:
86. Let’s be honest with
ourselves.
• Django’s transaction management is a mess.
• It clearly “just grew” from humble origins.
• Even the terminology is utterly unclear.
• But there is hope!
• Let us don pith helmet and pick up our
machete to hack our way through the
underbrush.
87. Transactions out of the
box.
address = Address(street_address="1112 E Broad St",
                  city="Westfield", state="NJ", zip="07090")
address.save()
order = Order(customer_name="Gomez Addams",
              shipping_address=address)
order.save()
88. Wrong, wrong, wrong.
• BEGIN;
• INSERT INTO Address VALUES (...);
• COMMIT;
• BEGIN;
• INSERT INTO Order VALUES (...);
• COMMIT;
89. Default behavior.
• If you do nothing, the default behavior is to
wrap each .save() in its own transaction.
• This is probably not what you want.
• If you are defining multi-object graphs, it is
definitely not what you want.
90. What do we want?
• Read-only view functions don’t bother creating
a transaction at all.
• Read/write view functions run in their own
transaction.
• COMMIT on success, ROLLBACK on failure.
91. Transaction Middleware
• Puts the database under “transaction
management.”
• Sets psycopg2 to open a transaction the first
time the database is touched.
• Automatically commits the transaction on
completion, rolls it back if an exception comes
flying out of the view function.
• Equivalent of wrapping the view function with
the @commit_on_success decorator.
92. Much closer!
• But read-only transactions still have a
transaction defined around them.
• Can we get rid of it?
• Yes, we can get rid of the transaction by
turning on autocommit.
• Wait, what?
93. autocommit.
How does it work?
• Worst. Parameter. Name. Ever.
• autocommit turns off automatic commits.
• You want to run it on when running on
Postgres…
• … but you need to understand how it
changes the standard model.
94. autocommit…
• … sets psycopg2 so that it does not
automatically start a transaction at all.
• Gets rid of the transaction on read-only
functions, but:
• Gets rid of the transaction on functions that
write, too.
• So, we need to wrap writing functions in the
@commit_on_success decorator.
95. IDLE IN TRANSACTION
• A Postgres session state, meaning the session
has an open transaction but is not working.
• They should be extremely rare.
• This state is not good, since locks and other
resources are being held open by the
transaction.
96. Session state of mystery
• Django application are notorious for leaving
sessions open in Idle in Transaction.
• The path through the Django code which
allows for this is extremely unclear.
• So, it is impossible, and yet it happens. Summer
of Code project, anyone?
• Always manage your transactions: That greatly
reduces the number of Idle in Transaction
sessions observed.
97. Transaction cheat-sheet.
• Run with autocommit on.
• Wrap database-modifying code with
@commit_on_success.
• Leave read-only code undecorated.
• Don’t forget to manually call
transaction.set_dirty() if you modify the
database outside of the ORM!
98. Locking: A Primer.
• Readers do not block readers to the same row
(it would be kind of dopey if they did).
• Writers do not block readers; they create a
new version of the row.
• Readers do not block writers; they read the
old version of the row.
• Writers, however, do block writers.
99. Kinds of locks.
• There are a whole ton of different kinds of
table locks.
• … which we are going to ignore.
• The only interesting lock for our purposes is
the exclusive row lock.
• This is automatically acquired when a process
updates or deletes a row.
• No other process can update or delete that
row until the first process commits.
100. Transactions and locks.
• Locks are only held for the duration of a
transaction.
• Once the transaction either commits or rolls
back, the lock is released.
• So: Keep transactions as fast as possible.
• Long-running, database-modifying transactions
are the source of most serious concurrency
problems.
101. Avoiding lock conflicts.
• Modify tables in a fixed order.
• When updating multiple rows in a table, do so
in a fixed order (by id or some other
invariant).
• Avoid bulk-updates to a single table from
multiple processes at the same time (UPDATE
is intrinsically unordered).
• DDL changes take exclusive locks!
102. Foreign-key locks.
• Updating a dependent record (usually!) takes a
share lock (as of 9.0) on the parent record.
• However, it does not always do so, which can
lead to deadlocks.
• Done to enforce relational integrity.
• This can be… surprising if you are not
expecting it.
104. Django view function style
• Collect the input data.
• Do the business logic.
• Gather the data necessary for the view
template.
• Fill in the context.
• Return, rendering the template.
105. The problem.
• The view code really doesn’t know what the
template is going to draw.
• So it has to fill in a superset of the possible
data.
• This can be extremely inefficient.
• There are two approaches to this.
106. The bad approach
c = RequestContext(request)
c['results'] = list(qs)
c['moresults'] = list(qs2)
c['ohyoumightneedthis'] = list(qs3)
c['ohdontforgetthat'] = list(qs4)
...
107. QuerySets are lazy!
• This is where the lazy evaluation of QuerySets
is most useful.
• Don’t run them unless you have to.
• Present the QuerySet itself to the template.
• For non-QuerySet data, pass in a callable that
retrieves the required data.
• This will only be called if needed, interacting
nicely with Django’s template caching.
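The callable idea in plain Python (a stand-in for a Django context; Django's template engine invokes callables it finds in the context automatically, and `expensive_report` is a hypothetical placeholder for a costly query):

```python
calls = []

def expensive_report():
    # Hypothetical stand-in for an expensive database query.
    calls.append(1)
    return ["row1", "row2"]

context = {"report": expensive_report}  # note: no parentheses

# If the template never renders the value, the query never runs.
assert calls == []

# When it does render, the cost is paid exactly once, on demand.
rows = context["report"]()
print(rows, len(calls))  # ['row1', 'row2'] 1
```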
109. ETL/ELT
• Bulk-loading data through Python is not super-
efficient.
• Consider using a dedicated ETL/ELT tool…
• … or just the COPY command.
• Remember that constraints and default values
are not pushed to the SQL by Django…
• … definitely consider adding them using
custom SQL.
110. External caching.
• If you aren’t caching templates and template
fragments, you should.
• memcached, redis.
• Also cache expensive query results.
• Model objects serialize just fine.
• But test this! For small result sets, it is not
always a win.
111. Connection pooling.
• If you have more than 100 simultaneous
connections, consider connection pooling.
• If you have more than 200, run, don’t walk!
• For Postgres, pgPool and PgBouncer.
• pgPool: More sophisticated, higher overhead,
more fiddly.
• PgBouncer: Cheap and Cheerful.
112. Load balancing:
one primary.
• Django 1.2 has multi-database.
• Postgres 9.0 has streaming replication.
• Two great tastes that taste great together!
113. Django configuration
• Set up two database routes:
1. Direct all write activity towards the primary.
2. Direct all read activity towards the
secondary.
• That’s it. It’s that simple.
• Reads come from the secondary, writes go to
the primary.
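A minimal sketch of such a router, assuming database aliases named "primary" and "replica" in settings.DATABASES (the class itself has no Django dependency; its dotted path just goes in the DATABASE_ROUTERS setting):

```python
class PrimaryReplicaRouter:
    """Route reads to the replica, writes to the primary."""

    def db_for_read(self, model, **hints):
        return "replica"

    def db_for_write(self, model, **hints):
        return "primary"

    def allow_relation(self, obj1, obj2, **hints):
        # Both aliases point at the same (replicated) data set.
        return True

router = PrimaryReplicaRouter()
print(router.db_for_read(None), router.db_for_write(None))  # replica primary
```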
114. Replication lag.
• We do have to worry about replication lag
anytime we’re doing replication.
• In this particular configuration, it tends to be a
non-problem:
• Postgres stream replication is fast.
• Django’s usual view function idiom does not
do a lot of reads-after-writes to the same
data set.
115. Fixing replication lag.
• Of course, it’ll come up anyway.
1. Don’t do read-after-write to the same data; use
the values you just wrote, already.
2. If you must do a read-after-write (trigger-
enforced values, etc.), explicitly direct the read
to the primary in the view function.
116. Multiple secondaries.
• Create a pool (using pgPool, PgBouncer)
among all the secondaries.
• Direct Django to that pooled secondary.
• Profit!
117. Multi-master replication.
• Bucardo supports multi-master replication.
• Need a primary key strategy that avoids
conflicts between machines.
• Django’s default behavior won’t cut it.
• Manually create sequences with offset
values, use UUIDs, etc.
• Details are application specific; ask me later if
you want more info.
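The offset-sequence idea can be sketched in a few lines (the stride and offset numbers here are illustrative):

```python
import itertools

def id_sequence(node, num_nodes, start=1):
    """Each node draws ids with the same stride but a unique offset,
    so two masters can insert concurrently without key collisions."""
    return itertools.count(start + node, num_nodes)

node_a = id_sequence(0, 2)
node_b = id_sequence(1, 2)
print([next(node_a) for _ in range(3)],  # [1, 3, 5]
      [next(node_b) for _ in range(3)])  # [2, 4, 6]
```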
119. The super-secret
debugging tool.
• tail -f
• Keep a terminal window open and tail the
Postgres log during development.
• csvlog on (for analysis use)
• log_min_duration_statement = 0, log_checkpoints = on,
log_connections / log_disconnections = on
• auto_explain
120. Wait.You’re doing what?
• Always use Postgres during development if
that’s your deployment target.
• You can use SQLite for development if you are
going to deploy on SQLite.
• You’re not, are you?
121. But what am I looking for?
• Huge, awful-looking SQL statements.
• Surprising query plans (big sequential scans).
• Checkpoints coming too close together (might
just be a config problem, might be modifying
the database too heavily).
122. Index usage.
• pg_stat_user_indexes and pg_stat_user_tables.
• Postgres built-in views that track index and
table usage.
• Extremely helpful for finding unused indexes.
• Extremely helpful for finding tables that are
getting lots of sequential scans.
123. Locking.
• pg_locks.
• Shows the current table lock situation.
• Does not show row locks.
• pgrowlocks (contrib module) shows row
locks.
124. pgFouine
• Log file analyzer.
• Reports most common queries, most
expensive queries.
• Extremely valuable in Django environments, as
the queries are very stylized.