The document discusses understanding Postgres query plans to optimize database queries. It begins by explaining that understanding the query plan can help identify why queries are slow and whether indexes are being used efficiently. It then covers what happens when queries are executed, including parsing, planning, optimization and execution. The document demonstrates explaining queries to view the query plan and dives into different plan types like sequential scans, index scans, bitmap heap scans, nested loops joins, hash joins and merge joins. It also discusses ordering queries and techniques like top-N heapsort. The overall message is that viewing and understanding the query plan is key to writing efficient database queries.
The document summarizes various Python profiling tools. It discusses using the time utility and time module to measure elapsed time. It also covers the profile, cProfile, hotshot, lineprofiler, memoryprofiler, and objgraph modules for profiling code performance and memory usage. Examples are given showing how each tool can be used and the type of output it provides.
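As a quick illustration of the kind of tooling that summary covers, here is a minimal cProfile session; the profiled function is a made-up example, not one from the original slides:

```python
import cProfile
import io
import pstats

def slow_sum(n):
    # Deliberately naive: builds a throwaway list per iteration so the
    # profiler has several distinct calls to report on.
    total = 0
    for i in range(n):
        total += sum([i] * 10)
    return total

profiler = cProfile.Profile()
profiler.enable()
result = slow_sum(10_000)
profiler.disable()

# pstats sorts and prints the collected statistics; here, top 5 by cumulative time.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)
print(stream.getvalue().strip().splitlines()[0])  # summary line with total call count
```

For line-by-line or memory profiling, line_profiler and memory_profiler follow a similar decorate-then-run workflow.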
Goptuna: Distributed Bayesian Optimization Framework (Go Conference 2019 Autumn, Masashi Shibata)
1. The document describes using Goptuna, an open source Bayesian optimization library for Go (a port of Python's Optuna), to optimize hyperparameters for an ISUCON competition application.
2. It shows how Goptuna can suggest values for various configuration parameters like MySQL, Nginx, and Go application settings to optimize the application performance.
3. Running the optimization with Goptuna over 100 trials was able to find parameter configurations that improved the ISUCON score from 9560 to over 10,000 points.
Speaker: Kim Junho (Lunit)
Date: January 2018
This talk covers the nodule detection problem in medical AI.
Using data from the LUNA16 medical imaging challenge, it walks through how classification is done in medical AI and how preprocessing is carried out.
It then implements and applies "Curriculum Adaptive Sampling for Extreme Data Imbalance", presented at MICCAI 2017 (a top-tier medical imaging conference), and shares tips on solving problems that can arise in the process (Python multiprocessing data loading, input pipelines).
This paper was chosen because, among papers that not only classify but also accurately localize nodules, its performance is considerably high.
This document provides an overview and introduction to NumPy, a fundamental package for scientific computing in Python. It discusses NumPy's core capabilities like N-dimensional arrays and universal functions for fast element-wise operations. The document also briefly introduces SciPy which builds upon NumPy and provides many scientific algorithms. Finally, it demonstrates basic NumPy operations like creating arrays, slicing, indexing, and plotting to visualize data.
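The basic operations mentioned, array creation, slicing, and indexing, can be sketched briefly (plotting omitted; the values are arbitrary examples):

```python
import numpy as np

a = np.arange(12).reshape(3, 4)   # 3x4 array holding 0..11
row = a[1]                         # second row: [4, 5, 6, 7]
col = a[:, 2]                      # third column: [2, 6, 10]
evens = a[a % 2 == 0]              # boolean indexing selects even entries
total = a.sum()                    # reductions and ufuncs run in C, element-wise
print(row, col, evens, total)
```

Slices are views into the same buffer, so modifying `row` would also modify `a`, which is a common NumPy surprise.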
How fast is it really? Benchmarking in practice (Tobias Pfeiffer)
“What’s the fastest way of doing this?” - you might ask yourself during development. Sure, you can guess what’s fastest or how long something will take, but do you know? How long does it take to sort a list of 1 Million elements? Are tail-recursive functions always the fastest?
Benchmarking is here to answer these questions. However, there are many pitfalls around setting up a good benchmark and interpreting the results. This talk will guide you through, introduce best practices and show you some surprising benchmarking results along the way.
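One pitfall the talk alludes to, trusting a single measurement, can be avoided with the standard library's `timeit`, which repeats the statement; this sketch times sorting a shuffled list (the sizes are arbitrary, not from the talk):

```python
import random
import timeit

data = random.sample(range(1_000_000), 10_000)

# timeit runs the statement `number` times and returns total elapsed seconds,
# smoothing out the noise of any one run.
elapsed = timeit.timeit(lambda: sorted(data), number=100)
per_run_ms = elapsed / 100 * 1000
print(f"sorted() on 10k elements: {per_run_ms:.3f} ms per run")
```

For serious benchmarks, also control for warm-up, garbage collection, and input distribution, which is much of what the talk's "pitfalls" are about.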
- The document demonstrates various commands for exploring and summarizing data in R, using the iris data set as the running example: head(), tail(), str(), class(), summary(), and the $ operator.
- The iris data set contains measurement data for 150 flowers across 4 variables and is stored as a data frame object in R.
- Data frames allow storing different data types together and can be explored using commands like summary() which provides summaries tailored to each variable type.
- Matrices can also be used to store multi-dimensional data and various functions like dim(), apply(), and cbind() allow manipulating the dimensions and combining matrices.
The Ring programming language version 1.5.1 book - Part 44 of 180 (Mahmoud Samir Fayed)
This chapter discusses using the Allegro game programming library in Ring applications. It shows how to load the Allegro library, initialize it, and create a display. Examples are provided for drawing objects, animating their movement, and getting input from the keyboard and mouse. Key functions covered include al_init(), al_create_display(), al_draw_bitmap(), al_flip_display(), al_rest(), al_load_bitmap(), al_register_event_source(), al_wait_for_event(), and checking keyboard and mouse input. The chapter also discusses using classes for graphics and games programming with Allegro in Ring.
Elixir is a functional programming language that is well-suited for building scalable and fault-tolerant applications. The document provides an introduction to Elixir by discussing its roots in Erlang and how it builds upon Erlang's strengths like concurrency, distribution, and fault tolerance. It also demonstrates some basic Elixir concepts like functions, pattern matching, recursion, and the BEAM virtual machine. Finally, it provides examples of real-world applications of Elixir like building Phoenix web applications and developing embedded hardware projects with Nerves.
The document discusses the deque collection in Python. Some key points:
- Deque allows fast appends and pops from either side of the list, with O(1) time complexity, unlike regular lists which are slow (O(n)) for pop(0) and insert(0,v).
- Deque provides methods like append, appendleft, popleft, pop for adding/removing elements from either side of the list.
- It can be initialized with a maximum length to act as a sliding window, discarding old elements as new ones are added.
- Methods like rotate() shift the deque by a given number of positions, and extend() adds multiple elements at once. Deque is useful whenever fast operations are needed at both ends, such as queues and sliding windows.
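The points above can be sketched with the standard library directly:

```python
from collections import deque

d = deque([2, 3, 4])
d.append(5)        # O(1) at the right end
d.appendleft(1)    # O(1) at the left end (list.insert(0, v) would be O(n))
assert list(d) == [1, 2, 3, 4, 5]

d.rotate(2)        # shift right by two positions
assert list(d) == [4, 5, 1, 2, 3]

# maxlen turns the deque into a sliding window: old items fall off the left.
window = deque(maxlen=3)
for x in [1, 2, 3, 4, 5]:
    window.append(x)
print(list(window))  # [3, 4, 5]
```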
The document contains technical information about software vulnerabilities and security exploits. It discusses memory corruption issues like buffer overflows, use-after-free vulnerabilities, and heap overflow attacks. It also covers injection attacks, deserialization of untrusted data, container escapes, and other common software vulnerabilities. The document emphasizes the importance of secure coding practices, threat modeling, code reviews, and security testing to identify and address vulnerabilities.
The Ring programming language version 1.5.2 book - Part 45 of 181 (Mahmoud Samir Fayed)
1. Initialize Allegro and load necessary addons like images.
2. Create a display and show a message box to initialize the window.
3. Draw shapes and bitmaps to the display and flip periodically to animate.
4. Set up event handling for input from keyboard, mouse, and timer to control animation.
5. Inside the game loop: handle input, update object positions, redraw, and flip display continuously.
Here are the R commands to create the requested graphs from the MASS leuk dataset and save them as MASSleuk.jpeg:

```r
library(MASS)                    # provides the leuk dataset
data(leuk)
windows()                        # opens a graphics window (Windows only; use x11() or quartz() elsewhere)
par(mfrow = c(2, 2))             # arrange the four plots in a 2x2 grid
plot(leuk$time, main = "Scatter plot of time", ylab = "time")
hist(leuk$time, main = "Histogram of time", xlab = "time")
boxplot(leuk$time, main = "Boxplot of time")
qqnorm(leuk$time); qqline(leuk$time)
dev.copy(jpeg, "MASSleuk.jpeg")  # use the jpeg device; png() would write a PNG despite the extension
dev.off()                        # close the device so the file is actually written
```

This will open a graphics window, draw the four plots in a 2x2 grid, and write them to MASSleuk.jpeg.
This document provides examples of built-in functions and decorators in Python like map, filter, all, any, getattr, hasattr, setattr, callable, isinstance, issubclass, closures, and memoization decorators. It demonstrates how to use these functions and decorators through examples. Built-in functions like map, filter and decorators allow extending functionality of functions. Closures enable functions to remember values in enclosing scopes. The @decorator syntax is demonstrated to be equivalent to applying a function to another function.
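The memoization pattern described above can be sketched as follows; the decorator and function names are illustrative, not taken from the original slides:

```python
import functools

def memoize(func):
    cache = {}
    @functools.wraps(func)
    def wrapper(*args):
        # The closure remembers `cache` from the enclosing scope across calls.
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]
    return wrapper

@memoize  # the @ syntax is equivalent to: fib = memoize(fib)
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(30))  # 832040, computed in linear time thanks to the cache
```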
A tour of Python: slides from a presentation given in 2012.
[Some slides are not properly rendered in SlideShare: the original is still available at http://www.aleksa.org/2015/04/python-presentation_7.html.]
yt: An Analysis and Visualization System for Astrophysical Simulation DataJohn ZuHone
yt is a platform for analyzing and visualizing astrophysical simulation data. It supports many common simulation codes and provides physically motivated objects, fields, and quantities to analyze simulations. It uses Python for scripting and includes tools for basic plotting, time series analysis, and volume rendering. The open source project is developed and maintained by an active team seeking to provide a unified approach to extracting insights from astrophysical simulations.
The document discusses clustering and numpy arrays in Python. It shows how to create arrays using numpy, perform operations like summing and finding min/max values, and access elements and slices. It also introduces Cython and demonstrates compiling a simple "Hello World" Cython program and using Cython to optimize a Python prime number generation function for improved performance.
This document discusses statistical computing in RStudio. It covers importing and browsing data, data types, and hands-on exercises. It also demonstrates basic math operations, using packages, getting help, and best practices for creating R documents.
This document discusses different options for parsing command line arguments in Python scripts, including raw argv parsing, getopt, argparse, and docopt. It notes that raw argv parsing and getopt are old-style parsing methods, while argparse is built into Python but may be complex. Docopt is introduced as a module that focuses on usage documentation rather than code, allowing the usage to be defined in a script's docstring. Examples are provided for argparse and more complex usages with docopt.
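A minimal argparse sketch in the style the document contrasts with docopt; the flag names are invented for the example:

```python
import argparse

parser = argparse.ArgumentParser(description="Example CLI.")
parser.add_argument("name", help="who to greet")
parser.add_argument("-n", "--times", type=int, default=1, help="repeat count")

# parse_args() normally reads sys.argv; passing a list makes the example testable.
args = parser.parse_args(["world", "--times", "3"])
greeting = " ".join(["hello " + args.name] * args.times)
print(greeting)
```

docopt inverts this: you write the usage text first, and the parser is derived from it.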
High performance GPU computing with Ruby (RubyConf 2017, Prasun Anand)
The document discusses high performance GPU computing using Ruby. It summarizes ArrayFire, an open-source library that provides GPU-accelerated operations on multi-dimensional arrays. It also discusses RbCUDA, a Ruby wrapper for Nvidia's CUDA API that allows running CUDA kernels and using CUDA libraries like CuBLAS from Ruby. Benchmark results show ArrayFire and RbCUDA can achieve speedups of over 100,000x compared to non-GPU Ruby libraries for numeric tasks by leveraging the parallel processing power of GPUs. Contributions to the ArrayFire and RbCUDA open source projects are welcomed to further improve high performance GPU computing capabilities in Ruby.
The Ring programming language version 1.10 book - Part 56 of 212 (Mahmoud Samir Fayed)
1. Load the Allegro library by loading the gamelib.ring file
2. Initialize Allegro and set up the display
3. Create bitmaps and draw objects onto the display
4. Add animation by moving objects and redrawing the display periodically
5. Handle user input events from the keyboard and mouse
6. Use a game loop to continuously update the display based on input
This presentation cycles through all of the `highlight` utility themes/styles on black/white backgrounds, so you can be informed when choosing one for your code examples in Keynote.
Elixir & Phoenix – fast, concurrent and explicit (Tobias Pfeiffer)
Elixir and Phoenix are known for their speed, but that’s far from their only benefit. Elixir isn’t just a fast Ruby and Phoenix isn’t just Rails for Elixir. Through pattern matching, immutable data structures and new idioms your programs can not only become faster but more understandable and maintainable. This talk will take a look at what’s great, what you might miss and augment it with production experience and advice.
This document summarizes a Python presentation on feature selection. It discusses several common feature selection techniques like LASSO, random forests, and PCA. Code examples are provided to demonstrate how to perform feature selection on the Iris dataset using these methods in scikit-learn. Dimensionality reduction with PCA and word embeddings with Gensim for text are also briefly covered. The presentation aims to provide practical demonstrations of feature selection rather than theoretical explanations.
The document discusses the EXPLAIN command in PostgreSQL, which shows the query execution plan chosen by the planner. It describes different scan methods like sequential scan, index scan, and bitmap heap scan. It also covers join methods like nested loop, hash, and merge join. Metrics like cost, rows, and width are explained. Different optimization techniques and tools for analyzing EXPLAIN plans are mentioned at the end.
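Reproducing the document's examples needs a running Postgres, but the core idea of inspecting a plan can be sketched with the standard library's SQLite, whose EXPLAIN QUERY PLAN is analogous (not identical) to Postgres EXPLAIN; the schema here is invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany("INSERT INTO users (email) VALUES (?)",
                 [(f"u{i}@example.com",) for i in range(100)])

query = "EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = ?"

# Without an index on email, the planner falls back to a full table scan.
plan_before = conn.execute(query, ("u5@example.com",)).fetchall()
print(plan_before)

# After adding an index, the plan switches to an index search.
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = conn.execute(query, ("u5@example.com",)).fetchall()
print(plan_after)
```

In Postgres the equivalent move is `EXPLAIN (ANALYZE, BUFFERS)` before and after `CREATE INDEX`, watching Seq Scan become Index Scan.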
[Droid Knights 2019] From TensorFlow Lite to ML Kit and Mobile GPU (Jeongah Shin)
Slides from a Droid Knights 2019 session.
Topic: From TensorFlow Lite to ML Kit and Mobile GPU
Content: Retracing the development of a simple demo app, the session explains how to build mobile machine learning apps on Android with TensorFlow Lite and ML Kit. It also covers Mobile GPU support, a TensorFlow Lite feature released in January 2019.
* Because the slides contain many GIFs, the session content has been written up in a separate document:
https://github.com/motlabs/awesome-ml-demos-with-android
Slides from the Advanced Python lectures I gave recently at the Haifa Linux club.
Advanced python, Part 2:
- Slots vs Dictionaries
- Basic and Advanced Generators
- Async programming
The document discusses performance issues that can arise from ORM queries and provides techniques for identifying and resolving them. It explains that ORMs may execute unexpected queries or optimized queries, and recommends monitoring database logs to see what queries are running. It also demonstrates how to use EXPLAIN to analyze slow queries, and discusses using indexes, select_related, prefetch_related and other techniques to improve query performance.
- The document discusses performance problems that can arise from ORM use and how to identify them. It recommends examining database logs to see queries being executed. Different types of scans like sequential, index, and bitmap scans are explained. Techniques like select_related, prefetch_related, and using indexes are suggested to reduce queries. The EXPLAIN command is demonstrated to analyze query plans and identify optimizations.
This presentation is a fast-paced walk-through of very useful but occasionally lesser-known features of Postgres, the open source database. There is a blog post with links to more details coverage of the various topics that accompanies the presentation: https://medium.com/cognite/postgres-can-do-that-f221a8046e?source=friends_link&sk=18fa08c6b82f5aff6744478b07292e1e
The talk covers widely, providing lots of pointers to select resources that go deeper. The goal is that you will hear of several topics to learn more about – whether it's when developing, live debugging or learning to avoid production problems in the first place.
Example nuggets:
- How does this query actually execute, and how does it change as data grows?
- How can I easily create large amounts of test data?
- What's slow in production right now?
- How can I apply my schema changes without requiring a maintenance window?
- What's powering the Postgres-backed GraphQL engines?
The talk presents you with several appetizers to tempt you to go deeper with Postgres on your own. If you consume all the references provided, you may have several days worth of material to dig into – and a much bigger tool box.
There have been plenty of “explaining EXPLAIN” type talks over the years, which provide a great introduction to it. They often also cover how to identify a few of the more common issues through it. EXPLAIN is a deep topic though, and to do a good introduction talk, you have to skip over a lot of the tricky bits. As such, this talk will not be a good introduction to EXPLAIN, but instead a deeper dive into some of the things most don’t cover.

The idea is to start with some of the more complex and unintuitive calculations needed to work out the relationships between operations, rows, threads, loops, timings, buffers, CTEs and subplans. Most popular tools handle at least several of these well, but there are cases where they don’t that are worth being conscious of and alert to. For example, we’ll have a look at whether certain numbers are averaged per-loop or per-thread, or both. We’ll also cover a resulting rounding issue or two to be on the lookout for. Finally, some per-operation timing quirks are worth looking out for where CTEs and subqueries are concerned, for example CTEs that are referenced more than once.

As time allows, we can also look at a few rarer issues that can be spotted via EXPLAIN, as well as a few more gotchas that we’ve picked up along the way. This includes things like spotting when the query is JIT-, planning-, or trigger-time dominated, spotting the signs of table and index bloat, issues like lossy bitmap scans or index-only scans fetching from the heap, as well as some things to be aware of when using auto_explain.
Effective Numerical Computation in NumPy and SciPyKimikazu Kato
This document provides an overview of effective numerical computation in NumPy and SciPy. It discusses how Python can be used for numerical computation tasks like differential equations, simulations, and machine learning. While Python is initially slower than languages like C, libraries like NumPy and SciPy allow Python code to achieve sufficient speed through techniques like broadcasting, indexing, and using sparse matrix representations. The document provides examples of how to efficiently perform tasks like applying functions element-wise to sparse matrices and calculating norms. It also presents a case study for efficiently computing a formula that appears in a machine learning paper using different sparse matrix representations in SciPy.
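The broadcasting technique mentioned can be illustrated briefly; the arrays are arbitrary examples, not the paper's formula:

```python
import numpy as np

# Broadcasting: a (3, 1) column and a (4,) row combine into a (3, 4) result
# without any Python-level loop.
col = np.array([[0], [10], [20]])
row = np.array([1, 2, 3, 4])
table = col + row
print(table)
# [[ 1  2  3  4]
#  [11 12 13 14]
#  [21 22 23 24]]

# The same idea normalizes each column of a matrix in one expression:
# a (2, 3) matrix divided by its (3,) column sums broadcasts row-wise.
m = np.arange(6.0).reshape(2, 3)
normalized = m / m.sum(axis=0)
print(normalized.sum(axis=0))  # each column now sums to 1
```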
This document discusses various techniques for optimizing Python code, including:
1. Using the right algorithms and data structures to minimize time complexity, such as choosing lists, sets or dictionaries based on needed functionality.
2. Leveraging Python-specific optimizations like string concatenation, lookups, loops and imports.
3. Profiling code with tools like timeit, cProfile and visualizers to identify bottlenecks before optimizing.
4. Optimizing only after validating a performance need and starting with general strategies before rewriting hotspots in Python or other languages. Premature optimization can complicate code.
Pygrunn 2012 down the rabbit - profiling in pythonRemco Wendt
The document discusses various tools for profiling Python code such as cProfile, profile, hotshot, line profiler, and trace to identify inefficient code and potential bottlenecks. It also covers memory profiling tools like Heapy and Meliae. Effective profiling requires understanding different profiling techniques, tools, and how to analyze the output to optimize performance and memory usage.
- Install Python 2.5 or 2.6 and SQLAlchemy 0.5 using easy_install
- Michael Bayer created SQLAlchemy and is a software architect in New York City
- SQLAlchemy allows modeling database queries and relationships between objects in a more Pythonic way compared to raw SQL
Problem 1 Show the comparison of runtime of linear search and binar.pdfebrahimbadushata00
The document describes two problems:
1) Comparing the runtime of linear search and binary search on random data sets of increasing sizes from 50,000 to 300,000 elements. The worst case runtime is reported.
2) Comparing the runtime of bubble sort and merge sort on the same random data sets. The algorithms sort the data in ascending order.
Java code is provided to generate the random data, implement the algorithms, and output the runtimes in nanoseconds. Line charts and tables are to be created from the output data to compare the performance of the different algorithms.
Presented at 3|SHARE's EVOLVE'14 - The Adobe Experience Manager Community Summit on Wednesday November 19th, 2014 at the Hard Rock Hotel in San Diego, CA. evolve14.com
This document provides an overview of search and indexing in Adobe Experience Manager using Apache Oak. It discusses Oak query implementation, cost calculation, and various index implementations like property, ordered, Lucene, Solr, and traversing indexes. It provides details on how indexes are defined, when to reindex, debugging cost calculation, and troubleshooting Solr. The primary messages are that search is significantly different between CRX2 and Oak, Oak provides more optimization opportunities but requires more configuration, and indexes need to be understood to optimize query performance.
The document discusses various topics related to optimizing performance for PostgreSQL including:
- Indexes and how to use EXPLAIN and EXPLAIN ANALYZE to analyze query performance. Conditional, functional and concurrent indexes are covered.
- Connection pooling options for Django like django-postgrespool to improve connection management.
- Replication options such as Slony, Bucardo, pgpool, WAL-E and Barman for high availability.
- Backup strategies including logical backups with pg_dump and physical backups using base backups. When each approach is best to use.
JDD 2016 - Tomasz Borek - DB for next project? Why, Postgres, of course PROIDEA
PostgreSQL is a battle-tested, open source database with a colorful history dating back to 1987. It has many advantages for a next project, including support for multiple programming languages for stored procedures, handling of XML and JSON, strong error reporting and logging, and window functions. It has a solid architecture with well-designed processes for handling write-ahead logs, statistics collection, and query optimization. While PostgreSQL has a learning curve, its longevity, stability, feature set and performance make it a great choice for many applications.
Down the rabbit hole, profiling in DjangoRemco Wendt
The document discusses various tools for profiling Python code such as cProfile, profile, hotshot, line profiler, and trace to identify inefficient code and bottlenecks. It covers using these tools to profile CPU and I/O bound problems as well as memory profiling issues. The document also demonstrates how to optimize code through caching, removing unnecessary function calls, and memoization.
This one is about advanced indexing in PostgreSQL. It guides you through basic concepts as well as through advanced techniques to speed up the database.
All important PostgreSQL Index types explained: btree, gin, gist, sp-gist and hashes.
Regular expression indexes and LIKE queries are also covered.
The document discusses how the Oracle optimizer can sometimes choose suboptimal execution plans, leading to performance deterioration. It presents a scenario where the same query runs much slower when bind variables are used. The document then shows how SQL profiles can be used to enforce a better execution plan. It argues that manually creating profiles is not ideal for 24/7 environments. The document proposes using machine learning for outlier detection to identify performance issues and then automatically generate SQL profiles to address the issues. Code examples are provided for outlier detection and generating profiles through the Oracle API to allow automating the process.
A slightly-modified version of my IPRUG talk, this time for the BT DevCon5 developer conference at Adastral Park on 25 May 2012.
The main changes are the addition of the Ruby section and the increased number of HHGTTG references in honour of towel day.
This document summarizes an advanced Python programming course, covering topics like performance tuning, garbage collection, and extending Python. It discusses profiling Python code to find bottlenecks, using more efficient algorithms and data structures, optimizing code through techniques like reducing temporary objects and inline functions, leveraging faster tools like NumPy, writing extension modules in C, and parallelizing computation across CPUs and clusters. It also explains basic garbage collection algorithms like reference counting and mark-and-sweep used in CPython.
This document discusses various techniques for optimizing Python code to improve performance. It begins by explaining that Python is an interpreted language and is generally slower than compiled languages like C/C++. Several methods for speeding up Python code are then presented: using local variables instead of global variables, leveraging built-in functions, list comprehensions, generator expressions, NumPy for numeric computing, Numba for just-in-time compilation, and algorithm/data structure optimization. Specific code examples are provided to demonstrate how these techniques can significantly reduce runtime. The key message is that with the right optimizations, Python code can achieve speeds comparable to lower-level languages while retaining the benefits of a high-level, interpreted language.
Gotcha! Ruby things that will come back to bite you.David Tollmyr
The document discusses various performance optimizations for JRuby applications. It covers techniques like avoiding unnecessary string creation, using java.util.concurrent utilities for concurrency instead of Ruby's Mutex, and avoiding shelling out from JRuby when possible. The author also shares lessons learned around array joins, queue implementations, and passing binary strings between Ruby and Java.
5. The mystery of querying stuff
[Diagram: you send a query (SELECT … FROM …) to the database and get back a result set (id, name: 1 Louise, 2 Alfred, …)]
6. Zooming into it
[Diagram: the SQL query goes through Parse → Planner → Optimizer → Execute before the result comes back]
The planner / optimizer:
- Generates execution plans for a query
- Calculates the cost of each plan
- The cheapest plan is used to execute your query
7. So what can I learn from the query plan?
- Understand why your filter / join / order is slow.
- Know whether the new index you just added is used.
- Stop guessing which index can be useful.
In conclusion: you will understand why your query is slow
9. But why can’t we trust it?
1. The ORM executes queries that you might not expect
2. Your queries might not be optimised and you won’t know about it
10. The story of the owls
[ER diagram of the example schema:]
- owl (id, name, employer_name, feather_color, favourite_food) — many owls per job
- job (id, name)
- human (id, first_name, last_name)
- letters (id, sender_id, receiver_id, sent_at, delivered_by) — sender and receiver are humans, delivered_by is an owl
Dataset: 10 002 owls, 10 000 humans, 411 861 letters
11. Loops (with django)
owls = Owl.objects.filter(employer_name='Hogwarts')
for owl in owls:
    print(owl.job)  # 1 query per loop iteration
SELECT id, name, employer_name, favourite_food, job_id,
feather_color FROM owl WHERE employer_name = 'Hogwarts'
SELECT id, name FROM job WHERE id = 1
SELECT id, name FROM job WHERE id = 1
SELECT id, name FROM job WHERE id = 1
…
Awesome right?
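The cost of that loop can be made concrete with a toy query counter. Here `run_query` is a hypothetical stand-in for one database round trip, not a Django API:

```python
# Toy model of the N+1 pattern: count round trips per strategy.
query_log = []

def run_query(sql):
    # Stand-in for a real database round trip: just record the query.
    query_log.append(sql)

def naive(owls):
    # 1 query for the owls, then 1 extra query per owl for its job.
    run_query("SELECT * FROM owl WHERE employer_name = 'Hogwarts'")
    for owl in owls:
        run_query("SELECT id, name FROM job WHERE id = %d" % owl["job_id"])

def with_select_related(owls):
    # A single joined query fetches owls and their jobs together.
    run_query("SELECT * FROM owl LEFT OUTER JOIN job ON owl.job_id = job.id")

owls = [{"job_id": 1} for _ in range(100)]

naive(owls)
naive_queries = len(query_log)        # 1 + 100 = 101 round trips
query_log.clear()
with_select_related(owls)
joined_queries = len(query_log)       # a single round trip
```

With 100 owls the naive loop issues 101 queries; the joined version issues 1, which is exactly what `select_related` buys you on the next slide.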
12. Loops (with django)
Owl.objects.filter(employer_name='Ulule')
    .select_related('job')
SELECT … FROM "owl" LEFT OUTER JOIN "job" ON ("owl"."job_id" =
"job"."id")
WHERE "owl"."employer_name" = 'Ulule'
Owl.objects.filter(employer_name='Ulule')
    .prefetch_related('job')
SELECT … FROM "owl" WHERE "owl"."employer_name" = 'Ulule'
SELECT … FROM "job" WHERE "job"."id" IN (2)
13. Where are my logs?
Terminal command
$ psql -U user -d your_database_name
psql interface
owl_conference=# SHOW log_directory;
log_directory
---------------
pg_log
owl_conference=# SHOW data_directory;
data_directory
-------------------------
/usr/local/var/postgres
owl_conference=# SHOW log_filename;
log_filename
-------------------------
postgresql-%Y-%m-%d.log
14. Having good looking logs
(and logging everything like a crazy owl)
owl_conference=# SHOW config_file;
config_file
-----------------------------------------
/usr/local/var/postgres/postgresql.conf
(1 row)
In your postgresql.conf
log_filename = 'postgresql-%Y-%m-%d.log'
log_statement = 'all'
logging_collector = on
log_min_duration_statement = 0
15. Finding dirty queries
pg_stat_statements
In your psql
CREATE EXTENSION pg_stat_statements;
The module must be added to your shared_preload_libraries.
You have to change your postgresql.conf and restart.
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.max = 10000
pg_stat_statements.track = all
16. Finding the painful queries
SELECT total_time, min_time, max_time, mean_time,
calls, query
FROM pg_stat_statements
ORDER BY mean_time DESC
LIMIT 100;
-[ RECORD 6 ]---------------------------------------------------------
total_time | 519.491
min_time | 519.491
max_time | 519.491
mean_time | 519.491
calls | 1
query | SELECT COUNT(*) FROM letters;
19. What is EXPLAIN
Gives you the execution plan chosen by the query planner that your
database will use to execute your SQL statement
Using ANALYZE will actually execute your query! (Don’t worry, you
can ROLLBACK)
EXPLAIN (ANALYZE) my super query;
BEGIN;
EXPLAIN ANALYZE UPDATE owl SET … WHERE …;
ROLLBACK;
20. So, what does it look like?
EXPLAIN ANALYZE SELECT * FROM owl WHERE
employer_name = 'Ulule';
QUERY PLAN
-------------------------------------
Seq Scan on owl (cost=0.00..205.01 rows=1 width=35)
(actual time=1.945..1.946 rows=1 loops=1)
Filter: ((employer_name)::text = 'Ulule'::text)
Rows Removed by Filter: 10001
Planning time: 0.080 ms
Execution time: 1.965 ms
(5 rows)
21. Let’s go step by step!
Costs
(cost=0.00..205.01 rows=1 width=35)
- 0.00: cost of retrieving the first row
- 205.01: cost of retrieving all rows
- rows: number of rows returned
- width: average width of a row (in bytes)
If you use ANALYZE:
(actual time=1.945..1.946 rows=1 loops=1)
- loops: number of times your seq scan (index scan, etc.) was executed
22. Sequential Scan
Seq Scan on owl ...
Filter: ((employer_name)::text = 'Ulule'::text)
Rows Removed by Filter: 10001
- Scans the entire table.
- Retrieves the rows matching your WHERE clause.
It can be expensive!
Would an index make this query faster?
23. What is an index then?
In an encyclopaedia, if you want to read about owls,
you don’t read the entire book, you go to the index first!
A database index contains the column value and
pointers to the rows that have this value.
CREATE INDEX ON owl (employer_name);
employer_name | points to
Hogwarts      | Owl 1
Hogwarts      | Owl 12
Hogwarts      | Owl 23
…             | …
Post office   | Owl m
…             | …
Ulule         | Owl n
…             | …
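The encyclopaedia analogy can be loosely sketched in Python, with a dictionary standing in for the index and list positions standing in for row pointers (purely illustrative; a real Postgres index is a B-tree on disk, not a hash map):

```python
from collections import defaultdict

# A tiny "owl" table as a list of rows.
owls = [
    {"id": 1, "name": "Errol", "employer_name": "Hogwarts"},
    {"id": 2, "name": "Hedwig", "employer_name": "Hogwarts"},
    {"id": 3, "name": "Bubo", "employer_name": "Ulule"},
    {"id": 4, "name": "Strix", "employer_name": "Post office"},
]

# "CREATE INDEX ON owl (employer_name)": map each value to row positions.
index = defaultdict(list)
for pos, row in enumerate(owls):
    index[row["employer_name"]].append(pos)

# Sequential scan: look at every row, keep the matches.
seq_result = [r for r in owls if r["employer_name"] == "Ulule"]

# Index scan: jump straight to the positions stored in the index.
idx_result = [owls[pos] for pos in index["Ulule"]]

assert seq_result == idx_result  # same rows, far fewer rows inspected
```

The index answers the same question while touching one row instead of four; the following slides show why that advantage erodes when the value matches most of the table.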
24. Index scan
Index Scan using owl_employer_name_idx on owl
(cost=0.29..8.30 rows=1 width=35) (actual
time=0.055..0.056 rows=1 loops=1)
Index Cond: ((employer_name)::text =
'Ulule'::text)
Planning time: 0.191 ms
Execution time: 0.109 ms
The index is visited row by row in order to
retrieve the data corresponding to your clause.
25. Index scan or sequential scan?
EXPLAIN SELECT * FROM owl
WHERE employer_name = 'post office';
QUERY PLAN
-------------------------------------------------
Seq Scan on owl (cost=0.00..205.03 rows=7001 width=35)
Filter: ((employer_name)::text = 'post office'::text)
With an index and a really common value !
7000/10 000 owls work at the post office
26. Why is it using a sequential scan?
An index scan follows the order of the index, so the
disk head has to jump between rows.
A random read is roughly 1000 times slower than
reading the next physical block.
Conclusion: for common values it’s quicker to
read all the data from the table in physical order
By the way… Retrieving 7000 rows might not be a great idea :).
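The planner’s flip can be sketched with a toy cost comparison. The page-cost constants below are the actual Postgres defaults (seq_page_cost = 1.0, random_page_cost = 4.0), but the formulas are deliberately simplified and are not the planner’s real arithmetic:

```python
SEQ_PAGE_COST = 1.0      # Postgres default seq_page_cost
RANDOM_PAGE_COST = 4.0   # Postgres default random_page_cost

def seq_scan_cost(table_pages):
    # A sequential scan always reads every page, in physical order.
    return table_pages * SEQ_PAGE_COST

def index_scan_cost(matching_rows):
    # Crude upper bound: one random page fetch per matching row.
    return matching_rows * RANDOM_PAGE_COST

table_pages = 100                   # say the owl table spans 100 pages
rare = index_scan_cost(1)           # 'Ulule': 1 matching owl -> 4.0
common = index_scan_cost(7000)      # 'post office': 7000 owls -> 28000.0

assert rare < seq_scan_cost(table_pages)     # index wins for rare values
assert common > seq_scan_cost(table_pages)   # seq scan wins for common ones
```

For the rare value the index is two orders of magnitude cheaper; for the common one the random fetches cost far more than just reading the whole table once.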
27. Bitmap Heap Scan
EXPLAIN ANALYZE SELECT * FROM owl WHERE owl.employer_name = 'Hogwarts';
QUERY PLAN
-------------------------------------------------
Bitmap Heap Scan on owl (cost=23.50..128.50 rows=2000 width=35)
(actual time=1.524..4.081 rows=2000 loops=1)
Recheck Cond: ((employer_name)::text = 'Hogwarts'::text)
Heap Blocks: exact=79
-> Bitmap Index Scan on owl_employer_name_idx1 (cost=0.00..23.00
rows=2000 width=0) (actual time=1.465..1.465 rows=2000 loops=1)
Index Cond: ((employer_name)::text = 'Hogwarts'::text)
Planning time: 15.642 ms
Execution time: 5.309 ms
(7 rows)
With an index and a common value
2000 owls work at Hogwarts
28. Bitmap Heap Scan
- The tuple pointers from the index are ordered by physical location
- The heap is then visited in that order
This limits physical jumps between rows.
Why the recheck condition? If the bitmap gets too big:
- The bitmap only records the pages that contain matching rows
- The scan visits those pages and rechecks the condition on each row
29. So we have 3 types of scan
1. Sequential scan
2. Index scan
3. Bitmap heap scan
And now let’s join stuff !
30. And now let’s join !
Nested loops
EXPLAIN ANALYZE SELECT * FROM owl JOIN job ON (job.id = owl.job_id)
WHERE job.id=1;
QUERY PLAN
-------------------------------------------------------------------
Nested Loop (cost=0.00..296.14 rows=9003 width=56)
(actual time=0.093..4.081 rows=9001 loops=1)
-> Seq Scan on job (cost=0.00..1.09 rows=1 width=21) (actual
time=0.064..0.064 rows=1 loops=1)
Filter: (id = 1)
Rows Removed by Filter: 6
-> Seq Scan on owl (cost=0.00..205.03 rows=9003 width=35)
(actual time=0.015..2.658 rows=9001 loops=1)
Filter: (job_id = 1)
Rows Removed by Filter: 1001
Planning time: 0.188 ms
Execution time: 4.757 ms
31. Nested loops
Python version
jobs = Job.objects.all()
owls = Owl.objects.all()
for owl in owls:
    for job in jobs:
        if owl.job_id == job.id:
            owl.job = job
            break
- Used for small tables
- Complexity of O(n*m)
32. Hash Join
EXPLAIN ANALYZE SELECT * FROM owl JOIN job ON (job.id = owl.job_id) WHERE job.id > 1;
QUERY PLAN
-------------------------------------------------------------------------------------------------
Hash Join (cost=1.17..318.70 rows=10001 width=56) (actual time=0.058..3.830 rows=1000 loops=1)
Hash Cond: (owl.job_id = job.id)
-> Seq Scan on owl (cost=0.00..180.01 rows=10001 width=35) (actual time=0.039..2.170
rows=10002 loops=1)
-> Hash (cost=1.09..1.09 rows=7 width=21) (actual time=0.010..0.010 rows=6 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on job (cost=0.00..1.09 rows=7 width=21) (actual time=0.007..0.009 rows=6
loops=1)
Filter: (id > 1)
Rows Removed by Filter: 1
Planning time: 2.327 ms
Execution time: 3.905 ms
(10 rows)
33. Hash Join
Python version
jobs = Job.objects.all()
jobs_dict = {}
for job in jobs:
    jobs_dict[job.id] = job
owls = Owl.objects.all()
for owl in owls:
    owl.job = jobs_dict[owl.job_id]
- Used when the hashed table is small
- The hash table has to fit in memory (you wouldn’t build a
Python dictionary with 1M rows ;))
- If the table is really small, a nested loop is used instead,
because of the overhead of building a hash table
35. Merge Join - 2
Used for big tables; an index can be used to avoid sorting
Human (sorted by id)    Letters (sorted by receiver_id)
Lida (id = 1)           Letter (receiver_id = 1)
Mattie (id = 2)         Letter (receiver_id = 1)
Cameron (id = 3)        Letter (receiver_id = 1)
Carol (id = 4)          Letter (receiver_id = 2)
Maxie (id = 5)          Letter (receiver_id = 2)
Candy (id = 6)          Letter (receiver_id = 3)
…                       …
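The deck shows Python versions for nested loop and hash join; a comparable sketch for merge join, assuming both inputs arrive sorted on the join key and human ids are unique, might look like this (plain lists stand in for the querysets):

```python
def merge_join(humans, letters):
    """Join humans (sorted by id) with letters (sorted by receiver_id)."""
    result = []
    i = j = 0
    while i < len(humans) and j < len(letters):
        hid = humans[i]["id"]
        rid = letters[j]["receiver_id"]
        if hid < rid:
            i += 1          # advance whichever side has the smaller key
        elif hid > rid:
            j += 1
        else:
            # Collect every letter for this human before moving on.
            while j < len(letters) and letters[j]["receiver_id"] == hid:
                result.append((humans[i], letters[j]))
                j += 1
            i += 1
    return result

humans = [{"id": 1, "name": "Lida"}, {"id": 2, "name": "Mattie"},
          {"id": 3, "name": "Cameron"}]
letters = [{"receiver_id": 1}, {"receiver_id": 1}, {"receiver_id": 3}]
joined = merge_join(humans, letters)  # 3 (human, letter) pairs
```

This is why merge join shines on large, already-ordered inputs: each side is read once, in order, with no hash table to build.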
36. So we have 3 types of joins
1. Nested loop
2. Hash join
3. Merge join
And now, ORDER BY
37. And now let’s order stuff…
EXPLAIN ANALYZE SELECT * FROM human ORDER BY last_name;
QUERY PLAN
-------------------------------------------------------------
Sort (cost=894.39..919.39 rows=10000 width=23)
(actual time=163.228..164.211 rows=10000 loops=1)
Sort Key: last_name
Sort Method: quicksort Memory: 1166kB
-> Seq Scan on human (cost=0.00..230.00 rows=10000 width=23)
(actual time=14.341..17.593 rows=10000 loops=1)
Planning time: 0.189 ms
Execution time: 164.702 ms
(6 rows)
Everything is sorted in memory
(which is why it can be costly in terms of memory)
38. ORDER BY LIMIT
EXPLAIN ANALYZE SELECT * FROM human
ORDER BY last_name LIMIT 3;
QUERY PLAN
---------------------------------------------------------------
Limit (cost=446.10..446.12 rows=10 width=23)
(actual time=11.942..11.944 rows=3 loops=1)
-> Sort (cost=446.10..471.10 rows=10000 width=23)
(actual time=11.942..11.942 rows=3 loops=1)
Sort Key: last_name
Sort Method: top-N heapsort Memory: 25kB
-> Seq Scan on human (cost=0.00..230.00 rows=10000
width=23) (actual time=0.074..0.947 rows=10000 loops=1)
Planning time: 0.074 ms
Execution time: 11.966 ms
(7 rows)
As with quicksort, all the rows have to be examined… so why is the memory usage so much smaller?
39. Top-N heap sort
- A heap (a sort of tree) of limited size (N) is used
- For each row
- If the heap is not full: add the row to the heap
- Else
- If the value is smaller than the largest value currently
kept (for ASC): insert the row into the heap, pop the largest one
- Else skip the row
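The steps above can be sketched with Python's heapq (illustrative only; heapq is a min-heap, so a small wrapper inverts comparisons to get the bounded max-heap):

```python
import heapq

# Top-N heapsort sketch: keep at most N rows in a bounded max-heap;
# any row larger than the current N-th smallest is skipped immediately.
class _MaxKey:
    def __init__(self, key):
        self.key = key
    def __lt__(self, other):  # reversed: larger keys float to the heap top
        return self.key > other.key

def top_n(rows, key, n):
    heap = []  # heap[0] holds the largest key currently kept
    for row in rows:
        k = key(row)
        if len(heap) < n:
            heapq.heappush(heap, (_MaxKey(k), row))
        elif k < heap[0][0].key:
            # Smaller than the current N-th value: swap it in, pop the largest.
            heapq.heapreplace(heap, (_MaxKey(k), row))
        # Otherwise the row can be discarded right away.
    return sorted((row for _, row in heap), key=key)

humans = [(1, "Potter"), (2, "Bailey"), (3, "Acosta"),
          (4, "Weasley"), (5, "Caroll")]
first_three = top_n(humans, key=lambda h: h[1], n=3)
```

Only N rows are ever held in memory, which is why the plan above reports 25kB instead of 1166kB.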
40. Top-N heap sort
Example LIMIT 3
Human
id last_name
1 Potter
2 Bailey
3 Acosta
4 Weasley
5 Caroll
… …
41. Ordering with an index
EXPLAIN ANALYZE SELECT * FROM human ORDER BY last_name LIMIT 5;
QUERY PLAN
-------------------------------------------------------------
Limit (cost=0.29..0.70 rows=5 width=23)
(actual time=0.606..0.611 rows=5 loops=1)
-> Index Scan using human_last_name_idx on human
(cost=0.29..834.27 rows=10000 width=23)
(actual time=0.605..0.610 rows=5 loops=1)
Planning time: 0.860 ms
Execution time: 0.808 ms
(4 rows)
Simply uses the index order
(after CREATE INDEX ON human (last_name);)
42. Be careful when you ORDER BY !
1. Sorting on a sort key without a limit or an index can be
heavy in terms of memory !
2. You might need an index; only EXPLAIN will tell
you
44. A slightly more complex example
The Ministry of Magic wants to identify suspect humans.
To do that, they ask their DBM (Database Magician) for
- The list of the letters sent by Voldemort
- For each letter, the id and date of the reply (if it exists)
- All of this with the first and last name of the receiver
46. The Python version
letters_from_voldemort = (
    Letters.objects.filter(sender_id=3267)
    .select_related('receiver').order_by('sent_at')
)
letters_to_voldemort = Letters.objects.filter(receiver_id=3267).order_by('sent_at')
data = []
for letter in letters_from_voldemort:
    for answer in letters_to_voldemort:
        if letter.receiver_id == answer.sender_id and letter.sent_at < answer.sent_at:
            data.append([letter.receiver.first_name, letter.receiver.last_name,
                         letter.receiver.id, letter.id, letter.sent_at,
                         answer.id, answer.sent_at])
            break
    else:
        data.append([letter.receiver.first_name, letter.receiver.last_name,
                     letter.receiver.id, letter.id, letter.sent_at])
Takes about 1540 ms
47. Why not have fun with SQL ?
SELECT human.first_name, human.last_name, receiver_id, letter_id,
sent_at, answer_id, answer_sent_at
FROM (
SELECT
id as letter_id, receiver_id, sent_at, sender_id
FROM letters
WHERE
sender_id=3267
ORDER BY sent_at
) l1 LEFT JOIN LATERAL (
SELECT
id as answer_id,
sent_at as answer_sent_at
FROM letters
WHERE
sender_id = l1.receiver_id
AND sent_at > l1.sent_at
AND receiver_id = l1.sender_id
LIMIT 1
) l2 ON true JOIN human ON (human.id=receiver_id)
49. What is my problem ?
-> Sort (cost=8578.33..8578.43 rows=40 width=20) (actual
time=53.376..53.439 rows=1067 loops=1)
Sort Key: letters.sent_at
Sort Method: quicksort Memory: 132kB
-> Seq Scan on letters (cost=0.00..8577.26 rows=40
width=20) (actual time=0.939..53.127 rows=1067 loops=1)
Filter: (sender_id = 3267)
Rows Removed by Filter: 410794
-> Limit (cost=0.00..10636.57 rows=1 width=12) (actual
time=45.356..45.356 rows=0 loops=1067)
-> Seq Scan on letters letters_1 (cost=0.00..10636.57
rows=1 width=12) (actual time=45.346..45.346 rows=0
loops=1067)
Filter: ((sent_at > l1.sent_at) AND (sender_id =
l1.receiver_id) AND (receiver_id = l1.sender_id))
Rows Removed by Filter: 410896
The query planner is using a sequential scan to filter almost our entire table…
So clearly we are missing some indexes on the columns receiver_id, sender_id and sent_at.
50. Let’s create the following index
CREATE INDEX ON letters (sender_id, sent_at, receiver_id);
Quick reminder on multi-column indexes
sender_id | sent_at             | receiver_id | Pointer to
1         | 2010-03-05 20:18:00 | 2           | letter (id=1)
2         | 2010-03-09 11:44:00 | 7           | letter (id=11)
3         | 2010-04-02 14:38:00 | 9           | letter (id=16)
4         | 2010-03-05 20:18:00 | 3           | letter (id=2)
5         | 2015-03-05 20:18:00 | 1           | …
6         | 2010-01-05 20:18:00 | 4           | …
…
51. Multi-column indexes
sender_id | sent_at             | receiver_id | Pointer to
1         | 2010-03-05 20:18:00 | 2           | letter (id=1)
2         | 2010-03-09 11:44:00 | 7           | letter (id=11)
3         | 2010-04-02 14:38:00 | 9           | letter (id=16)
4         | 2010-03-05 20:18:00 | 3           | letter (id=2)
5         | 2015-03-05 20:18:00 | 1           | …
6         | 2010-01-05 20:18:00 | 4           | …
…
The first column of the index is ordered
The index will be used for
SELECT … FROM letters WHERE sender_id=…;
The same is true for
SELECT … FROM letters WHERE sender_id=…
AND sent_at = …;
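This leftmost-prefix rule can be modeled in plain Python with a sorted list of tuples and binary search (an illustrative model only, not Postgres's actual B-tree; data as in the slide):

```python
import bisect

# Model of a multi-column index: entries kept sorted by
# (sender_id, sent_at, receiver_id). A lookup on a leftmost prefix can
# binary-search; a lookup on receiver_id alone cannot.
index = sorted([
    (1, "2010-03-05 20:18:00", 2),
    (2, "2010-03-09 11:44:00", 7),
    (3, "2010-04-02 14:38:00", 9),
    (4, "2010-03-05 20:18:00", 3),
    (5, "2015-03-05 20:18:00", 1),
    (6, "2010-01-05 20:18:00", 4),
])

def lookup_sender(sender_id):
    # WHERE sender_id = … : binary search on the leading column.
    lo = bisect.bisect_left(index, (sender_id,))
    hi = bisect.bisect_left(index, (sender_id + 1,))
    return index[lo:hi]

def lookup_receiver(receiver_id):
    # WHERE receiver_id = … : the third column is not sorted on its own,
    # so the whole index (or table) has to be scanned.
    return [entry for entry in index if entry[2] == receiver_id]
```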
52. The order of the columns matters !
receiver_id | Pointer to
2           | letter (id=1)
7           | letter (id=11)
9           | letter (id=16)
3           | letter (id=2)
1           | …
4           | …
…
In our previous query we had
SELECT … FROM letters WHERE sender_id=3267 ORDER BY sent_at
And then in the LATERAL JOIN
SELECT … FROM letters WHERE sender_id = l1.receiver_id
AND sent_at > l1.sent_at
AND receiver_id = l1.sender_id
The index won’t be used for
SELECT … FROM letters WHERE receiver_id=…;
The index (sender_id, sent_at, receiver_id) works fine
53. Better, no ? 180 times faster than Python
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=154.76..767.34 rows=40 width=47) (actual time=1.092..7.694 rows=1067 loops=1)
-> Hash Join (cost=154.34..422.24 rows=40 width=39) (actual time=1.080..3.471 rows=1067 loops=1)
Hash Cond: (human.id = l1.receiver_id)
-> Seq Scan on human (cost=0.00..230.00 rows=10000 width=23) (actual time=0.076..0.797 rows=10000 loops=1)
-> Hash (cost=153.84..153.84 rows=40 width=20) (actual time=0.992..0.992 rows=1067 loops=1)
Buckets: 2048 (originally 1024) Batches: 1 (originally 1) Memory Usage: 71kB
-> Subquery Scan on l1 (cost=153.34..153.84 rows=40 width=20) (actual time=0.520..0.813 rows=1067 loops=1)
-> Sort (cost=153.34..153.44 rows=40 width=20) (actual time=0.520..0.630 rows=1067 loops=1)
Sort Key: letters.sent_at
Sort Method: quicksort Memory: 132kB
-> Bitmap Heap Scan on letters (cost=4.73..152.27 rows=40 width=20) (actual time=0.089..0.304
rows=1067 loops=1)
Recheck Cond: (sender_id = 3267)
Heap Blocks: exact=74
-> Bitmap Index Scan on letters_sender_id_sent_at_receiver_id_idx (cost=0.00..4.72
rows=40 width=0) (actual time=0.079..0.079 rows=1067 loops=1)
Index Cond: (sender_id = 3267)
-> Limit (cost=0.42..8.61 rows=1 width=12) (actual time=0.004..0.004 rows=0 loops=1067)
-> Index Scan using letters_sender_id_sent_at_receiver_id_idx on letters letters_1 (cost=0.42..8.61 rows=1
width=12) (actual time=0.003..0.003 rows=0 loops=1067)
Index Cond: ((sender_id = l1.receiver_id) AND (sent_at > l1.sent_at) AND (receiver_id = l1.sender_id))
Planning time: 0.565 ms
Execution time: 7.804 ms
(20 rows)
54. Better, no ?
-> Bitmap Heap Scan on letters (cost=4.73..152.27 rows=40 width=20) (actual time=0.089..0.304 rows=1067 loops=1)
Recheck Cond: (sender_id = 3267)
Heap Blocks: exact=74
-> Bitmap Index Scan on letters_sender_id_sent_at_receiver_id_idx (cost=0.00..4.72 rows=40 width=0) (actual
time=0.079..0.079 rows=1067 loops=1)
Index Cond: (sender_id = 3267)
-> Limit (cost=0.42..8.61 rows=1 width=12) (actual time=0.004..0.004 rows=0 loops=1067)
-> Index Scan using letters_sender_id_sent_at_receiver_id_idx on letters letters_1 (cost=0.42..8.61 rows=1 width=12)
(actual time=0.003..0.003 rows=0 loops=1067)
Index Cond: ((sender_id = l1.receiver_id) AND (sent_at > l1.sent_at) AND (receiver_id = l1.sender_id))
The index is used where there
were sequential scans.
55. Focusing on this hash join…
-> Hash Join (cost=154.34..422.24 rows=40 width=39) (actual time=1.080..3.471 rows=1067 loops=1)
Hash Cond: (human.id = l1.receiver_id)
-> Seq Scan on human (cost=0.00..230.00 rows=10000 width=23) (actual time=0.076..0.797 rows=10000 loops=1)
-> Hash (cost=153.84..153.84 rows=40 width=20) (actual time=0.992..0.992 rows=1067 loops=1)
Buckets: 2048 (originally 1024) Batches: 1 (originally 1) Memory Usage: 71kB
-> Subquery Scan on l1 (cost=153.34..153.84 rows=40 width=20) (actual time=0.520..0.813 rows=1067 loops=1)
A hash table is built from the subquery (1067 rows),
keyed on receiver_id (22, 94, 104, 114, 125, …),
each key pointing to the matching rows
(row 981, row 801, row 132, row 203, row 42, row 12, row 26, row 1012, …).
The human table (10 000 rows: Human 1, Human 2, …, Human 22, …, Human 114, …)
is then scanned, and each human.id is probed against the hash table.
56. Let’s use a pagination…
SELECT human.first_name, human.last_name, receiver_id, letter_id,
sent_at, answer_id, answer_sent_at
FROM (
SELECT
id as letter_id, receiver_id, sent_at, sender_id
FROM letters
WHERE
sender_id=3267 AND sent_at > '2010-03-22'
ORDER BY sent_at LIMIT 20
) l1 LEFT JOIN LATERAL (
SELECT
id as answer_id,
sent_at as answer_sent_at
FROM letters
WHERE
sender_id = l1.receiver_id
AND sent_at > l1.sent_at
AND receiver_id = l1.sender_id
LIMIT 1
) l2 ON true JOIN human ON (human.id=receiver_id)
57. Let’s use a pagination…
QUERY PLAN
----------------------------------------------------------------------------------------------------------
Nested Loop (cost=1.13..417.82 rows=20 width=47) (actual time=0.049..0.365 rows=20 loops=1)
-> Nested Loop Left Join (cost=0.84..255.57 rows=20 width=28) (actual time=0.040..0.231 rows=20
loops=1)
-> Limit (cost=0.42..82.82 rows=20 width=20) (actual time=0.022..0.038 rows=20 loops=1)
-> Index Scan using letters_sender_id_sent_at_receiver_id_idx on letters
(cost=0.42..165.22 rows=40 width=20) (actual time=0.020..0.035 rows=20 loops=1)
Index Cond: ((sender_id = 3267) AND (sent_at > '2010-03-22 00:00:00+01'::timestamp
with time zone))
-> Limit (cost=0.42..8.61 rows=1 width=12) (actual time=0.009..0.009 rows=0 loops=20)
-> Index Scan using letters_sender_id_sent_at_receiver_id_idx on letters letters_1
(cost=0.42..8.61 rows=1 width=12) (actual time=0.008..0.008 rows=0 loops=20)
Index Cond: ((sender_id = letters.receiver_id) AND (sent_at > letters.sent_at) AND
(receiver_id = letters.sender_id))
-> Index Scan using humain_pkey on human (cost=0.29..8.10 rows=1 width=23) (actual time=0.006..0.006
rows=1 loops=20)
Index Cond: (id = letters.receiver_id)
Planning time: 0.709 ms
Execution time: 0.436 ms
(12 rows)
59. Thank you for your attention !
Any questions?
Owly design: zimmoriarty (https://www.instagram.com/zimmoriarty/)
60. To go further - sources
https://momjian.us/main/writings/pgsql/optimizer.pdf
https://use-the-index-luke.com/sql/plans-dexecution/postgresql/operations
http://tech.novapost.fr/postgresql-application_name-django-settings.html
Slides use the owl drawings and paintings from
https://www.instagram.com/zimmoriarty