Python is often used to build and maintain application backends. When the backend implements user-oriented workflows, it may rely on an RDBMS component to take care of the system's integrity.
PostgreSQL is the world's most advanced open source relational database, and it is very good at taking care of your system's integrity. PostgreSQL also comes with a ton of data processing power, and in many cases a simple enough SQL statement can replace hundreds of lines of Python code.
In this talk, we learn advanced SQL techniques and how to reason about which parts of the backend code belong in the database, and which parts are easier to write as a SQL query.
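To make the trade-off concrete, here is a minimal sketch (the table, data, and names are made up for illustration), using Python's built-in sqlite3 as a stand-in for PostgreSQL: a hand-rolled Python aggregation next to the single SQL statement that replaces it.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL here; the same SQL runs
# against Postgres through a driver such as psycopg2.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 10.0), ("alice", 5.0), ("bob", 7.5)],
)

# The Python version: loop, accumulate, sort -- it only grows as
# requirements are added.
totals = {}
for customer, amount in conn.execute("SELECT customer, amount FROM orders"):
    totals[customer] = totals.get(customer, 0.0) + amount
python_result = sorted(totals.items())

# The SQL version: one declarative statement does the same work.
sql_result = conn.execute(
    "SELECT customer, SUM(amount) FROM orders"
    " GROUP BY customer ORDER BY customer"
).fetchall()

assert python_result == sql_result == [("alice", 15.0), ("bob", 7.5)]
```

The database can additionally enforce the integrity constraints the abstract mentions, which the Python loop never sees.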
An introduction to data engineering & data science using Apache Spark and Java.
Get Spark in Action 2e, at http://jgp.ai/sia.
In this presentation, I start by loading a few CSV files into Spark (ingestion) and displaying them with the help of this new tool I built, dṛṣṭi.
As you can expect, I clean the data, join it, transform it, and continue to visualize it through dṛṣṭi.
I use Delta Lake to create a cache for my data, explain what imputation is, and show how I can use imputation on my datasets to add the missing datapoints.
I then use Spark on simple linear regressions to predict/forecast data.
dṛṣṭi is open source (Apache 2 license) and is available at: https://github.com/jgperrin/ai.jgp.drsti.
All the labs are available at https://github.com/jgperrin/ai.jgp.drsti-spark.
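The imputation and regression steps described above can be sketched in plain Python on a toy series; this is only the underlying math, not the Spark or dṛṣṭi APIs, and all names here are ours.

```python
def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def linear_regression(xs, ys):
    """Ordinary least squares for y = a*x + b with one feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return a, b

series = [1.0, 2.0, None, 4.0]
filled = impute_mean(series)       # None replaced by mean of 1, 2, 4
xs = list(range(len(filled)))
a, b = linear_regression(xs, filled)
forecast = a * 4 + b               # predict the next point in the series
```

Spark does the same at scale; the fitted slope and intercept play the role of the model the talk trains.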
The Art of PostgreSQL | PostgreSQL Ukraine | Dimitri Fontaine | Citus Data
PostgreSQL is the World’s Most Advanced Open Source Relational Database, and by the end of this talk you will understand what that means for you, an application developer: what kind of problems PostgreSQL can solve for you, and how much you can rely on PostgreSQL in your daily activities, including unit-testing.
The Art of PostgreSQL | PostgreSQL Ukraine Meetup | Dimitri Fontaine | Citus Data
PostgreSQL is the World’s Most Advanced Open Source Relational Database, and by the end of this talk you will understand what that means for you, an application developer: what kind of problems PostgreSQL can solve for you, and how much you can rely on PostgreSQL in your daily activities, including unit-testing.
CARTO en 5 Pasos: del Dato a la Toma de Decisiones (CARTO in 5 Steps: From Data to Decision Making) [CARTO] | CARTO
In this webinar we walk through each of the five steps that the CARTO platform follows for effective data-driven decision making, demonstrated using the Los Angeles real estate market as an example.
Watch it now at: https://go.carto.com/carto-pasos-dato-toma-decisiones-recorded
Data Modeling, Normalization, and Denormalisation | PostgreSQL Conference Eur... | Citus Data
As a developer using PostgreSQL, one of the most important tasks you have to deal with is modeling the database schema for your application. In order to achieve a solid design, it’s important to understand how the schema is going to be used, as well as the trade-offs it involves.
As Fred Brooks said: “Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious.”
In this talk we're going to see practical normalization examples and their benefits, and also review some anti-patterns and their typical PostgreSQL solutions, including denormalization techniques enabled by advanced data types.
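As a minimal illustration of the normalization the abstract discusses (the schema is hypothetical, and Python's sqlite3 stands in for PostgreSQL): repeated category names are moved into their own table, and the flat view is reconstructed with a join when reading.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE         -- each fact stored exactly once
);
CREATE TABLE product (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    category_id INTEGER NOT NULL REFERENCES category(id)
);
""")
conn.execute("INSERT INTO category (id, name) VALUES (1, 'book'), (2, 'game')")
conn.execute(
    "INSERT INTO product (name, category_id) VALUES ('SQL 101', 1), ('Chess', 2)"
)

# The denormalized shape is recovered on demand with a join:
rows = conn.execute("""
    SELECT product.name, category.name
      FROM product JOIN category ON category.id = product.category_id
     ORDER BY product.name
""").fetchall()
assert rows == [("Chess", "game"), ("SQL 101", "book")]
```

Renaming a category now touches one row instead of every product, which is the kind of trade-off the talk examines.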
You're about to set up a new data warehouse, modify an existing one, or maybe you are struggling with dates in reports, dashboards, and so on. If so, this article will help you understand the benefits of a date dimension table. I will even try to prove to you that a data warehouse should always contain a date dimension table, and that cubes, reports, and dashboards that are based on dates use a date dimension table in 99% of cases (or should).
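A toy sketch of what such a date dimension table contains, generated with Python's standard library (the column names are illustrative; real warehouses add many more attributes such as fiscal periods and holiday flags):

```python
from datetime import date, timedelta

def date_dimension(start, end):
    """One row per calendar day between start and end, inclusive."""
    rows, day = [], start
    while day <= end:
        rows.append({
            "date_key": day.strftime("%Y%m%d"),   # surrogate key
            "year": day.year,
            "month": day.month,
            "day": day.day,
            "weekday": day.strftime("%A"),
            "is_weekend": day.weekday() >= 5,
        })
        day += timedelta(days=1)
    return rows

dim = date_dimension(date(2024, 1, 1), date(2024, 1, 7))
assert len(dim) == 7
assert dim[0]["weekday"] == "Monday"    # 2024-01-01 was a Monday
assert dim[5]["is_weekend"]             # 2024-01-06, a Saturday
```

Facts then join to this table on `date_key`, so "sales by weekday" or "weekend vs weekday" queries become simple group-bys.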
The Sum of our Parts: the Complete CARTO Journey [CARTO] | CARTO
In this webinar, we put all of the pieces together: using the example of the Real Estate market in Los Angeles, we show how the CARTO platform powers a data-to-decision workflow, showcasing every step along the way.
Watch it now at: https://go.carto.com/sum-parts-complete-carto-journey-webinar-recorded
How to write SQL queries | pgDay Paris 2019 | Dimitri Fontaine | Citus Data
Most of the time we see finished SQL queries, whether in code repositories, blog posts, or talk slides. This talk focuses on the process of writing an SQL query, from a problem statement expressed in English through code review and long-term maintenance of SQL code.
Uncovering SQL Server query problems with execution plans - Tony Davis | Red Gate Software
Presentation by Tony Davis at SQL in The City 2016. An execution plan tells you exactly which tables and indexes SQL Server accessed, in what order, and what other operations it performed to return the data your query needed. But sometimes, the plan for even the simplest-looking query can reveal nasty surprises.
This session describes how SQL Server generates and reuses execution plans and the implications this has for you as the developer. After a quick-start guide to retrieving and reading plans, we'll focus on techniques that can help you track down high-cost queries quickly.
We'll cover tools such as ANTS Performance Profiler, as well as scripts that hunt down execution plans for queries that caused expensive scans, sort warnings, and other issues. Examining those plans, you'll uncover the root cause of the problem, often revealing issues such as inefficient indexing, data type mismatches, and misuse of functions.
Learn more about ANTS Performance Profiler: http://www.red-gate.com/products/dotnet-development/ants-performance-profiler/
Find out about all Redgate Products: http://www.red-gate.com/products/
Connect with Tony Davis on LinkedIn: https://www.linkedin.com/in/tony-davis-208b241
This presentation shows how to create a time/date dimension for PowerPivot from the date data in your fact table. It also shows the DAX functions that you can use to add columns to the fact table or to a separate dimension table.
Company segmentation - an approach with R | Casper Crause
We classify companies based on how their stocks trade, using their daily stock returns (the percentage movement from one day to the next). This analysis will help your organization determine which companies are related to each other (competitors with similar attributes).
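The input to that classification, daily stock returns, is simple to compute; here is a plain-Python sketch (the talk itself uses R, and the price series is made up):

```python
def daily_returns(prices):
    """Percentage movement from each day to the next."""
    return [
        (today - yesterday) / yesterday
        for yesterday, today in zip(prices, prices[1:])
    ]

prices = [100.0, 110.0, 99.0]
returns = daily_returns(prices)
# One fewer return than prices: +10% then -10%.
assert len(returns) == len(prices) - 1
```

Companies whose return series move together (for example, by correlation) end up in the same segment.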
Running Intelligent Applications inside a Database: Deep Learning with Python... | Miguel González-Fierro
In this talk we present a new paradigm of computation where the intelligence is computed inside the database. Standard software systems must get the data out of the database to execute a routine. If the size of the data is big, there are inefficiencies due to the data movement. Stored procedures tried to solve this issue in the past, allowing simple functions to be computed inside the database. However, only simple routines can be executed.
To showcase the capabilities of our new system, we created a lung cancer detection algorithm using Microsoft’s Cognitive Toolkit, also known as CNTK. We used transfer learning between ImageNet dataset, which contains natural images, and a lung cancer dataset, which contains scans of horizontal sections of the lung for healthy and sick patients. Specifically, a pretrained Convolutional Neural Network on ImageNet is used on the lung cancer dataset to generate features. Once the features are computed, a boosted tree is applied to predict whether the patient has cancer or not.
All this processing is computed inside the database, so data movement is minimized. We are even able to execute the algorithm using the GPU of the virtual machine that hosts the database. Using a GPU, we can compute the featurization in less than 1 hour, in contrast to a CPU, which would take up to 32 hours. Finally, we set up an API to connect the solution to a web app, where a doctor can analyze the images and get a prediction for a patient.
Architecting petabyte-scale analytics by scaling out Postgres on Azure with ... | Citus Data
A story about powering a 1.5 petabyte internal analytics application at Microsoft with 2816 cores and 18.7 TB of memory in the Citus cluster.
The internal RQV analytics dashboard at Microsoft helps the Windows team to assess the quality of upcoming Windows releases. The system tracks 20,000 diagnostic and quality metrics, digests data from 800 million Windows devices and currently supports over 6 million queries per day, with hundreds of concurrent users. The RQV analytics dashboard relies on Postgres—along with the Citus extension to Postgres to scale out horizontally—and is deployed on Microsoft Azure.
Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi... | Citus Data
As a developer using PostgreSQL, one of the most important tasks you have to deal with is modeling the database schema for your application. In order to achieve a solid design, it’s important to understand how the schema is going to be used, as well as the trade-offs it involves.
As Fred Brooks said: “Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious.”
In this talk we're going to see practical normalization examples and their benefits, and also review some anti-patterns and their typical PostgreSQL solutions, including denormalization techniques enabled by advanced data types.
JSONB Tricks: Operators, Indexes, and When (Not) to Use It | PostgresOpen 201... | Citus Data
When do you use jsonb, and when don’t you? How do you make it fast? What operators are available, and what can they do? How will this change? These are all very good questions, but jsonb support in Postgres moves so fast that it’s hard to keep up.
In this talk, you will get details on these topics, complete with practical examples and real-world stories:
- When to use jsonb, what it’s good for, and when to not use it
- Operators and how to use them effectively
- Indexing, operator support for indexes, and the tradeoffs involved
- Postgres 12 improvements and new features
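As a rough illustration of what the jsonb containment operator (@>) asks, here it is mirrored on plain Python dictionaries; this is a simplified sketch of the idea, not Postgres's exact semantics.

```python
# In Postgres this question would look like, for example:
#   SELECT '{"tags": ["pg"], "views": 42}'::jsonb @> '{"views": 42}'::jsonb;
def contains(doc, fragment):
    """Does doc contain fragment, key by key, recursively?"""
    if isinstance(fragment, dict):
        return isinstance(doc, dict) and all(
            k in doc and contains(doc[k], v) for k, v in fragment.items()
        )
    if isinstance(fragment, list):
        # Every element of the fragment must match some element of doc.
        return isinstance(doc, list) and all(
            any(contains(item, f) for item in doc) for f in fragment
        )
    return doc == fragment

row = {"tags": ["postgres", "jsonb"], "views": 42}
assert contains(row, {"views": 42})
assert contains(row, {"tags": ["jsonb"]})
assert not contains(row, {"views": 7})
```

GIN indexes make exactly this kind of containment query fast, which is one of the index/operator trade-offs the talk covers.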
Tutorial: Implementing your first Postgres extension | PGConf EU 2019 | Burak... | Citus Data
One of the strongest features of any database is its extensibility, and PostgreSQL comes with a rich extension API. It allows you to define new functions, types, and operators. It even allows you to modify some of its core parts, like the planner, executor, or storage engine. You read that right: you can even change the behavior of the PostgreSQL planner. How cool is that?
Such freedom in extensibility created a strong extension community around PostgreSQL and made way for a vast number of extensions such as pg_stat_statements, citus, postgresql-hll, and many more.
In this tutorial, we will look at how you can create your own PostgreSQL extension. We will start with more common tasks like defining new functions and types, but gradually explore less known parts of PostgreSQL's extension API, like C-level hooks which let you change the behavior of the planner, executor, and other core parts of PostgreSQL. We will see how to code, debug, compile, and test our extension. After that, we will also look into how to package and distribute our extension for other people to use.
To get the best benefit from the tutorial, C and SQL knowledge would be beneficial. Some knowledge of PostgreSQL internals would also be useful, but we will cover the necessary details, so it is not required.
What's wrong with Postgres | PGConf EU 2019 | Craig Kerstiens | Citus Data
Postgres is a powerful database; it continues to improve in terms of performance, extensibility, and, more broadly, features. However, it is not perfect.
Here I'll cover a highly opinionated view of all the areas where Postgres falls flat, with some rough ideas on how we can make it better. The opinions are all informed by 10 years of interacting with customers running literally millions of databases.
When it all goes wrong | PGConf EU 2019 | Will Leinweber | Citus Data
You're woken up in the middle of the night to your phone. Your app is down and you're on call to fix it. Eventually you track it down to "something with the db," but what exactly is wrong? And of course, you're sure that nothing changed recently…
Knowing what to fix, and even where to start looking, is a skill that takes a long time to develop. Especially since Postgres normally works very well for months at a time, not letting you get practice!
In this talk, I'll share not only the more common failure cases and how to fix them, but also a general approach to efficiently figuring out what's wrong in the first place.
Amazing SQL your ORM can (or can't) do | PGConf EU 2019 | Louise Grandjonc | Citus Data
SQL can seem like an obscure and complex but powerful language, and learning it can be intimidating. As developers, we can easily be tempted to use only the basic SQL provided by the ORM. But did you know that you can use window functions in some ORMs? The same goes for a lot of other fun SQL functionality.
In this talk we will explore some advanced SQL features that you might find useful. We will discover the wonderful world of joins (lateral, cross…), subqueries, grouping sets, window functions, common table expressions.
But most importantly this talk is not only a talk to show you how great SQL is. This talk is here to show you how to use it in real life. What are the features supported by your ORM? And how can you use them if they don’t support them?
Whether you know SQL or not, and whether you are a developer or a DBA working with developers, you might learn a lot about SQL, ORMs, and application development using Postgres.
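One of the window functions this talk covers, run here through Python's sqlite3 module as a stand-in for Postgres (the table and data are made up); ORMs such as Django expose the same construct through their window-expression APIs.

```python
import sqlite3

# Requires SQLite 3.25+ (bundled with modern Python) for window functions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE score (player TEXT, team TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO score VALUES (?, ?, ?)",
    [("ana", "red", 12), ("bo", "red", 9), ("cy", "blue", 15)],
)

# Rank players within each team without collapsing rows --
# something a plain GROUP BY cannot express.
rows = conn.execute("""
    SELECT player,
           RANK() OVER (PARTITION BY team ORDER BY points DESC) AS r
      FROM score
     ORDER BY player
""").fetchall()
assert rows == [("ana", 1), ("bo", 2), ("cy", 1)]
```

The same `RANK() OVER (PARTITION BY ...)` runs unchanged on Postgres, which is the point: the ORM is a wrapper, not a ceiling.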
What Microsoft is doing with Postgres & the Citus Data acquisition | PGConf E... | Citus Data
Many people have asked us: “Why did Microsoft acquire Citus Data?” and “What do you plan to do with the Citus open source extension to Postgres?” Come join us to see the exciting work we are doing with Postgres and open source at Microsoft.
Deep Postgres Extensions in Rust | PGCon 2019 | Jeff Davis | Citus Data
Postgres relies heavily on an extension ecosystem, but that ecosystem is almost 100% dependent on C, which cuts out developers, libraries, and ideas from the world of Postgres. postgres-extension.rs changes that by supporting development of extensions in Rust. Rust is a memory-safe language that integrates nicely into any environment, has powerful libraries, a vibrant ecosystem, and a prolific developer community.
Rust is a unique language because it supports high-level features but all the magic happens at compile time, and the resulting code does not depend on an intrusive or bulky runtime. That makes it ideal for integrating with Postgres, which has a lot of its own runtime, like memory contexts and signal handlers. postgres-extension.rs offers this integration, allowing the development of extensions in Rust, even ones deeply integrated into the Postgres internals, and helping handle tricky issues like error handling. This is done through a collection of Rust function declarations, macros, and utility functions that allow Rust code to call into Postgres and safely handle resulting errors.
Why Postgres Why This Database Why Now | SF Bay Area Postgres Meetup | Claire... | Citus Data
I spent the early part of my career working on developer tools, operating systems, high-speed file systems, and scale-out storage. Not databases. Frankly, I always thought that databases were a bit boring. So almost 2 years into my new job at a Postgres company, I continue to be amazed at the enthusiasm of the PostgreSQL developer community and users. I mean, people’s eyes light up when you ask them why they love Postgres. Sure, a lot of us get animated when talking about our newest gadget, or Ronaldo’s phenomenal free-kick goal in the World Cup, or mint chip gelato from La Strega Nocciola—but most platform software simply doesn’t trigger this kind of passion. So why does Postgres? Why is this open source database having such a “moment”? Well, I’ve been trying to understand, looking at this “Postgres moment” from a few different angles. In this talk I’ll share what I’ve observed to be the top 10 business, technology, and community reasons so many of you have so much affection for PostgreSQL.
A story on Postgres index types | PostgresLondon 2019 | Louise Grandjonc | Citus Data
Want to know everything about indexes in Postgres? Here are the slides for a PostgreSQL talk, and if you want to know more, you can read articles on www.louisemeta.com.
Why developers need marketing now more than ever | GlueCon 2019 | Claire Gior... | Citus Data
Many in today’s developer world look down on marketing. I mean, after all, the marketing team is usually “not technical.” And they’re not developers. It’s 2019 and while we try to promote inclusiveness of all types, inclusiveness doesn’t seem to apply to marketers. Why? Is that OK? Who does that hurt? I grew up in engineering and spent the first 15 years of my career as a developer or an engineering manager of some type. So now that I’m in marketing, it surprised me when one of my engineering colleagues blurted out “But it’s a technical conference!” when he learned one of my talks was accepted to a technical conference.
This keynote is about why developers really need marketing. About how good marketing managers can make it so visitors to your website don’t leave empty-handed, confused about what your technology actually does or why it matters. About how the ability to translate technology into what-users-actually-care-about can make your project be the one that takes off. About why Dormain Drewitz said at Monktoberfest: “I work in product marketing. My preferred programming language is English.” Finally, this talk explores how to be sensitive to the bias against marketing that pervades some of our teams—and how to instead embrace teamwork best practices employed by sailors, where everyone in the boat has an important role to play if you are to win the race.
Optimizing your app by understanding your Postgres | RailsConf 2019 | Samay S... | Citus Data
I’m a Postgres person. Period. After talking to many Rails developers about their application performance, I realized many performance issues can be solved by understanding your database a bit better. So I thought I’d share the statistics Postgres captures for you and how you can use them to find slow queries, unused indexes, or tables which are not getting vacuumed correctly. This talk will cover Postgres tools and tips for the above, including pg_stat_statements, useful catalog tables, and recently added Postgres features such as CREATE STATISTICS.
When it all goes wrong (with Postgres) | RailsConf 2019 | Will Leinweber | Citus Data
You're woken up in the middle of the night to your phone. Your app is down and you're on call to fix it. Eventually you track it down to "something with the db," but what exactly is wrong? And of course, you're sure that nothing changed recently…
Knowing what to fix, and even where to start looking, is a skill that takes a long time to develop. Especially since Postgres normally works very well for months at a time, not letting you get practice!
In this talk, I'll share not only the more common failure cases and how to fix them, but also a general approach to efficiently figuring out what's wrong in the first place.
Using Postgres and Citus for Lightning Fast Analytics, also ft. Rollups | Liv... | Citus Data
Watch Sai Srirampur, Solutions Engineer at Citus Data (now part of the Microsoft family), give a live demo of how you can use Postgres and the Citus extension to Postgres to manage real-time analytics workloads.
Watch if you and your application need:
>> A relational database that scales for customer-facing analytics dashboards, with real-time data ingest and a large volume of queries
>> A way to scale out Postgres horizontally, to address the performance hiccups you’re experiencing as you run into the resource limits of single-node Postgres
>> A way to roll-up and pre-aggregate data to build fast data pipelines and enable sub-second response times.
>> A way to consolidate your database platforms, to avoid having separate stores for your transactional and analytics workloads
Using a 4-node Citus database cluster in the cloud, Sai will show you how Citus shards Postgres to give you lightning fast performance, at scale. Also featuring rollups.
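The roll-up idea in miniature (tables and data are made up, with Python's sqlite3 standing in for Postgres/Citus): aggregate raw events once, then serve dashboard reads from the small rollup table instead of rescanning events.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (day TEXT, customer TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("2024-01-01", "a"), ("2024-01-01", "b"), ("2024-01-02", "a")],
)
conn.execute("CREATE TABLE rollup_daily (day TEXT PRIMARY KEY, hits INTEGER)")

# Periodic rollup job: aggregate the raw events once...
conn.execute("""
    INSERT INTO rollup_daily
    SELECT day, COUNT(*) FROM events GROUP BY day
""")

# ...so each dashboard read is a cheap key lookup, not a scan.
hits = conn.execute(
    "SELECT hits FROM rollup_daily WHERE day = '2024-01-01'"
).fetchone()[0]
assert hits == 2
```

In the demo's setting, Citus distributes both the events and rollup tables across the cluster, so the rollup job itself also runs in parallel.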
When it all Goes Wrong | Nordic PGDay 2019 | Will Leinweber | Citus Data
You're woken up in the middle of the night to your phone. Your app is down and you're on call to fix it. Eventually you track it down to "something with the db," but what exactly is wrong? And of course, you're sure that nothing changed recently…
Knowing what to fix, and even where to start looking, is a skill that takes a long time to develop. Especially since Postgres normally works very well for months at a time, not letting you get practice!
In this talk, I'll share not only the more common failure cases and how to fix them, but also a general approach to efficiently figuring out what's wrong in the first place.
Why PostgreSQL Why This Database Why Now | Nordic PGDay 2019 | Claire Giordano | Citus Data
I spent the early part of my career working on developer tools, operating systems, high-speed file systems, and scale-out storage. Not databases. Frankly, I always thought that databases were a bit boring. So one year into my new job at a Postgres company, I continue to be amazed at the enthusiasm of the PostgreSQL developer community and users. I mean, people’s eyes light up when you ask them why they love Postgres. Sure, a lot of us get animated when talking about our newest iPhone, or Ronaldo’s phenomenal free-kick goal in the World Cup, or mint chip gelato from La Strega Nocciola—but most platform software simply doesn’t trigger this kind of passion. So why does Postgres? Why is this open source database having such a “moment”? Why now? Well, I’ve been trying to find out, looking at this “Postgres moment” from a few different angles. In this talk I’ll share what I’ve observed to be the top 10 business, technology, and community reasons so many of you have so much affection for PostgreSQL.
Scaling Multi-Tenant Applications Using the Django ORM & Postgres | PyCaribbe... | Citus Data
There are a number of data architectures you could use when building a multi-tenant app, such as one database per customer or one schema per customer. These two options scale to an extent, when you have, say, tens of tenants. However, as you start scaling to hundreds and thousands of tenants, you run into challenges, both from a performance and a tenant-maintenance perspective. You can solve this problem by adding the notion of tenancy directly into the logic of your SaaS application, but how do you implement and automate this in the Django ORM? We will talk about how to make a Django app tenant-aware and, at a broader level, explain how to scale out applications that are built on top of the Django ORM and follow a multi-tenant data model. We'll take PostgreSQL as our database of choice, and the logic and implementation can be extended to other relational databases as well.
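The core move can be sketched outside any framework (the scoped() helper and schema below are ours, not a Django API): every table carries a tenant_id, and every query the application runs is filtered by it, which is also the column a system like Citus can shard on.

```python
import sqlite3

# SQLite stands in for PostgreSQL; the pattern is identical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoice (tenant_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO invoice VALUES (?, ?)",
    [(1, 10.0), (1, 20.0), (2, 99.0)],
)

def scoped(sql, tenant_id, params=()):
    """Append the tenant filter so no query ever crosses tenants.

    Simplified: assumes the query has no WHERE clause of its own.
    """
    return conn.execute(sql + " WHERE tenant_id = ?", (*params, tenant_id))

total = scoped("SELECT SUM(amount) FROM invoice", tenant_id=1).fetchone()[0]
assert total == 30.0
```

In Django this filtering is typically centralized in a custom manager or middleware rather than repeated at each call site.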
Data Modeling, Normalization, and Denormalisation | FOSDEM '19 | Dimitri Fontaine, Citus Data
As a developer using PostgreSQL one of the most important tasks you have to deal with is modeling the database schema for your application. In order to achieve a solid design, it’s important to understand how the schema is then going to be used as well as the trade-offs it involves.
As Fred Brooks said: “Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious.”
In this talk we're going to see practical normalization examples and their benefits, and also review some anti-patterns and their typical PostgreSQL solutions, including denormalization techniques enabled by advanced data types.
Five data models for sharding and which is right | PGConf.ASIA 2018 | Craig K... | Citus Data
Whether you’re working with a single-node database, a distributed system, or an MPP database, a key factor in the flexibility you get with the system is how you shard or partition your data. Do you do it by customer, time, or some random UUID? Here we’ll walk through five different approaches to sharding your data and when you should consider each. If you’re thinking you need to scale beyond a single node, this will give you the start of your roadmap for doing so. We’ll cover the basics of how you can do this directly in Postgres, as well as principles that apply generically to any database.
11. ACID
A relational database management system guarantees
consistency of a system as a whole while allowing concurrent
access (read and write) to a single data set.
• Atomic
• Consistent
• Isolated
• Durable
Dimitri Fontaine (CitusData) Data Modeling, Normalization and Denormalization March 13, 2018
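The four ACID properties are visible directly from application code. As a minimal sketch of atomicity, here is a transaction that either commits as a whole or leaves no trace; Python's built-in sqlite3 stands in for PostgreSQL, and the account table is invented for illustration:

```python
import sqlite3

# The deck targets PostgreSQL; sqlite3 is used here only as a
# self-contained stand-in to show the "A" in ACID.
conn = sqlite3.connect(":memory:")
conn.execute("create table account (id integer primary key, balance integer not null)")
conn.execute("insert into account (balance) values (100), (50)")
conn.commit()

try:
    with conn:  # one atomic transaction: both updates, or neither
        conn.execute("update account set balance = balance - 80 where id = 1")
        conn.execute("update account set balance = balance + 80 where id = 2")
        raise RuntimeError("simulated crash before commit")
except RuntimeError:
    pass

# The failed transaction was rolled back: balances are unchanged.
balances = [row[0] for row in conn.execute("select balance from account order by id")]
print(balances)  # [100, 50]
```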
13. Consistent
Dimitri Fontaine (CitusData) Data Modeling, Normalization and Denormalization March 13, 2018
• Data types
• Constraints
check, not null,
pkey, fkey
• Relations
• SQL
• Schema
create table foo
(
id int,
f1 text
);
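The bullets above (check, not null, primary key, foreign key) are what turn classes of application bugs into database errors. A sketch of each constraint being enforced, again with sqlite3 standing in for PostgreSQL and invented artist/album tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")  # FK enforcement is opt-in in SQLite
conn.executescript("""
create table artist (
    id   integer primary key,
    name text not null
);
create table album (
    id        integer primary key,
    artist_id integer not null references artist(id),
    title     text not null,
    year      integer check (year >= 1900)
);
""")
conn.execute("insert into artist (id, name) values (1, 'Miles Davis')")
conn.execute("insert into album (artist_id, title, year) values (1, 'Kind of Blue', 1959)")

# Each bad insert violates one of the constraints named on the slide.
for bad in [
    "insert into album (artist_id, title, year) values (42, 'Ghost', 2000)",  # fkey
    "insert into album (artist_id, title, year) values (1, null, 2000)",      # not null
    "insert into album (artist_id, title, year) values (1, 'Old', 1800)",     # check
]:
    try:
        conn.execute(bad)
        rejected = False
    except sqlite3.IntegrityError:
        rejected = True
    print(rejected)  # True each time
```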
18. Rule 5. Data dominates.
R O B P I K E , N O T E S O N P R O G R A M M I N G I N C
“If you’ve chosen the right data structures and
organized things well, the algorithms will
almost always be self-evident. Data structures,
not algorithms, are central to programming.”
(Brooks p. 102)
20. Daily NYSE Group Volume in
NYSE Listed, 2017
2010 1/4/2010 1,425,504,460 4,628,115 $38,495,460,645
2010 1/5/2010 1,754,011,750 5,394,016 $43,932,043,406
2010 1/6/2010 1,655,507,953 5,494,460 $43,816,749,660
2010 1/7/2010 1,797,810,789 5,674,297 $44,104,237,184
create table factbook
(
year int,
date date,
shares text,
trades text,
dollars text
);
copy factbook from 'factbook.csv' with delimiter E'\t' null ''
21. Daily NYSE Group Volume in
NYSE Listed, 2017
alter table factbook
alter shares
type bigint
using replace(shares, ',', '')::bigint,
alter trades
type bigint
using replace(trades, ',', '')::bigint,
alter dollars
type numeric
using substring(replace(dollars, ',', '') from 2)::numeric;
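The USING expressions strip the thousands separators, and for the dollars column also drop the leading '$' (that is what `substring(... from 2)` does). The same cleanup in Python, as a sketch (helper names are mine):

```python
# Mirror of the ALTER TABLE ... USING expressions above.
def clean_count(text):
    """Strip thousands separators: '1,425,504,460' -> 1425504460."""
    return int(text.replace(",", ""))

def clean_dollars(text):
    """Also drop the leading currency sign, like substring(... from 2)."""
    return int(text.replace(",", "")[1:])

print(clean_count("1,425,504,460"))      # 1425504460
print(clean_dollars("$38,495,460,645"))  # 38495460645
```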
27. Monthly Report, SQL
\set start '2017-02-01'
select date,
to_char(shares, '99G999G999G999') as shares,
to_char(trades, '99G999G999') as trades,
to_char(dollars, 'L99G999G999G999') as dollars
from factbook
where date >= date :'start'
and date < date :'start' + interval '1 month'
order by date;
29. Monthly Report, Python
import psycopg2
from calendar import Calendar

CONNSTRING = "dbname=factbook"  # placeholder; point this at your database

def fetch_month_data(year, month):
    "Fetch a month of data from the database"
    date = "%d-%02d-01" % (year, month)
    sql = """
  select date, shares, trades, dollars
    from factbook
   where date >= date %s
     and date < date %s + interval '1 month'
order by date;
"""
    pgconn = psycopg2.connect(CONNSTRING)
    curs = pgconn.cursor()
    curs.execute(sql, (date, date))

    res = {}
    for (date, shares, trades, dollars) in curs.fetchall():
        res[date] = (shares, trades, dollars)
    return res

def list_book_for_month(year, month):
    """List all days for given month, and for each
    day list fact book entry.
    """
    data = fetch_month_data(year, month)
    cal = Calendar()

    print("%12s | %12s | %12s | %12s" %
          ("day", "shares", "trades", "dollars"))
    print("%12s-+-%12s-+-%12s-+-%12s" %
          ("-" * 12, "-" * 12, "-" * 12, "-" * 12))

    for day in cal.itermonthdates(year, month):
        if day.month != month:
            continue
        if day in data:
            shares, trades, dollars = data[day]
        else:
            shares, trades, dollars = 0, 0, 0
        print("%12s | %12s | %12s | %12s" %
              (day, shares, trades, dollars))
34. Monthly Report, Fixed, SQL
select cast(calendar.entry as date) as date,
coalesce(shares, 0) as shares,
coalesce(trades, 0) as trades,
to_char(
coalesce(dollars, 0),
'L99G999G999G999'
) as dollars
from /*
* Generate the target month's calendar then LEFT JOIN
* each day against the factbook dataset, so as to have
* every day in the result set, whether or not we have a
* book entry for the day.
*/
generate_series(date :'start',
date :'start' + interval '1 month'
- interval '1 day',
interval '1 day'
)
as calendar(entry)
left join factbook
on factbook.date = calendar.entry
order by date;
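generate_series() is the key trick in the fixed query: manufacture the full calendar, then LEFT JOIN the sparse data against it. As a self-contained sketch, here is the same calendar LEFT JOIN in Python's sqlite3, with a recursive CTE playing the role of generate_series (table and column names follow the factbook example; the rows are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table factbook (date text primary key, dollars integer)")
# Deliberately sparse data: no rows for most days of the month.
conn.executemany("insert into factbook values (?, ?)",
                 [("2017-02-01", 100), ("2017-02-02", 200), ("2017-02-06", 300)])

rows = conn.execute("""
with recursive calendar(entry) as (
  select date('2017-02-01')
  union all
  select date(entry, '+1 day') from calendar
   where entry < date('2017-02-01', '+1 month', '-1 day')
)
select entry, coalesce(dollars, 0)
  from calendar
  left join factbook on factbook.date = calendar.entry
 order by entry
 limit 7
""").fetchall()

# Every day appears in the result, whether or not the factbook has a row.
for entry, dollars in rows:
    print(entry, dollars)
```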
39. Monthly Report, WoW%, SQL
with computed_data as
(
select cast(date as date) as date,
to_char(date, 'Dy') as day,
coalesce(dollars, 0) as dollars,
lag(dollars, 1)
over(
partition by extract('isodow' from date)
order by date
)
as last_week_dollars
from /*
* Generate the month calendar, plus a week
* before so that we have values to compare
* dollars against even for the first week
* of the month.
*/
generate_series(date :'start' - interval '1 week',
date :'start' + interval '1 month'
- interval '1 day',
interval '1 day'
)
as calendar(date)
left join factbook using(date)
)
select date, day,
to_char(
coalesce(dollars, 0),
'L99G999G999G999'
) as dollars,
case when dollars is not null
and dollars <> 0
then round( 100.0
* (dollars - last_week_dollars)
/ dollars
, 2)
end
as "WoW %"
from computed_data
where date >= date :'start'
order by date;
Window Functions, SQL:2003
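The heart of the WoW% query is lag(dollars, 1) over (partition by day-of-week, order by date): each row is compared with the same weekday one week earlier. SQLite implements the same window functions since 3.25, so the trick can be shown self-containedly (the factbook rows below are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # needs SQLite 3.25+ for window functions
conn.execute("create table factbook (date text primary key, dollars integer)")
conn.executemany("insert into factbook values (?, ?)", [
    ("2017-02-06", 100),  # a Monday
    ("2017-02-07", 150),  # a Tuesday
    ("2017-02-13", 120),  # the next Monday
    ("2017-02-14", 180),  # the next Tuesday
])

# Partitioning by day-of-week makes lag() fetch the previous row
# of the SAME weekday, exactly as in the WoW% query above
# (strftime('%w', ...) plays the role of extract('isodow' ...)).
rows = conn.execute("""
select date, dollars,
       lag(dollars, 1) over (
           partition by strftime('%w', date)
           order by date
       ) as last_week_dollars
  from factbook
 order by date
""").fetchall()

for row in rows:
    print(row)
```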
43. Thinking in SQL
•Structured Query Language
•Declarative Programming Language
•Relational Model
•Unix: everything is a file
•Java: everything is an object
•Python: packages, modules, classes, methods
•SQL: relations
44. SQL Relations
•SELECT describes the type of the relation
•Named a projection operator
•Defines SQL Query Attribute domains
•FROM introduces base relations
•Relational Operators compute new relations
•INNER JOIN
•OUTER JOIN
•LATERAL JOIN
•set operators: UNION, EXCEPT, INTERSECT
45. SQL Relations
with decades as
(
select extract('year' from date_trunc('decade', date)) as decade
from races
group by decade
)
select decade,
rank() over(partition by decade order by wins desc) as rank,
forename, surname, wins
from decades
left join lateral
(
select code, forename, surname, count(*) as wins
from drivers
join results
on results.driverid = drivers.driverid
and results.position = 1
join races using(raceid)
where extract('year' from date_trunc('decade', races.date))
= decades.decade
group by decades.decade, drivers.driverid
order by wins desc
limit 3
)
as winners on true
order by decade asc, wins desc;
46. Top-3 Drivers by decade
decade │ rank │ forename │ surname │ wins
════════╪══════╪═══════════╪════════════╪══════
1950 │ 1 │ Juan │ Fangio │ 24
1950 │ 2 │ Alberto │ Ascari │ 13
1950 │ 3 │ Stirling │ Moss │ 12
1960 │ 1 │ Jim │ Clark │ 25
1960 │ 2 │ Graham │ Hill │ 14
1960 │ 3 │ Jack │ Brabham │ 11
1970 │ 1 │ Niki │ Lauda │ 17
1970 │ 2 │ Jackie │ Stewart │ 16
1970 │ 3 │ Emerson │ Fittipaldi │ 14
1980 │ 1 │ Alain │ Prost │ 39
1980 │ 2 │ Nelson │ Piquet │ 20
1980 │ 2 │ Ayrton │ Senna │ 20
1990 │ 1 │ Michael │ Schumacher │ 35
1990 │ 2 │ Damon │ Hill │ 22
1990 │ 3 │ Ayrton │ Senna │ 21
2000 │ 1 │ Michael │ Schumacher │ 56
2000 │ 2 │ Fernando │ Alonso │ 21
2000 │ 3 │ Kimi │ Räikkönen │ 18
2010 │ 1 │ Lewis │ Hamilton │ 45
2010 │ 2 │ Sebastian │ Vettel │ 40
2010 │ 3 │ Nico │ Rosberg │ 23
(21 rows)
48. SQL & Developer Tooling
with computed_data as
(
select cast(date as date) as date,
to_char(date, 'Dy') as day,
coalesce(dollars, 0) as dollars,
lag(dollars, 1)
over(
partition by extract('isodow' from date)
order by date
)
as last_week_dollars
from /*
* Generate the month calendar, plus a week before
* so that we have values to compare dollars against
* even for the first week of the month.
*/
generate_series(date :'start' - interval '1 week',
date :'start' + interval '1 month'
- interval '1 day',
interval '1 day'
)
as calendar(date)
left join factbook using(date)
)
select date, day,
to_char(
coalesce(dollars, 0),
'L99G999G999G999'
) as dollars,
case when dollars is not null
and dollars <> 0
then round( 100.0
* (dollars - last_week_dollars)
/ dollars
, 2)
end
as "WoW %"
from computed_data
where date >= date :'start'
order by date;
• Code Integration
• SQL Queries in .sql files
• Parameters
• Result Set To Objects
• A Result Set is a Relation
• Testing
• Unit Testing
• Regression Testing
49. Python AnoSQL
$ cat queries.sql
-- name: get-all-greetings
-- Get all the greetings in the database
SELECT * FROM greetings;
-- name: $select-users
-- Get all the users from the database,
-- and return it as a dict
SELECT * FROM USERS;
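AnoSQL works by parsing the `-- name:` comments out of a .sql file and exposing each query as a Python method on a queries object. To show the mechanism, here is a toy parser for that file format; this is an illustration of the idea, not anosql's actual implementation (the real library also handles parameters, drivers, and name operators):

```python
import re

QUERIES_SQL = """\
-- name: get-all-greetings
-- Get all the greetings in the database
SELECT * FROM greetings;

-- name: select-users
-- Get all the users from the database
SELECT * FROM USERS;
"""

def parse_named_queries(text):
    """Map each '-- name: ...' header to the SQL that follows it."""
    queries = {}
    name = None
    for line in text.splitlines():
        m = re.match(r"--\s*name:\s*(\S+)", line)
        if m:
            name = m.group(1).replace("-", "_")  # usable as a Python identifier
            queries[name] = []
        elif name and not line.startswith("--"):
            queries[name].append(line)
    return {k: "\n".join(v).strip() for k, v in queries.items()}

queries = parse_named_queries(QUERIES_SQL)
print(sorted(queries))               # ['get_all_greetings', 'select_users']
print(queries["get_all_greetings"])  # SELECT * FROM greetings;
```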
51. RegreSQL
$ regresql test
Connecting to 'postgres:///chinook?sslmode=disable'… ✓
TAP version 13
ok 1 - src/sql/album-by-artist.1.out
ok 2 - src/sql/album-tracks.1.out
ok 3 - src/sql/artist.1.out
ok 4 - src/sql/genre-topn.top-3.out
ok 5 - src/sql/genre-topn.top-1.out
ok 6 - src/sql/genre-tracks.out
55. Geolocation & earthdistance
with geoloc as
(
select location as l
from location
join blocks using(locid)
where iprange
>>=
'212.58.251.195'
)
select name,
pos <@> l miles
from pubnames, geoloc
order by pos <-> l
limit 10;
name │ miles
═════════════════════╪═══════════════════
The Windmill │ 0.238820308117723
County Hall Arms │ 0.343235607674773
St Stephen's Tavern │ 0.355548630092567
The Red Lion │ 0.417746499125936
Zeitgeist │ 0.395340599421532
The Rose │ 0.462805636194762
The Black Dog │ 0.536202634581979
All Bar One │ 0.489581827372222
Slug and Lettuce │ 0.49081531378207
Westminster Arms │ 0.42400619117691
(10 rows)
56. NBA Games Statistics
“An interesting factoid: the team that recorded the
fewest defensive rebounds in a win was the
1995-96 Toronto Raptors, who beat the
Milwaukee Bucks 93-87 on 12/26/1995 despite
recording only 14 defensive rebounds.”
57. with stats(game, team, drb, min) as (
select ts.game, ts.team, drb, min(drb) over ()
from team_stats ts
join winners w on w.id = ts.game
and w.winner = ts.team
)
select game.date::date,
host.name || ' -- ' || host_score as host,
guest.name || ' -- ' || guest_score as guest,
stats.drb as winner_drb
from stats
join game on game.id = stats.game
join team host on host.id = game.host
join team guest on guest.id = game.guest
where drb = min;
NBA Games Statistics
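The trick in the query above is min(drb) over (), a window aggregate that attaches the global minimum to every row, so the outer query can filter with a plain equality. A sketch of that pattern in sqlite3, on invented rows (the CTE column is renamed min_drb here, since the slide's bare `min` alias reads confusingly outside psql):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # needs SQLite 3.25+ for window functions
conn.execute("create table team_stats (game integer, team text, drb integer)")
conn.executemany("insert into team_stats values (?, ?, ?)",
                 [(1, "TOR", 14), (2, "GSW", 14), (3, "DAL", 21), (4, "NYK", 30)])

# min(drb) over () puts the global minimum on every row, so the
# outer filter keeps exactly the rows that reach that minimum.
rows = conn.execute("""
with stats(game, team, drb, min_drb) as (
  select game, team, drb, min(drb) over ()
    from team_stats
)
select game, team, drb from stats where drb = min_drb order by game
""").fetchall()

print(rows)  # [(1, 'TOR', 14), (2, 'GSW', 14)]
```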
58. -[ RECORD 1 ]----------------------------
date | 1995-12-26
host | Toronto Raptors -- 93
guest | Milwaukee Bucks -- 87
winner_drb | 14
-[ RECORD 2 ]----------------------------
date | 1996-02-02
host | Golden State Warriors -- 114
guest | Toronto Raptors -- 111
winner_drb | 14
-[ RECORD 3 ]----------------------------
date | 1998-03-31
host | Vancouver Grizzlies -- 101
guest | Dallas Mavericks -- 104
winner_drb | 14
-[ RECORD 4 ]----------------------------
date | 2009-01-14
host | New York Knicks -- 128
guest | Washington Wizards -- 122
winner_drb | 14
Time: 126.276 ms
NBA Games Statistics
59. Pure SQL Histograms
with drb_stats as (
select min(drb) as min,
max(drb) as max
from team_stats
),
histogram as (
select width_bucket(drb, min, max, 9) as bucket,
int4range(min(drb), max(drb), '[]') as range,
count(*) as freq
from team_stats, drb_stats
group by bucket
order by bucket
)
select bucket, range, freq,
repeat('■',
( freq::float
/ max(freq) over()
* 30
)::int
) as bar
from histogram;
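width_bucket(value, min, max, n) assigns each value to one of n equal-width buckets, and repeat('■', ...) scales each bucket's count against the largest one. A Python rendition of the same histogram logic, with a simplified width_bucket and made-up rebound counts:

```python
from collections import Counter

def width_bucket(value, lo, hi, nbuckets):
    """Simplified PostgreSQL width_bucket(): equal-width buckets
    1..nbuckets over [lo, hi); values below lo get 0, values at or
    above hi get nbuckets + 1."""
    if value < lo:
        return 0
    if value >= hi:
        return nbuckets + 1
    return int((value - lo) / (hi - lo) * nbuckets) + 1

drb = [14, 20, 22, 25, 27, 28, 30, 31, 33, 35, 36, 38, 41, 45, 52]
lo, hi = min(drb), max(drb)

freq = Counter(width_bucket(v, lo, hi, 9) for v in drb)
top = max(freq.values())

# Same bar-scaling trick as repeat('■', (freq / max(freq) over() * 30)::int).
for bucket in sorted(freq):
    bar = "■" * round(freq[bucket] / top * 30)
    print(f"{bucket:>2} {freq[bucket]:>3} {bar}")
```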