SQL Analytic Queries ...
Tips & Tricks
Mostly in PostgreSQL
What are we going to talk about?
- Some less (or more) known facts about SQL
- Revision history (just the most important parts)
- A quick pass through SQL basics, since we all know those, right?
- A range of advanced SQL topics, with parallels to real-world
situations and applications
- Conclusion, discussion and Q&A
Some less (or more) known facts about SQL ...
- SQL (Structured Query Language) is STANDARDIZED
internationally!
- By an ISO (International Organization for Standardization) committee.
- All major implementations follow the same standard, to varying degrees:
Oracle, MSSQL, MySQL, IBM DB2, PostgreSQL, etc, etc ...
- Revisions of the standard so far (last 30 years):
SQL-86, SQL-89, SQL-92, SQL:1999 (SQL3), SQL:2003, SQL:2006, SQL:2008,
SQL:2011, SQL:2016
Today, after many revisions, SQL is:
- Turing complete
- Computationally Universal
- Calculation Engine
* Turing complete means that it can, in principle, be used to write any algorithm, i.e. “any
software” (recursive queries are what make this possible).
* In other words, it can compute “anything”.
Today, SQL is also:
- The only successful 4th-generation general-purpose
programming language in existence (known to mankind)
- Python, Java, C# and all the others are still 3rd-generation languages ...
- A 4th-generation language abstracts away (hides) unimportant details from the user:
hardware, algorithms, processes, threads, etc.
* take a deep breath and let that sink in ...
SQL is also:
- Declarative
- You just tell (declare to) the machine what you want.
- You let the machine figure out how to do it for you.
* It is often joked that’s how Oracle got its name (the name actually came from the code name of a CIA-funded project)
- Lets you focus on your business logic, your problem, and what
really matters to you …
Revision history - SQL-92
SQL-92 - most important parts
- DATE, TIME, TIMESTAMP, INTERVAL, BIT string, VARCHAR strings
- UNION JOIN, NATURAL JOIN
- Conditional expressions with CASE (upgraded in SQL:2008)
- ALTER and DROP, CHECK constraint
- INFORMATION_SCHEMA tables
- Temporary tables: CREATE {GLOBAL | LOCAL} TEMPORARY TABLE (PostgreSQL also accepts CREATE TEMP TABLE)
- CAST (expr AS type), scroll cursors…
- Two extensions, published after the standard:
- SQL/CLI (Call Level Interface) - 1995
- SQL/PSM (stored procedures) - 1996
* PostgreSQL 11 (released 2018-10-18) finally implements stored procedures (CREATE PROCEDURE), standardized in 1996
Revision history - SQL:1999 (SQL3)
SQL:1999 (SQL3) - most important parts
- Boolean type, user-defined types
- Common Table Expressions (CTE), WITH clause, RECURSIVE queries
- Grouping sets, GROUP BY ROLLUP, GROUP BY CUBE
- Role-based access control - CREATE ROLE
- UNNEST keyword
Revision history - SQL:2003
SQL:2003 - most important parts
- XML features and functions
- Window functions (ROW_NUMBER() OVER, RANK() OVER, ...)
- Columns with auto-generated values
- Sequence generators, IDENTITY columns
Revision history - SQL:2008 (ISO/IEC 9075:2008)
SQL:2008 (ISO/IEC 9075:2008) - most important parts
- TRUNCATE TABLE
- Enhanced CASE WHEN ... ELSE expressions
- INSTEAD OF triggers
- Partitioned JOINs
- XQuery, pattern matching ...
Revision history - SQL:2011 (ISO/IEC 9075:2011)
SQL:2011 (ISO/IEC 9075:2011) - most important parts
- Support for TEMPORAL databases:
- Time period tables: PERIOD FOR
- Temporal primary keys and temporal referential integrity
- System-versioned tables (FOR SYSTEM_TIME AS OF ..., FOR SYSTEM_TIME BETWEEN ... AND ...)
- Allows working with “historic” data
* MSSQL 2016, Oracle 12c and MariaDB 10.3 implement these features to varying degrees; IBM DB2 v10 uses an alternative syntax.
* PostgreSQL has no built-in support; it requires installing the temporal_tables extension.
Revision history - SQL:2016 (ISO/IEC 9075:2016)
SQL:2016 (ISO/IEC 9075:2016) - most important parts
- JSON support and functions
- Row pattern recognition (MATCH_RECOGNIZE): matching a sequence of rows against a regular-expression pattern
- Date and time formatting and parsing functions
- LISTAGG: aggregate function that concatenates values from many rows into one delimited string
- Polymorphic table functions (table functions without a fixed return type)
1. Basics - EVERYTHING is a set (or table)
-- this is a table ("table my_table" is PostgreSQL shorthand for "select * from my_table"):
table my_table;
-- this is another table:
select * from my_table;
-- this is again a table (with hard-coded values):
values ('first'), ('second'), ('third');
-- yep, you've guessed it, another table (or set, if you like):
select * from (
values ('first'), ('second'), ('third')
) t;
-- we can name the table and its columns however we like:
select * from (
values (1, 'first'), (2, 'second'), (3, 'third')
) as t (id, description);
-- set-returning functions can be used as tables; this one returns a series of integers:
select i from generate_series(1,10) as t (i);
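A minimal sketch of "everything is a table", runnable with Python's built-in sqlite3 module. SQLite stands in for PostgreSQL here (an assumption for portability). SQLite accepts no column-alias list on a derived table, so a CTE column list names the VALUES columns, and a recursive CTE replaces PostgreSQL's generate_series.

```python
import sqlite3

# SQLite (bundled with Python) is used as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")

# A VALUES list is a table; once named, it can be queried like any other table.
named = conn.execute("""
    with t(id, description) as (
        values (1, 'first'), (2, 'second'), (3, 'third')
    )
    select id, description from t order by id
""").fetchall()

# A query result is also a table, so it can be the FROM item of another query.
nested_count = conn.execute("""
    select count(*) from (
        select * from (values ('first'), ('second'), ('third'))
    )
""").fetchone()[0]

# A series of integers 1..10 built with a recursive CTE (no generate_series in SQLite).
series = [i for (i,) in conn.execute("""
    with recursive t(i) as (select 1 union all select i + 1 from t where i < 10)
    select i from t order by i
""")]

print(named)         # [(1, 'first'), (2, 'second'), (3, 'third')]
print(nested_count)  # 3
print(series)        # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```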
1. Basics - execution order
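/***
Queries are logically evaluated in the following
order (the planner may physically reorder the work,
but the result must be as if it followed this order):
1. CTE - Common table expressions
2. FROM and JOINs
3. WHERE
4. GROUP BY
5. HAVING
6. [Window functions]
7. SELECT (then DISTINCT)
8. ORDER BY
9. LIMIT / OFFSET
***/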
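The evaluation order above can be observed in a single query. This is a small sketch using SQLite (3.25+ for window functions) as a stand-in for PostgreSQL. The table t and its data are hypothetical. WHERE removes a row before grouping, HAVING removes a group, the window total only sees the surviving groups, and LIMIT cuts the output last.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (grp text, v int)")  # hypothetical demo table
conn.executemany("insert into t values (?, ?)",
                 [("a", 1), ("a", 2), ("b", 10), ("c", 100), ("c", 200)])

rows = conn.execute("""
    select grp,
           sum(v)              as group_total,  -- aggregate per group
           sum(sum(v)) over () as all_total     -- window sees only groups left after HAVING
    from t
    where v < 150             -- WHERE drops the row ('c', 200) before grouping
    group by grp
    having sum(v) > 5         -- HAVING drops group 'a' (total 3)
    order by group_total desc
    limit 1                   -- LIMIT runs last, after the window total is computed
""").fetchall()
print(rows)  # [('c', 100, 110)]
```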
2. TEMP TABLES
-- a temp table lives for the duration of the session and is visible only to the connection that created it:
create temp table temp_test1 (id int, t text);
-- only I can see you; no other connection knows that you exist:
select * from temp_test1;
-- temp tables can be (and usually are) created on the fly from another table or query, using "into":
select *
into temp temp_test2 from (
values (1, 'first'), (2, 'second'), (3, 'third')
) as t (id, description);
-- let's see:
select * from temp_test2;
Typical flow, all on a single connection: expensive query (joins, filters) -> INTO TEMP table -> counts and statistics from TEMP -> sort and page from TEMP -> return multiple result sets
- Used a lot for optimization: cache the result of an expensive operation in a temp table instead of repeating it.
- Note that the hardware is abstracted: we don’t know whether the table is on disk or in memory, and that’s not the point.
- Typical, common usage: paging and sorting over large tables with expensive joins, together with calculating
counts and statistics.
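A minimal sketch of the temp-table flow, using SQLite through Python's sqlite3 as a stand-in for PostgreSQL (an assumption for runnability). The items and cheap tables are hypothetical. One connection caches a filtered result in a TEMP table, then counts and pages from it. A second connection to the same database file cannot see that table.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.db")
    conn1 = sqlite3.connect(path)
    conn2 = sqlite3.connect(path)

    conn1.execute("create table items (id integer primary key, price int)")
    conn1.executemany("insert into items (price) values (?)",
                      [(p,) for p in range(100, 0, -1)])
    conn1.commit()

    # Cache the "expensive" query once (a filter here, standing in for joins).
    conn1.execute("create temp table cheap as select id, price from items where price <= 50")

    # Statistics and paging both read the cached rows, not the original query.
    total = conn1.execute("select count(*) from cheap").fetchone()[0]
    page2 = conn1.execute(
        "select price from cheap order by price limit 10 offset 10").fetchall()

    # The temp table belongs to conn1 only.
    try:
        conn2.execute("select count(*) from cheap")
        visible_elsewhere = True
    except sqlite3.OperationalError:  # "no such table"
        visible_elsewhere = False

    conn1.close()
    conn2.close()

print(total, [p for (p,) in page2], visible_elsewhere)
```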
3. CTE - Common Table Expressions (WITH queries)
-- a common table expression can serve the same purpose as a temp table, but it exists only for a single statement:
with my_cte as (
select i from generate_series(1,10) as t (i)
)
select * from my_cte;
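-- we can combine multiple CTEs; before PostgreSQL 12 every CTE was planned and materialized separately,
-- since 12 a non-recursive, side-effect-free CTE referenced only once is inlined unless declared MATERIALIZED: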
with my_cte1 as (
select i from generate_series(1,3) as t (i)
),
my_cte2 as (
select i from generate_series(4,6) as t (i)
),
my_cte3 as (
select i from generate_series(7,9) as t (i)
)
select * from my_cte1
union --intersect
select * from my_cte2
union
select * from my_cte3;
3. CTE - Common Table Expressions (WITH queries) - RECURSION
-- CTE can be used for recursive queries:
with recursive t(i) as (
values (1) -- recursion seed
union all
select i + 1 from t where i < 10 -- recursive step: references t itself
)
select i from t;
-- Typically used for efficient processing of tree structures (hierarchies), for example:
create temp table employees (id serial, name varchar, manager_id int);
insert into employees (name, manager_id)
values ('Michael North', NULL), ('Megan Berry', 1), ('Sarah Berry', 2),
('Zoe Black', 1), ('Tim James', 2), ('Bella Tucker', 2), ('Ryan Metcalfe',
2), ('Max Mills', 2), ('Benjamin Glover', 3) ,('Carolyn Henderson', 4);
select * from employees;
-- Returns ALL subordinates (direct and indirect) of the manager with id 2, plus the manager:
with recursive subordinates AS (
select id, manager_id, name from employees where id = 2
union
select e.id, e.manager_id, e.name
from employees e
inner join subordinates s on e.manager_id = s.id
)
select * from subordinates;
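To make the recursion visible, here is a sketch that runs the same hierarchy query in SQLite via Python's sqlite3, a stand-in for PostgreSQL. The extra depth column is an illustrative addition, not part of the original query. UNION ALL is used because a tree has no cycles, so no duplicate rows can appear.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table employees (id integer primary key, name text, manager_id int)")
conn.executemany(
    "insert into employees (name, manager_id) values (?, ?)",
    [("Michael North", None), ("Megan Berry", 1), ("Sarah Berry", 2),
     ("Zoe Black", 1), ("Tim James", 2), ("Bella Tucker", 2), ("Ryan Metcalfe", 2),
     ("Max Mills", 2), ("Benjamin Glover", 3), ("Carolyn Henderson", 4)],
)

rows = conn.execute("""
    with recursive subordinates(id, manager_id, name, depth) as (
        select id, manager_id, name, 0 from employees where id = 2   -- seed: the manager
        union all
        select e.id, e.manager_id, e.name, s.depth + 1               -- step: direct reports
        from employees e
        join subordinates s on e.manager_id = s.id
    )
    select id, name, depth from subordinates order by depth, id
""").fetchall()

for row in rows:
    print(row)
```

Benjamin Glover (id 9) appears at depth 2 because he reports to Sarah Berry (id 3), who reports to manager 2.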
4. UNNEST and AGGREGATE
-- any array can be unnested into rows:
select unnest(array[1, 2, 3]);
-- any rows can be aggregated back into an array:
select array_agg(i)
from (
values (1), (2), (3)
) t(i);
-- any rows can be aggregated back into a JSON array:
select json_agg(i)
from (
values (1), (2), (3)
) t(i);
-- from row values to array and back to row values
select unnest(array_agg(i))
from (
values (1), (2), (3)
) t(i);
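SQLite has no arrays, so this sketch shows the same round trip with its JSON functions. It assumes an SQLite build with JSON support, which is built in since SQLite 3.38. json_each plays the role of unnest (array to rows), and json_group_array plays the role of json_agg (rows to array).

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# array -> rows ("unnest"); key is the array index, so ordering by it keeps array order
unnested = [v for (v,) in conn.execute(
    "select value from json_each('[1, 2, 3]') order by key")]

# rows -> array ("aggregate")
aggregated = conn.execute("""
    select json_group_array(i)
    from (select 1 as i union all select 2 union all select 3)
""").fetchone()[0]

# rows -> array -> rows again
round_trip = [v for (v,) in conn.execute("""
    select value from json_each((
        select json_group_array(i)
        from (select 1 as i union all select 2 union all select 3)
    ))
    order by key
""")]

print(unnested, json.loads(aggregated), round_trip)
```

As with array_agg, the element order of json_group_array follows the input row order, which is not guaranteed without an explicit ORDER BY.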
5. Subqueries
-- First ten dates in January with extracted day numbers
select cast(d as date), extract(day from d) as i
from generate_series(cast('2018-01-01' as date), cast('2018-01-10' as date), '1 days') as d(d); -- ISO standard CAST
-- First ten dates in February with extracted day numbers
select d::date, extract(day from d) as i
from generate_series('2018-02-01'::date, '2018-02-10'::date, '1 days') as d(d); -- Postgres cast (using ::)
-- Any table expression can be replaced by a query, because a query result is also a table expression.
-- So we can join the previous queries as SUBQUERIES:
select first_month.i, first_month.d as first_month, second_month.d as second_month
from (
select cast(d as date), extract(day from d) as i
from generate_series(cast('2018-01-01' as date), cast('2018-01-10' as date), '1 days') as d(d)
) first_month inner join (
select cast(d as date), extract(day from d) as i
from generate_series(cast('2018-02-01' as date), cast('2018-02-10' as date), '1 days') as d(d)
) second_month on first_month.i = second_month.i;
-- a subquery can be used literally anywhere, but in some places (like the SELECT list) it must return a single value.
-- Note the qualified names: an unqualified "d" inside the subquery would resolve to sub.d, not to the outer d.d:
select cast(d as date) as january,
(
select cast(sub.d as date)
from generate_series(cast('2018-02-01' as date), cast('2018-02-10' as date), '1 days') as sub(d)
where extract(day from sub.d) = extract(day from d.d)
limit 1
) as february
from generate_series(cast('2018-01-01' as date), cast('2018-01-10' as date), '1 days') as d(d);
-- or it can return a set of values (one column, many rows) to filter by in the WHERE clause:
select cast(d as date)
from generate_series(cast('2018-02-01' as date), cast('2018-02-10' as date), '1 days') as d(d)
where extract(day from d) in (
select extract(day from sub.d)
from generate_series(cast('2018-02-01' as date), cast('2018-02-10' as date), '1 days') as sub(d)
);
-- How efficient are these queries? What do we actually want the machine to do?
-- Let's see what the execution plan has to say (prefix a query with EXPLAIN, or EXPLAIN ANALYZE to also run it) ...
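A sketch of that comparison, using SQLite via Python's sqlite3 as a stand-in for PostgreSQL. Recursive CTEs replace generate_series, and strftime('%d', ...) replaces extract(day from ...). The correlated scalar-subquery form and the join form pair January and February dates by day number and return the same rows. EXPLAIN QUERY PLAN, SQLite's counterpart to PostgreSQL's EXPLAIN, prints how each is executed. The plan text varies by SQLite version, so no expected plan is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

DAYS = """
    with recursive
      jan(d) as (select date('2018-01-01') union all
                 select date(d, '+1 day') from jan where d < '2018-01-10'),
      feb(d) as (select date('2018-02-01') union all
                 select date(d, '+1 day') from feb where d < '2018-02-10')
"""

scalar_subquery = DAYS + """
    select jan.d,
           (select feb.d from feb
            where strftime('%d', feb.d) = strftime('%d', jan.d)   -- correlated
            limit 1) as february
    from jan order by jan.d
"""

join_form = DAYS + """
    select jan.d, feb.d as february
    from jan join feb on strftime('%d', feb.d) = strftime('%d', jan.d)
    order by jan.d
"""

a = conn.execute(scalar_subquery).fetchall()
b = conn.execute(join_form).fetchall()
print(a == b, a[0], a[-1])

for query in (scalar_subquery, join_form):
    for plan_row in conn.execute("explain query plan " + query):
        print(plan_row)
    print("---")
```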
6. LATERAL joins
-- What if we want to reference one subquery from another?
-- This doesn't work: a plain subquery in FROM cannot reference another FROM item (by_day.i):
select by_day.d as date, counts_day.count
from (
select cast(d as date), extract(day from d) as i
from generate_series(cast('2018-01-01' as date), cast('2018-01-10' as date), '1 days') as d(d)
) by_day inner join (
select count(*) as count, extract(day from d) as i
from generate_series(cast('2018-01-01' as date), cast('2018-01-10' as date), '1 hours') as d(d)
where extract(day from d) = by_day.i
group by extract(day from d)
) counts_day on by_day.i = counts_day.i;
-- To achieve this, we must use a LATERAL join:
select by_day.d as date, counts_day.count
from (
select cast(d as date), extract(day from d) as i
from generate_series(cast('2018-01-01' as date), cast('2018-01-10' as date), '1 days') as d(d)
) by_day inner join lateral (
select count(*) as count, extract(day from d) as i
from generate_series(cast('2018-01-01' as date), cast('2018-01-10' as date), '1 hours') as d(d)
where extract(day from d) = by_day.i
group by extract(day from d)
) counts_day on by_day.i = counts_day.i;
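-- Now we can simplify this query even further: the lateral subquery is already filtered to one day,
-- so it needs neither GROUP BY nor a real join condition: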
select by_day.d as date, counts_day.count
from (
select cast(d as date), extract(day from d) as i
from generate_series(cast('2018-01-01' as date), cast('2018-01-10' as date), '1 days') as d(d)
) by_day inner join lateral (
select count(*) as count
from generate_series(cast('2018-01-01' as date), cast('2018-01-10' as date), '1 hours') as d(d)
where extract(day from d) = by_day.i
) counts_day on true;
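SQLite has no LATERAL, but the last query ("for each day, count the hourly timestamps that fall on that day") can be sketched with a correlated scalar subquery. The sketch uses Python's sqlite3 as a stand-in for PostgreSQL, with recursive CTEs replacing generate_series. Because the hourly series ends exactly at 2018-01-10 00:00 (the end point is included), the last day gets a single hour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

rows = conn.execute("""
    with recursive
      by_day(d) as (
        select date('2018-01-01')
        union all select date(d, '+1 day') from by_day where d < '2018-01-10'),
      hours(h) as (
        select datetime('2018-01-01 00:00:00')
        union all select datetime(h, '+1 hour') from hours where h < '2018-01-10 00:00:00')
    select d,
           (select count(*) from hours where date(h) = by_day.d) as hour_count  -- per-row, like LATERAL
    from by_day
    order by d
""").fetchall()

for row in rows:
    print(row)  # ('2018-01-01', 24) ... ('2018-01-09', 24), ('2018-01-10', 1)
```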
7. DISTINCT ON
create temp table sales (brand varchar, segment varchar, quantity int);
insert into sales values ('ABC', 'Premium', 100), ('ABC', 'Basic', 200), ('XYZ', 'Premium', 100), ('XYZ', 'Basic', 300);
select * from sales;
-- highest quantity per brand:
select brand, max(quantity)
from sales
group by brand;
-- which segment has the highest quantity per brand? This is NOT allowed, because segment is neither grouped nor aggregated:
select brand, max(quantity), segment
from sales
group by brand;
-- in PostgreSQL we can use DISTINCT ON (a PostgreSQL extension): it keeps the first row of each brand in ORDER BY order:
select distinct on (brand) brand, quantity, segment
from sales
order by brand, quantity desc;
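DISTINCT ON is PostgreSQL-specific. A portable alternative numbers the rows of each brand with ROW_NUMBER() (a standard window function) and keeps row 1. This sketch runs it in SQLite (3.25+) through Python's sqlite3 for runnability. As with DISTINCT ON, ties are broken arbitrarily unless the ORDER BY includes a tie-breaker.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table sales (brand text, segment text, quantity int)")
conn.executemany("insert into sales values (?, ?, ?)", [
    ("ABC", "Premium", 100), ("ABC", "Basic", 200),
    ("XYZ", "Premium", 100), ("XYZ", "Basic", 300),
])

rows = conn.execute("""
    select brand, quantity, segment
    from (
        select brand, quantity, segment,
               row_number() over (partition by brand order by quantity desc) as rn
        from sales
    )
    where rn = 1          -- keep only the top row of each brand
    order by brand
""").fetchall()
print(rows)  # [('ABC', 200, 'Basic'), ('XYZ', 300, 'Basic')]
```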
8. OLAP: GROUPING, GROUPING SETS, CUBE, ROLLUP
create temp table sales (brand varchar, segment varchar, quantity int);
insert into sales values ('ABC', 'Premium', 100), ('ABC', 'Basic', 200), ('XYZ', 'Premium', 100), ('XYZ', 'Basic', 300);
-- sum quantities by brand and segment:
select brand, segment, sum(quantity) from sales group by brand, segment;
-- sum quantities by brand only:
select brand, sum(quantity) from sales group by brand;
-- sum quantities by segment only:
select segment, sum(quantity) from sales group by segment;
-- sum all quantities:
select sum(quantity) from sales;
-- we could UNION ALL these queries, but that is long and extremely inefficient (the table is read once per query):
select brand, segment, sum(quantity) from sales group by brand, segment
union all
select brand, null as segment, sum(quantity) from sales group by brand
union all
select null as brand, segment, sum(quantity) from sales group by segment
union all
select null as brand, null as segment, sum(quantity) from sales;
-- unless we use GROUPING SETS to get the sums for all category combinations in one query:
-- this is usually much more efficient than separate queries combined with UNION,
-- and a lot shorter and easier to read:
select
brand, segment, sum(quantity)
from
sales
group by grouping sets (
(brand, segment),
(brand),
(segment),
()
)
order by
brand nulls last, segment nulls last;
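To make the semantics concrete, here is a plain-Python sketch of what GROUP BY GROUPING SETS computes. It illustrates the result, not how PostgreSQL implements it. There is one aggregation per grouping set, and columns outside the set are reported as NULL (None here). The function name grouping_sets is hypothetical.

```python
from collections import defaultdict

SALES = [("ABC", "Premium", 100), ("ABC", "Basic", 200),
         ("XYZ", "Premium", 100), ("XYZ", "Basic", 300)]
COLUMNS = ("brand", "segment")

def grouping_sets(rows, sets):
    """Sum quantity once per grouping set; non-grouped columns become None."""
    result = []
    for grouping in sets:
        sums = defaultdict(int)
        for brand, segment, quantity in rows:
            values = {"brand": brand, "segment": segment}
            key = tuple(values[c] if c in grouping else None for c in COLUMNS)
            sums[key] += quantity
        result.extend(key + (total,) for key, total in sums.items())
    return result

out = grouping_sets(SALES, [("brand", "segment"), ("brand",), ("segment",), ()])
for row in out:
    print(row)
```

The nine rows match the UNION ALL version: four (brand, segment) sums, two per-brand sums (300, 400), two per-segment sums (Premium 200, Basic 500) and the grand total 700.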
-- generate ALL possible grouping combinations:
CUBE(c1,c2,c3)
-- results in:
GROUPING SETS (
(c1,c2,c3),
(c1,c2),
(c1,c3),
(c2,c3),
(c1),
(c2),
(c3),
()
)
-- applied to the previous example:
select brand, segment, sum(quantity)
from sales
group by cube (brand, segment);
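The CUBE expansion listed above follows a simple rule: every subset of the columns, 2^n grouping sets in total, largest first. This sketch generates them with itertools.combinations in the same order as the listing. The function name cube is hypothetical.

```python
from itertools import combinations

def cube(*columns):
    """Return the grouping sets that CUBE(columns...) expands to, largest first."""
    return [combo
            for size in range(len(columns), -1, -1)
            for combo in combinations(columns, size)]

for grouping in cube("c1", "c2", "c3"):
    print(grouping)
# ('c1', 'c2', 'c3'), ('c1', 'c2'), ('c1', 'c3'), ('c2', 'c3'), ('c1',), ('c2',), ('c3',), ()
```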
-- generate grouping combinations assuming the hierarchy c1 > c2 > c3:
ROLLUP(c1,c2,c3)
-- results in:
GROUPING SETS (
(c1, c2, c3),
(c1, c2),
(c1),
()
)
-- applied to the previous example:
select brand, segment, sum(quantity)
from sales
group by rollup (brand, segment);
-- is equivalent to:
select brand, segment, sum(quantity)
from sales
group by grouping sets (
(brand, segment),
(brand),
()
);
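ROLLUP expands more simply than CUBE: it drops columns from the right, walking up the assumed hierarchy, which gives n + 1 grouping sets. The sketch below reproduces both listings above. The function name rollup is hypothetical.

```python
def rollup(*columns):
    """Return the grouping sets that ROLLUP(columns...) expands to."""
    return [columns[:size] for size in range(len(columns), -1, -1)]

print(rollup("c1", "c2", "c3"))   # [('c1', 'c2', 'c3'), ('c1', 'c2'), ('c1',), ()]
print(rollup("brand", "segment"))  # [('brand', 'segment'), ('brand',), ()]
```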
9. OLAP: WINDOW FUNCTIONS
create temp table employee (id serial, department varchar, salary int);
insert into employee (department, salary)
values
('develop', 5200), ('develop', 4200), ('develop', 4500), ('develop', 6000), ('develop', 5200),
('personnel', 3500), ('personnel', 3900),
('sales', 4800), ('sales', 5000), ('sales', 4800);
-- average salary by department returns fewer rows (one per department), because of GROUP BY:
select department, avg(salary)
from employee
group by department;
-- but not if we use an aggregate function over a partition (window); this returns ALL rows:
select department, salary, avg(salary) over (partition by department)
from employee;
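A sketch that checks the row-count difference, running the same data in SQLite (3.25+ for window functions) through Python's sqlite3 as a stand-in for PostgreSQL. GROUP BY returns one row per department. The windowed avg keeps all ten rows and attaches each department's average to every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table employee (id integer primary key, department text, salary int)")
conn.executemany("insert into employee (department, salary) values (?, ?)", [
    ("develop", 5200), ("develop", 4200), ("develop", 4500), ("develop", 6000), ("develop", 5200),
    ("personnel", 3500), ("personnel", 3900),
    ("sales", 4800), ("sales", 5000), ("sales", 4800),
])

grouped = conn.execute(
    "select department, avg(salary) from employee group by department order by department"
).fetchall()
windowed = conn.execute(
    "select department, salary, avg(salary) over (partition by department) "
    "from employee order by id"
).fetchall()

print(len(grouped), len(windowed))  # 3 10
print(grouped[0], windowed[0])      # ('develop', 5020.0) ('develop', 5200, 5020.0)
```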
-- syntax:
window_function(arg1, arg2, ...) OVER (PARTITION BY expression ORDER BY expression)
-- return all employees, no grouping
select
department, salary,
-- average salary:
avg(salary) over (partition by department),
-- row number of the employee within the department (window), ordered by id:
row_number() over (partition by department order by id),
-- rank of the employee's salary within the department (window); lowest salary ranks 1, ties share a rank:
rank() over (partition by department order by salary)
from employee;
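The difference between ROW_NUMBER() and RANK() shows up with ties. This is a sketch on the same data, run in SQLite (3.25+) via Python's sqlite3 as a stand-in for PostgreSQL. The two 5200 salaries in develop share rank 3 and the next salary gets rank 5, while row_number stays distinct.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table employee (id integer primary key, department text, salary int)")
conn.executemany("insert into employee (department, salary) values (?, ?)", [
    ("develop", 5200), ("develop", 4200), ("develop", 4500), ("develop", 6000), ("develop", 5200),
    ("personnel", 3500), ("personnel", 3900),
    ("sales", 4800), ("sales", 5000), ("sales", 4800),
])

rows = conn.execute("""
    select department, salary,
           row_number() over (partition by department order by id)     as rn,
           rank()       over (partition by department order by salary) as salary_rank
    from employee
    order by department, salary, id
""").fetchall()

for row in rows:
    print(row)
```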
BONUS: Mandelbrot set fractal
WITH RECURSIVE
x(i)
AS (
VALUES(0)
UNION ALL
SELECT i + 1 FROM x WHERE i < 101
),
Z(Ix, Iy, Cx, Cy, X, Y, I)
AS (
SELECT Ix, Iy, X::FLOAT, Y::FLOAT, X::FLOAT, Y::FLOAT, 0
FROM
(SELECT -2.2 + 0.031 * i, i FROM x) AS xgen(x,ix)
CROSS JOIN
(SELECT -1.5 + 0.031 * i, i FROM x) AS ygen(y,iy)
UNION ALL
SELECT Ix, Iy, Cx, Cy, X * X - Y * Y + Cx AS X, Y * X * 2 + Cy, I + 1
FROM Z
WHERE X * X + Y * Y < 16.0
AND I < 27
),
Zt (Ix, Iy, I) AS (
SELECT Ix, Iy, MAX(I) AS I
FROM Z
GROUP BY Iy, Ix
ORDER BY Iy, Ix
)
SELECT array_to_string(
array_agg(
SUBSTRING(
' .,,,-----++++%%%%@@@@#### ',
GREATEST(I,1),
1
) ORDER BY Ix -- explicit order; the ORDER BY inside Zt is not guaranteed to survive the GROUP BY
),''
)
FROM Zt GROUP BY Iy ORDER BY Iy;
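To follow what each CTE in the bonus query does, here is a plain-Python port of the same escape-time computation. It is an illustrative translation, not part of the original. The grid origin, step, 102×102 size, escape bound (x² + y² < 16), iteration cap (27) and palette are copied from the SQL. The function name mandelbrot_rows is hypothetical.

```python
PALETTE = ' .,,,-----++++%%%%@@@@#### '  # same 27-character string as the query

def mandelbrot_rows(size=102, step=0.031, max_iter=27):
    rows = []
    for iy in range(size):                      # x(i) yields 0..101 -> 102 rows
        cy = -1.5 + step * iy                   # ygen
        line = []
        for ix in range(size):
            cx = -2.2 + step * ix               # xgen
            x, y, i = cx, cy, 0                 # seed row of Z: z0 = c, I = 0
            while x * x + y * y < 16.0 and i < max_iter:   # WHERE of the recursive step
                x, y = x * x - y * y + cx, 2 * x * y + cy  # z -> z^2 + c, using old x and y
                i += 1
            # Zt keeps MAX(I) per point; SQL SUBSTRING is 1-based, Python indexing is 0-based
            line.append(PALETTE[max(i, 1) - 1])
        rows.append(''.join(line))
    return rows

for row in mandelbrot_rows():
    print(row)
```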
Conclusion and final words
- SQL is a “mysterious machine”: even after 15 years it can still pull some new surprises.
- Practice is the key. You need to practice, practice and practice some more.
- The payoffs are huge: application performance can improve dramatically, with significantly less
code.
- It can reduce the amount of code and significantly improve system maintainability.
- It can be intimidating to some: the percentage of keywords in the code is much higher, approaching the levels of
assembler or COBOL code.
- Don't be intimidated, it will pay off in the end. Any day that goes by without learning anything new is a wasted
day.

“I’m still / I’m still / Chaining from the Block”
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 

SQL Analytic Queries Tips

  • 1. SQL Analytic Queries ... Tips & Tricks Mostly in PostgreSQL
  • 2. What are we going to talk about?
    - Some less (or more) known facts about SQL
    - Revision history (just the most important parts)
    - A quick pass through SQL basics, since we all know those, right?
    - A range of advanced SQL topics, with comparisons and parallels to real-world situations and applications
    - Conclusion, discussion and Q&A
  • 3. Some less (or more) known facts about SQL ...
    - SQL (Structured Query Language) is STANDARDIZED internationally!
    - By an ISO (International Organization for Standardization) committee.
    - All major implementations (Oracle, MSSQL, MySQL, IBM DB2, PostgreSQL, etc.) are based on the same standards, each with its own extensions and gaps.
    - Revisions of the standard so far (over the last 30 years): SQL-86, SQL-89, SQL-92, SQL:1999 (SQL3), SQL:2003, SQL:2008, SQL:2011, SQL:2016
  • 4. Some less (or more) known facts about SQL ...
    Today, after many revisions, SQL is:
    - Turing complete
    - Computationally universal
    - A calculation engine
    * Turing complete means that it can be used to write any algorithm, or “any software”.
    * In other words, it can do “anything”.
  • 5. Some less (or more) known facts about SQL ...
    Today, SQL is also:
    - The only successful 4th-generation general-purpose programming language in existence (known to mankind)
    - Python, Java, C# and all the others are still 3rd-generation languages ...
    - A 4th-generation language abstracts (hides) unimportant details from the user: hardware, algorithms, processes, threads, etc.
    * Take a deep breath and let that sink in for a while ...
  • 6. Some less (or more) known facts about SQL ...
    SQL is also:
    - Declarative
    - You just tell (declare to) the machine what you want.
    - You let the machine figure out how.
    * That’s how Oracle got its name
    - It lets you focus on your business logic, your problem, and what is really, really important to you …
  • 7. Revision history - SQL-92
    SQL-92 - most important parts:
    - DATE, TIME, TIMESTAMP, INTERVAL, BIT strings, VARCHAR strings
    - UNION JOIN, NATURAL JOIN
    - Conditional expressions with CASE (upgraded in SQL:2008)
    - ALTER and DROP, CHECK constraint
    - INFORMATION_SCHEMA tables
    - Temporary tables; CREATE TEMP TABLE
    - CAST (expr AS type), scroll cursors…
    - Two extensions, published after the standard:
      - SQL/CLI (Call Level Interface) - 1995
      - SQL/PSM (stored procedures) - 1996
    * PostgreSQL 11 (released 2018-10-18) finally implements stored procedures (CREATE PROCEDURE), standardized in 1996
  • 8. Revision history - SQL:1999 (SQL3)
    SQL:1999 (SQL3) - most important parts:
    - Boolean type, user-defined types
    - Common Table Expressions (CTE), WITH clause, RECURSIVE queries
    - Grouping sets, GROUP BY ROLLUP, GROUP BY CUBE
    - Role-based access control - CREATE ROLE
    - UNNEST keyword
  • 9. Revision history - SQL:2003
    SQL:2003 - most important parts:
    - XML features and functions
    - Window functions (ROW_NUMBER() OVER, RANK() OVER…)
    - Auto-generated values (default values)
    - Sequence generators, IDENTITY columns
  • 10. Revision history - SQL:2008 (ISO/IEC 9075:2008)
    SQL:2008 - most important parts:
    - TRUNCATE TABLE
    - CASE WHEN ELSE
    - TRIGGERS (INSTEAD OF)
    - Partitioned JOINS
    - XQuery, pattern matching ...
  • 11. Revision history - SQL:2011 (ISO/IEC 9075:2011)
    SQL:2011 - most important parts:
    - Support for TEMPORAL databases:
      - Time period tables (PERIOD FOR)
      - Temporal primary keys and temporal referential integrity
      - System-versioned tables (FOR SYSTEM_TIME AS OF, FOR SYSTEM_TIME BETWEEN ... AND ...)
      - Allows working with “historic” data
    * MSSQL 2016, Oracle 12c and MariaDB 10.3 implement it; IBM DB2 10 uses an alternative syntax.
    * PostgreSQL requires installing the temporal_tables extension
  • 12. Revision history - SQL:2016 (ISO/IEC 9075:2016)
    SQL:2016 - most important parts:
    - JSON functions and support
    - Row pattern recognition: matching a sequence of rows against a regular expression pattern (MATCH_RECOGNIZE)
    - Date and time formatting and parsing functions
    - LISTAGG - an aggregate function that concatenates values from multiple rows into a single delimited string
    - Polymorphic table functions (functions without a fixed return type)
  • 13. 1. Basics - EVERYTHING is a set (or table)
    -- this is a table: my_table (the TABLE command is shorthand for select * from my_table):
    table my_table;
    -- this is another table:
    select * from my_table;
    -- this is again a table (with hardcoded values):
    values ('first'), ('second'), ('third');
    -- yep, you've guessed it, another table (or set if you like):
    select * from ( values ('first'), ('second'), ('third') ) t;
    -- we can name our table and its columns as we like:
    select * from ( values (1, 'first'), (2, 'second'), (3, 'third') ) as t (id, description);
    -- we can use set-returning functions as tables, this one returns a series:
    select i from generate_series(1,10) as t (i);
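Because every query result is itself a set, the set operators EXCEPT and INTERSECT can combine any of the table expressions above. A minimal sketch, assuming PostgreSQL; the numbers are arbitrary illustration values:

```sql
-- Set difference: rows of the first set that are not in the second
values (1), (2), (3)
except
values (2);

-- Set intersection of a generated series and a hardcoded table of values
select i from generate_series(1, 10) as t(i)
intersect
select n from (values (3), (5), (42)) as v(n)
order by 1;
```

The first query returns 1 and 3 (in no guaranteed order, since it has no ORDER BY); the second returns 3 and 5.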
  • 14. 1. Basics - execution order
    /***
    Queries are always logically evaluated in the following order
    (the planner may physically reorder steps as long as the result stays the same):
    1. CTE - Common table expressions
    2. FROM and JOINs
    3. WHERE
    4. GROUP BY
    5. HAVING
    6. [Window functions]
    7. SELECT
    8. ORDER BY
    9. LIMIT
    ***/
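The logical order has visible consequences. A minimal PostgreSQL sketch (doubled and rows_after_where are illustrative names): writing where doubled > 4 would fail with a "column does not exist" error, because WHERE is evaluated before SELECT defines the alias, while ORDER BY runs after SELECT and can use it. Likewise, a window function only sees the rows that survived WHERE.

```sql
-- The SELECT alias is not visible in WHERE yet, so the expression is repeated there ...
select i * 2 as doubled
from generate_series(1, 5) as t(i)
where i * 2 > 4
-- ... but ORDER BY is evaluated after SELECT, so the alias works here
order by doubled desc;

-- Window functions run after WHERE: count(*) over () counts only the filtered rows
select i, count(*) over () as rows_after_where
from generate_series(1, 5) as t(i)
where i > 2;
```

The first query returns 10, 8, 6; the second returns three rows, each with rows_after_where = 3.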
  • 15. 2. TEMP TABLES
    -- a temp table lives for the duration of the session and is visible only to the connection that created it:
    create temp table temp_test1 (id int, t text);
    -- only I can see you, no other connection knows that you exist
    select * from temp_test1;
    -- they can be created on the fly (and usually are) from another table or query using "into":
    select * into temp temp_test2 from ( values (1, 'first'), (2, 'second'), (3, 'third') ) as t (id, description);
    -- let's see:
    select * from temp_test2;
  • 16. 2. TEMP TABLES
    Typical flow: expensive query (joins, filters) -> INTO TEMP table -> counts and statistics from TEMP -> sort and page from TEMP -> return multiple result sets over a single connection
    - Used a lot for optimization: avoid repeating expensive operations by caching their results in temp tables
    - Note that the hardware is abstracted: we don't know whether the table is on disk or in memory, and that's not the point
    - Typical, common usage: paging and sorting over large tables with expensive joins, together with calculating counts and statistics.
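The temp-table flow above can be sketched as follows, assuming PostgreSQL. The table name expensive_result and its columns are hypothetical, and generate_series only stands in for a real expensive query with joins and filters:

```sql
-- Hypothetical stand-in for an expensive query, materialized once into a temp table
create temp table expensive_result as
select i as id, i % 7 as category, i * 10 as amount
from generate_series(1, 100) as t(i);

-- Autovacuum does not process temp tables, so collect planner statistics manually
analyze expensive_result;

-- Result set 1: counts and statistics, computed from the cached rows
select count(*) as total_rows, sum(amount) as total_amount
from expensive_result;

-- Result set 2: one sorted page (page 2, 10 rows per page)
select id, category, amount
from expensive_result
order by amount desc
limit 10 offset 10;
```

Both result sets read the cached rows, so the expensive query runs only once per connection. With this data the first query returns 100 rows and a total of 50500, and the second returns ids 90 down to 81.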
  • 17. 3. CTE - Common Table Expressions (WITH queries)
    -- we can use common table expressions for the same purpose as temp tables:
    with my_cte as ( select i from generate_series(1,10) as t (i) )
    select * from my_cte;
    -- we can combine multiple CTEs; before PostgreSQL 12 every CTE was always planned and
    -- materialized separately (an "optimization fence"), since 12 a side-effect-free,
    -- non-recursive CTE referenced only once can be inlined into the main query:
    with my_cte1 as ( select i from generate_series(1,3) as t (i) ),
         my_cte2 as ( select i from generate_series(4,6) as t (i) ),
         my_cte3 as ( select i from generate_series(7,9) as t (i) )
    select * from my_cte1
    union --intersect
    select * from my_cte2
    union
    select * from my_cte3;
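Since PostgreSQL 12 this behaviour can also be chosen explicitly per CTE. A small sketch (requires PostgreSQL 12 or later; my_cte is the illustrative name from the slide): MATERIALIZED keeps the old optimization-fence behaviour, NOT MATERIALIZED allows the planner to inline the CTE. Both queries return 5; running EXPLAIN on each is the way to see how their plans differ.

```sql
-- Compute the CTE once and reuse the stored result (the pre-12 behaviour)
with my_cte as materialized (
    select i from generate_series(1, 10) as t(i)
)
select count(*) from my_cte where i > 5;

-- Let the planner inline the CTE, treating it like a plain subquery
with my_cte as not materialized (
    select i from generate_series(1, 10) as t(i)
)
select count(*) from my_cte where i > 5;
```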
  • 18. 3. CTE - Common Table Expressions (WITH queries) - RECURSION
    -- CTEs can be used for recursive queries:
    with recursive t(i) as (
        values (1)                          -- recursion seed (non-recursive term)
        union all
        select i + 1 from t where i < 10    -- recursive term, stops once i reaches 10
    )
    select i from t;
    -- Typically used for efficient processing of tree structures, for example:
    create temp table employees (id serial, name varchar, manager_id int);
    insert into employees (name, manager_id) values
        ('Michael North', NULL), ('Megan Berry', 1), ('Sarah Berry', 2), ('Zoe Black', 1),
        ('Tim James', 2), ('Bella Tucker', 2), ('Ryan Metcalfe', 2), ('Max Mills', 2),
        ('Benjamin Glover', 3), ('Carolyn Henderson', 4);
    select * from employees;
    -- Returns ALL subordinates of the manager with id 2 (the manager's own row is included):
    with recursive subordinates as (
        select id, manager_id, name from employees where id = 2
        union
        select e.id, e.manager_id, e.name
        from employees e
        inner join subordinates s on e.manager_id = s.id
    )
    select * from subordinates;
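Recursive queries can also carry state from one level to the next. The sketch below, assuming PostgreSQL, extends the subordinates query with a depth counter and a readable path. The employees data is inlined as a CTE (shadowing the temp table above, if it exists) so the example runs on its own; depth and path are illustrative column names.

```sql
with recursive
employees (id, name, manager_id) as (
    -- same rows as the employees temp table above
    values (1, 'Michael North', null), (2, 'Megan Berry', 1), (3, 'Sarah Berry', 2),
           (4, 'Zoe Black', 1), (5, 'Tim James', 2), (6, 'Bella Tucker', 2),
           (7, 'Ryan Metcalfe', 2), (8, 'Max Mills', 2), (9, 'Benjamin Glover', 3),
           (10, 'Carolyn Henderson', 4)
),
subordinates as (
    -- seed: the manager we start from, at depth 0
    select id, name, 0 as depth, name::text as path
    from employees
    where id = 2
    union all
    -- recursive step: direct reports of everyone found so far, one level deeper
    select e.id, e.name, s.depth + 1, s.path || ' > ' || e.name
    from employees e
    inner join subordinates s on e.manager_id = s.id
)
select id, name, depth, path
from subordinates
order by path;
```

For Benjamin Glover it reports depth 2 and the path Megan Berry > Sarah Berry > Benjamin Glover. UNION ALL is used because in a tree the depth and path already make every row distinct, so deduplication would only cost time. On data containing cycles this query would never terminate; PostgreSQL 14 and later provide a CYCLE clause for that case.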
  • 19. 4. UNNEST and AGGREGATE
    -- any array can be unnested into row values:
    select unnest(array[1, 2, 3]);
    -- any row values can be aggregated back into an array:
    select array_agg(i) from ( values (1), (2), (3) ) t(i);
    -- any row values can be aggregated back into a JSON array:
    select json_agg(i) from ( values (1), (2), (3) ) t(i);
    -- from row values to an array and back to row values:
    select unnest(array_agg(i)) from ( values (1), (2), (3) ) t(i);
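Two related tools, sketched here assuming PostgreSQL: WITH ORDINALITY keeps each element's position when unnesting, and string_agg concatenates row values into a single delimited string, roughly what the SQL:2016 LISTAGG aggregate does in databases that support it. The column names val and pos are illustrative.

```sql
-- WITH ORDINALITY adds the 1-based position of each element as an extra column
select t.val, t.pos
from unnest(array['a', 'b', 'c']) with ordinality as t(val, pos);

-- string_agg joins row values into one string, in the order given inside the call
select string_agg(t.val, ', ' order by t.val desc)
from unnest(array['a', 'b', 'c']) as t(val);
```

The first query returns (a, 1), (b, 2), (c, 3); the second returns 'c, b, a'.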
  • 20. 5. Subqueries
    -- First ten dates in January with extracted day numbers (standard ISO CAST):
    select cast(d as date), extract(day from d) as i
    from generate_series(cast('2018-01-01' as date), cast('2018-01-10' as date), '1 days') as d(d);
    -- First ten dates in February with extracted day numbers (Postgres cast using ::):
    select d::date, extract(day from d) as i
    from generate_series('2018-02-01'::date, '2018-02-10'::date, '1 days') as d(d);
    -- Any table expression anywhere can be replaced by another query, which is also a table expression,
    -- so we can join the previous queries as SUBQUERIES:
    select first_month.i, first_month.d as first_month, second_month.d as second_month
    from (
        select cast(d as date), extract(day from d) as i
        from generate_series(cast('2018-01-01' as date), cast('2018-01-10' as date), '1 days') as d(d)
    ) first_month
    inner join (
        select cast(d as date), extract(day from d) as i
        from generate_series(cast('2018-02-01' as date), cast('2018-02-10' as date), '1 days') as d(d)
    ) second_month on first_month.i = second_month.i;
  • 21. 5. Subqueries
    -- a subquery can be used literally everywhere, but sometimes it must be limited to a single value (scalar subquery):
    select cast(jan.d as date) as january,
           ( select cast(feb.d as date)
             from generate_series(cast('2018-02-01' as date), cast('2018-02-10' as date), '1 days') as feb(d)
             where extract(day from feb.d) = extract(day from jan.d)
             limit 1 ) as february
    from generate_series(cast('2018-01-01' as date), cast('2018-01-10' as date), '1 days') as jan(d);
    -- or it can return multiple values (a single column) to be filtered in the WHERE clause:
    select cast(d as date)
    from generate_series(cast('2018-02-01' as date), cast('2018-02-10' as date), '1 days') as d(d)
    where extract(day from d) in (
        select extract(day from sub.d)
        from generate_series(cast('2018-02-01' as date), cast('2018-02-10' as date), '1 days') as sub(d)
    );
    -- How efficient are these queries? What do we actually want our machine to do?
    -- Let's see what the execution plan has to say ...
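One way to answer that question is to ask the planner. A sketch, assuming PostgreSQL: the scalar subquery from the first query rewritten as a LEFT JOIN on the day number (LEFT keeps January dates without a match, just as the scalar subquery would return NULL for them), prefixed with EXPLAIN ANALYZE, which executes the query and reports the chosen plan with actual row counts and timings. The rewrite is equivalent here only because each day number matches at most one February date; which form is faster depends on the data and the PostgreSQL version, so compare the plans rather than assume.

```sql
-- EXPLAIN alone shows the plan; ANALYZE also runs the query and adds real timings
explain analyze
select cast(jan.d as date) as january,
       cast(feb.d as date) as february
from generate_series(cast('2018-01-01' as date), cast('2018-01-10' as date), '1 days') as jan(d)
left join generate_series(cast('2018-02-01' as date), cast('2018-02-10' as date), '1 days') as feb(d)
    on extract(day from jan.d) = extract(day from feb.d);
```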
  • 22. 6. LATERAL joins
    -- What if we want to reference one subquery from another?
    -- This doesn't work: a subquery in FROM cannot reference columns of another FROM item (by_day) at the same level:
    select by_day.d as date, counts_day.count
    from (
        select cast(d as date), extract(day from d) as i
        from generate_series(cast('2018-01-01' as date), cast('2018-01-10' as date), '1 days') as d(d)
    ) by_day
    inner join (
        select count(*) as count, extract(day from d) as i
        from generate_series(cast('2018-01-01' as date), cast('2018-01-10' as date), '1 hours') as d(d)
        where extract(day from d) = by_day.i
        group by extract(day from d)
    ) counts_day on by_day.i = counts_day.i;
  • 23. 6. LATERAL joins
    -- To achieve this, we must use a LATERAL join:
    select by_day.d as date, counts_day.count
    from (
        select cast(d as date), extract(day from d) as i
        from generate_series(cast('2018-01-01' as date), cast('2018-01-10' as date), '1 days') as d(d)
    ) by_day
    inner join lateral (
        select count(*) as count, extract(day from d) as i
        from generate_series(cast('2018-01-01' as date), cast('2018-01-10' as date), '1 hours') as d(d)
        where extract(day from d) = by_day.i
        group by extract(day from d)
    ) counts_day on by_day.i = counts_day.i;
  • 24. 6. LATERAL joins
    -- Now we can simplify this query even further: the lateral subquery already filters by by_day.i,
    -- so neither GROUP BY nor a real join condition is needed (hence "on true"):
    select by_day.d as date, counts_day.count
    from (
        select cast(d as date), extract(day from d) as i
        from generate_series(cast('2018-01-01' as date), cast('2018-01-10' as date), '1 days') as d(d)
    ) by_day
    inner join lateral (
        select count(*) as count
        from generate_series(cast('2018-01-01' as date), cast('2018-01-10' as date), '1 hours') as d(d)
        where extract(day from d) = by_day.i
    ) counts_day on true;
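An inner join lateral drops every outer row for which the lateral subquery returns nothing, while LEFT JOIN LATERAL ... ON true keeps such rows and fills in NULLs. A minimal sketch, assuming PostgreSQL; the numbers and the factor column are illustrative:

```sql
-- For each n, find its smallest divisor between 2 and n - 1 (NULL when n is prime)
select n.n, f.factor
from generate_series(2, 10) as n(n)
left join lateral (
    select d.d as factor
    from generate_series(2, n.n - 1) as d(d)   -- n.n comes from the outer FROM item
    where n.n % d.d = 0
    order by d.d
    limit 1
) as f on true
order by n.n;
```

The primes 2, 3, 5, 7 come back with a NULL factor; with inner join lateral those rows would disappear from the result. For 9 the factor is 3, for the other even numbers it is 2.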
  • 25. 7. DISTINCT ON
    create temp table sales (brand varchar, segment varchar, quantity int);
    insert into sales values ('ABC', 'Premium', 100), ('ABC', 'Basic', 200), ('XYZ', 'Premium', 100), ('XYZ', 'Basic', 300);
    select * from sales;
    -- brands with the highest quantities:
    select brand, max(quantity) from sales group by brand;
    -- what are the segments of the brands with the highest quantities? This is NOT allowed,
    -- because segment is neither in GROUP BY nor inside an aggregate function:
    select brand, max(quantity), segment from sales group by brand;
    -- we must use select distinct on, which keeps the first row of each brand according to ORDER BY:
    select distinct on (brand) brand, quantity, segment
    from sales
    order by brand, quantity desc;
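DISTINCT ON is a PostgreSQL extension. A standard-SQL alternative, which also generalizes to the top N rows per group, numbers the rows within each brand with the window function row_number() (window functions are covered later in this deck). A sketch with the sales data inlined as a CTE so it runs without the temp table; rn is an illustrative column name:

```sql
with sales (brand, segment, quantity) as (
    values ('ABC', 'Premium', 100), ('ABC', 'Basic', 200),
           ('XYZ', 'Premium', 100), ('XYZ', 'Basic', 300)
)
select brand, segment, quantity
from (
    select brand, segment, quantity,
           -- 1 = the row with the highest quantity within each brand
           row_number() over (partition by brand order by quantity desc) as rn
    from sales
) as ranked
where rn = 1          -- rn <= N would keep the top N rows per brand instead
order by brand;
```

It returns ABC / Basic / 200 and XYZ / Basic / 300, the same rows as the DISTINCT ON query.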
  • 26. 8. OLAP: GROUPING, GROUPING SETS, CUBE, ROLLUP
    create temp table sales (brand varchar, segment varchar, quantity int);
    insert into sales values ('ABC', 'Premium', 100), ('ABC', 'Basic', 200), ('XYZ', 'Premium', 100), ('XYZ', 'Basic', 300);
    -- sum quantities by brand and segment:
    select brand, segment, sum(quantity) from sales group by brand, segment;
    -- sum quantities by brand only:
    select brand, sum(quantity) from sales group by brand;
    -- sum quantities by segment only:
    select segment, sum(quantity) from sales group by segment;
    -- sum all quantities:
    select sum(quantity) from sales;
    -- we can UNION ALL of these queries together, but this is long and extremely inefficient
    -- (the table is scanned once per query):
    select brand, segment, sum(quantity) from sales group by brand, segment
    union all
    select brand, null as segment, sum(quantity) from sales group by brand
    union all
    select null as brand, segment, sum(quantity) from sales group by segment
    union all
    select null as brand, null as segment, sum(quantity) from sales;
  • 27. 8. OLAP: GROUPING, GROUPING SETS, CUBE, ROLLUP
    -- unless we use grouping sets to get all sums by all categories;
    -- this is usually much more efficient than separate queries combined with UNION,
    -- and a lot shorter and easier to read:
    select brand, segment, sum(quantity)
    from sales
    group by grouping sets ( (brand, segment), (brand), (segment), () )
    order by brand nulls last, segment nulls last;
  • 28. 8. OLAP: GROUPING, GROUPING SETS, CUBE, ROLLUP
    -- generate ALL possible grouping combinations:
    CUBE (c1, c2, c3)
    -- results in:
    GROUPING SETS ( (c1, c2, c3), (c1, c2), (c1, c3), (c2, c3), (c1), (c2), (c3), () )
    -- previous example:
    select brand, segment, sum(quantity) from sales group by cube (brand, segment);
  • 29. 8. OLAP: GROUPING, GROUPING SETS, CUBE, ROLLUP
    -- generate grouping combinations assuming the hierarchy c1 > c2 > c3:
    ROLLUP (c1, c2, c3)
    -- results in:
    GROUPING SETS ( (c1, c2, c3), (c1, c2), (c1), () )
    -- previous example:
    select brand, segment, sum(quantity) from sales group by rollup (brand, segment);
    -- results in:
    select brand, segment, sum(quantity) from sales group by grouping sets ( (brand, segment), (brand), () );
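The slide title mentions GROUPING, which answers a question the queries above leave open: is a NULL in the result a subtotal marker or a real NULL stored in the data? grouping(col) returns 1 when col is not part of the current grouping set. A sketch, assuming PostgreSQL, with the sales data inlined as a CTE; the labels ALL brands / ALL segments and the output column names are illustrative:

```sql
with sales (brand, segment, quantity) as (
    values ('ABC', 'Premium', 100), ('ABC', 'Basic', 200),
           ('XYZ', 'Premium', 100), ('XYZ', 'Basic', 300)
)
select
    -- grouping(col) = 1: col was rolled up, so its NULL marks a subtotal, not missing data
    case when grouping(brand) = 1 then 'ALL brands' else brand end as brand_label,
    case when grouping(segment) = 1 then 'ALL segments' else segment end as segment_label,
    sum(quantity) as total
from sales
group by cube (brand, segment)
order by grouping(brand), brand, grouping(segment), segment;
```

The nine result rows are the four (brand, segment) sums, the two brand subtotals (ABC 300, XYZ 400), the two segment subtotals (Basic 500, Premium 200) and the grand total of 700.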
  • 30. 9. OLAP: WINDOW FUNCTIONS
    create temp table employee (id serial, department varchar, salary int);
    insert into employee (department, salary) values
        ('develop', 5200), ('develop', 4200), ('develop', 4500), ('develop', 6000), ('develop', 5200),
        ('personnel', 3500), ('personnel', 3900),
        ('sales', 4800), ('sales', 5000), ('sales', 4800);
    -- average salaries by department return fewer rows, because the rows are grouped:
    select department, avg(salary) from employee group by department;
    -- but not if we use an aggregate function over a partition (window); this returns ALL records:
    select department, salary, avg(salary) over (partition by department) from employee;
  • 31. 9. OLAP: WINDOW FUNCTIONS
    -- syntax:
    window_function(arg1, arg2, ...) OVER (PARTITION BY expression ORDER BY expression)
    -- return all employees, no grouping:
    select department, salary,
           -- average salary:
           avg(salary) over (partition by department),
           -- employee order number within department (window):
           row_number() over (partition by department order by id),
           -- rank of employee salary within department (window):
           rank() over (partition by department order by salary)
    from employee;
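Ranking functions differ only on ties, and the employee data has ties (two develop salaries of 5200, two sales salaries of 4800). A sketch, assuming PostgreSQL, with the data inlined as a CTE; w is a named window reused by several functions, and the output column names are illustrative. It also shows a running total: with ORDER BY and no explicit frame, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which includes all peers of the current row, so both 5200 rows would show the same total; an explicit ROWS frame gives a strict row-by-row running sum.

```sql
with employee (id, department, salary) as (
    -- same rows as the employee temp table above
    values (1, 'develop', 5200), (2, 'develop', 4200), (3, 'develop', 4500),
           (4, 'develop', 6000), (5, 'develop', 5200), (6, 'personnel', 3500),
           (7, 'personnel', 3900), (8, 'sales', 4800), (9, 'sales', 5000),
           (10, 'sales', 4800)
)
select department, salary,
       rank()       over w as salary_rank,        -- ties share a rank, then a gap
       dense_rank() over w as salary_dense_rank,  -- ties share a rank, no gap
       -- explicit ROWS frame: a true row-by-row running total within the department
       sum(salary) over (w rows between unbounded preceding and current row) as running_total
from employee
window w as (partition by department order by salary desc)
order by department, salary desc;
```

For develop this yields ranks 1, 2, 2, 4, 5, dense ranks 1, 2, 2, 3, 4 and running totals 6000, 11200, 16400, 20900, 25100.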
  • 32. BONUS: Mandelbrot set fractal
    WITH RECURSIVE
    x(i) AS (
        VALUES(0)
        UNION ALL
        SELECT i + 1 FROM x WHERE i < 101
    ),
    Z(Ix, Iy, Cx, Cy, X, Y, I) AS (
        SELECT Ix, Iy, X::FLOAT, Y::FLOAT, X::FLOAT, Y::FLOAT, 0
        FROM (SELECT -2.2 + 0.031 * i, i FROM x) AS xgen(x, ix)
        CROSS JOIN (SELECT -1.5 + 0.031 * i, i FROM x) AS ygen(y, iy)
        UNION ALL
        SELECT Ix, Iy, Cx, Cy, X * X - Y * Y + Cx AS X, Y * X * 2 + Cy, I + 1
        FROM Z
        WHERE X * X + Y * Y < 16.0 AND I < 27
    ),
    Zt (Ix, Iy, I) AS (
        SELECT Ix, Iy, MAX(I) AS I
        FROM Z
        GROUP BY Iy, Ix
    )
    SELECT array_to_string(
               -- ORDER BY Ix inside the aggregate guarantees left-to-right character order
               array_agg(SUBSTRING(' .,,,-----++++%%%%@@@@#### ', GREATEST(I, 1), 1) ORDER BY Ix),
               '')
    FROM Zt
    GROUP BY Iy
    ORDER BY Iy;
  • 33. Conclusion and final words
    - SQL is a “mysterious machine”. Even after 15 years, it can still pull some new surprises.
    - Practice is the key. You need to practice, practice, and get some more practice.
    - The payoffs are huge: application performance can be improved dramatically with significantly less code.
    - It can reduce the amount of code and improve system maintainability many, many times over.
    - It can be intimidating to some: the percentage of keywords in the code is much higher, comparable to assembler or COBOL code.
    - Don't be intimidated, it will pay off in the end. Any day gone by without learning anything new is a wasted day.