OLAP Functions Support in
Informix 12.1
Bingjie Miao
IBM
Agenda
• What is OLAP?
• OLAP functions in Informix
– the OVER clause
– supported OLAP functions
• Questions?
What is OLAP?
• On-Line Analytical Processing
• Commonly used in Business
Intelligence (BI) tools
– ranking products, salesmen, items, etc
– exposing trends in sales from historic data
– testing business scenarios (forecast)
– sales breakdown or aggregates on multiple
dimensions (Time, Region, Demographics, etc)
OLAP Functions in Informix
• Supports subset of commonly used
OLAP functions
• Enables more efficient query
processing from BI tools such as
Cognos
Example query with group by
select customer_num, count(*)
from orders
where customer_num <= 110
group by customer_num;
customer_num (count(*))
101 1
104 4
106 2
110 2
4 row(s) retrieved.
Example query with OLAP function
select customer_num, ship_date, ship_charge,
count(*) over (partition by customer_num)
from orders
where customer_num <= 110;
customer_num ship_date ship_charge (count(*))
101 05/26/2008 $15.30 1
104 05/23/2008 $10.80 4
104 07/03/2008 $5.00 4
104 06/01/2008 $10.00 4
104 07/10/2008 $12.20 4
106 05/30/2008 $19.20 2
106 07/03/2008 $12.30 2
110 07/06/2008 $13.80 2
110 07/16/2008 $6.30 2
9 row(s) retrieved.
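The contrast between the two queries can be mimicked outside the database. The following is a minimal Python sketch, assuming the nine `orders` rows from the result above are held as tuples; `count_over_partition` and `count_group_by` are hypothetical helper names for illustration, not Informix APIs.

```python
from collections import Counter

# (customer_num, ship_date, ship_charge) rows copied from the result above.
ORDERS = [
    (101, "05/26/2008", 15.30),
    (104, "05/23/2008", 10.80),
    (104, "07/03/2008", 5.00),
    (104, "06/01/2008", 10.00),
    (104, "07/10/2008", 12.20),
    (106, "05/30/2008", 19.20),
    (106, "07/03/2008", 12.30),
    (110, "07/06/2008", 13.80),
    (110, "07/16/2008", 6.30),
]


def count_group_by(rows, key_index=0):
    """Emulate count(*) ... group by <key>: one output row per group."""
    return sorted(Counter(row[key_index] for row in rows).items())


def count_over_partition(rows, key_index=0):
    """Emulate count(*) over (partition by <key>): keep every input row
    and append the size of the partition the row belongs to."""
    sizes = Counter(row[key_index] for row in rows)
    return [row + (sizes[row[key_index]],) for row in rows]


if __name__ == "__main__":
    print(count_group_by(ORDERS))            # 4 collapsed rows
    for row in count_over_partition(ORDERS):  # 9 detail rows, each with its count
        print(row)
```

The group by version collapses the detail rows, while the OLAP version keeps all nine rows and repeats the partition count on each of them.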
Where do OLAP functions fit?
• Processing order within a query block
– joins, group by, having, aggregation
– OLAP functions
– final order by
OLAP functions as predicates
• Use a derived table query block to compute the
OLAP function first, then filter on its result in
the outer query
select * from
(select customer_num, ship_date,
ship_charge,
count(*) over (partition by
customer_num) as cnt
from orders
where customer_num <= 110)
where cnt >= 3;
OLAP function example
• Running 3-month average sales for a
particular product during a particular period
select product_name,
avg(sales) over (
partition by region
order by year, month
rows between 1 preceding and 1 following
)
from total_sales
where product_id = 105
and year between 2001 and 2010;
The over() Clause
olap_func(arg) over(partition by clause
order by clause window frame clause)
• Defines the “domain” of OLAP function
calculation
– partition by: divide into partitions
– order by: ordering within each partition
– window frame: sliding window within each
partition
– all clauses optional
Partition By
sum(x) over (
partition by a, b
order by c, d
rows between 2 preceding and 2 following)
a=1, b=1
a=2, b=2
a=1, b=2
a=2, b=1
Order By
sum(x) over (
partition by a, b
order by c, d
rows between 2 preceding and 2 following)
partition a=1, b=2
c=1,d=1
c=1,d=2
c=1,d=3
c=2,d=2
c=2,d=4
c=3,d=1
c=4,d=1
c=4,d=2
Window Frame
c=1,d=1
c=1,d=2
c=1,d=3
c=2,d=2
c=2,d=4
c=3,d=1
c=4,d=1
c=4,d=2
sum(x) over (
partition by a, b
order by c, d
rows between 2 preceding and 2 following)
Partition By
• Divide result set of query into partitions for
computing of an OLAP function
• If partition by clause is not specified, then
entire result set is a single partition
max(salary) over (partition by dept_id)
sum(sales) over (partition by region)
avg(price) over ()
Order By
• Ordering within each partition
• Required for some OLAP functions
– ranking functions, window frame clause
• Supports ASC/DESC, NULLS FIRST/NULLS LAST
rank() over (partition by dept
order by salary desc)
dense_rank() over(order by total_sales
nulls last)
Window Frame
• Defines a sliding window within a partition
• OLAP function value computed from rows in the
sliding window
• Order by clause is required
Physical vs. Logical Window Frame
• Physical window frame
– ROWS keyword
– count offset by position
– fixed window size
– order by one or more column expressions
• Logical window frame
– RANGE keyword
– count offset by value
– window size may vary
– order by single column (numeric, date or datetime type)
Window Frame Examples
avg(price) over (order by year, day
rows between 6 preceding and current row)
count(*) over (order by ship_date
range between 2 preceding and 2 following)
• Current row can be physically outside the window
avg(sales) over (order by month
range between 3 preceding and 1 preceding)
sum(sales) over (order by month
rows between 2 following and 5 following)
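The difference between physical (ROWS) and logical (RANGE) frames can be sketched in a few lines of Python. This is an illustration only, assuming a single partition already sorted ascending on one numeric ordering key with no NULLs; `rows_frame`, `range_frame` and the sample `KEYS` are hypothetical. A negative bound stands for the opposite direction, so "3 preceding and 1 preceding" is written as `preceding=3, following=-1`.

```python
# Hypothetical ordering-key values (for example month numbers), sorted ascending.
KEYS = [1, 2, 2, 5, 6, 9]


def rows_frame(row_count, i, preceding, following):
    """Row indexes covered by ROWS BETWEEN <preceding> PRECEDING AND
    <following> FOLLOWING for row i: offsets are counted by position."""
    start = max(0, i - preceding)
    end = min(row_count - 1, i + following)
    return list(range(start, end + 1))  # empty when the frame falls off the partition


def range_frame(keys, i, preceding, following):
    """Row indexes covered by RANGE BETWEEN <preceding> PRECEDING AND
    <following> FOLLOWING for row i: offsets are applied to the key value."""
    low, high = keys[i] - preceding, keys[i] + following
    return [j for j, key in enumerate(keys) if low <= key <= high]


if __name__ == "__main__":
    i = 3  # the row whose key is 5
    print(rows_frame(len(KEYS), i, 2, 2))   # [1, 2, 3, 4, 5]: always up to 5 rows
    print(range_frame(KEYS, i, 2, 2))       # [3, 4]: only keys between 3 and 7
    print(range_frame(KEYS, i, 3, -1))      # [1, 2]: current row is outside the frame
    print(rows_frame(len(KEYS), 0, -2, 5))  # [2, 3, 4, 5]: frame starts 2 rows ahead
```

The ROWS frame has a fixed maximum size, while the size of the RANGE frame depends on how many rows have key values inside the value interval.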
Order By – Special Semantics
• “cumulative” semantics in absence of window
frame clause
– for OLAP function that allows window frame clause
– equivalent to “ROWS between unbounded preceding
and current row”
select sales, sum(sales) over (order by quarter)
from sales where year = 2012
sales (sum)
120 120
135 255
127 382
153 535
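The cumulative behaviour is the same as a running total. A minimal Python sketch, assuming the four quarterly `sales` values above are already in quarter order:

```python
from itertools import accumulate

# Quarterly sales for 2012, in quarter order, copied from the result above.
QUARTERLY_SALES = [120, 135, 127, 153]

# Each output value is the sum of all rows from the start of the
# partition up to and including the current row.
RUNNING_TOTAL = list(accumulate(QUARTERLY_SALES))

if __name__ == "__main__":
    for sales, total in zip(QUARTERLY_SALES, RUNNING_TOTAL):
        print(sales, total)
```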
Supported OLAP Functions
• Ranking functions
– RANK, DENSE_RANK (DENSERANK)
– PERCENT_RANK, CUME_DIST, NTILE
– LEAD, LAG
• Numbering functions
– ROW_NUMBER (ROWNUMBER)
• Aggregate functions
– SUM, COUNT, AVG, MIN, MAX
– STDEV, VARIANCE, RANGE
– FIRST_VALUE, LAST_VALUE
– RATIO_TO_REPORT (RATIOTOREPORT)
Ranking Functions
• Partition by clause is optional
• Order by clause is required
• Window frame clause is NOT allowed
• Duplicate values are handled differently by
rank() and dense_rank()
– both give the same rank to all duplicates
– rank() then skips the ranks covered by the
duplicates (1, 2, 2, 4); dense_rank() continues
with the next consecutive rank (1, 2, 2, 3)
RANK vs DENSE_RANK
select emp_num, sales,
rank() over (order by sales) as rank,
dense_rank() over (order by sales) as dense_rank
from sales;
emp_num sales rank dense_rank
101 2,000 1 1
102 2,400 2 2
103 2,400 2 2
104 2,500 4 3
105 2,500 4 3
106 2,650 6 4
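The two numbering rules can be reproduced with a single pass over the ordered values. The sketch below is illustrative Python, not how the server implements ranking; `rank_and_dense_rank` is a hypothetical helper and the input is assumed to be sorted in the ORDER BY order already.

```python
def rank_and_dense_rank(sorted_values):
    """Return a (rank, dense_rank) pair for each value of an ordered list."""
    result = []
    rank = dense = 0
    previous = object()  # sentinel that compares unequal to every real value
    for position, value in enumerate(sorted_values, start=1):
        if value != previous:
            rank = position  # rank() jumps to the row position, leaving gaps
            dense += 1       # dense_rank() advances by exactly one
            previous = value
        result.append((rank, dense))
    return result


# Sales values from the result above, in ascending order.
SALES = [2000, 2400, 2400, 2500, 2500, 2650]

if __name__ == "__main__":
    for value, (rank, dense) in zip(SALES, rank_and_dense_rank(SALES)):
        print(value, rank, dense)
```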
PERCENT_RANK and CUME_DIST
• Calculates ranking information as a percentile
• Returns value between 0 and 1
select emp_num, sales,
percent_rank() over (order by sales) as per_rank,
cume_dist() over (order by sales) as cume_dist
from sales;
emp_num sales per_rank cume_dist
101 2,000 0 0.166666667
102 2,400 0.2 0.500000000
103 2,400 0.2 0.500000000
104 2,500 0.6 0.833333333
105 2,500 0.6 0.833333333
106 2,650 1.0 1.000000000
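The values above are consistent with the usual SQL definitions: percent_rank is (rank - 1) / (rows - 1), and cume_dist is the number of rows ordered at or before the current row divided by the total number of rows. A small Python sketch under that assumption, with a hypothetical helper name and input sorted ascending:

```python
from bisect import bisect_left, bisect_right


def percent_rank_and_cume_dist(sorted_values):
    """Return a (percent_rank, cume_dist) pair for each value of an
    ascending list, treating the whole list as one partition."""
    n = len(sorted_values)
    result = []
    for value in sorted_values:
        rank = bisect_left(sorted_values, value) + 1           # rank() of this value
        rows_at_or_before = bisect_right(sorted_values, value)  # includes duplicates
        percent_rank = (rank - 1) / (n - 1) if n > 1 else 0.0
        result.append((percent_rank, rows_at_or_before / n))
    return result


# Sales values from the result above, in ascending order.
SALES = [2000, 2400, 2400, 2500, 2500, 2650]

if __name__ == "__main__":
    for value, (pr, cd) in zip(SALES, percent_rank_and_cume_dist(SALES)):
        print(value, round(pr, 1), round(cd, 9))
```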
NTILE
• Divides the ordered rows of each partition into
N tiles, where N is given by the expression,
and returns the tile number of each row
• N must be an exact numeric value with
scale zero (an integer)
NTILE Example
select name, salary,
ntile(5) over (partition by dept order by salary)
from employee;
name salary (ntile)
John 35,000 1
Jack 38,400 1
Julie 41,200 2
Manny 45,600 2
Nancy 47,300 3
Pat 49,500 4
Ray 51,300 5
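The distribution in this result (seven rows over five tiles) matches the usual SQL rule that tile sizes differ by at most one and any leftover rows go to the lowest-numbered tiles. A Python sketch under that assumption; `ntile` here is a hypothetical helper that returns the tile number for each row position of one ordered partition:

```python
def ntile(row_count, tiles):
    """Tile numbers for row_count ordered rows split into `tiles` tiles.
    Leftover rows are assumed to go to the lowest-numbered tiles."""
    size, extra = divmod(row_count, tiles)
    result = []
    for tile in range(1, tiles + 1):
        rows_in_tile = size + (1 if tile <= extra else 0)
        result.extend([tile] * rows_in_tile)
    return result


if __name__ == "__main__":
    names = ["John", "Jack", "Julie", "Manny", "Nancy", "Pat", "Ray"]
    for name, tile in zip(names, ntile(len(names), 5)):
        print(name, tile)
```

With fewer rows than tiles, the higher tile numbers are simply never used under this rule.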
LEAD and LAG
LEAD(expr, offset, default)
LAG(expr, offset, default)
• Returns the value of the expression from the row
at the specified offset after (LEAD) or before (LAG)
the current row
• offset is optional; defaults to 1 if not specified
• default is optional; NULL if not specified
– default is returned when the offset goes beyond the
current partition boundary
• NULL handling
– RESPECT NULLS (default)
– IGNORE NULLS
LEAD/LAG Example
select name, salary, lag(salary)
over (partition by dept order by salary),
lead(salary, 1, 0)
over (partition by dept order by salary)
from employee;
name    salary   (lag)    (lead)
John    35,000            38,400
Jack    38,400   35,000   41,200
Julie   41,200   38,400   45,600
Manny   45,600   41,200   47,300
Nancy   47,300   45,600   49,500
Pat     49,500   47,300   51,300
Ray     51,300   49,500   0
LEAD/LAG NULL handling
select price,
lag(price ignore nulls, 1) over (order by day),
lead(price, 1) ignore nulls over (order by day)
from stock_price;
price   (lag)   (lead)
18.25           18.37
18.37   18.25   19.03
        18.37   19.03
        18.37   19.03
19.03   18.37   18.59
18.59   19.03   18.21
18.21   18.59
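The effect of offset, default and IGNORE NULLS can be sketched in Python. This is an illustration of the semantics described above, not Informix code: `lag` and `lead` are hypothetical helpers over one ordered partition, `None` stands for NULL, and the offset is assumed to be 1 or greater.

```python
def lag(values, offset=1, default=None, ignore_nulls=False):
    """Value `offset` rows before each row; `default` past the partition start."""
    result = []
    for i in range(len(values)):
        candidates = values[:i][::-1]  # earlier rows, nearest first
        if ignore_nulls:
            candidates = [v for v in candidates if v is not None]
        result.append(candidates[offset - 1] if offset <= len(candidates) else default)
    return result


def lead(values, offset=1, default=None, ignore_nulls=False):
    """Value `offset` rows after each row: a lag over the reversed partition."""
    return lag(values[::-1], offset, default, ignore_nulls)[::-1]


# Prices ordered by day; None marks the two NULL prices in the result above.
PRICES = [18.25, 18.37, None, None, 19.03, 18.59, 18.21]

if __name__ == "__main__":
    lags = lag(PRICES, ignore_nulls=True)
    leads = lead(PRICES, ignore_nulls=True)
    for row in zip(PRICES, lags, leads):
        print(row)
```

With `ignore_nulls=True` the NULL prices are skipped when looking backward or forward, which is why three consecutive rows share the same lag value of 18.37 and the same lead value of 19.03.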
Numbering Functions
• Partition by clause and order by clause are
optional
• Window frame clause is NOT allowed
• Provides sequential row number to result set
– regardless of duplicates when order by is specified
ROW_NUMBER Example
select row_number() over (order by sales),
emp_num, sales
from sales;
(row_number) emp_num sales
1 101 2,000
2 102 2,400
3 103 2,400
4 104 2,500
5 105 2,500
6 106 2,650
Aggregate Functions
• Partition by, order by and window frame
clauses are all optional
– window frame clause requires order by clause
• All currently supported aggregate functions
– SUM, COUNT, MIN, MAX, AVG, STDEV, RANGE,
VARIANCE
• New aggregate functions
– FIRST_VALUE/LAST_VALUE
– RATIO_TO_REPORT
Aggregate Function Example
select price,
avg(price) over (order by day
rows between 1 preceding and 1 following)
from stock_price;
price   (avg)
18.25   18.31
18.37   18.31
        18.37
        19.03
19.03   18.81
18.59   18.61
18.21   18.40
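The averages above follow from applying the three-row frame at each position and leaving NULL prices out of the average. A Python sketch of that calculation, assuming `None` stands for NULL and the rows are already ordered by day; `moving_avg` is a hypothetical helper:

```python
def moving_avg(values, preceding=1, following=1):
    """avg(x) over (order by ... rows between <preceding> preceding and
    <following> following); NULL (None) inputs are left out of the average."""
    result = []
    for i in range(len(values)):
        frame = values[max(0, i - preceding): i + following + 1]
        non_null = [v for v in frame if v is not None]
        result.append(round(sum(non_null) / len(non_null), 2) if non_null else None)
    return result


# Prices ordered by day; None marks the two NULL prices in the result above.
PRICES = [18.25, 18.37, None, None, 19.03, 18.59, 18.21]

if __name__ == "__main__":
    for price, avg in zip(PRICES, moving_avg(PRICES)):
        print(price, avg)
```

The frame is truncated at the partition edges, so the first and last rows average only two positions.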
DISTINCT handling
• DISTINCT is supported in OLAP aggregates, but it cannot
be combined with an order by clause or a window frame
clause
select emp_id, manager_id,
count(distinct manager_id)
over (partition by department)
from employee;
emp_id manager_id (count)
101 103 3
102 103 3
103 100 3
104 110 3
105 110 3
FIRST_VALUE and LAST_VALUE
• Gives FIRST/LAST value of current
partition
• NULL handling
– RESPECT NULLS (default)
– IGNORE NULLS
FIRST_VALUE/LAST_VALUE Example
select price, price - first_value(price)
over (partition by year order by day)
as diff_price
from stock_price;
price diff_price
18.25 0
18.37 0.12
19.03 0.78
18.59 0.34
18.21 -0.04
RATIO_TO_REPORT
• Computes the ratio of current value to
sum of all values in current partition or
window frame.
select emp_num, sales,
ratio_to_report(sales) over (partition by
year order by sales)
from sales;
RATIO_TO_REPORT Example
select year, sales, ratio_to_report(sales)
over (partition by year)
from sales;
year sales (ratio_to_report)
1998 2400 0.2308
1998 2550 0.2452
1998 2650 0.2548
1998 2800 0.2692
1999 2450 0.2311
1999 2575 0.2429
1999 2725 0.2571
1999 2850 0.2689
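Each ratio above is the row's sales divided by the total sales of its year. A short Python sketch of the same calculation, assuming the eight `(year, sales)` rows from the result above; `ratio_to_report` is a hypothetical helper and the rounding to four decimals only mirrors the displayed output:

```python
from collections import defaultdict

# (year, sales) rows copied from the result above.
SALES = [
    (1998, 2400), (1998, 2550), (1998, 2650), (1998, 2800),
    (1999, 2450), (1999, 2575), (1999, 2725), (1999, 2850),
]


def ratio_to_report(rows):
    """Emulate ratio_to_report(sales) over (partition by year)."""
    totals = defaultdict(float)
    for year, sales in rows:
        totals[year] += sales  # per-partition sum
    return [(year, sales, round(sales / totals[year], 4)) for year, sales in rows]


if __name__ == "__main__":
    for row in ratio_to_report(SALES):
        print(row)
```

Within each partition the ratios add up to 1, apart from rounding.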
Nested OLAP Functions
• OLAP function can be nested inside another
OLAP function
select emp_id, salary, salary - first_value(salary)
over (order by rank() over (order by salary))
as diff_salary
from employee;
select sum(ntile(10) over (order by salary))
over (partition by department)
from employee;
OLAP functions and IWA
• Queries containing OLAP functions can be
accelerated by Informix Warehouse
Accelerator (IWA)
• IWA processes majority of the query block
– scan, join, group by, having, aggregation
• Informix server processes OLAP functions
based on query result from IWA
For more information
• Links to OLAP function in Informix 12.1
documentation
http://pic.dhe.ibm.com/infocenter/informix/v121/index.jsp?topic=%2Fcom.ibm.sqls.doc%2Fids_sqs_2583.htm
http://pic.dhe.ibm.com/infocenter/informix/v121/index.jsp?topic=%2Fcom.ibm.acc.doc%2Fids_acc_queries1.htm
Questions?
Bingjie Miao
bingjie@us.ibm.com
