Oracle Database Analytic Functions Explained

Published in: Technology


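  1. A: I have a SQL statement. Could you help me rewrite it with an analytic function?

     Subquery method (the table has 676,749 rows in total; the query returns 27 rows):

         SELECT owner, object_type
         FROM demo2
         WHERE owner = 'DINGJUN123'
           AND trunc(created, 'dd') =
               (SELECT MAX(trunc(created, 'dd'))
                FROM demo2
                WHERE owner = 'DINGJUN123')

     Original SQL: 9581 logical reads, COST 3049.
     Pros: the easiest approach to think of.
     Cons: accesses the table or index more than once.

     B: Are the relevant columns indexed?
     A: owner is indexed and reasonably selective. I just want to see what an analytic-function rewrite looks like.
     B: Got it. This is a typical top-n query.

     Analytic function method:

         SELECT owner, object_type
         FROM (SELECT owner, object_type,
                      dense_rank() over(ORDER BY trunc(created, 'dd') DESC) rn
               FROM demo2
               WHERE owner = 'DINGJUN123')
         WHERE rn = 1

     Analytic SQL: 9572 logical reads, COST 2790.
     Pros: fewer accesses to the table or index, and simpler SQL.
     Cons: requires a sort operation.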
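The two formulations in this slide can be compared on a toy data set. The sketch below is only an illustration under stated assumptions: it uses Python's bundled SQLite (3.25 or later, for window functions) as a stand-in for Oracle, `date(created)` replaces `trunc(created, 'dd')`, and the four sample rows and both function names are made up.

```python
import sqlite3

# Hypothetical miniature of demo2; the slide's real table has 676,749 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo2 (owner TEXT, object_type TEXT, created TEXT)")
conn.executemany(
    "INSERT INTO demo2 VALUES (?, ?, ?)",
    [
        ("DINGJUN123", "TABLE", "2010-05-01 09:00:00"),
        ("DINGJUN123", "INDEX", "2010-05-02 10:30:00"),
        ("DINGJUN123", "VIEW", "2010-05-02 23:59:59"),
        ("OTHER", "TABLE", "2010-05-03 08:00:00"),
    ],
)


def latest_day_subquery(owner):
    """Subquery method: demo2 is read twice (outer query plus MAX subquery)."""
    sql = """
        SELECT owner, object_type
        FROM demo2
        WHERE owner = ?
          AND date(created) = (SELECT MAX(date(created))
                               FROM demo2
                               WHERE owner = ?)
        ORDER BY object_type
    """
    return conn.execute(sql, (owner, owner)).fetchall()


def latest_day_analytic(owner):
    """Analytic method: one pass, keeping rows whose day ranks first."""
    sql = """
        SELECT owner, object_type
        FROM (SELECT owner, object_type,
                     dense_rank() OVER (ORDER BY date(created) DESC) AS rn
              FROM demo2
              WHERE owner = ?)
        WHERE rn = 1
        ORDER BY object_type
    """
    return conn.execute(sql, (owner,)).fetchall()
```

Both functions return the INDEX and VIEW rows created on 2010-05-02. dense_rank is used rather than row_number so that every row sharing the latest day is kept.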
  2. Execution plan of the subquery version:

         SELECT STATEMENT             |           |   16
         TABLE ACCESS BY INDEX ROWID  | DEMO2     |   16
         INDEX RANGE SCAN             | IDX_DEMO2 |    1
         SORT AGGREGATE               |           |    1
         INDEX RANGE SCAN             | IDX_DEMO2 | 1636

     Execution plan of the analytic-function version:

         SELECT owner, object_type
         FROM (SELECT owner, object_type,
                      dense_rank() over(ORDER BY trunc(created, 'dd') DESC) rn
               FROM demo2
               WHERE owner = 'DINGJUN123')
         WHERE rn = 1

         SELECT STATEMENT             |           | 1636
         VIEW                         |           | 1636
         WINDOW NOSORT STOPKEY        |           | 1636
         TABLE ACCESS BY INDEX ROWID  | DEMO2     | 1636
         INDEX RANGE SCAN             | IDX_DEMO2 | 1636

     A running total per department:

         SELECT empno, sal, deptno,
                SUM(sal) over (PARTITION BY deptno ORDER BY empno) sum_current
         FROM emp

         EMPNO   SAL  DEPTNO  SUM_CURRENT
         -----  ----  ------  -----------
          7782  2450      10         2450
          7839  5000      10         7450
          7934  1300      10         8750
  3. Test data:

         CREATE TABLE t1 AS
         SELECT mod(LEVEL, 1000) ID, LEVEL + 1000 sal, MOD(LEVEL, 10) ext
         FROM dual
         CONNECT BY LEVEL < 1000000

     Join against a GROUP BY subquery:

         SELECT a.ID, a.sal, a.ext
         FROM t1 a,
              (SELECT ID, MAX(sal) max_sal FROM t1 GROUP BY ID) b
         WHERE a.sal = b.max_sal
           AND a.ID = b.ID

         SELECT STATEMENT         |    |     1 |  65 |
         HASH JOIN                |    |     1 |  65 | 39M
         VIEW                     |    | 1101K | 27M |
         HASH GROUP BY            |    | 1101K | 27M |
         TABLE ACCESS FULL        | T1 | 1101K | 27M |
         TABLE ACCESS FULL        | T1 | 1101K | 40M |

     Elapsed: 00:00:00.35

     Analytic function:

         SELECT ID, sal, ext
         FROM (SELECT ID, sal, ext,
                      rank() over(PARTITION BY ID ORDER BY sal DESC) rn
               FROM t1)
         WHERE rn = 1

         SELECT STATEMENT         |    | 1101K | 54M |
         VIEW                     |    | 1101K | 54M |
         WINDOW SORT PUSHED RANK  |    | 1101K | 40M | 54M
         TABLE ACCESS FULL        | T1 | 1101K | 40M |

     Elapsed: 00:00:01.79
  4. [Diagram: structure of an analytic function. Labels: Analytic Function, PARTITION BY, ORDER BY, ROWS vs. RANGE, UNBOUNDED PRECEDING, CURRENT ROW, FOLLOWING]
  5. (1) The partition by clause groups rows with the same partition key together. Every analytic function can use a partition by clause.
     (2) For each row, the analytic function is applied within that row's window, producing the function value for the current row.
     (3) The partition by clause is optional. If there is no order by clause either, the window for the current row is all rows.

         DEPTNO  EMPNO   SAL  SUM_DEPT  SUM_ALL
         ------  -----  ----  --------  -------
             10   7782  2450      8750    32025
             10   7934  1300      8750    32025
             10   7839  5000      8750    32025
             20   7902  3000     10875    32025
             20   7566  2975     10875    32025
             20   7876  1100     10875    32025
             20   7369   800     10875    32025
             20   7788  3000     10875    32025
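  6. (1) With order by, the default window runs from the first row of the current row's partition to the current row.
     (2) With order by, the default is a range window, that is, a logical window. This keeps the function value deterministic: rows with equal sort keys receive the same value.
     (3) If order by has several sort keys, a range window must be one of: all rows of the current partition, the first row to the current row, or the current row to the last row of the current partition.

         DEPTNO  EMPNO   SAL  DEPT_CURRENT
         ------  -----  ----  ------------
             10   7934  1300          1300
             10   7782  2450          3750
             10   7839  5000          8750
             20   7369   800           800
             20   7876  1100          1900
             20   7566  2975          4875
             20   7902  3000         10875
             20   7788  3000         10875
             30   7900   950           950
             30   7521  1250          3450
             30   7654  1250          3450
             30   7844  1500          7950
             30   7845  1500          7950
             30   7846  1500          7950
             30   7499  1600          9550
             30   7698  2850         12400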
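The effect of the default range window on tied sort keys can be reproduced with the department 20 salaries from this slide. This is a sketch under assumptions: SQLite (3.25 or later) stands in for Oracle, the cut-down `emp` table and the function name `running_sums` are illustrative, and the `rows_sum` column is added only for contrast.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER, sal INTEGER)")
# Department 20 salaries from the slide; two employees tie at 3000.
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [(7369, 800), (7876, 1100), (7566, 2975), (7902, 3000), (7788, 3000)],
)


def running_sums():
    """Return (sal, default_sum, rows_sum) for every employee."""
    sql = """
        SELECT sal,
               -- No window clause: defaults to
               -- RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
               SUM(sal) OVER (ORDER BY sal) AS default_sum,
               -- Physical window: counts rows, so tied rows get different sums.
               SUM(sal) OVER (ORDER BY sal
                              ROWS BETWEEN UNBOUNDED PRECEDING
                                       AND CURRENT ROW) AS rows_sum
        FROM emp
        ORDER BY sal, rows_sum
    """
    return conn.execute(sql).fetchall()
```

With the range default both 3000 rows report 10875, as in the slide's DEPT_CURRENT column. With rows, one of them reports 7875 and which one depends on an arbitrary tie order.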
  7. (1) An explicit window clause requires order by. Some analytic functions cannot take an explicit window clause, for example row_number, rank, and dense_rank.
     (2) The window clause specifies whether the window is logical or physical: range for logical rows, rows for physical rows.
     (3) The window must run from top to bottom: its start may not come after its end.

         ID  DEFAULT_SUM  RANGE_UNBOUND_SUM  ROWS_UNBOUND_SUM  RANGE_SUM  ROWS_SUM
         --  -----------  -----------------  ----------------  ---------  --------
          1            2                  2                 1          5         5
          1            2                  2                 2          5        11
          3            5                  5                 5          3        16
          6           23                 23                11         33        21
          6           23                 23                17         33        25
          6           23                 23                23         33        27
          7           30                 30                30         42        30
          8           38                 38                38         24        24
          9           47                 47                47         17        17
  8. Ranking analytic functions include row_number, dense_rank, rank, first, last, ntile, and others. Of these, row_number, dense_rank, rank, first, and last all require order by.

         EMPNO   SAL  DEPTNO  ROW_RN  RANK_RN  DENSE_RN
         -----  ----  ------  ------  -------  --------
          7934  1300      10       1        1         1
          7782  2450      10       2        2         2
          7839  5000      10       3        3         3
          7369   800      20       1        1         1
          7876  1100      20       2        2         2
          7566  2975      20       3        3         3
          7902  3000      20       4        4         4
          7788  3000      20       5        4         4
          7900   950      30       1        1         1
          7521  1250      30       2        2         2
          7654  1250      30       3        2         2
          7844  1500      30       4        4         3
          7845  1500      30       5        4         3
          7846  1500      30       6        4         3
          7499  1600      30       7        7         4
          7698  2850      30       8        8         5
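The way the three ranking functions treat ties can be checked against the department 30 salaries from this slide. A minimal sketch, assuming SQLite (3.25 or later) in place of Oracle; the single-column table `emp30` and the function name `rankings` are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp30 (sal INTEGER)")
# Department 30 salaries from the slide: a two-way and a three-way tie.
conn.executemany(
    "INSERT INTO emp30 VALUES (?)",
    [(950,), (1250,), (1250,), (1500,), (1500,), (1500,), (1600,), (2850,)],
)


def rankings():
    """Return (sal, row_rn, rank_rn, dense_rn) ordered by salary."""
    sql = """
        SELECT sal,
               row_number() OVER (ORDER BY sal) AS row_rn,   -- always unique
               rank()       OVER (ORDER BY sal) AS rank_rn,  -- gaps after ties
               dense_rank() OVER (ORDER BY sal) AS dense_rn  -- no gaps
        FROM emp30
        ORDER BY sal, row_rn
    """
    return conn.execute(sql).fetchall()
```

The result matches the department 30 rows of the table above: after the three-way tie at 1500, rank jumps to 7 while dense_rank continues with 4.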
  9. Aggregate analytic functions include sum, max, min, avg, count, and others. Many group functions can also be used as analytic functions.

         SELECT ename, hiredate, sal,
                SUM(sal) OVER (ORDER BY hiredate
                               RANGE NUMTOYMINTERVAL(2, 'month') PRECEDING) AS t_sal
         FROM scott.emp

         ENAME   HIREDATE              SAL  T_SAL
         ------  -------------------  ----  -----
         SMITH   1980-12-17 00:00:00   800    800
         ALLEN   1981-02-20 00:00:00  1600   1600
         WARD    1981-02-22 00:00:00  1250   2850
         JONES   1981-04-02 00:00:00  2975   5825
         BLAKE   1981-05-01 00:00:00  2850   5825
         CLARK   1981-06-09 00:00:00  2450   5300
         TURNER  1981-09-08 00:00:00  1500   1500
         MARTIN  1981-09-28 00:00:00  1250   2750
         KING    1981-11-17 00:00:00  5000   6250
         JAMES   1981-12-03 00:00:00   950   8950
         FORD    1981-12-03 00:00:00  3000   8950
         MILLER  1982-01-23 00:00:00  1300   5250
         SCOTT   1987-04-19 00:00:00  3000   3000
         ADAMS   1987-05-23 00:00:00  1100   4100
  10. Row-comparison analytic functions are LEAD and LAG. They take no window clause and return the value from the row physically offset n rows from the current row (default offset 1). They are a special case: the notion of a logical window does not apply.

         SELECT empno, hiredate,
                lead(hiredate, 1) over(ORDER BY hiredate) lead_hiredate,
                lag(hiredate, 1) over(ORDER BY hiredate) lag_hiredate
         FROM scott.emp

         EMPNO  HIREDATE    LEAD_HIREDATE  LAG_HIREDATE
         -----  ----------  -------------  ------------
          7369  1980-12-17  1981-02-20
          7499  1981-02-20  1981-02-22     1980-12-17
          7521  1981-02-22  1981-04-02     1981-02-20
          7566  1981-04-02  1981-05-01     1981-02-22
          7698  1981-05-01  1981-06-09     1981-04-02
          7782  1981-06-09  1981-09-08     1981-05-01
          7844  1981-09-08  1981-09-28     1981-06-09
          7654  1981-09-28  1981-11-17     1981-09-08
          7839  1981-11-17  1981-12-03     1981-09-28
          7900  1981-12-03  1981-12-03     1981-11-17
          7902  1981-12-03  1982-01-23     1981-12-03
          7934  1982-01-23  1987-04-19     1981-12-03
          7788  1987-04-19  1987-05-23     1982-01-23
          7876  1987-05-23                 1987-04-19
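The offset behaviour of LEAD and LAG can be seen on the first three hire dates from this slide. A minimal sketch, assuming SQLite (3.25 or later) in place of Oracle; the trimmed `emp` table and the function name `neighbours` are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER, hiredate TEXT)")
# First three rows of the slide's data, hire dates as ISO strings.
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [(7369, "1980-12-17"), (7499, "1981-02-20"), (7521, "1981-02-22")],
)


def neighbours():
    """Return (empno, next hire date, previous hire date) per employee."""
    sql = """
        SELECT empno,
               lead(hiredate, 1) OVER (ORDER BY hiredate) AS lead_hiredate,
               lag(hiredate, 1)  OVER (ORDER BY hiredate) AS lag_hiredate
        FROM emp
        ORDER BY hiredate
    """
    return conn.execute(sql).fetchall()
```

The first row has no predecessor and the last row has no successor, so those positions come back as NULL (None in Python), matching the blank cells in the slide's output.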
  11. The most commonly used statistical analytic function is ratio_to_report. It accepts an optional partition by clause but no order by or window clause.

     Without ratio_to_report:

         SELECT department_id, sum(salary) dept_sum,
                SUM(SUM(salary)) over() all_sum,
                round(SUM(salary) / (SUM(SUM(salary)) over()), 2) * 100 || '%' ratio
         FROM hr.employees
         GROUP BY department_id
         ORDER BY 1

     With ratio_to_report:

         SELECT department_id, sum(salary) dept_sum,
                SUM(SUM(salary)) over() all_sum,
                round(ratio_to_report(SUM(salary)) over(), 2) * 100 || '%' ratio
         FROM hr.employees
         GROUP BY department_id
         ORDER BY 1

         DEPARTMENT_ID  DEPT_SUM  ALL_SUM  RATIO
         -------------  --------  -------  -----
                    10      4400   691416  1%
                    20     19000   691416  3%
                    30     24900   691416  4%
                    40      6500   691416  1%
                    50    156400   691416  23%
                    60     28800   691416  4%
                    70     10000   691416  1%
                    80    304500   691416  44%
                    90     58000   691416  8%
                   100     51608   691416  7%
                   110     20308   691416  3%
                            7000   691416  1%
  12. Requirement: for rows with the same ID and consecutive num values, find the minimum num and the sum of val.

         select id, num, val from test_tab;

         ID  NUM  VAL
         --  ---  ---
          1    1   50
          1    2  100
          1    3  150
          1    5  250
          2    1  100
          2    3  400
          3    1  100
          3    2  200

         SELECT ID, MIN(num), SUM(val)
         FROM (SELECT ID, num, val,
                      num - row_number() over(PARTITION BY ID ORDER BY num) rn
               FROM test_tab)
         GROUP BY ID, rn
         ORDER BY 1, 2

         ID  MIN(NUM)  SUM(VAL)
         --  --------  --------
          1         1       300
          1         5       250
          2         1       100
          2         3       400
          3         1       300
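The grouping trick here is that num minus row_number() stays constant within a run of consecutive num values, so it can serve as a group key. The sketch below replays the slide's test_tab data, assuming SQLite (3.25 or later) in place of Oracle; the alias `grp` and the function name `islands` are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_tab (id INTEGER, num INTEGER, val INTEGER)")
# The eight input rows shown on the slide.
conn.executemany(
    "INSERT INTO test_tab VALUES (?, ?, ?)",
    [
        (1, 1, 50), (1, 2, 100), (1, 3, 150), (1, 5, 250),
        (2, 1, 100), (2, 3, 400),
        (3, 1, 100), (3, 2, 200),
    ],
)


def islands():
    """Return (id, first num of the run, sum of val over the run)."""
    sql = """
        SELECT id, MIN(num), SUM(val)
        FROM (SELECT id, num, val,
                     -- Constant within a run of consecutive nums:
                     -- id 1 has nums 1,2,3,5 -> grp 0,0,0,1.
                     num - row_number() OVER (PARTITION BY id
                                              ORDER BY num) AS grp
              FROM test_tab)
        GROUP BY id, grp
        ORDER BY 1, 2
    """
    return conn.execute(sql).fetchall()
```

The function returns the same five rows as the slide's result table.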
  13. Within each id, ordered by month ascending: if num is null in the current row, fill it with the nearest preceding non-null num. If there is none, use the nearest following non-null num.

         SELECT ID, mm, num,
                nvl(last_value(num IGNORE NULLS)
                        over(PARTITION BY ID ORDER BY mm),
                    last_value(num IGNORE NULLS)
                        over(PARTITION BY ID ORDER BY mm DESC)
                   ) new_num
         FROM demo5
         ORDER BY ID, mm

         ID  MM      NUM  NEW_NUM
         --  ------  ---  -------
          1  201001    3        3
          1  201002    2        2
          1  201003             2
          1  201004             2
          1  201005    1        1
          2  201001             2
          2  201002    2        2
          2  201003    3        3
          2  201004             3
  14. SQL*Plus provides the BREAK command: when a column value equals the value in the previous row, it is displayed as NULL. This is a common reporting technique.

         SQL> break ON department_id
         SQL> SELECT department_id, first_name
           2  FROM hr.employees
           3  WHERE department_id < 40
           4  ORDER BY 1, 2;

         DEPARTMENT_ID  FIRST_NAME
         -------------  --------------------
                    10  Jennifer
                    20  Michael
                        Pat
                    30  Alexander
                        Den
                        Guy
                        Karen
                        Shelli
                        Sigal

     The same result with an analytic function:

         SELECT decode(lag(department_id, 1)
                           over(PARTITION BY department_id ORDER BY first_name),
                       department_id, NULL, department_id) newdepartment_id,
                first_name
         FROM hr.employees
         WHERE department_id < 40
         ORDER BY department_id, first_name
  15. Deleting duplicate rows with a correlated subquery:

         DELETE FROM duprows a
         WHERE a.ROWID <>
               (SELECT MIN(b.ROWID)
                FROM duprows b
                WHERE a.ext = b.ext)

         DELETE STATEMENT             |          | 2
         DELETE                       | DUPROWS  |
         HASH JOIN                    |          | 2
         VIEW                         | VW_SQ_1  | 3
         SORT GROUP BY                |          | 3
         TABLE ACCESS FULL            | DUPROWS  | 3
         TABLE ACCESS FULL            | DUPROWS  | 3

     Deleting duplicate rows with an analytic function:

         DELETE FROM duprows a
         WHERE a.ROWID IN
               (SELECT ROWID
                FROM (SELECT row_number() over(PARTITION BY b.ext
                                               ORDER BY b.ROWID) rn
                      FROM duprows b) c
                WHERE c.rn > 1)

         DELETE STATEMENT             |          | 1
         DELETE                       | DUPROWS  |
         NESTED LOOPS                 |          | 1
         VIEW                         | VW_NSO_1 | 3
         SORT UNIQUE                  |          | 1
         VIEW                         |          | 3
         WINDOW SORT                  |          | 3
         TABLE ACCESS FULL            | DUPROWS  | 3
         TABLE ACCESS BY USER ROWID   | DUPROWS  | 1
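The row_number-based delete can be tried end to end on a small table. This is a sketch under assumptions, not the slide's exact statement: SQLite (3.25 or later) stands in for Oracle, SQLite's `rowid` plays the role of Oracle's ROWID and is exposed from the inner query under the made-up alias `rid`, and the sample values and function name are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE duprows (ext INTEGER)")
# Illustrative data: ext = 1 appears twice, ext = 2 three times.
conn.executemany(
    "INSERT INTO duprows VALUES (?)",
    [(1,), (1,), (2,), (2,), (2,), (3,)],
)


def delete_duplicates():
    """Keep the lowest-rowid row per ext value; return the number deleted."""
    sql = """
        DELETE FROM duprows
        WHERE rowid IN (
            SELECT rid
            FROM (SELECT rowid AS rid,
                         row_number() OVER (PARTITION BY ext
                                            ORDER BY rowid) AS rn
                  FROM duprows)
            WHERE rn > 1)
    """
    cur = conn.execute(sql)
    conn.commit()
    return cur.rowcount


def remaining():
    """Return the surviving ext values in ascending order."""
    return [r[0] for r in conn.execute("SELECT ext FROM duprows ORDER BY ext")]
```

Calling `delete_duplicates()` once removes three rows and leaves one row per ext value. A second call removes nothing.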
  16. Pivoting the columns of each index into a single row:

         SQL> SELECT INDEX_NAME, COLUMN_NAME
           2  FROM user_ind_columns
           3  WHERE INDEX_NAME LIKE '%PK'
           4  AND rownum < 10;

         INDEX_NAME      COLUMN_NAME
         --------------  --------------
         ALL_ORDERS_PK   YEAR
         ALL_ORDERS_PK   MONTH
         ALL_ORDERS_PK   CUST_NBR
         ALL_ORDERS_PK   REGION_ID
         ALL_ORDERS_PK   SALESPERSON_ID
         ASSEMBLY_PK     ASSEMBLY_TYPE
         ASSEMBLY_PK     ASSEMBLY_ID
         A_ID_PK         ID
         CUSTOMER_PK     CUST_NBR

         select INDEX_NAME,
                max(decode(rn, 1, COLUMN_NAME)) c1,
                max(decode(rn, 2, COLUMN_NAME)) c2,
                max(decode(rn, 3, COLUMN_NAME)) c3,
                max(decode(rn, 4, COLUMN_NAME)) c4,
                max(decode(rn, 5, COLUMN_NAME)) c5
         from (select INDEX_NAME, TABLE_NAME, COLUMN_NAME,
                      row_number() over(partition by INDEX_NAME
                                        order by COLUMN_NAME) rn
               from user_ind_columns
               where INDEX_NAME like '%PK' and …

         INDEX_NAME      C1           C2             C3         C4              C5
         --------------  -----------  -------------  ---------  --------------  ----
         ALL_ORDERS_PK   CUST_NBR     MONTH          REGION_ID  SALESPERSON_ID  YEAR
         ASSEMBLY_PK     ASSEMBLY_ID  ASSEMBLY_TYPE
         A_ID_PK         ID
         CUSTOMER_PK     CUST_NBR
  17. The even-allocation problem: how to split an amount evenly, distributing the fractional remainder as well so that no rounding error is left over.

         SQL> select * from demo7_1;

         ID  AMOUNT
         --  ------
          1     100
          2      50

         SQL> select * from demo7_2;

         ID  PERSONS
         --  -------
          1        3
          2        2

         SELECT ID, persons,
                (CASE WHEN rn <= (amount - amount2) * 100 THEN 0.01 ELSE 0 END)
                  + je AS je,
                amount
         -- Then order the rows and add 0.01 to as many as needed to cover the gap to the total
         FROM (SELECT t.*,
                      SUM(je) OVER(PARTITION BY id) AS amount2,
                      ROW_NUMBER() OVER(PARTITION BY id ORDER BY je DESC) rn
               FROM ( -- First expand to one row per person; average with trunc, which only rounds down
                     SELECT tt.*
                     FROM (SELECT t2.id, t2.persons,
                                  TRUNC(t1.amount / t2.persons, 2) je,
                                  t1.amount amount
                           FROM demo7_1 t1, demo7_2 t2
                           WHERE t1.id = t2.id) tt,
                          -- Build a sequence up to the largest number of persons
                          (SELECT LEVEL rn
                           FROM dual
                           CONNECT BY LEVEL <= (SELECT MAX(persons) max_num
                                                FROM demo7_2)) tm
                     WHERE tt.persons >= tm.rn) t
              )

         ID  PERSONS     JE  AMOUNT
         --  -------  -----  ------
          1        3  33.34     100
          1        3  33.33     100
          1        3  33.33     100
          2        2     25      50
          2        2     25      50
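The arithmetic behind this query (truncate each share to two decimals, then hand out the leftover cents one at a time) can be sketched outside the database. The helper below is a hypothetical plain-Python illustration of that idea, not a translation of the SQL; the name `split_amount` and its signature are made up.

```python
from decimal import Decimal, ROUND_DOWN

CENT = Decimal("0.01")


def split_amount(amount, persons):
    """Split amount into `persons` shares that sum exactly to amount."""
    total = Decimal(str(amount))
    # Step 1: truncate the even share to two decimals (round down only),
    # like TRUNC(amount / persons, 2) in the query.
    base = (total / persons).quantize(CENT, rounding=ROUND_DOWN)
    # Step 2: count how many cents the truncation left undistributed,
    # like (amount - amount2) * 100 in the query.
    leftover_cents = int((total - base * persons) / CENT)
    # Step 3: the first leftover_cents shares each receive one extra cent.
    return [base + CENT if i < leftover_cents else base
            for i in range(persons)]
```

For example, `split_amount(100, 3)` yields 33.34, 33.33, 33.33, and `split_amount(50, 2)` yields two shares of 25.00, in line with the slide's output.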
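  18. Analytic functions are a powerful complement to ANSI SQL, designed for complex running totals, moving averages, inter-row calculations, aggregate reports, and similar work. Oracle provides not only built-in analytic functions but also support for user-defined aggregate functions; wmsys.wm_concat, for example, is a user-defined aggregate function. For details, see the user-defined aggregate function section of the Oracle cartridge developer guide. When learning analytic functions, as with the rest of Oracle, study the documentation closely, pay particular attention to the notes it calls out, and consciously recall what you have learned when solving real problems.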
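  19. (1) Understand how partition, order by, and window relate to one another, especially range versus rows.
      (2) Understand the capabilities and restrictions of each analytic function.
      (3) Understand where each analytic function applies and what its drawbacks are.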
  20. Thank you
