OLAP etc



  1. Data Warehousing (Business Intelligence / Decision Support Systems / …!)

     Data Warehouse

     "… a
       • subject-oriented: essential entities (e.g. customer) rather than applications (e.g. car insurance)
       • integrated: consistency (e.g. in naming, relationships, etc.)
       • time-variant: organised by time periods (or snapshots)
       • and nonvolatile: updated in bulk uploads rather than transactions
     set of data that supports decision making …"

     Is used as a source for data mining and OLAP.

     OLAP (On-line Analytical Processing)
     [Not OLTP (on-line transaction processing) or TPS (transaction processing system)]
     Provides a multidimensional view of data.

     Data Mart (!)

     TPS                                      OLAP
     optimised for transaction volume         optimised for data analysis
     process a few records at a time          process summarised data
     real-time update as transactions occur   batch update (e.g. daily)
     based on tables                          based on hypercubes
     raw data                                 aggregated data
     SQL is widely used                       no common query language

     OLAP and SQL

     OLAP Package in the SQL:2003 standard (further extensions to the amendment to SQL:1999).
     For example, consider the ROLLUP and CUBE extensions to SQL (GROUP BY clause):

     empno  name    job        mgr   hiredate         sal   comm  dept
     7369   Smith   Clerk      7902  17-12-2000   8000.00         20
     7499   Allen   Salesman   7698  20-2-2001   16000.00   3000  30
     7521   Ward    Salesman   7698  22-2-2002   12500.00   5000  30
     7566   Jones   Manager    7839  2-4-2002    29750.00         20
     7654   Martin  Salesman   7698  28-9-1999   12500.00  14000  30
     7698   Blake   Manager    7839  1-5-2003    28500.00         30
     7782   Clark   Manager    7839  9-6-2001    24500.00         10
     7788   Scott   Analyst    7566  9-11-1997   30000.00         20
     7839   King    President        17-1-2000   50000.00         10
     7844   Turner  Salesman   7698  8-9-2003    15000.00      0  30
     7876   Adams   Clerk      7788  23-9-1998   11000.00         20
     7900   James   Clerk      7698  3-12-1996    9500.00         30
     7902   Ford    Analyst    7566  3-12-1996   30000.00         20
     7934   Miller  Clerk      7782  23-1-2002   13000.00         10

     select dept, job, count(empno) as ees
     from emp
     group by dept, job;
  2. dept  job        ees
     10    Clerk      1
     10    Manager    1
     10    President  1
     20    Analyst    2
     20    Clerk      2
     20    Manager    1
     30    Clerk      1
     30    Manager    1
     30    Salesman   4

     select dept, job, count(empno) as ees
     from emp
     group by rollup (dept, job);

     dept  job        ees
     10    Clerk      1
     10    Manager    1
     10    President  1
     10    NULL       3
     20    Analyst    2
     20    Clerk      2
     20    Manager    1
     20    NULL       5
     30    Clerk      1
     30    Manager    1
     30    Salesman   4
     30    NULL       6
     NULL  NULL       14

     select dept, job, count(empno) as ees
     from emp
     group by cube (dept, job);

     dept  job        ees
     10    Clerk      1
     10    Manager    1
     10    President  1
     10    NULL       3
     20    Analyst    2
     20    Clerk      2
     20    Manager    1
     20    NULL       5
     30    Clerk      1
     30    Manager    1
     30    Salesman   4
     30    NULL       6
     NULL  Analyst    2
     NULL  Clerk      4
     NULL  Manager    3
     NULL  President  1
     NULL  Salesman   4
     NULL  NULL       14

     Other OLAP additions to the SQL standard include:
       • new numeric functions (e.g. natural logarithm, exponentiate)
       • new aggregation operators (e.g. standard deviation)
       • ranking functions (i.e. gets rank ordering)
       • cumulative and other 'moving average' functions (using WINDOW clause)
       • several statistical functions applied to pairs of columns (e.g. distribution)
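The difference between the three GROUP BY variants above comes down to which combinations of grouping columns get a summary row. The following is a minimal Python sketch of that idea, not how any SQL engine implements it: the function names (`grouped_counts`, `rollup_sets`, `cube_sets`) are hypothetical, the data is the (dept, job) pair of each of the 14 emp rows, and `None` stands in for the NULL placeholders.

```python
from collections import Counter
from itertools import combinations

# (dept, job) for each of the 14 rows of the emp table, in table order.
EMP = [
    (20, "Clerk"), (30, "Salesman"), (30, "Salesman"), (20, "Manager"),
    (30, "Salesman"), (30, "Manager"), (10, "Manager"), (20, "Analyst"),
    (10, "President"), (30, "Salesman"), (20, "Clerk"), (30, "Clerk"),
    (20, "Analyst"), (10, "Clerk"),
]


def grouped_counts(rows, grouping_sets):
    """Count rows per group for each grouping set (a tuple of column indexes).

    Columns outside the current grouping set are reported as None,
    mirroring the NULL placeholders in the SQL results.
    """
    counts = Counter()
    for keep in grouping_sets:
        for row in rows:
            key = tuple(v if i in keep else None for i, v in enumerate(row))
            counts[key] += 1
    return counts


def rollup_sets(n):
    # ROLLUP over n columns: every prefix of the column list, down to ().
    return [tuple(range(k)) for k in range(n, -1, -1)]


def cube_sets(n):
    # CUBE over n columns: every subset of the column list.
    return [s for k in range(n, -1, -1) for s in combinations(range(n), k)]


plain = grouped_counts(EMP, [(0, 1)])          # group by dept, job
rollup = grouped_counts(EMP, rollup_sets(2))   # group by rollup (dept, job)
cube = grouped_counts(EMP, cube_sets(2))       # group by cube (dept, job)

for (dept, job), ees in cube.items():
    print(dept, job, ees)
```

ROLLUP adds the per-dept subtotals and the grand total; CUBE additionally adds the per-job subtotals, which is why its result has five more rows than the ROLLUP result.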
  3. Multidimensional Database / Hypercubes

     OLAP is typically used with a multidimensional database, where the fundamental representational unit is a hypercube.

     For example, analyse university student data by
       • gender
       • ethnic origin
       • age
       • faculty / department
       • year of entry
       • results (class of degree)
       • entry qualifications

       • rotation: changing the view of the data (e.g. from "by dept" to "by job")
       • drill-down: reporting more detailed levels of data (e.g. for a particular dept, details of jobs)
       • drill-through: move from the summary data to the source data items (e.g. the actual employees)

     Data Preparation

     Many of the issues surrounding decision support systems concern the tasks of obtaining and preparing the data in the first place. The data must be extracted (from various sources), cleansed, transformed and consolidated, loaded into the decision support database, and then periodically refreshed.
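Returning to the hypercube operations on this slide, rotation, drill-down and drill-through can be illustrated on the emp rows from slide 1, using the same "by dept" / "by job" examples the slide gives. This is only a sketch over an in-memory list; the helper names (`summarise`, `drill_down`, `drill_through`) are hypothetical and do not correspond to any OLAP product's API.

```python
from collections import Counter

# (name, dept, job) for the 14 employees in the emp table from slide 1.
EMP = [
    ("Smith", 20, "Clerk"), ("Allen", 30, "Salesman"), ("Ward", 30, "Salesman"),
    ("Jones", 20, "Manager"), ("Martin", 30, "Salesman"), ("Blake", 30, "Manager"),
    ("Clark", 10, "Manager"), ("Scott", 20, "Analyst"), ("King", 10, "President"),
    ("Turner", 30, "Salesman"), ("Adams", 20, "Clerk"), ("James", 30, "Clerk"),
    ("Ford", 20, "Analyst"), ("Miller", 10, "Clerk"),
]


def summarise(rows, dimension):
    """Headcount along one dimension ('dept' or 'job')."""
    index = {"dept": 1, "job": 2}[dimension]
    return Counter(row[index] for row in rows)


def drill_down(rows, dept):
    """Headcount per job within a single department."""
    return Counter(job for _, d, job in rows if d == dept)


def drill_through(rows, dept, job):
    """Names of the source rows behind one summary cell."""
    return sorted(name for name, d, j in rows if d == dept and j == job)


by_dept = summarise(EMP, "dept")             # the view "by dept"
by_job = summarise(EMP, "job")               # rotation: same data viewed "by job"
dept20 = drill_down(EMP, 20)                 # drill-down: jobs within dept 20
clerks20 = drill_through(EMP, 20, "Clerk")   # drill-through: the actual employees

print(dict(by_dept), dict(by_job), dict(dept20), clerks20)
```

Each step narrows from aggregated data towards the source rows: the two summaries are alternative views of the same cube, the drill-down opens one cell of the "by dept" view, and the drill-through returns the employees behind one (dept, job) cell.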
  4. Data Mining

     Data mining is the search for relationships and global patterns that exist in large databases but are hidden in the vast amounts of data. The analyst can combine knowledge of the data with machine learning techniques to discover items ('nuggets') of knowledge hidden in the data.

     The results of the data mining may be to discover the following types of information:
       • association rules
       • sequential patterns
       • patterns within time series
       • classification hierarchies
       • clustering

     The goals of data mining are typically:
       • prediction
       • identification
       • classification
       • optimisation

     Data mining technologies (some examples):
       • Decision trees
       • Genetic algorithms
       • K-nearest neighbour method
       • Neural networks
       • Data visualisation

     Some applications of data mining:
       • marketing
       • finance
       • health care
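As one concrete instance of the technologies listed above, the K-nearest neighbour method classifies a new item by majority vote among the k most similar known items. The sketch below is a minimal illustration under stated assumptions: the function name `knn_predict`, the Euclidean distance measure, and the toy customer data (age, annual spend in thousands) are all hypothetical and do not come from the slides.

```python
from collections import Counter
from math import dist


def knn_predict(training, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `training` is a list of (feature_tuple, label) pairs; distance is Euclidean.
    """
    nearest = sorted(training, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (age, annual spend in thousands) -> customer segment.
TRAINING = [
    ((25, 2.0), "low"), ((30, 2.5), "low"), ((28, 1.5), "low"),
    ((45, 9.0), "high"), ((50, 8.5), "high"), ((48, 9.5), "high"),
]

segment_young = knn_predict(TRAINING, (27, 2.2))
segment_older = knn_predict(TRAINING, (47, 9.1))
print(segment_young, segment_older)
```

In practice the features would usually be scaled to comparable ranges first, since an unscaled Euclidean distance lets the attribute with the largest numeric range dominate the vote.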
