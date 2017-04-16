1
- Part 1: Data Warehouses
- Part 2: OLAP
- Part 3: Data Mining
- Part 4: Big Data
- I can’t find the data I need
  ◦ data is scattered over the network
  ◦ many versions, subtle differences
- I can’t get th...
A single, complete and consistent store of data obtained from a variety of different sources made available to end users i...
- Which are our lowest/highest margin customers?
- Who are my customers and what products are they buying?
- Which customers ...
 Used to manage and control business  Data is historical or point-in-time  Optimized for inquiry rather than update  U...
 Since 1970s, organizations gained competitive advantage through systems that automate business processes to offer more e...
- A subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management’s decision-ma...
- The warehouse is organized around the major subjects of the enterprise (e.g. customers, products, and sales) rather than...
- The data warehouse integrates corporate application-oriented data from different source systems, which often includes da...
- Data in the warehouse is only accurate and valid at some point in time or over some time interval.
- Time-variance is al...
- Data in the warehouse is not updated in real time but is refreshed from operational systems on a regular basis.
- New d...
- Potential high returns on investment
- Competitive advantage
- Increased productivity of corporate decision-makers
- The types of queries that a data warehouse is expected to answer range from the relatively simple to the highly complex...
- What was the total revenue for Scotland in the third quarter of 2004?
- What was the total revenue for property sales fo...
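A question such as the Scotland revenue query is, at its core, a filter followed by a sum over sales records. The sketch below illustrates that in Python; the `SALES` rows, their figures, and the helper names are hypothetical, and a real warehouse would run the equivalent query against its own tables.

```python
from datetime import date

# Hypothetical sales rows: (region, sale_date, revenue). All figures are made up.
SALES = [
    ("Scotland", date(2004, 7, 14), 120_000.0),
    ("Scotland", date(2004, 9, 2), 95_000.0),
    ("Scotland", date(2004, 11, 20), 80_000.0),
    ("England", date(2004, 8, 5), 210_000.0),
]


def quarter(d: date) -> int:
    """Return the calendar quarter (1 to 4) that a date falls in."""
    return (d.month - 1) // 3 + 1


def total_revenue(rows, region: str, year: int, qtr: int) -> float:
    """Sum revenue for one region in one calendar quarter."""
    return sum(
        revenue
        for row_region, sale_date, revenue in rows
        if row_region == region
        and sale_date.year == year
        and quarter(sale_date) == qtr
    )


scotland_q3 = total_revenue(SALES, "Scotland", 2004, 3)
print(scotland_q3)
```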
- Underestimation of resources for data loading
- Hidden problems with source systems
- Required data not captured
- Incre...
- High demand for resources
- Data ownership
- High maintenance
- Long-duration projects
- Complexity of integration
 A subset of a data warehouse that supports the requirements of a particular department or business function.  Character...
 To give users access to the data they need to analyze most often.  To provide data in a form that matches the collectiv...
- To provide appropriately structured data as dictated by the requirements of the end-user access tools.
- Building a dat...
 The potential users of a data mart are more clearly defined and can be more easily targeted to obtain support for a data...
25 Departmentally Structured Individually Structured Data Warehouse Organizationally Structured Less More History Normaliz...
- Aggregation -- (total sales, percent-to-total)
- Comparison -- Budget vs. Expenses
- Ranking -- Top 10, quartile analysi...
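The first three kinds of analysis listed above (aggregation, comparison, ranking) can each be written in a line or two. This is a rough sketch only: the product names and the `actual` and `budget` figures are invented for illustration, and budget versus actual sales stands in for the slide's budget versus expenses.

```python
# Hypothetical actual and budgeted sales per product (made-up numbers).
actual = {"Cola": 400, "Juice": 250, "Milk": 200, "Soap": 150}
budget = {"Cola": 350, "Juice": 300, "Milk": 200, "Soap": 100}

# Aggregation: total sales and each product's percent-to-total.
total_sales = sum(actual.values())
percent_to_total = {p: 100 * v / total_sales for p, v in actual.items()}

# Comparison: actual minus budget for each product.
variance = {p: actual[p] - budget[p] for p in actual}

# Ranking: the top 2 products by actual sales.
top_2 = sorted(actual, key=actual.get, reverse=True)[:2]

print(total_sales, percent_to_total, variance, top_2)
```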
 Accompanying the growth in data warehousing is an ever-increasing demand by users for more powerful access tools that pr...
 OLAP and Data Mining differ in what they offer the user and because of this they are complementary technologies.  An en...
- The dynamic synthesis, analysis, and consolidation of large volumes of multi-dimensional data (Codd, 1993).
- Describes...
 Enables users to gain a deeper understanding and knowledge about various aspects of their corporate data through fast, c...
- Can easily answer ‘who?’ and ‘what?’ questions; however, the ability to answer ‘what if?’ and ‘why?’ type questions distingu...
 Although OLAP applications are found in widely divergent functional areas, they all have the following key features: ◦ m...
- Must provide a range of powerful computational methods such as those required by sales forecasting, which uses trend algo...
 Key feature of almost any analytical application as performance is almost always judged over time.  Time hierarchy is n...
 Increased productivity of end-users.  Reduced backlog of applications development for IT staff.  Retention of organiza...
 Example of two-dimensional query.  What is the total revenue generated by property sales in each city, in each quarter ...
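A two-dimensional query of this kind amounts to a cross-tabulation: one dimension becomes the rows, the other the columns, and each cell holds the summed measure. The sketch below assumes hypothetical `(city, quarter, revenue)` rows with made-up values.

```python
from collections import defaultdict

# Hypothetical sales rows: (city, quarter, revenue). Figures are made up.
ROWS = [
    ("Glasgow", "Q1", 100),
    ("Glasgow", "Q1", 50),
    ("Glasgow", "Q2", 70),
    ("London", "Q1", 200),
    ("London", "Q2", 120),
]


def revenue_by_city_and_quarter(rows):
    """Build a city x quarter table whose cells hold total revenue."""
    table = defaultdict(lambda: defaultdict(int))
    for city, qtr, revenue in rows:
        table[city][qtr] += revenue
    # Convert to plain dicts so the result prints and compares cleanly.
    return {city: dict(by_qtr) for city, by_qtr in table.items()}


table = revenue_by_city_and_quarter(ROWS)
for city in sorted(table):
    print(city, table[city])
```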
 Example of three-dimensional query. ◦ ‘What is the total revenue generated by property sales for each type of property (...
- A cube represents data as cells in an array.
- A relational table only represents multi-dimensional data in two dimensions....
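One way to picture a cube is as a mapping from a coordinate on every dimension to a cell value. The sketch below uses a plain Python dictionary keyed by `(property_type, city, quarter)`; the data and the `slice_cube` and `roll_up` helpers are hypothetical and are not taken from any particular OLAP product.

```python
# Hypothetical cube cells keyed by (property_type, city, quarter).
CUBE = {
    ("Flat", "Glasgow", "Q1"): 100,
    ("Flat", "Glasgow", "Q2"): 70,
    ("House", "Glasgow", "Q1"): 50,
    ("Flat", "London", "Q1"): 200,
    ("House", "London", "Q2"): 120,
}


def slice_cube(cube, axis, value):
    """Fix one dimension to a single value and drop it from the keys."""
    return {
        key[:axis] + key[axis + 1:]: measure
        for key, measure in cube.items()
        if key[axis] == value
    }


def roll_up(cube, axis):
    """Sum the measure over one dimension, removing it from the keys."""
    result = {}
    for key, measure in cube.items():
        reduced = key[:axis] + key[axis + 1:]
        result[reduced] = result.get(reduced, 0) + measure
    return result


flats = slice_cube(CUBE, 0, "Flat")    # only flats, keyed by (city, quarter)
by_city_quarter = roll_up(CUBE, 0)     # all property types combined
print(flats)
print(by_city_quarter)
```

Rolling up the property-type dimension turns the three-dimensional cube back into the two-dimensional city by quarter view.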
 Measure - sales (actual, plan, variance) 43 Month 1 2 3 4 765 Product Toothpaste Juice Cola Milk Cream Soap W S N Dimens...
 It is a powerful visualization tool  It provides fast, interactive response times  It is good for analyzing time serie...
 Andyne Computing -- Pablo  Arbor Software -- Essbase  Cognos -- PowerPlay  Comshare -- Commander OLAP  Holistic Syst...
 The process of extracting valid, previously unknown, comprehensible, and actionable information from large databases and...
- Reveals information that is hidden and unexpected, as there is little value in finding patterns and relationships that are alread...
 Most accurate results normally require large volumes of data to deliver reliable conclusions.  Starts by developing an ...
 Data mining can provide huge paybacks for companies who have made a significant investment in data warehousing.  Relati...
- Retail / Marketing
  ◦ Identifying buying patterns of customers
  ◦ Finding associations among customer demographic characte...
- Banking
  ◦ Detecting patterns of fraudulent credit card use
  ◦ Identifying loyal customers
  ◦ Predicting customers likely t...
- Insurance
  ◦ Claims analysis
  ◦ Predicting which customers will buy new policies
- Medicine
  ◦ Characterizing patient behav...
- Four main operations include:
  ◦ Predictive modeling
  ◦ Database segmentation
  ◦ Link analysis
  ◦ Deviation detection
- Ther...
 Techniques are specific implementations of the data mining operations.  Each operation has its own strengths and weakne...
 Similar to the human learning experience ◦ uses observations to form a model of the important characteristics of some ph...
 Model is developed using a supervised learning approach, which has two phases: training and testing. ◦ Training builds a...
 Applications of predictive modeling include customer retention management, credit approval, cross selling, and direct ma...
 Used to estimate a continuous numeric value that is associated with a database record.  Uses the traditional statistica...
 Linear regression attempts to fit a straight line through a plot of the data, such that the line is the best representat...
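Fitting a straight line is commonly done with ordinary least squares, which picks the slope and intercept that minimise the squared vertical distances to the points. The sketch below assumes that method and uses made-up data that lie exactly on y = 2x + 1; the `fit_line` name is hypothetical.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up observations that follow y = 2x + 1 exactly.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_line(xs, ys)
predicted = slope * 6 + intercept  # estimate y for an unseen x = 6
print(slope, intercept, predicted)
```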
 Data mining requires statistical methods that can accommodate non-linearity, outliers, and non-numeric data.  Applicati...
 Aim is to partition a database into an unknown number of segments, or clusters, of similar records.  Uses unsupervised ...
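As an illustration of unsupervised segmentation, the sketch below uses k-means, which is one common clustering technique and not necessarily the one the slides have in mind. Note the assumption: k-means needs the number of clusters `k` to be chosen up front, whereas the slide describes the number of segments as unknown. The points are made up, and the first `k` points seed the centroids so the run is deterministic.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points, k, max_iterations=20):
    """Group points into k clusters; returns (centroids, clusters)."""
    centroids = list(points[:k])  # assumption: seed with the first k points
    clusters = []
    for _ in range(max_iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        updated = [
            mean_point(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if updated == centroids:  # nothing moved, so the clustering is stable
            break
        centroids = updated
    return centroids, clusters


# Made-up two-dimensional records forming two obvious groups.
POINTS = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0)]
centroids, clusters = kmeans(POINTS, k=2)
print(clusters)
```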
- Less precise than other operations and thus less sensitive to redundant and irrelevant features.
- Applications of database ...
 Aims to establish links (associations) between records, or sets of records, in a database.  There are three specializat...
 Finds items that imply the presence of other items in the same event.  Affinities between items are represented by asso...
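Affinities of the form "customers who buy A also buy B" are usually scored with two standard measures: support (how often the items occur together) and confidence (how often B appears given A). The sketch below computes both over a made-up list of shopping baskets; the `TRANSACTIONS` data and function names are hypothetical.

```python
# Hypothetical shopping baskets, one set of items per transaction.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions containing antecedent, the fraction that also contain consequent."""
    return support(antecedent | consequent, transactions) / support(
        antecedent, transactions
    )


# Score the rule: bread implies milk.
rule_support = support({"bread", "milk"}, TRANSACTIONS)
rule_confidence = confidence({"bread"}, {"milk"}, TRANSACTIONS)
print(rule_support, rule_confidence)
```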
 Finds patterns between events such that the presence of one set of items is followed by another set of items in a databa...
 Finds links between two sets of data that are time-dependent, and is based on the degree of similarity between the patte...
 Relatively new operation in terms of commercially available data mining tools.  Often a source of true discovery becaus...
 Can be performed using statistics and visualization techniques or as a by-product of data mining.  Applications include...
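A very simple statistical form of deviation detection flags values that lie far from the mean in units of standard deviation (a z-score test). The sketch below is one such approach under stated assumptions: the `amounts` data are invented, and the threshold of 2.0 is an arbitrary choice for illustration.

```python
from statistics import mean, pstdev


def outliers(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, so nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Made-up transaction amounts with one obviously unusual value.
amounts = [20, 22, 19, 21, 23, 20, 250]
flagged = outliers(amounts)
print(flagged)
```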
What is Big Data? What makes data “Big” Data?
 No single standard definition… “Big Data” is data whose scale, diversity, and complexity require new architecture, techn...
- Data Volume
  ◦ 44x increase from 2009 to 2020
  ◦ From 0.8 zettabytes to 35 zettabytes
- Data volume is increasing exponentially
Exp...
 Various formats, types, and structures  Text, numerical, images, audio, video, sequences, time series, social media dat...
- Data is being generated fast and needs to be processed fast
- Online Data Analytics
- Late decisions → missing opportunit...
- OLTP: Online Transaction Processing (DBMSs)
- OLAP: Online Analytical Processing (Data Warehousing)
- RTAP: Real-Time An...
- Social media and networks (all of us are generating data)
- Scientific instruments (collecting all sorts of data)
- Mobile device...
- The Model of Generating/Consuming Data has Changed
Old Model: Few companies are generating data, all others are consuming d...
- Ad-hoc querying and reporting - Data mining techniques - Structured data, typical sources - Small to mid-size datasets -...
 Big data is more real-time in nature than traditional DW applications  Traditional DW architectures (e.g. Exadata, Tera...
 The Bottleneck is in technology ◦ New architecture, algorithms, techniques are needed  Also in technical skills ◦ Exper...
What Technology Do We Have For Big Data?
