Data Warehousing


  1. 1. Data Warehousing & Data Mining
  2. 2. Some Definitions … <ul><li>A data warehouse (DW) is a collection of integrated databases designed to support a decision support system (DSS) </li></ul><ul><li>An operational data store (ODS) stores data for a specific application. It feeds the data warehouse a stream of desired raw data. </li></ul><ul><li>A data mart is a lower-cost, scaled-down version of a data warehouse, usually designed to support a small group of users (rather than the entire firm) </li></ul><ul><li>Metadata is information that is kept about the warehouse </li></ul><ul><li>Online Analytical Processing (OLAP) is the broad category of software technology that enables multidimensional analysis of enterprise data </li></ul>
  3. 3. Business Intelligence and Analytics <ul><li>Business intelligence (BI) </li></ul><ul><ul><li>Acquisition of data and information for use in decision-making activities </li></ul></ul><ul><li>Business analytics (BA) </li></ul><ul><ul><li>Models and solution methods </li></ul></ul><ul><li>Web intelligence </li></ul><ul><ul><li>Application of business intelligence techniques to Web sites </li></ul></ul><ul><li>Web analytics </li></ul><ul><ul><li>Application of business analytics to Web sites </li></ul></ul><ul><li>Data mining </li></ul><ul><ul><li>Applying models and methods to data to identify patterns and trends </li></ul></ul>
  4. 4. Data Warehouse <ul><li>Subject-oriented (as opposed to application-oriented) </li></ul><ul><ul><li>Data is organised based on its intended use </li></ul></ul><ul><li>“Scrubbed” and “cleansed” so that data from heterogeneous sources are standardised </li></ul><ul><li>Time series, historical data </li></ul><ul><li>Non-volatile (read only) </li></ul><ul><li>Summarised: in decision-usable format </li></ul><ul><li>Data from both internal and external sources is present </li></ul><ul><li>Metadata included </li></ul><ul><ul><li>Business metadata </li></ul></ul><ul><ul><li>Semantic metadata </li></ul></ul>
  5. 5. Data Warehouse: Environment <ul><li>The organisation’s legacy systems and data stores provide data to the data warehouse (DW) or mart </li></ul><ul><li>During the transfer of data from the various sources, cleansing or transformation may occur, so the data in the DW is more uniform </li></ul><ul><li>Simultaneously, metadata is recorded </li></ul><ul><li>Finally, the DW or mart may be used to create one or more “personal” warehouses </li></ul>
  6. 6. Data Warehouse: Environment
  7. 7. Integration of Data Sources <ul><li>Access needed to multiple sources </li></ul><ul><ul><li>Often enterprise-wide </li></ul></ul><ul><ul><li>Disparate and heterogeneous databases </li></ul></ul><ul><ul><li>XML becoming language standard </li></ul></ul><ul><li>External data sources: Web </li></ul><ul><ul><li>Intelligent agents </li></ul></ul><ul><ul><li>Document management systems </li></ul></ul><ul><ul><li>Content management systems </li></ul></ul><ul><li>External data sources: commercial databases </li></ul><ul><ul><li>Might buy / sell access to specialised databases </li></ul></ul>
  8. 8. Integration of Data Sources
  9. 9. Data Marts <ul><li>Dependent </li></ul><ul><ul><li>Created from warehouse </li></ul></ul><ul><ul><li>Replicated </li></ul></ul><ul><ul><ul><li>Functional subset of warehouse </li></ul></ul></ul><ul><li>Independent </li></ul><ul><ul><li>Scaled down, less expensive version of data warehouse </li></ul></ul><ul><ul><li>Designed for a department or SBU </li></ul></ul><ul><ul><li>Organisation may have multiple data marts </li></ul></ul><ul><ul><ul><li>Difficult to integrate </li></ul></ul></ul>
  10. 10. Migrating Data <ul><li>Business rules </li></ul><ul><ul><li>Stored in metadata repository </li></ul></ul><ul><ul><li>Applied to data warehouse centrally </li></ul></ul><ul><li>Data extracted from all relevant sources </li></ul><ul><ul><li>Loaded through data-transformation tools or programs </li></ul></ul><ul><ul><li>Separate operation and decision support environments </li></ul></ul><ul><li>Correct problems in quality before data stored </li></ul><ul><ul><li>Cleanse and organise in consistent manner </li></ul></ul>
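
The extract, cleanse and load steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two source lists, the field names, the country-code table and the date formats are all invented for the example, and a plain dictionary stands in for the metadata repository that would hold the business rules.

```python
from datetime import datetime

# Hypothetical raw rows from two operational sources with inconsistent formats.
SOURCE_A = [{"cust": " Alice ", "country": "ie", "joined": "2004-03-01"}]
SOURCE_B = [{"cust": "BOB", "country": "Ireland", "joined": "01/04/2004"}]

# Assumed business rules; in practice these would come from the metadata repository.
COUNTRY_CODES = {"ie": "IE", "ireland": "IE", "uk": "GB", "united kingdom": "GB"}
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def parse_date(text):
    """Try each known source format and return an ISO date string."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {text!r}")


def transform(row):
    """Cleanse one source row into the warehouse's uniform layout."""
    return {
        "customer": row["cust"].strip().title(),
        "country": COUNTRY_CODES[row["country"].strip().lower()],
        "joined": parse_date(row["joined"]),
    }


def load(sources):
    """Apply the same rules centrally to every row from every source."""
    return [transform(row) for source in sources for row in source]


warehouse = load([SOURCE_A, SOURCE_B])
for record in warehouse:
    print(record)
```

Rows that break a rule raise an error here, so problems surface before anything is stored; a real pipeline would more likely route such rows to a rejects table.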
  11. 11. Data Quality <ul><li>Quality is critical </li></ul><ul><ul><li>Quality determines usefulness </li></ul></ul><ul><ul><li>Often neglected or casually handled </li></ul></ul><ul><ul><li>Problems exposed when data is summarised </li></ul></ul>
  12. 12. Data Quality
  13. 13. Data Quality <ul><li>Cleanse data </li></ul><ul><ul><li>When populating warehouse </li></ul></ul><ul><ul><li>Data quality action plan </li></ul></ul><ul><ul><li>Best practices for data quality </li></ul></ul><ul><ul><li>Measure results </li></ul></ul><ul><li>Data integrity issues </li></ul><ul><ul><li>Uniformity </li></ul></ul><ul><ul><li>Version </li></ul></ul><ul><ul><li>Completeness check </li></ul></ul><ul><ul><li>Conformity check </li></ul></ul><ul><ul><li>Genealogy or drill-down </li></ul></ul>
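
As a rough illustration of the completeness and conformity checks listed above, the sketch below scores a handful of made-up customer rows. The required fields, the simplified email pattern and the sample rows are assumptions chosen for the example, not part of the original slides.

```python
import re

# Assumed rules: which fields must be filled, and what an email should look like.
REQUIRED = ("customer_id", "email", "country")
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def completeness(rows):
    """Fraction of rows in which every required field is present and non-empty."""
    complete = sum(
        all(row.get(field) not in (None, "") for field in REQUIRED) for row in rows
    )
    return complete / len(rows)


def conformity(rows):
    """Fraction of non-empty email values that match the expected pattern."""
    emails = [row["email"] for row in rows if row.get("email")]
    return sum(bool(EMAIL_PATTERN.match(e)) for e in emails) / len(emails)


rows = [
    {"customer_id": 1, "email": "a@example.com", "country": "IE"},
    {"customer_id": 2, "email": "not-an-email", "country": "IE"},
    {"customer_id": 3, "email": "", "country": "GB"},
    {"customer_id": 4, "email": "d@example.org", "country": None},
]

print(f"completeness: {completeness(rows):.2f}")
print(f"conformity:   {conformity(rows):.2f}")
```

Tracking these two ratios on each load is one simple way to "measure results"; both functions assume a non-empty input.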
  14. 14. Advantages of Data Warehousing <ul><li>Simplicity </li></ul><ul><ul><li>a data warehouse provides a single image of business reality by integrating various data </li></ul></ul><ul><li>Better quality data; improved productivity </li></ul><ul><ul><li>consistency and accuracy lead to better and more productive decision-making; end-user computing boosts productivity </li></ul></ul><ul><li>Fast access </li></ul><ul><ul><li>necessary data is in one place, so system response time is cut </li></ul></ul><ul><li>Easy to use </li></ul><ul><ul><li>designed for specific informational needs of end users </li></ul></ul><ul><li>Separate decision-support operation from production operation </li></ul><ul><ul><li>speeds access, avoids conflict and integrity problems </li></ul></ul>
  15. 15. Advantages of Data Warehousing <ul><li>Gives competitive advantage </li></ul><ul><ul><li>through better management and utilisation of corporate knowledge </li></ul></ul><ul><li>Ultimate distributed database </li></ul><ul><ul><li>a data warehouse pulls together information from disparate and potentially incompatible locations throughout the organisation </li></ul></ul><ul><li>Information flow management </li></ul><ul><ul><li>a data warehouse, especially the metadata, is helpful in the continual task of incrementally refining process workflows in a changing business environment </li></ul></ul><ul><li>Enables parallel processing </li></ul><ul><ul><li>users can ask questions that were too process-intensive to answer before and a data warehouse can handle more users, transactions, queries, and messages </li></ul></ul><ul><li>Robust processing engines </li></ul><ul><ul><li>data warehouses allow users to directly obtain and refine data from different software applications without affecting the operational databases </li></ul></ul><ul><li>Security </li></ul><ul><ul><li>since clients of the data warehouses cannot directly query the production databases, the security of the production databases is increased </li></ul></ul>
  16. 16. Disadvantages of Data Warehousing <ul><li>Complexity and anticipation in development </li></ul><ul><ul><li>you cannot just buy a data warehouse; you have to build one because each warehouse has a unique architecture and a set of requirements that spring from the individual needs of the organisation </li></ul></ul><ul><li>Takes time to build </li></ul><ul><li>Expensive to build </li></ul><ul><li>End-user training </li></ul><ul><ul><li>It is necessary to create a new “mind-set” with all employees who must be prepared to capitalise upon the innovative data analysis provided by data warehouses </li></ul></ul><ul><li>Complexity involved in symmetrical multiprocessing (SMP) and massively parallel processing (MPP) </li></ul>
  17. 17. The Future of Data Warehousing <ul><li>As the DW becomes a standard part of an organisation, there will be efforts to find new ways to use the data. This will likely bring with it several new challenges: </li></ul><ul><ul><li>Regulatory constraints may limit the ability to combine sources of disparate data (e.g. Data Protection Act) </li></ul></ul><ul><ul><li>These disparate sources are likely to contain unstructured data, which is hard to store </li></ul></ul><ul><ul><li>The Internet makes it possible to access data from virtually “anywhere”. Of course, this just increases the disparity. </li></ul></ul>
  18. 18. Data Mining <ul><li>Definition: “the analysis of data to discover previously unknown relationships that provide useful information” (Hand et al.) </li></ul><ul><li>Data mining makes use of statistical and visualisation techniques to discover and present information in a form that is easily comprehensible </li></ul><ul><li>Data mining can be applied to tasks such as decision support, forecasting, estimation, and uncovering and understanding relationships among data elements </li></ul>
  19. 19. Data Mining <ul><li>Traditionally, the task of identifying and utilising information hidden in data has been achieved through some form of statistical method </li></ul><ul><li>Typically, this involves a user formulating a guess about a possible relationship in the data and evaluating this hypothesis via a statistical test. This is a largely time-intensive, user-driven, top-down approach to data analysis. </li></ul><ul><li>With data mining, the interrogation of the data is done by the data mining algorithm rather than by the user </li></ul><ul><li>Data mining is a self-organising, data-influenced, bottom-up approach to data analysis </li></ul><ul><li>Simply put, what data mining does is sort through masses of data to uncover patterns and relationships, then build models to predict behaviours </li></ul>
  20. 20. Web Mining <ul><li>Web mining is a special case of data mining where the mining occurs over a website </li></ul><ul><li>It enhances the website with intelligent behaviour, such as suggesting related links or recommending new products </li></ul><ul><li>It allows you to unobtrusively learn the interests of the visitors and modify their user profiles in real time </li></ul><ul><li>It also allows you to match resources to the interests of the visitor </li></ul>
  21. 21. Data Mining: Why the Growth in Popularity? <ul><li>One reason is that we keep getting more and more data all the time and need tools to understand it </li></ul><ul><li>We also are aware that the human brain has trouble processing multidimensional data </li></ul><ul><li>A third reason is that machine learning techniques are becoming more affordable and more refined at the same time </li></ul>
  22. 22. Verification -v- Knowledge Data Discovery <ul><li>In the past, decision support activities were primarily based on the concept of verification </li></ul><ul><li>This required a great deal of prior knowledge on the decision-maker’s part in order to verify a suspected relationship </li></ul><ul><li>With the advance of technology, the concept of verification began to turn into knowledge data discovery </li></ul>
  23. 23. Knowledge Data Discovery <ul><li>Knowledge data discovery (KDD) techniques include: statistical analysis, neural networks or fuzzy logic, intelligent agents, data visualisation </li></ul><ul><li>KDD techniques not only discover useful patterns in the data, but also can be used to develop predictive models </li></ul>
  24. 24. The Knowledge Discovery Search Process <ul><li>Define the business problem and obtain the data to study it </li></ul><ul><li>Use data mining software to model the problem </li></ul><ul><li>Mine the data to search for patterns of interest </li></ul><ul><li>Review the mining results and refine them by re-specifying the model </li></ul><ul><li>Once validated, make the model available to other users of the DW </li></ul>
  25. 25. Analytic Systems <ul><li>Real-time queries and analysis </li></ul><ul><li>Real-time decision-making </li></ul><ul><li>Real-time data warehouses updated daily or more frequently </li></ul><ul><ul><li>Updates may be made while queries are active </li></ul></ul><ul><ul><li>Not all data updated continuously </li></ul></ul><ul><li>Deployment of business analytic applications </li></ul>
  26. 26. On-line Analytical Processing (OLAP) <ul><li>Activities performed by end users in on-line (i.e. “live” multi-user) systems </li></ul><ul><ul><li>Specific, open-ended query generation e.g. SQL </li></ul></ul><ul><ul><li>Ad hoc reports </li></ul></ul><ul><ul><li>Statistical analysis </li></ul></ul><ul><ul><li>Building DSS applications </li></ul></ul><ul><li>Modeling and visualisation capabilities </li></ul><ul><li>Special class of tools </li></ul><ul><ul><li>DSS, BI, BA, DBMS, GIS, etc. </li></ul></ul>
  27. 27. Multidimensional OLAP (MOLAP) <ul><li>Data can be viewed across several dimensions. Here sales are arrayed by region and product </li></ul><ul><li>A fourth dimension could be added by using several graphs, perhaps at different time points </li></ul><ul><li>Most analyses have many more dimensions than this. MOLAP handles data as an n-dimensional hypercube </li></ul>
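
One way to picture the hypercube is as a set of cells keyed by one member per dimension. The sketch below is a toy model, not how any particular MOLAP product stores data; the dimension names, members and sales figures are invented for illustration.

```python
from collections import defaultdict

# Hypothetical sales cells keyed by (region, product, quarter).
DIMENSIONS = ("region", "product", "quarter")
cube = {
    ("North", "Widgets", "Q1"): 120,
    ("North", "Gadgets", "Q1"): 80,
    ("South", "Widgets", "Q1"): 60,
    ("North", "Widgets", "Q2"): 150,
    ("South", "Gadgets", "Q2"): 90,
}


def roll_up(cells, keep):
    """Sum the cube over every dimension that is not named in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for key, value in cells.items():
        totals[tuple(key[i] for i in positions)] += value
    return dict(totals)


def slice_cube(cells, dimension, member):
    """Keep only the cells where `dimension` equals `member`."""
    position = DIMENSIONS.index(dimension)
    return {key: value for key, value in cells.items() if key[position] == member}


by_region = roll_up(cube, ["region"])
by_region_quarter = roll_up(cube, ["region", "quarter"])
first_quarter = slice_cube(cube, "quarter", "Q1")
print(by_region)
print(by_region_quarter)
print(first_quarter)
```

Adding a dimension only lengthens the key tuple, which is why the same operations extend to n dimensions.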
  28. 28. Relational OLAP (ROLAP) <ul><li>A large relational database server replaces the multidimensional one </li></ul><ul><li>The database contains both detailed and summarised data, allowing “drill down” techniques to be applied </li></ul><ul><li>SQL interfaces allow vendors to build tools, both portable and scalable </li></ul><ul><li>This requires databases with many relational tables which may lead to substantial processor overhead on complex joins </li></ul>
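
A minimal sketch of the relational approach, using Python's built-in `sqlite3` module with an in-memory database. The table layout, names and figures are assumptions made up for the example; the point is that a summary and its drill-down are both plain SQL joins over the same detailed rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE region  (region_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sales   (region_id INTEGER, product_id INTEGER, amount REAL);
    INSERT INTO region  VALUES (1, 'North'), (2, 'South');
    INSERT INTO product VALUES (1, 'Widgets'), (2, 'Gadgets');
    INSERT INTO sales   VALUES (1, 1, 120), (1, 2, 80), (2, 1, 60),
                               (2, 2, 90), (1, 1, 150);
""")

# Summarised view: total sales per region.
summary = conn.execute("""
    SELECT r.name, SUM(s.amount)
    FROM sales s
    JOIN region r ON r.region_id = s.region_id
    GROUP BY r.name
    ORDER BY r.name
""").fetchall()

# Drill down: break one region's total out by product (needs a second join).
detail = conn.execute("""
    SELECT p.name, SUM(s.amount)
    FROM sales s
    JOIN region r  ON r.region_id = s.region_id
    JOIN product p ON p.product_id = s.product_id
    WHERE r.name = ?
    GROUP BY p.name
    ORDER BY p.name
""", ("North",)).fetchall()

print(summary)
print(detail)
```

Each extra level of drill-down tends to add another join, which is where the processor overhead mentioned above comes from.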
  29. 29. Data Mining Technologies <ul><li>Statistics – the most mature of the data mining technologies, but often not applicable because they need clean data. In addition, many statistical procedures assume linear relationships, which limits their use. </li></ul><ul><li>Neural networks, genetic algorithms, fuzzy logic – these technologies are able to work with complicated and imprecise data. Their broad applicability has made them popular in the field. </li></ul>
  30. 30. Data Mining Technologies <ul><li>Decision trees – these technologies are conceptually simple and have gained in popularity as better tree growing software was introduced. Because of the way they are used, they are perhaps better called “classification” trees. </li></ul>
  31. 31. Data Mining Techniques <ul><li>Paralleling the popularity of data mining itself, the development of new techniques is exploding as well </li></ul><ul><li>Many innovations are vendor-specific, which sometimes does little to advance the state of the art </li></ul><ul><li>Regardless, data-mining techniques tend to fall into four major categories: </li></ul><ul><ul><li>classification </li></ul></ul><ul><ul><li>association </li></ul></ul><ul><ul><li>sequencing </li></ul></ul><ul><ul><li>clustering </li></ul></ul>
  32. 32. Classification Methods <ul><li>The goal is to discover rules that define whether an item belongs to a particular subset or class of data </li></ul><ul><li>For example, if we are trying to determine which households will respond to a direct mail campaign, we will want rules that separate the “probables” from the not probables. </li></ul><ul><li>These IF-THEN rules often are portrayed in a tree-like structure </li></ul>
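
To make the tree-of-rules idea concrete, here is a small hand-written tree for the direct mail example. The attributes (`responded_before`, `income`, `has_children`) and the income threshold are hypothetical; a real classifier would learn its tests from labelled households rather than have them typed in.

```python
# Each internal node holds a yes/no test; each leaf is a class label.
TREE = {
    "test": lambda h: h["responded_before"],
    "yes": "probable",
    "no": {
        "test": lambda h: h["income"] >= 50_000,
        "yes": {
            "test": lambda h: h["has_children"],
            "yes": "probable",
            "no": "not probable",
        },
        "no": "not probable",
    },
}


def classify(household, node=TREE):
    """Follow the IF-THEN tests from the root until a leaf label is reached."""
    while isinstance(node, dict):
        node = node["yes"] if node["test"](household) else node["no"]
    return node


households = [
    {"responded_before": True, "income": 20_000, "has_children": False},
    {"responded_before": False, "income": 60_000, "has_children": True},
    {"responded_before": False, "income": 60_000, "has_children": False},
]
labels = [classify(h) for h in households]
print(labels)
```

Reading any root-to-leaf path gives one rule, for example: IF not responded before AND income at least 50,000 AND has children THEN probable.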
  33. 33. Sequencing Methods <ul><li>These methods are applied to time series data in an attempt to find hidden trends </li></ul><ul><li>If found, these can be useful predictors of future events </li></ul><ul><li>For example, customer groups that tend to purchase products tied-in with hit movies would be targeted with promotional campaigns timed to release dates </li></ul>
  34. 34. Clustering Techniques <ul><li>Clustering techniques attempt to create partitions in the data according to some “distance” metric </li></ul><ul><li>Clustering aims to segment a diverse group into a number of similar subgroups or clusters </li></ul><ul><li>The clusters formed are data grouped together simply by their similarity to their neighbours </li></ul><ul><li>By examining the characteristics of each cluster, it may be possible to establish rules for classification </li></ul><ul><li>In clustering, there are no predefined classes and no examples. The records are grouped together on the basis of self-similarity. </li></ul>
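
The slides do not name a specific algorithm; as one common example of distance-based partitioning, the sketch below runs plain k-means with Euclidean distance on six made-up points. The starting centroids are picked by hand so the run is repeatable.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Assign each point to its nearest centroid, then recompute the centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(point, centroids[i]),
            )
            clusters[nearest].append(point)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
for centre, members in zip(centroids, clusters):
    print(centre, members)
```

No class labels are supplied anywhere: the two groups emerge purely from the distances, which is the "no predefined classes and no examples" property described above.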
  35. 35. Association Methods <ul><li>These techniques search all transactions from a system for patterns of occurrence </li></ul><ul><li>A common method is market basket analysis, in which the sets of products purchased by thousands of consumers are examined </li></ul><ul><ul><li>It finds affinity groupings that discover what items are usually purchased with others, predicting the frequency with which certain items are purchased at the same time </li></ul></ul><ul><li>Results are then portrayed as percentages; for example, “30% of the people that buy steaks also buy charcoal” </li></ul>
  36. 36. Association: Market Basket Analysis <ul><li>This is the most widely used and, in many ways, most successful data mining algorithm </li></ul><ul><li>It essentially determines what products people purchase together </li></ul><ul><li>Retailers can use this information to place these products in the same area </li></ul><ul><li>Direct marketers can use this information to determine which new products to offer to their current customers </li></ul><ul><li>Inventory policies can be improved if reorder points reflect the demand for the complementary products </li></ul>
  37. 37. Market Basket Analysis Method <ul><li>We first need a list of transactions to see what was purchased. This can be easily obtained from cash registers / POS devices. </li></ul><ul><li>Next, we choose a list of products to analyse, and tabulate how many times each was purchased with the others … </li></ul>
  38. 38. A Convenience Store Example <ul><li>Consider the following simple example about five transactions at a convenience store: </li></ul><ul><ul><li>Transaction 1: Pizza, cola, milk </li></ul></ul><ul><ul><li>Transaction 2: Milk, potato chips </li></ul></ul><ul><ul><li>Transaction 3: Cola, pizza </li></ul></ul><ul><ul><li>Transaction 4: Milk, biscuits </li></ul></ul><ul><ul><li>Transaction 5: Cola, biscuits </li></ul></ul><ul><li>These need to be cross tabulated and displayed in a table … </li></ul>
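  39. 39. A Convenience Store Example <table><tr><th>Product Bought</th><th>Pizza also</th><th>Milk also</th><th>Cola also</th><th>Chips also</th><th>Biscuits also</th></tr><tr><td>Pizza</td><td>2</td><td>1</td><td>2</td><td>0</td><td>0</td></tr><tr><td>Milk</td><td>1</td><td>3</td><td>1</td><td>1</td><td>1</td></tr><tr><td>Cola</td><td>2</td><td>1</td><td>3</td><td>0</td><td>1</td></tr><tr><td>Chips</td><td>0</td><td>1</td><td>0</td><td>1</td><td>0</td></tr><tr><td>Biscuits</td><td>0</td><td>1</td><td>1</td><td>0</td><td>2</td></tr></table><ul><li>Pizza and Cola sell together more often than any other combination; a cross-marketing opportunity? </li></ul><ul><li>Milk sells well with everything; people probably come here specifically to buy it </li></ul>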
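
The cross-tabulation of the five convenience store transactions can be reproduced with a short script. This is a sketch rather than a production method: it counts every ordered pair of items in each basket, so the diagonal holds how often each product was bought at all. "Potato chips" is shortened to `chips` here.

```python
from itertools import product

# The five transactions from the example, one set of items per basket.
TRANSACTIONS = [
    {"pizza", "cola", "milk"},
    {"milk", "chips"},
    {"cola", "pizza"},
    {"milk", "biscuits"},
    {"cola", "biscuits"},
]
PRODUCTS = ["pizza", "milk", "cola", "chips", "biscuits"]


def cross_tabulate(transactions, products):
    """Count, for each ordered pair (a, b), the baskets containing both a and b."""
    table = {a: {b: 0 for b in products} for a in products}
    for basket in transactions:
        for a, b in product(basket, repeat=2):
            table[a][b] += 1
    return table


table = cross_tabulate(TRANSACTIONS, PRODUCTS)

# Print the table with the "product bought" down the side.
print("bought".ljust(10) + "".join(p.ljust(10) for p in PRODUCTS))
for a in PRODUCTS:
    print(a.ljust(10) + "".join(str(table[a][b]).ljust(10) for b in PRODUCTS))
```

The table is symmetric, and its largest off-diagonal entry is the pizza and cola pair, matching the observation on the slide.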
  40. 40. Market Basket Analysis: Using the Results <ul><li>The tabulations can immediately be translated into association rules and the numerical measures computed </li></ul><ul><li>Comparing this week’s table to last week’s table can immediately show the effect of this week’s promotional activities </li></ul><ul><li>Some rules are going to be trivial (e.g. hot dogs and buns sell together) or inexplicable / spurious (e.g. wheelbarrows sell best on Wednesdays?) </li></ul>
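
The slides do not say which numerical measures are meant; support, confidence and lift are the ones usually quoted for association rules, so the sketch below assumes those and computes them for the five convenience store transactions.

```python
# The five convenience store transactions ("potato chips" shortened to "chips").
TRANSACTIONS = [
    {"pizza", "cola", "milk"},
    {"milk", "chips"},
    {"cola", "pizza"},
    {"milk", "biscuits"},
    {"cola", "biscuits"},
]


def support(items):
    """Fraction of all transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in TRANSACTIONS) / len(TRANSACTIONS)


def confidence(antecedent, consequent):
    """Of the baskets containing the antecedent, the fraction also holding the consequent."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)


def lift(antecedent, consequent):
    """Confidence relative to how often the consequent is bought anyway."""
    return confidence(antecedent, consequent) / support(consequent)


for lhs, rhs in [({"pizza"}, {"cola"}), ({"cola"}, {"pizza"}), ({"milk"}, {"cola"})]:
    print(
        f"{sorted(lhs)} -> {sorted(rhs)}: "
        f"support={support(lhs | rhs):.2f} "
        f"confidence={confidence(lhs, rhs):.2f} "
        f"lift={lift(lhs, rhs):.2f}"
    )
```

A confidence figure is what lies behind a statement such as "30% of the people that buy steaks also buy charcoal"; a lift above 1 suggests the pairing happens more often than chance alone would produce, though five transactions are far too few to conclude anything.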
  41. 41. Market Basket Analysis: Limitations <ul><li>A large number of real transactions are needed to do an effective basket analysis, but the data’s accuracy is compromised if all the products do not occur with similar frequency </li></ul><ul><li>The analysis can sometimes capture results that were due to the success of previous marketing campaigns (and not natural tendencies of customers) </li></ul><ul><li>(Have a look at to see it in action) </li></ul>
  42. 42. Data Visualisation <ul><li>Data visualisation is so powerful because the human visual cortex converts objects into information so quickly </li></ul><ul><li>See an example on the next slide where height and shading add additional dimensions to the figure … </li></ul>
  43. 43. Data Visualisation: An “Enlivened” Risk Analysis Report
  44. 44. Data Visualisation <ul><li>Technologies which support visualisation and interpretation include: </li></ul><ul><ul><li>Digital imaging, GIS, GUI, tables, multi-dimensions, graphs, VR, 3D, animation </li></ul></ul><ul><li>Helps to visually identify relationships and trends </li></ul><ul><li>Data manipulation allows real-time inspection of performance data / CPI benchmarks </li></ul>
  45. 45. Geographical Information Systems (GIS) <ul><li>A Geographical Information System (GIS) is a special purpose database that contains a spatial co-ordinate system </li></ul><ul><li>Computerised system for managing and manipulating data with digitised maps </li></ul><ul><li>Used for modeling and simulations </li></ul><ul><li>A comprehensive GIS requires: </li></ul><ul><ul><li>Data input from maps, aerial photos, etc. </li></ul></ul><ul><ul><li>Data storage, retrieval and query </li></ul></ul><ul><ul><li>Data transformation and modeling </li></ul></ul><ul><ul><li>Data reporting (maps, reports and plans) </li></ul></ul>
  46. 46. GIS: Sample Applications
  47. 47. Capabilities of a GIS <ul><li>In general, a GIS contains two types of data: </li></ul><ul><ul><li>Spatial data: these elements correspond to a uniquely-defined location on earth. They could be in point, line or polygon form </li></ul></ul><ul><ul><li>Attribute data: these are the data that will be portrayed at the geographic references established by spatial data </li></ul></ul><ul><li>Example (next slide): data from an opinion poll is displayed for multiple regions in the USA. Clicking on an area allows the user to drill down to the results for smaller areas. </li></ul>
  48. 48. Sample GIS Application: Telephone Polling Results On the “live” map, clicking on an area allows the user to drill down and see results for smaller areas
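
The split between spatial and attribute data, and the "click on an area" lookup, can be sketched as follows. The two rectangular regions and the poll figures are invented, and the ray-casting point-in-polygon test is just one standard way to answer "which region was clicked"; a real GIS would use proper map projections and spatial indexes.

```python
# Spatial data: hypothetical regions as polygons of (x, y) vertices.
REGIONS = {
    "West": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "East": [(4, 0), (8, 0), (8, 4), (4, 4)],
}

# Attribute data: made-up poll results keyed by the same region names.
POLL = {
    "West": {"yes": 54, "no": 46},
    "East": {"yes": 41, "no": 59},
}


def point_in_polygon(x, y, polygon):
    """Ray casting: count how many edges a ray going right from (x, y) crosses."""
    inside = False
    for i in range(len(polygon)):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % len(polygon)]
        if (y1 > y) != (y2 > y):  # the edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def results_at(x, y):
    """Return (region name, poll results) for the clicked point, or None."""
    for name, polygon in REGIONS.items():
        if point_in_polygon(x, y, polygon):
            return name, POLL[name]
    return None


print(results_at(1, 1))
print(results_at(6, 2))
print(results_at(9, 9))
```

Drilling down to smaller areas would repeat the same lookup against a finer set of polygons nested inside the chosen region.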
  49. 49. Data Mining: Some Applications <ul><li>Pharmaceuticals: Massive amounts of biological and clinical information can be analysed with data mining methods to discover new uses for existing drugs </li></ul><ul><li>Healthcare: Hospitals are using data mining to perform utilisation analysis and pricing analysis, to estimate outcomes, to improve preventive care, and to detect fraud and questionable practices </li></ul><ul><li>Banking: Data mining tools help banks to understand customer behaviour, conduct profitability analysis, improve cross-selling efforts, identify credit risk, identify customers for loan campaigns, tailor financial products to meet customer needs, seek new customers, and enhance customer service </li></ul><ul><li>Credit card companies: Predictors for credit card customer attrition and fraud are frequently identified via data mining. Successful users of data mining include American Express and Citibank. </li></ul><ul><li>Financial services: Security analysts are using data mining extensively to analyse large volumes of financial data in order to build trading and risk models for developing investment strategies </li></ul>
  50. 50. Data Mining: Some Applications <ul><li>Telemarketing and direct marketing: In this sector, companies have gained big savings and are able to target customers more accurately by using data mining. Direct marketers are configuring and mailing their product catalogs based on customers' purchase history and demographic data. </li></ul><ul><li>Airlines: As the competition in the airline business increases, understanding customers' needs has become imperative. Airlines capture customer data in order to make strategic movements such as expanding their services in new routes. </li></ul><ul><li>Manufacturers: Data mining is widely used in manufacturing industries to control and schedule technical production processes. </li></ul><ul><li>Insurance companies: The insurance industry is data intensive. Data mining has recently provided insurers with a wealth of useful information extracted from huge databases for decision making. </li></ul>
  51. 51. Data Mining: Some Applications <ul><li>Telecommunications: By applying the insights learned through data mining, telecommunications companies can identify products and services that maximise value and then use this information to establish marketing campaigns to improve market share. A common example in this industry is identifying factors that influence customer retention. In the US, telephone companies were famous for their price-cutting strategy in the past, but the new strategy is to know their customers better. Using data mining, telephone companies are able to provide customers with a great variety of new services they are likely to purchase. </li></ul><ul><li>Distribution and retailing: With the huge amount of consumer data flowing in daily from different sources, especially from e-commerce Web sites, data mining helps companies learn more about their customers and develop insights into their buying habits. Knowing the behaviours (e.g. likes and dislikes) of customers leads to better customer service and allows companies to create one-to-one relationships with customers, hopefully prolonging loyalty and prompting repeat business. As such, data mining is used extensively in the area of customer relationship management. Large users of data mining in the retailing industry include Wal-Mart and Victoria's Secret. </li></ul><ul><li>Remotely sensed data: Huge amounts of remotely sensed data are taken in every day from satellite images and other related sources. Data mining is used in prediction of weather, monitoring and reasoning about ozone depletion, etc. </li></ul>
  52. 52. Advantages of Data Mining <ul><li>Provide better information to achieve competitive edge </li></ul><ul><ul><li>This advantage is the primary motivation for data mining. Data mining has a powerful analytical ability to generate information, which allows an organisation to better understand itself, its customers, and the marketplace it competes in. When used as a marketing tool, data mining often results in a sharper competitive edge, an evidence-based selling approach, a customer-oriented marketing plan, shorter selling cycles, and reduced operational costs. </li></ul></ul><ul><li>Add value to a data warehouse </li></ul><ul><ul><li>A data warehouse by itself is just a large repository of unstructured data, and data mining is the process of analysing the data and transforming it into useful information. Organisations have experienced a payback of 10 to 70 times their data warehouse investment after data mining components are added. </li></ul></ul><ul><li>Increase operating efficiency </li></ul><ul><ul><li>Data mining's ability to quickly organise and analyse a large pool of data has dramatically increased workplace efficiency. It allows users to create complex financial statements in minutes, compared with weeks by traditional methods. </li></ul></ul>
  53. 53. Advantages of Data Mining <ul><li>Provide flexibility in using data </li></ul><ul><ul><li>With data mining, users gain control over the data. Instead of letting the system push the data, users are now able to pull the data they need. Users can let their imagination run and manipulate data in various ways to answer their questions. The easy-to-use interface of data mining tools and client/server technology has made the information directly accessible by individual users. </li></ul></ul><ul><li>Reduce operating costs </li></ul><ul><ul><li>Modern data mining tools are built from highly sophisticated hardware and software components, which allow them to analyse massive data sets efficiently and at reduced operating cost. (e.g. the high costs faced by public sector organisations such as healthcare providers when asked to answer a “parliamentary question” raised in the Oireachtas could be reduced by the use of data warehouses and data mining) </li></ul></ul><ul><li>Ready-to-use </li></ul><ul><ul><li>Unlike traditional data analysis methods, data mining requires hardly any pre-processing of data prior to analysis. It can use a mixture of numeric, categorical, and date data, and can tolerate missing and noisy data. The results are in the form of ready-to-use business rules with almost no statistical expertise and guesswork needed. </li></ul></ul><ul><li>Solve research bottleneck </li></ul><ul><ul><li>In many social science and business situations, conducting real experiments is almost impossible. Data mining is able to provide these research agendas with a more limited set of working hypotheses for further investigation based on large, unstructured data sets. </li></ul></ul>
  54. 54. Disadvantages of Data Mining <ul><li>No definitive answer </li></ul><ul><ul><li>Data mining yields useful insights and clues but no definitive answers. The definitive answers need to be achieved through much more rigorous scientific experimentation. Experiences from Wall Street have shown that this technology may not outperform traditional methods. Therefore, users should have a realistic expectation of the results of data mining. </li></ul></ul><ul><li>High cost </li></ul><ul><ul><li>The cost of implementing data mining is quite high; thus, it may not be appropriate in some business environments. The ROI needs to be justified by cost-benefit analysis. </li></ul></ul><ul><li>Complex and lengthy project </li></ul><ul><ul><li>Experience from data mining system developers has shown that it takes a long time to get the project right. Developers suggest focusing on incremental development and benefits. </li></ul></ul><ul><li>Privacy </li></ul><ul><ul><li>The detailed data about individuals used in data mining might involve a violation of privacy. This problem worsens when the World Wide Web is involved, because detailed personal information is easily accessible and can fall into the wrong hands. </li></ul></ul>
  55. 55. Disadvantages of Data Mining <ul><li>Knowledge requirement of user </li></ul><ul><ul><li>Despite its increasingly simple interface and automation of the thinking processes, data mining is more suitable for people with statistical, operations research, and management science backgrounds. The ease of use becomes a critical factor for attracting more businesses to invest in this technology. </li></ul></ul><ul><li>Unmanageable database </li></ul><ul><ul><li>Many authors have suggested that organisations must increase the size of their databases tremendously in order to do data mining. However, some are concerned that this will result in unmanageable and unnecessary databases. </li></ul></ul><ul><li>Wrong information from errors in data </li></ul><ul><ul><li>The massive data used in data mining inevitably contains mistakes caused by human errors. Information generated should be used with caution to avoid lawsuits in areas such as hiring. Experts suggest using only relevant information for mining to reduce such risks. </li></ul></ul>