Topic 6.




1. ICT619 Intelligent Systems, Topic 6: Data Mining
2. Data Mining
  - Introduction
  - Business Applications of Data Mining
  - Data Mining Activities
  - Data Mining Techniques
  - How to Apply Data Mining
  - Data Mining Development Methodology
3. Why data mining?
  - “Customers who bought this title also bought …” - from
    - Why? More effective (targeted) marketing
    - How? Targeting through association
  - Abundance of business data, typically in terabytes: point-of-sale (POS) devices, customer call detail databases, web log files in e-commerce, etc.
  - Data is being collected mostly for improving the efficiency of underlying operations, not for analysis
4. Why data mining? (cont’d)
  - Useful information (business intelligence) to gain competitive advantage can be extracted by "mine"-ing data
  - Examples: underlying trends, associations or patterns in market behaviour
  - According to Hirji (2001), “… data mining is the analysis and non-trivial extraction of data from databases for the purpose of discovering new and valuable information, in the form of patterns and rules, from relationships between data elements.”
5. Data mining in perspective
  - OLAP with data warehouses tells us what is happening and how
  - Data mining tells us what is likely to happen
  - Data mining is knowledge discovery in (commercial) databases (KDD)
  - Data mining is a process rather than a product
6. Data mining in perspective (cont’d)
  - Statistical methods do not scale up to today's problems
  - New "intelligent" tools are needed
  - Data mining draws from artificial intelligence/soft computing, database theory, data visualisation, marketing, statistics, and so on
  - Our objectives:
    - Understand the role of data mining in business
    - Distinguish between different data mining techniques
    - Understand how to go about making use of data mining
7. Business Applications of Data Mining
  - Fastest growing segment of the business intelligence market
  - Increasingly an integral and necessary component of an organisation’s portfolio of analytical techniques
  - Data mining for marketing
    - Uses data on customer behaviour to identify target groups for marketing
    - Reduces cost by avoiding groups unlikely to respond
8. Business Applications of Data Mining (cont’d)
  - Data mining for customer relationship management
    - Anticipating customers’ needs and responding to them proactively
  - Data mining in R&D
    - Can lower costs during the R&D phase of the product life cycle by analysing voluminous test data
    - Bioinformatics: data mining in biology and medicine
9. Data Mining Activities
  - Two broad groups: directed and undirected data mining
  - Directed data mining
    - We know what we are looking for
    - We aim to find the value of a pre-identified target variable in terms of a collection of input variables, e.g. classifying insurance claims
  - Undirected data mining
    - Finds patterns in the data and leaves it to the user to determine the significance of these patterns
    - E.g. identifying groups of customers with similar buying patterns
10. Different types of data mining tasks
11. Data Mining Tasks
  - Classification
    - Assigns a given object to a predefined category (class) based on the object’s attributes (features)
    - Objects to be classified are generally database records
    - Discrete outcomes: yes/no, low/medium/high, etc.
  - Examples of classification tasks:
    - Assigning keywords to articles
    - Classifying credit applicants as low, medium or high risk
    - Assigning customers to predefined customer segments
12. Data Mining Tasks
  - Estimation
    - Continuously varying outcomes, e.g. income, or the probability of a customer leaving (known in data mining circles as churning)
    - Outcomes can also be used for classification by ranking and thresholding
  - Prediction
    - A classification or estimation task performed to predict some future behaviour
    - Examples: predicting which customers will churn in the next six months; predicting the size of a balance that will be transferred
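The ranking-and-thresholding idea above can be sketched in a few lines of Python. The customer names, churn scores and the 0.5 threshold below are invented purely for illustration; in practice the scores would come from an estimation model.

```python
# Turning an estimation into a classification by ranking and thresholding.
# The customers and churn probabilities are made-up illustrations.
churn_scores = {"alice": 0.82, "bob": 0.15, "carol": 0.64, "dave": 0.38}

THRESHOLD = 0.5  # records scoring above this are classified as "churn"

# Rank customers by estimated churn probability, highest first
ranked = sorted(churn_scores, key=churn_scores.get, reverse=True)

# Threshold the continuous estimate to obtain a discrete class
classified = {name: ("churn" if churn_scores[name] >= THRESHOLD else "stay")
              for name in ranked}

print(ranked)                                  # ['alice', 'carol', 'dave', 'bob']
print(classified["alice"], classified["bob"])  # churn stay
```

The ranked list on its own already supports prediction tasks such as "which customers are most likely to churn in the next six months".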
13. Data Mining Tasks (cont’d)
  - Finding affinity groupings or association rules
    - Finds out which things go together, e.g. in a supermarket shopping trolley
    - Used for arranging items on shelves or in catalogues, and for identifying cross-selling opportunities
  - Clustering
    - Segments a group of diverse records into subgroups or clusters containing similar records
    - No predefined classes in clustering; records are grouped based on similarities in their attributes, e.g. people with similar buying habits
    - The data miner must interpret the clusters and decide what to do with them
14. Data Mining Tasks (cont’d)
  - Description and visualisation
    - Help increase our understanding of the people, products or processes that produced the data
    - A good description can provide an explanation of their behaviour
    - Data visualisation can be very effective in explaining things by exploiting our ability to pick up visual cues
15. Data Mining Techniques
  - Our aim is a basic understanding of data mining techniques, to find out
    - when to apply them
    - how to interpret their results
    - how to evaluate their performance
  - Three major approaches:
    - Decision trees
    - Automatic cluster detection
    - Artificial neural networks (supervised and unsupervised)
16. Decision Trees
  - Visual representation of a reasoning process
  - Particularly suitable for solving classification problems
  - Consists of internal nodes, leaf nodes and edges
  Fig. A sample decision tree for catalogue mailing (Ganti et al. 1999).
17. Decision Trees (cont’d)
  - Each leaf node is labelled with a class label
  - The class label is decided by the class of the records that ended up in that leaf during training
  - A leaf node may instead contain a value, e.g. the average of the values of such records
  Group A contains any self-employed person aged 40 or under and earning a salary of more than $50,000.
18. Decision Trees (cont’d)
  - Each edge originating from an internal node is labelled with a splitting predicate involving that node’s splitting attribute
  - The splitting predicates force any record to take a unique path from the root to exactly one leaf node
19. How decision trees work
  - Each record with N attributes is a point in an N-dimensional record space
  - Each branch in a decision tree is a test on a single variable that splits the space into two or more regions
  - With each successive test and split, the resulting regions become more segregated, with increasing homogeneity among their records
  - Ultimately, the leaf nodes contain the purest batches of records
20. How decision trees work (cont’d)
  - In the sample decision tree, for example, any self-employed person aged 40 or under and earning a salary of more than $50,000 will be classified as belonging to group A
21. How decision trees work (cont’d)
  - Overfitting in decision trees
    - An overfitted decision tree correctly classifies every single record in the training data
    - Such a tree is unlikely to generalise to new data sets
    - To prevent overfitting, a test data set is used to prune the decision tree once it has been built using the training data set
22. How decision trees are built
  - Recursive partitioning
    - An iterative process of splitting the training data into partitions (regions of the record space)
    - Initially, all records are in a single training set consisting of pre-classified records
    - An algorithm splits up the data, trying every possible binary split on every field of the records
    - The best split is the one that creates partitions in which a single class predominates
23. How decision trees are built (cont’d)
  - Recursive partitioning (cont’d)
    - The most important task in building a decision tree is deciding which of the attributes (independent fields in a record) gives the best split
    - The measure used to evaluate a potential splitter is the reduction in diversity (or increase in purity)
    - The best split gives the largest reduction in diversity
    - One measure of diversity is the Gini index, which for two classes with proportions p1 and p2 = 1 − p1 is 2 p1 (1 − p1) = 2 p1 p2
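The split-evaluation step can be sketched as follows: compute the Gini diversity of each candidate partition and pick the split that leaves the purest partitions. The toy (age, class) records and the "age <= value" split form are assumptions chosen for illustration.

```python
# Evaluating candidate binary splits by Gini diversity.
# The toy (age, class) records are invented for illustration.
def gini(labels):
    """Gini index of a set of class labels: 1 - sum(p_i^2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

records = [(25, "A"), (30, "A"), (38, "A"), (45, "B"), (52, "B"), (60, "B")]

def weighted_gini(split_value):
    """Size-weighted Gini of the partitions produced by 'age <= split_value'."""
    left = [cls for age, cls in records if age <= split_value]
    right = [cls for age, cls in records if age > split_value]
    n = len(records)
    return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)

# Try every possible split point; the best one yields the purest partitions
best = min({age for age, _ in records}, key=weighted_gini)
print(best)  # 38: splits the records perfectly into all-A and all-B
```

Splitting at age 38 reduces the diversity to zero, since each partition then contains only one class; that is the largest possible reduction.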
24. How decision trees are built (cont’d)
  - Recursive partitioning (cont’d)
    - The splitting process is applied to each of the new partitions, and so on, until no more useful splits can be found
    - A node becomes a leaf node when no split can be found that significantly decreases the diversity
  - Pruning
    - The full decision tree needs to be pruned to improve its performance
    - Pruning removes leaves and branches (edges leading to leaves) that fail to generalise
    - There are a number of pruning methods; e.g. the tree is pruned back to the subtree that minimises the error on the test set
25. How decision trees are built (cont’d)
  - Different types of decision trees
  - Types depend upon
    - the number of splits allowed at each level
    - how these splits are chosen when the tree is built
    - how the tree is pruned to prevent overfitting
  - More broadly, decision trees can be grouped as:
    - Classification trees (leaves represent classes)
    - Regression trees (leaves represent a numeric value)
26. Algorithms for building decision trees
  - Most notable are CHAID, C4.5/C5.0 and CART
  - Data mining software tools allow approximation of any of these algorithms by providing a choice of
    - splitting criteria and pruning strategies
    - control parameters such as maximum tree depth
27. Application of decision trees
  - Useful when the data mining task is classification of records or prediction of outcomes
  - Also chosen to generate understandable rules, which can be explained and translated into SQL or a natural language. For example:
    IF age < 41
    AND income > $50,000
    AND employment = self
    THEN belongs to group A
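The group A membership test from the sample tree (self-employed, aged 40 or under, salary above $50,000) translates directly into code. The record field names below are assumptions made for illustration:

```python
# The group A membership rule from the sample decision tree as a predicate.
# The field names ("age", "salary", "employment") are assumed for illustration.
def belongs_to_group_a(record):
    return (record["age"] <= 40
            and record["salary"] > 50_000
            and record["employment"] == "self")

applicant = {"age": 35, "salary": 62_000, "employment": "self"}
print(belongs_to_group_a(applicant))  # True
```

This readability is exactly what the slide means: the same rule could equally be expressed as a SQL WHERE clause or read out in plain English.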
28. Automatic Cluster Detection
  - Aims to discover structure in a complex data set as a whole, in order to carve it up into simpler groups
  - Examples of clustering: finding products that should be grouped together in a catalogue, or identifying groups of customers with similar tastes in music
  - Of the many methods for finding clusters in data, a prominent one is k-means clustering
29. K-means clustering
  - Available in a wide variety of commercial data mining tools
  - Divides the data set into a predetermined number, k, of clusters
  - Initial clusters are centred at random points (seeds) in the record space
30. K-means clustering (cont’d)
  - Records are assigned to the clusters through an iterative process
  - In the first step, k data points are selected to be the seeds; each seed is an embryonic cluster with only one element
  - In the second step, each record is assigned to the cluster whose centroid is nearest to that record
  - This forms the new clusters, with new intercluster boundaries
31. K-means clustering (cont’d)
  - The centroid of a cluster is calculated by taking the average of each field over all the records in that cluster
  - Euclidean distance is the measure most commonly used by data mining software
  - The distance between two points P(x1, x2, …, xn) and Q(y1, y2, …, yn) in n-dimensional space is √((x1 − y1)² + (x2 − y2)² + … + (xn − yn)²)
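The two-step loop described on the last two slides can be sketched as follows. The 2-D points are invented, and for determinism the first k records are used as seeds (real tools pick random seeds, as slide 29 notes):

```python
# A minimal k-means sketch: seed, assign each record to the nearest
# centroid by Euclidean distance, recompute centroids as field averages,
# and repeat. The 2-D points and the deterministic seeding are illustrative.
from math import dist  # Euclidean distance (Python 3.8+)

points = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8),   # one natural grouping
          (8.0, 8.0), (8.5, 9.0), (7.8, 8.2)]   # another

def kmeans(points, k, iterations=10):
    centroids = list(points[:k])  # step 1: k seed records
    for _ in range(iterations):
        # step 2: assign each record to the cluster with the nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # recompute each centroid as the field-by-field average of its records
        centroids = [tuple(sum(c) / len(c) for c in zip(*cluster))
                     for cluster in clusters]
    return clusters, centroids

clusters, centroids = kmeans(points, k=2)
print(len(clusters[0]), len(clusters[1]))  # 3 3
```

With k = 2 the loop settles on the two natural groupings after a couple of iterations; the final centroids are the field-by-field averages of each group, exactly as the slide describes.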
32. K-means clustering (cont’d)
  - In the k-means method, the original choice of the value of k determines the number of clusters that will be found
  - Unless advance knowledge is available on the likely number of clusters, the value of k is determined by trial and error
  - Best results are obtained when k matches the underlying structure of the data
33. Interpreting clusters
  - Automatic clustering is undirected data mining: we look for something useful without having to know what we are looking for
  - This is both an advantage and a possible disadvantage!
  - The most frequently used approaches to interpreting clusters are
    - building a decision tree with the cluster labels as target variables, and using it to derive rules explaining how to assign new records to the correct cluster
    - using visualisation to see how the clusters are affected by changes in input variables
    - examining the differences in the distributions of variables from cluster to cluster, one variable at a time
34. Application of clusters
  - Clustering is used
    - when natural groupings are suspected, e.g. groups representing customers or products that have a lot in common with each other
    - when there are many competing patterns in the data, making it hard to spot any single pattern
  - Creating clusters reduces the complexity within each cluster, so that other data mining techniques are more likely to succeed
35. Artificial Neural Networks
  - The main generic application of artificial neural networks is pattern recognition, or classification
  - Estimation and prediction can be viewed as variants of classification
  - The best-known ANN model for performing classification is the backpropagation network (or multilayer perceptron)
  - The ANN model particularly suited to clustering is the Kohonen net, or self-organising map (SOM)
36. Artificial Neural Networks (cont’d)
  - SOM learning algorithms are unsupervised
  - Clusters are represented in a SOM by groups of adjacent neurons in the output layer
  - A SOM reduces dimensionality from N to 2
  - A SOM can serve as a clustering tool as well as a visualisation tool for high-dimensional data
  - SOMs are claimed to be often more effective than k-means for complex-shaped clusters
37. Application of neural nets
  - Artificial neural networks can produce very good results
  - But they require extensive data preparation, involving normalisation and conversion of categorical values to numeric values
  - They do not work well when there are many hundreds or thousands of input features, and their training phases are long
  - They are difficult to understand because they represent complex non-linear models; unlike decision trees, they do not readily produce rules
  - A good choice for most classification and prediction tasks when the results are more important than understanding how the model works
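The data preparation mentioned above can be sketched by hand: min-max scaling of a numeric field to [0, 1], and one-hot encoding of a categorical field so a neural net can accept it. The income values and employment categories below are invented for illustration:

```python
# Data preparation for a neural network, sketched with invented values:
# min-max normalisation of a numeric field and one-hot encoding of a
# categorical field.
incomes = [20_000, 35_000, 50_000, 80_000]
lo, hi = min(incomes), max(incomes)
scaled = [(x - lo) / (hi - lo) for x in incomes]  # each value now in [0, 1]

employment_types = ["self", "salaried", "unemployed"]

def one_hot(value):
    """Encode a categorical value as a 0/1 vector."""
    return [1 if value == t else 0 for t in employment_types]

print(scaled)            # [0.0, 0.25, 0.5, 1.0]
print(one_hot("self"))   # [1, 0, 0]
```

Decision trees need neither step, which is one reason they are often preferred when interpretability matters.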
38. How to Apply Data Mining
  - Four ways of utilising data mining expertise in business:
    - purchasing readymade scores (such as on the creditworthiness of a loan applicant) from outside vendors
    - purchasing software that embodies data mining expertise designed for a particular application, such as credit approval, fraud detection or churn prevention
    - hiring outside consultants to perform data mining for special projects
    - developing data mining skills within the business organisation
  - Purchasing scores is quick and easy, but the intelligence is limited to single score values
39. Purchasing Software
  - Two possibilities:
    - The software may be an actual model, e.g. a set of rules for decision support, or a fully trained neural network applied to a particular domain
    - The software may embody knowledge of the process of building models for a particular domain, in the form of a model-creation wizard or template
  - Purchased models work well if the products, customers and market conditions match those used to develop the model
40. Tasks for the data mining model builder
  - Model-building software automates the process of creating candidate models and selecting the ones that perform best
  - Significant tasks are left for the user:
    - choosing a suitable business problem to be addressed by data mining
    - identifying and collecting data likely to contain the information needed to answer the business question
    - pre-processing the data so that the data mining tool can make use of it
    - transforming the database so that the input variables needed by the model are available
    - designing a plan of action based on the model, and implementing it in the marketplace
    - measuring the results of the actions and feeding them back into the database for future mining
41. Hiring Outside Experts
  - Recommended if
    - the organisation is in the early stages of integrating data mining into its business
    - the data mining activity is to be a one-off exercise
  - Not recommended if it is to be an ongoing process, e.g. data mining for customer relationship management
  - Outside expertise for data mining is likely to be available in three places
42. Hiring Outside Experts (cont’d)
  - Outside expertise for data mining is likely to be available in three places:
    - Data mining software vendors
    - Data mining centres, usually collaborations between universities and private companies, e.g. the Monash Data Mining Centre
    - Consulting companies; the company chosen should have experience specifically in the area of interest to the organisation
43. Developing In-house Expertise for data mining
  - Applies particularly to companies with many products and customers
  - Should be a core competency of all large-scale businesses
44. Data Mining Development Methodology
  - Best practice is yet to emerge (Hirji 2001)
  - A proposed five-stage model (Cabena et al. 1998):
    - Business objective determination
    - Data preparation
    - Data mining
    - Results analysis
    - Knowledge assimilation
45. Data Mining Development Methodology (cont’d)
  - Cabena’s five-stage model:
    - Business objective determination: clearly identifying the business problem to be mined
    - Data preparation: data selection, preprocessing and transformation
    - Data mining: algorithm selection and execution
    - Results analysis: has anything new or interesting been found?
    - Knowledge assimilation: formulating ways of exploiting the new information extracted
46. A Case Study (Hirji 2001)
  - Involved a large fast food outlet
  - Brought out some deficiencies of the above methodology
  - A new set of stages for data mining development and use was proposed:
    - Business objective determination
    - Data preparation
    - Data audit
    - Interactive data mining and results analysis
    - Back-end data mining
    - Results synthesis and presentation
47. A Case Study (Hirji 2001) (cont’d)
  - The case study used IBM’s Intelligent Miner for Data on AIX as the data mining tool
  - It took 20 actual days of effort across the six stages above
  - Back-end data mining involves data enrichment and additional data mining algorithm execution by the data mining specialist
  - Distribution of the time required:
    - 45% taken up by stages 4, 5 and 6
    - 30% required by the data preparation stage (against the 70% predicted by the earlier model)
  - Use of a data warehouse saved time needed for selecting, cleaning, transforming, coding and loading the data
48. A Case Study (Hirji 2001) (cont’d)
  - Interactive data mining and results analysis stage
    - Links data mining results with business strategy, using application software such as spreadsheets to perform sensitivity analysis of the results obtained
    - Aims to demonstrate how data mining results support business strategy
49. REFERENCES
  - Berry, M., & Linoff, G. Mastering Data Mining. Wiley Computer Publishing, New York, 2000.
  - Cabena, P., Hadjinian, P., Stadler, R., Verhees, J., & Zanasi, A. Discovering Data Mining: From Concept to Implementation. Prentice Hall, Englewood Cliffs, NJ, 1998.
  - Dhar, V., & Stein, R. “Deriving Rules from Data”, in Seven Methods for Transforming Corporate Data into Business Intelligence. Prentice Hall, 1997, pp. 167-189, 251-258.
  - Ganti, V., Gehrke, J., & Ramakrishnan, R. “Mining Very Large Databases”, IEEE Computer, Vol. 32, No. 8, August 1999, pp. 38-45.
  - Hirji, K. “Exploring Data Mining Implementation”, Communications of the ACM, Vol. 44, No. 7, July 2001, pp. 87-93.
  - Web site on Data Mining and Web Mining - http://