Machine Learning

  • What is predictive modeling? Predictive modeling uses demographic, medical and pharmacy claims information to determine the range and intensity of medical problems for a given population of insured persons. This assessment of risk allows health plans, payers and provider groups to plan, evaluate and fund health care management programs more effectively. From: http://www.dxcgrisksmart.com/faq.html

    1. Machine Learning, Data Mining. INFO 629, Dr. R. Weber
    2. The picnic game
       • How did you reason to find the rule?
       • According to Michalski (1983), "A Theory and Methodology of Inductive Learning" (in Machine Learning, chapter 4): "inductive learning is a heuristic search through a space of symbolic descriptions (i.e., generalizations) generated by the application of rules to training instances."
    3. Learning
       • Rote learning
         ◦ e.g., learning multiplication tables
       • Supervised learning
         ◦ Examples are used to help a program identify a concept
         ◦ Examples are typically represented as attribute-value pairs
         ◦ The notion of supervision originates from the guidance the examples provide
       • Unsupervised learning
         ◦ e.g., human efforts at scientific discovery and theory formation
    4. Inductive Learning
       • Learning by generalization
       • Performance of classification tasks
         ◦ classification, categorization, clustering
       • Rules indicate categories
       • Goal: characterize a concept
    5. Concept Learning is a Form of Inductive Learning
       • The learner uses:
         ◦ positive examples (instances that ARE examples of a concept), and
         ◦ negative examples (instances that ARE NOT examples of a concept)
    6. Concept Learning
       • Needs empirical validation
       • Whether the data is dense or sparse determines the quality of different methods
    7. Validation of Concept Learning i
       • The learned concept should correctly classify new instances of the concept
         ◦ When it succeeds on a real instance of the concept, it finds true positives
         ◦ When it fails on a real instance of the concept, it finds false negatives
    8. Validation of Concept Learning ii
       • The learned concept should correctly classify new instances of the concept
         ◦ When it succeeds on a counterexample, it finds true negatives
         ◦ When it fails on a counterexample, it finds false positives
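The four outcomes above are the cells of a confusion matrix. A minimal sketch in Python that tallies them; the labels below are hypothetical, just to illustrate each case:

```python
def confusion_counts(actual, predicted):
    """Count (TP, FN, TN, FP) for binary concept-membership labels."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)          # true positives
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)      # false negatives
    tn = sum(1 for a, p in zip(actual, predicted) if not a and not p)  # true negatives
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)      # false positives
    return tp, fn, tn, fp

# True = instance of the concept, False = counterexample
actual    = [True, True, True, False, False, False]
predicted = [True, False, True, False, True, False]
print(confusion_counts(actual, predicted))  # (2, 1, 2, 1)
```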
    9. Basic classification tasks
       • Classification
       • Categorization
       • Clustering
    10. Categorization
    11. Classification
    12. Clustering
    13. Clustering
       • A data analysis method applied to data that naturally possesses groupings
       • Goal: group the data into clusters
       • The resulting clusters are collections in which objects within a cluster are similar to each other
       • Objects in one cluster are dissimilar to objects in other clusters
       • Distance measures are used to compute similarity
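The slide's recipe (group data into clusters using a distance measure) can be sketched with a minimal k-means loop. The points, k, and iteration count below are hypothetical choices for illustration:

```python
import random

def euclidean(p, q):
    """Distance measure used to compute similarity between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def kmeans(points, k, iterations=20, seed=0):
    """Minimal k-means: assign each point to its nearest centroid, recompute."""
    random.seed(seed)
    centroids = random.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: euclidean(p, centroids[j]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if empty).
        centroids = [
            tuple(sum(vals) / len(cl) for vals in zip(*cl)) if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
    return clusters

# Two natural groupings: points near (1, 1) and points near (8, 8).
points = [(1, 1), (1.2, 0.8), (0.9, 1.1), (8, 8), (8.1, 7.9), (7.8, 8.2)]
print(kmeans(points, 2))
```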
    14. Rule Learning
       • Rule learning is widely used in data mining
       • Version space learning is a search method for learning rules
       • Decision trees
    15. Version Space i
       • A=1, B=1, C=1 → Outcome=1
       • A=0, B=.5, C=.5 → Outcome=0
       • A=0, B=0, C=.3 → Outcome=.5
       • Searches a space that includes all possible attribute combinations
       • Cannot learn rules with disjunctions (i.e., OR statements)
       • Incremental method: additional training data can be incorporated without retraining on all past data
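One concrete way to search a space of generalizations is the Find-S half of the version-space (candidate elimination) approach: keep the most specific hypothesis consistent with the positive examples seen so far, generalizing one attribute at a time. The attribute names and values below are hypothetical:

```python
def generalize(hypothesis, example):
    """Keep attribute values that match; replace mismatches with '?' (any value)."""
    return tuple(h if h == e else '?' for h, e in zip(hypothesis, example))

def find_s(positive_examples):
    """Most specific hypothesis covering all positive examples (incremental)."""
    h = positive_examples[0]
    for ex in positive_examples[1:]:
        h = generalize(h, ex)
    return h

# Hypothetical attribute vectors (sky, temperature, wind):
positives = [('sunny', 'warm', 'strong'), ('sunny', 'warm', 'weak')]
print(find_s(positives))  # ('sunny', 'warm', '?')
```

Because each new positive example only generalizes the current hypothesis, new data can be folded in without revisiting old examples, which is the incremental property the slide mentions.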
    16. Decision trees
       • A knowledge representation formalism
       • Represent mutually exclusive rules (disjunctions)
       • A way of breaking up a data set into classes or categories
       • Classification rules determine, from an instance's attribute values, which class it belongs to
    17. Decision trees consist of:
       • leaf nodes (classes)
       • decision nodes (tests on attribute values)
       • branches growing from decision nodes, one for each possible outcome of the test
       (From Cawsey, 1997)
    18. Decision tree induction
       • Goal: correctly classify all example data
       • Several algorithms induce decision trees: ID3 (Quinlan, 1979), CLS, ACLS, ASSISTANT, IND, C4.5
       • Constructs the decision tree from past data
       • Not incremental
       • Attempts to find the simplest tree (not guaranteed, because it relies on heuristics)
    19. ID3 algorithm
       • From:
         ◦ a set of target classes
         ◦ training data containing objects of more than one class
       • ID3 uses tests to refine the training data into subsets that each contain objects of only one class
       • Choosing the right test is the key
    20. How does ID3 choose tests?
       • By information gain, or "minimum entropy"
       • Maximizing information gain corresponds to minimizing entropy
       • It favors predictive features (good indicators of the outcome)
    21. ID3 algorithm (training data)

        No.  Student  First last year?  Male?  Works hard?  Drinks?  First this year?
        1    Richard  yes               yes    no           yes      yes
        2    Alan     yes               yes    yes          no       yes
        3    Alison   no                no     yes          no       yes
        4    Jeff     no                yes    no           yes      no
        5    Gail     yes               no     yes          yes      yes
        6    Simon    no                yes    yes          yes      no
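ID3's test-selection criterion can be checked against the student table above: compute the information gain of each attribute with respect to "First this year?" and pick the largest. A sketch:

```python
from math import log2
from collections import Counter

# The student table, as (attribute tuple, outcome) rows.
# Attributes: (first_last_year, male, works_hard, drinks) -> first_this_year
rows = [
    (('yes', 'yes', 'no',  'yes'), 'yes'),  # Richard
    (('yes', 'yes', 'yes', 'no'),  'yes'),  # Alan
    (('no',  'no',  'yes', 'no'),  'yes'),  # Alison
    (('no',  'yes', 'no',  'yes'), 'no'),   # Jeff
    (('yes', 'no',  'yes', 'yes'), 'yes'),  # Gail
    (('no',  'yes', 'yes', 'yes'), 'no'),   # Simon
]

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    counts = Counter(labels)
    total = len(labels)
    return -sum(c / total * log2(c / total) for c in counts.values())

def information_gain(rows, attr_index):
    """Entropy of the whole set minus the weighted entropy after the split."""
    base = entropy([label for _, label in rows])
    total = len(rows)
    remainder = 0.0
    for value in {attrs[attr_index] for attrs, _ in rows}:
        subset = [label for attrs, label in rows if attrs[attr_index] == value]
        remainder += len(subset) / total * entropy(subset)
    return base - remainder

names = ['First last year?', 'Male?', 'Works hard?', 'Drinks?']
for i, name in enumerate(names):
    print(name, round(information_gain(rows, i), 3))
```

With this data, "First last year?" has the highest gain (about 0.459 bits), so ID3 would place that test at the root of the tree.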
    28. Explanation-based learning
       • Incorporates domain knowledge into the learning process
       • Feature values are assigned a relevance factor if they are consistent with domain knowledge
       • Features that are assigned relevance factors are considered in the learning process
    29. Familiar Learning Task
       • Learn the relative importance of features
       • Goal: learn individual feature weights
       • Commonly used in case-based reasoning
       • Feedback methods: use a similarity measure to obtain feedback on the features' relative importance
       • Search methods: gradient descent
       • ID3
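The weight-learning idea above (feedback from a similarity measure, search by gradient descent) can be sketched as follows. The squared-error loss, learning rate, and case pairs are assumptions for illustration; target 0 means the pair should be treated as similar:

```python
def weighted_dist(w, x, y):
    """Weighted squared distance between two cases."""
    return sum(wi * (a - b) ** 2 for wi, a, b in zip(w, x, y))

def gradient_step(w, x, y, target, lr=0.05):
    """One gradient-descent step on loss (dist - target)^2, weights kept >= 0."""
    err = weighted_dist(w, x, y) - target
    return [max(0.0, wi - lr * 2 * err * (a - b) ** 2)
            for wi, a, b in zip(w, x, y)]

w = [1.0, 1.0]
# Feedback says these two cases are similar (target 0), yet feature 0 differs,
# so its weight is driven down; feature 1 agrees and its weight is untouched.
for _ in range(50):
    w = gradient_step(w, (0.0, 0.3), (1.0, 0.3), target=0.0)
print(w)
```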
    30. Classification using Naive Bayes
       • The Naive Bayes classifier uses two sources of information to classify a new instance:
         ◦ the distribution of the training dataset (prior probability)
         ◦ the region surrounding the new instance in the dataset (likelihood)
       • "Naive" because it assumes conditional independence among features, which does not always hold; the assumption is made to simplify computation
       • Conditional independence reduces the need for large numbers of observations
       • The bias in estimating probabilities often makes no difference in practice: it is the order of the probabilities, not their exact values, that determines the classification
       • Comparable in performance with classification trees and with neural networks
       • Highly accurate and fast when applied to large databases
       • Some links:
         ◦ http://www.resample.com/xlminer/help/NaiveBC/classiNB_intro.htm
         ◦ http://www.statsoft.com/textbook/stnaiveb.html
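A minimal sketch of the two information sources the slide names: the prior comes from the class distribution in the training data, and the likelihood is factored per feature under the conditional-independence assumption. The weather-style dataset and class names are hypothetical; Laplace smoothing is added so unseen values do not zero out a class:

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Categorical Naive Bayes with Laplace smoothing."""

    def fit(self, X, y):
        self.class_counts = Counter(y)             # basis for the priors
        self.total = len(y)
        # (class, feature index) -> counts of feature values within that class
        self.value_counts = defaultdict(Counter)
        for features, label in zip(X, y):
            for i, v in enumerate(features):
                self.value_counts[(label, i)][v] += 1
        return self

    def predict(self, features):
        best_label, best_score = None, float('-inf')
        for label, n_class in self.class_counts.items():
            score = math.log(n_class / self.total)  # log prior
            for i, v in enumerate(features):        # log likelihoods, independent
                count = self.value_counts[(label, i)][v]
                score += math.log((count + 1) / (n_class + 2))  # Laplace smoothing
            if score > best_score:
                best_label, best_score = label, score
        return best_label

X = [('sunny', 'hot'), ('sunny', 'mild'), ('rain', 'mild'), ('rain', 'hot')]
y = ['play', 'play', 'stay', 'stay']
model = NaiveBayes().fit(X, y)
print(model.predict(('sunny', 'hot')))  # play
```

Note that only the ordering of the per-class scores matters for the prediction, which is why the slide's point about biased probability estimates often not mattering holds in practice.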
    31. KDD: definition
       • Knowledge Discovery in Databases (KDD) is "the non-trivial process of identifying valid, novel, and potentially useful and understandable patterns in data" (R. Feldman, 2000)
       • "KDD is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data" (Fayyad, Piatetsky-Shapiro & Smyth, 1996, p. 6)
       • Data mining is one of the steps in the KDD process
       • Text mining applies data mining techniques to unstructured text
    32. The KDD Process

        DATA --filtering--> SELECTED DATA --preprocessing--> PROCESSED DATA --transformation--> TRANSFORMED DATA --data mining--> patterns --interpretation/browsing--> KNOWLEDGE
    33. Data mining tasks i
       • Predictive modeling / risk assessment: classification, decision trees
       • Database segmentation: Kohonen nets, clustering techniques
    34. Data mining tasks ii
       • Link analysis: rules, association generation, relationships between entities
       • Deviation detection: how things change over time, trends
    35. KDD applications
       • Fraud detection
         ◦ telecom (calling cards, cell phones)
         ◦ credit cards
         ◦ health insurance
       • Loan approval
       • Investment analysis
       • Marketing and sales data analysis
         ◦ identify potential customers
         ◦ effectiveness of a sales campaign
         ◦ store layout
    36. Text mining
       • The problem starts with a query; the solution is the set of information (e.g., patterns, connections, profiles, trends) contained in several different texts that is potentially relevant to the initial query
    37. Text mining applications
       • IBM Text Navigator
         ◦ clusters documents by content
         ◦ annotates each document with the two most frequently used words in its cluster
       • Concept Extraction (Los Alamos)
         ◦ text analysis of medical records
         ◦ uses a clustering approach based on a trigram representation
         ◦ documents represented as vectors, compared with cosine similarity
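The trigram-plus-cosine approach mentioned for Concept Extraction can be sketched in a few lines: represent each text as a bag of character trigrams and compare the resulting vectors with the cosine measure. The medical snippets below are hypothetical examples:

```python
from collections import Counter
from math import sqrt

def trigrams(text):
    """Bag-of-character-trigrams representation of a text."""
    text = text.lower()
    return Counter(text[i:i + 3] for i in range(len(text) - 2))

def cosine(a, b):
    """Cosine similarity between two sparse trigram-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

d1 = trigrams("acute myocardial infarction")
d2 = trigrams("myocardial infarction, acute")
d3 = trigrams("fractured left femur")
print(cosine(d1, d2) > cosine(d1, d3))  # True
```

Because trigrams ignore word order and tolerate small spelling variations, near-duplicate phrasings like d1 and d2 land in the same cluster while unrelated records like d3 do not.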
