Transcript

  • 1. Data Mining: Practical Machine Learning Tools and Techniques
    Slides for Chapter 1 of Data Mining by I. H. Witten and E. Frank
    07/20/06

    What's it all about?
    - Data vs. information
    - Data mining and machine learning
    - Structural descriptions
      - Rules: classification and association
      - Decision trees
    - Datasets
      - Weather, contact lens, CPU performance, labor negotiation data, soybean classification
    - Fielded applications
      - Loan applications, screening images, load forecasting, machine fault diagnosis, market basket analysis
    - Generalization as search
    - Data mining and ethics

    Data vs. information
    - Society produces huge amounts of data
      - Sources: business, science, medicine, economics, geography, environment, sports, …
    - Potentially valuable resource
    - Raw data is useless: need techniques to automatically extract information from it
      - Data: recorded facts
      - Information: patterns underlying the data

    Information is crucial
    - Example 1: in vitro fertilization
      - Given: embryos described by 60 features
      - Problem: selection of embryos that will survive
      - Data: historical records of embryos and outcomes
    - Example 2: cow culling
      - Given: cows described by 700 features
      - Problem: selection of cows that should be culled
      - Data: historical records and farmers' decisions

    Data mining
    - Extracting implicit, previously unknown, potentially useful information from data
    - Needed: programs that detect patterns and regularities in the data
    - Strong patterns → good predictions
      - Problem 1: most patterns are not interesting
      - Problem 2: patterns may be inexact (or spurious)
      - Problem 3: data may be garbled or missing

    Machine learning techniques
    - Algorithms for acquiring structural descriptions from examples
    - Structural descriptions represent patterns explicitly
      - Can be used to predict the outcome in a new situation
      - Can be used to understand and explain how a prediction is derived (may be even more important)
    - Methods originate from artificial intelligence, statistics, and research on databases
  • 2. Structural descriptions
    - Example: if-then rules
      If tear production rate = reduced then recommendation = none
      Otherwise, if age = young and astigmatic = no then recommendation = soft

    Age             Spectacle prescription  Astigmatism  Tear production rate  Recommended lenses
    Young           Myope                   No           Reduced               None
    Young           Hypermetrope            No           Normal                Soft
    Pre-presbyopic  Hypermetrope            No           Reduced               None
    Presbyopic      Myope                   Yes          Normal                Hard
    …               …                       …            …                     …

    Can machines really learn?
    - Definitions of "learning" from the dictionary:
      - To get knowledge of by study, experience, or being taught
      - To become aware by information or from observation
        (difficult to measure)
      - To commit to memory
      - To be informed of, ascertain; to receive instruction
        (trivial for computers)
    - Operational definition:
      Things learn when they change their behavior in a way that makes them perform better in the future.
      - Does a slipper learn?
    - Does learning imply intention?

    The weather problem
    - Conditions for playing a certain game

    Outlook   Temperature  Humidity  Windy  Play
    Sunny     Hot          High      False  No
    Sunny     Hot          High      True   No
    Overcast  Hot          High      False  Yes
    Rainy     Mild         Normal    False  Yes
    …         …            …         …      …

      If outlook = sunny and humidity = high then play = no
      If outlook = rainy and windy = true then play = no
      If outlook = overcast then play = yes
      If humidity = normal then play = yes
      If none of the above then play = yes

    Ross Quinlan
    - Machine learning researcher from the 1970s
    - University of Sydney, Australia
    - 1986 "Induction of decision trees", ML Journal
    - 1993 C4.5: Programs for machine learning. Morgan Kaufmann
    - 199? Started

    Classification vs. association rules
    - Classification rule: predicts the value of a given attribute (the classification of an example)
      If outlook = sunny and humidity = high then play = no
    - Association rule: predicts the value of an arbitrary attribute (or combination)
      If temperature = cool then humidity = normal
      If humidity = normal and windy = false then play = yes
      If outlook = sunny and play = no then humidity = high
      If windy = false and play = no then outlook = sunny and humidity = high

    Weather data with mixed attributes
    - Some attributes have numeric values

    Outlook   Temperature  Humidity  Windy  Play
    Sunny     85           85        False  No
    Sunny     80           90        True   No
    Overcast  83           86        False  Yes
    Rainy     75           80        False  Yes
    …         …            …         …      …

      If outlook = sunny and humidity > 83 then play = no
      If outlook = rainy and windy = true then play = no
      If outlook = overcast then play = yes
      If humidity < 85 then play = yes
      If none of the above then play = yes
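The ordered if-then rules for the nominal weather data can be run directly as code. The sketch below is my own illustration (not the book's software), with attribute names taken from the slide; it applies the five rules in order to the four table rows.

```python
# A minimal sketch of the weather rule set, applied as an ordered
# list of if-then rules: the first matching rule wins.

def classify(example):
    """Return the play decision for one weather example (a dict)."""
    if example["outlook"] == "sunny" and example["humidity"] == "high":
        return "no"
    if example["outlook"] == "rainy" and example["windy"]:
        return "no"
    if example["outlook"] == "overcast":
        return "yes"
    if example["humidity"] == "normal":
        return "yes"
    return "yes"  # default rule: none of the above

# The four rows shown in the weather table, with their known classes:
data = [
    ({"outlook": "sunny",    "humidity": "high",   "windy": False}, "no"),
    ({"outlook": "sunny",    "humidity": "high",   "windy": True},  "no"),
    ({"outlook": "overcast", "humidity": "high",   "windy": False}, "yes"),
    ({"outlook": "rainy",    "humidity": "normal", "windy": False}, "yes"),
]
for example, expected in data:
    assert classify(example) == expected
```

Because the rules are ordered, the catch-all default only fires when no earlier rule matches; this is the "decision list" reading of a rule set.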
  • 3. The contact lenses data

    Age             Spectacle prescription  Astigmatism  Tear production rate  Recommended lenses
    Young           Myope                   No           Reduced               None
    Young           Myope                   No           Normal                Soft
    Young           Myope                   Yes          Reduced               None
    Young           Myope                   Yes          Normal                Hard
    Young           Hypermetrope            No           Reduced               None
    Young           Hypermetrope            No           Normal                Soft
    Young           Hypermetrope            Yes          Reduced               None
    Young           Hypermetrope            Yes          Normal                Hard
    Pre-presbyopic  Myope                   No           Reduced               None
    Pre-presbyopic  Myope                   No           Normal                Soft
    Pre-presbyopic  Myope                   Yes          Reduced               None
    Pre-presbyopic  Myope                   Yes          Normal                Hard
    Pre-presbyopic  Hypermetrope            No           Reduced               None
    Pre-presbyopic  Hypermetrope            No           Normal                Soft
    Pre-presbyopic  Hypermetrope            Yes          Reduced               None
    Pre-presbyopic  Hypermetrope            Yes          Normal                None
    Presbyopic      Myope                   No           Reduced               None
    Presbyopic      Myope                   No           Normal                None
    Presbyopic      Myope                   Yes          Reduced               None
    Presbyopic      Myope                   Yes          Normal                Hard
    Presbyopic      Hypermetrope            No           Reduced               None
    Presbyopic      Hypermetrope            No           Normal                Soft
    Presbyopic      Hypermetrope            Yes          Reduced               None
    Presbyopic      Hypermetrope            Yes          Normal                None

    A complete and correct rule set
      If tear production rate = reduced then recommendation = none
      If age = young and astigmatic = no
        and tear production rate = normal then recommendation = soft
      If age = pre-presbyopic and astigmatic = no
        and tear production rate = normal then recommendation = soft
      If age = presbyopic and spectacle prescription = myope
        and astigmatic = no then recommendation = none
      If spectacle prescription = hypermetrope and astigmatic = no
        and tear production rate = normal then recommendation = soft
      If spectacle prescription = myope and astigmatic = yes
        and tear production rate = normal then recommendation = hard
      If age = young and astigmatic = yes
        and tear production rate = normal then recommendation = hard
      If age = pre-presbyopic and spectacle prescription = hypermetrope
        and astigmatic = yes then recommendation = none
      If age = presbyopic and spectacle prescription = hypermetrope
        and astigmatic = yes then recommendation = none

    A decision tree for this problem
    (figure: decision tree, not reproduced in transcript)

    Classifying iris flowers

         Sepal length  Sepal width  Petal length  Petal width  Type
    1    5.1           3.5          1.4           0.2          Iris setosa
    2    4.9           3.0          1.4           0.2          Iris setosa
    …
    51   7.0           3.2          4.7           1.4          Iris versicolor
    52   6.4           3.2          4.5           1.5          Iris versicolor
    …
    101  6.3           3.3          6.0           2.5          Iris virginica
    102  5.8           2.7          5.1           1.9          Iris virginica
    …

      If petal length < 2.45 then Iris setosa
      If sepal width < 2.10 then Iris versicolor
      ...

    Predicting CPU performance
    - Example: 209 different computer configurations

         Cycle time  Main memory (KB)  Cache (KB)  Channels       Performance
         (ns)
         MYCT        MMIN    MMAX      CACH        CHMIN  CHMAX   PRP
    1    125         256     6000      256         16     128     198
    2    29          8000    32000     32          8      32      269
    …
    208  480         512     8000      32          0      0       67
    209  480         1000    4000      0           0      0       45

    - Linear regression function
      PRP = -55.9 + 0.0489 MYCT + 0.0153 MMIN + 0.0056 MMAX
            + 0.6410 CACH - 0.2700 CHMIN + 1.480 CHMAX

    Data from labor negotiations

    Attribute                        Type                         1     2     3     …  40
    Duration                         (number of years)            1     2     3        2
    Wage increase first year         percentage                   2%    4%    4.3%     4.5
    Wage increase second year        percentage                   ?     5%    4.4%     4.0
    Wage increase third year         percentage                   ?     ?     ?        ?
    Cost of living adjustment        {none, tcf, tc}              none  tcf   ?        none
    Working hours per week           (number of hours)            28    35    38       40
    Pension                          {none, ret-allw, empl-cntr}  none  ?     ?        ?
    Standby pay                      percentage                   ?     13%   ?        ?
    Shift-work supplement            percentage                   ?     5%    4%
    Education allowance              {yes, no}                    yes   ?     ?        ?
    Statutory holidays               (number of days)             11    15    12       12
    Vacation                         {below-avg, avg, gen}        avg   gen   gen      avg
    Long-term disability assistance  {yes, no}                    no    ?     ?        yes
    Dental plan contribution         {none, half, full}           none  ?     full     full
    Bereavement assistance           {yes, no}                    no    ?     ?        yes
    Health plan contribution         {none, half, full}           none  ?     full     half
    Acceptability of contract        {good, bad}                  bad   good  good     good
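The claim that the contact-lens rule set is "complete and correct" can be checked mechanically, since the table contains exactly one row for each of the 3 × 2 × 2 × 2 = 24 attribute-value combinations. The sketch below (my own code, not the book's) encodes the nine rules in order and compares them against all 24 rows.

```python
from itertools import product

def recommend(age, prescription, astigmatic, tear):
    """Apply the slide's nine rules in order; returns 'none', 'soft', or 'hard'."""
    if tear == "reduced":
        return "none"
    if age == "young" and astigmatic == "no" and tear == "normal":
        return "soft"
    if age == "pre-presbyopic" and astigmatic == "no" and tear == "normal":
        return "soft"
    if age == "presbyopic" and prescription == "myope" and astigmatic == "no":
        return "none"
    if prescription == "hypermetrope" and astigmatic == "no" and tear == "normal":
        return "soft"
    if prescription == "myope" and astigmatic == "yes" and tear == "normal":
        return "hard"
    if age == "young" and astigmatic == "yes" and tear == "normal":
        return "hard"
    if age == "pre-presbyopic" and prescription == "hypermetrope" and astigmatic == "yes":
        return "none"
    if age == "presbyopic" and prescription == "hypermetrope" and astigmatic == "yes":
        return "none"
    return None  # would mean the rule set is incomplete

# The 24 recommendations from the table, in the same nested order as
# product(ages, prescriptions, astigmatism, tear rates):
labels = [
    "none", "soft", "none", "hard",   # young / myope
    "none", "soft", "none", "hard",   # young / hypermetrope
    "none", "soft", "none", "hard",   # pre-presbyopic / myope
    "none", "soft", "none", "none",   # pre-presbyopic / hypermetrope
    "none", "none", "none", "hard",   # presbyopic / myope
    "none", "soft", "none", "none",   # presbyopic / hypermetrope
]
combos = product(["young", "pre-presbyopic", "presbyopic"],
                 ["myope", "hypermetrope"], ["no", "yes"],
                 ["reduced", "normal"])
assert all(recommend(*c) == lab for c, lab in zip(combos, labels))
```

Every combination is covered and every prediction matches the table, which is exactly what "complete and correct" means here.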
  • 4. Decision trees for the labor data
    (figure: decision trees, not reproduced in transcript)

    Soybean classification

    Attribute                            Number of values  Sample value
    Environment  Time of occurrence      7                 July
                 Precipitation           3                 Above normal
                 …
    Seed         Condition               2                 Normal
                 Mold growth             2                 Absent
                 …
    Fruit        Condition of fruit pods 4                 Normal
                 Fruit spots             5                 ?
    Leaf         Condition               2                 Abnormal
                 Leaf spot size          3                 ?
                 …
    Stem         Condition               2                 Abnormal
                 Stem lodging            2                 Yes
                 …
    Root         Condition               3                 Normal
    Diagnosis                            19                Diaporthe stem canker

    The role of domain knowledge

      If leaf condition is normal
      and stem condition is abnormal
      and stem cankers is below soil line
      and canker lesion color is brown
      then diagnosis is rhizoctonia root rot

      If leaf malformation is absent
      and stem condition is abnormal
      and stem cankers is below soil line
      and canker lesion color is brown
      then diagnosis is rhizoctonia root rot

    But in this domain, "leaf condition is normal" implies "leaf malformation is absent"!

    Fielded applications
    - The result of learning (or the learning method itself) is deployed in practical applications:
      - Processing loan applications
      - Screening images for oil slicks
      - Electricity supply forecasting
      - Diagnosis of machine faults
      - Marketing and sales
      - Separating crude oil and natural gas
      - Reducing banding in rotogravure printing
      - Finding appropriate technicians for telephone faults
      - Scientific applications: biology, astronomy, chemistry
      - Automatic selection of TV programs
      - Monitoring intensive care patients

    Processing loan applications (American Express)
    - Given: questionnaire with financial and personal information
    - Question: should money be lent?
    - Simple statistical method covers 90% of cases
    - Borderline cases referred to loan officers
    - But: 50% of accepted borderline cases defaulted!
    - Solution: reject all borderline cases?
      - No! Borderline cases are the most active customers

    Enter machine learning
    - 1000 training examples of borderline cases
    - 20 attributes:
      - age
      - years with current employer
      - years at current address
      - years with the bank
      - other credit cards possessed, …
    - Learned rules: correct on 70% of cases
      - human experts only 50%
    - Rules could be used to explain decisions to customers
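The two soybean rules above differ only in their first condition, and the slide's point is that domain knowledge ("leaf condition is normal" implies "leaf malformation is absent") makes the second rule a generalization of the first. A small sketch makes this concrete; the attribute names are my own encoding of the rule text, not from any dataset file.

```python
# Two versions of the rhizoctonia-root-rot rule from the slide.
# Given the domain fact that leaf_condition == "normal" implies
# leaf_malformation == "absent", rule2 fires whenever rule1 does.

def rule1(case):
    return (case["leaf_condition"] == "normal"
            and case["stem_condition"] == "abnormal"
            and case["stem_cankers"] == "below soil line"
            and case["canker_lesion_color"] == "brown")

def rule2(case):
    return (case["leaf_malformation"] == "absent"
            and case["stem_condition"] == "abnormal"
            and case["stem_cankers"] == "below soil line"
            and case["canker_lesion_color"] == "brown")

# A hypothetical case consistent with the implication:
case = {"leaf_condition": "normal", "leaf_malformation": "absent",
        "stem_condition": "abnormal", "stem_cankers": "below soil line",
        "canker_lesion_color": "brown"}
assert rule1(case) and rule2(case)
```

A learner without the domain fact cannot know the two rules are equivalent; to it they are simply different descriptions.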
  • 5. Screening images
    - Given: radar satellite images of coastal waters
    - Problem: detect oil slicks in those images
    - Oil slicks appear as dark regions with changing size and shape
    - Not easy: lookalike dark regions can be caused by weather conditions (e.g. high wind)
    - Expensive process requiring highly trained personnel

    Enter machine learning
    - Extract dark regions from the normalized image
    - Attributes:
      - size of region
      - shape, area
      - intensity
      - sharpness and jaggedness of boundaries
      - proximity of other regions
      - info about background
    - Constraints:
      - Few training examples (oil slicks are rare!)
      - Unbalanced data: most dark regions aren't slicks
      - Regions from the same image form a batch
      - Requirement: adjustable false-alarm rate

    Load forecasting
    - Electricity supply companies need forecasts of future demand for power
    - Forecasts of min/max load for each hour → significant savings
    - Given: manually constructed load model that assumes "normal" climatic conditions
    - Problem: adjust for weather conditions
    - Static model consists of:
      - base load for the year
      - load periodicity over the year
      - effect of holidays

    Enter machine learning
    - Prediction corrected using the "most similar" days
    - Attributes:
      - temperature
      - humidity
      - wind speed
      - cloud cover readings
      - plus difference between actual load and predicted load
    - Average difference among the three "most similar" days is added to the static model
    - Linear regression coefficients form attribute weights in the similarity function

    Diagnosis of machine faults
    - Diagnosis: classical domain of expert systems
    - Given: Fourier analysis of vibrations measured at various points of a device's mounting
    - Question: which fault is present?
    - Preventative maintenance of electromechanical motors and generators
    - Information very noisy
    - So far: diagnosis by expert/hand-crafted rules

    Enter machine learning
    - Available: 600 faults, each with the expert's diagnosis
    - ~300 unsatisfactory, rest used for training
    - Attributes augmented by intermediate concepts that embodied causal domain knowledge
    - Expert not satisfied with initial rules because they did not relate to his domain knowledge
    - Further background knowledge resulted in more complex rules that were satisfactory
    - Learned rules outperformed hand-crafted ones
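The load-forecasting correction can be sketched as a weighted nearest-neighbor step: find the three historical days whose weather is closest under an attribute-weighted distance, then add the average of their (actual minus predicted) load differences to the static model's prediction. This is my own minimal reading of the slide, not the fielded system; the weight values below are made-up placeholders, whereas on the slide they come from linear regression coefficients.

```python
import math

# Placeholder attribute weights (on the slide these are derived from
# linear regression coefficients, not hand-picked):
WEIGHTS = {"temperature": 1.0, "humidity": 0.5,
           "wind_speed": 0.3, "cloud_cover": 0.2}

def distance(day_a, day_b):
    """Weighted Euclidean distance between two days' weather readings."""
    return math.sqrt(sum(w * (day_a[k] - day_b[k]) ** 2
                         for k, w in WEIGHTS.items()))

def corrected_forecast(static_prediction, today, history):
    """history: list of (weather_dict, actual_minus_predicted_load) pairs.

    Adds the average load difference of the three most similar
    historical days to the static model's prediction.
    """
    nearest = sorted(history, key=lambda h: distance(today, h[0]))[:3]
    correction = sum(diff for _, diff in nearest) / 3
    return static_prediction + correction
```

For example, if the three most similar days saw actual loads 10, 20, and 30 MW above the static prediction, the forecast is raised by 20 MW.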
  • 6. Marketing and sales I
    - Companies precisely record massive amounts of marketing and sales data
    - Applications:
      - Customer loyalty: identifying customers that are likely to defect by detecting changes in their behavior (e.g. banks/phone companies)
      - Special offers: identifying profitable customers (e.g. reliable owners of credit cards that need extra money during the holiday season)

    Marketing and sales II
    - Market basket analysis
      - Association techniques find groups of items that tend to occur together in a transaction (used to analyze checkout data)
    - Historical analysis of purchasing patterns
    - Identifying prospective customers
      - Focusing promotional mailouts (targeted campaigns are cheaper than mass-marketed ones)

    Machine learning and statistics
    - Historical difference (grossly oversimplified):
      - Statistics: testing hypotheses
      - Machine learning: finding the right hypothesis
    - But: huge overlap
      - Decision trees (C4.5 and CART)
      - Nearest-neighbor methods
    - Today: perspectives have converged
      - Most ML algorithms employ statistical techniques

    Statisticians
    - Sir Ronald Aylmer Fisher
      - Born: 17 Feb 1890, London, England; died: 29 July 1962, Adelaide, Australia
      - Numerous distinguished contributions to developing the theory and application of statistics for making quantitative a vast field of biology
    - Leo Breiman
      - Developed decision trees
      - 1984 Classification and Regression Trees. Wadsworth

    Generalization as search
    - Inductive learning: find a concept description that fits the data
    - Example: rule sets as description language
      - Enormous, but finite, search space
    - Simple solution:
      - enumerate the concept space
      - eliminate descriptions that do not fit the examples
      - surviving descriptions contain the target concept

    Enumerating the concept space
    - Search space for the weather problem:
      - 4 x 4 x 3 x 3 x 2 = 288 possible combinations
      - With 14 rules → 2.7 x 10^34 possible rule sets
    - Other practical problems:
      - More than one description may survive
      - No description may survive
        - Language is unable to describe the target concept
        - or the data contains noise
    - Another view of generalization as search: hill-climbing in description space according to a pre-specified matching criterion
      - Most practical algorithms use heuristic search that cannot guarantee to find the optimum solution
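The counts on the "Enumerating the concept space" slide are easy to reproduce. In this reading (an assumption consistent with the arithmetic, since the slide does not spell it out), each weather attribute contributes its values plus a "don't care" option to a rule's antecedent, giving 4 x 4 x 3 x 3 antecedents times 2 class values:

```python
from itertools import product

# Each antecedent chooses one of 4, 4, 3, 3 options per attribute
# (values plus "don't care"); each rule then predicts one of 2 classes.
antecedents = list(product(range(4), range(4), range(3), range(3)))
rules = 2 * len(antecedents)
print(rules)                  # 288 possible rules
print(f"{rules ** 14:.1e}")   # ordered lists of 14 such rules: 2.7e+34
```

288 to the 14th power reproduces the slide's ~2.7 x 10^34 possible rule sets, which is why plain enumeration is hopeless even for this tiny problem.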
  • 7. Bias
    - Important decisions in learning systems:
      - Concept description language
      - Order in which the space is searched
      - Way that overfitting to the particular training data is avoided
    - These form the "bias" of the search:
      - Language bias
      - Search bias
      - Overfitting-avoidance bias

    Language bias
    - Important question: is the language universal, or does it restrict what can be learned?
    - A universal language can express arbitrary subsets of examples
    - If the language includes logical or ("disjunction"), it is universal
      - Example: rule sets
    - Domain knowledge can be used to exclude some concept descriptions a priori from the search

    Search bias
    - Search heuristic
      - "Greedy" search: performing the best single step
      - "Beam search": keeping several alternatives
      - …
    - Direction of search
      - General-to-specific
        - E.g. specializing a rule by adding conditions
      - Specific-to-general
        - E.g. generalizing an individual instance into a rule

    Overfitting-avoidance bias
    - Can be seen as a form of search bias
    - Modified evaluation criterion
      - E.g. balancing simplicity and number of errors
    - Modified search strategy
      - E.g. pruning (simplifying a description)
        - Pre-pruning: stops at a simple description before the search proceeds to an overly complex one
        - Post-pruning: generates a complex description first and simplifies it afterwards

    Data mining and ethics I
    - Ethical issues arise in practical applications
    - Data mining is often used to discriminate
      - E.g. loan applications: using some information (e.g. sex, religion, race) is unethical
    - Ethical situation depends on the application
      - E.g. the same information may be OK in a medical application
    - Attributes may contain problematic information
      - E.g. area code may correlate with race

    Data mining and ethics II
    - Important questions:
      - Who is permitted access to the data?
      - For what purpose was the data collected?
      - What kind of conclusions can be legitimately drawn from it?
    - Caveats must be attached to results
    - Purely statistical arguments are never sufficient!
    - Are resources put to good use?