Introduction to Machine Learning
Transcript

  • 1. 09s1: COMP9417 Machine Learning and Data Mining
    Introduction to Machine Learning
    March 12, 2008
    Acknowledgement: Material derived from slides for the book Machine Learning, Tom Mitchell, McGraw-Hill, 1997 http://www-2.cs.cmu.edu/~tom/mlbook.html
    COMP9417: March 11, 2009 Introduction to Machine Learning

    Aims
    This lecture will provide the basis for you to be able to describe the motivation, scope and some application areas of machine learning. Following it you should be able to:
    • describe the general learning problem
    • state some of the steps in setting up a learning problem
    • list some applications of machine learning
    • list some issues in machine learning

    Overview
    [Recommended reading: Mitchell, Chapter 1]
    [Recommended exercises: 1.1, 1.2, optionally 1.5]
    • Why Machine Learning?
    • What is a well-defined learning problem?
    • An example: learning to play checkers (draughts)
    • What questions should we ask about Machine Learning?

    Why Machine Learning
    • Considerable progress in algorithms and theory
    • Growing flood of online data
    • Increasing computational power
    • Many successful commercial/scientific applications
  • 2. Three niches for machine learning:
    • Data mining: using historical data to improve decisions
      – medical records → medical knowledge
    • Software applications we can’t program by hand
      – autonomous robots
      – speech recognition
    • Self customizing programs
      – Web sites that learn user interests

    Some definitions
    machine learning: the science of algorithmic methods of learning from experience, with the goal of improving performance on selected tasks
    data mining: the use of machine learning or statistical algorithms to search large amounts of data for hidden patterns or relationships that are interesting and potentially useful

    Typical data mining task
    [Figure: Patient103 records at time=1, time=2, …, time=n, each with features such as Age: 23; FirstPregnancy: no; Anemia: no; Diabetes: no → YES → no; PreviousPrematureBirth: no; Ultrasound: ? → abnormal; Elective C-Section: ? → no; Emergency C-Section: ? → Yes; …]
    Given:
    • 9714 patient records, each describing a pregnancy and birth
    • Each patient record contains 215 features

    Learn to predict:
    • Classes of future patients at high risk for Emergency Cesarean Section

    One of 18 learned rules:
    If No previous vaginal delivery, and
       Abnormal 2nd Trimester Ultrasound, and
       Malpresentation at admission
    Then Probability of Emergency C-Section is 0.6
    Over training data: 26/41 = .63
    Over test data: 12/20 = .60
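A learned rule of this kind is just a boolean predicate over a patient record, and the reported probabilities are its accuracy over the cases where it fires. A minimal sketch (the field names and the two toy records are assumptions for illustration, not data from the study):

```python
def rule_fires(record):
    """The learned rule's conditions: predicts Emergency C-Section."""
    return (not record["previous_vaginal_delivery"]
            and record["abnormal_2nd_trimester_ultrasound"]
            and record["malpresentation_at_admission"])

def rule_accuracy(records):
    """Fraction of fired cases that really were emergency C-sections,
    i.e. the kind of estimate the slide reports (26/41 = .63 on
    training data, 12/20 = .60 on test data)."""
    fired = [r for r in records if rule_fires(r)]
    if not fired:
        return 0.0
    return sum(r["emergency_c_section"] for r in fired) / len(fired)

# Two hypothetical records: the rule fires on both, one true positive.
toy_records = [
    {"previous_vaginal_delivery": False,
     "abnormal_2nd_trimester_ultrasound": True,
     "malpresentation_at_admission": True,
     "emergency_c_section": True},
    {"previous_vaginal_delivery": False,
     "abnormal_2nd_trimester_ultrasound": True,
     "malpresentation_at_admission": True,
     "emergency_c_section": False},
]
print(rule_accuracy(toy_records))  # 0.5
```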
  • 3. Credit Risk Analysis
    [Figure: Customer103 records at time=t0, t1, …, tn with attributes Years of credit: 9; Loan balance: $2,400 → $4,500; Income: $52k → ?; Own House: Yes; Other delinquent accts: 2 → 3; Max billing cycles late: 3 → 6; Profitable customer?: ? → No; …]

    Rules learned from synthesized data:
    If Other-Delinquent-Accounts > 2, and
       Number-Delinquent-Billing-Cycles > 1
    Then Profitable-Customer? = No [Deny Credit Card application]
    If Other-Delinquent-Accounts = 0, and
       (Income > $30k) OR (Years-of-Credit > 3)
    Then Profitable-Customer? = Yes [Accept Credit Card application]

    Other Prediction Problems
    Customer purchase behavior: [Figure: Customer103 records with Sex: M; Age: 53; Income: $50k; Own House: Yes; MS Products: Word; Computer: 386 PC → Pentium; Purchase Excel?: ? → Yes; …]
    Customer retention: [Figure: Customer103 records with Sex: M; Age: 53; Income: $50k; Own House: Yes; Checking: $5k → $20k → $0; Savings: $15k → $0; Current-customer?: yes → No; …]

    Process optimization: Tasmanian Apple Thinning
    [Figure: Product72 process records at time=t0 (Stage: mix), t1 (Stage: cook), …, tn (Stage: cool), with attributes Mixing-speed: 60rpm; Temperature: 325; Fan-speed: medium; Viscosity: 1.3 → 3.2 → 1.3; Fat content: 15% → 12%; Density: 2.8 → 1.1 → 1.2; Spectral peak: 2800 → 3200 → 3100; Product underweight?: ?? → Yes]

    Apple orchards are important in primary production in Tasmania, and there is a long history in the process of apple thinning. Apples are naturally biennial bearing: trees flower heavily one year, producing a large crop of small fruit (the “on” year), followed by light flowering the next year with a small crop of large, poor-quality fruit. Thinning is most economically done by applying sprays of chemicals that act similarly to plant hormones and cause the abortion of flowers and fruitlets at an early stage of development. Early thinning favours the development of the desirable high density of cells in the fruit.

    Orchardists must decide on the concentration of thinning agent at blossom time. If the concentration is too low, thinning is not effective and the cost of hand thinning is prohibitive; if it is too high, there is a risk of losing all the fruit. The decision is difficult because of the large number of variables to be taken into account.
  • 4. Variables to consider include:
    • trees – cultivar, rootstock and age
    • physiology – previous crop, vigour, number of blossom buds
    • pruning – severity of detailed pruning, limb thinning, and penetration of light into the canopy
    • market – size of fruit required for the market
    • spraying – type of spray machinery and volume of water to be used in the machinery
    The resulting system covered 60 tasks (some with 50 decision tree leaves, i.e. rule paths), plus 30 other variables and 40 procedures, supported by a customized help file of 5,000 words.

    BG Gas Drilling – “Stuck Pipe”
    Drilling is a hugely expensive process: a North Sea operation typically incurs rig costs of around $50,000 per day. Clearly, anything that helps to reduce the time when a drilling rig is not productive has the potential to achieve huge savings.

    Daily report data came from two databases. One was old and included incomplete or absent data – particularly IADC (International Association of Drilling Contractors) codes. The other was compiled more recently and included a large amount of additional data about well site geology, drilling costs, etc. There were sixty recorded occurrences of Stuck Pipe in 170 BG wells, making it possible to mine the data and determine trends. Much of the time invested by the project team was spent getting the data in good order. Results indicate that the length of time the hole has been open, the properties of the drilling mud, and the frequency with which the mud is conditioned all play a significant role in the incidence of Stuck Pipe.

    Nissan – Car selection
    Starting from the basic choices of 3 alternative engines, 3 types of suspension, 2 types of transmission, 9 colours and 3 styles of seat fabric, customers can go far further and create a car to suit their own personality.
  • 5. With 670,000 possible combinations, “it is a totally new concept,” says Takao Ohmura, Sales Manager of Tokyo Nissan Computer Systems. A guidebook explains the options in table form, and we were able to input these tables into XpertRule. Normally it is difficult to utilise such a large matrix, but XpertRule was able to automatically generate a decision tree structure to arrive at the correct model, from the attributes and values in the tables. It met our three major requirements: (1) model selection and checking must be completed in three minutes; (2) the ability to run on Nissan dealers’ hardware; and (3) ease of maintaining the system after the launch of the Cefiro model.

    Channel 4 TV scheduling
    During the day, Channel 4’s strength is the housewife market, whilst in the evenings its strength lies in its varied targeting ability. In comparison with ITV, Channel 4 audiences contain a greater proportion of younger, lighter, up-market, male viewers (audience research has also identified Channel 4’s ability to target cluster groups defined by names such as “Progressive Priscillas” and “Free-thinking Franks”).

    Advertisers may specify to have commercials placed first in the break, last in the break or “Top & Tail” in a break, making break sequencing a challenge if optimal use of airtime is to be achieved. Definition of a knowledge-based system to solve the problem requires observation of a number of prioritised “rules”: top of the list is the need for no overlaps or gaps, with Top and Tail or First and Last network spots also receiving high priority. Lower down the list are First and Last Super-macro spots and non-reporting Super-macros sequenced to play at the same time.
    Optimization problems – as the number of possible combinations grows, it becomes impractical to try all combinations to arrive at a solution in a reasonable time. Rules of thumb can be used to narrow down the options but, in most cases, good rules are not available or are difficult to capture. Numerical optimization techniques are available in most advanced spreadsheets, but these tend to be incapable of optimizing problems involving sequencing or scheduling, and they are “exploitation” rather than “exploration” techniques. The solution involved the use of genetic algorithm techniques, which allow the exploration of large search spaces for optimal or near-optimal solutions.

    Problems Too Difficult to Program by Hand
    ALVINN [Pomerleau] drives 70 mph on highways!
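The genetic-algorithm approach described for the scheduling problem can be illustrated with a minimal sketch: a population of candidate solutions evolves via selection, crossover and mutation. The fitness function here (count of 1-bits in a bit-string) is a placeholder assumption; a real airtime scheduler would score gaps, overlaps and spot priorities instead.

```python
import random

random.seed(1)

def fitness(individual):
    """Toy fitness: number of 1-bits; stands in for a schedule score."""
    return sum(individual)

def evolve(pop_size=20, length=16, generations=40, mutation_rate=0.05):
    pop = [[random.randint(0, 1) for _ in range(length)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[:pop_size // 2]            # selection (elitist)
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, length)      # one-point crossover
            child = [bit ^ (random.random() < mutation_rate)  # mutation
                     for bit in a[:cut] + b[cut:]]
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

best = evolve()
print(fitness(best))  # near the maximum of 16 after 40 generations
```

Because the top half of each generation survives unchanged, the best fitness never decreases: an "exploration" method with a built-in exploitation floor.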
  • 6. [Figure: the ALVINN network – a 30x32 sensor input retina feeding 4 hidden units, which feed 30 output units spanning Sharp Left … Straight Ahead … Sharp Right]

    Stanley – DARPA Grand Challenge Champion 2005
    • won 2 million dollars (US); first team to complete the 132-mile course
    • modified VW Touareg R5 with drive-by-wire; took 6 hours 54 minutes, averaging over 19 mph
    • seven Pentium M computers, GPS and various sensors
    • localization, mapping and collision avoidance

    Software that adapts to its users
    • Brin & Page – PhD students in data mining at Stanford
    • PageRank algorithm (1998)
    • Google business model – technology targets advertisements to users

    Measuring Neural Activity
    • Botros, van Dijk & Killian (2007) – cochlear implant adjustment
    • Expert system uses neural response telemetry (ECAP)
    • Decision tree learning – Quinlan’s C5 and Cubist
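The ALVINN-style architecture above (a 30x32 input retina, 4 hidden units, 30 steering outputs) can be sketched as a plain feedforward pass. The random weights and sigmoid units are stand-in assumptions for a trained network; this is not Pomerleau's code.

```python
import math
import random

random.seed(0)

N_IN, N_HID, N_OUT = 30 * 32, 4, 30  # retina, hidden, steering units

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Randomly initialised weights stand in for a trained network.
W1 = [[random.gauss(0, 0.1) for _ in range(N_HID)] for _ in range(N_IN)]
W2 = [[random.gauss(0, 0.1) for _ in range(N_OUT)] for _ in range(N_HID)]

def steer(retina):
    """Map a flattened 30x32 image to a steering index 0..29
    (0 = sharp left, 29 = sharp right)."""
    hidden = [sigmoid(sum(retina[i] * W1[i][j] for i in range(N_IN)))
              for j in range(N_HID)]
    out = [sigmoid(sum(hidden[j] * W2[j][k] for j in range(N_HID)))
           for k in range(N_OUT)]
    return out.index(max(out))

image = [random.random() for _ in range(N_IN)]  # dummy camera frame
print(steer(image))  # index of the strongest steering unit
```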
  • 7. Scientific Discovery
    The Robot Scientist project (2004)
    [Figure 1: The Robot Scientist hypothesis-generation and experimentation loop. Photo: the Robot Scientist in the lab.]

    From the Nature report (NATURE | VOL 427 | 15 JANUARY 2004 | www.nature.com/nature):
    “… concentration of yeast in the wells of the microtitre trays using the adjacent plate reader and returns the results to the LIMS (although microtitre trays are still moved in and out of incubators manually). … biological entities. The original bioinformatic information for the AAA model was taken mainly from the KEGG13 catalogue of metabolism. The model was then tested with all possible auxotrophic experiments involving a single replacement metabolite, and was altered manually to fit the empirical results. To ensure that the model was not ‘over-fitted’, we carried out all possible auxotrophic experiments with pairs of metabolites. The model correctly predicted at least 98.5% of the experiments (Supplementary Information). To the best of our knowledge, no bioinformatic model has been as thoroughly tested with knockout mutants.

    Machine learning is the branch of artificial intelligence that seeks to develop computer systems that improve their performance automatically with experience14,15. It has much in common with statistics, but differs in having a greater emphasis on algorithms, data representation and making acquired knowledge explicit. …”

    Where Is this Headed?
    Mature algorithms
    • decision trees, regression, neural nets, Bayesian methods …
    • can be applied to standard database relations or flat files
    • established software and services industry

    Opportunity for tomorrow: enormous impact
    • Learn across full mixed-media data
    • Learn across multiple internal databases, plus the web and newsfeeds
    • Learn by active experimentation
    • Learn more complex functions
    • Learn by analogy
    • Cumulative, lifelong learning and adaptation
    • Programming languages and systems with learning embedded?
  • 8. Relevant Disciplines
    • Artificial intelligence
    • Computational complexity theory
    • Statistics
    • Information theory
    • Bayesian methods
    • Control theory
    • Philosophy
    • Psychology and neurobiology
    • Physics
    • …

    A definition of the learning problem
    Learning = improving with experience at some task:
    • improve over task T,
    • with respect to performance measure P,
    • based on experience E.
    E.g., learn to play checkers (draughts):
    • T: play checkers
    • P: % of games won in world tournament
    • E: opportunity to play against self

    Learning to Play Checkers
    • T: play checkers
    • P: percent of games won in world tournament
    • What experience?
    • What exactly should be learned?
    • How shall it be represented?
    • What specific algorithm to learn it?

    Type of Training Experience
    • Direct or indirect?
    • Teacher or not?
    A problem: is training experience representative of the performance goal?
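The (T, P, E) framing above can be made concrete with a purely illustrative rendering: task T is "play one game", performance P is the fraction of games won, and experience E is a batch of games. The coin-flip "player" is a dummy stand-in so the framing runs end to end; no learning happens here.

```python
import random

random.seed(42)

def play_one_game():
    """Task T: play a single (dummy) game; True means a win."""
    return random.random() < 0.5

def performance(experience_size=1000):
    """Performance measure P, estimated over experience E
    (a series of self-play games)."""
    wins = sum(play_one_game() for _ in range(experience_size))
    return wins / experience_size

print(performance())  # close to 0.5 for a player that wins at random
```

A learner would then be any procedure that uses E to change the player so that P improves, which is exactly what the checkers design on the following slides works out.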
  • 9. Choose the Target Function
    • ChooseMove : Board → Move ??
    • V : Board → ??
    • …

    Possible Definition for Target Function V
    • if b is a final board state that is won, then V(b) = 100
    • if b is a final board state that is lost, then V(b) = −100
    • if b is a final board state that is drawn, then V(b) = 0
    • if b is not a final state in the game, then V(b) = V(b′), where b′ is the best final board state that can be achieved starting from b and playing optimally until the end of the game
    This gives correct values, but is not operational.

    Choose Representation for Target Function
    • collection of rules?
    • neural network?
    • polynomial function of board features?
    • …

    A Representation for Learned Function
    V̂(b) = w0 + w1·bp(b) + w2·rp(b) + w3·bk(b) + w4·rk(b) + w5·bt(b) + w6·rt(b)
    where
    • bp(b): number of black pieces on board b
    • rp(b): number of red pieces on b
    • bk(b): number of black kings on b
    • rk(b): number of red kings on b
    • bt(b): number of red pieces threatened by black (i.e., which can be taken on black’s next turn)
    • rt(b): number of black pieces threatened by red
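The linear representation above is easy to evaluate directly. In this sketch the board is encoded as a dict of the six feature counts, which is an assumption made for illustration (the slide leaves the board representation open), and the weights are hand-picked, not learned.

```python
def v_hat(board, w):
    """Evaluate w0 + w1*bp + w2*rp + w3*bk + w4*rk + w5*bt + w6*rt."""
    features = [board["bp"], board["rp"], board["bk"],
                board["rk"], board["bt"], board["rt"]]
    return w[0] + sum(wi * fi for wi, fi in zip(w[1:], features))

weights = [0, 1, -1, 3, -3, 0.5, -0.5]   # hand-picked, illustrative only
opening = {"bp": 12, "rp": 12, "bk": 0, "rk": 0, "bt": 0, "rt": 0}
print(v_hat(opening, weights))  # 0.0 -- a symmetric position scores zero
```

Note the sign pattern: black's features get positive weights and red's the mirrored negatives, so the same function scores the position from black's point of view.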
  • 10. Obtaining Training Examples
    • V(b): the true target function
    • V̂(b): the learned function
    • Vtrain(b): the training value
    One rule for estimating training values:
    • Vtrain(b) ← V̂(Successor(b))

    Choose Weight Tuning Rule
    LMS weight update rule. Do repeatedly:
    • Select a training example b at random
    1. Compute error(b):
       error(b) = Vtrain(b) − V̂(b)
    2. For each board feature fi, update weight wi:
       wi ← wi + c · fi · error(b)
    where c is some small constant, say 0.1, to moderate the rate of learning.

    Design Choices
    [Figure: the design-choice tree — Determine Type of Training Experience (games against experts / games against self / table of correct moves / …) → Determine Target Function (Board → move / Board → value / …) → Determine Representation of Learned Function (polynomial / linear function of six features / artificial neural network / …) → Determine Learning Algorithm (gradient descent / linear programming / …) → Completed Design]

    Some Issues in Machine Learning
    • What algorithms can approximate functions well (and when)?
    • How does the number of training examples influence accuracy?
    • How does the complexity of hypothesis representation impact it?
    • How does noisy data influence accuracy?
    • What are the theoretical limits of learnability?
    • How can prior knowledge of the learner help?
    • What clues can we get from biological learning systems?
    • How can systems alter their own representations?
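The LMS weight-update rule above can be sketched end-to-end against the six-feature linear representation. The feature vectors and target values below are invented for illustration, and the Vtrain values are supplied directly rather than estimated via Successor(b), as a simplifying assumption.

```python
def v_hat(features, w):
    """Linear evaluation: w0 + sum over the six features of wi * fi."""
    return w[0] + sum(wi * fi for wi, fi in zip(w[1:], features))

def lms_update(w, features, v_train, c=0.1):
    """One LMS step: wi <- wi + c * fi * error(b), with f0 = 1 (bias)."""
    error = v_train - v_hat(features, w)
    w[0] += c * 1 * error                     # bias weight
    for i, fi in enumerate(features, start=1):
        w[i] += c * fi * error
    return w

# Two toy (features, Vtrain) pairs: [bp, rp, bk, rk, bt, rt] counts.
w = [0.0] * 7
examples = [([12, 12, 0, 0, 1, 0], 10.0),    # black slightly ahead
            ([4, 8, 2, 0, 0, 1], -20.0)]     # red clearly ahead
for _ in range(1000):
    for feats, target in examples:
        lms_update(w, feats, target, c=0.001)

for feats, target in examples:
    print(round(v_hat(feats, w), 2), target)  # learned value vs target
```

The constant c plays the role described on the slide: large feature counts make the effective step size c·Σfi² sizeable, so c must be small for the repeated updates to converge.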