Transcript

  • 1. Thomas G. Dietterich Department of Computer Science Oregon State University Corvallis, Oregon 97331 http://www.cs.orst.edu/~tgd Machine Learning: Making Computer Science Scientific
  • 2. Acknowledgements
    • VLSI Wafer Testing
      • Tony Fountain
    • Robot Navigation
      • Didac Busquets
      • Carles Sierra
      • Ramon Lopez de Mantaras
    • NSF grants IIS-0083292 and ITR-085836
  • 3. Outline
    • Three scenarios where standard software engineering methods fail
    • Machine learning methods applied to these scenarios
    • Fundamental questions in machine learning
    • Statistical thinking in computer science
  • 4. Scenario 1: Reading Checks
    • Find and read the “courtesy amount” on checks
  • 5. Possible Methods:
    • Method 1: Interview humans to find out what steps they follow in reading checks
    • Method 2: Collect examples of checks and the correct amounts. Train a machine learning system to recognize the amounts
  • 6. Scenario 2: VLSI Wafer Testing
    • Wafer test: Functional test of each die (chip) while on the wafer
  • 7. Which Chips (and how many) should be tested?
    • Tradeoff:
      • Test all chips on wafer?
        • Avoid cost of packaging bad chips
        • Incur cost of testing all chips
      • Test none of the chips on the wafer?
        • May package some bad chips
        • No cost of testing on wafer
  • 8. Possible Methods
    • Method 1: Guess the right tradeoff point
    • Method 2: Learn a probabilistic model that captures the probability that each chip will be bad
      • Plug this model into a Bayesian decision making procedure to optimize expected profit
  • 9. Scenario 3: Allocating mobile robot camera
    • Binocular
    • No GPS
  • 10. Camera tradeoff
    • Mobile robot uses camera both for obstacle avoidance and landmark-based navigation
    • Tradeoff:
      • If camera is used only for navigation, robot collides with objects
      • If camera is used only for obstacle avoidance, robot gets lost
  • 11. Possible Methods
    • Method 1: Manually write a program to allocate the camera
    • Method 2: Experimentally learn a policy for switching between obstacle avoidance and landmark tracking
  • 12. Software Engineering Methodology
    • Analyze
      • Interview experts, users, etc. to determine the actions the system must perform
    • Design
      • Apply CS knowledge to design a solution
    • Implement
    • Test
  • 13. Challenges for SE Methodology
    • Standard SE methods fail when…
      • System requirements are hard to collect
      • The system must resolve difficult tradeoffs
  • 14. (1) System requirements are hard to collect
    • There are no human experts
      • Cellular telephone fraud
    • Human experts are inarticulate
      • Handwriting recognition
    • The requirements are changing rapidly
      • Computer intrusion detection
    • Each user has different requirements
      • E-mail filtering
  • 15. (2) The system must resolve difficult tradeoffs
    • VLSI Wafer testing
      • Tradeoff point depends on probability of bad chips, relative costs of testing versus packaging
    • Camera Allocation for Mobile Robot
      • Tradeoff depends on probability of obstacles, number and quality of landmarks
  • 16. Machine Learning: Replacing guesswork with data
    • In all of these cases, the standard SE methodology requires engineers to make guesses
      • Guessing how to do character recognition
      • Guessing the tradeoff point for wafer test
      • Guessing the tradeoff for camera allocation
    • Machine Learning provides a way of making these decisions based on data
  • 17. Outline
    • Three scenarios where software engineering methods fail
    • Machine learning methods applied to these scenarios
    • Fundamental questions in machine learning
    • Statistical thinking in computer science
  • 18. Basic Machine Learning Methods
    • Supervised Learning
    • Density Estimation
    • Reinforcement Learning
  • 19. Supervised Learning
    [Diagram: labeled training examples (digit images such as 8, 8, 3, 6, 0, 1) feed a learning algorithm, which outputs a classifier that is then applied to new examples]
  • 20. AT&T/NCR Check Reading System
    • The recognition transformer is a neural network trained on 500,000 examples of characters
    • The entire system is trained with entire checks as input and dollar amounts as output
    • LeCun, Bottou, Bengio & Haffner (1998), Gradient-Based Learning Applied to Document Recognition
  • 21. Check Reader Performance
    • 82% of machine-printed checks correctly recognized
    • 1% of checks incorrectly recognized
    • 17% “rejected” – check is presented to a person for manual reading
    • Fielded by NCR in June 1996; reads millions of checks per month
  • 22. Supervised Learning Summary
    • Desired classifier is a function y = f(x)
    • Training examples are desired input-output pairs (x_i, y_i)
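    The setup summarized above can be sketched with a toy stand-in for the check reader: a classifier y = f(x) induced from labeled pairs (x_i, y_i). A 1-nearest-neighbor rule replaces the neural network, and the 2-D feature vectors and digit labels below are invented for illustration.

    ```python
    # Minimal sketch of supervised learning: induce y = f(x) from (x_i, y_i).
    # 1-nearest-neighbor stands in for the real neural-network recognizer.

    def train_1nn(examples):
        """'Training' for 1-NN is just storing the labeled pairs."""
        return list(examples)

    def classify(model, x):
        """Predict the label of the stored example closest to x."""
        def sq_dist(a, b):
            return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
        return min(model, key=lambda pair: sq_dist(pair[0], x))[1]

    # Hypothetical training pairs (x_i, y_i): made-up features, digit labels.
    training = [((0.1, 0.2), 3), ((0.9, 0.8), 8), ((0.8, 0.1), 6)]
    model = train_1nn(training)
    print(classify(model, (0.85, 0.75)))  # nearest stored example is labeled 8
    ```

    The "training" step here is trivial by design; the point is the shape of the interface, not the algorithm.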
  • 23. Density Estimation
    [Diagram: training examples feed a learning algorithm, which outputs a density estimator; applied to a partially-tested wafer, it yields predictions such as P(chip i is bad) = 0.42]
  • 24. On-Wafer Testing System
    • Trained density estimator on 600 wafers from mature product (HP; Corvallis, OR)
      • Probability model is “naïve Bayes” mixture model with four components (trained with EM)
    [Diagram: mixture model with wafer component node W and chip nodes C_1, C_2, C_3, …, C_209]
  • 25. One-Step Value of Information
    • Choose the larger of
      • Expected profit if we predict remaining chips, package, and re-test
      • Expected profit if we test chip Ci, then predict remaining chips, package, and re-test [for all Ci not yet tested]
  • 26. On-Wafer Chip Test Results
    • 3.8% increase in profit
  • 27. Density Estimation Summary
    • Desired output is a joint probability distribution P(C_1, C_2, …, C_203)
    • Training examples are points X = (C_1, C_2, …, C_203) sampled from this distribution
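    The wafer model's use can be sketched with a tiny naive-Bayes mixture. The two components ("mostly-good wafer" / "mostly-bad wafer"), the three chips, and all the probabilities below are invented; the real model had four EM-trained components over roughly two hundred chips.

    ```python
    # Hedged sketch of a naive-Bayes mixture used as a density estimator:
    # observing some chips' test results shifts the posterior over wafer
    # "types", which in turn updates P(bad) for the untested chips.

    WEIGHTS = [0.7, 0.3]        # mixture weights (illustrative)
    P_BAD = [
        [0.05, 0.05, 0.05],     # component 0: mostly-good wafers
        [0.60, 0.60, 0.60],     # component 1: mostly-bad wafers
    ]

    def posterior_components(observed):
        """P(component | observed); observed maps chip index -> True if bad."""
        scores = []
        for k, w in enumerate(WEIGHTS):
            lik = w
            for i, bad in observed.items():
                lik *= P_BAD[k][i] if bad else (1.0 - P_BAD[k][i])
            scores.append(lik)
        z = sum(scores)
        return [s / z for s in scores]

    def prob_bad(chip, observed):
        """Predictive P(chip is bad | test results on other chips)."""
        post = posterior_components(observed)
        return sum(p * P_BAD[k][chip] for k, p in enumerate(post))

    print(round(prob_bad(2, {0: True, 1: True}), 3))    # high: failures seen
    print(round(prob_bad(2, {0: False, 1: False}), 3))  # low: passes seen
    ```

    Note how the untested chip's failure probability moves with evidence from its neighbors; this is the quantity the decision procedure on the previous slides consumes.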
  • 28. Reinforcement Learning
    [Diagram: the agent receives state s and reward r from the environment and sends back action a]
    • Agent’s goal: choose actions to maximize total reward
    • The action-selection rule is called a “policy”: a = π(s)
  • 29. Reinforcement Learning Methods
    • Direct
      • Start with an initial policy π
      • Experiment with the environment to decide how to improve π
      • Repeat
    • Model-Based
      • Experiment with the environment to learn how it behaves (dynamics + rewards)
      • Compute the optimal policy π
  • 30. Reinforcement Learning for Robot Navigation
    • Learning from rewards and punishments in the environment
      • Give reward for reaching goal
      • Give punishment for getting lost
      • Give punishment for collisions
  • 31. Experimental Results: % of trials in which the robot reaches the goal
    [Chart; source: Busquets, Lopez de Mantaras, Sierra, Dietterich (2002)]
  • 32. Reinforcement Learning Summary
    • Desired output is an action-selection policy π
    • Training examples are <s,a,r,s’> tuples collected by the agent interacting with the environment
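    Learning a policy from <s, a, r, s'> tuples can be sketched with tabular Q-learning, one standard "direct" method (the robot work used a more elaborate learner). The states, actions, and rewards below are invented to echo the navigation scenario.

    ```python
    # Hedged sketch: tabular Q-learning over experience tuples, followed by
    # a greedy policy pi(s) that picks the highest-valued action.
    from collections import defaultdict

    ALPHA, GAMMA = 0.5, 0.9   # learning rate and discount (assumed)

    def q_learn(tuples, episodes=200):
        """Repeatedly apply the Q-learning update to each <s,a,r,s'> tuple.
        Each tuple also carries the actions available in s'."""
        Q = defaultdict(float)
        for _ in range(episodes):
            for s, a, r, s2, actions2 in tuples:
                best_next = max((Q[(s2, a2)] for a2 in actions2), default=0.0)
                Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        return Q

    def policy(Q, s, actions):
        """Greedy policy pi(s): the action with the highest learned value."""
        return max(actions, key=lambda a: Q[(s, a)])

    # Toy experience: tracking a landmark from 'near_goal' reaches the goal
    # (+10); only avoiding obstacles wanders (0 reward, stays put).
    experience = [
        ('near_goal', 'track_landmark', 10.0, 'goal', []),
        ('near_goal', 'avoid_only', 0.0, 'near_goal',
         ['track_landmark', 'avoid_only']),
    ]
    Q = q_learn(experience)
    print(policy(Q, 'near_goal', ['track_landmark', 'avoid_only']))
    ```

    After enough sweeps the landmark-tracking action earns the higher value, so the greedy policy selects it, which mirrors the reward shaping described on slide 30.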
  • 33. Outline
    • Three scenarios where software engineering methods fail
    • Machine learning methods applied to these scenarios
    • Fundamental questions in machine learning
    • Statistical thinking in computer science
  • 34. Fundamental Issues in Machine Learning
    • Incorporating Prior Knowledge
    • Incorporating Learned Structures into Larger Systems
    • Making Reinforcement Learning Practical
    • Triple Tradeoff: accuracy, sample size, hypothesis complexity
  • 35. Incorporating Prior Knowledge
    • How can we incorporate our prior knowledge into the learning algorithm?
      • Difficult for decision trees, neural networks, support-vector machines, etc.
        • Mismatch between form of our knowledge and the way the algorithms work
      • Easier for Bayesian networks
        • Express knowledge as constraints on the network
  • 36. Incorporating Learned Structures into Larger Systems
    • Success story: Digit recognizer incorporated into check reader
    • Challenges:
      • Larger system may make several coordinated decisions, but learning system treated each decision as independent
      • Larger system may have complex cost function: Errors in thousands place versus the cents place: $7,236.07
  • 37. Making Reinforcement Learning Practical
    • Current reinforcement learning methods do not scale well to large problems
    • Need robust reinforcement learning methodologies
  • 38. The Triple Tradeoff
    • Fundamental relationship between
      • amount of training data
      • size and complexity of hypothesis space
      • accuracy of the learned hypothesis
    • Explains many phenomena observed in machine learning systems
  • 39. Learning Algorithms
    • Set of data points
    • Class H of hypotheses
    • Optimization problem: Find the hypothesis h in H that best fits the data
    [Diagram: training data plus a hypothesis space H yield, via optimization, a hypothesis h]
  • 40. Triple Tradeoff
    • Amount of Data – Hypothesis Complexity – Accuracy
    [Chart: accuracy vs. hypothesis space complexity, with curves for N = 10, N = 100, and N = 1000]
  • 41. Triple Tradeoff (2)
    [Chart: accuracy vs. number of training examples N, with curves for hypothesis spaces H_1, H_2, H_3]
  • 42. Intuition
    • With only a small amount of data, we can only discriminate between a small number of different hypotheses
    • As we get more data, we have more evidence, so we can consider more alternative hypotheses
    • Complex hypotheses give better fit to the data
  • 43. Fixed versus Variable-Sized Hypothesis Spaces
    • Fixed size
      • Ordinary linear regression
      • Bayes net with fixed structure
      • Neural networks
    • Variable size
      • Decision trees
      • Bayes nets with variable structure
      • Support vector machines
  • 44. Corollary 1: Fixed H will underfit
    [Chart: accuracy vs. number of training examples N for fixed hypothesis spaces H_1 and H_2, marking the underfit region]
  • 45. Corollary 2: Variable-sized H will overfit
    [Chart: accuracy vs. hypothesis space complexity at N = 100, marking the overfit region]
  • 46. Ideal Learning Algorithm: Adapt complexity to data
    [Chart: accuracy vs. hypothesis space complexity, with curves for N = 10, N = 100, and N = 1000]
  • 47. Adapting Hypothesis Complexity to Data Complexity
    • Find hypothesis h to minimize
      • error(h) + λ · complexity(h)
    • Many methods for adjusting λ
      • Cross-validation
      • MDL
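    The "error(h) + λ · complexity(h)" recipe can be sketched with the simplest penalized model there is: 1-D ridge regression, where complexity(h) = w² and the minimizer has the closed form w = Σxy / (Σx² + λ). The data, the held-out split, and the λ grid below are all invented for illustration.

    ```python
    # Hedged sketch: tune lambda in "squared error + lambda * w^2" by
    # hold-out validation, one of the adjustment methods named above.

    def fit(train, lam):
        """Closed-form minimizer of sum((y - w*x)^2) + lam * w^2."""
        sxy = sum(x * y for x, y in train)
        sxx = sum(x * x for x, _ in train)
        return sxy / (sxx + lam)

    def val_error(w, data):
        """Mean squared error of the line y = w*x on held-out data."""
        return sum((y - w * x) ** 2 for x, y in data) / len(data)

    # Training points sit a bit above y = 2x (noisy); the held-out
    # points sit on it, so some shrinkage toward zero should help.
    train = [(1.0, 2.5), (2.0, 4.6), (3.0, 7.5)]
    val = [(1.5, 3.0), (2.5, 5.0)]

    best_lam = min([0.0, 0.3, 3.0, 30.0],
                   key=lambda lam: val_error(fit(train, lam), val))
    print(best_lam, round(fit(train, best_lam), 3))  # a middling lambda wins
    ```

    Too little penalty overfits the noisy training slope; too much underfits it; the validation set picks the middle ground, which is the triple tradeoff in miniature.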
  • 48. Corollary 3: It is optimal to be suboptimal
    • Finding the smallest decision tree (or the smallest neural network) that fits N data points is NP-Hard
    • Heuristic greedy algorithms work well
    • Smarter algorithms do NOT work as well!
  • 49. What’s going on?
    • Heuristic algorithms do not consider all possible trees or neural networks
      • They effectively consider a smaller H
      • They are less likely to overfit the data
    • Conclusion: It is optimal (for accuracy) to be suboptimal (for fitting the data)
  • 50. Outline
    • Three scenarios where software engineering methods fail
    • Machine learning methods applied to these scenarios
    • Fundamental questions in machine learning
    • Statistical thinking in computer science
  • 51. The Data Explosion
    • NASA Data
      • 284 Terabytes (as of August, 1999)
      • Earth Observing System: 194 GB/day
      • Landsat 7: 150 GB/day
      • Hubble Space Telescope: 0.6 GB/day
    http://spsosun.gsfc.nasa.gov/eosinfo/EOSDIS_Site/index.html
  • 52. The Data Explosion (2)
    • Google indexes 2,073,418,204 web pages
    • US Year 2000 Census: 62 Terabytes of scanned images
    • Walmart Data Warehouse: 7 (500?) Terabytes
    • Missouri Botanical Garden TROPICOS plant image database: 700 Gbytes
  • 53. The Data Explosion (3)
    [Chart; source: http://www.cs.columbia.edu/~hgs/internet/traffic.html]
  • 54. Old Computer Science Conception of Data
    [Diagram: data is simply stored and retrieved]
  • 55. New Computer Science Conception of Data
    [Diagram: stored data is used to build models, which are applied to solve problems, turning problems into solutions]
  • 56. Machine Learning: Making Data Active
    • Methods for building models from data
    • Methods for collecting and/or sampling data
    • Methods for evaluating and validating learned models
    • Methods for reasoning and decision-making with learned models
    • Theoretical analyses
  • 57. Machine Learning and Computer Science
    • Natural language processing
    • Databases and data mining
    • Computer architecture
    • Compilers
    • Computer graphics
  • 58. Hardware Branch Prediction
    • Source: Jiménez & Lin (2000), Perceptron Learning for Predicting the Behavior of Conditional Branches
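    The cited predictor's core idea can be sketched as follows: predict taken/not-taken as the sign of a weighted sum of recent branch outcomes (+1 taken, -1 not taken). The per-branch weight table and the training threshold from the paper are simplified away here, and the history length is an arbitrary choice.

    ```python
    # Hedged sketch of a perceptron branch predictor: a single perceptron
    # over the last HISTORY_LEN outcomes, trained only on mispredictions.

    HISTORY_LEN = 8

    class PerceptronPredictor:
        def __init__(self):
            self.w = [0] * (HISTORY_LEN + 1)   # w[0] is the bias weight
            self.history = [1] * HISTORY_LEN   # +1 taken / -1 not taken

        def predict(self):
            s = self.w[0] + sum(wi * hi
                                for wi, hi in zip(self.w[1:], self.history))
            return s >= 0                      # True = predict "taken"

        def update(self, taken):
            t = 1 if taken else -1
            if self.predict() != taken:        # mistake-driven training
                self.w[0] += t
                for i, hi in enumerate(self.history):
                    self.w[i + 1] += t * hi
            self.history = [t] + self.history[:-1]

    p = PerceptronPredictor()
    # An alternating branch (T, N, T, N, ...) is linearly learnable from
    # the history bits, so accuracy climbs after a couple of mistakes.
    hits = 0
    for step in range(200):
        taken = (step % 2 == 0)
        hits += (p.predict() == taken)
        p.update(taken)
    print(hits)  # 198 of 200 on this trace: only the first two flips miss
    ```

    Unlike two-bit counters, the perceptron's weights expose *which* history bits matter, which is part of why this line of work was attractive for hardware.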
  • 59. Instruction Scheduler for New CPU
    • The performance of modern microprocessors depends on the order in which instructions are executed
    • Modern compilers rearrange instruction order to optimize performance (“instruction scheduling”)
    • Each new CPU design requires modifying the instruction scheduler
  • 60. Instruction Scheduling
    • Moss et al. (1997): a machine learning scheduler can beat the performance of commercial compilers and match the performance of a research compiler
    • Training examples: small basic blocks
      • Experimentally determine optimal instruction order
      • Learn preference function
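    The "learn a preference function" step can be sketched with a perceptron over instruction features, in the spirit of Moss et al.: learn weights w so that w · (features(a) − features(b)) > 0 means "schedule a before b". The two features (critical-path length, number of dependents) and the training pairs below are invented for illustration.

    ```python
    # Hedged sketch of learning a pairwise scheduling preference function
    # from blocks whose experimentally optimal order is known.

    def prefer(w, fa, fb):
        """True if the learned function prefers scheduling a before b."""
        return sum(wi * (x - y) for wi, x, y in zip(w, fa, fb)) > 0

    def train(pairs, epochs=20):
        """Perceptron updates on (features_first, features_second) pairs,
        where the first instruction came earlier in the optimal order."""
        w = [0.0, 0.0]
        for _ in range(epochs):
            for fa, fb in pairs:
                if not prefer(w, fa, fb):      # mistake-driven update
                    w = [wi + (x - y) for wi, x, y in zip(w, fa, fb)]
        return w

    # Invented (critical_path, num_dependents) pairs: in each pair the
    # first instruction was scheduled first in the optimal order.
    optimal_pairs = [((5, 2), (1, 1)), ((4, 3), (2, 0)), ((3, 1), (1, 2))]
    w = train(optimal_pairs)
    print(prefer(w, (6, 1), (2, 2)))  # longer critical path is preferred
    ```

    A greedy scheduler would then repeatedly emit the instruction preferred over all other ready instructions, which is how a pairwise function induces a full ordering.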
  • 61. Computer Graphics: Video Textures
    • Generate new video by splicing together short stretches of old video
    [Diagram: source frames A B C D E F re-sequenced as B D E D E F A]
    • Apply reinforcement learning to identify good transition points
    • Arno Schödl, Richard Szeliski, David H. Salesin, Irfan Essa (SIGGRAPH 2000)
  • 62. Video Textures
    • Arno Schödl, Richard Szeliski, David H. Salesin, Irfan Essa (SIGGRAPH 2000)
    • [Video: Virtual Fish Tank Movie]
  • 63. Graphics: Image Analogies
    [Diagram: A : A' :: B : ?]
    • Hertzmann, Jacobs, Oliver, Curless, Salesin (2000), SIGGRAPH
  • 64. Learning to Predict Textures
    • Find p to minimize the Euclidean distance between the neighborhoods A(p), A'(p) and B(q), B'(q); then set B'(q) := A'(p)
  • 65. Image Analogies
    [Example images: A : A' :: B : B']
  • 66. [Video: Image Analogies Movie]
  • 67. Summary
    • Standard Software Engineering methods fail in many application problems
    • Machine Learning methods can replace guesswork with data to make good design decisions
  • 68. Machine Learning and Computer Science
    • Machine Learning is already at the heart of speech recognition and handwriting recognition
    • Statistical methods are transforming natural language processing (understanding, translation, retrieval)
    • Statistical methods are creating opportunities in databases, computer graphics, robotics, computer vision, networking, and computer security
  • 69. Computer Power and Data Power
    • Data is a new source of power for computer science
    • Every computer science student should learn the fundamentals of machine learning and statistical thinking
    • By combining engineered frameworks with models learned from data, we can develop the high-performance systems of the future
