Presentation on Machine Learning and Data Mining
  • 1. Machine Learning and Data Mining
    Yves Kodratoff, CNRS, LRI, Bât. 490, Université Paris-Sud, 91405 Orsay
    "Automatic Learning" (AL) stems from 4 communities developing 4 approaches: AI, Statistics (and Data Analysis, DA), Bayesian Statistics, and Pattern Recognition.
    DM: the "daughter" of databases (DB) and AL.
    Outline:
    1. A good many definitions
       A few definitions 1, 2, 3: Supervised and Unsupervised Learning; What is automated induction?; The components of DM
    2. Differences between AL and DM
       Differences in the scientific approach
       Differences from the point of view of industry 1, 2: Twelve tips for successful Data Mining
  • 2. What Data Mining techniques do you use regularly?
  • 3. A few definitions 1: Supervised and Unsupervised Learning
    Supervised Learning ("with a teacher")
    Input: a description in extension of the problem, most often a table:

    |          | Field 1  | Field 2  | … | Field k  | Class       |
    |----------|----------|----------|---|----------|-------------|
    | Record 1 | Value 11 | Value 12 | … | Value 1k | Class value |
    | …        |          |          |   |          |             |
    | Record p | Value p1 | Value p2 | … | Value pk | Class value |

    Output: extract the "properties" of this description (also called a description in intension), e.g.
    IF (Field m = Value ml) & (Field n ∈ [Value ij, Value mn]) & … THEN Class value = a
    Unsupervised Learning ("without a teacher")
    Discover patterns in the data.
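To make the extension/intension distinction concrete, here is a minimal Python sketch. It assumes records are stored as dictionaries with a `Class` field; the field names (`colour`, `size`), the data, and the helper names `Rule` and `rule_quality` are hypothetical. A rule in intension is a predicate plus a class value, and it can be checked against the table in extension by counting the records it covers and how many of those carry the predicted class.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]

@dataclass
class Rule:
    """IF <condition> THEN Class = <label>: a description in intension."""
    condition: Callable[[Record], bool]
    label: str

    def covers(self, record: Record) -> bool:
        return self.condition(record)

def rule_quality(rule: Rule, records: List[Record], class_field: str = "Class") -> Tuple[int, float]:
    """Return (number of covered records, precision on them) over a table given in extension."""
    covered = [r for r in records if rule.covers(r)]
    if not covered:
        return 0, 0.0
    correct = sum(1 for r in covered if r[class_field] == rule.label)
    return len(covered), correct / len(covered)

# Hypothetical table: each dict is one record (field values + class value).
records = [
    {"colour": "red", "size": 3, "Class": "a"},
    {"colour": "red", "size": 4, "Class": "a"},
    {"colour": "red", "size": 9, "Class": "b"},
    {"colour": "blue", "size": 3, "Class": "b"},
    {"colour": "red", "size": 2, "Class": "b"},
]

# IF (colour = red) & (size ∈ [2, 5]) THEN Class = a
rule = Rule(lambda r: r["colour"] == "red" and 2 <= r["size"] <= 5, "a")
print(rule_quality(rule, records))  # (3, 0.6666666666666666)
```

The rule covers three records, two of which are of class `a`, so its precision on this table is 2/3.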
  • 4. Unsupervised learning covers, for example:
    - Clustering (= classification, categorization, segmentation)
    - Data Analysis, e.g. finding the main axis of the ellipsoid containing the data
    - Search for logical structures: probabilistic theorems (associations), functional relations among variables (such as PV = nRT)
    - Spatial or temporal sequences
    - Discovering terms in texts
    A few definitions 2: What is automated induction?
    Techniques for inventing a new model that better fits the data. Induction essentially consists of 4 steps:
    1. Definition of the hypothesis space
    2. Choice of a search strategy within the hypothesis space
    3. Choice of an optimization criterion
    4. Validation
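The four steps of automated induction can be illustrated on a deliberately tiny problem. In this sketch (all names and data are hypothetical), the hypothesis space is the set of one-field threshold rules "IF x > t THEN positive", the search strategy is exhaustive, the optimization criterion is accuracy on the training records, and validation uses records held out from the search.

```python
def hypothesis_space(xs):
    """Step 1: rules 'IF x > t THEN positive', one per observed value t."""
    return sorted(set(xs))

def accuracy(t, data):
    """Step 3: optimization criterion = fraction of (x, label) pairs the rule gets right."""
    return sum((x > t) == label for x, label in data) / len(data)

def induce(train):
    """Step 2: exhaustive search over the hypothesis space for the best threshold."""
    return max(hypothesis_space(x for x, _ in train), key=lambda t: accuracy(t, train))

# Hypothetical one-field data set: (value, is_positive)
train = [(1.0, False), (2.0, False), (3.0, False), (4.0, True), (5.0, True), (6.0, True)]
held_out = [(2.5, False), (4.5, True)]

t = induce(train)
# Step 4: validation on data not used during the search
print(t, accuracy(t, held_out))  # 3.0 1.0
```

Swapping the exhaustive `max` for a greedy or random search, or accuracy for another criterion, changes only one function, which mirrors how the four steps are independent design choices.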
  • 5. Definition of the hypothesis space
    Defines the task and the space of possible solutions.
    Example task: tagging, e.g. 'special purposes' → 'special-adj purposes-n-plur'. Learn the tags of new words from a set of tagged texts.
    Hypothesis space: let W1 be the new word to tag. The hypothesis space is the 'context': all words and tags within 3 words before or after W1. Rules will be of the form:
    IF context(W1) = … THEN tag W1 as …
    Choice of a search strategy within the hypothesis space
    - Exhaustive
    - Exhaustive + random choice
    - Greedy (choose the first step that leads to the best value of the optimization criterion)
    - Steepest descent (e.g. neural networks)
    - Genetic algorithms
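A minimal sketch of the tagging example, assuming a corpus given as lists of (word, tag) pairs. The function `context` builds the window of 3 words before and after W1 described above; to keep the search trivial, the hypothetical helper `learn_rules` restricts the hypothesis space to a single feature of that context, the tag immediately before W1, and keeps the most frequent tag observed after it. This is a simplification for illustration, not a full tagger.

```python
from collections import Counter, defaultdict

def context(tagged, i, width=3):
    """(offset, word, tag) for every word within `width` positions of position i, W1 excluded."""
    lo, hi = max(0, i - width), min(len(tagged), i + width + 1)
    return [(j - i, w, t) for j, (w, t) in enumerate(tagged[lo:hi], start=lo) if j != i]

def prev_tag(tagged, i):
    """Tag at offset -1 in the context, or a sentence-start marker."""
    for offset, _, tag in context(tagged, i):
        if offset == -1:
            return tag
    return "<s>"

def learn_rules(corpus):
    """Learn 'IF tag at offset -1 = T THEN tag W1 as T2', keeping the most frequent T2 for each T."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for i, (_, tag) in enumerate(sentence):
            counts[prev_tag(sentence, i)][tag] += 1
    return {t: c.most_common(1)[0][0] for t, c in counts.items()}

# Hypothetical tagged corpus, using the tag style of the slide
corpus = [
    [("special", "adj"), ("purposes", "n-plur")],
    [("the", "det"), ("special", "adj"), ("offer", "n")],
    [("new", "adj"), ("rules", "n-plur")],
]
rules = learn_rules(corpus)
print(rules["adj"])  # n-plur: a new word following an adjective is tagged n-plur
```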
  • 6. Choice of an optimization criterion
    Apply the current hypothesis to the data, then use one of the following:
    - Adjust numerical distances (DA), e.g. hypothesize a cluster, compute its center of gravity, compute the sum of the distances from the points in the cluster to the center of gravity; the optimum is reached when this sum is minimal
    - Decrease variance (Stats)
    - Increase precision or similar measures (ML)
    - Adjust discrete (or Boolean) distances (ML & DA)
    - Decrease entropy (decision trees)
    - Increase utility (utility must first be defined) (DM)
    - Increase the posterior probability of the phenomenon given the data, P(Ph | D) (Bayesian learning)
    - Minimum description length (learning & Bayesian)
    - When everything else fails: Occam's razor ('everyone')
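Two of these criteria are easy to compute directly. The sketch below (function names are hypothetical) shows the Data Analysis criterion, the sum of distances from the points of a cluster to its center of gravity, and the decision-tree criterion, measured as the decrease in Shannon entropy produced by a split (information gain). It assumes Python 3.8+ for `math.dist`.

```python
import math
from collections import Counter

def center_of_gravity(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def sum_of_distances(points):
    """DA criterion: total Euclidean distance from each point to the cluster's center of gravity."""
    g = center_of_gravity(points)
    return sum(math.dist(p, g) for p in points)

def entropy(labels):
    """Shannon entropy (in bits) of the class distribution in `labels`."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())

def information_gain(parent, children):
    """Decrease in entropy obtained by splitting `parent` into `children` (decision-tree criterion)."""
    n = len(parent)
    return entropy(parent) - sum(len(c) / n * entropy(c) for c in children)

print(sum_of_distances([(0, 0), (2, 0)]))                                # 2.0
print(information_gain(["a", "a", "b", "b"], [["a", "a"], ["b", "b"]]))  # 1.0
```

A clustering method would try to minimize `sum_of_distances` over candidate clusters, while a decision-tree learner picks the split with the largest `information_gain`.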
  • 7. Validation
    - Expert
    - Use of the results
    A few definitions 3: The base components of DM
    Data Mining, Machine Learning, Pattern Recognition, Exploratory Statistics, Data Analysis, Bayesian statistics
    Data Mining (DM) (1989)
    - Unsupervised: association detection, temporal series, segmentation techniques
    - Supervised: data with many fields and few records, e.g. DNA chips
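Association detection, listed under unsupervised DM, is usually evaluated with support and confidence. The following brute-force sketch only considers rules between single items, with hypothetical thresholds `min_support=0.5` and `min_confidence=0.7` and made-up shopping-basket transactions; practical association miners such as Apriori prune the search instead of enumerating every pair.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions containing every item of `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def association_rules(transactions, min_support=0.5, min_confidence=0.7):
    """Brute-force search for rules lhs -> rhs between single items."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        for lhs, rhs in ((a, b), (b, a)):
            s = support({lhs, rhs}, transactions)
            if s >= min_support:
                conf = s / support({lhs}, transactions)  # P(rhs | lhs)
                if conf >= min_confidence:
                    rules.append((lhs, rhs, s, conf))
    return rules

# Hypothetical transactions
transactions = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
print(association_rules(transactions))  # [('butter', 'bread', 0.5, 1.0)]
```

Here "bread -> butter" has the same support but only 2/3 confidence, so only "butter -> bread" passes both thresholds.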
  • 8. Machine Learning (ML) (1980)
    - Supervised: decision trees, decision rules, generalization techniques, Inductive Logic Programming, model combinations
    - Unsupervised: COBWEB (clustering)
    Pattern Recognition (1958 - ~1985)
    - Supervised: Perceptron, neural networks
    - Unsupervised: self-organizing maps
    Exploratory Statistics (~1965 - 1995)
    - Supervised: regression trees (1983), Support Vector Machines (1995), logistic regression
    - Unsupervised: k-means
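The Perceptron listed under Pattern Recognition can be sketched in a few lines. This is the classic perceptron learning rule (update the weights whenever an example is misclassified), run on hypothetical data: the logical AND function with labels -1/+1. Since AND is linearly separable, the loop stops once an epoch makes no errors.

```python
def train_perceptron(samples, epochs=100, lr=1.0):
    """samples: list of (features, label) with label in {-1, +1}. Returns (weights, bias)."""
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in samples:
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # misclassified or on the boundary: move toward y
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                errors += 1
        if errors == 0:  # converged: every sample correctly classified
            break
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Hypothetical training set: logical AND
AND = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
w, b = train_perceptron(AND)
print([predict(w, b, x) for x, _ in AND])  # [-1, -1, -1, 1]
```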
  • 9. Data Analysis (1960s)
    - Supervised: principal component analysis
    - Unsupervised: numerical clustering
    Bayesian statistics
    - Supervised (1961): Naive Bayes
    - Unsupervised (1995): learning the structure of large Bayesian networks
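Naive Bayes, listed under supervised Bayesian statistics, assumes the fields are independent given the class, so P(class | record) is proportional to P(class) multiplied by each P(field value | class). The sketch below uses categorical fields with add-one (Laplace) smoothing, which is a choice made here rather than something the slide specifies; the weather-style data and the function names are hypothetical.

```python
from collections import Counter, defaultdict

def train_naive_bayes(records, labels):
    """Count class frequencies and, per (field index, class), the frequencies of each value."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (field index, class) -> Counter of values
    values = defaultdict(set)            # field index -> set of observed values
    for rec, y in zip(records, labels):
        for i, v in enumerate(rec):
            value_counts[(i, y)][v] += 1
            values[i].add(v)
    return class_counts, value_counts, values

def predict_naive_bayes(model, rec):
    """Return the class maximizing P(y) * prod_i P(x_i | y)."""
    class_counts, value_counts, values = model
    n = sum(class_counts.values())
    best, best_score = None, -1.0
    for y, cy in class_counts.items():
        score = cy / n  # prior P(y)
        for i, v in enumerate(rec):
            # P(x_i = v | y) with add-one (Laplace) smoothing
            score *= (value_counts[(i, y)][v] + 1) / (cy + len(values[i]))
        if score > best_score:
            best, best_score = y, score
    return best

# Hypothetical records: (outlook, windy) -> decision
records = [("sunny", "no"), ("sunny", "yes"), ("rainy", "yes"), ("rainy", "yes"), ("sunny", "no")]
labels = ["play", "play", "stay", "stay", "play"]
model = train_naive_bayes(records, labels)
print(predict_naive_bayes(model, ("sunny", "no")))   # play
print(predict_naive_bayes(model, ("rainy", "yes")))  # stay
```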
  • 10. Differences between AL and DM
    Differences in the scientific approach

    |                                              | Classic data processing                                      | Automatic Learning (ML and Statistics)                               | DM                                                    |
    |----------------------------------------------|--------------------------------------------------------------|----------------------------------------------------------------------|-------------------------------------------------------|
    | Reasoning                                    | Simulates deductive reasoning (= applies an existing model)  | Simulates inductive reasoning (= invents a model)                    | Simulates inductive reasoning ("even more inductive") |
    | Validation                                   | According to precision                                       | According to precision                                               | According to utility and comprehensibility            |
    | Results                                      | As universal as possible                                     | As universal as possible                                             | Relative to particular cases                          |
    | Elegance                                     | = conciseness                                                | = conciseness                                                        | = adequacy to the user's model                        |
    | Position relative to Artificial Intelligence | Tends to reject AI                                           | Either tends to reject AI (Statistics) or claims to belong to AI (ML) | Naturally integrates AI, DB, Stat., and MMI           |
  • 11. Differences from the point of view of industry 1
    Twelve tips for successful Data Mining (Oracle Data Mining Suite)
    a - Mine significantly more data
    b - Create new variables to tease more information out of your data
    c - Take a shallow dive into the data first
    d - Rapidly build many exploratory predictive models
    e - Cluster your customers first, and then build multiple targeted predictive models
        Applying pattern detection methods to the entire database → laws valid for all individuals (usually trivial)
        Applying pattern detection methods to the segmented database → laws valid for each segment (usually as interesting as the segmentation is)
    f - Automated model building
    g - Demystify neural networks and clusters by reverse engineering them using C&RT models
    h - Use predictive modeling to impute missing values
    i - Build multiple predictive models and form a 'panel of experts'
    j - Forget about traditional data hygiene practices
    k - Enrich your data with external data
  • 12. l - Feed the models a better 'balanced fuel mixture' of data
    Differences from the point of view of industry 2
    What Data Mining techniques do you use regularly?

    | Technique            | Aug. 2001 | Oct. 2002                              |
    |----------------------|-----------|----------------------------------------|
    | Clustering           | na        | 12% (if 'type of analysis', then 22%)  |
    | Neural Networks      | 13%       | 9%                                     |
    | Decision Trees/Rules | 19%       | 16%                                    |
    | Logistic Regression  | 14%       | 9%                                     |
    | Statistics           | 17%       | 12%                                    |
    | Bayesian nets        | 6%        | 3%                                     |
    | Visualization        | 8%        | 6%                                     |
    | Nearest Neighbor     | na        | 5%                                     |
    | Association Rules    | 7%        | 8%                                     |
    | Hybrid methods       | 4%        | 3%                                     |
    | Text Mining          | 2%        | 4%                                     |
    | Sequence Analysis    | na        | 3%                                     |
    | Genetic Algorithms   | na        | 3%                                     |
    | Naive Bayes          | na        | 2%                                     |
    | Web mining           | 5%        | 2%                                     |
    | Agents               | 1%        | na                                     |
    | Other                | 2%        | 2%                                     |

    Conclusion: it is obvious that DM takes care of industrial problems, BUT it is ALSO
  • 13. Scientifically more audacious
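Tip i of the twelve tips, forming a 'panel of experts' from several predictive models, can be implemented in several ways; a simple one is majority voting. In this sketch the three "experts" are hypothetical threshold rules standing in for trained models, and ties go to the label proposed first, which is how `Counter.most_common` orders equal counts.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence

def panel_of_experts(models: Sequence[Callable], x) -> Hashable:
    """Majority vote over the models' predictions; ties go to the first label encountered."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

# Hypothetical "experts": three threshold rules on a single numeric field
models = [
    lambda x: "high" if x > 10 else "low",
    lambda x: "high" if x > 20 else "low",
    lambda x: "high" if x > 5 else "low",
]
print(panel_of_experts(models, 15))  # high (2 votes to 1)
print(panel_of_experts(models, 7))   # low (2 votes to 1)
```

Majority voting only helps when the experts make different errors, which is one reason to build the multiple models from different techniques or data samples.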