Machine Learning: Learning with data



  1. 10/11/2011 - ONE Talks. Machine Learning: Learning with Data. André Lourenço, Instituto Superior de Engenharia de Lisboa, Instituto de Telecomunicações, Instituto Superior Técnico, Lisbon, Portugal. © 2005, IT - Instituto de Telecomunicações. All rights reserved.
  2. Outline • Introduction • Examples • What does it mean to learn? • Supervised and Unsupervised Learning • Types of Learning • Classification Problem • Text Mining Example • Conclusions (and further reading)
  3. Introduction
  4. What is Machine Learning? • A branch of artificial intelligence (AI) • Arthur Samuel (1959): "Field of study that gives computers the ability to learn without being explicitly programmed." From: Andrew Ng - Stanford Machine Learning Classes
  5. What is Machine Learning? • Tom Mitchell (1998), well-posed learning problem: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." • Mark Dredze: Teaching a computer about the world
  6. What is Machine Learning? • Goal: design and development of algorithms that allow computers to evolve behaviors based on empirical data, such as sensor data or databases • How to apply Machine Learning? • Observe the world • Develop models that match the observations • Teach the computer to learn these models • The computer applies the learned model to the world
  7. Example 1: Prediction of House Price. From: Andrew Ng - Stanford Machine Learning Classes
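The slides show only the idea, not the code; as a minimal sketch, the house-price example can be cast as linear regression with scikit-learn. The sizes and prices below are invented for illustration, and LinearRegression stands in for whatever model the class used:

```python
# Minimal sketch of house-price prediction as regression (illustrative data).
import numpy as np
from sklearn.linear_model import LinearRegression

sizes = np.array([[50], [80], [100], [120], [160]])   # house size in m^2 (made up)
prices = np.array([110, 160, 205, 245, 330])          # price in arbitrary units (made up)

model = LinearRegression().fit(sizes, prices)         # learn price as a function of size
print(model.predict(np.array([[90]])))                # estimate the price of a 90 m^2 house
```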
  8. Example 2: Learning to automatically classify text documents
  9. Example 3: Face Detection and Tracking. From: face-detection-and-tracking/
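The talk does not say which detector its demo used; one classical way to sketch face detection is OpenCV's pretrained Haar cascade (the input file name "photo.jpg" is a placeholder):

```python
# Sketch of classical face detection with OpenCV's bundled Haar cascade.
import cv2

img = cv2.imread("photo.jpg")                 # hypothetical input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # cascades operate on grayscale

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
for (x, y, w, h) in cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5):
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)  # box each detected face
cv2.imwrite("faces.jpg", img)
```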
  10. Example 4: Social Network Mining. [Figure: users' profiles, friendship links, groups, and networks among users U1-U5, with hidden information to be inferred.] From: Exploit of Online Social Networks with Community-Based Graph Semi-Supervised Learning, Mingzhen Mo and Irwin King, ICONIP 2010, Sydney, Australia
  11. Example 5: Biometric Systems: 1. Physical; 2. Behavioral
  12. WHAT DOES IT MEAN TO LEARN?
  13. What does it mean to learn? • Learn patterns in data • Pipeline: z → Decision System → ẋ, where z is the observed signal and ẋ the estimated output
  14. Unsupervised Learning • Look for patterns in data • No training data (no examples of the output) • Pro: no labeling of examples is needed • Con: cannot demonstrate specific types of output • Applications: data mining - finds interesting patterns in data. From: Mark Dredze, Machine Learning - Finding Patterns in the World
  15. Supervised Learning • Learn patterns that reproduce a given output • Pros: can learn complex patterns; good performance • Con: requires many labeled examples • Applications: classification - sorts data into predefined groups. From: Mark Dredze, Machine Learning - Finding Patterns in the World
  16. Types of Learning: Output • Classification: binary, multi-class, multi-label, hierarchical, etc. (e.g., classify email as spam); loss: accuracy • Ranking: order examples by preference (e.g., rank the results of a web search); loss: swapped pairs • Regression: real-valued output (e.g., predict tomorrow's stock price); loss: squared loss • Structured prediction: sequences, trees, segmentation (e.g., find faces in an image); loss: precision/recall of faces. From: Mark Dredze, Machine Learning - Finding Patterns in the World
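For concreteness, the two simplest losses named above can be written out in their standard textbook form (this is not spelled out on the slide itself):

```latex
% accuracy for classification, squared loss for regression
\mathrm{accuracy} = \frac{1}{n}\sum_{k=1}^{n}\mathbf{1}\big[\hat{y}_k = y_k\big],
\qquad
\mathrm{squared\ loss} = \frac{1}{n}\sum_{k=1}^{n}\big(\hat{y}_k - y_k\big)^2
```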
  17. Classification Problem • Classical architecture: z → Feature Extraction → y → Classification → ẋ, where z is the observed signal, y ∈ S is the feature vector (pattern), and ẋ ∈ {1, 2, …, c} is the estimated output (class)
  18. Classification Problem • Example with 1 feature • Problem: classify people as non-obese or obese by observing their weight (only 1 feature) • Is it possible to classify without making any mistakes?
  19. Classification Problem • Example with 2 features: z → Feature Extraction → y = {weight, height} → Classification → ẋ, where z is the observed signal, y ∈ S is the feature vector (pattern), and ẋ ∈ {1: non-obese, 2: obese} is the estimated output (class)
  20. Classification Problem • Example with 2 features • Problem: classify people as non-obese or obese by observing their weight and height • Now the decision looks much simpler!
  21. Classification Problem • Example with 2 features • Problem: classify people as non-obese or obese by observing their weight and height • Decision regions: R1: non-obese; R2: obese
  22. Classification Problem • Decision Regions • Goal of the classifier: define a partition of the feature space into c disjoint regions, called decision regions: R1, R2, …, Rc
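A toy version of the weight/height example makes the decision-region idea concrete. The measurements below and the choice of logistic regression are illustrative assumptions, not taken from the talk:

```python
# Toy sketch: a classifier partitions (weight, height) space into
# decision regions R1 (non-obese) and R2 (obese). Data is invented.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[60, 1.70], [70, 1.80], [65, 1.75],   # features: weight (kg), height (m)
              [95, 1.65], [105, 1.70], [110, 1.75]])
y = np.array([1, 1, 1, 2, 2, 2])                    # 1 = non-obese, 2 = obese

clf = LogisticRegression().fit(X, y)                # learns a boundary between R1 and R2
print(clf.predict([[85, 1.68]]))                    # which region does this person fall in?
```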
  23. TEXT MINING EXAMPLE
  24. Text Mining Process. Adapted from: Introduction to Text Mining, Yair Even-Zohar, University of Illinois
  25. Text Mining Process • Text preprocessing: syntactic/semantic text analysis • Feature generation: bag of words • Feature selection: simple counting; statistics • Text/data mining: classification (supervised learning); clustering (unsupervised learning) • Analyzing results
  26. Syntactic / Semantic text analysis • Part-of-speech (POS) tagging: find the corresponding POS for each word, e.g., John (noun) gave (verb) the (det) ball (noun) • Word sense disambiguation: context based or proximity based • Parsing: generates a parse tree (graph) for each sentence; each sentence is a stand-alone graph
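The slide's sentence can be tagged directly with NLTK, for example; this assumes the NLTK tagger models are downloaded, and NLTK's Penn Treebank tags differ slightly from the slide's noun/verb/det labels:

```python
# POS tagging of the slide's example sentence with NLTK.
import nltk

nltk.download("punkt", quiet=True)                       # tokenizer model
nltk.download("averaged_perceptron_tagger", quiet=True)  # tagger model

tokens = nltk.word_tokenize("John gave the ball")
print(nltk.pos_tag(tokens))
# roughly: [('John', 'NNP'), ('gave', 'VBD'), ('the', 'DT'), ('ball', 'NN')]
```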
  27. Feature Generation: Bag of words • A text document is represented by the words it contains (and their occurrences), e.g., "Lord of the rings" → {"the", "Lord", "rings", "of"} • Highly efficient; makes learning far simpler and easier • Order of words is not that important for certain applications • Stemming: identifies a word by its root, e.g., flying, flew → fly; reduces dimensionality • Stop words: the most common words are unlikely to help text mining, e.g., "the", "a", "an", "you" …
  28. Example: a sample email message shown at three preprocessing stages: (1) the original text ("Hi, Here is your weekly update (that unfortunately hasn't gone out in about a month). Not much action here right now. 1) Due to the unwavering insistence of a member of the group, the ncsa.d2k.modules.core.datatype package is now completely independent of the d2k application. 2) Transformations are now handled differently in Tables. Previously, transformations were done using a TransformationModule. That module could then be added to a list that an ExampleTable kept. Now, there is an interface called Transformation and a sub-interface called ReversibleTransformation."); (2) the same text as an unordered bag of words; (3) the stemmed version (e.g., unfortunately → unfortunate, weekly → week, handled → handle, previously → previous)
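The preprocessing in this example, lowercasing, stop-word removal, stemming, and counting, can be sketched as follows. NLTK's Porter stemmer is one common choice; the slides do not name the stemmer actually used:

```python
# Sketch of bag-of-words generation with stop-word removal and stemming.
from collections import Counter
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

nltk.download("stopwords", quiet=True)
stemmer = PorterStemmer()
stop = set(stopwords.words("english"))

text = "Here is your weekly update that unfortunately has not gone out in about a month"
tokens = [w.lower() for w in text.split()]          # crude whitespace tokenization
bag = Counter(stemmer.stem(w) for w in tokens if w not in stop)
print(bag)                                          # stemmed word -> occurrence count
```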
  29. Feature Generation: Weighting • Term Frequency (TF): occurrences of term ti in document dj • Inverse Document Frequency (IDF) • TF-IDF. [Figure: bag-of-words counts for a sample "Lorem ipsum" paragraph, e.g., ipsum: 2, consectetuer: 2, Lorem: 1, dolor: 1, Praesent: 1, iaculis: 1, Vestibulum: 1]
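The slide only names the three quantities; in one common textbook form (several variants exist) they are:

```latex
% tf: occurrences of term t_i in document d_j
% df: number of documents containing t_i; N: total number of documents
\mathrm{idf}(t_i) = \log\frac{N}{\mathrm{df}(t_i)},
\qquad
\mathrm{tfidf}(t_i, d_j) = \mathrm{tf}(t_i, d_j) \times \mathrm{idf}(t_i)
```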
  30. Feature Generation: Vector Space Model • Documents as vectors
  31. Feature Selection • Reduce dimensionality: learners have difficulty addressing tasks with high dimensionality • Irrelevant features: not all features help! e.g., the existence of a noun in a news article is unlikely to help classify it as "politics" or "sport" • Stop words removal
  32. Example: the email from slide 28 after feature selection: the stemmed word list is progressively reduced to the most informative terms (e.g., datatype, package, transformation, interface, exampletable, d2k, modules), with uninformative words dropped
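A selection step like the one illustrated can be sketched with scikit-learn; the chi-squared criterion and the toy documents below are assumptions, since the slides only show the shrinking word lists:

```python
# Sketch of supervised feature selection: keep the k terms most associated
# with the class label, scored here by chi-squared (an illustrative choice).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2

docs = ["d2k datatype package update", "transformation table interface call",
        "datatype package interface", "transformation module list keep"]
labels = [0, 1, 0, 1]                               # hypothetical topic labels

X = CountVectorizer().fit_transform(docs)           # full bag-of-words matrix
X_small = SelectKBest(chi2, k=4).fit_transform(X, labels)
print(X.shape, "->", X_small.shape)                 # dimensionality reduced to 4 terms
```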
  33. Document Similarity • Dot product: cosine similarity
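Written out, the cosine similarity between two document vectors is the normalized dot product; documents with similar word distributions score near 1, documents sharing no terms score 0:

```latex
\cos(\mathbf{d}_1, \mathbf{d}_2) =
\frac{\mathbf{d}_1 \cdot \mathbf{d}_2}{\lVert\mathbf{d}_1\rVert\,\lVert\mathbf{d}_2\rVert}
```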
  34. Text Mining: Classification definition • Given: a collection of labeled records (training set); each record contains a set of features (attributes) and the true class (label) • Find: a model for the class as a function of the values of the features • Goal: previously unseen records should be assigned a class as accurately as possible • A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets: the training set is used to build the model and the test set to validate it
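The protocol described here, build on a training split and validate on a held-out test split, might look like this in scikit-learn. The documents, labels, and the naive Bayes model are all illustrative assumptions:

```python
# Sketch of the train/test protocol for text classification (toy data).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

docs = ["goal match team win", "election vote party", "team plays the final match",
        "the party wins the vote", "coach and team train", "government vote today"]
labels = [0, 1, 0, 1, 0, 1]                         # 0 = sport, 1 = politics (made up)

X = TfidfVectorizer().fit_transform(docs)
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.33, random_state=0)

model = MultinomialNB().fit(X_tr, y_tr)             # build the model on the training set
print(accuracy_score(y_te, model.predict(X_te)))    # validate on the unseen test set
```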
  35. Text Mining: Clustering definition • Given: a set of documents and a similarity measure among documents • Find: clusters such that documents in one cluster are more similar to one another, and documents in separate clusters are less similar to one another • Goal: finding a correct grouping of the documents
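A clustering run matching this definition can be sketched with k-means over TF-IDF vectors; the choice of algorithm is an assumption, since the slides fix only the problem, not the method:

```python
# Sketch of document clustering with k-means on TF-IDF vectors.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

docs = ["goal match team", "election vote party", "team wins the final", "party vote today"]
X = TfidfVectorizer().fit_transform(docs)           # rows are L2-normalized by default

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)                                   # similar documents share a cluster id
```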
  36. Supervised vs. Unsupervised Learning • Supervised learning (classification) • Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations • New data is classified based on the training set • Unsupervised learning (clustering) • The class labels of the training data are unknown • Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data
  37. CONCLUDING REMARKS
  38. Readings • Survey books in Machine Learning: The Elements of Statistical Learning (Hastie, Tibshirani, Friedman); Pattern Recognition and Machine Learning (Bishop); Machine Learning (Mitchell) • Questions?
  39. ACKNOWLEDGEMENTS • ISEL - DEETC • Final year and MSc supervised students (Tony Tam, ...) • Students of Digital Signal Processing • Artur Ferreira • Instituto de Telecomunicações (IT): David Coutinho, Hugo Silva, Ana Fred, Mário Figueiredo • Fundação para a Ciência e Tecnologia (FCT)
  40. Thank you for the attention! André Ribeiro Lourenço. Mail to: