Published on

Day 2

Summary Day 2

Mercè Martín (BigML)

https://bigml.com/events/valencian-summer-school-in-machine-learning-2015

Published in:
Data & Analytics


- 1. Morning class summary Mercè Martín BigML
- 2. Day 2
- 3. The Future of ML José David Martín-Guerrero (IDAL, UV) Machine learning project: all steps are connected and feedback is essential to succeed. Society has drifted to the Machine Learning way: social networks, data acquisition, technologies...
- 4. The Future of ML Feature engineering challenges: high space dimensionality (#features >>> #samples). Inputs preparation: selection, transformation or model direct attack. Modelling strategies: paradox of choice. Too many algorithms and structures, no general purpose one? Too many configuration options, no automatic choice? Select your model by its structure, parameters (tuning) or search algorithm (e.g. Deep learning: no feature engineering but hectic tuning; Azure: many choices). Wish list: more automation in workflows, model selection, tuning, representation, prediction strategies.
- 5. The Future of ML Existing techniques: Reinforcement learning. Is the environment definable as a state space? Is the evolution of this space driven by a set of actors? Is there a goal to be maximized in the long term? Then the problem is suitable for RL. Ingredients: prior experience, interaction, environment adaptation, policy. So far applied to synthetic problems and robotics, but also suitable for marketing or medicine, and more to come!
- 6. Evaluating ML Algorithms II José Hernández-Orallo (UPV) GOLDEN RULE: never use the same example for training the model and evaluating it! What if you don't have so much data? Sample and repeat! Evaluating: under-fitting (too general) and over-fitting (too specific). How can we detect them?
- 7. Evaluating ML Algorithms II [Diagram: n folds, n times; each training split is used to learn a hypothesis (h1 ... hn) and each test split to evaluate it.] Cross-validation: o We take all possible combinations with n-1 folds for training and the remaining fold for test. o The error (or any other metric) is calculated n times and then averaged. o A final model is trained with all the data. Bootstrapping: o We extract n samples with repetition for training and test with the rest.
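The cross-validation steps on slide 7 can be sketched as follows. This is a minimal illustration using only the Python standard library; the helper names (`k_fold_indices`, `cross_validate`) and the toy "predict the training mean" learner are assumptions made for the example, not something taken from the slides.

```python
import random
from statistics import mean

def k_fold_indices(n_samples, n_folds, seed=0):
    """Shuffle the sample indices and deal them into n_folds disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]

def cross_validate(xs, ys, train_fn, error_fn, n_folds=5, seed=0):
    """Train on n_folds-1 folds, test on the held-out fold, average the errors."""
    errors = []
    for held_out in k_fold_indices(len(xs), n_folds, seed):
        test = set(held_out)
        train_idx = [i for i in range(len(xs)) if i not in test]
        model = train_fn([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        errors.append(error_fn(model,
                               [xs[i] for i in held_out],
                               [ys[i] for i in held_out]))
    return mean(errors)

# Toy learner (hypothetical): always predict the mean of the training targets.
def train_mean(xs, ys):
    return mean(ys)

def mean_abs_error(model, xs, ys):
    return mean(abs(y - model) for y in ys)

xs = list(range(20))
ys = [2.0 * x for x in xs]
cv_error = cross_validate(xs, ys, train_mean, mean_abs_error, n_folds=5)
final_model = train_mean(xs, ys)  # the final model is trained with all the data
```

No example is ever used for both training and testing within a fold, which is the golden rule from slide 6; the averaged `cv_error` is only an estimate for the model finally trained on everything.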
- 8. Evaluating ML Algorithms II Cost-sensitive evaluations: not all errors are equally costly. External context: set of classes, cost estimation. Resulting matrix = cost matrix ∘ confusion matrix (Hadamard product). In every matrix, rows are the predicted class (OPEN, CLOSE) and columns the actual class (open, close). Cost matrix: OPEN [0, 100€]; CLOSE [2000€, 0]. Confusion matrices: c1: OPEN [300, 500]; CLOSE [200, 99000]. c2: OPEN [0, 0]; CLOSE [500, 99500]. c3: OPEN [400, 5400]; CLOSE [100, 94100]. Resulting matrices: c1: OPEN [0€, 50,000€]; CLOSE [400,000€, 0€]; TOTAL COST: 450,000€. c2: OPEN [0€, 0€]; CLOSE [1,000,000€, 0€]; TOTAL COST: 1,000,000€. c3: OPEN [0€, 540,000€]; CLOSE [200,000€, 0€]; TOTAL COST: 740,000€. Confusion matrix & cost matrix can be characterized by just one number: slope
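The arithmetic on slide 8 can be reproduced with a few lines of Python. The numbers below are the ones on the slide; the variable and function names are chosen for this sketch only.

```python
# Rows: predicted OPEN, CLOSE. Columns: actual open, close (as on the slide).
COST = [[0, 100],
        [2000, 0]]  # euros per error

CONFUSION = {
    "c1": [[300, 500], [200, 99000]],
    "c2": [[0, 0], [500, 99500]],
    "c3": [[400, 5400], [100, 94100]],
}

def hadamard(a, b):
    """Element-wise (Hadamard) product of two equally shaped matrices."""
    return [[x * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def total_cost(confusion, cost):
    """Sum of all cells of the resulting (confusion * cost) matrix."""
    return sum(sum(row) for row in hadamard(confusion, cost))

totals = {name: total_cost(matrix, COST) for name, matrix in CONFUSION.items()}
cheapest = min(totals, key=totals.get)
```

With these inputs `totals` matches the three TOTAL COST figures on the slide, and the cheapest classifier is c1 even though c2 makes fewer errors overall (500 against 700).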
- 9. Evaluating ML Algorithms II ROC (Receiver Operating Characteristic) analysis. Dynamic context (class distribution & cost matrix). ROC diagram: FPR on the x-axis and TPR on the y-axis, both from 0 to 1. o Given several classifiers: we add the trivial (0,0) and (1,1) classifiers and construct the convex hull of their points (FPR,TPR). The points on the edges are linear combinations of classifiers (p * Ca + (1-p) * Cb). The classifiers below the ROC curve are discarded. The best classifier (from those remaining) will be selected at application time from the slope. Probabilistic context: soft ROC analysis. A single classifier with probability-weighted predictions can generate a ROC curve by changing the score threshold (each threshold gives a new classifier in the ROC curve).
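Both ideas on slide 9 can be sketched in plain Python: sweeping the score threshold of one soft classifier, and taking the convex hull of several crisp classifiers. The function names and the example points are assumptions for illustration, and `roc_points` assumes both classes are present in `labels`.

```python
def roc_points(scores, labels):
    """One (FPR, TPR) point per score threshold, strictest threshold first."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # threshold above every score: everything negative
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points  # the last point is (1.0, 1.0): everything positive

def cross(o, a, b):
    """Z component of (a - o) x (b - o); positive means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def roc_convex_hull(points):
    """Upper convex hull of the points plus the trivial (0,0) and (1,1) classifiers."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in pts:
        # Drop the last hull point while it lies on or below the line to p.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull

# Three hypothetical crisp classifiers as (FPR, TPR) points.
classifiers = [(0.1, 0.6), (0.3, 0.5), (0.4, 0.9)]
hull = roc_convex_hull(classifiers)

# One hypothetical soft classifier: scores and true labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels)
```

Here the classifier at (0.3, 0.5) falls below the hull and is discarded, while the other two stay as candidates for whichever operating condition applies later.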
- 10. Evaluating ML Algorithms II AUC (Area Under the ROC Curve). For crisp classifiers AUC is equivalent to the macro-averaged accuracy. AUC is a good metric for classifiers and rankers: a classifier with high AUC is a good ranker. It is also good for a (uniform) range of operating conditions: a model with very good AUC will have good accuracy for all operating conditions, whereas a model with very good accuracy for one operating condition can have very bad accuracy for another operating condition. A classifier with high AUC can have poor calibration (probability estimation). Multidimensional classifications? ROC is problematic, AUC has been extended. Regressions? ROC has been extended, AUC is the error variance.
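The "good ranker" reading of AUC on slide 10 can be made concrete: for a scoring classifier, AUC equals the fraction of (positive, negative) pairs that the scores order correctly, counting ties as half. The sketch below uses that pairwise definition; the function name and the data are made up for the example.

```python
def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
ranking_quality = auc(scores, labels)

# Cubing the scores keeps the ranking but distorts the probabilities.
distorted = [s ** 3 for s in scores]
same_ranking_quality = auc(distorted, labels)
```

Because AUC only looks at the order of the scores, the distorted scores get exactly the same AUC, which is one way to see why a high AUC says nothing about calibration.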
- 11. Cluster Analysis Poul Petersen (BigML) K-means clustering, K=3. Unsupervised problem (unlabelled data): customer segmentation, item discovery (types), association (profiles), recommender, active learning (group and label).
- 12. Cluster Analysis Distance and centers define the groups: K-means. K-means: starting from a subset of K points, iteratively compute the distances of all points in the data to them and associate each point with the closest. Define the center of each group as the new set of K points and repeat until there's no improvement. Problems: convergence (initial conditions), scaling dimensions. But... things you need to tackle: • What is the distance to a “missing value”? Replace with defaults. • What is the distance between categorical values? [0,1] • What is the distance between text features? Vectorize and use cosine distance. • Does it have to be Euclidean distance? • Unknown “K”?
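The K-means loop described on slide 12 can be sketched as below. It is a plain-Python illustration with Euclidean distance and numeric features only; the function names, the `seed` parameter and the toy data are assumptions, and none of the harder questions on the slide (missing values, categorical or text features, unknown K) are handled.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))

def k_means(points, k, seed=0, max_iter=100):
    # Start from a random subset of K points (the result depends on this choice).
    centers = random.Random(seed).sample(points, k)
    for _ in range(max_iter):
        # Associate every point with its closest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        # The group centers become the new set of K points.
        new_centers = [centroid(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:  # no improvement: stop
            break
        centers = new_centers
    return centers, clusters

points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
centers, clusters = k_means(points, k=2)
```

On this well-separated toy data the two centers end up in the middle of each group; on real data different seeds can converge to different solutions, which is the convergence problem the slide mentions.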
- 13. Cluster Analysis Let K=5. G-means clustering: increment K, looking for clusters whose points are Gaussian-distributed
- 14. Anomaly Detection Poul Petersen (BigML) Unsupervised data: rank by dissimilarity. Why? Unusual instances, intrusion detection, fraud, incorrect data. • Given a group, try to single out the odd: remove outliers from data. Dataset → Anomaly Detector → score → remove outliers. Can be used at different layers and combined with clustering. • Improve model competence: score the instances to be predicted to look for new instances dissimilar to the training instances (non-competent model). • Compare against usual distributions: Gaussian, Benford's Law.
- 15. Anomaly Detection [Figure labels: “Round”, “Skinny”, “Corners”; “Skinny” but not “smooth”; no “Corners”; not “Round”.] The most unusual instance is different according to the grouping features (prior knowledge).
- 16. Anomaly Detection Grow a random decision tree until each instance is in its own leaf (random features and splits). Instances at a small depth are “easy” to isolate; instances at a large depth are “hard” to isolate. Now repeat the process several times and assign an anomaly score (0 = similar, 1 = dissimilar) to any input data by computing how different the average depth for the instance is from the average depth of the training set.
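The isolation idea on slide 16 can be sketched in plain Python. The slide does not give a formula for turning depths into a score, so the normalization used here, `2 ** (-depth / reference_depth)`, is an assumption in the spirit of isolation forests; the function names and the synthetic data are also made up for the example.

```python
import random
from statistics import mean

def build_tree(points, rng, depth=0):
    """Split on random features and thresholds until every instance is alone."""
    dims = [d for d in range(len(points[0]))
            if min(p[d] for p in points) < max(p[d] for p in points)]
    if not dims:
        return {"depth": depth}  # leaf: one instance (or exact duplicates)
    dim = rng.choice(dims)
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    while True:
        split = rng.uniform(lo, hi)
        left = [p for p in points if p[dim] < split]
        right = [p for p in points if p[dim] >= split]
        if left and right:  # retry the rare split that separates nothing
            break
    return {"dim": dim, "split": split,
            "left": build_tree(left, rng, depth + 1),
            "right": build_tree(right, rng, depth + 1)}

def path_depth(tree, x):
    """Depth of the leaf that x falls into."""
    while "split" in tree:
        tree = tree["left"] if x[tree["dim"]] < tree["split"] else tree["right"]
    return tree["depth"]

def train_forest(points, n_trees=100, seed=0):
    rng = random.Random(seed)
    return [build_tree(points, rng) for _ in range(n_trees)]

def average_depth(forest, x):
    return mean(path_depth(tree, x) for tree in forest)

def anomaly_score(forest, x, reference_depth):
    """Assumed normalization: near 1 = isolated quickly, below 0.5 = hard to isolate."""
    return 2 ** (-average_depth(forest, x) / reference_depth)

data_rng = random.Random(42)
inliers = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(50)]
outlier = (8.0, 8.0)
training = inliers + [outlier]

forest = train_forest(training, n_trees=100, seed=1)
reference = mean(average_depth(forest, p) for p in training)
outlier_score = anomaly_score(forest, outlier, reference)
center_score = anomaly_score(forest, (0.0, 0.0), reference)
```

The far-away point is separated after very few random splits, so its average depth is small and its score is high; a point in the middle of the cloud needs many more splits and scores lower.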
- 17. Machine Learning Black Art Charles Parker (BigML) Even when you follow the yellow brick road (different models, feature engineering, evaluation metrics), the house of horrors awaits you around the corner: Huge Hypothesis Space, Poorly Picked Loss Function, Cross Validation, Drifting Domain, Reliance on Research Results.
- 18. Machine Learning Black Art ● Huge hypothesis space: the possible classifiers you could build with an algorithm given the data. Choice! Triple trade-off. Use non-parametric methods. As data scales, simpler models are desirable. Big data often trumps modelling! ● Poorly picked loss function: standard loss functions (entropy, distance in formal space) are mathematically convenient but not always enough for real problems. No info about the classes or the costs: false positive in disease diagnosis, false positive in face detection, false positive in thumbprint identification. Path dependence: game playing. Let developers apply their own loss function (SVM light, plugins in splitting code, customized gradient descent...) OR hack the prediction (cascade classifiers), change the problem setting (time-based limits to the classifier, max loss), keep error down with a certain probability. More complex: you need more data.
- 19. Machine Learning Black Art ● Cross-validation hold-outs can lead to leakage: features or instances can be correlated in the test and train sets, giving optimistic performance. Law of averages and being off by one. Features correlated with my prediction can bias predictions. Photo dating: colors, borders... Beware of the group the instances belong to. Aggregates and timestamps: instances in close moments are very correlated.
- 20. Machine Learning Black Art ● Drifting Domain: domain changes (document classification, sales prediction), adverse selection of training data (market data predictions, spam). ➢ Prior p(input) is changing → covariate shift ➢ Map p(output | input) is changing → concept drift. Symptoms: lots of errors, distribution changes. Compare to old data! ● Reliance on Research Results: reality does not comply with the theorems' initial boundaries (error, sample complexity, convergence) and their non-real assumptions. Rule of thumb: use academia as your starting point, but don’t think it will solve all your problems. Keep learning.
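Slide 20 says to compare against old data but does not name a method. One possible check for covariate shift on a single numeric feature, used here purely as an illustration, is the two-sample Kolmogorov-Smirnov statistic: the largest gap between the empirical distribution functions of the old and new values. The function name and the sample values are assumptions.

```python
from bisect import bisect_right

def ks_statistic(old, new):
    """Largest absolute gap between the two empirical CDFs (0 = same, 1 = disjoint)."""
    old_sorted, new_sorted = sorted(old), sorted(new)

    def ecdf(sorted_values, x):
        # Fraction of values less than or equal to x.
        return bisect_right(sorted_values, x) / len(sorted_values)

    return max(abs(ecdf(old_sorted, x) - ecdf(new_sorted, x))
               for x in old_sorted + new_sorted)

old_feature = [1.0, 2.0, 3.0, 4.0, 5.0]       # values seen at training time
same_feature = [1.0, 2.0, 3.0, 4.0, 5.0]      # new data, no shift
shifted_feature = [11.0, 12.0, 13.0, 14.0]    # new data, clearly shifted

no_drift = ks_statistic(old_feature, same_feature)
strong_drift = ks_statistic(old_feature, shifted_feature)
```

A value near 0 suggests p(input) looks unchanged for that feature, while a value near 1 suggests a strong shift. This only looks at inputs, so it cannot reveal concept drift, where p(output | input) changes.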
- 21. Useful Things about ML Charles Parker (BigML) Advice from Dijkstra ● Killing Ambitious Projects: identify sub-problems you can tackle (hard vs easy); hacking is all right. Good candidates: no human experts predict in complex environments (protein folding); humans can't explain how they know f(x) (character recognition); f(x) is changing all the time (market data); f(x) must be specialized many times (anything user specific). ● Ignoring the Lure of Complexity: look for simplicity (remove spaghetti code, processes, drudgery). Push around complexity (clever compression). Raw data might hold the information; sometimes it is the right way. ● Finding Your Own Humility: know and embrace your own limits. Continuously learn. Do A/B tests: improve from an existing system. ● Avoiding Useless Projects: look for the best combination of easy and big win. Define metrics with experts but don't rely on them: monitor.
- 22. Useful Things about ML Advice from Dijkstra (continued) ● Creating a good story: explain why, and summarize your model and your data. Stories are more valuable than models. ● Continuing to Learn: don't get comfortable, work at the verge of your abilities. Understand your limitations. Learn from your errors. Summary: ML can be of value for every organization: find where. Locating the right problem, executing, showing the proof. When you win we all win, so good luck!!!
