SlideShare a Scribd company logo
Practical Machine Learning
(Part I)
Traian Rebedea
Faculty of Automatic Control and Computers, UPB
traian.rebedea@cs.pub.ro
Outline
• About me
• Why machine learning?
• Basic notions of machine learning
• Sample problems and solutions
– Finding vandalism on Wikipedia
– Sexual predator detection
– Identification of buildings on a map
– Change in the meaning of concepts over time
• Practical ML
– R
– Python (sklearn)
– Java (Weka)
– Others
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
About me
• Started working in 2003 (year 2 of studies)
– J2EE developer
• Then created my own company with a colleague (in 2005)
– Web projects (PHP, Javascript, etc.), involved in some startups for the
technological aspects (e.g. Happyfish TV)
• Started teaching at A&C, UPB (in 2006)
– TA for Algorithms, Natural Language Processing
• Started my PhD (in 2007)
– Natural Language Processing, Discourse Analysis, Tehnology-Enhanced
Learning
• Now I am teaching several lectures: Algorithm Design, Algorithm Design
and Complexity, Symbolic and Statistical Learning, Information Retrieval
• Working on several topics in NLP: opinion mining, conversational agents,
question-answering, culturonomics
• Interested in all new projects that use NLP, ML and IR as well as any other
“smart” applications
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Why machine learning?
• “Simply put, machine learning is the part of artificial
intelligence that actually works” (Forbes, link)
• Most popular course on Coursera
• Popular?
– Meaning practical: ML results are nice and easy to show to
others
– Meaning well-paid/in demand: companies are increasing
demand for ML specialists
• Large volumes of data, corpora, etc.
• Computer science, data science, almost any other
domain (especially in research)
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Why ML?
• Mix of Computer science and Math (Statistics)
skills
• Why is math essential to ML?
– http://courses.washington.edu/css490/2012.Winter/lecture_slides/02
_math_essentials.pdf
• “To get really useful results, you need good
mathematical intuitions about certain general
machine learning principles, as well as the
inner workings of the individual algorithms.”
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Basic Notions of ML
• Different tasks in ML
• Supervised
– Regression
– Classification
• Unsupervised
– Clustering
– Dimensionality reduction / visualization
• Semi-supervised
– Label propagation
• Reinforcement learning
• Deep learning
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Supervised learning
• Dataset consisting of labeled examples
• Each example has one or several attributes
and a dependent variable (label, output,
response, predicted variable)
• After building, selection and assessing the
model on the labeled data, it is used to find
labels for new data
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Supervised learning
• Dataset is usually split into three parts
– A rule is 50-25-25
• Training set
– Used to build models
– Compute the basic parameters of the model
• Holdout/validation set
– Used to find out the best model and adjust some parameters
of the model (usually, in order to minimize some error)
– Also called model selection
• Test set
– Used to assess the final model on new data, after model
building and selection
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
• Source: http://blogs.sas.com/content/jmp/2010/07/06/train-validate-and-test-for-
data-mining-in-jmp/
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Cross-validation
If no of partitions (S) = no of training instances 
leave-one-out (only for small datasets)
Main disadvantages:
- no of runs is
increased by a factor
of S
- Multiple parameters
for the same model
for different runs
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Regression
• Supervised learning when the output variable
is continous
– This is the usual textbook definition
– Also may be used for binary output variables (e.g.
Not-spam=0, Spam=1) or output variables that
have an ordering (e.g. integer numbers, sentiment
levels, etc.)
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
• Source: http://smlv.cc.gatech.edu/2010/10/06/linear-regression-and-least-
squares-estimation/
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Classification
• Supervised learning when the output variable
is discrete (classes)
– Can also be used for binary or integer outputs
– Even if they are ordered, but classification usually
does not account for the order
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
• CIFAR dataset: http://www.cs.toronto.edu/~kriz/cifar.html (also image source)
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Classification
• Several important types
– Binary vs. multiple classes
– Hard vs. soft
– Single-label vs. multi-label (per example)
– Balanced vs. unbalanced classes
• Extensions that are semi-supervised
– One-class classification
– PU (Positive and Unlabeled) learning
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Unsupervised learning
• For an unlabeled dataset, infer properties (find
“hidden” structure) about the distribution of the
objects in the dataset
– without any help related to correct answers
• Generally, much more data than for supervised
learning
• There are several techniques that can be applied
– Clustering (cluster analysis)
– Dimensionality reduction (visualization of data)
– Association rules, frequent itemsets
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Clustering
• Partition dataset into clusters (groups) of
objects such that the ones in the same cluster
are more similar to each other than to objects
in different clusters
• Similarity
– The most important notion when clustering
– “Inverse of a distance”
• Possible objective: approximate the modes of
the input dataset distribution
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Clustering
• Lots of different cluster models:
– Connectivity (distance models) (e.g Hierarchical)
– Centroid (e.g. K-means)
– Distribution (e.g GMM)
– Density of data (e.g. DBSCAN)
– Graph (e.g. cliques, community detection)
– Subspace (e.g. biclustering)
– …
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
• Source: http://genomics10.bu.edu/terrence/gems/gems.html
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
• Source: https://sites.google.com/site/statsr4us/advanced/multivariate/cluster
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Dimensionality reduction
• Usually, unsupervised datasets are large both
in number of examples and in number of
attributes (features)
• Reduce the number of features in order to
improve (human) readability and
interpretation of data
• Mainly linked to visualization
– Also used for feature selection, data compression
or denoising
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
• Source: http://www.ifs.tuwien.ac.at/ifs/research/pub_pdf/rau_ecdl01.pdf
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Semi-supervised learning
• Mix of labeled and unlabeled data in the dataset
• Enlarge the labeled dataset with unlabeled instances for which we
might determine the correct label with a high probability
• Unlabeled data is much cheaper or simpler to get than labeled data
– Human annotation either requires experts or is not challenging (it is
boring)
– Annotation may also require specific software or other types of
devices
– Students (or Amazon Mturk users) are not always good annotators
• More information:
http://pages.cs.wisc.edu/~jerryzhu/pub/sslicml07.pdf
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Label propagation
• Start with the labeled dataset
• Want to assign labels to some (or all) of the unlabeled
instances
• Assumption: “data points that are close have similar labels”
• Build a fully connected graph with both labeled and unlabeled
instances as vertices
• The weight of an edge represents the similarity between the
points (or inverse of a distance)
• Repeat until convergence
– Propagate labels through edges (larger weights allow a
more quick propagation)
– Nodes will have soft labels
• More info: http://lvk.cs.msu.su/~bruzz/articles/classification/zhu02learning.pdf
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
• Source: http://web.ornl.gov/sci/knowledgediscovery/Projects.htm
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Basic Notions of ML
• How to asses the performance of a model?
• Used to choose the best model
• Supervised learning
– Assessing errors of the model on the validation
and test sets
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Performance measures - supervised
• Regression
– Mean error, mean squared error (MSE), root MSE, etc.
• Classification
– Accuracy
– Precision / recall (possibly weighted)
– F-measure
– Confusion matrix
– TP, TN, FP, FN
– AUC / ROC
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
• Source: http://www.resacorp.com/lsmse2.htm
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
• Source: http://en.wikipedia.org/wiki/Sensitivity_and_specificity
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Basic Notions of ML
• Overfitting
– Low error on training data
– High error on (unseen/new) test data
– The model does not generalize well from the training data
on unseen data
– Possible causes:
• Too few training examples
• Model too complex (too many parameters)
• Underfitting
– High error both on training and test data
– Model is too simple to capture the information in the
training data
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
• Source: http://gerardnico.com/wiki/data_mining/overfitting
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Sample problems and solutions
• Chosen from prior experience
• May not be the most interesting problems for all
(any) of you
• I have tried to present 4 problems that are
somehow diverse in solution and simple to
explain
• All solutions are state-of-the-art (that means not
the very best one, but among the top)
– Very best solution may be found in research papers
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Finding vandalism on Wikipedia
• Work with Dan Cioiu
• Vandalism = editing in a malicious manner that is intentionally
disruptive
• Dataset used at PAN 2011
– Real articles extracted from Wikipedia in 2009 and 2010
– Training data: 32,000 edits
– Test data: 130,000 edits
• An edit consists of:
– the old and new content of the article
– author information
– revision comment and timestamp
• We chose to use only the actual text in the revision (no author and
timestamp information) to predict vandalism
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Features
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Using several classifiers
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Cross-validation results
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Combining results using a meta-
classifier
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Final results on test set
• Source: http://link.springer.com/chapter/10.1007%2F978-3-642-40495-5_32
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Main conclusions
• Feature extraction is important in ML
– Which are the best features?
• Syntactic and context analysis are the most important
• Semantic analysis is slow but increases detection by
10%
• Using features related to users would further improve
our results
• A meta-classifier sometimes improves the results of
several individual classifiers
– Not always! Maybe another discussion on this…
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Sexual predator detection
• Work with Claudia Cardei
• Automatically detect sexual predators in a chat
discussion in order to alert the parents or the police
• Corpus from PAN 2012
– 60000+ chat conversations (around 10000 after removing
outliers)
– 90000+ users
– 142 sexual predators
• Sexual predator is used “to describe a person seen as
obtaining or trying to obtain sexual contact with
another person in a metaphorically predatory manner”
(Wikipedia)
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Problem description
• Decide either a user is a predator or not
• Use only the chats that have a predator
(balanced corpus: 50% predators - 50%
victims)
• One document for each user
• How to determine the features?
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Simple solution
• “Bag of words” (BoW) features with tf-idf
weighting
• Feature extraction using mutual information
• MLP (neural network) with 500 features: 89%
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Going beyond
• The should be other features that are discriminative for predators
• Investigated behavioral features:
– Percentage of questions and inquiries initiated by an user
– Percentage of negations
– Expressions that might denote an underage user (“don’t know”, “never
did”,…)
– The age of the user if found in a chat reply (“asl” like answers)
– Percentage of slang words
– Percentage of words with a sexual connotation
– Readability scores (Flesch)
– Who started the conversation
• AdaBoost (with DecisionStump): 90%
• Random Forest: 93%
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Main conclusions
• Feature selection is useful when the number of
features is large
– Reduced from 10k+ features to hundreds/thousands
– We used MI, there are others techniques
• Fewer features can be as descriptive as (or even
more descriptive than) the BoW model
– Difficult to choose
– Need additional computations and information
– Need to understand the dataset
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Identification of buildings on a map
• Work with Alexandra Vieru, Paul Vlase
• Given an aerial image, identify the buildings
• Non-military purposes:
– Estimate population density
– Identify main landmarks (shopping malls, etc.)
– Identify new buildings over time
• Features
– HOG descriptors – 3x3 cells
– Histogram of RGB – 5 bins, normalized
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Preliminary results
• SVM classifier
• Train: 2276 images, Test: 637 images
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Main conclusions
• Verify which are the best features
– If this is possible
• Must verify for overfitting using train and test
corpus
• Each task can have a baseline (a simple
classification) method
– Here it might be SVM with RGB histogram features
• Actually this could be treated as a one-class
classification (or PU learning) problem
– There are much more negative instances than positive
ones
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Change in the meaning of concepts
over time
• Work with Costin Chiru, Bianca Dinu, Diana Mincu
• Assumption:
– Different meanings of words should be identifiable
based on the other words that are associated with
them
– These other words are changing over time thus
denoting a change in the meaning (or usage of the
meanings)
• Using the Google Books n-grams corpus
– Dependency information
– 5-grams information centered on the analyzed word
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
• Source: https://books.google.com/ngrams (Query: “gay=>*_NOUN”)
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
• Clustering using tf-idf over dependencies of the
analyzed word
• Each distinct year is considered a different document
and each dependency is a feature/term
• Example for the word “gay”:
– Cluster 1 (1600 - 1925): signature {more, lively, happy,
brilliant, cheerful}
– Cluster 2 (1923-1969): signature {more, lively, happy,
cheerful, be}
– Cluster 3 (1981 - today): signature {lesbian, bisexual, anti,
enolum, straight}
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Practical ML
• Brief discussion of alternatives that might be
used by a programmer
• The main idea is to show that it is simple to
start working on a ML task
• However, it is more difficult to achieve very
good results
– Experience
– Theoretical underpinnings of models
– Statistical background
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
R language
• Designed for “statistical computing”
• Lots of packages and functions for ML & statistics
• Simple to output visualizations
• Easy to write code, short
• It is kind of slow on some tasks
• Need to understand some new concepts: data frame,
factor, etc.
• You may find the same functionality in several
packages (e.g. NaïveBayes, Cohen’s Kappa, etc.)
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Python (sklearn)
• sklearn (scikit-learn) package developed for
ML
• Uses existing packages numpy, scipy
• Started as a GSOC project in 2007
• It is new, implemented efficiently (time and
memory), lots of functionality
• Easy to write code
– Naming conventions, parameters, etc.
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
• Source: http://peekaboo-vision.blogspot.ro/2013/01/machine-learning-cheat-
sheet-for-scikit.html
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Java (Weka)
• Similar to scikit-learn for Python
• Older (since 1997) and more stable than scikit-learn
• Some implementations are (were) a little bit more
inefficient, especially related to memory consumption
• However, usually it is fast, accurate, and has lots of
other useful utilities
• It also has a GUI for getting some results fast without
writing any code
• Lots of people are debating whether Weka or scikit-learn is
better. I don’t want to get into this discussion as it depends on
your task and resources.
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Not discussed
• RapidMiner
– Easy to use
– GUI interface for building the processing pipeline
– Can write some code/plugins
• Matlab/Octave
– Numeric computing
– Paid vs. free
• SPSS/SAS/Stata
– Somehow similar to R, faster, more robust, expensive
• Many others:
– Apache Mahout / MALLET / NLTK / Orange / MLPACK
– Specific solutions for various ML models
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Conclusions
• Lots of interesting ML tasks and data available
– Do you have access to new and interesting data in your
business?
• Several easy to use programming alternatives
• Understand the basics
• Enhance your maths (statistics) skills for a better ML
experience
• Hands-on experience and communities of practice
– www.kaggle.com (and other similar initiatives)
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Thank you!
traian.rebedea@cs.pub.ro
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
_____
_____

More Related Content

What's hot

Hacking Predictive Modeling - RoadSec 2018
Hacking Predictive Modeling - RoadSec 2018Hacking Predictive Modeling - RoadSec 2018
Hacking Predictive Modeling - RoadSec 2018
HJ van Veen
 
Feature Engineering
Feature EngineeringFeature Engineering
Feature Engineering
Sri Ambati
 
Regularization
RegularizationRegularization
Regularization
Darren Yow-Bang Wang
 
Feature Engineering
Feature Engineering Feature Engineering
Feature Engineering
odsc
 
Autoencoder
AutoencoderAutoencoder
Autoencoder
Mehrnaz Faraz
 
Feature Engineering in Machine Learning
Feature Engineering in Machine LearningFeature Engineering in Machine Learning
Feature Engineering in Machine Learning
Knoldus Inc.
 
ensemble learning
ensemble learningensemble learning
ensemble learningbutest
 
Scikit Learn intro
Scikit Learn introScikit Learn intro
Scikit Learn intro
9xdot
 
幾何を使った統計のはなし
幾何を使った統計のはなし幾何を使った統計のはなし
幾何を使った統計のはなし
Toru Imai
 
Feature Engineering & Selection
Feature Engineering & SelectionFeature Engineering & Selection
Feature Engineering & Selection
Eng Teong Cheah
 
LeNet to ResNet
LeNet to ResNetLeNet to ResNet
LeNet to ResNet
Somnath Banerjee
 
Deep neural networks
Deep neural networksDeep neural networks
Deep neural networks
Si Haem
 
Gradient descent method
Gradient descent methodGradient descent method
Gradient descent method
Prof. Neeta Awasthy
 
正準相関分析
正準相関分析正準相関分析
正準相関分析
Akisato Kimura
 
Fisher線形判別分析とFisher Weight Maps
Fisher線形判別分析とFisher Weight MapsFisher線形判別分析とFisher Weight Maps
Fisher線形判別分析とFisher Weight Maps
Takao Yamanaka
 
Introduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnIntroduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-Learn
Benjamin Bengfort
 
Machine Learning and its Applications
Machine Learning and its ApplicationsMachine Learning and its Applications
Machine Learning and its Applications
Dr Ganesh Iyer
 
次元の呪い
次元の呪い次元の呪い
次元の呪い
Kosuke Tsujino
 
Feature Engineering - Getting most out of data for predictive models - TDC 2017
Feature Engineering - Getting most out of data for predictive models - TDC 2017Feature Engineering - Getting most out of data for predictive models - TDC 2017
Feature Engineering - Getting most out of data for predictive models - TDC 2017
Gabriel Moreira
 
Overfitting & Underfitting
Overfitting & UnderfittingOverfitting & Underfitting
Overfitting & Underfitting
SOUMIT KAR
 

What's hot (20)

Hacking Predictive Modeling - RoadSec 2018
Hacking Predictive Modeling - RoadSec 2018Hacking Predictive Modeling - RoadSec 2018
Hacking Predictive Modeling - RoadSec 2018
 
Feature Engineering
Feature EngineeringFeature Engineering
Feature Engineering
 
Regularization
RegularizationRegularization
Regularization
 
Feature Engineering
Feature Engineering Feature Engineering
Feature Engineering
 
Autoencoder
AutoencoderAutoencoder
Autoencoder
 
Feature Engineering in Machine Learning
Feature Engineering in Machine LearningFeature Engineering in Machine Learning
Feature Engineering in Machine Learning
 
ensemble learning
ensemble learningensemble learning
ensemble learning
 
Scikit Learn intro
Scikit Learn introScikit Learn intro
Scikit Learn intro
 
幾何を使った統計のはなし
幾何を使った統計のはなし幾何を使った統計のはなし
幾何を使った統計のはなし
 
Feature Engineering & Selection
Feature Engineering & SelectionFeature Engineering & Selection
Feature Engineering & Selection
 
LeNet to ResNet
LeNet to ResNetLeNet to ResNet
LeNet to ResNet
 
Deep neural networks
Deep neural networksDeep neural networks
Deep neural networks
 
Gradient descent method
Gradient descent methodGradient descent method
Gradient descent method
 
正準相関分析
正準相関分析正準相関分析
正準相関分析
 
Fisher線形判別分析とFisher Weight Maps
Fisher線形判別分析とFisher Weight MapsFisher線形判別分析とFisher Weight Maps
Fisher線形判別分析とFisher Weight Maps
 
Introduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnIntroduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-Learn
 
Machine Learning and its Applications
Machine Learning and its ApplicationsMachine Learning and its Applications
Machine Learning and its Applications
 
次元の呪い
次元の呪い次元の呪い
次元の呪い
 
Feature Engineering - Getting most out of data for predictive models - TDC 2017
Feature Engineering - Getting most out of data for predictive models - TDC 2017Feature Engineering - Getting most out of data for predictive models - TDC 2017
Feature Engineering - Getting most out of data for predictive models - TDC 2017
 
Overfitting & Underfitting
Overfitting & UnderfittingOverfitting & Underfitting
Overfitting & Underfitting
 

Similar to Practical machine learning - Part 1

Ibm colloquium 070915_nyberg
Ibm colloquium 070915_nybergIbm colloquium 070915_nyberg
Ibm colloquium 070915_nyberg
diannepatricia
 
WWW2015 PHD Symposium
WWW2015 PHD SymposiumWWW2015 PHD Symposium
WWW2015 PHD Symposium
Dominik Kowald
 
Simulation and modeling introduction.pptx
Simulation and modeling introduction.pptxSimulation and modeling introduction.pptx
Simulation and modeling introduction.pptx
ShamasRehman4
 
Web-Based Self- and Peer-Assessment of Teachers’ Educational Technology Compe...
Web-Based Self- and Peer-Assessment of Teachers’ Educational Technology Compe...Web-Based Self- and Peer-Assessment of Teachers’ Educational Technology Compe...
Web-Based Self- and Peer-Assessment of Teachers’ Educational Technology Compe...
Hans Põldoja
 
Domain Modeling for Personalized Learning
Domain Modeling for Personalized LearningDomain Modeling for Personalized Learning
Domain Modeling for Personalized Learning
Peter Brusilovsky
 
Enabling e labs experiments delivery using Moodle LMS
Enabling e labs experiments delivery using Moodle LMSEnabling e labs experiments delivery using Moodle LMS
Enabling e labs experiments delivery using Moodle LMS
Mohamed EL Zayat
 
Measuring the user experience
Measuring the user experienceMeasuring the user experience
Measuring the user experience
Andres Baravalle
 
Lessons Learned from Testing Machine Learning Software
Lessons Learned from Testing Machine Learning SoftwareLessons Learned from Testing Machine Learning Software
Lessons Learned from Testing Machine Learning Software
Christian Ramirez
 
Using Qualtrics for Online Trainings
Using Qualtrics for Online TrainingsUsing Qualtrics for Online Trainings
Using Qualtrics for Online Trainings
Shalin Hai-Jew
 
Course Possibilities & Architecture
Course Possibilities & ArchitectureCourse Possibilities & Architecture
Course Possibilities & Architecture
Folajimi Fakoya
 
Introduction To Applied Machine Learning
Introduction To Applied Machine LearningIntroduction To Applied Machine Learning
Introduction To Applied Machine Learning
ananth
 
Unit IV Software Engineering
Unit IV Software EngineeringUnit IV Software Engineering
Unit IV Software Engineering
Nandhini S
 
Be cse
Be cseBe cse
Be cse
imamruta
 
Web-based Self- and Peer Assessment of Teachers Digital Competences
Web-based Self- and Peer Assessment of Teachers Digital CompetencesWeb-based Self- and Peer Assessment of Teachers Digital Competences
Web-based Self- and Peer Assessment of Teachers Digital Competences
Hans Põldoja
 
CEN6016-Chapter1.ppt
CEN6016-Chapter1.pptCEN6016-Chapter1.ppt
CEN6016-Chapter1.ppt
SumitVishwambhar
 
CEN6016-Chapter1.ppt
CEN6016-Chapter1.pptCEN6016-Chapter1.ppt
CEN6016-Chapter1.ppt
NelsonYanes6
 
Evaluation beyond the desktop
Evaluation beyond the desktopEvaluation beyond the desktop
Evaluation beyond the desktop
Thomas Grill
 
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Xavier Amatriain
 

Similar to Practical machine learning - Part 1 (20)

Ibm colloquium 070915_nyberg
Ibm colloquium 070915_nybergIbm colloquium 070915_nyberg
Ibm colloquium 070915_nyberg
 
WWW2015 PHD Symposium
WWW2015 PHD SymposiumWWW2015 PHD Symposium
WWW2015 PHD Symposium
 
Simulation and modeling introduction.pptx
Simulation and modeling introduction.pptxSimulation and modeling introduction.pptx
Simulation and modeling introduction.pptx
 
Web-Based Self- and Peer-Assessment of Teachers’ Educational Technology Compe...
Web-Based Self- and Peer-Assessment of Teachers’ Educational Technology Compe...Web-Based Self- and Peer-Assessment of Teachers’ Educational Technology Compe...
Web-Based Self- and Peer-Assessment of Teachers’ Educational Technology Compe...
 
Domain Modeling for Personalized Learning
Domain Modeling for Personalized LearningDomain Modeling for Personalized Learning
Domain Modeling for Personalized Learning
 
Enabling e labs experiments delivery using Moodle LMS
Enabling e labs experiments delivery using Moodle LMSEnabling e labs experiments delivery using Moodle LMS
Enabling e labs experiments delivery using Moodle LMS
 
Lecture 1 (bce-7)
Lecture   1 (bce-7)Lecture   1 (bce-7)
Lecture 1 (bce-7)
 
Measuring the user experience
Measuring the user experienceMeasuring the user experience
Measuring the user experience
 
Building Comfort with MATLAB
Building Comfort with MATLABBuilding Comfort with MATLAB
Building Comfort with MATLAB
 
Lessons Learned from Testing Machine Learning Software
Lessons Learned from Testing Machine Learning SoftwareLessons Learned from Testing Machine Learning Software
Lessons Learned from Testing Machine Learning Software
 
Using Qualtrics for Online Trainings
Using Qualtrics for Online TrainingsUsing Qualtrics for Online Trainings
Using Qualtrics for Online Trainings
 
Course Possibilities & Architecture
Course Possibilities & ArchitectureCourse Possibilities & Architecture
Course Possibilities & Architecture
 
Introduction To Applied Machine Learning
Introduction To Applied Machine LearningIntroduction To Applied Machine Learning
Introduction To Applied Machine Learning
 
Unit IV Software Engineering
Unit IV Software EngineeringUnit IV Software Engineering
Unit IV Software Engineering
 
Be cse
Be cseBe cse
Be cse
 
Web-based Self- and Peer Assessment of Teachers Digital Competences
Web-based Self- and Peer Assessment of Teachers Digital CompetencesWeb-based Self- and Peer Assessment of Teachers Digital Competences
Web-based Self- and Peer Assessment of Teachers Digital Competences
 
CEN6016-Chapter1.ppt
CEN6016-Chapter1.pptCEN6016-Chapter1.ppt
CEN6016-Chapter1.ppt
 
CEN6016-Chapter1.ppt
CEN6016-Chapter1.pptCEN6016-Chapter1.ppt
CEN6016-Chapter1.ppt
 
Evaluation beyond the desktop
Evaluation beyond the desktopEvaluation beyond the desktop
Evaluation beyond the desktop
 
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
 

More from Traian Rebedea

An Evolution of Deep Learning Models for AI2 Reasoning Challenge
An Evolution of Deep Learning Models for AI2 Reasoning ChallengeAn Evolution of Deep Learning Models for AI2 Reasoning Challenge
An Evolution of Deep Learning Models for AI2 Reasoning Challenge
Traian Rebedea
 
AI @ Wholi - Bucharest.AI Meetup #5
AI @ Wholi - Bucharest.AI Meetup #5AI @ Wholi - Bucharest.AI Meetup #5
AI @ Wholi - Bucharest.AI Meetup #5
Traian Rebedea
 
Deep neural networks for matching online social networking profiles
Deep neural networks for matching online social networking profilesDeep neural networks for matching online social networking profiles
Deep neural networks for matching online social networking profiles
Traian Rebedea
 
Intro to Deep Learning for Question Answering
Intro to Deep Learning for Question AnsweringIntro to Deep Learning for Question Answering
Intro to Deep Learning for Question Answering
Traian Rebedea
 
What is word2vec?
What is word2vec?What is word2vec?
What is word2vec?
Traian Rebedea
 
How useful are semantic links for the detection of implicit references in csc...
How useful are semantic links for the detection of implicit references in csc...How useful are semantic links for the detection of implicit references in csc...
How useful are semantic links for the detection of implicit references in csc...
Traian Rebedea
 
A focused crawler for romanian words discovery
A focused crawler for romanian words discoveryA focused crawler for romanian words discovery
A focused crawler for romanian words discovery
Traian Rebedea
 
Detecting and Describing Historical Periods in a Large Corpora
Detecting and Describing Historical Periods in a Large CorporaDetecting and Describing Historical Periods in a Large Corpora
Detecting and Describing Historical Periods in a Large Corpora
Traian Rebedea
 
Propunere de dezvoltare a carierei universitare
Propunere de dezvoltare a carierei universitarePropunere de dezvoltare a carierei universitare
Propunere de dezvoltare a carierei universitareTraian Rebedea
 
Automatic plagiarism detection system for specialized corpora
Automatic plagiarism detection system for specialized corporaAutomatic plagiarism detection system for specialized corpora
Automatic plagiarism detection system for specialized corporaTraian Rebedea
 
Relevance based ranking of video comments on YouTube
Relevance based ranking of video comments on YouTubeRelevance based ranking of video comments on YouTube
Relevance based ranking of video comments on YouTubeTraian Rebedea
 
Opinion mining for social media and news items in Romanian
Opinion mining for social media and news items in RomanianOpinion mining for social media and news items in Romanian
Opinion mining for social media and news items in RomanianTraian Rebedea
 
PhD Defense: Computer-Based Support and Feedback for Collaborative Chat Conve...
PhD Defense: Computer-Based Support and Feedback for Collaborative Chat Conve...PhD Defense: Computer-Based Support and Feedback for Collaborative Chat Conve...
PhD Defense: Computer-Based Support and Feedback for Collaborative Chat Conve...Traian Rebedea
 
Importanța algoritmilor pentru problemele de la interviuri
Importanța algoritmilor pentru problemele de la interviuriImportanța algoritmilor pentru problemele de la interviuri
Importanța algoritmilor pentru problemele de la interviuriTraian Rebedea
 
Web services for supporting the interactions of learners in the social web - ...
Web services for supporting the interactions of learners in the social web - ...Web services for supporting the interactions of learners in the social web - ...
Web services for supporting the interactions of learners in the social web - ...Traian Rebedea
 
Automatic assessment of collaborative chat conversations with PolyCAFe - EC-T...
Automatic assessment of collaborative chat conversations with PolyCAFe - EC-T...Automatic assessment of collaborative chat conversations with PolyCAFe - EC-T...
Automatic assessment of collaborative chat conversations with PolyCAFe - EC-T...Traian Rebedea
 
Conclusions and Recommendations of the Romanian ICT RTD Survey
Conclusions and Recommendations of the Romanian ICT RTD SurveyConclusions and Recommendations of the Romanian ICT RTD Survey
Conclusions and Recommendations of the Romanian ICT RTD SurveyTraian Rebedea
 
Istoria Web-ului - part 2 - tentativ How to Web 2009
Istoria Web-ului - part 2 - tentativ How to Web 2009Istoria Web-ului - part 2 - tentativ How to Web 2009
Istoria Web-ului - part 2 - tentativ How to Web 2009Traian Rebedea
 
Istoria Web-ului - part 1 (2) - tentativ How to Web 2009
Istoria Web-ului - part 1 (2) - tentativ How to Web 2009Istoria Web-ului - part 1 (2) - tentativ How to Web 2009
Istoria Web-ului - part 1 (2) - tentativ How to Web 2009Traian Rebedea
 
Istoria Web-ului - part 1 - tentativ How to Web 2009
Istoria Web-ului - part 1 - tentativ How to Web 2009Istoria Web-ului - part 1 - tentativ How to Web 2009
Istoria Web-ului - part 1 - tentativ How to Web 2009Traian Rebedea
 

More from Traian Rebedea (20)

An Evolution of Deep Learning Models for AI2 Reasoning Challenge
An Evolution of Deep Learning Models for AI2 Reasoning ChallengeAn Evolution of Deep Learning Models for AI2 Reasoning Challenge
An Evolution of Deep Learning Models for AI2 Reasoning Challenge
 
AI @ Wholi - Bucharest.AI Meetup #5
AI @ Wholi - Bucharest.AI Meetup #5AI @ Wholi - Bucharest.AI Meetup #5
AI @ Wholi - Bucharest.AI Meetup #5
 
Deep neural networks for matching online social networking profiles
Deep neural networks for matching online social networking profilesDeep neural networks for matching online social networking profiles
Deep neural networks for matching online social networking profiles
 
Intro to Deep Learning for Question Answering
Intro to Deep Learning for Question AnsweringIntro to Deep Learning for Question Answering
Intro to Deep Learning for Question Answering
 
What is word2vec?
What is word2vec?What is word2vec?
What is word2vec?
 
How useful are semantic links for the detection of implicit references in csc...
How useful are semantic links for the detection of implicit references in csc...How useful are semantic links for the detection of implicit references in csc...
How useful are semantic links for the detection of implicit references in csc...
 
A focused crawler for romanian words discovery
A focused crawler for romanian words discoveryA focused crawler for romanian words discovery
A focused crawler for romanian words discovery
 
Detecting and Describing Historical Periods in a Large Corpora
Detecting and Describing Historical Periods in a Large CorporaDetecting and Describing Historical Periods in a Large Corpora
Detecting and Describing Historical Periods in a Large Corpora
 
Propunere de dezvoltare a carierei universitare
Propunere de dezvoltare a carierei universitarePropunere de dezvoltare a carierei universitare
Propunere de dezvoltare a carierei universitare
 
Automatic plagiarism detection system for specialized corpora
Automatic plagiarism detection system for specialized corporaAutomatic plagiarism detection system for specialized corpora
Automatic plagiarism detection system for specialized corpora
 
Relevance based ranking of video comments on YouTube
Relevance based ranking of video comments on YouTubeRelevance based ranking of video comments on YouTube
Relevance based ranking of video comments on YouTube
 
Opinion mining for social media and news items in Romanian
Opinion mining for social media and news items in RomanianOpinion mining for social media and news items in Romanian
Opinion mining for social media and news items in Romanian
 
PhD Defense: Computer-Based Support and Feedback for Collaborative Chat Conve...
PhD Defense: Computer-Based Support and Feedback for Collaborative Chat Conve...PhD Defense: Computer-Based Support and Feedback for Collaborative Chat Conve...
PhD Defense: Computer-Based Support and Feedback for Collaborative Chat Conve...
 
Importanța algoritmilor pentru problemele de la interviuri
Importanța algoritmilor pentru problemele de la interviuriImportanța algoritmilor pentru problemele de la interviuri
Importanța algoritmilor pentru problemele de la interviuri
 
Web services for supporting the interactions of learners in the social web - ...
Web services for supporting the interactions of learners in the social web - ...Web services for supporting the interactions of learners in the social web - ...
Web services for supporting the interactions of learners in the social web - ...
 
Automatic assessment of collaborative chat conversations with PolyCAFe - EC-T...
Automatic assessment of collaborative chat conversations with PolyCAFe - EC-T...Automatic assessment of collaborative chat conversations with PolyCAFe - EC-T...
Automatic assessment of collaborative chat conversations with PolyCAFe - EC-T...
 
Conclusions and Recommendations of the Romanian ICT RTD Survey
Conclusions and Recommendations of the Romanian ICT RTD SurveyConclusions and Recommendations of the Romanian ICT RTD Survey
Conclusions and Recommendations of the Romanian ICT RTD Survey
 
Istoria Web-ului - part 2 - tentativ How to Web 2009
Istoria Web-ului - part 2 - tentativ How to Web 2009Istoria Web-ului - part 2 - tentativ How to Web 2009
Istoria Web-ului - part 2 - tentativ How to Web 2009
 
Istoria Web-ului - part 1 (2) - tentativ How to Web 2009
Istoria Web-ului - part 1 (2) - tentativ How to Web 2009Istoria Web-ului - part 1 (2) - tentativ How to Web 2009
Istoria Web-ului - part 1 (2) - tentativ How to Web 2009
 
Istoria Web-ului - part 1 - tentativ How to Web 2009
Istoria Web-ului - part 1 - tentativ How to Web 2009Istoria Web-ului - part 1 - tentativ How to Web 2009
Istoria Web-ului - part 1 - tentativ How to Web 2009
 

Recently uploaded

"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
Fwdays
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
Bhaskar Mitra
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
CatarinaPereira64715
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 

Recently uploaded (20)

"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 

Practical machine learning - Part 1

  • 1. Practical Machine Learning (Part I) Traian Rebedea Faculty of Automatic Control and Computers, UPB traian.rebedea@cs.pub.ro
  • 2. Outline • About me • Why machine learning? • Basic notions of machine learning • Sample problems and solutions – Finding vandalism on Wikipedia – Sexual predator detection – Identification of buildings on a map – Change in the meaning of concepts over time • Practical ML – R – Python (sklearn) – Java (Weka) – Others 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 3. About me • Started working in 2003 (year 2 of studies) – J2EE developer • Then created my own company with a colleague (in 2005) – Web projects (PHP, Javascript, etc.), involved in some startups for the technological aspects (e.g. Happyfish TV) • Started teaching at A&C, UPB (in 2006) – TA for Algorithms, Natural Language Processing • Started my PhD (in 2007) – Natural Language Processing, Discourse Analysis, Tehnology-Enhanced Learning • Now I am teaching several lectures: Algorithm Design, Algorithm Design and Complexity, Symbolic and Statistical Learning, Information Retrieval • Working on several topics in NLP: opinion mining, conversational agents, question-answering, culturonomics • Interested in all new projects that use NLP, ML and IR as well as any other “smart” applications 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 4. Why machine learning? • “Simply put, machine learning is the part of artificial intelligence that actually works” (Forbes, link) • Most popular course on Coursera • Popular? – Meaning practical: ML results are nice and easy to show to others – Meaning well-paid/in demand: companies are increasing demand for ML specialists • Large volumes of data, corpora, etc. • Computer science, data science, almost any other domain (especially in research) 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 5. Why ML? • Mix of Computer science and Math (Statistics) skills • Why is math essential to ML? – http://courses.washington.edu/css490/2012.Winter/lecture_slides/02 _math_essentials.pdf • “To get really useful results, you need good mathematical intuitions about certain general machine learning principles, as well as the inner workings of the individual algorithms.” 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 6. Basic Notions of ML • Different tasks in ML • Supervised – Regression – Classification • Unsupervised – Clustering – Dimensionality reduction / visualization • Semi-supervised – Label propagation • Reinforcement learning • Deep learning 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 7. Supervised learning • Dataset consisting of labeled examples • Each example has one or several attributes and a dependent variable (label, output, response, predicted variable) • After building, selection and assessing the model on the labeled data, it is used to find labels for new data 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 8. Supervised learning • Dataset is usually split into three parts – A rule is 50-25-25 • Training set – Used to build models – Compute the basic parameters of the model • Holdout/validation set – Used to find out the best model and adjust some parameters of the model (usually, in order to minimize some error) – Also called model selection • Test set – Used to assess the final model on new data, after model building and selection 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 10. Cross-validation If no of partitions (S) = no of training instances  leave-one-out (only for small datasets) Main disadvantages: - no of runs is increased by a factor of S - Multiple parameters for the same model for different runs 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 11. Regression • Supervised learning when the output variable is continous – This is the usual textbook definition – Also may be used for binary output variables (e.g. Not-spam=0, Spam=1) or output variables that have an ordering (e.g. integer numbers, sentiment levels, etc.) 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 12. • Source: http://smlv.cc.gatech.edu/2010/10/06/linear-regression-and-least- squares-estimation/ 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 13. Classification • Supervised learning when the output variable is discrete (classes) – Can also be used for binary or integer outputs – Even if they are ordered, but classification usually does not account for the order 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 14. • CIFAR dataset: http://www.cs.toronto.edu/~kriz/cifar.html (also image source) 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 15. Classification • Several important types – Binary vs. multiple classes – Hard vs. soft – Single-label vs. multi-label (per example) – Balanced vs. unbalanced classes • Extensions that are semi-supervised – One-class classification – PU (Positive and Unlabeled) learning 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 16. Unsupervised learning • For an unlabeled dataset, infer properties (find “hidden” structure) about the distribution of the objects in the dataset – without any help related to correct answers • Generally, much more data than for supervised learning • There are several techniques that can be applied – Clustering (cluster analysis) – Dimensionality reduction (visualization of data) – Association rules, frequent itemsets 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 17. Clustering • Partition dataset into clusters (groups) of objects such that the ones in the same cluster are more similar to each other than to objects in different clusters • Similarity – The most important notion when clustering – “Inverse of a distance” • Possible objective: approximate the modes of the input dataset distribution 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 18. Clustering • Lots of different cluster models: – Connectivity (distance models) (e.g Hierarchical) – Centroid (e.g. K-means) – Distribution (e.g GMM) – Density of data (e.g. DBSCAN) – Graph (e.g. cliques, community detection) – Subspace (e.g. biclustering) – … 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 19. • Source: http://genomics10.bu.edu/terrence/gems/gems.html 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 20. • Source: https://sites.google.com/site/statsr4us/advanced/multivariate/cluster 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 21. Dimensionality reduction • Usually, unsupervised datasets are large both in number of examples and in number of attributes (features) • Reduce the number of features in order to improve (human) readability and interpretation of data • Mainly linked to visualization – Also used for feature selection, data compression or denoising 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 22. • Source: http://www.ifs.tuwien.ac.at/ifs/research/pub_pdf/rau_ecdl01.pdf 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 23. Semi-supervised learning • Mix of labeled and unlabeled data in the dataset • Enlarge the labeled dataset with unlabeled instances for which we might determine the correct label with a high probability • Unlabeled data is much cheaper or simpler to get than labeled data – Human annotation either requires experts or is not challenging (it is boring) – Annotation may also require specific software or other types of devices – Students (or Amazon Mturk users) are not always good annotators • More information: http://pages.cs.wisc.edu/~jerryzhu/pub/sslicml07.pdf 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 24. Label propagation • Start with the labeled dataset • Want to assign labels to some (or all) of the unlabeled instances • Assumption: “data points that are close have similar labels” • Build a fully connected graph with both labeled and unlabeled instances as vertices • The weight of an edge represents the similarity between the points (or inverse of a distance) • Repeat until convergence – Propagate labels through edges (larger weights allow a more quick propagation) – Nodes will have soft labels • More info: http://lvk.cs.msu.su/~bruzz/articles/classification/zhu02learning.pdf 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 25. • Source: http://web.ornl.gov/sci/knowledgediscovery/Projects.htm 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 26. Basic Notions of ML • How to asses the performance of a model? • Used to choose the best model • Supervised learning – Assessing errors of the model on the validation and test sets 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 27. Performance measures - supervised • Regression – Mean error, mean squared error (MSE), root MSE, etc. • Classification – Accuracy – Precision / recall (possibly weighted) – F-measure – Confusion matrix – TP, TN, FP, FN – AUC / ROC 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 28. • Source: http://www.resacorp.com/lsmse2.htm 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 29. • Source: http://en.wikipedia.org/wiki/Sensitivity_and_specificity 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 30. Basic Notions of ML • Overfitting – Low error on training data – High error on (unseen/new) test data – The model does not generalize well from the training data on unseen data – Possible causes: • Too few training examples • Model too complex (too many parameters) • Underfitting – High error both on training and test data – Model is too simple to capture the information in the training data 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 31. • Source: http://gerardnico.com/wiki/data_mining/overfitting 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 32. Sample problems and solutions • Chosen from prior experience • May not be the most interesting problems for all (any) of you • I have tried to present 4 problems that are somehow diverse in solution and simple to explain • All solutions are state-of-the-art (that means not the very best one, but among the top) – Very best solution may be found in research papers 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 33. Finding vandalism on Wikipedia • Work with Dan Cioiu • Vandalism = editing in a malicious manner that is intentionally disruptive • Dataset used at PAN 2011 – Real articles extracted from Wikipedia in 2009 and 2010 – Training data: 32,000 edits – Test data: 130,000 edits • An edit consists of: – the old and new content of the article – author information – revision comment and timestamp • We chose to use only the actual text in the revision (no author and timestamp information) to predict vandalism 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 34. Features 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 35. Using several classifiers 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 36. Cross-validation results 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 37. Combining results using a meta- classifier 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 38. Final results on test set • Source: http://link.springer.com/chapter/10.1007%2F978-3-642-40495-5_32 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 39. Main conclusions • Feature extraction is important in ML – Which are the best features? • Syntactic and context analysis are the most important • Semantic analysis is slow but increases detection by 10% • Using features related to users would further improve our results • A meta-classifier sometimes improves the results of several individual classifiers – Not always! Maybe another discussion on this… 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 40. Sexual predator detection • Work with Claudia Cardei • Automatically detect sexual predators in a chat discussion in order to alert the parents or the police • Corpus from PAN 2012 – 60000+ chat conversations (around 10000 after removing outliers) – 90000+ users – 142 sexual predators • Sexual predator is used “to describe a person seen as obtaining or trying to obtain sexual contact with another person in a metaphorically predatory manner” (Wikipedia) 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 41. Problem description • Decide either a user is a predator or not • Use only the chats that have a predator (balanced corpus: 50% predators - 50% victims) • One document for each user • How to determine the features? 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 42. Simple solution • “Bag of words” (BoW) features with tf-idf weighting • Feature extraction using mutual information • MLP (neural network) with 500 features: 89% 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 43. Going beyond • The should be other features that are discriminative for predators • Investigated behavioral features: – Percentage of questions and inquiries initiated by an user – Percentage of negations – Expressions that might denote an underage user (“don’t know”, “never did”,…) – The age of the user if found in a chat reply (“asl” like answers) – Percentage of slang words – Percentage of words with a sexual connotation – Readability scores (Flesch) – Who started the conversation • AdaBoost (with DecisionStump): 90% • Random Forest: 93% 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 44. Main conclusions • Feature selection is useful when the number of features is large – Reduced from 10k+ features to hundreds/thousands – We used MI, there are others techniques • Fewer features can be as descriptive as (or even more descriptive than) the BoW model – Difficult to choose – Need additional computations and information – Need to understand the dataset 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 45. Identification of buildings on a map • Work with Alexandra Vieru, Paul Vlase • Given an aerial image, identify the buildings • Non-military purposes: – Estimate population density – Identify main landmarks (shopping malls, etc.) – Identify new buildings over time • Features – HOG descriptors – 3x3 cells – Histogram of RGB – 5 bins, normalized 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 46. 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 47. Preliminary results • SVM classifier • Train: 2276 images, Test: 637 images 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 48. Main conclusions • Verify which are the best features – If this is possible • Must verify for overfitting using train and test corpus • Each task can have a baseline (a simple classification) method – Here it might be SVM with RGB histogram features • Actually this could be treated as a one-class classification (or PU learning) problem – There are much more negative instances than positive ones 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 49. Change in the meaning of concepts over time • Work with Costin Chiru, Bianca Dinu, Diana Mincu • Assumption: – Different meanings of words should be identifiable based on the other words that are associated with them – These other words are changing over time thus denoting a change in the meaning (or usage of the meanings) • Using the Google Books n-grams corpus – Dependency information – 5-grams information centered on the analyzed word 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 50. • Source: https://books.google.com/ngrams (Query: “gay=>*_NOUN”) 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 51. • Clustering using tf-idf over dependencies of the analyzed word • Each distinct year is considered a different document and each dependency is a feature/term • Example for the word “gay”: – Cluster 1 (1600 - 1925): signature {more, lively, happy, brilliant, cheerful} – Cluster 2 (1923-1969): signature {more, lively, happy, cheerful, be} – Cluster 3 (1981 - today): signature {lesbian, bisexual, anti, enolum, straight} 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 52. Practical ML • Brief discussion of alternatives that might be used by a programmer • The main idea is to show that it is simple to start working on a ML task • However, it is more difficult to achieve very good results – Experience – Theoretical underpinnings of models – Statistical background 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 53. R language • Designed for “statistical computing” • Lots of packages and functions for ML & statistics • Simple to output visualizations • Easy to write code, short • It is kind of slow on some tasks • Need to understand some new concepts: data frame, factor, etc. • You may find the same functionality in several packages (e.g. NaïveBayes, Cohen’s Kappa, etc.) 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 54. Python (sklearn) • sklearn (scikit-learn) package developed for ML • Uses existing packages numpy, scipy • Started as a GSOC project in 2007 • It is new, implemented efficiently (time and memory), lots of functionality • Easy to write code – Naming conventions, parameters, etc. 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 55. • Source: http://peekaboo-vision.blogspot.ro/2013/01/machine-learning-cheat- sheet-for-scikit.html 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 56. Java (Weka) • Similar to scikit-learn for Python • Older (since 1997) and more stable than scikit-learn • Some implementations are (were) a little bit more inefficient, especially related to memory consumption • However, usually it is fast, accurate, and has lots of other useful utilities • It also has a GUI for getting some results fast without writing any code • Lots of people are debating whether Weka or scikit-learn is better. I don’t want to get into this discussion as it depends on your task and resources. 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 57. Not discussed • RapidMiner – Easy to use – GUI interface for building the processing pipeline – Can write some code/plugins • Matlab/Octave – Numeric computing – Paid vs. free • SPSS/SAS/Stata – Somehow similar to R, faster, more robust, expensive • Many others: – Apache Mahout / MALLET / NLTK / Orange / MLPACK – Specific solutions for various ML models 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 58. Conclusions • Lots of interesting ML tasks and data available – Do you have access to new and interesting data in your business? • Several easy to use programming alternatives • Understand the basics • Enhance your maths (statistics) skills for a better ML experience • Hands-on experience and communities of practice – www.kaggle.com (and other similar initiatives) 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 59. Thank you! traian.rebedea@cs.pub.ro 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49 _____ _____