Smart recommendation engine for things to do at a destination
Natural Language Processing and Machine Learning
How to automatically categorize tours and activities?
July 2nd 2018
Introduction
MyLittleAdventure
@mylitadventure
Johnny RAHAJARISON
@brainstorm_me
johnny.rahajarison@mylittleadventure.com
Agenda
Introduction to machine learning
Why is Natural Language Processing so hard?
How do we process text?
Let’s try it out
Go further
What’s Machine Learning?
Software that does something without being explicitly programmed to, just by learning from examples
The same software can be used for various tasks
It learns from experience with respect to some task and performance measure, and improves with experience
Unsupervised algorithms
Clustering
Anomaly detection
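Clustering groups unlabeled examples by similarity. A minimal sketch of k-means (Lloyd's algorithm), one standard clustering method chosen here purely as an illustration; the talk does not say which clustering algorithm it has in mind, and the points are made up.

```python
import math


def k_means(points, k, iterations=100):
    """Lloyd's k-means: assign each point to its nearest centroid, then move each
    centroid to the mean of its points, until the centroids stop changing."""
    centroids = [list(p) for p in points[:k]]  # naive initialization: the first k points
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = mean of its cluster; an empty cluster keeps its old centroid.
        new_centroids = [
            [sum(axis) / len(cluster) for axis in zip(*cluster)] if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2-D points forming two obvious groups.
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, clusters = k_means(points, k=2)
print(clusters)
# [[(1, 1), (1.5, 2), (1, 1.5)], [(8, 8), (9, 8.5), (8.5, 9)]]
```

No labels are given: the two groups emerge from distances alone.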
Supervised algorithms
Classification
Regression
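Regression is the simplest supervised case to show end to end: learn from labeled (input, target) pairs, then predict the target for a new input. A minimal ordinary-least-squares sketch for a single input variable; the data is made up.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


# Hypothetical labeled examples: inputs xs with known targets ys.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)       # 2.0 1.0
print(slope * 5 + intercept)  # 11.0, a prediction for an input never seen in training
```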
You said text, right?
Obviously, you said text
Not numbers
Context
Polysemy
Synonyms
Enantiosemy
Neologisms
Sarcasm
Names
Rare words
Common sense
Dialects
Informal language / abbreviations
Ambiguity?
I saw a man on a hill with a telescope.
Text should be prepared
Let’s clean our text first
['one', 'morn', 'when', 'gregor', 'samsa', 'woke', 'from', 'troubl', 'dream', 'he', 'found',
'himself', 'transform', 'in', 'hi', 'bed', 'into', 'a', 'horribl', 'vermin', 'He', 'lay', 'on',
'hi', 'armour-lik', 'back', 'and', 'if', 'he', 'lift', 'hi', 'head', 'a', 'littl', 'he', 'could',
'see', 'hi', 'brown', 'belli', 'slightli', 'dome', 'and', 'divid', 'by', 'arch', 'into', 'stiff',
'section', 'the', 'bed', 'wa', 'hardli', 'abl', 'to', 'cover', 'it', 'and', 'seem', 'readi', 'to',
'slide', 'off', 'ani', 'moment', 'hi', 'mani', 'leg', 'piti', 'thin', 'compar', 'with', 'the',
'size', 'of', 'the', 'rest', 'of', 'him', 'wave', 'about', 'helplessli', 'as', 'he', 'look',
'what', "'s", 'happen', 'to']
✓ Tokenize sentences
✓ Tokenize words
✓ Transliterate
✓ Normalize
✓ Filter out punctuation, special characters and stop words
✓ Use a stemmer and / or a lemmatizer ("be" = am, are, is; "vari" = variation, vary, varies, variables)
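Most of this checklist can be sketched with Python's standard library. This is a simplified illustration, not the pipeline that produced the token list above: the stop-word list is a tiny hypothetical one, sentence splitting is a naive regex, and stemming / lemmatizing is left out (NLTK provides `PorterStemmer` and `WordNetLemmatizer` for that step).

```python
import re
import unicodedata

# Hypothetical, deliberately tiny stop-word list; real projects use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "on", "with", "he", "his", "was"}


def transliterate(text: str) -> str:
    """Drop accents, e.g. 'Café' -> 'Cafe' (NFKD decomposition, then remove combining marks)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def split_sentences(text: str) -> list[str]:
    """Naive sentence tokenizer: split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def clean(text: str) -> list[list[str]]:
    """Tokenize sentences and words, transliterate, lowercase, then filter."""
    cleaned = []
    for sentence in split_sentences(text):
        normalized = transliterate(sentence).lower()
        # Keeping only [a-z0-9] runs drops punctuation and special characters.
        words = re.findall(r"[a-z0-9]+", normalized)
        cleaned.append([w for w in words if w not in STOP_WORDS])
    return cleaned


print(clean("Café with a view. He loved the Eiffel Tower!"))
# [['cafe', 'view'], ['loved', 'eiffel', 'tower']]
```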
A bag of words
"John","likes","to","watch","movies","Mary","likes","movies","too"
{"John":1,"likes":2,"to":1,"watch":1,"movies":2,"Mary":1,"too":1}
{131:1, 132:2, 133:1, 134:1, 135:2, 136:1, 137:1}
[1, 2, 1, 1, 2, 1, 1]
Each unique word in our dictionary corresponds to a feature
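Building the dictionary and the count vector takes a few lines. A minimal sketch; here indices start at 0 in order of first appearance, whereas the slide uses ids 131 to 137, but the resulting counts are the same.

```python
from collections import Counter


def build_vocabulary(documents):
    """Give each unique token an integer index, in order of first appearance."""
    vocabulary = {}
    for tokens in documents:
        for token in tokens:
            vocabulary.setdefault(token, len(vocabulary))
    return vocabulary


def bag_of_words(tokens, vocabulary):
    """Count vector: position i holds how often the word with index i occurs."""
    vector = [0] * len(vocabulary)
    for token, count in Counter(tokens).items():
        if token in vocabulary:  # words unseen when building the vocabulary are ignored
            vector[vocabulary[token]] = count
    return vector


tokens = ["John", "likes", "to", "watch", "movies", "Mary", "likes", "movies", "too"]
vocabulary = build_vocabulary([tokens])
print(vocabulary)  # {'John': 0, 'likes': 1, 'to': 2, 'watch': 3, 'movies': 4, 'Mary': 5, 'too': 6}
print(bag_of_words(tokens, vocabulary))  # [1, 2, 1, 1, 2, 1, 1]
```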
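TF-IDF
TF (Term Frequency)
$$\mathrm{TF}(t, d) = \frac{\text{occurrences of term } t \text{ in document } d}{\text{total words in document } d}$$
IDF (Inverse Document Frequency)
$$\mathrm{IDF}(t) = \log\frac{\text{count of documents}}{\text{count of documents where } t \text{ appears}}$$
$$\mathrm{TF\text{-}IDF}(t, d) = \mathrm{TF}(t, d) \times \mathrm{IDF}(t)$$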
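The TF and IDF formulas translate directly into code. A minimal sketch using the natural logarithm (the log base only rescales IDF); it assumes the term appears in at least one document, otherwise IDF divides by zero. Libraries such as scikit-learn's `TfidfVectorizer` use a smoothed IDF and normalize each vector by default, so their numbers differ from this plain version. The tiny corpus is made up from activity titles.

```python
import math
from collections import Counter


def tf(term, document):
    """Term frequency: occurrences of the term divided by the document length."""
    return Counter(document)[term] / len(document)


def idf(term, corpus):
    """Inverse document frequency: log(number of documents / documents containing the term)."""
    containing = sum(1 for document in corpus if term in document)
    return math.log(len(corpus) / containing)


def tf_idf(term, document, corpus):
    return tf(term, document) * idf(term, corpus)


corpus = [
    ["eiffel", "tower", "with", "dinner"],
    ["skip", "the", "line", "eiffel", "tower"],
    ["dinner", "cruise", "with", "champagne"],
]
print(round(tf_idf("dinner", corpus[0], corpus), 4))  # 0.1014
# "eiffel" appears in two documents, "skip" in only one, so "skip" weighs more:
print(tf_idf("eiffel", corpus[1], corpus) < tf_idf("skip", corpus[1], corpus))  # True
```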
Another way: use words embeddings
Word embeddings capture relative meaning
Words are represented as vectors whose geometry reflects how they relate
Paris - France + China = Beijing
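"Paris - France + China = Beijing" is vector arithmetic followed by a nearest-neighbour search using cosine similarity. A minimal sketch, assuming made-up 3-dimensional vectors chosen so the arithmetic works out exactly; real pretrained vectors (such as Stanford's GloVe) are learned from text and only approximately satisfy such analogies.

```python
import math

# Hypothetical toy vectors (axes loosely: "is a capital", "Asia", "Europe").
EMBEDDINGS = {
    "paris":   [1.0, 0.0, 1.0],
    "france":  [0.0, 0.0, 1.0],
    "china":   [0.0, 1.0, 0.0],
    "beijing": [1.0, 1.0, 0.0],
    "movies":  [0.2, 0.2, 0.2],
}


def cosine(u, v):
    """Cosine similarity: 1.0 means the vectors point in the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def analogy(a, b, c, embeddings):
    """Word whose vector is closest to vec(a) - vec(b) + vec(c), excluding the query words."""
    target = [x - y + z for x, y, z in zip(embeddings[a], embeddings[b], embeddings[c])]
    candidates = [w for w in embeddings if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(target, embeddings[w]))


print(analogy("paris", "france", "china", EMBEDDINGS))  # beijing
```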
Example: the "movies" vector
movies -0.34582 0.057328 0.1328 0.22376 0.10161 0.52948 -0.30199 0.45676 -0.37643 -0.51857 0.67325 -0.012444 -0.099021 0.43823
-0.28905 -1.0183 -0.0062387 -0.32893 0.55547 0.44181 0.31524 0.29909 0.51605 0.32109 0.021471 0.67909 0.037333 -0.42321
0.56517 0.47979 -0.63307 0.1126 0.0050579 -0.18879 -0.87478 -0.29481 -0.70824 -0.072256 0.1614 0.34523 0.61872 -0.036932
-0.43343 0.29604 0.18671 -0.33384 0.50628 -0.013876 0.46303 0.19298 0.16783 -0.55786 -0.16947 -0.27382 0.31027 0.10974 0.12819
0.23538 0.038003 -0.077524 -0.23291 0.044094 0.36325 0.20611 0.55571 -0.022715 -0.04996 0.32312 0.44176 0.25272 0.15159
0.22682 -0.10425 0.73375 0.66572 -0.55885 0.082242 -0.13387 0.31042 -0.38443 -0.38631 -0.7518 0.6706 -0.17495 0.056298 0.82038
0.41573 -0.12316 0.28437 -0.19324 -0.13485 0.28862 -0.37817 0.37268 0.01515 0.39123 0.059544 -0.074006 -0.17152 -1.1523
0.26541 0.082314 0.17914 -0.089861 -0.20884 0.29248 -0.60263 -0.0024285 0.24521 -0.5427 -0.074404 0.14034 0.0085891 -0.37351
0.23573 0.1493 -0.14038 0.11725 -0.51013 -0.64531 0.1329 0.075911 -0.10827 0.22077 -0.086253 0.4096 0.052314 0.40964 -0.030506
0.30572 -0.40694 -0.11773 0.21586 0.14448 0.23419 -0.23401 0.06811 0.29447 -0.4086 0.88777 -0.19477 -0.18847 0.10324 -0.24593
-0.10173 -0.43226 -0.091173 -0.092602 -0.23385 -0.16498 0.22057 0.11014 -0.25018 -0.43089 0.19759 0.11762 -0.045432 0.13331
0.032684 -0.21702 0.35082 -0.40466 -0.02425 -0.22637 0.0094442 0.72848 0.10286 0.27199 -0.40396 0.22366 -0.039481 -0.17164
-1.7307 0.3706 -0.13711 0.2295 -0.34432 -0.024381 -0.093941 -0.29861 -0.33164 -0.12931 -0.11218 0.047052 0.40442 0.0043382
0.22364 -0.31537 0.1987 -0.46108 -0.35126 -0.14584 0.17765 0.10869 -0.14434 -0.6152 -0.5874 0.014977 -0.1691 -0.46926 1.3959
-0.15449 -0.24167 -0.002575 0.4758 -0.044786 -0.21345 0.22983 -0.34356 -0.43402 -0.45719 -0.29775 -0.053295 0.50132 -0.24066
0.45762 0.095118 0.21008 0.71912 0.028577 -0.64176 0.1314 0.21556 -0.12536 -0.3298 -0.07123 0.35428 -0.3787 0.12348 -0.060439
0.19217 -0.29951 -0.73189 -0.33589 0.449 0.22654 1.0404 0.019947 -0.74711 0.071042 0.067809 0.36341 -0.32579 -0.11085 -0.24507
-0.13518 -0.44326 0.022784 -0.57252 0.33756 -0.23411 -0.062955 -0.35353 1.0497 -0.14938 -0.57772 0.27652 -0.28787 -0.0040621
0.25113 0.40818 -0.13227 0.016032 -0.55465 0.0021098 -0.27755 0.16082 -0.055202 0.21104 0.58412 0.42842 -0.047253 0.10542
0.027478 0.30911 0.31792 -1.8564 0.014412 -0.29748 -0.70103 -0.068219 -0.53071 -0.10661 0.028596 0.081479 0.34323 -0.047833
0.023129 0.028697 0.33859 -0.20706 -0.0025571 -0.18267 -0.26946 -1.1064 -0.31228 -0.13101 0.1161 -0.068647 -0.09988
[[…], 2 * […], […], […], 2 * [-0.34582, 0.057328, …, 0.22376, 0.10161], […], […]]
{"John":1,"likes":2,"to":1,"watch":1,"movies":2,"Mary":1,"too":1}
{131:1, 132:2, 133:1, 134:1, 135:2, 136:1, 137:1}
[1, 2, 1, 1, 2, 1, 1]
Embedding vector for "movies"
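The slide weights each word's embedding by its count. One common way to turn those weighted vectors into a single fixed-length feature vector per document, assumed here since the slide does not say how they are combined, is the count-weighted average. The 2-dimensional vectors are hypothetical.

```python
def document_vector(counts, embeddings):
    """Count-weighted average of the vectors of the words that have an embedding."""
    dimension = len(next(iter(embeddings.values())))
    total = [0.0] * dimension
    weight = 0
    for word, count in counts.items():
        vector = embeddings.get(word)
        if vector is None:  # out-of-vocabulary words are skipped
            continue
        total = [t + count * x for t, x in zip(total, vector)]
        weight += count
    return [t / weight for t in total] if weight else total


# Hypothetical 2-dimensional embeddings, for illustration only.
embeddings = {"likes": [1.0, 0.0], "movies": [0.0, 1.0], "watch": [0.5, 0.5]}
counts = {"John": 1, "likes": 2, "to": 1, "watch": 1, "movies": 2, "Mary": 1, "too": 1}
print(document_vector(counts, embeddings))  # [0.5, 0.5]
```

Averaging keeps the length fixed whatever the document size, which is what a classifier expects as input.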
Let’s predict
Recipe
Prepare training / test data (files, database, cache, data flow)
Select the model and its (hyper)parameters
Train the algorithm
Use or store your trained estimator
Make predictions
Measure accuracy and precision
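The first and last steps of the recipe, splitting data into training and test sets and measuring accuracy and precision, can be sketched with the standard library alone. These hand-written helpers are illustrative stand-ins; scikit-learn ships tested equivalents (`sklearn.model_selection.train_test_split`, `sklearn.metrics.accuracy_score`, `sklearn.metrics.precision_score`). The labels and predictions are made up.

```python
import random


def train_test_split(samples, test_ratio=0.25, seed=42):
    """Shuffle a copy with a fixed seed, then hold out the last test_ratio share for testing."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]


def accuracy(y_true, y_pred):
    """Share of predictions equal to the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true, y_pred, positive=1):
    """Among items predicted `positive`, the share that truly are positive."""
    hits = [t == positive for t, p in zip(y_true, y_pred) if p == positive]
    return sum(hits) / len(hits) if hits else 0.0


train, test = train_test_split(range(8))
print(len(train), len(test))  # 6 2

# Hypothetical true labels and predictions (1 = "food", 0 = "not food").
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(round(accuracy(y_true, y_pred), 3))   # 0.667
print(round(precision(y_true, y_pred), 3))  # 0.667
```

Fixing the seed keeps the split reproducible, so two models can be compared on exactly the same test set.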
Collect our training & test dataset
Food Label Vectorized
Eiffel Tower with Dinner
[ 0., 0., 0., 0., 0.5, 0.5, 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0.5, 0., 0.5],
Skip the line Eiffel Tower
[ 0., 0., 0., 0., 0., 0.3967171 , 0., 0., 0., 0.47792296, 0., 0.,
0., 0., 0., 0.47792296, 0.47792296, 0., 0., 0.3967171 , 0., 0.],
Louvre Museum fast track
[ 0., 0., 0., 0., 0., 0., 0.5, 0., 0., 0., 0.5, 0.5, 0., 0., 0.,
0., 0., 0., 0., 0., 0.5, 0.],
Gourmet tour of Paris
[ 0., 0., 0., 0., 0., 0., 0., 0.58910044, 0., 0., 0., 0.,
0.41798437, 0.48900396, 0., 0., 0., 0., 0.48900396, 0., 0., 0.],
Segway tour of city’s highlights
[ 0., 0., 0.48838773, 0., 0., 0., 0., 0., 0.48838773, 0., 0., 0.,
0.3465257 , 0., 0.48838773, 0., 0., 0., 0.40540376, 0., 0., 0.],
Dinner cruise with Champagne
[ 0., 0.54408243, 0., 0.54408243, 0.45163515, 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.45163515],
Aquarium of Paris ticket
[ 0.55967542, 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0.39710644, 0.46457866, 0., 0., 0., 0.55967542, 0., 0., 0., 0.]
… …
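The vectors above appear consistent with scikit-learn's `TfidfVectorizer` default settings: tokens of two or more word characters, lowercasing, an alphabetically sorted vocabulary, raw counts times a smoothed IDF of ln((1 + n) / (1 + df)) + 1, and L2 normalization of each row. A standard-library re-implementation of just those defaults, run on the seven titles listed, reproduces the values shown (for example 0.47792296 and 0.3967171 for "Skip the line Eiffel Tower"). It is a sketch for understanding, not a replacement for the library.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase, then keep runs of two or more word characters."""
    # e.g. "city's" -> ["city"]: the lone "s" is too short to be kept
    return re.findall(r"\b\w\w+\b", text.lower())


def tfidf_vectorize(documents):
    """Raw counts times smoothed IDF, each row L2-normalized; vocabulary sorted alphabetically."""
    tokenized = [tokenize(d) for d in documents]
    vocabulary = sorted({t for tokens in tokenized for t in tokens})
    n = len(documents)
    df = Counter(t for tokens in tokenized for t in set(tokens))  # document frequency
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocabulary}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        raw = [counts[t] * idf[t] for t in vocabulary]
        norm = math.sqrt(sum(x * x for x in raw))
        vectors.append([x / norm if norm else 0.0 for x in raw])
    return vocabulary, vectors


titles = [
    "Eiffel Tower with Dinner",
    "Skip the line Eiffel Tower",
    "Louvre Museum fast track",
    "Gourmet tour of Paris",
    "Segway tour of city's highlights",
    "Dinner cruise with Champagne",
    "Aquarium of Paris ticket",
]
vocabulary, vectors = tfidf_vectorize(titles)
print(len(vocabulary))  # 22
row = vectors[1]  # "Skip the line Eiffel Tower"
print({w: round(x, 4) for w, x in zip(vocabulary, row) if x})
# {'eiffel': 0.3967, 'line': 0.4779, 'skip': 0.4779, 'the': 0.4779, 'tower': 0.3967}
```

Words shared by several titles ("eiffel", "tower") get a lower weight than words unique to one title ("skip", "line").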
Choose a classifier algorithm
A few recommendations
Naive Bayes / Logistic Regression
Decision Trees
Random Forest
Gradient Boosting
SVM
Neural Networks
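Naive Bayes, first on the list, is a common baseline for short texts such as activity titles. A from-scratch multinomial Naive Bayes sketch with add-one smoothing; the "food" / "other" labels given to the titles from the training slide are assumptions for illustration, not the talk's actual labels, and in practice a library class such as scikit-learn's `MultinomialNB` would be used.

```python
import math
from collections import Counter


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes over token lists, with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        self.classes = sorted(set(labels))
        self.vocabulary = {t for doc in documents for t in doc}
        self.log_prior, self.log_likelihood = {}, {}
        for c in self.classes:
            class_docs = [doc for doc, label in zip(documents, labels) if label == c]
            self.log_prior[c] = math.log(len(class_docs) / len(documents))
            counts = Counter(t for doc in class_docs for t in doc)
            denominator = sum(counts.values()) + len(self.vocabulary)
            self.log_likelihood[c] = {
                t: math.log((counts[t] + 1) / denominator) for t in self.vocabulary
            }
        return self

    def predict_proba(self, document):
        """Posterior probability per class; tokens never seen in training are ignored."""
        scores = {
            c: self.log_prior[c]
            + sum(self.log_likelihood[c][t] for t in document if t in self.vocabulary)
            for c in self.classes
        }
        top = max(scores.values())  # subtract the max before exp() to avoid underflow
        weights = {c: math.exp(s - top) for c, s in scores.items()}
        total = sum(weights.values())
        return {c: w / total for c, w in weights.items()}


# Titles from the training slide; the labels are assumed for illustration.
training = [
    ("Eiffel Tower with Dinner", "food"),
    ("Skip the line Eiffel Tower", "other"),
    ("Louvre Museum fast track", "other"),
    ("Gourmet tour of Paris", "food"),
    ("Segway tour of city's highlights", "other"),
    ("Dinner cruise with Champagne", "food"),
    ("Aquarium of Paris ticket", "other"),
]
documents = [title.lower().split() for title, _ in training]
labels = [label for _, label in training]
model = MultinomialNaiveBayes().fit(documents, labels)

p = model.predict_proba("dinner show in paris".split())
print(round(p["food"], 2))  # 0.76
```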
Let’s measure
Food Label Prediction
Eiffel Tower with Dinner 0.83
Gourmet tour of Paris 0.96
Dinner cruise with Champagne 1.0
Segway tour of city’s highlights 0.03
Orsay dedicated entrance 0.02
3 course meal in Eiffel Tower 0.97
Cooking class in Paris 0.89
Moulin Rouge Paris dinner show 0.91
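The prediction column reads as a probability that each activity belongs to the Food category (an interpretation; the slide does not say). Turning scores into categories needs a threshold: 0.5 is a common default, and raising it keeps only the most confident matches, trading recall for precision. A small sketch using the scores from the table:

```python
# Scores copied from the prediction table.
predictions = {
    "Eiffel Tower with Dinner": 0.83,
    "Gourmet tour of Paris": 0.96,
    "Dinner cruise with Champagne": 1.0,
    "Segway tour of city's highlights": 0.03,
    "Orsay dedicated entrance": 0.02,
    "3 course meal in Eiffel Tower": 0.97,
    "Cooking class in Paris": 0.89,
    "Moulin Rouge Paris dinner show": 0.91,
}


def tag_food(scores, threshold=0.5):
    """Titles whose score reaches the threshold get the Food category."""
    return [title for title, score in scores.items() if score >= threshold]


print(len(tag_food(predictions)))  # 6
print(tag_food(predictions, threshold=0.9))
# ['Gourmet tour of Paris', 'Dinner cruise with Champagne', '3 course meal in Eiffel Tower', 'Moulin Rouge Paris dinner show']
```

At 0.9, "Eiffel Tower with Dinner" (0.83) and "Cooking class in Paris" (0.89) are no longer tagged as Food.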
Training set
Real data
Go further
There is much more
Cross-validation dataset
N-grams
Incorrect user content
Misspellings & typos
Hard to get training data
Harder languages or transliteration issues
Memory / computing limitations
Online learning & stacking
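Two items from this list are easy to sketch: n-grams (adjacent word sequences such as "skip the", which let a bag of words keep some word order) and k-fold cross-validation (each fold takes a turn as the validation set while the model trains on the rest). Minimal standard-library versions, illustrative only; scikit-learn's `CountVectorizer(ngram_range=...)` and `KFold` cover the same ground.

```python
def ngrams(tokens, n):
    """All contiguous n-token sequences, joined with spaces (bigrams for n=2)."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for k contiguous, nearly equal folds."""
    # The first n_samples % k folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, validation
        start += size


print(ngrams(["skip", "the", "line"], 2))  # ['skip the', 'the line']
print([validation for _, validation in k_fold_indices(5, 2)])  # [[0, 1, 2], [3, 4]]
```

Averaging a metric over all k folds gives a steadier estimate than a single train/test split, which matters when training data is hard to get.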
Some resources
https://www.slideshare.net/mylittleadventure/introduction-machine-learning-by-mylittleadventure
http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html
https://bit.ly/2uL954v
NLTK
Book
Stanford’s GloVe
Dataset
Course
Andrew Ng (Coursera)
Platform
Libraries
Thank you
July 2nd 2018
Questions?
@mylitadventure
@brainstorm_me
johnny.rahajarison@mylittleadventure.com

Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 

SophiaConf 2018 - J. Rahajarison (My Little Adventure)

  • 1. Smart recommendation engine of things to do in destination: Natural Language Processing and Machine Learning. How to automatically categorize tours and activities? July 2nd 2018
  • 3. Agenda: Introduction to machine learning. Why is Natural Language Processing so hard? How do we process text? Let’s try it out. Go further.
  • 4. What is Machine Learning? Software that does something without being explicitly programmed to, just by learning from examples. The same software can be used for various tasks. It learns from experience with respect to some task and performance measure, and improves through experience.
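  • 7. You said text, right?
  • 8. Obviously, you said text. Not numbers. Text brings its own difficulties: context, polysemy, synonyms, enantiosemy, neologisms, sarcasm, names, rare words, common sense, dialects, informal language / abbreviations.
  • 9. Ambiguity? I saw a man on a hill with a telescope.
  • 10. Ambiguity? I saw a man on a hill with a telescope. (Did I use the telescope, or does the man have it? Was I on the hill, or was he? Several readings are grammatically valid.)
  • 11. Text should be prepared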
  • 12. Let’s clean our text first. Example of stemmed tokens from the opening of Kafka’s The Metamorphosis: ['one', 'morn', 'when', 'gregor', 'samsa', 'woke', 'from', 'troubl', 'dream', 'he', 'found', 'himself', 'transform', 'in', 'hi', 'bed', 'into', 'a', 'horribl', 'vermin', 'He', 'lay', 'on', 'hi', 'armour-lik', 'back', 'and', 'if', 'he', 'lift', 'hi', 'head', 'a', 'littl', 'he', 'could', 'see', 'hi', 'brown', 'belli', 'slightli', 'dome', 'and', 'divid', 'by', 'arch', 'into', 'stiff', 'section', 'the', 'bed', 'wa', 'hardli', 'abl', 'to', 'cover', 'it', 'and', 'seem', 'readi', 'to', 'slide', 'off', 'ani', 'moment', 'hi', 'mani', 'leg', 'piti', 'thin', 'compar', 'with', 'the', 'size', 'of', 'the', 'rest', 'of', 'him', 'wave', 'about', 'helplessli', 'as', 'he', 'look', 'what', "'s", 'happen', 'to']
    ✓ Tokenize sentences
    ✓ Tokenize words
    ✓ Transliterate
    ✓ Normalize
    ✓ Filter out punctuation, special characters and stop words
    ✓ Use a stemmer and/or a lemmatizer ("be" = am, are, is; "vari" = variation, vary, varies, variables)
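A minimal, standard-library sketch of the cleaning steps listed on slide 12. It assumes a naive regex sentence splitter, lower-casing as the normalization step, a tiny hypothetical stop-word list and a crude hypothetical suffix stemmer. It is not the Porter-style stemmer that the token list above appears to come from ('troubl', 'slightli'); libraries such as NLTK provide real stemmers and lemmatizers.

```python
import re
import unicodedata

# Hypothetical toy stop-word list; real projects use a curated list per language.
STOP_WORDS = {"a", "and", "as", "by", "from", "he", "in", "into", "of", "on", "the", "to", "with"}

# Hypothetical crude (suffix, replacement) rules. This is NOT the Porter stemmer.
SUFFIX_RULES = [("ation", ""), ("ies", "i"), ("ing", ""), ("ed", ""), ("es", ""), ("s", ""), ("y", "i")]


def split_sentences(text):
    """Naive sentence tokenizer: split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def transliterate(text):
    """Drop accents: decompose characters (NFKD), then remove combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tokenize_words(sentence):
    """Keep runs of lower-case ASCII letters (inner hyphens/apostrophes allowed); drops punctuation."""
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", sentence)


def crude_stem(word):
    """Apply the first matching suffix rule, keeping a stem of at least 3 letters."""
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)] + replacement
    return word


def clean(text):
    """Sentences -> transliterated, lower-cased, filtered, stemmed word lists."""
    cleaned = []
    for sentence in split_sentences(text):
        normalized = transliterate(sentence).lower()
        words = [w for w in tokenize_words(normalized) if w not in STOP_WORDS]
        cleaned.append([crude_stem(w) for w in words])
    return cleaned


print(clean("The café varies. Prices vary!"))
# [['cafe', 'vari'], ['pric', 'vari']]
```

The order matters: transliteration runs before tokenizing, because the letter-only regex would otherwise drop accented characters.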
  • 13. A bag of words: "John", "likes", "to", "watch", "movies", "Mary", "likes", "movies", "too" → {"John":1, "likes":2, "to":1, "watch":1, "movies":2, "Mary":1, "too":1} → {131:1, 132:2, 133:1, 134:1, 135:2, 136:1, 137:1} (dictionary word IDs mapped to counts) → [1, 2, 1, 1, 2, 1, 1]. Each unique word in our dictionary corresponds to a feature.
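A minimal Python sketch of the bag-of-words steps on slide 13, using `collections.Counter` from the standard library. The slide’s word IDs start at 131; that offset depends on the dictionary, so this sketch simply numbers features from 0 in first-seen order (an assumption).

```python
from collections import Counter

tokens = ["John", "likes", "to", "watch", "movies", "Mary", "likes", "movies", "too"]

# 1. Count occurrences of each word (Counter keeps first-seen order in Python 3.7+).
counts = Counter(tokens)

# 2. Dictionary: each unique word gets a feature index (0-based here).
vocabulary = {word: index for index, word in enumerate(counts)}

# 3. One feature per dictionary entry; its value is the word's count.
vector = [0] * len(vocabulary)
for word, count in counts.items():
    vector[vocabulary[word]] = count

print(dict(counts))  # {'John': 1, 'likes': 2, 'to': 1, 'watch': 1, 'movies': 2, 'Mary': 1, 'too': 1}
print(vector)        # [1, 2, 1, 1, 2, 1, 1]
```

Word order is lost: any text with the same counts maps to the same vector.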
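  • 14. TF-IDF. TF (Term Frequency) measures how often a term occurs in a document; IDF (Inverse Document Frequency) down-weights terms that appear in many documents: $\mathrm{TF}(t, d) = \frac{\text{occurrences of } t \text{ in } d}{\text{total words in } d}$, $\mathrm{IDF}(t) = \log\left(\frac{\text{count of documents}}{\text{count of documents where } t \text{ appears}}\right)$, and $\mathrm{TF\text{-}IDF}(t, d) = \mathrm{TF}(t, d) \times \mathrm{IDF}(t)$.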
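A small sketch of the slide 14 formulas in plain Python, assuming the natural logarithm (the base only rescales the scores) and non-empty documents. The three toy documents are made up from activity titles. A term present in every document gets IDF = log(1) = 0, so it carries no weight.

```python
import math
from collections import Counter


def tf_idf(documents):
    """TF-IDF per document, following the slide:
    TF  = occurrences of the term in the document / total words in the document
    IDF = log(number of documents / number of documents containing the term)
    """
    n_docs = len(documents)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = [
    ["eiffel", "tower", "dinner"],
    ["dinner", "cruise", "champagne"],
    ["louvre", "museum", "fast", "track"],
]
scores = tf_idf(docs)
print(scores[0])  # "dinner" scores lower than "eiffel": it also appears in another document
```

Libraries often use smoothed or normalized variants of these formulas, so their numbers can differ from this direct translation.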
  • 15. Another way: use word embeddings. Word embeddings capture relative meaning: each word becomes a vector, and the geometry of those vectors reflects how words relate to each other.
  • 16. Another way: use word embeddings. Paris - France + China ≈ Beijing
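To make the analogy on slide 16 concrete, here is a sketch with hand-made, hypothetical 3-dimensional vectors (real embeddings are learned from text and have hundreds of dimensions). It computes Paris - France + China and returns the nearest remaining word by cosine similarity, a common way such analogies are evaluated.

```python
import math

# Hypothetical toy vectors. Axes chosen by hand: [is_capital, europe, asia]
EMBEDDINGS = {
    "paris":   [1.0, 1.0, 0.0],
    "france":  [0.0, 1.0, 0.0],
    "berlin":  [1.0, 0.9, 0.1],
    "germany": [0.0, 0.9, 0.1],
    "china":   [0.0, 0.0, 1.0],
    "beijing": [1.0, 0.0, 1.0],
    "tokyo":   [1.0, 0.1, 0.8],
}


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def analogy(a, b, c, embeddings):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the three inputs."""
    target = [x - y + z for x, y, z in zip(embeddings[a], embeddings[b], embeddings[c])]
    candidates = (w for w in embeddings if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(embeddings[w], target))


print(analogy("paris", "france", "china", EMBEDDINGS))  # beijing
```

The result is rarely an exact vector match with real embeddings, which is why the nearest neighbour is used.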
  • 17. Another way: use word embeddings. Example of the “movies” vector: movies -0.34582 0.057328 0.1328 0.22376 0.10161 0.52948 -0.30199 0.45676 -0.37643 -0.51857 0.67325 -0.012444 -0.099021 0.43823 -0.28905 -1.0183 -0.0062387 -0.32893 0.55547 0.44181 0.31524 0.29909 0.51605 0.32109 0.021471 0.67909 0.037333 -0.42321 0.56517 0.47979 -0.63307 0.1126 0.0050579 -0.18879 -0.87478 -0.29481 -0.70824 -0.072256 0.1614 0.34523 0.61872 -0.036932 -0.43343 0.29604 0.18671 -0.33384 0.50628 -0.013876 0.46303 0.19298 0.16783 -0.55786 -0.16947 -0.27382 0.31027 0.10974 0.12819 0.23538 0.038003 -0.077524 -0.23291 0.044094 0.36325 0.20611 0.55571 -0.022715 -0.04996 0.32312 0.44176 0.25272 0.15159 0.22682 -0.10425 0.73375 0.66572 -0.55885 0.082242 -0.13387 0.31042 -0.38443 -0.38631 -0.7518 0.6706 -0.17495 0.056298 0.82038 0.41573 -0.12316 0.28437 -0.19324 -0.13485 0.28862 -0.37817 0.37268 0.01515 0.39123 0.059544 -0.074006 -0.17152 -1.1523 0.26541 0.082314 0.17914 -0.089861 -0.20884 0.29248 -0.60263 -0.0024285 0.24521 -0.5427 -0.074404 0.14034 0.0085891 -0.37351 0.23573 0.1493 -0.14038 0.11725 -0.51013 -0.64531 0.1329 0.075911 -0.10827 0.22077 -0.086253 0.4096 0.052314 0.40964 -0.030506 0.30572 -0.40694 -0.11773 0.21586 0.14448 0.23419 -0.23401 0.06811 0.29447 -0.4086 0.88777 -0.19477 -0.18847 0.10324 -0.24593 -0.10173 -0.43226 -0.091173 -0.092602 -0.23385 -0.16498 0.22057 0.11014 -0.25018 -0.43089 0.19759 0.11762 -0.045432 0.13331 0.032684 -0.21702 0.35082 -0.40466 -0.02425 -0.22637 0.0094442 0.72848 0.10286 0.27199 -0.40396 0.22366 -0.039481 -0.17164 -1.7307 0.3706 -0.13711 0.2295 -0.34432 -0.024381 -0.093941 -0.29861 -0.33164 -0.12931 -0.11218 0.047052 0.40442 0.0043382 0.22364 -0.31537 0.1987 -0.46108 -0.35126 -0.14584 0.17765 0.10869 -0.14434 -0.6152 -0.5874 0.014977 -0.1691 -0.46926 1.3959 -0.15449 -0.24167 -0.002575 0.4758 -0.044786 -0.21345 0.22983 -0.34356 -0.43402 -0.45719 -0.29775 -0.053295 0.50132 -0.24066 0.45762 0.095118 0.21008 0.71912 0.028577 -0.64176 0.1314 0.21556 -0.12536 -0.3298 -0.07123 0.35428 -0.3787 0.12348 -0.060439 0.19217 -0.29951 -0.73189 -0.33589 0.449 0.22654 1.0404 0.019947 -0.74711 0.071042 0.067809 0.36341 -0.32579 -0.11085 -0.24507 -0.13518 -0.44326 0.022784 -0.57252 0.33756 -0.23411 -0.062955 -0.35353 1.0497 -0.14938 -0.57772 0.27652 -0.28787 -0.0040621 0.25113 0.40818 -0.13227 0.016032 -0.55465 0.0021098 -0.27755 0.16082 -0.055202 0.21104 0.58412 0.42842 -0.047253 0.10542 0.027478 0.30911 0.31792 -1.8564 0.014412 -0.29748 -0.70103 -0.068219 -0.53071 -0.10661 0.028596 0.081479 0.34323 -0.047833 0.023129 0.028697 0.33859 -0.20706 -0.0025571 -0.18267 -0.26946 -1.1064 -0.31228 -0.13101 0.1161 -0.068647 -0.09988
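The “movies” vector above is written as the word followed by whitespace-separated numbers, the plain-text layout of pre-trained embedding files such as GloVe. A sketch of a loader for that layout, assuming one word per line; the sample line is the slide’s vector truncated to its first five values. Some formats, such as word2vec’s text format, start with a header line (vocabulary size and dimension), which this sketch does not skip.

```python
def parse_embedding_line(line):
    """Split one 'word v1 v2 ... vn' line into the word and its list of floats."""
    word, *values = line.split()
    return word, [float(v) for v in values]


def load_embeddings(lines):
    """Build a word -> vector lookup from any iterable of lines (e.g. an open file)."""
    embeddings = {}
    for line in lines:
        if line.strip():  # skip blank lines
            word, vector = parse_embedding_line(line)
            embeddings[word] = vector
    return embeddings


sample = ["movies -0.34582 0.057328 0.1328 0.22376 0.10161\n"]
emb = load_embeddings(sample)
print(emb["movies"][:3])  # [-0.34582, 0.057328, 0.1328]
```

For a real file, passing `open(path, encoding="utf-8")` streams it line by line instead of reading it all into memory at once.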
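  • 18. Another way: use word embeddings. Starting from the same bag of words ({"John":1, "likes":2, "to":1, "watch":1, "movies":2, "Mary":1, "too":1} → {131:1, 132:2, 133:1, 134:1, 135:2, 136:1, 137:1} → [1, 2, 1, 1, 2, 1, 1]), each count is replaced by the word’s embedding vector multiplied by that count: [[], 2*[], [], [], 2*[-0.34582, 0.057328, … 0.22376, 0.10161], [], []], where 2*[-0.34582, …] is the embedding vector for “movies”.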
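Slide 18 replaces each count with the word’s embedding multiplied by that count. Many classifiers need one fixed-size vector per document, and a common choice (our assumption here, not stated on the slide) is to sum these scaled vectors and divide by the number of words, i.e. a count-weighted average. The 2-dimensional embeddings below are hypothetical.

```python
from collections import Counter

# Hypothetical 2-d embeddings, for illustration only.
EMBEDDINGS = {
    "john": [0.1, 0.3], "likes": [0.5, -0.2], "to": [0.0, 0.1], "watch": [0.4, 0.4],
    "movies": [-0.3, 0.6], "mary": [0.2, 0.2], "too": [0.0, 0.0],
}


def document_vector(tokens, embeddings):
    """Count-weighted average of word vectors; words without an embedding are skipped."""
    dim = len(next(iter(embeddings.values())))
    counts = Counter(t for t in tokens if t in embeddings)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * dim
    summed = [0.0] * dim
    for word, count in counts.items():
        for i, value in enumerate(embeddings[word]):
            summed[i] += count * value  # e.g. 2 * vec("movies")
    return [s / total for s in summed]


sentence = ["john", "likes", "to", "watch", "movies", "mary", "likes", "movies", "too"]
print(document_vector(sentence, EMBEDDINGS))
```

Unlike the bag of words, the result has the embedding’s dimension whatever the vocabulary size, though averaging also blurs word order and emphasis.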
  • 20. Recipe: prepare training / test data (files, database, cache, data flow) → select the model and its (hyper)parameters → train the algorithm → use or store your trained estimator → make predictions → measure accuracy and precision
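The recipe on slide 20 can be sketched end to end with the standard library. Everything here is illustrative: `WordVoteClassifier` is a hypothetical toy model standing in for a real estimator, the food labels are assumed, and the trained model is stored as JSON (its only state is a word → vote table) rather than in a library-specific format.

```python
import json
import random
from collections import Counter


class WordVoteClassifier:
    """Hypothetical toy estimator: each word votes +1 per positive and -1 per negative example."""

    def __init__(self):
        self.votes = Counter()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            for word in text.lower().split():
                self.votes[word] += 1 if label else -1
        return self

    def predict(self, texts):
        # Positive when the summed votes are > 0 (unseen words count as 0).
        return [sum(self.votes[w] for w in t.lower().split()) > 0 for t in texts]

    def to_json(self):
        return json.dumps(self.votes)

    @classmethod
    def from_json(cls, blob):
        model = cls()
        model.votes = Counter(json.loads(blob))
        return model


def train_test_split(items, test_ratio=0.25, seed=42):
    """Shuffle with a fixed seed, then cut into training and test parts."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * (1 - test_ratio))
    return items[:cut], items[cut:]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true, y_pred):
    """Share of predicted positives that are truly positive."""
    hits = [t for t, p in zip(y_true, y_pred) if p]
    return sum(hits) / len(hits) if hits else 0.0


# 1. Prepare training / test data (True = food; labels assumed for illustration).
data = [
    ("Eiffel Tower with Dinner", True), ("Gourmet tour of Paris", True),
    ("Dinner cruise with Champagne", True), ("Cooking class in Paris", True),
    ("Skip the line Eiffel Tower", False), ("Louvre Museum fast track", False),
    ("Segway tour of city’s highlights", False), ("Aquarium of Paris ticket", False),
]
train, test = train_test_split(data)
# 2-3. Select and train the model (this toy model has no hyperparameters).
model = WordVoteClassifier().fit(*zip(*train))
# 4. Store the trained estimator, then load it back.
restored = WordVoteClassifier.from_json(model.to_json())
# 5-6. Make predictions and measure.
texts, labels = zip(*test)
predictions = restored.predict(texts)
print(accuracy(labels, predictions), precision(labels, predictions))
```

With only eight examples the scores mean little; the point is the order of the steps. A real project would use a proper estimator and a larger held-out test set.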
  • 21. Collect our training & test dataset. Food: label → vectorized
    - Eiffel Tower with Dinner → [0., 0., 0., 0., 0.5, 0.5, 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.5, 0., 0.5]
    - Skip the line Eiffel Tower → [0., 0., 0., 0., 0., 0.3967171, 0., 0., 0., 0.47792296, 0., 0., 0., 0., 0., 0.47792296, 0.47792296, 0., 0., 0.3967171, 0., 0.]
    - Louvre Museum fast track → [0., 0., 0., 0., 0., 0., 0.5, 0., 0., 0., 0.5, 0.5, 0., 0., 0., 0., 0., 0., 0., 0., 0.5, 0.]
    - Gourmet tour of Paris → [0., 0., 0., 0., 0., 0., 0., 0.58910044, 0., 0., 0., 0., 0.41798437, 0.48900396, 0., 0., 0., 0., 0.48900396, 0., 0., 0.]
    - Segway tour of city’s highlights → [0., 0., 0.48838773, 0., 0., 0., 0., 0., 0.48838773, 0., 0., 0., 0.3465257, 0., 0.48838773, 0., 0., 0., 0.40540376, 0., 0., 0.]
    - Dinner cruise with Champagne → [0., 0.54408243, 0., 0.54408243, 0.45163515, 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.45163515]
    - Aquarium of Paris ticket → [0.55967542, 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.39710644, 0.46457866, 0., 0., 0., 0.55967542, 0., 0., 0., 0.]
    - …
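The vectors on slide 21 appear consistent with a common TF-IDF variant rather than the plain slide 14 formula: raw term counts times a smoothed IDF, ln((1 + n) / (1 + df)) + 1, with each row scaled to unit Euclidean length (the first row, for example, has four entries of 0.5). These are the defaults of scikit-learn’s `TfidfVectorizer`, which may be what was used; that is our inference, not something the slide states. The standard-library sketch below applies this variant to the seven visible titles, with features in alphabetical order and words of two or more characters.

```python
import math
import re
from collections import Counter

TITLES = [
    "Eiffel Tower with Dinner",
    "Skip the line Eiffel Tower",
    "Louvre Museum fast track",
    "Gourmet tour of Paris",
    "Segway tour of city’s highlights",
    "Dinner cruise with Champagne",
    "Aquarium of Paris ticket",
]

TOKEN = re.compile(r"\b\w\w+\b")  # words of 2+ characters: "city’s" -> "city"


def tfidf_rows(texts):
    docs = [TOKEN.findall(t.lower()) for t in texts]
    vocabulary = sorted({w for d in docs for w in d})  # alphabetical feature order
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1 for w in vocabulary}  # smoothed IDF
    rows = []
    for d in docs:
        counts = Counter(d)
        row = [counts[w] * idf[w] for w in vocabulary]  # raw count x IDF
        norm = math.sqrt(sum(x * x for x in row))
        rows.append([x / norm for x in row])  # L2 normalization: unit-length rows
    return vocabulary, rows


vocabulary, rows = tfidf_rows(TITLES)
print(len(vocabulary))                       # 22
print(rows[1][vocabulary.index("skip")])     # close to the slide's 0.47792296
```

Because IDF depends on the whole collection, adding titles (the slide’s "…") changes every row; the vectorizer should be fitted on the training set and reused unchanged for new text.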
  • 22. Choose a classifier algorithm
  • 23. A few recommendations: Naive Bayes / Logistic Regression, Decision Trees, Random Forest, Gradient Boosting, SVM, Neural Networks
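Naive Bayes heads the list on slide 23 and is simple enough to write out in full. Below is a minimal multinomial Naive Bayes with add-one (Laplace) smoothing over token lists, in plain Python; the training titles come from the slides, but the food / other labels are assumed for illustration. `predict_proba` returns per-class probabilities.

```python
import math
from collections import Counter, defaultdict


class MultinomialNaiveBayes:
    """Minimal multinomial Naive Bayes over token lists, with add-one smoothing."""

    def fit(self, docs, labels):
        labels = list(labels)
        self.classes = sorted(set(labels))
        self.vocabulary = {w for d in docs for w in d}
        # log P(class): share of training documents in each class
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        # word counts per class, used for P(word | class)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict_proba(self, doc):
        """Return P(class | doc); words never seen in training are ignored."""
        v = len(self.vocabulary)
        log_scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for w in doc:
                if w in self.vocabulary:
                    score += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + v))
            log_scores[c] = score
        # Normalize in log space (subtract the max) to avoid underflow.
        top = max(log_scores.values())
        exp_scores = {c: math.exp(s - top) for c, s in log_scores.items()}
        z = sum(exp_scores.values())
        return {c: s / z for c, s in exp_scores.items()}

    def predict(self, doc):
        proba = self.predict_proba(doc)
        return max(proba, key=proba.get)


# Tokenized titles; the food / other labels are assumed for illustration.
train_docs = [
    ["eiffel", "tower", "dinner"], ["gourmet", "tour", "paris"],
    ["dinner", "cruise", "champagne"], ["skip", "line", "eiffel", "tower"],
    ["louvre", "museum", "fast", "track"], ["segway", "tour", "city", "highlights"],
]
train_labels = ["food", "food", "food", "other", "other", "other"]
nb = MultinomialNaiveBayes().fit(train_docs, train_labels)
print(nb.predict(["moulin", "rouge", "dinner", "show"]))  # food
```

Naive Bayes treats words as independent given the class, which is rarely true, yet it is fast and often a solid baseline for short texts such as activity titles.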
  • 24. Let’s measure. Food label prediction for each activity:
    - Eiffel Tower with Dinner: 0.83
    - Gourmet tour of Paris: 0.96
    - Dinner cruise with Champagne: 1.0
    - Segway tour of city’s highlights: 0.03
    - Orsay dedicated entrance: 0.02
    - 3 course meal in Eiffel Tower: 0.97
    - Cooking class in Paris: 0.89
    - Moulin Rouge Paris dinner show: 0.91
    (Examples drawn from the training set and from real data.)
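The scores on slide 24 still have to be turned into a category. A sketch, assuming a decision threshold of 0.5 (a hypothetical choice; raising it usually trades recall for precision). Sorting by score also gives a ranking, which fits a recommendation use case. `label_food` is a name of our own.

```python
# Prediction scores copied from the slide above.
PREDICTIONS = {
    "Eiffel Tower with Dinner": 0.83,
    "Gourmet tour of Paris": 0.96,
    "Dinner cruise with Champagne": 1.0,
    "Segway tour of city’s highlights": 0.03,
    "Orsay dedicated entrance": 0.02,
    "3 course meal in Eiffel Tower": 0.97,
    "Cooking class in Paris": 0.89,
    "Moulin Rouge Paris dinner show": 0.91,
}


def label_food(predictions, threshold=0.5):
    """Titles whose score reaches the (assumed) threshold, highest score first."""
    selected = [title for title, score in predictions.items() if score >= threshold]
    return sorted(selected, key=predictions.get, reverse=True)


print(label_food(PREDICTIONS))       # 6 titles, starting with "Dinner cruise with Champagne"
print(label_food(PREDICTIONS, 0.9))  # stricter threshold: 4 titles
```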
  • 27. There is way more: cross-validation dataset, n-grams, wrong user content, misspellings & typos, training data that is hard to get, harder languages or transliteration issues, memory / computing limitations, online learning & stacking
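Two items from slide 27 in a short sketch: word n-grams, which keep some word order (e.g. "fast track" as a single feature), and k-fold cross-validation indices, where each item is used for validation exactly once. The function names are our own, and shuffling the data before splitting, omitted here, is usually advisable.

```python
def ngrams(tokens, n):
    """Contiguous n-word sequences, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def k_fold_indices(n_items, k):
    """Yield (train_indices, validation_indices); fold sizes differ by at most one."""
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_items))
        yield train, validation
        start += size


print(ngrams(["louvre", "museum", "fast", "track"], 2))
# ['louvre museum', 'museum fast', 'fast track']
for train, validation in k_fold_indices(7, 3):
    print(validation)  # [0, 1, 2], then [3, 4], then [5, 6]
```

Averaging the metric over the k validation folds gives a steadier estimate than a single train / test split, at k times the training cost.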
  • 29. Thank you! July 2nd 2018. Questions? @mylitadventure @brainstorm_me johnny.rahajarison@mylittleadventure.com