Enhancing white-box machine learning processes by incorporating
semantic background knowledge
Gilles Vandewiele
Promotors: Femke Ongenae & Filip De Turck
Mentor: Agnieszka Ławrynowicz
Current ML processes are purely data-driven and ignore all existing knowledge
- Long training times
  → training time scales with #samples AND #features
- A lot of data required to get good results
  → results depend heavily on the quality of the data
There’s a lot of background & expert knowledge available in structured form!
[Figure: Experts, Knowledge base, Expert system]
Some examples of critical domains...
Healthcare, Finance, Law
Research question
“Can we combine the advantages of both data-driven and knowledge-driven
approaches by incorporating prior knowledge in the steps of a (data-driven)
ML process and what is the impact of this incorporation?”
Outline
0. Introduction
1. Lay-out of a typical machine learning process & white box vs black box models
2. Pre-processing: augmenting and balancing the data
3. Feature selection with PageRank
4. Other possible incorporations & conclusion
The steps of a typical ML process: feature extraction/engineering
       Hair?  Feathers?  Eggs?  Milk?  Predator?  Legs?  …
Bass   0      0          1      0      0          0
Bear   1      0          0      1      1          1
Boar   1      0          0      1      1          1
Calf   1      0          0      1      0          1
Deer   1      0          0      1      0          1
Girl   1      0          0      1      1          1
…
The steps of a typical ML process: feature selection
       Hair?  Feathers?  Milk?  Legs?  …
Bass   0      0          0      0
Bear   1      0          1      1
Boar   1      0          1      1
Calf   1      0          1      1
Deer   1      0          1      1
Girl   1      0          1      1
…

       Hair?  Feathers?  Eggs?  Milk?  Predator?  Legs?  …
Bass   0      0          1      0      0          0
Bear   1      0          0      1      1          1
Boar   1      0          0      1      1          1
Calf   1      0          0      1      0          1
Deer   1      0          0      1      0          1
Girl   1      0          0      1      1          1
…
The steps of a typical ML process: model construction
       Hair?  Feathers?  Milk?  Legs?  …
Bass   0      0          0      0
Bear   1      0          1      1
Boar   1      0          1      1
Calf   1      0          1      1
Deer   1      0          1      1
Girl   1      0          1      1
…
White-box
Black-box
Additional step: pre-processing
       Hair?  Feathers?  Milk?  Legs?  …
Bass   0      0          0      0
Bear   1      0          1      1
Boar   1      0          1      1
Calf   1      0          1      1
Deer   1      0          1      1
Girl   1      0          1      1
…
Data augmentation
Generate new (artificial) samples or discover/engineer new features
Additional step: pre-processing
Class balancing
Create a more uniform class distribution in the dataset
White vs black box models
Instance-based explanation
Why did you classify this sample as positive?
<->
Model-based explanation
What are the most important features? Why do you classify this group of samples as positive? …
→ LIME [1], MFI [2], SHAP [3]
White vs black box models
Model-based explanation
Model debugging
Feature importances → selection
...
Faster adoption in critical domains
       Hair?  Feathers?  Eggs?  Milk?  Predator?  Legs?  …
Bass   0      0          1      0      0          0
Bear   1      0          0      1      1          1
Boar   1      0          0      1      1          1
Calf   1      0          0      1      0          1
Deer   1      0          0      1      0          1
Girl   1      0          0      1      1          1
…
(columns: Features; rows: Samples)
[Figure: the feature names (Hair?, Feathers?, Eggs?, Milk?, Predator?, Legs?, …) and the sample names (Bass, Bear, Boar, Calf, Deer, Girl, …), each annotated with LINK]
How to link this unstructured data with minimal user interaction?
Data augmentation: discovering new features
dbr:Bear  --dbo:class-->  dbr:Mammal
dbr:Bear  --dbo:order-->  dbr:Carnivora
dbr:Flea  --dbo:class-->  dbr:Insect
dbr:Flea  --dbo:order-->  dbr:Endopterygota
…
[3], [4], ...
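A minimal sketch of how such triples could become new feature columns. It assumes the samples have already been linked to entities, and it uses a small in-memory triple list instead of a real knowledge-graph query. The function name `augment`, the `links` mapping and the toy rows are hypothetical and are not taken from the slides.

```python
# Toy in-memory triples mirroring the slide. A real setup would query a
# knowledge graph instead (not shown here).
TRIPLES = [
    ("dbr:Bear", "dbo:class", "dbr:Mammal"),
    ("dbr:Bear", "dbo:order", "dbr:Carnivora"),
    ("dbr:Flea", "dbo:class", "dbr:Insect"),
    ("dbr:Flea", "dbo:order", "dbr:Endopterygota"),
]


def augment(rows, links, triples):
    """Add one new column per predicate found in the triples.

    rows:  sample name -> dict of existing features
    links: sample name -> linked entity (hypothetical, assumed given)
    """
    predicates = sorted({p for _, p, _ in triples})
    augmented = {}
    for name, features in rows.items():
        entity = links.get(name)
        new_row = dict(features)
        for pred in predicates:
            # None marks a missing value when the graph has no such triple.
            new_row[pred] = next(
                (o for s, p, o in triples if s == entity and p == pred), None
            )
        augmented[name] = new_row
    return augmented


rows = {"Bear": {"Hair?": 1, "Milk?": 1}, "Bass": {"Hair?": 0, "Milk?": 0}}
links = {"Bear": "dbr:Bear"}  # "Bass" is deliberately left unlinked
result = augment(rows, links, TRIPLES)
print(result["Bear"])
print(result["Bass"])
```

The unlinked sample ends up with missing values in every new column, which is the kind of side effect that has to be kept in check when deciding whether a discovered feature is useful.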
Data augmentation: open problems
How to find useful features in an immensely large graph?
- When to stop exploring children of a certain node?
When is a feature useful?
- It should not introduce too many missing values
- Can we gain information by adding the new feature?
Data imbalance
A model that always predicts 0
→ VERY HIGH accuracy
→ COMPLETELY useless
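A small worked example of this effect, using a made-up label distribution (990 negatives, 10 positives) that does not come from the slides:

```python
# Hypothetical, heavily imbalanced label set: 990 negatives, 10 positives.
y_true = [0] * 990 + [1] * 10

# A "model" that ignores its input and always predicts the majority class.
y_pred = [0] * len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of samples whose prediction matches the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that were predicted as positive."""
    predictions_for_positives = [
        p for t, p in zip(y_true, y_pred) if t == positive
    ]
    return sum(p == positive for p in predictions_for_positives) / len(
        predictions_for_positives
    )


acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)
print(f"accuracy = {acc:.2f}, minority recall = {rec:.2f}")
# accuracy = 0.99, minority recall = 0.00
```

The accuracy looks excellent, yet not a single minority-class sample is detected.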
Data balancing approaches
Custom objective function (higher penalty for
minority class)
Oversampling of minority class
Undersampling of majority class
+ Model-agnostic
Current approaches: SMOTE [6] & ADASYN [7]
Hybrid approach
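To illustrate the oversampling idea, here is a simplified sketch in the spirit of SMOTE [6]: a synthetic minority sample is placed on the line segment between a minority sample and one of its nearest minority-class neighbours. It is not the reference implementation, ADASYN's adaptive weighting is not covered, and the function name, parameters and toy points are assumptions made for illustration.

```python
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Sort the other minority samples by squared Euclidean distance to x.
        others = [m for m in minority if m is not x]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m)))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between x and the neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(x, neighbour))
        )
    return synthetic


# Hypothetical 2-D minority-class samples.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_like_oversample(minority, n_new=6)
print(new_points)
```

Because every synthetic point is a convex combination of two existing minority samples, it stays inside the region the minority class already occupies.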
PageRank Feature Selection
[Figure: graph of feature concepts (F1 to F5) and class concepts (C1 to C3)]
PageRank Feature Selection: advantages
+ Requires no data
+ Can be used in unsupervised scenarios (e.g. clustering)
+ Fast (runtime is a function of #features, not #samples)
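A minimal sketch of ranking features with PageRank on a small graph of feature and class concepts. The edge weights below are invented for illustration, since the slides leave open how they should be initialized. Standard weighted PageRank with a damping factor of 0.85 is assumed, which may differ from the variant used in the actual work.

```python
def pagerank(weights, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration.

    weights maps node -> {neighbour: edge weight}; every node must appear
    as a key and have at least one outgoing edge.
    """
    nodes = list(weights)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for src, out_edges in weights.items():
            total = sum(out_edges.values())
            for dst, w in out_edges.items():
                # Each node passes its rank on in proportion to edge weight.
                new_rank[dst] += damping * rank[src] * w / total
        rank = new_rank
    return rank


# Hypothetical similarity weights between feature (F) and class (C) concepts.
edges = [
    ("F1", "C1", 0.9), ("F1", "C2", 0.8), ("F1", "C3", 0.7),
    ("F2", "C1", 0.6), ("F2", "C2", 0.5), ("F3", "C3", 0.4),
    ("F4", "C1", 0.1), ("F5", "C2", 0.1),
]
weights = {}
for a, b, w in edges:
    # Treat the graph as undirected: store each edge in both directions.
    weights.setdefault(a, {})[b] = w
    weights.setdefault(b, {})[a] = w

ranks = pagerank(weights)
feature_ranking = sorted(
    (n for n in ranks if n.startswith("F")), key=ranks.get, reverse=True
)
print(feature_ranking)
```

Note that the loop only touches the concept graph and never the samples, which is why the cost depends on the number of features rather than the number of samples.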
PageRank Feature Selection: preliminary results
Zoo dataset: 7 classes, 16 features (categorical)
Glass dataset: 6 classes, 9 features (continuous)
→ CLUSTERING
→ V-Measure ~ F-Measure
Features → Lin similarity → PageRank
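The slides name Lin similarity without defining it. Under its usual definition, the similarity of two concepts is twice the information content (IC) of their least common subsumer divided by the sum of their own information contents, with IC(c) = -log p(c). The sketch below assumes concept probabilities are already known; the numbers are made up.

```python
import math


def information_content(prob):
    """IC(c) = -log p(c), where p(c) is the probability of concept c."""
    return -math.log(prob)


def lin_similarity(p_a, p_b, p_lcs):
    """Lin similarity from the probabilities of two concepts and of their
    least common subsumer (most specific shared ancestor)."""
    denom = information_content(p_a) + information_content(p_b)
    if denom == 0:
        # Both concepts have probability 1 (e.g. both are the root);
        # treating them as identical is an assumption of this sketch.
        return 1.0
    return 2.0 * information_content(p_lcs) / denom


# Hypothetical probabilities for two concepts and their common ancestor.
sim = lin_similarity(p_a=0.01, p_b=0.02, p_lcs=0.1)
print(round(sim, 3))
```

Such pairwise similarities are one candidate for the edge weights of the concept graph that PageRank is then run on.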
PageRank Feature Selection: preliminary results
[Figure: result plots for the Zoo and Glass datasets]
PageRank Feature Selection: open problems
How to initialize the edge weights?
- Similarity/distance measures? Which ones?
- Distance or similarity: feature <-> feature, class <-> class, feature <-> class?
What ranking algorithm to use?
- PageRank out of the box?
Feature extraction: knowledge subgraph vector embedding
BEAR
How to identify the relevant subgraph in the immensely large knowledge graph?
Feature extraction: knowledge subgraph vector embedding
EMBEDDING [8] → (0.45, 0.15, 0.887, 0.51, 0.24, 0.41, …)
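As a rough illustration of the first stage of an RDF2Vec-style approach [8], the sketch below extracts bounded-depth random walks starting from one entity of a toy triple list. Training an embedding model on those walks is not shown. The function name, parameters and triples are assumptions for illustration, and limiting the walk depth is only one possible way to bound the relevant subgraph.

```python
import random

# Toy triples. A real knowledge graph would be far larger.
TRIPLES = [
    ("dbr:Bear", "dbo:class", "dbr:Mammal"),
    ("dbr:Bear", "dbo:order", "dbr:Carnivora"),
]


def random_walks(triples, start, depth, n_walks, seed=0):
    """Return n_walks random walks of at most `depth` hops from `start`.

    Each walk alternates entities and predicates:
    [entity, predicate, entity, predicate, entity, ...]
    """
    rng = random.Random(seed)
    out_edges = {}
    for s, p, o in triples:
        out_edges.setdefault(s, []).append((p, o))
    walks = []
    for _ in range(n_walks):
        walk, node = [start], start
        for _ in range(depth):
            if node not in out_edges:
                break  # dead end: this entity has no outgoing triples
            pred, node = rng.choice(out_edges[node])
            walk += [pred, node]
        walks.append(walk)
    return walks


walks = random_walks(TRIPLES, start="dbr:Bear", depth=2, n_walks=4)
for walk in walks:
    print(" -> ".join(walk))
```

The resulting walks can be treated like sentences and passed to a word-embedding model, which then yields one vector per entity.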
CURRENTLY:
(HOPEFULLY) IN FOUR YEARS:
References
[1] Ribeiro, Marco Tulio, et al. "Model-agnostic interpretability of machine learning."
[2] Vidovic, Marina M-C., et al. "Feature Importance Measure for Non-linear Learning Algorithms."
[3] Lundberg, Scott, et al. "An unexpected unity among methods for interpreting model predictions."
[4] Paulheim, Heiko, et al. "Data mining with background knowledge from the web."
[5] Terziev, Yordan. "Feature Generation using Ontologies during Induction of Decision Trees on Linked Data."
[6] Chawla, Nitesh V., et al. "SMOTE: synthetic minority over-sampling technique."
[7] He, Haibo, et al. "ADASYN: Adaptive synthetic sampling approach for imbalanced learning."
[8] Ristoski, Petar, and Heiko Paulheim. "RDF2Vec: RDF graph embeddings for data mining."
THANK YOU!
Acknowledgements:
- Reviewers & organizing committee
- My mentor: Agnieszka Ławrynowicz
- My promotors: Filip De Turck & Femke Ongenae
gilles.vandewiele@ugent.be
@Gillesvdwiele
