SlideShare a Scribd company logo
1 of 32
Download to read offline
Poorly Supervised Learning
How to engineer labels for machine learning
when you don’t have any.
Chris Anagnostopoulos
Chief Data Scientist, www.ment.at
Hon. Senior Lecturer, Imperial College
Mentat
(X, y)
Mentat
(X, y)
image courtesy of Stanford DAWN
Mentat
(X, y)
Predict the class y of an object, given a description x of that object
˜f = argmin
f2
nX
i=1
L(f(xi), yi)
˜✓ = argmin
✓2⇥
nX
i=1
L(f✓(xi), yi)
Function approximation:
Parametric estimation:
Relies on a classifier (family of functions)
and a labelled dataset.
learning by example
Mentat
Few labels
Multiple noisy labels
Not missing
at random
Unit of analysis /
missing context
Class imbalance
Expert prior rules
vs
(X, y)
Mentat
The X and the y in Cybersecurity
Mentat
The X in Cybersecurity
web proxy logs
windows
authentication logs
process logs
packet capture
email headers
network flow
DNS logs
firewall logs
Unix syslog
A broad array of data formats,
often capturing different
aspects of the same activity.
Mentat
spamfilter
An example: exfiltration
web proxy logs
windows
authentication logs
packet captureemail headers
netflow
firewall logs
Unix syslog
LinkedIn
Email
social engineering
spearphishing
User clicks
malware
download
webproxy
antivirus
privilege
escalation
file scanning
and encryption
login
malicious
infrastructure
firewall
process logs
Mentat
The y in Cybersecurity
actual attacks are
(thankfully) very rare
but plenty of signals worth surfacing:
near-misses, early attack stages,
risky behaviours, “suspect” traffic
Mentat
The y in Cybersecurity
vs
Mountains of “soft” labelling rules
captured as search queries, automated
rules (NIDS) or look-ups (intel)
Very little to no time to
produce manual labels at
any reasonable scale
Mentat
The y in Cybersecurity
Benign vs malware in pcap
Network intrusion detection
systems (NIDS) logs
Threat intel: observables,
indicators of compromise
Threat intel: kill chain
Threat intel: TTPs
even when y is available,
it’s often about a complex
X obtained by data fusion
often a noisy y is available
by checking against a DB
the user is looking for
patterns and lookalikes
no y exists for the vast
majority of traffic
Mentat
Solution: label engineering
Mentat
Standard tools: data fusion and feature engineering
expert-defined features
are usually meant as scores
Mentat
A middle-ground between rules and classifiers
• hard to explicitly describe how cats differ from dogs.
• learning by example avoids that, but needs a gazillion labels
• human learning is a sensible mix of examples and imperfect rules
cat dog
“cats are smaller than dogs”
• the trick is to interpret rules as noisy, partial labels
• a feature engineering step is needed to map rules to raw data
Mentat
A middle-ground between rules and classifiers
Raw Data Features
Expert
Ruleset
Noisy
Labels
01011010110101000
10101101010111010
101101010111101…
{
‘size’: ‘large’,
‘ear shape’: ‘pointy’,
’color:’ [‘white’, ‘orange’],
‘nose shape: …
}
if it is very large,
then it is not a cat,
if its ears are
pointy, then it is
likely a cat
…
Example
ID
Rule 1 Rule 2 Rule 3 Ground
Truth
01.jpeg Cat Dog Cat Cat
02.jpeg Cat - Cat -
03.jpeg - - Dog -
04.jpeg - Cat - Cat
Mentat
A middle-ground between rules and classifiers
Raw Data Features
Expert
Ruleset
Noisy
Labels
01011010110101000
10101101010111010
101101010111101…
{
‘size’: ‘large’,
‘ear shape’: ‘pointy’,
’color:’ [‘white’, ‘orange’],
‘nose shape: …
}
if it is very large,
then it is not a cat,
if its ears are
pointy, then it is
likely a cat
…
Example
ID
Rule 1 Rule 2 Rule 3 Ground
Truth
01.jpeg Cat Dog Cat Cat
02.jpeg Cat - Cat -
03.jpeg - - Dog -
04.jpeg - Cat - Cat
ManualData Programming
Mentat
A middle-ground between rules and classifiers
Raw Data Features
Expert
Ruleset
Noisy
Labels
01011010110101000
10101101010111010
101101010111101…
{
‘size’: ‘large’,
‘ear shape’: ‘pointy’,
’color:’ [‘white’, ‘orange’],
‘nose shape: …
}
if it is very large,
then it is not a cat,
if its ears are
pointy, then it is
likely a cat
…
Example
ID
Rule 1 Rule 2 Rule 3 Ground
Truth
01.jpeg Cat Dog Cat Cat
02.jpeg Cat - Cat -
03.jpeg - - Dog -
04.jpeg - Cat - Cat
ManualData Programming
Semantically Easy
Semantically Tough
Mentat
A middle-ground between rules and classifiers
Raw Data
Features Expert Rules
Noisy Tags
Case Report
Class: Confidential
Jack Black interviewing
suspect John Doe
Date of interview:
11/01/2015
Summary: Regarding
the incident in London
on the 6th of December
2010, the main suspect
was John Doe.
- whichever name appears first in the
document is the name of the suspect.
- if in the first paragraph the pattern “Suspect:
[Name]” appears, then that is the suspect
name. It can also be called “Person of Interest”.
- if the term “incident” or “crime” or
“investigation” and the term “suspect” appears
in the summary, followed by a name, and that
name also appears in the first paragraph
Example
ID
Rule 1 Rule 2 Rule 3 Ground
Truth
01.doc J. Black - J. Doe J. Doe
02.doc … … … …
03.doc … … … …
04.doc … … … …
ManualData Programming
Labelling Functions
Mentat
Weakly supervised learning and data programming
Labelling functions is a middle ground between feature engineering and
labelling. It is a very information-efficient use of analyst time.
Challenge:
This was most striking in the chemical name task, where
complicated surface forms (e.g., “9-acetyl-1,3,7-trimethyl-
pyrim- idinedione") caused tokenization errors which had
significant effects on system recall. This was corrected by
utilizing a domain-specific tokenizer.
domain primitives
Mentat
Weakly supervised learning and data programming
Is all expert labelling heuristic? (as opposed to ground truth labelling)
vision and other types of cognition are often hard to
reduce to a number of heuristics
but sometimes they are heuristics on top of “deep”
domain primitives (e.g., a “circle” and a “line”)
Mentat
Weakly supervised learning and data programming
Is all expert labelling heuristic? (as opposed to ground truth labelling)
in non-sensory domains, we would expect expert
labelling to be much more rules-based.
eliciting the full decision tree voluntarily is
impossible, however; in practice, it seems
experts use a forest methodology, rather than
one relying on a single deep tree
regulation plays a role here. If you are expected
to follow a playbook and document your work,
then a procedural description is natural
Mentat
Weakly supervised learning and data programming
Domain Primitives
sessions, action sequences, statistical profiling
(“unusually high”), data fusion (job roles),
integration with threat intel knowledge DBs,
jargon (“C2-like behaviour”, “domain-flux”)
Log analysis:
Threat intel:
formal representation of TTPs, regex for cyber
entities (hashes, IPs, malware names …),
The whole point is to empower the expert to express their heuristics
programmatically in an easy, accurate and scalable fashion
Mentat
Weakly supervised learning and data programming
“Hey. We’re building a model for user agents.
Does this one look suspicious?” “Yep! That’s a bad one.”
“Great! Out of curiosity, why?”
“The SeznamBot version is wrong”
Where you had one additional label, now you have a labelling rule.
Mentat
Weakly supervised learning and data programming
“Hey. We’re building a model for user agents.
Does this one look suspicious?” “Yep! That’s a bad one.”
“Great! Out of curiosity, why?”
“The IP is blacklisted”
This is a case of distant learning that might or might not induce a
signal on the user agent itself (a small fraction of all attacks will
have anomalous user agents, so label is very noisy)
Mentat
Weakly supervised learning and data programming
“Hey. We’re building a model for user agents.
Does this one look suspicious?” “Yep! That’s a bad one.”
“Great! Out of curiosity, why?”
“The IP is from Russia”
This is case of missing context that could be turned into a
labelling rule if we enrich the data. Again, the label might be
noisy as far as the user agent is concerned - but that’s OK!
Mentat
Breaking free of (X,y): label engineering
semi-supervised
active learning
weakly supervised
multi-view learning
unsupervised learning
data programming
distant learning
importance reweighting
cost-sensitive learning
label-noise robustness
boosting co-training
Mentat
Maths and algos
E(X,y)⇠⇡L(f✓(xi), yi)Same loss as before:
Enter labelling functions:
“an IP that we have
never seen before,
and the TLD is not
a .com, .co.uk, .org”
“an IP that appears
in our blacklist”
“an IP that appears
in our greylist”
i : X ! { 1, 0, 1}
missing value
Generative model: µ↵, (⇤, Y ) =
1
2
mY
i=1
i↵i1[⇤i=Y ] + i(1 ↵i)1[⇤i= Y ] + (1 i)1[⇤i=0]
“each labelling function λ has a probability β of
labelling an object and α of getting the label right”
µ↵, (⇤, Y ) remember: we do not have any Ys
Mentat
Maximum
Likelihood:
Maths and algos
µ↵?, ? (⇤, Y ) where ↵?
, ?
argmax
↵,
X
x2X
log P(⇤,Y )⇠µ↵,
(⇤ = (x))
“what values of α and β seem most
likely given our unlabelled data X?”.
Average over all possible labels.
˜✓ = argmax
✓
X
x2X
E(⇤,Y )⇠µ↵?, ? [L(x, y) | ⇤ = (x)]Noise-aware
empirical loss:
Assumptions:
y ? (f(x) | (x)) “the labelling functions tell you all
you need to know about the label”
i(x) ? j(x) | y “the labelling functions are independent”
So overlaps are very informative!
Mentat
Software
Chatter
(mostly text)
Logs
(device data)
Intel Alerts
Analysis Rulesets
keyword extraction
topic classification
link analysis
alias detection
regular expressions
SQL / SPL queries
NIDS rules
DPI classification
behavioural profiling
infra (IP) reputation
Action
relationship mining
cyber entity extraction
graph reputation scoring
ruleset optimisation
regex parameterisation
data fusion (e.g.,
sevssionisation)
Λ(Χ)
Mentat
Use Cases in Defence and Security
Anomaly Detection
in Cyber Logs
Relation Extraction
from Investigator Reports
Detect New Buildings
in Aerial Photography
< 10 true positives
billions of log lines
~10K case reports
2 annotated ones
annotated data for UK,
0 annotations for
territories of interest
vast amounts of threat
intel and custom
Splunk queries
semi-standardised
language and HR
databases allows for easy
creation of labelling rules
huge rulesets already
exist in legacy software
“Mr. John SMITH”
suspect
Mentat
Conclusion
Experts often give you one label where they could
simply tell you the rule they are using to produce it
Labelling functions are more intuitive than feature
engineering; they can yield vastly more labels than
manual annotation; and they capture know-how.
Forcing the user to deal with the tradeoff between
accuracy and coverage is unnecessary. Boosting has
taught us how to combine weak heuristics.
At the end of all this, you can still use your favourite classifier.
Focus on the real pain point: dearth of labels.
UK Co. 08251230
Level 39
One Canada Place
Canary Wharf, London
United Kingdom
+44 203 6373330 www.ment.at
info@ment.at

More Related Content

Similar to Weakly supervised learning

Machine Learning ebook.pdf
Machine Learning ebook.pdfMachine Learning ebook.pdf
Machine Learning ebook.pdfHODIT12
 
1_5_AI_edx_ml_51intro_240204_104838machine learning lecture 1
1_5_AI_edx_ml_51intro_240204_104838machine learning lecture 11_5_AI_edx_ml_51intro_240204_104838machine learning lecture 1
1_5_AI_edx_ml_51intro_240204_104838machine learning lecture 1MostafaHazemMostafaa
 
know Machine Learning Basic Concepts.pdf
know Machine Learning Basic Concepts.pdfknow Machine Learning Basic Concepts.pdf
know Machine Learning Basic Concepts.pdfhemangppatel
 
Using binary classifiers
Using binary classifiersUsing binary classifiers
Using binary classifiersbutest
 
AI Is Changing The Way We Look At Data Science
AI Is Changing The Way We Look At Data ScienceAI Is Changing The Way We Look At Data Science
AI Is Changing The Way We Look At Data ScienceAbe
 
Machine Learning ICS 273A
Machine Learning ICS 273AMachine Learning ICS 273A
Machine Learning ICS 273Abutest
 
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?Data Science London
 
Machine learning
Machine learningMachine learning
Machine learningRohit Kumar
 
Global CCISO Forum 2018 | AI vs Malware 2018
Global CCISO Forum 2018 | AI vs Malware 2018Global CCISO Forum 2018 | AI vs Malware 2018
Global CCISO Forum 2018 | AI vs Malware 2018EC-Council
 
[243] turning data into value
[243] turning data into value[243] turning data into value
[243] turning data into valueNAVER D2
 
06-01 Machine Learning and Linear Regression.pptx
06-01 Machine Learning and Linear Regression.pptx06-01 Machine Learning and Linear Regression.pptx
06-01 Machine Learning and Linear Regression.pptxSaharA84
 
Mass declassification sept 23 2010v2.1
Mass declassification sept 23 2010v2.1Mass declassification sept 23 2010v2.1
Mass declassification sept 23 2010v2.1Jeff Jonas
 
Overview of text mining and NLP (+software)
Overview of text mining and NLP (+software)Overview of text mining and NLP (+software)
Overview of text mining and NLP (+software)Florian Leitner
 
Machine Learning Summary for Caltech2
Machine Learning Summary for Caltech2Machine Learning Summary for Caltech2
Machine Learning Summary for Caltech2Lukas Mandrake
 
Machine learning introduction
Machine learning introductionMachine learning introduction
Machine learning introductionAnas Jamil
 
Introduction to Deep Learning and Tensorflow
Introduction to Deep Learning and TensorflowIntroduction to Deep Learning and Tensorflow
Introduction to Deep Learning and TensorflowOswald Campesato
 
Text mining and social network analysis of twitter data part 1
Text mining and social network analysis of twitter data part 1Text mining and social network analysis of twitter data part 1
Text mining and social network analysis of twitter data part 1Johan Blomme
 
Artificial intelligence Prolog Language
Artificial intelligence Prolog LanguageArtificial intelligence Prolog Language
Artificial intelligence Prolog LanguageREHMAT ULLAH
 

Similar to Weakly supervised learning (20)

Machine Learning ebook.pdf
Machine Learning ebook.pdfMachine Learning ebook.pdf
Machine Learning ebook.pdf
 
1_5_AI_edx_ml_51intro_240204_104838machine learning lecture 1
1_5_AI_edx_ml_51intro_240204_104838machine learning lecture 11_5_AI_edx_ml_51intro_240204_104838machine learning lecture 1
1_5_AI_edx_ml_51intro_240204_104838machine learning lecture 1
 
know Machine Learning Basic Concepts.pdf
know Machine Learning Basic Concepts.pdfknow Machine Learning Basic Concepts.pdf
know Machine Learning Basic Concepts.pdf
 
Using binary classifiers
Using binary classifiersUsing binary classifiers
Using binary classifiers
 
AI Is Changing The Way We Look At Data Science
AI Is Changing The Way We Look At Data ScienceAI Is Changing The Way We Look At Data Science
AI Is Changing The Way We Look At Data Science
 
Machine Learning ICS 273A
Machine Learning ICS 273AMachine Learning ICS 273A
Machine Learning ICS 273A
 
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
 
Machine learning
Machine learningMachine learning
Machine learning
 
Global CCISO Forum 2018 | AI vs Malware 2018
Global CCISO Forum 2018 | AI vs Malware 2018Global CCISO Forum 2018 | AI vs Malware 2018
Global CCISO Forum 2018 | AI vs Malware 2018
 
[243] turning data into value
[243] turning data into value[243] turning data into value
[243] turning data into value
 
06-01 Machine Learning and Linear Regression.pptx
06-01 Machine Learning and Linear Regression.pptx06-01 Machine Learning and Linear Regression.pptx
06-01 Machine Learning and Linear Regression.pptx
 
Mass declassification sept 23 2010v2.1
Mass declassification sept 23 2010v2.1Mass declassification sept 23 2010v2.1
Mass declassification sept 23 2010v2.1
 
Angular and Deep Learning
Angular and Deep LearningAngular and Deep Learning
Angular and Deep Learning
 
Overview of text mining and NLP (+software)
Overview of text mining and NLP (+software)Overview of text mining and NLP (+software)
Overview of text mining and NLP (+software)
 
Machine Learning Summary for Caltech2
Machine Learning Summary for Caltech2Machine Learning Summary for Caltech2
Machine Learning Summary for Caltech2
 
Machine learning introduction
Machine learning introductionMachine learning introduction
Machine learning introduction
 
Introduction to Deep Learning and Tensorflow
Introduction to Deep Learning and TensorflowIntroduction to Deep Learning and Tensorflow
Introduction to Deep Learning and Tensorflow
 
Classification
ClassificationClassification
Classification
 
Text mining and social network analysis of twitter data part 1
Text mining and social network analysis of twitter data part 1Text mining and social network analysis of twitter data part 1
Text mining and social network analysis of twitter data part 1
 
Artificial intelligence Prolog Language
Artificial intelligence Prolog LanguageArtificial intelligence Prolog Language
Artificial intelligence Prolog Language
 

Recently uploaded

SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxRizalinePalanog2
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPirithiRaju
 
Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformationAreesha Ahmad
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and ClassificationsAreesha Ahmad
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxgindu3009
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)Areesha Ahmad
 
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verifiedConnaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verifiedDelhi Call girls
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Silpa
 
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Servicenishacall1
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .Poonam Aher Patil
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bSérgio Sacani
 
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)Joonhun Lee
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bSérgio Sacani
 
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑Damini Dixit
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxFarihaAbdulRasheed
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000Sapana Sha
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.Nitya salvi
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfSumit Kumar yadav
 
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Bookingroncy bisnoi
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksSérgio Sacani
 

Recently uploaded (20)

SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformation
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verifiedConnaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.
 
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
 
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 

Weakly supervised learning

  • 1. Poorly Supervised Learning How to engineer labels for machine learning when you don’t have any. Chris Anagnostopoulos Chief Data Scientist, www.ment.at Hon. Senior Lecturer, Imperial College
  • 3. Mentat (X, y) image courtesy of Stanford DAWN
  • 4. Mentat (X, y) Predict the class y of an object, given a description x of that object ˜f = argmin f2 nX i=1 L(f(xi), yi) ˜✓ = argmin ✓2⇥ nX i=1 L(f✓(xi), yi) Function approximation: Parametric estimation: Relies on a classifier (family of functions) and a labelled dataset. learning by example
  • 5. Mentat Few labels Multiple noisy labels Not missing at random Unit of analysis / missing context Class imbalance Expert prior rules vs (X, y)
  • 6. Mentat The X and the y in Cybersecurity
  • 7. Mentat The X in Cybersecurity web proxy logs windows authentication logs process logs packet capture email headers network flow DNS logs firewall logs Unix syslog A broad array of data formats, often capturing different aspects of the same activity.
  • 8. Mentat spamfilter An example: exfiltration web proxy logs windows authentication logs packet captureemail headers netflow firewall logs Unix syslog LinkedIn Email social engineering spearphishing User clicks malware download webproxy antivirus privilege escalation file scanning and encryption login malicious infrastructure firewall process logs
  • 9. Mentat The y in Cybersecurity actual attacks are (thankfully) very rare but plenty of signals worth surfacing: near-misses, early attack stages, risky behaviours, “suspect” traffic
  • 10. Mentat The y in Cybersecurity vs Mountains of “soft” labelling rules captured as search queries, automated rules (NIDS) or look-ups (intel) Very little to no time to produce manual labels at any reasonable scale
  • 11. Mentat The y in Cybersecurity Benign vs malware in pcap Network intrusion detection systems (NIDS) logs Threat intel: observables, indicators of compromise Threat intel: kill chain Threat intel: TTPs even when y is available, it’s often about a complex X obtained by data fusion often a noisy y is available by checking against a DB the user is looking for patterns and lookalikes no y exists for the vast majority of traffic
  • 13. Mentat Standard tools: data fusion and feature engineering expert-defined features are usually meant as scores
  • 14. Mentat A middle-ground between rules and classifiers • hard to explicitly describe how cats differ from dogs. • learning by example avoids that, but needs a gazillion labels • human learning is a sensible mix of examples and imperfect rules cat dog “cats are smaller than dogs” • the trick is to interpret rules as noisy, partial labels • a feature engineering step is needed to map rules to raw data
  • 15. Mentat A middle-ground between rules and classifiers Raw Data Features Expert Ruleset Noisy Labels 01011010110101000 10101101010111010 101101010111101… { ‘size’: ‘large’, ‘ear shape’: ‘pointy’, ’color:’ [‘white’, ‘orange’], ‘nose shape: … } if it is very large, then it is not a cat, if its ears are pointy, then it is likely a cat … Example ID Rule 1 Rule 2 Rule 3 Ground Truth 01.jpeg Cat Dog Cat Cat 02.jpeg Cat - Cat - 03.jpeg - - Dog - 04.jpeg - Cat - Cat
  • 16. Mentat A middle-ground between rules and classifiers Raw Data Features Expert Ruleset Noisy Labels 01011010110101000 10101101010111010 101101010111101… { ‘size’: ‘large’, ‘ear shape’: ‘pointy’, ’color:’ [‘white’, ‘orange’], ‘nose shape: … } if it is very large, then it is not a cat, if its ears are pointy, then it is likely a cat … Example ID Rule 1 Rule 2 Rule 3 Ground Truth 01.jpeg Cat Dog Cat Cat 02.jpeg Cat - Cat - 03.jpeg - - Dog - 04.jpeg - Cat - Cat ManualData Programming
  • 17. Mentat A middle-ground between rules and classifiers Raw Data Features Expert Ruleset Noisy Labels 01011010110101000 10101101010111010 101101010111101… { ‘size’: ‘large’, ‘ear shape’: ‘pointy’, ’color:’ [‘white’, ‘orange’], ‘nose shape: … } if it is very large, then it is not a cat, if its ears are pointy, then it is likely a cat … Example ID Rule 1 Rule 2 Rule 3 Ground Truth 01.jpeg Cat Dog Cat Cat 02.jpeg Cat - Cat - 03.jpeg - - Dog - 04.jpeg - Cat - Cat ManualData Programming Semantically Easy Semantically Tough
  • 18. Mentat A middle-ground between rules and classifiers Raw Data Features Expert Rules Noisy Tags Case Report Class: Confidential Jack Black interviewing suspect John Doe Date of interview: 11/01/2015 Summary: Regarding the incident in London on the 6th of December 2010, the main suspect was John Doe. - whichever name appears first in the document is the name of the suspect. - if in the first paragraph the pattern “Suspect: [Name]” appears, then that is the suspect name. It can also be called “Person of Interest”. - if the term “incident” or “crime” or “investigation” and the term “suspect” appears in the summary, followed by a name, and that name also appears in the first paragraph Example ID Rule 1 Rule 2 Rule 3 Ground Truth 01.doc J. Black - J. Doe J. Doe 02.doc … … … … 03.doc … … … … 04.doc … … … … ManualData Programming Labelling Functions
  • 19. Mentat Weakly supervised learning and data programming Labelling functions is a middle ground between feature engineering and labelling. It is a very information-efficient use of analyst time. Challenge: This was most striking in the chemical name task, where complicated surface forms (e.g., “9-acetyl-1,3,7-trimethyl- pyrim- idinedione") caused tokenization errors which had significant effects on system recall. This was corrected by utilizing a domain-specific tokenizer. domain primitives
  • 20. Mentat Weakly supervised learning and data programming Is all expert labelling heuristic? (as opposed to ground truth labelling) vision and other types of cognition are often hard to reduce to a number of heuristics but sometimes they are heuristics on top of “deep” domain primitives (e.g., a “circle” and a “line”)
  • 21. Mentat Weakly supervised learning and data programming Is all expert labelling heuristic? (as opposed to ground truth labelling) in non-sensory domains, we would expect expert labelling to be much more rules-based. eliciting the full decision tree voluntarily is impossible, however; in practice, it seems experts use a forest methodology, rather than one relying on a single deep tree regulation plays a role here. If you are expected to follow a playbook and document your work, then a procedural description is natural
  • 22. Mentat Weakly supervised learning and data programming Domain Primitives sessions, action sequences, statistical profiling (“unusually high”), data fusion (job roles), integration with threat intel knowledge DBs, jargon (“C2-like behaviour”, “domain-flux”) Log analysis: Threat intel: formal representation of TTPs, regex for cyber entities (hashes, IPs, malware names …), The whole point is to empower the expert to express their heuristics programmatically in an easy, accurate and scalable fashion
  • 23. Mentat Weakly supervised learning and data programming “Hey. We’re building a model for user agents. Does this one look suspicious?” “Yep! That’s a bad one.” “Great! Out of curiosity, why?” “The SeznamBot version is wrong” Where you had one additional label, now you have a labelling rule.
  • 24. Mentat Weakly supervised learning and data programming “Hey. We’re building a model for user agents. Does this one look suspicious?” “Yep! That’s a bad one.” “Great! Out of curiosity, why?” “The IP is blacklisted” This is a case of distant learning that might or might not induce a signal on the user agent itself (a small fraction of all attacks will have anomalous user agents, so label is very noisy)
  • 25. Mentat Weakly supervised learning and data programming “Hey. We’re building a model for user agents. Does this one look suspicious?” “Yep! That’s a bad one.” “Great! Out of curiosity, why?” “The IP is from Russia” This is case of missing context that could be turned into a labelling rule if we enrich the data. Again, the label might be noisy as far as the user agent is concerned - but that’s OK!
  • 26. Mentat Breaking free of (X,y): label engineering semi-supervised active learning weakly supervised multi-view learning unsupervised learning data programming distant learning importance reweighting cost-sensitive learning label-noise robustness boosting co-training
  • 27. Mentat Maths and algos E(X,y)⇠⇡L(f✓(xi), yi)Same loss as before: Enter labelling functions: “an IP that we have never seen before, and the TLD is not a .com, .co.uk, .org” “an IP that appears in our blacklist” “an IP that appears in our greylist” i : X ! { 1, 0, 1} missing value Generative model: µ↵, (⇤, Y ) = 1 2 mY i=1 i↵i1[⇤i=Y ] + i(1 ↵i)1[⇤i= Y ] + (1 i)1[⇤i=0] “each labelling function λ has a probability β of labelling an object and α of getting the label right” µ↵, (⇤, Y ) remember: we do not have any Ys
  • 28. Mentat Maximum Likelihood: Maths and algos µ↵?, ? (⇤, Y ) where ↵? , ? argmax ↵, X x2X log P(⇤,Y )⇠µ↵, (⇤ = (x)) “what values of α and β seem most likely given our unlabelled data X?”. Average over all possible labels. ˜✓ = argmax ✓ X x2X E(⇤,Y )⇠µ↵?, ? [L(x, y) | ⇤ = (x)]Noise-aware empirical loss: Assumptions: y ? (f(x) | (x)) “the labelling functions tell you all you need to know about the label” i(x) ? j(x) | y “the labelling functions are independent” So overlaps are very informative!
  • 29. Mentat Software Chatter (mostly text) Logs (device data) Intel Alerts Analysis Rulesets keyword extraction topic classification link analysis alias detection regular expressions SQL / SPL queries NIDS rules DPI classification behavioural profiling infra (IP) reputation Action relationship mining cyber entity extraction graph reputation scoring ruleset optimisation regex parameterisation data fusion (e.g., sevssionisation) Λ(Χ)
  • 30. Mentat Use Cases in Defence and Security Anomaly Detection in Cyber Logs Relation Extraction from Investigator Reports Detect New Buildings in Aerial Photography < 10 true positives billions of log lines ~10K case reports 2 annotated ones annotated data for UK, 0 annotations for territories of interest vast amounts of threat intel and custom Splunk queries semi-standardised language and HR databases allows for easy creation of labelling rules huge rulesets already exist in legacy software “Mr. John SMITH” suspect
  • 31. Mentat Conclusion Experts often give you one label where they could simply tell you the rule they are using to produce it Labelling functions are more intuitive than feature engineering; they can yield vastly more labels than manual annotation; and they capture know-how. Forcing the user to deal with the tradeoff between accuracy and coverage is unnecessary. Boosting has taught us how to combine weak heuristics. At the end of all this, you can still use your favourite classifier. Focus on the real pain point: dearth of labels.
  • 32. UK Co. 08251230 Level 39 One Canada Place Canary Wharf, London United Kingdom +44 203 6373330 www.ment.at info@ment.at