Noise Contrastive Estimation-based Matching Framework for Low-Resource Security Attack Pattern Recognition
Tu Nguyen, Nedim Šrndić, Alexander Neth
Huawei R&D Munich
{tu.nguyen, nedim.srndic, alexander.neth}@huawei.com
Cyber Threat Intelligence (CTI)
“Cyber threat intelligence (CTI) is knowledge, skills and experience-
based information concerning the occurrence and assessment of both
cyber and physical threats and threat actors that is intended to help
mitigate potential attacks and harmful events occurring in cyberspace.”
“Cyber threat intelligence sources include open source intelligence, social
media intelligence, human Intelligence, technical intelligence, device log
files, forensically acquired data or intelligence from the internet traffic and
data derived for the deep and dark web.”
[...] We witnessed that the botnet was spread via mass phishing, using a VB-scripted Excel
attachment to download the second stage from xx.warez22.info. The same domain was used
for C&C via HTTP. The botnet distributed a file encryption module we named VBenc. [...]
[1] https://en.wikipedia.org/wiki/Cyber_threat_intelligence
[1.1] https://attack.mitre.org/techniques/T1566/
[1.2] https://attack.mitre.org/techniques/T1486/
Phishing (T1566), Data Encrypted for Impact (T1486)
[1.3] https://detect-respond.blogspot.com/2013/03/the-pyramid-of-pain.html
Tactics, Techniques and Procedures (TTP)
[2] https://attack.mitre.org/#
The highest-level (and most valuable) concept in CTI is the attack pattern. It
abstracts a security attack by describing its goal ("tactic"), algorithm
("technique") and potential implementations ("procedures").
Over 600 such techniques, 14 tactics and thousands of procedures are curated by
the MITRE ATT&CK [2] ontology, which denotes attack patterns as TTPs (tactics,
techniques and procedures).
Title
Description
Metadata
Procedure examples
Mitigations
Detection
References
Scarce information
“Adversaries may steal data by exfiltrating it over a different protocol than that of the existing command and control channel.”
Abstract formulation
Implications:
1. Concepts:
   a. Adversary
   b. (Valuable) data
   c. Communication protocol
   d. C&C channel
2. Prerequisites:
   a. Adversary has infiltrated
   b. Adversary established C&C channel over protocol A
3. Actions:
   a. Adversary exfiltrates data over protocol B != A
Tactics, Techniques and Procedures (TTP)
[3] screenshot from https://attack.mitre.org/techniques/T1048/
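The fields listed on this slide can be sketched as a small record type. This is only an illustration: the class name `TechniqueEntry`, its attribute names and the `textual_profile` helper are hypothetical and are not part of MITRE ATT&CK or of the authors' code. The title and description of T1048 are taken from the slide and the speaker notes.

```python
from dataclasses import dataclass, field


@dataclass
class TechniqueEntry:
    """Hypothetical container mirroring the fields shown on a technique page."""
    technique_id: str
    title: str
    description: str
    metadata: dict = field(default_factory=dict)
    procedure_examples: list = field(default_factory=list)
    mitigations: list = field(default_factory=list)
    detection: list = field(default_factory=list)
    references: list = field(default_factory=list)

    def textual_profile(self) -> str:
        # Assumption: title and description are joined into one string
        # that could later be compared with report text.
        return f"{self.title}. {self.description}"


t1048 = TechniqueEntry(
    technique_id="T1048",
    title="Exfiltration Over Alternative Protocol",
    description=(
        "Adversaries may steal data by exfiltrating it over a different "
        "protocol than that of the existing command and control channel."
    ),
)
profile = t1048.textual_profile()
print(profile)
```

The remaining fields default to empty containers here because the slide only shows their names, not their contents.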
[Figure: pipeline stages Semantic Understanding, TTP recognition, Attack graph; example techniques: External Remote Services (T1133), Exploit Public-Facing Application (T1190), …]
• CTI entities and relations are very technical and complex
• Understanding text and mapping it to TTPs is challenging even for humans
• Learning-based TTP mapping is promising, but datasets are lacking
• TTPs are numerous and complex
• MITRE ATT&CK refines and extends the KB frequently as attacks evolve, so a
system needs to be able to adapt and extend to new TTPs
Tactics, Techniques and Procedures (TTP) Mapping
[4] screenshot from https://www.mandiant.com/resources/blog/apt41-initiates-global-intrusion-campaign-using-multiple-exploits
TTP Classification
TTP Classification Challenges
Large label space
Complex, technical writing style
Noisy & missing labels
Long-tail labels
TTP Mapping -> Text Matching
Direct matching signal
textual profile
NCE
Recap of Noise Contrastive Estimation (NCE)
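• $X$: input space, $Y$: label space, $|Y| < \infty$
• $s_\theta: X \times Y \to \mathbb{R}$, differentiable in $\theta \in \mathbb{R}^d$
• Goal: given samples from $p(x, y)$, estimate $\theta$ such that $x \mapsto \arg\max_{y \in Y} s_\theta(x, y)$ has optimal 0-1 loss
• $p_\theta(y \mid x) = \dfrac{\exp(s_\theta(x, y))}{\sum_{\hat{y} \in Y} \exp(s_\theta(x, \hat{y}))}$
• Cross-entropy loss:
  • $J_{CE}(\theta) = E_{(x, y) \sim \mathrm{pop}} \left[ -\log p_\theta(y \mid x) \right]$
  • Difficult to compute when $|Y|$ is very large
• NCE: $\forall\, 1 \le k \le K$, $2 \le K \ll |Y|$
  • $\pi_\theta(k \mid x, y_{1:K}) = \dfrac{\exp(s_\theta(x, y_k))}{\sum_{k'=1}^{K} \exp(s_\theta(x, y_{k'}))}$
  • $J_{NCE}(\theta) = E_{(x, y_1) \sim p(x, y),\; y_{2:K} \sim q^{K-1}} \left[ -\log \pi_\theta(1 \mid x, y_{1:K}) \right]$
  • $y_{2:K} \in Y^{K-1}$: negative samples from $q$ s.t. $q(y) = \frac{1}{|Y|}$ or $q(y) = \mathrm{pop}(y)$
• Negative samples:
  • Unconditional: ‘easy’ $y_{2:K} \sim q^{K-1}$
  • Conditional: ‘hard’ $y_{2:K} \sim h(\cdot \mid x, y_1)$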
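To make the recap concrete, the following NumPy sketch contrasts the full cross-entropy term with the NCE term for a single input. It is a minimal illustration under stated assumptions, not the authors' implementation: the random `scores` vector stands in for $s_\theta(x, y)$, the function names are hypothetical, negatives are drawn from the uniform $q(y) = 1/|Y|$, and the gold label is excluded from the negatives as a simplification.

```python
import numpy as np


def full_cross_entropy(scores, gold):
    """-log p_theta(y|x) with the softmax taken over every label in `scores`."""
    shifted = scores - scores.max()           # stabilise the exponentials
    log_z = np.log(np.exp(shifted).sum())     # log of the partition function
    return -(shifted[gold] - log_z)


def sample_uniform_negatives(num_labels, gold, k, rng):
    """Draw K-1 distinct 'easy' negatives uniformly, skipping the gold label."""
    candidates = np.delete(np.arange(num_labels), gold)
    return rng.choice(candidates, size=k - 1, replace=False)


def nce_loss(scores, gold, negatives):
    """-log pi_theta(1 | x, y_{1:K}): softmax over the gold label and K-1 negatives."""
    subset = np.concatenate(([gold], negatives))   # y_1 is the gold label
    return full_cross_entropy(scores[subset], 0)


rng = np.random.default_rng(0)
num_labels, K = 600, 60                  # illustrative sizes only
scores = rng.normal(size=num_labels)     # stand-in for s_theta(x, y), one x
gold = 17
negatives = sample_uniform_negatives(num_labels, gold, K, rng)
ce = full_cross_entropy(scores, gold)
nce = nce_loss(scores, gold, negatives)
print(f"full CE: {ce:.3f}  NCE over K={K}: {nce:.3f}")
```

The NCE term only touches K scores instead of all |Y|, which is the saving the slide refers to. A conditional ('hard') sampler would replace `sample_uniform_negatives` with one that depends on the input and the gold label.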
Text-TTP Matching
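• Classification: $s_\theta: X \times Y \to \mathbb{R}$, differentiable in $\theta \in \mathbb{R}^d$
  • $X$: input space, $Y$: label space, $|Y| \ll \infty$
• Matching: $s_\theta: X \times Y \to \mathbb{R}$, differentiable in $\theta \in \mathbb{R}^d$
  • $X$, $Y$: input space, $|Y| \ll \infty$
  • naturally with inductive bias
  • triangle inequality: $\theta(x, z) \le \theta(x, y) + \theta(y, z)$
• Loss: similar to classification
  • $J(\theta) = -\dfrac{1}{N} \sum_{i=1}^{N} \log \dfrac{\exp(s_\theta(x_i, y_{i,1}))}{\sum_{k'=1}^{K} \exp(s_\theta(x_i, y_{i,k'}))}$
  • $(x_1, y_{1,1}), \dots, (x_N, y_{N,1})$ are training pairs
  • $y_{i,2}, \dots, y_{i,K} \sim q(\cdot \mid x_i, y_{i,1})$ are $K-1$ negative samples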
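The matching view can be sketched with a toy encoder that places the report text and each TTP's textual profile in the same vector space. Everything here is an assumption made for illustration: the bag-of-words `encode` function stands in for a learned encoder and has no trainable parameters, the two shortest profile strings are made-up paraphrases rather than ATT&CK text, and cosine similarity is just one possible choice of $s_\theta$.

```python
import numpy as np


def encode(text, vocab):
    """Toy bag-of-words encoder (L2-normalised); a real system would learn this."""
    vec = np.zeros(len(vocab))
    for token in text.lower().split():
        if token in vocab:
            vec[vocab[token]] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def match_score(text_vec, label_vec):
    """s_theta(x, y): cosine similarity between text and TTP profile vectors."""
    return float(text_vec @ label_vec)


def matching_loss(text_vec, label_vecs, pos, negs):
    """One summand of J(theta): -log softmax of the positive among K candidates."""
    scores = np.array([match_score(text_vec, label_vecs[j]) for j in [pos, *negs]])
    scores -= scores.max()
    return -(scores[0] - np.log(np.exp(scores).sum()))


# Hypothetical textual profiles (illustrative wording only).
profiles = {
    "T1566": "adversaries send phishing messages with a malicious attachment",
    "T1486": "adversaries encrypt data on target systems to interrupt availability",
    "T1048": "adversaries steal data by exfiltrating it over a different protocol "
             "than the command and control channel",
}
report_text = "the botnet was spread via mass phishing using an excel attachment"

tokens = sorted({tok for t in [report_text, *profiles.values()] for tok in t.lower().split()})
vocab = {tok: i for i, tok in enumerate(tokens)}

x = encode(report_text, vocab)
label_vecs = {tid: encode(p, vocab) for tid, p in profiles.items()}
ranked = sorted(label_vecs, key=lambda tid: match_score(x, label_vecs[tid]), reverse=True)
loss = matching_loss(x, label_vecs, pos="T1566", negs=["T1486", "T1048"])
print(ranked, round(float(loss), 3))
```

Because labels are represented by their text rather than by an index, a new technique can be scored by encoding its profile, which is the inductive bias the slide mentions.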
Text-TTP Matching
[Figure: two panels, "without contrastive learning" and "with contrastive learning"]
Learning to Compare
[Figure: an input text is compared against the techniques TTPi and TTPj and their sub-techniques (TTPi_0 to TTPi_2, TTPj_0 to TTPj_2). Legend: "+" positive label, "+" missing (noisy) label, "-" negative label; edges are marked "Contrast" or "Weak Contrast". Each pair (x, y) receives a score $p(1 \mid x, y)$.]
Ranking
[Figure: candidates ranked by $p(1 \mid x, y)$: TTPi_1 (+), TTPi (+), four negative TTPs (-), TTPj (+)]
Optimization
Asymmetric focus loss
𝛼-balanced focus loss
• Remedy the imbalance between the number of negatives and the number of positives
• 𝛼-balanced:
  • Down-scale the contribution of negatives
• Asymmetric:
  • Down-scale the contribution of possibly noisy labels
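The two down-scaling ideas can be illustrated with a weighted binary loss over the scores $p(1 \mid x, y)$. This is a sketch in the spirit of the bullets above and not the loss used by the authors: the function name, the values of `alpha`, `noisy_threshold` and `noisy_weight`, and the rule "a negative with a very high predicted probability may be a missing positive" are all assumptions made for this example.

```python
import numpy as np


def weighted_binary_loss(probs, targets, alpha=0.25,
                         noisy_threshold=0.9, noisy_weight=0.1):
    """Binary log loss with down-scaled negatives and suspected noisy negatives.

    alpha           -- weight applied to every negative term (alpha-balancing)
    noisy_threshold -- assumed cut-off above which a negative looks like a
                       missing positive label
    noisy_weight    -- extra factor applied to those suspicious negatives
    """
    probs = np.clip(np.asarray(probs, dtype=float), 1e-7, 1 - 1e-7)
    targets = np.asarray(targets, dtype=float)

    pos_loss = -targets * np.log(probs)
    neg_loss = -(1 - targets) * np.log(1 - probs)

    neg_weight = np.full_like(probs, alpha)
    suspicious = (targets == 0) & (probs > noisy_threshold)
    neg_weight[suspicious] *= noisy_weight

    return float((pos_loss + neg_weight * neg_loss).sum())


probs = [0.8, 0.1, 0.95]     # p(1|x, y) for three candidate TTPs
targets = [1, 0, 0]          # the third negative may be a missing positive
weighted = weighted_binary_loss(probs, targets)
plain = weighted_binary_loss(probs, targets, alpha=1.0, noisy_weight=1.0)
print(weighted < plain)
```

Setting `alpha=1.0` and `noisy_weight=1.0` recovers the plain binary log loss, which makes the effect of each factor easy to inspect in isolation.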
Experiment Settings
• Datasets
  • ATT&CK procedure examples
    • Source: MITRE ATT&CK website
    • Large and very cleanly labeled, but too short and too simple
    • Short text (summarized from threat reports)
  • TRAM
    • Source: crowdsourced by MITRE
    • Short text, noisy labels
  • Expert
    • Source: diverse, paragraph-level text from threat reports, labeled by security experts
    • High quality but relatively small
  • Derived ATT&CK procedure examples
    • Source: reference links from MITRE ATT&CK
    • Paragraph-level text but relatively noisy labels
• Evaluation protocol
  • Train on 72.5%, validate on 12.5%, test on 15% of each dataset
  • We combine the training splits across datasets.
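The evaluation protocol can be sketched as a per-dataset split followed by pooling of the training parts. The dataset names, sizes and the `split_dataset` helper below are hypothetical placeholders, and the shuffling seed is an assumption; only the 72.5% / 12.5% / 15% proportions and the pooling of training splits come from the slide.

```python
import random


def split_dataset(examples, seed=0, train=0.725, valid=0.125):
    """Shuffle once, then cut into train / validation / test (remainder) parts."""
    items = list(examples)
    random.Random(seed).shuffle(items)
    n_train = round(len(items) * train)
    n_valid = round(len(items) * valid)
    return (items[:n_train],
            items[n_train:n_train + n_valid],
            items[n_train + n_valid:])


# Placeholder datasets: integers stand in for (text, label) examples.
datasets = {
    "procedures": range(400),
    "tram": range(200),
    "expert": range(80),
}

splits = {name: split_dataset(data) for name, data in datasets.items()}

# Training splits are pooled across datasets; validation and test stay separate.
combined_train = [(name, ex) for name, (train, _, _) in splits.items() for ex in train]
test_sizes = {name: len(test) for name, (_, _, test) in splits.items()}
print(len(combined_train), test_sizes)
```

Keeping validation and test splits per dataset lets each source be evaluated on its own even though training uses the pooled data.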
Experiment Results
Model Analysis
• As the negative sample size increases, the model tends to converge
faster and exhibit better performance.
• It appears that there are no additional benefits beyond a size of 60,
which corresponds to 10% of the label space.
• Our models exhibit a more pronounced skewness in their distribution,
resembling that of a pure classification model like NAPKINXC.
• Broader distribution at the head, indicating inclination to assign
comparable probabilities to multiple labels.
Empirical Studies
High-level tactics
Hallucination (non-existing)
Not explicitly mentioned
Thanks!

Editor's Notes

  1. Hello everyone, I’m very pleased to present our recent work in the field of AI and cybersecurity. This is joint work by Nedim Šrndić, Alexander Neth and myself. We belong to the AI4Sec team at the Huawei Research Center Munich. The title of the paper is ‘Noise Contrastive Estimation-based Matching Framework for Low-Resource Security Attack Pattern Recognition’. We provide an efficient learning method to recognize attack patterns in textual cyber attack reports. This is a crucial task for Cyber Threat Intelligence. Let’s get started.
  2. We start with the definition of Cyber Threat Intelligence, quoted from Wikipedia. Cyber Threat Intelligence (CTI), an essential pillar of cybersecurity, involves collecting and analyzing information on cyber threats, including threat actors, their campaigns, and malware, which supports timely threat detection and defense efforts. Textual threat reports and blogs, published on the web by security vendors, are considered an important source of CTI: vendors diligently investigate intricate attacks and promptly describe them in detail. We show here a made-up example of what the text in a threat report looks like, and we highlight the CTI entities in the text.
  3. There are at least two different attack patterns, associated with different parts of the text. The first is a Phishing activity, and the second, described by “the botnet distributed a file encryption”, is a ‘Data Encrypted for Impact’ attack pattern.
  4. Widely known in the security community is the Pyramid of Pain, which indicates how challenging and also how valuable it is to figure out which CTI elements are used in a particular attack. The elements from our example, the domain name, the tool and the TTPs, all appear here. TTPs, or attack patterns, sit at the tip of this pyramid, indicating that recognizing them is the hardest and also the most valuable part of CTI.
  5. The widely adopted knowledge base, where TTPs are conceptualized, standardized and pre-defined is called ATTACK, provided by MITRE organization. You can see in this visualization how the collection or ontology of TTPs looks like in the knowledge base.
  6. We show in this slide how an actual technique looks like in the knowledge base. The example is the ‘Exfiltration Over Alternative Protocol’. There are certain metadata provided together with the TTPs, the title, the textual description, the procedure examples and the suggestions on Mitigations and Detection. In the bottom of the page are the references, where the procedure examples are derived from. For the textual description, we can see that it has an overall abstract formulation. There are concepts to be understood i.e., … There are also Prerequisites i.e., and there are actions needed to be satisfied.
  7. The task we introduce in this work is TTP Mapping, essentially to extract the pre-defined TTPs from the text. And here is an illustration of the steps. From a threat report, we need a semantic understanding of the text, from that we try to map the interactions of the entities to the attack patterns or TTPs in the second step. Ideally then, an attack graph can be represented by the extract TTPs. There encompasses different challenges for different steps of the TTP mapping process. For semantic understanding or text encoding, CTI entities and relations are naturally very technical and complex. For TTP Mapping, the task is challenging even for human. The Learning-based TTP mapping is potential but consequently we lack sizable and quality datasets as TTPs are as well numerous and complex. Lastly, MITRE ATT&K refine / extend the KB frequently as attacks evolve, thus system needs to be able to adapt / extend to new TTPs.
  8. Conventionally, TTP mapping is designed as a vanilla multi-class classification task: a text is encoded and then classified with a softmax function over the labels, which are the TTPs.
  9. However, there are certain problems with this softmax-based learning. First, the input text is written in a complex, technical style. Then come the challenges of (1) the large label space, (2) noisy and missing labels, as TTPs are especially hard to annotate precisely, and (3) the typical long-tail label distribution. Together, these challenges pose a major obstacle to this way of learning.
  10. In this work we propose an alternative learning setting in which we avoid directly optimizing for discrimination between data points in the large label space. Concretely, we transform the task into a text matching problem between the input text and the TTP. This allows us to use the direct semantic similarity of each input-label pair to derive a calibrated assignment score.
  11. To support low-resource learning with limited labels, we further introduce the use of Noise Contrastive Estimation (NCE) to add further learning signals to our optimization.
  12. We provide a quick recap of Noise Contrastive Estimation (NCE). NCE is a powerful parameter estimation method for log-linear models. It avoids calculating the partition function (see the formula) at each training step, a calculation that cross-entropy training requires and that is computationally demanding for a large label space. With NCE, the objective is instead transformed into discriminating between positive and negative examples.
  13. The new text-TTP matching paradigm is very similar to the classification one, except that the scoring function is now also the matching function of the pair. This allows us to rank the candidate pairs naturally. Furthermore, the matching function introduces an inherent ‘inductive bias’, allowing it to perform well on long-tail or even unseen labels.
  14. Here we provide an example of the ‘ranking’-like characteristic of the TTP distribution. Without NCE, the TTPs tend to concentrate at the top of the ranking, while with NCE the TTPs tend to be polarized, with very few TTPs occurring in the top-k ranking, which is ideal for our case.
  15. This is our proposed NCE-based learn-to-compare framework. The NCE mechanism alleviates the complexity of the moderately sized label space and helps the matching model learn distinctive representations of the labels (TTPs). On top of NCE, we further introduce two re-focusing remedies that allow the framework to reduce the level of contrasting against missing or noisy labels.
  16. Overall, our proposed NCE-based models greatly outperform the baselines. In particular, the asymmetric loss-based model achieves the best performance across most metrics and datasets. We also observe significant improvements of the two loss variants (i.e., α-balanced and asymmetric) over the vanilla InfoNCE. In addition, the models demonstrate a larger improvement at the cutoff threshold @1 (∼10%) than at @3 (∼5%). This supports the effectiveness of our matching network in classification settings. Interestingly, the Dynamic Triplet-loss, which is also a common contrastive learning loss, underperformed in our experiments, indicating that it lacks the probabilistic characteristics required for the ranking task.
  17. We provide further analysis of (1) the number of negative samples and (2) the properties of our ranking distributions.
  18. Finally, we also compare against ChatGPT 4.0 in our empirical studies. The LLM gives rather interesting answers; however, it still underperforms on this task.
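The text-TTP matching setup and the NCE objective described in slides 10 to 13 can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the cosine scoring function, the temperature value, the toy vectors, and the function names (`match_scores`, `info_nce_loss`, `rank_ttps`) are all assumptions made for the example. A real system would obtain the vectors from a learned text encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def match_scores(text_vec, ttp_vecs):
    """Score one text against every TTP (assumed scoring: cosine similarity)."""
    return [cosine(text_vec, ttp_vec) for ttp_vec in ttp_vecs]


def info_nce_loss(scores, positive_idx, negative_idxs, temperature=0.1):
    """InfoNCE-style loss: the positive pair is contrasted only against a
    sampled set of negatives, so the normalization runs over that small set
    instead of the full label space. The temperature is an assumed value."""
    logits = [scores[positive_idx] / temperature]
    logits += [scores[j] / temperature for j in negative_idxs]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    top = max(logits)
    log_norm = top + math.log(sum(math.exp(x - top) for x in logits))
    return log_norm - logits[0]


def rank_ttps(scores, k=3):
    """Return the indices of the k best-matching TTPs, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Toy data: four hypothetical TTP embeddings and one text embedding.
ttp_vecs = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
text_vec = [1.0, 0.1, 0.0, 0.0]

scores = match_scores(text_vec, ttp_vecs)
top_ttps = rank_ttps(scores, k=2)
loss = info_nce_loss(scores, positive_idx=0, negative_idxs=[2, 3])
```

Because the same scores drive both the loss and `rank_ttps`, the trained matching function can be used directly to rank candidate TTPs, including labels that rarely or never appeared as positives during training.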