SCPatcher: Mining Crowd Security Discussions to Enrich Secure Coding Practices
Ziyou Jiang, Lin Shi, Guowei Yang, Qing Wang*
Presentation of ASE’23, Luxembourg
Email: ziyou2019@iscas.ac.cn
Date: 12/09/2023
Catalog
1. Background
2. Motivation
3. Approach
4. Experiment and Results
5. Discussion
6. Conclusion
1. Background
What are public Secure Coding Practices (SCP)?
▪ IT companies, universities, and security organizations gather guidelines for secure software development from their teams' experiences.
▪ Screenshot of a typical public SCP from OWASP: … (10 guidelines)
① ID: OWASP Error Handling and Logging (#5)
SCP: Properly free allocated memory when error conditions occur.
② ID: OWASP Access Control (#13)
SCP: Restrict access to user and data attributes and policy information used by access controls.
Developer #1: I am not an experienced security developer, and I want to know how to free memory, with detailed code, to prevent such insecure code, please!
Developer #2: I wonder how to prevent invalid access to web pages via cookies, but I cannot find detailed coding examples in this public SCP.
But the public SCP doesn't contain detailed coding examples!
Think: we need detailed coding examples to enrich the public SCP!
* OWASP's public SCP can be found at: https://developers.google.cn/assistant/sdk/guides/library/python/best-practices/privacy-and-security
1. Background
Where can we obtain the coding examples?
▪ People tend to share their experiences on different knowledge-sharing platforms.
[Figure: contributors to knowledge-sharing platforms: web developers, administrators, development teams, project managers, security practitioners, and bots.]
▪ Security posts on Stack Overflow, for example, may contain detailed coding examples and their explanations.
Stack Overflow (SO) is one of the best-known Q&A platforms and contains a massive number of security posts.
▪ These discussions with SCP may be referenced by security practitioners.
* A Twitter reference of SCP can be found at https://twitter.com/puf/status/1483457400606580736
2. Motivation
Can we automatically mine SCPs from a post?
▪ Input: security post #48884217
▪ Definition of SCP Specification:
$\langle Exam^{+}, Expl^{+}, Exam^{-}, Expl^{-}, CWE\_ID, Public\_SCP \rangle$
$Exam^{+}$ (Secure Coding Example), $Expl^{+}$ (Secure Coding Explanation), $Exam^{-}$ (Insecure Coding Example), $Expl^{-}$ (Insecure Coding Explanation)
$CWE\_ID$, $Public\_SCP$: Relevant CWE and public SCP
▪ Contribution: (1) Build a dataset with insecure code and its fix; (2) Enrich the public SCP with the detailed SCP.
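As a rough illustration of the six-element SCP specification defined on this slide, the tuple could be held in a small record type. This is only a sketch: the class and field names are hypothetical, and the sample values are copied from the labeled example for post #48884217 shown in the dataset slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SCPSpecification:
    """One mined SCP specification (class and field names are illustrative)."""
    secure_example: str        # Exam+  (secure coding example)
    secure_explanation: str    # Expl+  (secure coding explanation)
    insecure_example: str      # Exam-  (insecure coding example)
    insecure_explanation: str  # Expl-  (insecure coding explanation)
    cwe_id: str                # CWE_ID, or "Unmatched"
    public_scp: str            # Public_SCP, or "Unmatched"


# Values taken from the labeled sample for post #48884217 (elided as in the slide).
spec = SCPSpecification(
    secure_example="document.cookie =…",
    secure_explanation="If we don't want to implement a single page application…",
    insecure_example="firebase.auth().onAuthStateChanged…",
    insecure_explanation="Cookies/localStorage/webStorage do not seem to be fully securable…",
    cwe_id="CWE-79",
    public_scp="Access_Control_13",
)
print(spec.cwe_id, spec.public_scp)
```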
3. Approach overview: SCPatcher
• Goal: SCPatcher aims to automatically extract the SCP specification from security posts to enrich the public SCP.
• Input: Security posts from Stack Overflow.
• Output: The SCP specifications extracted from the security posts.
A: Extracting the Area of Coding Example and Explanation
B: Extracting the Coding Example and Explanation
C: Matching the Relevant CWE and Public SCP
3. SCPatcher-Step A
• Step A: Extracting the Area of Coding Example and Explanation
• Goal:
① Security posts contain many irrelevant sentences, so directly extracting the coding examples and explanations is relatively difficult and inaccurate.
② Therefore, we first locate the areas of the coding examples (the lines of code (LOC) that contain the coding examples) and of the coding explanations (the sentences that contain the coding explanations).
▪ Prompt Template: Cloze template [Petroni et al., EMNLP'19]
▪ [X] The {insecure|secure} codes and sentences are [Y].
▪ $T_Q$: [Q]. The insecure codes and sentences are [Y].
▪ $T_A$: [A]. The secure codes and sentences are [Y].
▪ Fix-prompt Tuning: $\mathcal{L} = Cr(y_{exam}, A\_Exam) + Cr(y_{expl}, A\_Expl)$
where $Cr(y, f(x))$ is the CRINGE loss [Adolphs et al.] for generative LLMs.
▪ Selection of Large Language Model (LLM): BERT-base [Devlin et al., NAACL'19], ALBERT-large [Lan et al., ICLR'20], GPT-2 [Betz et al.], T5 [Raffel et al., JMLR'20], GPT-3 [Brown et al., NeurIPS'20]
Results: GPT-3 achieves the highest results on the four area-extraction tasks, with over 80% accuracy, and outperforms the other LLMs. Therefore, we choose GPT-3 as the LLM.
Table 1: The accuracy of LLM on extracting the areas.
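The two cloze templates $T_Q$ and $T_A$ from this slide can be filled in with plain string formatting. The sketch below only builds the prompt text; the function name and the sample question and answer strings are hypothetical, and how the prompts are passed to the LLM is not shown in the slides.

```python
from typing import Tuple


def build_cloze_prompts(question: str, answer: str, slot: str = "[Y]") -> Tuple[str, str]:
    """Fill the two cloze templates: T_Q for the question, T_A for the answer.

    The slot [Y] is left in place for the language model to complete.
    """
    q = question.rstrip(". ")
    a = answer.rstrip(". ")
    t_q = f"{q}. The insecure codes and sentences are {slot}."
    t_a = f"{a}. The secure codes and sentences are {slot}."
    return t_q, t_a


# Hypothetical post content, for illustration only.
t_q, t_a = build_cloze_prompts(
    "I keep the ID token in localStorage",
    "Store the token in a session cookie instead",
)
print(t_q)
print(t_a)
```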
3. SCPatcher-Step B-1
• Step B: Extracting the Coding Example and Explanation
• Goal: We aim to slice the coding examples and explanations from the previously extracted areas.
Security Feature Vector (SFV)
▪ Goal: Use high-level representations of security-related knowledge to enhance the extraction.
▪ Components of SFV: Five types of keywords (WW, TW, AW, DW, CW)
▪ The embedding of SFV ($\boldsymbol{\xi}$):
$\boldsymbol{\xi} = concat[GloVe(WW, TW, AW, DW, CW)]$
Note: GloVe (Pennington et al., EMNLP'14) is a representative word embedding method.
Table 2: The components of security feature vector (SFV).
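One way to read the SFV embedding $\boldsymbol{\xi}$ is: embed the keywords of each of the five types, then concatenate the five results. The sketch below makes two assumptions that the slide does not state: the keywords of one type are averaged into a single vector, and a tiny hand-written lookup table stands in for real GloVe vectors. The assignment of sample keywords to types is likewise hypothetical.

```python
from typing import Dict, List, Sequence

KEYWORD_TYPES = ("WW", "TW", "AW", "DW", "CW")  # the five keyword types named on the slide


def embed_sfv(
    keywords: Dict[str, List[str]],
    vectors: Dict[str, Sequence[float]],
    dim: int,
) -> List[float]:
    """Concatenate one mean vector per keyword type (assumed pooling: mean)."""
    xi: List[float] = []
    for kind in KEYWORD_TYPES:
        found = [vectors[w] for w in keywords.get(kind, []) if w in vectors]
        if found:
            # Column-wise mean over the embeddings of this type's keywords.
            pooled = [sum(column) / len(found) for column in zip(*found)]
        else:
            # No known keyword of this type: use a zero vector.
            pooled = [0.0] * dim
        xi.extend(pooled)
    return xi


# Toy 2-dimensional stand-in for a GloVe table (not real GloVe values).
toy_vectors = {"cookie": [1.0, 0.0], "token": [0.0, 1.0]}
toy_keywords = {"WW": ["cookie", "token"], "TW": ["cookie"]}
xi = embed_sfv(toy_keywords, toy_vectors, dim=2)
print(len(xi), xi[:4])
```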
3. SCPatcher-Step B-2
Attention-based Selector: The core module of the slicer and summarizer.
▪ Task-oriented Encoder: Embeds the elements of each area $[E_1, \dots, E_n]$ into $[\boldsymbol{e}_1, \dots, \boldsymbol{e}_n]$ for further selection.
▪ Multi-head Attention: Enhances the embeddings with the SFV, mapping each $\boldsymbol{e}_i$ to $\boldsymbol{c}_i$.
▪ Transformer Decoder: Outputs the selection probabilities $p(E_i)$ and selects the elements above the cut-off value, $p(E_i) > \theta$.
Hierarchically Slicing Coding Examples
▪ Step 1: Determine whether the code is lengthy, using max_length=15 (Hu et al., FCS'23).
▪ Step 2: Transform the LOC in the areas into an Abstract Syntax Tree (AST), and embed the tree with the CAST encoder → Task-oriented Encoder.
▪ Step 3: Split the AST at functions, comments, and empty lines. Use the Attention Selector to select the insecure and secure sub-ASTs. If len(sub-AST) > max_length, repeat from Step 1.
▪ Step 4: Output the coding examples.
Summarizing Coding Explanations
▪ Step 1: Embed the sentences in the areas with the BERT encoder → Task-oriented Encoder.
▪ Step 2: Summarize the sentences of the coding explanations with the Attention Selector.
▪ Step 3: Output the summarized insecure and secure coding explanations.
[Figure: example AST with the nodes Root, If (user), Func (Auth), Comt (Auth), Empt-Line, Document.cookie, and Console.warn, and the selected AST.]
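The hierarchical slicing loop (split, select with $p(E_i) > \theta$, split again if the piece is still longer than max_length) can be sketched without a real AST. The version below is a simplification under stated assumptions: it splits raw lines at function, comment, and empty-line boundaries instead of splitting an AST, the `score` callback is a hypothetical stand-in for the Attention Selector, and the value of `THETA` is made up.

```python
from typing import Callable, List

MAX_LENGTH = 15  # max_length from the slide
THETA = 0.5      # hypothetical cut-off value for p(E_i) > theta

# One boundary test per level: functions first, then comments, then empty lines.
LEVELS: List[Callable[[str], bool]] = [
    lambda line: line.lstrip().startswith("function "),
    lambda line: line.lstrip().startswith("//"),
    lambda line: not line.strip(),
]


def split_before(lines: List[str], starts_chunk: Callable[[str], bool]) -> List[List[str]]:
    """Start a new chunk in front of every line for which starts_chunk is true."""
    chunks: List[List[str]] = []
    current: List[str] = []
    for line in lines:
        if starts_chunk(line) and current:
            chunks.append(current)
            current = []
        current.append(line)
    if current:
        chunks.append(current)
    return chunks


def slice_code(lines, score, theta=THETA, max_length=MAX_LENGTH, level=0):
    """Keep chunks scoring above theta; split over-long chunks at the next level."""
    selected: List[str] = []
    for chunk in split_before(lines, LEVELS[level]):
        if score(chunk) <= theta:
            continue  # the selector rejects this chunk
        if len(chunk) > max_length and level + 1 < len(LEVELS):
            selected.extend(slice_code(chunk, score, theta, max_length, level + 1))
        else:
            selected.extend(chunk)
    return selected


# Toy input and a keyword-based stand-in for the learned selector.
code = [
    "function auth(user) {",
    "  // store the token",
    "  document.cookie = '__session=' + token;",
    "}",
    "",
    "function log(msg) {",
    "  console.warn(msg);",
    "}",
]
has_cookie = lambda chunk: 1.0 if any("cookie" in line for line in chunk) else 0.0
print(slice_code(code, has_cookie))
print(slice_code(code, has_cookie, max_length=2))
```

With the default max_length the whole `auth` function is kept; with a smaller limit the sketch descends to comment and empty-line boundaries and keeps only the lines around `document.cookie`.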
3. SCPatcher-Step B-3
Fine-tuning Both Models
▪ Multitask Fine-tuning: Train the two models in parallel with the joint loss:
$\mathcal{L} = H(y_{exam}, p(SLC)) + H(y_{expl}, p(SUM))$
where $H(y, p(x))$ is the cross-entropy training loss, $y_{exam}$ and $p(SLC)$ are the ground truth and prediction of the coding example slicer, and $y_{expl}$ and $p(SUM)$ are the ground truth and prediction of the coding explanation summarizer.
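The joint loss is simply the sum of two cross-entropy terms, one per model. A minimal numeric sketch follows, assuming that each model makes a binary keep/drop decision per element, so that $H$ is the mean binary cross-entropy; the slide does not specify this, and the function names are illustrative.

```python
import math
from typing import Sequence


def cross_entropy(y_true: Sequence[int], p_pred: Sequence[float], eps: float = 1e-12) -> float:
    """Mean binary cross-entropy H(y, p) over per-element selection decisions."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Clamp probabilities away from 0 to keep log() finite.
        total += y * math.log(max(p, eps)) + (1 - y) * math.log(max(1.0 - p, eps))
    return -total / len(y_true)


def joint_loss(y_exam, p_slc, y_expl, p_sum) -> float:
    """L = H(y_exam, p(SLC)) + H(y_expl, p(SUM))."""
    return cross_entropy(y_exam, p_slc) + cross_entropy(y_expl, p_sum)


# Slicer is unsure about one line, summarizer is unsure about one sentence.
loss = joint_loss([1], [0.5], [1], [0.5])
print(round(loss, 4))
```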
3. SCPatcher-Step C
• Step C: Matching the Relevant CWE and Public SCP
• Goal: We aim to find the most relevant CWE and public SCP by semantic similarity.
▪ Definition of Similarity: The similarity of the words in the SFV between security posts and the CWE/public SCP.
$sim_{cwe} = \mathrm{Avg}_{\boldsymbol{w}_i \in C^{-}} \max_{\boldsymbol{w}_j \in CWE} \cos(\boldsymbol{w}_i, \boldsymbol{w}_j)$
$sim_{scp} = \mathrm{Avg}_{\boldsymbol{w}_i \in C^{+}} \max_{\boldsymbol{w}_j \in SCP} \cos(\boldsymbol{w}_i, \boldsymbol{w}_j)$
where $C^{-} = \{Exam^{-}, Expl^{-}\}$ (Insecure Coding Examples & Explanations), $C^{+} = \{Exam^{+}, Expl^{+}\}$ (Secure Coding Examples & Explanations); $\boldsymbol{w}_i$ are the embeddings of words in the SFV keywords, and $\boldsymbol{w}_j$ is the embedding of the most similar word in the CWE or public SCP.
Matching the CWE & public SCP
① Set the thresholds $\theta_{cwe}$ and $\theta_{scp}$.
② If $sim_{cwe}$ is higher than $\theta_{cwe}$, match the CWE; if $sim_{scp}$ is higher than $\theta_{scp}$, match the public SCP. Otherwise, set "Unmatched".
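The similarity on this slide averages, over the post's keyword embeddings, the best cosine match found in a candidate CWE or public SCP entry, and a threshold decides between a match and "Unmatched". The sketch below follows that definition with toy 2-dimensional vectors; the function names, the candidate table, and the threshold value 0.7 are all hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


def similarity(post_words: List[Vector], target_words: List[Vector]) -> float:
    """Average over post words w_i of the maximum cosine to any target word w_j."""
    best = [max(cosine(w_i, w_j) for w_j in target_words) for w_i in post_words]
    return sum(best) / len(best)


def match(post_words: List[Vector], candidates: Dict[str, List[Vector]], theta: float) -> str:
    """Return the most similar candidate above theta, else "Unmatched"."""
    best_name, best_sim = "Unmatched", theta
    for name, words in candidates.items():
        sim = similarity(post_words, words)
        if sim > best_sim:
            best_name, best_sim = name, sim
    return best_name


# Toy embeddings (not real GloVe vectors).
post = [(1.0, 0.0), (0.0, 1.0)]
cwes = {"CWE-79": [(1.0, 0.0), (0.0, 1.0)], "CWE-787": [(1.0, 0.0)]}
print(match(post, cwes, theta=0.7))
```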
4. Evaluation: Dataset
A: Dataset Collection
▪ Collect the posts tagged with "security" or similar tags, such as "websecurity", "danger", and "firebase" (Yang et al., JCST'16).
Table 3: The statistics of our dataset.
▪ Format of labeled data (manually labeled SCP specification):
{
  "Post_ID": "48884217",
  "Post_Content": {
    "Title": "Handling Firebase ID tokens on the client side with vanilla JavaScript",
    "Question": "I am writing a Firebase application in vanilla JavaScript,… [CODE1],…",
    "Answer": "If we don't want to implement a single page application, …[CODE2], …"
  },
  "Code": {
    "[CODE1]": "/* global firebase, firebaseui */ const uiConfig = …",
    "[CODE2]": "document.cookie = '__session=' + token…"
  },
  "Sec_Exam": "document.cookie =…",
  "Sec_Expl": "If we don't want to implement a single page application…",
  "Insec_Exam": "firebase.auth().onAuthStateChanged…",
  "Insec_Expl": "Cookies/localStorage/webStorage do not seem to be fully securable…",
  "CWE": "CWE-79",
  "SCP": "Access_Control_13"
}
B: Data Preprocessing
▪ Step 1: Filter out the posts that do not contain <code> tags.
▪ Step 2: Filter out the posts that receive negative scores, as well as the non-English posts.
▪ Step 3: Tag the code blocks in sequence, i.e., with [CODE1]…[CODEn], then remove the other HTML tags.
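The preprocessing steps above can be approximated with the standard `re` and `html` modules. This is a sketch rather than the authors' pipeline: the function names are hypothetical, the regular expressions are a simplification that a real HTML parser would handle more robustly, and the non-English filter from Step 2 is omitted.

```python
import re
from html import unescape
from typing import Dict, Tuple


def keep_post(html: str, score: int) -> bool:
    """Steps 1-2: keep posts that contain a <code> tag and have a non-negative score."""
    return "<code" in html and score >= 0


def tag_code_blocks(html: str) -> Tuple[str, Dict[str, str]]:
    """Step 3: replace code blocks with [CODE1]…[CODEn], then drop other HTML tags."""
    codes: Dict[str, str] = {}

    def replace(match: "re.Match[str]") -> str:
        tag = f"[CODE{len(codes) + 1}]"
        codes[tag] = unescape(match.group(1))  # store the code with entities decoded
        return tag

    text = re.sub(r"<code[^>]*>(.*?)</code>", replace, html, flags=re.S)
    text = re.sub(r"<[^>]+>", "", text)  # remove the remaining HTML tags
    return unescape(text), codes


sample = "<p>Use this:</p><pre><code>a &lt; b</code></pre><p>then <code>c()</code></p>"
text, codes = tag_code_blocks(sample)
print(text)
print(codes)
```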
C: Dataset Labeling
▪ For each post, we manually label the insecure and secure coding examples and explanations.
▪ The labeling team consists of 2 senior researchers, 2 Ph.D. students, and 4 master's students, with over 4 years of experience in software security.
▪ Disagreements are discussed until a decision is reached.
D: Dataset Augmentation
▪ The dataset is imbalanced and few-shot.
▪ Augment the dataset with EDA (Wei et al., EMNLP-IJCNLP'19), a widely used augmentation method.
4. Evaluation: Research Questions
▪ RQ1: What is the performance of SCPatcher in extracting secure and insecure coding examples?
▪ RQ2: What is the performance of SCPatcher in extracting secure and insecure coding explanations?
▪ RQ3: What are the performance and capability of SCPatcher in matching the CWE?
▪ RQ4: What are the performance and capability of SCPatcher in matching the public SCP?
4. Experiment Design and Analysis: RQ1
RQ1: What is the performance of SCPatcher in extracting coding examples?
Table 4: The baseline comparison on extracting the insecure and secure coding examples (%).
Three SOTA Baselines
① Baselines: GPT-3 (original LLM), VulSlicer, and DeepBalance.
② Variants: Baseline+Area (the baseline combined with the extracted areas) and Baseline+Area+SFV (the baseline combined with the extracted areas and the SFV).
③ Note: All baselines are fine-tuned on our dataset.
Three Evaluation Metrics
① Rouge-L (measures the performance of code generation)
② MToken (measures the similarity between code tokens)
③ MLine (measures the similarity between code lines, i.e., LOC)
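Rouge-L is based on the longest common subsequence (LCS) between a prediction and a reference. The sketch below computes an LCS-based F-score over code tokens; the balanced F1 weighting and whitespace tokenization are assumptions, since the slides do not state which Rouge-L variant or tokenizer was used.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, using one rolling DP row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l(pred_tokens: List[str], ref_tokens: List[str]) -> float:
    """LCS-based F1 between predicted and reference token sequences."""
    if not pred_tokens or not ref_tokens:
        return 0.0
    lcs = lcs_length(pred_tokens, ref_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(pred_tokens)
    recall = lcs / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


pred = "document . cookie = token".split()
ref = "document . cookie = '__session=' + token".split()
print(round(rouge_l(pred, ref), 4))
```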
Results for RQ1
▪ SCPatcher outperforms all the baselines on extracting coding examples. It reaches the highest Rouge-L, MToken, and MLine, with 63.33%, 74.23%, and 72.38%, outperforming the best baseline by 4.03% (Rouge-L), 4.91% (MToken), and 2.73% (MLine).
▪ The time cost of SCPatcher is 11 hours, which is longer only than that of the original VulSlicer and DeepBalance.
▪ Overall, SCPatcher has advantages over all the baselines on extracting coding examples.
∗ Due to the temporary lack of open embeddings for GPT-3, we will compare GPT-3+Area+SFV when the embeddings are available.
4. Experiment Design and Analysis: RQ2
RQ2: What is the performance of SCPatcher in extracting coding explanations?
Table 5: The baseline comparison on extracting the insecure and secure coding explanations (%).
Three SOTA Baselines
① Baselines: GPT-3 (original LLM), BERTSum, and BART (SOTA baselines in text summarization).
② Variants: Baseline+Area (the baseline combined with the extracted areas) and Baseline+Area+SFV (the baseline combined with the extracted areas and the SFV).
③ Note: All baselines are fine-tuned on our dataset.
Three Evaluation Metrics
① Precision (the ratio of matched sentences in the prediction)
② Recall (the ratio of matched sentences in the ground truth)
③ F1 (the harmonic mean of Precision and Recall)
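The three sentence-level metrics can be computed from the sets of predicted and ground-truth sentences. The sketch below assumes that a sentence counts as matched only when it is identical in both sets; the slides do not say how matching is decided, and the function name is illustrative.

```python
from typing import Iterable, Tuple


def sentence_prf(predicted: Iterable[str], reference: Iterable[str]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 over exactly matching sentences."""
    pred, ref = set(predicted), set(reference)
    matched = len(pred & ref)
    precision = matched / len(pred) if pred else 0.0
    recall = matched / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f1


# Four predicted sentences, three ground-truth sentences, two in common.
p, r, f1 = sentence_prf(["s1", "s2", "s3", "s4"], ["s1", "s2", "s5"])
print(p, round(r, 4), round(f1, 4))
```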
Results for RQ2
▪ SCPatcher outperforms all the baselines on extracting the coding explanations. It reaches the highest Precision, Recall, and F1, with 79.43%, 79.70%, and 79.56%, outperforming the best baseline, GPT-3+Area, by 4.04% (Precision), 3.91% (Recall), and 3.97% (F1).
▪ The time cost of SCPatcher is 11 hours, which is longer only than that of the original BERTSum.
▪ Overall, SCPatcher has advantages over all the baselines on extracting coding explanations.
∗ Due to the temporary lack of open embeddings for GPT-3, we will compare GPT-3+Area+SFV when the embeddings are available.
4. Experiment Design and Analysis: RQ3
RQ3: What are the performance and capability of SCPatcher in matching the CWE?
Table 6: The performances on matching the CWE.
▪ Statistics of SCP specification for CWE
Table 7: The size of matched and unmatched CWE/public SCP.
Metric
▪ Precision, which better reflects the CWE matching results in the experiment:
$Precision = \frac{CWE_{match}}{CWE_{pred}} \times 100\%$
Results for RQ3
▪ 409 SCP specifications are matched with existing CWEs, while 38 posts are unmatched.
▪ SCPatcher accurately matches the CWE, with 81.43% precision on average.
▪ The three most frequently predicted CWEs are CWE-787 (76 SCP specifications), CWE-79 (68 SCP specifications), and CWE-78 (58 SCP specifications).
4. Experiment Design and Analysis: RQ4
RQ4: What are the performance and capability of SCPatcher in matching the public SCP?
Table 8: The performances on matching the public SCP.
▪ Statistics of SCP specification for public SCP.
Metric
▪ Precision, which better reflects the public SCP matching results in the experiment:
$Precision = \frac{SCP_{match}}{SCP_{pred}} \times 100\%$
Results for RQ4
▪ 392 SCP specifications are matched with existing OWASP types, while 55 posts are unmatched.
▪ SCPatcher can enrich the public SCP with 392 SCP specifications, 3,074 LOC, and 1,967 sentences. It accurately matches the public SCP, with 78.34% precision on average.
▪ The three most frequently predicted types are Session Management (57 SCP specifications), Access Control (53 SCP specifications), and File Management (46 SCP specifications).
5. Discussion: Effect of Variants
• Effect of Prompt Template
Table 9: The compared prompt templates.
▪ Comparison results on different templates.
▪ Results: Our template (Cloze 2) outperforms the other templates by 3.15% MLine (Insecure Coding Example), 1.42% MLine (Secure Coding Example), 1.16% F1 (Insecure Coding Explanation), and 4.74% F1 (Secure Coding Explanation).
• Effect of Post's Sentences and Code-block Numbers
▪ Effect of #code-block on extracting coding examples.
▪ Effect of #sentence on extracting coding explanations.
▪ Results: SCPatcher outperforms GPT-3+Area by an average of 6.15% MLine (Insecure Coding Example) and 5.37% MLine (Secure Coding Example), and by 4.94% F1 (Insecure Coding Explanation) and 3.67% F1 (Secure Coding Explanation).
5. Discussion: Qualitative Evaluation
• Case Study
▪ Experiment: Compare SCPatcher with the SOTA baseline, i.e., GPT-3+Area, on extracting SCP specifications from the motivating example.
▪ Result: We find that SCPatcher accurately extracts the coding examples and explanations and successfully matches the CWE and public SCP. GPT-3+Area, by contrast, introduces incorrect code explanations and irrelevant lines of code, and thus matches the incorrect CWE and public SCP.
* The text in the red dashed box indicates the incorrect extraction result.
• Analysis of the 10% Unmatched SCP Specifications
▪ Case 1: Non-SCP-related Code (6.2%). Some code examples in the security posts are not related to the SCPs.
▪ Case 2: Inaccurate Extraction (2.4%). Some extracted SCP specifications are inaccurate and thus cannot be correctly matched.
▪ Case 3: Potential New Public SCPs (1.4%). Some SCP specifications may correspond to new public SCPs that have not yet been incorporated by OWASP. For example, post #72865733 proposes a WAF to prevent Log4j exploits in a K8s cloud system.
6. Conclusion
• We introduce SCPatcher, an automated approach to enrich secure coding practices by mining crowd security discussions.
• We conduct an experimental evaluation of SCPatcher, which shows that it outperforms all baselines, together with a user study with security practitioners, which further demonstrates its usefulness in practice.
• We plan to improve our approach with more extended datasets from other knowledge-sharing platforms. We also plan to enrich more public SCPs, such as those of Google and UC Berkeley, to further evaluate its practicality.
https://doi.org/10.5281/zenodo.8254682
Thank you!
Q&A
Ziyou Jiang, ziyou2019@iscas.ac.cn
Institute of Software, CAS, Beijing, China
ASE’23, Luxembourg

More Related Content

Similar to ASE2023_SCPatcher_Presentation_V5.pptx

The First C# Project Analyzed
The First C# Project AnalyzedThe First C# Project Analyzed
The First C# Project AnalyzedPVS-Studio
 
2a Mini-conf PredictCovid. Field: Artificial Intelligence
2a Mini-conf PredictCovid. Field: Artificial Intelligence2a Mini-conf PredictCovid. Field: Artificial Intelligence
2a Mini-conf PredictCovid. Field: Artificial IntelligenceAlex Camargo
 
A Check of the Open-Source Project WinSCP Developed in Embarcadero C++ Builder
A Check of the Open-Source Project WinSCP Developed in Embarcadero C++ BuilderA Check of the Open-Source Project WinSCP Developed in Embarcadero C++ Builder
A Check of the Open-Source Project WinSCP Developed in Embarcadero C++ BuilderAndrey Karpov
 
Data Security Using Elliptic Curve Cryptography
Data Security Using Elliptic Curve CryptographyData Security Using Elliptic Curve Cryptography
Data Security Using Elliptic Curve CryptographyIJCERT
 
Symbolic Execution (introduction and hands-on)
Symbolic Execution (introduction and hands-on)Symbolic Execution (introduction and hands-on)
Symbolic Execution (introduction and hands-on)Emilio Coppa
 
Penn  State  University          School  of.docx
Penn  State  University            School  of.docxPenn  State  University            School  of.docx
Penn  State  University          School  of.docxdanhaley45372
 
App secforum2014 andrivet-cplusplus11-metaprogramming_applied_to_software_obf...
App secforum2014 andrivet-cplusplus11-metaprogramming_applied_to_software_obf...App secforum2014 andrivet-cplusplus11-metaprogramming_applied_to_software_obf...
App secforum2014 andrivet-cplusplus11-metaprogramming_applied_to_software_obf...Cyber Security Alliance
 
COLLEGE OF COMPUTING AND INFORMATICS Assignment – 4
COLLEGE OF COMPUTING AND INFORMATICS Assignment – 4COLLEGE OF COMPUTING AND INFORMATICS Assignment – 4
COLLEGE OF COMPUTING AND INFORMATICS Assignment – 4Voffelarin
 
100 bugs in Open Source C/C++ projects
100 bugs in Open Source C/C++ projects 100 bugs in Open Source C/C++ projects
100 bugs in Open Source C/C++ projects Andrey Karpov
 
COIT20262 Assignment 1 Term 1, 2018 Advanced Network Secur.docx
COIT20262 Assignment 1 Term 1, 2018 Advanced Network Secur.docxCOIT20262 Assignment 1 Term 1, 2018 Advanced Network Secur.docx
COIT20262 Assignment 1 Term 1, 2018 Advanced Network Secur.docxclarebernice
 
Learning to Spot and Refactor Inconsistent Method Names
Learning to Spot and Refactor Inconsistent Method NamesLearning to Spot and Refactor Inconsistent Method Names
Learning to Spot and Refactor Inconsistent Method NamesDongsun Kim
 
cscript_controller.pdf
cscript_controller.pdfcscript_controller.pdf
cscript_controller.pdfVcTrn1
 
4CS4-25-Java-Lab-Manual.pdf
4CS4-25-Java-Lab-Manual.pdf4CS4-25-Java-Lab-Manual.pdf
4CS4-25-Java-Lab-Manual.pdfamitbhachne
 

Similar to ASE2023_SCPatcher_Presentation_V5.pptx (20)

The First C# Project Analyzed
The First C# Project AnalyzedThe First C# Project Analyzed
The First C# Project Analyzed
 
Dsp lab manual 15 11-2016
Dsp lab manual 15 11-2016Dsp lab manual 15 11-2016
Dsp lab manual 15 11-2016
 
2a Mini-conf PredictCovid. Field: Artificial Intelligence
2a Mini-conf PredictCovid. Field: Artificial Intelligence2a Mini-conf PredictCovid. Field: Artificial Intelligence
2a Mini-conf PredictCovid. Field: Artificial Intelligence
 
Cppcheck
CppcheckCppcheck
Cppcheck
 
A Check of the Open-Source Project WinSCP Developed in Embarcadero C++ Builder
A Check of the Open-Source Project WinSCP Developed in Embarcadero C++ BuilderA Check of the Open-Source Project WinSCP Developed in Embarcadero C++ Builder
A Check of the Open-Source Project WinSCP Developed in Embarcadero C++ Builder
 
JavaSecure
JavaSecureJavaSecure
JavaSecure
 
MattsonTutorialSC14.pdf
MattsonTutorialSC14.pdfMattsonTutorialSC14.pdf
MattsonTutorialSC14.pdf
 
Data Security Using Elliptic Curve Cryptography
Data Security Using Elliptic Curve CryptographyData Security Using Elliptic Curve Cryptography
Data Security Using Elliptic Curve Cryptography
 
Symbolic Execution (introduction and hands-on)
Symbolic Execution (introduction and hands-on)Symbolic Execution (introduction and hands-on)
Symbolic Execution (introduction and hands-on)
 
Penn  State  University          School  of.docx
Penn  State  University            School  of.docxPenn  State  University            School  of.docx
Penn  State  University          School  of.docx
 
App secforum2014 andrivet-cplusplus11-metaprogramming_applied_to_software_obf...
App secforum2014 andrivet-cplusplus11-metaprogramming_applied_to_software_obf...App secforum2014 andrivet-cplusplus11-metaprogramming_applied_to_software_obf...
App secforum2014 andrivet-cplusplus11-metaprogramming_applied_to_software_obf...
 
COLLEGE OF COMPUTING AND INFORMATICS Assignment – 4
COLLEGE OF COMPUTING AND INFORMATICS Assignment – 4COLLEGE OF COMPUTING AND INFORMATICS Assignment – 4
COLLEGE OF COMPUTING AND INFORMATICS Assignment – 4
 
Binary Analysis - Luxembourg
Binary Analysis - LuxembourgBinary Analysis - Luxembourg
Binary Analysis - Luxembourg
 
CodeChecker summary 21062021
CodeChecker summary 21062021CodeChecker summary 21062021
CodeChecker summary 21062021
 
100 bugs in Open Source C/C++ projects
100 bugs in Open Source C/C++ projects 100 bugs in Open Source C/C++ projects
100 bugs in Open Source C/C++ projects
 
COIT20262 Assignment 1 Term 1, 2018 Advanced Network Secur.docx
COIT20262 Assignment 1 Term 1, 2018 Advanced Network Secur.docxCOIT20262 Assignment 1 Term 1, 2018 Advanced Network Secur.docx
COIT20262 Assignment 1 Term 1, 2018 Advanced Network Secur.docx
 
Learning to Spot and Refactor Inconsistent Method Names
Learning to Spot and Refactor Inconsistent Method NamesLearning to Spot and Refactor Inconsistent Method Names
Learning to Spot and Refactor Inconsistent Method Names
 
Srgoc dotnet
Srgoc dotnetSrgoc dotnet
Srgoc dotnet
 
cscript_controller.pdf
cscript_controller.pdfcscript_controller.pdf
cscript_controller.pdf
 
4CS4-25-Java-Lab-Manual.pdf
4CS4-25-Java-Lab-Manual.pdf4CS4-25-Java-Lab-Manual.pdf
4CS4-25-Java-Lab-Manual.pdf
 

Recently uploaded

chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptkotipi9215
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
 
buds n tech IT solutions
buds n  tech IT                solutionsbuds n  tech IT                solutions
buds n tech IT solutionsmonugehlot87
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - InfographicHr365.us smith
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number SystemsJheuzeDellosa
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyFrank van der Linden
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样umasea
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationkaushalgiri8080
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmSujith Sukumaran
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackVICTOR MAESTRE RAMIREZ
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfPower Karaoke
 

Recently uploaded (20)

chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.ppt
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
buds n tech IT solutions
buds n  tech IT                solutionsbuds n  tech IT                solutions
buds n tech IT solutions
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - Infographic
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The Ugly
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide

ASE2023_SCPatcher_Presentation_V5.pptx

  • 1. Ziyou Jiang, Lin Shi, Guowei Yang, Qing Wang* SCPatcher: Mining Crowd Security Discussions to Enrich Secure Coding Practices Presentation of ASE’23, Luxembourg Email: ziyou2019@iscas.ac.cn Date: 12/09/2023
  • 2. 2 Catalog 1. Background 2. Motivation 3. Approach 4. Experiment and Results 5. Discussion 6. Conclusion
  • 3. 1. Background: What is the public Secure Coding Practices (SCP)? IT companies, universities, and security organizations gather their teams’ experiences into guidelines for secure software development … (10 Guidelines). Screenshot of typical OWASP public SCP: ① ID: OWASP Error Handling and Logging (#5): SCP: Properly free allocated memory when error conditions occur. ② ID: OWASP Access Control (#13): SCP: Restrict access to user and data attributes and policy information used by access controls. Developer #1: I am not an experienced developer in security, and I want to know how to free memory, with detailed code, to prevent such insecure code, please! Developer #2: I wonder how to prevent invalid access to web pages via cookies, but I cannot find detailed coding examples in this public SCP. But the public SCP doesn’t contain detailed coding examples! Think: we need detailed coding examples to enrich the public SCP! * OWASP’s Public SCP can be found on: https://developers.google.cn/assistant/sdk/guides/library/python/best-practices/privacy-and-security
  • 4. 1. Background: Where can we obtain the coding examples? Knowledge-sharing platforms bring together web developers, administrators, development teams, project managers, security practitioners, and bots. People tend to share their experiences on different knowledge-sharing platforms. Stack Overflow (SO) is one of the well-known Q&A platforms and contains a large number of security posts. Security posts on Stack Overflow, for example, may contain detailed coding examples and their explanations. These discussions with SCP may be referenced by security practitioners. * Twitter reference of SCP can be found on https://twitter.com/puf/status/1483457400606580736
  • 5. 5 Catalog 1. Background 2. Motivation 3. Approach 4. Experiment and Results 5. Discussion 6. Conclusion
  • 6. 2. Motivation: Can we automatically mine the SCP from the post? Input: security post #48884217. Definition of SCP Specification: $\langle Exam^{+}, Expl^{+}, Exam^{-}, Expl^{-}, CWE\_ID, Public\_SCP \rangle$, where $Exam^{+}$ is the secure coding example, $Expl^{+}$ is the secure coding explanation, $Exam^{-}$ is the insecure coding example, $Expl^{-}$ is the insecure coding explanation, and $CWE\_ID$ and $Public\_SCP$ are the relevant CWE and public SCP. Contribution: (1) Build a dataset with insecure code and its fix; (2) Enrich the public SCP with the detailed SCP.
  • 7. 7 Catalog 1. Background 2. Motivation 3. Approach 4. Experiment and Results 5. Discussion 6. Conclusion
  • 8. 3. Approach overview: SCPatcher • Goal: SCPatcher aims to automatically extract the SCP specification from security posts to enrich the public SCP. • Input: Security posts from Stack Overflow. • Output: The SCP specifications extracted from the security posts. Steps: A: Extracting the Area of Coding Example and Explanation; B: Extracting the Coding Example and Explanation; C: Matching the Relevant CWE and Public SCP
  • 9. 3. SCPatcher-Step A • Step A: Extracting the Area of Coding Example and Explanation • Goal: ① Security posts contain many irrelevant sentences, so directly extracting the coding examples and explanations is relatively difficult and inaccurate. ② Therefore, we first locate the areas of coding examples (the lines of code (LOC) that contain the coding examples) and of coding explanations (the sentences that contain the coding explanations). Prompt Template: Cloze Template [Petroni et al., EMNLP’19]: [X] The {insecure|secure} codes and sentences are [Y]. $T_Q$: [Q]. The insecure codes and sentences are [Y]. $T_A$: [A]. The secure codes and sentences are [Y]. Selection of Large Language Model (LLM): BERT-base [Devlin et al., NAACL’19], ALBERT-large [Lan et al., ICLR’20], GPT-2 [Betz et al.], T5 [Raffel et al., JMLR’20], GPT-3 [Brown et al., NeurIPS’20]. Table 1: The accuracy of LLM on extracting the areas. Results: GPT-3 achieves the highest results on the four area-extraction tasks, with over 80% accuracy, and outperforms the other LLMs. Therefore, we choose GPT-3 as the LLM. Fix-prompt Tuning: $\mathcal{L} = C_r(y_{exam}, A\_Exam) + C_r(y_{expl}, A\_Expl)$, where $C_r(y, f(x))$ is the CRINGE loss [Adolphs et al.] for generative LLMs.
  • 10. 3. SCPatcher-Step B-1 • Step B: Extracting the Coding Example and Explanation • Goal: We aim to slice the coding examples and explanations from the previously extracted areas. Security Feature Vector (SFV): Goal: Use high-level representations of security-related knowledge to enhance the extraction. Components of SFV: Five types of keywords (WW, TW, AW, DW, CW). Table 2: The components of security feature vector (SFV). The embedding of SFV ($\boldsymbol{\xi}$): $\boldsymbol{\xi} = concat[GloVe(WW, TW, AW, DW, CW)]$. Note: GloVe (Pennington et al., EMNLP’14) is a representative word embedding method.
  • 11. 3. SCPatcher-Step B-2 • Step B: Extracting the Coding Example and Explanation • Goal: We aim to slice the coding examples and explanations from the previously extracted areas. Attention-based Selector: The core module of the slicer and summarizer. Task-oriented Encoder: Embed the elements of each area $[E_1, \dots, E_n]$ into $[\boldsymbol{e}_1, \dots, \boldsymbol{e}_n]$ for further selection. Multi-head Attention: An effective method to enhance the embeddings with SFV, transforming each $\boldsymbol{e}_i$ into $\boldsymbol{c}_i$. Transformer Decoder: Output the selection probabilities $p(E_i)$ and select the elements with cut-off value $p(E_i) > \theta$. Hierarchically Slicing Coding Examples: Step 1: Determine whether the code is lengthy, with max_length=15 (Hu et al., FCS’23). Step 2: Transform the LOC in the areas into an Abstract Syntax Tree (AST) and embed the tree with the CAST encoder → Task-oriented Encoder. Step 3: Split the AST by Functions, Comments, and Empty Lines. Use the Attention Selector to select the insecure & secure sub-ASTs. If len(sub-AST) > max_length, repeat Step 1. Step 4: Output the coding examples. Summarizing Coding Explanations: Step 1: Embed the sentences in the areas with the BERT encoder → Task-oriented Encoder. Step 2: Summarize the sentences of coding explanations with the Attention Selector. Step 3: Output the summarized insecure & secure coding explanations. [Figure: example AST with nodes Root, If (user), Func (Auth), Comt (Auth), Empt-Line, Document.cookie, Console.warn, and a "Selected AST" label.]
  • 12. 3. SCPatcher-Step B-3 • Step B: Extracting the Coding Example and Explanation • Goal: We aim to slice the coding examples and explanations from the previously extracted areas. Fine-tuning Both Models: Multitask Fine-tuning: Train the two models in parallel with the joint loss $\mathcal{L} = H(y_{exam}, p(SLC)) + H(y_{expl}, p(SUM))$, where $H(y, p(x))$ is the cross-entropy training loss, $y_{exam}$ and $p(SLC)$ are the ground truth and prediction of the coding example slicer, and $y_{expl}$ and $p(SUM)$ are the ground truth and prediction of the coding explanation summarizer.
  • 13. 3. SCPatcher-Step C • Step C: Matching the Relevant CWE and Public SCP • Goal: We aim to find the most relevant CWE and public SCP using semantic similarity. Definition of Similarity: The similarity of words in SFV between security posts and CWE/public SCP: $sim_{cwe} = \mathrm{Avg}_{\boldsymbol{w}_i \in C^{-}} \max_{\boldsymbol{w}_j \in CWE} \cos(\boldsymbol{w}_i, \boldsymbol{w}_j)$ and $sim_{scp} = \mathrm{Avg}_{\boldsymbol{w}_i \in C^{+}} \max_{\boldsymbol{w}_j \in SCP} \cos(\boldsymbol{w}_i, \boldsymbol{w}_j)$, where $C^{-} = \{Exam^{-}, Expl^{-}\}$ (Insecure Coding Examples & Explanations), $C^{+} = \{Exam^{+}, Expl^{+}\}$ (Secure Coding Examples & Explanations); $\boldsymbol{w}_i$ are the embeddings of words in the SFV keywords, and $\boldsymbol{w}_j$ is the embedding of the most similar word in the CWE or public SCP. Matching the CWE & public SCP: ① Set the thresholds $\theta_{cwe}$ and $\theta_{scp}$. ② If the similarity is higher than $\theta_{cwe}$ or $\theta_{scp}$, respectively, then match the CWE or public SCP. Otherwise, set it to “Unmatched”.
  • 14. 14 Catalog 1. Background 2. Motivation 3. Approach 4. Experiment and Results 5. Discussion 6. Conclusion
  • 15. 4. Evaluation: Dataset. A: Dataset Collection: Obtain the posts that are tagged with “security” and similar tags, such as “websecurity”, “danger”, and “firebase” (Yang et al., JCST’16). Table 3: The statistics of our dataset. B: Data Preprocessing: Step 1: Filter out the posts that do not contain <code> tags. Step 2: Filter out the posts that receive negative scores, as well as the non-English posts. Step 3: Tag the code blocks in sequence, i.e., with [CODE1]…[CODEn], then remove the other HTML tags. C: Dataset Labeling: For each post, we manually label the insecure and secure coding examples and explanations. The team consists of 2 senior researchers, 2 Ph.D. students, and 4 master’s students, with over 4 years of experience in software security. Labels are discussed until a decision is reached. D: Dataset Augmentation: The dataset is imbalanced and few-shot, so we augment it with EDA (Wei et al., EMNLP-IJCNLP’19), a widely used method. Format of labeled data (manually labeled SCP specification): {“Post_ID”: “48884217”, “Post_Content”: {“Title”: “Handling Firebase ID tokens on the client side with vanilla JavaScript”, “Question”: “I am writing a Firebase application in vanilla JavaScript,… [CODE1],…”, “Answer”: “If we don't want to implement a single page application, …[CODE2], …”}, “Code”: {“[CODE1]”: “/* global firebase, firebaseui */ const uiConfig = …”, “[CODE2]”: “document.cookie = '__session=' + token…”}, “Sec_Exam”: “document.cookie =…”, “Sec_Expl”: “If we don't want to implement a single page application…”, “Insec_Exam”: “firebase.auth().onAuthStateChanged…”, “Insec_Expl”: “Cookies/localStorage/webStorage do not seem to be fully securable…”, “CWE”: “CWE-79”, “SCP”: “Access_Control_13”}
  • 16. 4. Evaluation: Research Questions 16  RQ1: What are the performances of SCPatcher on extracting the secure or insecure coding example?  RQ2: What are the performances of SCPatcher on extracting the secure or insecure coding explanation?  RQ3: What are the performances and capability of SCPatcher on matching the CWE?  RQ4: What are the performances and capability of SCPatcher on matching the public SCP?
  • 17. 4. Experiment Design and Analysis: RQ1. RQ1: What are the performances of SCPatcher on extracting the coding example? Table 4: The baseline comparison on extracting the insecure and secure coding examples (%). Three SOTA Baselines: ① Baselines: GPT-3 (Original LLM), VulSlicer, and DeepBalance ② Variants: Baseline+Area (Combine the baseline with extracted areas), Baseline+Area+SFV (Combine the baseline with extracted areas and SFV) ③ Note: All baselines are fine-tuned on our dataset. Three Evaluation Metrics: ① Rouge-L (Measure the performance of code generation) ② MToken (Measure the similarity between code tokens) ③ MLine (Measure the similarity between code lines, i.e., LOC). Results for RQ1: SCPatcher outperforms all the baselines on extracting coding examples. It reaches the highest Rouge-L, MToken, and MLine with 63.33%, 74.23%, and 72.38%, outperforming the best baseline by 4.03% (Rouge-L), 4.91% (MToken), and 2.73% (MLine). The time cost of SCPatcher is 11 hours, which is longer only than the original VulSlicer and DeepBalance. Overall, SCPatcher has advantages over all the baselines on extracting coding examples. ∗ Due to the temporary lack of open embeddings for GPT-3, we will compare against GPT-3+Area+SFV when the embeddings are available.
  • 18. 4. Experiment Design and Analysis: RQ2. RQ2: What are the performances of SCPatcher on extracting the coding explanation? Table 5: The baseline comparison on extracting the insecure and secure coding explanations (%). Three SOTA Baselines: ① Baselines: GPT-3 (Original LLM), BERTSum, and BART (SOTA baselines in text summarization) ② Variants: Baseline+Area (Combine the baseline with extracted areas), Baseline+Area+SFV (Combine the baseline with extracted areas and SFV) ③ Note: All baselines are fine-tuned on our dataset. Three Evaluation Metrics: ① Precision (Measure the ratio of matched sentences in the prediction) ② Recall (Measure the ratio of matched sentences in the ground truth) ③ F1 (The harmonic mean of Precision and Recall). Results for RQ2: SCPatcher outperforms all the baselines on extracting the coding explanations. It reaches the highest Precision, Recall, and F1 with 79.43%, 79.70%, and 79.56%, outperforming the best baseline, GPT-3+Area, by 4.04% (Precision), 3.91% (Recall), and 3.97% (F1). The time cost of SCPatcher is 11 hours, which is longer only than the original BERTSum. Overall, SCPatcher has advantages over all the baselines on extracting coding explanations. ∗ Due to the temporary lack of open embeddings for GPT-3, we will compare against GPT-3+Area+SFV when the embeddings are available.
  • 19. 4. Experiment Design and Analysis: RQ3. RQ3: What are the performances and capability of SCPatcher on matching the CWE? Table 6: The performances on matching the CWE. Table 7: The size of matched and unmatched CWE/public SCP. Statistics of SCP specification for CWE. Metric: Precision, which better reflects the matching result of CWE in the experiment: $\frac{CWE_{match}}{CWE_{pred}} \times 100\%$. Results for RQ3: 409 SCP specifications are matched with existing CWEs, while 38 posts are unmatched. SCPatcher accurately matches the CWE with 81.43% Precision on average. The three most frequently predicted CWEs are CWE-787 (76 SCP specifications), CWE-79 (68 SCP specifications), and CWE-78 (58 SCP specifications).
  • 20. 4. Experiment Design and Analysis: RQ4. RQ4: What are the performances and capability of SCPatcher on matching the public SCP? Table 8: The performances on matching the public SCP. Table 7: The size of matched and unmatched CWE/public SCP. Statistics of SCP specification for public SCP. Metric: Precision, which better reflects the matching result of the public SCP in the experiment: $\frac{SCP_{match}}{SCP_{pred}} \times 100\%$. Results for RQ4: 392 SCP specifications are matched with existing OWASP types, while 55 posts are unmatched. SCPatcher can enrich the public SCP with 392 SCP specifications, 3,074 LOC, and 1,967 sentences. It accurately matches the public SCP with 78.34% Precision on average. The three most frequently predicted types are Session Management (57 SCP specifications), Access Control (53 SCP specifications), and File Management (46 SCP specifications).
  • 21. 21 Catalog 1. Background 2. Motivation 3. Approach 4. Experiment and Results 5. Discussion 6. Conclusion
  • 22. 5. Discussion: Effect of Variants • Effect of Prompt Template: Comparison results on different templates. Table 9: The compared prompt templates. Results: Our template (Cloze 2) outperforms the other templates by 3.15% MLine (Insecure Coding Example), 1.42% MLine (Secure Coding Example), 1.16% F1 (Insecure Coding Explanation), and 4.74% F1 (Secure Coding Explanation). • Effect of Post’s Sentences and Code-block Numbers: Effect of #sentence on extracting coding explanations. Effect of #code-block on extracting coding examples. Results: SCPatcher outperforms GPT-3+Area by an average of 6.15% MLine (Insecure Coding Example) and 5.37% MLine (Secure Coding Example), and 4.94% F1 (Insecure Coding Explanation) and 3.67% F1 (Secure Coding Explanation).
  • 23. 5. Discussion: Qualitative Evaluation • Case Study: Experiment: Compare SCPatcher with the SOTA baseline, i.e., GPT-3+Area, on extracting SCP specifications from the motivating example. Result: We find that SCPatcher accurately extracts the coding examples and explanations and successfully matches the CWE and public SCP. GPT-3+Area, on the contrary, introduces incorrect code explanations and irrelevant lines of code and thus matches the incorrect CWE and public SCP. * The text in the red dashed box indicates the incorrect extraction result. • Analysis of the Unmatched 10% of SCP Specifications: Case 1: Non-SCP-related Code (6.2%). Some code examples in the security posts are not related to the SCPs. Case 2: Inaccurate Extraction (2.4%). Some extracted SCP specifications are inaccurate and thus cannot be correctly matched. Case 3: Potential New Public SCPs (1.4%). Some SCP specifications may correspond to new public SCPs that have not yet been incorporated by OWASP. For example, post #72865733 proposes a WAF to prevent Log4j attacks in a K8s cloud system.
  • 24. 24 Catalog 1. Background 2. Motivation 3. Approach 4. Experiment and Results 5. Discussion 6. Conclusion
  • 25. 6. Conclusion • We introduce SCPatcher, an automated approach to enrich secure coding practices by mining crowd security discussions. • We conduct an experimental evaluation of the performance of SCPatcher, which shows that SCPatcher outperforms all baselines, together with a user study with security practitioners, which further demonstrates its usefulness in practice. • We plan to improve our approach with extended datasets from other knowledge-sharing platforms. We also plan to enrich more public SCPs, such as those of Google and UC Berkeley, to further evaluate its practicality. https://doi.org/10.5281/zenodo.8254682
  • 26. Thank you! Q&A Ziyou Jiang, ziyou2019@iscas.ac.cn Institute of Software, CAS, Beijing, China ASE’23, Luxembourg
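
The Step C matching rule from the slides (average, over the post's keyword embeddings, of the maximum cosine similarity to the candidate's word embeddings, followed by a threshold check) can be sketched as below. This is a minimal illustration, not the authors' implementation: the function names, the toy 2-dimensional vectors, and the threshold value are assumptions, and real inputs would be GloVe embeddings of the SFV keywords.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def avg_max_similarity(post_vecs, candidate_vecs):
    """Avg over post keyword vectors w_i of max over candidate vectors w_j of cos(w_i, w_j)."""
    if not post_vecs or not candidate_vecs:
        return 0.0
    total = sum(max(cosine(w_i, w_j) for w_j in candidate_vecs) for w_i in post_vecs)
    return total / len(post_vecs)


def match_best(post_vecs, candidates, threshold):
    """Return (label, similarity); the label is "Unmatched" unless the best score exceeds the threshold.

    `candidates` maps a hypothetical CWE or public-SCP identifier to its word vectors.
    """
    best_label, best_sim = "Unmatched", 0.0
    for label, vecs in candidates.items():
        sim = avg_max_similarity(post_vecs, vecs)
        if sim > best_sim:
            best_label, best_sim = label, sim
    if best_sim > threshold:
        return best_label, best_sim
    return "Unmatched", best_sim


# Toy usage with made-up 2-d vectors and an assumed threshold of 0.8.
post = [[1.0, 0.0], [0.0, 1.0]]
cwes = {"CWE-79": [[1.0, 0.0], [0.0, 1.0]], "CWE-787": [[1.0, 1.0]]}
print(match_best(post, cwes, threshold=0.8))
```

The same function would be called twice per post in this sketch: once with the insecure keywords against CWE entries and once with the secure keywords against public SCP entries, each with its own threshold.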

Editor's Notes

  1. Hello everyone, my name is Ziyou Jiang, and I’m from Institute of Software, Chinese Academy of Sciences. Today, I’m going to introduce our work, the SCPatcher, which aims to mine the crowd discussions to enrich the secure coding practices.
  2. We will introduce our work in these six sections.
  3. First, we introduce what public secure coding practices are. The security teams of IT companies, universities, and organizations analyze open-source projects and work out how to develop reliable software. They then formulate these findings into guidelines for developers who need to build secure software, and publish them to the community. These guidelines are the secure coding practices in the public, referred to as public SCP in our paper. However, we find that the public SCPs are not specific enough. Take OWASP as an example. Developers who are not familiar with software security point out that OWASP’s SCPs are usually a single sentence and hard to understand. Beyond these public SCPs, they still need detailed code to help them. So, we believe that the public SCP may need to be enriched with coding examples.
  4. But where can we find these coding examples? We turn to the knowledge-sharing platforms. Many developers discuss and share their security experiences on these platforms. Stack Overflow, for example, is a Q&A platform that contains a large number of security posts. In this example, the post has detailed code showing how to make Firebase data models secure, and a security practitioner on Twitter referenced it and said that it is a typical SCP for Firebase security. So, we can see that crowd security discussions are helpful and may contain SCPs with detailed coding examples.
  5. However, manually mining this knowledge is costly and inefficient. So, can we propose a method to extract this information automatically? And what is the most important knowledge to extract? We present our motivating example here. From the left figure, we can see that a post on Stack Overflow contains a question and an accepted answer. The question part has the insecure coding example and its explanation, and the accepted answer provides the corresponding secure information. We also find that the post matches “Access Control” in OWASP, which is a one-sentence SCP without any coding details, so we believe that we can use these coding examples and explanations to enrich this SCP. We define the extracted and matched components as the SCP specification, and we build an automatic approach to extract it, named SCPatcher.
  6. SCPatcher contains three major steps. The first step is the area extraction. The second step extracts the coding examples and explanations from the previously located areas. The third step consists of two matchers: we not only match the public SCP, but also find the relevant CWE, which is the weakness type in the Common Weakness Enumeration.
  7. For step A: security posts always contain many irrelevant code lines and sentences, and we find that it is too difficult to extract the examples and explanations directly. But if we first locate the areas, it becomes much easier to extract this information. To extract the areas, we utilize a generative large language model, which has recently proven very useful in natural language processing. We first define the prompt. We utilize the cloze template, which has advantages in our task compared with others; we will discuss this later. In this prompt, [X] is the input post, and [Y] is the cloze-testing result, which contains the areas. For the question part, we use TQ to extract the areas of insecure coding examples and explanations; for the accepted answer, we use TA to extract the areas of secure information. Second, we select the LLM from five candidate SOTA models, and we find that GPT-three has the best performance, so we use it as our model to select the areas. Finally, we fine-tune the model with the CRINGE loss, which is a typical method for training generative LLMs.
  8. For step B, we propose two modules: the hierarchical slicer, which extracts the coding examples, and the summarizer, which extracts the coding explanations from these areas. Before the extraction, we define the security feature vector, which is a high-level representation of security knowledge. We define the SFV according to previous works as five types of keywords, from WW to CW. We utilize the GloVe method to embed all the extracted words and concatenate them into a single vector. This vector is used to enhance the accuracy of extraction.
  9. Then, we propose the attention-based selector, which is the core module of the slicer and the summarizer. The selector has a task-oriented encoder, which embeds each element of the area into a single embedding. Then it uses multi-head attention to enhance the embeddings with the SFV, which is a novel way to introduce external knowledge into the embeddings. Finally, it uses the transformer decoder, which is compatible with multi-head attention, to predict the probabilities. The probabilities decide which elements should be chosen, and we use a cut-off value to determine the selected elements. For the slicer, we first determine whether the code is lengthy. According to Hu’s work, code is difficult to understand when it exceeds 15 lines, so we slice such code to make it understandable. Then, we transform the code into an Abstract Syntax Tree, which is a typical code representation. We use CAST as the task-oriented encoder, which is a novel method to embed ASTs. Third, we apply the selector at three hierarchical levels: functions, comments, and empty lines. We iteratively slice the lengthy code until it is understandable. For the summarizer, we choose BERT as the task-oriented encoder, which is a representative method to embed sentences with bi-directional context. Then, we use the selector to summarize the final coding explanations.
  10. To fine-tune both models, considering the time cost, we use multitask fine-tuning to train the two modules jointly.
  11. So, we have obtained the coding examples and explanations, and we need to match them to the CWE and public SCP. We define the semantic similarity based on the security keywords, and we propose the equations shown on this page. For each keyword in the post, the equation takes the highest cosine similarity to the words in the CWE or public SCP, and then averages these values. We also define two thresholds. If the semantic similarity is higher than the threshold, the post is successfully matched. Otherwise, we set the SCP specification as “unmatched”. So here we finally obtain the result, the six-tuple SCP specification.
  12. To evaluate the performance of SCPatcher, we first build the dataset in four steps: data collection, preprocessing, labeling, and augmentation. We first collect the original dataset with “security” and its equivalent tags. Then, we manually label it: we build a team of eight labelers experienced in software security, and they discuss with each other until the labels are decided. After the labeling, we find that the dataset is imbalanced and few-shot, so we augment it into a balanced dataset with EDA, a widely used data augmentation method.
  13. To evaluate the performance of SCPatcher, we propose four RQs.
  14. For RQ-one, we analyze the performance on extracting the coding examples, with three SOTA baselines. We compare SCPatcher with GPT-three and two other baselines, VulSlicer and DeepBalance. To demonstrate the contribution of the areas and the SFV, we propose two variants of the baselines. Note that all the baselines are fine-tuned on our dataset. We use three code generation metrics to evaluate the performance: Rouge-L, MatchToken, and MatchLine. From the results, we can see that SCPatcher outperforms the SOTA baseline, GPT-three plus Area, on all the metrics, and its time cost is longer only than the two original baselines, VulSlicer and DeepBalance. So, we believe that SCPatcher has the advantage on extracting coding examples.
  15. Similarly, for RQ-two, we analyze the performance on extracting the coding explanations. We again choose GPT-three and two other SOTA summarization baselines. We use precision, recall, and F-one as the metrics. The results show that SCPatcher also outperforms all the baselines in this RQ, so it also has advantages on extracting the coding explanations.
  16. For RQ-three, we analyze the performance of matching the CWE. We use precision as the metric, which better reflects the matching rate for CWE. The precision of the CWE matcher is around 80 percent, which means the CWE is accurately matched by our model.
  17. For RQ-four, we analyze the performance on enriching the public SCP in OWASP (we take OWASP as an example). We calculate the LOC and sentences that can be used to enrich the public SCP. We find that SCPatcher can enrich the public SCP with around 400 specifications, 3000 LOC, and 2000 sentences, which means it has the practical ability to enrich the public SCP. Also, the precision is 78 percent, which means the SCP matcher is also accurate.
  18. Finally, in the discussion, we analyze the effect of the prompt template. We design two types of prompts: the prefix template, which is the typical Q&A format, and the cloze templates. The comparison results show that the cloze template is better than the prefix template in our task, and cloze two (our template) is the best for SCPatcher. We also analyze the effect of the number of sentences and code blocks in the security posts, and we compare SCPatcher with GPT-three plus Area. We find that SCPatcher is mostly higher than GPT-three plus Area across all intervals, which means it can adapt to different types of security posts.
  19. We use the case study to demonstrate the advantage of SCPatcher. We find that, when extracting the SCP specification from the motivating example, SCPatcher is more accurate than GPT-three, which has some mispredictions in this figure. Although SCPatcher is accurate, we find that around ten percent of the specifications are unmatched. We manually analyze these posts and find three types of bad cases. Apart from non-SCP code and inaccurate extraction, we find that around 1.4 percent of cases may contain new public SCPs that are not yet incorporated by OWASP, such as using a WAF against Log4j. These specifications may be used to complement the OWASP library later.
  20. Overall, we conclude our paper. We introduce SCPatcher, which is an automatic method to enrich the public SCPs by mining security discussions. We build the dataset and conduct experiments to evaluate the performance, together with some ablation studies. We have released the dataset and approach at this Zenodo link. In the future, we plan to improve our dataset and approach, and evaluate its practical usage on other public SCPs.
  21. Thank you for listening.
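
As a supplement to the dataset slide, the preprocessing steps listed there (drop posts without <code> tags, drop posts with negative scores, tag code blocks as [CODE1]…[CODEn], strip the remaining HTML) might look roughly like the sketch below. It is an assumption-laden illustration rather than the released pipeline: the function name `preprocess_post`, its arguments, and the regex-based HTML handling are hypothetical, and the non-English filter is omitted because it would need a language detector.

```python
import html as html_lib
import re

# Simplified patterns; a production pipeline would likely use an HTML parser.
CODE_BLOCK = re.compile(r"<code>(.*?)</code>", re.DOTALL | re.IGNORECASE)
HTML_TAG = re.compile(r"<[^>]+>")


def preprocess_post(body, score):
    """Return {"text": ..., "code": {...}} for a kept post, or None if it is filtered out."""
    if score < 0:
        return None  # Step 2: posts with negative scores are dropped
    if not CODE_BLOCK.search(body):
        return None  # Step 1: posts without <code> tags are dropped

    code = {}

    def tag(match):
        # Step 3: replace each code block, in order, with a [CODEk] placeholder
        key = f"[CODE{len(code) + 1}]"
        code[key] = html_lib.unescape(match.group(1))
        return key

    text = CODE_BLOCK.sub(tag, body)
    text = HTML_TAG.sub("", text)  # remove the remaining HTML tags
    return {"text": html_lib.unescape(text).strip(), "code": code}


# Toy usage on a made-up post body.
sample = "<p>Set the cookie:</p><pre><code>document.cookie = 'a=1';</code></pre>"
print(preprocess_post(sample, score=2))
```

The placeholders keep the position of each code block in the surrounding text, which matches the [CODE1], [CODE2] markers in the labeled-data example on the dataset slide.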