ASE2023_SCPatcher_Presentation_V5.pptx

Ziyou Jiang, Lin Shi, Guowei Yang, Qing Wang*
SCPatcher: Mining Crowd Security Discussions to Enrich Secure Coding Practices
Presentation at ASE’23, Luxembourg
Email: ziyou2019@iscas.ac.cn
Date: 12/09/2023

Catalog
1. Background
2. Motivation
3. Approach
4. Experiment and Results
5. Discussion
6. Conclusion

1. Background
 IT Companies, Universities, Security Organizations
 Screenshot of Typical OWASP’s Public SCP
What are the public Secure Coding Practices (SCP)?
…
Gather the guidelines for
secure software development
… (10 Guidelines)
Team’s
Experiences
① ID: OWASP Error
Handling and Logging (#5):
SCP: Properly free allocated
memory when error
conditions occur.
② ID: OWASP Access
Control (#13):
SCP: Restrict access to user
and data attributes and
policy information used by
access controls.
Developer #1: I am not experienced in security, and I want to know
how to free memory, with detailed code, to prevent such
insecure code, please!
But the public SCP doesn’t contain detailed coding examples!
Developer #2: I wonder how to prevent invalid access to web pages via
cookies, but I cannot find detailed coding examples in
this public SCP.
Think: we need the detailed coding examples
to enrich the public SCP!
* OWASP’s Public SCP can be found at:
https://developers.google.cn/assistant/sdk/guides/library/python/best-practices/privacy-and-security

1. Background
Where can we obtain the coding examples?
Knowledge Sharing
Platforms
Web Developers
Administrator
Development Teams
Project
Manager
Security
Practitioners
Bots
 People tend to share their experiences on
the different knowledge-sharing platforms.
 Security posts in Stack Overflow, for example, may contain
the detailed coding examples and their explanations.
Stack Overflow (SO) is one of the best-known Q&A platforms and
contains a large number of security posts.
 These discussions with SCP may be
referenced by security practitioners.
* Twitter reference of SCP can be found on https://twitter.com/puf/status/1483457400606580736

2. Motivation
Can we automatically mine the SCP from the post?
 Input security post #48884217
 Definition of SCP Specification:
$\langle Exam^{+}, Expl^{+}, Exam^{-}, Expl^{-}, CWE\_ID, Public\_SCP \rangle$
$Exam^{+}$ (Secure Coding Example), $Expl^{+}$ (Secure Coding Explanation), $Exam^{-}$ (Insecure Coding Example), $Expl^{-}$ (Insecure Coding Explanation)
$CWE\_ID$, $Public\_SCP$: Relevant CWE and public SCP
 Contribution: (1) Build a dataset with insecure code and its fix;
(2) Enrich the public SCP with the detailed SCP.
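The six-element SCP specification can be pictured as a small record type. The sketch below is only an illustration: the class name, field names, and the `is_matched` helper are hypothetical, the example and explanation strings are placeholders, and only the CWE and public SCP labels are taken from the labeled sample for post #48884217.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SCPSpecification:
    """One SCP specification <Exam+, Expl+, Exam-, Expl-, CWE_ID, Public_SCP>."""

    secure_example: str            # Exam+
    secure_explanation: str        # Expl+
    insecure_example: str          # Exam-
    insecure_explanation: str      # Expl-
    cwe_id: Optional[str]          # None when no CWE is matched
    public_scp: Optional[str]      # None when no public SCP is matched

    @property
    def is_matched(self) -> bool:
        """True when both a CWE and a public SCP were matched (hypothetical helper)."""
        return self.cwe_id is not None and self.public_scp is not None


# Placeholder strings stand in for the extracted code and sentences.
spec = SCPSpecification(
    secure_example="<secure code lines>",
    secure_explanation="<why the secure code is safe>",
    insecure_example="<insecure code lines>",
    insecure_explanation="<why the insecure code is risky>",
    cwe_id="CWE-79",
    public_scp="Access_Control_13",
)
```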

3. Approach overview: SCPatcher
• Goal: SCPatcher aims to automatically extract the SCP specification from security posts to enrich
the public SCP.
• Input: Security posts from Stack Overflow.
• Output: The SCP specifications extracted from the security posts.
A: Extracting the Area of Coding Example and Explanation
B: Extracting the Coding Example and Explanation
C: Matching the Relevant CWE and Public SCP

3. SCPatcher-Step A
 Prompt Template: Cloze Template [Petroni et al., EMNLP’19]
 [X] The {insecure|secure} codes and sentences are [Y].
 $T_Q$: [Q]. The insecure codes and sentences are [Y].
 $T_A$: [A]. The secure codes and sentences are [Y].
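As a rough illustration of how the cloze template could be filled in, the sketch below builds the two prompts from a post's question and answer text. The function names are hypothetical, and the literal `[Y]` slot is left in place for the language model to complete.

```python
def build_cloze_prompt(text: str, kind: str) -> str:
    """Fill the cloze template '[X] The {insecure|secure} codes and sentences are [Y].'"""
    if kind not in ("insecure", "secure"):
        raise ValueError("kind must be 'insecure' or 'secure'")
    return f"{text} The {kind} codes and sentences are [Y]."


def build_post_prompts(question: str, answer: str) -> tuple:
    """Return (T_Q, T_A): the question asks for insecure areas, the answer for secure ones."""
    return build_cloze_prompt(question, "insecure"), build_cloze_prompt(answer, "secure")


t_q, t_a = build_post_prompts(
    "How do I store a token?",
    "Keep the token in a session cookie.",
)
```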
• Step A: Extracting the Area of Coding Example and Explanation
• Goal:
① Security posts contain many irrelevant sentences, so directly extracting the coding examples and explanations is
relatively difficult and inaccurate.
② Therefore, we first locate the areas of coding examples (the lines of code (LOC) that contain the coding examples) and of coding
explanations (the sentences that contain the coding explanations).
A: Extracting the Area of Coding Example and Explanation
 Fixed-prompt Tuning: $\mathcal{L} = Cr(y_{exam}, A\_Exam) + Cr(y_{expl}, A\_Expl)$
where $Cr(y, f(x))$ is the CRINGE loss [Adolphs et al.] for generative LLMs.
Results: GPT-3 achieves the highest results
on the four tasks of area extraction, with over
80% accuracy, and outperforms the other LLMs.
Therefore, we choose GPT-3 as the LLM.
 Selection of Large Language Model (LLM):
BERT-base [Devlin et al., NAACL’19], ALBERT-large
[Lan et al., ICLR’20], GPT-2 [Betz et al.],
T5 [Raffel et al., JMLR’20], GPT-3 [Brown et
al., NeurIPS’20]
Table 1: The accuracy of LLM on extracting the areas.

3. SCPatcher-Step B-1
• Step B: Extracting the Coding Example and Explanation
• Goal: We aim to slice the coding examples and explanations from the previously extracted areas.
Security Feature Vector (SFV)
 Goal: Using the high-level representations of security-related knowledge
to enhance the extraction.
 Components of SFV: Five types of keywords (WW, TW, AW, DW, CW)
 The embedding of SFV ($\boldsymbol{\xi}$):
$\boldsymbol{\xi} = concat[GloVe(WW, TW, AW, DW, CW)]$
B: Extracting the Coding Example and Explanation
Note: GloVe (Pennington et al., EMNLP’14) is a representative
word embedding method.
Table 2: The components of security feature vector (SFV).
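A minimal sketch of the SFV embedding follows, assuming each keyword type is embedded by averaging the vectors of its keywords before the five results are concatenated. The averaging step, the toy lookup table (standing in for pretrained GloVe vectors), and the example keywords are assumptions, not details given on the slide.

```python
import numpy as np

KEYWORD_TYPES = ("WW", "TW", "AW", "DW", "CW")  # the five SFV keyword types


def embed_sfv(keywords: dict, lookup: dict, dim: int) -> np.ndarray:
    """Concatenate one vector per keyword type into the SFV embedding xi.

    keywords: maps a keyword type to its list of words.
    lookup:   maps a word to its embedding (a toy stand-in for GloVe).
    """
    parts = []
    for kind in KEYWORD_TYPES:
        vectors = [lookup[w] for w in keywords.get(kind, []) if w in lookup]
        # Assumption: average the words of one type; zeros if none are known.
        parts.append(np.mean(vectors, axis=0) if vectors else np.zeros(dim))
    return np.concatenate(parts)


toy_lookup = {"cookie": np.array([1.0, 0.0]), "token": np.array([0.0, 1.0])}
xi = embed_sfv({"WW": ["cookie", "token"], "CW": ["cookie"]}, toy_lookup, dim=2)
```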

Attention-based Selector: The core
module of the slicer and summarizer.
 Task-oriented Encoder: Embeds the
elements of each area $[E_1, \dots, E_n]$ into
$[\boldsymbol{e}_1, \dots, \boldsymbol{e}_n]$ for further selection.
 Multi-head Attention: Enhances the
embeddings with the SFV, mapping each
$\boldsymbol{e}_i$ to $\boldsymbol{c}_i$.
 Transformer Decoder: Outputs the
selection probabilities $p(E_i)$ and selects the
elements with $p(E_i) > \theta$.
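The selector can be approximated in a few lines. In this simplified sketch, single-head scaled dot-product attention stands in for the multi-head attention, and a sigmoid over a hypothetical scoring vector `w` stands in for the Transformer decoder; the residual connection and all names are assumptions. Only the final cut-off rule $p(E_i) > \theta$ comes directly from the slide.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attend(elements: np.ndarray, sfv: np.ndarray) -> np.ndarray:
    """Enhance element embeddings e_i (n, d) with SFV vectors (m, d), giving c_i (n, d)."""
    scores = elements @ sfv.T / np.sqrt(elements.shape[1])
    return elements + softmax(scores) @ sfv  # residual connection (assumption)


def select(elements: np.ndarray, sfv: np.ndarray, w: np.ndarray, theta: float = 0.5):
    """Return the indices of elements whose selection probability exceeds theta."""
    c = attend(elements, sfv)
    p = 1.0 / (1.0 + np.exp(-(c @ w)))  # stand-in for the decoder output
    return [i for i, p_i in enumerate(p) if p_i > theta], p


elements = np.array([[2.0, 0.0], [-2.0, 0.0]])
sfv = np.array([[1.0, 0.0]])
selected, probs = select(elements, sfv, w=np.array([1.0, 0.0]), theta=0.5)
```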
Hierarchically Slicing Coding Examples
 Step 1: Determine whether the code is lengthy, with
max_length=15 (Hu et al., FCS’23)
 Step 2: Transform the LOC in the areas into an Abstract
Syntax Tree (AST), and embed the tree with the CAST
encoder→Task-oriented Encoder
 Step 3: Split the AST at Functions, Comments,
and Empty Lines. Use the Attention Selector to
select the insecure & secure sub-ASTs. If
len(sub-AST)>max_length, repeat from Step 1
 Step 4: Output the coding examples.
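The recursive structure of these steps can be sketched on plain text. This illustration does not build an AST: it splits only at empty lines, and a caller-supplied `is_relevant` predicate (hypothetical) stands in for the Attention Selector. It shows the control flow only, under those assumptions.

```python
from typing import Callable, List

MAX_LENGTH = 15  # lines, the max_length value given above


def split_blocks(lines: List[str]) -> List[List[str]]:
    """Split code at empty lines (a stand-in for splitting the AST)."""
    blocks, current = [], []
    for line in lines:
        if line.strip():
            current.append(line)
        elif current:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


def slice_code(lines: List[str],
               is_relevant: Callable[[List[str]], bool],
               max_length: int = MAX_LENGTH) -> List[str]:
    """Keep relevant blocks, recursing while a block is still too long."""
    if len(lines) <= max_length:
        return lines                      # Step 1: not lengthy, keep as is
    blocks = split_blocks(lines)          # Step 3: split
    if len(blocks) <= 1:
        return lines                      # cannot be split any further
    selected: List[str] = []
    for block in blocks:
        if is_relevant(block):            # stand-in for the Attention Selector
            selected.extend(slice_code(block, is_relevant, max_length))
    return selected                       # Step 4: output


code = ["x = 1"] * 10 + [""] + ["document.cookie = t"] * 10
sliced = slice_code(code, lambda block: any("cookie" in line for line in block))
```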
Summarizing Coding Explanations
 Step 1: Embed the sentences in the areas with
BERT encoder→Task-oriented Encoder
 Step 2: Summarize the sentences of coding
explanations with Attention Selector.
 Step 3: Output the summarized insecure & secure
coding explanations
[Figure: example AST with nodes Root, If (user), Func (Auth), Comt (Auth), Empt-Line, Document.cookie, Console.warn; the selected AST is marked.]

Fine-tuning Both Models
 Multitask Fine-tuning: Train the two models in parallel with the
joint loss:
$\mathcal{L} = H(y_{exam}, p(SLC)) + H(y_{expl}, p(SUM))$
where $H(y, p(x))$ is the cross-entropy training loss;
$y_{exam}$ and $p(SLC)$ are the ground truth and prediction of the
coding example slicer, and $y_{expl}$ and $p(SUM)$ are the ground truth
and prediction of the coding explanation summarizer.
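The joint loss is the sum of two cross-entropy terms, one per model. The sketch below assumes binary selection labels (each element is either selected or not), which is an assumption about the label format rather than something stated on the slide; the function names are illustrative.

```python
import numpy as np


def binary_cross_entropy(y, p, eps: float = 1e-12) -> float:
    """Mean binary cross-entropy H(y, p) between labels y and probabilities p."""
    y = np.asarray(y, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


def joint_loss(y_exam, p_slc, y_expl, p_sum) -> float:
    """L = H(y_exam, p(SLC)) + H(y_expl, p(SUM))."""
    return binary_cross_entropy(y_exam, p_slc) + binary_cross_entropy(y_expl, p_sum)


loss = joint_loss(y_exam=[1], p_slc=[0.5], y_expl=[0], p_sum=[0.5])
```

With both predictions at 0.5, each term equals ln 2, so the joint loss is 2 ln 2.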

3. SCPatcher-Step C
• Step C: Matching the Relevant CWE and Public SCP
• Goal: We aim to find the most relevant CWE and public SCP by semantic similarity.
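 Definition of Similarity: The similarity of words in SFV between
security posts and CWE/public SCP.
$sim_{cwe} = \operatorname{Avg}_{\boldsymbol{w}_i \in C^{-}} \max_{\boldsymbol{w}_j \in CWE} \cos(\boldsymbol{w}_i, \boldsymbol{w}_j)$
$sim_{scp} = \operatorname{Avg}_{\boldsymbol{w}_i \in C^{+}} \max_{\boldsymbol{w}_j \in SCP} \cos(\boldsymbol{w}_i, \boldsymbol{w}_j)$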
C: Matching the Relevant CWE and Public SCP
Matching the CWE & public SCP
① Set the thresholds $\theta_{cwe}$ and $\theta_{scp}$.
② If the similarity is higher than the corresponding threshold, match the
CWE or public SCP. Otherwise, mark it as “Unmatched”.
where $C^{-} = \{Exam^{-}, Expl^{-}\}$ (Insecure Coding Examples &
Explanations), $C^{+} = \{Exam^{+}, Expl^{+}\}$ (Secure Coding
Examples & Explanations); $\boldsymbol{w}_i$ are the embeddings of words in the
SFV keywords, and $\boldsymbol{w}_j$ is the embedding of the most
similar word in the CWE or public SCP.
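The similarity and threshold rule can be sketched as follows: for each keyword vector from the post, take the best cosine similarity against a candidate's word vectors, average these maxima, and keep the best candidate only if it clears the threshold. The two-dimensional vectors and the candidate table are toy assumptions, and the function names are illustrative.

```python
import numpy as np


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def similarity(source_vectors, target_vectors) -> float:
    """Avg over w_i in the source of max over w_j in the target of cos(w_i, w_j)."""
    return float(np.mean([
        max(cosine(w_i, w_j) for w_j in target_vectors)
        for w_i in source_vectors
    ]))


def match(source_vectors, candidates: dict, theta: float) -> str:
    """Return the most similar candidate id, or 'Unmatched' if none exceeds theta."""
    best_id, best_sim = "Unmatched", theta
    for candidate_id, vectors in candidates.items():
        sim = similarity(source_vectors, vectors)
        if sim > best_sim:
            best_id, best_sim = candidate_id, sim
    return best_id


post = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
cwe_candidates = {
    "CWE-79": [np.array([1.0, 0.0]), np.array([0.0, 1.0])],
    "CWE-787": [np.array([1.0, 1.0])],
}
result = match(post, cwe_candidates, theta=0.8)
```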

4. Evaluation: Dataset
A: Dataset Collection
 Obtain the posts that are tagged with “security” and
similar tags, such as “websecurity”, “danger”, and
“firebase” (Yang et al., JCST’16).
Table 3: The statistics of our dataset.
{"Post_ID": "48884217",
 "Post_Content": {"Title": "Handling Firebase ID tokens on the client side with vanilla JavaScript",
   "Question": "I am writing a Firebase application in vanilla JavaScript,… [CODE1],…",
   "Answer": "If we don't want to implement a single page application, …[CODE2], …"},
 "Code": {"[CODE1]": "/* global firebase, firebaseui */ const uiConfig = …",
   "[CODE2]": "document.cookie = '__session=' + token…"},
 "Sec_Exam": "document.cookie =…",
 "Sec_Expl": "If we don't want to implement a single page application…",
 "Insec_Exam": "firebase.auth().onAuthStateChanged…",
 "Insec_Expl": "Cookies/localStorage/webStorage do not seem to be fully securable…",
 "CWE": "CWE-79", "SCP": "Access_Control_13"}
 Format of labeled data
Manually Labeled SCP
Specification
B: Data Preprocessing
 Step 1: Filter out the posts that do not contain <code> tags.
 Step 2: Filter out the posts that receive negative scores, as
well as the non-English posts.
 Step 3: Tag the code blocks in sequence, i.e., with
[CODE1]…[CODEn], then remove the other HTML tags.
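The preprocessing steps can be sketched with the standard library. This is a rough illustration: the function name and the output keys are assumptions modeled on the labeled-data format, plain regular expressions stand in for a real HTML parser, and the non-English filter of Step 2 is omitted.

```python
import re
from html import unescape

CODE_RE = re.compile(r"<code>(.*?)</code>", re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def preprocess(body_html: str, score: int):
    """Return the cleaned post, or None if the post is filtered out."""
    # Steps 1 and 2: drop posts without <code> tags or with negative scores.
    if score < 0 or not CODE_RE.search(body_html):
        return None

    codes = {}

    def tag_code(match):
        # Step 3: replace each code block with [CODE1] ... [CODEn] in sequence.
        key = f"[CODE{len(codes) + 1}]"
        codes[key] = unescape(match.group(1))
        return key

    text = CODE_RE.sub(tag_code, body_html)
    text = unescape(TAG_RE.sub("", text))  # remove the remaining HTML tags
    return {"Post_Content": text.strip(), "Code": codes}


post_html = "<p>Use a cookie:</p><pre><code>document.cookie = 'a=b';</code></pre>"
cleaned = preprocess(post_html, score=3)
```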
C: Dataset Labeling
 For each post, we manually label the insecure and
secure coding examples and explanations.
 The team has 2 senior researchers, 2 Ph.D. students, and
4 master’s students, with over 4 years of experience in
software security.
 Disagreements are discussed until a decision is reached.
D: Dataset Augmentation
 The dataset is imbalanced and few-shot.
 Augment the dataset with EDA (Wei et al., EMNLP-IJCNLP’19),
a widely used augmentation method.
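EDA augments text with four simple operations: synonym replacement, random insertion, random swap, and random deletion. The sketch below shows only the two operations that need no thesaurus. The function names and the seeded random generator are illustrative choices, and how the deck's pipeline configures EDA is not stated.

```python
import random
from typing import List


def random_swap(words: List[str], n: int, rng: random.Random) -> List[str]:
    """Swap two randomly chosen positions, n times."""
    words = list(words)
    if len(words) < 2:
        return words
    for _ in range(n):
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words


def random_deletion(words: List[str], p: float, rng: random.Random) -> List[str]:
    """Delete each word with probability p, always keeping at least one word."""
    if not words:
        return []
    kept = [w for w in words if rng.random() > p]
    return kept if kept else [rng.choice(words)]


rng = random.Random(0)
sentence = "store the token in a cookie".split()
swapped = random_swap(sentence, n=2, rng=rng)
shortened = random_deletion(sentence, p=0.2, rng=rng)
```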

4. Evaluation: Research Questions
 RQ1: How well does SCPatcher perform at extracting
secure and insecure coding examples?
 RQ2: How well does SCPatcher perform at extracting
secure and insecure coding explanations?
 RQ3: What are the performance and capability of SCPatcher in
matching the CWE?
 RQ4: What are the performance and capability of SCPatcher in
matching the public SCP?

4. Experiment Design and Analysis: RQ1
Table 4: The baseline comparison on extracting the insecure and secure coding examples (%).
Three SOTA Baselines
① Baselines: GPT-3 (Original LLM), VulSlicer and DeepBalance
② Variants: Baseline+Area (Combine the baseline with extracted
areas), Baseline+Area+SFV (Combine the baseline with
extracted areas and SFV)
③ Note: All baselines are fine-tuned on our dataset.
RQ1: How well does SCPatcher perform at extracting coding examples?
Three Evaluation Metrics
① Rouge-L (Measure the performance of code generation)
② MToken (Measure the similarity between code tokens)
③ MLine (Measure the similarity between code lines, i.e., LOC)
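Rouge-L is based on the longest common subsequence (LCS) between the predicted and reference token sequences. The sketch below computes a simplified Rouge-L F-score with equal weight on precision and recall; the whitespace tokenization and the equal weighting are assumptions, and the exact configuration used in the evaluation is not stated on the slide.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l(pred_tokens: List[str], ref_tokens: List[str]) -> float:
    """Simplified Rouge-L: harmonic mean of LCS-based precision and recall."""
    if not pred_tokens or not ref_tokens:
        return 0.0
    lcs = lcs_length(pred_tokens, ref_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(pred_tokens)
    recall = lcs / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


pred = "document . cookie = token".split()
ref = "document . cookie = '__session=' + token".split()
score = rouge_l(pred, ref)
```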
Results for RQ1
 SCPatcher outperforms all the baselines on extracting coding
examples. It reaches the highest Rouge-L, MToken, and MLine with
63.33%, 74.23%, and 72.38%, outperforming the best baseline by
4.03% (Rouge-L), 4.91% (MToken), and 2.73% (MLine).
 The time cost of SCPatcher is 11 hours, longer only than that of
the original VulSlicer and DeepBalance.
 Overall, SCPatcher has advantages over all the baselines on
extracting coding examples.
∗ Due to the temporary lack of open embeddings for GPT-3, we will compare the GPT-3+Area+SFV when the embeddings are available.

Table 5: The baseline comparison on extracting the insecure and secure coding explanations (%).
Three SOTA Baselines
① Baselines: GPT-3 (Original LLM), BERTSum, and BART
(SOTA baselines in text summarization)
② Variants: Baseline+Area (Combine the baseline with extracted
areas), Baseline+Area+SFV (Combine the baseline with
extracted areas and SFV)
③ Note: All baselines are fine-tuned on our dataset.
RQ2: How well does SCPatcher perform at extracting coding explanations?
Three Evaluation Metrics
① Precision (Measure the ratio of matched sentences in the prediction)
② Recall (Measure the ratio of matched sentences in the ground truth)
③ F1 (The harmonic mean of Precision and Recall)
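These sentence-level metrics can be illustrated with a short function. It assumes that a predicted sentence counts as matched only when it is identical to a ground-truth sentence; the slide does not specify the matching rule, so exact matching is an assumption.

```python
from typing import Iterable, Tuple


def precision_recall_f1(predicted: Iterable[str],
                        ground_truth: Iterable[str]) -> Tuple[float, float, float]:
    """Sentence-level Precision, Recall, and F1 under exact matching."""
    pred, gold = set(predicted), set(ground_truth)
    matched = len(pred & gold)
    precision = matched / len(pred) if pred else 0.0
    recall = matched / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f1


p, r, f1 = precision_recall_f1(["s1", "s2", "s3", "s4"], ["s1", "s2", "s5"])
```

Here two of the four predicted sentences are matched, so Precision is 2/4, Recall is 2/3, and F1 is 4/7.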
Results for RQ2
 SCPatcher outperforms all the baselines on extracting the coding
explanations. It reaches the highest Precision, Recall, and F1 with
79.43%, 79.70%, and 79.56%, outperforming the best baseline,
GPT-3+Area, by 4.04% (Precision), 3.91% (Recall), and 3.97% (F1).
 The time cost of SCPatcher is 11 hours, longer only than that of
the original BERTSum.
 Overall, SCPatcher has advantages over all the baselines on
extracting coding explanations.
∗ Due to the temporary lack of open embeddings for GPT-3, we will compare the GPT-3+Area+SFV when the embeddings are available.

Table 6: The performances on matching the CWE.
 Statistics of SCP specification for CWE
Results for RQ3
 409 SCP specifications are matched with existing CWEs, while
38 posts are unmatched.
 SCPatcher can accurately matches the CWE with 81.43%
(Precision) on average.
 The top-3 maximum number of predictions are CWE-787 (with
76 SCP specifications), CWE-79 (68 SCP specifications), and
CWE-78 (58 SCP specifications).
Metric
 Precision, which better reflects the matching result of CWE
in the experiment:
$Precision = \frac{CWE_{match}}{CWE_{pred}} \times 100\%$
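As a small worked example of this metric, the sketch below counts how many predicted CWE labels agree with the manual labels. Excluding "Unmatched" predictions from the denominator is an assumption, and the post ids are hypothetical.

```python
def matching_precision(predicted: dict, labeled: dict) -> float:
    """Precision (%) = correctly matched predictions / all predictions * 100."""
    # Assumption: posts predicted as "Unmatched" are not counted as predictions.
    preds = {pid: cwe for pid, cwe in predicted.items() if cwe != "Unmatched"}
    if not preds:
        return 0.0
    correct = sum(1 for pid, cwe in preds.items() if labeled.get(pid) == cwe)
    return 100.0 * correct / len(preds)


predicted = {"p1": "CWE-79", "p2": "CWE-787", "p3": "Unmatched", "p4": "CWE-78"}
labeled = {"p1": "CWE-79", "p2": "CWE-79", "p3": "CWE-78", "p4": "CWE-78"}
precision = matching_precision(predicted, labeled)
```

Two of the three predictions are correct here, giving a precision of about 66.67%.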
Table 7: The size of matched and unmatched CWE/public SCP.
RQ3: What are the performance and capability of SCPatcher in matching the CWE?

Table 8: The performances on matching the public SCP.
 Statistics of SCP specification for public SCP.
Metric
 Precision, which better reflects the matching result of the public SCP
in the experiment:
$Precision = \frac{SCP_{match}}{SCP_{pred}} \times 100\%$
Results for RQ4
 392 SCP specifications are matched with existing OWASP types,
while 55 posts are unmatched.
 SCPatcher can enrich the public SCP with 392 SCP specifications,
3,074 LOC, and 1,967 sentences. It accurately matches the public
SCP with 78.34% (Precision) on average.
 The top-3 maximum number of predictions is Session
Management (with 57 SCP specifications), Access Control (53
SCP specifications), and File Management (46 SCP specifications).
RQ4: What are the performance and capability of SCPatcher in matching the public SCP?
Table 7: The size of matched and unmatched CWE/public SCP.

5. Discussion: Effect of Variants
• Effect of Prompt Template
Table 9: The compared prompt templates.
 Comparison results on different templates.
 Results: Our template (Cloze 2) outperforms the other templates
by 3.15% MLine (Insecure Coding Example), 1.42% MLine
(Secure Coding Example), 1.16% F1 (Insecure Coding
Explanation), and 4.74% F1 (Secure Coding Explanation).
• Effect of Post’s Sentence and Code-block Numbers
 Effect of #sentence on extracting coding explanations.
 Effect of #code-block on extracting coding examples.
 Results: SCPatcher outperforms GPT-3+Area by an
average of 6.15% MLine (Insecure Coding Example) and 5.37%
MLine (Secure Coding Example), and by 4.94% F1 (Insecure Coding
Explanation) and 3.67% F1 (Secure Coding Explanation).

5. Discussion: Qualitative Evaluation
* The text with red dashed box indicates the incorrect extraction result.
• Case Study
 Experiment: Compare SCPatcher with the SOTA
baseline, i.e., GPT-3+Area, on extracting SCP specifications
from the motivating example.
 Result: We find that SCPatcher accurately extracts the
coding examples and explanations and successfully
matches the CWE and public SCP. GPT-3+Area, in
contrast, introduces incorrect code explanations and
irrelevant lines of code, and thus matches the incorrect
CWE and public SCP.
• Analysis of the 10% Unmatched SCP Specifications
 Case 1: Non-SCP-related Code (6.2%). Some code
examples in the security posts are not related to the SCPs.
 Case 2: Inaccurate Extraction (2.4%). Some extracted
SCP specifications are inaccurate and thus cannot be
correctly matched.
 Case 3: Potential New Public SCPs (1.4%). Some SCP
specifications may correspond to new public SCPs that have
not yet been incorporated by OWASP.
Post #72865733 proposes a WAF to
protect against Log4j in a K8s cloud system.

6. Conclusion
• We introduce SCPatcher, an automated approach that enriches
secure coding practices by mining crowd security discussions.
• We conduct an experimental evaluation of SCPatcher,
which shows that it outperforms all baselines, together with a
user study with security practitioners, which further demonstrates its
usefulness in practice.
• We plan to improve our approach with more extended datasets from other
knowledge-sharing platforms. We also plan to enrich more public SCPs,
such as those from Google and UC Berkeley, to further evaluate its practicality.
https://doi.org/10.5281/zenodo.8254682

Thank you!
Q&A
Ziyou Jiang, ziyou2019@iscas.ac.cn
Institute of Software, CAS, Beijing, China
ASE’23, Luxembourg
