KENTA YAMAMOTO | TECHNICAL SUPPORT ENGINEER | 2018-03-27
Vulnerability Detection
Based on Git History
Agenda
Introduction
Introduction
Research
Background
The Trend of Security Incidents
Key facts. Why this research is important:
In Quantity
# of CVE reports: 1,020 (2000) → 14,643 (2017) [NVD]
In Quality
• Equifax exposed 143M consumers’ data due to a web application
vulnerability (2017)
• A breach at Yahoo exposed 3B users’ account information (2013)
The Century of Vulnerability
# OF VULNERABILITIES
As information technology is broadly
adopted, the impact of security
incidents is becoming more extensive and
critical.
Introduction To Help Code Reviewers
We know how to deliver software of proper quality: code review!
Best Practice is Well-Known
Review patches before release and fix bugs before deployment. Even so,
famous OSS projects still struggle with a shortage of code reviewers.
A Trade-Off of Automation Techniques
Software projects widely adopt a variety of automation approaches. Vulnerability
detection techniques face two conflicting requirements:
• (a) High precision. A tool is useless if it outputs billions of false positives.
• (b) Adaptability. No one wants to spend extra effort purely for security, such
as annotating unsafe user inputs.
Research
Background
# Example of taint annotation
int printf(/*@untainted@*/ char *fmt,
...);
Git is somewhat difficult.
No worries, it’s not only you!
WHAT’S GIT?
“(Git is) expressly designed
to make you feel
less intelligent than you
thought you were”
– Andrew Morton
The Greatness of Git -
www.linuxfoundation.org
Introduction
What’s Git?
But Git Always Stays With You
Trust me, or try this command on your terminal:
# Count the commands you run most often (Git is probably near the top)
history | awk '{ print $2 }' | sort |
uniq -c | sort -rn | head
Introduction
What’s Git?
Git for Machine Learning
Git provides what machine learning requires: good data.
• Adopted by 69.2% of 30K developers [Stack Overflow]
• Trusted by most prominent OSS projects such as the Linux
Kernel, OpenSSL, FFmpeg, PostgreSQL, Chrome V8, and
Apache HTTPD.
Introduction
What’s Git?
CVE-ID and Security Fix on Git
A sufficient number of reliable security fixes are available:
• Fix commits refer to CVE-IDs in their commit messages
• Or the fix commits are referenced by the CVE database
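The first kind of link, a CVE-ID written in a commit message, can be recovered mechanically. The following is a minimal sketch, not the tooling used in this research; the function name and the sample message (with a placeholder ID) are assumptions made for illustration.

```python
import re
from typing import List

# CVE identifiers have the form CVE-YYYY-NNNN, with four or more digits
# in the sequence part.
CVE_PATTERN = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)


def extract_cve_ids(commit_message: str) -> List[str]:
    """Return the unique CVE-IDs in a commit message, in order of appearance."""
    seen = []
    for match in CVE_PATTERN.findall(commit_message):
        cve_id = match.upper()
        if cve_id not in seen:
            seen.append(cve_id)
    return seen


# Hypothetical commit message with a placeholder ID.
message = "Fix length check in parser (CVE-2099-0001). Refs: cve-2099-0001"
print(extract_cve_ids(message))
```

In practice the message would come from `git log`; matching is case-insensitive here because commit authors do not always write the ID in upper case.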
Introduction
What’s Git?
A Brief Introduction of Git Features
Agenda
Methodology
A static analysis to detect
suspicious vulnerabilities based
on Git history.
METHODOLOGY - HVD
Methodology Proposed Approach
Concept
• This research proposes an approach that aims to
reduce the false positive rate compared to VCCFinder
[Perl et al.] without sacrificing adaptability.
• The data source is the same as VCCFinder’s, but this
approach distinguishes added lines from removed
lines in the patch features, while VCCFinder doesn’t.
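To make the difference concrete, here is a minimal sketch of how tokens from a patch could be counted with and without regard to line type. It is an assumption about the general idea, not the actual HVD or VCCFinder feature extraction: the tokenizer, the `+`/`-` prefix convention, and the sample patch are all hypothetical, and only the standard library is used.

```python
import re
from collections import Counter

# Simplified tokenizer: identifier-like words only.
TOKEN_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def patch_features(patch: str, line_type_sensitive: bool) -> Counter:
    """Count tokens on the changed lines of a unified diff.

    With line_type_sensitive=True, each token is prefixed by "+" or "-" so
    that added and removed occurrences become separate features.
    """
    counts = Counter()
    for line in patch.splitlines():
        # Simplified check: skip file headers, hunk headers, and context lines.
        if line.startswith(("+++", "---")) or not line.startswith(("+", "-")):
            continue
        sign, body = line[0], line[1:]
        for token in TOKEN_PATTERN.findall(body):
            counts[sign + token if line_type_sensitive else token] += 1
    return counts


# Hypothetical patch used only to exercise the function.
patch = """\
--- a/net.c
+++ b/net.c
@@ -1,3 +1,3 @@
 int n = read_len(sock);
-buf = malloc(n);
+buf = malloc(n + 1);
"""

print(patch_features(patch, line_type_sensitive=False))
print(patch_features(patch, line_type_sensitive=True))
```

In the insensitive case the patch yields a single `malloc` feature with count 2; in the sensitive case it yields `-malloc` and `+malloc` separately, so a classifier can weight additions and removals differently.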
Methodology VCCFinder: a Novel Approach
Concept
Generally, it’s hard to apply machine learning to source code
because most high-level programming languages such as
C/C++ are less redundant than natural languages
and assembly languages. To address this difficulty, Perl et
al.:
• Narrowed the problem down to a quantifiable one.
The quality of source code can hardly be quantified, but
vulnerability can be expressed as 0 or 1.
• Leveraged existing assets: the CVE database and prominent
OSS projects.
“I really never wanted to do
source control
management at all and felt
that it was just about the
least interesting thing in
the computing world”
– Linus Torvalds
10 Years of Git -
www.linuxfoundation.org
Methodology Overall Architecture
Concept
Methodology Abbreviations
Terms
• HVD: History-based Vulnerability Detector
• VCC: Vulnerability-Contributing Commit(s). Changes
that introduce a vulnerability
• UC: Unclassified Changes
• LT-S: Line type sensitive. The HVD approach
• LT-I: Line type insensitive. The replication of VCCFinder
Methodology Exploit vs Vulnerability
Terms
Potential
vulnerability
Vulnerability
Exploit
(malicious input)
Agenda
Evaluation
351,452
commits in total
Evaluation Dataset Provided by Perl et al.
Experiment
• This dataset contains commits labelled as VCC or UC and associated with
their CVE-IDs.
• It comprises 714 VCCs out of about 350k commits in total from 66 OSS repositories
implemented in C/C++.
• There are about 170k unique tokens.
• The compressed size is 525 MB (npz).
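The figures above imply a heavily imbalanced dataset. The short calculation below works this out from the two counts on the slides (714 VCCs, 351,452 commits); the variable names are my own. A classifier that flags commits uniformly at random has an expected precision equal to this prevalence, which is one reason to judge results on a precision-recall curve rather than on accuracy.

```python
total_commits = 351_452  # total commits in the dataset
vcc_count = 714          # commits labelled as VCC

prevalence = vcc_count / total_commits
print(f"VCC prevalence: {prevalence:.4%}")
print(f"Roughly 1 VCC per {round(total_commits / vcc_count)} commits")
```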
Evaluation Implementation in Python
Experiment
To make the experiment reliable, I relied on established libraries, including:
• NumPy
• SciPy
• scikit-learn
• unidiff
LT-I: note that reproducibility is limited because the source code of VCCFinder is not
publicly available.
Evaluation Environment Specs
Experiment
The computation was performed on one node of the CX250 cluster (MPC):
• CPU: Intel Xeon E5-2680v2 2.80GHz (10-core) x2
• Memory: 64GB (4GB DDR3-1866 ECC x16)
Evaluation Precision Improvement
• LT-S improved the AUC (area under the curve) of the precision-recall curve by
18.8% over LT-I.
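For readers unfamiliar with the metric, the sketch below shows one common way to compute the area under a precision-recall curve from labels and classifier scores. It is a plain-Python illustration under my own assumptions (trapezoidal integration, a starting point at recall 0 and precision 1, no tied scores), not the evaluation code of this study, and the sample labels and scores are invented.

```python
def precision_recall_points(labels, scores):
    """Return (recall, precision) points from sweeping the decision threshold.

    labels are 1 (positive) or 0 (negative); a higher score means more suspicious.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_positives = sum(labels)
    points = [(0.0, 1.0)]  # conventional starting point at recall 0
    true_positives = 0
    for rank, (_, label) in enumerate(ranked, start=1):
        true_positives += label
        points.append((true_positives / total_positives, true_positives / rank))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Invented toy data: two positives among four commits.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
print(area_under_curve(precision_recall_points(labels, scores)))
```

A perfect ranking gives an area of 1.0, and the area shrinks as negatives are ranked above positives.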
Evaluation Trade-off
• Execution time roughly tripled (about 2.7x): (LT-I, LT-S) = (17m06s, 45m36s)
• Note: the vast majority of the processing time is spent in the learning phase.
In practical use, the learnt model is dumped once and then shared across future
predictions for a while. Parsing a given unknown commit and predicting with the
shared model then takes only a few seconds.
Hence, the execution time of the learning phase should not slow down the
development process.
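The train-once, predict-many workflow described above can be sketched as follows. This is an illustration under assumptions: the model is a hypothetical dictionary of token weights standing in for whatever the learning phase produces, `pickle` is just one possible serialization choice, and the file name and scoring rule are made up.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a trained linear model: token weights plus a bias.
model = {"weights": {"+malloc": 0.8, "-free": 0.5, "+printf": -0.2}, "bias": -0.6}


def score(model, token_counts):
    """Linear score for one commit; a positive value flags it for review."""
    weights = model["weights"]
    return model["bias"] + sum(
        weights.get(token, 0.0) * count for token, count in token_counts.items()
    )


# Learning phase (slow, done once): persist the model to disk.
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)

# Prediction phase (fast, repeated): reload the shared model and score a commit.
with open(path, "rb") as f:
    shared_model = pickle.load(f)

new_commit = {"+malloc": 2, "+printf": 1}  # hypothetical token counts
print(score(shared_model, new_commit) > 0)
```

Since unpickling can execute arbitrary code, a shared model file should only be loaded from a trusted source.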
Evaluation The most contributing features
Effective Features
To gain deeper insight from the
experiment, this study also examined which
variables contributed most to the
classification model. They turned out to be
tokens related to computer resources.
For instance:
• (RAM) structors: memory allocation with
complex structures
• (RAM) vmalloc: virtual memory allocation
• (CPU) skbuf_head: a spin-lock of threads
• (network) tso: TCP Segmentation Offload
• (network) if_ether: a flag of Ethernet
availability
Evaluation Findings & insights
Effective Features
Findings:
• The most contributing variables are tokens relevant to computer resources such as CPU,
memory, and network.
• The figure also shows that most of the top contributing variables are added tokens.
Insights:
• These findings are not surprising: it seems obvious that vulnerabilities correlate
closely with the side effects of computer resource management and with
added code.
• However, it’s worth verifying that the automatic detection approach agrees
with the experiential intuition of humans.
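One way such a ranking of contributing features could be produced is to sort the learned per-token weights of a linear model. The sketch below assumes that kind of model; the slides do not state how the ranking was obtained. The weights are placeholders invented for illustration and are not results from this study; only the token names are borrowed from the slides, and the `+`/`-` prefix convention is my own.

```python
def top_features(weights, k):
    """Return the k features with the largest weight, highest first."""
    return sorted(weights.items(), key=lambda item: item[1], reverse=True)[:k]


# Placeholder weights, invented for illustration; NOT results from the study.
weights = {
    "+vmalloc": 0.42,
    "+tso": 0.31,
    "+if_ether": 0.27,
    "-tso": 0.05,
    "-vmalloc": -0.10,
}

for name, weight in top_features(weights, 3):
    kind = "added" if name.startswith("+") else "removed"
    print(f"{name[1:]:<10} {kind:<8} {weight:+.2f}")
```

Being able to read off the ranking this way is what makes such a model explainable: each flagged commit can be traced back to the tokens that pushed its score up.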
Agenda
Conclusion
Despite the limited set of features obtainable via Git, this study shows that LT-S
improved the AUC of the precision-recall curve by 18.8% compared to LT-I without losing
the original advantages:
• (a) Scalability
• (b) Generality
• (c) Explainability
CONCLUSION
KENTA YAMAMOTO | TECHNICAL SUPPORT ENGINEER | @I05
Thank you!
Questions & discussion

 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecture
 
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
 
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptxThe Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
 
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptxReal-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
 
Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slides
 
Not a Kubernetes fan? The state of PaaS in 2024
Not a Kubernetes fan? The state of PaaS in 2024Not a Kubernetes fan? The state of PaaS in 2024
Not a Kubernetes fan? The state of PaaS in 2024
 
Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...
 
Strategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsStrategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero results
 
SoftTeco - Software Development Company Profile
SoftTeco - Software Development Company ProfileSoftTeco - Software Development Company Profile
SoftTeco - Software Development Company Profile
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and Repair
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprise
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
 
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingOpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
 
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full RecordingOpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
 

Detecting vulnerabilities in code through Git history analysis

  • 1. KENTA YAMAMOTO | TECHNICAL SUPPORT ENGINEER | 2018-03-27 Vulnerability Detection Based on Git History
  • 3. Introduction Research Background The Trend of Security Incidents Key facts. Why this research is important: In quantity: # of CVE reports: 1,020 (2000) → 14,643 (2017) [NVD] In quality: • Equifax exposed 143M consumers’ data due to a web application vulnerability (2017) • Yahoo had 3B users’ account information breached (2013)
  • 4. The Century of Vulnerability # OF VULNERABILITIES As information technology is broadly adopted, the impact of security incidents is becoming more extensive and critical.
  • 5. Introduction Research Background To Help Code Reviewers We know how to deliver software of proper quality: code review! Best Practice is Well-Known Review patches before release and fix bugs before deployment. Still, even famous OSS projects struggle with a lack of code reviewers. A Trade-Off of Automation Techniques Software projects widely adopt a variety of automation approaches. Vulnerability detection techniques face two conflicting requirements: • (a) High precision. A tool is useless if it outputs billions of false positives. • (b) Adaptability. No one wants to spend effort solely on ensuring security, such as annotating unsafe user inputs. # Example of taint annotation int printf(/*@untainted@*/ char *fmt, ...);
  • 6. Git is somewhat difficult. No worries, it’s not only you! WHAT’S GIT?
  • 7. “(Git is) expressly designed to make you feel less intelligent than you thought you were” – Andrew Morton The Greatness of Git - www.linuxfoundation.org
  • 8. Introduction What’s Git? But Git Always Stays With You Trust me, or try this command in your terminal: # List how much you rely on Git history | awk '{ print $2 }' | sort | uniq -c | sort -rn | head
  • 9. Introduction What’s Git? Git for Machine Learning Git provides what machine learning requires: good data. • Adopted by 69.2% of 30K developers [StackOverflow] • Trusted by most prominent OSS projects such as Linux Kernel, OpenSSL, FFmpeg, PostgreSQL, Chrome V8, and Apache HTTPD.
  • 10. Introduction What’s Git? CVE-ID and Security Fix on Git Git history holds a sufficient number of reliable security fixes: • Fixing commits refer to CVE-IDs in their commit messages • Or, fixing commits are referenced by the CVE database
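The first bullet on slide 10 (commits that mention a CVE-ID in their message) can be illustrated with a small sketch. It assumes the standard `CVE-YYYY-NNNN` identifier syntax with four or more digits in the sequence part; the function name `extract_cve_ids` and the sample commit message are hypothetical and not taken from the talk.

```python
import re

# CVE identifiers look like CVE-2014-0160: a four-digit year, then four or more digits.
CVE_ID = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)


def extract_cve_ids(commit_message):
    """Return the distinct CVE-IDs in a commit message, in order of first appearance."""
    seen = []
    for match in CVE_ID.findall(commit_message):
        cve = match.upper()  # normalise case so "cve-..." and "CVE-..." count once
        if cve not in seen:
            seen.append(cve)
    return seen


# Hypothetical commit message, for illustration only.
message = "Add bounds check in heartbeat handling (CVE-2014-0160)\n\nRefs: cve-2014-0160"
print(extract_cve_ids(message))
```

In practice the message text would come from something like `git log --format=%B`; matching the reverse direction (commits referenced by the CVE database) needs the database entries and is not covered here.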
  • 11. Introduction What’s Git? A Brief Introduction of Git Features
  • 13. A static analysis to detect suspicious vulnerabilities based on Git history. METHODOLOGY - HVD
  • 14. Methodology Proposal Approach Concept • This research proposes an approach that aims to reduce the false positive rate compared to VCCFinder [Perl et al.] without sacrificing adaptability. • The data source is the same as VCCFinder’s, but this approach distinguishes added lines from removed lines in the patch features, while VCCFinder doesn’t.
  • 15. Methodology VCCFinder: a Novel Approach Concept Generally, it’s hard to apply machine learning to source code because most high-level programming languages such as C/C++ are less redundant than natural languages and assembly languages. To address this difficulty, Perl et al.: • Narrowed the problem down to a quantifiable one. The quality of source code can hardly be quantified, but vulnerability can be expressed as 0 or 1. • Leveraged existing assets: the CVE database and prominent OSS projects.
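The difference between treating added and removed lines separately and pooling them can be sketched with a bag-of-tokens count over a unified diff. This is only an illustration under assumptions: the tokenizer, the `added:`/`removed:` prefixes, the function name `patch_token_counts`, and the sample patch are all hypothetical, and the actual feature extraction of HVD and VCCFinder is not described on the slides.

```python
import re
from collections import Counter

# Identifier-like tokens only; numbers and punctuation are ignored in this sketch.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def patch_token_counts(patch, line_type_sensitive):
    """Count tokens on changed lines of a unified diff.

    With line_type_sensitive=True, tokens from added and removed lines are kept
    apart by a prefix; with False, they are pooled into one count.
    """
    counts = Counter()
    for line in patch.splitlines():
        # Skip file and hunk headers. Simplification: a changed line whose
        # content itself starts with "--" or "++" would also be skipped.
        if line.startswith(("+++", "---", "@@")):
            continue
        if line.startswith("+"):
            prefix = "added:"
        elif line.startswith("-"):
            prefix = "removed:"
        else:
            continue  # context lines carry no change
        for token in TOKEN.findall(line[1:]):
            counts[(prefix if line_type_sensitive else "") + token] += 1
    return counts


# Hypothetical patch, for illustration only.
PATCH = """\
--- a/net.c
+++ b/net.c
@@ -1,3 +1,3 @@
 int copy(char *dst, char *src, int n) {
-    memcpy(dst, src, n);
+    if (n > 0) memcpy(dst, src, n);
 }
"""

print(patch_token_counts(PATCH, line_type_sensitive=False))
print(patch_token_counts(PATCH, line_type_sensitive=True))
```

In the pooled view `memcpy` simply appears twice; in the line-type-sensitive view the classifier can see that the call was removed once and re-added once next to a new `if`.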
  • 16. “I really never wanted to do source control management at all and felt that it was just about the least interesting thing in the computing world” – Linus Torvalds 10 Years of Git - www.linuxfoundation.org
  • 18. Methodology Abbreviations Terms • HVD: History-based Vulnerability Detector • VCC: Vulnerability-Contributing Commit(s). Changes that contain a vulnerability • UC: Unclassified Changes • LT-S: Line-type sensitive. The HVD approach • LT-I: Line-type insensitive. The replication of VCCFinder
  • 19. Methodology Exploit vs Vulnerability Terms Potential vulnerability Vulnerability Exploit (malicious input)
  • 22. Evaluation Dataset Provided by Perl et al. Experiment • This dataset contains commits labelled as VCC or UC and associated with their CVE-IDs. • It comprises 714 VCCs out of 350k commits in total from 66 OSS repositories implemented in C/C++. • It contains 170k unique tokens. • Compressed size is 525 MB (npz).
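A quick calculation on the dataset figures shows how rare VCCs are, which is one reason precision matters more than plain accuracy here. The numbers come from the slide; 350k is rounded there, so the results are approximate.

```python
vccs = 714
total_commits = 350_000  # "350k" on the slide, so this is an approximation

# Share of commits that are labelled as vulnerability-contributing.
base_rate = vccs / total_commits
print(f"VCC share of all commits: {base_rate:.2%}")

# A classifier that never flags anything would still be right this often,
# which is why accuracy alone says little on such imbalanced data.
trivial_accuracy = 1 - base_rate
print(f"Accuracy of 'never flag a commit': {trivial_accuracy:.2%}")
```

Conversely, a tool that flagged every commit would have a precision equal to the base rate, so almost all of its reports would be false positives.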
  • 23. Evaluation Implementation in Python Experiment To make the experiment reliable, I adopted a variety of libraries including: • NumPy • SciPy • Scikit-learn • Unidiff LT-I: note that the reproducibility is limited since the source code of VCCFinder is not publicly available.
  • 24. Evaluation Environment Specs Experiment The computation was performed on one node of the CX250 Cluster (MPC): • CPU: Intel Xeon E5-2680v2 2.80GHz (10-core) x2 • Memory: 64GB (4GB DDR3-1866 ECC x16)
  • 25. Evaluation Precision Improvement • LT-S improved the AUC (area under the curve) of its precision-recall curve by 18.8% over LT-I. Precision
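How an area under a precision-recall curve and an improvement between two such areas can be computed is sketched below in plain Python. Several things are assumptions: the trapezoidal rule is used (libraries may use other estimators), scores are assumed to have no ties, the improvement is treated as a relative change (the slide does not say whether 18.8% is relative or absolute), and the labels and scores are toy values, not results from the study.

```python
def precision_recall_points(labels, scores):
    """Return (recall, precision) pairs while lowering the threshold one sample at a time."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_positives = sum(labels)
    points = []
    true_positives = 0
    for rank, (_, label) in enumerate(ranked, start=1):
        true_positives += label
        points.append((true_positives / total_positives, true_positives / rank))
    return points


def pr_auc(points):
    """Trapezoidal area under the precision-recall points, starting at recall 0."""
    area = 0.0
    prev_recall, prev_precision = 0.0, points[0][1]
    for recall, precision in points:
        area += (recall - prev_recall) * (precision + prev_precision) / 2
        prev_recall, prev_precision = recall, precision
    return area


def relative_improvement(new, old):
    """Relative change of `new` over `old`, e.g. 0.188 for +18.8%."""
    return (new - old) / old


# Toy data: 1 marks a vulnerable commit, 0 an unclassified one.
labels = [1, 0, 1, 0]
scores_baseline = [0.9, 0.8, 0.7, 0.1]  # ranks one clean commit above a vulnerable one
scores_improved = [0.9, 0.1, 0.8, 0.2]  # ranks both vulnerable commits first

auc_baseline = pr_auc(precision_recall_points(labels, scores_baseline))
auc_improved = pr_auc(precision_recall_points(labels, scores_improved))
print(auc_baseline, auc_improved, relative_improvement(auc_improved, auc_baseline))
```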
  • 26. Evaluation Trade-off • Execution time is roughly 3x longer (about 2.7x): (LT-I, LT-S) = (17m06s, 45m36s) • Note: the vast majority of the processing time is spent in the learning phase. In practical use, the learnt model is dumped once computed and then shared for future predictions for a while. It then takes only a few seconds to parse a given unknown commit and run the prediction with the shared model. Hence, the execution time of the learning phase should not affect the development process. Precision
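The "train once, dump, then reuse" workflow from the note can be sketched with the standard `pickle` module. The model here is a hypothetical stand-in (a plain dictionary of token weights with made-up values), not the classifier used in the study, and the file name `hvd_model.pkl` is likewise an assumption.

```python
import pickle
import tempfile
from pathlib import Path


def save_model(model, path):
    """Dump a trained model to disk so later runs can skip the learning phase."""
    with open(path, "wb") as fh:
        pickle.dump(model, fh)


def load_model(path):
    """Load a previously dumped model. Only unpickle files from a trusted source."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def predict(model, token_counts):
    """Flag a commit when the weighted sum of its token counts reaches the threshold."""
    score = sum(model["weights"].get(tok, 0.0) * n for tok, n in token_counts.items())
    return score >= model["threshold"]


# Slow phase, done once: the "trained" model is just hard-coded placeholder weights.
trained = {"weights": {"added:vmalloc": 1.5, "removed:vmalloc": -0.5}, "threshold": 1.0}
model_path = Path(tempfile.mkdtemp()) / "hvd_model.pkl"
save_model(trained, model_path)

# Fast phase, done per commit: load the shared model and classify.
shared = load_model(model_path)
print(predict(shared, {"added:vmalloc": 1}))
```

Because unpickling can execute arbitrary code, a shared model file should be distributed through a channel the team already trusts, such as the repository itself.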
  • 27. Evaluation The most contributing features Effective Features To gain deeper insight from the experiment, this study also found that variables whose names contain words related to computer resources contributed most significantly to the classification model. For instance: • (RAM) structors: memory allocation with complex structures • (RAM) vmalloc: virtual memory allocation • (CPU) skbuf_head: a spin-lock of threads • (network) tso: TCP Segmentation Offload • (network) if_ether: a flag of Ethernet availability
  • 28. Evaluation Findings & insights Effective Features Findings: • The most contributing tokens are variables relevant to computer resources such as CPU, memory, and network. • The figure also shows that most of the top contributing variables are added tokens. Insights: • These findings are not surprising, because it is obvious that vulnerabilities correlate closely with the side effects of computer resource management and with adding code. • However, it is worth verifying that an automatic detection approach agrees with the experience-based intuition of humans.
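One common way to obtain a "most contributing features" ranking is to sort the weights of a linear classifier by magnitude. The sketch below assumes such a linear model; the slides do not state how the ranking was produced. The token names are taken from slide 27, but the weights and the `added:`/`removed:` prefixes are placeholders invented for illustration.

```python
def top_features(weights, k):
    """Return the k features with the largest absolute weight, strongest first."""
    return sorted(weights.items(), key=lambda item: abs(item[1]), reverse=True)[:k]


def added_share(features):
    """Fraction of the given (name, weight) features that come from added lines."""
    added = sum(1 for name, _ in features if name.startswith("added:"))
    return added / len(features)


# Placeholder weights, NOT values from the study.
weights = {
    "added:vmalloc": 0.9,
    "removed:vmalloc": -0.2,
    "added:tso": 0.7,
    "added:if_ether": 0.6,
    "removed:if_ether": 0.4,
    "added:printf": 0.1,
}

strongest = top_features(weights, k=3)
print(strongest)
print(added_share(strongest))
```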
  • 30. Although the features that can be acquired via Git are limited, this study shows that LT-S improved the AUC of the precision-recall curve by 18.8% compared to LT-I without losing the original advantages: • (a) Scalability • (b) Generality • (c) Explainability CONCLUSION
  • 31. KENTA YAMAMOTO | TECHNICAL SUPPORT ENGINEER | @I05 Thank you! Questions & discussion