KENTA YAMAMOTO | TECHNICAL SUPPORT ENGINEER | 2018-03-27
Vulnerability Detection
Based on Git History
Agenda
Introduction
Introduction
Research
Background
The Trend of Security Incidents
Key facts. Why this research is important:
In Quantity
# of CVE reports: 1,020 (2000) → 14,643 (2017) [NVD]
In Quality
• Equifax exposed the data of 143M consumers due to a web application
vulnerability (2017)
• Yahoo suffered a breach of account information for 3B users (2013)
The Century of Vulnerability
# OF VULNERABILITIES
As information technology is broadly
adopted, the impact of security
incidents is becoming more extensive
and critical.
Introduction To Help Code Reviewer
We know how to deliver software of proper quality: code review!
Best Practice is Well-Known
Review patches before release and fix bugs before deployment. Still, even
famous OSS projects struggle with a shortage of code reviewers.
A Trade-Off of Automation Techniques
Software projects widely adopt a variety of automation approaches. Vulnerability
detection techniques face two conflicting requirements:
• (a) High precision. A tool is useless if it outputs billions of false positives.
• (b) Adaptability. No one wants to spend effort solely on ensuring security, such
as annotating unsafe user inputs.
# Example of taint annotation
int printf(/*@untainted@*/ char *fmt, ...);
Git is somewhat difficult.
No worries, it’s not just you!
WHAT’S GIT?
“(Git is) expressly designed
to make you feel
less intelligent than you
thought you were”
– Andrew Morton
The Greatness of Git -
www.linuxfoundation.org
Introduction
What’s Git?
But Git Always Stays With You
Trust me, or try this command on your terminal:
# List the commands you run most often to see how much you rely on Git
history | awk '{ print $2 }' | sort | uniq -c | sort -rn | head
Git for Machine Learning
Git provides what machine learning requires: good data.
• Adopted by 69.2% of 30K developers [Stack Overflow]
• Trusted by most prominent OSS projects such as Linux
Kernel, OpenSSL, FFmpeg, PostgreSQL, Chrome V8, and
Apache HTTPD.
CVE-ID and Security Fix on Git
Git history offers a sufficient number of reliable security fixes:
• Fix commits refer to CVE-IDs in their commit messages
• Or the fix commits are referenced by the CVE database
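As a minimal sketch of the first bullet, CVE-IDs can be picked out of commit messages with a regular expression. The function name and the sample message below are hypothetical and only for illustration; the slides do not describe how the original dataset was collected.

```python
import re
from typing import List

# CVE identifiers look like CVE-YYYY-NNNN, with four or more digits in the last part.
CVE_PATTERN = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)


def find_cve_ids(commit_message: str) -> List[str]:
    """Return the distinct CVE-IDs mentioned in a commit message, in order of appearance."""
    seen: List[str] = []
    for match in CVE_PATTERN.finditer(commit_message):
        cve_id = match.group(0).upper()  # normalise case so duplicates collapse
        if cve_id not in seen:
            seen.append(cve_id)
    return seen


# Hypothetical commit message with made-up CVE numbers.
message = "Fix length check in parser (CVE-2099-0001)\n\nFollow-up for cve-2099-0001."
print(find_cve_ids(message))
```

Commits found this way are fix commits; mapping them back to the commits that introduced the flaw is a separate step not shown here.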
A Brief Introduction of Git Features
Agenda
Methodology
A static analysis to detect
suspected vulnerabilities based
on Git history.
METHODOLOGY - HVD
Methodology Proposed Approach
Concept
• This research proposes an approach that aims to
reduce the false positive rate compared to VCCFinder
[Perl et al.] without sacrificing adaptability.
• The data source is the same as VCCFinder's, but this
approach distinguishes added lines from removed lines
in the patch features, while VCCFinder doesn’t.
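To make the added-line versus removed-line distinction concrete, here is a small sketch that splits a unified diff into the two groups using only the standard library. The function name and the sample patch are assumptions for illustration. The experiment itself lists the unidiff library, which is not used here.

```python
from typing import List, Tuple

# Hypothetical patch, made up for illustration.
SAMPLE_PATCH = """\
diff --git a/buf.c b/buf.c
--- a/buf.c
+++ b/buf.c
@@ -1,3 +1,4 @@
 void copy(char *dst, const char *src, size_t n) {
-    strcpy(dst, src);
+    strncpy(dst, src, n - 1);
+    dst[n - 1] = 0;
 }
"""


def split_patch_lines(patch_text: str) -> Tuple[List[str], List[str]]:
    """Return (added_lines, removed_lines) from a unified diff.

    Simplification: only lines inside a hunk (after an '@@' header) are
    considered, and a 'diff --git' line ends the current hunk.
    """
    added: List[str] = []
    removed: List[str] = []
    in_hunk = False
    for line in patch_text.splitlines():
        if line.startswith("diff --git"):
            in_hunk = False
        elif line.startswith("@@"):
            in_hunk = True
        elif in_hunk and line.startswith("+"):
            added.append(line[1:])
        elif in_hunk and line.startswith("-"):
            removed.append(line[1:])
    return added, removed


added, removed = split_patch_lines(SAMPLE_PATCH)
print("added:", added)
print("removed:", removed)
```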
Methodology VCCFinder: a Novel Approach
Concept
Generally, it’s hard to apply machine learning to source code
because most high-level programming languages such as
C/C++ are less redundant than natural languages
and assembly languages. To address this difficulty, Perl et
al.:
• Narrowed the problem down to a quantifiable question.
The quality of source code can hardly be quantified, but
vulnerability can be expressed as 0 or 1.
• Leveraged existing assets: the CVE database and
prominent OSS projects.
“I really never wanted to do
source control
management at all and felt
that it was just about the
least interesting thing in
the computing world”
– Linus Torvalds
10 Years of Git -
www.linuxfoundation.org
Methodology Overall Architecture
Concept
Methodology Abbreviations
Terms
• HVD: History-based Vulnerability Detector
• VCC: Vulnerability-Contributing Commit(s). Changes
that introduce a vulnerability
• UC: Unclassified Changes
• LT-S: Line type sensitive. The HVD approach
• LT-I: Line type insensitive. A replication of VCCFinder
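The difference between LT-I and LT-S can be sketched as two ways of counting tokens in a patch. The `+`/`-` prefix encoding and the tokenizer below are assumptions for illustration; the slides do not specify how HVD encodes the line type.

```python
import re
from collections import Counter
from typing import Iterable

# Simple identifier tokenizer (an assumption, not the tokenizer used in the study).
TOKEN = re.compile(r"[A-Za-z_]\w*")


def lt_insensitive(added: Iterable[str], removed: Iterable[str]) -> Counter:
    """LT-I style: count tokens of all changed lines, ignoring the line type."""
    counts: Counter = Counter()
    for line in list(added) + list(removed):
        counts.update(TOKEN.findall(line))
    return counts


def lt_sensitive(added: Iterable[str], removed: Iterable[str]) -> Counter:
    """LT-S style: keep the line type by prefixing each token with '+' or '-'."""
    counts: Counter = Counter()
    for line in added:
        counts.update("+" + tok for tok in TOKEN.findall(line))
    for line in removed:
        counts.update("-" + tok for tok in TOKEN.findall(line))
    return counts


# Hypothetical changed lines.
added = ["buf = vmalloc(len);"]
removed = ["buf = kmalloc(len, GFP_KERNEL);"]
print(lt_insensitive(added, removed))
print(lt_sensitive(added, removed))
```

With the line type kept, a classifier can weight an added `vmalloc` differently from a removed one, which the insensitive counts cannot express.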
Methodology Exploit vs Vulnerability
Terms
Potential
vulnerability
Vulnerability
Exploit
(malicious input)
Agenda
Evaluation
351,452
commits in total
Evaluation Dataset Provided by Perl et al.
Experiment
• This dataset contains commits labelled as VCC or UC and associated with
their CVE-IDs.
• It comprises 714 VCCs out of about 350k commits in total, from 66 OSS
repositories implemented in C/C++.
• It contains about 170k unique tokens.
• Compressed size is 525 MB (npz).
Evaluation Implementation in Python
Experiment
To make the experiment reliable, I adopted a variety of libraries, including:
• NumPy
• SciPy
• scikit-learn
• unidiff
Note on LT-I: reproducibility is limited because the source code of VCCFinder is
not publicly available.
Evaluation Environment Specs
Experiment
The computation was performed on one node of the CX250 Cluster (MPC):
• CPU: Intel Xeon E5-2680v2 2.80GHz (10-core) x2
• Memory: 64GB (4GB DDR3-1866 ECC x16)
Evaluation Precision Improvement
• LT-S improved the AUC (area under the curve) of the precision-recall curve by
18.8% over LT-I.
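For readers unfamiliar with the metric, the following sketch computes the area under a precision-recall curve in plain Python, using the trapezoidal rule over the points obtained by thresholding at each distinct score. The labels and scores are made up. The slides do not state which routine was used or whether the 18.8% figure is relative or absolute, so the `relative_improvement` helper is only one possible reading.

```python
from typing import List, Sequence, Tuple


def pr_points(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (recall, precision) points, one per distinct score threshold.

    Assumes at least one positive label. Starts at the conventional point
    (recall=0, precision=1).
    """
    total_pos = sum(labels)
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 1.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last item of a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((tp / total_pos, tp / (tp + fp)))
    return points


def pr_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the precision-recall curve by the trapezoidal rule."""
    points = pr_points(labels, scores)
    return sum((r1 - r0) * (p0 + p1) / 2.0
               for (r0, p0), (r1, p1) in zip(points, points[1:]))


def relative_improvement(new: float, old: float) -> float:
    """Relative change from old to new, e.g. 0.188 for +18.8%."""
    return (new - old) / old


# Made-up toy data: 1 = VCC, 0 = unclassified.
scores = [0.9, 0.8, 0.2, 0.1]
good = pr_auc([1, 1, 0, 0], scores)   # positives ranked first
poor = pr_auc([0, 1, 0, 1], scores)   # positives ranked lower
print(good, poor, relative_improvement(good, poor))
```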
Precision
Evaluation Trade-off
• Execution time increased about 2.7x: (LT-I, LT-S) = (17m06s, 45m36s)
• Note: the vast majority of the processing time is spent in the learning phase.
In practical use, the learned model is dumped once and then shared for
subsequent predictions for a while. After that, it takes only a few seconds to
parse a given unknown commit and predict with the shared model.
Hence, the execution time of the learning phase should not affect the
development process.
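The dump-once, predict-many workflow described above can be sketched with the standard `pickle` module. The dictionary "model" and the `score` function are hypothetical stand-ins for a trained classifier, since the slides do not show the actual model format.

```python
import os
import pickle
import tempfile
from typing import Dict, Iterable

# Hypothetical stand-in for a trained classifier: per-token weights and a bias.
model: Dict = {"weights": {"+vmalloc": 1.5, "-kfree": 0.7}, "bias": -1.0}


def score(trained: Dict, tokens: Iterable[str]) -> float:
    """Linear score of a commit's tokens; unknown tokens contribute nothing."""
    return trained["bias"] + sum(trained["weights"].get(tok, 0.0) for tok in tokens)


# Learning phase (slow, done once): dump the model to a file.
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)

# Prediction phase (fast, done per commit): load the shared model and score.
with open(path, "rb") as f:
    shared = pickle.load(f)

print(score(shared, ["+vmalloc", "+len"]))
```

Pickle files can execute arbitrary code when loaded, so a shared model file should only come from a trusted source.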
Precision
Evaluation The most contributing features
Effective Features
To gain deeper insight from the
experiment, this study also reveals that
variables consisting of words related to
computer resources contributed most
significantly to the classification model.
For instance:
• (RAM) structors: memory allocation with
complex structures
• (RAM) vmalloc: virtual memory allocation
• (CPU) skbuf_head: a spin-lock of threads
• (network) tso: TCP Segmentation Offload
• (network) if_ether: a flag of Ethernet
availability
Evaluation Findings & insights
Effective Features
Findings:
• The most contributing tokens are relevant to computer resources such as CPU,
memory, and network.
• The figure also shows that the most contributing variables are added tokens.
Insights:
• These findings are not surprising: vulnerabilities evidently correlate closely
with side effects of computer resource management and with added code.
• However, it’s worth verifying that the automatic detection approach agrees
with the experiential intuition of humans.
Agenda
Conclusion
Although the features obtainable via Git are limited, this study shows that LT-S
improved the AUC of the precision-recall curve by 18.8% compared to LT-I without
losing the original advantages:
• (a) Scalability
• (b) Generality
• (c) Explainability
CONCLUSION
KENTA YAMAMOTO | TECHNICAL SUPPORT ENGINEER | @I05
Thank you!
Questions & discussion

DECODING JAVA THREAD DUMPS: MASTER THE ART OF ANALYSIS
 
如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样
如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样
如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样
 
A Comprehensive Guide on Implementing Real-World Mobile Testing Strategies fo...
A Comprehensive Guide on Implementing Real-World Mobile Testing Strategies fo...A Comprehensive Guide on Implementing Real-World Mobile Testing Strategies fo...
A Comprehensive Guide on Implementing Real-World Mobile Testing Strategies fo...
 
Modelling Up - DDDEurope 2024 - Amsterdam
Modelling Up - DDDEurope 2024 - AmsterdamModelling Up - DDDEurope 2024 - Amsterdam
Modelling Up - DDDEurope 2024 - Amsterdam
 
Baha Majid WCA4Z IBM Z Customer Council Boston June 2024.pdf
Baha Majid WCA4Z IBM Z Customer Council Boston June 2024.pdfBaha Majid WCA4Z IBM Z Customer Council Boston June 2024.pdf
Baha Majid WCA4Z IBM Z Customer Council Boston June 2024.pdf
 
一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理
 
42 Ways to Generate Real Estate Leads - Sellxpert
42 Ways to Generate Real Estate Leads - Sellxpert42 Ways to Generate Real Estate Leads - Sellxpert
42 Ways to Generate Real Estate Leads - Sellxpert
 
Orca: Nocode Graphical Editor for Container Orchestration
Orca: Nocode Graphical Editor for Container OrchestrationOrca: Nocode Graphical Editor for Container Orchestration
Orca: Nocode Graphical Editor for Container Orchestration
 
Why Apache Kafka Clusters Are Like Galaxies (And Other Cosmic Kafka Quandarie...
Why Apache Kafka Clusters Are Like Galaxies (And Other Cosmic Kafka Quandarie...Why Apache Kafka Clusters Are Like Galaxies (And Other Cosmic Kafka Quandarie...
Why Apache Kafka Clusters Are Like Galaxies (And Other Cosmic Kafka Quandarie...
 
Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)
Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)
Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)
 
DevOps Consulting Company | Hire DevOps Services
DevOps Consulting Company | Hire DevOps ServicesDevOps Consulting Company | Hire DevOps Services
DevOps Consulting Company | Hire DevOps Services
 
The Comprehensive Guide to Validating Audio-Visual Performances.pdf
The Comprehensive Guide to Validating Audio-Visual Performances.pdfThe Comprehensive Guide to Validating Audio-Visual Performances.pdf
The Comprehensive Guide to Validating Audio-Visual Performances.pdf
 
Unveiling the Advantages of Agile Software Development.pdf
Unveiling the Advantages of Agile Software Development.pdfUnveiling the Advantages of Agile Software Development.pdf
Unveiling the Advantages of Agile Software Development.pdf
 
Liberarsi dai framework con i Web Component.pptx
Liberarsi dai framework con i Web Component.pptxLiberarsi dai framework con i Web Component.pptx
Liberarsi dai framework con i Web Component.pptx
 
Malibou Pitch Deck For Its €3M Seed Round
Malibou Pitch Deck For Its €3M Seed RoundMalibou Pitch Deck For Its €3M Seed Round
Malibou Pitch Deck For Its €3M Seed Round
 
ppt on the brain chip neuralink.pptx
ppt  on   the brain  chip neuralink.pptxppt  on   the brain  chip neuralink.pptx
ppt on the brain chip neuralink.pptx
 
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
 

Vulnerability Detection Based on Git History

  • 1. KENTA YAMAMOTO | TECHNICAL SUPPORT ENGINEER | 2018-03-27 Vulnerability Detection Based on Git History
  • 3. Introduction Research Background The Trend of Security Incidents Key facts. Why this research is important: In Quantity # of CVE reports: 1,020 (2000) → 14,643 (2017) [NVD] In Quality • Equifax exposed 143M consumers’ data due to a website application vulnerability (2017) • A breach at Yahoo exposed 3B users’ account information (2013)
  • 4. The Century of Vulnerability # OF VULNERABILITIES As information technology is broadly adopted, the impact of security incidents is becoming more extensive and critical.
  • 5. Introduction To Help Code Reviewers We know how to deliver software in proper quality. Code review! Best Practice is Well-Known Review patches before release and fix bugs before deployment. Even so, famous OSS projects struggle with a lack of code reviewers. A Trade-Off of Automation Techniques Software projects widely adopt a variety of automation approaches. Vulnerability detection techniques face a trade-off: • (a) High precision. A tool is useless if it outputs billions of false positives. • (b) Adaptability. No one wants to spend effort solely on ensuring security, such as annotating unsafe user inputs. Research Background # Example of taint annotation int printf(/*@untainted@*/ char *fmt, ...);
  • 6. Git is somewhat difficult. No worries, it’s not only you! WHAT’S GIT?
  • 7. “(Git is) expressly designed to make you feel less intelligent than you thought you were” – Andrew Morton The Greatness of Git - www.linuxfoundation.org
  • 8. Introduction What’s Git? But Git Always Stays With You Trust me, or try this command on your terminal: # List how much you rely on Git history | awk '{ print $2 }' | sort | uniq -c | sort -r | head
  • 9. Introduction What’s Git? Git for Machine Learning Git provides what machine learning requires: good data. • Adopted by 69.2% of 30K developers [StackOverflow] • Trusted by most prominent OSS projects such as Linux Kernel, OpenSSL, FFmpeg, PostgreSQL, Chrome V8, and Apache HTTPD.
  • 10. Introduction What’s Git? CVE-ID and Security Fix on Git A sufficient number of reliable security fixes: • Commits refer to CVE-IDs in their commit messages • Or, the fixing commits are referenced by the CVE database
  • 11. Introduction What’s Git? A Brief Introduction of Git Features
  • 13. A static analysis to detect suspicious vulnerabilities based on Git history. METHODOLOGY - HVD
  • 14. Methodology Proposal Approach Concept • This research proposes an approach that aims to reduce the false positive rate compared to VCCFinder [Perl et al.] without sacrificing adaptability. • The data source is the same as VCCFinder’s, but this approach takes added lines and removed lines into account in the patch features, while VCCFinder doesn’t.
  • 15. Methodology VCCFinder: a Novel Approach Concept Generally, it’s hard to apply machine learning to source code because most high-level programming languages such as C/C++ are less redundant than natural languages and assembly languages. To address this difficulty, Perl et al.: • Narrowed the problem down to a quantifiable question. The quality of source code can hardly be quantified, but vulnerability can be expressed as 0 or 1. • Leveraged existing assets: the CVE database and prominent OSS projects.
  • 16. “I really never wanted to do source control management at all and felt that it was just about the least interesting thing in the computing world” – Linus Torvalds 10 Years of Git - www.linuxfoundation.org
  • 18. Methodology Abbreviations Terms • HVD: History-based Vulnerability Detector • VCC: Vulnerability-Contributing Commit(s). Changes that contain a vulnerability • UC: Unclassified Changes • LT-S: Line-type sensitive. The HVD approach • LT-I: Line-type insensitive. The replication of VCCFinder
  • 19. Methodology Exploit vs Vulnerability Terms Potential vulnerability Vulnerability Exploit (malicious input)
  • 22. Evaluation Dataset Provided by Perl et al. Experiment • This dataset contains commits labelled as VCC or UC and associated with their CVE-IDs. • It comprises 714 VCCs out of 350k commits in total from 66 OSS repositories implemented in C/C++. • It contains 170k unique tokens. • Compressed size is 525 MB (npz).
  • 23. Evaluation Implementation in Python Experiment To make the experiment reliable, I adopted a variety of libraries including: • NumPy • SciPy • Scikit-learn • Unidiff LT-I: note that reproducibility is limited because the source code of VCCFinder is not publicly available.
  • 24. Evaluation Environment Specs Experiment The computation was performed on one node of the CX250 Cluster (MPC): • CPU: Intel Xeon E5-2680v2 2.80GHz (10-core) x2 • Memory: 64GB (4GB DDR3-1866 ECC x16)
  • 25. Evaluation Precision Improvement • LT-S improved the AUC (area under curve) of its precision-recall curve by 18.8% from LT-I. Precision
  • 26. Evaluation Trade-off • Execution time roughly tripled (about 2.7x): (LT-I, LT-S) = (17m06s, 45m36s) • Note: the vast majority of the processing time is spent in the learning phase. In a practical use case, the learnt model is dumped once it has been computed and reused for subsequent predictions for a while. It then takes only a few seconds to parse a given unknown commit and predict with the shared model. Hence, the execution time of the learning phase should not affect the development process. Precision
  • 27. Evaluation The most contributing features Effective Features To gain deeper insight from the experiment, this study also shows that variables whose names consist of words related to computer resources contributed most significantly to the classification model. For instance: • (RAM) structors: memory allocation with complex structures • (RAM) vmalloc: virtual memory allocation • (CPU) skbuf_head: a spin-lock of threads • (network) tso: TCP Segmentation Offload • (network) if_ether: a flag of Ethernet availability
  • 28. Evaluation Findings & insights Effective Features Findings: • The most contributing variable tokens are relevant to computer resources such as CPU, memory, and network. • The figure also shows that most of the contributing variables are added tokens. Insights: • These findings are not surprising, because vulnerabilities obviously correlate closely with the side effects of computer resource management and with adding code. • However, it is worth verifying that an automatic detection approach agrees with the experience-based intuition of humans.
  • 30. Although the features obtainable via Git are limited, this study shows that LT-S improved the AUC of the precision-recall curve by 18.8% compared to LT-I without losing the original advantages: • (a) Scalability • (b) Generality • (c) Explainability CONCLUSION
  • 31. KENTA YAMAMOTO | TECHNICAL SUPPORT ENGINEER | @I05 Thank you! Questions & discussion
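
Slides 14 and 18 distinguish the line-type-sensitive (LT-S) features from the line-type-insensitive (LT-I) ones. A minimal sketch of that distinction, assuming a simple identifier tokenizer and a hand-written unified diff, might look as follows. The function name `patch_tokens`, the `added:`/`removed:` prefixes, and the example patch are hypothetical and are not taken from the slides or from VCCFinder; the original implementation used the Unidiff library, while this sketch sticks to the standard library.

```python
import re
from collections import Counter

# Assumption: identifier-like words stand in for the real tokenizer.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def patch_tokens(patch: str, line_type_sensitive: bool) -> Counter:
    """Count tokens on the changed lines of a unified diff.

    With line_type_sensitive=True (LT-S style), a token on an added line and
    the same token on a removed line become two different features.
    With line_type_sensitive=False (LT-I style), they are merged.
    """
    counts = Counter()
    for line in patch.splitlines():
        # Skip file headers, hunk headers, and context lines.
        # (Simplification: changed lines whose content itself starts with
        # "--" or "++" would be skipped as well.)
        if line.startswith(("+++", "---")) or not line.startswith(("+", "-")):
            continue
        if line_type_sensitive:
            prefix = "added:" if line[0] == "+" else "removed:"
        else:
            prefix = ""
        for token in TOKEN_RE.findall(line[1:]):
            counts[prefix + token] += 1
    return counts


# Hypothetical patch, for illustration only.
EXAMPLE_PATCH = """\
--- a/net.c
+++ b/net.c
@@ -1,2 +1,2 @@
-buf = kmalloc(len);
+buf = vmalloc(len);
"""

lt_i = patch_tokens(EXAMPLE_PATCH, line_type_sensitive=False)
lt_s = patch_tokens(EXAMPLE_PATCH, line_type_sensitive=True)
print(lt_i)
print(lt_s)
```

In the LT-I view the patch only says that `kmalloc` and `vmalloc` each occur once; in the LT-S view the model can also see that `vmalloc` was added and `kmalloc` was removed.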
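
Slide 25 compares LT-S and LT-I by the area under the precision-recall curve. The sketch here computes that area as average precision in plain Python and then a relative improvement between two classifiers. The labels and scores are toy values, not the study's data, and reading the reported 18.8% as a relative (rather than absolute) gain is an assumption; the slides do not say which one is meant.

```python
def average_precision(labels, scores):
    """Area under the precision-recall curve, computed as average precision.

    labels: 1 for a vulnerability-contributing commit, 0 otherwise.
    scores: classifier scores, higher meaning more suspicious.
    """
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    area = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            # Each positive contributes precision-at-rank times a recall
            # step of 1 / total_positives.
            area += hits / rank / total_positives
    return area


def relative_improvement(baseline, improved):
    """Relative gain of `improved` over `baseline`, as a fraction."""
    return (improved - baseline) / baseline


# Toy data, for illustration only.
labels = [1, 0, 1, 0, 0]
scores_baseline = [0.9, 0.8, 0.7, 0.6, 0.5]   # ranks one UC above a VCC
scores_improved = [0.9, 0.6, 0.8, 0.7, 0.5]   # ranks both VCCs first

ap_baseline = average_precision(labels, scores_baseline)
ap_improved = average_precision(labels, scores_improved)
print(ap_baseline, ap_improved, relative_improvement(ap_baseline, ap_improved))
```

With heavily imbalanced data such as 714 VCCs in 350k commits, a precision-recall summary is usually more informative than accuracy, since a classifier that flags nothing would already be right almost all of the time.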
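
Slide 10 notes that security-fix commits often mention CVE-IDs in their commit messages. One way to pick such commits out of a history is a regular expression over the message text; the sketch here assumes the standard `CVE-<year>-<4 or more digits>` identifier format. The helper name `cve_ids` and the commit message are made up for illustration and do not come from the slides or from any real repository.

```python
import re

# CVE identifiers have the form CVE-YYYY-NNNN, with four or more digits
# in the sequence number.
CVE_RE = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)


def cve_ids(commit_message: str) -> list[str]:
    """Return the distinct CVE-IDs mentioned in a commit message, sorted."""
    return sorted({match.upper() for match in CVE_RE.findall(commit_message)})


# Hypothetical commit message, for illustration only.
message = (
    "parser: fix out-of-bounds read (CVE-2017-00001)\n"
    "\n"
    "Also referenced as cve-2017-00001 in the tracker. See CVE-2016-1234.\n"
)
print(cve_ids(message))
```

A match only shows that a commit mentions a CVE; mapping the fix back to the commit that introduced the vulnerability is a separate step that this sketch does not attempt.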