Data Quality for Software Vulnerability Dataset


This project explores data quality for software vulnerability datasets, and provides solutions for automated data cleaning frameworks to improve data quality and downstream tasks.

Software Vulnerability Prediction
The University of Adelaide Slide 3
• Utilise AI to improve automation and effectiveness of vulnerability detection.
• Use knowledge from previous examples to automatically learn vulnerable patterns.
Previously known vulnerabilities -> Machine learning -> Prediction

Data is the core component of any data-driven pipeline: “Garbage In, Garbage Out”

Software Vulnerability Datasets
Weak Supervision
1. Vulnerability Reports
2. Development Commit Logs
3. Static Analysis Tools
4. Synthetic Data

Research Objective
Aim
To gain a deep understanding of the nature of data quality for software vulnerability datasets.
Outcomes
1. Inform the state of software vulnerability data quality and the reliability of downstream tasks.
2. Enable automated data cleaning frameworks to improve data quality and downstream tasks.

Research Design
Data Quality Attributes
1. Accuracy
2. Uniqueness
3. Consistency
4. Completeness
5. Currentness

Research Design
Labelling Heuristic    Selected Dataset
Security               Big-Vul
Developer              Devign
Tool                   D2A
Synthetic              Juliet Test Suite

Research Design
Inspect the change in model performance caused by attempting to reduce data quality issues.
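The comparison described on this slide can be sketched as follows. This is only an illustration: the slides do not state which metric was used or whether the reported changes are relative or absolute, so the F1 score and the relative-change formula here are assumptions, and the labels and predictions are made-up toy values rather than results from the study.

```python
def f1_score(y_true, y_pred):
    """F1 for the positive (vulnerable = 1) class, computed from scratch."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def relative_change(before, after):
    """Change in a metric relative to its original value (assumed convention)."""
    if before == 0:
        raise ValueError("relative change is undefined when the baseline is 0")
    return (after - before) / before


# Toy values only: one shared test set, predictions from a model trained on
# the original data and from a model trained on the cleaned data.
y_true = [1, 1, 0, 0, 1, 0]
pred_original = [1, 1, 0, 0, 1, 1]
pred_cleaned = [1, 0, 0, 0, 0, 1]

f1_original = f1_score(y_true, pred_original)
f1_cleaned = f1_score(y_true, pred_cleaned)
change = relative_change(f1_original, f1_cleaned)
print(f"F1 original={f1_original:.3f} cleaned={f1_cleaned:.3f} change={change:+.1%}")
```

Evaluating both models on the same held-out set keeps the comparison about the training data rather than about differences in the test samples.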

Findings - Accuracy
“The degree to which the data has attributes that correctly represent the true value of the intended attribute of a concept or event in a specific context of use.”
Manually inspect label correctness:
Big-Vul 54.3%
Devign 80.0%
D2A 28.6%
Juliet 100%
Lower performance on true labels: -50%, -29%, -80%
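A minimal sketch of how a label-accuracy figure could be derived from manual inspection: a reviewer checks a sample of records and records whether each label was correct. The function names are hypothetical, and the Wilson confidence interval is an addition of this sketch (one common way to express sampling uncertainty), not something the slides report.

```python
import math


def label_accuracy(verdicts):
    """Fraction of manually inspected records whose label was judged correct."""
    if not verdicts:
        raise ValueError("at least one inspected record is required")
    return sum(1 for v in verdicts if v) / len(verdicts)


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95%)."""
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("need 0 <= correct <= total and total > 0")
    p = correct / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


# Toy verdicts only: True means the reviewer agreed with the dataset label.
verdicts = [True, True, False, True]
accuracy = label_accuracy(verdicts)
low, high = wilson_interval(sum(verdicts), len(verdicts))
print(f"accuracy={accuracy:.2f}, interval=({low:.2f}, {high:.2f})")
```

With small inspection samples the interval is wide, which is worth keeping in mind when comparing datasets.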

Findings - Uniqueness
“The degree to which there is no duplication in records.”
Uniqueness:
Big-Vul 83.0%
Devign 89.9%
D2A 2.1%
Juliet 16.3%
[Figure: Model Performance with and without duplicates. Categories: Security, Developer, Tool, Synthetic. Series: Original, No duplicates. Annotated changes: -13.9%, -81.7%, -10.4%.]
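One way to measure uniqueness is sketched below, assuming it is defined as the number of distinct records divided by the total number of records, and that two code samples count as duplicates when they match after whitespace normalisation. Both the definition and the normalisation rule are assumptions of this sketch; the slides do not describe how duplicates were detected.

```python
import hashlib
import re


def normalise(code):
    """Collapse runs of whitespace so formatting-only differences are ignored."""
    return re.sub(r"\s+", " ", code).strip()


def fingerprint(code):
    """Stable hash of the normalised code sample."""
    return hashlib.sha256(normalise(code).encode("utf-8")).hexdigest()


def drop_duplicates(samples):
    """Keep the first occurrence of each distinct sample, preserving order."""
    seen = set()
    unique = []
    for code in samples:
        digest = fingerprint(code)
        if digest not in seen:
            seen.add(digest)
            unique.append(code)
    return unique


def uniqueness(samples):
    """Distinct records divided by total records (assumed definition)."""
    if not samples:
        raise ValueError("at least one record is required")
    return len(drop_duplicates(samples)) / len(samples)


# Toy samples only: three of the four differ just in whitespace.
samples = ["int a;", "int  a;", "int b;", "int\ta;"]
print(uniqueness(samples))
```

Exact-match hashing is the simplest option; near-duplicate detection (for example, token-level similarity) would flag more records and give lower uniqueness values.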

Key Takeaways
State-of-the-art software vulnerability datasets are imperfect.
Data quality significantly affects the performance of downstream software security models.
We need better cleaning methods or more robust models to ensure reliable and effective data-driven software security.
Dataset data quality values:
Dataset   Accuracy  Uniqueness  Consistency  Completeness  Currentness
Big-Vul   0.543     0.830       0.999        0.824         0.761
Devign    0.800     0.899       0.991        0.944         0.811
D2A       0.286     0.021       0.531        0.981         0.844
Juliet    1         0.163       0.750        1             NA
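The data quality values above can be encoded directly to see which attribute is weakest for each dataset. The numbers are copied from the table; the dictionary layout and the helper name are hypothetical, and NA is represented as None and skipped.

```python
QUALITY = {
    "Big-Vul": {"Accuracy": 0.543, "Uniqueness": 0.830, "Consistency": 0.999,
                "Completeness": 0.824, "Currentness": 0.761},
    "Devign": {"Accuracy": 0.800, "Uniqueness": 0.899, "Consistency": 0.991,
               "Completeness": 0.944, "Currentness": 0.811},
    "D2A": {"Accuracy": 0.286, "Uniqueness": 0.021, "Consistency": 0.531,
            "Completeness": 0.981, "Currentness": 0.844},
    "Juliet": {"Accuracy": 1.0, "Uniqueness": 0.163, "Consistency": 0.750,
               "Completeness": 1.0, "Currentness": None},  # None stands for NA
}


def weakest_attribute(scores):
    """Return the attribute with the lowest known value, ignoring NA entries."""
    known = {name: value for name, value in scores.items() if value is not None}
    return min(known, key=known.get)


for dataset, scores in QUALITY.items():
    attribute = weakest_attribute(scores)
    print(f"{dataset}: weakest attribute is {attribute} ({scores[attribute]})")
```

Read this way, the table suggests accuracy is the weakest attribute for Big-Vul and Devign, while uniqueness is the weakest for D2A and Juliet.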
