Invalidating Copyright Infringement Claims with Python and Fuzzy Hashing
Joe T. Sylve, M.S.

Managing Partner
504ENSICS Labs
Background
• Client was being sued for Copyright Infringement
• Client’s lawyer wanted two questions answered
• Does the code contain any open source or GPL code?
• When was the code in question written?

• Code was written in PHP (web-based application)
• Code had absolutely no comments
• No copyright headers
• No dates of any kind

www.504ensics.com
Goal
• If it can be proven that the code contains open source or GPL code with restrictive licenses, then the claim is invalid
• If it can be proven that the copyrighted code on file was written after the author’s claimed “creation date”, the copyright is invalid

Is code original?
• No comments or headers that would imply authorship
• Code didn’t look familiar
• Code was kind of crappy

Step 1 – Acquire Samples
• Wrote a Python script to download all projects written in PHP from GitHub
• Scraped from search feature
• Limited to 50 pages of search

• Got something like 10GB of compressed code
• ~100,000 files
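
A minimal sketch of the page-limited search step. It assumes GitHub's REST repository-search endpoint rather than the HTML search scraping described above; the query string, the `per_page` value, and the function name are illustrative assumptions, and the download itself is left out.

```python
from urllib.parse import urlencode

# Assumption: GitHub's REST search endpoint, not the scraped HTML search pages.
SEARCH_ENDPOINT = "https://api.github.com/search/repositories"


def search_page_urls(language="php", max_pages=50, per_page=20):
    """Build one search URL per result page, stopping at max_pages."""
    urls = []
    for page in range(1, max_pages + 1):
        query = urlencode(
            {"q": f"language:{language}", "per_page": per_page, "page": page}
        )
        urls.append(f"{SEARCH_ENDPOINT}?{query}")
    return urls


urls = search_page_urls()
print(len(urls), urls[0])
```

Each URL would then be fetched and the listed repositories downloaded; capping the page count keeps the sample size bounded, as in the 50-page limit above.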

Step 2 – Compare Code
• Three Options
• Manual Verification
• Grad Students, Interns, etc

• Cryptographic Hashing
• MD5, SHA-1, etc

• “Fuzzy” Hashing
• ssdeep, sdhash

Fuzzy Hashing
• Vassil says I have to call it “Approximate Matching”
• sdhash
• Vassil Roussev & Candice Quates
• Free, Open Source
• Awesome

• Traditional hashing
• If a single bit of the input changes, the whole hash
changes

• Fuzzy Hashing
• Compares files and gives similarity index
• Can find “similar” files
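
The contrast between the two kinds of hashing can be sketched as follows. This is only an illustration: `difflib.SequenceMatcher` from the standard library stands in for a real approximate-matching tool, and it is not how ssdeep or sdhash compute their scores. The sample strings and function names are made up for the example.

```python
import hashlib
from difflib import SequenceMatcher


def md5_hex(text: str) -> str:
    """Cryptographic hash: any change to the input gives an unrelated digest."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def similarity(a: str, b: str) -> int:
    """0-100 similarity score (difflib stand-in, NOT ssdeep or sdhash)."""
    return round(SequenceMatcher(None, a, b, autojunk=False).ratio() * 100)


original = "<?php function plus_one($x) { return $x + 1; } ?>"
modified = "<?php function plus_one($y) { return $y + 1; } ?>"

# The digests share nothing useful, so they cannot reveal a near-copy.
print(md5_hex(original))
print(md5_hex(modified))

# The similarity score stays high because only a variable name changed.
print(similarity(original, modified))
```

An exact-match hash would only flag byte-identical files, while a similarity score can surface copied code that has been lightly edited.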
When was code written?
• We can invalidate copyright if the sample on file
was written after the claimed authorship date
• No comments or dates of any kind in the code!
• No access to developer’s workstation to do
traditional forensics
• ???

PHP
• Web-based language
• Updated reasonably frequently
• New Features added often
• Goal
• Determine which features were used in the code
• Correlate features with PHP release date
• Code couldn’t have been written before this date

Step 1 – Function Use
• Programmers can define their own functions or use the ones built into the language
• Example of a user-defined function:
• function plus_one($x) { return $x + 1; }

• Python script to find all function declarations and
calls
• Ignore declared functions
• Left with a list of language “features” used
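
The extraction step above can be sketched with regular expressions. This is a rough approximation under stated assumptions, not the script used in the case: the patterns, the keyword list, and the function name are illustrative, and the sketch does not strip strings or comments before matching.

```python
import re

# "function name(" marks a declaration; "&" covers return-by-reference.
DECL_RE = re.compile(r"\bfunction\s+&?\s*([A-Za-z_][A-Za-z0-9_]*)\s*\(")
# "name(" not preceded by "$", "->", "::" or an identifier character.
CALL_RE = re.compile(r"(?<![\$>:A-Za-z0-9_])([A-Za-z_][A-Za-z0-9_]*)\s*\(")

# Language constructs that look like calls but are not functions (partial list).
KEYWORDS = {
    "if", "elseif", "while", "for", "foreach", "switch", "catch", "function",
    "array", "list", "isset", "empty", "unset", "echo", "print", "return",
    "exit", "die", "include", "include_once", "require", "require_once",
}


def builtin_calls(source: str) -> set:
    """Return called names that are neither declared in source nor keywords."""
    # PHP function names are case-insensitive, so compare in lowercase.
    declared = {name.lower() for name in DECL_RE.findall(source)}
    called = {name.lower() for name in CALL_RE.findall(source)}
    return called - declared - KEYWORDS


sample = """<?php
function plus_one($x) { return $x + 1; }
echo strlen(trim($s));
$n = plus_one(array_sum($a));
if (isset($b)) { $obj->run(); }
"""
print(sorted(builtin_calls(sample)))
```

Method calls such as `$obj->run()` are skipped on purpose, since only plain function calls map onto entries in the language documentation.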

Step 2 – Version Detection
• PHP comes with auto-generated documentation
about each built-in function
• Documentation states the version in which each function first became available
• Write a Python script to scrape the PHP documentation
• Correlate functions with PHP versions
• We only care about the function with the newest
version
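
Picking the function with the newest version can be sketched as below. The function names and version numbers in `introduced_in` are hypothetical placeholders for what a documentation scraper would return, not real PHP data.

```python
def parse_version(text):
    """Turn '5.1.5' into (5, 1, 5) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def newest_requirement(functions_used, introduced_in):
    """Return (function, version) for the most recently introduced function."""
    known = {
        name: introduced_in[name]
        for name in functions_used
        if name in introduced_in
    }
    # Raises ValueError if none of the functions appear in the mapping.
    name = max(known, key=lambda n: parse_version(known[n]))
    return name, known[name]


# Hypothetical placeholder data, standing in for scraped documentation.
introduced_in = {"func_a": "4.0.0", "func_b": "5.1.10", "func_c": "5.1.5"}

print(newest_requirement(["func_a", "func_b", "func_c"], introduced_in))
```

Comparing versions as integer tuples matters here: as plain strings, "5.1.10" would sort before "5.1.5".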

Step 3 – Date the code
• The PHP website has an archive of release notes
• Contains release versions and dates
• Python script scrapes release notes for the PHP
version of interest and gives us the release date
• Reasonably, the code couldn’t have been written
before that date
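
The final comparison can be sketched as a lookup plus a date check. The `RELEASE_DATES` table stands in for the scraped release notes and holds only the PHP 5.1.5 date reported in this talk; the claimed creation date is a hypothetical value, and the function names are illustrative.

```python
from datetime import date, datetime

# Version -> release date, as it would be scraped from the release notes.
# The single entry is the PHP 5.1.5 date reported in this talk.
RELEASE_DATES = {"5.1.5": "17-Aug-2006"}


def release_date(version):
    """Parse the release date string for a PHP version into a date object."""
    return datetime.strptime(RELEASE_DATES[version], "%d-%b-%Y").date()


def written_after(version, claimed_creation):
    """True if the code needs a PHP version released after the claimed date."""
    return release_date(version) > claimed_creation


claimed = date(2006, 1, 1)  # hypothetical claimed creation date
print(release_date("5.1.5"), written_after("5.1.5", claimed))
```

The release date is only a lower bound: the code could have been written any time after it, but not before.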

Step 4 – Profit
• Win!
• Code in question used features first available in
PHP 5.1.5
• Release date 17-Aug-2006
• This was after the claimed creation date

Conclusion
• Sometimes you can’t depend solely on existing
tools
• Learn to program even if you’re not a
“programmer”
• PHP sucks
• Fuzzy Hashing and Python are cool
