Invalidating copyright infringement claims

How to invalidate copyright infringement claims by going into the code and locating the date.

Published in: Technology

Transcript

  • 1. Invalidating Copyright Infringement Claims with Python and Fuzzy Hashing
      • Joe T. Sylve, M.S., Managing Partner, 504ENSICS Labs
  • 2. Background
      • Client was being sued for copyright infringement
      • Client’s lawyer wanted two questions answered
          • Does the code contain any open source or GPL code?
          • When was the code in question written?
      • Code was written in PHP (web-based application)
      • Code had absolutely no comments
          • No copyright headers
          • No dates of any kind
      • www.504ensics.com
  • 3. Goal
      • If it can be proven that the code contains open source or GPL code under a restrictive license, then the claim is invalid
      • If it can be proven that the copyrighted code on file was written after the author’s claimed “creation date”, the copyright is invalid
  • 4. Is code original?
      • No comments or headers that would imply authorship
      • Code didn’t look familiar
      • Code was kind of crappy
  • 5. Step 1 – Acquire Samples
      • Wrote Python script to download all projects written in PHP from GitHub
          • Scraped from the search feature
          • Limited to 50 pages of search results
      • Got something like 10 GB of compressed code
          • ~100,000 files
  • 6. Step 2 – Compare Code
      • Three Options
          • Manual Verification
              • Grad students, interns, etc.
          • Cryptographic Hashing
              • MD5, SHA-1, etc.
          • “Fuzzy” Hashing
              • ssdeep, sdhash
  • 7. Fuzzy Hashing
      • Vassil says I have to call it “Approximate Matching”
      • sdhash
          • Vassil Roussev & Candice Quates
          • Free, Open Source
          • Awesome
      • Traditional hashing
          • If a single bit of the input changes, the whole hash changes
      • Fuzzy Hashing
          • Compares files and gives a similarity index
          • Can find “similar” files
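The contrast on this slide can be sketched in a few lines of Python. This is only an illustration: the two PHP snippets are made up, and `difflib.SequenceMatcher` from the standard library is used as a stand-in similarity score. It is not the ssdeep or sdhash algorithm, and the talk does not show the actual comparison code.

```python
import hashlib
from difflib import SequenceMatcher

# Two made-up PHP snippets that differ only in a variable name.
original = "<?php function plus_one($x) { return $x + 1; } ?>"
modified = "<?php function plus_one($y) { return $y + 1; } ?>"


def sha1_hex(text):
    """Traditional hash: any change to the input yields an unrelated digest."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def similarity_percent(a, b):
    """Stand-in similarity index from 0 to 100 (not ssdeep or sdhash)."""
    return round(SequenceMatcher(None, a, b).ratio() * 100)


# The digests only answer "identical or not".
exact_match = sha1_hex(original) == sha1_hex(modified)

# The similarity index answers "how alike are these files".
score = similarity_percent(original, modified)

print("exact match:", exact_match)
print("similarity:", score)
```

A cryptographic hash can only confirm byte-for-byte copies, so a renamed variable hides the match. A similarity score still flags the pair for a closer look.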
  • 8. When was code written?
      • We can invalidate copyright if the sample on file was written after the claimed authorship date
      • No comments or dates of any kind in the code!
      • No access to developer’s workstation to do traditional forensics
      • ???
  • 9. PHP
      • Web-based language
      • Updated reasonably frequently
      • New features added often
      • Goal
          • Determine which features were used in the code
          • Correlate features with PHP release date
          • Code couldn’t have been written before this date
  • 10. Step 1 – Function Use
      • A programmer can define their own functions or use ones built into the language
          • Example: function plus_one($x) { return $x + 1; }
      • Python script to find all function declarations and calls
          • Ignore declared functions
          • Left with a list of language “features” used
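A minimal sketch of this step, assuming a regex-based scan is good enough for the code at hand. The talk does not show the original script, so the patterns, the `CONSTRUCTS` list, and the `builtin_candidates` name are all hypothetical. A real PHP tokenizer would be more robust, since this version does not skip string literals and would also pick up class names after `new`.

```python
import re

# "function name(" introduces a user-defined function (optional & for by-reference).
DECLARATION = re.compile(r"\bfunction\s+&?\s*([A-Za-z_]\w*)\s*\(")

# "name(" not preceded by $, -> or :: is treated as a plain function call.
CALL = re.compile(r"(?<![\w$>:])([A-Za-z_]\w*)\s*\(")

# Keywords and language constructs that look like calls but are not functions.
CONSTRUCTS = {
    "if", "elseif", "while", "for", "foreach", "switch", "catch", "function",
    "array", "list", "isset", "empty", "unset", "echo", "print", "return",
    "exit", "die", "include", "include_once", "require", "require_once",
}


def builtin_candidates(php_source):
    """Return lowercased names that are called but never declared in the source."""
    declared = {name.lower() for name in DECLARATION.findall(php_source)}
    called = {name.lower() for name in CALL.findall(php_source)}
    return called - declared - CONSTRUCTS


SAMPLE = r"""<?php
function plus_one($x) { return $x + 1; }
if (isset($_GET['n'])) {
    echo htmlspecialchars(str_replace('a', 'b', plus_one($_GET['n'])));
}
$zone = date_default_timezone_get();
"""

print(sorted(builtin_candidates(SAMPLE)))
```

Names are lowercased because PHP function names are case-insensitive. When scanning a whole project, the declared set should be collected across all files before subtracting, because a function declared in one file is often called from another.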
  • 11. Step 2 – Version Detection
      • PHP comes with auto-generated documentation about each built-in function
          • Documentation says in which version each function first became available
      • Write Python script to scrape the PHP documentation
      • Correlate functions with PHP versions
          • We only care about the function with the newest version
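The correlation step might look like the sketch below. The `FIRST_AVAILABLE` table is a handful of hand-entered sample values standing in for the scraped documentation, and should be verified against the PHP manual before being relied on. The function names `parse_version` and `newest_requirement` are hypothetical.

```python
def parse_version(text):
    """Turn '5.1.0' into (5, 1, 0) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.split("."))


# Hand-entered sample values (assumption: in the talk these were scraped
# from the PHP documentation rather than typed in).
FIRST_AVAILABLE = {
    "file_put_contents": "5.0.0",
    "date_default_timezone_get": "5.1.0",
    "json_encode": "5.2.0",
    "password_hash": "5.5.0",
}


def newest_requirement(functions_used, first_available):
    """Return (function, version) for the most recently introduced function used."""
    known = {
        name: first_available[name]
        for name in functions_used
        if name in first_available
    }
    if not known:
        return None
    newest = max(known, key=lambda name: parse_version(known[name]))
    return newest, known[newest]


used = {"file_put_contents", "date_default_timezone_get"}
print(newest_requirement(used, FIRST_AVAILABLE))
```

Comparing tuples rather than strings matters once minor versions reach two digits: as strings, "5.10.0" sorts before "5.9.0".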
  • 12. Step 3 – Date the code
      • PHP has an archive of release notes on its website
          • Contains release versions and dates
      • Python script scrapes the release notes for the PHP version of interest and gives us the release date
      • Reasonably, the code couldn’t have been written before that date
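A sketch of the date lookup, assuming the release notes have already been fetched as text and that each entry reads "Version X.Y.Z" followed by a date such as 17-Aug-2006. That layout is an assumption, as is the claimed creation date below, which is a hypothetical value (the talk does not state it). The only real data point is the PHP 5.1.5 release date quoted in the talk.

```python
import re
from datetime import date, datetime

# Assumed entry layout: "Version 5.1.5" followed by "17-Aug-2006".
ENTRY = re.compile(r"Version\s+(\d+(?:\.\d+)+)\s+(\d{1,2}-[A-Za-z]{3}-\d{4})")


def parse_release_notes(text):
    """Map each version string found in the notes to its release date."""
    # %b expects English month abbreviations under the default C locale.
    return {
        version: datetime.strptime(stamp, "%d-%b-%Y").date()
        for version, stamp in ENTRY.findall(text)
    }


def postdates_claim(release, claimed_creation):
    """True if the required PHP version was released after the claimed date."""
    return release > claimed_creation


SAMPLE_NOTES = "Version 5.1.5\n17-Aug-2006\n"
releases = parse_release_notes(SAMPLE_NOTES)

# Hypothetical claimed creation date, for illustration only.
claimed = date(2006, 1, 1)

print(releases["5.1.5"], postdates_claim(releases["5.1.5"], claimed))
```

If the newest feature used shipped after the claimed date, the code on file cannot be what was written on that date, which is the argument the next slide makes.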
  • 13. Step 4 – Profit
      • Win!
      • Code in question used features first available in PHP 5.1.5
          • Release date 17-Aug-2006
          • This was after the claimed creation date
  • 14. Conclusion
      • Sometimes you can’t depend solely on existing tools
      • Learn to program even if you’re not a “programmer”
      • PHP sucks
      • Fuzzy hashing and Python are cool
