spug_2008-08

Using File::Find and MP3::Tag to find duplicate mp3 files

  1. Simple Perl: using File::Find and MP3::Tag to search through a junk drawer of mp3 files, finding duplicates
  2. File::Find

     - Searches a directory tree
     - Invokes your callback (the &wanted subroutine) for each file and directory it finds
     - Your callback subroutine does something with each one
  3. Using File::Find

     - Create your callback subroutine
     - Call find() with a reference to your callback and a list of directories as arguments

        sub wanted {
            # do something neat ...
        }

        find( \&wanted, @directories );
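A minimal sketch of those two steps, assuming nothing beyond the core File::Find module. The helper name `list_files` is hypothetical. Note that find() wants a code reference (`\&wanted`, or an anonymous `sub { ... }` as used here); a bare `&wanted` would call the subroutine instead of passing it.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: return the paths of all plain files under @dirs.
sub list_files {
    my @dirs = @_;
    my @found;

    # An anonymous sub is a code reference, just like \&wanted.
    # It closes over @found, so no global variable is needed.
    my $wanted = sub {
        # find() has chdir'd into the directory, so -f $_ tests this entry
        push @found, $File::Find::name if -f $_;
    };

    find( $wanted, @dirs );
    return sort @found;
}

# Only walk the filesystem when directories are named on the command line.
if (@ARGV) {
    print "$_\n" for list_files(@ARGV);
}
```

For example, `perl list_files.pl ~/Music` would print every plain file below that directory, one path per line.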
  4. &wanted

        sub wanted {
            say "$_";
            say "$File::Find::dir";
            say "$File::Find::name";
        }
  5. 01_find

        #!/usr/local/bin/perl
        use v5.10;
        use strict;
        use warnings;
        use File::Find;

        #==============================
        # main program

        # take any command line arguments as the names of directories to search
        my @dirs_to_search = @ARGV;

        # if no search dirs were specified, just use '.'
        if ( ! @dirs_to_search ) {
            @dirs_to_search = ( '.' );
        }

        # pass a reference to the callback (\&process_file), not a call to it
        find( \&process_file, @dirs_to_search );
  6. 01_find (cont.)

        sub process_file {
            # $_ is set to the name of the current file
            # $File::Find::dir is the name of the containing directory
            # $File::Find::name is the full path ("$File::Find::dir/$_")
            say "\$_ <$_>";
            say "\$File::Find::dir <$File::Find::dir>";
            say "\$File::Find::name <$File::Find::name>";
            say '';    # blank line
        }
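To see how the three variables relate, this sketch (the helper `record_entries` is hypothetical) records them for each entry, once in File::Find's default mode and once with the `no_chdir` option that the secure version later uses. In the default mode find() chdirs into each directory, so `$_` is just the basename and `$File::Find::name` is `"$File::Find::dir/$_"`; with `no_chdir => 1`, `$_` is the same as `$File::Find::name`.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: collect [ $_, $File::Find::dir, $File::Find::name ]
# for every entry under @dirs, with or without File::Find's no_chdir option.
sub record_entries {
    my ( $no_chdir, @dirs ) = @_;
    my @rows;
    find(
        {
            wanted   => sub { push @rows, [ $_, $File::Find::dir, $File::Find::name ] },
            no_chdir => $no_chdir,
        },
        @dirs,
    );
    return @rows;
}

if (@ARGV) {
    for my $no_chdir ( 0, 1 ) {
        print "no_chdir => $no_chdir\n";
        for my $row ( record_entries( $no_chdir, @ARGV ) ) {
            printf "  \$_ <%s>  dir <%s>  name <%s>\n", @$row;
        }
    }
}
```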
  7. 02_find_type

        sub process_file {
            my $type;
            if ( -f $_ ) {
                $type = 'normal file';
            }
            elsif ( -d $_ ) {
                $type = 'directory';
            }
            else {
                $type = 'other';
            }
            say "file: <$_>";
            say "type: <$type>";
            say '';
        }
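A sketch that tallies entries using the same three categories (`count_types` is a hypothetical name). The file tests follow symlinks, so a link to a file counts as a normal file and a dangling link lands in 'other'. The starting directory itself is also passed to the callback, so it is counted as a directory.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: count entries using 02_find_type's categories.
sub count_types {
    my @dirs  = @_;
    my %count = ( 'normal file' => 0, 'directory' => 0, 'other' => 0 );

    find(
        sub {
            my $type;
            if    ( -f $_ ) { $type = 'normal file' }
            elsif ( -d $_ ) { $type = 'directory' }
            else            { $type = 'other' }    # e.g. sockets, dangling symlinks
            $count{$type}++;
        },
        @dirs,
    );
    return \%count;
}

if (@ARGV) {
    my $count = count_types(@ARGV);
    print "$_: $count->{$_}\n" for sort keys %$count;
}
```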
  8. 03_find_mp3

        sub process_file {
            # skip anything that isn't a normal file
            if ( not -f $_ ) {
                return;
            }

            # skip any normal file that
            # doesn't have an .mp3 suffix
            if ( not /\.mp3$/ ) {
                return;
            }

            say "file <$_>";
        }
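Why the dot in the suffix pattern needs its backslash: an unescaped `.` matches any character. The sketch below compares the escaped pattern with the unescaped one. Its `\z` anchor (instead of `$`, which also matches before a trailing newline) and `/i` flag are additions of this sketch, not part of the slide; `is_mp3_name` and `matches_unescaped` are hypothetical names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Escaped dot, anchored at the very end of the string (\z), any case (/i).
sub is_mp3_name {
    my ($name) = @_;
    return $name =~ /\.mp3\z/i ? 1 : 0;
}

# The pattern without its backslash: '.' matches any character here.
sub matches_unescaped {
    my ($name) = @_;
    return $name =~ /.mp3$/ ? 1 : 0;
}

for my $name ( 'song.mp3', 'SONG.MP3', 'xmp3', 'song.mp3.txt' ) {
    printf "%-14s escaped=%d unescaped=%d\n",
        $name, is_mp3_name($name), matches_unescaped($name);
}
```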
  9. 04_find_mp3

        sub process_file {
            # skip anything that isn't a normal file
            if ( not -f $_ ) {
                return;
            }

            my $mime = qx{ /usr/bin/file -bi "$_" };
            chomp $mime;

            # "text/plain; charset=us-ascii"
            # ... get rid of charset or other extra info
            $mime =~ s/;.*//;

            # skip any non mp3 files
            if ( $mime ne 'audio/mpeg' ) {
                warn "skipping [wrong mimetype] file <$_> mime: <$mime>";
                return;
            }

            say " ** got an mp3 file: <$_>";
        }
  10. DANGERS

     A filename may contain quotes and semicolons. Because qx{} hands its
     string to the shell, such a name turns into extra shell commands:

        > touch '"; echo "<$$> pwned orz" >> orz.log; echo"'
        > ls
        "; echo "<$$> pwned orz" >> orz.log; echo"

        # within process_file()
        # $_ = q{"; echo "<$$> pwned orz" >> orz.log; echo"};
        # ...
        my $mime = qx{ /usr/bin/file -bi "$_" };

        # ... so the shell actually runs:
        /usr/bin/file -bi ""; echo "<$$> pwned orz" >> orz.log; echo""
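The slides that follow defend against this with taint mode and a filename whitelist. A different technique, not used in the talk, is to skip the shell entirely: the list form of `open` with `'-|'` passes each argument straight to the program, so quotes and semicolons in a filename are never interpreted. A sketch, assuming `/usr/bin/file` exists and accepts `-b`, `-i` and `--` as on typical Linux systems; `mime_type` and `strip_mime_params` are hypothetical names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# "audio/mpeg; charset=binary" -> "audio/mpeg"
sub strip_mime_params {
    my ($mime) = @_;
    $mime =~ s/;.*//s;      # drop charset or other parameters
    $mime =~ s/\s+\z//;     # and any trailing newline/whitespace
    return $mime;
}

# Run `file -bi -- $path` WITHOUT a shell: the list form of open
# execs /usr/bin/file directly, with $path as one single argument.
sub mime_type {
    my ($path) = @_;
    open( my $fh, '-|', '/usr/bin/file', '-bi', '--', $path )
        or die "can't run /usr/bin/file: $!";
    my $out = do { local $/; <$fh> };
    close $fh;
    return strip_mime_params( defined $out ? $out : '' );
}

if (@ARGV) {
    printf "%s: %s\n", $_, mime_type($_) for @ARGV;
}
```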
  11. 05_find_mp3_secure

     - Turn on taint mode

        #!/usr/local/bin/perl -T

        BEGIN {
            # delete tainted environment variables that would make -T die
            # with "Insecure $ENV{...}" when qx{} runs a command (see perlsec)
            delete @ENV{ qw( PATH ENV IFS CDPATH BASH_ENV ) };
        }
  12. 05_find_mp3_secure (cont.)

        my $shellsafe = qr{^([-\@\w./ ]+)$};

        find(
            {
                wanted          => \&process_file,
                untaint         => 1,
                untaint_pattern => $shellsafe,
                untaint_skip    => 1,
                no_chdir        => 1,
            },
            @dirs_to_search,
        );
  13. 05_find_mp3_secure (cont.)

        sub process_file {
            my $file;
            if ( m/$shellsafe/ ) {
                # untaint the safe filename
                $file = $1;
            }
            else {
                warn "skipping [suspicious name] file: <$_>";
                return;
            }

            # now use $file instead of $_
            # ...
        }
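Outside taint mode, the same whitelist still works as a plain validity check. This sketch (the helper `safe_name` is hypothetical) reuses the `$shellsafe` pattern: the capture in `$1` is what taint mode treats as untainted, and any name containing quotes, `;`, `$`, `>` or similar characters is rejected. One side effect worth knowing: with byte-string filenames, `\w` typically matches only ASCII word characters, so names with accented letters are skipped as suspicious too.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same whitelist as 05_find_mp3_secure: '-', '@', word characters, '.', '/', space.
my $shellsafe = qr{^([-\@\w./ ]+)$};

# Hypothetical helper: return the captured name (untainted under -T),
# or undef if the name contains any character outside the whitelist.
sub safe_name {
    my ($name) = @_;
    return $name =~ $shellsafe ? $1 : undef;
}

for my $name ( './music/Some Artist - Some Song.mp3',
               q{"; echo "<$$> pwned orz" >> orz.log; echo"} ) {
    my $ok = safe_name($name);
    print defined $ok ? "ok:         <$ok>\n" : "suspicious: <$name>\n";
}
```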
  14. MP3::Tag

        use MP3::Tag;

        my $mp3 = MP3::Tag->new( $filename );

        my (
            $title, $track, $artist, $album,
            $comment, $year, $genre,
        ) = $mp3->autoinfo();

        # or

        my $info = {};    # hashref

        # hash slice
        @{ $info }{ qw( title track artist album comment year genre ) }
            = $mp3->autoinfo();
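The hash-slice form can be tried without MP3::Tag or an mp3 file. In this sketch, `fake_autoinfo` is a hypothetical stand-in that returns placeholder values in the same order as `autoinfo()` in list context (title, track, artist, album, comment, year, genre); `tag_hash` is also a hypothetical name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The order in which autoinfo() returns its fields in list context.
my @FIELDS = qw( title track artist album comment year genre );

# Hypothetical stand-in for MP3::Tag->new($file)->autoinfo(), so the sketch
# runs without MP3::Tag installed. The values are placeholders.
sub fake_autoinfo {
    return ( 'Some Song', 1, 'Some Artist', 'Some Album', '', 2008, 'Rock' );
}

# Assign a list to several hash keys at once with a hash slice.
sub tag_hash {
    my @values = @_;
    my $info   = {};
    @{$info}{@FIELDS} = @values;
    return $info;
}

my $info = tag_hash( fake_autoinfo() );
print "$_: $info->{$_}\n" for @FIELDS;

# With MP3::Tag installed, the real call would look like:
#   my $mp3  = MP3::Tag->new($filename);
#   my $info = tag_hash( $mp3->autoinfo() );
```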
  15. 06_mp3_info

        # process_file() writes directly into this
        my $mp3_database = {};

        find( ... );

        # use Data::Dumper;
        # print Dumper( $mp3_database );

        # use JSON;
        # print to_json( $mp3_database );

        use YAML;
        print Dump( $mp3_database );
  16. 06_mp3_info (cont.)

        sub process_file {
            # ...
            my $mp3 = MP3::Tag->new( $file );
            @{ $mp3_database->{ $file } }{
                qw( title track artist album comment year genre )
            } = $mp3->autoinfo();
        }
  17. 07_find_mp3_dupes

        sub process_file {
            # ...
            my $info = {};
            $info->{ file } = $file;

            my $mp3 = MP3::Tag->new( $file );
            @{ $info->{ mp3 } }{
                qw( title track artist album comment year genre )
            } = $mp3->autoinfo();

            # continued ...
  18. 07_find_mp3_dupes (cont.)

            # build one lookup key from artist and title, folding case,
            # accents and whitespace (the tr/// literals below need the
            # script saved as UTF-8 with 'use utf8;' at the top)
            my $song = join '|',
                map {
                    my $s = lc $_;    # a copy, so the tag data itself is untouched
                    $s =~ tr/àáâäãå/aaaaaa/;
                    $s =~ tr/èéêë/eeee/;
                    $s =~ tr/ìíîïĩ/iiiii/;
                    $s =~ tr/òóôöõ/ooooo/;
                    $s =~ tr/ùúûüũ/uuuuu/;
                    $s =~ tr/ñýÿ/nyy/;
                    $s =~ s/\s+//g;
                    $s;
                }
                @{ $info->{ mp3 } }{ qw( artist title ) };

            push @{ $mp3_database->{ $song } }, $info;
        }
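A different technique from the hand-written tr/// tables (a swap, not what the talk does): the core module Unicode::Normalize can decompose accented letters into a base letter plus combining marks, which a regex then deletes. `song_key` is a hypothetical name, and the sketch assumes the tag strings are already decoded characters. Letters without a decomposition, such as ø or ß, pass through unchanged.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Hypothetical alternative to the tr/// tables: decompose each string (NFD),
# delete the combining marks (\p{Mn}) that carry the accents, lowercase it,
# and drop whitespace. Expects decoded character strings.
sub song_key {
    my ( $artist, $title ) = @_;
    return join '|', map {
        my $s = NFD( defined $_ ? $_ : '' );
        $s =~ s/\p{Mn}//g;    # e.g. U+0301 (acute), U+0308 (diaeresis)
        $s = lc $s;
        $s =~ s/\s+//g;
        $s;
    } $artist, $title;
}

# "Björk" / "Jóga", written with \x{} escapes so the file needs no 'use utf8'
print song_key( "Bj\x{f6}rk", "J\x{f3}ga" ), "\n";    # prints "bjork|joga"
```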
  19. 07_find_mp3_dupes (cont.)

        find( ... );

        # print Dump( $mp3_database );

        my @dupes = grep { @$_ > 1 } values %$mp3_database;
        for my $dupe ( @dupes ) {
            say " *** Duplicate Songs ***";
            print Dump( $dupe );
        }
        say " ";
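The grouping step can be isolated as a pure function, which makes it easy to check without scanning a disk. This sketch (`find_dupes` is a hypothetical name) keys records by artist and title, but only lowercases and strips whitespace (no accent folding); the records are placeholders, not real scan results. Sorting the keys is an addition that keeps the report order stable between runs.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Group records by an artist|title key and return the groups with 2+ files.
sub find_dupes {
    my @records = @_;    # each: { file => ..., mp3 => { artist => ..., title => ... } }
    my %db;
    for my $info (@records) {
        my $key = join '|', map {
            my $s = lc( defined $_ ? $_ : '' );
            $s =~ s/\s+//g;
            $s;
        } @{ $info->{mp3} }{qw( artist title )};
        push @{ $db{$key} }, $info;
    }
    # hash slice of the groups whose array holds more than one record
    return @db{ grep { @{ $db{$_} } > 1 } sort keys %db };
}

# Placeholder records, not real scan results.
my @records = (
    { file => 'a/song.mp3',  mp3 => { artist => 'Some Artist',  title => 'Some Song' } },
    { file => 'b/copy.mp3',  mp3 => { artist => 'some artist',  title => 'SomeSong' } },
    { file => 'c/other.mp3', mp3 => { artist => 'Other Artist', title => 'Other Song' } },
);

for my $group ( find_dupes(@records) ) {
    print "*** Duplicate Songs ***\n";
    print "  $_->{file}\n" for @$group;
}
```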
