Sometimes I get crappy zipped/rared/whatever archives containing filenames that are not UTF-8 encoded, mostly from old archiving programs on Windows. Those archives unpack just fine, but more often than not you end up with filenames that contain non-printable characters. PITA if there are a lot of them. tr to the rescue!
for file in *; do N=$(printf '%s' "$file" | tr -cd '\11\12\40-\176'); [ "$file" = "$N" ] || mv -- "$file" "$N"; done
What this does is basically:
- get every filename in the current directory and toss it to tr
- -c takes the complement of the character set we give and -d deletes, so used together they make tr delete everything except the characters we actually specify
- the quoted argument tells tr to only retain the characters with octal codes 11, 12 and 40 to 176. Octal 11 is Tab, 12 is linefeed (technically, this should be omitted from a filename, but I also use this to filter textfiles, so it comes in handy and is of no real harm here). 40 to 176 are the standard keyboard characters from space to ~, which we actually like in our filenames. Everything else, including accented letters, gets dropped.
- finally, we move the garbled filename to the new, cleaned-up version.
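If you'd rather look before you leap, the steps above can be wrapped into a slightly more careful version. This is only a sketch: clean_name and clean_dir are names I made up for illustration, and the "preview" mode and the skip rules (empty result, target already exists) are my own assumptions about what is safe, not part of the one-liner.

```shell
# Hypothetical helper: keep only tab, linefeed and printable ASCII (same set as above).
# LC_ALL=C makes tr work on single bytes regardless of the current locale.
clean_name() {
    printf '%s' "$1" | LC_ALL=C tr -cd '\11\12\40-\176'
}

# Hypothetical helper: clean the names of all entries in a directory.
# Usage: clean_dir [DIR] [preview]   (DIR defaults to the current directory)
clean_dir() {
    dir=${1:-.}
    mode=${2:-rename}
    for path in "$dir"/*; do
        # Skip the literal pattern when the directory has no visible entries.
        [ -e "$path" ] || [ -L "$path" ] || continue
        old=${path##*/}
        new=$(clean_name "$old")
        # Nothing to do if the name is already clean.
        [ "$old" = "$new" ] && continue
        if [ -z "$new" ]; then
            printf 'skip (nothing left after cleaning): %s\n' "$old" >&2
            continue
        fi
        # Assumption: never overwrite an existing file with the cleaned name.
        if [ -e "$dir/$new" ]; then
            printf 'skip (target exists): %s\n' "$new" >&2
            continue
        fi
        if [ "$mode" = preview ]; then
            printf '%s -> %s\n' "$old" "$new"
        else
            mv -- "$path" "$dir/$new"
        fi
    done
}
```

Run `clean_dir . preview` first to see what would be renamed, then `clean_dir .` to actually do it. Like the one-liner, it leaves dotfiles alone because the * glob does not match them.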