Remove all non-printable ASCII Characters from filename

Sometimes I get some crappy zipped/rared/whatever packages that contain filenames that are not UTF-8 encoded, mostly from old archive programs used on the Windows platform. Those packages unpack just fine, but more often than not you end up with filenames that contain non-printable characters. It’s a PITA if there are a lot of them. tr to the rescue!

for file in *; do N=$(printf '%s' "$file" | tr -cd '\11\12\40-\176'); [ "$file" = "$N" ] || mv -- "$file" "$N"; done

What this does is basically:

  • get every filename in the current directory and toss it to tr
  • the -c and -d options together tell tr to delete (-d) everything in the complement (-c) of the set we specify, so only the characters we list make it to the output
  • the quoted argument is that set, written as octal codes: 11, 12, and 40 to 176. Octal 11 is tab, 12 is linefeed (technically, this should be omitted for filenames, but I also use this to filter text files, so it comes in handy and does no real harm here). 40 to 176 are the standard keyboard characters from space to ~, which we actually like in our filenames.
  • finally, we move the garbled filename to the new, cleaned-up version.
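
If you want something a bit more defensive than a one-liner, the same steps can be sketched as a pair of small shell functions. This is only a sketch under a few assumptions: the names clean_name and rename_clean are made up for illustration, LC_ALL=C is set so that tr works on raw bytes, and files whose cleaned name is empty or already taken are skipped rather than overwritten.

```shell
#!/bin/sh

# clean_name (hypothetical helper): print the argument with every byte
# removed except tab, linefeed, and the range space..~ (octal 40-176).
clean_name() {
    # Assumption: LC_ALL=C makes tr treat the input as plain bytes, which
    # avoids complaints about invalid multibyte sequences on some systems.
    printf '%s' "$1" | LC_ALL=C tr -cd '\11\12\40-\176'
}

# rename_clean (hypothetical helper): rename each entry in the current
# directory whose cleaned name differs from its current name.
rename_clean() {
    for file in *; do
        # An unmatched glob leaves a literal '*'; skip anything that is not there.
        [ -e "$file" ] || [ -L "$file" ] || continue
        new=$(clean_name "$file")
        # Nothing was stripped, so there is nothing to rename.
        [ "$file" = "$new" ] && continue
        # Do not rename to an empty name or on top of an existing file.
        if [ -z "$new" ] || [ -e "$new" ]; then
            printf 'skipping, cleaned name is empty or taken: %s\n' "$new" >&2
            continue
        fi
        mv -- "$file" "$new"
    done
}
```

After sourcing the file, cd into the unpacked directory and run rename_clean. To preview the result for a single name without touching anything, call clean_name "$somefile" on its own.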
Categories: unix
  1. February 6, 2012 at 1:35 pm

    Here’s a script which converts all non-ASCII characters in filenames to their transliterated equivalents:
    https://github.com/bk322/bk-goodies/blob/master/bk-asciify-filenames.bash
