Friday 7 September 2012

Copying the dead

In a previous post we looked at running a file signature check on all the files in the live set, then using awk to search the file description field in the resulting database.
Once we have done our awk search we send the results to a text file, so now we need to know how to use that text file to process our results further.

Let's imagine we have searched for the string "image data" in the file description field to identify all our graphic files, and created a text file of their file names and file paths called live_pics.txt in our /tmp folder with this command:

awk -F: '$2 ~ /image data/ {print $1}' listoffiles > /tmp/live_pics.txt
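
To see what that filter does, here is a small sketch run against a made-up listoffiles in a scratch directory. It assumes each line has the form "path: description", as the file command prints it; the sample paths and descriptions are invented for illustration.

```shell
# Build a tiny, made-up listoffiles in a scratch directory (hypothetical paths)
work=$(mktemp -d)
cat > "$work/listoffiles" <<'EOF'
/home/user/holiday/beach.jpg: JPEG image data, JFIF standard 1.01
/home/user/docs/report.pdf: PDF document, version 1.4
/home/user/My Pictures/logo.png: PNG image data, 64 x 64, 8-bit/color RGBA
EOF

# With ":" as the separator, field 1 is the path and field 2 starts the description
awk -F: '$2 ~ /image data/ {print $1}' "$work/listoffiles" > "$work/live_pics.txt"

# Show the paths that were kept
cat "$work/live_pics.txt"
```

Only the two image paths should end up in the list, including the one with a space in it. One caveat with this approach: a path that itself contains a ":" would be cut short at that point, so it is worth checking for such paths before relying on the list.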

One thing you definitely DON'T want to do is use the text file in a loop and the cp (copy command) to copy data out, like this:

cat /tmp/live_pics.txt | while read IMG ; do cp $IMG  /mnt/case/images ; done

Our command reads the live_pics.txt file line by line and copies each file out to a single directory on an external drive mounted at /mnt/case.  There are three reasons not to do this.  First, if two files in our live_pics list have the same name (but sit in different directories), cp will copy out the first file and then silently overwrite it with the second, because a file with that name already exists in the receiving directory.  Second, if an entry in our list of pictures happens to start with a "-" character, cp will treat it as a string of options rather than a file name, resulting in an error message.  Third, if there is any white space in the file path or file name, the shell will split the unquoted $IMG variable at that white space and hand cp several names that don't exist, so the image is not copied out.

Here is my solution to the problem.  I use a function that checks whether a file with the same name already exists in the receiving directory; if so, it appends [1], [2] etc. to the file name, ahead of the extension.  I had to overcome my fear and loathing of perl to introduce a perl regular expression that adds or increments that counter.  I set the Internal Field Separator shell variable ($IFS) to a newline, so that only a newline character marks the end of a field and white space in file paths is left alone.  I also include a "--" after the -p option to let cp know that we have finished with our options.  Here is the function and a few lines of code to show how you would use it:


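filecp () {
    filepath="$1"
    filename=$(basename -- "$1")
    # While the name is taken in $dir, add or increment a [n] counter before the extension
    while [ -e "$dir/$filename" ]; do
        filename=$(printf '%s\n' "$filename" | perl -pe 's/(\[(\d+)\])?(\..*)?$/"[".(1+$2)."]$3"/e;')
    done
    # -p preserves mode, ownership and timestamps; -- ends option parsing
    cp -p -- "$filepath" "$dir/$filename"
}

# A plain '\n' is a literal backslash and n; bash needs $'\n' for a real newline
IFS=$'\n'
dir=/mnt/case/images
# -r stops read from eating backslashes; the quotes keep each path as one argument
cat /tmp/live_pics.txt | while read -r IMG ; do filecp "$IMG" ; done
unset IFS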
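
For anyone who shares the aversion to perl, the renaming step can also be sketched with shell parameter expansion alone. This is not the function above but a hypothetical variant: the name unique_cp, its two arguments and the scratch files are all invented for illustration. It splits the name at the first dot, as the perl expression does, except that a leading dot on a hidden file stays with the name. The loop sets IFS only for the read command rather than for the whole script.

```shell
# Hypothetical perl-free variant; unique_cp and its arguments are illustrative names
unique_cp () {
    local src destdir name rest first stem ext n candidate
    src=$1
    destdir=$2
    name=${src##*/}               # strip the leading directories
    rest=${name#?}                # everything after the first character
    first=${name%"$rest"}         # the first character on its own
    stem=$name
    ext=""
    # Split at the first dot that is not the leading character: a.tar.gz -> a + .tar.gz
    case $rest in
        *.*) stem=$first${rest%%.*}; ext=.${rest#*.} ;;
    esac
    candidate=$name
    n=0
    # Try name, name[1], name[2], ... until one is free in the destination
    while [ -e "$destdir/$candidate" ]; do
        n=$((n + 1))
        candidate="${stem}[${n}]${ext}"
    done
    cp -p -- "$src" "$destdir/$candidate"
}

# Scratch data: three files with the same name, one in a directory with a space
work=$(mktemp -d)
mkdir -p "$work/a" "$work/b dir" "$work/c" "$work/out"
echo one   > "$work/a/pic.jpg"
echo two   > "$work/b dir/pic.jpg"
echo three > "$work/c/pic.jpg"
printf '%s\n' "$work/a/pic.jpg" "$work/b dir/pic.jpg" "$work/c/pic.jpg" > "$work/list.txt"

# IFS= and -r make read keep each line exactly as written
while IFS= read -r img; do
    unique_cp "$img" "$work/out"
done < "$work/list.txt"

ls "$work/out"
```

In this run the first file keeps its name and the two later ones should land as pic[1].jpg and pic[2].jpg, so nothing is overwritten.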

Obviously the same principle applies to any list of files that you want to copy out from the file system, so the code can be integrated into any of your scripts in your previewing system.  If you haven't been using some of the defensive programming techniques in this code when using the cp command, you really need this code!
