Forensicator Of The Dead: Identifying the dead

When previewing a system, once you get down to the file system level, the first thing you might want to do is create a database of live files and what those files actually are - what is known as a file signature analysis. Linux generally relies on file signatures to identify file types, rather than file extension (which is the Windows way).
Therefore there is already an expansive file signature database installed on most Linux systems, named "magic" (often found under /usr/share/misc). In addition, you can create your own magic file with your customised signatures. The magic file is used extensively by the "file" command, which compares the signature in a regular file to the signatures in the magic database(s) and returns a description of the file based on the matching entry in the magic file. Most forensicators understand the significance of doing a file signature check as opposed to relying on file extension.
A user can change a file extension, the Firefox web browser caches files to its web cache with no file extension, and the Opera browser caches files with a .tmp file extension. So if you are viewing graphic files via your forensic suite, or even if you have exported them to view in a file browser, have you checked whether your tool of choice is working on the file extension or the file signature? It may be that you are missing thousands of images in, say, the Firefox cache based on faulty assumptions...make sure you check, chaps! Sorting files by file extension is about as effective as aiming for the centre of mass on a zombie in the hope of stopping them. Remember: head shots to destroy zombies, file signature analysis to identify file types. Simples!
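
To make the extension-versus-signature point concrete, here is a minimal sketch. Everything in it is made up for the demo: the temporary directory, the file names and the "EVID1" signature are assumptions, not anything from a real case. It writes a GIF header into a file with a .tmp extension, shows the leading bytes in hex, then asks the file command (if it is installed) what it makes of it. The last step tries a one-line custom magic file via the -m option.

```shell
#!/bin/sh
# Demo only: the file names and the EVID1 signature below are invented.
demo_dir=$(mktemp -d)

# A GIF header saved under a misleading .tmp extension, as a browser cache might do.
printf 'GIF89a\001\000\001\000\000\000\000' > "$demo_dir/cache_entry.tmp"

# The signature is simply the leading bytes; show the first six in hex.
signature=$(head -c 6 "$demo_dir/cache_entry.tmp" | od -An -tx1 | tr -d ' \n')
echo "leading bytes: $signature"

if command -v file >/dev/null 2>&1; then
    # file(1) consults the magic database and ignores the extension.
    file "$demo_dir/cache_entry.tmp"

    # A one-line custom magic file: offset 0, string match, then the description.
    printf '0\tstring\tEVID1\tHypothetical evidence container\n' > "$demo_dir/mymagic"
    printf 'EVID1 payload' > "$demo_dir/sample.bin"
    file -m "$demo_dir/mymagic" "$demo_dir/sample.bin"
fi
```

The hex string printed first is the ASCII for "GIF89a", which is the sort of thing the magic database keys on. A tool that sorts on extension alone would never show this file as a picture.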

Anyway, we can start to create our previewing script. Remember that my file system mounting SCRIPT mounts all the file systems under the /media node, so /dev/sda1 gets mounted at /media/sda1, /dev/sda2 at /media/sda2 and so on. The script also initiates a loop to process each mounted file system in turn and exports some variables that can be used by our processing script. Our external drive is mounted at /mnt/cases. Here is the relevant part of the script:

for i in `cat /etc/mtab | grep media | egrep -iv 'cd|dvd' | awk '{print $2}'`; do

export volpath=`echo $i` #eg /media/sda1

export suspart=`echo $volpath | sed 's/media/dev/g'` #eg /dev/sda1

export fsuspart=`echo $suspart | sed 's/\///g'` #eg devsda1

export susdev=`echo $suspart | sed 's/[0-9]*//g'` #eg /dev/sda

export tsusdev=$susdev #eg /dev/sda

dirname=`echo $suspart | sed 's/\//_/g'` #eg _dev_sda1

ddirname=`echo $susdev | sed 's/\//_/g'` #eg _dev_sda

sudo mkdir -m 777 $evipath/$csname/$evnum/$ddirname #eg /mnt/cases/BADGUY_55-08/ABC1/_dev_sda

sudo mkdir -m 777 $evipath/$csname/$evnum/$ddirname/$dirname

cd $evipath/$csname/$evnum/$ddirname/$dirname

export reppath=`pwd` #eg /mnt/cases/BADGUY_55-08/ABC1/_dev_sda/_dev_sda1

sudo mkdir -m 777 findings

sudo mkdir -m 777 Report

sudo mkdir -m 777 tmp
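
If you want to see what the filter chain and the sed substitutions produce without touching a real system, the following is a rough sketch that feeds them some invented mtab-style text. The sample entries are assumptions for the demo; the pipeline and the substitutions are the same ones used in the excerpt.

```shell
#!/bin/sh
# Sample text standing in for /etc/mtab; the entries are invented for the demo.
sample_mtab='/dev/sda1 /media/sda1 ntfs ro 0 0
/dev/sda2 /media/sda2 ext4 ro 0 0
/dev/sr0 /media/cdrom0 iso9660 ro 0 0
/dev/sdb1 /mnt/cases ext4 rw 0 0'

# Same filter chain as the loop header: keep /media mounts, drop optical drives,
# and take the second field (the mount point).
mounted=$(printf '%s\n' "$sample_mtab" | grep media | grep -Eiv 'cd|dvd' | awk '{print $2}')

for volpath in $mounted; do
    # The values in the comments are for the first pass (/media/sda1).
    suspart=$(echo "$volpath" | sed 's/media/dev/g')    # /dev/sda1
    fsuspart=$(echo "$suspart" | sed 's/\///g')         # devsda1
    susdev=$(echo "$suspart" | sed 's/[0-9]*//g')       # /dev/sda
    dirname=$(echo "$suspart" | sed 's/\//_/g')         # _dev_sda1
    ddirname=$(echo "$susdev" | sed 's/\//_/g')         # _dev_sda
    echo "$volpath $suspart $fsuspart $susdev $dirname $ddirname"
done
```

One thing this makes obvious: the digit-stripping substitution removes every digit in the name, so it suits sdX style devices but would mangle a name such as /dev/nvme0n1p1.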

First thing to note is that Forensicator has been a bad zombie by putting his variables in lower case; it is much better programming practice to put exported variables in upper case, which makes the code easier to read.
The first line initiates our loop. It isolates all the file systems mounted under /media by looking in the /etc/mtab file, then excludes our cd/dvd drive in case we have booted the system from CD (as opposed to thumb drive). From line 9 onwards it references some variables called $evipath, $csname and $evnum. If you look earlier in the diskmount script you will see that they were created during an interactive session, when the user was prompted for input, like this:

export evipath=/mnt/cases

echo -n "What is the case name (NO SPACES OR FORWARD SLASHES)? > "

read csname #eg BADGUY_55-08

export csname

echo -n "What is the evidence number of suspect system (NO SPACES OR FORWARD SLASHES)? > "

read evnum #eg ABC1

export evnum

echo -n "What is your rank and name? > "

read examiner #eg DC_Sherlock_Holmes

export examiner
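
The prompts ask for no spaces or forward slashes, but nothing in the excerpt enforces it, and a stray slash would end up in the directory names. Below is a minimal sketch of a check that could sit in front of each export. The function names valid_case_token and ask_token are hypothetical helpers of mine, not part of the diskmount script.

```shell
#!/bin/sh
# Hypothetical helper, not part of the original diskmount script.
# Succeeds only for a non-empty value with no whitespace and no forward slash.
valid_case_token() {
    case "$1" in
        ''|*[[:space:]]*|*/*) return 1 ;;
        *) return 0 ;;
    esac
}

# Hypothetical wrapper: keep asking until the answer passes the check.
# The prompt goes to stderr so that only the accepted value lands on stdout.
ask_token() {
    prompt=$1
    while :; do
        printf '%s > ' "$prompt" >&2
        read -r answer || return 1
        valid_case_token "$answer" && break
        echo "No spaces or forward slashes, please." >&2
    done
    printf '%s\n' "$answer"
}

# Usage in the interactive session would look something like:
#   csname=$(ask_token "What is the case name")
#   export csname
```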

The "read" command is great for getting user input assigned to a variable; the value is then exported for use by other scripts. I have commented the code (anything after the # character) to show you an example of what the variable value looks like. So, we have a mounted external drive and we have created a case directory structure on it. The topmost directory is the case name and the next directory down the tree is the evidence number. Inside that will be a directory for each physical device (eg _dev_sda), and inside each of those a directory for each partition (_dev_sda1, _dev_sda2 and so on). Inside each partition directory will be 3 directories named "findings", "Report" and "tmp". We have also, as part of our loop, created a variable called $reppath (short for Report Path). This variable points at the partition's directory on our output drive, so that data can be sent to the findings, Report or tmp directory. An example of a $reppath variable value would be something like:

/mnt/cases/BADGUY_55-08/ABC1/_dev_sda/_dev_sda1
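
To check the layout for yourself, here is a small sketch that builds the same tree under a throwaway directory. The temporary directory standing in for /mnt/cases is an assumption for the demo, and I have swapped the "sudo mkdir -m 777" calls for a single "mkdir -p" so it can run without root.

```shell
#!/bin/sh
# Stand-in values; only the temporary root differs from the examples in the post.
evipath=$(mktemp -d)        # assumption: temp dir standing in for /mnt/cases
csname=BADGUY_55-08
evnum=ABC1
suspart=/dev/sda1

# Same substitutions as the mounting script.
susdev=$(echo "$suspart" | sed 's/[0-9]*//g')
dirname=$(echo "$suspart" | sed 's/\//_/g')
ddirname=$(echo "$susdev" | sed 's/\//_/g')

# Device directory, partition directory, then the three working directories.
reppath=$evipath/$csname/$evnum/$ddirname/$dirname
mkdir -p "$reppath/findings" "$reppath/Report" "$reppath/tmp"

# List the tree relative to the stand-in root.
(cd "$evipath" && find . -type d | sort)
```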

If I wanted to create a database of all the files and their descriptions for each partition, the code would therefore be:

find $volpath -type f -exec file {} \; >> $reppath/tmp/listoffiles

The $volpath is the mounted partition, eg /media/sda1. The database is called listoffiles (it is just a simple text file). When the $reppath variable gets expanded, the full file name and path would be something like:

/mnt/cases/BADGUY_55-08/ABC1/_dev_sda/_dev_sda1/tmp/listoffiles

The syntax for the find command is a bit weird if you aren't familiar with all the options. The command is saying: find all entities in the path (for instance) /media/sda1, confine the results to regular files ( -type f), execute the file command (-exec file) for each entity found ( {} ) and append the results (>>) to my listoffiles.
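
Here is a rough sketch of building the database and then interrogating it for one file type, run against a throwaway directory rather than a suspect partition. The temporary directories, the sample files and the "images" output name are all assumptions for the demo. I have also quoted the variables, used "+" instead of "\;" so that find hands files to the file command in batches, and used > rather than >> because the demo starts from an empty directory.

```shell
#!/bin/sh
# Stand-ins for the exported variables; both are temporary directories.
volpath=$(mktemp -d)            # plays the role of /media/sda1
reppath=$(mktemp -d)            # plays the role of the partition's report directory
mkdir -p "$reppath/tmp" "$volpath/cache"

# Two sample files: a GIF header with a .tmp name, and plain text with a .jpg name.
printf 'GIF89a\001\000\001\000\000\000\000' > "$volpath/cache/entry.tmp"
printf 'just some notes\n' > "$volpath/holiday.jpg"

# Build the database: one "path: description" line per regular file.
find "$volpath" -type f -exec file {} + > "$reppath/tmp/listoffiles"

# Interrogate it: which live files does file(1) describe as images?
# Cutting on the first colon assumes the paths themselves contain no colon.
grep -i 'image data' "$reppath/tmp/listoffiles" | cut -d: -f1 > "$reppath/tmp/images"
cat "$reppath/tmp/images"
```

The file named holiday.jpg should not make the list and the .tmp cache entry should, which is the whole point of working from signatures.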

This is how I would do the file signature check. Once this database is created, I script out interrogating the database for certain file types and then processing those. The code I have, and will be publishing here, will process the live set, deleted set and unallocated space along with the interpartition gaps and ambient data such as swap/hiberfil/memory dumps. It will export various files out for review, hunt for encrypted files, process compressed data and various archive formats, create storyboards of any movie files, do virus checking, recover and process 25 different chat/messaging formats, process all the major email formats, process p2p history files, do complete URL recovery and analysis, and lots more. This is all done automatically with a single command.

Monday, 3 September 2012
