Recovering a single record from the MFT is easily doable, but why would you want to? After all, the istat command in the Sleuth Kit can parse any MFT record (and does so in the most useful and understandable way amongst all the forensic tools I have looked at). Well, MFT parsing tools generally report on the data in the "allocated" area of the MFT record; they don't report the residual data in the record, and that residual data could contain information useful to your investigation. It is important to understand that when data is written to an MFT record for a NEW file, the data after the end-of-record marker is zeroed out. Therefore, any residual data that you find in an MFT record MUST once have been part of the allocated area of that record. So, how did I extract the records? A simple one-line command for each record (I am using an arbitrary record number in this example) like this:
icat -o 2048 bad-guy.E01 0 | dd bs=1024 count=1 skip=9846 > MFT9846.dat
The first part before the pipe simply outputs the contents of record 0 of the partition starting at sector 2048 in my disk image. Record 0 on an NTFS file system is the MFT itself, so the entire MFT is output. We then pipe the output to the dd command, specifying a block size of 1024 bytes (the size of each MFT record) and skipping a number of records until we get to the one we want. The output is redirected to a file - job done!
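If you want to eyeball the residual data in the extracted record, a hex dump viewer such as xxd will do the job, and you can run istat against the same record number to see how the allocated area is interpreted for comparison (9846 is still just my arbitrary example record):
xxd MFT9846.dat | less
istat -o 2048 bad-guy.E01 9846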
Here is a way to identify files that contain an arbitrary number of instances of a particular string. Another example of how this might be useful:
In several examinations I have found web pages in the web cache that contain chat records, generally from websites that use Flash-based chat applications. I noted that the chat messages were usually preceded by the string "said:", so you would see the username followed by the string "said: " followed by the message. I therefore automate the process of searching for web pages that contain SEVERAL instances of the string "said: ". Obviously if we looked for web pages that contained a single instance of the string "said: " we would get a number of false positives, as that string might appear in a web-based story or, very frequently, in a news story quoting a source.
So, we could find all the web pages in the mounted file system, remembering that if it is a Windows box being investigated there is a likelihood of white space in file names and file paths, so we need to set our Internal Field Separator environment variable to new lines only, like this:
IFS='
'
That's IFS=' followed by a press of the return key followed by a single ' character. Theoretically you should be able to type IFS='\n', but that doesn't seem to work for me, thus I explicitly use the return key.
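If your preview shell is bash you should also be able to use ANSI-C quoting to get the same result, which saves the slightly odd-looking literal return:
IFS=$'\n'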
find /disk/image -type f -exec file {} \; | awk -F: '$2 ~ /HTML document/ {print $1}' > /tmp/livewebs.txt
This command conducts a file signature check on each file in the live file system and looks at the description field of the output for the string "HTML document"; if there is a match, the file path and file name are sent to a file in my /tmp directory.
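Running file once per hit with -exec can be slow on a big file system; if your find and xargs support null-terminated names (the GNU versions do), the same list can be built rather faster with something along these lines:
find /disk/image -type f -print0 | xargs -0 file | awk -F: '$2 ~ /HTML document/ {print $1}' > /tmp/livewebs.txt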
I can then process my list of files like this:
cat /tmp/livewebs.txt | while read LWC ; do egrep -H -c -e " said:" -e " says:" -e "msgcontent" "$LWC" | awk -F : '$2 > 2 { print $1 }' ; done > /tmp/livewebchat.txt
The above command reads my list of files line by line and searches each one for three regular expressions ( said:, says: and msgcontent). The -H option for egrep reports the file path and file name, and the -c option reports the number of matching lines in each file. So for each file we get the file path/name, followed by a colon, followed by the count of matching lines. That result is piped to awk with the field separator set to a colon (giving us 2 fields). The second field is checked to see if the count is greater than 2; if it is, the first field (the file name/path) is printed, and the output of the whole loop is redirected to a new file in the /tmp directory. Note that the redirection sits after the done - putting a single > inside the loop would truncate the file on every iteration. I could then use that new file to copy out all of the files:
cat /tmp/livewebchat.txt | while read i ; do cp "$i" /home/forensicotd/cases/badguy/webchat ; done
Obviously I would need to do that again with any deleted files I've recovered with the tsk_recover command, or any unallocated web pages I have recovered with photorec. I would need to do something similar with gzipped files as well, as they could be compressed web pages - so use zgrep instead of egrep. Remember that simply copying files out is a bad idea - you need to use the FUNCTION I posted to ensure that files don't get over-written in the event of a file name collision.
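For the gzipped candidates the loop is much the same, assuming your zgrep passes the -H and -c options through to grep (GNU zgrep does); the list and output file names here ( /tmp/livegz.txt and /tmp/livegzchat.txt ) are just example names:
find /disk/image -type f -exec file {} \; | awk -F: '$2 ~ /gzip compressed/ {print $1}' > /tmp/livegz.txt
cat /tmp/livegz.txt | while read LWC ; do zgrep -H -c -E " said:| says:|msgcontent" "$LWC" | awk -F : '$2 > 2 { print $1 }' ; done > /tmp/livegzchat.txt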
Another pain in dealing with web pages is actually viewing them. You REALLY shouldn't be hooked up to the 'net whilst you are doing forensics - if you are, then you are doing it WRONG! But browsers often struggle to load raw web pages when they try to fetch content referenced in the html of the page. So, one thing we could do with the web pages listed in our livewebchat.txt file is remove all the html tags, effectively converting them to text files that we can view easily in a text editor or the terminal. We can do that with the html2text program. Even better, we can output the full path and file name at the top of each converted page so that if we find anything interesting we know the exact file the content came from. Here is a function that I wrote to convert a web page to text, output the file path/name, and check whether the output file name already exists, adjusting it to prevent over-writing a file with an identical name:
suswebs () {
    # convert a web page to text, prefix it with its original path, and
    # write it to ${WDIR} without clobbering an existing output file
    filepath="$1"
    filename=$(basename "$1")
    msg=$(html2text "$1")
    # if the output name is already taken, bump a [n] counter before the extension
    while [ -e "${WDIR}/${filename}.txt" ]; do
        filename=$(echo "$filename" | perl -pe 's/(\[(\d+)\])?(\..*)?$/"[".(1+$2)."]$3"/e;')
    done
    # original path on the first line, converted text underneath
    printf '%s\n%s\n' "${filepath}" "${msg}" > "${WDIR}/${filename}.txt"
}
To use the function you need to set the WDIR variable to the location where you want your processed web pages to be written, so something like this:
WDIR=/home/forensicotd/cases/badguy/webchat
cat /tmp/livewebchat.txt | while read l ; do suswebs "$l" ; done
Obviously you can add all of these commands to your Linux preview disk.
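One way to do that is to keep the IFS setting, the WDIR variable and the suswebs function together in a small helper script on the preview disk and have .bashrc source it - the file name below is just an example:
# /usr/local/etc/forensic-helpers.sh (example name) holds the IFS line,
# the WDIR variable and the suswebs function from this post
echo '. /usr/local/etc/forensic-helpers.sh' >> ~/.bashrc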