Thursday 10 January 2013

Resurrecting the dead

Unlike zombies, deleted files will not miraculously return to life on their own. We need to either undelete them or, if there is no file system metadata to help us, carve them. So I want to blog about file carving. This will be spread over two posts: the first deals with theory, the second will look at a couple of tools that I use.

In theory, file carving is straightforward enough - just look for a file header and extract the file. In practice it is a lot more complicated.
Let's consider carving files out of unallocated space. The first consideration is "what is unallocated space?". Imagine a deleted file in an NTFS file system, where the file data has not been overwritten. Both the file data and the metadata are intact, but the MFT entry is flagged as deleted (its "in use" flag is cleared), so the clusters are available for allocation to another file.
Do you consider this file to be in unallocated space? Some people say yes, as the relevant clusters are not allocated to a live file. Others say no, as the relevant clusters ARE allocated, albeit to a deleted file. In many ways the question is academic: it doesn't matter what you consider to be unallocated space, it matters what your file carving tool considers to be unallocated space. If you don't know what your tool considers to be unallocated space, how do you know whether you have recovered all of the potentially recoverable files?

Another consideration is which strategy you are going to use. File carving tools take different approaches to the problem of carving. One approach is to search the entire data stream, byte-by-byte, looking for file signatures. This is the most thorough approach, but it is also the most time consuming and will potentially lead to a number of false positives. Some file signatures are only 2 bytes long, and by pure chance we can expect those 2 bytes to appear on a hard disk many times. Those 2 bytes may or may not represent the file header that you are interested in. Figuring out whether they are relevant headers or false positives can be quite challenging.

One way to reduce the number of false positives is to search for file signatures at the block (or cluster) level. As the file signature is normally at the start of a file, we only need to look at the cluster boundaries, as that is where files start. Any file signatures found there are unlikely to be false positives, and what's more our carving routines will be a lot quicker.

The downside is that valid files may get missed, especially if there is a new file system overlaying an old file system. The cluster boundaries of the OLD file system may not fall on the cluster boundaries of the NEW file system. Imagine a 500 GB hard drive with a single partition filling the disk; when formatted, the block size may be 16 sectors. If a user then shrinks that partition to 400 GB and creates a new partition in the remaining 100 GB, the block size there might be set at 8 sectors. You would need your carving tool to search for headers at the 16-sector boundary in the first partition and at the 8-sector boundary in the second. Maybe a better solution would be to search for signatures at the sector boundary? This would ensure that all block (cluster) boundaries were searched, but it would increase both the time taken and the risk of finding false positives. Searching at the sector boundary also means that files embedded in other files may not be found if they are not saved to disk at sector boundaries (not sure if this is possible, I have never tested it).
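Here is a small Python sketch of the boundary-only search, just to illustrate the trade-off. I am assuming 512-byte sectors, and the function and parameter names are my own invention rather than anything taken from a real tool.

```python
SECTOR_SIZE = 512           # assumed sector size in bytes
JPEG_SIG = b"\xff\xd8\xff"  # first three bytes of a JPEG file


def find_aligned(data, signature, sectors_per_block=1):
    """Return offsets where signature sits exactly on a block boundary.

    sectors_per_block=1 searches every sector boundary; larger values
    search only cluster boundaries of that size.
    """
    step = SECTOR_SIZE * sectors_per_block
    hits = []
    for offset in range(0, len(data), step):
        # Only the first bytes of each block are compared.
        if data.startswith(signature, offset):
            hits.append(offset)
    return hits
```

If a JPEG header sits at sector 8 of an image, a search with sectors_per_block=16 walks straight past it, while sectors_per_block=8 or 1 finds it. That is the old file system versus new file system problem in miniature.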

Once you have decided on your strategy, the problems don't end there. From a programmer's point of view, how do you define when your tool stops carving and resumes searching for file headers? This is probably the biggest problem facing programmers of carving tools. Some files have footers, so you could program your carving tool to just keep going until it gets to the file footer. But what happens if the footer has been overwritten, or the file is fragmented? Your tool will just keep carving data out until it reaches the end of the disk or eventually finds a footer many, many clusters further into the disk.

There are different potential solutions to this problem. One is to set a maximum file size so that your tool stops carving at a certain point, even if no footer is found. Another is to stop carving once your tool finds another header. The problem here is deciding which file type's header should be your stop point. If you are carving for jpgs, do you keep carving until you find another jpg header, or a header of any type? If your carving engine does byte-by-byte carving and you are using "any known file signature" as your stop point, you risk ending the carving prematurely when your tool finds a "false positive" header. You can combine the approaches, as Jesse Kornblum did when coding the "foremost" file carver - that is to say, once you start carving, carve until the max file size is reached or a footer is found. In fact there are now quite a few different approaches to the problems posed by file carving; a good overview can be found in this PRESENTATION.
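The combined "footer or maximum size" stop rule can be sketched in a few lines of Python. This is only my illustration of the idea, not foremost's actual code, and the carve function and its parameters are hypothetical. The JPEG header and footer bytes are real.

```python
JPEG_HEADER = b"\xff\xd8\xff"  # start of a JPEG file
JPEG_FOOTER = b"\xff\xd9"      # JPEG end-of-image marker


def carve(data, start, footer, max_size):
    """Carve from start until the first footer or until max_size bytes.

    The footer is only searched for inside the max_size window, so a
    missing or overwritten footer cannot make the carve run to the end
    of the disk.
    """
    window_end = min(start + max_size, len(data))
    footer_pos = data.find(footer, start, window_end)
    if footer_pos == -1:
        # No footer in range: give up at the maximum file size.
        return data[start:window_end]
    # Footer found: include it in the carved file.
    return data[start:footer_pos + len(footer)]
```

Note that this stops at the first footer it sees and knows nothing about fragmentation, so a fragmented file will still come out truncated or padded with data belonging to something else.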

Ultimately, once you understand how your file carving tool works, there is no "right way" or "wrong way" to do file carving. The file signature searching engine in EnCase is very thorough; however, it uses a "byte-by-byte" strategy, meaning that there are many false positives, and it doesn't really do file carving as it doesn't export the found files. My own preferences depend on what I am looking for: generally, for unallocated space I will carve at the sector or cluster boundary, while for swap and hiberfil files I do byte-by-byte carving. I will do a step-by-step post in the next few days on a couple of the file carving tools that I use routinely. One of them, photorec, is another one of the tools that I use on just about every case I can think of.

