Carving and hash signatures

Signature analysis and hashes



This lab covers searching for files using hashes, and file carving. File hashes provide an important way to rapidly search for and identify known good and known bad files. A database of the hashes of the files you are looking for can be used to identify those files on a system quickly, even when their names have been changed to disguise their true type.

File carving is a file extraction method for recovering files from a partition or disk image, which may be corrupt, using the file contents rather than file system metadata. It can also be used to recover deleted files.

You have been provided with 4 files, which can be found in /images/siglab/:

  1. search.dd
  2. carve.dd
  3. KnownGoodFiles.hdb
  4. KnownBadFiles.hdb

KnownGoodFiles is a hash database of files which should be excluded from further analysis, such as standard system files. The files were hashed from a validated source (e.g. copies downloaded directly from the manufacturer).

KnownBadFiles is a hash database of files which a forensics colleague of yours has created. These are hashes of files you are specifically interested in finding and examining. These could be questionable jpegs discovered on another computer, or, for instance, rootkit executables which you think may have been used as part of a crime.

Question 1: Sorter

Before you begin, prepare your environment for this investigation. Firstly, verify the integrity of the evidence by calculating and recording the md5 sum of both carve.dd and search.dd.
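A minimal sketch of the record-and-verify pattern with md5sum, assuming GNU coreutils. A throwaway file (hypothetical name evidence.dd) stands in for the evidence so the snippet can be run anywhere; in the lab the targets are the two images in /images/siglab.

```shell
#!/bin/sh
# Record a reference md5, then verify the file against it with md5sum -c.
# A throwaway file stands in for the evidence images; in the lab you would
# hash the real images, e.g.:
#   md5sum /images/siglab/carve.dd /images/siglab/search.dd
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work" || exit 1

printf 'hello\n' > evidence.dd        # hypothetical stand-in "image"
md5sum evidence.dd > evidence.md5     # writes "<hash>  <name>"
cat evidence.md5

# Later (e.g. after the analysis), confirm the file has not changed.
if md5sum -c evidence.md5; then
  status=verified
else
  status=changed
fi
echo "$status"
```

Recording the hashes in a file like this makes it easy to re-check the images at the end of the investigation.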

Create a directory "/home/caine/sorter". For the remainder of this tutorial, store the files you create in this new directory unless a question specifies a different location.

The sorter command analyses a partition image and organises the allocated and unallocated files by file type. The resulting analysis is a collection of files created in the directory specified using "-d".

Use the sorter command-line tool to analyse the search.dd image. Make sure that you use the -m switch to provide the correct mount point (C:), and make /home/caine/sorter/ the output directory of this command.

How many files were found? Hint: the output directory contains files with the answer you need, e.g. sorter.sum.

How many images (e.g. jpegs and gifs) were found?

How many extension mismatches were found?
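To make "extension mismatch" concrete, here is a toy checker that compares a file's first bytes (its signature) with what its extension claims. It is a hypothetical illustration covering only four signatures, not sorter's implementation; the sample files and the helper name sig_type are made up.

```shell
#!/bin/sh
# Toy extension-mismatch check: compare a file's leading bytes with its
# extension. Only a handful of signatures are recognised.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Map the first 4 bytes (as hex) to a type name.
sig_type() {
  case "$(od -An -tx1 -N4 "$1" | tr -d ' \n')" in
    ffd8ff*)  echo jpg ;;
    47494638) echo gif ;;   # "GIF8"
    1f8b*)    echo gz ;;
    d0cf11e0) echo doc ;;   # OLE2 container used by legacy Office files
    *)        echo unknown ;;
  esac
}

# One honest file, and one GIF renamed to look like a Word document.
printf '\377\330\377\340rest-of-jpeg' > "$work/photo.jpg"
printf 'GIF89a...' > "$work/notes.doc"

mismatches=0
for f in "$work"/*; do
  ext=${f##*.}                 # text after the last dot
  type=$(sig_type "$f")
  if [ "$type" != "$ext" ]; then
    echo "mismatch: $(basename "$f") looks like $type"
    mismatches=$((mismatches + 1))
  fi
done
echo "mismatches: $mismatches"
```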

Question 2: Generate a hash database

If you had to build your own hash files, you would need to find the files of interest and then use md5sum to create a hash database from their md5 values. This question demonstrates building a simple hash database of files of interest.
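A minimal sketch of the idea on throwaway files: md5sum output already has the "hash  name" layout used by an md5sum-style database, so ">" creates the database and ">>" appends to it. The file names below are made up.

```shell
#!/bin/sh
# Build a tiny md5sum-format hash database: ">" creates, ">>" appends.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work" || exit 1

printf 'hello\n' > first.xml
: > second.xml                        # an empty file

md5sum first.xml  >  myhash.hdb       # create (overwrites any old copy)
md5sum second.xml >> myhash.hdb       # append a second entry
cat myhash.hdb
```

Using ">" for the second file instead would silently replace the first entry, which is why appending matters when adding to an existing database.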

Firstly, make a directory /mnt/search and mount search.dd at that location, using the "-r" option to mount it read-only. You must run the commands to create this directory and do the mount as root (e.g. using sudo). Mount the file directly with the mount command (do not use a loop device); no loop device is needed here as there is no partition offset to specify.

Create a file /home/caine/myhash.hdb which contains the md5sum output for

 /mnt/search/Documents and Settings/Clyde/My Documents/Frankie.xml

Hint: spaces in a filename must be escaped. Where you want " " you need to type "\ " (backslash space).
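The sketch below mimics the lab path under a temporary directory (a hypothetical stand-in for /mnt/search) and shows that the escaped form and a quoted form name the same file. Without either, the shell would split the path into several arguments.

```shell
#!/bin/sh
# Escaping vs quoting spaces in a path. The directory tree is a made-up
# copy of ".../Documents and Settings/Clyde/My Documents/Frankie.xml".
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work" || exit 1

mkdir -p Documents\ and\ Settings/Clyde/My\ Documents
printf 'hello\n' > Documents\ and\ Settings/Clyde/My\ Documents/Frankie.xml

# Escaped form, as in the hint:
md5sum Documents\ and\ Settings/Clyde/My\ Documents/Frankie.xml > hashes.txt
# Quoted form: same file, same hash.
md5sum "Documents and Settings/Clyde/My Documents/Frankie.xml" >> hashes.txt
cat hashes.txt
```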

Now ADD (append to the end of myhash.hdb using >>) the following file to the database.

 /mnt/search/Documents and Settings/Clyde/My Documents/Ford.xml

Question 3: hfind

In order to efficiently use the hash database files, you need to create an index of the file entries.

Firstly, copy the KnownBadFiles.hdb and KnownGoodFiles.hdb to /home/caine from their current location in /images/siglab.

Now use the hfind command, with the database type md5sum, to build a .idx index file for both KnownGoodFiles.hdb and KnownBadFiles.hdb.

Now look up the md5 hash of Frankie.xml which you saved into myhash.hdb in the last question. Using this hexadecimal hash value, search for it with hfind in both KnownGoodFiles.hdb and KnownBadFiles.hdb.

Used in this way, hfind takes 2 parameters: the first is the database (either KnownGoodFiles.hdb or KnownBadFiles.hdb), and the second is the hexadecimal hash you are searching for.
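Conceptually, each lookup asks "is this hash in the database, and under what name?". The sketch answers that with a plain grep over a made-up md5sum-format database. It is a stand-in for illustration only: grep scans the whole file, whereas hfind uses the .idx index built earlier. The helper name lookup is hypothetical.

```shell
#!/bin/sh
# Linear-scan stand-in for a single-hash lookup in an md5sum-format
# database ("<hash>  <name>" per line).
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Made-up database with two entries.
printf '%s  %s\n' b1946ac92492d2347c6235b4d2611184 File1.jpg \
                  d41d8cd98f00b204e9800998ecf8427e File2.doc > "$work/bad.hdb"

lookup() {  # usage: lookup DB HASH
  if line=$(grep -i "^$2[[:space:]]" "$1"); then
    echo "$line"                        # hash found: print its entry
  else
    echo "$2 not in $1"
    return 1
  fi
}

found=$(lookup "$work/bad.hdb" d41d8cd98f00b204e9800998ecf8427e)
echo "$found"
lookup "$work/bad.hdb" 00000000000000000000000000000000 || true
```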

Search of KnownGoodFiles produced:
Search of KnownBadFiles produced:
Expert opinion of Frankie.xml:

Question 4: Sorter and filtering

It is possible to filter the output of sorter using known good and known bad files: known good files are excluded, and known bad files are reported as alerts. This helps cut down on the number of files in the sorter output. Run sorter on the search.dd image again, with the appropriate options to use both the KnownGoodFiles and KnownBadFiles databases you indexed for filtering.

Hint: make sure that you create a new directory, /home/caine/sorter2, for the output of sorter.

Hint: just like hfind, you need to give the .hdb name of the database file and not the .idx name. In the sorter man page, the good file database is called hash_exclude and the bad file database is called hash_alert.
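The sketch below illustrates what the two databases do to the results: known good hashes are excluded and known bad hashes raise an alert, leaving everything else for review. It is a conceptual stand-in using grep over made-up databases, not sorter's code; in the lab the corresponding sorter options are -x (hash_exclude) and -a (hash_alert).

```shell
#!/bin/sh
# Conceptual hash triage: exclude known good, alert on known bad.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work" || exit 1

printf 'system\n'  > good.bin
printf 'rootkit\n' > bad.bin
printf 'letter\n'  > unknown.bin
md5sum good.bin > good.hdb           # stand-in for KnownGoodFiles.hdb
md5sum bad.bin  > bad.hdb            # stand-in for KnownBadFiles.hdb

alerts=0; excluded=0
for f in good.bin bad.bin unknown.bin; do
  h=$(md5sum "$f" | cut -d' ' -f1)
  if grep -q "^$h" good.hdb; then
    excluded=$((excluded + 1))       # known good: dropped from the report
  elif grep -q "^$h" bad.hdb; then
    alerts=$((alerts + 1))           # known bad: flagged
    echo "ALERT: $f ($h)"
  else
    echo "review: $f"                # unknown: still needs examining
  fi
done
echo "alerts=$alerts excluded=$excluded"
```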

How many alerts were found?

How many exclusions were found?

Question 5: Linking the Techniques

In this question you will put together everything you have learned so far to search the image for known bad files that have been obfuscated, using the sorter output saved in the /home/caine/sorter2 directory.

The search.dd image has a number of files on it whose hashes can be found in the KnownBadFiles.hdb file. However, some of these files may have been tampered with and obfuscated by a user, perhaps by changing file extensions or by using compression or zip archives.

Fill in the table below with the details of the bad files found on the image.

The file information for each category identified in the sorter.sum file is saved into its own separate file. For example, "Hash Database Alerts" are saved in "alert.txt" and "Extension Mismatches" in "mismatch.txt".

To complete this exercise, open alert.txt and mismatch.txt and analyse the information they contain. There are 11 files in total to consider.

Example: the first file in alert.txt is Anjie.docx. Use the inode number and hash of Anjie.docx from alert.txt to analyse the file. alert.txt shows that Anjie.docx has the hash "37b42ccf126a804620d706ebd6b19ae8". Search for this hash in KnownBadFiles:

$ hfind KnownBadFiles.hdb  37b42ccf126a804620d706ebd6b19ae8

37b42ccf126a804620d706ebd6b19ae8        File9.gz
Now use icat on the file's inode number from alert.txt and analyse the first and last bytes of its contents (the head and tail). You can do this manually with xxd:
$ icat /images/siglab/search.dd 312 | xxd | head -2
0000000: 1f8b 0808 7963 c94e 0003 4669 6c65 392e  ....yc.N..File9.
0000010: 6a70 6700 ec5b 0b3c 545b db5f 7b66 8c71  jpg..[.<T[._{f.q

$ icat /images/siglab/search.dd 312 | xxd | tail -2
0001840: 499e 2de4 1b83 f2c6 1df4 d7ce 2f8f dfad  I.-........./...
0001850: 0e6f 71d7 f00f d8bb fcb9 0070 0000       .oq........p..
If you google 1f8b0808 you can discover that this is a gz file (a file compressed using gzip); 1f8b is the gzip magic number. Alternatively, you can use "file" to achieve the same thing:
$ icat /images/siglab/search.dd 312 | file -
/dev/stdin: gzip compressed data, was "File9.jpg", from Unix, ...
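A self-contained sketch of where 1f8b0808 comes from, assuming GNU gzip, which stores the original file name by default when compressing a file. Bytes 0-1 are the magic number, byte 2 is the method (08 = deflate) and byte 3 holds flags; flag 0x08 (FNAME) means the original name follows the 10-byte header, which is how "file" can report the old name. The file contents are made up.

```shell
#!/bin/sh
# Reproduce a gzip header with a stored name, then read it back.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work" || exit 1

printf 'pretend image data\n' > File9.jpg
gzip -c File9.jpg > Anjie.docx        # compress, then disguise the name

magic=$(od -An -tx1 -N4 Anjie.docx | tr -d ' \n')
echo "first four bytes: $magic"

# The stored name is a NUL-terminated string starting at byte offset 10
# (tail -c +11 skips the first 10 bytes).
stored=$(tail -c +11 Anjie.docx | tr '\0' '\n' | head -n 1)
echo "original name: $stored"
```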

So from the information gathered so far, Anjie.docx is not a "docx" file at all, and is instead a gzipped file which used to be called File9.jpg. But what sort of file was it before it was compressed? To look at a compressed file you need to use a command like zcat, which is the same as cat but uncompresses files before printing them.

$ icat /images/siglab/search.dd 312 | zcat | xxd | head -2
0000000: d0cf 11e0 a1b1 1ae1 0000 0000 0000 0000  ................
0000010: 0000 0000 0000 0000 3e00 0300 feff 0900  ........>.......
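Again, you can manually recognise that d0cf 11e0 a1b1 1ae1 is the signature of an OLE2 compound document, the container format used by older Microsoft Office files such as Word documents, or you could use file again: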
$ icat /images/siglab/search.dd 312 | zcat | file -
/dev/stdin: CDF V2 Document, ... , Name of Creating Application: Microsoft Office Word, ...
As this file was compressed, it is worth checking the uncompressed version against known bad hashes...
$ icat /images/siglab/search.dd 312 | zcat | md5sum
4b7e00728187f79aefc74a48a15c7681  -
$ hfind KnownBadFiles.hdb  4b7e00728187f79aefc74a48a15c7681
4b7e00728187f79aefc74a48a15c7681  File9.jpg
So Anjie.docx is a compressed version of File9.jpg, and both are bad files. Someone started with a Word document, renamed it to File9.jpg, compressed it using gzip, and then renamed the result to Anjie.docx. As both the compressed and uncompressed versions are in the bad hash database, both files need to be listed in the table below.
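To see why both hashes matter, the sketch below compresses a made-up file and compares three md5 values: the original, the compressed container, and the decompressed stream. gzip -dc is used here as a portable equivalent of zcat.

```shell
#!/bin/sh
# Compression changes the container's hash; decompressing recovers the
# original file's hash. File contents are made up.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work" || exit 1

printf 'pretend word document\n' > File9.jpg
gzip -c File9.jpg > Anjie.docx

orig=$(md5sum < File9.jpg | cut -d' ' -f1)
outer=$(md5sum < Anjie.docx | cut -d' ' -f1)
inner=$(gzip -dc < Anjie.docx | md5sum | cut -d' ' -f1)

echo "original: $orig"
echo "outer:    $outer"     # differs from the original
echo "inner:    $inner"     # matches the original
```

A database holding only the original's hash would therefore miss the compressed copy unless the container is decompressed first.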

Repeat this analysis for every file identified as a bad file.

Name in search.dd | md5 hash | Name in hash database | Observation
Anjie.docx | 37b42ccf126a804620d706ebd6b19ae8 | File9.gz | gzipped and wrong extension
File9.jpg | 4b7e00728187f79aefc74a48a15c7681 | File9.jpg | Office document and wrong extension

Question 6: Scalpel

In this question we will use "scalpel" to perform file carving and file identification using known databases.

Firstly create a directory /home/caine/scalpelOutput to contain the output when running scalpel.

Edit the /etc/scalpel/scalpel.conf file. You need to be root to edit this file, so remember to use sudo. Configure scalpel to search for jpgs, pdfs and Word documents by deleting the comment character (the '#') from the beginning of the lines responsible for jpg, pdf and doc.
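If you prefer not to edit by hand, the same change can be scripted with sed. The sketch works on a temporary file containing illustrative lines that only imitate the scalpel.conf layout (extension, case sensitivity, maximum size, header, footer); the real entries may differ, so inspect /etc/scalpel/scalpel.conf before changing it as root.

```shell
#!/bin/sh
# Uncomment jpg, pdf and doc rules in a scalpel-style config (temp copy).
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
conf="$work/scalpel.conf"

# Illustrative rules, all commented out.
cat > "$conf" <<'EOF'
#   gif   y   5000000     \x47\x49\x46\x38\x37\x61   \x00\x3b
#   jpg   y   200000000   \xff\xd8\xff\xe0\x00\x10   \xff\xd9
#   doc   y   10000000    \xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1
#   pdf   y   5000000     %PDF   %EOF\x0d   REVERSE
EOF

# Drop the leading '#' only where the first field is jpg, pdf or doc.
sed -E 's/^#([[:space:]]+(jpg|pdf|doc)[[:space:]])/\1/' "$conf" > "$conf.new"
mv "$conf.new" "$conf"
cat "$conf"
```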

Use the scalpel tool to carve the carve.dd file in /images/siglab, writing the output into the /home/caine/scalpelOutput directory.

List the names of the known bad files that scalpel found and extracted from carve.dd, by comparing the hashes of the files in the scalpel output directory with KnownBadFiles.hdb.

Hint: use the md5deep tool to recursively analyse (using the correct flag) all the files in the scalpel output directory. Ensure you also use the -b flag (bare mode, which strips the leading directory information from the file names). Save this output in /home/caine/out1.

Hint 2: you will need to clean up the md5deep output file so that hfind will work with it. Use the following regular expression, and save the output into something like /home/caine/out2:

sed 's/\s*[0-9a-z]*\.\(doc\|jpg\|pdf\|txt\|gif\)//' /home/caine/out1 > /home/caine/out2
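The sketch runs a sed of this shape on sample lines laid out like md5deep -b output, assuming GNU sed (\s and \| are GNU extensions) and md5deep's "hash, two spaces, name" layout. Here pdf is in the extension list and the dot is escaped so it only matches a literal "."; the hashes are real md5s of tiny inputs, but the file names are made up.

```shell
#!/bin/sh
# Strip file names from md5deep-style lines, leaving only the hashes.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

cat > "$work/out1" <<'EOF'
d41d8cd98f00b204e9800998ecf8427e  00000000.jpg
b1946ac92492d2347c6235b4d2611184  00000001.pdf
EOF

sed 's/\s*[0-9a-z]*\.\(doc\|jpg\|pdf\|txt\|gif\)//' "$work/out1" > "$work/out2"
cat "$work/out2"

# A simpler alternative: keep only the first space-separated field.
cut -d' ' -f1 "$work/out1"
```

The cut version does not depend on the list of extensions, so it also copes with file types the regular expression does not mention.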

Hint 3: The hfind command can take a file which contains a list of md5 hashes and look up each line of that file (e.g. out2) in its hash database. You need a new flag to do this. Make sure the output file (e.g. out2) is just a list of hashes, without any filenames or other text (the sed command should have sorted this for you already but it pays to be sure).
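The batch lookup can be pictured with grep -F -f, which reads one fixed-string pattern per line from a file. This is a stand-in for illustration only; in the lab, hfind's -f option reads the hash list instead. The database entries and hashes below are made up.

```shell
#!/bin/sh
# Look up every hash listed in a file against an md5sum-format database.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Made-up "bad" database.
printf '%s  %s\n' d41d8cd98f00b204e9800998ecf8427e File3.jpg \
                  b1946ac92492d2347c6235b4d2611184 File7.doc > "$work/bad.hdb"
# Hashes of carved files, one per line (like out2).
printf '%s\n' b1946ac92492d2347c6235b4d2611184 \
              0123456789abcdef0123456789abcdef > "$work/out2"

# Note: a blank line in out2 would match every database line, one more
# reason to make sure the list contains nothing but hashes.
matches=$(grep -F -f "$work/out2" "$work/bad.hdb")
echo "$matches"
```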

Filenames carved from image which match a hash in KnownBadFiles

Copyright @ 2004-2014 Gordon Russell. All rights reserved.