The Sleuth Kit's latest release (3.2.0) contains a new tool, tsk_loaddb, that loads all metadata about a disk image into an SQLite database. This is a very interesting feature because a long-standing issue with TSK is that commands had to be rerun over and over, since no information was cached between invocations. tsk_loaddb fixes this by saving all the information necessary to examine an image in the created database.
Unfortunately, the database structure is not documented, so in this post we aim to reveal important parts of the database and to show how this new feature can be used to examine cases with TSK much more efficiently.
In this example, we created an image file from /dev/zero, formatted it as ext3, and mounted it on loopback. We then used gen_fs.py to create a simple directory layout within the mounted disk image. After that, we created the TSK database with:
./tsk_loaddb disk.img
and this created an SQLite database named disk.img.db. Upon examining this database we see the structure shown here.
Within this database structure we can observe the usual TSK information such as the disk partition, the files contained, and block-level data. Examining TSK data is now as simple as writing SQL queries, which we will show for a few common scenarios.
1) Locating blocks of a file ('march.xls' in this example) contained in the disk image
sqlite> select blocks.blk_start from tsk_fs_blocks as blocks, tsk_fs_files as files where files.name="march.xls" and files.file_id=blocks.file_id;
99331
sqlite>
The query reports block 99331, and if we use blkcat to read this block, we see that it matches the information contained in the gen_fs.py script. If we are recovering a file that spans multiple blocks, we can simply include the blocks.blk_len column in the select statement and it will report the number of blocks used by the file.
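The same lookup can be scripted. Below is a minimal Python sketch using the standard sqlite3 module. It relies only on the table and column names that appear in the query above; the make_demo_db helper and its reduced two-table schema are our own stand-in for illustration, not the real layout of disk.img.db.

```python
import sqlite3


def file_block_runs(conn, filename):
    """Return (blk_start, blk_len) pairs for every block run of `filename`."""
    query = (
        "SELECT blocks.blk_start, blocks.blk_len "
        "FROM tsk_fs_blocks AS blocks "
        "JOIN tsk_fs_files AS files ON files.file_id = blocks.file_id "
        "WHERE files.name = ? "
        "ORDER BY blocks.blk_start"
    )
    return conn.execute(query, (filename,)).fetchall()


def make_demo_db():
    """Hypothetical, reduced stand-in for disk.img.db (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE tsk_fs_files (file_id INTEGER, name TEXT);
        CREATE TABLE tsk_fs_blocks (file_id INTEGER, blk_start INTEGER, blk_len INTEGER);
        INSERT INTO tsk_fs_files VALUES (12, 'march.xls');
        INSERT INTO tsk_fs_blocks VALUES (12, 99331, 1);
        """
    )
    return conn


if __name__ == "__main__":
    conn = make_demo_db()  # for a real case: sqlite3.connect("disk.img.db")
    for start, length in file_block_runs(conn, "march.xls"):
        # blk_len is treated as a block count, so the run ends at start + length - 1
        print(f"blocks {start}..{start + length - 1}")
```

Passing the filename as a bound parameter (the ? placeholder) avoids quoting problems with unusual filenames.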
2) Locating the filename for an offset in the disk
A very common scenario in forensics investigations is searching for keywords and then determining the context of matches. This often requires determining which file the matching keywords were found in or if they are in unallocated space. Before the tsk_loaddb feature, this step required using a number of TSK commands in succession for each offset found by a file carver or indexer. Again, with the new feature it's as simple as an SQL query.
First, we determine where on the disk our string of interest (EEEEE, see gen_fs.py) is located:
# strings -t d disk.img | grep EEEEE
101714944 EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
Second, we get the file system's block size from the database and divide our offset by it:
sqlite> select block_size from tsk_fs_info;
1024
sqlite>
101714944/1024 = 99331
Finally, we search for which file corresponds to this block:
sqlite> select files.name from tsk_fs_blocks as blocks, tsk_fs_files as files where blocks.blk_start <= 99331 and blocks.blk_start+blocks.blk_len > 99331 and files.file_id=blocks.file_id;
This query works by finding a file that
1) starts either at or before our block of interest
2) ends at or after our block of interest
This allows for immediate retrieval of filenames related to keyword searches and file carving. As discussed at the end of this post, this process can easily be scripted for a large number of files.
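As a rough sketch of how this could be scripted, the Python function below takes a byte offset (such as one reported by strings), converts it to a block number, and returns the matching filenames. It makes several assumptions: the offset is relative to the start of the file system (true for our test image, which has no partition table), tsk_fs_info holds a single row, and blk_len is a count of blocks, so the end of a run is compared with a strict inequality. The demo database and the file name report.doc are made up for illustration.

```python
import sqlite3


def files_at_offset(conn, byte_offset):
    """Map a byte offset to (block number, list of file names covering it)."""
    block_size = conn.execute("SELECT block_size FROM tsk_fs_info").fetchone()[0]
    block = byte_offset // block_size
    rows = conn.execute(
        "SELECT files.name "
        "FROM tsk_fs_blocks AS blocks "
        "JOIN tsk_fs_files AS files ON files.file_id = blocks.file_id "
        "WHERE blocks.blk_start <= ? "
        "AND blocks.blk_start + blocks.blk_len > ?",
        (block, block),
    ).fetchall()
    return block, [name for (name,) in rows]


def make_demo_db():
    """Hypothetical, reduced stand-in for disk.img.db (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE tsk_fs_info (block_size INTEGER);
        CREATE TABLE tsk_fs_files (file_id INTEGER, name TEXT);
        CREATE TABLE tsk_fs_blocks (file_id INTEGER, blk_start INTEGER, blk_len INTEGER);
        INSERT INTO tsk_fs_info VALUES (1024);
        INSERT INTO tsk_fs_files VALUES (12, 'march.xls');
        INSERT INTO tsk_fs_files VALUES (13, 'report.doc');
        INSERT INTO tsk_fs_blocks VALUES (12, 99331, 1);
        INSERT INTO tsk_fs_blocks VALUES (13, 2000, 3);
        """
    )
    return conn


if __name__ == "__main__":
    conn = make_demo_db()  # for a real case: sqlite3.connect("disk.img.db")
    for offset in (101714944, 2050000, 4096):
        block, names = files_at_offset(conn, offset)
        # An empty list means no file in the database claims this block
        print(offset, block, names or "no file found")
```

An empty result suggests the offset falls in unallocated space or file system metadata, which would need to be confirmed with the usual TSK tools.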
3) Time based activity searching
Many cases involve searching for file activity based on timelines. Since the Sleuth Kit database contains MAC time information, we can now perform much of this work through SQL queries. For example, a simple search for files created before 8:34 PM on 01/24/11 could look like:
sqlite> select name from tsk_fs_files where datetime(ctime,'unixepoch','localtime') < '2011-01-24 20:34:00';
There are a few things to explain about this query. First, we are querying on the file's ctime, and to compare it in our local time zone instead of UTC we use the SQLite datetime function to convert the stored Unix timestamp. Note that on ext3 ctime records the last inode change, so it matches the creation time only for files whose metadata has not changed since they were created. Second, the date to compare against, written after the less-than (<) sign, must use the same 'YYYY-MM-DD HH:MM:SS' format that datetime produces, because the comparison is done on strings. Once we have this syntax correct, we can quickly search for files based on complex, multiple-parameter queries such as 'created on X but modified after Y' or 'accessed between Z and Y'.
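For range searches such as 'accessed between Z and Y', one option is to convert the local-time bounds to Unix timestamps in a script and compare the integer columns directly. The Python sketch below does this. Only the ctime column appears in our query above; the atime and mtime column names are assumptions based on the MAC times stored in the database, and the demo database and file names are invented for illustration.

```python
import sqlite3
from datetime import datetime

# Only ctime is confirmed by the query in this post; atime/mtime are assumed names.
TIME_COLUMNS = {"atime", "mtime", "ctime"}


def local_to_epoch(text):
    """Convert 'YYYY-MM-DD HH:MM:SS' in the local time zone to Unix seconds."""
    return int(datetime.strptime(text, "%Y-%m-%d %H:%M:%S").timestamp())


def files_in_window(conn, column, after=None, before=None):
    """Names of files whose `column` time satisfies after <= time < before."""
    if column not in TIME_COLUMNS:
        raise ValueError(f"unsupported time column: {column}")
    clauses, params = [], []
    if after is not None:
        clauses.append(f"{column} >= ?")
        params.append(local_to_epoch(after))
    if before is not None:
        clauses.append(f"{column} < ?")
        params.append(local_to_epoch(before))
    where = " AND ".join(clauses) or "1"
    sql = f"SELECT name FROM tsk_fs_files WHERE {where} ORDER BY name"
    return [name for (name,) in conn.execute(sql, params)]


def make_demo_db():
    """Hypothetical, reduced stand-in for disk.img.db (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tsk_fs_files (name TEXT, ctime INTEGER, atime INTEGER, mtime INTEGER)"
    )
    rows = [
        ("early.txt", "2011-01-24 20:00:00", "2011-01-24 21:00:00", "2011-01-24 20:00:00"),
        ("late.txt", "2011-01-24 21:00:00", "2011-01-26 09:00:00", "2011-01-25 08:00:00"),
    ]
    for name, c, a, m in rows:
        conn.execute(
            "INSERT INTO tsk_fs_files VALUES (?, ?, ?, ?)",
            (name, local_to_epoch(c), local_to_epoch(a), local_to_epoch(m)),
        )
    return conn


if __name__ == "__main__":
    conn = make_demo_db()  # for a real case: sqlite3.connect("disk.img.db")
    print(files_in_window(conn, "ctime", before="2011-01-24 20:34:00"))
    print(files_in_window(conn, "atime", after="2011-01-24 00:00:00",
                          before="2011-01-25 00:00:00"))
```

The column name is checked against a fixed set before being placed in the SQL text, since column names cannot be passed as bound parameters.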
In this post, we hope to have shown some of the power contained in the new tsk_loaddb feature as well as inspired people to construct new queries and research new automated capabilities. We performed our work through the direct SQL interface, but scripting these interactions in Python or another language would be trivial using the available SQLite libraries. We have also envisioned a number of higher-level capabilities that can easily be built on this feature, and we hope to implement them and share them on the blog in the future.
A blog covering DFS's experiences in computer security and digital forensics
Tuesday, January 25, 2011
Friday, January 21, 2011
Speaking Materials from our talk at Blackhat DC
Digital Forensics Solution's researcher Andrew Case recently presented forensic memory analysis of Linux Live CDs and the Tor anonymity project at the Blackhat DC 2011 conference.
The talk has already received considerable attention, and Tor quickly fixed some of the issues that were discussed. Two bug reports (1 and 2) were filed and the most recent release contains patches that sanitize sensitive memory after use.
We are now working to get the presented live CD research integrated into Volatility so that the research becomes available to the general public. Once this occurs, investigators of all skill levels will be able to properly handle cases involving live CDs.
The white paper that was published can be found here and the slides that accompanied the presentation here.
If you have any questions or comments about the research please email Andrew or comment below. Also, check back Monday as we will have a writeup of other Blackhat presentations that caught our interest.
A New Blog About Digital Forensics and Computer Security
Welcome to Digital Forensics Solution's blog about computer security and forensics. We plan on using this blog to document our research, presentations, software projects, and interesting things we discover along the way.
Our past work can be found on our research page and information about contributing authors can be found on our about us page.
We already have a number of topics that we plan on blogging about over the next week or two so please check regularly for updates.