Not so long ago I found myself analyzing some emails that appeared to be spoofed or forged. During my analysis, I started looking at a header entry that I thought might aid in proving or disproving my theory. The header entry was Thread-Index. Thread-Index is a Microsoft Outlook-centric header that is used to track conversations. I wanted to use it to look for discrepancies in the FILETIME time stamp that the email client adds to the message.
I am using the MSDN documentation to walk through the header value and the Python programming language to illustrate how to decode the somewhat obfuscated value.
The MSDN documentation titled “Tracking Conversations” references two distinct properties (PR_CONVERSATION_TOPIC and PR_CONVERSATION_INDEX) that the email client should set to aid in tracking conversations. Since the Thread-Index header is what I am most interested in, I will concentrate on how that value is created.
The header value is actually base64-encoded binary data. For the message used in this walkthrough, re-encoding the decoded hexadecimal string shown further down gives:
Thread-Index: Ac9HenBT+CFEJCaeTk6wMLT54yoyqw==
If we simply echo the base64 string and pipe it through base64 with the decode option, we can see that the output is binary data.
At this point, we could just pipe that output through something like xxd to get the hexadecimal representation of the data to work with, but let’s use Python to help with some of that work. Python gives us several different methods and modules to handle the decoding, but let’s just use the built-in functions for now.
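A minimal sketch of that decoding step, assuming Python 3 (where the standard base64 module takes the place of the older string codec). The Thread-Index value is an assumption: it is re-encoded from the hexadecimal string quoted later in this post.

```python
import base64

# Assumed sample value, re-encoded from the hexadecimal string quoted
# later in the post (01cf477a...32ab).
thread_index = "Ac9HenBT+CFEJCaeTk6wMLT54yoyqw=="

# Decode the base64 text into raw bytes.
raw = base64.b64decode(thread_index)

print(len(raw))  # number of decoded bytes
print(raw)       # raw bytes; non-printable values appear as \x escapes
```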
As you can see, that still looks a little ugly, so we will use the map function to iterate through the results and apply the ord function to return an integer value for each character. Each integer is then passed to the hex function, which gives us a list of hexadecimal values. Basically, we’re doing this to make the data prettier and more usable.
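The same idea can be sketched in Python 3. One difference worth noting: iterating over a bytes object already yields integers, so the ord step is only needed for Python 2 strings. The sample value is the assumed one re-encoded from the hexadecimal string quoted later in the post.

```python
import base64

raw = base64.b64decode("Ac9HenBT+CFEJCaeTk6wMLT54yoyqw==")

# In Python 3 each element of a bytes object is already an integer.
int_values = list(raw)

# hex() turns each integer into a string such as '0x1' or '0xcf'.
hex_chars = [hex(value) for value in int_values]

print(hex_chars[:6])
```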
Using the list of hexadecimal values, we create a string to use in the conversion of the value into a legible FILETIME time stamp and Globally Unique Identifier (GUID). To create the string, we join the third and fourth characters of each element in the list, which skips the leading 0x. The zfill() function pads any single-digit value (for example, 0x1) with a leading zero, so that every byte contributes exactly two characters to the string.
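A short sketch of the join, using the first few values from the example so the zero padding is visible. The list literal here is only an illustration of the input, not the full header.

```python
# First six hexadecimal values of the example header block.
hex_chars = ['0x1', '0xcf', '0x47', '0x7a', '0x70', '0x53']

# c[2:4] drops the '0x' prefix; zfill(2) restores a leading zero
# for single-digit values such as '0x1'.
hex_string = "".join(c[2:4].zfill(2) for c in hex_chars)

print(hex_string)
```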
Now, that looks better. The hexadecimal string gives us something to work with as we consult the documentation for how this value is created.
The email client calls the ScCreateConversationIndex function to calculate the value for the outgoing message. ScCreateConversationIndex creates a header block that is 22 bytes in length, followed by zero or more child blocks, each five bytes in length. The exact wording from the documentation is:
ScCreateConversationIndex implements the index as a header block that is 22 bytes in length, followed by zero or more child blocks each 5 bytes in length.
The header block is composed of 22 bytes, divided into three parts:
- One reserved byte. Its value is 1.
- Five bytes for the current system time converted to the FILETIME structure format.
- Sixteen bytes holding a GUID, or globally unique identifier.
Each child block is composed of 5 bytes, divided as follows:
- One bit containing a code representing the difference between the current time and the time stored in the header block. This bit will be 0 if the difference is less than .02 second and greater than two years and 1 if the difference is less than one second and greater than 56 years.
- Thirty one bits containing the difference between the current time and the time in the header block expressed in FILETIME units. This part of the child block is produced using one of two strategies, depending on the value of the first bit. If this bit is zero, ScCreateConversationIndex discards the high 15 bits and the low 18 bits. If this bit is one, the function discards the high 10 bits and the low 23 bits.
- Four bits containing a random number generated by calling the Win32 function GetTickCount.
- Four bits containing a sequence count that is taken from part of the random number.
According to the documentation, the header block is divided into three parts. The first byte is a reserved byte with a value of 1. The next five bytes are the current system time in the FILETIME structure format. The last 16 bytes of the header block hold the GUID.
Now let us look at the documentation for PidTagConversationIndex to get a better understanding of how the value is set.
If we plug our hex string value from above (01cf477a7053f8214424269e4e4eb030b4f9e32a32ab) into this representation of the header block, the value we are working with should look like this.
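As a rough illustration, the hex string can be sliced into the three documented parts. The variable names are mine, and each byte is two hexadecimal characters.

```python
hex_string = "01cf477a7053f8214424269e4e4eb030b4f9e32a32ab"

# Documented layout of the 22-byte header block.
reserved = hex_string[0:2]        # 1 byte
filetime_part = hex_string[2:12]  # 5 bytes
guid = hex_string[12:44]          # 16 bytes

print("Reserved:", reserved)
print("FILETIME:", filetime_part)
print("GUID:    ", guid)
```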
This is where we see the first discrepancy in the documentation. According to the documentation, we should be able to take the 40 bits of the FILETIME value to calculate the current FILETIME. The value is actually a time delta in 100-nanosecond units that is added to January 1, 1601 to come up with the current FILETIME value, and 40 bits on their own are not enough to express it.
So, the first byte is not just reserved, but actually included to calculate the time delta. We use the “Reserved” number and the FILETIME value, then fill the remaining 16 bits with zero padding to get the offset value we need to calculate the time value.
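A sketch of that calculation: the first six bytes (twelve hexadecimal characters) are padded with two zero bytes to form a 64-bit value.

```python
hex_string = "01cf477a7053f8214424269e4e4eb030b4f9e32a32ab"

# Reserved byte + 5 FILETIME bytes, followed by 16 bits of zero padding.
ft_value = hex_string[:12] + "0000"

# Interpret the 64-bit value as a count of 100-nanosecond units.
time_offset = int(ft_value, 16)

print(ft_value)
print(time_offset)
```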
The value we have just calculated represents 130,401,505,413,103,616 100-nanosecond units since January 1, 1601. To calculate the date, we do the following:
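One way to sketch the date calculation in Python 3 is shown here. It uses integer division rather than a floating-point divide, which is my own choice to avoid rounding; the sub-microsecond remainder is simply dropped.

```python
from datetime import datetime, timedelta

# 64-bit offset in 100-nanosecond units (header bytes plus zero padding).
time_offset = int("01cf477a70530000", 16)

# timedelta works in microseconds, and 100 ns is 0.1 microsecond.
microseconds = time_offset // 10

filetime = datetime(1601, 1, 1) + timedelta(microseconds=microseconds)

print(filetime)
```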
Let me explain what we did here.
Since we ultimately need to get something that resembles a time stamp, we are importing a couple of classes from the datetime module. First, you will notice that I changed the way I calculated the time_offset from before. When we calculate the FILETIME by adding the timedelta to the starting date of January 1, 1601, the time_offset needs to be in a unit that timedelta accepts. The smallest unit timedelta supports is the microsecond. 100 nanoseconds is 0.1 microseconds, so we divide the time_offset value by 10 to get the timedelta value in microseconds.
There we have it!
The email client produced a FILETIME time stamp of March 24, 2014 16:02:21 UTC. This time stamp can be compared with other time stamps within the email header, and with other emails from the same email client, to aid in the analysis of whether or not an email is forged. Keep in mind that you will observe time differences because of clock synchronization issues among the hosts involved, but using other emails as references can help reconcile those differences.
In the case of my own analysis, the time stamps in the “suspect” emails were not consistent with the time stamps in other emails from the same client, confirming my suspicion that they were, in fact, forged.
It is worth pointing out that the GUID is created for each new message the email client produces, while replies carry over the header block of the message they answer and append child blocks to it. So, if you see a GUID reused in an otherwise unrelated email, that is strong evidence of a forged email. Nothing has to be converted, but formatting the data produces the grouping most people are used to seeing.
Formatted properly, the GUID from our example is f8214424-269e-4e4e-b030-b4f9e32a32ab.
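The formatting itself can be sketched as simple slicing into the familiar 8-4-4-4-12 grouping:

```python
# The 16 GUID bytes from the example header block, as hexadecimal text.
guid = "f8214424269e4e4eb030b4f9e32a32ab"

# Split into groups of 8, 4, 4, 4 and 12 characters and join with hyphens.
formatted = "-".join([guid[:8], guid[8:12], guid[12:16], guid[16:20], guid[20:]])

print(formatted)
```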
Before we go, let me touch on the remaining portion of the Thread-Index value that was not discussed with this example. Any child blocks follow the header block (the first 22 bytes) in five-byte blocks. This was the second discrepancy with the MSDN documentation. The documentation states: “Thirty-one bits containing the difference between the current time and the time in the header block expressed in FILETIME units”. What I found is that this is true for the first child block, but each subsequent child block holds the difference between the current time and the time of the previous child block.
The process to parse the child blocks requires additional space not available in this blog post, but it can be interpreted from the Python script below:
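To make the child block layout concrete, here is a hedged sketch of decoding one five-byte block, based on the documented bit layout and the running-total behavior described above. The example has no child blocks of its own, so make_child_block is a hypothetical helper that fabricates sample blocks purely for illustration. The function names are mine.

```python
from datetime import datetime, timedelta

def decode_child_block(child: bytes) -> timedelta:
    """Return the time difference stored in one 5-byte child block."""
    value = int.from_bytes(child, "big")
    flag = value >> 39                    # first bit selects the strategy
    delta31 = (value >> 8) & 0x7FFFFFFF   # next 31 bits: truncated difference
    # Restore the discarded low bits as zeros: 18 if the flag is 0, 23 if it is 1.
    ticks = delta31 << (23 if flag else 18)   # 100-nanosecond units
    return timedelta(microseconds=ticks // 10)

def make_child_block(delta: timedelta) -> bytes:
    """Hypothetical helper: build a sample block (flag 0, last byte 0)."""
    ticks = (delta // timedelta(microseconds=1)) * 10
    delta31 = (ticks >> 18) & 0x7FFFFFFF
    return (delta31 << 8).to_bytes(5, "big")

# Header time from the example, followed by two made-up replies.
header_time = datetime(2014, 3, 24, 16, 2, 21)
children = [make_child_block(timedelta(hours=1)),
            make_child_block(timedelta(minutes=30))]

# Each child is added to the previous result, not to the header time.
current = header_time
for number, child in enumerate(children, start=1):
    current += decode_child_block(child)
    print("Child Message[%d]: %s" % (number, current))
```

Because the low 18 bits are discarded, each decoded difference is only accurate to about 26 milliseconds.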
"""
Author: Jeremy Scott
Purpose: Python script used to pass the Outlook Thread-Index value through the algorithm
         to decode and parse the header contents and return the WIN32 FILETIME, GUID, and
         any Child message FILETIME values.

Copyright (c) 2014, Jeremy Scott
All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted
provided that the following conditions are met:

Redistributions of source code must retain the above copyright notice, this list of conditions
and the following disclaimer. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the documentation and/or other
materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR
IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
DAMAGE.
"""
import argparse
import base64
from datetime import datetime, timedelta

__author__ = "Jeremy Scott"
__copyright__ = "Copyright (c) 2014, Jeremy Scott - jeremyscott.org"
__license__ = "BSD"
__version__ = "0.1"
__maintainer__ = __author__
__email__ = "dev [a] jeremyscott.org"
__status__ = "Development"


def parse_thread_index(value):
    """Definition used to parse PR_CONVERSATION_INDEX value and return human readable value."""
    # Decode the base64 value and build a hex string, two characters per byte.
    hex_data = bytearray(base64.b64decode(value))
    hex_chars = map(hex, hex_data)
    hex_string = "".join(c[2:4].zfill(2) for c in hex_chars)
    # Header block: 6 bytes of time data (zero-padded to 8 bytes), then a 16-byte GUID.
    ft_value = hex_string[:12] + '0000'
    guid = hex_string[12:44]
    # FILETIME counts 100-nanosecond units; divide by 10 to get microseconds.
    time_offset = int(ft_value, 16) / 10.
    filetime = datetime(1601, 1, 1) + timedelta(microseconds=time_offset)
    print("Decoded:\t" + hex_string)
    print("FILETIME:\t" + str(filetime))
    print("GUID:\t\t" + guid[:8] + "-" + guid[8:12] + "-" + guid[12:16] + "-" + guid[16:20] + "-" + guid[20:])
    if len(hex_string) > 44:
        n = 10  # each child block is 5 bytes, i.e. 10 hexadecimal characters
        child_blocks = hex_string[44:]
        children = [child_blocks[i:i+n] for i in range(0, len(child_blocks), n)]
        count = 0
        for child in children:
            scale = 16
            num_of_bits = 40
            binary = bin(int(child, scale))[2:].zfill(num_of_bits)
            # The first bit selects which bits of the 64-bit difference were kept.
            if binary[0] == '0':
                time_diff = '0'*15 + binary[1:32] + '0'*18
            else:
                time_diff = '0'*10 + binary[1:32] + '0'*23
            c_time_offset = int(time_diff, 2) / 10.
            # Each child time is relative to the previous block, so keep a running total.
            filetime = filetime + timedelta(microseconds=c_time_offset)
            print("\tChild Message[" + str(count+1) + "]: " + str(filetime))
            count += 1


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("value", nargs=1, help='Thread-Index value')
    parser.add_argument('--version', action='version', version='filetime_converter v' + __version__)
    args = parser.parse_args()
    # nargs=1 yields a one-element list, so take the first item.
    parse_thread_index(args.value[0])


if __name__ == '__main__':
    main()