extflow.py - A Hack For Carving Files From TCPFlow Streams

UPDATE: This code has been updated; please see the following link.

Quick post. Lately I have been messing around with Cuckoo Sandbox for automating the analysis of malicious URLs. It's quite effective for automating VirtualBox, but it has some shortcomings when it comes to extracting dropped files. Luckily, all the files can be extracted from the pcap. There are a number of great tools out there for carving files out of a pcap (foremost, tcpxtract, nfex, etc.). Most of these tools rely on file-header signatures. What if you just want the raw stream, or have a file that contains embedded files (a PDF that contains a SWF)? Rather than search for the header, we could rebuild the streams using tcpflow, strip the HTTP server header info from each stream by finding where the data starts via the pattern '\x0d\x0a\x0d\x0a', read the first bytes of the data, match them to a file extension, and write the data to disk with that extension. Now we have carved out all the files from the pcap. Yep, one marvelous hack that has probably already been done with a one-liner in awk and sed.
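
To make the "find where the data starts" step concrete, here is a minimal sketch on a made-up response. The sample bytes and the helper name split_http are assumptions for illustration only; they are not part of extflow.py.

```python
# Made-up HTTP response, roughly what a tcpflow server-to-client stream holds.
stream = (b"HTTP/1.1 200 OK\r\n"
          b"Content-Type: application/x-shockwave-flash\r\n"
          b"\r\n"
          b"CWS\x09rest-of-body")

SEPARATOR = b"\r\n\r\n"  # the same bytes as '\x0d\x0a\x0d\x0a'

def split_http(data):
    """Return (headers, body). body is None when no separator is found."""
    head, sep, body = data.partition(SEPARATOR)
    if not sep:
        return data, None
    return head, body

headers, body = split_http(stream)
print(body[:3])  # the first bytes of the body are what gets matched to an extension
```

Everything before the blank line is HTTP header text, and everything after it is the payload that gets written to disk.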

Output
remnux:~/Desktop/tmp$ ls
dump.pcap  extflow.py
remnux:~/Desktop/tmp$ tcpflow -r dump.pcap 
remnux:~/Desktop/tmp$ ls
010.000.002.015.01052-XXX.XXX.XXX.XXX.00080  XXX.XXX.XXX.XXX.00080-010.000.002.015.01054
010.000.002.015.01053-XXX.XXX.XXX.XXX.00080  XXX.XXX.XXX.XXX.00080-010.000.002.015.01055
010.000.002.015.01054-XXX.XXX.XXX.XXX.00080  XXX.XXX.XXX.XXX.00080-010.000.002.015.01056
010.000.002.015.01055-XXX.XXX.XXX.XXX.00080  XXX.XXX.XXX.XXX.00080-010.000.002.015.01057
010.000.002.015.01056-XXX.XXX.XXX.XXX.00080  XXX.XXX.XXX.XXX.00080-010.000.002.015.01058
010.000.002.015.01057-XXX.XXX.XXX.XXX.00080  XXX.XXX.XXX.XXX.00080-010.000.002.015.01060
010.000.002.015.01058-XXX.XXX.XXX.XXX.00080  XXX.XXX.XXX.XXX.00080-010.000.002.015.01062
010.000.002.015.01060-XXX.XXX.XXX.XXX.00080  XXX.XXX.XXX.XXX.00080-010.000.002.015.01052
010.000.002.015.01062-XXX.XXX.XXX.XXX.00080  dump.pcap
XXX.XXX.XXX.XXX.00080-010.000.002.015.01053  extflow.py
remnux:~/Desktop/tmp$ python ./extflow.py 
remnux:~/Desktop/tmp$ ls
010.000.002.015.01052-XXX.XXX.XXX.XXX.00080       XXX.XXX.XXX.XXX.00080-010.000.002.015.01055.swf
010.000.002.015.01053-XXX.XXX.XXX.XXX.00080       XXX.XXX.XXX.XXX.00080-010.000.002.015.01056
010.000.002.015.01054-XXX.XXX.XXX.XXX.00080       XXX.XXX.XXX.XXX.00080-010.000.002.015.01056.jar
010.000.002.015.01055-XXX.XXX.XXX.XXX.00080       XXX.XXX.XXX.XXX.00080-010.000.002.015.01057
010.000.002.015.01056-XXX.XXX.XXX.XXX.00080       XXX.XXX.XXX.XXX.00080-010.000.002.015.01057.mz
010.000.002.015.01057-XXX.XXX.XXX.XXX.00080       XXX.XXX.XXX.XXX.00080-010.000.002.015.01058
010.000.002.015.01058-XXX.XXX.XXX.XXX.00080       XXX.XXX.XXX.XXX.00080-010.000.002.015.01058.mz
010.000.002.015.01060-XXX.XXX.XXX.XXX.00080       XXX.XXX.XXX.XXX.00080-010.000.002.015.01060
010.000.002.015.01060-XXX.XXX.XXX.XXX.00080.bin   XXX.XXX.XXX.XXX.00080-010.000.002.015.01060.bin
010.000.002.015.01062-XXX.XXX.XXX.XXX.00080       XXX.XXX.XXX.XXX.00080-010.000.002.015.01062
010.000.002.015.01062-XXX.XXX.XXX.XXX.00080.bin   XXX.XXX.XXX.XXX.00080-010.000.002.015.01062.bin
XXX.XXX.XXX.XXX.00080-010.000.002.015.01053       XXX.XXX.XXX.XXX.00080-010.000.002.015.01052
XXX.XXX.XXX.XXX.00080-010.000.002.015.01053.html  XXX.XXX.XXX.XXX.00080-010.000.002.015.01052.html
XXX.XXX.XXX.XXX.00080-010.000.002.015.01054       dump.pcap
XXX.XXX.XXX.XXX.00080-010.000.002.015.01054.pdf   extflow.py
XXX.XXX.XXX.XXX.00080-010.000.002.015.01055
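
The stream names in the listing follow the pattern source address and port, a dash, then destination address and port, with zero-padded fields. As a rough sketch (the helper parse_flow_name is hypothetical and not part of extflow.py), a name can be split back into endpoints like this:

```python
def parse_flow_name(name):
    """Split a tcpflow stream name into ((src_ip, src_port), (dst_ip, dst_port))."""
    def endpoint(part):
        ip, _, port = part.rpartition(".")
        # Drop the zero padding from numeric octets; leave masked ones (XXX) alone.
        octets = [str(int(o)) if o.isdigit() else o for o in ip.split(".")]
        return ".".join(octets), int(port)

    src, _, dst = name.partition("-")
    return endpoint(src), endpoint(dst)

src, dst = parse_flow_name("010.000.002.015.01052-XXX.XXX.XXX.XXX.00080")
print(src, dst)
```

A stream whose source port is 80 is a server response, which is where most of the carved files in the listing come from.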

Python Code
import os
# extflow.py created by Alexander.Hanel@gmail.com
# This is a simple script that will carve out files
# from streams created by tcpflow.

def ext(header):
    # header holds the first bytes of the HTTP body.
    # To add a new signature add your own elif statement
    #    elif b'FILE SIGNATURE' in header:
    #        return 'FILE EXTENSION'

    if b'MZ' in header:
        return '.mz'
    elif b'FWS' in header:
        return '.swf'
    elif b'CWS' in header:
        return '.swf'
    elif b'html' in header:
        return '.html'
    # longer ZIP-header pattern, checked before the generic 'PK' so it wins
    elif b'\x50\x4B\x03\x04\x14\x00\x08\x00\x08' in header:
        return '.jar'
    elif b'PK' in header:
        return '.zip'
    elif b'PDF' in header:
        return '.pdf'
    else:
        return '.bin'

def main():
    for infile in os.listdir(os.getcwd()):
        # skip directories, this script and the pcap
        if not os.path.isfile(infile) or '.py' in infile or 'pcap' in infile:
            continue
        with open(infile, 'rb') as f:
            d = f.read()
        # the HTTP body starts right after the blank line that ends the headers
        pos = d.find(b'\x0d\x0a\x0d\x0a')
        if pos == -1:
            continue  # no header/body separator in this stream
        addr = pos + 4
        if addr == len(d):
            continue  # headers only, nothing to carve
        # if the file signature is farther than the first 20 bytes
        # you should change the slice length to that distance
        with open(infile + ext(d[addr:addr + 20]), 'wb') as o:
            o.write(d[addr:])

if __name__ == '__main__':
    main()
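
Because ext() looks for each signature anywhere in the first 20 bytes, a body that merely contains 'MZ' or 'PK' near its start gets tagged too. One possible variation, sketched here as an assumption rather than as part of the original script, keeps the signatures in a table and requires them at offset 0. The names SIGNATURES and ext_strict are made up for this example.

```python
# Signature table: (magic bytes expected at the very start, extension).
# Order matters: the longer jar pattern must come before the generic 'PK'.
SIGNATURES = [
    (b"MZ", ".mz"),
    (b"FWS", ".swf"),
    (b"CWS", ".swf"),
    (b"%PDF", ".pdf"),
    (b"\x50\x4B\x03\x04\x14\x00\x08\x00\x08", ".jar"),
    (b"PK", ".zip"),
]

def ext_strict(header):
    """Return an extension only when a signature sits at offset 0."""
    for magic, extension in SIGNATURES:
        if header.startswith(magic):
            return extension
    # HTML has no fixed magic bytes, so fall back to a substring check.
    if b"html" in header.lower():
        return ".html"
    return ".bin"

print(ext_strict(b"MZ\x90\x00\x03"))
print(ext_strict(b"xxMZxx"))
```

Adding a new type is then a one-line change to the table instead of another elif. The trade-off is that a payload with leading junk before its signature falls through to '.bin'.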

2 comments:

  1. Have you had a look at chaosreader? It dumps the payloads from the pcap and creates a nice webpage listing of them all as well. It supports a number of application layer protocols, so unless your dropper is using some custom protocol it should do the trick. http://www.brendangregg.com/chaosreader.html
