Building a PCAP Record Extractor Using Python

Occasionally, I find myself needing to extract an entire packet from a packet capture (PCAP). The reasons vary: testing a custom decoder or parser I have written, including the data in a report, or sometimes just wanting to visualize or structure the data in another way. Packet extraction can be done with Tshark by extracting field by field and then reassembling the individual components, but that process is tedious and prone to error and frustration. So whatever can we do; who or what will save the day? Dun dun dun: Python sweeps in over the horizon, wind blowing in its hair, Michael Bolton theme song blasting in the background, and swoops to the rescue (dramatic pause), fade to black.

Building the Packet Extractor

Like many great Python applications, it begins with modules. For brevity, we focus only on the core code needed to perform our tasks; ancillary elements are omitted. As in the previous article, we will use Pcapy to extract the data. Pcapy is probably my top choice for processing PCAP files: its interface is simple yet robust. In designing the application, we may want to add new features later. Command line options are a great way to organize and add features, because each option is often processed by a separate function or class. Programmatically, we can use the argparse module from the standard library to handle command line options. This helps keep our syntax clean and organized.

Figure 1

All of the processing is handled in a single function called LoopPcap(), which accepts two parameters: the file to be processed and the packet number. Once the packet is located, the result is returned to the main() function for additional processing, as indicated in Figure 2.

Figure 2

Pcapy has a couple of methods for iterating over data. In the previous article I used the loop() method, which controls the iteration and sends the data to a processing function. In this example, the next() method is used and the iteration is controlled by the processing function. Which method to use is a matter of personal preference. Depending on the scenario, I switch between loop() and next().
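To make the difference concrete, here is a minimal sketch that counts packets both ways. It is not the code from either article. The function names are hypothetical, and it assumes that a maxcant of -1 tells loop() to run until the end of the file and that next() returns a header of None once the capture is exhausted. Check both against the Pcapy version you have installed.

```python
def count_with_loop(reader):
    """Let the reader drive the iteration; a callback sees each packet."""
    sizes = []

    def on_packet(header, data):
        # Called once per record by the reader.
        sizes.append(len(data))

    # Assumption: -1 means "keep going until the end of the capture".
    reader.loop(-1, on_packet)
    return len(sizes)


def count_with_next(reader):
    """Drive the iteration ourselves, one next() call per record."""
    count = 0
    header, data = reader.next()
    # Assumption: the header comes back as None at the end of the file.
    while header is not None:
        count += 1
        header, data = reader.next()
    return count
```

Either function would be handed a reader from pcapy.open_offline("capture.pcap"), where the file name is a placeholder. A reader is consumed as it is read, so each function needs a freshly opened one.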

LoopPcap() begins by declaring a Boolean control variable for the header, which governs the iteration. We assume there is always data in the file, and that every entry has a header containing the date and time the record or packet was created. If there is no header (the header evaluates to False), we are at the end of the file, and processing will stop.

Since the current implementation of libpcap does not assign an index value to each record, we must count each record until the desired packet is reached. At this point you can return the found packet and you’re done. As a matter of personal preference, I use a list comprehension to convert the packet into a list of hexadecimal bytes.
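The steps described above might look roughly like the following. This is a sketch under stated assumptions rather than the original listing. The helper find_packet() is a hypothetical name, split out so the counting logic can be exercised without a capture file. Packet numbers are assumed to start at 1, as they do in Wireshark, and the header is assumed to come back as None at the end of the file.

```python
def find_packet(reader, packet_number):
    """Return record number `packet_number` as a list of hex strings, or None."""
    header = True  # control flag; replaced by each record's header
    count = 0
    while header:
        header, data = reader.next()
        if header:
            count += 1
            if count == packet_number:
                # bytearray() yields integers on both Python 2 and Python 3.
                return ["%02x" % byte for byte in bytearray(data)]
    return None  # reached the end of the file before finding the packet


def LoopPcap(pcap_file, packet_number):
    """Open the capture with Pcapy and search it for the requested packet."""
    import pcapy  # third-party; imported here so find_packet() has no dependency

    reader = pcapy.open_offline(pcap_file)
    return find_packet(reader, packet_number)
```

Returning None when the packet number is larger than the number of records gives the caller a simple way to report a bad request.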

Putting it all together

The last section of code needed is the main() function. This will house the components needed for accepting command line arguments and receiving the returned data from LoopPcap(). Our example script only accepts two parameters: the input file, and the packet number. Figure 3 displays the completed main() function.

Figure 3
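For readers following along without the figure, a main() along these lines could be sketched as below. It is not the original listing. The option names -f/--file and -p/--packet are assumptions, and the extract argument is a hypothetical stand-in for LoopPcap() so that the block stays self-contained.

```python
import argparse


def build_parser():
    """Describe the two command line options the script accepts."""
    parser = argparse.ArgumentParser(
        description="Extract a single packet from a PCAP file")
    parser.add_argument("-f", "--file", required=True,
                        help="PCAP file to read")
    parser.add_argument("-p", "--packet", required=True, type=int,
                        help="number of the packet to extract, starting at 1")
    return parser


def main(extract, argv=None):
    """Parse the options, call extract(file, packet_number), print the result."""
    args = build_parser().parse_args(argv)
    packet = extract(args.file, args.packet)
    if packet is None:
        print("Packet %d not found in %s" % (args.packet, args.file))
        return 1
    # The extractor is expected to return a list of hex strings.
    print(" ".join(packet))
    return 0
```

In a finished script, the last lines would call main(LoopPcap) under the usual `if __name__ == "__main__":` guard and pass the return value to sys.exit().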

Extracting the Target Packet

The hard part is over; our masterpiece is complete. Now it just needs something interesting to locate. For this example, we set our sights on a PCAP that documents the exploitation of a server compromised in a lab environment. Once the packet of interest is identified in Wireshark, its packet number can be fed into the extractor. Figure 4 displays the identification of a large packet issuing an API call to a vulnerable DSSETUP operation. Figure 5 displays the contents of the identified packet.

Figure 4

Figure 5

Conclusion

Today we learned how to create a basic packet extractor using fewer than 40 lines of code. This program could easily be modified for a multitude of applications. The original version I wrote has added functionality for identifying specific patterns and traits of protocol anomalies, and for extracting multiple packets.

What features will you add to your version? Where will the next packet of interest take you? The only limit is your imagination.