Extracting files from Burp Intruder

Following the discovery of an authenticated Insecure Direct Object Reference (IDOR) vulnerability that allowed logged-in users to download files, I needed to get those files out of Burp. I could have used a loop with wget or curl, but I was on Windows and it was much easier to handle cookies within Burp. So I ran the Intruder attack and then saved the items to a file.

Burp exports an XML file in which both the request and the response are base64-encoded. The document looks like this:

<?xml version="1.0"?>
<!DOCTYPE items [
<!ELEMENT items (item*)>
<!ATTLIST items burpVersion CDATA "">
<!ATTLIST items exportTime CDATA "">
<!ELEMENT item (time, url, host, port, protocol, method, path, extension, request, status, responselength, mimetype, response, comment)>
<!ELEMENT time (#PCDATA)>
<!ELEMENT url (#PCDATA)>
<!ELEMENT host (#PCDATA)>
<!ATTLIST host ip CDATA "">
<!ELEMENT port (#PCDATA)>
<!ELEMENT protocol (#PCDATA)>
<!ELEMENT method (#PCDATA)>
<!ELEMENT path (#PCDATA)>
<!ELEMENT extension (#PCDATA)>
<!ELEMENT request (#PCDATA)>
<!ATTLIST request base64 (true|false) "false">
<!ELEMENT status (#PCDATA)>
<!ELEMENT responselength (#PCDATA)>
<!ELEMENT mimetype (#PCDATA)>
<!ELEMENT response (#PCDATA)>
<!ATTLIST response base64 (true|false) "false">
<!ELEMENT comment (#PCDATA)>
]>
<items burpVersion="2022.12.4" exportTime="Mon Dec 12 10:00:00 CET 2022">
  <item>
    <time>Mon Dec 12 10:00:00 CET 2022</time>
    <url><![CDATA[https://domain.tld/documents/15008/download/]]></url>
    <host ip="X.X.X.X">domain.tld</host>
    <port>443</port>
    <protocol>https</protocol>
    <method><![CDATA[GET]]></method>
    <path><![CDATA[/documents/15008/download/]]></path>
    <extension>null</extension>
    <request base64="true"><![CDATA[REDACTED]]></request>
    <status>200</status>
    <responselength>35419</responselength>
    <mimetype></mimetype>
    <response base64="true"><![CDATA[REDACTED]]></response>
    <comment></comment>
  </item>
</items>
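
Before extracting anything, it can help to check what the export actually contains. The snippet below is a minimal sketch, not part of the original script: the `summarize_items` helper and the miniature `SAMPLE` export are hypothetical, and the sample omits the DTD and most fields for brevity. It lists the URL, the status and the decoded response size of each item, honouring the `base64` attribute.

```python
import base64
import xml.etree.ElementTree as ET

# Hypothetical one-item export; a real Burp file also carries the DTD and more fields.
RAW_RESPONSE = b"HTTP/1.1 200 OK\r\n\r\nhello"
SAMPLE = f"""<items burpVersion="2022.12.4">
  <item>
    <url><![CDATA[https://domain.tld/documents/15008/download/]]></url>
    <status>200</status>
    <response base64="true"><![CDATA[{base64.b64encode(RAW_RESPONSE).decode()}]]></response>
  </item>
</items>"""


def summarize_items(xml_text):
    """Return (url, status, decoded response size) for each <item>."""
    root = ET.fromstring(xml_text)
    rows = []
    for item in root.findall("item"):
        response = item.find("response")
        payload = response.text or ""
        # Only decode when Burp flagged the payload as base64
        if response.get("base64") == "true":
            body = base64.b64decode(payload)
        else:
            body = payload.encode("utf-8")
        rows.append((item.findtext("url"), item.findtext("status"), len(body)))
    return rows


for url, status, size in summarize_items(SAMPLE):
    print(f"{status} {size:>8} bytes  {url}")
```

For a real export, the same function could be fed the file contents read from disk.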

To extract the files from it, I used the following ugly script.

import base64
import re
import xml.etree.ElementTree as ET

# Variables
path = 'document-download.burp'
output_path = './extracted/'  # this directory must already exist

def main():
    # Parse Burp file
    mytree = ET.parse(path)
    myroot = mytree.getroot()

    # Search through each item in the file
    for item in myroot.findall('item'):
        try:
            # Retrieve the base64-encoded response
            based = item.find('response').text

            # Decode the response
            data = base64.b64decode(based)

            # Split headers from body at the first blank line only;
            # a binary body may itself contain b'\r\n\r\n'
            headers, _, content = data.partition(b'\r\n\r\n')

            # Extract the filename from the response headers
            regex_expression = r'filename="?([a-zA-Z0-9\-_.]+)"?'
            filenames = re.findall(regex_expression, headers.decode("utf-8"))
            if not filenames:
                raise Exception("No filename was identified")

            # Get the first match
            filename = filenames[0]

            # Generate the output path
            output_name = output_path + filename

            # Write the body to a file using the extracted filename
            with open(output_name, "wb") as f:
                f.write(content)
            print(f"[+] Extracted : {filename}")

        # If something goes wrong, print the exception
        except Exception as e:
            print(e)

if __name__ == "__main__":
    main()

The script takes the filename from the Content-Disposition HTTP header of the response.
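
The regular expression only accepts letters, digits, dashes, underscores and dots, so a filename containing spaces would be cut short. As an alternative sketch (not what the script above does), the header can be parsed with the standard library's `email.message.Message`, whose `get_filename()` reads the `filename` parameter of Content-Disposition. The `filename_from_headers` helper is a hypothetical name, and decoding header values as latin-1 is an assumption.

```python
import os
from email.message import Message


def filename_from_headers(raw_headers: bytes):
    """Return the Content-Disposition filename from a raw header block, or None."""
    # Skip the status line, then look for the Content-Disposition header
    for line in raw_headers.split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-disposition":
            msg = Message()
            # Assumption: header values are treated as latin-1 text
            msg["Content-Disposition"] = value.strip().decode("latin-1")
            filename = msg.get_filename()
            # Keep only the last path component so the name cannot escape the output directory
            return os.path.basename(filename) if filename else None
    return None


headers = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: application/pdf\r\n"
    b'Content-Disposition: attachment; filename="report 2022.pdf"'
)
print(filename_from_headers(headers))
```

Dropping this in would mean replacing the regex lookup in the loop and handling the `None` case where the script currently raises an exception.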