The Burp Suite User Forum was discontinued on the 1st November 2024.

Burp Suite User Forum

Crawling a website results in a bloated project file

Miro | Last updated: Nov 11, 2021 05:25PM UTC

When crawling a website using the default crawl and audit settings, my project file grows to almost 20GB. Once the project file gets that big, the backups also fail (there is not enough space on my disk). When I 'cat' the project file, I noticed that many responses from the crawl are saved there. When the crawl stops, the next step is to find auditable endpoints, and that took longer than I had patience to wait (I waited 3 hours). Some characteristics of my target website:

- Most of the responses are really big: content lengths are between 80 000 and 400 000, averaging around 100 000. The root of the website had a content length of 400 000.
- The website is a traditional PHP application, without any fancy JS frameworks.
- The crawler reports that it has made a bit over 4000 requests.
- The crawler found between 500 and 600 endpoints (the website has lots of endpoints, which is why the crawler would have been really useful).

I tested the crawl with the Pro versions 2021.8.4 and 2021.9.1, with the same results in both cases. I'm running Burp on Kali Linux. I can perform active scans without any problems when sending requests manually to the scanner.

Ben, PortSwigger Agent | Last updated: Nov 12, 2021 10:45AM UTC

Hi Miro, Burp's disk-based project files are designed so that users can save their work and return to it at a later date. When you perform a full crawl and audit of a site, Burp first crawls the site to discover content and then audits the discovered content. The requests and responses generated by this process are stored within the project file. The larger or more complex the site, the more requests Burp needs to issue in order to identify as much of the site content as possible and then audit that content. 20GB does sound a tad on the large side, though. Just to confirm: are you starting with an entirely new project file when you perform this scan, and are you seeing the project file reach this size purely as a result of the scan? I ask because a project file stores details of all of the work that you have carried out via Burp whilst you have the project open. If you always use one specific project file for all of your activities, it will store details of those activities and naturally increase in size over time.

Miro | Last updated: Nov 12, 2021 01:25PM UTC

Hi, yes, I always create a new project file for each case. When I was testing the crawler, I created a new project file and just started the "crawl and audit" scan with default settings; I didn't do anything else. Still, my project file grew to 20GB. This all happened within a few hours (the default maximum crawl time). Just as a reference, when doing only scanning (manually selecting scannable requests), my project file is currently at 1.25GB. The scanner has sent over 1 million requests and found 2100 issues.
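A rough back-of-envelope check using only the numbers quoted in this thread helps show why 20GB looks large. This is a sketch under stated assumptions: GB is taken as 10^9 bytes, content lengths are assumed to be in bytes, and the approximate counts ("a bit over 4000" crawl requests, "+1 million" scan requests) are treated as exact. On those assumptions, the raw crawl responses come to roughly 0.4GB, so the 20GB file is about 50 times larger, or about 5MB per crawl request versus about 1.25KB per manual-scan request. The arithmetic alone cannot say what the extra data actually is.

```python
# Back-of-envelope storage estimates from the figures quoted in this thread.
# Assumptions (not confirmed by the thread): GB means 10**9 bytes, content
# lengths are in bytes, and the approximate request counts are exact.

GB = 10**9


def bytes_per_request(file_size_bytes: float, requests: int) -> float:
    """Average project-file bytes per request sent."""
    return file_size_bytes / requests


# Crawl: ~20 GB project file after ~4000 crawl requests.
crawl_file = 20 * GB
crawl_requests = 4000
avg_response = 100_000  # typical content length reported for the target site

# Size if each crawl response were stored exactly once.
raw_crawl_bytes = crawl_requests * avg_response
crawl_overhead = crawl_file / raw_crawl_bytes

# Manual scanning: ~1.25 GB project file after ~1 million scan requests.
scan_file = 1.25 * GB
scan_requests = 1_000_000

print(f"raw crawl responses: {raw_crawl_bytes / GB:.1f} GB")
print(f"crawl file / raw responses: {crawl_overhead:.0f}x")
print(f"crawl: {bytes_per_request(crawl_file, crawl_requests) / 1e6:.1f} MB per request")
print(f"scan:  {bytes_per_request(scan_file, scan_requests) / 1e3:.2f} KB per request")
```

Because the scan request count is a lower bound ("+1 million"), the true per-request figure for scanning is at most about 1.25KB, which makes the contrast with the crawl even sharper.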

Andrew | Last updated: May 25, 2022 01:05AM UTC

I'm also getting a fair bit of bloat in 2022.3.8 (20GB+ files). I'm not using the crawler, but I do a lot of active scans with plugins. Using Save As and excluding out-of-scope items dropped the file to 14GB. Perhaps an extension is writing lots of entries somewhere. I also have 260k errors in the UI related to 'no NTLM challenge recieved from proxy server', so I'm not sure whether that is being logged somewhere. Even with the NTLM errors, everything is working fine. It would be great if there were a file analyzer to determine what is using all the space in the Burp project file and to do a cleanup, dedupe or something similar.

Ben, PortSwigger Agent | Last updated: May 26, 2022 09:43AM UTC

Hi Andrew, I can see that you have sent us an email around this issue and are already in discussion with one of my colleagues - it is probably easier to continue this discussion via that medium.

Syed | Last updated: Oct 13, 2024 01:57AM UTC