Burp Suite User Forum


Page Deduplication

Kris | Last updated: Feb 26, 2016 10:07AM UTC

Some applications offer a large set of pages that present different data but are all based on the same template. This can result in thousands of pages in scope that are basically irrelevant. There should be some way of getting rid of similar pages, or of analyzing the whole scope to sort out the unique ones. Gryffin from Yahoo already does something similar, and something like MinHash would probably do the job. https://github.com/yahoo/gryffin
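
To make the MinHash idea concrete, here is a rough sketch in Python. It is not how Burp or Gryffin actually work; the function names, the 3-word shingles, and the signature length of 64 are all assumptions chosen for illustration. Each page body is reduced to a short signature, and the fraction of matching signature positions estimates how similar two pages are.

```python
import hashlib
import re

NUM_HASHES = 64  # signature length (assumed value; more hashes = better estimate)


def shingles(text, k=3):
    """Return the set of overlapping k-word shingles found in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _seeded_hash(seed, item):
    """Hash item with a seed, simulating one of many independent hash functions."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(text, num_hashes=NUM_HASHES):
    """Build a MinHash signature: the minimum hash of all shingles, per seed."""
    items = shingles(text)
    if not items:
        return [0] * num_hashes  # placeholder signature for empty pages
    return [min(_seeded_hash(seed, s) for s in items) for seed in range(num_hashes)]


def estimated_similarity(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching signature slots."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)
```

A crawler could keep the signatures of pages it has already seen and skip any new page whose estimated similarity to a known one exceeds some threshold (say 0.8, again just an assumed value that would need tuning per application).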

PortSwigger Agent | Last updated: Feb 26, 2016 01:35PM UTC

We are currently working on some enhancements to the Spider tool which will enable it to recognize when this phenomenon occurs and treat all the relevant responses as "the same" despite the different URLs, and (eventually) stop following new URLs that are likely to lead to other instances of the same response. This is a major ongoing task and it is likely to be a few months until the capability is ready for release.
