The Burp Suite User Forum was discontinued on the 1st November 2024.

Burp Suite User Forum


Cross-domain crawling: another port is ignored even when added to the scope

Oleksii | Last updated: Apr 17, 2023 03:15PM UTC

Imagine we have a webapp named `domain.com` with:

```
<a href="http://domain.com/">part 80</a><br>
<a href="http://domain.com:81/">part 81</a><br>
<a href="http://sub.domain.com/">sub part 80</a><br>
<a href="http://sub.domain.com:81/">sub part 81</a><br>
```

and a set of advanced scope rules:

```
{ "enabled": true, "protocol": "any", "host": "^domain\\.com$", "port": "^80|81$" }
{ "enabled": true, "protocol": "any", "host": "^sub\\.domain\\.com$", "port": "^80|81$" }
```

1. When starting the scan from `http://domain.com/`, it scans only `domain.com` and `sub.domain.com`; `domain.com:81` and `sub.domain.com:81` are ignored.
2. When `http://domain.com/` responds with `302 Found` redirecting to `http://domain.com:81/`, it scans both `domain.com` and `domain.com:81`.

The concern is that `domain.com:81` (and `sub.domain.com:81`) are ignored unless the webapp explicitly redirects to them, even when they are added to the scope. Is this behavior expected?

Thank you.
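A side note on the `port` pattern in the scope rules: under the usual regex alternation precedence, `^80|81$` reads as "starts with 80" or "ends with 81", not "exactly 80 or 81". The sketch below uses Go's `regexp` package to show the difference; it assumes Burp applies the same precedence, and the helper `portMatches` is a hypothetical name. Both 80 and 81 match either pattern, so this does not explain the crawl behavior discussed here, but `^(80|81)$` is the tighter form.

```go
package main

import (
	"fmt"
	"regexp"
)

// portMatches reports whether the port string matches the given pattern.
// Hypothetical helper for illustration only.
func portMatches(pattern, port string) bool {
	return regexp.MustCompile(pattern).MatchString(port)
}

func main() {
	for _, port := range []string{"80", "81", "8080", "181"} {
		fmt.Printf("%-5s loose=%v strict=%v\n", port,
			portMatches(`^80|81$`, port),   // (^80) OR (81$)
			portMatches(`^(80|81)$`, port)) // exactly 80 or 81
	}
}
```

With the loose pattern, ports such as 8080 and 181 would also fall in scope.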

Michelle, PortSwigger Agent | Last updated: Apr 18, 2023 12:39PM UTC

Hi,

When you configure a scan using Burp Suite Professional, the 'URLs to scan' section defines the start URL from which Burp will begin to crawl. It will then follow links from any URLs listed in this section. So if there are no links to sub.domain.com:81, it would not crawl to that location.

https://portswigger.net/burp/documentation/desktop/automated-scanning/setting-pro-scope

I hope this helps. Please let me know if you have any further questions.

Oleksii | Last updated: Apr 18, 2023 01:34PM UTC

So, just to clarify: it will ignore links to `domain.com:81` even if I have added it to the scope, unless `http://domain.com:81` is added to the list of `URLs to scan`, right?

Michelle, PortSwigger Agent | Last updated: Apr 18, 2023 03:27PM UTC

If there are links to http://domain.com:81 from one of the URLs listed under URLs to scan, they will be followed as they are in scope. If there are no links to http://domain.com:81, the scanner will not use http://domain.com:81 as a starting point for a crawl unless it is included in URLs to scan. I hope that makes sense.

Oleksii | Last updated: Apr 19, 2023 12:03PM UTC

Thank you, Michelle, for the detailed explanation, but I see slightly different behavior. All the details are below.

I launch Burp Suite Pro v2023.3.3 and a test webapp inside a docker-compose environment using this config:

```
version: '3'
services:
  domain.com:
    build: {context: webapp}
  scanner:
    image: burp-suite
    depends_on:
      domain.com:
        condition: service_healthy
```

The source code (Golang) for the webapp is:

```
package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	go func() { log.Fatal(http.ListenAndServe(":80", &h{})) }()
	log.Fatal(http.ListenAndServe(":81", &h{}))
}

type h struct{}

func (h) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	defer func() { _ = r.Body.Close() }()
	log.Printf("%s %s %s", r.Host, r.Method, r.URL.Path)
	w.Header().Set("Content-Type", "text/html")
	_, _ = w.Write([]byte(fmt.Sprintf(
		`<html>
<head><title>Webapp %[1]s</title></head>
<body><h1>Webapp - %[1]s</h1>
<a href="http://domain.com/">part 80</a><br>
<a href="http://domain.com:81/">part 81</a><br>
<hr>
<a href="/a">sub a</a><br>
<a href="/b">sub b</a><br>
<p>%[2]s
</body></html>`,
		r.URL.Path, r.URL.RawQuery)))
}
```

As you can see, the webapp listens on both TCP ports 80 and 81, and every page has links to:

* http://domain.com/
* http://domain.com:81/

I start the scan using the Burp Suite REST API with this configuration:

```
{
  "urls": [ "http://domain.com/" ],
  "scope": {
    "type": "AdvancedScope",
    "include": [
      {
        "enabled": true,
        "protocol": "any",
        "host": "^domain\\.com$",
        "port": "^80|81$",
        "file": "^/.*$"
      }
    ],
    "exclude": []
  },
  "scan_configurations": [
    {
      "type": "CustomConfiguration",
      "config": "{\"crawler\":{\"crawl_limits\":{\"maximum_crawl_time\":180,\"maximum_request_count\":0,\"maximum_unique_locations\":5000},\"crawl_optimization\":{\"breadth_first_hop_limit\":8,\"crawl_strategy\":\"normal\",\"crawl_strategy_customized\":true,\"function_fingerprint_threshold\":0,\"link_fingerprint_trigger\":0,\"logging_directory\":\"\",\"logging_enabled\":false,\"max_state_changing_sequences\":0,\"max_submissions_per_form\":1,\"max_unmatched_link_tolerance\":0,\"maximum_link_depth\":10,\"recent_destinations_buffer_size\":5},\"customization\":{\"browser_based_navigation_mode\":\"yes\",\"customize_user_agent\":true,\"maximum_items_from_sitemap\":1000,\"maximum_speculative_links\":1000,\"request_robots_txt\":true,\"request_sitemap\":true,\"request_speculative\":true,\"submit_forms\":true,\"user_agent\":\"User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)\"},\"error_handling\":{\"number_of_follow_up_passes\":1,\"pause_task_requests_timed_out_count\":10,\"pause_task_requests_timed_out_percentage\":0},\"login_functions\":{\"attempt_to_self_register_a_user\":false,\"trigger_login_failures\":true}}}"
    },
    {
      "type": "CustomConfiguration",
      "config": "{\"scanner\":{\"issues_reported\":{\"scan_type_intrusive_active\":true,\"scan_type_javascript_analysis\":true,\"scan_type_light_active\":true,\"scan_type_medium_active\":true,\"scan_type_passive\":true,\"select_individual_issues\":true},\"error_handling\":{\"consecutive_audit_check_failures_to_skip_insertion_point\":5,\"consecutive_insertion_point_failures_to_fail_audit_item\":5,\"number_of_follow_up_passes\":1,\"pause_task_requests_timed_out_count\":0,\"pause_task_requests_timed_out_percentage\":0}}}"
    }
  ]
}
```

As you can see, the scope includes both `http://domain.com/` and `http://domain.com:81/`. However, during the scan Burp Suite doesn't follow the `http://domain.com:81/` link and crawls/audits only the `http://domain.com/` part of the webapp. I clearly see this in the webapp's logs and in the Burp Suite logs (written by the `IHttpListener.processHttpMessage()` Java hook).

Another interesting finding: when the webapp responds with `HTTP 302 Found` and `Location: http://domain.com:81`, Burp Suite does follow it and effectively scans both `domain.com` and `domain.com:81`.
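The redirect variant mentioned above is not shown in the post. A minimal sketch of what such a handler might look like follows; `redirectTo81` and `probe` are hypothetical names, and preserving the request path in the `Location` header is an assumption (the post only says the header points at `http://domain.com:81`). The handler is exercised in-process with `net/http/httptest`, so no listener is needed to try it.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// redirectTo81 answers every request with 302 Found, pointing at the same
// path on port 81. Hypothetical handler, not the original poster's code.
func redirectTo81(w http.ResponseWriter, r *http.Request) {
	http.Redirect(w, r, "http://domain.com:81"+r.URL.Path, http.StatusFound)
}

// probe runs the handler in-process and returns the status code and the
// Location header of the response.
func probe(path string) (int, string) {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "http://domain.com"+path, nil)
	redirectTo81(rec, req)
	return rec.Code, rec.Header().Get("Location")
}

func main() {
	code, loc := probe("/a")
	fmt.Println(code, loc)
}
```

To serve it on a real port, `http.HandlerFunc(redirectTo81)` could be passed to `http.ListenAndServe(":80", ...)` in place of the port 80 handler in the webapp.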

Michelle, PortSwigger Agent | Last updated: Apr 20, 2023 10:29AM UTC

Hi,

Thanks for the additional detail about the pages. From the previous descriptions, I hadn't picked up on the fact that each page contained a link.

If you crawl the website by creating a scan through the UI and check the live crawl view and the target site map, do you see the links being followed? If your app includes a link like http://domain.com/foo, is that link followed?

If it's easier to send us some screenshots of the scan results to help show what you're seeing, please feel free to send an email to support@portswigger.net.

Oleksii | Last updated: Apr 20, 2023 04:11PM UTC

Hi Michelle,

Every page of the webapp contains:

```
<a href="/a">sub a</a><br>
<a href="/b">sub b</a><br>
```

which gives the site a tree structure like:

```
domain.com/
|- a/
|  |- a/
|  |  |- a/
|  |  |  ...
|  |  \- b/
|  |     ...
|  \- b/
|     ...
\- b/
   |- a/
   |  ...
   \- b/
      ...
```

All of them are followed and scanned perfectly by Burp Suite, and I'm pretty sure `/foo/` would be followed and scanned as well.

Michelle, PortSwigger Agent | Last updated: Apr 21, 2023 11:29AM UTC

Hi,

To help me better understand how your site is structured, would you be happy to perform a crawl only of the site and send me the resulting Burp project file? I can then see in the site map which links have been followed. If you can send it to support@portswigger.net, that would be great :)

Oleksii | Last updated: Apr 28, 2023 11:39AM UTC

Finalizing the conversation with extracts from the email thread. You wrote:

> We do deduplicate links based on the URL ignoring their ports (and protocols), this is intentional to help prevent wasting time crawling on both https and http. If the file were different, so for example you visited the file index.html on http://domain.com but the file test.html on http://domain.com:81, then both links would be visited.

I changed the webapp to use the link `http://domain.com:81/page` instead of `http://domain.com:81/`, and I confirm that `http://domain.com:81/` is now crawled and scanned.

Could you please describe this behavior in the documentation, or provide a link to an existing description if one exists?

Thank you!
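The deduplication described in the quote can be pictured with a small sketch. This is not Burp's implementation, only an illustration under the assumption that the key is "hostname plus path" with scheme and port dropped; `dedupKey` and `firstSeen` are hypothetical names. It shows why `http://domain.com:81/` is skipped after `http://domain.com/` has been seen, while `http://domain.com:81/page` is kept.

```go
package main

import (
	"fmt"
	"net/url"
)

// dedupKey builds a key from the hostname and path only, dropping the scheme
// and the port. Hypothetical: it mirrors the quoted description, not Burp's code.
func dedupKey(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	path := u.Path
	if path == "" {
		path = "/" // treat "http://host" and "http://host/" as the same location
	}
	return u.Hostname() + path, nil
}

// firstSeen returns, in order, the links whose key has not been seen before.
// Links that fail to parse are skipped.
func firstSeen(links []string) []string {
	seen := map[string]bool{}
	var out []string
	for _, l := range links {
		k, err := dedupKey(l)
		if err != nil || seen[k] {
			continue
		}
		seen[k] = true
		out = append(out, l)
	}
	return out
}

func main() {
	fmt.Println(firstSeen([]string{
		"http://domain.com/",
		"http://domain.com:81/",     // same key as the first link, dropped
		"http://domain.com:81/page", // different path, kept
	}))
	// prints [http://domain.com/ http://domain.com:81/page]
}
```

Under this model, a link to a distinct path on the second port is enough to get the crawler onto that port, which matches the workaround reported above.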

Michelle, PortSwigger Agent | Last updated: May 02, 2023 08:47AM UTC