Description
Currently, 404-like responses do not appear to be filtered out. (I'm not aware whether this feature already exists or is in the works.)
For instance, consider:
- a URL responds with 200 OK, but the content is "Page not found" (non-HTML).
- a React SPA returns the same HTML for all routes
In these cases, it would be useful to deduplicate URL responses to minimise noise. This would also help when combined with "probing" modules such as ffuf, robots, or git, where the HTTP response is a non-HTML 200 OK but is actually a false positive (FP). (In the ffuf case we wouldn't have access to the HTTP response... perhaps it's time to hand-roll one? :') But that's a separate issue.)
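A minimal sketch of the deduplication idea, in Python. ResponseDeduplicator and normalise_body are hypothetical names, not an existing API. The sketch assumes a response can be reduced to a per-host hash of its body after stripping the requested path (soft-404 pages often echo it back) and collapsing whitespace, so both the "Page not found" case and the SPA case collapse to a single hash.

```python
import hashlib
import re
from urllib.parse import urlsplit


def normalise_body(body: str, path: str) -> str:
    """Strip volatile parts so near-identical soft-404 pages hash the same."""
    # Soft-404 pages often echo the requested path back; drop it (crude string replace).
    if path and path != "/":
        body = body.replace(path, "")
    # Collapse runs of whitespace so formatting noise does not matter.
    return re.sub(r"\s+", " ", body).strip()


class ResponseDeduplicator:
    """Hypothetical helper: remembers normalised body hashes per host."""

    def __init__(self) -> None:
        self._seen: dict[str, set[str]] = {}

    def is_duplicate(self, url: str, body: str) -> bool:
        """Return True if this host has already returned an equivalent body."""
        parts = urlsplit(url)
        normalised = normalise_body(body, parts.path)
        digest = hashlib.sha256(normalised.encode()).hexdigest()
        hashes = self._seen.setdefault(parts.netloc, set())
        if digest in hashes:
            return True
        hashes.add(digest)
        return False
```

The first response from each host is never flagged, so this only suppresses repeats; it does not decide on its own whether that first response was a real page. Replacing the raw path string is deliberately simple and could also remove unrelated substrings of the body.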
feroxbuster has this kind of 404 detection, which is very helpful for removing FPs (implementation here, for reference).
Implementation-wise, I'm thinking of two functions, but I'm not sure where they should live. Perhaps in the Scanner class?
- detect_404_like_response(url: str): accepts base URLs, attempts to detect 404 patterns, then registers them in a global table
- dedup_404_like_response() -> bool: lets modules query whether an HTTP response is 404-like and, if so, choose not to emit a URL/FINDING/CODE_REPOSITORY/whatever event
Thoughts?