
Deduplicate 404-like/SPA False-Positive URLs #2809

@TrebledJ


Currently, it appears that 404-like responses are not filtered out. (I'm not aware whether this feature already exists or is in the works.)

For instance, consider:

  • a URL responds with 200 OK, but the content is "Page not found" (non-HTML).
  • a React SPA returns the same HTML for all routes.
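Both cases share one property: the server gives essentially the same answer no matter which URL was requested, apart from possibly echoing the path back. As a minimal sketch, assuming only the status code and response body are available, a hypothetical `fingerprint()` helper (not an existing BBOT function) could hash the body after blanking out the requested path, so that such responses compare equal:

```python
import hashlib
from typing import NamedTuple
from urllib.parse import urlsplit


class Fingerprint(NamedTuple):
    """Hypothetical summary of a response: status code plus body hash."""
    status: int
    body_sha256: str


def fingerprint(url: str, status: int, body: str) -> Fingerprint:
    """Summarise a response so duplicate 'not found' pages compare equal."""
    path = urlsplit(url).path
    # Soft-404 pages often echo the requested path ("Page not found: /x1"),
    # which would give every miss a different hash, so blank it out first.
    # Caveat: this naive replace also strips the path inside longer strings.
    if path not in ("", "/"):
        body = body.replace(path, "")
    return Fingerprint(status, hashlib.sha256(body.encode("utf-8")).hexdigest())


if __name__ == "__main__":
    shell = '<html><div id="root"></div><script src="/static/app.js"></script></html>'
    # SPA case: the same shell is served for every route.
    print(fingerprint("https://example.com/about", 200, shell)
          == fingerprint("https://example.com/admin/users", 200, shell))  # prints True
    # "Page not found" case: a 200 OK body that echoes the path.
    print(fingerprint("https://example.com/x1", 200, "Page not found: /x1")
          == fingerprint("https://example.com/zz9", 200, "Page not found: /zz9"))  # prints True
```

Exact hashing is cheap but brittle: any per-request content (timestamps, CSRF tokens, request IDs) breaks the match, so pages like that would need a more tolerant comparison.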

In these cases, it would be useful to deduplicate URL responses to minimise noise. This would also help when combined with "probing" modules such as ffuf, robots, or git, where the HTTP response is a non-HTML 200 OK that is actually a false positive. (Although in the ffuf case, we wouldn't have access to the HTTP response... perhaps it's time to handroll one? :') But that's a separate issue.)

feroxbuster has this kind of 404 detection, which is very helpful for removing FPs (implementation here, for reference).

Implementation-wise, I'm thinking of two functions, but I'm not sure where to put them. Perhaps in the Scanner class?

  • detect_404_like_response(url: str) - accepts a base URL, attempts to detect 404 patterns, then registers them in a global table
  • dedup_404_like_response() -> bool - lets modules query whether an HTTP response is 404-like and, if so, choose not to emit a URL/FINDING/CODE_REPOSITORY/whatever event
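A rough sketch of how the two proposed functions might fit together, assuming a hypothetical `NotFoundRegistry` object held by the scanner and given some `fetch(url) -> (status, body)` callable; neither is an existing BBOT API. Here `dedup_404_like_response` takes the URL, status and body as arguments, since it needs something to compare against (the proposed signature above shows no parameters). Responses are compared by a coarse signature of status code, word count and line count, loosely in the spirit of feroxbuster-style filtering rather than a port of its implementation:

```python
import secrets
from typing import Callable, Dict, Set, Tuple
from urllib.parse import urlsplit

# Hypothetical fetcher: takes a URL, returns (status_code, body_text).
Fetch = Callable[[str], Tuple[int, str]]
# (status code, word count, line count) of a response body.
Signature = Tuple[int, int, int]


def _signature(status: int, body: str) -> Signature:
    return status, len(body.split()), len(body.splitlines())


def _origin(url: str) -> str:
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"


class NotFoundRegistry:
    """Hypothetical global table of 404-like signatures, keyed by origin."""

    def __init__(self, fetch: Fetch, probes: int = 3) -> None:
        self._fetch = fetch
        self._probes = probes
        self._table: Dict[str, Set[Signature]] = {}

    def detect_404_like_response(self, url: str) -> None:
        # Request a few random paths that almost certainly don't exist
        # and record how the server answers them.
        origin = _origin(url)
        known = self._table.setdefault(origin, set())
        for _ in range(self._probes):
            probe = f"{origin}/{secrets.token_hex(8)}"
            status, body = self._fetch(probe)
            known.add(_signature(status, body))

    def dedup_404_like_response(self, url: str, status: int, body: str) -> bool:
        # True if the response matches a recorded "miss" for the same origin,
        # i.e. the calling module should probably not emit an event for it.
        known = self._table.get(_origin(url), set())
        return _signature(status, body) in known
```

Counting words and lines tolerates pages that echo the requested path or embed a fixed-length token, but it can also collide with a genuine page of the same shape, so a real implementation would likely want a stricter comparison before suppressing events.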

Thoughts?

Metadata


Labels: enhancement (New feature or request)
