16 changes: 12 additions & 4 deletions docs/standard/base-types/best-practices-regex.md
@@ -16,15 +16,20 @@ ms.assetid: 618e5afb-3a97-440d-831a-70e4c526a51c

The regular expression engine in .NET is a powerful, full-featured tool that processes text based on pattern matches rather than on comparing and matching literal text. In most cases, it performs pattern matching rapidly and efficiently. However, in some cases, the regular expression engine can appear to be slow. In extreme cases, it can even appear to stop responding as it processes a relatively small input over the course of hours or even days.

This article outlines some of the best practices that developers can adopt to ensure that their regular expressions achieve optimal performance.
This article outlines some of the best practices that developers can adopt to ensure that their regular expressions achieve optimal performance and robustness.

[!INCLUDE [regex](../../../includes/regex.md)]
> [!WARNING]
> Unrestricted use of regular expressions with untrusted input can subject applications to [denial-of-service attacks](https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS). The .NET regular expression engine offers options to mitigate these attacks. For more information, see [Consider the input source](#consider-the-input-source), [Take charge of backtracking](#take-charge-of-backtracking), and [Use time-out values](#use-time-out-values).
>
> The .NET regular expression engine does not offer protection against untrusted *patterns*, and applications should not create regular expression patterns from untrusted user-provided values. For more information, see [Use trusted patterns](#use-trusted-patterns).

## Use trusted patterns

The .NET regular expression engine is designed with the assumption that patterns are trusted, that is, they are authored or reviewed by the application developer, not supplied by end users or other untrusted sources. Patterns can cause excessive resource consumption regardless of the input text, and the regular expression engine does not attempt to guard against hostile patterns.
The .NET regular expression engine distinguishes *patterns* (the regular expression itself, such as `^[0-9A-Za-z]+$`) from the *input text* (the string being evaluated against the regular expression, such as `123AbC456`). These values are typically passed to the `Regex` APIs through arguments named *pattern* and *input*, respectively.

These APIs are designed with the assumption that patterns are trusted, that is, they are authored or reviewed by the application developer, not supplied by end users or other untrusted sources. Patterns can cause excessive resource consumption regardless of the input text, and the regular expression engine does not attempt to guard against hostile patterns.

If your application needs to accept search expressions from users, avoid passing user input directly as a regex pattern. Instead, consider these alternatives:
If your application needs to accept search expressions from users, avoid passing user-provided values directly as a regex pattern. Instead, consider these alternatives:

- Support a restricted search syntax (such as simple wildcards or substring matching) that you translate into a regex pattern internally.
- Use <xref:System.Text.RegularExpressions.Regex.Escape*?displayProperty=nameWithType> to treat any user-supplied text as a literal string within a pattern.
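
The two alternatives above can be combined. The following is a minimal sketch, not part of the documented guidance: `WildcardToRegex` is a hypothetical helper that accepts a restricted syntax in which only `*` and `?` are wildcards, escapes everything else with `Regex.Escape`, and builds the pattern internally. The one-second time-out is an assumed value chosen for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not a .NET API): treats the user's text as a literal
// string except for '*' (any run of characters) and '?' (any one character).
static Regex WildcardToRegex(string userQuery)
{
    // Regex.Escape turns every metacharacter into a literal, including
    // '*' -> "\*" and '?' -> "\?".
    string escaped = Regex.Escape(userQuery);

    // Re-enable only the two wildcards this restricted syntax supports.
    string pattern = "^" + escaped.Replace(@"\*", ".*").Replace(@"\?", ".") + "$";

    // Assumed time-out: several '*' wildcards can still backtrack heavily.
    return new Regex(
        pattern,
        RegexOptions.CultureInvariant | RegexOptions.Singleline,
        TimeSpan.FromSeconds(1));
}

Regex search = WildcardToRegex("report-*.txt");
Console.WriteLine(search.IsMatch("report-2024.txt"));
Console.WriteLine(WildcardToRegex("(a+)+").IsMatch("(a+)+"));
```

Because the user's text never reaches the engine unescaped, constructs such as nested quantifiers are matched literally rather than interpreted. The generated pattern can still contain several `.*` segments, so the sketch keeps a time-out rather than relying on the translation alone.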
@@ -222,6 +227,9 @@ The regular expression time-out interval defines the period of time that the reg

If you've defined a time-out interval and a match isn't found at the end of that interval, the regular expression method throws a <xref:System.Text.RegularExpressions.RegexMatchTimeoutException> exception. In your exception handler, you can choose to retry the match with a longer time-out interval, abandon the match attempt and assume that there's no match, or abandon the match attempt and log the exception information for future analysis.
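
As a minimal sketch of the "abandon and log" choice described above, the hypothetical `TryIsMatch` helper below passes a time-out to the static `Regex.IsMatch` overload and reports a time-out as `null` rather than as "no match". The helper name, the return convention, and the one-second value are assumptions made for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns true/false for a completed match attempt,
// or null when the engine gave up because the time-out elapsed.
static bool? TryIsMatch(string input, string pattern, TimeSpan timeout)
{
    try
    {
        return Regex.IsMatch(input, pattern, RegexOptions.None, timeout);
    }
    catch (RegexMatchTimeoutException ex)
    {
        // Log enough detail for later analysis, then abandon the attempt.
        Console.Error.WriteLine(
            $"Pattern '{ex.Pattern}' timed out after {ex.MatchTimeout}.");
        return null;
    }
}

bool? result = TryIsMatch("abc123", @"^[a-z]+\d+$", TimeSpan.FromSeconds(1));
Console.WriteLine(result);
```

Returning `null` keeps "timed out" distinct from "didn't match", so the caller can decide whether to retry with a longer interval or to reject the input.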

> [!WARNING]
> Time-out values are not intended as a security boundary against malicious *patterns*. For more information, see [Use trusted patterns](#use-trusted-patterns).

The following example defines a `GetWordData` method that instantiates a regular expression with a time-out interval of 350 milliseconds to calculate the number of words and average number of characters in a word in a text document. If the matching operation times out, the time-out interval is increased by 350 milliseconds and the <xref:System.Text.RegularExpressions.Regex> object is reinstantiated. If the new time-out interval exceeds one second, the method rethrows the exception to the caller.

[!code-csharp[Conceptual.RegularExpressions.BestPractices#12](./snippets/regex/csharp/timeout1.cs#12)]
5 changes: 4 additions & 1 deletion docs/standard/base-types/regular-expression-options.md
@@ -398,7 +398,10 @@ The following example is identical to the previous example, except that the stat

## NonBacktracking mode

By default, .NET's regex engine uses *backtracking* to try to find pattern matches. A backtracking engine is one that tries to match one pattern, and if that fails, goes backs and tries to match an alternate pattern, and so on. A backtracking engine is very fast for typical cases, but slows down as the number of pattern alternations increases, which can lead to *catastrophic backtracking*. The <xref:System.Text.RegularExpressions.RegexOptions.NonBacktracking?displayProperty=nameWithType> option, which was introduced in .NET 7, doesn't use backtracking and avoids that worst-case scenario. Its goal is to provide consistently good behavior, regardless of the input being searched.
By default, .NET's regex engine uses *backtracking* to try to find pattern matches. A backtracking engine is one that tries to match one pattern, and if that fails, goes back and tries to match an alternate pattern, and so on. A backtracking engine is very fast for typical cases, but slows down as the number of pattern alternations increases, which can lead to *catastrophic backtracking*. The <xref:System.Text.RegularExpressions.RegexOptions.NonBacktracking?displayProperty=nameWithType> option, which was introduced in .NET 7, doesn't use backtracking and avoids that worst-case scenario. Its goal is to provide consistently good behavior, regardless of the input being searched.

> [!WARNING]
> The .NET regex engine assumes the *pattern* is trusted. NonBacktracking mode doesn't change this assumption: it guards against expensive *input*, not against actively malicious *patterns*. For more information, see [Use trusted patterns](best-practices-regex.md#use-trusted-patterns).
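
The following sketch shows how the option might be applied to a developer-authored pattern that is evaluated against hostile input. The pattern and the input are illustrative assumptions, and the code requires .NET 7 or later.

```csharp
using System;
using System.Text.RegularExpressions;

// A trusted, developer-authored pattern with nested quantifiers, which is
// a classic trigger for heavy backtracking in a backtracking engine.
var regex = new Regex(@"^(a+)+$", RegexOptions.NonBacktracking);

// Input crafted to almost match: a run of 'a' followed by a mismatch.
string hostile = new string('a', 40) + "!";

Console.WriteLine(regex.IsMatch(hostile)); // False
Console.WriteLine(regex.IsMatch("aaaa"));  // True
```

The option is selected when the trusted pattern is constructed; the input can then vary freely. It doesn't make it safe to build the pattern itself from user-provided values.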

The <xref:System.Text.RegularExpressions.RegexOptions.NonBacktracking?displayProperty=nameWithType> option doesn't support everything the other built-in engines support. In particular, the option can't be used in conjunction with <xref:System.Text.RegularExpressions.RegexOptions.RightToLeft?displayProperty=nameWithType>, <xref:System.Text.RegularExpressions.RegexOptions.ECMAScript?displayProperty=nameWithType>, or <xref:System.Text.RegularExpressions.RegexOptions.AnyNewLine?displayProperty=nameWithType>. It also doesn't allow for the following constructs in the pattern:

2 changes: 1 addition & 1 deletion includes/regex.md
@@ -1,3 +1,3 @@

> [!WARNING]
> When using <xref:System.Text.RegularExpressions> to process untrusted input, pass a timeout. A malicious user can provide input to `RegularExpressions`, causing a [Denial-of-Service attack](https://www.cisa.gov/news-events/news/understanding-denial-service-attacks). ASP.NET Core framework APIs that use `RegularExpressions` pass a timeout.
> Unrestricted use of <xref:System.Text.RegularExpressions> with untrusted input can subject applications to [denial-of-service attacks](https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS). Consult [Best practices for regular expressions in .NET](/dotnet/standard/base-types/best-practices-regex) for guidance on how to safely use .NET regular expressions with untrusted input.