Labels: api-suggestion (Early API idea and discussion, it is NOT ready for implementation), area-System.Text.Encoding
Description
Background and motivation
.NET already detects an encoding from the first few bytes of the input, in the StreamReader class:
private void DetectEncoding()
This functionality is tightly coupled with StreamReader. I think it can be extracted into its own API, so developers won't need to write their own implementations of varying precision and complexity.
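To illustrate what such hand-written implementations usually look like, here is a minimal sketch of preamble (byte order mark) detection. The PreambleDetector class is hypothetical and exists only for this example; it is not an existing .NET API, and it is not claimed to match StreamReader's internal logic exactly. It only recognizes the UTF-8, UTF-16 and UTF-32 preambles.

```csharp
#nullable enable
using System;
using System.Text;

// Hypothetical helper for illustration only; not part of .NET.
public static class PreambleDetector
{
    public static bool TryDetectFromUnicodePreamble(ReadOnlySpan<byte> input, out Encoding? encoding)
    {
        // UTF-32 LE (FF FE 00 00) must be checked before UTF-16 LE (FF FE),
        // because the shorter preamble is a prefix of the longer one.
        if (input.Length >= 4 && input[0] == 0xFF && input[1] == 0xFE && input[2] == 0x00 && input[3] == 0x00)
        {
            encoding = Encoding.UTF32; // UTF-32 little-endian
            return true;
        }
        if (input.Length >= 4 && input[0] == 0x00 && input[1] == 0x00 && input[2] == 0xFE && input[3] == 0xFF)
        {
            encoding = new UTF32Encoding(bigEndian: true, byteOrderMark: true);
            return true;
        }
        if (input.Length >= 3 && input[0] == 0xEF && input[1] == 0xBB && input[2] == 0xBF)
        {
            encoding = Encoding.UTF8;
            return true;
        }
        if (input.Length >= 2 && input[0] == 0xFF && input[1] == 0xFE)
        {
            encoding = Encoding.Unicode; // UTF-16 little-endian
            return true;
        }
        if (input.Length >= 2 && input[0] == 0xFE && input[1] == 0xFF)
        {
            encoding = Encoding.BigEndianUnicode; // UTF-16 big-endian
            return true;
        }

        // No known preamble found; the caller decides what to do next.
        encoding = null;
        return false;
    }
}
```

Note that the byte sequence FF FE 00 00 is ambiguous (it could also be a UTF-16 LE preamble followed by a NUL character); this sketch resolves it in favor of UTF-32. Also keep in mind that Encoding.GetString does not strip the preamble, so a caller would typically skip encoding.GetPreamble().Length bytes before decoding.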
API Proposal
namespace System.Text;
public class Encoding
{
    public static bool TryDetectFromUnicodePreamble(ReadOnlySpan<byte> input, out Encoding? encoding);
}
API Usage
string DecodeUserProvidedBytes(ReadOnlySpan<byte> bytes)
{
    var encoding = Encoding.TryDetectFromUnicodePreamble(bytes, out var detectedEncoding)
        ? detectedEncoding
        : Encoding.UTF8; // some sane default guess
    return encoding.GetString(bytes);
}
Alternative Designs
Add an API that takes a fallback encoding, e.g. Encoding Encoding.DetectFromUnicodePreamble(ReadOnlySpan<byte> input, Encoding fallbackEncoding). The problems I see here are:
- Should the fallback parameter be nullable? If yes, what should the behavior be?
  - Throw ArgumentNullException only if detection didn't succeed. This is bad because we would be throwing an argument validation exception depending on user input, which is not deterministic.
  - Always check it for null. But suppose the bytes are not arbitrary user input and instead come from a source with deterministic rules, so the API user knows that detection based on the first few bytes always succeeds and only the detected encoding varies. Now we are forcing the user to pass a non-null dummy value that is never used.
  - Don't check the fallback for null at all. But that would mean returning a nullable value from the method, which defeats its original purpose (call and always get a result).
- Such a shape would limit the fallback behavior the user can provide. For instance, say the user is working on some kind of text editor app, and when the encoding cannot be determined they want the app user to select one from a combo box. With this API shape, such behavior would not be possible without hacks.
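One way to see the trade-off: the fallback-taking shape can be layered on top of the Try pattern in a line or two, while the reverse is not possible without a sentinel value. The sketch below assumes a stand-in TryDetect method (hypothetical, and checking only the UTF-8 and UTF-16 preambles for brevity) in place of the proposed API; DetectOrDefault and DetectOrAsk are likewise illustrative names, not proposed APIs.

```csharp
#nullable enable
using System;
using System.Text;

// Hypothetical helpers for illustration only; not part of .NET.
public static class FallbackShapes
{
    // Stand-in for the proposed Try-style API (UTF-8 and UTF-16 preambles only).
    public static bool TryDetect(ReadOnlySpan<byte> input, out Encoding? encoding)
    {
        if (input.Length >= 3 && input[0] == 0xEF && input[1] == 0xBB && input[2] == 0xBF)
        {
            encoding = Encoding.UTF8;
            return true;
        }
        if (input.Length >= 2 && input[0] == 0xFF && input[1] == 0xFE)
        {
            encoding = Encoding.Unicode;
            return true;
        }
        if (input.Length >= 2 && input[0] == 0xFE && input[1] == 0xFF)
        {
            encoding = Encoding.BigEndianUnicode;
            return true;
        }
        encoding = null;
        return false;
    }

    // The alternative design, expressed as a thin wrapper over the Try pattern.
    public static Encoding DetectOrDefault(ReadOnlySpan<byte> input, Encoding fallback)
        => TryDetect(input, out var detected) ? detected! : fallback;

    // A lazily evaluated fallback, e.g. prompting the app user with a combo box.
    // The callback runs only when detection fails.
    public static Encoding DetectOrAsk(ReadOnlySpan<byte> input, Func<Encoding> askUser)
        => TryDetect(input, out var detected) ? detected! : askUser();
}
```

With the Try pattern as the primitive, callers who want an eager default, a lazy prompt, or an error can each write the wrapper that fits their case.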
Risks
- With the addition of this API, the .NET runtime takes responsibility for detecting as many common encodings as possible. Given that this functionality already exists in the runtime, I don't consider that risk to be huge.