Encourage always-escaping ampersand character. #11988
In the example highlighting ambiguities from missing semicolons on named character references, a "correct" encoding is provided, but that example makes no mention of the fact that the fragment was ambiguous precisely because the ampersand wasn't escaped. This patch adds a clarifying note explaining how this situation is avoided by always escaping the ampersand. Co-authored-by: Jon Surrell <[email protected]> GitHub-PR: 11988 GitHub-PR-URL: whatwg#11988
d1fb385 to 9753779
|
As a side note, I overlooked adding my name to the list of contributors in my first submission. |
|
I was surprised to find no recommendation about escaping
I read this as if I would change this section to something like the following:
-<!-- &ted is ok, since it's not a named character reference -->
+<!-- "&ted" is ok because "ted" is not a named character reference.
+<!-- "&amp;ted" is equivalent and less error-prone because "&amp;" explicitly decodes to "&". -->
There is precedent for such a recommendation. Section 4.12.1.3 Restrictions for contents of script elements has a prominent note with an encoding recommendation:
Section 13.1.4 Character references seems like a good place to add a similar note. For example:
Note: Where character references are allowed, it's a good idea to always encode
I would consider mentioning the most common characters that are useful to escape in different contexts, but the note about |
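To make the "always escape the ampersand" recommendation concrete, here is a minimal sketch of the encoder side. It uses Python's standard `html` module as a stand-in for an HTML serializer and decoder; `encode_text` is a hypothetical helper name, and the sample strings are illustrative. Note that `html.unescape` applies text-content decoding rules, so it only approximates what a full HTML parser does inside attribute values.

```python
import html


def encode_text(raw: str) -> str:
    """Hypothetical helper: escape "&", "<", ">" and both quote characters.

    Because every literal "&" becomes "&amp;", the output never depends on
    whether the following letters happen to spell a named character reference.
    """
    return html.escape(raw)  # quote=True is the default


# Illustrative inputs; the last one already looks like a character reference.
samples = ["?bill&ted", "?art&copy", "AT&T", "a &amp; b"]

for raw in samples:
    encoded = encode_text(raw)
    # Decoding the escaped form always gives back the original text.
    assert html.unescape(encoded) == raw
    print(f"{raw!r} -> {encoded!r}")
```

The point of the sketch is that the author never has to know the list of named character references: escaping unconditionally round-trips for every input.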
|
https://html.spec.whatwg.org/multipage/syntax.html#character-references already requires this so I'm not sure we need to state it again in the parser section. Is the problem that the parser doesn't flag it? |
I believe the problem here is that the illustrative example in the syntax-error section explicitly states that the correct way to produce HTML text containing The example illustrates that a parser will correctly identify So basically this is just a confusing aspect for implementers and it seems like we could tweak the wording to maintain the demonstration of how these errors are handled without encouraging people to lean on syntax errors in cases where they produce the right output. |
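The decoder-side behavior described in this comment can be reproduced with a short sketch. It again assumes Python's `html.unescape` as an approximation of an HTML5 decoder; it follows the text-content rules for named character references, while the real parser has an additional attribute-value rule for semicolon-less references followed by `=` or an alphanumeric character. The three input strings are illustrative.

```python
import html

# An unescaped "&" followed by letters that are NOT a named character
# reference is left alone, so leaning on the syntax error happens to work.
decoded_ted = html.unescape("?bill&ted")

# An unescaped "&" followed by letters that ARE a legacy named character
# reference is decoded even without the trailing semicolon.
decoded_copy = html.unescape("?art&copy")

# Escaping the ampersand removes the ambiguity.
decoded_safe = html.unescape("?art&amp;copy")

print(repr(decoded_ted))   # '?bill&ted'
print(repr(decoded_copy))  # '?art©'
print(repr(decoded_safe))  # '?art&copy'
```

The first two inputs have the same shape but decode differently, which is the reason an example that only shows the "ok" case can mislead authors.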
|
I see, this is part of https://html.spec.whatwg.org/multipage/introduction.html#syntax-errors. We don't disallow |
|
@annevk thanks. I’m very open to trying out different ideas, but I think the spec is actually a bit vague on this.
Unless I’m wrong, the spec does not require that However, if someone is authoring HTML and not intending to produce a character reference, a stray I think we all agree that the intention is to always escape |
|
That's what I'm saying as well though in my latest comment. The Writing section explicitly allows you to do this. So I don't want to accept this PR as-is, as it'll contradict the Writing section. @zcorpan was involved in some of the details here and should probably weigh in. |
|
sounds great, and I have no wish that this be as-is. in fact, I was hoping for further input because I myself struggled to figure out how best to represent it. @sirreal is the author of the original suggestion. interestingly enough, the HTML 3 spec was clearer on this point, but that entire document comprises only a handful of ill-defined paragraphs 🙃
|

In the example highlighting ambiguities from missing semicolons on named character references, a "correct" encoding is provided, but that example makes no mention of the fact that the fragment was ambiguous precisely because the ampersand wasn't escaped.
This patch adds a clarifying note explaining how this situation is avoided by always escaping the ampersand.
(See WHATWG Working Mode: Changes for more details.)