ICU-3736 UAX44-LM2 loose matching for character names #3932
eggrobin wants to merge 22 commits into unicode-org:main
richgillam left a comment
Maybe it's just because I'm not awake yet (or maybe I'm not the best choice of reviewer), but I had trouble finding my way through this. A lot of it made sense, but it wasn't clear to me what the actual rules for matching are, and there seemed to be several spots in the code that assumed more knowledge on the part of the reader than I possess. Can you help me understand what's going on here?
    if (static_cast<uint8_t>(';') >= tokenCount || tokens[static_cast<uint8_t>(';')] == static_cast<uint16_t>(-1)) {
        continue;
    }
}
Why are you taking this out?
Unicode 1 names were removed from ICU in ICU 49.
I added some comments; they are pretty long because while writing them I noticed some subtle edge cases (which, as best I can tell, were handled correctly, but are noteworthy)…
markusicu left a comment
As usual, nice and elegant code where I feel like I learn good and modern techniques. I do have a few comments though.
public:
    class Matcher {
    public:
        Matcher(const CharacterNameQuery *const query)
why not take the query by reference? it must not be nullptr.
Google style uses pointers when there is a lifetime requirement, as here, and I like that. I added a comment, though; lifetime requirements should be documented…
AFAICT Google code has been moving away from pointers to references when not nullable, including for outputs.
Regardless of Google... I much prefer using a reference when the thing can't be null.
> AFAICT Google code has been moving away from pointers to references when not nullable, including for outputs.
For outputs, yes; but not for lifetime dependencies, if I recall correctly.
I don't see how a pointer helps for that; it's certainly not something we do in ICU.
Please change query to a reference.
They do advise to make the member a reference. Done.
I guess I see the point:
> In some cases reference parameters can bind to temporaries, leading to lifetime bugs.
One of many C++ foot guns... I guess I can live with the pointer :-(
> I don't see how a pointer helps for that;
Well, the ToTW has some pretty clear examples; it prevents passing temporaries.
But a cursory search for "outlive" finds at least one comment about a lifetime requirement on reference parameters in ICU, so I guess we do this here. Done.
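To make the temporaries argument in this thread concrete, here is a minimal sketch. `Query`, `RefMatcher`, `PtrMatcher`, `GuardedRefMatcher`, and `matchedText` are hypothetical stand-ins invented for illustration; none of them is the PR's `CharacterNameQuery` or `Matcher`. The deleted rvalue overload in the third class is an alternative that nobody in the thread proposed; it is included only for comparison.

```cpp
#include <string>
#include <type_traits>

// Hypothetical stand-in for a query object (not ICU's CharacterNameQuery).
struct Query {
    std::string text;
};

// Const-reference parameter: a temporary Query binds to it, so the stored
// reference dangles as soon as the constructing expression ends.
class RefMatcher {
public:
    explicit RefMatcher(const Query &query) : query_(query) {}
    const std::string &text() const { return query_.text; }
private:
    const Query &query_;
};

// Pointer parameter, reference member: the caller has to write &query,
// and taking the address of a temporary does not compile.
class PtrMatcher {
public:
    // The pointed-to query must be non-null and must outlive the matcher.
    explicit PtrMatcher(const Query *query) : query_(*query) {}
    const std::string &text() const { return query_.text; }
private:
    const Query &query_;
};

// Alternative (an assumption, not from the thread): keep the reference
// parameter but delete the rvalue overload so temporaries are rejected.
class GuardedRefMatcher {
public:
    explicit GuardedRefMatcher(const Query &query) : query_(query) {}
    explicit GuardedRefMatcher(const Query &&) = delete;
    const std::string &text() const { return query_.text; }
private:
    const Query &query_;
};

// A temporary Query is accepted by the plain reference version...
static_assert(std::is_constructible<RefMatcher, Query>::value,
              "a temporary binds to const Query&");
// ...but not by the pointer version or the guarded reference version.
static_assert(!std::is_constructible<PtrMatcher, Query>::value,
              "a Query is not a pointer");
static_assert(!std::is_constructible<GuardedRefMatcher, Query>::value,
              "the rvalue overload is deleted");
// Named (lvalue) queries still work with the guarded version.
static_assert(std::is_constructible<GuardedRefMatcher, const Query &>::value,
              "lvalues are accepted");

// Correct usage: the query is a named object that outlives the matcher.
inline std::string matchedText() {
    Query query{"LATIN SMALL LETTER A"};
    PtrMatcher matcher(&query);
    return matcher.text();
}
```

None of the three variants removes the lifetime requirement itself; a named query that is destroyed before the matcher still leaves a dangling reference, which is why the requirement needs documenting either way.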
But we certainly do that in some places in ICU: see, e.g.,
icu/icu4c/source/i18n/units_data.cpp
Lines 51 to 54 in c5946d7
Checklist