Merged
markusicu
reviewed
Apr 17, 2026
Comment on lines +303 to +306
final int snippetIndex =
    stringIndices.getOrDefault(snippet, allTheStrings.length());
if (snippetIndex == allTheStrings.length()) {
  allTheStrings.append(snippet).append(RECORD_SEPARATOR);
Member
Please create & use a helper function that takes a string (without the separator) and returns the index. Internally, figure out whether to reuse or append.
Member
Author
Done. Systematically checking for a pre-existing string (I was doing that for the property values but not for the HTML) also brought the size down to 8.76 MiB, from the 8.91 MiB mentioned above.
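For readers following the thread, here is a minimal sketch of what such a helper could look like. It is not the code that was merged: the class name `StringPool`, the method names, and the value of `RECORD_SEPARATOR` are assumptions made for illustration; only `allTheStrings`, `stringIndices`, and the reuse-or-append behaviour come from the excerpt and the review comment.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: pools strings in one buffer and hands out their start indices. */
final class StringPool {
  // Assumed separator; the actual RECORD_SEPARATOR value is not shown in the excerpt.
  private static final char RECORD_SEPARATOR = '\u001E';

  private final StringBuilder allTheStrings = new StringBuilder();
  private final Map<String, Integer> stringIndices = new HashMap<>();

  /** Returns the index of {@code s} in the pool, appending it first if it is new. */
  int indexOf(String s) {
    Integer existing = stringIndices.get(s);
    if (existing != null) {
      return existing; // Reuse: the string is already in the pool.
    }
    int index = allTheStrings.length();
    allTheStrings.append(s).append(RECORD_SEPARATOR);
    stringIndices.put(s, index);
    return index;
  }

  /** Returns the whole pool, separators included. */
  String contents() {
    return allTheStrings.toString();
  }
}
```

With this shape, every caller (property values and HTML snippets alike) goes through `indexOf`, so the duplicate check cannot be forgotten at one call site.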
markusicu
approved these changes
Apr 20, 2026
Compare https://eggrobin.github.io/unicode-annotations/charindex.html (old) and https://eggrobin.github.io/unicode-annotations/charindex-smol.html (with this change).
(Note that I will probably replace charindex.html with the -smol one after merging this.)
Take all the highly repetitive strings (both the actual property values and the HTML snippets), stick them in a giant string, and deflate that, replacing the strings with indices into the giant string throughout the data structures: it goes from 22 MiB to 1388 KiB (6.3%). Also don’t try to pretty-print a map with 66666 entries.
This brings the generated charindex.html from 42.3 MiB to 8.91 MiB (21% of its size).
The server compresses the page anyway, and the compressed size changes less (4477 kB vs. 3175 kB, according to Chrome), so this doesn’t change download times very much.
However, this massively reduces the time spent parsing JS. When the page is loaded from disk cache, the time to DOMContentLoaded goes from 2.10 s to 636 ms.
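The deflate step in the description above can be sketched with the standard `java.util.zip` classes. This is an illustration under assumptions, not the generator's actual code: the class name `PoolCompression`, the UTF-8 encoding, and the compression level are choices made here, and how the page decompresses the data in the browser is not covered.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical sketch: deflate the pooled string and inflate it back. */
final class PoolCompression {
  /** Compresses the pool (encoded as UTF-8, an assumption) in zlib format. */
  static byte[] deflate(String pool) {
    Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
    deflater.setInput(pool.getBytes(StandardCharsets.UTF_8));
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buffer = new byte[8192];
    while (!deflater.finished()) {
      int n = deflater.deflate(buffer);
      out.write(buffer, 0, n);
    }
    deflater.end();
    return out.toByteArray();
  }

  /** Restores the pool; indices into the original string remain valid afterwards. */
  static String inflate(byte[] compressed) {
    Inflater inflater = new Inflater();
    inflater.setInput(compressed);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buffer = new byte[8192];
    try {
      while (!inflater.finished()) {
        int n = inflater.inflate(buffer);
        if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
          throw new IllegalArgumentException("truncated or unsupported deflate stream");
        }
        out.write(buffer, 0, n);
      }
    } catch (DataFormatException e) {
      throw new IllegalArgumentException("not a valid deflate stream", e);
    } finally {
      inflater.end();
    }
    return new String(out.toByteArray(), StandardCharsets.UTF_8);
  }
}
```

Because the round trip returns the same string, the indices stored in the data structures can keep referring to positions in the uncompressed pool.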