Speed up search #501
Open
tim-band wants to merge 3 commits into
* removed regexing every single original text in the DB
* folded original text denormalization added
* migration 0056 made reversible
* weakened a couple of tests (sorry)
tcouch (Collaborator) approved these changes on May 12, 2026 and left a comment:
This looks great. The only thing is that it doesn't actually address issue #494, which concerns how quickly the Antiquarian page itself loads.
The priority for search is issue #493, which might be confusing things. At the moment a search for "Varro" puts the actual Antiquarian page near the end, so the order in which content types are returned in search results needs fixing. That could be done in another PR, though, if you want to merge this.
    return content

def fold_latin_and_remove_punctuation(content: str) -> str:
Doesn't make_plain_text already remove punctuation? No harm in doing it twice, I suppose.
* type hints
* pydocs
* minor snippet refactor
8e8ae5d to 2f1cd12
Speeds up search.
There is one small problem: the snippet now has all folding applied, rather than just the folding needed for this particular search, which can sometimes look a bit odd. We could change the snippet-finding code to look at the plain (rather than folded) text, in which case we would see the unfolded text, but only if the query matches without folding (which I think is how it works right now anyway).
This speeds up the query for "Varro" from 17-18 seconds to about 11 seconds.
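To make the snippet trade-off described above concrete, here is a minimal sketch of the "look at the plain text first" idea. It is not the code in this PR: `fold` and `find_snippet` are hypothetical names, and the folding rules (lower-casing, stripping ASCII punctuation, v to u, j to i) are assumptions chosen only for illustration.

```python
import string

# Assumed folding rules; the project's real rules are not shown in this PR.
_FOLD_TABLE = str.maketrans({"v": "u", "j": "i"})
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def fold(text: str) -> str:
    """Lower-case, drop ASCII punctuation, then fold v->u and j->i."""
    return text.lower().translate(_PUNCT_TABLE).translate(_FOLD_TABLE)


def find_snippet(plain: str, folded: str, query: str, context: int = 20) -> str:
    """Return a snippet from the plain text if the query matches there,
    otherwise fall back to a snippet from the folded text."""
    for haystack, needle in ((plain, query), (folded, fold(query))):
        pos = haystack.lower().find(needle.lower())
        if pos != -1:
            start = max(0, pos - context)
            end = min(len(haystack), pos + len(needle) + context)
            return haystack[start:end]
    return ""  # no match in either form


plain = "Varro, De lingua Latina: iuvenis vidit."
folded = fold(plain)  # in practice this would be the stored denormalized column

print(find_snippet(plain, folded, "Varro", context=5))    # matched in the plain text
print(find_snippet(plain, folded, "juvenis", context=0))  # only matches after folding
```

With this approach a reader sees the original spelling whenever the query matches without folding, and the folded form only as a fallback, which is the behaviour the description suggests.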