
Speed up search #501

Open
tim-band wants to merge 3 commits into development from fix/speed-up-varro

tim-band (Collaborator) commented May 11, 2026

Speeds up search.

  • Removed the regex pass over every single original text in the DB at the start of each query.
  • Added a denormalized, folded copy of each original text, so the folding doesn't have to be redone on each query.
  • Made migration 0056 reversible (easy fix).
  • Weakened a couple of tests (sorry) -- but I don't think they were that important. For example, varro will match "var,ro".
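
The idea behind the second bullet can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the folding rules (lowercase, `j` to `i`, `v` to `u`, ASCII punctuation dropped), the `fold_text` helper, and the `OriginalText` stand-in class are all assumptions made for the example. The folded copy is computed once when a row is written, so a query only has to fold its own search term.

```python
import string

# Hypothetical folding rules, assumed for illustration only:
# lowercase, treat j as i and v as u, and drop ASCII punctuation.
_FOLD_TABLE = str.maketrans("jv", "iu", string.punctuation)


def fold_text(text: str) -> str:
    """Return a folded form of text under the assumed rules above."""
    return text.lower().translate(_FOLD_TABLE)


class OriginalText:
    """Stand-in for a database row; not the project's actual model."""

    def __init__(self, content: str) -> None:
        self.content = content
        # Denormalized copy, computed once when the row is written.
        self.folded_content = fold_text(content)


def search(rows: list[OriginalText], query: str) -> list[OriginalText]:
    """Fold only the query, then compare against the stored folded copies."""
    folded_query = fold_text(query)
    return [row for row in rows if folded_query in row.folded_content]


rows = [OriginalText("De lingua Latina, by Var,ro"), OriginalText("Cicero")]
matches = search(rows, "Varro")
```

Because punctuation is dropped during folding in this sketch, "Varro" matches the row containing "Var,ro", which is the weaker matching behavior the last bullet describes.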

There is a small problem with this: the snippet now has all folding applied, rather than just the folding that was necessary for this particular search, which can sometimes look a bit weird. We could change the snippet-finding code to look at the plain (rather than folded) text, in which case we'd see the unfolded text, but only if the query matches without folding (which I think is how it works right now anyway).
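
A rough sketch of that alternative, under stated assumptions: `find_snippet` and its parameters are hypothetical names, not the project's snippet code, and the match against the plain text is a simple exact substring search. It prefers the plain text and only falls back to the folded text when the unfolded query does not match.

```python
from typing import Optional


def find_snippet(
    plain: str,
    folded: str,
    query: str,
    folded_query: str,
    context: int = 20,
) -> Optional[str]:
    """Return text around the first match, preferring the unfolded text."""
    source, needle = plain, query
    index = plain.find(query)
    if index == -1:
        # The query only matches after folding, so the snippet has to
        # come from the folded text and will show all folding applied.
        source, needle = folded, folded_query
        index = folded.find(folded_query)
    if index == -1:
        return None
    start = max(0, index - context)
    end = min(len(source), index + len(needle) + context)
    return source[start:end]
```

With this shape, a query that matches the plain text yields a snippet in the original spelling and punctuation, while a query that needs folding still yields a snippet, just in folded form.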

This speeds up the query for "Varro" from 17-18 seconds down to about 11 seconds.

* removed regexing every single original text in the DB
* folded original text denormalization added
* migration 0056 made reversible
* weakened a couple of tests (sorry)
tim-band requested a review from tcouch on May 11, 2026, 08:20
tcouch (Collaborator) left a comment

This looks great. The only thing is that it doesn't actually address issue #494, which is about how quickly the Antiquarian page itself loads.

The priority for search is issue #493, which might be confusing things. At the moment a search for "Varro" puts the actual Antiquarian page near the end, so the order in which content types are returned in search results needs fixing. That could be done in another PR, though, if you want to merge this.

return content


def fold_latin_and_remove_punctuation(content: str) -> str:

Does make_plain_text not already remove punctuation? No harm in doing it twice I suppose.
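
The "no harm in doing it twice" point holds for any punctuation stripping that is idempotent. A minimal sketch, assuming a hypothetical `strip_punctuation` helper that removes ASCII punctuation (this is not the project's `make_plain_text`, whose behavior is not shown here):

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCTUATION_TABLE = str.maketrans("", "", string.punctuation)


def strip_punctuation(text: str) -> str:
    """Hypothetical helper: remove ASCII punctuation from text."""
    return text.translate(_PUNCTUATION_TABLE)


once = strip_punctuation("Var,ro: de lingua Latina.")
twice = strip_punctuation(once)
```

A second pass has nothing left to remove, so the only cost of stripping twice is the extra pass over the string.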

* type hints
* pydocs
* minor snippet refactor
tim-band changed the title from "Fixes #494: Speed up loading Varro" to "Speed up search" on May 12, 2026
tim-band force-pushed the fix/speed-up-varro branch from 8e8ae5d to 2f1cd12 on May 12, 2026, 15:26