
Improvements from live testing#6404

Open
nadav-govari wants to merge 2 commits into nadav/pr11 from nadav/pr12

Conversation

@nadav-govari
Collaborator

Description

We made a number of observations during live testing.

  1. When deploying more than two compactors, the split cache becomes an impediment rather than an optimization. If the splits are never read, they are just resident in memory, and it doesn't take long to start thrashing. Rather than fix the split cache, remove it for now (I might come back and fix it down the road).

  2. The merge policy only produced one merge operation per level per call, which resulted in huge split backlogs building up. We now construct as many merge operations as we can on every tick.

  3. The splits table wasn't sorted the way we needed, so our scans were expensive. Added an index to make these scans cheap.

  4. Got rid of the scan cursor. We can get the same behavior naturally with a scan limit and a maturity lower bound: we always fetch the (at most) 5,000 oldest immature splits. As splits get merged and marked for deletion, the head of the scan moves forward naturally.
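The cursor-less scan in point 4 could be sketched as follows. This is an illustrative query only: the table and column names (`splits`, `split_state`, `maturity_timestamp`) are assumptions inferred from the index name and migration comments elsewhere in this PR, not the actual query in the diff.

```sql
-- Hypothetical sketch of the cursor-less scan: no cursor state is kept;
-- we always read from the head, and the head advances on its own as
-- splits get merged and marked for deletion.
SELECT *
FROM splits
WHERE split_state = 'Published'
  AND maturity_timestamp > now()   -- "maturity greater than": still immature
ORDER BY maturity_timestamp ASC    -- oldest immature splits first
LIMIT 5000;                        -- the <= 5000 oldest immature splits
```

With the partial index on `(maturity_timestamp) WHERE split_state = 'Published'` added in this PR, a query of this shape can be served by an ordered index scan rather than a sort.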

How was this PR tested?

Extensive testing on a test cluster that sees meaningful traffic, including a backfill of over 900k immature splits using the compactor architecture.

Comment on lines +43 to +44
const SCAN_PAGE_SIZE: usize = 5_000;
#[derive(Debug)]
Member


Suggested change
const SCAN_PAGE_SIZE: usize = 5_000;
#[derive(Debug)]
const SCAN_PAGE_SIZE: usize = 5_000;
#[derive(Debug)]

break;
}
for operation in operations {
for split in operation.splits_as_slice() {
Member


Suggested change
for split in operation.splits_as_slice() {
for split in &operation.splits {

Whoever wrote splits_as_slice is a terrorist.

@@ -0,0 +1 @@
DROP INDEX IF EXISTS splits_published_maturity_timestamp_idx; No newline at end of file
Member


That newline

--
-- The partial predicate is restricted to split_state = 'Published' because
-- partial-index predicates must be IMMUTABLE; "now()" cannot appear here.
CREATE INDEX IF NOT EXISTS splits_published_maturity_timestamp_idx
Member


This will make diesel-guard and our users' DB not happy:

❌ Unsafe migration detected in quickwit-metastore/migrations/postgresql/27_create-index-splits-published-maturity-timestamp.up.sql

❌ ADD INDEX without CONCURRENTLY  (line 17)

Problem:
  Creating index 'splits_published_maturity_timestamp_idx' on table 'splits' without CONCURRENTLY acquires a SHARE lock, blocking writes (INSERT, UPDATE, DELETE). Duration depends on table size. Reads are still allowed.

Safe alternative:
  Use CONCURRENTLY to build the index without blocking writes:
     CREATE INDEX CONCURRENTLY splits_published_maturity_timestamp_idx ON splits;

  Note: CONCURRENTLY takes longer and uses more resources, but allows concurrent INSERT, UPDATE, and DELETE operations. The index build may fail if there are deadlocks or unique constraint violations.

  Considerations:
  - Requires more total work and takes longer to complete
  - If it fails, it leaves behind an "invalid" index that should be dropped

  Note: CONCURRENTLY cannot run inside a transaction block.
  Add `-- no-transaction` as the first line of the migration file.
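Putting diesel-guard's advice together, a safe version of the migration might look like the sketch below. The indexed column (`maturity_timestamp`) is an assumption based on the index name; the actual column list comes from the PR's diff.

```sql
-- no-transaction
-- CONCURRENTLY cannot run inside a transaction block, hence the marker above.
-- The partial predicate is restricted to split_state = 'Published' because
-- partial-index predicates must be IMMUTABLE; "now()" cannot appear here.
CREATE INDEX CONCURRENTLY IF NOT EXISTS splits_published_maturity_timestamp_idx
    ON splits (maturity_timestamp)
    WHERE split_state = 'Published';
```

If the concurrent build fails, it leaves behind an "invalid" index that must be dropped manually (`DROP INDEX IF EXISTS splits_published_maturity_timestamp_idx;`) before retrying.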

