Avoid leaking sitrep rows during deletion#10143
Conversation
| .transaction(&conn, |conn| { | ||
| async move { | ||
| // Step 1: Delete orphaned fm_sitrep metadata rows. | ||
| let mut sitreps_deleted = 0usize; |
There was a problem hiding this comment.
So, the structure here is basically:
- Delete sitrep metadata rows (make orphaned rows)
- Clean up orphaned rows (which maybe we just made, or maybe got created by an errant INSERT operation that bailed out)
We could make this transaction smaller by "limiting how many sitrep metadata rows we create at once".
For example, we could:
- "Only create up to 64 orphans in a txn"
- ... then clean up all their data, within this transaction (to avoid torn reads)
- ... then, COMMIT the transaction, and do another batch, until we stop cleaning stuff up / hit some arbitrary limit / etc
There was a problem hiding this comment.
This is maybe nitpicky, but this comment was initially a little bit confusing to me, because we are kind of using "orphaned" to refer to two different things: both sitreps that are orphaned due to not being in fm_sitrep_history, and sitrep child rows that are orphaned due to the sitrep metadata row being deleted. Also, later on you refer to "creating orphans", but that actually means deleting a metadata row, rather than actually creating new records. Anyway I just wanted to write that down for clarity.
There was a problem hiding this comment.
And, addressing the actual idea you proposed here, I agree that paginating the whole transaction might be a good idea to limit the duration of the transaction...
There was a problem hiding this comment.
I can try to update the comment here to make the orphan terminology more clear. I was kinda using "orphaned" to refer to the top-level sitrep, and "deeply orphaned" to refer to child tables of sitreps which no longer exist.
Adding a TODO for the pagination because it will have subtleties that maybe should be reviewed independently of this PR.
Thank you for your support in this matter :D |
hawkw
left a comment
There was a problem hiding this comment.
this code is kinda wild, i'm impressed --- thank you for humoring me
| ) -> Result<GcOrphansResult, Error> { | ||
| let conn = self.pool_connection_authorized(opctx).await?; | ||
|
|
||
| // TODO(eliza): there should probably be an authz object for the fm sitrep? |
hawkw
left a comment
There was a problem hiding this comment.
This is great, thank you for adding the macro and stuff! I had a bunch of small notes on the comments etc, but it looks great overall!
| // Ensure columns stay aligned even if a child table name is long. | ||
| let width = child_tables | ||
| .keys() | ||
| .map(|name| "orphaned rows deleted:".len() + name.len() + 1) |
There was a problem hiding this comment.
turbo nit: maybe worth a comment explaining why there's two spaces in orphaned rows deleted? i very briefly thought it was a typo before i realized what this was doing...
| stats.rows_deleted, | ||
| ); | ||
| println!(" {:<width$}{:>NUM_WIDTH$}", " batches:", stats.batches,); | ||
| } |
There was a problem hiding this comment.
not important, but it occurs to me that maybe if child_tables is empty, we should complain in some way? that would indicate a bug...
| /// metadata. To add a new child table to the sitrep GC, just add a line | ||
| /// here — the GC loop, omdb display, and completeness test all adapt | ||
| /// automatically. |
There was a problem hiding this comment.
nitpick: technically, you don't add a line here, you do that in the place where we invoke the macro...
There was a problem hiding this comment.
also, maybe it's worth briefly describing the syntax so that the person adding a new table is aware of the default column name for the sitrep ID and how it can be overridden (though in practice I think i would complain in a review if someone added a new sitrep child table where it was named anything other than that...)
There was a problem hiding this comment.
updated the docs, added an example of the syntax
| .transaction(&conn, |conn| { | ||
| async move { | ||
| // Step 1: Delete orphaned fm_sitrep metadata rows. | ||
| let mut sitreps_deleted = 0usize; |
There was a problem hiding this comment.
This is maybe nitpicky, but this comment was initially a little bit confusing to me, because we are kind of using "orphaned" to refer to two different things: both sitreps that are orphaned due to not being in fm_sitrep_history, and sitrep child rows that are orphaned due to the sitrep metadata row being deleted. Also, later on you refer to "creating orphans", but that actually means deleting a metadata row, rather than actually creating new records. Anyway I just wanted to write that down for clarity.
| .transaction(&conn, |conn| { | ||
| async move { | ||
| // Step 1: Delete orphaned fm_sitrep metadata rows. | ||
| let mut sitreps_deleted = 0usize; |
There was a problem hiding this comment.
And, addressing the actual idea you proposed here, I agree that paginating the whole transaction might be a good idea to limit the duration of the transaction...
|
|
||
| /// Builds a DELETE query that removes orphaned `fm_sitrep` metadata | ||
| /// rows in a single paginated batch (sitreps not in history whose | ||
| /// parent is not current). Returns (rows_deleted, next_marker). |
There was a problem hiding this comment.
turbo nit: perhaps this should have a line saying which test output file contains the text of the query, like you added in #10061?
| /// 1. Finds a batch of distinct `sitrep_id` values in the child table | ||
| /// 2. Deletes rows whose `sitrep_id` has no corresponding `fm_sitrep` | ||
| /// 3. Returns (rows_deleted, next_marker) where next_marker is NULL | ||
| /// when there are no more pages |
There was a problem hiding this comment.
nit: similarly, perhaps a "the SQL generated by this query is in ..." comment?
| Ok(result) => { | ||
| status.orphaned_sitreps_deleted = result.sitreps_deleted; | ||
| status.sitrep_metadata_batches = result.sitrep_metadata_batches; | ||
| status.batch_size = result.batch_size; | ||
| status.child_tables = result | ||
| .child_tables |
There was a problem hiding this comment.
this is a nitpick, but I'd kind of like it if we exhaustively destructured result here, like:
| Ok(result) => { | |
| status.orphaned_sitreps_deleted = result.sitreps_deleted; | |
| status.sitrep_metadata_batches = result.sitrep_metadata_batches; | |
| status.batch_size = result.batch_size; | |
| status.child_tables = result | |
| .child_tables | |
| Ok(GcOrphansResult { sitreps_deleted, sitrep_metadata_batches, batch_size, child_tables }) => { | |
| status.orphaned_sitreps_deleted = sitreps_deleted; | |
| status.sitrep_metadata_batches = sitrep_metadata_batches; | |
| status.batch_size = batch_size; | |
| status.child_tables = child_tables |
so that we get a compiler error if someone adds additional fields to GcOrphansResult that we're not handling here?
| // --------------------------------------------------------------- | ||
| // SitrepChildTableCounts: mirrors BlueprintTableCounts from | ||
| // deployment.rs to ensure every fm_* child table is covered by | ||
| // `SitrepChildTable`. | ||
| // --------------------------------------------------------------- |
There was a problem hiding this comment.
nit, take it or leave it: i would like it if this comment described what this thing actually does, in addition to saying "it's like that thing in this other file, and it's used for this purpose" --- it counts the number of rows in each sitrep child table, and the list of sitrep child tables is determined using the SitrepChildTable enum above.
There was a problem hiding this comment.
Updated the comment
| "found fm_* child table(s) not covered by \ | ||
| SitrepChildTable: {}\n\n\ | ||
| If you added a new fm_* child table, add a variant \ | ||
| to SitrepChildTable and update the orphan GC code \ | ||
| in fm_sitrep_gc_orphans.", |
There was a problem hiding this comment.
turbo nitpick: i might add
| "found fm_* child table(s) not covered by \ | |
| SitrepChildTable: {}\n\n\ | |
| If you added a new fm_* child table, add a variant \ | |
| to SitrepChildTable and update the orphan GC code \ | |
| in fm_sitrep_gc_orphans.", | |
| "found fm_* child table(s) not covered by \ | |
| `SitrepChildTable`: {}\n\n\ | |
| If you added a new fm_* child table, add a variant \ | |
| to `SitrepChildTable` and update the orphan GC code \ | |
| in `fm_sitrep_gc_orphans`.", |
There was a problem hiding this comment.
also, maybe we should say something about what to do if there's a false positive (i.e. if you add a new fm_foo table that should not be covered by orphan GC) --- something like "consider either dropping the fm_ prefix or adding your table name to tables_ignored"?
There was a problem hiding this comment.
Added back-ticks + advice
smklein
left a comment
There was a problem hiding this comment.
Addressed everything but the pagination - agree it's worth doing, but given that it miiiight increase some complexity I'm punting a bit (leaving as a TODO)
| // Ensure columns stay aligned even if a child table name is long. | ||
| let width = child_tables | ||
| .keys() | ||
| .map(|name| "orphaned rows deleted:".len() + name.len() + 1) |
| stats.rows_deleted, | ||
| ); | ||
| println!(" {:<width$}{:>NUM_WIDTH$}", " batches:", stats.batches,); | ||
| } |
| /// metadata. To add a new child table to the sitrep GC, just add a line | ||
| /// here — the GC loop, omdb display, and completeness test all adapt | ||
| /// automatically. |
There was a problem hiding this comment.
updated the docs, added an example of the syntax
| ) -> Result<GcOrphansResult, Error> { | ||
| let conn = self.pool_connection_authorized(opctx).await?; | ||
|
|
||
| // TODO(eliza): there should probably be an authz object for the fm sitrep? |
| .transaction(&conn, |conn| { | ||
| async move { | ||
| // Step 1: Delete orphaned fm_sitrep metadata rows. | ||
| let mut sitreps_deleted = 0usize; |
There was a problem hiding this comment.
I can try to update the comment here to make the orphan terminology more clear. I was kinda using "orphaned" to refer to the top-level sitrep, and "deeply orphaned" to refer to child tables of sitreps which no longer exist.
Adding a TODO for the pagination because it will have subtleties that maybe should be reviewed independently of this PR.
|
|
||
| /// Builds a DELETE query that removes orphaned `fm_sitrep` metadata | ||
| /// rows in a single paginated batch (sitreps not in history whose | ||
| /// parent is not current). Returns (rows_deleted, next_marker). |
| /// 1. Finds a batch of distinct `sitrep_id` values in the child table | ||
| /// 2. Deletes rows whose `sitrep_id` has no corresponding `fm_sitrep` | ||
| /// 3. Returns (rows_deleted, next_marker) where next_marker is NULL | ||
| /// when there are no more pages |
| // --------------------------------------------------------------- | ||
| // SitrepChildTableCounts: mirrors BlueprintTableCounts from | ||
| // deployment.rs to ensure every fm_* child table is covered by | ||
| // `SitrepChildTable`. | ||
| // --------------------------------------------------------------- |
There was a problem hiding this comment.
Updated the comment
| "found fm_* child table(s) not covered by \ | ||
| SitrepChildTable: {}\n\n\ | ||
| If you added a new fm_* child table, add a variant \ | ||
| to SitrepChildTable and update the orphan GC code \ | ||
| in fm_sitrep_gc_orphans.", |
There was a problem hiding this comment.
Added back-ticks + advice
| Ok(result) => { | ||
| status.orphaned_sitreps_deleted = result.sitreps_deleted; | ||
| status.sitrep_metadata_batches = result.sitrep_metadata_batches; | ||
| status.batch_size = result.batch_size; | ||
| status.child_tables = result | ||
| .child_tables |
|
yay! |
Follow-up to #10143 Adds pagination during sitrep garbage collection. While I was there, I realized that we actually don't need transactions on the delete pathway anymore. - The delete operation (within the garbage collection pass) deletes sitreps by deleting out-of-history `fm_sitrep` rows, which immediately orphans all other sub-tables within the sitrep. - All the "concurrent reads" of sitreps go through `fm_sitrep_read_on_conn`, which reads metadata from `fm_sitrep` **last**. If this sitrep can be read: it has not been deleted yet. If this sitrep cannot be read: it has been deleted, and prior reads can be discarded. - All the "concurrent inserts" of sitreps are contingent on the parent sitrep not being stale. So: because sitrep insert writes to the `fm_sitrep` table first, the child rows are "not orphaned" (won't be GC-ed). This protection lasts for the duration of `fm_sitrep_insert`, OR until the parent sitrep is marked stale - at which point insert should fail anyway. All this is to say: the "read" and "insert" pathways function fine if a `fm_sitrep` row is deleted non-atomically before subsequent child rows. Therefore: no transaction is necessary here.
Fixes #10131
In an attempt to avoid using transactions during the INSERT pathway (see: #10133)
this PR instead tries to alter how DELETE works for sitreps.
Rather than using a two-step GC process (listing orphaned sitreps and deleting
them - even while concurrent INSERT operations could be underway) this PR
also enables the deletion of "rows within a sitrep, for which the sitrep metadata
record no longer exists".
This allows us to delete rows that were previously leaked, and is also now the
primary mechanism by which rows are removed: we delete the high-level
sitrep first, then remove all these newly-orphaned rows. This is still performed
within a transaction on the DELETE pathway to avoid reading torn sitreps
during the loading phase (as originally appeared in #9594).