Delete sitreps non-transactionally with improved pagination#10210
zoom zoom!
| .get_result_async::<(i64, Option<Uuid>)>(&*conn) | ||
| .await | ||
| .map_err(|e| { | ||
| public_error_from_diesel(e, ErrorHandler::Server) |
nit: would kinda like an internal_context here saying which step we were trying to do when we failed?
| .get_result_async::<(i64, Option<Uuid>)>(&*conn) | ||
| .await | ||
| .map_err(|e| { | ||
| public_error_from_diesel(e, ErrorHandler::Server) |
similarly, it would be kinda nice if this said what went wrong...
| fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { | ||
| write!( | ||
| f, | ||
| "reads_ok={}, reads_err={}, inserts={}, \ | ||
| gc_runs={}", | ||
| self.reads_ok.load(Ordering::Relaxed), | ||
| self.reads_err.load(Ordering::Relaxed), | ||
| self.inserts.load(Ordering::Relaxed), | ||
| self.gc_runs.load(Ordering::Relaxed), | ||
| ) | ||
| } |
would love to use exhaustive destructuring here, so we get an error if we add another counter...
| fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { | |
| let Self { reads_ok, reads_err, inserts, gc_runs } = self; | |
| write!( | |
| f, | |
| "reads_ok={}, reads_err={}, inserts={}, \ | |
| gc_runs={}", | |
| reads_ok.load(Ordering::Relaxed), | |
| reads_err.load(Ordering::Relaxed), | |
| inserts.load(Ordering::Relaxed), | |
| gc_runs.load(Ordering::Relaxed), | |
| ) | |
| } |
| #[derive(Clone)] | ||
| struct Stats { | ||
| reads_ok: Arc<AtomicUsize>, | ||
| reads_err: Arc<AtomicUsize>, | ||
| inserts: Arc<AtomicUsize>, | ||
| gc_runs: Arc<AtomicUsize>, | ||
| } |
not a big deal, but...why are these all Arced individually? why not Arc the whole Stats type?
eh just test-writing laziness. I'll update it to use a single Arc
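A minimal sketch of what the single-`Arc` version could look like, assuming the counters stay as atomics and only the sharing changes. It uses `std::thread` instead of the test's async tasks so it stays self-contained; `run_workers` is a hypothetical helper, not something from the PR. It also folds in the exhaustive destructuring suggested earlier in the review.

```rust
use std::fmt;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Counters shared by all workers. The fields are plain atomics; sharing
/// happens through one `Arc<Stats>` rather than one `Arc` per field.
#[derive(Default)]
struct Stats {
    reads_ok: AtomicUsize,
    reads_err: AtomicUsize,
    inserts: AtomicUsize,
    gc_runs: AtomicUsize,
}

impl fmt::Display for Stats {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Exhaustive destructuring: adding a counter without updating this
        // pattern becomes a compile error.
        let Self { reads_ok, reads_err, inserts, gc_runs } = self;
        write!(
            f,
            "reads_ok={}, reads_err={}, inserts={}, gc_runs={}",
            reads_ok.load(Ordering::Relaxed),
            reads_err.load(Ordering::Relaxed),
            inserts.load(Ordering::Relaxed),
            gc_runs.load(Ordering::Relaxed),
        )
    }
}

/// Hypothetical driver: spawns `workers` threads that each record one insert.
fn run_workers(workers: usize) -> Arc<Stats> {
    let stats = Arc::new(Stats::default());
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            // Each worker gets its own handle to the same `Stats`.
            let stats = Arc::clone(&stats);
            thread::spawn(move || {
                stats.inserts.fetch_add(1, Ordering::Relaxed);
            })
        })
        .collect();
    for handle in handles {
        handle.join().expect("worker panicked");
    }
    stats
}
```

With this shape, cloning the handle is one reference-count bump instead of four, and `Stats` itself no longer needs `#[derive(Clone)]`.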
| let guard = live_sitreps.lock().await; | ||
| if guard.is_empty() { | ||
| drop(guard); | ||
| tokio::task::yield_now().await; |
what is the yield_now doing here? it might be nice if there was a comment explaining what this is intended for?
I'll add a comment. The TL;DR is "there is nothing to read yet, because writer tasks haven't run".
| &log, "GC pass complete"; | ||
| "sitreps_deleted" => result.sitreps_deleted, | ||
| "child_tables" => ?result.child_tables, | ||
| ); |
| &log, | |
| "GC pass complete"; | |
| "sitreps_deleted" => result.sitreps_deleted, | |
| "child_tables" => ?result.child_tables, | |
| ); |
| // Join all tasks — a panic in any task (from assert_sitreps_eq) | ||
| // means we detected a torn read. | ||
| for handle in handles { | ||
| handle.await.expect("task panicked"); | ||
| } |
i believe this may not actually be necessary if the tests are being compiled with panic = "abort"? but probably good to do anyway
| // Insert an empty child so the original isn't | ||
| // current and can be deleted. | ||
| let child_id = SitrepUuid::new_v4(); | ||
| let child = fm::Sitrep { | ||
| metadata: fm::SitrepMetadata { | ||
| id: child_id, | ||
| parent_sitrep_id: Some(sitrep_id), | ||
| time_created: Utc::now(), | ||
| creator_id: OmicronZoneUuid::new_v4(), | ||
| comment: "child".to_string(), | ||
| inv_collection_id: CollectionUuid::new_v4(), | ||
| }, | ||
| cases: Default::default(), | ||
| ereports_by_id: Default::default(), | ||
| }; |
would it perhaps be more accurate to real life if we also stuffed cases into the child sitreps?
Good call, I'll use make_sitrep_with_cases
|
|
||
| /// Test that deeply orphaned child rows (whose fm_sitrep metadata |
Unintentional deletion?
| /// *complete* sitrep (no torn reads). Errors (e.g. `NotFound`) are | ||
| /// expected and fine — partial data is not. | ||
| /// | ||
| /// Writers race with each other, causing `ParentNotCurrent` failures. |
I think we expect that one of the racing writers will always win and the rest will fail with ParentNotCurrent or whatever, is that right? Is there any possibility that all writers will fail, such that the test wouldn't make progress?
if exactly one racing writer doesn't always win, then we have much worse problems :)
Follow-up to #10143
Adds pagination during sitrep garbage collection.
While I was there, I realized that we actually don't need transactions on the delete pathway anymore.
- Deletion removes `fm_sitrep` rows first, which immediately orphans all other sub-tables within the sitrep.
- Reads go through `fm_sitrep_read_on_conn`, which reads metadata from `fm_sitrep` last. If this sitrep can be read, it has not been deleted yet. If it cannot be read, it has been deleted, and prior reads can be discarded.
- Because insertion writes to the `fm_sitrep` table first, the child rows are "not orphaned" (won't be GC-ed). This protection lasts for the duration of `fm_sitrep_insert`, or until the parent sitrep is marked stale, at which point the insert should fail anyway.

All this is to say: the "read" and "insert" pathways function fine if an `fm_sitrep` row is deleted non-atomically before its child rows. Therefore, no transaction is necessary here.
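The ordering argument can be sketched with a toy in-memory model. This is an illustration under stated assumptions, not the datastore code: `Db`, its two maps, and every method name here are hypothetical stand-ins for the `fm_sitrep` table, a single child table, and the insert, read, and GC paths.

```rust
use std::collections::BTreeMap;

type SitrepId = u32;

/// Toy stand-in for the database: one metadata table and one child table.
#[derive(Default)]
struct Db {
    /// Stand-in for `fm_sitrep`: sitrep id -> comment.
    sitreps: BTreeMap<SitrepId, String>,
    /// Stand-in for a child table: (sitrep id, row index) -> payload.
    cases: BTreeMap<(SitrepId, u32), String>,
}

impl Db {
    /// Insert metadata first, so child rows are never orphaned mid-insert.
    fn insert(&mut self, id: SitrepId, comment: &str, cases: &[&str]) {
        self.sitreps.insert(id, comment.to_string());
        for (i, case) in cases.iter().enumerate() {
            self.cases.insert((id, i as u32), case.to_string());
        }
    }

    /// Read child rows first and metadata last. If the metadata row is gone,
    /// the child rows already read are discarded and the read fails.
    fn read(&self, id: SitrepId) -> Option<(String, Vec<String>)> {
        let cases: Vec<String> = self
            .cases
            .range((id, 0)..=(id, u32::MAX))
            .map(|(_, payload)| payload.clone())
            .collect();
        let comment = self.sitreps.get(&id)?.clone();
        Some((comment, cases))
    }

    /// Deletion step 1: remove the metadata row, orphaning its children.
    fn delete_metadata(&mut self, id: SitrepId) -> bool {
        self.sitreps.remove(&id).is_some()
    }

    /// Deletion step 2: delete at most `batch` orphaned child rows and
    /// return how many were deleted. Call repeatedly until it returns 0.
    fn gc_orphans(&mut self, batch: usize) -> usize {
        let doomed: Vec<(SitrepId, u32)> = self
            .cases
            .keys()
            .filter(|(sitrep_id, _)| !self.sitreps.contains_key(sitrep_id))
            .take(batch)
            .copied()
            .collect();
        for key in &doomed {
            self.cases.remove(key);
        }
        doomed.len()
    }
}
```

In this model, a read that lands between `delete_metadata` and the final `gc_orphans` batch returns `None` rather than a partial sitrep, and rows belonging to a sitrep whose metadata row still exists are never selected by `gc_orphans`.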