
Fix more flakiness in Cosmos tests #38211

Open
AndriySvyryd wants to merge 10 commits into main from andriysvyryd/cosmos-concurrency-test-fix

@AndriySvyryd (Member) commented May 1, 2026

  • I've read the guidelines for contributing and seen the walkthrough
  • I've posted a comment on an issue with a detailed description of how I am planning to contribute and got approval from a member of the team
  • The code builds and tests pass locally (also verified by our automated build checks)
  • Commit messages follow this format:
        Summary of the changes
        - Detail 1
        - Detail 2

        Fixes #bugnumber
  • Tests for the changes have been added (for bug fixes / features)
  • Code follows the same patterns and style as existing code in this repo

Description

CosmosBulkConcurrencyTest and CosmosConcurrencyTest both used the same Cosmos DB database name ("CosmosConcurrencyTest") because CosmosBulkConcurrencyTest.ConcurrencyFixture inherited StoreName from CosmosConcurrencyTest.CosmosFixture without overriding it.

xUnit runs test classes in parallel, and both classes' ConcurrencyTestAsync methods call CleanAsync for each test, which deletes and recreates containers. When these calls race against the same database, one class can delete the database while the other is still creating containers in it, resulting in a sporadic 404 NotFound.

The fix overrides StoreName in the bulk fixture to "CosmosBulkConcurrencyTest" so each test class gets its own isolated database.

Also, don't delete shared Cosmos test databases as soon as the TestStore is disposed: that defeats the purpose of sharing a test database and also creates race conditions.

Use the execution strategy in EnsureCreated to make it resilient to transient failures.
Also, clear the change tracker before calling the SeedData delegate so that, if it is retried, entities tracked by the failed attempt aren't included.

CosmosBulkConcurrencyTest and CosmosConcurrencyTest both inherited the
same StoreName ('CosmosConcurrencyTest'). Since xUnit runs test classes
in parallel, their CleanAsync calls raced against the same database —
one class could delete the database while the other was creating
containers, causing a 404 NotFound.

Override StoreName in the bulk fixture so each class uses its own
database.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@AndriySvyryd AndriySvyryd marked this pull request as ready for review May 1, 2026 20:40
@AndriySvyryd AndriySvyryd requested a review from a team as a code owner May 1, 2026 20:40
Copilot AI review requested due to automatic review settings May 1, 2026 20:40

Copilot AI left a comment

Pull request overview

Fixes flakiness in Cosmos concurrency functional tests by isolating the database used by CosmosBulkConcurrencyTest from CosmosConcurrencyTest, preventing parallel test runs from racing on CleanAsync (delete/recreate).

Changes:

  • Override StoreName in CosmosBulkConcurrencyTest.ConcurrencyFixture to use a unique Cosmos database name.

Comment thread test/EFCore.Cosmos.FunctionalTests/Update/CosmosBulkConcurrencyTest.cs Outdated
…yTest.cs

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 1, 2026 20:47
@AndriySvyryd AndriySvyryd enabled auto-merge (squash) May 1, 2026 20:47

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.

AndriySvyryd and others added 2 commits May 1, 2026 23:39
…licts

When the AsyncSeeder adds entities to the context and SaveChangesAsync
hits a transient Cosmos error (429/503), the execution strategy retries
CleanAsyncImpl with the same context. The retry's seeder then fails
trying to Add entities already tracked from the previous attempt.

Clear ChangeTracker at the start of each attempt so retries start clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
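A minimal, self-contained sketch of the failure this commit addresses. It does not use EF Core: a HashSet<string> stands in for the change tracker, a TimeoutException stands in for a transient Cosmos error (429/503), and SeedWithRetries is a hypothetical helper, not an EF Core API. Without clearing, the second attempt re-adds an entity the first attempt already tracked and fails with an identity conflict; clearing at the start of each attempt lets the retry succeed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: seeds one entity inside a retry loop that treats
// TimeoutException as transient. "tracked" plays the role of the change tracker.
string SeedWithRetries(bool clearTrackerEachAttempt, int maxAttempts = 3)
{
    var tracked = new HashSet<string>();
    var transientFailuresLeft = 1; // the first save fails, later ones succeed

    for (var attempt = 1; ; attempt++)
    {
        try
        {
            if (clearTrackerEachAttempt)
            {
                tracked.Clear(); // start every attempt from a clean tracker
            }

            // The seeder adds the same entity on every attempt; adding a key
            // that is already tracked is an identity conflict.
            if (!tracked.Add("Customer:ALFKI"))
            {
                throw new InvalidOperationException("Identity conflict");
            }

            // Simulated SaveChanges: throws a transient error once.
            if (transientFailuresLeft-- > 0)
            {
                throw new TimeoutException();
            }

            return "saved";
        }
        catch (TimeoutException) when (attempt < maxAttempts)
        {
            // Transient error: fall through to the next attempt.
        }
    }
}

Console.WriteLine(SeedWithRetries(clearTrackerEachAttempt: true)); // prints: saved
```

Calling SeedWithRetries(clearTrackerEachAttempt: false) instead throws InvalidOperationException on the second attempt, which is the identity conflict described above.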
The seeder invoked by SeedDataAsync adds entities to the context, so if
EnsureCreatedAsync is called inside any retry loop and the seeder's
SaveChangesAsync hits a transient error, the next attempt finds stale
tracked entities and fails with an identity conflict.

Clearing the tracker at the start of EnsureCreatedAsync makes the method
inherently retry-safe rather than relying on callers to clear state.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 2, 2026 06:53

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Comment thread src/EFCore.Cosmos/Storage/Internal/CosmosDatabaseCreator.cs Outdated
Copilot AI review requested due to automatic review settings May 2, 2026 07:11
@AndriySvyryd AndriySvyryd force-pushed the andriysvyryd/cosmos-concurrency-test-fix branch from 0179bf6 to e235149 Compare May 2, 2026 07:11
@AndriySvyryd AndriySvyryd force-pushed the andriysvyryd/cosmos-concurrency-test-fix branch from e235149 to 62293b4 Compare May 2, 2026 07:15

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

test/EFCore.Cosmos.FunctionalTests/TestUtilities/CosmosTestStore.cs:529

  • This keeps the first shared CosmosTestStore's _storeContext undisposed until ProcessExit (the 'reserved for cleanup' instance). That means a DbContext (and likely its underlying Cosmos client/service provider) can stay alive for the entire test process per StoreName, increasing resource usage and potentially affecting long-running test runs. Consider storing only the information needed for cleanup (e.g., store name/connection info) and creating a short-lived context at ProcessExit for deletion, so DisposeAsync can always dispose _storeContext deterministically.
        private readonly CosmosTestStore _testStore = testStore;

        protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        {
            if (TestEnvironment.UseTokenCredential)
            {
                optionsBuilder.UseCosmos(
                    _testStore.ConnectionUri, _testStore.TokenCredential, _testStore.Name, _testStore._configureCosmos);
            }
            else

Comment thread test/EFCore.Cosmos.FunctionalTests/TestUtilities/CosmosTestStore.cs Outdated
Comment thread src/EFCore.Cosmos/Storage/Internal/CosmosDatabaseCreator.cs Outdated
@AndriySvyryd AndriySvyryd force-pushed the andriysvyryd/cosmos-concurrency-test-fix branch from 62293b4 to c302751 Compare May 2, 2026 07:17
Multiple test fixtures can share the same Cosmos database (e.g. all
Northwind query tests share 'Northwind'). Previously, the first fixture
to dispose would delete the database, causing other fixtures still
running in parallel to fail with NotFound or wrong data.

Shared stores now register themselves in a static dictionary at
construction time. A static constructor registers a ProcessExit handler
that deletes all shared databases in parallel after the test run
completes. Non-shared databases continue to be deleted immediately on
dispose.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
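A rough, self-contained sketch of the deferred-deletion scheme described in this commit, not the CosmosTestStore code. The names are hypothetical: DeleteDatabaseAsync only records the database name instead of calling Cosmos, and the ConcurrentDictionary plays the role of the static dictionary of shared stores. AppDomain.CurrentDomain.ProcessExit is the real .NET event; registering it once at the top approximates registering it from a static constructor.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for dropping a Cosmos database; it only records the name.
var deleted = new ConcurrentBag<string>();
Task DeleteDatabaseAsync(string name)
{
    deleted.Add(name);
    return Task.CompletedTask;
}

// Shared databases whose deletion is deferred until the test process exits.
var deferred = new ConcurrentDictionary<string, byte>();

// Registered once, like a handler added from a static constructor: after the
// test run, delete every deferred database in parallel.
AppDomain.CurrentDomain.ProcessExit += (_, _) =>
    Task.WhenAll(deferred.Keys.Select(name => DeleteDatabaseAsync(name))).GetAwaiter().GetResult();

// Called when a test store is constructed.
void Register(string name, bool shared)
{
    if (shared)
    {
        deferred.TryAdd(name, 0);
    }
}

// Called when a test store is disposed: only non-shared databases go away now.
Task DisposeStoreAsync(string name, bool shared)
    => shared ? Task.CompletedTask : DeleteDatabaseAsync(name);
```

With this shape, two fixtures registering Northwind leave a single entry, so the database is dropped once at exit instead of by whichever fixture happens to be disposed first.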
Copilot AI review requested due to automatic review settings May 2, 2026 07:19
@AndriySvyryd AndriySvyryd force-pushed the andriysvyryd/cosmos-concurrency-test-fix branch from c302751 to de42341 Compare May 2, 2026 07:19

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

Comment thread test/EFCore.Cosmos.FunctionalTests/TestUtilities/CosmosTestStore.cs
Comment thread test/EFCore.Cosmos.FunctionalTests/TestUtilities/CosmosTestStore.cs Outdated
Comment thread src/EFCore.Cosmos/Storage/Internal/CosmosDatabaseCreator.cs Outdated
@AndriySvyryd AndriySvyryd changed the title Fix flaky CosmosBulkConcurrencyTest by using unique database name Fix more flakiness in Cosmos tests May 2, 2026
@AndriySvyryd AndriySvyryd disabled auto-merge May 2, 2026 15:39
Copilot AI review requested due to automatic review settings May 2, 2026 15:51
@AndriySvyryd AndriySvyryd force-pushed the andriysvyryd/cosmos-concurrency-test-fix branch from 5de08b2 to 57e73cf Compare May 2, 2026 15:51
@AndriySvyryd AndriySvyryd force-pushed the andriysvyryd/cosmos-concurrency-test-fix branch from 57e73cf to 59ebab1 Compare May 2, 2026 15:58

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

Comment thread src/EFCore.Relational/Storage/RelationalDatabaseCreator.cs
Comment thread test/EFCore.Cosmos.FunctionalTests/TestUtilities/CosmosTestStore.cs
Comment thread test/EFCore.Cosmos.FunctionalTests/TestUtilities/CosmosTestStore.cs
Comment thread src/EFCore.Cosmos/Storage/Internal/CosmosDatabaseCreator.cs Outdated
Comment thread src/EFCore.Relational/Storage/RelationalDatabaseCreator.cs Outdated
- CosmosDatabaseCreator.EnsureCreatedAsync: wrap entire body in
  execution strategy. Use StrongBox to persist the 'created' flag
  across retries. Clear ChangeTracker only when AsyncSeeder is set.

- RelationalDatabaseCreator.EnsureCreated/EnsureCreatedAsync: wrap
  seeding in the execution strategy. Clear ChangeTracker only when
  the seeder is set.

- Migrator.MigrateImplementation/MigrateAsyncImplementation: clear the
  change tracker before invoking the seeder.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
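Why a StrongBox can help, as a simplified synchronous sketch rather than the CosmosDatabaseCreator code. Execute is a hypothetical stand-in for an execution strategy (EF Core's IExecutionStrategy.ExecuteAsync similarly hands one state object to every attempt), CreateContainers and Seed are invented, and TimeoutException stands in for a transient error. This is one reading of the motivation: if the first attempt creates the containers and seeding then fails, the retry finds the containers already present, so a flag local to a single attempt would report false; a StrongBox<bool> passed as state keeps the true from the first attempt.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical execution strategy: reruns the operation on "transient" errors
// (TimeoutException here) and hands every attempt the same state object.
TResult Execute<TState, TResult>(TState state, Func<TState, TResult> operation, int maxAttempts = 3)
{
    for (var attempt = 1; ; attempt++)
    {
        try
        {
            return operation(state);
        }
        catch (TimeoutException) when (attempt < maxAttempts)
        {
            // Retry with the same state.
        }
    }
}

var containersExist = false;
var seedFailuresLeft = 1;

// Returns true only if this call actually created the containers.
bool CreateContainers()
{
    if (containersExist)
    {
        return false;
    }

    containersExist = true;
    return true;
}

// Simulated seeding whose first SaveChanges hits a transient error.
void Seed()
{
    if (seedFailuresLeft-- > 0)
    {
        throw new TimeoutException();
    }
}

// The box outlives individual attempts, so "created" survives a retry.
var created = new StrongBox<bool>();
var result = Execute(created, box =>
{
    box.Value |= CreateContainers();
    Seed();
    return box.Value;
});

Console.WriteLine(result); // prints: True
```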
@AndriySvyryd AndriySvyryd force-pushed the andriysvyryd/cosmos-concurrency-test-fix branch from 59ebab1 to 03a6ac8 Compare May 3, 2026 01:26
- Refactor CosmosDatabaseCreator: move ChangeTracker.Clear() into
  SeedDataAsync so both EnsureCreatedAsync and CosmosTestStore.SeedAsync
  benefit from the fix without duplication.

- Add InternalServerError (500) to CosmosExecutionStrategy retryable
  errors. The Cosmos emulator returns 500 when overwhelmed with
  concurrent operations.

- Fix ReadItemPartitionKeyQueryFixtureBase entity sorter for
  SinglePartitionKeyEntity to sort by (Id, PartitionKey) instead of
  just Id. Two entities share the same Id across partitions, causing
  non-deterministic assertion matching.

- Dispose _storeContext for deferred shared stores in DisposeAsync.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
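To see why sorting by Id alone was not enough, here is a small self-contained illustration; value tuples stand in for SinglePartitionKeyEntity, and the rows and partition key values are made up. Enumerable.OrderBy is a stable sort, so rows with equal Id keep whatever relative order the source returned them in, and two result sets that differ only in that order fail an element-by-element comparison. Adding PartitionKey as a secondary key removes the tie.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Made-up rows: two entities share Id 1 but live in different partitions.
var expected = new List<(int Id, string PartitionKey)> { (1, "PK1"), (1, "PK2"), (2, "PK1") };
// Same rows, returned by the store in a different order.
var actual = new List<(int Id, string PartitionKey)> { (1, "PK2"), (2, "PK1"), (1, "PK1") };

// Sorting by Id only: the two Id == 1 rows tie and keep their source order.
IEnumerable<(int Id, string PartitionKey)> SortById(IEnumerable<(int Id, string PartitionKey)> rows)
    => rows.OrderBy(r => r.Id);

// Sorting by (Id, PartitionKey): the tie is broken deterministically.
IEnumerable<(int Id, string PartitionKey)> SortByIdAndPartitionKey(IEnumerable<(int Id, string PartitionKey)> rows)
    => rows.OrderBy(r => r.Id).ThenBy(r => r.PartitionKey, StringComparer.Ordinal);

Console.WriteLine(SortById(expected).SequenceEqual(SortById(actual)));                               // prints: False
Console.WriteLine(SortByIdAndPartitionKey(expected).SequenceEqual(SortByIdAndPartitionKey(actual))); // prints: True
```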
Copilot AI review requested due to automatic review settings May 3, 2026 05:34

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.

Comment thread src/EFCore.Relational/Migrations/Internal/Migrator.cs Outdated
Comment thread src/EFCore.Relational/Migrations/Internal/Migrator.cs Outdated
Comment thread test/EFCore.Cosmos.FunctionalTests/TestUtilities/CosmosTestStore.cs Outdated
Comment thread test/EFCore.Cosmos.FunctionalTests/TestUtilities/CosmosTestStore.cs
Comment thread src/EFCore.Cosmos/Storage/Internal/CosmosDatabaseCreator.cs
Comment thread src/EFCore.Relational/Storage/RelationalDatabaseCreator.cs Outdated
Comment thread src/EFCore.Relational/Storage/RelationalDatabaseCreator.cs Outdated
@roji roji assigned AndriySvyryd and unassigned roji May 3, 2026
AndriySvyryd and others added 2 commits May 3, 2026 09:28
…posal

CosmosDatabaseCreator.EnsureCreatedAsync:
- Track DataInserted flag so InsertDataAsync (not idempotent) is not
  re-run on retry. On retry, clear ChangeTracker instead.
- Remove ChangeTracker.Clear from SeedDataAsync since callers handle it.

CosmosTestStore:
- CleanAsync: track retry state, only clear ChangeTracker on retry.
- DisposeAsync: only dispose context for non-canonical instances in
  _deferredStores (the canonical instance stays alive for ProcessExit).
- Add CosmosBulkExecutionTest to deferred deletion list (shared by
  CosmosBulkWarningTest). Remove F1Test (not actually shared).

RelationalDatabaseCreator.EnsureCreated/Async:
- Track retry state, only clear ChangeTracker on retry.

Migrator.MigrateImplementation/Async:
- Use MigrationExecutionState.SeedingAttempted flag to only clear
  ChangeTracker on retry.

CosmosExecutionStrategy:
- Revert InternalServerError (500) from retryable errors — the 500s
  were caused by leaked CosmosClient instances from improper disposal.

ReadItemPartitionKeyQueryInheritanceFixtureBase:
- Fix DerivedSinglePartitionKeyEntity entity sorter to sort by
  (Id, PartitionKey) to avoid non-deterministic matching.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
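A self-contained sketch of the retry bookkeeping this commit describes for CosmosDatabaseCreator.EnsureCreatedAsync, not the actual implementation: a HashSet<string> stands in for the change tracker, a List<string> for the container, TimeoutException for a transient error, and the attempted/dataInserted flags are hypothetical names. The non-idempotent insert runs at most once, and the tracker is cleared only when an earlier attempt has already run.

```csharp
using System;
using System.Collections.Generic;

var tracked = new HashSet<string>();  // stands in for the change tracker
var container = new List<string>();   // stands in for the Cosmos container
var saveFailuresLeft = 1;             // the first save hits a transient error

var attempted = false;     // has any attempt started yet?
var dataInserted = false;  // has the non-idempotent insert already succeeded?

for (var attempt = 1; ; attempt++)
{
    try
    {
        if (attempted)
        {
            tracked.Clear(); // retry only: drop entities from the failed attempt
        }

        attempted = true;

        if (!dataInserted)
        {
            container.Add("model data"); // running this twice would duplicate rows
            dataInserted = true;
        }

        // Seeder: adding an already-tracked key is an identity conflict.
        if (!tracked.Add("seed entity"))
        {
            throw new InvalidOperationException("Identity conflict");
        }

        // Simulated SaveChanges.
        if (saveFailuresLeft-- > 0)
        {
            throw new TimeoutException();
        }

        container.Add("seed entity");
        break;
    }
    catch (TimeoutException) when (attempt < 3)
    {
        // Transient error: try again.
    }
}

Console.WriteLine(string.Join(", ", container)); // prints: model data, seed entity
```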
…ferred deletion

- Merge CosmosBulkWarningTest tests into CosmosBulkExecutionTest using
  the non-shared test pattern to eliminate the shared database. Delete
  the CosmosBulkWarningTest file.

- Inline deferred deletion for the only shared database (Northwind).
  Add a debug assertion that throws when an unexpected database is
  shared across multiple fixture types.

- Refactor SeedDataAsync to accept a clearChangeTracker parameter,
  eliminating the duplicate clear logic in EnsureCreatedAsync.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
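One possible shape for the "unexpected shared database" check, as a hypothetical sketch: it remembers the first fixture type that used each database name and throws if a different type reuses a name outside an allow-list that contains only Northwind. The commit describes a debug assertion; the owners dictionary, the allowedShared set and RegisterStore are assumptions made for illustration, and System.Type values stand in for fixture types.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// First fixture type seen for each database name.
var owners = new ConcurrentDictionary<string, Type>();
// Databases that several fixture types are expected to share.
var allowedShared = new HashSet<string> { "Northwind" };

// Hypothetical check run when a fixture creates its test store.
void RegisterStore(string databaseName, Type fixtureType)
{
    var owner = owners.GetOrAdd(databaseName, fixtureType);
    if (owner != fixtureType && !allowedShared.Contains(databaseName))
    {
        throw new InvalidOperationException(
            $"Database '{databaseName}' is shared by {owner.Name} and {fixtureType.Name}; give one of them its own StoreName.");
    }
}

// typeof(string) and typeof(Uri) are arbitrary stand-ins for two fixture types.
RegisterStore("Northwind", typeof(string));
RegisterStore("Northwind", typeof(Uri)); // allowed: Northwind is meant to be shared
RegisterStore("CosmosConcurrencyTest", typeof(string));
```

A second fixture type registering CosmosConcurrencyTest would now throw, which is the situation the original StoreName bug produced silently.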
Copilot AI review requested due to automatic review settings May 3, 2026 16:41

Copilot AI left a comment

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

Comment thread src/EFCore.Cosmos/Storage/Internal/CosmosDatabaseCreator.cs
            await store.EnsureDeletedAsync(store._storeContext).ConfigureAwait(false);
        }
        catch
        {
The test uses assertOrder: true without an OrderBy, which produces
non-deterministic results on the Linux Cosmos emulator.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Labels

None yet

Projects

None yet

3 participants