feat: using pytest as a test runner, diversify the tests into unit, integration, and performance tests #1028

jenstroeger · 2025-11-01T11:58:10Z

Based on conversations with @behnazh and the How to keep Unit tests and Integrations tests separate in pytest blurb on StackOverflow, this change adds three new Makefile goals:

make test-unit: run the unit tests, just like we have so far;
make test-integration: run integration tests; and

make test-performance: run performance tests:

---------------------------------------------- benchmark: 1 tests ----------------------------------------------
Name (time in us)        Min     Max    Mean  StdDev  Median     IQR  Outliers  OPS (Kops/s)  Rounds  Iterations
----------------------------------------------------------------------------------------------------------------
test_something        3.0105  4.7433  3.0774  0.1769  3.0461  0.0246      3;10      324.9469     100          10
----------------------------------------------------------------------------------------------------------------

Legend:
  Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
  OPS: Operations Per Second, computed as 1 / Mean

The existing goal make test runs all tests from all three categories.

Points to consider:

Rename “performance” to “benchmark” because the pytest plugin is called pytest-benchmark?
Look into and add performance regression checks (see also Consider tracking performance across package releases. #563).
Elaborate in the Readme, and expand the module docstrings for each test to give more background.

…ntegration, and performance tests

danirivas · 2025-11-04T12:05:23Z

Based on conversations with @behnazh and the How to keep Unit tests and Integrations tests separate in pytest blurb on StackOverflow, this change adds three new Makefile goals:
* `make test-unit`: run the [unit tests](https://en.wikipedia.org/wiki/Unit_testing), just like we have so far;

* `make test-integration`: run [integration tests](https://en.wikipedia.org/wiki/Integration_testing); and

* `make test-performance`: run [performance tests](https://en.wikipedia.org/wiki/Software_performance_testing):
  ```
The existing goal make test runs all tests from all three categories.

I'm 100% on board with splitting the tests in these 3 categories. Even if you want to always run all categories, at least you can easily control the order in which they do.

Unit tests are the bare minimum so they are always a pre-condition. Integration tests may run slower but only run after unit tests make sure you didn't break anything. And there's no point in running performance tests if you don't know yet your changes are even correct. So having these 3, and in this order, makes perfect sense.

That said, why split them by category in different directories if they are then marked with the same categories? Not against it, just wondering.

Points to consider:

* [ ]  Rename “performance” to “benchmark” because the pytest plugin is called [pytest-benchmark](https://github.com/ionelmc/pytest-benchmark)?

I like performance if they'll be used for checking regression in, well, performance. We use a benchmark to asses performance 🤓

But if there's no good or bad (i.e. no regression checks) then benchmark seems accurate.

* [ ]  Look into and add performance regression checks (see also [Consider tracking performance across package releases. #563](https://github.com/jenstroeger/python-package-template/issues/563)).

The main complexity I've always faced when trying to add this to my projects is that if the benchmark is too small or too unrealistic, I fear that the latency might fall within the error margin and be subject to too much variability.

I know this is a template for a package but unless I'm testing compute-bound functions, the benchmark should probably consider integration tests in a realistic scenario (i.e. warmed up DBs with enough data and plans executed). I'm always wary of automating this part because, when it comes to performance, I'd love to gather as many data points (and plot them!) as possible before drawing any conclusions.

Another idea I've been entertaining (and this might be off-topic for the scope of this PR and repo) is counting the number of queries that a function or endpoint executes. That number alone might not tell the full story but in case there's an unexpected latency spike, it can help understand why. For example, a function that loops over something and accidentally does thousand small queries instead of one big join or IN clause or looping over an ORM object with lazy relationships, which ends up with one query per row.

jenstroeger · 2025-11-07T22:56:37Z

Unit tests are the bare minimum so they are always a pre-condition. Integration tests may run slower but only run after unit tests make sure you didn't break anything.

I’m curious: for test-driven development, would you write primarily unit tests or would you consider writing some integration tests as well?

And there's no point in running performance tests if you don't know yet your changes are even correct.

Agreed, good point!

That said, why split them by category in different directories if they are then marked with the same categories? Not against it, just wondering.

If I didn’t and I wanted to find e.g. performance tests then I might have to rummage through a bunch of files to find those. I thought it’s just a helpful way of organizing the test files; and because I can reuse the same filenames I thought that’ll help with orienting myself in the test code 🤔

Now if pytest would allow me to mark folders that’d be useful… maybe?

I like performance if they'll be used for checking regression in, well, performance. We use a benchmark to asses performance 🤓

Fair point, agreed! And like we discussed in issue #563 having regression tracking would be great!

The main complexity I've always faced […]

I very much agree, and I’ve looked at these performance/regression tests as close siblings of the unit tests, not so much of the overall integration tests. And personally, I’d be more interested in the performance of this or that critical/relevant function… not all of them, anyway.

Another idea I've been entertaining (and this might be off-topic for the scope of this PR and repo) is […]

This sounds like something a (sampling) profiler can do, or some targeted instrumentation?

danirivas · 2025-11-26T11:32:18Z

I’m curious: for test-driven development, would you write primarily unit tests or would you consider writing some integration tests as well?

I don't want to give the go-to answer but... it depends 🤓 If it's a something that heavily depends on other components, it might be a good idea to add integration tests as part of a TDD loop. That said, we both know that sometimes is difficult to get people to stick to a rigid definition of unit in testing. For years I've worked in projects where the unit was clearly defined and enclosed. From there, we'd start building the test pyramid all the way to the top (i.e. end-to-end integration tests). However, it's difficult and, specially, exhausting to make every single function a testable unit when folks break the code in smaller functions just for the sake of having... well, smaller functions¹ 🤷

This sounds like something a (sampling) profiler can do, or some targeted instrumentation?

I love a good profiling as much as anyone but profilers tend to be quite verbose. They are a great tool for when you don't know what you're looking for or for when you are just debugging a performance issue (emphasis on debugging). For regression testing, I'd lean more towards targeted instrumentation to extract specific and well-understood metrics that be used to quickly compare runs across versions.

For instance, the example I mentioned about tracking the number of DB queries. There's nothing wrong in doing DB queries if they are needed but we know they are expensive by nature so a spike in those may hint an underlying issue. And if that metric is coupled with performance numbers, it can help put the spotlight on a specific part that may need optimization the event of a performance regression. This, of course, requires some previous knowledge of the application as that may not translate well to other applications.

Oh, boy! How much unnecessary pain and frustration has the Clean Code caused! or devs that understood it as a dogma to follow. ↩

.pre-commit-config.yaml

behnazh · 2025-11-26T12:30:12Z

I’m curious: for test-driven development, would you write primarily unit tests or would you consider writing some integration tests as well?

I usually start by adding a unit test first. For me, a unit test should be fast, require no network calls, and run fully offline. If a unit test can’t meaningfully cover the behavior, then I add an integration test. I agree with @danirivas that unit tests act as a quick precondition check and should catch the obvious issues before we run the slower, more expensive integration tests (which literally cost money).

danirivas · 2025-11-26T13:55:46Z

I’m curious: for test-driven development, would you write primarily unit tests or would you consider writing some integration tests as well?

I usually start by adding a unit test first. For me, a unit test should be fast, require no network calls, and run fully offline. If a unit test can’t meaningfully cover the behavior, then I add an integration test. I agree with @danirivas that unit tests act as a quick precondition check and should catch the obvious issues before we run the slower, more expensive integration tests (which literally cost money).

Agree. I personally tend to prioritize quick feedback at the risk of having some false positives slip through until the full test-suite runs. If feedback takes 10min, chances are I'll go do something else and probably forget tests were running until the next epoch reminds me to context-switch back to it.

jenstroeger · 2025-11-27T02:33:21Z

@behnazh I usually start by adding a unit test first. […]

I have little to add, I very much agree 🤓

behnazh · 2025-11-27T02:47:59Z

@behnazh I usually start by adding a unit test first. […]

I have little to add, I very much agree 🤓

I have one more thing to add 🙂 I think what I have described does not follow the test-driven approach. I typically write the code first and consider unit tests afterwards, whereas the test-driven approach emphasizes starting by writing a test and then developing your code to pass that test.

jenstroeger · 2025-11-27T05:19:16Z

I have one more thing to add 🙂 […]

I think we have text-book approaches to engineering, and the real world. Perhaps the best we can do is try to be consistent over time and as a project scales, and also be realistic about why no single one approach will always work. Testing has great value and it bugs me how engineers often neglect to test their own code.

[Deleted two hours of opinionated drivel 🤓]

.pre-commit-config.yaml

…it hook, to untie the two pytest invocations

…g the flit command, to ensure we call the venv’s flit and not another pre-installed version

Makefile

…e tests

Makefile

Refs: f3961e5

behnazh · 2025-11-30T23:57:12Z

The scope for this PR is feat, and not test I think because this is a template project!

test: using pytest as a test runner, diversify the tests into unit, i…

191b194

…ntegration, and performance tests

jenstroeger mentioned this pull request Nov 3, 2025

Consider tracking performance across package releases. #563

Open

behnazh reviewed Nov 26, 2025

View reviewed changes

.pre-commit-config.yaml Outdated Show resolved Hide resolved

test: run only unit tests for the local git push hook

ca2e6e5

jenstroeger force-pushed the diversify-testing branch from 5238344 to ca2e6e5 Compare November 26, 2025 12:46

jenstroeger requested a review from behnazh November 26, 2025 12:47

jenstroeger added 3 commits November 26, 2025 21:43

docs: improve module docstrings for tests

2e30862

chore: add explanations for pytest markers

567c636

chore: improve Makefile and how & when to run what tests

86bf9af

jenstroeger commented Nov 27, 2025

View reviewed changes

.pre-commit-config.yaml Outdated Show resolved Hide resolved

jenstroeger added 3 commits November 27, 2025 22:31

chore: call pytest directly from the Makefile instead of the pre-comm…

bc93ec4

…it hook, to untie the two pytest invocations

chore: run unit tests only from the git push hook

345b862

fix: run flit through the venv’s python interpreter instead of callin…

f3961e5

…g the flit command, to ensure we call the venv’s flit and not another pre-installed version

danirivas reviewed Nov 28, 2025

View reviewed changes

Makefile Show resolved Hide resolved

danirivas reviewed Nov 28, 2025

View reviewed changes

Makefile Outdated Show resolved Hide resolved

chore: disable coverage tracking for integration tests and performanc…

1babae2

…e tests

jenstroeger requested a review from danirivas November 28, 2025 16:08

behnazh reviewed Nov 29, 2025

View reviewed changes

Makefile Show resolved Hide resolved

Makefile Outdated Show resolved Hide resolved

chore: remove changes to running flit

53d65c5

Refs: f3961e5

jenstroeger mentioned this pull request Nov 30, 2025

fix: invoke flit through the Python interpreter’s module-call argument to ensure the Python venv’s flit is being run #1045

Merged

jenstroeger changed the title ~~test: using pytest as a test runner, diversify the tests into unit, integration, and performance tests~~ feat: using pytest as a test runner, diversify the tests into unit, integration, and performance tests Dec 1, 2025

jenstroeger added 2 commits November 30, 2025 21:00

chore: improve comment

01dfc17

docs: update README

27c31c0

jenstroeger requested a review from behnazh December 1, 2025 05:33

jenstroeger marked this pull request as ready for review December 1, 2025 05:36

behnazh approved these changes Dec 1, 2025

View reviewed changes

jenstroeger merged commit 1b34fdd into main Dec 3, 2025
31 of 32 checks passed

jenstroeger mentioned this pull request Dec 17, 2025

Add and extend tests jenstroeger/fullstack-webapp-template#9

Open

feat: using pytest as a test runner, diversify the tests into unit, integration, and performance tests #1028

feat: using pytest as a test runner, diversify the tests into unit, integration, and performance tests #1028

Uh oh!

Conversation

jenstroeger commented Nov 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

danirivas commented Nov 4, 2025

Uh oh!

jenstroeger commented Nov 7, 2025

Uh oh!

danirivas commented Nov 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Footnotes

Uh oh!

Uh oh!

behnazh commented Nov 26, 2025

Uh oh!

danirivas commented Nov 26, 2025

Uh oh!

jenstroeger commented Nov 27, 2025

Uh oh!

behnazh commented Nov 27, 2025

Uh oh!

jenstroeger commented Nov 27, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

behnazh commented Nov 30, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

jenstroeger commented Nov 1, 2025 •

edited

Loading

danirivas commented Nov 26, 2025 •

edited

Loading