proposal for regression testing by HansVRP · Pull Request #490 · ESA-APEx/apex_algorithms

HansVRP · 2026-04-30T14:33:16Z

Idea for starting to include regression benchmarks.

@JanssenBrm I would also need info on how best to expose it so that we can keep a log in the service catalogue.

algorithm-services-catalogue · 2026-04-30T14:36:33Z

🔍 Catalogue's Preview Site Deployed

Your changes have been deployed to the preview site:

🔗 Preview URL: https://esa-apex.github.io/apex-algorithms-catalogue-web/pr-preview/pr-490/

This preview will be updated automatically when you push new changes to your PR.

HansVRP · 2026-05-13T07:19:50Z

@JanssenBrm @VictorVerhaert ready to check. I have opted for a more adaptive benchmark where we look at the average and the standard deviation. The more successful runs there are, the more decisive the benchmark becomes.
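One way such an adaptive check could work is sketched below. This is not the code from the PR: the function name, the minimum run count, and the way the tolerance factor `k` tightens with more history are all assumptions made for illustration.

```python
from statistics import mean, stdev


def adaptive_upper_bound(history, min_runs=5, wide_k=5.0, tight_k=3.0, full_runs=30):
    """Return an upper bound for a metric, or None when history is too short.

    All parameter names and default values are illustrative assumptions,
    not values taken from the PR.
    """
    n = len(history)
    if n < min_runs:
        # Too few successful runs to judge: do not gate at all.
        return None
    # Tighten the tolerance linearly from wide_k (at min_runs)
    # to tight_k (at full_runs and beyond).
    frac = min(1.0, (n - min_runs) / (full_runs - min_runs))
    k = wide_k - frac * (wide_k - tight_k)
    return mean(history) + k * stdev(history)
```

With only a handful of runs the bound stays wide, which limits false failures; as more successful runs accumulate, the bound approaches `mean + tight_k * std`.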

HansVRP · 2026-06-02T07:52:31Z

@JanssenBrm @JeroenVerstraelen @VictorVerhaert all feedback is welcome

VictorVerhaert

Two small optional comments aimed at preventing false failures. One more question: could you try running it with GitHub Actions to see how it behaves in practice?
Otherwise the PR looks clean.

HansVRP · 2026-06-12T07:38:32Z

seems to be working well:

https://jenkins.vgt.vito.be/job/openEO/job/openeo-apex-benchmarks-handpicked-run/61/

HansVRP · 2026-06-12T15:38:33Z

I ended up making some changes after seeing the behavior on Jenkins.

I added explicit time limits on the data to use, and I made cost the sole gating value that can cause a failure. All other values were too volatile, especially for these small benchmarks, which hover around 4 credits.

HansVRP · 2026-06-23T07:56:28Z

(blocked by #547)

HansVRP · 2026-06-23T14:15:11Z

@JanssenBrm @soxofaan Could the two of you also take a look at this proposal of regression testing?

In general, I look into the merged parquet for the last X months and, per usage metric/cost, collect all valid values from successful runs.

Based on that, I calculate the median and the MAD, and convert the MAD into a standard-deviation estimate.

For now the test will fail if the measured cost > median + 3.5 x MAD, which seemed sensible for some tests I have run.

The other metrics fluctuate quite a lot, so for those I only log a warning when the threshold is exceeded.
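The check described above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: the metric names (`costs`, `cpu`), the function names, and the minimum history length are hypothetical. It also assumes the threshold uses the MAD scaled by 1.4826 (the usual factor that makes the MAD comparable to a standard deviation for normally distributed data); the comment does not say whether the raw or the scaled MAD is used.

```python
from statistics import median

# Scale factor that turns a MAD into a standard-deviation estimate
# under the assumption of normally distributed data.
MAD_TO_SIGMA = 1.4826


def robust_baseline(values):
    """Return (median, MAD-based sigma estimate) for a list of numbers."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    return med, mad * MAD_TO_SIGMA


def check_run(measured, history, gating=("costs",), k=3.5, min_runs=3):
    """Compare one run against per-metric history.

    Returns (failures, warnings): only metrics listed in `gating`
    produce failures, every other metric only produces a warning.
    Metric names and defaults are hypothetical.
    """
    failures, warnings = [], []
    for metric, value in measured.items():
        past = history.get(metric, [])
        if len(past) < min_runs:
            continue  # not enough history to judge this metric
        med, sigma = robust_baseline(past)
        limit = med + k * sigma
        if value > limit:
            msg = f"{metric}: {value:.2f} exceeds {limit:.2f}"
            (failures if metric in gating else warnings).append(msg)
    return failures, warnings
```

Because the median and the MAD are insensitive to a few outliers, one unusually expensive historical run does not inflate the threshold the way it would with a mean and standard deviation.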

soxofaan · 2026-06-24T10:44:24Z

Short on time here, so I could only give this a quick look.
I'm a bit worried about the design here: if I understand correctly, you are adding hundreds of lines of logic to the main test_run_benchmark code path, involving (S3) I/O and pandas manipulations, which will make the main code path (already known to be brittle and trigger-happy) even more brittle and harder to debug.
Can't the performance regression check be added as a separate post-processing tool run, working on the test suite output?

HansVRP · 2026-06-24T11:03:19Z

@soxofaan most of the code indeed relates to getting a good statistic out of this merged parquet file, which can be used to determine whether a regression occurred. Ideally it would be cached somewhere locally so that we would not need to calculate the baseline at runtime each time.

Am I correct that this is what you are proposing as well?

HansVRP · 2026-06-24T11:28:28Z

What I can do is move the baseline calculation to a separate GitHub workflow and trigger it weekly to recompute the baseline.

In the actual test phase we would then read that file from S3 and check for regression?

soxofaan · 2026-06-24T11:30:59Z

No, I'm not talking about caching (I'm not sure there is even something useful to do with caching in a test_run_benchmark context).
What I mean is to keep the pytest ... test_run_benchmark run/workflow to the bare essentials (for stability and to-the-point results)
and do all the other analysis, trend watching, ... in separate tools or workflows (to allow experimentation, iteration, interactivity).
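As a rough illustration of that separation, the analysis side could be a standalone helper that only consumes recorded benchmark results and never runs inside pytest. The sketch below assumes the results have already been loaded into a list of dictionaries; the field names (`status`, `timestamp`, `scenario_id`, `costs`) and the 90-day window are hypothetical, not taken from the repository.

```python
from datetime import datetime, timedelta


def collect_history(records, now, window_days=90, metric="costs"):
    """Group one metric from recent successful runs by scenario.

    `records` is a list of dicts standing in for the stored benchmark
    results; all field names here are illustrative assumptions.
    """
    cutoff = now - timedelta(days=window_days)
    history = {}
    for rec in records:
        if rec.get("status") != "passed":
            continue  # only successful runs contribute to the baseline
        if rec["timestamp"] < cutoff:
            continue  # outside the look-back window
        value = rec.get(metric)
        if value is None:
            continue  # metric missing for this run
        history.setdefault(rec["scenario_id"], []).append(value)
    return history
```

Keeping this in a separate tool means a failure in the analysis (missing data, I/O errors) cannot break the benchmark run itself, and the thresholds can be iterated on without touching the test code path.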

HansVRP · 2026-06-24T11:53:02Z

Makes sense; I'll split the two workflows, let the regression check trigger weekly, and have it create separate GitHub issues!

HansVRP requested review from JanssenBrm and VictorVerhaert April 30, 2026 14:33

VictorVerhaert self-assigned this May 4, 2026

JeroenVerstraelen mentioned this pull request May 5, 2026

[EPIC] APEx openEO UDP benchmark & regression workflow #1

Open

5 tasks

HansVRP force-pushed the hv_regresion_benchmark branch 3 times, most recently from 42a8863 to be6e8ed Compare May 13, 2026 07:17

HansVRP requested a review from JeroenVerstraelen May 13, 2026 09:35

VictorVerhaert reviewed Jun 5, 2026

Comment thread qa/tools/apex_algorithm_qa_tools/benchmarks.py Outdated

Comment thread qa/tools/apex_algorithm_qa_tools/benchmark_trends.py Outdated

HansVRP requested a review from VictorVerhaert June 12, 2026 15:37

HansVRP requested review from soxofaan and removed request for JeroenVerstraelen June 23, 2026 14:12

HansVRP force-pushed the hv_regresion_benchmark branch 2 times, most recently from f008197 to f0005a4 Compare June 25, 2026 12:26

adding a seperate workflow for regression testing which runs once a week to compare recent benchmark runs

b2a10e6

HansVRP force-pushed the hv_regresion_benchmark branch from f0005a4 to b2a10e6 Compare June 25, 2026 12:34

HansVRP added 2 commits June 25, 2026 14:50

performance logging

c8859d7

ponytail review

6020001

info update

d22c9f7
