
CI improvemens: General improvements #3984

Draft
hdiethelm wants to merge 25 commits into LinuxCNC:master from hdiethelm:ci_improvemens

Conversation

@hdiethelm
Contributor

This is the follow-up to #3983.

These will be more in-depth changes, so riskier, and they will take some time.

There will be a few pushes, including some that should fail on purpose to test if everything works as desired. Just tell me if I abuse the CI too much and I will find a different solution.

It is experimental for now, but I need a PR so the CI runs.

@hdiethelm hdiethelm changed the title from "Ci improvemens: General cleanup" to "Ci improvemens: General improvements" on Apr 30, 2026
@hdiethelm hdiethelm changed the title from "Ci improvemens: General improvements" to "CI improvemens: General improvements" on Apr 30, 2026
@hdiethelm
Contributor Author

Removing eatmydata, tested:
Before: 4h 42m 44s
After: 4h 34m 41s + an estimated 15m for a stalled process -> 4h 49m 41s
So no big impact, and the workflow is more readable.

@hdiethelm hdiethelm force-pushed the ci_improvemens branch 3 times, most recently from f59978a to 1921e21 on April 30, 2026 07:31
@BsAtHome
Contributor

Did you see this message at the "Complete job" stage:

Warning: Node.js 20 actions are deprecated. The following actions are running on Node.js 20 and may not work as expected: actions/checkout@v2. Actions will be forced to run with Node.js 24 by default starting June 2nd, 2026. Node.js 20 will be removed from the runner on September 16th, 2026. Please check if updated versions of these actions are available that support Node.js 24. To opt into Node.js 24 now, set the FORCE_JAVASCRIPT_ACTIONS_TO_NODE24=true environment variable on the runner or in your workflow file. Once Node.js 24 becomes the default, you can temporarily opt out by setting ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION=true. For more information see: https://github.blog/changelog/2025-09-19-deprecation-of-node-20-on-github-actions-runners/

@hdiethelm
Contributor Author

Did you see this message at the "Complete job" stage:

Warning: Node.js 20 actions are deprecated. The following actions are running on Node.js 20 and may not work as expected: actions/checkout@v2. Actions will be forced to run with Node.js 24 by default starting June 2nd, 2026. Node.js 20 will be removed from the runner on September 16th, 2026. Please check if updated versions of these actions are available that support Node.js 24. To opt into Node.js 24 now, set the FORCE_JAVASCRIPT_ACTIONS_TO_NODE24=true environment variable on the runner or in your workflow file. Once Node.js 24 becomes the default, you can temporarily opt out by setting ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION=true. For more information see: https://github.blog/changelog/2025-09-19-deprecation-of-node-20-on-github-actions-runners/

Yes, the commit is already there; I'm just waiting for the last CI run to pass before the next push.

@hdiethelm
Contributor Author

@BsAtHome Feel free to cancel any pipeline that has failed jobs. I don't think I have the access rights to do so.

@hdiethelm
Contributor Author

hdiethelm commented Apr 30, 2026

So, first success: with 4 CPUs, down to 3h 32m.
Before: 4h 38m usage / 38m runtime
After: 3h 42m usage / 24m runtime

@hdiethelm
Contributor Author

hdiethelm commented Apr 30, 2026

Now the bigger change will be to run everything in prepared Docker containers or runners. It probably won't reduce the runtime a lot, an estimated 3-5 min, but it is simply nicer towards Debian not to hammer their package repo for nothing. Let's see how well that works.

@BsAtHome Debian is the main target, right? So running the rip-* jobs in a Debian container is also fine? At the moment, they run on an ubuntu-24.04 runner.
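A minimal sketch of what I mean, using the standard Actions container: key (the image name is illustrative; later it would be a prebuilt image):

  jobs:
    rip-and-test:
      runs-on: ubuntu-24.04
      container:
        image: debian:trixie
      steps:
        - uses: actions/checkout@v4
        # ... build and test steps now run inside the container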

@BsAtHome
Contributor

Debian is the primary target, yes.

Nice to see the improvement; it is primarily in the independent packages building the documentation. Those have always been the slowest.

The tests are not going to be significantly faster because they run in sequence. We've been discussing parallel execution, but that requires #2722 to be fully implemented. There are a significant number of issues not addressed in that PR (some noted, others implied, still others to be discovered).

@andypugh
Collaborator

#3983 is in.

@hdiethelm
Contributor Author

#3983 is in.

Thanks, rebased on top of master.

Debian is the primary target, yes.

Nice to see the improvement; it is primarily in the independent packages building the documentation. Those have always been the slowest.

The tests are not going to be significantly faster because they run in sequence. We've been discussing parallel execution, but that requires #2722 to be fully implemented. There are a significant number of issues not addressed in that PR (some noted, others implied, still others to be discovered).

It depends on what the target is. For local usage, #2722 is the most comfortable way for users. But it can also be done another way:

  • One build
  • Run multiple tests in parallel, each testing one part. This can be separated using:
    • Multiple runners (this needs an artifact; the way LinuxCNC is built right now, this can be cumbersome, but it might work)
    • Possibly multiple namespaces in the same runner. If this works, it could also be used locally. Namespaces are one of the mechanisms Docker uses to isolate itself from the host, but they can also be used directly. I would have to try it out; see the untested sketch below.

Since a parallel test runner will only shorten the CI once package-indep gets below 12 min, I see this as low priority for the CI.
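For the namespace idea, roughly this (untested; the shard paths are illustrative):

  # run two test shards in separate PID/IPC/UTS namespaces
  sudo unshare --pid --fork --mount-proc --ipc --uts scripts/runtests tests/shard-a &
  sudo unshare --pid --fork --mount-proc --ipc --uts scripts/runtests tests/shard-b &
  wait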

@andypugh Any objections if I push Docker images to the LinuxCNC GitHub? They will appear somewhere here: https://github.com/LinuxCNC/linuxcnc/packages I can probably do that from CI without any additional rights. However, I might need you if I mess something up and packages have to be deleted. I will try it in my account first with the free credits, but to do something meaningful, I have to do it in the LinuxCNC CI.

@BsAtHome
Contributor

Maybe the curl progress meter can be turned off for the downloads.

@NTULINUX, when do these kernels move to linuxcnc.org? Is there a procedure?

@grandixximo
Contributor

Opened #3992 as a small stopgap for the firefox snap flake (issue #3991). It adds one apt remove firefox line before each upgrade. If you would prefer the larger CI cleanup in #3984 to land first, I can close mine; otherwise, rebasing #3984 over my change should be a one-line conflict at most. Whichever lands first works for me.

@hdiethelm
Contributor Author

hdiethelm commented May 1, 2026

@grandixximo This is fine with me. The main goal here is to not need all these installs at all, but that will take some time.

I could extract the --cpu improvements (25% faster package build) and, if desired:

  • Actions update (a bit risky, I cannot test the release target, but I don't expect an issue)
  • Split into dependency / build / test (I don't see any risks)
  • Remove apt-get upgrade
  • Remove eatmydata

into a separate PR, so these can already be merged while I am busy here.

@hdiethelm
Contributor Author

hdiethelm commented May 1, 2026

So, I have a container-based prototype running in my GitHub fork:
https://github.com/hdiethelm/linuxcnc-fork/tree/ci_docker_prototyping
hdiethelm/linuxcnc-fork@ci_improvemens...hdiethelm:linuxcnc-fork:ci_docker_prototyping
The containers are visible here and on the project page; it will look similar on the original repo:
https://github.com/hdiethelm?tab=packages&repo_name=linuxcnc-fork
Docker build:
https://github.com/hdiethelm/linuxcnc-fork/actions/runs/25216626662
LinuxCNC rip-and-test:
https://github.com/hdiethelm/linuxcnc-fork/actions/runs/25216626671

Do you think such an approach is viable? If yes, I will work out how to automatically update the Docker images from time to time and then port all the other targets.

Advantages:

  • Faster: rip-and-test takes 3m 30s less; the gain will be even bigger for the Debian package build
  • No packages are downloaded, unless new dependencies are added, until the Docker images are updated
  • The rip-and-test targets could also use a matrix to test ARM / all Debian variants
  • The built Docker images can also be used locally

Disadvantages:

  • Due to preinstalled packages, missing dependencies might be discovered later
  • Possibly a bit more difficult to maintain
  • Uses storage for the images, but depending on the contract, this is free

I will not merge this into this PR yet. As soon as I do, packages will most probably appear in the original LinuxCNC repo.

@hdiethelm
Contributor Author

@grandixximo

might want to look at: ROS2, Yocto, buildroot for inspiration

Let's continue the discussion here. LinuxCNC is a bit special because it needs a ton of dependencies to build. The container needed is around 3.4 GB in size.

If you find something, just tell me. I took some inspiration from https://github.com/open-webui/open-webui, since I know they build containers from CI, plus many different manuals / articles.

@BsAtHome
Contributor

BsAtHome commented May 1, 2026

The size of the container should be acceptable and manageable if it is in a local repo for quick download. If there need to be many different versions, that could be a problem.

However, I have no clue what deals have been made with github. @andypugh, do you know of any deals with github?

BTW, you need to rebase after I merged the snapped firefox killer.

hdiethelm added 7 commits May 2, 2026 00:38
It fails anyway; let's see if there is any difference in time
Error is: ERROR: ld.so: object 'libeatmydata.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
Scripts are easier to maintain and can be tested locally
apt upgrade should not be needed: Runner / docker should be up to date
There is no reason to fetch all for test builds
There are no submodules, so deactivated
The old ones are outdated and will be removed as mentioned by the
warning in the actions tab.
@hdiethelm
Contributor Author

hdiethelm commented May 2, 2026

While you are restructuring the CI flow, would it be in scope to add a working-tree-clean gate after the build steps?

Done, and it was not so bad: only one man page was missing, so I was able to enforce it. But I had to add an exclude option so I can skip the auto-updated VERSION / debian/changelog files. The regexp is not perfect (it would also exclude VERSION123), but better than nothing. Any good ideas for doing this?

The doc step still fails with changed .po files. Should I ignore these?

@grandixximo
Contributor

For the regex false-positives (VERSION123 etc.), git pathspec excludes give exact-path matching with no regex anchoring needed:

git status -u --porcelain -- ':(exclude)VERSION' ':(exclude)debian/changelog'

That replaces the grep -v -E step entirely. If you want to keep the regex form for flexibility, anchoring against the porcelain XY filename format works: ^.{3}(VERSION|debian/changelog)$.

For the .po files, those get touched by msgmerge during the doc build (line-number refs in source comments shift whenever any translatable string moves), so failing CI on them would basically mean any docs change blocks the gate. I'd lean toward excluding them too, but that's a doc/i18n call. @BsAtHome any preference, or should we ping the translation maintainers (Hans Unzner / Steffen Möller from recent .po history)?

@hdiethelm
Contributor Author

For the regex false-positives (VERSION123 etc.), git pathspec excludes give exact-path matching with no regex anchoring needed:

Thanks for the hint about pathspec; I tend towards this solution. Regexps are the kind of thing many people don't really understand and get wrong (me included).
For the docs, ':(exclude)docs/po/*.po' ':(exclude)docs/po/documentation.pot' does the trick. For now, I just allow failure until we know what to do.
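Putting it together, the gate now looks roughly like this (a sketch, not the exact step from ci.yml):

  # working-tree-clean gate after the build
  dirty=$(git status -u --porcelain -- \
      ':(exclude)VERSION' \
      ':(exclude)debian/changelog' \
      ':(exclude)docs/po/*.po' \
      ':(exclude)docs/po/documentation.pot')
  if [ -n "$dirty" ]; then
      echo "Working tree not clean after build:"
      echo "$dirty"
      exit 1
  fi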

@hdiethelm
Contributor Author

hdiethelm commented May 3, 2026

BTW: With the containers from my repo, you can easily test parts of the CI locally by running this command from the local source directory and then copy-pasting snippets from ci.yml:
podman run --rm -it -v .:/linuxcnc ghcr.io/hdiethelm/linuxcnc/build-container:trixie-amd64-latest

I prefer podman over docker because podman runs without setuid, so the created files are not owned by root.
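For example, a typical RIP build session inside the container (the standard LinuxCNC build steps; adjust the configure flags as needed):

  cd /linuxcnc/src
  ./autogen.sh
  ./configure --with-realtime=uspace
  make -j"$(nproc)"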

@BsAtHome
Contributor

BsAtHome commented May 3, 2026

For the .po files, those get touched by msgmerge during the doc build (line-number refs in source comments shift whenever any translatable string moves), so failing CI on them would basically mean any docs change blocks the gate. I'd lean toward excluding them too, but that's a doc/i18n call. @BsAtHome any preference, or should we ping the translation maintainers (Hans Unzner / Steffen Möller from recent .po history)?

The docs build, integration and translations are fragile. Failing on translations/.po can lead to some self-inflicted wounds. That said, the most common problem is newline consistency, where translations are missing or adding trailing newlines. Most of that gets caught now because msgmerge fails, and that is good. This is not normally merged into the main tree; it is just the weblate branch failing CI. That failure is a Good Thing™.

The default man-pages should all function and there should not be any non-.gitignored remains in docs/man/*. That bug has bitten many times. I think, in general, there should be no non-.gitignored remains anywhere after the build and test. That also goes for some of the junk left-over in config/* when, f.ex. running RIP qtdragon from a sim config.

@grandixximo
Contributor

I think, in general, there should be no non-.gitignored remains anywhere after the build and test.

If I understand correctly, the new CI enforces this already.

That also goes for some of the junk left-over in config/* when, f.ex. running RIP qtdragon from a sim config.

There are no tests for the UIs; since they are never run, no leftovers can be caught. We'd have to write tests that run the UIs and then check for leftovers. @hdiethelm I could work on a separate PR for UI tests, would that be OK?

@BsAtHome
Contributor

BsAtHome commented May 3, 2026

I think, in general, there should be no non-.gitignored remains anywhere after the build and test.

If I understand correctly, the new CI enforces this already.

Great!

That also goes for some of the junk left-over in config/* when, f.ex. running RIP qtdragon from a sim config.

There are no tests for the UIs; since they are never run, no leftovers can be caught. We'd have to write tests that run the UIs and then check for leftovers. @hdiethelm I could work on a separate PR for UI tests, would that be OK?

As I understand it, there has been talk about test-running some/all/any of the sims in xvfb and doing some basic functional testing. That would be a great test target because the UIs also break. Some sim UIs haven't been tested in ages (one ini file has a very old version tag).

See #3756.
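A rough idea of what such a smoke run could look like (untested sketch; the config path and timeout are illustrative):

  # run a sim config headless for one minute, then check for leftovers
  xvfb-run -a timeout --preserve-status 60 scripts/linuxcnc configs/sim/axis/axis.ini
  git status -u --porcelain -- configs/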

@grandixximo
Contributor

@hdiethelm I'm planning to start on #3756 (xvfb-based UI smoke tests) once this lands, or in parallel. A quick coordination question on the dependency side before I start: my tests will need at minimum xvfb, xauth (already added by you, thanks), xdotool, and python3-pytest-qt. Possibly mesa-utils / libgl1-mesa-dri for GLcanon under llvmpipe.

I think this is a different situation from your rtapi cleanup V2 (#3919) where I rebased: that one shares files in src/rtapi/* and conflicts were unavoidable. UI tests live under tests/ui/* and only touch .github/ for the dep install list, so the only real shared edit is 3-5 package names. No code conflict, no rebase requirement on either side.

Two ways to handle the deps:

  1. You add the extras to your dep list in this PR while it's open. Keeps the install step in one place.
  2. I add them in the UI test PR. Lets this one land cleanly without scope creep.

Either works for me. Which way do you lean?

I'll branch from master so neither PR blocks the other. Whoever lands second adjusts the dep list (one-line edit).

@hdiethelm
Contributor Author

...so the only real shared edit is 3-5 package names. No code conflict, no rebase requirement on either side.

Two ways to handle the deps:

1. You add the extras to your dep list in this PR while it's open. Keeps the install step in one place.
2. I add them in the UI test PR. Lets this one land cleanly without scope creep.

You probably need option 2 so the CI still runs on your branch.

I have to rebase onto master anyway before merging, to check that everything is still fine. Just do your thing; as long as it is only a few more packages, I will see them, no issue. And if mine gets in first, it should also be no issue for you to rebase before merging.

@hdiethelm
Contributor Author

So, I finally managed to get all packages inside the Docker container:
https://github.com/hdiethelm/linuxcnc-fork/actions/runs/25278237458
https://github.com/hdiethelm/linuxcnc-fork/actions/runs/25278693726
To do this, I just compile the debs, install them, and remove linuxcnc_* afterwards so the dependencies stay:
https://github.com/hdiethelm/linuxcnc-fork/blob/ci_docker_prototyping/.github/docker/Dockerfile#L34

I would not call it elegant, but it does the job. The Docker build goes up from 7 to 12 min, but since it only needs to run from time to time to stay up to date, I don't see any issue.

Or does anyone have an idea how to install these packages without building / installing linuxcnc?

The LinuxCNC pipeline stays at ~21m duration / 3h usage. The advantage is that the Debian archives are not hit for nothing, and if they are down, the build still works.
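Condensed, the trick has roughly this shape (my sketch, not a copy of the linked Dockerfile; exact paths, arguments and flags differ):

  # build the debs, install them to pull in all dependencies,
  # then remove the linuxcnc packages themselves so only the dependencies remain
  RUN cd /tmp/linuxcnc \
   && debian/configure uspace \
   && dpkg-buildpackage -b -uc -us \
   && apt-get install -y /tmp/*.deb \
   && dpkg --remove $(dpkg-query -W -f '${Package}\n' 'linuxcnc*')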

@hdiethelm
Contributor Author

hdiethelm commented May 3, 2026

I think, in general, there should be no non-.gitignored remains anywhere after the build and test.

If I understand correctly, the new CI enforces this already

Actually, I check after the build. Should I move the check to after the tests? It's easy to do; I can even check after the build and after the tests, it takes no time at all.

@BsAtHome What to do with the untracked po files? At the moment, I don't ignore them, but I made this build step non-failing, similar to cppcheck / shellcheck:
https://github.com/LinuxCNC/linuxcnc/actions/runs/25276731551
https://github.com/LinuxCNC/linuxcnc/actions/runs/25276731551/job/74107710597#step:6:17

@BsAtHome
Contributor

BsAtHome commented May 3, 2026

@BsAtHome What to do with the untracked po files? At the moment, I don't ignore them, but I made this build step non-failing, similar to cppcheck / shellcheck: https://github.com/LinuxCNC/linuxcnc/actions/runs/25276731551 https://github.com/LinuxCNC/linuxcnc/actions/runs/25276731551/job/74107710597#step:6:17

The po-files should just be ignored. There is no point in failing on them. The whole i18n and l10n stuff needs to be cleaned up anyway. But that is subordinate to fixing the docs build in general (docs build is quite fragile). And all of that is subordinate to reorganizing the docs structure.

Currently, cppcheck and shellcheck must be ignored because the build is still not clean. FWIW, the ability to add -Werror took many, many months before all the kinks were ironed out.
The cppcheck part requires the remote scheduler to be fixed (well, rewritten). Then that could become terminal on any warning or error.
Shellcheck is even harder because the code relies on a lot of dubious constructs working, and it needs a lot more effort. It was a lot worse, but there is still a long way to go to get it clean.

@hdiethelm
Contributor Author

The po-files should just be ignored.

Done:
0cce4ae
https://github.com/LinuxCNC/linuxcnc/actions/runs/25287962668/job/74135296641#step:6:9

Currently, cppcheck and shellcheck must be ignored because the build is still not clean. FWIW, the ability to add -Werror took many, many months before all the kinks were ironed out. The cppcheck part requires the remote scheduler to be fixed (well, rewritten). Then that could become terminal on any warning or error. Shellcheck is even harder because the code relies on a lot of dubious constructs working, and it needs a lot more effort. It was a lot worse, but there is still a long way to go to get it clean.

I know; introducing this kind of check takes a lot of patience. Best, in my opinion, is to warn only for certain files but make the test fail for everything that is already clean. That way at least no new issues are introduced, and we can go on from there step by step.

@hdiethelm
Contributor Author

Any clue why this fails? Maybe a broken package just went into sid? I did not make any relevant changes...
https://github.com/LinuxCNC/linuxcnc/actions/runs/25287962668

@hdiethelm
Contributor Author

@BsAtHome Thinking about these checks, of course I found some issues I created myself.

One option would be to do something like this once and commit the file:
scripts/shellcheck.sh | grep -Po '^In \K[^ ]+' | sort -u > shellcheck_warn_only.txt
Similar for cppcheck.

  • Then warn only for all files in this list and hard-fail for everything else, so at least no new issues come in (see the sketch below).
  • Remove corrected files from these lists, to avoid new issues creeping into already-fixed files.
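A sketch of how the gate could evaluate that list (assuming shellcheck_warn_only.txt holds one path per line, matching the output format above):

  #!/bin/bash
  # fail only on findings in files that are not on the warn-only list
  scripts/shellcheck.sh > shellcheck.log 2>&1 || true
  fail=0
  for file in $(grep -Po '^In \K[^ ]+' shellcheck.log | sort -u); do
      if grep -qxF "$file" shellcheck_warn_only.txt; then
          echo "warn-only (known issues): $file"
      else
          echo "new findings in: $file"
          fail=1
      fi
  done
  exit $fail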
