ci: replace addlicense with HawkEye for license header checks#3325

Merged
hubcio merged 16 commits into apache:master from Standing-Man:license-header-chcek
Jun 5, 2026

Contributor

@Standing-Man Standing-Man commented May 27, 2026

Which issue does this PR close?

Rationale

Because license header validation runs on every git commit, the current setup requires starting a Docker container each time. In addition, addlicense only provides an AMD64 Docker image, which can introduce unnecessary overhead on other architectures.

This PR replaces addlicense with HawkEye for license header validation, eliminating the Docker dependency and providing a more lightweight and consistent checking experience.

What changed?

This PR migrates license-header validation to HawkEye and standardizes ASF license headers using a centralized licenserc.toml.

Changes

  • Added HawkEye configuration in licenserc.toml, including explicit comment-style mappings and repository-specific excludes.
  • Replaced the old license-header check with scripts/ci/license-headers.sh for both local and CI usage.
  • Updated CI to install HawkEye through the official action and run the shared license-header script.
  • Improved the script’s reliability and diagnostics:
    • parses HawkEye JSON output with jq
    • reports files updated by --fix
    • detects duplicate ASF headers caused by comment-style mismatches
    • prints clearer guidance to keep a single header with the configured comment style
    • fails clearly on unknown file types
  • Normalized existing ASF headers across the repository to match the configured styles, including PHP, C/C++, Rust, web, shell, config, SDK, examples, and tests.
  • Kept PHP headers aligned with the generated foreign/php/iggy-php.stubs.php style.
  • Standardized shell scripts so shebangs are followed directly by the ASF header.
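The last item above (a shebang followed directly by the ASF header) could be spot-checked with something like the sketch below. It is not the PR's script: `shebang_then_header` is a hypothetical helper, and it assumes that "directly" means the header starts on line 2 and that shell files use the `#` comment style.

```shell
# Sketch only: verify that a script's shebang is followed directly by the header.
# shebang_then_header is a hypothetical name, not taken from the PR.
shebang_then_header() {
  local first second
  first=$(sed -n '1p' "$1")    # line 1 of the file
  second=$(sed -n '2p' "$1")   # line 2 of the file
  case "$first" in
    '#!'*) ;;                  # line 1 must be a shebang
    *) return 1 ;;
  esac
  case "$second" in
    # Assumption: the header opens on line 2 in hash-comment style.
    '# Licensed to the Apache Software Foundation (ASF)'*) return 0 ;;
    *) return 1 ;;
  esac
}
```

A caller might run `shebang_then_header path/to/script.sh || echo "header is not directly after the shebang"` over each shell file.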

Validation

./scripts/ci/license-headers.sh --check
./scripts/ci/license-headers.sh --fix

Local Execution

  • Passed / not passed
    Passed
  • Pre-commit hooks ran / not ran
    Ran

AI Usage

  1. Which tools? (e.g., GitHub Copilot, Claude, ChatGPT) CodeX
  2. Scope of usage? (e.g., autocomplete, generated functions, entire implementation) entire implementation
  3. How did you verify the generated code works correctly? I verified it locally by running the relevant tests and manually checking the generated output.
  4. Can you explain every line of the code if asked? Yes

@github-actions

Thanks for the pull request. It is now waiting for review, labeled S-waiting-on-review.

You can update that label as the review goes back and forth, with slash commands - each on its own line, in a regular PR comment (not an inline review reply):

  • /ready - mark it S-waiting-on-review again, after addressing feedback
  • /author - mark it S-waiting-on-author (maintainers, or anyone who has had a PR merged before)
  • /request-review @user ... - request reviewers (@user or @org/team)

Commands take up to ~90s to apply. If no reaction (👍 or 😕) appears on your comment, the apply step likely failed - check the repo's Actions tab for the PR Triage Apply run. Commands posted inside a review body (rather than a normal comment) cannot be reacted to, so they stay log-only.

See CONTRIBUTING.md for details.

@github-actions github-actions Bot added the S-waiting-on-review PR is waiting on a reviewer label May 27, 2026
@codecov

codecov Bot commented May 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.26%. Comparing base (da75706) to head (4825bbb).

Additional details and impacted files
@@             Coverage Diff              @@
##             master    #3325      +/-   ##
============================================
- Coverage     74.44%   72.26%   -2.19%     
  Complexity      943      943              
============================================
  Files          1245     1245              
  Lines        121477   116548    -4929     
  Branches      97599    92882    -4717     
============================================
- Hits          90439    84227    -6212     
- Misses        28080    29150    +1070     
- Partials       2958     3171     +213     

Components    Coverage Δ
Rust Core     72.90% <ø> (-2.65%) ⬇️
Java SDK      58.44% <ø> (ø)
C# SDK        69.43% <ø> (-0.50%) ⬇️
Python SDK    81.06% <ø> (ø)
PHP SDK       83.57% <ø> (ø)
Node SDK      91.35% <ø> (-0.06%) ⬇️
Go SDK        40.20% <ø> (ø)

Files with missing lines                Coverage Δ
core/ai/mcp/src/api.rs                  36.84% <ø> (ø)
core/ai/mcp/src/configs.rs              44.24% <ø> (ø)
core/ai/mcp/src/log.rs                  13.33% <ø> (ø)
core/ai/mcp/src/main.rs                 54.54% <ø> (+1.01%) ⬆️
core/ai/mcp/src/service/mod.rs          60.28% <ø> (ø)
core/ai/mcp/src/service/requests.rs     100.00% <ø> (ø)
core/ai/mcp/src/stream.rs               75.91% <ø> (+0.72%) ⬆️
core/cli/src/args/common.rs             97.87% <ø> (ø)
core/cli/src/args/context.rs            100.00% <ø> (ø)
core/cli/src/args/message.rs            66.12% <ø> (ø)
... and 102 more

... and 625 files with indirect coverage changes

@Standing-Man Standing-Man changed the title chore(ci): replace addlicense with HawkEye for license header checks ci: replace addlicense with HawkEye for license header checks May 27, 2026
Comment thread bdd/rust/tests/common/global_context.rs
Comment thread scripts/ci/license-headers.sh
Comment thread .github/workflows/_common.yml Outdated
Comment thread scripts/ci/license-headers.sh Outdated
Comment thread scripts/ci/license-headers.sh Outdated
Comment thread scripts/ci/license-headers.sh
Comment thread licenserc.toml Outdated
Comment thread licenserc.toml
Comment thread licenserc.toml
Comment thread licenserc.toml
Comment thread scripts/ci/license-headers.sh
@github-actions github-actions Bot added S-waiting-on-author PR is waiting on author response and removed S-waiting-on-review PR is waiting on a reviewer labels May 27, 2026
@Standing-Man Standing-Man force-pushed the license-header-chcek branch 4 times, most recently from cc8b0e0 to 28e84b9 Compare May 29, 2026 16:51
@Standing-Man
Contributor Author

Standing-Man commented May 30, 2026

Unrelated fixes included in this PR:

  1. Fixed the helmfmt pre-commit hook, which was not passing any arguments and could cause the check to fail.
  2. Fixed an indentation issue in foreign/csharp/Benchmarks/Program.cs.
  3. Updated cli::system::test_login_command::should_help_match to replace hidden whitespace used for indentation with an explicit CLAP_INDENT constant, since the original blank-space formatting could be removed by the trailing-whitespace check.

The current license header validation is overly strict. In some cases, a file already contains a valid license header using one comment style, but the checker still reports the file as missing a license header. As a result, running --fix may insert another license header using a different comment style, leading to duplicate license headers in the same file.

To prevent this issue, a duplicate license header check has been added to the license header validation script. cc @hubcio
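As a rough illustration of the idea (not the code in scripts/ci/license-headers.sh), a duplicate check can count how many times the opening phrase of the ASF header appears in a file, whatever comment style surrounds it. `ASF_MARKER`, `count_asf_headers`, and `has_duplicate_header` are hypothetical names, and the sketch assumes the opening phrase sits on a single line.

```shell
# Sketch only: detect a file that carries more than one ASF header.
# All names here are hypothetical and not taken from the PR.
ASF_MARKER='Licensed to the Apache Software Foundation (ASF)'

count_asf_headers() {
  # grep -c prints the number of matching lines; -F treats the marker as a
  # fixed string. grep exits 1 when nothing matches, so its status is ignored.
  grep -c -F -- "$ASF_MARKER" "$1" || true
}

has_duplicate_header() {
  # Succeeds when the marker appears on more than one line.
  [ "$(count_asf_headers "$1")" -gt 1 ]
}
```

Because the match ignores the comment prefix, a file with one `//` header and one `/* */` header counts as two, which is the mismatch case described above.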

/ready

@github-actions github-actions Bot added S-waiting-on-review PR is waiting on a reviewer and removed S-waiting-on-author PR is waiting on author response labels May 30, 2026
@hubcio
Contributor

hubcio commented Jun 1, 2026

@Standing-Man i was thinking about this:

The current license header validation is overly strict. In some cases, a file already contains a valid license header using one comment style, but the checker still reports the file as missing a license header. As a result, running --fix may insert another license header using a different comment style, leading to duplicate license headers in the same file.

perhaps this is a good issue to raise in the hawkeye repository?

Contributor

@hubcio hubcio left a comment

the local pre-commit path now needs hawkeye (compiled from source via cargo install, slow the first time) and jq installed on the host. neither is mentioned in CONTRIBUTING, so a line there would save contributors a confusing first run. or perhaps you can create a Tooling section in CONTRIBUTING.md that lists all tools needed for the pre-commit hooks to pass?
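One way to soften that first run, sketched here under assumptions rather than taken from the PR, is a preflight helper that reports every missing tool in a single pass. `missing_tools` is a hypothetical name; only `command -v` is standard shell.

```shell
# Sketch only: list all missing tools at once instead of failing on the first.
# missing_tools is a hypothetical helper, not part of the PR's script.
missing_tools() {
  local tool missing=""
  for tool in "$@"; do
    # command -v is the POSIX way to test whether a command is on PATH.
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
  done
  # Print the missing names (an empty line if none), without the leading space.
  printf '%s\n' "${missing# }"
  # Succeed only when nothing is missing.
  [ -z "$missing" ]
}
```

A hook could call `missing_tools hawkeye jq` before doing any work and print install hints for whatever names come back.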

Comment thread web/src/app.html Outdated
Comment thread core/configs/src/configs_impl/typed_env_provider.rs Outdated
Comment thread scripts/ci/license-headers.sh Outdated
Comment thread licenserc.toml
Comment thread licenserc.toml
Comment thread .pre-commit-config.yaml Outdated
Comment thread .github/workflows/_common.yml Outdated
@Standing-Man Standing-Man force-pushed the license-header-chcek branch from 263e106 to 9e688e9 Compare June 2, 2026 03:25
@Standing-Man Standing-Man requested a review from hubcio June 2, 2026 04:22
@Standing-Man Standing-Man force-pushed the license-header-chcek branch from 3b20a23 to 1e06d8c Compare June 2, 2026 07:59
Contributor

@hubcio hubcio left a comment

a few items from re-reviewing the latest push.

on CONTRIBUTING.md (not in the diff, so noting it here): the pre-commit license-headers hook is language: system and runs license-headers.sh, which hard-fails with "hawkeye command not found" if hawkeye isn't on PATH. so every contributor now needs cargo install hawkeye --version 6.5.1 --locked locally, but the pre-commit hooks section only documents prek and typos-cli. the first commit after prek install will fail until they read the script's hint. worth adding hawkeye to the install list.

after you fix these comments below (+ this one above) i believe we can merge.

Comment thread web/src/app.html Outdated
Comment thread core/configs/src/configs_impl/typed_env_provider.rs Outdated
Comment thread AGENTS.md Outdated
Comment thread scripts/ci/license-headers.sh
Comment thread scripts/ci/license-headers.sh Outdated
Comment thread scripts/ci/license-headers.sh Outdated
@github-actions github-actions Bot added S-waiting-on-author PR is waiting on author response and removed S-waiting-on-review PR is waiting on a reviewer labels Jun 2, 2026
@Standing-Man Standing-Man force-pushed the license-header-chcek branch 2 times, most recently from 0fe9ce0 to e2b4875 Compare June 3, 2026 23:53
@Standing-Man
Contributor Author

/ready

@github-actions github-actions Bot added S-waiting-on-review PR is waiting on a reviewer and removed S-waiting-on-author PR is waiting on author response labels Jun 4, 2026
@Standing-Man
Contributor Author

Standing-Man commented Jun 4, 2026

@Standing-Man i was thinking about this:

The current license header validation is overly strict. In some cases, a file already contains a valid license header using one comment style, but the checker still reports the file as missing a license header. As a result, running --fix may insert another license header using a different comment style, leading to duplicate license headers in the same file.

perhaps this is a good issue to raise in the hawkeye repository?

Good suggestion, I’ll check the HawkEye repository and verify whether this can be raised as an enhancement issue.

If we did not enforce a strict one-to-one mapping between file types and comment styles, the license header check would also avoid this kind of issue.

hubcio
hubcio previously approved these changes Jun 5, 2026
@Standing-Man
Contributor Author

Thank you very much for your review. This has been quite a journey. 😂 @hubcio

numinnex
numinnex previously approved these changes Jun 5, 2026
@hubcio
Contributor

hubcio commented Jun 5, 2026

yeah, i think it's one of the biggest PRs. good that you found hawkeye, i wasn't fully happy with addlicense. good job!

Signed-off-by: StandingMan <jmtangcs@gmail.com>
Restore the intentionally misspelled env var names used by the fuzzy
matching test, and make the typos pre-commit hook respect configured
excludes for explicitly passed files.

Signed-off-by: StandingMan <jmtangcs@gmail.com>
@Standing-Man Standing-Man dismissed stale reviews from hubcio and numinnex via 4825bbb June 5, 2026 09:27
@Standing-Man Standing-Man force-pushed the license-header-chcek branch from 91ae033 to 4825bbb Compare June 5, 2026 09:27
@hubcio hubcio merged commit dbeab0c into apache:master Jun 5, 2026
92 checks passed
@github-actions github-actions Bot removed the S-waiting-on-review PR is waiting on a reviewer label Jun 5, 2026

3 participants