SONARJAVA-6524 Generate built-in profiles from rule metadata #5705
romainbrenguier wants to merge 5 commits into
Replace metadata-based profile generation with directory-based approach. Each rule's profile membership is now represented by a file in profile-specific directories (profiles/sonar_way/, profiles/sonar_agentic_ai/). This eliminates merge conflicts when parallel PRs add rules to profiles, as each PR creates a new file instead of editing a shared JSON array.

Changes:
- Add ProfileJsonGenerator to scan profile directories and generate JSONs
- Create profile directories with 534 (Sonar way) and 467 (Agentic AI) rule files
- Update pom.xml to generate and copy profiles during build
- Add README.md with usage instructions
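The directory-based approach described in this commit message can be illustrated with a minimal sketch. It is not the PR's `ProfileJsonGenerator`: the class name `ProfileDirectorySketch`, its method names, and the JSON field names (`name`, `ruleKeys`) are assumptions made for illustration, and the nine-digit cap on rule keys is an added safeguard that the source does not mention.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch, not the PR's ProfileJsonGenerator.
class ProfileDirectorySketch {

  /** Keeps names shaped like S106 and orders them by their numeric part. */
  static List<String> sortedRuleKeys(Collection<String> fileNames) {
    return fileNames.stream()
      // Assumption: at most 9 digits, so Integer.parseInt cannot overflow.
      .filter(name -> name.matches("S\\d{1,9}"))
      .sorted(Comparator.comparingInt((String name) -> Integer.parseInt(name.substring(1))))
      .collect(Collectors.toList());
  }

  /** Reads the marker files of one profile directory, e.g. profiles/sonar_way/. */
  static List<String> ruleKeys(Path profileDir) {
    try (Stream<Path> files = Files.list(profileDir)) {
      List<String> names = files
        .filter(Files::isRegularFile)
        .map(path -> path.getFileName().toString())
        .collect(Collectors.toList());
      return sortedRuleKeys(names);
    } catch (IOException e) {
      throw new UncheckedIOException("Cannot list " + profileDir, e);
    }
  }

  /** Renders the profile as JSON; the field names here are an assumption. */
  static String toJson(String profileName, List<String> keys) {
    String quoted = keys.stream()
      .map(key -> "\"" + key + "\"")
      .collect(Collectors.joining(", "));
    return "{\"name\": \"" + profileName + "\", \"ruleKeys\": [" + quoted + "]}";
  }
}
```

Because each rule is a separate file, two PRs that add different rules touch different paths, and the sorted output keeps the generated JSON stable regardless of directory listing order.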
…7) to match the 468 files in sonar_agentic_ai profile directory; updated MetadataTest to read generated Sonar_way_profile.json from target/classes/ instead of src/main/resources/ since it is now generated during the build
Comment: <details> <summary><b>Code Review</b> <kbd>👍 Approved with suggestions</kbd> <kbd>5 resolved / 7 findings</kbd></summary> Automates built-in profile generation by moving rule membership metadata into individual rule files, resolving issues with stale JSON tracking and brittle manual updates. Consolidate the duplicate copy operations in the build configuration and refine the rule-key validation logic to prevent silent file drops. <details> <summary>💡 <b>Quality:</b> Generated profiles copied twice via <resources> and copy-resources</summary> <kbd>📄 <a href="https://github.com/SonarSource/sonar-java/pull/5705/files#diff-a2a59812e774224a494679a03de77f5fe24ceb84295e379d6b9583ef97a1ee15R148-R155">sonar-java-plugin/pom.xml:148-155</a></kbd> <kbd>📄 <a href="https://github.com/SonarSource/sonar-java/pull/5705/files#diff-a2a59812e774224a494679a03de77f5fe24ceb84295e379d6b9583ef97a1ee15R397-R411">sonar-java-plugin/pom.xml:397-411</a></kbd> The build both declares `${project.build.directory}/generated-resources/profiles` as a `<resource>` directory (which the default `process-resources` execution already copies into `${project.build.outputDirectory}`) and adds a separate `copy-generated-profiles` maven-resources-plugin execution that copies the same directory to the same `outputDirectory`. The two mechanisms are redundant. Keeping only one (the `<resources>` entry is sufficient) would reduce confusion and avoid double-processing the same files. <details> <summary>Fix</summary> ```` <!-- Remove the redundant copy-generated-profiles execution; the <resources> entry for generated-resources/profiles already copies the files into ${project.build.outputDirectory} during process-resources. 
--> ```` </details> </details> <details> <summary>💡 <b>Quality:</b> Misnamed rule-key files are silently dropped from profiles</summary> <kbd>📄 <a href="https://github.com/SonarSource/sonar-java/pull/5705/files#diff-527d6d3ff6d0b2988ebdcb2fe8ecc63ce3bf3ce782e105cb2ddd1881b66929edR67">sonar-java-plugin/src/main/build/ProfileJsonGenerator.java:67</a></kbd> <kbd>📄 <a href="https://github.com/SonarSource/sonar-java/pull/5705/files#diff-527d6d3ff6d0b2988ebdcb2fe8ecc63ce3bf3ce782e105cb2ddd1881b66929edR73-R76">sonar-java-plugin/src/main/build/ProfileJsonGenerator.java:73-76</a></kbd> `collectRuleKeys` filters profile-directory entries with `isValidRuleKey` (`S\d+`). Any file that does not exactly match — e.g. a typo like `s106` (lowercase), `S106 ` (trailing space), or `S106.txt` — is silently skipped, so the corresponding rule disappears from the generated profile with no error or warning. Given the whole design relies on humans creating empty files named after rule keys, a silent drop makes profile-membership mistakes hard to detect. Consider logging a warning for files in a profile directory that do not match the expected rule-key pattern (excluding known files such as README/.gitignore). 
<details> <summary>Fix</summary> ```` files .filter(Files::isRegularFile) .map(Path::getFileName) .map(Path::toString) .peek(name -> { if (!isValidRuleKey(name)) { System.err.println("Ignoring non-rule-key file in profile directory: " + name); } }) .filter(ProfileJsonGenerator::isValidRuleKey) .sorted(Comparator.comparingInt(ProfileJsonGenerator::numericKey)) .collect(Collectors.toList()); ```` </details> </details> <details> <summary><kbd>✅ 5 resolved</kbd></summary> <details> <summary>✅ <b>Quality:</b> Profile generator silently drops rules with unknown profile names</summary> > <kbd>📄 sonar-java-plugin/src/main/build/ProfileJsonGenerator.java:64-72</kbd> <kbd>📄 sonar-java-plugin/src/main/build/ProfileJsonGenerator.java:84-97</kbd> > `collectKeysByProfile` looks up each profile name extracted from a rule's `defaultQualityProfiles` with `keysByProfile.get(profile)` and only adds the rule key when the returned list is non-null. Any profile name that is not exactly one of the two keys in `PROFILES` ("Sonar way", "Sonar agentic AI") is therefore silently ignored. > > This migration moves profile membership into ~500 hand-edited rule metadata files, so a typo such as "Sonar Way", "sonar way", or "Sonar agentic Al" in any single rule would silently exclude that rule from the built-in profile with no error. The safety nets are weak: `MetadataTest.ensure_sane_Sonar_way_profile` only asserts the Sonar way size is `> 400`, so a handful of dropped rules would go completely unnoticed (the agentic test uses an exact size, but Sonar way does not). Likewise, a rule whose JSON omits `defaultQualityProfiles` entirely is silently excluded. > > Recommend failing the build (or at minimum warning) when a rule references a profile name that is not in `PROFILES`, so accidental omissions surface at build time instead of shipping an incomplete profile. 
</details> <details> <summary>✅ <b>Quality:</b> Regex-based JSON parsing in ProfileJsonGenerator is fragile</summary> > <kbd>📄 sonar-java-plugin/src/main/build/ProfileJsonGenerator.java:33-35</kbd> <kbd>📄 sonar-java-plugin/src/main/build/ProfileJsonGenerator.java:84-97</kbd> <kbd>📄 sonar-java-plugin/src/main/build/ProfileJsonGenerator.java:99-105</kbd> > `ProfileJsonGenerator` extracts `sqKey` and `defaultQualityProfiles` via hand-written regular expressions rather than a JSON parser. This works for the current well-formatted metadata, but it is brittle: `JSON_STRING_PATTERN` blindly captures every quoted token inside the `defaultQualityProfiles` array, so any future change such as an inline comment, an escaped quote, or reformatting could yield wrong profile names or miss entries. Because the generator runs as a single-file source launch (`java ProfileJsonGenerator.java`) it cannot easily depend on Gson; however the fragility is worth a comment and tight patterns. Consider at least documenting the assumption that metadata files are machine-generated and strictly formatted, and validating extracted profile names against the known set (see related finding) so malformed input cannot silently produce an incorrect profile. </details> <details> <summary>✅ <b>Bug:</b> Stale source profile JSONs collide with generated ones</summary> > <kbd>📄 sonar-java-plugin/pom.xml:148-155</kbd> <kbd>📄 sonar-java-plugin/pom.xml:397-411</kbd> <kbd>📄 sonar-java-plugin/src/main/build/ProfileJsonGenerator.java:42</kbd> <kbd>📄 sonar-java-plugin/src/main/build/ProfileJsonGenerator.java:56-57</kbd> > The PR's stated goal is to "stop tracking the generated profile JSONs," but the old hand-maintained files are still present in source: `sonar-java-plugin/src/main/resources/org/sonar/l10n/java/rules/java/Sonar_way_profile.json` and `Sonar_agentic_AI_profile.json` (the diff shows 0 deletions). 
`ProfileJsonGenerator` now writes freshly generated files to the SAME packaged path (`org/sonar/l10n/java/rules/java/Sonar_way_profile.json`). > > In the pom, both `src/main/resources` and `${project.build.directory}/generated-resources/profiles` are declared as resource directories (lines 148-155), and there is also a `copy-generated-profiles` copy-resources execution. Both the stale src copy and the generated copy resolve to the identical target path in `target/classes`. Which one ends up packaged depends entirely on maven-resources-plugin copy ordering and its `overwrite` timestamp semantics (by default a resource is only copied when the source is newer than the destination). This is fragile: the plugin may ship the stale, hand-maintained profile instead of the generated one, and at minimum the two definitions can silently diverge while both remain authoritative-looking. > > Delete the old `Sonar_way_profile.json` / `Sonar_agentic_AI_profile.json` from `src/main/resources` so the generated artifact is the single source of truth, and ensure the per-rule profile membership files fully reproduce the previous profile contents. </details> <details> <summary>✅ <b>Edge Case:</b> numericKey throws cryptic NumberFormatException on stray files</summary> > <kbd>📄 sonar-java-plugin/src/main/build/ProfileJsonGenerator.java:61-75</kbd> > `collectRuleKeys` lists every regular file in a profile directory and feeds each filename to `numericKey`, which does `Integer.parseInt(ruleKey.substring(1))`. Any file whose name is not exactly `S<digits>` — e.g. a `.gitkeep`, `.DS_Store`, editor swap file, or a typo'd rule key such as `S891O` (letter O) — causes a `NumberFormatException` that aborts the build with an opaque message ("For input string ...") and no indication of the offending directory/file. 
> > Consider filtering to files matching `S\d+` (and/or sorting with a fallback comparator) and throwing a descriptive error that names the bad file, so contributors immediately understand the problem. </details> <details> <summary>✅ <b>Bug:</b> MetadataTest reads deleted src/main/resources profile JSON</summary> > <kbd>📄 sonar-java-plugin/src/main/resources/org/sonar/l10n/java/rules/java/.gitignore:1</kbd> > This PR deletes `src/main/resources/org/sonar/l10n/java/rules/java/Sonar_way_profile.json` (and the agentic one) and adds a `.gitignore` for `*_profile.json`, so the profile JSONs now only exist as generated artifacts under `target/generated-resources` / `target/classes`. However `MetadataTest.ensure_sane_Sonar_way_profile()` still reads the profile via a hard-coded filesystem path: `Path.of("src/main/resources/" + JavaSonarWayProfile.SONAR_WAY_PATH)` and opens it with `Files.newReader(profilePath.toFile(), ...)`. Since that file no longer exists in the source tree, the test will fail with FileNotFoundException. This test is explicitly listed in the PR's test command (`-Dtest=MetadataTest,...`). The PR description says tests should be updated 'to validate the generated classpath resources instead of src/main/resources files', but MetadataTest was not updated. Point the test at the generated output (e.g. `target/classes` + SONAR_WAY_PATH) or load it from the classpath via `getResourceAsStream(SONAR_WAY_PATH)`. </details> </details> <details> <summary>🤖 <b>Prompt for agents</b></summary> ```` Code Review: Automates built-in profile generation by moving rule membership metadata into individual rule files, resolving issues with stale JSON tracking and brittle manual updates. Consolidate the duplicate copy operations in the build configuration and refine the rule-key validation logic to prevent silent file drops. 1. 
💡 Quality: Generated profiles copied twice via <resources> and copy-resources Files: sonar-java-plugin/pom.xml:148-155, sonar-java-plugin/pom.xml:397-411 The build both declares `${project.build.directory}/generated-resources/profiles` as a `<resource>` directory (which the default `process-resources` execution already copies into `${project.build.outputDirectory}`) and adds a separate `copy-generated-profiles` maven-resources-plugin execution that copies the same directory to the same `outputDirectory`. The two mechanisms are redundant. Keeping only one (the `<resources>` entry is sufficient) would reduce confusion and avoid double-processing the same files. Fix: <!-- Remove the redundant copy-generated-profiles execution; the <resources> entry for generated-resources/profiles already copies the files into ${project.build.outputDirectory} during process-resources. --> 2. 💡 Quality: Misnamed rule-key files are silently dropped from profiles Files: sonar-java-plugin/src/main/build/ProfileJsonGenerator.java:67, sonar-java-plugin/src/main/build/ProfileJsonGenerator.java:73-76 `collectRuleKeys` filters profile-directory entries with `isValidRuleKey` (`S\d+`). Any file that does not exactly match — e.g. a typo like `s106` (lowercase), `S106 ` (trailing space), or `S106.txt` — is silently skipped, so the corresponding rule disappears from the generated profile with no error or warning. Given the whole design relies on humans creating empty files named after rule keys, a silent drop makes profile-membership mistakes hard to detect. Consider logging a warning for files in a profile directory that do not match the expected rule-key pattern (excluding known files such as README/.gitignore). 
Fix: files .filter(Files::isRegularFile) .map(Path::getFileName) .map(Path::toString) .peek(name -> { if (!isValidRuleKey(name)) { System.err.println("Ignoring non-rule-key file in profile directory: " + name); } }) .filter(ProfileJsonGenerator::isValidRuleKey) .sorted(Comparator.comparingInt(ProfileJsonGenerator::numericKey)) .collect(Collectors.toList()); ```` </details> </details>
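The resolved `numericKey` edge case in the review above asks for a descriptive error that names the offending file. One way to do that is sketched below. This is only an illustration under assumptions: the class `NumericKeySketch`, the `Path`-based signature, and the message wording are hypothetical and may differ from what the PR actually does.

```java
import java.nio.file.Path;
import java.util.regex.Pattern;

// Hypothetical sketch; the real numericKey in the PR may look different.
class NumericKeySketch {

  private static final Pattern RULE_KEY = Pattern.compile("S\\d+");

  /** Returns the numeric part of a rule-key file name, e.g. 106 for S106. */
  static int numericKey(Path file) {
    String name = file.getFileName().toString();
    if (!RULE_KEY.matcher(name).matches()) {
      // Name the file and its directory so the contributor can find it.
      throw new IllegalArgumentException(
        "Unexpected file '" + name + "' in profile directory '" + file.getParent()
          + "': expected a rule key such as S106");
    }
    try {
      return Integer.parseInt(name.substring(1));
    } catch (NumberFormatException e) {
      // Only reachable when the digits do not fit into an int.
      throw new IllegalArgumentException("Rule key '" + name + "' is out of range", e);
    }
  }
}
```

With this shape, a stray `.DS_Store` or a typo such as `S891O` fails with a message that points at the file instead of the bare "For input string" text.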
Comment: <details>
<summary>💡 <b>Quality:</b> Generated profiles copied twice via <resources> and copy-resources</summary>
The build both declares `${project.build.directory}/generated-resources/profiles` as a `<resource>` directory (which the default `process-resources` execution already copies into `${project.build.outputDirectory}`) and adds a separate `copy-generated-profiles` maven-resources-plugin execution that copies the same directory to the same `outputDirectory`. The two mechanisms are redundant. Keeping only one (the `<resources>` entry is sufficient) would reduce confusion and avoid double-processing the same files.
**Fix:**
````
<!-- Remove the redundant copy-generated-profiles execution; the
<resources> entry for generated-resources/profiles already copies
the files into ${project.build.outputDirectory} during process-resources. -->
````
</details>
…uild/ProfileJsonGenerator.java
Comment: <details>
<summary>💡 <b>Quality:</b> Misnamed rule-key files are silently dropped from profiles</summary>
`collectRuleKeys` filters profile-directory entries with `isValidRuleKey` (`S\d+`). Any file that does not exactly match — e.g. a typo like `s106` (lowercase), `S106 ` (trailing space), or `S106.txt` — is silently skipped, so the corresponding rule disappears from the generated profile with no error or warning. Given the whole design relies on humans creating empty files named after rule keys, a silent drop makes profile-membership mistakes hard to detect. Consider logging a warning for files in a profile directory that do not match the expected rule-key pattern (excluding known files such as README/.gitignore).
**Fix:**
````
files
  .filter(Files::isRegularFile)
  .map(Path::getFileName)
  .map(Path::toString)
  .peek(name -> {
    if (!isValidRuleKey(name)) {
      System.err.println("Ignoring non-rule-key file in profile directory: " + name);
    }
  })
  .filter(ProfileJsonGenerator::isValidRuleKey)
  .sorted(Comparator.comparingInt(ProfileJsonGenerator::numericKey))
  .collect(Collectors.toList());
````
</details>
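As an alternative to logging from inside `peek` (which the `Stream` Javadoc describes as existing mainly to support debugging), the misnamed files could be collected in a separate pass and then reported or turned into a build failure. The sketch below assumes a hypothetical helper `RuleKeyAuditSketch.suspiciousNames` and an assumed allow-list of `README.md` and `.gitignore`. Neither is taken from the PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of a separate audit pass over a profile directory listing.
class RuleKeyAuditSketch {

  // Assumed allow-list of files that may legitimately sit next to rule keys.
  private static final Set<String> KNOWN_FILES = Set.of("README.md", ".gitignore");

  /** Returns names that are neither valid rule keys (S followed by digits) nor known files. */
  static List<String> suspiciousNames(List<String> fileNames) {
    List<String> suspicious = new ArrayList<>();
    for (String name : fileNames) {
      if (!name.matches("S\\d+") && !KNOWN_FILES.contains(name)) {
        suspicious.add(name);
      }
    }
    return suspicious;
  }
}
```

A caller could print each returned name to `System.err`, or fail the build when the list is not empty, so that a typo like `s106` or `S106.txt` cannot silently remove a rule from a profile.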
Thanks for the comprehensive review. I've addressed both remaining suggestions:
Both changes improve the build configuration clarity and help prevent silent profile-membership mistakes.
Code Review ✅ Approved 7 resolved / 7 findings
Automates profile JSON generation from rule metadata during the build and transitions to directory-based profile composition. All previous findings regarding parsing fragility, stale resource handling, and silent failures have been resolved.
✅ 7 resolved
✅ Quality: Profile generator silently drops rules with unknown profile names
✅ Quality: Regex-based JSON parsing in ProfileJsonGenerator is fragile
✅ Bug: Stale source profile JSONs collide with generated ones
✅ Edge Case: numericKey throws cryptic NumberFormatException on stray files
✅ Bug: MetadataTest reads deleted src/main/resources profile JSON
...and 2 more resolved from earlier reviews





Summary
Testing
Summary by Gitar
- `ProfileJsonGenerator.java` to automate the creation of profile JSON files during the build process.
- `pom.xml` to include generated resource directories and configured `exec-maven-plugin` to execute the generator.
- `README.md` in `src/main/resources/profiles/` detailing the new rule management and build process.
- `JavaAgenticWayProfileTest` to reflect the change in the total count of active rules from `465` to `467`.

This will update automatically on new commits.