Implement SCA Reachability runtime detection: report vulnerable classes and callsites via telemetry#11352
jandro996 wants to merge 85 commits into
Adds a new SCA Reachability subsystem that reports which vulnerable library classes were actually loaded at runtime, reducing false positives from static dependency scanning. Gated on DD_APPSEC_SCA_ENABLED.
Key components:
- Gradle task downloads GHSA enrichments from sca-reachability-database and generates sca_cves.json, bundled in the agent jar at build time
- ClassFileTransformer (observation-only) detects when vulnerable classes are loaded, resolves JAR versions via pom.properties, and checks semver ranges using ComparableVersion (Maven semantics)
- ScaReachabilityCollector bridges the transformer and telemetry without circular dependencies, following the WafMetricCollector pattern
- ScaReachabilityPeriodicAction reports hits on each app-dependencies-loaded heartbeat by adding reachability metadata to existing dependency entries
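To illustrate the "resolves JAR versions via pom.properties" step, here is a minimal sketch using only the JDK. `PomPropertiesReader` and `readCoordinates` are hypothetical names, not the PR's classes, and the sketch assumes the JAR was built by Maven so that a `META-INF/maven/<groupId>/<artifactId>/pom.properties` entry exists. The semver range check with ComparableVersion is not shown.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Enumeration;
import java.util.Properties;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Hypothetical helper (not the PR's code): reads Maven coordinates embedded in a JAR. */
final class PomPropertiesReader {
  private PomPropertiesReader() {}

  /**
   * Returns "groupId:artifactId:version" from the first complete
   * META-INF/maven/.../pom.properties entry, or null if the JAR has none.
   */
  static String readCoordinates(String jarPath) throws IOException {
    try (JarFile jar = new JarFile(jarPath)) {
      Enumeration<JarEntry> entries = jar.entries();
      while (entries.hasMoreElements()) {
        JarEntry entry = entries.nextElement();
        String name = entry.getName();
        if (name.startsWith("META-INF/maven/") && name.endsWith("/pom.properties")) {
          Properties props = new Properties();
          try (InputStream in = jar.getInputStream(entry)) {
            props.load(in);
          }
          String groupId = props.getProperty("groupId");
          String artifactId = props.getProperty("artifactId");
          String version = props.getProperty("version");
          if (groupId != null && artifactId != null && version != null) {
            return groupId + ":" + artifactId + ":" + version;
          }
        }
      }
    }
    return null; // no usable pom.properties: the version stays unresolved
  }
}
```

A shaded or fat JAR may contain several pom.properties entries; this sketch simply returns the first complete one, which a real resolver would likely need to refine.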
…n task The Gradle task now writes to src/main/resources/ and runs only when -PrefreshSca is passed or the file is absent, so CI builds never need network access to the private sca-reachability-database repo.
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e607887e99
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
On Java 9+, the system classloader (jdk.internal.loader.ClassLoaders$AppClassLoader) no longer extends URLClassLoader, so the URLClassLoader chain walk misses all main classpath entries. Add a fallback that reads java.class.path to cover this case, deduplicating with a HashSet<URL> to avoid scanning the same JAR twice.
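A minimal sketch of the fallback described in this commit, assuming a hypothetical `ClasspathScanner` helper. It walks the classloader chain for `URLClassLoader` instances, then adds the entries of a `java.class.path`-style string, relying on a set to drop duplicates. The commit mentions a `HashSet<URL>`; a `LinkedHashSet` is used here only to keep iteration order stable.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical helper (not the PR's code): collects classpath locations once each. */
final class ClasspathScanner {
  private ClasspathScanner() {}

  static Set<URL> collectClasspathUrls(ClassLoader start, String classPath) {
    Set<URL> urls = new LinkedHashSet<>();
    // Step 1: URLClassLoader chain walk (finds nothing for the Java 9+ system classloader).
    for (ClassLoader cl = start; cl != null; cl = cl.getParent()) {
      if (cl instanceof URLClassLoader) {
        urls.addAll(Arrays.asList(((URLClassLoader) cl).getURLs()));
      }
    }
    // Step 2: fallback on the java.class.path value; the set deduplicates entries.
    if (classPath != null && !classPath.isEmpty()) {
      for (String entry : classPath.split(Pattern.quote(File.pathSeparator))) {
        if (entry.isEmpty()) {
          continue;
        }
        try {
          urls.add(new File(entry).toURI().toURL());
        } catch (MalformedURLException ignored) {
          // Skip entries that cannot be expressed as a URL.
        }
      }
    }
    return urls;
  }
}
```

A caller would typically pass `ClassLoader.getSystemClassLoader()` and `System.getProperty("java.class.path")`.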
…rivate Test verifies: (1) system classloader is not URLClassLoader on Java 9+, and (2) findArtifactVersionInClasspath finds artifacts via java.class.path fallback. Applies to Java 9 and all subsequent JDKs (permanent JDK design change).
When sca_cves.json contains symbols with method != null, the transformer injects a static callback at method entry using ASM. The callback fires the first time the method is called and reports via ScaReachabilityCallback (bootstrap classloader, accessible from any application class).
Key changes:
- ScaReachabilityCallback in agent-bootstrap: bootstrap-visible callback with runtime dedup (vulnId|artifact|methodName) and handler registration
- ScaReachabilityTransformer: injectMethodCallbacks() uses ByteBuddy ASM to inject INVOKESTATIC at first line number of each target method; processClass() routes class-level vs method-level symbols separately
- ScaReachabilityHit: adds symbolName + line fields; existing constructor defaults to <clinit>/1 for class-level hits (backward compatible)
- ScaReachabilityPeriodicAction: buildMetadataValue() now uses hit.symbolName() and hit.line() instead of hardcoded values
- 6 tests: ASM injection, callback fires on method call only, dedup, multiple methods, safe method not reported, class-level unaffected
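The runtime dedup on the callback side can be sketched roughly as follows. `MethodHitCallback` and its `Handler` interface are illustrative stand-ins with a simplified signature, not the actual ScaReachabilityCallback; only the `vulnId|artifact|methodName` key shape comes from the commit message.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative stand-in for a bootstrap-visible callback with first-hit dedup. */
final class MethodHitCallback {
  /** Hypothetical handler type; the real handler signature may differ. */
  interface Handler {
    void onMethodHit(String vulnId, String artifact, String className, String methodName);
  }

  private static final Set<String> REPORTED = ConcurrentHashMap.newKeySet();
  private static volatile Handler handler;

  private MethodHitCallback() {}

  static void register(Handler h) {
    handler = h;
  }

  /** Target of the injected INVOKESTATIC; forwards only the first hit per key. */
  static void onMethodHit(String vulnId, String artifact, String className, String methodName) {
    Handler h = handler;
    if (h == null) {
      return; // nothing registered yet: drop the hit without marking it as reported
    }
    String key = vulnId + "|" + artifact + "|" + methodName;
    if (REPORTED.add(key)) { // add() returns false when the key was already present
      h.onMethodHit(vulnId, artifact, className, methodName);
    }
  }
}
```

Because `Set.add` is atomic on a concurrent key set, two threads hitting the same method at once still produce a single report.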
…rsion-unresolved
Two cases required deferred retransformation:
1. Classes already loaded at startup (before transformer registered): the bytecode callback cannot be injected without retransformClasses()
2. Classes where DependencyResolver returned empty deps at load time (version not yet resolvable): empty results are now not cached to allow retries
ScaReachabilityTransformer now stores Instrumentation and exposes performPendingRetransforms(), called on each telemetry heartbeat via a Runnable callback in ScaReachabilityCollector.periodicWorkCallback. Classes are queued via:
- pendingRetransform (Class<?> queue) from checkAlreadyLoadedClasses
- pendingRetransformNames (String set) from processClass on empty deps
retransformClasses() always starts from the ORIGINAL class file bytes, not from the previously-transformed bytes. A dedup check in injectCallbacks() that blocked re-injection on the second pass caused the callback to be removed (the class was returned to its original, un-instrumented state). The authoritative dedup for method-level hits is ScaReachabilityCallback.reported (bootstrap-side), which persists across retransformations regardless of how many times transform() is called on the same class. Also update .claude-invariants.md: retransformClasses is now used (for method-level only), the cache constraint clarified, and the dedup invariant documents the two-level approach (transformer for class-level, bootstrap for method-level).
…avadoc, add retransform tests
- performPendingRetransforms(): early return when instrumentation is null (unit test safety)
- ScaReachabilityCollector: encapsulate periodicWorkCallback as private with getter/setter
- ScaReachabilityTransformer class Javadoc: update dedup description from (vulnId,artifact) pair to (vulnId,artifact,symbolName) tuple; document two-level dedup strategy
- Add 3 tests for performPendingRetransforms(): no-op with null inst, retransformClasses called for pending queue, no-op when both queues empty
@codex review
💡 Codex Review
Reviewed commit: 82ea8065d9
…esolution
P1: Replace StandardCharsets.UTF_8 with "UTF-8" string in ScaCveDatabase.load(). java.nio.* is forbidden during premain (bootstrap_design_guidelines.md) because it can trigger premature provider initialization before the app configures the runtime.
P2: Add classpath fallback in resolveVersionForArtifact() for entries where the vulnerable artifact is an aggregator/starter POM whose watched classes live in a transitive dependency JAR (e.g., spring-boot-starter-web watches @Controller but @Controller is defined in spring-context.jar, not the starter). The new helper first checks the class's own JAR, then falls back to findArtifactVersionInClasspath with a hit cache (classpathArtifactCache). processPathA uses the same helper for consistency.
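For P1, the change amounts to naming the charset with a string instead of referencing the `StandardCharsets` constant. A minimal sketch, where `Utf8Text` is a hypothetical class and the rationale is the one given in the commit message rather than something verified here:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;

/** Hypothetical helper (not ScaCveDatabase itself): reads a stream fully as UTF-8 text. */
final class Utf8Text {
  private Utf8Text() {}

  static String readAll(InputStream in) throws IOException {
    // Charset given by name, so this class has no reference to java.nio.charset.StandardCharsets.
    Reader reader = new InputStreamReader(in, "UTF-8");
    StringBuilder sb = new StringBuilder();
    char[] buf = new char[4096];
    int n;
    while ((n = reader.read(buf)) != -1) {
      sb.append(buf, 0, n);
    }
    return sb.toString();
  }
}
```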
…IfPresent helper
- Add CLASS_LEVEL_SYMBOL = "<clinit>" constant to avoid magic string repetition (appeared 3 times in the same class; a typo would silently produce wrong symbol names)
- Extract reportClassLevelHitIfPresent(entry, version, internalClassName) helper to unify identical class-level symbol matching loops in processPathA, processPathB, and processClass; all three now delegate to the single helper
Move CLASS_LEVEL_SYMBOL = "<clinit>" to ScaReachabilityHit (internal-api) as a public constant so both the transformer (appsec) and the telemetry payload builder share the canonical definition without cross-module string duplication. The convenience constructor also uses the constant now. ScaReachabilityTransformer delegates to ScaReachabilityHit.CLASS_LEVEL_SYMBOL. Fix misleading comment in processClass: "We enqueue via classBeingRedefined is null here" → explains that classBeingRedefined is null on first class load, preventing direct Class<?> queuing, so scheduleRetransformByName is used instead.
…lback
- ScaCveDatabase: move "java.nio.* forbidden in premain" comment from the imports block to inline at the InputStreamReader construction site (comments in imports are unusual and smola flags verbose placement)
- ScaReachabilityTransformer.resolveVersionForArtifact: make package-private for testing; add 4 tests covering the two-step fallback:
(1) version from classJarDeps directly
(2) classpath fallback when classJarDeps is empty (transitive JAR case)
(3) classpathArtifactCache hit on second call
(4) null for absent artifact
- Remove empty visitCode() in MethodEntryInjector: the method only called super.visitCode() and its comment was misleading. The actual no-debug-info fallback injection is handled by ensureInjected() in the visitInsn/visitVarInsn/visitMethodInsn/visitFieldInsn overrides, not here
- Remove private CLASS_LEVEL_SYMBOL alias in ScaReachabilityTransformer: the constant is used in exactly one place (reportClassLevelHitIfPresent) and ScaReachabilityHit.CLASS_LEVEL_SYMBOL is self-documenting at that site; the alias added a private field with no benefit after the constant was moved to ScaReachabilityHit in a previous commit
@codex review
Codex Review: Didn't find any major issues. 👍
Per the RFC and Python implementation (dd-trace-py#17156), the telemetry payload path/symbol/line for method-level hits must report the APPLICATION FRAME that called the vulnerable method (the callsite), not the vulnerable method itself.
ScaReachabilityCallback.onMethodHit() now walks Thread.getStackTrace() to find the first non-agent, non-JDK frame after the vulnerable class:
ScaReachabilityCallback.onMethodHit (skip - us)
com.foo.VulnerableClass.method (skip - vulnerable class)
com.myapp.UserService.processRequest (CALLSITE - report this)
The dotClassName/methodName params are still baked into the bytecode and used only for deduplication (vulnId|artifact|methodName key). The handler receives the callsite's class/method/line for telemetry.
Fallback: if no application frame is found (e.g. called from JDK internals), reports the vulnerable symbol itself so the backend knows it was reached.
Class-level hits (<clinit>) are unchanged: there is no callsite at class load time.
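The frame selection described above can be sketched over a plain `StackTraceElement[]`. `CallsiteFinder` and the prefix list in `isAgentOrJdk` are assumptions made for illustration; the PR's own predicate may use different prefixes.

```java
/** Illustrative sketch of callsite selection; not the PR's implementation. */
final class CallsiteFinder {
  private CallsiteFinder() {}

  /** Assumed filter: treats Datadog and JDK packages as frames to skip. */
  static boolean isAgentOrJdk(String className) {
    return className.startsWith("datadog.")
        || className.startsWith("com.datadog.")
        || className.startsWith("java.")
        || className.startsWith("jdk.")
        || className.startsWith("sun.");
  }

  /**
   * Returns the first application frame found after the vulnerable class in the stack
   * (innermost frame first), or null when there is none, in which case the caller
   * falls back to reporting the vulnerable symbol itself.
   */
  static StackTraceElement findCallsite(String vulnerableClass, StackTraceElement[] stack) {
    boolean seenVulnerable = false;
    for (StackTraceElement frame : stack) {
      String className = frame.getClassName();
      if (className.equals(vulnerableClass)) {
        seenVulnerable = true; // skip the vulnerable class's own frames
        continue;
      }
      if (seenVulnerable && !isAgentOrJdk(className)) {
        return frame;
      }
    }
    return null;
  }
}
```

Passing the stack as a parameter keeps the logic testable with synthetic frames; a production caller would pass `Thread.currentThread().getStackTrace()`.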
ScaReachabilityCallback (bootstrap) must stay minimal: complex logic does not belong there. Move findCallsite() to ScaReachabilitySystem, which has access to internal-api utilities. The handler runs synchronously so the full call stack is still present:
ScaReachabilitySystem handler
ScaReachabilityCallback.onMethodHit
<vulnerable method>
<application callsite> ← reported
Uses the same class-prefix predicate as AbstractStackWalker.isNotDatadogTraceStackElement (package-private, so replicated inline) to skip agent/JDK frames, consistent with the IAST trie-based filtering infrastructure used elsewhere in the codebase.
…ltering Make isNotDatadogTraceStackElement public in AbstractStackWalker so SCA Reachability can use the existing predicate directly rather than duplicating the 3 class-prefix conditions inline.
… path
ScaReachabilitySystemCallsiteTest covers:
- findCallsite returns null when vulnerable class is not on the stack (triggers fallback: handler reports the vulnerable symbol itself)
- findCallsite skips the vulnerable class frame and returns the first non-agent frame above it (using java.lang.Thread as a non-agent class guaranteed to be at the top of getStackTrace())
Note on the method-level integration test: TargetClass is in com.datadog.appsec.sca.* (agent namespace) so AbstractStackWalker filters it as agent code and findCallsite() returns null. The test now documents this fallback behaviour explicitly. In production the vulnerable class is always a 3rd-party library (e.g. com.fasterxml.jackson.*) and the happy path fires correctly, as verified by ScaReachabilitySystemCallsiteTest.
The stream().anyMatch() for detecting method-level symbols was computed for every entry unconditionally. It is only needed when version == null (deps not yet resolvable). Moving the check inside the version==null branch eliminates the stream allocation on the common path where the version resolves successfully.
@codex review
💡 Codex Review
Reviewed commit: 6008ac9ca0
…cators JDK classes (e.g. java.sql.PreparedStatement, protectionDomain==null) are loaded by ANY app that uses JDBC, regardless of which driver is present. Using their presence to infer that a specific library (e.g. PostgreSQL) is "reachable" produces classpath-presence false positives, not runtime reachability signals. Entries that list JDK symbols (e.g. the PostgreSQL advisory) also include library-specific classes (e.g. org.postgresql.ds.PGSimpleDataSource) that Path A correctly detects when those classes are actually loaded. In checkAlreadyLoadedClasses(), classes with no code source (JDK/bootstrap) are now skipped silently. The invariants and KB are updated accordingly.
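A rough sketch of the "no code source" check mentioned here, assuming a hypothetical `CodeSourceFilter` helper. Classes defined by the bootstrap loader, such as `java.lang.String`, report no `CodeSource`, so they can be skipped before any JAR or version resolution is attempted.

```java
import java.security.CodeSource;
import java.security.ProtectionDomain;

/** Hypothetical helper (not the PR's code): detects classes that did not come from a JAR or directory. */
final class CodeSourceFilter {
  private CodeSourceFilter() {}

  /** Returns true only when the class has a code source with a known location. */
  static boolean hasCodeSource(Class<?> clazz) {
    ProtectionDomain domain = clazz.getProtectionDomain();
    if (domain == null) {
      return false;
    }
    CodeSource source = domain.getCodeSource();
    return source != null && source.getLocation() != null;
  }
}
```

A loop over `Instrumentation.getAllLoadedClasses()` could then `continue` whenever `hasCodeSource(clazz)` is false.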
39e96bb to 0255588
ScaReachabilityRetransformTest verifies that performPendingRetransforms() retransforms ALL classloader instances of a class, not just the first one. Simulates Spring Boot's multiple LaunchedURLClassLoader instances by returning the same Class<?> twice from getAllLoadedClasses() — with the old remove() approach only one entry was passed to retransformClasses(); with the fix (contains()+removeAll()) both are collected. Also tests the re-queue path: on retransformClasses() failure, all collected classes must be added back to pendingRetransform for the next heartbeat retry. Adds mockito to appsec test dependencies. Makes pendingRetransformNames package-private (consistent with pendingRetransform and pendingClassEvents).
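The collect-all-then-remove behaviour and the re-queue on failure can be sketched as below. The field and method names mirror those in the commit messages, but the class itself is an illustrative reconstruction under assumptions, not the PR's transformer.

```java
import java.lang.instrument.Instrumentation;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Illustrative reconstruction of the deferred retransform step. */
final class PendingRetransforms {
  final Queue<Class<?>> pendingRetransform = new ConcurrentLinkedQueue<>();
  final Set<String> pendingRetransformNames = ConcurrentHashMap.newKeySet();

  void performPendingRetransforms(Instrumentation inst) {
    if (inst == null) {
      return; // unit-test safety: no instrumentation available
    }
    if (pendingRetransform.isEmpty() && pendingRetransformNames.isEmpty()) {
      return;
    }
    List<Class<?>> batch = new ArrayList<>();
    Class<?> queued;
    while ((queued = pendingRetransform.poll()) != null) {
      batch.add(queued);
    }
    if (!pendingRetransformNames.isEmpty()) {
      Set<String> matched = new HashSet<>();
      for (Class<?> loaded : inst.getAllLoadedClasses()) {
        String name = loaded.getName();
        // contains() rather than remove(): every classloader's copy of the class is collected.
        if (pendingRetransformNames.contains(name)) {
          batch.add(loaded);
          matched.add(name);
        }
      }
      pendingRetransformNames.removeAll(matched);
    }
    if (batch.isEmpty()) {
      return;
    }
    try {
      inst.retransformClasses(batch.toArray(new Class<?>[0]));
    } catch (Throwable t) {
      // Re-queue everything so the next heartbeat retries.
      pendingRetransform.addAll(batch);
    }
  }
}
```

Names that match no loaded class stay in `pendingRetransformNames`, so they are checked again on the next call.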
@codex review
💡 Codex Review
Reviewed commit: 41af332443
Debugger benchmarks
Summary
Found 0 performance improvements and 0 performance regressions! Performance is the same for 10 metrics, 5 unstable metrics.
Request duration reports for reports
gantt
title reports - request duration [CI 0.99] : candidate=None, baseline=None
dateFormat X
axisFormat %s
section baseline
noprobe (336.051 µs) : 308, 365
. : milestone, 336,
basic (300.494 µs) : 293, 308
. : milestone, 300,
loop (8.985 ms) : 8980, 8991
. : milestone, 8985,
section candidate
noprobe (342.376 µs) : 301, 383
. : milestone, 342,
basic (300.82 µs) : 294, 308
. : milestone, 301,
loop (8.995 ms) : 8990, 9000
. : milestone, 8995,
Kafka / producer-benchmark
Summary
Found 0 performance improvements and 0 performance regressions! Performance is the same for 3 metrics, 0 unstable metrics.
Kafka / consumer-benchmark
Summary
Found 0 performance improvements and 0 performance regressions! Performance is the same for 3 metrics, 0 unstable metrics.
Benchmarks
Startup
Summary
Found 0 performance improvements and 0 performance regressions! Performance is the same for 75 metrics, 8 unstable metrics.
Startup time reports for insecure-bank
gantt
title insecure-bank - global startup overhead: candidate=1.63.0-SNAPSHOT~e6ba094024, baseline=1.63.0-SNAPSHOT~e6f76bbd49
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.077 s) : 0, 1076906
Total [baseline] (8.936 s) : 0, 8936411
Agent [candidate] (1.077 s) : 0, 1076579
Total [candidate] (8.949 s) : 0, 8949120
section iast
Agent [baseline] (1.252 s) : 0, 1252102
Total [baseline] (9.494 s) : 0, 9494253
Agent [candidate] (1.252 s) : 0, 1251589
Total [candidate] (9.514 s) : 0, 9514027
gantt
title insecure-bank - break down per module: candidate=1.63.0-SNAPSHOT~e6ba094024, baseline=1.63.0-SNAPSHOT~e6f76bbd49
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.244 ms) : 0, 1244
crashtracking [candidate] (1.226 ms) : 0, 1226
BytebuddyAgent [baseline] (643.438 ms) : 0, 643438
BytebuddyAgent [candidate] (642.922 ms) : 0, 642922
AgentMeter [baseline] (30.032 ms) : 0, 30032
AgentMeter [candidate] (29.971 ms) : 0, 29971
GlobalTracer [baseline] (250.075 ms) : 0, 250075
GlobalTracer [candidate] (250.398 ms) : 0, 250398
AppSec [baseline] (32.72 ms) : 0, 32720
AppSec [candidate] (32.814 ms) : 0, 32814
Debugger [baseline] (64.096 ms) : 0, 64096
Debugger [candidate] (64.168 ms) : 0, 64168
Remote Config [baseline] (609.856 µs) : 0, 610
Remote Config [candidate] (605.68 µs) : 0, 606
Telemetry [baseline] (8.454 ms) : 0, 8454
Telemetry [candidate] (8.504 ms) : 0, 8504
Flare Poller [baseline] (9.176 ms) : 0, 9176
Flare Poller [candidate] (9.15 ms) : 0, 9150
section iast
crashtracking [baseline] (1.222 ms) : 0, 1222
crashtracking [candidate] (1.231 ms) : 0, 1231
BytebuddyAgent [baseline] (830.082 ms) : 0, 830082
BytebuddyAgent [candidate] (829.248 ms) : 0, 829248
AgentMeter [baseline] (11.536 ms) : 0, 11536
AgentMeter [candidate] (11.558 ms) : 0, 11558
GlobalTracer [baseline] (243.928 ms) : 0, 243928
GlobalTracer [candidate] (244.049 ms) : 0, 244049
AppSec [baseline] (26.915 ms) : 0, 26915
AppSec [candidate] (27.799 ms) : 0, 27799
Debugger [baseline] (64.714 ms) : 0, 64714
Debugger [candidate] (63.942 ms) : 0, 63942
Remote Config [baseline] (516.668 µs) : 0, 517
Remote Config [candidate] (518.527 µs) : 0, 519
Telemetry [baseline] (7.961 ms) : 0, 7961
Telemetry [candidate] (8.013 ms) : 0, 8013
Flare Poller [baseline] (3.358 ms) : 0, 3358
Flare Poller [candidate] (3.335 ms) : 0, 3335
IAST [baseline] (25.053 ms) : 0, 25053
IAST [candidate] (25.065 ms) : 0, 25065
Startup time reports for petclinic
gantt
title petclinic - global startup overhead: candidate=1.63.0-SNAPSHOT~e6ba094024, baseline=1.63.0-SNAPSHOT~e6f76bbd49
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.071 s) : 0, 1071479
Total [baseline] (11.21 s) : 0, 11210460
Agent [candidate] (1.071 s) : 0, 1071496
Total [candidate] (11.052 s) : 0, 11052376
section appsec
Agent [baseline] (1.273 s) : 0, 1273097
Total [baseline] (11.1 s) : 0, 11099581
Agent [candidate] (1.273 s) : 0, 1273177
Total [candidate] (11.129 s) : 0, 11128852
section iast
Agent [baseline] (1.253 s) : 0, 1253250
Total [baseline] (11.223 s) : 0, 11223114
Agent [candidate] (1.254 s) : 0, 1253793
Total [candidate] (11.258 s) : 0, 11258020
section profiling
Agent [baseline] (1.317 s) : 0, 1317112
Total [baseline] (11.125 s) : 0, 11125224
Agent [candidate] (1.315 s) : 0, 1314877
Total [candidate] (11.081 s) : 0, 11081431
section sca
Agent [baseline] (1.272 s) : 0, 1271671
Total [baseline] (11.089 s) : 0, 11089173
Agent [candidate] (1.309 s) : 0, 1308964
Total [candidate] (11.156 s) : 0, 11155778
gantt
title petclinic - break down per module: candidate=1.63.0-SNAPSHOT~e6ba094024, baseline=1.63.0-SNAPSHOT~e6f76bbd49
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.223 ms) : 0, 1223
crashtracking [candidate] (1.218 ms) : 0, 1218
BytebuddyAgent [baseline] (637.827 ms) : 0, 637827
BytebuddyAgent [candidate] (639.331 ms) : 0, 639331
AgentMeter [baseline] (30.154 ms) : 0, 30154
AgentMeter [candidate] (29.92 ms) : 0, 29920
GlobalTracer [baseline] (250.051 ms) : 0, 250051
GlobalTracer [candidate] (249.013 ms) : 0, 249013
AppSec [baseline] (32.581 ms) : 0, 32581
AppSec [candidate] (32.431 ms) : 0, 32431
Debugger [baseline] (63.675 ms) : 0, 63675
Debugger [candidate] (62.27 ms) : 0, 62270
Remote Config [baseline] (617.408 µs) : 0, 617
Remote Config [candidate] (602.498 µs) : 0, 602
Telemetry [baseline] (8.464 ms) : 0, 8464
Telemetry [candidate] (8.435 ms) : 0, 8435
Flare Poller [baseline] (9.943 ms) : 0, 9943
Flare Poller [candidate] (11.365 ms) : 0, 11365
section appsec
crashtracking [baseline] (1.241 ms) : 0, 1241
crashtracking [candidate] (1.22 ms) : 0, 1220
BytebuddyAgent [baseline] (680.02 ms) : 0, 680020
BytebuddyAgent [candidate] (680.779 ms) : 0, 680779
AgentMeter [baseline] (12.058 ms) : 0, 12058
AgentMeter [candidate] (12.164 ms) : 0, 12164
GlobalTracer [baseline] (248.777 ms) : 0, 248777
GlobalTracer [candidate] (249.151 ms) : 0, 249151
AppSec [baseline] (186.053 ms) : 0, 186053
AppSec [candidate] (186.212 ms) : 0, 186212
Debugger [baseline] (65.862 ms) : 0, 65862
Debugger [candidate] (64.533 ms) : 0, 64533
Remote Config [baseline] (574.118 µs) : 0, 574
Remote Config [candidate] (566.402 µs) : 0, 566
Telemetry [baseline] (7.78 ms) : 0, 7780
Telemetry [candidate] (7.802 ms) : 0, 7802
Flare Poller [baseline] (8.965 ms) : 0, 8965
Flare Poller [candidate] (8.976 ms) : 0, 8976
IAST [baseline] (24.722 ms) : 0, 24722
IAST [candidate] (24.74 ms) : 0, 24740
section iast
crashtracking [baseline] (1.229 ms) : 0, 1229
crashtracking [candidate] (1.222 ms) : 0, 1222
BytebuddyAgent [baseline] (829.507 ms) : 0, 829507
BytebuddyAgent [candidate] (829.57 ms) : 0, 829570
AgentMeter [baseline] (11.505 ms) : 0, 11505
AgentMeter [candidate] (11.562 ms) : 0, 11562
GlobalTracer [baseline] (242.507 ms) : 0, 242507
GlobalTracer [candidate] (242.007 ms) : 0, 242007
AppSec [baseline] (28.779 ms) : 0, 28779
AppSec [candidate] (26.044 ms) : 0, 26044
Debugger [baseline] (65.522 ms) : 0, 65522
Debugger [candidate] (69.015 ms) : 0, 69015
Remote Config [baseline] (527.665 µs) : 0, 528
Remote Config [candidate] (530.471 µs) : 0, 530
Telemetry [baseline] (8.101 ms) : 0, 8101
Telemetry [candidate] (8.161 ms) : 0, 8161
Flare Poller [baseline] (3.37 ms) : 0, 3370
Flare Poller [candidate] (3.48 ms) : 0, 3480
IAST [baseline] (25.29 ms) : 0, 25290
IAST [candidate] (25.296 ms) : 0, 25296
section profiling
crashtracking [baseline] (538.087 µs) : 0, 538
crashtracking [candidate] (541.284 µs) : 0, 541
BytebuddyAgent [baseline] (690.8 ms) : 0, 690800
BytebuddyAgent [candidate] (689.505 ms) : 0, 689505
AgentMeter [baseline] (9.386 ms) : 0, 9386
AgentMeter [candidate] (9.245 ms) : 0, 9245
GlobalTracer [baseline] (210.545 ms) : 0, 210545
GlobalTracer [candidate] (210.801 ms) : 0, 210801
AppSec [baseline] (32.615 ms) : 0, 32615
AppSec [candidate] (32.605 ms) : 0, 32605
Debugger [baseline] (68.092 ms) : 0, 68092
Debugger [candidate] (68.267 ms) : 0, 68267
Remote Config [baseline] (590.621 µs) : 0, 591
Remote Config [candidate] (583.444 µs) : 0, 583
Telemetry [baseline] (8.374 ms) : 0, 8374
Telemetry [candidate] (8.125 ms) : 0, 8125
Flare Poller [baseline] (3.591 ms) : 0, 3591
Flare Poller [candidate] (3.536 ms) : 0, 3536
ProfilingAgent [baseline] (94.138 ms) : 0, 94138
ProfilingAgent [candidate] (93.609 ms) : 0, 93609
Profiling [baseline] (94.692 ms) : 0, 94692
Profiling [candidate] (94.169 ms) : 0, 94169
section sca
crashtracking [baseline] (1.225 ms) : 0, 1225
crashtracking [candidate] (1.238 ms) : 0, 1238
BytebuddyAgent [baseline] (679.46 ms) : 0, 679460
BytebuddyAgent [candidate] (686.512 ms) : 0, 686512
AgentMeter [baseline] (12.187 ms) : 0, 12187
AgentMeter [candidate] (12.486 ms) : 0, 12486
GlobalTracer [baseline] (248.934 ms) : 0, 248934
GlobalTracer [candidate] (253.19 ms) : 0, 253190
AppSec [baseline] (185.31 ms) : 0, 185310
AppSec [candidate] (187.795 ms) : 0, 187795
Debugger [baseline] (65.757 ms) : 0, 65757
Debugger [candidate] (65.056 ms) : 0, 65056
Remote Config [baseline] (576.358 µs) : 0, 576
Remote Config [candidate] (592.635 µs) : 0, 593
Telemetry [baseline] (7.702 ms) : 0, 7702
Telemetry [candidate] (12.712 ms) : 0, 12712
Flare Poller [baseline] (8.915 ms) : 0, 8915
Flare Poller [candidate] (4.825 ms) : 0, 4825
IAST [baseline] (24.638 ms) : 0, 24638
IAST [candidate] (25.086 ms) : 0, 25086
ScaReachability [candidate] (22.195 ms) : 0, 22195
Load
Summary
Found 2 performance improvements and 3 performance regressions! Performance is the same for 15 metrics, 16 unstable metrics.
Request duration reports for petclinic
gantt
title petclinic - request duration [CI 0.99] : candidate=1.63.0-SNAPSHOT~e6ba094024, baseline=1.63.0-SNAPSHOT~e6f76bbd49
dateFormat X
axisFormat %s
section baseline
no_agent (17.274 ms) : 17105, 17443
. : milestone, 17274,
appsec (18.931 ms) : 18743, 19119
. : milestone, 18931,
code_origins (17.875 ms) : 17696, 18054
. : milestone, 17875,
iast (17.778 ms) : 17606, 17951
. : milestone, 17778,
profiling (19.134 ms) : 18941, 19327
. : milestone, 19134,
tracing (17.757 ms) : 17584, 17931
. : milestone, 17757,
section candidate
no_agent (19.386 ms) : 19188, 19583
. : milestone, 19386,
appsec (18.483 ms) : 18295, 18671
. : milestone, 18483,
code_origins (18.204 ms) : 18025, 18384
. : milestone, 18204,
iast (17.937 ms) : 17757, 18117
. : milestone, 17937,
profiling (18.095 ms) : 17918, 18273
. : milestone, 18095,
tracing (17.783 ms) : 17610, 17956
. : milestone, 17783,
Request duration reports for insecure-bank
gantt
title insecure-bank - request duration [CI 0.99] : candidate=1.63.0-SNAPSHOT~e6ba094024, baseline=1.63.0-SNAPSHOT~e6f76bbd49
dateFormat X
axisFormat %s
section baseline
no_agent (1.283 ms) : 1271, 1296
. : milestone, 1283,
iast (3.319 ms) : 3274, 3365
. : milestone, 3319,
iast_FULL (5.958 ms) : 5898, 6018
. : milestone, 5958,
iast_GLOBAL (3.729 ms) : 3672, 3786
. : milestone, 3729,
profiling (2.359 ms) : 2336, 2383
. : milestone, 2359,
tracing (1.905 ms) : 1889, 1921
. : milestone, 1905,
section candidate
no_agent (1.277 ms) : 1265, 1289
. : milestone, 1277,
iast (3.307 ms) : 3258, 3356
. : milestone, 3307,
iast_FULL (6.301 ms) : 6236, 6367
. : milestone, 6301,
iast_GLOBAL (3.786 ms) : 3718, 3854
. : milestone, 3786,
profiling (2.31 ms) : 2289, 2332
. : milestone, 2310,
tracing (1.912 ms) : 1897, 1927
. : milestone, 1912,
Dacapo
Summary
Found 0 performance improvements and 0 performance regressions! Performance is the same for 10 metrics, 2 unstable metrics.
Execution time for biojava
gantt
title biojava - execution time [CI 0.99] : candidate=1.63.0-SNAPSHOT~e6ba094024, baseline=1.63.0-SNAPSHOT~e6f76bbd49
dateFormat X
axisFormat %s
section baseline
no_agent (15.462 s) : 15462000, 15462000
. : milestone, 15462000,
appsec (14.609 s) : 14609000, 14609000
. : milestone, 14609000,
iast (18.247 s) : 18247000, 18247000
. : milestone, 18247000,
iast_GLOBAL (18.043 s) : 18043000, 18043000
. : milestone, 18043000,
profiling (14.907 s) : 14907000, 14907000
. : milestone, 14907000,
tracing (14.853 s) : 14853000, 14853000
. : milestone, 14853000,
section candidate
no_agent (15.238 s) : 15238000, 15238000
. : milestone, 15238000,
appsec (14.728 s) : 14728000, 14728000
. : milestone, 14728000,
iast (18.559 s) : 18559000, 18559000
. : milestone, 18559000,
iast_GLOBAL (18.122 s) : 18122000, 18122000
. : milestone, 18122000,
profiling (14.958 s) : 14958000, 14958000
. : milestone, 14958000,
tracing (14.893 s) : 14893000, 14893000
. : milestone, 14893000,
Execution time for tomcat
gantt
title tomcat - execution time [CI 0.99] : candidate=1.63.0-SNAPSHOT~e6ba094024, baseline=1.63.0-SNAPSHOT~e6f76bbd49
dateFormat X
axisFormat %s
section baseline
no_agent (1.492 ms) : 1480, 1503
. : milestone, 1492,
appsec (3.752 ms) : 3534, 3971
. : milestone, 3752,
iast (2.282 ms) : 2213, 2352
. : milestone, 2282,
iast_GLOBAL (2.335 ms) : 2265, 2405
. : milestone, 2335,
profiling (2.507 ms) : 2343, 2671
. : milestone, 2507,
tracing (2.098 ms) : 2044, 2151
. : milestone, 2098,
section candidate
no_agent (1.491 ms) : 1479, 1503
. : milestone, 1491,
appsec (3.85 ms) : 3626, 4075
. : milestone, 3850,
iast (2.283 ms) : 2214, 2353
. : milestone, 2283,
iast_GLOBAL (2.329 ms) : 2259, 2399
. : milestone, 2329,
profiling (2.114 ms) : 2059, 2169
. : milestone, 2114,
tracing (2.091 ms) : 2037, 2145
. : milestone, 2091,
Add dd.appsec.sca.enabled=true scenario to measure overhead introduced by ScaReachabilityTransformer alongside the existing appsec variant. - startup: new 'sca' variant with appsec+sca flags - load: new 'sca' server on port 8086 (cores 43-44), healthcheck range extended to 8086
In } catch (Exception e) {
log.warn("Failed to retransform classes", e);
}
Suggest widening to } catch (Throwable t) {
log.warn("Failed to retransform classes", t);
}
This matches the
InternalError and other JVM Errors from retransformClasses() would escape a catch(Exception) block and kill the telemetry thread silently. Matches the existing catch(Throwable) in transform() for the same reason.
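The difference between the two catch clauses can be shown with a small sketch. `HeartbeatGuard` is a hypothetical wrapper, not code from the PR: `InternalError` extends `Error`, not `Exception`, so only a `catch (Throwable)` keeps it from propagating out of the periodic task.

```java
/** Hypothetical wrapper (not the PR's code) showing why the catch was widened. */
final class HeartbeatGuard {
  private HeartbeatGuard() {}

  /** Runs the task and reports whether it completed; nothing propagates to the caller. */
  static boolean runGuarded(Runnable task) {
    try {
      task.run();
      return true;
    } catch (Throwable t) {
      // A catch (Exception e) here would let InternalError and other Errors escape
      // and terminate the calling thread.
      return false;
    }
  }
}
```

The trade-off is that this also swallows errors such as OutOfMemoryError, so a real implementation would at least log what was caught, as the suggested `log.warn` does.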
start_server "appsec" "-javaagent:${TRACER} -Ddd.appsec.enabled=true -Dserver.port=8083" "taskset -c 37-38 " &
start_server "iast" "-javaagent:${TRACER} -Ddd.iast.enabled=true -Dserver.port=8084" "taskset -c 39-40 " &
start_server "code_origins" "-javaagent:${TRACER} -Ddd.code.origin.for.spans.enabled=true -Dserver.port=8085" "taskset -c 41-42 " &
start_server "sca" "-javaagent:${TRACER} -Ddd.appsec.enabled=true -Ddd.appsec.sca.enabled=true -Dserver.port=8086" "taskset -c 43-44 " &
Did you validate that the cores 43 and 44 are available so we can pin?
Not explicitly - assumed it from the pattern (second socket 24–47, existing servers use up to 41–42). That said, looking at the previous benchmark run the SCA variant was missing from the load results entirely; turns out we also forgot to add it to k6.js (fixed in baf5389). Will validate once the next benchmark run includes the complete setup.
outputs.file(outputFile)
onlyIf {
project.hasProperty('refreshSca') || !outputFile.exists()
Are you planning to run this periodically or is it a one time only approach?
Manual and on-demand. When the database team adds new symbols, we need to run ./gradlew generateScaCvesJson -PrefreshSca (the -PrefreshSca flag overrides the onlyIf skip), then commit the updated sca_cves.json. The task logs a reminder to do so. No automation planned.
Ideally, in the future this should be obtained via Remote Config (RC).
new ScaReachabilityDependencyRegistry();

/** Keyed by {@link #depKey(String, String)}. */
private final ConcurrentHashMap<String, DependencyState> dependencies = new ConcurrentHashMap<>();
This map has no upper bound. Should we allow it to grow indefinitely? We also never clear entries, even when everything has been reported.
Good point. The map is naturally bounded: it grows by one entry per unique artifact@version of a vulnerable library actually loaded, which in practice is at most the number of entries in sca_cves.json (currently 49). Entries are intentionally never removed: the RFC stateful model requires re-reporting if a method hit arrives after an earlier drain, so the dependency state must persist across heartbeats (only pendingReport is cleared, not the entry itself).
That said, a defensive cap makes sense for when the database grows. I can add a MAX_TRACKED_DEPENDENCIES constant and silently skip new entries once it is reached (existing entries would still be updated). Would that address the concern?
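One way the proposed cap could look, as a sketch only. `BoundedDependencyRegistry` is hypothetical, the per-dependency state is simplified to a set of vulnerability IDs, and the limit is passed in rather than fixed as a MAX_TRACKED_DEPENDENCIES constant so that it is easy to exercise.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical registry with a soft cap on tracked dependencies. */
final class BoundedDependencyRegistry {
  private final int maxTrackedDependencies;
  private final ConcurrentHashMap<String, Set<String>> dependencies = new ConcurrentHashMap<>();

  BoundedDependencyRegistry(int maxTrackedDependencies) {
    this.maxTrackedDependencies = maxTrackedDependencies;
  }

  static String depKey(String artifact, String version) {
    return artifact + "@" + version;
  }

  /** Returns false when the entry is new and the cap has been reached; existing entries still update. */
  boolean registerCve(String artifact, String version, String vulnId) {
    String key = depKey(artifact, version);
    Set<String> vulnIds = dependencies.get(key);
    if (vulnIds == null) {
      // Soft cap: under concurrent inserts the map may exceed the limit by a few entries.
      if (dependencies.size() >= maxTrackedDependencies) {
        return false;
      }
      vulnIds = dependencies.computeIfAbsent(key, k -> ConcurrentHashMap.<String>newKeySet());
    }
    vulnIds.add(vulnId);
    return true;
  }

  int size() {
    return dependencies.size();
  }
}
```

The size check and the insert are not atomic together, which is acceptable for a defensive limit but would need a different approach if the bound had to be exact.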
…buildSrc Move the inline generateScaCvesJson task from appsec/build.gradle into a proper Gradle plugin (ScaEnrichmentsPlugin) registered as 'dd-trace-java.sca-enrichments'. The plugin is simpler to test and removes ~90 lines of scripting from the build script. The plugin and committed sca_cves.json are a temporary solution — the symbol database will be delivered via Remote Config in a future iteration, at which point both will be removed.
Use pluginManager.withPlugin("java") to defer the processResources
dependency until after the java plugin has registered the task, avoiding
a configuration-time failure when the plugin is applied before java.
Add 4 integration tests using GradleFixture/TestKit:
- task is SKIPPED when file exists and -PrefreshSca is absent
- task attempts to run (not SKIPPED) when -PrefreshSca is set
- task attempts to run (not SKIPPED) when file is absent
- processResources depends on generateScaCvesJson
| private static final java.util.Set<String> reported = | ||
| java.util.concurrent.ConcurrentHashMap.newKeySet(); |
nit: fully qualified name
Only an LLM would generate that 😁
| if (reported.add(key)) { | ||
| h.onMethodHit(vulnId, artifact, version, dotClassName, methodName, line); | ||
| } | ||
| } catch (Throwable t) { |
Throwable is usually too wide. It includes OutOfMemoryError and StackOverflowError, which I think should be propagated anyway.
Use catch (Exception ex) instead.
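To illustrate the suggestion, here is a hedged sketch of the deduplicating reporter with the narrower catch and regular imports instead of fully qualified names. `HitReporter` and its `Handler` interface are simplified, hypothetical stand-ins for the code in the diff:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Simplified, hypothetical version of the reporting path shown in the diff. */
final class HitReporter {
  interface Handler {
    void onMethodHit(String key);
  }

  private static final Set<String> reported = ConcurrentHashMap.newKeySet();

  /** Returns true only when the handler ran for a new key and completed normally. */
  static boolean report(String key, Handler handler) {
    try {
      if (reported.add(key)) {
        handler.onMethodHit(key);
        return true;
      }
      return false;
    } catch (Exception e) {
      // Handler bugs must not break the instrumented application.
      // Errors such as OutOfMemoryError or StackOverflowError are not caught
      // here and propagate to the caller.
      return false;
    }
  }
}
```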
| } | ||
| } | ||
|
|
||
| static ScaCveDatabase parse(java.io.Reader reader) throws IOException { |
| return index.size(); | ||
| } | ||
|
|
||
| private static String readAll(java.io.Reader reader) throws IOException { |
| static StackTraceElement findCallsite(String vulnerableClass) { | ||
| return findCallsite(vulnerableClass, Thread.currentThread().getStackTrace()); | ||
| } |
Consider using datadog.trace.util.stacktrace.StackWalker to benefit from the StackWalker API available since JDK 9.
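For reference, a small sketch of how the lookup could be expressed with the JDK's own `java.lang.StackWalker` (JDK 9+). It does not use the agent's `datadog.trace.util.stacktrace.StackWalker` wrapper, whose API is not shown in this thread; `CallsiteFinder`, `DemoVulnerableLib`, and `DemoApp` are hypothetical names:

```java
import java.util.Optional;

final class CallsiteFinder {
  /** Returns the first frame below the vulnerable class, i.e. its caller. */
  static Optional<StackWalker.StackFrame> findCallsite(String vulnerableClass) {
    return StackWalker.getInstance()
        .walk(
            frames ->
                frames
                    // skip frames above the vulnerable class (this method itself)
                    .dropWhile(f -> !f.getClassName().equals(vulnerableClass))
                    // skip the vulnerable class's own frames
                    .dropWhile(f -> f.getClassName().equals(vulnerableClass))
                    .findFirst());
  }
}

/** Stands in for a vulnerable library method that has been instrumented. */
final class DemoVulnerableLib {
  static Optional<StackWalker.StackFrame> load() {
    return CallsiteFinder.findCallsite(DemoVulnerableLib.class.getName());
  }
}

/** Stands in for application code calling the vulnerable method. */
final class DemoApp {
  static String run() {
    return DemoVulnerableLib.load().map(StackWalker.StackFrame::getMethodName).orElse("none");
  }
}
```

Because the walk is lazy, frames below the callsite are never materialized, which is the main advantage over `Thread.currentThread().getStackTrace()`.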
| ScaReachabilityDependencyRegistry.INSTANCE.registerCve( | ||
| entry.artifact(), version, entry.vulnId()); | ||
| if (dotClassName == null) { | ||
| dotClassName = className.replace('/', '.'); |
| */ | ||
| public void checkAlreadyLoadedClasses() { | ||
| for (Class<?> clazz : instrumentation.getAllLoadedClasses()) { | ||
| String internalName = clazz.getName().replace('.', '/'); |
I have had issues in the past with getAllLoadedClasses returning some null items, probably caused by rare class unloading in between.
You should not assume that every class is non-null.
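A defensive version of the loop might look like the sketch below. It takes the array as a parameter so it can be exercised without a live `Instrumentation` instance; `LoadedClassScanner` and `internalNames` are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;

final class LoadedClassScanner {
  /** Converts loaded classes to internal names, skipping null array slots. */
  static List<String> internalNames(Class<?>[] loaded) {
    List<String> names = new ArrayList<>();
    if (loaded == null) {
      return names;
    }
    for (Class<?> clazz : loaded) {
      if (clazz == null) {
        continue; // defensive: reviewers have observed null entries in practice
      }
      names.add(clazz.getName().replace('.', '/'));
    }
    return names;
  }
}
```

In the agent, the argument would be the result of `instrumentation.getAllLoadedClasses()`.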
| if (!pendingRetransformNames.isEmpty()) { | ||
| Set<String> matched = new HashSet<>(); | ||
| for (Class<?> loaded : instrumentation.getAllLoadedClasses()) { | ||
| String name = loaded.getName().replace('.', '/'); |
| } | ||
| } | ||
|
|
||
| // package-private for testing |
| } | ||
| pendingReport = false; | ||
| List<CveSnapshot> cveSnapshots = new ArrayList<>(cves.size()); | ||
| for (java.util.Map.Entry<String, CveState> entry : cves.entrySet()) { |
Hi Alejandro! Our benchmarks are being migrated to the apm-sdks-benchmarks implementation, e.g. https://github.com/DataDog/apm-sdks-benchmarks/blob/main/.gitlab/ci-java-load-parallel.yml and https://github.com/DataDog/apm-sdks-benchmarks/blob/main/.gitlab/ci-java-startup-parallel.yml, so ideally no more changes are made to the local benchmarks/ folder 😅
WDYT of this PR that ports your changes here to the new implementation? https://github.com/DataDog/apm-sdks-benchmarks/pull/161 (It's tested here: #11504)
Apologies for the confusion -- my plan was to remove these dd-trace-java benchmarks (#11502) earlier, but I had been waiting for more data from the apm-sdks-benchmarks one.
What Does This Do
Implements the SCA Reachability subsystem for the Java APM agent. When `DD_APPSEC_SCA_ENABLED=true`, the agent detects at runtime which classes and methods from vulnerable library versions are actually loaded and invoked, and reports this via `app-dependencies-loaded` telemetry using the RFC stateful heartbeat model.
Build - Symbol database
Detection - ClassFileTransformer (two-phase design)
`ScaReachabilityTransformer` uses a two-phase approach to keep JAR I/O off the class-loading thread:
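The exact split between the two phases is not spelled out in this text, so the following is only an assumed illustration of the general idea: the class-loading thread does a cheap lookup and enqueues the name, and a separate thread later performs the slow version resolution. `TwoPhaseDetector`, `onClassLoad`, `drain`, and the resolver function are all hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Function;

/** Hypothetical illustration of a two-phase design; not the PR's actual transformer. */
final class TwoPhaseDetector {
  private final Set<String> watchedClasses;
  private final Queue<String> pending = new ConcurrentLinkedQueue<>();

  TwoPhaseDetector(Set<String> watchedClasses) {
    this.watchedClasses = watchedClasses;
  }

  /** Phase 1: runs on the class-loading thread. Set lookup and enqueue only, no I/O. */
  void onClassLoad(String internalName) {
    if (watchedClasses.contains(internalName)) {
      pending.add(internalName);
    }
  }

  /** Phase 2: runs on a background thread, where slow work such as JAR reads is acceptable. */
  List<String> drain(Function<String, String> versionResolver) {
    List<String> resolved = new ArrayList<>();
    String name;
    while ((name = pending.poll()) != null) {
      resolved.add(name + "@" + versionResolver.apply(name));
    }
    return resolved;
  }
}
```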
Additional transformer details:
Callsite detection
`ScaReachabilitySystem.findCallsite()` walks the thread stack to report the application frame that called the vulnerable method, not the vulnerable method itself:
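As a rough sketch of that walk over a captured `StackTraceElement[]`: once a frame of the vulnerable class has been seen, the first later frame that is neither the vulnerable class nor excluded is taken as the callsite. The prefix list is a hypothetical stand-in for the exclusion trie and the Datadog-frame filter, and `CallsiteScan` is not the PR's class:

```java
import java.util.Arrays;
import java.util.List;

final class CallsiteScan {
  // Hypothetical prefixes standing in for sca_stack_exclusion.trie and the Datadog-frame filter.
  static final List<String> EXCLUDED_PREFIXES = Arrays.asList("java.", "datadog.");

  /** Returns the first non-excluded frame below the vulnerable class, or null if none. */
  static StackTraceElement findCallsite(String vulnerableClass, StackTraceElement[] stack) {
    boolean seenVulnerable = false;
    for (StackTraceElement frame : stack) {
      String className = frame.getClassName();
      if (className.equals(vulnerableClass)) {
        seenVulnerable = true;
      } else if (seenVulnerable && !isExcluded(className)) {
        return frame;
      }
    }
    return null;
  }

  private static boolean isExcluded(String className) {
    for (String prefix : EXCLUDED_PREFIXES) {
      if (className.startsWith(prefix)) {
        return true;
      }
    }
    return false;
  }
}
```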
Telemetry - RFC stateful heartbeat
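A minimal sketch of the stateful model as described in the review discussion: state persists across heartbeats, a drain clears only the pending flag, and a later hit makes the dependency reportable again. `StatefulDependency` is a hypothetical simplification, and re-sending the full accumulated state on each report is an assumption here:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical simplification of per-dependency state; not the PR's DependencyState. */
final class StatefulDependency {
  private final Set<String> hits = new LinkedHashSet<>();
  private boolean pendingReport;

  /** Records a class or method hit; only new symbols mark the state as pending. */
  synchronized void onHit(String symbol) {
    if (hits.add(symbol)) {
      pendingReport = true;
    }
  }

  /** Called on each heartbeat. Returns a snapshot if something changed, else null. */
  synchronized List<String> drain() {
    if (!pendingReport) {
      return null;
    }
    pendingReport = false; // the entry itself is kept for later re-reporting
    return new ArrayList<>(hits);
  }
}
```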
Benchmarks
Added `sca` variant (`-Ddd.appsec.enabled=true -Ddd.appsec.sca.enabled=true`) to startup and load petclinic benchmarks to measure `ScaReachabilityTransformer` overhead against the appsec baseline.
Motivation
Implements APPSEC-62260 - SCA Reachability runtime detection per the RFC.
Additional Notes
The `generateScaCvesJson` Gradle plugin and the committed `sca_cves.json` are a temporary solution. In a future iteration the SCA symbol database will be delivered at runtime via Remote Config, at which point this plugin and the committed JSON file will be removed.
`sca_cves.json` is committed to the repo because `sca-reachability-database` is a private repo not accessible from CI without a token. Only the maintainer runs the refresh task.
Method-level symbols in `sca_cves.json` (snakeyaml `load`/`loadAll`, xstream `fromXML`, etc.) are manually curated for testing. The database format does not yet define method-level entries; when it does, the Gradle task will be updated and the manual entries removed.
`AbstractStackWalker.isNotDatadogTraceStackElement` visibility changed from package-private to `public` to allow callsite filtering from the `appsec` module.
`sca_stack_exclusion.trie` is a curated copy of the IAST exclusion trie adapted for SCA callsite detection (testing framework entries removed). The trie cannot be reused directly from IAST without a circular dependency.
The `ScaReachabilitySmokeTest` extends `AbstractAppSecServerSmokeTest` and remains Groovy until that base class is migrated to Java.
Contributor Checklist
Use `solves` instead, and assign the PR milestone to the issue
Jira ticket: APPSEC-62260