
Conversation


@lievan lievan commented Jun 24, 2025

Tracks the number of tokens read from and written to the prompt cache for Anthropic.

https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching

Anthropic returns `cache_creation_input_tokens` and `cache_read_input_tokens` in its `usage` field.

We map these to the `cache_write_input_tokens` and `cache_read_input_tokens` keys in our `metrics` field.
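
For illustration, a minimal sketch of this mapping (with assumed helper names, not the actual integration code; the Anthropic field names come from the prompt caching docs linked above):

```python
def _get_usage_field(usage, field):
    # The streamed case hands us a plain dict; the non-streamed case
    # hands us the SDK's Usage object, so support both.
    if isinstance(usage, dict):
        return usage.get(field)
    return getattr(usage, field, None)


def extract_cache_metrics(usage):
    """Map Anthropic cache usage fields onto the LLMObs metric keys."""
    metrics = {}
    cache_write = _get_usage_field(usage, "cache_creation_input_tokens")
    cache_read = _get_usage_field(usage, "cache_read_input_tokens")
    if cache_write is not None:
        metrics["cache_write_input_tokens"] = cache_write
    if cache_read is not None:
        metrics["cache_read_input_tokens"] = cache_read
    return metrics
```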

Testing is blocked on DataDog/dd-apm-test-agent#217

Implementation note

Right now, we use `get_llmobs_metrics_tags` to set metrics for Anthropic, which depends on `set_metric` and `get_metric`. We do not want to continue this pattern for prompt caching, so we instead extract the cache token counts directly from the `response.usage` field.

The caveat is that, for the streamed case, the `usage` field is a dictionary that we construct manually while parsing the streamed chunks.
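
A rough sketch of that manual construction (illustrative only; the real parsing lives in `ddtrace/contrib/internal/anthropic/_streaming.py`, and the event types and attribute names reflect Anthropic's streaming events, while the helper itself is hypothetical):

```python
def accumulate_usage(chunks):
    usage = {}
    for chunk in chunks:
        if chunk.type == "message_start":
            # The message_start event carries prompt-side usage,
            # including the cache counters.
            start_usage = chunk.message.usage
            for field in (
                "input_tokens",
                "cache_creation_input_tokens",
                "cache_read_input_tokens",
            ):
                value = getattr(start_usage, field, None)
                if value is not None:
                    usage[field] = value
        elif chunk.type == "message_delta":
            # message_delta events report the running output token total.
            usage["output_tokens"] = getattr(chunk.usage, "output_tokens", 0)
    return usage
```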

Follow ups

  1. Move all the unit tests to use the `llmobs_events` fixture
  2. Decouple `metrics` parsing from `set_metric`/`get_metric` completely

Checklist

  • PR author has checked that all the criteria below are met
  • The PR description includes an overview of the change
  • The PR description articulates the motivation for the change
  • The change includes tests OR the PR description describes a testing strategy
  • The PR description notes risks associated with the change, if any
  • Newly-added code is easy to change
  • The change follows the library release note guidelines
  • The change includes or references documentation updates if necessary
  • Backport labels are set (if applicable)

Reviewer Checklist

  • Reviewer has checked that all the criteria below are met
  • Title is accurate
  • All changes are related to the pull request's stated goal
  • Avoids breaking API changes
  • Testing strategy adequately addresses listed risks
  • Newly-added code is easy to change
  • Release note makes sense to a user of the library
  • If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
  • Backport labels are set in a manner that is consistent with the release branch maintenance policy


github-actions bot commented Jun 24, 2025

CODEOWNERS have been resolved as:

releasenotes/notes/ant-p-cache-3d4001a431cedd67.yaml                    @DataDog/apm-python
tests/contrib/anthropic/cassettes/anthropic_completion_cache_read.yaml  @DataDog/ml-observability
tests/contrib/anthropic/cassettes/anthropic_completion_cache_write.yaml  @DataDog/ml-observability
tests/contrib/anthropic/cassettes/anthropic_completion_stream_cache_read.yaml  @DataDog/ml-observability
tests/contrib/anthropic/cassettes/anthropic_completion_stream_cache_write.yaml  @DataDog/ml-observability
ddtrace/contrib/internal/anthropic/_streaming.py                        @DataDog/ml-observability
ddtrace/llmobs/_integrations/anthropic.py                               @DataDog/ml-observability
tests/contrib/anthropic/test_anthropic_llmobs.py                        @DataDog/ml-observability


github-actions bot commented Jun 24, 2025

Bootstrap import analysis

Comparison of import times between this PR and base.

Summary

The average import time from this PR is: 275 ± 4 ms.

The average import time from base is: 281 ± 4 ms.

The import time difference between this PR and base is: -5.1 ± 0.2 ms.

Import time breakdown

The following import paths have shrunk:

ddtrace.auto 2.349 ms (0.85%)
ddtrace.bootstrap.sitecustomize 1.667 ms (0.61%)
ddtrace.bootstrap.preload 1.547 ms (0.56%)
ddtrace.internal.remoteconfig.client 0.705 ms (0.26%)
ddtrace.appsec._common_module_patches 0.120 ms (0.04%)
ddtrace.appsec._asm_request_context 0.120 ms (0.04%)
ddtrace.appsec._utils 0.120 ms (0.04%)
ddtrace 0.682 ms (0.25%)
ddtrace.internal._unpatched 0.034 ms (0.01%)
json 0.034 ms (0.01%)
json.decoder 0.034 ms (0.01%)
re 0.034 ms (0.01%)
enum 0.034 ms (0.01%)
types 0.034 ms (0.01%)


pr-commenter bot commented Jun 24, 2025

Benchmarks

Benchmark execution time: 2025-07-04 19:17:10

Comparing candidate commit 43deda5 in PR branch evan.li/anthropic-prompt-caching with baseline commit a8419a4 in branch main.

Found 0 performance improvements and 1 performance regression. Performance is the same for 546 metrics, 3 unstable metrics.

scenario:iastaspectsospath-ospathsplitdrive_aspect

  • 🟥 execution_time [+262.999ns; +374.999ns] or [+7.171%; +10.225%]

@lievan lievan marked this pull request as ready for review June 24, 2025 20:04
@lievan lievan requested review from a team as code owners June 24, 2025 20:04
@lievan lievan requested review from ZStriker19 and nsrip-dd June 24, 2025 20:04
lievan added 2 commits June 24, 2025 14:25
@lievan lievan requested review from a team as code owners July 3, 2025 15:50
@lievan lievan requested review from gnufede and juanjux July 3, 2025 15:50
@lievan lievan merged commit 9776e38 into main Jul 7, 2025
468 of 469 checks passed
@lievan lievan deleted the evan.li/anthropic-prompt-caching branch July 7, 2025 16:29
alyshawang pushed a commit that referenced this pull request Jul 25, 2025
