
dynamic_modules: add remote HTTP support for asyncDataSource module binary #43818

Merged
wbpcode merged 6 commits into envoyproxy:main from kanurag94:dym-module-http
Mar 11, 2026

Conversation

@kanurag94
Contributor

@kanurag94 kanurag94 commented Mar 6, 2026

Commit Message: dynamic_modules: add remote HTTP support for asyncDataSource module binary
Additional Description: Extends the asyncDataSource support (added in #43333 for local.filename) to support fetching dynamic module binaries from remote HTTP sources via module.remote in DynamicModuleConfig. The module is downloaded asynchronously during listener initialization with SHA256 verification, written to a temporary file, and loaded via dlopen. If the remote fetch fails, the filter is not installed and requests pass through (fail-open).
Risk Level: Low
Testing: Added unit tests + Manually tested
Docs Changes: API docs already a part of this PR
Release Notes: Added

Signed-off-by: Anurag Aggarwal <kanurag94@gmail.com>
@repokitteh-read-only

As a reminder, PRs marked as draft will not be automatically assigned reviewers,
or be handled by maintainer-oncall triage.

Please mark your PR as ready when you want it to be reviewed!

🐱

Caused by: #43818 was opened by kanurag94.

see: more, trace.

Signed-off-by: Anurag Aggarwal <kanurag94@gmail.com>
Signed-off-by: Anurag Aggarwal <kanurag94@gmail.com>
@kanurag94
Contributor Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces the ability to fetch dynamic modules from remote HTTP sources asynchronously. However, critical security vulnerabilities were identified in the caching and temporary file handling mechanisms. These include the use of the world-writable /tmp directory with predictable filenames and a lack of integrity verification for cached files, which could lead to local privilege escalation and arbitrary code execution. The staging file creation is also vulnerable to symlink attacks and race conditions. Furthermore, a critical use-after-free bug was found in the asynchronous callback implementation.

Comment on lines +146 to +147
[weak_state, proto_config_copy, module_config_copy, &context,
&scope](const std::string& data) {
Contributor

critical

The lambda for the RemoteAsyncDataProvider captures context and scope by reference. These references point to objects that are typically allocated on the stack during the configuration loading process. Since the lambda is executed asynchronously after the remote data fetch completes, the original stack frame will be gone, leading to dangling references and a use-after-free. This is a critical issue that could lead to crashes.

To resolve this, you should avoid capturing stack variables by reference in an asynchronous callback. Instead, you should extract the required long-lived objects (e.g., from context.api(), context.mainThreadDispatcher(), etc.) and capture those. This will likely require refactoring buildFilterFactoryCallback to accept these individual components instead of the entire ServerFactoryContext.

Contributor Author

  • ServerFactoryContext is long-lived and not stack-allocated.
  • stats::scope is owned by the listener and is also long-lived.
  • RemoteAsyncDataProvider is fired during init, so both references are guaranteed to be valid.

This comment seems wrong.
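
The lifetime argument in this thread can be illustrated with a minimal sketch. It is not the PR's code: `AsyncState`, `LongLivedContext`, and `makeFetchCallback` are hypothetical stand-ins. The shared state is captured weakly and checked before use, while the context is captured by reference, which is only safe under the stated assumption that it outlives every callback.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for the state shared between the filter factory
// and the fetch-completion callback.
struct AsyncState {
  std::string module_bytes;
  bool loaded = false;
};

// Hypothetical stand-in for a server-owned object that outlives all callbacks.
struct LongLivedContext {
  int modules_loaded = 0;
};

using FetchCallback = std::function<void(const std::string&)>;

// Builds the completion callback. The state is held through a weak_ptr, so a
// configuration that was torn down before the fetch finished is skipped.
// The context is captured by reference; that is only valid because the caller
// guarantees the context outlives the callback.
FetchCallback makeFetchCallback(std::weak_ptr<AsyncState> weak_state,
                                LongLivedContext& context) {
  return [weak_state = std::move(weak_state), &context](const std::string& data) {
    std::shared_ptr<AsyncState> state = weak_state.lock();
    if (state == nullptr) {
      return; // The owner is gone, so there is nothing to update.
    }
    state->module_bytes = data;
    state->loaded = true;
    ++context.modules_loaded;
  };
}
```

If the context were a stack object of the configuration call, the by-reference capture would dangle, which is the case the bot's comment describes; the reply above argues that this case does not apply here.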

Comment thread source/extensions/dynamic_modules/dynamic_modules.cc Outdated
Comment thread source/extensions/dynamic_modules/dynamic_modules.cc Outdated
Signed-off-by: Anurag Aggarwal <kanurag94@gmail.com>
Signed-off-by: Anurag Aggarwal <kanurag94@gmail.com>
@kanurag94
Contributor Author

/retest


@kanurag94 kanurag94 changed the title from "dynamic_modules: asyncDataSource - http support" to "dynamic_modules: add remote HTTP support for asyncDataSource module binary" Mar 9, 2026
@kanurag94 kanurag94 marked this pull request as ready for review March 9, 2026 03:38
@repokitteh-read-only

CC @envoyproxy/coverage-shephards: FYI only for changes made to (test/coverage.yaml).
envoyproxy/coverage-shephards assignee is @RyanTheOptimist
CC @envoyproxy/api-shepherds: Your approval is needed for changes made to (api/envoy/|docs/root/api-docs/).
envoyproxy/api-shepherds assignee is @wbpcode
CC @envoyproxy/api-watchers: FYI only for changes made to (api/envoy/|docs/root/api-docs/).

🐱

Caused by: #43818 was ready_for_review by kanurag94.

see: more, trace.

Comment thread test/coverage.yaml Outdated
  source/extensions/filters/http/cache: 95.9
  source/extensions/filters/http/dynamic_forward_proxy: 94.8
- source/extensions/filters/http/dynamic_modules: 95.2
+ source/extensions/filters/http/dynamic_modules: 95.0
Member

Do we really have to decrease the coverage here? Is there no way to have an integration test that covers everything?

Contributor Author

The coverage drop comes mainly from defensive paths:

  • mkstemp() failure -- I'm not sure exactly what triggers it, but probably only file descriptor exhaustion.
  • write() failure -- needs interrupt signals or disk permission issues.
  • std::filesystem::permissions/rename failure -- needs mocking of read-only permissions.
  • weak_ptr expiry -- needs a race condition to trigger.

They'd need exceptional situations to happen. All of them are very tough to cover in UTs, so I lowered the coverage instead.

Writing an integration test is quite hard for this one, if the approach looks good -- I can give it a try.

I have tried to increase coverage back with some possible paths that can be tested.

Signed-off-by: Anurag Aggarwal <kanurag94@gmail.com>
cb_or_error.status().message());
return;
}
state->filter_factory_cb = cb_or_error.value();
Member

Because the following lambda may be called on any thread, I think this may result in unexpected contention here?

[async_state](Http::FilterChainFactoryCallbacks& callbacks) -> void {
    if (async_state->filter_factory_cb) {
      async_state->filter_factory_cb(callbacks);
    }
  };

For the first version, to simplify the review, I think we can use a lock and leave a TODO here. Then we can remove the lock in another PR. cc @agrawroh WDYT?

Member

wait, the remote fetching will pause the warming...so, it may be safe

Member

@wbpcode The remote fetching will pause the warming...so, it may be safe...

Yes, that's my impression as well.
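
The pattern discussed in this thread can be sketched as follows. This is a simplified illustration, not Envoy's API: `FilterChainCallbacks`, `AsyncState`, and `makeDeferredFactory` are hypothetical stand-ins. The sketch assumes, as the reviewers conclude, that the shared callback is written once on the main thread before warming completes and is only read afterwards, so no lock is taken.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for Http::FilterChainFactoryCallbacks.
struct FilterChainCallbacks {
  std::vector<std::string> filters;
};

using FilterFactoryCb = std::function<void(FilterChainCallbacks&)>;

// Hypothetical shared state. The callback is assumed to be written once,
// before any worker thread starts building filter chains.
struct AsyncState {
  FilterFactoryCb filter_factory_cb;
};

// Returns the factory that is handed out immediately, before the fetch finishes.
// If the real factory was never installed (for example, the fetch failed),
// the wrapper adds no filter and the request passes through (fail-open).
FilterFactoryCb makeDeferredFactory(std::shared_ptr<AsyncState> async_state) {
  return [async_state = std::move(async_state)](FilterChainCallbacks& callbacks) {
    if (async_state->filter_factory_cb) {
      async_state->filter_factory_cb(callbacks);
    }
  };
}
```

If the write could ever happen while workers are already serving traffic, the read of `filter_factory_cb` would be a data race and a lock or an atomic handoff would be needed; that is the case the original question was about.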

@wbpcode
Member

wbpcode commented Mar 10, 2026

After rethinking the implementation, I have changed my mind a little. There are three different solutions for async data fetching:

  1. The current RemoteAsyncDataProvider. It blocks the warming of the filter_chain/listener until the initial fetch succeeds or fails. But it cannot support per-route filter configuration or route-level filter chains (via composite/filter_chain or anything similar).
  2. Always NACK on a cache miss and start an async task in the background. The configuration is accepted only once the data has been downloaded. This will result in some configuration rejections and, in the worst case, may block Envoy's bootstrap if the remote data server is down.
  3. Always accept the configuration first and fetch the data in the background. This may result in inconsistent behavior or even security problems.

I am thinking that maybe 2 is the better way to go, or a hybrid of 1 and 2. WDYT? cc @agrawroh @kanurag94
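
To make the comparison concrete, the three options can be written as a small decision function. This is only a sketch of the trade-off as described in the comment above; the enum names, the `Decision` struct, and the assumption that a locally available module is always accepted without a fetch are hypothetical and not part of Envoy.

```cpp
// Hypothetical names for the three options in the list above.
enum class FetchStrategy {
  BlockWarming,              // Option 1: hold warming until the fetch completes.
  NackOnCacheMiss,           // Option 2: reject now, fetch in the background.
  AcceptAndFetchInBackground // Option 3: accept now, fetch in the background.
};

enum class ConfigOutcome { Accepted, Rejected, PendingWarming };

struct Decision {
  ConfigOutcome outcome; // What the control plane observes for this update.
  bool start_fetch;      // Whether a remote fetch is started.
};

// Decides how a configuration update is handled. `module_available_locally`
// models a cache hit; the sketch assumes a hit needs no fetch in any strategy.
Decision decide(FetchStrategy strategy, bool module_available_locally) {
  if (module_available_locally) {
    return {ConfigOutcome::Accepted, false};
  }
  switch (strategy) {
  case FetchStrategy::BlockWarming:
    return {ConfigOutcome::PendingWarming, true};
  case FetchStrategy::NackOnCacheMiss:
    return {ConfigOutcome::Rejected, true};
  case FetchStrategy::AcceptAndFetchInBackground:
    return {ConfigOutcome::Accepted, true};
  }
  return {ConfigOutcome::Rejected, false}; // Unreachable for valid enum values.
}
```

Under option 2 the same configuration would be rejected on the first attempt and accepted on a later retry once the background fetch has populated the cache, which is why that option depends on a module cache.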

@kanurag94
Contributor Author

I agree. I wanted to implement 1 in this PR, and 2 in a follow up since we discussed splitting it into two PRs when I raised #43333 (nack_on_cache_miss). I can add the functionality here if needed. Or document the per-route level behavior (though I will raise a PR very soon).

@wbpcode
Member

wbpcode commented Mar 11, 2026

SGTM. :)

@agrawroh
Member

@kanurag94 @wbpcode

Option 2 would require a module cache, which is non-trivial, so I'm okay with this PR in its current form, and I agree with the plan to add Option 2 as a follow-up. The per-route limitation is pre-existing and acceptable.

Member

@wbpcode wbpcode left a comment

LGTM. Thanks.

@wbpcode wbpcode merged commit cfa9b86 into envoyproxy:main Mar 11, 2026
30 checks passed
@wbpcode
Member

wbpcode commented Mar 11, 2026

I think we can then add the following in follow-up PRs:

  • cache support
  • nack on cache miss

bmjask pushed a commit to bmjask/envoy that referenced this pull request Mar 14, 2026
…inary (envoyproxy#43818)

Signed-off-by: bjmask <11672696+bjmask@users.noreply.github.com>
wbpcode pushed a commit that referenced this pull request Mar 18, 2026
**Commit Message**: dynamic_modules: remote source: add caching for
fetched modules

**Motivation**
Follow up from #43818 and #43333
Next PR: `nack_on_cache_miss` support (NACK on cache miss + background
fetch)

**Additional Description**: 

`newDynamicModuleFromBytes` already writes fetched modules to a
deterministic path based on SHA256. This PR adds a check before the
async fetch: if the file already exists on disk, load it directly and
skip the HTTP fetch entirely.

It is a simple approach that forms the base for the next PR.

**Risk Level**: Low
**Testing**: Added unit tests
**Docs Changes**: NA
**Release Notes**: Added
Platform Specific Features: NA
[Optional Runtime guard:] Not needed
[Optional Fixes #Issue]: NA
[Optional Fixes commit #PR or SHA]: NA
[Optional Deprecated:]: NA
[Optional [API
Considerations](https://github.com/envoyproxy/envoy/blob/main/api/review_checklist.md):]
NA

---------

Signed-off-by: Anurag Aggarwal <kanurag94@gmail.com>
@isker
Contributor

isker commented Mar 18, 2026

Could the "from bytes" support added here be used to enable local inline_bytes support as well? https://www.envoyproxy.io/docs/envoy/v1.37.1/api-v3/config/core/v3/base.proto#envoy-v3-api-msg-config-core-v3-datasource

@kanurag94
Contributor Author

We discussed this in review and concluded that inline_bytes may never actually be useful, except for completeness. #43333 (comment)

fishcakez pushed a commit to fishcakez/envoy that referenced this pull request Mar 25, 2026
…inary (envoyproxy#43818)

fishcakez pushed a commit to fishcakez/envoy that referenced this pull request Mar 25, 2026
…yproxy#43974)

TAOXUY pushed a commit to TAOXUY/envoy that referenced this pull request Apr 1, 2026
…yproxy#43974)

Signed-off-by: Xuyang Tao <taoxuy@google.com>
nshipilov pushed a commit to nshipilov/envoy that referenced this pull request Apr 13, 2026
…yproxy#43974)

Signed-off-by: Nick Shipilov <nick.shipilov.n@gmail.com>
kpramesh2212 pushed a commit to kpramesh2212/envoy that referenced this pull request Apr 14, 2026
…yproxy#43974)
