add pagination & remote search flow for remote storage #9708

Closed
Light2Dark wants to merge 5 commits into main from sham/storage-inspector-pagination

@Light2Dark Light2Dark commented May 28, 2026

Collaborator

📝 Summary

Fixes #9662. This PR adds a "Load more" button to the remote storage panel; clicking it fetches the next page of entries.

The button also appears when a prefix search (e.g. folder/filename) returns no entries, so the search can continue beyond the entries already loaded. A bug with Google Drive's file id has also been fixed.
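
As a rough illustration of the prefix search mentioned above, a query such as folder/filename can be split into a listing prefix and a name filter. This is a minimal sketch, not the code in this PR: `split_search_query` and `filter_entries` are hypothetical names, and the matching rule (case-insensitive substring) is an assumption.

```python
def split_search_query(query: str) -> tuple[str, str]:
    """Split 'folder/filename' into ('folder/', 'filename').

    Hypothetical helper: everything up to the last '/' is treated as the
    prefix to list remotely, the remainder as a name filter.
    """
    head, sep, tail = query.rpartition("/")
    return head + sep, tail


def filter_entries(paths: list[str], query: str) -> list[str]:
    """Keep the paths under the query's prefix whose remainder matches."""
    prefix, name = split_search_query(query)
    needle = name.lower()
    return [
        path
        for path in paths
        if path.startswith(prefix) and needle in path[len(prefix):].lower()
    ]
```

When `filter_entries` returns nothing for the entries loaded so far, the derived prefix is what a "Load more" request could be issued against.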

Screen.Recording.2026-05-28.at.10.24.22.AM.mov

📋 Pre-Review Checklist

  • For large changes, or changes that affect the public API: this change was discussed or approved through an issue, on Discord, or the community discussions (Please provide a link if applicable).
  • Any AI generated code has been reviewed line-by-line by the human PR author, who stands by it.
  • Video or media evidence is provided for any visual changes (optional).

✅ Merge Checklist

  • I have read the contributor guidelines.
  • Documentation has been updated where applicable, including docstrings for API changes.
  • Tests have been added for the changes made.

@vercel

vercel Bot commented May 28, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| marimo-docs | Ready | Preview, Comment | Jun 9, 2026 7:26am |


@github-actions github-actions Bot added the bash-focus Area to focus on during release bug bash label May 28, 2026

@cubic-dev-ai cubic-dev-ai Bot left a comment

Contributor


2 issues found across 16 files

Architecture diagram
sequenceDiagram
    participant UI as React UI (StorageInspector)
    participant Hook as useStorageEntries Hook
    participant State as Jotai Store (state.ts)
    participant API as Server API (commands.py)
    participant Backend as StorageBackend (storage.py)
    participant Obstore as Obstore (S3/GCS)
    participant Fsspec as FsspecFilesystem (local/SFTP)

    Note over UI,Fsspec: Storage List with Pagination

    UI->>Hook: Expand directory or section
    Hook->>State: Check cache (entriesByPath)
    alt Cache miss
        State-->>Hook: No cached entries
        Hook->>API: NEW: listEntries(namespace, prefix, limit, pageToken=null)
        API->>Backend: list_entries(prefix, limit, page_token)
        
        alt Obstore backend
            Backend->>Obstore: list_with_delimiter(prefix)
            Obstore-->>Backend: Raw entries (may be truncated at 1000)
            Backend->>Backend: _paginate_entries(offset=0, limit)
            Backend-->>API: StorageListResult(entries, nextPageToken, mayHaveMore)
        else Fsspec backend
            Backend->>Fsspec: ls(prefix, detail=True)
            Fsspec-->>Backend: File list
            Backend->>Backend: _paginate_entries(offset=0, limit)
            Backend-->>API: StorageListResult(entries, nextPageToken)
        end
        
        API-->>Hook: NEW: StorageEntriesNotification(entries, nextPageToken, mayHaveMore)
        Hook->>State: NEW: setEntries({entries, nextPageToken, mayHaveMore})
        State-->>Hook: entries, hasMore, mayHaveMore
        Hook-->>UI: Render entries + "Load more" button
    else Cache hit
        State-->>Hook: Cached entries + pagination metadata
        Hook-->>UI: Render entries
        alt hasMore (nextPageToken != null)
            UI->>Hook: Show "Load more" button
        else mayHaveMore (no token, but may exist)
            UI->>Hook: Show "May exist more" indicator
        end
    end

    Note over UI,Hook: Load More Click

    alt hasMore is true
        UI->>Hook: loadMore()
        Hook->>Hook: setIsLoadingMore(true)
        Hook->>API: NEW: listEntries(namespace, prefix, limit, pageToken=nextPageToken)
        API->>Backend: list_entries(prefix, limit, page_token)
        
        alt Obstore backend
            Backend->>Obstore: list_with_delimiter(prefix)
            Obstore-->>Backend: Raw entries
            Backend->>Backend: _paginate_entries(offset=parseInt(pageToken), limit)
            Backend-->>API: StorageListResult(entries, nextPageToken, mayHaveMore)
        else Fsspec backend
            Backend->>Fsspec: ls(prefix, detail=True)
            Fsspec-->>Backend: File list
            Backend->>Backend: _paginate_entries(offset=parseInt(pageToken), limit)
            Backend-->>API: StorageListResult(entries, nextPageToken)
        end
        
        API-->>Hook: NEW: StorageEntriesNotification(entries, nextPageToken, mayHaveMore)
        Hook->>State: NEW: setEntries({entries, append:true, nextPageToken, mayHaveMore})
        State-->>Hook: Appended entries + updated metadata
        Hook-->>UI: Render appended entries
        alt Has more pages
            UI->>Hook: Keep "Load more" button
        else No more pages but mayHaveMore
            UI->>Hook: Show "Maybe more" indicator
        end
    end

    Note over Hook,State: Error Handling

    alt Load more fails
        Hook->>Hook: catch error -> setLoadMoreError
        Hook-->>UI: Show error message with retry
        UI->>Hook: loadMore() retry
    end
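
The cache and append behavior in the diagram can be modeled in a few lines. The sketch below is a language-neutral model written in Python, not the actual `state.ts` code: `PathState` and `EntriesCache` are hypothetical names, and only the fields named in the diagram (`nextPageToken`, `mayHaveMore`, `append`) are taken from the source.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PathState:
    """Entries plus pagination metadata cached for one path."""

    entries: list[str] = field(default_factory=list)
    next_page_token: Optional[str] = None
    may_have_more: bool = False

    @property
    def has_more(self) -> bool:
        # A token means the backend can definitely serve another page.
        return self.next_page_token is not None


class EntriesCache:
    """Hypothetical stand-in for the per-path store in the diagram."""

    def __init__(self) -> None:
        self._by_path: dict[str, PathState] = {}

    def get(self, path: str) -> Optional[PathState]:
        return self._by_path.get(path)

    def set_entries(
        self,
        path: str,
        entries: list[str],
        next_page_token: Optional[str],
        may_have_more: bool = False,
        append: bool = False,
    ) -> PathState:
        previous = self._by_path.get(path)
        # "Load more" appends to the cached page; a fresh listing replaces it.
        if append and previous is not None:
            merged = previous.entries + list(entries)
        else:
            merged = list(entries)
        state = PathState(merged, next_page_token, may_have_more)
        self._by_path[path] = state
        return state
```

In this model the UI would show "Load more" while `has_more` is true, and fall back to the weaker "may have more" hint when there is no token but `may_have_more` is set.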

Comment thread marimo/_data/_external_storage/storage.py Outdated
Comment thread frontend/src/core/storage/state.ts Outdated
@Light2Dark Light2Dark changed the title from "implement pagination flow for remote storage" to "implement pagination & remote search flow for remote storage" May 28, 2026
@Light2Dark Light2Dark changed the title from "implement pagination & remote search flow for remote storage" to "add pagination & remote search flow for remote storage" May 28, 2026

@cubic-dev-ai cubic-dev-ai Bot left a comment

Contributor


2 issues found across 6 files (changes from recent commits).


Comment thread frontend/src/core/storage/state.ts Outdated
Comment thread frontend/src/components/storage/storage-inspector.tsx Outdated
@Light2Dark Light2Dark added the bug Something isn't working label May 28, 2026
@Light2Dark Light2Dark marked this pull request as ready for review June 1, 2026 16:24
Copilot AI review requested due to automatic review settings June 1, 2026 16:25

Copilot AI left a comment

Contributor


Pull request overview

This PR addresses remote storage listings being effectively truncated (e.g., delimiter-based object store listings commonly cap results) by introducing a pagination model plus a “remote search continuation” flow, so users can browse and search beyond the first page of results (Fixes #9662).

Changes:

  • Add page-based listing support to the runtime storage command/notification flow (page_token request; next_page_token + may_have_more response metadata).
  • Implement offset-based pagination and provider-truncation detection in external storage backends (obstore/fsspec) via a StorageListResult return type.
  • Update tests (backend + runtime + frontend) to cover pagination metadata, “load more”, and remote-search continuation behavior.
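
The offset-based pagination and truncation signaling described in the list above might look roughly like the following. This is a sketch under stated assumptions, not the code in `storage.py`: the `StorageListResult` field names mirror the `next_page_token` and `may_have_more` metadata named above, while `paginate_entries`, its signature, and the `provider_cap` parameter are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class StorageListResult:
    entries: list[str]
    next_page_token: Optional[str] = None
    may_have_more: bool = False


def paginate_entries(
    all_entries: Sequence[str],
    limit: int,
    page_token: Optional[str] = None,
    provider_cap: Optional[int] = None,
) -> StorageListResult:
    """Return one page of `all_entries`, using the offset as the token.

    `provider_cap` (assumed) is the maximum number of entries the provider
    returns per listing; reaching it suggests the listing was truncated.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    offset = int(page_token) if page_token else 0
    if offset < 0:
        raise ValueError("page_token must be a non-negative offset")

    page = list(all_entries[offset : offset + limit])
    end = offset + len(page)
    if end < len(all_entries):
        # More entries are already known: hand back the next offset.
        return StorageListResult(page, next_page_token=str(end))

    # Nothing left locally; flag possible truncation by the provider.
    truncated = provider_cap is not None and len(all_entries) >= provider_cap
    return StorageListResult(page, may_have_more=truncated)
```

Encoding the offset as an opaque string keeps the request shape the same if a backend later switches to provider-native continuation tokens.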

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated no comments.

Show a summary per file
| File | Description |
| --- | --- |
| marimo/_data/_external_storage/models.py | Introduces StorageListResult and updates the storage backend interface to return paginated results. |
| marimo/_data/_external_storage/storage.py | Implements offset-based pagination helpers and backend-specific listing behavior, including may_have_more for provider truncation. |
| marimo/_runtime/commands.py | Extends StorageListEntriesCommand with page_token. |
| marimo/_runtime/callbacks/external_storage.py | Threads pagination parameters through to backends and emits pagination metadata in StorageEntriesNotification. |
| marimo/_messaging/notification.py | Adds next_page_token and may_have_more fields to StorageEntriesNotification. |
| marimo/_server/models/models.py | Ensures server request models forward page_token into runtime commands. |
| packages/openapi/api.yaml | Updates OpenAPI schemas for new pagination request/notification fields. |
| packages/openapi/src/api.ts | Updates generated TS API types to match the OpenAPI pagination schema changes. |
| frontend/src/core/storage/types.ts | Adds pagination metadata types and updates storageUrl signature. |
| frontend/src/core/storage/state.ts | Stores per-path pagination metadata and implements “load more” fetching/append behavior. |
| frontend/src/components/storage/storage-inspector.tsx | Adds UI/logic for “Load more”, “may have more” messaging, and remote-search continuation flow. |
| tests/_data/_external_storage/test_storage_models.py | Adds/updates backend tests for StorageListResult pagination tokens and provider-boundary signaling. |
| tests/_runtime/test_runtime_external_storage.py | Adds runtime-level tests for paging requests and notifications carrying pagination metadata. |
| frontend/src/core/storage/tests/types.test.ts | Updates tests for the new storageUrl signature/behavior. |
| frontend/src/core/storage/tests/state.test.ts | Updates storage state tests to validate pagination metadata storage and cache behavior. |
| frontend/src/core/storage/tests/useStorageEntries.test.tsx | Adds coverage for loadMore behavior, append semantics, and pagination metadata handling. |
| frontend/src/components/storage/tests/storage-inspector.test.ts | Adds coverage for remote-search prefix derivation and filtering behavior. |

Light2Dark added a commit that referenced this pull request Jun 2, 2026
## 📝 Summary

These fields indicate whether a table's schemas have been loaded or are
truly empty. That lets the frontend tell empty entries, which it can
hide, apart from lazy ones, which it shouldn't hide.

This change makes sense for future work, since it's important to
distinguish between truly empty and not yet loaded. Another option was
considered: `list[DataTable] | None`, where `None` indicates not yet
loaded. However, that is semantically unclear, even though it is a
simpler code change.

This supports PR #9708.
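
To illustrate the trade-off, here is a minimal sketch of the explicit-flag approach. The commit message does not name the fields, so `Schema`, `tables_loaded`, and `should_hide` are hypothetical; only `DataTable` appears in the text above.

```python
from dataclasses import dataclass, field


@dataclass
class DataTable:
    name: str


@dataclass
class Schema:
    name: str
    tables: list[DataTable] = field(default_factory=list)
    # Hypothetical flag: False means the tables have not been fetched yet,
    # so an empty `tables` list says nothing about the real contents.
    tables_loaded: bool = True


def should_hide(schema: Schema) -> bool:
    """Hide only schemas that are known to be empty, never lazy ones."""
    return schema.tables_loaded and not schema.tables
```

With `list[DataTable] | None`, the same check would rely on readers remembering that `None` and `[]` mean different things, which is the ambiguity the explicit flag avoids.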

## 📋 Pre-Review Checklist
<!-- These checks need to be completed before a PR is reviewed -->

- [x] For large changes, or changes that affect the public API: this
change was discussed or approved through an issue, on
[Discord](https://marimo.io/discord?ref=pr), or the community
[discussions](https://github.com/marimo-team/marimo/discussions) (Please
provide a link if applicable).
- [x] Any AI generated code has been reviewed line-by-line by the human
PR author, who stands by it.
- [ ] Video or media evidence is provided for any visual changes
(optional). <!-- PR is more likely to be merged if evidence is provided
for changes made -->

## ✅ Merge Checklist

- [x] I have read the [contributor
guidelines](https://github.com/marimo-team/marimo/blob/main/CONTRIBUTING.md).
- [ ] Documentation has been updated where applicable, including
docstrings for API changes.
- [x] Tests have been added for the changes made.

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
@Light2Dark

Collaborator Author

The PR is quite lengthy, sorry about that. It might be worth just testing whether the UX is fine.

The following code snippets connect to your own Google Drive or to a public S3 bucket.

# Snippet 1: connect to your own Google Drive
from gdrive_fsspec import GoogleDriveFileSystem

fs = GoogleDriveFileSystem(use_listings_cache=False)

# Snippet 2: connect to a public S3 bucket without credentials
import os

# Clear any AWS credentials so that requests are sent unsigned
for key in [
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_SESSION_TOKEN",
    "AWS_PROFILE",
]:
    os.environ.pop(key, None)
os.environ["AWS_SKIP_SIGNATURE"] = "true"

from obstore.store import S3Store

store = S3Store(
    "noaa-goes16",
    region="us-east-1",
    skip_signature=True,
)

@Light2Dark Light2Dark marked this pull request as draft June 9, 2026 09:31
Light2Dark added a commit that referenced this pull request Jun 9, 2026
## 📝 Summary

Split out from #9708.

Some backends (e.g. Google Drive) allow multiple files to share the same
path, which made entry rows collide when keyed by path alone. This
surfaces the backend's stable `id` through the fsspec backend and
prefers it when keying entry rows in the storage inspector, falling back
to `path + index` when no id is available.
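
The keying rule described above can be sketched as follows. This is an illustration in Python rather than the inspector's actual code: `entry_row_key` and the dict-shaped entry are assumptions, and only the "prefer `id`, else `path + index`" rule comes from the summary.

```python
from typing import Any, Mapping


def entry_row_key(entry: Mapping[str, Any], index: int) -> str:
    """Build a row key that stays unique when paths collide."""
    entry_id = entry.get("id")
    if entry_id:
        # Prefer the backend's stable id (e.g. a Google Drive file id).
        return f"id:{entry_id}"
    # No id available: disambiguate duplicate paths by position.
    return f"path:{entry['path']}:{index}"
```

For example, two Google Drive files that share the path `docs/report.pdf` but carry different ids get distinct keys, and entries without ids stay distinct through their index.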

## 📋 Pre-Review Checklist
<!-- These checks need to be completed before a PR is reviewed -->

- [x] For large changes, or changes that affect the public API: this
change was discussed or approved through an issue, on
[Discord](https://marimo.io/discord?ref=pr), or the community
[discussions](https://github.com/marimo-team/marimo/discussions) (Please
provide a link if applicable).
- [x] Any AI generated code has been reviewed line-by-line by the human
PR author, who stands by it.
- [x] Video or media evidence is provided for any visual changes
(optional). <!-- PR is more likely to be merged if evidence is provided
for changes made -->

## ✅ Merge Checklist

- [x] I have read the [contributor
guidelines](https://github.com/marimo-team/marimo/blob/main/CONTRIBUTING.md).
- [ ] Documentation has been updated where applicable, including
docstrings for API changes.
- [x] Tests have been added for the changes made.

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
@Light2Dark Light2Dark closed this Jun 10, 2026

Labels

bash-focus (Area to focus on during release bug bash), bug (Something isn't working)

Development

Successfully merging this pull request may close these issues.

Remote storage limits fo view

2 participants