Add Colocated Python Checkpointing #3078

Open
SujeethJinesh wants to merge 2 commits into main from sujinesh/colocated_python_checkpointing

@SujeethJinesh SujeethJinesh commented Feb 4, 2026

Description

Enable Experimental Colocated Python Checkpointing experience for Pathways on Cloud.

This feature enables using Orbax's Colocated Python Dispatchers as an alternative to the Persistence API. It also lets users save and restore checkpoints in the zarr3 and ocdbt tensorstore formats. Performance tuning is still in progress.

Further integration/unit testing will be done in a future PR once Colocated Python testing is incorporated.

This relies on Orbax version >=0.11.33.
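
A minimal sketch of how a caller might guard against an older Orbax before enabling this path. It is not part of this PR. The distribution name `orbax-checkpoint` and the helper names `parse_release`, `meets_minimum`, and `installed_orbax_ok` are assumptions for illustration; only the 0.11.33 minimum comes from the description above. Pre-release suffixes such as `rc1` or `.dev1` are ignored, which is a simplification.

```python
import re
from importlib import metadata

# Minimum version stated in the PR description.
MIN_ORBAX = (0, 11, 33)


def parse_release(version: str) -> tuple[int, ...]:
    """Return the leading numeric components, e.g. '0.11.33.dev1' -> (0, 11, 33)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version: str, minimum: tuple[int, ...] = MIN_ORBAX) -> bool:
    """Compare numeric release tuples, padding the shorter one with zeros."""
    release = parse_release(version)
    width = max(len(release), len(minimum))
    padded_release = release + (0,) * (width - len(release))
    padded_minimum = minimum + (0,) * (width - len(minimum))
    return padded_release >= padded_minimum


def installed_orbax_ok(dist_name: str = "orbax-checkpoint") -> bool:
    """Check the installed distribution; dist_name is an assumed package name."""
    try:
        return meets_minimum(metadata.version(dist_name))
    except metadata.PackageNotFoundError:
        return False
```

A training script could call `installed_orbax_ok()` at startup and fall back to the existing checkpointing path, or fail with a clear message, when it returns `False`.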

FIXES: b/388583223

Tests

Tested manually across a test matrix at go/colocated-python-checkpointing-in-maxtext. Integration tests will need to be added once building a colocated python sidecar is supported in MaxText (WIP).

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

@SujeethJinesh SujeethJinesh force-pushed the sujinesh/colocated_python_checkpointing branch 3 times, most recently from 26a5d3c to fd94f9f Compare February 6, 2026 20:30
codecov bot commented Feb 6, 2026

Codecov Report

❌ Patch coverage is 0% with 5 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/maxtext/common/checkpointing.py | 0.00% | 3 Missing and 1 partial ⚠️ |
| src/maxtext/utils/train_utils.py | 0.00% | 0 Missing and 1 partial ⚠️ |


@SujeethJinesh SujeethJinesh force-pushed the sujinesh/colocated_python_checkpointing branch 11 times, most recently from c72c2d2 to 64854e8 Compare February 11, 2026 00:03
@SujeethJinesh SujeethJinesh force-pushed the sujinesh/colocated_python_checkpointing branch 14 times, most recently from 3e2e2da to 9c5d207 Compare February 20, 2026 22:42
uritemplate>=4.2.0
urllib3>=2.5.0
uvicorn>=0.38.0
uvloop>=0.19.0
Collaborator

Hi, we usually don't recommend directly editing dependencies under the generated_requirements folder. These two txt files are generated from base_requirements, as described in this guide. You need to edit the base requirements and then run seed-env to generate a new set of generated requirements.

Your current patch can work temporarily, but if someone else regenerates the requirement files, your change will be lost without notice.

Collaborator Author

I created b/486268025 to try to figure out the issue, but when I follow that procedure we end up in severe dependency hell with cloud-tpu-diagnostics and some other downstream libraries.

For the purposes of this check-in, the only strictly required change is that orbax be upgraded to version 0.11.33 or greater. uvloop comes in as an Orbax dependency.

@SujeethJinesh SujeethJinesh force-pushed the sujinesh/colocated_python_checkpointing branch from ac53a95 to 4ed9a99 Compare February 22, 2026 01:31