diff --git a/docs/developer/design/petsc-dmplex-checkpoint-reload-plan.md b/docs/developer/design/petsc-dmplex-checkpoint-reload-plan.md new file mode 100644 index 00000000..713f2b8d --- /dev/null +++ b/docs/developer/design/petsc-dmplex-checkpoint-reload-plan.md @@ -0,0 +1,248 @@ +# PETSc DMPlex Checkpoint Reload Plan + +## Commit And Test Workflow + +Use small, meaningful commits while implementing this work. + +Do `git add` and `git commit` after each coherent feature, fix, or debugging +checkpoint so progress is easy to inspect and bisect. Do not wait until the end +to commit everything as one large change, and do not commit after every tiny +edit. + +Create or update unit tests whenever they are needed to prove behavior or +prevent regressions. Prefer adding a failing regression test before the fix when +the failure mode is already known. + +## Objective + +Provide an exact PETSc DMPlex checkpoint reload path for UW3 mesh variables. +This path is intended for restart and large-scale postprocessing, not +visualisation. + +Target workflow: + +```python +mesh.write_checkpoint( + "checkout", + outputPath=str(output_dir), + meshVars=[v_soln, p_soln], + index=0, +) +``` + +Default output: + +```text +checkout.mesh.00000.h5 +checkout.Velocity.00000.h5 +checkout.Pressure.00000.h5 +``` + +Reload workflow: + +```python +mesh = uw.discretisation.Mesh("checkout.mesh.00000.h5") +v_soln = uw.discretisation.MeshVariable("Velocity", mesh, mesh.dim, degree=2) +p_soln = uw.discretisation.MeshVariable("Pressure", mesh, 1, degree=1) + +v_soln.read_checkpoint("checkout.Velocity.00000.h5", data_name="Velocity") +p_soln.read_checkpoint("checkout.Pressure.00000.h5", data_name="Pressure") +``` + +The reload path must not use `KDTree` remapping. It must restore FE data through +PETSc DMPlex topology, section, vector, and `PetscSF` metadata. + +## Existing Output Methods + +UW3 has two related but different output paths. 
+ +| Method | Purpose | Reload method | Strength | Limitation | +| --- | --- | --- | --- | --- | +| `mesh.write_timestep(...)` | Visualisation and flexible field remap | `MeshVariable.read_timestep(...)` | Writes XDMF and vertex-field data; can map data onto a different mesh | Uses coordinate/KDTree remapping; memory-heavy for large meshes and high MPI counts | +| `mesh.write_checkpoint(...)` | Restart and exact postprocessing | `MeshVariable.read_checkpoint(...)` | Uses PETSc DMPlex section/vector metadata; avoids KDTree | Not a visualisation output; no XDMF or vertex-field datasets | + +The benchmark scripts should use `write_timestep()` when they need +visualisation files, and `write_checkpoint()` when they need restart-safe, +memory-efficient postprocessing. + +## Implemented Design + +### Checkpoint Writing + +`Mesh.write_checkpoint(...)` writes PETSc DMPlex HDF5 storage version `3.0.0`. + +The mesh file is named: + +```text +.mesh..h5 +``` + +With the default `separate_variable_files=True`, each mesh variable is written +to its own checkpoint file: + +```text +...h5 +``` + +With `separate_variable_files=False`, all variables are written into: + +```text +.checkpoint..h5 +``` + +Per-variable files are the default because they avoid forcing downstream +postprocessing to open and move through one very large combined checkpoint file. +This is useful for large spherical benchmark cases where velocity and pressure +files can already be large individually. + +### Mesh Reload + +When a PETSc DMPlex HDF5 mesh is loaded, UW3 keeps the topology-load `PetscSF` +returned by `DMPlexTopologyLoad(...)`. If UW3 distributes the mesh after load, +the topology-load SF is composed with the redistribution SF. + +This composed SF is stored on the mesh and is the mapping used by +`MeshVariable.read_checkpoint(...)`. + +The mesh DM name is fixed to `uw_mesh` while writing and loading checkpoint +files so PETSc can find the expected topology groups. 
+ +### Variable Reload + +`MeshVariable.read_checkpoint(filename, data_name=None)`: + +- opens the checkpoint HDF5 file in PETSc HDF5 format +- loads the saved DMPlex section for `data_name` +- loads the saved local vector through PETSc's DMPlex local-vector path +- copies values into the target UW3 variable using section offsets +- syncs the UW3 local vector back to its global vector + +The implementation uses a small Cython wrapper around: + +- `DMPlexSectionLoad(...)` +- `DMPlexLocalVectorLoad(...)` + +The wrapper requests only the local-data SF. This avoids failures seen when +the global-data SF path was constructed before the local checkpoint data could +be loaded. + +## PETSc Requirements + +The relevant PETSc DMPlex HDF5 restart sequence is: + +1. Load topology with `DMPlexTopologyLoad(...)`. +2. Keep the `PetscSF` returned by topology load. +3. Load coordinates with `DMPlexCoordinatesLoad(...)`. +4. Load labels with `DMPlexLabelsLoad(...)`. +5. If the mesh is redistributed, compose the topology-load SF with the + redistribution SF. +6. Load the saved section with `DMPlexSectionLoad(...)`. +7. Load the saved vector with the DMPlex vector-load API. +8. Scatter or copy the loaded values into UW3's mesh-variable storage. + +The critical identity requirements are: + +- topology DM name must match the saved topology group +- section DM name must match the saved variable group +- vector name must match the saved vector group +- the SF passed to section load must map saved topology points to current + distributed topology points + +PETSc reference: [DMPlex manual](https://petsc.org/main/manual/dmplex/). + +## Tests + +Current unit coverage is in `tests/test_0003_save_load.py`. 
+ +The checkpoint roundtrip test covers: + +- scalar variable reload +- vector variable reload +- discontinuous variable reload +- combined checkpoint file reload with `separate_variable_files=False` +- per-variable checkpoint file reload with the default + `separate_variable_files=True` + +Required validation before PR: + +```bash +./uw python -m pytest tests/test_0003_save_load.py -q +mpirun -np 2 ./uw python -m pytest \ + tests/test_0003_save_load.py::test_meshvariable_checkpoint_roundtrip -q +``` + +## Spherical Benchmark Validation + +The motivating case is spherical benchmark postprocessing at high MPI counts. +The old `write_timestep()` / `read_timestep()` path can build large KDTree +mapping structures during reload. At `1/128` this used nearly the full 4.5 TB +allocation on Gadi. + +The checkpoint method avoids KDTree reload and preserves velocity/pressure +metrics to roundoff. Boundary stress metrics require the benchmark to recover +stress consistently after reload. In the spherical benchmark this is handled by +projecting the six deviatoric-stress components and then forming `sigma_rr`. + +### Gadi Evidence + +| Resolution | Method | NCPUs | Walltime | Memory used | Status | +| --- | --- | ---: | ---: | ---: | --- | +| `1/64` | `write_timestep/read_timestep` | 144 | `00:03:43` | `211.27 GB` | completed | +| `1/64` | `write_checkpoint/read_checkpoint` | 144 | `00:02:41` | `233.67 GB` | completed | +| `1/128` | `write_timestep/read_timestep` | 1152 | `00:13:55` | `3.92 TB` | completed near memory limit | +| `1/128` | `write_checkpoint/read_checkpoint` | 1152 | `00:03:57` | `1.83 TB` | completed | + +The `1/128` checkpoint reload reduced memory by about `2.09 TB` and walltime by +about `3.5x` for the postprocessing run. 
+ +### Metric Agreement + +`1/128` spherical Thieulot benchmark: + +| Metric | `write_timestep/read_timestep` | `write_checkpoint/read_checkpoint` | +| --- | ---: | ---: | +| `v_l2_norm` | `1.4319274480265082e-06` | `1.4319274480231255e-06` | +| `p_l2_norm` | `5.985841567394967e-04` | `5.985841567395382e-04` | +| `p_l2_norm_abs` | `1.0566381005355924e-03` | `1.0566381005356654e-03` | +| `sigma_rr_l2_norm_lower` | `1.117914337768646e-03` | `1.1256362820288926e-03` | +| `sigma_rr_l2_norm_upper` | `4.461443231341268e-05` | `3.811141458727819e-05` | +| `u_dot_n_l2_norm_lower_abs` | `2.2509850571644799e-04` | `2.2509850571645164e-04` | +| `u_dot_n_l2_norm_upper_abs` | `5.535239716141496e-05` | `5.535239716141875e-05` | + +Velocity, pressure, and normal-velocity metrics agree to roundoff. The +`sigma_rr` values are close but not bitwise identical because the stress +recovery path changed from the old reload workflow to explicit tau-component +projection after checkpoint reload. + +`1/64` spherical Thieulot benchmark: + +| Metric | `write_timestep/read_timestep` | `write_checkpoint/read_checkpoint` | +| --- | ---: | ---: | +| `v_l2_norm` | `1.1662200663950889e-05` | `1.1662200663957042e-05` | +| `p_l2_norm` | `2.7573367818459473e-03` | `2.7573367818460497e-03` | +| `sigma_rr_l2_norm_lower` | `4.368560398155481e-03` | `4.381908965541248e-03` | +| `sigma_rr_l2_norm_upper` | `1.6315543718450765e-04` | `1.6047310456195621e-04` | + +## Remaining PR Readiness Items + +- Run the unit checkpoint tests on the clean checkpoint-only branch. +- Add or record one small local benchmark smoke test for reproducibility. +- Decide whether different-rank checkpoint reload is required for the first PR + or should be documented as follow-up validation. +- Keep the final checkpoint PR branch free of unrelated JIT and macOS compiler + commits. 
+ +## Acceptance Criteria + +The checkpoint reload implementation is ready for review when: + +- `write_checkpoint()` writes PETSc DMPlex HDF5 storage version `3.0.0`. +- `write_checkpoint()` supports `outputPath`. +- `write_checkpoint()` defaults to one checkpoint file per variable. +- `write_checkpoint(..., separate_variable_files=False)` still supports a + combined variable checkpoint file. +- `MeshVariable.read_checkpoint(...)` reloads through PETSc metadata, not + coordinate/KDTree remapping. +- scalar, vector, and discontinuous variables roundtrip in tests. +- same-rank MPI reload is validated. +- benchmark evidence shows the large-memory KDTree reload issue is avoided. diff --git a/docs/developer/subsystems/checkpoint-output-and-reload-methods.md b/docs/developer/subsystems/checkpoint-output-and-reload-methods.md new file mode 100644 index 00000000..7b9e3db0 --- /dev/null +++ b/docs/developer/subsystems/checkpoint-output-and-reload-methods.md @@ -0,0 +1,212 @@ +# Checkpoint Output And Reload Methods + +UW3 currently has two output/reload workflows for mesh and mesh-variable data. +They serve different purposes and should not be treated as interchangeable. + +## Method A: `write_timestep()` / `read_timestep()` + +This is the visualisation and flexible remap workflow. + +Example: + +```python +mesh.write_timestep( + "output", + index=0, + outputPath=str(output_dir), + meshVars=[velocity, pressure], +) + +velocity.read_timestep("output", "Velocity", 0, outputPath=str(output_dir)) +pressure.read_timestep("output", "Pressure", 0, outputPath=str(output_dir)) +``` + +Typical files: + +```text +output.mesh.00000.h5 +output.mesh.Velocity.00000.h5 +output.mesh.Pressure.00000.h5 +output.mesh.00000.xdmf +``` + +The field files contain coordinate/value datasets such as `/fields/` and +`/fields/coordinates`, plus vertex-field datasets for visualisation. Reloading +uses coordinate-based remapping. 
In practice this means the target variable is +filled by comparing target coordinates to source coordinates, using a KDTree or +similar nearest-neighbour/remap process. + +### Advantages + +- Produces XDMF/HDF5 files suitable for visualisation workflows. +- Can remap data onto a different mesh or a different node layout. +- Useful for postprocessing where exact finite-element section identity is not + required. + +### Disadvantages + +- Reload is not an exact PETSc FE-vector restart path. +- The KDTree/remap step can be memory-heavy for large meshes. +- At high MPI counts, remap memory can dominate postprocessing memory use. +- Discontinuous fields and high-order fields rely on coordinate remap behavior + rather than PETSc section metadata. + +## Method B: `write_checkpoint()` / `read_checkpoint()` + +This is the restart and exact postprocessing workflow. + +Example: + +```python +mesh.write_checkpoint( + "checkout", + index=0, + outputPath=str(output_dir), + meshVars=[velocity, pressure], +) +``` + +Default files: + +```text +checkout.mesh.00000.h5 +checkout.Velocity.00000.h5 +checkout.Pressure.00000.h5 +``` + +Reload: + +```python +mesh = uw.discretisation.Mesh("checkout.mesh.00000.h5") +velocity = uw.discretisation.MeshVariable("Velocity", mesh, mesh.dim, degree=2) +pressure = uw.discretisation.MeshVariable("Pressure", mesh, 1, degree=1) + +velocity.read_checkpoint("checkout.Velocity.00000.h5", data_name="Velocity") +pressure.read_checkpoint("checkout.Pressure.00000.h5", data_name="Pressure") +``` + +By default, `write_checkpoint()` writes one checkpoint file per mesh variable. 
+Use `separate_variable_files=False` to write all variables to one file: + +```python +mesh.write_checkpoint( + "checkout", + index=0, + outputPath=str(output_dir), + meshVars=[velocity, pressure], + separate_variable_files=False, +) +``` + +Combined variable file: + +```text +checkout.checkpoint.00000.h5 +``` + +The checkpoint files store PETSc DMPlex HDF5 storage version `3.0.0` data with +the section/vector metadata required to reconstruct finite-element vectors. +Reloading uses PETSc DMPlex topology, section, vector, and `PetscSF` metadata. +It does not use KDTree coordinate remapping. + +### Advantages + +- Exact FE-vector reload path for restart and postprocessing. +- Avoids KDTree memory spikes. +- Preserves continuous, vector, and discontinuous variable layouts through + PETSc section metadata. +- Per-variable files avoid forcing postprocessing to open one large combined + field checkpoint. +- Better suited to large MPI jobs where memory locality matters. + +### Disadvantages + +- Does not write XDMF. +- Does not write `/vertex_fields/...` visualisation datasets. +- Assumes the checkpoint mesh and variable checkpoint files are used together. +- Different-rank reload should be validated for each workflow before relying on + it in production. + +## Which Method To Use + +| Use case | Recommended method | +| --- | --- | +| ParaView/XDMF visualisation | `write_timestep()` | +| Flexible remap onto another mesh | `write_timestep()` | +| Exact restart/postprocessing | `write_checkpoint()` | +| Large spherical benchmark metric evaluation | `write_checkpoint()` | +| Avoid KDTree memory growth at high MPI counts | `write_checkpoint()` | + +It is valid for production scripts to write both: + +```python +mesh.write_timestep("output", index=0, outputPath=str(output_dir), meshVars=[v, p]) +mesh.write_checkpoint("checkout", index=0, outputPath=str(output_dir), meshVars=[v, p]) +``` + +The first output is for visualisation. 
The second output is for restart or +metrics-from-checkpoint postprocessing. + +## Spherical Benchmark Evidence + +The spherical Thieulot benchmark exposed the practical difference between the +two methods. Boundary metric evaluation is run in a second step after the Stokes +solve. The old reload path used `write_timestep()` output and `read_timestep()`; +the new path uses `write_checkpoint()` output and `read_checkpoint()`. + +### Resource Usage + +| Resolution | Method | NCPUs | Walltime | CPU time | Memory used | Exit status | +| --- | --- | ---: | ---: | ---: | ---: | --- | +| `1/64` | `write_timestep/read_timestep` | 144 | `00:03:43` | `07:04:27` | `211.27 GB` | `0` | +| `1/64` | `write_checkpoint/read_checkpoint` | 144 | `00:02:41` | `05:21:14` | `233.67 GB` | `0` | +| `1/128` | `write_timestep/read_timestep` | 1152 | `00:13:55` | `214:02:57` | `3.92 TB` | `0` | +| `1/128` | `write_checkpoint/read_checkpoint` | 1152 | `00:03:57` | `64:19:53` | `1.83 TB` | `0` | + +For the `1/128` case, checkpoint reload reduced memory by about `2.09 TB` and +reduced walltime by about `3.5x`. 
+ +### Metric Agreement + +`1/128` spherical Thieulot benchmark: + +| Metric | `write_timestep/read_timestep` | `write_checkpoint/read_checkpoint` | Difference | +| --- | ---: | ---: | ---: | +| `v_l2_norm` | `1.4319274480265082e-06` | `1.4319274480231255e-06` | `-3.38e-18` | +| `p_l2_norm` | `5.985841567394967e-04` | `5.985841567395382e-04` | `4.15e-17` | +| `p_l2_norm_abs` | `1.0566381005355924e-03` | `1.0566381005356654e-03` | `7.30e-17` | +| `sigma_rr_l2_norm_lower` | `1.117914337768646e-03` | `1.1256362820288926e-03` | `7.72e-06` | +| `sigma_rr_l2_norm_upper` | `4.461443231341268e-05` | `3.811141458727819e-05` | `-6.50e-06` | +| `u_dot_n_l2_norm_lower_abs` | `2.2509850571644799e-04` | `2.2509850571645164e-04` | `3.65e-18` | +| `u_dot_n_l2_norm_upper_abs` | `5.535239716141496e-05` | `5.535239716141875e-05` | `3.79e-18` | + +Velocity, pressure, and normal-velocity metrics agree to roundoff. The +remaining `sigma_rr` differences are small and come from the benchmark stress +recovery path. The checkpoint workflow computes stress after reload by +projecting deviatoric-stress components and forming `sigma_rr`; it does not +reuse the old `read_timestep()` remap path. 
+ +`1/64` spherical Thieulot benchmark: + +| Metric | `write_timestep/read_timestep` | `write_checkpoint/read_checkpoint` | Difference | +| --- | ---: | ---: | ---: | +| `v_l2_norm` | `1.1662200663950889e-05` | `1.1662200663957042e-05` | `6.15e-18` | +| `p_l2_norm` | `2.7573367818459473e-03` | `2.7573367818460497e-03` | `1.02e-16` | +| `sigma_rr_l2_norm_lower` | `4.368560398155481e-03` | `4.381908965541248e-03` | `1.33e-05` | +| `sigma_rr_l2_norm_upper` | `1.6315543718450765e-04` | `1.6047310456195621e-04` | `-2.68e-06` | + +## Notes For Benchmark Scripts + +For production benchmark workflows: + +- run the solve stage first +- write `write_timestep()` output if visualisation files are needed +- write `write_checkpoint()` output for restart/postprocessing +- exit before metric evaluation +- run a second metrics-from-checkpoint job +- reload mesh from `.mesh..h5` +- reload fields with `MeshVariable.read_checkpoint(...)` +- compute metrics from reloaded fields + +This separates solver memory from postprocessing memory and avoids the KDTree +reload path for large benchmark metric jobs. 
diff --git a/docs/examples/WIP/developer_tools/Ex_Checkpoint_Read_MeshVariable-vectors-old.py b/docs/examples/WIP/developer_tools/Ex_Checkpoint_Read_MeshVariable-vectors-old.py index 40f5bdbc..3c0c7682 100644 --- a/docs/examples/WIP/developer_tools/Ex_Checkpoint_Read_MeshVariable-vectors-old.py +++ b/docs/examples/WIP/developer_tools/Ex_Checkpoint_Read_MeshVariable-vectors-old.py @@ -128,8 +128,8 @@ pc.data[:, 0] = 0.0 index1pc.data[:, 0] = 0.0 -pc.load_from_checkpoint(f"test_checkpointing_np1.P.0.h5", data_name="P") -index1pc.load_from_checkpoint( +pc.read_checkpoint(f"test_checkpointing_np1.P.0.h5", data_name="P") +index1pc.read_checkpoint( f"test_checkpointing_np1.Index1proc.0.h5", data_name="Index1proc" ) @@ -296,11 +296,11 @@ p3 = uw.discretisation.MeshVariable("P3", mesh3, 1, degree=1, continuous=True) # %% -v3.load_from_checkpoint(f"{expt_name}.U.0.h5", data_name="U") -p3.load_from_checkpoint(f"{expt_name}.P.0.h5", data_name="P") +v3.read_checkpoint(f"{expt_name}.U.0.h5", data_name="U") +p3.read_checkpoint(f"{expt_name}.P.0.h5", data_name="P") -vc.load_from_checkpoint(f"{expt_name}.U.0.h5", data_name="U") -pc.load_from_checkpoint(f"{expt_name}.P.0.h5", data_name="P") +vc.read_checkpoint(f"{expt_name}.U.0.h5", data_name="U") +pc.read_checkpoint(f"{expt_name}.P.0.h5", data_name="P") # %% @@ -372,7 +372,7 @@ print("U2", U2[0:7, 0].T) # %% -vc.load_from_checkpoint(f"viz_chpt_np{uw.mpi.size}.U.0.h5", data_name="U") +vc.read_checkpoint(f"viz_chpt_np{uw.mpi.size}.U.0.h5", data_name="U") # %% # it should be fine to test this proc-by-proc @@ -403,8 +403,8 @@ v2 = uw.discretisation.MeshVariable("U", mesh2, mesh2.dim, degree=2) p2 = uw.discretisation.MeshVariable("P", mesh2, 1, degree=1, continuous=True) -v2.load_from_checkpoint(f"viz_chpt_np{uw.mpi.size}.U.0.h5", data_name="U") -p2.load_from_checkpoint(f"viz_chpt_np{uw.mpi.size}.P.0.h5", data_name="P") +v2.read_checkpoint(f"viz_chpt_np{uw.mpi.size}.U.0.h5", data_name="U") 
+p2.read_checkpoint(f"viz_chpt_np{uw.mpi.size}.P.0.h5", data_name="P") # TODO: Consider uw.synchronised_array_update() for multi-variable assignment @@ -432,8 +432,8 @@ v3 = uw.discretisation.MeshVariable("U", mesh3, mesh3.dim, degree=2) p3 = uw.discretisation.MeshVariable("P", mesh3, 1, degree=1, continuous=True) -v3.load_from_checkpoint(f"viz_chpt_np{uw.mpi.size}.U.0.h5", data_name="U") -p3.load_from_checkpoint(f"viz_chpt_np{uw.mpi.size}.P.0.h5", data_name="P") +v3.read_checkpoint(f"viz_chpt_np{uw.mpi.size}.U.0.h5", data_name="U") +p3.read_checkpoint(f"viz_chpt_np{uw.mpi.size}.P.0.h5", data_name="P") # TODO: Consider uw.synchronised_array_update() for multi-variable assignment print(f"i - {uw.mpi.rank}: ", v3.data[0:7, 0].T) diff --git a/docs/examples/WIP/developer_tools/Ex_Checkpoint_Read_MeshVariable-vectors.py b/docs/examples/WIP/developer_tools/Ex_Checkpoint_Read_MeshVariable-vectors.py index f1b64a78..17701c5b 100644 --- a/docs/examples/WIP/developer_tools/Ex_Checkpoint_Read_MeshVariable-vectors.py +++ b/docs/examples/WIP/developer_tools/Ex_Checkpoint_Read_MeshVariable-vectors.py @@ -113,11 +113,11 @@ indexc.data[:, 0] = 0.0 filename = f"{expt_name}.P.0.h5" -pc.load_from_checkpoint(filename, data_name="P") +pc.read_checkpoint(filename, data_name="P") filename = f"{expt_name}.U.0.h5" -vc.load_from_checkpoint(filename, data_name="U") +vc.read_checkpoint(filename, data_name="U") filename = f"{expt_name}.Index.0.h5" -indexc.load_from_checkpoint(filename, data_name="Index") +indexc.read_checkpoint(filename, data_name="Index") # %% # mesh.write_visualisation_xdmf(expt_name+"_c", @@ -286,11 +286,11 @@ p3 = uw.discretisation.MeshVariable("P3", mesh3, 1, degree=1, continuous=True) # %% -v3.load_from_checkpoint(f"{expt_name}.U.0.h5", data_name="U") -p3.load_from_checkpoint(f"{expt_name}.P.0.h5", data_name="P") +v3.read_checkpoint(f"{expt_name}.U.0.h5", data_name="U") +p3.read_checkpoint(f"{expt_name}.P.0.h5", data_name="P") 
-vc.load_from_checkpoint(f"{expt_name}.U.0.h5", data_name="U") -pc.load_from_checkpoint(f"{expt_name}.P.0.h5", data_name="P") +vc.read_checkpoint(f"{expt_name}.U.0.h5", data_name="U") +pc.read_checkpoint(f"{expt_name}.P.0.h5", data_name="P") # %% @@ -362,7 +362,7 @@ print("U2", U2[0:7, 0].T) # %% -vc.load_from_checkpoint(f"viz_chpt_np{uw.mpi.size}.U.0.h5", data_name="U") +vc.read_checkpoint(f"viz_chpt_np{uw.mpi.size}.U.0.h5", data_name="U") # %% # it should be fine to test this proc-by-proc @@ -393,8 +393,8 @@ v2 = uw.discretisation.MeshVariable("U", mesh2, mesh2.dim, degree=2) p2 = uw.discretisation.MeshVariable("P", mesh2, 1, degree=1, continuous=True) -v2.load_from_checkpoint(f"viz_chpt_np{uw.mpi.size}.U.0.h5", data_name="U") -p2.load_from_checkpoint(f"viz_chpt_np{uw.mpi.size}.P.0.h5", data_name="P") +v2.read_checkpoint(f"viz_chpt_np{uw.mpi.size}.U.0.h5", data_name="U") +p2.read_checkpoint(f"viz_chpt_np{uw.mpi.size}.P.0.h5", data_name="P") # TODO: Consider uw.synchronised_array_update() for multi-variable assignment @@ -422,8 +422,8 @@ v3 = uw.discretisation.MeshVariable("U", mesh3, mesh3.dim, degree=2) p3 = uw.discretisation.MeshVariable("P", mesh3, 1, degree=1, continuous=True) -v3.load_from_checkpoint(f"viz_chpt_np{uw.mpi.size}.U.0.h5", data_name="U") -p3.load_from_checkpoint(f"viz_chpt_np{uw.mpi.size}.P.0.h5", data_name="P") +v3.read_checkpoint(f"viz_chpt_np{uw.mpi.size}.U.0.h5", data_name="U") +p3.read_checkpoint(f"viz_chpt_np{uw.mpi.size}.P.0.h5", data_name="P") # TODO: Consider uw.synchronised_array_update() for multi-variable assignment print(f"i - {uw.mpi.rank}: ", v3.data[0:7, 0].T) diff --git a/docs/examples/WIP/developer_tools/Ex_Checkpoint_Read_MeshVariable.py b/docs/examples/WIP/developer_tools/Ex_Checkpoint_Read_MeshVariable.py index 09aa2e2d..6e7d5d7d 100644 --- a/docs/examples/WIP/developer_tools/Ex_Checkpoint_Read_MeshVariable.py +++ b/docs/examples/WIP/developer_tools/Ex_Checkpoint_Read_MeshVariable.py @@ -617,11 +617,11 @@ def 
load_mesh_var(var, data_name): p3 = uw.discretisation.MeshVariable("P3", mesh3, 1, degree=1, continuous=True) # %% -v3.load_from_checkpoint(f"{expt_name}.U.0.h5", data_name="U") -p3.load_from_checkpoint(f"{expt_name}.P.0.h5", data_name="P") +v3.read_checkpoint(f"{expt_name}.U.0.h5", data_name="U") +p3.read_checkpoint(f"{expt_name}.P.0.h5", data_name="P") -vc.load_from_checkpoint(f"{expt_name}.U.0.h5", data_name="U") -pc.load_from_checkpoint(f"{expt_name}.P.0.h5", data_name="P") +vc.read_checkpoint(f"{expt_name}.U.0.h5", data_name="U") +pc.read_checkpoint(f"{expt_name}.P.0.h5", data_name="P") # %% @@ -693,7 +693,7 @@ def load_mesh_var(var, data_name): print("U2", U2[0:7, 0].T) # %% -vc.load_from_checkpoint(f"viz_chpt_np{uw.mpi.size}.U.0.h5", data_name="U") +vc.read_checkpoint(f"viz_chpt_np{uw.mpi.size}.U.0.h5", data_name="U") # %% # it should be fine to test this proc-by-proc @@ -724,8 +724,8 @@ def load_mesh_var(var, data_name): v2 = uw.discretisation.MeshVariable("U", mesh2, mesh2.dim, degree=2) p2 = uw.discretisation.MeshVariable("P", mesh2, 1, degree=1, continuous=True) -v2.load_from_checkpoint(f"viz_chpt_np{uw.mpi.size}.U.0.h5", data_name="U") -p2.load_from_checkpoint(f"viz_chpt_np{uw.mpi.size}.P.0.h5", data_name="P") +v2.read_checkpoint(f"viz_chpt_np{uw.mpi.size}.U.0.h5", data_name="U") +p2.read_checkpoint(f"viz_chpt_np{uw.mpi.size}.P.0.h5", data_name="P") # TODO: Consider uw.synchronised_array_update() for multi-variable assignment @@ -753,8 +753,8 @@ def load_mesh_var(var, data_name): v3 = uw.discretisation.MeshVariable("U", mesh3, mesh3.dim, degree=2) p3 = uw.discretisation.MeshVariable("P", mesh3, 1, degree=1, continuous=True) -v3.load_from_checkpoint(f"viz_chpt_np{uw.mpi.size}.U.0.h5", data_name="U") -p3.load_from_checkpoint(f"viz_chpt_np{uw.mpi.size}.P.0.h5", data_name="P") +v3.read_checkpoint(f"viz_chpt_np{uw.mpi.size}.U.0.h5", data_name="U") +p3.read_checkpoint(f"viz_chpt_np{uw.mpi.size}.P.0.h5", data_name="P") # TODO: Consider 
def petsc_dmplex_load_local_vector(dm, viewer, sectiondm, sf, data_name):
    """
    Load a DMPlex checkpoint section and vector through PETSc's local-vector path.

    petsc4py's DMPlex.sectionLoad wrapper always requests both the global and
    local data SFs. For restart files whose section contains local overlap dofs,
    the global-data SF construction can fail before the local path is usable.
    This wrapper follows the PETSc API directly and requests only localDataSF.

    Parameters
    ----------
    dm :
        The current (possibly redistributed) topology DM.
    viewer :
        HDF5 viewer opened for reading on the checkpoint file.
    sectiondm :
        DM that receives the loaded section; its name selects the saved group.
    sf :
        SF mapping saved topology points to current topology points.
    data_name :
        Name of the saved vector dataset to load.

    Returns
    -------
    The local vector created on ``sectiondm`` filled with checkpointed values.
    """

    cdef DM c_dm = dm
    cdef Viewer c_viewer = viewer
    cdef DM c_sectiondm = sectiondm
    cdef SF c_sf = sf
    cdef Vec c_vec
    cdef PetscSF local_sf = NULL

    # Pass NULL for the global-data SF output so only localDataSF is built.
    CHKERRQ(DMPlexSectionLoad(c_dm.dm, c_viewer.vwr, c_sectiondm.dm, c_sf.sf, NULL, &local_sf))
    try:
        load_vec = sectiondm.createLocalVec()
        # The vector name must match the saved vector group in the HDF5 file.
        load_vec.setName(data_name)
        c_vec = load_vec
        CHKERRQ(DMPlexLocalVectorLoad(c_dm.dm, c_viewer.vwr, c_sectiondm.dm, local_sf, c_vec.vec))
    finally:
        # Always release the SF returned by DMPlexSectionLoad, even when the
        # vector load fails, so a failed reload does not leak the PetscSF.
        CHKERRQ(PetscSFDestroy(&local_sf))

    return load_vec
diff --git a/src/underworld3/cython/petsc_extras.pxi b/src/underworld3/cython/petsc_extras.pxi index f36ae1d3..1f3f4120 100644 --- a/src/underworld3/cython/petsc_extras.pxi +++ b/src/underworld3/cython/petsc_extras.pxi @@ -5,10 +5,11 @@ from petsc4py.PETSc cimport DS, PetscDS from petsc4py.PETSc cimport Vec, PetscVec from petsc4py.PETSc cimport Mat, PetscMat from petsc4py.PETSc cimport IS, PetscIS +from petsc4py.PETSc cimport SF, PetscSF from petsc4py.PETSc cimport FE, PetscFE from petsc4py.PETSc cimport DMLabel, PetscDMLabel from petsc4py.PETSc cimport PetscQuadrature, PetscSection -from petsc4py.PETSc cimport MPI_Comm, PetscMat, GetCommDefault, PetscViewer +from petsc4py.PETSc cimport MPI_Comm, PetscMat, GetCommDefault, Viewer, PetscViewer from underworld3.cython.petsc_types cimport PetscBool, PetscInt, PetscReal, PetscScalar @@ -70,6 +71,9 @@ cdef extern from "petsc.h" nogil: PetscErrorCode PetscDSAddBdResidual( PetscDS, PetscInt, PetscDSBdResidualFn, PetscDSBdResidualFn ) PetscErrorCode DMPlexCreateSubmesh(PetscDM, PetscDMLabel label, PetscInt value, PetscBool markedFaces, PetscDM *subdm) + PetscErrorCode DMPlexSectionLoad(PetscDM, PetscViewer, PetscDM, PetscSF, PetscSF *, PetscSF *) + PetscErrorCode DMPlexLocalVectorLoad(PetscDM, PetscViewer, PetscDM, PetscSF, PetscVec) + PetscErrorCode PetscSFDestroy(PetscSF *) PetscErrorCode UW_DMPlexFilter(PetscDM, PetscDMLabel, PetscInt, PetscBool, PetscBool, PetscDM *) PetscErrorCode DMGetLabel(PetscDM dm, const char name[], PetscDMLabel *label) @@ -93,4 +97,4 @@ cdef extern from "petsc.h" nogil: PetscErrorCode DMLocalizeCoordinates(PetscDM dm) # Not wrapped at this point - PetscErrorCode VecConcatenate(PetscInt nx, const PetscVec X[], PetscVec *, PetscIS *) \ No newline at end of file + PetscErrorCode VecConcatenate(PetscInt nx, const PetscVec X[], PetscVec *, PetscIS *) diff --git a/src/underworld3/discretisation/discretisation_mesh.py b/src/underworld3/discretisation/discretisation_mesh.py index 4161675b..3a31ad49 
@contextmanager
def _temporary_petsc_option(key, value):
    """Temporarily set a PETSc option, restoring the prior state on exit.

    While the ``with`` body runs, ``key`` is set to ``value`` in the global
    PETSc options database. On exit the option is either restored to its
    previous string value or deleted if it was not set beforehand.
    """
    opts = PETSc.Options()

    # Snapshot the current value; getString raises KeyError when unset,
    # which we record as None so we know to delete the key afterwards.
    previous = None
    try:
        previous = opts.getString(key)
    except KeyError:
        pass

    opts[key] = value
    try:
        yield
    finally:
        if previous is None:
            opts.delValue(key)
        else:
            opts[key] = previous
- h5plex = PETSc.DMPlex().createFromFile(filename, interpolate=True, comm=comm) - + viewer = PETSc.ViewerHDF5().create(filename, "r", comm=comm) + h5plex = PETSc.DMPlex().create(comm=comm) h5plex.setName("uw_mesh") - h5plex.markBoundaryFaces("All_Boundaries", 1001) + viewer.pushFormat(PETSc.Viewer.Format.HDF5_PETSC) + try: + sf0 = h5plex.topologyLoad(viewer) + h5plex.coordinatesLoad(viewer, sf0) + h5plex.labelsLoad(viewer, sf0) + h5plex.markBoundaryFaces("All_Boundaries", 1001) + finally: + viewer.popFormat() + viewer.destroy() if not return_sf: return h5plex else: - return h5plex.getPointSF(), h5plex + return sf0, h5plex class Mesh(Stateful, uw_object): @@ -379,10 +403,11 @@ def __init__( f.close() - # This needs to be done when reading a dm from a checkpoint - # or building from an imported mesh format - - self.dm.setFromOptions() + # Do not call setFromOptions() here. DMPlexTopologyLoad() + # returns the topology SF needed to reload checkpoint fields. + # setFromOptions() can repartition/reorder the DM before UW + # composes that SF with any later redistribution SF, leaving + # checkpoint field reloads mapped to stale point numbering. 
else: raise RuntimeError( @@ -499,7 +524,7 @@ class replacement_boundaries(Enum): self.dm.setRefinementUniform() if not self.dm.isDistributed(): - self.dm.distribute() + self.sf1 = self.dm.distribute() # self.dm_hierarchy = self.dm.refineHierarchy(refinement) @@ -536,7 +561,7 @@ class replacement_boundaries(Enum): self.dm.setRefinementUniform() if not self.dm.isDistributed(): - self.dm.distribute() + self.sf1 = self.dm.distribute() self.dm_hierarchy = [self.dm] for i in range(coarsening): @@ -557,7 +582,7 @@ class replacement_boundaries(Enum): else: if not self.dm.isDistributed(): - self.dm.distribute() + self.sf1 = self.dm.distribute() self.dm_hierarchy = [self.dm] self.dm_h = self.dm.clone() @@ -2501,65 +2526,115 @@ def petsc_save_checkpoint( def write_checkpoint( self, filename: str, + outputPath: str = "", meshUpdates: bool = True, meshVars: Optional[list] = [], swarmVars: Optional[list] = [], index: Optional[int] = 0, unique_id: Optional[bool] = False, + separate_variable_files: bool = True, ): - """Write data in a format that can be restored for restarting the simulation. + """Write PETSc DMPlex checkpoint files for restart/postprocessing. + + Checkpoint output stores PETSc DMPlex section/vector metadata required + for exact parallel reload. Unlike ``write_timestep()``, this is restart + output and does not write XDMF or vertex-field visualisation datasets. - The difference between this and the visualisation is 1) the parallel section needs - to be stored to reload the data correctly, and 2) the visualisation information (vertex form of fields) - is not stored. This routine uses dmplex VectorView and VectorLoad functionality. + Parameters + ---------- + filename + Checkpoint base filename. With ``outputPath`` unset, this may include + a directory. With ``outputPath`` set, it is joined to that directory. + outputPath + Optional output directory, matching the ``write_timestep()`` style. 
+ meshUpdates + If ``False``, write the mesh checkpoint only when it does not already + exist. If ``True``, always write the indexed mesh checkpoint. + meshVars, swarmVars + Variables to write into checkpoint files. + index + Checkpoint index used in output filenames. + unique_id + Preserve existing unique-rank filename behaviour for checkpoint data. + separate_variable_files + If ``True`` (default), write one file per variable: + ``...h5``. If ``False``, write all variables + into one file: ``.checkpoint..h5``. """ - # The mesh checkpoint is the same as the one required for visualisation + if outputPath: + filename = os.path.join(outputPath, filename) - if not meshUpdates: - from pathlib import Path + def _checkpoint_filename(var_name=None): + variable_part = f".{var_name}" if var_name is not None else ".checkpoint" + if unique_id: + return filename + f"{uw.mpi.unique}{variable_part}.{index:05}.h5" + return filename + f"{variable_part}.{index:05}.h5" - mesh_file = filename + ".mesh.0.h5" - path = Path(mesh_file) - if not path.is_file(): - self.write(mesh_file) + def _write_variable(viewer, var): + if var._lvec is None: + var._set_vec(available=True) - else: - self.write(filename + f".mesh.{index:05}.h5") + iset, subdm = self.dm.createSubDM(var.field_id) + subdm.setName(var.clean_name) + old_lvec_name = var._lvec.getName() + + try: + var._lvec.setName(var.clean_name) + self.dm.sectionView(viewer, subdm) + self.dm.localVectorView(viewer, subdm, var._lvec) + finally: + var._lvec.setName(old_lvec_name) + iset.destroy() + subdm.destroy() - # Checkpoint file + def _write_checkpoint_file(checkpoint_file, variables): + viewer = PETSc.ViewerHDF5().create(checkpoint_file, "w", comm=PETSc.COMM_WORLD) + viewer.pushFormat(PETSc.Viewer.Format.HDF5_PETSC) + try: + # Store the parallel-mesh section information for restoring the checkpoint. 
+ self.dm.sectionView(viewer, self.dm) - if unique_id: - checkpoint_file = filename + f"{uw.mpi.unique}.checkpoint.{index:05}.h5" - else: - checkpoint_file = filename + f".checkpoint.{index:05}.h5" + for var in variables: + _write_variable(viewer, var) + uw.mpi.barrier() # should not be required + finally: + viewer.popFormat() + viewer.destroy() + + old_dm_name = self.dm.getName() self.dm.setName("uw_mesh") - viewer = PETSc.ViewerHDF5().create(checkpoint_file, "w", comm=PETSc.COMM_WORLD) - # Store the parallel-mesh section information for restoring the checkpoint. - self.dm.sectionView(viewer, self.dm) + try: + with _temporary_petsc_option("dm_plex_view_hdf5_storage_version", "3.0.0"): + # The mesh checkpoint is the same as the one required for visualisation - if meshVars is not None: - for var in meshVars: - var._sync_lvec_to_gvec() - iset, subdm = self.dm.createSubDM(var.field_id) - subdm.setName(var.clean_name) - self.dm.globalVectorView(viewer, subdm, var._gvec) - self.dm.sectionView(viewer, subdm) - # v._gvec.view(viewer) # would add viz information plus a duplicate of the data + if not meshUpdates: + from pathlib import Path - if swarmVars is not None: - for svar in swarmVars: - var = svar._meshVar - var._sync_lvec_to_gvec() - iset, subdm = self.dm.createSubDM(var.field_id) - subdm.setName(var.clean_name) - self.dm.globalVectorView(viewer, subdm, var._gvec) - self.dm.sectionView(viewer, subdm) + mesh_file = filename + f".mesh.{index:05}.h5" + path = Path(mesh_file) + if not path.is_file(): + self.write(mesh_file) - uw.mpi.barrier() # should not be required - viewer.destroy() + else: + self.write(filename + f".mesh.{index:05}.h5") + + variables = [] + if meshVars is not None: + variables.extend(meshVars) + if swarmVars is not None: + variables.extend(svar._meshVar for svar in swarmVars) + + if separate_variable_files: + for var in variables: + _write_checkpoint_file(_checkpoint_filename(var.clean_name), [var]) + else: + 
_write_checkpoint_file(_checkpoint_filename(), variables) + finally: + if old_dm_name is not None: + self.dm.setName(old_dm_name) @timing.routine_timer_decorator def write(self, filename: str, index: Optional[int] = None): @@ -2586,8 +2661,12 @@ def write(self, filename: str, index: Optional[int] = None): # viewer.pushTimestepping(viewer) # viewer.setTimestep(index) - viewer(self.dm) - viewer.destroy() + viewer.pushFormat(PETSc.Viewer.Format.HDF5_PETSC) + try: + viewer(self.dm) + finally: + viewer.popFormat() + viewer.destroy() ## Add boundary metadata to the file diff --git a/src/underworld3/discretisation/discretisation_mesh_variables.py b/src/underworld3/discretisation/discretisation_mesh_variables.py index 13359073..49f0a787 100644 --- a/src/underworld3/discretisation/discretisation_mesh_variables.py +++ b/src/underworld3/discretisation/discretisation_mesh_variables.py @@ -1272,6 +1272,88 @@ def load_from_h5_plex_vector( return + @timing.routine_timer_decorator + @uw.collective_operation + def read_checkpoint( + self, + filename: str, + data_name: Optional[str] = None, + ): + """Load this mesh variable from ``Mesh.write_checkpoint()`` output. + + This is an exact PETSc DMPlex section/vector reload path. It does not + use the coordinate/KDTree remapping used by ``read_timestep()``. 
+ """ + + if data_name is None: + data_name = self.clean_name + + if self._lvec is None: + self._set_vec(available=True) + + indexset, subdm = self.mesh.dm.createSubDM(self.field_id) + sectiondm = self.mesh.dm.clone() + viewer = PETSc.ViewerHDF5().create(filename, "r", comm=PETSc.COMM_WORLD) + viewer.pushFormat(PETSc.Viewer.Format.HDF5_PETSC) + + old_mesh_name = self.mesh.dm.getName() + old_lvec_name = self._lvec.getName() + old_vec_name = self._gvec.getName() + + try: + self.mesh.dm.setName("uw_mesh") + subdm.setName(data_name) + sectiondm.setName(data_name) + self._lvec.setName(data_name) + self._gvec.setName(data_name) + + from underworld3.cython.petsc_discretisation import ( + petsc_dmplex_load_local_vector, + ) + + loaded_lvec = petsc_dmplex_load_local_vector( + self.mesh.dm, viewer, sectiondm, self.mesh.sf, data_name + ) + + source_section = sectiondm.getSection() + target_section = subdm.getSection() + source_array = loaded_lvec.array_r + target_array = self._lvec.array + p_start, p_end = target_section.getChart() + + for point in range(p_start, p_end): + target_dof = target_section.getDof(point) + if target_dof == 0: + continue + + source_dof = source_section.getDof(point) + if source_dof < target_dof: + raise RuntimeError( + f"Checkpoint section has {source_dof} dofs for point {point}, " + f"but target variable requires {target_dof}." 
+ ) + + source_offset = source_section.getOffset(point) + target_offset = target_section.getOffset(point) + target_array[target_offset : target_offset + target_dof] = ( + source_array[source_offset : source_offset + target_dof] + ) + + loaded_lvec.destroy() + self._sync_lvec_to_gvec() + finally: + self._lvec.setName(old_lvec_name) + self._gvec.setName(old_vec_name) + if old_mesh_name is not None: + self.mesh.dm.setName(old_mesh_name) + viewer.popFormat() + viewer.destroy() + sectiondm.destroy() + indexset.destroy() + subdm.destroy() + + return + @property def fn(self) -> sympy.Basic: """ diff --git a/src/underworld3/discretisation/enhanced_variables.py b/src/underworld3/discretisation/enhanced_variables.py index 6b2aea31..a8812047 100644 --- a/src/underworld3/discretisation/enhanced_variables.py +++ b/src/underworld3/discretisation/enhanced_variables.py @@ -462,6 +462,10 @@ def load_from_h5_plex_vector(self, *args, **kwargs): """Load from HDF5 plex vector.""" return self._base_var.load_from_h5_plex_vector(*args, **kwargs) + def read_checkpoint(self, *args, **kwargs): + """Load from a PETSc DMPlex checkpoint file.""" + return self._base_var.read_checkpoint(*args, **kwargs) + def write(self, *args, **kwargs): """Write variable data to HDF5 file.""" return self._base_var.write(*args, **kwargs) diff --git a/tests/test_0003_save_load.py b/tests/test_0003_save_load.py index c10ec725..30d3fcbc 100644 --- a/tests/test_0003_save_load.py +++ b/tests/test_0003_save_load.py @@ -3,12 +3,27 @@ # All tests in this module are quick core tests pytestmark = pytest.mark.level_1 import numpy as np +from pathlib import Path + + +def _shared_tmp_path(tmp_path, uw): + """Use one rank-0 pytest tmp path for collective MPI I/O tests.""" + + shared_path = str(tmp_path) if uw.mpi.rank == 0 else None + shared_path = uw.mpi.comm.bcast(shared_path, root=0) + shared_path = Path(shared_path) + if uw.mpi.rank == 0: + shared_path.mkdir(parents=True, exist_ok=True) + uw.mpi.barrier() + return 
shared_path def test_mesh_save_and_load(tmp_path): import underworld3 from underworld3.meshing import UnstructuredSimplexBox + tmp_path = _shared_tmp_path(tmp_path, underworld3) + mesh = UnstructuredSimplexBox(minCoords=(0.0, 0.0), maxCoords=(1.0, 1.0), cellSize=1.0 / 32.0) mesh.write_timestep("test", meshUpdates=False, outputPath=tmp_path, index=0) @@ -22,6 +37,8 @@ def test_meshvariable_save_and_read(tmp_path): import underworld3 from underworld3.meshing import UnstructuredSimplexBox + tmp_path = _shared_tmp_path(tmp_path, underworld3) + mesh = UnstructuredSimplexBox(minCoords=(0.0, 0.0), maxCoords=(1.0, 1.0), cellSize=1.0 / 32.0) X = underworld3.discretisation.MeshVariable("X", mesh, 1, degree=2) @@ -36,10 +53,99 @@ def test_meshvariable_save_and_read(tmp_path): assert np.allclose(X.array, X2.array) +def test_meshvariable_checkpoint_roundtrip(tmp_path): + import underworld3 as uw + from underworld3.meshing import UnstructuredSimplexBox + + tmp_path = _shared_tmp_path(tmp_path, uw) + + mesh = UnstructuredSimplexBox( + minCoords=(0.0, 0.0), + maxCoords=(1.0, 1.0), + cellSize=1.0 / 8.0, + ) + + x = uw.discretisation.MeshVariable("x", mesh, 1, degree=1) + u = uw.discretisation.MeshVariable("u", mesh, 2, degree=2) + d = uw.discretisation.MeshVariable("d", mesh, 1, degree=1, continuous=False) + + x.data[:, 0] = x.coords[:, 0] + 2.0 * x.coords[:, 1] + u.data[:, 0] = 3.0 * u.coords[:, 0] - u.coords[:, 1] + u.data[:, 1] = u.coords[:, 0] + 4.0 * u.coords[:, 1] + d.data[:, 0] = 5.0 * d.coords[:, 0] + 7.0 * d.coords[:, 1] + + def assert_reloaded_fields(x_reloaded, u_reloaded, d_reloaded): + # Parallel DMPlex reloads may repartition the mesh, so compare against the + # defining functions on the reloaded coordinates rather than local row order. 
+ np.testing.assert_allclose( + x_reloaded.data[:, 0], + x_reloaded.coords[:, 0] + 2.0 * x_reloaded.coords[:, 1], + atol=1.0e-12, + ) + np.testing.assert_allclose( + u_reloaded.data[:, 0], + 3.0 * u_reloaded.coords[:, 0] - u_reloaded.coords[:, 1], + atol=1.0e-12, + ) + np.testing.assert_allclose( + u_reloaded.data[:, 1], + u_reloaded.coords[:, 0] + 4.0 * u_reloaded.coords[:, 1], + atol=1.0e-12, + ) + np.testing.assert_allclose( + d_reloaded.data[:, 0], + 5.0 * d_reloaded.coords[:, 0] + 7.0 * d_reloaded.coords[:, 1], + atol=1.0e-12, + ) + + checkpoint_base = tmp_path / "restart" + mesh.write_checkpoint( + "restart", + outputPath=str(tmp_path), + meshUpdates=False, + meshVars=[x, u, d], + index=0, + separate_variable_files=False, + ) + + mesh_reloaded = uw.discretisation.Mesh(f"{checkpoint_base}.mesh.00000.h5") + x_reloaded = uw.discretisation.MeshVariable("x", mesh_reloaded, 1, degree=1) + u_reloaded = uw.discretisation.MeshVariable("u", mesh_reloaded, 2, degree=2) + d_reloaded = uw.discretisation.MeshVariable("d", mesh_reloaded, 1, degree=1, continuous=False) + + x_reloaded.read_checkpoint(f"{checkpoint_base}.checkpoint.00000.h5", data_name="x") + u_reloaded.read_checkpoint(f"{checkpoint_base}.checkpoint.00000.h5", data_name="u") + d_reloaded.read_checkpoint(f"{checkpoint_base}.checkpoint.00000.h5", data_name="d") + + assert_reloaded_fields(x_reloaded, u_reloaded, d_reloaded) + + separate_base = tmp_path / "restart_separate" + mesh.write_checkpoint( + "restart_separate", + outputPath=str(tmp_path), + meshUpdates=False, + meshVars=[x, u, d], + index=0, + ) + + mesh_reloaded = uw.discretisation.Mesh(f"{separate_base}.mesh.00000.h5") + x_reloaded = uw.discretisation.MeshVariable("x", mesh_reloaded, 1, degree=1) + u_reloaded = uw.discretisation.MeshVariable("u", mesh_reloaded, 2, degree=2) + d_reloaded = uw.discretisation.MeshVariable("d", mesh_reloaded, 1, degree=1, continuous=False) + + x_reloaded.read_checkpoint(f"{separate_base}.x.00000.h5", data_name="x") + 
u_reloaded.read_checkpoint(f"{separate_base}.u.00000.h5", data_name="u") + d_reloaded.read_checkpoint(f"{separate_base}.d.00000.h5", data_name="d") + + assert_reloaded_fields(x_reloaded, u_reloaded, d_reloaded) + + def test_swarm_save_and_load(tmp_path): import underworld3 as uw from underworld3.meshing import UnstructuredSimplexBox + tmp_path = _shared_tmp_path(tmp_path, uw) + mesh = UnstructuredSimplexBox(minCoords=(0.0, 0.0), maxCoords=(1.0, 1.0), cellSize=1.0 / 32.0) swarm = uw.swarm.Swarm(mesh) @@ -51,9 +157,12 @@ def test_swarm_save_and_load(tmp_path): def test_swarmvariable_save_and_load(tmp_path): + import underworld3 as uw from underworld3 import swarm from underworld3.meshing import UnstructuredSimplexBox + tmp_path = _shared_tmp_path(tmp_path, uw) + mesh = UnstructuredSimplexBox(minCoords=(0.0, 0.0), maxCoords=(1.0, 1.0), cellSize=1.0 / 32.0) swarm = swarm.Swarm(mesh) var = swarm.add_variable(name="X", size=1)