Add dependency verification documentation #110

Merged
jmhsieh merged 4 commits into main from jon/geneva-diagnostics
Feb 3, 2026
@jmhsieh jmhsieh commented Jan 29, 2026

Summary

  • Add documentation for diagnosing and resolving package version mismatches between local and Ray worker environments
  • Document the compare_ray_environments tool with programmatic and CLI usage
  • Include common issues (NumPy, PyTorch, attrs mismatches) and solutions using manifests, custom images, or conda
  • Add diagnostic workflow for troubleshooting serialization errors
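
As a rough illustration of the kind of version-mismatch check described above, here is a minimal standard-library sketch. It is not the `compare_ray_environments` implementation; the helper names `installed_versions` and `version_mismatches` are hypothetical, and it assumes the worker-side mapping has already been collected by some other means.

```python
from importlib.metadata import distributions


def installed_versions() -> dict[str, str]:
    """Map each installed distribution name (lowercased) to its version."""
    versions = {}
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with broken or missing metadata
            versions[name.lower()] = dist.version
    return versions


def version_mismatches(local: dict[str, str], remote: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Return {package: (local_version, remote_version)} for packages
    installed on both sides with different versions."""
    return {
        name: (local[name], remote[name])
        for name in sorted(local.keys() & remote.keys())
        if local[name] != remote[name]
    }
```

Calling `installed_versions()` on the driver and again inside a worker, then passing both results to `version_mismatches`, would surface packages such as numpy or attrs that differ between the two environments.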

Test plan

  • Preview docs locally with npx mintlify dev
  • Verify navigation shows new page under "Job execution" after "Execution contexts"
  • Check all links resolve correctly

🤖 Generated with Claude Code

Add docs for diagnosing and resolving package version mismatches
between local and Ray worker environments, including the
compare_ray_environments tool and common issues/solutions.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

jmhsieh commented Jan 29, 2026

The one link introduced here will point to the api docs page that will appear with this:
https://github.com/lancedb/geneva/pull/516

@dantasse dantasse left a comment

Thanks for the first cut here! I think for public docs we should be a lot more judicious than when we're generating a Claude internal doc: we need to be more sure that each bit is correct, and we need to be more concise.


**Symptoms**: `ModuleNotFoundError` for compiled extensions, segfaults.

**Solution**: Run Geneva from the same OS/architecture as your cluster (Linux x86_64).

Contributor

Suggested change
**Solution**: Run Geneva from the same OS/architecture as your cluster (Linux x86_64).
**Solution**: Run Geneva from the same OS/architecture as your cluster (Linux x86_64). Or, if that's not possible, install dependencies using `pip()` or `conda()` as described in [Execution Contexts](/geneva/jobs/contexts)

@jmhsieh jmhsieh Feb 2, 2026

I don't think that solution will fix the architecture mismatch. I think there is an explicit arch + pip approach that would be needed.

Contributor

Wait, it doesn't? Won't the worker just download from pip the right version of the library for their architecture?
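
For context on the question above: by default, pip selects wheels that match the platform of the interpreter it runs under, so whether an install-time fix helps depends on where the install actually happens. A minimal standard-library sketch for capturing the relevant fields on each side and listing those that differ (the helper names are hypothetical and not part of Geneva):

```python
import platform
import sys
import sysconfig


def platform_fingerprint() -> dict[str, str]:
    """Collect the interpreter and platform fields that affect wheel selection."""
    return {
        "system": platform.system(),    # e.g. "Linux" or "Darwin"
        "machine": platform.machine(),  # e.g. "x86_64" or "arm64"
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "platform_tag": sysconfig.get_platform(),
    }


def differing_fields(local: dict[str, str], remote: dict[str, str]) -> list[str]:
    """Return the sorted names of fields whose values differ between the two sides."""
    return sorted(
        key
        for key in local.keys() | remote.keys()
        if local.get(key) != remote.get(key)
    )
```

Running `platform_fingerprint()` locally and on a worker, then comparing the two with `differing_fields`, shows whether the two sides disagree on OS, architecture, or Python version.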

Comment on lines 23 to 50
## Diagnostic Workflow

When encountering serialization or "abstract class" errors:

<Steps>
<Step>
**Run the diagnostic tool**:
```bash
python -m geneva.runners.ray.compare_env
```
</Step>
<Step>
**Check PACKAGES: version mismatches** section first.
</Step>
<Step>
**Identify critical packages**: numpy, torch, pyarrow, attrs, pydantic.
</Step>
<Step>
**Fix with manifest** for quick testing:
```python
from geneva.manifest.builder import GenevaManifestBuilder
manifest = GenevaManifestBuilder.create("fix").pip(["numpy==1.26.4"]).build()
```
</Step>
<Step>
**Build custom image** for production (if using KubeRay).
</Step>
</Steps>

Contributor

I would take out this whole section. Users will probably only need some steps of this (maybe they do or don't build custom images, etc), and idk why Claude fixated on PACKAGES, I don't think that's the only interesting bit here

@jmhsieh jmhsieh Feb 2, 2026

I think the high-level view of how the tool can help is important. I'm going to move this section down, before ### programmatic usage, and improve the prose a bit.

+ kuberay-client
```

## Common Issues and Solutions

Contributor

I'd cut this section too; I don't think most of these common issues really pull their weight. We already highlighted them at the top of the page, and the solution to all of them is "use a manifest to make them the same."


## Fixing Mismatches

### Option 1: Manifest pip Dependencies

Contributor

Instead of 3 options (which are not the 3 options I'd choose; "switch to Conda" is probably not going to solve your woes!), I'd do something like this:

Suggested change
### Option 1: Manifest pip Dependencies
To fix any potentially problematic dependency mismatches, specify them in a Geneva Manifest. For prototyping, you can specify them using pip, conda, a requirements.txt, or an environment.yml file. See [Execution contexts](/geneva/jobs/contexts) for more details. For stable production jobs, we recommend baking the dependencies into the image.
To fix any missing env vars, pass them as ray_init_kwargs like so...

@jmhsieh jmhsieh Feb 2, 2026

Conda ends up being the solution for some users because the default Ray images use conda and may have conflicts.

I'm going to fold these solutions in next to the appropriate output sections so the page flows more easily.

jmhsieh and others added 3 commits February 2, 2026 11:48
… sections

Consolidate duplicate manifest examples by moving solutions directly under
relevant output sections: architecture fix under PYTHON/PLATFORM, env var
passing under Environment Variables, and package fixes under Packages.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

jmhsieh commented Feb 2, 2026

Ok, I reorganized the doc so it should flow and have less repetition now.

@dantasse dantasse left a comment

Ok, thanks for reorganizing and for all the changes, looks much better IMO. (couple remaining comments non-blocking)

@jmhsieh jmhsieh merged commit ddbcd96 into main Feb 3, 2026
2 of 3 checks passed
@jmhsieh jmhsieh deleted the jon/geneva-diagnostics branch February 3, 2026 03:15