Skip to content

adding the API contracts#7

Open
MikeLippincott wants to merge 1 commit intoWayScience:mainfrom
MikeLippincott:data_contracts
Open

adding the API contracts#7
MikeLippincott wants to merge 1 commit intoWayScience:mainfrom
MikeLippincott:data_contracts

Conversation

@MikeLippincott
Copy link
Copy Markdown
Member

Description

This PR adds data contracts for validating inputs and outputs. This is an initial step and will be added to as other modules get added in the future.

What kind of change(s) are included?

  • Documentation (changes docs or other related content)
  • Bug fix (fixes an issue).
  • Enhancement (adds functionality).
  • Breaking change (these changes would cause existing functionality to not work as expected).

Checklist

Please ensure that all boxes are checked before indicating that this pull request is ready for review.

  • I have read and followed the CONTRIBUTING.md guidelines.
  • I have searched for existing content to ensure this is not a duplicate.
  • I have performed a self-review of these additions (including spelling, grammar, and related).
  • These changes pass all pre-commit checks.
  • I have added comments to my code to help provide understanding
  • I have added a test which covers the code changes found within this PR
  • I have deleted all non-relevant text in this pull request template.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copy link
Copy Markdown
Member

@d33bs d33bs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job! I'm requesting changes mainly because I think there's existing tooling you could make use of to reduce the amount of new code you need to create.

If you feel strongly about keeping things as-is just let me know and I'll circle back to give this more thought.

Comment thread README.md
# ZedProfiler

[![Coverage](https://img.shields.io/badge/coverage-87%25-green)](#quality-gates)
[![Coverage](https://img.shields.io/badge/coverage-99%25-brightgreen)](#quality-gates)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How could we validate this coverage? At the moment this seems static, meaning it must be updated each time by some unknown force. If you have a reproducible method which updates this, consider documenting it under contributing.md. Otherwise, consider removing it altogether because part of code coverage is about building trust and reproducibility (which we directly collide with if we supply a measurement without any validation). A side effect of doing this is that it should increase development velocity by making it impossible to avoid an update if coverage changes.

The package accepts either:
The package accepts:
- Single-channel 3D arrays shaped (z, y, x)
- Multi-channel 4D arrays shaped (c, z, y, x)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happened to 4d arrays?

Comment on lines +17 to +21
EXPECTED_SPATIAL_DIMS = 3
TWO_DIMENSIONAL = 2
FOUR_DIMENSIONAL = 4
FIVE_OR_MORE_DIMENSIONS = 5
REQUIRED_RETURN_KEYS = ("image_array", "features", "metadata")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using Pydantic to help define classes which may be used for validation broadly. You could also likely incorporate Pandera, if DataFrames are to be included somehow, which integrates nicely with Pydantic. Doing this will save you effort from building bespoke data / contract validation tooling.

As you go through this effort you can use concepts from object-orientated programming (OOP) to help you modulate what is being validated and worked on at an atomic level within the software. For instance, what do these attributes pertain to - maybe it's "widget_x"? You can label it as such, using defined objects which are both validated and also passed by type to other portions of the software. You might eventually find that you also want "widget_y" with separate attributes too. Keep separation of concerns in mind and at the same time, avoid over-specific OOP ("super good enough" is often best).

@@ -1,31 +1,251 @@
"""Core data contracts shared across featurizers.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I strongly encourage greater specificity here with concern to "what kind" of contract, now that you're building this out. Specifically, Design by contract, Hoare Triples, and Abstract Data Typing come to mind.

return True


def validate_return_schema_contract(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unsure if this comes into play here exactly, but this came to mind: Consider using beartype to help validate objects at time of use.

Comment thread tests/conftest.py
Comment on lines +14 to +15
TEST_DIR = Path(__file__).parent
sys.path.insert(0, str(TEST_DIR))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like an antipattern which arose from how things are executed for tests. Consider finding a way to remove this code. Sometimes adding an __init__.py file can help, but it depends on what happened that made you add this in (I can't tell from just the code). Pytest can sometimes be a bit cranky about internal imports unless you explicitly tell it that the the test dir is a package with the init file.

Comment thread tests/conftest.py
Comment on lines +26 to +28
# ============================================================================
# FIXTURE: Minimal valid profile
# ============================================================================
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider avoiding this bespoke commenting pattern and instead using docstrings inside the fixture definition. Comment applies wherever this pattern popped up.

Comment thread tests/conftest.py
# FIXTURE: Small 3D image profile
# ============================================================================
@pytest.fixture
def small_image_profile() -> TestProfile:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using pytest's parametrize to collapse the code needed for repeated iteration over similar data. Comment applies to many tests in this pr.

@@ -1,4 +1,4 @@
"""Tests for package export ergonomics."""
r"""Tests for package export ergonomics."""
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The r here looks like a typo. Consider removing.

Comment thread pyproject.toml
"fire>=0.7.1",
"jinja2>=3.1.6",
"pandas>=3.0.2",
"pyarrow>=23.0.1",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't tell exactly, but is pyarrow needed for this PR?

@d33bs d33bs mentioned this pull request Apr 23, 2026
11 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants