Release v0.11.1: fix --json performance and output by paxcalpt · Pull Request #17 · HenriquesLab/TaskRepo

paxcalpt · 2026-04-17T17:11:21Z

Summary

Follow-up to PR #16 addressing review comments from claude[bot] and @Copilot.

Perf: load ID cache once per invocation instead of per-task. Previously _task_to_dict called get_display_id_from_uuid, which re-opened and linearly scanned the cache file on every call — O(N) file reads and O(N²) scans for N tasks.
Output correctness: switch JSON emission from json.dump(..., sys.stdout) to click.echo(json.dumps(...)), so Click's output layer (and test capture via CliRunner) sees the output.
UX: emit a one-time stderr hint when the ID cache is empty on a filtered view, so users aren't surprised by null id values.
Cleanup: drop redundant conditional display_id if display_id is not None else None.

What I did NOT change (and why)

Kept id typed as int | null rather than a mixed int | "uuid-prefix..." string. The uuid field is already the stable fallback, and a union type would break jq/schema consumers.
Left the docstring-only concern about stale IDs on filtered views alone — behavior matches the table view by design.

Test plan

8 new unit tests (tests/unit/test_list_json.py): field coverage, missing-cache → null id, None-date handling, JSON serializability, cache loading (missing/valid/malformed), regression guard against per-task cache reads.
Full suite: 160/160 passing (uv run pytest tests/ -q).
uv run ruff check clean.
Live sanity: tsk list --json -s pending --priority h returns 25 valid JSON objects.
After merge: tag v0.11.1 → PyPI publish, then just release taskrepo.

🤖 Generated with Claude Code

Addresses review feedback on PR #16: - Load the ID cache once per invocation instead of per-task, eliminating O(N²) scans and N redundant file reads for large task lists. - Switch to click.echo for JSON output so Click's test-capture layer and stdout redirection work correctly. - Emit a stderr hint when the ID cache is empty on a filtered view, so null id values aren't silent. - Drop a redundant conditional on the id field. - Add 8 unit tests covering task-to-dict serialization, cache loading (missing/valid/malformed), and a regression guard against O(N²) reads. Keeps the JSON schema stable (int | null id, uuid always present). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

claude · 2026-04-17T17:13:22Z

Code Review: PR #17 — Release v0.11.1

Overall this is a clean, focused follow-up. The fixes are correct and the reasoning is well-documented. A few observations below.

✅ What's well done

Performance fix is correct. _load_uuid_to_display_id() is a clean extraction: one disk read up-front, O(1) dict lookups per task. The regression guard in the tests (test_load_uuid_to_display_id_is_single_read_for_n_tasks) is a nice touch, even though it tests the calling convention rather than I/O directly.
click.echo fix is the right call. json.dump(..., sys.stdout) bypasses Click's output layer and breaks CliRunner capture — switching to click.echo(json.dumps(...)) is the correct idiom.
has_filters is defined before use (line ~113 in the merged file) — I checked; no undefined-variable risk.
Redundant conditional removed — display_id if display_id is not None else None → uuid_to_id.get(task.id) is a nice simplification.
Changelog and version bump are consistent and well-written.

🔍 Suggestions / minor issues

1. Write-then-read round-trip on unfiltered JSON runs (minor)

On an unfiltered tsk list --json call, the code:

Calls save_id_cache(sorted_tasks, rebalance=True) — writes the cache to disk.
Then immediately calls _load_uuid_to_display_id() — reads it back from disk.

This is a redundant write+read cycle. It's not a bug (the file is fresh and consistent), but it's wasteful. Consider having save_id_cache return the {uuid: display_id} map, or extracting the map inline from sorted_tasks after the save. Not blocking, but worth a follow-up ticket.

2. Stderr hint doesn't fire on empty-cache unfiltered --json (by design, but undocumented)

On an unfiltered run, save_id_cache writes the cache before _load_uuid_to_display_id() is called, so uuid_to_id is never empty post-write. This makes the hint condition not uuid_to_id and has_filters safe — but the logic relies on this ordering implicitly. A short comment near the hint would help future readers:

# Note: on unfiltered runs save_id_cache() above has already populated
# the cache, so uuid_to_id will not be empty here.
if not uuid_to_id and has_filters:

3. Missing CLI integration tests (gap)

All 8 new tests exercise _load_uuid_to_display_id and _task_to_dict directly — good unit coverage. But there's no CliRunner-based test that invokes list_tasks with --json and asserts on the full stdout output. This is the exact path that was broken before (json.dump → stdout bypass), and a regression could re-enter undetected. Recommend adding at least one integration test:

from click.testing import CliRunner
from taskrepo.cli.commands.list import list_tasks

def test_json_output_captured_by_cli_runner(...):
    runner = CliRunner()
    result = runner.invoke(list_tasks, ["--json"], obj=...)
    data = json.loads(result.output)
    assert isinstance(data, list)

4. Missing test for the stderr hint

test_load_uuid_to_display_id_returns_empty_when_cache_missing covers the empty-cache load, but no test verifies that the hint is actually emitted to stderr when the cache is empty on a filtered run. A CliRunner(mix_stderr=False) invocation would cover this path.

🔒 Security / correctness

Exception handling in _load_uuid_to_display_id (JSONDecodeError, KeyError, ValueError) is appropriately scoped — no over-broad except Exception.
default=str in json.dumps is an acceptable safety net; since all date fields are explicitly converted to ISO strings in _task_to_dict, it won't silently swallow surprises on those fields.

Summary

Approve with suggestions. The core fixes (O(N²) → O(N) I/O, Click output, UX hint) are correct and well-tested at the unit level. The two gaps — CLI integration test and stderr hint test — are the only things worth addressing before tagging v0.11.1, as they guard the exact regression path this PR is fixing.

🤖 Generated with Claude Code

Copilot

Pull request overview

This PR is a follow-up release to improve tsk list --json by eliminating per-task cache file reads, ensuring JSON output flows through Click’s output layer for correct capture, and adding tests to prevent regressions.

Changes:

Load the UUID→display-id cache once per tsk list --json invocation and use an in-memory map during serialization.
Emit JSON via click.echo(json.dumps(...)) and add a one-time stderr hint when the ID cache is empty on filtered views.
Add unit tests covering JSON field serialization and cache-loading behavior; bump version and update changelog.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File	Description
`src/taskrepo/cli/commands/list.py`	Adds `_load_uuid_to_display_id()`, updates JSON serialization path, and prints a stderr hint for empty cache on filtered output.
`tests/unit/test_list_json.py`	Introduces unit tests for JSON serialization and ID-cache loading/regression behavior.
`src/taskrepo/__version__.py`	Bumps version to `0.11.1`.
`CHANGELOG.md`	Documents the `0.11.1` changes and the added test coverage.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-17T17:13:35Z

+        return {entry["uuid"]: int(display_id) for display_id, entry in cache.items()}
+    except (json.JSONDecodeError, KeyError, ValueError):


_load_uuid_to_display_id()’s docstring says it returns {} when the cache is missing or malformed, but the implementation can still raise exceptions (e.g., OSError if the file exists but can’t be opened, or AttributeError/TypeError if the parsed JSON isn’t a dict and has no .items()). Consider validating cache is a dict and/or broadening the exception handling so tsk list --json doesn’t crash on a corrupt/unreadable cache file.

Suggested change

return {entry["uuid"]: int(display_id) for display_id, entry in cache.items()}

except (json.JSONDecodeError, KeyError, ValueError):

if not isinstance(cache, dict):

return {}

return {

entry["uuid"]: int(display_id)

for display_id, entry in cache.items()

if isinstance(entry, dict)

}

except (OSError, json.JSONDecodeError, KeyError, TypeError, ValueError):

Copilot · 2026-04-17T17:13:36Z

+    # created/modified are set by Task.__post_init__ defaults, so they are
+    # always serializable strings — just assert they round-trip through JSON.


This comment is inaccurate: Task.created/Task.modified defaults come from the dataclass default_factory=datetime.now (not Task.__post_init__), and in this test helper they’re also explicitly set in _make_task()’s defaults. Please update or remove the comment so it matches the actual source of these values.

Suggested change

# created/modified are set by Task.__post_init__ defaults, so they are

# always serializable strings — just assert they round-trip through JSON.

# `created`/`modified` are populated for these helper-built tasks, and

# `_task_to_dict()` should serialize them to JSON-safe string values.

- Harden _load_uuid_to_display_id: reject non-dict top-level, filter entries missing `uuid`, catch OSError/TypeError (Copilot comment). - Add one-line comment clarifying the stderr-hint ordering invariant (claude[bot] suggestion). - Fix inaccurate comment about Task.__post_init__ in the test helper (Copilot comment). - Add CLI integration tests via CliRunner covering the exact regression path this release fixes: --json output end-to-end, empty result → [], and stderr hint on filtered empty-cache runs (claude[bot] suggestion). - Add unit test for non-dict top-level cache files. Deferred as follow-up: the save-then-load round-trip on unfiltered runs (noted by claude[bot], non-blocking). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

claude · 2026-04-17T17:20:14Z

Code Review: Release v0.11.1 — `--json` performance and output correctness

Overall this is a clean, well-motivated follow-up. The two core fixes are correct, the test coverage is solid, and the changelog is thorough. A few observations below.

Strengths

Performance fix is correct. Moving the cache load out of _task_to_dict and into the call site eliminates the O(N²) file-read pattern. The dict comprehension in _load_uuid_to_display_id is a single O(N) pass — nothing more needed.
Output correctness fix is correct. click.echo(json.dumps(...)) routes through Click's output layer so CliRunner captures it properly. The old json.dump(..., sys.stdout) bypass was the root cause and is now gone.
Error handling in _load_uuid_to_display_id is appropriately defensive — missing file, malformed JSON, wrong top-level type, missing keys, and bad int conversion all return {} gracefully.
Test suite is thorough: 8 unit tests + 3 CLI integration tests, including the exact regression path that was broken.

Minor Issues

1. KeyError in the except clause is unreachable (harmless but confusing)

list.py:36

except (OSError, json.JSONDecodeError, KeyError, TypeError, ValueError):

The dict comprehension already guards with if isinstance(entry, dict) and "uuid" in entry, so entry["uuid"] will never raise KeyError. The int(display_id) call can raise ValueError, which is covered. KeyError can be dropped for clarity.

2. Regression guard test doesn't actually assert single-read behavior

tests/unit/test_list_json.py:125–135 (test_load_uuid_to_display_id_is_single_read_for_n_tasks) verifies that dict lookups return correct values, but it doesn't verify that no extra file I/O occurs. A future refactor that re-introduces a per-task file read would still pass this test. Consider using unittest.mock.patch("builtins.open") with a call counter, or patching _load_uuid_to_display_id itself in the integration test to assert it's called exactly once per list_tasks invocation. The docstring says "Regression guard" but it's really a correctness test.

3. Misleading stderr hint for --archived --json with empty cache

list.py:165–170

if not uuid_to_id and has_filters:
    click.echo(
        "Note: ID cache is empty; `id` fields will be null. "
        "Run `tsk list` without filters once to populate it.",
        err=True,
    )

has_filters is True when --archived is passed. But "Run tsk list without filters once to populate it" is misleading in the archived case — an unfiltered run would populate the cache with non-archived tasks, not archived ones. The archived task UUIDs would still be absent from the cache. Consider either excluding archived from has_filters for this hint, or adjusting the hint message.

4. get_cache_path() triggers migrate_legacy_files() on every --json invocation

id_mapping.py:18 — get_cache_path() calls migrate_legacy_files() as a side effect. This runs on every JSON output call. It's presumably idempotent and fast, but it's an implicit side effect in what looks like a pure getter. Not a bug, just worth being aware of.

Nits

test_json_output_stderr_hint_on_filtered_empty_cache relies on CliRunner's mix_stderr=True default to merge stderr into result.output. The comment correctly notes this, but making it explicit with runner = CliRunner(mix_stderr=True) would make the test's intent self-documenting.
No test covers --archived --json interaction — low priority but could catch the hint-message issue above.

Summary

Approve with minor suggestions. The performance and correctness fixes are solid and the test coverage is well above the project's 80% target. Items 2 and 3 above are the most worth addressing: #2 because the named regression guard doesn't actually guard the regression, and #3 because the hint message misleads users in the --archived case.

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings April 17, 2026 17:11

Copilot started reviewing on behalf of paxcalpt April 17, 2026 17:11 View session

Copilot AI reviewed Apr 17, 2026

View reviewed changes

paxcalpt merged commit b686ff5 into main Apr 17, 2026
8 checks passed

paxcalpt deleted the release/0.11.1 branch April 17, 2026 18:20

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Release v0.11.1: fix --json performance and output#17

Release v0.11.1: fix --json performance and output#17
paxcalpt merged 2 commits intomainfrom
release/0.11.1

paxcalpt commented Apr 17, 2026

Uh oh!

claude Bot commented Apr 17, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 17, 2026

Uh oh!

Copilot AI Apr 17, 2026

Uh oh!

claude Bot commented Apr 17, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		return {entry["uuid"]: int(display_id) for display_id, entry in cache.items()}
		except (json.JSONDecodeError, KeyError, ValueError):

		# created/modified are set by Task.__post_init__ defaults, so they are
		# always serializable strings — just assert they round-trip through JSON.

Conversation

paxcalpt commented Apr 17, 2026

Summary

What I did NOT change (and why)

Test plan

Uh oh!

claude Bot commented Apr 17, 2026

Code Review: PR #17 — Release v0.11.1

✅ What's well done

🔍 Suggestions / minor issues

🔒 Security / correctness

Summary

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot commented Apr 17, 2026

Code Review: Release v0.11.1 — --json performance and output correctness

Strengths

Minor Issues

Nits

Summary

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Code Review: Release v0.11.1 — `--json` performance and output correctness