
Oc-ccl finetune code #23

Open
DeFisch wants to merge 5 commits into main from oc-ccl-finetune

Conversation

@DeFisch
Collaborator

@DeFisch DeFisch commented Apr 30, 2026

No description provided.

DeFisch added 3 commits April 30, 2026 11:15
OC-CCL (One-shot Cycle-Consistency Learning) trains SAM2 by running the
cycle reference -> query -> reference, supervised against the GT mask
with BCE + Dice. The DifferentiableSAM2Tracker bypasses SAM2's
@torch.inference_mode() decorators so gradients flow through tracking.
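
For readers unfamiliar with the supervision term, the BCE + Dice combination mentioned above can be sketched without any framework. This is only an illustration of the arithmetic on a flattened mask; the equal weighting of the two terms, the `eps` smoothing, and the function names are assumptions, not the values or API used in src/sst/oc_ccl.py.

```python
import math


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bce_dice_loss(logits, target, eps=1e-6):
    """Binary cross-entropy plus soft Dice loss over a flattened mask.

    logits: raw per-pixel scores; target: per-pixel GT labels (0 or 1).
    Equal weighting of the two terms is an assumption for illustration.
    """
    probs = [_sigmoid(z) for z in logits]

    # BCE averaged over pixels; clamp probabilities to avoid log(0).
    bce = -sum(
        t * math.log(max(p, eps)) + (1 - t) * math.log(max(1.0 - p, eps))
        for p, t in zip(probs, target)
    ) / len(probs)

    # Soft Dice: 1 - 2*|P.G| / (|P| + |G|), smoothed by eps.
    intersection = sum(p * t for p, t in zip(probs, target))
    dice = 1.0 - (2.0 * intersection + eps) / (sum(probs) + sum(target) + eps)

    return bce + dice
```

In the cycle described above, a loss of this shape would be evaluated on the mask predicted back on the reference frame against the reference GT mask.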

Includes:
- src/sst/oc_ccl.py and src/sst/butterfly_dataset.py
- experiments/{oc_ccl_ablation,curriculum_oc_ccl,eval_all_ablations}.py
  plus launch_ablations.sh for the 16-run ablation grid
- Cambridge butterfly image downloaders (Zenodo)
- Hydra GlobalHydra guard and corrected sam2_hiera_l _target_ path
  so the vendored sam2 coexists with a pip-installed sam2
- README usage section and TODO checkbox
@DeFisch
Collaborator Author

DeFisch commented Apr 30, 2026

Runnable on my end, should be good to go

@egrace479
Member

@DeFisch, we do have a download package specifically designed for such purposes (it verifies images downloaded match expectations + other features) and to avoid bespoke download scripts in different projects (it's the recommended downloader for this dataset in particular).

Replaces the inline download_all_images.py / download_parallel.py
scripts with cautious-robot (https://github.com/Imageomics/cautious-robot),
keeping image fetching in line with other Imageomics tooling.

- data/cambridge_butterfly/build_download_csv.py flattens the per-species
  train/test JSONs into a single images.csv (columns: filename, file_url)
  with filename = <image_id>.<ext>, matching ButterflyOCCCLDataset's
  expected layout.
- README updated to: pip install cautious-robot, build CSV, run
  cautious-robot -i images.csv -o images/.
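
The flattening step described above might look roughly like the sketch below. The layout of the per-species JSONs is not shown in this thread, so the sketch starts from already-extracted `(image_id, file_url)` pairs; the helper names and the de-duplication of repeated images are assumptions, not the contents of build_download_csv.py.

```python
import csv
import os
from urllib.parse import urlparse


def manifest_rows(entries):
    """Yield (filename, file_url) rows with filename = <image_id>.<ext>.

    entries: iterable of (image_id, file_url) pairs, however they were
    read from the split JSONs (that parsing is not shown here).
    """
    seen = set()
    for image_id, file_url in entries:
        # Take the extension (including the dot) from the URL path only,
        # ignoring any query string.
        ext = os.path.splitext(urlparse(file_url).path)[1]
        filename = f"{image_id}{ext}"
        if filename in seen:  # assumption: skip images listed twice
            continue
        seen.add(filename)
        yield filename, file_url


def write_manifest(entries, out):
    """Write the manifest to an open text file object."""
    writer = csv.writer(out)
    writer.writerow(["filename", "file_url"])
    writer.writerows(manifest_rows(entries))
```

Open the output file with `newline=""` when calling `write_manifest`, as the `csv` module documentation recommends.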

Note: cautious-robot is sequential. For ~4700 images this takes
considerably longer than the previous 16-worker parallel script, but
benefits from cautious-robot's checksum verification and retry logic.
Comment thread data/cambridge_butterfly/build_download_csv.py
The image manifest is now checked into the repo so users can run
cautious-robot directly, with --verifier-col md5 catching any corrupted
or modified downloads against the source-of-truth checksums.

- data/cambridge_butterfly/images.csv (4727 rows: filename, file_url, md5)
  is generated by querying the Zenodo public API for each of the 19
  records referenced in train_test_separate/*.json.
- build_download_csv.py is now a maintenance script: re-run only when
  the JSON splits change. It fetches fresh md5s from Zenodo per record.
- README updated to drop the build step from the user flow and to pass
  --checksum-algorithm md5 --verifier-col md5 to cautious-robot.
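
As a rough illustration of the per-record md5 lookup, the sketch below reads checksums out of a Zenodo record. It assumes the public endpoint `https://zenodo.org/api/records/<id>` returns JSON whose `files` entries carry `key`, `checksum` (formatted `md5:<hex>`), and `links.self`; check those field names against the current Zenodo API documentation before relying on them. The function names are hypothetical, and mapping each file to its `<image_id>.<ext>` filename is left out.

```python
import json
from urllib.request import urlopen

# Assumed endpoint shape for Zenodo's public records API.
ZENODO_API = "https://zenodo.org/api/records/{record_id}"


def fetch_record(record_id):
    """Download and parse one record's metadata (network access required)."""
    with urlopen(ZENODO_API.format(record_id=record_id), timeout=30) as resp:
        return json.load(resp)


def file_checksums(record):
    """Return (key, url, md5) tuples for every file listed in a record dict."""
    rows = []
    for entry in record.get("files", []):
        # Assumed checksum format: "<algorithm>:<hex digest>".
        algorithm, _, digest = entry["checksum"].partition(":")
        if algorithm != "md5":
            raise ValueError(f"unexpected checksum algorithm: {algorithm!r}")
        rows.append((entry["key"], entry["links"]["self"], digest))
    return rows
```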

Verified end-to-end on a 2-row subset: cautious-robot 2.0.0 downloads
the images, computes md5s, and reports 'Buddy check successful'.
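
For anyone who wants to re-check an existing images/ directory by hand, the kind of comparison a checksum verifier performs can be sketched as follows. This is not cautious-robot's implementation; it only assumes the manifest has the `filename` and `md5` columns listed above, and the function names are hypothetical.

```python
import csv
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Compute a file's md5 hex digest, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(manifest_csv, image_dir):
    """Return filenames that are missing or whose md5 differs from the manifest."""
    bad = []
    with open(manifest_csv, newline="") as fh:
        for row in csv.DictReader(fh):
            path = Path(image_dir) / row["filename"]
            if not path.is_file() or md5_of(path) != row["md5"].lower():
                bad.append(row["filename"])
    return bad
```

An empty list from `find_mismatches("images.csv", "images/")` would mean every listed file is present and matches its recorded checksum.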
@DeFisch
Collaborator Author

DeFisch commented Apr 30, 2026

It seems cautious-robot downloads the images sequentially (unlike my previous parallel download script), so downloading them will take a few hours with it.

@hlapp
Member

hlapp commented Apr 30, 2026

@egrace479 distributed-downloader isn't the right tool here for downloading in parallel and efficiently?

@egrace479
Member

> @egrace479 distributed-downloader isn't the right tool here for downloading in parallel and efficiently?

distributed-downloader has a lot more overhead and is overkill for <5K images. The download is a one-time step needed only to re-train, so two lines of code (install and run) and then doing something else for a bit seems reasonable. We do anticipate adding speed-ups to cautious-robot in the future.


3 participants