OC-CCL (One-shot Cycle-Consistency Learning) trains SAM2 by running the
cycle reference -> query -> reference, supervised against the GT mask
with BCE + Dice. The DifferentiableSAM2Tracker bypasses SAM2's
@torch.inference_mode() decorators so gradients flow through tracking.
Includes:
- src/sst/oc_ccl.py and src/sst/butterfly_dataset.py
- experiments/{oc_ccl_ablation,curriculum_oc_ccl,eval_all_ablations}.py
plus launch_ablations.sh for the 16-run ablation grid
- Cambridge butterfly image downloaders (Zenodo)
- Hydra GlobalHydra guard and corrected sam2_hiera_l _target_ path
so the vendored sam2 coexists with a pip-installed sam2
- README usage section and TODO checkbox
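
For readers unfamiliar with the loss named above, here is a minimal sketch of a combined BCE + Dice objective in plain Python, just to show the arithmetic. It is not the code in src/sst/oc_ccl.py, which presumably works on torch tensors. The function name, the unweighted sum of the two terms, and the `eps` smoothing are assumptions.

```python
import math


def bce_dice_loss(probs, target, eps=1e-6):
    """Illustrative BCE + Dice on flat lists (hypothetical helper).

    probs:  predicted foreground probabilities, one per pixel
    target: ground-truth mask values (0 or 1), same length
    """
    n = len(probs)
    # Binary cross-entropy, clamping probabilities away from 0 to keep log finite.
    bce = -sum(
        t * math.log(max(p, eps)) + (1 - t) * math.log(max(1 - p, eps))
        for p, t in zip(probs, target)
    ) / n
    # Soft Dice loss: 1 - 2|P.T| / (|P| + |T|), smoothed by eps.
    intersection = sum(p * t for p, t in zip(probs, target))
    dice = 1 - (2 * intersection + eps) / (sum(probs) + sum(target) + eps)
    # Assumption: equal weighting of the two terms.
    return bce + dice
```

In the cycle described above, a loss of this shape would be evaluated on the mask predicted for the reference frame at the end of reference -> query -> reference, against the reference GT mask.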
|
Runnable on my end, should be good to go |
|
@DeFisch, we do have a download package specifically designed for such purposes (it verifies images downloaded match expectations + other features) and to avoid bespoke download scripts in different projects (it's the recommended downloader for this dataset in particular). |
Replaces the inline download_all_images.py / download_parallel.py scripts with cautious-robot (https://github.com/Imageomics/cautious-robot), keeping image fetching in line with other Imageomics tooling.
- data/cambridge_butterfly/build_download_csv.py flattens the per-species train/test JSONs into a single images.csv (columns: filename, file_url) with filename = <image_id>.<ext>, matching ButterflyOCCCLDataset's expected layout.
- README updated to: pip install cautious-robot, build CSV, run cautious-robot -i images.csv -o images/.
Note: cautious-robot is sequential. For ~4700 images this takes considerably longer than the previous 16-worker parallel script, but benefits from cautious-robot's checksum verification and retry logic.
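
The flattening step described in this commit could be sketched roughly as follows. The layout of the per-species JSONs is not shown in this thread, so the sketch starts from already-extracted `(image_id, file_url)` pairs. `manifest_rows` and `write_manifest` are hypothetical names, and taking the extension from the URL path is an assumption, not necessarily what build_download_csv.py does.

```python
import csv
from pathlib import PurePosixPath
from urllib.parse import urlparse


def manifest_rows(records):
    """Yield one manifest row per (image_id, file_url) pair."""
    for image_id, file_url in records:
        # Take the extension from the URL path, ignoring any query string.
        ext = PurePosixPath(urlparse(file_url).path).suffix
        yield {"filename": f"{image_id}{ext}", "file_url": file_url}


def write_manifest(records, out):
    """Write the filename,file_url CSV to an open text file object."""
    writer = csv.DictWriter(out, fieldnames=["filename", "file_url"])
    writer.writeheader()
    writer.writerows(manifest_rows(records))
```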
The image manifest is now checked into the repo so users can run cautious-robot directly, with --verifier-col md5 catching any corrupted or modified downloads against the source-of-truth checksums.
- data/cambridge_butterfly/images.csv (4727 rows: filename, file_url, md5) is generated by querying the Zenodo public API for each of the 19 records referenced in train_test_separate/*.json.
- build_download_csv.py is now a maintenance script: re-run only when the JSON splits change. It fetches fresh md5s from Zenodo per record.
- README updated to drop the build step from the user flow and to pass --checksum-algorithm md5 --verifier-col md5 to cautious-robot.
Verified end-to-end on a 2-row subset: cautious-robot 2.0.0 downloads the images, computes md5s, and reports 'Buddy check successful'.
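
As an independent spot-check of what the md5 column is for, the sketch below re-hashes files on disk and compares them with the manifest. It does not show how cautious-robot verifies downloads internally. `md5_of` and `find_bad_downloads` are hypothetical helpers, and only the column names (filename, md5) come from the commit message above.

```python
import csv
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Hex md5 of a file, read in chunks so large images are not loaded whole."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_bad_downloads(manifest_csv, image_dir):
    """Return filenames that are missing or whose md5 differs from the manifest."""
    bad = []
    with open(manifest_csv, newline="") as fh:
        for row in csv.DictReader(fh):
            path = Path(image_dir) / row["filename"]
            if not path.is_file() or md5_of(path) != row["md5"].lower():
                bad.append(row["filename"])
    return bad
```

Something like `find_bad_downloads("images.csv", "images/")` would return an empty list when every file is present and matches its checksum.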
|
It seems cautious-robot downloads the images sequentially (unlike my previous parallel download script), so downloading the images will take a few hours with it |
|
@egrace479 isn't distributed-downloader the right tool here for downloading efficiently in parallel? |
|