scripts/ Overview

This directory contains the mainline scripts required to build, run, and maintain the evaluation pipeline.

Core Entrypoints

`pipeline_host.py`

Internal shared helper, not a user-facing CLI. It is responsible for:

running Step1 readability on the host with streamed output
performing guest clock / HTTPS preflight checks
forwarding host environment variables into guest-side subprocesses

Both run_single_task.py and auto_eval.py rely on it, so single-task and batch execution share the same host orchestration logic.

`run_single_task.py`

Single-task entrypoint. It:

runs Step1 readability on the host
performs guest preflight for the selected Lima instance
executes Step2 and Step3 inside the guest

Example:

python3 scripts/run_single_task.py \
  src/7.c \
  decompiled/retdec_out/arm32/7/7_gcc_O2_no_g.c \
  --arch arm32 \
  --original-bin build/arm32/7/7_gcc_O2_no_g \
  --llm-profile qwen3.5-plus \
  --results-dir runs/qwen_demo

`launch_auto_eval.py`

Interactive batch launcher. It:

selects a profile from config/
chooses a compatible key alias
launches auto_eval.py

Example:

python3 scripts/launch_auto_eval.py \
  --llm-profile glm_official \
  --arch arm64 \
  --results-dir results_glm_v4_full \
  --retry

`auto_eval.py`

Main batch-evaluation entrypoint. The current mainline behavior is:

run Step1 readability on the host
run guest clock / HTTPS preflight
execute Step2 and Step3 inside the matching Lima guest

Common usage:

python3 scripts/auto_eval.py \
  --arch arm32 \
  --src 7 \
  --bin-name 7_gcc_O2_no_g \
  --decompiler retdec \
  --llm-profile qwen3.5-plus \
  --results-dir runs/qwen_batch

This filtered command still exercises the full auto_eval.py orchestration path and is useful for validating a single task before scaling up. For larger runs, widen the filters or use launch_auto_eval.py.

Step3-only rerun:

python3 scripts/auto_eval.py \
  --src 3 5-23 6 7 \
  --arch arm64 \
  --step3-only \
  --results-dir results_glm_v4_full

`run_pipeline_in_docker.py`

Guest worker entrypoint. It is normally invoked by run_single_task.py or auto_eval.py, not directly from the host as a primary user command.

Example:

limactl shell binbench python3 -u scripts/run_pipeline_in_docker.py \
  src/1.c \
  decompiled/ghidra_out/arm64/1/1_gcc_O0_g.c \
  --original-bin build/arm64/1/1_gcc_O0_g \
  --results-dir results_glm_v4_full \
  --llm-profile glm_official

Notes:

the mainline path usually passes --skip-readability, because Step1 has already run on the host
--skip-step3 is a debugging switch
--step3-only is used to rerun semantic evaluation from an existing successful Step2 result

Build and Environment

`build_in_docker.py`

Compiles the benchmark source corpus inside the build container and writes outputs into build/.

podman build -t cross-compiler -f scripts/Dockerfile .
podman run --platform linux/amd64 --rm -v "$(pwd):/work" cross-compiler \
  python3 scripts/build_in_docker.py

`step3_preflight.py`

Environment preflight for Step3. It verifies that Lima, Frida, the results tree, and required artifacts are ready for semantic evaluation.

`setup_lima_env.sh`

`setup_lima_arm32_env.sh`

`setup_lima_x64_env.sh`

`setup_lima_x86_env.sh`

Initialize Lima environments for each architecture.

`lima_ssh_no_mux.sh`

SSH wrapper used by batch scripts to avoid instability caused by Lima multiplexing.

Container Files

`Dockerfile`

Primary build container definition.

`Dockerfile.arm64`

Container definition for ARM64-related build and runtime workflows.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

scripts/ Overview

Core Entrypoints

`pipeline_host.py`

`run_single_task.py`

`launch_auto_eval.py`

`auto_eval.py`

`run_pipeline_in_docker.py`

Build and Environment

`build_in_docker.py`

`step3_preflight.py`

`setup_lima_env.sh`

`setup_lima_arm32_env.sh`

`setup_lima_x64_env.sh`

`setup_lima_x86_env.sh`

`lima_ssh_no_mux.sh`

Container Files

`Dockerfile`

`Dockerfile.arm64`

FilesExpand file tree

README.md

Latest commit

History

README.md

File metadata and controls

scripts/ Overview

Core Entrypoints

pipeline_host.py

run_single_task.py

launch_auto_eval.py

auto_eval.py

run_pipeline_in_docker.py

Build and Environment

build_in_docker.py

step3_preflight.py

setup_lima_env.sh

setup_lima_arm32_env.sh

setup_lima_x64_env.sh

setup_lima_x86_env.sh

lima_ssh_no_mux.sh

Container Files

Dockerfile

Dockerfile.arm64

`pipeline_host.py`

`run_single_task.py`

`launch_auto_eval.py`

`auto_eval.py`

`run_pipeline_in_docker.py`

`build_in_docker.py`

`step3_preflight.py`

`setup_lima_env.sh`

`setup_lima_arm32_env.sh`

`setup_lima_x64_env.sh`

`setup_lima_x86_env.sh`

`lima_ssh_no_mux.sh`

`Dockerfile`

`Dockerfile.arm64`