Add auto-deploy feature to StreamWise dashboard by James-QiuHaoran · Pull Request #313 · Azure/realtimevideogen

James-QiuHaoran · 2026-05-16T00:34:53Z

Add an Auto Deploy button to the web dashboard that automatically optimizes GPU resource allocation across workflow components using the model provisioner's greedy allocator.

Depends on: #312 (refactor: move policies to model_provisioner)

Changes

streamwise/allocator_bridge.py (new): Maps allocator output to K8s deployment parameters (Model enum → container names, GPU specs)
streamwise/streamwise.py: Add /api/auto_deploy and /api/auto_deploy/confirm routes
streamwise/templates/add_pod.html: Auto-deploy UI section with GPU budget inputs, workflow selector, and deployment plan preview
tests/streamwise/test_allocator_bridge.py (new): 15 tests for allocator bridge
tests/streamwise/test_streamwise_auto_deploy.py (new): 10 tests for auto-deploy API
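
As a rough illustration of what the bridge does, the sketch below expands one abstract model allocation into per-container specs. The `Model` members, the `MODEL_TO_CONTAINERS` table, the `to_specs` helper, and the trimmed-down `DeploymentSpec` are hypothetical stand-ins rather than the contents of streamwise/allocator_bridge.py; only the idea of mapping a Model enum to container names and GPU specs comes from this PR.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Model(Enum):
    # Hypothetical stand-ins; the real enum lives in the simulator package.
    HF = "hf"
    OTHERS = "others"


# Assumed mapping from an abstract model to the containers that serve it.
MODEL_TO_CONTAINERS: dict[Model, list[str]] = {
    Model.HF: ["hunyuanframepack"],
    Model.OTHERS: ["kokoro", "yolo"],
}


@dataclass
class DeploymentSpec:
    """Reduced spec: one pod to create for one container."""
    container_name: str
    gpu: int
    gpu_type: Optional[str] = None


def to_specs(model: Model, gpu_type: str, gpus_per_replica: int,
             replicas: int) -> list[DeploymentSpec]:
    """Expand one allocation into one spec per container per replica."""
    specs: list[DeploymentSpec] = []
    for _ in range(replicas):
        for name in MODEL_TO_CONTAINERS[model]:
            specs.append(DeploymentSpec(name, gpus_per_replica, gpu_type))
    return specs


others = to_specs(Model.OTHERS, "a100", gpus_per_replica=1, replicas=1)
```

With these assumed tables, a single OTHERS allocation expands into two specs (kokoro and yolo), each asking for one GPU.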

Testing

108 streamwise tests pass (1 pre-existing Windows failure: test_list_files)
flake8 clean
mypy clean

…ioner/ Move the 6 policy/allocator files (greedy, milp, naive_baseline, hexgen, helix, policies) from simulator/ into streamwise/model_provisioner/ so they can be reused by both the simulator evaluation framework and the StreamWise serving system.
- Create streamwise/model_provisioner/ package with __init__.py that adds simulator/ to sys.path for foundation module access
- Create simulator/__init__.py that adds streamwise/ to sys.path so model_provisioner is importable from simulator code
- Update all imports across simulator files and 20 test files
- Switch data_loading.py to use Path instead of str for data_dir params
- Fix mypy issue in wrapper/run_httpserver.py (bytearray assignment)
- Add .venv to .flake8 exclude and .gitignore
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Support both local dev (../../simulator) and Docker (../simulator) paths when resolving the simulator directory for foundation module imports. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add an 'Auto Deploy' button to the web dashboard that automatically optimizes GPU resource allocation across workflow components using the model provisioner's greedy allocator.
- Add streamwise/allocator_bridge.py: maps allocator output to K8s deployment parameters (Model enum -> container names, GPU specs)
- Add /api/auto_deploy and /api/auto_deploy/confirm routes to streamwise.py for computing and confirming deployment plans
- Add auto-deploy UI section to add_pod.html with GPU budget inputs, workflow selector, and deployment plan preview
- Add comprehensive tests for allocator bridge and auto-deploy API
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…ectory Add streamwise/ directory to sys.path explicitly in allocator_bridge.py so model_provisioner can be found when Python is invoked from a different working directory (e.g., in Docker/pipeline environments). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The StreamWise Docker image was missing the model_provisioner/ package and simulator/ foundation modules needed by the auto-deploy feature.
- Update deployment/setup_image.sh to copy model_provisioner/ and simulator/ into the Docker build context
- Update Dockerfile to COPY both directories
- Fix model_provisioner/__init__.py to find simulator/ in both local dev layout (../../simulator) and Docker layout (../simulator)
- Guard sys.path.insert with dedup check in allocator_bridge.py
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
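
The two-layout lookup and the deduplicated sys.path insert described in this commit could look roughly like the following. This is a minimal sketch: `find_simulator_dir` and `add_to_sys_path` are hypothetical helper names, and the real model_provisioner/__init__.py may structure the logic differently.

```python
import os
import sys
from typing import Optional


def find_simulator_dir(package_dir: str) -> Optional[str]:
    """Return the first existing simulator/ candidate, or None if neither exists."""
    candidates = [
        os.path.join(package_dir, "..", "..", "simulator"),  # local dev layout
        os.path.join(package_dir, "..", "simulator"),        # Docker layout
    ]
    for candidate in candidates:
        path = os.path.abspath(candidate)
        if os.path.isdir(path):
            return path
    return None


def add_to_sys_path(path: str) -> bool:
    """Insert path at the front of sys.path unless it is already listed."""
    if path in sys.path:
        return False
    sys.path.insert(0, path)
    return True
```

Checking for an existing directory instead of hard-coding one layout keeps the same module importable from a checkout and from the image, and the membership test avoids stacking duplicate entries when several modules run the same bootstrap code.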

The model_provisioner and simulator foundation modules (sim_types, data_loading, utils, greedy) require pandas and tabulate which were not previously needed by the StreamWise Docker image. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Adds a new "Auto Deploy" feature to the StreamWise dashboard: an HTTP API and UI that runs the greedy AutoModelAllocator over a user-supplied GPU budget + workflow, previews the resulting per-container deployment plan, and (on confirm) creates the corresponding pods via pod_manager.add_pod. A new bridge module translates the simulator's abstract Model/GPUType allocations into concrete K8s container specs, and the Dockerfile/setup script now bundle model_provisioner and simulator into the streamwise image.

Changes:

New streamwise/allocator_bridge.py mapping Model/GPUType allocations to deployment specs and exposing run_allocator + JSON serialization.
New /api/auto_deploy, /api/auto_deploy/confirm, /api/auto_deploy/workflows routes in streamwise.py, plus an Auto-Deploy UI section in add_pod.html with optimize/confirm flow.
Packaging changes (Dockerfile, setup_image.sh, requirements.txt) to ship simulator + model_provisioner and add pandas/tabulate; new tests for the bridge and the API routes.
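
The optimize/confirm flow can be pictured as two JSON POST requests. The sketch below only builds the requests and does not send them. The base URL, the use of POST for /api/auto_deploy, the `workflow` and `specs` key names, and the spec fields are assumptions made for illustration; `gpu_budget` is the one field name that appears in the route code quoted in this PR.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # hypothetical dashboard address


def build_post(path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Step 1: ask for a plan for a GPU budget and a workflow (key names assumed).
plan_request = build_post(
    "/api/auto_deploy",
    {"gpu_budget": {"A100": 8}, "workflow": "streamcast"},
)

# Step 2: after reviewing the preview, send the chosen specs back.
# The spec fields shown here are illustrative only.
confirm_request = build_post(
    "/api/auto_deploy/confirm",
    {"workflow": "streamcast", "specs": [{"container_name": "kokoro", "gpu": 1}]},
)
```

Splitting the flow into a read-only planning call and a separate confirm call lets the UI show the plan before any pod is created.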

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 14 comments.

File	Description
streamwise/allocator_bridge.py	New bridge: Model→container/MIG/resource mapping and `run_allocator`/serialization helpers.
streamwise/streamwise.py	Three new auto-deploy routes invoking the bridge and `pod_manager.add_pod`.
streamwise/templates/add_pod.html	Auto-deploy form, results table, and fetch logic for optimize/confirm.
streamwise/requirements.txt	Adds `pandas` and `tabulate` dependencies.
deployment/streamwise/Dockerfile	Copies `model_provisioner` and `simulator` into the image.
deployment/setup_image.sh	Stages `model_provisioner` and `simulator` for the streamwise image build.
tests/streamwise/test_allocator_bridge.py	Unit/integration tests for the bridge (mappings, specs, real allocator runs).
tests/streamwise/test_streamwise_auto_deploy.py	API tests for the three new routes, with K8s mocks and allocator patches.

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Agent-Logs-Url: https://github.com/Azure/realtimevideogen/sessions/d53596b3-c563-4deb-af27-7226e9dac364 Co-authored-by: James-QiuHaoran <22564180+James-QiuHaoran@users.noreply.github.com>

Agent-Logs-Url: https://github.com/Azure/realtimevideogen/sessions/e52c100d-a760-4a1a-8771-416155f3e835 Co-authored-by: James-QiuHaoran <22564180+James-QiuHaoran@users.noreply.github.com>

Copilot · 2026-05-17T22:31:59Z

Just as a heads up, I was blocked by some firewall rules while working on your feedback.

Warning

I tried to connect to the following addresses, but was blocked by firewall rules:

example.com
- Triggering command: /home/REDACTED/.local/bin/pytest /home/REDACTED/.local/bin/pytest --ignore=tests/simulator --ignore=tests/streamwise --ignore=tests/streamwise_app -vv -main/dist/indexgpg.program (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Configure Actions setup steps to set up my environment, which run before the firewall is enabled
Add the appropriate URLs or hosts to the custom allowlist in this repository's Copilot coding agent settings (admins only)

github-actions · 2026-06-09T00:51:22Z

Test Results

1 174 tests +29	1 171 ✅ +26	14m 56s ⏱️ -5s
13 suites ±0	0 💤 ±0
13 files ±0	3 ❌ +3

For more details on these failures, see this check.

Results for commit 19b539d. ± Comparison against base commit 9bcbc4e.

♻️ This comment has been updated with latest results.

Add /api/auto_deploy/cluster_gpus endpoint that aggregates allocatable GPUs by type (H100, A100, etc.) from all ready nodes. The auto-deploy form fetches this on page load and pre-fills the GPU budget text boxes. Also fixed NVIDIA device plugin toleration for Spot nodes (needed to register GPUs on AKS Spot node pools). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
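
A minimal sketch of the aggregation this commit describes is shown below. The node dictionary shape is partly assumed: `allocatable_resources["gpu"]` and `gpu_model` mirror names visible in the route code, while the `ready` flag and the `aggregate_gpus` helper are hypothetical.

```python
from typing import Any, Callable


def aggregate_gpus(
    nodes: list[dict[str, Any]],
    canonical: Callable[[str], str] = str.upper,
) -> dict[str, int]:
    """Sum allocatable GPUs per GPU type across ready nodes."""
    totals: dict[str, int] = {}
    for node in nodes:
        if not node.get("ready", False):  # assumed readiness flag
            continue
        raw = node.get("allocatable_resources", {}).get("gpu", 0)
        try:
            count = int(raw)  # Kubernetes quantities often arrive as strings
        except (TypeError, ValueError):
            continue
        if count <= 0:
            continue  # CPU-only node
        key = canonical(node.get("gpu_model", "unknown"))
        totals[key] = totals.get(key, 0) + count
    return totals


example_nodes = [
    {"ready": True, "gpu_model": "h100", "allocatable_resources": {"gpu": "8"}},
    {"ready": True, "gpu_model": "H100", "allocatable_resources": {"gpu": 8}},
    {"ready": False, "gpu_model": "A100", "allocatable_resources": {"gpu": 8}},
    {"ready": True, "gpu_model": "none", "allocatable_resources": {}},
]
budget = aggregate_gpus(example_nodes)
```

In this example the two ready H100 nodes contribute 16 GPUs, while the not-ready A100 node and the CPU-only node are skipped, so the form would be pre-filled with H100 only.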

The upstream NVIDIA device plugin DaemonSet lacks the Spot toleration, so it won't schedule on AKS Spot GPU nodes. Document that the local manifest (deployment/k8s/nvidia-device-plugin-ds.yaml) already includes this fix, and provide the patch command as a fallback. Also document the need for manual nvidia.com/gpu.product labels on nodes until GPU Feature Discovery is installed. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Previously A100 defaulted to 8 even when no A100 nodes exist. Now all fields default to 0 and are populated only from the cluster_gpus API. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

When MIG is unavailable, patch DEVICE_OPTIONS so OTHERS (kokoro+yolo) counts as 2 full GPUs instead of 1 MIG slice. Mark hunyuanframepackvae as co-located container (gpu=0) since it shares resources with the HunyuanFramePack server. This ensures budget=16 produces exactly 16 GPUs for StreamCast. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The allocator counts OTHERS as 1 GPU (SINGLE_DEVICE_MODELS constraint), but without MIG, kokoro+yolo each need a full GPU = 2. This caused per-type budget violations (e.g., A100=9 when budget=8). Instead of patching DEVICE_OPTIONS (ineffective due to allocator constraints), detect per-type overflow after allocation and trim excess replicas of the most-replicated container on the overflowing type. This preserves throughput while respecting per-type budgets. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

goiri · 2026-06-09T22:05:24Z

+                        <label for="auto_gpu_a100" class="form-label">A100</label>
+                        <input type="number" class="form-control" id="auto_gpu_a100" name="gpu_a100"
+                            min="0" max="64" value="0">
+                    </div>


Try to make it generic for GPUType

goiri · 2026-06-09T22:05:57Z

+                <div class="row g-3 mb-3">
+                    <div class="col-md-3">
+                        <label for="auto_gpu_a100" class="form-label">A100</label>
+                        <input type="number" class="form-control" id="auto_gpu_a100" name="gpu_a100"


Make a row per GPU type and make the number a slider

goiri · 2026-06-09T22:06:19Z

+            <table class="table table-sm table-bordered" id="auto-deploy-plan-table">
+                <thead>
+                    <tr>
+                        <th>Container</th>


This should show the friendly name

goiri · 2026-06-09T22:06:30Z

+                    <tr>
+                        <th>Container</th>
+                        <th>GPU</th>
+                        <th>GPU Type</th>


This should show the friendly name.

goiri · 2026-06-09T22:06:59Z

+                        <th>GPU Type</th>
+                        <th>CPU</th>
+                        <th>Memory</th>
+                        <th>MIG</th>


Next version may want to allow customizing the plan.

goiri · 2026-06-09T22:07:54Z

+                    </tr>
+                </thead>
+                <tbody id="auto-deploy-plan-body"></tbody>
+            </table>


Pass the disaggregation setting, or leave a comment noting that the plan is not disaggregated.

goiri · 2026-06-09T22:08:45Z

+                            `<strong>TTFF:</strong> ${metrics.ttff_s}s &nbsp;|&nbsp; ` +
+                            `<strong>Cost:</strong> $${metrics.cost} &nbsp;|&nbsp; ` +
+                            `<strong>GPUs Needed:</strong> ${actualGpus}` +
+                            (metrics.budget_exceeded ? ' <span class="text-danger">(exceeds budget!)</span>' : '');


Make this output nicer.

- Add /auto_deploy route with standalone auto_deploy.html page
- Add robot icon + 'Auto Deploy' button on main index page above Applications
- Remove auto-deploy section from add_pod.html (wrappers/apps add page)
- Confirm endpoint now also deploys the application container (e.g., streamcast) when a workflow name is provided in the request
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

In Docker the parent of /streamwise is /, which produced an invalid path /simulator/data/*.csv. Using _HERE resolves to /streamwise/simulator/data/ inside the container. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…ocator The greedy allocator asserts GPU counts are multiples of NUM_GPUS_PER_SERVER (8). Now we round up for the allocator call, then trim specs back to the user's actual budget. This allows non-multiple budgets like 26 or 30. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
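
The rounding step can be sketched as follows. The value 8 for NUM_GPUS_PER_SERVER is taken from the commit message; the function name `round_up_to_server` is hypothetical.

```python
NUM_GPUS_PER_SERVER = 8  # value stated in the commit message


def round_up_to_server(num_gpus: int, per_server: int = NUM_GPUS_PER_SERVER) -> int:
    """Round a GPU count up to the next multiple of per_server."""
    if num_gpus <= 0:
        return 0
    # Ceiling division using integers only.
    return -(-num_gpus // per_server) * per_server


user_budget = 26
allocator_gpus = round_up_to_server(user_budget)  # 32
excess_to_trim = allocator_gpus - user_budget     # 6
```

For a budget of 26 the allocator would be called with 32 GPUs, leaving up to 6 GPUs' worth of specs to trim afterwards; a budget of 30 also rounds to 32.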

Adds a red 🗑️ Delete All button next to the ➕ under Wrappers. Clicking it calls DELETE /api/pods/wrappers which removes all non-app pods in the rtgen namespace. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Replace 4-column fixed GPU type inputs with dynamic rows (one per type). Rows auto-populate from the cluster state (e.g., 32 H100s). Users can add/remove rows; the dropdown prevents duplicate types. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The delete-all-wrappers endpoint and the wrapper list on the main page now exclude the streamwise management pod and all STREAMWISE_APPS (load balancer / application containers). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…oy table The /api/auto_deploy response now includes friendly_name (from services.json friendlyName field via get_friendly_container_name) and uppercased gpu_type for each spec. The frontend displays these directly. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

COLOCATED_CONTAINERS was unconditionally forcing hunyuanframepackvae to gpu=0 even when the policy has disaggregation={Model.HF: True}. Now co-location only applies when disaggregation is disabled. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions · 2026-06-09T23:29:04Z

Lint Results

Check	Status
Python	✅
Shell	✅
YAML	✅
JSON	✅
Markdown	✅
Bicep	✅

github-actions · 2026-06-09T23:33:11Z

Mypy Type Checking

✅ No issues found

Metric	Count
📂 Files	300
❌ Errors	0
⚠️ Warnings	0
📝 Notes	0

Full mypy output

Success: no issues found in 300 source files

github-actions · 2026-06-09T23:49:07Z

Diff Coverage

Diff: origin/main...HEAD, staged and unstaged changes

streamwise/allocator_bridge.py (69.3%): Missing lines 19,58-62,117-124,137-141,144-148,150-154,157-159,226,232,235,240-247,249,290,292-293,321,331,335
streamwise/container_config.py (88.2%): Missing lines 16,20
streamwise/streamwise.py (54.1%): Missing lines 555,573,575-576,580-584,586-592,635,638-639,641-643,645,648-649,652-653,792,823-824,829-832,865-866,886-887,895-898,902-903,914-924,926-930,941-943,975-976,978,983-985,994,997-1004,1006
tests/streamwise/test_allocator_bridge.py (93.0%): Missing lines 255-259,268-269
tests/streamwise/test_streamwise.py (100%)
tests/streamwise/test_streamwise_auto_deploy.py (100%)

Summary

Total: 585 lines
Missing: 138 lines
Coverage: 76%
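
The summary percentage follows directly from the two line counts above. The short check below reproduces it; the reported figure appears to be the whole-number part of the exact ratio.

```python
total_lines = 585
missing_lines = 138

covered_lines = total_lines - missing_lines       # 447
coverage_pct = 100 * covered_lines / total_lines  # about 76.4

print(f"{covered_lines}/{total_lines} lines covered = {coverage_pct:.1f}%")
```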

streamwise/allocator_bridge.py

Lines 15-23

  15 if _HERE not in sys.path:
  16     sys.path.insert(0, _HERE)
  17 _REPO_ROOT = os.path.dirname(_HERE)
  18 if _REPO_ROOT not in sys.path:
! 19     sys.path.insert(0, _REPO_ROOT)
  20 import model_provisioner  # noqa: E402, F401 — adds simulator/ to sys.path
  21 
  22 from dataclasses import dataclass
  23 from typing import Optional

Lines 54-66

  54 
  55 
  56 def get_mig_profile(container_name: str, gpu_type: GPUType) -> Optional[str]:
  57     """Return a MIG profile only when MIG is available and the GPU type supports it."""
! 58     if not MIG_AVAILABLE:
! 59         return None
! 60     if gpu_type not in MIG_CAPABLE_GPU_TYPES:
! 61         return None
! 62     return MIG_CONTAINERS.get(container_name)
  63 
  64 
  65 # Mapping from StreamWise app name to simulator workflow key
  66 APP_TO_WORKFLOW: dict[str, str] = {

Lines 113-128

  113 
  114 
  115 def _calc_actual_gpus_per_type(specs: list['DeploymentSpec']) -> dict[GPUType, int]:
  116     """Calculate actual GPUs needed per GPUType from deployment specs."""
! 117     result: dict[GPUType, int] = {}
! 118     for spec in specs:
! 119         if spec.mig_profile:
! 120             continue
! 121         gpu_type = _POD_STR_TO_GPU_TYPE.get(spec.gpu_type or "")
! 122         if gpu_type is not None:
! 123             result[gpu_type] = result.get(gpu_type, 0) + spec.gpu
! 124     return result
  125 
  126 
  127 def _trim_specs_for_type(
  128     specs: list['DeploymentSpec'], gpu_type_str: str, excess: int

Lines 133-163

  133     Prefers removing replicas of the most-replicated scalable container (typically
  134     realesrgan/upscaler) to minimize impact on pipeline throughput.
  135     """
  136     # Count replicas per container on this GPU type (only scalable ones)
! 137     from collections import Counter
! 138     type_counts: Counter[str] = Counter()
! 139     for spec in specs:
! 140         if spec.gpu_type == gpu_type_str and spec.gpu > 0 and spec.container_name not in COLOCATED_CONTAINERS:
! 141             type_counts[spec.container_name] += 1
  142 
  143     # Prefer trimming containers with most replicas (least impact per removal)
! 144     trimmed = 0
! 145     result_specs = list(specs)
! 146     for container_name, _count in type_counts.most_common():
! 147         if trimmed >= excess:
! 148             break
  149         # Remove replicas from the end of the list
! 150         for i in range(len(result_specs) - 1, -1, -1):
! 151             if trimmed >= excess:
! 152                 break
! 153             spec = result_specs[i]
! 154             if (spec.container_name == container_name
  155                     and spec.gpu_type == gpu_type_str
  156                     and spec.gpu > 0):
! 157                 trimmed += spec.gpu
! 158                 result_specs.pop(i)
! 159     return result_specs
  160 
  161 
  162 def get_available_workflows() -> list[str]:
  163     """Return list of available workflow names for the UI."""
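
To make the trimming behaviour concrete, here is a self-contained rerun of the same strategy on the A100=9 versus budget=8 case mentioned in the commit message. `Spec` is a reduced stand-in for DeploymentSpec, the container mix is invented for illustration, and `trim_specs_for_type` is a simplified copy of the logic in the excerpt above, not the module's exact code.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Spec:
    """Minimal stand-in for DeploymentSpec (only the fields used here)."""
    container_name: str
    gpu: int
    gpu_type: Optional[str] = None


COLOCATED = {"hunyuanframepackvae"}


def trim_specs_for_type(specs: list[Spec], gpu_type: str, excess: int) -> list[Spec]:
    """Drop replicas of the most-replicated containers until excess GPUs are freed."""
    counts = Counter(
        s.container_name for s in specs
        if s.gpu_type == gpu_type and s.gpu > 0 and s.container_name not in COLOCATED
    )
    trimmed = 0
    result = list(specs)
    for name, _count in counts.most_common():
        if trimmed >= excess:
            break
        # Walk backwards so popping does not shift unvisited indices.
        for i in range(len(result) - 1, -1, -1):
            if trimmed >= excess:
                break
            s = result[i]
            if s.container_name == name and s.gpu_type == gpu_type and s.gpu > 0:
                trimmed += s.gpu
                result.pop(i)
    return result


# Invented plan: 4 + 3 + 1 + 1 = 9 A100s against a budget of 8.
plan = (
    [Spec("hunyuanframepack", 4, "a100")]
    + [Spec("realesrgan", 1, "a100") for _ in range(3)]
    + [Spec("kokoro", 1, "a100"), Spec("yolo", 1, "a100")]
)
trimmed_plan = trim_specs_for_type(plan, "a100", excess=1)
```

In this example one realesrgan replica is removed and the plan fits in 8 GPUs. Because each removal subtracts a whole spec, removing a multi-GPU replica can free more GPUs than strictly required.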

Lines 222-230

  222     # Load latency data and run allocator
  223     data_dir = _get_data_dir()
  224     latency_data = load_latency_data(data_dir=data_dir)
  225 
! 226     allocator = AutoModelAllocator(
  227         workflow=workflow,
  228         latency_data=latency_data,
  229         policy=STREAMWISE_POLICY,
  230     )

Lines 228-253

  228         latency_data=latency_data,
  229         policy=STREAMWISE_POLICY,
  230     )
  231 
! 232     result = allocator.allocate(num_gpus=allocator_gpus, verbose=False)
  233 
  234     # Convert result to deployment specs
! 235     specs = result_to_deployment_specs(result)
  236 
  237     # Trim deployment specs back to the user's actual budget.
  238     # Also handles MIG-unavailable overflow (e.g., OTHERS allocates 1 GPU
  239     # but kokoro+yolo each need a full GPU = 2).
! 240     actual_per_type = _calc_actual_gpus_per_type(specs)
! 241     for gpu_type, budget_count in num_gpus.items():
! 242         actual = actual_per_type.get(gpu_type, 0)
! 243         if actual <= budget_count:
! 244             continue
! 245         excess = actual - budget_count
! 246         gpu_type_str = GPU_TYPE_TO_POD_STR[gpu_type]
! 247         specs = _trim_specs_for_type(specs, gpu_type_str, excess)
  248 
! 249     return DeploymentPlan(
  250         specs=specs,
  251         result=result,
  252         workflow_name=workflow_name,
  253         gpu_budget=gpu_budget,

Lines 286-297

  286                         container_name in COLOCATED_CONTAINERS
  287                         and not STREAMWISE_POLICY.disaggregation.get(Model.HF, False)
  288                     )
  289                     if is_colocated:
! 290                         gpu_count = 0
  291                     elif MIG_AVAILABLE and container_name in MIG_CONTAINERS:
! 292                         mig_profile = MIG_CONTAINERS[container_name]
! 293                         gpu_count = 1
  294                     elif container_name in MIG_CONTAINERS:
  295                         # MIG not available: use 1 full GPU instead of a MIG slice
  296                         gpu_count = 1
  297                     else:

Lines 317-325

  317     # when MIG is unavailable and services fall back to full GPUs).
  318     actual_gpus: dict[str, int] = {}
  319     for spec in plan.specs:
  320         if spec.mig_profile:
! 321             continue  # MIG slices don't count against full GPU budget
  322         gpu_type_key = spec.gpu_type or "unknown"
  323         actual_gpus[gpu_type_key] = actual_gpus.get(gpu_type_key, 0) + spec.gpu
  324 
  325     total_budget = sum(plan.gpu_budget.values())

Lines 327-339

  327     budget_exceeded = total_actual > total_budget
  328 
  329     warnings: list[str] = []
  330     if budget_exceeded:
! 331         mig_hint = (
  332             "Enable MIG to fit lightweight services (kokoro, yolo, realesrgan) "
  333             "on shared GPU slices."
  334         ) if not MIG_AVAILABLE else ""
! 335         warnings.append(
  336             f"Deployment requires {total_actual} full GPUs but budget is "
  337             f"{total_budget}. {mig_hint}"
  338         )

streamwise/container_config.py

Lines 12-24

  12 import os
  13 
  14 _HERE = os.path.dirname(os.path.abspath(__file__))
  15 if _HERE not in sys.path:
! 16     sys.path.insert(0, _HERE)
  17 
  18 _REPO_ROOT = os.path.dirname(_HERE)
  19 if _REPO_ROOT not in sys.path:
! 20     sys.path.insert(0, _REPO_ROOT)
  21 
  22 # model_provisioner import adds simulator/ to sys.path
  23 import model_provisioner  # noqa: E402, F401

streamwise/streamwise.py

Lines 551-559

  551 
  552 @route("/auto_deploy", methods=["GET"])
  553 async def auto_deploy_page() -> str:
  554     """Render the standalone auto-deploy page. (TODO: enable customization after auto-deploy plan is generated)"""
! 555     return await render_template("auto_deploy.html")
  556 
  557 
  558 @route("/api/pod/<pod_name>", methods=["DELETE"])
  559 async def api_remove_pod(pod_name: str) -> QuartReturn:

Lines 569-596

  569 
  570 @route("/api/pods/wrappers", methods=["DELETE"])
  571 async def api_delete_all_wrappers() -> QuartReturn:
  572     """Delete all wrapper pods (non-app, non-system pods) in the namespace."""
! 573     svcs = await get_services(namespace=NAMESPACE, k8s_cluster=k8s_cluster)
  574     # Exclude app pods and the streamwise management pod itself
! 575     excluded = set(STREAMWISE_APPS) | {"streamwise"}
! 576     wrapper_pods = [
  577         svc["pod_name"] for svc in svcs
  578         if svc.get("container_name") not in excluded and svc.get("pod_name")
  579     ]
! 580     deleted = 0
! 581     errors: list[str] = []
! 582     for pod_name in wrapper_pods:
! 583         try:
! 584             await pod_manager.remove_pod(
  585                 pod_name, namespace=NAMESPACE, k8s_cluster=k8s_cluster)
! 586             deleted += 1
! 587         except Exception as e:
! 588             errors.append(f"{pod_name}: {e}")
! 589     result: dict[str, object] = {"deleted": deleted, "total": len(wrapper_pods)}
! 590     if errors:
! 591         result["errors"] = errors
! 592     return jsonify(result), HTTPStatus.OK
  593 
  594 
  595 @route("/api/services", methods=["GET"])
  596 async def api_get_services() -> QuartReturn:

Lines 631-657

  631     try:
  632         # Build container_dict from shared constants (CONTAINER_RESOURCES + MIG_CONTAINERS).
  633         # Format: container_name -> (cpu, memory_gib, ephemeral_storage_gib, gpu_info)
  634         # gpu_info is either an int (GPU count) or a MIG profile string.
! 635         container_dict: dict[str, tuple[int, int, int, Union[int, str]]] = {}
  636 
  637         # Services not in CONTAINER_RESOURCES (CPU-only or extra services)
! 638         container_dict["podcasttranscript"] = (1, 4, 16, 0)
! 639         container_dict["slidetranscript"] = (1, 4, 16, 0)
  640 
! 641         for name, (cpu, mem, storage) in CONTAINER_RESOURCES.items():
! 642             if name in MIG_CONTAINERS:
! 643                 container_dict[name] = (cpu, mem, storage, MIG_CONTAINERS[name])
  644             else:
! 645                 container_dict[name] = (cpu, mem, storage, min(2, max_gpus))
  646 
  647         # Additional services not covered by CONTAINER_RESOURCES
! 648         container_dict["fluxkontext"] = (12, 128, 64, 1)
! 649         container_dict["whisper"] = (2, 8, 16, 1)
  650 
  651         # hunyuanframepackvae uses exactly 1 GPU (not scaled by max_gpus)
! 652         cpu, mem, storage = CONTAINER_RESOURCES["hunyuanframepackvae"]
! 653         container_dict["hunyuanframepackvae"] = (cpu, mem, storage, 1)
  654 
  655         for container_name, (cpu, mem_gib, sotrage_gib, gpu_info) in container_dict.items():
  656             num_gpus, mig_profile = parse_gpu_info(gpu_info)
  657             await pod_manager.add_pod(

Lines 788-796

  788         if not gpu_budget or not isinstance(gpu_budget, dict):
  789             return jsonify({"error": "Missing or invalid 'gpu_budget' field"}), HTTPStatus.BAD_REQUEST
  790         for gpu_type_name, count in gpu_budget.items():
  791             if isinstance(count, bool) or not isinstance(count, int) or count < 0:
! 792                 return (
  793                     jsonify(
  794                         {
  795                             "error": (
  796                                 "Invalid 'gpu_budget' field: each GPU type count must be a "
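
The explicit `isinstance(count, bool)` test in the excerpt above matters because `bool` is a subclass of `int` in Python, so `True` would otherwise pass as a count of 1. A standalone sketch of the same check, with a hypothetical `validate_gpu_budget` helper and an illustrative error message:

```python
from typing import Any, Optional


def validate_gpu_budget(gpu_budget: Any) -> Optional[str]:
    """Return an error message for an invalid budget, or None if it is acceptable."""
    if not gpu_budget or not isinstance(gpu_budget, dict):
        return "Missing or invalid 'gpu_budget' field"
    for name, count in gpu_budget.items():
        # bool must be rejected first: isinstance(True, int) is True.
        if isinstance(count, bool) or not isinstance(count, int) or count < 0:
            return f"Invalid count for {name!r}: expected a non-negative integer"
    return None
```

Under this sketch `{"A100": 8}` is accepted, while `{"A100": True}`, `{"A100": "8"}`, `{"A100": -1}`, and an empty dictionary are all rejected.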

Lines 819-836

  819         return jsonify(result_json), HTTPStatus.OK
  820 
  821     except ValueError as ve:
  822         return jsonify({"error": str(ve)}), HTTPStatus.BAD_REQUEST
! 823     except AssertionError as ae:
! 824         msg = str(ae) if str(ae) else (
  825             "GPU budget too small. Each GPU type must have at least 8 GPUs "
  826             "(one full server). Use a single GPU type with 8+ GPUs, or "
  827             "ensure each type has at least 8."
  828         )
! 829         return jsonify({"error": msg}), HTTPStatus.BAD_REQUEST
! 830     except Exception as ex:
! 831         logging.exception("Error in auto_deploy: %s", ex)
! 832         return jsonify({"error": str(ex)}), HTTPStatus.INTERNAL_SERVER_ERROR
  833 
  834 
  835 @route("/api/auto_deploy/confirm", methods=["POST"])
  836 async def api_auto_deploy_confirm() -> QuartReturn:

Lines 861-870

  861 
  862         for spec in specs:
  863             container_name = spec.get("container_name")
  864             if not container_name:
! 865                 errors.append("Spec missing 'container_name'")
! 866                 continue
  867 
  868             try:
  869                 add_pod_result = await pod_manager.add_pod(
  870                     container_name=container_name,

Lines 882-891

  882                 if isinstance(add_pod_result, tuple) and len(add_pod_result) >= 2:
  883                     status_value = add_pod_result[1]
  884                     if isinstance(status_value, HTTPStatus):
  885                         status_code = status_value
! 886                     elif isinstance(status_value, int):
! 887                         status_code = HTTPStatus(status_value)
  888 
  889                 if status_code >= HTTPStatus.BAD_REQUEST:
  890                     msg = f"Failed to deploy '{container_name}' (status={int(status_code)})"
  891                     logging.error(msg)

Lines 891-907

  891                     logging.error(msg)
  892                     errors.append(msg)
  893                 else:
  894                     deployed.append(container_name)
! 895             except Exception as pod_ex:
! 896                 msg = f"Failed to deploy '{container_name}': {pod_ex}"
! 897                 logging.error(msg)
! 898                 errors.append(msg)
  899 
  900         # Also deploy the application container if workflow is specified
  901         if workflow and workflow in STREAMWISE_APPS:
! 902             try:
! 903                 add_pod_result = await pod_manager.add_pod(
  904                     container_name=workflow,
  905                     cpu=4,
  906                     memory_gib=16,
  907                     ephemeral_storage_gib=16,

Lines 910-934

  910                     mig_profile=None,
  911                     namespace=NAMESPACE,
  912                     k8s_cluster=k8s_cluster,
  913                 )
! 914                 status_code = HTTPStatus.OK
! 915                 if isinstance(add_pod_result, tuple) and len(add_pod_result) >= 2:
! 916                     status_value = add_pod_result[1]
! 917                     if isinstance(status_value, HTTPStatus):
! 918                         status_code = status_value
! 919                     elif isinstance(status_value, int):
! 920                         status_code = HTTPStatus(status_value)
! 921                 if status_code >= HTTPStatus.BAD_REQUEST:
! 922                     msg = f"Failed to deploy app '{workflow}' (status={int(status_code)})"
! 923                     logging.error(msg)
! 924                     errors.append(msg)
  925                 else:
! 926                     deployed.append(workflow)
! 927             except Exception as app_ex:
! 928                 msg = f"Failed to deploy app '{workflow}': {app_ex}"
! 929                 logging.error(msg)
! 930                 errors.append(msg)
  931 
  932         total_deployed = len(deployed)
  933         total_specs = len(specs) + (1 if workflow and workflow in STREAMWISE_APPS else 0)
  934         status = HTTPStatus.OK if not errors else HTTPStatus.MULTI_STATUS
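
The status extraction appears twice in the excerpts above, once for wrapper containers and once for the application container. One way to factor it out is a small helper like the one below; `status_from_result` is a hypothetical name and this is a sketch, not code from the PR.

```python
from http import HTTPStatus
from typing import Any


def status_from_result(result: Any, default: HTTPStatus = HTTPStatus.OK) -> HTTPStatus:
    """Extract the HTTP status from a (body, status) style return value."""
    if isinstance(result, tuple) and len(result) >= 2:
        value = result[1]
        if isinstance(value, HTTPStatus):
            return value
        if isinstance(value, int):
            # Raises ValueError for codes that HTTPStatus does not define.
            return HTTPStatus(value)
    return default


failed = status_from_result(("error body", 500)) >= HTTPStatus.BAD_REQUEST
```

Both call sites could then reduce to a single comparison against HTTPStatus.BAD_REQUEST, which would also shrink the number of uncovered lines that need separate tests.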

Lines 937-947

  937             "errors": errors,
  938             "message": f"Deployed {total_deployed}/{total_specs} containers.",
  939         }), status
  940 
! 941     except Exception as ex:
! 942         logging.exception("Error in auto_deploy/confirm: %s", ex)
! 943         return jsonify({"error": str(ex)}), HTTPStatus.INTERNAL_SERVER_ERROR
  944 
  945 
  946 @route("/api/auto_deploy/workflows", methods=["GET"])
  947 async def api_auto_deploy_workflows() -> QuartReturn:

Lines 971-989

  971             gpu_count = node.get("allocatable_resources", {}).get("gpu", 0)
  972             if isinstance(gpu_count, str):
  973                 try:
  974                     gpu_count = int(gpu_count)
! 975                 except ValueError:
! 976                     continue
  977             if gpu_count <= 0:
! 978                 continue
  979             # Map gpu_model label to canonical type name
  980             canonical = _gpu_label_to_canonical(gpu_model)
  981             gpu_counts[canonical] = gpu_counts.get(canonical, 0) + gpu_count
  982         return jsonify({"gpu_budget": gpu_counts}), HTTPStatus.OK
! 983     except Exception as ex:
! 984         logging.exception("Error in cluster_gpus: %s", ex)
! 985         return jsonify({"error": str(ex)}), HTTPStatus.INTERNAL_SERVER_ERROR
  986 
  987 
  988 def _gpu_label_to_canonical(gpu_model: str) -> str:
  989     """Map a GPU product label to a canonical type name for the allocator."""

Lines 990-1010

   990     model_upper = gpu_model.upper()
   991     if "H100" in model_upper:
   992         return "H100"
   993     elif "H200" in model_upper:
!  994         return "H200"
   995     elif "A100" in model_upper:
   996         return "A100"
!  997     elif "GB200" in model_upper:
!  998         return "GB200"
!  999     elif "GB300" in model_upper:
! 1000         return "GB300"
! 1001     elif "V100" in model_upper:
! 1002         return "V100"
! 1003     elif "A10" in model_upper:
! 1004         return "A10"
  1005     # Fallback: return as-is
! 1006     return gpu_model
  1007 
  1008 
  1009 @route("/api/node/<node_name>", methods=["DELETE"])
  1010 async def api_remove_node(node_name: str) -> QuartReturn:
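
The substring checks in `_gpu_label_to_canonical` are order-sensitive: `"A10"` is a substring of `"A100"`, so the `A100` branch has to be tested first. A minimal table-driven sketch of the same mapping follows. The function name `gpu_label_to_canonical_sketch`, the tuple name, and the example label strings are illustrative assumptions, not code from this PR.

```python
# Hypothetical table-driven variant of the mapping; not the PR's code.
# Order matters: "A100" must be checked before "A10", because "A10" is a
# substring of "A100".
_CANONICAL_GPU_TYPES: tuple[str, ...] = (
    "H100", "H200", "A100", "GB200", "GB300", "V100", "A10",
)


def gpu_label_to_canonical_sketch(gpu_model: str) -> str:
    """Return the first canonical GPU type contained in the label."""
    model_upper = gpu_model.upper()
    for canonical in _CANONICAL_GPU_TYPES:
        if canonical in model_upper:
            return canonical
    # Fallback mirrors the original function: return the label unchanged.
    return gpu_model
```

Keeping the types in one ordered tuple makes the precedence explicit and lets a test iterate over every entry, which might help with the uncovered branches marked `!` above.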

tests/streamwise/test_allocator_bridge.py

Lines 251-263

  251     plan = run_allocator(
  252         gpu_budget={"A100": 8},
  253         workflow_name="streamcast",
  254     )
! 255     assert isinstance(plan, DeploymentPlan)
! 256     assert len(plan.specs) > 0
! 257     assert plan.result.total_time_s > 0
! 258     assert plan.result.ttff_s > 0
! 259     assert plan.workflow_name == "streamcast"
  260 
  261 
  262 def test_run_allocator_streamchat_8_h100() -> None:
  263     """Run allocator for StreamChat with 8 H100s."""

Lines 264-273

  264     plan = run_allocator(
  265         gpu_budget={"H100": 8},
  266         workflow_name="streamchat",
  267     )
! 268     assert isinstance(plan, DeploymentPlan)
! 269     assert len(plan.specs) > 0
  270 
  271 
  272 def test_run_allocator_invalid_workflow() -> None:
  273     """Unknown workflow name raises ValueError."""

github-actions · 2026-06-09T23:49:20Z

Package	Line Rate	Health
.	80%	✔
apps	67%	✔
apps.streamanimate	89%	✔
apps.streamcast	91%	✔
apps.streamchat	88%	✔
apps.streamdub	64%	✔
apps.streamedit	79%	✔
apps.streamlecture	94%	✔
apps.streammovie	89%	✔
apps.streampersona	64%	✔
apps.streamshort	71%	✔
simulator	90%	✔
streamwise	67%	✔
streamwise.model_provisioner	83%	✔
tests	99%	✔
tests.simulator	100%	✔
tests.streamwise	99%	✔
tests.streamwise_app	99%	✔
wrapper	67%	✔
wrapper.4kagent	94%	✔
wrapper.bagel	37%	❌
wrapper.cogview	45%	➖
wrapper.fantasytalking	55%	➖
wrapper.flux	74%	✔
wrapper.flux2	100%	✔
wrapper.flux2klein	99%	✔
wrapper.fluxkontext	100%	✔
wrapper.fluxkrea	99%	✔
wrapper.fluxupscaler	68%	✔
wrapper.hidream	88%	✔
wrapper.hunyuanavatar	74%	✔
wrapper.hunyuanframepack	52%	➖
wrapper.hunyuanframepackf1	59%	➖
wrapper.hunyuanframepackvae	63%	✔
wrapper.hunyuanimage	84%	✔
wrapper.imageresize	100%	✔
wrapper.januspro	91%	✔
wrapper.kokoro	87%	✔
wrapper.llamagen	65%	✔
wrapper.mock	86%	✔
wrapper.podcasttranscript	45%	➖
wrapper.qwenimage	89%	✔
wrapper.qwenimageedit	88%	✔
wrapper.realesrgan	77%	✔
wrapper.slidetranscript	59%	➖
wrapper.vibevoice	31%	❌
wrapper.vibevoice.schedule	50%	➖
wrapper.wan	34%	❌
wrapper.wan22	75%	✔
wrapper.xtts	75%	✔
wrapper.yolo	69%	✔
Summary	78% (27214 / 34734)	✔

goiri · 2026-06-10T01:14:49Z

@@ -0,0 +1,225 @@
+# StreamWise Demo: End-to-End AKS Deployment with GPU Spot Probing


Let's move this to a different PR. It should have a different structure too.

goiri · 2026-06-10T01:15:58Z

 kubectl apply -f deployment/k8s/nvidia-device-plugin-ds.yaml
 ```

+### 5.0 Critical: Spot Node Toleration and GPU Labels


Let's do this in a separate PR.

goiri · 2026-06-10T01:16:28Z

+    <meta name="viewport" content="width=device-width, initial-scale=1">
+    <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.5/dist/css/bootstrap.min.css" rel="stylesheet" integrity="sha384-SgOJa3DmI69IUzQ2PVdRZhwQ+dy64/BUtbMJw1MZ8t5HZApcHrRKUc4W0kG879m7" crossorigin="anonymous">
+    <link rel="stylesheet" href="{{ url_for('static', filename='style.css') }}">
+    <link rel="icon" href="data:image/svg+xml,<svg xmlns=%22http://www.w3.org/2000/svg%22 viewBox=%220 0 100 100%22><text y=%22.9em%22 font-size=%2290%22>🎬</text></svg>">


Change the emoji to the robot?

goiri · 2026-06-10T01:19:36Z

+
+# Mapping from simulator Model enum to concrete container names used by pod_manager.
+# Some Model entries map to multiple containers (e.g., OTHERS -> kokoro + yolo).
+MODEL_TO_CONTAINERS: dict[Model, list[str]] = {


Can we get these names from services.json?

goiri · 2026-06-10T01:21:24Z

+
+
+# Mapping from StreamWise app name to simulator workflow key
+APP_TO_WORKFLOW: dict[str, str] = {


Where do these second names come from? Why not just use the app names? We have the names in some class, no?

goiri · 2026-06-10T01:22:59Z

+# Default CPU/memory/storage for each container when deployed via auto-deploy.
+# Format: (cpu_cores, memory_gib, ephemeral_storage_gib)
+# Keep in sync with the Helm values in deployment/helm/values.yaml.
+CONTAINER_RESOURCES: dict[str, tuple[int, int, int]] = {


Don't we have something similar in add_pod? Can we reuse it?

goiri · 2026-06-10T01:23:11Z

+}
+
+# GPU type string used by pod_manager (lowercase).
+GPU_TYPE_TO_POD_STR: dict[GPUType, str] = {


Do we really need this?

James-QiuHaoran mentioned this pull request May 16, 2026

Move model provisioning policies out of the simulator #311

Closed

Haoran Qiu and others added 4 commits May 16, 2026 16:32

Fix model_provisioner __init__.py to support Docker layout

3c324f7

Support both local dev (../../simulator) and Docker (../simulator) paths when resolving the simulator directory for foundation module imports. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

James-QiuHaoran force-pushed the hqiu/auto-deploy branch from 91cf042 to 38997c3 May 16, 2026 23:32

James-QiuHaoran requested review from Copilot and goiri May 17, 2026 16:39

Copilot started reviewing on behalf of James-QiuHaoran May 17, 2026 16:40

Copilot AI reviewed May 17, 2026

Copilot started work on behalf of James-QiuHaoran May 17, 2026 22:23

James-QiuHaoran and others added 6 commits May 17, 2026 15:24

Add fallback for GPUs that do not support MIG

fc1abc9

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Move run_allocator to async call

367760b

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
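
The commit above moves the allocator call off the request path. The PR's actual mechanism is not shown here. One common way to keep an async route responsive while a blocking function runs is `asyncio.to_thread`. In the sketch below, `run_allocator_blocking` and `auto_deploy_handler` are hypothetical stand-ins, not the PR's functions.

```python
import asyncio
import time


def run_allocator_blocking(gpu_budget: dict[str, int]) -> dict[str, int]:
    """Hypothetical stand-in for a blocking, CPU-bound allocator call."""
    time.sleep(0.01)  # Simulate work that would otherwise stall the event loop.
    return {"gpus_used": sum(gpu_budget.values())}


async def auto_deploy_handler(gpu_budget: dict[str, int]) -> dict[str, int]:
    """Hypothetical async handler that delegates to the blocking allocator."""
    # asyncio.to_thread runs the function in a worker thread and awaits the
    # result, so other requests can be served in the meantime.
    return await asyncio.to_thread(run_allocator_blocking, gpu_budget)


result = asyncio.run(auto_deploy_handler({"H100": 8}))
```

`asyncio.to_thread` needs Python 3.9 or newer. On older interpreters, `loop.run_in_executor` plays the same role.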

Mock k8s API

81ec0a9

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Add GPU count check

32461f8

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Restore cwd-relative default for simulator data loading

28e322e

Agent-Logs-Url: https://github.com/Azure/realtimevideogen/sessions/d53596b3-c563-4deb-af27-7226e9dac364 Co-authored-by: James-QiuHaoran <22564180+James-QiuHaoran@users.noreply.github.com>

Fix auto-deploy confirm handling for add_pod error statuses

0bd57d9

Agent-Logs-Url: https://github.com/Azure/realtimevideogen/sessions/e52c100d-a760-4a1a-8771-416155f3e835 Co-authored-by: James-QiuHaoran <22564180+James-QiuHaoran@users.noreply.github.com>

Copilot finished work on behalf of James-QiuHaoran May 17, 2026 22:32

goiri and others added 4 commits May 19, 2026 21:46

Merge branch 'hqiu/refactor-model-provisioner' into hqiu/auto-deploy

efcbf80

Fix the data path

bc173a2

Validate budget

1cd5940
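
The validation rules added by this commit are not visible in this excerpt. As a rough sketch of what validating a GPU budget could involve, the example below uses assumed rules: a non-empty mapping, string type names, and non-negative integer counts with at least one GPU in total. The helper name `validate_gpu_budget` is hypothetical.

```python
def validate_gpu_budget(raw: object) -> dict[str, int]:
    """Hypothetical validator: return the budget with zero-count entries dropped."""
    if not isinstance(raw, dict) or not raw:
        raise ValueError("gpu_budget must be a non-empty object")
    budget: dict[str, int] = {}
    for gpu_type, count in raw.items():
        if not isinstance(gpu_type, str) or not gpu_type:
            raise ValueError("GPU type names must be non-empty strings")
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(count, bool) or not isinstance(count, int) or count < 0:
            raise ValueError(f"GPU count for {gpu_type} must be a non-negative integer")
        if count > 0:
            budget[gpu_type] = count
    if not budget:
        raise ValueError("gpu_budget must contain at least one GPU")
    return budget
```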

Better error message

d01ee1c

Base automatically changed from hqiu/refactor-model-provisioner to main June 9, 2026 00:28

Merge branch 'main' into hqiu/auto-deploy

0c3f95e

github-code-quality Bot found potential problems Jun 9, 2026

Comment thread streamwise/allocator_bridge.py Fixed

Comment thread tests/streamwise/test_streamwise_auto_deploy.py Dismissed

goiri reviewed Jun 9, 2026

Haoran Qiu and others added 8 commits June 9, 2026 10:42

Fix: default GPU budget to 0, let cluster state populate values

3a16c29

Previously A100 defaulted to 8 even when no A100 nodes exist. Now all fields default to 0 and are populated only from the cluster_gpus API. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
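
A minimal sketch of the behavior this commit describes: every field starts at 0 and is filled only from what the cluster reports. The aggregation loosely follows the `cluster_gpus` snippet shown earlier. The `gpu_model` node key, both function names, and the `known_types` tuple are assumptions made for illustration.

```python
def aggregate_gpu_budget(nodes: list[dict]) -> dict[str, int]:
    """Sum allocatable GPUs per GPU type across nodes (hypothetical helper)."""
    budget: dict[str, int] = {}
    for node in nodes:
        gpu_model = node.get("gpu_model", "")  # Assumed key name.
        gpu_count = node.get("allocatable_resources", {}).get("gpu", 0)
        if isinstance(gpu_count, str):
            try:
                gpu_count = int(gpu_count)
            except ValueError:
                continue  # Skip nodes with an unparsable GPU count.
        if not gpu_model or gpu_count <= 0:
            continue
        budget[gpu_model] = budget.get(gpu_model, 0) + gpu_count
    return budget


def form_defaults(budget: dict[str, int],
                  known_types: tuple[str, ...] = ("H100", "A100")) -> dict[str, int]:
    """Default every known GPU type to 0 unless the cluster reports it."""
    return {gpu_type: budget.get(gpu_type, 0) for gpu_type in known_types}
```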

Update test to reflect MIG_AVAILABLE=False behavior

b239abc

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add skills on demo end-to-end

485f579

Fixed lint and comments

9d1637b

github-code-quality Bot found potential problems Jun 9, 2026

Comment thread streamwise/allocator_bridge.py Dismissed

Comment thread streamwise/container_config.py Dismissed

Fixed data path

8886baf

goiri reviewed Jun 9, 2026

Haoran Qiu and others added 12 commits June 9, 2026 15:29

Format GPUs Needed as '30 H100s' instead of JSON

b514fbc

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
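
As an illustration of the formatting change named in this commit, the sketch below turns a GPU-count mapping into a string such as `30 H100s`. The real change may live in the template's JavaScript. The function name `format_gpus_needed`, the sorted ordering, and the empty-case text are assumptions.

```python
def format_gpus_needed(gpus: dict[str, int]) -> str:
    """Render a GPU-count mapping as human-readable text (hypothetical helper)."""
    parts: list[str] = []
    for gpu_type, count in sorted(gpus.items()):
        if count <= 0:
            continue  # Omit GPU types that are not needed.
        suffix = "" if count == 1 else "s"
        parts.append(f"{count} {gpu_type}{suffix}")
    return ", ".join(parts) if parts else "0 GPUs"
```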

Hide co-located containers (gpu=0) from auto-deploy results table

35af577

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Replace GPU count text box with range slider (8-100)

9de7648

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Left TODOs

19b539d

goiri reviewed Jun 10, 2026

Copilot AI commented May 17, 2026

I tried to connect to the following addresses, but was blocked by firewall rules:

github-actions Bot commented Jun 9, 2026 (edited)

Test Results

github-actions Bot commented Jun 9, 2026

Lint Results

github-actions Bot commented Jun 9, 2026

Mypy Type Checking

github-actions Bot commented Jun 9, 2026

Diff Coverage

Diff: origin/main...HEAD, staged and unstaged changes

Summary

streamwise/allocator_bridge.py

streamwise/container_config.py

streamwise/streamwise.py

tests/streamwise/test_allocator_bridge.py
