Add auto-deploy feature to StreamWise dashboard#313

Open
James-QiuHaoran wants to merge 38 commits into main from hqiu/auto-deploy

Conversation

@James-QiuHaoran

Collaborator

Add an Auto Deploy button to the web dashboard that automatically optimizes GPU resource allocation across workflow components using the model provisioner's greedy allocator.

Depends on: #312 (refactor: move policies to model_provisioner)

Changes

  • streamwise/allocator_bridge.py (new): Maps allocator output to K8s deployment parameters (Model enum → container names, GPU specs)
  • streamwise/streamwise.py: Add /api/auto_deploy and /api/auto_deploy/confirm routes
  • streamwise/templates/add_pod.html: Auto-deploy UI section with GPU budget inputs, workflow selector, and deployment plan preview
  • tests/streamwise/test_allocator_bridge.py (new): 15 tests for allocator bridge
  • tests/streamwise/test_streamwise_auto_deploy.py (new): 10 tests for auto-deploy API

Testing

  • 108 streamwise tests pass (1 pre-existing Windows failure: test_list_files)
  • flake8 clean
  • mypy clean

…ioner/

Move the 6 policy/allocator files (greedy, milp, naive_baseline, hexgen,
helix, policies) from simulator/ into streamwise/model_provisioner/ so
they can be reused by both the simulator evaluation framework and the
StreamWise serving system.

- Create streamwise/model_provisioner/ package with __init__.py that
  adds simulator/ to sys.path for foundation module access
- Create simulator/__init__.py that adds streamwise/ to sys.path so
  model_provisioner is importable from simulator code
- Update all imports across simulator files and 20 test files
- Switch data_loading.py to use Path instead of str for data_dir params
- Fix mypy issue in wrapper/run_httpserver.py (bytearray assignment)
- Add .venv to .flake8 exclude and .gitignore

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Haoran Qiu and others added 4 commits May 16, 2026 16:32
Support both local dev (../../simulator) and Docker (../simulator) paths
when resolving the simulator directory for foundation module imports.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
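
For illustration, the two-layout lookup described in this commit could be sketched as below. This is a minimal sketch, not the code in the PR: the helper name `resolve_simulator_dir` is hypothetical, and it assumes both relative paths are taken from the model_provisioner package directory.

```python
from pathlib import Path
from typing import Optional


def resolve_simulator_dir(package_dir: Path) -> Optional[Path]:
    """Return the first simulator/ directory that exists, or None.

    Hypothetical helper: assumes the relative paths from the commit
    message are resolved against the model_provisioner package directory.
    """
    candidates = [
        package_dir.parent.parent / "simulator",  # local dev: ../../simulator
        package_dir.parent / "simulator",         # Docker: ../simulator
    ]
    for candidate in candidates:
        if candidate.is_dir():
            return candidate.resolve()
    return None
```

Checking the local dev layout first means a checkout with both directories present keeps using the repository-level simulator/.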
Add an 'Auto Deploy' button to the web dashboard that automatically
optimizes GPU resource allocation across workflow components using the
model provisioner's greedy allocator.

- Add streamwise/allocator_bridge.py: maps allocator output to K8s
  deployment parameters (Model enum -> container names, GPU specs)
- Add /api/auto_deploy and /api/auto_deploy/confirm routes to
  streamwise.py for computing and confirming deployment plans
- Add auto-deploy UI section to add_pod.html with GPU budget inputs,
  workflow selector, and deployment plan preview
- Add comprehensive tests for allocator bridge and auto-deploy API

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ectory

Add streamwise/ directory to sys.path explicitly in allocator_bridge.py
so model_provisioner can be found when Python is invoked from a different
working directory (e.g., in Docker/pipeline environments).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The StreamWise Docker image was missing the model_provisioner/ package
and simulator/ foundation modules needed by the auto-deploy feature.

- Update deployment/setup_image.sh to copy model_provisioner/ and
  simulator/ into the Docker build context
- Update Dockerfile to COPY both directories
- Fix model_provisioner/__init__.py to find simulator/ in both local dev
  layout (../../simulator) and Docker layout (../simulator)
- Guard sys.path.insert with dedup check in allocator_bridge.py

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The model_provisioner and simulator foundation modules (sim_types,
data_loading, utils, greedy) require pandas and tabulate which were not
previously needed by the StreamWise Docker image.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment


Pull request overview

Adds a new "Auto Deploy" feature to the StreamWise dashboard: an HTTP API and UI that runs the greedy AutoModelAllocator over a user-supplied GPU budget + workflow, previews the resulting per-container deployment plan, and (on confirm) creates the corresponding pods via pod_manager.add_pod. A new bridge module translates the simulator's abstract Model/GPUType allocations into concrete K8s container specs, and the Dockerfile/setup script now bundle model_provisioner and simulator into the streamwise image.

Changes:

  • New streamwise/allocator_bridge.py mapping Model/GPUType allocations to deployment specs and exposing run_allocator + JSON serialization.
  • New /api/auto_deploy, /api/auto_deploy/confirm, /api/auto_deploy/workflows routes in streamwise.py, plus an Auto-Deploy UI section in add_pod.html with optimize/confirm flow.
  • Packaging changes (Dockerfile, setup_image.sh, requirements.txt) to ship simulator + model_provisioner and add pandas/tabulate; new tests for the bridge and the API routes.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 14 comments.

Show a summary per file
File | Description
--- | ---
streamwise/allocator_bridge.py | New bridge: Model→container/MIG/resource mapping and run_allocator/serialization helpers.
streamwise/streamwise.py | Three new auto-deploy routes invoking the bridge and pod_manager.add_pod.
streamwise/templates/add_pod.html | Auto-deploy form, results table, and fetch logic for optimize/confirm.
streamwise/requirements.txt | Adds pandas and tabulate dependencies.
deployment/streamwise/Dockerfile | Copies model_provisioner and simulator into the image.
deployment/setup_image.sh | Stages model_provisioner and simulator for the streamwise image build.
tests/streamwise/test_allocator_bridge.py | Unit/integration tests for the bridge (mappings, specs, real allocator runs).
tests/streamwise/test_streamwise_auto_deploy.py | API tests for the three new routes, with K8s mocks and allocator patches.

Comment thread streamwise/streamwise.py
Comment thread streamwise/allocator_bridge.py Outdated
Comment thread tests/streamwise/test_allocator_bridge.py
Comment thread streamwise/allocator_bridge.py Outdated
Comment thread streamwise/streamwise.py
Comment thread tests/streamwise/test_streamwise_auto_deploy.py
Comment thread tests/streamwise/test_streamwise_auto_deploy.py
Comment thread streamwise/streamwise.py Outdated
Comment thread streamwise/allocator_bridge.py
Comment thread streamwise/streamwise.py
James-QiuHaoran and others added 6 commits May 17, 2026 15:24
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Agent-Logs-Url: https://github.com/Azure/realtimevideogen/sessions/d53596b3-c563-4deb-af27-7226e9dac364

Co-authored-by: James-QiuHaoran <22564180+James-QiuHaoran@users.noreply.github.com>
Agent-Logs-Url: https://github.com/Azure/realtimevideogen/sessions/e52c100d-a760-4a1a-8771-416155f3e835

Co-authored-by: James-QiuHaoran <22564180+James-QiuHaoran@users.noreply.github.com>

Copilot AI commented May 17, 2026

Contributor

Just as a heads up, I was blocked by some firewall rules while working on your feedback. Expand below for details.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • example.com
    • Triggering command: /home/REDACTED/.local/bin/pytest /home/REDACTED/.local/bin/pytest --ignore=tests/simulator --ignore=tests/streamwise --ignore=tests/streamwise_app -vv -main/dist/indexgpg.program (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Base automatically changed from hqiu/refactor-model-provisioner to main June 9, 2026 00:28
Comment thread streamwise/allocator_bridge.py Fixed
Comment thread tests/streamwise/test_streamwise_auto_deploy.py Dismissed
@github-actions

github-actions Bot commented Jun 9, 2026


Test Results

1 174 tests  +29   1 171 ✅ +26   14m 56s ⏱️ -5s
   13 suites ± 0       0 💤 ± 0 
   13 files   ± 0       3 ❌ + 3 

For more details on these failures, see this check.

Results for commit 19b539d. ± Comparison against base commit 9bcbc4e.

♻️ This comment has been updated with latest results.

Comment thread streamwise/streamwise.py
Comment thread streamwise/allocator_bridge.py Outdated
Comment thread streamwise/allocator_bridge.py Outdated
Comment thread streamwise/allocator_bridge.py Outdated
Comment thread streamwise/allocator_bridge.py
Comment thread streamwise/templates/add_pod.html Outdated
Haoran Qiu and others added 8 commits June 9, 2026 10:42
Add /api/auto_deploy/cluster_gpus endpoint that aggregates allocatable GPUs
by type (H100, A100, etc.) from all ready nodes. The auto-deploy form fetches
this on page load and pre-fills the GPU budget text boxes.

Also fixed NVIDIA device plugin toleration for Spot nodes (needed to register
GPUs on AKS Spot node pools).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
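
As a rough illustration of the aggregation this commit describes, the sketch below sums allocatable GPUs per type over ready nodes. The node dictionary shape (`ready`, `gpu_model`, `allocatable_resources`), the label matching, and both function names are assumptions made for this example; the endpoint's real implementation may differ.

```python
from typing import Any

# Assumed canonical names; the real mapping lives in the PR's code.
_CANONICAL_NAMES = ("H100", "A100")


def canonical_gpu_type(label: str) -> str:
    """Map a node GPU label to a canonical type name (hypothetical rule)."""
    upper = label.upper()
    for name in _CANONICAL_NAMES:
        if name in upper:
            return name
    return upper or "UNKNOWN"


def aggregate_cluster_gpus(nodes: list[dict[str, Any]]) -> dict[str, int]:
    """Sum allocatable GPUs per canonical type across ready nodes."""
    totals: dict[str, int] = {}
    for node in nodes:
        if not node.get("ready", False):
            continue  # skip nodes that are not ready
        raw = node.get("allocatable_resources", {}).get("gpu", 0)
        try:
            count = int(raw)  # counts may arrive as strings
        except (TypeError, ValueError):
            continue
        if count <= 0:
            continue
        gpu_type = canonical_gpu_type(str(node.get("gpu_model", "")))
        totals[gpu_type] = totals.get(gpu_type, 0) + count
    return totals
```

The resulting dictionary has the same shape as the `gpu_budget` the form submits, so it can be used directly to pre-fill the inputs.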
The upstream NVIDIA device plugin DaemonSet lacks the Spot toleration, so
it won't schedule on AKS Spot GPU nodes. Document that the local manifest
(deployment/k8s/nvidia-device-plugin-ds.yaml) already includes this fix,
and provide the patch command as a fallback.

Also document the need for manual nvidia.com/gpu.product labels on nodes
until GPU Feature Discovery is installed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Previously A100 defaulted to 8 even when no A100 nodes exist. Now all
fields default to 0 and are populated only from the cluster_gpus API.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When MIG is unavailable, patch DEVICE_OPTIONS so OTHERS (kokoro+yolo)
counts as 2 full GPUs instead of 1 MIG slice.

Mark hunyuanframepackvae as co-located container (gpu=0) since it
shares resources with the HunyuanFramePack server.

This ensures budget=16 produces exactly 16 GPUs for StreamCast.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The allocator counts OTHERS as 1 GPU (SINGLE_DEVICE_MODELS constraint),
but without MIG, kokoro+yolo each need a full GPU = 2. This caused
per-type budget violations (e.g., A100=9 when budget=8).

Instead of patching DEVICE_OPTIONS (ineffective due to allocator
constraints), detect per-type overflow after allocation and trim
excess replicas of the most-replicated container on the overflowing
type. This preserves throughput while respecting per-type budgets.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
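
The overflow case from this commit message (A100=9 against a budget of 8) can be reproduced with a small standalone sketch. It is a simplified variant of the idea, not the PR's `_trim_specs_for_type`: the `Spec` dataclass, the `trim_overflow` name, and the `"a100"` type string are all hypothetical, and this version re-picks the most-replicated container after every removal.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Spec:
    """Hypothetical stand-in for a deployment spec."""
    container_name: str
    gpu_type: str
    gpu: int


def trim_overflow(specs: list[Spec], budget: dict[str, int]) -> list[Spec]:
    """Drop replicas of the most-replicated container until each type fits."""
    result = list(specs)  # leave the caller's list untouched
    for gpu_type, limit in budget.items():
        used = sum(s.gpu for s in result if s.gpu_type == gpu_type)
        while used > limit:
            counts = Counter(
                s.container_name for s in result
                if s.gpu_type == gpu_type and s.gpu > 0
            )
            if not counts:
                break  # nothing left to trim on this type
            victim = counts.most_common(1)[0][0]
            # Remove the last replica of the chosen container.
            for i in range(len(result) - 1, -1, -1):
                s = result[i]
                if s.container_name == victim and s.gpu_type == gpu_type and s.gpu > 0:
                    used -= s.gpu
                    result.pop(i)
                    break
    return result
```

With one 4-GPU server, kokoro, yolo, and three single-GPU upscaler replicas on a budget of 8, this removes a single upscaler replica and leaves the other services in place.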
Comment thread streamwise/allocator_bridge.py Dismissed
Comment thread streamwise/container_config.py Dismissed
Comment thread streamwise/templates/add_pod.html Outdated
Comment thread streamwise/templates/add_pod.html Outdated
<label for="auto_gpu_a100" class="form-label">A100</label>
<input type="number" class="form-control" id="auto_gpu_a100" name="gpu_a100"
min="0" max="64" value="0">
</div>

Collaborator

Try to make it generic for GPUType

Comment thread streamwise/templates/add_pod.html Outdated
<div class="row g-3 mb-3">
<div class="col-md-3">
<label for="auto_gpu_a100" class="form-label">A100</label>
<input type="number" class="form-control" id="auto_gpu_a100" name="gpu_a100"

Collaborator

Make a row per GPU type and make the number a slider

Comment thread streamwise/templates/add_pod.html Outdated
<table class="table table-sm table-bordered" id="auto-deploy-plan-table">
<thead>
<tr>
<th>Container</th>

Collaborator

This should show the friendly name

Comment thread streamwise/templates/add_pod.html Outdated
<tr>
<th>Container</th>
<th>GPU</th>
<th>GPU Type</th>

Collaborator

This should show the friendly name.

Comment thread streamwise/templates/add_pod.html Outdated
<th>GPU Type</th>
<th>CPU</th>
<th>Memory</th>
<th>MIG</th>

Collaborator

Next version may want to allow customizing the plan.

Comment thread streamwise/templates/add_pod.html Outdated
</tr>
</thead>
<tbody id="auto-deploy-plan-body"></tbody>
</table>

Collaborator

Pass the disaggregation setting, or leave a comment noting that this is not disaggregated.

Comment thread streamwise/templates/add_pod.html Outdated
`<strong>TTFF:</strong> ${metrics.ttff_s}s &nbsp;|&nbsp; ` +
`<strong>Cost:</strong> $${metrics.cost} &nbsp;|&nbsp; ` +
`<strong>GPUs Needed:</strong> ${actualGpus}` +
(metrics.budget_exceeded ? ' <span class="text-danger">(exceeds budget!)</span>' : '');

Collaborator

Make this output nicer.

Comment thread streamwise/templates/add_pod.html Outdated
Comment thread streamwise/templates/add_pod.html Outdated
Haoran Qiu and others added 12 commits June 9, 2026 15:29
- Add /auto_deploy route with standalone auto_deploy.html page
- Add robot icon + 'Auto Deploy' button on main index page above Applications
- Remove auto-deploy section from add_pod.html (wrappers/apps add page)
- Confirm endpoint now also deploys the application container (e.g., streamcast)
  when a workflow name is provided in the request

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
In Docker the parent of /streamwise is /, which produced an invalid
path /simulator/data/*.csv.  Using _HERE resolves to
/streamwise/simulator/data/ inside the container.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ocator

The greedy allocator asserts GPU counts are multiples of
NUM_GPUS_PER_SERVER (8). Now we round up for the allocator call, then
trim specs back to the user's actual budget. This allows non-multiple
budgets like 26 or 30.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
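
The rounding step in this commit amounts to a ceiling to the next multiple of the server size. A minimal sketch, assuming a per-type budget dictionary keyed by GPU type name; the function names are hypothetical and only `NUM_GPUS_PER_SERVER = 8` comes from the commit message:

```python
NUM_GPUS_PER_SERVER = 8  # server size stated in the commit message


def round_up_to_server(count: int, per_server: int = NUM_GPUS_PER_SERVER) -> int:
    """Round a GPU count up to the next multiple of per_server."""
    if count < 0:
        raise ValueError("GPU count must be non-negative")
    # Ceiling division without floats: -(-a // b) == ceil(a / b).
    return -(-count // per_server) * per_server


def allocator_budget(user_budget: dict[str, int]) -> dict[str, int]:
    """Budget passed to the allocator; specs are trimmed back afterwards."""
    return {gpu_type: round_up_to_server(n) for gpu_type, n in user_budget.items()}
```

A budget of 26 or 30 therefore reaches the allocator as 32, and the trimming pass is what brings the final plan back down to the count the user asked for.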
Adds a red 🗑️ Delete All button next to the ➕ under Wrappers.
Clicking it calls DELETE /api/pods/wrappers which removes all
non-app pods in the rtgen namespace.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace 4-column fixed GPU type inputs with dynamic rows (one per
type). Rows auto-populate from the cluster state (e.g., 32 H100s).
Users can add/remove rows; the dropdown prevents duplicate types.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The delete-all-wrappers endpoint and the wrapper list on the main page
now exclude the streamwise management pod and all STREAMWISE_APPS
(load balancer / application containers).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…oy table

The /api/auto_deploy response now includes friendly_name (from
services.json friendlyName field via get_friendly_container_name)
and uppercased gpu_type for each spec. The frontend displays these
directly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
COLOCATED_CONTAINERS was unconditionally forcing hunyuanframepackvae
to gpu=0 even when the policy has disaggregation={Model.HF: True}.
Now co-location only applies when disaggregation is disabled.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions

github-actions Bot commented Jun 9, 2026


Lint Results

Check Status
Python
Shell
YAML
JSON
Markdown
Bicep

@github-actions

github-actions Bot commented Jun 9, 2026


Mypy Type Checking

✅ No issues found

Metric | Count
--- | ---
📂 Files | 300
❌ Errors | 0
⚠️ Warnings | 0
📝 Notes | 0
Full mypy output
Success: no issues found in 300 source files

@github-actions

github-actions Bot commented Jun 9, 2026


Diff Coverage

Diff: origin/main...HEAD, staged and unstaged changes

  • streamwise/allocator_bridge.py (69.3%): Missing lines 19,58-62,117-124,137-141,144-148,150-154,157-159,226,232,235,240-247,249,290,292-293,321,331,335
  • streamwise/container_config.py (88.2%): Missing lines 16,20
  • streamwise/streamwise.py (54.1%): Missing lines 555,573,575-576,580-584,586-592,635,638-639,641-643,645,648-649,652-653,792,823-824,829-832,865-866,886-887,895-898,902-903,914-924,926-930,941-943,975-976,978,983-985,994,997-1004,1006
  • tests/streamwise/test_allocator_bridge.py (93.0%): Missing lines 255-259,268-269
  • tests/streamwise/test_streamwise.py (100%)
  • tests/streamwise/test_streamwise_auto_deploy.py (100%)

Summary

  • Total: 585 lines
  • Missing: 138 lines
  • Coverage: 76%

streamwise/allocator_bridge.py

Lines 15-23

  15 if _HERE not in sys.path:
  16     sys.path.insert(0, _HERE)
  17 _REPO_ROOT = os.path.dirname(_HERE)
  18 if _REPO_ROOT not in sys.path:
! 19     sys.path.insert(0, _REPO_ROOT)
  20 import model_provisioner  # noqa: E402, F401 — adds simulator/ to sys.path
  21 
  22 from dataclasses import dataclass
  23 from typing import Optional

Lines 54-66

  54 
  55 
  56 def get_mig_profile(container_name: str, gpu_type: GPUType) -> Optional[str]:
  57     """Return a MIG profile only when MIG is available and the GPU type supports it."""
! 58     if not MIG_AVAILABLE:
! 59         return None
! 60     if gpu_type not in MIG_CAPABLE_GPU_TYPES:
! 61         return None
! 62     return MIG_CONTAINERS.get(container_name)
  63 
  64 
  65 # Mapping from StreamWise app name to simulator workflow key
  66 APP_TO_WORKFLOW: dict[str, str] = {

Lines 113-128

  113 
  114 
  115 def _calc_actual_gpus_per_type(specs: list['DeploymentSpec']) -> dict[GPUType, int]:
  116     """Calculate actual GPUs needed per GPUType from deployment specs."""
! 117     result: dict[GPUType, int] = {}
! 118     for spec in specs:
! 119         if spec.mig_profile:
! 120             continue
! 121         gpu_type = _POD_STR_TO_GPU_TYPE.get(spec.gpu_type or "")
! 122         if gpu_type is not None:
! 123             result[gpu_type] = result.get(gpu_type, 0) + spec.gpu
! 124     return result
  125 
  126 
  127 def _trim_specs_for_type(
  128     specs: list['DeploymentSpec'], gpu_type_str: str, excess: int

Lines 133-163

  133     Prefers removing replicas of the most-replicated scalable container (typically
  134     realesrgan/upscaler) to minimize impact on pipeline throughput.
  135     """
  136     # Count replicas per container on this GPU type (only scalable ones)
! 137     from collections import Counter
! 138     type_counts: Counter[str] = Counter()
! 139     for spec in specs:
! 140         if spec.gpu_type == gpu_type_str and spec.gpu > 0 and spec.container_name not in COLOCATED_CONTAINERS:
! 141             type_counts[spec.container_name] += 1
  142 
  143     # Prefer trimming containers with most replicas (least impact per removal)
! 144     trimmed = 0
! 145     result_specs = list(specs)
! 146     for container_name, _count in type_counts.most_common():
! 147         if trimmed >= excess:
! 148             break
  149         # Remove replicas from the end of the list
! 150         for i in range(len(result_specs) - 1, -1, -1):
! 151             if trimmed >= excess:
! 152                 break
! 153             spec = result_specs[i]
! 154             if (spec.container_name == container_name
  155                     and spec.gpu_type == gpu_type_str
  156                     and spec.gpu > 0):
! 157                 trimmed += spec.gpu
! 158                 result_specs.pop(i)
! 159     return result_specs
  160 
  161 
  162 def get_available_workflows() -> list[str]:
  163     """Return list of available workflow names for the UI."""

Lines 222-230

  222     # Load latency data and run allocator
  223     data_dir = _get_data_dir()
  224     latency_data = load_latency_data(data_dir=data_dir)
  225 
! 226     allocator = AutoModelAllocator(
  227         workflow=workflow,
  228         latency_data=latency_data,
  229         policy=STREAMWISE_POLICY,
  230     )

Lines 228-253

  228         latency_data=latency_data,
  229         policy=STREAMWISE_POLICY,
  230     )
  231 
! 232     result = allocator.allocate(num_gpus=allocator_gpus, verbose=False)
  233 
  234     # Convert result to deployment specs
! 235     specs = result_to_deployment_specs(result)
  236 
  237     # Trim deployment specs back to the user's actual budget.
  238     # Also handles MIG-unavailable overflow (e.g., OTHERS allocates 1 GPU
  239     # but kokoro+yolo each need a full GPU = 2).
! 240     actual_per_type = _calc_actual_gpus_per_type(specs)
! 241     for gpu_type, budget_count in num_gpus.items():
! 242         actual = actual_per_type.get(gpu_type, 0)
! 243         if actual <= budget_count:
! 244             continue
! 245         excess = actual - budget_count
! 246         gpu_type_str = GPU_TYPE_TO_POD_STR[gpu_type]
! 247         specs = _trim_specs_for_type(specs, gpu_type_str, excess)
  248 
! 249     return DeploymentPlan(
  250         specs=specs,
  251         result=result,
  252         workflow_name=workflow_name,
  253         gpu_budget=gpu_budget,

Lines 286-297

  286                         container_name in COLOCATED_CONTAINERS
  287                         and not STREAMWISE_POLICY.disaggregation.get(Model.HF, False)
  288                     )
  289                     if is_colocated:
! 290                         gpu_count = 0
  291                     elif MIG_AVAILABLE and container_name in MIG_CONTAINERS:
! 292                         mig_profile = MIG_CONTAINERS[container_name]
! 293                         gpu_count = 1
  294                     elif container_name in MIG_CONTAINERS:
  295                         # MIG not available: use 1 full GPU instead of a MIG slice
  296                         gpu_count = 1
  297                     else:

Lines 317-325

  317     # when MIG is unavailable and services fall back to full GPUs).
  318     actual_gpus: dict[str, int] = {}
  319     for spec in plan.specs:
  320         if spec.mig_profile:
! 321             continue  # MIG slices don't count against full GPU budget
  322         gpu_type_key = spec.gpu_type or "unknown"
  323         actual_gpus[gpu_type_key] = actual_gpus.get(gpu_type_key, 0) + spec.gpu
  324 
  325     total_budget = sum(plan.gpu_budget.values())

Lines 327-339

  327     budget_exceeded = total_actual > total_budget
  328 
  329     warnings: list[str] = []
  330     if budget_exceeded:
! 331         mig_hint = (
  332             "Enable MIG to fit lightweight services (kokoro, yolo, realesrgan) "
  333             "on shared GPU slices."
  334         ) if not MIG_AVAILABLE else ""
! 335         warnings.append(
  336             f"Deployment requires {total_actual} full GPUs but budget is "
  337             f"{total_budget}. {mig_hint}"
  338         )

streamwise/container_config.py

Lines 12-24

  12 import os
  13 
  14 _HERE = os.path.dirname(os.path.abspath(__file__))
  15 if _HERE not in sys.path:
! 16     sys.path.insert(0, _HERE)
  17 
  18 _REPO_ROOT = os.path.dirname(_HERE)
  19 if _REPO_ROOT not in sys.path:
! 20     sys.path.insert(0, _REPO_ROOT)
  21 
  22 # model_provisioner import adds simulator/ to sys.path
  23 import model_provisioner  # noqa: E402, F401

streamwise/streamwise.py

Lines 551-559

  551 
  552 @route("/auto_deploy", methods=["GET"])
  553 async def auto_deploy_page() -> str:
  554     """Render the standalone auto-deploy page. (TODO: enable customization after auto-deploy plan is generated)"""
! 555     return await render_template("auto_deploy.html")
  556 
  557 
  558 @route("/api/pod/<pod_name>", methods=["DELETE"])
  559 async def api_remove_pod(pod_name: str) -> QuartReturn:

Lines 569-596

  569 
  570 @route("/api/pods/wrappers", methods=["DELETE"])
  571 async def api_delete_all_wrappers() -> QuartReturn:
  572     """Delete all wrapper pods (non-app, non-system pods) in the namespace."""
! 573     svcs = await get_services(namespace=NAMESPACE, k8s_cluster=k8s_cluster)
  574     # Exclude app pods and the streamwise management pod itself
! 575     excluded = set(STREAMWISE_APPS) | {"streamwise"}
! 576     wrapper_pods = [
  577         svc["pod_name"] for svc in svcs
  578         if svc.get("container_name") not in excluded and svc.get("pod_name")
  579     ]
! 580     deleted = 0
! 581     errors: list[str] = []
! 582     for pod_name in wrapper_pods:
! 583         try:
! 584             await pod_manager.remove_pod(
  585                 pod_name, namespace=NAMESPACE, k8s_cluster=k8s_cluster)
! 586             deleted += 1
! 587         except Exception as e:
! 588             errors.append(f"{pod_name}: {e}")
! 589     result: dict[str, object] = {"deleted": deleted, "total": len(wrapper_pods)}
! 590     if errors:
! 591         result["errors"] = errors
! 592     return jsonify(result), HTTPStatus.OK
  593 
  594 
  595 @route("/api/services", methods=["GET"])
  596 async def api_get_services() -> QuartReturn:

Lines 631-657

  631     try:
  632         # Build container_dict from shared constants (CONTAINER_RESOURCES + MIG_CONTAINERS).
  633         # Format: container_name -> (cpu, memory_gib, ephemeral_storage_gib, gpu_info)
  634         # gpu_info is either an int (GPU count) or a MIG profile string.
! 635         container_dict: dict[str, tuple[int, int, int, Union[int, str]]] = {}
  636 
  637         # Services not in CONTAINER_RESOURCES (CPU-only or extra services)
! 638         container_dict["podcasttranscript"] = (1, 4, 16, 0)
! 639         container_dict["slidetranscript"] = (1, 4, 16, 0)
  640 
! 641         for name, (cpu, mem, storage) in CONTAINER_RESOURCES.items():
! 642             if name in MIG_CONTAINERS:
! 643                 container_dict[name] = (cpu, mem, storage, MIG_CONTAINERS[name])
  644             else:
! 645                 container_dict[name] = (cpu, mem, storage, min(2, max_gpus))
  646 
  647         # Additional services not covered by CONTAINER_RESOURCES
! 648         container_dict["fluxkontext"] = (12, 128, 64, 1)
! 649         container_dict["whisper"] = (2, 8, 16, 1)
  650 
  651         # hunyuanframepackvae uses exactly 1 GPU (not scaled by max_gpus)
! 652         cpu, mem, storage = CONTAINER_RESOURCES["hunyuanframepackvae"]
! 653         container_dict["hunyuanframepackvae"] = (cpu, mem, storage, 1)
  654 
  655         for container_name, (cpu, mem_gib, sotrage_gib, gpu_info) in container_dict.items():
  656             num_gpus, mig_profile = parse_gpu_info(gpu_info)
  657             await pod_manager.add_pod(

Lines 788-796

  788         if not gpu_budget or not isinstance(gpu_budget, dict):
  789             return jsonify({"error": "Missing or invalid 'gpu_budget' field"}), HTTPStatus.BAD_REQUEST
  790         for gpu_type_name, count in gpu_budget.items():
  791             if isinstance(count, bool) or not isinstance(count, int) or count < 0:
! 792                 return (
  793                     jsonify(
  794                         {
  795                             "error": (
  796                                 "Invalid 'gpu_budget' field: each GPU type count must be a "

Lines 819-836

  819         return jsonify(result_json), HTTPStatus.OK
  820 
  821     except ValueError as ve:
  822         return jsonify({"error": str(ve)}), HTTPStatus.BAD_REQUEST
! 823     except AssertionError as ae:
! 824         msg = str(ae) if str(ae) else (
  825             "GPU budget too small. Each GPU type must have at least 8 GPUs "
  826             "(one full server). Use a single GPU type with 8+ GPUs, or "
  827             "ensure each type has at least 8."
  828         )
! 829         return jsonify({"error": msg}), HTTPStatus.BAD_REQUEST
! 830     except Exception as ex:
! 831         logging.exception("Error in auto_deploy: %s", ex)
! 832         return jsonify({"error": str(ex)}), HTTPStatus.INTERNAL_SERVER_ERROR
  833 
  834 
  835 @route("/api/auto_deploy/confirm", methods=["POST"])
  836 async def api_auto_deploy_confirm() -> QuartReturn:

Lines 861-870

  861 
  862         for spec in specs:
  863             container_name = spec.get("container_name")
  864             if not container_name:
! 865                 errors.append("Spec missing 'container_name'")
! 866                 continue
  867 
  868             try:
  869                 add_pod_result = await pod_manager.add_pod(
  870                     container_name=container_name,

Lines 882-891

  882                 if isinstance(add_pod_result, tuple) and len(add_pod_result) >= 2:
  883                     status_value = add_pod_result[1]
  884                     if isinstance(status_value, HTTPStatus):
  885                         status_code = status_value
! 886                     elif isinstance(status_value, int):
! 887                         status_code = HTTPStatus(status_value)
  888 
  889                 if status_code >= HTTPStatus.BAD_REQUEST:
  890                     msg = f"Failed to deploy '{container_name}' (status={int(status_code)})"
  891                     logging.error(msg)

Lines 891-907

  891                     logging.error(msg)
  892                     errors.append(msg)
  893                 else:
  894                     deployed.append(container_name)
! 895             except Exception as pod_ex:
! 896                 msg = f"Failed to deploy '{container_name}': {pod_ex}"
! 897                 logging.error(msg)
! 898                 errors.append(msg)
  899 
  900         # Also deploy the application container if workflow is specified
  901         if workflow and workflow in STREAMWISE_APPS:
! 902             try:
! 903                 add_pod_result = await pod_manager.add_pod(
  904                     container_name=workflow,
  905                     cpu=4,
  906                     memory_gib=16,
  907                     ephemeral_storage_gib=16,

Lines 910-934

  910                     mig_profile=None,
  911                     namespace=NAMESPACE,
  912                     k8s_cluster=k8s_cluster,
  913                 )
! 914                 status_code = HTTPStatus.OK
! 915                 if isinstance(add_pod_result, tuple) and len(add_pod_result) >= 2:
! 916                     status_value = add_pod_result[1]
! 917                     if isinstance(status_value, HTTPStatus):
! 918                         status_code = status_value
! 919                     elif isinstance(status_value, int):
! 920                         status_code = HTTPStatus(status_value)
! 921                 if status_code >= HTTPStatus.BAD_REQUEST:
! 922                     msg = f"Failed to deploy app '{workflow}' (status={int(status_code)})"
! 923                     logging.error(msg)
! 924                     errors.append(msg)
  925                 else:
! 926                     deployed.append(workflow)
! 927             except Exception as app_ex:
! 928                 msg = f"Failed to deploy app '{workflow}': {app_ex}"
! 929                 logging.error(msg)
! 930                 errors.append(msg)
  931 
  932         total_deployed = len(deployed)
  933         total_specs = len(specs) + (1 if workflow and workflow in STREAMWISE_APPS else 0)
  934         status = HTTPStatus.OK if not errors else HTTPStatus.MULTI_STATUS

Lines 937-947

  937             "errors": errors,
  938             "message": f"Deployed {total_deployed}/{total_specs} containers.",
  939         }), status
  940 
! 941     except Exception as ex:
! 942         logging.exception("Error in auto_deploy/confirm: %s", ex)
! 943         return jsonify({"error": str(ex)}), HTTPStatus.INTERNAL_SERVER_ERROR
  944 
  945 
  946 @route("/api/auto_deploy/workflows", methods=["GET"])
  947 async def api_auto_deploy_workflows() -> QuartReturn:
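
The status-normalization logic in the hunks above appears twice, once for the model containers and once for the application container. A minimal sketch of how it could be factored into one helper follows; `normalize_status` and `deploy_failed` are hypothetical names that do not appear in this PR, and the `(body, status)` tuple shape is assumed from how `add_pod_result` is inspected above.

```python
from http import HTTPStatus
from typing import Any


def normalize_status(result: Any, default: HTTPStatus = HTTPStatus.OK) -> HTTPStatus:
    """Extract an HTTPStatus from a (body, status, ...) style result.

    Hypothetical helper: anything that is not a tuple with at least two
    elements is treated as a success and mapped to `default`.
    """
    if isinstance(result, tuple) and len(result) >= 2:
        status_value = result[1]
        if isinstance(status_value, HTTPStatus):
            return status_value
        if isinstance(status_value, int):
            # Raises ValueError for integers that are not valid HTTP codes.
            return HTTPStatus(status_value)
    return default


def deploy_failed(result: Any) -> bool:
    """Return True when the normalized status is 400 or higher."""
    return normalize_status(result) >= HTTPStatus.BAD_REQUEST
```

With a helper like this, both call sites would reduce to a single `if deploy_failed(add_pod_result):` check, which may also make the uncovered `int` branch easier to test in isolation.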

Lines 971-989

  971             gpu_count = node.get("allocatable_resources", {}).get("gpu", 0)
  972             if isinstance(gpu_count, str):
  973                 try:
  974                     gpu_count = int(gpu_count)
! 975                 except ValueError:
! 976                     continue
  977             if gpu_count <= 0:
! 978                 continue
  979             # Map gpu_model label to canonical type name
  980             canonical = _gpu_label_to_canonical(gpu_model)
  981             gpu_counts[canonical] = gpu_counts.get(canonical, 0) + gpu_count
  982         return jsonify({"gpu_budget": gpu_counts}), HTTPStatus.OK
! 983     except Exception as ex:
! 984         logging.exception("Error in cluster_gpus: %s", ex)
! 985         return jsonify({"error": str(ex)}), HTTPStatus.INTERNAL_SERVER_ERROR
  986 
  987 
  988 def _gpu_label_to_canonical(gpu_model: str) -> str:
  989     """Map a GPU product label to a canonical type name for the allocator."""

Lines 990-1010

   990     model_upper = gpu_model.upper()
   991     if "H100" in model_upper:
   992         return "H100"
   993     elif "H200" in model_upper:
!  994         return "H200"
   995     elif "A100" in model_upper:
   996         return "A100"
!  997     elif "GB200" in model_upper:
!  998         return "GB200"
!  999     elif "GB300" in model_upper:
! 1000         return "GB300"
! 1001     elif "V100" in model_upper:
! 1002         return "V100"
! 1003     elif "A10" in model_upper:
! 1004         return "A10"
  1005     # Fallback: return as-is
! 1006     return gpu_model
  1007 
  1008 
  1009 @route("/api/node/<node_name>", methods=["DELETE"])
  1010 async def api_remove_node(node_name: str) -> QuartReturn:
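
Most branches of `_gpu_label_to_canonical` are flagged as uncovered in the hunk above. One possible way to make the mapping easier to test is a table-driven variant, sketched below. It keeps the same match order as the `if`/`elif` chain, which matters because `"A10"` is a substring of `"A100"`. The constant name, the public function name, and the sample labels in the usage are illustrative assumptions, not taken from the PR.

```python
# Ordered substring patterns; order mirrors the if/elif chain in the PR.
# "A100" must come before "A10" because "A10" is a substring of "A100".
GPU_LABEL_PATTERNS: tuple[str, ...] = (
    "H100", "H200", "A100", "GB200", "GB300", "V100", "A10",
)


def gpu_label_to_canonical(gpu_model: str) -> str:
    """Map a GPU product label to a canonical type name.

    Returns the first pattern found in the upper-cased label, or the
    label unchanged when nothing matches (same fallback as the PR).
    """
    model_upper = gpu_model.upper()
    for pattern in GPU_LABEL_PATTERNS:
        if pattern in model_upper:
            return pattern
    return gpu_model


if __name__ == "__main__":
    # Sample labels are made up for illustration.
    for label in ("NVIDIA-A100-SXM4-80GB", "NVIDIA-A10G", "Tesla-T4"):
        print(label, "->", gpu_label_to_canonical(label))
```

A single parametrized test over `(label, expected)` pairs would then exercise every pattern plus the fallback.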

tests/streamwise/test_allocator_bridge.py

Lines 251-263

  251     plan = run_allocator(
  252         gpu_budget={"A100": 8},
  253         workflow_name="streamcast",
  254     )
! 255     assert isinstance(plan, DeploymentPlan)
! 256     assert len(plan.specs) > 0
! 257     assert plan.result.total_time_s > 0
! 258     assert plan.result.ttff_s > 0
! 259     assert plan.workflow_name == "streamcast"
  260 
  261 
  262 def test_run_allocator_streamchat_8_h100() -> None:
  263     """Run allocator for StreamChat with 8 H100s."""

Lines 264-273

  264     plan = run_allocator(
  265         gpu_budget={"H100": 8},
  266         workflow_name="streamchat",
  267     )
! 268     assert isinstance(plan, DeploymentPlan)
! 269     assert len(plan.specs) > 0
  270 
  271 
  272 def test_run_allocator_invalid_workflow() -> None:
  273     """Unknown workflow name raises ValueError."""

@github-actions

github-actions Bot commented Jun 9, 2026


Code Coverage

| Package | Line Rate | Complexity | Health |
|---|---|---|---|
| . | 80% | 0 | |
| apps | 67% | 0 | |
| apps.streamanimate | 89% | 0 | |
| apps.streamcast | 91% | 0 | |
| apps.streamchat | 88% | 0 | |
| apps.streamdub | 64% | 0 | |
| apps.streamedit | 79% | 0 | |
| apps.streamlecture | 94% | 0 | |
| apps.streammovie | 89% | 0 | |
| apps.streampersona | 64% | 0 | |
| apps.streamshort | 71% | 0 | |
| simulator | 90% | 0 | |
| streamwise | 67% | 0 | |
| streamwise.model_provisioner | 83% | 0 | |
| tests | 99% | 0 | |
| tests.simulator | 100% | 0 | |
| tests.streamwise | 99% | 0 | |
| tests.streamwise_app | 99% | 0 | |
| wrapper | 67% | 0 | |
| wrapper.4kagent | 94% | 0 | |
| wrapper.bagel | 37% | 0 | |
| wrapper.cogview | 45% | 0 | |
| wrapper.fantasytalking | 55% | 0 | |
| wrapper.flux | 74% | 0 | |
| wrapper.flux2 | 100% | 0 | |
| wrapper.flux2klein | 99% | 0 | |
| wrapper.fluxkontext | 100% | 0 | |
| wrapper.fluxkrea | 99% | 0 | |
| wrapper.fluxupscaler | 68% | 0 | |
| wrapper.hidream | 88% | 0 | |
| wrapper.hunyuanavatar | 74% | 0 | |
| wrapper.hunyuanframepack | 52% | 0 | |
| wrapper.hunyuanframepackf1 | 59% | 0 | |
| wrapper.hunyuanframepackvae | 63% | 0 | |
| wrapper.hunyuanimage | 84% | 0 | |
| wrapper.imageresize | 100% | 0 | |
| wrapper.januspro | 91% | 0 | |
| wrapper.kokoro | 87% | 0 | |
| wrapper.llamagen | 65% | 0 | |
| wrapper.mock | 86% | 0 | |
| wrapper.podcasttranscript | 45% | 0 | |
| wrapper.qwenimage | 89% | 0 | |
| wrapper.qwenimageedit | 88% | 0 | |
| wrapper.realesrgan | 77% | 0 | |
| wrapper.slidetranscript | 59% | 0 | |
| wrapper.vibevoice | 31% | 0 | |
| wrapper.vibevoice.schedule | 50% | 0 | |
| wrapper.wan | 34% | 0 | |
| wrapper.wan22 | 75% | 0 | |
| wrapper.xtts | 75% | 0 | |
| wrapper.yolo | 69% | 0 | |
| Summary | 78% (27214 / 34734) | 0 | |

@@ -0,0 +1,225 @@
# StreamWise Demo: End-to-End AKS Deployment with GPU Spot Probing

Collaborator

Let's move this to a different PR. It should have a different structure too.

Comment thread deployment/aks/README.md
kubectl apply -f deployment/k8s/nvidia-device-plugin-ds.yaml
```

### 5.0 Critical: Spot Node Toleration and GPU Labels

Collaborator

Let's do this in a separate PR.

<meta name="viewport" content="width=device-width, initial-scale=1">
<link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.5/dist/css/bootstrap.min.css" rel="stylesheet" integrity="sha384-SgOJa3DmI69IUzQ2PVdRZhwQ+dy64/BUtbMJw1MZ8t5HZApcHrRKUc4W0kG879m7" crossorigin="anonymous">
<link rel="stylesheet" href="{{ url_for('static', filename='style.css') }}">
<link rel="icon" href="data:image/svg+xml,<svg xmlns=%22http://www.w3.org/2000/svg%22 viewBox=%220 0 100 100%22><text y=%22.9em%22 font-size=%2290%22>🎬</text></svg>">

Collaborator

Change the emoji to the robot?


# Mapping from simulator Model enum to concrete container names used by pod_manager.
# Some Model entries map to multiple containers (e.g., OTHERS -> kokoro + yolo).
MODEL_TO_CONTAINERS: dict[Model, list[str]] = {

Collaborator

Can we get these names from services.json?



# Mapping from StreamWise app name to simulator workflow key
APP_TO_WORKFLOW: dict[str, str] = {

Collaborator

Where are we getting these second names from? Why not just use the app names? We have the names in some class, no?

# Default CPU/memory/storage for each container when deployed via auto-deploy.
# Format: (cpu_cores, memory_gib, ephemeral_storage_gib)
# Keep in sync with the Helm values in deployment/helm/values.yaml.
CONTAINER_RESOURCES: dict[str, tuple[int, int, int]] = {

Collaborator

Don't we have something similar in add_pod? Can we reuse it?

}

# GPU type string used by pod_manager (lowercase).
GPU_TYPE_TO_POD_STR: dict[GPUType, str] = {

Collaborator

Do we really need this?
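
One way to read this question: if the pod_manager string is always the lowercased enum member name, the lookup table could be replaced by a one-line derivation. The sketch below illustrates that idea only. The `GPUType` enum here is a stand-in with made-up members, and the "lowercased name" rule is an assumption based on the "(lowercase)" comment above, not something the PR confirms.

```python
from enum import Enum


class GPUType(Enum):
    """Stand-in for the simulator's GPUType enum (members are illustrative)."""
    A100 = "A100"
    H100 = "H100"


def gpu_type_to_pod_str(gpu_type: GPUType) -> str:
    """Derive the pod_manager GPU string from the enum member name.

    Assumes the pod string is exactly the member name in lowercase; if any
    member needs a different string, an explicit mapping is still required.
    """
    return gpu_type.name.lower()


if __name__ == "__main__":
    print(gpu_type_to_pod_str(GPUType.H100))
```

If even one GPU type breaks the naming rule, keeping the explicit dictionary is probably the safer choice.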


4 participants