Add auto-deploy feature to StreamWise dashboard#313
…ioner/
Move the 6 policy/allocator files (greedy, milp, naive_baseline, hexgen, helix, policies) from simulator/ into streamwise/model_provisioner/ so they can be reused by both the simulator evaluation framework and the StreamWise serving system.
- Create streamwise/model_provisioner/ package with __init__.py that adds simulator/ to sys.path for foundation module access
- Create simulator/__init__.py that adds streamwise/ to sys.path so model_provisioner is importable from simulator code
- Update all imports across simulator files and 20 test files
- Switch data_loading.py to use Path instead of str for data_dir params
- Fix mypy issue in wrapper/run_httpserver.py (bytearray assignment)
- Add .venv to .flake8 exclude and .gitignore
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Support both local dev (../../simulator) and Docker (../simulator) paths when resolving the simulator directory for foundation module imports. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add an 'Auto Deploy' button to the web dashboard that automatically optimizes GPU resource allocation across workflow components using the model provisioner's greedy allocator.
- Add streamwise/allocator_bridge.py: maps allocator output to K8s deployment parameters (Model enum -> container names, GPU specs)
- Add /api/auto_deploy and /api/auto_deploy/confirm routes to streamwise.py for computing and confirming deployment plans
- Add auto-deploy UI section to add_pod.html with GPU budget inputs, workflow selector, and deployment plan preview
- Add comprehensive tests for allocator bridge and auto-deploy API
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ectory Add streamwise/ directory to sys.path explicitly in allocator_bridge.py so model_provisioner can be found when Python is invoked from a different working directory (e.g., in Docker/pipeline environments). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The StreamWise Docker image was missing the model_provisioner/ package and simulator/ foundation modules needed by the auto-deploy feature.
- Update deployment/setup_image.sh to copy model_provisioner/ and simulator/ into the Docker build context
- Update Dockerfile to COPY both directories
- Fix model_provisioner/__init__.py to find simulator/ in both local dev layout (../../simulator) and Docker layout (../simulator)
- Guard sys.path.insert with dedup check in allocator_bridge.py
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
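The two-layout lookup and the dedup guard described in these commits can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in model_provisioner/__init__.py or allocator_bridge.py; the helper names `find_simulator_dir` and `add_to_sys_path` are hypothetical.

```python
import os
import sys
from typing import Optional


def find_simulator_dir(package_dir: str) -> Optional[str]:
    """Return the first existing simulator/ directory, or None.

    Candidates follow the two layouts named in the commit message:
    ../../simulator for local development and ../simulator inside Docker.
    (Hypothetical helper; the real module may resolve paths differently.)
    """
    candidates = [
        os.path.join(package_dir, "..", "..", "simulator"),  # local dev layout
        os.path.join(package_dir, "..", "simulator"),        # Docker layout
    ]
    for candidate in candidates:
        resolved = os.path.abspath(candidate)
        if os.path.isdir(resolved):
            return resolved
    return None


def add_to_sys_path(path: str) -> bool:
    """Prepend path to sys.path unless it is already present (dedup guard)."""
    if path in sys.path:
        return False
    sys.path.insert(0, path)
    return True
```

Trying the local layout first means a developer checkout wins when both directories exist; whether the real package uses the same precedence is an assumption.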
91cf042 to
38997c3
Compare
The model_provisioner and simulator foundation modules (sim_types, data_loading, utils, greedy) require pandas and tabulate which were not previously needed by the StreamWise Docker image. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Adds a new "Auto Deploy" feature to the StreamWise dashboard: an HTTP API and UI that runs the greedy AutoModelAllocator over a user-supplied GPU budget + workflow, previews the resulting per-container deployment plan, and (on confirm) creates the corresponding pods via pod_manager.add_pod. A new bridge module translates the simulator's abstract Model/GPUType allocations into concrete K8s container specs, and the Dockerfile/setup script now bundle model_provisioner and simulator into the streamwise image.
Changes:
- New streamwise/allocator_bridge.py mapping Model/GPUType allocations to deployment specs and exposing run_allocator + JSON serialization.
- New /api/auto_deploy, /api/auto_deploy/confirm, /api/auto_deploy/workflows routes in streamwise.py, plus an Auto-Deploy UI section in add_pod.html with optimize/confirm flow.
- Packaging changes (Dockerfile, setup_image.sh, requirements.txt) to ship simulator + model_provisioner and add pandas/tabulate; new tests for the bridge and the API routes.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 14 comments.
| File | Description |
|---|---|
| streamwise/allocator_bridge.py | New bridge: Model→container/MIG/resource mapping and run_allocator/serialization helpers. |
| streamwise/streamwise.py | Three new auto-deploy routes invoking the bridge and pod_manager.add_pod. |
| streamwise/templates/add_pod.html | Auto-deploy form, results table, and fetch logic for optimize/confirm. |
| streamwise/requirements.txt | Adds pandas and tabulate dependencies. |
| deployment/streamwise/Dockerfile | Copies model_provisioner and simulator into the image. |
| deployment/setup_image.sh | Stages model_provisioner and simulator for the streamwise image build. |
| tests/streamwise/test_allocator_bridge.py | Unit/integration tests for the bridge (mappings, specs, real allocator runs). |
| tests/streamwise/test_streamwise_auto_deploy.py | API tests for the three new routes, with K8s mocks and allocator patches. |
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Agent-Logs-Url: https://github.com/Azure/realtimevideogen/sessions/d53596b3-c563-4deb-af27-7226e9dac364 Co-authored-by: James-QiuHaoran <22564180+James-QiuHaoran@users.noreply.github.com>
Agent-Logs-Url: https://github.com/Azure/realtimevideogen/sessions/e52c100d-a760-4a1a-8771-416155f3e835 Co-authored-by: James-QiuHaoran <22564180+James-QiuHaoran@users.noreply.github.com>
Just as a heads up, I was blocked by some firewall rules while working on your feedback.
Warning: Firewall rules blocked me from connecting to one or more addresses.
I tried to connect to the following addresses, but was blocked by firewall rules:
If you need me to access, download, or install something from one of these locations, you can either:
Test Results
1 174 tests +29, 1 171 ✅ +26, 14m 56s ⏱️ -5s
For more details on these failures, see this check.
Results for commit 19b539d. ± Comparison against base commit 9bcbc4e.
♻️ This comment has been updated with latest results.
Add /api/auto_deploy/cluster_gpus endpoint that aggregates allocatable GPUs by type (H100, A100, etc.) from all ready nodes. The auto-deploy form fetches this on page load and pre-fills the GPU budget text boxes. Also fixed NVIDIA device plugin toleration for Spot nodes (needed to register GPUs on AKS Spot node pools). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
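The aggregation step this commit describes might look roughly like the sketch below. The node dictionary shape (keys `ready`, `gpu_model`, `allocatable_resources`) and the function names are assumptions made for illustration; the actual route obtains node data through StreamWise's own Kubernetes helpers.

```python
from typing import Any


def canonical_gpu_type(label: str) -> str:
    """Map a product label such as 'NVIDIA-H100-80GB-HBM3' to a short type name."""
    upper = label.upper()
    # 'A100' is checked before 'A10' so an A100 label is not misread as A10.
    for name in ("GB200", "GB300", "H100", "H200", "A100", "V100", "A10"):
        if name in upper:
            return name
    return label  # unknown label: return it unchanged


def aggregate_gpus(nodes: list[dict[str, Any]]) -> dict[str, int]:
    """Sum allocatable GPUs per canonical type across ready nodes."""
    counts: dict[str, int] = {}
    for node in nodes:
        if not node.get("ready", False):  # hypothetical readiness flag
            continue
        raw = node.get("allocatable_resources", {}).get("gpu", 0)
        try:
            gpu_count = int(raw)  # Kubernetes often reports quantities as strings
        except (TypeError, ValueError):
            continue
        if gpu_count <= 0:
            continue
        gpu_type = canonical_gpu_type(str(node.get("gpu_model", "unknown")))
        counts[gpu_type] = counts.get(gpu_type, 0) + gpu_count
    return counts
```

The resulting dictionary (for example `{"H100": 16}`) is the kind of value the form could use to pre-fill its GPU budget inputs.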
The upstream NVIDIA device plugin DaemonSet lacks the Spot toleration, so it won't schedule on AKS Spot GPU nodes. Document that the local manifest (deployment/k8s/nvidia-device-plugin-ds.yaml) already includes this fix, and provide the patch command as a fallback. Also document the need for manual nvidia.com/gpu.product labels on nodes until GPU Feature Discovery is installed. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Previously A100 defaulted to 8 even when no A100 nodes exist. Now all fields default to 0 and are populated only from the cluster_gpus API. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When MIG is unavailable, patch DEVICE_OPTIONS so OTHERS (kokoro+yolo) counts as 2 full GPUs instead of 1 MIG slice. Mark hunyuanframepackvae as co-located container (gpu=0) since it shares resources with the HunyuanFramePack server. This ensures budget=16 produces exactly 16 GPUs for StreamCast. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The allocator counts OTHERS as 1 GPU (SINGLE_DEVICE_MODELS constraint), but without MIG, kokoro+yolo each need a full GPU = 2. This caused per-type budget violations (e.g., A100=9 when budget=8). Instead of patching DEVICE_OPTIONS (ineffective due to allocator constraints), detect per-type overflow after allocation and trim excess replicas of the most-replicated container on the overflowing type. This preserves throughput while respecting per-type budgets. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
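The overflow-then-trim idea can be illustrated with a simplified, standalone sketch. `Spec` is a hypothetical stand-in for the PR's DeploymentSpec with only the fields needed here, and `trim_overflow` is an illustrative name; the bridge's own helpers may differ in detail.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Spec:
    """Simplified stand-in for a deployment spec (hypothetical fields)."""
    container_name: str
    gpu_type: str
    gpu: int


def trim_overflow(specs: list[Spec], gpu_type: str, budget: int) -> list[Spec]:
    """Drop replicas on gpu_type until its GPU total fits the budget."""
    used = sum(s.gpu for s in specs if s.gpu_type == gpu_type)
    excess = used - budget
    if excess <= 0:
        return list(specs)

    # Count GPU-using replicas per container on the overflowing type.
    replicas = Counter(
        s.container_name for s in specs if s.gpu_type == gpu_type and s.gpu > 0
    )
    result = list(specs)
    trimmed = 0
    # Most-replicated container first: removing one replica hurts it least.
    for name, _count in replicas.most_common():
        # Walk backwards so pop() does not shift indices still to be visited.
        for i in range(len(result) - 1, -1, -1):
            if trimmed >= excess:
                return result
            s = result[i]
            if s.container_name == name and s.gpu_type == gpu_type and s.gpu > 0:
                trimmed += s.gpu
                result.pop(i)
    return result
```

With a budget of 8 A100s and specs totalling 9 (one 4-GPU server, three single-GPU realesrgan replicas, kokoro and yolo on one GPU each), this removes one realesrgan replica and leaves the other containers in place.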
<label for="auto_gpu_a100" class="form-label">A100</label>
<input type="number" class="form-control" id="auto_gpu_a100" name="gpu_a100"
min="0" max="64" value="0">
</div>
Try to make it generic for GPUType
<div class="row g-3 mb-3">
<div class="col-md-3">
<label for="auto_gpu_a100" class="form-label">A100</label>
<input type="number" class="form-control" id="auto_gpu_a100" name="gpu_a100"
Make a row per GPU type and make the number a slider
<table class="table table-sm table-bordered" id="auto-deploy-plan-table">
<thead>
<tr>
<th>Container</th>
This should show the friendly name
<tr>
<th>Container</th>
<th>GPU</th>
<th>GPU Type</th>
This should show the friendly name.
<th>GPU Type</th>
<th>CPU</th>
<th>Memory</th>
<th>MIG</th>
Next version may want to allow customizing the plan.
</tr>
</thead>
<tbody id="auto-deploy-plan-body"></tbody>
</table>
Pass the disaggregation or leave a comment that it is not disaggregated.
`<strong>TTFF:</strong> ${metrics.ttff_s}s | ` +
`<strong>Cost:</strong> $${metrics.cost} | ` +
`<strong>GPUs Needed:</strong> ${actualGpus}` +
(metrics.budget_exceeded ? ' <span class="text-danger">(exceeds budget!)</span>' : '');
- Add /auto_deploy route with standalone auto_deploy.html page
- Add robot icon + 'Auto Deploy' button on main index page above Applications
- Remove auto-deploy section from add_pod.html (wrappers/apps add page)
- Confirm endpoint now also deploys the application container (e.g., streamcast) when a workflow name is provided in the request
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
In Docker the parent of /streamwise is /, which produced an invalid path /simulator/data/*.csv. Using _HERE resolves to /streamwise/simulator/data/ inside the container. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ocator The greedy allocator asserts GPU counts are multiples of NUM_GPUS_PER_SERVER (8). Now we round up for the allocator call, then trim specs back to the user's actual budget. This allows non-multiple budgets like 26 or 30. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
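The rounding step can be shown with a small worked sketch. The value 8 for NUM_GPUS_PER_SERVER comes from the commit message; the helper names `round_up_to_server` and `allocator_budget` are hypothetical, and how the real bridge treats zero-count GPU types is not shown in this PR excerpt.

```python
NUM_GPUS_PER_SERVER = 8  # value stated in the commit message


def round_up_to_server(count: int, per_server: int = NUM_GPUS_PER_SERVER) -> int:
    """Round count up to the next multiple of per_server (0 stays 0)."""
    if count < 0:
        raise ValueError("GPU count must be non-negative")
    # Ceiling division written with floor division on the negated value.
    return -(-count // per_server) * per_server


def allocator_budget(user_budget: dict[str, int]) -> dict[str, int]:
    """Per-type budget handed to the allocator; the user's budget is kept for trimming."""
    return {gpu_type: round_up_to_server(n) for gpu_type, n in user_budget.items()}
```

For a user budget of 26 H100s the allocator would be called with 32, so up to 6 GPUs' worth of specs may need to be trimmed afterwards to get back to 26.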
Adds a red 🗑️ Delete All button next to the ➕ under Wrappers. Clicking it calls DELETE /api/pods/wrappers which removes all non-app pods in the rtgen namespace. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace 4-column fixed GPU type inputs with dynamic rows (one per type). Rows auto-populate from the cluster state (e.g., 32 H100s). Users can add/remove rows; the dropdown prevents duplicate types. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The delete-all-wrappers endpoint and the wrapper list on the main page now exclude the streamwise management pod and all STREAMWISE_APPS (load balancer / application containers). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…oy table The /api/auto_deploy response now includes friendly_name (from services.json friendlyName field via get_friendly_container_name) and uppercased gpu_type for each spec. The frontend displays these directly. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
COLOCATED_CONTAINERS was unconditionally forcing hunyuanframepackvae
to gpu=0 even when the policy has disaggregation={Model.HF: True}.
Now co-location only applies when disaggregation is disabled.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
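The co-location rule from the commit above can be sketched as a small decision function. This is a simplified illustration: the MIG profile strings are placeholders, `resolve_gpu` is a hypothetical name, and falling back to the allocator's count in the final branch is an assumption.

```python
from typing import Optional

COLOCATED_CONTAINERS = {"hunyuanframepackvae"}
# Placeholder profiles; the real mapping lives in the bridge/config module.
MIG_CONTAINERS = {"kokoro": "1g.10gb", "yolo": "1g.10gb"}


def resolve_gpu(
    container_name: str,
    allocated_gpus: int,
    hf_disaggregated: bool,
    mig_available: bool,
) -> tuple[int, Optional[str]]:
    """Return (gpu_count, mig_profile) for one container."""
    # Co-location only applies when disaggregation is disabled.
    if container_name in COLOCATED_CONTAINERS and not hf_disaggregated:
        return 0, None
    if container_name in MIG_CONTAINERS:
        if mig_available:
            return 1, MIG_CONTAINERS[container_name]
        return 1, None  # MIG unavailable: one full GPU instead of a slice
    return allocated_gpus, None  # assumption: use the allocator's count as-is
```

With disaggregation enabled, hunyuanframepackvae falls through to the last branch and receives its own GPU allocation rather than gpu=0.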
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Lint Results
Mypy Type Checking
✅ No issues found
Diff Coverage
Diff: origin/main...HEAD, staged and unstaged changes
Summary
streamwise/allocator_bridge.py
Lines 15-23
15 if _HERE not in sys.path:
16 sys.path.insert(0, _HERE)
17 _REPO_ROOT = os.path.dirname(_HERE)
18 if _REPO_ROOT not in sys.path:
! 19 sys.path.insert(0, _REPO_ROOT)
20 import model_provisioner # noqa: E402, F401 — adds simulator/ to sys.path
21
22 from dataclasses import dataclass
23 from typing import Optional
Lines 54-66
54
55
56 def get_mig_profile(container_name: str, gpu_type: GPUType) -> Optional[str]:
57 """Return a MIG profile only when MIG is available and the GPU type supports it."""
! 58 if not MIG_AVAILABLE:
! 59 return None
! 60 if gpu_type not in MIG_CAPABLE_GPU_TYPES:
! 61 return None
! 62 return MIG_CONTAINERS.get(container_name)
63
64
65 # Mapping from StreamWise app name to simulator workflow key
66 APP_TO_WORKFLOW: dict[str, str] = {
Lines 113-128
113
114
115 def _calc_actual_gpus_per_type(specs: list['DeploymentSpec']) -> dict[GPUType, int]:
116 """Calculate actual GPUs needed per GPUType from deployment specs."""
! 117 result: dict[GPUType, int] = {}
! 118 for spec in specs:
! 119 if spec.mig_profile:
! 120 continue
! 121 gpu_type = _POD_STR_TO_GPU_TYPE.get(spec.gpu_type or "")
! 122 if gpu_type is not None:
! 123 result[gpu_type] = result.get(gpu_type, 0) + spec.gpu
! 124 return result
125
126
127 def _trim_specs_for_type(
128 specs: list['DeploymentSpec'], gpu_type_str: str, excess: int
Lines 133-163
133 Prefers removing replicas of the most-replicated scalable container (typically
134 realesrgan/upscaler) to minimize impact on pipeline throughput.
135 """
136 # Count replicas per container on this GPU type (only scalable ones)
! 137 from collections import Counter
! 138 type_counts: Counter[str] = Counter()
! 139 for spec in specs:
! 140 if spec.gpu_type == gpu_type_str and spec.gpu > 0 and spec.container_name not in COLOCATED_CONTAINERS:
! 141 type_counts[spec.container_name] += 1
142
143 # Prefer trimming containers with most replicas (least impact per removal)
! 144 trimmed = 0
! 145 result_specs = list(specs)
! 146 for container_name, _count in type_counts.most_common():
! 147 if trimmed >= excess:
! 148 break
149 # Remove replicas from the end of the list
! 150 for i in range(len(result_specs) - 1, -1, -1):
! 151 if trimmed >= excess:
! 152 break
! 153 spec = result_specs[i]
! 154 if (spec.container_name == container_name
155 and spec.gpu_type == gpu_type_str
156 and spec.gpu > 0):
! 157 trimmed += spec.gpu
! 158 result_specs.pop(i)
! 159 return result_specs
160
161
162 def get_available_workflows() -> list[str]:
163 """Return list of available workflow names for the UI."""
Lines 222-230
222 # Load latency data and run allocator
223 data_dir = _get_data_dir()
224 latency_data = load_latency_data(data_dir=data_dir)
225
! 226 allocator = AutoModelAllocator(
227 workflow=workflow,
228 latency_data=latency_data,
229 policy=STREAMWISE_POLICY,
230 )
Lines 228-253
228 latency_data=latency_data,
229 policy=STREAMWISE_POLICY,
230 )
231
! 232 result = allocator.allocate(num_gpus=allocator_gpus, verbose=False)
233
234 # Convert result to deployment specs
! 235 specs = result_to_deployment_specs(result)
236
237 # Trim deployment specs back to the user's actual budget.
238 # Also handles MIG-unavailable overflow (e.g., OTHERS allocates 1 GPU
239 # but kokoro+yolo each need a full GPU = 2).
! 240 actual_per_type = _calc_actual_gpus_per_type(specs)
! 241 for gpu_type, budget_count in num_gpus.items():
! 242 actual = actual_per_type.get(gpu_type, 0)
! 243 if actual <= budget_count:
! 244 continue
! 245 excess = actual - budget_count
! 246 gpu_type_str = GPU_TYPE_TO_POD_STR[gpu_type]
! 247 specs = _trim_specs_for_type(specs, gpu_type_str, excess)
248
! 249 return DeploymentPlan(
250 specs=specs,
251 result=result,
252 workflow_name=workflow_name,
253 gpu_budget=gpu_budget,
Lines 286-297
286 container_name in COLOCATED_CONTAINERS
287 and not STREAMWISE_POLICY.disaggregation.get(Model.HF, False)
288 )
289 if is_colocated:
! 290 gpu_count = 0
291 elif MIG_AVAILABLE and container_name in MIG_CONTAINERS:
! 292 mig_profile = MIG_CONTAINERS[container_name]
! 293 gpu_count = 1
294 elif container_name in MIG_CONTAINERS:
295 # MIG not available: use 1 full GPU instead of a MIG slice
296 gpu_count = 1
297 else:
Lines 317-325
317 # when MIG is unavailable and services fall back to full GPUs).
318 actual_gpus: dict[str, int] = {}
319 for spec in plan.specs:
320 if spec.mig_profile:
! 321 continue # MIG slices don't count against full GPU budget
322 gpu_type_key = spec.gpu_type or "unknown"
323 actual_gpus[gpu_type_key] = actual_gpus.get(gpu_type_key, 0) + spec.gpu
324
325 total_budget = sum(plan.gpu_budget.values())
Lines 327-339
327 budget_exceeded = total_actual > total_budget
328
329 warnings: list[str] = []
330 if budget_exceeded:
! 331 mig_hint = (
332 "Enable MIG to fit lightweight services (kokoro, yolo, realesrgan) "
333 "on shared GPU slices."
334 ) if not MIG_AVAILABLE else ""
! 335 warnings.append(
336 f"Deployment requires {total_actual} full GPUs but budget is "
337 f"{total_budget}. {mig_hint}"
338 )
streamwise/container_config.py
Lines 12-24
12 import os
13
14 _HERE = os.path.dirname(os.path.abspath(__file__))
15 if _HERE not in sys.path:
! 16 sys.path.insert(0, _HERE)
17
18 _REPO_ROOT = os.path.dirname(_HERE)
19 if _REPO_ROOT not in sys.path:
! 20 sys.path.insert(0, _REPO_ROOT)
21
22 # model_provisioner import adds simulator/ to sys.path
23 import model_provisioner # noqa: E402, F401
streamwise/streamwise.py
Lines 551-559
551
552 @route("/auto_deploy", methods=["GET"])
553 async def auto_deploy_page() -> str:
554 """Render the standalone auto-deploy page. (TODO: enable customization after auto-deploy plan is generated)"""
! 555 return await render_template("auto_deploy.html")
556
557
558 @route("/api/pod/<pod_name>", methods=["DELETE"])
559 async def api_remove_pod(pod_name: str) -> QuartReturn:
Lines 569-596
569
570 @route("/api/pods/wrappers", methods=["DELETE"])
571 async def api_delete_all_wrappers() -> QuartReturn:
572 """Delete all wrapper pods (non-app, non-system pods) in the namespace."""
! 573 svcs = await get_services(namespace=NAMESPACE, k8s_cluster=k8s_cluster)
574 # Exclude app pods and the streamwise management pod itself
! 575 excluded = set(STREAMWISE_APPS) | {"streamwise"}
! 576 wrapper_pods = [
577 svc["pod_name"] for svc in svcs
578 if svc.get("container_name") not in excluded and svc.get("pod_name")
579 ]
! 580 deleted = 0
! 581 errors: list[str] = []
! 582 for pod_name in wrapper_pods:
! 583 try:
! 584 await pod_manager.remove_pod(
585 pod_name, namespace=NAMESPACE, k8s_cluster=k8s_cluster)
! 586 deleted += 1
! 587 except Exception as e:
! 588 errors.append(f"{pod_name}: {e}")
! 589 result: dict[str, object] = {"deleted": deleted, "total": len(wrapper_pods)}
! 590 if errors:
! 591 result["errors"] = errors
! 592 return jsonify(result), HTTPStatus.OK
593
594
595 @route("/api/services", methods=["GET"])
596 async def api_get_services() -> QuartReturn:
Lines 631-657
631 try:
632 # Build container_dict from shared constants (CONTAINER_RESOURCES + MIG_CONTAINERS).
633 # Format: container_name -> (cpu, memory_gib, ephemeral_storage_gib, gpu_info)
634 # gpu_info is either an int (GPU count) or a MIG profile string.
! 635 container_dict: dict[str, tuple[int, int, int, Union[int, str]]] = {}
636
637 # Services not in CONTAINER_RESOURCES (CPU-only or extra services)
! 638 container_dict["podcasttranscript"] = (1, 4, 16, 0)
! 639 container_dict["slidetranscript"] = (1, 4, 16, 0)
640
! 641 for name, (cpu, mem, storage) in CONTAINER_RESOURCES.items():
! 642 if name in MIG_CONTAINERS:
! 643 container_dict[name] = (cpu, mem, storage, MIG_CONTAINERS[name])
644 else:
! 645 container_dict[name] = (cpu, mem, storage, min(2, max_gpus))
646
647 # Additional services not covered by CONTAINER_RESOURCES
! 648 container_dict["fluxkontext"] = (12, 128, 64, 1)
! 649 container_dict["whisper"] = (2, 8, 16, 1)
650
651 # hunyuanframepackvae uses exactly 1 GPU (not scaled by max_gpus)
! 652 cpu, mem, storage = CONTAINER_RESOURCES["hunyuanframepackvae"]
! 653 container_dict["hunyuanframepackvae"] = (cpu, mem, storage, 1)
654
655 for container_name, (cpu, mem_gib, sotrage_gib, gpu_info) in container_dict.items():
656 num_gpus, mig_profile = parse_gpu_info(gpu_info)
657 await pod_manager.add_pod(
Lines 788-796
788 if not gpu_budget or not isinstance(gpu_budget, dict):
789 return jsonify({"error": "Missing or invalid 'gpu_budget' field"}), HTTPStatus.BAD_REQUEST
790 for gpu_type_name, count in gpu_budget.items():
791 if isinstance(count, bool) or not isinstance(count, int) or count < 0:
! 792 return (
793 jsonify(
794 {
795 "error": (
796 "Invalid 'gpu_budget' field: each GPU type count must be a "
Lines 819-836
819 return jsonify(result_json), HTTPStatus.OK
820
821 except ValueError as ve:
822 return jsonify({"error": str(ve)}), HTTPStatus.BAD_REQUEST
! 823 except AssertionError as ae:
! 824 msg = str(ae) if str(ae) else (
825 "GPU budget too small. Each GPU type must have at least 8 GPUs "
826 "(one full server). Use a single GPU type with 8+ GPUs, or "
827 "ensure each type has at least 8."
828 )
! 829 return jsonify({"error": msg}), HTTPStatus.BAD_REQUEST
! 830 except Exception as ex:
! 831 logging.exception("Error in auto_deploy: %s", ex)
! 832 return jsonify({"error": str(ex)}), HTTPStatus.INTERNAL_SERVER_ERROR
833
834
835 @route("/api/auto_deploy/confirm", methods=["POST"])
836 async def api_auto_deploy_confirm() -> QuartReturn:
Lines 861-870
861
862 for spec in specs:
863 container_name = spec.get("container_name")
864 if not container_name:
! 865 errors.append("Spec missing 'container_name'")
! 866 continue
867
868 try:
869 add_pod_result = await pod_manager.add_pod(
870 container_name=container_name,
Lines 882-891
882 if isinstance(add_pod_result, tuple) and len(add_pod_result) >= 2:
883 status_value = add_pod_result[1]
884 if isinstance(status_value, HTTPStatus):
885 status_code = status_value
! 886 elif isinstance(status_value, int):
! 887 status_code = HTTPStatus(status_value)
888
889 if status_code >= HTTPStatus.BAD_REQUEST:
890 msg = f"Failed to deploy '{container_name}' (status={int(status_code)})"
891 logging.error(msg)
Lines 891-907
891 logging.error(msg)
892 errors.append(msg)
893 else:
894 deployed.append(container_name)
! 895 except Exception as pod_ex:
! 896 msg = f"Failed to deploy '{container_name}': {pod_ex}"
! 897 logging.error(msg)
! 898 errors.append(msg)
899
900 # Also deploy the application container if workflow is specified
901 if workflow and workflow in STREAMWISE_APPS:
! 902 try:
! 903 add_pod_result = await pod_manager.add_pod(
904 container_name=workflow,
905 cpu=4,
906 memory_gib=16,
907 ephemeral_storage_gib=16,
Lines 910-934
910 mig_profile=None,
911 namespace=NAMESPACE,
912 k8s_cluster=k8s_cluster,
913 )
! 914 status_code = HTTPStatus.OK
! 915 if isinstance(add_pod_result, tuple) and len(add_pod_result) >= 2:
! 916 status_value = add_pod_result[1]
! 917 if isinstance(status_value, HTTPStatus):
! 918 status_code = status_value
! 919 elif isinstance(status_value, int):
! 920 status_code = HTTPStatus(status_value)
! 921 if status_code >= HTTPStatus.BAD_REQUEST:
! 922 msg = f"Failed to deploy app '{workflow}' (status={int(status_code)})"
! 923 logging.error(msg)
! 924 errors.append(msg)
925 else:
! 926 deployed.append(workflow)
! 927 except Exception as app_ex:
! 928 msg = f"Failed to deploy app '{workflow}': {app_ex}"
! 929 logging.error(msg)
! 930 errors.append(msg)
931
932 total_deployed = len(deployed)
933 total_specs = len(specs) + (1 if workflow and workflow in STREAMWISE_APPS else 0)
934 status = HTTPStatus.OK if not errors else HTTPStatus.MULTI_STATUS
Lines 937-947
937 "errors": errors,
938 "message": f"Deployed {total_deployed}/{total_specs} containers.",
939 }), status
940
! 941 except Exception as ex:
! 942 logging.exception("Error in auto_deploy/confirm: %s", ex)
! 943 return jsonify({"error": str(ex)}), HTTPStatus.INTERNAL_SERVER_ERROR
944
945
946 @route("/api/auto_deploy/workflows", methods=["GET"])
947 async def api_auto_deploy_workflows() -> QuartReturn:
Lines 971-989
971 gpu_count = node.get("allocatable_resources", {}).get("gpu", 0)
972 if isinstance(gpu_count, str):
973 try:
974 gpu_count = int(gpu_count)
! 975 except ValueError:
! 976 continue
977 if gpu_count <= 0:
! 978 continue
979 # Map gpu_model label to canonical type name
980 canonical = _gpu_label_to_canonical(gpu_model)
981 gpu_counts[canonical] = gpu_counts.get(canonical, 0) + gpu_count
982 return jsonify({"gpu_budget": gpu_counts}), HTTPStatus.OK
! 983 except Exception as ex:
! 984 logging.exception("Error in cluster_gpus: %s", ex)
! 985 return jsonify({"error": str(ex)}), HTTPStatus.INTERNAL_SERVER_ERROR
986
987
988 def _gpu_label_to_canonical(gpu_model: str) -> str:
989 """Map a GPU product label to a canonical type name for the allocator."""
Lines 990-1010
990 model_upper = gpu_model.upper()
991 if "H100" in model_upper:
992 return "H100"
993 elif "H200" in model_upper:
! 994 return "H200"
995 elif "A100" in model_upper:
996 return "A100"
! 997 elif "GB200" in model_upper:
! 998 return "GB200"
! 999 elif "GB300" in model_upper:
! 1000 return "GB300"
! 1001 elif "V100" in model_upper:
! 1002 return "V100"
! 1003 elif "A10" in model_upper:
! 1004 return "A10"
1005 # Fallback: return as-is
! 1006 return gpu_model
1007
1008
1009 @route("/api/node/<node_name>", methods=["DELETE"])
1010 async def api_remove_node(node_name: str) -> QuartReturn:
tests/streamwise/test_allocator_bridge.py
Lines 251-263
251 plan = run_allocator(
252 gpu_budget={"A100": 8},
253 workflow_name="streamcast",
254 )
! 255 assert isinstance(plan, DeploymentPlan)
! 256 assert len(plan.specs) > 0
! 257 assert plan.result.total_time_s > 0
! 258 assert plan.result.ttff_s > 0
! 259 assert plan.workflow_name == "streamcast"
260
261
262 def test_run_allocator_streamchat_8_h100() -> None:
263 """Run allocator for StreamChat with 8 H100s."""
Lines 264-273
264 plan = run_allocator(
265 gpu_budget={"H100": 8},
266 workflow_name="streamchat",
267 )
! 268 assert isinstance(plan, DeploymentPlan)
! 269 assert len(plan.specs) > 0
270
271
272 def test_run_allocator_invalid_workflow() -> None:
273 """Unknown workflow name raises ValueError."""
| @@ -0,0 +1,225 @@ | |||
| # StreamWise Demo: End-to-End AKS Deployment with GPU Spot Probing | |||
Let's move this to a different PR. It should have a different structure too.
| kubectl apply -f deployment/k8s/nvidia-device-plugin-ds.yaml | ||
| ``` | ||
|
|
||
| ### 5.0 Critical: Spot Node Toleration and GPU Labels |
Let's do this in a separate PR.
<meta name="viewport" content="width=device-width, initial-scale=1">
<link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.5/dist/css/bootstrap.min.css" rel="stylesheet" integrity="sha384-SgOJa3DmI69IUzQ2PVdRZhwQ+dy64/BUtbMJw1MZ8t5HZApcHrRKUc4W0kG879m7" crossorigin="anonymous">
<link rel="stylesheet" href="{{ url_for('static', filename='style.css') }}">
<link rel="icon" href="data:image/svg+xml,<svg xmlns=%22http://www.w3.org/2000/svg%22 viewBox=%220 0 100 100%22><text y=%22.9em%22 font-size=%2290%22>🎬</text></svg>">
Change the emoji to the robot?
# Mapping from simulator Model enum to concrete container names used by pod_manager.
# Some Model entries map to multiple containers (e.g., OTHERS -> kokoro + yolo).
MODEL_TO_CONTAINERS: dict[Model, list[str]] = {
Can we get these names from services.json?
# Mapping from StreamWise app name to simulator workflow key
APP_TO_WORKFLOW: dict[str, str] = {
Where are we getting these second names from? Why not just use the apps? We have the names in some class, no?
# Default CPU/memory/storage for each container when deployed via auto-deploy.
# Format: (cpu_cores, memory_gib, ephemeral_storage_gib)
# Keep in sync with the Helm values in deployment/helm/values.yaml.
CONTAINER_RESOURCES: dict[str, tuple[int, int, int]] = {
Don't we have something similar in the add_pod? Can we reuse?
}

# GPU type string used by pod_manager (lowercase).
GPU_TYPE_TO_POD_STR: dict[GPUType, str] = {
Add an Auto Deploy button to the web dashboard that automatically optimizes GPU resource allocation across workflow components using the model provisioner's greedy allocator.
Changes
- streamwise/allocator_bridge.py (new): Maps allocator output to K8s deployment parameters (Model enum → container names, GPU specs)
- streamwise/streamwise.py: Add /api/auto_deploy and /api/auto_deploy/confirm routes
- streamwise/templates/add_pod.html: Auto-deploy UI section with GPU budget inputs, workflow selector, and deployment plan preview
- tests/streamwise/test_allocator_bridge.py (new): 15 tests for allocator bridge
- tests/streamwise/test_streamwise_auto_deploy.py (new): 10 tests for auto-deploy API
Testing