@wolfgang-desalvador
Collaborator

This pull request modernizes and simplifies the deployment workflow for Azure CycleCloud Workspace for Slurm environments. It replaces manual parameter file editing and environment variable management with an interactive deployment script (deploy-ccws.sh). The documentation is updated to guide users through both interactive and automated deployments, including support for advanced features such as availability zones and optional database integration.

Deployment workflow modernization:

  • Introduced the scripts/deploy-ccws.sh automation script, which interactively collects deployment parameters, generates the required output.json file, and optionally performs the deployment. This replaces manual editing of parameter templates and environment variable setup.
  • Added detailed documentation for the new script, including usage examples, parameter descriptions, interactive prompts for availability zones, and troubleshooting guidance. This ensures users can reliably reproduce deployments and understand all options.

Template and documentation cleanup:

  • Removed the legacy large-ai-training-cluster-parameters.template file, which relied on environment variable substitution. All parameterization is now handled by deploy-ccws.sh, streamlining the process and reducing the risk of misconfiguration.
  • Updated the README to clarify the optional nature of MySQL Flexible Server deployment, and to explain how database integration is handled via script flags rather than direct template modification.

Database integration improvements:

  • Added support for all-or-nothing database configuration via script flags (--db-name, --db-user, --db-password, --db-id). The script validates these parameters and disables accounting if not all are provided, improving security and usability.
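The all-or-nothing rule above can be sketched in a few lines of shell. This is a minimal illustration, not the logic in deploy-ccws.sh: the function names `parse_db_flags` and `accounting_mode` are hypothetical, and the sketch assumes each flag takes its value as a separate following argument. Only the four flag names come from the description above.

```shell
# Hypothetical sketch of all-or-nothing database validation.
# Not the actual deploy-ccws.sh implementation.

# Collect the four database flags; any other argument is ignored.
# Assumption: each flag is followed by its value as a separate argument.
parse_db_flags() {
    DB_NAME="" DB_USER="" DB_PASSWORD="" DB_ID=""
    while [ "$#" -gt 0 ]; do
        case "$1" in
            --db-name)     DB_NAME="${2:-}" ;;
            --db-user)     DB_USER="${2:-}" ;;
            --db-password) DB_PASSWORD="${2:-}" ;;
            --db-id)       DB_ID="${2:-}" ;;
            *)             shift; continue ;;
        esac
        shift                                # drop the flag
        if [ "$#" -gt 0 ]; then shift; fi    # drop its value, if present
    done
}

# Print "enabled" only when all four values are non-empty;
# a partial or empty set prints "disabled".
accounting_mode() {
    if [ "$#" -ne 4 ]; then
        echo "disabled"
        return 0
    fi
    for value in "$@"; do
        if [ -z "$value" ]; then
            echo "disabled"
            return 0
        fi
    done
    echo "enabled"
}
```

Calling `parse_db_flags "$@"` and then `accounting_mode "$DB_NAME" "$DB_USER" "$DB_PASSWORD" "$DB_ID"` yields a single switch that the rest of a script could branch on, so a partially specified database never reaches the generated parameters.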

These changes make deployments more robust, reproducible, and user-friendly, while reducing manual steps and potential errors.

@edwardsp edwardsp self-requested a review November 7, 2025 08:23
Contributor

@edwardsp edwardsp left a comment

approved, just need to fix conflicts

@wolfgang-desalvador wolfgang-desalvador force-pushed the wdesalvador/add-ccws-deploy-script branch from 65e4088 to c080b1b Compare November 7, 2025 09:07
Contributor

Copilot AI left a comment

Pull Request Overview

This PR introduces a comprehensive deployment automation script (deploy-ccws.sh) for Azure CycleCloud Workspace for Slurm, replacing the previous manual template-based approach. The changes include:

  • A new 888-line Bash deployment script with extensive parameter handling, validation, and interactive/non-interactive modes
  • Removal of the old template file (large-ai-training-cluster-parameters.template)
  • Comprehensive README documentation updates with detailed usage examples and parameter references
  • Git configuration updates (.gitignore for generated artifacts, .gitattributes for shell script line endings)
  • Minor documentation corrections standardizing "Azure CycleCloud Workspace for Slurm" naming

Reviewed Changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| infrastructure_references/azure_cyclecloud_workspace_for_slurm/scripts/deploy-ccws.sh | New deployment automation script with SKU validation, zone discovery, database auto-creation, and parameter generation |
| infrastructure_references/azure_cyclecloud_workspace_for_slurm/large-ai-training-cluster-parameters.template | Removed old template file in favor of automated generation |
| infrastructure_references/azure_cyclecloud_workspace_for_slurm/README.md | Comprehensive documentation update with script usage, examples, and parameter reference |
| examples/megatron-lm/GPT3-175B/slurm/README.md | Minor naming correction: "Azure CycleCloud Slurm Workspace" → "Azure CycleCloud Workspace for Slurm" |
| examples/llm-foundry/slurm/README.md | Minor naming correction: "Azure CycleCloud Slurm Workspace" → "Azure CycleCloud Workspace for Slurm" |
| README.md | Minor naming correction in infrastructure references catalog |
| .gitignore | Added generated artifacts: output.json and cyclecloud-slurm-workspace directory |
| .gitattributes | Enforced LF line endings for shell scripts |

@wolfgang-desalvador wolfgang-desalvador force-pushed the wdesalvador/add-ccws-deploy-script branch from 34be821 to de5baf8 Compare November 7, 2025 13:47
@edwardsp edwardsp self-requested a review November 10, 2025 10:09
@edwardsp edwardsp merged commit e31244b into main Nov 10, 2025
15 of 17 checks passed
@edwardsp edwardsp deleted the wdesalvador/add-ccws-deploy-script branch November 10, 2025 10:10
3 participants