
feat(runners): Add nightly recycle for macOS 26 finch runners #1102

Draft
ayush-panta wants to merge 1 commit into main from nightly-cycle-mac26-runners


@ayush-panta ayush-panta commented Jun 2, 2026


Description of changes: Add scheduled ASG actions to spin down and spin up macOS 26 finch self-hosted runners nightly at 10:00/10:30 UTC (2:00/2:30 AM PST). This prevents state accumulation that causes runners to become unresponsive over time, which previously required manual termination to resolve.

This PR only targets these specific runners as they are the only ones frequently becoming unresponsive.
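The spin-down/spin-up schedule described above could be sketched as the pair of plain objects below. This is a minimal illustration, not the PR's actual code: the field names mirror `CfnScheduledActionProps` from `aws-cdk-lib/aws-autoscaling`, but the helper names (`dailyCronUtc`, `nightlyRecycleActions`), the ASG name, and the capacity values are assumptions, since the diff for this PR is not shown in full.

```typescript
// Plain-object sketch of two nightly scheduled actions.
// Field names follow CfnScheduledActionProps; everything else is hypothetical.
interface ScheduledActionProps {
  autoScalingGroupName: string;
  recurrence: string; // Unix cron expression, evaluated in UTC by default
  minSize: number;
  maxSize: number;
  desiredCapacity: number;
}

// Build a daily cron expression "<minute> <hour> * * *" for a UTC time of day.
function dailyCronUtc(hour: number, minute: number): string {
  if (!Number.isInteger(hour) || hour < 0 || hour > 23) {
    throw new RangeError(`hour out of range: ${hour}`);
  }
  if (!Number.isInteger(minute) || minute < 0 || minute > 59) {
    throw new RangeError(`minute out of range: ${minute}`);
  }
  return `${minute} ${hour} * * *`;
}

// Assumption: spin-down scales the group to zero at 10:00 UTC and spin-up
// restores the given capacity at 10:30 UTC. The real sizes are not in the PR text.
function nightlyRecycleActions(
  asgName: string,
  capacity: number,
): { spinDown: ScheduledActionProps; spinUp: ScheduledActionProps } {
  return {
    spinDown: {
      autoScalingGroupName: asgName,
      recurrence: dailyCronUtc(10, 0),
      minSize: 0,
      maxSize: 0,
      desiredCapacity: 0,
    },
    spinUp: {
      autoScalingGroupName: asgName,
      recurrence: dailyCronUtc(10, 30),
      minSize: capacity,
      maxSize: capacity,
      desiredCapacity: capacity,
    },
  };
}
```

In a CDK stack, each object could be passed as the props argument of `new autoscaling.CfnScheduledAction(this, 'NightlySpinDown', ...)`, the construct that appears in the review thread for `lib/asg-runner-stack.ts`.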

Testing done: Ran npx jest -c ./jest.config.unit.js:

...
Test Suites: 12 passed, 12 total
Tests:       15 passed, 15 total
Snapshots:   0 total
Time:        7.298 s, estimated 14 s
  • I've reviewed the guidance in CONTRIBUTING.md

License Acceptance

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@ayush-panta ayush-panta requested a review from a team as a code owner June 2, 2026 20:25
Add scheduled ASG actions to spin down and spin up macOS 26 finch
runners nightly at 10:00/10:30 UTC (2:00/2:30 AM PST). This prevents
state accumulation that causes runners to become unresponsive over
time, which previously required manual termination to resolve.

Signed-off-by: ayush-panta <ayushkp@amazon.com>
@ayush-panta ayush-panta force-pushed the nightly-cycle-mac26-runners branch from d1429f5 to b81402f Compare June 2, 2026 20:26
Comment thread lib/asg-runner-stack.ts
this.repo === 'finch';

if (isMac26Finch) {
new autoscaling.CfnScheduledAction(this, 'NightlySpinDown', {


Would replacing the root volume nightly, instead of provisioning a new instance altogether, be better? Would that solve our host contamination issues?
https://docs.aws.amazon.com/ebs/latest/userguide/ebs-restoring-volume.html#replace-root

@coderbirju coderbirju Jun 2, 2026


Also note that there is a 24-hour cycle for Mac hosts to become available after termination. We should make sure that we will not hit our capacity quota if a termination fails.

@Swapnanil-Gupta Swapnanil-Gupta left a comment


Not a fan of this. Should we try to figure out why this is happening first?

@ayush-panta ayush-panta marked this pull request as draft June 3, 2026 18:45
@ayush-panta ayush-panta marked this pull request as ready for review June 3, 2026 18:46
@ayush-panta ayush-panta marked this pull request as draft June 3, 2026 18:46
@ayush-panta


> Not a fan of this. Should we try to figure out why this is happening first?

It is difficult to determine why it is happening because the instances lose connection, so I can't SSM in to look at logs. I'm looking at an alternative approach with root volume replacement.
