We're updating the issue view to help you get more done. 

As a devop I want to be able to deploy nodes to autoscaling groups in the cloud

Description

I want to deploy nodes to cloud autoscaling groups to control my resource usage. Autoscaling starts and stops instances according to metrics or scheduling.

Stopping instances is problematic as they may still be running jobs. We need therefore some sort of mechanism for pausing termination until the job is complete.

AWS autoscaling:
A Lifecycle Hook for termination will put the instance into a terminating wait state, which will expire after a heartbeat time out (max 2hrs). The instance can emit a heartbeat to reset the timeout if jobs are still running. Once jobs are finished the instance can signal that termination can complete.

There are a couple of ways that this can be accomplished:

1. The instance ping AWS to get it's lifecycle state looking for terminate:wait

2. A CloudWatch alarm can trigger a Lambda function on the terminate:wait event. The Lambda should then PUT the Opencast node into terminating state.

The Opencast node if told it needs to terminate, should tell put itself into maintenance to stop receiving jobs from admin. Then periodically check whether its jobs are complete, if not emit a heartbeat, else signal that terminating the instance can continue.

Steps to reproduce

None

Status

Assignee

James Perrin

Reporter

James Perrin

Criticality

None

Tags (folksonomy)

Components

Affects versions

7.0

Priority