Worker node restarted after an unclean shutdown can start with an incorrect job load

Steps to reproduce:

1. Start a worker node and process some jobs so that the worker is busy.
2. Kill the worker suddenly in an unclean shutdown (kill -9 the Java process, or power-cycle the machine without a soft shutdown/reboot).
3. Start the worker node again.

Actual Results:

The worker node starts up with a local job load figure that is non-zero.


2019-03-28 16:14:56,556 | DEBUG | qtp1562501205-174 | (ServiceRegistryJpaImpl:935) - 174 Adding to load cache: Job 71435344, type org.opencastproject.execute, load 2.0, status RUNNING
2019-03-28 16:14:56,557 | DEBUG | qtp1562501205-174 | (ServiceRegistryJpaImpl:949) - 174 Current host load: 23.5, job load cache size: 1
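
To illustrate the failure mode, here is a minimal, hypothetical Java sketch (the class and method names are invented for illustration and are not the actual ServiceRegistryJpaImpl code): if the local load cache is rebuilt at startup from jobs the database still marks as RUNNING for this host, then jobs orphaned by the unclean shutdown are counted even though nothing is actually executing, and the host load starts out non-zero.

import java.util.HashMap;
import java.util.Map;

// Hypothetical load cache: sums the load of every job recorded as RUNNING
// for this host. After a kill -9 or power cycle those database rows were
// never cleaned up, so the sum is stale from the moment the node starts.
class JobLoadCache {
    private final Map<Long, Float> loadByJobId = new HashMap<>();

    void addRunningJob(long jobId, float load) {
        loadByJobId.put(jobId, load);
    }

    float currentHostLoad() {
        float total = 0f;
        for (float load : loadByJobId.values()) {
            total += load;
        }
        return total;
    }
}

Summing several orphaned jobs in this way can produce a stale baseline like the 23.5 reported in the log above, even though the worker is actually idle.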

The admin node keeps allocating jobs to this worker (because the admin node believes the worker has a low job load), but the worker node keeps declining them, for example:

2019-04-03 12:20:03,660 | DEBUG | qtp1562501205-50361 | (AbstractJobProducer:193) - 50361 Declining job 71655516 of type org.opencastproject.composer with load 3 because load of 24.5 would exceed this node's limit of

As a result, the worker node no longer processes any jobs for the cluster (or processes only small jobs, far below its capacity).
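
The decline itself follows from a simple capacity check. The sketch below is hypothetical (invented names, not the actual AbstractJobProducer code) but captures the reasoning visible in the log: a new job is accepted only if the current host load plus the job's load stays within the node's limit, so a stale non-zero baseline starves the worker of all but the smallest jobs.

// Hypothetical acceptance check: with a stale baseline (e.g. the 21.5 implied
// by the log line above), a composer job with load 3 pushes the total to 24.5,
// which exceeds the node's limit, and the job is declined.
class JobAcceptancePolicy {
    private final float maxLoad;

    JobAcceptancePolicy(float maxLoad) {
        this.maxLoad = maxLoad;
    }

    boolean accepts(float currentHostLoad, float jobLoad) {
        return currentHostLoad + jobLoad <= maxLoad;
    }
}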

Expected Results:

The worker node should start with a job load of 0.

Workaround (if any):

Restart the worker node (clean shutdown and restart).


