Steps to reproduce:
1. Start a worker node and process some jobs so that the worker is busy.
2. Kill the worker suddenly in an unclean shutdown (kill -9 the java process, or power-cycle the machine without a soft shutdown/reboot).
3. Start the worker node
The worker node starts up with a non-zero local job load figure, for example:
2019-03-28 16:14:56,556 | DEBUG | qtp1562501205-174 | (ServiceRegistryJpaImpl:935) - 174 Adding to load cache: Job 71435344, type org.opencastproject.execute, load 2.0, status RUNNING
2019-03-28 16:14:56,557 | DEBUG | qtp1562501205-174 | (ServiceRegistryJpaImpl:949) - 174 Current host load: 23.5, job load cache size: 1
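The log lines above suggest the worker rebuilds its load cache from jobs still persisted as RUNNING, including jobs that died with the killed process. A minimal sketch of that mechanism, using hypothetical names (not Opencast's actual ServiceRegistryJpaImpl API):

```java
import java.util.List;

// Illustrative sketch: after kill -9, jobs that were executing are never
// marked FAILED in the database, so a load cache rebuilt from persisted
// RUNNING jobs starts non-zero on the freshly restarted worker.
public class LoadCacheSketch {
    enum Status { RUNNING, FINISHED, FAILED }

    static class Job {
        final long id;
        final Status status;
        final float load;
        Job(long id, Status status, float load) {
            this.id = id;
            this.status = status;
            this.load = load;
        }
    }

    // On startup, sum the load of every job still persisted as RUNNING.
    static float rebuildHostLoad(List<Job> persistedJobs) {
        float hostLoad = 0f;
        for (Job job : persistedJobs) {
            if (job.status == Status.RUNNING) {
                hostLoad += job.load;
            }
        }
        return hostLoad;
    }

    public static void main(String[] args) {
        // Jobs left behind by the crashed process, matching the logs above.
        List<Job> stale = List.of(
            new Job(71435344L, Status.RUNNING, 2.0f),
            new Job(71435345L, Status.RUNNING, 21.5f));
        float load = rebuildHostLoad(stale);
        if (load != 23.5f) throw new AssertionError(load);
        System.out.println("host load after restart: " + load);
    }
}
```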
The admin node keeps allocating jobs to this worker (because the admin node believes the worker's load is low), but the worker node keeps declining them, for example:
2019-04-03 12:20:03,660 | DEBUG | qtp1562501205-50361 | (AbstractJobProducer:193) - 50361 Declining job 71655516 of type org.opencastproject.composer with load 3 because load of 24.5 would exceed this node's limit of
As a result, the worker node no longer processes any jobs for the cluster (or processes only small jobs, far below its capacity).
Expected result: the worker node should start with a job load of 0.
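One way to get that expected result, sketched with hypothetical names (not Opencast's actual API): on startup, any job the database still shows as RUNNING on this host must belong to a previous, crashed process, so it can be marked for re-dispatch before the load cache is rebuilt.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: reset orphaned RUNNING jobs at startup so they no
// longer count toward this host's load. Status names are assumptions.
public class StartupRecoverySketch {
    enum Status { RUNNING, RESTART, FAILED }

    static class Job {
        final long id;
        Status status;
        final float load;
        Job(long id, Status status, float load) {
            this.id = id;
            this.status = status;
            this.load = load;
        }
    }

    // Jobs still RUNNING on this host at startup are orphans from the
    // crashed process; mark them for re-dispatch.
    static void resetOrphanedJobs(List<Job> jobsOnThisHost) {
        for (Job job : jobsOnThisHost) {
            if (job.status == Status.RUNNING) {
                job.status = Status.RESTART;
            }
        }
    }

    // Only jobs actually RUNNING contribute to the host load.
    static float hostLoad(List<Job> jobs) {
        float load = 0f;
        for (Job job : jobs) {
            if (job.status == Status.RUNNING) {
                load += job.load;
            }
        }
        return load;
    }

    public static void main(String[] args) {
        List<Job> jobs = new ArrayList<>();
        jobs.add(new Job(71435344L, Status.RUNNING, 2.0f));
        resetOrphanedJobs(jobs);
        if (hostLoad(jobs) != 0f) throw new AssertionError();
        System.out.println("host load after recovery: " + hostLoad(jobs));
    }
}
```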
Workaround (if any):
Restart the worker node (a clean shutdown followed by a restart).