OptimisticLockException on worker node can cause jobs to be stuck in DISPATCHING state

Steps to reproduce
1. Run an Opencast cluster with significant load
2. Observe in Jobs table that some jobs remain in DISPATCHING state
3. Observe in Servers table that some servers have jobs queued but aren't running any jobs

Workaround: place the affected worker node in maintenance mode, then restart it.

This seems to affect only Inspect jobs (not sure why).

Activity

Karen Dolan
September 27, 2018, 4:18 PM

@greg_logan, we appear to have this issue, but from a RollbackException wrapping the OptimisticLockException, so the existing catch clause did not catch it. I'm going to open a pull request referencing this ticket that adds RollbackException to the catch.

From our logs, we can see that the job completed one second before this error.

JobBarrier.suspendWaiterJob - Unable to put Some(240757) into a waiting state, this may cause a deadlock: javax.persistence.RollbackException: javax.persistence.OptimisticLockException: Exception [EclipseLink-5006] (Eclipse Persistence Services - 2.6.4.v20160829-44060b6): org.eclipse.persistence.exceptions.OptimisticLockException
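
To illustrate why the existing catch clause missed this, here is a minimal sketch of checking the whole cause chain rather than only the outermost exception type. It is not the Opencast code, and it is a different technique from the proposed fix (which adds RollbackException to the catch). The class LockFailures and the method causedBy are hypothetical names, and the two nested exception classes are stand-ins for javax.persistence.OptimisticLockException and javax.persistence.RollbackException so the example runs without JPA on the classpath.

```java
/** Sketch only: detect an exception type anywhere in a cause chain. */
final class LockFailures {

  // Stand-in for javax.persistence.OptimisticLockException (assumption, not the real class).
  static class OptimisticLockException extends RuntimeException {
    OptimisticLockException(String message) {
      super(message);
    }
  }

  // Stand-in for javax.persistence.RollbackException (assumption, not the real class).
  static class RollbackException extends RuntimeException {
    RollbackException(Throwable cause) {
      super(cause);
    }
  }

  private LockFailures() {
  }

  /**
   * Returns true if the given throwable, or any of its causes, is an instance
   * of the given type. The depth limit guards against cyclic cause chains.
   */
  static boolean causedBy(Throwable t, Class<? extends Throwable> type) {
    for (int depth = 0; t != null && depth < 32; depth++) {
      if (type.isInstance(t)) {
        return true;
      }
      t = t.getCause();
    }
    return false;
  }
}
```

With a check like this, a handler that catches a broader exception type could call LockFailures.causedBy(e, OptimisticLockException.class) and treat both the bare and the wrapped case the same way, rethrowing anything else.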

Fixed and reviewed

Assignee

Greg Logan

Reporter

Stephen Marquard

Severity

Incorrectly Functioning With Workaround