Stale database connection causes job failure

Steps to reproduce

It appears that a worker node can attempt to use a database connection that has been closed by the server because it has been idle for longer than the server-configured wait_timeout period (default 8 hours).

This can be worked around by increasing wait_timeout on the server (e.g. to 24 hours) but Opencast should recover gracefully from this situation without causing job processing to fail.

javax.persistence.RollbackException: Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.0.2.v20100323-r6872): org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: The last packet successfully received from the server was 73,042,797 milliseconds ago. The last packet sent successfully to the server was 73,042,797 milliseconds ago. is longer than the server configured value of 'wait_timeout'. You should consider either expiring and/or testing connection validity before use in your application, increasing the server configured values for client timeouts, or using the Connector/J connection property 'autoReconnect=true' to avoid this problem.
Error Code: 0
Call: SELECT id FROM mh_job WHERE (id = ?)
bind => [40574143]
Query: DoesExistQuery(referenceClass=JobJpaImpl sql="SELECT id FROM mh_job WHERE (id = ?)")
at org.eclipse.persistence.internal.jpa.transaction.EntityTransactionImpl.commitInternal(EntityTransactionImpl.java:102)
at org.eclipse.persistence.internal.jpa.transaction.EntityTransactionImpl.commit(EntityTransactionImpl.java:63)
at org.opencastproject.serviceregistry.impl.ServiceRegistryJpaImpl.createJob(ServiceRegistryJpaImpl.java:472)
at org.opencastproject.serviceregistry.impl.ServiceRegistryJpaImpl.createJob(ServiceRegistryJpaImpl.java:360)
at org.opencastproject.inspection.ffmpeg.MediaInspectionServiceImpl.inspect(MediaInspectionServiceImpl.java:152)
at org.opencastproject.composer.impl.ComposerServiceImpl.inspect(ComposerServiceImpl.java:1505)
at org.opencastproject.composer.impl.ComposerServiceImpl.encode(ComposerServiceImpl.java:302)
at org.opencastproject.composer.impl.ComposerServiceImpl.mux(ComposerServiceImpl.java:460)
at org.opencastproject.composer.impl.ComposerServiceImpl.process(ComposerServiceImpl.java:1406)
at org.opencastproject.job.api.AbstractJobProducer$JobRunner.call(AbstractJobProducer.java:202)
at org.opencastproject.job.api.AbstractJobProducer$JobRunner.call(AbstractJobProducer.java:167)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.0.2.v20100323-r6872): org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: The last packet successfully received from the server was 73,042,797 milliseconds ago. The last packet sent successfully to the server was 73,042,797 milliseconds ago. is longer than the server configured value of 'wait_timeout'. You should consider either expiring and/or testing connection validity before use in your application, increasing the server configured values for client timeouts, or using the Connector/J connection property 'autoReconnect=true' to avoid this problem.
Error Code: 0
Call: SELECT id FROM mh_job WHERE (id = ?)
bind => [40574143]
Query: DoesExistQuery(referenceClass=JobJpaImpl sql="SELECT id FROM mh_job WHERE (id = ?)")
at org.eclipse.persistence.exceptions.DatabaseException.sqlException(DatabaseException.java:333)
at org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.processExceptionForCommError(DatabaseAccessor.java:1422)
at org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.basicExecuteCall(DatabaseAccessor.java:679)
at org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.executeCall(DatabaseAccessor.java:530)
at org.eclipse.persistence.sessions.server.ServerSession.executeCall(ServerSession.java:529)
at org.eclipse.persistence.internal.sessions.IsolatedClientSession.executeCall(IsolatedClientSession.java:133)
at org.eclipse.persistence.internal.queries.DatasourceCallQueryMechanism.executeCall(DatasourceCallQueryMechanism.java:206)
at org.eclipse.persistence.internal.queries.DatasourceCallQueryMechanism.executeCall(DatasourceCallQueryMechanism.java:192)
at org.eclipse.persistence.internal.queries.DatasourceCallQueryMechanism.selectRowForDoesExist(DatasourceCallQueryMechanism.java:661)
at org.eclipse.persistence.queries.DoesExistQuery.executeDatabaseQuery(DoesExistQuery.java:234)
at org.eclipse.persistence.queries.DatabaseQuery.execute(DatabaseQuery.java:675)
at org.eclipse.persistence.queries.DatabaseQuery.executeInUnitOfWork(DatabaseQuery.java:589)
at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.internalExecuteQuery(UnitOfWorkImpl.java:2898)
at org.eclipse.persistence.internal.sessions.AbstractSession.executeQuery(AbstractSession.java:1225)
at org.eclipse.persistence.internal.sessions.AbstractSession.executeQuery(AbstractSession.java:1207)
at org.eclipse.persistence.internal.sessions.AbstractSession.executeQuery(AbstractSession.java:1167)
at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.checkForUnregisteredExistingObject(UnitOfWorkImpl.java:761)
at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.discoverAndPersistUnregisteredNewObjects(UnitOfWorkImpl.java:4151)
at org.eclipse.persistence.mappings.ObjectReferenceMapping.cascadeDiscoverAndPersistUnregisteredNewObjects(ObjectReferenceMapping.java:801)
at org.eclipse.persistence.mappings.ObjectReferenceMapping.cascadeDiscoverAndPersistUnregisteredNewObjects(ObjectReferenceMapping.java:779)
at org.eclipse.persistence.internal.descriptors.ObjectBuilder.cascadeDiscoverAndPersistUnregisteredNewObjects(ObjectBuilder.java:1540)
at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.discoverAndPersistUnregisteredNewObjects(UnitOfWorkImpl.java:4163)
at org.eclipse.persistence.internal.sessions.RepeatableWriteUnitOfWork.discoverUnregisteredNewObjects(RepeatableWriteUnitOfWork.java:225)
at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.calculateChanges(UnitOfWorkImpl.java:699)
at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.commitToDatabaseWithChangeSet(UnitOfWorkImpl.java:1503)
at org.eclipse.persistence.internal.sessions.RepeatableWriteUnitOfWork.commitRootUnitOfWork(RepeatableWriteUnitOfWork.java:200)
at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.commitAndResume(UnitOfWorkImpl.java:1139)
at org.eclipse.persistence.internal.jpa.transaction.EntityTransactionImpl.commitInternal(EntityTransactionImpl.java:84)
... 14 more
Caused by: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: The last packet successfully received from the server was 73,042,797 milliseconds ago. The last packet sent successfully to the server was 73,042,797 milliseconds ago. is longer than the server configured value of 'wait_timeout'. You should consider either expiring and/or testing connection validity before use in your application, increasing the server configured values for client timeouts, or using the Connector/J connection property 'autoReconnect=true' to avoid this problem.
at sun.reflect.GeneratedConstructorAccessor175.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:406)
at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1074)
at com.mysql.jdbc.MysqlIO.send(MysqlIO.java:3313)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1940)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2109)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2643)
at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2077)
at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:2228)
at com.mchange.v2.c3p0.impl.NewProxyPreparedStatement.executeQuery(NewProxyPreparedStatement.java:76)
at org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.executeSelect(DatabaseAccessor.java:892)
at org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.basicExecuteCall(DatabaseAccessor.java:602)
... 39 more
Caused by: java.net.SocketException: Broken pipe
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113)
at java.net.SocketOutputStream.write(SocketOutputStream.java:159)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
at com.mysql.jdbc.MysqlIO.send(MysqlIO.java:3294)
... 47 more

Activity

Show:
Andi Krieger
October 30, 2016, 12:22 AM

This is happening here also. On our rather fresh install, leaving the machine without a job for > 8h. The first job after that hits this exception, it seems, causing the processing to fail.

Andi Krieger
October 30, 2016, 12:26 AM

Assigning fix version 2.3.0 because this problem persists with 2.2.2, and we are in the public QA phase for 2.3.

Andi Krieger
October 30, 2016, 4:41 AM

Also affected, it seems:
https://groups.google.com/a/opencast.org/forum/#!msg/users/iep800Gkieo/oW8hN3ttAwAJ
https://groups.google.com/a/opencast.org/forum/#!topic/users/QDUYKXzy1YU

Andi Krieger
January 12, 2017, 10:30 PM

I'm really having a hard time with this; I need to restart our 3 nodes every 8 hours to have a working system.
While this is a workaround, it just does not feel right. What else can I do? "curl" the system page regularly has no effect here.

Stephen Marquard
July 8, 2017, 9:15 PM
Edited

According to my testing, to avoid this issue in Opencast 3.x, you should explicitly configure org.opencastproject.db.jdbc.pool.max.idle.time in etc/custom.properties to a value lower than the mysql server's wait_timeout (28800 by default)

  1. max.idle.time must be lower than mysql wait_timeout
    org.opencastproject.db.jdbc.pool.max.idle.time=3600

The c3p0 connection pool has a default of 0 for maxIdleTime, so unless this is changed, connections closed by mysql won't be removed from the pool and will get used eventually triggering this issue.

http://www.mchange.com/projects/c3p0/#maxIdleTime

Fixed and reviewed

Assignee

Lukas Rohner

Reporter

Stephen Marquard

Severity

Incorrectly Functioning With Workaround

Tags (folksonomy)

None

Components

Fix versions

Affects versions

Priority

Major