Engage Server Locks Up Due to Deadlock in Jetty When Run for Extended Periods
Steps to reproduce:
1. Run a distributed system - one admin, two workers and one engage for multiple days
2. Occasionally, the engage server will lock up.
Actual Results:
Every few days the engage server locks up and becomes unresponsive.
Expected Results:
The engage server runs without locking up.
Workaround (if any):
Restart the server.
The updated org.apache.felix.http.bundle-2.2.0-opencast20160413.jar has been running in our UCT 1.6.x production system for about a week, so far without problems, and without a repeat of the deadlock / lockup.
Here is the process for building a replacement jar, including the fix for this issue and MH-9610:
svn co http://svn.apache.org/repos/asf/felix/releases/org.apache.felix.http-2.2.0/
patch -p1 < configurable-http-timeout.patch
patch -p0 < jetty-version.diff
mvn versions:set -DnewVersion=2.2.0-opencast20160413
mvn clean install
Resulting jar is in ./bundle/target/org.apache.felix.http.bundle-2.2.0-opencast20160413.jar
Testing the replacement jar org.apache.felix.http.bundle-2.2.0-opencast20160413.jar. It replaces org.apache.felix.http.bundle-2.2.0-opencast.jar in /opt/matterhorn/lib/ext/. To install it, clear the Felix cache and update the reference in etc/system.properties (under felix.auto.start.1).
This version includes the fix for and upgrades the jetty version from 6.1.24 to 6.1.26 for the deadlock fix.
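The jar swap described above could be scripted roughly as follows. This is only a sketch: the function name swap_http_bundle and its three arguments are made up for illustration, and the location of the Felix cache is not stated in this ticket, so it is passed in explicitly rather than guessed.

```shell
#!/bin/sh
# Sketch only: swap the Felix HTTP bundle in a Matterhorn install.
# The function name and argument order are hypothetical, not part of Matterhorn.
swap_http_bundle() {
  mh_home="$1"       # install root, e.g. /opt/matterhorn
  new_jar="$2"       # path to the freshly built jar
  felix_cache="$3"   # Felix cache directory (assumption: differs per install)

  old_name="org.apache.felix.http.bundle-2.2.0-opencast.jar"
  new_name="org.apache.felix.http.bundle-2.2.0-opencast20160413.jar"

  # Install the new jar next to the old one, then remove the old one.
  cp "$new_jar" "$mh_home/lib/ext/$new_name" || return 1
  rm -f "$mh_home/lib/ext/$old_name"

  # Escape the dots so sed matches the old file name literally.
  old_pattern=$(printf '%s' "$old_name" | sed 's/\./\\./g')

  # Rewrite the reference (under felix.auto.start.1) to the new file name.
  props="$mh_home/etc/system.properties"
  sed "s/$old_pattern/$new_name/g" "$props" > "$props.new" || return 1
  mv "$props.new" "$props" || return 1

  # Clear the Felix cache so the new bundle is picked up on restart.
  if [ -n "$felix_cache" ]; then
    rm -rf "$felix_cache"
  fi
  return 0
}
```

A call would look like `swap_http_bundle /opt/matterhorn ./bundle/target/org.apache.felix.http.bundle-2.2.0-opencast20160413.jar <felix cache dir>`, run while the server is stopped.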
We seem to have had a similar issue on our 1.6.x admin/engage node (happened twice in about 6 months).
Oddly, it only affected HTTP connections, which hung; HTTPS connections to the same server were fine (both were proxied through Apache httpd, though restarting httpd did not help).
Attaching thread dump.
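For anyone comparing their own dumps, a quick first check is whether the JVM itself reported a monitor deadlock. The sketch below assumes a HotSpot-style thread dump saved to a file, where such a report contains the phrase "Java-level deadlock"; the function name has_java_deadlock is hypothetical. A dump without that phrase does not rule out a hang, since not every lock-up is reported by the JVM in this way.

```shell
#!/bin/sh
# Sketch only: report whether a saved thread dump contains a JVM-detected deadlock.
# Assumes a HotSpot-style dump; the function name is made up for illustration.
has_java_deadlock() {
  dump_file="$1"   # path to a thread dump saved as plain text
  # HotSpot prints a "Found one Java-level deadlock:" section when it detects one.
  grep -q 'Java-level deadlock' "$dump_file"
}
```

Usage: `has_java_deadlock threaddump.txt && echo "deadlock reported by JVM"`.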