Engage Server Locks Up Due to Deadlock in Jetty When Run for Extended Periods


Steps to reproduce:
1. Run a distributed system (one admin, two workers and one engage node) for multiple days.
2. Occasionally, the engage server will lock up.

Actual Results:
Every few days, the engage server locks up and becomes unresponsive.

Expected Results:
The engage server runs without locking up.

Workaround (if any):
Restart the server.

Activity

Stephen Marquard
April 20, 2016, 7:04 PM

The updated org.apache.felix.http.bundle-2.2.0-opencast20160413.jar has been running in our UCT 1.6.x production system for about a week, so far without problems, and without a repeat of the deadlock / lockup.

Stephen Marquard
April 14, 2016, 8:39 AM

Here is the process for building a replacement jar, including the fix for this issue and MH-9610:

svn co http://svn.apache.org/repos/asf/felix/releases/org.apache.felix.http-2.2.0/
cd org.apache.felix.http-2.2.0
patch -p1 < configurable-http-timeout.patch
patch -p0 < jetty-version.diff
mvn versions:set -DnewVersion=2.2.0-opencast20160413
mvn clean install

The resulting jar is in ./bundle/target/org.apache.felix.http.bundle-2.2.0-opencast20160413.jar

Stephen Marquard
April 13, 2016, 7:48 PM

Testing the replacement jar org.apache.felix.http.bundle-2.2.0-opencast20160413.jar, which replaces org.apache.felix.http.bundle-2.2.0-opencast.jar in /opt/matterhorn/lib/ext/. To install it, clear the Felix cache and update the reference in etc/system.properties (under felix.auto.start.1).
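The installation steps above could be scripted roughly as follows. This is a minimal sketch, not an official procedure: it assumes the installation root is /opt/matterhorn, that the Felix cache lives in a `felix-cache` directory under that root (a hypothetical location; check where your configuration actually puts it), and that etc/system.properties refers to the bundle by its file name. The function name `install_http_bundle` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: swap the patched Felix HTTP bundle into a Matterhorn install.
# Run it with the server stopped.
# Assumptions: install root defaults to /opt/matterhorn; the Felix cache defaults
# to <root>/felix-cache (assumed location, adjust to your setup).
install_http_bundle() {
  new_jar="$1"
  mh_home="${2:-/opt/matterhorn}"
  felix_cache="${3:-$mh_home/felix-cache}"
  old_name="org.apache.felix.http.bundle-2.2.0-opencast.jar"
  new_name=$(basename "$new_jar")
  props="$mh_home/etc/system.properties"

  # Refuse to continue if the new jar or the properties file is missing.
  [ -f "$new_jar" ] || { echo "missing $new_jar" >&2; return 1; }
  [ -f "$props" ] || { echo "missing $props" >&2; return 1; }

  # Copy the new bundle into lib/ext and remove the old one.
  cp "$new_jar" "$mh_home/lib/ext/$new_name" || return 1
  rm -f "$mh_home/lib/ext/$old_name"

  # Point felix.auto.start.1 (and any other reference) at the new jar name,
  # keeping a backup of the original properties file.
  sed "s/$old_name/$new_name/g" "$props" > "$props.tmp" || return 1
  cp "$props" "$props.bak" && mv "$props.tmp" "$props" || return 1

  # Clear the Felix bundle cache so the new bundle is picked up on restart.
  rm -rf "$felix_cache"
}
```

For example, `install_http_bundle ./bundle/target/org.apache.felix.http.bundle-2.2.0-opencast20160413.jar /opt/matterhorn`. Editing through a temporary file rather than `sed -i` keeps the sketch portable across GNU and BSD sed.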

This version includes the fix for and upgrades Jetty from 6.1.24 to 6.1.26 to pick up the deadlock fix.

Stephen Marquard
April 13, 2016, 4:00 PM

The thread discussing the resolution of this issue is here:

https://groups.google.com/a/opencast.org/forum/#!topic/matterhorn-users/X8Z0Qlf6diE

Stephen Marquard
June 15, 2015, 2:11 PM

We seem to have had a similar issue on our 1.6.x admin/engage node (it happened twice in about six months).

Oddly, it only affected HTTP connections, which hung; HTTPS connections to the same server were fine (both are proxied through Apache httpd, though restarting httpd did not help).

Attaching thread dump.
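When the server hangs again, a thread dump can show whether it is the same Jetty deadlock. A minimal sketch, assuming a HotSpot JDK with `jstack` on the PATH, run as the same user as the server: when `jstack` detects a monitor deadlock it adds a section headed "Found one Java-level deadlock:" to its output. The function names and the `pgrep -f felix` process match are hypothetical and may need adjusting to how the server is launched.

```shell
#!/bin/sh
# Hypothetical helpers for diagnosing a hung engage server.
# Assumes a HotSpot JDK with jstack on the PATH.

# Return 0 if the given thread dump file reports a Java-level deadlock.
dump_has_deadlock() {
  grep -q 'Java-level deadlock' "$1"
}

# Capture a thread dump of the first process whose command line matches
# "felix" (assumed pattern) and report whether it contains a deadlock.
# Returns 0 if a deadlock is found, 1 if not, 2 on error.
check_server_deadlock() {
  pid=$(pgrep -f felix | head -n 1)
  [ -n "$pid" ] || { echo "no matching process" >&2; return 2; }
  out="threaddump-$pid-$(date +%Y%m%d%H%M%S).txt"
  jstack "$pid" > "$out" || return 2
  if dump_has_deadlock "$out"; then
    echo "deadlock found, see $out"
    return 0
  fi
  echo "no deadlock reported in $out"
  return 1
}
```

The saved dump file can then be attached to the issue, as was done here. Note that the absence of a reported deadlock does not rule out other kinds of hang, such as threads blocked on I/O.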

Fixed and reviewed

Assignee

John Crossman

Reporter

Jonathan Felder

Severity

Crash/Hang

Tags (folksonomy)