Socket timeout in TrustedHttpClientImpl needs to be set

Steps to reproduce:
1. The engage node crashes at the exact moment another node (usually admin) dispatches a job to it.

Actual Results:
The node that dispatched the job stops dispatching jobs

Expected Results:
The node that dispatched the job continues dispatching jobs

Workaround (if any):
Restart the node that dispatched the job.

In TrustedHttpClientImpl, a DEFAULT_SOCKET_TIMEOUT is defined and passed to the execute method, but it is never used.

We had a problem where our engage node crashed and the admin node stopped dispatching jobs, even after the engage node came back up.
The logs indicated that a job was dispatched to engage at the exact time it crashed, and that the next job was dispatched only after admin was restarted hours later.
It seems that the request never timed out and the JobDispatcher thread hung while dispatching to engage.

I was able to replicate this problem in our development environment by using the debugger to pause the code at exactly that point on engage.
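
The hang described above can be illustrated outside the project code with plain java.net sockets. This is a minimal sketch, not the TrustedHttpClientImpl code: the class name SocketTimeoutSketch and the method readTimesOut are hypothetical, and a local server that accepts a connection but never answers stands in for the engage node paused in the debugger.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical illustration class; not part of the code base discussed in this issue.
class SocketTimeoutSketch {

  /**
   * Connects to a local server that accepts the connection but never replies,
   * then attempts a read with the given socket timeout (SO_TIMEOUT).
   * Returns true if the read was abandoned with a SocketTimeoutException.
   */
  static boolean readTimesOut(int socketTimeoutMillis) throws IOException {
    // Port 0 lets the OS pick a free port; the server side never writes anything.
    try (ServerSocket silentServer = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
         Socket client = new Socket(silentServer.getInetAddress(), silentServer.getLocalPort());
         Socket accepted = silentServer.accept()) {
      // A value of 0 would mean "wait forever" on every read.
      client.setSoTimeout(socketTimeoutMillis);
      InputStream in = client.getInputStream();
      try {
        in.read(); // blocks until data arrives or the timeout expires
        return false;
      } catch (SocketTimeoutException e) {
        return true;
      }
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println("read timed out: " + readTimesOut(500));
  }
}
```

With a socket timeout of 0, which is the java.net default, the same read would block indefinitely. That is consistent with the JobDispatcher thread never returning from the request.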

Activity

Stephen Marquard
May 6, 2017, 3:46 PM

Is this an issue in 2.x / 3.x / develop? It doesn't seem like this code change made it beyond 1.7.

Rute Santos
September 28, 2016, 4:41 PM

Pull request #1185

James Perrin
September 28, 2016, 3:34 PM

I've been staring at the code and I note that retryAuthAndRequestAfterNonceTimeout() doesn't reset the connectionTimeout on the new HttpClient it creates. This may also apply to the socketTimeout.
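
One way to avoid losing timeouts on a retry is to apply them in the single place where connections are created. The sketch below uses java.net.HttpURLConnection rather than the HttpClient used in TrustedHttpClientImpl, so it only illustrates the idea. The class TimeoutAwareConnections, its open method and the connection timeout value are assumptions; the 60s socket timeout is the value mentioned in this thread.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper; names and the connection timeout value are assumptions.
class TimeoutAwareConnections {
  static final int CONNECTION_TIMEOUT_MILLIS = 60_000; // assumed value
  static final int SOCKET_TIMEOUT_MILLIS = 60_000;     // 60s, as discussed in this thread

  /**
   * Creates a connection object with both timeouts already applied.
   * openConnection() does not touch the network, so this is cheap to call.
   */
  static HttpURLConnection open(String url) throws IOException {
    HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
    conn.setConnectTimeout(CONNECTION_TIMEOUT_MILLIS); // limit on establishing the connection
    conn.setReadTimeout(SOCKET_TIMEOUT_MILLIS);        // limit on waiting for response data
    return conn;
  }
}
```

If both the first attempt and any retry go through the same factory method, a retried request cannot silently fall back to the default of no timeout.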

Rute Santos
September 26, 2016, 2:24 PM

Yes, it's been working for us for a long time. It's a very small change and I will create a patch.

James Perrin
September 26, 2016, 1:55 PM

Did you try the 60s socket timeout? Do you have a patch for this?

Fixed and reviewed

Assignee

Rute Santos

Reporter

Rute Santos

Severity

Incorrectly Functioning With Workaround