Socket timeout in TrustedHttpClientImpl needs to be set
Steps to reproduce:
1. Engage crashes at exactly the time a job is dispatched to it by another node (usually admin)
Actual Results:
The node that dispatched the job stops dispatching jobs.
Expected Results:
The node that dispatched the job continues dispatching jobs.
Workaround (if any):
Restart the node that dispatched the job.
In TrustedHttpClientImpl, a DEFAULT_SOCKET_TIMEOUT is defined and passed to the execute method, but it is never used.
We had a problem where our engage crashed and the admin stopped dispatching jobs, even after the engage node came back up.
The logs indicated that a job was dispatched to engage at the exact time it crashed, and that the next job was not dispatched until admin was restarted hours later.
It seemed that the request never timed out and the JobDispatcher thread hung when dispatching to engage.
I was able to replicate this problem in our development environment by using the debugger to pause the code at exactly that point on engage.
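The hang described above can be illustrated with a minimal sketch. It is not the TrustedHttpClientImpl code: it uses the JDK's HttpURLConnection instead of the HttpClient that class uses, and the class name, method name, and timeout values are assumptions chosen for the demo. A server socket that never answers stands in for the crashed or paused engage node. With a read (socket) timeout set, the caller gets a SocketTimeoutException instead of blocking indefinitely.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.ServerSocket;
import java.net.SocketTimeoutException;
import java.net.URI;

public class SocketTimeoutSketch {

  /**
   * Sends a GET to a local port and reports whether waiting for the
   * response timed out. Both timeouts are in milliseconds.
   */
  static boolean requestTimesOut(int port, int connectTimeoutMs, int socketTimeoutMs)
      throws IOException {
    HttpURLConnection conn = (HttpURLConnection)
        URI.create("http://127.0.0.1:" + port + "/").toURL().openConnection();
    // Limits how long establishing the TCP connection may take.
    conn.setConnectTimeout(connectTimeoutMs);
    // Limits how long a read may block once connected. Without this,
    // the default of 0 means "wait forever".
    conn.setReadTimeout(socketTimeoutMs);
    try {
      conn.getResponseCode(); // blocks until a response arrives or the read times out
      return false;
    } catch (SocketTimeoutException e) {
      return true;
    } finally {
      conn.disconnect();
    }
  }

  public static void main(String[] args) throws IOException {
    // Stand-in for the unresponsive node: the OS typically completes the
    // TCP handshake for a listening socket, but nothing ever accepts the
    // connection or writes a response.
    try (ServerSocket silentPeer = new ServerSocket(0)) {
      boolean timedOut = requestTimesOut(silentPeer.getLocalPort(), 1000, 500);
      System.out.println("timed out: " + timedOut);
    }
  }
}
```

The connection timeout alone would not help in this scenario, because the connection itself succeeds. Only the read timeout bounds the wait for a response that never comes.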
Is this an issue in 2.x / 3.x / develop? It doesn't seem like this code change made it beyond 1.7.
Pull request #1185
I've been staring at the code, and I note that retryAuthAndRequestAfterNonceTimeout() doesn't reset the connectionTimeout on the new HttpClient it creates. This may also apply to the socketTimeout.
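One way to avoid losing timeouts on a retry is to create every connection, first attempt or retry, through a single factory method. The sketch below is hypothetical and does not reflect the actual structure of retryAuthAndRequestAfterNonceTimeout(). It uses the JDK's HttpURLConnection, and the class name, method name, and the 60-second connection timeout are assumptions. Only the 60-second socket timeout comes from this discussion.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class TimeoutPreservingRetry {

  // Assumed values for illustration, in milliseconds.
  static final int CONNECTION_TIMEOUT_MS = 60_000;
  static final int SOCKET_TIMEOUT_MS = 60_000;

  /**
   * Single place where connections are created, so a connection built
   * for a retry carries the same timeouts as the first attempt.
   * openConnection() does not touch the network yet.
   */
  static HttpURLConnection newConnection(String url) throws IOException {
    HttpURLConnection conn =
        (HttpURLConnection) URI.create(url).toURL().openConnection();
    conn.setConnectTimeout(CONNECTION_TIMEOUT_MS);
    conn.setReadTimeout(SOCKET_TIMEOUT_MS);
    return conn;
  }

  public static void main(String[] args) throws IOException {
    // The URL is a placeholder. Nothing is sent in this demo.
    HttpURLConnection first = newConnection("http://127.0.0.1:8080/");
    HttpURLConnection retry = newConnection("http://127.0.0.1:8080/");
    System.out.println("first read timeout: " + first.getReadTimeout());
    System.out.println("retry read timeout: " + retry.getReadTimeout());
  }
}
```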
Yes, it's been working for us for a long time. It's a very small change and I will create a patch.
Did you try the 60s socket timeout? Do you have a patch for this?