Job dispatching can be slowed down excessively by host loads query

Description

With a significant number of jobs (in history and/or running), job dispatching can become very slow, in excess of 3 minutes per dispatch cycle.

This is caused by the call:

SystemLoad systemLoad = getHostLoads(em, true);

which is made for every job being dispatched. The underlying query is:

https://bitbucket.org/cilt/opencast/src/dc68c37d2eea7adbd32edba371d5192c1351feed/modules/matterhorn-common-jpa-impl/src/main/java/org/opencastproject/serviceregistry/impl/jpa/ServiceRegistrationJpaImpl.java?at=r%2F3.x&fileviewer=file-view-default#ServiceRegistrationJpaImpl.java-64

On our 3.x production system, New Relic reports this query taking 94% of all database time. Response time can be up to 2s, and the query can be called hundreds of times per minute.

The job dispatching code needs refactoring to avoid this expensive query being called frequently.
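One possible direction, shown only as a minimal sketch and not as the actual Opencast code or the fix that was merged: run the host loads query once per dispatch cycle and update the snapshot in memory as jobs are assigned. Every name below (DispatchCycleSketch, queryHostLoads, dispatchAll, the worker URLs) is hypothetical; queryHostLoads merely stands in for getHostLoads(em, true).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: query host loads once per dispatch cycle, not once per job. */
class DispatchCycleSketch {

  /** Counts how often the simulated expensive query runs. */
  static int queryCount = 0;

  /**
   * Stand-in for getHostLoads(em, true). Returns host base URL -> current load.
   * The hosts and load values are made up for illustration.
   */
  static Map<String, Float> queryHostLoads() {
    queryCount++;
    Map<String, Float> loads = new TreeMap<>(); // sorted keys give deterministic tie-breaking
    loads.put("http://worker1", 1.0f);
    loads.put("http://worker2", 0.0f);
    return loads;
  }

  /**
   * Assigns each job to the currently least-loaded host and returns the chosen
   * host per job. The snapshot is fetched once and adjusted in memory afterwards.
   */
  static List<String> dispatchAll(List<Float> jobLoads) {
    Map<String, Float> loads = queryHostLoads(); // one query for the whole cycle
    List<String> assigned = new ArrayList<>();
    for (float jobLoad : jobLoads) {
      String best = null;
      for (Map.Entry<String, Float> entry : loads.entrySet()) {
        if (best == null || entry.getValue() < loads.get(best)) {
          best = entry.getKey();
        }
      }
      if (best == null) {
        break; // no hosts available in this snapshot
      }
      loads.merge(best, jobLoad, Float::sum); // account for the job just dispatched
      assigned.add(best);
    }
    return assigned;
  }
}
```

The trade-off in this sketch is that the snapshot can go stale within a cycle if jobs finish elsewhere in the meantime, so a real implementation would need to decide how often to refresh it.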

Activity

James Perrin
December 11, 2017, 4:41 PM

Yeah I guess I'm the only person that can review that it works on MSSQL!

Greg Logan
December 11, 2017, 4:39 PM

let's reopen here, I'll create a branch and file another PR.

James Perrin
December 11, 2017, 4:28 PM

Looks like changing it to "GROUP BY job.processorServiceRegistration.hostRegistration.baseUrl" fixes things for MS SQL. It should make it faster for MySQL too, as it was previously grouping by every field of job.processorServiceRegistration (mh_service_registration).

I can reopen this ticket or create a new one, preference?

Stephen Marquard
December 11, 2017, 2:44 PM

Looks like it's the other way around - you have to add it to the GROUP BY.

Greg Logan
December 11, 2017, 2:41 PM

we could also add the host to the output of the select. It wouldn't be technically useful, but it might resolve the issue with MSSQL. If I made that change in a separate branch would you have time to test?

Fixed and reviewed

Assignee

Greg Logan

Reporter

Stephen Marquard