Gathering workflow statistics for JMX causes extreme performance issues

Steps to reproduce:
1. Schedule a large number of upcoming recordings (e.g. 3000+)
2. Schedule a set of new recordings (e.g. 50 in a series) in a group

Actual Results:

Scheduling the new recordings is extremely slow (e.g. 30+ minutes to complete). Other actions that update workflows are also extremely slow.

Expected Results:

Response time < 60s for group scheduling.

Diagnosis:

WorkflowServiceImpl.java includes this line for updating statistics that are provided by JMX:

workflowsStatistics.updateWorkflow(getBeanStatistics(), getHoldWorkflows());

The method getHoldWorkflows() reads every single upcoming scheduled recording into memory, as these are workflows in PAUSED state.

This causes extremely high memory usage, frequent GC activity, and slow response times.
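To make the cost in the diagnosis concrete, here is a minimal sketch contrasting a call that materializes every PAUSED workflow with one that only counts them. All types here (`HoldWorkflowStatsSketch`, `WorkflowStore`, `findByState`, `countByState`) are hypothetical stand-ins, not Opencast's API, and whether the JMX statistics can be served from a count alone is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; none of these types are Opencast's real API.
final class HoldWorkflowStatsSketch {

  enum State { RUNNING, PAUSED }

  /** Minimal workflow record; real workflow instances are far heavier to load. */
  static final class Workflow {
    final long id;
    final State state;

    Workflow(long id, State state) {
      this.id = id;
      this.state = state;
    }
  }

  /** In-memory store standing in for wherever workflows are persisted. */
  static final class WorkflowStore {
    private final List<Workflow> rows = new ArrayList<>();

    void add(Workflow workflow) {
      rows.add(workflow);
    }

    /** Pattern described in the diagnosis: build a list of every matching workflow. */
    List<Workflow> findByState(State state) {
      List<Workflow> result = new ArrayList<>();
      for (Workflow w : rows) {
        if (w.state == state) {
          result.add(w);
        }
      }
      return result;
    }

    /** Alternative: return only the number of matches, without building a result list. */
    long countByState(State state) {
      long count = 0;
      for (Workflow w : rows) {
        if (w.state == state) {
          count++;
        }
      }
      return count;
    }
  }

  public static void main(String[] args) {
    WorkflowStore store = new WorkflowStore();
    for (long id = 0; id < 3000; id++) {
      store.add(new Workflow(id, State.PAUSED)); // upcoming scheduled recordings
    }
    store.add(new Workflow(3000, State.RUNNING));

    // Both calls report the same figure; only the first allocates a 3000-element list.
    System.out.println(store.findByState(State.PAUSED).size());
    System.out.println(store.countByState(State.PAUSED));
  }
}
```

In a real deployment the count would ideally be computed by the underlying store or index, so that no workflow objects are deserialized at all; the in-memory loop above only illustrates the difference in what is returned.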

Activity

James Perrin
July 27, 2015, 8:34 PM

relates to Job stats collection

James Perrin
July 27, 2015, 8:35 PM

Like the job stats, this should be configured to be off by default.

Stephen Marquard
July 27, 2015, 8:38 PM

It should be off by default, but it also needs a more efficient implementation (or this figure should be removed from the JMX stats). Otherwise, turning it on would basically kill your server in some circumstances.

James Perrin
July 27, 2015, 8:53 PM

I would just add the config option in this ticket. As the JMX stats are infrequently used, the effort outweighs the payoff, so it is better to create a separate ticket for the more efficient implementation.
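As an illustration of the "off by default" switch discussed in this thread, the sketch below guards the statistics update behind a configuration flag. The class name, the property key `workflow.jmx.statistics.enabled`, and the `Supplier`-based update are all assumptions made for this example; the actual fix may use a different key and mechanism.

```java
import java.util.List;
import java.util.Properties;
import java.util.function.Supplier;

// Hypothetical sketch; the key name and class are assumptions, not Opencast's configuration.
final class JmxStatsToggleSketch {

  /** Assumed property key for enabling workflow statistics collection. */
  static final String KEY = "workflow.jmx.statistics.enabled";

  private final boolean enabled;
  private int lastHoldCount = 0;

  JmxStatsToggleSketch(Properties config) {
    // A missing key falls back to "false", so collection is off by default.
    this.enabled = Boolean.parseBoolean(config.getProperty(KEY, "false"));
  }

  /**
   * Updates the statistic only when enabled. The supplier is not invoked otherwise,
   * so the expensive load of paused workflows never happens when the flag is off.
   */
  void update(Supplier<List<String>> holdWorkflows) {
    if (!enabled) {
      return;
    }
    lastHoldCount = holdWorkflows.get().size();
  }

  boolean isEnabled() {
    return enabled;
  }

  int getLastHoldCount() {
    return lastHoldCount;
  }
}
```

Passing the expensive lookup as a `Supplier` rather than as an already-computed list matters here: with an eager argument, the paused workflows would be loaded before the flag is even checked.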

Fixed and reviewed

Assignee

James Perrin

Reporter

Stephen Marquard

Severity

Performance

Tags (folksonomy)

None

Components

Fix versions

Affects versions

Priority

Critical