ServiceRegistry not updating some fields in the database correctly when dispatching jobs

Steps to reproduce

I've seen 2 cases where the database is not correctly updated:

1) When the job cannot be dispatched and is put into the dispatchPriorityList: the job's processor service is never set to null in the database.

2) When the job changes status from QUEUED to DISPATCHING: the value DISPATCHING never makes it to the database.

Note that these are not critical because #1 will "fix itself" in one of the next runs of the dispatcher and #2 will "fix itself" by the worker when it accepts the job, but it has caused DCE some trouble because of our horizontal scaling mechanism that needs to have accurate info in the database

Relevant pieces of code:

1) In the method update(): https://github.com/opencast/opencast/blob/9646abd232b08705dbbf868dcea5a390097a56c1/modules/serviceregistry/src/main/java/org/opencastproject/serviceregistry/impl/ServiceRegistryJpaImpl.java#L1128-L1132

The processor service is never set to null in the database if the processing host is null.

Solution: set it to null if host is null.

2) In the method updateInternal(): https://github.com/opencast/opencast/blob/9646abd232b08705dbbf868dcea5a390097a56c1/modules/serviceregistry/src/main/java/org/opencastproject/serviceregistry/impl/ServiceRegistryJpaImpl.java#L1006-L1031

The line at https://github.com/opencast/opencast/blob/9646abd232b08705dbbf868dcea5a390097a56c1/modules/serviceregistry/src/main/java/org/opencastproject/serviceregistry/impl/ServiceRegistryJpaImpl.java#L1013:

originalJob = JpaJob.from(fromDb.toJob());

Makes the 'job' state be reset to QUEUED (the value from the database), probably because 'job' and 'fromDb' are actually the same instance in the EM cache? Note that, when calling the same method via public 'updateJob', the problem does not happen because the JpaJob 'job' is created from scratch and not read from the database (not associated yet with the EM?)

This behavior can be verified by running the unit tests and logging the state right before and after that line.

The solution is to remove the line (it's only for debugging purposes).

Activity

Show:
Greg Logan
July 30, 2019, 5:28 PM

I’d be curious to see some unit tests around the JpaJob.from static. That method should be copying the status correctly (https://github.com/opencast/opencast/blob/9646abd232b08705dbbf868dcea5a390097a56c1/modules/common-jpa-impl/src/main/java/org/opencastproject/job/jpa/JpaJob.java#L269) so if it’s not then there are likely other issues.

Rute Santos
July 30, 2019, 5:41 PM
Edited

Hi @gregorydlogan, the method JpaJob.fromDb IS copying the status correctly. The problem is in updateInternal(). Accessing all fields of the variable ‘fromDb’ seems to be reloading the same values in the variable ‘job’ (both objects have the same id and are managed by the same entity manager) and thus resetting the status to QUEUED.

Greg Logan
July 31, 2019, 10:29 PM

Yikes. That’s a subtle one that would be very very difficult to catch. To be clear, I’m find with removing 1013, but it worries me that JPA is doing this. IMO, it shouldn’t be, because JpaJob.from() is supposed to be creating a new object which should be outside of the EM…

Rute Santos
August 1, 2019, 3:56 PM

Hi @gregorydlogan,
Thanks for pointing that out. The piece that resets the status is actually 'fromDb.toJob();'. The 'fromDb' variable is gotten from the EM cache with em.find. I just confirmed this by changing the code to:

fromDb = em.find(JpaJob.class, job.getId());
System.out.println(String.format("BEFORE toJob, Job %s status is %s", job.getId(), job.getStatus()));
Job j = fromDb.toJob();
System.out.println(String.format("AFTER toJob, Job %s status is %s", job.getId(), job.getStatus()));
originalJob = JpaJob.from(j);
System.out.println(String.format("AFTER from, Job %s status is %s", job.getId(), job.getStatus()));

And got:
BEFORE toJob, Job 6 status is DISPATCHING
AFTER toJob, Job 6 status is QUEUED
AFTER from, Job 6 status is QUEUED

It is indeed strange...

Assignee

Rute Santos

Reporter

Rute Santos

Severity

Incorrectly Functioning With Workaround

Tags (folksonomy)

None

Components

Affects versions

Priority

Minor
Configure