ServiceRegistry not updating some fields in the database correctly when dispatching jobs
Steps to reproduce
I've seen 2 cases where the database is not correctly updated:
1) When the job cannot be dispatched and is put into the dispatchPriorityList: the job's processor service is never set to null in the database.
2) When the job changes status from QUEUED to DISPATCHING: the value DISPATCHING never makes it to the database.
Note that these are not critical because #1 will "fix itself" in one of the next runs of the dispatcher and #2 will "fix itself" by the worker when it accepts the job, but it has caused DCE some trouble because of our horizontal scaling mechanism that needs to have accurate info in the database
Relevant pieces of code:
1) In the method update(): https://github.com/opencast/opencast/blob/9646abd232b08705dbbf868dcea5a390097a56c1/modules/serviceregistry/src/main/java/org/opencastproject/serviceregistry/impl/ServiceRegistryJpaImpl.java#L1128-L1132
The processor service is never set to null in the database if the processing host is null.
Solution: set it to null if host is null.
2) In the method updateInternal(): https://github.com/opencast/opencast/blob/9646abd232b08705dbbf868dcea5a390097a56c1/modules/serviceregistry/src/main/java/org/opencastproject/serviceregistry/impl/ServiceRegistryJpaImpl.java#L1006-L1031
The line at https://github.com/opencast/opencast/blob/9646abd232b08705dbbf868dcea5a390097a56c1/modules/serviceregistry/src/main/java/org/opencastproject/serviceregistry/impl/ServiceRegistryJpaImpl.java#L1013:
originalJob = JpaJob.from(fromDb.toJob());
Makes the 'job' state be reset to QUEUED (the value from the database), probably because 'job' and 'fromDb' are actually the same instance in the EM cache? Note that, when calling the same method via public 'updateJob', the problem does not happen because the JpaJob 'job' is created from scratch and not read from the database (not associated yet with the EM?)
This behavior can be verified by running the unit tests and logging the state right before and after that line.
The solution is to remove the line (it's only for debugging purposes).
Thanks for pointing that out. The piece that resets the status is actually 'fromDb.toJob();'. The 'fromDb' variable is gotten from the EM cache with em.find. I just confirmed this by changing the code to:
fromDb = em.find(JpaJob.class, job.getId());
System.out.println(String.format("BEFORE toJob, Job %s status is %s", job.getId(), job.getStatus()));
Job j = fromDb.toJob();
System.out.println(String.format("AFTER toJob, Job %s status is %s", job.getId(), job.getStatus()));
originalJob = JpaJob.from(j);
System.out.println(String.format("AFTER from, Job %s status is %s", job.getId(), job.getStatus()));
BEFORE toJob, Job 6 status is DISPATCHING
AFTER toJob, Job 6 status is QUEUED
AFTER from, Job 6 status is QUEUED
It is indeed strange...
Yikes. That’s a subtle one that would be very very difficult to catch. To be clear, I’m find with removing 1013, but it worries me that JPA is doing this. IMO, it shouldn’t be, because JpaJob.from() is supposed to be creating a new object which should be outside of the EM…
Hi @gregorydlogan, the method JpaJob.fromDb IS copying the status correctly. The problem is in updateInternal(). Accessing all fields of the variable ‘fromDb’ seems to be reloading the same values in the variable ‘job’ (both objects have the same id and are managed by the same entity manager) and thus resetting the status to QUEUED.
I’d be curious to see some unit tests around the JpaJob.from static. That method should be copying the status correctly (https://github.com/opencast/opencast/blob/9646abd232b08705dbbf868dcea5a390097a56c1/modules/common-jpa-impl/src/main/java/org/opencastproject/job/jpa/JpaJob.java#L269) so if it’s not then there are likely other issues.