Uploaded image for project: 'Opencast'
  1. MH-12261

Race condition leads to FileAlreadyExistsException and FileNotFoundException

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed and reviewed
    • Affects Version/s: 2.3.4, 3.0, 3.1
    • Fix Version/s: 3.2
    • Component/s: Backend Software
    • Labels:
      None
    • Severity:
      Incorrectly Functioning Without Workaround
    • Steps to reproduce:
      Hide
      Under high load, workflow operations sometimes fail with either a
      FileAlreadyExistsException or a FileNotFoundException, typically for episode.xml.

      Investigation (including adding some additional detailed logging) shows this to arise from a race condition between the workflow operation and index receiver threads.

      Symptoms of the same underlying issue:

      Symptom 1 - paths don't match warning:

      2017-05-24 08:56:46,300 | WARN | pool-89-thread-1 | (WorkspaceImpl:277) - The working file repository and workspace paths don't match. Looking up http://media.uct.ac.za/files/mediapackage/f4c8df18-3fb2-40aa-84a1-c494854c1f28/catalog-0/episode.xml at /data/opencast/work/shared/files/mediapackage/f4c8df18-3fb2-40aa-84a1-c494854c1f28/catalog-0/episode.xml failed

      Symptom 2 - file not found:

      2017-05-24 08:56:46,393 | DEBUG | pool-63-thread-422 | (ServiceRegistryJpaImpl:872) - Updating Job {id:58706694, operation:START_OPERATION, status:FINISHED}
      2017-05-24 08:56:46,456 | ERROR | pool-89-thread-1 | (DublinCoreUtil:115) - Unable to load metadata from catalog 'http://media.uct.ac.za/files/mediapackage/f4c8df18-3fb2-40aa-84a1-c494854c1f28/catalog-0/episode.xml': java.io.FileNotFoundException: /data/opencast/work/shared/workspace/mediapackage/f4c8df18-3fb2-40aa-84a1-c494854c1f28/catalog-0/episode.xml (No such file or directory)
      2017-05-24 08:56:46,457 | ERROR | pool-89-thread-1 | (BaseMessageReceiverImpl$MessageWatcher:148) - Problem while getting org.opencastproject.index.service.message.WorkflowMessageReceiverImpl message events java.io.FileNotFoundException: /data/opencast/work/shared/workspace/mediapackage/f4c8df18-3fb2-40aa-84a1-c494854c1f28/catalog-0/episode.xml (No such file or directory)
              at java.io.FileInputStream.open0(Native Method)
              at java.io.FileInputStream.open(FileInputStream.java:195)
              at java.io.FileInputStream.<init>(FileInputStream.java:138)
              at org.opencastproject.metadata.dublincore.DublinCoreUtil.loadDublinCore(DublinCoreUtil.java:112)
              at org.opencastproject.metadata.dublincore.DublinCoreUtil$1.apply(DublinCoreUtil.java:98)
              at org.opencastproject.metadata.dublincore.DublinCoreUtil$1.apply(DublinCoreUtil.java:95)
              at org.opencastproject.util.data.Option$1.fmap(Option.java:208)
              at org.opencastproject.util.data.Option.map(Option.java:57)
              at org.opencastproject.metadata.dublincore.DublinCoreUtil.loadDublinCore(DublinCoreUtil.java:95)
              at org.opencastproject.metadata.dublincore.DublinCoreUtil.loadEpisodeDublinCore(DublinCoreUtil.java:76)
              at org.opencastproject.index.service.message.WorkflowMessageReceiverImpl.execute(WorkflowMessageReceiverImpl.java:83)

      Symptom 3 - file already exists:

      2017-05-30 11:56:27,212 | ERROR | pool-63-thread-159 | (SeriesWorkflowOperationHandler:236) - Unable to update episode catalog isPartOf field: java.nio.file.FileAlreadyExistsException: /data/opencast/work/shared/workspace/mediapackage/912313e3-4ef6-4d82-95c6-19bee09c5e0b/catalog-0/episode.xml
      java.nio.file.FileAlreadyExistsException: /data/opencast/work/shared/workspace/mediapackage/912313e3-4ef6-4d82-95c6-19bee09c5e0b/catalog-0/episode.xml

      The problem appears to be in WorkspaceImpl:

      public File get(final URI uri) throws NotFoundException, IOException

      which will copy (hardlink) a file from the WFR to the workspace when needed.

      This method implements insufficient locking (too late) and so is vulnerable to concurrency issues when it is called at nearly the same time for the same URI by both a workflow operation thread and an index receiver thread.
      Show
      Under high load, workflow operations sometimes fail with either a FileAlreadyExistsException or a FileNotFoundException, typically for episode.xml. Investigation (including adding some additional detailed logging) shows this to arise from a race condition between the workflow operation and index receiver threads. Symptoms of the same underlying issue: Symptom 1 - paths don't match warning: 2017-05-24 08:56:46,300 | WARN | pool-89-thread-1 | (WorkspaceImpl:277) - The working file repository and workspace paths don't match. Looking up http://media.uct.ac.za/files/mediapackage/f4c8df18-3fb2-40aa-84a1-c494854c1f28/catalog-0/episode.xml at /data/opencast/work/shared/files/mediapackage/f4c8df18-3fb2-40aa-84a1-c494854c1f28/catalog-0/episode.xml failed Symptom 2 - file not found: 2017-05-24 08:56:46,393 | DEBUG | pool-63-thread-422 | (ServiceRegistryJpaImpl:872) - Updating Job {id:58706694, operation:START_OPERATION, status:FINISHED} 2017-05-24 08:56:46,456 | ERROR | pool-89-thread-1 | (DublinCoreUtil:115) - Unable to load metadata from catalog ' http://media.uct.ac.za/files/mediapackage/f4c8df18-3fb2-40aa-84a1-c494854c1f28/catalog-0/episode.xml&#39;: java.io.FileNotFoundException: /data/opencast/work/shared/workspace/mediapackage/f4c8df18-3fb2-40aa-84a1-c494854c1f28/catalog-0/episode.xml (No such file or directory) 2017-05-24 08:56:46,457 | ERROR | pool-89-thread-1 | (BaseMessageReceiverImpl$MessageWatcher:148) - Problem while getting org.opencastproject.index.service.message.WorkflowMessageReceiverImpl message events java.io.FileNotFoundException: /data/opencast/work/shared/workspace/mediapackage/f4c8df18-3fb2-40aa-84a1-c494854c1f28/catalog-0/episode.xml (No such file or directory)         at java.io.FileInputStream.open0(Native Method)         at java.io.FileInputStream.open(FileInputStream.java:195)         at java.io.FileInputStream.<init>(FileInputStream.java:138)         at org.opencastproject.metadata.dublincore.DublinCoreUtil.loadDublinCore(DublinCoreUtil.java:112)         at org.opencastproject.metadata.dublincore.DublinCoreUtil$1.apply(DublinCoreUtil.java:98)         at org.opencastproject.metadata.dublincore.DublinCoreUtil$1.apply(DublinCoreUtil.java:95)         at org.opencastproject.util.data.Option$1.fmap(Option.java:208)         at org.opencastproject.util.data.Option.map(Option.java:57)         at org.opencastproject.metadata.dublincore.DublinCoreUtil.loadDublinCore(DublinCoreUtil.java:95)         at org.opencastproject.metadata.dublincore.DublinCoreUtil.loadEpisodeDublinCore(DublinCoreUtil.java:76)         at org.opencastproject.index.service.message.WorkflowMessageReceiverImpl.execute(WorkflowMessageReceiverImpl.java:83) Symptom 3 - file already exists: 2017-05-30 11:56:27,212 | ERROR | pool-63-thread-159 | (SeriesWorkflowOperationHandler:236) - Unable to update episode catalog isPartOf field: java.nio.file.FileAlreadyExistsException: /data/opencast/work/shared/workspace/mediapackage/912313e3-4ef6-4d82-95c6-19bee09c5e0b/catalog-0/episode.xml java.nio.file.FileAlreadyExistsException: /data/opencast/work/shared/workspace/mediapackage/912313e3-4ef6-4d82-95c6-19bee09c5e0b/catalog-0/episode.xml The problem appears to be in WorkspaceImpl: public File get(final URI uri) throws NotFoundException, IOException which will copy (hardlink) a file from the WFR to the workspace when needed. This method implements insufficient locking (too late) and so is vulnerable to concurrency issues when it is called at nearly the same time for the same URI by both a workflow operation thread and an index receiver thread.

      TestRail: Results

        Attachments

          Activity

            People

            • Assignee:
              smarquard Stephen Marquard
              Reporter:
              smarquard Stephen Marquard
            • Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                TestRail: Cases