Opencast: MH-12290

DublinCore Catalog XML parsing is slow

    Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed and reviewed
    • Affects Version/s: 3.3
    • Fix Version/s: None
    • Component/s: Backend Software

      Description

      This appears to be a regression from 2.x.

      The XML parse operation in DublinCoreXmlFormat.readImpl(InputSource in) appears to take about 4 ms per call, of which about 2 ms is spent in:

        final SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false);
        factory.setNamespaceAware(true);

      and another 2 ms in:

        factory.newSAXParser().parse(in, this);
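      To check the split between the two halves locally, the per-call pattern can be timed in isolation. The following is a rough sketch, not Opencast code: the class SplitTiming and its averageNanos method are hypothetical, the handler is a no-op DefaultHandler rather than DublinCoreXmlFormat, and System.nanoTime() wall-clock numbers without JIT warm-up control are only indicative.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical harness (not Opencast code) mirroring the per-call pattern and
// timing its two halves separately. Rough numbers only: no warm-up control.
class SplitTiming {

  // Returns {average factory-setup nanos, average parser-creation-plus-parse nanos}.
  static long[] averageNanos(String xml, int iterations) {
    if (iterations <= 0) {
      throw new IllegalArgumentException("iterations must be positive");
    }
    long factoryNanos = 0;
    long parseNanos = 0;
    try {
      for (int i = 0; i < iterations; i++) {
        long t0 = System.nanoTime();
        // First half: a fresh factory (including provider lookup) on every call.
        final SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false);
        factory.setNamespaceAware(true);
        long t1 = System.nanoTime();
        // Second half: a fresh parser, then the parse itself (no-op handler here).
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler());
        long t2 = System.nanoTime();
        factoryNanos += t1 - t0;
        parseNanos += t2 - t1;
      }
    } catch (Exception e) {
      throw new IllegalStateException("parse failed", e);
    }
    return new long[] {factoryNanos / iterations, parseNanos / iterations};
  }
}
```

      The absolute values will depend on the JVM, the classpath and which JAXP provider newInstance() resolves to; the useful signal is the ratio between the two halves.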

      This is problematic when many XML catalogs are parsed within a single request, as in MH-12157 (resolved; parsing XML catalogs for a large number of series).

      It seems this was much faster in 2.x, although it's not clear why.

      Possible strategies to improve this include reusing the SAXParserFactory rather than creating a new one for each call. Some discussion is here:

      https://www.ibm.com/developerworks/library/x-perfap2/index.html
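      One way to reuse the factory, sketched under stated assumptions: the factory is created once, and each thread keeps its own configured SAXParser, because the JAXP javadoc states that SAXParserFactory implementations are not guaranteed to be thread-safe (and SAXParser is not either). The class ReusedParse, its TitleHandler and parseTitle are hypothetical and only extract dcterms:title; they do not reproduce DublinCoreXmlFormat. SAXParser.reset() is assumed to be supported by the provider in use (the JDK's built-in parser supports it; the base class throws UnsupportedOperationException if an implementation does not override it).

```java
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: look up the factory once and keep one configured SAXParser per thread.
class ReusedParse {

  private static final SAXParserFactory FACTORY = createFactory();

  private static SAXParserFactory createFactory() {
    SAXParserFactory factory = SAXParserFactory.newInstance(); // provider lookup happens once
    factory.setValidating(false);
    factory.setNamespaceAware(true);
    return factory;
  }

  // One parser per thread; parser creation is serialized on the shared factory.
  private static final ThreadLocal<SAXParser> PARSER = ThreadLocal.withInitial(() -> {
    synchronized (FACTORY) {
      try {
        return FACTORY.newSAXParser();
      } catch (ParserConfigurationException | SAXException e) {
        throw new IllegalStateException(e);
      }
    }
  });

  // Minimal handler that captures the text of the first dcterms:title element.
  static final class TitleHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private boolean inTitle = false;
    String title;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
      if (title == null && "http://purl.org/dc/terms/".equals(uri) && "title".equals(localName)) {
        inTitle = true;
        text.setLength(0);
      }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
      if (inTitle) {
        text.append(ch, start, length);
      }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
      if (inTitle && "title".equals(localName)) {
        title = text.toString();
        inTitle = false;
      }
    }
  }

  static String parseTitle(String xml) {
    SAXParser parser = PARSER.get();
    try {
      TitleHandler handler = new TitleHandler();
      parser.parse(new InputSource(new StringReader(xml)), handler);
      return handler.title;
    } catch (Exception e) {
      throw new IllegalStateException(e);
    } finally {
      parser.reset(); // restore the parser to its configured state before the next call
    }
  }
}
```

      A ThreadLocal avoids locking on every parse at the cost of one parser per thread; a simpler alternative is a single static factory with a fresh newSAXParser() per call, which removes only the lookup cost.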

      The best approach is to avoid storing commonly used attributes only inside XML blobs.
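      As an illustration of that last point, a minimal sketch assuming a simple in-memory store: the commonly used attribute (here the title) is extracted once when a series is saved and kept next to the raw XML, so listing many series reads only that field. SeriesStore, SeriesRow and extractTitle are hypothetical, and extractTitle is a naive string stand-in for a real Dublin Core parse, used only to count how often parsing happens.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical store: title denormalized next to the catalog XML at write time.
class SeriesStore {

  // Hypothetical row layout: a plain title field plus the full XML blob.
  record SeriesRow(String id, String title, String catalogXml) {}

  static int parseCount = 0; // counts catalog "parses" to show the read path needs none

  private static final String OPEN = "<dcterms:title>";
  private static final String CLOSE = "</dcterms:title>";

  private final List<SeriesRow> rows = new ArrayList<>();

  // Naive stand-in for a real Dublin Core parse; returns null if no title is present.
  static String extractTitle(String xml) {
    parseCount++;
    int open = xml.indexOf(OPEN);
    if (open < 0) {
      return null;
    }
    int start = open + OPEN.length();
    int end = xml.indexOf(CLOSE, start);
    return end < 0 ? null : xml.substring(start, end);
  }

  void save(String id, String catalogXml) {
    // One parse at write time; the title goes into its own field.
    rows.add(new SeriesRow(id, extractTitle(catalogXml), catalogXml));
  }

  List<String> listTitles() {
    // Read path touches only the stored title, never the XML blob.
    List<String> titles = new ArrayList<>();
    for (SeriesRow row : rows) {
      titles.add(row.title());
    }
    return titles;
  }
}
```

      In a database-backed service the same idea would mean a dedicated column (or index entry) for each frequently listed attribute, kept in sync whenever the catalog is updated.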


    People

    • Assignee: Karen Dolan (karen_dolan)
    • Reporter: Stephen Marquard (smarquard)