During a rebuild of the Elasticsearch index, at least two messages are sent and received for every item in the index:
1. One status-update message, which is turned into a log entry.
2. At least one message containing the item itself, which is then indexed.
For large collections, sending a status update for every single item is unnecessary and generates a large amount of message traffic. For example, our production instance has more than 30,000 archive entries, so more than 60,000 messages are sent and received between Opencast and ActiveMQ. This happens for the Admin UI index and again for the External API index.
There are separate listeners for status updates and for the indexing operation. Once the indexing listener has processed all of its messages, it sends a "finished" update to the status listener.
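The listener split described above can be sketched in-process as follows. This is a simplified illustration, not Opencast's actual code: there is no ActiveMQ here, the listeners are plain method calls, and the class names `StatusListener`/`IndexListener` and the "finished" message text are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-process stand-ins for the two message listeners (no ActiveMQ involved). */
public class RebuildListeners {

    /** Receives status messages and turns them into log entries (here: a list of strings). */
    static class StatusListener {
        final List<String> log = new ArrayList<>();

        void onStatus(String message) {
            log.add(message);
        }
    }

    /** Receives the items to index and notifies the status listener once all of them are done. */
    static class IndexListener {
        private final StatusListener status;
        private final String service;
        private final int expected;
        private int indexed = 0;

        IndexListener(StatusListener status, String service, int expected) {
            this.status = status;
            this.service = service;
            this.expected = expected;
        }

        void onItem(String item) {
            // A real implementation would write 'item' to the search index here.
            indexed++;
            if (indexed == expected) {
                // Hypothetical "finished" message sent to the status listener.
                status.onStatus("Finished updating service: '" + service + "' (" + indexed + " items).");
            }
        }
    }

    public static void main(String[] args) {
        StatusListener status = new StatusListener();
        IndexListener indexer = new IndexListener(status, "Archive", 3);
        for (String item : List.of("a", "b", "c")) {
            indexer.onItem(item);
        }
        status.log.forEach(System.out::println);
    }
}
```

The status listener only hears from the indexing side once, at the end; the per-item status updates discussed above arrive as separate messages.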
For the large collections (archive/episode and series), the status updates can be reduced to one per percentage point of progress.
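A minimal sketch of the throttling idea, assuming progress is measured as a floored integer percentage of processed items. The class and method names are hypothetical, and the message format simply mirrors the log line shown below; this is not the actual Opencast patch.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: decides whether the item just processed should trigger
 * a status update, so that at most one update is sent per percentage point.
 */
public class ThrottledProgress {

    private final int total;
    // Start at 0 so that 0% is never reported; the first update comes at 1%.
    private int lastReportedPercent = 0;

    public ThrottledProgress(int total) {
        if (total <= 0) {
            throw new IllegalArgumentException("total must be positive");
        }
        this.total = total;
    }

    /** Returns the status message to send, or null if no update is due for this item. */
    public String onItemProcessed(String service, int processed) {
        // Floored percentage; long arithmetic avoids int overflow for very large totals.
        int percent = (int) (processed * 100L / total);
        if (percent <= lastReportedPercent) {
            return null; // still within an already reported percentage point
        }
        lastReportedPercent = percent;
        return String.format("Updating service: '%s' with %d/%d finished, %d%% complete.",
                service, processed, total, percent);
    }

    public static void main(String[] args) {
        int total = 32285;
        ThrottledProgress progress = new ThrottledProgress(total);
        List<String> sent = new ArrayList<>();
        for (int processed = 1; processed <= total; processed++) {
            String message = progress.onItemProcessed("Archive", processed);
            if (message != null) {
                sent.add(message);
            }
        }
        System.out.println("Status messages sent: " + sent.size());
        System.out.println(sent.get(sent.size() - 1));
    }
}
```

Because each item advances progress by less than one percentage point when there are more than 100 items, every point from 1% to 100% is hit exactly once, so a 32,285-item collection produces 100 status messages instead of 32,285.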
So instead of 32285 entries like this:
2017-02-12 20:51:25,091 | INFO | pool-24-thread-1 | (AbstractSearchIndex:173) - Updating service: 'Archive' with 28336/32285 finished.
we get just 100 entries like this:
2017-02-12 20:51:25,091 | INFO | pool-24-thread-1 | (AbstractSearchIndex:173) - Updating service: 'Archive' with 28336/32285 finished, 87% complete.
In addition, we gain the performance benefit of no longer sending and receiving all those unnecessary messages.