Textanalysis shouldn't create tons of copies of the source video
Steps to reproduce
The textanalysis workflow operation extracts an image from the source video for each video segment. Tesseract is then called to extract text from these image files, and so on.
Opencast creates an image extraction job for each video segment. Each job calls workspace.get(URI..., unique: true), which creates a copy of the source video. If you run a textanalysis for 100 segments, you end up copying the video file 100 times.
You can pass an array of timestamps to the image extraction job. This creates only one job, which extracts all image files at once and therefore copies the source video only once.
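
To illustrate the difference, here is a minimal, self-contained sketch. It does not use the real Opencast API: FakeWorkspace, extractImages, and the file names are hypothetical stand-ins that only model the assumption that every unique workspace.get call makes one copy of the source video.

```java
import java.util.ArrayList;
import java.util.List;

public class CopyCountSketch {

    /** Hypothetical stand-in for the workspace; it only counts unique copies. */
    static class FakeWorkspace {
        int copies = 0;

        // Models workspace.get(uri, unique = true): every unique request yields a new copy.
        String get(String uri, boolean unique) {
            if (unique) {
                copies++;
                return uri + ".copy" + copies;
            }
            return uri;
        }
    }

    /** Hypothetical image extraction job: fetches the video once, then makes one image per timestamp. */
    static List<String> extractImages(FakeWorkspace ws, String videoUri, double... timestamps) {
        String localFile = ws.get(videoUri, true);
        List<String> images = new ArrayList<>();
        for (double t : timestamps) {
            images.add(localFile + "@" + t + ".png");
        }
        return images;
    }

    /** Current behaviour as described above: one job, and so one copy, per segment. */
    static int copiesWithPerSegmentJobs(double[] segmentStarts) {
        FakeWorkspace ws = new FakeWorkspace();
        for (double t : segmentStarts) {
            extractImages(ws, "video.mp4", t);
        }
        return ws.copies;
    }

    /** Proposed behaviour: a single job that receives all timestamps at once. */
    static int copiesWithBatchedJob(double[] segmentStarts) {
        FakeWorkspace ws = new FakeWorkspace();
        extractImages(ws, "video.mp4", segmentStarts);
        return ws.copies;
    }

    public static void main(String[] args) {
        double[] starts = new double[100];
        for (int i = 0; i < starts.length; i++) {
            starts[i] = i * 10.0; // arbitrary example timestamps in seconds
        }
        System.out.println(copiesWithPerSegmentJobs(starts));
        System.out.println(copiesWithBatchedJob(starts));
    }
}
```

Under these assumptions, 100 segments lead to 100 copies with per-segment jobs and to a single copy with one batched job.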