Improve performance for image extraction by last frame

Description

While working on the pull request that solves the issue "Improve composer and ffmpeg inspection service to allow image extraction by last frame", a discussion about potential performance improvements arose:

https://bitbucket.org/opencast-community/matterhorn/pull-requests/594/mh-10335-improvement-improve-composer-and/diff#comment-13212065

Lars pointed out that the implementation suffers from relatively bad performance compared to alternative implementations:

Since I complained about this, let me try to give a bit more details about what I see as very problematic with this pull request.
The Goal: Extract Last Frame
First of all, I assume that the ultimate goal of all this has not changed and that it is still to extract the last frame from a video stream and encode it as a still image. Now let's take a look at how you are doing that by assuming the video is a book, the pages are the frames, and you want to extract the last page of that book.
What do we know before we start? Roughly the size of the book. So we know that it has, let's say, about 500 pages. This we know because of the old implementation of the inspect service. Getting this information takes milliseconds and does not depend on the type of video.
Do we have any constraints? Yes, sadly, we cannot seek backwards in the video stream, so we cannot just get the last frame from the back.
Now for your method: first of all, you have a look at all pages, one by one, and count them. This, of course, takes some additional time. Time we did not need to spend before.
Then at some point you know: this book has exactly 497 pages. What do we do with this information? Right, we go to the first page and, of course, we read the whole first page. Then we decide that it is the first page and we do not want to know its contents. So we forget the contents. Now we go to the second page, we read the second page, we decide that we do not want the second page, we forget the second page. We continue like this until the end of the book.
Sounds like an incredibly slow algorithm for getting the last page, with tremendous overhead? It is! I am pretty sure you would not do it that way if you had to do it manually.
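Expressed as plain command-line calls, this two-pass approach looks roughly like the following sketch (the file name in.mp4 is a placeholder; the actual Java implementation differs in detail):

```shell
# Sketch of the criticized two-pass approach (in.mp4 is a placeholder).
INPUT=in.mp4

# Pass 1: decode every frame just to count them -- the expensive part:
FRAMES=$(ffprobe -v error -count_frames -select_streams v:0 \
  -show_entries stream=nb_read_frames -of csv=p=0 "$INPUT")

# Pass 2: decode every frame again, keeping only the very last one:
ffmpeg -y -i "$INPUT" -filter:v "select=eq(n\,$((FRAMES - 1)))" \
  -frames:v 1 last-frame.jpg
```

Both passes read the whole stream, which is why the cost roughly doubles on top of an already expensive full decode.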
Slow Inspection
One problem is certainly the encoding, for which there are way better methods (see below), but that will not hurt users since it is only used if you want this feature. A much bigger problem is that this will also hurt all other Opencast users.
Inspection is used in Opencast workflows again and again. In fact, you have to use it if you want to keep accurate metadata, since encoding processes, trimming, the video editor, ... all change the media. That means that slowing down this process has, due to its repetition, a large impact on the overall performance.
Let us just test how -count_frames impacts the performance:
% mediainfo fg.mp4
...
Duration : 1h 51mn

% /usr/bin/time -p ffprobe -show_format -show_streams -count_frames -of json fg.mp4
...
real 294.31
user 961.99
sys 10.67

% /usr/bin/time -p ffprobe -show_format -show_streams -of json fg.mp4
...
real 0.09
user 0.06
sys 0.02
You see that what was nearly instantaneous before now takes nearly five minutes on my laptop. Assuming that you execute that on two videos, once initially, once after generating the preview, once after editing the video and once after the final encoding (assuming you generate only one output format), you already use inspect eight times, meaning an additional 40 minutes. That is bad.
Faster Ways?
Given the constraints and your prior knowledge, there are certainly faster ways to achieve that goal without hurting other users. In fact, there are ways that are a lot easier to implement as well.
What I would suggest for now is to seek to nearly the end of the video stream – remember that we roughly know how long it is – and just extract the following frames, replacing old ones as long as there are more frames.
Doing that is very easy:
ffmpeg -ss ${NEAR-END} -i in.mp4 -updatefirst 1 last-frame.jpg
The NEAR-END just needs to be a position in seconds near the end of the video stream. It does not matter exactly how near. Of course, nearer means a faster process, but encoding five or ten seconds does not make a big difference.
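A sketch of how NEAR-END could be derived from the duration that the fast inspection already reports (in.mp4 and the 5-second offset are placeholder assumptions; note that newer ffmpeg versions spell the -updatefirst option -update):

```shell
# Sketch: compute NEAR-END from the fast ffprobe duration and seek there.
# in.mp4 and the 5 s offset are placeholders, not values from the pull request.
INPUT=in.mp4
OFFSET=5   # seconds before the end; the exact value is not critical

# Duration in seconds, available in milliseconds without -count_frames:
DURATION=$(ffprobe -v error -show_entries format=duration -of csv=p=0 "$INPUT")

# Clamp to zero so very short inputs never produce a negative seek position:
NEAR_END=$(awk -v d="$DURATION" -v o="$OFFSET" \
  'BEGIN { r = (d > o) ? d - o : 0; print r }')

# Keep overwriting last-frame.jpg until the stream ends:
ffmpeg -y -ss "$NEAR_END" -i "$INPUT" -updatefirst 1 last-frame.jpg
```

Only the few seconds after NEAR-END are decoded, which is why this stays fast regardless of the total video length.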
If we test that we will see something like this:
/usr/bin/time -p ffmpeg -i fg.mp4 -filter:v 'select=eq(n\,160511)' -r 1 -frames:v 1 fg.jpg
...
real 281.94
user 953.31
sys 9.30

/usr/bin/time -p ffmpeg -ss 6685 -i fg.mp4 -updatefirst 1 last-frame.jpg
...
real 0.57
user 0.74
sys 0.03
You see that it is more than slightly faster.
All you have to do for the implementation is to generate the NEAR-END (e.g. end − 5 sec) and hand that over to the image workflow operation using a customized encoding profile. Or slightly modify that operation and create an extract-last-frame workflow operation, which might make things easier for users to configure.
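For illustration, such a customized encoding profile might look roughly like this (the profile id "last-frame.http" and the exact keys and #{...} placeholders follow the usual Matterhorn encoding profile conventions, but treat every detail here as an untested assumption):

```properties
# Hypothetical profile sketch; the id and all values are assumptions.
profile.last-frame.http.name = last frame extraction
profile.last-frame.http.input = visual
profile.last-frame.http.output = image
profile.last-frame.http.suffix = -last-frame.jpg
profile.last-frame.http.ffmpeg.command = -ss #{time} -i #{in.video.path} -updatefirst 1 #{out.dir}/#{out.name}#{out.suffix}
```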
I hope that clarifies my problem with this pull request a bit.

The main issue is that even people who would not use frame-accurate frame counting would pay a performance overhead (without any benefits for them).

We agreed on adding a configuration option that can be set to enable accurate frame counting (default: disabled), and that we would file this issue describing potential ways of improving the performance of accurate frame counting.
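A minimal sketch of what such an option could look like in the inspection service's configuration file (the key name and default shown here are assumptions; the issue only records the agreement, not the final naming):

```properties
# Hypothetical key; frame-accurate counting stays off unless explicitly enabled.
accurate_frame_count = false
```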


Assignee: Unassigned
Reporter: Sven Stauber
Criticality: Low