Augment SilenceDetection with Voice Activity Detection

Description

Opencast added SilenceDetection in MH-9178. It examines the audio track to detect segments with no audio, which are then proposed as trimming points for the video editor.

In some venue A/V configurations, where boundary microphones are used as a fallback for lapel microphones, there may never be a region of silence; instead, background noise is recorded whenever the speaker is not speaking.

It is helpful to trim these segments from the video (typically at the start and end of a recording), but SilenceDetection does not detect them, because they are not actually silent.

Voice Activity Detection (https://en.wikipedia.org/wiki/Voice_activity_detection) is helpful here, because we can identify two additional cases for possible trimming:

  • Continuous background speech (audience noise)

  • White noise which contains no speech but is also not silence (e.g. an empty room with other sources of noise)

There is an open-source implementation of Voice Activity Detection in WebRTC (https://webrtc.org/), which is considered one of the best open implementations available (other options include Sphinx4 and some audio codecs with built-in VAD support).

A Python module, webrtcvad, exposes just the VAD component of WebRTC and is easy to install:

https://pypi.python.org/pypi/webrtcvad
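A minimal sketch of how the module might be driven, assuming 16-bit mono PCM input. The helper names `frames`, `speech_flags` and `SpeechDetector` are illustrative and not part of webrtcvad. The VAD object is passed in so the framing logic can be exercised without the module installed; in practice it would be `webrtcvad.Vad(mode)`, with an aggressiveness `mode` from 0 to 3. The checks mirror the module's documented input constraints (8, 16, 32 or 48 kHz; 10, 20 or 30 ms frames).

```python
from typing import Iterator, List, Protocol

SAMPLE_WIDTH = 2  # bytes per sample: webrtcvad expects 16-bit mono PCM


class SpeechDetector(Protocol):
    """Anything with an is_speech(buf, sample_rate) method, e.g. webrtcvad.Vad."""

    def is_speech(self, buf: bytes, sample_rate: int) -> bool: ...


def frames(pcm: bytes, sample_rate: int, frame_ms: int = 30) -> Iterator[bytes]:
    """Split raw PCM into fixed-length frames; a trailing partial frame is dropped.

    This is a generator, so invalid arguments raise ValueError on first iteration.
    """
    if sample_rate not in (8000, 16000, 32000, 48000):
        raise ValueError("webrtcvad supports 8, 16, 32 or 48 kHz only")
    if frame_ms not in (10, 20, 30):
        raise ValueError("webrtcvad accepts 10, 20 or 30 ms frames only")
    frame_bytes = sample_rate * frame_ms // 1000 * SAMPLE_WIDTH
    for start in range(0, len(pcm) - frame_bytes + 1, frame_bytes):
        yield pcm[start:start + frame_bytes]


def speech_flags(vad: SpeechDetector, pcm: bytes, sample_rate: int,
                 frame_ms: int = 30) -> List[bool]:
    """Return one True/False speech decision per frame of the recording."""
    return [vad.is_speech(f, sample_rate) for f in frames(pcm, sample_rate, frame_ms)]
```

For example, `speech_flags(webrtcvad.Vad(3), pcm, 16000)` would return one boolean per 30 ms frame. Extracting mono 16-bit PCM from a recording (for instance with FFmpeg) is outside this sketch.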

It is proposed to extend the SilenceDetection service to additionally use VAD (if configured to do so).

Some experimentation, and an appropriate algorithm, are required to turn the WebRTC VAD output into correctly identified segments (distinguishing continuous background speech, single-speaker speech, white noise and silence).
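One possible starting point for the segmentation step, offered only as a hypothetical heuristic to be refined by that experimentation: combine each frame's VAD decision with its loudness (RMS). Quiet frames count as silence, audible non-speech as noise, quiet speech as background speech, and loud speech as the presenter. The thresholds `SILENCE_RMS` and `SPEAKER_RMS`, the label names and all helper functions are assumptions, not values or APIs from Opencast or WebRTC.

```python
import math
import struct
from itertools import groupby
from typing import List, Sequence, Tuple

# Hypothetical thresholds on 16-bit RMS amplitude; real values need per-venue tuning.
SILENCE_RMS = 100.0
SPEAKER_RMS = 1000.0


def rms(frame: bytes) -> float:
    """Root-mean-square amplitude of a 16-bit little-endian mono PCM frame."""
    n = len(frame) // 2
    if n == 0:
        return 0.0
    samples = struct.unpack("<%dh" % n, frame[:2 * n])
    return math.sqrt(sum(s * s for s in samples) / n)


def label(frame_rms: float, is_speech: bool) -> str:
    """Map loudness plus the VAD decision onto one of four frame classes."""
    if frame_rms < SILENCE_RMS:
        return "silence"
    if not is_speech:
        return "noise"        # audible, but the VAD hears no voice
    if frame_rms < SPEAKER_RMS:
        return "background"   # voice, but too quiet to be the presenter
    return "speech"


def classify(frames: Sequence[bytes], flags: Sequence[bool]) -> List[str]:
    """Label each frame given its PCM bytes and the VAD's speech decision."""
    return [label(rms(f), v) for f, v in zip(frames, flags)]


def segments(labels: Sequence[str], frame_ms: int = 30) -> List[Tuple[str, int, int]]:
    """Merge runs of equal labels into (label, start_ms, end_ms) tuples."""
    out: List[Tuple[str, int, int]] = []
    pos = 0
    for lab, run in groupby(labels):
        n = len(list(run))
        out.append((lab, pos * frame_ms, (pos + n) * frame_ms))
        pos += n
    return out


def trim_points(segs: List[Tuple[str, int, int]]) -> Tuple[int, int]:
    """Proposed (start_ms, end_ms): first to last presenter-speech segment.

    If no presenter speech is found, the whole recording is kept.
    """
    speech = [s for s in segs if s[0] == "speech"]
    if not speech:
        return (0, segs[-1][2] if segs else 0)
    return (speech[0][1], speech[-1][2])
```

A realistic version would also need smoothing (for example, ignoring runs shorter than a second or so), because VAD decisions flicker from frame to frame, and loudness alone is a weak cue for telling the presenter apart from the audience.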

Steps to reproduce

None

Status

Assignee

Stephen Marquard

Reporter

Stephen Marquard

Criticality

None

Tags (folksonomy)

None

Components

Fix versions

Affects versions

4.1
2.2.1

Priority

Major