MoCA Project: Audio Analysis Activities
Within the MoCA project, our goal is to extract
video content from both the picture and the audio tracks. One application of such
content extraction is automatic indexing, which supports users
(both professionals such as librarians and home users) in search tasks.
The audio track bears important information about a video's content,
not only within speech segments but also within the background music
and within various noises. For example, we have tried to detect
violence automatically by analyzing the audio stream.
The following paragraphs present the different audio content
analysis activities within the MoCA project.
Audio Track Segmentation and Classification
As mentioned above, different types of
information are found within the speech, music, silence and noise segments
of the audio track. Before such information can be extracted, the
audio track must therefore be segmented. We have
implemented algorithms that perform this segmentation based on the
similarity of consecutive sounds. On top of the segmentation, a
classification step is needed to distinguish music,
speech, noises and silence.
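The idea of segmenting by the similarity of consecutive sounds can be illustrated with a minimal Python sketch. It is not the MoCA implementation. It assumes two common frame features (RMS energy and zero-crossing rate) and a hypothetical distance threshold, and it labels only silence versus non-silence. Distinguishing music, speech and noises would need richer features. All function names are illustrative.

```python
import math

def frame_features(samples, frame_len):
    """Return (rms_energy, zero_crossing_rate) for each non-overlapping frame."""
    feats = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        # Count sign changes between neighbouring samples.
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
        feats.append((rms, crossings / (frame_len - 1)))
    return feats

def segment(feats, threshold):
    """Start a new segment wherever consecutive frames differ by more than threshold.

    The threshold is an assumed tuning parameter, not a value from the project.
    """
    boundaries = [0]
    for i in range(1, len(feats)):
        if math.dist(feats[i - 1], feats[i]) > threshold:
            boundaries.append(i)
    return boundaries

def label_segments(feats, boundaries, rms_floor=0.01):
    """Label each segment 'silence' or 'sound' by its mean RMS energy."""
    labels = []
    ends = boundaries[1:] + [len(feats)]
    for start, end in zip(boundaries, ends):
        mean_rms = sum(f[0] for f in feats[start:end]) / (end - start)
        labels.append("silence" if mean_rms < rms_floor else "sound")
    return labels

# Usage: four silent frames followed by four frames of a steady tone.
signal = [0.0] * 400 + [0.5 * math.sin(0.2 * math.pi * n + 0.3) for n in range(400)]
feats = frame_features(signal, frame_len=100)
boundaries = segment(feats, threshold=0.1)
print(boundaries, label_segments(feats, boundaries))
```

Keeping segmentation and labelling in separate functions mirrors the two steps described above: boundaries are found first, and each resulting segment is classified afterwards.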
Music Analysis
Within our project, the content of music is described along two
dimensions: rhythm characteristics and tone characteristics.
Rhythm characteristics give a temporal description of musical events.
As a first step, we extracted the beat of percussion-based music
by analyzing both time-domain parameters (i.e. the temporal distribution of amplitude
values) and frequency-domain parameters (i.e. the frequency patterns of typical
percussion instruments).
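As a rough illustration of the time-domain part of beat extraction, the following Python sketch estimates a beat period from the temporal distribution of amplitude values. It is an assumption-laden example, not the project's algorithm: it uses a block-wise amplitude envelope and picks the lag with the highest autocorrelation, and the function names, hop size and lag range are all hypothetical. The frequency-domain analysis of percussion patterns is not covered.

```python
def envelope(samples, hop):
    """Mean absolute amplitude of each hop-sized block (a coarse amplitude envelope)."""
    return [sum(abs(x) for x in samples[i:i + hop]) / hop
            for i in range(0, len(samples) - hop + 1, hop)]

def beat_period(env, min_lag, max_lag):
    """Return the lag (in envelope blocks) that maximizes the autocorrelation
    of the mean-removed envelope within [min_lag, max_lag]."""
    mean = sum(env) / len(env)
    centered = [e - mean for e in env]
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(a * b for a, b in zip(centered, centered[lag:]))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Usage: a synthetic click track, assumed to be sampled at 1000 Hz,
# with a 20-sample click every 500 samples (one beat every 0.5 s).
sample_rate = 1000
clicks = [1.0 if n % 500 < 20 else 0.0 for n in range(8000)]
hop = 50
lag = beat_period(envelope(clicks, hop), min_lag=5, max_lag=15)
seconds_per_beat = lag * hop / sample_rate
print(lag, seconds_per_beat, 60 / seconds_per_beat)
```

The lag range bounds the tempi the sketch can report. Real music would need a wider range and some protection against picking multiples of the true beat period.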
Tone characteristics give a direct description of the content of
musical events. Our first efforts have been to extract the
fundamental frequency of musical events and to identify
single notes in single-voiced (monophonic) music. We further intend to extract
melodies and harmonic characteristics.
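Fundamental frequency extraction and note identification for single-voiced music can be sketched as follows. This is a generic autocorrelation-based estimator, chosen here for illustration only; the source does not say which method the project used. The search range, the 0.9 peak factor, equal temperament with A4 = 440 Hz, and all names are assumptions. The sketch expects a non-silent frame containing a single pitch.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def fundamental_frequency(samples, sample_rate, f_min=50.0, f_max=1000.0):
    """Estimate the fundamental frequency from the first strong autocorrelation peak."""
    min_lag = max(1, int(sample_rate / f_max))
    max_lag = min(int(sample_rate / f_min), len(samples) - 1)
    scores = {}
    for lag in range(min_lag, max_lag + 1):
        n = len(samples) - lag
        scores[lag] = sum(samples[i] * samples[i + lag] for i in range(n)) / n
    strongest = max(scores.values())
    # Prefer the shortest lag that is a local peak close to the global maximum;
    # this avoids choosing a multiple of the true period (an octave error).
    for lag in range(min_lag + 1, max_lag):
        if (scores[lag] >= 0.9 * strongest
                and scores[lag] >= scores[lag - 1]
                and scores[lag] >= scores[lag + 1]):
            return sample_rate / lag
    return sample_rate / max(scores, key=scores.get)

def note_name(freq, a4=440.0):
    """Map a frequency to the nearest equal-tempered note name, e.g. 'A4'."""
    midi = round(69 + 12 * math.log2(freq / a4))
    return NOTE_NAMES[midi % 12] + str(midi // 12 - 1)

# Usage: a 440 Hz sine tone sampled at 8000 Hz.
tone = [math.sin(2 * math.pi * 440 * i / 8000) for i in range(2000)]
f0 = fundamental_frequency(tone, 8000)
print(round(f0, 1), note_name(f0))
```

Because lags are whole samples, the estimate is quantized (8000 / 18 rather than exactly 440 Hz in the example); rounding to the nearest note absorbs this small error.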
Speech Analysis
The content of speech is twofold: it can be used to
determine who is speaking (speaker recognition) and to determine what is
being said (speech recognition). Both dimensions have been major
research areas within Artificial Intelligence projects, and software for
both is available. Therefore, our research efforts within MoCA Audio
Content Analysis have left out speech analysis. It is, however,
interesting to consider the situation of speech analysis tools within
video content analysis: the vocabulary used is basically unlimited,
many different speakers are involved, training is not possible,
speech is continuous, and many disturbing noises occur. Therefore,
common speech recognition packages are difficult to use for this task.
As a first approach, complete recognition of speech is not necessary: a
simpler word spotting algorithm is sufficient.
Analysis of Specific Noises: Violence Detection
As an example of the analysis of noises, we implemented
an application to detect violence in movie sequences. As violence
itself has many aspects and depends strongly on the cultural
environment, a computer system cannot recognize violence in all its
forms. We therefore concentrated, successfully, on the recognition of a
few indicators of violence: shots, explosions and cries.
Publications
For more information, see our technical report on Audio
Content Analysis.
This paper describes the theoretical framework and
applications of automatic audio content analysis. After explaining the
basic properties of audio analysis, we present a toolbox that forms the
basis for the development of audio analysis algorithms. We also
describe new applications that can be developed using the toolbox,
among them music indexing and retrieval as well as violence detection
in the sound track of videos.
Pfeiffer, Silvia; Fischer, Stephan; Effelsberg, Wolfgang.
Automatic Audio Content Analysis. April, 1996.