Today, many video-on-demand databases containing thousands of films
have been built. They hold a great deal of raw data but very little
information about that data. The goal of the MoCA project (Movie
Content Analysis) is to extract the information hidden in the video and
audio tracks by automatic content analysis, a process that includes
video, audio and text processing, so that users can select exactly the
movies they want. A first step toward that ambitious goal is MoCA
genre recognition.
Our approach to the genre recognition task is to
obtain a variety of statistical data from the films. In step one we
gather raw statistics. In the video domain these include frame-to-frame
pixel differences, image homogeneity, hue, saturation and motion
vectors; in the audio domain, loudness and frequency statistics are
obtained. In step two we derive higher-level information from the
statistics of step one. In the video domain we calculate camera motion,
cuts, fades and dissolves; in the audio domain we calculate the amount
of silence and human voice per scene and the overall loudness. In a
final step we compare the data obtained in steps one and two to genre
profiles using a chi-square test. A film is assigned to a genre if its
data closely matches one of these profiles.
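Step one (gathering raw statistics) can be illustrated with a minimal sketch. It assumes a hypothetical frame representation (a flat list of 8-bit RGB pixels) and audio given as a list of float samples. The function names `frame_differences`, `hue_saturation` and `rms_loudness`, the BT.601 luminance weights, the hue bin count and the RMS window length are illustrative choices, not the MoCA implementation. Image homogeneity, motion vectors and frequency statistics are not covered.

```python
import colorsys
import math
from typing import List, Sequence, Tuple

# Hypothetical representation: a frame is a flat list of (R, G, B) pixels
# with components in 0..255; a real system would decode frames from a codec.
Pixel = Tuple[int, int, int]
Frame = List[Pixel]


def luma(p: Pixel) -> float:
    """Luminance of one pixel using ITU-R BT.601 weights."""
    r, g, b = p
    return 0.299 * r + 0.587 * g + 0.114 * b


def frame_differences(frames: Sequence[Frame]) -> List[float]:
    """Mean absolute luminance difference for each pair of consecutive frames."""
    return [
        sum(abs(luma(a) - luma(b)) for a, b in zip(prev, curr)) / len(curr)
        for prev, curr in zip(frames, frames[1:])
    ]


def hue_saturation(frame: Frame, bins: int = 6) -> Tuple[List[float], float]:
    """Normalized hue histogram and mean saturation of one frame.

    colorsys reports hue 0.0 for gray pixels, so achromatic pixels land in
    bin 0 here; a real system might exclude them instead.
    """
    hist = [0.0] * bins
    sat_total = 0.0
    for r, g, b in frame:
        h, s, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        hist[min(int(h * bins), bins - 1)] += 1
        sat_total += s
    n = len(frame)
    return [c / n for c in hist], sat_total / n


def rms_loudness(samples: Sequence[float], window: int) -> List[float]:
    """Root-mean-square level of each complete, non-overlapping audio window."""
    return [
        math.sqrt(sum(x * x for x in samples[i:i + window]) / window)
        for i in range(0, len(samples) - window + 1, window)
    ]


if __name__ == "__main__":
    black = [(0, 0, 0)] * 4
    white = [(255, 255, 255)] * 4
    print(frame_differences([black, black, white]))
    print(hue_saturation([(255, 0, 0), (255, 0, 0)]))
    print(rms_loudness([1.0, -1.0, 3.0, -3.0], window=2))  # [1.0, 3.0]
```

Each function yields one time series per film (per frame pair, per frame, or per audio window), which is the kind of raw data the derived statistics of step two would consume.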
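Two of the step-two derivations, hard cuts and the amount of silence, can be sketched from step-one time series by simple thresholding. This is an assumption-laden illustration: the global thresholds, the frame rate and the helper names `detect_cuts`, `cut_rate` and `silence_ratio` are hypothetical. Fades, dissolves, camera motion and voice detection need more elaborate methods that are not shown. To obtain silence per scene, `silence_ratio` would be called on the loudness windows of one scene at a time.

```python
from typing import List, Sequence


def detect_cuts(diffs: Sequence[float], threshold: float) -> List[int]:
    """Indices i where a hard cut is assumed between frame i and frame i + 1,
    i.e. where the frame-to-frame difference exceeds a fixed threshold."""
    return [i for i, d in enumerate(diffs) if d > threshold]


def cut_rate(num_cuts: int, num_frames: int, fps: float) -> float:
    """Cuts per minute of video."""
    minutes = num_frames / fps / 60.0
    return num_cuts / minutes


def silence_ratio(loudness: Sequence[float], threshold: float) -> float:
    """Fraction of audio windows whose loudness falls below a silence threshold."""
    if not loudness:
        return 0.0
    return sum(1 for v in loudness if v < threshold) / len(loudness)


if __name__ == "__main__":
    diffs = [1.5, 82.0, 2.1, 0.9, 95.3]  # hypothetical step-one output
    cuts = detect_cuts(diffs, threshold=50.0)
    print(cuts)  # [1, 4]
    print(cut_rate(len(cuts), num_frames=1500, fps=25.0))  # 2.0
    print(silence_ratio([0.0, 0.01, 0.4, 0.5], threshold=0.05))  # 0.5
```

A single global threshold is the simplest possible cut detector; it misses gradual transitions and can fire on fast motion or flashes.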
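The final step can be sketched as a Pearson chi-square comparison. This assumes that a film's features have been binned into counts (for example, shots grouped as short, medium or long) and that each genre profile is stored as the expected proportion of each bin. The profile values, the meaning of the bins, the `max_statistic` cut-off and the function names `chi_square` and `classify_genre` are hypothetical. The smallest statistic marks the closest profile, and a film that matches no profile closely enough is left unclassified.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def chi_square(observed: Sequence[float], expected_probs: Sequence[float]) -> float:
    """Pearson chi-square statistic of observed bin counts against a profile
    given as expected bin proportions (assumed to sum to 1)."""
    total = sum(observed)
    stat = 0.0
    for o, p in zip(observed, expected_probs):
        e = total * p  # expected count in this bin
        if e == 0:
            if o > 0:
                return math.inf  # profile never expects this bin, film has it
            continue
        stat += (o - e) ** 2 / e
    return stat


def classify_genre(
    observed: Sequence[float],
    profiles: Dict[str, Sequence[float]],
    max_statistic: Optional[float] = None,
) -> Tuple[Optional[str], Dict[str, float]]:
    """Return the genre whose profile gives the smallest chi-square statistic,
    or None if even the best match exceeds max_statistic."""
    scores = {genre: chi_square(observed, probs) for genre, probs in profiles.items()}
    best = min(scores, key=scores.__getitem__)
    if max_statistic is not None and scores[best] > max_statistic:
        return None, scores
    return best, scores


if __name__ == "__main__":
    # Hypothetical profiles: proportions of short, medium and long shots.
    profiles = {"news": [0.1, 0.3, 0.6], "music_clip": [0.7, 0.2, 0.1]}
    film = [12, 30, 58]  # hypothetical shot counts for one film
    genre, scores = classify_genre(film, profiles, max_statistic=5.99)
    print(genre, round(scores["news"], 3))  # news 0.467
```

With three bins and fixed profiles the statistic has two degrees of freedom, and 5.99 is approximately the 5% critical value for that case; the cut-off is only an illustrative choice for "closely matches".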