| Article |
| Type of Publication |
| Localizing and Segmenting Text in Images and Videos |
| Title |
|
Axel Wernicke
|
| Authors |
| IEEE Transactions on Circuits and Systems for Video Technology 12 (4), pp. 256--268, April 2002 |
| Published in |
| Many images, especially those used for page
design on web pages, as well as videos contain visible text. If
these text occurrences could be detected, segmented, and
recognized automatically, they would be a valuable source of
high-level semantics for indexing and retrieval. In this paper,
we propose a novel method for localizing and segmenting text in
complex images and videos. Text lines are identified by using a
complex-valued multi-layer feed-forward network trained to detect
text at a fixed scale and position. The network's output at all
scales and positions is integrated into a single text-saliency
map, serving as a starting point for candidate text lines. In the
case of video, these candidate text lines are refined by
exploiting the temporal redundancy of text in video. Localized
text lines are then scaled to a fixed height of 100 pixels and
segmented into a binary image with black characters on white
background. For videos, temporal redundancy is exploited to
improve segmentation performance. Input images and videos can be
of any size due to a true multiresolution approach. Moreover, the
system is not only able to locate and segment text occurrences
into large binary images, but is also able to track each text
line with sub-pixel accuracy over the entire occurrence in a
video, so that one text bitmap is created for all instances of
that text line. Therefore, our text segmentation results can also
be used for object-based video encoding such as that enabled by
MPEG-4. |
| Abstract |
|
MPEG-4 object encoding
object detection
object segmentation
text detection
text segmentation
video indexing
video OCR
video processing
|
| Keywords |