Real-time person detection in low-resolution thermal infrared imagery with MSER and CNNs

Conference paper


Christian Herrmann
Thomas Müller
Dieter Willersinn
Jürgen Beyerer


Proc. SPIE 9987, Electro-Optical and Infrared Systems: Technology and Applications, 99870I, 2016.


SPIE Security + Defence: Electro-Optical and Infrared Systems: Technology and Applications, Edinburgh, United Kingdom, September 26 - 29, 2016

In many camera-based systems, person detection and localization is an important step for safety and security applications such as search and rescue, reconnaissance, surveillance, or driver assistance. Long-wave infrared (LWIR) imagery promises to simplify this task because it is less affected by background clutter or illumination changes. In contrast to much related work, we make no assumptions about the movement of persons or the camera: persons may stand still, the camera may move, or any combination thereof. Furthermore, persons may appear at arbitrary near or far distances from the camera, which leads to low-resolution persons at far distances. To address this task, we propose a two-stage system consisting of a proposal generation method and a classifier that verifies whether the detected proposals really are persons. Instead of using all possible proposals, as sliding window approaches do, we apply Maximally Stable Extremal Regions (MSER) and afterwards classify the detected proposals with a Convolutional Neural Network (CNN). The MSER algorithm acts as a hot spot detector when applied to LWIR imagery: because the body temperature of persons is usually higher than that of the background, persons appear as hot spots in the image. However, the MSER algorithm is unable to distinguish between different kinds of hot spots, so all other LWIR sources such as windows, animals, or vehicles are detected as well. Still, applying MSER reduces the number of proposals significantly compared to a sliding window approach, which makes it feasible to employ the high discriminative power of deep neural network classifiers that was recently demonstrated in several applications such as face recognition or image content classification. We suggest using a CNN as the classifier for the detected hot spots and training it to discriminate between person hot spots and all other hot spots.
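The two-stage idea (hot-spot proposals first, then per-proposal verification) can be illustrated with a small, self-contained sketch. The proposal stage below is a naive, simplified variant of MSER restricted to bright regions: every threshold level is recomputed from scratch, and a region is accepted whenever its relative area change over ±`delta` threshold levels stays below a bound, instead of the component-tree bookkeeping and local-minimum selection of full MSER. All function names, parameters, and defaults (`delta`, `max_variation`, `min_area`, `max_area_ratio`) are hypothetical. The second stage is only a placeholder: `looks_like_person` is a made-up aspect-ratio rule standing in for the CNN, which in the described system would classify the image patch under each proposal. A practical system would rather rely on an optimized MSER implementation such as the one in OpenCV.

```python
from collections import deque


def components(img, t):
    """4-connected components of pixels with value >= t, each as a frozenset of (row, col)."""
    h, w = len(img), len(img[0])
    seen, comps = set(), []
    for r in range(h):
        for c in range(w):
            if img[r][c] < t or (r, c) in seen:
                continue
            seen.add((r, c))
            queue, pixels = deque([(r, c)]), []
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and (ny, nx) not in seen and img[ny][nx] >= t):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            comps.append(frozenset(pixels))
    return comps


def hot_spot_proposals(img, delta=10, max_variation=0.25, min_area=4, max_area_ratio=0.5):
    """Simplified MSER-style hot-spot detector for 8-bit images (bright regions only).

    A region Q(t) at threshold t is kept if
    (|Q(t - delta)| - |Q(t + delta)|) / |Q(t)| <= max_variation.
    """
    max_area = max_area_ratio * len(img) * len(img[0])
    levels = {t: components(img, t) for t in range(256)}
    stable = set()
    for t in range(delta, 256 - delta):
        for region in levels[t]:
            if not min_area <= len(region) <= max_area:
                continue
            seed = next(iter(region))
            # Lower threshold: exactly one enclosing (equal or larger) region.
            parent = next(q for q in levels[t - delta] if seed in q)
            # Higher threshold: area of the largest nested region, 0 if none survives.
            child = max((len(q) for q in levels[t + delta] if next(iter(q)) in region),
                        default=0)
            if (len(parent) - child) / len(region) <= max_variation:
                stable.add(region)
    # A pixel set found at many thresholds is stored once; report bounding boxes.
    boxes = set()
    for region in stable:
        rows = [p[0] for p in region]
        cols = [p[1] for p in region]
        boxes.add((min(rows), min(cols), max(rows), max(cols)))
    return sorted(boxes)


def looks_like_person(box):
    """Hypothetical placeholder for the CNN stage: accept upright (taller than wide) boxes."""
    top, left, bottom, right = box
    return (bottom - top + 1) > (right - left + 1)


def detect_persons(img, classifier=looks_like_person):
    """Two-stage chain: hot-spot proposals, then verification of each proposal."""
    return [box for box in hot_spot_proposals(img) if classifier(box)]


# Synthetic 12x12 "thermal" frame: cool background, one upright and one square warm blob.
frame = [[20] * 12 for _ in range(12)]
for r in range(2, 8):
    for c in range(2, 5):
        frame[r][c] = 200  # upright hot spot (person-like)
for r in range(8, 11):
    for c in range(8, 11):
        frame[r][c] = 150  # square hot spot (e.g. a window)

print(hot_spot_proposals(frame))  # [(2, 2, 7, 4), (8, 8, 10, 10)]
print(detect_persons(frame))      # [(2, 2, 7, 4)]
```

Both blobs are returned by the proposal stage, mirroring the point that MSER alone cannot tell a person from a window; only the verification stage separates them.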
We specifically design a CNN that is suited to the low-resolution person hot spots common in LWIR imagery applications and is capable of fast classification. Evaluation on several different LWIR person detection datasets shows an error rate reduction of up to 80 percent compared to previous approaches consisting of MSER, local image descriptors, and a standard classifier such as an SVM or boosted decision trees. Further, timing measurements show that the proposed processing chain is capable of real-time person detection in LWIR camera streams.
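The argument that reducing the number of proposals makes a CNN classifier affordable in real time can be checked with back-of-the-envelope arithmetic. The sketch below counts the windows of a multi-scale sliding window search and divides a per-frame time budget by the number of candidates. All numbers (a 640×480 frame, a 16×32 base window, stride 4, scales 1, 2, and 4, 30 frames per second, and 50 hot-spot proposals per frame) are assumptions chosen for illustration, not values from the paper.

```python
def sliding_window_count(img_w, img_h, win_w, win_h, stride, scales):
    """Number of windows a multi-scale sliding window search would classify."""
    total = 0
    for s in scales:
        w, h = int(win_w * s), int(win_h * s)
        if w > img_w or h > img_h:
            continue  # window larger than the image at this scale
        total += ((img_w - w) // stride + 1) * ((img_h - h) // stride + 1)
    return total


def per_candidate_budget_ms(fps, n_candidates):
    """Time per candidate if the whole frame budget were spent on classification."""
    return (1000.0 / fps) / n_candidates


# Hypothetical setting: 640x480 frame, 16x32 base window, stride 4, three scales, 30 fps.
n_windows = sliding_window_count(640, 480, 16, 32, stride=4, scales=(1, 2, 4))
n_proposals = 50  # assumed number of MSER hot spots in one frame

print(n_windows)                                              # 46711
print(f"{per_candidate_budget_ms(30, n_windows):.4f} ms")    # 0.0007 ms
print(f"{per_candidate_budget_ms(30, n_proposals):.4f} ms")  # 0.6667 ms
```

Under these assumptions, tens of thousands of windows leave well under a microsecond per window, whereas a few dozen proposals leave a fraction of a millisecond each, a budget in which a small CNN on low-resolution patches becomes plausible. The estimate ignores the cost of running MSER itself and any batching on a GPU.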