Proceedings of the 12th Conference on Computer and Robot Vision (CRV), 2015.
12th Conference on Computer and Robot Vision (CRV), Halifax, Canada, June 3-5, 2015
Increasingly large amounts of video data raise the question of whether large-scale face retrieval is feasible. To enable fast and accurate matching, we construct a face track descriptor from local features, each extended by an encoding of the conditions under which it was measured. This encoding allows all features of a face track to be pooled into a single feature set, to which cumulative descriptors known from image and object retrieval, in particular bag of words and Fisher vectors, can be applied. These descriptors are known to be viable for large-scale retrieval applications. To explore large-scale video face retrieval, we first evaluate on the largest available public datasets, namely the YouTube Faces Database and the Face in Action Database. Finally, we investigate how face retrieval behaves as the amount of data increases by combining these datasets with 55K face tracks collected from about 100 hours of TV data, making it the largest collection of face tracks we are aware of.
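The pooling idea described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`augment_features`, `bag_of_words`, `track_descriptor`), the form of the condition encoding (here simply extra columns appended to each local feature), the hard nearest-codeword assignment, and the L1 normalisation are all assumptions chosen for illustration. The sketch covers only the bag-of-words case and assumes a codebook has already been learned on augmented features.

```python
import numpy as np

def augment_features(features, conditions):
    """Append a per-feature condition encoding to each local feature.

    features:   (n, d) array of local features from one frame
    conditions: (n, c) array encoding the measurement conditions
                (hypothetical placeholder for the paper's encoding)
    """
    return np.hstack([features, conditions])

def bag_of_words(track_features, codebook):
    """L1-normalised histogram of nearest-codeword assignments."""
    # Squared Euclidean distance between every feature and every codeword
    diffs = track_features[:, None, :] - codebook[None, :, :]
    d2 = (diffs ** 2).sum(axis=2)
    assignments = d2.argmin(axis=1)
    hist = np.bincount(assignments, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

def track_descriptor(frames, codebook):
    """Pool the augmented features of all frames in a track, then encode.

    frames: list of (features, conditions) pairs, one pair per frame
    """
    pooled = np.vstack([augment_features(f, c) for f, c in frames])
    return bag_of_words(pooled, codebook)
```

Because every track is reduced to one fixed-length vector regardless of how many frames it contains, two tracks can then be compared with a single vector distance, which is what makes this kind of descriptor attractive for large collections.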