Hundreds of billions of photographs are created on the web each year. An important step towards understanding the content of these photographs is to be able to understand all objects that are depicted. My research focuses on the problem of automatically naming and localizing objects in large collections of images. This is referred to as the task of object detection.
The work in this thesis scales up object detection algorithms in both the number of images and the number of objects that can be recognized. I developed efficient object detection algorithms that can be applied to large image collections, and studied the use of shareable generic object attribute descriptions that can effectively describe a variety of object classes without learning individual class appearance models. The key roadblock to scaling up object detection is that extensive manual annotation is required for training the models, which can be very time-consuming and expensive.
To address this roadblock, my colleagues and I created the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). ILSVRC serves as a benchmark for large-scale object recognition for hundreds of international research teams. I led the effort to construct the object detection benchmark, scaling up by more than an order of magnitude compared to previous datasets (e.g., PASCAL VOC). The construction of this dataset required developing novel crowd engineering techniques for reducing annotation cost. The availability of this large-scale data led to a revolution in object detection algorithms. I performed a detailed analysis of the current state of the field of object recognition, providing insights for future research efforts.
Thinking ahead about scaling up object detection even further, I developed a principled human-in-the-loop framework that brings together state-of-the-art automatic large-scale object detection and state-of-the-art crowd engineering techniques for accurately and efficiently localizing objects in images.