
Details

Author(s) / Contributors
Title
BEVFormer: Learning Bird's-Eye-View Representation from Multi-camera Images via Spatiotemporal Transformers
Is part of
  • Computer Vision - ECCV 2022, 2022, Vol. 13669, pp. 1-18
Place / Publisher
Switzerland: Springer
Year of publication
2022
Link to full text
Source
Alma/SFX Local Collection
Description / Notes
  • 3D visual perception tasks, including 3D detection and map segmentation based on multi-camera images, are essential for autonomous driving systems. In this work, we present a new framework termed BEVFormer, which learns unified BEV representations with spatiotemporal transformers to support multiple autonomous driving perception tasks. In a nutshell, BEVFormer exploits both spatial and temporal information by interacting with spatial and temporal space through predefined grid-shaped BEV queries. To aggregate spatial information, we design spatial cross-attention, with which each BEV query extracts spatial features from its regions of interest across camera views. For temporal information, we propose temporal self-attention to recurrently fuse the history BEV information. Our approach achieves a new state of the art of 56.9% NDS on the nuScenes test set, which is 9.0 points higher than the previous best results and on par with the performance of LiDAR-based baselines. The code is available at https://github.com/zhiqi-li/BEVFormer.
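The abstract describes an encoder layer built from three parts: grid-shaped BEV queries, temporal self-attention over the history BEV, and spatial cross-attention into multi-camera image features. The following is a minimal, simplified sketch of such a layer in plain PyTorch, intended only to make that structure concrete. All class names, shapes, and hyperparameters here are illustrative assumptions, and standard dense multi-head attention stands in for the deformable attention used in the official implementation at https://github.com/zhiqi-li/BEVFormer.

```python
# Simplified sketch of a BEVFormer-style encoder layer (illustrative only).
import torch
import torch.nn as nn


class SimplifiedBEVFormerLayer(nn.Module):
    def __init__(self, embed_dim=256, num_heads=8, bev_h=50, bev_w=50):
        super().__init__()
        self.bev_h, self.bev_w = bev_h, bev_w
        # Grid-shaped BEV queries: one learnable query per BEV grid cell.
        self.bev_queries = nn.Parameter(torch.randn(bev_h * bev_w, embed_dim))
        # Temporal self-attention: fuses current queries with the BEV
        # features produced at the previous timestep (recurrent over time).
        self.temporal_self_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        # Spatial cross-attention: BEV queries attend to multi-camera image
        # features. The paper restricts each query to projected regions of
        # interest; this sketch lets every query attend to all image tokens.
        self.spatial_cross_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(embed_dim, 4 * embed_dim), nn.ReLU(), nn.Linear(4 * embed_dim, embed_dim)
        )
        self.norm1 = nn.LayerNorm(embed_dim)
        self.norm2 = nn.LayerNorm(embed_dim)
        self.norm3 = nn.LayerNorm(embed_dim)

    def forward(self, image_feats, prev_bev=None):
        """image_feats: (B, num_cams * H * W, C) flattened multi-camera features.
        prev_bev:     (B, bev_h * bev_w, C) BEV from the previous frame, or None.
        """
        b = image_feats.size(0)
        q = self.bev_queries.unsqueeze(0).expand(b, -1, -1)

        # Temporal self-attention against the history BEV (or self on frame 0).
        history = prev_bev if prev_bev is not None else q
        t, _ = self.temporal_self_attn(q, history, history)
        q = self.norm1(q + t)

        # Spatial cross-attention into the multi-camera image features.
        s, _ = self.spatial_cross_attn(q, image_feats, image_feats)
        q = self.norm2(q + s)

        # Feed-forward network produces the unified BEV representation.
        return self.norm3(q + self.ffn(q))  # (B, bev_h * bev_w, C)


if __name__ == "__main__":
    layer = SimplifiedBEVFormerLayer()
    feats = torch.randn(2, 6 * 15 * 25, 256)   # e.g. 6 cameras, 15x25 feature maps
    bev_t0 = layer(feats)                      # first frame: no BEV history
    bev_t1 = layer(feats, prev_bev=bev_t0)     # next frame reuses the history BEV
    print(bev_t1.shape)                        # torch.Size([2, 2500, 256])
```

In the published model, several such layers are stacked, and the spatial cross-attention samples features only from the camera views onto which each BEV query's pillar projects; the sketch above collapses those details into standard attention purely for illustration.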
Language
English
Identifiers
ISBN: 9783031200762, 3031200764
ISSN: 0302-9743
eISSN: 1611-3349
DOI: 10.1007/978-3-031-20077-9_1
Title ID: cdi_springer_books_10_1007_978_3_031_20077_9_1
