Remote sensing images contain many objects that resemble road structures, making it difficult to distinguish roads from the background. Moreover, road extraction is affected by many factors, such as lighting conditions, noise, and occlusions, which result in incomplete and discontinuous extracted roads. Learning discriminative road features from remote sensing images is therefore a highly challenging task. In this paper, a novel road extraction model for remote sensing images is proposed, built on a U-Net-like encoder-decoder architecture. An axial Transformer module (ATM) is designed to learn global road features in the deepest layer with linear computational complexity with respect to image size, and a multilayer attention fusion module (MLAF) is presented to fuse Transformer features from multiple layers, yielding more comprehensive and richer semantic information. In the skip connections, a channel attention module (CAM) weights the feature maps along the channel dimension to improve the capability of feature representation. Extensive experiments are conducted on the DeepGlobe and Massachusetts road datasets. Compared with other methods, the proposed method extracts roads from remote sensing images with higher accuracy and lower computational cost, e.g., achieving an intersection over union (IoU) of 81.71% (a 1.02% improvement) and a 22.38% reduction in convergence time over the latest TransRoadNet on the Massachusetts road dataset. Ablation experiments further demonstrate the effectiveness of the designed modules.
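The abstract does not give the implementation details of the axial Transformer module, so the following is only a minimal PyTorch sketch of the general axial self-attention idea it refers to: attention is applied along the height axis and then the width axis of the feature map, so the cost scales roughly with H·W·(H+W) rather than (H·W)² as in full 2D self-attention. The class name, layer choices, and parameters below are illustrative assumptions, not the paper's actual module.

```python
import torch
import torch.nn as nn

class AxialAttention(nn.Module):
    """Sketch of axial self-attention over a (B, C, H, W) feature map (assumed design)."""

    def __init__(self, channels: int, heads: int = 8):
        super().__init__()
        self.row_attn = nn.MultiheadAttention(channels, heads, batch_first=True)  # along H
        self.col_attn = nn.MultiheadAttention(channels, heads, batch_first=True)  # along W
        self.norm1 = nn.LayerNorm(channels)
        self.norm2 = nn.LayerNorm(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Height-axis attention: each image column becomes an independent sequence of length H.
        t = x.permute(0, 3, 2, 1).reshape(b * w, h, c)          # (B*W, H, C)
        n = self.norm1(t)
        t = t + self.row_attn(n, n, n, need_weights=False)[0]   # pre-norm residual
        x = t.reshape(b, w, h, c).permute(0, 3, 2, 1)           # back to (B, C, H, W)
        # Width-axis attention: each image row becomes an independent sequence of length W.
        t = x.permute(0, 2, 3, 1).reshape(b * h, w, c)          # (B*H, W, C)
        n = self.norm2(t)
        t = t + self.col_attn(n, n, n, need_weights=False)[0]
        return t.reshape(b, h, w, c).permute(0, 3, 1, 2)        # (B, C, H, W)

# Example use on a deepest-layer feature map (hypothetical shape):
# out = AxialAttention(channels=512)(torch.randn(2, 512, 16, 16))
```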