Online Emission Policy Selection for Radar Antijamming Using Bandit-Optimized Policy Search
Is part of
IEEE transactions on aerospace and electronic systems, 2024-06, Vol.60 (3), p.3132-3147
Place / Publisher
New York: IEEE
Year of publication
2024
Source
IEEE Electronic Library Online
Descriptions/Notes
Learning the emission policy online, so as to capture an unknown jammer's emission pattern and exploit weaknesses in its strategy, is a promising technique for radar antijamming. However, current research on cognitive radar's active emission policies still suffers from low sample efficiency, limited generalization, and underutilization of radar emission flexibility. To address these issues, we propose a policy-search framework that combines policy parameterization, a black-box process, and bandit optimization to adjust the radar emission policy online efficiently and significantly improve radar detection performance in the presence of main lobe interference. In short, we let a low-dimensional vector drive the radar's emission policy in a manual code framework and restrict the policy parameters to categorical variables. A Thompson sampling decision tree (TSDT) algorithm is then proposed to determine the optimal information-sharing mechanism among parameterized policies and capture their correlation to accelerate optimization. TSDT recommends the active emission policy through hierarchical sampling while balancing the exploration–exploitation tradeoff. In this way, the low-dimensional vector fully represents the emission flexibility, and its low dimension significantly improves sample efficiency. In addition, the black-box process avoids strong assumptions about the jammer. The experiment simulates a frequency agility (FA) radar with optional pulse widths, bandwidths, and FA policies against four kinds of jammers with different decision logics, and the results show that the proposed algorithm can quickly adapt to diverse jammers and significantly improve the radar's detection performance.
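
The abstract describes choosing among categorical emission parameters by bandit optimization against a jammer treated as a black box. As a rough illustration of that general idea only, the sketch below runs plain Beta-Bernoulli Thompson sampling over a flat list of parameter combinations. It is not the TSDT algorithm from the article: it has no decision tree, no hierarchical sampling, and no information sharing between policies. The parameter values, the `toy_jammer_detection` function, and its detection probabilities are all hypothetical placeholders, not data from the paper.

```python
import itertools
import random

# Hypothetical categorical choices; the labels are placeholders, not from the paper.
PULSE_WIDTHS = ("short", "long")
BANDWIDTHS = ("narrow", "wide")
FA_POLICIES = ("fixed", "random_hop")


def make_arms():
    """Enumerate every combination of the categorical policy parameters."""
    return list(itertools.product(PULSE_WIDTHS, BANDWIDTHS, FA_POLICIES))


class FlatThompsonSampler:
    """Beta-Bernoulli Thompson sampling with one independent arm per policy."""

    def __init__(self, arms, rng):
        self.arms = list(arms)
        self.rng = rng
        # Beta(1, 1) prior on each arm's detection probability.
        self.successes = {arm: 1 for arm in self.arms}
        self.failures = {arm: 1 for arm in self.arms}

    def select(self):
        """Draw one posterior sample per arm and pick the largest draw."""
        draws = {
            arm: self.rng.betavariate(self.successes[arm], self.failures[arm])
            for arm in self.arms
        }
        return max(draws, key=draws.get)

    def update(self, arm, detected):
        """Record a binary outcome (target detected or not) for the chosen arm."""
        if detected:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1


def toy_jammer_detection(arm, rng):
    """Stand-in black box: returns True if the target is detected.

    The probabilities are invented for the demo; a real evaluation would
    come from a radar and jammer simulation.
    """
    p_detect = 0.9 if arm == ("short", "wide", "random_hop") else 0.2
    return rng.random() < p_detect


def run(rounds=2000, seed=0):
    """Run the select/observe/update loop and return the fitted sampler."""
    rng = random.Random(seed)
    sampler = FlatThompsonSampler(make_arms(), rng)
    for _ in range(rounds):
        arm = sampler.select()
        sampler.update(arm, toy_jammer_detection(arm, rng))
    return sampler


if __name__ == "__main__":
    fitted = run()
    pulls = {a: fitted.successes[a] + fitted.failures[a] - 2 for a in fitted.arms}
    print(max(pulls, key=pulls.get), pulls)
```

Because each arm here is learned independently, the number of samples needed grows with the number of parameter combinations; the abstract attributes TSDT's faster optimization to sharing information among correlated policies, which this flat sketch deliberately omits.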