Powerful knockoffs via minimizing reconstructability
Is part of
The Annals of statistics, 2022-02, Vol.50 (1), p.252
Place / Publisher
Hayward: Institute of Mathematical Statistics
Year of publication
2022
Source
Project Euclid
Descriptions/Notes
Model-X knockoffs (J. R. Stat. Soc. Ser. B. Stat. Methodol. 80 (2018) 551–577) allows analysts to perform feature selection using almost any machine learning algorithm while provably controlling the expected proportion of false discoveries. This procedure involves constructing synthetic variables, called knockoffs, which effectively act as controls during feature selection. The gold standard for constructing knockoffs has been to minimize the mean absolute correlation (MAC) between features and their knockoffs, but, surprisingly, we prove this procedure can be powerless in extremely easy settings, including Gaussian linear models with correlated exchangeable features. The key problem is that minimizing the MAC creates joint dependencies between the features and knockoffs, which allow machine learning algorithms to reconstruct the effect of the features on the response using the knockoffs. To improve power, we propose generating knockoffs that minimize the reconstructability (MRC) of the features, and we demonstrate our proposal for Gaussian features by showing it is computationally efficient, robust, and powerful. We also prove that certain MRC knockoffs minimize a notion of estimation error in Gaussian linear models. Through extensive simulations, we show MRC knockoffs often dramatically outperform MAC-minimizing knockoffs, and we find no settings in which MAC-minimizing knockoffs outperform MRC knockoffs by more than a slight margin. We implement our methods and many others from the knockoffs literature in a new Python package, knockpy.
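The powerlessness of MAC-minimizing knockoffs in the exchangeable Gaussian setting can be illustrated numerically. The sketch below (NumPy only, not the paper's own solver or the knockpy package) builds the joint covariance of features and Gaussian knockoffs, G = [[Σ, Σ−S], [Σ−S, Σ]] with S = sI, for an equicorrelated Σ with ρ = 0.5. The MAC-minimizing (equicorrelated) choice s = min(1, 2λ_min(Σ)) drives the smallest eigenvalue of G to zero, so each feature is perfectly reconstructable from the remaining columns; any smaller s (shown here with an arbitrary illustrative value, not the paper's MRC solution) keeps G nonsingular.

```python
import numpy as np

p, rho = 5, 0.5
# Equicorrelated covariance: unit variances, common correlation rho
Sigma = (1 - rho) * np.eye(p) + rho * np.ones((p, p))

def joint_gram(Sigma, s):
    """Covariance of (X, X_knockoff) for Gaussian knockoffs with S = s * I."""
    S = s * np.eye(Sigma.shape[0])
    return np.block([[Sigma, Sigma - S],
                     [Sigma - S, Sigma]])

# MAC-minimizing (equicorrelated) choice: s = min(1, 2 * lambda_min(Sigma))
lam_min = np.linalg.eigvalsh(Sigma).min()  # = 1 - rho = 0.5 here
s_mac = min(1.0, 2 * lam_min)
G_mac = joint_gram(Sigma, s_mac)
print(np.linalg.eigvalsh(G_mac).min())  # ~0: joint covariance is singular

# An arbitrary smaller s (illustration only) leaves the joint covariance
# nonsingular, so the features cannot be exactly reconstructed
G_small = joint_gram(Sigma, 0.6)
print(np.linalg.eigvalsh(G_small).min())  # strictly positive
```

The singularity of G_mac is exactly the reconstructability problem described above: a fitted model can recover each feature's effect on the response through a linear combination of the knockoffs and other features, destroying the contrast that knockoff-based selection relies on.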