

2022 IEEE 34th International Conference on Tools with Artificial Intelligence (ICTAI), 2022, p.855-860

2022


Author(s) / Contributors

Title

Convolutional Transformer with Similarity-based Boundary Prediction for Action Segmentation

Is part of

2022 IEEE 34th International Conference on Tools with Artificial Intelligence (ICTAI), 2022, p.855-860

Place / Publisher

IEEE

Year of publication

2022

Source

IEEE Electronic Library Online

Descriptions/Notes

Action classification has made great progress, but segmenting and recognizing actions in long videos remains a challenging problem. Recently, Transformer-based models with strong sequence modeling ability have succeeded in many sequence modeling tasks. However, the lack of inductive bias and the difficulty of handling long video sequences limit the application of the Transformer to the action segmentation task. To explore the potential of the Transformer in this task, we replace some specific linear layers in the vanilla Transformer with dilated temporal convolution, and use a sparse attention mechanism to reduce the time and space complexity of processing long video sequences. Moreover, training the model directly with a frame-wise classification loss treats frames at action boundaries the same as frames in the middle of actions, so the learned features are not sensitive to boundaries. We propose a new local log-context attention module to predict whether each frame is at the beginning, middle, or end of an action. Since boundary frames are similar to their neighboring frames of different classes, our similarity-based boundary prediction helps learn more discriminative features. Extensive experiments on three datasets show the effectiveness of our method.
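
The beginning/middle/end prediction described in the abstract needs per-frame targets of that kind. The following is only an illustrative sketch of how such targets could be derived from frame-wise action labels; it is not taken from the paper. The function name `boundary_targets`, the string targets, and the rule that a single-frame segment counts as a beginning are all assumptions made here, and the paper may define boundary frames differently (for example, as a window of several frames).

```python
from itertools import groupby


def boundary_targets(frame_labels):
    """Map per-frame action labels to 'begin' / 'middle' / 'end' targets.

    Hypothetical helper: the first frame of each run of identical labels
    is 'begin', the last is 'end', and everything in between is 'middle'.
    """
    targets = []
    for _, run in groupby(frame_labels):
        length = len(list(run))
        targets.append("begin")
        if length == 1:
            # Assumption: a one-frame segment is labeled as a beginning only.
            continue
        targets.extend(["middle"] * (length - 2))
        targets.append("end")
    return targets


# Made-up labels for six frames covering three actions.
labels = ["pour", "pour", "pour", "stir", "stir", "cut"]
print(boundary_targets(labels))
# ['begin', 'middle', 'end', 'begin', 'end', 'begin']
```

A model trained on such targets alongside the usual frame-wise classification loss would receive an explicit signal at action boundaries, which is the motivation the abstract gives.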

Language: English
Identifiers: eISSN: 2375-0197
DOI: 10.1109/ICTAI56018.2022.00131
Title ID: cdi_ieee_primary_10097931

Format: –
Keywords: Complexity theory, Computer Vision, Convolution, Convolutional neural networks, Predictive models, Task analysis, Temporal Convolutional Neural Network, Transformers, Transformer, Video Action Segmentation, Video sequences
