Fused behavior recognition model based on attention mechanism

Visual Computing for Industry, Biomedicine, and Art

Table 5 Comparison of recognition accuracy with state-of-the-art methods on the Something-Something v1 dataset

Methods	Input modality	Pre-training	Top-1 val (%)	Top-1 test (%)
TSN by ref. [23] (7 frames)	RGB	ImageNet	18.48	–
MultiScale TRN [23]	RGB	ImageNet	34.44	33.6
ECO (16 frames) [12]	RGB	ImageNet	41.4	–
TRN (ResNet-50) by ref. [13] (8 frames)	RGB	ImageNet	38.9	–
ResNet34-3DRes18 (16 frames)	RGB	Kinetics	41.012	–
Res34-SE-IM-Net (16 frames)	RGB	Kinetics	41.398	36.5