
Table 5 Comparison of recognition accuracy with state-of-the-art methods on the Something-Something v1 dataset

From: Fused behavior recognition model based on attention mechanism

| Methods | Input modality | Pre-training | Top-1 val (%) | Top-1 test (%) |
|---|---|---|---|---|
| TSN by ref. [23] (7 frames) | RGB | ImageNet | 18.48 | - |
| MultiScale TRN [23] | RGB | ImageNet | 34.44 | 33.6 |
| ECO (16 frames) [12] | RGB | ImageNet | 41.4 | - |
| TRN (ResNet-50) by ref. [13] (8 frames) | RGB | ImageNet | 38.9 | - |
| ResNet34-3DRes18 (16 frames) | RGB | Kinetics | 41.012 | - |
| Res34-SE-IM-Net (16 frames) | RGB | Kinetics | 41.398 | 36.5 |