Fig. 2 | Visual Computing for Industry, Biomedicine, and Art

From: STTG-net: a Spatio-temporal network for human motion prediction based on transformer and graph convolution network

Temporal transformer (T-transformer) module. The module combines the input sequence with the features of the concatenated human pose vector Z encoded by the TPE, and obtains the output \({\mathrm{Z}}_{{\mathrm{L}}_{\mathrm{T}}}\) through a stack of 6 identical T-transformer layers. Specifically, each T-transformer layer first applies layer normalization, then performs multi-head attention via scaled dot-product attention over the Q, K, and V projections of each head; finally, the attention results of all heads are concatenated and passed through an MLP composed of two FC layers
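The per-layer computation described in the caption can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the head count, the residual connections, the ReLU activation, and the pre-norm placement are assumptions not stated in the caption.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each time step over its feature dimension
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def t_transformer_layer(Z, params, n_heads=8):
    """One T-transformer layer (sketch): layer norm -> multi-head
    scaled dot-product attention over the temporal axis -> concatenate
    heads -> MLP of two FC layers. Residual connections, n_heads=8,
    and ReLU are illustrative assumptions."""
    T, D = Z.shape
    dh = D // n_heads
    Wq, Wk, Wv, Wo, W1, b1, W2, b2 = params
    X = layer_norm(Z)                           # layer norm first
    Q, K, V = X @ Wq, X @ Wk, X @ Wv            # (T, D) each
    heads = []
    for h in range(n_heads):                    # per-head dot-product attention
        s = slice(h * dh, (h + 1) * dh)
        A = softmax(Q[:, s] @ K[:, s].T / np.sqrt(dh))
        heads.append(A @ V[:, s])
    attn = np.concatenate(heads, axis=-1) @ Wo  # connect (concat) heads, project
    X = Z + attn                                # residual (assumed)
    H = layer_norm(X)
    mlp = np.maximum(0.0, H @ W1 + b1) @ W2 + b2  # two FC layers
    return X + mlp                              # residual (assumed)

def t_transformer(Z, layer_params):
    # Stack of 6 identical layers, as shown in the figure
    for p in layer_params:
        Z = t_transformer_layer(Z, p)
    return Z
```

Feeding a (T, D) sequence through six sets of layer parameters returns a (T, D) output, matching the caption's description of \({\mathrm{Z}}_{{\mathrm{L}}_{\mathrm{T}}}\) as the result of the 6-layer stack.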
