Fig. 7
From: Vision transformer architecture and applications in digital health: a tutorial and survey
Decoder and masked multi-head attention block to produce the final image