Fig. 11
From: Vision transformer architecture and applications in digital health: a tutorial and survey
Examples of using ViT for surgical instruction prediction. Transformer prediction is based on the SIGT method [62]. GT is used as a reference for comparison and validation.