From: Multi-task approach based on combined CNN-transformer for efficient segmentation and classification of breast tumors in ultrasound images
| ViT version | Image resolution | Projection dimension | Number of MSA heads | Number of transformer layers |
| --- | --- | --- | --- | --- |
| ViT-base | 224 × 224 | 768 | 12 | 12 |
| Our adapted transformer | 256 × 256 | 64 | 8 | |
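To make the difference in scale concrete, the following minimal sketch computes the per-head dimension implied by each configuration. It assumes the standard multi-head self-attention convention in which the projection dimension is split evenly across heads; the source does not confirm that the adapted transformer follows this convention. It also assumes that the value 8 listed for the adapted transformer is its number of MSA heads. The class `ViTConfig` and its field names are hypothetical and used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViTConfig:
    """Hypothetical container for the hyperparameters listed in the table."""
    name: str
    image_size: int       # square input resolution, in pixels
    projection_dim: int   # token embedding (projection) dimension
    num_heads: int        # number of MSA heads (assumed reading of the table)

    @property
    def head_dim(self) -> int:
        # Assumption: the projection dimension is split evenly across heads,
        # as in standard multi-head self-attention.
        if self.projection_dim % self.num_heads != 0:
            raise ValueError("projection_dim must be divisible by num_heads")
        return self.projection_dim // self.num_heads


VIT_BASE = ViTConfig("ViT-base", 224, 768, 12)
ADAPTED = ViTConfig("Our adapted transformer", 256, 64, 8)

for cfg in (VIT_BASE, ADAPTED):
    print(f"{cfg.name}: {cfg.projection_dim} / {cfg.num_heads} "
          f"= {cfg.head_dim} dimensions per head")
```

Under these assumptions, ViT-base has 64 dimensions per head and the adapted transformer has 8, which illustrates how much smaller the adapted attention blocks are.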