From: Dual modality prompt learning for visual question-grounded answering in robotic surgery
Models | EndoVis-18 ACC | EndoVis-18 F-score | EndoVis-18 mIoU | EndoVis-17 ACC | EndoVis-17 F-score | EndoVis-17 mIoU |
---|---|---|---|---|---|---|
Co-Attn DeiT [24] | 0.6136 | 0.3208 | 0.7273 | 0.3805 | 0.3026 | 0.6870 |
CAT-ViL DeiT [15] | 0.6452 | 0.3321 | 0.7705 | 0.4491 | 0.3622 | 0.7322 |
GVLE-LViT [14] | 0.6659 | 0.3614 | 0.7625 | 0.4576 | 0.2489 | 0.7275 |
TCP (Ours) | 0.6845 | 0.4846 | 0.7762 | 0.4639 | 0.3334 | 0.7509 |
VCP (Ours) | 0.6581 | 0.4078 | 0.7740 | 0.4915 | 0.3636 | 0.7685 |
DMPL (Ours) | 0.6953 | 0.5137 | 0.7827 | 0.4957 | 0.3717 | 0.7436 |