@an_lzv hi! Could you please insert the code into the head?

ViT (Vision Transformer)

An image recognition model that applies the transformer architecture to a sequence of image patches instead of relying on traditional convolutional layers. It is used for tasks such as image classification, object detection, and image segmentation.
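
To make the "transformer instead of convolutions" idea concrete: a ViT first cuts the image into fixed-size, non-overlapping patches and treats each flattened patch as one input token. The sketch below shows only that patch-splitting step on a toy single-channel image, in plain Python. The function name `image_to_patches` and the 4x4 example image are illustrative assumptions, not part of any library, and the later steps of a real ViT (linear projection of each patch, position embeddings, the transformer encoder itself) are left out.

```python
def image_to_patches(image, patch_size):
    """Split an H x W image (a list of rows) into flattened, non-overlapping patches.

    Returns the patches in row-major order. Each patch is a flat list of
    patch_size * patch_size pixel values.
    (Hypothetical helper written for illustration only.)
    """
    height = len(image)
    width = len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")

    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten one patch_size x patch_size block row by row.
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# Toy 4x4 single-channel "image" with pixel values 0..15 (assumed example data).
image = [[row * 4 + col for col in range(4)] for row in range(4)]
patches = image_to_patches(image, patch_size=2)

print(len(patches))  # number of tokens the transformer would receive: 4
print(patches[0])    # top-left patch, flattened: [0, 1, 4, 5]
```

With a 2x2 patch size the 4x4 image becomes a sequence of four tokens. The same logic applied to a 224x224 image with 16x16 patches would yield 196 tokens per channel-stacked patch grid.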