Multimodal few-shot classification without attribute embedding
Abstract Multimodal few-shot learning aims to exploit complementary information inherent in multiple modalities for vision tasks in low-data scenarios. Most current research focuses on learning a suitable embedding space for the various modalities. While embedding-based solutions provide state-of-the-art results, they reduce the interpretability of