GlobalDoc: A Cross-Modal Vision-Language Framework for Real-World Document Image Retrieval and Classification

Abstract: Visual document understanding (VDU) has rapidly advanced with the development of powerful multi-modal language models. However, these models typically require extensive document pre-training ...

IEEE

SARCLIP: The First Vision–Language Foundation Model for SAR Image

Abstract: Foundation models have achieved remarkable breakthroughs across various domains, with the wide use of masked image modeling (MIM) and self-supervised learning (SSL). However, these models ...
