1sec.ai

Tag

#vision-language-model

Every item tagged vision-language-model, newest first.

3 items

models · Feb 19

PaliGemma 2 Mix - New Instruction Vision Language Models by Google

Google released PaliGemma 2 Mix, a new set of instruction-tuned vision language models. The models are available on Hugging Face, where you can also explore their capabilities through a demo. They target applications that require joint image and text understanding.

Key takeaways
  • PaliGemma 2 Mix is instruction-tuned for image-text tasks.
  • Available on Hugging Face for download and demo.
  • Targets applications like visual question answering.

models · Nov 26

SmolVLM - small yet mighty Vision Language Model

SmolVLM is a new vision language model built to be compact and efficient while remaining capable on visual tasks. It is available on the Hugging Face platform and suits applications that require both vision and language understanding.

Key takeaways
  • SmolVLM is a small yet efficient vision language model.
  • Available on the Hugging Face platform.
  • Compact by design, yet capable on visual tasks.

models · May 14

PaliGemma – Google's Cutting-Edge Open Vision Language Model

Google released PaliGemma, an open vision language model that combines image and text understanding. It is designed for tasks such as visual question answering and image captioning, which makes it suited to applications that need both vision and language capabilities. The model is available on the Hugging Face platform.

Key takeaways
  • PaliGemma is an open vision language model.
  • Designed for visual question answering and image captioning.
  • Available on the Hugging Face platform.