Tag

#multilingual

Every item tagged multilingual, newest first.

5 items

Granite Embedding Multilingual R2: Open Apache 2.0 Multilingual Embeddings with 32K Context — Best Sub-100M Retrieval Quality

IBM released Granite Embedding Multilingual R2 under Apache 2.0. The open multilingual embedding model offers a 32K context window, supports 100+ languages, and is billed as delivering the best retrieval quality among models under 100M parameters. It targets builders who want high-quality, locally deployable embeddings, and its small size suits applications that need low-latency, high-accuracy retrieval.

Key takeaways

32K context window for handling long input sequences.
Billed as the best retrieval quality among models under 100M parameters.
Supports 100+ languages for multilingual applications.
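
For readers new to embedding-based retrieval, here is a minimal sketch of the ranking step such a model feeds into. It does not load Granite or any real model: the three-dimensional vectors, the document ids, and the helper names `cosine_similarity` and `rank_documents` are all hypothetical stand-ins chosen for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return document ids ordered from most to least similar to the query."""
    scores = {
        doc_id: cosine_similarity(query_vec, vec)
        for doc_id, vec in doc_vecs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy vectors standing in for real embedding output (hypothetical values).
doc_vecs = {
    "doc_en": [0.9, 0.1, 0.0],
    "doc_de": [0.8, 0.2, 0.1],
    "doc_other": [0.0, 0.1, 0.9],
}
query_vec = [1.0, 0.0, 0.0]

ranking = rank_documents(query_vec, doc_vecs)
print(ranking)
```

In a real setup the vectors would come from the embedding model, and a multilingual model is meant to place texts with the same meaning close together regardless of language.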

Hugging Face Blog | #open-source #multilingual #embeddings

research | Nov 21

Open ASR Leaderboard: Trends and Insights with New Multilingual & Long-Form Tracks

The Open ASR Leaderboard now includes multilingual and long-form speech recognition tracks, expanding its scope beyond English-only short-form transcription. This update enables more comprehensive evaluation of automatic speech recognition systems across diverse languages and audio formats. You can explore the refreshed leaderboard and dataset to assess ASR model performance in real-world scenarios. The leaderboard's growth reflects increasing demand for robust, multilingual ASR capabilities.

Key takeaways

Open ASR Leaderboard adds multilingual and long-form tracks.
Expanded scope enables more comprehensive ASR system evaluation.
Leaderboard now reflects growing demand for multilingual ASR.

Hugging Face Blog | #open-source #speech-recognition #multilingual

models | Mar 12

Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM

Google released Gemma 3, a new open LLM that is multimodal, multilingual, and has a long context window. Gemma 3 is available on Hugging Face and aims to provide a high-performance, open alternative for builders. The model supports multiple languages and modalities, making it suitable for a wide range of applications.

Key takeaways

Gemma 3 is multimodal and multilingual.
Available on Hugging Face.
Long context window for handling complex inputs.

Hugging Face Blog | #open-llm #multimodal #multilingual

models | May 24

Falcon 2: An 11B parameter pretrained language model and VLM, trained on over 5000B tokens and 11 languages

The Technology Innovation Institute released Falcon 2, an 11B parameter pretrained language model and vision-language model (VLM) trained on over 5000B tokens across 11 languages. Falcon 2 targets applications requiring broad multilingual and multimodal capabilities. You can access Falcon 2 models via Hugging Face for research and product development.

Key takeaways

11B parameter model trained on over 5000B tokens across 11 languages.
Supports both language and vision-language tasks.
Available on Hugging Face for research and development.

Hugging Face Blog | #multilingual #multimodal #pretrained-models

models | Jul 12

Introducing The World's Largest Open Multilingual Language Model: BLOOM

BLOOM, developed by the BigScience research workshop, is a multilingual language model with 176 billion parameters, making it one of the largest open models available. It was trained on about 1.5TB of pre-processed text (roughly 350B tokens) across 46 natural languages. BLOOM is designed to be a more accessible and transparent alternative to closed language models, allowing builders to fine-tune and adapt it for specific use cases. The model's large size and diverse training data enable it to handle a wide range of natural language…

Key takeaways

176 billion parameters, one of the largest open models.
Trained on about 1.5TB of pre-processed text (roughly 350B tokens) across 46 natural languages.
Designed for accessibility and transparency, allowing fine-tuning.

Hugging Face Blog | #open-source #multilingual #large-language-models