Research · Apr 21
QIMMA قِمّة ⛰: A Quality-First Arabic LLM Leaderboard
The QIMMA leaderboard evaluates Arabic language models on 11 Arabic NLP tasks. It includes datasets such as XNLI-AR and AR-MLQA, and model accuracy ranges from 40% to 80%. You can use the leaderboard to compare Arabic language models and to see where a given model needs improvement.
Key takeaways
- Evaluates models on 11 Arabic NLP tasks.
- Includes datasets like XNLI-AR and AR-MLQA.
- Model accuracy ranges from 40% to 80%.