1sec.ai

Tag

#leaderboard

Every item tagged leaderboard, newest first.

6 items

research · Apr 21

QIMMA قِمّة ⛰: A Quality-First Arabic LLM Leaderboard

The QIMMA leaderboard evaluates Arabic language models on 11 tasks, providing a comprehensive benchmark for Arabic NLP. It includes datasets such as XNLI-AR and AR-MLQA, and model accuracy ranges from 40% to 80%. You can use this leaderboard to compare and improve Arabic language models.

Key takeaways
  • Evaluates models on 11 Arabic NLP tasks.
  • Includes datasets like XNLI-AR and AR-MLQA.
  • Model accuracy ranges from 40-80%.

models · Feb 14

Fixing Open LLM Leaderboard with Math-Verify

The Open LLM Leaderboard has introduced Math-Verify, an answer-checking method for scoring models' solutions to mathematical problems. This approach aims to provide a more accurate measure of models' reasoning capabilities. You can now use Math-Verify to benchmark your models. The leaderboard has seen significant participation, with over 100,000 submissions.

Key takeaways
  • Math-Verify evaluates models on mathematical problems.
  • New method aims to improve accuracy of reasoning benchmarks.
  • Over 100,000 submissions to Open LLM Leaderboard.

models · Feb 10

The Open Arabic LLM Leaderboard 2

The Open Arabic LLM Leaderboard 2 has been released on Hugging Face, providing updated rankings of Arabic-language models. It evaluates models on tasks such as sentiment analysis and question answering, and aims to support the development of more capable Arabic-language models. You can use it to compare and select models for your Arabic NLP projects.

Key takeaways
  • Leaderboard evaluates models on Arabic-language tasks.
  • Supports development of more capable Arabic-language models.
  • Available on Hugging Face platform.

other · Oct 4

Introducing the Open FinLLM Leaderboard

The Open FinLLM Leaderboard on Hugging Face evaluates LLMs on financial tasks like risk assessment and sentiment analysis. It provides a benchmark for builders to compare model performance on real-world financial scenarios, and aims to help developers choose the best model for their specific use cases.

Key takeaways
  • Evaluates LLMs on financial tasks like risk assessment and sentiment analysis.
  • Provides a benchmark for comparing model performance.
  • Helps developers choose the best model for specific use cases.

models · Dec 1

Open LLM Leaderboard: DROP deep dive

The Hugging Face Open LLM Leaderboard now integrates DROP (Discrete Reasoning Over Paragraphs), a challenging reading comprehension benchmark. The addition of DROP broadens the range of evaluation tasks and gives builders a more comprehensive view of model performance. The leaderboard currently features 150+ models from 70+ organizations. You can use the leaderboard to compare models and identify areas for improvement.

Key takeaways
  • The Open LLM Leaderboard now includes the DROP benchmark.
  • The leaderboard features 150+ models from 70+ organizations.
  • The addition of DROP broadens the range of evaluation tasks.

models · Jun 23

What's going on with the Open LLM Leaderboard?

The Open LLM Leaderboard has been updated to use MMLU as its primary benchmark. This change aims to provide a more comprehensive evaluation of language models' performance. The leaderboard now ranks models based on their MMLU scores. You can explore the updated rankings and compare model performance.

Key takeaways
  • The Open LLM Leaderboard now uses MMLU as its primary benchmark.
  • The leaderboard ranks models based on their MMLU scores.
  • The change aims to provide a more comprehensive evaluation of language models.