#financial-llm — 1sec.ai

The Stanford EDGAR Filings Dataset: Reconstructing U.S. Corporate and Financial Disclosures into Layout-Faithful and Token-Efficient Pretraining Data

Researchers at Stanford created the Stanford EDGAR Filings Dataset, an open dataset of SEC filings reconstructed into layout-faithful MultiMarkdown. It provides a new source of long-context training data for financial language models, addressing the scarcity of high-quality, publicly available documents. The dataset can be used to pretrain large language models, potentially improving their performance on financial tasks. You can access the dataset for your own research and model fine-tuning.

Key takeaways

The Stanford EDGAR Filings Dataset is an open dataset of SEC filings.
The dataset is in layout-faithful MultiMarkdown format.
It addresses the scarcity of long-context training data for financial LLMs.

arXiv #open-data #financial-llm #long-context

other Oct 4

Introducing the Open FinLLM Leaderboard

The Open FinLLM Leaderboard, introduced on the Hugging Face Blog, evaluates LLMs on financial tasks such as risk assessment and sentiment analysis. It gives builders a benchmark for comparing model performance on real-world financial scenarios, and it aims to help developers choose the best model for their specific use cases.

Key takeaways

Evaluates LLMs on financial tasks like risk assessment and sentiment analysis.
Provides a benchmark for comparing model performance.
Helps developers choose the best model for specific use cases.

Hugging Face Blog #financial-llm #benchmarks #leaderboard