Tag

#long-context

Every item tagged long-context, newest first.

6 items

The Stanford EDGAR Filings Dataset: Reconstructing U.S. Corporate and Financial Disclosures into Layout-Faithful and Token-Efficient Pretraining Data

Researchers at Stanford created the Stanford EDGAR Filings Dataset, an open dataset of SEC filings reconstructed into layout-faithful MultiMarkdown. This dataset provides a new source of long-context training data for financial language models, addressing the scarcity of high-quality, publicly available documents. The dataset can be used to pretrain large language models, potentially improving their performance on financial tasks. You can access the dataset for your own research and model fine-t

Key takeaways

The Stanford EDGAR Filings Dataset is an open dataset of SEC filings.
The dataset is in layout-faithful MultiMarkdown format.
It addresses the scarcity of long-context training data for financial LLMs.

arXiv · #open-data #financial-llm #long-context

models · Apr 28

Introducing NVIDIA Nemotron 3 Nano Omni: Long-Context Multimodal Intelligence for Documents, Audio and Video Agents

NVIDIA released Nemotron 3 Nano Omni, a multimodal model that processes long-context inputs across documents, audio, and video. The model is optimized for agent applications and available on Hugging Face. You can deploy it for tasks like document understanding, speech recognition, and video analysis.

Key takeaways

Processes long-context multimodal inputs.
Optimized for agent applications.
Available on Hugging Face for deployment.

Hugging Face Blog · #multimodal #long-context #agent-applications

models · Apr 24

DeepSeek-V4: a million-token context that agents can actually use

DeepSeek-V4 offers a 1M-token context window, making it suitable for long-range tasks. The model is available on Hugging Face for download and integration into applications. A context of this size lets a single prompt cover far more material for analysis and generation, which is particularly useful for builders whose tasks require processing large volumes of text.

Key takeaways

1M-token context window for comprehensive text analysis.
Available on Hugging Face for download and integration.
Enables long-range tasks with large text volumes.

Hugging Face Blog · #long-context #open-source #hugging-face

models · Jul 8

SmolLM3: smol, multilingual, long-context reasoner

SmolLM3 is a new multilingual, long-context LLM released on the Hugging Face platform. It is designed for reasoning tasks and combines a small model size with multilingual support and long-context handling. The model is available for download and use, and the Hugging Face blog covers its features and performance.

Key takeaways

SmolLM3 is multilingual and supports long-context reasoning.
The model is available on the Hugging Face platform.
It is designed for tasks that require reasoning capabilities.

Hugging Face Blog · #multilingual-llm #long-context #hugging-face

research · Apr 16

Introducing HELMET: Holistically Evaluating Long-context Language Models

Researchers introduced HELMET, a benchmark for evaluating long-context language models. HELMET assesses models on tasks requiring contexts of up to 128k tokens. You can use it to compare models like Llama-3, GPT-4o, and Claude 3.5 and identify which ones handle lengthy inputs best.

Key takeaways

HELMET evaluates models on contexts of up to 128k tokens.
Benchmark includes tasks for long-context language understanding.
You can use HELMET to compare models like Llama-3, GPT-4o, and Claude 3.5.

Hugging Face Blog · #long-context #benchmarks #evaluation

research · Jan 23

Mastering Long Contexts in LLMs with KVPress

Researchers from NVIDIA and Hugging Face introduced KVPress, a method to improve long-context handling in large language models. KVPress combines techniques such as sparse attention and compression of the key-value (KV) cache to process longer sequences efficiently. This approach allows LLMs to handle up to 128K tokens, significantly expanding their usable context window. You can explore KVPress with the Hugging Face Transformers library.

Key takeaways

KVPress enables LLMs to handle up to 128K tokens.
Uses sparse attention and KV cache compression for efficiency.
Works with the Hugging Face Transformers library.

Hugging Face Blog · #long-context #llms #transformers