1sec.ai

Tag

#model-architecture

Every item tagged model-architecture, newest first.

1 item

Variable-Width Transformers

This work empirically investigates nonuniform capacity allocation across network depth. Transformers whose width varies from layer to layer outperform constant-width models on a range of tasks: the proposed ×-Transformer consistently outperforms parameter-matched baselines, suggesting that nonuniform capacity allocation improves performance.

Key takeaways
  • Most transformer architectures maintain constant width across all layers.
  • The proposed ×-Transformer consistently outperforms parameter-matched baselines.
  • Nonuniform capacity allocation improves performance on a range of tasks.
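
To make "parameter-matched" concrete, the following is a rough sketch of how a variable-width layer schedule might be compared against a constant-width baseline of about the same size. Everything in it is an assumption for illustration: the per-block estimate of 12·d² weights (4·d² for attention projections, 8·d² for an MLP with 4× expansion), the names `layer_params`, `total_params`, and `matched_constant_width`, and the example width schedule are hypothetical and are not taken from the paper. Biases, layer norms, embeddings, and any projections needed between layers of different width are ignored.

```python
import math


def layer_params(width: int) -> int:
    """Approximate weight count of one transformer block of the given width.

    Assumption: 4*d^2 for the Q, K, V, and output projections plus
    8*d^2 for an MLP with 4x expansion. Biases and norms are ignored.
    """
    return 12 * width * width


def total_params(widths: list[int]) -> int:
    """Sum the approximate block weights over a per-layer width schedule."""
    return sum(layer_params(w) for w in widths)


def matched_constant_width(widths: list[int]) -> int:
    """Largest constant width whose same-depth model does not exceed
    the parameter budget of the given variable-width schedule."""
    depth = len(widths)
    # Solve 12 * depth * d^2 <= budget for the largest integer d.
    return math.isqrt(total_params(widths) // (12 * depth))


# Hypothetical schedule, chosen only for illustration (not the paper's).
EXAMPLE_WIDTHS = [512, 384, 256, 384, 512]

if __name__ == "__main__":
    d = matched_constant_width(EXAMPLE_WIDTHS)
    print("variable-width params:", total_params(EXAMPLE_WIDTHS))
    print("matched constant width:", d)
    print("constant-width params:", total_params([d] * len(EXAMPLE_WIDTHS)))
```

Under these assumptions, a comparison is fair only when both models land on nearly the same total; in practice the matched width would also be rounded to a multiple of the attention head size.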