Explaining Attention with Program Synthesis
Researchers propose using program synthesis to explain attention in transformer language models: each attention head is approximated by an executable program. They compute a head's attention matrices on randomly sampled training examples, then prompt a language model to write a program that reproduces the head's behavior. The resulting programs give a readable account of what each head does, which may help in building more interpretable deep learning models.
Key takeaways
- Program synthesis used to approximate attention head behavior.
- Attention matrices computed on random training examples.
- Generated programs provide insight into how attention heads work.
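
To make the idea concrete, here is a minimal Python sketch of what "approximating an attention head with a program" could look like. It is an illustration under stated assumptions, not the researchers' method: the candidate program `previous_token_program`, the scoring function `row_overlap`, and the `observed` matrix are all hypothetical and made up for this example. In the approach described above, the program would be written by a language model and compared against attention matrices computed from a real transformer.

```python
# Hypothetical illustration: these names and numbers are assumptions,
# not taken from the research being summarized.

def previous_token_program(tokens):
    """Candidate program: each position attends fully to the previous token.

    Position 0 has no predecessor, so it attends to itself.
    Returns an n x n matrix whose rows each sum to 1.
    """
    n = len(tokens)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        j = i - 1 if i > 0 else 0
        matrix[i][j] = 1.0
    return matrix


def row_overlap(predicted, observed):
    """Mean per-row overlap between two row-stochastic matrices.

    The overlap of one row is the sum of elementwise minima, which lies
    in [0, 1] and equals 1 only when the two rows are identical.
    """
    total = 0.0
    for p_row, o_row in zip(predicted, observed):
        total += sum(min(p, o) for p, o in zip(p_row, o_row))
    return total / len(predicted)


tokens = ["The", "cat", "sat", "down"]

# Made-up causal attention matrix standing in for one head's real output.
observed = [
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.1, 0.8, 0.1, 0.0],
    [0.0, 0.1, 0.7, 0.2],
]

predicted = previous_token_program(tokens)
score = row_overlap(predicted, observed)
print(f"overlap between program and observed attention: {score:.2f}")
```

A high overlap would suggest the program is a reasonable description of the head on this input; in practice one would average such a score over many examples before drawing any conclusion.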