Analyzing the Structure of Attention in a Transformer Language Model
Details
Florence, Italy. Date of Talk: 2019-07-28
Speakers
Jesse Vig
Abstract
The Transformer is a fully attention-based alternative to recurrent networks that has achieved state-of-the-art results across a range of NLP tasks. In this paper, we analyze the structure of attention in a Transformer language model, the pretrained GPT-2 small model. We visualize attention for individual instances and analyze the interaction between attention and syntax over a large corpus. We find that attention targets different parts of speech at different layer depths within the model, and that attention aligns with dependency relations most strongly in the middle layers. We also find that the deepest layers of the model capture the most distant relationships. Finally, we extract exemplar sentences that reveal highly specific patterns targeted by particular attention heads.
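One of the analyses the abstract describes, that the deepest layers capture the most distant relationships, can be illustrated with a short sketch. The snippet below is a minimal illustration, not the paper's released code: it assumes the Hugging Face `transformers` library (where the model id `"gpt2"` corresponds to GPT-2 small) and computes, for each layer, the attention-weighted mean distance between each token and the earlier tokens it attends to.

```python
# Minimal sketch (assumes the Hugging Face `transformers` library):
# extract GPT-2 small attention weights and measure how far back
# each layer attends on average.
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2", output_attentions=True)
model.eval()

text = "The quick brown fox jumps over the lazy dog."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple with one tensor per layer, each of shape
# (batch, num_heads, seq_len, seq_len); rows are attending tokens and
# columns are attended-to tokens (causal, so only columns <= row are nonzero).
for layer_idx, attn in enumerate(outputs.attentions):
    attn = attn[0]                       # drop batch dim -> (heads, seq, seq)
    seq_len = attn.size(-1)
    positions = torch.arange(seq_len)
    # distance from attending position i back to attended position j is i - j
    dist = (positions.unsqueeze(1) - positions.unsqueeze(0)).clamp(min=0)
    # attention-weighted mean distance, averaged over heads and tokens
    mean_dist = (attn * dist).sum(-1).mean().item()
    print(f"layer {layer_idx:2d}: mean attention distance = {mean_dist:.2f}")
```

On typical inputs, the later layers tend to report larger mean distances, which is the pattern the paper quantifies over a large corpus rather than a single sentence as here.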