A Past, Present, and Future of Attention

A Background on Attention Mechanisms in Deep Learning

Past

Seq2Seq/Encoder-Decoder Model

Align and Translate

Fig. 1: The attention mechanism used in Bahdanau et al. (2015) is shown here as the feed-forward network (a_{t,n}) between the encoder (the bidirectional RNN on the bottom) and the decoder (an RNN that takes the context vector and the previous state as input).
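
The alignment step in Fig. 1 can be sketched in a few lines of plain Python. This is a minimal illustration rather than the authors' implementation: it assumes the usual additive scoring form, in which a small feed-forward network scores each encoder annotation against the previous decoder state, a softmax turns the scores into the weights a_{t,n}, and the context vector is the weighted sum of the annotations. The function name and the parameters W, U, and v are hypothetical stand-ins for learned weights.

```python
import math


def additive_attention(prev_state, annotations, W, U, v):
    """Sketch of additive (feed-forward) attention for one decoder step.

    prev_state:  previous decoder state, list of floats
    annotations: encoder annotations, one list of floats per source position
    W, U, v:     hypothetical learned parameters (matrices as lists of rows)
    Returns (weights, context).
    """
    def matvec(matrix, vector):
        return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

    # Project the decoder state once; it is shared across source positions.
    projected_state = matvec(W, prev_state)

    # Score every annotation with a one-hidden-layer feed-forward network.
    scores = []
    for h in annotations:
        projected_h = matvec(U, h)
        hidden = [math.tanh(a + b) for a, b in zip(projected_state, projected_h)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))

    # Softmax over source positions (shifted by the max for stability).
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Context vector: attention-weighted sum of the annotations.
    dim = len(annotations[0])
    context = [sum(w * h[j] for w, h in zip(weights, annotations)) for j in range(dim)]
    return weights, context
```

The weights always sum to 1 across source positions, so the context vector is a convex combination of the encoder annotations; the decoder then consumes it together with its previous state.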

Visual Attention

Fig. 2: The model used by Xu et al. (2015) to tackle the image captioning problem. They use a convolutional layer to extract features from the input image and a recurrent network with attention to align those features with the corresponding word.
Fig. 3: Results from Xu et al. (2015), shown with generated captions. The underlined word in each caption corresponds to the white highlighted “attended” portion of the image.

Present

Transformer

How Does The Transformer Work?

Architecture

Fig. 4: Transformer model architecture (Image source: Vaswani et al. (2017)⁵)

Encoder

Decoder

The Performer

Future

References

FatBrain Fellow ’20–’21 | Reed College Physics ’20
