

https://github.com/allenai/naacl2021-longdoc-tutorial
https://underline.io/events/122/sessions?eventSessionId=4103

Leverage natural hierarchy of the document (words→sentences→paragraphs)


Local LSTM + Global LSTM as encoder: a local LSTM encodes the tokens of each chunk into a chunk representation, and a global LSTM runs over those chunk representations.


This can be used for pre-training.
Goal: update the representations of chunks conditioned on other chunks multiple times
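The local/global split above can be sketched in a few lines of NumPy. A toy tanh RNN stands in for the LSTMs; all names, shapes, and the weight sharing between the two levels are illustrative, not the tutorial's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def rnn_encode(x, W, U, b):
    """Toy tanh RNN (stand-in for an LSTM): returns the final hidden state."""
    h = np.zeros(W.shape[1])
    for t in range(x.shape[0]):
        h = np.tanh(x[t] @ W + h @ U + b)
    return h

d, h_dim = 8, 8  # d == h_dim so the same weights can be reused globally
W = rng.normal(size=(d, h_dim)) * 0.1
U = rng.normal(size=(h_dim, h_dim)) * 0.1
b = np.zeros(h_dim)

# A "document": 6 chunks (e.g. sentences) of 5 token embeddings each.
doc = rng.normal(size=(6, 5, d))

# Local encoder: one vector per chunk.
chunk_reprs = np.stack([rnn_encode(chunk, W, U, b) for chunk in doc])

# Global encoder: run over the chunk representations for a document vector.
doc_repr = rnn_encode(chunk_reprs, W, U, b)

print(chunk_reprs.shape, doc_repr.shape)  # (6, 8) (8,)
```

The point of the two levels is that the global RNN sees one state per chunk, not one per token, so the sequential path stays short even for long documents.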



The time complexity of self-attention is O(n^2) in the sequence length n, which makes it expensive for long documents.
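The quadratic cost is visible directly in a vanilla implementation: the score matrix has one entry per query/key pair. A minimal NumPy sketch (shapes and names are illustrative):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Vanilla scaled dot-product self-attention. The explicit n x n
    score matrix makes both time and memory quadratic in n."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])  # shape (n, n)
    return softmax(scores) @ V

rng = np.random.default_rng(0)
n, d = 16, 4
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (16, 4)
```

Every approach below attacks the `scores` matrix: reuse it across segments, compress what it attends to, sparsify it, or avoid materializing it at all.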
Transformer-XL: reuse (cache) the hidden states of previous segments as extra context for the current segment.
Compressive Transformers: keep a compressed history memory
Make the attention pattern sparse:
Content-based patterns
Reformer: locality-sensitive hashing buckets similar queries/keys; attention is computed only within the same hash bucket.
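The compressed-memory idea from the Compressive Transformer can be sketched as a two-tier FIFO: states evicted from the short-term memory are compressed rather than discarded. This sketch assumes mean-pooling as the compression function (the paper considers learned compressors too; all names here are illustrative):

```python
import numpy as np

def update_memories(mem, comp_mem, new_states, mem_size, comp_rate):
    """Compressive-Transformer-style memory update (simplified):
    states evicted from the short-term memory are mean-pooled in
    groups of comp_rate into a second, compressed memory."""
    mem = np.concatenate([mem, new_states])
    if len(mem) > mem_size:
        evicted, mem = mem[:-mem_size], mem[-mem_size:]
        # compress evicted states in groups of comp_rate
        k = len(evicted) // comp_rate * comp_rate
        groups = evicted[:k].reshape(-1, comp_rate, evicted.shape[-1])
        comp_mem = np.concatenate([comp_mem, groups.mean(axis=1)])
    return mem, comp_mem

d = 4
mem, comp_mem = np.zeros((0, d)), np.zeros((0, d))
for _ in range(5):  # process 5 segments of 8 states each
    segment = np.ones((8, d))
    mem, comp_mem = update_memories(mem, comp_mem, segment,
                                    mem_size=8, comp_rate=2)
print(mem.shape, comp_mem.shape)  # (8, 4) (16, 4)
```

Attention then reads from both memories, so old context survives at a coarser resolution instead of falling off the end of the window.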
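Reformer's bucketed attention can be sketched with random-hyperplane LSH: positions whose (tied) query/key vectors land on the same side of a few random hyperplanes share a bucket, and full attention runs only inside each bucket. This is a simplification (no multi-round hashing, chunking, or causal masking); names and sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def lsh_attention(X, Wqk, Wv, n_planes=2):
    """Reformer-style sketch: hash tied query/key vectors with random
    hyperplanes and attend only within each hash bucket."""
    QK, V = X @ Wqk, X @ Wv  # Reformer ties Q and K into one projection
    planes = rng.normal(size=(QK.shape[1], n_planes))
    bits = (QK @ planes > 0).astype(int)
    buckets = bits @ (2 ** np.arange(n_planes))  # bucket id per position
    out = np.zeros_like(V)
    for b in np.unique(buckets):
        idx = np.where(buckets == b)[0]
        scores = QK[idx] @ QK[idx].T / np.sqrt(QK.shape[1])
        out[idx] = softmax(scores) @ V[idx]
    return out

n, d = 32, 8
X = rng.normal(size=(n, d))
Wqk, Wv = rng.normal(size=(d, d)), rng.normal(size=(d, d))
out = lsh_attention(X, Wqk, Wv)
print(out.shape)  # (32, 8)
```

With roughly even buckets, each position attends to ~n / 2^n_planes others, so the cost drops from n^2 toward n * bucket_size.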

Apply a kernel feature map φ to Q and K; the attention becomes

φ(Q) (φ(K)^T V), up to a per-query normalization.

Computing φ(K)^T V first gives a d×d summary, so the cost is linear in the sequence length instead of quadratic.
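The kernel trick above can be checked numerically: with the feature map φ(x) = elu(x) + 1 (the choice in the linear-attention formulation; any positive map works for this sketch), the linear-time computation matches the explicit n × n kernel-matrix version exactly:

```python
import numpy as np

rng = np.random.default_rng(0)

def phi(x):
    """Positive feature map elu(x) + 1."""
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    """Kernelized attention phi(Q) (phi(K)^T V), normalized per query.
    phi(K)^T V is a d x d summary built once, so the cost is O(n)."""
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                # (d, d) summary
    Z = Qp @ Kp.sum(axis=0)      # per-query normalizer
    return (Qp @ KV) / Z[:, None]

def quadratic_equivalent(Q, K, V):
    """Same result via the explicit n x n kernel matrix."""
    A = phi(Q) @ phi(K).T
    return (A / A.sum(axis=1, keepdims=True)) @ V

n, d = 16, 4
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
assert np.allclose(linear_attention(Q, K, V), quadratic_equivalent(Q, K, V))
```

The equivalence holds because φ(Q)(φ(K)^T V) = (φ(Q)φ(K)^T)V by associativity; only the evaluation order, and hence the complexity, changes.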