What is the attention mechanism in deep learning?

Posted by 伏城在上海, 2021-04-07 01:51

Replies

Reply 1: 北航秦曾昌 (Qin Zengchang, Beihang University)

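In recent years deep learning has become extremely popular, and research on it keeps going deeper, setting new records in many fields where traditional methods fell short. Among these developments, the attention mechanism has gradually become a new hot topic in neural networks. This post gives a brief overview of current attention mechanisms in natural language processing (NLP).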

I. Types of Attention Mechanisms
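Based on existing research, attention mechanisms fall into three types: hard attention, soft attention, and self-attention, with soft attention further divided into global attention and local attention. Hard attention commits to a single source position, a discrete choice that cannot be trained by plain backpropagation, whereas soft attention takes a weighted average over all positions and is therefore differentiable.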
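To make the hard/soft distinction concrete, here is a minimal Python sketch. It assumes the attention scores have already been computed, and the function names (softmax, soft_attention, hard_attention) are illustrative, not taken from any paper or library. The hard variant shown samples one position from the softmax distribution, which is one common formulation; other variants simply take the highest-scoring position.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(scores, values):
    """Soft attention: a weighted average of ALL value vectors.

    Every step is a smooth function of the scores, so gradients flow
    through the weights with ordinary backpropagation.
    """
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def hard_attention(scores, values, rng):
    """Hard attention: pick ONE value vector, sampled from softmax(scores).

    The sampling step is discrete and not differentiable; models that use
    it are usually trained with sampling-based estimators instead.
    """
    weights = softmax(scores)
    idx = rng.choices(range(len(values)), weights=weights, k=1)[0]
    return list(values[idx])


# Usage: three source positions with 2-dimensional value vectors.
scores = [2.0, 0.5, -1.0]
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(soft_attention(scores, values))  # a blend of all three vectors
print(hard_attention(scores, values, random.Random(0)))  # exactly one of them
```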

II. Classic Papers on Attention Mechanisms

(1) Neural Machine Translation by Jointly Learning to Align and Translate [1]

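This paper is generally regarded as the first to apply attention in NLP. It adds attention to machine translation, improving the existing sequence-to-sequence (encoder-decoder) architecture.
The encoder and decoder can be RNNs or RNN variants. The difference is that, when decoding the target sentence, the decoder no longer relies on a single fixed context vector produced by the encoder. Instead, for each target word y_i it weighs the contribution of every encoder hidden state h_j, and the weight α_ij expresses how relevant, or important, source position j is to target position i. These weights yield a separate context vector for each target word.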
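The computation described above can be sketched as follows, assuming the additive alignment model of [1]: e_ij = v_a · tanh(W_a s_{i-1} + U_a h_j), α_ij = softmax over j of e_ij, and c_i = Σ_j α_ij h_j, where s_{i-1} is the previous decoder state. This is a minimal pure-Python sketch of one decoding step only; the parameter values in the usage lines are hand-picked placeholders, not trained weights.

```python
import math


def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(s_prev, H, W_a, U_a, v_a):
    """One decoder step i of additive (Bahdanau-style) attention.

    s_prev : previous decoder state s_{i-1}
    H      : encoder hidden states [h_1, ..., h_Tx]
    W_a, U_a, v_a : alignment-model parameters (learned in a real model)
    Returns (alpha_i, c_i): the weights alpha_ij over source positions j
    and the context vector c_i = sum_j alpha_ij * h_j.
    """
    Ws = matvec(W_a, s_prev)  # W_a s_{i-1} is shared by every source position
    energies = []
    for h_j in H:
        Uh = matvec(U_a, h_j)
        hidden = [math.tanh(a + b) for a, b in zip(Ws, Uh)]
        energies.append(sum(v * t for v, t in zip(v_a, hidden)))  # e_ij
    alpha = softmax(energies)  # normalize over j so the weights sum to 1
    dim = len(H[0])
    context = [sum(a * h[d] for a, h in zip(alpha, H)) for d in range(dim)]
    return alpha, context


# Usage with tiny hand-picked parameters (identity matrices, v_a = [1, 1]).
I2 = [[1.0, 0.0], [0.0, 1.0]]
H = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
alpha, c = additive_attention([1.0, 0.0], H, I2, I2, [1.0, 1.0])
print(alpha, c)
```

Computing W_a s_{i-1} once outside the loop reflects the fact that it does not depend on j; only the U_a h_j term changes per source position.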

(2) Effective Approaches to Attention-based Neural Machine Translation [2]

This is another highly representative paper. It generalizes the form of attention used in RNN models and proposes two attention mechanisms: a global one and a local one.
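In short, global attention follows the same idea as [1]: the decoder considers every word in the source sentence. For the concrete computation of the alignment scores, the paper proposes several alternatives (dot, general, and concat).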
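A minimal sketch of global attention with the three score functions proposed in [2]. Here h_t is the current decoder hidden state (in [2] the score uses the current state, whereas [1] uses the previous one) and the source states are the encoder outputs. The weight matrices in the usage lines are arbitrary examples, and the code covers only the attention step, not the full decoder.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(M, x):
    return [dot(row, x) for row in M]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def score(h_t, h_s, method, W_a=None, v_a=None):
    """Alignment score between decoder state h_t and source state h_s."""
    if method == "dot":      # h_t . h_s
        return dot(h_t, h_s)
    if method == "general":  # h_t . (W_a h_s)
        return dot(h_t, matvec(W_a, h_s))
    if method == "concat":   # v_a . tanh(W_a [h_t; h_s]), list + list concatenates
        return dot(v_a, [math.tanh(z) for z in matvec(W_a, h_t + h_s)])
    raise ValueError(f"unknown score method: {method}")


def global_attention(h_t, source_states, method="dot", W_a=None, v_a=None):
    """Attend over EVERY source state; return (weights a_t, context c_t)."""
    scores = [score(h_t, h_s, method, W_a, v_a) for h_s in source_states]
    a_t = softmax(scores)
    dim = len(source_states[0])
    c_t = [sum(a * h[d] for a, h in zip(a_t, source_states)) for d in range(dim)]
    return a_t, c_t


# Usage: the same query scored three ways.
src = [[1.0, 0.0], [0.0, 1.0]]
h_t = [1.0, 0.0]
print(global_attention(h_t, src, "dot"))
print(global_attention(h_t, src, "general", W_a=[[2.0, 0.0], [0.0, 1.0]]))
print(global_attention(h_t, src, "concat", W_a=[[1.0, 0.0, 1.0, 0.0]], v_a=[1.0]))
```

Note that "general" with an identity W_a reduces to "dot", which is a handy sanity check.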

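Local attention mainly aims to reduce the computational cost of global attention. Instead of considering every source word, it first uses a prediction function to estimate the source position p_t aligned with the current decoding step, and then considers only the words inside a context window around p_t.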
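The following sketch follows the predictive variant of local attention described in [2]: p_t = S · sigmoid(v_p · tanh(W_p h_t)) with S the source length, a window of positions within D of p_t, and Gaussian damping with σ = D/2. For brevity it assumes a dot-product score inside the window; W_p, v_p and D are placeholder parameters here, and D = 2 is an arbitrary choice.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def local_attention(h_t, source_states, W_p, v_p, D=2):
    """Predictive local attention for one decoder step.

    1. Predict an aligned position p_t = S * sigmoid(v_p . tanh(W_p h_t)).
    2. Keep only source positions s with |s - p_t| <= D.
    3. Score that window with a dot product, softmax it, and damp each
       weight by a Gaussian centred on p_t with sigma = D / 2.
    Returns (p_t, window, weights, context).
    """
    if D < 1:
        raise ValueError("D must be at least 1 so the window is never empty")
    S = len(source_states)
    hidden = [math.tanh(dot(row, h_t)) for row in W_p]
    p_t = S / (1.0 + math.exp(-dot(v_p, hidden)))  # real number in (0, S)
    window = [s for s in range(S) if abs(s - p_t) <= D]
    align = softmax([dot(h_t, source_states[s]) for s in window])
    sigma = D / 2
    weights = [a * math.exp(-((s - p_t) ** 2) / (2 * sigma ** 2))
               for a, s in zip(align, window)]
    dim = len(source_states[0])
    context = [sum(w * source_states[s][d] for w, s in zip(weights, window))
               for d in range(dim)]
    return p_t, window, weights, context


# Usage: 10 source states; with W_p all zeros, tanh gives 0, so p_t = 10 * sigmoid(0).
src = [[math.sin(s), math.cos(s)] for s in range(10)]
zeros = [[0.0, 0.0], [0.0, 0.0]]
p_t, window, weights, ctx = local_attention([1.0, 0.0], src, zeros, [1.0, 1.0], D=2)
print(p_t, window)  # 5.0 [3, 4, 5, 6, 7]
```

Only 2D + 1 source states are scored instead of all S, which is where the saving over global attention comes from.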

(3) Attention Is All You Need [3]

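Earlier research had already shown that adding attention to neural networks, whether RNNs or CNNs and whether on images or text, improves results substantially. Google's Attention Is All You Need goes further: it drops CNNs and RNNs altogether and, with attention alone, achieves better results on tasks such as translation. The paper also introduces a new form of attention; its main innovations are self-attention, the use of attention inside the encoder as well, and multi-head attention.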
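The core operation of [3] is scaled dot-product attention, softmax(QK^T / √d_k) V; self-attention takes Q, K and V from the same sequence, and multi-head attention runs several such attentions in parallel on different projections and concatenates the results. The minimal sketch below shows just that, using identity projection matrices as placeholders. It omits masking, positional encodings, residual connections and layer normalization, so it is not a full Transformer layer.

```python
import math


def matmul(A, B):
    """Product of matrices A (n x k) and B (k x m), both lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def softmax_rows(M):
    out = []
    for row in M:
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V; returns (output, attention weights)."""
    d_k = len(K[0])
    scores = matmul(Q, transpose(K))
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]
    weights = softmax_rows(scaled)
    return matmul(weights, V), weights


def multi_head_self_attention(X, W_Q, W_K, W_V, W_O):
    """Self-attention: queries, keys and values are all projections of X.

    W_Q, W_K, W_V: one (d_model x d_head) matrix per head.
    W_O: (num_heads * d_head) x d_model output projection.
    """
    heads = []
    for Wq, Wk, Wv in zip(W_Q, W_K, W_V):
        out, _ = scaled_dot_product_attention(matmul(X, Wq), matmul(X, Wk), matmul(X, Wv))
        heads.append(out)
    # Concatenate the heads position by position, then project back to d_model.
    concat = [sum((h[i] for h in heads), []) for i in range(len(X))]
    return matmul(concat, W_O)


# Usage: 3 tokens, d_model = 2, two heads with identity projections (illustrative only).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
I2 = [[1.0, 0.0], [0.0, 1.0]]
W_O = [[0.5, 0.0], [0.0, 0.5], [0.5, 0.0], [0.0, 0.5]]
print(multi_head_self_attention(X, [I2, I2], [I2, I2], [I2, I2], W_O))
```

Dividing by √d_k keeps the dot products from growing with the key dimension, which would otherwise push the softmax toward near one-hot weights.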
References:

[1] Bahdanau, D., Cho, K. & Bengio, Y. Neural Machine Translation by Jointly Learning to Align and Translate. ICLR 2015 (arXiv preprint, 2014).

[2] Luong, M.-T., Pham, H. & Manning, C. D. Effective Approaches to Attention-based Neural Machine Translation. EMNLP 2015, pp. 1412–1421.

[3] Vaswani, A., Shazeer, N., Parmar, N., et al. Attention Is All You Need. NIPS 2017.
