Published on 2023-04-11

Improving Implicit Sentiment Learning via Local Sentiment Aggregation

This post summarizes the key points of the paper “Improving Implicit Sentiment Learning via LSA” (2021.10).

Full paper (reading the v1 and v2 versions together is strongly recommended)

Improving Implicit Sentiment Learning via LSA: https://arxiv.org/abs/2110.08604

Demo

ABSA Quadruple Extraction: https://huggingface.co/spaces/Gradio-Blocks/Multilingual-Aspect-Based-Sentiment-Analysis

Description

Goal

The SOTA model for ABSC in 2022 (SemEval-2014 Task 4, Subtask 2)
The ABSA/ABSC task (Aspect Based Sentiment Analysis/Classification)
- Following the dataset definition, the task is split into two kinds: aspect term polarity and aspect category polarity
- Aspect term polarity: classify the sentiment toward terms that are already given.
  - Example: I hated their fajitas, but their salads were great → {fajitas: negative, salads: positive}
- Aspect category polarity: classify the sentiment toward an abstract aspect category.
  - Example: The restaurant was too expensive → {price: negative}
- The ABSC problem discussed in this paper is of the aspect term polarity kind.
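
To make the aspect term polarity setting concrete, the sketch below expands the example sentence into one classification instance per given term. The `AspectInstance` type and the `to_instances` helper are hypothetical names used only for illustration; they are not taken from the paper or from any library.

```python
from typing import NamedTuple


class AspectInstance(NamedTuple):
    """One ABSC example: a sentence, one given aspect term, and its label."""
    sentence: str
    aspect: str
    polarity: str


def to_instances(sentence, labels):
    """Expand a sentence with several labelled terms into one instance per term."""
    return [AspectInstance(sentence, aspect, polarity)
            for aspect, polarity in labels.items()]


sentence = "I hated their fajitas, but their salads were great"
instances = to_instances(sentence, {"fajitas": "negative", "salads": "positive"})
for inst in instances:
    print(inst.aspect, "->", inst.polarity)
```

The same sentence appears in both instances; only the queried aspect and the expected label differ, which is why a model needs aspect-specific features.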

Contributions

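Methods that do not rely on syntax-tree structure outperform syntax-tree methods.
- The experiments compare several syntax-based methods across different datasets, e.g. SK-GCN-BERT (2020), DGEDT-BERT (2020), ASGCN-RoBERTa (2021), and SARL-RoBERTa (2021).
- A major problem with syntax trees is that the tokens produced by the syntax parser have an alignment issue with BERT tokens.
- The methods proposed in this paper, $LSA_{T}-X-DeBERTa$ and $LSA_{S}-X-DeBERTa$, both beat the syntax-based methods above by roughly 2-8% F1. (X denotes the large model)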

syntax tree example


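LSA (Local Sentiment Aggregation) is a general paradigm for ABSC.
- Swapping the pre-trained model (BERT, RoBERTa, DeBERTa) in the experiments gives a fairly notable improvement of 0.5%-1% in every case, so the authors argue that LSA is reasonably extensible and flexible.
- LSA is a general architecture for capturing the local sentiment around an aspect; it extracts the aspect features separately, and the details are covered below.
Differential weighting strategy → lets LSA optimize the adjacent sentiment values with gradient descent.
- This strategy mainly applies when there are multiple aspects: the clause between two aspects affects both of them when their aspect features are computed, so the features are multiplied by weights $\eta^*_l$ and $\eta^*_r$, which are trained automatically to learn which neighboring aspect has more influence.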

Methodology

Sentiment Pattern

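The authors examined five well-known ABSC datasets and found that sentiments tend to cluster, which led to the idea of extracting adjacent sentiment (sentiment coherency).
Adjacent sentiment helps with harder implicit-sentiment cases and can eliminate classification errors caused by noisy words.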

This example contains two sentiment clusters


Local Sentiment Aggregation (LSA)

LSA uses a sentiment aggregation window to extract the adjacent sentiment described above.
Specifically, it uses three local sentiment feature representations:
- $LSA_P$: BERT-SPC based feature
  - Appends the aspect after the sentence and relies on BERT's built-in attention to extract the aspect feature
  - Example: [CLS] text [SEP] aspect [SEP]
  BERT-SPC based feature
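- $LSA_T$: Local context focus (LCF) based feature
  - Derives the aspect feature from the relative distance between each token and the aspect
  Detailed computation
  - Definitions
    $\{w_1^C,w_2^C,...,w_n^C\}$ is the token sequence
    $H^C_{w_i^C}$ (a learned vector) is the hidden state at each token position
    $d_{w_i^C}$ (a fixed value) is the distance between token $w_i^C$ and the aspect
    $\alpha$ (a hyperparameter) is the distance threshold, usually set to 3
    $H^*_{w_i^C}$: the aspect feature
  - Computation
    When the token is close to the aspect ( $d_{w_i^C}<\alpha$ ), the aspect feature is simply the current $H^C_{w_i^C}$
    When the token is far from the aspect ( $d_{w_i^C}\geq\alpha$ ), $H^C_{w_i^C}$ is multiplied by the distance penalty $(1-\frac{d_{w_i^C}-\alpha}{n})$ to give the aspect feature
    $d_{w_i^C}$ is the mean absolute distance between token $w_i^C$ and all aspect tokens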
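- $LSA_S$: Syntactical local context focus (LCFS) based feature
  - The distance becomes the shortest distance from each token to the aspect in the syntax tree
  Detailed computation
  - Computation
    $H^*_{w_i^C}$ is computed the same way as in $LSA_T$
    $d_{w_i^C}=\frac{\sum_{j=1}^{m}dist(w_i^C,w_j^a)}{m}$, where $w_j^a$ is the $j$-th aspect token and $m$ is the number of aspect tokens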
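
The two distance definitions and the shared penalty can be sketched in plain Python as follows. This is a minimal illustration under several assumptions, not the authors' implementation: tokens are whole words rather than BERT sub-words, hidden states are plain lists of floats, and the dependency edges are a hand-written parse of a made-up sentence. All function names are hypothetical.

```python
from collections import deque


def token_distances(n_tokens, aspect_positions):
    """LSA_T style: mean absolute position distance to the aspect tokens."""
    m = len(aspect_positions)
    return [sum(abs(i - j) for j in aspect_positions) / m
            for i in range(n_tokens)]


def tree_distances(n_tokens, edges, aspect_positions):
    """LSA_S style: mean shortest-path distance in an undirected syntax tree."""
    neighbours = [[] for _ in range(n_tokens)]
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)
    totals = [0.0] * n_tokens
    for start in aspect_positions:
        hops = {start: 0}
        queue = deque([start])
        while queue:  # breadth-first search from one aspect token
            node = queue.popleft()
            for nxt in neighbours[node]:
                if nxt not in hops:
                    hops[nxt] = hops[node] + 1
                    queue.append(nxt)
        for i in range(n_tokens):
            totals[i] += hops[i]  # assumes the tree is connected
    return [t / len(aspect_positions) for t in totals]


def distance_weights(distances, alpha=3):
    """Weight 1 inside the threshold, 1 - (d - alpha) / n outside it."""
    n = len(distances)
    return [1.0 if d < alpha else 1.0 - (d - alpha) / n for d in distances]


def apply_weights(hidden_states, weights):
    """Scale each token's hidden state by its weight to get the aspect feature."""
    return [[w * x for x in h] for h, w in zip(hidden_states, weights)]


tokens = "the salads at this place were really great".split()
aspect = [1]  # position of "salads"
# Hand-written dependency edges (an assumption, not parser output).
edges = [(0, 1), (1, 7), (1, 4), (2, 4), (3, 4), (5, 7), (6, 7)]

w_token = distance_weights(token_distances(len(tokens), aspect))
w_tree = distance_weights(tree_distances(len(tokens), edges, aspect))
hidden = [[1.0, 1.0] for _ in tokens]  # dummy hidden states
features = apply_weights(hidden, w_token)
print(w_token)
print(w_tree)
```

In this toy sentence the opinion word "great" is six positions away from "salads", so the position-based weight shrinks it, while in the hand-written tree it is a direct neighbor of the aspect and keeps its full weight. That difference is the motivation for the syntax-based variant.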

Sentiment Aggregation Window

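The sentiment aggregation window concatenates the aspect features around the aspect (left, right, text indices)
Aggregation Window Padding
- During concatenation the aspect may sit right at the sentence boundary, so one side of the window is missing. The authors pad by copying the target feature instead of the usual empty-value padding.
- Example
  - Neither side available: $\left[null,H^t,null\right]$ → $\left[H^t,H^t,H^t\right]$
  - Right side unavailable: $\left[H^l,H^t,null\right]$ → $\left[H^l,H^t,H^t\right]$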
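
The copy-padding rule can be written as a small helper. This is only a sketch: `pad_window` is a hypothetical name, `None` stands in for the missing side, and each feature is simplified to a short list of floats.

```python
def pad_window(left, target, right):
    """Build [left, target, right], copying the target into any missing side."""
    return [
        target if left is None else left,
        target,
        target if right is None else right,
    ]


h_l = [0.3, 0.4]  # dummy left aspect feature
h_t = [0.1, 0.2]  # dummy target aspect feature

both_missing = pad_window(None, h_t, None)
right_missing = pad_window(h_l, h_t, None)
print(both_missing)
print(right_missing)
```

Copying the target keeps the window shape fixed without injecting an all-zero slot that carries no sentiment information.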
Differential Weighted Aggregation
- With multiple aspects, a clause can be shared between aspects, so the aspect features conflict; in that case $\eta^*_l$ and $\eta^*_r$ serve as the weights of the left and right aspect features respectively.
- Aggregated hidden state
$H^o_{dwa}=\left[\eta^*_l\{H^l_k\};H^t;\eta_r^*\{H_k^r\}\right]$
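
As a rough illustration of the formula above, the sketch below scales the left and right features and concatenates them with the target feature. It assumes each feature is a single vector stored as a list of floats, and the weight values are arbitrary; in a real model $\eta^*_l$ and $\eta^*_r$ would be trainable parameters updated by gradient descent. The name `weighted_aggregate` is hypothetical.

```python
def weighted_aggregate(h_left, h_target, h_right, eta_left, eta_right):
    """Concatenate eta_left * left, the target, and eta_right * right."""
    scaled_left = [eta_left * x for x in h_left]
    scaled_right = [eta_right * x for x in h_right]
    return scaled_left + list(h_target) + scaled_right


h_left = [1.0, 2.0]
h_target = [3.0, 4.0]
h_right = [5.0, 6.0]

# Arbitrary weights chosen for the example, not learned values.
h_dwa = weighted_aggregate(h_left, h_target, h_right, eta_left=0.5, eta_right=2.0)
print(h_dwa)
```

The target feature is left unscaled, so the two weights only control how much each neighboring aspect contributes relative to it.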

Conclusion

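This paper proposes LSA as a way to extract local sentiment and validates its effectiveness on multiple datasets.
$LSA_S$ relies on the syntax-tree structure; because of the token alignment problem mentioned above, the authors do not recommend it.
One less intuitive part of the paper is the left and right aspect features; the authors ran experiments comparing removing them against keeping them, as shown in the table below: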

Comparison of aggregation window architectures
