WizardLM - Empowering Large Language Models to Follow Complex Instructions

This post summarizes the key points of the paper “WizardLM: Empowering Large Language Models to Follow Complex Instructions” (2023.04)

Full paper reference

Demo reference

Description

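Recent LLaMA-family models take their instructions either from plain self-instruct generation or from relatively simple human-written instructions → the data lacks diversity
Inspired by ChatGPT & GPT-4, both of which rely on large amounts of high-quality human annotation to ensure data diversity → the authors look for a low-cost way to obtain high-quality instructions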

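On complex instructions, the authors claim better performance than ChatGPT. One caveat: there is no solid benchmark behind this claim (only 63 test cases, with part-time student annotators judging the quality of the outputs)
WizardLM win-rate comparison on complex instructions
Across instructions of every difficulty level, WizardLM outperforms Alpaca and Vicuna, which share the same LLaMA architecture. This suggests that data diversity may help the model learn better
The paper proposes a new way to generate instructions, which the authors call “Evol-Instruct”. Unlike plain self-instruct, it evolves each instruction into new, more complex branches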

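The core idea of “Evol-Instruct” is to take existing simple instructions and use the ChatGPT API to rewrite them into more advanced instructions, which then serve as model training data
“Evol-Instruct” has two main categories: In-depth Evolving (blue) and In-breadth Evolving (red)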

In-depth Evolving is further divided into 5 sub-types; the prompt used to generate each one is listed below
- add constraints
  Please add one more constraints/requirements into #Given Prompt#
- deepening
  If #Given Prompt# contains inquiries about certain issues, the depth and breadth of the inquiry can be increased
- concretizing
  Please replace general concepts with more specific concepts.
- increase reasoning steps
  If #Given Prompt# can be solved with just a few simple thinking processes, you can rewrite it to explicitly request multiple-step reasoning.
- complicate input
  You should try your best not to make the #Rewritten Prompt# become verbose, #Rewritten Prompt# can only add 10 to 20 words into #Given Prompt#.
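
The five in-depth operations above can be sketched as a small prompt builder. This is a minimal sketch, not the authors' code: `IN_DEPTH_METHODS` and `build_in_depth_prompt` are hypothetical names, the operation texts are taken from the list above (with the placeholder spelled `#Given Prompt#` throughout), and the wrapper sentence and section labels around them are assumptions.

```python
# Hypothetical sketch: the operation texts are taken from the list above;
# the wrapper sentence and section labels are assumptions for illustration.
IN_DEPTH_METHODS = {
    "add_constraints": (
        "Please add one more constraints/requirements into #Given Prompt#"
    ),
    "deepening": (
        "If #Given Prompt# contains inquiries about certain issues, "
        "the depth and breadth of the inquiry can be increased"
    ),
    "concretizing": "Please replace general concepts with more specific concepts.",
    "increase_reasoning_steps": (
        "If #Given Prompt# can be solved with just a few simple thinking "
        "processes, you can rewrite it to explicitly request multiple-step "
        "reasoning."
    ),
    "complicate_input": (
        "You should try your best not to make the #Rewritten Prompt# become "
        "verbose, #Rewritten Prompt# can only add 10 to 20 words into "
        "#Given Prompt#."
    ),
}


def build_in_depth_prompt(instruction: str, method: str) -> str:
    """Build a rewriting prompt that applies one in-depth operation."""
    if method not in IN_DEPTH_METHODS:
        raise KeyError(f"unknown in-depth method: {method}")
    return (
        "Rewrite the #Given Prompt# into a more complex version.\n"  # assumed wrapper
        f"{IN_DEPTH_METHODS[method]}\n\n"
        f"#Given Prompt#:\n{instruction}\n\n"
        "#Rewritten Prompt#:\n"
    )


if __name__ == "__main__":
    print(build_in_depth_prompt("Explain what a hash table is.", "concretizing"))
```

The resulting string is what would be sent to the rewriting model; keeping one entry per operation makes it easy to apply a different operation in each evolution round.
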
In-breadth Evolving increases instruction complexity in terms of breadth; the prompt used to generate it is listed below
Your goal is to draw inspiration from the #Given Prompt# to create a brand new prompt. This new prompt should belong to the same domain as the #Given Prompt# but be even more rare. The LENGTH and difficulty level of the #Created Prompt# should be similar to that of the #Given Prompt#. The #Created Prompt# must be reasonable and must be understood and responded by humans. ‘#Given Prompt#’, ‘#Created Prompt#’, ‘given prompt’ and ‘created prompt’ are not allowed to appear in #Created Prompt#.
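
The in-breadth prompt above can be wrapped the same way, and its last sentence (the placeholder names must not appear in the created prompt) suggests a simple check on the model's output. The sketch below is illustrative only: `build_in_breadth_prompt` and `leaks_placeholder` are hypothetical names, the section labels appended to the prompt are an assumption, and the source does not say whether the authors apply this exact check.

```python
# Prompt text copied from the passage above; the trailing section labels
# and every function name here are assumptions for illustration.
IN_BREADTH_INSTRUCTION = (
    "Your goal is to draw inspiration from the #Given Prompt# to create a "
    "brand new prompt. This new prompt should belong to the same domain as "
    "the #Given Prompt# but be even more rare. The LENGTH and difficulty "
    "level of the #Created Prompt# should be similar to that of the "
    "#Given Prompt#. The #Created Prompt# must be reasonable and must be "
    "understood and responded by humans. ‘#Given Prompt#’, ‘#Created Prompt#’, "
    "‘given prompt’ and ‘created prompt’ are not allowed to appear in "
    "#Created Prompt#."
)


def build_in_breadth_prompt(instruction: str) -> str:
    """Build the prompt asking for a brand new instruction in the same domain."""
    return (
        f"{IN_BREADTH_INSTRUCTION}\n\n"
        f"#Given Prompt#:\n{instruction}\n\n"
        "#Created Prompt#:\n"
    )


def leaks_placeholder(created_prompt: str) -> bool:
    """Return True if the output contains one of the forbidden phrases."""
    lowered = created_prompt.lower()
    # "given prompt" also matches "#Given Prompt#"; same for "created prompt".
    return "given prompt" in lowered or "created prompt" in lowered


if __name__ == "__main__":
    print(build_in_breadth_prompt("Explain how tides work."))
    print(leaks_placeholder("Rewrite the #Given Prompt# about tides."))
```
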

Evol-Instruct Example

Overview of Evol-Instruct

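Response Generation: because an instruction evolved by “Evol-Instruct” may call for an answer very different from the original question's, the authors make another round of ChatGPT API calls to generate the responses, forming complete QA pairs for training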
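
The response-generation step can be sketched as pairing each evolved instruction with a freshly generated answer. This is a minimal sketch under stated assumptions: `complete` is a hypothetical callable standing in for the ChatGPT API call, the `instruction`/`output` record layout is illustrative, and skipping empty responses is an assumption that the source does not describe.

```python
from typing import Callable, Dict, List


def build_training_pairs(
    evolved_instructions: List[str],
    complete: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Ask the model for a new answer to every evolved instruction."""
    pairs = []
    for instruction in evolved_instructions:
        response = complete(instruction).strip()
        if not response:
            # Assumption: drop empty answers instead of training on them.
            continue
        pairs.append({"instruction": instruction, "output": response})
    return pairs


if __name__ == "__main__":
    # Stub model for demonstration only; a real run would call an LLM API here.
    def fake_complete(prompt: str) -> str:
        return f"(answer to: {prompt})"

    for pair in build_training_pairs(["Explain how tides work."], fake_complete):
        print(pair)
```

Passing the model call in as a function keeps the sketch independent of any particular API client and makes it testable with a stub.
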

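This paper proposes a way to improve instructions: it uses the rewriting ability of the ChatGPT API to obtain more advanced instructions, together with their corresponding outputs, as training data
Its ability to outperform recently popular models such as Alpaca and Vicuna can be credited to successful engineering. The data collection is fairly brute-force, but raising a good alpaca takes good feed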