2023 LLM Industry-Academia Technical Exchange Conference
These are lecture notes from the 2023 LLM Industry-Academia Technical Exchange Conference.
Keynote (Academia Sinica Academician 孔祥重 & MediaTek Vice President 林宗瑤)
Embedding vectors
- Large Language Models: text → Embedding
- Multimodal Models: (text, image) → text and image embeddings
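Below is a minimal sketch, assuming Hugging Face `transformers` and the public `openai/clip-vit-base-patch32` checkpoint (my choice of model, not one named in the talk), of how a multimodal model maps text and an image into a shared embedding space:

```python
# Hedged sketch: embed a caption and an image with CLIP into the same vector space.
# Assumes `pip install torch transformers pillow` and access to the public checkpoint.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("cat.jpg")  # placeholder image path
inputs = processor(text=["an orange cat"], images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    out = model(**inputs)

text_emb = out.text_embeds    # shape (1, 512)
image_emb = out.image_embeds  # shape (1, 512)
# Cosine similarity between the two modalities in the shared embedding space
print(torch.nn.functional.cosine_similarity(text_emb, image_emb).item())
```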
Tackled by old master
- Forward Knowledge: Causes → Observed Effects
- Abundant data obtained through observation
- Inverse problems: Observed Effects (e.g., Defect Image) → Causes
- Difficult tasks & sparse data
- With Forward Knowledge, the work can be divided into three tasks
- Input Known or novel?
- Cluster seen images into embedding space
- Form medoid-text-prompt-defined classes
- Classify the input or create a new class
- Incorporate new classes into knowledge base
- Fine-tune the model (e.g., CLIP)
- Methods
- Static classifier (before 2021)
- Costly training & predict only seen classes
- Dynamic classifier (Recent)
- Pre-trained vision-language model (e.g., CLIP) & text-prompt-defined downstream classifier for any target classes
Sensitive to prompt details
- Chosen prompts may be misaligned with image class distributions
- Text Prompt Example
- “Orange cat wearing bowtie”: tie (20%), cat & tie (80%)
- “Orange cat wearing a bowtie”: cat & tie (100%)
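A hedged sketch of a text-prompt-defined (dynamic) classifier in the spirit of the example above: the class distribution is just a softmax over image-prompt similarities, so it shifts with prompt wording. The prompts, image path, and checkpoint are illustrative, not the speaker's exact setup:

```python
# Zero-shot classification with text-prompt-defined classes; compare two prompt wordings.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
image = Image.open("orange_cat_with_bowtie.jpg")  # placeholder path

for prompts in (["a bowtie", "orange cat wearing bowtie"],
                ["a bowtie", "orange cat wearing a bowtie"]):
    inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # shape (1, num_prompts)
    probs = logits.softmax(dim=-1)[0]              # prompt-defined class probabilities
    print(dict(zip(prompts, probs.tolist())))
```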
Challenges of moving LLMs to the edge
- Analytic AI vs. Generative AI
- Parameter count: <10M vs. >1000M
- Inference compute: 1-10s of TOPS vs. 100s-1000s of TOPS
- e.g., LLaMA-7B needs about 40 TOPS for 512 words/sec
- Bandwidth: 70 GB/sec for 10 words/sec (rough arithmetic sketched below)
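Rough arithmetic behind the bandwidth figure, assuming every weight must be streamed once per generated token and roughly one byte per weight (my assumptions; the talk only quoted the final numbers):

```python
# Memory-bandwidth estimate for on-device LLaMA-7B decoding (toy arithmetic, not a benchmark).
params = 7e9                 # LLaMA-7B weight count
bytes_per_weight = 1         # assume ~INT8 storage; FP16 would double this
tokens_per_sec = 10          # target generation speed

bytes_per_token = params * bytes_per_weight          # every weight is read once per token
bandwidth = bytes_per_token * tokens_per_sec / 1e9   # GB/s
print(f"~{bandwidth:.0f} GB/s")                      # ~70 GB/s, matching the slide's figure
```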
- SW/HW co-optimization
- Method
- Pruning: take advantage of sparsity
- Quantization: enable low bit-widths, from FP32 down to INT4
- Compression: reduce the memory footprint; weights are decompressed on-the-fly in the APU
- Benefit
- >60% memory footprint and access reduction
- >3X performance improvement
- 3 token/sec → 10 token/sec
- Drawback
- Quality loss is an issue
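A toy sketch of the FP32 → INT4 quantization step using naive per-tensor symmetric rounding; production flows use per-group scales plus the on-the-fly APU decompression mentioned above, so this only illustrates the idea (and the source of the quality loss):

```python
# Naive symmetric INT4 quantization of FP32 weights (values clipped to [-8, 7]).
import numpy as np

def quantize_int4_symmetric(w: np.ndarray):
    scale = np.abs(w).max() / 7.0                             # map the largest magnitude to 7
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)   # 4-bit values stored in int8
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4, 8).astype(np.float32)
q, scale = quantize_int4_symmetric(w)
w_hat = dequantize(q, scale)
print("max abs reconstruction error:", np.abs(w - w_hat).max())
```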
Invited Talks (NYCU Professor 陳添福 & NCU Professor 蔡宗翰 & NTU Professor 李宏毅)
LLM
- LLM Pipeline: LLM → Fine-tune → Optimization → Deployment
- Open LLM
- Falcon-7B
- GPU usage: ~15GB
- Training data: 1.5T tokens
- Extra technology: FlashAttention and multi-query attention
- LLaMA 2-7B
- GPU usage: ~10GB
- Training data: 2.0T tokens
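For reference, a hedged sketch of loading one of these open models in FP16 with Hugging Face `transformers`; the gated meta-llama checkpoint and generation settings are my assumptions. FP16 weights cost about 2 bytes per parameter, so a 7B model needs roughly 14 GB for weights alone:

```python
# Minimal sketch: load an open 7B chat model in FP16 and generate a short reply.
# Assumes access to the gated meta-llama weights on the Hub and `pip install accelerate`.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # ~2 bytes per parameter
    device_map="auto",           # let accelerate place weights on the available GPU(s)
)

inputs = tokenizer("Explain multi-query attention in one sentence.", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```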
- TAIDE (Trustworthy AI Dialogue Engine)
- Dataset
- Training Dataset (3.1B tokens)
- rm-static-zh
- alpaca-zh by NTU
- 教育部國語辭典 (Ministry of Education Mandarin Chinese Dictionary)
- reliable_source_news
- oots_zh_wiki
- 科技大擂台_訓練資料集 & 測試資料集 (Formosa Grand Challenge training & test sets)
- Formosa Language Understanding Dataset (FLUD)
- Fine-tuning dataset (42w ≈ 420k examples)
- Model
- LLaMA2-13B-Chat → CP (continued pre-training, 3.1B tokens) → fine-tune (42w) → Taide-LLaMA2-13B-Chat
- Method
- multi-node training
- DeepSpeed
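The talk only named multi-node training with DeepSpeed; the following is a generic sketch of wrapping a model via `deepspeed.initialize` under a ZeRO config. All config values, the optimizer choice, and the model id are placeholders, not TAIDE's actual settings:

```python
# Generic DeepSpeed setup sketch; launch with e.g. `deepspeed --num_gpus=8 train.py`
# (multi-node runs additionally use a hostfile). Assumes access to the gated checkpoint.
import deepspeed
from transformers import AutoModelForCausalLM

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 16,
    "bf16": {"enabled": True},
    "optimizer": {"type": "AdamW", "params": {"lr": 2e-5}},
    "zero_optimization": {"stage": 2},   # ZeRO stage is a placeholder choice
}

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-13b-chat-hf")
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

# Inside the training loop (pseudo-batch):
# loss = engine(batch["input_ids"], labels=batch["labels"]).loss
# engine.backward(loss)
# engine.step()
```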
- Custom LLM == foundation model + custom fine-tuning data + target deployment scenario
- LLM + fine-tuning (PEFT) → Custom LLM + SFT data → Optimization → Deployment
- Training LLM efficiently
- LoRA (Low-Rank Adaptation)
- (IA)^3 (Infused Adapter by Inhibiting and Amplifying Inner Activations)
- UniPEFT (Unified PEFT)
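A hedged sketch of the LoRA route using the `peft` library; the base model, rank, and target modules are illustrative defaults rather than values from the talk:

```python
# LoRA fine-tuning setup: freeze the base model, train small low-rank adapters.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
lora_cfg = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                     # low-rank dimension
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections in LLaMA-style blocks
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # typically well under 1% of the full parameter count
# `model` can now go into a normal Trainer / training loop; only the adapters get gradients.
```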
- Compress LLM
- Knowledge distillation (a minimal loss sketch follows this list)
- Attention-Guided Distillation: force the student to pay more attention to what the teacher focuses on
- Intermediate representation Distillation: student learns teacher’s inference process and output distribution
- Pruning
- LLM-Pruner
- ZipLM
- Wanda
- Sparsity
- SparseGPT
- Sparse-Quantized Representation (SpQR)
- SqueezeLLM
- Low-precision inference (FP16, BF16)
- Quantization
- INT8, INT4 by PTQ, QAT
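A minimal sketch of the basic response-based distillation loss underlying these methods (attention-guided and intermediate-representation variants add extra matching terms); the temperature and weighting are illustrative:

```python
# Response-based knowledge distillation loss (soft targets with temperature).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend hard-label cross-entropy with KL to the teacher's softened distribution."""
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_soft_student = F.log_softmax(student_logits / T, dim=-1)
    kd = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

# Toy shapes: batch of 4, vocabulary of 10
student_logits = torch.randn(4, 10, requires_grad=True)
teacher_logits = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
print(distillation_loss(student_logits, teacher_logits, labels))
```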
- MoE (mixture-of-experts)
- Benefit
- Good for model parallelism
- Better on Knowledge-heavy tasks
- Drawback
- Worse on reasoning tasks
- slower at transferring knowledge
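A toy top-2 mixture-of-experts layer, sketched to show why MoE suits model parallelism: the gate routes each token to only a few experts, which can live on different devices. Dimensions and gating details are illustrative:

```python
# Minimal top-2 mixture-of-experts layer: a learned gate picks 2 experts per token.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=128, n_experts=4, k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.gate = nn.Linear(d_model, n_experts)
        self.k = k

    def forward(self, x):                              # x: (tokens, d_model)
        scores = F.softmax(self.gate(x), dim=-1)
        topv, topi = scores.topk(self.k, dim=-1)
        topv = topv / topv.sum(dim=-1, keepdim=True)   # renormalize the kept gate weights
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topi[:, slot] == e              # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += topv[mask, slot, None] * expert(x[mask])
        return out

moe = TinyMoE()
print(moe(torch.randn(8, 64)).shape)  # torch.Size([8, 64])
```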
Speech ChatGPT
- Speech LM
- Unlike today's popular decoder-only LLM architecture, a decoder-only model cannot align speech effectively for speech tasks (e.g., different intonations should map to the same class), so an encoder-decoder architecture is used
- Text prompts enable a variety of speech classification tasks; the prompt can also be replaced with embeddings learned via gradient descent (see the sketch below)
- There is some transfer ability to unseen labels, but it is limited
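A hedged sketch of the learned-prompt idea: trainable prompt embeddings are prepended to a frozen encoder's input and optimized by gradient descent. The `nn.TransformerEncoder` backbone and all dimensions are stand-ins, not the actual speech model:

```python
# Soft-prompt tuning sketch: learn prompt vectors while keeping the backbone frozen.
import torch
import torch.nn as nn

d_model, prompt_len, n_classes = 256, 8, 5

backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True), num_layers=2
)
for p in backbone.parameters():          # freeze the (pretrained-like) encoder
    p.requires_grad = False

soft_prompt = nn.Parameter(torch.randn(1, prompt_len, d_model) * 0.02)  # trainable "prompt"
classifier = nn.Linear(d_model, n_classes)
optimizer = torch.optim.Adam([soft_prompt, *classifier.parameters()], lr=1e-3)

speech_features = torch.randn(4, 50, d_model)     # placeholder acoustic features (B, T, D)
labels = torch.randint(0, n_classes, (4,))

# One training step: prepend the soft prompt, pool, classify, update only prompt + head.
optimizer.zero_grad()
x = torch.cat([soft_prompt.expand(4, -1, -1), speech_features], dim=1)
logits = classifier(backbone(x).mean(dim=1))
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()
optimizer.step()
print(float(loss))
```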
Conclusion
- A major highlight of this talk was the first public release of training details for TAIDE, the self-proclaimed "Taiwanese ChatGPT": Meta's latest LLaMA2-13B-Chat model was trained for 100 days on dozens of V100 GPUs to obtain Taide-LLaMA2-13B-Chat, with the following claims
- Claimed to be a Taiwan-localized model (because the average probability it assigns to Traditional Chinese is higher than the average it assigns to Simplified Chinese)
- Claimed not to forget English after learning Chinese, unlike Taiwan-LLaMA2 (thanks to a large amount of ChatGPT-generated Chinese-English alignment data)
- Claimed to be a clean model that, unlike Taiwan-LLaMA2, will not answer drug-related questions (because Taiwan-LLaMA2 picked up assorted pornographic and loan-advertisement content from the zh_TW_c4 pre-training data)
- Claimed that a commercial model, TAIDE-LLaMA2-C, will be released in the future