Llama 3 zhtw

A Chinese continued-pretraining (CP) experiment on Llama 3, trained on a total of 800M tokens.

由於中文預訓練θͺžζ–™ε“θ³ͺι‚„ζœ‰ζ”Ήι€²η©Ίι–“οΌŒCP 後葨現ζœͺθƒ½θΆ…θΆŠεŽŸη‰ˆ Llama 3οΌŒζˆ‘ε€‘ζ―”θΌƒεΉΎε€‹ι–‹ζΊη€ΎηΎ€θ¨“η·΄ηš„δΈ­ζ–‡ Llama 3 δΉŸζœ‰ι‘žδΌΌη‹€ζ³γ€‚

εœ¨θ‹±ζ–‡ζ–Ήι’ LLaMA 3 zhtw 使用 FineWebοΌŒδ½ΏεΎ— MMLU θ‘¨ηΎι«˜ζ–Όε…Άδ»–δΈ­ζ–‡CPζ¨‘εž‹οΌŒθƒ½εŠ›θˆ‡εŽŸη‰ˆ LLaMA 3 ζŒεΉ³γ€‚

Benchmarks

All scores are 5-shot accuracy (ACC) on knowledge benchmarks: TC = Traditional Chinese, CN = Simplified Chinese, EN = English.

| Model | Size | TMMLU+ (TC) | CMMLU (CN) | MMLU (EN) |
|---|---|---|---|---|
| Yi-6B | 6B | 49.63 | 75.53 | 65.35 |
| Qwen-7B | 7B | 42.84 | 73.10 | 61.00 |
| Meta-Llama-3-8B | 8B | 41.97 | 50.80 | 65.17 |
| p208p2002/llama-3-zhtw-8B | 8B | 41.84 | 50.60 | 65.31 |
| Breeze-7B-Base-v0_1 | 7B | 40.35 | 44.05 | 61.63 |
| hfl/llama-3-chinese-8b | 8B | 39.64 | 50.90 | 61.10 |
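
Scores like these can typically be reproduced with EleutherAI's lm-evaluation-harness. The sketch below assumes the `tmmluplus`, `cmmlu`, and `mmlu` task names shipped with recent harness versions; check `lm_eval --tasks list` for the names in your install:

```python
# Evaluation sketch with lm-evaluation-harness (pip install lm-eval).
# The task names below are assumptions about your installed harness version.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=p208p2002/llama-3-zhtw-8B,dtype=bfloat16",
    tasks=["tmmluplus", "cmmlu", "mmlu"],
    num_fewshot=5,
)
for task, metrics in results["results"].items():
    print(task, metrics)
```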

Recipe

Datasets

| Dataset | Lang | Weight |
|---|---|---|
| FineWeb | en | 0.35 |
| Wudao | zh-cn | 0.10 |
| C4Tw | zh-tw | 0.10 |
| WikiZhTw | zh-tw | 0.15 |
| NdltdT10 | zh-tw | 0.10 |
| GitHubMarkDown | code | 0.10 |
| GitHubPython | code | 0.10 |
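
One way to realize such a weighted mixture is probability-based interleaving with the `datasets` library. In this sketch only FineWeb's hub id is real; the other ids are hypothetical placeholders standing in for the corpora in the table:

```python
# Sketch of a weighted data mixture with `datasets.interleave_datasets`.
# Only HuggingFaceFW/fineweb is a real hub id; the rest are placeholders.
from datasets import load_dataset, interleave_datasets

sources = {                           # hub id -> sampling weight
    "HuggingFaceFW/fineweb": 0.35,
    "example/wudao": 0.10,            # placeholder
    "example/c4-tw": 0.10,            # placeholder
    "example/wiki-zh-tw": 0.15,       # placeholder
    "example/ndltd-t10": 0.10,        # placeholder
    "example/github-markdown": 0.10,  # placeholder
    "example/github-python": 0.10,    # placeholder
}
streams = [load_dataset(repo, split="train", streaming=True) for repo in sources]
mixture = interleave_datasets(
    streams,
    probabilities=list(sources.values()),
    seed=42,
    stopping_strategy="all_exhausted",  # keep sampling until every source is used up
)
```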

Hyperparameters

  • Learning Rate: 1e-7
  • Global Batch Size: 60
  • Sequence Length: 8192
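
Expressed as transformers `TrainingArguments`, these settings might look like the sketch below; how the global batch of 60 is split across devices and gradient accumulation is an assumption, since only the totals come from the card:

```python
# Sketch of the reported hyperparameters as transformers TrainingArguments.
# The per-device/accumulation split of the global batch is an assumption.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="llama-3-zhtw-cp",
    learning_rate=1e-7,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=60,  # 1 device Γ— 1 Γ— 60 = global batch 60
    bf16=True,
)
# The 8192-token sequence length is applied when tokenizing/packing the
# corpus, not via TrainingArguments.
```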