---
base_model:
  - meta-llama/Meta-Llama-3-8B
  - meta-llama/Meta-Llama-3-8B-Instruct
  - rinna/llama-3-youko-8b
  - rinna/llama-3-youko-8b-instruct
  - tokyotech-llm/Llama-3-Swallow-8B-v0.1
  - tokyotech-llm/Llama-3-Swallow-8B-Instruct-v0.1
  - shisa-ai/shisa-v1-llama3-8b
  - lmg-anon/vntl-llama3-8b-v2-qlora
library_name: transformers
tags:
  - mergekit
  - merge
  - translation
  - japanese_media
  - otaku_media
  - visual_novels
  - VNs
language:
  - en
  - ja
---

# Llama-3-VNTL-Yollisa-8B

This is a merge of pre-trained language models created using [mergekit](https://github.com/arcee-ai/mergekit).

This merge expands on the idea of merging finetunes at extremely low weight as an alternative to finetuning, with the added step of subtracting the base model from each finetune before merging. The instruct format is the custom Llama 3 variant that VNTL uses, but you should be able to mix in regular Llama 3 formats as well; with the right prompt, that may even improve translation quality.
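As a hedged illustration of the "regular Llama 3" side of that mix, the sketch below renders the stock Llama 3 Instruct format with `transformers`. The repo id and system prompt are assumptions, this assumes the merged tokenizer ships the standard Llama 3 chat template, and VNTL's own custom format is documented with lmg-anon/vntl-llama3-8b-v2-qlora rather than reproduced here.

```python
# A minimal sketch, not the card's official usage: build a plain
# Llama 3 Instruct prompt. Repo id and system prompt are assumptions.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Casual-Autopsy/Llama-3-VNTL-Yollisa-8B")

messages = [
    {"role": "system", "content": "Translate the Japanese line into natural English."},
    {"role": "user", "content": "まさか、こんなところで会うとはね。"},
]

# Renders the standard Llama 3 Instruct markup:
# <|start_header_id|>...<|end_header_id|> ... <|eot_id|>
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```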

## Usage

### Samplers

No recommended sampler settings yet. Stick with `temp: 0` or `top_k: 1` (i.e. deterministic, greedy decoding) for now.
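In `transformers` terms, both settings collapse to greedy decoding. A minimal sketch, again assuming the repo id:

```python
# A minimal sketch: greedy decoding, equivalent to temp: 0 / top_k: 1.
# The repo id and the prompt are placeholder assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Casual-Autopsy/Llama-3-VNTL-Yollisa-8B"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "Translate into English: まさか、こんなところで会うとはね。"  # placeholder
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

out = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=False,  # greedy decoding: same effect as temp 0 / top_k 1
)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```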

## Configuration

The following YAML configurations were used to produce this model. The merge is staged: first an SCE merge that builds Llama-3-Yollow-8B from the Youko and Swallow bases, then four task-arithmetic merges that subtract Meta-Llama-3-8B out of each instruct finetune, and finally a TIES merge that layers those deltas at very low weight onto Yollow with the VNTL v2 QLoRA applied.

### Llama-3-Yollow-8B

```yaml
models:
  # Pivot model
  - model: meta-llama/Meta-Llama-3-8B
  # Target models
  - model: rinna/llama-3-youko-8b
  - model: tokyotech-llm/Llama-3-Swallow-8B-v0.1
merge_method: sce
base_model: meta-llama/Meta-Llama-3-8B
parameters:
  select_topk: 1.0
dtype: float32
```

### Llama-3-Minus-Base-8B

```yaml
models:
  # Finetune model
  - model: meta-llama/Meta-Llama-3-8B-Instruct
    parameters:
      weight: 1.0
  # Base model
  - model: meta-llama/Meta-Llama-3-8B
    parameters:
      weight: -1.0
merge_method: task_arithmetic
base_model: meta-llama/Meta-Llama-3-8B-Instruct
parameters:
  normalize: false
dtype: float32
```
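The four "Minus-Base" merges (this one and the three below) all follow the same pattern. As a rough sketch of the intent rather than mergekit's exact task-arithmetic internals: each one is meant to isolate a finetune's delta over the base, which the final merge then adds back at a tiny weight:

$$
\Delta_i = \theta_{\text{finetune},i} - \theta_{\text{base}},
\qquad
\theta_{\text{final}} \approx \theta_{\text{Yollow+VNTL}} + \sum_i \lambda_i\,\Delta_i,
\qquad
\lambda_i \in \{10^{-4},\ 2.5\times10^{-4}\}
$$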

### Llama-3-Youko-Minus-Base-8B

```yaml
models:
  # Finetune model
  - model: rinna/llama-3-youko-8b-instruct
    parameters:
      weight: 1.0
  # Base model
  - model: meta-llama/Meta-Llama-3-8B
    parameters:
      weight: -1.0
merge_method: task_arithmetic
base_model: rinna/llama-3-youko-8b-instruct
parameters:
  normalize: false
dtype: float32
```

### Llama-3-Swallow-Minus-Base-8B

```yaml
models:
  # Finetune model
  - model: tokyotech-llm/Llama-3-Swallow-8B-Instruct-v0.1
    parameters:
      weight: 1.0
  # Base model
  - model: meta-llama/Meta-Llama-3-8B
    parameters:
      weight: -1.0
merge_method: task_arithmetic
base_model: tokyotech-llm/Llama-3-Swallow-8B-Instruct-v0.1
parameters:
  normalize: false
dtype: float32
```

### Llama-3-Shisa-Minus-Base-8B

```yaml
models:
  # Finetune model
  - model: shisa-ai/shisa-v1-llama3-8b
    parameters:
      weight: 1.0
  # Base model
  - model: meta-llama/Meta-Llama-3-8B
    parameters:
      weight: -1.0
merge_method: task_arithmetic
base_model: shisa-ai/shisa-v1-llama3-8b
parameters:
  normalize: false
dtype: float32
```

### Llama-3-VNTL-Yollisa-8B

The `+` in the base model is mergekit's `model+lora` syntax: the VNTL v2 QLoRA is applied on top of Llama-3-Yollow-8B before the TIES merge.

```yaml
models:
  # Base
  - model: Casual-Autopsy/Llama-3-Yollow-8B+lmg-anon/vntl-llama3-8b-v2-qlora
    parameters:
      weight: 1.0
  # Models
  - model: Casual-Autopsy/Llama-3-Minus-Base-8B
    parameters:
      density: 0.35
      weight: 10e-5
  - model: Casual-Autopsy/Llama-3-Shisa-Minus-Base-8B
    parameters:
      density: 0.85
      weight: 25e-5
  - model: Casual-Autopsy/Llama-3-Swallow-Minus-Base-8B
    parameters:
      density: 0.85
      weight: 25e-5
  - model: Casual-Autopsy/Llama-3-Youko-Minus-Base-8B
    parameters:
      density: 0.85
      weight: 25e-5
merge_method: ties
base_model: Casual-Autopsy/Llama-3-Yollow-8B+lmg-anon/vntl-llama3-8b-v2-qlora
parameters:
  normalize: false
  int8_mask: false
dtype: float32
```
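To reproduce a stage, each YAML above can be fed to mergekit. A minimal sketch, assuming mergekit's documented Python API; `config.yml` and the output path are placeholders:

```python
# A minimal sketch of running one merge stage with mergekit's Python API;
# "config.yml" stands for any of the YAML configs above.
import yaml

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

with open("config.yml", "r", encoding="utf-8") as f:
    config = MergeConfiguration.model_validate(yaml.safe_load(f))

run_merge(
    config,
    "./merged-model",  # output directory for the merged weights
    options=MergeOptions(cuda=True, copy_tokenizer=True),
)
```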