Llama-3-8B-3SOME-v2-Regularized

This is a regularization merge based on Joseph717171's method, with the additional steps of preserving some token embeddings and applying model_stock after ties. It seems to have kept most of the model's original capabilities while noticeably improving it, at least for roleplay and story writing; I have not tested it outside of those use cases. The benefits I have observed include less repetition (including repetition of sentence structure), a wider variety of responses, and better contextual understanding.

I tested this with other Llama-3-8B models and it had the same effect on them, but I haven't tested it on other sizes or architectures. I'll probably upload more of these later.
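The first stage uses mergekit's ties method. As a rough illustration of the idea behind TIES merging (trim small deltas, elect a sign per parameter, then merge only the deltas that agree), here is a toy NumPy sketch on flat weight vectors; it is not mergekit's implementation, and the function name is hypothetical.

```python
import numpy as np

def ties_merge(base, tuned_list, density=1.0):
    """Toy TIES merge on flat weight vectors (illustration only).

    1. Trim: keep only the top-`density` fraction of each task vector
       (delta from base) by magnitude.
    2. Elect sign: per parameter, take the sign of the summed deltas.
    3. Disjoint merge: average only the deltas agreeing with that sign.
    """
    deltas = [t - base for t in tuned_list]
    trimmed = []
    for d in deltas:
        k = int(round(density * d.size))
        thresh = np.sort(np.abs(d))[::-1][k - 1] if k > 0 else np.inf
        trimmed.append(np.where(np.abs(d) >= thresh, d, 0.0))
    stacked = np.stack(trimmed)
    elected = np.sign(stacked.sum(axis=0))            # majority sign per weight
    agree = (np.sign(stacked) == elected) & (stacked != 0)
    counts = np.maximum(agree.sum(axis=0), 1)
    merged_delta = (stacked * agree).sum(axis=0) / counts
    return base + merged_delta

base = np.zeros(4)
a = np.array([0.5, -0.2, 0.1, 0.0])
b = np.array([0.3,  0.4, -0.1, 0.0])
# Agreeing deltas are averaged; conflicting or zero entries drop out.
print(ties_merge(base, [a, b]))
```

Note that in the config below, the ties stage merges a single model with density 1 and weight 1, so it mostly reproduces the fine-tune's task vector; the regularizing behavior comes from the model_stock stage that follows.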

Details

Models Used

- NousResearch/Meta-Llama-3-8B (base)
- TheDrummer/Llama-3SOME-8B-v2

Merge Config

```yaml
merge_method: ties
dtype: bfloat16
parameters:
  normalize: true
  int8_mask: true
chat_template: "llama3"
base_model: NousResearch/Meta-Llama-3-8B
models:
  - model: TheDrummer/Llama-3SOME-8B-v2
    parameters:
      density: 1
      weight: 1
tokenizer:
  source: TheDrummer/Llama-3SOME-8B-v2
  tokens:
    <|begin_of_text|>:
      source: TheDrummer/Llama-3SOME-8B-v2
      force: true
    <|end_of_text|>:
      source: TheDrummer/Llama-3SOME-8B-v2
      force: true
    <|start_header_id|>:
      source: TheDrummer/Llama-3SOME-8B-v2
      force: true
    <|end_header_id|>:
      source: TheDrummer/Llama-3SOME-8B-v2
      force: true
    <|eot_id|>:
      source: TheDrummer/Llama-3SOME-8B-v2
      force: true
name: Llama-3-8B-3SOME-v2-ties
---
merge_method: model_stock
dtype: bfloat16
parameters:
  normalize: true
  int8_mask: true
chat_template: "llama3"
tokenizer:
  source: union
base_model: NousResearch/Meta-Llama-3-8B
models:
  - model: TheDrummer/Llama-3SOME-8B-v2
  - model: Llama-3-8B-3SOME-v2-ties
```
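The second stage applies model_stock, which interpolates between the averaged fine-tunes and the base model according to how well the fine-tunes' task vectors agree geometrically. A toy two-model NumPy sketch of that geometry, assuming the two-model ratio t = 2·cos(θ)/(1 + cos(θ)) from the Model Stock paper (this is an illustration, not mergekit's code, and the function name is hypothetical):

```python
import numpy as np

def model_stock_2(base, w1, w2):
    """Toy Model Stock merge for two fine-tuned models (k = 2).

    theta is the angle between the two task vectors (w1 - base, w2 - base).
    The merged weights are the average of the fine-tunes, pulled back toward
    the base in proportion to how much the two fine-tunes disagree.
    """
    d1, d2 = w1 - base, w2 - base
    cos = d1 @ d2 / (np.linalg.norm(d1) * np.linalg.norm(d2))
    t = 2.0 * cos / (1.0 + cos)
    w_avg = (w1 + w2) / 2.0
    return t * w_avg + (1.0 - t) * base

# Identical fine-tunes (cos = 1, t = 1): the merge keeps them unchanged.
# Orthogonal fine-tunes (cos = 0, t = 0): the merge falls back to the base.
```

In this config, the two "fine-tunes" are TheDrummer/Llama-3SOME-8B-v2 and the ties intermediate, so the result is pulled toward Meta-Llama-3-8B wherever the two disagree, which plausibly accounts for the regularizing effect described above.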