# Llama-3-8B-3SOME-v2-Regularized
This is a regularization merge based on Joseph717171's, with the additional steps of preserving some token embeddings and applying model_stock after TIES. This seems to keep most of the model's original capabilities while noticeably improving it, at least for roleplay and story writing; I have not tested it outside of those use cases. Benefits I have observed include less repetition (including repeated sentence structure), a wider variety of responses, and better contextual understanding.

I tried the same recipe with other Llama-3-8B models and it had the same effect on them, but I haven't tested other sizes or architectures. I'll probably upload more of these later.
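For quick testing, here is a minimal sketch of prompting the merged model with `transformers`. The model id is a placeholder (point it at wherever the final merge lives), and the prompt and sampling settings are only illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder id: substitute the repo id or local path of the final merge.
model_id = "Llama-3-8B-3SOME-v2-Regularized"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# The merge config pins chat_template to "llama3", so apply_chat_template
# emits the usual <|start_header_id|>...<|eot_id|> formatting.
messages = [{"role": "user", "content": "Write the opening of a short story."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```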
## Details
### Models Used

- [NousResearch/Meta-Llama-3-8B](https://huggingface.co/NousResearch/Meta-Llama-3-8B) (base model)
- [TheDrummer/Llama-3SOME-8B-v2](https://huggingface.co/TheDrummer/Llama-3SOME-8B-v2)
### Merge Config
```yaml
merge_method: ties
dtype: bfloat16
parameters:
  normalize: true
  int8_mask: true
chat_template: "llama3"
base_model: NousResearch/Meta-Llama-3-8B
models:
  - model: TheDrummer/Llama-3SOME-8B-v2
    parameters:
      density: 1
      weight: 1
tokenizer:
  source: TheDrummer/Llama-3SOME-8B-v2
  tokens:
    <|begin_of_text|>:
      source: TheDrummer/Llama-3SOME-8B-v2
      force: true
    <|end_of_text|>:
      source: TheDrummer/Llama-3SOME-8B-v2
      force: true
    <|start_header_id|>:
      source: TheDrummer/Llama-3SOME-8B-v2
      force: true
    <|end_header_id|>:
      source: TheDrummer/Llama-3SOME-8B-v2
      force: true
    <|eot_id|>:
      source: TheDrummer/Llama-3SOME-8B-v2
      force: true
name: Llama-3-8B-3SOME-v2-ties
---
merge_method: model_stock
dtype: bfloat16
parameters:
  normalize: true
  int8_mask: true
chat_template: "llama3"
tokenizer:
  source: union
base_model: NousResearch/Meta-Llama-3-8B
models:
  - model: TheDrummer/Llama-3SOME-8B-v2
  - model: Llama-3-8B-3SOME-v2-ties
```
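To reproduce the merge, here is a minimal sketch using mergekit's Python API. The config filename, output paths, and the name-to-path rewrite for the intermediate model are assumptions; since the config above is a two-document YAML file, the stages are run separately here:

```python
import torch
import yaml

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

# Split the multi-document YAML into the TIES stage and the model_stock stage.
with open("3some-regularized.yaml", "r", encoding="utf-8") as fp:
    ties_cfg, stock_cfg = list(yaml.safe_load_all(fp))

# The `name:` key only labels the intermediate merge; drop it before
# validating, in case MergeConfiguration rejects unknown fields.
ties_cfg.pop("name", None)

options = MergeOptions(cuda=torch.cuda.is_available(), copy_tokenizer=True)

# Stage 1: TIES merge with the forced token embeddings, written to a local
# directory named after the intermediate model.
run_merge(
    MergeConfiguration.model_validate(ties_cfg),
    out_path="./Llama-3-8B-3SOME-v2-ties",
    options=options,
)

# Stage 2: model_stock over the original model and the TIES intermediate.
# Point the second stage's by-name reference at the local stage-1 output.
for entry in stock_cfg["models"]:
    if entry["model"] == "Llama-3-8B-3SOME-v2-ties":
        entry["model"] = "./Llama-3-8B-3SOME-v2-ties"

run_merge(
    MergeConfiguration.model_validate(stock_cfg),
    out_path="./Llama-3-8B-3SOME-v2-Regularized",
    options=options,
)
```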