---
license: mit
library_name: transformers
base_model:
- huihui-ai/Moonlight-16B-A3B-Instruct-abliterated
tags:
- abliterated
- uncensored
- Pruned
---

# huihui-ai/Moonlight-16B-A3B-Instruct-abliterated-Pruned

This is a pruned version of [huihui-ai/Moonlight-16B-A3B-Instruct-abliterated](https://huggingface.co/huihui-ai/Moonlight-16B-A3B-Instruct-abliterated), reduced from 64 experts to 32. The pruned model is mainly intended for [code](https://huggingface.co/huihui-ai/Moonlight-16B-A3B-Instruct-abliterated-Pruned/blob/main/coding_problems.py) generation.
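
As a quick sanity check of the expert count, the routed-expert setting can be read from the model config. The snippet below is a minimal sketch, assuming the config exposes an `n_routed_experts` field as DeepSeek-V3-style configurations do.

```python
from transformers import AutoConfig

# Load the pruned model's config (remote code is required for this architecture)
config = AutoConfig.from_pretrained(
    "huihui-ai/Moonlight-16B-A3B-Instruct-abliterated-Pruned",
    trust_remote_code=True,
)

# DeepSeek-V3-style configs report the number of routed experts per MoE layer;
# for this pruned model it should be 32 (down from 64 in the original).
# The field name is an assumption based on the DeepSeek-V3 config format.
print("Routed experts per layer:", getattr(config, "n_routed_experts", "not found"))
```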

This is a validation test to see whether the model can be pruned to meet specific requirements while still maintaining acceptable performance. The model size has been reduced by roughly half, and no noticeable degradation has been observed.

This demonstrates that the model can be pruned to suit one's own needs.

The pruned model has a total parameter count roughly equivalent to 8B.
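
If you want to verify the approximate parameter count yourself, a minimal sketch is shown below; it simply loads the model and sums the element counts of all parameter tensors (the exact total may differ slightly from the rounded 8B figure).

```python
import torch
from transformers import AutoModelForCausalLM

# Load the pruned model (bfloat16 keeps memory usage manageable)
model = AutoModelForCausalLM.from_pretrained(
    "huihui-ai/Moonlight-16B-A3B-Instruct-abliterated-Pruned",
    device_map="auto",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)

# Sum the number of elements across all parameter tensors;
# for this pruned model the total should come out to roughly 8B.
total_params = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {total_params / 1e9:.2f}B")
```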

This model has the same architecture as [deepseek-ai/DeepSeek-V3](https://huggingface.co/deepseek-ai/DeepSeek-V3), and we will also try a pruned version of [deepseek-ai/DeepSeek-V3](https://huggingface.co/deepseek-ai/DeepSeek-V3).

## Use with transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the model and tokenizer
NEW_MODEL_ID = "huihui-ai/Moonlight-16B-A3B-Instruct-abliterated-Pruned"
model = AutoModelForCausalLM.from_pretrained(
    NEW_MODEL_ID,
    device_map="auto",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained(NEW_MODEL_ID, trust_remote_code=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.pad_token_id = tokenizer.eos_token_id

# Initialize conversation context
initial_messages = [
    {"role": "system", "content": "You are a helpful assistant provided by Moonshot-AI."}
]
messages = initial_messages.copy()  # Copy the initial conversation context

# Enter conversation loop
while True:
    # Get user input
    user_input = input("User: ").strip()  # Strip leading and trailing spaces

    # If the user types '/exit', end the conversation
    if user_input.lower() == "/exit":
        print("Exiting chat.")
        break

    # If the user types '/clear', reset the conversation context
    if user_input.lower() == "/clear":
        messages = initial_messages.copy()  # Reset conversation context
        print("Chat history cleared. Starting a new conversation.")
        continue

    # If input is empty, prompt the user and continue
    if not user_input:
        print("Input cannot be empty. Please enter something.")
        continue

    # Add user input to the conversation
    messages.append({"role": "user", "content": user_input})

    # Apply the chat template and generate a response
    tokenized_message = tokenizer.apply_chat_template(
        messages, tokenize=True, add_generation_prompt=True, return_tensors="pt", return_dict=True
    )
    response_token_ids = model.generate(
        tokenized_message['input_ids'].to(model.device),
        attention_mask=tokenized_message['attention_mask'].to(model.device),
        use_cache=False,
        pad_token_id=tokenizer.pad_token_id,
        max_new_tokens=8192
    )
    # Keep only the newly generated tokens (strip the prompt)
    generated_tokens = response_token_ids[:, len(tokenized_message['input_ids'][0]):]
    response = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)[0]

    # Add the model's response to the conversation
    messages.append({"role": "assistant", "content": response})

    # Print the model's response
    print(f"Response: {response}")
```

### Donation

If you like it, please click 'like' and follow us for more updates.
You can follow [x.com/support_huihui](https://x.com/support_huihui) to get the latest model information from huihui.ai.

##### Your donation helps us continue further development and improvement; even a cup of coffee makes a difference.
- bitcoin:
```
bc1qqnkhuchxw0zqjh2ku3lu4hq45hc6gy84uk70ge
```