README
Project Overview
Yo! You're looking at a sick project where we've finetuned the Qwen2.5-3B-Instruct model with LoRA on a vulgar-language corpus. Yeah, you heard that right: we're taking this language model to a whole new level of sass!
What's LoRA?
LoRA, or Low-Rank Adaptation, is like a magic trick for large language models. Instead of finetuning the entire massive model, which is as expensive as buying a spaceship, LoRA freezes the original weights and trains only a pair of small low-rank matrices injected alongside them. It's like swapping one small part in a big plane instead of rebuilding the whole aircraft.
The core formula of LoRA is:
$\Delta W = BA$
Here, $W$ is an original (frozen) weight matrix of the model and $\Delta W$ is the low-rank update added to it, so the effective weight becomes $W + BA$. $B$ and $A$ are two small matrices: for a $d \times k$ weight $W$, we have $B \in \mathbb{R}^{d \times r}$ and $A \in \mathbb{R}^{r \times k}$ with rank $r \ll \min(d, k)$. By training only these two small matrices, we get an effect similar to finetuning the whole $W$. It's efficient, it's fast, and it's like a cheat code for model finetuning!
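To see why this is cheap, take a single 2048×2048 weight matrix as an example: full finetuning updates about 4.2M parameters, while a rank-8 LoRA update trains only 2 × 2048 × 8 ≈ 33K. Below is a minimal sketch of how such an adapter might be set up with peft; the rank, alpha, and target modules are illustrative assumptions, not the exact settings used for this finetune:

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-3B-Instruct", trust_remote_code=True)

# Hypothetical adapter settings; the rank/alpha/targets actually used for this project may differ.
lora_config = LoraConfig(
    r=8,                                  # rank of the A and B matrices
    lora_alpha=32,                        # the BA update is scaled by lora_alpha / r
    target_modules=["q_proj", "v_proj"],  # which weight matrices receive a BA update
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only A and B are trainable; the original W stays frozen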
Code Explanation
Let's break down the provided code:
Model and Tokenizer Loading:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Check for GPU availability
device = "cuda" if torch.cuda.is_available() else "cpu"

# Model name
model_name = "Qwen/Qwen2.5-3B-Instruct"

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

# Load the base model
base_model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True).to(device)

# Load the LoRA model
lora_model = PeftModel.from_pretrained(base_model, "./qwen25_3b_instruct_lora_vulgarity_finetuned")
This part loads the Qwen2.5-3B-Instruct base model and its tokenizer, then wraps the base model with the finetuned LoRA weights via PeftModel.
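One optional trick (not part of the original script): once the adapter is loaded, peft can fold the low-rank update back into the base weights, so inference runs as a plain transformers model with no adapter overhead.

# Optional: merge the BA update into W and drop the adapter wrappers
merged_model = lora_model.merge_and_unload()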
Inference Example:
# Tokenize the prompt and move it to the model's device
input_text = "Hello"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(lora_model.device)

# Sample a completion from the finetuned model
output = lora_model.generate(
    input_ids,
    max_new_tokens=50,
    do_sample=True,
    top_p=0.95,
    temperature=0.35,
)

# Decode the generated token IDs back to text
output_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(output_text)
This is a simple inference example: it tokenizes the input text, samples an output from the finetuned model, and decodes it back to text (note that the decoded string includes the original prompt).
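Because Qwen2.5-3B-Instruct is a chat model, you may get more consistent replies by wrapping the prompt in its chat template before generating. A minimal sketch along the same lines, assuming the adapter was trained on chat-formatted data:

messages = [{"role": "user", "content": "Hello"}]
# Render the conversation with the tokenizer's built-in chat template
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(prompt, return_tensors="pt").to(lora_model.device)
output = lora_model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.95, temperature=0.35)

# Decode only the newly generated tokens, skipping the prompt
reply = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(reply)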
Gradio Interface:
import gradio as gr

def chatbot(input_text, history):
    # Minimal example logic, reusing the generation settings from the inference example above
    history = history or []
    input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(lora_model.device)
    output = lora_model.generate(input_ids, max_new_tokens=50, do_sample=True, top_p=0.95, temperature=0.35)
    reply = tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True)
    history.append((input_text, reply))
    return history, history

iface = gr.Interface(
    fn=chatbot,
    inputs=[gr.Textbox(label="Enter your question"), gr.State()],
    outputs=[gr.Chatbot(label="Chat history"), gr.State()],
    title="Qwen2.5-finetune-骂人专家",
    description="Qwen2.5-finetune-骂人专家",
)

iface.launch(share=True, inbrowser=False, debug=True)
This creates a Gradio interface for the chatbot. Users can input text, and the chatbot will respond based on the finetuned model.
How to Run
- Make sure you have all the necessary libraries installed. You can install them with pip: pip install torch transformers peft gradio
- Place your finetuned LoRA weights in the ./qwen25_3b_instruct_lora_vulgarity_finetuned directory (a quick sanity check is sketched after this list).
- Run the Python script. It will start the Gradio server, and you can access the chatbot through the provided link.
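If you're not sure the weights are in the right place: a saved peft adapter usually contains an adapter_config.json plus an adapter_model.safetensors (or adapter_model.bin with older peft versions). A quick, assumption-laden sanity check:

from pathlib import Path

adapter_dir = Path("./qwen25_3b_instruct_lora_vulgarity_finetuned")

# Files a peft LoRA adapter typically ships with (names may vary by peft version)
for name in ["adapter_config.json", "adapter_model.safetensors", "adapter_model.bin"]:
    status = "found" if (adapter_dir / name).exists() else "missing"
    print(f"{name}: {status}")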
Warning
This project uses a vulgar-language corpus for finetuning. Please use it responsibly, and don't let it loose in polite society!
That's it, folks! You're now ready to unleash the power of this finetuned Qwen 2.5 model. Have fun!