# Qwen3-4B-ft-bf16

Qwen3-4B-ft-bf16 is a fine-tuned, moderately abliterated version of the Qwen3-4B model. Designed for enhanced context awareness and controlled expressiveness, this model balances precision with creativity across a wide range of tasks, from complex reasoning to natural dialogue, code generation, and multilingual understanding.
## Key Features

- **Improved Context Awareness**: Retains and utilizes long-range contextual information effectively, making it ideal for long-form conversations, document understanding, and summarization tasks.
- **Moderate Abliteration**: Introduces measured behavioral flexibility that enhances creativity and adaptability while maintaining reliability, alignment, and safety in outputs.
- **Dual Thinking Modes**: Supports dynamic switching between thinking mode (for math, logic, and coding) and non-thinking mode (for general-purpose conversation), ensuring optimal task matching (a minimal sketch follows this list).
- **Multilingual Mastery**: Excels in over 100 languages and dialects for translation, multilingual chat, and cross-lingual reasoning.
- **Tool-Ready Agent Capabilities**: Designed to integrate with tool APIs and complex workflows, with consistent performance in both thinking and non-thinking contexts.
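As a quick illustration of the dual modes, here is a minimal sketch assuming the standard Qwen3 chat template, where the `enable_thinking` flag controls whether the model emits a `<think>...</think>` reasoning block before its answer:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/Qwen3-4B-ft-bf16"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Give me a one-line summary of photosynthesis."}]

# enable_thinking=False renders the template without a reasoning block,
# which suits quick, general-purpose chat; set it to True for math/logic/coding.
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(out[0][len(inputs.input_ids[0]):], skip_special_tokens=True))
```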
## Quickstart with Hugging Face Transformers 🤗
```bash
pip install transformers==4.51.3
pip install "huggingface_hub[hf_xet]"
```
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/Qwen3-4B-ft-bf16"

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# Define input
prompt = "Describe how renewable energy impacts economic development."
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True  # toggle thinking mode on/off
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate output
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
# Keep only newly generated tokens (drop the prompt)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# Parse thinking content: 151668 is the token id of </think>;
# search from the end to find the last occurrence.
try:
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0  # no </think> token found, treat everything as content

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip()
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip()

print("thinking content:", thinking_content)
print("content:", content)
```
## Best Practices

**Sampling Settings** (see the sketch below):
- Thinking mode: `temperature=0.6`, `top_p=0.95`, `top_k=20`
- Non-thinking mode: `temperature=0.7`, `top_p=0.8`, `top_k=20`
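A minimal sketch applying the thinking-mode values from the list above; `do_sample=True` is an assumption needed for the sampling parameters to take effect in `transformers`:

```python
# Assumes `model` and `model_inputs` from the quickstart above.
generated_ids = model.generate(
    **model_inputs,
    do_sample=True,        # required for temperature/top_p/top_k to apply
    temperature=0.6,       # thinking mode; use 0.7 for non-thinking mode
    top_p=0.95,            # thinking mode; use 0.8 for non-thinking mode
    top_k=20,
    max_new_tokens=32768,  # standard budget; up to 38912 for extended reasoning
)
```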
**Token Length:**
- Standard: 32768 tokens
- Extended reasoning tasks: up to 38912 tokens
**Prompt Design:**
- Math problems: add "Please reason step by step, and put your final answer within \boxed{}."
- MCQs: format answers as `{"answer": "B"}` for easy parsing.
- Multi-turn: omit thinking logs in conversation history for cleaner context (see the sketch after this list).
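A brief sketch of these prompt patterns; the `strip_thinking` helper is hypothetical and assumes reasoning is wrapped in `<think>...</think>` tags as in the quickstart:

```python
import re

# Math prompt with the recommended step-by-step instruction.
math_messages = [{
    "role": "user",
    "content": "What is 17 * 24? Please reason step by step, and put your final answer within \\boxed{}."
}]

# MCQ prompt asking for an easily parsed JSON answer.
mcq_messages = [{
    "role": "user",
    "content": 'Which planet is largest? A) Earth B) Jupiter C) Mars. Answer with {"answer": "X"}.'
}]

def strip_thinking(reply: str) -> str:
    """Hypothetical helper: drop <think>...</think> blocks before storing a
    reply in multi-turn history, keeping the context clean."""
    return re.sub(r"<think>.*?</think>", "", reply, flags=re.DOTALL).strip()

# Usage: history.append({"role": "assistant", "content": strip_thinking(reply)})
```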