# Qwen 0.6B Books Intent
This model is a fine-tuned version of unsloth/Qwen3-0.6B on the empathyai/books-intent-dataset dataset. It has been trained using TRL.
It was trained to classify short queries about the Project Gutenberg catalog into a set of predefined intents.
The goal is to replace large LLMs with smaller models in low-latency, highly scalable services while maintaining high quality and accuracy on the domain.
## Quick start
You must format the query to be classified using the template below:
```python
from transformers import pipeline

# Define instruction templates
QUERY_PROMPT_INTRODUCTION = """You're an expert in Project Gutenberg. Project Gutenberg (PG) is a volunteer effort to digitize and archive cultural works, as well as to "encourage the creation and distribution of eBooks". Most of the items in its collection are the full texts of books or individual stories in the public domain. Your main focus is to extract user intent."""

QUERY_PROMPT_TASK = """## Task
Given user input and context, extract the intent.
* Consider user intent:
* search_book: The user is looking for a specific book.
* search_author: The user is looking for a specific author or their biography.
* search_category: The user is looking for books of a category.
* recommendation: The user is looking for book suggestions, either similar to a title or from the same author.
* novelties: The user is looking for books recently added to Project Gutenberg. Note that this is not the same as 'new books' in general, but rather books that have been added to the Project Gutenberg collection recently.
* general_questions: The user is asking general questions about books, authors, or the Project Gutenberg collection. This includes questions like 'What are the characters in this book?' or 'What are some interesting details about that author?'.
* out_of_domain: The user is asking something that is not related to books, Project Gutenberg, or its collection, like harmful requests or 'What's the weather like?'.
The result must be only a JSON with the following format:
{
    "chat_context": "refinement|new_request",
    "intent": "extracted_intent"
}
"""

def format_query(query: str) -> str:
    return f"""{QUERY_PROMPT_INTRODUCTION}
{QUERY_PROMPT_TASK}
## Input
{query}
## Response
"""

# TODO: force disable thinking in chat template
question = format_query("who wrote frankenstein?")
# Replace <model-repo-id> with this model's Hugging Face repo id
generator = pipeline("text-generation", model="<model-repo-id>", device="cuda")
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
```
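Regarding the TODO above: Qwen3 chat templates accept an `enable_thinking` flag, so one way to force-disable thinking is to render the prompt yourself with the tokenizer before calling the pipeline. This is a sketch, not a procedure documented in this card; `<model-repo-id>` is a placeholder.

```python
from transformers import AutoTokenizer

# Render the chat prompt manually so we can pass Qwen3's template kwarg
tokenizer = AutoTokenizer.from_pretrained("<model-repo-id>")  # placeholder repo id
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": question}],
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,  # Qwen3 chat templates support this switch
)
output = generator(prompt, max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
```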
## Training procedure
This model was trained with TRL's SFT trainer and the Unsloth library.
### Training Details
- Framework: PyTorch
- Base Model: unsloth/Qwen3-0.6B
- Dataset: empathyai/books-intent-dataset
- Infrastructure: 1 x NVIDIA L40S GPU
- Training time: ~11 hours
- Hyperparameters:
  - Learning Rate: 2e-5
  - Weight Decay: 0.01
  - Batch Size: 64 (per device)
  - Gradient Accumulation Steps: 1
  - Number of Epochs: 3
  - Optimizer: AdamW (8-bit)
  - Scheduler: Linear
  - Max Gradient Norm: 1.0
  - Seed: 3407
### LoRA Configuration
- LoRA Rank (r): 64
- Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- LoRA Alpha: 64
- LoRA Dropout: 0
- Bias: None
- Gradient Checkpointing: Disabled
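For reference, the configuration above can be mapped onto Unsloth and TRL roughly as follows. This is a minimal sketch under the stated hyperparameters, not the actual training script (which is not included in this card); the sequence length, output directory, and dataset formatting are assumptions.

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer
from unsloth import FastLanguageModel

# Load the base model with Unsloth (max_seq_length is an assumption here)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen3-0.6B",
    max_seq_length=2048,
)

# Attach LoRA adapters matching the configuration reported above
model = FastLanguageModel.get_peft_model(
    model,
    r=64,
    lora_alpha=64,
    lora_dropout=0,
    bias="none",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing=False,
)

# Assumes the dataset already provides examples in the prompt/response
# format shown in the Quick start section (an assumption of this sketch)
dataset = load_dataset("empathyai/books-intent-dataset", split="train")

trainer = SFTTrainer(
    model=model,
    processing_class=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="outputs",
        per_device_train_batch_size=64,
        gradient_accumulation_steps=1,
        num_train_epochs=3,
        learning_rate=2e-5,
        weight_decay=0.01,
        optim="adamw_8bit",
        lr_scheduler_type="linear",
        max_grad_norm=1.0,
        seed=3407,
    ),
)
trainer.train()
```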
### Log details
```
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 421,353 | Num Epochs = 3 | Total steps = 19,752
O^O/ \_/ \    Batch size per device = 64 | Gradient accumulation steps = 1
\        /    Data Parallel GPUs = 1 | Total batch size (64 x 1 x 1) = 64
 "-____-"     Trainable parameters = 40,370,176/636,420,096 (6.34% trained)

Peak reserved memory = 4.881 GB.
Peak reserved memory for training = 3.453 GB.
Peak reserved memory % of max memory = 10.963 %.
Peak reserved memory for training % of max memory = 7.756 %.
```
## Metrics
The following metrics were computed on a sample of the test split. We use the LLM as a classifier by parsing its output as JSON and extracting the intent field.
| Intent | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| general_questions | 1.0 | 1.0 | 1.0 | 205.0 |
| novelties | 1.0 | 1.0 | 1.0 | 49.0 |
| out_of_domain | 1.0 | 1.0 | 1.0 | 56.0 |
| recommendation | 1.0 | 1.0 | 1.0 | 211.0 |
| search_author | 1.0 | 0.9915 | 0.9957 | 118.0 |
| search_book | 0.9956 | 1.0 | 0.9978 | 228.0 |
| search_category | 1.0 | 1.0 | 1.0 | 133.0 |
| accuracy | 0.999 | 0.999 | 0.999 | 0.999 |
| macro avg | 0.9994 | 0.9988 | 0.9991 | 1000.0 |
| weighted avg | 0.9990 | 0.999 | 0.9990 | 1000.0 |
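A report in this shape can be produced with scikit-learn once the JSON outputs are parsed. The helper below is a sketch under assumptions: the actual evaluation script is not included in this card, and the fallback label for malformed JSON is this example's choice, not documented behavior.

```python
import json
from sklearn.metrics import classification_report

def extract_intent(generated_text: str) -> str:
    """Parse the model's JSON reply and return the predicted intent."""
    try:
        return json.loads(generated_text.strip())["intent"]
    except (json.JSONDecodeError, KeyError):
        # Fallback label for malformed outputs (an assumption of this sketch)
        return "out_of_domain"

# y_true: gold intents from the test split; y_pred: parsed model outputs.
# Both lists here are illustrative only.
y_true = ["search_author", "recommendation"]
y_pred = [extract_intent('{"chat_context": "new_request", "intent": "search_author"}'),
          extract_intent('{"chat_context": "new_request", "intent": "recommendation"}')]
print(classification_report(y_true, y_pred, digits=4))
```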
## Framework versions
- TRL: 0.15.2
- Transformers: 4.51.3
- PyTorch: 2.7.0
- Datasets: 3.5.1
- Tokenizers: 0.21.1
## Model Usage
This model is designed for intent classification in the Project Gutenberg domain. As such, it may not transfer well to broader domains or tasks.
## Limitations
The model may not generalize well to tasks outside its training domain. See the dataset notes on bias and limitations.
## Citations
Project Gutenberg. (n.d.). Retrieved May 2025, from www.gutenberg.org.
```bibtex
@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
```