---
license: mit
language:
- en
tags:
- canadian-immigration
- chatbot
- gemma
- fine-tuning
- instruction-tuning
- llm
model_name: gemma-2b-it-canada-immigration
finetuned_from: google/gemma-2b-it
base_model:
- google/gemma-2b-it
---
# Gemma-2B-IT Fine-Tuned on Canadian Immigration Q&A

This model is a fine-tuned version of `google/gemma-2b-it`, trained by Arash Ghezavati to specialize in answering questions about Canadian immigration, study permits, Express Entry, work visas, and PR pathways.
## Model Details

- Base model: `google/gemma-2b-it`
- Fine-tuned with: LoRA (Low-Rank Adaptation) on a Q&A dataset
- Training type: Instruction-style tuning with `<|user|>` and `<|assistant|>` prompts
- Language: English 🇬🇧
- License: MIT
- Trained by: Arash Ghezavati
## Dataset
Fine-tuned on a custom dataset created from real Canadian immigration content sourced from:
- canada.ca
- alberta.ca
- cic.gc.ca
- Other provincial and legal sources
## 🧼 Dataset Format

Each entry is formatted as:

```json
{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant providing information from Canadian immigration and government programs."},
    {"role": "user", "content": "What are the PR options for international students?"},
    {"role": "assistant", "content": "International students can apply for PR through the Canadian Experience Class, Provincial Nominee Programs, and more..."}
  ]
}
```
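For illustration, a helper along these lines could flatten one `messages` entry into the `<|user|>` / `<|assistant|>` training text described under Model Details. This is a minimal sketch: the card only names the two markers, so the newline separators and the way the system message is prepended are assumptions.

```python
# Minimal sketch: flatten a dataset entry into instruction-format text.
# Separator choices and system-prompt placement are assumptions.
def render_example(entry: dict) -> str:
    parts = []
    for message in entry["messages"]:
        if message["role"] == "system":
            parts.append(message["content"])  # assumed: plain text prefix
        elif message["role"] == "user":
            parts.append("<|user|>\n" + message["content"])
        elif message["role"] == "assistant":
            parts.append("<|assistant|>\n" + message["content"])
    return "\n".join(parts)

entry = {
    "messages": [
        {"role": "user", "content": "What are the PR options for international students?"},
        {"role": "assistant", "content": "International students can apply for PR through the Canadian Experience Class, Provincial Nominee Programs, and more..."},
    ]
}
print(render_example(entry))
```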
## Use Cases

### ✅ Direct Use

- Ideal for bots answering immigration-related questions.
- Used in production in the Canada Immigration API Space.

### 🚫 Out-of-Scope Use

- Not suitable for legal decision-making or replacing certified immigration consultants.
- Not intended for multilingual queries (English only).
## Training Details
- Epochs: 3
- Batch size: 2
- Learning rate: 3e-4
- Optimizer: AdamW
- Adapter: LoRA on the `q_proj` and `v_proj` modules (see the configuration sketch after this list)
- Frameworks: Transformers, PEFT, TRL
- Compute: Google Colab Pro (1 GPU)
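
The following is a minimal sketch of how such a run could be reproduced with PEFT and TRL. The epochs, batch size, learning rate, optimizer, and target modules come from the list above; the LoRA rank, alpha, and dropout values, the stand-in dataset, and the exact `SFTTrainer`/`SFTConfig` signatures (which vary across TRL versions) are assumptions.

```python
# Hedged reproduction sketch using PEFT + TRL. Values marked "card" come
# from the Training Details list; everything else is an assumption.
from datasets import Dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Stand-in for the custom immigration Q&A dataset (one example shown).
train_dataset = Dataset.from_list([
    {"messages": [
        {"role": "user", "content": "What are the PR options for international students?"},
        {"role": "assistant", "content": "International students can apply for PR through the Canadian Experience Class, Provincial Nominee Programs, and more..."},
    ]}
])

lora_config = LoraConfig(
    r=16,                                 # assumption: rank not given in the card
    lora_alpha=32,                        # assumption
    lora_dropout=0.05,                    # assumption
    target_modules=["q_proj", "v_proj"],  # card
    task_type="CAUSAL_LM",
)

args = SFTConfig(
    output_dir="gemma-2b-it-canada-immigration",
    num_train_epochs=3,                   # card
    per_device_train_batch_size=2,        # card
    learning_rate=3e-4,                   # card
    optim="adamw_torch",                  # AdamW, per the card
)

trainer = SFTTrainer(
    model="google/gemma-2b-it",           # card: base model
    args=args,
    train_dataset=train_dataset,
    peft_config=lora_config,
)
trainer.train()
```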
## Evaluation

Manual testing across ~800 immigration Q&A examples showed:

- ✅ Accurate extraction of information.
- ✅ Context-specific answers.
- ✅ Smooth conversational responses.
## 🧪 Example Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned model and its tokenizer from the Hub.
model = AutoModelForCausalLM.from_pretrained("arashGh/gemma-2b-it-canada-immigration")
tokenizer = AutoTokenizer.from_pretrained("arashGh/gemma-2b-it-canada-immigration")

input_text = "Can I work more than 24 hours per week as a student?"
inputs = tokenizer(input_text, return_tensors="pt")

# Generate up to 100 new tokens and decode the answer.
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
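
Since the model was tuned on `<|user|>` / `<|assistant|>` formatted examples, wrapping the question in those markers may match the training distribution better than a bare string. Continuing from the snippet above, a variant under that assumption:

```python
# Variant: wrap the question in the instruction markers named in this card.
# The exact layout (newlines, trailing assistant marker) is an assumption.
prompt = (
    "<|user|>\n"
    "Can I work more than 24 hours per week as a student?\n"
    "<|assistant|>\n"
)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```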
## Environmental Impact
- Trained on: Google Colab (1x A100 GPU)
- Time used: ~3 hours
- Carbon Estimate: Low (light fine-tuning)
## Author
- Name: Arash Ghezavati
- Location: Vancouver, Canada
- Profile: huggingface.co/arashGh
## Acknowledgements
Thanks to Google for releasing the Gemma base model and Hugging Face for providing the hosting and training tools.