Gemma 2 English to Luxembourgish model card
Model Information
Summary description and brief definition of inputs and outputs.
Description
A fine-tuned version of Gemma 2 (2B) for English-to-Luxembourgish translation, trained on a dataset of news and dictionary examples.
Usage
Below are some code snippets to help you get started quickly with running the model. First, install the Transformers library with:
pip install -U transformers
Then, copy the snippet from the section that is relevant to your use case.
Running with the pipeline API
import torch
from transformers import pipeline
pipe = pipeline(
    "text-generation",
    model="etamin/Letz-MT-gemma2-2b-en-lb",
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",  # replace with "mps" to run on a Mac device
)
messages = [
    {"role": "user", "content": """
<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful AI assistant for translation.<|eot_id|>
<|start_header_id|>user<|end_header_id|>\n\nTranslate the English input text into Luxembourgish.
Do not include any additional information or unrelated content.\n\n
"I have not met you yet."
"""},
]
outputs = pipe(messages, max_new_tokens=256)
assistant_response = outputs[0]["generated_text"][-1]["content"].strip()
print(assistant_response)
# ech sinn em d'éinescht nach begéint
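To translate sentences other than the example, the prompt can be built by a small helper. The following is a minimal sketch, not part of the original card: `PROMPT_TEMPLATE` and `build_translation_messages` are hypothetical names, and the template assumes the prompt lines in the snippet above carry no leading whitespace.

```python
# Hypothetical helper: wraps an English sentence in the prompt format
# used in the snippet above. The template text is copied from that snippet;
# the names below are illustrative only.
PROMPT_TEMPLATE = (
    "\n<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    "You are a helpful AI assistant for translation.<|eot_id|>\n"
    "<|start_header_id|>user<|end_header_id|>\n\n"
    "Translate the English input text into Luxembourgish.\n"
    "Do not include any additional information or unrelated content.\n\n\n"
    '"{text}"\n'
)


def build_translation_messages(text):
    """Return a single-turn chat message list for one English sentence."""
    return [{"role": "user", "content": PROMPT_TEMPLATE.format(text=text)}]


# Hypothetical usage with the pipeline object created above:
# outputs = pipe(build_translation_messages("Good morning."), max_new_tokens=256)
```

Keeping the template in one place makes it less likely that the instruction text drifts between calls.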
Translation Template
The instruction-tuned models use a chat template that must be adhered to for conversational use. The easiest way to apply it is using the tokenizer's built-in chat template, as shown in the following snippet.
Let's load the model and apply the chat template to a conversation. In this example, we'll start with a single user interaction:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
model_id = "etamin/Letz-MT-gemma2-2b-en-lb"
dtype = torch.bfloat16
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cuda",
    torch_dtype=dtype,
)
chat = [
    {"role": "user", "content": """
<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful AI assistant for translation.<|eot_id|>
<|start_header_id|>user<|end_header_id|>\n\nTranslate the English input text into Luxembourgish.
Do not include any additional information or unrelated content.\n\n
"I have not met you yet."
"""},
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
After the prompt is ready, generation can be performed like this:
inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=150)
print(tokenizer.decode(outputs[0]))
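The decode call above prints the full sequence, prompt included. If only the translation is wanted, one option is to slice off the prompt tokens before decoding. The sketch below is an illustration rather than the card author's method: `decode_new_tokens` is a hypothetical helper name, and it assumes `outputs` and `inputs` are the tensors from the snippet above.

```python
def decode_new_tokens(tokenizer, output_ids, prompt_length):
    """Decode only the tokens generated after the prompt.

    output_ids: one generated sequence (prompt tokens followed by new tokens).
    prompt_length: number of prompt tokens, e.g. inputs.shape[-1].
    """
    new_ids = output_ids[prompt_length:]
    # skip_special_tokens drops markers such as end-of-sequence tokens.
    return tokenizer.decode(new_ids, skip_special_tokens=True).strip()


# Hypothetical usage with the objects from the snippet above:
# translation = decode_new_tokens(tokenizer, outputs[0], inputs.shape[-1])
# print(translation)
```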
Inputs and outputs
- Input: A text string, such as a prompt containing the English text to translate.
- Output: Generated Luxembourgish text in response to the input.
Citation
@misc{song2025llmsilverbulletlowresource,
title={Is LLM the Silver Bullet to Low-Resource Languages Machine Translation?},
author={Yewei Song and Lujun Li and Cedric Lothritz and Saad Ezzini and Lama Sleem and Niccolo Gentile and Radu State and Tegawendé F. Bissyandé and Jacques Klein},
year={2025},
eprint={2503.24102},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2503.24102},
}
Model Data
Data used for model training and how the data was processed.
Training Dataset
- RTL (https://www.rtl.lu/)
- Lod.lu (https://data.public.lu/en/datasets/letzebuerger-online-dictionnaire-lod-linguistesch-daten/)
Model tree for etamin/Letz-MT-gemma2-2b-en-lb
Base model
google/gemma-2-2b