Gemma 2 English to Luxembourgish model card

Model Information

Summary description and brief definition of inputs and outputs.

Description

A fine-tuned version of Gemma 2 2B for English to Luxembourgish translation, trained on a dataset of news articles and dictionary examples.

Usage

Below we share some code snippets on how to quickly get started with running the model. First, install the Transformers library with:

pip install -U transformers

Then, copy the snippet from the section that is relevant for your use case.

Running with the pipeline API

import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="etamin/Letz-MT-gemma2-2b-en-lb",
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",  # replace with "mps" to run on a Mac device
)

messages = [
    {"role": "user", "content": """
          <|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful AI assistant for translation.<|eot_id|>
          <|start_header_id|>user<|end_header_id|>\n\nTranslate the English input text into Luxembourgish.
          Do not include any additional information or unrelated content.\n\n
          "I have not met you yet."
          """},
]

outputs = pipe(messages, max_new_tokens=256)
assistant_response = outputs[0]["generated_text"][-1]["content"].strip()
print(assistant_response)
# ech sinn em d'éinescht nach begéint

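For repeated translations it can be convenient to wrap the pipeline call in a small helper. The sketch below is not part of the original card: it reuses the pipe object and the prompt wording from the snippet above, and the translate function name is our own.

def translate(text: str) -> str:
    # Build the same prompt format used in the snippet above; this wrapper
    # is illustrative, not part of the model's API.
    prompt = (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        "You are a helpful AI assistant for translation.<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        "Translate the English input text into Luxembourgish. "
        "Do not include any additional information or unrelated content.\n\n"
        f'"{text}"'
    )
    messages = [{"role": "user", "content": prompt}]
    outputs = pipe(messages, max_new_tokens=256)
    return outputs[0]["generated_text"][-1]["content"].strip()

print(translate("Good morning!"))
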
Translation Template

The instruction-tuned models use a chat template that must be adhered to for conversational use. The easiest way to apply it is using the tokenizer's built-in chat template, as shown in the following snippet.

Let's load the model and apply the chat template to a conversation. In this example, we'll start with a single user interaction:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "etamin/Letz-MT-gemma2-2b-en-lb"
dtype = torch.bfloat16

# Load the tokenizer and the fine-tuned model onto the GPU
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cuda",
    torch_dtype=dtype,
)

chat = [
    { "role": "user", "content": """
          <|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful AI assistant for translation.<|eot_id|>
          <|start_header_id|>user<|end_header_id|>\n\nTranslate the English input text into Luxembourgish.
          Do not include any additional information or unrelated content.\n\n
          "I have not met you yet."
          """ },
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)

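If you want to inspect what the template produces, print the prompt. Gemma-family chat templates typically wrap each turn in markers along the following lines (the exact string depends on the chat template bundled with this checkpoint, so treat this as an illustration rather than guaranteed output):

print(prompt)
# <bos><start_of_turn>user
# ...the user content from above...
# <end_of_turn>
# <start_of_turn>model
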
After the prompt is ready, generation can be performed like this:

inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=150)
print(tokenizer.decode(outputs[0]))

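Note that the decode call above prints the prompt together with the completion. A common follow-up, shown here as a sketch rather than something from the original card, is to slice off the prompt tokens and skip special tokens so that only the translation remains:

# Keep only the tokens generated after the prompt
generated = outputs[0][inputs.shape[-1]:]
print(tokenizer.decode(generated, skip_special_tokens=True))
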
Inputs and outputs

  • Input: Text string, such as a translation prompt containing the English source text.
  • Output: Generated Luxembourgish text in response to the input.

Citation

@misc{song2025llmsilverbulletlowresource,
      title={Is LLM the Silver Bullet to Low-Resource Languages Machine Translation?}, 
      author={Yewei Song and Lujun Li and Cedric Lothritz and Saad Ezzini and Lama Sleem and Niccolo Gentile and Radu State and Tegawendé F. Bissyandé and Jacques Klein},
      year={2025},
      eprint={2503.24102},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2503.24102}, 
}

Model Data

Data used for model training and how the data was processed.

Training Dataset

  • RTL (https://www.rtl.lu/)
  • Lod.lu, the Lëtzebuerger Online Dictionnaire (https://data.public.lu/en/datasets/letzebuerger-online-dictionnaire-lod-linguistesch-daten/)

Model size: 2.61B parameters (Safetensors, F32)
Base model: google/gemma-2-2b