metadata
license: mit
datasets:
  - google/boolq
language:
  - en
metrics:
  - bleu
base_model:
  - google-t5/t5-base
pipeline_tag: text2text-generation
tags:
  - question-generation
  - education
  - code
  - boolean-questions
  - text-generation-inference
library_name: transformers

BoolQ T5

This repository contains a T5-base model fine-tuned on the BoolQ dataset to generate true/false question-answer pairs. Using T5’s text-to-text framework, the model takes a passage together with a target answer and generates a natural-language question and its corresponding yes/no answer.

Model Overview

The implementation is built with PyTorch Lightning, which handles the training and validation loops and simplifies hyperparameter tuning. The pre-trained T5-base model is adapted so that a single text-to-text model both generates the question and predicts its answer.

Data Processing

Input Construction

Each input sample is formatted as follows:

truefalse: [answer] passage: [passage] </s>
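The input template above can be sketched as a small helper. This is a minimal illustration, not the author's preprocessing code: the function name `build_input` and the whitespace cleanup are assumptions, and the card does not state which literal string fills `[answer]` (the `"yes"` used below is only a placeholder value).

```python
def build_input(answer_text: str, passage: str) -> str:
    """Format one source string as 'truefalse: [answer] passage: [passage] </s>'."""
    # Assumption: collapse runs of whitespace so line breaks in the passage
    # do not leak into the prompt. The card does not describe any cleanup.
    passage = " ".join(passage.split())
    return f"truefalse: {answer_text} passage: {passage} </s>"


# "yes" is a hypothetical answer string; check the training code for the real one.
example = build_input("yes", "The Nile flows north\nthrough Egypt.")
print(example)
```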

Target Construction

Each target sample is formatted as:

question: [question] answer: [yes/no] </s>

The boolean answer is normalized to “yes” or “no” to ensure consistency during training.
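The target format and the yes/no normalization could look roughly like this. The helper names `normalize_answer` and `build_target` are hypothetical, and stripping surrounding whitespace from the question is an assumption rather than something the card specifies.

```python
def normalize_answer(answer: bool) -> str:
    """Map a BoolQ boolean label to the string used in the target."""
    return "yes" if answer else "no"


def build_target(question: str, answer: bool) -> str:
    """Format one target string as 'question: [question] answer: [yes/no] </s>'."""
    # Assumption: strip surrounding whitespace from the question.
    return f"question: {question.strip()} answer: {normalize_answer(answer)} </s>"


print(build_target("does the nile flow north", True))
```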

Training Details

  • Framework: PyTorch Lightning
  • Optimizer: AdamW with linear learning rate scheduling and warmup
  • Batch Sizes:
    • Training: 6
    • Evaluation: 6
  • Maximum Sequence Length: 256 tokens
  • Number of Training Epochs: 4
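To make "linear learning rate scheduling and warmup" concrete, the sketch below computes the multiplier applied to the base learning rate at each optimizer step. It follows the same shape as `get_linear_schedule_with_warmup` in Transformers, but whether the author used that helper is not stated. The warmup and total step counts are hypothetical, since the card gives neither them nor the base learning rate.

```python
def linear_warmup_decay(step: int, warmup_steps: int, total_steps: int) -> float:
    """Multiplier applied to the base learning rate at a given optimizer step."""
    if step < warmup_steps:
        # Ramp linearly from 0 to 1 over the warmup phase.
        return step / max(1, warmup_steps)
    # Then decay linearly from 1 to 0 by the final step.
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


# Hypothetical values: neither is given in the model card.
WARMUP_STEPS = 100
TOTAL_STEPS = 1000

for step in (0, 50, 100, 550, 1000):
    print(step, linear_warmup_decay(step, WARMUP_STEPS, TOTAL_STEPS))
```

A function of this form can be wrapped in `torch.optim.lr_scheduler.LambdaLR` around an AdamW optimizer to obtain the per-step learning rate.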

Evaluation Metrics

The model’s performance was evaluated using BLEU scores for both the generated questions and answers. For question generation, the metrics are as follows:

Metric    Question
BLEU-1    0.5143
BLEU-2    0.3950
BLEU-3    0.3089
BLEU-4    0.2431

Note: BLEU-n measures n-gram overlap (up to length n) between generated and reference text, so these scores indicate how closely the generated questions match the reference questions. Only the question scores are listed here.
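For readers who want to see how a cumulative BLEU-n score is computed, here is a dependency-free sketch. The card does not say which BLEU implementation, tokenization, or smoothing was used for the reported numbers, so everything here is an assumption: whitespace tokenization, one reference per candidate, uniform weights, and no smoothing. It will not necessarily reproduce the figures above.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references, candidates, max_n):
    """Cumulative BLEU-max_n with uniform weights and one reference per candidate."""
    matched = [0] * max_n
    total = [0] * max_n
    ref_len = cand_len = 0
    for ref, cand in zip(references, candidates):
        ref_tok, cand_tok = ref.split(), cand.split()
        ref_len += len(ref_tok)
        cand_len += len(cand_tok)
        for n in range(1, max_n + 1):
            cand_counts = ngrams(cand_tok, n)
            ref_counts = ngrams(ref_tok, n)
            # Clip each candidate n-gram count by its count in the reference.
            matched[n - 1] += sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
            total[n - 1] += sum(cand_counts.values())
    if cand_len == 0 or min(matched) == 0:
        return 0.0  # no smoothing: any order with zero matches gives BLEU 0
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    brevity_penalty = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return brevity_penalty * math.exp(log_precision)


# Toy data, not taken from BoolQ.
refs = ["is the sky blue"]
cands = ["is the sky green"]
for n in range(1, 5):
    print(f"BLEU-{n}:", round(corpus_bleu(refs, cands, n), 4))
```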

How to Use

Run inference with the Hugging Face Transformers pipeline:

from transformers import pipeline

generator = pipeline("text2text-generation", model="Fares7elsadek/boolq-t5-base-question-generation")

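# Example inference: replace the bracketed placeholders with the target answer and your passage.
input_text = "truefalse: [answer] passage: [Your passage here] </s>"
result = generator(input_text)
# The pipeline returns a list of dicts; the generated string is under "generated_text".
print(result)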
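The generated string can then be split into its question and answer parts. The sketch below assumes the output mirrors the training target format (`question: ... answer: yes/no`); the model may deviate from it, so the hypothetical helper `parse_pair` returns `None` when the pattern is not found. The sample string is hand-written, not real model output.

```python
import re

# Assumption: the output follows "question: [question] answer: [yes/no]".
PAIR_PATTERN = re.compile(
    r"question:\s*(?P<question>.*?)\s*answer:\s*(?P<answer>yes|no)\b",
    re.IGNORECASE | re.DOTALL,
)


def parse_pair(generated_text: str):
    """Return (question, answer) from a generated string, or None if it does not match."""
    match = PAIR_PATTERN.search(generated_text)
    if match is None:
        return None
    return match.group("question"), match.group("answer").lower()


# Hand-written sample for illustration only.
sample = "question: does the nile flow north answer: yes"
print(parse_pair(sample))
```

With the pipeline call above, this would be applied as `parse_pair(result[0]["generated_text"])`.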