---
license: mit
datasets:
- google/boolq
language:
- en
metrics:
- bleu
base_model:
- google-t5/t5-base
pipeline_tag: text2text-generation
tags:
- question-generation
- education
- code
- boolean-questions
- text-generation-inference
library_name: transformers
---
# BoolQ T5

This repository contains a **T5-base** model fine-tuned on the [BoolQ dataset](https://huggingface.co/datasets/google/boolq) for generating true/false question-answer pairs. Leveraging T5’s text-to-text framework, the model can generate natural language questions and their corresponding yes/no answers directly from a given passage.

## Model Overview

The model was fine-tuned with [PyTorch Lightning](https://www.pytorchlightning.ai/), which streamlines training, validation, and hyperparameter tuning. By adapting the pre-trained **T5-base** checkpoint to both question generation and answer prediction, a single model handles comprehension of the passage and generation of the boolean question-answer pair.

## Data Processing

### Input Construction

Each input sample is formatted as follows:

```
truefalse: [answer] passage: [passage] </s>
```

### Target Construction

Each target sample is formatted as:

```
question: [question] answer: [yes/no] </s>
```

The boolean answer is normalized to “yes” or “no” to ensure consistency during training.
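
A minimal sketch of how a single BoolQ example could be turned into these input/target strings is shown below. The `build_example` helper and the sample values are illustrative, not the original preprocessing code; the field names (`passage`, `question`, boolean `answer`) follow the BoolQ schema:

```python
# Illustrative sketch (not the original preprocessing code): build the
# input/target strings described above from one BoolQ example.
def build_example(sample):
    answer = "yes" if sample["answer"] else "no"  # normalize boolean to yes/no
    source = f"truefalse: {answer} passage: {sample['passage']} </s>"
    target = f"question: {sample['question']} answer: {answer} </s>"
    return source, target

sample = {
    "passage": "The Eiffel Tower is located in Paris, France.",
    "question": "is the eiffel tower located in paris",
    "answer": True,
}
print(build_example(sample))
```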

## Training Details

- **Framework:** PyTorch Lightning  
- **Optimizer:** AdamW with linear learning rate scheduling and warmup  
- **Batch Sizes:**  
  - Training: 6  
  - Evaluation: 6  
- **Maximum Sequence Length:** 256 tokens  
- **Number of Training Epochs:** 4
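
A minimal sketch of the optimizer and scheduler setup listed above, using the Transformers helper `get_linear_schedule_with_warmup`; the learning rate, warmup length, and steps per epoch are assumptions, not values reported on this card:

```python
import torch
from transformers import T5ForConditionalGeneration, get_linear_schedule_with_warmup

# Sketch of the stated setup: AdamW with a linear schedule and warmup.
# Learning rate, warmup steps, and steps per epoch are assumed values.
model = T5ForConditionalGeneration.from_pretrained("t5-base")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

steps_per_epoch = 1000                      # assumed; depends on dataset size and batch size 6
num_training_steps = 4 * steps_per_epoch    # 4 epochs, as stated above
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=100,                   # assumed warmup length
    num_training_steps=num_training_steps,
)
```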

## Evaluation Metrics

The model’s performance was evaluated using BLEU scores for both the generated questions and answers. For question generation, the metrics are as follows:

| Metric  | Score (questions) |
|---------|-------------------|
| BLEU-1  | 0.5143            |
| BLEU-2  | 0.3950            |
| BLEU-3  | 0.3089            |
| BLEU-4  | 0.2431            |

*Note: These metrics offer a quantitative assessment of the model’s quality in generating coherent and relevant question-answer pairs.*
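
As an illustration only (not the original evaluation script), BLEU-1 through BLEU-4 can be computed over tokenized question strings with NLTK's `corpus_bleu`; the reference and hypothesis lists below are placeholders:

```python
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# Placeholder data: one reference list per sample, one hypothesis per sample.
references = [[["is", "the", "eiffel", "tower", "in", "paris"]]]
hypotheses = [["is", "the", "eiffel", "tower", "located", "in", "paris"]]

smooth = SmoothingFunction().method1
for n in range(1, 5):
    weights = tuple(1.0 / n for _ in range(n))  # uniform n-gram weights for BLEU-n
    score = corpus_bleu(references, hypotheses, weights=weights, smoothing_function=smooth)
    print(f"BLEU-{n}: {score:.4f}")
```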

## How to Use

You can run inference with the Hugging Face Transformers pipeline:

```python
from transformers import pipeline

generator = pipeline("text2text-generation", model="Fares7elsadek/boolq-t5-base-question-generation")

# Example inference: replace [answer] with "yes" or "no" and [Your passage here]
# with the source passage, matching the training input format.
input_text = "truefalse: [answer] passage: [Your passage here] </s>"
result = generator(input_text)
print(result)
```
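
Since the model's target format is `question: [question] answer: [yes/no]`, the generated text can be split back into its two fields. A small, illustrative post-processing sketch (exact output formatting may vary):

```python
# Continuing from the `result` returned above: split the generated text back
# into its question and answer parts.
text = result[0]["generated_text"]          # e.g. "question: ... answer: yes"
question_part, _, answer_part = text.partition("answer:")
question = question_part.replace("question:", "", 1).strip()
answer = answer_part.strip()
print("Question:", question)
print("Answer:", answer)
```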