MathGPT-2 (distilgpt2 Fine-Tuned for Arithmetic)

This model is a fine-tuned version of DistilGPT-2 on a custom dataset consisting exclusively of arithmetic problems and their answers. The goal of this model is to act as a calculator that can solve basic arithmetic problems.

Model Description

The model was trained on a dataset of simple arithmetic expressions covering addition, subtraction, multiplication, and division. The training data was generated with a Python script and deduplicated so that no expression appears twice.

Key Features:

  • Solves basic arithmetic (addition, subtraction, multiplication, division)
  • Can handle simple problems like 12 + 5 =
  • Fine-tuned version of distilgpt2 on a math-specific dataset
  • Trained for 10 epochs (further improvements can be made by training for more epochs)

Model Details

  • Model architecture: DistilGPT-2 (81.9M parameters, F32)
  • Training duration: 10 epochs (could be improved further)
  • Dataset: generated math expressions like 12 + 5 = 17
  • Tokenization: standard GPT-2 tokenizer
  • Fine-tuned on: simple arithmetic operations

Intended Use

This model is designed to:

  • Answer basic arithmetic problems (addition, subtraction, multiplication, division).
  • Generate answers for simple prompts such as 12 * 6 = ?.

Example:

Input:

13 + 47 =

Output:

60
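A minimal way to query the model, assuming the transformers library and the FlameF0X/MathGPT2 checkpoint on the Hugging Face Hub (this is a sketch, not an official usage script):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("FlameF0X/MathGPT2")
model = AutoModelForCausalLM.from_pretrained("FlameF0X/MathGPT2")

prompt = "13 + 47 ="
inputs = tok(prompt, return_tensors="pt")
# Greedy decoding; a handful of new tokens is enough for a numeric answer.
out = model.generate(**inputs, max_new_tokens=4, do_sample=False,
                     pad_token_id=tok.eos_token_id)
# Strip the prompt from the decoded text to keep only the model's answer.
answer = tok.decode(out[0], skip_special_tokens=True)[len(prompt):].strip()
print(answer)
```

Greedy decoding is used because the task has a single correct answer; sampling would only add noise.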

Benchmark Results

We evaluated the model on a set of 10,000 randomly generated arithmetic expressions. Here are the results:

  • Accuracy: 76.3%
  • Average Inference Time: 0.1448 seconds per question
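The evaluation loop described above can be sketched as follows. `predict` is a stand-in for whatever produces the model's answer; here it is a hypothetical exact oracle built from Python arithmetic, so the harness runs without the model, and the same distribution assumptions as the training data (operands 1–100, integer-only division) apply:

```python
import random
import time

def make_problem(rng):
    """One arithmetic prompt plus its integer ground-truth answer."""
    op = rng.choice("+-*/")
    a, b = rng.randint(1, 100), rng.randint(1, 100)
    if op == "/":
        # Keep quotients exact, as in the training data (e.g. 100 / 25 = 4).
        a = b * rng.randint(1, max(1, 100 // b))
    answer = {"+": a + b, "-": a - b, "*": a * b, "/": a // b}[op]
    return f"{a} {op} {b} =", answer

def evaluate(predict, n=1000, seed=0):
    """Return (accuracy, average seconds per question)."""
    rng = random.Random(seed)
    correct = 0
    start = time.perf_counter()
    for _ in range(n):
        prompt, expected = make_problem(rng)
        if predict(prompt).strip() == str(expected):
            correct += 1
    elapsed = time.perf_counter() - start
    return correct / n, elapsed / n

# Hypothetical stand-in: an exact oracle instead of the model.
oracle = lambda p: str(eval(p.rstrip("= ").replace("/", "//")))
acc, avg_time = evaluate(oracle, n=200)
```

To benchmark the model itself, replace `oracle` with a function that tokenizes the prompt, generates, and returns the decoded answer.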

Training Data

The training dataset was generated with Python and consists of random arithmetic expressions (addition, subtraction, multiplication, division) with operands from 1 to 100. The expressions were formatted as:

2 + 3 = 5
100 - 25 = 75
45 * 5 = 225
100 / 25 = 4

The dataset contains no duplicate expressions, so the model never trains on the same problem twice.
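The generation script itself is not published; the following is a minimal sketch consistent with the description above (operands 1–100, four operations, integer-only division, no duplicates). All names are illustrative:

```python
import operator
import random

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.floordiv}

def generate_dataset(n=10000, seed=42):
    """Return n unique training lines formatted like '12 + 5 = 17'."""
    rng = random.Random(seed)
    seen, lines = set(), []
    while len(lines) < n:
        op = rng.choice("+-*/")
        a, b = rng.randint(1, 100), rng.randint(1, 100)
        if op == "/":
            # Force an exact quotient while keeping the dividend <= 100.
            a = b * rng.randint(1, max(1, 100 // b))
        expr = f"{a} {op} {b}"
        if expr in seen:
            continue  # skip duplicates so every expression is unique
        seen.add(expr)
        lines.append(f"{expr} = {OPS[op](a, b)}")
    return lines
```

Floor division is safe for the `/` case because the dividend is constructed as an exact multiple of the divisor.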

Fine-Tuning

This model was fine-tuned from the distilgpt2 base model for 10 epochs.

Limitations

  • Basic Arithmetic Only: The model can only handle basic arithmetic problems like addition, subtraction, multiplication, and division. It does not handle more complex operations like exponentiation, logarithms, or advanced algebra.
  • Limited Training Duration: While trained for 10 epochs, more epochs or data diversity may improve the model's performance further.
  • No answer verification: The model's outputs are not checked against a ground-truth calculator, and it still produces incorrect answers for some problems.