tiny_shakespeare_transformer

A small Transformer Decoder model trained from scratch on the Tiny Shakespeare dataset.

Training details

  • Dataset: Tiny Shakespeare
  • Epochs: 5
  • Learning Rate: 0.0003
  • Batch Size: 32
  • Block Size: 128
  • Optimizer: AdamW
  • Loss Function: CrossEntropyLoss
  • Dropout Rate: 0.1
  • Embedding Dimension: 256
  • Number of Layers: 6
  • Number of Attention Heads: 8
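
The hyperparameters above map naturally onto a small configuration object. The sketch below is purely illustrative (the original training script is not part of this repository), and the class and field names are hypothetical:

from dataclasses import dataclass

@dataclass
class TrainConfig:
    epochs: int = 5
    learning_rate: float = 3e-4   # 0.0003
    batch_size: int = 32
    block_size: int = 128         # context length in tokens
    dropout: float = 0.1
    n_embd: int = 256             # embedding dimension
    n_layer: int = 6              # decoder layers
    n_head: int = 8               # attention heads per layer

config = TrainConfig()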

Usage

To use this model, load it with the following code:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained("NataliaH/tiny_shakespeare_transformer")
tokenizer = AutoTokenizer.from_pretrained("NataliaH/tiny_shakespeare_transformer")

# Encode input text
inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0]))
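
generate() uses greedy decoding by default, which tends to be repetitive. Continuing from the snippet above, you can pass standard sampling arguments for longer, more varied Shakespeare-style output (the values below are illustrative, not tuned):

# Sample 100 new tokens with temperature and top-k sampling
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.8,
    top_k=50,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))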

Model Architecture

This model uses a decoder-only Transformer architecture for text generation. It was trained from scratch on the Tiny Shakespeare dataset to generate Shakespeare-like text.
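
A decoder-only model with these dimensions can be sketched in plain PyTorch as follows. This is a hypothetical re-implementation for illustration only; the actual model definition may differ:

import torch
import torch.nn as nn

class TinyShakespeareTransformer(nn.Module):
    def __init__(self, vocab_size, n_embd=256, n_head=8, n_layer=6,
                 block_size=128, dropout=0.1):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, n_embd)
        self.pos_emb = nn.Embedding(block_size, n_embd)
        layer = nn.TransformerEncoderLayer(
            d_model=n_embd, nhead=n_head, dim_feedforward=4 * n_embd,
            dropout=dropout, batch_first=True)
        # A decoder-only LM is an encoder stack driven by a causal attention mask.
        self.blocks = nn.TransformerEncoder(layer, num_layers=n_layer)
        self.ln_f = nn.LayerNorm(n_embd)
        self.lm_head = nn.Linear(n_embd, vocab_size)

    def forward(self, idx):
        B, T = idx.shape
        pos = torch.arange(T, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        mask = nn.Transformer.generate_square_subsequent_mask(T).to(idx.device)
        x = self.blocks(x, mask=mask)            # causal self-attention
        return self.lm_head(self.ln_f(x))        # logits over the vocabulary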

Training Process

  • Training was performed for 5 epochs.
  • The AdamW optimizer was used with a learning rate of 0.0003.
  • The dropout rate was set to 0.1 during training to reduce overfitting.
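
Under these settings, a training step amounts to next-token prediction with cross-entropy loss. A minimal sketch of such a loop, assuming the hypothetical TinyShakespeareTransformer class above and a 1-D tensor of token ids for Tiny Shakespeare, might look like:

import torch
import torch.nn as nn

def get_batch(data, block_size=128, batch_size=32):
    # Sample random contiguous chunks and their one-step-shifted targets.
    ix = torch.randint(len(data) - block_size - 1, (batch_size,))
    x = torch.stack([data[i:i + block_size] for i in ix])
    y = torch.stack([data[i + 1:i + 1 + block_size] for i in ix])
    return x, y

def train(model, train_data, epochs=5, steps_per_epoch=1000):
    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
    loss_fn = nn.CrossEntropyLoss()
    model.train()  # keeps the 0.1 dropout active during training
    for epoch in range(epochs):
        for _ in range(steps_per_epoch):
            xb, yb = get_batch(train_data)
            logits = model(xb)                                  # (B, T, vocab)
            loss = loss_fn(logits.view(-1, logits.size(-1)), yb.view(-1))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        print(f"epoch {epoch + 1}: loss {loss.item():.3f}")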

License

This model is released under the MIT License.
