OpenAI GPT-2 355M
Model description
This custom GPT-2 model is derived from gpt2-medium and fine-tuned by the Anezatra team on the Alpaca dataset for natural language processing tasks. The model performs well on text generation and language understanding, making it well suited to chat applications.
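As a sketch of how a model like this would be used for generation with the Transformers library: the Hub repo id is not stated in this card, so the snippet below substitutes a tiny randomly initialized GPT-2 so it runs offline; with the real checkpoint you would call `from_pretrained` instead, as noted in the comment.

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel

# In practice, load the fine-tuned checkpoint from the Hub, e.g.:
#   model = GPT2LMHeadModel.from_pretrained("<repo-id>")  # repo id is not given in this card
# A tiny randomly initialized GPT-2 stands in here so the example runs offline.
config = GPT2Config(n_layer=2, n_head=2, n_embd=64, vocab_size=100)
model = GPT2LMHeadModel(config)
model.eval()

prompt_ids = torch.tensor([[1, 2, 3]])  # stand-in for tokenizer output
with torch.no_grad():
    out = model.generate(prompt_ids, max_new_tokens=5, do_sample=False, pad_token_id=0)
print(out.shape)  # prompt (3 tokens) + 5 new tokens -> (1, 8)
```

With the real checkpoint, the same `generate` call applies; you would tokenize the prompt with the matching tokenizer rather than passing raw token ids.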
Training Procedure
This model was trained on 4 × A100 GPUs.
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 128
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.15
- num_epochs: 1
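The relationship between the batch-size settings above can be reproduced with simple arithmetic; the Alpaca dataset size (~52,000 examples) is an assumption used only to illustrate how the warmup ratio translates into warmup steps.

```python
# Per-device batch size and gradient accumulation, from the hyperparameters above.
per_device_batch = 1
grad_accum_steps = 128

# Effective batch size per optimizer step on a single process:
effective_batch = per_device_batch * grad_accum_steps
print(effective_batch)  # 128, matching total_train_batch_size above

# Warmup steps for a linear scheduler with warmup_ratio 0.15 over one epoch.
# ~52k examples is the commonly cited Alpaca size (an assumption here).
dataset_size = 52_000
steps_per_epoch = dataset_size // effective_batch
warmup_steps = int(0.15 * steps_per_epoch * 1)  # num_epochs = 1
print(steps_per_epoch, warmup_steps)
```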