---
library_name: transformers
license: mit
datasets:
- roneneldan/TinyStories
language:
- en
---

# Model Card for amusktweewt/tiny-stories-v1

This model is a custom transformer-based language model trained on the **TinyStories** dataset, designed for creative text generation tasks such as storytelling and conversational agents. **This model is purely an academic project and should not be used in production or practical applications.**

## Model Details

### Model Description

This model uses a custom Byte Pair Encoding (BPE) tokenizer and a deliberately small architecture to balance efficiency and performance. It is designed to generate coherent, contextually relevant short stories. However, a known tokenizer issue introduces repeated spaces (and occasionally splits words) in decoded text, which degrades output quality.

- **Developed by:** amusktweewt
- **Model type:** Causal language model (loadable with `AutoModelForCausalLM`)
- **Language(s) (NLP):** English
- **License:** MIT

### Model Sources

- **Repository:** `amusktweewt/tiny-stories-v1` on the Hugging Face Hub

## Uses

### Direct Use

This model is intended for academic and research purposes only. It demonstrates a proof of concept for training smaller transformer-based language models.

### Out-of-Scope Use

- Not suitable for tasks requiring factual accuracy
- Should not be used in production environments or applications involving sensitive content

## Bias, Risks, and Limitations

### Risks and Biases

The model may reflect biases present in the training data, which can lead to unintended or inappropriate outputs. In addition, the tokenizer issue can produce incoherent or poorly formatted text.

### Recommendations

This model is meant for research and demonstration purposes. Users should critically validate its outputs and avoid using it in practical applications.

## How to Get Started with the Model

```python
from transformers import AutoModelForCausalLM, PreTrainedTokenizerFast

# Load the model and its custom tokenizer from the Hub.
model = AutoModelForCausalLM.from_pretrained("amusktweewt/tiny-stories-v1")
tokenizer = PreTrainedTokenizerFast.from_pretrained("amusktweewt/tiny-stories-v1")

# Encode a story prompt and generate a continuation.
prompt = "Once upon a time,"
inputs = tokenizer(prompt, return_tensors="pt", return_token_type_ids=False)
outputs = model.generate(**inputs, max_new_tokens=50)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
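
The tokenizer issue described above tends to leave repeated spaces in the decoded text. As a partial workaround (not part of the original inference code, just a sketch that continues from the snippet above), whitespace can be collapsed after decoding:

```python
import re

text = tokenizer.decode(outputs[0], skip_special_tokens=True)

# Collapse runs of whitespace introduced by the tokenizer issue.
# Note: this does not repair words that were split (e.g. "coun try").
clean_text = re.sub(r"\s+", " ", text).strip()
print(clean_text)
```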

## Training Details

### Training Data

The model was trained on the **TinyStories** dataset, a collection of curated short stories. Preprocessing ensured consistent formatting and tokenization with the custom BPE tokenizer described below.
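
The exact preprocessing script is not included in this card. As a rough illustration, the dataset can be loaded from the Hub with the `datasets` library (the `text` column name is the dataset's default):

```python
from datasets import load_dataset

# Load the TinyStories dataset used for training.
dataset = load_dataset("roneneldan/TinyStories")
print(dataset["train"][0]["text"])
```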

### Training Procedure

#### Preprocessing

- Used a BPE tokenizer with a vocabulary size of 4096
- Included special tokens: `<sos>`, `<pad>`, `<|endoftext|>`, and `<unk>`
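
The tokenizer-training code is not part of this card. A minimal sketch of training a comparable BPE tokenizer with the `tokenizers` library, using the vocabulary size and special tokens listed above, might look like this (the whitespace pre-tokenizer and the inline sample corpus are assumptions):

```python
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# BPE model with the same unknown token as above.
tokenizer = Tokenizer(models.BPE(unk_token="<unk>"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()

trainer = trainers.BpeTrainer(
    vocab_size=4096,
    special_tokens=["<sos>", "<pad>", "<|endoftext|>", "<unk>"],
)

# Placeholder corpus; in practice this would iterate over TinyStories texts.
story_texts = ["Once upon a time, there was a little girl."]
tokenizer.train_from_iterator(story_texts, trainer=trainer)
tokenizer.save("tokenizer.json")
```

The choice of pre-tokenizer affects how whitespace is reproduced at decoding time, which is likely related to the spacing issue noted in the description.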

#### Training Hyperparameters

- **Batch size:** 64
- **Epochs:** 3
- **Learning rate:** 1e-3
- **Scheduler:** Cosine annealing
- **Precision:** Mixed precision (FP16)
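
The original training script is not reproduced here. A hedged sketch of how these hyperparameters could map onto the `transformers` `Trainer` API (the output directory, sequence length, and tokenization details are assumptions):

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("amusktweewt/tiny-stories-v1")
model = AutoModelForCausalLM.from_pretrained("amusktweewt/tiny-stories-v1")

# Tokenize TinyStories (max_length is an assumption, not stated in the card).
dataset = load_dataset("roneneldan/TinyStories", split="train")
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=dataset.column_names,
)

training_args = TrainingArguments(
    output_dir="tiny-stories-v1",       # placeholder output path
    per_device_train_batch_size=64,
    num_train_epochs=3,
    learning_rate=1e-3,
    lr_scheduler_type="cosine",         # cosine annealing schedule
    fp16=True,                          # mixed-precision training
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```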

#### Speeds, Sizes, Times

- **Training time:** Approx. 5 hours 30 minutes
- **Model size:** 230 MB
- **Dataset size:** 535.98 million tokens

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

A subset of the training data was used for evaluation, focusing on coherence and storytelling quality.

#### Metrics

- **Loss:** 0.9723
- **Qualitative evaluation:** Manual assessment of generated outputs for coherence and relevance.
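
If the reported loss is the mean per-token cross-entropy (an assumption; the card does not state this), the implied perplexity is simply its exponential:

```python
import math

eval_loss = 0.9723
# Perplexity = exp(loss), assuming mean per-token cross-entropy.
print(math.exp(eval_loss))  # ~2.64
```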

### Results

- **Sample output:**
  - Prompt: "in a far away country"
  - Completion: "in a far away coun try . He was so excited to explore the world . He was so happy to be able to explore the world ."

#### Summary

The model generates coherent short stories suitable for research demonstration, but it is limited by tokenizer issues (visible in the extra spaces and split words above) and should not be used in real-world scenarios.

## Environmental Impact

- **Hardware Type:** NVIDIA RTX 4090 GPU
- **Hours used:** 5.5
- **Carbon Emitted:** Approx. 0.2 kg CO2 eq

## Technical Specifications

### Model Architecture and Objective

- Transformer architecture with 8 layers, 12 attention heads, and a hidden size of 768, trained with a causal language modeling objective
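
The specific architecture class is not stated in this card. Assuming a GPT-2-style decoder (an assumption), a configuration with these dimensions could be instantiated as follows:

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Dimensions from the card: 8 layers, 12 heads, hidden size 768.
# vocab_size matches the 4096-token BPE tokenizer described above.
config = GPT2Config(
    vocab_size=4096,
    n_layer=8,
    n_head=12,
    n_embd=768,
)
model = GPT2LMHeadModel(config)
print(f"{model.num_parameters() / 1e6:.1f}M parameters")
```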

### Compute Infrastructure

#### Hardware

- Single GPU (NVIDIA RTX 4090)

#### Software

- Python 3.8+
- Hugging Face Transformers 4.x
- PyTorch 1.x

## Model Card Authors

amusktweewt

## Model Card Contact

For questions or feedback, contact amusktweewt.