---
library_name: transformers
license: mit
datasets:
- roneneldan/TinyStories
language:
- en
---
# Model Card for amusktweewt/tiny-stories-v1
This model is a custom transformer-based language model trained on the **TinyStories** dataset, designed for creative text generation tasks such as storytelling and conversational agents. **This model is purely an academic project and should not be used in production or practical applications.**
## Model Details
### Model Description
This model uses a custom Byte Pair Encoding (BPE) tokenizer and a deliberately small architecture to balance efficiency and performance. It is designed to generate coherent, contextually relevant short stories. However, a known tokenizer issue inserts spurious spaces between tokens during decoding (e.g., "coun try" instead of "country"), which degrades output quality.
- **Developed by:** amusktweewt
- **Model type:** AutoModelForCausalLM
- **Language(s) (NLP):** English
- **License:** MIT
### Model Sources
- **Repository:** [amusktweewt/tiny-stories-v1](https://huggingface.co/amusktweewt/tiny-stories-v1)
## Uses
### Direct Use
This model is intended for academic and research purposes only. It demonstrates a proof of concept for training smaller transformer-based language models.
### Out-of-Scope Use
- Not suitable for tasks requiring factual accuracy
- Should not be used in production environments or applications involving sensitive content
## Bias, Risks, and Limitations
### Risks and Biases
The model may reflect biases present in the training data, leading to unintended or inappropriate outputs. Additionally, the tokenizer issue can result in suboptimal and incoherent text generations.
### Recommendations
This model is meant for research and demonstration purposes. Users should validate outputs critically and avoid using it for practical applications.
## How to Get Started with the Model
```python
from transformers import AutoModelForCausalLM, PreTrainedTokenizerFast
model = AutoModelForCausalLM.from_pretrained("amusktweewt/tiny-stories-v1")
tokenizer = PreTrainedTokenizerFast.from_pretrained("amusktweewt/tiny-stories-v1")
prompt = "Once upon a time,"
inputs = tokenizer(prompt, return_tensors="pt", return_token_type_ids=False)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
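For more varied stories, sampling-based decoding can be used instead of the greedy decoding above. The snippet below is a self-contained sketch; the `temperature` and `top_p` values are illustrative defaults, not settings tuned for this model.

```python
from transformers import AutoModelForCausalLM, PreTrainedTokenizerFast

model = AutoModelForCausalLM.from_pretrained("amusktweewt/tiny-stories-v1")
tokenizer = PreTrainedTokenizerFast.from_pretrained("amusktweewt/tiny-stories-v1")

inputs = tokenizer("Once upon a time,", return_tensors="pt", return_token_type_ids=False)

# Sampling-based decoding; temperature/top_p are illustrative, not tuned for this model.
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.8,
    top_p=0.95,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```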
## Training Details
### Training Data
The model was trained on the **TinyStories** dataset, consisting of curated short stories. Preprocessing ensured consistent formatting and tokenization using a custom BPE tokenizer.
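The dataset is publicly available on the Hub and can be loaded with the `datasets` library as sketched below; the `"text"` column name reflects the public dataset layout, while the exact preprocessing script used for this model is not published.

```python
from datasets import load_dataset

# Load the public TinyStories dataset from the HuggingFace Hub.
dataset = load_dataset("roneneldan/TinyStories")
print(dataset)                      # inspect the available splits
print(dataset["train"][0]["text"])  # each example is a single short story
```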
### Training Procedure
#### Preprocessing
- Used a BPE tokenizer with a vocabulary size of 4096 (a minimal training sketch follows this list)
- Included special tokens: `<sos>`, `<pad>`, `<|endoftext|>`, and `<unk>`
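The original tokenizer-training script is not published; the following is a minimal sketch of how a tokenizer with these properties could be trained using the `tokenizers` library. The byte-level pre-tokenizer/decoder choice and the input file path are assumptions.

```python
from tokenizers import Tokenizer, decoders, models, pre_tokenizers, trainers

# Minimal sketch of training a 4096-token BPE tokenizer with the special tokens above.
# The byte-level components and the training file path are assumptions, not the original setup.
tokenizer = Tokenizer(models.BPE(unk_token="<unk>"))
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=True)
tokenizer.decoder = decoders.ByteLevel()

trainer = trainers.BpeTrainer(
    vocab_size=4096,
    special_tokens=["<sos>", "<pad>", "<|endoftext|>", "<unk>"],
)
tokenizer.train(files=["tinystories_train.txt"], trainer=trainer)  # placeholder path
tokenizer.save("tokenizer.json")
```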
#### Training Hyperparameters
- **Batch size:** 64
- **Epochs:** 3
- **Learning rate:** 1e-3
- **Scheduler:** Cosine annealing
- **Precision:** Mixed precision (FP16); see the `TrainingArguments` sketch after this list
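As a point of reference, these hyperparameters map onto HuggingFace `TrainingArguments` roughly as shown below. The output directory is a placeholder, and any option not listed above (warmup, weight decay, logging) is left at its library default rather than being a documented value.

```python
from transformers import TrainingArguments

# Documented hyperparameters expressed as TrainingArguments; output_dir is a placeholder
# and unlisted options are left at library defaults.
training_args = TrainingArguments(
    output_dir="tiny-stories-v1",
    per_device_train_batch_size=64,  # batch size 64
    num_train_epochs=3,              # 3 epochs
    learning_rate=1e-3,              # peak learning rate
    lr_scheduler_type="cosine",      # cosine annealing schedule
    fp16=True,                       # mixed-precision (FP16) training
)
```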
#### Speeds, Sizes, Times
- **Training time:** Approx. 5 hours 30 minutes
- **Model size:** 230 MB
- **Dataset size:** 535.98 million tokens
## Evaluation
### Testing Data, Factors & Metrics
#### Testing Data
A subset of the training data was used for evaluation, focusing on coherence and storytelling quality.
#### Metrics
- **Loss:** 0.9723
- **Qualitative evaluation:** Manual assessment of generated outputs for coherence and relevance.
### Results
- **Sample Outputs:**
- Prompt: "in a far away country"
Completion: "in a far away coun try . He was so excited to explore the world . He was so happy to be able to explore the world ."
#### Summary
The model generates coherent short stories suitable for research demonstration but is limited by tokenizer issues and should not be used in real-world scenarios.
## Environmental Impact
- **Hardware Type:** NVIDIA GeForce RTX 4090 GPU
- **Hours used:** 5.5
- **Carbon Emitted:** Approx. 0.2 kg CO2 eq
## Technical Specifications
### Model Architecture and Objective
- Causal (decoder-only) transformer with 8 layers, 12 attention heads, and a hidden size of 768; an illustrative configuration follows
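The exact architecture class is not documented; purely as an illustration, a GPT-2-style configuration with these dimensions could be declared as follows. The GPT-2 backbone and the context length are assumptions.

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Illustrative GPT-2-style config matching the documented dimensions.
# The architecture class and context length (n_positions) are assumptions.
config = GPT2Config(
    vocab_size=4096,  # matches the BPE tokenizer vocabulary
    n_layer=8,        # 8 transformer layers
    n_head=12,        # 12 attention heads
    n_embd=768,       # hidden size 768 (head dimension 64)
    n_positions=512,  # assumed context length
)
model = GPT2LMHeadModel(config)
print(f"{model.num_parameters() / 1e6:.1f}M parameters")
```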
### Compute Infrastructure
#### Hardware
- Single NVIDIA GeForce RTX 4090 GPU
#### Software
- Python 3.8+
- HuggingFace Transformers 4.x
- PyTorch 1.x
## Model Card Authors
amusktweewt
## Model Card Contact
For questions or feedback, contact amusktweewt.