# TheHS-V1-7B-8E

## Introduction
TheHS-V1-7B-8E is the first LLM released by thehosy.
Features:
- Type: Causal Language Models
- Training Stage: Pretraining
- Architecture: Qwen3 MoE
- Number of Parameters: 7.52B
- Number of Layers: 28
- Number of Attention Heads (GQA): 24 for Q and 4 for KV
- Context Length: 16,384 tokens (full context), with generation up to 8,192 tokens
We do not recommend using base language models for conversations. Instead, you can apply post-training, e.g., SFT, RLHF, continued pretraining, etc., on this model.
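For instance, a minimal supervised fine-tuning (SFT) run with the `trl` library could look like the sketch below; the repository id `thehosy/TheHS-V1-7B-8E` and the demo dataset are illustrative assumptions, not part of this release.

```python
# Minimal SFT sketch using trl; the ids here are illustrative assumptions.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Small public chat dataset, used purely for illustration.
dataset = load_dataset("trl-lib/Capybara", split="train")

trainer = SFTTrainer(
    model="thehosy/TheHS-V1-7B-8E",  # assumed repository id for this base model
    train_dataset=dataset,
    args=SFTConfig(output_dir="TheHS-V1-7B-8E-SFT"),
)
trainer.train()
```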
## Requirements
The code for TheHS-V1-7B-8E is included in the latest Hugging Face `transformers`, and we advise you to use the latest version of `transformers`.
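As a quick check, the base model can be loaded for plain text completion as in the sketch below. Note that the repository id `thehosy/TheHS-V1-7B-8E` is an assumption inferred from the model name; treat this as a minimal sketch rather than an official example.

```python
# Minimal sketch: load the base model and generate a text completion.
# The repo id below is assumed from the model name and may differ.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "thehosy/TheHS-V1-7B-8E"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",  # keep the checkpoint's native dtype
    device_map="auto",   # requires `accelerate` for automatic placement
)

# This is a base (pretrained) model, so use plain completion, not a chat template.
prompt = "The theory of relativity states that"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```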
## Citation
...