# TheHS-V1-7B-8E

## Introduction
TheHS-V1-7B-8E is the first LLM released by thehosy.
Features:
- Type: Causal Language Models
- Training Stage: Pretraining
- Architecture: Qwen3 MoE
- Number of Parameters: 7.52B
- Number of Layers: 28
- Number of Attention Heads (GQA): 24 for Q and 4 for KV
- Context Length: 16,384 tokens (full context), with generation up to 8,192 tokens
We do not recommend using base language models for conversations. Instead, you can apply post-training, e.g., SFT, RLHF, continued pretraining, etc., on this model.
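For instance, a minimal supervised fine-tuning (SFT) run with the `trl` library could look like the sketch below; the repository id `thehosy/TheHS-V1-7B-8E` and the demo dataset are illustrative assumptions, not part of this release.

```python
# Minimal SFT sketch using trl; the ids here are illustrative assumptions.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Small public chat dataset, used purely for illustration.
dataset = load_dataset("trl-lib/Capybara", split="train")

trainer = SFTTrainer(
    model="thehosy/TheHS-V1-7B-8E",  # assumed repository id for this base model
    train_dataset=dataset,
    args=SFTConfig(output_dir="TheHS-V1-7B-8E-SFT"),
)
trainer.train()
```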
## Requirements
The code for TheHS-V1-7B-8E is included in the latest Hugging Face `transformers`, and we advise you to use the latest version of `transformers`.
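As a quick check, the base model can be loaded for plain text completion as in the sketch below. Note that the repository id `thehosy/TheHS-V1-7B-8E` is an assumption inferred from the model name; treat this as a minimal sketch rather than an official example.

```python
# Minimal sketch: load the base model and generate a text completion.
# The repo id below is assumed from the model name and may differ.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "thehosy/TheHS-V1-7B-8E"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",  # keep the checkpoint's native dtype
    device_map="auto",   # requires `accelerate` for automatic placement
)

# This is a base (pretrained) model, so use plain completion, not a chat template.
prompt = "The theory of relativity states that"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```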
## Citation
...