
TheHS-V1-7B-8E

Introduction

TheHS-V1-7B-8E is the first LLM released by thehosy.

Features:

  • Type: Causal Language Models
  • Training Stage: Pretraining
  • Architecture: Qwen3 MoE
  • Number of Parameters: 7.52B
  • Number of Layers: 28
  • Number of Attention Heads (GQA): 24 for Q and 4 for KV
  • Context Length: 16384 tokens (full context), with generation up to 8192 tokens

We do not recommend using base language models for conversations. Instead, apply post-training, e.g., SFT, RLHF, or continued pretraining, to this model first; a minimal fine-tuning sketch follows below.

Requirements

The code for TheHS-V1-7B-8E is supported in the latest Hugging Face transformers, and we advise you to use the latest version of transformers.

Citation

...
