{
  "architectures": [
    "Step1MoEForCausalLM"
  ],
  "model_type": "step1",
  "hidden_size": 2048,
  "intermediate_size": 8192,
  "num_attention_heads": 16,
  "num_attention_groups": 16,
  "num_hidden_layers": 16,
  "max_seq_len": 65536,
  "vocab_size": 65536,
  "rms_norm_eps": 1e-05,
  "torch_dtype": "bfloat16",
  "moe_every_n_layer": 64,
  "moe_intermediate_size": 4096,
  "moe_num_experts": 8,
  "moe_top_k": 2,
  "use_moe": true,
  "moe_layer_offset": 1
}
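The config does not document how the MoE placement fields combine, so the
sketch below is one plausible reading, not a confirmed implementation: layer
index i uses the MoE FFN when (i - moe_layer_offset) is a non-negative
multiple of moe_every_n_layer. The filename "config.json" and the modulo
interpretation are assumptions for illustration only.

import json

# Hypothetical path; the loading mechanism itself is standard json.
with open("config.json") as f:
    config = json.load(f)

def moe_layer_indices(cfg):
    """Return the hidden-layer indices assumed to use the MoE FFN.

    Assumption: a layer is MoE when its index, shifted by
    moe_layer_offset, is a non-negative multiple of moe_every_n_layer.
    """
    if not cfg.get("use_moe", False):
        return []
    n = cfg["num_hidden_layers"]
    every = cfg["moe_every_n_layer"]
    offset = cfg["moe_layer_offset"]
    return [i for i in range(n) if i >= offset and (i - offset) % every == 0]

print(moe_layer_indices(config))
# Under this reading, with 16 layers, moe_every_n_layer=64 and
# moe_layer_offset=1, only layer 1 qualifies; every other layer would use
# the dense FFN (intermediate_size=8192). Each MoE layer routes tokens to
# moe_top_k=2 of moe_num_experts=8 experts, each with an intermediate
# size of 4096. Per-head dimension follows from the config as
# hidden_size / num_attention_heads = 2048 / 16 = 128.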