Nova-Nox-Neural-Network
All images used are created by Rikka Botan.
Flash Technical Report (Japanese)
https://qiita.com/peony_snow/items/8ae4e83b8de5c342ab62
About
N4: Nova-Nox-Neural-Network is an architecture that combines the self-referential capability of the Attention mechanism with a simplified selective-copying mechanism inspired by S6, yielding a more expressive QK matrix and thereby improving accuracy.
The architecture uses ASGG (Adaptive Swish-GELU Gating) as the activation function in its MLP blocks, which contributes to richer representational capacity.
It also uses DyT (Dynamic Tanh) for normalization, which improves computational efficiency.
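The exact ASGG formulation is defined in this repository (model/N4_modeling.py). As a rough, non-authoritative sketch, one plausible reading of "Adaptive Swish-GELU Gating" is a gated MLP whose gate blends SiLU (Swish) and GELU through a learnable mixing coefficient:

import torch
from torch import nn
import torch.nn.functional as F

class ASGGMLP(nn.Module):
    # Hypothetical ASGG-style gated MLP (an assumption, not the repository's exact code):
    # the gate interpolates between SiLU (Swish) and GELU with a learnable coefficient.
    def __init__(self, hidden_size: int, inter_size: int, bias: bool = False):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, inter_size, bias=bias)
        self.up_proj = nn.Linear(hidden_size, inter_size, bias=bias)
        self.down_proj = nn.Linear(inter_size, hidden_size, bias=bias)
        self.mix = nn.Parameter(torch.tensor(0.5))  # learnable Swish/GELU mixing weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        g = self.gate_proj(x)
        gate = self.mix * F.silu(g) + (1.0 - self.mix) * F.gelu(g)  # adaptive Swish-GELU blend
        return self.down_proj(gate * self.up_proj(x))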
Key Features
A simplified Selective copying mechanism
ASGG: Adaptive Swish-GELU Gating + MLP
DyT: Dynamic Tanh Normalization
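For reference, DyT replaces the normalization layer with an elementwise tanh plus learnable scale and shift, DyT(x) = γ · tanh(α · x) + β. A minimal sketch following the published DyT formulation (the module in model/N4_modeling.py may differ in details such as the α initialization):

import torch
from torch import nn

class DyT(nn.Module):
    # Dynamic Tanh normalization: y = gamma * tanh(alpha * x) + beta,
    # with a learnable scalar alpha and per-channel gamma/beta.
    # Sketch of the published formulation, not necessarily this repository's exact code.
    def __init__(self, hidden_size: int, alpha_init: float = 0.5):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(1) * alpha_init)  # learnable scalar
        self.gamma = nn.Parameter(torch.ones(hidden_size))     # per-channel scale
        self.beta = nn.Parameter(torch.zeros(hidden_size))     # per-channel shift

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.gamma * torch.tanh(self.alpha * x) + self.beta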
Training Results
Training Setting
Parameters: 127M
(vocab_size=32768, hidden_size=768, inter_size=1536, heads=6, layers=18)
Optimizer: AdamW
(lr=6e-4, betas=(0.9, 0.95), eps=1e-9, weight_decay=1e-1, warmup_steps=2000)
Batch size: 8
Gradient accumulation: 16
Dataset: fineweb (0.5B tokens, 1 epoch = 976 steps)
Max length: 512
dtype: bfloat16
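With these settings, one optimizer step covers up to 8 × 16 × 512 = 65,536 tokens (batch size × gradient accumulation × max length), assuming sequences are packed to the full max length.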
Implementation and License
This repository is the official pure PyTorch implementation.
Licensed under the MIT License.
Commercial use is permitted.
How to use
- Clone the repository
git clone https://github.com/Rikka-Botan/Nova-Nox-Neural-Network.git
- Import necessary libraries
import torch
from torch import nn
import torch.nn.functional as F
from model.N4_modeling import N4C
- Create the model
"""
Args:
    hidden_size: int - model hidden size
    inter_size: int - MLP intermediate size
    vocab_size: int - tokenizer vocabulary size
    heads: int - number of attention heads
    layers: int - number of N4D (decoder) layers
"""
hidden_size = 768
intermediate_size = 3072
vocab_size = 32064
heads = 6
layers = 6
model = N4C(
    hidden_size,
    intermediate_size,
    vocab_size,
    heads,
    layers
)
output = model(tokenized_text)
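Here tokenized_text is expected to be a tensor of token ids with shape (batch, seq_len). A hedged sketch of producing it with a Hugging Face tokenizer; set vocab_size above to match the tokenizer's vocabulary size (the inference example below pairs the Mistral tokenizer with vocab_size=32768):

from transformers import AutoTokenizer

# Any tokenizer whose vocabulary size matches vocab_size above will work;
# this is the tokenizer used in the inference example below.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.3")
tokenized_text = tokenizer("Nova Nox Neural Network", return_tensors="pt")["input_ids"]  # shape (1, seq_len)
output = model(tokenized_text)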
How to Train
- Training code
from torch.optim import AdamW
optimizer = AdamW(
    model.parameters(),
    lr=6.0e-4,
    betas=(0.9, 0.95),
    eps=1e-8,
    weight_decay=1e-1
)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
model.train()
for batch in dataloader:
    optimizer.zero_grad()
    batch = batch.to(device)
    loss = model(input=batch, labels=batch)[1]  # the second element of the output is the loss
    loss.backward()
    optimizer.step()
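The loop above omits the gradient accumulation (16) and learning-rate warmup (2,000 steps) listed under Training Setting. A hedged sketch of adding both; the linear-warmup schedule is an assumption, not necessarily the schedule used for the reported run:

from torch.optim.lr_scheduler import LambdaLR

accumulation_steps = 16  # from Training Setting
warmup_steps = 2000      # from Training Setting

# Linear warmup to the base learning rate, then constant (an assumed schedule).
scheduler = LambdaLR(optimizer, lr_lambda=lambda step: min(1.0, (step + 1) / warmup_steps))

optimizer.zero_grad()
for i, batch in enumerate(dataloader):
    batch = batch.to(device)
    loss = model(input=batch, labels=batch)[1]   # the second element of the output is the loss
    (loss / accumulation_steps).backward()       # average gradients over the accumulation window
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()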
How to run inference
- Inference code
# N4: Nova Nox Neural Network inference
# coding=utf-8
# Copyright 2025 Rikka Botan. All rights reserved
# Licensed under the "MIT License"
import torch
from transformers import AutoTokenizer
import os
from model.N4_modeling import N4C
model_name = "mistralai/Mistral-7B-v0.3"
tokenizer = AutoTokenizer.from_pretrained(model_name)
cwd = os.path.abspath('your model path')
model = N4C(
    vocab_size=32768,
    hidden_size=768,
    inter_size=1536,
    heads=6,
    layers=18,
    bias=False
)
state_dict = torch.load(os.path.join(cwd, 'N4_test_model.bin'), weights_only=True)
model.load_state_dict(state_dict, strict=False)
model = model.to('cpu')
model.eval()
text = "Large Language Models (LLMs) are advanced artificial intelligence systems designed to"
inputs = tokenizer(text, return_tensors='pt')
output = model.generate_n4c(
    input_ids=inputs["input_ids"].to('cpu'),
    max_new_tokens=128,
    temperature=0.7,
    top_k=10,
    top_p=2,  # with the usual nucleus-sampling convention, top_p >= 1.0 applies no nucleus cutoff
    eos_token_id=2
)
# Print the prompt followed by the generated continuation
for token in inputs['input_ids']:
    print(tokenizer.decode(token), end=" ")
for token in output:
    print(tokenizer.decode(token), end=" ", flush=True)
Acknowledgements
I thank the developers of Python and PyTorch.
I thank all the researchers for their efforts to date.
I thank Japan's high standard of education.
And most of all, thank you for your interest in this repository.
Citations
I would be happy if you include a citation, but it is not required.
Feel free to use this model.
Contact Us
About Author
Rikka Botan
A Japanese independent researcher with a shy and pampered personality >_<
Twin-tail hair is a charm point :)
Interested in natural language processing.
Usually using Python and C.