|
---
license: llama3.1
datasets:
- TheFinAI/Fino1_Reasoning_Path_FinQA
language:
- en
base_model:
- TheFinAI/Fino1-8B
tags:
- Llama
- conversational
- finance
---
|
# Fino1-8B Quantized Models |
|
|
|
This repository contains Q4_KM and Q5_KM quantized versions of [TheFinAI/Fino1-8B](https://huggingface.co/TheFinAI/Fino1-8B), a financial reasoning model based on Llama 3.1 8B Instruct. These quantized variants maintain the model's financial reasoning capabilities while providing significant memory and speed improvements. |
|
|
|
Discover our full range of quantized language models by visiting our [SandLogic Lexicon HuggingFace](https://huggingface.co/SandLogicTechnologies). To learn more about our company and services, check out our website at [SandLogic](https://www.sandlogic.com/). |
|
|
|
## Model Details |
|
|
|
### Base Information |
|
- **Original Model**: Fino1-8B
- **Quantized Versions**:
  - Q4_KM (4-bit quantization)
  - Q5_KM (5-bit quantization)
- **Base Architecture**: Llama 3.1 8B Instruct
- **Primary Focus**: Financial reasoning tasks
- **Paper**: [arxiv.org/abs/2502.08127](https://arxiv.org/abs/2502.08127)
|
|
|
|
|
## Financial Capabilities
|
|
|
Both quantized versions maintain the original model's strengths in:

- Financial mathematical reasoning
- Structured financial question answering
- FinQA dataset-based problems
- Step-by-step financial calculations
- Financial document analysis
|
### Quantization Benefits |
|
|
|
#### Q4_KM Version
- Model size: 4.92 GB (75% reduction)
- Optimal for resource-constrained environments
- Faster inference speed
- Suitable for rapid financial calculations

#### Q5_KM Version
- Model size: 5.73 GB (69% reduction)
- Better quality preservation
- Balanced performance-size trade-off
- Recommended for precision-critical financial applications
## Usage
|
```bash
pip install llama-cpp-python
```
|
Please refer to the llama-cpp-python [documentation](https://llama-cpp-python.readthedocs.io/en/latest/) to install with GPU support. |
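For instance, the llama-cpp-python installation docs describe enabling the CUDA backend by setting `CMAKE_ARGS` at install time (other backends such as Metal use different flags; see the documentation linked above):

```shell
# Reinstall llama-cpp-python with the CUDA backend enabled
# (requires a working CUDA toolkit on the host)
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --upgrade --force-reinstall --no-cache-dir
```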
|
|
|
|
|
```python
from llama_cpp import Llama

llm = Llama(
    model_path="model/path/",
    verbose=False,
    # n_gpu_layers=-1,  # Uncomment to use GPU acceleration
    # n_ctx=2048,       # Uncomment to increase the context window
)

# Example of a reasoning task
output = llm(
    """Q: A company's revenue grew from $100,000 to $150,000 in one year.
Calculate the percentage growth rate. A: """,
    max_tokens=256,
    stop=["Q:", "\n\n"],
    echo=False,
)

print(output["choices"][0]["text"])
```
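For reference, the arithmetic the model is expected to reproduce in this example can be checked directly:

```python
# Expected answer for the example prompt:
# growth rate = (new - old) / old * 100
old_revenue = 100_000
new_revenue = 150_000
growth_pct = (new_revenue - old_revenue) / old_revenue * 100
print(f"{growth_pct:.1f}%")  # → 50.0%
```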
|
|
|
## Training Details |
|
|
|
### Original Model Training |
|
- **Dataset**: TheFinAI/Fino1_Reasoning_Path_FinQA
- **Methods**: SFT (Supervised Fine-Tuning) and RF
- **Hardware**: 4x H100 GPUs
- **Configuration**:
  - Batch Size: 16
  - Learning Rate: 2e-5
  - Epochs: 3
  - Optimizer: AdamW
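As an illustration only, a comparable SFT configuration with the hyperparameters above might be sketched using Hugging Face `transformers` (the output path and per-device batch split here are hypothetical, not the authors' actual training script):

```python
# Hypothetical sketch of the reported hyperparameters via
# transformers.TrainingArguments; not the original training code.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="fino1-sft",         # hypothetical output path
    per_device_train_batch_size=4,  # 4 per GPU x 4 H100s = global batch of 16
    learning_rate=2e-5,
    num_train_epochs=3,
    optim="adamw_torch",            # AdamW optimizer
)
```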
|
|
|
|
|
|
|
|