---
language: [es]
license: mit
tags:
- text-classification
- agriculture
- climate
- potato
- Peru
- Huancavelica
- LLaMA
- environmental-prediction
model-index:
- name: llama-lateblight-classifier
  results:
  - task:
      type: text-classification
      name: Potato Late Blight Risk Classification
    dataset:
      name: Huancavelica Late Blight Benchmark (Balanced)
      type: tabular
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.97
    - name: F1 (macro)
      type: f1
      value: 0.97
    - name: Precision
      type: precision
      value: 0.97
    - name: Recall
      type: recall
      value: 0.97
pipeline_tag: text-classification
library_name: transformers
---
# 🌾 LLaMA Late Blight Classifier (Huancavelica, Peru)
This model is a fine-tuned classifier based on `openlm-research/open_llama_3b`, trained to predict **potato late blight risk levels** in the highlands of Huancavelica, Peru: `Bajo` (low), `Moderado` (moderate), or `Alto` (high). It takes environmental inputs (temperature, humidity, precipitation) plus the crop variety and returns exactly one of the three classes.
---
## 🤝 Use Case
- **Direct Use**: Agronomic advisory systems or research tools predicting potato late blight risk from structured prompts or API queries.
- **Not for**: Open-ended generation, conversational use, or regions with different pathogen pressures without retraining.
---
## 🌐 Model Details
- **Base model**: `openlm-research/open_llama_3b`
- **Architecture**: OpenLLaMA 3B with a sequence-classification head (`AutoModelForSequenceClassification`)
- **Fine-tuning method**: Full fine-tuning on a balanced, curated dataset (not LoRA)
- **Tokenizer**: Compatible LLaMA tokenizer (`tokenizer.model` included)
- **Language**: Spanish (structured prompts)
- **Task**: Hard classification (3-class)
---
## 🎓 Training
- **Dataset**: 156 training + 24 validation examples (balanced across 3 classes)
- **Labels**: `Bajo`, `Moderado`, `Alto`
- **Format** (JSONL):
```json
{
"instruction": "Evalúa el riesgo de tizón tardío basado en los datos climáticos y la variedad.",
"input": "Escenario 1: Temperatura promedio 17.2 °C, Humedad 83%, Precipitación 3.4 mm, Variedad Yungay",
"output": "Moderado"
}
```
- **Epochs**: 10
- **Optimizer**: AdamW, with mixed-precision training
- **Hardware**: 1x A100 40GB (Colab Pro, single GPU)
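
A minimal sketch of how records in the JSONL format above could be turned into classification examples. The label order in `LABEL2ID`, the helper names, and the choice to join `instruction` and `input` with a newline are assumptions for illustration; the card does not state how the training script built its inputs, and the model's real `id2label` mapping may differ.

```python
import json
from collections import Counter

# Assumed label order; check the model's config for the real id2label mapping.
LABEL2ID = {"Bajo": 0, "Moderado": 1, "Alto": 2}


def parse_jsonl(text):
    """Parse JSONL text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def to_example(record):
    """Build one prompt string (assumed layout) and encode the label as an int."""
    prompt = f"{record['instruction']}\n{record['input']}"
    return {"text": prompt, "label": LABEL2ID[record["output"]]}


# The single record shown in the Format example above.
sample = (
    '{"instruction": "Evalúa el riesgo de tizón tardío basado en los datos '
    'climáticos y la variedad.", '
    '"input": "Escenario 1: Temperatura promedio 17.2 °C, Humedad 83%, '
    'Precipitación 3.4 mm, Variedad Yungay", '
    '"output": "Moderado"}'
)

examples = [to_example(r) for r in parse_jsonl(sample)]

# Counting labels is a quick way to confirm a split is balanced across classes.
label_counts = Counter(e["label"] for e in examples)
```

On the full training file, `label_counts` should show the same count for each of the three ids if the split is balanced as described.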
---
## 🌿 Evaluation (Balanced Test Set, n = 90)
| Class | Precision | Recall | F1 | Support |
|-----------|-----------|--------|-------|---------|
| Bajo | 1.00 | 0.90 | 0.95 | 30 |
| Moderado | 0.91 | 1.00 | 0.95 | 30 |
| Alto | 1.00 | 1.00 | 1.00 | 30 |
| **Accuracy** | | | **0.97** | 90 |
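
The per-class figures can be cross-checked against a confusion matrix. The matrix below is not published with the model; it is a reconstruction that is consistent with the table (three `Bajo` samples predicted as `Moderado`, everything else correct), and the function name is hypothetical.

```python
LABELS = ["Bajo", "Moderado", "Alto"]

# Reconstructed (assumed) confusion matrix: rows = true class, columns = predicted.
CONFUSION = [
    [27, 3, 0],   # Bajo: 27 correct, 3 predicted as Moderado
    [0, 30, 0],   # Moderado: all correct
    [0, 0, 30],   # Alto: all correct
]


def per_class_metrics(cm):
    """Return a list of (precision, recall, f1) tuples, one per class."""
    n = len(cm)
    results = []
    for i in range(n):
        tp = cm[i][i]
        predicted = sum(cm[r][i] for r in range(n))  # column sum
        actual = sum(cm[i])                          # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append((precision, recall, f1))
    return results


metrics = per_class_metrics(CONFUSION)
total = sum(sum(row) for row in CONFUSION)
accuracy = sum(CONFUSION[i][i] for i in range(len(CONFUSION))) / total
macro_f1 = sum(m[2] for m in metrics) / len(metrics)

for name, (p, r, f) in zip(LABELS, metrics):
    print(f"{name:<9} precision={p:.2f} recall={r:.2f} f1={f:.2f}")
print(f"accuracy={accuracy:.2f} macro_f1={macro_f1:.2f}")
```

Under this reconstruction, 87 of 90 predictions are correct, which rounds to the reported 0.97 accuracy.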
---
## 📈 Intended Use and Limitations
- **Designed for**: Highland regions of Peru (especially Huancavelica). The ground-truth labels are expert-assigned and reflect local pathogen behavior.
- **Limitations**:
  - May generalize poorly to lowland areas or to potato varieties not seen in training.
  - Not a substitute for in-field disease monitoring.
---
## 📑 Citation
If you use this model, please cite:
> Jorge Luis Alonso, *Predicting Potato Late Blight in Huancavelica Using LLaMA Models*, 2025
---
## 🌍 License
MIT License (model + training data)
---
## ⚡ Quick Inference Example
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

# Load the fine-tuned classifier and its tokenizer from the Hub.
model_id = "jalonso24/llama-lateblight-classifier"
model = AutoModelForSequenceClassification.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# top_k=1 returns only the highest-scoring class.
clf = pipeline("text-classification", model=model, tokenizer=tokenizer, top_k=1)

# Structured Spanish prompt describing the scenario.
prompt = "Escenario: Temperatura 18.1 °C, Humedad 85%, Variedad Amarilis"
clf(prompt)
# Example output: [{'label': 'Alto', 'score': 0.95}]
```
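
The training inputs include a precipitation value and follow a fixed phrasing, so it may help to build inference prompts the same way. The sketch below is one way to do that; `build_prompt` and `top_label` are hypothetical helpers, not part of this repository. Whether the model also expects the `instruction` text in front of the scenario is not stated in this card.

```python
def build_prompt(temp_c, humidity_pct, precip_mm, variety, scenario=1):
    """Format inputs like the `input` field of the training records (assumed)."""
    return (
        f"Escenario {scenario}: Temperatura promedio {temp_c:.1f} °C, "
        f"Humedad {humidity_pct:.0f}%, Precipitación {precip_mm:.1f} mm, "
        f"Variedad {variety}"
    )


def top_label(pipeline_output):
    """Pick the best (label, score) pair from text-classification output.

    Accepts a flat list of {'label', 'score'} dicts, or a list wrapping one
    such list, since the nesting can vary with pipeline settings.
    """
    if pipeline_output and isinstance(pipeline_output[0], list):
        pipeline_output = pipeline_output[0]
    best = max(pipeline_output, key=lambda d: d["score"])
    return best["label"], best["score"]


prompt = build_prompt(17.2, 83, 3.4, "Yungay")
# With a loaded pipeline `clf`: label, score = top_label(clf(prompt))
```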