twinkle-ai's Collections
📋 Eval Logs
Updated 4 days ago
Benchmark log generated with Twinkle Eval, recording the model's outputs for each prompt.
twinkle-ai/llama-4-eval-logs-and-scores
Viewer • Updated 25 days ago • 750 • 133 • 2