cartesia-ai/Llamba-1B

@@ -1,8 +1,30 @@
 # Llamba Models
 The Llamba models are part of Cartesia's [Edge](https://github.com/cartesia-ai/edge) library, designed for efficient, high-performance machine learning applications.
-For more details, refer to the [paper](#).
 ---
 ## Usage
@@ -20,7 +42,7 @@ To use Llamba with PyTorch:
 from transformers import AutoTokenizer
 from cartesia_pytorch.Llamba.llamba import LlambaLMHeadModel
-model = LlambaLMHeadModel.from_pretrained("AvivBick/Llamba-1B", strict=True).to('cuda')
 tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
 input_ids = tokenizer("Hello, my name is", return_tensors="pt").input_ids
 input_ids = input_ids.to('cuda')
@@ -30,11 +52,23 @@ print(tokenizer.decode(output, skip_special_tokens=True))
 ### Llamba on MLX
-To run Llamba with the Metal framework:
-_(Add specific instructions here when available.)_
 ---
 ### Evaluations
-Details on model performance, benchmarks, and evaluation metrics can be found in the [paper link](#).
-_(Expand on this section if specific results or datasets are available.)_

+---
+tags:
+  - Llamba
+  - recurrent-models
+  - distillation
+  - cartesia
+  - edge
+license: apache-2.0
+library_name: cartesia-pytorch
+datasets:
+  - ai2_arc
+  - PIQA
+  - Winogrande
+  - HellaSwag
+  - Lambada
+  - MMLU
+  - OpenBookQA
+inference:
+  precision: bf16
+  hardware: gpu
+---
 # Llamba Models
 The Llamba models are part of Cartesia's [Edge](https://github.com/cartesia-ai/edge) library, designed for efficient, high-performance machine learning applications.
+For more details, refer to the [paper](https://arxiv.org/abs/2502.14458).
 ---
 ## Usage
 from transformers import AutoTokenizer
 from cartesia_pytorch.Llamba.llamba import LlambaLMHeadModel
+model = LlambaLMHeadModel.from_pretrained("cartesia-ai/Llamba-1B", strict=True).to('cuda')
 tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
 input_ids = tokenizer("Hello, my name is", return_tensors="pt").input_ids
 input_ids = input_ids.to('cuda')
 ### Llamba on MLX
+To run Llamba with the Metal framework, see [cartesia-metal](https://github.com/cartesia-ai/edge/tree/main/cartesia-metal).
 ---
 ### Evaluations
+The Llamba models have been evaluated on eight standard benchmarks, each in a zero-shot and a few-shot setting (the number of shots is given in each column header). The results are below:
+| Model      | ARC-C (0-shot) | ARC-C (25-shot) | ARC-E (0-shot) | ARC-E (25-shot) | PIQA (0-shot) | PIQA (10-shot) | WinoGrande (0-shot) | WinoGrande (5-shot) |
+|------------|----------------|-----------------|----------------|-----------------|---------------|----------------|---------------------|---------------------|
+| Llamba-1B  | 37.2           | 41.8            | 69.5           | 71.2            | 74.0          | 74.3           | 60.6                | 58.1                |
+| Llamba-3B  | 48.5           | 53.0            | 79.0           | 81.1            | 78.6          | 79.5           | 70.4                | 72.4                |
+| Llamba-8B  | 54.6           | 60.0            | 82.5           | 85.8            | 80.9          | 81.5           | 73.3                | 76.9                |
+
+| Model      | HellaSwag (0-shot) | HellaSwag (10-shot) | LAMBADA (0-shot) | LAMBADA (10-shot) | MMLU (0-shot) | MMLU (5-shot) | OpenBookQA (0-shot) | OpenBookQA (10-shot) |
+|------------|--------------------|---------------------|------------------|-------------------|---------------|---------------|---------------------|----------------------|
+| Llamba-1B  | 61.2               | 60.2                | 48.4             | 39.0              | 38.0          | 31.3          | 37.0                | 38.0                 |
+| Llamba-3B  | 73.8               | 74.3                | 65.8             | 60.0              | 52.7          | 50.3          | 42.8                | 42.8                 |
+| Llamba-8B  | 77.6               | 78.7                | 69.4             | 65.0              | 61.0          | 60.0          | 43.4                | 45.8                 |
+More details on model performance, benchmarks, and evaluation metrics can be found in the [paper](https://arxiv.org/abs/2502.14458).
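The evaluation tables report each benchmark separately. As a quick way to summarize them, the sketch below hard-codes the reported scores and computes, per model, an unweighted mean of the eight 0-shot scores and the list of benchmarks where the few-shot score falls below the 0-shot score. The `RESULTS` layout, the helper names, and the unweighted mean are illustrative assumptions, not an aggregate metric reported in the paper.

```python
from statistics import mean

# Scores copied from the evaluation tables as (0-shot, few-shot) pairs.
# The few-shot count differs per benchmark (25 for ARC, 10 for PIQA, 5 for
# WinoGrande and MMLU, 10 for HellaSwag, LAMBADA and OpenBookQA).
RESULTS = {
    "Llamba-1B": {
        "ARC-C": (37.2, 41.8), "ARC-E": (69.5, 71.2), "PIQA": (74.0, 74.3),
        "WinoGrande": (60.6, 58.1), "HellaSwag": (61.2, 60.2),
        "LAMBADA": (48.4, 39.0), "MMLU": (38.0, 31.3), "OpenBookQA": (37.0, 38.0),
    },
    "Llamba-3B": {
        "ARC-C": (48.5, 53.0), "ARC-E": (79.0, 81.1), "PIQA": (78.6, 79.5),
        "WinoGrande": (70.4, 72.4), "HellaSwag": (73.8, 74.3),
        "LAMBADA": (65.8, 60.0), "MMLU": (52.7, 50.3), "OpenBookQA": (42.8, 42.8),
    },
    "Llamba-8B": {
        "ARC-C": (54.6, 60.0), "ARC-E": (82.5, 85.8), "PIQA": (80.9, 81.5),
        "WinoGrande": (73.3, 76.9), "HellaSwag": (77.6, 78.7),
        "LAMBADA": (69.4, 65.0), "MMLU": (61.0, 60.0), "OpenBookQA": (43.4, 45.8),
    },
}


def zero_shot_mean(scores):
    """Unweighted mean of the 0-shot scores over all benchmarks (illustrative)."""
    return mean(zero for zero, _ in scores.values())


def few_shot_regressions(scores):
    """Benchmarks whose few-shot score is strictly below the 0-shot score."""
    return [name for name, (zero, few) in scores.items() if few < zero]


if __name__ == "__main__":
    for model, scores in RESULTS.items():
        print(f"{model}: mean 0-shot = {zero_shot_mean(scores):.2f}, "
              f"few-shot below 0-shot on {few_shot_regressions(scores)}")
```

Run as-is, it prints mean 0-shot scores of 53.24, 63.95, and 67.84 for the 1B, 3B, and 8B models, and shows that the few-shot score is below the 0-shot score on LAMBADA and MMLU for all three sizes (and additionally on WinoGrande and HellaSwag for Llamba-1B).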