Upload folder using huggingface_hub

- README.md +65 -3
- adapter_config.json +21 -0
- adapter_model.bin +3 -0
- original_args.json +26 -0

README.md
CHANGED
@@ -1,3 +1,65 @@
----
-
-
+---
+language: en
+license: apache-2.0
+tags:
+- audio
+- speech
+- transcription
+- librispeech
+- llama-3
+datasets:
+- librispeech_asr
+---
+
+# Audio-LLaMA: LoRA Adapter for Audio Transcription
+
+This model is a LoRA adapter fine-tuned for audio transcription. It must be loaded on top of its Llama base model to be used.
+
+## Model Details
+
+- **Base Model**: meta-llama/Llama-3.2-3B-Instruct
+- **Audio Model**: openai/whisper-large-v3-turbo
+- **LoRA Rank**: 32
+- **Task**: Audio transcription on the LibriSpeech dataset
+- **Training Framework**: PEFT (Parameter-Efficient Fine-Tuning)
+
+## Usage
+
+This is a PEFT (LoRA) adapter that needs to be combined with the base Llama model:
+
+```python
+import torch
+from peft import PeftModel, PeftConfig
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+# Load the LoRA configuration
+config = PeftConfig.from_pretrained("cdreetz/audio-llama")
+
+# Load the base model
+model = AutoModelForCausalLM.from_pretrained(
+    config.base_model_name_or_path,
+    torch_dtype=torch.float16,
+    device_map="auto"
+)
+
+# Load the tokenizer
+tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
+
+# Load the LoRA adapter
+model = PeftModel.from_pretrained(model, "cdreetz/audio-llama")
+
+# Run inference (text-only here; audio inputs need the Whisper preprocessing noted under Limitations)
+prompt = "Transcribe this audio:"
+inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+outputs = model.generate(**inputs, max_new_tokens=100)
+response = tokenizer.decode(outputs[0], skip_special_tokens=True)
+print(response)
+```
+
+## Training
+
+This model was fine-tuned with LoRA on audio transcription tasks. It starts from a Llama 3 base model and uses Whisper-processed audio features for audio understanding.
+
+## Limitations
+
+This model requires additional code to process audio with Whisper before it can be passed to the Llama model. See the [Audio-LLaMA repository](https://github.com/cdreetz/audio-llama) for full usage instructions.
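The Training and Limitations sections above describe Whisper-processed audio features being fed to the Llama model. As a rough sketch of that preprocessing step only, the snippet below extracts Whisper encoder features with the `transformers` Whisper classes; the placeholder waveform and the omitted projection into Llama's embedding space are assumptions here, since the actual audio projector lives in the Audio-LLaMA repository.

```python
import numpy as np
import torch
from transformers import WhisperModel, WhisperProcessor

# Sketch only: extract Whisper encoder features of the kind the README describes.
# The projection of these features into Llama's embedding space is handled by the
# Audio-LLaMA repository code and is not reproduced here.
processor = WhisperProcessor.from_pretrained("openai/whisper-large-v3-turbo")
encoder = WhisperModel.from_pretrained("openai/whisper-large-v3-turbo").get_encoder()

# Placeholder input: one second of silence at 16 kHz. In practice this would be a
# LibriSpeech clip loaded with `datasets` or `librosa` and resampled to 16 kHz.
audio_array = np.zeros(16_000, dtype=np.float32)

features = processor(audio_array, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    audio_embeddings = encoder(features.input_features).last_hidden_state
print(audio_embeddings.shape)  # (batch, num_frames, hidden_size)
```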
adapter_config.json
ADDED
@@ -0,0 +1,21 @@
+{
+  "base_model_name_or_path": "meta-llama/Llama-3.2-3B-Instruct",
+  "bias": "none",
+  "enable_lora": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "lora_alpha": 32,
+  "lora_dropout": 0.05,
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 32,
+  "target_modules": [
+    "q_proj",
+    "k_proj",
+    "v_proj",
+    "gate_proj",
+    "up_proj",
+    "down_proj"
+  ],
+  "task_type": "CAUSAL_LM"
+}
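For reference, the adapter settings above correspond to roughly the following PEFT `LoraConfig`; this is a sketch for anyone recreating the setup, not code taken from the training repository.

```python
from peft import LoraConfig

# Mirrors adapter_config.json: rank-32 LoRA (alpha 32, dropout 0.05) applied to the
# attention and MLP projection layers of the Llama base model.
lora_config = LoraConfig(
    r=32,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    target_modules=["q_proj", "k_proj", "v_proj", "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```

A config like this would typically be attached to the base model with `peft.get_peft_model` before training.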
adapter_model.bin
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:6f63fd54f85cd58730463ed2ef5c15a657fed4c174fed433966eddda933dd9ab
+size 172603978
original_args.json
ADDED
@@ -0,0 +1,26 @@
+{
+  "llama_path": "meta-llama/Llama-3.2-3B-Instruct",
+  "whisper_path": "openai/whisper-large-v3-turbo",
+  "data_path": "./audio_instruction_examples.json",
+  "audio_dir": "./",
+  "output_dir": "./checkpoints",
+  "batch_size": 16,
+  "eval_batch_size": 16,
+  "grad_accum_steps": 4,
+  "num_epochs": 5,
+  "learning_rate": 5e-05,
+  "weight_decay": 0.01,
+  "warmup_steps": 500,
+  "max_grad_norm": 1.0,
+  "lora_rank": 32,
+  "save_steps": 1000,
+  "eval_steps": 500,
+  "log_steps": 100,
+  "max_audio_length": 30,
+  "text_max_length": 512,
+  "use_wandb": false,
+  "wandb_project": "audio-llm",
+  "seed": 42,
+  "fp16": true,
+  "num_workers": 4
+}
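The arguments above imply an effective batch size of 16 × 4 = 64 sequences per optimizer step (batch_size × grad_accum_steps). Below is a minimal, illustrative way to load the file and read such values; the loader is not the repository's own code.

```python
import json
from types import SimpleNamespace

# Illustrative only: read original_args.json into attribute-style access.
with open("original_args.json") as f:
    args = SimpleNamespace(**json.load(f))

effective_batch = args.batch_size * args.grad_accum_steps  # 16 * 4 = 64
print(f"LoRA rank {args.lora_rank}, lr {args.learning_rate}, effective batch {effective_batch}")
```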