---
language: en
license: apache-2.0
tags:
  - audio
  - speech
  - transcription
  - librispeech
  - llama-3
datasets:
  - librispeech_asr
---

# Audio-LLaMA: LoRA Adapter for Audio Understanding

## Model Details

- **Base Model**: meta-llama/Llama-3.2-3B-Instruct
- **Audio Model**: openai/whisper-large-v3-turbo
- **LoRA Rank**: 32
- **Task**: Audio transcription on the LibriSpeech dataset
- **Training Framework**: PEFT (Parameter-Efficient Fine-Tuning)

## Usage

This is a PEFT (LoRA) adapter that must be combined with the base Llama model before it can be used:

```python
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the LoRA configuration
config = PeftConfig.from_pretrained("cdreetz/audio-llama")

# Load the base model
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# Load the LoRA adapter
model = PeftModel.from_pretrained(model, "cdreetz/audio-llama")

# Run inference
prompt = "Transcribe this audio:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

## Training

This model was fine-tuned with LoRA on audio transcription tasks. It starts from a Llama 3 base model and uses Whisper-processed audio features for audio understanding.

## Limitations

This model requires additional code to process audio with Whisper before the resulting features are passed to the Llama model. See the [Audio-LLaMA repository](https://github.com/cdreetz/audio-llama) for full usage instructions.
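
For orientation, the sketch below shows one way to obtain Whisper encoder features with the Hugging Face `transformers` library, using the `openai/whisper-large-v3-turbo` checkpoint listed above. It is an illustration only, not the repository's actual preprocessing code: how the features are projected into the Llama embedding space and combined with the text prompt is specific to the Audio-LLaMA repository and is not reproduced here.

```python
import numpy as np
import torch
from transformers import WhisperModel, WhisperProcessor

# Whisper checkpoint from the Model Details section.
WHISPER_ID = "openai/whisper-large-v3-turbo"

processor = WhisperProcessor.from_pretrained(WHISPER_ID)
whisper = WhisperModel.from_pretrained(WHISPER_ID)
device = "cuda" if torch.cuda.is_available() else "cpu"
whisper = whisper.to(device)

# Placeholder waveform: 1 second of silence at 16 kHz. In practice, load a
# real clip, e.g. with librosa.load(path, sr=16000) or torchaudio.
audio = np.zeros(16000, dtype=np.float32)

# Convert the waveform into the log-mel input features Whisper expects.
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
input_features = inputs.input_features.to(device)

# Run only the encoder: its hidden states are the "Whisper-processed audio
# features" referred to in the Training section. Audio-LLaMA additionally
# projects them into the Llama embedding space (see the repository).
with torch.no_grad():
    audio_features = whisper.encoder(input_features).last_hidden_state

print(audio_features.shape)  # (batch, sequence_length, hidden_size)
```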