Audio-LLaMA: LoRA Adapter for Audio Understanding

Model Details

  • Base Model: meta-llama/Llama-3.2-3B-Instruct
  • Audio Model: openai/whisper-large-v3-turbo
  • LoRA Rank: 32
  • Task: Audio transcription on the LibriSpeech dataset
  • Training Framework: PEFT (Parameter-Efficient Fine-Tuning)

Usage

This is a PEFT (LoRA) adapter; it must be loaded on top of the base Llama model before it can be used:

import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the LoRA configuration
config = PeftConfig.from_pretrained("cdreetz/audio-llama")

# Load the base model
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# Load the LoRA adapter
model = PeftModel.from_pretrained(model, "cdreetz/audio-llama")

# Run inference (text-only here; audio features must be supplied
# separately, see Limitations below)
prompt = "Transcribe this audio:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
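
Optionally, the LoRA weights can be merged into the base model so that inference no longer requires PEFT. Continuing from the snippet above (the output path is illustrative):

# Fold the adapter weights into the base model and save a standalone copy
merged = model.merge_and_unload()
merged.save_pretrained("audio-llama-merged")
tokenizer.save_pretrained("audio-llama-merged")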

Training

This model was fine-tuned with LoRA on audio transcription tasks. It starts from the Llama 3.2 base model listed above and uses Whisper-processed audio features for audio understanding.
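
For reference, a comparable setup can be expressed with PEFT. The rank below comes from the model details above; the target modules, alpha, and dropout are illustrative assumptions rather than the recorded training configuration:

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")

lora_config = LoraConfig(
    r=32,                        # rank, from the model details
    lora_alpha=64,               # assumed; a common choice is 2x the rank
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    lora_dropout=0.05,           # assumed
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # shows how few parameters LoRA trains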

Limitations

This adapter requires additional code that processes audio with Whisper before the result is passed to the Llama model; a plain text prompt, as in the snippet above, will not transcribe anything on its own. See the Audio-LLaMA repository for full usage instructions.
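
As a rough illustration of the Whisper side, audio can be encoded into features like this. Only the feature extraction is shown; the projection of these features into the Llama embedding space is specific to Audio-LLaMA and lives in the repository:

import numpy as np
import torch
from transformers import WhisperProcessor, WhisperModel

processor = WhisperProcessor.from_pretrained("openai/whisper-large-v3-turbo")
whisper = WhisperModel.from_pretrained("openai/whisper-large-v3-turbo").eval()

# One second of silence as a stand-in; replace with real 16 kHz mono audio
waveform = np.zeros(16000, dtype=np.float32)
inputs = processor(waveform, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    # Encoder output: (batch, frames, hidden) audio features
    audio_features = whisper.encoder(inputs.input_features).last_hidden_state

Whisper expects 16 kHz mono input, so resample audio first if necessary.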
