cdreetz committed
Commit c5e4708 · verified · 1 parent: b6ef9a2

Upload folder using huggingface_hub

Files changed (4)
  1. README.md +65 -3
  2. adapter_config.json +21 -0
  3. adapter_model.bin +3 -0
  4. original_args.json +26 -0
README.md CHANGED
@@ -1,3 +1,65 @@
- ---
- license: mit
- ---
+ ---
+ language: en
+ license: apache-2.0
+ tags:
+ - audio
+ - speech
+ - transcription
+ - librispeech
+ - llama-3
+ datasets:
+ - librispeech_asr
+ ---
+
+ # Audio-LLaMA: LoRA Adapter for Audio Transcription
+
+ This model is a LoRA adapter fine-tuned for audio transcription tasks. It is not a standalone checkpoint and must be loaded on top of the Llama base model.
+
+ ## Model Details
+
+ - **Base Model**: meta-llama/Llama-3.2-3B-Instruct
+ - **Audio Model**: openai/whisper-large-v3-turbo
+ - **LoRA Rank**: 32
+ - **Task**: Audio transcription on the LibriSpeech dataset
+ - **Training Framework**: PEFT (Parameter-Efficient Fine-Tuning)
+
+ ## Usage
+
+ This is a PEFT (LoRA) adapter and must be combined with the base Llama model before use:
+
+ ```python
+ import torch
+ from peft import PeftModel, PeftConfig
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ # Load the LoRA configuration
+ config = PeftConfig.from_pretrained("cdreetz/audio-llama")
+
+ # Load the base model
+ model = AutoModelForCausalLM.from_pretrained(
+     config.base_model_name_or_path,
+     torch_dtype=torch.float16,
+     device_map="auto"
+ )
+
+ # Load the tokenizer
+ tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
+
+ # Load the LoRA adapter
+ model = PeftModel.from_pretrained(model, "cdreetz/audio-llama")
+
+ # Run inference
+ prompt = "Transcribe this audio:"
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+ outputs = model.generate(**inputs, max_new_tokens=100)
+ response = tokenizer.decode(outputs[0], skip_special_tokens=True)
+ print(response)
+ ```
+
+ ## Training
+
+ This model was fine-tuned with LoRA on audio transcription tasks. It builds on a Llama 3 base model and uses Whisper-processed audio features for audio understanding.
+
+ ## Limitations
+
+ The usage snippet above only loads the adapter on top of the text model; transcription additionally requires code that processes the audio with Whisper before passing it to the Llama model. See the [Audio-LLaMA repository](https://github.com/cdreetz/audio-llama) for full usage instructions.
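As a reference for the Whisper preprocessing mentioned above, here is a minimal, hypothetical sketch of how audio encoder features could be extracted with `transformers` before being handed to the Llama side. The model name follows the Model Details section; the placeholder waveform and everything after feature extraction (in particular the projection into Llama's embedding space) are assumptions handled by the repository's own code, not by this snippet.

```python
import numpy as np
import torch
from transformers import WhisperFeatureExtractor, WhisperModel

# Placeholder waveform: five seconds of silence at 16 kHz, standing in for a real recording.
audio_array = np.zeros(16000 * 5, dtype=np.float32)

# Feature extractor and audio encoder from the Whisper checkpoint named in Model Details.
feature_extractor = WhisperFeatureExtractor.from_pretrained("openai/whisper-large-v3-turbo")
encoder = WhisperModel.from_pretrained("openai/whisper-large-v3-turbo").get_encoder()

# Whisper pads/truncates the input to 30 s and converts it to log-mel features.
inputs = feature_extractor(audio_array, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    audio_features = encoder(inputs.input_features).last_hidden_state

# audio_features has shape (batch, num_frames, hidden_size); mapping these frames into
# Llama's embedding space is done by the Audio-LLaMA repository code and is not shown here.
print(audio_features.shape)
```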
adapter_config.json ADDED
@@ -0,0 +1,21 @@
+ {
+   "base_model_name_or_path": "meta-llama/Llama-3.2-3B-Instruct",
+   "bias": "none",
+   "enable_lora": null,
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "lora_alpha": 32,
+   "lora_dropout": 0.05,
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "r": 32,
+   "target_modules": [
+     "q_proj",
+     "k_proj",
+     "v_proj",
+     "gate_proj",
+     "up_proj",
+     "down_proj"
+   ],
+   "task_type": "CAUSAL_LM"
+ }
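For readers more comfortable with the PEFT API than with the raw JSON, the adapter configuration above corresponds roughly to the following `LoraConfig`. This is an illustrative reconstruction from the fields shown, not code taken from the training repository.

```python
from peft import LoraConfig

# Illustrative reconstruction of adapter_config.json as a PEFT LoraConfig.
lora_config = LoraConfig(
    r=32,                 # LoRA rank
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    target_modules=[
        "q_proj", "k_proj", "v_proj",          # attention projections
        "gate_proj", "up_proj", "down_proj",   # MLP projections
    ],
    task_type="CAUSAL_LM",
)
```

Passing a config like this to `peft.get_peft_model` together with the base Llama model would attach rank-32 adapters to the listed attention and MLP projections while leaving the original weights frozen.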
adapter_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6f63fd54f85cd58730463ed2ef5c15a657fed4c174fed433966eddda933dd9ab
+ size 172603978
original_args.json ADDED
@@ -0,0 +1,26 @@
+ {
+   "llama_path": "meta-llama/Llama-3.2-3B-Instruct",
+   "whisper_path": "openai/whisper-large-v3-turbo",
+   "data_path": "./audio_instruction_examples.json",
+   "audio_dir": "./",
+   "output_dir": "./checkpoints",
+   "batch_size": 16,
+   "eval_batch_size": 16,
+   "grad_accum_steps": 4,
+   "num_epochs": 5,
+   "learning_rate": 5e-05,
+   "weight_decay": 0.01,
+   "warmup_steps": 500,
+   "max_grad_norm": 1.0,
+   "lora_rank": 32,
+   "save_steps": 1000,
+   "eval_steps": 500,
+   "log_steps": 100,
+   "max_audio_length": 30,
+   "text_max_length": 512,
+   "use_wandb": false,
+   "wandb_project": "audio-llm",
+   "seed": 42,
+   "fp16": true,
+   "num_workers": 4
+ }
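For orientation: with `batch_size` 16 and `grad_accum_steps` 4, the effective batch size is 16 × 4 = 64 examples per optimizer step. Below is a small, illustrative sketch of reading these arguments back into Python; the loader itself is not part of the repository.

```python
import json
from types import SimpleNamespace

# Illustrative: load the uploaded training arguments into an attribute-style object.
with open("original_args.json") as f:
    args = SimpleNamespace(**json.load(f))

# Examples consumed per optimizer update.
effective_batch_size = args.batch_size * args.grad_accum_steps  # 16 * 4 = 64
print(f"effective batch size: {effective_batch_size}")
print(f"LoRA rank {args.lora_rank}, lr {args.learning_rate}, epochs {args.num_epochs}")
```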