Update README.md
README.md
CHANGED
@@ -21,7 +21,7 @@ tags:
 <p>
 
 <p align="center">
-Kimi-Audio-7B-
+<a href="https://huggingface.co/moonshotai/Kimi-Audio-7B">🤗 Kimi-Audio-7B</a> | <a href="https://huggingface.co/moonshotai/Kimi-Audio-7B-Instruct">🤗 Kimi-Audio-7B-Instruct </a> | <a href="https://raw.githubusercontent.com/MoonshotAI/Kimi-Audio/master/assets/kimia_report.pdf">📑 Paper</a>
 </p>
 
 ## Introduction
@@ -30,7 +30,7 @@ We present Kimi-Audio, an open-source audio foundation model excelling in **audi
 
 Kimi-Audio is designed as a universal audio foundation model capable of handling a wide variety of audio processing tasks within a single unified framework. Key features include:
 
-* **Universal Capabilities:** Handles diverse tasks like speech recognition (ASR), audio question answering (AQA), audio captioning (AAC), speech emotion recognition (SER), sound event/scene classification (SEC/ASC)
+* **Universal Capabilities:** Handles diverse tasks like speech recognition (ASR), audio question answering (AQA), audio captioning (AAC), speech emotion recognition (SER), sound event/scene classification (SEC/ASC) and end-to-end speech conversation.
 * **State-of-the-Art Performance:** Achieves SOTA results on numerous audio benchmarks (see our [Technical Report](https://raw.githubusercontent.com/MoonshotAI/Kimi-Audio/main/assets/kimia_report.pdf)). <!-- TODO: Replace with actual raw PDF URL -->
 * **Large-Scale Pre-training:** Pre-trained on over 13 million hours of diverse audio data (speech, music, sounds) and text data.
 * **Novel Architecture:** Employs a hybrid audio input (continuous acoustic + discrete semantic tokens) and an LLM core with parallel heads for text and audio token generation.