Update README.md
README.md
CHANGED
@@ -6,10 +6,21 @@ library_name: bertopic
|
|
6 |
pipeline_tag: text-classification
|
7 |
---
|
8 |
|
9 |
-
#
|
10 |
|
11 |
-
This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
|
12 |
-
BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.
|
13 |
|
14 |
## Usage
|
15 |
|
@@ -177,3 +188,11 @@ Source: [Hugging Face dataset page](https://huggingface.co/datasets/inparallel/s
|
|
177 |
* Numba: 0.60.0
|
178 |
* Plotly: 5.24.1
|
179 |
* Python: 3.10.12
|
6 |
pipeline_tag: text-classification
|
7 |
---
|
8 |
|
9 |
+
# BERTopic_Arab_news
|
10 |
+
|
11 |
+
A modular implementation of BERTopic for topic modeling, trained on Arabic news articles. The component used at each layer of the topic modeling pipeline can be chosen independently.
|
12 |
+
|
13 |
+

|
14 |
+
|
15 |
+
|
16 |
+
The core of this project is BERTopic, which performs topic modeling on the processed text. The pipeline has four steps:
|
17 |
+
|
18 |
+
* Topic Modeling: BERTopic is trained on the cleaned dataset to identify topics in the articles.
|
19 |
+
* Fine-tuning with KeyBERT: KeyBERT-inspired representations are used to improve the clarity and interpretability of the topics.
|
20 |
+
* Topic Extraction: The most frequent topics are extracted, and each document is assigned a topic.
|
21 |
+
* Topic Updates: Topic representations can be updated with n-grams to capture more domain-specific phrases.
|
22 |
+
|
23 |
|
24 |
|
25 |
## Usage
|
26 |
|
|
|
188 |
* Numba: 0.60.0
|
189 |
* Plotly: 5.24.1
|
190 |
* Python: 3.10.12
|
191 |
+
|
192 |
+
|
193 |
+
|
194 |
+
|
195 |
+
# Visualization: topic distribution and document-level information
|
196 |
+
|
197 |
+
|
198 |
+

|
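
The four steps this commit adds to the README (topic modeling, KeyBERT-inspired fine-tuning, topic extraction, and n-gram topic updates) could be sketched roughly as follows. This is a minimal illustration, not the author's training script: the function names `fit_arabic_news_topics` and `most_frequent_topics`, the `language="multilingual"` setting, and the `(1, 3)` n-gram range are assumptions, since the README does not state which embedding model or n-gram range was used.

```python
from collections import Counter


def most_frequent_topics(topics, top_n=5):
    """Return (topic_id, document_count) pairs, skipping BERTopic's outlier topic -1."""
    counts = Counter(t for t in topics if t != -1)
    return counts.most_common(top_n)


def fit_arabic_news_topics(docs, n_gram_range=(1, 3)):
    """Fit BERTopic with a KeyBERT-inspired representation, then update topics with n-grams."""
    # Imported lazily so the counting helper above works without bertopic installed.
    from bertopic import BERTopic
    from bertopic.representation import KeyBERTInspired

    representation = KeyBERTInspired()
    # Assumption: a multilingual embedding model is used for the Arabic text.
    topic_model = BERTopic(language="multilingual", representation_model=representation)

    # Steps 1-3: train the model; `topics` holds one topic id per document.
    topics, _ = topic_model.fit_transform(docs)

    # Step 4: recompute topic representations with a wider n-gram range.
    # The representation model is passed explicitly so the update uses it too.
    topic_model.update_topics(
        docs, n_gram_range=n_gram_range, representation_model=representation
    )
    return topic_model, topics
```

Given a list of cleaned article strings `docs`, `topic_model, topics = fit_arabic_news_topics(docs)` returns the fitted model and the per-document assignments; `most_frequent_topics(topics)` then lists the largest topics, and `topic_model.get_topic_info()` shows their keyword representations.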