Ah7med committed on
Commit
48a48db
·
verified ·
1 Parent(s): 3926e42

Update README.md

Files changed (1)
  1. README.md +22 -3
README.md CHANGED
@@ -6,10 +6,21 @@ library_name: bertopic
6
  pipeline_tag: text-classification
7
  ---
8
 
9
- # BERTopic_ArXiv
10
 
11
- This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
12
- BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.
13
 
14
  ## Usage
15
 
@@ -177,3 +188,11 @@ Source: [Hugging Face dataset page](https://huggingface.co/datasets/inparallel/s
177
  * Numba: 0.60.0
178
  * Plotly: 5.24.1
179
  * Python: 3.10.12
6
  pipeline_tag: text-classification
7
  ---
8
 
9
+ # -BERTopic_Arab_news
10
+
11
+ A modular implementation of BERTopic for topic modeling, specifically trained on `Arabic news articles`. This implementation allows for flexible component selection at each layer of the topic modeling pipeline.
12
+
13
+ ![image](https://github.com/user-attachments/assets/0bdfdba4-2f76-4857-8467-51cc8018bed0)
14
+
15
+
16
+ ## The core of this project is BERTopic, which is used to perform topic modeling on the processed text. The following steps are performed:
17
+
18
+ #### - Topic Modeling: BERTopic is trained on the cleaned dataset to identify topics in the articles.
19
+ #### - Fine-tuning with KeyBERT: We use KeyBERT-inspired representations to improve the clarity and interpretability of the topics.
20
+ #### - Topic Extraction: The most frequent topics are extracted, and each document is assigned a topic.
21
+ #### - Topic Updates: The model can be fine-tuned by updating topics with n-grams for more domain-specific phrases.
22
+
23
 
 
 
24
 
25
  ## Usage
26
 
 
188
  * Numba: 0.60.0
189
  * Plotly: 5.24.1
190
  * Python: 3.10.12
191
+
192
+
193
+
194
+
195
+ # Visualization: Displays topic distribution and document-level information
196
+
197
+
198
+ ![image](https://github.com/user-attachments/assets/96ab4297-c8cc-4227-9164-eb129d45bffc)
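
The pipeline steps listed in the updated README (topic modeling, KeyBERT-inspired fine-tuning, topic extraction, n-gram topic updates) could be sketched roughly as follows. This is a minimal sketch and not the author's actual training code: the function names `run_pipeline` and `most_frequent_topics`, the `language="multilingual"` setting, and the `n_gram_range=(1, 3)` value are all assumptions made for illustration, since the README does not state them.

```python
from collections import Counter


def most_frequent_topics(topics, top_k=5):
    """Return (topic_id, document_count) pairs for the top_k largest topics.

    Topic -1 is BERTopic's outlier bucket, so it is skipped here.
    """
    counts = Counter(t for t in topics if t != -1)
    return counts.most_common(top_k)


def run_pipeline(docs):
    """Fit BERTopic with a KeyBERT-inspired representation, then update topics with n-grams.

    `docs` is a list of cleaned article strings (hypothetical input).
    """
    # Imported lazily so the helper above can be used without bertopic installed.
    from bertopic import BERTopic
    from bertopic.representation import KeyBERTInspired

    representation_model = KeyBERTInspired()

    # Assumption: a multilingual embedding model is appropriate for Arabic news text.
    topic_model = BERTopic(
        language="multilingual",
        representation_model=representation_model,
    )

    # Topic modeling: one topic id is assigned per document.
    topics, _ = topic_model.fit_transform(docs)

    # Topic updates: recompute topic representations with n-grams.
    # The (1, 3) range is illustrative only.
    topic_model.update_topics(
        docs,
        n_gram_range=(1, 3),
        representation_model=representation_model,
    )

    return topic_model, most_frequent_topics(topics)
```

After fitting, `topic_model.get_topic_info()` returns a table of topics with their sizes and representative terms, which is one way to inspect the result alongside the visualization shown in the README.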