Ah7med/BERTopic_ArXiv

Text Classification


Ah7med committed on Jan 21

Commit 48a48db · verified · 1 Parent(s): 3926e42

Update README.md

Files changed (1)

README.md +22 -3


@@ -6,10 +6,21 @@ library_name: bertopic
 pipeline_tag: text-classification
 ---
-# BERTopic_ArXiv
-This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
-BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.
 ## Usage
@@ -177,3 +188,11 @@ Source: [Hugging Face dataset page](https://huggingface.co/datasets/inparallel/s
 * Numba: 0.60.0
 * Plotly: 5.24.1
 * Python: 3.10.12

 pipeline_tag: text-classification
 ---
+# BERTopic_Arab_news
+A modular implementation of BERTopic for topic modeling, trained specifically on Arabic news articles. It allows flexible component selection at each layer of the topic modeling pipeline.
+![image](https://github.com/user-attachments/assets/0bdfdba4-2f76-4857-8467-51cc8018bed0)
+The core of this project is BERTopic, which performs topic modeling on the processed text in the following steps:
+* Topic Modeling: BERTopic is trained on the cleaned dataset to identify topics in the articles.
+* Fine-tuning with KeyBERT: KeyBERT-inspired representations improve the clarity and interpretability of the topics.
+* Topic Extraction: The most frequent topics are extracted, and each document is assigned a topic.
+* Topic Updates: Topic representations can be updated with n-grams to capture more domain-specific phrases.
 ## Usage
 * Numba: 0.60.0
 * Plotly: 5.24.1
 * Python: 3.10.12
+## Visualization: topic distribution and document-level information
+![image](https://github.com/user-attachments/assets/96ab4297-c8cc-4227-9164-eb129d45bffc)
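
The steps the updated README describes (training, KeyBERT-inspired representations, topic extraction, n-gram updates) could look roughly like the sketch below. It is not taken from the repository: the function names `fit_topic_model` and `most_frequent_topics`, the `language="multilingual"` setting, and the `(1, 3)` n-gram range are assumptions chosen for illustration, and `docs` stands for a list of cleaned article strings.

```python
from collections import Counter


def fit_topic_model(docs, n_gram_range=(1, 3)):
    """Train BERTopic on `docs` and refresh topic representations with n-grams.

    Hypothetical helper: the settings here are illustrative assumptions,
    not the configuration used in the repository.
    """
    # Imported lazily so the pure-Python helper below works without bertopic.
    from bertopic import BERTopic
    from bertopic.representation import KeyBERTInspired

    # KeyBERT-inspired representation to sharpen the topic keywords.
    representation_model = KeyBERTInspired()

    # Assumption: a multilingual embedding model is suitable for Arabic text.
    topic_model = BERTopic(
        language="multilingual",
        representation_model=representation_model,
    )

    # Fit the model; `topics` holds one topic id per document (-1 = outlier).
    topics, _probs = topic_model.fit_transform(docs)

    # Recompute topic representations with a wider n-gram range.
    # The representation model is passed again so it stays in effect.
    topic_model.update_topics(
        docs,
        n_gram_range=n_gram_range,
        representation_model=representation_model,
    )
    return topic_model, topics


def most_frequent_topics(topics, k=5):
    """Return the k most frequent topic ids as (topic_id, count), skipping outliers."""
    counts = Counter(t for t in topics if t != -1)
    return counts.most_common(k)
```

With a fitted model, `topic_model.get_document_info(docs)` returns the per-document topic assignments, and `topic_model.visualize_topics()` or `topic_model.visualize_barchart()` return Plotly figures of the topic distribution, provided enough topics were found.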