BERTopic_ML-ArXiv-Abstracts

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

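from bertopic import BERTopic

# Download the trained model from the Hugging Face Hub and load it
topic_model = BERTopic.load("b-verma/BERTopic_ML-ArXiv-Abstracts")

# Overview of all topics: returns a pandas DataFrame with one row per topic
topic_model.get_topic_info()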

Topic overview

  • Number of topics: 120
  • Number of training documents: 117592
| Topic ID | Topic Keywords | Topic Frequency | Label |
|---|---|---|---|
| -1 | data - models - model - learning - based | 153 | Machine Learning and Deep Learning |
| 0 | policy - reinforcement - reinforcement learning - rl - agent | 32094 | Reinforcement Learning and Control |
| 1 | graph - node - graphs - nodes - gnns | 10423 | Graph Embedding and Representation Learning |
| 2 | speech - audio - speaker - music - asr | 3598 | Speech Technology |
| 3 | 3d - object - video - segmentation - point | 3527 | 3D Object Understanding |
| 4 | equations - differential - physics - differential equations - pdes | 2706 | Discovering and Solving Partial Differential Equations |
| 5 | adversarial - attacks - adversarial examples - robustness - attack | 2287 | Adversarial Robustness |
| 6 | networks - relu - neural - neural networks - activation | 2210 | Deep Learning Activation Functions |
| 7 | segmentation - medical - images - image - tumor | 2207 | Medical Image Segmentation |
| 8 | gradient - stochastic - sgd - convergence - convex | 1835 | Convergence Analysis of Non-Convex Optimization Algorithms |
| 9 | federated - fl - federated learning - clients - privacy | 1717 | Federated Learning and Privacy |
| 10 | channel - wireless - radio - network - communication | 1698 | Channel Allocation and Estimation in Wireless Communications |
| 11 | privacy - private - differential privacy - dp - differential | 1449 | Privacy-Preserving Machine Learning |
| 12 | clinical - patient - patients - medical - health | 1353 | Clinical Patient Representation Learning |
| 13 | gans - gan - generative - generative adversarial - generator | 1298 | Generative Adversarial Networks (GANs) |
| 14 | bandit - regret - arm - bandits - arms | 1269 | Armed Bandit Problems |
| 15 | financial - stock - market - trading - price | 1246 | Financial Time Series Analysis |
| 16 | recommendation - user - item - recommender - items | 1193 | Recommendation Systems |
| 17 | power - energy - electricity - load - forecasting | 1144 | Power and Energy Forecasting |
| 18 | causal - treatment - observational - effect - causal inference | 1070 | Causal Inference and Learning |
| 19 | explanations - explanation - counterfactual - interpretability - interpretable | 1048 | Explanation Methods for Machine Learning Models |
| 20 | driving - autonomous - vehicle - vehicles - driver | 1007 | Autonomous Driving |
| 21 | malware - detection - iot - security - attacks | 959 | Cybersecurity Threats in IoT Networks |
| 22 | quantum - classical - circuit - circuits - quantum machine | 947 | Quantum Machine Learning |
| 23 | fairness - fair - bias - discrimination - protected | 938 | Fair Machine Learning |
| 24 | hardware - memory - gpu - dnn - accelerators | 924 | Edge AI Hardware for Efficient DNN Inference |
| 25 | clustering - means - clusters - cluster - algorithm | 921 | Clustering Algorithms |
| 26 | crop - images - satellite - remote sensing - hyperspectral | 899 | Remote Sensing and Deep Learning |
| 27 | time series - series - time - forecasting - series forecasting | 847 | Time Series Analysis and Forecasting |
| 28 | pruning - compression - sparsity - sparse - network | 836 | Neural Network Pruning |
| 29 | distributed - communication - sgd - decentralized - gradient | 822 | Distributed Optimization Methods |
| 30 | label - labels - multi label - noisy - noisy labels | 812 | Multi-Label Learning |
| 31 | meta - meta learning - shot - task - shot learning | 776 | Few-Shot Learning and Meta-Learning |
| 32 | traffic - temporal - travel - spatial - road | 770 | Traffic Forecasting and Prediction |
| 33 | anomaly - anomaly detection - detection - anomalies - outlier | 750 | Anomaly Detection |
| 34 | uncertainty - calibration - bayesian - bayesian neural - bayesian neural networks | 735 | Uncertainty Estimation in Deep Learning |
| 35 | variational - inference - posterior - mcmc - carlo | 724 | Inference and Approximation |
| 36 | domain - domain adaptation - adaptation - source - target | 717 | Unsupervised Domain Adaptation |
| 37 | continual - continual learning - forgetting - catastrophic forgetting - catastrophic | 679 | Continual Learning and Forgetting |
| 38 | vae - latent - variational - vaes - generative | 678 | Disentangled Representation Learning |
| 39 | visual - image - vqa - modal - captioning | 627 | Multimodal Vision and Language Understanding |
| 40 | code - program - software - programs - source code | 621 | Software Engineering |
| 41 | brain - fmri - functional - disease - ad | 615 | Brain Connectivity and Disease Diagnosis |
| 42 | spiking - snns - spike - neurons - spiking neural | 603 | Spiking Neural Networks (SNNs) |
| 43 | activity - activity recognition - har - gait - sensor | 600 | Human Activity Recognition (HAR) |
| 44 | dictionary - sparse - signal - dictionary learning - recovery | 595 | Sparse Signal Processing |
| 45 | news - social - media - fake - fake news | 580 | Fake News Detection |
| 46 | automl - ml - machine learning - machine - research | 542 | Automated Machine Learning (AutoML) |
| 47 | class - imbalanced - classifiers - minority - classification | 500 | Class Imbalance in Classification |
| 48 | gravitational - galaxy - solar - simulations - mass | 491 | Gravitational Wave Detection and Analysis |
| 49 | molecular - molecules - chemical - drug - molecule | 481 | Molecular Design and Discovery |
| 50 | recurrent - rnns - rnn - recurrent neural - lstm | 479 | Recurrent Neural Networks (RNNs) |
| 51 | bo - bayesian optimization - optimization - bayesian - function | 476 | Global Optimization with Bayesian Methods |
| 52 | logic - reasoning - symbolic - logical - relational | 473 | Integrating Reasoning and Learning |
| 53 | climate - weather - precipitation - water - forecasting | 472 | Climate and Weather Prediction |
| 54 | gp - gaussian - gaussian process - gaussian processes - processes | 470 | Scalable Gaussian Process Inference for Large Datasets |
| 55 | regret - online - online learning - convex - bounds | 456 | Online Learning and Regret Bounds |
| 56 | language - bert - fine - language models - fine tuning | 455 | Fine-tuning Language Models |
| 57 | nas - search - architecture search - architecture - neural architecture | 453 | Neural Architecture Search (NAS) |
| 58 | eeg - bci - brain - eeg signals - signals | 453 | Emotion and Brain Signals Analysis |
| 59 | dialogue - dialog - conversational - responses - conversation | 433 | Conversational AI Models |
| 60 | emotion - emotion recognition - facial - recognition - emotions | 417 | Emotion Recognition |
| 61 | knowledge - knowledge graph - knowledge graphs - kg - entities | 409 | Embedding Knowledge Graphs |
| 62 | active learning - active - al - learning - labeling | 388 | Active Learning |
| 63 | quantization - precision - bit - quantized - floating | 378 | Quantization for Deep Neural Networks |
| 64 | materials - molecular - chemical - atomic - material | 356 | Materials Discovery and Property Prediction using Machine Learning |
| 65 | bounds - pac - bound - generalization - pac bayes | 352 | Generalization Bounds |
| 66 | fault - maintenance - industrial - manufacturing - monitoring | 329 | Fault Detection and Diagnosis in Industrial Settings |
| 67 | translation - machine translation - nmt - neural machine translation - neural machine | 329 | Machine Translation |
| 68 | tensor - tensors - rank - decomposition - tensor completion | 328 | Tensor Completion and Rank Decomposition |
| 69 | topic - topics - topic models - lda - topic modeling | 325 | Topic Modeling |
| 70 | covid - covid 19 - 19 - chest - ct | 312 | Computer-Aided Diagnosis of COVID-19 |
| 71 | teacher - distillation - student - knowledge distillation - knowledge | 310 | Knowledge Transfer and Distillation |
| 72 | students - student - course - courses - educational | 310 | Education Technology |
| 73 | combinatorial - problems - combinatorial optimization - problem - solvers | 310 | Combinatorial Optimization |
| 74 | trees - tree - forest - decision - decision trees | 304 | Interpretable Machine Learning Models |
| 75 | contrastive - contrastive learning - self supervised - supervised - self | 303 | Contrastive Learning for Representation Learning |
| 76 | face - face recognition - facial - deepfake - recognition | 298 | Face Recognition and Bias |
| 77 | lasso - regression - sparse - screening - sparsity | 298 | High-Dimensional Sparse Regression |
| 78 | kernel - kernels - random - regression - ridge | 296 | Kernel Methods and Regression |
| 79 | seismic - inversion - reservoir - oil - velocity | 295 | Seismic Inverse Modeling |
| 80 | backdoor - poisoning - attacks - attack - backdoor attacks | 292 | Backdoor Attacks |
| 81 | manifold - manifold learning - dimensional - manifolds - dimensionality | 288 | Manifold Learning and Dimensionality Reduction |
| 82 | ecg - heart - electrocardiogram - cardiac - signals | 288 | Cardiac Signal Processing and Classification |
| 83 | attention - vision - vit - transformers - transformer | 286 | Computer Vision Transformers |
| 84 | word - embeddings - word embeddings - words - embedding | 283 | "Word Embeddings and Their Applications in Natural Language Processing" |
| 85 | question - qa - questions - answering - answer | 279 | Question Answering |
| 86 | denoising - image - noise - restoration - image denoising | 276 | Image Denoising |
| 87 | ctr - product - commerce - click - user | 274 | Advertising and Predictive Modeling |
| 88 | graphical - graphical models - belief propagation - belief - ising | 266 | Inference and Learning in Graphical Models |
| 89 | transport - ot - optimal transport - wasserstein - optimal | 264 | Optimal Transport and Related Methods |
| 90 | matrix - rank - matrix completion - completion - low rank | 259 | Low Rank Matrix Completion |
| 91 | covid - covid 19 - 19 - pandemic - spread | 255 | COVID-19 Forecasting and Prediction |
| 92 | svm - support vector - support - svms - vector | 255 | Machine Learning - SVM |
| 93 | physics - particle - detector - high energy - energy | 244 | High Energy Particle Physics |
| 94 | feature selection - feature - selection - features - feature selection methods | 240 | Feature Selection for High-Dimensional Data |
| 95 | ranking - items - rank - pairwise - comparisons | 232 | Ranking and Learning from Noisy Comparisons |
| 96 | hyperparameter - hpo - hyperparameters - hyperparameter optimization - optimization | 229 | Hyperparameter Optimization for Deep Learning Models |
| 97 | pricing - revenue - price - auctions - regret | 227 | Dynamic Pricing and Demand Learning |
| 98 | ood - ood detection - distribution - distribution ood - detection | 225 | Out-of-Distribution Detection in Deep Learning |
| 99 | bayesian - bayesian networks - bayesian network - structure - structure learning | 224 | Bayesian Network Structure Learning |
| 100 | pca - principal - principal component - component analysis - principal component analysis | 224 | Principal Component Analysis (PCA) |
| 101 | protein - proteins - sequence - sequences - structure | 214 | Protein Representation and Prediction |
| 102 | hashing - hash - codes - retrieval - search | 209 | Large-Scale Image Retrieval and Hashing |
| 103 | submodular - submodular functions - functions - maximization - approximation | 207 | Submodular Function Minimization |
| 104 | mixture - em - mixtures - em algorithm - mixture models | 206 | Mixture Models and EM Algorithm |
| 105 | metric learning - metric - distance - distance metric - similarity | 206 | Metric Learning for Machine Learning |
| 106 | equivariant - equivariance - group - symmetry - spherical | 200 | Equivariant Deep Learning |
| 107 | nmf - nonnegative - factorization - matrix - matrix factorization | 199 | NMF (Nonnegative Matrix Factorization) |
| 108 | compression - video - coding - distortion - rate distortion | 198 | Neural Compression |
| 109 | mri - reconstruction - pet - imaging - mr | 198 | Magnetic Resonance Imaging Reconstruction |
| 110 | oct - retinal - dr - diabetic - images | 197 | Retinal Imaging and Disease Diagnosis |
| 111 | entity - relation - relation extraction - entities - extraction | 192 | Relation Extraction |
| 112 | handwritten - text - characters - character - recognition | 178 | Handwritten Character Recognition |
| 113 | augmentation - data augmentation - mixup - data - augmentations | 178 | Data Augmentation for Improving Deep Learning Performance |
| 114 | crowdsourcing - workers - crowd - worker - crowdsourced | 168 | Crowdsourcing Labeling and Annotation |
| 115 | summarization - summaries - summary - abstractive - text | 165 | Automatic Summarization |
| 116 | circuit - design - circuits - chip - synthesis | 161 | Circuit Design Optimization |
| 117 | view - multi view - views - multi - clustering | 160 | Multi-View Clustering |
| 118 | cancer - gene - genes - disease - expression | 158 | Cancer Gene Expression Analysis |
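In BERTopic, topic -1 is the outlier topic that collects documents not assigned to any cluster, so the 120 topics counted above amount to the outlier topic plus 119 regular topics (IDs 0 to 118). The following is a minimal sketch, not part of BERTopic, that treats a handful of rows copied from the topic table as plain Python data. It splits the " - "-separated keyword strings, computes each topic's share of the 117592 training documents, and looks up topics by keyword. The helper names (`keywords`, `document_share`, `topics_with_keyword`) are hypothetical. With the loaded model, the same information comes from `get_topic_info()`.

```python
# A subset of rows copied from the topic table:
# (topic ID, keywords, topic frequency, label).
TOPIC_ROWS = [
    (-1, "data - models - model - learning - based", 153,
     "Machine Learning and Deep Learning"),
    (0, "policy - reinforcement - reinforcement learning - rl - agent", 32094,
     "Reinforcement Learning and Control"),
    (1, "graph - node - graphs - nodes - gnns", 10423,
     "Graph Embedding and Representation Learning"),
    (70, "covid - covid 19 - 19 - chest - ct", 312,
     "Computer-Aided Diagnosis of COVID-19"),
    (91, "covid - covid 19 - 19 - pandemic - spread", 255,
     "COVID-19 Forecasting and Prediction"),
]

N_TRAINING_DOCS = 117592  # "Number of training documents" on the model card


def keywords(row):
    """Split the ' - '-separated keyword string into a list of terms."""
    return row[1].split(" - ")


def document_share(row, total=N_TRAINING_DOCS):
    """Fraction of training documents assigned to this topic."""
    return row[2] / total


def topics_with_keyword(rows, term):
    """IDs of regular topics (outlier topic -1 skipped) whose keyword
    list contains `term` as an exact entry."""
    return [row[0] for row in rows if row[0] != -1 and term in keywords(row)]


if __name__ == "__main__":
    for row in TOPIC_ROWS:
        print(f"{row[0]:>3}  {document_share(row):6.2%}  {row[3]}")
    print(topics_with_keyword(TOPIC_ROWS, "covid 19"))
```

A keyword lookup like this surfaces the two separate COVID-19 topics (70 and 91), which differ in their remaining keywords ("chest", "ct" versus "pandemic", "spread").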

Training hyperparameters

  • calculate_probabilities: True
  • language: None
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: None
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: True
  • zeroshot_min_similarity: 0.7
  • zeroshot_topic_list: None
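The keyword lists in the topic table contain multi-word terms such as "reinforcement learning" and "neural machine translation", although `n_gram_range` is listed as (1, 1). In BERTopic, `n_gram_range` only configures the default `CountVectorizer`. A custom `vectorizer_model` takes precedence, so a vectorizer with a wider n-gram range was most likely supplied. The card does not record which one. The sketch below uses a hypothetical stdlib helper, `ngrams`, not a BERTopic or scikit-learn function, to illustrate how an n-gram range bounds the candidate keyword terms.

```python
def ngrams(tokens, ngram_range=(1, 1)):
    """Return all contiguous n-grams of `tokens` whose length n satisfies
    ngram_range[0] <= n <= ngram_range[1], joined by single spaces."""
    low, high = ngram_range
    grams = []
    for n in range(low, high + 1):
        for start in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[start:start + n]))
    return grams


tokens = "neural machine translation".split()

# (1, 1): unigrams only, as the listed n_gram_range would produce
print(ngrams(tokens, (1, 1)))
# ['neural', 'machine', 'translation']

# (1, 3): unigrams, bigrams and the full trigram, matching terms seen in topic 67
print(ngrams(tokens, (1, 3)))
# ['neural', 'machine', 'translation', 'neural machine', 'machine translation', 'neural machine translation']
```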

Framework versions

  • Numpy: 2.1.3
  • HDBSCAN: 0.8.40
  • UMAP: 0.5.7
  • Pandas: 2.2.3
  • Scikit-Learn: 1.6.1
  • Sentence-transformers: 3.4.1
  • Transformers: 4.49.0
  • Numba: 0.61.0
  • Plotly: 6.0.1
  • Python: 3.10.16
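When loading the model in a different environment, it can help to see which installed packages differ from the versions listed above. The following is a minimal sketch using only the standard library's `importlib.metadata`. It assumes the PyPI distribution names shown in `CARD_VERSIONS` (for example, UMAP is installed as `umap-learn`) and uses exact string comparison, so some reported differences may be harmless. The function `compare_versions` is a hypothetical helper.

```python
import platform
from importlib.metadata import PackageNotFoundError, version

# Versions listed on the model card, keyed by assumed PyPI distribution names.
CARD_VERSIONS = {
    "numpy": "2.1.3",
    "hdbscan": "0.8.40",
    "umap-learn": "0.5.7",
    "pandas": "2.2.3",
    "scikit-learn": "1.6.1",
    "sentence-transformers": "3.4.1",
    "transformers": "4.49.0",
    "numba": "0.61.0",
    "plotly": "6.0.1",
}
CARD_PYTHON = "3.10.16"


def compare_versions(expected, lookup=version):
    """Return {package: (expected, installed)} for every package whose
    installed version differs from `expected`; installed is None when
    the package is missing."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    if platform.python_version() != CARD_PYTHON:
        print(f"python: card lists {CARD_PYTHON}, found {platform.python_version()}")
    for name, (wanted, installed) in compare_versions(CARD_VERSIONS).items():
        print(f"{name}: card lists {wanted}, found {installed or 'not installed'}")
```

The `lookup` parameter defaults to `importlib.metadata.version` but can be swapped for a stub, which keeps the comparison logic testable without the packages installed.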