	---
	tags:
	- bertopic
	library_name: bertopic
	pipeline_tag: text-classification
	inference: false
	license: apache-2.0
	datasets:
	- pszemraj/summcomparer-gauntlet-v0p1
	language:
	- en
	---

	# BERTopic-summcomparer-gauntlet-v0p1-sentence-t5-xl-summary

	This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model trained on 1,960 documents from the `pszemraj/summcomparer-gauntlet-v0p1` dataset.
	BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.


	Hierarchy of topics:

	![Hierarchy](https://i.imgur.com/Q8UHCQO.png)

	## Usage

	To use this model, please install BERTopic:

	```bash
	pip install -U -q bertopic safetensors
	```

	You can use the model as follows:

	```python
	from bertopic import BERTopic
	topic_model = BERTopic.load("pszemraj/BERTopic-summcomparer-gauntlet-v0p1-sentence-t5-xl-summary")

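	# Returns a Plotly figure (rendered automatically in notebooks)
	topic_model.visualize_topics()

	# To get an overview of all topics as a pandas DataFrame:
	# topic_model.get_topic_info()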
	```

	To predict topics for new documents:

	```python
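	# `docs` is a list of strings; transform returns topic IDs and probabilities
	docs = ["Text of a new summary to assign to a topic."]
	topics, probs = topic_model.transform(docs)
	print(topics)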
	```
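
`transform` returns integer topic IDs, which are easier to read once mapped to the labels listed in the topic overview. The following is a minimal sketch, assuming `get_topic_info()` returns a table with `Topic` and `Name` columns; `labels_for` is a hypothetical helper name and not part of BERTopic.

```python
def labels_for(topic_model, topic_ids):
    """Map predicted topic IDs to human-readable labels.

    Assumes `topic_model.get_topic_info()` exposes "Topic" and "Name"
    columns; `labels_for` itself is a hypothetical helper.
    """
    info = topic_model.get_topic_info()
    id_to_name = dict(zip(info["Topic"], info["Name"]))
    # Fall back to a placeholder for IDs missing from the table
    return [id_to_name.get(int(t), "unknown") for t in topic_ids]
```

With the model loaded as shown earlier, `labels_for(topic_model, topics)` would return one label per input document.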


	## Topic overview

	* Number of topics: 24
	* Number of training documents: 1960

	<details>
	<summary>Click here for an overview of all topics.</summary>

	| Topic ID | Topic Keywords | Topic Frequency | Label |
	|----------|----------------|-----------------|-------|
	| -1 | no_saic_raw_sp - sep_4 - sec - data - image | 13 | -1_no_saic_raw_sp_sep_4_sec_data |
	| 0 | lecture - applications - methods - learning - topics | 104 | 0_lecture_applications_methods_learning |
	| 1 | cogvideo - videos - cogview2 - cog - video | 303 | 1_cogvideo_videos_cogview2_cog |
	| 2 | ship - rainsford - hunted - island - hunts | 117 | 2_ship_rainsford_hunted_island |
	| 3 | films - dissertation - film - noir - identity | 106 | 3_films_dissertation_film_noir |
	| 4 | linguistics - language - languages - foundational - systems | 104 | 4_linguistics_language_languages_foundational |
	| 5 | nemo - dory - transcript - clownfish - fish | 103 | 5_nemo_dory_transcript_clownfish |
	| 6 | train - bruno - washington - station - tennis | 102 | 6_train_bruno_washington_station |
	| 7 | images - representations - image - captions - representation | 102 | 7_images_representations_image_captions |
	| 8 | merge - merging - explain - concept - problems | 102 | 8_merge_merging_explain_concept |
	| 9 | enhancement - enhancing - recordings - improve - waveforms | 100 | 9_enhancement_enhancing_recordings_improve |
	| 10 | arendelle - elsa - frozen - kristoff - olaf | 99 | 10_arendelle_elsa_frozen_kristoff |
	| 11 | scene - story - script - movie - gillis | 97 | 11_scene_story_script_movie |
	| 12 | lecture - lemmatization - nlp - medical - techniques | 96 | 12_lecture_lemmatization_nlp_medical |
	| 13 | questions - topics - conversation - terrance - talk | 85 | 13_questions_topics_conversation_terrance |
	| 14 | sniper - kill - fury - combat - narrator | 81 | 14_sniper_kill_fury_combat |
	| 15 | images - lecture - ezurich - pathology - medical | 67 | 15_images_lecture_ezurich_pathology |
	| 16 | timeseries - framework - interpretability - representations - next_concept | 37 | 16_timeseries_framework_interpretability_representations |
	| 17 | prediction - predictions - forecasting - predict - markov | 27 | 17_prediction_predictions_forecasting_predict |
	| 18 | images - imaging - computational - convolutional - lecture | 27 | 18_images_imaging_computational_convolutional |
	| 19 | technology - treatment - methods - medical - detection | 27 | 19_technology_treatment_methods_medical |
	| 20 | novel - translation - henry - read - learn | 23 | 20_novel_translation_henry_read |
	| 21 | abridged - brief - synopsis - short - citations | 22 | 21_abridged_brief_synopsis_short |
	| 22 | lecture - pathology - medical - computational - patients | 16 | 22_lecture_pathology_medical_computational |

	</details>
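
BERTopic reserves topic `-1` for outlier documents that were not assigned to any cluster. As a quick consistency check, the frequencies in the table can be summed and compared with the counts stated above. The dictionary below is transcribed by hand from the table, and the variable names are illustrative.

```python
# Topic frequencies copied from the topic overview table, keyed by topic ID
topic_frequency = {
    -1: 13, 0: 104, 1: 303, 2: 117, 3: 106, 4: 104, 5: 103, 6: 102,
    7: 102, 8: 102, 9: 100, 10: 99, 11: 97, 12: 96, 13: 85, 14: 81,
    15: 67, 16: 37, 17: 27, 18: 27, 19: 27, 20: 23, 21: 22, 22: 16,
}

total_docs = sum(topic_frequency.values())
outlier_share = topic_frequency[-1] / total_docs
largest_topic = max(topic_frequency, key=topic_frequency.get)

print(len(topic_frequency))    # 24 topics, counting the outlier topic
print(total_docs)              # 1960 training documents
print(f"{outlier_share:.2%}")  # 0.66% of documents are outliers
print(largest_topic)           # 1
```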

	## Training hyperparameters

	* calculate_probabilities: True
	* language: None
	* low_memory: False
	* min_topic_size: 10
	* n_gram_range: (1, 1)
	* nr_topics: None
	* seed_topic_list: None
	* top_n_words: 10
	* verbose: True
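
For reference, the settings above map directly onto `BERTopic` constructor arguments. This is only a sketch of how an unfitted model with the same settings could be created, not the original training script. The embedding model ID is an assumption inferred from this model's name, and `build_topic_model` is a hypothetical helper.

```python
# Hyperparameters as listed in this section
params = {
    "calculate_probabilities": True,
    "language": None,
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": True,
}


def build_topic_model(embedding_model="sentence-transformers/sentence-t5-xl"):
    """Create an unfitted BERTopic model with the settings above.

    The default embedding model ID is an assumption based on the model name.
    """
    # Imported lazily so the settings can be inspected without bertopic installed
    from bertopic import BERTopic

    return BERTopic(embedding_model=embedding_model, **params)
```

Calling `build_topic_model().fit_transform(docs)` on a list of strings would then train a new model. Results will generally differ from the topics listed here unless the data and random seeds also match.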

	## Framework versions

	* Numpy: 1.22.4
	* HDBSCAN: 0.8.29
	* UMAP: 0.5.3
	* Pandas: 1.5.3
	* Scikit-Learn: 1.2.2
	* Sentence-transformers: 2.2.2
	* Transformers: 4.29.2
	* Numba: 0.56.4
	* Plotly: 5.13.1
	* Python: 3.10.11
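
If loading or inference behaves unexpectedly, comparing the installed packages against the versions above may help narrow things down. The sketch below uses only the standard library. The PyPI distribution names (for example `umap-learn` for UMAP) are filled in here and are not part of the original list, and `version_mismatches` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed above, keyed by PyPI distribution name
PINNED = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.29",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.29.2",
    "numba": "0.56.4",
    "plotly": "5.13.1",
}


def version_mismatches(pinned=PINNED):
    """Return {package: (expected, installed)} for packages that differ.

    `installed` is None when the package is not installed.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


for name, (expected, installed) in version_mismatches().items():
    print(f"{name}: expected {expected}, found {installed}")
```

A mismatch does not necessarily mean the model will fail to load; it is only a starting point for debugging.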