distilbertTourism-multilingual-sentiment

A fine-tuned DistilBERT model for sentiment analysis on tourism-related texts in multiple languages. The model is a key component of the thesis project "Enhancing Tourist Destination Management through a Multilingual Web-Based Tourist Survey System with Machine Learning" and is designed to analyze reviews, survey responses, and other textual data to improve the collection of tourist feedback in Panglao.

Overview

This model builds on the distilbert-base-multilingual-cased architecture and has been fine-tuned on tourism-specific sentiment data. With support for eight languages, it provides a practical solution for multilingual sentiment classification in the tourism sector.

Thesis Context:
As part of the thesis project, this model integrates with a comprehensive system that leverages advanced natural language processing techniques. In addition to this DistilBERT-based sentiment analyzer, the system utilizes BERTopic for topic modeling. The project aims to surpass the 70% accuracy benchmark set by the IPCR while addressing language barriers and inefficiencies inherent in traditional survey methods.

Model Details

  • Task: Text Classification (Sentiment Analysis)
  • Base Model: distilbert-base-multilingual-cased
  • Architecture: DistilBERT
  • Parameters: 135M
  • Tensor Format: F32 (Safetensors)
  • Supported Languages: 8 (Multilingual)
  • Training Data: 160k synthetic tourism reviews
  • Performance: Reports sentiment-classification confidence scores above 95% on tourism-related texts.
  • Fine-tuning: Adapted to the tourism domain over 242 reported training steps (a quick checkpoint-inspection sketch follows this list).
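
As a quick, hedged sanity check of the details above (not part of the released training code), the label set and parameter count can be read straight from the published checkpoint:

from transformers import AutoConfig, AutoModelForSequenceClassification

model_name = "SCANSKY/distilbertTourism-multilingual-sentiment"

# Inspect the sentiment label mapping stored in the repository's config
config = AutoConfig.from_pretrained(model_name)
print(config.id2label)

# Count parameters to confirm the ~135M figure listed above
model = AutoModelForSequenceClassification.from_pretrained(model_name)
print(f"{sum(p.numel() for p in model.parameters()):,} parameters")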

Usage

To integrate this model into your application, you can use the Hugging Face Transformers library. Below is an example in Python:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Define the model repository
model_name = "SCANSKY/distilbertTourism-multilingual-sentiment"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Example input text (replace with your own tourism-related text)
text = "I had an amazing experience during my trip!"
inputs = tokenizer(text, return_tensors="pt")

# Perform inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)
logits = outputs.logits

# Convert logits to probabilities and map the top score to its sentiment label
probabilities = torch.nn.functional.softmax(logits, dim=-1)
predicted_id = int(probabilities.argmax(dim=-1))
print(model.config.id2label[predicted_id], float(probabilities.max()))
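
For scoring several reviews at once, for example survey responses arriving in different languages, a batched variant of the same workflow is sketched below. The example sentences and the language mix are purely illustrative; check the supported-language list above before relying on any particular language.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "SCANSKY/distilbertTourism-multilingual-sentiment"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Illustrative multilingual reviews (not drawn from the training data)
reviews = [
    "The island tour was wonderful and well organized.",
    "El hotel estaba sucio y el personal fue poco amable.",
    "서비스가 훌륭했고 해변이 정말 아름다웠어요.",
]

# Tokenize as a padded batch and run a single forward pass
batch = tokenizer(reviews, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**batch).logits

# Map each review to its highest-scoring sentiment label
for text, idx in zip(reviews, logits.argmax(dim=-1)):
    print(text, "->", model.config.id2label[idx.item()])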

Installation

Ensure you have the required packages installed:

pip install transformers torch safetensors

Limitations

  • Domain Specific: This model is fine-tuned specifically for tourism sentiment analysis and may not perform optimally on texts from other domains.
  • Inference API: The model cannot currently be deployed through the Hugging Face Inference API because the repository lacks a library tag; a local-serving sketch follows this list as one possible workaround.
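
Since the hosted Inference API is not available for this repository, one workaround is to serve the model locally. The sketch below is only an illustration, assuming fastapi and uvicorn are installed and the file is saved as app.py; the endpoint name and payload shape are not part of the thesis system.

from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()

# Load the fine-tuned checkpoint through the high-level text-classification pipeline
classifier = pipeline(
    "text-classification",
    model="SCANSKY/distilbertTourism-multilingual-sentiment",
)

class Review(BaseModel):
    text: str

@app.post("/sentiment")
def sentiment(review: Review):
    # Returns e.g. {"label": "...", "score": 0.97}; label names come from the model config
    return classifier(review.text)[0]

# Run with: uvicorn app:app --port 8000  (assumes this file is named app.py)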

Future Work

  • Dataset Expansion: Incorporating additional data from more tourism sources could further improve performance.
  • Model Optimization: Experimentation with different fine-tuning strategies or hyperparameters might yield even better sentiment classification accuracy.
  • API Integration: Future updates may include support for direct inference API deployment.

Acknowledgements

  • This model is based on the robust DistilBERT architecture.
  • Special thanks to the Hugging Face community for providing the infrastructure that makes deploying and sharing models seamless.
  • This work is part of the thesis project "Enhancing Tourist Destination Management through a Multilingual Web-Based Tourist Survey System with Machine Learning." The project also utilizes BERTopic for topic modeling, aiming to revolutionize the collection and analysis of tourist feedback by overcoming language barriers and improving upon traditional survey methods.

Citation

@misc{tadiar2025distilbertTourism,
  title={DistilBERT Sentiment for Multilingual Tourism Feedback},
  author={Paul Andre D. Tadiar},
  year={2025},
  howpublished={\url{https://huggingface.co/SCANSKY/distilbertTourism-multilingual-sentiment}}
}
