arxiv:2504.21475

Advancing Arabic Reverse Dictionary Systems: A Transformer-Based Approach with Dataset Construction Guidelines

Published on Apr 30

Submitted by Omartificial-Intelligence-Space on May 14

Authors: Serry Sibaee, Abdullah Al Harbi, Omer Nacar, Adel Ammar, Wadii Boulila

Abstract

This study addresses the critical gap in Arabic natural language processing by developing an effective Arabic Reverse Dictionary (RD) system that enables users to find words based on their descriptions or meanings. We present a novel transformer-based approach with a semi-encoder neural network architecture featuring geometrically decreasing layers that achieves state-of-the-art results for Arabic RD tasks. Our methodology incorporates a comprehensive dataset construction process and establishes formal quality standards for Arabic lexicographic definitions. Experiments with various pre-trained models demonstrate that Arabic-specific models significantly outperform general multilingual embeddings, with ARBERTv2 achieving the best ranking score (0.0644). Additionally, we provide a formal abstraction of the reverse dictionary task that enhances theoretical understanding and develop a modular, extensible Python library (RDTL) with configurable training pipelines. Our analysis of dataset quality reveals important insights for improving Arabic definition construction, leading to eight specific standards for building high-quality reverse dictionary resources. This work contributes significantly to Arabic computational linguistics and provides valuable tools for language learning, academic writing, and professional communication in Arabic.
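To make "geometrically decreasing layers" concrete, here is a minimal sketch of how such layer widths could be computed. It is not taken from the paper or from RDTL: the function name `geometric_layer_widths`, the shrink ratio of 0.5, and the example dimensions are all illustrative assumptions.

```python
def geometric_layer_widths(input_dim, output_dim, ratio=0.5):
    """Return layer widths that shrink by `ratio` per layer down to `output_dim`.

    Hypothetical helper: the ratio and dimensions are assumptions,
    not values reported in the paper.
    """
    if not 0 < ratio < 1:
        raise ValueError("ratio must be strictly between 0 and 1")
    if not 0 < output_dim < input_dim:
        raise ValueError("need 0 < output_dim < input_dim")

    widths = [input_dim]
    while True:
        next_width = int(widths[-1] * ratio)
        # Stop shrinking once the next width would reach or pass the target.
        if next_width <= output_dim:
            break
        widths.append(next_width)
    # The final layer always projects to the target embedding size.
    widths.append(output_dim)
    return widths


if __name__ == "__main__":
    # Example: a 1024-dimensional input reduced to a 256-dimensional output.
    print(geometric_layer_widths(1024, 256))
```

Each consecutive pair of widths would correspond to one dense layer in a feed-forward stack, so the network narrows by a constant factor until it reaches the target embedding size.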

Community

Omartificial-Intelligence-Space (paper author, paper submitter), 1 day ago

This paper introduces a novel transformer-based approach for Arabic reverse dictionary systems, leveraging a semi-encoder neural network with geometrically decreasing hidden layers to achieve state-of-the-art performance. It also establishes formal quality standards for Arabic lexicographic definitions, ensuring consistency and reliability in dataset construction. Additionally, the authors provide a theoretical abstraction of the reverse dictionary task, promoting reproducibility and deeper understanding of embedding-based word retrieval. To support ongoing research, they release RDTL, an open-source Python library with modular training pipelines tailored for reverse dictionary applications.
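The embedding-based word retrieval mentioned in this comment can be illustrated with a small sketch. It assumes that a model maps a definition to a predicted vector and that candidate words are ranked by cosine similarity to it. The function names, the toy two-dimensional vectors, and the transliterated words are hypothetical and do not come from the paper or from RDTL.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_words(predicted, vocab_embeddings):
    """Return vocabulary words sorted from most to least similar to `predicted`."""
    scored = [(word, cosine(predicted, emb)) for word, emb in vocab_embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in scored]


if __name__ == "__main__":
    # Toy embeddings, invented for illustration only.
    vocab = {
        "kitab": [1.0, 0.0],
        "qalam": [0.0, 1.0],
        "maktaba": [0.8, 0.6],
    }
    # Stand-in for the vector a model would predict from a definition.
    predicted = [0.9, 0.1]
    print(rank_words(predicted, vocab))
```

The position of the correct word in the returned list is what rank-based evaluation metrics for reverse dictionaries are typically built on.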
