ValueError: Tokenizer class AksharaTokenizer does not exist or is not currently imported.

#1
Opened by journalesque

[Screenshot: image.png]

Please help!

SVECTOR org

Dear @journalesque,

Thank you for bringing this to our attention. We understand that you're encountering the following error while using Akshara-2B-Hindi:

ValueError: Tokenizer class AksharaTokenizer does not exist or is not currently imported.

This issue likely stems from the tokenizer not being correctly registered in transformers. We are currently investigating this on our end. In the meantime, please try the following:

  1. Ensure you have the latest version of transformers:

pip install --upgrade transformers

  2. Load the tokenizer explicitly with trust_remote_code=True:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "SVECTOR-CORPORATION/Akshara-2B-Hindi", 
    trust_remote_code=True
)

If the issue persists, could you share your Python and transformers versions? This will help us diagnose the problem faster. We appreciate your patience and will update you as soon as we have a resolution.
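If you are not sure where to find the versions asked for above, a minimal sketch like the following collects them using only the standard library. It reads the installed package metadata rather than importing transformers, so it works even when importing the library itself fails; the function name `environment_report` is just an illustrative helper, not part of any library.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def environment_report(packages=("transformers",)):
    """Return Python, OS, and package versions to paste into a bug report."""
    report = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            # Reads the installed distribution's metadata; does not import it.
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Running it as a script prints one `key: value` line per entry, which can be pasted directly into the thread.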

Best regards,
SVECTOR Support Team

I'm getting the same error. Please see the screenshot for my Python and transformers versions.

Machine: Apple M2 Pro, macOS 15.3.1

[Screenshot: Akshara_error.png]

Please help me out here.

SVECTOR org

Dear @jayahariv,

Thank you for bringing this to our attention. We understand that you're encountering the following error while using Akshara-2B-Hindi:

ValueError: Tokenizer class AksharaTokenizer does not exist or is not currently imported.

This issue likely stems from the tokenizer not being correctly registered in transformers. We are currently investigating this on our end. In the meantime, please try the following:

  1. Ensure you have the latest version of transformers. Run the following command to update:

pip install --upgrade transformers

  2. Load the tokenizer explicitly with trust_remote_code=True:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "SVECTOR-CORPORATION/Akshara-2B-Hindi",
    trust_remote_code=True
)

If the issue persists, could you share your Python and transformers versions? This will help us diagnose the problem faster. We appreciate your patience and will update you as soon as we have a resolution.

Best regards,
SVECTOR Support Team

Thanks for the quick reply. I have tried both suggestions, but I still get the same error.

SVECTOR org

Fixed the AksharaTokenizer Registration Issue

Issue

@jayahariv , thank you for reporting this issue with our model. We've identified the root cause of the error you're encountering: the AksharaTokenizer class was not properly registered with the transformers library.

Solution

To resolve this issue, please follow these steps:

1. Download the Tokenizer File

Download the akshara_tokenizer.py file we've provided and save it in your project directory (the same directory where your script is located).

2. Import the Tokenizer Module

At the beginning of your script, add the following import:

import akshara_tokenizer

3. Use AutoTokenizer as Normal

After importing the tokenizer module, load the tokenizer as usual:

from transformers import AutoTokenizer

# Replace with your local model directory or the Hub repository ID
tokenizer = AutoTokenizer.from_pretrained("path/to/model")

Importing akshara_tokenizer first registers the AksharaTokenizer class, so AutoTokenizer can resolve it by name when it loads the tokenizer.

Dependencies

Please make sure you have the following dependencies installed:

pip install "transformers>=4.49.0" regex

(Quote the version specifier so the shell does not treat > as an output redirection.)
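To confirm the environment meets these requirements before loading the tokenizer, a minimal standard-library check like the one below may help. It is a quick sanity check, not full PEP 440 comparison: it assumes plain numeric release versions and ignores suffixes such as `.dev0` (the `packaging` library's `Version` class handles those properly). The helper names are illustrative only.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimums from the dependency list above; None means any installed version is fine.
REQUIREMENTS = {"transformers": (4, 49, 0), "regex": None}


def release_tuple(ver):
    """Parse the leading numeric release, e.g. '4.49.0.dev0' -> (4, 49, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", ver)
    return tuple(int(part) for part in match.group(0).split(".")) if match else ()


def check_requirements(requirements=REQUIREMENTS):
    """Return a list of problems; an empty list means every requirement is met."""
    problems = []
    for name, minimum in requirements.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if minimum is None:
            continue
        parsed = release_tuple(installed)
        # Pad so that e.g. '4.49' compares equal to (4, 49, 0).
        parsed += (0,) * (len(minimum) - len(parsed))
        if parsed < minimum:
            wanted = ".".join(map(str, minimum))
            problems.append(f"{name} {installed} is older than {wanted}")
    return problems


if __name__ == "__main__":
    issues = check_requirements()
    print("\n".join(issues) if issues else "All dependencies satisfied.")
```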

Additional Notes

  • We have updated our model repository to include akshara_tokenizer.py for all users.
  • Future releases will have this component pre-registered, eliminating the need for manual registration.

If you continue to experience issues, please don't hesitate to contact our support team.


Best regards,
SVECTOR Support Team
