Please Upgrade Spacy Version

#1
by yovelcohen1 - opened

Hi there! Thanks for the great models. Is there a chance to widen the supported spaCy version range, at least up to 3.8.2?
It would be great to stay consistent with newer versions, and it looks like the change wouldn't take much work and should go pretty smoothly.

Turkish NLP Suite org

Hello, thanks for your message.

Upgrading is definitely planned, and I'm also planning various performance enhancements. However, the timeline isn't fixed, as I do Turkish development as a side project. I definitely plan to do the upgrade by the end of summer, though. Thanks again for your visit!

@BayanDuygu I simply updated the spaCy version range in the meta.json file and then rebuilt the package.
It worked, and the package seems to be working perfectly fine.
I can open a PR if you want. I did it only for turkish-nlp-suite/tr_core_news_md.
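
For anyone who wants to try the same meta.json edit, here is a minimal Python sketch. It assumes the version range lives under the "spacy_version" key of the model's meta.json; the function name, the path and the example range are hypothetical and not taken from this repo. The package still has to be rebuilt afterwards (for example with spaCy's `spacy package` command).

```python
import json
from pathlib import Path


def set_spacy_version_range(meta_path, new_range):
    """Rewrite the "spacy_version" field of a meta.json file and return the old value."""
    path = Path(meta_path)
    meta = json.loads(path.read_text(encoding="utf-8"))
    old_range = meta.get("spacy_version")
    meta["spacy_version"] = new_range
    # Keep non-ASCII characters (e.g. Turkish text in descriptions) readable.
    path.write_text(json.dumps(meta, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")
    return old_range


# Hypothetical usage; both the path and the range are illustrative assumptions:
# set_spacy_version_range("tr_core_news_md/meta.json", ">=3.4.2,<3.9.0")
```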

Turkish NLP Suite org

You wanna update the range? Why not, I'd definitely like a PR. Sorry for the late answer btw.

I'm also experiencing an issue with the spaCy version. @yovelcohen1, I tried to unpack the .whl file, but it says 'Missing tr_core_news_md-1.0.dist-info/RECORD file.' How did you pack/unpack the .whl file?

Turkish NLP Suite org

Normally you wouldn't modify a .whl file by hand, because .whl files need to carry file info. If you want, you can follow these directions. I was too lazy to write them myself, so I asked ChatGPT instead :grin:

The error indicates that the RECORD file in the .dist-info directory is missing. This file is part of the metadata of a Python .whl file (wheel) and is required to maintain integrity and track file contents. If you're trying to modify a .whl file, you must ensure that the RECORD file is updated properly when repacking. Here's how you can unpack, modify, and repack a .whl file, ensuring the RECORD file is handled correctly.

Steps to Unpack and Repack a .whl File

1. Unpack the .whl File

A .whl file is essentially a ZIP archive. You can extract it using any ZIP utility or Python's zipfile module.

Using the command line:

unzip tr_core_news_md-1.0.whl -d unpacked_wheel

The contents of the .whl file will be extracted to the unpacked_wheel directory.
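
The same extraction can be done with Python's zipfile module mentioned above. This is a minimal sketch; the function name `unpack_wheel` is made up for illustration, and the file name in the usage comment is the shortened one used in this thread.

```python
import zipfile
from pathlib import Path


def unpack_wheel(wheel_path, dest_dir):
    """Extract a wheel (a plain ZIP archive) into dest_dir and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(wheel_path) as archive:
        archive.extractall(dest)
        return archive.namelist()


# Hypothetical usage, mirroring the unzip command above:
# unpack_wheel("tr_core_news_md-1.0.whl", "unpacked_wheel")
```

Unlike a tool that validates the wheel first, this does not check the RECORD file, so it also works on an archive whose RECORD is missing.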

2. Modify the Files

You can now edit the files inside the extracted directory. Be careful not to accidentally delete or modify critical metadata files in the .dist-info directory unless necessary.

3. Recreate the RECORD File

The RECORD file contains a list of all files in the .whl, along with their hashes and sizes. When you modify or add files, you need to regenerate this file.

To regenerate the RECORD file:

  • Use the wheel library, which provides a tool for working with .whl files. Install it if not already installed:

    pip install wheel
    
  • Navigate to the unpacked directory:

    cd unpacked_wheel
    
  • Use the wheel pack command to regenerate the RECORD file and repack the .whl:

    wheel pack . -d ../
    

This will create a new .whl file in the parent directory with the correct RECORD file.

4. Repack the .whl File Manually (if not using wheel)

If you prefer to repack the .whl file manually, follow these steps:

  1. Navigate to the directory containing the unpacked files:

    cd unpacked_wheel
    
  2. Regenerate the RECORD file before zipping, so that the archive contains the updated copy:

    • You need to manually calculate the hash (e.g., SHA-256) and size for each file in the .whl, then update the RECORD file. This is error-prone, so using the wheel tool is strongly recommended.

  3. Use the zip command to repack the files:

    zip -r ../new_tr_core_news_md-1.0.whl .
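
If you do go the manual route, regenerating the RECORD file can be sketched in Python roughly as follows. This assumes the usual RECORD layout (one CSV row per file with the path, `sha256=` plus the URL-safe base64 digest without padding, and the size in bytes; the RECORD row itself has empty hash and size fields). The function names are hypothetical, and `wheel pack` remains the safer option.

```python
import base64
import csv
import hashlib
from pathlib import Path


def record_hash(data):
    """Return the RECORD-style hash string: sha256=<urlsafe base64, no '=' padding>."""
    digest = hashlib.sha256(data).digest()
    return "sha256=" + base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def rewrite_record(unpacked_dir):
    """Recompute RECORD for an unpacked wheel directory and return the RECORD path."""
    root = Path(unpacked_dir)
    dist_infos = sorted(root.glob("*.dist-info"))
    if len(dist_infos) != 1:
        raise ValueError("expected exactly one .dist-info directory")
    record_path = dist_infos[0] / "RECORD"

    rows = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        if path == record_path:
            continue  # RECORD cannot contain its own hash
        data = path.read_bytes()
        rows.append((path.relative_to(root).as_posix(), record_hash(data), str(len(data))))
    # The RECORD entry itself is listed with empty hash and size fields.
    rows.append((record_path.relative_to(root).as_posix(), "", ""))

    with record_path.open("w", newline="", encoding="utf-8") as handle:
        csv.writer(handle, lineterminator="\n").writerows(rows)
    return record_path
```

Run it on the unpacked directory before zipping, since the archive has to contain the updated RECORD.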

5. Verify the New .whl File

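To ensure the new .whl file is valid, install it in a virtual environment. Note that pip rejects wheels whose file name lacks the standard tags (name-version-python tag-abi tag-platform tag), so substitute the full file name of your repacked wheel for the shortened name used here:

pip install new_tr_core_news_md-1.0.whl

If it installs without errors, the repacked file is good to go.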
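
As an extra check before installing, you can compare the RECORD entries against the actual archive contents. The sketch below assumes every entry uses sha256 hashes in the usual RECORD format; the function name is hypothetical.

```python
import base64
import csv
import hashlib
import io
import zipfile


def find_record_mismatches(wheel_path):
    """Return archive members whose RECORD entry is missing or has a different hash."""
    problems = []
    with zipfile.ZipFile(wheel_path) as archive:
        names = [n for n in archive.namelist() if not n.endswith("/")]
        record_names = [n for n in names if n.endswith(".dist-info/RECORD")]
        if len(record_names) != 1:
            raise ValueError("expected exactly one .dist-info/RECORD entry")
        record_name = record_names[0]

        text = archive.read(record_name).decode("utf-8")
        recorded = {row[0]: row[1] for row in csv.reader(io.StringIO(text)) if len(row) >= 2}

        for name in names:
            if name == record_name:
                continue  # RECORD does not list a hash for itself
            digest = hashlib.sha256(archive.read(name)).digest()
            expected = "sha256=" + base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
            if recorded.get(name) != expected:
                problems.append(name)
    return problems
```

An empty list means every file in the archive matches its RECORD entry. It does not prove that the model loads in spaCy, so a quick load test after installing is still worthwhile.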


Summary

  • Use unzip or Python to extract the .whl file.
  • Modify the files as needed.
  • Use the wheel library to regenerate the RECORD file and repack the .whl file.
  • Avoid manually editing the RECORD file unless absolutely necessary, as it's error-prone.

@BayanDuygu I was using the wrong unpack command. Thanks for the help. Using spaCy 3.8.2 doesn't give any errors.
