Classifier data type error

#527
by yihan1119 - opened

Hello developers,

Thank you all for the great work and effort in maintaining this repo!

I tried to run cell classification on my own dataset, but I got the following error:
0%| | 0/1 [00:01<?, ?it/s]
Traceback (most recent call last):
File "/Users/yiwa1675/Jupyter_lab/SwitchA_sc/Geneformer/./2CellClassification.py", line 115, in <module>
all_metrics = cc.validate(
File "/Users/yiwa1675/miniconda3/envs/py310/lib/python3.10/site-packages/geneformer/classifier.py", line 791, in validate
trainer = self.train_classifier(
File "/Users/yiwa1675/miniconda3/envs/py310/lib/python3.10/site-packages/geneformer/classifier.py", line 1269, in train_classifier
trainer.train()
File "/Users/yiwa1675/miniconda3/envs/py310/lib/python3.10/site-packages/transformers/trainer.py", line 2241, in train
return inner_training_loop(
File "/Users/yiwa1675/miniconda3/envs/py310/lib/python3.10/site-packages/transformers/trainer.py", line 2500, in _inner_training_loop
batch_samples, num_items_in_batch = self.get_batch_samples(epoch_iterator, num_batches)
batch_samples += [next(epoch_iterator)]
File "/Users/yiwa1675/miniconda3/envs/py310/lib/python3.10/site-packages/accelerate/data_loader.py", line 566, in __iter__
current_batch = next(dataloader_iter)
File "/Users/yiwa1675/miniconda3/envs/py310/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 708, in __next__
data = self._next_data()
File "/Users/yiwa1675/miniconda3/envs/py310/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 764, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/Users/yiwa1675/miniconda3/envs/py310/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 55, in fetch
return self.collate_fn(data)
File "/Users/yiwa1675/miniconda3/envs/py310/lib/python3.10/site-packages/geneformer/collator_for_classification.py", line 642, in __call__
batch = self._prepare_batch(features)
File "/Users/yiwa1675/miniconda3/envs/py310/lib/python3.10/site-packages/geneformer/collator_for_classification.py", line 652, in _prepare_batch
batch = super()._prepare_batch(features)
File "/Users/yiwa1675/miniconda3/envs/py310/lib/python3.10/site-packages/geneformer/collator_for_classification.py", line 631, in _prepare_batch
batch = self.tokenizer.pad(
File "/Users/yiwa1675/miniconda3/envs/py310/lib/python3.10/site-packages/geneformer/collator_for_classification.py", line 358, in pad
raise ValueError(
ValueError: type of 2.0 unknown: <class 'float'>. Should be one of a python, numpy, pytorch or tensorflow object.
0%| | 0/28815 [00:00<?, ?it/s]

I noticed in the dataset_info.json file that input_ids is stored as float64 rather than an integer type:
{
"citation": "",
"description": "",
"features": {
"input_ids": {
"feature": {
"dtype": "float64",
"_type": "Value"
},
"_type": "Sequence"
},
"label": {
"dtype": "int64",
"_type": "Value"
},
"length": {
"dtype": "int64",
"_type": "Value"
}
},
"homepage": "",
"license": ""
}
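
As a possible workaround, here is a minimal sketch of checking that every stored token is an integer-valued float (such as the 2.0 in the error) and converting it back to int. The helper name to_int_ids is hypothetical and not part of Geneformer, and I have not confirmed that this addresses the root cause in the tokenizer.

```python
import math


def to_int_ids(example):
    """Return a copy of `example` whose `input_ids` are Python ints.

    Hypothetical helper for illustration; not part of Geneformer.
    """
    ids = []
    for tok in example["input_ids"]:
        # Refuse to truncate silently: NaN, inf, or 2.5 would mean the
        # tokens are corrupted rather than merely stored with the wrong dtype.
        if isinstance(tok, float) and not (math.isfinite(tok) and tok.is_integer()):
            raise ValueError(f"non-integer token id: {tok!r}")
        ids.append(int(tok))
    return {**example, "input_ids": ids}


# One record shaped like the features listed in dataset_info.json.
print(to_int_ids({"input_ids": [2.0, 17.0, 4096.0], "label": 1, "length": 3}))

# Assumption: for a dataset saved on disk, the `datasets` library's
# cast_column may be a way to change the stored dtype itself, e.g.
#   from datasets import load_from_disk, Sequence, Value
#   ds = load_from_disk("path/to/tokenized.dataset")  # placeholder path
#   ds = ds.cast_column("input_ids", Sequence(Value("int64")))
```

If the check raises, the float values are not just a dtype problem, and the tokenization output itself would need a closer look.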

I made sure the input single-cell RNA-seq adata matrix is int32, and the tokenization step ran without any errors.
Thanks a lot in advance for your help!
