Fix quantize logic (and LoRA adapter weights loading)
Thank you so much for this pull request. Previously, the model was not explicitly moved to the device in the PEFT case because it was already placed on the GPU. Could you confirm that moving it to the device is necessary in this update and does not cause an error?
Yes, in my testing, not moving the model to the device caused an error saying some tensors were not on the same device. However, I should note an issue I've found: evaluate_saved_model seems to produce worse results when run a second time (and on subsequent runs), or when run after the validate function, and I currently do not know why.
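For context, here is a rough sketch of the kind of loading order involved (not the actual code in this PR; the model path, adapter path, and label count below are placeholders):

```python
# Rough sketch only, not the Geneformer/PR code; paths and num_labels are placeholders.
import torch
from transformers import AutoModelForSequenceClassification
from peft import PeftModel

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the fine-tuned base model, then attach the saved LoRA adapter weights.
base_model = AutoModelForSequenceClassification.from_pretrained(
    "path/to/saved_model",  # placeholder
    num_labels=2,           # placeholder
)
model = PeftModel.from_pretrained(base_model, "path/to/lora_adapter")  # placeholder

# Without this explicit move, the model (including the adapter weights) can stay
# on the CPU while the input batches are on the GPU, which matches the
# "tensor is not on the same device" error described above.
model = model.to(device)
model.eval()
```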
https://www.kaggle.com/code/rivuletnriver/diabetes-classification-geneformer
Does this difference in the results of validate and evaluate_saved_model only occur with quantized models? If so, perhaps there is an issue with how the model is being saved and reloaded when quantized. Also, does this difference occur even with our original code, or is there some difference related to this pull request that we should investigate before merging?
(This assumes there are no other differences, such as different validation vs. test data being used for the evaluations, or the labels of the data loaded with evaluate_saved_model not matching the training, etc.)
I have not been able to run the original model successfully, since my dataset is rather small and requires LoRA, but I think it is likely a problem with reloading the quantized model. As far as I can tell there is no other difference, since the results worsen when running the function on the same data a second time (and on later runs as well); I have observed this with both the train+eval and the test sets.
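One way to narrow this down might be to fingerprint the model parameters before and after each call, to see whether the evaluation or reload path is mutating the (quantized) weights between runs. This is just a diagnostic sketch, not part of the PR; `model` and `eval_fn` are placeholders for the reloaded model and whatever evaluation routine is being run.

```python
# Diagnostic sketch only (not part of this PR). `model` and `eval_fn` are placeholders.
import hashlib
import torch

def param_fingerprint(model: torch.nn.Module) -> str:
    """Hash every parameter tensor so any in-place change shows up as a new digest."""
    digest = hashlib.sha256()
    for name, param in sorted(model.named_parameters(), key=lambda kv: kv[0]):
        digest.update(name.encode())
        digest.update(param.detach().cpu().float().numpy().tobytes())
    return digest.hexdigest()

# before = param_fingerprint(model)
# eval_fn(model, test_data)             # first evaluation
# after_first = param_fingerprint(model)
# eval_fn(model, test_data)             # second evaluation on the same data
# after_second = param_fingerprint(model)
# If the digests differ, the weights themselves are changing between runs;
# if they match, the drift is more likely in the data or evaluation pipeline.
```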