Validating fine-tuning of AnglE model

#27
by hugguc

Hey,

I could use some guidance on using AnglE. I'd appreciate a response.

I'd like to (i) download a pre-trained AnglE model (e.g., UAE-Large-V1) and then (ii) fine-tune that model on my subject area. I haven't been able to do that successfully, hence a few questions:

  1. Does this approach (using a pre-trained AnglE model and then fine-tuning it to my space) even make sense, or do I absolutely have to fine-tune AnglE from BERT or the like, in a single step?

  2. Can I run a fine-tuning experiment using only a handful (say 20 total) of positive and negative examples?

  3. In machine learning, test-on-train inference usually achieves very high accuracy: if one pulls example/label pairs from the training set and forms the test set out of those pairs, inference works close to perfectly on that test set. I'm expecting the same effect with AnglE: once I've trained the model on my own positive/negative examples, those same examples should be characterized as positive/negative with a very high degree of certainty. Is that a correct assumption? Somehow I'm not observing this to be the case, so I presume I'm doing something wrong.

Thanks again!

WhereIsAI org

Thanks for using AnglE. Here are the answers to your questions:

  1. It makes sense, and is recommended, to fine-tune WhereIsAI/UAE-Large-V1 with your domain data; it is suggested to set a smaller learning rate, e.g., 1e-6 or 1e-7 (a minimal sketch follows this list). To help you fine-tune your model, here is a tutorial on fine-tuning a medical-domain embedding model with AnglE and WhereIsAI/UAE-Large-V1: https://angle.readthedocs.io/en/latest/notes/tutorial.html

  2. Yes, you can try it, but 20 samples are not enough. I encourage you to collect more, at least a few hundred.

  3. It depends on your data and training hyperparameters. To verify it, you can compare the average similarity of positive/negative pairs between the fine-tuned and non-fine-tuned models. Taking positive pairs as an example: if the average similarity under the fine-tuned model is higher than under the non-fine-tuned one, the fine-tuning should be working well (see the second sketch below).
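
For question 1, a minimal fine-tuning sketch using the angle_emb Python API might look as follows. The text1/text2/label data format and the fit() arguments are taken from the angle_emb README; the medical pairs are placeholders, so adapt everything to your installed version and your own data.

from datasets import Dataset
from angle_emb import AnglE, AngleDataTokenizer

# Load the pre-trained model that will be fine-tuned.
angle = AnglE.from_pretrained('WhereIsAI/UAE-Large-V1', max_length=128, pooling_strategy='cls').cuda()

# Toy labeled pairs (placeholders): label 1 = similar, 0 = dissimilar.
pairs = [
    {'text1': 'myocardial infarction', 'text2': 'heart attack', 'label': 1},
    {'text1': 'myocardial infarction', 'text2': 'ankle sprain', 'label': 0},
]
train_ds = Dataset.from_list(pairs).map(AngleDataTokenizer(angle.tokenizer, angle.max_length))

angle.fit(
    train_ds=train_ds,
    output_dir='ckpts/uae-medical-large-v1',
    batch_size=2,
    epochs=1,
    learning_rate=1e-6,  # the small learning rate recommended above
)

And for question 3, the suggested before/after check could be scripted roughly like this (encode() with to_numpy=True is from the angle_emb README; the positive pairs are again placeholders):

import numpy as np

def avg_similarity(model, text_pairs):
    # Mean cosine similarity over (text_a, text_b) pairs.
    a = model.encode([p[0] for p in text_pairs], to_numpy=True)
    b = model.encode([p[1] for p in text_pairs], to_numpy=True)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return float((a * b).sum(axis=1).mean())

base = AnglE.from_pretrained('WhereIsAI/UAE-Large-V1', pooling_strategy='cls').cuda()
tuned = AnglE.from_pretrained('ckpts/uae-medical-large-v1', pooling_strategy='cls').cuda()
pos_pairs = [('myocardial infarction', 'heart attack')]  # placeholder data
print(avg_similarity(base, pos_pairs), avg_similarity(tuned, pos_pairs))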

Thanks for providing this guidance; this recipe seems to have worked for me!

I have what is probably a simple question. I've fine-tuned WhereIsAI/UAE-Large-V1 following your instructions and saved it with --save_dir ckpts/uae-medical-large-v1. I didn't push the model to the Hub, however (I removed arguments like --push_to_hub 1).

When I load the model, I see what seems to be a warning message, which I include below.

I wonder, am I using the correct syntax to load the model? Is this warning message expected?

Thank you!

from angle_emb import AnglE

angle_t = AnglE.from_pretrained('ckpts/uae-medical-large-v1', pooling_strategy='cls').cuda()
Some weights of BertModel were not initialized from the model checkpoint at ckpts/uae-medical-large-v1 and are newly initialized: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

WhereIsAI org

Hi @hugguc, you got this message because you set --load_mlm_model 1 during training. It is actually a normal message; you can go ahead and test the model.
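
For example, a quick sanity check on the loaded checkpoint might look like this (the texts are placeholders; any pair from your training data will do):

import numpy as np
from angle_emb import AnglE

angle_t = AnglE.from_pretrained('ckpts/uae-medical-large-v1', pooling_strategy='cls').cuda()
vecs = angle_t.encode(['myocardial infarction', 'heart attack'], to_numpy=True)
vecs = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
print('cosine similarity:', float(vecs[0] @ vecs[1]))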

WhereIsAI org

I have already removed --load_mlm_model from the tutorial notes.

Hey @SeanLee97 ,

Thanks for explaining that issue and fixing the doc!

As I reported earlier, I'm noticing that my fine-tuning run, which uses only very few examples, terminates very quickly while not achieving good prediction performance even on the samples I used for training. I'm wondering, is there a way to have the trainer run through more iterations, in the hope that it will at least produce decent predictions on the training set? I tried changing the --epoch argument from your recommendation of 1 to 10, but this didn't affect the result much, and the run was still very quick. For reference, the command I'm running looks roughly like the sketch below.
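
(The paths below are placeholders and, apart from the flags already discussed in this thread, the exact angle-trainer flag names are assumptions; check python -m angle_emb.angle_trainer --help for your version.)

# Hypothetical sketch of the run described above, not the exact command.
python -m angle_emb.angle_trainer \
  --model_name_or_path WhereIsAI/UAE-Large-V1 \
  --train_name_or_path path/to/my_pairs.jsonl \
  --save_dir ckpts/uae-medical-large-v1 \
  --learning_rate 1e-6 \
  --epoch 10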

Thank you for your advice!
