SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models Paper • 2504.11468 • Published 10 days ago • 20