mistralai/Mistral-Small-3.1-24B-Instruct-2503 · Transformers Code Almost Works

I am trying to run this model with Transformers, since I already have a custom training setup built on that library.

My code follows the template from the docs: https://huggingface.co/docs/transformers/main/en/model_doc/mistral3

The model loads, but then fails with:
TypeError: PixtralVisionModel.forward() missing 1 required positional argument: 'image_sizes'
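A likely cause, which is an assumption rather than something the docs state: the processor returns `image_sizes` alongside `pixel_values`, and a custom training loop that picks forward() arguments by hand drops it. Passing the whole processor output (`model(**inputs)`) sidesteps this. If keys have to be selected explicitly, a minimal sketch like the one below keeps the two together. The helper `select_model_inputs` and its default key list are hypothetical, and it is tested here only on plain dicts, not on a real processor output.

```python
from typing import Any, Mapping


def select_model_inputs(
    processor_output: Mapping[str, Any],
    base_keys: tuple[str, ...] = ("input_ids", "attention_mask", "labels"),
) -> dict[str, Any]:
    """Pick forward() kwargs from a processor output without dropping image_sizes.

    Hypothetical helper: the key names are assumptions based on the error above.
    """
    # Copy whichever base keys are present; others (e.g. extra metadata) are skipped.
    inputs = {k: processor_output[k] for k in base_keys if k in processor_output}
    if "pixel_values" in processor_output:
        # The error says PixtralVisionModel.forward() requires image_sizes,
        # so never forward pixel_values without it.
        if "image_sizes" not in processor_output:
            raise KeyError("pixel_values given without image_sizes")
        inputs["pixel_values"] = processor_output["pixel_values"]
        inputs["image_sizes"] = processor_output["image_sizes"]
    return inputs
```

In a training step this would be called as `model(**select_model_inputs(batch))`, where `batch` is whatever the processor or collator produced.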

Running text only (no images) fails with a different error:
visuals = [content for content in message["content"] if content["type"] in ["image", "video"]]
~~~~~~~^^^^^^^^
TypeError: string indices must be integers, not 'str'
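The traceback line iterates over `message["content"]` and indexes each item with `["type"]`. That works only if `content` is a list of dicts. If `content` is a plain string, iterating yields single characters, and `"H"["type"]` raises exactly this `TypeError`. So text-only messages probably still need the list-of-parts form, e.g. `[{"type": "text", "text": "..."}]`. Below is a minimal sketch that normalizes messages before they are passed to the chat template. `normalize_messages` and `visual_parts` are hypothetical helpers; `visual_parts` only reproduces the filter from the traceback so the difference can be checked without loading the model.

```python
from typing import Any


def normalize_messages(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return a copy of `messages` in which every string `content` becomes
    a one-element list of {"type": "text", "text": ...} parts."""
    normalized = []
    for message in messages:
        content = message["content"]
        if isinstance(content, str):
            content = [{"type": "text", "text": content}]
        # Build a new dict so the caller's messages are left unchanged.
        normalized.append({**message, "content": content})
    return normalized


def visual_parts(message: dict[str, Any]) -> list[dict[str, Any]]:
    # Same filter as the line shown in the traceback above.
    return [c for c in message["content"] if c["type"] in ["image", "video"]]
```

Messages that already use the list form pass through unchanged, so the same helper can be applied to mixed text and image conversations.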