ByteDance-Seed/UI-TARS-7B-DPO · Does UI-TARS-7B-DPO have official multilingual benchmarks (en/ru/zh/etc)?

The model bytedance-research/UI-TARS-7B-DPO appears to handle English, Russian, and Chinese, though in my informal testing the quality varies by language. Is there any official data or benchmark on its multilingual performance? Specifically:

Supported languages: Is there a full list?

Per-language metrics: Are accuracy, fluency, or task-specific scores (e.g., MMLU, FLORES) available for each language?

DPO impact: Does preference optimization favor certain languages?

Any links to papers, GitHub docs, or internal stats would be appreciated!