Learning to Reason via Mixture-of-Thought for Logical Reasoning
Abstract
A Mixture-of-Thought framework enables LLMs to reason across natural language, code, and symbolic logic, improving accuracy on logical reasoning tasks compared to single-modality approaches.
Human beings naturally utilize multiple reasoning modalities to learn and solve logical problems, i.e., different representational formats such as natural language, code, and symbolic logic. In contrast, most existing LLM-based approaches operate with a single reasoning modality during training, typically natural language. Although some methods have explored modality selection or augmentation at inference time, the training process remains modality-blind, limiting synergy among the modalities. To fill this gap, we propose Mixture-of-Thought (MoT), a framework that enables LLMs to reason across three complementary modalities: natural language, code, and a newly introduced symbolic modality, truth-table, which systematically enumerates logical cases and partially mitigates key failure modes in natural language reasoning. MoT adopts a two-phase design: (1) self-evolving MoT training, which jointly learns from filtered, self-generated rationales across modalities; and (2) MoT inference, which fully leverages the synergy of the three modalities to produce better predictions. Experiments on logical reasoning benchmarks, including FOLIO and ProofWriter, demonstrate that our MoT framework consistently and significantly outperforms strong LLM baselines with single-modality chain-of-thought approaches, achieving up to +11.7pp average accuracy gain. Further analyses show that our MoT framework benefits both the training and inference stages; that it is particularly effective on harder logical reasoning problems; and that different modalities contribute complementary strengths, with truth-table reasoning helping to overcome key bottlenecks in natural language inference.
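To illustrate what the truth-table modality enumerates, here is a minimal sketch; the predicate encoding and the `truth_table_label` helper are illustrative assumptions, not the paper's actual rationale format. It performs the kind of exhaustive case analysis that natural-language chains of thought tend to miss, labeling a conclusion True, False, or Unknown against a set of premises.

```python
from itertools import product

def truth_table_label(atoms, premises, conclusion):
    """Classify `conclusion` against `premises` by exhaustive case analysis.

    atoms      -- names of atomic propositions, e.g. ["rains", "wet"]
    premises   -- callables mapping an assignment dict -> bool
    conclusion -- callable mapping an assignment dict -> bool
    Returns "True" (entailed), "False" (contradicted), or "Unknown".
    """
    outcomes = []
    for values in product([True, False], repeat=len(atoms)):
        assignment = dict(zip(atoms, values))
        # Only rows that satisfy every premise matter for entailment.
        if all(p(assignment) for p in premises):
            outcomes.append(conclusion(assignment))
    if outcomes and all(outcomes):
        return "True"      # conclusion holds in every consistent row
    if outcomes and not any(outcomes):
        return "False"     # conclusion fails in every consistent row
    return "Unknown"       # mixed rows: the premises do not decide it

# "If it rains, the grass is wet. It rains." entails "The grass is wet."
atoms = ["rains", "wet"]
premises = [lambda a: (not a["rains"]) or a["wet"],  # rains -> wet
            lambda a: a["rains"]]                    # rains
print(truth_table_label(atoms, premises, lambda a: a["wet"]))  # -> "True"
```

In this toy example, the two premises leave only one consistent row (rains and wet both true), so the conclusion is labeled True.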
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning (2025)
- X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains (2025)
- SARI: Structured Audio Reasoning via Curriculum-Guided Reinforcement Learning (2025)
- General-Reasoner: Advancing LLM Reasoning Across All Domains (2025)
- Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL (2025)
- AlignRAG: Leveraging Critique Learning for Evidence-Sensitive Retrieval-Augmented Reasoning (2025)
- Learning to Reason Over Time: Timeline Self-Reflection for Improved Temporal Reasoning in Language Models (2025)
Nice work! The motivation of this paper closely aligns with that of Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large Language Models via a Multi-Paradigm Perspective (https://arxiv.org/abs/2501.11110). I recommend citing this work to support a more comprehensive analysis and to further promote research in this direction.
Hi Dongdong, long time no see, hope you've been well! Thanks for sharing; I hadn't seen CoR before, and it's a very nice piece of work. CoR's innovative sequential synergy of thought paradigms (NL, code, Lean), coupled with its elegant progressive training strategy, represents a significant advance in LLM-based mathematical reasoning.
Our MoT framework differs in three key ways:
1. Parallel Synergy
We integrate thought paradigms in parallel through our MoT inference rather than sequentially. Recent work by the Gemini team [2] and the Qwen team [1] also highlights the power of parallel thinking.
2. Task-Specific Innovation: Truth-Table Paradigm
Focusing on logical reasoning, we identify bottlenecks in existing paradigms and are the first to introduce a truth-table paradigm to complement NL and code. (CoR covers NL, code, and Lean, which suits mathematical reasoning.)
3. Self-Evolving Training
We equip models with all paradigms via an on-policy self-evolving training loop; no external model is needed to generate training data, unlike CoR's auxiliary-model and progressive-training strategy.
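To make the self-evolving loop concrete, here is a rough sketch of how such a loop could look; the helper callables (`sample_rationale`, `extract_answer`, `finetune`) are placeholders rather than our actual code, and the round counts are arbitrary. The essential part is the filtering rule: only self-generated rationales whose final answer matches the gold label are kept for fine-tuning.

```python
def self_evolve(model, train_set, sample_rationale, extract_answer, finetune,
                modalities=("natural_language", "code", "truth_table"),
                rounds=3, samples_per_modality=4):
    """Sketch of an on-policy self-evolving loop over three thought modalities.

    sample_rationale(model, problem, modality) -> str    # generate a rationale
    extract_answer(rationale)                  -> str    # parse the final label
    finetune(model, examples)                  -> model   # SFT on kept examples
    """
    for _ in range(rounds):
        kept = []
        for problem in train_set:
            for modality in modalities:
                for _ in range(samples_per_modality):
                    rationale = sample_rationale(model, problem, modality)
                    # Keep only self-generated rationales whose final answer
                    # matches the gold label; no external teacher is used.
                    if extract_answer(rationale) == problem["label"]:
                        kept.append((problem, modality, rationale))
        # Fine-tune jointly on the filtered rationales from all modalities.
        model = finetune(model, kept)
    return model
```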
We will add this discussion to our paper. Please let me know if I've misunderstood or missed anything.
[1] Chen, Mouxiang, et al. "Parallel Scaling Law for Language Models." arXiv (2025).
[2] DeepMind Gemini Pro: https://deepmind.google/models/gemini/pro
This is an excellent piece of work. I would like to suggest citing the paper "Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large Language Models via a Multi-Paradigm Perspective" (https://arxiv.org/abs/2501.11110), as its motivation closely aligns with that of the present work. Given the similarities between the two studies, I kindly recommend discussing their relationship in the manuscript. Doing so would not only provide a more comprehensive analysis, but also further promote research in this important direction.
Thank you for sharing CoR; I hadn't seen it before, and I appreciate its elegant design and strong empirical results on mathematical reasoning benchmarks. Our MoT introduces three key innovations (parallel synergy, a truth-table paradigm, and self-evolving training), and we view CoR as a relevant concurrent work. Below are the key distinctions between CoR and MoT:
1. Task focus: CoR targets mathematical reasoning, whereas MoT is designed for logical reasoning tasks.
2. Task-specific innovation in paradigm selection: We identify bottlenecks in existing paradigms when solving logical problems and are the first to introduce a truth-table paradigm to complement NL and code.
3. Synergy strategy: CoR employs sequential synergy; MoT generates all modalities in parallel and fuses their outputs through majority voting or MoT sampling (a minimal sketch follows this list).
4. Training strategy: CoR relies on an auxiliary large language model and a progressive training schedule to build its datasets. MoT uses a closed-loop, on-policy self-evolving training procedure; no external model is needed to generate data.
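For point 3, a toy sketch of the parallel fusion via majority voting; the tie-breaking here is my own simplification, and the MoT sampling variant is not shown.

```python
from collections import Counter

def mot_majority_vote(predictions_by_modality):
    """Fuse parallel per-modality predictions with a simple majority vote.

    predictions_by_modality -- e.g. {"natural_language": "True",
                                     "code": "Unknown",
                                     "truth_table": "True"}
    Ties fall back to the first-seen label, a simplification of the
    paper's actual fusion rule.
    """
    votes = Counter(predictions_by_modality.values())
    return votes.most_common(1)[0][0]

print(mot_majority_vote({"natural_language": "True",
                         "code": "Unknown",
                         "truth_table": "True"}))  # -> "True"
```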
Thanks again for bringing this work to our attention. Please feel free to let me know if I've misunderstood any aspect of it.