Learning to Reason via Mixture-of-Thought for Logical Reasoning
Abstract
A Mixture-of-Thought framework enables LLMs to reason across natural language, code, and symbolic logic, improving accuracy on logical reasoning tasks compared to single-modality approaches.
Human beings naturally utilize multiple reasoning modalities to learn and solve logical problems, i.e., different representational formats such as natural language, code, and symbolic logic. In contrast, most existing LLM-based approaches operate with a single reasoning modality during training, typically natural language. Although some methods have explored modality selection or augmentation at inference time, the training process remains modality-blind, limiting synergy among the modalities. To fill this gap, we propose Mixture-of-Thought (MoT), a framework that enables LLMs to reason across three complementary modalities: natural language, code, and a newly introduced symbolic modality, truth-table, which systematically enumerates logical cases and partially mitigates key failure modes in natural language reasoning. MoT adopts a two-phase design: (1) self-evolving MoT training, which jointly learns from filtered, self-generated rationales across modalities; and (2) MoT inference, which fully leverages the synergy of the three modalities to produce better predictions. Experiments on logical reasoning benchmarks, including FOLIO and ProofWriter, demonstrate that our MoT framework consistently and significantly outperforms strong LLM baselines with single-modality chain-of-thought approaches, achieving up to +11.7pp average accuracy gain. Further analyses show that our MoT framework benefits both the training and inference stages; that it is particularly effective on harder logical reasoning problems; and that different modalities contribute complementary strengths, with truth-table reasoning helping to overcome key bottlenecks in natural language inference.
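To illustrate what the truth-table modality enumerates, here is a minimal sketch; the predicate encoding and the `truth_table_label` helper are illustrative assumptions, not the paper's actual rationale format. It performs the kind of exhaustive case analysis that natural-language chains of thought tend to miss, labeling a conclusion True, False, or Unknown against a set of premises.

```python
from itertools import product

def truth_table_label(atoms, premises, conclusion):
    """Classify `conclusion` against `premises` by exhaustive case analysis.

    atoms      -- names of atomic propositions, e.g. ["rains", "wet"]
    premises   -- callables mapping an assignment dict -> bool
    conclusion -- callable mapping an assignment dict -> bool
    Returns "True" (entailed), "False" (contradicted), or "Unknown".
    """
    outcomes = []
    for values in product([True, False], repeat=len(atoms)):
        assignment = dict(zip(atoms, values))
        # Only rows that satisfy every premise matter for entailment.
        if all(p(assignment) for p in premises):
            outcomes.append(conclusion(assignment))
    if outcomes and all(outcomes):
        return "True"      # conclusion holds in every consistent row
    if outcomes and not any(outcomes):
        return "False"     # conclusion fails in every consistent row
    return "Unknown"       # mixed rows: the premises do not decide it

# "If it rains, the grass is wet. It rains." entails "The grass is wet."
atoms = ["rains", "wet"]
premises = [lambda a: (not a["rains"]) or a["wet"],  # rains -> wet
            lambda a: a["rains"]]                    # rains
print(truth_table_label(atoms, premises, lambda a: a["wet"]))  # -> "True"
```

In this toy example, the two premises leave only one consistent row (rains and wet both true), so the conclusion is labeled True.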
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning (2025)
- X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains (2025)
- SARI: Structured Audio Reasoning via Curriculum-Guided Reinforcement Learning (2025)
- General-Reasoner: Advancing LLM Reasoning Across All Domains (2025)
- Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL (2025)
- AlignRAG: Leveraging Critique Learning for Evidence-Sensitive Retrieval-Augmented Reasoning (2025)
- Learning to Reason Over Time: Timeline Self-Reflection for Improved Temporal Reasoning in Language Models (2025)
Nice work! The motivation of this paper closely aligns with that of Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large Language Models via a Multi-Paradigm Perspective (https://arxiv.org/abs/2501.11110). I recommend citing this work to support a more comprehensive analysis and to further promote research in this direction.
Hi Dongdong, long time no see, hope you've been well! Thanks for sharing; I hadn't seen CoR before, and it's a very nice piece of work. CoR's innovative sequential synergy of thought paradigms (NL, code, Lean), coupled with its elegant progressive training strategy, represents a significant advance in LLM-based mathematical reasoning.
Our MoT framework differs in three key ways:
1. Parallel Synergy
We integrate thought paradigms in parallel through our MoT inference rather than sequentially. Recent work by the Gemini team [2] and the Qwen team [1] also highlights the power of parallel thinking.
2. Task-Specific Innovation: Truth-Table Paradigm
Focusing on logical reasoning, we identify bottlenecks in existing paradigms and are the first to introduce a truth-table paradigm to complement NL and code. (CoR covers NL, code, and Lean, which suits mathematical reasoning.)
3. Self-Evolving Training
We equip models with all paradigms via an on-policy self-evolving training loop; no external model is needed to generate training data, unlike CoR's auxiliary-model and progressive-training strategy.
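To make the self-evolving loop concrete, here is a rough sketch of how such a loop could look; the helper callables (`sample_rationale`, `extract_answer`, `finetune`) are placeholders rather than our actual code, and the round counts are arbitrary. The essential part is the filtering rule: only self-generated rationales whose final answer matches the gold label are kept for fine-tuning.

```python
def self_evolve(model, train_set, sample_rationale, extract_answer, finetune,
                modalities=("natural_language", "code", "truth_table"),
                rounds=3, samples_per_modality=4):
    """Sketch of an on-policy self-evolving loop over three thought modalities.

    sample_rationale(model, problem, modality) -> str    # generate a rationale
    extract_answer(rationale)                  -> str    # parse the final label
    finetune(model, examples)                  -> model   # SFT on kept examples
    """
    for _ in range(rounds):
        kept = []
        for problem in train_set:
            for modality in modalities:
                for _ in range(samples_per_modality):
                    rationale = sample_rationale(model, problem, modality)
                    # Keep only self-generated rationales whose final answer
                    # matches the gold label; no external teacher is used.
                    if extract_answer(rationale) == problem["label"]:
                        kept.append((problem, modality, rationale))
        # Fine-tune jointly on the filtered rationales from all modalities.
        model = finetune(model, kept)
    return model
```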
We will add this discussion to our paper. Please let me know if I've misunderstood or missed anything.
[1] Chen, Mouxiang, et al. "Parallel Scaling Law for Language Models." arXiv (2025).
[2] DeepMind Gemini Pro: https://deepmind.google/models/gemini/pro
This is an excellent piece of work. I would like to suggest citing the paper "Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large Language Models via a Multi-Paradigm Perspective" (https://arxiv.org/abs/2501.11110), as its motivation closely aligns with that of the present work. Given the similarities between the two studies, I kindly recommend discussing their relationship in the manuscript. Doing so would not only provide a more comprehensive analysis, but also further promote research in this important direction.
Thank you for sharing CoR; I hadn't seen it before, and I appreciate its elegant design and strong empirical results on mathematical reasoning benchmarks. Our MoT introduces three key innovations (parallel synergy, a truth-table paradigm, and self-evolving training), and we view CoR as a relevant concurrent work. Below are the key distinctions between CoR and MoT:
1. Task focus: CoR targets mathematical reasoning, whereas MoT is designed for logical reasoning tasks.
2. Task-specific innovation in paradigm selection: We identify bottlenecks in existing paradigms when solving logical problems and are the first to introduce a truth-table paradigm to complement NL and code.
3. Synergy strategy: CoR employs sequential synergy; MoT generates all modalities in parallel and fuses their outputs through majority voting or MoT sampling (a minimal sketch follows this list).
4. Training strategy: CoR relies on an auxiliary large language model and a progressive training schedule to build its datasets. MoT uses a closed-loop, on-policy self-evolving training procedure; no external model is needed to generate data.
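For point 3, a toy sketch of the parallel fusion via majority voting; the tie-breaking here is my own simplification, and the MoT sampling variant is not shown.

```python
from collections import Counter

def mot_majority_vote(predictions_by_modality):
    """Fuse parallel per-modality predictions with a simple majority vote.

    predictions_by_modality -- e.g. {"natural_language": "True",
                                     "code": "Unknown",
                                     "truth_table": "True"}
    Ties fall back to the first-seen label, a simplification of the
    paper's actual fusion rule.
    """
    votes = Counter(predictions_by_modality.values())
    return votes.most_common(1)[0][0]

print(mot_majority_vote({"natural_language": "True",
                         "code": "Unknown",
                         "truth_table": "True"}))  # -> "True"
```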
Thanks again for bringing this work to our attention. Please feel free to let me know if I've misunderstood any aspect of it.