Towards Dynamic Theory of Mind: Evaluating LLM Adaptation to Temporal Evolution of Human States
Abstract
The DynToM benchmark evaluates LLMs' ability to track and understand the temporal progression of mental states, revealing significant gaps compared to human performance.
As Large Language Models (LLMs) increasingly participate in human-AI interactions, evaluating their Theory of Mind (ToM) capabilities - particularly their ability to track dynamic mental states - becomes crucial. While existing benchmarks assess basic ToM abilities, they predominantly focus on static snapshots of mental states, overlooking the temporal evolution that characterizes real-world social interactions. We present DynToM, a novel benchmark specifically designed to evaluate LLMs' ability to understand and track the temporal progression of mental states across interconnected scenarios. Through a systematic four-step framework, we generate 1,100 social contexts encompassing 5,500 scenarios and 78,100 questions, each validated for realism and quality. Our comprehensive evaluation of ten state-of-the-art LLMs reveals that their average performance underperforms humans by 44.7\%, with performance degrading significantly when tracking and reasoning about the shift of mental states. This performance gap highlights fundamental limitations in current LLMs' ability to model the dynamic nature of human mental states.
Community
DYNTOM addresses a critical gap in current ToM evaluations - the ability to track and understand how human mental states evolve over time in real-world social interactions. While existing benchmarks like SocialIQA, BigToM, and TOMBENCH focus on static snapshots, our work introduces a novel approach to evaluate LLMs' understanding of dynamic mental state changes across multiple interconnected scenarios.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Theory of Mind in Large Language Models: Assessment and Enhancement (2025)
- MetaMind: Modeling Human Social Thoughts with Metacognitive Multi-Agent Systems (2025)
- Rethinking Theory of Mind Benchmarks for LLMs: Towards A User-Centered Perspective (2025)
- Do Theory of Mind Benchmarks Need Explicit Human-like Reasoning in Language Models? (2025)
- Humanizing LLMs: A Survey of Psychological Measurements with Tools, Datasets, and Human-Agent Applications (2025)
- DEL-ToM: Inference-Time Scaling for Theory-of-Mind Reasoning via Dynamic Epistemic Logic (2025)
- Jigsaw-Puzzles: From Seeing to Understanding to Reasoning in Vision-Language Models (2025)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any Paper on Hugging Face checkout this Space
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment:
@librarian-bot
recommend
Models citing this paper 0
No model linking this paper
Datasets citing this paper 1
Spaces citing this paper 0
No Space linking this paper
Collections including this paper 0
No Collection including this paper