ICPR 2026 · Lyon, France

Advancing Comprehensive Reasoning in Multimodal Large Language Models

From visual perception to complex multimodal reasoning: foundations, benchmarks, training strategies, and real-world applications.

August 21, 2026 · International Conference on Pattern Recognition

Overview

Reasoning beyond perception

Multimodal Large Language Models (MLLMs) have rapidly evolved from perception-oriented vision-language systems to general-purpose agents capable of complex reasoning over images, videos, text, and structured knowledge. Despite impressive progress, comprehensive multimodal reasoning remains challenging because of visual ambiguity, the demands of compositional reasoning, hallucination, limited temporal understanding, and the gap between language-based reasoning and grounded visual evidence.

This tutorial provides a systematic overview of reasoning capabilities in MLLMs, covering the transition from basic visual perception to complex inference. We will discuss how multimodal reasoning differs from text-only reasoning, key challenges in visual and visual-language reasoning, recent progress inspired by reasoning-focused LLMs, and practical strategies for building, evaluating, and deploying reasoning-capable MLLMs.

What you will learn

Core topics

Foundations of Multimodal Reasoning

How MLLMs connect visual perception, language understanding, commonsense reasoning, and symbolic abstraction.

Visual Reasoning Challenges

Grounding failures, hallucination, spatial-temporal reasoning, compositional inference, and robustness.

Reasoning-Centric Training and Evaluation

Instruction tuning, chain-of-thought style supervision, data curation, benchmark design, and evaluation protocols.

Applications and Future Directions

MLLM agents, robotics, human motion understanding, scientific reasoning, and trustworthy multimodal AI.

Program

Tutorial schedule

Schedule to be announced

The detailed tutorial program, including talk titles, speaker order, and session timing, will be updated after the final ICPR 2026 tutorial schedule is confirmed.

Invited speakers

Speakers

Ziquan Liu

Queen Mary University of London

Dr. Ziquan Liu is a Lecturer / Assistant Professor at the School of Electronic Engineering and Computer Science, Queen Mary University of London. He is affiliated with the Centre for Multimodal AI and the Computer Vision Group. His research focuses on reliable and responsible machine learning, trustworthy AI, and multimodal foundation models.

Yirui Wu

Hohai University

Dr. Yirui Wu is a Young Professor at Hohai University, a member of the Hydrology Big Data Group, and the leader of Delta Lab. His research interests include computer vision, artificial intelligence, multimedia computing, and intelligent water conservancy.

Siyuan Yang

KTH Royal Institute of Technology

Dr. Siyuan Yang is currently a Wallenberg–NTU Presidential Postdoctoral Fellow in the Department of Robotics, Perception and Learning at KTH Royal Institute of Technology. His research interests include computer vision, action recognition, and human pose estimation.

Tutorial organizers

Organizers

Jun Liu

Lancaster University

Professor / Chair in Digital Health at the School of Computing and Communications, with research interests in computer vision and human-centered AI.

Yiwei Wang

University of California, Merced

Assistant Professor in the Department of Computer Science and Director of the UC Merced NLP Lab, focusing on Natural Language Processing and Multimodal Large Language Models.

Yujun Cai

University of Queensland

Lecturer (Assistant Professor) at the School of Electrical Engineering and Computer Science, with research interests in multi-modal understanding and trustworthy large models.

Junsong Yuan

University at Buffalo, SUNY

Professor and Director of the Visual Computing Lab at the Department of Computer Science and Engineering, focusing on computer vision and video understanding.
