IROS 2026 Workshop
Multimodal Learning for Robotic Manipulation
Half-day Workshop • Pittsburgh, PA, USA • September 27, 2026
Foundation models trained on internet-scale data have achieved impressive generalization in vision and language domains, but robotic manipulation is fundamentally contact-rich. Success and safety often hinge on signals that vision alone cannot reliably infer, including touch, force/torque, proprioception, and audio. This mismatch creates a sensory gap: foundation models excel at semantics and geometry, yet we still lack a clear understanding of when vision-only policies break under contact, friction, compliance, occlusion, and real-time constraints.
This workshop brings together researchers in foundation models, tactile/haptic sensing, state estimation, and control to address a central question:
"Is vision all we need for manipulation, or what additional sensing and interaction modeling is minimally necessary?"
The workshop's technical scope spans three themes:

- Vision–language–action and foundation-model pipelines for manipulation; failure-mode taxonomies for vision-only policies under contact, friction, compliance, and occlusion; sensory-sufficiency evaluation via principled ablations and capability-centric metrics (success, robustness, recovery, and safety), as sketched after this list.
- Tactile and haptic sensing, force/torque, proprioception, and audio for contact-rich interaction; sensor design, simulation, and calibration; identifying when non-visual signals change capability, not just accuracy.
- Multimodal fusion and representation learning; contact-aware control interfaces; uncertainty estimation and recovery under limited real-world data; evaluation protocols, reproducible benchmarks, and negative/ablation results.
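To make the first theme concrete, below is a minimal sketch of how a sensory-sufficiency ablation might be scored with capability-centric metrics rather than raw accuracy. Every name in it (the `MODALITIES` list, the `policy`/`env` interface, and the `info` keys for success, recovery, and safety) is a hypothetical placeholder standing in for whatever benchmark a submission actually uses.

```python
# Minimal sketch of a per-modality ablation protocol for sensory-sufficiency
# evaluation. The policy, environment interface, modality names, and metric
# keys are hypothetical placeholders, not a prescribed benchmark.
from statistics import mean

MODALITIES = ["vision", "tactile", "force_torque", "proprioception", "audio"]


def mask_modality(observation: dict, dropped: str) -> dict:
    """Remove one modality so the policy must rely on the remaining signals."""
    masked = dict(observation)
    masked[dropped] = None  # assumes the policy tolerates missing inputs
    return masked


def evaluate(policy, env, dropped=None, n_episodes=50) -> dict:
    """Run episodes with one modality ablated and collect capability metrics."""
    successes, recoveries, violations = [], [], []
    for _ in range(n_episodes):
        obs, done, info = env.reset(), False, {}
        while not done:
            if dropped is not None:
                obs = mask_modality(obs, dropped)
            obs, done, info = env.step(policy.act(obs))
        successes.append(float(info.get("success", False)))
        recoveries.append(float(info.get("recovered_after_slip", False)))
        violations.append(float(info.get("safety_violation", False)))
    return {
        "dropped": dropped or "none",
        "success_rate": mean(successes),
        "recovery_rate": mean(recoveries),
        "violation_rate": mean(violations),
    }


def sensory_sufficiency_report(policy, env):
    """Compare the full-observation baseline against single-modality ablations."""
    for row in [evaluate(policy, env)] + [evaluate(policy, env, m) for m in MODALITIES]:
        print(f"{row['dropped']:>15}  success={row['success_rate']:.2f}  "
              f"recovery={row['recovery_rate']:.2f}  violations={row['violation_rate']:.2f}")
```

The point of such a protocol is that each ablation is reported in capability terms (success, recovery, safety) so that the contribution of a non-visual modality shows up as a change in what the policy can do, not just in prediction error.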
| Time | Event |
|---|---|
| 8:30 AM | Welcome |
| 8:45 AM | Invited Talk 1 |
| 9:15 AM | Invited Talk 2 |
| 9:45 AM | Poster Session & Coffee Break |
| 10:15 AM | Invited Talk 3 |
| 11:00 AM | Invited Talk 4 |
| 11:30 AM | Panel Debate: Is vision all we need? |
| 12:15 PM | Conclusion |
| 12:30 PM | Workshop Ends |
We invite submissions of short papers (4 pages + references) and extended abstracts (2 pages) on topics including but not limited to: multimodal sensing for manipulation, foundation models for contact-rich tasks, tactile/haptic perception, force/torque estimation, multimodal fusion, contact-aware control, and evaluation protocols for sensory sufficiency.
We welcome works in progress, ablation studies, negative results, and systems papers with reproducible artifacts (datasets, protocols, benchmarks). Accepted contributions will be presented as posters, with optional lightning talks.