IROS 2026 Workshop

Small Data, Rich Sensing

Multimodal Learning for Robotic Manipulation

Half-day Workshop • Pittsburgh, PA, USA • September 27, 2026

About the Workshop

Foundation models trained on internet-scale data have achieved impressive generalization in vision and language, but robotic manipulation is fundamentally contact-rich. Success and safety often hinge on signals that vision alone cannot reliably infer — including touch, force/torque, proprioception, and audio. This mismatch creates a sensory gap: foundation models excel at semantics and geometry, yet we still lack a clear understanding of when vision-only policies break under contact, friction, compliance, occlusion, and real-time constraints.

This workshop brings together researchers in foundation models, tactile/haptic sensing, state estimation, and control to address a central question:

"Is vision all we need for manipulation — and if not, what additional sensing and interaction modeling is minimally necessary?"

Topics of Interest

Foundation Models & VLA

Vision–language–action and foundation-model pipelines for manipulation; failure-mode taxonomies for vision-only policies under contact, friction, compliance, and occlusion; sensory sufficiency evaluation via principled ablations and capability-centric metrics (success, robustness, recovery, and safety).

Rich Sensing

Tactile and haptic sensing, force/torque, proprioception, and audio for contact-rich interaction; sensor design, simulation, and calibration; identifying when non-visual signals change capability — not just accuracy.

Multimodal Learning & Control

Multimodal fusion and representation learning; contact-aware control interfaces; uncertainty estimation and recovery under limited real-world data; evaluation protocols, reproducible benchmarks, and negative/ablation results.

Invited Speakers

Jie Tan

Google DeepMind

David Hsu

National University of Singapore

Jiajun Wu

Stanford University

Yunzhu Li

Columbia University

Program

Time      Event
8:30 AM   Welcome
8:45 AM   Invited Talk 1
9:15 AM   Invited Talk 2
9:45 AM   Poster Session & Coffee Break
10:15 AM  Invited Talk 3
11:00 AM  Invited Talk 4
11:30 AM  Panel Debate: Is vision all we need?
12:15 PM  Conclusion
12:30 PM  Workshop Ends

Call for Papers

We invite submissions of short papers (4 pages + references) and extended abstracts (2 pages) on topics including but not limited to: multimodal sensing for manipulation, foundation models for contact-rich tasks, tactile/haptic perception, force/torque estimation, multimodal fusion, contact-aware control, and evaluation protocols for sensory sufficiency.

We welcome works-in-progress, ablation studies, negative results, and system papers with reproducible artifacts (datasets, protocols, benchmarks). Accepted contributions will be presented as posters with optional lightning talks.

Important Dates

  • Submission deadline: TBD
  • Notification: TBD
  • Camera-ready: TBD
  • Workshop: September 27, 2026 — IROS 2026, Pittsburgh, PA, USA
Submit on OpenReview

Organizers

Haonan Chen

Harvard University

Shuijing Liu

UT Austin

Mingyo Seo

UT Austin

Raven Huang

Stanford University