Real2Sim2Real Tactile Policy Learning

Blind Dexterous Grasping via Real2Sim2Real Tactile Policy Learning

A tactile-only dexterous grasping policy for a sensorized LEAP Hand: calibrated contact simulation, geometry-aware tactile representation, and diffusion-based policy distillation.

44 tactile channels

20 real objects

0 visual input


**Zero-shot sim-to-real transfer of a tactile-conditioned control policy for reactive blind grasping.** The deployed policy performs blind object interaction—searching, adjusting contacts, and lifting—without any visual input, using only robot state and sparse tactile feedback.

Abstract

Blind grasping with a dexterous hand is a crucial manipulation capability. Nevertheless, learning such tactile-only policies for real robots remains challenging due to the tactile sim-to-real gap and the limited expressiveness of sparse tactile signals.

To bridge this gap, we propose a framework for tactile-only blind grasping that is deployable on a physical multi-fingered robotic hand. Our approach combines three key components. First, we introduce a Real2Sim tactile calibration pipeline that constructs a contact-calibrated digital-twin simulator capable of reproducing real tactile signals. Second, we improve the expressiveness of sparse tactile observations using a layout-aware tactile encoder, which incorporates sensor-geometry priors through pretraining in simulation with privileged geometric supervision. Third, to improve generalization to unseen objects, we train object-specific reinforcement-learning experts in the calibrated simulator and aggregate their successful grasp trajectories into a tactile-conditioned Diffusion Policy.

We evaluate our method on a physical LEAP Hand equipped with distributed tactile sensing across 10 seen and 10 unseen objects. The deployed policy achieves a 27% real-world grasp success rate across all 20 objects, without real-world grasping demonstrations or visual input.

Key Ideas

Real2Sim tactile calibration

Aligns binary contact-event timing between simulation and hardware using lightweight task-agnostic controlled-contact motions, without requiring complex soft-body modeling or real-world grasping demonstrations.
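
As a rough illustration of contact-event alignment, the sketch below picks a simulated force threshold so that the simulated binary contact onset matches the onset recorded on hardware for the same controlled-contact motions. The function names, the single scalar threshold per sensor, and the grid search are assumptions made for illustration; the actual calibration procedure may differ.

```python
from typing import List, Optional, Sequence, Tuple


def onset_index(forces: Sequence[float], threshold: float) -> Optional[int]:
    """Return the first sample index whose force exceeds the threshold, or None."""
    for i, force in enumerate(forces):
        if force > threshold:
            return i
    return None


def calibrate_threshold(
    sim_forces: Sequence[Sequence[float]],
    real_onsets: Sequence[int],
    candidates: Sequence[float],
) -> Tuple[Optional[float], float]:
    """Grid-search the threshold that minimises the mean onset mismatch (in samples).

    sim_forces[i] is the simulated force trace of one controlled-contact motion
    (hypothetical data layout); real_onsets[i] is the sample index at which the
    physical sensor first fired during the same motion.
    """
    best_threshold, best_error = None, float("inf")
    for threshold in candidates:
        errors: List[float] = []
        for forces, real_onset in zip(sim_forces, real_onsets):
            sim_onset = onset_index(forces, threshold)
            # A contact that never triggers in simulation is penalised by the trace length.
            errors.append(abs(sim_onset - real_onset) if sim_onset is not None else len(forces))
        error = sum(errors) / len(errors)
        if error < best_error:
            best_threshold, best_error = threshold, error
    return best_threshold, best_error
```

Running the search once per tactile channel would give a per-channel threshold table that the simulator can apply when binarizing contact forces.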

Layout-aware tactile encoder

Grounds each sensor in 3D via hand kinematics and is pretrained with privileged geometric supervision (object pose, contact labels), injecting spatial priors into sparse binary tactile observations.
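
To make the idea of grounding each sensor in 3D more concrete, the sketch below maps each sensor's fixed offset on its link into the hand frame and pairs the resulting position with the binary reading. It assumes link poses are already available from forward kinematics; the token layout `(x, y, z, contact)` and all names are hypothetical, not the encoder's actual input format.

```python
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Rotation = Sequence[Sequence[float]]  # 3x3 row-major rotation matrix
Pose = Tuple[Rotation, Vec3]          # link pose in the hand frame


def transform_point(rotation: Rotation, translation: Vec3, point: Vec3) -> Vec3:
    """Apply a rigid transform: rotation @ point + translation."""
    x, y, z = (
        sum(rotation[r][c] * point[c] for c in range(3)) + translation[r]
        for r in range(3)
    )
    return (x, y, z)


def build_tactile_tokens(
    link_poses: Dict[str, Pose],
    sensor_layout: Sequence[Tuple[str, Vec3]],
    contacts: Sequence[int],
) -> List[Tuple[float, float, float, float]]:
    """Pair each binary reading with its sensor position in the hand frame.

    sensor_layout[i] = (link name, sensor offset in that link's frame).
    """
    if len(sensor_layout) != len(contacts):
        raise ValueError("one contact reading is required per sensor")
    tokens = []
    for (link_name, offset), contact in zip(sensor_layout, contacts):
        rotation, translation = link_poses[link_name]
        x, y, z = transform_point(rotation, translation, offset)
        tokens.append((x, y, z, float(contact)))
    return tokens
```

Because the positions are recomputed from the current joint configuration, the same sensor produces different tokens as the fingers move, which is what lets a downstream encoder reason about where on the object a contact occurred.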

Expert-to-diffusion distillation

Decouples exploration from deployment: object-specific RL experts generate diverse contact-rich trajectories in simulation, which are aggregated to train a single tactile-conditioned Diffusion Policy capable of multimodal grasp generation under partial observability.
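
For readers unfamiliar with diffusion-based policies, the sketch below shows the standard DDPM forward-noising step that such policies are typically trained on: a clean expert action is mixed with Gaussian noise, and a denoiser (not shown) learns to predict that noise given the noisy action, the diffusion step, and the tactile observation. The linear schedule and its default values are common DDPM choices assumed here for illustration, not settings reported for this work.

```python
import math
import random
from typing import List, Sequence, Tuple


def linear_alpha_bars(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """Cumulative products of (1 - beta_k) for a linear beta schedule."""
    alpha_bars, product = [], 1.0
    for k in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * k / max(num_steps - 1, 1)
        product *= 1.0 - beta
        alpha_bars.append(product)
    return alpha_bars


def noise_actions(
    actions: Sequence[float], alpha_bar: float, rng: random.Random
) -> Tuple[List[float], List[float]]:
    """Return (noisy action, noise) with noisy = sqrt(a)*x0 + sqrt(1-a)*eps."""
    noise = [rng.gauss(0.0, 1.0) for _ in actions]
    noisy = [
        math.sqrt(alpha_bar) * a + math.sqrt(1.0 - alpha_bar) * e
        for a, e in zip(actions, noise)
    ]
    return noisy, noise


# Usage: sample a diffusion step, noise an expert action, and use `noise`
# as the regression target for the denoiser's prediction.
rng = random.Random(0)
alpha_bars = linear_alpha_bars(100)
step = rng.randrange(len(alpha_bars))
noisy_action, target_noise = noise_actions([0.5, -0.2, 0.1], alpha_bars[step], rng)
```

Because the denoiser models a distribution over actions rather than a single mean, it can represent several distinct grasp strategies for the same tactile observation.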

Supplementary Video

Blind grasping in motion

A compact reel of physical trials shows the policy searching for contact, adjusting finger placement, and lifting objects without visual input.

Policy Architecture

Our Real2Sim2Real framework consists of three stages:

Stage 1 — Calibrated simulation. We construct a paired simulator matching the robot’s kinematics and tactile sensor layout, then calibrate binary tactile events via task-agnostic controlled-contact motions to align contact onset and activation patterns with the physical hand.
Stage 2 — Privileged tactile pretraining. A layout-aware tactile encoder is pretrained in simulation using privileged geometric and contact labels (object pose, geometry, contact masks) that are unavailable during deployment, distilling 3D structural priors into the representation.
Stage 3 — Diffusion policy learning. Object-specific RL experts are trained purely as data generators. Their successful contact-rich trajectories are aggregated, and a tactile-conditioned Diffusion Policy is trained from this offline dataset—the only policy deployed on the real hand.
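
The aggregation step in Stage 3 can be pictured as a simple filter over expert rollouts, as in the sketch below. The `Trajectory` container and its fields are hypothetical; the only property taken from the description above is that successful trajectories from all object-specific experts end up in one offline dataset.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

Step = Tuple[List[float], List[float]]  # (observation, action)


@dataclass
class Trajectory:
    object_name: str   # which object-specific expert produced the rollout
    steps: List[Step]  # observation-action pairs in time order
    success: bool      # whether the simulated grasp succeeded


def aggregate_successes(rollouts: Iterable[Trajectory]) -> List[Step]:
    """Pool the steps of all successful rollouts into one offline dataset."""
    dataset: List[Step] = []
    for trajectory in rollouts:
        if trajectory.success:
            dataset.extend(trajectory.steps)
    return dataset
```

Failed rollouts are dropped entirely, so the distilled policy only imitates behavior that led to a grasp in the calibrated simulator.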

**Real2Sim2Real framework overview.** Calibrated tactile simulation provides contact-rich training data; a layout-aware tactile encoder distills geometric priors; a Diffusion Policy enables multimodal grasp generation for real-world deployment.

Hardware

Our physical platform consists of a 6-DoF xArm6 manipulator equipped with a 16-DoF LEAP Hand. The hand is instrumented with distributed tactile sensing: four custom-built TwinTac sensors on the fingertips (providing $4 \times 8 = 32$ binary channels) and 12 FSR channels on the palm and finger surfaces, yielding 44 tactile channels in total.
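
The channel count can be checked with a small packing routine like the one below, which flattens the four 8-channel fingertip readings and the 12 FSR readings into one 44-element vector. The channel ordering and the assumption that FSR readings have already been binarized are illustrative choices, not details taken from the released driver.

```python
from typing import List, Sequence

NUM_FINGERTIPS = 4
CHANNELS_PER_FINGERTIP = 8
NUM_FSR_CHANNELS = 12
NUM_TACTILE_CHANNELS = NUM_FINGERTIPS * CHANNELS_PER_FINGERTIP + NUM_FSR_CHANNELS  # 32 + 12 = 44


def pack_tactile(fingertips: Sequence[Sequence[int]], fsr: Sequence[int]) -> List[int]:
    """Flatten fingertip and FSR readings into a single binary vector.

    Assumed ordering: fingertip channels first (finger by finger), then FSR channels.
    """
    if len(fingertips) != NUM_FINGERTIPS:
        raise ValueError(f"expected {NUM_FINGERTIPS} fingertip sensors")
    if any(len(tip) != CHANNELS_PER_FINGERTIP for tip in fingertips):
        raise ValueError(f"expected {CHANNELS_PER_FINGERTIP} channels per fingertip")
    if len(fsr) != NUM_FSR_CHANNELS:
        raise ValueError(f"expected {NUM_FSR_CHANNELS} FSR channels")
    flat = [int(bool(c)) for tip in fingertips for c in tip]
    flat.extend(int(bool(c)) for c in fsr)
    return flat
```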

Open-source hardware. To facilitate reproducibility, we release the full hardware design of our tactile sensing suite:

TwinTac fingertip sensors — PCB design files, elastomer shell 3D models, and assembly instructions
FSR palm & phalange sensors — 3D-printable mounting brackets and wiring diagrams
Communication firmware — ROS2-based driver code for synchronized tactile-proprioceptive data streaming

Open-source hardware repository (coming soon)

Experiment Setup

We evaluate on 20 diverse objects (10 seen, 10 unseen) with the policy deployed zero-shot on the physical LEAP Hand—no real-world grasping demonstrations, no visual input, only proprioception and sparse tactile feedback.

Experiment Results

Per-object grasp success rates

Success rates for all 20 objects across 5 trials each, comparing policies with and without privileged tactile pretraining.

Seen avg. w/ pretrain 60.4%

Unseen avg. w/ pretrain 43.2%

Average improvement +23.7 percentage points

Seen Objects

10 trained object instances


Object	w/o Pretrain	w/ Pretrain	Improvement (pp)
Cube	60%	62%	+2%
Ball	8%	56%	+48%
Box	30%	48%	+18%
Cross	52%	66%	+14%
CubeBall	38%	100%	+62%
Egg	16%	24%	+8%
Big Egg	0%	8%	+8%
H-shape	44%	76%	+32%
Hollow Cube	70%	98%	+28%
Hourglass	44%	66%	+22%
Seen Avg.	36.2%	60.4%	+24.2%

Unseen Objects

10 held-out object instances


Object	w/o Pretrain	w/ Pretrain	Improvement (pp)
Hash-Shape	22%	42%	+20%
C-Shape	18%	34%	+16%
E-Shape	18%	38%	+20%
T-Shape	28%	56%	+28%
Cylinder	2%	14%	+12%
Fork	20%	62%	+42%
Ring	50%	80%	+30%
Snowman	22%	48%	+26%
Tetrahedral	8%	28%	+20%
Triple	12%	30%	+18%
Unseen Avg.	20.0%	43.2%	+23.2%
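
The summary rows of both tables can be reproduced from the per-object entries with a few lines of arithmetic. The script below is only a consistency check on the numbers listed above; the dictionary layout and function name are illustrative.

```python
from typing import Dict, Tuple

# Per-object success rates in percent: (without pretraining, with pretraining).
SEEN: Dict[str, Tuple[int, int]] = {
    "Cube": (60, 62), "Ball": (8, 56), "Box": (30, 48), "Cross": (52, 66),
    "CubeBall": (38, 100), "Egg": (16, 24), "Big Egg": (0, 8),
    "H-shape": (44, 76), "Hollow Cube": (70, 98), "Hourglass": (44, 66),
}
UNSEEN: Dict[str, Tuple[int, int]] = {
    "Hash-Shape": (22, 42), "C-Shape": (18, 34), "E-Shape": (18, 38),
    "T-Shape": (28, 56), "Cylinder": (2, 14), "Fork": (20, 62),
    "Ring": (50, 80), "Snowman": (22, 48), "Tetrahedral": (8, 28),
    "Triple": (12, 30),
}


def summarize(table: Dict[str, Tuple[int, int]]) -> Tuple[float, float, float]:
    """Return (mean without, mean with, improvement), each rounded to one decimal."""
    n = len(table)
    without = sum(rates[0] for rates in table.values()) / n
    with_pretrain = sum(rates[1] for rates in table.values()) / n
    return round(without, 1), round(with_pretrain, 1), round(with_pretrain - without, 1)


seen_summary = summarize(SEEN)      # matches the "Seen Avg." row
unseen_summary = summarize(UNSEEN)  # matches the "Unseen Avg." row
average_improvement = round((seen_summary[2] + unseen_summary[2]) / 2, 1)
```

The improvement column is a difference between two percentages, so it is measured in percentage points rather than as a relative gain.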

Qualitative Results

Policy rollouts

Limitations

Several limitations remain in the current system. First, the overall success rate is 27%—encouraging for a challenging tactile-only setting, but modest in absolute terms. Second, the hardware provides incomplete tactile coverage: contacts in unsensed regions of the hand produce ambiguous observations that can confuse the policy. Third, the fixed execution horizon does not always suffice for the policy to establish stable grasp configurations, sometimes leading to empty grasps or object drops during lifting.

Conclusion

This paper presents a complete pipeline for learning and deploying a blind grasping policy on a dexterous robotic hand. The pipeline combines a tactile-enabled dexterous hand platform, a high-fidelity tactile simulation environment calibrated via Real2Sim, and object-specific RL experts distilled into a single Diffusion Policy for Sim2Real deployment. By jointly leveraging contact-event calibration, geometry-aware tactile representation learning, and diffusion-based policy aggregation, our system achieves promising results on a set of 20 real-world objects, 10 of which are unseen during training. Future work will focus on improving grasp robustness through denser tactile skins, explicit slip detection, and two-timescale reactive control architectures.