Revanth Gundala

Compressing Robot Vision into 8 Objects

We replaced 256 visual patch tokens with 8 learned object slots and trained a robot vision-language-action (VLA) model from scratch. Slot compression improved training efficiency by 11%.

Mar 5, 2026
