Perceptio: Perception Enhanced Vision Language Models via Spatial Token Generation
Yuchen Li, Amanmeet Garg, Shalini Chaudhuri, Rui Zhao, Garin Kessler
March 2026
Abstract
Perceptio explores perception-enhanced vision-language modeling through spatial token generation for complex 2D and 3D spatial reasoning.
Publication
arXiv preprint, 2026