I am a PhD researcher in Computer Vision at Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI). My research focuses on multimodal vision-language models (VLMs) for image, video, 3D, and 4D generation and perception.
I am the first author of 3D-CoMPaT (ECCV Oral) and 3D-CoMPaT++ (TPAMI 2025, accepted). I also led the NeurIPS paper PointNeXt (1,000+ citations). My work Exploring Scaling Laws of PointNets was selected for a Spotlight talk at 3DV 2025.
I have served as a core organizer of a CVPR workshop and as a program chair and reviewer for leading AI conferences and journals, including TPAMI, IJCV, CVPR, ICCV, AAAI, TCSVT, and NeurIPS. I have also contributed to Apache RocketMQ as an open-source developer.
I previously interned at Amazon Science (Prime Video, Seattle) and Sony AI (Tokyo).
PhD in Computer Vision
Mohamed Bin Zayed University of Artificial Intelligence
MSc in Computer Science
King Abdullah University of Science and Technology
BSc in Computer Science and Technology
Southern University of Science and Technology
School of Computing (Exchange Student)
National University of Singapore
Visiting Student (Electronics and Computer Engineering)
University of British Columbia