I am a final-year PhD student in Computer Vision at Mohamed bin Zayed University of Artificial Intelligence (MBZUAI). My research focuses on AIGC, 3D computer vision, multimodal vision-language models, and video generation, with an emphasis on spatial reasoning and controllable generation across image, audio, video, 3D, and 4D content.
I am the first author of PointNeXt, which has received over 1,450 citations, and 3D-CoMPaT, which was presented as an oral paper at ECCV. I also work on perception-enhanced vision-language models, such as Perceptio, and long-horizon pose diffusion for music-driven dance video synthesis, such as UCanDance.
Before joining MBZUAI, I studied at SUSTech and KAUST. I have also gained international experience in research, open-source engineering, and applied innovation through positions and collaborations with Amazon Science, Sony AI, Dubai Business Associates / Emirates Airlines, and Apache RocketMQ.
Beyond research, I am a global traveler who has visited more than 30 countries and a lifelong tennis player who enjoys playing wherever I travel.
If you are interested in related research, please feel free to reach out at yuchen.li [at] mbzuai.ac.ae.