First-Person Video and Motion Sync Achieved
EgoTwin: First AI to sync first-person videos & human motions. Breakthrough in wearable tech, AR, and embodied AI.
AI models have become highly capable at generating third-person videos, but generating video from a first-person perspective remains challenging.
To address this, the National University of Singapore, Nanyang Technological University, the Hong Kong University of Science and Technology, and Shanghai AI Laboratory have jointly released EgoTwin, the first framework to jointly generate first-person videos and human motions.
This breakthrough overcomes two major bottlenecks—view-motion alignment and causal coupling—opening new avenues for wearable computing, AR, and embodied intelligence applications.
EgoTwin is a diffusion model-based framework that jointly generates first-person perspective videos and human motions in a view-consistent and causally coherent manner.
Camera poses derived from the generated human motions can drive the video generation and, combined with 3D Gaussian Splatting, be used to render three-dimensional scenes.
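As a rough illustration of how camera poses might be derived from generated motion for such rendering, the sketch below converts a sequence of head-joint positions and orientations into per-frame camera extrinsics that a renderer such as a Gaussian Splatting viewer could consume. This is not EgoTwin's actual code; the motion format, axis conventions, and helper names are assumptions for illustration.

```python
import numpy as np

def head_pose_to_camera_extrinsic(head_pos, head_rot):
    """Build a 4x4 world-to-camera extrinsic matrix from a head-joint pose.

    head_pos: (3,) head joint position in world coordinates.
    head_rot: (3, 3) rotation matrix mapping head-local axes to world axes
              (assumed convention: +z forward, +y up; the real EgoTwin
              convention may differ).
    """
    # Camera-to-world transform: the camera sits at the head joint and
    # shares the head's orientation.
    cam_to_world = np.eye(4)
    cam_to_world[:3, :3] = head_rot
    cam_to_world[:3, 3] = head_pos
    # Renderers typically expect world-to-camera, i.e. the inverse.
    return np.linalg.inv(cam_to_world)

def motion_to_camera_trajectory(head_positions, head_rotations):
    """Convert T frames of head poses into T camera extrinsic matrices."""
    return np.stack([
        head_pose_to_camera_extrinsic(p, R)
        for p, R in zip(head_positions, head_rotations)
    ])

if __name__ == "__main__":
    # Toy example: a head at 1.6 m height moving forward along +x.
    T = 4
    positions = np.stack([np.array([0.1 * t, 0.0, 1.6]) for t in range(T)])
    rotations = np.stack([np.eye(3) for _ in range(T)])
    extrinsics = motion_to_camera_trajectory(positions, rotations)
    print(extrinsics.shape)  # (4, 4, 4): one 4x4 matrix per frame
```

In a pipeline of this kind, the resulting trajectory would keep the rendered viewpoint consistent with the generated body motion, which is the view-motion alignment the article describes.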


