First-Person Video and Motion Sync Achieved

EgoTwin: First AI to sync first-person videos & human motions. Breakthrough in wearable tech, AR, and embodied AI.

Meng Li
Oct 01, 2025

AI generation of third-person videos has become highly capable, but first-person (egocentric) video generation remains challenging.

To address this, the National University of Singapore, Nanyang Technological University, the Hong Kong University of Science and Technology, and the Shanghai AI Laboratory jointly released EgoTwin, the first framework to jointly generate first-person videos and human motions.

This breakthrough tackles two major bottlenecks: view-motion alignment, where the video's camera trajectory must match the head trajectory implied by the motion, and causal coupling, where each action must produce the corresponding visual change in subsequent frames. It opens new avenues for wearable computing, AR, and embodied intelligence.

EgoTwin is a diffusion model-based framework that jointly generates first-person perspective videos and human motions in a view-consistent and causally coherent manner.
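
To make this concrete, below is a minimal sketch of what joint diffusion over paired video and motion latents can look like: both latents are denoised in lockstep, and cross-attention lets each modality condition on the other, which is one way to encourage view-motion consistency. This is my own illustration, not EgoTwin's published architecture; every module name, dimension, and the update rule is an assumption.

```python
import torch
import torch.nn as nn

class JointDenoiser(nn.Module):
    """Toy two-branch denoiser: video and motion latents attend to each other."""
    def __init__(self, video_dim=64, motion_dim=32, hidden=128):
        super().__init__()
        self.video_in = nn.Linear(video_dim, hidden)
        self.motion_in = nn.Linear(motion_dim, hidden)
        # Cross-attention in both directions keeps the two streams coupled,
        # so the predicted video noise depends on the motion and vice versa.
        self.m2v = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.v2m = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.video_out = nn.Linear(hidden, video_dim)
        self.motion_out = nn.Linear(hidden, motion_dim)

    def forward(self, video_z, motion_z, t_emb):
        v = self.video_in(video_z) + t_emb   # (B, frames, hidden)
        m = self.motion_in(motion_z) + t_emb
        v_ctx, _ = self.m2v(v, m, m)         # video queries attend to motion
        m_ctx, _ = self.v2m(m, v, v)         # motion queries attend to video
        return self.video_out(v + v_ctx), self.motion_out(m + m_ctx)

@torch.no_grad()
def joint_sample(model, frames=16, steps=50,
                 video_dim=64, motion_dim=32, hidden=128):
    """Crude Euler-style reverse loop denoising both latents in lockstep."""
    video_z = torch.randn(1, frames, video_dim)
    motion_z = torch.randn(1, frames, motion_dim)
    for step in reversed(range(steps)):
        t_emb = torch.full((1, 1, hidden), step / steps)
        eps_v, eps_m = model(video_z, motion_z, t_emb)
        video_z = video_z - eps_v / steps
        motion_z = motion_z - eps_m / steps
    return video_z, motion_z

video_latent, motion_latent = joint_sample(JointDenoiser())
print(video_latent.shape, motion_latent.shape)
```

The point of the shared sampling loop is that neither modality is generated first and then conditioned on: both are refined together, which is what "joint generation" means here.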

Camera poses can be derived from the generated human motions, and the resulting camera trajectories can drive 3D Gaussian Splatting to render the surrounding three-dimensional scene.
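
As a rough illustration of that first step, the sketch below (again an assumption on my part, not the paper's exact pipeline) reads a per-frame camera pose off a generated motion by treating the head joint as the egocentric camera and inverting its pose into a world-to-camera extrinsic matrix, the form a renderer such as a 3D Gaussian Splatting pipeline consumes.

```python
import numpy as np

def head_pose_to_extrinsic(head_position, head_rotation):
    """Convert a head joint pose into a 4x4 world-to-camera extrinsic matrix.

    head_position: (3,) world-space position of the head joint.
    head_rotation: (3, 3) world-space rotation of the head joint, assumed to
                   already match the camera's orientation convention.
    """
    R = head_rotation.T          # inverse of an orthonormal rotation
    t = -R @ head_position       # inverse translation
    extrinsic = np.eye(4)
    extrinsic[:3, :3] = R
    extrinsic[:3, 3] = t
    return extrinsic

# Hypothetical motion output: 16 frames of head positions and rotations
# (here, a head moving steadily along the x-axis with fixed orientation).
positions = np.cumsum(np.tile([0.02, 0.0, 0.0], (16, 1)), axis=0)
rotations = np.tile(np.eye(3), (16, 1, 1))
camera_trajectory = np.stack(
    [head_pose_to_extrinsic(p, R) for p, R in zip(positions, rotations)]
)
print(camera_trajectory.shape)   # (16, 4, 4)
```

Because the camera path comes straight from the motion's head trajectory, the rendered views stay consistent with the body movement by construction.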
