MeanFlow-MP1: Dual-SOTA Robotic Learning from PKU
PKU's MP1 robot learning model achieves dual-SOTA speed & success rates with MeanFlow tech
"RoboPub" Publication: 20% Discount Offer Link.
In current VLA (vision-language-action) models, the "A", i.e., the action-generation model, determines both the quality and the speed of the generated actions. In particular, generative action models face a fundamental trade-off between inference speed and task success rate.
Diffusion models (e.g., Diffusion Policy and DP3) generate high-quality action sequences through multi-step iterative denoising, but their slow inference cannot meet real-time control requirements. Flow-based models (e.g., FlowPolicy) offer fast inference, yet they require additional architectural constraints or consistency losses to ensure trajectory validity, which increases design complexity and can limit performance and generalization.
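To make the trade-off concrete, the toy sketch below contrasts the two inference regimes: an iterative, diffusion-style sampler whose latency grows with the number of denoising steps, versus a one-step sampler that maps noise to an action chunk in a single forward pass. `ActionNet`, the dimensions, and the step count are illustrative placeholders, not the actual Diffusion Policy, DP3, or FlowPolicy implementations.

```python
import torch
import torch.nn as nn

class ActionNet(nn.Module):
    """Toy network standing in for a denoiser / velocity field over action chunks."""
    def __init__(self, act_dim=7, horizon=16, cond_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(act_dim * horizon + cond_dim + 1, 256),
            nn.ReLU(),
            nn.Linear(256, act_dim * horizon),
        )

    def forward(self, actions, cond, t):
        t = t.expand(actions.shape[0], 1)            # broadcast scalar timestep to the batch
        return self.net(torch.cat([actions, cond, t], dim=-1))

@torch.no_grad()
def iterative_denoising(net, cond, steps=50, act_dim=7, horizon=16):
    """Diffusion-style sampling: `steps` network evaluations per control cycle."""
    x = torch.randn(cond.shape[0], act_dim * horizon)
    for k in reversed(range(steps)):
        t = torch.tensor([[(k + 1) / steps]])
        x = x - (1.0 / steps) * net(x, cond, t)      # simplified Euler-style update
    return x

@torch.no_grad()
def one_step_sampling(net, cond, act_dim=7, horizon=16):
    """One-step sampling: a single network evaluation maps noise to an action chunk."""
    x = torch.randn(cond.shape[0], act_dim * horizon)
    return x - net(x, cond, torch.tensor([[1.0]]))

cond = torch.randn(1, 64)                            # placeholder observation embedding
net = ActionNet()
slow = iterative_denoising(net, cond)                # latency ~ 50 forward passes
fast = one_step_sampling(net, cond)                  # latency ~ 1 forward pass
```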
Additionally, robotic manipulation faces the challenge of data-efficient few-shot generalization. Standard imitation learning strategies are prone to "feature collapse," where key states requiring different actions are erroneously mapped to similar latent representations, leading to inaccurate responses in new scenarios. Thus, enhancing the model's ability to distinguish between different states is critical for improving policy generalization.
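One common way to counter feature collapse is to add a repulsive (dispersive) regularizer that spreads latent codes apart within each training batch, so that states demanding different actions keep distinguishable representations. The sketch below is a generic, minimal formulation of such a term; the function name, temperature, and weighting are illustrative assumptions, not necessarily MP1's exact loss.

```python
import math
import torch
import torch.nn.functional as F

def dispersive_regularizer(latents: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """Penalty that grows when latent codes within a batch crowd together.

    latents: [B, D] encoder outputs for a batch of states. Pushing pairwise
    similarities down discourages distinct states from collapsing onto
    near-identical representations (the failure mode described above).
    """
    z = F.normalize(latents, dim=-1)                  # unit-norm features
    sim = (z @ z.t()) / temperature                   # [B, B] cosine similarities
    mask = ~torch.eye(z.shape[0], dtype=torch.bool, device=z.device)
    off_diag = sim[mask]
    # log-mean-exp over off-diagonal similarities: high when features cluster.
    return torch.logsumexp(off_diag, dim=0) - math.log(off_diag.numel())

# Example: add to the imitation objective with a small weight.
# total_loss = bc_loss + 0.1 * dispersive_regularizer(encoder_features)
```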
To address these challenges, a research team from Peking University proposed MP1, a novel robotic learning framework. MP1 brings the MeanFlow paradigm, a recent breakthrough in image generation, to robotic learning for the first time, achieving millisecond-level inference and laying groundwork for the action-generation ("A") component of VLA models.
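As a rough picture of what one-step generation looks like in this setting, the sketch below shows sampling with an average-velocity ("MeanFlow"-style) network under the common interpolation convention z_t = (1 - t)·x + t·ε: a single network evaluation over the full interval [0, 1] maps Gaussian noise directly to an action chunk. The network interface (`u_net`), the observation conditioning, and the dimensions are assumed placeholders rather than MP1's actual API.

```python
import torch

@torch.no_grad()
def meanflow_one_step_action(u_net, obs_cond, act_dim=7, horizon=16):
    """One-step sampling with an average-velocity network.

    u_net(z, cond, r, t) is assumed to predict the average velocity over the
    interval [r, t], so evaluating it once over [0, 1] recovers an action
    chunk from pure noise:  x = z_1 - u(z_1, r=0, t=1).
    """
    eps = torch.randn(obs_cond.shape[0], horizon, act_dim)   # z_1: pure Gaussian noise
    r = torch.zeros(obs_cond.shape[0])                       # interval start
    t = torch.ones(obs_cond.shape[0])                        # interval end
    return eps - u_net(eps, obs_cond, r, t)                  # one network call

# Example with a stand-in network (any callable with this signature works):
# u_net = lambda z, cond, r, t: torch.zeros_like(z)
# actions = meanflow_one_step_action(u_net, torch.randn(1, 64))
```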