A Good Leader

A person who inspires and motivates others to achieve their goals.

An Experienced Instructor

A person who inspires and motivates others to achieve their goals.

Cognitive Level Far Above Peers

Possesses exceptional cognitive abilities that stand out among peers.

Only Conduct Impactful Research

Focuses solely on research that makes a significant difference.

Resilience and Tolerance

Trains oneself to handle challenges and tolerate frustrations effectively.

Dream Big, Work Hard, Stay Humble.

——Be Versatile, Be Great.

🎄 About Me

Hi there! I am Shiduo Zhang, and you can also call me Joey. I am a third-year master's student in the School of Computer Science at Fudan University, advised by Prof. Xipeng Qiu. My research focuses on embodied AI, particularly the intersection of foundation models and robotics. Previously, I received my Bachelor's degree from Tongji University. During my undergraduate studies, I spent an unforgettable year and a half as a research assistant intern at MARS Lab, Tsinghua IIIS and Shanghai QiZhi Institute, advised by Prof. Hang Zhao. I am now fortunate to be collaborating with Prof. Yue Wang and visiting USC and the Bay Area.

Personally, I am dedicated to truly meaningful research problems, and I tend to focus on work grounded in first principles. The most fundamental problems, in my current view, are Scaling and Search. Feel free to contact me anytime if you share a passion for these problems.

🔥 News

2026.1.25 One paper was accepted by ICLR 2026!

2025.6.24 One paper was accepted by ICCV 2025!

2025.5.15 One paper was accepted by ACL 2025!

2025.3.3 One paper was accepted by an ICLR 2025 workshop!

2025.2.27 One paper was accepted by CVPR 2025!

2024.12.25 The VLABench preview version was released!

📑 Publications

SRPO: Self-Referential Policy Optimization for Vision-Language-Action Models

Under review

Senyu Fei, Siyin Wang, Li Ji, Ao Li, Shiduo Zhang, Liming Liu, Jinlong Hou, Jingjing Gong, Xianzhong Zhao & Xipeng Qiu

"Dense reward design via latent representations from a world model."

[Paper] [Project Page] [Code]

FASTer: Toward Powerful and Efficient Autoregressive Vision–Language–Action Models With Learnable Action Tokenizer and Block-Wise Decoding

ICLR'26

Shiduo Zhang*†, Yicheng Liu*†, Zibin Dong, Baijun Ye, Xiaopeng Yu, Linqi Yin, Tianyuan Yuan, Junhao Shi, Luca Yu, John Zheng, Tao Jiang, Jingjing Gong, Hang Zhao & Xipeng Qiu

"An autoregressive VLA achieving SOTA in both performance and frequency."

[Paper] [Project Page] [Code]

LIBERO-Plus: In-depth Robustness Analysis of Vision-Language-Action Models

Under review

Senyu Fei, Siyin Wang, Junhao Shi, Zihao Dai, Jikun Cai, Pengfang Qian, Li Ji, Xinzhe He, Shiduo Zhang, Zhaoye Fei, Jinlan Fu, Jingjing Gong & Xipeng Qiu

"A more comprehensive study of the generalization of VLA models."

[Paper] [Project Page] [Code]

VLABench: A Large-Scale Benchmark for Language-Conditioned Robotics Manipulation with Long-Horizon Reasoning Tasks

ICCV'25

Shiduo Zhang, Zhe Xu, Peiju Liu, Xiaopeng Yu, Yuan Li, Qinghui Gao, Zhaoye Fei, Zhangyue Yin, Zuxuan Wu, Yu-Gang Jiang & Xipeng Qiu

"The first benchmark for VLA. Beyond evaluation, the definition matters most."

[Paper] [Project Page] [Code]

World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task Planning

ACL'25

Siyin Wang, Zhaoye Fei, Qinyuan Cheng, Shiduo Zhang, Panpan Cai, Jinlan Fu & Xipeng Qiu

[Paper] [Project Page] [Code]

Large Trajectory Models are Scalable Motion Predictors and Planners

Qiao Sun*, Shiduo Zhang*, Danjiao Ma, Jingzhe Shi, Derun Li, Simian Luo, Yu Wang, Ningyi Xu, Guangzhi Cao & Hang Zhao

"The first work to scale causal transformers in autonomous driving. Previous SOTA on nuPlan."

[Paper] [Code]

MOSS: An Open Conversational Large Language Model

Machine Intelligence Research

Tianxiang Sun, Xiaotian Zhang, Zhengfu He, Peng Li, Qinyuan Cheng, Xiangyang Liu, Hang Yan, Yunfan Shao, Qiong Tang, Shiduo Zhang, Xingjian Zhao, Ke Chen, Yining Zheng, Zhejian Zhou, Ruixiao Li, Jun Zhan, Yunhua Zhou, Linyang Li, Xiaogui Yang, Lingling Wu, Zhangyue Yin, Xuanjing Huang, Yu-Gang Jiang & Xipeng Qiu

"The first open-source large language model after the ChatGPT moment. 12k+ GitHub stars."

[Paper] [Hugging Face] [Code]

⚙️ Projects

AgentSim2: A Large-Scale Agent Town Driven by Large Language Models 2024.2-2024.5

Jiaju Lin*, Shiduo Zhang*

"The second version of AgentSim: a much larger AI town than the Stanford one."

Generalizing Motion Planners with Mixture of Experts for Autonomous Driving 2023.12-2024.2

Main idea provided by Shiduo Zhang.

"I proposed introducing MoE to autonomous driving. Due to time constraints, I contributed only the idea and the baseline during early discussions."

[Paper] [Project Page] [Code]

General Intelligent Agent for Computer Usage: A Method to Robotics Process Automation AI. 2022.5-2022.11

Shiduo Zhang, Ziwen Zhuang & Hang Zhao

"A highly forward-thinking and insightful idea! Using mouse control for general-purpose RPA tasks, much like Claude's computer use work in October 2024. However, our work began two years earlier, in a time before GPT."

Vision-Guided Quadrupedal Agile Locomotion with Multi-Modal Information Fusion 2021.12-2022.8

Shiduo Zhang, Ziwen Zhuang & Hang Zhao

"One of the earliest projects on locomotion, concurrent with Legged Gym."

Academic Service

Blogs