A Good Leader
A person who inspires and motivates others to achieve their goals.
An Experienced Instructor
A person who inspires and motivates others to achieve their goals.
Cognitive Level Far Above Peers
Possesses exceptional cognitive abilities that stand out among peers.
Only Conduct Impactful Research
Focuses solely on research that makes a significant difference.
Resilience and Tolerance
Trains oneself to handle challenges and tolerate frustrations effectively.

Dream Big, Work Hard, Stay Humble.

——Be Versatile, Be Great.

🎄 About Me

Hi there! I'm Shiduo Zhang, and you can also call me Joey. I am a second-year master’s student in the School of Computer Science at Fudan University, advised by Prof. Xipeng Qiu. My research focuses on embodied AI, particularly the intersection of foundation models and robotics. Previously, I obtained my Bachelor’s degree from Tongji University and, during my undergraduate studies, spent an unforgettable year and a half as a research assistant intern at MARS Lab, Tsinghua IIIS and Shanghai QiZhi Institute, advised by Prof. Hang Zhao.


Personally, I am dedicated to addressing truly meaningful research problems, and I tend to focus on work that aligns with first principles. The problems I currently consider most fundamental are Scaling and Search. Feel free to contact me anytime if you share the same passion for these problems.

🔥 News

2025.6.24 One paper was accepted to ICCV 2025!
2025.5.15 One paper was accepted to ACL 2025!
2025.3.3 One paper was accepted to an ICLR 2025 Workshop!
2025.2.27 One paper was accepted to CVPR 2025!
2024.12.25 VLABench Preview Version was released!

📑 Publications


FASTer: Toward Powerful and Efficient Autoregressive Vision–Language–Action Models With Learnable Action Tokenizer and Block-Wise Decoding

Under review

Shiduo Zhang*, Yicheng Liu*, Zibin Dong, Baijun Ye, Xiaopeng Yu, Linqi Yin, Tianyuan Yuan, Junhao Shi, Luca Yu, John Zheng, Tao Jiang, Jingjing Gong, Hang Zhao & Xipeng Qiu

"An autoregressive VLA that achieves SOTA in both performance and frequency."

[Paper] [Project Page] [Code]


LIBERO-Plus: In-depth Robustness Analysis of Vision-Language-Action Models

Under review

Senyu Fei, Siyin Wang, Junhao Shi, Zihao Dai, Jikun Cai, Pengfang Qian, Li Ji, Xinzhe He, Shiduo Zhang, Zhaoye Fei, Jinlan Fu, Jingjing Gong & Xipeng Qiu

"A more comprehensive study on the generalization of VLA Models"

[Paper] [Project Page] [Code]


VLABench: A Large-Scale Benchmark for Language-Conditioned Robotics Manipulation with Long-Horizon Reasoning Tasks

ICCV'25

Shiduo Zhang, Zhe Xu, Peiju Liu, Xiaopeng Yu, Yuan Li, Qinghui Gao, Zhaoye Fei, Zhangyue Yin, Zuxuan Wu, Yu-Gang Jiang & Xipeng Qiu

"The first benchmark for VLA. It is not only for evaluation: definition matters most."

[Paper] [Project Page] [Code]


World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task Planning

ACL'25

Siyin Wang, Zhaoye Fei, Qinyuan Cheng, Shiduo Zhang, Panpan Cai, Jinlan Fu & Xipeng Qiu

[Paper] [Project Page] [Code]


Large Trajectory Models are Scalable Motion Predictors and Planners

Qiao Sun*, Shiduo Zhang*, Danjiao Ma, Jingzhe Shi, Derun Li, Simian Luo, Yu Wang, Ningyi Xu, Guangzhi Cao & Hang Zhao

"The first work to scale causal transformers in autonomous driving. Previous SOTA on nuPlan."

[Paper] [Code]


MOSS: An Open Conversational Large Language Model

Machine Intelligence Research

Tianxiang Sun, Xiaotian Zhang, Zhengfu He, Peng Li, Qinyuan Cheng, Xiangyang Liu, Hang Yan, Yunfan Shao, Qiong Tang, Shiduo Zhang, Xingjian Zhao, Ke Chen, Yining Zheng, Zhejian Zhou, Ruixiao Li, Jun Zhan, Yunhua Zhou, Linyang Li, Xiaogui Yang, Lingling Wu, Zhangyue Yin, Xuanjing Huang, Yu-Gang Jiang & Xipeng Qiu

"The first open-source large language model after the ChatGPT moment. 12k+ GitHub stars."

[Paper] [Hugging Face] [Code]

⚙️ Projects


AgentSim2: A Large-Scale Agent Town Driven by Large Language Models 2024.2-2024.5

Jiaju Lin*, Shiduo Zhang*

"The second version of AgentSim. A much larger AI town than the Stanford one."


Generalizing Motion Planners with Mixture of Experts for Autonomous Driving 2023.12-2024.2

Main idea provided by Shiduo Zhang.

"Introduces MoE to autonomous driving, an idea I proposed. Due to time constraints, I contributed only the idea and the baseline during early discussions."

[Paper] [Project Page] [Code]


General Intelligent Agent for Computer Usage: A Method to Robotics Process Automation AI. 2022.5-2022.11

Shiduo Zhang, Ziwen Zhuang & Hang Zhao

"A highly forward-thinking and insightful idea! It uses mouse control for general-purpose RPA tasks, much like Claude’s work in October 2024. However, our work began two years earlier, in a time before GPT."


Vision-Guided Quadrupedal Agile Locomotion with Multi-Modal Information Fusion 2021.12-2022.8

Shiduo Zhang, Ziwen Zhuang & Hang Zhao

"One of the earliest projects on locomotion. Concurrent with Legged Gym."

Academic Service

Academic Talk Gave a talk at MSRA, Beijing.
Teaching Assistant Pattern Recognition and Machine Learning (COMP130137.01 & AIE410001.01)
Reviewer IROS, CVPR, ICLR, ARR

Blogs