Yao Zhang
Ph.D. Candidate @ LMU Munich
I am a final-year Ph.D. candidate in Computer Science at LMU Munich, advised by Prof. Volker Tresp. Previously, I received my M.Sc. (2021) and B.Sc. (2019) from LMU Munich.
My research focuses on agent reliability: reducing compounding failures in long-horizon tasks through structured search and process supervision. As agents take on increasingly autonomous roles, reliable multi-step execution remains one of the central challenges for real-world deployment. I develop methods that address this challenge, from single-agent execution to multi-agent system design. Earlier in my Ph.D., I also worked on multimodal learning, including parameter-efficient fine-tuning in federated and continual settings. Looking ahead, I am interested in building agents that can be trusted to operate over extended periods with minimal human oversight. Feel free to reach out if you are interested in my work or would like to connect.
I am on the job market for Research Scientist / Applied Scientist positions in agentic systems.
News
| Date | Update |
|---|---|
| Jan 2026 | WebArbiter accepted at ICLR 2026. |
| Nov 2025 | AUVIC accepted at AAAI 2026. |
| Oct 2025 | GroundedPRM presented at LAW@NeurIPS 2025. |
| Aug 2025 | SwarmAgentic accepted at EMNLP 2025 (Main). |
| Mar 2025 | FedBiP accepted at CVPR 2025. |
| Dec 2024 | WebPilot accepted at AAAI 2025. |
| Oct 2024 | CL-CrossVQA accepted at WACV 2025. |
| Dec 2023 | FedDAT accepted at AAAI 2024. |
Selected Publications
- ICLR 2026: WebArbiter is a reasoning-first, principle-inducing WebPRM that formulates process reward modeling as text generation, producing structured justifications that conclude with a preference verdict to identify the action most conducive to task completion. Trained via reasoning distillation and reinforcement learning, it achieves SOTA on WebPRMBench and delivers substantial gains in reward-guided trajectory search on WebArena-Lite.
- EMNLP 2025 (Main): SwarmAgentic is a framework for fully automated agentic system generation that constructs agentic systems from scratch and jointly optimizes agent functionality and collaboration as interdependent components through language-driven exploration. It maintains a population of candidate systems and evolves them via feedback-guided updates inspired by Particle Swarm Optimization, enabling efficient exploration of the agentic system design space.
- LAW@NeurIPS 2025: GroundedPRM is a tree-guided and fidelity-aware framework for automatic process reward modeling that combines MCTS-guided path construction with tool-based step verification. It achieves SOTA performance using only 10% of the training data used by existing auto-labeled methods, demonstrating strong sample efficiency and reasoning quality.
- AAAI 2025: WebPilot is a multi-agent system with a dual optimization strategy that improves MCTS to better handle complex web environments. It uses Global Optimization for high-level planning and Local Optimization for executing subtasks, achieving SOTA performance on WebArena with a 93% relative increase in success rate.
Academic Service
- Conference Reviewer: ARR, NeurIPS, CVPR, AAAI, BMVC
- Teaching Assistant: Bachelor Seminar on Generative AI; Master Seminar on Knowledge Graphs, LMU Munich