Yao Zhang

Ph.D. Candidate @ LMU Munich

I am a final-year Ph.D. candidate in Computer Science at LMU Munich, advised by Prof. Volker Tresp. Previously, I received my M.Sc. (2021) and B.Sc. (2019) from LMU Munich.

My research focuses on agent reliability: reducing compounding failures in long-horizon tasks through structured search and process supervision. As agents take on increasingly autonomous roles, reliable multi-step execution remains one of the central challenges for real-world deployment. I develop methods that address this challenge, from single-agent execution to multi-agent system design. Earlier in my Ph.D., I also worked on multimodal learning, including parameter-efficient fine-tuning in federated and continual settings. Looking ahead, I am interested in agents that can be trusted to operate over extended periods with minimal human oversight. Feel free to reach out if you are interested in my work or would like to connect.

I am on the job market for Research Scientist / Applied Scientist positions in agentic systems.

Research Interests: Agent Reliability, Reward Modeling, Agentic Systems, Multimodal Learning

News

Jan 2026 WebArbiter accepted at ICLR 2026.
Nov 2025 AUVIC accepted at AAAI 2026.
Oct 2025 GroundedPRM presented at LAW@NeurIPS 2025.
Aug 2025 SwarmAgentic accepted at EMNLP 2025 (Main).
Mar 2025 FedBiP accepted at CVPR 2025.
Dec 2024 WebPilot accepted at AAAI 2025.
Oct 2024 CL-CrossVQA accepted at WACV 2025.
Dec 2023 FedDAT accepted at AAAI 2024.

Selected Publications

  1. WebArbiter: A Principle-Guided Reasoning Process Reward Model for Web Agents
    Yao Zhang, Shijie Tang, Zeyu Li, and 2 more authors
    ICLR 2026
    WebArbiter is a reasoning-first, principle-inducing WebPRM that formulates process reward modeling as text generation, producing structured justifications that conclude with a preference verdict to identify the action most conducive to task completion. Trained via reasoning distillation and reinforcement learning, it achieves SOTA on WebPRMBench and delivers substantial gains in reward-guided trajectory search on WebArena-Lite.
  2. SwarmAgentic: Towards Fully Automated Agentic System Generation via Swarm Intelligence
    Yao Zhang, Chenyang Lin, Shijie Tang, and 4 more authors
    EMNLP 2025 (Main)
    SwarmAgentic is a framework for fully automated agentic system generation that constructs agentic systems from scratch and jointly optimizes agent functionality and collaboration as interdependent components through language-driven exploration. It maintains a population of candidate systems and evolves them via feedback-guided updates inspired by Particle Swarm Optimization, enabling efficient exploration of the agentic system design space.
  3. GroundedPRM: Tree-Guided and Fidelity-Aware Process Reward Modeling for Step-Level Reasoning
    Yao Zhang, Yu Wu, Haowei Zhang, and 6 more authors
    LAW@NeurIPS 2025
    GroundedPRM is a tree-guided and fidelity-aware framework for automatic process reward modeling that combines MCTS-guided path construction with tool-based step verification. It achieves SOTA performance using only 10% of the training data required by existing auto-labeled methods, demonstrating strong sample efficiency and reasoning quality.
  4. WebPilot: A Versatile and Autonomous Multi-Agent System for Web Task Execution with Strategic Exploration
    Yao Zhang, Zijian Ma, Yunpu Ma, and 3 more authors
    AAAI 2025
    WebPilot is a multi-agent system with a dual optimization strategy that improves MCTS to better handle complex web environments. It uses Global Optimization for high-level planning and Local Optimization for executing subtasks, achieving SOTA performance on WebArena with a 93% relative increase in success rate.

Academic Service

  • Conference Reviewer: ARR, NeurIPS, CVPR, AAAI, BMVC
  • Teaching Assistant: Bachelor Seminar on Generative AI; Master Seminar on Knowledge Graphs, LMU Munich