Publications
2026
- WebArbiter: A Principle-Guided Reasoning Process Reward Model for Web Agents. Yao Zhang, Shijie Tang, Zeyu Li, and 2 more authors. ICLR 2026
Web agents hold great potential for automating complex computer tasks, yet their interactions involve long-horizon, sequential decision-making with irreversible actions. In such settings, outcome-based supervision is sparse and delayed, often rewarding incorrect trajectories and failing to support inference-time scaling. This motivates the use of Process Reward Models (WebPRMs) for web navigation, but existing approaches remain limited: scalar WebPRMs collapse progress into coarse, weakly grounded signals, while checklist-based WebPRMs rely on brittle template matching that fails under layout or semantic changes and often mislabels superficially correct actions as successful, providing little insight or interpretability. To address these challenges, we introduce WebArbiter, a reasoning-first, principle-inducing WebPRM that formulates reward modeling as text generation, producing structured justifications that conclude with a preference verdict and identify the action most conducive to task completion under the current context. Training follows a two-stage pipeline: reasoning distillation equips the model with coherent principle-guided reasoning, and reinforcement learning corrects teacher biases by directly aligning verdicts with correctness, enabling stronger generalization. To support systematic evaluation, we release WebPRMBench, a comprehensive benchmark spanning four diverse web environments with rich tasks and high-quality preference annotations. On WebPRMBench, WebArbiter-7B outperforms the strongest baseline, GPT-5, by 9.1 points. In reward-guided trajectory search on WebArena-Lite, it surpasses the best prior WebPRM by up to 6.4 points, underscoring its robustness and practical value in complex web tasks.
@misc{zhang2026webarbiterprincipleguidedreasoningprocess,
  title = {WebArbiter: A Principle-Guided Reasoning Process Reward Model for Web Agents},
  author = {Zhang, Yao and Tang, Shijie and Li, Zeyu and Han, Zhen and Tresp, Volker},
  year = {2026},
  eprint = {2601.21872},
  archivePrefix = {arXiv},
  primaryClass = {cs.AI},
  url = {https://arxiv.org/abs/2601.21872},
}
- AUVIC: Adversarial Unlearning of Visual Concepts for Multi-modal Large Language Models. Haokun Chen, Jianing Li, Yao Zhang, and 4 more authors. AAAI 2026
Multimodal Large Language Models (MLLMs) achieve impressive performance once optimized on massive datasets. Such datasets often contain sensitive or copyrighted content, raising significant data privacy concerns. Regulatory frameworks mandating the "right to be forgotten" drive the need for machine unlearning. This technique allows for the removal of target data without resource-consuming retraining. However, while well-studied for text, visual concept unlearning in MLLMs remains underexplored. A primary challenge is precisely removing a target visual concept without disrupting model performance on related entities. To address this, we introduce AUVIC, a novel visual concept unlearning framework for MLLMs. AUVIC applies adversarial perturbations to enable precise forgetting. This approach effectively isolates the target concept while avoiding unintended effects on similar entities. To evaluate our method, we construct VCUBench. It is the first benchmark designed to assess visual concept unlearning in group contexts. Experimental results demonstrate that AUVIC achieves state-of-the-art target forgetting rates while incurring minimal performance degradation on non-target concepts.
@misc{chen2025auvicadversarialunlearningvisual,
  title = {AUVIC: Adversarial Unlearning of Visual Concepts for Multi-modal Large Language Models},
  author = {Chen, Haokun and Li, Jianing and Zhang, Yao and Bi, Jinhe and Xia, Yan and Gu, Jindong and Tresp, Volker},
  year = {2026},
  eprint = {2511.11299},
  archivePrefix = {arXiv},
  primaryClass = {cs.CV},
  url = {https://arxiv.org/abs/2511.11299},
}
2025
- SwarmAgentic: Towards Fully Automated Agentic System Generation via Swarm Intelligence. Yao Zhang, Chenyang Lin, Shijie Tang, and 4 more authors. EMNLP 2025 (Main)
The rapid progress of Large Language Models has advanced agentic systems in decision-making, coordination, and task execution. Yet, existing agentic system generation frameworks lack full autonomy, missing from-scratch agent generation, self-optimizing agent functionality, and collaboration, which limits adaptability and scalability. We propose SwarmAgentic, a framework for fully automated agentic system generation that constructs agentic systems from scratch and jointly optimizes agent functionality and collaboration as interdependent components through language-driven exploration. To enable efficient search over system-level structures, SwarmAgentic maintains a population of candidate systems and evolves them via feedback-guided updates, drawing inspiration from Particle Swarm Optimization (PSO). We evaluate our method on six real-world, open-ended, and exploratory tasks involving high-level planning, system-level coordination, and creative reasoning. Given only a task description and an objective function, SwarmAgentic outperforms all baselines, achieving a +261.8% relative improvement over ADAS on the TravelPlanner benchmark, highlighting the effectiveness of full automation in structurally unconstrained tasks. This framework marks a significant step toward scalable and autonomous agentic system design, bridging swarm intelligence with fully automated multi-agent system generation.
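The PSO-inspired, feedback-guided evolution of candidate systems can be illustrated with a toy numeric stand-in. Everything here is an assumption for illustration: plain numbers replace candidate system descriptions, a numeric objective replaces language-driven feedback, and the update rule is classic PSO rather than the paper's actual procedure.

```python
import random

def evolve_population(candidates, fitness, steps=20, seed=0):
    """Toy PSO-style loop: each candidate drifts toward its personal best
    and the population's global best, and the global best only improves."""
    rng = random.Random(seed)
    personal_best = list(candidates)
    global_best = max(candidates, key=fitness)
    for _ in range(steps):
        for i, x in enumerate(candidates):
            # "Velocity": random attraction toward personal and global bests.
            step = (rng.random() * (personal_best[i] - x)
                    + rng.random() * (global_best - x))
            candidates[i] = x + step
            if fitness(candidates[i]) > fitness(personal_best[i]):
                personal_best[i] = candidates[i]
        global_best = max(personal_best + [global_best], key=fitness)
    return global_best
```

For example, maximizing `fitness(x) = -(x - 3)**2` from starting points `[0.0, 10.0, -5.0]` returns a candidate at least as good as the best initial one, since the global best is monotonically non-decreasing.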
@misc{zhang2025swarmagenticfullyautomatedagentic,
  title = {SwarmAgentic: Towards Fully Automated Agentic System Generation via Swarm Intelligence},
  author = {Zhang, Yao and Lin, Chenyang and Tang, Shijie and Chen, Haokun and Zhou, Shijie and Ma, Yunpu and Tresp, Volker},
  year = {2025},
  eprint = {2506.15672},
  archivePrefix = {arXiv},
  primaryClass = {cs.AI},
  url = {https://arxiv.org/abs/2506.15672},
}
- GroundedPRM: Tree-Guided and Fidelity-Aware Process Reward Modeling for Step-Level Reasoning. Yao Zhang, Yu Wu, Haowei Zhang, and 6 more authors. LAW@NeurIPS 2025
Process Reward Models (PRMs) aim to improve multi-step reasoning in Large Language Models (LLMs) by supervising intermediate steps and identifying errors. However, building effective PRMs remains challenging due to the lack of scalable, high-quality annotations. Existing approaches rely on costly human labeling, LLM-based self-evaluation that is prone to hallucination, or Monte Carlo (MC) estimation, which infers step quality solely from rollout outcomes and often introduces noisy, misaligned supervision due to credit misattribution. These issues result in three core limitations: noisy rewards, low factual fidelity, and misalignment with step-level reasoning objectives. To address these challenges, we introduce GroundedPRM, a tree-guided and fidelity-aware framework for automatic process supervision. To reduce reward noise and enable fine-grained credit assignment, we construct structured reasoning paths via Monte Carlo Tree Search (MCTS). To eliminate hallucinated supervision, we validate each intermediate step using an external tool, providing execution-grounded correctness signals. To combine both step-level validation and global outcome assessment, we design a hybrid reward aggregation mechanism that fuses tool-based verification with MCTS-derived feedback. Finally, we format the reward signal into a rationale-enhanced, generative structure to promote interpretability and compatibility with instruction-tuned LLMs. GroundedPRM is trained on only 40K automatically labeled samples, amounting to just 10% of the data used by the best-performing PRM trained with auto-labeled supervision. Nevertheless, it achieves up to a 26% relative improvement in average performance on ProcessBench. When used for reward-guided greedy search, GroundedPRM outperforms even PRMs trained with human-labeled supervision, offering a scalable and verifiable path toward high-quality process-level reasoning.
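The hybrid reward aggregation described above can be sketched as a weighted fusion of the two signals. The linear blend, the function name, and the weight `alpha = 0.5` are illustrative assumptions; the abstract does not specify the paper's exact fusion rule.

```python
def hybrid_step_reward(tool_verified, mcts_value, alpha=0.5):
    """Blend an execution-grounded correctness bit from an external tool
    with an MCTS-derived value estimate in [0, 1]."""
    if not 0.0 <= mcts_value <= 1.0:
        raise ValueError("mcts_value must lie in [0, 1]")
    return alpha * (1.0 if tool_verified else 0.0) + (1.0 - alpha) * mcts_value
```

With the default weight, a tool-verified step with MCTS value 0.8 scores 0.9, while the same step failing tool verification drops to 0.4, so execution-grounded checks dominate noisy rollout feedback.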
@misc{zhang2025groundedprmtreeguidedfidelityawareprocess,
  title = {GroundedPRM: Tree-Guided and Fidelity-Aware Process Reward Modeling for Step-Level Reasoning},
  author = {Zhang, Yao and Wu, Yu and Zhang, Haowei and Li, Weiguo and Chen, Haokun and Wu, Jingpei and Li, Guohao and Han, Zhen and Tresp, Volker},
  year = {2025},
  eprint = {2510.14942},
  archivePrefix = {arXiv},
  primaryClass = {cs.AI},
  url = {https://arxiv.org/abs/2510.14942},
}
- WebPilot: A Versatile and Autonomous Multi-Agent System for Web Task Execution with Strategic Exploration. Yao Zhang, Zijian Ma, Yunpu Ma, and 3 more authors. AAAI 2025
LLM-based autonomous agents often fail to execute complex web tasks that require dynamic interaction due to the inherent uncertainty and complexity of these environments. Existing LLM-based web agents typically rely on rigid, expert-designed policies specific to certain states and actions, which lack the flexibility and generalizability needed to adapt to unseen tasks. In contrast, humans excel by exploring unknowns, continuously adapting strategies, and resolving ambiguities through exploration. To emulate human-like adaptability, web agents need strategic exploration and complex decision-making. Monte Carlo Tree Search (MCTS) is well-suited for this, but classical MCTS struggles with vast action spaces, unpredictable state transitions, and incomplete information in web tasks. In light of this, we develop WebPilot, a multi-agent system with a dual optimization strategy that improves MCTS to better handle complex web environments. Specifically, the Global Optimization phase involves generating a high-level plan by breaking down tasks into manageable subtasks and continuously refining this plan, thereby focusing the search process and mitigating the challenges posed by vast action spaces in classical MCTS. Subsequently, the Local Optimization phase executes each subtask using a tailored MCTS designed for complex environments, effectively addressing uncertainties and managing incomplete information. Experimental results on WebArena and MiniWoB++ demonstrate the effectiveness of WebPilot. Notably, on WebArena, WebPilot achieves SOTA performance with GPT-4, with a 93% relative increase in success rate over the concurrent tree search-based method. WebPilot marks a significant advancement in general autonomous agent capabilities, paving the way for more advanced and reliable decision-making in practical environments.
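The tailored MCTS above builds on the standard UCT selection rule, sketched below. The statistics format, function name, and exploration constant `c = 1.4` are illustrative assumptions, not WebPilot's actual implementation.

```python
import math

def uct_select(children):
    """UCT: choose the action maximizing mean value plus an exploration
    bonus. `children` maps action -> (total_value, visit_count)."""
    c = 1.4
    total_visits = sum(n for _, n in children.values()) or 1
    def score(stats):
        value, visits = stats
        if visits == 0:
            return float("inf")   # always expand unvisited actions first
        return value / visits + c * math.sqrt(math.log(total_visits) / visits)
    return max(children, key=lambda a: score(children[a]))
```

Unvisited actions are tried first; among visited ones, the bonus term keeps rarely tried actions competitive, which matters in the vast, uncertain action spaces the abstract describes.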
@misc{zhang2024webpilotversatileautonomousmultiagent,
  title = {WebPilot: A Versatile and Autonomous Multi-Agent System for Web Task Execution with Strategic Exploration},
  author = {Zhang, Yao and Ma, Zijian and Ma, Yunpu and Han, Zhen and Wu, Yu and Tresp, Volker},
  year = {2025},
  eprint = {2408.15978},
  archivePrefix = {arXiv},
  primaryClass = {cs.AI},
  url = {https://arxiv.org/abs/2408.15978},
}
- Agentic Neural Networks: Self-Evolving Multi-Agent Systems via Textual Backpropagation. Xiaowen Ma, Chenyang Lin, Yao Zhang, and 2 more authors. Under Review, 2025
Leveraging multiple Large Language Models (LLMs) has proven effective for addressing complex, high-dimensional tasks, but current approaches often rely on static, manually engineered multi-agent configurations. To overcome these constraints, we present the Agentic Neural Network (ANN), a framework that conceptualizes multi-agent collaboration as a layered neural network architecture. In this design, each agent operates as a node, and each layer forms a cooperative "team" focused on a specific subtask. Agentic Neural Network follows a two-phase optimization strategy: (1) Forward Phase – Drawing inspiration from neural network forward passes, tasks are dynamically decomposed into subtasks, and cooperative agent teams with suitable aggregation methods are constructed layer by layer. (2) Backward Phase – Mirroring backpropagation, we refine both global and local collaboration through iterative feedback, allowing agents to self-evolve their roles, prompts, and coordination. This neuro-symbolic approach enables ANN to create new or specialized agent teams post-training, delivering notable gains in accuracy and adaptability. Across four benchmark datasets, ANN surpasses leading multi-agent baselines under the same configurations, showing consistent performance improvements. Our findings indicate that ANN provides a scalable, data-driven framework for multi-agent systems, combining the collaborative capabilities of LLMs with the efficiency and flexibility of neural network principles.
@misc{ma2025agenticneuralnetworksselfevolving,
  title = {Agentic Neural Networks: Self-Evolving Multi-Agent Systems via Textual Backpropagation},
  author = {Ma, Xiaowen and Lin, Chenyang and Zhang, Yao and Tresp, Volker and Ma, Yunpu},
  year = {2025},
  eprint = {2506.09046},
  archivePrefix = {arXiv},
  primaryClass = {cs.LG},
  url = {https://arxiv.org/abs/2506.09046},
}
- CL-CrossVQA: A Continual Learning Benchmark for Cross-Domain Visual Question Answering. Yao Zhang, Haokun Chen, Ahmed Frikha, and 5 more authors. WACV 2025
Visual Question Answering (VQA) is a multi-discipline research task. To produce the right answer, it requires an understanding of the visual content of images, the natural language questions, as well as commonsense reasoning over the information contained in the image and world knowledge. Recently, large-scale Vision-and-Language Pre-trained Models (VLPMs) have been the mainstream approach to VQA tasks due to their superior performance. The standard practice is to fine-tune large-scale VLPMs pre-trained on huge general-domain datasets using the domain-specific VQA datasets. However, in reality, the application domain can change over time, necessitating VLPMs to continually learn and adapt to new domains without forgetting previously acquired knowledge. Most existing continual learning (CL) research concentrates on unimodal tasks, whereas a more practical application scenario, i.e., CL on cross-domain VQA, has not been studied. Motivated by this, we introduce CL-CrossVQA, a rigorous Continual Learning benchmark for Cross-domain Visual Question Answering, through which we conduct extensive experiments on 4 VLPMs, 4 CL approaches, and 5 VQA datasets from different domains. In addition, by probing the forgetting phenomenon of the intermediate layers, we provide insights into how model architecture affects CL performance, why CL approaches can help mitigate forgetting in VLPMs to some extent, and how to design CL approaches suitable for VLPMs in this challenging continual learning environment.
@misc{zhang2022clcrossvqacontinuallearningbenchmark,
  title = {CL-CrossVQA: A Continual Learning Benchmark for Cross-Domain Visual Question Answering},
  author = {Zhang, Yao and Chen, Haokun and Frikha, Ahmed and Yang, Yezi and Krompass, Denis and Zhang, Gengyuan and Gu, Jindong and Tresp, Volker},
  year = {2025},
  eprint = {2211.10567},
  archivePrefix = {arXiv},
  primaryClass = {cs.CV},
  url = {https://arxiv.org/abs/2211.10567},
}
- FedNano: Toward Lightweight Federated Tuning for Pretrained Multimodal Large Language Models. Yao Zhang, Hewei Gao, Haokun Chen, and 3 more authors. Under Review, 2025
Multimodal Large Language Models (MLLMs) excel in tasks like multimodal reasoning and cross-modal retrieval but face deployment challenges in real-world scenarios due to distributed multimodal data and strict privacy requirements. Federated Learning (FL) offers a solution by enabling collaborative model training without centralizing data. However, realizing FL for MLLMs presents significant challenges, including high computational demands, limited client capacity, substantial communication costs, and heterogeneous client data. Existing FL methods assume client-side deployment of full models, an assumption that breaks down for large-scale MLLMs due to their massive size and communication demands. To address these limitations, we propose FedNano, the first FL framework that centralizes the LLM on the server while introducing NanoEdge, a lightweight module for client-specific adaptation. NanoEdge employs modality-specific encoders, connectors, and trainable NanoAdapters with low-rank adaptation. This design eliminates the need to deploy the LLM on clients, reducing client-side storage by 95% and limiting communication overhead to only 0.01% of the model parameters. By transmitting only compact NanoAdapter updates, FedNano handles heterogeneous client data and resource constraints while preserving privacy. Experiments demonstrate that FedNano outperforms prior FL baselines, bridging the gap between MLLM scale and FL feasibility, and enabling scalable, decentralized multimodal AI systems.
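The low-rank-adaptation idea behind the NanoAdapters can be sketched with the familiar LoRA pattern: a frozen weight plus a trainable low-rank update, where only the small factors would be communicated. The dimensions, rank, names, and initialization here are illustrative assumptions, not FedNano's actual configuration.

```python
import numpy as np

def nano_adapter(d_in=512, d_out=512, rank=4, seed=0):
    """LoRA-style adapter: frozen base weight W plus a trainable
    low-rank update B @ A."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((d_out, d_in))    # frozen, never communicated
    A = 0.01 * rng.standard_normal((rank, d_in))
    B = np.zeros((d_out, rank))               # zero init: adapter starts as a no-op
    return W, A, B

def effective_weight(W, A, B):
    """Weight applied at inference: base plus low-rank update."""
    return W + B @ A

W, A, B = nano_adapter()
# Only A and B would be sent in federated rounds.
ratio = (A.size + B.size) / W.size
```

With these toy sizes the communicated factors are about 1.6% of the base weight's parameters; the paper's 0.01% figure reflects the full model scale, not this sketch.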
@misc{zhang2025fednanolightweightfederatedtuning,
  title = {FedNano: Toward Lightweight Federated Tuning for Pretrained Multimodal Large Language Models},
  author = {Zhang, Yao and Gao, Hewei and Chen, Haokun and Li, Weiguo and Ma, Yunpu and Tresp, Volker},
  year = {2025},
  eprint = {2506.14824},
  archivePrefix = {arXiv},
  primaryClass = {cs.LG},
  url = {https://arxiv.org/abs/2506.14824},
}
- FedBiP: Heterogeneous One-Shot Federated Learning with Personalized Latent Diffusion Models. Haokun Chen, Hang Li, Yao Zhang, and 7 more authors. CVPR 2025
One-Shot Federated Learning (OSFL), a special decentralized machine learning paradigm, has recently gained significant attention. OSFL requires only a single round of client data or model upload, which reduces communication costs and mitigates privacy threats compared to traditional FL. Despite these promising prospects, existing methods face challenges due to client data heterogeneity and limited data quantity when applied to real-world OSFL systems. Recently, Latent Diffusion Models (LDM) have shown remarkable advancements in synthesizing high-quality images through pretraining on large-scale datasets, thereby presenting a potential solution to overcome these issues. However, directly applying pretrained LDM to heterogeneous OSFL results in significant distribution shifts in synthetic data, leading to performance degradation in classification models trained on such data. This issue is particularly pronounced in rare domains, such as medical imaging, which are underrepresented in LDM's pretraining data. To address this challenge, we propose Federated Bi-Level Personalization (FedBiP), which personalizes the pretrained LDM at both instance-level and concept-level. Hereby, FedBiP synthesizes images following the client's local data distribution without violating privacy regulations. FedBiP is also the first approach to simultaneously address feature space heterogeneity and client data scarcity in OSFL. Our method is validated through extensive experiments on three OSFL benchmarks with feature space heterogeneity, as well as on challenging medical and satellite image datasets with label heterogeneity. The results demonstrate the effectiveness of FedBiP, which substantially outperforms other OSFL methods.
@misc{chen2025fedbipheterogeneousoneshotfederated,
  title = {FedBiP: Heterogeneous One-Shot Federated Learning with Personalized Latent Diffusion Models},
  author = {Chen, Haokun and Li, Hang and Zhang, Yao and Bi, Jinhe and Zhang, Gengyuan and Zhang, Yueqi and Torr, Philip and Gu, Jindong and Krompass, Denis and Tresp, Volker},
  year = {2025},
  eprint = {2410.04810},
  archivePrefix = {arXiv},
  primaryClass = {cs.LG},
  url = {https://arxiv.org/abs/2410.04810},
}
- Memory Helps, but Confabulation Misleads: Understanding Streaming Events in Videos with MLLMs. Gengyuan Zhang, Mingcong Ding, Tong Liu, Yao Zhang, and 1 more author. World Models@ICLR 2025
Multimodal large language models (MLLMs) have demonstrated strong performance in understanding videos holistically, yet their ability to process streaming videos – videos are treated as a sequence of visual events – remains underexplored. Intuitively, leveraging past events as memory can enrich contextual and temporal understanding of the current event. In this paper, we show that leveraging memories as contexts helps MLLMs better understand video events. However, because such memories rely on predictions of preceding events, they may contain misinformation, leading to confabulation and degraded performance. To address this, we propose a confabulation-aware memory modification method that mitigates confabulated memory for memory-enhanced event understanding.
@misc{zhang2025memoryhelpsconfabulationmisleads,
  title = {Memory Helps, but Confabulation Misleads: Understanding Streaming Events in Videos with MLLMs},
  author = {Zhang, Gengyuan and Ding, Mingcong and Liu, Tong and Zhang, Yao and Tresp, Volker},
  year = {2025},
  eprint = {2502.15457},
  archivePrefix = {arXiv},
  primaryClass = {cs.CV},
  url = {https://arxiv.org/abs/2502.15457},
}
- Does Machine Unlearning Truly Remove Knowledge? Haokun Chen, Yueqi Zhang, Yuan Bi, Yao Zhang, and 8 more authors. LockLLM@NeurIPS 2025
In recent years, Large Language Models (LLMs) have achieved remarkable advancements, drawing significant attention from the research community. Their capabilities are largely attributed to large-scale architectures, which require extensive training on massive datasets. However, such datasets often contain sensitive or copyrighted content sourced from the public internet, raising concerns about data privacy and ownership. Regulatory frameworks, such as the General Data Protection Regulation (GDPR), grant individuals the right to request the removal of such sensitive information. This has motivated the development of machine unlearning algorithms that aim to remove specific knowledge from models without the need for costly retraining. Despite these advancements, evaluating the efficacy of unlearning algorithms remains a challenge due to the inherent complexity and generative nature of LLMs. In this work, we introduce a comprehensive auditing framework for unlearning evaluation, comprising three benchmark datasets, six unlearning algorithms, and five prompt-based auditing methods. By using various auditing algorithms, we evaluate the effectiveness and robustness of different unlearning strategies. To explore alternatives beyond prompt-based auditing, we propose a novel technique that leverages intermediate activation perturbations, addressing the limitations of auditing methods that rely solely on model inputs and outputs.
@misc{chen2025doesmachineunlearningtruly,
  title = {Does Machine Unlearning Truly Remove Knowledge?},
  author = {Chen, Haokun and Zhang, Yueqi and Bi, Yuan and Zhang, Yao and Liu, Tong and Bi, Jinhe and Lan, Jian and Gu, Jindong and Grosser, Claudia and Krompass, Denis and Navab, Nassir and Tresp, Volker},
  year = {2025},
  eprint = {2505.23270},
  archivePrefix = {arXiv},
  primaryClass = {cs.LG},
  url = {https://arxiv.org/abs/2505.23270},
}
2024
- FedDAT: An Approach for Foundation Model Finetuning in Multi-Modal Heterogeneous Federated Learning. Haokun Chen, Yao Zhang, Denis Krompass, and 2 more authors. AAAI 2024
Recently, foundation models have exhibited remarkable advancements in multi-modal learning. These models, equipped with millions (or billions) of parameters, typically require a substantial amount of data for finetuning. However, collecting and centralizing training data from diverse sectors becomes challenging due to distinct privacy regulations. Federated Learning (FL) emerges as a promising solution, enabling multiple clients to collaboratively train neural networks without centralizing their local data. To alleviate client computation burdens and communication overheads, previous works have adapted Parameter-efficient Finetuning (PEFT) methods for FL. Hereby, only a small fraction of the model parameters are optimized and communicated during federated communications. Nevertheless, most previous works have focused on a single modality and neglected one common phenomenon, i.e., the presence of data heterogeneity across the clients. Therefore, in this work, we propose a finetuning framework tailored to heterogeneous multi-modal FL, called Federated Dual-Adapter Teacher (FedDAT). Specifically, our approach leverages a Dual-Adapter Teacher (DAT) to address data heterogeneity by regularizing the client local updates and applying Mutual Knowledge Distillation (MKD) for an efficient knowledge transfer. FedDAT is the first approach that enables an efficient distributed finetuning of foundation models for a variety of heterogeneous Vision-Language tasks. To demonstrate its effectiveness, we conduct extensive experiments on four multi-modality FL benchmarks with different types of data heterogeneity, where FedDAT substantially outperforms the existing centralized PEFT methods adapted for FL.
@misc{chen2023feddatapproachfoundationmodel,
  title = {FedDAT: An Approach for Foundation Model Finetuning in Multi-Modal Heterogeneous Federated Learning},
  author = {Chen, Haokun and Zhang, Yao and Krompass, Denis and Gu, Jindong and Tresp, Volker},
  year = {2024},
  eprint = {2308.12305},
  archivePrefix = {arXiv},
  primaryClass = {cs.LG},
  url = {https://arxiv.org/abs/2308.12305},
}
2023
- Adaptive Multi-Resolution Attention with Linear Complexity. Yao Zhang, Yunpu Ma, Thomas Seidl, and 1 more author. IJCNN 2023 (Oral)
Transformers have improved the state-of-the-art across numerous tasks in sequence modeling. Besides the quadratic computational and memory complexity w.r.t. the sequence length, the self-attention mechanism only processes information at the same scale, i.e., all attention heads are in the same resolution, resulting in the limited power of the Transformer. To remedy this, we propose a novel and efficient structure named Adaptive Multi-Resolution Attention (AdaMRA for short), which scales linearly to sequence length in terms of time and space. Specifically, we leverage a multi-resolution multi-head attention mechanism, enabling attention heads to capture long-range contextual information in a coarse-to-fine fashion. Moreover, to capture the potential relations between query representation and clues of different attention granularities, we leave the decision of which resolution of attention to use to the query, which further improves the model's capacity compared to the vanilla Transformer. In an effort to reduce complexity, we adopt kernel attention without degrading the performance. Extensive experiments on several benchmarks demonstrate the effectiveness and efficiency of our model by achieving a state-of-the-art performance-efficiency-memory trade-off.
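The kernel-attention trick AdaMRA adopts can be sketched as follows: with a positive feature map phi, softmax attention is replaced by phi(Q)(phi(K)^T V), normalized per query, which costs O(n) in sequence length instead of O(n^2). The feature map phi(x) = elu(x) + 1 is a common choice in linear-attention work and is assumed here for illustration; it is not necessarily the paper's exact kernel.

```python
import numpy as np

def linear_kernel_attention(Q, K, V):
    """Kernel attention with a positive feature map phi: computes
    phi(Q) (phi(K)^T V) / (phi(Q) sum_k phi(K)), linear in length n."""
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1, always > 0
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                # (d, d_v), computed once: O(n * d * d_v)
    Z = Qp @ Kp.sum(axis=0)      # per-query normalizer: O(n * d)
    return (Qp @ KV) / Z[:, None]
```

Because each output row is a convex combination of the rows of V, a constant V passes through unchanged, which is a quick sanity check that the normalization is right.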
@misc{zhang2021adaptivemultiresolutionattentionlinear,
  title = {Adaptive Multi-Resolution Attention with Linear Complexity},
  author = {Zhang, Yao and Ma, Yunpu and Seidl, Thomas and Tresp, Volker},
  year = {2023},
  eprint = {2108.04962},
  archivePrefix = {arXiv},
  primaryClass = {cs.LG},
  url = {https://arxiv.org/abs/2108.04962},
}
- ECOLA: Enhanced Temporal Knowledge Embeddings with Contextualized Language Representations. Zhen Han, Ruotong Liao, Jindong Gu, Yao Zhang, and 5 more authors. Findings of ACL 2023
Since conventional knowledge embedding models cannot take full advantage of the abundant textual information, there have been extensive research efforts in enhancing knowledge embedding using texts. However, existing enhancement approaches cannot apply to temporal knowledge graphs (tKGs), which contain time-dependent event knowledge with complex temporal dynamics. Specifically, existing enhancement approaches often assume knowledge embedding is time-independent. In contrast, the entity embedding in tKG models usually evolves, which poses the challenge of aligning temporally relevant texts with entities. To this end, we propose to study enhancing temporal knowledge embedding with textual data in this paper. As an approach to this task, we propose Enhanced Temporal Knowledge Embeddings with Contextualized Language Representations (ECOLA), which takes the temporal aspect into account and injects textual information into temporal knowledge embedding. To evaluate ECOLA, we introduce three new datasets for training and evaluating ECOLA. Extensive experiments show that ECOLA significantly enhances temporal KG embedding models with up to 287% relative improvements regarding Hits@1 on the link prediction task.
@misc{han2023ecolaenhancedtemporalknowledge,
  title = {ECOLA: Enhanced Temporal Knowledge Embeddings with Contextualized Language Representations},
  author = {Han, Zhen and Liao, Ruotong and Gu, Jindong and Zhang, Yao and Ding, Zifeng and Gu, Yujia and Köppl, Heinz and Schütze, Hinrich and Tresp, Volker},
  year = {2023},
  eprint = {2203.09590},
  archivePrefix = {arXiv},
  primaryClass = {cs.CL},
  url = {https://arxiv.org/abs/2203.09590},
}
2021
- Argument Mining Driven Analysis of Peer-Reviews. Michael Fromm, Evgeniy Faerman, Max Berrendorf, Siddharth Bhargava, Ruoxia Qi, Yao Zhang, and 4 more authors. AAAI 2021
Peer reviewing is a central process in modern research and essential for ensuring high quality and reliability of published work. At the same time, it is a time-consuming process and increasing interest in emerging fields often results in a high review workload, especially for senior researchers in this area. How to cope with this problem is an open question and it is vividly discussed across all major conferences. In this work, we propose an Argument Mining based approach for the assistance of editors, meta-reviewers, and reviewers. We demonstrate that the decision process in the field of scientific publications is driven by arguments and automatic argument identification is helpful in various use-cases. One of our findings is that arguments used in the peer-review process differ from arguments in other domains making the transfer of pre-trained models difficult. Therefore, we provide the community with a new peer-review dataset from different computer science conferences with annotated arguments. In our extensive empirical evaluation, we show that Argument Mining can be used to efficiently extract the most relevant parts from reviews, which are paramount for the publication decision.
@misc{fromm2020argumentminingdrivenanalysis,
  title = {Argument Mining Driven Analysis of Peer-Reviews},
  author = {Fromm, Michael and Faerman, Evgeniy and Berrendorf, Max and Bhargava, Siddharth and Qi, Ruoxia and Zhang, Yao and Dennert, Lukas and Selle, Sophia and Mao, Yang and Seidl, Thomas},
  year = {2021},
  eprint = {2012.07743},
  archivePrefix = {arXiv},
  primaryClass = {cs.CY},
  url = {https://arxiv.org/abs/2012.07743},
  doi = {10.5281/zenodo.4314390},
}
2020
- KNNAC: An Efficient k Nearest Neighbor Based Clustering with Active Core Detection. Yao Zhang, Yifeng Lu, and Thomas Seidl. IIWAS 2020
KNNAC introduces an efficient k-nearest neighbor based clustering algorithm with active core detection. The method identifies cluster cores through active learning, improving clustering efficiency and accuracy.
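The density intuition behind core detection can be sketched as follows, assuming a fixed quantile cutoff on k-nearest-neighbor distances in place of KNNAC's active-learning selection; the function name, `k`, and the threshold are all illustrative.

```python
import numpy as np

def knn_core_points(X, k=3, quantile=0.5):
    """Mark points in dense regions as cluster cores: a point is a core
    if its distance to its k-th nearest neighbor is below a quantile of
    all such distances."""
    # Pairwise Euclidean distances, (n, n).
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    kth = np.sort(D, axis=1)[:, k]   # column 0 is the self-distance 0
    return kth <= np.quantile(kth, quantile)
```

On data with two tight clusters and a far outlier, the cluster members come out as cores while the outlier, whose k-th neighbor is far away, does not.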
@inproceedings{zhang2020knnacefficientknearest,
  title = {KNNAC: An Efficient k Nearest Neighbor Based Clustering with Active Core Detection},
  author = {Zhang, Yao and Lu, Yifeng and Seidl, Thomas},
  booktitle = {Proceedings of the 22nd International Conference on Information Integration and Web-Based Applications \& Services (IIWAS)},
  year = {2020}
}
- k-Nearest Neighbor Based Clustering with Shape Alternation Adaptivity. Yifeng Lu, Yao Zhang, Florian Richter, and 1 more author. IJCNN 2020 (Oral)
We propose a k-nearest neighbor based clustering method with shape alternation adaptivity. The approach adapts to varying cluster shapes and densities, improving clustering performance on diverse datasets.
@inproceedings{lu2020knearestneighborbasedclustering,
  title = {k-Nearest Neighbor Based Clustering with Shape Alternation Adaptivity},
  author = {Lu, Yifeng and Zhang, Yao and Richter, Florian and Seidl, Thomas},
  booktitle = {Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN)},
  year = {2020}
}