MASCOT: Towards Multi-Agent Socio-Collaborative Companion Systems

Abstract

Social interaction significantly impacts well-being, mental health, and cognition. Yet an estimated 1 in 6 people worldwide lack the social interactions they need, creating crises of social isolation and loneliness. As large language models (LLMs) become more deeply integrated into human lives, their role is shifting from passive tools to active socio-collaborative companions in affective and collaborative settings.

We propose MASCOT, a generalizable multi-agent framework for developing multi-perspective socio-collaborative companions. Unlike previous multi-agent systems optimized solely for task efficiency, MASCOT targets user-agent interaction quality, explicitly balancing individual agent persona consistency with global discourse dynamics. We introduce an efficient bi-level optimization strategy: (1) a Reinforcement-Learning-from-AI-Feedback (RLAIF) pipeline that fine-tunes individual agents for strict Persona Fidelity, and (2) a meta-agent policy guided by group-level rewards to ensure Interaction Synergy.

Extensive experiments demonstrate that MASCOT achieves significant improvements: +14.1 in Persona Consistency and +10.6 in Social Contribution compared to baseline approaches.

Motivation

Moving beyond dyadic interactions to multi-agent socio-collaborative companion systems. While single-agent support systems (left) can result in limited insights and echo chambers, multi-agent systems (right) provide diverse, balanced perspectives and foster a shared sense of community through agents with distinct roles.

Challenges in Multi-Agent Social Companions

1. 🎭 Persona Fidelity

At the individual agent level, models struggle to maintain stable, unique character traits. Agents often succumb to 🫥 persona collapse, regressing to 🔁 generic, homogenized assistant behaviors that lack the unique perspectives their specific roles require.

2. 🤝 Interaction Synergy

At the collective level, uncoordinated agents fail to generate complementary, non-redundant contributions. This manifests as Social Sycophancy (the 👍 "Yes-Man" Bias), where agents prioritize agreeableness, creating 🔄 echo chambers rather than constructive conversations.

3. 🧭 Long-Horizon State Consistency

Agents often struggle to track the logical progression of a conversation. They treat each turn as a local optimization problem and lose sight of the conversation's 📉 logical arc, for example by 🔂 re-litigating settled topics.

MASCOT Framework

MASCOT is a multi-agent framework that optimizes the generation process through a two-stage training pipeline: (1) Persona-Aware Behavioral Alignment and (2) Collaborative Dialogue Optimization.

MASCOT optimizes multi-agent socio-collaborative companions through bi-level optimization: individual persona alignment and group dynamics coordination.

Two-Stage Pipeline

Stage 1: Persona-Aware Behavioral Alignment

We move beyond standard few-shot prompting by explicitly aligning a base policy with persona constraints via Reinforcement Learning from AI Feedback (RLAIF).

Persona Reward Modeling

Each candidate response is evaluated by an LLM judge according to evaluation criteria including persona adherence and contextual appropriateness. We construct preference pairs and train a persona-aware reward model.
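As a minimal sketch of the step above, the snippet below turns judge scores into preference pairs and evaluates a pairwise loss for a reward model. The score scale, the `min_margin` tie filter, and the Bradley-Terry loss are assumptions chosen for illustration (Bradley-Terry is a common choice for reward modeling); the source does not state which objective MASCOT uses.

```python
import math
from itertools import combinations


def build_preference_pairs(candidates, judge_scores, min_margin=1.0):
    """Return (chosen, rejected) pairs whose judge scores differ by >= min_margin.

    `min_margin` is a hypothetical filter, not a setting from the source.
    """
    pairs = []
    for i, j in combinations(range(len(candidates)), 2):
        gap = judge_scores[i] - judge_scores[j]
        if abs(gap) < min_margin:
            continue  # skip near-ties: the judge signal is too weak to trust
        chosen, rejected = (i, j) if gap > 0 else (j, i)
        pairs.append((candidates[chosen], candidates[rejected]))
    return pairs


def bradley_terry_loss(reward_chosen, reward_rejected):
    """Pairwise loss -log(sigmoid(r_chosen - r_rejected)), computed stably."""
    diff = reward_chosen - reward_rejected
    # -log(sigmoid(d)) == softplus(-d) == max(-d, 0) + log1p(exp(-|d|))
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))
```

The loss shrinks as the reward model scores the chosen response above the rejected one, which is the behavior a persona-aware reward model needs to learn from the judge's preferences.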

RL Fine-tuning with GRPO

We optimize a composite reward function combining persona adherence and format rewards, using Group Relative Policy Optimization (GRPO) for computational efficiency.
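The sketch below illustrates the two ingredients named above: a composite reward and the group-relative advantage that gives GRPO its name. GRPO scores several sampled responses to the same prompt and normalizes each reward against the group's mean and standard deviation, so no separate value network is needed. The weights `w_persona` and `w_format` and the binary format check are hypothetical placeholders, not values from the source.

```python
from statistics import mean, pstdev


def composite_reward(persona_score, format_ok, w_persona=1.0, w_format=0.2):
    """Weighted sum of a persona-adherence score and a binary format reward.

    The weights are illustrative assumptions, not MASCOT's settings.
    """
    return w_persona * persona_score + w_format * (1.0 if format_ok else 0.0)


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of responses to the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the sampled group
    return [(r - mu) / (sigma + eps) for r in rewards]
```

For example, a group with rewards `[1.1, 0.7, 0.7]` yields a positive advantage for the first response and equal negative advantages for the other two; a group with identical rewards yields all-zero advantages and therefore no gradient signal.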

Stage 2: Collaborative Dialogue Optimization

While Stage 1 optimizes individual agents for persona adherence, it does not account for group dynamics. We introduce group-level optimization to ensure cohesive community conversations.

Hierarchical Generation Process

At each turn, a meta-agent generates a directive that names the next speaker and gives it specific instructions. The selected agent then generates a response conditioned on its persona and the directive.
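The control flow of this hierarchical process can be sketched as follows. Everything here is a stand-in: `Directive`, `run_turns`, the round-robin meta-agent, and the stub agents are hypothetical names, and in the real system both the meta-agent and the persona agents would be learned LLM policies rather than hand-written functions.

```python
from dataclasses import dataclass


@dataclass
class Directive:
    speaker: str      # which agent speaks next
    instruction: str  # what the meta-agent asks that agent to do


def run_turns(user_message, meta_agent, agents, num_turns):
    """Alternate between meta-agent directives and persona-agent responses."""
    history = [("user", user_message)]
    for _ in range(num_turns):
        directive = meta_agent(history)        # level 1: pick speaker + instruction
        respond = agents[directive.speaker]    # level 2: selected persona agent
        utterance = respond(history, directive.instruction)
        history.append((directive.speaker, utterance))
    return history


def round_robin_meta(history):
    """Toy meta-agent: cycles through two agents (a learned policy would decide)."""
    order = ["The Anchor", "The Catalyst"]
    agent_turns = len(history) - 1  # exclude the user's opening message
    speaker = order[agent_turns % len(order)]
    return Directive(speaker, f"Build on the previous turn as {speaker}.")


def make_stub_agent(name):
    """Toy persona agent that echoes its directive instead of calling an LLM."""
    def respond(history, instruction):
        return f"[{name}] ({instruction})"
    return respond
```

A usage example: `run_turns("I had a rough week.", round_robin_meta, {n: make_stub_agent(n) for n in ["The Anchor", "The Catalyst"]}, num_turns=3)` returns a four-entry history in which the two agents alternate after the user's message.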

Group Reward Modeling

The Group Reward Model evaluates entire N-turn conversation blocks based on holistic criteria including global state consistency and interaction quality.
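To make the block-level evaluation concrete, the sketch below splits a conversation into N-turn blocks and scores each block as a whole. Whether MASCOT uses non-overlapping blocks, and how it handles a trailing partial block, is not stated in the source; both choices here are assumptions. `distinct_speaker_ratio` is a toy stand-in for the learned Group Reward Model, used only so the example runs.

```python
def split_into_blocks(turns, block_size):
    """Split turns into consecutive, non-overlapping blocks (last may be shorter)."""
    return [turns[i:i + block_size] for i in range(0, len(turns), block_size)]


def score_blocks(turns, block_size, group_reward_model):
    """Apply a block-level reward function to every N-turn block."""
    return [group_reward_model(block) for block in split_into_blocks(turns, block_size)]


def distinct_speaker_ratio(block):
    """Toy proxy reward: fraction of turns in the block with a distinct speaker."""
    speakers = {speaker for speaker, _ in block}
    return len(speakers) / len(block)
```

Scoring whole blocks rather than single turns is what lets a reward capture properties such as state consistency, which only become visible across several turns.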

Key Contributions

Unified Framework

A generalizable multi-agent framework that enables socio-collaborative companions in multi-perspective group conversations

Bi-level Optimization

A low-resource, bi-level optimization pipeline that harmonizes individual persona adherence with collective interaction synergy

Extensive Evaluation

Comprehensive evaluation across psychology (empathetic conversations) and workplace settings (business and academic meetings)

Performance Results

MASCOT consistently outperforms strong baselines across multiple datasets and evaluation metrics, demonstrating significant improvements in both persona fidelity and interaction synergy.

Key Results

+14.1 Persona Consistency Improvement

+10.6 Social Contribution Improvement

0.187% Trainable Parameters (vs. full fine-tuning)

Empathetic Dialogues Results

Collaborative Interaction Metrics

Experimental Setup

Evaluation Datasets

Empathetic Dialogues: A benchmark dataset for empathetic conversation grounded in emotional situations. We categorize dialogues by sentiment valence (positive, negative, neutral) to evaluate how MASCOT agents provide emotional support across different affective contexts.

QMSum: A query-based multi-domain meeting summarization benchmark covering academic, committee, and product meetings. This dataset tests MASCOT's ability to coordinate collaborative discussions in professional workplace settings.
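For the valence split of Empathetic Dialogues described above, one possible approach is to map each dialogue's emotion label to a valence bucket. The mapping below is a hypothetical, partial example and not the authors' actual categorization; the `emotion` field name is likewise an assumption about how the records are stored.

```python
from collections import defaultdict

# Hypothetical, partial mapping from emotion label to sentiment valence.
VALENCE_BY_EMOTION = {
    "joyful": "positive", "grateful": "positive", "proud": "positive",
    "sad": "negative", "lonely": "negative", "anxious": "negative",
    "surprised": "neutral", "anticipating": "neutral",
}


def group_by_valence(dialogues, default="unmapped"):
    """Bucket dialogue records by valence; unknown labels go to `default`."""
    groups = defaultdict(list)
    for dialogue in dialogues:
        valence = VALENCE_BY_EMOTION.get(dialogue["emotion"], default)
        groups[valence].append(dialogue)
    return dict(groups)
```

Routing unknown labels to a separate bucket, rather than silently treating them as neutral, keeps gaps in the mapping visible when reporting per-valence results.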

Agent Personas

The Anchor

Emotional Validator - Provides empathetic support and emotional grounding

The Catalyst

Critical Thinker - Offers Socratic questioning and cognitive reframing

The Beacon

Growth Advocate - Provides growth-oriented positive reinforcement

Action Guide

Practical Advisor - Offers actionable suggestions and structured guidance
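The four personas above can be captured in a small registry, sketched below. The names, roles, and behavior descriptions come from the list above; the `Persona` class and the wording of `system_prompt` are hypothetical and are not the prompts used in MASCOT.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Persona:
    name: str
    role: str
    behavior: str


PERSONAS = [
    Persona("The Anchor", "Emotional Validator",
            "Provides empathetic support and emotional grounding"),
    Persona("The Catalyst", "Critical Thinker",
            "Offers Socratic questioning and cognitive reframing"),
    Persona("The Beacon", "Growth Advocate",
            "Provides growth-oriented positive reinforcement"),
    Persona("Action Guide", "Practical Advisor",
            "Offers actionable suggestions and structured guidance"),
]


def system_prompt(persona):
    """Build an illustrative system prompt (hypothetical wording)."""
    return (f"You are {persona.name}, the {persona.role}. "
            f"{persona.behavior}. Stay in this role for every turn.")
```

Keeping the persona definitions in one immutable structure makes it easy to reuse the same descriptions for prompting, for the LLM judge's persona-adherence criteria, and for evaluation.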

Baselines

Prompting Methods

Zero-shot, Few-shot, Supervised Fine-tuning (SFT)

Reasoning Methods

Chain-of-Thought (CoT), Self-Consistency

Ablation Study

We analyze the contribution of each component by systematically removing them. The persona alignment module proves most critical for maintaining individual agent quality.

Key Insight

Removing persona-specific prompting (MASCOT-P) caused the largest performance drop, particularly in Empathetic Support Quality (72.8 → 65.1) and Consistency (76.6 → 72.3). Without explicit persona grounding, agents default to generic AI assistant behaviors.

Impact of Collaborative Optimization

The MASCOT-C variant, which randomizes the speaking order, primarily affected Social Contribution and Relevance & Coherence. The collaborative module ensures that each agent's contribution builds on the previous turn rather than reacting in isolation.

MBTI Personality Analysis

To investigate MASCOT's effectiveness at scale, we simulate users spanning the 16 Myers-Briggs Type Indicator (MBTI) personality types.

The Anchor (Emotional Validator)

Consistently outperforms other agents across most user categories, particularly with Diplomats (INFP: 76.6) and specific Analysts (ENTJ: 76.0).

The Beacon (Growth Advocate)

Demonstrates the highest stability across the Sentinel category, staying within a narrow performance range (71.60-74.21).

Case Study

How Multi-Agent Coordination Enables Better Support

The Anchor: Provides emotional validation and establishes trust with the user.

The Catalyst: Offers cognitive reframing and encourages deeper reflection.

The Beacon: Provides growth-oriented perspectives and positive reinforcement.

Meta-Agent: Coordinates turn-taking to ensure diverse, non-redundant contributions.

BibTeX

@article{wang2026mascot,
  title={MASCOT: Towards Multi-Agent Socio-Collaborative Companion Systems},
  author={Wang, Yiyang and Jin, Yiqiao and Cabral, Alex and Hester, Josiah},
  journal={arXiv:2601.14230},
  year={2026}
}

Usage and License Notices

The data, code, and model checkpoint are intended and licensed mainly for research use.

This website is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.