LLM-Driven Voice Agents That Collaboratively Talk to Each Other: Towards Vocal Multi-Agent Systems
Abstract
Voice assistants such as Alexa, Google Assistant, and Siri have become increasingly sophisticated, integrating Large Language Models (LLMs) to deliver personalized responses. However, these platforms remain constrained by single-agent paradigms that limit collaboration, transparency, and complex problem solving [1,2]. This paper proposes a novel LLM-driven multi-agent voice architecture in which multiple specialized voice-powered agents converse with one another, vocally and intelligibly, to collaboratively resolve user requests. Drawing inspiration from cooperative multi-agent systems [3,4] and human-like conversational transparency [5,6], we demonstrate a prototype in a smart kitchen environment involving culinary, nutrition, and inventory agents. Our evaluation suggests improvements in explainability, task success, and user trust, though challenges remain in orchestration, privacy, and cognitive load. This research introduces vocal multi-agent systems as a new frontier in interactive AI, advancing beyond single-agent frameworks towards explainable, collaborative, and socially intelligent voice ecosystems.