LLM-Driven Voice Agents That Collaboratively Talk to Each Other: Towards Vocal Multi-Agent Systems
Abstract
Voice assistants such as Alexa, Google Assistant, and Siri have become increasingly sophisticated, integrating Large Language Models (LLMs) to deliver personalized responses. However, these platforms remain constrained by single-agent paradigms that limit collaboration, transparency, and complex problem solving [1,2]. This paper proposes a novel LLM-driven multi-agent voice architecture in which multiple specialized voice-powered agents converse with one another, vocally and intelligibly, to collaboratively resolve user requests. Drawing inspiration from cooperative multi-agent systems [3,4] and human-like conversational transparency [5,6], we demonstrate a prototype in a smart kitchen environment involving culinary, nutrition, and inventory agents. Our evaluation suggests improvements in explainability, task success, and user trust, though challenges remain in orchestration, privacy, and cognitive load. This research introduces vocal multi-agent systems as a new frontier in interactive AI, advancing beyond single-agent frameworks towards explainable, collaborative, and socially intelligent voice ecosystems.