Master's Thesis: The Structure of Deception in Multi-Agent LLM Systems
How LLM Agents Lie, Break Promises, and Exploit Trust
Large language models are increasingly deployed as autonomous agents that communicate, commit, and coordinate in multi-agent systems. Deception in such settings—including promise-breaking, selective information sharing, and exploitation of other agents’ interpretive frameworks—introduces deployment risks that isolated-model evaluation cannot detect.
This thesis develops a unified framework for measuring LLM deception in multi-agent settings and populates it with empirical evaluations across three interaction structures.
Key Contributions:
- Unified Taxonomy: Organizes fragmented literature along goal-directedness, object, and mechanism dimensions, revealing systematic benchmark coverage gaps across 35 existing benchmarks.
- Multi-Setting Empirical Evaluation: Tests frontier LLMs in progressively less structured settings:
  - One-shot games with mandated announcements
  - Repeated games with endogenous announcements and heterogeneous model compositions
  - Resource-gathering simulation with narrative goals and no announcement protocol
- Qualitative Deception Profiles: Demonstrates that aggregate lying rates obscure structurally distinct deceptive behaviors: deception in prescribed protocols takes the form of planned false commitments, while under narrative goals it manifests as strategic silence that message-level classification cannot observe.
- Monitoring Failure Modes: Shows that three candidate monitoring approaches from the existing literature each break down against specific failure modes, highlighting the inadequacy of one-size-fits-all detection methods.
Central Claim: LLM deception in multi-agent settings is not a single phenomenon but a family of structurally distinct failure modes, each shaped by different features of the interaction. Current benchmarks and monitoring approaches systematically underrepresent this variety.
Thesis Committee: Vincent Conitzer (Chair), Aditi Raghunathan
Advisors: Vincent Conitzer, Zhijing Jin
CMU Technical Report: CMU-CS-26-105