Multi-agent systems (MAS) have been democratised in recent years thanks to the natural language interfacing made possible by large language models (LLM). While their ability to solve complex tasks is undeniable, the dynamics emerging from these systems can be hard to predict, and guarantees are needed. Jailbreak, adversariality, or power-seeking are concerning failure modes of MAS, and evaluating these capabilities remains a difficult problem. In this respect, interpretability could be one of the best tools to monitor and control several agents simultaneously and automatically. Indeed the models' internals convey the information used for its prediction and can be used symbolically for gaining understanding or control.