Transformers [1] have demonstrated remarkable success across various domains, including natural language processing, vision, and, more recently, graph-based tasks [2, 3]. The extension of transformers to graphs typically involves positional encodings of nodes to inject structural information [3]. Meanwhile, graph neural networks (GNNs) have been extensively studied through the lens of differential equations, in particular as discretisations of heat diffusion processes on graphs [4]. This interpretation highlights a key limitation of GNNs: their tendency to over-smooth high-frequency information [5]. Empirical evidence suggests that transformers applied to graphs mitigate this issue, yet a theoretical understanding of why remains underdeveloped [6]. In analogy with GNNs, a recent line of work [7, 8, 9] has established transformers themselves as discretisations of dynamical particle systems on Euclidean domains.
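To make the two dynamical viewpoints concrete, the following sketch records the underlying equations in our own simplified notation (the symbols $X$, $\Delta$, $\tau$, $Q$, $K$, $V$ are ours and not taken verbatim from [4] or [7, 8, 9]). Writing $X(t) \in \mathbb{R}^{n \times d}$ for the stacked node features and $\Delta$ for a graph Laplacian, the diffusion view of [4] considers
$$
\dot{X}(t) = -\Delta X(t), \qquad X^{(k+1)} = X^{(k)} - \tau\, \Delta X^{(k)},
$$
where the explicit Euler step with step size $\tau$ plays the role of a GNN layer; in the spectral basis of $\Delta$, the flow damps the component associated with eigenvalue $\lambda$ as $e^{-\lambda t}$, which is the over-smoothing of high frequencies referenced above. Analogously, the particle-system view of [7, 8, 9] treats the tokens $x_1(t), \dots, x_n(t) \in \mathbb{R}^d$ as interacting particles driven by self-attention,
$$
\dot{x}_i(t) = \sum_{j=1}^{n} \frac{\exp\!\big(\langle Q x_i(t), K x_j(t) \rangle\big)}{\sum_{k=1}^{n} \exp\!\big(\langle Q x_i(t), K x_k(t) \rangle\big)}\, V x_j(t),
$$
whose time discretisation recovers a residual self-attention layer ($Q$, $K$, $V$ denoting query, key and value matrices).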
[1] A Vaswani et al. Attention is all you need. NeurIPS, 2017
[2] T Lin et al. A Survey of Transformers. AI Open, 2022
[3] A Shehzad et al. Graph Transformers: A Survey. ArXiv Preprint, 2024
[4] B Chamberlain and J Rowbottom et al. GRAND: Graph Neural Diffusion. ICML, 2021
[5] F Di Giovanni et al. Understanding convolution on graphs via energies. TMLR, 2023
[6] Müller et al. Attending to Graph Transformers. TMLR, 2024
[7] V Castin et al. A Unified Perspective on the Dynamics of Deep Transformers. ArXiv Preprint, 2025
[8] M E Sander et al. Sinkformers: Transformers with doubly stochastic attention. AISTATS, 2022
[9] B Geshkovski et al. The emergence of clusters in self-attention dynamics. NeurIPS, 2023