ICLR 2026

The Mind's Transformer:
Computational Neuroanatomy of LLM-Brain Alignment

Cheng-Yeh Chen  &  Raghupathy Sivakumar
Georgia Institute of Technology

Overview
Abstract

The alignment between Large Language Models (LLMs) and brain activity provides a powerful framework for advancing both cognitive neuroscience and artificial intelligence. MindTransformer zooms into a fundamental unit of LLMs, the transformer block, to provide the first systematic computational neuroanatomy relating its internal operations to human brain activity.

Analyzing 21 state-of-the-art LLMs (270M–123B parameters) across five major families, we reveal a universal intra-block hierarchy: early attention states align with sensory cortices, while Feedforward Network (FFN) states align with higher-order association areas. Over 90% of brain voxels in language regions, and over 96% in sensory regions, are better explained by previously unexplored intermediate computations than by the commonly used hidden states.

Furthermore, we identify Rotary Positional Embeddings (RoPE) as the specific mathematical engine driving alignment in the human auditory cortex: per-head queries with RoPE best explain 74% of auditory cortex voxels versus 8% without, systematically improving alignment along both the ventral and dorsal auditory pathways. Building on these insights, MindTransformer achieves correlation improvements of 0.111 in primary auditory cortex, gains that exceed those from scaling LLMs by 456×.


Framework
Dissecting the Transformer Block
We decompose each transformer block into 13 distinct intermediate states and map each to human brain activity.
Figure 1 · Overview of the MindTransformer Framework
We extract 13 fine-grained intermediate states from each transformer block (left) and map them to human brain activity via voxel-wise encoding models. The winning ratio analysis (center) reveals that no single standard representation dominates—diverse states contribute across the brain. Brain plots (right) show the best-performing state for each voxel, revealing a double dissociation: FFN Activated States and Per-Head Context Vectors dominate in semantic processing regions (Language Network), while RoPE-enhanced attention states dominate in sensory cortices (Auditory Cortex).
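The voxel-wise encoding and winning-ratio analysis above can be sketched as follows. This is a minimal illustration, not the paper's exact pipeline: it assumes ridge-regression encoding models scored by held-out Pearson correlation, and the function names (`fit_encoding_model`, `winning_ratio`) are hypothetical.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

def fit_encoding_model(features, bold, alphas=(1.0, 10.0, 100.0)):
    """Fit one ridge encoding model mapping LLM features to BOLD
    responses, scored by per-voxel Pearson r on a held-out split."""
    n = features.shape[0]
    split = int(0.8 * n)
    X_tr, X_te = features[:split], features[split:]
    Y_tr, Y_te = bold[:split], bold[split:]
    model = RidgeCV(alphas=alphas).fit(X_tr, Y_tr)
    pred = model.predict(X_te)
    # Pearson correlation per voxel between predicted and measured BOLD
    pred_c = pred - pred.mean(0)
    true_c = Y_te - Y_te.mean(0)
    r = (pred_c * true_c).sum(0) / (
        np.linalg.norm(pred_c, axis=0) * np.linalg.norm(true_c, axis=0) + 1e-8)
    return r

def winning_ratio(scores_by_state):
    """scores_by_state: dict mapping state name -> (n_voxels,) array of
    correlations. Returns the fraction of voxels each state wins."""
    names = list(scores_by_state)
    stacked = np.stack([scores_by_state[s] for s in names])  # (states, voxels)
    winners = stacked.argmax(axis=0)
    return {s: float((winners == i).mean()) for i, s in enumerate(names)}
```

In this sketch, running `fit_encoding_model` once per intermediate state and passing the resulting score arrays to `winning_ratio` yields the per-state winning fractions shown in the center panel.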
Key Discoveries
Three principal contributions from our computational neuroanatomy analysis.
+31%
Auditory Cortex Gain
MindTransformer achieves a correlation of 0.467 in Heschl's Gyrus, a 31.0% improvement over baselines that surpasses the gains from scaling model size by 456× (270M → 123B parameters). The improvement is 29.2% random-adjusted and 46.0% GloVe-adjusted.
RoPE
Auditory Stream Engine
Rotary Positional Embeddings are the critical component: per-head queries with RoPE explain 73.88% of auditory cortex voxels versus just 7.82% without—a nearly tenfold increase. This effect delineates both the ventral and dorsal auditory processing streams.
5 Families
Universal Hierarchy
The intra-block hierarchy is universal across Llama, Mistral, Qwen, Gemma, and GPT: early attention states → sensory cortices, FFN states → association areas. This parallels the brain's own processing hierarchy at a within-block level of granularity.
Finding 1
Universal Intra-Block Hierarchy
The computational depth within a single transformer block mirrors the brain's cortical hierarchy.
Figure 2 · Weighted Computational Depth vs. Cortical Hierarchy
We quantify the topological alignment between the transformer's internal processing depth (y-axis) and the brain's cortical hierarchy (x-axis) across five LLM families. Two functional segments emerge: the auditory stream (HG → MTG) shows a steep slope where ascending cortical levels correspond to deeper intra-block computations, while the language network (MTG → AG) shows a plateau where alignment stabilizes at the block's later stages. High R² values across most families confirm this is a universal property of transformer architectures.
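The slope and R² statistics behind this figure can be computed with an ordinary least-squares line over (cortical hierarchy level, best-aligning intra-block depth) points. A minimal sketch, with a hypothetical `segment_fit` helper standing in for whichever fitting routine the analysis actually uses:

```python
import numpy as np

def segment_fit(hierarchy_level, winning_depth):
    """Least-squares line through (cortical level, intra-block depth)
    points for one functional segment; returns (slope, R^2)."""
    x = np.asarray(hierarchy_level, dtype=float)
    y = np.asarray(winning_depth, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    ss_res = float((resid ** 2).sum())
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    return slope, r2
```

Fitting the auditory segment (HG → MTG) and the language segment (MTG → AG) separately would reproduce the two regimes described above: a steep slope for the former and a near-zero slope (plateau) for the latter.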

Finding 2
The Neurobiological Role of RoPE
Rotary Positional Embeddings delineate the brain's canonical ventral and dorsal auditory streams.
Figure 3 · RoPE Delineates the Brain's Auditory Streams
Comparing per-head query with RoPE vs. input hidden states, the difference map (left) reveals strong improvement concentrated around the Sylvian fissure. The top-10 parcels with highest correlation difference (center) strikingly delineate the brain's canonical ventral and dorsal streams for auditory language processing. The largest improvement is in primary auditory cortex (Heschl's Gyrus, Δ=0.1233), cascading along both anatomical streams to the Planum Temporale and Superior Temporal Gyrus. This provides the first neurobiological validation of RoPE's functional role and the first strong evidence of LLM-brain alignment in low-level sensory processing regions.
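For reference, the rotary transformation applied to the per-head queries can be sketched as below. This is the split-halves (GPT-NeoX-style) RoPE variant; the exact variant differs across the model families studied, so treat this as an illustrative assumption rather than the paper's implementation.

```python
import numpy as np

def apply_rope(q, base=10000.0):
    """Apply rotary positional embeddings to per-head queries.
    q: (seq_len, head_dim) array with head_dim even."""
    seq_len, head_dim = q.shape
    half = head_dim // 2
    # one rotation frequency per coordinate pair
    inv_freq = base ** (-np.arange(half) / half)            # (half,)
    angles = np.arange(seq_len)[:, None] * inv_freq[None]   # (seq, half)
    cos, sin = np.cos(angles), np.sin(angles)
    q1, q2 = q[:, :half], q[:, half:]
    # rotate each (q1, q2) pair by its position-dependent angle
    return np.concatenate([q1 * cos - q2 * sin,
                           q1 * sin + q2 * cos], axis=-1)
```

Two properties matter for the interpretation above: the rotation preserves vector norms, and query–key dot products after RoPE depend only on relative position, which is what injects position-sensitive structure into the per-head query states.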

Finding 3
Efficiency vs. Scale
Computational neuroanatomy insights outperform massive model scaling.
Figure 4 · Beating the Scaling Laws
We compare MindTransformer (Mode 2, ✕ markers) against standard baselines (● markers) across models from 270M to 123B parameters. In low-level sensory regions like Heschl's Gyrus (leftmost panel), MindTransformer's improvement is substantially larger than gains from 456× model scaling. Moving toward high-level semantic regions (rightward panels), the gap narrows as standard baselines already perform well. This demonstrates that understanding the internal computational neuroanatomy of transformers yields greater efficiency than simply increasing model size—particularly in sensory regions where traditional approaches have struggled.
Comprehensive Analysis
Validated across 21 models spanning 5 architectural families, from 270M to 123B parameters.
Family  | Model Variants (Parameters)                | Layers
Llama-3 | 1B, 3B, 8B, 70B                            | 16–80
Qwen-3  | 0.6B, 1.7B, 4B, 8B, 14B, 32B               | 28–64
Mistral | 7B (v0.2/v0.3), Small (22B), Large (123B)  | 32–88
GPT-oss | 20B, 120B                                  | 24–36
Gemma-3 | 270M, 1B, 4B, 12B, 27B                     | 18–62
Reference
Citation
@inproceedings{chen2026mindtransformer,
  title     = {The Mind's Transformer: Computational Neuroanatomy
               of LLM-Brain Alignment},
  author    = {Chen, Cheng-Yeh and Sivakumar, Raghupathy},
  booktitle = {International Conference on Learning
               Representations (ICLR)},
  year      = {2026}
}