About me

I am a researcher on the interpretability team at Anthropic, where I try to reverse-engineer how language models compose microscopic building blocks into higher-level computational circuits.

I completed my PhD at MIT, where I worked with Dimitris Bertsimas, Max Tegmark, and Neel Nanda on interpretability. Before that, I was a software engineer at Google, working on data pipelines for the storage analytics team, and researched fairness-optimized political redistricting with David Shmoys.

Selected Publications

See Google Scholar for a full and up-to-date list.

  • Refusal in Language Models is Mediated by a Single Direction
    by Andy Arditi, Oscar Obeso, Aaquib Syed, Daniel Paleka, Nina Panickssery, Wes Gurnee, Neel Nanda
    Appeared at NeurIPS 2024 [arXiv]
  • Not All Language Model Features are Linear
    by Joshua Engels, Eric J Michaud, Isaac Liao, Wes Gurnee, Max Tegmark
    Appearing at ICLR 2025 [arXiv]
  • Confidence Regulation Neurons in Language Models
    by Alessandro Stolfo, Ben Wu, Wes Gurnee, Yonatan Belinkov, Xingyi Song, Mrinmaya Sachan, Neel Nanda
    Appeared at NeurIPS 2024 [arXiv]
  • Universal Neurons in GPT2 Language Models
    by Wes Gurnee, Theo Horsley, Zifan Carl Guo, Tara Rezaei Kheirkhah, Qinyi Sun, Will Hathaway, Neel Nanda, Dimitris Bertsimas
    Published in TMLR [arXiv] [Twitter]
  • Language Models Represent Space and Time
    by Wes Gurnee and Max Tegmark
    Appeared at ICLR 2024 [arXiv] [Twitter]
  • Finding Neurons in a Haystack: Case Studies with Sparse Probing
    by Wes Gurnee, Neel Nanda, Matthew Pauly, Katherine Harvey, Dimitrii Troitskii, and Dimitris Bertsimas
    Published in TMLR [Paper] [arXiv] [Twitter]
  • Learning Sparse Nonlinear Dynamics via Mixed-Integer Optimization
    by Wes Gurnee and Dimitris Bertsimas
    Published in Nonlinear Dynamics [Paper] [arXiv] [Twitter]
  • Combatting gerrymandering with social choice: The design of multi-member districts
    by Nikhil Garg, Wes Gurnee, David Rothschild, David Shmoys
    Published in EC ’22 [Paper] [arXiv] [Talk]
  • Fairmandering: A column generation heuristic for fairness-optimized political districting
    by Wes Gurnee and David Shmoys
    Best paper award at SIAM ACDA ’21 [Paper] [arXiv] [Talk]

Other Projects and Writing