About me
I am a third-year PhD student researching language model interpretability at the MIT ORC, advised by Dimitris Bertsimas. I am interested in developing a mechanistic understanding of neural networks and using this understanding to better monitor, control, and align advanced AI systems. I am fortunate to also collaborate with Neel Nanda and Max Tegmark, and I am graciously supported by an Open Philanthropy early career grant.
Previously, I was a software engineer at Google working on data pipelines for the storage analytics team, and researched fairness-optimized political redistricting with David Shmoys.
Publications
- Universal Neurons in GPT2 Language Models
by Wes Gurnee, Theo Horsley, Zifan Carl Guo, Tara Rezaei Kheirkhah, Qinyi Sun, Will Hathaway, Neel Nanda, and Dimitris Bertsimas
Under review [arXiv] [Twitter]
- Language Models Represent Space and Time
by Wes Gurnee and Max Tegmark
Published at ICLR ’24 [arXiv] [Twitter]
- Training Dynamics of Contextual N-Grams in Language Models
by Lucia Quirke, Lovis Heindrich, Wes Gurnee, and Neel Nanda
Published in NeurIPS 2023 ATTRIB Workshop [arXiv] [Twitter]
- Finding Neurons in a Haystack: Case Studies with Sparse Probing
by Wes Gurnee, Neel Nanda, Matthew Pauly, Katherine Harvey, Dimitrii Troitskii, and Dimitris Bertsimas
Published in TMLR [Paper] [arXiv] [Twitter]
- Learning Sparse Nonlinear Dynamics via Mixed-Integer Optimization
by Wes Gurnee and Dimitris Bertsimas
Published in Nonlinear Dynamics [Paper] [arXiv] [Twitter]
- Combatting gerrymandering with social choice: The design of multi-member districts
by Nikhil Garg, Wes Gurnee, David Rothschild, and David Shmoys
Published in EC ’22 [Paper] [arXiv] [Talk]
- Fairmandering: A column generation heuristic for fairness-optimized political districting
by Wes Gurnee and David Shmoys
Best paper award at SIAM ACDA ’21 [Paper] [arXiv] [Talk]
Other Projects and Writing
- SAE reconstruction errors are (empirically) pathological (2024) - A preliminary research post on a potential issue with sparse autoencoder reconstructions.
- Inductive Biases of SGD Training (2022) - A review of the inductive biases of stochastic gradient descent (SGD) when training deep neural networks.
- Analytics for Health Security (2022) - An analytics-enabled defense-in-depth strategy for health security.
- Optimal Political Districting: The Anchor Method (2022) - A formulation of optimal political districting using the anchor method.
- Scalable Approximation of k-medians for Political Districting (2020) - Using a linear programming relaxation to approximate the k-medians problem for political districting.