Cambridge AI Safety Hub: Invitation to Apply for the Mentorship for Alignment Researchers (apply by May 3)
The Cambridge AI Safety Hub invites exceptional computer science
students at Purdue to apply to the upcoming iteration of the
Mentorship for Alignment Researchers (MARS), an AI safety fellowship
that matches students and early-career researchers with experienced
mentors from AI labs, think tanks, and academia. In July we will fly
promising students and working professionals to the United Kingdom to
participate in a "sprint week," where they will begin a research
project that they'll then carry out remotely through September.
We'll have more than 20 projects spanning multiple disciplines; a few
we think will be especially interesting to computer science students
are:
• Research with Yossi Gandelsman (Reve) on whether LLMs can predict
the layer at which their own neurons appear, detect polysemantic
neurons, identify causal connections between two neurons in their own
architecture, or anticipate their own attention patterns.
• A project with Lindley Lentati (Cambridge Inference) on reproducible
white-box jailbreak monitoring, covering automated attack generation,
multi-layer probe aggregation, and streaming token-by-token detection.
• An investigation with Rhea Karty and Jacob Davis (ERA; LASR Labs) of
whether steering vectors for traits like confidence and honesty are
context-independent or persona-dependent, using LoRA adapters for
character-trained models and tracking trait geometry across training
checkpoints.
• Work with James Lucassen (Redwood Research) on deferral protocols
for AI control — implementing defer-to-trusted in BashArena,
developing usefulness monitors, and building methodology to evaluate
them.
• Work with Shivam Raval and Luiza Corpaci (Harvard; AMD) on detecting
unfaithful formal translations, using Lean-verified equational
theories as ground truth and mech-interp methods to locate where
translation failures occur.
Applications close on May 3rd. Students can find more information on
our program's webpage.
MARS itself isn't paid, but the Cambridge AI Safety Hub covers
airfare, lodging, meals, visa costs, and compute (where applicable)
for the in-person sprint week in Cambridge. The rest of the program is
remote.