
Method, not direction.

A closer look at each thread: what it is, and the open questions inside it.

Machines
Alignment

Alignment: models that do what we intend.

As models get more capable, the gap between what we ask for and what we actually want is harder to close. We work on training and oversight that keep models doing what we intend, and correctable when they don't.

1.1 Scalable oversight for tasks humans can't fully check.
1.2 Honesty and principled abstention under uncertainty.
1.3 Corrigibility and resistance to specification gaming.

Machines
Interpretability

Interpretability: understanding how a model works.

We reverse-engineer what's actually happening inside trained models, the features and circuits behind their behavior, so claims about a model can be checked instead of assumed.

2.1 Features and circuits behind specific behaviors.
2.2 Probing and editing internal representations.
2.3 From interpretability findings to safety cases.

Molecules
Biology

Biology: reading and designing biological systems.

We use models to read and design biological systems, and we borrow from what biology already knows about learning and robustness.

3.1 Models for biological sequence and structure.
3.2 Learning from noisy, expensive experimental data.
3.3 Biological priors on robustness and adaptation.

Systems
Societal

Societal: how AI affects work and society.

AI ends up in institutions, markets, and people's lives. We study how it shifts work and power, and which policy and incentive choices actually matter.

4.1 Diffusion, labor, and the economics of capable AI.
4.2 Governance and incentive design for deployment.
4.3 Measuring real-world impact, not benchmarks.
