Research

Papers

AI Testing Should Account for Sophisticated Strategic Behaviour. Kovarik, Chen, Petersen, Ghersengorin, Conitzer. NeurIPS 2025 (Position Paper).

Argues that AI evaluations must account for the possibility that AI systems reason strategically about being tested, and illustrates how game-theoretic analysis can be used to "evaluate evaluations".

Game Theory with Simulation of Other Players. Kovarik, Oesterheld, Conitzer. IJCAI 2023.

Game Theory with Simulation in the Presence of Unpredictable Randomisation. Kovarik, Sauerberg, Hammond, Conitzer. AAMAS 2025.

Blog posts

Evaluation as a (Cooperation-Enabling?) Tool

Post sequence: Understanding AI Oversight and Its Limitations
A collection of posts, each examining one idea about AI evaluation and its limitations. They give informal descriptions of ideas that form the starting point of our investigation. In future work, we might look into some of these in more detail, with experiments, or more formally. The posts are grouped thematically, not in order of importance.

Context: Some of the ideas give important context that motivates and informs our work on AI oversight, but isn't central to our agenda:

Adjacent Agenda: Infrastructure for Trading with Partially Misaligned AIs

Scheming Is Less Rational than X-risk Tropes Suggest

WIP What Are the Different Purposes of AI Evaluation?

Foundational concepts: We describe several concepts that are interesting in their own right but also serve as building blocks for the later results:

Deployment Awareness

WIP Evaluation Gaming Isn't a Problem Just for "Evals"

Entanglement: To Solve a Complex Task, the Solver Needs a Lot of Information about the Environment

Limitations of AI oversight: Intuitions and results directly related to our main thesis, that observation-based oversight of AI is fundamentally limited when it comes to dealing with misaligned AI that behaves strategically:

WIP Conjecture: Observation-based Oversight Cannot Prevent AI Takeover

The Human Substitution Test as a Terminal Diagnosis for AI Evaluations

If This Were a Test, How Much Would It Cost?
