About

The Project

While we also have other interests in AI safety, this project is specifically about Understanding the Limitations of AI Oversight. By "oversight" we mean both traditional "evals" and various other forms of oversight, such as monitoring, (most forms of) interpretability, and staged deployment; the unifying theme is that these are observation-based methods of controlling AI, as opposed to, for example, safety by design.

Our theory of change has two parts: (1) contribute to "the science of AI oversight", making existing evaluations (and other oversight methods) more effective; (2) build common knowledge of where evaluations don't work, in order to increase our willingness to invest in alternative approaches.

Team

We are a small team of AI safety researchers based in Prague.

Vojta Kovarik (Google Scholar, LessWrong). Vojta is a researcher at Czech Technical University and Charles University in Prague, with a PhD in mathematics and a background in game theory. He previously did a postdoc at Vince Conitzer's Foundations of Cooperative AI Lab at Carnegie Mellon University and worked at the Future of Humanity Institute. His work focuses on the game-theoretic foundations of AI evaluation: modelling the interaction between AI systems and their evaluators as strategic games.

Tomas Gavenciak (Google Scholar, Personal webpage). Tomas has a PhD in computer science, did a postdoc at ETH Zurich, and has many years of subsequent research experience in AI and AI safety. He has also worked on various other projects, such as organising the Human-Aligned AI Summer School and epidemic modelling during COVID (including a paper in Science).

Hiring and Collaboration

We are open to external collaborations adjacent to our agenda. Our comparative advantage is in formal and conceptual research, running smaller-scale experiments, and creating MVPs. We are happy to contribute this type of work to a larger project, or collaborate on smaller projects of this nature.

If there is a particularly good fit in terms of motivation and background, we are open to hiring — among other possibilities, Vojta can supervise PhD students or a postdoc at Czech Technical University in Prague.

The key prerequisites we are looking for:

  • Existential risk from AI — understanding the stakes involved in building superhuman AI or "merely" incorporating automation into the foundations of our society; being familiar with prominent thinking on this topic
  • Game theory — ability to use the existing formal tools for reasoning about strategic AI while maintaining healthy scepticism about what these tools can do
  • Modern AI — understanding the basics of machine learning and frontier AI, and the ability to run experiments with LLMs (e.g., on evaluation awareness, sandbagging, and strategic behaviour)

If this research sounds relevant to your work, reach out to Vojta at his Gmail address.

Funding

This project is funded by Effective Ventures. Vojta also has funding from CTU.