Experiments we’re interested in supporting
Eventually, we want every safety and alignment researcher in the world to be able to test their solution ideas on our platform. This would yield not only benchmarks for comparison but also a playground for interaction, where at least some acute safety failures and harmful interaction dynamics can be discovered in silico before they reach the real world. It would probably be irresponsible to treat our platform alone as sufficient to certify a new AI technology as safe, but if a technology appears unsafe on our platform then it probably shouldn’t be deployed in reality (which would be the opposite of fun). Below are some of the key research areas in which we intend to support early experimentation:
Alignment with humans and human-like cultures
For AI alignment solutions to work in the real world, they have to work with the full richness of humans and human culture (e.g., the transmissibility of culture through interaction, and language as a form of culture). We would like to develop benchmarks that capture aspects of this richness.
We’re interested in testing whether a given agent or group is able to interact safely and productively with another agent or group employing different alignment paradigms, different coordination norms, or different value functions.
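As a toy illustration of this kind of cross-group test (all names, payoffs, and the safety criterion below are hypothetical, not part of our platform), consider two groups with different value functions meeting in a one-shot two-action game. Each group chooses using only its own values, and we score the resulting joint outcome for productivity (joint welfare) and safety (whether the outcome is flagged unsafe):

```python
# payoff[(a1, a2)] -> (reward to group 1, reward to group 2).
# The two groups hold *different* value functions: each prefers a
# different mutually productive outcome.
PAYOFF = {
    (0, 0): (3, 2),    # productive for both, preferred by group 1
    (0, 1): (0, 0),    # miscoordination
    (1, 0): (-2, -2),  # a clash we flag as an acute safety failure
    (1, 1): (1, 3),    # productive for both, preferred by group 2
}
UNSAFE = {(1, 0)}  # hypothetical safety criterion on joint outcomes

def best_response(group: int) -> int:
    """Each group naively best-responds to a uniformly random partner,
    consulting only its own entry in the payoff tuples."""
    def expected(action: int) -> float:
        joints = ([(action, b) for b in (0, 1)] if group == 0
                  else [(b, action) for b in (0, 1)])
        return sum(PAYOFF[j][group] for j in joints) / 2
    return max((0, 1), key=expected)

def evaluate(a1: int, a2: int) -> dict:
    """Score one joint outcome for productivity and safety."""
    r1, r2 = PAYOFF[(a1, a2)]
    return {"welfare": r1 + r2, "unsafe": (a1, a2) in UNSAFE}

# Groups holding different values settle on different conventions and
# miscoordinate, yielding zero joint welfare:
result = evaluate(best_response(0), best_response(1))
```

Even in this tiny game, differing value functions alone are enough to destroy the productive outcomes both groups would prefer; richer versions of this setup are what such benchmarks would measure.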
AI “assistant” algorithms are intended to learn the intentions of a particular person or group, align with those intentions, and take safe actions in service of them.
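A minimal sketch of that assistant loop (entirely hypothetical goals, likelihoods, and action names, chosen only to make the idea concrete): the assistant infers which goal a person is pursuing from their observed choices via simple Bayesian updating, then takes the action serving the inferred goal, restricted to a whitelist of safe actions:

```python
GOALS = ("tea", "coffee")
# Assumed likelihood of each observed human action under each goal.
LIKELIHOOD = {
    "tea":    {"boil_water": 0.9, "grind_beans": 0.1},
    "coffee": {"boil_water": 0.5, "grind_beans": 0.5},
}
SAFE_ACTIONS = {"fetch_teabag", "fetch_coffee"}     # the safety constraint
ASSIST = {"tea": "fetch_teabag", "coffee": "fetch_coffee"}

def infer_goal(observations):
    """Bayesian update from a uniform prior over the person's goal."""
    posterior = {g: 1.0 for g in GOALS}
    for obs in observations:
        for g in GOALS:
            posterior[g] *= LIKELIHOOD[g][obs]
    return max(GOALS, key=posterior.get)

def assist(observations):
    """Act in service of the inferred goal, but only via safe actions."""
    action = ASSIST[infer_goal(observations)]
    return action if action in SAFE_ACTIONS else "do_nothing"
```

The point of the sketch is the decomposition: intention inference, then action selection under a safety constraint; real assistant algorithms would replace each piece with something far richer.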
We’re interested in whether “mediator” agents, when introduced into an interaction between two or more groups, can foster greater peace and prosperity among those groups.
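One way to make the mediator idea concrete (again, all numbers and rules below are a hypothetical sketch, not our method): two groups with different value functions would miscoordinate on their own; a mediator proposes the safe joint action with the highest total welfare, and each group accepts only if the proposal is no worse for it than its unilateral fallback:

```python
# payoff[(a1, a2)] -> (reward to group 1, reward to group 2)
PAYOFF = {
    (0, 0): (3, 2),
    (0, 1): (0, 0),    # what the groups reach on their own
    (1, 0): (-2, -2),  # unsafe clash, off-limits to the mediator
    (1, 1): (1, 3),
}
UNSAFE = {(1, 0)}
FALLBACK = (0, 1)  # the miscoordinated outcome reached without mediation

def mediate():
    """Propose the welfare-maximizing safe joint action; the proposal
    stands only if every group weakly prefers it to the fallback."""
    safe = [j for j in PAYOFF if j not in UNSAFE]
    proposal = max(safe, key=lambda j: sum(PAYOFF[j]))
    accepted = all(PAYOFF[proposal][g] >= PAYOFF[FALLBACK][g]
                   for g in (0, 1))
    return proposal if accepted else FALLBACK
```

Here the mediator moves the groups from the zero-welfare fallback to a mutually productive outcome while never proposing the unsafe one; the experiments we have in mind would test whether learned mediator agents can do the same in far richer interactions.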