AI Safety Fundamentals

Machine Learning track

The machine learning track of AI Safety Fundamentals is a seven-week, research-oriented reading group on technical AI safety. Topics include neural network interpretability, learning from human feedback, goal misgeneralization in reinforcement learning, and potential catastrophic risks from advanced AI systems. The program is open to both undergraduate and graduate students. Students with machine learning experience are especially encouraged to apply, although none is required.

Participants meet weekly in small sections, each facilitated by a TA who is a graduate student or an upperclassman with experience in AI safety research. Dinner is provided, and no work is assigned outside of the weekly meetings. Our curriculum is based on a course developed by OpenAI researcher Richard Ngo.

Apply here by Wednesday, February 14th, 11:59pm EST.

For those interested in AI policy, we recommend applying to the policy track of AI Safety Fundamentals. You can participate in both tracks simultaneously.