Scalable Oversight for Advanced AI Systems
New approaches to monitoring and controlling AI systems at scale, addressing the challenge of human oversight for superhuman capabilities.
Latest insights in AI Safety, Alignment, and Governance
Exploring how constitutional frameworks can guide AI development toward human values and ethical principles.
Understanding the internal mechanisms of neural networks through circuit analysis and feature attribution techniques.
Analysis of emerging global governance structures for AI safety and the coordination challenges ahead.
Evidence-based policy suggestions for governing the development and deployment of powerful language models.
How AI systems fail when deployed outside their training distribution and methods for improving robustness.
Current methods and emerging alternatives to reinforcement learning from human feedback for value alignment.
Systematic approaches to finding weaknesses in AI systems before deployment through adversarial testing.
Building accountability mechanisms for AI decision-making in high-stakes domains.
How access to talent and compute resources shapes the AI safety ecosystem, and what that implies for policy.