Research Blog

Latest insights in AI Safety, Alignment, and Governance

Curated Research
AI Safety
February 15, 2025

Scalable Oversight for Advanced AI Systems

New approaches to monitoring and controlling AI systems at scale, addressing the challenge of maintaining human oversight of systems with superhuman capabilities.

Nicholas Temple Mann
Alignment
February 10, 2025

Constitutional AI: Designing AI Values

Exploring how constitutional frameworks can guide AI development toward human values and ethical principles.

Nicholas Temple Mann
Technical Papers
February 8, 2025

Mechanistic Interpretability in Neural Networks

Understanding the internal mechanisms of neural networks through circuit analysis and feature attribution techniques.

Kathleen Chapman
Governance
February 5, 2025

International AI Governance Frameworks

Analysis of emerging global governance structures for AI safety and the coordination challenges ahead.

James Castle
Policy Analysis
February 1, 2025

Policy Recommendations for Large Language Models

Evidence-based policy suggestions for governing the development and deployment of powerful language models.

CSGA Research Team
AI Safety
January 28, 2025

Distributional Shift and Robustness

How AI systems fail when deployed outside their training distribution and methods for improving robustness.

CSGA Research Team
Alignment
January 25, 2025

RLHF and Beyond: Preference Learning

Current methods in reinforcement learning from human feedback (RLHF) and emerging alternatives for value alignment.

Nicholas Temple Mann
Technical Papers
January 20, 2025

Red Teaming Adversarial LLMs

Systematic approaches to finding weaknesses in AI systems through adversarial testing before deployment.

James Castle
Governance
January 18, 2025

Accountability in AI Systems

Building accountability mechanisms for AI decision-making in high-stakes domains.

CSGA Research Team
Policy Analysis
January 15, 2025

Talent and Compute: Critical Resources

How access to talent and compute shapes the AI safety ecosystem, and what this implies for policy.

CSGA Research Team