Research Blog - CSGA-AI

AI Safety

February 15, 2025

Scalable Oversight for Advanced AI Systems

New approaches to monitoring and controlling AI systems at scale, addressing the challenge of maintaining human oversight of systems with superhuman capabilities.

Nicholas Temple Mann

Alignment

February 10, 2025

Constitutional AI: Designing AI Values

Exploring how constitutional frameworks can guide AI development toward human values and ethical alignment principles.

Nicholas Temple Mann

Technical Papers

February 8, 2025

Mechanistic Interpretability in Neural Networks

Understanding the internal mechanisms of neural networks through circuit analysis and feature attribution techniques.

Kathleen Chapman

Governance

February 5, 2025

International AI Governance Frameworks

Analysis of emerging global governance structures for AI safety and the coordination challenges ahead.

James Castle

Policy Analysis

February 1, 2025

Policy Recommendations for Large Language Models

Evidence-based policy suggestions for governing the development and deployment of powerful language models.

CSGA Research Team

AI Safety

January 28, 2025

Distributional Shift and Robustness

How AI systems fail when deployed outside their training distribution and methods for improving robustness.

CSGA Research Team

Alignment

January 25, 2025

RLHF and Beyond: Preference Learning

Current methods for value alignment based on reinforcement learning from human feedback (RLHF), and emerging alternatives to it.

Nicholas Temple Mann

Technical Papers

January 20, 2025

Red Teaming Adversarial LLMs

Systematic approaches to finding weaknesses in AI systems through adversarial testing before they are deployed.

James Castle

Governance

January 18, 2025

Accountability in AI Systems

Building accountability mechanisms for AI decision-making in high-stakes domains.

CSGA Research Team

Policy Analysis

January 15, 2025

Talent and Compute: Critical Resources

How access to talent and compute shapes the AI safety ecosystem, and what this implies for policy.

CSGA Research Team
