Anthropic AI Model Evaluation Initiative
This initiative funds projects that develop independent evaluations to assess the safety and advanced capabilities of AI models. It is aimed at researchers and organizations focused on AI safety and evaluation methodology.
Description
Anthropic’s new initiative supports the development of high-quality, third-party evaluations that measure advanced AI model capabilities and their associated risks. The program responds to a growing demand for robust assessments across the AI landscape, where the supply of high-quality evaluations remains limited, particularly in safety-relevant areas. The initiative seeks to fund external organizations that can build effective evaluations aligned with Anthropic’s Responsible Scaling Policy, with a focus on critical safety metrics and national security risks.
Applications are invited from any third-party organization capable of proposing rigorous evaluation methods. No restrictions on organizational type are stated, which suggests eligibility may extend to nonprofits, research institutions, academic entities, and private-sector developers with the required domain expertise. Evaluations must be submitted in English, and proposals are reviewed on a rolling basis.
The initiative organizes its funding into three primary areas: AI Safety Level (ASL) assessments, advanced capability and safety metrics, and infrastructure and tools for evaluation development. ASL assessments target cybersecurity risks, chemical, biological, radiological, and nuclear (CBRN) threats, model autonomy, national security implications, and social manipulation. The second category covers metrics for complex scientific understanding, multilingual capabilities, harmful content detection, and broader societal impacts. The third area supports tools, templates, and platforms that streamline the creation and deployment of evaluations, especially for those without coding expertise.
Anthropic encourages proposals that demonstrate high difficulty, originality, and scalability. Evaluations should not overlap with data the models may have been trained on, must be well documented, and should ideally offer expert-level baselines and diverse task formats. Safety-relevant evaluations should convincingly illustrate the risks that would be implied if a model performed well on them.
Applicants will receive funding tailored to their project’s needs and can expect iterative support from Anthropic’s domain experts across its internal teams. Proposals should be submitted via the designated application form, and inquiries can be directed to eval-initiative@anthropic.com.