Research Papers and Reports – AI Standards Lab

AI Standards Lab publishes research on AI safety engineering, AI governance, and standards development. This list includes both lab publications and papers co-authored by our team members.

Open Problems in AI Incident Governance

July 7, 2026

AI systems may produce failures after deployment that pre-deployment safety assessments do not anticipate. Managing these failures requires adequate AI incident governance, encompassing sound definitions, taxonomies, monitoring practices, reporting mechanisms, and incident analysis. We examine existing frameworks from regulatory bodies (including the EU AI Act, California’s SB 53, and New York’s RAISE Act) and independent…
Recommendations for the EU AI Act Digital Omnibus Trilogue

April 7, 2026

We have published a report analysing the Council and European Parliament positions going into the EU AI Act Omnibus trilogue. We make several recommendations for the parties engaged in the trilogue. Our main concern in making these recommendations is that, while it is positive if the Omnibus can remove some administrative burdens, it should not…
Defining AI Models and AI Systems: A Framework to Resolve the Boundary Problem

March 19, 2026

AI regulation assigns distinct obligations to providers of AI models and AI systems, but the lack of clear, consistent definitions for “AI model” and “AI system” creates ambiguity across the value chain. This paper surveys the definitions used in academic literature and regulatory documents, and proposes conceptual and operational definitions for drawing a principled boundary…
Recommendations on the European Parliament Amendments to the EU AI Act in the Digital Omnibus

March 5, 2026

We have published a report analysing some of the 750+ AI Act amendments that were proposed by the European Parliament in the context of the EU AI Act Omnibus. We provide a first analysis of these amendments, highlighting specific ones that we either welcome or oppose, based on our area of expertise.
A Scorecard for the Quality of AI Evaluations

February 23, 2026

We have published a working draft of a Quality Scorecard for AI Evaluations, a standards-based framework for assessing the reliability, validity, and rigour of AI evaluations. The scorecard provides structured scoring across five dimensions and a classification system to match evaluations to appropriate governance and deployment contexts.
Recommendations on the Digital Omnibus Amendments to the EU AI Act

January 23, 2026

We analysed the Commission’s Digital Omnibus proposals for the AI Act, highlighting concerns with Article 6(4) database deletion, Article 75(1) enforcement centralisation, and Article 4a data processing rules, whilst proposing targeted amendments to address critical regulatory gaps.
Agentic Product Maturity Ladder V0.1

December 1, 2025

MLCommons releases the Agentic Product Maturity Ladder V0.1, a systematic framework defining six progressive maturity levels (R0–R5) for benchmarking AI agent reliability. Initial assessment of four task domains shows no agents yet meet thresholds for product-level capability benchmarking.
Safety Frameworks and Standards: A comparative analysis to advance risk management of frontier AI

October 9, 2025

This research memo compares Frontier Safety Frameworks (FSFs) with international risk management standards. FSFs offer frontier-specific innovations like capability thresholds but often leave key considerations implicit. Standards provide systematic rigour but were not designed for frontier AI. The paper shows how integrating both approaches can advance frontier AI risk management.
An Analysis of the GPAI model guidelines published by the European Commission

July 29, 2025

On July 18, 2025, the European Commission published its first Guidelines on the scope of obligations for providers of general-purpose AI models under the AI Act. In this post, we provide an analysis of these guidelines. As these obligations for providers take effect on 2 August 2025, we decided that a timely publication of…
Deprecating Benchmarks: Criteria and Framework

July 8, 2025

As AI models rapidly advance, many benchmarks become outdated or flawed yet continue to be used, inflating performance claims and obscuring safety concerns. This paper introduces criteria and a framework for deprecating inadequate benchmarks, with recommendations for developers, policymakers, and governance actors on how to maintain rigorous evaluation standards.