Responsible AI Beyond Checklists

There’s a pattern that’s become familiar in AI development: a team builds a system, ships it, then someone asks “but is it ethical?” The answer is usually a checklist. Did you check for bias? Did you document the model card? Did you review against the internal guidelines? Boxes are ticked. The system ships.

This isn’t ethics. It’s liability management dressed up as ethics.

The checklist problem

Checklists are useful when you know what you’re looking for. They work well in aviation because the failure modes of a 737 are well understood and finite. The failure modes of a large-scale ML system deployed in a social context are neither.

A bias audit on a hiring model might catch demographic disparities in predictions. It won’t catch the upstream problem that the training data reflects decades of discriminatory hiring. It won’t catch that the model is being used in a context its designers never envisioned. It won’t catch the organisational dynamics that make it hard for someone to raise concerns after deployment.
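
To make that concrete, here is roughly what the core of such an audit computes. Treat it as a minimal sketch rather than a real audit tool: the function names, the group labels "A" and "B", and the prediction records are all hypothetical and made up for illustration.

```python
from collections import defaultdict


def selection_rates(records):
    """Return the fraction of positive ("hire") predictions per group."""
    totals = defaultdict(int)
    selected = defaultdict(int)
    for group, predicted_hire in records:
        totals[group] += 1
        selected[group] += int(predicted_hire)
    return {group: selected[group] / totals[group] for group in totals}


def disparity_ratio(rates):
    """Lowest selection rate divided by the highest (1.0 means parity)."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody was selected in any group, so no disparity to report
    return min(rates.values()) / highest


# Hypothetical model outputs: (group label, did the model predict "hire"?)
predictions = [
    ("A", True), ("A", True), ("A", False), ("A", True),
    ("B", True), ("B", False), ("B", False), ("B", False),
]

rates = selection_rates(predictions)
ratio = disparity_ratio(rates)
print(rates, round(ratio, 2))
```

The only input this check ever sees is the list of predictions. Where the training labels came from, where the model ends up being used, and whether anyone feels able to object are not parameters of either function, so no threshold on the ratio can speak to them.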

Checklists give organisations the language to say “we did the responsible AI work” without engaging with the harder, more contextual questions.

What a design-driven approach looks like

The alternative isn’t to abandon process. It’s to treat responsibility as a design constraint from the start, not a review at the end.

This means asking, during scoping: who is affected by this system, and were they involved in defining the problem? It means building in mechanisms to surface harms that emerge post-deployment, not just pre-launch. It means creating incentive structures where raising concerns is rewarded, not penalised.

It also means accepting that some systems shouldn’t be built. That’s the hardest part, because it conflicts with the default assumption that if something is technically possible and commercially viable, it should exist.

The role of interdisciplinary teams

No single discipline has the tools to do this well. ML engineers understand the technical failure modes. Social scientists understand institutional and structural dynamics. Lawyers understand regulatory context. Ethicists can reason clearly about values in conflict. The problem is that these people rarely sit in the same room, and when they do, they often don’t share a common vocabulary.

Building that shared language is slow, unglamorous work. It doesn’t produce a paper. It doesn’t ship a feature. But it’s probably the most important infrastructure a serious AI organisation can invest in.

I’m thinking about this a lot as Neuromatch expands its curriculum on responsible AI. More on that soon.