Somehow, following one of the innumerable recent commentary posts or threads on the latest AI chatbot developments and what they portend for AI safety, I got to reading a case for panic, which led me to Stephanie Losi's excellent, levelheaded list of mitigation protocols that might plausibly reduce the risk of catastrophic runaway AI (note she now has an update in light of the newest developments). Stephanie in turn pointed me to the NIST "playbook" for managing AI risk, currently (but only until February 27th!) open for public comment. I submitted a public comment, which I reproduce below in the spirit of an "open letter" to invite further comment. If you're not interested in the nuts and bolts of hypothetical AI risk management frameworks, you can stop reading now.

To whom it may concern:
I read with interest the recent AI risk management draft playbook from NIST and am thankful for the opportunity to submit comment. I’m glad there’s some standards agency doing this, and laying out a list of high-level worthwhile actions is a good start. But I think the playbook falls short of true usefulness in three ways:
It does not offer actionable advice on actual safety-increasing best practices, focusing instead on the sorts of processes that would create, catalog, and codify those best practices. This is a missed opportunity because, despite the newness of the AI field, historical experience with risky systems already gives us plenty of specific best practices worth recommending.
It is not well organized to distinguish between, or prioritize, measures that reduce risk of life-endangering catastrophes and measures that reduce other risks (e.g. legal risk, bias risk). In light of growing domain expert alarm about catastrophic risk specifically, this is a missed opportunity to maximize the impact and focus of a safety framework. The point is not that other categories of risk don’t matter, but that given the magnitude of potential material impacts, life-endangering catastrophic risk clearly deserves center stage.
It does not incorporate lessons learned from the successes, failures, and general approaches of other existing frameworks for regulating high-risk critical infrastructure, including both government and private sector approaches. Again, I acknowledge that AI is new and in some important respects different, but nonetheless there are parallels to other forms of software development as well as hardware infrastructure, where other risk management frameworks have had both great successes and notable failures in effectiveness and efficiency, and it's worth learning from both. Relevant examples include the DO-178C software development standards for avionics, the SOC2 security compliance framework, and possibly other frameworks and standards used in situations such as nuclear regulation.
Let’s look in detail at where the NIST playbook could and should have done better in these areas.
Give actionable specifics
GOVERN 1.2 suggests the creation of a bunch of important risk management policies, notably: risk mapping and measurement, testing and validation, change management, and incident response. GOVERN 1.3 likewise gives a useful taxonomy of important impact measurement activities.
However, these sections of the playbook say nothing about what these policies and activities should contain or prescribe. Later on there are indirect gestures at some of them, e.g. GOVERN 4.1 mentioning three lines of defense and red-teaming; MAP 3.3 suggesting narrowed scope; MAP 3.5 on interpretability; MEASURE 2.4 on monitoring for anomalies; MEASURE 2.5 again discussing operating scope; MEASURE 2.6 on chaos engineering; MEASURE 2.7 again on red teaming; MEASURE 3.2 on emergent risks; and MANAGE 2.4 on bypass and shutoff mechanisms. But there's nothing about how red teams should operate, or what chaos engineering-based testing approaches should entail, or how stringent change management requirements should be.
It may be that the playbook authors considered AI to be too new a field to be able to specify anything in these areas yet. But we know existing risk management experts can already say useful things about how AI governance mechanisms should work, because they're saying them. For example, Stephanie Losi's blog post at riskmusings.substack.com/p/possible-paths-for-ai-regulation discusses specific kinds of best practices likely to be useful in the AI domain, in areas such as defense in depth, redundancy and diversity of controls, separation of duties, immutable logging, near-miss reporting, and more. And it gives a long list of specific questions that good AI risk management policies must answer. I recognize that it's not easy to balance prescriptiveness and flexibility in a regulatory framework, but the NIST framework as written needlessly allows for box-checking policies that don't substantively address the major risks or use widely known best practices to counter them.
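To make the distance between a box-checking control and a substantive one concrete, here is a minimal sketch, in Python, of two of the practices Losi describes: an append-only, hash-chained audit log and an anomaly monitor that trips a shutoff after repeated out-of-band readings. Everything here is my own illustration rather than anything drawn from the playbook or her post, and names like ImmutableLog and AnomalyGuard are hypothetical; the point is only to show the level of specificity a playbook entry could aim at.

```python
import hashlib
import json
import time
from dataclasses import dataclass


@dataclass
class LogRecord:
    """One append-only audit entry, chained to its predecessor by hash."""
    timestamp: float
    event: str
    prev_hash: str
    entry_hash: str = ""

    def seal(self) -> "LogRecord":
        payload = json.dumps(
            {"timestamp": self.timestamp, "event": self.event, "prev_hash": self.prev_hash},
            sort_keys=True,
        )
        self.entry_hash = hashlib.sha256(payload.encode()).hexdigest()
        return self


class ImmutableLog:
    """Hash-chained log: tampering with any entry breaks every later hash."""

    def __init__(self) -> None:
        self._records: list[LogRecord] = []

    def append(self, event: str) -> None:
        prev = self._records[-1].entry_hash if self._records else "genesis"
        self._records.append(LogRecord(time.time(), event, prev).seal())

    def verify(self) -> bool:
        prev = "genesis"
        for rec in self._records:
            expected = LogRecord(rec.timestamp, rec.event, prev).seal().entry_hash
            if expected != rec.entry_hash or rec.prev_hash != prev:
                return False
            prev = rec.entry_hash
        return True


class AnomalyGuard:
    """Monitors a numeric health signal and trips a shutoff when it stays
    outside an allowed band for too many consecutive checks."""

    def __init__(self, low: float, high: float, max_strikes: int, log: ImmutableLog) -> None:
        self.low, self.high = low, high
        self.max_strikes = max_strikes
        self.strikes = 0
        self.tripped = False
        self.log = log

    def observe(self, value: float) -> bool:
        """Return True if the system may keep operating."""
        if self.tripped:
            return False
        if self.low <= value <= self.high:
            self.strikes = 0
            self.log.append(f"ok value={value:.3f}")
            return True
        self.strikes += 1
        self.log.append(f"anomaly value={value:.3f} strike={self.strikes}")
        if self.strikes >= self.max_strikes:
            self.tripped = True
            self.log.append("SHUTOFF: anomaly threshold exceeded")
            return False
        return True


if __name__ == "__main__":
    log = ImmutableLog()
    guard = AnomalyGuard(low=0.0, high=1.0, max_strikes=3, log=log)
    for reading in [0.4, 0.7, 1.5, 1.8, 2.2, 0.5]:
        if not guard.observe(reading):
            print("Operation halted pending human review.")
            break
    print("Log intact:", log.verify())
```

Even a toy like this forces answers to the questions the playbook leaves open: what counts as an anomaly, how many strikes trigger a shutoff, and how tampering with the record gets detected.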
Put the worst risks first
Several sections of the playbook lump together disparate types of risks of widely varying severities. For example, MANAGE 1.3 recommends “Prioritize risks involving physical safety, legal liabilities, regulatory compliance, and negative impacts on individuals, groups, or society.” One of these things is not like the others, and it’s the one where people die. Failing to prioritize that risk above all others, and to focus regulatory prescriptions on it, drastically reduces the usefulness of the NIST framework.
By comparison, the DO-178C standard for avionics software development has five levels of regulatory stringency, stratified by the level of risk to human life from failure of the software being regulated. Level A is for software whose failure could make the plane crash. Level E is for cases where failure is a minor inconvenience, like the in-flight entertainment not working. The standard spells out the specific additional practices and degrees of stringency required as the risk to life climbs.
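As a rough sketch of what that kind of stratification could look like if carried over to AI: the tier names below follow DO-178C’s failure-condition categories, but the control sets attached to each tier are my own illustrative guesses, not anything NIST or the avionics standard prescribes.

```python
from enum import Enum


class FailureConsequence(Enum):
    """Worst-case outcome if the system misbehaves (modeled on the
    failure-condition categories DO-178C uses to assign its levels)."""
    CATASTROPHIC = 1      # loss of life plausible
    HAZARDOUS = 2
    MAJOR = 3
    MINOR = 4
    NO_SAFETY_EFFECT = 5


# Illustrative only: which controls a tiered AI framework *might* require
# at each consequence level. The control names are hypothetical, not NIST's.
REQUIRED_CONTROLS = {
    FailureConsequence.NO_SAFETY_EFFECT: {"basic_testing"},
    FailureConsequence.MINOR: {"basic_testing", "change_management"},
    FailureConsequence.MAJOR: {"basic_testing", "change_management",
                               "independent_review", "incident_response_plan"},
    FailureConsequence.HAZARDOUS: {"basic_testing", "change_management",
                                   "independent_review", "incident_response_plan",
                                   "red_team_exercises", "anomaly_monitoring"},
    FailureConsequence.CATASTROPHIC: {"basic_testing", "change_management",
                                      "independent_review", "incident_response_plan",
                                      "red_team_exercises", "anomaly_monitoring",
                                      "hardware_shutoff", "external_audit"},
}


def controls_for(consequence: FailureConsequence) -> set[str]:
    """Return the control set a deployment at this tier would have to satisfy."""
    return REQUIRED_CONTROLS[consequence]


if __name__ == "__main__":
    # An AI agent with write access to grid-control infrastructure would sit
    # at the top tier; a movie-recommendation model near the bottom.
    print(sorted(controls_for(FailureConsequence.CATASTROPHIC)))
    print(sorted(controls_for(FailureConsequence.NO_SAFETY_EFFECT)))
```

The value of a structure like this is less the particular control names than the rule it encodes: the worse the worst-case failure, the longer and stricter the mandatory list.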
There has been much recent controversy over dire warnings that superintelligent AI could destroy the entire human species. We need not take a side in that controversy to observe that malfunctioning AI with access to critical physical infrastructure could cause damage at least as severe as a plane crash, or a major terrorist attack, or a nuclear accident. The regulatory frameworks that protect against those kinds of catastrophes are precise and specific: their analyses, monitoring and testing steps, and fail-safes are all targeted at reducing those risks to human life. A framework designed for a tool as powerful as AI deserves no less. To say this is not to dismiss fears about the impacts of e.g. AI bias, but to put them in proper perspective.
Learn from existing practice
In a novel regulatory framework for a software tool with catastrophic failure potential, I would have expected explicit and credited borrowings from other frameworks, public and private, for regulating tools with such potential. The DO-178C avionics framework mentioned above, with which I’m most personally familiar, is only one example; others include the SOC2 security auditing and best-practice standards and the NRC rules governing nuclear power development, and I’m sure people with broader experience (especially outside the US) could list many more.
Moreover, learning from these frameworks is not just a matter of copying or adapting the best practices they prescribe. In many cases there is years’ or decades’ worth of practical experience with these frameworks in use, and regulators, auditors, and developers subject to the regulations could all usefully be consulted for lessons learned. Which provisions have turned out to substantively improve safety, and which are box-checking bureaucracy? Which are easiest and hardest to comply with? What kinds of risk analyses are useful when the excrement hits the air-conditioning, and what kinds are expensive exercises in making up phony numbers and castles-in-the-air scenarios?
It may be that NIST has consulted such experts already. Even if so, cataloging those consultations and remarking on lessons learned would be a worthwhile adjunct to the framework, to show that it is in conversation with the existing human wisdom on risk management and will continue to be so.
Come to think of it, it would also be good to see evidence of this framework drawing on much deeper histories of risk management in technical domains we now consider settled and benign. An OpenAI scientist recently took a lot of online flak for comparing the risks of AI to those of electricity: twitter.com/woj_zaremba/status/1627370896095838208?lang=en. But the analogy is fruitful. We trust electricity in our homes because we know it operates according to a series of well-understood laws, and because the minority of people who understand those laws have engineered gadgets that enable the rest of us, who don’t understand the laws at all, to use its power safely with ease. Those gadgets are validated for safety according to detailed, consistent procedures at both design and manufacturing stages. None of that was true when electricity was as new a field as AI is now.
The goal of an AI risk management framework should surely be to make AI more and more like electricity in those respects over time. This NIST framework doesn’t help achieve that goal yet, but it could.