Dual-use foundation models with widely available model weights could plausibly exacerbate the risks AI models pose to public safety by allowing a wider range of actors, including irresponsible and malicious users, to leverage the existing capabilities of these models and augment them to create more dangerous systems.33 For instance, even if the original model has built-in safeguards that prohibit certain prompts that may harm public safety, such as content filters,34 blocklists,35 and prompt shields,36 direct access to model weights can allow individuals to strip these safety features.37 While people may be able to circumvent these mechanisms in closed models as well, direct access to model weights allows these safety features to be circumvented more easily. Moreover, such modifications require far fewer resources and less technical knowledge than training a new model from scratch. The resulting actions may be difficult to monitor, oversee, and control unless the individual uploads the modified model publicly.38 And, as with all digital data in the Internet age, the release of model weights cannot feasibly be reversed.

Widely available model weights could also exacerbate the risk that non-experts use dual-use foundation models to design, synthesize, produce, acquire, or use chemical, biological, radiological, or nuclear (CBRN) weapons. Open model weights could increase this risk because they are:

1. more accessible to a wider range of actors, including actors who otherwise could not develop advanced AI models or use them in this way (either because closed models lack these capabilities, or because they cannot "jailbreak" them to generate the desired information); and

2. easy to distribute, which means that the original model and augmented, offshoot models, as well as instructions for how to exploit them, can be proliferated and used for harm without developer knowledge.
National Telecommunications and Information Administration (NTIA)