LLM Killswitch

Super Weights & GPUHammer - a novel in-memory threat to LLM-based AI

Dan Martin, September 2025
dan.m@camulos.io

Super Weights

Apple's Machine Learning Research (MLR) team recently released a paper on Large Language Model (LLM) "super weights", in which they show that the majority of models contain a small number of super weights which, if set to zero, cause the model to stop working.


For Llama 7B, Llama2 7B, Llama2 13B and Mistral 7B this is a single weight; the largest count observed was just six weights, in Phi3-mini-4k-instruct. In every case the super weights were located in the down_proj layers of the MLP.


The authors show that the role of these weights is essentially to bias the model away from generating stop words; without them (in this case, with the weights set to zero), the model is more likely to generate stop words and thus considerably less likely to generate meaningful content.
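This suppression effect can be caricatured with a toy softmax over four tokens. Everything here is illustrative: the logit values and the size of the suppression term are invented for the sketch, not taken from any real model.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax."""
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy logits: index 0 is a hypothetical stop token, the rest are content tokens.
logits = np.array([2.0, 1.5, 1.0, 0.5])
suppression = 4.0  # stand-in for the bias the super weight contributes

# With the super weight intact, the stop token's logit is pushed down.
with_sw = softmax(logits - np.array([suppression, 0.0, 0.0, 0.0]))
# With the super weight zeroed, the suppression disappears.
without_sw = softmax(logits)

print("stop-token prob with super weight:   ", with_sw[0])
print("stop-token prob without super weight:", without_sw[0])
```

Even in this caricature, removing the suppression term takes the stop token from a negligible probability to the single most likely token, which mirrors the stop-word-dominated output the paper describes.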


We have reproduced the original research findings for the Mistral 7B model and conducted a small extension of the work by exploring additional corruption scenarios. Specifically, in addition to zeroing out the weight value, we evaluated the impact of randomising the weight in a selection of realistic ways: normally distributed, uniform over [0, 1], and uniform over the FP16 range.
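The corruption scenarios can be sketched as follows. This is a minimal illustration only: the small random matrix stands in for the real Mistral 7B down_proj tensor, and the super weight's coordinates are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a down_proj weight matrix (the real tensor is far larger);
# the "super weight" coordinates here are hypothetical.
W = rng.standard_normal((8, 8)).astype(np.float16)
row, col = 3, 5

def corrupt(W, row, col, mode):
    """Return a copy of W with one weight corrupted per the named scenario."""
    W = W.copy()
    if mode == "zero":
        W[row, col] = 0.0                         # Apple MLR's original scenario
    elif mode == "normal":
        W[row, col] = rng.standard_normal()       # normally distributed
    elif mode == "uniform01":
        W[row, col] = rng.uniform(0.0, 1.0)       # uniform over [0, 1)
    elif mode == "uniform_fp16":
        W[row, col] = rng.uniform(-65504.0, 65504.0)  # uniform over finite FP16 range
    return W

for mode in ("zero", "normal", "uniform01", "uniform_fp16"):
    print(mode, corrupt(W, row, col, mode)[row, col])
```

In the actual experiments the corrupted tensor would be written back into the loaded model before generation; here only the single-coordinate corruption itself is shown.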


Our key result is that in all weight-randomisation scenarios we observe the same behaviour reported by Apple MLR: high-probability tokens dominate the model's output, considerably reducing meaningful content generation and effectively leaving the model unusable.


While Apple MLR discuss super weights as a mechanism for understanding LLM behaviour, we are concerned about the possibility of them being used as an attack vector.

Row Hammer & GPUHammer

Row Hammer is a computer security exploit based on the way dynamic random-access memory (DRAM) works. By repeatedly "hammering" a particular memory row, accessing it many times in quick succession, an attacker can cause electrical disturbance that leaks charge from physically adjacent rows, flipping bits at memory addresses that were never accessed.


If the attacker is sharing a machine with another user, it might be possible to use a Row Hammer attack to change the memory of a program owned by that other user. Since the original disclosure in 2014 there have been several follow-up works, including Drammer, a root-exploit attack on Android devices, and Rowhammer.js, a pure JavaScript implementation that runs in Firefox, notable given that this is in practice a very low-level attack. There have also been multiple follow-on works on different hardware and architectures.


This year at USENIX, Lin et al. demonstrated GPUHammer, which extends Row Hammer to Graphics Processing Unit (GPU) memory (specifically GDDR6 memory on an NVIDIA A6000 GPU). They demonstrate the degradation of a victim's machine learning model's accuracy from 80% to 0.1% using a single bit flip. The attack targets deep neural networks trained for image classification; several networks were tried, including AlexNet.
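A single bit flip can be this damaging because of floating-point encoding: flipping a high exponent bit changes a weight's magnitude by orders of magnitude. Here is a minimal FP16 illustration; the weight value is arbitrary, not taken from any real model.

```python
import numpy as np

def flip_bit(x, bit):
    """Flip one bit in the raw 16-bit representation of an FP16 value."""
    raw = np.array(x, dtype=np.float16).view(np.uint16)
    raw ^= np.uint16(1 << bit)
    return raw.view(np.float16)[()]

w = np.float16(0.0123)  # a typically small neural-network weight (arbitrary)
# In IEEE 754 half precision (1 sign, 5 exponent, 10 mantissa bits),
# bit 14 is the most significant exponent bit; flipping it scales a
# normal value's magnitude by 2**16.
print(w, "->", flip_bit(w, 14))
```

One such flip turns a weight of roughly 0.01 into one of several hundred, easily enough to swamp a layer's activations, which is consistent with the accuracy collapse GPUHammer reports.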

LLM 'killswitch' attack?

Combining the knowledge of super weights with the capabilities of GPUHammer may make it possible to attack deployed LLMs: randomised changes to a super weight have been shown to break model output, and GPUHammer-inspired in-memory attacks have the ability to flip a sufficient number of bits to achieve this effect, a form of runtime denial-of-service attack on the model.


If the LLM were being used simply as a chatbot, this attack would be detected by human users: they would see bizarre responses and likely complain quickly, allowing operations and security teams to investigate and remediate fairly rapidly.

However, there are several other use cases where such an attack could potentially go undetected for a long period of time. Such scenarios include, but are not limited to:

  • LLMs used for monitoring, e.g. to detect unacceptable behaviour on a chat platform

  • LLMs used as judges to train other models

  • Probes trained on LLM internals for classification, where the output of the LLM itself isn't required

  • Agentic systems

Close

Apple's super weights research shows LLM models to be so brittle that a single weight-value change can render them useless. This opens up the potential for GPUHammer-inspired memory attacks to become a new threat to AI systems: hard-to-detect in-memory attacks, rather than simple model-file modification attacks.

The ability to achieve strong observability of GPU operations, and to actively monitor for the atypical memory-access patterns that are the signature of a "hammer" attack, may well become an essential component of a strong AI security posture.