Indirect prompt injection: how people manipulate neural networks
A new study by Kaspersky reveals a growing trend of “indirect prompt injection”, a technique used to manipulate the outputs of large language models (LLMs) like ChatGPT and search chatbots powered by AI. While no instances of serious destructive actions by chatbots have been found, the potential for misuse remains.
LLMs are powerful tools used in various applications, from document analysis to recruitment and even threat research. However, Kaspersky researchers discovered that a vulnerability where malicious actors can embed hidden instructions within websites and online documents is being exploited in the wild. These instructions can then be picked up by LLM-based systems, potentially influencing search results or chatbot responses.
The study identified several uses for indirect prompt injection:
1. HR-related injections: Job seekers are embedding prompts in resumes to manipulate recruitment algorithms and ensure favorable outcomes or prioritization by AI systems. Techniques including using small fonts or matching text color to the background are applied to hide the attack from human reviewers.
2. Ad injections. Advertisers are placing prompt injections in landing pages to influence search chatbots to generate positive reviews of products.
3. Injection as protest. Individuals opposed to the widespread use of LLMs are embedding protest prompts in their personal websites and social media profiles, expressing their dissent through humorous, serious, or aggressive instructions.
4. Injection as insult. On social media, users are employing prompt injection as a form of insult or to disrupt spam bots, often with requests to generate poems, ASCII art, or opinions on political topics.
While the study found no evidence of malicious use for financial gain, it highlights potential future risks. For instance, attackers could manipulate LLMs to spread misinformation or exfiltrate sensitive data.
“Indirect prompt injection is a novel vulnerability that highlights the need for robust security measures in the age of AI. By understanding these risks and implementing appropriate safeguards, we can ensure that LLMs are used safely and responsibly”, comments Vladislav Tushkanov, Research Development Group Manager at Kaspersky’s Machine Learning Technology Research Team.
To protect your current and future systems based on large language models (LLMs), consider the following advice:
· Understand the potential vulnerabilities in your LLM-based systems and evaluate the risks associated with prompt injection attacks.
· Be aware of reputational risks, as marketing bots can be manipulated to make radical statements, leading to potential reputational damage.
· Acknowledge the limits of protection. Complete protection against prompt injection is not possible, especially with more complex attacks like multimodal injections.
· Use input and output moderation tools to filter inputs and outputs of LLMs, though they may not offer total security.
· Recognize the risks that arise from processing untrusted or unverified content in LLM systems.
· Restrict the decision-making capabilities of AI systems to prevent unintended actions.
· Ensure all computers and servers running LLM-based systems are protected with up-to-date security tools and practices.