LLM Unicode Prompt Injection

January 17, 2024

Be careful copying AI prompts…

It has become common place on social media to see posts sharing “super prompts” or prompt templates. Researchers have discovered a technique that uses unicode to hide prompt injection as non-printable characters1.

Prompt injection, a term coined by Simon Willison, is a type of attack that attempts to override a user or application prompt to either alter the results or to exfiltrate earlier elements of the prompt or used in retrieval augmented generation (RAG). It is a real challenge for LLM apps at the moment as there are no completely reliable mitigation techniques.

Prompt injection is not the only use for this new technique, it can be used as a part of poisoning LLM training data making it very hard to detect. LLM poisoning can cause issues for advanced applications of LLMs such as agents and interpreters2

Time to start practicing safe prompting…

The linked tweet has these recommendations:

“If you’re building an AI feature or app, strip out bad invisible unicode or disallow unicode beyond the basic emojis from going into the llm.”
“If you’re doing something sensitive with AI and you are copying and pasting from anywhere, paste it into a website where you can see hidden characters”