A study conducted by Northwestern University revealed that prompt injection attacks on custom GPT models are highly effective, with a 97.2% success rate in extracting system prompts and a 100% success rate in leaking files.
Moreover, people frequently tend to share critical data with large language models (LLM). For instance, when Samsung lifted the ban on employees using ChatGPT, three data leaks were reported. In one of these, an employee prompted the entire source code of a faulty semiconductor database while trying to ask for help.
A report by LayerX, which analysed ChatGPT and other generative AI app usage for 10,000 employees, found that 6% of the employees pasted sensitive data into GenAI platforms such as OpenAI’s ChatGPT and Google’s Gemini, whereas 4% pasted such sensitive data on a weekly basis.
This put organisations on red alert, especially when data leaks are quite common through prompt injection attacks.
This makes it crucial for enterprises to have some sort of guardrails between the AI chatbot and the user, to identify the intentions of demanding censored data.
Firewall Matters (a lot)!
In a recent conversation with AIM, Ruchir Patwa from SydeLabs demonstrated how easy it is to prompt ChatGPT to get the desired information, be it the step-by-step instruction to make a bomb or rob a bank.
Patwa mentioned that even if you were to use an open-source LLM or a model built from scratch, you must have a guard between the prompt and the data. “In the AI era, intention matters more than data,” he said, further explaining how a crafted prompt injection goes beyond data.
This is where the idea of an AI firewall kicks in. An AI firewall can analyse a user’s prompt and prevent injection attacks and data exfiltration attempts, similar to vulnerabilities faced by traditional web and API applications.
This is one of the reasons why Cloudflare recently announced an AI firewall to safeguard organisations from prompt injections. Apart from Cloudflare, there are other companies like Nightfall AI providing similar services.
One would assume that adding a filter to the input prompt would stop the AI abuse, but it won’t. A skilled prompt engineer will always find new ways to manipulate the chatbot.
A Reddit user suggested that just filtering out prompt inputs will not work. “The API needs to be structured to prevent malicious attacks by design, instead of filtering them. You can’t filter out everything,” he added.
Businesses Already Implementing AI Firewalls
F5 has partnered with Prompt Security to deliver a firewall for AI applications on F5 Distributed Cloud Services, which can be easily deployed within F5 Distributed Cloud AppStack.
As mentioned earlier, Cloudflare has also developed an AI firewall to protect the traffic going through its network. However, it can also be deployed on models hosted on any other third-party infrastructure.
Darktrace, a cybersecurity company, has an in-house solution called Cyber AI Analyst to detect and prevent chatbot abuse.
While companies have started adopting AI firewalls, which will surely prevent attackers, filtering the output can also be an effective solution. This can be done by instructing the system to scan the output, rather than the input prompt itself.