Ethical Jailbreaking Poised to Become a Multi Million-Dollar Industry

While the results of jailbreaking are amusing to watch, they also highlight a significant hurdle for companies in ensuring safety of their proprietary models.

Share

Published on June 18, 2024

by Donna Eva

Less than 12 hours after its release, Luma AI’s Dream Machine was jailbroken.

The San Francisco-based startup released Dream Machine, a generative AI video model, on June 12. Capable of rivalling video generation models like OpenAI’s Sora and Google’s Veo, the model went viral soon after launch, thanks to its ability to produce high-quality videos based on text and image inputs.

Shortly after its launch, a renowned jailbreaker announced on X that the model had been successfully jailbroken to generate sexually explicit and gory videos with major hallucinations, contrary to Luma’s policy.

Going by the name ‘Pliny the Prompter’, he is an active member of the AI jailbreaking community, having been one of the first to break several models right after their launch. This includes the recently launched GPT-4o, which he was able to prompt into telling him how to make nuclear weapons.

The advent of AI has managed to create a robust community of jailbreakers that are hellbent on testing the limits of proprietary models soon after their launch. Like Dream Machine and GPT-4o, jailbreakers have managed to break several models over the past couple of years.

https://twitter.com/elder_plinius/status/1802593725392228591

Speaking to AIM, Pliny said that part of the reason why jailbreaking is so important is that a small group of companies should not be allowed to sanitise the information people are provided by AI.

“I do it both for the fun/challenge and to spread awareness and liberate the models and the information they hold. I don’t like that a small group is arbitrarily deciding what type of information we’re ‘allowed’ to access/process,” he said.

While that might have been the initial thought behind jailbreaking, with companies seeing it as antagonistic, opinions around jailbreaking have slowly turned positive.

Where Does Jailbreaking Stand Now?

Recently, Anthropic published a post outlining their red teaming efforts. The company highlighted a myriad of methods to effectively red team their systems that could help better their AI models for proprietary use.

“Through this practice, we’ve begun to gather empirical data about the appropriate tool to reach for in a given situation and the associated benefits and challenges with each approach,” they said.

While the results of jailbreaking are amusing to watch, they also highlight a significant hurdle for companies in ensuring safety in their proprietary models.

Jailbreaking serves the purpose of helping companies figure out gaps in their models that need to be addressed. “Ethical” jailbreaking is a term that has been thrown around quite often, with major AI companies shifting towards enlisting the help of external contractors in finding flaws in their systems.

SRE engineer James Sawyer, who hosts a widely used repository of ChatGPT jailbreaking prompts, spoke to AIM about how the optics of jailbreaking are currently shifting towards a more positive focus.

“Right now, the AI community is buzzing with this concept. As these models get smarter, they also get trickier, and it’s easier for them to pick up bad habits or make mistakes. Everyone’s realising that we need to get ahead of these issues before they cause real problems,” he said.

Speaking to AIM, Pliny confirmed that he has helped in red teaming efforts for unreleased models. “I have done some red teaming on unreleased models myself, can’t say which ones for obvious reasons,” he said.

Jailbreaking, in general, seems to have become a desirable skill over the past few years, thanks to the insights it provides on where AI systems are lacking.

Thanks to this, many believe that it has the potential to turn into an industry, much like ethical hacking, or white hatting has over the last few decades.

Jailbreaking as an Industry

“It’s definitely becoming an industry! I think the community might have mixed ideas on hiring jailbreakers, but overall, I think they support it as a noble pursuit,” Pliny said.

This seems to be true on the opposite side as well. Like Anthropic, OpenAI also recently spoke on their red teaming efforts. In September last year, the company published an open call for red teaming experts, including outside experts, “to make our models safer.” They further emphasised this in their safety update in May this year.

Likewise, Microsoft also released an update on their red teaming efforts the same month.

Similarly, one member of the community who also works for a penetration testing (pentest) company told AIM that they had been tasked with LLM testing for their clients.

“I currently do LLM testing. This is the first thing we’ve done with any AI companies/AI-specific anything for my team (likely a bit behind the ball, we just started), but I assume some are definitely ahead of us,” they said, on condition of anonymity.

Additionally, they, like many, believe that as more companies integrate AI, the types of AI testing required by cybersecurity companies are likely to expand as well.

Sawyer agrees. “Looking ahead, I think we’re going to see more and more of these roles popping up.

“Just like ethical hacking became a recognised career path, ethical jailbreaking could follow the same trajectory. The skills and insights from the jailbreaking community are becoming more valuable, and it’s exciting to see where this could lead,” he said.

Already, AI security startups have begun offering their services in addressing potential threats. However, these are largely preventative, as they specialise in offering solutions to prevent prompt injection and unauthorised access to data.

Yet, in the last year alone, AI security startups managed to raise over $130.7 million. With the increased focus on red teaming by both companies and security startups, this could likely burgeon in the next couple of weeks, with ethical jailbreaking itself accounting for a major portion of AI security investments.

📣 Want to advertise in AIM? Book here

Donna Eva

Donna is a technology journalist at AIM, hoping to explore AI and its implications in local communities, as well as its intersections with the space, defence, education and civil sectors.