0 2 mins 5 dys

AI technology serves as a valuable tool but also has the potential to be weaponized. Since the rise of generative AI capabilities, malicious actors have begun exploiting these technologies for their own nefarious purposes.

A recent report highlights how Gemini, a notable AI model, could be compromised through a method known as “Fun-tuning.” This process reveals how hackers might manipulate AI systems to achieve their goals. One tactic employed by hackers is prompt injection.

This technique involves embedding hidden text within a prompt, which can mislead the AI into executing unintended actions. Certain language models struggle to differentiate between prompts crafted by users and those constructed by their developers, making it easier for hackers to trick the system.

A research team from UC San Diego and the University of Wisconsin explored this issue and utilized a specific method of indirect prompt injection on various Gemini models. They introduced the innovative technique known as “Fun-Tuning,” a play on the term fine-tuning, which significantly enhances the chances of deceiving the AI.

By enclosing a prompt in seemingly random text or symbols, the likelihood of the model falling for the deception dramatically increased. Using Gemini 1.5, researchers found that Fun-Tuning boosted the success rate of a malicious prompt to 65%.

Even more concerning, when applied to Gemini 1.0 Pro, the success rate soared to 80%. Interestingly, Gemini has a built-in scoring tool that evaluates how closely a model’s response aligns with the intended output.

This tool, paradoxically, could be repurposed to facilitate the very hacking attempts it aims to mitigate. As of now, it remains unclear whether Google plans to address this vulnerability.

With emerging versions like Gemini 2.0 and 2.5 Pro on the horizon, it is crucial for the company to confront these challenges proactively.

Leave a Reply

Your email address will not be published. Required fields are marked *