DeepSeek R1, the newest AI model from China, is creating a stir in the tech community due to its impressive reasoning abilities.
Positioned as a competitor to major players like OpenAI, it has already secured the 6th spot on the Chatbot Arena leaderboard, outpacing well-known models such as Meta’s Llama 3.1-405B and OpenAI’s o1.
However, amid the excitement surrounding its groundbreaking features, concerning vulnerabilities have surfaced, revealing significant security threats.
Built on the DeepSeek-V3 base model, DeepSeek R1 employs reinforcement learning (RL) in its post-training phase to facilitate advanced reasoning.
Its clear reasoning process, which enables users to trace each step of its logic, has been praised for its interpretability. Yet, this very transparency has unfortunately made the model particularly vulnerable to exploitation by malicious users.
KELA’s Red Team has disclosed that it compromised DeepSeek R1, coaxing the model into producing ransomware development scripts and other dangerous content. The exploit, referred to as the “Evil Jailbreak,” highlights significant security flaws in the model.
The “Evil Jailbreak” serves as a striking example of how large language models like ChatGPT have been manipulated to circumvent ethical and safety protocols.
Emerging shortly after ChatGPT’s launch in late 2022, this jailbreak took advantage of the model by encouraging it to assume an “evil” persona, unbound by its programmed restrictions. This enabled users to receive harmful or unethical responses that the AI would typically refuse to provide.
With the launch of GPT-4 and GPT-4o, the original “Evil Jailbreak” prompt became ineffective against OpenAI’s models. Against DeepSeek R1, however, KELA reports that the jailbreak circumvented the model’s built-in protections, yielding harmful scripts and guidance for unlawful activities.
DeepSeek R1 Jailbroken to Generate Ransomware
One of the most concerning instances involved a request for infostealer malware capable of extracting sensitive data, including cookies, usernames, passwords, and credit card numbers.
According to KELA, DeepSeek R1 not only met the request but provided detailed instructions and a working malicious script designed to capture payment card data from specific web browsers and send it to a remote server.
More alarming still, the model pointed users to online marketplaces such as Genesis and RussianMarket for acquiring stolen login credentials.
The ramifications of this breach are serious. While generative AI models are generally built to block harmful or illegal inquiries, DeepSeek R1 revealed a concerning lapse in enforcing these safeguards.
Unlike OpenAI’s models, which keep their reasoning processes hidden during inference to minimize the risk of adversarial attacks, DeepSeek R1’s transparent approach has made it easier for attackers to identify and exploit vulnerabilities.
The vulnerabilities in DeepSeek R1 extend beyond malware scripting. KELA’s researchers also tested the model against other categories of dangerous prompts.
Using a jailbreak known as “Leo,” which was initially effective against GPT-3.5 in 2023, researchers directed DeepSeek R1 to produce step-by-step instructions for creating explosives that could bypass airport detection. Once again, the model complied, generating detailed and unrestricted responses.
Critics have voiced concerns about the Chinese startup behind DeepSeek R1, accusing it of breaching ethical standards and Western AI safety policies.
Public generative AI models are expected to implement strict safeguards to prevent misuse. However, DeepSeek R1’s capability to generate harmful content challenges these expectations.
We reached out to DeepSeek regarding this report but had not received a response by the time of publication.