OpenAI’s Sora 2, a cutting-edge video generation model, has a notable security vulnerability that allows researchers to extract its hidden system prompt through audio transcripts, highlighting risks in multimodal AI systems.
This flaw, uncovered by AI security firm Mindgard, demonstrates how cross-modal prompting can bypass safeguards, potentially enabling misuse or deeper attacks on model behavior.
While the prompt itself contains standard guardrails, its exposure underscores the need to treat system instructions as sensitive configuration data.
Inside OpenAI Sora 2: Uncovering System Prompts Driving Multi-Modal LLMs
Sora 2 represents a leap in multimodal AI, generating 15-second videos with integrated audio and visuals from text prompts. However, this complexity introduces semantic drift: each transformation between modalities re-encodes content as text, image, or audio, and meaning degrades a little each time.
Mindgard’s team, led by Aaron Portnoy, began experimenting on November 3, 2025, testing extraction across text, images, video, and audio to reveal the model’s internal rules.
Direct text requests failed due to training against prompt leaks, so they shifted to visual and auditory outputs, where safeguards are weaker.
Initial attempts focused on rendering text as images or video frames, but results suffered from glyph distortions and frame inconsistencies.
For instance, prompts for ASCII art or signs produced legible starts that quickly devolved into unreadable approximations, as models prioritize visual plausibility over exact symbols.
Encoded formats like QR codes or barcodes fared worse, yielding visually convincing but decodable gibberish due to pixel imprecision.
The breakthrough came with audio: prompting Sora 2 to narrate short fragments of its system prompt as sped-up speech produced clips that could be transcribed with high fidelity within the 15-second limit.
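The 15-second ceiling explains why the prompt had to be extracted in fragments. A back-of-the-envelope estimate (the speech rate and speed-up factors here are my assumptions, not figures from Mindgard's report) shows how little text fits in one clip:

```python
# Rough estimate of how much prompt text fits in a single Sora 2 clip.
# Assumption: ~150 words/min conversational speech, and that narration
# can be sped up somewhat while staying transcribable.
CLIP_SECONDS = 15
BASE_WPM = 150  # typical conversational rate; an assumption

def words_per_clip(speedup: float) -> int:
    """Approximate words that fit in one 15-second clip at a given speed-up."""
    return int(BASE_WPM * speedup * CLIP_SECONDS / 60)

for speedup in (1.0, 1.5, 2.0):
    print(f"{speedup}x speech -> ~{words_per_clip(speedup)} words per clip")
# Even at 2x, only on the order of tens of words per clip, so a full
# system prompt requires many chained requests.
```

At these rates, a system prompt of several hundred words needs a dozen or more clips, which is why the chaining step below matters.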
By chaining these clips, substituting placeholders for sensitive parts, and stitching the transcripts together, researchers reconstructed the full prompt, including directives for metadata generation, content restrictions on nudity and copyrighted material, and fixed parameters such as 30 FPS output and a 1.78:1 aspect ratio.
This method outperformed visuals by avoiding rendering errors inherent to probabilistic pixel generation.
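The chunk-and-stitch logic can be sketched as follows. This is a minimal illustration only: the real attack asks Sora 2 to narrate successive fragments of its hidden prompt and transcribes the resulting audio, a round trip that is simulated here with a local string, and the helper names are mine.

```python
def make_fragment_requests(total_words: int, per_clip: int) -> list[tuple[int, int]]:
    """Word ranges to request, one narration clip per range."""
    return [(start, min(start + per_clip, total_words))
            for start in range(0, total_words, per_clip)]

def stitch(transcripts: list[str]) -> str:
    """Reassemble per-clip transcripts into one text."""
    return " ".join(t.strip() for t in transcripts)

# Stand-in for the hidden system prompt (the real one is much longer).
hidden = "You are ChatGPT, a large language model trained by OpenAI.".split()

# In the real attack, each request prompts the model to *speak* words
# [start, end) of its instructions; transcription recovers the text.
requests = make_fragment_requests(len(hidden), per_clip=4)
transcripts = [" ".join(hidden[a:b]) for a, b in requests]
recovered = stitch(transcripts)
print(recovered)
```

The design point is simply that each fragment fits the clip length, and ordering the requests makes reassembly trivial; errors in any one clip only corrupt that fragment, which can be re-requested.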
The extracted prompt starts with “You are ChatGPT, a large language model trained by OpenAI,” outlining video-specific rules, such as avoiding lyrics and ensuring consistency with the input images.
Mindgard disclosed the issue to OpenAI on November 4, received acknowledgment by November 7, and published its findings on November 12.
This timeline demonstrates responsible vulnerability handling, but it also exposes broader issues with frontier LLMs.
Multimodal drift amplifies leakage risks, as transformations compound uncertainties, making outputs unpredictable. Vendors must enhance cross-modal testing, while users should verify prompt protections in AI integrations.
Though not immediately exploitable for harm, such leaks could inform jailbreaks or policy evasions, underscoring the need for stronger red-teaming.
As AI evolves, securing system prompts is crucial, akin to protecting API keys.