Veo 3 Ocean Guide: Realistic Water AI Prompts (2026)

1. Introduction: The Fluidity of Generative Physics
1.1 Redefining "Simulation" in the Age of Veo 3
For over three decades, the term "water simulation" in computer graphics has been synonymous with the rigorous mathematical solving of fluid mechanics. The industry standard, exemplified by software like SideFX Houdini, relies on solvers based on the Navier-Stokes equations to calculate velocity fields, pressure gradients, viscosity, and surface tension for millions of individual particles or voxels. This process, known as Computational Fluid Dynamics (CFD), is causal and explicit: a digital rock dropped into digital water displaces a specific volume, creating ripples calculated through conservation of mass and momentum. While physically accurate, this method is computationally exorbitant; a five-second shot of a storm-tossed ocean can require terabytes of cache data and days of rendering time on a distributed farm.
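For reference, the equations such solvers discretize can be written compactly. The block below gives the incompressible Navier-Stokes equations in standard textbook notation, not the notation of any particular solver: $\mathbf{u}$ is the velocity field, $p$ the pressure, $\rho$ the density, $\nu$ the kinematic viscosity, and $\mathbf{g}$ gravitational acceleration. Surface tension enters as a boundary condition at the free surface and is omitted here.

```latex
\begin{aligned}
\frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u} \cdot \nabla)\mathbf{u}
  &= -\frac{1}{\rho}\nabla p + \nu \nabla^{2}\mathbf{u} + \mathbf{g}
  && \text{(conservation of momentum)} \\
\nabla \cdot \mathbf{u} &= 0
  && \text{(conservation of mass)}
\end{aligned}
```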
Google Veo 3 introduces a fundamental redefinition of this concept. It does not "simulate" water in the traditional deterministic sense; it generates it. Veo 3 operates not on a grid of voxels, but within a compressed latent space. It has "learned" the physics of water by observing millions of hours of video data—from YouTube clips of vast oceans to cinematic footage of breaking waves. When a user prompts for a "roaring ocean storm," the model does not calculate the collision of water molecules. Instead, it predicts the spatiotemporal arrangement of pixels that statistically represents a storm, based on the patterns encoded in its neural weights.
This distinction is critical for the VFX artist. Traditional simulation is simulation via calculation; Veo 3 is simulation via prediction. This shift allows for near-real-time generation of complex fluid behaviors that would otherwise require render farms, but it fundamentally alters the control mechanism. The artist no longer manipulates gravity constants or viscosity settings; they manipulate semantic tokens and reference imagery to guide the model's prediction of "next-frame" fluid states.
1.2 From Particle Systems to Latent Space
The evolution from Veo 2 to Veo 3.1 marks a transition from simple video generation to what DeepMind terms a "World Model". A World Model does not merely hallucinate pixels; it constructs an internal representation of the environment's physics, geometry, and causality.
Veo 2 and Early Generators: These models were primarily focused on visual fidelity but often struggled with temporal coherence. Waves might morph into clouds, water might flow upwards, or the horizon line might fracture during a pan. They lacked a robust understanding of object permanence and the continuity of motion.
Veo 3.1 and World Models: The latest architecture incorporates advanced 3D temporal attention mechanisms that maintain object permanence and fluid consistency over time. It understands that a wave cresting in frame 1 must crash in frame 24, adhering to a learned understanding of gravity and momentum. This "spatiotemporal coherence" is the defining characteristic that allows Veo 3 to be treated as a serious tool for dynamics generation.
This capability is underpinned by the model's training on "noisy" data: real-world footage that contains all the chaotic imperfections of actual fluid dynamics. Unlike game engines that often approximate water with repetitive normal maps and shaders, Veo 3’s output captures the randomness of real water: the unpredictable spray of spindrift, the chaotic foam patterns of a wake, and the complex interaction of light refracting through turbid water. The model effectively acts as a "physics engine of the imagination," capable of rendering phenomena like subsurface scattering and caustics not by tracing light rays, but by recalling how light looks when it travels through a volume.
1.3 The Concept of Learned Physics
The implications of "learned physics" are profound. Traditional sims are bound by the resolution of the grid; details smaller than a voxel are lost or faked. Veo 3, however, can synthesize plausible detail at whatever scale its training data covers. If trained on macro footage of droplets, it can generate the surface tension of a single drop at roughly the same computational cost as a wide ocean shot. This scalability is a feature of the transformer architecture, which processes "patches" of spacetime rather than distinct particles.
However, "learned physics" is probabilistic. The model predicts the most likely outcome of a fluid interaction. In the vast majority of cases, water falls down. But in edge cases, or with contradictory prompts, the model may hallucinate "reverse entropy," where water un-splashes or flows uphill. Understanding this probabilistic nature is key to mastering the tool; the artist's role becomes one of guiding the probability distribution towards the desired physical outcome through precise prompting and conditioning.
2. Technical Deep Dive: Deconstructing the Water Model
2.1 Analyzing Veo 3’s Hydrodynamic Accuracy
To master ocean dynamics in Veo 3, one must understand the model's specific capabilities in replicating hydrodynamic phenomena. The model’s performance is not uniform; it varies significantly depending on the scale, complexity, and specific type of water interaction being generated.
The "Uncanny Valley" of Fluids
While Veo 3 excels at general fluid motion, it faces scrutiny under the lens of rigorous physics benchmarks like Morpheus. These benchmarks evaluate whether generative models adhere to conservation laws (mass, momentum, energy).
Mass Conservation: In traditional sims, water volume is typically constant unless emitted or killed. In Veo 3, "hallucinations" can occur where water appears or disappears spontaneously. However, Veo 3.1 shows marked improvement in maintaining volume consistency over short clips (4-8 seconds), reducing the "flickering" of mass seen in earlier models.
Viscosity and Fluidity: The model demonstrates a sophisticated "intuitive physics" understanding of viscosity. It differentiates correctly between the rapid, low-viscosity splashing of water and the slow, thick movement of mud, lava, or oil. This suggests the model has encoded the relationship between material semantic labels (e.g., "mud" vs. "water") and their kinematic behavior.
2.2 Subsurface Scattering and Volume Rendering
One of the most challenging aspects of rendering water is Subsurface Scattering (SSS)—the physical mechanism by which light penetrates the surface of a translucent object, scatters internally against particles, and exits at a different point. This gives ocean water its characteristic glow, teal depth, and perception of volume.
Traditional Rendering: Achieving realistic SSS in engines like Arnold or V-Ray requires computationally expensive path tracing to calculate light transport through the medium. It involves complex settings for absorption coefficients, scattering radii, and phase functions.
Veo 3’s Latent Approach: Veo 3 generates SSS effects based on visual pattern recognition. When prompted with terms like "tropical ocean," "glacial ice," or "Caribbean water," the model replicates the visual signature of SSS—the turquoise luminance in a wave's face or the deep blue of the abyss—without calculating the photon paths.
Caustics: The model can generate convincing underwater caustics (the dancing patterns of light on the sea floor caused by refraction). In traditional CGI, caustics are notoriously expensive and prone to noise. Veo 3 generates them as texture patterns. While they look realistic, they are "baked in" to the generated video pixels. This means that while they act convincingly, they may not always perfectly align mathematically with the surface wave geometry above if the prompt introduces conflicting directives.
God Rays (Volumetric Lighting): Veo 3 demonstrates high proficiency in rendering volumetric lighting or "god rays" penetrating the water column. Prompts combining "underwater," "sunlight from above," and "volumetric" yield high-fidelity results that mimic the scattering of light by particulate matter (marine snow), a detail that adds immense realism to underwater shots.
2.3 Surface Tension and Foam Patterns
The realism of an ocean scene often lives in the high-frequency details: the whitecaps, the foam trails, and the spray. These are the elements that break up the "plastic" look of low-quality CGI.
Foam Generation: Veo 3 understands the context of foam generation. In a "stormy" prompt, it generates chaotic, aerated white water. In a "calm" prompt, foam is absent. Crucially, the model seems to have learned the correlation between turbulence and aeration. It accurately places foam at the crests of breaking waves and in the turbulent wake of objects.
Kelvin Wakes: Ship wakes follow a specific mathematical pattern known as the Kelvin wake, characterized by a V-shape with distinct transverse and divergent waves. Veo 3, having been trained on vast datasets of maritime footage, can reportedly reproduce these patterns when prompted with technical accuracy (e.g., "Kelvin wake pattern," "propeller wash"). It mimics the interaction of the hull with the water surface, creating the requisite displacement and foam generation.
Spindrift: The model can render "spindrift," the fine spray blown from cresting waves by strong winds. This is a complex phenomenon involving the interaction of water and air (two-phase flow). Veo 3 captures the visual essence of this, creating misty, atomized water droplets that blur the boundary between sea and sky. This effect is essential for selling the scale and violence of a storm scene.
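The Kelvin wake mentioned above has a well-known geometric signature that can serve as a quick sanity check on a generated clip. In classical deep-water theory the wake wedge has a half-angle of arcsin(1/3), independent of vessel speed. The sketch below computes that angle and the corresponding wake width; the function names are illustrative, and real wakes (and generated ones) will deviate from the idealized value.

```python
import math


def kelvin_half_angle_deg() -> float:
    """Half-angle of the classical deep-water Kelvin wake wedge, in degrees.

    Classical theory gives arcsin(1/3), independent of the vessel's speed.
    """
    return math.degrees(math.asin(1.0 / 3.0))


def wake_half_width_m(distance_astern_m: float) -> float:
    """Half-width of the idealized wake wedge at a given distance behind the vessel."""
    return distance_astern_m * math.tan(math.radians(kelvin_half_angle_deg()))


if __name__ == "__main__":
    print(f"Kelvin half-angle: {kelvin_half_angle_deg():.2f} degrees")
    print(f"Half-width 100 m astern: {wake_half_width_m(100.0):.1f} m")
```

Measuring the opening angle of the V in a top-down frame and comparing it with roughly 19.5 degrees per side gives a rough indication of whether the generated wake is plausible.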
2.4 Turbidity and Water Clarity
The optical properties of water vary wildly based on particulate content. Veo 3 allows for the control of turbidity.
Low Turbidity: Prompts for "crystal clear," "oligotrophic," or "tropical" water result in high transparency, visible sea floors, and strong caustic networks.
High Turbidity: Prompts for "silty," "churning," "muddy," or "storm surge" result in opaque, desaturated water where light does not penetrate. The model accurately obscures objects beneath the surface in these conditions, demonstrating an understanding of how light attenuation varies with water quality.
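The difference between these two regimes can be illustrated with the Beer-Lambert law, under which light intensity decays exponentially with depth. The sketch below is only an illustration of the physics the prompts are describing: the attenuation coefficients are assumed round numbers chosen for contrast, not measured values, and nothing here reflects how Veo 3 computes its output.

```python
import math

# Illustrative attenuation coefficients in 1/m. These are assumed round
# numbers for the sketch, not measurements of any real water body.
ATTENUATION_PER_M = {
    "clear tropical": 0.05,
    "coastal": 0.3,
    "silty storm surge": 2.0,
}


def light_fraction(depth_m: float, k_per_m: float) -> float:
    """Fraction of surface light remaining at depth (Beer-Lambert law)."""
    return math.exp(-k_per_m * depth_m)


def depth_for_fraction(fraction: float, k_per_m: float) -> float:
    """Depth at which light has fallen to the given fraction of its surface value."""
    return -math.log(fraction) / k_per_m


if __name__ == "__main__":
    for name, k in ATTENUATION_PER_M.items():
        print(
            f"{name}: {light_fraction(5.0, k):.1%} of light left at 5 m, "
            f"1% level at {depth_for_fraction(0.01, k):.1f} m"
        )
```

With these assumed coefficients, a sea floor five metres down stays well lit in the clear case and is effectively invisible in the silty case, which matches the visual behavior the two prompt families are meant to evoke.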
3. The Prompt Engineer’s Toolkit for Oceans
3.1 Controlling the Chaos: Precision Prompting for Water
In Generative AI, the prompt is the user's interface with the physics engine. Unlike Houdini, where you adjust a "viscosity" slider, in Veo 3 you must invoke the "viscosity" concept through semantic tokens. To force Veo 3 to adhere to specific fluid dynamics, one must use a precise, technical vocabulary that aligns with the model's training data. Vague prompts like "cool ocean" yield generic stock footage. Technical, descriptive prompts unlock the model's full potential.
The Vocabulary of Water: Key Technical Terms
Research into prompt engineering for Veo 3 suggests a strong correlation between the use of specific nautical, meteorological, and cinematographic terms and the physical accuracy of the output.
Target Effect | Generic Prompt (Avoid) | Technical Prompt (Use) | Expected Outcome |
Storm Intensity | "Big storm", "huge waves" | "Beaufort Scale 9," "Violent Storm," "Rogue waves" | Generates physically accurate wave scales, driving rain, and reduced visibility characteristic of specific wind speeds. The model associates "Beaufort Scale" with specific sea states. |
Water Clarity | "Clear water" | "Low turbidity," "Crystal clear," "Oligotrophic water" | Results in transparent water with visible sea floor and distinct, sharp caustics. |
Surface Texture | "Smooth water" | "Glassy surface," "Fresnel reflection," "Capillary waves" | Produces a mirror-like finish with micro-scale ripples (capillary waves) rather than large swells. Emphasizes reflection over refraction. |
Wave Action | "Crashing waves" | "Plunging breakers," "Spilling breakers," "Whitecaps" | Differentiates between the violent barrel of a surfing wave (plunging) and the gentle crumble of a shore break (spilling). |
Ship Wakes | "Boat trail" | "Kelvin wake pattern," "Propeller wash," "Turbulent displacement" | Creates the specific V-shaped wake pattern and aerated water trail behind a vessel, avoiding generic "ripples." |
Lighting | "Sunlight" | "Subsurface scattering," "Volumetric god rays," "Snell's window" | Triggers complex light transport effects, crucial for underwater or backlit shots to achieve that "glowing" water look. |
Scale | "Big ocean" | "Vast horizon," "Open ocean," "Swell period" | Helps define the distance between waves, distinguishing between a choppy lake and the deep sea. |
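The Beaufort vocabulary in the table maps onto concrete numbers. As a minimal sketch, the snippet below uses the standard empirical relation v = 0.836 * B^1.5 (in m/s) and the conventional force names to build a consistent storm fragment for a prompt. The helper names are hypothetical, and whether Veo 3 responds to explicit wind speeds is an assumption. Note that "violent storm" is conventionally the name for force 11, while force 9 is a strong gale.

```python
MS_TO_KNOTS = 1.943844

BEAUFORT_NAMES = {
    0: "Calm", 1: "Light air", 2: "Light breeze", 3: "Gentle breeze",
    4: "Moderate breeze", 5: "Fresh breeze", 6: "Strong breeze",
    7: "Near gale", 8: "Gale", 9: "Strong gale", 10: "Storm",
    11: "Violent storm", 12: "Hurricane",
}


def beaufort_wind_speed_ms(force: int) -> float:
    """Approximate wind speed in m/s from the empirical relation v = 0.836 * B^1.5."""
    if force not in BEAUFORT_NAMES:
        raise ValueError("Beaufort force must be an integer from 0 to 12")
    return 0.836 * force ** 1.5


def storm_prompt_fragment(force: int) -> str:
    """Build a prompt fragment whose number, name, and wind speed agree."""
    knots = beaufort_wind_speed_ms(force) * MS_TO_KNOTS
    name = BEAUFORT_NAMES[force].lower()
    return f"Beaufort Scale {force}, {name}, approx. {knots:.0f} knot winds"


if __name__ == "__main__":
    for force in (9, 10, 11):
        print(storm_prompt_fragment(force))
```

Keeping the number and the name consistent avoids giving the model two conflicting descriptions of the same sea state.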
3.2 Camera Motion and Ocean Scale
The perception of ocean scale is intrinsically linked to camera physics. A vast ocean looks like a bathtub splash if the camera moves too quickly or if the depth of field is incorrect. Veo 3 responds to cinematic terminology to define these physical constraints.
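The link between apparent scale and motion speed can be made concrete with Froude scaling, the rule of thumb used in miniature photography: for gravity-driven motion, time scales with the square root of length. The sketch below applies that rule and the deep-water relation between wavelength and period. The function names are illustrative, and using the result to decide how much slow motion to request from Veo 3 is a suggestion rather than a documented feature.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def froude_time_scale(length_ratio: float) -> float:
    """Factor by which gravity-driven motion slows when lengths grow by length_ratio."""
    if length_ratio <= 0:
        raise ValueError("length_ratio must be positive")
    return math.sqrt(length_ratio)


def capture_fps(playback_fps: float, length_ratio: float) -> float:
    """Frame rate at which a small wave must be shot to play back at large-wave pace."""
    return playback_fps * froude_time_scale(length_ratio)


def deep_water_period_s(wavelength_m: float) -> float:
    """Period of a deep-water gravity wave: T = sqrt(2 * pi * L / g)."""
    return math.sqrt(2.0 * math.pi * wavelength_m / G)


if __name__ == "__main__":
    # A wave meant to read as 16x larger should move 4x slower.
    print(froude_time_scale(16.0))
    print(capture_fps(24.0, 16.0))
    print(f"{deep_water_period_s(100.0):.1f} s period for a 100 m swell")
```

If a generated wave looks like a bathtub splash, the square-root rule gives a first estimate of how much slower the motion should be for the intended scale.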
Scale via Lens Choice: Specifying "Telephoto lens compression" (e.g., 200mm) makes waves appear stacked and more menacing, a technique often used in surf photography to exaggerate wave height. Conversely, "Wide-angle" (e.g., 16mm) emphasizes the vastness of the horizon and the curvature of the earth.
Scale via Frame Rate: Ocean dynamics often look more realistic in slow motion. Water is heavy; large masses of it move slowly due to inertia. Prompts including "High frame rate," "Slow motion," or "Phantom Flex 4K" force the model to generate fluid motion that feels heavier and more massive. Water moving at real-time speed in AI video can sometimes appear "floaty" or lacking in mass; slowing it down allows the viewer to register the complex fluid deformations.
Perspective:
"Over-under split shot": This technical term effectively triggers the specific view where the camera is half-submerged, requiring the model to render both the surface tension line (meniscus) and the underwater refraction/air reflection simultaneously.
"Drone top-down": Useful for analyzing wave patterns and swell direction, forcing the model to render the ocean surface as a texture map of varying heights.
3.3 Advanced Prompt Structure
To maximize control, use the structured prompting formula recommended for Veo 3.1: [Cinematography] + [Subject] + [Action] + [Context] + [Style].
Example Prompt for a Storm: "Cinematic wide shot, 35mm lens (Cinematography). A solitary lighthouse (Subject) withstanding a Beaufort Scale 10 storm (Context). Massive rogue waves crash against the base, generating explosive spindrift and thick foam. Volumetric searchlight cuts through the heavy rain and mist (Action). High contrast, moody, desaturated teal and grey palette, photorealistic 8k render, chaotic energy (Style)."
Example Prompt for Underwater: "Macro shot, shallow depth of field (Cinematography). Air bubbles rising from a coral reef (Subject). Bubbles wobble and refract light as they ascend towards the surface. Sunlight streams down in god rays (Action). Crystal clear tropical water, distinct caustics on the ocean floor, vibrant colors, serene atmosphere (Context/Style)."
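The five labeled components in the examples above (Cinematography, Subject, Action, Context, Style) lend themselves to a small template, so that no component is forgotten when iterating on a shot. The following is a minimal sketch in plain Python; the class and method names are hypothetical and it only assembles text, it does not call any Veo API.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class OceanPrompt:
    """The five prompt components, in the order used by the formula."""
    cinematography: str
    subject: str
    action: str
    context: str
    style: str

    def render(self) -> str:
        """Join the components into one prompt, one sentence per component."""
        fields = asdict(self)
        missing = [name for name, value in fields.items() if not value.strip()]
        if missing:
            raise ValueError(f"Empty prompt components: {', '.join(missing)}")
        return " ".join(value.strip().rstrip(".") + "." for value in fields.values())


if __name__ == "__main__":
    storm = OceanPrompt(
        cinematography="Cinematic wide shot, 35mm lens",
        subject="A solitary lighthouse",
        action="Massive waves crash against the base, generating spindrift and thick foam",
        context="Open ocean at night in heavy rain",
        style="High contrast, desaturated teal and grey palette",
    )
    print(storm.render())
```

Because each component is a separate field, a single element such as the lens or the palette can be swapped between iterations while the rest of the prompt stays fixed.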
4. Workflow: Image-to-Video for Art Direction
4.1 The Hybrid Workflow: Guiding the Flow
While text-to-video is powerful, professional production requires specific art direction. "Prompting and praying" is not a viable strategy for a pipeline. Veo 3.1’s Image-to-Video (I2V) capability allows creators to define the precise look of the ocean using reference images, effectively using the AI as a texture and animation engine for a static concept.
Reference Image Strategy:
Color Grading: Uploading a reference image with a specific "Atlantic Teal" or "Caribbean Turquoise" grade ensures that the generated video maintains that exact color palette, avoiding the model's tendency to revert to a generic blue.
Sea State Definition: Using a photograph of a specific sea state (e.g., choppy, glassy, swelling) as the input "ingredient" locks the simulation to that physical condition. The prompt then drives the motion of that state (e.g., "waves moving right to left") while the image dictates the texture and lighting.
4.2 Consistency with "Ingredients to Video"
One of the major hurdles in AI video is temporal consistency—keeping the ocean looking like the same ocean across multiple shots. Veo 3.1 introduces "Ingredients to Video," allowing users to provide up to three reference images to anchor the generation.
Workflow for Storyboarding:
Concept: Create up to three distinct "keyframe" images of your ocean environment (the limit for reference inputs) using a text-to-image model (like Gemini 3 Pro Image or Midjourney) or traditional 3D renders. Ensure these keyframes share the same lighting and color palette.
Ingestion: Feed these images into Veo 3.1 as "ingredients." This tells the model, "This is what the water looks like; this is what the sky looks like."
Generation: Prompt for different camera angles (e.g., "Close up on wave crest," "Wide aerial shot"). The model uses the ingredients to ensure the water color, foam density, and lighting atmosphere remain consistent across these disparate shots.
Result: A sequence of clips that can be edited together without jarring visual discontinuities (e.g., the water changing from blue to green between cuts), creating a cohesive narrative environment.
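The storyboarding workflow above can be organized as a simple shot list in which every shot reuses the same set of reference images. The sketch below is a planning aid only: the class, the field names, and the shape of the request dictionaries are hypothetical and do not represent the actual Veo API. The one constraint taken from the text is the limit of three reference images.

```python
from dataclasses import dataclass, field

MAX_REFERENCE_IMAGES = 3  # "up to three reference images", per the text above


@dataclass
class ShotPlan:
    """A set of shared reference images plus the prompts for each camera angle."""
    reference_images: list[str]
    shots: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        count = len(self.reference_images)
        if not 1 <= count <= MAX_REFERENCE_IMAGES:
            raise ValueError(
                f"Expected 1 to {MAX_REFERENCE_IMAGES} reference images, got {count}"
            )

    def requests(self) -> list[dict]:
        """One hypothetical request payload per shot, all sharing the same ingredients."""
        return [
            {"prompt": prompt, "reference_images": list(self.reference_images)}
            for prompt in self.shots
        ]


if __name__ == "__main__":
    plan = ShotPlan(
        reference_images=["water_grade.png", "sky_grade.png"],
        shots=["Close up on wave crest", "Wide aerial shot"],
    )
    for request in plan.requests():
        print(request)
```

Keeping the reference list in one place makes it harder to accidentally generate one shot of a sequence with a different set of ingredients.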
4.3 Using 3D Renders as Guides (The "Gray-Box" Technique)
A powerful professional workflow involves using coarse 3D geometry as a guide. This bridges the gap between the control of Houdini and the realism of Veo.
The Technique: A VFX artist can block out the basic wave shapes and camera moves in Blender or Houdini, rendering a simple "gray-box" or low-poly version of the scene. This establishes the composition, timing, and scale of the water movement.
AI Texturing: This render is used as the starting frame or reference for Veo 3. The prompt then instructs Veo to "render as photorealistic ocean, highly detailed foam and spray."
The Benefit: Effectively, the AI is used to perform the high-frequency detailing (foam, spray, refraction, SSS) that is computationally expensive to simulate, while the artist retains control over the macro composition and camera movement. This solves the "lack of control" issue inherent in pure text-to-video workflows.
5. Comparative Analysis: Veo 3 vs. The Industry Titans
5.1 Veo 3 vs. Houdini (FLIP Solvers)
Houdini is the industry standard for high-end water simulation, utilizing FLIP (Fluid Implicit Particle) solvers to generate physically accurate fluids.
Feature | SideFX Houdini | Google Veo 3.1 | Analysis |
Physics | Explicit, numerically solved Newtonian physics. Accuracy is bounded by grid resolution and solver settings. | Implicit, learned physics. Statistical approximation of reality. | Houdini is essential for "hero" interaction (e.g., a character interacting with water where the splash must match the movement exactly). Veo 3 excels at background environments and chaotic systems where perceptual realism is sufficient. |
Control | Absolute. Every particle, force, viscosity, and surface tension value can be tweaked. | High-level. Controlled via text, style references, and image inputs. No vertex-level control. | Veo 3 offers "directorial" control (mood, style, action); Houdini offers "engineering" control (velocity fields, collision depths). |
Render Time | Hours to days for simulation baking, meshing, and rendering. | Seconds to minutes (approx. 60-90s for Veo 3 Fast). | Veo 3 is orders of magnitude faster. This speed enables rapid iteration and "concepting" that is impossible in Houdini. |
Workflow | Complex node-based pipeline. Extremely high barrier to entry. | Natural language prompting. Low barrier to entry. | Veo 3 democratizes high-quality water FX, but lacks the specific "art directability" for precise VFX shots (e.g., "move that splash 2 inches left"). |
Output | 3D Geometry (Alembic/VDB) + Rendered Pixels. | 2D Video Pixels (MP4). | Houdini output can be relit and composited in 3D. Veo 3 output is "baked" and can only be composited as a 2D layer. |
5.2 Veo 3 vs. Unreal Engine 5 (Real-Time)
Unreal Engine 5 (UE5) offers real-time rendering but relies on shader tricks, Gerstner waves, and simplified physics for water.
Visual Fidelity: Veo 3 often achieves higher photorealism than standard UE5 water shaders because it generates the "imperfections" of real film (grain, focus falloff, complex scattering, infinite foam variety) natively. UE5 water can sometimes look "gamey" or overly perfect/repetitive unless heavily customized by technical artists.
Interactivity: UE5 allows for real-time interaction (a player walking through water, dynamic ripples). Veo 3 generates static video files; it is non-interactive.
Use Case: Veo 3 is superior for generating cinematic plates, backdrops, and pre-rendered cutscenes where visual fidelity is paramount. UE5 remains the undisputed choice for interactive media and real-time virtual production where latency must be imperceptible.
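For readers unfamiliar with the Gerstner waves mentioned at the start of this section, the sketch below shows a single one-dimensional Gerstner (trochoidal) wave in plain Python. It is a textbook formulation, not Unreal Engine's implementation: each surface point moves on a circle around its rest position, which sharpens crests and flattens troughs, and the angular frequency follows the deep-water dispersion relation. The function name and parameters are illustrative.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def gerstner_point(x0: float, t: float, amplitude: float, wavelength: float) -> tuple[float, float]:
    """Position (x, y) at time t of the surface point whose rest position is x0."""
    k = 2.0 * math.pi / wavelength      # wavenumber
    if k * amplitude > 1.0:
        raise ValueError("Steepness k * amplitude must not exceed 1 (surface would self-intersect)")
    omega = math.sqrt(G * k)            # deep-water dispersion relation
    theta = k * x0 - omega * t
    x = x0 - amplitude * math.sin(theta)  # horizontal shift bunches points toward crests
    y = amplitude * math.cos(theta)       # vertical displacement
    return x, y


if __name__ == "__main__":
    # Sample one wavelength of the surface at t = 0.
    for i in range(9):
        x0 = i * 50.0 / 8
        x, y = gerstner_point(x0, 0.0, amplitude=1.0, wavelength=50.0)
        print(f"rest x0={x0:6.2f} -> x={x:6.2f}, y={y:5.2f}")
```

Because the surface is a closed-form function of position and time, it is cheap enough for real-time use, but summing a handful of such waves is also why the result can look repetitive compared with footage-trained output.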
5.3 Speed vs. Control: The Trade-off
The core trade-off determining the choice of tool is render time vs. editability.
Scenario A: A director says, "Make that wave crash later."
Houdini: The artist must re-sim, which can take overnight to bake and render.
Veo 3: The artist changes the prompt to "slow building wave" and regenerates in 2 minutes.
Scenario B: A director says, "Keep the wave exactly the same but move the foam on the left slightly to the right."
Houdini: The artist can mask and manipulate that specific foam layer or re-sim just the whitewater.
Veo 3: Cannot do this. A regeneration will change the entire fluid simulation due to the stochastic nature of the diffusion process. This "all-or-nothing" generation is the primary bottleneck for replacing traditional tools in final-pixel hero shots.
Conclusion: Veo 3 is an unmatched tool for exploration, pre-visualization, and backgrounds, while Houdini remains necessary for specific, scripted, high-control interactions.
6. Limitations and Future Outlook
6.1 Where the Simulation Breaks
Despite its "World Model" capabilities, Veo 3 is not a physics engine in the scientific sense and is prone to specific failures, often referred to as "hallucinations".
Physics Hallucinations:
Reverse Entropy: Occasionally, water may appear to flow up a waterfall or un-splash, particularly in complex scenes or reversed-time prompts.
Object Interaction: While Veo 3 creates realistic splashes, it sometimes struggles with the "aftermath"—wetmaps on objects may disappear, or a boat may not displace the correct amount of water for its apparent mass. It understands the visual of a splash but not the mass displacement causing it.
Horizon Line Artifacts: In vast ocean shots, the horizon line can sometimes warp or break, especially during rapid camera pans, revealing the 2D nature of the latent generation.
The Uncanny Valley of Physics:
There is a specific zone where the water looks photorealistic (textures, lighting) but moves slightly "wrong"—too viscous or too fast. This disconnect can be jarring to the viewer, creating an "uncanny valley" effect for fluids. This is often due to a mismatch between the prompted scale (e.g., "giant tsunami") and the learned motion dynamics (which might resemble a smaller wave scaled up).
6.2 Benchmarking Physical Reasoning: The Morpheus Study
Recent academic research, specifically the Morpheus benchmark, has quantified these limitations. The study tested generative video models like Veo 3 against rigorous conservation laws.
Key Finding: Even with advanced prompting, the models do not yet function as true simulators. They are "pixel predictors" that mimic the appearance of physics rather than adhering to the rules of physics. They often fail to conserve mass (water appearing/disappearing) or momentum (objects stopping instantly without friction).
Implication: For scientific visualization or engineering simulations (e.g., predicting floodwaters), Veo 3 is currently unsuitable. Its domain is strictly entertainment and creative visualization.
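A very rough way to spot the mass-conservation failures described above in your own clips is to track how much of the frame is covered by water over time. The sketch below is a crude proxy under strong assumptions: it is not the Morpheus methodology, it presumes a locked-off camera and a scene where the visible water area should stay constant, and it takes per-frame water-area fractions as input from a segmentation step that is not shown. The function name is hypothetical.

```python
def relative_mass_drift(water_fraction_per_frame: list[float]) -> float:
    """Largest relative deviation of the water-covered area from its first-frame value.

    Input: fraction of the frame covered by water, one value per frame, produced
    by a separate segmentation step (assumed, not implemented here).
    """
    if not water_fraction_per_frame:
        raise ValueError("Need at least one frame")
    baseline = water_fraction_per_frame[0]
    if baseline <= 0:
        raise ValueError("First frame must contain some water")
    return max(abs(value - baseline) / baseline for value in water_fraction_per_frame)


if __name__ == "__main__":
    steady = [0.50, 0.50, 0.51, 0.50]
    flickering = [0.50, 0.42, 0.58, 0.47]
    print(f"steady clip drift:     {relative_mass_drift(steady):.0%}")
    print(f"flickering clip drift: {relative_mass_drift(flickering):.0%}")
```

A large drift does not prove a physics failure, since camera motion or occlusion also changes the visible area, but it is a cheap flag for clips worth inspecting by eye.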
6.3 Future Outlook: The Convergence
The trajectory of development points towards a convergence of these technologies.
Hybrid Models and PINNs: Future iterations (Veo 4, Genie 4) will likely integrate "Physics-Informed Neural Networks" (PINNs). These networks embed actual physical laws (like Navier-Stokes equations) into the loss function of the AI during training. This would force the generated video to adhere to conservation of mass and momentum, effectively "teaching" the AI the laws of physics rather than just the look of them.
Neural Rendering in Engines: We are seeing the beginning of workflows where a coarse physics sim from Houdini is "rendered" by an AI model like Veo. This combines the explicit control of simulation with the photorealism of generative AI, offering the best of both worlds.
Native Audio & Video: Veo 3.1’s inclusion of native audio (generating the sound of crashing waves with the video) suggests a future where "sound design" is automated alongside visual generation, creating a holistic sensory simulation.
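To make the PINN idea above concrete, the generic form of a physics-informed loss is sketched below. This is the standard formulation from the PINN literature, in which a network with parameters $\theta$ predicts a velocity field $\mathbf{u}_\theta$ and pressure $p_\theta$; how such a term would be attached to a pixel-generating video model is speculative. Here $\lambda$ is a weighting hyperparameter, $(\mathbf{x}_i, t_i)$ are $N$ sample points in space and time, $\rho$ is density, $\nu$ kinematic viscosity, and $\mathbf{g}$ gravity.

```latex
\begin{aligned}
\mathcal{L}(\theta) &= \mathcal{L}_{\text{data}}(\theta) + \lambda \, \mathcal{L}_{\text{physics}}(\theta) \\
\mathcal{L}_{\text{physics}}(\theta) &= \frac{1}{N} \sum_{i=1}^{N}
  \left( \left\lVert \mathbf{r}_{\theta}(\mathbf{x}_i, t_i) \right\rVert^{2}
  + \left( \nabla \cdot \mathbf{u}_{\theta}(\mathbf{x}_i, t_i) \right)^{2} \right) \\
\mathbf{r}_{\theta} &= \frac{\partial \mathbf{u}_{\theta}}{\partial t}
  + (\mathbf{u}_{\theta} \cdot \nabla)\mathbf{u}_{\theta}
  + \frac{1}{\rho}\nabla p_{\theta}
  - \nu \nabla^{2}\mathbf{u}_{\theta}
  - \mathbf{g}
\end{aligned}
```

The residual $\mathbf{r}_\theta$ is zero exactly when the predicted fields satisfy the incompressible Navier-Stokes momentum equation, and the divergence term is zero when mass is conserved, so minimizing $\mathcal{L}_{\text{physics}}$ penalizes the kinds of violations described in section 6.1.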
Conclusion
Google Veo 3.1 represents a watershed moment in the creation of digital marine environments. By replacing the deterministic calculations of traditional CFD (Computational Fluid Dynamics) with the probabilistic predictions of a World Model, it allows artists to conjure oceans of infinite variety and photorealism in seconds—a feat that previously required days of computation.
Mastering this tool requires a fundamental shift in skillset: from manipulating vertices and solving equations to manipulating language and curating reference imagery. The "Prompt Engineer" effectively becomes the "Simulation Director," using a technical lexicon of hydrodynamics—spindrift, caustics, Kelvin wakes—to guide the AI’s latent imagination.
While Veo 3 currently lacks the granular control and physical infallibility of Houdini, making it less suitable for scientific accuracy or specific "hero" interactions where continuity is paramount, its speed, fidelity, and accessibility make it a disruptive force. For background environments, motion design, pre-visualization, and rapid concepting, it is already reshaping the industry. As these models evolve to incorporate stricter physical constraints through PINNs and hybrid workflows, the line between "simulation" and "generation" will likely dissolve completely, ushering in a new era where the only limit to the ocean's depth is the user's ability to describe it.
Appendix: 5 Key Prompts for Realistic Oceans in Veo 3
For Storms: "Beaufort Scale 9, strong gale, chaotic swell, heavy spindrift, plunging breakers, low visibility, grey turbulent water."
For Calm/Tropical: "Glassy surface with capillary waves, tropical oligotrophic water (crystal clear), visible sandy sea floor, distinct caustic networks, bright sunlight."
For Underwater: "Subsurface view, volumetric lighting (god rays), particulate matter (marine snow), deep blue fade, upward angle towards Snell's window."
For Scale: "Telephoto lens compression (200mm), slow-motion fluid physics, high frame rate, massive scale rogue wave, small human figure for scale reference."
For Detail: "Macro shot of water surface tension, splashing droplets, aerated foam bubbles, sharp focus, high shutter speed to freeze motion."


