From Chaos to Physics: Spatial Logic in AI Images
Why most AI-generated scenes look 'off'—and how Nano Banana 2's spatial reasoning finally gets lighting, perspective, and object relationships right.
Published on 2026-02-28
The Uncanny Valley of Space
Look at enough AI-generated images, and you develop a sixth sense. Something feels wrong before you can articulate why.
The shadow falls left, but the window is on the right. A person stands on a staircase that leads nowhere. Reflections in a mirror show a different room entirely. Objects float slightly above tables. Hands hold cups at impossible angles.
AI image models are masters of texture and style. But historically, they've been terrible at physics.
Meet Chen. He's an architectural visualization artist in Shanghai. In 2024, he experimented with AI for interior renderings. His prompt: "Modern living room, floor-to-ceiling windows, sunlight streaming in, minimalist furniture."
The result looked beautiful—at first glance. Then his architect colleague pointed out:
- The shadows suggested the sun was below the horizon
- The reflection in the glass table showed a completely different room
- The perspective lines of the floor and ceiling didn't converge correctly
- The sofa cast a shadow in two different directions
"It looks like a dream," his colleague said. "Dreams don't follow physics."
Chen spent three hours in Photoshop fixing the errors. He might as well have rendered the scene traditionally from the start.
This is the dirty secret of 2024-era AI image generation: surface-level beauty, physical nonsense.
Why Physics Is Hard for AI
The Diffusion Model Blindspot
Diffusion models (DALL-E, Midjourney, Stable Diffusion) learn patterns, not physics. They're trained on billions of images and learn:
- "Rooms often have windows"
- "Windows often have light coming through"
- "Light creates shadows"
But they don't learn:
- "Light travels in straight lines"
- "Shadows point away from light sources"
- "Reflections follow the law of reflection"
So they generate "shadow-like textures" that look shadowy but don't correspond to actual light sources. They generate "reflection-like patterns" that look reflective but don't mirror the actual scene.
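The law of reflection the models fail to learn is a one-line formula: the reflected direction is r = d − 2(d·n)n, where d is the incoming direction and n the unit surface normal. A minimal 2D sketch (illustrative only, not part of any model's internals):

```python
def reflect(d, n):
    """Reflect incoming direction d about unit surface normal n (2D).

    Implements the law of reflection: r = d - 2 (d . n) n.
    Both vectors are (x, y) tuples; n must be unit length.
    """
    dot = d[0] * n[0] + d[1] * n[1]
    return (d[0] - 2 * dot * n[0], d[1] - 2 * dot * n[1])

# A ray travelling down-and-right hits a horizontal mirror (normal points up):
incoming = (1.0, -1.0)
normal = (0.0, 1.0)
print(reflect(incoming, normal))  # (1.0, 1.0): angle in equals angle out
```

A diffusion model has no such constraint anywhere in its pipeline; it only knows that reflective-looking pixels tend to appear near shiny surfaces.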
The Compounding Error Problem
One small physics error cascades. If the light direction is wrong, shadows are wrong. If shadows are wrong, object placement looks random. If objects feel random, the whole scene feels fake.
Users develop unconscious pattern recognition: "AI image" = "beautiful but slightly wrong."
The Human Cost
For professional use cases—architecture, product visualization, film previsualization—these errors aren't quirks. They're deal-breakers.
- Architecture client: "Why does the sunlight hit the north wall?"
- Product photographer: "The reflection shows a different product. We can't use this."
- Film director: "The perspective is off. I can't plan the shot."
Each requires manual correction, often negating the time savings of AI generation.
Nano Banana 2: Spatial Reasoning Engine
From Pattern Matching to Understanding
Nano Banana 2 doesn't just recognize visual patterns. It reasons about:
- Light sources: Where is the light coming from? What's its color and intensity?
- Occlusion: What blocks what? What's in front, what's behind?
- Perspective: How do parallel lines converge? What's the camera angle?
- Reflections: What should be visible in reflective surfaces?
- Scale relationships: How big is object A relative to object B?
This isn't post-processing. It's native spatial reasoning built into the multimodal architecture.
The Technical Difference
Traditional diffusion:
[Prompt: "room with window"] → [Generate pixels that statistically match "room" and "window"]
Nano Banana 2:
[Prompt: "room with window"] →
[Understand: window is light source] →
[Calculate: light enters from direction X] →
[Generate: shadows consistent with direction X] →
[Verify: perspective lines converge correctly]
It's not just generating. It's simulating.
You Can Take Action Now
The Shadow Test
Time required: 5 minutes. Cost: ~$0.15.
Step 1: Generate a test scene in any AI tool:
"A person standing next to a car, sunset lighting, long shadows"
Step 2: Check the shadows:
- Do they all point the same direction?
- Do their lengths correspond to sunset (long) vs midday (short)?
- Does the person's shadow align with the car's shadow?
In most 2024-era tools, you'll find inconsistencies.
Step 3: Generate the same prompt in Nano Banana 2.
Step 4: Compare. The difference in shadow coherence is immediate and obvious.
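The geometry behind the shadow test is simple: on flat ground, shadow length is L = h / tan(sun elevation), and because the whole scene shares one sun, the shadow-to-height ratio must be identical for every object. A small sketch (heights are assumed example values):

```python
import math

def shadow_length(object_height_m, sun_elevation_deg):
    """Length of a shadow cast on flat ground: L = h / tan(elevation)."""
    return object_height_m / math.tan(math.radians(sun_elevation_deg))

person = 1.75  # metres, assumed height
# Near sunset (sun 10 degrees above the horizon) vs midday (60 degrees):
print(round(shadow_length(person, 10), 1))   # ~9.9 m: long sunset shadow
print(round(shadow_length(person, 60), 2))   # ~1.01 m: short midday shadow

# Consistency check: person and car share one sun, so their
# shadow-to-height ratios must match.
car = 1.5
ratio_person = shadow_length(person, 10) / person
ratio_car = shadow_length(car, 10) / car
print(abs(ratio_person - ratio_car) < 1e-9)  # True
```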
The Reflection Test
Step 1:
"A coffee shop interior, person reading at a table, window behind them showing city street"
Step 2: Check the window:
- Does it reflect the interior lights correctly?
- Does the reflection of the person match their actual pose?
- Does the street scene outside align with the reflection?
Nano Banana 2 maintains reflection consistency that would require manual compositing in other tools.
The Perspective Test
Step 1:
"A long hallway with doors on both sides, low camera angle looking down"
Step 2: Check perspective:
- Do the ceiling, floor, and door frames converge toward a vanishing point?
- Do door sizes decrease with distance?
- Does the ceiling height appear consistent?
This is where Nano Banana 2's spatial reasoning shines. The perspective is geometrically coherent, not "approximately right."
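You can make the perspective test quantitative: pick two receding edges (say, the floor line and the ceiling line), extend each, and check that they meet at a single vanishing point. A small line-intersection sketch, with made-up pixel coordinates:

```python
def vanishing_point(line_a, line_b):
    """Intersect two image-space lines, each given as a pair of (x, y) points.

    In a correct one-point perspective, floor edges, ceiling edges, and door
    frames should all intersect at (nearly) the same point.
    """
    (x1, y1), (x2, y2) = line_a
    (x3, y3), (x4, y4) = line_b
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(denom) < 1e-9:
        return None  # parallel in the image: no finite vanishing point
    a = x1 * y2 - y1 * x2
    b = x3 * y4 - y3 * x4
    px = (a * (x3 - x4) - (x1 - x2) * b) / denom
    py = (a * (y3 - y4) - (y1 - y2) * b) / denom
    return (px, py)

# Floor edge and ceiling edge of a hallway (example pixel coordinates):
floor_edge = ((0, 600), (400, 350))
ceiling_edge = ((0, 0), (400, 250))
print(vanishing_point(floor_edge, ceiling_edge))  # (480.0, 300.0)
```

Repeat with the door frames: if each pair of edges yields a noticeably different intersection point, the perspective is only "approximately right."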
What Spatial Logic Enables
Architectural Visualization
Chen's new workflow:
T1: "Modern office lobby, 3-story height, glass curtain wall on south side"
T2: "Morning light entering from the glass wall, show shadows on the floor"
T3: "Add reception desk in the center, natural wood material"
T4: "The desk should cast a shadow consistent with the morning light angle"
T5: "Add reflection of the glass wall in the polished floor"
Each element respects the same light source. Shadows align. Reflections match. The scene is physically plausible.
Chen's architect colleague: "This I can work with. The lighting study is actually useful."
Product Photography
E-commerce teams need products in realistic contexts:
"Wireless earbuds on a marble countertop, cafe background,
natural window light from the left"
Critical for credibility:
- Contact shadows: Where the product meets the surface
- Reflection: The marble should reflect the earbuds
- Background blur: Bokeh should be optically correct for the implied camera settings
- Light wrap: Edges facing the window should catch light
Nano Banana 2 generates these physical details natively. Other tools require manual addition or look subtly fake.
Film Previsualization
Directors need to plan shots. Physical coherence matters:
"Over-the-shoulder shot, person looking at painting on wall,
dramatic lighting from a single overhead source"
For previs to be useful:
- The shoulder should partially obscure the painting (occlusion)
- The painting should be lit from above, not front-lit
- Shadows should fall downward
- The angle should suggest a real camera position
Nano Banana 2's spatial reasoning generates physically plausible compositions directors can actually use for planning.
Spatial Logic in Practice
Lighting Scenarios
Scenario 1: Consistent Light Source
"A dining room at sunset, golden hour light streaming through west-facing windows"
What to check:
- All shadows fall eastward (away from the setting sun)
- Warm color temperature on illuminated surfaces
- Cooler shadows (ambient sky light)
- Long shadow lengths (low sun angle)
Scenario 2: Multiple Light Sources
"A kitchen at night, warm under-cabinet lighting plus cool moonlight from window"
What to check:
- Two distinct shadow directions
- Color mixing where lights overlap
- Logical placement of light sources (cabinets above, moon outside)
Scenario 3: Complex Reflections
"A hall of mirrors, person standing in the center"
What to check:
- Reflections show the person from correct angles
- Infinite mirror reflections follow geometric rules
- No "impossible" reflections showing things not in the scene
Perspective Scenarios
Scenario 1: One-Point Perspective
"Looking down a train platform, vanishing point in the center"
All horizontal lines should converge to that center point.
Scenario 2: Two-Point Perspective
"Corner of a building seen from street level, looking up"
Horizontal lines converge to left and right vanishing points. Verticals stay vertical.
Scenario 3: Three-Point Perspective
"Skyscraper viewed from ground looking straight up"
Adds vertical convergence. Difficult for traditional AI. Nano Banana 2 handles it coherently.
Object Relationship Scenarios
Scenario 1: Occlusion
"Three books stacked on a table, the middle book slightly pulled out"
The middle book should partially obscure the book behind it. The top book should cover part of the middle.
Scenario 2: Scale Consistency
"A cat sitting next to a laptop computer"
The cat should be appropriately sized relative to the laptop. No "giant cat" or "tiny laptop."
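A rough way to quantify the cat-and-laptop check: for objects at similar depth, the ratio of their pixel heights should approximate the ratio of their real-world heights. The sizes below are assumed example values, and the tolerance absorbs perspective foreshortening and pose variation:

```python
def scale_plausible(px_height_a, px_height_b,
                    real_height_a_m, real_height_b_m, tolerance=0.3):
    """Sanity check: pixel-height ratio should roughly match real-height ratio
    for two objects at similar depth in the scene.
    """
    image_ratio = px_height_a / px_height_b
    real_ratio = real_height_a_m / real_height_b_m
    return abs(image_ratio / real_ratio - 1.0) <= tolerance

# Assumed sizes: a sitting cat ~0.35 m tall, an open laptop ~0.22 m tall.
print(scale_plausible(350, 220, 0.35, 0.22))  # True: proportions agree
print(scale_plausible(700, 220, 0.35, 0.22))  # False: "giant cat"
```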
Scenario 3: Contact Physics
"A wine glass on a tablecloth"
The glass base should slightly depress the tablecloth. The contact should look physically grounded, not floating.
Comparison: With and Without Spatial Logic
Test Case: Interior Office
Prompt: "Modern office, afternoon sun through large windows, person working at desk, plants in the corner"
| Aspect | Traditional AI | Nano Banana 2 |
|---|---|---|
| Shadow direction | Inconsistent (multiple light sources implied) | Uniform (single coherent source) |
| Plant shadows | Don't match window position | Align with actual window placement |
| Desk surface lighting | Uniformly lit | Gradient (brighter near window) |
| Person's shadow | Random direction | Matches other shadows |
| Window reflection | Generic sky | Matches described time of day |
Test Case: Product on Table
Prompt: "Smartphone on wooden table, overhead lighting, cafe background"
| Aspect | Traditional AI | Nano Banana 2 |
|---|---|---|
| Contact shadow | Missing or wrong direction | Present, consistent with overhead light |
| Table reflection | Generic blur | Shows bottom of phone correctly |
| Background blur | Random bokeh | Optically plausible for implied aperture |
| Light on phone surface | Uniform | Highlight where overhead light hits |
When Spatial Logic Matters Most
Must Have Physical Coherence
| Use Case | Why Physics Matters |
|---|---|
| Architectural visualization | Clients evaluate lighting and space |
| Product photography | Credibility requires physical plausibility |
| Film previsualization | Directors plan real shots based on previs |
| Scientific illustration | Accuracy is the point |
| Educational content | Wrong physics teaches wrong concepts |
Nice to Have Physical Coherence
| Use Case | Acceptable Trade-offs |
|---|---|
| Social media content | Viewers scroll quickly |
| Concept art | Artistic license excuses some errors |
| Abstract imagery | Physics may not apply |
| Decorative imagery | Beauty over accuracy |
Doesn't Need Physical Coherence
| Use Case | Why Physics Doesn't Matter |
|---|---|
| Surreal art | Impossible is the point |
| Dreams/fantasy | Reality rules don't apply |
| Pattern/texture generation | No scene to be coherent |
Limitations of Current Spatial Logic
Still Learning: Complex Optics
- Caustics: Light focusing through glass/water (pools of light)
- Subsurface scattering: Light entering and bouncing within materials (skin, wax)
- Volumetrics: Light beams through fog/dust
Nano Banana 2 gets the basics right. Advanced optical phenomena are still evolving.
Still Learning: Dynamics
Static scenes work best. Motion blur and action poses with complex physics (sports, collisions) remain harder.
Still Learning: Scale Extremes
Macro photography (insect eyes) and astrophotography (galaxy scales) push the limits of training data coherence.
The Future: Physics-Aware Generation
Where This Is Heading
2024: "Generate an image that looks right"
2026 (Nano Banana 2): "Generate an image that is physically coherent"
2027-2028: "Generate a scene with accurate physics simulation" (light transport, material properties, dynamics)
The trajectory: from appearance to simulation.
Implications
As AI spatial reasoning improves:
- Architecture: AI-generated renderings become reliable for lighting studies
- Film: Previs becomes production-ready
- E-commerce: AI product photos become indistinguishable from studio photography
- Education: AI illustrations can be trusted for accuracy
The line between "AI-generated" and "physically accurate" blurs.
Series Navigation
This is Article 3 of the Nano Banana 2 Masterclass Series.
- Previous: E02: From Text-to-Image to Conversation-to-Image
- Next: E04: From Premium Pricing to Pennies-per-Image
- Series Overview: Masterclass Index
Physics was the credibility gap. It's closing.
