From Chaos to Physics: Spatial Logic in AI Images

Why most AI-generated scenes look 'off'—and how Nano Banana 2's spatial reasoning finally gets lighting, perspective, and object relationships right.

Published on 2026-02-28

The Uncanny Valley of Space

Look at enough AI-generated images, and you develop a sixth sense. Something feels wrong before you can articulate why.

The shadow falls left, but the window is on the right. A person stands on a staircase that leads nowhere. Reflections in a mirror show a different room entirely. Objects float slightly above tables. Hands hold cups at impossible angles.

AI image models are masters of texture and style. But historically, they've been terrible at physics.

Meet Chen. He's an architectural visualization artist in Shanghai. In 2024, he experimented with AI for interior renderings. His prompt: "Modern living room, floor-to-ceiling windows, sunlight streaming in, minimalist furniture."

The result looked beautiful—at first glance. Then his architect colleague pointed out:

  • The shadows suggested the sun was below the horizon
  • The reflection in the glass table showed a completely different room
  • The perspective lines of the floor and ceiling didn't converge correctly
  • The sofa cast a shadow in two different directions

"It looks like a dream," his colleague said. "Dreams don't follow physics."

Chen spent 3 hours in Photoshop fixing the errors. He might as well have rendered the scene traditionally from the start.

This is the dirty secret of 2024-era AI image generation: surface-level beauty, physical nonsense.


Why Physics Is Hard for AI

The Diffusion Model Blindspot

Diffusion models (DALL-E, Midjourney, Stable Diffusion) learn patterns, not physics. They're trained on billions of images and learn:

  • "Rooms often have windows"
  • "Windows often have light coming through"
  • "Light creates shadows"

But they don't learn:

  • "Light travels in straight lines"
  • "Shadows point away from light sources"
  • "Reflections follow the law of reflection"

So they generate "shadow-like textures" that look shadowy but don't correspond to actual light sources. They generate "reflection-like patterns" that look reflective but don't mirror the actual scene.
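To make those rules concrete, here is a minimal Python sketch of what "following physics" means for a shadow and a reflection. The function names (`shadow_tip`, `reflect`) and the flat ground plane at z = 0 are illustrative assumptions; nothing here comes from any image model's internals.

```python
import math

def normalize(v):
    """Scale a 3D vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def shadow_tip(top, light_dir):
    """Project a point onto the ground plane z = 0 along a straight light ray.

    light_dir is the direction the light travels, so it must point downward.
    """
    x, y, z = top
    dx, dy, dz = light_dir
    if dz >= 0:
        raise ValueError("light must travel downward to cast a ground shadow")
    t = -z / dz  # distance along the ray until it reaches the ground
    return (x + t * dx, y + t * dy, 0.0)

def reflect(d, n):
    """Law of reflection: r = d - 2 (d . n) n, with n the surface normal."""
    n = normalize(n)
    dot = sum(a * b for a, b in zip(d, n))
    return tuple(a - 2 * dot * b for a, b in zip(d, n))

# A 2 m pole lit from the -x side: the shadow lands on the +x side,
# away from the light source.
print(shadow_tip((0, 0, 2), (1, 0, -1)))   # (2.0, 0.0, 0.0)

# A ray hitting a floor mirror bounces back up at the same angle.
print(reflect((1, 0, -1), (0, 0, 1)))      # (1.0, 0.0, 1.0)
```

A renderer applies rules like these to every ray. A model that has only learned what shadows tend to look like has no such constraint tying the shadow to the light.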

The Compounding Error Problem

One small physics error cascades. If the light direction is wrong, shadows are wrong. If shadows are wrong, object placement looks random. If objects feel random, the whole scene feels fake.

Users develop unconscious pattern recognition: "AI image" = "beautiful but slightly wrong."

The Human Cost

For professional use cases—architecture, product visualization, film previsualization—these errors aren't quirks. They're deal-breakers.

  • Architecture client: "Why does the sunlight hit the north wall?"
  • Product photographer: "The reflection shows a different product. We can't use this."
  • Film director: "The perspective is off. I can't plan the shot."

Each requires manual correction, often negating the time savings of AI generation.


Nano Banana 2: Spatial Reasoning Engine

From Pattern Matching to Understanding

Nano Banana 2 doesn't just recognize visual patterns. It reasons about:

  • Light sources: Where is the light coming from? What's its color and intensity?
  • Occlusion: What blocks what? What's in front, what's behind?
  • Perspective: How do parallel lines converge? What's the camera angle?
  • Reflections: What should be visible in reflective surfaces?
  • Scale relationships: How big is object A relative to object B?

This isn't post-processing. It's native spatial reasoning built into the multimodal architecture.

The Technical Difference

Traditional diffusion:

[Prompt: "room with window"] → [Generate pixels that statistically match "room" and "window"]

Nano Banana 2:

[Prompt: "room with window"] → 
[Understand: window is light source] →
[Calculate: light enters from direction X] →
[Generate: shadows consistent with direction X] →
[Verify: perspective lines converge correctly]

It's not just generating pixels. It's reasoning about the scene they depict.


You Can Take Action Now

The Shadow Test

Time required: 5 minutes. Cost: ~$0.15.

Step 1: Generate a test scene in any AI tool:

"A person standing next to a car, sunset lighting, long shadows"

Step 2: Check the shadows:

  • Do they all point the same direction?
  • Are they long, as a low sunset sun requires, rather than short as at midday?
  • Does the person's shadow align with the car's shadow?

In most 2024-era tools, you'll find inconsistencies.

Step 3: Generate the same prompt in Nano Banana 2.

Step 4: Compare. The difference in shadow coherence is immediate and obvious.
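If you'd rather measure than eyeball, the direction check from Step 2 can be sketched in a few lines of Python. This assumes you have measured each shadow's angle in the image by hand (in degrees) and that the objects stand close together, so perspective doesn't noticeably fan out their shadows. The function names and the 10-degree tolerance are illustrative choices, not a standard.

```python
import math

def angular_spread_deg(angles_deg):
    """Largest deviation (in degrees) of any angle from the circular mean."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    mean = math.degrees(math.atan2(s, c))

    def diff(a, b):
        # Smallest absolute difference between two angles, handling wraparound.
        return abs((a - b + 180.0) % 360.0 - 180.0)

    return max(diff(a, mean) for a in angles_deg)

def shadows_consistent(angles_deg, tolerance_deg=10.0):
    """True if every measured shadow points roughly the same way."""
    return angular_spread_deg(angles_deg) <= tolerance_deg

# Person, car, and lamppost shadows measured from one image (made-up values).
print(shadows_consistent([100, 104, 98]))   # True
print(shadows_consistent([100, 250]))       # False
```

Using a circular mean matters here: shadows at 355 and 5 degrees are only 10 degrees apart, which a plain average would miss.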

The Reflection Test

Step 1: Generate this scene:

"A coffee shop interior, person reading at a table, window behind them showing city street"

Step 2: Check the window:

  • Does it reflect the interior lights correctly?
  • Does the reflection of the person match their actual pose?
  • Does the street scene seen through the glass stay distinct from the interior reflected on it?

Nano Banana 2 maintains reflection consistency that would require manual compositing in other tools.

The Perspective Test

Step 1: Generate this scene:

"A long hallway with doors on both sides, low camera angle looking down"

Step 2: Check perspective:

  • Do the ceiling, floor, and door frames converge toward a vanishing point?
  • Do door sizes decrease with distance?
  • Does the ceiling height appear consistent?

This is where Nano Banana 2's spatial reasoning shines. The perspective is geometrically coherent, not "approximately right."
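One way to put a number on "geometrically coherent" is to trace a few receding edges in the image and see whether they meet at a single point. The Python sketch below assumes you have picked two pixel coordinates along each edge by hand; the function names are hypothetical and the sample coordinates are invented for illustration.

```python
import math
from itertools import combinations

def line_through(p, q):
    """Homogeneous line (a, b, c) through two image points, with ax + by + c = 0."""
    (x1, y1), (x2, y2) = p, q
    return (y1 - y2, x2 - x1, x1 * y2 - x2 * y1)

def intersection(l1, l2):
    """Intersection of two homogeneous lines, or None if they are parallel."""
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    w = a1 * b2 - a2 * b1
    if abs(w) < 1e-9:
        return None
    return ((b1 * c2 - b2 * c1) / w, (c1 * a2 - c2 * a1) / w)

def vanishing_point_spread(segments):
    """Return (centroid, max distance from centroid) of all pairwise intersections."""
    lines = [line_through(p, q) for p, q in segments]
    points = []
    for l1, l2 in combinations(lines, 2):
        pt = intersection(l1, l2)
        if pt is not None:
            points.append(pt)
    if not points:
        raise ValueError("no two segments intersect")
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    spread = max(math.hypot(x - cx, y - cy) for x, y in points)
    return (cx, cy), spread

# Floor edge, ceiling edge, and a door-top line from a hallway image.
hallway = [((0, 0), (250, 150)), ((0, 600), (250, 450)), ((0, 300), (100, 300))]
center, spread = vanishing_point_spread(hallway)
print(center, spread)
```

A small spread relative to the image size means the edges agree on one vanishing point. A spread of hundreds of pixels means the hallway only looks like a hallway.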


What Spatial Logic Enables

Architectural Visualization

Chen's new workflow:

T1: "Modern office lobby, 3-story height, glass curtain wall on south side"
T2: "Morning light entering from the glass wall, show shadows on the floor"
T3: "Add reception desk in the center, natural wood material"
T4: "The desk should cast a shadow consistent with the morning light angle"
T5: "Add reflection of the glass wall in the polished floor"

Each element respects the same light source. Shadows align. Reflections match. The scene is physically plausible.

Chen's architect colleague: "This I can work with. The lighting study is actually useful."

Product Photography

E-commerce teams need products in realistic contexts:

"Wireless earbuds on a marble countertop, cafe background, natural window light from the left"

Critical for credibility:

  • Contact shadows: Where the product meets the surface
  • Reflection: The marble should reflect the earbuds
  • Background blur: Bokeh should be optically correct for the implied camera settings
  • Light wrap: Edges facing the window should catch light

Nano Banana 2 generates these physical details natively. Other tools require manual addition or look subtly fake.
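What "optically correct" bokeh means can be estimated with the standard thin-lens blur-circle formula. The Python sketch below is a rough check under that thin-lens assumption; the function name and the example camera settings (a 50 mm lens, product at 0.5 m, cafe background at 3 m) are invented for illustration.

```python
def blur_circle_mm(focal_mm, f_number, focus_mm, subject_mm):
    """Diameter on the sensor of the blur disc for a point at subject_mm
    when a thin lens is focused at focus_mm. All distances in millimetres."""
    if focus_mm <= focal_mm:
        raise ValueError("focus distance must exceed the focal length")
    aperture_mm = focal_mm / f_number                 # entrance pupil diameter
    magnification = focal_mm / (focus_mm - focal_mm)  # at the focus plane
    return aperture_mm * magnification * abs(subject_mm - focus_mm) / subject_mm

wide_open = blur_circle_mm(50, 1.8, 500, 3000)
stopped_down = blur_circle_mm(50, 8, 500, 3000)
print(round(wide_open, 2), round(stopped_down, 2))
```

The takeaway for checking an image: if the product shows shallow depth of field (implying a wide aperture), the background should be heavily blurred too, and objects at the same distance should be blurred by the same amount.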

Film Previsualization

Directors need to plan shots. Physical coherence matters:

"Over-the-shoulder shot, person looking at painting on wall, dramatic lighting from a single overhead source"

For previs to be useful:

  • The shoulder should partially obscure the painting (occlusion)
  • The painting should be lit from above, not front-lit
  • Shadows should fall downward
  • The angle should suggest a real camera position

Nano Banana 2's spatial reasoning generates physically plausible compositions directors can actually use for planning.


Spatial Logic in Practice

Lighting Scenarios

Scenario 1: Consistent Light Source

"A dining room at sunset, golden hour light streaming through west-facing windows"

What to check:

  • All shadows fall eastward (away from the setting sun)
  • Warm color temperature on illuminated surfaces
  • Cooler shadows (ambient sky light)
  • Long shadow lengths (low sun angle)
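The direction and length checks follow from basic sun geometry, sketched here in Python. The function names are hypothetical, azimuth is measured in degrees clockwise from north, and the 10-degree sun elevation is an assumed stand-in for "golden hour", not a value from the prompt.

```python
import math

def shadow_azimuth_deg(sun_azimuth_deg):
    """Shadows on level ground point directly away from the sun."""
    return (sun_azimuth_deg + 180.0) % 360.0

def shadow_length(height, sun_elevation_deg):
    """Shadow length on level ground for an object of the given height."""
    if not 0 < sun_elevation_deg <= 90:
        raise ValueError("sun elevation must be in (0, 90] degrees")
    return height / math.tan(math.radians(sun_elevation_deg))

# Sun due west (azimuth 270) and low in the sky.
print(shadow_azimuth_deg(270))               # 90.0, i.e. due east
print(round(shadow_length(1.0, 10), 2))      # sunset: several times the object's height
print(round(shadow_length(1.0, 60), 2))      # midday: shorter than the object
```

So a 1 m chair back at golden hour should throw a shadow several metres long across the floor. Stubby shadows in a "sunset" scene are a tell.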

Scenario 2: Multiple Light Sources

"A kitchen at night, warm under-cabinet lighting plus cool moonlight from window"

What to check:

  • Two distinct shadow directions
  • Color mixing where lights overlap
  • Logical placement of light sources (cabinets above, moon outside)

Scenario 3: Complex Reflections

"A hall of mirrors, person standing in the center"

What to check:

  • Reflections show the person from correct angles
  • Repeated reflections recede at regular intervals and shrink with distance
  • No "impossible" reflections showing things not in the scene
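For the simplest version of this scene, two parallel mirrors facing each other, the expected image positions can be worked out exactly. The Python sketch below is a one-dimensional simplification; the function name and the mirror spacing are illustrative assumptions.

```python
def mirror_images(a, d, bounces):
    """Positions of the virtual images of an object at x = a standing between
    parallel mirrors at x = 0 and x = d, up to the given number of bounces."""
    images = []
    left, right = a, a  # chains whose first bounce is off the left / right mirror
    for i in range(bounces):
        # Each chain alternates between the two mirrors.
        # Reflecting in x = 0 maps x to -x; reflecting in x = d maps x to 2d - x.
        left = -left if i % 2 == 0 else 2 * d - left
        right = 2 * d - right if i % 2 == 0 else -right
        images.extend([left, right])
    return sorted(images)

# Person standing in the center of a 4 m gap: images are evenly spaced.
print(mirror_images(2, 4, 2))   # [-6, -2, 6, 10]

# Person standing off-center: images come in uneven pairs.
print(mirror_images(1, 4, 2))   # [-7, -1, 7, 9]
```

That regular spacing is what "follow geometric rules" looks like in practice: a centered person should produce an evenly spaced corridor of reflections, each one smaller than the last.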

Perspective Scenarios

Scenario 1: One-Point Perspective

"Looking down a train platform, vanishing point in the center"

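All lines that recede along the platform (rails, platform edges, roof beams) should converge to that center point. Lines running across the view stay parallel.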

Scenario 2: Two-Point Perspective

"Corner of a building seen from street level, camera held level"

Horizontal lines converge to left and right vanishing points. Verticals stay vertical because the camera is level; tilting it up would add a third vanishing point.

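Scenario 3: Three-Point Perspective

"Corner of a skyscraper viewed from the ground, camera tilted up toward the top"

Adds vertical convergence: the building's vertical edges lean toward a third vanishing point overhead. Difficult for traditional AI. Nano Banana 2 handles it coherently.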
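Whether a scene reads as one-, two-, or three-point perspective depends on how the camera is turned relative to the building. A set of parallel edges has a finite vanishing point only if the camera's viewing direction has some component along those edges. The Python sketch below counts this for a box-shaped building aligned with the world axes; the function names and the yaw/pitch convention are assumptions made for illustration.

```python
import math

def camera_forward(yaw_deg, pitch_deg):
    """Viewing direction in world axes (x = east, y = north, z = up)."""
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    return (math.cos(pitch) * math.sin(yaw),
            math.cos(pitch) * math.cos(yaw),
            math.sin(pitch))

def finite_vanishing_points(yaw_deg, pitch_deg, eps=1e-9):
    """Count the building axes whose parallel edges converge in the image.

    Edges along an axis stay parallel in the image when the viewing
    direction has no component along that axis.
    """
    return sum(1 for c in camera_forward(yaw_deg, pitch_deg) if abs(c) > eps)

print(finite_vanishing_points(0, 0))     # 1: facing a wall head-on, camera level
print(finite_vanishing_points(45, 0))    # 2: facing a corner, camera level
print(finite_vanishing_points(45, 30))   # 3: facing a corner, camera tilted up
```

This gives a quick sanity check for any prompt: decide how many vanishing points the described camera should produce, then see whether the image agrees.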

Object Relationship Scenarios

Scenario 1: Occlusion

"Three books stacked on a table, the middle book slightly pulled out"

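The top book should cover part of the middle book, and the middle book should cover part of the bottom one. The pulled-out edge of the middle book should be fully visible and cast a small shadow on whatever lies beneath it.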

Scenario 2: Scale Consistency

"A cat sitting next to a laptop computer"

The cat should be appropriately sized relative to the laptop. No "giant cat" or "tiny laptop."
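Scale can be checked with the pinhole camera relation: an object's height in pixels is proportional to its real height divided by its distance from the camera. The Python sketch below inverts that to get the real-size ratio an image implies. The function names are hypothetical and the pixel measurements are made up for illustration.

```python
def projected_height_px(real_height_m, distance_m, focal_px):
    """Pinhole projection: image height in pixels of an object facing the camera."""
    return focal_px * real_height_m / distance_m

def implied_height_ratio(px_a, px_b, dist_a_m=1.0, dist_b_m=1.0):
    """Real-height ratio A/B implied by measured pixel heights.

    Distances default to equal, which suits objects sitting side by side.
    """
    return (px_a * dist_a_m) / (px_b * dist_b_m)

# Cat measured at 300 px tall, open laptop at 150 px, side by side.
print(implied_height_ratio(300, 150))   # 2.0
```

Whether a ratio of 2.0 is plausible is a judgment call about real cats and real laptops. A ratio of 10, with both objects at the same distance, is the "giant cat" failure.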

Scenario 3: Contact Physics

"A wine glass on a tablecloth"

The glass base should sit flush on the tablecloth with a tight contact shadow beneath it. The contact should look physically grounded, not floating.


Comparison: With and Without Spatial Logic

Test Case: Interior Office

Prompt: "Modern office, afternoon sun through large windows, person working at desk, plants in the corner"

| Aspect | Traditional AI | Nano Banana 2 |
| --- | --- | --- |
| Shadow direction | Inconsistent (multiple light sources implied) | Uniform (single coherent source) |
| Plant shadows | Don't match window position | Align with actual window placement |
| Desk surface lighting | Uniformly lit | Gradient (brighter near window) |
| Person's shadow | Random direction | Matches other shadows |
| Window reflection | Generic sky | Matches described time of day |

Test Case: Product on Table

Prompt: "Smartphone on wooden table, overhead lighting, cafe background"

| Aspect | Traditional AI | Nano Banana 2 |
| --- | --- | --- |
| Contact shadow | Missing or wrong direction | Present, consistent with overhead light |
| Table reflection | Generic blur | Shows bottom of phone correctly |
| Background blur | Random bokeh | Optically plausible for implied aperture |
| Light on phone surface | Uniform | Highlight where overhead light hits |

When Spatial Logic Matters Most

Must Have Physical Coherence

| Use Case | Why Physics Matters |
| --- | --- |
| Architectural visualization | Clients evaluate lighting and space |
| Product photography | Credibility requires physical plausibility |
| Film previsualization | Directors plan real shots based on previs |
| Scientific illustration | Accuracy is the point |
| Educational content | Wrong physics teaches wrong concepts |

Nice to Have Physical Coherence

| Use Case | Acceptable Trade-offs |
| --- | --- |
| Social media content | Viewers scroll quickly |
| Concept art | Artistic license excuses some errors |
| Abstract imagery | Physics may not apply |
| Decorative imagery | Beauty over accuracy |

Doesn't Need Physical Coherence

| Use Case | Why Physics Doesn't Matter |
| --- | --- |
| Surreal art | Impossible is the point |
| Dreams/fantasy | Reality rules don't apply |
| Pattern/texture generation | No scene to be coherent |

Limitations of Current Spatial Logic

Still Learning: Complex Optics

  • Caustics: Light focusing through glass/water (pools of light)
  • Subsurface scattering: Light entering and bouncing within materials (skin, wax)
  • Volumetrics: Light beams through fog/dust

Nano Banana 2 gets the basics right. Advanced optical phenomena are still evolving.

Still Learning: Dynamics

Static scenes work best. Motion blur and action poses with complex physics (sports, collisions) are harder.

Still Learning: Scale Extremes

Macro photography (insect eyes) and astrophotography (galaxy scales) push the limits of training data coherence.


The Future: Physics-Aware Generation

Where This Is Heading

2024: "Generate an image that looks right"

2026 (Nano Banana 2): "Generate an image that is physically coherent"

2027-2028: "Generate a scene with accurate physics simulation" (light transport, material properties, dynamics)

The trajectory: from appearance to simulation.

Implications

As AI spatial reasoning improves:

  • Architecture: AI-generated renderings become reliable for lighting studies
  • Film: Previs becomes production-ready
  • E-commerce: AI product photos become indistinguishable from studio photography
  • Education: AI illustrations can be trusted for accuracy

The line between "AI-generated" and "physically accurate" blurs.


Series Navigation

This is Article 3 of the Nano Banana 2 Masterclass Series.


Physics was the credibility gap. It's closing.