From Chaos to Physics: Spatial Logic in AI Images
Why most AI-generated scenes look 'off'—and how Nano Banana 2's spatial reasoning finally gets lighting, perspective, and object relationships right.
Published on 2026-02-28
The Uncanny Valley of Space
Look at enough AI-generated images, and you develop a sixth sense. Something feels wrong before you can articulate why.
The shadow falls left, but the window is on the right. A person stands on a staircase that leads nowhere. Reflections in a mirror show a different room entirely. Objects float slightly above tables. Hands hold cups at impossible angles.
AI image models are masters of texture and style. But historically, they've been terrible at physics.
Meet Chen. He's an architectural visualization artist in Shanghai. In 2024, he experimented with AI for interior renderings. His prompt: "Modern living room, floor-to-ceiling windows, sunlight streaming in, minimalist furniture."
The result looked beautiful—at first glance. Then his architect colleague pointed out:
- The shadows suggested the sun was below the horizon
- The reflection in the glass table showed a completely different room
- The perspective lines of the floor and ceiling didn't converge correctly
- The sofa cast a shadow in two different directions
"It looks like a dream," his colleague said. "Dreams don't follow physics."
Chen spent three hours in Photoshop fixing the errors. He might as well have rendered the scene traditionally from the start.
This is the dirty secret of 2024-era AI image generation: surface-level beauty, physical nonsense.
Why Physics Is Hard for AI
The Diffusion Model Blindspot
Diffusion models (DALL-E, Midjourney, Stable Diffusion) learn patterns, not physics. They're trained on billions of images and learn:
- "Rooms often have windows"
- "Windows often have light coming through"
- "Light creates shadows"
But they don't learn:
- "Light travels in straight lines"
- "Shadows point away from light sources"
- "Reflections follow the law of reflection"
So they generate "shadow-like textures" that look shadowy but don't correspond to actual light sources. They generate "reflection-like patterns" that look reflective but don't mirror the actual scene.
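The law of reflection the models fail to learn is a one-line formula: the reflected direction is r = d − 2(d·n)n, where d is the incoming direction and n the unit surface normal. A minimal 2D sketch (illustrative only, not part of any model's internals):

```python
def reflect(d, n):
    """Reflect incoming direction d about unit surface normal n (2D).

    Implements the law of reflection: r = d - 2 (d . n) n.
    Both vectors are (x, y) tuples; n must be unit length.
    """
    dot = d[0] * n[0] + d[1] * n[1]
    return (d[0] - 2 * dot * n[0], d[1] - 2 * dot * n[1])

# A ray travelling down-and-right hits a horizontal mirror (normal points up):
incoming = (1.0, -1.0)
normal = (0.0, 1.0)
print(reflect(incoming, normal))  # (1.0, 1.0): angle in equals angle out
```

A diffusion model has no such constraint anywhere in its pipeline; it only knows that reflective-looking pixels tend to appear near shiny surfaces.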
The Compounding Error Problem
One small physics error cascades. If the light direction is wrong, shadows are wrong. If shadows are wrong, object placement looks random. If objects feel random, the whole scene feels fake.
Users develop unconscious pattern recognition: "AI image" = "beautiful but slightly wrong."
The Human Cost
For professional use cases—architecture, product visualization, film previsualization—these errors aren't quirks. They're deal-breakers.
- Architecture client: "Why does the sunlight hit the north wall?"
- Product photographer: "The reflection shows a different product. We can't use this."
- Film director: "The perspective is off. I can't plan the shot."
Each requires manual correction, often negating the time savings of AI generation.
Nano Banana 2: Spatial Reasoning Engine
From Pattern Matching to Understanding
Nano Banana 2 doesn't just recognize visual patterns. It reasons about:
- Light sources: Where is the light coming from? What's its color and intensity?
- Occlusion: What blocks what? What's in front, what's behind?
- Perspective: How do parallel lines converge? What's the camera angle?
- Reflections: What should be visible in reflective surfaces?
- Scale relationships: How big is object A relative to object B?
This isn't post-processing. It's native spatial reasoning built into the multimodal architecture.
The Technical Difference
Traditional diffusion:
[Prompt: "room with window"] → [Generate pixels that statistically match "room" and "window"]
Nano Banana 2:
[Prompt: "room with window"] →
[Understand: window is light source] →
[Calculate: light enters from direction X] →
[Generate: shadows consistent with direction X] →
[Verify: perspective lines converge correctly]
It's not just generating. It's simulating.
You Can Take Action Now
The Shadow Test
Time required: 5 minutes. Cost: ~$0.15.
Step 1: Generate a test scene in any AI tool:
"A person standing next to a car, sunset lighting, long shadows"
Step 2: Check the shadows:
- Do they all point the same direction?
- Do their lengths correspond to sunset (long) vs midday (short)?
- Does the person's shadow align with the car's shadow?
In most 2024-era tools, you'll find inconsistencies.
Step 3: Generate the same prompt in Nano Banana 2.
Step 4: Compare. The difference in shadow coherence is immediate and obvious.
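The geometry behind the shadow test is simple: on flat ground, shadow length is L = h / tan(sun elevation), and because the whole scene shares one sun, the shadow-to-height ratio must be identical for every object. A small sketch (heights are assumed example values):

```python
import math

def shadow_length(object_height_m, sun_elevation_deg):
    """Length of a shadow cast on flat ground: L = h / tan(elevation)."""
    return object_height_m / math.tan(math.radians(sun_elevation_deg))

person = 1.75  # metres, assumed height
# Near sunset (sun 10 degrees above the horizon) vs midday (60 degrees):
print(round(shadow_length(person, 10), 1))   # ~9.9 m: long sunset shadow
print(round(shadow_length(person, 60), 2))   # ~1.01 m: short midday shadow

# Consistency check: person and car share one sun, so their
# shadow-to-height ratios must match.
car = 1.5
ratio_person = shadow_length(person, 10) / person
ratio_car = shadow_length(car, 10) / car
print(abs(ratio_person - ratio_car) < 1e-9)  # True
```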
The Reflection Test
Step 1:
"A coffee shop interior, person reading at a table, window behind them showing city street"
Step 2: Check the window:
- Does it reflect the interior lights correctly?
- Does the reflection of the person match their actual pose?
- Does the street scene outside align with the reflection?
Nano Banana 2 maintains reflection consistency that would require manual compositing in other tools.
The Perspective Test
Step 1:
"A long hallway with doors on both sides, low camera angle looking down"
Step 2: Check perspective:
- Do the ceiling, floor, and door frames converge toward a vanishing point?
- Do door sizes decrease with distance?
- Does the ceiling height appear consistent?
This is where Nano Banana 2's spatial reasoning shines. The perspective is geometrically coherent, not "approximately right."
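You can make the perspective test quantitative: pick two receding edges (say, the floor line and the ceiling line), extend each, and check that they meet at a single vanishing point. A small line-intersection sketch, with made-up pixel coordinates:

```python
def vanishing_point(line_a, line_b):
    """Intersect two image-space lines, each given as a pair of (x, y) points.

    In a correct one-point perspective, floor edges, ceiling edges, and door
    frames should all intersect at (nearly) the same point.
    """
    (x1, y1), (x2, y2) = line_a
    (x3, y3), (x4, y4) = line_b
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(denom) < 1e-9:
        return None  # parallel in the image: no finite vanishing point
    a = x1 * y2 - y1 * x2
    b = x3 * y4 - y3 * x4
    px = (a * (x3 - x4) - (x1 - x2) * b) / denom
    py = (a * (y3 - y4) - (y1 - y2) * b) / denom
    return (px, py)

# Floor edge and ceiling edge of a hallway (example pixel coordinates):
floor_edge = ((0, 600), (400, 350))
ceiling_edge = ((0, 0), (400, 250))
print(vanishing_point(floor_edge, ceiling_edge))  # (480.0, 300.0)
```

Repeat with the door frames: if each pair of edges yields a noticeably different intersection point, the perspective is only "approximately right."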
What Spatial Logic Enables
Architectural Visualization
Chen's new workflow:
T1: "Modern office lobby, 3-story height, glass curtain wall on south side"
T2: "Morning light entering from the glass wall, show shadows on the floor"
T3: "Add reception desk in the center, natural wood material"
T4: "The desk should cast a shadow consistent with the morning light angle"
T5: "Add reflection of the glass wall in the polished floor"
Each element respects the same light source. Shadows align. Reflections match. The scene is physically plausible.
Chen's architect colleague: "This I can work with. The lighting study is actually useful."
Product Photography
E-commerce teams need products in realistic contexts:
"Wireless earbuds on a marble countertop, cafe background,
natural window light from the left"
Critical for credibility:
- Contact shadows: Where the product meets the surface
- Reflection: The marble should reflect the earbuds
- Background blur: Bokeh should be optically correct for the implied camera settings
- Light wrap: Edges facing the window should catch light
Nano Banana 2 generates these physical details natively. Other tools require manual addition or look subtly fake.
Film Previsualization
Directors need to plan shots. Physical coherence matters:
"Over-the-shoulder shot, person looking at painting on wall,
dramatic lighting from a single overhead source"
For previs to be useful:
- The shoulder should partially obscure the painting (occlusion)
- The painting should be lit from above, not front-lit
- Shadows should fall downward
- The angle should suggest a real camera position
Nano Banana 2's spatial reasoning generates physically plausible compositions directors can actually use for planning.
Spatial Logic in Practice
Lighting Scenarios
Scenario 1: Consistent Light Source
"A dining room at sunset, golden hour light streaming through west-facing windows"
What to check:
- All shadows fall eastward (away from the setting sun)
- Warm color temperature on illuminated surfaces
- Cooler shadows (ambient sky light)
- Long shadow lengths (low sun angle)
Scenario 2: Multiple Light Sources
"A kitchen at night, warm under-cabinet lighting plus cool moonlight from window"
What to check:
- Two distinct shadow directions
- Color mixing where lights overlap
- Logical placement of light sources (cabinets above, moon outside)
Scenario 3: Complex Reflections
"A hall of mirrors, person standing in the center"
What to check:
- Reflections show the person from correct angles
- Infinite mirror reflections follow geometric rules
- No "impossible" reflections showing things not in the scene
Perspective Scenarios
Scenario 1: One-Point Perspective
"Looking down a train platform, vanishing point in the center"
All horizontal lines should converge to that center point.
Scenario 2: Two-Point Perspective
"Corner of a building seen from street level, looking up"
Horizontal lines converge to left and right vanishing points. Verticals stay vertical.
Scenario 3: Three-Point Perspective
"Skyscraper viewed from ground looking straight up"
Adds vertical convergence. Difficult for traditional AI. Nano Banana 2 handles it coherently.
Object Relationship Scenarios
Scenario 1: Occlusion
"Three books stacked on a table, the middle book slightly pulled out"
The middle book should partially obscure the book behind it. The top book should cover part of the middle.
Scenario 2: Scale Consistency
"A cat sitting next to a laptop computer"
The cat should be appropriately sized relative to the laptop. No "giant cat" or "tiny laptop."
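A rough way to quantify the cat-and-laptop check: for objects at similar depth, the ratio of their pixel heights should approximate the ratio of their real-world heights. The sizes below are assumed example values, and the tolerance absorbs perspective foreshortening and pose variation:

```python
def scale_plausible(px_height_a, px_height_b,
                    real_height_a_m, real_height_b_m, tolerance=0.3):
    """Sanity check: pixel-height ratio should roughly match real-height ratio
    for two objects at similar depth in the scene.
    """
    image_ratio = px_height_a / px_height_b
    real_ratio = real_height_a_m / real_height_b_m
    return abs(image_ratio / real_ratio - 1.0) <= tolerance

# Assumed sizes: a sitting cat ~0.35 m tall, an open laptop ~0.22 m tall.
print(scale_plausible(350, 220, 0.35, 0.22))  # True: proportions agree
print(scale_plausible(700, 220, 0.35, 0.22))  # False: "giant cat"
```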
Scenario 3: Contact Physics
"A wine glass on a tablecloth"
The glass base should slightly depress the tablecloth. The contact should look physically grounded, not floating.
Comparison: With and Without Spatial Logic
Test Case: Interior Office
Prompt: "Modern office, afternoon sun through large windows, person working at desk, plants in the corner"
| Aspect | Traditional AI | Nano Banana 2 |
|---|---|---|
| Shadow direction | Inconsistent (multiple light sources implied) | Uniform (single coherent source) |
| Plant shadows | Don't match window position | Align with actual window placement |
| Desk surface lighting | Uniformly lit | Gradient (brighter near window) |
| Person's shadow | Random direction | Matches other shadows |
| Window reflection | Generic sky | Matches described time of day |
Test Case: Product on Table
Prompt: "Smartphone on wooden table, overhead lighting, cafe background"
| Aspect | Traditional AI | Nano Banana 2 |
|---|---|---|
| Contact shadow | Missing or wrong direction | Present, consistent with overhead light |
| Table reflection | Generic blur | Shows bottom of phone correctly |
| Background blur | Random bokeh | Optically plausible for implied aperture |
| Light on phone surface | Uniform | Highlight where overhead light hits |
When Spatial Logic Matters Most
Must Have Physical Coherence
| Use Case | Why Physics Matters |
|---|---|
| Architectural visualization | Clients evaluate lighting and space |
| Product photography | Credibility requires physical plausibility |
| Film previsualization | Directors plan real shots based on previs |
| Scientific illustration | Accuracy is the point |
| Educational content | Wrong physics teaches wrong concepts |
Nice to Have Physical Coherence
| Use Case | Acceptable Trade-offs |
|---|---|
| Social media content | Viewers scroll quickly |
| Concept art | Artistic license excuses some errors |
| Abstract imagery | Physics may not apply |
| Decorative imagery | Beauty over accuracy |
Doesn't Need Physical Coherence
| Use Case | Why Physics Doesn't Matter |
|---|---|
| Surreal art | Impossible is the point |
| Dreams/fantasy | Reality rules don't apply |
| Pattern/texture generation | No scene to be coherent |
Limitations of Current Spatial Logic
Still Learning: Complex Optics
- Caustics: Light focusing through glass/water (pools of light)
- Subsurface scattering: Light entering and bouncing within materials (skin, wax)
- Volumetrics: Light beams through fog/dust
Nano Banana 2 gets the basics right. Advanced optical phenomena are still evolving.
Still Learning: Dynamics
Static scenes work best. Motion blur and action poses with complex physics (sports, collisions) remain harder.
Still Learning: Scale Extremes
Macro photography (insect eyes) and astrophotography (galaxy scales) push the limits of training data coherence.
The Future: Physics-Aware Generation
Where This Is Heading
2024: "Generate an image that looks right"
2026 (Nano Banana 2): "Generate an image that is physically coherent"
2027-2028: "Generate a scene with accurate physics simulation" (light transport, material properties, dynamics)
The trajectory: from appearance to simulation.
Implications
As AI spatial reasoning improves:
- Architecture: AI-generated renderings become reliable for lighting studies
- Film: Previs becomes production-ready
- E-commerce: AI product photos become indistinguishable from studio photography
- Education: AI illustrations can be trusted for accuracy
The line between "AI-generated" and "physically accurate" blurs.
Series Navigation
This is Article 3 of the Nano Banana 2 Masterclass Series.
- Previous: E02: From Text-to-Image to Conversation-to-Image
- Next: E04: From Premium Pricing to Pennies-per-Image
- Series Overview: Masterclass Index
Physics was the credibility gap. It's closing.
