From LoRA to Zero-Training: The Character Consistency Revolution
How Nano Banana 2 eliminates the biggest pain point in AI image generation, character consistency, with no training, no waiting, and no headaches.
Published 2026-02-26
The Character Consistency Nightmare
In 2024, AI image generation had a dirty secret: you could generate a beautiful character once, but you could never generate them twice.
Meet Sarah. She runs a small design agency in Austin. In March 2024, she landed a dream client—a children's book publisher needing 24 illustrations of a recurring protagonist. The character: a curious red fox named Rusty, with specific markings, a green scarf, and expressive amber eyes.
Sarah's workflow looked like this:
Week 1: Generate 200+ images in Midjourney. Find 3 that vaguely match the client's vision. Present them.
Week 2: Client selects Rusty v2. Now Sarah needs to generate Rusty in 24 different scenes. Same fox. Same scarf. Same eyes.
Attempt 1: Add "consistent character" to prompts. Result: 24 different foxes. Some orange. Some brown. One inexplicably purple.
Attempt 2: Use Midjourney's Character Reference (--cref) feature. Better, but the scarf color drifts. The eye shape changes. Background elements bleed into the character.
Attempt 3: Train a LoRA. Sarah spends $50 on cloud GPU credits. Waits 6 hours for training. The LoRA overfits—every Rusty has the exact same pose. Client wants Rusty running, jumping, sleeping. The LoRA can only do "Rusty standing and looking cute."
Total time: 3 weeks. Total cost: $800 in tools and revisions. Client satisfaction: "Can you make episode 7's Rusty look more like episode 3's Rusty?"
This was the reality of AI image generation in 2024. Character consistency was the industry's open wound.
The Old Solutions (And Why They Failed)
Solution 1: Prompt Engineering
The Promise: Write detailed prompts, and the AI will remember.
The Reality:
"A red fox named Rusty, orange fur with white chest patch,
wearing a forest green scarf, amber eyes, friendly expression..."
Generate 10 images. You get 10 different scarves. 3 different eye colors. One fox with two tails.
Current diffusion models do not "remember" characters. They sample from probability distributions. Each image is a fresh roll of the dice.
Success rate: ~15% for simple characters, ~3% for complex ones.
Solution 2: Character Reference (Midjourney --cref)
Midjourney's 2024 Character Reference was a step forward. Upload a reference image, add --cref URL, and hope.
The Problems:
- Style bleeding: The reference image's lighting and background contaminate new generations
- Feature drift: Facial features wander across generations
- Limited control: Works for portraits, fails for complex poses or extreme angles
Success rate: ~40% for headshots, ~10% for full-body action shots.
Solution 3: LoRA Training
The "professional" solution. Train a small model on 15-30 images of your character. Then use that LoRA in your generations.
The Workflow:
- Collect 20+ high-quality images of your character (or generate them painstakingly)
- Label each image with captions
- Rent a GPU ($2/hour)
- Train for 2-6 hours
- Test, realize it overfit, adjust parameters
- Retrain
- Discover the LoRA works for front-facing poses but fails on profiles
- Collect more profile images
- Retrain
- Finally get acceptable results—for one specific character
Time per character: 8-20 hours. Cost: $30-100 in compute. Expertise required: Significant.
And when the client says: "We love Rusty! Now we need his sister, a blue-gray fox with a yellow scarf"—you start over.
Nano Banana 2: The Zero-Training Revolution
January 2026. Google releases Nano Banana 2 (Gemini 3.1 Flash Image). The feature that matters: native reference image support.
Not LoRA. Not training. Upload up to 6 reference images. The model understands. The character stays consistent.
Sarah's New Workflow (February 2026)
Same client. Same Rusty. New approach:
Step 1: Generate or upload 3-6 reference images of Rusty:
- Front view, neutral expression
- Side profile
- 3/4 view with scarf visible
- Close-up of face markings
- Full body standing
- Action pose (running)
Step 2: Generate scene 1:
"Rusty the fox exploring a forest clearing, morning light,
curious expression, children's book illustration style"
Reference images: [upload 6 Rusty refs]
Result: Rusty. Correct orange fur. White chest patch. Forest green scarf. Amber eyes.
Step 3: Generate scene 2:
"Rusty jumping over a stream, dynamic pose, water splashing"
Reference images: [same 6 refs]
Result: Same Rusty. In motion. Scarf flowing correctly. Eyes still amber.
Step 4-24: Repeat for remaining scenes. Each Rusty is the same Rusty.
Total time: 2 days. Total cost: ~$15 in API calls. Client satisfaction: "This is exactly what we envisioned."
The difference is not incremental. It is categorical.
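In code, a reference-driven workflow like Sarah's can be sketched with the google-genai Python SDK. This is a minimal sketch: the model ID below mirrors the article's naming and is an assumption, not a verified identifier, and `refs` stands in for loaded reference images.

```python
# Sketch: one scene generation with up to 6 reference images.
# Assumes the google-genai SDK; the model ID is an assumption based on
# the article's naming, not a verified identifier.
from typing import Sequence

MAX_REFS = 6  # the reference-image cap discussed in this article

def build_contents(prompt: str, refs: Sequence) -> list:
    """Assemble the multimodal request: prompt text plus up to 6 reference images."""
    if not refs:
        raise ValueError("at least one reference image is required")
    return [prompt, *list(refs)[:MAX_REFS]]

def generate_scene(client, prompt: str, refs: Sequence,
                   model: str = "gemini-3.1-flash-image"):
    """client: a google-genai Client; refs: PIL images or uploaded file handles."""
    return client.models.generate_content(
        model=model,
        contents=build_contents(prompt, refs),
    )
```

Every scene in the book reuses the same `refs` list; only `prompt` changes between calls.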
How Native Reference Images Work
The Technical Shift
Traditional diffusion models: [Text] → [Noise] → [Image]
Nano Banana 2: [Text + Reference Images + Context] → [Multimodal Understanding] → [Consistent Image]
The key: multimodal reasoning. Nano Banana 2 doesn't "copy" pixels from references. It understands what makes Rusty "Rusty"—the fur pattern, the scarf color, the eye shape, the personality—and applies that understanding to new contexts.
The 6-Reference Sweet Spot
Why 6? Through extensive testing, Google found diminishing returns beyond 6 references:
| References | Consistency | Generation Time | Use Case |
|---|---|---|---|
| 1-2 | 60% | Fast | Quick tests, simple objects |
| 3-4 | 85% | Normal | Standard characters |
| 5-6 | 95%+ | Normal | Production characters |
| 7+ | 96% | Slower | Marginal improvement |
Recommended reference set:
- Front-facing portrait (neutral expression)
- Side profile (showing silhouette)
- 3/4 view (most versatile angle)
- Detail shot (face/unique features)
- Full body (proportions)
- Action/expression variation (personality)
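A tiny helper can sanity-check a reference set against this checklist before you start generating. The shot-type labels are illustrative labels of my own, not an API concept, and the file names are hypothetical.

```python
# Sketch: check that a reference set covers the six recommended shot types.
# Shot-type names and file paths are illustrative, not part of any API.
RECOMMENDED_SHOTS = {"front", "profile", "three_quarter", "detail", "full_body", "action"}

def missing_shots(reference_set: dict) -> set:
    """reference_set maps shot type -> image path; returns uncovered shot types."""
    return RECOMMENDED_SHOTS - set(reference_set)

refs = {
    "front": "rusty_front.png",
    "profile": "rusty_side.png",
    "three_quarter": "rusty_34.png",
    "detail": "rusty_face_markings.png",
}
# missing_shots(refs) → {"full_body", "action"}
```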
What Stays Consistent (And What Doesn't)
Highly Consistent (95%+ reliability):
- Facial features and structure
- Color schemes (fur, clothing, accessories)
- Proportions and body type
- Distinctive markings (scars, patterns)
Moderately Consistent (80-90% reliability):
- Lighting direction (model adapts to scene)
- Expression intensity (mood varies with context)
- Clothing details (may simplify complex patterns)
Intentionally Variable (by design):
- Pose and angle (adapted to each scene)
- Background (varies by context)
- Lighting quality (adapts to environment)
You Can Take Action Now
Your First Character Consistency Test
Time required: 15 minutes. Cost: ~$0.50.
Step 1: Create a simple character
Go to Google AI Studio. Select Gemini 3.1 Flash Image.
Prompt:
"A friendly robot mascot for a tech startup, rounded design,
blue and white color scheme, LED face display, minimalist aesthetic"
Generate 4-6 variations. Pick the best one.
Step 2: Build your reference set
From your generated character, create 6 reference images:
- Crop/resize to focus on different angles
- Or regenerate with prompts like "front view," "side profile," "close-up face"
Step 3: Test consistency
New prompt:
"The robot mascot working at a desk, typing on a laptop,
office environment, soft lighting"
Upload your 6 reference images. Generate.
Step 4: Test again with different context
"The robot mascot presenting on stage, spotlight, confident pose,
audience visible in background"
Same 6 references. Generate.
Compare: Same robot? Same colors? Same face? That's character consistency.
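The two-scene test above can be scripted. The essential property is that both jobs share one and the same reference list; each job would then be sent through the API as in a normal generation call.

```python
# Sketch: the two-context consistency test, expressed as generation jobs.
# The key property: every job reuses the identical reference list.
TEST_PROMPTS = [
    "The robot mascot working at a desk, typing on a laptop, "
    "office environment, soft lighting",
    "The robot mascot presenting on stage, spotlight, confident pose, "
    "audience visible in background",
]

def consistency_jobs(refs):
    """Pair each test prompt with the same shared reference list."""
    shared = list(refs)
    return [{"prompt": p, "refs": shared} for p in TEST_PROMPTS]

# Each job is then submitted with contents=[job["prompt"], *job["refs"]].
```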
Production Workflow Template
For Brand Mascots
Reference Set:
- 3-4 neutral poses showing full design
- 1-2 expression variations
- 1 detail close-up
Generation Strategy:
- Always use same reference set for all brand materials
- Lock color palette in references, let model adapt lighting
- Generate 3-4 options per scene, select best
Cost Estimate: pennies per generation in API calls, versus $50-200 for LoRA training per character.
For Storybook Illustrations
Reference Set:
- Character A: 6 refs
- Character B: 6 refs
- Setting/style: 2-3 refs
Generation Strategy:
- Batch generate scenes with consistent references
- Generate characters separately, composite if needed for complex interactions
- Use "children's book illustration style" prompt modifier for consistency
Time Savings: 3 weeks → 3 days per book.
For Product Visualization
Reference Set:
- Product: 4-6 refs (different angles)
- Style/environment: 2 refs
Generation Strategy:
- Product references ensure SKU consistency
- Environment references control mood/lighting
- Generate 50+ scenes without product variation
Use Case: E-commerce teams generating lifestyle images for hundreds of SKUs.
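A batch like this is easy to plan up front: one job per (SKU, scene) pair, with every scene for a SKU pinned to that SKU's reference set. The SKU names, scene list, and `refs/` directory layout below are illustrative.

```python
# Sketch: plan a lifestyle-image batch in which every scene for a SKU
# reuses that SKU's reference set. Names and paths are illustrative.
from itertools import product

def plan_batch(skus, scenes):
    """Build one generation job per (SKU, scene) pair."""
    return [
        {"sku": sku, "prompt": f"{sku} placed in {scene}", "ref_dir": f"refs/{sku}"}
        for sku, scene in product(skus, scenes)
    ]

jobs = plan_batch(["mug-001", "mug-002"], ["a sunlit kitchen", "a cafe table"])
# 2 SKUs x 2 scenes -> 4 jobs, each pinned to its SKU's reference directory
```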
Advanced Techniques
Technique 1: Character + Style Separation
Problem: You want consistent character AND consistent art style across scenes.
Solution: Use 4 references for character, 2 for style.
References 1-4: [Your character in various poses]
References 5-6: [Style examples - e.g., "Studio Ghibli style artwork"]
Prompt: "Character in a forest scene, style matching references 5-6"
The model maintains character consistency from refs 1-4 AND style consistency from refs 5-6.
Technique 2: Seasonal/Temporal Variations
Problem: Your character needs winter clothes in scene 7, but must still be recognizable.
Solution: Keep 4 core references (face/body), replace 2 with seasonal variants.
References 1-4: [Core character - face, body, proportions]
References 5-6: [Character in winter coat, character with snow background]
Prompt: "Character walking through snowy street, wearing winter coat"
Result: Core identity maintained, seasonal variation applied.
Technique 3: Multi-Character Scenes
Problem: Two characters interacting in one image.
Current limitation: Nano Banana 2 supports 6 references total, not 6 per character.
Workaround:
- Generate Character A alone (with A's references)
- Generate Character B alone (with B's references)
- Generate background/environment
- Composite in traditional editing software
Or: Use 3 refs for Character A, 3 refs for Character B, prompt carefully:
"Character A and Character B having coffee together, cafe setting"
Result varies. Best for characters with very distinct silhouettes/color schemes.
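The composite route can be scripted with Pillow, assuming each character has been generated (or cut out) as a transparent RGBA PNG; the coordinates and file handling here are illustrative.

```python
# Sketch: composite separately generated characters onto one background
# with Pillow. Assumes each character cutout is an RGBA image with
# transparency; positions are illustrative.
from PIL import Image

def composite_scene(background, cutouts):
    """background: path or file object; cutouts: list of (path_or_file, (x, y))."""
    scene = Image.open(background).convert("RGBA")
    for src, position in cutouts:
        character = Image.open(src).convert("RGBA")
        scene.alpha_composite(character, dest=position)  # respects transparency
    return scene
```

In the cafe example: generate Character A, Character B, and the cafe background as three separate images, then place A and B at chosen coordinates.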
The Economics of Zero-Training
Cost Comparison: Traditional vs. Nano Banana 2
Scenario: 50-image children's book, 3 recurring characters.
| Method | Setup Time | Per-Image Cost | Total Cost | Revision Flexibility |
|---|---|---|---|---|
| LoRA Training | 24-40 hours | $0.02 | $120-200 | Low (retrain needed) |
| Manual Prompting | 0 hours | $0.05 | $150+ | Medium (inconsistent) |
| Nano Banana 2 | 1 hour | $0.03 | $75 | High (just regenerate) |
Time-to-First-Image
| Method | Time |
|---|---|
| LoRA Training | 6-12 hours (training) |
| Manual Prompting | 5 minutes |
| Nano Banana 2 | 2 minutes (upload refs + generate) |
For client work, this means: same-day character approval, next-day scene delivery.
Real-World Case Studies
Case Study 1: E-commerce Fashion Brand
Client: Direct-to-consumer fashion brand, 200 SKUs.
Old Workflow:
- Hire models: $500/day
- Studio rental: $300/day
- Photography: 2 days per collection
- Post-processing: 3 days
- Total: ~$2000 + 5 days per collection
Nano Banana 2 Workflow:
- Generate brand model references: 30 minutes
- Generate 200 lifestyle scenes: 4 hours
- Select and minor retouch: 1 day
- Total: ~$100 + 1.5 days per collection
Result: 80% cost reduction, 70% time savings. Model consistency across all 200 images.
Case Study 2: Indie Game Developer
Client: Solo developer creating visual novel.
Old Workflow:
- Commission artist: $50-100 per character sprite
- Wait time: 2-4 weeks
- Revisions: $25 each
- 12 characters at $50-100 each: roughly $900 total
Nano Banana 2 Workflow:
- Generate character concepts: 2 hours
- Lock references, generate all expressions/poses: 4 hours
- 12 characters: $30 API cost
Result: 97% cost reduction. Full creative control. Same-day iteration.
Limitations and Workarounds
Limitation 1: Complex Interactions
Two characters holding hands? Hugging? Fighting?
Current State: Challenging. Nano Banana 2 handles single characters excellently. Multi-character interactions can blend features ("chimera effect").
Workaround: Generate characters separately, composite manually. Or use specialized pose-control tools in combination.
Limitation 2: Extreme Angles
Top-down view? Extreme foreshortening?
Current State: Reference images help, but extreme perspectives may drift.
Workaround: Include an extreme-angle shot in your 6 references. Or generate standard angle first, use img2img with perspective transformation.
Limitation 3: Fine Detail Consistency
Specific jewelry patterns? Text on clothing? Precise tattoo designs?
Current State: Broad features stay consistent. Fine details may vary.
Workaround: For critical details, generate base character in Nano Banana 2, then overlay precise details in post-processing.
The Next 12 Months
Character consistency is solved—for now. What's next?
Predicted Evolution:
- Q2 2026: 12+ reference images support for complex characters
- Q3 2026: Built-in character memory/"personas" you can save and reuse
- Q4 2026: Video character consistency (same character across video frames)
- 2027: 3D character consistency (generate same character from any angle)
The arms race has shifted. It's no longer "can we keep characters consistent?" It's "how many characters can we manage, and how quickly?"
Series Navigation
This is Article 1 of the Nano Banana 2 Masterclass Series.
- Next: E02: From Text-to-Image to Conversation-to-Image
- Series Overview: Masterclass Index
Character consistency was the first gate. It has fallen. The evolution continues.
