Cover photo

SURF NOIR DEV LOG 1

Building Nearstalgia Bay With (and Against) AI


TL;DR

I spent ~60–80 hours creating an 11-minute animated episode set in the Surf Noir universe using a hybrid AI workflow (Midjourney, Freepik/Seedream, Kling, ElevenLabs, DaVinci Resolve).

This was not “prompt and publish.” It was directing, editing, rewriting, rejecting, and problem-solving at scale. More than 5,500 Midjourney images, 2,600+ Freepik images, and 400,000+ generation credits went into arriving at a finished episode.

Because of beta access and partnerships, the out-of-pocket cost was low — but at current market rates, this episode would realistically cost $1k–$5k to produce with AI. A traditionally animated equivalent could cost anywhere from $40k–$150k+.

AI didn’t replace storytelling. It compressed production distance — allowing one person to iterate like a small studio — while making taste, clarity, and intention the real bottlenecks.

This dev log documents how a story-first approach, not the tools themselves, determines whether AI outputs feel like “slop” or cinema.

INTRODUCTION: FOR THE WORLD-BUILDERS

This document is for anyone who's ever had a vision that seemed too big, too outlandish, or too impossible to do alone.

If you grew up on Final Fantasy, Metal Gear Solid, Pokémon, Dragon Ball Z, One Piece, Harry Potter, Lord of the Rings—anything where you knew the creator approached building a world as their life's work, as if they had to do it or else, as if they had to literally imagine a place because the real world didn't gel with their soul—this is for you.

This is a guide to show you that it's possible. And before you roll your eyes at another "AI will change everything" manifesto, let me be clear: this process actually inspired me to re-incorporate manual art-making into my workflow. AI didn't replace craft; rather, it revealed how much craft is still required.

AI is expensive. AI is time-consuming. AI requires skill, care, and creative vision to use well. And I see the loudest voices in AI influencing as a primary reason AI gets called "slop." They either lack the creativity, care, or money to push the limits of the technology and make something as good as they can. They're more concerned with showing possibilities, rage bait, and collecting engagement checks than actually making a thing.

This Dev Log is for:

  • Indie creators who want to use AI as a tool, not a crutch

  • My supporters who want to know what actually goes into this work

  • Potential collaborators and clients who need to understand the process

  • The anti-AI people who make blanket statements like "if you use AI you're not an artist" (though unfortunately, you probably won't read this)

I spent 60-80+ hours creating an 11.5-minute animated episode set in Nearstalgia Bay, a fictional coastal town in the Surf Noir Archipelago. This document breaks down everything: my creative choices, technical challenges, workflow solutions, and honest cost analysis.

My philosophy, “Make it bad, make it imperfect,” came from a Gawx video documenting his process, and it helped me break through my own perfectionism. It’s poignant because people think AI is all about perfection, when it has so many imperfections. Meanwhile, there's a much-needed, growing movement of people leaving the imperfections of humanity in their work to prove it’s not AI. The imperfections of AI and the imperfections of humanity manifest differently, but the notion was immensely important for someone like me who is prone to perfectionism.

Let's begin.


PART I: CREATIVE FOUNDATIONS

Thematic Motivations

I first came up with Surf Noir and Future Surf in 2020. It was the pandemic and we couldn’t leave the house, so I was left to literally imagine a place. This took form in two ways: exploring real places from afar through books and documentaries, and exploring places that didn’t exist at all through books, games, animation, my dreams, etc.

In my own imagined world, I could talk about the issues that I sometimes find trouble addressing in the real world. Magical realism makes more sense to me than actual realism. And thus, Surf Noir is a vehicle for me to have fun tackling issues of classism, gentrification, corporate greed, necessary capitalism, nostalgia as a crutch, and so much more.

post image

The World of Surf Noir

Surf Noir is a transmedia world-building project inspired by my hometown area in Virginia—Hampton Roads, also known as The Seven Cities. It's a coastal region highly susceptible to sea level rise, connected by bridges and underwater tunnels. But in Surf Noir's world, set in 20XX, it transformed. 

The seven cities became seven independent island nations collectively known as The Seven Isles of Surf Noir.

The Central Conflict: A company called Wave-Tech (founded by the Wavecrest family) discovered a prehistoric stone with water-manipulating properties during deep-sea oil drilling. A lowly oil rig worker cleaned one off as a gift for his daughter. She wore it to prom on a full moon—the next day, freshly cleansed and sun-charged, it burned her chest.

A family friend and scientist investigated. He happened to be from the Wavecrest family. They led the gold rush to mine the stone.

post image

The stone became known as:

  • Kaimana - what indigenous coastal tribes called it (Hawaiian for "diamond/power of the sea")

  • ORS (Oceanic Resonance Substrate) - what Wave-Tech calls it

As global warming intensified and sea levels rose, Wave-Tech emerged as a global savior by using ORS to slow sea level rise and redirect hurricanes. They became a corporate powerhouse that makes nearly everything, blurring the lines between corporation and government.

But there's a cost: Coastal tribes who'd maintained Kaimana for generations, using it for medicine, water purification, and spiritual practices, became targets. Governments and corporations oppressed them, accusing them of keeping the stone secret, labeling them threats because the stones could be weaponized.

Wave-Tech refined Kaimana into ORS—a single-use, unrechargeable version that maintains their monopoly. The authentic stones naturally recharge via sun and moon cycles. The tribes knew this. Wave-Tech destroyed it.

post image

Nearstalgia Bay (NBX)

Episode 1 takes place in Nearstalgia Bay, a small resort and fishing village inspired by the Outer Banks (OBX). It's not one of the Seven Isles—it's somewhere in between, which gives it narrative breathing room.

The aesthetic: A town where it's always sunset. I never explain this. It just is. This eternal golden hour creates:

  • Visual nostalgia and liminal time

  • Tourism appeal (Wave-Tech markets this)

  • A frozen aesthetic preserving "the way things were"

  • Melancholy beauty

NBX got by on simple life and modest family tourism. Then Wave-Tech's waterfront district arrived—introducing a South Beach feel to the quiet fishing village. The clash between old NBX (retro technologies, historic surf shacks) and new NBX (sleek ORS-powered infrastructure) drives the conflict.

The Story: Our protagonist, Aza, works at Excursion Club—a struggling travel agency in Mid Town. When she returns from a trip, she discovers Wave-Tech is planning a "revitalization" (gentrification) of Mid Town. 

Character Design Philosophy: Learning from Araki

While designing characters, I was reading Manga in Theory and Practice by Hirohiko Araki (creator of JoJo's Bizarre Adventure). His "Golden Ratio" of character design became my foundation:

  1. Appearance - Visual design instantly communicates personality

  2. Speech - Manner of speaking reveals intelligence and temperament

  3. Thoughts - Inner world provides emotional depth

  4. Actions - Behavior defines moral code

Key principles I applied:

  • Characters drive story, not the other way around - Never change a character to suit a plot

  • Distinctiveness above all - Use contradictions to make characters memorable

  • The Character's Logic - Every decision must make sense according to their personal rulebook

  • Desire and Motivation - Each character needs a clear driving force

  • Episodes matter more than overarching plot - People remember moments, not myths

My character checklist before generating:

  • Name, age, personality traits

  • Speech quirks and manner

  • Core belief or motto

  • Symbolic color or motif

  • Signature gestures

  • Favorite/least favorite things

This pre-work made generations vastly more successful because I knew who I was prompting, not just what they looked like.
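
If it helps to see that pre-work as a concrete artifact, here's a rough sketch of the checklist as a reusable data structure (Python purely for illustration; the field names and the Aza details in the example are placeholders, not canon):

```python
from dataclasses import dataclass, field

@dataclass
class CharacterSheet:
    """Pre-generation checklist: know who you're prompting, not just what they look like."""
    name: str
    age: int
    personality: list[str]
    speech_quirks: str        # manner of speaking
    core_belief: str          # motto / driving force
    color_motif: str          # symbolic color or motif
    signature_gestures: list[str] = field(default_factory=list)
    likes: list[str] = field(default_factory=list)
    dislikes: list[str] = field(default_factory=list)

    def prompt_fragment(self) -> str:
        """Collapse the checklist into a reusable description block for image prompts."""
        return (
            f"{self.name}, {self.age}, {', '.join(self.personality)}. "
            f"Speaks {self.speech_quirks}. Motif: {self.color_motif}. "
            f"Gestures: {', '.join(self.signature_gestures)}."
        )

# Example values below are placeholders, not canon:
aza = CharacterSheet(
    name="Aza", age=24,
    personality=["calm", "observant", "quietly stubborn"],
    speech_quirks="in short, dry sentences",
    core_belief="the old town is worth saving",
    color_motif="sun-faded orange",
    signature_gestures=["tucks hair behind ear when thinking"],
)
print(aza.prompt_fragment())
```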

The Aesthetic Question That Held Me Back for Years

For years, I tried forcing AI into the "perfect style." I wanted something so unique, so mine, that it would be instantly recognizable. This perfectionism paralyzed me.

Then I made a concession that changed everything: I adopted the One Piece approach.

Eiichiro Oda's art style evolved dramatically from East Blue to Wano, but the world and story remained consistent. Fans grew with the aesthetic improvements rather than expecting polish from day one.

post image

My solution:

  • Primary storytelling: 3D video game cutscene aesthetic (think PS2-era Final Fantasy, but modern)

  • In-world memories and flashbacks: 2D sakuga-style animation

  • Print graphics and manga: 2D illustration

  • Let the style improve over time - Season 1 is the foundation, not the ceiling

This hybrid approach:

  • Makes 3D more forgiving for iteration

  • Lets me focus on storytelling NOW

  • Creates visual hierarchy (present = 3D, past = 2D)

  • Primes the audience for an eventual video game

  • Gives permission to improve without invalidating earlier work

Most importantly: It got me out of research hell and into production.


PART II: THE GREAT PIVOT - Why I Abandoned 2D for 3D

The 2D Research Phase (1+ month: Give or take 3 years)

The majority of my AI work since 2021 has been 2D anime-styled. For Surf Noir, I took my previous styles and began refining them, pushing toward something more unique than the standard "classic anime" look everyone else uses.

My target: Studio Trigger animation aesthetic (FLCL, Kill la Kill, Gurren Lagann).

post image

What I loved about Trigger's style:

  • Characters don't try to "look real"

  • Movements are free, unrealistic, fluid

  • Facial features are expressive and elastic

  • There's a "controlled chaos" energy

I thought achieving this with AI would be more impressive than going realistic. I generated thousands of test images in Midjourney v7, training custom style references and exploring:

  • Expressive sakuga cel-shading

  • Loose energetic linework

  • Flat cel color blocks

  • Elastic anatomy

  • Dynamic hand-drawn energy

I made some beautiful images. I'll still publish them. But I couldn't make them work for animation.

post image

Why 2D Didn't Work (Three Critical Problems)

Problem 1: Midjourney is great for ideation, terrible for consistency

Midjourney v7 creates stunning, unique imagery. But when you need:

  • The same character from multiple angles

  • Consistent proportions across scenes

  • Controlled variations in expression

...it falls apart. Every generation is a new interpretation. Even with style references (--sref), character profiles, and careful prompting, I couldn't get the structural consistency needed for animation.

Problem 2: Quality degradation in iterative editing

I tried porting Midjourney images into Google Nano Banana Pro, Seedream 4.5, and Flux 2 for more controlled editing. The workflow was:

  1. Generate in MJ (creative, unique)

  2. Import to Nano/Seedream/Flux (structural control)

  3. Make adjustments (expressions, angles, etc.)

The problem: with each edit, quality degraded. By the third or fourth iteration, the image looked muddy, over-processed. This was especially bad in 2D styles where line weight and color flatness matter.

In 3D-leaning styles, this degradation was far less noticeable.

Problem 3: Style transfer failure in Freepik models

My Midjourney 2D styles were too unique and stylized. When I fed them into Freepik's Seedream or Nano Banana, the models couldn't retain:

  • The specific color palette

  • The line art quality

  • The cel-shading flatness

They would default to their interpretation of anime style—usually something more generic, more "Ghibli-adjacent," losing bold sketchy line work and the loose energy I wanted.

The 3D Solution

I already had a 3D style I'd developed in Midjourney—a hybrid of:

  • 3D character models (video game cutscene quality)

  • 2D cel-shading techniques

  • Film photography lighting and color grading

post image

Why 3D worked better:

  1. Video generation favors realism - The closer you get to photorealistic proportions and lighting, the easier video models handle it. AI video struggles with extreme 2D stylization.

  2. Consistency across models - 3D character designs translated cleanly from Midjourney → Seedream → video generation. The "language" was more universal.

  3. Less quality loss in iteration - Because 3D allows for more photographic lighting and texture detail, small degradations weren't as visible.

  4. Retro game aesthetic fits the world - NBX has that nostalgic, "you had to be there" vibe. A PS2/early PS3-era cutscene look actually enhances the Surf Noir nostalgia.

The aesthetic I landed on:

  • Realistic materials and weathering (scuffed mechs, worn paint, dirt)

  • Anime proportions and character design

  • Grounded sci-fi (utility tech, lived-in world)

  • Noir grit through color grading and lighting

Think: Patlabor meets Eureka Seven meets Final Fantasy X cutscenes.

What I Learned About 2D (And Why I'll Return)

Now that Episode 1 is done, I'm more motivated to solve 2D animation, not less. Here's why:

AI made me want to go back to Blender.

Which is poetically ironic because I was learning Blender in 2023 when my laptop broke and all I had was my iPad. This is what got me into AI via Midjourney in the first place.

Throughout this process, I constantly thought: "If I could just manually adjust this one thing..." or "If I had more control over this movement..."

The limitations of AI video generation, especially lip sync and subtle character motion, made me realize that a hybrid approach is not the future, it’s the present:

  • Use AI for ideation, layout, backgrounds

  • Use manual 3D modeling for character rigging and precise control

  • Combine them for final output

I still want to make a 2D anime OVA. The 2D research wasn't wasted, it was educational. I now know:

  • Which models handle stylization best

  • How to maintain style consistency across platforms

  • Where AI breaks down and manual intervention is needed

The goal is a full-length 2D animated film. But I needed to finish something first to learn the pipeline. Episode 1 was that proving ground.


PART III: THE TECHNICAL GAUNTLET

(Halfway through organizing this dev log, I realized I wanted to add more videos and images, so I’ll be following up with a video essay version of this as well so I can actually show you.)

The Image Generation Pipeline

My workflow evolved into this five-stage process:

Stage 1: Ideation in Midjourney v7

  • Generate creative, stylized concept images

  • Use for characters, environments, props

  • Leverage MJ's unique aesthetic and prompt interpretation

  • Don't worry about consistency yet, just explore

Stage 2: Character/Asset Refinement

  • Select best MJ generations

  • Create character sheets with multiple angles:

    • Head shot with turnaround

    • Full body shot (high res)

    • Expression sheet

    • Pose sheet

    • 4-5 candid variations

Stage 3: Consistency Lock in Freepik

  • Import MJ images as reference

  • Use Seedream 4.5 (95% of generations—free during beta)

  • Use Nano Banana Pro (5% of generations—500 credits per 4K image)

  • Seedream: best quality and creativity

  • Nano: best prompt adherence and control

Stage 4: Scene Composition

  • Generate backgrounds separately from characters

  • Use variations feature to maintain consistency and get alternate angles.

  • Inpaint characters into scenes when needed

  • Export at highest resolution possible

Stage 5: Video Generation

  • Import to Kling (O-1 or 2.6 model)

  • Generate 5-10 second clips

  • Use start/end frame control when needed for transition effects

  • Pray it works the first time

  • Revise the prompt and run it again
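
These stages are all GUI tools, so none of this is automated on my end. But with 8,000+ generations in play, the thing that actually keeps you sane is a simple log of what was generated where. A minimal sketch of that bookkeeping (the scene names, prompts, and file paths below are hypothetical):

```python
import csv
from datetime import datetime
from pathlib import Path

LOG = Path("generation_log.csv")  # hypothetical filename
FIELDS = ["timestamp", "scene", "shot", "stage", "tool", "prompt", "output_file"]

def log_generation(scene: str, shot: str, stage: str, tool: str,
                   prompt: str, output_file: str) -> None:
    """Append one row per generation so prompts and outputs stay traceable across stages."""
    new_file = not LOG.exists()
    with LOG.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow({
            "timestamp": datetime.now().isoformat(timespec="seconds"),
            "scene": scene, "shot": shot, "stage": stage,
            "tool": tool, "prompt": prompt, "output_file": output_file,
        })

# e.g. after exporting a Stage 3 consistency pass:
log_generation("NBX_trainstation", "shot_04", "3_consistency", "Seedream 4.5",
               "Close-up of Aza, dutch angle, rim light from left",
               "exports/shot_04_v02.png")
```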

Total Image Generations:

  • Midjourney: 5,532 images (ideation + style research)

  • Freepik: 2,663 images (final production)

    • 95% Seedream 4.5 (free during beta)

    • 5% Nano Banana Pro (500 credits per 4K gen)

  • Freepik (Character Creation): 284 images (converting MJ to usable assets)

Note: I accidentally deleted all my work once during file management and had to regenerate 300-500 images. These aren't counted above.

post image
Midjourney version
post image
final scene composition

The Midjourney Mastery Curve

Early in production, I felt my prompt style had become stale and wanted to improve my understanding of how to speak Midjourney's language for the scenes I wanted. I used Clarinet's Prompt Helper GPT (an official MJ community tool) to understand parameter optimization.

Key learnings from that process:

Understanding Parameter Intentions:

  • --sw (Style Weight): Controls how strongly the model follows your style references (--sref). High values (1000) = aggressive style adherence. Low values (100-300) = softer blending.

  • --ow (Omni Weight): Controls how tightly the model binds to structural identity from omni-references (--oref). High values lock anatomy/proportions. Low values allow reinterpretation.

  • --exp (Expression Strength): Increases visual extremity and internal contrast. Values 60-100 = energetic but stable. Higher = more painterly chaos.

  • --stylize: Global aesthetic bias. Low (100-250) = literal/realistic. Medium (500-750) = cinematic. High (1000+) = painterly/experimental.

My final character generation formula:

Character head shot in expressive sakuga cel-shaded animation style inspired by Trigger Studio aesthetics. [detailed character description]. Clean cel background, soft anime color, loose energetic linework, flat cel color blocks, elastic anatomy, consistent proportions, dynamic hand-drawn energy. --ar 58:77 --raw --sref [style reference IDs] --profile [personal profile] --stylize 50-750

post image

For 3D conversion:

3D sakuga animation portrait of [character description] --sref [3D reference] --sw 1000 --s 750 --v 7.0 --p [profile]

For character sheets:

A 2D character expression and pose sheet in sakuga cel-shaded anime style, featuring [character]. The sheet includes: Full-body front pose, full-body back view, head-and-shoulders neutral, smiling, surprised, angry, sad, joyful, annoyed, shy, serious, sleepy. All expressions front-facing in clean grid layout on white background. --ar 16:9 --s 750 --v 7.0

post image

Critical discovery: When using an existing character reference (--oref) that's not in your target style, lower --ow to 50-80 and --sw to 300-400. This frees the model to reinterpret the character in the new style rather than just "repainting" the original.
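
To make that parameter logic concrete, here's a rough sketch of how these prompt strings could be assembled programmatically. It's plain string formatting, not an API; the only values taken from the discovery above are the lowered --ow / --sw for restyling, and the remaining defaults are illustrative:

```python
def mj_prompt(description: str, sref: str, profile: str,
              oref: str | None = None, restyle: bool = False,
              ar: str = "58:77", stylize: int = 750) -> str:
    """Assemble a Midjourney v7 prompt string.

    restyle=True follows the 'reinterpret, don't repaint' discovery:
    drop --ow to the 50-80 range and --sw to the 300-400 range when the
    --oref character isn't already in your target style. The non-restyle
    defaults here are illustrative, not a recommendation.
    """
    sw, ow = (350, 65) if restyle else (1000, 400)
    parts = [description, f"--ar {ar}", "--raw", f"--sref {sref}",
             f"--profile {profile}", f"--stylize {stylize}", f"--sw {sw}", "--v 7.0"]
    if oref:
        parts += [f"--oref {oref}", f"--ow {ow}"]
    return " ".join(parts)

# Converting an existing 2D character into the 3D sakuga style:
print(mj_prompt("3D sakuga animation portrait of Aza, determined expression",
                sref="3D_STYLE_ID", profile="MY_PROFILE",
                oref="AZA_2D_REF", restyle=True))
```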

Prompting Strategies That Worked

The "Enhance Prompt" vs "Auto Prompt" Test:

Freepik offers two AI-assisted prompting features:

  • Auto Prompt: AI analyzes your image and writes a description

  • Enhance Prompt: AI takes your prompt and improves it

I tested both extensively.

Results:

  • Auto Prompt: 50% success rate. Works better when you provide at least a short description so it understands your intent.

  • Enhance Prompt: 70% success rate. Sometimes added unwanted elements or misinterpreted style, but mostly helpful for learning the model's language.

Best practice: Write your own prompt first, use Enhance to see how the AI interprets it, then manually adjust based on what worked.

Speaking to Seedream:

Seedream wants:

  • Clear subject identification

  • Specific style keywords ("cel-shaded," "sakuga," "loose linework")

  • Lighting direction described simply

  • Camera angle stated upfront

  • Action verbs for motion, not abstract concepts

Example of a failed prompt: "Make this character look more dynamic and interesting"

Improved version: "Close-up shot of [reference] focusing on determined facial expression, Dutch angle, dramatic rim lighting from left"

The "Make It Bad" Principle:

Inspired by a Final Fantasy storytelling deep dive I was listening to during production, I realized: Don't get trapped in your workflow. Press all the buttons.

This led me to discover Freepik's experimental "Variations" feature—which, when it worked, saved me hours or even days of trying to maintain consistency across camera angles.

How Variations works:

  • Take an existing generation

  • Generate a bunch of variations at different camera angles

  • Maintains core composition, style, and character identity

  • Faster than regenerating from scratch

  • Focus on getting a different angle, not so much on consistency

  • Use this new-angle shot as a reference alongside your character designs and reprompt so you can lock consistency at the new angle.

This became essential for:

  • Getting a fucking side-profile of a scene… jfc

  • Adjusting expressions without full regeneration

  • Fine-tuning compositions

Consistency Techniques

Using Seedream's Reference System:

Including the character reference image helps significantly. But the real trick:

When trying to change something in a scene without changing the entire scene:

Don't say: "Make this, change that, place this"

Instead say: "Close-up shot of [reference] focusing on ___"

This makes the model use the existing shot with stronger influence, rather than regenerating from scratch.

Creating a Character Profile in Freepik:

Freepik allows you to create a "character profile" that bundles multiple reference images:

  • Front view

  • Back view

  • Side view

  • Close-up headshot

When you prompt @character_name, it uses all references simultaneously. This improved consistency by ~40% compared to single image reference.

Video Generation: Where Things Got Hard

Platform Testing:

I tried:

  • Kling 2.6 / O-1 (primary tool - 90% of final footage)

  • Hailuo Minimax

  • Sora

  • Veo

  • Wan 2.5 / 2.6

Total credits spent on non-Kling testing: ~15,000

Why I stuck with Kling:

  • Best quality-to-cost ratio

  • Most reliable motion

  • Start/end frame control actually works

  • Handles 3D-style characters better than competitors

Kling Workflow:

  1. Generate still frame in Freepik (perfect composition)

  2. Import to Kling as starting frame

  3. Write motion prompt describing action

  4. Generate 5-10 second clip

  5. Pray it doesn't hallucinate chaos

Motion Prompting Strategy:

Less is more. Kling wants to do too much. It loves dolly zooms and camera moves.

Overly complex: "The camera slowly dollies forward while the character turns their head, maintaining focus on the eyes as light shifts across the scene"

Simple and controlled: "Static shot. Character slowly turns head left. No camera movement."

Start/End Frame Control:

When possible, I used both start and end frames to constrain motion. This worked ~60% of the time. The other 40%, the model would:

  • Hallucinate new elements

  • Change the character's appearance

  • Introduce unwanted camera movement

  • Create temporal artifacts

The Door Scene Problem:

One of my early video generations had slight inconsistencies in door size between frames. When I tried using start/end frame control, it was a mess—the model couldn't reconcile the difference.

Solution: Generate the motion in reverse.

Instead of a start/end frame: "Character opens door and peeks through"

I did one image with the door already open: "Static shot: a stylized young woman with nervous eyes peeks through a slightly open, weathered blue wooden door into a cozy, cluttered library. Her eyes scan the space from left to right, then she hesitates, and gently pulls the door shut until it clicks."

Then I reversed the clip in DaVinci Resolve. Worked perfectly.
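
I did the reversal inside Resolve, but the same trick works outside an NLE too. A minimal sketch, assuming ffmpeg is installed:

```python
import subprocess

def reverse_clip(src: str, dst: str) -> None:
    """Reverse a short clip with ffmpeg's reverse filter.

    The reverse filter buffers the whole clip in memory, so keep clips short
    (the 5-10 second generations here are fine). This version drops audio (-an);
    if your clip has a soundtrack you want reversed, use -af areverse instead.
    """
    subprocess.run(
        ["ffmpeg", "-y", "-i", src, "-vf", "reverse", "-an", dst],
        check=True,
    )

reverse_clip("door_closing.mp4", "door_opening.mp4")
```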

The Front Desk Sign Shot:

Sometimes auto-prompt actually nails it. This was one of those times:

"Slow cinematic dolly shot gliding along the curved wooden reception counter of the warm, vintage-style 'Excursion Club' lobby, the glowing golden sign letters casting soft light onto polished wood as the camera moves from close-up of the illuminated text to a wider view revealing travel posters, world maps, and brochures in the softly blurred background, shallow depth of field, cozy ambient lighting, elegant travel-club atmosphere."

I didn't write that. The AI did. I kept it.


PART IV: THE LIP SYNC NIGHTMARE

This was, without question, the most frustrating part of production.

The Problem

AI video models that do lip sync want to make your characters look like Pixar characters (AI made me hate Pixar style). Even when your base image is semi-realistic or anime-styled, the moment you add dialogue, faces become:

  • Overly cartoonish

  • Clay-like texture

  • Exaggerated Disney expressions

  • Loss of original art style

Platforms Tested

Omnihuman (Freepik):

  • My primary lip sync tool

  • Best quality when it worked

  • Major issue: Defaulted to Pixar-style facial animation

  • Required extensive prompt engineering to minimize

Higgsfield (Wan 2.5 / 2.6):

  • Wan 2.5: Degraded visual quality significantly

  • Wan 2.6: Better, but characters looked like clay

  • Abandoned after tests

OpenArt (Kling Avatar 2.0):

  • Tied with Omnihuman for quality

  • More artifacts than Omnihuman

  • Sometimes better at maintaining style

  • Used 10,000 credits testing (account worth $30/month, free via creative partnership with OpenArt)

OpenArt (Standard Lip Sync):

  • Too many artifacts

  • Abandoned quickly

Hedra:

  • Terrible. Just terrible.

  • One test, never returned

My Workarounds

Since I couldn't get clean lip sync consistently, I designed around it:

1. Internal monologue. Many lines play as voiceover thoughts rather than spoken dialogue. This allowed me to:

  • Use more expressive voice acting (I recorded lines myself and did voice changes)

  • Avoid mouth animation entirely

  • Create a more introspective tone

2. Off-screen dialogue. Characters speak while the camera shows:

  • Another character's reaction

  • The environment

  • An object of focus

(I learned this from an episode of Invincible where the creator makes a cameo in the show)

3. Wide shots without a visible mouth. When characters had to speak on-screen, I used:

  • Extreme wide shots where mouth detail isn't visible

  • Over-the-shoulder angles

  • Characters facing away or in profile

4. Strategic close-ups only. For critical emotional moments, I accepted the lip sync compromise and used it. But I limited this to a handful of shots in the entire episode.

Controlling Omnihuman's Cartoonish Tendency

Failed approach: "Generate lip sync for this dialogue"

Successful approach: Hyper-specific prompting with constraints:

[camera angle] of the [describe character] as they continue to look [direction they are facing]. Their head and eyes do not move from the position in the @Start image. No hand motion. The character is completely still and composed. Their facial expression [emotion that matches the audio you’re using].


Key techniques:

  • State the camera angle explicitly

  • Reinforce "no extra movement"

  • Direct where eyes and face should point

  • Use phrases like "completely still," "composed," "maintains position"

  • Reference the exact starting frame

Important discovery: Close-ups work MUCH better than full-body shots. Full-body lip sync introduces:

  • Arbitrary hand gestures

  • Body swaying

  • Unwanted head movement

Keep it tight on the face when you must use it.

Image quality matters: If your base image is already on the fence between Pixar-style and realism, lip sync will push it fully into Pixar. Use more realistic base images for lip sync shots.

The Multi-Character Dialogue Experiment

I tried using ElevenLabs' multi-voice prompt feature to generate a full conversation between Aza and Kate at the train station, thinking I could:

  1. Generate the conversation as one audio file

  2. Feed it into Kling Avatar 2.0 for multi-character lip sync

  3. Get the whole scene in one generation

The test: 700 credits in Kling Avatar 2.0

The prompt:

The two characters are looking directly at each other while speaking. There is no hand movement. Their motions are calm and relaxed. No smiling.


Character on the left says: "So where are you headed?"

Character on the right says: "I gotta meet Naomi, we're staying in Waterside"

Character on the left says: "Ooo fancy"

...etc.


Result: FAILED.

The model couldn't handle:

  • Two characters speaking alternately

  • Maintaining consistent expressions

  • Preventing unwanted gestures

  • Keeping them looking at each other

Solution: Split the dialogue into separate audio tracks with silence between lines, generate separate close-up shots for each character, edit them together in post.

More work, slightly functional… but ultimately less natural-feeling than my previous off-screen dialogue workaround. I scrapped these shots.


PART V: SOUND DESIGN & VOICE

ElevenLabs Voice Acting

ElevenLabs became my primary voice tool. I used two accounts (don't ask me how) with a combined 15,000 credits spent.

Voice Design Process:

Each character got a specific voice profile:

  • Aza: Talia - Calm, slightly raspy, never too excited

  • Jade: Tiffany - Natural and welcoming, "bubbly black girl - Delta energy"

  • Javonte: Ministar - Too cool, poetic, whispery, British-African inspired

  • Cass: [voice undecided] - Mousy, soft-spoken

  • Kate: Ivy - Sophisticated and sassy, kawaii energy

  • Surf Shack Man: Scott - Calm and welcoming

  • Surf Shack Woman: Ms. Harris - Caring Southern mom

  • Student 1: Revenant - Youthful enthusiasm

  • Student 2: Grechen - Snooty brat

  • Teacher: Ivanna - Authority figure warmth, the cool teacher

The Boy's Voice (Custom Creation):

There are NO good kid voices on ElevenLabs. All the "childlike" voices were women doing super cartoonish performances.

So I made one from scratch using ElevenLabs' voice design feature.

Result: I was STUNNED by the quality, expressiveness, and accuracy of vernacular. The voice felt authentic, natural, and emotionally present, especially considering my prompt was super basic.

Highly recommend the custom voice feature for unique character needs.

Multi-Voice Dialogue:

As mentioned earlier, I tested multi-character conversations generated in one file. The audio quality was excellent—natural pauses, realistic overlaps, proper emotional tone.

Where it failed was in video generation (lip sync couldn't handle it). But for pure audio storytelling or podcast-style content, this feature is incredible.

Sound Effects

ElevenLabs SFX:

  • Great for ambient sound loops

  • I used a mix of ElevenLabs and Pixabay

Pixabay:

  • Primary source for most SFX (free, high-quality)

  • Footsteps, door creaks, train sounds, etc.

Important note: You don't have to use AI for everything. Pixabay's library is massive and free. Why generate a door creak with AI when a perfect one already exists? “Who gon be the humans” — Jamee Cornelia

Ambient Design:

NBX's eternal sunset needed a consistent soundscape:

  • Distant ocean waves (looped)

  • Seagull calls (sparse)

  • Light wind (constant low presence)

  • Urban hum in Waterside District scenes

I layered 3-4 ambient tracks per scene to create depth without overwhelming dialogue.
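
I built these beds directly in the edit, but as a sketch of the idea, here's what that layering looks like programmatically, using pydub as a stand-in tool (the filenames and gain values are placeholders):

```python
from pydub import AudioSegment

def ambient_bed(layers: list[tuple[str, float]], length_ms: int) -> AudioSegment:
    """Loop each ambience to the scene length, attenuate it, and stack the layers."""
    bed = AudioSegment.silent(duration=length_ms)
    for path, gain_db in layers:
        layer = AudioSegment.from_file(path)
        # Loop the layer until it covers the whole scene, then trim and apply gain.
        while len(layer) < length_ms:
            layer += layer
        bed = bed.overlay(layer[:length_ms] + gain_db)
    return bed

# NBX default soundscape: waves up front, everything else tucked under the dialogue.
scene = ambient_bed(
    [("waves_distant.wav", -12.0), ("seagulls_sparse.wav", -20.0),
     ("wind_light.wav", -18.0), ("waterside_urban_hum.wav", -24.0)],
    length_ms=45_000,
)
scene.export("scene_03_ambience.wav", format="wav")
```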


PART VI: FILE MANAGEMENT & WORKFLOW CHAOS

The Disaster I Created

I create messy. When I try to be too organized upfront, it fucks up my flow. But the downside:

I accidentally deleted all of my work once.

Trashed the wrong folder. Gone. 300-500 images, hours of video tests. Had to regenerate large portions from scratch. But after this, my process was much more refined, so I was able to catch up to my previous stopping point in one day.

The second problem: DaVinci Resolve can't locate clips if you move files after importing. I reorganized my folders mid-production and luckily caught it before relinking media became overwhelming. It's just something to take note of.

What I Learned (The Hard Way)

During production:

  • Let the chaos happen

  • One big "Active Project" folder with everything dumped in

  • Use DaVinci's internal organization (bins, tags, colors)

  • Don't move files once they're imported

After production:

  • THEN organize into a proper folder structure

  • Create a master archive with clear naming

  • Export a project file with relinked media

  • Back up to external drive and cloud

My eventual structure:

Surf_Noir_EP01/

├── 01_Scripts/

├── 02_Storyboards/

├── 03_Assets/

│   ├── Characters/

│   ├── Backgrounds/

│   ├── Props/

├── 04_Audio/

│   ├── Dialogue/

│   ├── SFX/

│   ├── Music/

├── 05_Video_Generations/

│   ├── Scene_01/

│   ├── Scene_02/

│   └── ...

├── 06_Final_Edit/

└── 07_Exports/
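
If you want to spin that structure up for the next episode in one go, a few lines of Python will scaffold it (the folder names mirror the tree above; the project name is just an example):

```python
from pathlib import Path

STRUCTURE = {
    "01_Scripts": [], "02_Storyboards": [],
    "03_Assets": ["Characters", "Backgrounds", "Props"],
    "04_Audio": ["Dialogue", "SFX", "Music"],
    "05_Video_Generations": ["Scene_01", "Scene_02"],
    "06_Final_Edit": [], "07_Exports": [],
}

def scaffold(project: str) -> None:
    """Create the archive folder tree, skipping anything that already exists."""
    root = Path(project)
    for folder, subfolders in STRUCTURE.items():
        for sub in subfolders or [""]:
            (root / folder / sub).mkdir(parents=True, exist_ok=True)

scaffold("Surf_Noir_EP02")
```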


Why this matters: When sharing your process later (like in this Dev Log), having clear organization makes pulling examples and references infinitely easier. I learned this from working at ad agencies like Vayner Media.


PART VII: THE NUMBERS - COMPLETE COST BREAKDOWN

Time Investment

Total: 60-80+ hours

This doesn't include:

  • Months of pre-production conceptualizing Surf Noir

  • The 1+ month 2D research phase (mostly abandoned)

  • Writing scripts and dialogue

  • Character design pre-work

Breakdown estimate:

  • Character design & asset creation: 15-20 hours

  • Scene composition & image generation: 20-25 hours

  • Video generation & iteration: 15-20 hours

  • Voice recording & sound design: 5-8 hours

  • Editing in DaVinci Resolve: 10-15 hours

  • Rendering & troubleshooting: 5-7 hours

Labor rate benchmark: $45-60/hour (industry standard for AI creative direction, based on my experience at tech startups, Hollywood ad agencies, and recent recruitment offers)

Monetary value of labor: $2,700 - $4,800
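
For transparency, that labor figure is just the hour range multiplied by the rate range:

```python
hours = (60, 80)   # production hours on Episode 1
rate = (45, 60)    # $/hour benchmark for AI creative direction

low, high = hours[0] * rate[0], hours[1] * rate[1]
print(f"Labor value: ${low:,} - ${high:,}")   # Labor value: $2,700 - $4,800
```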

Platform Costs

Midjourney Pro (Annual):

  • Normal: $60/month

  • My rate: $48/month (annual discount)

  • Months used: 1 month heavy production

  • Cost: $48

Freepik Credits:

  • Initial research (before deletion): ~50,000 credits

  • Actual production: 372,300 credits

Important context:

  • Seedream 4.5 (≈95% of images) was free during beta

  • Most credits were spent on video generation, not images

  • Nano Banana Pro was used sparingly due to cost (500 credits per 4K generation)

Because Freepik’s pricing model is credit-based and tiered, the exact dollar equivalent fluctuates. At current pricing tiers, however, this volume of usage would conservatively translate to hundreds of dollars up to low four figures if Seedream were not free.

This is a critical point:

AI is only “cheap” if you’re not doing much with it.

Kling Video Generation:

  • Primary model: Kling 2.6 / O-1

  • Nearly all final video footage generated here

  • Other platforms tested (Sora, Veo, Minimax, Wan): ~15,000 credits total

  • Kling proved best quality-to-cost ratio

Exact Kling costs vary by plan and usage window, but this episode represents heavy, sustained generation, not casual testing.


OpenArt (Lip Sync Testing):

  • Account value: ~$30/month

  • Credits included: 12,000

  • Credits used on lip sync: ~10,000

  • Access provided via creative partnership

ElevenLabs (Voice & SFX):

  • Free tier: 10,000 credits per account

  • Two accounts used (don’t ask)

  • ~15,000 credits total

  • Covered:

    • Full dialogue

    • Internal monologue

    • Select ambient sound effects


Software Licenses:

  • DaVinci Resolve Studio: $299 perpetual license

What This Would Cost Without “Lucky Breaks”

This episode benefited from:

  • Seedream being temporarily free

  • A creative partnership on OpenArt

  • Already having DaVinci Resolve

  • Prior sunk costs in tools I already owned

If all tools were paid at market rates today, producing this episode would realistically cost:

Low estimate: $1,000–$1,500
High estimate: $3,000–$5,000+

And that’s before valuing labor.

Comparison to Traditional Animation

A traditionally animated 11-minute episode at even modest indie rates would require:

  • Storyboarding

  • Character design

  • Background art

  • Layout

  • Animation

  • Cleanup

  • Color

  • Compositing

  • Editing

  • Sound design

Even at an extremely conservative $5,000 per finished minute, you’re looking at:

$55,000+ minimum

More realistically:

$80,000–$150,000

AI didn’t just eliminate cost. It collapsed the distance between a solo creator and studio-scale output.

PART VIII: WHAT AI ACTUALLY CHANGED (AND WHAT IT DIDN’T)

What AI Did NOT Do

AI did not:

  • Write the story

  • Design the world

  • Decide the tone

  • Choose the shots

  • Maintain continuity

  • Solve narrative problems

  • Create emotional intent

Every time something worked, it was because:

  • I knew what I wanted

  • I could recognize when it was wrong

  • I had the taste to say “no”

That’s the part people ignore when they call this “slop.”

What AI DID Change

AI:

  • Lowered the logistical barrier to entry

  • Made iteration possible at solo scale

  • Allowed me to fail faster

  • Exposed weak creative instincts immediately

  • Punished vague thinking

  • Rewarded specificity

Most importantly, AI forced clarity.

If you don’t know what you want, AI will happily give you something.
That something will almost always be generic.

Why “AI Slop” Exists

AI slop isn’t a tool problem.
It’s a taste problem.

The loudest AI influencers:

  • Optimize for output, not meaning

  • Confuse novelty with substance

  • Treat art as content arbitrage

  • Never sit with a piece long enough to refine it

This project took 60–80 hours because:

  • I rejected hundreds of “good enough” outputs

  • I rewrote prompts obsessively

  • I rebuilt scenes that technically worked but emotionally didn’t

That labor is invisible to people scrolling past results.

PART IX: WHY THIS MATTERS (BEYOND THIS EPISODE)

This Was Never Just an Episode

Episode 1 is:

  • A narrative pilot

  • A workflow test

  • A proof-of-concept

  • A studio dry run

It proves that:

  • A solo creator can produce serialized animation

  • Worldbuilding can precede monetization

  • AI can be used with intention, not as spectacle

Surf Noir as a Studio Model

Long-term, Surf Noir is:

  • A transmedia IP

  • A music + animation ecosystem

  • A client-facing creative studio

  • A testbed for hybrid AI/manual workflows

This Dev Log doubles as:

  • A portfolio artifact

  • A transparency document

  • A capability statement

If you’re a collaborator, investor, or client:
This is what my process actually looks like.

PART X: WHAT I’D DO DIFFERENTLY (HONESTLY)

1. Lock Visual Language Earlier

I lost time chasing “perfect” when “coherent” would’ve been enough.

2. Design Around AI Limitations Sooner

Lip sync taught me this the hard way. Structure the story to avoid known weaknesses.

3. Commit to Hybrid Earlier

Blender + AI would’ve saved hours in subtle motion and control.

4. Ship Earlier

Nothing teaches like finishing.

PART XI: WHAT’S NEXT

  • Continued 2D research

  • Blender character workflows

  • Expanded Surf Noir episodes

  • Music-forward storytelling

  • URL + IRL integrations

  • Client work under Surf Noir Studio

FINAL THOUGHT

If you take anything from this:

AI will not save you from the work.
But it will meet you wherever your ambition already is.

If your vision is small, it’ll stay small.
If your vision is obsessive, messy, personal, and necessary — AI can help you finish it.

And finishing changes everything.