Cover photo

SURF NOIR DEV LOG 1

Building Nearstalgia Bay With (and Against) AI


TL;DR

I spent ~60–80 hours creating an 11-minute animated episode set in the Surf Noir universe using a hybrid AI workflow (Midjourney, Freepik/Seedream, Kling, ElevenLabs, DaVinci Resolve).

This was not “prompt and publish.” It was directing, editing, rewriting, rejecting, and problem-solving at scale. More than 5,500 Midjourney images, 2,600+ Freepik images, and 400,000+ generation credits went into arriving at a finished episode.

Because of beta access and partnerships, the out-of-pocket cost was low — but at current market rates, this episode would realistically cost $1k–$5k to produce with AI. A traditionally animated equivalent could cost anywhere from $40k–$150k+.

AI didn’t replace storytelling. It compressed production distance — allowing one person to iterate like a small studio — while making taste, clarity, and intention the real bottlenecks.

This dev log documents how a story-first approach, not the tools themselves, determines whether AI outputs feel like “slop” or cinema.

INTRODUCTION: FOR THE WORLD-BUILDERS

This document is for anyone who's ever had a vision that seemed too big, too outlandish, or too impossible to do alone.

If you grew up on Final Fantasy, Metal Gear Solid, Pokémon, Dragon Ball Z, One Piece, Harry Potter, Lord of the Rings—anything where you knew the creator approached building a world as their life's work, as if they had to do it or else, as if they had to literally imagine a place because the real world didn't gel with their soul—this is for you.

This is a guide to show you that it's possible. And before you roll your eyes at another "AI will change everything" manifesto, let me be clear: this process actually inspired me to re-incorporate manual art-making into my workflow. AI didn't replace craft; rather, it revealed how much craft is still required.

AI is expensive. AI is time-consuming. AI requires skill, care, and creative vision to use well. And I see the loudest voices in AI influencing as a primary reason AI gets called "slop." They either lack the creativity, care, or money to push the limits of the technology and make something as good as they can. They're more concerned with showing possibilities, rage bait, and collecting engagement checks than actually making a thing.

This Dev Log is for:

  • Indie creators who want to use AI as a tool, not a crutch

  • My supporters who want to know what actually goes into this work

  • Potential collaborators and clients who need to understand the process

  • The anti-AI people who make blanket statements like "if you use AI you're not an artist" (though unfortunately, you probably won't read this)

I spent 60-80+ hours creating an 11.5-minute animated episode set in Nearstalgia Bay, a fictional coastal town in the Surf Noir Archipelago. This document breaks down everything: my creative choices, technical challenges, workflow solutions, and honest cost analysis.

My philosophy, “Make it bad, make it imperfect,” came from a Gawx video documenting his process, and it helped me break through my own perfectionism. It’s poignant because people think AI is all about perfection, when it has so many imperfections. Meanwhile, there's a much-needed, growing movement of people leaving the imperfections of humanity in their work to prove it’s not AI. The imperfections of AI and the imperfections of humanity manifest differently, but the notion was immensely important for someone like me who is prone to perfectionism.

Let's begin.


PART I: CREATIVE FOUNDATIONS

Thematic Motivations

I first came up with Surf Noir and Future Surf in 2020. It was the pandemic and we couldn’t leave the house, so I was left to literally imagine a place. This took form in two ways: exploring real places from afar through books and documentaries, and exploring places that didn’t exist at all through books, games, animation, my dreams, etc.

In my own imagined world, I could talk about the issues that I sometimes find trouble addressing in the real world. Magical realism makes more sense to me than actual realism. And thus, Surf Noir is a vehicle for me to have fun tackling issues of classism, gentrification, corporate greed, necessary capitalism, nostalgia as a crutch, and so much more.

post image

The World of Surf Noir

Surf Noir is a transmedia world-building project inspired by my hometown area in Virginia—Hampton Roads, also known as The Seven Cities. It's a coastal region highly susceptible to sea level rise, connected by bridges and underwater tunnels. But in Surf Noir's world, set in 20XX, it transformed. 

The seven cities became seven independent island nations collectively known as The Seven Isles of Surf Noir.

The Central Conflict: A company called Wave-Tech (founded by the Wavecrest family) discovered a prehistoric stone with water-manipulating properties during deep-sea oil drilling. A lowly oil rig worker cleaned one off as a gift for his daughter. She wore it to prom on a full moon—the next day, freshly cleansed and sun-charged, it burned her chest.

A family friend and scientist investigated. He happened to be from the Wavecrest family. They led the gold rush to mine the stone.

post image

The stone became known as:

  • Kaimana - what indigenous coastal tribes called it (Hawaiian for "diamond/power of the sea")

  • ORS (Oceanic Resonance Substrate) - what Wave-Tech calls it

As global warming intensified and sea levels rose, Wave-Tech emerged as a global savior by using ORS to slow sea level rise and redirect hurricanes. They became a corporate powerhouse that makes nearly everything, blurring the lines between corporation and government.

But there's a cost: Coastal tribes who'd maintained Kaimana for generations, using it for medicine, water purification, and spiritual practices, became targets. Governments and corporations oppressed them, accusing them of keeping the stone secret, labeling them threats because the stones could be weaponized.

Wave-Tech refined Kaimana into ORS—a single-use, unrechargeable version that maintains their monopoly. The authentic stones naturally recharge via sun and moon cycles. The tribes knew this. Wave-Tech destroyed it.

post image

Nearstalgia Bay (NBX)

Episode 1 takes place in Nearstalgia Bay, a small resort and fishing village inspired by the Outer Banks (OBX). It's not one of the Seven Isles—it's somewhere in between, which gives it narrative breathing room.

The aesthetic: A town where it's always sunset. I never explain this. It just is. This eternal golden hour creates:

  • Visual nostalgia and liminal time

  • Tourism appeal (Wave-Tech markets this)

  • A frozen aesthetic preserving "the way things were"

  • Melancholy beauty

NBX got by on simple life and modest family tourism. Then Wave-Tech's waterfront district arrived—introducing a South Beach feel to the quiet fishing village. The clash between old NBX (retro technologies, historic surf shacks) and new NBX (sleek ORS-powered infrastructure) drives the conflict.

The Story: Our protagonist, Aza, works at Excursion Club—a struggling travel agency in Mid Town. When she returns from a trip, she discovers Wave-Tech is planning a "revitalization" (gentrification) of Mid Town. 

Character Design Philosophy: Learning from Araki

While designing characters, I was reading Manga in Theory and Practice by Hirohiko Araki (creator of JoJo's Bizarre Adventure). His "Golden Ratio" of character design became my foundation:

  1. Appearance - Visual design instantly communicates personality

  2. Speech - Manner of speaking reveals intelligence and temperament

  3. Thoughts - Inner world provides emotional depth

  4. Actions - Behavior defines moral code

Key principles I applied:

  • Characters drive story, not the other way around - Never change a character to suit a plot

  • Distinctiveness above all - Use contradictions to make characters memorable

  • The Character's Logic - Every decision must make sense according to their personal rulebook

  • Desire and Motivation - Each character needs a clear driving force

  • Episodes matter more than overarching plot - People remember moments, not myths

My character checklist before generating:

  • Name, age, personality traits

  • Speech quirks and manner

  • Core belief or motto

  • Symbolic color or motif

  • Signature gestures

  • Favorite/least favorite things

This pre-work made generations vastly more successful because I knew who I was prompting, not just what they looked like.
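
If it helps to see that pre-work as a concrete artifact, here's a rough sketch of the checklist as a reusable data structure (Python purely for illustration; the field names and the Aza details in the example are placeholders, not canon):

```python
from dataclasses import dataclass, field

@dataclass
class CharacterSheet:
    """Pre-generation checklist: know who you're prompting, not just what they look like."""
    name: str
    age: int
    personality: list[str]
    speech_quirks: str        # manner of speaking
    core_belief: str          # motto / driving force
    color_motif: str          # symbolic color or motif
    signature_gestures: list[str] = field(default_factory=list)
    likes: list[str] = field(default_factory=list)
    dislikes: list[str] = field(default_factory=list)

    def prompt_fragment(self) -> str:
        """Collapse the checklist into a reusable description block for image prompts."""
        return (
            f"{self.name}, {self.age}, {', '.join(self.personality)}. "
            f"Speaks {self.speech_quirks}. Motif: {self.color_motif}. "
            f"Gestures: {', '.join(self.signature_gestures)}."
        )

# Example values below are placeholders, not canon:
aza = CharacterSheet(
    name="Aza", age=24,
    personality=["calm", "observant", "quietly stubborn"],
    speech_quirks="in short, dry sentences",
    core_belief="the old town is worth saving",
    color_motif="sun-faded orange",
    signature_gestures=["tucks hair behind ear when thinking"],
)
print(aza.prompt_fragment())
```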

The Aesthetic Question That Held Me Back for Years

For years, I tried forcing AI into the "perfect style." I wanted something so unique, so mine, that it would be instantly recognizable. This perfectionism paralyzed me.

Then I made a concession that changed everything: I adopted the One Piece approach.

Eiichiro Oda's art style evolved dramatically from East Blue to Wano, but the world and story remained consistent. Fans grew with the aesthetic improvements rather than expecting polish from day one.

post image

My solution:

  • Primary storytelling: 3D video game cutscene aesthetic (think PS2-era Final Fantasy, but modern)

  • In-world memories and flashbacks: 2D sakuga-style animation

  • Print graphics and manga: 2D illustration

  • Let the style improve over time - Season 1 is the foundation, not the ceiling

This hybrid approach:

  • Makes 3D more forgiving for iteration

  • Lets me focus on storytelling NOW

  • Creates visual hierarchy (present = 3D, past = 2D)

  • Primes the audience for an eventual video game

  • Gives permission to improve without invalidating earlier work

Most importantly: It got me out of research hell and into production.


PART II: THE GREAT PIVOT - Why I Abandoned 2D for 3D

The 2D Research Phase (1+ month: Give or take 3 years)

The majority of my AI work since 2021 has been 2D anime-styled. For Surf Noir, I took my previous styles and began refining them, pushing toward something more unique than the standard "classic anime" look everyone else uses.

My target: Studio Trigger animation aesthetic (FLCL, Kill la Kill, Gurren Lagann).

post image

What I loved about Trigger's style:

  • Characters don't try to "look real"

  • Movements are free, unrealistic, fluid

  • Facial features are expressive and elastic

  • There's a "controlled chaos" energy

I thought achieving this with AI would be more impressive than going realistic. I generated thousands of test images in Midjourney v7, training custom style references and exploring:

  • Expressive sakuga cel-shading

  • Loose energetic linework

  • Flat cel color blocks

  • Elastic anatomy

  • Dynamic hand-drawn energy

I made some beautiful images. I'll still publish them. But I couldn't make them work for animation.

post image

Why 2D Didn't Work (Three Critical Problems)

Problem 1: Midjourney is great for ideation, terrible for consistency

Midjourney v7 creates stunning, unique imagery. But when you need:

  • The same character from multiple angles

  • Consistent proportions across scenes

  • Controlled variations in expression

...it falls apart. Every generation is a new interpretation. Even with style references (--sref), character profiles, and careful prompting, I couldn't get the structural consistency needed for animation.

Problem 2: Quality degradation in iterative editing

I tried porting Midjourney images into Google Nano Banana Pro, Seedream 4.5, and Flux 2 for more controlled editing. The workflow was:

  1. Generate in MJ (creative, unique)

  2. Import to Nano/Seedream/Flux (structural control)

  3. Make adjustments (expressions, angles, etc.)

The problem: with each edit, quality degraded. By the third or fourth iteration, the image looked muddy, over-processed. This was especially bad in 2D styles where line weight and color flatness matter.

In 3D-leaning styles, this degradation was far less noticeable.

Problem 3: Style transfer failure in Freepik models

My Midjourney 2D styles were too unique and stylized. When I fed them into Freepik's Seedream or Nano Banana, the models couldn't retain:

  • The specific color palette

  • The line art quality

  • The cel-shading flatness

They would default to their interpretation of anime style—usually something more generic, more "Ghibli-adjacent," losing bold sketchy line work and the loose energy I wanted.

The 3D Solution

I already had a 3D style I'd developed in Midjourney—a hybrid of:

  • 3D character models (video game cutscene quality)

  • 2D cel-shading techniques

  • Film photography lighting and color grading

post image

Why 3D worked better:

  1. Video generation favors realism - The closer you get to photorealistic proportions and lighting, the easier video models handle it. AI video struggles with extreme 2D stylization.

  2. Consistency across models - 3D character designs translated cleanly from Midjourney → Seedream → video generation. The "language" was more universal.

  3. Less quality loss in iteration - Because 3D allows for more photographic lighting and texture detail, small degradations weren't as visible.

  4. Retro game aesthetic fits the world - NBX has that nostalgic, "you had to be there" vibe. A PS2/early PS3-era cutscene look actually enhances the Surf Noir nostalgia.

The aesthetic I landed on:

  • Realistic materials and weathering (scuffed mechs, worn paint, dirt)

  • Anime proportions and character design

  • Grounded sci-fi (utility tech, lived-in world)

  • Noir grit through color grading and lighting

Think: Patlabor meets Eureka Seven meets Final Fantasy X cutscenes.

What I Learned About 2D (And Why I'll Return)

Now that Episode 1 is done, I'm more motivated to solve 2D animation, not less. Here's why:

AI made me want to go back to Blender.

Which is poetically ironic because I was learning Blender in 2023 when my laptop broke and all I had was my iPad. This is what got me into AI via Midjourney in the first place.

Throughout this process, I constantly thought: "If I could just manually adjust this one thing..." or "If I had more control over this movement..."

The limitations of AI video generation, especially lip sync and subtle character motion, made me realize that a hybrid approach is not the future, it’s the present:

  • Use AI for ideation, layout, backgrounds

  • Use manual 3D modeling for character rigging and precise control

  • Combine them for final output

I still want to make a 2D anime OVA. The 2D research wasn't wasted, it was educational. I now know:

  • Which models handle stylization best

  • How to maintain style consistency across platforms

  • Where AI breaks down and manual intervention is needed

The goal is a full-length 2D animated film. But I needed to finish something first to learn the pipeline. Episode 1 was that proving ground.


PART III: THE TECHNICAL GAUNTLET

(Halfway through organizing this dev log, I realized I wanted to add more videos and images, so I’ll be following up with a video essay version of this as well so I can actually show you.)

The Image Generation Pipeline

My workflow evolved into this five-stage process:

Stage 1: Ideation in Midjourney v7

  • Generate creative, stylized concept images

  • Use for characters, environments, props

  • Leverage MJ's unique aesthetic and prompt interpretation

  • Don't worry about consistency yet, just explore

Stage 2: Character/Asset Refinement

  • Select best MJ generations

  • Create character sheets with multiple angles:

    • Head shot with turnaround

    • Full body shot (high res)

    • Expression sheet

    • Pose sheet

    • 4-5 candid variations

Stage 3: Consistency Lock in Freepik

  • Import MJ images as reference

  • Use Seedream 4.5 (95% of generations—free during beta)

  • Use Nano Banana Pro (5% of generations—500 credits per 4K image)

  • Seedream: best quality and creativity

  • Nano: best prompt adherence and control

Stage 4: Scene Composition

  • Generate backgrounds separately from characters

  • Use variations feature to maintain consistency and get alternate angles.

  • Inpaint characters into scenes when needed

  • Export at highest resolution possible

Stage 5: Video Generation

  • Import to Kling (O-1 or 2.6 model)

  • Generate 5-10 second clips

  • Use start/end frame control when needed for transition effects

  • Pray it works the first time

  • Revise the prompt and run it again
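
These stages are all GUI tools, so none of this is automated on my end. But with 8,000+ generations in play, the thing that actually keeps you sane is a simple log of what was generated where. A minimal sketch of that bookkeeping (the scene names, prompts, and file paths below are hypothetical):

```python
import csv
from datetime import datetime
from pathlib import Path

LOG = Path("generation_log.csv")  # hypothetical filename
FIELDS = ["timestamp", "scene", "shot", "stage", "tool", "prompt", "output_file"]

def log_generation(scene: str, shot: str, stage: str, tool: str,
                   prompt: str, output_file: str) -> None:
    """Append one row per generation so prompts and outputs stay traceable across stages."""
    new_file = not LOG.exists()
    with LOG.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow({
            "timestamp": datetime.now().isoformat(timespec="seconds"),
            "scene": scene, "shot": shot, "stage": stage,
            "tool": tool, "prompt": prompt, "output_file": output_file,
        })

# e.g. after exporting a Stage 3 consistency pass:
log_generation("NBX_trainstation", "shot_04", "3_consistency", "Seedream 4.5",
               "Close-up of Aza, dutch angle, rim light from left",
               "exports/shot_04_v02.png")
```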

Total Image Generations:

  • Midjourney: 5,532 images (ideation + style research)

  • Freepik: 2,663 images (final production)

    • 95% Seedream 4.5 (free during beta)

    • 5% Nano Banana Pro (500 credits per 4K gen)

  • Freepik (Character Creation): 284 images (converting MJ to usable assets)

Note: I accidentally deleted all my work once during file management and had to regenerate 300-500 images. These aren't counted above.

post image
Midjourney version
post image
final scene composition

The Midjourney Mastery Curve

Early in production, I felt my prompt style had become stale and wanted to improve my understanding of how to speak Midjourney's language for the scenes I wanted. I used Clarinet's Prompt Helper GPT (an official MJ community tool) to understand parameter optimization.

Key learnings from that process:

Understanding Parameter Intentions:

  • --sw (Style Weight): Controls how strongly the model follows your style references (--sref). High values (1000) = aggressive style adherence. Low values (100-300) = softer blending.

  • --ow (Omni Weight): Controls how tightly the model binds to structural identity from omni-references (--oref). High values lock anatomy/proportions. Low values allow reinterpretation.

  • --exp (Expression Strength): Increases visual extremity and internal contrast. Values 60-100 = energetic but stable. Higher = more painterly chaos.

  • --stylize: Global aesthetic bias. Low (100-250) = literal/realistic. Medium (500-750) = cinematic. High (1000+) = painterly/experimental.

My final character generation formula:

Character head shot in expressive sakuga cel-shaded animation style inspired by Trigger Studio aesthetics. [detailed character description]. Clean cel background, soft anime color, loose energetic linework, flat cel color blocks, elastic anatomy, consistent proportions, dynamic hand-drawn energy. --ar 58:77 --raw --sref [style reference IDs] --profile [personal profile] --stylize 50-750

post image

For 3D conversion:

3D sakuga animation portrait of [character description] --sref [3D reference] --sw 1000 --s 750 --v 7.0 --p [profile]

For character sheets:

A 2D character expression and pose sheet in sakuga cel-shaded anime style, featuring [character]. The sheet includes: Full-body front pose, full-body back view, head-and-shoulders neutral, smiling, surprised, angry, sad, joyful, annoyed, shy, serious, sleepy. All expressions front-facing in clean grid layout on white background. --ar 16:9 --s 750 --v 7.0

post image

Critical discovery: When using an existing character reference (--oref) that's not in your target style, lower --ow to 50-80 and --sw to 300-400. This frees the model to reinterpret the character in the new style rather than just "repainting" the original.
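
To make that parameter logic concrete, here's a rough sketch of how these prompt strings could be assembled programmatically. It's plain string formatting, not an API; the only values taken from the discovery above are the lowered --ow / --sw for restyling, and the remaining defaults are illustrative:

```python
def mj_prompt(description: str, sref: str, profile: str,
              oref: str | None = None, restyle: bool = False,
              ar: str = "58:77", stylize: int = 750) -> str:
    """Assemble a Midjourney v7 prompt string.

    restyle=True follows the 'reinterpret, don't repaint' discovery:
    drop --ow to the 50-80 range and --sw to the 300-400 range when the
    --oref character isn't already in your target style. The non-restyle
    defaults here are illustrative, not a recommendation.
    """
    sw, ow = (350, 65) if restyle else (1000, 400)
    parts = [description, f"--ar {ar}", "--raw", f"--sref {sref}",
             f"--profile {profile}", f"--stylize {stylize}", f"--sw {sw}", "--v 7.0"]
    if oref:
        parts += [f"--oref {oref}", f"--ow {ow}"]
    return " ".join(parts)

# Converting an existing 2D character into the 3D sakuga style:
print(mj_prompt("3D sakuga animation portrait of Aza, determined expression",
                sref="3D_STYLE_ID", profile="MY_PROFILE",
                oref="AZA_2D_REF", restyle=True))
```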

Prompting Strategies That Worked

The "Enhance Prompt" vs "Auto Prompt" Test:

Freepik offers two AI-assisted prompting features:

  • Auto Prompt: AI analyzes your image and writes a description

  • Enhance Prompt: AI takes your prompt and improves it

I tested both extensively.

Results:

  • Auto Prompt: 50% success rate. Works better when you provide at least a short description so it understands your intent.

  • Enhance Prompt: 70% success rate. Sometimes added unwanted elements or misinterpreted style, but mostly helpful for learning the model's language.

Best practice: Write your own prompt first, use Enhance to see how the AI interprets it, then manually adjust based on what worked.

Speaking to Seedream:

Seedream wants:

  • Clear subject identification

  • Specific style keywords ("cel-shaded," "sakuga," "loose linework")

  • Lighting direction described simply

  • Camera angle stated upfront

  • Action verbs for motion, not abstract concepts

Example of a failed prompt: "Make this character look more dynamic and interesting"

Improved version: "Close-up shot of [reference] focusing on determined facial expression, Dutch angle, dramatic rim lighting from left"

The "Make It Bad" Principle:

Inspired by a Final Fantasy storytelling deep dive I was listening to during production, I realized: Don't get trapped in your workflow. Press all the buttons.

This led me to discover Freepik's experimental "Variations" feature—which, when it worked, saved me hours or even days of trying to maintain consistency across camera angles.

How Variations works:

  • Take an existing generation

  • Generate a bunch of variations at different camera angles

  • Maintains core composition, style, and character identity

  • Faster than regenerating from scratch

  • Focus on getting a different angle, not so much on consistency

  • Use this new-angle shot as a reference alongside your character designs and reprompt so you can lock consistency at the new angle.

This became essential for:

  • Getting a fucking side-profile of a scene… jfc

  • Adjusting expressions without full regeneration

  • Fine-tuning compositions

Consistency Techniques

Using Seedream's Reference System:

Including the character reference image helps significantly. But the real trick:

When trying to change something in a scene without changing the entire scene:

Don't say: "Make this, change that, place this"

Instead say: "Close-up shot of [reference] focusing on ___"

This makes the model use the existing shot with stronger influence, rather than regenerating from scratch.

Creating a Character Profile in Freepik:

Freepik allows you to create a "character profile" that bundles multiple reference images:

  • Front view

  • Back view

  • Side view

  • Close-up headshot

When you prompt @character_name, it uses all references simultaneously. This improved consistency by ~40% compared to single image reference.

Video Generation: Where Things Got Hard

Platform Testing:

I tried:

  • Kling 2.6 / O-1 (primary tool - 90% of final footage)

  • Hailuo Minimax

  • Sora

  • Veo

  • Wan 2.5 / 2.6

Total credits spent on non-Kling testing: ~15,000

Why I stuck with Kling:

  • Best quality-to-cost ratio

  • Most reliable motion

  • Start/end frame control actually works

  • Handles 3D-style characters better than competitors

Kling Workflow:

  1. Generate still frame in Freepik (perfect composition)

  2. Import to Kling as starting frame

  3. Write motion prompt describing action

  4. Generate 5-10 second clip

  5. Pray it doesn't hallucinate chaos

Motion Prompting Strategy:

Less is more. Kling wants to do too much. It loves dolly zooms and camera moves.

Overly complex: "The camera slowly dollies forward while the character turns their head, maintaining focus on the eyes as light shifts across the scene"

Simple and controlled: "Static shot. Character slowly turns head left. No camera movement."

Start/End Frame Control:

When possible, I used both start and end frames to constrain motion. This worked ~60% of the time. The other 40%, the model would:

  • Hallucinate new elements

  • Change the character's appearance

  • Introduce unwanted camera movement

  • Create temporal artifacts

The Door Scene Problem:

One of my early video generations had slight inconsistencies in door size between frames. When I tried using start/end frame control, it was a mess—the model couldn't reconcile the difference.

Solution: Generate the motion in reverse.

Instead of a start/end frame: "Character opens door and peeks through"

I did one image with the door already open: "Static shot: a stylized young woman with nervous eyes peeks through a slightly open, weathered blue wooden door into a cozy, cluttered library. Her eyes scan the space from left to right, then she hesitates, and gently pulls the door shut until it clicks."

Then I reversed the clip in DaVinci Resolve. Worked perfectly.
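
I did the reversal inside Resolve, but the same trick works outside an NLE too. A minimal sketch, assuming ffmpeg is installed:

```python
import subprocess

def reverse_clip(src: str, dst: str) -> None:
    """Reverse a short clip with ffmpeg's reverse filter.

    The reverse filter buffers the whole clip in memory, so keep clips short
    (the 5-10 second generations here are fine). This version drops audio (-an);
    if your clip has a soundtrack you want reversed, use -af areverse instead.
    """
    subprocess.run(
        ["ffmpeg", "-y", "-i", src, "-vf", "reverse", "-an", dst],
        check=True,
    )

reverse_clip("door_closing.mp4", "door_opening.mp4")
```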

The Front Desk Sign Shot:

Sometimes auto-prompt actually nails it. This was one of those times:

"Slow cinematic dolly shot gliding along the curved wooden reception counter of the warm, vintage-style 'Excursion Club' lobby, the glowing golden sign letters casting soft light onto polished wood as the camera moves from close-up of the illuminated text to a wider view revealing travel posters, world maps, and brochures in the softly blurred background, shallow depth of field, cozy ambient lighting, elegant travel-club atmosphere."

I didn't write that. The AI did. I kept it.


PART IV: THE LIP SYNC NIGHTMARE

This was, without question, the most frustrating part of production.

The Problem

AI video models that do lip sync want to make your characters look like Pixar characters (AI made me hate Pixar style). Even when your base image is semi-realistic or anime-styled, the moment you add dialogue, faces become:

  • Overly cartoonish

  • Clay-like texture

  • Exaggerated Disney expressions

  • Loss of original art style

Platforms Tested

Omnihuman (Freepik):

  • My primary lip sync tool

  • Best quality when it worked

  • Major issue: Defaulted to Pixar-style facial animation

  • Required extensive prompt engineering to minimize

Higgsfield (Wan 2.5 / 2.6):

  • Wan 2.5: Degraded visual quality significantly

  • Wan 2.6: Better, but characters looked like clay

  • Abandoned after tests

OpenArt (Kling Avatar 2.0):

  • Tied with Omnihuman for quality

  • More artifacts than Omnihuman

  • Sometimes better at maintaining style

  • Used 10,000 credits testing (account worth $30/month, free via creative partnership with OpenArt)

OpenArt (Standard Lip Sync):

  • Too many artifacts

  • Abandoned quickly

Hedra:

  • Terrible. Just terrible.

  • One test, never returned

My Workarounds

Since I couldn't get clean lip sync consistently, I designed around it:

1. Internal monologue. Many lines play as voiceover thoughts rather than spoken dialogue. This allowed me to:

  • Use more expressive voice acting (I recorded lines myself and did voice changes)

  • Avoid mouth animation entirely

  • Create a more introspective tone

2. Off-screen dialogue. Characters speak while the camera shows:

  • Another character's reaction

  • The environment

  • An object of focus

(I learned this from an episode of Invincible where the creator makes a cameo in the show)

3. Wide shots without a visible mouth. When characters had to speak on-screen, I used:

  • Extreme wide shots where mouth detail isn't visible

  • Over-the-shoulder angles

  • Characters facing away or in profile

4. Strategic close-ups only. For critical emotional moments, I accepted the lip sync compromise and used it. But I limited this to a handful of shots in the entire episode.

Controlling Omnihuman's Cartoonish Tendency

Failed approach: "Generate lip sync for this dialogue"

Successful approach: Hyper-specific prompting with constraints:

[camera angle] of the [describe character] as they continue to look [direction they are facing]. Their head and eyes do not move from the position in the @Start image. No hand motion. The character is completely still and composed. Their facial expression [emotion that matches the audio you’re using].


Key techniques:

  • State the camera angle explicitly

  • Reinforce "no extra movement"

  • Direct where eyes and face should point

  • Use phrases like "completely still," "composed," "maintains position"

  • Reference the exact starting frame

Important discovery: Close-ups work MUCH better than full-body shots. Full-body lip sync introduces:

  • Arbitrary hand gestures

  • Body swaying

  • Unwanted head movement

Keep it tight on the face when you must use it.

Image quality matters: If your base image is already on the fence between Pixar-style and realism, lip sync will push it fully into Pixar. Use more realistic base images for lip sync shots.

The Multi-Character Dialogue Experiment

I tried using ElevenLabs' multi-voice prompt feature to generate a full conversation between Aza and Kate at the train station, thinking I could:

  1. Generate the conversation as one audio file

  2. Feed it into Kling Avatar 2.0 for multi-character lip sync

  3. Get the whole scene in one generation

The test: 700 credits in Kling Avatar 2.0

The prompt:

The two characters are looking directly at each other while speaking. There is no hand movement. Their motions are calm and relaxed. No smiling.


Character on the left says: "So where are you headed?"

Character on the right says: "I gotta meet Naomi, we're staying in Waterside"

Character on the left says: "Ooo fancy"

...etc.


Result: FAILED.

The model couldn't handle:

  • Two characters speaking alternately

  • Maintaining consistent expressions

  • Preventing unwanted gestures

  • Keeping them looking at each other

Solution: Split the dialogue into separate audio tracks with silence between lines, generate separate close-up shots for each character, edit them together in post.

More work, slightly functional… but ultimately less natural-feeling than my previous off-screen dialogue workaround. I scrapped these shots.


PART V: SOUND DESIGN & VOICE

ElevenLabs Voice Acting

ElevenLabs became my primary voice tool. I used two accounts (don't ask me how) with a combined 15,000 credits spent.

Voice Design Process:

Each character got a specific voice profile:

  • Aza: Talia - Calm, slightly raspy, never too excited

  • Jade: Tiffany - Natural and welcoming, "bubbly black girl - Delta energy"

  • Javonte: Ministar - Too cool, poetic, whispery, British-African inspired

  • Cass: [voice undecided] - Mousy, soft-spoken

  • Kate: Ivy - Sophisticated and sassy, kawaii energy

  • Surf Shack Man: Scott - Calm and welcoming

  • Surf Shack Woman: Ms. Harris - Caring Southern mom

  • Student 1: Revenant - Youthful enthusiasm

  • Student 2: Grechen - Snooty brat

  • Teacher: Ivanna - Authority figure warmth, the cool teacher

The Boy's Voice (Custom Creation):

There are NO good kid voices on ElevenLabs. All the "childlike" voices were women doing super cartoonish performances.

So I made one from scratch using ElevenLabs' voice design feature.

Result: I was STUNNED by the quality, expressiveness, and accuracy of vernacular. The voice felt authentic, natural, and emotionally present, especially considering my prompt was super basic.

Highly recommend the custom voice feature for unique character needs.

Multi-Voice Dialogue:

As mentioned earlier, I tested multi-character conversations generated in one file. The audio quality was excellent—natural pauses, realistic overlaps, proper emotional tone.

Where it failed was in video generation (lip sync couldn't handle it). But for pure audio storytelling or podcast-style content, this feature is incredible.

Sound Effects

ElevenLabs SFX:

  • Great for ambient sound loops

  • I used a mix of ElevenLabs and Pixabay

Pixabay:

  • Primary source for most SFX (free, high-quality)

  • Footsteps, door creaks, train sounds, etc.

Important note: You don't have to use AI for everything. Pixabay's library is massive and free. Why generate a door creak with AI when a perfect one already exists? “Who gon be the humans” — Jamee Cornelia

Ambient Design:

NBX's eternal sunset needed a consistent soundscape:

  • Distant ocean waves (looped)

  • Seagull calls (sparse)

  • Light wind (constant low presence)

  • Urban hum in Waterside District scenes

I layered 3-4 ambient tracks per scene to create depth without overwhelming dialogue.
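
I built these beds directly in the edit, but as a sketch of the idea, here's what that layering looks like programmatically, using pydub as a stand-in tool (the filenames and gain values are placeholders):

```python
from pydub import AudioSegment

def ambient_bed(layers: list[tuple[str, float]], length_ms: int) -> AudioSegment:
    """Loop each ambience to the scene length, attenuate it, and stack the layers."""
    bed = AudioSegment.silent(duration=length_ms)
    for path, gain_db in layers:
        layer = AudioSegment.from_file(path)
        # Loop the layer until it covers the whole scene, then trim and apply gain.
        while len(layer) < length_ms:
            layer += layer
        bed = bed.overlay(layer[:length_ms] + gain_db)
    return bed

# NBX default soundscape: waves up front, everything else tucked under the dialogue.
scene = ambient_bed(
    [("waves_distant.wav", -12.0), ("seagulls_sparse.wav", -20.0),
     ("wind_light.wav", -18.0), ("waterside_urban_hum.wav", -24.0)],
    length_ms=45_000,
)
scene.export("scene_03_ambience.wav", format="wav")
```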


PART VI: FILE MANAGEMENT & WORKFLOW CHAOS

The Disaster I Created

I create messy. When I try to be too organized upfront, it fucks up my flow. But the downside:

I accidentally deleted all of my work once.

Trashed the wrong folder. Gone. 300-500 images, hours of video tests. Had to regenerate large portions from scratch. But after this, my process was much more refined, so I was able to catch up to my previous stopping point in one day.

The second problem: DaVinci Resolve can't locate clips if you move files after importing. I reorganized my folders mid-production and luckily caught it before relinking media became overwhelming. It's just something to take note of.

What I Learned (The Hard Way)

During production:

  • Let the chaos happen

  • One big "Active Project" folder with everything dumped in

  • Use DaVinci's internal organization (bins, tags, colors)

  • Don't move files once they're imported

After production:

  • THEN organize into a proper folder structure

  • Create a master archive with clear naming

  • Export a project file with relinked media

  • Back up to external drive and cloud

My eventual structure:

Surf_Noir_EP01/

├── 01_Scripts/

├── 02_Storyboards/

├── 03_Assets/

│   ├── Characters/

│   ├── Backgrounds/

│   ├── Props/

├── 04_Audio/

│   ├── Dialogue/

│   ├── SFX/

│   ├── Music/

├── 05_Video_Generations/

│   ├── Scene_01/

│   ├── Scene_02/

│   └── ...

├── 06_Final_Edit/

└── 07_Exports/
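
If you want to spin that structure up for the next episode in one go, a few lines of Python will scaffold it (the folder names mirror the tree above; the project name is just an example):

```python
from pathlib import Path

STRUCTURE = {
    "01_Scripts": [], "02_Storyboards": [],
    "03_Assets": ["Characters", "Backgrounds", "Props"],
    "04_Audio": ["Dialogue", "SFX", "Music"],
    "05_Video_Generations": ["Scene_01", "Scene_02"],
    "06_Final_Edit": [], "07_Exports": [],
}

def scaffold(project: str) -> None:
    """Create the archive folder tree, skipping anything that already exists."""
    root = Path(project)
    for folder, subfolders in STRUCTURE.items():
        for sub in subfolders or [""]:
            (root / folder / sub).mkdir(parents=True, exist_ok=True)

scaffold("Surf_Noir_EP02")
```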


Why this matters: When sharing your process later (like in this Dev Log), having clear organization makes pulling examples and references infinitely easier. I learned this from working at ad agencies like Vayner Media.


PART VII: THE NUMBERS - COMPLETE COST BREAKDOWN

Time Investment

Total: 60-80+ hours

This doesn't include:

  • Months of pre-production conceptualizing Surf Noir

  • The 1+ month 2D research phase (mostly abandoned)

  • Writing scripts and dialogue

  • Character design pre-work

Breakdown estimate:

  • Character design & asset creation: 15-20 hours

  • Scene composition & image generation: 20-25 hours

  • Video generation & iteration: 15-20 hours

  • Voice recording & sound design: 5-8 hours

  • Editing in DaVinci Resolve: 10-15 hours

  • Rendering & troubleshooting: 5-7 hours

Labor rate benchmark: $45-60/hour (industry standard for AI creative direction, based on my experience at tech startups, Hollywood ad agencies, and recent recruitment offers)

Monetary value of labor: $2,700 - $4,800
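
For transparency, that labor figure is just the hour range multiplied by the rate range:

```python
hours = (60, 80)   # production hours on Episode 1
rate = (45, 60)    # $/hour benchmark for AI creative direction

low, high = hours[0] * rate[0], hours[1] * rate[1]
print(f"Labor value: ${low:,} - ${high:,}")   # Labor value: $2,700 - $4,800
```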

Platform Costs

Midjourney Pro (Annual):

  • Normal: $60/month

  • My rate: $48/month (annual discount)

  • Months used: 1 month heavy production

  • Cost: $48

Freepik Credits:

  • Initial research (before deletion): ~50,000 credits

  • Actual production: 372,300 credits

Important context:

  • Seedream 4.5 (≈95% of images) was free during beta

  • Most credits were spent on video generation, not images

  • Nano Banana Pro was used sparingly due to cost (500 credits per 4K generation)

Because Freepik’s pricing model is credit-based and tiered, the exact dollar equivalent fluctuates. At current pricing tiers, however, this volume of usage would conservatively translate to hundreds of dollars up to low four figures if Seedream were not free.

This is a critical point:

AI is only “cheap” if you’re not doing much with it.

Kling Video Generation:

  • Primary model: Kling 2.6 / O-1

  • Nearly all final video footage generated here

  • Other platforms tested (Sora, Veo, Minimax, Wan): ~15,000 credits total

  • Kling proved best quality-to-cost ratio

Exact Kling costs vary by plan and usage window, but this episode represents heavy, sustained generation, not casual testing.


OpenArt (Lip Sync Testing):

  • Account value: ~$30/month

  • Credits included: 12,000

  • Credits used on lip sync: ~10,000

  • Access provided via creative partnership

ElevenLabs (Voice & SFX):

  • Free tier: 10,000 credits per account

  • Two accounts used (don’t ask)

  • ~15,000 credits total

  • Covered:

    • Full dialogue

    • Internal monologue

    • Select ambient sound effects


Software Licenses:

  • DaVinci Resolve Studio: $299 perpetual license

What This Would Cost Without “Lucky Breaks”

This episode benefited from:

  • Seedream being temporarily free

  • A creative partnership on OpenArt

  • Already having DaVinci Resolve

  • Prior sunk costs in tools I already owned

If all tools were paid at market rates today, producing this episode would realistically cost:

Low estimate: $1,000–$1,500
High estimate: $3,000–$5,000+

And that’s before valuing labor.

Comparison to Traditional Animation

A traditionally animated 11-minute episode at even modest indie rates would require:

  • Storyboarding

  • Character design

  • Background art

  • Layout

  • Animation

  • Cleanup

  • Color

  • Compositing

  • Editing

  • Sound design

Even at an extremely conservative $5,000 per finished minute, you’re looking at:

$55,000+ minimum

More realistically:

$80,000–$150,000

AI didn’t just eliminate cost. It collapsed the distance between a solo creator and studio-scale output.

PART VIII: WHAT AI ACTUALLY CHANGED (AND WHAT IT DIDN’T)

What AI Did NOT Do

AI did not:

  • Write the story

  • Design the world

  • Decide the tone

  • Choose the shots

  • Maintain continuity

  • Solve narrative problems

  • Create emotional intent

Every time something worked, it was because:

  • I knew what I wanted

  • I could recognize when it was wrong

  • I had the taste to say “no”

That’s the part people ignore when they call this “slop.”

What AI DID Change

AI:

  • Lowered the logistical barrier to entry

  • Made iteration possible at solo scale

  • Allowed me to fail faster

  • Exposed weak creative instincts immediately

  • Punished vague thinking

  • Rewarded specificity

Most importantly, AI forced clarity.

If you don’t know what you want, AI will happily give you something.
That something will almost always be generic.

Why “AI Slop” Exists

AI slop isn’t a tool problem.
It’s a taste problem.

The loudest AI influencers:

  • Optimize for output, not meaning

  • Confuse novelty with substance

  • Treat art as content arbitrage

  • Never sit with a piece long enough to refine it

This project took 60–80 hours because:

  • I rejected hundreds of “good enough” outputs

  • I rewrote prompts obsessively

  • I rebuilt scenes that technically worked but emotionally didn’t

That labor is invisible to people scrolling past results.

PART IX: WHY THIS MATTERS (BEYOND THIS EPISODE)

This Was Never Just an Episode

Episode 1 is:

  • A narrative pilot

  • A workflow test

  • A proof-of-concept

  • A studio dry run

It proves that:

  • A solo creator can produce serialized animation

  • Worldbuilding can precede monetization

  • AI can be used with intention, not as spectacle

Surf Noir as a Studio Model

Long-term, Surf Noir is:

  • A transmedia IP

  • A music + animation ecosystem

  • A client-facing creative studio

  • A testbed for hybrid AI/manual workflows

This Dev Log doubles as:

  • A portfolio artifact

  • A transparency document

  • A capability statement

If you’re a collaborator, investor, or client:
This is what my process actually looks like.

PART X: WHAT I’D DO DIFFERENTLY (HONESTLY)

1. Lock Visual Language Earlier

I lost time chasing “perfect” when “coherent” would’ve been enough.

2. Design Around AI Limitations Sooner

Lip sync taught me this the hard way. Structure the story to avoid known weaknesses.

3. Commit to Hybrid Earlier

Blender + AI would’ve saved hours in subtle motion and control.

4. Ship Earlier

Nothing teaches like finishing.

PART XI: WHAT’S NEXT

  • Continued 2D research

  • Blender character workflows

  • Expanded Surf Noir episodes

  • Music-forward storytelling

  • URL + IRL integrations

  • Client work under Surf Noir Studio

FINAL THOUGHT

If you take anything from this:

AI will not save you from the work.
But it will meet you wherever your ambition already is.

If your vision is small, it’ll stay small.
If your vision is obsessive, messy, personal, and necessary — AI can help you finish it.

And finishing changes everything.