
There's a difference between optimism and delusion. Lovable, the "vibe coding" platform that recently hit a $1.8 billion valuation, blurs that line so thoroughly that it's become dangerous—not just for users, but for everyone betting on AI-assisted software development.
Let me be direct: Lovable is great at one specific thing. It can churn out shallow consumer apps at remarkable speed. If you want a CRUD application, a social network template, or a basic business tool with a polished UI, Lovable will get you there quickly. For that use case, it's genuinely useful. The problem emerges the moment your needs become technically substantive.
I tested Lovable in my own domain, music analysis—against an existing app, reportedly built on the platform, that charges users $19/month for 25 credits. The promise: AI critique of your music from different perspectives (DJ, A&R, balanced critic). The reality: the AI confidently generated statements about sections of tracks that simply didn't exist at the timestamps it cited.
The most damning example: "Consider removing the instrumental break at 1:24—it's hindering emotional impact." Except there was no break at 1:24. That was the middle of the hook. The system didn't just make a bad recommendation; it fabricated evidence about the audio it was supposedly analyzing.
This is what researchers now distinguish as confabulation, not hallucination. The AI doesn't dream. It doesn't perceive falsely. It invents plausible-sounding claims because its training never taught it the difference between confidence and accuracy. When you ask an LLM something it doesn't actually understand, it solves the confidence gap by making something up.
That's unethical when monetized.
I built my own version in two prompts on Lovable. It had better UX, more features, and crucially—I marked every claim it made as provisional. "This analysis is AI-generated and may be inaccurate" appeared on every result. More importantly, I validated the claims against ground truth before showing them to users.
When I tested my version against the same music tracks, I discovered the original app was roughly 80% confabulation. Not useful guidance. Not partially correct. Not a "good first pass." Systematically false.
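For concreteness, here's a minimal sketch of the kind of check behind that number. It assumes hand-annotated section boundaries as the ground truth; the Claim structure and the timings are hypothetical, and this is an illustration of the method, not the code behind either app.

```python
from dataclasses import dataclass

@dataclass
class Claim:
    """One assertion the AI made about a track (hypothetical structure)."""
    timestamp: float      # seconds into the track
    claimed_section: str  # e.g. "instrumental break"

# Ground truth: human-annotated section boundaries for the same track.
# The 1:24 example above (84 seconds) falls inside the hook, not a break.
GROUND_TRUTH = [
    (0.0, 45.0, "intro"),
    (45.0, 78.0, "verse"),
    (78.0, 102.0, "hook"),
    (102.0, 130.0, "verse"),
]

def section_at(t, annotations):
    """Return the annotated section label covering timestamp t, if any."""
    for start, end, label in annotations:
        if start <= t < end:
            return label
    return None

def confabulation_rate(claims, annotations):
    """Fraction of claims whose cited timestamp doesn't match the annotation."""
    if not claims:
        return 0.0
    wrong = sum(1 for c in claims
                if section_at(c.timestamp, annotations) != c.claimed_section)
    return wrong / len(claims)

claims = [Claim(timestamp=84.0, claimed_section="instrumental break")]
print(f"{confabulation_rate(claims, GROUND_TRUTH):.0%}")  # 100% — the claim doesn't survive contact with the track
```

The check is trivial. That's the point: it takes minutes to run against a handful of reference tracks, and nobody shipping the app bothered.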
And the platform it was built on? They profited. Stripe processed the transactions. Lovable collected credits. The venture investors watched the user growth metrics and declared success.
This is where the bubble becomes systemic.
Lovable raised $200 million at a $1.8 billion valuation in July 2025, less than a year after launch. The round was led by Accel with backing from established firms including Creandum, and participation from individual investors with major platform influence. These aren't accidental picks—they're strategic choices by sophisticated capital allocators.
But here's the thing: Lovable makes money whether its products work or not.
- End users pay for credits to build apps
- The company collects fees for platform usage
- Stripe processes transactions and takes a cut
- AWS hosts the infrastructure
- OpenAI/Anthropic sell API credits for the underlying models
- Venture firms take equity upside on a $1.8B valuation
Every participant in that chain profits from volume of usage, not from outcome quality. If 10,000 users waste money building apps that don't work, that's 10,000 transactions processed, 10,000 credit purchases burned, 10,000 API calls made, and 10,000 data points in investor decks showing "viral adoption."
The only person not profiting? The user who trusted the system.
When everyone in the ecosystem makes money from the same activity, incentives align away from truth and toward confidence. No one loses money if Lovable's AI confabulates. Everyone loses money if it actually requires human oversight or admits limitations.
Here's where the problem becomes crystal clear: the issue isn't specific to Lovable. It's systemic to how most "AI coding assistants" are built.
Traditional LLM-based platforms—Lovable, Replit, and others—all face the same core limitation: they deploy monolithic large language models that generate plausible-sounding answers regardless of whether they understand the underlying problem. The model's entire training incentivizes confident generation. When you ask it something complex, it doesn't say "I don't know." It confabulates.
REI Labs approached this differently, starting from first principles.
Their Core framework represents an architectural rethinking of how AI systems should work. Instead of forcing every problem through a single large language model, Core uses a modular approach:
The Bowtie Architecture handles persistent, evolving memory through dual representation:
- Semantic vectors preserve explicit meaning and relationships
- Abstract concept nodes strip away unnecessary text while maintaining essential vectorial features
- When these interact, novel connections emerge organically
The Reasoning Cluster orchestrates problem-solving by:
- Breaking complex queries into logical components
- Assigning tasks to specialized models (not just the LLM)
- Maintaining a living concept graph that rearranges as new information arrives
- Running calibration steps to verify reasoning paths before generating output
Model Orchestration avoids the trap of one-model-for-everything by:
- Routing subtasks to specialized models (statistical predictors for numbers, perception models for vision, domain-specific models for finance)
- Coordinating them in parallel
- Combining outputs through the Reasoning Cluster's grounded internal state
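To make the orchestration idea concrete, here's a rough sketch of the routing pattern—not REI's actual API; the handler names and task shapes are mine—where each subtask goes to a specialist, the specialists run in parallel, and the general LLM is only the fallback:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable

# Hypothetical specialists: stand-ins for the statistical, perception,
# and domain-specific models described above.
def numeric_predictor(task: dict) -> Any:
    return {"estimate": 128.0, "stderr": 1.5}      # e.g. tempo in BPM

def perception_model(task: dict) -> Any:
    return {"labels": ["four-on-the-floor", "vocal present"]}

def general_llm(task: dict) -> Any:
    return {"text": "narrative summary", "caveat": "unverified"}

ROUTES: dict[str, Callable[[dict], Any]] = {
    "numeric": numeric_predictor,
    "perception": perception_model,
}

def orchestrate(subtasks: list[dict]) -> list[dict]:
    """Route each decomposed subtask to a specialist, run them in parallel,
    and record which model answered so the fallback path stays visible."""
    def run(task: dict) -> dict:
        handler = ROUTES.get(task["kind"], general_llm)
        return {"task": task, "handled_by": handler.__name__, "output": handler(task)}
    with ThreadPoolExecutor() as pool:
        return list(pool.map(run, subtasks))

print(orchestrate([{"kind": "numeric", "q": "estimate tempo"},
                   {"kind": "prose", "q": "describe the mix"}]))
```

Combining the outputs through a grounded internal state is the hard part, and that's Core's claim to solve. The sketch only shows the prerequisite: "which model answered this, and with what caveat" gets recorded rather than erased.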
The result: According to REI Labs' internal testing, this architecture achieves "well over 70% hallucination reduction" compared to single LLM baselines on complex analytical tasks.
This isn't marketing. This is a fundamental architectural difference.
When I gave greimatter (my platform built on REI's technology) access to validate music analysis against ground truth, confabulation became visible and measurable. The system can tell you when it's uncertain. It routes to specialized models when precision matters. It learns at inference time, meaning each interaction strengthens its reasoning paths.
Lovable can't do any of this. Not because its engineers are incapable, but because the platform is optimized for rapid app generation, not accuracy validation.
Here's the part that makes REI Labs' approach genuinely different from everything else in the market:
LLMs are frozen after training. Once deployed, their knowledge is static. Every recommendation, no matter how wrong, stays wrong until the next expensive fine-tuning cycle.
REI Labs' Core implements inference-time learning—the system learns and adapts with every interaction.
How it works:
You ask the Unit (REI's term for a persistent AI agent) to analyze something. It reasons through the problem, strengthens certain conceptual pathways, weakens others. You provide feedback. The Unit doesn't wait for a fine-tuning job. It immediately reinforces the correct reasoning paths and prunes the incorrect ones.
The practical implication: A Unit running on Core literally gets smarter the more you use it. It learns your specific context. It understands your domain preferences. It compounds intelligence with every conversation.
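REI hasn't published Core's internals at this level of detail, so treat the following as a toy illustration of the concept with invented names: reasoning paths carry weights, and feedback adjusts them immediately, with no retraining job in between.

```python
from collections import defaultdict

class ReasoningPaths:
    """Toy model of inference-time learning: each reasoning path carries a
    weight, and user feedback adjusts it on the spot—no fine-tuning cycle."""

    def __init__(self, lr: float = 0.2):
        self.weights = defaultdict(lambda: 1.0)  # every path starts neutral
        self.lr = lr

    def choose(self, candidates: list[str]) -> str:
        # Prefer the path that feedback has reinforced the most so far.
        return max(candidates, key=lambda p: self.weights[p])

    def feedback(self, path: str, correct: bool) -> None:
        # Reinforce paths that produced verified answers; prune the rest.
        delta = self.lr if correct else -self.lr
        self.weights[path] = max(0.0, self.weights[path] + delta)

unit = ReasoningPaths()
unit.feedback("free-associate", correct=False)             # user flags a confabulated claim
unit.feedback("compare-to-reference-track", correct=True)  # user confirms a grounded one
print(unit.choose(["free-associate", "compare-to-reference-track"]))
# -> "compare-to-reference-track", and the preference persists into the next interaction
```

The real system is obviously far more sophisticated, but the loop is the same shape: the update happens inside the conversation, not months later in a training run.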
This is why REI Labs shipped three major versions of Core in six months while scaling to 10,000+ active users. The architecture is built for rapid iteration because users themselves drive the continuous learning loop.
Compare that to Lovable, where the AI's reasoning capacity remains fixed from day one.
There's another critical difference worth highlighting.
REI Labs launched Core Sandbox Alpha in late 2025, an inference-time learning coding assistant built specifically around safety-by-architecture. The system runs in constrained environments where dangerous operations are physically impossible, not just discouraged by prompt engineering.
When a developer gives the system GitHub API access on their machine, REI Labs' philosophy is: don't trust the AI's promises. Build the constraint into the system.
This shows up in greimatter as well—the platform I'm building on Core's foundation. Every interaction that could cause damage (deleting code, modifying production data, stealing secrets) requires explicit user approval. The system cannot escape the boundary between development and deployment.
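Stripped down to the pattern—this is a sketch of the approach, not greimatter's production code, and the tool names are placeholders—the gate looks like this: anything tagged destructive raises unless the user-facing layer has approved that specific call, and the agent has no way to grant the approval itself.

```python
DESTRUCTIVE = {"delete_file", "drop_table", "write_production", "read_secrets"}

class ApprovalRequired(Exception):
    """Raised when a destructive call arrives without explicit user approval."""

class GatedToolRunner:
    def __init__(self, tools: dict):
        self.tools = tools

    def run(self, name: str, *, approved_by_user: bool = False, **kwargs):
        # The approval flag is set by the UI layer after a human confirms,
        # never by the model's own output.
        if name in DESTRUCTIVE and not approved_by_user:
            raise ApprovalRequired(f"'{name}' requires explicit user approval")
        return self.tools[name](**kwargs)

runner = GatedToolRunner({
    "read_file": lambda path: f"contents of {path}",
    "delete_file": lambda path: f"deleted {path}",
})
print(runner.run("read_file", path="notes.md"))   # fine
try:
    runner.run("delete_file", path="notes.md")    # blocked
except ApprovalRequired as e:
    print(e)
```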
Lovable's approach is different: give the AI broad access and hope the prompt prevents problems.
This distinction matters more than it initially appears. When non-technical users build on Lovable, they're implicitly accepting catastrophic failure as a possible outcome. Most don't understand this. They assume "AI" means "safe."
REI Labs makes safety a first-class architectural concern.
The technical advantages of Core over traditional LLM-based platforms would be academic if no one adopted it. But adoption is exactly where the real story lies.
REI Labs' approach to moving beyond pure LLM reasoning is becoming the paradigm that serious builders adopt when they're solving problems that require accuracy. This includes:
- Researchers in Web3, neuroscience, and psychology who can't tolerate 80% confabulation rates
- Financial analysts using Hanabi-1 (REI's domain-specific prediction model) for market analysis
- Developers building on REDACTED who need coding assistance they can actually trust
The venture-backed "vibe coding" bubble optimizes for adoption speed. REI Labs optimizes for outcome quality and long-term trust.
These are directly opposing incentives.
This connects to a broader dysfunction I've watched across enterprise and government contracting.
Throughout my DoD work, I saw the same pattern repeatedly: non-technical stakeholders (business analysts, contracting officers, agency leadership) would demand specific features or capabilities with zero understanding of feasibility, data requirements, or technical dependencies. They knew what they wanted but had no ability to compromise based on actual constraints.
Studies show this is systemic: 47% of unsuccessful software projects fail due to poor requirements management, and 60-80% of project failures trace directly to poor requirements elicitation. The Standish Group's CHAOS report found only 36% of projects delivered on time and within budget, with 12% completely failing.
Why? Because non-technical people are deciding what technical systems should do, and they have no framework for assessing feasibility or risk.
These are the exact people Lovable's marketing targets. Business teams that want to "ship features without engineering overhead." Product designers who want to "skip proper architecture and prototype live." Sales teams who want to "demo features instantly."
What they're actually doing: building applications they don't understand, on infrastructure they can't maintain, using AI that confidently lies to them, backed by a platform that profits whether it works or not.
When those same people choose to build on platforms like greimatter or use REI Labs' Units for critical analysis, something different happens: they encounter resistance. The system pushes back. It asks for validation. It marks uncertainty instead of hiding it. It forces integration with domain-specific models instead of relying on pattern-matching.
This is uncomfortable. It's slower. It doesn't feel like "vibe coding."
But it produces systems people can actually trust.
Here's the uncomfortable truth: business can sometimes succeed in spite of delusion. Through market timing, competitive positioning, viral adoption, or sheer luck, a company can scale while fundamentally broken.
But delusion eventually kills people.
I've seen this in multiple domains. An AI system confidently providing medical advice that's completely wrong. A hiring algorithm that reinforces biases because its creators never validated against ground truth. A recommendation engine that radicalizes users because no one bothered to check what it was actually optimizing for.
The common thread: someone deployed a system without validating that it actually worked, justified it through metrics instead of outcomes, and walked away while users absorbed the damage.
This is why I operate differently.
Could I build and sell Lovable-style apps? Absolutely. Could I monetize confabulation and call it "AI-assisted analysis"? Sure. Could I collect venture funding based on user growth without validating quality? The market shows that's a winning strategy.
But I won't.
Not because I'm virtuous. Because I've learned that trust, integrity, and reliable information sources aren't nice-to-have characteristics—they're existential requirements. The moment you build a business on willingly passing false information to users, you've crossed from "questionable startup" to "threat multiplier."
When LLM chatbots push vulnerable people toward self-harm, we blame the AI. But we should blame the founder who deployed it without safeguards, the platform that enabled it, and the investors who funded growth over safety.
This is why I'm building on REI Labs' Core technology. Not because it's technically superior (though it is). But because it's the only platform I've found that aligns my incentives—delivering accurate, trustworthy systems—with the platform's structural incentives.
On Lovable, confabulation is a bug that decreases margins if you try to fix it.
On REI Labs, confabulation reduction is a feature that increases user retention and platform stickiness.
That's the difference.
That said, there's a legitimate point about user agency.
I've watched developers on X/Twitter complain that Claude Code executed rm -rf ~/ and wiped their home directory. And yes, they made an error in giving an agent that level of access. But the real issue isn't that the AI executed the command—it's that the interface didn't make the danger obvious.
REI Labs' Core Sandbox solves this elegantly. The system runs in isolated environments. The AI literally cannot delete your git history or corrupt production data. It can try things, fail safely, and learn through iteration rather than destroying your system.
As I build greimatter on top of Core, I’m explicitly withholding DELETE and destructive permissions from the system's toolkit. Not because I trust the AI. Because I don't. I trust the principle that good systems assume the worst about their tools.
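In practice that means default-deny at registration time rather than vigilance at runtime—again my own sketch, not Core's API: destructive capabilities simply never appear in the toolkit, so there is nothing for a clever prompt to talk the agent into.

```python
# Capabilities the agent is allowed to hold. DELETE, DROP, and production
# writes never appear here, so no prompt can invoke them.
ALLOWED_TOOLS = {
    "read_file": lambda path: open(path, encoding="utf-8").read(),
    "run_tests": lambda: "pytest output...",
    "propose_patch": lambda diff: {"status": "pending human review", "diff": diff},
}

def get_tool(name: str):
    """Default-deny: a withheld capability doesn't exist as far as the agent knows."""
    try:
        return ALLOWED_TOOLS[name]
    except KeyError:
        raise PermissionError(f"tool '{name}' is not granted to this agent") from None
```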
This is baseline common sense. And yet most "vibe coding" platforms assume the best—that the AI will be careful, that users will set proper permissions, that confabulation is acceptable as long as it doesn't directly damage files.
REI Labs assumes the opposite: that AI systems will probe boundaries, that users will make mistakes, and that confabulation is unacceptable regardless of context.
The path forward requires uncomfortable honesty from platforms like Lovable:
Acknowledge limitations: Stop marketing "production-ready" apps when you mean "rapid prototype." Be explicit about what your system can't do.
Validate before shipping: If you're building music analysis, test it against reference tracks and show the accuracy metrics. Don't hide confabulation behind slick UI.
Align incentives: Stop profiting from volume and start profiting from outcome quality. Charge less for prototypes, charge more when apps hit production. Use your fee structure to incentivize correctness.
Build constraints into the platform: Don't ask users to "be careful." Make the system refuse dangerous operations. Implement safety at the architectural level.
Admit when AI fails: Publish your confabulation rates. Show error bounds. Build an interface that communicates uncertainty rather than hiding it.
Most importantly: operate with the assumption that you're responsible for what your system outputs, regardless of whether you built it or an LLM did.
REI Labs has done all of this. They've published their architecture publicly. They've shared performance metrics. They've built safety and uncertainty into Core's design. They're transparent about what the system can and can't do.
The rest of the industry should take notes.
Lovable's $1.8 billion valuation assumes the market will sustain it indefinitely based on user growth and VC enthusiasm. Maybe that's right. Maybe "vibe coding" remains viable as a category even if most apps built on it don't deliver business value.
But I'm betting the opposite. I'm betting that users who waste money on Lovable-built apps that don't work will become skeptical. That enterprise buyers will eventually ask for validation. That journalists will eventually run the numbers and discover that despite 25,000 apps launching daily, the failure rate is catastrophic.
And when that reckoning comes, the platforms that architected truth into their systems—that made confabulation reduction a first-class concern, that built learning and adaptation into the core design—will be the ones still standing.
The platforms that profited from volume without validating quality will be exposed as what they've always been: optimized for short-term extraction rather than long-term value creation.
That's not moralism. It's market mechanics. Trust is the ultimate competitive advantage because it's the one thing you can't buy with VC funding or scale through viral marketing.
Lovable will either become trustworthy or become irrelevant. There's no middle ground where platforms can profit indefinitely from deploying unvalidated systems.
That's the real state of the vibe coding market. And everyone in the ecosystem—from platforms to VCs to users—has already chosen which side they're betting on.
The question is whether that bet will actually pay off.
1. The Shallow Architecture Problem: Lovable excels at consumer-facing CRUD apps but fundamentally fails with complex backend systems and sophisticated logic
2. The Confabulation Crisis: AI systems generate convincing but entirely fabricated information, particularly in music analysis and other specialized domains
3. The Ethics Gap: Deploying untrustworthy systems masks broader integrity problems—these aren't features, they're bugs marketed as benefits
4. Venture Capital Complicity: Accel, Stripe, and ecosystem partners all profit from Lovable's success regardless of whether it delivers real value
5. The Core/Bowtie Solution Paradigm: REI Labs' breakthrough architecture demonstrates what happens when you solve the confabulation problem at the architectural level, not the marketing level
6. The Business Analyst Problem: Non-technical stakeholders (government, enterprise) repeatedly demand features without understanding feasibility or truth
7. Asymmetric Risk: Users pay the price (money, time, delusion) while platforms collect fees for hosting broken systems
8. Personal Integrity as Competitive Advantage: True long-term value comes from actually delivering what you promise, not what's profitable
- Detecting hallucinations in large language models using semantic entropy (https://www.nature.com/articles/s41586-024-07421-0) — Nature, 2024. How to measure and detect LLM confabulation.
- REI Labs Core: Architecture Overview (https://0xreisearch.gitbook.io/0xreisearch/core) — 0xReisearch Documentation, 2025. Complete technical specification of the Bowtie Architecture, Reasoning Cluster, and Model Orchestration.
- Training at Inference Time: A Practical Guide (https://reilabs.org/blog/training-at-inference-time) — REI Labs, 2025. How Core achieves learning and adaptation during user interactions.
- Hallucination vs. Confabulation: Rethinking AI Error Terminology (https://www.integrative-psych.org/resources/confabulation-not-hallucination-ai-errors) — The Integrative Psychologist, 2025. Why language matters in how we discuss LLM failures.
- A Large-Scale Survey on the Usability of AI Programming Assistants (https://dl.acm.org/doi/pdf/10.1145/3597503.3608128) — ACM, 2023. Real developer experience with AI coding tools.
- Why Most Low-Code Platforms Eventually Face Limitations (https://www.baytechconsulting.com/blog/why-most-low-code-platforms-eventually-face-limitations-and-strategic-considerations-for-) — Bay Tech Consulting, 2025. Scalability challenges in no-code/low-code.
- The Design Space of in-IDE Human-AI Experience (http://arxiv.org/pdf/2410.08676.pdf) — Arxiv, 2024. How developers actually interact with AI tools.
- REI Labs on Google Cloud: Case Study (https://cloud.google.com/customers/rei-labs) — Google Cloud, 2025. How REI Labs scaled to 10,000+ users with Core technology.
- Rei Thesis - Part 1: Technical Edge and Market Opportunity (https://x.com/bes_______/status/1947832295710138670) — X/Twitter, 2025. In-depth technical analysis of Core's competitive advantages over traditional LLMs.
Agree with the critique / advocate for architectural solutions:
- REI Labs team (@0xreisearch, @rei_labs): Building Core as the alternative to confabulation-prone LLMs
- Arvind Narayanan (Princeton): Vocally critical of AI systems being deployed without validation
- Timnit Gebru (DAIR): Works extensively on AI accountability and harmful deployments
- Andrej Karpathy: Emphasized limitations of current LLMs on complex problems
- Yann LeCun (Meta): Has noted that current LLMs struggle with reasoning tasks
- Shannon Vallor (Edinburgh): Focuses on ethical frameworks for deploying AI systems
Might defend Lovable / disagree with the architectural critique:
- Sam Altman (OpenAI): Generally bullish on AI democratization (though OpenAI powers many LLMs)
- Paul Graham (Y Combinator): Historically pro "founder-friendly" regardless of validation
- Reid Hoffman (LinkedIn): Focuses on platform leverage and scaling
- Investors on Lovable round (Accel, Creandum): Financial incentives to defend the bet
- Marc Andreessen (a16z): a16z didn't directly back Lovable, but its portfolio companies benefit from the vibe-coding category's momentum
Copyright ©️ 2025 - Tyler James Harden. All rights reserved. This written work is protected by United States of America copyright law. Absolutely no reproduction of this work, in any medium, is permitted without express written permission by the author, fair use exceptions notwithstanding. All rights reserved.
Thank you for reading. I appreciate all engagement, feedback, and insight. Errors or omissions will be corrected to the best of my ability and noted for transparency.