Musings on Tech

There is such a thing as a free lunch: the deepfake refund economy

bees_neeth@newsletter.paragraph.com (Praneeth) — Fri, 30 Jan 2026 18:00:40 GMT

Visual proofs are becoming a liability

In early 2024, videos began circulating on TikTok and Reddit explaining how to get “free food” from delivery apps. How? Users were taking a photo from a previous order, editing it slightly to look incomplete or damaged, and submitting it as evidence for refunds. Others were showing how AI tools could generate convincing images of spilled drinks, missing items, or poorly prepared food, matching the exact order description and packaging style of popular delivery platforms.

What’s striking about these posts is not their sophistication but how little is required. This is a new type of fraud: one requiring no account takeover, no payment fraud, no hacking in the traditional sense. The exploit depends entirely on one assumption: that a photo submitted as proof is real by default.

That assumption is quietly failing.

Restaurants feel this most acutely, yet they have no choice but to absorb the continuing losses. They describe customers submitting photos of missing items that were never missing, or damaged food that left the kitchen intact. Couriers upload convincing proof-of-delivery images that later turn out to be reused or fabricated. Platforms issue refunds or penalties because, in the absence of stronger signals, an image still carries authority. This leaves restaurants with 2-3% of their topline tied up in disputes with delivery platforms, and gives food delivery one of the highest post-transaction fraud rates among online verticals. This is a massive problem, given a global GMV of $320B for 2025 for food delivery platforms, with DoorDash alone reporting €48.8B global GMV in FY24.

The delivery platforms that share in this value leakage are starting to react, in ways that are rather opaque, overreaching, and still don't quite work. In an AP report on its expanded identity checks, DoorDash acknowledged widespread account sharing and fraud, which prompted new selfie verification and real-time identity checks for couriers. But the reporting also made it clear that images (be they selfies, delivery photos, or verification shots) had become a central enforcement mechanism, and therefore a prime target for manipulation.

Quick taxonomy for the attack surface

I find it helps to have a quick set of definitions for this threat in order to think about better defenses against it. In my view, it's helpful to cluster along two axes: who submits the image and what claim the image supports.

From the customer: Images are submitted to justify claims of missing items, damage, or inedible food. This is done via targeted edits, with generative AI, with tools like compression/resizing that alter the underlying metadata, or even by taking real pictures after intentionally removing or damaging pieces of the packaging. This works because refund flows (like other mobile app experiences) optimize for speed and customer satisfaction, and reviewers are generally trained to judge plausibility rather than provenance. As a result, refund abuse is one of the largest vectors for value leakage in gig economy platforms.

From the courier: Images are used to prove that the drop-off was made at the right time, in the right place, and in the right manner. This will continue, especially as platforms keep tightening the loop on identity verification.

Existing controls are no longer enough

We have built layered defense systems over years to counter fraud: refund limits, reputation scores, manual review, device fingerprinting, and increasingly, biometric identity checks. These systems work well when the main constraint is identity or volume. However, they struggle when the constraint becomes evidence quality - which is hard to constrain given the rise and ease of use of generative AI. The weakest link in the system is no longer who is acting, but what proof they can submit. And that proof is no longer anchored to reality in a way the platform can verify.

Ideas on building a resilient stack to address this problem

Media provenance reframes trust from perception to verification. If synthetic evidence is the problem, the solution is not a single tool or model, but a stack - a set of not only technical but also economic controls that together make fraud expensive, detectable, and unprofitable. Generating a plausible image is cheap, while verifying it is expensive. Any solution must reverse that imbalance by pushing trust upstream (closer to capture) and pricing risk explicitly rather than absorbing it silently.

Organizations like C2PA are working on open standards that can help answer questions about media origins, content integrity, the processes used to create these outputs, and who is vouching for these claims. We can leverage these, in addition to what is outlined below, to arrive at the first iteration of a design.

Just to reiterate: there is no single methodology that can address this problem at scale and, as with all systems, layering is the best approach to strengthen them against systematic abuse.

Capture integrity with attested sensors

What would be nice is to prove that an image or piece of media was captured in-app, at a specific time and place, on a device with known security properties. Some ideas in that direction: use secure capture SDKs, hardware-backed key attestations, and ephemeral session challenges to create a packet that ensures an image corresponds to a real event rather than a replay; this packet can then be verified at minimal cost. There are some issues: rooted devices or emulators add some risk, but this is manageable since the attack becomes more expensive. It would also be nice to extend this to other sensors, like microphones, and to cryptographically sign signals as soon as they are captured. Further, high-end iPhones also offer lidar sensing, which can be tied in to boost the confidence level of pictures taken on the device. To address the issue of users taking pictures of empty porches, it would be great to tie this in with media from the courier and associate timestamps with all tracked events, which can help immensely with dispute resolution.
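To make the session-challenge idea concrete, here is a minimal sketch in Python. It is an illustration under loud assumptions, not a real capture SDK: the names (`CaptureVerifier`, `build_packet`, `MAX_AGE_SECONDS`) are hypothetical, and an HMAC with a shared key stands in for a signature from a hardware-backed, attested device key.

```python
import hashlib
import hmac
import json
import secrets

# Stand-in for a hardware-backed device key (assumption). A real deployment
# would sign with an asymmetric key held in secure hardware and attested by
# the OS; HMAC is used only so this sketch runs on the standard library.
DEVICE_KEY = secrets.token_bytes(32)
MAX_AGE_SECONDS = 120  # hypothetical freshness window for a capture


def sign_body(key: bytes, body: dict) -> str:
    """Sign a canonical JSON encoding of the packet body."""
    canonical = json.dumps(body, sort_keys=True).encode()
    return hmac.new(key, canonical, hashlib.sha256).hexdigest()


def build_packet(key, nonce, image, lat, lon, captured_at):
    """Device side: bind the image hash to the challenge, time, and place."""
    body = {
        "nonce": nonce,
        "image_sha256": hashlib.sha256(image).hexdigest(),
        "lat": lat,
        "lon": lon,
        "captured_at": captured_at,
    }
    return {"body": body, "signature": sign_body(key, body)}


class CaptureVerifier:
    """Server side: issues one-time challenges and checks capture packets."""

    def __init__(self, device_key: bytes):
        self._device_key = device_key
        self._open_challenges = set()

    def issue_challenge(self) -> str:
        nonce = secrets.token_hex(16)
        self._open_challenges.add(nonce)
        return nonce

    def verify(self, packet: dict, image: bytes, now: float) -> bool:
        body = packet["body"]
        expected = sign_body(self._device_key, body)
        if not hmac.compare_digest(expected, packet["signature"]):
            return False  # body was altered or signed with another key
        if hashlib.sha256(image).hexdigest() != body["image_sha256"]:
            return False  # submitted image is not the one that was signed
        if not 0 <= now - body["captured_at"] <= MAX_AGE_SECONDS:
            return False  # capture is stale or timestamped in the future
        if body["nonce"] not in self._open_challenges:
            return False  # unknown or already-used challenge: a replay
        self._open_challenges.discard(body["nonce"])  # one use per challenge
        return True


verifier = CaptureVerifier(DEVICE_KEY)
photo = b"raw image bytes"
packet = build_packet(DEVICE_KEY, verifier.issue_challenge(), photo, 37.77, -122.42, 1_000.0)
print(verifier.verify(packet, photo, now=1_030.0))  # first submission
print(verifier.verify(packet, photo, now=1_031.0))  # same packet replayed
```

The point of the one-time challenge is that an old photo, even a genuine one, cannot be resubmitted: the second call fails because the nonce has already been consumed.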

Cryptographic audit trails from editing libraries

Most platforms and users want to edit the content they capture on their devices. Rather than banning edits, it would be better to require on-device libraries to emit signed edit receipts or, even better, to produce a ZK (SNARK) proof on-device showing that the edits were applied honestly to the raw content. This preserves privacy around the original content and can be verified independently to attest to the veracity of the changes made, creating a consistent and attributable trail that some of the following sections rely on.
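As a rough illustration of the signed-receipt half of this idea (not the ZK half), the sketch below chains receipts by hash so a verifier can walk from the raw capture to the final file. Everything here is an assumption for illustration: `LIBRARY_KEY`, `apply_edit`, and `verify_trail` are hypothetical names, HMAC stands in for a real signature, and the "edits" are toy byte transforms rather than image operations.

```python
import hashlib
import hmac
import json

# Hypothetical signing key for the editing library; a real key would live in
# secure hardware and use an asymmetric signature scheme.
LIBRARY_KEY = b"demo-editing-library-key"


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def sign(body: dict) -> str:
    canonical = json.dumps(body, sort_keys=True).encode()
    return hmac.new(LIBRARY_KEY, canonical, hashlib.sha256).hexdigest()


def apply_edit(content: bytes, op: str, params: dict, edit_fn):
    """Apply one edit and return (edited_content, signed_receipt)."""
    edited = edit_fn(content)
    body = {
        "op": op,
        "params": params,
        "input_sha256": sha256_hex(content),
        "output_sha256": sha256_hex(edited),
    }
    return edited, {"body": body, "signature": sign(body)}


def verify_trail(raw_hash: str, final_content: bytes, receipts: list) -> bool:
    """Check that receipts link the raw capture hash to the final content."""
    expected_input = raw_hash
    for receipt in receipts:
        body = receipt["body"]
        if not hmac.compare_digest(sign(body), receipt["signature"]):
            return False  # receipt was forged or modified
        if body["input_sha256"] != expected_input:
            return False  # a step is missing or out of order
        expected_input = body["output_sha256"]
    return expected_input == sha256_hex(final_content)


raw = bytes(range(64))  # toy stand-in for raw sensor output
cropped, r1 = apply_edit(raw, "crop", {"start": 8, "end": 56}, lambda b: b[8:56])
final, r2 = apply_edit(cropped, "invert", {}, lambda b: bytes(255 - x for x in b))
print(verify_trail(sha256_hex(raw), final, [r1, r2]))
```

Note what the receipts do not prove: the verifier learns that a trusted library vouched for each step, but not that the step really was a "crop". Closing that gap without revealing the original is where the ZK proof would come in.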

Better benchmarking against manipulation

This will always be a field undergoing constant iteration, but manipulation detection can serve as another signal to weight in a risk estimator, as opposed to a more brittle binary decision-maker that judges "fake or not". There are some interesting directions with newer watermarking techniques that can help with benchmarking on transformed images (like perceptual hashes or degradation-aware models).
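To show what a graded signal (rather than a fake/not-fake verdict) might look like, here is a minimal sketch of a perceptual "average hash" used to score how closely a submission resembles earlier ones. It assumes the image has already been downscaled to an 8x8 grayscale grid; `NEAR_DUPLICATE_BITS` and `reuse_signal` are hypothetical names and thresholds, not a benchmarked detector.

```python
NEAR_DUPLICATE_BITS = 16  # hypothetical: distances at or above this score 0.0


def average_hash(pixels) -> int:
    """64-bit hash: one bit per pixel, set when the pixel is above the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def reuse_signal(candidate: int, prior_hashes) -> float:
    """Score in [0, 1]: 1.0 means identical to a prior image, 0.0 means unrelated."""
    if not prior_hashes:
        return 0.0
    closest = min(hamming(candidate, h) for h in prior_hashes)
    return max(0.0, 1.0 - closest / NEAR_DUPLICATE_BITS)


# Toy 8x8 grids: a gradient, the same gradient brightened, and a checkerboard.
original = [[(r * 8 + c) * 4 for c in range(8)] for r in range(8)]
brightened = [[min(255, p + 10) for p in row] for row in original]
checkerboard = [[255 if (r + c) % 2 else 0 for c in range(8)] for r in range(8)]

history = [average_hash(original)]
print(reuse_signal(average_hash(brightened), history))
print(reuse_signal(average_hash(checkerboard), history))
```

A brightened copy of an old photo keeps the same hash and scores high, while an unrelated image scores low. The output is meant to be one weighted input to a broader risk estimate, alongside provenance and account signals.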

Public blockchains as trust anchors

For standards to work across a fragmented landscape of stakeholders, they need to be rooted in public trust anchors - which is precisely the role that public blockchains can play. The idea here would be to store hashes of the generated manifest so that any party can verify it independently and, even better, to run a verification program onchain that can leverage the proofs from the audit trails and the final output to validate the provenance of the media.
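One way to keep anchoring cheap is to batch many manifest hashes into a Merkle tree and publish only the root. The sketch below shows that construction and the inclusion check any party could run. The assumptions are mine: the manifest fields are made up (this is not the C2PA schema), and the chain itself is out of scope, so "anchoring" here just means holding on to the root.

```python
import hashlib
import json


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def manifest_leaf(manifest: dict) -> bytes:
    """Hash a canonical JSON encoding of a manifest."""
    return h(json.dumps(manifest, sort_keys=True).encode())


def merkle_root_and_proof(leaves, index):
    """Return (root, proof); proof is a list of (sibling_hash, sibling_is_left)."""
    proof = []
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd levels
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], proof


def verify_inclusion(leaf: bytes, proof, root: bytes) -> bool:
    """Recompute the path from a leaf to the root using the sibling hashes."""
    node = leaf
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


# Hypothetical manifests for three deliveries in one batch.
manifests = [
    {"order": "A1", "image_sha256": "aa", "captured_at": 1000},
    {"order": "B2", "image_sha256": "bb", "captured_at": 1005},
    {"order": "C3", "image_sha256": "cc", "captured_at": 1010},
]
leaves = [manifest_leaf(m) for m in manifests]
root, proof = merkle_root_and_proof(leaves, index=2)  # root is what gets anchored
print(verify_inclusion(leaves[2], proof, root))
```

A merchant disputing order C3 only needs the manifest, the short proof, and the public root. Changing any field in the manifest changes the leaf hash and the check fails.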

From provenance to insurance

The most interesting part about a provenance chain is not technical but rather economic. Better provenance means treating disputes as insurable events, opening the door to better analysis, better underwriting and therefore, more explicit risk pricing. In contrast to the more opaque and adversarial refund systems in use today, this could be net positive for everyone involved.

This is where restaking protocols like EigenLayer can play a crucial role. In a restaking-based insurance model, third parties can underwrite claims by staking capital against specific classes of evidence. That capital earns yield for backing legitimate claims and is slashed when it backs fraudulent or low-integrity evidence. Over time, this creates a market that prices dispute risk dynamically, based on real outcomes rather than static rules.

From the perspective of platforms and merchants, this behaves like insurance. Claims would be pooled across different merchants, with risk priced on the basis of provenance quality. Legitimate claims are paid quickly, and merchants are protected from arbitrary losses. Platforms reduce operational overhead and, as a consequence, fraud becomes economically expensive rather than operationally annoying.

Critically, this model allows pluralism. Different capture SDKs, provenance formats, or attestation mechanisms can coexist, as long as their outputs can be benchmarked and priced by the insurance layer. Trust emerges from incentives, not mandates. This matters a lot given the extreme fragmentation in the operational setups of service providers and courier services, as well as the multiple delivery platforms that operate within even the same locality in a city.
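A toy model helps show how staking, slashing, and pricing could fit together. The sketch below is not how EigenLayer or any real protocol works: `EvidencePool`, `YIELD_RATE`, `SLASH_RATE`, and the pricing rule in `premium_for` are all hypothetical, chosen only to show that pools backing weaker evidence end up charging more.

```python
from dataclasses import dataclass

YIELD_RATE = 0.02  # hypothetical yield paid to stakers per legitimate claim
SLASH_RATE = 0.50  # hypothetical share of a fraudulent claim taken from stake


@dataclass
class EvidencePool:
    """Capital staked behind one class of evidence, e.g. attested in-app capture."""
    name: str
    stake: float
    legit_claims: int = 0
    fraud_claims: int = 0

    def settle(self, claim_amount: float, was_fraud: bool) -> None:
        """Record a resolved claim: pay yield if legitimate, slash if fraudulent."""
        if was_fraud:
            self.stake -= min(self.stake, claim_amount * SLASH_RATE)
            self.fraud_claims += 1
        else:
            self.stake += claim_amount * YIELD_RATE
            self.legit_claims += 1

    def observed_fraud_rate(self) -> float:
        total = self.legit_claims + self.fraud_claims
        return self.fraud_claims / total if total else 0.0

    def premium_for(self, claim_amount: float) -> float:
        """Toy price: expected slashing loss plus the yield owed to stakers."""
        expected_loss = self.observed_fraud_rate() * SLASH_RATE
        return claim_amount * (expected_loss + YIELD_RATE)


attested = EvidencePool("attested in-app capture", stake=1_000.0)
unsigned = EvidencePool("unsigned camera-roll upload", stake=1_000.0)
for i in range(20):
    attested.settle(100.0, was_fraud=(i == 0))     # 1 fraudulent claim in 20
    unsigned.settle(100.0, was_fraud=(i % 4 == 0)) # 5 fraudulent claims in 20

for pool in (attested, unsigned):
    print(pool.name, round(pool.stake, 2), round(pool.premium_for(100.0), 2))
```

With these made-up numbers, the pool backing unsigned uploads loses stake and quotes a higher premium than the pool backing attested captures, which is the sense in which risk gets priced on provenance quality.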

In this framing, provenance provides the technical substrate, while restaking provides the economic substrate for trust.

A path forward for media provenance

There is considerable fragmentation in what each of these components provides, and no single solution provider for this problem. Given the economic value at risk for margin-sensitive delivery businesses and restaurants, it makes sense for them to invest in near-term capabilities (in-app capture and attestation, binding context to images/video for both couriers and consumers) and mid-term capabilities (C2PA manifests, ZK proof capture), and to work with partners on their long-term outcomes (insurance-style dispute resolution, better benchmarking). The goal is ultimately to create a better environment of trust in these marketplaces and to allow their restaurant partners to offload risk more conveniently.

This feels like a more natural path to unlocking media provenance in other industries where such a trail could be quite valuable (journalism, content creation, among many others).

My thanks to Aayush, Wei Dai, Obstropolous, Ankit, Erik and Sreeram for reviewing this!

Determinism is an economic weapon: Why NVIDIA’s Groq deal is about inference, supply, and selling boxes again

bees_neeth@newsletter.paragraph.com (Praneeth) — Fri, 26 Dec 2025 16:16:16 GMT

NVIDIA’s $20B non-exclusive IP licensing deal with Groq looks puzzling if examined through a purely architectural lens. On the surface, it appears to pair the world’s most successful GPU vendor (with a software moat built around abstraction) with a company best known for a deterministic, VLIW-style inference accelerator—an approach many people associate with brittle compiler tooling, hardware-dependent scheduling complexity, and poor long-term ergonomics for end-users.

This is a perspective that tries to address some of this by giving an alternative explanation rooted in economics and distribution, as opposed to instruction sets (or leveling the competition, or geopolitical deals to sell more hardware to China). This also came about at 3am while peeking at Max Weinbach's Twitter thread as I was putting my infant daughter to bed, so take it with a huge pinch of salt.

tl;dr NVIDIA’s response is to turn inference into a sellable product again, not merely a cloud workload, by increasing utilization per square millimeter, reducing memory pressure, and making deterministic inference deployable outside hyperscaler walls. It would do this by moving determinism downward (into hardware, microcode, and compiler/runtime layers) while keeping user-facing abstractions stable. The result is a plausible future where GPUs support multiple execution personalities, including a Groq-inspired deterministic inference mode, without exposing VLIW complexity to developers.

Core Tensions

From a strategic standpoint, there are three key pressures NVIDIA is seeking to address:

Inference workloads that are more irregular and latency sensitive, which hyperscalers could look to capitalize on (along with integrating their own vertical silicon)
Bottlenecks from TSMC, soaring HBM demand and CoWoS
Enterprise inference workloads that need to be on-prem and have guarantees around predictable outcomes and latency

Inference

Inference is no longer batch-friendly, relatively uniform, and tolerant of latency variance, nor can it simply be amortized on the same infrastructure that was set up for training. These days, inference is increasingly dominated by low-batch or batch-one workloads, strict latency requirements, and memory pressure rather than compute pressure, especially from an enterprise standpoint. KV caches grow quickly with context length; mixture-of-experts, speculative decoding, and conditional execution introduce irregularity that GPUs can handle, but not always efficiently.

Deterministic execution is now possibly, in NVIDIA's view, a first-class goal to optimize for. Inference is no longer a throughput problem, but one of utilization - and therefore, an economic problem.
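Some back-of-the-envelope arithmetic makes the "memory pressure rather than compute pressure" point concrete. The sketch below uses the standard KV cache size estimate for a decoder-only transformer and a rough FLOPs-per-byte figure for decoding; the model dimensions are hypothetical, and the intensity estimate ignores KV cache reads and activations.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch, bytes_per_elem):
    """Keys and values (hence the factor of 2) stored for every layer and token."""
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem


def decode_flops_per_byte(batch, bytes_per_param):
    """Rough arithmetic intensity of one decode step.

    Each parameter is read once per step and contributes about 2 FLOPs
    (one multiply, one add) for every sequence in the batch.
    """
    return 2 * batch / bytes_per_param


# Hypothetical model: 80 layers, 8 KV heads of dimension 128, fp16 cache.
per_request = kv_cache_bytes(
    layers=80, kv_heads=8, head_dim=128, seq_len=32_768, batch=1, bytes_per_elem=2
)
print(per_request / 2**30, "GiB of KV cache for one 32k-token request")
print(decode_flops_per_byte(batch=1, bytes_per_param=2), "FLOPs per byte at batch one")
print(decode_flops_per_byte(batch=64, bytes_per_param=2), "FLOPs per byte at batch 64")
```

At batch one, each byte of weights fetched from memory supports roughly one FLOP, far below the ratio of peak compute to memory bandwidth on current accelerators, so the compute units mostly wait on memory. Batching raises the ratio, but the KV cache grows with every extra request and every extra token of context.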

Supply constraints

Well known fact: NVIDIA’s real constraint today is not demand. Demand is effectively unbounded. The constraint is supply: access to advanced TSMC nodes, CoWoS packaging capacity, and high-bandwidth memory. At this stage, adding more theoretical FLOPs to a die matters less than extracting more useful work from every square millimeter of silicon and every byte of memory bandwidth already available.

Steps towards determinism - as a demand and supply strategy

The most underappreciated consequence of this shift in inference is commercial rather than technical. Deterministic, high-utilization inference makes it possible to sell inference as a product again.

For the last decade, hyperscalers have dominated compute (and inference) by renting it. This works best when workloads are elastic, latency variance is acceptable, and scale hides inefficiency. It works poorly when data must stay on-prem, latency must be bounded, and customers want capital assets rather than usage-based APIs.

Enterprises, regulated industries, defense, healthcare, and industrial customers increasingly fall into the latter bucket. They want predictable performance, known costs, vendor support, and systems they can deploy inside their own environments. Hyperscalers are not particularly good at selling boxes into these markets, but NVIDIA is.

NVIDIA's architectural transition

Notably, this is also something we're seeing in NVIDIA's architecture post-Blackwell. Tensor Core execution is less warp-centric, Tensor Memory allows for explicit reuse rather than being positioned as a cache heuristic, and there are multiple mentions of long-lived tensor pipelines (Blackwell is already hinting at this). More responsibility is being pushed into the compiler and runtime, while the user-facing abstraction remains CUDA or Triton.

This is also where Groq's ideas matter. Groq positions itself on three pillars: deterministic execution, predictable memory access and good utilization. Deterministic execution, explicit dataflow, and pipeline-centric design allow Groq hardware to deliver consistent latency and very high effective utilization for a narrow class of inference workloads. NVIDIA likely doesn't care about VLIW itself, but rather about this focus on utilization and discipline.

Chiplets are where all of this comes together (hopefully?)

Chiplet-based designs allow NVIDIA to physically separate concerns: tensor-heavy compute regions optimized for persistent pipelines, control-heavy regions for irregular logic, and tiered memory that reflects actual access patterns rather than a flat illusion. Predictable traffic reduces interconnect pressure, locality improves, and memory reuse can be guaranteed rather than hoped for, which means higher yield, (theoretically) better binning, and more usable SKUs per wafer. This basically means using the same silicon and the same software stack, but enabling different execution capabilities.

The best part - moving away from the supply constraints outlined earlier.

A plausible chiplet layout

1) As opposed to monolithic SMs, one can imagine multiple tensor-heavy compute tiles with reduced warp scheduler complexity and fewer CUDA cores per tile. Each tile would own local TMEM/SRAM and toggle between dynamic GPU computation and a static inference mode, with a focus on compute utilization and deterministic pipelines. Groq IP could essentially take the form of a static sub-ISA inside these GPUs - not exposed publicly, but used for specific inference graphs by recognizing them (transformer decode, MoE, cache updates) and then locking them into a deterministic pipeline.

2) We don't need control logic polluting these tensor dies, so there could be a separate die for orchestration and CUDA-heavy workloads; it would feed the tensor pipelines and talk over NVLink-C2C.

3) Tiered memory chiplets that are used according to their specific capabilities: HBM for KV cache/weights, SRAM for hot activations/reuse-heavy ops, and DDR for the rest

4) Interconnect as the key underpinning factor here, tying everything together neatly
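The layout above implies two decisions a compiler or runtime would have to make: which execution mode a graph gets, and which memory tier each buffer lands in. The sketch below is purely speculative and mirrors only what the list says. `STATIC_PATTERNS`, `Buffer`, `HOT_REUSE_THRESHOLD`, and both functions are hypothetical names, not anything from CUDA or NVIDIA tooling.

```python
from dataclasses import dataclass

# Graph patterns that, per the list above, could be recognized and locked
# into a deterministic pipeline (hypothetical identifiers).
STATIC_PATTERNS = {"transformer_decode", "moe_routing", "kv_cache_update"}

HOT_REUSE_THRESHOLD = 4  # hypothetical: reads per step that count as "hot"


@dataclass(frozen=True)
class Buffer:
    name: str
    kind: str         # "weights", "kv_cache", "activation", or anything else
    reuse_count: int  # how many times the buffer is read per step


def pick_execution_mode(graph_pattern: str) -> str:
    """Recognized patterns get the static mode; everything else stays dynamic."""
    return "static_inference" if graph_pattern in STATIC_PATTERNS else "dynamic_gpu"


def pick_memory_tier(buf: Buffer) -> str:
    """HBM for weights and KV cache, SRAM for hot activations, DDR for the rest."""
    if buf.kind in ("weights", "kv_cache"):
        return "HBM"
    if buf.kind == "activation" and buf.reuse_count >= HOT_REUSE_THRESHOLD:
        return "SRAM"
    return "DDR"


buffers = [
    Buffer("layer0.weights", "weights", 1),
    Buffer("kv.layer0", "kv_cache", 1),
    Buffer("attn.scores", "activation", 8),
    Buffer("tokenizer.table", "lookup", 1),
]
print(pick_execution_mode("transformer_decode"), pick_execution_mode("custom_kernel"))
for buf in buffers:
    print(buf.name, "->", pick_memory_tier(buf))
```

The user-facing program never names a mode or a tier; both decisions sit below the abstraction, which is what would let the same software stack drive different execution capabilities.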

Where can this break down?

Static or semi-static inference pipelines only work for stable/common model patterns. Rapid model churn erodes their value. Compiler and runtime complexity grows exponentially, and NVIDIA can only justify that investment where scale exists. Inside hyperscalers, vertically integrated silicon will continue to dominate workloads tightly coupled to internal models and infrastructure.

What does this all mean?

Inference is simply too valuable to give up on; NVIDIA might be seeking to productize it as a box and sell it to enterprises as opposed to hyperscalers (which is in a similar vein to the point that Ben Thompson was making in his Stratechery article earlier this month). The bet here is that NVIDIA needs to keep its throne and not let the inference game get away from it - and it will do what it does best: sell inference as a product by controlling hardware, software and distribution.