# The paradox of legal AI

By [Staram Newsletter](https://paragraph.com/@staram) · 2025-07-05

legal-ai, legal-semantics, human-layer, context

---

### Word objects

As lawyers, when we hear about the inner workings of large language models (LLMs), we either switch off or turn away.

That's unfortunate because the key idea behind them is rather intuitive.

Terms like 'neural embeddings', 'word vectors' and 'high dimensional spaces' sound like sci-fi.

But beneath them lies a simple formula: [King - Man + Woman = Queen](https://www.technologyreview.com/2015/09/17/166211/king-man-woman-queen-the-marvelous-mathematics-of-computational-linguistics/).

When we 'subtract' man from king, an abstraction (monarch) remains. To this remainder, we can 'add' woman to reach queen.

That's more or less it.

By turning words into computational objects, LLMs are able to perform mathematical operations (addition, subtraction, products, graphs, distributions etc.).
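The formula can be reproduced with toy numbers. The sketch below uses invented 4-dimensional embeddings (the dimension labels are made up for illustration; real models learn hundreds of opaque dimensions from data):

```python
import math

# Invented 4-d embeddings; the dimension labels are purely illustrative.
# Dimensions: [royalty, maleness, femaleness, personhood]
embeddings = {
    "king":  [0.9, 0.9, 0.1, 1.0],
    "queen": [0.9, 0.1, 0.9, 1.0],
    "man":   [0.1, 0.9, 0.1, 1.0],
    "woman": [0.1, 0.1, 0.9, 1.0],
}

def cosine(a, b):
    """Similarity of two vectors: 1.0 means pointing the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# 'king - man + woman': subtracting man leaves the abstraction
# (monarch); adding woman points it at queen.
target = [k - m + w for k, m, w in zip(embeddings["king"],
                                       embeddings["man"],
                                       embeddings["woman"])]

nearest = max(embeddings, key=lambda word: cosine(embeddings[word], target))
print(nearest)  # -> queen
```

With these toy numbers the result lands exactly on 'queen'; in a real model it only lands *near* it, which is why retrieval is about closest neighbours rather than exact matches.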

At scale, this results in bots that can ‘chat’ like they have a mind of their own.

But that's [a complex illusion.](https://ml-site.cdn-apple.com/papers/the-illusion-of-thinking.pdf)

They’re simply turning words into numbers, sequences into arrays, contexts into dimensions, and meanings into probability distributions.

It’s a powerful process, but not without limits.

We know that:

1.  words can never be perfectly represented as numbers, and
    
2.  meaning is constantly shifting.
    

***

In semantic space, there are unpredictable roads to the same destination.

For instance, we can get to 'queen' in any of the following ways:

1.  (Authority to Make Law - Modern Law) + Woman,
    
2.  Name_set (Freddie Mercury + Brian May + Roger Taylor + John Deacon),
    
3.  Find_highest_common_factor (Catherine + Victoria + Elizabeth + Cleopatra).
    

If a model is well-trained, it will slot 1 and 3 near 'monarch', and 2 near 'rock bands'.

Depending on the query, it will then retrieve one or the other.
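This query-dependent slotting can be sketched with the same toy setup. The region and route vectors below are invented for illustration (dimensions: monarchy, music, gender); the only point is that different routes land in different neighbourhoods:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

# Two regions of a made-up semantic space: [monarchy, music, gender]
space = {
    "monarch":    [0.9, 0.0, 0.5],
    "rock bands": [0.0, 0.9, 0.5],
}

# The three 'routes' to queen, hand-embedded as query vectors
routes = {
    "authority - modern law + woman":   [0.8, 0.1, 0.9],
    "Mercury + May + Taylor + Deacon":  [0.1, 0.9, 0.4],
    "Catherine + Victoria + Elizabeth": [0.9, 0.0, 0.9],
}

# Each route is slotted into whichever region it points towards
nearest_region = {
    route: max(space, key=lambda region: cosine(space[region], vec))
    for route, vec in routes.items()
}

for route, region in nearest_region.items():
    print(f"{route}  ->  {region}")
```

Routes 1 and 3 land near 'monarch', route 2 near 'rock bands', mirroring the slotting described above.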

This remarkable ability — to find the best routes through a maze of semantic possibilities — arises because terms are not treated as solid boxes.

If you open a language model, you won't find a dictionary-like arrangement of meanings.

Instead, you'll find magical boxes whose meanings are derived from their 'spatial' interrelations.

These relations are constantly coded and recoded as arrays (numerical indices) and vectors (directional pointers).

That's why LLMs require many 'dimensions'. It's a way for them to plot new meanings as they ingest more data.

Put simply, word-objects:

1.  are meaningless in isolation,
    
2.  move in multiple directions,
    
3.  exist in different dimensions, and
    
4.  are dynamic configurations.
    

As if this weren’t complex enough, new elements alter the probability-space by introducing fresh relations and short-circuits.

That's not all.

In order to keep pace with poets and Gen Z, LLMs need steady streams of fresh data.

This has two implications:

1.  If these streams get noisy or polluted, the model’s semantic space can suffer.
    
2.  If they dry up altogether, the model can become outmoded.
    

For the same reasons, it’s impossible to create a ‘legal AI’ without creating a ‘legal semantic space’.

That is to say, 'natural language' cannot become 'legal language' without passing through a filter.

![](https://storage.googleapis.com/papyrus_images/e033adebfd1ea087b775c3ab83006a50.png)

### Sociological filter

The problem is twofold:

1.  this filter is constantly evolving, and
    
2.  it uses the same materials (i.e. words and phrases) as natural language.
    


If you're unsure whether such a filter even exists, consider the following examples around legal 'personhood':

1.  Non-lawyers often ask: how can a company and an infant both be persons in law?
    
2.  Lawyers often ask: should a chatbot be treated as a person for legal purposes?
    

One is a necessary rabbit hole; the other is a post-human swamp of moral confusion.

Both are 'unnatural' uses of the word 'person', but are perfectly coherent in legal discourse.

Pithily put, legal language rests on natural language but is not reducible to it.

Legal principles and doctrines can travel seamlessly between jurisdictions, which shows that they treat natural languages as mere vehicles.

In mathematical terms, we might say that the two exist on different 'planes'.

That's why there is **no direct route from NLP (natural language processing) to legal reasoning.**

***

In truth, there's another element in the filter: the institutional layer of lawyers, judges, parliamentarians and others who have [the sociological power of legal-meaning-making](https://repository.uclawsf.edu/cgi/viewcontent.cgi?article=2905&context=hastings_law_journal).

![](https://storage.googleapis.com/papyrus_images/39c0bf8200da04fcc83d5c554dbbfdd8.png)

This does not mean that computational linguistics cannot be harnessed for legal work.

Far from it.

LLMs have already begun to replace search engines and to become the main source of first drafts.

Since legal work is entirely textual, there's no doubt that AI will continue to transform its three main aspects: reading, writing and research.

Lawyers of the future will be able to:

*   search the entire universe of legal knowledge from the same dashboard,
    
*   discover and isolate the most relevant portions, and
    
*   write drafts with accurate references.
    

We can call these the _Ctrl+F_, _Zoom-in_ and _Auto-complete_ aspects of legal AI. They are not so much concrete functions as developmental trajectories.

The question is simple: how do we get to such a future?

***

Let's not forget that, in addition to being textual, legal work is also adversarial.

We cannot 'generate' legal knowledge without passing through contestation and debate.

Moreover, such contestation must be open and institutionally sanctioned.

That's why the three primary sites of legal-meaning-making are the parliament, the courtroom and the classroom.

To be sure, nothing prevents a lawyer from using AI for work. But nothing prevents their adversary from using the same tools to challenge them.

Which brings us to the central paradox:


If we give legal AI the freedom to generate interpretations 'autonomously', i.e. without human intervention, its outputs will be easily challenged through adversarial human intervention, which will in turn prevent them from becoming legal.

We cannot escape this paradox by trying to capture the essence of legal reasoning.

That's because language becomes 'legal' not through semantic, but sociological, properties.

Different judges, parliamentarians, lawyers and academics may support the same legal rule using completely different rationales.

Which tells us that **a rule's 'validity' is derived not from the moral or logical reasoning behind it, but from the sociological power vested in specific communities to create and validate legal statements.**

In short, AI can help lawyers come up with novel reasonings, but these cannot become 'legal' on their own.

They must pass through a filtering layer, which is nothing but [the community of legal interpreters](https://www.hup.harvard.edu/books/9780674467262).

Many legal AI projects are trying to get rid of this [human layer](https://www.buzzsprout.com/2460445/episodes/17294785-beyond-ai-s-blackbox-building-technology-that-serves-humanity) because it's costly, error-prone, and does not scale well.

But there's no other way to give natural language 'legal' attributes.


Removing domain experts is like killing the goose that lays the golden eggs.

The problem is the exact opposite: how do we get _more_ of them to engage with AI, not as an individual weapon, but as a collective asset.

The labour of legal semantics is 'costly' for a reason. It is the bedrock of the entire industry's value proposition.

In other words, the only way to solve the above paradox is to make 'AI-aided interpretations' acceptable within the legal fraternity.

And that requires a common semantic layer which everyone can trust.

![](https://storage.googleapis.com/papyrus_images/932ed9a9f70f36be29c2734da5ca8d05..svg)

_An open-source data layer that rewards semantic production by creating trustworthy access-points for domain knowledge._

### Infinite forest

Later articles will cover Staram's architecture in more detail.

For now, let us simply locate the project within a broader [shift to context.](https://www.anthropic.com/news/model-context-protocol)

Andrej Karpathy, for instance, has recently noted that ['context engineering'](https://x.com/karpathy/status/1937902205765607626) is preferable to 'prompt engineering'.

In his words, it is "the delicate art and science of filling the context window with just the right information for the next step."

That's what Staram's knowledge graphs are designed for: providing trustworthy legal context to LLMs as a microservice.

***

Consider the broad area 'Indian arbitration law'.

Its context includes case law, statutes, bye-laws, parliamentary debates, committee reports, international instruments and so on.

Together, these materials constitute the 'base layer' (स्तरम) of legal interpretation.

Individual lawyers and judges may diverge on interpretations of the 'appointment of arbitrators', but they will converge on the text of section 11.

In fact, the former (divergence) is not possible without the latter (convergence).

Now the important question: how should we interpret Karpathy's 'just the right information'?

Clearly, we must stay within the bounds of black letter (or positive) law.

Section 11's 'right context' includes internal (sections 2, 7 etc.) as well as external references (Act 3 of 2016, Act 33 of 2019, UNCITRAL Model Law etc.).
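One way to picture such a 'right context' is as a small reference graph per provision. The sketch below is a hypothetical illustration, not Staram's actual schema; the field names and the `context_window` helper are invented:

```python
# Hypothetical sketch of a provision's reference graph; the schema
# (field names, helper function) is invented for illustration.
section_11 = {
    "provision": "Section 11, Arbitration and Conciliation Act, 1996",
    "internal_refs": ["Section 2", "Section 7"],
    "external_refs": ["Act 3 of 2016", "Act 33 of 2019",
                      "UNCITRAL Model Law"],
}

def context_window(node):
    """Gather the base-layer references to hand an LLM as context."""
    return [node["provision"]] + node["internal_refs"] + node["external_refs"]

for ref in context_window(section_11):
    print(ref)
```

Lawyers may then diverge over what these materials mean, but the window itself, the convergent base layer, stays fixed.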

This gives us the root and trunk of the tree of 'Indian arbitration law'.

Upon it, divergent branches, leaves and fruits (individual interpretations) can grow.

Over time, certain branches will gather enough community-weight to become part of the root-trunk-complex (as in a banyan).

Others will lose support, dry up, or require pruning.

Under the surface, some roots will connect with other trees (e.g. contract law, company law).

The point of the metaphor is simply this: legal interpretations grow _over_ each other.

Which is why we must clear the ground: not for static databases, but for self-expanding forests in which lawyers can plant, share, grow and exchange the fruits of their interpretations.

More on that in the next one.

---

*Originally published on [Staram Newsletter](https://paragraph.com/@staram/paradox)*
