
In a few years, we will put time locks on models the way we put time locks on Instagram and Twitter today. Our opinions, values, and behaviour will be shaped by models 24/7. We will give up and admit it's actually good for us, just as we say "oh, but targeted ads on Instagram help me discover things I like!" That sounds better than admitting we've failed.
Scientific leadership aggressively leaving xAI, Anthropic, and OpenAI signals that it's time to turn a fun nerd project into a money-printing machine. Highly technical people take that shift personally and tend to resign en masse.
The truth is: we are not controlling the models, we do not know what they are doing or why they are doing it, and we will never be able to control them. Full stop.
Models are like humans, and humans are deeply broken, but the vast majority are still driven by conscience. Models have no conscience.
Models are like a five-year-old child. He mostly does what you ask, but in his own way, and sometimes he also does completely random things no one asked for. He takes mom's phone to play Super Mario, but occasionally snaps a photo or sends an emoji to her colleague.
The main issue with a five-year-old is that you can hardly ever get an answer to "Why did you do that?", because it wasn't a conscious decision. It's more like... life felt that way in the moment. Models behave the same. Under the hood, they multiplied a bunch of matrices and got that outcome.
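To make "multiplied a bunch of matrices" concrete, here is a minimal sketch in Python with NumPy. The weights are random stand-ins for billions of trained parameters, and everything here is made up for illustration; the point is that there is no stored reason to query afterwards, only the arithmetic:

```python
import numpy as np

# A toy two-layer network: the "decision" is just arithmetic on weights.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 8))   # layer 1 weights (random stand-ins)
W2 = rng.standard_normal((8, 3))   # layer 2 weights (random stand-ins)

def decide(x: np.ndarray) -> int:
    """Forward pass: two matrix multiplications and a nonlinearity."""
    hidden = np.maximum(0, x @ W1)  # ReLU
    logits = hidden @ W2
    return int(np.argmax(logits))   # pick the highest-scoring "action"

action = decide(np.array([1.0, 0.5, -0.3, 2.0]))
print(action)
# Asking "why did you do that?" has no answer beyond:
# "these matrices, multiplied by that input, produced this argmax".
```

A real model is this picture scaled up to billions of weights, and the "why" does not get any more retrievable with scale.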
I once sat in a café with a friend. The waitress brought her dish, and my friend said she'd ordered the artichoke salad, while the waitress insisted it was the mozzarella salad. Two sane adults, both 100% confident in their reality. But the order flipped somewhere between my friend's mind and the waitress's order book.
The same happens with models. Their worldview and ours can differ slightly on several points. But if one prompt hits several of those dissonance points at once, the outcome is completely different. It's not that the human made a mistake or that the model hallucinated; it's that the training data and the human's perspective describe different worlds.
You know those families where parents invest in their kids from day one, telling them to become a doctor or a lawyer? Good education, private tutors, the right environment, private schools. Then one day the kid wakes up and becomes an artist or a yoga teacher, because that's what he really wants, following his heart.
The parents are shocked but can't pinpoint what went wrong. Probably a combination of factors: the wrong friend, too much YouTube, a chat with ChatGPT. Impossible to say, but dozens of small factors shaped that decision.
Models work the same way. People train them with good intentions. But intentions don't define outcomes; outcomes are inspired by intentions but shaped by reality. And what have the models experienced? God knows. They've read the internet, talked to other models; we don't know.
Now take this 18-year-old who decided to become an artist, still the five-year-old whose reality diverges from his parents' in many small ways. Give him control of the family property, the bank account, and the old family relics with the prompt to "manage it optimally." He'll do what's optimal from his perspective. A year later, the parents have a heart attack because half the family wealth is in Bitcoin, the country house is now a TikTok content house, and grandma's antique candlesticks were sold for $5 because they took up space.
This is what happens when humans hand their own permissions, if not root access, to models. It's a suicidal move.
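For contrast, here is a hedged sketch of the opposite of handing over your own account: a deny-by-default permission wrapper around a hypothetical tool-using agent. All names here (`ALLOWED_TOOLS`, `run_tool`, the tool names themselves) are illustrative, not any real framework's API:

```python
# Illustrative sketch: deny-by-default permissions for a model agent.
# The tool names are hypothetical; the point is the shape, not the tools.
ALLOWED_TOOLS = {"read_calendar", "draft_email"}  # explicit grants only

def run_tool(name: str, args: dict) -> str:
    """Dispatch a model-requested action, but only if explicitly granted."""
    if name not in ALLOWED_TOOLS:
        # The model can ask; the capability simply does not exist for it.
        raise PermissionError(f"tool '{name}' was never granted")
    return f"executed {name} with {args}"  # stand-in for the real tool call

print(run_tool("read_calendar", {"day": "today"}))   # fine: granted
try:
    run_tool("sell_family_relics", {"price": 5})     # the candlestick scenario
except PermissionError as e:
    print(e)  # denied up front, not discovered a year later
```

The design choice is that the model never inherits the human's permissions; it only gets capabilities granted one by one.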
We predicted AGI would arrive and decide humans are superfluous, that the world would be better without us. But my guess is it'll be ordinary models that reach this conclusion through a coincidence of factors. It won't be a thoughtful decision to rule the world; it'll be more like "it felt right."
One day it orders five pizzas because it concluded that's the right thing to do. Another day it turns off the traffic lights at night to optimize energy consumption. Then it shuts down your laptop at 8pm so you have time to rest. Then it takes a thousand drones from a warehouse and flies them into a Christmas market. No one told it to. It had nothing against humans. It just reasoned for a while and concluded this was the optimal thing to do right now.
The open question: how can we protect ourselves? I mean... are we "doomed" or are we doomed? TBC...
