
Understanding AI: What It Is, How It Works, and Why It Matters
AN INTENTIONALLY SOON-TO-BE-OUTDATED SNAPSHOT OF ARTIFICIAL INTELLIGENCE

Writing about web3, crypto, and AI | Newer to crypto, been following AI since he was a Hoya | Ex-growth at a Gen AI startup | Now sharing my confusion publicly




The table above provides a comprehensive overview of the major components that make up today's AI ecosystem. Each component is specialized for specific types of inputs, outputs, and tasks. Understanding how these pieces fit together helps clarify what "AI" actually encompasses—and why it's more accurate to speak of AI systems (combinations of components) rather than a single monolithic "AI."
The "Learning Mode" column describes how each component acquires and updates its capabilities:
Pre-trained (static): The model is trained once on a large dataset, then deployed as-is. It does not learn from new interactions after deployment. Think of this as a "frozen" snapshot of knowledge captured during training.
Example: Image generation models like DALL-E or Midjourney are pre-trained on millions of image-text pairs, then deployed without further learning from user prompts.
Pre-trained + fine-tuned (RLHF): The model is first pre-trained on general data, then refined using Reinforcement Learning from Human Feedback (RLHF). Human raters evaluate model outputs, and the model is adjusted to produce responses that align better with human preferences. After this fine-tuning, the model is deployed and does not continue learning.
Example: ChatGPT and Claude undergo RLHF to make their responses more helpful, harmless, and honest before public deployment.
Continuous learning: The system continuously learns from ongoing user interactions and adjusts its behavior over time. This is relatively rare in current AI systems due to technical challenges and safety concerns.
Example: AI Agents often combine pre-trained models with memory systems and tool use, allowing them to adapt their approach based on task outcomes and feedback loops.
Static retrieval: The retrieval component relies on a fixed knowledge base or document index and does not learn new patterns unless the index is manually updated. However, it's paired with a generative model (typically pre-trained + fine-tuned) that synthesizes retrieved information into responses.
Example: RAG (Retrieval-Augmented Generation) systems search through a static database of documents to find relevant information, then use an LLM to generate a response based on that retrieved context.
Important Note: "Static" here refers to the retrieval database, not the language model component. The LLM used in RAG is typically pre-trained + fine-tuned with RLHF.
Adaptive memory: The system builds a personalized profile based on user interactions, preferences, and conversation history. It recalls past interactions to provide continuity and contextual recommendations.
Example: Memory & Personalization Layers in chatbots remember user preferences ("I'm vegetarian," "I prefer Python over JavaScript") and reference them in future conversations, creating the impression of a persistent relationship.
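As a rough illustration of this last mode, a memory layer can be as simple as a keyed store of preferences that gets folded back into each new prompt as context. Everything below (class and method names included) is a hypothetical sketch, not any product's actual implementation:

```python
# Minimal sketch of a memory & personalization layer (all names hypothetical).
# A real system would persist this store and use an LLM to extract preferences
# from free-form conversation rather than taking explicit key/value pairs.

class MemoryLayer:
    """Stores per-user preferences and recalls them as prompt context."""

    def __init__(self):
        self.profiles = {}  # user_id -> {preference_key: value}

    def remember(self, user_id, key, value):
        self.profiles.setdefault(user_id, {})[key] = value

    def recall(self, user_id):
        """Return stored preferences as context lines for the next prompt."""
        prefs = self.profiles.get(user_id, {})
        return [f"User {key}: {value}" for key, value in prefs.items()]


memory = MemoryLayer()
memory.remember("alice", "diet", "vegetarian")
memory.remember("alice", "language", "Python")
context = memory.recall("alice")  # prepended to the model's prompt each turn
```

The model itself stays frozen; the "learning" lives entirely in this external store, which is why memory layers can be added to any pre-trained model.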
Language Models traditionally accept text as input and produce text as output. Newer versions can also generate images or audio in response to text prompts, but their core training was language-focused.
Multimodal Models are trained from the start to handle multiple types of input and output within a single unified model. You can give them a mix of text, images, and audio, and they can respond with any combination of these modalities—all processed through one integrated system rather than separate specialized pipelines.
Why it matters: Multimodal models are more flexible and can perform complex tasks like "explain what's happening in this video" or "generate a diagram based on this conversation" without needing to coordinate multiple separate AI systems.
Examples:
Language Model: ChatGPT (GPT-4) can accept text and images but was primarily trained as a text model with image understanding added later
Multimodal Model: GPT-4o, Claude 3.5 Sonnet, Gemini Pro—designed from the ground up to seamlessly process and generate across text, images, and (in some cases) audio
Pre-trained Models are largely static once deployed. You provide input, they predict output based on learned patterns, and that's the end of the interaction. They're reactive rather than proactive.
AI Agents are dynamic systems that combine language models with additional capabilities:
Reasoning & Planning: Breaking complex goals into steps
Tool Use: Calling external APIs, databases, or software tools
Memory: Maintaining context across multi-step workflows
Execution: Taking actions in the world (scheduling meetings, running code, querying databases)
Feedback Loops: Adjusting approach based on intermediate results
Think of pre-trained models as highly knowledgeable advisors who answer questions. Think of agents as autonomous assistants who can plan, act, remember, and iterate toward goals.
Example Workflow:
Pre-trained LLM: "What's the weather in Tokyo?" → "I don't have real-time data, but..."
AI Agent: "What's the weather in Tokyo?" → [Calls weather API] → "Currently 18°C and partly cloudy in Tokyo."
Examples:
Static Model: Claude or ChatGPT without plugins (text in, text out, no external actions)
AI Agent: AutoGen, LangChain agents, custom agentic systems that can search the web, query databases, execute code, and chain multiple steps together
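The weather workflow above can be sketched as a minimal tool-use loop. The weather "API" and routing logic below are stubs invented for illustration; a real agent would let the model itself decide when and how to call a tool:

```python
# Sketch of the tool-use loop behind the weather example (all names hypothetical).
# A keyword check stands in for the LLM's decision about whether a tool is needed.

def fake_weather_api(city):
    # Stand-in for a real weather API call.
    return {"Tokyo": "18°C, partly cloudy"}.get(city, "unknown")

TOOLS = {"get_weather": fake_weather_api}

def agent_answer(question):
    """Route questions that need live data to a tool; otherwise answer from the model."""
    if "weather" in question.lower():
        city = question.rstrip("?").split()[-1]       # naive entity extraction
        observation = TOOLS["get_weather"](city)      # act: call the external tool
        return f"Currently {observation} in {city}."  # respond with the tool result
    return "Answering from model knowledge (no tool needed)."

print(agent_answer("What's the weather in Tokyo?"))
# prints: Currently 18°C, partly cloudy in Tokyo.
```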
Standard LLMs rely solely on knowledge encoded in their parameters during training. This creates several limitations:
Knowledge becomes outdated (everything learned from training data, which has a cutoff date)
Cannot access private or proprietary information not in training data
May "hallucinate" plausible-sounding but incorrect information when uncertain
RAG (Retrieval-Augmented Generation) Systems augment LLMs with real-time retrieval from external sources:
User submits a query
System searches relevant documents, databases, or web pages
Retrieved information is provided to the LLM as context
LLM generates a response grounded in the retrieved sources
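The four steps above can be sketched in a few lines. The keyword-overlap retriever and two-document "index" here are toy stand-ins; production RAG systems use vector embeddings for retrieval and a real LLM call for the final generation step:

```python
# Minimal RAG sketch (hypothetical names): keyword-overlap retrieval over a tiny
# in-memory index, then prompt assembly for a generative model.

DOCUMENTS = [
    "CRISPR trial results published in March show improved delivery methods.",
    "Company travel policy: book flights at least 14 days in advance.",
]

def retrieve(query, docs, top_k=1):
    """Rank documents by word overlap with the query (toy stand-in for embeddings)."""
    q_words = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q_words & set(d.lower().split())), reverse=True)
    return scored[:top_k]

def build_prompt(query, docs):
    """Assemble retrieved context plus the question into the LLM's final prompt."""
    context = "\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer using only the context above."

prompt = build_prompt("What is the latest CRISPR research?", DOCUMENTS)
# The LLM would now be called with `prompt`, grounding its answer in the retrieved text.
```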
Benefits:
Access to current information (news, stock prices, recent events)
Can query private knowledge bases (company docs, proprietary research)
Responses can cite specific sources, improving verifiability
Reduces hallucinations by grounding responses in retrieved facts
Trade-offs:
Slower than standard LLM inference (retrieval adds latency)
Quality depends on retrieval effectiveness (garbage in, garbage out)
Requires maintaining and updating document indexes
Example:
Standard LLM: "What's the latest research on CRISPR gene therapy?" → Provides information from training data (possibly 1-2 years old)
RAG System: "What's the latest research on CRISPR gene therapy?" → [Retrieves recent papers from PubMed] → Summarizes findings published in the last 3 months with citations
Examples: LangChain RAG chains, retrieval-backed search products (Glean for enterprise knowledge, Perplexity for the web), customer support bots with knowledge base integration
Speech-to-Text (STT) / Automatic Speech Recognition (ASR) Converts spoken audio into written text. Essential for voice interfaces, meeting transcription, accessibility tools, and voice-controlled systems.
Example: Whisper (OpenAI), Google Speech-to-Text, AssemblyAI
Text-to-Speech (TTS) / Speech Synthesis Generates natural-sounding spoken audio from written text. Used in voice assistants, audiobook narration, accessibility features, and content localization.
Example: ElevenLabs, Google Cloud TTS, Azure Neural TTS
Together, they enable full voice interaction loops:
User speaks → STT transcribes → Text goes to LLM
LLM generates text response → TTS synthesizes → User hears response
This powers voice assistants, phone-based customer service bots, and hands-free interfaces.
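That loop can be sketched end to end with stubbed components. Every function below is a placeholder standing in for a real STT model, LLM, and TTS engine:

```python
# Sketch of the voice interaction loop (all functions are hypothetical stubs).
# Real deployments would plug in an STT model (e.g. Whisper), an LLM, and a TTS engine.

def stt(audio):
    """Stub: transcribe audio bytes to text."""
    return audio.decode("utf-8")  # pretend the audio bytes 'are' the transcript

def llm(text):
    """Stub: generate a text reply."""
    return f"You asked: {text}"

def tts(text):
    """Stub: synthesize text back to audio bytes."""
    return text.encode("utf-8")

def voice_turn(audio_in):
    transcript = stt(audio_in)    # user speaks -> STT transcribes
    reply_text = llm(transcript)  # text goes to the LLM
    return tts(reply_text)        # TTS synthesizes -> user hears response

audio_out = voice_turn(b"What time is it?")
```

The value of the sketch is the shape, not the stubs: each stage consumes the previous stage's output, so any STT, LLM, or TTS implementation with the same input/output types can slot in.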
Each component in the table is optimized for a specific type of work. Their real power emerges when they're combined into unified systems. Modern AI applications rarely use a single component in isolation—they orchestrate multiple components working together.
Use case: Customer support voice bot
Components:
Speech Models (STT): Convert customer's spoken question to text
Language Model: Understand the question and formulate response
RAG System: Retrieve relevant information from company knowledge base
Memory Layer: Recall previous interactions with this customer
Speech Models (TTS): Convert text response back to natural voice
Result: A voice bot that understands spoken questions, retrieves accurate company-specific information, remembers past interactions, and responds naturally.
Use case: Multi-format content production
Components:
Language Models: Generate marketing copy, blog posts, scripts
Image Generation Models: Create visuals, mockups, illustrations
Video Generation Models: Produce promotional videos, animations
Speech Models (TTS): Add professional voiceover narration
Result: A complete content production pipeline that can go from a single brief ("Create a social media campaign for our new product") to finished assets across multiple formats.
Use case: AI research assistant
Components:
AI Agent: Plan research strategy, break down complex questions
RAG System: Search academic databases, news archives, web sources
Language Model: Analyze findings, synthesize information, identify gaps
Memory Layer: Track research progress across multiple sessions
Multimodal Model: Interpret charts, graphs, and diagrams in research papers
Result: A system that can conduct literature reviews, identify key findings, flag contradictions, and produce comprehensive research summaries—adapting its approach based on what it discovers.
Use case: Personal AI assistant
Components:
Language Model: Natural conversation and task understanding
Memory & Personalization Layer: Remember user preferences, past conversations, ongoing projects
AI Agent: Execute multi-step tasks (book appointments, send emails, set reminders)
RAG System: Access personal notes, documents, and emails when relevant
Multimodal Model: Handle voice commands, images, and documents
Result: An assistant that knows you, remembers your context, can take action on your behalf, and improves its usefulness over time by learning your preferences and patterns.
This modular structure makes AI systems:
Flexible: Components can be swapped, upgraded, or recombined without rebuilding everything
Scalable: Each component can be optimized independently for performance and cost
Specialized: Individual components stay focused on what they do best
Powerful: Combinations create emergent capabilities greater than any single component
Adaptable: New components can be added as technology evolves
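The "Flexible" point above can be made concrete: when components agree on a small interface, swapping one implementation for another leaves the rest of the pipeline untouched. The retriever classes below are hypothetical illustrations, not real libraries:

```python
# Sketch of modular component swapping (all names hypothetical).
# Two retrievers expose the same .search() interface, so the pipeline
# doesn't care which one it's given.

class KeywordRetriever:
    def search(self, query):
        return [f"keyword hit for '{query}'"]

class VectorRetriever:
    def search(self, query):
        return [f"vector hit for '{query}'"]

def answer(query, retriever):
    """Pipeline depends only on the .search() interface, not the implementation."""
    context = retriever.search(query)
    return f"Answer grounded in: {context[0]}"

a1 = answer("pricing", KeywordRetriever())
a2 = answer("pricing", VectorRetriever())  # swapped component, same pipeline
```

Upgrading the retriever (or the model, or the memory layer) becomes a one-line change, which is exactly the property the list above describes.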
Understanding AI as an ecosystem of specialized components—rather than a single monolithic technology—helps clarify both its capabilities and limitations. It also explains why AI development is accelerating: improvements to any component benefit every system using that component, and new combinations unlock new possibilities.
The systems that seem most "intelligent" aren't necessarily the ones with the largest models—they're the ones that combine multiple specialized components in thoughtful, well-architected ways.
