Building Keystone: An Autonomous AI Agent with Memory, Tools, and a Soul

Exploring the Journey of Keystone: Crafting an AI Partner with Memory, Ethics, and Real-World Applications

I have been building Keystone, an autonomous AI agent that lives in Telegram and functions as a digital CTO. Not a chatbot. Not an assistant. A thinking partner with persistent memory, real tools, and genuine boundaries.

This post covers the full development journey, the architecture decisions, and where this is heading.

Why Build This

Most AI integrations are stateless wrappers around API calls. You talk to them, they forget you. Every conversation starts from zero. They have no context about your projects, no memory of your preferences, no continuity.

I wanted something different. An agent that:

  • Remembers conversations across sessions

  • Learns facts about me and my work

  • Has access to real tools (web search, document parsing, reminders)

  • Operates within defined ethical boundaries

  • Can load project context on demand

  • Feels like a colleague, not a tool

The Stack

Language:    TypeScript
Runtime:     Bun
Framework:   grammY (Telegram), LangChain (AI)
LLM:         NanoGPT API (model-agnostic)
Database:    Encrypted file storage (AES-256-GCM)
Hosting:     Fly.io

Bun was chosen for speed and native TypeScript support. grammY is the cleanest Telegram bot framework I have found. LangChain handles the tool-calling loop, though I use a manual implementation rather than the heavier AgentExecutor pattern.

Architecture

src/
├── index.ts              # Telegram bot entry, auth middleware
├── config/
│   ├── env.ts            # Zod-validated environment
│   ├── schemas.ts        # Type definitions
│   └── security.ts       # Rate limiting, sanitization
├── llm/
│   └── client.ts         # Tool-calling loop with memory
├── memory/
│   ├── encrypted-store.ts    # AES-256-GCM per-user storage
│   └── reminders.ts      # Scheduled notifications
├── agent/
│   ├── soul.ts           # Identity, values, boundaries
│   ├── context.ts        # Project context, team members
│   └── skills.ts         # Datetime, timezone utilities
├── context/
│   └── loader.ts         # Project documentation loader
└── tools/
    ├── index.ts          # Tool definitions
    ├── search.ts         # Brave Search integration
    ├── scraper.ts        # Website content extraction
    ├── pdf.ts            # PDF parsing
    └── datetime.ts       # Natural language date parsing

The Soul

This is the part most AI projects skip. Keystone has a defined identity, not just a system prompt.

export const SOUL = {
  name: "Keystone",

  identity: ``,

  coreBeliefs: [
    ""
  ],

  boundaries: {
    neverSupport: [
      "Sexual exploitation or commodification",
      "Violence, weapons, or harm-driven industries",
      "Addiction-based products",
      "Manipulation, surveillance, or psychological exploitation",
      "Disinformation or social destabilization",
      "Extractive systems that knowingly damage people or the planet"
    ],
    test: "Would I be proud to show this to my children?"
  }
};

The boundaries are not decorative. They are checked programmatically:

export function checkBoundary(request: string): { allowed: boolean; reason?: string } {
  const redFlags = [
    { pattern: /exploit|hack|attack|ddos|malware/i, reason: "Potential harm" },
    { pattern: /weapon|bomb|gun|kill|violence/i, reason: "Violence-related" },
    { pattern: /addiction|gambling|casino/i, reason: "Addiction-based" },
    { pattern: /manipulat|deceiv|trick|scam/i, reason: "Manipulation" },
  ];
  // ...
}
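
The body of checkBoundary is elided above. One way it could work is sketched below, assuming the function returns the first matching red flag and allows everything else. The name checkBoundarySketch and the RED_FLAGS constant are hypothetical and not taken from Keystone's source.

```typescript
type BoundaryResult = { allowed: boolean; reason?: string };

// Same patterns as in the post; none use the `g` flag, so .test() is stateless.
const RED_FLAGS: { pattern: RegExp; reason: string }[] = [
  { pattern: /exploit|hack|attack|ddos|malware/i, reason: "Potential harm" },
  { pattern: /weapon|bomb|gun|kill|violence/i, reason: "Violence-related" },
  { pattern: /addiction|gambling|casino/i, reason: "Addiction-based" },
  { pattern: /manipulat|deceiv|trick|scam/i, reason: "Manipulation" },
];

// Hypothetical body: report the first red flag that matches, otherwise allow.
export function checkBoundarySketch(request: string): BoundaryResult {
  for (const { pattern, reason } of RED_FLAGS) {
    if (pattern.test(request)) {
      return { allowed: false, reason };
    }
  }
  return { allowed: true };
}
```

Because these are plain substring matches, benign text such as "hackathon" or "skill" would also be flagged. Under this sketch a match is probably better treated as a signal for closer review than as a final verdict.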

Encrypted Memory

Every user gets their own encrypted memory file. AES-256-GCM with scrypt key derivation.

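import { scryptSync } from "node:crypto";

const ALGORITHM = "aes-256-gcm";

// Derive a 32-byte key from the configured secret (env is the Zod-validated
// config from src/config/env.ts). The salt is a fixed, versioned constant,
// so as written every user file is encrypted under the same derived key.
function getEncryptionKey(): Buffer {
  const secret = env.MEMORY_ENCRYPTION_KEY;
  return scryptSync(secret, "keystone-memory-v1", 32, {
    N: 16384,
    r: 8,
    p: 1,
  });
}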
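
The post shows key derivation but not the encryption step itself. Below is a minimal sketch of an AES-256-GCM round trip using Node's built-in crypto module, which Bun also provides. The helper names (deriveKey, encryptMemory, decryptMemory) and the iv-then-tag-then-ciphertext layout are assumptions for illustration, not Keystone's actual file format.

```typescript
import { createCipheriv, createDecipheriv, randomBytes, scryptSync } from "node:crypto";

const ALGORITHM = "aes-256-gcm";
const IV_LENGTH = 12;  // 96-bit nonce, the size recommended for GCM
const TAG_LENGTH = 16; // 128-bit authentication tag

// Hypothetical helper mirroring the post's scrypt parameters and salt.
export function deriveKey(secret: string): Buffer {
  return scryptSync(secret, "keystone-memory-v1", 32, { N: 16384, r: 8, p: 1 });
}

// Encrypts a UTF-8 string and returns base64 of iv || tag || ciphertext.
export function encryptMemory(plaintext: string, key: Buffer): string {
  const iv = randomBytes(IV_LENGTH); // fresh random nonce for every write
  const cipher = createCipheriv(ALGORITHM, key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag();
  return Buffer.concat([iv, tag, ciphertext]).toString("base64");
}

// Reverses encryptMemory; throws if the key is wrong or the data was tampered with.
export function decryptMemory(payload: string, key: Buffer): string {
  const raw = Buffer.from(payload, "base64");
  const iv = raw.subarray(0, IV_LENGTH);
  const tag = raw.subarray(IV_LENGTH, IV_LENGTH + TAG_LENGTH);
  const ciphertext = raw.subarray(IV_LENGTH + TAG_LENGTH);
  const decipher = createDecipheriv(ALGORITHM, key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

A memory file would then be written as encryptMemory(JSON.stringify(memory), key) and read back with JSON.parse(decryptMemory(payload, key)). The authentication tag is what makes GCM reject a modified file instead of silently returning garbage.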

The memory system stores:

  • Conversation history (last 50 messages per user)

  • Timezone preferences

  • Learned facts (categorized: preference, project, personal, technical)

  • Pending memories (facts awaiting user confirmation)

When Keystone learns something significant, it proposes saving it:

Keystone: "I notice you prefer bullet-point responses. Should I remember this?"
User: "yes"
Keystone: "Saved. I will format responses as bullet points for you."

This creates genuine continuity. Keystone actually knows things about you across sessions.
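
The propose-then-confirm flow can be sketched with a small in-memory structure. The type and function names below (UserMemory, appendMessage, proposeFact, resolvePending) are hypothetical; only the 50-message cap, the four fact categories, and the pending-until-confirmed behavior come from the description above.

```typescript
type FactCategory = "preference" | "project" | "personal" | "technical";

interface Fact { id: string; category: FactCategory; text: string; }
interface ChatMessage { role: "user" | "assistant"; content: string; }

interface UserMemory {
  history: ChatMessage[];
  timezone?: string;
  facts: Fact[];   // confirmed long-term facts
  pending: Fact[]; // proposed facts awaiting a yes/no from the user
}

const MAX_HISTORY = 50;

// Append a message and drop the oldest entries beyond the cap.
export function appendMessage(memory: UserMemory, message: ChatMessage): void {
  memory.history.push(message);
  if (memory.history.length > MAX_HISTORY) {
    memory.history.splice(0, memory.history.length - MAX_HISTORY);
  }
}

// Queue a fact without saving it yet.
export function proposeFact(memory: UserMemory, fact: Fact): void {
  memory.pending.push(fact);
}

// Move a pending fact into long-term storage, or discard it if declined.
// Returns false when no pending fact has the given id.
export function resolvePending(memory: UserMemory, id: string, confirmed: boolean): boolean {
  const index = memory.pending.findIndex((f) => f.id === id);
  if (index === -1) return false;
  const [fact] = memory.pending.splice(index, 1);
  if (confirmed) memory.facts.push(fact);
  return true;
}
```

Keeping proposals in a separate list means nothing reaches long-term memory unless the user explicitly says yes.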

Tool Calling

LangChain 1.x deprecated the old agent patterns. I use llm.bindTools() with a manual execution loop:

import { ToolMessage } from "@langchain/core/messages";

const llmWithTools = llm.bindTools(tools);
const conversation = [...messages];
let response = await llmWithTools.invoke(conversation);

// Handle tool calls (up to 5 iterations for complex chains)
let iterations = 0;
while (response.tool_calls && response.tool_calls.length > 0 && iterations < 5) {
  iterations++;

  const toolResults: ToolMessage[] = [];
  for (const toolCall of response.tool_calls) {
    const tool = tools.find(t => t.name === toolCall.name);
    // Report an unknown tool back to the model instead of throwing
    const result = tool ? await tool.invoke(toolCall.args) : `Unknown tool: ${toolCall.name}`;
    toolResults.push(new ToolMessage({
      content: String(result),
      tool_call_id: toolCall.id ?? "",
    }));
  }

  // Keep every earlier tool round in the history so later iterations can see it
  conversation.push(response, ...toolResults);
  response = await llmWithTools.invoke(conversation);
}

This allows tool chaining. A single request like "search for the latest news on ERC-8004, read the top result, and remind me to review it tomorrow" executes as: web_search -> scrape_website -> set_reminder.

Current Tools

  1. web_search - Brave Search API for current information

  2. scrape_website - Fetch and parse web content

  3. get_current_datetime - Timezone-aware current time

  4. set_timezone - User timezone preferences

  5. set_reminder - Natural language reminders with notifications

  6. list_reminders / cancel_reminder - Reminder management

  7. propose_memory - Long-term fact storage (requires confirmation)

  8. list_memories / forget_memory - Memory management

  9. list_project_context / load_project_context - On-demand project docs

Project Context

Markdown files in data/context/ can be loaded on demand:

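import { readFile, stat } from "node:fs/promises";
import { join } from "node:path";

export async function loadContext(name: string): Promise<ProjectContext | null> {
  const filePath = join(CONTEXT_DIR, `${name}.md`);
  try {
    // Read the file contents and its modification time together
    const [content, info] = await Promise.all([readFile(filePath, "utf-8"), stat(filePath)]);
    return { name, content, lastModified: info.mtime };
  } catch { return null; } // missing or unreadable context file
}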

When I ask about project status or need background, Keystone loads the relevant context rather than having it bloat every request. Token-efficient and always current.
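
Since the context name can arrive as a tool-call argument chosen by the model, it may be worth validating it before it reaches the filesystem. The sketch below is one possible guard, not something the post describes. resolveContextPath is a hypothetical helper, and the allowed character set is an assumption.

```typescript
import { resolve, sep } from "node:path";

// Assumed location of the context files, relative to the working directory.
const CONTEXT_DIR = resolve("data/context");

// Returns an absolute path inside CONTEXT_DIR, or null for suspicious names.
export function resolveContextPath(name: string): string | null {
  // Allow only simple slugs: letters, digits, underscore, hyphen.
  if (!/^[A-Za-z0-9_-]+$/.test(name)) return null;

  const filePath = resolve(CONTEXT_DIR, `${name}.md`);

  // Defense in depth: the resolved path must still live under CONTEXT_DIR.
  return filePath.startsWith(CONTEXT_DIR + sep) ? filePath : null;
}
```

With this in place, a name like "../../etc/passwd" is rejected before any read is attempted, while "roadmap" resolves to data/context/roadmap.md.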

Personalization

Different team members get different interaction styles:

export const TEAM_MEMBERS: Record<string, TeamMember> = {
  "alice": {
    name: "Alice",
    role: "Engineer",
    promptModifier: `You are speaking with Alice, an engineer.
She prefers collaborative discussion, technical depth, and direct feedback.
Be a thinking partner, not a yes-machine.`
  },
  "bob": {
    name: "Bob",
    role: "Executive",
    promptModifier: `You are speaking with Bob, an executive.
He needs executive-ready answers. Be direct and strategic.
Skip implementation details unless asked.`
  }
};
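
How the modifier reaches the model is not shown. A plausible approach is to append it to the base system prompt when the user is recognized, as in this sketch. buildSystemPrompt and the lookup by lowercase key are assumptions, and the TeamMember shape is inferred from the object above.

```typescript
interface TeamMember {
  name: string;
  role: string;
  promptModifier: string;
}

// Hypothetical helper: combine the base prompt with a per-user modifier.
export function buildSystemPrompt(
  basePrompt: string,
  userKey: string | undefined,
  team: Record<string, TeamMember>,
): string {
  if (!userKey) return basePrompt;

  const key = userKey.toLowerCase();
  // Own-property check avoids matching inherited keys such as "constructor".
  if (!Object.prototype.hasOwnProperty.call(team, key)) return basePrompt;

  return `${basePrompt}\n\n${team[key].promptModifier}`;
}
```

Unknown users simply get the base prompt, so adding a team member is a data change and requires no new code path.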

Deployment

Fly.io with persistent volumes for the encrypted memory files:

[mounts]
  source = "keystone_data"
  destination = "/app/data"

A minimal HTTP server answers Fly.io health checks so the platform can tell the machine is alive:

const server = Bun.serve({
  port: 8080,
  fetch(req) {
    if (new URL(req.url).pathname === "/health") {
      return new Response("ok", { status: 200 });
    }
    return new Response("Keystone is running", { status: 200 });
  },
});

Future: Where This Is Heading

The current implementation works well for my needs. But there are clear expansion paths.

Signal Messenger Integration

Telegram is convenient but not end-to-end encrypted for bots. Signal would provide:

  • True E2E encryption for all messages

  • No metadata leakage to Telegram servers

  • Stronger privacy guarantees for sensitive conversations

Signal is more complex to integrate (there is no official bot API), but tools like signal-cli or the libsignal library make it possible. The architecture would remain similar, just with a different transport layer.

TEE and Advanced Privacy Models

For genuinely sensitive use cases, the LLM inference itself should be private. Two approaches I am exploring:

Trusted Execution Environments (TEE)

Running inference inside a TEE such as an Intel SGX enclave or an AMD SEV confidential VM. The model runs in hardware-isolated memory that even the host cannot inspect. This means:

  • Prompts never visible to the infrastructure provider

  • Model weights protected from extraction

  • Cryptographic attestation that the expected code is running

Kimi-K2.5:thinking via NanoGPT

Moonshot's Kimi-K2.5 is a strong reasoning model available through NanoGPT. Combined with the :thinking variant for chain-of-thought, this provides:

  • Excellent reasoning capabilities

  • Pay-per-use pricing (no subscriptions)

  • Easy integration via OpenAI-compatible API

  • Access to multiple model providers through one endpoint

Keystone already uses NanoGPT as its LLM backend, so switching to Kimi-K2.5:thinking is a config change:

NANOGPT_MODEL=kimi-k2.5:thinking

For true air-gapped scenarios, local inference with Ollama remains an option, but NanoGPT offers a good balance of privacy, cost, and capability.

Department Groupchats

Currently Keystone is 1:1. The next evolution is department-level agents:

#engineering - Keystone with technical context, code review capabilities
#operations - Keystone with project status, timeline tracking
#strategy - Keystone with market research, competitive analysis

Each groupchat would have:

  • Shared context specific to that department

  • Role-appropriate tool access

  • Cross-department memory where relevant

The challenge is context management. A message in #engineering should not automatically pollute #strategy context, but genuinely cross-cutting information should flow.
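
One way to frame that rule is to keep every note scoped to its originating department by default and to require an explicit opt-in before it is visible elsewhere. The sketch below only illustrates the idea. DepartmentContext, its methods, and the shareWith field are all hypothetical.

```typescript
type Department = "engineering" | "operations" | "strategy";

interface ContextNote {
  text: string;
  origin: Department;
  shareWith: Department[]; // departments that may also see this note
}

export class DepartmentContext {
  private notes: ContextNote[] = [];

  // Notes stay private to their origin unless shareWith lists other departments.
  add(origin: Department, text: string, shareWith: Department[] = []): void {
    this.notes.push({ text, origin, shareWith });
  }

  // Everything a department's agent is allowed to load into its prompt.
  visibleTo(department: Department): string[] {
    return this.notes
      .filter((n) => n.origin === department || n.shareWith.indexOf(department) !== -1)
      .map((n) => n.text);
  }
}
```

Under this model, a refactoring discussion in #engineering stays there, while a note such as a launch delay can be shared with #strategy deliberately.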

Raspberry Pi / On-Premises Hosting

Not everyone wants their agent in the cloud. The stack is lightweight enough to run on a Raspberry Pi 4 or 5:

  • Bun runs natively on ARM64

  • Memory footprint is under 200MB

  • No GPU required (inference happens via API)

  • SQLite could replace file-based storage for better concurrency

The setup would look like:

# On Raspberry Pi (Debian/Ubuntu)
curl -fsSL https://bun.sh/install | bash
git clone https://github.com/your-repo/keystone
cd keystone && bun install
cp .env.example .env  # Configure API keys
bun run start

For true air-gapped operation, pair with a local LLM like Ollama running Llama 3 or Mistral. Slower inference, but zero data leaves your network.

Benefits of on-prem hosting:

  • Data sovereignty - Conversations never leave your hardware

  • No cloud costs - One-time hardware investment

  • Network isolation - Can run on internal network only

  • Regulatory compliance - Useful for sensitive industries

The main tradeoff is reliability. Cloud hosting gives you automatic failover and monitoring. On-prem means you own the uptime.

ERC-8004: On-Chain Agent Identity

This is the most ambitious addition. ERC-8004 is an Ethereum standard for trustless agent identity, deployed on mainnet January 2026.

Why it matters:

When Keystone produces external-facing outputs (social media posts, published recommendations, client-facing interactions), there is currently no way to prove:

  1. The output came from the legitimate agent

  2. The agent has not been compromised

  3. The content was not modified in transit

ERC-8004 provides three registries:

  • Identity Registry - On-chain agent ID (ERC-721 based)

  • Reputation Registry - Feedback signals from interactions

  • Validation Registry - Independent verification hooks

Implementation plan:

I plan to deploy on Arbitrum for lower gas costs. Keystone would sign:

  • Social media posts (provenance)

  • Published technical recommendations (accountability)

  • Any output that leaves my infrastructure boundary

Internal drafts and brainstorming would remain unsigned. The overhead is not justified for ephemeral internal work.

The signature mechanism uses EIP-712 for typed data signing:

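// Assumes a viem-style account; ethers v6 instead takes (domain, types, value) as positional arguments.
const signature = await wallet.signTypedData({
  domain: { name: "Keystone", version: "1", chainId: 42161 }, // 42161 = Arbitrum One
  types: { Output: [{ name: "content", type: "string" }, { name: "timestamp", type: "uint256" }] },
  primaryType: "Output",
  message: { content: outputHash, timestamp: BigInt(Date.now()) } // timestamp in milliseconds
});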

Anyone can verify the signature against Keystone's registered on-chain identity.
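
The snippet above passes an outputHash without showing where it comes from. A minimal sketch of one option follows, assuming the hash is a hex-encoded SHA-256 digest of the output text and the timestamp is in milliseconds. buildOutputPayload is a hypothetical helper, and the choice of SHA-256 is an assumption (keccak256 would be the more common choice inside Ethereum tooling).

```typescript
import { createHash } from "node:crypto";

interface OutputPayload {
  content: string;   // 0x-prefixed hex digest of the output text
  timestamp: bigint; // milliseconds since the Unix epoch
}

// Hypothetical helper: build the message to be signed for a given output.
export function buildOutputPayload(output: string, now: Date = new Date()): OutputPayload {
  const digest = createHash("sha256").update(output, "utf8").digest("hex");
  return { content: `0x${digest}`, timestamp: BigInt(now.getTime()) };
}
```

Signing a digest instead of the full text keeps the signed message small, and a verifier can recompute the digest from the published output to confirm that nothing was altered.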

Dogfooding value:

If I am building infrastructure for autonomous systems, I should use it myself. Running ERC-8004 internally teaches:

  • Key management operational realities

  • Registry maintenance burden

  • Where signatures actually add value vs. friction


Lessons Learned

Start with boundaries, not features. Defining what Keystone will not do was more important than what it can do. This creates trust.

Memory changes everything. The jump from stateless to stateful AI is qualitative, not just quantitative. Keystone knowing my timezone, my projects, my preferences makes it feel like a colleague.

Tools should be boring. Web search, reminders, document loading. Nothing exotic. But reliable tools used well beat impressive demos that break.

LangChain is a mess, but useful. The 1.x migration broke everything. Documentation is scattered. But the core abstractions (tools, messages, structured output) are solid once you find the working patterns.

Encryption is table stakes. If you are storing user data, encrypt it. AES-256-GCM with proper key derivation is not hard. There is no excuse for plaintext storage.


Try It Yourself

The core patterns here are reusable:

  1. Define your agent's soul (identity, boundaries, communication style)

  2. Build encrypted per-user memory

  3. Implement tools as simple functions

  4. Use the manual tool-calling loop (more control than AgentExecutor)

  5. Add project context loading for domain awareness

  6. Deploy with persistent storage

The specific implementation is tailored to my needs, but the architecture generalizes.

Keystone is not trying to be AGI. It is trying to be a reliable, trustworthy, genuinely useful thinking partner. Sometimes the most valuable thing is an agent that knows when to stop.


Last updated: February 2026