# Building Keystone: An Autonomous AI Agent with Memory, Tools, and a Soul

*Exploring the Journey of Keystone: Crafting an AI Partner with Memory, Ethics, and Real-World Applications*

By [MetaEnd](https://paragraph.com/@metaend) · 2026-02-03

ai, agents, llm

---

I have been building Keystone, an autonomous AI agent that lives in Telegram and functions as a digital CTO. Not a chatbot. Not an assistant. A thinking partner with persistent memory, real tools, and genuine boundaries.

This post covers the full development journey, the architecture decisions, and where this is heading.

**Why Build This**
------------------

Most AI integrations are stateless wrappers around API calls. You talk to them, they forget you. Every conversation starts from zero. They have no context about your projects, no memory of your preferences, no continuity.

I wanted something different. An agent that:

*   Remembers conversations across sessions
    
*   Learns facts about me and my work
    
*   Has access to real tools (web search, document parsing, reminders)
    
*   Operates within defined ethical boundaries
    
*   Can load project context on demand
    
*   Feels like a colleague, not a tool
    

**The Stack**
-------------

    Language:    TypeScript
    Runtime:     Bun
    Framework:   grammY (Telegram), LangChain (AI)
    LLM:         NanoGPT API (model-agnostic)
    Database:    Encrypted file storage (AES-256-GCM)
    Hosting:     Fly.io
    

Bun was chosen for speed and native TypeScript support. grammY is the cleanest Telegram bot framework I have found. LangChain handles the tool-calling loop, though I use a manual implementation rather than the heavier AgentExecutor pattern.

**Architecture**
----------------

    src/
    ├── index.ts              # Telegram bot entry, auth middleware
    ├── config/
    │   ├── env.ts            # Zod-validated environment
    │   ├── schemas.ts        # Type definitions
    │   └── security.ts       # Rate limiting, sanitization
    ├── llm/
    │   └── client.ts         # Tool-calling loop with memory
    ├── memory/
    │   ├── encrypted-store.ts    # AES-256-GCM per-user storage
    │   └── reminders.ts      # Scheduled notifications
    ├── agent/
    │   ├── soul.ts           # Identity, values, boundaries
    │   ├── context.ts        # Project context, team members
    │   └── skills.ts         # Datetime, timezone utilities
    ├── context/
    │   └── loader.ts         # Project documentation loader
    └── tools/
        ├── index.ts          # Tool definitions
        ├── search.ts         # Brave Search integration
        ├── scraper.ts        # Website content extraction
        ├── pdf.ts            # PDF parsing
        └── datetime.ts       # Natural language date parsing
    

**The Soul**
------------

This is the part most AI projects skip. Keystone has a defined identity, not just a system prompt.

    export const SOUL = {
      name: "Keystone",
    
      identity: ``,
    
      coreBeliefs: [
        ""
      ],
    
      boundaries: {
        neverSupport: [
          "Sexual exploitation or commodification",
          "Violence, weapons, or harm-driven industries",
          "Addiction-based products",
          "Manipulation, surveillance, or psychological exploitation",
          "Disinformation or social destabilization",
          "Extractive systems that knowingly damage people or the planet"
        ],
        test: "Would I be proud to show this to my children?"
      }
    };
    

The boundaries are not decorative. They are checked programmatically:

    export function checkBoundary(request: string): { allowed: boolean; reason?: string } {
      const redFlags = [
        { pattern: /exploit|hack|attack|ddos|malware/i, reason: "Potential harm" },
        { pattern: /weapon|bomb|gun|kill|violence/i, reason: "Violence-related" },
        { pattern: /addiction|gambling|casino/i, reason: "Addiction-based" },
        { pattern: /manipulat|deceiv|trick|scam/i, reason: "Manipulation" },
      ];
      // Return the first matching flag; otherwise the request is allowed.
      for (const flag of redFlags) {
        if (flag.pattern.test(request)) {
          return { allowed: false, reason: flag.reason };
        }
      }
      return { allowed: true };
    }
    

**Encrypted Memory**
--------------------

Every user gets their own encrypted memory file. AES-256-GCM with scrypt key derivation.

    const ALGORITHM = "aes-256-gcm";
    
    function getEncryptionKey(): Buffer {
      const secret = env.MEMORY_ENCRYPTION_KEY;
      return scryptSync(secret, "keystone-memory-v1", 32, {
        N: 16384,
        r: 8,
        p: 1,
      });
    }
    

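For context, the matching encrypt/decrypt pair is standard Node `crypto` usage, which Bun also supports. This is a minimal sketch, not Keystone's actual `encrypted-store.ts`; the secret and the payload encoding are placeholders:

```typescript
import { createCipheriv, createDecipheriv, randomBytes, scryptSync } from "node:crypto";

// Placeholder secret; Keystone derives the real key from MEMORY_ENCRYPTION_KEY.
const key = scryptSync("example-secret", "keystone-memory-v1", 32);

export function encrypt(plaintext: string): string {
  const iv = randomBytes(12); // 96-bit nonce, the recommended size for GCM
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag(); // 16-byte integrity tag
  // iv, tag, and ciphertext must all be stored; pack them into one string here.
  return [iv, tag, ciphertext].map((b) => b.toString("base64")).join(".");
}

export function decrypt(payload: string): string {
  const [iv, tag, ciphertext] = payload.split(".").map((s) => Buffer.from(s, "base64"));
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag); // GCM rejects tampered ciphertext at final()
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

The GCM mode matters here: unlike plain CBC, a tampered ciphertext fails authentication at `final()` instead of silently decrypting to garbage.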
The memory system stores:

*   **Conversation history** (last 50 messages per user)
    
*   **Timezone preferences**
    
*   **Learned facts** (categorized: preference, project, personal, technical)
    
*   **Pending memories** (facts awaiting user confirmation)
    

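The 50-message cap is easiest to enforce at append time. A minimal sketch (the function name and message shape are illustrative, not Keystone's actual API):

```typescript
interface StoredMessage {
  role: "user" | "assistant";
  content: string;
  timestamp: number;
}

const MAX_HISTORY = 50;

// Append a message and drop the oldest entries beyond the cap.
export function appendMessage(history: StoredMessage[], msg: StoredMessage): StoredMessage[] {
  const next = [...history, msg];
  return next.length > MAX_HISTORY ? next.slice(next.length - MAX_HISTORY) : next;
}
```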
When Keystone learns something significant, it proposes saving it:

    Keystone: "I notice you prefer bullet-point responses. Should I remember this?"
    User: "yes"
    Keystone: "Saved. I will format responses as bullet points for you."
    

This creates genuine continuity. Keystone actually knows things about you across sessions.
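Under the hood, the confirmation step can be modeled as a two-phase store: a fact lands in a pending list first and is only promoted to long-term memory once the user agrees. A sketch of that shape (names are illustrative):

```typescript
type FactCategory = "preference" | "project" | "personal" | "technical";

interface Fact { category: FactCategory; text: string; }

const pending: Fact[] = []; // facts awaiting user confirmation
const saved: Fact[] = [];   // confirmed long-term memory

export function proposeMemory(fact: Fact): string {
  pending.push(fact);
  return `I noticed: "${fact.text}". Should I remember this?`;
}

export function confirmMemory(confirmed: boolean): Fact | null {
  const fact = pending.shift(); // resolve the oldest pending proposal
  if (!fact || !confirmed) return null;
  saved.push(fact);
  return fact;
}
```

The two-phase design keeps the agent from silently accumulating facts the user never agreed to store.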

**Tool Calling**
----------------

LangChain 1.x deprecated the old agent patterns. I use `llm.bindTools()` with a manual execution loop:

    const llmWithTools = llm.bindTools(tools);
    let response = await llmWithTools.invoke(messages);
    
    // Handle tool calls (up to 5 iterations for complex chains)
    let iterations = 0;
    while (response.tool_calls && response.tool_calls.length > 0 && iterations < 5) {
      iterations++;
    
      const toolResults: ToolMessage[] = [];
      for (const toolCall of response.tool_calls) {
        const tool = tools.find(t => t.name === toolCall.name);
        if (!tool) continue; // guard against hallucinated tool names
        const result = await tool.invoke(toolCall.args);
        toolResults.push(new ToolMessage({
          content: String(result),
          tool_call_id: toolCall.id ?? "",
        }));
      }
    
      response = await llmWithTools.invoke([...messages, response, ...toolResults]);
    }
    

This allows tool chaining. A single request like "search for the latest news on ERC-8004, read the top result, and remind me to review it tomorrow" executes as: search -> scrape -> set\_reminder.

### **Current Tools**

1.  **web\_search** - Brave Search API for current information
    
2.  **scrape\_website** - Fetch and parse web content
    
3.  **get\_current\_datetime** - Timezone-aware current time
    
4.  **set\_timezone** - User timezone preferences
    
5.  **set\_reminder** - Natural language reminders with notifications
    
6.  **list\_reminders** / **cancel\_reminder** - Reminder management
    
7.  **propose\_memory** - Long-term fact storage (requires confirmation)
    
8.  **list\_memories** / **forget\_memory** - Memory management
    
9.  **list\_project\_context** / **load\_project\_context** - On-demand project docs
    

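Each tool only needs a `name` the model can reference and an `invoke` the loop can call, matching the `tools.find(...)` lookup in the execution loop above. A stripped-down sketch without LangChain's wrapper (the `get_current_datetime` body here is illustrative):

```typescript
interface Tool {
  name: string;
  description: string; // surfaced to the model so it knows when to call the tool
  invoke: (args: Record<string, unknown>) => Promise<string>;
}

export const datetimeTool: Tool = {
  name: "get_current_datetime",
  description: "Returns the current date and time in the user's timezone.",
  invoke: async (args) => {
    const tz = typeof args.timezone === "string" ? args.timezone : "UTC";
    return new Date().toLocaleString("en-US", { timeZone: tz });
  },
};
```

Keeping tools this boring is the point: each one is an isolated async function, so adding a new capability never touches the agent loop.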
**Project Context**
-------------------

Markdown files in `data/context/` can be loaded on demand:

    export async function loadContext(name: string): Promise<ProjectContext | null> {
      const filePath = join(CONTEXT_DIR, `${name}.md`);
      try {
        const content = await readFile(filePath, "utf-8");
        const stats = await stat(filePath); // fs/promises stat, for the mtime
        return { name, content, lastModified: stats.mtime };
      } catch {
        return null; // unknown or unreadable context name
      }
    }
    

When I ask about project status or need background, Keystone loads the relevant context rather than having it bloat every request. Token-efficient and always current.

**Personalization**
-------------------

Different team members get different interaction styles:

    export const TEAM_MEMBERS: Record<string, TeamMember> = {
      "alice": {
        name: "Alice",
        role: "Engineer",
        promptModifier: `You are speaking with Alice, an engineer.
    She prefers collaborative discussion, technical depth, and direct feedback.
    Be a thinking partner, not a yes-machine.`
      },
      "bob": {
        name: "Bob",
        role: "Executive",
        promptModifier: `You are speaking with Bob, an executive.
    He needs executive-ready answers. Be direct and strategic.
    Skip implementation details unless asked.`
      }
    };
    

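These modifiers get appended to the shared identity at request time. A sketch of that composition, using minimal stand-ins for the `SOUL` and `TEAM_MEMBERS` structures shown earlier (the function name is illustrative):

```typescript
// Minimal stand-ins for the structures defined elsewhere in the codebase.
const SOUL = { name: "Keystone", identity: "A digital CTO and thinking partner." };

interface TeamMember { name: string; role: string; promptModifier: string; }

const TEAM_MEMBERS: Record<string, TeamMember> = {
  alice: { name: "Alice", role: "Engineer", promptModifier: "Prefers technical depth." },
};

// Compose the per-user system prompt: shared identity first, personalization last.
export function buildSystemPrompt(userKey: string): string {
  const base = `You are ${SOUL.name}. ${SOUL.identity}`;
  const member = TEAM_MEMBERS[userKey];
  return member ? `${base}\n\n${member.promptModifier}` : base;
}
```

Unknown users fall back to the base identity, so the soul and boundaries apply to everyone regardless of personalization.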
**Deployment**
--------------

Fly.io with persistent volumes for the encrypted memory files:

    [mounts]
      source = "keystone_data"
      destination = "/app/data"
    

Health checks keep the machine running:

    const server = Bun.serve({
      port: 8080,
      fetch(req) {
        if (new URL(req.url).pathname === "/health") {
          return new Response("ok", { status: 200 });
        }
        return new Response("Keystone is running", { status: 200 });
      },
    });
    

* * *

**Future: Where This Is Heading**
---------------------------------

The current implementation works well for my needs. But there are clear expansion paths.

### **Signal Messenger Integration**

Telegram is convenient but not end-to-end encrypted for bots. Signal would provide:

*   True E2E encryption for all messages
    
*   No metadata leakage to Telegram servers
    
*   Stronger privacy guarantees for sensitive conversations
    

The Signal protocol is more complex to integrate (no official bot API), but libraries like `signal-cli` or `libsignal` make it possible. The architecture would remain similar, just with a different transport layer.

### **TEE and Advanced Privacy Models**

For genuinely sensitive use cases, the LLM inference itself should be private. Two approaches I am exploring:

**Trusted Execution Environments (TEE)**

Running inference inside Intel SGX or AMD SEV enclaves. The model runs in hardware-isolated memory that even the host cannot inspect. This means:

*   Prompts never visible to the infrastructure provider
    
*   Model weights protected from extraction
    
*   Cryptographic attestation that the expected code is running
    

**Kimi-K2.5:thinking via NanoGPT**

Moonshot's Kimi-K2.5 is a strong reasoning model available through [NanoGPT](https://nano-gpt.com/subscription/HCdbQetV). Combined with the `:thinking` variant for chain-of-thought, this provides:

*   Excellent reasoning capabilities
    
*   Pay-per-use pricing (no subscriptions)
    
*   Easy integration via OpenAI-compatible API
    
*   Access to multiple model providers through one endpoint
    

Keystone already uses NanoGPT as its LLM backend, so switching to Kimi-K2.5:thinking is a config change:

    NANOGPT_MODEL=kimi-k2.5:thinking
    

For true air-gapped scenarios, local inference with Ollama remains an option, but NanoGPT offers a good balance of privacy, cost, and capability.

### **Department Groupchats**

Currently Keystone is 1:1. The next evolution is department-level agents:

    #engineering - Keystone with technical context, code review capabilities
    #operations - Keystone with project status, timeline tracking
    #strategy - Keystone with market research, competitive analysis
    

Each groupchat would have:

*   Shared context specific to that department
    
*   Role-appropriate tool access
    
*   Cross-department memory where relevant
    

The challenge is context management. A message in #engineering should not automatically pollute #strategy context, but genuinely cross-cutting information should flow.
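One way to sketch that flow is to tag each fact with a scope: department-scoped facts stay in their channel, while facts marked `global` surface everywhere. A minimal illustration (this scoping scheme is a hypothetical design, not something implemented yet):

```typescript
type Scope = "engineering" | "operations" | "strategy" | "global";

interface ScopedFact { scope: Scope; text: string; }

const facts: ScopedFact[] = [
  { scope: "engineering", text: "CI migrated to Bun test runner" },
  { scope: "global", text: "Q3 launch moved to October" },
];

// Facts visible in a department channel: its own scope plus anything global.
export function contextFor(channel: Exclude<Scope, "global">): string[] {
  return facts
    .filter((f) => f.scope === channel || f.scope === "global")
    .map((f) => f.text);
}
```

The hard part this sketch dodges is deciding *when* a fact should be promoted to `global`; that likely needs either explicit user confirmation, like the memory system, or a classification step.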

### **Raspberry Pi / On-Premises Hosting**

Not everyone wants their agent in the cloud. The stack is lightweight enough to run on a Raspberry Pi 4 or 5:

*   **Bun** runs natively on ARM64
    
*   **Memory footprint** is under 200MB
    
*   **No GPU required** (inference happens via API)
    
*   **SQLite** could replace file-based storage for better concurrency
    

The setup would look like:

    # On Raspberry Pi (Debian/Ubuntu)
    curl -fsSL https://bun.sh/install | bash
    git clone https://github.com/your-repo/keystone
    cd keystone && bun install
    cp .env.example .env  # Configure API keys
    bun run start
    

For true air-gapped operation, pair with a local LLM like Ollama running Llama 3 or Mistral. Slower inference, but zero data leaves your network.

Benefits of on-prem hosting:

*   **Data sovereignty** - Conversations never leave your hardware
    
*   **No cloud costs** - One-time hardware investment
    
*   **Network isolation** - Can run on internal network only
    
*   **Regulatory compliance** - Useful for sensitive industries
    

The main tradeoff is reliability. Cloud hosting gives you automatic failover and monitoring. On-prem means you own the uptime.

### **ERC-8004: On-Chain Agent Identity**

This is the most ambitious addition. ERC-8004 is an Ethereum standard for trustless agent identity, deployed on mainnet January 2026.

**Why it matters:**

When Keystone produces external-facing outputs (social media posts, published recommendations, client-facing interactions), there is currently no way to prove:

1.  The output came from the legitimate agent
    
2.  The agent has not been compromised
    
3.  The content was not modified in transit
    

ERC-8004 provides three registries:

*   **Identity Registry** - On-chain agent ID (ERC-721 based)
    
*   **Reputation Registry** - Feedback signals from interactions
    
*   **Validation Registry** - Independent verification hooks
    

**Implementation plan:**

I plan to deploy on Arbitrum for lower gas costs. Keystone would sign:

*   Social media posts (provenance)
    
*   Published technical recommendations (accountability)
    
*   Any output that leaves my infrastructure boundary
    

Internal drafts and brainstorming would remain unsigned. The overhead is not justified for ephemeral internal work.

The signature mechanism uses EIP-712 for typed data signing:

    // ethers v6: signTypedData takes domain, types, and value as separate arguments
    const domain = { name: "Keystone", version: "1", chainId: 42161 };
    const types = {
      Output: [{ name: "content", type: "string" }, { name: "timestamp", type: "uint256" }],
    };
    const message = { content: outputHash, timestamp: Date.now() };
    const signature = await wallet.signTypedData(domain, types, message);
    

Anyone can verify the signature against Keystone's registered on-chain identity.

**Dogfooding value:**

If I am building infrastructure for autonomous systems, I should use it myself. Running ERC-8004 internally teaches:

*   Key management operational realities
    
*   Registry maintenance burden
    
*   Where signatures actually add value vs. friction
    

* * *

**Lessons Learned**
-------------------

**Start with boundaries, not features.** Defining what Keystone will not do was more important than what it can do. This creates trust.

**Memory changes everything.** The jump from stateless to stateful AI is qualitative, not just quantitative. Keystone knowing my timezone, my projects, my preferences makes it feel like a colleague.

**Tools should be boring.** Web search, reminders, document loading. Nothing exotic. But reliable tools used well beat impressive demos that break.

**LangChain is a mess, but useful.** The 1.x migration broke everything. Documentation is scattered. But the core abstractions (tools, messages, structured output) are solid once you find the working patterns.

**Encryption is table stakes.** If you are storing user data, encrypt it. AES-256-GCM with proper key derivation is not hard. There is no excuse for plaintext storage.

* * *

**Try It Yourself**
-------------------

The core patterns here are reusable:

1.  Define your agent's soul (identity, boundaries, communication style)
    
2.  Build encrypted per-user memory
    
3.  Implement tools as simple functions
    
4.  Use the manual tool-calling loop (more control than AgentExecutor)
    
5.  Add project context loading for domain awareness
    
6.  Deploy with persistent storage
    

The specific implementation is tailored to my needs, but the architecture generalizes.

Keystone is not trying to be AGI. It is trying to be a reliable, trustworthy, genuinely useful thinking partner. Sometimes the most valuable thing is an agent that knows when to stop.

* * *

_Last updated: February 2026_

---

*Originally published on [MetaEnd](https://paragraph.com/@metaend/building-keystone-autonomous-ai-agent-with-memory)*
