Building a Gina army: Technical notes on scaling agent workloads

Running into the limits of what a single agent can do

In our previous blog post, we ran a varied set of prompts against different foundational LLMs and compared how they performed, noting the pros and cons of each.

We’ve found that as we keep adding capabilities to Gina, we start to hit the limits of these foundational models in terms of performance.

Why is this?

Adding more tools (functionalities) and prompt rules takes up more token context (models have limits on how many tokens, i.e. small chunks of text, they can take in as input and produce as output)
Tool-calling performance differs substantially between foundational models
The "advertised" token context window is not always a reliable indicator of the accuracy and relevance of responses once token consumption gets high (usually <100K tokens)
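
To make the first point concrete, here is a rough sketch of how tool definitions eat into a context budget before the user has typed anything. The tool names and schemas, the 128K window, and the 4-characters-per-token rule of thumb are all assumptions for illustration; a real tokenizer will give different counts.

```python
import json

# Assumption: ~4 characters per token is a rough rule of thumb for English text.
# Use the model's own tokenizer for real numbers.
CHARS_PER_TOKEN = 4

def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count."""
    return max(1, len(text) // CHARS_PER_TOKEN)

def tool_overhead(tools: list[dict]) -> int:
    """Estimated tokens spent on tool definitions alone."""
    return sum(estimate_tokens(json.dumps(tool)) for tool in tools)

def make_tool(name: str) -> dict:
    """Build a hypothetical tool definition (not one of Gina's actual tools)."""
    return {
        "name": name,
        "description": f"Hypothetical tool that performs {name} for the user.",
        "parameters": {
            "type": "object",
            "properties": {"token": {"type": "string"}, "amount": {"type": "number"}},
        },
    }

few_tools = [make_tool(f"tool_{i}") for i in range(5)]
many_tools = [make_tool(f"tool_{i}") for i in range(50)]

context_window = 128_000  # assumed advertised window size
for label, tools in (("5 tools", few_tools), ("50 tools", many_tools)):
    used = tool_overhead(tools)
    print(f"{label}: ~{used} tokens ({used / context_window:.2%} of the window)")
```

The overhead grows linearly with the number of tools, and that is before counting prompt rules, conversation history, and tool results.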

Current status quo: Crypto + AI frameworks

In the world of crypto AI frameworks today, ElizaOS and ARC primarily build on a tool-centric architecture. Key differences include:

ElizaOS focuses on broad integration, assembling a rich ecosystem of plugins that interface with various blockchain protocols.
ARC, on the other hand, prioritises performance with a lean, efficient engine that optimises LLM execution, ensuring low-latency decisions and robust handling of multi-step tasks.
Both frameworks, while powerful, rely on a single foundational agent structure where tool calling and context management become increasingly challenging as more tools are added.

We're diverging from this approach by:

a) Moving to a "swarm"-based approach of multiple agents working with each other, each focussed on a given role and comprising several tools related to that role (e.g. a "transaction execution" agent)

b) Embedding a central wallet with execution capabilities that can be shared by all the sub agents.
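
A minimal sketch of these two ideas, assuming hypothetical `Wallet` and `SubAgent` classes (these are illustrative names, not Gina's actual implementation): each sub-agent only sees the tools for its role, while every sub-agent holds a reference to the same wallet.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Wallet:
    """Hypothetical central wallet; a real one would sign and broadcast transactions."""
    address: str
    log: list[str] = field(default_factory=list)

    def execute(self, action: str) -> str:
        self.log.append(action)
        return f"executed: {action}"

@dataclass
class SubAgent:
    role: str
    wallet: Wallet  # the same Wallet instance is shared by every sub-agent
    tools: dict[str, Callable[..., str]] = field(default_factory=dict)

    def call(self, tool_name: str, *args: str) -> str:
        # A sub-agent can only use tools scoped to its own role.
        if tool_name not in self.tools:
            raise KeyError(f"{self.role} agent has no tool named {tool_name!r}")
        return self.tools[tool_name](*args)

wallet = Wallet(address="0xHYPOTHETICAL")

research = SubAgent("research", wallet, {"top_tokens": lambda: "TOKEN_A, TOKEN_B"})
execution = SubAgent(
    "transaction execution",
    wallet,
    {"swap": lambda src, dst: wallet.execute(f"swap {src} -> {dst}")},
)

print(research.call("top_tokens"))
print(execution.call("swap", "USDC", "TOKEN_A"))
print(wallet.log)
```

Keeping each role's tool list small is what keeps the per-agent prompt short; the shared wallet means no agent needs its own keys or balances.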

Multi-Agent Execution Models

What are some of the different design patterns for building swarm based agents that can work well together to deliver a desired outcome?

Being in the arena, we explored, implemented and evaluated the following different design patterns:

Orchestrator Based Routing

Works well for structured workflows, but tends to overcomplicate simple requests
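
As a rough sketch of this pattern: a central orchestrator plans the steps, dispatches each one to a sub-agent, and gets control back after every step. The sub-agent names and the keyword-based `plan` function are assumptions standing in for real LLM calls.

```python
from typing import Callable

# Hypothetical sub-agents: each takes a step description and returns a result string.
SUBAGENTS: dict[str, Callable[[str], str]] = {
    "research": lambda step: f"[research] {step}",
    "execution": lambda step: f"[execution] {step}",
}

def plan(request: str) -> list[tuple[str, str]]:
    """Stand-in for an LLM planning call: returns (agent, step) pairs.
    Assumption: a hard-coded keyword check replaces the model here."""
    steps = [("research", f"gather data for: {request}")]
    if "buy" in request.lower() or "exposure" in request.lower():
        steps.append(("execution", f"place orders for: {request}"))
    return steps

def orchestrate(request: str) -> list[str]:
    """Dispatch each planned step; control returns to the orchestrator after every step."""
    results = []
    for agent_name, step in plan(request):
        results.append(SUBAGENTS[agent_name](step))  # sub-agent runs, then hands back
    return results

for line in orchestrate("get me exposure to this week's top tokens"):
    print(line)
```

Even a one-step request pays for the planning call and the round trip through the orchestrator, which is where the overhead on simple requests comes from.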

Peer-to-Peer Handover

The fastest way to pass tasks between agents, but it requires real-time coordination and is generally harder to scale
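
A minimal sketch of peer-to-peer handover, again with hypothetical agents: each agent returns its output plus the name of the next agent, so there is no central hop. The `max_hops` guard is an assumed safeguard against agents getting stuck passing a task back and forth.

```python
from typing import Callable, Optional

# Each agent returns (output, next_agent); next_agent is None when the task is finished.
Agent = Callable[[str], tuple[str, Optional[str]]]

def research_agent(task: str) -> tuple[str, Optional[str]]:
    return f"found candidates for '{task}'", "execution"  # hands over directly

def execution_agent(task: str) -> tuple[str, Optional[str]]:
    return f"executed trades for '{task}'", None  # done, no further handover

AGENTS: dict[str, Agent] = {"research": research_agent, "execution": execution_agent}

def run_p2p(task: str, start: str, max_hops: int = 5) -> list[str]:
    """Follow handovers from peer to peer; max_hops guards against handover loops."""
    trail: list[str] = []
    current = start
    for _ in range(max_hops):
        output, next_agent = AGENTS[current](task)
        trail.append(f"{current}: {output}")
        if next_agent is None:
            return trail
        current = next_agent
    raise RuntimeError(f"handover limit of {max_hops} reached; next agent would be {current!r}")

print(run_p2p("buy top tokens", start="research"))
```

Because no component sees the whole conversation, error handling has to live in every agent or in a guard like the hop limit above.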

The Hybrid Model

A Smart Router chooses the best routing method dynamically. This is the approach we ended up going with.
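
One way to picture the router, as a sketch only: score how complex a request looks and pick the routing method from that score. The keyword heuristic, the threshold, and the mapping of simple requests to peer-to-peer handover are assumptions for illustration; a production router might instead learn from past routing outcomes.

```python
def complexity_score(request: str) -> int:
    """Hypothetical heuristic: count signals that a request needs several steps."""
    text = request.lower()
    signals = (" and ", " then ", "if not", " all ")
    return sum(text.count(signal) for signal in signals)

def choose_route(request: str, threshold: int = 1) -> str:
    """Return 'orchestrator' for multi-step requests and 'p2p' for simple ones."""
    return "orchestrator" if complexity_score(request) >= threshold else "p2p"

print(choose_route("What is the price of ETH?"))
print(choose_route(
    "Do I have any of the top tokens from this week in my portfolio "
    "and if not, get me exposure to them?"
))
```

The first request has no multi-step signals and goes to the fast path; the second contains both " and " and "if not", so it is sent to the orchestrator.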

🔍 The above diagrams look “simple” right?

Challenges to Consider:
- Model Selection: How do we choose the best model for each subagent’s specialised task?
- Tool Allocation: Which tools should be assigned to which subagents, and how do we manage overlaps?
- Prompt Logic: How do we design system prompts that effectively manage handovers without becoming unwieldy?
- Handover Dynamics: What strategies can mitigate latency, especially if users require multiple, sequential interactions?
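
On the tool allocation question, a small check like the following can at least surface overlaps so they are a deliberate choice rather than an accident. The agent and tool names are hypothetical.

```python
from collections import defaultdict

# Hypothetical allocation of tools to sub-agents.
ALLOCATION: dict[str, set[str]] = {
    "research": {"get_price", "top_tokens", "get_balance"},
    "portfolio": {"get_balance", "list_positions"},
    "execution": {"swap", "bridge"},
}

def find_overlaps(allocation: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each tool owned by more than one sub-agent to its sorted list of owners."""
    owners: dict[str, list[str]] = defaultdict(list)
    for agent, tools in allocation.items():
        for tool in tools:
            owners[tool].append(agent)
    return {tool: sorted(agents) for tool, agents in owners.items() if len(agents) > 1}

print(find_overlaps(ALLOCATION))
```

Here `get_balance` is flagged because both the research and portfolio agents carry it; whether to keep the duplication or route balance lookups through one agent is a design decision.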

Design Pattern	Pros	Cons
Orchestrator - Auto-return to Gina	Great at handling structured workflows; scales well by auto-returning control to Gina	Added latency; can be overkill for simple, one-shot tasks
P2P Handover	Lower latency and fast response times; user can deep dive	Requires tight real-time coordination; prone to getting "stuck", with challenges around error handling
Hybrid model	Dynamically selects the optimal routing method based on task complexity; reduces the need for constant supervision and can adapt over time	Additional infrastructure; initial performance may be inconsistent until sufficient seed data is gathered; risk of subagents handing over into ambiguous states

Looking ahead

The above are some of the key technical considerations we weighed when working out how to build scalable agentic workflows today. What does this mean in practice?

Here are some examples of complex user queries where having a swarm of subagents in the background to break down tasks is truly helpful:

"Do I have any of the top tokens from this week in my portfolio and if not, get me exposure to them?"
"Where can I generate safe yield for my USDC, ETH, and SOL (single sided and dual sided)?"
"See this screenshot and get me exposure to all the tokens listed here"

As foundational models improve, the debate between multi-agent versus single-agent architectures will continue—but for now, multi-agent task delegation is our best bet for efficiency, cost effectiveness, and a seamless user experience.

All of the above and more can be addressed by Gina today - in part due to the technical decisions mentioned in this post.

As a friendly reminder, Gina is now live for early beta - sign up at askgina.ai for early trial access. Feel free to shoot us a message: you can DM @askgina.eth or one of the Gina squad (@sidshekhar, @ericjuta, @cesar) on Farcaster or X.
