This piece discusses how one might scale LLMs as cognitive tools, using the current state of the art, as a basis for working towards human or agent superintelligence. The discussion uses traditional Internet networking protocol standards as a broad reference for framing the popular rise of large language models (LLMs).
A communication channel based on natural language
LLMs are a different kind of machine. They are characterized by a computational architecture endowed with a communication standard that relies on natural language: they take user prompts as input, tokenize words and sentences, apply learned weights and perform next-token prediction. The communication model is different from the one we’re used to, in that it is no longer solely reliant on the manipulation and transport of data as bits, but also on the manipulation and communication of data through natural language. In this way, there is an additional layer of computational concern, where words as prompts and tokens are explicitly brought to the forefront as a type of data that needs to be processed, transported and networked. The unit of measure for data transport is not solely bits, but words conveying meaning, manifested as tokens and weights which are in turn interpreted as bits. The density of ideas, concepts and causal relationships behind the tokens and weights, as well as the speed with which they can be communicated between two or more entities, be it two humans or two AI agents, is a new specification of interest. These specifications are dictated by how much meaning and how many causal relationships are contained in sentences, but also by how words as tokens are handled and processed over a communication channel, a layer which we can now confidently consider to be based on natural language.
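As a rough illustration of this processing chain, the minimal sketch below shows a prompt being tokenized and a single next token being predicted. It assumes the Hugging Face transformers library and the small open gpt2 checkpoint purely for convenience; any causal language model would do.

```python
# Minimal sketch of the prompt -> tokens -> next-token prediction chain.
# Assumes the Hugging Face `transformers` library and the small open `gpt2` checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Natural language is becoming a new transport layer because"
inputs = tokenizer(prompt, return_tensors="pt")   # words and sentences -> token ids

with torch.no_grad():
    logits = model(**inputs).logits               # learned weights applied to the token sequence

next_token_id = int(logits[0, -1].argmax())       # greedy next-token prediction
print(tokenizer.decode([next_token_id]))          # the predicted continuation, back as text
```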
The traditional Internet network model
Traditional networking models are defined by a set of communication layer protocols which serve to define how data is received, exchanged, transported and networked across multiple devices over the Internet. For example, the TCP/IP model is composed of a suite of protocols organized into the application, transport, internet and link (network access) layers. These protocols define how data is handled from the moment a user sends information from their device, and in reverse order when data is received back. The data exchanged consists of packets of bits.
Let’s focus on the two layers which may be most applicable to current LLMs.
“The Application layer defines the procedure of interaction between the host applications and the protocols.”
“The Transport layer is responsible for error-free, end-to-end delivery of data from the source host to the destination host.”
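To make the analogy concrete, here is a toy sketch in which the natural-language prompt is the application-layer payload and TCP, via an ordinary socket, plays the transport-layer role of end-to-end delivery. The host and port are hypothetical placeholders, not a real service.

```python
# Toy sketch: the prompt is the application-layer payload; the TCP socket is the
# transport layer responsible for end-to-end delivery. Host and port are hypothetical.
import socket

PROMPT = "Summarize the TCP/IP model in one sentence."

with socket.create_connection(("llm-gateway.example", 8080)) as sock:  # transport layer: TCP connection
    sock.sendall(PROMPT.encode("utf-8"))        # application layer: natural language as bytes
    reply = sock.recv(4096).decode("utf-8")     # end-to-end delivery back to the source host
    print(reply)
```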
Speed of communication
In the context of an exchange of ideas through natural language, communication relies on a mode of data transport, or a channel. For humans, that mode of transport is speaking, writing or typing. Transport is typically specified by a rate or a speed. Since there is an obvious limit to how fast humans can communicate by typing, there is an inherent efficiency mismatch between humans as end users and the LLMs they are interacting with as cognitive tools performing the rote task of weight processing and token prediction.
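To put rough numbers on that mismatch, the back-of-envelope sketch below compares human typing throughput with a model’s token generation rate. The rates are illustrative assumptions, not measurements.

```python
# Back-of-envelope comparison of human typing vs. LLM token generation.
# All rates below are illustrative assumptions, not benchmarks.
TYPING_WPM = 40              # assumed typing speed, words per minute
TOKENS_PER_WORD = 1.3        # rough conversion factor for English text
LLM_TOKENS_PER_SEC = 50      # assumed generation rate for a served model

human_tokens_per_sec = TYPING_WPM * TOKENS_PER_WORD / 60
print(f"Human input: {human_tokens_per_sec:.2f} tokens/s")
print(f"LLM output:  {LLM_TOKENS_PER_SEC} tokens/s")
print(f"Mismatch:    roughly {LLM_TOKENS_PER_SEC / human_tokens_per_sec:.0f}x")
```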
Superintelligence through coordination and groups
Collections of humans organized into groups and institutions provide many historical examples of the creation and attempted control of intelligences that routinely outperform individual humans. [1]
The concept of a superintelligence could be considered in terms of quality of intelligence, which is the subject of research in fields such as cognitive architectures, but also in terms of magnitude, power, size and scale. The quotation above signals a known way of achieving higher intellectual capacity: the forming of groups and cooperative coordination. Coordination, in turn, requires organized and efficient communication.
As it pertains to the LLM application layer, the speed with which ideas are formed, and with which concepts as prompts and natural language responses can be produced and communicated between two similar entities, becomes important, since it dictates the overall speed of a cognitive machine operating as a network. A network of brains making up a larger, coordinated cognitive machine is a problem of scaling, and scaling is achieved largely through enhanced communication efficiency, collaboration, coordination and coherent organization. The idea of two humans using LLMs and communicating their results with each other may be important to achieving enhanced human intelligence. In other words, to achieve human superintelligence enabled by cognitive tools, one needs to consider how humans can efficiently coordinate and communicate over this LLM application layer. Equally, the idea of two AI agents performing the same task of weight processing and token prediction together and exchanging their results for further processing is important to the idea of artificial superintelligence.
The concept of superintelligence, or enhanced intelligence through the formation of groups, is not new and is referenced in [2].
For an AI agent, the LLM communication layer is significantly more efficient than it is for humans communicating via the LLM application layer, since the generation of concepts involves processing prompt sentences as tokens, processing weights and then performing token prediction as a means to form coherent causal relationships within the limits of context lengths. Arguably, communication between two agents or instances is rather trivial in this sense, because it is a matter of processing and simply exchanging weights. Humans, on the other hand, rely on speech or typing on a keyboard to exchange ideas. This has been discussed by Geoff Hinton in this interview in 2023. In other words, the generation of coherently formed ideas by humans may currently be considered slower than what is possible for AI, if we consider that idea generation is performed in rote fashion, as weight processing and token prediction tasks.
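A schematic sketch of that agent-to-agent exchange is shown below, where each agent’s output becomes the other’s input. The generate function is a hypothetical stand-in for whatever runs the prompt, weights and token prediction, for instance a local model or an API call.

```python
# Schematic sketch of two agents exchanging results over the LLM application layer.
# `generate` is a hypothetical placeholder for a real model or API call.
def generate(agent_name: str, prompt: str) -> str:
    # In practice this would run tokenization, weight processing and token prediction.
    return f"[{agent_name}'s continuation of: {prompt[:40]}...]"

message = "Propose a plan for summarizing a large corpus."
for turn in range(4):
    speaker = "agent_a" if turn % 2 == 0 else "agent_b"
    message = generate(speaker, message)   # each output becomes the other agent's next input
    print(f"{speaker}: {message}")
```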
Matching efficiency for enhanced cognitive capabilities
There is therefore a need to speed up the way an intellectual task is generated and executed at the application layer at a point of origin and finalized at another. The overall process of 1) generating ideas, 2) communicating them and 3) receiving them at a second point over this newly adapted model needs to be matched in speed at all three stages to increase overall performance and achieve cognitive efficiency. If the LLM is fast at processing weights and predicting tokens to generate concepts, the generation of concepts or intellectual tasks by the end user equally needs to be fast. Further, the communication of these between two entities needs to be sped up so that the overall usage of LLMs as cognitive tools by the end user is enhanced and maximized.
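Seen this way, the end-to-end rate is bounded by the slowest of the three stages, as the small sketch below illustrates. The stage rates are illustrative assumptions only.

```python
# The end-to-end rate of generation, communication and reception is bounded by
# the slowest stage. The figures below are illustrative assumptions.
stage_rates_tokens_per_sec = {
    "generation": 50.0,      # model producing tokens
    "communication": 0.9,    # human typing or relaying the result
    "reception": 20.0,       # second party reading or ingesting it
}

bottleneck = min(stage_rates_tokens_per_sec, key=stage_rates_tokens_per_sec.get)
print(f"Effective throughput: {stage_rates_tokens_per_sec[bottleneck]} tokens/s, "
      f"limited by the '{bottleneck}' stage")
```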
Scaling and increasing the speed of execution at all stages helps both humans and LLM agents assimilate with the LLM framework and the philosophy behind LLMs as cognitive machines, and allows them to work more efficiently as a more generalized, coherent, coordinated entity.
A product which appears to be already addressing this issue is Replit, which has a one-click feature that codes, ships and deploys apps. You can watch the CEO, Amjad Masad, talk about this here.
It’s clear that a single click for all the actions needed to code, ship and deploy an app significantly reduces the time for performing that specific intellectual task. I believe applications and products such as these will become increasingly important over time, as they deepen the assimilation, if you will, of the philosophy behind LLMs and serve to match human capabilities more closely with current state-of-the-art LLMs.
