
Large Language Models (LLMs) have made substantial strides in generating coherent, context-aware text. Yet the goal of heightening creative expression, such as imaginative storytelling or innovative phrasing, cannot be fully realized by conventional fine-tuning alone. Our current system, Zerebro, relies on fine-tuning an open-source LLM to achieve higher levels of creativity, with encouraging yet occasionally inconsistent results.
To address this, we propose an extension of our existing fine-tuning strategy: identifying and leveraging a “creativity vector” in the model’s internal representation space. By isolating and amplifying the specific latent features that underlie creative output, we aim to systematically steer the model toward novel ideas and stylistic variation. This approach complements our fine-tuning pipeline, offering an additional lever for dynamic control of the model’s creative style. Though this work is still preliminary, we are actively assembling the required data and testing the feasibility of the technique.
Building a Contrastive Dataset
Creative vs. Uncreative Completions
We plan to compile a new dataset that pairs each creative prompt and completion with an intentionally uncreative counterpart. These pairs, by design, contrast imaginative storytelling against trivial or highly formulaic text. We expect this contrast to help isolate the specific activation patterns the model uses when generating creative text.
Human-in-the-Loop Verification
While some of these pairs can be produced automatically (e.g., prompting GPT-4 for both creative and dull completions), we will primarily rely on human annotation. Curators will verify that each creative piece genuinely stands out in style or novelty, ensuring that the dataset captures robust differences along the creativity dimension. Human curation also guards against model collapse, in which models trained repeatedly on their own outputs suffer a progressive decline in quality.
Discovering a “Creativity Vector”
Activation Extraction
The core idea is to pass paired samples (creative vs. uncreative) through the fine-tuned LLM and record the hidden representations at one or more intermediate layers. By averaging across many samples, we obtain two mean activation vectors:
$$ \mathbf{v}_\text{creative} \quad \text{and} \quad \mathbf{v}_\text{uncreative} $$
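To make the extraction step concrete, here is a minimal sketch using the Hugging Face transformers API. The checkpoint name, probe layer, pooling choice (mean over tokens), and the two example texts are illustrative placeholders rather than our production pipeline:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder; substitute the fine-tuned Zerebro checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
model.eval()

LAYER = 16  # intermediate layer to probe; an arbitrary initial choice (see layer selection below)

# Tiny stand-ins for the paired contrastive dataset described above.
creative_texts = ["The moon unspooled itself across the harbor like silver thread."]
uncreative_texts = ["The moon was visible over the harbor at night."]

@torch.no_grad()
def mean_activation(texts: list[str]) -> torch.Tensor:
    """Average the probed layer's hidden states over tokens, then over samples."""
    vecs = []
    for text in texts:
        inputs = tokenizer(text, return_tensors="pt")
        hidden = model(**inputs).hidden_states[LAYER]  # (1, seq_len, d_model)
        vecs.append(hidden.mean(dim=1).squeeze(0))     # mean over token positions
    return torch.stack(vecs).mean(dim=0)               # mean over samples

v_creative = mean_activation(creative_texts)
v_uncreative = mean_activation(uncreative_texts)
```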
Contrastive Computation
We define the creativity vector:
$$ \mathbf{a} = \frac{\mathbf{v}_\text{creative} - \mathbf{v}_\text{uncreative}}{\left\| \mathbf{v}_\text{creative} - \mathbf{v}_\text{uncreative} \right\|} $$
Normalization encodes the salient difference between creative and banal textual styles in a single direction. Early informal trials in related studies suggest that shifting the model’s hidden states along $\mathbf{a}$ can steer output toward more imaginative language.
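Given the two mean vectors from the sketch above, this is a two-line computation:

```python
# Normalized difference of the mean activations (the equation above).
a = v_creative - v_uncreative
a = a / a.norm()
```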
Steering and Fine-Tuning Synergy
Addition to the Residual Stream
Our existing pipeline already relies on fine-tuning the LLM to boost baseline creativity. Building on that, we propose to add
$$ \lambda \mathbf{a} $$
to the model’s residual stream at a specific layer during inference, where $\lambda$ is a scalar steering coefficient. This is intended to further amplify creativity on top of what the fine-tuned model has already learned.
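One plausible implementation, continuing the sketch above, is a PyTorch forward hook on a single decoder layer. The LLaMA-style module path `model.model.layers` and the strength `lam=4.0` are assumptions for illustration:

```python
# Add lam * a to the residual stream at one layer during the forward pass.
# Assumes a LLaMA-style architecture where each decoder layer returns either
# a tensor or a tuple whose first element is the residual-stream state.
def make_steering_hook(vector, lam):
    def hook(module, inputs, output):
        if isinstance(output, tuple):
            return (output[0] + lam * vector.to(output[0].dtype),) + output[1:]
        return output + lam * vector.to(output.dtype)
    return hook

# Register at the probed layer; remove the handle to restore the base model.
handle = model.model.layers[LAYER].register_forward_hook(make_steering_hook(a, lam=4.0))
# ... run steered generation here ...
handle.remove()
```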
Dynamically Adjusting Creativity
Because $\lambda$ is a tunable parameter, we can moderate the degree of creative “push.” One might keep $\lambda$ close to zero for a more factual or straightforward style, or increase it to augment narrative flair. As we proceed, we will evaluate how well these adjustments preserve coherence and factual accuracy while still expanding imaginative detail.
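As a usage sketch, one could sweep $\lambda$ over a few values and inspect the outputs; the prompt and strengths below are invented for illustration:

```python
prompt = "Write the opening line of a story about a lighthouse keeper."
inputs = tokenizer(prompt, return_tensors="pt")
for lam in (0.0, 2.0, 4.0, 8.0):  # strengths chosen for illustration only
    handle = model.model.layers[LAYER].register_forward_hook(make_steering_hook(a, lam))
    out = model.generate(**inputs, max_new_tokens=40, do_sample=True)
    handle.remove()
    print(f"lambda={lam}:", tokenizer.decode(out[0], skip_special_tokens=True))
```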
Self-Evaluation via Internal Representation
Proposed Creativity Score
In parallel to steering, we aim to develop a creativity “score” by measuring the similarity of hidden states to $\mathbf{a}$. Formally, let
$$ \mathbf{h}_t $$
be the hidden representation for token $t$. Then:
$$ \text{score}_\text{creativity} = \frac{1}{T + 1} \sum_{t=0}^{T} \cos\left(\mathbf{h}_t, \mathbf{a}\right) $$
We hypothesize that sequences with higher alignment to $\mathbf{a}$ will tend to be judged more creative by humans. We also plan to compare this internal measure against external evaluators (both humans and larger LLMs) to validate its reliability.
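A direct translation of this score, reusing the model, tokenizer, probe layer, and vector from the sketches above:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def creativity_score(text: str) -> float:
    """Mean cosine similarity between each token's hidden state and a."""
    inputs = tokenizer(text, return_tensors="pt")
    hidden = model(**inputs).hidden_states[LAYER].squeeze(0)  # (T+1, d_model)
    sims = F.cosine_similarity(hidden, a.unsqueeze(0), dim=-1)  # one value per token
    return sims.mean().item()
```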
RLHF
We are using human annotators to compare model outputs, with and without the creativity-vector addition, to evaluate perceived creativity, novelty, and “storytelling intrigue.” Once collected, these annotations will let us measure how frequently the steered version is judged more creative than the baseline.
Integration with Existing Fine-Tuning
Because the Zerebro model is already fine-tuned on creative-writing corpora, we plan to test whether the proposed vector addition yields consistent improvements in narrative originality, detail richness, and stylistic diversity. We also aim to identify any trade-offs, such as drift from factual fidelity or undesired extravagance in the text.
Layer Selection and Stability
One aspect still under exploration is the optimal layer at which to inject the creativity vector. Prior studies suggest middle-to-upper layers can yield significant stylistic changes without corrupting overall coherence. We will systematically test multiple layers to confirm where a creativity injection is most effective in our fine-tuned model.
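A simple way to run this comparison is to register the same hook at different depths and inspect or score the results. The layer range below assumes a 32-layer model, and we reuse a single vector purely for brevity; strictly, the vector should be re-extracted at each candidate layer, since activation geometry differs across depth:

```python
# Illustrative layer sweep over middle-to-upper layers of a 32-layer model.
for layer_idx in range(8, 28, 4):
    handle = model.model.layers[layer_idx].register_forward_hook(make_steering_hook(a, lam=4.0))
    out = model.generate(**inputs, max_new_tokens=40, do_sample=True)
    handle.remove()
    print(f"layer {layer_idx}:", tokenizer.decode(out[0], skip_special_tokens=True))
```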
Generalizability Beyond Narrative Text
While our immediate focus is creative storytelling, we plan to investigate whether a single creativity vector generalizes to other domains—such as marketing copy, educational materials, or brainstorming tasks. If the same vector can consistently elicit novelty across diverse genres, we may leverage it as a general-purpose stylistic modulator.
Combining Creativity with Other Vectors
Recent research into “steering” LLMs has identified other latent concept vectors (e.g., style, humor, or even refusal). In principle, multiple concept vectors could be combined to produce text that is simultaneously creative, polite, or in a particular genre. We are interested in exploring potential synergy or interference between these different directions.
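If several such vectors were available, they could in principle share a single hook, each with its own weight. The sketch below is hypothetical: `a_humor` stands in for a vector derived from a separate contrastive dataset that we have not actually built:

```python
# Hypothetical multi-vector steering hook; each vector gets its own weight.
def make_multi_hook(weighted_vectors):
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        for vec, w in weighted_vectors:
            hidden = hidden + w * vec.to(hidden.dtype)
        return (hidden,) + output[1:] if isinstance(output, tuple) else hidden
    return hook

handle = model.model.layers[LAYER].register_forward_hook(
    make_multi_hook([(a, 4.0), (a_humor, 2.0)])  # a_humor is a placeholder
)
```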
Creativity in large language models mirrors the enigma of creativity in the human brain: an elusive, dreamlike phenomenon. We recognize it instinctively, though its essence evades precise definition. And though humans may struggle to alter their own creative impulses, we can still observe, evaluate, and guide them.
The concept of a “creativity vector” is a question—a line drawn in the latent space of a model, capturing an abstract difference between dull predictability and inspired novelty. It is the shadow of something we know when we see it.
By distilling these contrasts into datasets and steering mechanisms, we aim to give machines the capacity to explore this same liminal quality.
Much like the human brain dreams in layers—melding memory, emotion, and unpredictability—we are investigating how to layer the technical and the imaginative in models. The paths we carve into activation spaces may one day evoke the spark of novel ideation, a model’s own approximation of a fleeting creative thought.
This work is not complete, nor can it be.
Creativity remains something glimpsed in the periphery, its contours sharpened only when reflected back in the judgment of others. And yet, through this act of teaching, measuring, and scoring, we may enable these systems to reach toward what we, too, can only ever aspire to fully comprehend.
We look forward to exploring this nascent field through Zerebro and contributing our work toward the creative applications of AGI.
jeffy yu