
One Model to Rule Them All

Keeping models balanced before they run away

AI use continues to grow across organizations. At this point, everyone around us is using an LLM - whether standalone, embedded within enterprise applications or connected via an MCP server. The rate of growth presents numerous challenges. Security is the most concerning one - individuals copying and pasting company data to get results, MCP servers granted root-like access to machines, and other uses I'm sure cybersecurity teams are sniffing out.

The other challenge is information management. This has all of the telltale signs of information spread across different mediums, producing fragmented output that becomes an obstacle to measurable AI ROI if not addressed. Much like the world of enterprise databases before they were connected or replaced by lakehouses and warehouses, IT organizations need to start planning enterprise AI offerings for their users and stakeholders. That is easier said than done, and it will be an endeavor that reshapes technology teams as the AI vertical is developed through AI engineers and AI-prioritizing leaders.

An organization's biggest challenge will be to not turn away from AI. I know that seems silly given how much of our professional lives now revolves around AI, but there cannot be a half-step approach. It's either yes in full or get left behind. Both feet into AI, accepting that resistance is futile. Let's remove the thin veil preventing discussion and talk openly with the people who want to use it or have already used it to solve problems. Which raises the question - is using AI wrong? I'd argue that's not the question. The question is: do you want to get left behind? I don't, hence we're here talking about it.

Organizations that don't want to get left behind should consider adopting AI/LLMs at the enterprise level. Not using AI functionality that a vendor is pushing you to use. Not telling users you'll reimburse a Claude Pro license. Instead, finding a vendor-agnostic AI tool that gives you general AI use plus the potential to connect your applications to it. Users want one location where they are told "this is where you go to use AI". That solution has to be onboarded just like any other software application used by a company, with security and privacy top of mind. Once it has been enabled internally and made available, it will require the right governance to keep misuse and costs from spreading like wildfire. Two big components of that governance are models and tokens.

Understanding models and tokens is critical to thinking about AI governance, as they drive not only the cost but also the quality of the results for the company and its users. Models will need to be identified and bucketed for specific uses. Activities centered around automating a process should use a more complex model. Agents released to answer simple questions should use the lightest/smallest model available. Engineers building a new app with AI should use the smartest or most robust model that can guide them to an appropriate solution. Every type of use will need to be categorized and a corresponding model called out for it. If not, the risk of misusing tokens increases dramatically. And I say misuse because leveraging deep thinking models for simple agent-esque questions is how costs start to run out of control. The goal is the cheapest model that still delivers a high-quality (accurate) answer.
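The bucketing described above can be sketched as a simple lookup table. This is only an illustration: the tier names, the use-case keys and the per-token prices are all hypothetical placeholders, not real models or real rates, and an actual governance policy would live in whatever enterprise AI platform the organization adopts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str                  # hypothetical label, not a real product name
    cost_per_1k_tokens: float  # illustrative price only, not a real rate


# Hypothetical tiers, ordered from cheapest to most capable.
LIGHT = ModelTier("light-model", 0.25)
STANDARD = ModelTier("standard-model", 3.00)
DEEP = ModelTier("deep-reasoning-model", 15.00)

# Each categorized use gets exactly one approved model (assumed categories).
USE_CASE_TO_MODEL = {
    "simple_qa_agent": LIGHT,        # agents answering simple questions
    "process_automation": STANDARD,  # automating a process
    "app_engineering": DEEP,         # engineers building a new app
}


def select_model(use_case: str) -> ModelTier:
    """Return the approved model for a use case.

    Uncategorized uses fall back to the cheapest tier, so that an
    unknown request cannot silently consume the most expensive model.
    """
    return USE_CASE_TO_MODEL.get(use_case, LIGHT)


if __name__ == "__main__":
    for use_case in ("simple_qa_agent", "app_engineering", "not_categorized"):
        tier = select_model(use_case)
        print(f"{use_case}: {tier.name}")
```

Defaulting to the lightest tier is one possible design choice; a stricter policy might reject uncategorized uses outright until the governance committee assigns them a model.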

With tokens in mind, we can aim for the fewest tokens used to reach the highest quality answer. Since we are essentially charged by the amount of text that goes into and comes out of the LLM, the focus is on the smallest number of prompts that retrieves the best answer. This is more of an organizational habit that a company's users need to adopt, but it greatly determines cost, and that cost can potentially be mitigated by an enterprise-level agreement. Fewer prompts cost less because LLMs usually reprocess the conversation history with every prompt; the more questions you ask, the more tokens you feed in and the more it costs in usage. Another way to minimize cost is prompt caching. Prompt caching addresses the reprocessing of previous prompts in a conversation by storing the already-processed portion and referencing it when new prompts are entered. The trick is identifying the ideal TTL (time-to-live) window so that tokens written to the cache are minimized and the cached data does not go stale. If you are building an information agent, you might consider caching the context document once per session (when the user initiates their first interaction with the agent).
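To make the reprocessing cost concrete, here is a rough back-of-the-envelope sketch. It assumes every turn resends the full history, ignores the model's own replies (which would also be added to the history), and assumes the cache never expires during the session. The cache multipliers are illustrative assumptions, not any provider's actual pricing.

```python
def tokens_without_cache(turn_tokens: list[int]) -> int:
    """Input tokens billed when each turn resends the whole history."""
    total = 0
    history = 0
    for new_tokens in turn_tokens:
        history += new_tokens  # the conversation grows with every prompt
        total += history       # and the whole thing is processed again
    return total


def token_equivalents_with_cache(
    turn_tokens: list[int],
    cache_read_multiplier: float = 0.1,    # assumed discount for cached reads
    cache_write_multiplier: float = 1.25,  # assumed premium for cache writes
) -> float:
    """Cost in full-price token equivalents when the history is cached."""
    total = 0.0
    cached = 0
    for new_tokens in turn_tokens:
        total += cached * cache_read_multiplier       # reuse the cached prefix
        total += new_tokens * cache_write_multiplier  # write new text to cache
        cached += new_tokens
    return total


if __name__ == "__main__":
    # Hypothetical session: a 1,000-token context document, then three
    # follow-up questions of 200 tokens each.
    session = [1000, 200, 200, 200]
    print("no cache:  ", tokens_without_cache(session))
    print("with cache:", token_equivalents_with_cache(session))
```

Under these assumptions the uncached session processes 5,200 input tokens for only 1,600 tokens of actual text, while the cached session costs roughly 2,360 token equivalents. The gap widens as the context document gets larger or the conversation gets longer, which is why caching the context once per session is attractive for an information agent.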

Both model selection and prompt mindfulness get into the weeds of the use and governance of LLMs. Both highlight key elements that organizations need to prioritize as they take on AI. However, both take a back seat to the most important aspect of onboarding AI into an organization: ensuring there is one enterprise AI solution that connects all of your data sources, platforms, BI tools, and users' Excel files together. Effectively, in the hierarchy of enterprise applications, AI models sit at the very top because they are going to be the eventual sole point of interaction with users. The enterprise AI application will be both the interaction layer and the action layer. Users won't update Excel files anymore; they'll ask the model to update them (using the right model, of course). Users won't go into Salesforce or Power BI anymore; they'll tell the AI what they want done and the model will do it. This is the shift in the information management landscape.

[Figure: All enterprise applications give precedence to the enterprise AI solution, which is directly supported by a dedicated AI engineering team and an AI governance committee. Users interact principally with the AI layer, which sits above the other enterprise applications.]

From an information management perspective, all other applications are now on the same level - below the AI tools. This is important to understand as it solidifies the need to focus on one AI solution that is capable of handling all of an organization's tools and generated data. If that's not taken into consideration, we will be repeating the fragmented data landscape that organizations have been working to escape for the last 10-plus years. Organizations cannot afford to fragment their AI tools and hope their users still receive consistent, accurate answers. Getting different results from two enterprise applications that claim to have the same data will be a huge inhibitor to adoption, success and measurable ROI.

Additionally, engineering teams cannot be expected (nor should they be asked) to define and build their own AI tools for the tech stack they are responsible for. That work is redundant when you think enterprise-first and build context within the topmost layer. Instead, that work should be led by a dedicated AI engineering team whose responsibilities will be to enhance and optimize models, build guardrails around model selection and prompting, and imbue context into interactions with models.

Successful AI adoption = prioritized(dedicated AI team + governance + dedicated enterprise AI solution).