Thoughts on Generative AI, product leadership, and enterprise AI transformation

There is another new term in the world of AI: ‘System of Intelligence’.
Prompt engineering, AI agents, context engineering, context or knowledge graphs, claws, harness engineering, and now ‘System of Intelligence’.
For more than two decades, enterprise software companies competed by owning the underlying data. The companies that won that era were Salesforce, ServiceNow, Workday, and SAP. Once a company’s operational truth lived in your database, switching costs became prohibitive. The data was the moat. The industry has a name for those products: systems of record.
The next decade looks different. The place where work actually happens for a sales rep, a support agent, a recruiter, or an analyst, and increasingly where enterprise value accrues, is no longer the system of record. It is a layer that sits above the systems of record, pulls from all of them, reasons across them, and acts back on all of them. The industry is starting to call it the system of intelligence.
A system of intelligence is the reasoning and orchestration layer that sits above one or more systems of record. Its job is to take a goal from a user or from another agent and achieve it consistently and safely across the systems it has access to.
Concretely:
For a given user persona and a given goal, retrieve the right context and perform the right actions across the systems of record — consistently, every time, safely.
Right context, right actions, consistently, every time, safely. Hit any one of those at small scale and you have a prototype. Hit all five at production scale and you have a system of intelligence.

The first three pillars (memory, context, tools) handle what the agent knows and what it can touch. The last two (harness, learning loop) handle how we make sure it behaves.
To make the pillars concrete, take a single case we’ll return to throughout this series.
DeltaBank is a US retail bank with a large credit card business. Maria is a Platinum cardholder. At 9:47 on a Tuesday morning she calls DeltaBank to dispute a $39 late fee charged to her card.
In the next ninety seconds, the right thing for DeltaBank’s system of intelligence to do is straightforward to list: identify Maria, pull her payment history, look up the relevant waiver policy, check whether she qualifies, apply the waiver if she does, log the case, send a confirmation email, update her record. Every system DeltaBank needs in order to do this already exists. The CRM knows who Maria is, the card platform knows the fee was charged, the knowledge base holds the policy, the ticketing system can log the case, the email service can send the message.
What is hard is doing all of it consistently: for Maria, and for every other Platinum customer who calls in this month, every time, safely. Here is what each pillar contributes during Maria’s ninety-second window.
Memory has to hold everything DeltaBank knows that might matter for Maria: her identity and tier, her account, the specific charge being disputed, the waiver policy that applies, the document that defines that policy, her recent payment history, the outcome of her last conversation, a handful of past Platinum waiver decisions from comparable cases, the “Late Fee Dispute” workflow, and the waive_fee tool definition. In a system of intelligence these don’t live in five different stores stitched together at query time. They live in one connected context graph where Maria the customer, Maria the participant in last December’s call, and Maria the subject of a past decision trace are the same node. → Deep dive: The Memory Substrate.
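To make the "same node" idea concrete, here is a minimal in-memory sketch. The `ContextGraph` class, the node ids, and the relation names are all hypothetical and chosen for illustration; a production memory substrate would sit on a real graph store.

```python
from collections import defaultdict


class ContextGraph:
    """Tiny illustrative graph: nodes keyed by id, typed edges between them."""

    def __init__(self):
        self.nodes = {}                  # node_id -> attribute dict
        self.edges = defaultdict(list)   # node_id -> [(relation, other_id)]

    def add_node(self, node_id, **attrs):
        # Re-adding an existing id merges attributes instead of duplicating the node.
        self.nodes.setdefault(node_id, {}).update(attrs)

    def add_edge(self, source, relation, target):
        # Store both directions so any node can reach the customer it belongs to.
        self.edges[source].append((relation, target))
        self.edges[target].append((f"inverse:{relation}", source))

    def neighbors(self, node_id, relation=None):
        return [t for r, t in self.edges[node_id] if relation is None or r == relation]


# Hypothetical ids; one node for Maria, whatever role she plays.
graph = ContextGraph()
graph.add_node("customer:maria", kind="customer", tier="Platinum")
graph.add_node("charge:late-fee", kind="charge", amount=39.00)
graph.add_node("call:last-december", kind="conversation")
graph.add_node("trace:past-waiver", kind="decision_trace")

graph.add_edge("customer:maria", "disputes", "charge:late-fee")
graph.add_edge("customer:maria", "participated_in", "call:last-december")
graph.add_edge("customer:maria", "subject_of", "trace:past-waiver")

linked = graph.neighbors("customer:maria")
```

Because the charge, the call, and the decision trace all hang off the single `customer:maria` node, one lookup reaches all three without joining separate stores at query time.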
Context engineering has to surface the right slice of that memory into the model’s working window in well under a second. Pure vector search across DeltaBank’s 50,000-document corpus would return semantically similar passages from the wrong customer tier. Pure graph traversal would return the right documents but not the right passages. The discipline is compositional: resolve the entities in the goal (Maria, Platinum, late fee), traverse the graph to narrow the candidate set, then run vector search only over that filtered set. Same pattern for episodic precedent, for procedural workflows, for everything. And if Maria’s call runs long (five minutes of back-and-forth, multiple tool calls, several policy lookups), context engineering also has to compact the window: summarize what’s already been resolved, drop what’s no longer relevant, keep what still matters. Retrieval gets the right content in. Compaction keeps the window healthy as the conversation grows. → Deep dive: Retrieval and the Context Graph.
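The graph-then-vector pattern can be sketched in a few lines. Everything here is an assumption for illustration: the toy graph, the three-dimensional embeddings, and the function names. Entity resolution is taken as already done, yielding the node `customer:maria`.

```python
import math

# Hypothetical graph: customer -> tier -> policy documents.
GRAPH = {
    "customer:maria": ["tier:platinum"],
    "tier:platinum": ["doc:platinum-late-fee", "doc:platinum-travel"],
    "tier:basic": ["doc:basic-late-fee"],
}

# Hypothetical toy embeddings; a real system would use a learned embedding model.
EMBEDDINGS = {
    "doc:platinum-late-fee": [0.90, 0.10, 0.0],
    "doc:platinum-travel": [0.10, 0.90, 0.0],
    "doc:basic-late-fee": [0.95, 0.05, 0.0],
}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def candidate_docs(start, max_hops=2):
    """Breadth-first traversal from a resolved entity, collecting document nodes."""
    frontier, seen, docs = [start], {start}, []
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            for neighbor in GRAPH.get(node, []):
                if neighbor in seen:
                    continue
                seen.add(neighbor)
                if neighbor.startswith("doc:"):
                    docs.append(neighbor)
                else:
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return docs


def vector_search(query_vec, doc_ids, k=1):
    """Rank the given documents by cosine similarity to the query."""
    ranked = sorted(doc_ids, key=lambda d: cosine(query_vec, EMBEDDINGS[d]), reverse=True)
    return ranked[:k]


query_vec = [1.0, 0.0, 0.0]  # assumed embedding of a "late fee waiver" query

unfiltered = vector_search(query_vec, list(EMBEDDINGS))              # whole corpus
filtered = vector_search(query_vec, candidate_docs("customer:maria"))  # graph-narrowed
```

In this toy data the unfiltered search picks the Basic-tier late-fee policy because it is marginally closer to the query, while the graph-narrowed search can only return documents reachable from Maria's tier.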
Tools and actions is the typed registry of what the agent is allowed to do: get_payment_history, get_waiver_count, waive_fee, create_ticket. Each has a schema, a permission scope, a timeout, and an idempotency key. The agent does not call DeltaBank’s APIs directly. It calls tools or skills, and tools call APIs. The indirection is what makes the next pillar possible. → Deep dive: Tools and Actions.
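A typed registry along these lines might look like the sketch below. `ToolSpec`, `ToolRegistry`, the `fees:write` scope, and the fake handler are hypothetical names, and the parameter-type map is a stand-in for a real schema. The timeout is recorded but not enforced here.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class ToolSpec:
    name: str
    params: dict[str, type]      # minimal stand-in for a full schema
    scope: str                   # permission scope the caller must hold
    timeout_s: float             # recorded only; enforcement is out of scope here
    handler: Callable[..., Any]  # the function that would call the real API


class ToolRegistry:
    def __init__(self):
        self._tools = {}
        self._results = {}       # idempotency_key -> cached result

    def register(self, spec):
        self._tools[spec.name] = spec

    def call(self, name, args, *, scopes, idempotency_key):
        spec = self._tools[name]  # KeyError if the tool was never registered
        if spec.scope not in scopes:
            raise PermissionError(f"{name} requires scope {spec.scope}")
        if set(args) != set(spec.params):
            raise TypeError(f"{name} expects parameters {sorted(spec.params)}")
        for key, expected in spec.params.items():
            if not isinstance(args[key], expected):
                raise TypeError(f"{key} must be {expected.__name__}")
        # A retried call with the same key returns the first result, not a second action.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        result = spec.handler(**args)
        self._results[idempotency_key] = result
        return result


calls = []


def fake_waive_fee(customer_id, amount):
    """Stand-in for the real card-platform API call."""
    calls.append((customer_id, amount))
    return {"status": "waived", "amount": amount}


registry = ToolRegistry()
registry.register(
    ToolSpec("waive_fee", {"customer_id": str, "amount": float}, "fees:write", 5.0, fake_waive_fee)
)

args = {"customer_id": "maria", "amount": 39.0}
first = registry.call("waive_fee", args, scopes={"fees:write"}, idempotency_key="dispute-001")
retry = registry.call("waive_fee", args, scopes={"fees:write"}, idempotency_key="dispute-001")
```

The retry returns the cached result and the handler runs only once, which is the property an idempotency key is meant to provide when a timeout or re-plan causes the agent to repeat a call.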
Harness intercepts every tool call before it touches a real system. Before waive_fee runs: is the amount in policy range? Is Maria’s tier eligible? Has she hit her annual waiver limit? Does the requesting role have permission? Any failure blocks the call and pushes an error trace back to the model so it can re-plan. After the call: did the ledger actually reflect the reversal? Before the agent’s reply reaches Maria: PII filter, prompt-injection scrubber, tone classifier. A second waiver in 30 days, or anything above a threshold amount, pauses for human approval. The harness is what makes a probabilistic model behave deterministically enough for production. → Deep dive: Harness Engineering.
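The pre-call checks on waive_fee can be sketched as a plain deterministic gate. All thresholds, role names, and the `WaiverRequest` shape below are illustrative assumptions, not DeltaBank policy.

```python
from dataclasses import dataclass


@dataclass
class WaiverRequest:
    amount: float
    tier: str
    waivers_this_year: int
    waivers_last_30_days: int
    role: str


# Illustrative values only; real limits would come from the policy store.
MAX_AMOUNT = 50.00
ELIGIBLE_TIERS = {"Platinum"}
ANNUAL_LIMIT = 2
ALLOWED_ROLES = {"support_agent"}
APPROVAL_AMOUNT = 40.00


def gate_waive_fee(req):
    """Return ("block", reasons), ("needs_approval", reasons) or ("allow", [])."""
    failures = []
    if not 0 < req.amount <= MAX_AMOUNT:
        failures.append("amount outside policy range")
    if req.tier not in ELIGIBLE_TIERS:
        failures.append("tier not eligible")
    if req.waivers_this_year >= ANNUAL_LIMIT:
        failures.append("annual waiver limit reached")
    if req.role not in ALLOWED_ROLES:
        failures.append("role lacks permission")
    if failures:
        # In a full harness these reasons would go back to the model so it can re-plan.
        return "block", failures

    escalations = []
    if req.waivers_last_30_days >= 1:
        escalations.append("second waiver in 30 days")
    if req.amount > APPROVAL_AMOUNT:
        escalations.append("amount above approval threshold")
    if escalations:
        return "needs_approval", escalations
    return "allow", []


maria = WaiverRequest(amount=39.00, tier="Platinum", waivers_this_year=0,
                      waivers_last_30_days=0, role="support_agent")
decision = gate_waive_fee(maria)
```

The gate contains no model calls: the same request always produces the same decision, and that is where the deterministic behavior comes from.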
Learning loop runs the day after. Yesterday’s 4,200 conversations are replayed against the latest model version to catch regressions. A drift detector compares this week’s waiver rate against baseline: if it jumped from 38% to 61%, something changed. Human QA samples 50 cases and writes their corrections back to memory as new decision traces, so tomorrow’s retrieval picks them up as precedent. This is the closed loop: memory feeds context, context feeds the model, the model proposes actions, the harness gates them, the actions hit systems of record, and the outcomes flow back into memory. → Deep dive: The Learning Loop.
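The drift check on waiver rate can be as simple as a tolerance band around the baseline. The sketch below assumes a fixed tolerance of 10 percentage points, which is an arbitrary illustrative value. A production detector would more likely use a statistical test that accounts for sample size.

```python
def waiver_rate(decisions):
    """Share of conversations that ended in a waiver; `decisions` is a list of booleans."""
    return sum(decisions) / len(decisions) if decisions else 0.0


def drifted(baseline_rate, current_rate, tolerance=0.10):
    """Flag drift when the rate moves more than `tolerance` (assumed) from baseline."""
    return abs(current_rate - baseline_rate) > tolerance


# Hypothetical sample of 100 decisions, 61 of them waivers.
this_week = [True] * 61 + [False] * 39
current = waiver_rate(this_week)

alert = drifted(baseline_rate=0.38, current_rate=current)
```

With the figures from the example above, a move from 38% to 61% is well outside the band and raises the alert, while a move from 38% to 40% would not.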
Takeaways worth carrying into the deep-dive posts.
The center of gravity in enterprise software is moving up the stack. The system of record doesn’t disappear. DeltaBank still needs Salesforce and FIS. But the place where work happens, where reasoning lives, where the user spends their time, is now the layer above them. That layer is where the next decade of enterprise value will accrue.
Building one is not a single engineering discipline. It’s five disciplines stitched together. Memory engineering looks nothing like context engineering, which looks nothing like harness engineering, which looks nothing like running an evals platform. The teams that win at this are the ones that recognize the disciplines as distinct and staff each one accordingly.
The rest of this series goes pillar by pillar. Start with the memory substrate: it’s where everything else gets its grounding.

Gen AI Product Leader · Leads AI Applications and Search at eGain
I partner with PMs and engineers to drive production adoption of AI across Fortune 500 enterprises in the US and Europe. IIT Bombay alumnus; previously co-founded Selekt.in and built ChatGen.ai. The thesis I evangelize: knowledge is the harness for AI applications.