Thoughts on Generative AI, product leadership, and enterprise AI transformation

This is the third deep-dive in the system of intelligence series. The first two covered the memory substrate and compositional retrieval. Memory holds what the agent knows. Retrieval gets the right slice into working memory. This post is about the third pillar, how the agent actually does anything with what it knows.
The running example is still Maria, the Platinum DeltaBank customer disputing a $39 late fee. We left her with the agent’s working memory fully populated: customer facts, applicable policy, payment history, and comparable precedent. The agent now has to act on that information.
Most production agents conflate four terms: goal, plan, task, and action. Doing so is the most common source of unpredictable behavior. Keep them separate.
:Plan node in the graph for long-running cases, working memory for short ones).
call verify_identity(customer_id="CUST_4421") (the action). One task usually maps to one action, but complex tasks decompose into several.
The hierarchy is goal → plan → task → action, top to bottom. Each layer is a refinement of the one above. Keeping them separate is what makes the system auditable: six months from now, you can ask “why did the agent take this action?” and trace it back through the task it executed, the plan that contained the task, and the goal that produced the plan.

Two phases govern how an agent moves through this hierarchy.
Plan phase: The agent reads the goal (from the user) and its working memory (from retrieval), and produces a plan. “To resolve a late-fee dispute, I will: verify identity, check waiver eligibility, apply waiver if eligible, log the case, notify customer.” The plan is a structured object, not free text — each task in the plan names a known workflow step and references the tools that will execute it.
Execute phase: The agent walks the plan, picking the next task whose preconditions are satisfied, executing it via tool calls, checking the outcome, moving to the next task. Each action goes through the harness before reaching any real system.
The two phases interleave when reality intervenes. If a task fails, the agent re-plans from the current state. If new information arrives mid-execution (the customer asks an unrelated question), the agent re-plans with the updated goal set. This pattern — plan, execute, observe, replan when needed — is what the field calls plan-and-execute, and it’s the dominant production architecture for agents that need to be reviewable.
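The plan, execute, observe, replan cycle can be sketched in a few lines. This is a minimal illustration under stated assumptions, not DeltaBank's implementation: `Task`, `run_plan`, the `replan` callback, and the `max_replans` limit are hypothetical names introduced here, and each task's `run` function stands in for a harness-wrapped tool call that reports success or failure.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Task:
    name: str
    run: Callable[[dict], bool]                 # hypothetical: True means the tool call succeeded
    requires: set = field(default_factory=set)  # names of tasks that must complete first


def run_plan(plan, state, replan, max_replans=2):
    """Execute tasks whose preconditions hold; replan from current state on failure."""
    done = []
    pending = list(plan)
    replans = 0
    while pending:
        # Pick the next task whose preconditions are all satisfied.
        ready = next((t for t in pending if t.requires <= set(done)), None)
        if ready is None:
            raise RuntimeError("no task has its preconditions satisfied")
        if ready.run(state):
            done.append(ready.name)
            pending.remove(ready)
        elif replans < max_replans:
            # Observe the failure and ask the planner for a new plan.
            replans += 1
            pending = replan(state, done)
        else:
            raise RuntimeError(f"{ready.name} failed after {replans} replans")
    return done
```

Bounding the number of replans is a design choice made for this sketch; it keeps a persistently failing task from looping forever and gives a reviewer a clear point at which the case escalates.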
A tool is not just a function. It is a first-class declarative object with seven components:
tool:
  name: waive_fee
  description: >
    Reverse a fee that was charged to a customer's account, recording
    the waiver reason. Use this only when the customer is eligible per
    policy and the amount is within the agent's allowed range.
  input_schema:
    fee_event_id: string (required)
    amount: number (required, max: 50)
    reason: enum["goodwill", "system_error", "policy_override"]
    requested_by: string (required, employee_id)
  output_schema:
    status: enum["success", "rejected", "requires_approval"]
    transaction_id: string
    reversed_at: timestamp
  side_effects: write
  idempotency_key: required
  permissions: [fee_waiver_apply]
  timeout_ms: 2000
  governed_by: :Policy{name: "Late Fee Waiver"}
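One way a harness might enforce a declaration like this before any real system is touched is sketched below. It is a rough illustration only: `ToolSpec` and `precheck` are hypothetical names, and the sketch covers just three of the declared properties (the amount ceiling, the permission list, and the idempotency requirement).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToolSpec:
    name: str
    side_effects: str                  # "read", "write", or "write_external"
    permissions: frozenset
    max_amount: Optional[float] = None
    idempotency_required: bool = False


def precheck(spec, args, caller_permissions):
    """Return a list of violations; an empty list means the call may proceed."""
    errors = []
    missing = spec.permissions - set(caller_permissions)
    if missing:
        errors.append(f"missing permissions: {sorted(missing)}")
    amount = args.get("amount")
    if spec.max_amount is not None and amount is not None and amount > spec.max_amount:
        errors.append(f"amount {amount} exceeds max {spec.max_amount}")
    if spec.side_effects != "read" and spec.idempotency_required and not args.get("idempotency_key"):
        errors.append("idempotency_key is required for write tools")
    return errors


# Mirrors the waive_fee declaration above (only the fields this sketch checks).
WAIVE_FEE = ToolSpec(
    name="waive_fee",
    side_effects="write",
    permissions=frozenset({"fee_waiver_apply"}),
    max_amount=50,
    idempotency_required=True,
)
```

Under these assumptions, a call with `amount: 500` is turned away by `precheck` and never reaches the ledger.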
Each component does specific work:
waive_fee with amount: 500 against a schema declaring max: 50 is rejected at the schema layer, not by the underlying system.
read, write, or write_external. get_payment_history is a read: calling it twice changes nothing. waive_fee is a write: calling it twice without an idempotency key would reverse the fee twice. send_confirmation_email is a write-external: once the email is sent, no key in the world can unsend it. This single property determines whether the tool is reversible, whether it requires idempotency, and what level of harness scrutiny it needs.
fee_waiver_apply; an intern doing read-only research does not. The harness checks this before every call.
Two patterns are used in production. Neither is wrong; they fit different situations.
Pattern A: the agent has a catalog of tools, each with a name and description. At runtime it reads the descriptions, reasons over what the task needs, and picks a tool whose description matches. The tool is not bound to a goal type; it floats in the catalog, available whenever its description fits. OpenAI function calling, Anthropic Skills, and most MCP-exposed tools work this way.
Strength: flexible, and new tools get picked up automatically. Weakness: unpredictable, since the agent may pick different tools for similar cases.
Pattern B: a workflow is exposed to the agent as a single tool. Calling the tool invokes a predefined sequence (verify identity, check eligibility, apply waiver, log case, notify customer), with each step orchestrated by a workflow engine outside the agent. The agent sees one tool call; the workflow engine handles the rest. The workflow tool exists in the tool registry alongside atomic tools like waive_fee; it just happens to be a tool whose implementation is a multi-step process.
Strength: predictable. Same case, same sequence, same audit trail. Weakness: rigid, since it only works for cases a workflow was authored for.
The right answer in production is both, deliberately, with three factors deciding:
late_fee_dispute_workflow, card_replacement_workflow) → workflow tools.
In practice DeltaBank runs both. For Maria’s case the agent invokes late_fee_dispute_workflow (workflow tool, encapsulates the five-step sequence) plus a few atomic catalog tools like format_currency (used to format the confirmation message). Workflow tools are the rails for high-stakes execution; the atomic catalog is the discretion for everything else.
The context graph from the previous posts is the retrieval substrate — what the agent reads from. It is not where executable processes live. Tools and workflows live elsewhere.
waive_fee (atomic) and calling late_fee_dispute_workflow (workflow): both are tool calls with names, schemas, permissions, and a harness wrapper. The workflow tool’s implementation happens to be a multi-step process; the agent does not need to know that.
DeltaBank’s registry for Maria’s case contains both kinds:
atomic:   waive_fee, create_ticket, send_confirmation_email,
          verify_identity, check_waiver_eligibility, format_currency
workflow: late_fee_dispute_workflow, card_replacement_workflow,
          fraud_investigation_workflow
The audit trail still works. When any tool is invoked, atomic or workflow, the resulting :DecisionTrace in episodic memory records the tool name, arguments, and outcome. The tool definition lives in the registry; the invocation record lives in the context graph. The separation of concerns stays clean.
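A single registry holding both kinds of tool, with every invocation leaving a trace, might look roughly like the sketch below. The names `REGISTRY`, `TRACES`, `tool`, and `invoke` are assumptions made for illustration. The in-memory `TRACES` list stands in for :DecisionTrace records in the context graph, the tool bodies are stubs, and the workflow is abbreviated to two of its five steps.

```python
from datetime import datetime, timezone

REGISTRY = {}   # tool definitions live here
TRACES = []     # stand-in for :DecisionTrace records in episodic memory


def tool(name, kind):
    """Register a callable as an 'atomic' or 'workflow' tool."""
    def register(fn):
        REGISTRY[name] = {"kind": kind, "fn": fn}
        return fn
    return register


@tool("verify_identity", "atomic")
def verify_identity(customer_id):
    return {"verified": True}          # stub result


@tool("waive_fee", "atomic")
def waive_fee(fee_event_id):
    return {"status": "success"}       # stub result


@tool("late_fee_dispute_workflow", "workflow")
def late_fee_dispute_workflow(customer_id, fee_event_id):
    # The multi-step sequence is hidden behind one tool call.
    verify_identity(customer_id)
    return waive_fee(fee_event_id)


def invoke(name, **args):
    """Call a registered tool and record name, arguments, and outcome."""
    outcome = REGISTRY[name]["fn"](**args)
    TRACES.append({
        "tool": name,
        "args": args,
        "outcome": outcome,
        "at": datetime.now(timezone.utc).isoformat(),
    })
    return outcome
```

From the caller's side, `invoke("waive_fee", ...)` and `invoke("late_fee_dispute_workflow", ...)` are indistinguishable in shape, which is the point of registering workflows as tools.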
Maria’s case can play out in two very different ways depending on what tools the system already has registered. The dichotomy matches the Pattern A / Pattern B split from Section 3, but now we walk through both ends of it on the same customer.
The key architectural point: when a workflow tool’s description matches the user’s request, the model picks it and retrieval is skipped entirely. The workflow tool encapsulates the policy, the sequence, and the atomic calls. There is nothing for the agent to retrieve and reason over — the workflow already knows what to do. Retrieval only fires when no tool in the catalog is a good match.
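The routing rule in the paragraph above can be sketched as a small function. In production the match is the model's own judgment over tool descriptions; the word-overlap scorer here is a deliberately crude stand-in, and `route`, `overlap`, and the 0.8 threshold are hypothetical choices made only for illustration.

```python
def overlap(request, description):
    """Toy scorer: fraction of request words that appear in the description."""
    req = set(request.lower().split())
    desc = set(description.lower().split())
    return len(req & desc) / len(req) if req else 0.0


def route(request, catalog, score=overlap, threshold=0.8):
    """Pick a confidently matching tool, else fall back to retrieval."""
    best_name, best_score = None, 0.0
    for name, description in catalog.items():
        s = score(request, description)
        if s > best_score:
            best_name, best_score = name, s
    if best_name is not None and best_score >= threshold:
        return ("invoke_tool", best_name)       # retrieval is skipped
    return ("retrieve_context", None)           # no confident match


CATALOG = {
    "late_fee_dispute_workflow":
        "reverse a disputed credit card late fee for an eligible customer",
    "card_replacement_workflow":
        "replace a lost or damaged card",
}
```

With this catalog, a plain late-fee request clears the threshold and goes straight to the workflow tool, while a request that mixes in an annual fee on a business card falls below it and takes the retrieval path.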
Maria’s case is well-trodden. DeltaBank has handled thousands of these. A workflow tool late_fee_dispute_workflow is registered in the catalog with a clear description: “Reverse a disputed credit card late fee for an eligible customer, applying tier rules, annual limits, and ledger reconciliation. Use when a customer is contesting a late fee charge.”
Phase 1 (Tool selection): The agent reads the available tool descriptions and reasons over what Maria asked for. The late_fee_dispute_workflow description is a strong match — same domain, same operation, applicable customer profile. The agent picks it. No retrieval is invoked. The workflow tool already encapsulates everything the agent needs — applicable policy, eligibility logic, the five-step sequence.
Phase 2 (Invoke the workflow tool): The plan reduces to a single tool call:
plan:
  goal: "Resolve Maria's $39 late fee dispute"
  tasks:
    - tool: late_fee_dispute_workflow
      args:
        customer_id: "CUST_4421"
        fee_event_id: "EVT_88291"
        idempotency_key: "LFD-CUST_4421-EVT_88291"
The agent does not orchestrate the sequence itself. The workflow engine does.
Phase 3 (Execute): The workflow tool call goes through the harness. Inside the workflow engine, the five internal steps run.
(The idempotency check itself fires at the workflow tool level in Phase 2 above — not duplicated per internal step.)
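A workflow-level idempotency check could be as simple as the sketch below, assuming an in-memory key store. `IdempotentInvoker` is a hypothetical name; a real deployment would need a durable store and an atomic check-and-set, neither of which is shown here.

```python
class IdempotentInvoker:
    """Run a tool at most once per idempotency key; replay the stored result after that."""

    def __init__(self):
        self._results = {}   # assumption: in-memory; production needs durable storage

    def invoke(self, key, fn, **args):
        if key in self._results:
            # Duplicate request: return the earlier outcome without re-executing.
            return self._results[key]
        result = fn(**args)
        self._results[key] = result
        return result
```

Because the key guards the whole workflow call, a retried request with key `LFD-CUST_4421-EVT_88291` returns the first outcome instead of reversing the fee a second time.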
Post-execution (2 checks):
ledger_verification: A separate read against the ledger confirms the $39 reversal actually landed. Pass.
The workflow engine proceeds to the remaining steps (log case, send email). When done, the agent’s :DecisionTrace records the workflow invocation and outcome.
End-to-end wall-clock: ~900ms from utterance to ledger reversal. No retrieval. One workflow tool call. Five internal atomic tool calls.
This is what every business-sensitive case should look like in production. Predictable, fast, fully audited.
Now imagine Maria’s case is something less common. Same customer, but this time she says “I want to dispute a $39 late fee on my Platinum card AND an annual fee charge on my business card — and the timing makes me think they’re related.” The agent scans the available tool descriptions. late_fee_dispute_workflow matches part of the request but not the cross-product dimension. annual_fee_waiver_workflow matches another part but does not handle the combined logic. No single tool description is a confident match for the whole request. The system has never explicitly handled a cross-product dispute as a single workflow.
Phase 1 (Tool selection, no confident match): The agent scans tool descriptions but no single workflow tool covers the full request. The agent falls back to the retrieval-and-compose path.
Phase 2 (Retrieval): The agent invokes retrieve_context. The compositional pipeline (see the previous post for how) returns:
The procedural guidance is the critical retrieved artifact. It tells the agent what sequence to compose.
Phase 3 (Plan from atomic tools): Using the retrieved guidance, the agent composes a plan from the atomic tool catalog:
plan:
  tasks:
    - T1: verify_identity
    - T2: get_fee_event(EVT_88291)    # late fee
    - T3: get_fee_event(EVT_88450)    # annual fee
    - T4: check_waiver_eligibility(EVT_88291)
    - T5: check_waiver_eligibility(EVT_88450)
    - T6: waive_fee(EVT_88291) if T4.eligible
    - T7: waive_fee(EVT_88450) if T5.eligible
    - T8: create_ticket
    - T9: send_confirmation_email
Up to nine atomic tool calls instead of one workflow call (T6 and T7 run only if the eligibility checks pass). The agent reasoned its way to this sequence from the retrieved policy text and precedent.
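The conditional steps in the plan above (waive only if the matching eligibility check passed) can be sketched as follows. This is an abbreviated illustration with hypothetical names: `run_conditional_plan` and the tuple layout are assumptions, only five of the plan's tasks are shown, and the tools are supplied by the caller as plain callables.

```python
def run_conditional_plan(plan, tools):
    """Walk the plan in order; skip a task when its condition on earlier results fails."""
    results = {}
    for task_id, tool_name, arg, condition in plan:
        if condition is not None and not condition(results):
            results[task_id] = {"skipped": True}
            continue
        results[task_id] = tools[tool_name](arg)
    return results


# Abbreviated version of the plan: (task id, tool name, argument, condition).
PLAN = [
    ("T1", "verify_identity", "CUST_4421", None),
    ("T4", "check_waiver_eligibility", "EVT_88291", None),
    ("T5", "check_waiver_eligibility", "EVT_88450", None),
    ("T6", "waive_fee", "EVT_88291", lambda r: r["T4"]["eligible"]),
    ("T7", "waive_fee", "EVT_88450", lambda r: r["T5"]["eligible"]),
]
```

Recording skipped tasks explicitly, rather than omitting them, is a choice made for this sketch so the trace shows why a waiver was not applied.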
Phase 4 (Execute): The agent walks the plan task by task. Each atomic call goes through the harness, but now the harness is applied across each tool call in the plan rather than encapsulated inside one workflow tool.
End-to-end wall-clock: ~1.9 seconds. Slower than Scenario A, broader harness surface, more places for things to go wrong — but the case got resolved using only retrieval and atomic composition.
Three takeaways for the Tools and Actions pillar:
The next pillar is Harness Engineering, the deterministic layer that wraps every tool call, every memory write, every output. Tools turn decisions into changes in the world. The harness is what makes sure those changes are the right ones.

Gen AI Product Leader · Leads AI Applications and Search at eGain
I partner with PMs and engineers to drive production adoption of AI across Fortune 500 enterprises in the US and Europe. IIT Bombay alumnus; previously co-founded Selekt.in and built ChatGen.ai. The thesis I evangelize: knowledge is the harness for AI applications.