The Digital Transformation Playbook

Agent Strategy, Made Practical For Leaders

Kieran Gilmurray

Want a clear path from AI buzzwords to business results? We walk through a practical executive framework for building and deploying agents that actually move the needle. Instead of drowning in technical detail, we focus on what matters: memory that persists, reasoning loops that plan and adapt, and tool integrations that touch the systems where value is created.

TLDR / At a Glance:

  • executive mental model for agent strategy
  • working memory versus episodic memory with RAG
  • step-by-step RAG example using BYOD policy
  • traditional RAG versus agentic RAG adaptability
  • fine-tuning as semantic memory and trade-offs
  • prompt engineering structure, guardrails and tools
  • rule of thumb for choosing methods
  • reasoning loops ReAct and perceive-think-act-learn
  • task decomposition, planning and exception handling
  • API integration, orchestration and real-time adaptation
  • leaders’ role in architecting capabilities to outcomes

We start by demystifying memory. Short-term working memory keeps conversations coherent, while episodic memory via retrieval augmented generation anchors responses in live, organisation-specific data. Using a concrete BYOD policy example, we show how semantic search, vector embeddings, and augmented prompts reduce hallucinations and boost accuracy. Then we contrast traditional RAG with agentic RAG, where autonomous agents iterate questions, switch data sources, and ask for clarification to get the right context before acting.

From there, we unpack fine-tuning as semantic memory that embeds domain expertise, including the trade-offs around cost, maintenance, and catastrophic forgetting. We pair that with prompt engineering you can use today: define persona, objectives, tools, constraints, and output format to shape reliable behaviour without new infrastructure. Our rule of thumb keeps choices simple—start with prompts, add RAG or function calling for freshness and depth, and fine-tune when specialisation is essential.

Finally, we get practical about execution. ReAct loops and the broader perceive-think-act-learn model enable agents to decompose tasks, plan across constraints, handle exceptions, and learn from outcomes. The payoff arrives when agents connect to your stack through APIs, orchestrate across CRM, ERP, payments, and messaging, and adapt to real-time data. Leaders don't need to design chips; they need to architect systems that combine memory, planning, and tools into a consistent methodology. Subscribe, share with a colleague who leads transformation, and leave a review telling us which workflow you'll automate first.

Want some free book chapters? Then go here: How to build an agent - Kieran Gilmurray

Want to buy the complete book? Then go to Amazon or Audible today.


Support the show


𝗖𝗼𝗻𝘁𝗮𝗰𝘁 my team and me to get business results, not excuses.

☎️ https://calendly.com/kierangilmurray/results-not-excuses
✉️ kieran@gilmurray.co.uk
🌍 www.KieranGilmurray.com
📘 Kieran Gilmurray | LinkedIn
🦉 X / Twitter: https://twitter.com/KieranGilmurray
📽 YouTube: https://www.youtube.com/@KieranGilmurray

📕 Want to learn more about agentic AI? Then read my new book on Agentic AI and the Future of Work https://tinyurl.com/MyBooksOnAmazonUK


SPEAKER_00:

Chapter 2. The Executive's Framework for Agent Strategy. The most successful technology transformations in business history have been led by executives who focused on core business principles. Steve Jobs didn't need to be a chip designer to envision the iPhone. Jeff Bezos didn't need to be a database engineer to architect Amazon's marketplace. Similarly, executives don't need to understand the intricacies of neural networks to make strategic decisions about AI agents. That said, they do need a clear mental model of what agents can do and how they create value. You'd never hire someone without knowing what they're good at, so don't deploy AI agents without understanding their architecture. In this chapter, we'll look at how to think strategically about agents and what makes them different from regular software. Memory in modern AI systems. What was the single greatest limitation of early AI models such as GPT-1? They forgot everything: true digital amnesia. They gave brilliant answers, then instantly forgot the conversation. For business, that was unacceptable. You cannot help a customer if you lose their name and problem in seconds. Newer models fixed this flaw. Short-term or working memory is what enables modern AI to maintain conversational continuity. Rather than treating each interaction as a blank slate, the system uses a dynamic context window to capture recent exchanges, user instructions, and reasoning steps. This enables the AI to track conversations, resolve ambiguities, and respond coherently. Working memory acts like a scratch pad, holding just enough information for the current task but fading once the task ends, unless it is stored in a longer-term system. Episodic memory and retrieval augmented generation. A further challenge arises when a question requires information that was either missing from the training dataset or generated after the model's knowledge cutoff.
In such cases, an LLM may hallucinate, producing factually incorrect outputs, or default to a generic response such as, this is outside my capabilities. Retrieval augmented generation is a powerful technique that enhances an LLM's response by grounding it in external, up-to-date information. This type of memory is often referred to as episodic memory. When a user submits a query, the system does not rely only on its pre-trained data. Instead, it first searches a defined corpus, such as internal documents, spreadsheets, or wikis, before generating an answer. This ensures outputs are both contextually relevant and factually supported. Step-by-step RAG example. Let's imagine you are working at a company, and you want to use an internal chatbot to ask a question about the company's policy regarding the use of a personal laptop for work purposes. Step 0. The query. You start by typing your question into the company's AI chatbot: Can I use my personal laptop for work projects? This query is sent to the large language model, LLM. Without RAG, the LLM would have to guess the answer based on its general, pre-existing training data, which, more likely than not, does not include your specific company's IT policies. Step 1. Retrieval, the R in RAG. The system searches a specific collection of documents, which, in this case, is your company's internal knowledge base. This includes documents such as an IT security policy PDF, an employee handbook, or internal wiki pages regarding the use of personal IT equipment. Semantic search with vector embeddings. This is not a simple keyword search. The RAG system converts your query and all company documents into numerical representations known as vector embeddings. These vectors capture the meaning of the text. The system then looks for documents whose meaning is mathematically closest to the meaning of your query. Finding relevant information.
Because it's searching by meaning, it might find a section in the IT security document titled Policy on Use of Non-Company Devices, and another in the onboarding guide called Bring Your Own Device, BYOD, guidelines. Even though these sections don't contain the exact phrase personal laptop, they are semantically relevant. Step 2. Augmenting the prompt, the A in RAG. The system now combines the relevant snippets of text it found with your original question. This is known as the augmentation phase. The simple prompt is transformed into a much more detailed and context-rich prompt that is then passed to the LLM. It looks something like this. Start of augmented prompt. Context from IT security policy: Use of personal devices to access company data is permitted only if the device is enrolled in the company's Mobile Device Management and passes a security compliance check. From BYOD guidelines: Employees who choose to use personal laptops for work are eligible for a monthly stipend of $50, but must ensure all company-related work is stored exclusively on the company's secure cloud drive, not the local machine. Original question: Can I use my personal laptop for work projects? Instruction: Using only the context provided above, answer the user's question. End of augmented prompt. Step 3. Generating the final answer, the G in RAG. Finally, the LLM receives this augmented prompt. This is the generation phase. Grounded by the specific, factual information from your company's documents, the model can now generate a precise and helpful answer, rather than a generic or incorrect one. The chatbot's final response to you would be: Yes, you can use your personal laptop for work, provided it is enrolled in our MDM software. You will also be eligible for a $50 monthly stipend, but you must store all company work on our secure cloud drive. By following these steps, RAG ensures the model's answer is not just a guess, but is based on actual, verifiable data from a trusted source.
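The retrieve-augment-generate steps above can be sketched in a few lines of Python. This is a toy illustration, not a production pipeline: the bag-of-words "embedding" and the sample policy snippets are stand-ins for a real embedding model and your actual document corpus.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words vector. Real systems use a trained
    # embedding model so that meaning, not word overlap, drives similarity.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity: how close two vectors point in the same direction.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical corpus snippets standing in for internal documents.
DOCS = [
    "Policy on Use of Non-Company Devices: personal devices must be enrolled in MDM.",
    "BYOD guidelines: personal laptops qualify for a monthly stipend.",
    "Holiday policy: employees receive 25 days of annual leave.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    # Step 1 (R): rank documents by similarity to the query's vector.
    ranked = sorted(DOCS, key=lambda d: cosine(embed(query), embed(d)), reverse=True)
    return ranked[:k]

def augment(query: str) -> str:
    # Step 2 (A): prepend the retrieved context to the user's question.
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer using only the context above."

# Step 3 (G) would send this prompt to an LLM; that call is omitted here.
prompt = augment("Can I use my personal laptop for work projects?")
```

Note how the irrelevant holiday-policy snippet is ranked out: only the device-related documents reach the augmented prompt, which is what grounds the final answer.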
Traditional RAG versus agentic RAG. Everything described above, retrieving relevant documents, augmenting the prompt, and generating a grounded response, is what we call traditional RAG. In traditional RAG, the system typically retrieves context from a fixed single source in a one-shot process, then uses that information to generate an answer. Agentic RAG extends this approach by embedding autonomous agents. These agents can dynamically decide which tools or data sources to query, switch between multiple databases, and reformulate questions if the initial retrieval is insufficient. Agents may also perform iterative searches or ask for clarification, introducing adaptability and multi-step reasoning into the process. Fine-tuning and semantic memory. Fine-tuning is a method that adapts a pre-trained LLM to a specific domain by continuing its training with a high-quality, domain-specific dataset. For instance, fine-tuning a model on thousands of technical support queries enables it to recognize the patterns and solutions typical of that context. During this process, the model's internal parameters, also known as weights, are adjusted through a technique called backpropagation. In doing so, the model develops domain expertise: its internal parameters are reshaped to align its reasoning process with a specialized task. Think of RAG as giving someone a library of medical textbooks. The knowledge is available, but it must be looked up each time. Fine-tuning is like sending that person to medical school. After training, a doctor carries an ingrained, intuitive understanding of how symptoms and systems connect without needing to consult a book for every case. Memory systems: why agents that remember drive higher customer lifetime value. The memory capabilities of AI agents represent one of their most significant business advantages.
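The difference between one-shot retrieval and agentic RAG can be made concrete with a small loop. This is a hedged sketch: the sufficiency check and the query reformulation are plain callables standing in for what, in a real agentic system, would be LLM judgments.

```python
from typing import Callable

def agentic_retrieve(query: str,
                     sources: dict[str, Callable[[str], list[str]]],
                     sufficient: Callable[[list[str]], bool],
                     reformulate: Callable[[str], str],
                     max_rounds: int = 3) -> list[str]:
    """Iterative retrieval: try each source, judge the context, and
    rewrite the query if the first attempt comes back empty-handed.
    Traditional RAG is the special case of one source and one round."""
    context: list[str] = []
    for _ in range(max_rounds):
        for _name, search in sources.items():
            context.extend(search(query))   # query the next data source
            if sufficient(context):
                return context              # enough grounding: stop early
        query = reformulate(query)          # otherwise, rewrite the question
    return context
```

A traditional RAG call would stop after the first empty result; here the agent gets further chances to reformulate or switch sources before giving up.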
Unlike traditional software, which operates statelessly and treats each interaction as a fresh start, agents can maintain persistent, searchable memory across engagements, building and compounding value over time. From a business standpoint, this memory architecture delivers three critical outcomes. 1. Escalating value. Each interaction enhances the agent's effectiveness for future use, creating a compounding cycle of performance gains that traditional software cannot match. 2. Personalization at scale. Agents deliver individually customized experiences without the typical costs associated with personalization, enabling mass customization across large customer bases. 3. Institutional knowledge. Agent memory becomes a persistent repository of organizational learning, thereby reducing knowledge loss due to employee turnover and enabling the continuous optimization of processes. Prompt engineering. Prompt engineering is the practice of carefully and deliberately shaping inputs to guide an LLM toward a desired output. Unlike RAG and fine-tuning, this approach does not alter the model's architecture or require new data. Instead, it draws on the model's existing capabilities by structuring the prompt in a way that activates the most relevant patterns learned during its initial training. At its simplest, prompt engineering involves refining a vague query into one that is sufficiently clear and detailed. For example, a poorly written prompt: Is this code secure? A well-engineered prompt: Review the following Python code for potential security vulnerabilities, including input validation, authentication, and data handling. List any issues you find, explain why they are risks, and suggest safer alternatives. More advanced prompt engineering techniques include embedding examples, supplying additional context, or specifying explicit formatting requirements. These methods enable greater precision, consistency, and control over model outputs.
For example, an effective prompt for a business agent might include: Persona: You are a helpful and professional customer support agent for our company. High-level goal: Your primary objective is to resolve customer issues quickly and accurately. Available tools: You have access to the following tools: a knowledge-base search, a CRM lookup, and a ticket escalation tool. Constraints and guardrails: You must never provide medical or legal advice. If a customer expresses frustration, you must immediately escalate their ticket to a human manager. Crafting prompts of this kind is now an essential business skill. The clarity and precision of your instructions directly shape the quality, reliability, and safety of an agent's performance. RAG versus fine-tuning versus prompt engineering. Each method serves a distinct purpose in shaping how agents perform. Retrieval augmented generation. Best suited for providing agents with access to up-to-date or organization-specific information, functioning as episodic memory. Its drawbacks include latency from the retrieval step and added costs for storing and processing external corpora. Fine-tuning. The primary method for instilling semantic memory, giving the model deep expertise in a domain. Fine-tuning integrates specialized knowledge directly into the model, resulting in faster inference since no retrieval is required. However, it is resource-intensive, requires ongoing retraining for updates, and carries the risk of catastrophic forgetting, where domain learning weakens general capabilities. Prompt engineering. The most lightweight approach. It requires no changes to infrastructure, provides immediate feedback, and offers high flexibility. Its limitations are trial-and-error refinement and dependence on the model's existing knowledge base, without the addition of new information. Rule of thumb: start with prompt engineering to test performance. If accuracy remains insufficient, consider adding RAG or function calling to expand knowledge.
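The structured prompt described above is easy to generate programmatically, which keeps personas, goals, and guardrails consistent across deployments. The field names below simply mirror the example; the resulting string would go wherever your framework accepts a system prompt.

```python
def build_agent_prompt(persona: str, goal: str, tools: list[str],
                       guardrails: list[str]) -> str:
    """Assemble a structured system prompt: persona, goal, tools,
    then guardrails as an explicit checklist the model must follow."""
    lines = [
        f"Persona: {persona}",
        f"High-level goal: {goal}",
        "Available tools: " + ", ".join(tools),
        "Constraints and guardrails:",
    ]
    lines += [f"- {rule}" for rule in guardrails]
    return "\n".join(lines)

# Values taken from the customer-support example in the text.
system_prompt = build_agent_prompt(
    persona="a helpful and professional customer support agent",
    goal="resolve customer issues quickly and accurately",
    tools=["knowledge-base search", "CRM lookup", "ticket escalation"],
    guardrails=["Never provide medical or legal advice.",
                "Escalate frustrated customers to a human manager."],
)
```

Templating prompts this way also makes guardrails auditable: the rules live in one reviewable list rather than scattered across ad hoc prompt strings.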
If deeper specialization is needed, fine-tune the base model. The ReAct and perceive-think-act-learn loops. If memory provides an agent with knowledge, reasoning and acting give it the ability to pursue goals, breaking a large objective into a dynamic, step-by-step plan. The most widely adopted approach today is ReAct, short for reason and act. In this model, the agent alternates between thinking out loud and taking action in a recurring cycle. Reason: the agent evaluates the current context and decides the next subtask. Act: it executes the necessary tool or action. Observe: it reviews the outcome, using that feedback to inform its next reasoning step. For example, when asked to schedule a meeting: 1. Reason: I need team availability; check the calendar. 2. Act: calls the calendar tool. 3. Observe: receives availability data. 4. Reason again: the Q3 forecast doc is needed; search files. 5. Act: executes a document search. 6. Repeat until the goal is complete. In most implementations, observe is not called out separately; it is folded into reasoning. After every action, the agent immediately processes the outcome, integrates it into its reasoning trace, and decides what to do next. As a result, the practical loop is reason, act, reason, act. Current frameworks such as LangChain, LangGraph, AutoGen, and CrewAI rely heavily on ReAct-style loops. Perceive, think, act, learn. The perceive-think-act-learn loop is a broader, older model drawn from robotics, cognitive science, and control systems. It frames an autonomous agent as one that perceives its environment, sensors, data streams, and user inputs; thinks by planning or reasoning about possible actions; acts by executing a decision in the real or digital environment; and learns by adapting from outcomes, updating its model or behavior.
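The reason-act cycle above can be sketched as a small control loop. This is a minimal illustration, not any particular framework's API: the reason function here is a plain callable standing in for an LLM call, and the tools are ordinary Python functions.

```python
from typing import Callable

def react_loop(goal: str,
               reason: Callable[[str, list[str]], tuple[str, str]],
               tools: dict[str, Callable[[str], str]],
               max_steps: int = 5) -> list[str]:
    """ReAct-style loop: reason picks the next (tool, input) pair,
    the tool runs, and the observation is folded back into the trace."""
    trace: list[str] = []
    for _ in range(max_steps):
        tool_name, tool_input = reason(goal, trace)   # Reason: choose next action
        if tool_name == "finish":
            break                                     # goal judged complete
        observation = tools[tool_name](tool_input)    # Act: execute the tool
        trace.append(observation)                     # Observe: record the outcome
    return trace
```

Note how observe is not a separate phase: each result is appended to the trace that the next reasoning call sees, which is exactly the reason-act-reason-act pattern the text describes.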
This framework represents the longer-term vision for agentic AI: systems with situational awareness, adaptive learning, and persistent memory that enable continuous improvement and a sense of autonomy. Reasoning capabilities: how agents break down complex business processes. Agentic AI systems bring reasoning to automation. They can decompose tasks, leverage external tools such as web search or calculators, interact with humans, and even collaborate with other agents. This reasoning ability is transforming how organizations handle complex, multi-step workflows that once required extensive human coordination. Decomposition. Agents break large objectives into manageable subtasks. For example, a directive to optimize customer onboarding becomes a structured sequence: analyze funnel performance, identify friction points, research best practices, propose improvements, test solutions, and measure outcomes. Planning. Beyond simple task execution, agents design strategies that optimize for multiple objectives simultaneously. They can plan parallel work streams, anticipate resource constraints, identify dependencies, and adapt sequences as priorities shift. Exception handling. Unlike rigid automation systems, agents can reason through unexpected situations and respond accordingly. Faced with an edge case, they evaluate alternatives, consult knowledge bases, escalate appropriately, and retain what they learn for future scenarios. Cross-domain integration. Agents reason across organizational silos, weighing sales impact, inventory constraints, customer satisfaction, and financial implications at the same time. This provides holistic business insight traditionally associated with senior human judgment. Integration: tool use to connect existing systems. Tool use is where an agent's intelligence connects with the real world of business. It transforms an agent from a closed black box into an active, integrated part of your digital ecosystem.
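The idea of a tool as a system the agent can call can be sketched as a registry of named callables plus a small orchestration function. The "APIs" here (a CRM lookup and an inventory check) are hypothetical stand-ins; in a real deployment each would wrap an HTTP client for the actual enterprise system.

```python
from typing import Any, Callable

class ToolRegistry:
    """Registry mapping tool names to callables the agent may invoke."""
    def __init__(self) -> None:
        self._tools: dict[str, Callable[..., Any]] = {}

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        self._tools[name] = fn

    def call(self, name: str, **kwargs: Any) -> Any:
        if name not in self._tools:
            raise KeyError(f"Unknown tool: {name}")
        return self._tools[name](**kwargs)

# Hypothetical enterprise clients: illustrative stubs, not real service APIs.
registry = ToolRegistry()
registry.register("crm_lookup", lambda customer_id: {"id": customer_id, "tier": "gold"})
registry.register("check_inventory", lambda sku: {"sku": sku, "in_stock": 3})

def handle_order(customer_id: str, sku: str) -> str:
    # Minimal orchestration: consult two systems, keep context, decide.
    customer = registry.call("crm_lookup", customer_id=customer_id)
    stock = registry.call("check_inventory", sku=sku)
    if stock["in_stock"] > 0:
        return f"Reserve {sku} for {customer['tier']} customer {customer['id']}"
    return f"Backorder {sku} and notify customer {customer['id']}"
```

The point of the registry pattern is governance: the agent can only reach systems you have explicitly registered, which is how integration stays auditable as the tool set grows.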
In agentic terms, a tool is any system the agent can access through an API, application programming interface, to perceive information or take action. This is the most critical capability for executives to understand because it drives tangible ROI. Reasoning without integration delivers little value. Agents must interact with the systems that run the enterprise. API integration. Agents connect directly with enterprise software through APIs. They can read data from CRM systems, update inventory platforms, trigger financial workflows, and coordinate across the technology stack without costly system replacements. Tool utilization. Agents interact with APIs, databases, search engines, and specialized applications to facilitate seamless integration. They can perform competitive web searches, run calculations for financial modeling, query customer databases, and activate workflows across multiple platforms. Multi-system orchestration. Acting as orchestration layers, agents coordinate workflows spanning multiple systems. For example, they can initiate a customer service request, check inventory, verify payment, arrange shipping, update CRM records, and trigger follow-up communication, all while maintaining full context. Real-time adaptation. By integrating with live systems, agents make decisions based on current states rather than historical data. They can adjust recommendations to reflect real-time inventory, modify pricing in response to demand, or escalate issues using live performance metrics. Conclusion. Agents are not magic. Their power comes from combining memory, planning, and tools into a deliberate methodology that serves real business challenges. As a leader, the task is not to be mystified by AI, but to architect the strategy that directs its capabilities.