
The Digital Transformation Playbook
Kieran Gilmurray is a globally recognised authority on Artificial Intelligence, cloud, intelligent automation, data analytics, agentic AI, and digital transformation. He has authored three influential books and hundreds of articles that have shaped industry perspectives on digital transformation, data analytics, intelligent automation, agentic AI and artificial intelligence.
𝗪𝗵𝗮𝘁 does Kieran do❓
When I'm not chairing international conferences, serving as a fractional CTO or Chief AI Officer, I’m delivering AI, leadership, and strategy masterclasses to governments and industry leaders.
My team and I help global businesses drive AI, agentic AI, digital transformation and innovation programs that deliver tangible business results.
🏆 𝐀𝐰𝐚𝐫𝐝𝐬:
🔹Top 25 Thought Leader Generative AI 2025
🔹Top 50 Global Thought Leaders and Influencers on Agentic AI 2025
🔹Top 100 Thought Leader Agentic AI 2025
🔹Top 100 Thought Leader Legal AI 2025
🔹Team of the Year at the UK IT Industry Awards
🔹Top 50 Global Thought Leaders and Influencers on Generative AI 2024
🔹Top 50 Global Thought Leaders and Influencers on Manufacturing 2024
🔹Best LinkedIn Influencers Artificial Intelligence and Marketing 2024
🔹Seven-time LinkedIn Top Voice.
🔹Top 14 people to follow in data in 2023.
🔹World's Top 200 Business and Technology Innovators.
🔹Top 50 Intelligent Automation Influencers.
🔹Top 50 Brand Ambassadors.
🔹Global Intelligent Automation Award Winner.
🔹Top 20 Data Pros you NEED to follow.
𝗖𝗼𝗻𝘁𝗮𝗰𝘁 my team and me to get business results, not excuses.
☎️ https://calendly.com/kierangilmurray/30min
✉️ kieran@gilmurray.co.uk
🌍 www.KieranGilmurray.com
📘 Kieran Gilmurray | LinkedIn
The Digital Transformation Playbook
Your AI Butler Might Have Left the Back Door Open
What lurks beneath the impressive capabilities of your AI assistants? Security vulnerabilities that could put your data and systems at risk.
TLDR:
- Privacy leakage becomes a major concern as sensitive data may become part of the LLM's memory
- Local vulnerabilities include file deletion, unauthorized access, and resource overconsumption
- AI agents can become unwitting accomplices in attacks against remote services
- Effective defences include proper session isolation, robust sandboxing, and encryption techniques
- The security of AI agents must be designed in from the beginning, not added as an afterthought
While we marvel at AI agents writing scripts, querying databases, and browsing the web, security researchers have identified critical weaknesses in how these systems operate. This AI-agent-created podcast episode dives deep into groundbreaking research on the hidden dangers of LLM-powered AI agents and why they matter to anyone using or developing this technology.
We explore how poor session management can lead to information leakage between users, causing privacy breaches or mixed-up actions. We unpack the concept of model pollution, where malicious or unwanted data gradually corrupts an AI system's responses. The conversation tackles privacy risks illustrated by real-world incidents like Samsung's code leak through ChatGPT, showing how sensitive information can become embedded in model memory.
The most eye-opening segment examines how AI agents can become security liabilities through local vulnerabilities (deleting files, accessing private data) and remote exploits (becoming unwitting participants in attacks against other services). Your helpful assistant could potentially become part of a botnet or leak your sensitive information—all while appearing to function normally.
But there's hope. We detail promising defense strategies including proper session isolation, robust sandboxing techniques, and advanced encryption methods that allow agents to work with sensitive data without exposing the actual content. The episode emphasizes that security cannot be an afterthought but must be woven into AI systems from the beginning.
As these powerful AI tools become increasingly embedded in our digital lives, understanding their security implications isn't just for tech experts—it's essential knowledge for everyone. Listen now to gain crucial insights into keeping your AI interactions secure and your data protected.
Research: Security of AI Agents
𝗖𝗼𝗻𝘁𝗮𝗰𝘁 my team and me to get business results, not excuses.
☎️ https://calendly.com/kierangilmurray/results-not-excuses
✉️ kieran@gilmurray.co.uk
🌍 www.KieranGilmurray.com
📘 Kieran Gilmurray | LinkedIn
🦉 X / Twitter: https://twitter.com/KieranGilmurray
📽 YouTube: https://www.youtube.com/@KieranGilmurray
Speaker 1:It's kind of wild how fast AI agents are just, well, becoming part of our daily lives, isn't it?
Speaker 2:Oh, absolutely.
Speaker 1:You know writing scripts, querying databases, even just browsing the web. For us they're just everywhere now.
Speaker 2:It really feels like we've hit this point where they're shifting from maybe experimental toys to well, pretty essential tools in a lot of cases.
Speaker 1:Exactly. And you know, while everyone's kind of focused on the cool stuff they can do, are we maybe missing something huge, Like how secure are these things really? What are the hidden risks?
Speaker 2:That is the question, isn't it? Because, like any powerful tech, if security isn't baked in from the start, well, these agents could easily be misused or just cause problems accidentally.
Speaker 1:And that's exactly what we're diving into today. We've got some really interesting research here that puts a spotlight on these security issues, issues that maybe don't get enough attention.
Speaker 2:Yeah, the focus is often on capability, not vulnerability.
Speaker 1:Right. So our mission, basically, is to unpack these risks, figure out why they should matter to you and look at some potential ways to defend against them, all based on this research paper.
Speaker 2:And it's worth remembering. This is specifically about AI agents running on large language models, llms, the ones that generate that incredibly human-like text that can use other tools.
Speaker 1:Got it. So, whether you're deep into tech, maybe thinking about using these agents at work, or you're just curious about AI, understanding these risks is, well, pretty crucial.
Speaker 2:Definitely.
Speaker 1:Okay, let's get into it. First up, session management. Sounds a bit technical maybe, but it's actually fundamental.
Speaker 2:It really is. I mean, think about normal websites. Session management is just how they keep your interactions separate and secure from everyone else's.
Speaker 1:Keeps my shopping cart mine, basically.
Speaker 2:Exactly. It maintains that confidentiality and integrity between you and the server. Pretty standard stuff online.
Speaker 1:Okay, makes sense for the web, but how does that work or maybe not work for AI agents, especially these LLM-powered ones? Seems a little bit tricky to just carry over.
Speaker 2:You'd think so, but it's actually a bit trickier. The research points out a challenge with how LLMs can operate, specifically when they're set to be super precise, zero temperature, technically speaking.
Speaker 1:Meaning they give the same answer every time.
Speaker 2:Pretty much. Yeah, if the input is identical, the output is identical. Very deterministic. Now that predictability makes it surprisingly hard to track the state of an interaction. Whose turn is it? What's the context for this user versus that user?
Speaker 1:Ah, so if you've got like multiple people hitting the same agent using the same underlying LLM brain, things could get tangled up without good session management.
Speaker 2:Precisely. You run a real risk of information leaking across sessions.
Speaker 1:Yeah.
Speaker 2:One user's sensitive data popping up for another, or an action intended for user A getting applied to user B.
Speaker 1:Yikes, that sounds bad, like really bad Privacy, nightmare security headache.
Speaker 2:Absolutely, and it's not just about data leaks or mixed-up actions. The research also mentions denial of service.
Speaker 1:Meaning? The whole thing just grinds to a halt.
Speaker 2:Right. These LLMs need a lot of computational power. If sessions aren't managed well, the system can just get overwhelmed by requests. It could essentially crash, becoming unavailable for everyone.
Speaker 1:Okay. So it's not just secrets getting out, it's the whole service potentially going down. Now the paper mentions something called CoLA and how it views the LLM's state.
Speaker 2:Yeah, CoLA. It's a way of thinking about it. Basically it suggests the LLM's current state, like its working memory for the conversation, is the sequence of questions asked and answers given: QA, QA, and so on.
Speaker 1:Like a running transcript.
Speaker 2:Kind of yeah, and this just highlights how vital it is for the agent system around the LLM to keep those transcripts, those interaction sequences, separate and organized for each user session.
Speaker 1:Okay, that makes the session management challenge much clearer. Let's shift gears a bit. The next vulnerability sounds ominous Model pollution. What's that about?
Speaker 2:Uh-huh. Model pollution is basically when malicious or just unwanted data gets fed into the AI model itself, potentially messing up its integrity over time, affecting how it responds what it knows.
Speaker 1:Like subtly poisoning its knowledge base.
Speaker 2:Exactly. And the tricky part, which figure two in the paper shows nicely, is that individual prompts might seem totally harmless, but when you string enough of them together maybe as training data later on, the cumulative effect can be negative. It can warp the model's behavior.
Speaker 1:Wow, that's insidious. It's not like a direct hack, it's more like death by a thousand paper cuts for the AI's understanding.
Speaker 2:You got it, and it's not always deliberate attacks either. The paper points out this pollution can happen unintentionally, just from normal interactions. If data isn't carefully segregated, imagine data from, say, a customer service bot somehow bleeding into the training set for a coding assistant. It could degrade performance.
Speaker 1:Okay, so accidental corruption is also a risk. That leads us to maybe the most sensitive issue privacy leaks.
Speaker 2:Yeah, this is a huge one, especially as agents start using tools that access our personal stuff. Remember that Samsung story where internal code got leaked via ChatGPT?
Speaker 1:I do. Yeah, that made headlines A real wake-up call.
Speaker 2:Definitely, and it illustrates the risk perfectly. Agents, to be useful, often need to ask for sensitive things. Right, your SSN, maybe bank details, whatever.
Speaker 1:Things you wouldn't just paste into a random chat window.
Speaker 2:Hopefully not. But unlike a traditional app with strict data handling rules, AI agents might send that raw data back to the LLM for processing, for planning the next step.
Speaker 1:Ah, and that's the danger zone, because the LLM might just remember it.
Speaker 2:Exactly. It increases the risk of that sensitive data becoming part of the LLM's memory, potentially extractable later through clever prompting or attacks. The more sensitive data flowing through, the higher the risk.
Speaker 1:That is genuinely chilling. Okay, so we've got session mix-ups model pollution, privacy leaks. Now let's dig into the agent programs themselves. The paper talks about agent program vulnerabilities. Sounds like the nuts and bolts.
Speaker 2:Pretty much this is about the software component, the agent program that actually executes the instructions the LLM comes up with. It interacts with your computer, with the internet, and its actions can cause problems locally or remotely.
Speaker 1:Okay, let's break that down, Starting local. What can go wrong right here on our own devices?
Speaker 2:Well, a big one is how the agent decides what to do. LLMs can hallucinate, just make stuff up. They can be manipulated by adversarial prompts, those tricky inputs designed to cause errors, or even jailbroken to bypass safety rules.
Speaker 1:Jailbroken, like overriding its built-in limits.
Speaker 2:Right. Any of those could lead the LLM to generate instructions telling the agent program to do something harmful on your machine.
Speaker 1:Like what kind of harmful?
Speaker 2:Oh, think about deleting files it shouldn't, or maybe accessing your private emails and sending them somewhere else. The paper gives this example of an agent using FTP for backups.
Speaker 1:FTP? Okay, old-school file transfer.
Speaker 2:Right, and a hacker slips a malicious instruction into the FTP documentation. The agent reads the documentation, the LLM sees the instruction, trusts it, and generates FTP commands that not only back up your files but also sneakily copy them to the hacker's server.
Speaker 1:Whoa, that's clever, using the agent's own process against itself. What about this no-read-up principle the paper mentions, Bell-LaPadula?
Speaker 2:Yeah, that's a classic security concept. Basically, don't let information flow from a high security level to a low one. For an AI agent, even without a hacker, the LLM might just mess up generating, say, a file name or an email address. It could accidentally send sensitive stuff to the wrong place just by picking the wrong token, the wrong word.
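To make that flow rule concrete, here is a minimal sketch (not from the paper) of the kind of check an agent runtime could apply before moving data between destinations; the security levels and the allow_flow helper are illustrative assumptions.

```python
# Minimal illustration of a Bell-LaPadula-style flow check (hypothetical helper).
# Data labelled with a high security level must never flow to a lower-level sink.
from enum import IntEnum

class Level(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    SECRET = 2

def allow_flow(source: Level, destination: Level) -> bool:
    """Permit the transfer only if the destination is at least as sensitive as the source."""
    return destination >= source

# Example: an agent is about to email a file it just read.
file_label = Level.SECRET      # a private document on the user's machine
email_label = Level.PUBLIC     # an arbitrary external address the LLM generated

if not allow_flow(file_label, email_label):
    raise PermissionError("Blocked: secret data cannot flow to a public destination")
```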
Speaker 1:So simple mistakes by the AI can have big security consequences. What about data integrity? Can the agent be tricked into messing up my actual data?
Speaker 2:It can. Yeah, Similar to confidentiality risks. Imagine a malicious app feeding the agent wrong information. The paper uses a flight booking example. The agent gets fed fake info about layovers and ends up booking you a terrible flight. It corrupts the action based on bad input.
Speaker 1:Undermining trust. Okay. And then there's just resource hogging, making my computer unusable.
Speaker 2:Right, availability attacks. A seemingly innocent request could trigger the agent to launch processes that just spiral out of control. Hidden processes, memory leaks. They can eat up your CPU, your RAM, slow everything down, maybe even crash your system.
Speaker 1:And the agents don't really have a way to stop themselves.
Speaker 2:Not typically. No, they often lack the self-monitoring to realize hey, I'm causing a problem here and stop. And the more tools an agent can use, the more complex its plans, the higher the resource demand can get, especially with multiple agents running.
Speaker 1:Okay, that covers the local risks. What about when these agents reach out to the internet? Remote vulnerabilities?
Speaker 2:Yeah, agents with web access, using APIs, can become unwitting tools for attacking remote services. Someone could use a jailbreak, for instance, to make your agent hammer a specific website with requests, overwhelming it and basically turning your agent into part of a denial-of-service attack against someone else.
Speaker 1:So my helpful assistant becomes part of a botnet effectively.
Speaker 2:Potentially yeah. And these agents driven by LLMs? They don't behave like simple bots. Their traffic patterns can be harder to detect and block by the services being targeted. Plus, their ability to plan based on feedback could be exploited to launch more sophisticated, adaptive attacks.
Speaker 1:The adaptability becomes a weapon. Okay, that's a pretty sobering list of vulnerabilities. Let's pivot to defenses. What does the research suggest we can actually do about this?
Speaker 2:Starting with sessions again, the idea is to properly implement session management, much like secure web apps do, and treat each user interaction as its own isolated container.
Speaker 1:Like a secure bubble for each user.
Speaker 2:Exactly. Use unique session IDs. Store interaction history separately, maybe in a key-value database, like the paper shows.
Speaker 1:Keep everything neatly compartmentalized. But you said there were challenges?
Speaker 2:There are technical ones. How do you manage the connection reliably? How do you make sure each request goes to the right session? What happens when a session closes? How do you embed that session ID into requests going to the LLM, especially if everyone's sharing one API key? These need solid engineering solutions.
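As a rough sketch of that idea, and not the paper's implementation, the snippet below isolates each user behind a unique session ID and stores the question-answer history under that key, so one session's context never reaches another's prompt; the in-memory dictionary stands in for a key-value database and call_llm is a hypothetical placeholder.

```python
# Sketch of per-session isolation: unique IDs plus a key-value store of QA history.
import uuid
from collections import defaultdict

# Stand-in for the key-value database described in the paper (e.g. Redis in practice).
session_store: dict[str, list[tuple[str, str]]] = defaultdict(list)

def new_session() -> str:
    """Create an isolated session and return its unique ID."""
    return uuid.uuid4().hex

def handle_request(session_id: str, question: str) -> str:
    """Answer a question using only this session's history as context."""
    history = session_store[session_id]                  # never another user's history
    prompt = "".join(f"Q: {q}\nA: {a}\n" for q, a in history) + f"Q: {question}\nA:"
    answer = call_llm(prompt)                            # hypothetical LLM call
    history.append((question, answer))
    return answer

def call_llm(prompt: str) -> str:
    # Placeholder for the shared LLM behind a single API key; the session ID is tracked
    # by the agent layer here, not handed to the model provider.
    return "..."
```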
Speaker 1:Okay, and the paper mentioned something more formal: state transformer monads. Sounds heavy.
Speaker 2:It's a concept from functional programming. Yeah, think of it as a really rigorous mathematical way to describe the agent's internal state and how it changes with each interaction, like a precise blueprint of its mental process.
Speaker 1:A mathematical model of the agent's state changes. How does that help security?
Speaker 2:Because it's so formal. It opens the door to potentially proving certain security properties about the agent system down the line. It's a foundation for building trust and it could lead to developing things called session types, specifically for AI agents, building on work done for secure web apps and even microkernels using similar monadic ideas.
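For a flavour of that formal view, here is a tiny sketch in Python rather than a functional language: each agent step is a pure function that takes the current state (the transcript so far) and returns both an output and the new state, which is what makes the state changes explicit enough to reason about. The types and the toy respond function are assumptions for illustration, not the paper's formalism.

```python
# State-transformer style: step(state, input) -> (output, new_state), with no hidden mutation.
from typing import Callable

QA = tuple[str, str]
AgentState = tuple[QA, ...]            # immutable transcript of question-answer pairs

def step(state: AgentState, question: str,
         respond: Callable[[AgentState, str], str]) -> tuple[str, AgentState]:
    """One interaction: produce an answer and the explicitly updated state."""
    answer = respond(state, question)
    return answer, state + ((question, answer),)

# Usage: thread the state through explicitly and never share it across sessions.
echo = lambda st, q: f"echo({len(st)}): {q}"   # toy stand-in for the LLM
s0: AgentState = ()
a1, s1 = step(s0, "What is session isolation?", echo)
a2, s2 = step(s1, "Why does it matter?", echo)
```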
Speaker 1:Interesting, a more provably secure foundation. Ok, what about sandboxing? That feels more familiar.
Speaker 2:It is. Sandboxing here just means strictly limiting what the agent program is allowed to actually do, controlling its access to local resources (files, network, CPU) and to remote resources online, creating a safe playground.
Speaker 1:A digital playpen, like you said. How does that work for local stuff?
Speaker 2:You can cap CPU and memory usage, limit storage access and, crucially, restrict its view of the file system, maybe giving each session its own isolated mini file system.
Speaker 1:So it can't roam free on my hard drive.
Speaker 2:Exactly. The paper describes an experiment with this bash agent. One version had full freedom, another was locked in a secure container.
Speaker 1:And the contained one fared much better, I assume.
Speaker 2:Massively better. Yeah, the unrestricted one executed tons of malicious commands successfully. The sandbox one blocked them all. It really drives home that just aligning the LLM's goals isn't enough. You need hard boundaries.
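As a hedged sketch of those local limits, and not the paper's secure-container setup, the snippet below runs an agent-generated shell command under CPU and memory caps on a Unix system; real isolation of the wider file system would still need a container, chroot or VM.

```python
# Sketch of running agent-generated shell commands inside hard resource limits (Unix-only).
import resource
import subprocess

def _apply_limits() -> None:
    # Enforced in the child process before the command runs.
    resource.setrlimit(resource.RLIMIT_CPU, (5, 5))                    # max 5 s of CPU time
    resource.setrlimit(resource.RLIMIT_AS, (256 * 2**20, 256 * 2**20)) # max 256 MB of memory

def run_sandboxed(command: str, workdir: str) -> subprocess.CompletedProcess:
    """Execute a command in the session's own directory with CPU, memory and time caps."""
    return subprocess.run(
        ["bash", "-c", command],
        cwd=workdir,              # per-session working directory (full FS isolation needs a container)
        preexec_fn=_apply_limits, # apply the resource limits to the child
        capture_output=True,
        text=True,
        timeout=10,               # wall-clock cap as a backstop against runaway processes
    )
```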
Speaker 1:A clear demonstration. Makes sense. And for controlling online access?
Speaker 2:Sandboxes can use whitelists, only allowing connections to approved sites or services, blacklists for known bad ones, and rate limiting, stopping the agent from making too many requests too quickly and preventing those DoS attacks we talked about.
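Here is what that outbound control might look like in miniature; the allowlisted hosts and the limit of 30 requests per minute are made-up values, and a production sandbox would typically enforce this at the network layer rather than in application code.

```python
# Toy outbound-request gate: allowlist of hosts plus a simple per-host rate limit.
import time
from collections import defaultdict
from urllib.parse import urlparse

ALLOWED_HOSTS = {"api.example-tool.com", "docs.example-tool.com"}  # illustrative only
MAX_REQUESTS_PER_MINUTE = 30

_request_log: dict[str, list[float]] = defaultdict(list)

def check_outbound(url: str) -> None:
    """Raise if the agent tries to reach a non-approved host or exceeds the rate limit."""
    host = urlparse(url).hostname or ""
    if host not in ALLOWED_HOSTS:
        raise PermissionError(f"Blocked: {host} is not on the allowlist")
    now = time.monotonic()
    recent = [t for t in _request_log[host] if now - t < 60]   # requests in the last minute
    if len(recent) >= MAX_REQUESTS_PER_MINUTE:
        raise PermissionError(f"Blocked: rate limit reached for {host}")
    recent.append(now)
    _request_log[host] = recent
```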
Speaker 1:So controlled boundaries everywhere seems vital. Okay, last defense area: protecting the models themselves. This tackles that pollution and privacy stuff head on, right?
Speaker 2:It does. How do we stop bad stuff or private stuff from flowing between users via the model? The paper splits this into approaches for sessionless models, those that don't track individual users, and session-aware ones.
Speaker 1:Okay, sessionless. First, if the model isn't tracking users, how do we protect it? And user data?
Speaker 2:Well, one way is just not training it on private data or being super careful filtering it out. You can use clever prompt engineering to try and spot sensitive bits, maybe mark them, then you can whitewash the data, replace an SSN with random numbers, for instance, before the model sees it.
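A bare-bones sketch of that whitewashing step might look like the following; the SSN pattern and placeholder scheme are assumptions for illustration, and real pipelines would combine pattern matching with model-based detection of sensitive data.

```python
# Toy de-identification: swap SSN-shaped strings for placeholders before the model sees them.
import re

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def whitewash(text: str) -> tuple[str, dict[str, str]]:
    """Replace each SSN with a placeholder and keep a local mapping to restore it later."""
    mapping: dict[str, str] = {}

    def _swap(match: re.Match) -> str:
        placeholder = f"<SSN_{len(mapping)}>"
        mapping[placeholder] = match.group(0)
        return placeholder

    return SSN_PATTERN.sub(_swap, text), mapping

clean, mapping = whitewash("Client SSN is 123-45-6789, please file the form.")
# clean   -> "Client SSN is <SSN_0>, please file the form."
# mapping stays on the user's side and is never sent to the LLM.
```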
Speaker 1:De-identify it. But what if the agent needs to use that sensitive data somehow?
Speaker 2:That's where things like FPETs come in: format-preserving encryption for text slicing. It's a special encryption scheme where you can still manipulate the encrypted text, like cutting out a specific part, and it corresponds perfectly to manipulating the original text.
Speaker 1:So the agent works on gibberish essentially, but the structure is preserved.
Speaker 2:Exactly. It can perform text operations on the ciphertext without ever seeing the plain text sensitive data. The evaluation they did showed this worked pretty well. The agent could slice and dice the encrypted text almost as effectively as the original.
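To illustrate why that works, here is a toy stand-in, emphatically not real cryptography: because each character is transformed independently, slicing the ciphertext lines up exactly with slicing the plaintext, which is the property a format-preserving scheme exploits. A production system would use a keyed format-preserving cipher such as FF1 instead of this fixed shift.

```python
# Toy position-preserving "encryption" to illustrate slicing on ciphertext.
# NOT real cryptography: a keyed format-preserving cipher (e.g. FF1) would be used in practice.
import string

ALPHABET = string.ascii_letters + string.digits
SHIFT = 7  # toy key

def enc(text: str) -> str:
    return "".join(
        ALPHABET[(ALPHABET.index(c) + SHIFT) % len(ALPHABET)] if c in ALPHABET else c
        for c in text
    )

def dec(text: str) -> str:
    return "".join(
        ALPHABET[(ALPHABET.index(c) - SHIFT) % len(ALPHABET)] if c in ALPHABET else c
        for c in text
    )

secret = "SSN 123-45-6789"
cipher = enc(secret)

# The agent slices the ciphertext without ever seeing the plaintext...
cipher_slice = cipher[4:]            # e.g. "take everything after the label"

# ...and the user can decrypt just that slice locally.
assert dec(cipher_slice) == secret[4:]
```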
Speaker 1:That's really cool, a practical way to work with sensitive data safely. What about homomorphic encryption, FHE?
Speaker 2:FHE is even more powerful for certain tasks. It lets you do mathematical calculations directly on encrypted data. Add encrypted numbers, multiply them, whatever. All without decrypting.
Speaker 1:Wow, so you could analyze encrypted financial data.
Speaker 2:Precisely, or medical data. The agent never sees the raw values and the tests showed FHE worked really well for these kinds of tasks, often with high success rates. Another strong tool for privacy.
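As a hedged taste of computing on encrypted numbers, the sketch below uses the python-paillier package (assuming it is installed as phe); Paillier only supports adding ciphertexts and scaling them by plaintext constants, whereas the fully homomorphic schemes the paper evaluates also allow multiplying ciphertexts together.

```python
# Additively homomorphic sketch with the python-paillier package ("pip install phe").
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair()

# The user encrypts monthly expenses locally; the agent only ever sees ciphertexts.
expenses = [1200.50, 389.99, 74.25]
encrypted = [public_key.encrypt(x) for x in expenses]

# The agent computes on encrypted values without decrypting anything.
encrypted_total = encrypted[0] + encrypted[1] + encrypted[2]
encrypted_projection = encrypted_total * 12        # scale by a plaintext constant

# Only the user, holding the private key, can read the results.
print(private_key.decrypt(encrypted_total))        # approximately 1664.74
print(private_key.decrypt(encrypted_projection))   # approximately 19976.88
```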
Speaker 1:Okay, those are great for sessionless models, but you mentioned a downside to not letting models learn from user interactions.
Speaker 2:Yeah, the downside is the agent doesn't really get smarter or more personalized for you If it can't learn from your specific usage. The experience might feel generic or less helpful over time.
Speaker 1:Makes sense. So how do the session-aware approaches try to balance personalization with privacy?
Speaker 2:Well, you could fine-tune a whole separate LLM for each user, totally private, but very expensive, and it needs lots of data per user. Maybe not practical. There are other methods, like in-context learning, but they have limits too.
Speaker 1:So what's the promising middle ground?
Speaker 2:The paper highlights prompt tuning. Here the main big LLM stays unchanged, frozen, but you add a small set of extra parameters just for your session or your user profile.
Speaker 1:Like little sticky notes attached to the main model.
Speaker 2:Kind of, yeah. These small parameter sets learn from your interactions, remembering your history and preferences, but, crucially, your raw data isn't shared back with the provider of the main model. It offers personalization while keeping the data more contained.
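For a flavour of how that might look in code, here is a minimal PyTorch sketch, a toy stand-in rather than the paper's setup: the base model's weights are frozen and only a small per-user soft-prompt tensor is trained, so personalisation lives in a few kilobytes of parameters that can stay with the user's session.

```python
# Minimal prompt-tuning sketch (PyTorch): the shared base model is frozen,
# and each user gets a tiny trainable "soft prompt" prepended to their inputs.
import torch
import torch.nn as nn

class ToyLM(nn.Module):
    """Stand-in for the large frozen LLM (embedding + one transformer layer + head)."""
    def __init__(self, vocab=1000, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.block = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.head = nn.Linear(dim, vocab)

    def forward(self, token_ids, soft_prompt):
        x = self.embed(token_ids)                                   # (B, T, dim)
        prompt = soft_prompt.unsqueeze(0).expand(x.size(0), -1, -1) # (B, P, dim)
        h = self.block(torch.cat([prompt, x], dim=1))
        return self.head(h)

base = ToyLM()
for p in base.parameters():
    p.requires_grad = False          # the provider's model never changes

# Per-user parameters: a handful of learned prompt vectors kept with the user's session.
user_prompt = nn.Parameter(torch.randn(8, 64) * 0.02)
optimizer = torch.optim.Adam([user_prompt], lr=1e-3)

tokens = torch.randint(0, 1000, (2, 16))             # toy batch of user interactions
targets = torch.randint(0, 1000, (2, 16 + 8))        # toy targets incl. prompt positions
logits = base(tokens, user_prompt)
loss = nn.functional.cross_entropy(logits.reshape(-1, 1000), targets.reshape(-1))
loss.backward()                                       # gradients flow only into user_prompt
optimizer.step()
```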
Speaker 1:Okay, that's a really good rundown of the defenses: session management, sandboxing, protecting the models with encryption or techniques like prompt tuning. So, pulling this all together, what's the big takeaway?
Speaker 2:I think the main point is that, as AI agents get more capable and integrated, we absolutely have to think about these security angles: session integrity, model safety, privacy, the risks of the agent programs themselves. They're all real issues.
Speaker 1:But thankfully there are potential solutions being explored. It's not hopeless.
Speaker 2:Not at all. The defenses we discussed, robust sessions, strong sandboxes, clever model protection techniques like FPETs, FHE and prompt tuning, show viable paths forward. We can build more secure agents.
Speaker 1:It really hammers home that security can't just be bolted on later. It needs to be woven into the design from day one, right alongside performance and features.
Speaker 2:Couldn't agree more. The paper's conclusion says it well. We need agents that are not just powerful but secure and trustworthy. That has to be the goal. As this tech keeps advancing, we need to look past the wow factor and really scrutinize the risks.
Speaker 1:This has been incredibly insightful and, yeah, a little bit sobering too. There's clearly so much more to AI agents than just what they can do for us, which brings me to a final thought for you, our listener. As these agents become even more woven into the fabric of our lives, what are the wider ethical questions, the societal impacts of these security vulnerabilities we've just talked about? And what's our shared responsibility, developers, users, all of us, in ensuring these powerful tools are built and used safely and reliably? Definitely something to think about. I'd encourage you to check out the research paper itself if you want to go even deeper.