
AI Unscripted with Kieran Gilmurray
I am a globally recognised authority on Artificial Intelligence, cloud, intelligent automation, data analytics, agentic AI, and digital transformation. I have authored three influential books and hundreds of articles that have shaped industry perspectives on digital transformation, data analytics and artificial intelligence.
𝗪𝗵𝗮𝘁 𝗗𝗼 𝗜 𝗗𝗼❓
When I'm not chairing international conferences, serving as a fractional CTO or Chief AI Officer, I'm delivering AI, leadership, and strategy masterclasses to governments and industry leaders. My team and I help global businesses drive AI, digital transformation and innovation programs that deliver tangible results.
I am the multiple award-winning CEO of Kieran Gilmurray and Company Limited and the Chief AI Innovator for the award-winning Technology Transformation Group (TTG) in London.
🏆 𝐀𝐰𝐚𝐫𝐝𝐬:
🔹Top 25 Thought Leader Generative AI 2025
🔹Top 50 Global Thought Leaders and Influencers on Agentic AI 2025
🔹Top 100 Thought Leader Agentic AI 2025
🔹Team of the Year at the UK IT Industry Awards
🔹Top 50 Global Thought Leaders and Influencers on Generative AI 2024
🔹Top 50 Global Thought Leaders and Influencers on Manufacturing 2024
🔹Best LinkedIn Influencers Artificial Intelligence and Marketing 2024
🔹Seven-time LinkedIn Top Voice
🔹Top 14 People to Follow in Data in 2023
🔹World's Top 200 Business and Technology Innovators
🔹Top 50 Intelligent Automation Influencers
🔹Top 50 Brand Ambassadors
🔹Global Intelligent Automation Award Winner
🔹Top 20 Data Pros You NEED to Follow
𝗦𝗼...𝗖𝗼𝗻𝘁𝗮𝗰𝘁 𝗠𝗲 to get business results, not excuses.
☎️ https://calendly.com/kierangilmurray/30min.
✉️ kieran@gilmurray.co.uk or kieran.gilmurray@thettg.com
🌍 www.KieranGilmurray.com
📘 Kieran Gilmurray | LinkedIn
The Rise of Virtual Assistants That Never Sleep
Can a virtual assistant truly revolutionize the way we interact with technology?
Join us as we welcome Alan Bekker, CEO of eSelf.AI, who has been blazing trails in the AI landscape. From his academic roots in machine learning to the founding of Voka, a pioneer in human-like voice bots later acquired by Snapchat, Alan's journey is nothing short of insightful.
The episode explores the transformative power of AI in creating multilingual virtual assistants that deliver human-like experiences. Alan Bekker shares insights into integrating these technologies into businesses, overcoming user perceptions, and optimising for quick ROI.
Topics:
• Rise of AI-powered voice and visual agents
• Importance of integrating AI with existing systems
• Multilingual capabilities enhancing customer interactions
• Visual engagement as a key to effective communication
• Common pitfalls in implementing AI solutions
• Future trends in AI applications and adoption
We dive into the innovations driving eSelf.AI's face-to-face visual conversational AI engine, which is transforming industries by providing real-time, multilingual AI agents.
Alan offers us a glimpse into the future of customer interactions, especially how these intelligent agents are setting new standards in sectors like finance and education.
Explore the groundbreaking eSelf product, which is redefining customer engagement by integrating AI with business systems.
Discover how businesses can enhance their operations with customizable AI agents that offer 24/7 multilingual support, seamlessly connecting with CRM systems for real-time data access.
We also tackle the ethical challenges of crafting human-like AI interactions, focusing on the importance of visual engagement in creating natural and engaging user experiences.
Whether it's through facilitating smooth transactions or offering comprehensive customer journeys, eSelf.AI's technological advancements promise to enhance satisfaction and set new benchmarks in AI-driven customer service.
For more information:
🌎 Visit my website: https://KieranGilmurray.com
🔗 LinkedIn: https://www.linkedin.com/in/kierangilmurray/
🦉 X / Twitter: https://twitter.com/KieranGilmurray
📽 YouTube: https://www.youtube.com/@KieranGilmurray
📕 Buy my book 'The A-Z of Organizational Digital Transformation' - https://kierangilmurray.com/product/the-a-z-organizational-digital-transformation-digital-book/
📕 Buy my book 'The A-Z of Generative AI - A Guide to Leveraging AI for Business'
Speaker 1:Have you ever imagined a world where a single individual builds a billion-dollar company? Not a team of hundreds, but a powerful digital workforce, a team of ones and zeros, speaking your language, understanding your industry and working 24-7 to connect with your audience, effortlessly delivering real business value to your customers? This isn't science fiction. It's happening today, powered by the rise of digital workers: video- and voice-cloned virtual assistants. Welcome to AI Unscripted, where I talk to the crème de la crème of the world's best technology companies. Today I have Alan Bekker, CEO of eSelf.AI, the first face-to-face visual conversational AI engine, on to talk about creating multilingual AI agents in minutes, with no coding required whatsoever. Welcome, Alan. For those who don't know you, would you mind giving them a brief introduction please?
Speaker 2:Sure, thanks for having me here today. I'll tell you a little bit about myself. My background: I have a PhD in the field of machine learning and AI, and I published 10 papers across the AI space of computer vision, NLP and speech. I started my PhD just as the deep learning revolution began, so I was lucky enough to publish many papers in multimodal AI. One of my papers was also accepted by Nature, which was an amazing accomplishment for my PhD.
Speaker 2:Then afterwards, you know, as an Israeli, our dream is always to be entrepreneurs. So I asked for my wife's permission to be an entrepreneur. She told me, OK, I'll give you one year. We took a loan from the bank for one year and I told my wife, let me try for one year. If we succeed, that's amazing. Otherwise I'll go back and, you know, work for the multinational companies like Google or Facebook; they will always have a spot for me. So I did my best.
Speaker 2:We founded Voka in early 2018, which was when the NLP and voice space eventually started to work. There were plenty of chatbots out there, so we told ourselves, let's be the first voice bot: human-like voice conversations between humans and machines. That was the vision and the mission of the company. So we built an amazing technology in-house, a real-time, human-like voice agent, and we targeted call centers with this technology. We had big customers like American Express and AT&T. We powered millions of calls in total until the end of 2020.
Speaker 2:Then, out of nowhere, I would say, Snapchat came and acquired my company, which was a huge success for myself, my co-founder and the investors. We didn't raise much money back then, about $6 million, and the acquisition was exceptional. I would say it was a life-changing event for us. Then I joined Snap, leading the conversational AI efforts, working in the early days of the large language models. We were working internally on LLMs, and we were also lucky enough, I would say, to be one of the first design partners of OpenAI. I remember the day they reached out to me and said, hey, Alan, I see that you're leading the conversational AI efforts at Snap. We have something we call GPT-1 and we want to try it out.
Speaker 2:You guys have 400 million daily active users, so it's a nice playground for us to start seeing how it works in real-life scenarios. We were working on our own LLM anyway, so, sure, why not? Let's try it. And for myself, as an AI researcher, I knew that being able to build NLP entirely on deep neural networks, rather than conventional methods, meant we had changed the world, and I was witnessing it with my own eyes, seeing that LLMs now give huge capabilities to the agents. So, to be honest, when we sold the company to Snap, I knew right away what I would do next. I knew that doing voice-to-voice is very cool, but it's not enough. My dream was always to create the perfect human-like machine and, as we speak right now in face-to-face mode, I wanted to create the perfect machine that would do the same. And it started to become possible: first because the LLMs eventually started to work, and because generating speech and video in real time became doable. So we started the company in early 2023 with my co-founder, Elo Shishan, who is one of the best minds in the Israeli industry, and we started to build our technology.
Speaker 2:We had quite a few design partners from different industries because we wanted to learn what the market could do with it. We have customers like Christie's, which is one of the largest real estate enterprises out there. We have banks and financial institutions. We are already working with the largest digital bank in Brazil and with one of the biggest financial institutions in Asia, which is a public company. So we are already powering millions of calls monthly, I would say, with our agents.
Speaker 2:We have the best... I mean, I'm trying to be modest here, but we know the market and we have published a lot of research about it. As of today, we have the best face-to-face interactive engine, which means we are able to provide a faster response time than even the best voice-only bot in the market. We have face-to-face, which is video in, video out, yet we provide a faster response time than voice in, voice out, which is obviously a simpler modality. So, overall, the research team is working very hard on providing the best experiences, and I assume during our conversation I will have the chance to elaborate more on it. That's, in a nutshell, about myself.
Speaker 1:That's a pretty good nutshell there, and well done on all the success. I hope your wife's delighted now with that one year of business success. Let me jump in and ask a couple of questions. There are a lot of trends driving the adoption of AI and voice agents in industries today. What are you actually seeing? Why are companies so enthused with agents, not necessarily people?
Speaker 2:That's a great question. So I think the market is being well educated by big companies like OpenAI and Google, who are showing very nice demos.
Speaker 2:And I would say it's not only a demo; it's a real product that is working, and you can see a voice-to-voice conversation, even a multi-language voice-to-voice conversation, which is something that wasn't possible until now. We are speaking right now in English; if we want to switch our conversation to Italian or French, it's not possible, because neither of us knows those languages. Right now, with AI agents, you can actually have a multi-language conversation with the same AI in the same place, which is amazing. It's a nice playground to play with.
Speaker 2:But how can you solve a real problem for a business using those technologies? That's the main problem that eSelf is trying to solve. If you're a bank, a real estate company or any type of business in the world and you want to provide value to your customers, you can have them stick with a human agent, which is amazing, but he's not available 24-7, he's obviously not multilingual, human labor is not cheap, and a human can only handle one call at a time. That's a huge problem, right? But again, using ChatGPT or Google Gemini is not good enough, because they need to integrate with your business.
Speaker 2:So being able to have a voice-to-voice or even a face-to-face interaction without being integrated with the business is not good enough, and that's exactly what eSelf is doing. We integrate with the CRMs of our customers, meaning that we can extract data from the CRM in real time based on the user's requirements, and we can update the business's CRM based on the end user's requirements. In order to actually provide real value to the end user, having the voice conversation is not enough: you need to be integrated with the databases and data sources of your customer. That's what we do at eSelf. And not only integration, but also acting in the real world.
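The loop Alan describes, read from the CRM in real time based on what the user asks, act, then write the result back, can be sketched in a few lines. Everything below is a hypothetical illustration (the `InMemoryCRM` stand-in, the `handle_turn` helper, the record fields), not eSelf's actual API; a real deployment would put Salesforce or HubSpot calls behind the same interface.

```python
class InMemoryCRM:
    """Stand-in for a real CRM (Salesforce, HubSpot, ...)."""

    def __init__(self):
        self.records = {"cust-42": {"name": "Dana", "card_locked": True}}

    def get(self, customer_id):
        return self.records[customer_id]

    def update(self, customer_id, **fields):
        self.records[customer_id].update(fields)
        return self.records[customer_id]


def handle_turn(crm, customer_id, user_utterance):
    """Toy dialogue step: read state from the CRM, act, then confirm."""
    record = crm.get(customer_id)
    if "unlock" in user_utterance.lower() and record["card_locked"]:
        crm.update(customer_id, card_locked=False)  # write-back, the key step
        return f"Done, {record['name']} - your card is unlocked."
    return f"Hi {record['name']}, how can I help?"


crm = InMemoryCRM()
reply = handle_turn(crm, "cust-42", "Please unlock my credit card")
print(reply)
print(crm.get("cust-42")["card_locked"])  # CRM state was updated
```

The point of the sketch is the shape, not the logic: the agent is only useful because each turn both reads from and writes to the business's system of record.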
Speaker 2:Let's say... well, I mean, we have this use case with one of our largest customers, Christie's: we do a pre-qualification screening call with a potential real estate lead.
Speaker 2:Let's say I want to buy a property in London and I'm browsing your website. Instead of browsing the website, I can have a conversation with the eSelf agent, which is integrated with Christie's database and, in real time, based on the user's requirements, can filter the right property for him, even show images and do a virtual tour. That's live right now: you can go to Christie's, see it and play with it. That's an amazing thing. But now the end user wants to schedule a meeting and actually see the property; nobody would buy a property based on a virtual tour, right? So you need to be integrated with the calendar of a real agent to schedule an appointment, and that's something we enable. We are not only integrated with the data sources, we are also integrated with calendars and payment checkouts, in order to actually complete an action, which is either scheduling an appointment or completing the purchase of a product.
Speaker 1:I think that's important as well, isn't it, Alan? Part of the problem with a chatbot, going back in the day, is that it can do a certain amount and then it ends. What you're describing feels more effective for people these days: you're using AI to screen a call, but intelligently, then using that AI to connect to back-end databases to actually have an informed conversation, and then allowing the AI to engage with a particular person, their calendar and the financial system, to make what I would describe as a complete customer experience. Or did I hear that wrong?
Speaker 2:That's completely correct. You got it.
Speaker 1:So tell me this, though: sometimes people can get put off by AI that looks like people. AI virtual people can cause dissonance in people's heads; they know they're talking to a non-person, not a human. So how do you overcome that? Or are you trying to overcome that?
Speaker 2:That's a great question. So, first of all, I think, and we can tell this from real-life production, people tend to assume the AI today is so great that users wouldn't distinguish between a real person and AI. But we're not there, right? I mean, we're completely not there. My parents are above 60 and not tech-oriented, and even they never thought they were speaking with a human rather than AI.
Speaker 2:From an ethical point of view, we haven't yet reached the point where the AI is perceived to be human in every aspect: not only the conversation, but the voice, the intonation and the visual. We are not there yet. Not eSelf, not OpenAI, not Gemini. Hopefully we'll be there in the coming years, but we are not.
Speaker 2:The question is what we're trying to achieve when we build a human-like experience.
Speaker 2:So my thesis is, and it's based on my previous experience from my PhD and at Voka, that people feel comfortable speaking with something that is as human as possible, because we are used to speaking with humans.
Speaker 2:So if you give them an experience that is not human-like, that's a new type of conversation which they're not used to, so obviously they won't feel comfortable interacting with it and the conversation won't be engaging enough. This is why we're trying to provide the most human-like experience. And when I say most human-like, it means you need to let the user speak freely and respond to the user the same way humans do. We are interacting right now: you can hear me, you can even interrupt me in mid-sentence. So there's a bunch of capabilities we need to give the AI in order to let end users feel comfortable interacting with it. The answer to your question is that nobody yet thinks they're speaking with a human rather than AI, but still giving them the most human-like experience is crucial to make them comfortable interacting with the AI, and actually to have a more engaging conversation and a more productive outcome for the businesses acquiring the AI capabilities.
Speaker 1:Yeah, that's fascinating. It's interesting, because when I see pictures of AI, I see robots, which seems slightly odd; that puts me off. When I see something too human-like, I'm sort of excited by it and slightly worried by it as well. So again, it's interesting how different generations accept or don't accept technology. How difficult, or how complex, is it to build an AI agent?
Speaker 2:So it depends how you do it, right? What we did at eSelf... and again, we had the same problem, I would say, in my previous company. We had a bunch of different customers and we needed to let them create their own agents, and back then we had only one voice for all of our agents, nothing like today. We supported only English, not like today, and the same went for the knowledge base and the integrations. So I learned in my previous company that in order to have a scalable business, you need to have a studio, and this is why we built the eSelf studio, which allows every business and individual in the world to create their own agent, I would say, in minutes.
Speaker 2:So how does it work? First you choose the face and voice you would like your agent, representing your business, to have. Then you can upload materials, either PDF files, Word files or links to your website, to build the knowledge base of the agent. And finally, you define the conversation goal of the agent. The conversation goal might be to close a deal, to collect some data points from the user, to integrate with your Salesforce eventually, or to schedule an appointment on the calendar.
Speaker 2:So there are a bunch of different conversation goals. Then, basically, when you hit the Publish button in our studio, your agent is ready and you have a URL link that you can embed very easily into your existing pipelines. For example, you can embed it in your website or your LinkedIn, Instagram and Facebook campaigns, so every person who hits that link will actually be in a Zoom call with the agent you just created. Then you can go back to the studio and, if you want to modify the face, the voice, the knowledge base of the agent or the integration, you can just do it yourself, and the same link will be preserved for you and your customers; your end users will still interact with your new, updated agent. And then you, as a business owner, have a history of the conversations. You can see what your users are speaking about.
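The studio flow just described, pick a face and voice, attach knowledge sources, set a conversation goal, hit Publish, and get back a link that stays stable across later edits, can be mocked up as follows. `AgentStudio`, `AgentConfig` and the example domain are all invented for illustration; the real eSelf studio is a no-code web interface, not a Python SDK.

```python
import dataclasses
import hashlib


@dataclasses.dataclass
class AgentConfig:
    face: str
    voice: str
    knowledge_sources: list  # PDFs, Word files, website URLs
    goal: str                # e.g. "close_deal", "schedule_appointment"


class AgentStudio:
    """Toy model of a publish step that returns a stable agent URL."""

    def __init__(self):
        self._agents = {}

    def publish(self, name, config):
        # Slug is derived from the agent's name only, so re-publishing
        # an edited config keeps serving the same, already-embedded URL.
        slug = hashlib.sha1(name.encode()).hexdigest()[:8]
        self._agents[slug] = config
        return f"https://agents.example.com/{slug}"


studio = AgentStudio()
url = studio.publish("history-teacher", AgentConfig(
    face="tutor-01", voice="en-warm",
    knowledge_sources=["curriculum_grade4.pdf"],
    goal="teach_lesson"))

# Editing the agent (new face, new voice) keeps the link stable.
url2 = studio.publish("history-teacher", AgentConfig(
    face="tutor-02", voice="es-warm",
    knowledge_sources=["curriculum_grade4.pdf"],
    goal="teach_lesson"))
print(url)
```

The stable slug is the point of the design: you embed the URL once in a website or campaign, then keep editing the agent behind it without touching the embed.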
Speaker 1:You have the summary of the conversations.
Speaker 2:You have the transcript of the conversations, which lets you learn about the actual experiences of your end users with your agent. With a human agent, that's not built in, but with eSelf and an AI agent it is built in, because obviously everything is transcribed: we are the agent, so we know what the agent said, and we hear you as the user, so we know what you're saying.
Speaker 1:Which feels a bit more like control with a small c. In other words, the challenge with human agents is them remembering everything they've actually said, unless a call is recorded; and then a call needs to be recorded and transcribed to get intelligence out of what did or didn't happen. With all calls being recorded, you've got a security trail and you can analyze it afterwards, which is exciting. But what are some of the common pitfalls companies encounter when trying to integrate agentic or visual AI agents or solutions into their businesses? Because it all sounds a little bit too easy, but it's not; technology is never simple.
Speaker 2:Technology is never simple uh, that's a great question. So I would say that businesses, they try eventually to provide that end value to the users. That's the only reason why they're purchasing agents, right, they want to automate something, they want to scale something, they want to release workforce to do other stuff, they want to provide a better customer journey, and so on. So in order to really do that, you need to be integrated with your own database, your own CRMs and the gap that they're seeing between seeing Gemini demo or OpenAI demo and then to reality is huge, they say, ok, so we so just so great.
Speaker 1:So let's integrate it.
Speaker 2:But then they understand that the integration is a nightmare, because to integrate you need to have security checks, and you need to be able to easily extract data from, and update, your own database, which is not trivial at all.
Speaker 2:This is why we built, you know, pre-made integrations for the main vendors. We know that most customers in the world have Salesforce, HubSpot or some very specific type of CRM, so once we built it for Salesforce, we can offer it to every customer in the world. That's quite easy. I would say that what eSelf adds on top of it, and what I would say is our unique value proposition, is the visual engagement. When we have a face-to-face conversation, as we do right now, you or I can share visuals with each other. Let's say I want to share some graphs or some images; I want to show you something.
Speaker 1:I can share my screen with you.
Speaker 2:So this is why, at eSelf, we did the same. Our agents are not only face-to-face agents with a very good-looking avatar and very good lip sync; we also added on top a visual presentation of whatever you're interested in. In a bank user journey, if you have a question about how to unlock your credit card, the agent in real time shares the screen with you and can actually show you a step-by-step guide in the application: you go there, you click this tab, and then you can unlock your credit card.
Speaker 2:And in the educational space, which is a huge business for us, when engaging with students, everyone knows the best way to provide and consume information is through visuals. Research says that a visual presentation is 60% more engaging than a voice-only conversation. This is why, when we teach our users, our students, about history, geography or science, we generate in real time images, videos and illustrations of the actual topic we're teaching, because we understand the students will be more engaged by seeing the visuals, and they will also remember the lecture much better by having the visuals in their minds.
Speaker 1:Well, I love that, the visual piece, because I get the figure that 60 to 70 percent of people are visual, and therefore the more you can show them, the more you can integrate. I think people would be quite surprised, though, by the capabilities here, because we're not talking about just answering a question. We're actually talking about engaging with end users, showing them visuals, sharing screens, doing a whole host of things you would traditionally get with a person. So it's exciting times. But what are the essential components to get companies a faster ROI, or at least an ROI, with this technology, Alan?
Speaker 2:So the first key component, as I said, is to provide something that is easy to integrate; to be able to distribute it fast, it needs to integrate with existing workflows. That's what we did at eSelf. Second, it should be easy to create many agents, because in your own business you might need 20 or 40 different agents, so it should be easy to deploy and build new agents. And the third one is being able to provide the information your end users are looking for. For example, in education, for an average school we provide around 30 or 40 different teachers, so they can have a teacher in history, a teacher in math, a teacher in science for dedicated grades, because the topic taught in fourth grade is not the same as in fifth grade, right? So now those schools can easily have access to an AI teacher of history for fourth grade, fifth grade and so on. And the students at home can play with it.
Speaker 2:Let's say they finished the class at school and didn't get something.
Speaker 2:Maybe they missed the class because they were sick, or maybe they just want to practice. They can just go to the fourth-grade history teacher from their home, from mobile, from desktop, wherever; they just hit the link and they can have a conversation with the history teacher, which is amazing, something that wasn't possible until now.
Speaker 2:And the most exciting part is that, as I said, we are multilingual. We can speak up to 30 different languages, so the same agent can cover the most common languages in the world. You can go to our website, play with it and tell the agent, I don't understand English, maybe we can have a conversation in Spanish. Then, in real time, she switches to Spanish and you can have a conversation in Spanish, which is something that wasn't possible until now. You know, we see a lot of immigrants all over the world, especially in the US. They go to a public school in English, but those students might speak another language.
Speaker 1:So how they can interact.
Speaker 2:So right now they can interact completely natively with our AI. This is why we believe the educational space has huge potential for our technology, and we also see the impact, which I would say everyone will agree is very positive.
Speaker 1:It's a very positive impact on our society: being able to educate every person in the world, regardless of their economic background or ethnicity, at relatively low marginal cost, because, as you mentioned, whether you have one agent, 50 agents or 100 agents, they can all work simultaneously. That's the joy, I think, of cloud computing or hyperscaler computing.
Speaker 1:Nowadays you can just put more horsepower at it and get more of the same thing, whereas training people could be difficult. So what do you see as the future of AI, visual AI and agentics? What do you think is coming beyond what we've described here today?
Speaker 2:Yeah. So I believe 2025 will be a year in which we will see applications of AI. In 2024, the companies that succeeded the most were the companies building foundational models, right? Either they do LLMs, or video generation engines, or text-to-speech engines; ElevenLabs is a great example, they did only a text-to-speech engine. But 2025 is a go or no-go year, right? Because if we don't see application companies actually using those foundational models, the foundational models won't see more revenues, right?
Speaker 2:Eventually, someone needs to use it. In 2024 the companies were building stuff, but 2025 is the year we need to see adoption. If we don't see adoption, the whole pyramid, the hardware (which is controlled by NVIDIA, by the way) and then the foundational models of OpenAI, Anthropic and so on, everything will collapse.
Speaker 2:So we are at the top of the pyramid, an application-layer AI company, and we believe this will be a great year for us, because the lower layers in the pyramid have already educated the market. Right now the market is ready to adopt AI applications: to find the right use cases, the right customers, the right applications where the end user will eventually gain value from all the different layers in this pyramid.
Speaker 1:Yeah, well, thank you to all the companies who've spent billions of marketing dollars to save you having to spend the same billions. I get that as well. I think 2024 was very much hype and excitement and building the foundations and everything else, and now 2025 comes down to real applications of this amazing technology being put into the wild and earning companies a real return on investment. That's the bit that will get people excited and keep this momentum, or flywheel, moving on. If people want to find out a little bit more about you and eSelf, how do they go about doing that?
Speaker 2:That's quite easy. They can visit our website, www.eself.ai, and play with our agents for free; they can experience it right away. We have 40 or 50 pre-built agents, I don't remember the exact number right now, from different industries and for different applications that they can play with, and then they can go to the studio.
Speaker 2:The studio is a self-service platform which allows them to actually build the AI agent by themselves. They can do it in minutes; it's up to them. And then, obviously, if they need further help from us, they can reach out, and our team will have a conversation with them and see how we can integrate with the more sophisticated CRMs in case it's needed.
Speaker 1:Fantastic, Alan, thank you so much indeed, and thank you everyone for listening in today. Look, this is an exciting time to be in the world of business and the world of technology. All the technology we need is now there, it works, and it's only going to get better. And, as someone said, AI is the worst it's ever going to be. So I can't wait to see what happens and what unfolds in the future. Until next time, everyone, thank you for listening and thank you for joining in. I'll see you all soon. Thank you so much indeed.