
The Digital Transformation Playbook
Kieran Gilmurray is a globally recognised authority on Artificial Intelligence, cloud, intelligent automation, data analytics, agentic AI, and digital transformation. He has authored three influential books and hundreds of articles that have shaped industry perspectives on digital transformation, data analytics, intelligent automation, agentic AI and artificial intelligence.
What does Kieran do?
When I'm not chairing international conferences, serving as a fractional CTO or Chief AI Officer, I'm delivering AI, leadership, and strategy masterclasses to governments and industry leaders.
My team and I help global businesses drive AI, agentic AI, digital transformation and innovation programs that deliver tangible business results.
Awards:
🔹 Top 25 Thought Leader Generative AI 2025
🔹 Top 50 Global Thought Leaders and Influencers on Agentic AI 2025
🔹 Top 100 Thought Leader Agentic AI 2025
🔹 Top 100 Thought Leader Legal AI 2025
🔹 Team of the Year at the UK IT Industry Awards
🔹 Top 50 Global Thought Leaders and Influencers on Generative AI 2024
🔹 Top 50 Global Thought Leaders and Influencers on Manufacturing 2024
🔹 Best LinkedIn Influencers Artificial Intelligence and Marketing 2024
🔹 Seven-time LinkedIn Top Voice
🔹 Top 14 people to follow in data in 2023
🔹 World's Top 200 Business and Technology Innovators
🔹 Top 50 Intelligent Automation Influencers
🔹 Top 50 Brand Ambassadors
🔹 Global Intelligent Automation Award Winner
🔹 Top 20 Data Pros you NEED to follow
Contact my team and me to get business results, not excuses.
https://calendly.com/kierangilmurray/30min
kieran@gilmurray.co.uk
www.KieranGilmurray.com
Kieran Gilmurray | LinkedIn
The ChatGPT Education Test
The educational landscape is rapidly evolving with AI tools, but what does the evidence actually tell us about ChatGPT's effectiveness in learning environments? Moving beyond the hype and confusion, we dive deep into a groundbreaking meta-analysis that examined 51 different research studies conducted between 2022 and 2025.
Listen in as Google NotebookLM's voice-generated podcast hosts explain.
TLDR:
- ChatGPT shows large positive impact on learning performance (effect size 0.867)
- Greatest impact on critical thinking occurs in STEM fields when ChatGPT acts as an intelligent tutor
- Effectiveness depends on thoughtful integration by educators
- Works best when designed to support deeper learning rather than providing quick answers
The results are striking. Students using ChatGPT showed significant improvements in learning performance with an effect size of 0.867 โ considered large in educational research. The impact on learning perception and higher-order thinking skills was moderate but still consistently positive. When we examine where ChatGPT shines brightest, the patterns become clear: skills-based courses saw the strongest positive effects, while problem-based learning environments (effect size 1.113) created the ideal conditions for ChatGPT to enhance student performance.
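A quick note on how to read those numbers (a rough sketch, assuming the study reports the standard Hedges' g used in most educational meta-analyses): g is a standardised mean difference,

g = (mean score of the ChatGPT group - mean score of the comparison group) / pooled standard deviation,

with a small additional correction for sample size. So g = 0.867 means students using ChatGPT scored, on average, roughly 0.87 pooled standard deviations above their comparison groups. By the usual conventions, values around 0.2 count as small, 0.5 as moderate and 0.8 or above as large, which is why the performance effect is described as large and the perception and higher-order thinking effects as moderate.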
Timing matters too. Performance benefits peak during 4-8 weeks of use, suggesting a sweet spot between the initial learning curve and potential over-reliance. Interestingly, positive attitudes toward learning continue to grow the longer students use ChatGPT. For developing critical thinking skills, STEM fields benefit most, especially when the AI functions as an intelligent tutor rather than a passive tool.
These findings have profound implications for how we think about AI integration in education. ChatGPT demonstrates effectiveness more than double that of traditional AI assessment tools previously studied, pointing to its versatility as both a learning assistant and potential tutoring companion. However, thoughtful implementation remains essential โ the technology works best when deliberately structured to support deeper learning rather than simply providing quick answers.
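(For scale: the earlier meta-analysis of traditional AI assessment tools cited in the episode reported an average effect size of 0.390, and 0.867 / 0.390 ≈ 2.2, which is where "more than double" comes from.)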
How can educators and students leverage these insights? Consider strategic implementation in problem-solving contexts, especially for skills development and STEM subjects. Use ChatGPT as an interactive tutor rather than a passive reference tool when possible, and be mindful of optimal duration periods. Beyond the technology itself, these findings invite us to reconsider what uniquely human elements of education become even more crucial as AI tools continue to evolve.
Listening to this episode and want to try implementing these evidence-based strategies yourself? We'd love to hear about your experiences using AI in learning environments โ share your thoughts and join the conversation!
Link to research: The effect of ChatGPT on studentsโ learning performance, learning perception, and higher-order thinking: insights from a meta-analysis
Contact my team and me to get business results, not excuses.
https://calendly.com/kierangilmurray/results-not-excuses
kieran@gilmurray.co.uk
www.KieranGilmurray.com
Kieran Gilmurray | LinkedIn
X / Twitter: https://twitter.com/KieranGilmurray
YouTube: https://www.youtube.com/@KieranGilmurray
Speaker 1:Welcome to the Deep Dive. Today we're really getting into something fascinating. How well is ChatGPT actually performing in education?
Speaker 2:Right. It's a huge question and we've got some really solid data to dig into for you today, based on a new meta-analysis.
Speaker 1:A meta-analysis, so that's like a study of other studies right, pulling lots of research together.
Speaker 2:Exactly. This one compiled results from, get this, 51 different research projects, all done between November 2022 and February 2025. So pretty current stuff.
Speaker 1:Wow, 51 studies. That gives us a much bigger picture than just one experiment.
Speaker 2:For sure, and it means we're moving beyond just anecdotes and opinions. We're looking at evidence.
Speaker 1:OK. So what's our mission with this deep dive? What are we trying to figure out from this big analysis?
Speaker 2:Well, we want to unpack what it tells us about ChatGPT's actual impact on students. We're looking at three main things: their learning performance, you know, how well they actually do.
Speaker 1:Grades and test scores, that sort of thing.
Speaker 2:Yeah, and then their learning perception, basically how they feel about the learning process when using it.
Speaker 1:Okay, performance and perception. What's the third?
Speaker 2:And this is a big one: higher order thinking. Is it actually helping students develop critical thinking, problem solving, that kind of deeper reasoning?
Speaker 1:Right, because that's a major concern you hear. You hear so many conflicting things, like some people think it's revolutionary, others are worried. So let's dive in. What did these 51 studies, all combined, actually find overall?
Speaker 2:Okay, so the headline finding, looking across all that research, is pretty striking. The meta-analysis found a, well, a large positive impact overall on learning performance.
Speaker 1:Large positive. Can you put a number on that?
Speaker 2:Yeah, they used something called an effect size, a measure of impact strength. It came out as G equals 0.867. In educational research that's considered a large effect, pretty significant.
Speaker 1:Okay, so generally using ChatGPT seems linked to students doing noticeably better. What about the other areas, perception and thinking skills?
Speaker 2:There was an impact there too, but more moderate. For enhancing learning perception, how students feel, the effect size was 0.456.
Speaker 1:Moderately positive. Okay.
Speaker 2:And, interestingly, almost identical for fostering higher order thinking skills. G equals 0.457. Also moderately positive.
Speaker 1:So a big boost to performance and a decent, noticeable bump for perception and higher level thinking. That's already a huge takeaway.
Speaker 2:It is. It suggests ChatGPT isn't just hype. There's substance to its potential in education, at least based on this data.
Speaker 1:But I imagine it's not always that straightforward. Like, does it work equally well everywhere, for every subject, every student?
Speaker 2:Ah, exactly that's where it gets more nuanced. The overall positive trend is clear, but the analysis also looked really closely at when and how it's most effective. They looked for these moderating factors.
Speaker 1:Moderating factors, yeah, things that change how strong the effect is.
Speaker 2:Precisely. Things that influence the impact. Let's start with learning performance again. What makes ChatGPT more or less effective for improving grades and scores?
Speaker 1:Okay, yeah, this is really practical for educators and students. What did they find?
Speaker 2:Well, one really big factor was the type of course students were taking. The differences were statistically significant. The strongest positive effect, a really robust G of 0.874, was in courses focused on skills and competencies development.
Speaker 1:Skills and competencies.
Speaker 2:Yeah.
Speaker 1:Like learning specific software, or maybe technical writing or lab techniques, that kind of thing.
Speaker 2:Exactly. Things where there are often clear steps, well-defined tasks.
Speaker 1:And why do you think it works so well there?
Speaker 2:The thinking is that ChatGPT is great at providing, like, immediate, targeted feedback for those kinds of tasks. If you're learning code, it can debug; if you're practicing a formula, it can check your work instantly. That rapid feedback loop is really powerful for skill building.
Speaker 1:That makes a lot of sense, less waiting for a human teacher to grade something specific. What about other subjects?
Speaker 2:It was still effective in STEM fields, science, tech, engineering, math, and also in language learning and academic writing. The effect sizes were decent: 0.311 for STEM and 0.531 for language and writing.
Speaker 1:So still positive, but not quite as impactful as in those direct skills courses.
Speaker 2:Right. It suggests the benefit might be a bit more pronounced when the learning goal is a very specific practical skill.
Speaker 1:Okay, so course type matters. What else influences performance?
Speaker 2:The learning model, how the course itself is structured, also showed really significant differences. This was quite striking. Also, the biggest effect by far was in problem-based learning. Huge effect size, g equals 1.113.
Speaker 1:Wow, over 1.0. Problem-based learning, where you learn by tackling complex problems?
Speaker 2:Yeah, exactly. Students work through messy, often real-world style problems.
Speaker 1:And ChatGPT helps there how? By giving answers?
Speaker 2:Not necessarily giving the final answer, but maybe helping students break down the problem, suggesting different angles to consider, providing background information quickly acting as a sounding board for ideas. It seems to really support that kind of active problem-solving process.
Speaker 1:Okay, so it's a good partner for tackling tough challenges. Where did it have the least impact then?
Speaker 2:Interestingly, the weakest effect on performance was found in project-based learning. The effect size there was much smaller, only 0.239.
Speaker 1:Ah, project-based. That often involves longer-term, maybe more creative or real-world application projects. Why wouldn't ChatGPT help as much there? You'd think it'd be useful for research or brainstorming.
Speaker 2:That's a great point, and it might be useful for parts of the project, but the researchers suggest that maybe project-based learning relies more heavily on the whole integrated process: planning, execution, collaboration, presentation. Maybe things beyond just the discrete problem-solving steps where ChatGPT excels.
Speaker 1:So the overall outcome of the project might depend on factors ChatGPT doesn't influence as much.
Speaker 2:That could be it. It's less about just finding information or solving small bits and more about the bigger picture, the synthesis, maybe even teamwork, which isn't ChatGPT's forte.
Speaker 1:Gotcha. Were other learning models looked at?
Speaker 2:Yes, and there were still positive effects in others, like personalized learning, contextual learning, reflective learning (that one was quite high too, actually, 0.866) and mixed models. So it has broad applicability, just varying levels of boost depending on the approach.
Speaker 1:Okay, course type, learning model. What about time? Does how long you use ChatGPT make a difference to performance?
Speaker 2:It does, yes. Duration was another significant factor.
Speaker 1:What was the ideal time?
Speaker 2:The analysis found the largest effect on learning performance when the duration of use was between four and eight weeks. The effect size there was really high. G equals 0.999.
Speaker 1:Four to eight weeks, so like a good chunk of a semester or a focused module. That feels like a sweet spot.
Speaker 2:It seems to be. Enough time to learn how to use it effectively, integrate it into your workflow, but maybe not so long that other factors come into play.
Speaker 1:Like what? What happened with shorter or longer use?
Speaker 2:Well, for durations of a week or less, the effect was much smaller, only 0.332. Suggests maybe there's a learning curve; you need time to get the hang of it.
Speaker 1:Makes sense.
Speaker 2:And, interestingly, for durations longer than eight weeks, the effect size actually dipped slightly, down to 0.531. Still positive, but less than that four-to-eight-week peak.
Speaker 1:Any ideas why?
Speaker 2:Yeah.
Speaker 1:Maybe over-reliance? It becomes a crutch.
Speaker 2:That's definitely a potential explanation the researchers floated. Maybe engagement drops off, or students rely on it too much instead of internalizing the concepts themselves. Needs more research, but it suggests just using it forever isn't necessarily the optimal path for performance gains.
Speaker 1:Fascinating, so thoughtful integration over a defined, substantial period seems key for performance. Now what about things like grade level, or whether ChatGPT was used as, say, a tutor versus just a tool? Did those matter for performance?
Speaker 2:Surprisingly no. The analysis didn't find significant differences based on grade level, the specific role ChatGPT played or the general area of application when it came to learning performance.
Speaker 1:Really? So high school versus university, tutor versus tool, didn't fundamentally change the performance boost?
Speaker 2:Apparently not, according to this meta-analysis. The suggestion is that its core utility for helping students learn material and complete tasks might be broad enough to overcome those differences, at least for performance outcomes.
Speaker 1:Okay, that's really interesting. Let's switch gears then to learning perception how students felt about learning. What influenced that?
Speaker 2:Right. So for perception it was actually simpler. Only one factor showed a significant moderating effect, and that was duration again, how long they used it.
Speaker 1:Okay, and what was the trend there? Same as performance, with a peak?
Speaker 2:No, actually quite different. For learning perception, the positive feeling increased the longer students used ChatGPT. The effect rose steadily with time.
Speaker 1:So the longer they used it, the better they felt about learning?
Speaker 2:Exactly. The largest effect size, a quite strong G of 1.054, was seen for usage durations of more than eight weeks.
Speaker 1:Wow. So, unlike performance, which maybe peaked earlier, positive feelings kept growing. Why might that be?
Speaker 2:It could be that sustained use leads to more familiarity, more confidence in using the tool and perhaps experiencing those performance benefits consistently over time reinforces a positive attitude. You feel like you have ongoing support.
Speaker 1:That makes sense. You get used to it. It keeps helping you succeed, so you feel better about the whole process.
Speaker 2:That seems to be the implication. Shorter durations like under a week or one to four weeks showed much smaller positive effects on perception. It takes time to build that positive feeling.
Speaker 1:And the other factors, course type, learning model, role, grade level, did they impact perception significantly?
Speaker 2:Nope. For perception it really seemed to be about the length of exposure and use. Consistent longer-term use seems to foster that positive attitude towards learning with ChatGPT.
Speaker 1:All right, performance and perception covered. Now let's get to the really tricky one: higher-order thinking, critical analysis, complex reasoning. What shaped ChatGPT's impact there?
Speaker 2:Okay, so remember, the overall impact here was moderately positive, G equals 0.457. But again, moderators matter. The type of course showed significant differences.
Speaker 1:And where was it most helpful for developing these thinking skills?
Speaker 2:STEM and related courses, science, technology, engineering, math. That's where they found the largest positive effect on higher order thinking, with an effect size of 0.737.
Speaker 1:STEM again. Why do you think it helps more with critical thinking in those fields specifically?
Speaker 2:Well, STEM fields often involve complex problem solving, analyzing data, designing solutions, tasks that inherently require higher order thinking. The researchers suggest ChatGPT might be particularly good at supporting the reasoning processes needed for that kind of work, maybe helping students explore complex concepts or evaluate different approaches within those domains.
Speaker 1:So it aligns well with the type of thinking needed in STEM.
Speaker 2:That seems plausible. The effect was smaller in language learning and academic writing, and also in skills and competencies development courses. Still positive, but less pronounced for higher order thinking specifically in those areas.
Speaker 1:Maybe because those courses sometimes focus more on, say, mastering grammar rules or specific writing formats, rather than purely analytical reasoning.
Speaker 2:Could be. It suggests that while it helps with the tasks in those courses, it might not be pushing the deeper analytical skills quite as much as it does in STEM problem-solving contexts.
Speaker 1:Interesting. What else influenced higher order thinking?
Speaker 2:The role ChatGPT played was also significant.
Speaker 1:Okay. Tutor, tool?
Speaker 2:It was most effective in fostering higher order thinking when it acted as an intelligent tutor. The effect size there was really substantial. G equals 0.945.
Speaker 1:An intelligent tutor, so providing more personalized feedback, maybe asking guiding questions, adapting to the student's level.
Speaker 2:Exactly that kind of tailored interactive guidance seems much more effective at pushing students to think more deeply, to analyze, reflect and grapple with complexity, compared to just using it as a more passive tool.
Speaker 1:That makes intuitive sense. A conversation, even with an AI, that prompts you to think harder is better than just looking something up.
Speaker 2:Precisely. When used as just an intelligent learning tool, the effect on higher order thinking was smaller, G equals 0.428. Still there, but much less impactful than the tutoring role.
Speaker 1:And did they look at other roles?
Speaker 2:They mentioned mixed roles, or using it as an intelligent partner, but unfortunately there wasn't enough data in the studies they analyzed to draw firm conclusions about those yet.
Speaker 1:Okay and quickly. Did learning model duration or application areas significantly change the impact on higher order thinking?
Speaker 2:No, according to this analysis, those factors didn't show a significant moderating effect specifically for higher order thinking development. It seems course type and the tutoring role were the key differentiators there.
Speaker 1:Right. So let's try to summarize this complex picture. Generally positive impact, right?
Speaker 2:Yes.
Speaker 1:Especially for performance?
Speaker 2:Yes.
Speaker 1:Boosted most in skills courses, through problem-based learning, and ideally used for about four to eight weeks for peak performance effect?
Speaker 2:Correct.
Speaker 1:Perception improves the longer you use it.
Speaker 2:Yep.
Speaker 1:And higher order thinking gets the biggest lift in STEM fields, especially when ChatGPT acts like a personalized tutor.
Speaker 2:You've got it. That captures the main moderating effects they found.
Speaker 1:Now, how does all this stack up against other AI tools that have been used in education? Is ChatGPT doing better or worse?
Speaker 2:That's a great question. For context, the authors briefly compared their findings, particularly on learning performance, to another recent meta-analysis that looked at more traditional AI-based assessment tools.
Speaker 1:Like tools that just grade essays or quizzes automatically?
Speaker 2:Kind of, yeah, focused more on evaluation. And the finding was that ChatGPT's positive impact on learning performance, that large 0.867 effect size we talked about, appears notably larger than the average impact found for those traditional AI assessment tools. That earlier meta-analysis reported an average effect size of only 0.390.
Speaker 1:Wow, so more than double the impact on performance compared to those older assessment AIs.
Speaker 2:It seems so based on these two meta-analyses.
Speaker 1:Why the big difference?
Speaker 2:The likely reason is just the sheer breadth of what ChatGPT can do. Those older tools were often quite narrow, focused on grading or feedback on specific assignments. ChatGPT is generative AI. It can explain concepts, brainstorm, simulate conversations, answer follow-up questions, draft text. It supports learning in many more ways.
Speaker 1:So it's a much more versatile assistant, not just a grader.
Speaker 2:Exactly. It can be involved more deeply and broadly in the learning process itself. But, and this is important, we need to circle back to some nuances and cautions.
Speaker 1:So it's not all perfect.
Speaker 2:Right. Remember, while the performance impact was large, the impacts on perception and, crucially, higher order thinking were only moderate.
Speaker 1:Yeah, so it helps you do better more easily than it helps you think better or feel better about learning on average. Why that gap?
Speaker 2:Well, think about it. ChatGPT doesn't have emotional intelligence, right? It can't replicate the empathy or motivational connection a human teacher can provide, which likely limits its impact on genuine engagement or passion for learning, the perception side.
Speaker 1:Okay, that makes sense for perception. What about higher order thinking? Why only moderate?
Speaker 2:Because it's trained on existing data. It's incredibly good at synthesizing information, explaining things clearly, following patterns, but fostering truly critical or creative thinking, challenging assumptions, generating genuinely novel insights. That's harder. It depends a lot on how it's used.
Speaker 1:So you can use it as a shortcut, just get the answer and not actually develop those deeper thinking skills.
Speaker 2:Precisely, which is why the researchers stress the importance of thoughtful integration. You can't just throw ChatGPT at students and expect critical thinking to blossom.
Speaker 1:What does thoughtful integration look like then, especially for higher-order thinking?
Speaker 2:It means designing activities that explicitly require deeper thinking using the tool. For example, providing students with learning scaffolds, frameworks like Bloom's taxonomy maybe, to guide their interactions, and prompting them to use ChatGPT not just for answers, but to compare perspectives, evaluate sources, critique arguments or design solutions.
Speaker 1:So the human educator's role in structuring the interaction becomes even more critical if the goal is deep thinking.
Speaker 2:Absolutely essential. It's about guiding the use of the tool towards those higher level cognitive goals.
Speaker 1:Okay, this has been incredibly insightful. Let's try and wrap this up for our listener. What are the key takeaways if someone wants to know whether ChatGPT works in education?
Speaker 2:I'd say the main message is yes, generally it has a clear positive impact. Students tend to perform better, feel a bit better about learning over time, and even get some support for higher order thinking. But its effectiveness really isn't uniform. It works better in certain situations: skills courses, problem-based learning, used for that optimal four-to-eight-week duration for performance, and especially when playing an intelligent tutor role for boosting critical thinking in fields like STEM.
Speaker 1:So for you listening, if you're looking to learn things effectively, this suggests ChatGPT can definitely be a powerful tool in your toolkit, but maybe think strategically about how and when you use it, depending on what you're trying to learn.
Speaker 2:Exactly. Use it thoughtfully. And that leads us to maybe a final thought for you to mull over.
Speaker 1:Ooh, I like a provocative final thought. Go on.
Speaker 2:Well, given everything we've just discussed, the clear benefits, but also the nuances and the limitations, especially around things like critical thinking and genuine engagement, how do our educational systems, our teaching approaches, need to evolve?
Speaker 1:Hmm, how do we best harness these AI strengths?
Speaker 2:Right. How do we leverage what AI like ChatGPT does well while actively compensating for its weaknesses? And, maybe most importantly in this new landscape, what is the truly irreplaceable role of human connection, mentorship and that dynamic Socratic interaction in education?
Speaker 1:That's a huge question. What does uniquely human teaching look like alongside powerful AI? Definitely something to think about long after this deep dive ends.