The Lancet Voice

AI and LLMs in healthcare

The Lancet Group Season 5 Episode 10

Gavin and Jessamy are joined by Rupa Sarkar, editor-in-chief of The Lancet Digital Health, to discuss the uses of Large Language Models (LLMs) in healthcare, safety and patient concerns, the progress that's been made, and what the future of AI in health might hold.

Send us your feedback!

Read all of our content at https://www.thelancet.com/?dgcid=buzzsprout_tlv_podcast_generic_lancet

Check out all the podcasts from The Lancet Group:
https://www.thelancet.com/multimedia/podcasts?dgcid=buzzsprout_tlv_podcast_generic_lancet

Continue this conversation on social!
Follow us today at...
https://twitter.com/thelancet
https://instagram.com/thelancetgroup
https://facebook.com/thelancetmedicaljournal
https://linkedIn.com/company/the-lancet
https://youtube.com/thelancettv

This transcript was automatically generated using speech recognition technology and may differ from the original audio. In citing or otherwise referring to the contents of this podcast, please ensure that you are quoting the recorded audio rather than this transcript.

Gavin: Hello and welcome to The Lancet Voice. It's May 2024, I'm Gavin Cleaver, and I'm joined by my co-host, Jessamy Bagenal. Thank you for joining us. Today, we're talking about the poppiest of subjects, AI. You don't need me to tell you about the strides AI has made over the last couple of years. But sometimes it can feel a bit like a tool searching for an application, rather than one particularly suited to anything.

We're in the very early stages of seeing how AI, and particularly the prominent, headline-grabbing version of AI of the last few years, Large Language Models, or LLMs, in the style of ChatGPT, might be used in healthcare and by clinicians. Jessamy and I thought we'd sit down with Dr. Rupa Sarkar, Editor-in-Chief of The Lancet Digital Health, to find out where we are with AI and LLMs in healthcare.

Jessamy and I are joined today by Rupa Sarkar, Editor-in-Chief of The Lancet Digital Health. Rupa, thanks so much for joining us.

Rupa: Thanks for having me. 

Gavin: I wanted to get you on to talk about large language models in healthcare, and AI more broadly, because it's such a rapidly evolving topic. I thought perhaps we could start off by differentiating what a large language model is from what we're talking about with the broad sweep of AI generally.

Rupa: Yeah, good question. So AI is a really broad field that encompasses large language models; a large language model is a form of artificial intelligence. But these models have been trained on billions, maybe even trillions, of words from various sources, largely from the internet, and large language models use statistical associations between words to make predictions about how words are used together in language, applying these patterns to complete various natural language processing tasks.

I think most of us know what ChatGPT can do, in a sense; a lot of us have used it. It can respond to free-text queries, a little bit like Google, without any specific training on a task. And this has caused a lot of excitement, and also a lot of concern, naturally, about their use in various applications, including healthcare.

But what's quite unique about these models is that they're very large. I think GPT-4, the most recent model from OpenAI, has over a trillion parameters, and these are growing. I mentioned that they're general: they're not trained for a specific task. There is fine-tuning, but those are add-ons to models like ChatGPT.

They're known as autoregressive, which means that they're trained on past data to predict future data. And they're foundational, so they're the seeds for a lot of future products. They're really challenging our current system at the moment, much more than other artificial intelligence systems have, because they're rapidly evolving and their potential is seemingly limitless.

And I think it's important to mention, because when you hear people from OpenAI, for example, talk about it, they talk about large language models, or that particular large language model, understanding text, being able to communicate like humans do. And that's simply not the case.

It is a statistical model, just like other artificial intelligence tools. So there are similarities and differences.
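
(Editor's note: to make that "statistical associations between words" idea concrete, here is a minimal illustrative sketch, not something discussed in the episode. It builds a toy bigram counter in Python; real LLMs learn billions of parameters rather than raw co-occurrence counts, but the autoregressive loop, predicting the next word and feeding it back in as context, is the same.)

```python
from collections import Counter, defaultdict
import random

# A toy "statistical associations between words" model: count which word
# follows which, then generate autoregressively by feeding each predicted
# word back in as context. Real LLMs do this with billions of learned
# parameters and much longer contexts, but the loop is the same idea.
corpus = "the patient reported pain and the patient reported fatigue".split()

transitions = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    transitions[prev][nxt] += 1  # how often `nxt` followed `prev`

def predict_next(word):
    # Sample the next word in proportion to how often it followed `word`.
    counts = transitions[word]
    return random.choices(list(counts), weights=list(counts.values()))[0]

word = "the"
for _ in range(5):
    if word not in transitions:  # no observed successor: stop generating
        break
    word = predict_next(word)
    print(word, end=" ")
```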

Gavin: How similar is it to that system that you used to have on your phone, where you keep pressing the button in the middle and it keeps predicting what you're going to say?

Rupa: Yeah, I think it's quite similar, but it's very intuitive and it's able to learn in real time with the information that you give it.

So yeah, and it's able to tune based on the prompts that you give it. So it's much more intelligent than that particular app, and it's getting more powerful as well.

Gavin: So here's an intensely broad question. What are the applications in healthcare specifically? 

Rupa: I think there are two main areas where we're seeing a lot of progress at the moment, and it's not limited to these two areas, but we've seen a lot of research in them and a lot of investment.

So one is medical education. In this particular field there are a large number of ways that it's making an impact. But I think last year there was a Nature paper that described Med-PaLM. This is a fine-tuned version of a large language model, and it showed that this language model was able to provide accurate answers to medical questions from the US Medical Licensing Examination.

I think it was 67 percent of the time that it answered the question correctly, which is pretty high, much higher than any other model before. We at The Lancet Digital Health published a paper as well; this was a bit different. It used electronic health records from the UK to predict health conditions an individual might develop in the future.

And so this has multiple applications, but in education specifically it can help clinicians really understand the future projection of a patient's care journey, for example. So that's one area.

And then the second area: I think we're all familiar with this sort of relentless increase in administration related to clinical work and healthcare, all around the world, but we're seeing this a lot in the US, with their electronic healthcare system and their insurance system. A very large electronic health record company called Epic, which I think runs over 2,000, almost 3,000, hospitals in the US, about a third of hospitals in total, has announced a collaboration with OpenAI, and they have about 60 tools in development.

And that's huge. That is huge. They're being rolled out to patients now. A lot of these tools are to do with replying to patients' queries about healthcare, producing letters from clinicians. And we actually published a paper very recently, I think in the last week; it's coming up in our June issue, about using large language models to reply to patient messages, with the idea that it could improve healthcare by identifying urgent issues and providing education to patients, as well as supporting the workforce. This came from Brigham and Women's Hospital, and they found that whilst these LLM-assisted responses reduced physician workload and improved consistency in responses, there were also a lot of errors, which if left unedited could lead to harm. So there is a huge need for human oversight in the use of these tools, but they are growing in uses, basically. And we've got a few other studies: there's one coming out soon on digital pathology, using LLMs to support the work of pathologists in diagnosis,

which is another area. And interestingly, there was recently a Nature Medicine study that showed an LLM that can identify errors in pharmacy instructions, for dosage or frequency, or highlight the chances of adverse drug events. So there's a huge range of applications in research at the moment.

Gavin: I know these models are improving all the time, but from this research, and from your perception, are they ready? What's their kind of maturity level?

Rupa: So I think it depends on what tasks you're talking about. There are tasks that have a higher risk level. But on the whole, I would say no. I think we all know that these tools have huge levels of error. I think there was a New York Times article that said that ChatGPT in particular was incorrect 3 percent of the time at the very best, and up to 27 percent of the time.

So there are huge inaccuracies in these tools, which require validation in order to be able to use them, and that's not there yet. The tools are obviously black boxes. It's called OpenAI, but nothing about the models is really open, not even where they get the data from, or what kind of parameters they've used to train their model.

We simply don't know a lot of this information. But we are able to evaluate the output. So we published a paper from Zak Kohane's group recently that showed that ChatGPT was actually very biased. It's probably not a surprise, but the level of bias was huge. It really perpetuated negative stereotypes across race, ethnicity, and gender.

They really require further evaluation before we can use them. And then of course there are other issues, like privacy. In fact, OpenAI was banned from use in Italy last year because of non-compliance with GDPR rules, and they're still under investigation by the EU.

So there are a lot of problems. And I'm going to carry on, because I can: there's also cost. It takes huge computing power; I'm not sure of the figures, I'm going to have to go away and look this up, but I think it's over 25 percent more in terms of computing power for large language models, and it might be even bigger. So this cost means that the field is dominated by a few regions in the world, the US, the UK, as well as just a few companies. And this has a lot of equity issues that are going to be difficult to overcome.

Gavin: So it's interesting in the sense that it entrenches inequity in two directions. One, it's a mirror reflecting what's been put into it back out at us, all the biases that it took in in the first place. But two, its application is generally concentrated in high-income countries.

Rupa: Definitely. I think I saw a Nature Medicine review on large language models by Daniel Ting, and it was quite optimistic, in that they felt the costs would reduce by the year 2030, which might help with resolving some of the equity issues. But we're really very far away, because it's not just cost: it's space, it's the way that these servers are run, it's the amount of water that's used for these servers. The number of components that you need to run one of these models is huge. And unless these companies are able to provide some level of assurance on equity, we're not going to resolve that gap.

Gavin: Jessamy, I know you wrote a comment recently in The Lancet about AI in healthcare. I'm keen to hear your thoughts on the topic as well.

Jessamy: There's lots of overlap there, in terms of the research that feeds into what we do. It's really a fascinating space, and I think it's easy to get into the weeds, and sometimes you think about it and it just hurts your head and it just melts. I don't know what the next five years of anything looks like, actually.

Based on the last five years of how health tech and digital health have been implemented and integrated into healthcare, I don't feel a huge sense that it's going to be transformative yet, because it doesn't seem to me that we quite have the political economy. Perhaps you do within a private healthcare system, but within universal health coverage systems, any extra large language model has to be developed, which, as Rupa says, is really expensive, and it's really expensive to run, and then it has to be implemented across all languages on an enormous scale. We just don't see that happening very often. We normally see it as supplemental to the major healthcare delivery service. And therefore the actual impact it can have on what physicians are doing on a day-to-day basis, or on very complex patients, or in real life, still feels fairly minimal.

And obviously there's great excitement, and I know the Google DeepMind study, AMIE, the large language model that essentially looked at patients talking about their symptoms, where they were either being diagnosed by a large language model or by a physician. And the large language model showed greater accuracy, but it also showed greater emotion and empathy, which is slightly terrifying. But again, that's in a kind of simple setting, an online setting. If you're in a GP practice and you've got a patient who's 90, with 10 different comorbidities, how does a large language model help you in that situation? And how can we get to a stage where, in that kind of universal health coverage setting, we are able to implement these tools to help with administrative tasks and reduce the workload in a way that isn't riddled with errors? That still seems to me quite a long way off.

Yeah. I might be incorrect. 

Rupa: Yeah, no, I completely agree. There's no difference between evaluating an AI and evaluating a large language model in a healthcare system. It still needs the elements of study design: minimized bias, pre-specified endpoints, comparative groups, and transparency.

And this all takes time, especially when we need randomized controlled trials to assess these kinds of tools. So it's not going to happen in the next five years. But I think we are going to see progress in the private healthcare system, like you said, especially in the US, where there's a lot of money going into, as I mentioned, electronic health record companies, to process insurance claims faster, to handle appeals. Which is

Jessamy: fantastic, right? With all of these things, you can see the potential is so great. There just does seem to be this gap. We can't wait.

Rupa: Yeah, I think there's this idea that the patient is the ultimate endpoint, and care of the patient is what we care about. But to do that, we need to care for the workforce, and this is where LLMs might have an impact. But of course this has been said time and time again; I think electronic health records are the reason why a lot of clinicians are burnt out.

So I think you're right to be resistant, and we need to see what happens in the future. And I think the reason we don't know is because we're not being told exactly what is being developed. We're not being involved. People like us, patients, clinicians, people who work in healthcare, don't know what's coming, and we can't help. And I think that's a huge problem for everyone, including LLM developers, and a huge error on their part.

Jessamy: It's interesting, isn't it? Because I don't mean to come across as a Luddite in any way, and I'm all for large language models; I think they can have a huge impact on healthcare.

It's just that what I'm not seeing, alongside the development of new approaches and applications, is the focus on actually how we do it. And that's been the case, it seems to me, and, Rupa, you'll know more than me, but that seems the case with all the digital health tech that we've seen over the last 10 years.

There's just all this focus on new applications, new approaches, new companies to deliver new things. And they all sound very good. I think they're all really well intentioned. They've all got the patient in mind: trying to improve accessibility, trying to improve people's understanding, trying to improve people's access. I just don't see that focus or that link alongside the health system. And I don't know what we can do to try and get to that stage.

Rupa: I think it's very slow work. I do see it in some places: multidisciplinary teams working together, developing tools that are actually then applied within healthcare settings. These are not LLMs, because it's too early, but I've certainly seen this with some deep learning tools. And I think what we're seeing more of are clinicians being skilled in engineering and AI, and this will take a generation, more than a generation, to develop this sort of multifaceted team.

And the tools that you need are changing as well: before, we needed to be able to code; now we don't, and that's thanks to large language models. So this is what innovation is. It's all about new things, and only one out of a billion will stick.

Jessamy: But if we were to try and give three recommendations, or say a few things that we think could actually help to realize the opportunity and the huge benefit that large language models could provide for healthcare delivery, what would they be? This is so off the cuff. So sorry.

Gavin: It feels like you could do a Commission on this at least, right?

Rupa: Yeah, true. But let's do it in the next three minutes. Or let's do it in the next 30 seconds. I think the first thing is safety. So there has to be regulation, a legal infrastructure, that enables safe development of these tools, safe application, safe implementation; that doesn't exist. And without this, people won't try it, clinicians won't want to use it. So that's the first thing that we need, and the WHO, other intergovernmental organizations, academics, journals, we're all working on this.

And with LLMs, things change so fast, so we're constantly on the back foot. We can't predict what legal infrastructure we need when we don't know what the model's going to do. Even in real time, when you use the model, you don't know what it's going to output in the next minute. So that's the one thing we need to solve for. You said three things, right? Does anyone else want to pitch in with a second?

Gavin: I was going to follow up on that and say that it often strikes me that not only is research always trying to catch up with something that is accelerating away from it, but it also feels like politics and policy are always even further behind than that. I still sometimes think that politicians don't really have a handle on the internet yet, which obviously we've been living with in our daily lives for decades now.

Rupa: Yeah, exactly, I don't think our politicians really do. So education is key then; that's really the second thing. We really need to understand these tools, understand what their potential is, in order to really enable safe application, legal infrastructure, et cetera.

And then the third thing is we need everybody involved. We need patients, we need all healthcare workers working on this. If we really think that this is worth the investment, then we need everyone to participate in the development of these tools, because otherwise it's never going to benefit anyone.

Jessamy: That's really good, Rupa; I think you've done really well. The one thing that I wanted to ask, and it's always dangerous to ask about new bodies or organizations, because we love creating new things in healthcare and sometimes we just change them all the time and they never really make a big difference. But do you think, or have you heard in your travels and your conversations with people, is there a desire for some type of new body that looks at this, that isn't involved in the innovation and the research, but is really focusing on the application? How can we make the most of this new technology, and how can we actually make it improve patients' and healthcare providers' lives?

Rupa: Yeah. I don't know of any new bodies; I know of new sections of existing bodies. So the FDA obviously has its AI group, and funders have their AI groups, as does NICE. And it's quite a tough question to answer, because there are different types of groups that could do different things in this space, again because you need so many different people involved. I would struggle to understand what kind of force you'd be thinking of. If you're thinking of something like COVID, during COVID there was a task force that was established in the UK and the US. Maybe that would be good; I'm not sure. Are you

Jessamy: thinking about something that provides sort of normative processes and structures that would allow healthcare systems to realize some of these opportunities in a more consistent way across contexts and settings? I don't know. Does that make sense?

Rupa: Yeah, no, it does make sense. But because this field is so unwieldy, I think it's often referred to as the Wild West, the focus is really what needs to be determined first, before there are any kind of working groups. But saying that, I'm sure there's plenty going on politically and globally that I'm unaware of, and it'd be good to look into it. So I think you make a really good point. Maybe this is a call for them to step up and let us know what they're doing. And I look forward to some letters and some

Jessamy: comments coming in about it. 

Gavin: I wanted to ask you, Rupa, because we should point out on the podcast that you recently celebrated five years of The Lancet Digital Health, and five years of you being in charge as its founding Editor-in-Chief. I wanted to ask what kind of changes you've noticed in AI in that five-year period.

Rupa: That's an interesting question because I think some things have changed, but a lot has stayed the same.

There are still a lot of concepts, a lot of tools, that aren't making it to the clinic, that aren't impacting patients, but there's been this exponential rise in publications, an exponential rise in media interest. So that's remained the same. But what we are seeing is change. At the beginning of the journal we published a systematic review by Alastair Denniston and colleagues that showed the level of AI studies at the time, something like, I think it was over 20,000 studies, and only 14 percent showed clinical use. That has now changed.

Since that study, thanks to Alastair Denniston, academics, industry, the FDA, and journals, we've developed guidelines like the CONSORT-AI guidelines and many others that our authors are now using to help report their studies more transparently. We've seen a rise in the number of RCTs of digital tools that we're publishing.

So the field of technology is adapting to healthcare, I would say, and really understanding what's required to make a safe and effective technology. In 2024, we're now beyond the age of unvalidated tech and spurious results. We're now seeing really good studies. And hopefully, in another five years, that will lead to better patient care.

Gavin: The big question then is: where do you think we'll be in five years?

Rupa: Dare I say in the same place, potentially. I think we'll have some of the same problems, especially the legal and safety issues.

But we will know a lot more about what we can do. 

Jessamy: Yeah. I just wanted to ask for your reflections a bit, because it's obviously a different ecosystem to the one we see in other areas of healthcare and medicine, in that it's much more sort of VC-heavy. And I wondered what your reflections were on that aspect of it, and how that either complicates it or simplifies it, because there's more money sloshing around. What do you think?

Rupa: I think healthcare in general is a very complex space, no matter where the money comes from. I think whenever things are for profit, whenever there's a sale involved, the focus shifts. And in the US, they have the most money spent on healthcare out of tens and tens of countries, and yet they have the poorest healthcare outcomes. So that's probably an indication that having such a large volume of private funding might not result in the best outcome for healthcare right now.

Then we have to look at space travel: they did eventually get to the Moon, but a lot of that was funded by the government initially. And if you think about the internet, obviously a lot of that was publicly funded initially as well. So,

Gavin: yeah, vaccines,

Rupa: exactly, COVID vaccines too.

So I don't know if the question about where the money is now coming from is actually the issue. I think how we want it to be used, and looking at the kinds of aims that these companies have, is really what we're looking for, especially, obviously, for the ones that are focused on healthcare. And I think that making money is one aspect, which helps with longevity, but patient care is just as important, at the center of what we're looking at, even in the US. Yeah, definitely.

Jessamy: I guess I just mean more, have you had any observations about how we as healthcare, or people in the healthcare space, might interact with VCs and people who are funding startups? Some of the technology that startups have is just fantastic; they're great ideas, they're very patient-centric. But are there enough linkages between where that money is funded, where these people are working, and actually getting it into healthcare? What's our role in that?

Rupa: Yeah. I think that's a really interesting question, and I wonder if we have something to learn from pharma here, in how we, or the healthcare system, work with pharmaceutical companies, in terms of identifying the right language, identifying mutual benefit. So maybe you, Jessamy, as you've been in healthcare a bit longer than I have, have some tips from that sector that you might be able to share?

Jessamy: I don't know whether it's the right approach. Many people get fed up with the fact that it takes so long for pharma to develop medications. Although I think that's changing, and they do all have massive arms that are busy doing studies and trials and writing them up in correct ways. And there's obviously a relationship there with journals, with a lot of research being submitted to journals, whereas we don't really see that from startups. That's nothing new, but I'm wondering whether that's the right thing, or

Rupa: whether we are seeing more of that because of regulators. In order to be FDA approved, publication in a research journal is one of the prerequisites, even in software. So we are seeing papers from startups, and from big companies like Google. There is a research publication culture that's growing.

But perhaps you're right: it's started slowly, and there needs to be more encouragement. So I think there is more for us to do to really understand the way that startups work, to understand what requirements they need in order to get to the publication stage, including things like IP and the timing of their innovations, and how they go on to do the further studies for the evaluations that we would need. I think with startups, maybe the funding process requires them to move faster and to not be as developed as we would obviously require in publishing.

Jessamy: Very interesting. I could go on talking about this all afternoon, but I won't.

Gavin: Maybe we should leave it there, in that case. Rupa, thank you so much for talking with us.

Rupa: Thank you for having me. 

Gavin: It's an absolutely fascinating area, and hopefully we've thrown a little bit of light on it.

Thanks so much for listening to this episode of The Lancet Voice. Remember, you can subscribe to The Lancet Voice wherever you usually get your podcasts.