New medical LLM, PathChat 2, can talk to pathologists about tumors, offer diagnoses
Four state-of-the-art large language models (LLMs) are presented with an image of what looks like a mauve-colored rock. It’s actually a potentially serious tumor of the eye — and the models are asked about its location, origin and possible extent.
LLaVA-Med places the malignant growth in the inner lining of the cheek (wrong), while LLaVA says it's in the breast (even more wrong). GPT-4V, meanwhile, offers up a long-winded, vague response and can't identify the location at all.
But PathChat, a new pathology-specific LLM, correctly pegs the tumor to the eye, noting that it can be significant and lead to vision loss.
Developed in the Mahmood Lab at Brigham and Women’s Hospital, PathChat represents a breakthrough in computational pathology. It can serve as a consultant, of sorts, for human pathologists to help identify, assess and diagnose tumors and other serious conditions.
PathChat performs significantly better than leading models on multiple-choice diagnostic questions, and it can also generate clinically relevant responses to open-ended inquiries. Starting this week, it is being offered through an exclusive license with Boston-based biomedical AI company Modella AI.
“PathChat 2 is a multimodal large language model that understands pathology images and clinically relevant text and can basically have a conversation with a pathologist,” Richard Chen, Modella founding CTO, explained in a demo video.
PathChat does better than GPT-4V, LLaVA and LLaVA-Med
In building PathChat, researchers adapted a vision encoder for pathology, combined it with a pre-trained LLM and fine-tuned with visual language instructions and question-answer turns. Questions covered 54 diagnoses from 11 major pathology practices and organ sites.
Each question was posed in two evaluation settings: an image with 10 multiple-choice answer options, and an image with additional clinical context such as patient sex, age, clinical history and radiology findings.
When presented with images of X-rays, biopsies, slides and other medical tests, PathChat performed with 78% accuracy (on the image alone) and 89.5% accuracy (on the image with context). The model was able to summarize, classify and caption; could describe notable morphological details; and answered questions that typically require background knowledge in pathology and general biomedicine.
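To make the two reported accuracy figures concrete, here is a minimal sketch of how multiple-choice accuracy might be tallied separately for each evaluation setting. It is not the researchers' evaluation code: the `MCQResult` record, the setting labels and the toy answers below are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class MCQResult:
    """One model answer to one multiple-choice question (hypothetical record)."""
    question_id: str
    setting: str    # "image_only" or "image_with_context" (labels are assumptions)
    predicted: str  # option letter the model chose
    correct: str    # option letter of the reference answer


def accuracy_by_setting(results):
    """Return the fraction of correct answers for each evaluation setting."""
    totals = {}
    hits = {}
    for r in results:
        totals[r.setting] = totals.get(r.setting, 0) + 1
        hits[r.setting] = hits.get(r.setting, 0) + int(r.predicted == r.correct)
    return {setting: hits[setting] / totals[setting] for setting in totals}


# Made-up toy answers, not data from the study.
results = [
    MCQResult("q1", "image_only", "A", "A"),
    MCQResult("q1", "image_with_context", "A", "A"),
    MCQResult("q2", "image_only", "C", "B"),
    MCQResult("q2", "image_with_context", "B", "B"),
]

scores = accuracy_by_setting(results)
print(scores)
```

Scoring the same questions under both settings is what allows a direct comparison of how much the added clinical context helps.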
Researchers compared PathChat against GPT-4V, the open-source LLaVA model and the biomedical domain-specific LLaVA-Med. In both evaluation settings, PathChat outperformed all three. In image-only, PathChat scored more than 52% better than LLaVA and more than 63% better than LLaVA-Med. When provided clinical context, the new model performed 39% better than LLaVA and nearly 61% better than LLaVA-Med.
Similarly, PathChat performed more than 53% better than GPT-4V with image-only prompts and 27% better with prompts providing clinical context.
Faisal Mahmood, associate professor of pathology at Harvard Medical School, told VentureBeat that, until now, AI models for pathology have largely been developed for specific diseases (such as prostate cancer) or specific tasks (such as identifying the presence of tumor cells). Once trained, these models typically can’t adapt and therefore can’t be used by pathologists in an “intuitive, interactive manner.”
“PathChat moves us one step forward towards general pathology intelligence, an AI copilot that can interactively and broadly assist both researchers and pathologists across many different areas of pathology, tasks and scenarios,” Mahmood told VentureBeat.
Offering informed pathology advice
In one example of the image-only, multiple-choice prompt, PathChat was presented with the scenario of a 63-year-old male experiencing chronic cough and unintentional weight loss over the previous 5 months. Researchers also fed in a chest X-ray of a dense, spiky mass.
When given 10 options for answers, PathChat identified the correct condition (lung adenocarcinoma).
Meanwhile, in the prompt method supplemented with clinical context, PathChat was given an image of what to the layman looks like a closeup of blue and purple sprinkles on a piece of cake, and was informed: “This tumor was found in the liver of a patient. Is it a primary tumor or a metastasis?”
The model correctly identified the tumor as metastasis (meaning it is spreading), noting that, “the presence of spindle cells and melanin-containing cells further supports the possibility of a metastatic melanoma. The liver is a common site for metastasis of melanoma, especially when it has spread from the skin.”
Mahmood noted that the most surprising result was that, by training on comprehensive pathology knowledge, the model was able to adapt to downstream tasks such as differential diagnosis (when symptoms match more than one condition) or tumor grading (classifying a tumor by aggressiveness), even though it was not given labeled training data for such instances.
He described this as a “notable shift” from prior research, where model training for specific tasks — such as predicting the origin of metastatic tumors or assessing heart transplant rejection — typically requires “thousands if not tens of thousands of labeled examples specific to the task in order to achieve reasonable performance.”
Offering clinical advice, supporting research
In practice, PathChat could support human-in-the-loop diagnosis, in which an initial AI-assisted assessment could be followed up with context, the researchers note. For instance, as in the examples above, the model could ingest a histopathology image (a microscopic examination of tissue), provide information on structural appearance and identify potential features of malignancy.
The pathologist could then provide more information about the case and ask for a differential diagnosis. If that suggestion is deemed reasonable, the human user could ask for advice on further testing, and the model could later be fed the results of those to arrive at a diagnosis.
This, researchers note, could be particularly valuable in cases with lengthier, more complex workups, such as cancers of unknown primary (when cancer has spread but its site of origin has not been identified). It could also be valuable in low-resource settings where access to experienced pathologists is limited.
In research, meanwhile, an AI copilot could summarize features of large cohorts of images and potentially support automated quantification and interpretation of morphological markers in large data cohorts.
“The potential applications of an interactive, multimodal AI copilot for pathology are immense,” the researchers write. “LLMs and the broader field of generative AI are poised to open a new frontier for computational pathology, one which emphasizes natural language and human interaction.”
Implications beyond pathology
While PathChat presents a breakthrough, there are still issues with hallucinations, which could be reduced with reinforcement learning from human feedback (RLHF), the researchers note. Additionally, they advise that models should be continually trained with up-to-date knowledge so they are aware of shifting terminology and guidelines. For instance, retrieval-augmented generation (RAG) could help provide a continuously updated knowledge database.
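As a rough illustration of the RAG idea, the sketch below retrieves the most relevant entry from a small knowledge base and prepends it to the question before it would be sent to a model. It is a toy under stated assumptions, not how PathChat works: retrieval here is plain word overlap rather than learned embeddings, the function names are hypothetical, and the knowledge-base entries are placeholders rather than real guidelines.

```python
def tokenize(text):
    """Lowercase a string and split it into a set of whitespace-separated words."""
    return set(text.lower().split())


def retrieve(query, documents, k=1):
    """Return the k documents sharing the most words with the query (toy scoring)."""
    query_words = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, documents, k=1):
    """Prepend the retrieved context to the question."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


# Placeholder entries only; these are not real clinical guidelines.
knowledge_base = [
    "guideline alpha covers grading of tumor type one",
    "guideline beta covers staging of tumor type two",
]

question = "how is grading handled for tumor type one"
prompt = build_prompt(question, knowledge_base)
print(prompt)
```

Because the knowledge base sits outside the model, its entries can be updated as terminology and guidelines change without retraining the model itself.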
Looking further afield, models could be made even more useful for pathologists and researchers with integrations such as digital slide viewers or electronic health records.
Mahmood noted that PathChat and its capabilities could be extended to other medical imaging specialties and data modalities such as genomics (the study of DNA) and proteomics (large-scale protein study).
Researchers at his lab plan to collect large amounts of human feedback data to further align model behavior with human intent and improve responses. They will also integrate PathChat with existing clinical databases so that the model can help retrieve relevant patient information to answer specific questions.
Further, Mahmood noted, “We plan to work with expert pathologists across many different specialties to curate evaluation benchmarks and more comprehensively evaluate the capabilities and utility of PathChat across diverse disease models and workflows.”