
中华医学超声杂志(电子版) ›› 2025, Vol. 22 ›› Issue (08): 777-782. doi: 10.3877/cma.j.issn.1672-6448.2025.08.013

Review

多模态基础模型与超声影像的交叉应用
孟雅清1, 杨景涵1, 李欣玥1, 张啸谦2,3, 田捷1, 王坤1
  1. Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
  2. Department of Ultrasound Medicine, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
  3. School of Medical Technology, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
  • Received: 2025-03-18  Published: 2025-08-01
  • Corresponding author: 王坤 (Kun Wang)
  • Funding:
    National Key Research and Development Program of China (2023YFF120460); National Natural Science Foundation of China (82441010, 92159305, 82272029, 62201019); Beijing Outstanding Young Scientist Program (JQ22013)

Cross-applications of multimodal foundation models and ultrasound imaging

Yaqing Meng, Jinghan Yang, Xinyue Li, Xiaoqian Zhang, Jie Tian, Kun Wang

  • Received: 2025-03-18  Published: 2025-08-01
Cite this article:

孟雅清, 杨景涵, 李欣玥, 张啸谦, 田捷, 王坤. 多模态基础模型与超声影像的交叉应用[J/OL]. 中华医学超声杂志(电子版), 2025, 22(08): 777-782.

Yaqing Meng, Jinghan Yang, Xinyue Li, Xiaoqian Zhang, Jie Tian, Kun Wang. Cross-applications of multimodal foundation models and ultrasound imaging[J/OL]. Chinese Journal of Medical Ultrasound (Electronic Edition), 2025, 22(08): 777-782.
