Cross-applications of multimodal foundation models and ultrasound imaging

doi:10.3877/cma.j.issn.1672-6448.2025.08.013

1	Chernyak V, Fowler KJ, Do RKG, et al. LI-RADS: Looking back, looking forward[J]. Radiology, 2023, 307(1): e222801.
2	van der Pol CB, McInnes MDF, et al. CT/MRI and CEUS LI-RADS major features association with hepatocellular carcinoma: individual patient data meta-analysis[J]. Radiology, 2022, 302(2): 326-335.
3	Kono Y, Lyshchik A, Cosgrove D, et al. Contrast enhanced ultrasound (CEUS) liver imaging reporting and data system (LI-RADS®): the official version by the American College of Radiology (ACR)[J]. Ultraschall Med, 2017, 38(1): 85-86.
4	Li Y, Pan L, Peng Y, et al. Application of deep learning-based multimodal fusion technology in cancer diagnosis: A survey[J]. Engineering Applications of Artificial Intelligence, 2025, 143: 109972.
5	Dhar J, Zaidi N, Haghighat M, et al. Multimodal fusion learning with dual attention for medical imaging[DB/OL]. (2024-11-02)[2025-03-18].
6	Ding W, Meng Y, Ma J, et al. Contrast-enhanced ultrasound-based AI model for multi-classification of focal liver lesions[J]. J Hepatol, 2025, 83(2): 426-439.
7	Wang Y, Ge X, Ma H, et al. Deep learning in medical ultrasound image analysis: a review[J]. IEEE Access, 2021, 9: 54310-54324.
8	Krasniqi E, Filomeno L, Arcuri T, et al. Multimodal deep learning for predicting neoadjuvant treatment outcomes in breast cancer: a systematic review[J]. Biol Direct, 2025, 20(1): 72.
9	Yang H, Yang M, Chen J, et al. Multimodal deep learning approaches for precision oncology: a comprehensive review[J]. Brief Bioinform, 2024, 26(1): bbae699.
10	Goertzel B. Artificial general intelligence: Concept, state of the art, and future prospects[J]. Journal of Artificial General Intelligence, 2014, 5(1): 1-46.
11	Pei J, Deng L, Song S, et al. Towards artificial general intelligence with hybrid Tianjic chip architecture[J]. Nature, 2019, 572(7767): 106-111.
12	Fei N, Lu Z, Gao Y, et al. Towards artificial general intelligence via a multimodal foundation model[J]. Nat Commun, 2022, 13(1): 3094.
13	Shevlin H, Vold K, Crosby M, et al. The limits of machine intelligence: Despite progress in machine intelligence, artificial general intelligence is still a major challenge[J]. EMBO Rep, 2019, 20(10): e49177.
14	Moor M, Banerjee O, Abad ZSH, et al. Foundation models for generalist medical artificial intelligence[J]. Nature, 2023, 616(7956): 259-265.
15	Singhal K, Azizi S, Tu T, et al. Large language models encode clinical knowledge[J]. Nature, 2023, 620(7972): 172-180.
16	Yang X, Chen A, PourNejatian N, et al. A large language model for electronic health records[J]. NPJ Digit Med, 2022, 5(1): 194.
17	Desai K, Johnson J. VirTex: Learning visual representations from textual annotations[DB/OL]. (2021-09-25)[2025-03-18].
18	Radford A, Kim JW, Hallacy C, et al. Learning transferable visual models from natural language supervision[DB/OL]. (2021-02-26)[2025-03-18].
19	Li Y, Liang F, Zhao L, et al. Supervision exists everywhere: A data efficient contrastive language-image pre-training paradigm[DB/OL]. (2022-03-14)[2025-03-18].
20	Ramesh A, Dhariwal P, Nichol A, et al. Hierarchical text-conditional image generation with CLIP latents[DB/OL]. (2022-04-13)[2025-03-18].
21	Adams LC, Busch F, Truhn D, et al. What does DALL-E 2 know about radiology?[J]. J Med Internet Res, 2023, 25: e43110.
22	Bachmann R, Mizrahi D, Atanov A, et al. MultiMAE: Multi-modal multi-task masked autoencoders[DB/OL]. (2022-04-04)[2025-03-18].
23	Kang Q, Lao Q, Gao J, et al. Deblurring masked image modeling for ultrasound image analysis[J]. Med Image Anal, 2024, 97: 103256.
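24	Liu J, Guo LT. Inspiration from GPT-4 for multimodal foundation models in multimodal understanding, generation, and interaction[J]. Bulletin of National Natural Science Foundation of China, 2023, 37(5): 793-802. (in Chinese)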
25	Codella NCF, Jin Y, Jain S, et al. MedImageInsight: An open-source embedding model for general domain medical imaging[DB/OL]. (2024-10-09)[2025-03-18].
26	Alberto Santamaria. Image search series part 1: Chest X-ray lookup with MedImageInsight [DB/OL]. (2025-02-01)[2025-03-18].
27	Wang G, Ye J, Cheng J, et al. SAM-Med3D-MoE: Towards a non-forgetting segment anything model via mixture of experts for 3D medical image segmentation[DB/OL]. (2024-07-06)[2025-03-18].
28	Qiu J, Wu J, Wei H, et al. VisionFM: a multi-modal multi-task vision foundation model for generalist ophthalmic artificial intelligence[DB/OL]. (2023-10-08)[2025-03-18].
29	Koleilat T, Asgariandehkordi H, Rivaz H, et al. BiomedCoOp: Learning to prompt for biomedical vision-language models[DB/OL]. (2024-11-21)[2025-03-18].
30	Jee J, Fong C, Pichotta K, et al. Automated real-world data integration improves cancer outcome prediction[J]. Nature, 2024, 636(8043): 728-736.
31	Qin X, Liu X, Xia L, et al. Multimodal ultrasound deep learning to detect fibrosis in early chronic kidney disease[J]. Ren Fail, 2024, 46(2): 2417740.
32	Braman N, Gordon JWH, Goossens ET, et al. Deep orthogonal fusion: multimodal prognostic biomarker discovery integrating radiology, pathology, genomic, and clinical data[DB/OL]. (2021-07-01)[2025-03-18].
33	Wan CF, Jiang ZY, Wang YQ, et al. Radiomics of multimodal ultrasound for early prediction of pathologic complete response to neoadjuvant chemotherapy in breast cancer[J]. Acad Radiol, 2025, 32(4): 1861-1873.
34	Mondol RK, Millar EKA, Sowmya A, et al. MM-SurvNet: deep learning-based survival risk stratification in breast cancer through multimodal data fusion[DB/OL]. (2024-02-19)[2025-03-18].
35	Yeghaian M, Bodalal Z, van den Broek D, et al. Multimodal integration of longitudinal noninvasive diagnostics for survival prediction in immunotherapy using deep learning[J]. J Am Med Inform Assoc, 2025, 32(8): 1267-1275.
36	Mohsen F, Ali H, El Hajj N, et al. Artificial intelligence-based methods for fusion of electronic health records and imaging data[J]. Sci Rep, 2022, 12(1): 17981.
37	Pang T, Li P, Zhao L. A survey on automatic generation of medical imaging reports based on deep learning[J]. Biomed Eng Online, 2023, 22(1): 48.
38	Hossain MDZ, Sohel F, Shiratuddin MF, et al. A comprehensive survey of deep learning for image captioning[J]. ACM Computing Surveys, 2019, 51(6): 1-36.
39	Jing B, Xie P, Xing E. On the automatic generation of medical imaging reports[DB/OL]. (2018-07-20)[2025-03-18].
40	Yang Y, Yu J, Zhang J, et al. Joint embedding of deep visual and semantic features for medical image report generation[J]. IEEE Transactions on Multimedia, 2021, 25: 167-178.
41	Vinyals O, Toshev A, Bengio S, et al. Show and tell: A neural image caption generator[DB/OL]. (2015-04-20)[2025-03-18].
42	Yu Y, Si X, Hu C, et al. A review of recurrent neural networks: LSTM cells and network architectures[J]. Neural Comput, 2019, 31(7): 1235-1270.
43	Alsharid M, Sharma H, Drukker L, et al. Captioning ultrasound images automatically[J]. Med Image Comput Comput Assist Interv, 2019, 22: 338-346.
44	Alsharid M, Cai Y, Sharma H, et al. Gaze-assisted automatic captioning of fetal ultrasound videos using three-way multi-modal deep neural networks[J]. Med Image Anal, 2022, 82: 102630.
45	Zeng X, Wen L, Liu B, et al. Deep learning for ultrasound image caption generation based on object detection[J]. Neurocomputing, 2020, 392: 132-141.
46	Li J, Su T, Zhao B, et al. Ultrasound report generation with cross-modality feature alignment via unsupervised guidance[J]. IEEE Trans Med Imaging, 2025, 44(1): 19-30.
47	Antol S, Agrawal A, Lu J, et al. VQA: Visual question answering[DB/OL]. (2016-10-27)[2025-03-18].
48	OpenAI, Achiam J, Adler S, et al. GPT-4 technical report[DB/OL]. (2024-05-04)[2025-03-18].
49	Liu H, Li C, Wu Q, et al. Visual instruction tuning[DB/OL]. (2023-11-11)[2025-03-18].
50	Wu J, Gan W, Chen Z, et al. Multimodal large language models: A survey[DB/OL]. (2023-11-22)[2025-03-18].
51	Li J, Li D, Savarese S, et al. BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language models[DB/OL]. (2023-07-13)[2025-03-18].
52	Zhang X, Wu C, Zhao Z, et al. PMC-VQA: Visual instruction tuning for medical visual question answering[DB/OL]. (2024-09-08)[2025-03-18].
53	Tu T, Azizi S, Driess D, et al. Towards generalist biomedical AI[J]. NEJM AI, 2024, 1(3): AIoa2300138.
54	Li X, Zhao L, Zhang L, et al. Artificial general intelligence for medical imaging analysis[J]. IEEE Rev Biomed Eng, 2025, 18: 113-129.
55	Li C, Wong C, Zhang S, et al. LLaVA-Med: Training a large language-and-vision assistant for biomedicine in one day[DB/OL]. (2023-06-01)[2025-03-18].
56	Thawakar OC, Shaker AM, Mullappilly SS, et al. XrayGPT: Chest radiographs summarization using large medical vision-language models[DB/OL]. (2023-06-13)[2025-03-18].
57	Guo X, Chai W, Li SY, et al. LLaVA-Ultra: Large Chinese language and vision assistant for ultrasound[DB/OL]. (2024-10-19)[2025-03-18].
58	Hartsock I, Rasool G. Vision-language models for medical report generation and visual question answering: A review[J]. Front Artif Intell, 2024, 7: 1430984.
59	Ji Z, Lee N, Frieske R, et al. Survey of hallucination in natural language generation[J]. ACM Computing Surveys, 2023, 55(12): 1-38.
60	Huang C, Song P, Trzasko JD, et al. Simultaneous noise suppression and incoherent artifact reduction in ultrafast ultrasound vascular imaging[J]. IEEE Trans Ultrason Ferroelectr Freq Control, 2021, 68(6): 2075-2085.
61	Goceri E. Medical image data augmentation: techniques, comparisons and interpretations[J]. Artif Intell Rev, 2023, 20: 1-45.
62	Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets[DB/OL]. (2014-06-10)[2025-03-18].
63	Ho J, Jain A, Abbeel P. Denoising diffusion probabilistic models[DB/OL]. (2020-11-16)[2025-07-13].
64	Cao Y, Li S, Liu Y, et al. A comprehensive survey of AI-generated content (AIGC): A history of generative AI from GAN to ChatGPT[DB/OL]. (2023-03-07)[2025-03-18].
65	Bargsten L, Schlaefer A. SpeckleGAN: a generative adversarial network with an adaptive speckle layer to augment limited training data for ultrasound image processing[J]. Int J Comput Assist Radiol Surg, 2020, 15(9): 1427-1436.
66	Maack L, Holstein L, Schlaefer A. GANs for generation of synthetic ultrasound images from small datasets[J]. Current Directions in Biomedical Engineering, 2022, 8(1): 17-20.
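67	Li Y, Cai JY, Dang XZ, et al. Application of a deep learning-based generation model for breast ultrasound strain elastography images[J/OL]. Chinese Journal of Medical Ultrasound (Electronic Edition), 2024, 21(6): 563-570. (in Chinese)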
68	Feng R, Lin Z, Zhu J, et al. Uncertainty principles of encoding GANs[C]. Proceedings of the 38th International Conference on Machine Learning. PMLR, 2021, 139: 3240-3251.
69	Stojanovski D, Hermida U, Lamata P, et al. Echo from noise: synthetic ultrasound image generation using diffusion models for real image segmentation[J]. Simpl Med Ultrasound(2023), 2023, 14337: 34-43.
70	Song Z, Zhou Y, Wang J, et al. Synthesizing real-time ultrasound images of muscle based on biomechanical simulation and conditional diffusion network[J]. IEEE Trans Ultrason Ferroelectr Freq Control, 2024, 71(11): 1501-1513.
71	Nori H, King N, McKinney SM, et al. Capabilities of GPT-4 on medical challenge problems[DB/OL]. (2023-04-12)[2025-03-18].
72	Kung TH, Cheatham M, Medenilla A, et al. Performance of ChatGPT on USMLE: potential for AI-assisted medical education using large language models[J]. PLoS Digit Health, 2023, 2(2): e0000198.
73	Farhat F, Chaudhry BM, Nadeem M, et al. Evaluating large language models for the national premedical exam in India: comparative analysis of GPT-3.5, GPT-4, and Bard[J]. JMIR Med Educ, 2024, 10: e51523.
74	AlSaad R, Abd-Alrazaq A, Boughorbel S, et al. Multimodal large language models in health care: applications, challenges, and future outlook[J]. J Med Internet Res, 2024, 26: e59505.
75	Wu Y, Liu Y, Yang Y, et al. A concept-based interpretable model for the diagnosis of choroid neoplasias using multimodal data[J]. Nat Commun, 2025, 16(1): 3504.
76	Niu S, Ma J, Bai L, et al. EHR-KnowGen: Knowledge-enhanced multimodal learning for disease diagnosis generation[J]. Information Fusion, 2024, 102: 102069.
77	Yang S, Niu J, Wu J, et al. Automatic ultrasound image report generation with adaptive multimodal attention mechanism[J]. Neurocomputing, 2021, 427: 40-49.
