Chinese Journal of Medical Ultrasound (Electronic Edition) ›› 2025, Vol. 22 ›› Issue (11): 1055-1061. doi: 10.3877/cma.j.issn.1672-6448.2025.11.009

Abdominal Ultrasound Imaging

Utility of DeepSeek large language models for structured ultrasound reporting and automated tumor staging in gastric and rectal cancer

Zhenqi Zhang, Man Lu, Yihan Qi, Ming Zhuang, Ziyue Hu, Lu Wang

  1. Sichuan Clinical Research Center for Cancer, Sichuan Cancer Hospital & Institute, Sichuan Cancer Center, University of Electronic Science and Technology of China, Chengdu 610041, China
  • Received: 2025-08-25 Published: 2025-11-01
  • Corresponding author: Lu Wang
  • Funding:
    National Key Research and Development Program of China (2019YFE0196700); National Natural Science Foundation of China (82272015); Sichuan Regional Innovation Cooperation Project (2024YFHZ0140)
Cite this article:

Zhenqi Zhang, Man Lu, Yihan Qi, Ming Zhuang, Ziyue Hu, Lu Wang. Utility of DeepSeek large language models for structured ultrasound reporting and automated tumor staging in gastric and rectal cancer[J/OL]. Chinese Journal of Medical Ultrasound (Electronic Edition), 2025, 22(11): 1055-1061.

Objective

To investigate the utility of DeepSeek large language models (LLMs) for structuring ultrasound reports and automatically assessing T stage in gastric and rectal cancer.

Methods

A total of 121 ultrasound examination reports for gastric and rectal cancer, performed at Sichuan Cancer Hospital between January 2023 and December 2024, were included in this study. A team of senior ultrasound physicians developed a structured template for gastric and rectal cancer ultrasound reports. The DeepSeek R1 and V3 models were used to extract structured information and assess T stage. Structured report generation was evaluated using recall, precision, and F1 score, and T-staging performance was evaluated using accuracy. Three physicians compared the DeepSeek-generated reports with the original reports in terms of review efficiency and clinical usability.
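The evaluation metrics named above can be illustrated with a minimal sketch. The abstract does not say how true and false positives were defined for the extracted fields, so the cell definitions in the comments, the class and function names, and every count below are hypothetical and are not the study's data.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class FourfoldCounts:
    """Cells of a 2x2 (fourfold) table for one model; definitions are assumptions."""
    tp: int  # item present in the original report and extracted correctly
    fp: int  # item extracted but absent from, or inconsistent with, the original
    fn: int  # item present in the original report but missed
    tn: int  # item absent from the original and correctly left empty


def extraction_metrics(c: FourfoldCounts) -> dict:
    """Return precision, recall and F1 computed from the fourfold counts."""
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def staging_accuracy(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of reports whose predicted T stage equals the reference stage."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("predicted and reference must be non-empty and equal in length")
    hits = sum(p == r for p, r in zip(predicted, reference))
    return hits / len(reference)


if __name__ == "__main__":
    # Hypothetical counts and stage labels, for illustration only.
    print(extraction_metrics(FourfoldCounts(tp=90, fp=10, fn=10, tn=11)))
    print(staging_accuracy(["T3", "T2", "T4a"], ["T3", "T3", "T4a"]))
```

Precision and recall are kept separate because a model can miss items (lower recall) without inventing any (precision unchanged), and the reverse; F1 summarizes both in a single number.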

Results

For structured information extraction, both the DeepSeek R1 and V3 models achieved recall, precision, and F1 scores above 0.9, with no statistically significant differences between the two models (all P>0.05). For T-staging assessment, the DeepSeek R1 model, which uses a reasoning mode, achieved the highest accuracy at 76.86%, significantly higher than the 59.50% achieved by the DeepSeek V3 model (χ²=8.51, P<0.05). Compared with the mean time required to review the original reports [(60.96±6.11) s/report], the review time was significantly shorter for structured reports generated by DeepSeek R1 [(18.12±4.52) s/report] (t=60.38; P<0.001) and by DeepSeek V3 [(17.15±2.60) s/report] (t=71.98; P<0.001). On the 5-point Likert scale, the original reports scored 3 (3, 3), whereas the DeepSeek R1 and V3 reports scored 1 (1, 2) (Z=-9.72; P<0.001) and 1 (1, 2) (Z=-9.95; P<0.001), respectively; both differences from the original reports were statistically significant.
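As a quick arithmetic check, the reported T-staging accuracies can be converted back into whole-report counts out of the 121 reports. The sketch below is an illustration only: the helper name `correct_count` is hypothetical, and the counts are derived from the rounded percentages in the abstract rather than taken from the study's tables.

```python
def correct_count(accuracy_pct: float, n_reports: int) -> int:
    """Return the whole number of correctly staged reports closest to the given percentage."""
    return round(accuracy_pct / 100 * n_reports)


N_REPORTS = 121  # total reports stated in the abstract

r1_correct = correct_count(76.86, N_REPORTS)  # DeepSeek R1
v3_correct = correct_count(59.50, N_REPORTS)  # DeepSeek V3

for name, k in (("R1", r1_correct), ("V3", v3_correct)):
    # Recompute the percentage from the integer count to confirm it rounds back.
    print(f"DeepSeek {name}: {k}/{N_REPORTS} = {k / N_REPORTS:.2%}")

print(f"Difference: {r1_correct - v3_correct} reports")
```

Under these assumptions the percentages correspond to 93 and 72 correctly staged reports, a gap of 21 reports between the two models.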

Conclusion

DeepSeek large language models, particularly the R1 version, can effectively extract structured information from gastric and rectal cancer ultrasound reports and demonstrate relatively high accuracy in T-staging assessment. The generated reports help improve review efficiency and have the potential to assist clinical decision-making.

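Figure 1 Flowchart of structured report generation using the DeepSeek R1 and V3 models. Note: API, application programming interface
Table 1 General characteristics of the gastrointestinal ultrasound reports [n=121, cases (%)]
Figure 2 Example of a structured report generated by a DeepSeek model from an original gastrointestinal ultrasound report. Note: CDFI, color Doppler flow imaging; PSV, peak systolic velocity; RI, resistive index
Table 2 Fourfold (2×2) tables for evaluating the information extraction performance of structured reports generated by the DeepSeek R1 and V3 models (number of reports)
Table 3 Comparison of structured information extraction performance between the DeepSeek R1 and V3 models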