
Chinese Journal of Medical Ultrasound (Electronic Edition) ›› 2025, Vol. 22 ›› Issue (09): 832-837. doi: 10.3877/cma.j.issn.1672-6448.2025.09.007

Quality Control in Ultrasound Medicine

大语言模型DeepSeek-R1在甲状腺超声报告质量控制中的初步应用
张振奇, 齐艺涵, 王璐, 胡紫玥, 李婷婷, 卢漫
  1. 610041 成都,四川省肿瘤医院·研究所,四川省肿瘤临床医学研究中心,四川省癌症防治中心,电子科技大学附属肿瘤医院
  • 收稿日期:2025-08-07 出版日期:2025-09-01
  • 通信作者: 卢漫
  • Funding:
    National Key Research and Development Program of China (2019YFE0196700); National Natural Science Foundation of China (82272015)

Leveraging the large language model DeepSeek-R1 for quality control of thyroid ultrasound reports

Zhenqi Zhang, Yihan Qi, Lu Wang, Ziyue Hu, Tingting Li, Man Lu

  1. Sichuan Clinical Research Center for Cancer, Sichuan Cancer Hospital & Institute, Sichuan Cancer Center, University of Electronic Science and Technology of China, Chengdu 610041, China
  • Received:2025-08-07 Published:2025-09-01
  • Corresponding author: Man Lu
Cite this article:

张振奇, 齐艺涵, 王璐, 胡紫玥, 李婷婷, 卢漫. 大语言模型DeepSeek-R1在甲状腺超声报告质量控制中的初步应用[J/OL]. 中华医学超声杂志(电子版), 2025, 22(09): 832-837.

Zhenqi Zhang, Yihan Qi, Lu Wang, Ziyue Hu, Tingting Li, Man Lu. Leveraging the large language model DeepSeek-R1 for quality control of thyroid ultrasound reports[J/OL]. Chinese Journal of Medical Ultrasound (Electronic Edition), 2025, 22(09): 832-837.

目的

探究基于大语言模型DeepSeek-R1构建的智能质量控制系统在甲状腺超声报告书写质量控制中的应用价值。

方法

随机抽取2024年1月至12月四川省肿瘤医院超声医学科的甲状腺超声检查报告120份作为研究对象,分别由DeepSeek-R1和2名超声主治医师进行甲状腺超声检查报告的独立评估,将3名副高级职称医师组成的专家组的联合评审结果作为金标准。绘制DeepSeek-R1与人工质量控制的受试者操作特征(ROC)曲线评估其效能,计算曲线下面积(AUC)、敏感度、特异度、阳性预测值、阴性预测值,采用DeLong检验比较AUC的差异;采用Kappa检验评价不同质量控制者之间的结果一致性程度。

结果

在120份报告中,以专家组评审为金标准,DeepSeek-R1模型的ROC曲线AUC为0.959[95%可信区间(CI):0.902~1.000],其敏感度为92.86%,特异度为98.91%,均优于2名超声主治医师[AUC:0.828(95%CI:0.720~0.936),0.835(95%CI:0.731~0.940);敏感度:67.86%,71.43%;特异度:97.83%,95.65%]。经DeLong检验,DeepSeek-R1与2位超声主治医师的AUC差异具有统计学意义(P=0.020、0.025)。Kappa检验显示,DeepSeek-R1与2位超声主治医师的判定结果具有良好的一致性(κ值分别为0.64和0.63)。

结论

基于DeepSeek-R1模型的智能质量控制系统在甲状腺超声检查报告规范性审核中表现出优于人工的效能,能有效降低主观误差,具有良好的临床应用价值,但仍需关注模型局限性等问题。

Objective

To evaluate the utility of an intelligent quality control system based on the large language model DeepSeek-R1 in quality control of thyroid ultrasound report writing.

Methods

A total of 120 thyroid ultrasound examination reports from the Department of Ultrasound, Sichuan Cancer Hospital, collected between January and December 2024, were randomly selected as study subjects. The reports were independently evaluated by DeepSeek-R1 and two attending ultrasound physicians. The joint assessment results of an expert panel consisting of three associate chief physicians were regarded as the reference standard. Receiver operating characteristic (ROC) curves were generated to assess the performance of DeepSeek-R1 and human quality control, with the area under the curve (AUC), sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) calculated. Differences in AUC were compared using the DeLong test. Agreement among different quality controllers was evaluated with the Kappa statistic.
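
The evaluation metrics named above can be sketched in a few lines. This is a minimal illustration, not the authors' analysis code: it assumes that a "positive" means a report judged defective by the reference standard, that the Kappa statistic is Cohen's kappa for two raters, and the function names and toy labels are hypothetical.

```python
from typing import Dict, Sequence


def confusion_counts(truth: Sequence[int], pred: Sequence[int]) -> Dict[str, int]:
    """Count TP/FP/TN/FN. Assumption: 1 = report judged defective, 0 = acceptable."""
    if len(truth) != len(pred):
        raise ValueError("truth and pred must have the same length")
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for t, p in zip(truth, pred):
        if t == 1 and p == 1:
            counts["tp"] += 1
        elif t == 0 and p == 1:
            counts["fp"] += 1
        elif t == 0 and p == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Sensitivity, specificity, PPV and NPV from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # defective reports that were flagged
        "specificity": tn / (tn + fp),  # acceptable reports that were passed
        "ppv": tp / (tp + fp),          # flagged reports that were truly defective
        "npv": tn / (tn + fn),          # passed reports that were truly acceptable
    }


def cohen_kappa(rater_a: Sequence[int], rater_b: Sequence[int]) -> float:
    """Cohen's kappa for two raters with binary labels (undefined if chance agreement is 1)."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    p_a = sum(rater_a) / n  # share of reports rater A flags
    p_b = sum(rater_b) / n  # share of reports rater B flags
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)  # agreement expected by chance
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Toy labels for illustration only; these are not study data.
    reference = [1, 1, 1, 0, 0, 0, 0, 0]
    reviewer = [1, 1, 0, 0, 0, 0, 0, 1]
    print(diagnostic_metrics(**confusion_counts(reference, reviewer)))
    print(cohen_kappa(reference, reviewer))
```

Comparing AUCs with the DeLong test needs a dedicated statistical routine and is left out of this sketch.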

Results

Using the expert panel's assessment as the reference standard, DeepSeek-R1 achieved an AUC of 0.959 [95% confidence interval (CI): 0.902–1.000], a sensitivity of 92.86%, and a specificity of 98.91%, all higher than those of the two attending physicians [AUC: 0.828 (95% CI: 0.720–0.936) and 0.835 (95% CI: 0.731–0.940); sensitivity: 67.86% and 71.43%; specificity: 97.83% and 95.65%]. The DeLong test showed that the differences in AUC between DeepSeek-R1 and each of the two physicians were statistically significant (P = 0.020 and 0.025). Kappa analysis showed good agreement between DeepSeek-R1 and the two physicians (κ = 0.64 and 0.63, respectively).
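
As a consistency check, the reported percentages can be reproduced from whole-number counts. The abstract does not state the split of the 120 reports; the counts below (28 reports positive by the reference standard and 92 negative) are inferred from the rounded percentages and are an assumption, as are the names in the code. For a rater who gives a single yes/no judgement, the ROC curve has one operating point, and the area under it equals (sensitivity + specificity) / 2, which matches the reported AUCs.

```python
from typing import Dict, Tuple

# Assumption: split inferred from the reported percentages, not stated in the abstract.
N_POSITIVE = 28  # reports positive by the expert-panel reference standard
N_NEGATIVE = 92  # reports negative by the reference standard (28 + 92 = 120)

# Inferred (true positives, true negatives) for each reviewer.
INFERRED_CORRECT: Dict[str, Tuple[int, int]] = {
    "DeepSeek-R1": (26, 91),
    "Physician 1": (19, 90),
    "Physician 2": (20, 88),
}


def summarize(tp: int, tn: int) -> Tuple[float, float, float]:
    """Return (sensitivity %, specificity %, single-point AUC), rounded as in the abstract."""
    sensitivity = tp / N_POSITIVE
    specificity = tn / N_NEGATIVE
    # Area under a two-segment ROC curve through one operating point.
    auc = (sensitivity + specificity) / 2
    return round(100 * sensitivity, 2), round(100 * specificity, 2), round(auc, 3)


if __name__ == "__main__":
    for name, (tp, tn) in INFERRED_CORRECT.items():
        print(name, summarize(tp, tn))
```

Under these inferred counts, each reviewer's rounded sensitivity, specificity, and AUC agree with the values reported above.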

Conclusion

The intelligent quality control system based on the DeepSeek-R1 model outperformed human reviewers in checking thyroid ultrasound reports for compliance with reporting standards, effectively reducing subjective errors and showing promising clinical applicability. However, the limitations of the model require careful consideration.

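Figure 1. Schematic of the DeepSeek-R1 large language model workflow for ultrasound report quality control. Note: API, application programming interface; Prompt, instruction.
Figure 2. Receiver operating characteristic curves of the DeepSeek-R1 large language model and the ultrasound physicians for quality control of thyroid ultrasound reports. The DeepSeek-R1 curve lies highest, indicating an excellent balance of sensitivity and specificity at every threshold level; ultrasound physician 1 (red line) and ultrasound physician 2 (green line) performed similarly, but both below the DeepSeek-R1 model.
Table 1. Comparison of quality control performance for thyroid ultrasound reports between the DeepSeek-R1 large language model and two ultrasound physicians (120 reports).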