2024, Vol. 21, Issue 03: 304-309
DOI: https://doi.org/10.3877/cma.j.issn.1672-6448.2024.03.009
Establishment and verification of an artificial intelligence system for automatic segmentation and classification of thyroid nodules
Copy editor: 吴春凤
Received date: 2023-06-11
Online published: 2024-06-05
Funding: Science and Technology Commission of Shanghai Municipality project (21Y11910800)
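Cite this article: 伯小皖, 郭乐杭, 余松远, 李明宙, 孙丽萍. Establishment and verification of an artificial intelligence system for automatic segmentation and classification of thyroid nodules[J]. Chinese Journal of Medical Ultrasound (Electronic Edition), 2024, 21(03): 304-309. DOI: 10.3877/cma.j.issn.1672-6448.2024.03.009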
To develop an artificial intelligence (AI) system that automatically segments thyroid nodules and classifies them as benign or malignant.
Ultrasound images were collected from 872 patients with thyroid nodules confirmed by needle biopsy at Shanghai Tenth People's Hospital between October 2017 and October 2018. AI methods were used to process the images, detect nodules, and return a result; on this basis an AI system was built, validated, and tested internally. All collected ultrasound images were divided into a training set, a validation set, and an internal test set at a ratio of 6:2:2 for preliminary validation and testing. Ultrasound images of 209 patients with thyroid nodules (209 nodules in total) from other hospitals were then used for further validation. With the pathological results of needle biopsy or surgery as the reference standard, the sensitivity, specificity, accuracy, positive predictive value, and negative predictive value of a junior physician group, a senior physician group, and the AI system for distinguishing benign from malignant thyroid nodules were calculated. Receiver operating characteristic curves were plotted for all three, the area under the curve (AUC) was calculated, and the DeLong test was used to compare the diagnostic performance of the AI system with that of the junior and senior physician groups.
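The 6:2:2 partition described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `split_622`, the fixed seed, the rounding rule, and the use of 872 as a stand-in item count are all assumptions, and the abstract does not say whether images from the same patient were kept in the same subset.

```python
import random


def split_622(items, seed=0):
    """Shuffle items and split them into train/validation/test at 6:2:2.

    Hypothetical helper: the seed and the rounding rule are illustrative only.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = round(n * 0.6)
    n_val = round(n * 0.2)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder, so no item is dropped
    return train, val, test


# 872 is used here only as a stand-in count (the abstract reports 872 patients,
# not the number of images).
train, val, test = split_622(range(872))
print(len(train), len(val), len(test))
```

If several images come from one patient, splitting by patient identifier rather than by image would avoid the same patient appearing in both training and test data.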
The automatic nodule segmentation rates of the AI system were 98.8%, 98.9%, and 98.1% in the validation set, internal test set, and external test set, respectively. In the external test set, the diagnostic sensitivity, specificity, and accuracy of the AI system did not differ significantly from those of the junior or senior physician group (P>0.017 for all). The AUC of the AI system for distinguishing benign from malignant thyroid nodules was higher than that of the junior physician group [0.885 (95%CI: 0.842-0.929) vs 0.823 (95%CI: 0.771-0.875), P=0.022] and similar to that of the senior physician group [0.932 (95%CI: 0.897-0.966), P=0.096].
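For a reader who makes a single benign-or-malignant call per nodule, the ROC curve has only one operating point, and the trapezoidal AUC reduces to the mean of sensitivity and specificity. The sketch below applies that identity to the sensitivity and specificity values reported in Table 3. The function name is hypothetical, and whether the authors computed their AUCs this way is an assumption, although the results agree with the reported AUCs to within rounding.

```python
def single_point_auc(sensitivity, specificity):
    """Trapezoidal AUC of an ROC curve through (0, 0), (1 - spec, sens), (1, 1)."""
    fpr = 1.0 - specificity
    left = 0.5 * fpr * sensitivity                   # triangle up to the operating point
    right = 0.5 * (1.0 - fpr) * (sensitivity + 1.0)  # trapezoid from the point to (1, 1)
    return left + right


# (sensitivity, specificity, reported AUC) taken from Table 3 and the Results.
reported = {
    "AI system": (0.890, 0.881, 0.885),
    "junior physicians": (0.820, 0.826, 0.823),
    "senior physicians": (0.900, 0.963, 0.932),
}

aucs = {name: single_point_auc(se, sp) for name, (se, sp, _) in reported.items()}
for name, auc in aucs.items():
    print(f"{name}: computed {auc:.4f}, reported {reported[name][2]:.3f}")
```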
We developed an AI system that automatically segments thyroid nodules and classifies them as benign or malignant. It showed high diagnostic performance in the external test set and may help junior physicians distinguish benign from malignant thyroid nodules more accurately.
Key words: Artificial intelligence; Thyroid nodule; Ultrasound
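Table 1 Scenarios for AI system-assisted diagnosis of benign and malignant thyroid nodules
Attribute model | Benign/malignant model | Assisted diagnosis result |
---|---|---|
No nodule | No nodule | No obvious abnormality |
Scenario 1 | | |
ACR TI-RADS score 0-2 | Benign | Considered benign |
ACR TI-RADS score 3 | Benign | Low suspicion of malignancy |
ACR TI-RADS score 4-6 | Benign | Moderate suspicion of malignancy |
ACR TI-RADS score 7 | Benign | Moderate suspicion of malignancy |
Scenario 2 | | |
ACR TI-RADS score 0-2 | Malignant | Low suspicion of malignancy |
ACR TI-RADS score 3 | Malignant | Low suspicion of malignancy |
ACR TI-RADS score 4-6 | Malignant | Moderate suspicion of malignancy |
ACR TI-RADS score 7 | Malignant | High suspicion of malignancy |
Note: ACR TI-RADS is the Thyroid Imaging Reporting and Data System published by the American College of Radiology. The AI system reaches a combined judgment from the attribute model and the benign/malignant model.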
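The combination rule in Table 1 amounts to a small lookup, sketched below. This is an illustration of the table only, not the system's implementation: the function name, the argument encoding, and the English result strings are assumptions, and scores outside the 0 to 7 range listed in the table are rejected rather than guessed at.

```python
def assisted_diagnosis(tirads_score, malignancy_call):
    """Combine the attribute model's ACR TI-RADS score with the
    benign/malignant model's call, following the rows of Table 1.

    tirads_score: integer score, or None when no nodule is found.
    malignancy_call: "benign", "malignant", or None when no nodule is found.
    """
    if tirads_score is None or malignancy_call is None:
        return "no obvious abnormality"
    if malignancy_call not in ("benign", "malignant"):
        raise ValueError("malignancy_call must be 'benign' or 'malignant'")
    if not 0 <= tirads_score <= 7:
        raise ValueError("Table 1 only covers scores from 0 to 7")

    malignant = malignancy_call == "malignant"
    if tirads_score <= 2:
        return "low suspicion of malignancy" if malignant else "considered benign"
    if tirads_score == 3:
        return "low suspicion of malignancy"
    if tirads_score <= 6:
        return "moderate suspicion of malignancy"
    # Score 7: only agreement of both models yields the highest category.
    return "high suspicion of malignancy" if malignant else "moderate suspicion of malignancy"


print(assisted_diagnosis(2, "malignant"))
print(assisted_diagnosis(7, "benign"))
```

Read this way, the two models disagree most visibly at the extremes: a low score with a malignant call is raised to low suspicion, and a score of 7 with a benign call is held at moderate suspicion.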
Table 2 Comparison of baseline clinical characteristics of patients with thyroid nodules in the two datasets
Parameter | Training, validation, and internal test sets (n=872) | External test set (n=209) | Statistic | P value |
---|---|---|---|---|
Female [n (%)] | 686 (78.7) | 167 (79.9) | χ²=0.154 | 0.694 |
Age (years, mean±SD) | 48.0±13.5 | 44.8±11.7 | t=14.031 | <0.001 |
Maximum nodule diameter (mm, mean±SD) | 22.1±18.8 | 24.7±16.4 | t=5.165 | 0.023 |
Solid composition [n (%)] | 439 (50.3) | 114 (54.5) | χ²=1.191 | 0.275 |
Hypoechoic or very hypoechoic [n (%)] | 384 (44.0) | 98 (46.9) | χ²=0.556 | 0.456 |
Punctate calcification [n (%)] | 285 (32.7) | 63 (30.1) | χ²=0.498 | 0.480 |
Irregular shape [n (%)] | 171 (19.6) | 38 (18.2) | χ²=0.221 | 0.639 |
Ill-defined margin [n (%)] | 244 (28.0) | 42 (20.1) | χ²=5.388 | 0.020 |
Aspect ratio >1 [n (%)] | 281 (32.2) | 62 (29.7) | χ²=0.510 | 0.475 |
Malignant [n (%)] | 375 (43.0) | 100 (47.8) | χ²=1.605 | 0.205 |
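The categorical rows of Table 2 compare two proportions, which can be checked with a Pearson chi-square test on the 2x2 table of counts. The sketch below uses only the standard library. The function name is hypothetical, and the choice of the uncorrected Pearson statistic is an assumption, made because it reproduces the reported values for the rows tried here.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square without continuity correction for [[a, b], [c, d]].

    Returns (statistic, P value) for 1 degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


# Female row: 686 of 872 versus 167 of 209.
chi2_female, p_female = pearson_chi2_2x2(686, 872 - 686, 167, 209 - 167)
# Malignant row: 375 of 872 versus 100 of 209.
chi2_malig, p_malig = pearson_chi2_2x2(375, 872 - 375, 100, 209 - 100)

print(f"female:    chi2={chi2_female:.3f}, P={p_female:.3f}")
print(f"malignant: chi2={chi2_malig:.3f}, P={p_malig:.3f}")
```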
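Table 3 Diagnostic performance of the AI system, junior physicians, and senior physicians for thyroid nodules in the external test set [% (95%CI)]
Parameter | Sensitivity | Specificity | Positive predictive value | Negative predictive value | Accuracy |
---|---|---|---|---|---|
AI system | 89.0 (82.9-95.1) | 88.1 (82.0-94.2) | 87.3 (80.8-93.7) | 89.7 (84.0-95.5) | 88.5 (88.4-88.6) |
Junior physicians | 82.0 (74.5-89.5) | 82.6 (75.4-89.7)a | 81.2 (73.6-88.8)a | 83.3 (76.3-90.4) | 82.3 (82.2-82.4)a |
Senior physicians | 90.0 (84.1-95.9) | 96.3 (92.8-99.9) | 95.7 (91.7-99.8) | 91.3 (86.2-96.5) | 93.3 (93.2-93.4) |
χ² value | 3.360 | 10.675 | 9.742 | 3.755 | 12.086 |
P value | 0.186 | 0.005 | 0.008 | 0.153 | 0.002 |
Note: a, statistically significant difference compared with the senior physician group (P=0.001, 0.002, and 0.001).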
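The point estimates and intervals in Table 3 can be related to a confusion matrix as sketched below. The counts are back-calculated from the AI system row and the 100 malignant and 109 benign nodules of the external test set in Table 2, so they are an assumption rather than reported data. The normal-approximation (Wald) interval is likewise an assumed method. It reproduces the sensitivity, specificity, and predictive-value intervals of the AI system row. Accuracy is left out because this method does not reproduce its reported interval.

```python
import math


def proportion_with_wald_ci(successes, total, z=1.96):
    """Return (proportion, lower, upper) using the normal-approximation interval."""
    p = successes / total
    half_width = z * math.sqrt(p * (1.0 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Assumed counts for the AI system (back-calculated, not reported in the source).
tp, fn, tn, fp = 89, 11, 96, 13

metrics = {
    "sensitivity": proportion_with_wald_ci(tp, tp + fn),
    "specificity": proportion_with_wald_ci(tn, tn + fp),
    "positive predictive value": proportion_with_wald_ci(tp, tp + fp),
    "negative predictive value": proportion_with_wald_ci(tn, tn + fn),
}

for name, (p, lower, upper) in metrics.items():
    print(f"{name}: {100 * p:.1f} ({100 * lower:.1f}-{100 * upper:.1f})")
```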