Performance of Automated Machine Learning for Diabetic Retinopathy Image Classification from Multi-field Handheld Retinal Images

Citation:

Jacoba CMP, Doan D, Salongcay RP, Aquino LAC, Silva JPY, Salva CMG, Zhang D, Alog GP, Zhang K, Locaylocay KLRB, Saunar AV, Ashraf M, Sun JK, Peto T, Aiello LP, Silva PS. Performance of Automated Machine Learning for Diabetic Retinopathy Image Classification from Multi-field Handheld Retinal Images. Ophthalmol Retina 2023;7(8):703-712.

Date Published:

2023 Aug

Abstract:

PURPOSE: To create and validate code-free automated deep learning models (AutoML) for diabetic retinopathy (DR) classification from handheld retinal images.

DESIGN: Prospective development and validation of AutoML models for DR image classification.

PARTICIPANTS: A total of 17 829 deidentified retinal images from 3566 eyes with diabetes, acquired using handheld retinal cameras in a community-based DR screening program.

METHODS: AutoML models were generated based on previously acquired 5-field (macula-centered, disc-centered, superior, inferior, and temporal macula) handheld retinal images. Each individual image was labeled using the International DR and diabetic macular edema (DME) Classification Scale by 4 certified graders at a centralized reading center under oversight by a senior retina specialist. Images for model development were split 8-1-1 for training, optimization, and testing to detect referable DR (refDR, defined as moderate nonproliferative DR or worse, or any level of DME). Internal validation was performed using a published image set from the same patient population (N = 450 images from 225 eyes). External validation was performed using a publicly available retinal imaging data set from the Asia Pacific Tele-Ophthalmology Society (N = 3662 images).

MAIN OUTCOME MEASURES: Area under the precision-recall curve (AUPRC), sensitivity (SN), specificity (SP), positive predictive value (PPV), negative predictive value (NPV), accuracy, and F1 scores.

RESULTS: Referable DR was present in 17.3%, 39.1%, and 48.0% of the training, internal validation, and external validation sets, respectively. The model's AUPRC was 0.995, with a precision and recall of 97% using a score threshold of 0.5. Internal validation showed that SN, SP, PPV, NPV, accuracy, and F1 scores were 0.96 (95% confidence interval [CI], 0.884-0.99), 0.98 (95% CI, 0.937-0.995), 0.96 (95% CI, 0.884-0.99), 0.98 (95% CI, 0.937-0.995), 0.97, and 0.96, respectively. External validation showed that SN, SP, PPV, NPV, accuracy, and F1 scores were 0.94 (95% CI, 0.929-0.951), 0.97 (95% CI, 0.957-0.974), 0.96 (95% CI, 0.952-0.971), 0.95 (95% CI, 0.935-0.956), 0.97, and 0.96, respectively.

CONCLUSIONS: This study demonstrates the accuracy and feasibility of code-free AutoML models for identifying refDR, developed using handheld retinal imaging in a community-based screening program. The use of AutoML may broaden access to machine learning models that can be adapted to specific screening programs, guided by the clinical need to rapidly address disparities in health care delivery.

FINANCIAL DISCLOSURE(S): Proprietary or commercial disclosure may be found after the references.
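The outcome measures reported above all derive from a standard 2×2 confusion matrix with referable DR as the positive class. A minimal sketch of those definitions follows; the function name and the example counts are illustrative only and do not come from the study:

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute screening metrics from confusion-matrix counts
    (refDR = positive class). All inputs are raw counts."""
    sn = tp / (tp + fn)                        # sensitivity (recall)
    sp = tn / (tn + fp)                        # specificity
    ppv = tp / (tp + fp)                       # positive predictive value (precision)
    npv = tn / (tn + fn)                       # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * sn / (ppv + sn)             # harmonic mean of precision and recall
    return {"SN": sn, "SP": sp, "PPV": ppv, "NPV": npv,
            "accuracy": accuracy, "F1": f1}

# Hypothetical counts, chosen only to demonstrate the calculation:
metrics = classification_metrics(tp=90, fp=5, tn=120, fn=10)
```

Note that PPV and NPV, unlike SN and SP, depend on disease prevalence, which is why the abstract reports the refDR prevalence of each data set (17.3%, 39.1%, and 48.0%) alongside the metrics.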

Last updated on 09/04/2023