Automated and Computer-Assisted Detection, Classification, and Diagnosis of Diabetic Retinopathy
Introduction
Systems for computer-assisted and fully automated detection, triage, and diagnosis of diabetic retinopathy (DR) from retinal images show great variation in design, level of autonomy, and intended use. Moreover, the degree to which these systems have been evaluated and validated is heterogeneous. We use the term DR artificial intelligence (AI) system as a general term for any system that interprets retinal images with at least some degree of autonomy from a human grader.
Rationale
The introduction of AI in medicine has raised significant ethical, economic, and scientific controversies. Because an explicit goal of AI is to perform processes previously reserved for human clinicians and other health care personnel, there is justified concern about its impact on patient safety, efficacy, equity, liability, and the labor market.
To partially address these controversies, the Partnership on AI was established to formulate best practices for the application of AI technologies, to advance the public’s understanding of AI, and to serve as an open platform for discussion and engagement about AI and its influence on individuals and society.1 More recently, the American Medical Association, in a series of policy statements, most recently in 2019, has been addressing these concerns for the health care field: the formulation of principles of safety, efficacy, and equity for AI, and autonomous AI, in both design and validation; the integration of AI into the health care system; when the AI developer should assume liability; and the development of a common nomenclature and guidelines for domain-specific systems.2 If AI systems in general, and DR AI systems specifically, are to gain acceptance by patients, medical providers, payers, and the general public, a common language for describing them, widely agreed-upon guidelines, and the upholding and dissemination of these principles are essential. This study is intended to establish a common framework and lexicon for the consideration of DR AI systems, and to provide a starting point for future practice guidelines. In this context, the following discussion will refer to these preliminary recommendations as “guidelines.”
Currently, most AI systems function as augmented intelligence, wherein there is a combination of human tasks that are difficult or impossible to computerize (e.g., common sense, morals, compassion, imagination, and abstraction) and AI system tasks (e.g., pattern identification and machine learning) to achieve high clinical accuracy, low intraobserver variability, and improved system scalability.3 In the case of DR AI systems, a fully automated grading of a retinal photograph to identify a threshold level of DR may allow a provider or program to determine whether referral to an eye care provider is needed as a component of the patient’s diabetes care.4 With sufficient clinical accuracy, acceptable cost, and ease of use, multiple other use cases both inside (e.g., optometric or ophthalmologic office, and fundus reading center) and outside (e.g., pharmacy and phlebotomy laboratory) traditional eye care may also find value in such systems.
In these guidelines, the following components of an AI system will be discussed in sequence: “Level of Device Autonomy,” “Intended Use,” “Level of Evidence for Diagnostic Accuracy,” and “System Design.” At the current stage of scientific and legal evidence, there is no basis for recommending a specific combination of autonomy, accuracy, and intended use as more appropriate than any other. Thus, the current guidelines treat each component as independent, with the practice recommendations necessarily descriptive rather than prescriptive. Issues such as patient recruitment, patient referral, and the wider health care context in which these systems operate are outside the scope of these guidelines.
Where possible, these practice recommendations align with other published guidelines, including the Food and Drug Administration’s (FDA) proposed Software as a Medical Device (SaMD): Clinical Evaluation guidelines,5 and presentations by the FDA after its recent authorization of the first autonomous diagnostic AI.4 It is to be expected that, as our understanding of DR AI systems advances, these guidelines may become more prescriptive. Because DR AI systems are a relatively new introduction in health care, many readers may not be familiar with their associated lexicon, categorical structure, and quality measures. These and other features of this type of software that operates as a medical device have been described by the International Medical Device Regulators Forum, and may facilitate understanding of these guidelines.5
Level of Device Autonomy
The autonomy of DR AI systems is categorized in reference to the diagnostic decision being made by the DR AI. In other words, the autonomy levels reflect the degree of expert oversight (or lack thereof) of the clinical decision in clinical care. Autonomy levels in reference to the patient decision are (1) no autonomy, (2) assistive, or (3) autonomous, and are provided in Table 1.
Table 1. DR AI System Autonomy Levels

| Autonomy Level | Description |
| --- | --- |
| No autonomy | System that does not provide treatment, diagnosis, or screening recommendations |
| Assistive | AI system that assists clinicians by giving treatment, diagnosis, or screening recommendations, while relying on physician interpretation of that advice to direct patient care |
| Autonomous | AI system that provides treatment, diagnosis, or screening recommendations directly, without physician interpretation |
Classifying DR AI systems according to autonomy level has implications for patient safety, testing/validation, and, therefore, the claims that can be made about such a system. The more autonomously an AI system operates, the higher the requirements are for technical sophistication, validation, and system controls. The FDA has authorized or cleared systems in each autonomy level.
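For illustration only, this taxonomy can be encoded as a small data structure so that downstream software can enforce the oversight each level implies; the names and helper below are hypothetical, not drawn from any regulatory standard:

```python
from enum import Enum, auto


class AutonomyLevel(Enum):
    """Autonomy of the DR AI diagnostic decision (cf. Table 1)."""
    NO_AUTONOMY = auto()  # no treatment, diagnosis, or screening recommendation
    ASSISTIVE = auto()    # recommendation interpreted by a physician
    AUTONOMOUS = auto()   # recommendation acted on without physician interpretation


def requires_physician_interpretation(level: AutonomyLevel) -> bool:
    """Only an autonomous system may close the loop without a physician."""
    return level is not AutonomyLevel.AUTONOMOUS
```

A deployment harness could, for example, refuse to release an assistive system’s output directly to a patient-facing channel whenever `requires_physician_interpretation` returns True.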
Intended Use
The purpose of this section is to put forth a classification system for specifying the intended use of any system for the detection, triage, or diagnosis of DR from images, including DR AI systems. In this discussion, intended use refers to the planned sociotechnical environment of the users and patients.6 There are multiple characteristics of intended use, the most prominent being the operational environment, type of output, and end user. Although this intended use classification system is proposed for DR AI applications, it is general enough that it may be useful for describing any image-based diagnostic system. A DR AI system intended for clinical use should align with the scientific state of the art and be characterized by functional aspects of the system.
The characteristics of a DR AI system are interrelated:
- The environment may be one or more of primary care clinics, endocrinology clinics, diabetes and family care clinics, telemedicine programs, reading centers, retail walk-in clinics, ophthalmology clinics, optometry clinics, retinal specialist clinics, patient homes, and other settings.
- The type of output maps generally to the validation categories defined in the parent document of this study (ATA Telehealth Practice Guidelines for Diabetic Retinopathy); a fifth category, for more comprehensive diagnosis of retinal disease in addition to DR, has been added.7–10 Specifically:
○ A DR AI program that allows identification of patients who have no or minimal DR versus those who have more than minimal DR could be considered ATA Category 1 (Refs. 11–14).
○ A DR AI program that allows identification of patients who do not have sight-threatening DR versus those who have potentially sight-threatening DR could be considered ATA Category 2 (Refs. 7, 14).
○ A DR AI program that allows identification of defined clinical levels of nonproliferative DR (mild, moderate, or severe), proliferative DR (early, high risk), and diabetic macular edema (DME) (according to a clinical grading scheme,15 typically the Early Treatment Diabetic Retinopathy Study [ETDRS]16) with accuracy sufficient to determine appropriate follow-up and treatment strategies could be considered ATA Category 3 (Ref. 14).
○ A DR AI system that matches or exceeds the ability of ETDRS photographs to identify all lesions of DR, to determine precise levels of DR and DME,16 could be considered ATA Category 4 (Ref. 14).
○ A DR AI system that can exclude or describe the presence of non-DR diagnoses, such as, but not limited to, retinal vein occlusions, hypertensive retinopathy, choroidal nevus, and macular degeneration, is not currently described in the ATA categories, although the ETDRS system includes level 12 to describe non-DR findings.10
- The end user can be physicians and other providers, nonphysician staff, or patients (in a direct-to-consumer paradigm).
- Additional characteristics of intended use that can be specified are:
○ A specific image quality taxonomy and level required by the DR AI system.
○ A specific imaging protocol required by the DR AI system, which may include requirements for the size, number, and localization of fields per eye.
○ An ability of the DR AI system to evaluate differences in disease features between two or more visits, such as changes in lesion distribution, extent, or other characteristics representative of activity.17
These additional characterizations may be helpful for a full description of the capabilities and limitations of the DR AI system’s intended use.
Within an intended use case, the DR AI system’s output characteristics should match the end user and environment. For example, a patient will typically be unable to interpret specific disease severity levels, and thus the output (i.e., the report) for this use case should be a referral or no-referral result. Likewise, some physician users may have a background in DR, so inclusion of specific clinical or even ETDRS classification levels may be more appropriate.
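As a hypothetical sketch of this matching rule (the severity scale, referral threshold, and report wording below are invented for illustration, not prescribed by these guidelines):

```python
from enum import Enum


class DRSeverity(Enum):
    """Simplified, ETDRS-like ordering, for illustration only."""
    NONE = 0
    MILD_NPDR = 1
    MODERATE_NPDR = 2
    SEVERE_NPDR = 3
    PDR = 4


REFERRAL_THRESHOLD = DRSeverity.MODERATE_NPDR  # example policy, not a guideline


def render_report(severity: DRSeverity, end_user: str) -> str:
    """Match report detail to the end user, per the intended-use discussion."""
    refer = severity.value >= REFERRAL_THRESHOLD.value
    decision = ("Refer to an eye care provider" if refer
                else "No referral needed; rescreen per diabetes care plan")
    if end_user == "patient":
        return decision  # patients receive a disposition, not a grading scale
    return f"{decision} (severity: {severity.name})"  # clinicians also see the level
```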
Diagnostic Accuracy Evidence
The purpose of this section is to describe standardized levels of diagnostic accuracy of DR AI systems. This system does not specify the requirements to achieve a particular autonomy level, but rather defines the criteria by which diagnostic accuracy evidence can be evaluated by physicians and patients, as well as consumers and policy-makers. The characterizations are descriptive, are in alignment with the current scientific state of the art, allow a step-wise progression, are based on functional aspects of the system, and define the intended use and provider roles for each level. Although the requirements for the evidence vary with the level of desired system autonomy, with more autonomous systems requiring greater scrutiny, these guidelines remain descriptive.
Current Good Manufacturing Practices
Current Good Manufacturing Practices are regulations enforced by the FDA to facilitate proper design, monitoring, and control of manufacturing processes and facilities.18 Similar requirements exist in other countries. These regulations imply that the design and production of the DR AI system are under some form of structured quality control that requires validation. For example, 21 CFR 820 (Ref.19) in the United States and ISO 13485 (Ref.20) in the European Union set forth the minimum requirements of a quality management system, including a framework for the design, development, and production of medical devices, and postmarketing surveillance.
Accuracy Study
A diagnostic accuracy study examines the diagnostic accuracy of the DR AI system in isolation; that is, without full reflection of its intended use. Diagnostic accuracy studies for DR AI will involve images of subjects demonstrating a full range of DR and DME severity. Retinal imaging equipment operator performance, image quality management, and other factors external to DR AI systems are outside the scope of this AI discussion. Reference standards and metrics are discussed below.
System Validation in Context
Validation as a system implies that the diagnostic accuracy of the DR AI system is examined within the entirety of its intended use (end-to-end). All factors that will affect the quality and availability of the subject’s images are considered. Thus, the overall system validity will depend not only on the diagnostic accuracy of the DR AI system in isolation, but also on a variety of related programmatic components, such as the ability of a real-world operator to demonstrate technical proficiency and acquire retinal images according to the required imaging protocol, and with sufficient quality for the successful disposition of the subject.
Metrics for Diagnostic Accuracy and Validation Studies
Diagnostic accuracy and system validation studies should yield data relevant for management decisions of patients based on the intended use. Although variable thresholds are possible, clinical practice will require management decisions on patients to be made with fixed, preset thresholds (e.g., disease is present or absent, a specific risk level is present or absent). Thus, the classical diagnostic accuracy metrics of sensitivity and specificity, which are appropriately used for binary outcomes, are more appropriate measures than metrics such as receiver operating characteristic analysis. Similarly, they are also more appropriate than aggregate accuracy expressed as a single metric, that is, combining sensitivity and specificity. In addition, newer metrics, such as severity-weighted sensitivity, incorporate the clinical significance of false negatives at different severity levels of disease (i.e., higher risk of vision loss if a case of severe DR is classified as normal as compared with a case of mild DR).21 Standards for diagnostic accuracy studies, such as the Standards for Reporting of Diagnostic Accuracy Studies,22 can help in comparing DR AI systems and in increasing acceptance by clinicians and the public. In addition to the classical diagnostic accuracy metrics already described, the following are important to define DR AI systems at their specified diagnostic accuracy evidence level (a computational sketch follows the list)5:
- The fraction of subjects that can be successfully imaged and receive a disposition from the DR AI system, referred to as gradability.
- Corrected measures of sensitivity and specificity that take gradability into consideration, again with a preset threshold.
- Specific reporting of the DR severity level of all false negatives.
- Use of severity-weighted metrics, such as severity-weighted sensitivity.
- Evaluation of the repeatability and reproducibility of the DR AI system.
- Limit of detection of the system; that is, the robustness of the system to random and so-called adversarial inputs.23,24
- Analytical sensitivity, reflecting how image artifacts and other disruptions affect performance.
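The following minimal sketch shows how several of these metrics interrelate. The convention of counting ungradable cases as positive (i.e., referred) and the example severity weights are illustrative assumptions, not values prescribed by these guidelines:

```python
from typing import Dict, Optional, Sequence

# Per subject: reference truth (True = disease present), and the AI output at
# the preset threshold (True/False), or None if the system could not grade.
Truth = Sequence[bool]
Outputs = Sequence[Optional[bool]]


def gradability(outputs: Outputs) -> float:
    """Fraction of imaged subjects that receive a disposition from the AI."""
    return sum(o is not None for o in outputs) / len(outputs)


def corrected_sensitivity(truth: Truth, outputs: Outputs) -> float:
    """Sensitivity at the preset threshold, corrected for gradability by
    counting ungradable diseased subjects as positives (referred), rather
    than silently dropping them (an illustrative convention)."""
    positives = [o for t, o in zip(truth, outputs) if t]
    tp = sum(o is None or o for o in positives)
    return tp / len(positives)


def corrected_specificity(truth: Truth, outputs: Outputs) -> float:
    """Specificity with ungradable nondiseased subjects counted as false
    positives, the complementary convention to the one above."""
    negatives = [o for t, o in zip(truth, outputs) if not t]
    tn = sum(o is False for o in negatives)
    return tn / len(negatives)


def severity_weighted_sensitivity(severity: Sequence[int], outputs: Outputs,
                                  weights: Dict[int, float]) -> float:
    """A missed severe case costs more than a missed mild case. `weights`
    maps severity level to clinical weight, e.g., {1: 1, 2: 2, 3: 4, 4: 8}
    (made-up values for illustration)."""
    diseased = [(s, o) for s, o in zip(severity, outputs) if s > 0]
    detected = sum(weights[s] for s, o in diseased if o)  # None/False = missed
    return detected / sum(weights[s] for s, _ in diseased)
```

With these conventions, an ungradable image can never silently improve the reported performance of the system.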
Accuracy Study Setting: Laboratory or Intention to Screen
Accuracy studies can be characterized as laboratory or intention to screen.
Laboratory studies are characterized by the use of retrospectively accessed pre-existing image data sets, which may be publicly available. Typically, these data sets are enhanced by removal of low-quality images.
Intention-to-screen studies, in contrast, include all patients, to better replicate real-world conditions in which media opacity and poor dilation may preclude perfect-quality photography. Such studies use data that are either collected prospectively or were previously collected under a prespecified but unrelated protocol. The data sets may include clinicaltrials.gov-registered trials or image data sets that were not previously available, even though they are accessed retrospectively.
Reference Standard Truth Derivation
The reference standard for a diagnostic accuracy or system validation study is typically derived from subjective reading of retinal images, but can also be derived from more objective sources, such as definitive retinal thickening by optical coherence tomography or a surrogate clinical outcome.25 The following levels of subjective grading may be used (a schematic sketch follows the list):
- Level A reference standard: A reference standard that either is a clinical outcome or has been validated to be equivalent to a patient-level outcome (i.e., a surrogate for a specific patient outcome). This reference standard is derived from an independent reading center (where the clinicians or experts performing the reading are not otherwise involved in performing the study), with validated published protocols, and with published reproducibility and repeatability metrics. A level A reference standard is based on at least as many modalities as the test, and ideally more.
- Level B reference standard: A reference standard derived from an independent reading center with validated published reading protocols, and with published reproducibility and repeatability metrics. A level B reference standard has not been validated to correlate with a patient-level outcome.
- Level C reference standard: A reference standard created by adjudication or voting of multiple independent expert readers, documented to be masked, with published reproducibility and repeatability metrics. A level C reference standard has not been derived from an independent reading center, and has not been validated to correlate with a patient-level outcome.
- Level D reference standard: All other reference standards, including single readers and nonexpert readers. A level D reference standard has not been derived from an independent reading center, has not been validated to correlate with a patient-level outcome, and its readers do not have published reproducibility and repeatability metrics.
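Read as a decision procedure, the four levels can be summarized schematically as follows; the attribute names are invented for this sketch and are not part of the guidelines:

```python
from dataclasses import dataclass


@dataclass
class ReferenceStandard:
    outcome_validated: bool            # validated surrogate for a patient-level outcome
    independent_reading_center: bool   # readers not otherwise involved in the study
    published_protocols: bool          # validated, published reading protocols
    published_repeatability: bool      # published reproducibility/repeatability metrics
    multiple_masked_readers: bool      # adjudication/voting by masked expert readers


def reference_level(rs: ReferenceStandard) -> str:
    """Map a reference standard's attributes to level A, B, C, or D."""
    reading_center = (rs.independent_reading_center and rs.published_protocols
                      and rs.published_repeatability)
    if reading_center and rs.outcome_validated:
        return "A"
    if reading_center:
        return "B"
    if rs.multiple_masked_readers and rs.published_repeatability:
        return "C"
    return "D"  # all other reference standards, including single readers
```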
In addition, studies can be characterized as having reference standards derived from objective measures, such as clinical outcome, or from a combination of the above.
Diagnostic drift, in which readers differ in their application of the grading system compared with the reading center involved in the original foundational studies, should be taken into account.14
Studies can be prospective; that is, the data are collected according to a prespecified protocol.
Studies can be preregistered; that is, the data collection, statistical analysis, hypotheses to be tested, and subject inclusion and exclusion criteria are executed according to a prespecified protocol and statistical analysis plan. Preregistration is a requirement for publication in many scientific journals.26–28
Conflicts of interest involving the organization that developed or sponsored the DR AI system should be taken into account.
Additional Considerations in Classifying Diagnostic Accuracy Evidence
The following variables can be used to further describe the characteristics of the diagnostic accuracy evidence for a DR AI system:
- Whether incidental findings (i.e., non-DR diagnoses, including macular diseases such as age-related macular degeneration, nondiabetic retinal vascular diseases such as central retinal vein occlusion, and hypertensive retinopathy) are excluded from the accuracy analyses, or whether they are included as positive, negative, or both.29
- Selection of the grading system used to create the reference standard against which the DR AI system is evaluated, such as the United Kingdom National Health Service,30 EURODIAB,31 or ETDRS16 systems. The choice of grading system will also depend on the intended use. The grading system affects the estimated diagnostic accuracy and performance of the DR AI system, even within the same reading center.15,26
DR AI System Design
DR AI system design has changed considerably over the past 10 years, and significant continued evolution is expected; therefore, any characterization of design must be descriptive to avoid rapid obsolescence. Nevertheless, there are some characteristics of system design that are considered informative at the present time; for example, amount of training required and explanation generation of the DR AI system.
DR AI systems can be characterized by the amount of training required. One taxonomy involves so-called unsupervised and supervised categories.10 Unsupervised implies that once the algorithm has been designed and implemented, none of its parameters is ever adjusted in response to performance on a training set of images. Almost all current DR AI systems are supervised, in which some or many parameters are adjusted during a so-called training phase, in response to performance on a training data set, until acceptable performance is achieved. Another category is semisupervised, which combines aspects of both aforementioned types to improve complex image analysis. In practice, these terms are of limited use for categorizing DR AI systems.
Explainability (explanation generation) of a DR AI system design means that human users can understand, at least at some level of abstraction, how the DR AI system arrives at its diagnostic output (e.g., “The computer finds all the microaneurysms, hemorrhages, and exudates. Based on the total number and location of each, the final diagnosis is calculated.”). Explainability is predicated upon an appreciation of contemporary DR AI system design.
DR AI system designs can use retinal feature detectors to determine the presence of lesions and biomarkers in retinal images (e.g., hemorrhages and exudates), as well as nonlinear transformations of their outputs.32–35 Machine learning approaches are typically used for the generation of the final output (e.g., normal vs. abnormal). Because these DR AI systems involve multiple feature detectors for pathognomonic DR lesions, they are categorized as lesion based, and can be explained at the disease characteristic level because they explicitly detect types of relevant lesions. Some have claimed that lesion-based designs are more “physiologically plausible,”36 with multiple redundant lesion-specific detectors and a functional method that mimics the human visual cortex.37
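A deliberately naive sketch of this lesion-based architecture follows. The two “detectors” here are trivial color heuristics standing in for validated lesion detectors, and a scikit-learn logistic regression stands in for the machine learning fusion stage:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def detect_dark_lesions(img: np.ndarray) -> float:
    """Stand-in microaneurysm/hemorrhage detector: fraction of dark pixels in
    the green channel of an HxWx3 uint8 fundus image (real detectors are far
    more sophisticated)."""
    green = img[..., 1].astype(float) / 255.0
    return float((green < 0.25).mean())


def detect_bright_lesions(img: np.ndarray) -> float:
    """Stand-in exudate detector: fraction of bright, yellowish pixels."""
    norm = img.astype(float) / 255.0
    return float(((norm[..., 0] > 0.7) & (norm[..., 1] > 0.6)).mean())


def lesion_features(images) -> np.ndarray:
    """One row of per-lesion-type evidence per image."""
    return np.array([[detect_dark_lesions(im), detect_bright_lesions(im)]
                     for im in images])


# The fusion stage maps detector outputs to the final normal-vs-abnormal call,
# which keeps the design explainable at the disease-characteristic level.
fusion = LogisticRegression()
# fusion.fit(lesion_features(train_images), train_labels)
# predictions = fusion.predict(lesion_features(test_images))
```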
DR AI system designs can also involve one or more multilayer neural networks, such as convolutional neural networks (CNNs).38 Such designs have allowed marked improvements in diagnostic accuracy, as evidenced by the diagnostic accuracy of algorithms in the recent Kaggle competition.39 These designs have one or more CNNs trained to associate an entire retinal image with a disease-level diagnostic output. In these designs, the computer is “fed” each image and its corresponding output in a very large training set, and then develops a system to grade the images without “knowledge” that microaneurysms, hemorrhages, and exudates are the hallmarks of DR; instead, it uses the raw pixel data to “learn” what DR is and is not. Because the system has not been “taught” about the lesions of DR and is not explicitly using them to make the DR diagnosis, the human user cannot understand how the AI system actually makes the diagnosis of DR, and so such systems are sometimes considered black box designs. A number of end-to-end DR AI systems have been developed in academic and other prototype contexts in recent years, leveraging the fact that extensive feature design is not required.10,40–42 Although these systems may actually detect some lesions (i.e., the system “teaches itself” about microaneurysms), the operation of black box systems cannot be verified or explained at the disease characteristic level.
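By contrast, a minimal end-to-end sketch (in PyTorch; the toy architecture and input size are arbitrary choices, far smaller than production systems):

```python
import torch
import torch.nn as nn


class TinyFundusCNN(nn.Module):
    """Toy end-to-end classifier: raw pixels in, referable-DR logits out.
    No lesion detectors anywhere; the network learns its own features."""

    def __init__(self) -> None:
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, 2)  # non-referable vs. referable DR

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).flatten(1))


model = TinyFundusCNN()
logits = model(torch.randn(1, 3, 256, 256))  # one synthetic 256x256 RGB image
probs = torch.softmax(logits, dim=1)
```

Nothing in the network refers to microaneurysms, hemorrhages, or exudates; whatever features it uses are learned from pixels, which is precisely why such designs are described as black boxes.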
Explanation generation has not, to date, been shown to affect overall diagnostic accuracy based on classical diagnostic accuracy metrics.14,40,41 Some claim that a lack of explanation generation may affect diagnostic accuracy and the risk of unanticipated errors from small perturbations, which can be estimated with non-Gaussian diagnostic accuracy metrics.23,24
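To make the small-perturbation concern concrete, one standard robustness probe (a generic technique, not the specific method of the cited studies) is the fast gradient sign method, sketched here for a differentiable classifier such as the toy CNN above:

```python
import torch
import torch.nn.functional as F


def fgsm_perturb(model: torch.nn.Module, image: torch.Tensor,
                 label: torch.Tensor, eps: float = 0.01) -> torch.Tensor:
    """Fast gradient sign method: adds a tiny, near-invisible perturbation in
    the direction that most increases the loss, probing the system's limit of
    detection against adversarial inputs."""
    image = image.clone().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    return (image + eps * image.grad.sign()).detach().clamp(0.0, 1.0)


# Example (assumes a model like TinyFundusCNN above):
# adversarial = fgsm_perturb(model, torch.rand(1, 3, 256, 256), torch.tensor([1]))
```

A large change in the model’s output under such an imperceptible perturbation is exactly the kind of unanticipated error the limit-of-detection metric is meant to surface.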
Summary
In summary, this document puts forth these standardized descriptors as a means to categorize systems for computer-assisted and fully automated detection, triage, and diagnosis of DR. The components of the categorization system include Level of Device Autonomy, Intended Use, Level of Evidence for Diagnostic Accuracy, and System Design. There is currently no empirical basis to assert that certain combinations of autonomy, accuracy, or intended use are better or more appropriate than any others. Therefore, at the current stage of development of this document, we have been descriptive rather than prescriptive, and we treat the different categorizations as independent and organized along multiple axes.
Disclosure Statement
M.D.A. is Founder, Executive Chairman, Director of IDx (Coralville, IA), and an investor. M.D.A. has patent and patent applications assigned to the University of Iowa, to the Department of Veterans Affairs, and to IDx. T.L. is consultant for Spect, Inc. (San Francisco, CA). K.R. is Chief Medical Officer of IBM, Inc. (Yorktown, NY). M.F.C. is an unpaid member of the Scientific Advisory Board for Clarity Medical Systems (Pleasanton, CA) and a consultant for Novartis (Basel, Switzerland). M.B.H. has no competing financial interests.
Funding Information
M.D.A. is the Robert C. Watzke Professor of Ophthalmology and Visual Sciences, and supported by NIH grants R01 EY019112, R01 EY018853; by unrestricted departmental funding from Research to Prevent Blindness (New York, NY), the Department of Veterans Affairs, and Alimera Life Sciences. T.L. is supported by unrestricted departmental funding from Research to Prevent Blindness, and in part by the Heed Ophthalmic Foundation Fellows Grant. M.F.C. is supported by grant P30EY010572 from the National Institutes of Health (Bethesda, MD), and by unrestricted departmental funding from Research to Prevent Blindness.
References
1. Partnership on AI. 2018. Available at https://www.partnershiponai.org (last accessed December 29, 2019).
2. American Medical Association. Augmented intelligence in health care: 2019 AI Board report. 2019. Available at https://www.ama-assn.org/system/files/2019-08/ai-2019-board-report.pdf (last accessed December 29, 2019).
3. Automated detection of retinal disease. Am J Manag Care 2014;20:eSP48–eSP52.
4. Food and Drug Administration. FDA permits marketing of artificial-intelligence-based device to detect certain diabetes-related eye problems. 2018. Available at https://www.fda.gov/NewsEvents/Newsroom/PressAnnouncements/ucm604357.htm (last accessed December 29, 2019).
5. Food and Drug Administration. Software as a Medical Device (SaMD): Clinical evaluation. 2017. Available at https://www.fda.gov/media/100714/download (last accessed December 29, 2019).
6. International Medical Device Regulators Forum SaMD Working Group. Software as a Medical Device (SaMD): Application of quality management system. 2015. Available at http://www.imdrf.org/docs/imdrf/final/technical/imdrf-tech-151002-samd-qms.pdf (last accessed December 29, 2019).
7. Telehealth practice recommendations for diabetic retinopathy, second edition. Telemed J E Health 2011;17:814–837.
8. Automated retinal image analysis for diabetic retinopathy in telemedicine. Curr Diab Rep 2015;15:14.
9. Retinal imaging and image analysis. IEEE Rev Biomed Eng 2010;3:169–208.
10. Retina, 6th ed. London: Saunders/Elsevier, 2017.
11. Automated analysis of retinal images for detection of referable diabetic retinopathy. JAMA Ophthalmol 2013;131:351–357.
12. Automated grading for diabetic retinopathy: A large-scale audit using arbitration by clinical experts. Br J Ophthalmol 2010;94:1606–1610.
13. An observational study to assess if automated diabetic retinopathy image assessment software can replace one or more steps of manual imaging grading and to determine their cost-effectiveness. Health Technol Assess 2016;20:1–72.
14. Improved automated detection of diabetic retinopathy on a publicly available dataset through integration of deep learning. Invest Ophthalmol Vis Sci 2016;57:5200–5206.
15. Approach for a clinically useful comprehensive classification of vascular and neural aspects of diabetic retinal disease. Invest Ophthalmol Vis Sci 2018;59:519–527.
16. Fundus photographic risk factors for progression of diabetic retinopathy. ETDRS report number 12. Early Treatment Diabetic Retinopathy Study Research Group. Ophthalmology 1991;98(5 Suppl.):823–833.
17. Microaneurysm formation rate as a predictive marker for progression to clinically significant macular edema in nonproliferative diabetic retinopathy. Retina 2014;34:157–164.
18. U.S. Food and Drug Administration. Facts about the current good manufacturing practices (CGMPs). 2018. Available at https://www.fda.gov/drugs/developmentapprovalprocess/manufacturing/ucm169105.htm (last accessed December 29, 2019).
19. U.S. Food and Drug Administration. Part 820 quality system regulation. 2018. Available at https://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfcfr/CFRSearch.cfm?CFRPart=820 (last accessed December 29, 2019).
20. International Organization for Standardization. Medical devices—Quality management systems—Requirements for regulatory purposes. 2016. Available at https://www.iso.org/standard/59752.html (last accessed December 29, 2019).
21. Application of treatment thresholds to diagnostic-test evaluation: An alternative to the comparison of areas under receiver operating characteristic curves. Med Decis Making 1997;17:447–454.
22. STARD 2015 guidelines for reporting diagnostic accuracy studies: Explanation and elaboration. BMJ Open 2016;6:e012799.
23. Susceptibility to misdiagnosis of adversarial images by deep learning based retinal image analysis algorithms. In: 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018). Washington, DC: IEEE, 2018:1454–1457.
24. Catastrophic failure in image-based convolutional neural network algorithms for detecting diabetic retinopathy. Invest Ophthalmol Vis Sci 2017;58:3776.
25. Estimating maximal measurable performance for automated decision systems from the characteristics of the reference standard. Application to diabetic retinopathy screening. Conf Proc IEEE Eng Med Biol Soc 2014;2014:154–157.
26. Pivotal trial of an autonomous AI-based diagnostic system for detection of diabetic retinopathy in primary care offices. NPJ Digit Med 2018;1:39.
27. Comparison of registered and published primary outcomes in randomized controlled trials. JAMA 2009;302:977–984.
28. Clinical trial registration: A statement from the International Committee of Medical Journal Editors. JAMA 2004;292:1363–1364.
29. Development and validation of a deep learning system for diabetic retinopathy and related eye diseases using retinal images from multiethnic populations with diabetes. JAMA 2017;318:2211–2223.
30. Comparison of two reference standards in validating two field mydriatic digital photography as a method of screening for diabetic retinopathy. Br J Ophthalmol 2003;87:1258–1263.
31. Methodology for retinal photography and assessment of diabetic retinopathy: The EURODIAB IDDM complications study. Diabetologia 1995;38:437–444.
32. Automated assessment of diabetic retinopathy severity using content-based image retrieval in multimodal fundus photographs. Invest Ophthalmol Vis Sci 2011;52:8342–8348.
33. Automated early detection of diabetic retinopathy. Ophthalmology 2010;117:1147–1154.
34. Automated diagnosis of retinopathy by content-based image retrieval. Retina 2008;28:1463–1477.
35. The role of haemorrhage and exudate detection in automated grading of diabetic retinopathy. Br J Ophthalmol 2010;94:706–711.
36. Automated segmentation of the optic disc from stereo color photographs using physiologically plausible features. Invest Ophthalmol Vis Sci 2007;48:1665–1673.
37. Functional organization of primate visual cortex revealed by high resolution optical imaging. Science 1990;249:417–420.
38. ImageNet classification with deep convolutional neural networks. In: Pereira F, Burges CJC, Bottou L, Weinberger KQ, eds. Advances in neural information processing systems. Red Hook, NY: Curran Associates, Inc., 2012:1097–1105.
39. Kaggle. Diabetic retinopathy detection. 2015. Available at https://www.kaggle.com/c/diabetic-retinopathy-detection (last accessed December 29, 2019).
40. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA 2016;316:2402–2410.
41. Automated identification of diabetic retinopathy using deep learning. Ophthalmology 2017;124:962–969.
42. Applying artificial intelligence to disease staging: Deep learning for improved staging of diabetic retinopathy. PLoS One 2017;12:e0179790.