
Brief Communication

Comparison of GPT-3.5, GPT-4, and human user performance on a practice ophthalmology written examination


Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.


Acknowledgements

The authors thank Rohaid Ali, MD of Brown University and Ian Connolly, MD, MS of Massachusetts General Hospital for their contributions to this study’s design.

Funding

John Lin was awarded departmental funding from Brown University for expenses related to this study.

Author information


Contributions

All authors contributed to conceptualization and research design; JCL, DNY, and SSK performed data acquisition and research execution; JCL, DNY, and OYT conducted the data analysis; all authors contributed to data interpretation and manuscript preparation.

Corresponding author

Correspondence to Ingrid U. Scott.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Lin, J.C., Younessi, D.N., Kurapati, S.S. et al. Comparison of GPT-3.5, GPT-4, and human user performance on a practice ophthalmology written examination. Eye 37, 3694–3695 (2023). https://doi.org/10.1038/s41433-023-02564-2

