COMPARISON OF CHATGPT AND PERPLEXITY’S RESPONSES ON HPV INFECTION AND HPV VACCINE - COGI 2023 The 31st World Congress on Controversies in Obstetrics, Gynecology & Infertility

Problem Statement

With the growing popularity of artificial intelligence (AI) chatbots, the general public increasingly uses them to obtain information on a variety of medical conditions. It is still unclear whether AI chatbots can provide accurate answers to patient education questions. The present study aims to evaluate the appropriateness and consistency of ChatGPT and Perplexity, the two most popular and freely available chatbots, in answering questions on various aspects of HPV infection and vaccination.

Methods

A set of twenty questions was prepared based on the frequently asked questions handout provided by the American College of Obstetricians and Gynecologists (ACOG). Because ChatGPT’s answers to the same question may differ, each question was asked three times, and all responses were recorded in order to assess the consistency of its answers. Because Perplexity gave the same answer when a question was repeated, each question was asked once and the answer was recorded. A gynecology expert rated each answer as appropriate, inappropriate, or, where the answers contradicted one another, inconsistent. Finally, the ratings of the two chatbots’ answers were compared with each other.

Results

For ChatGPT, the expert rated five answers as inappropriate and one answer as inconsistent. For Perplexity, only one answer was rated as inappropriate. The inappropriate and inconsistent answers of the chatbots are shown in Table 1. The percentage of answers rated as appropriate was 70% for ChatGPT and 95% for Perplexity.
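The reported percentages can be reproduced from the counts above. The following is a minimal sketch, assuming that each chatbot is scored on all twenty questions and that an answer rated inconsistent is not counted as appropriate; the function and variable names are illustrative and do not come from the study.

```python
def percent_appropriate(total, inappropriate, inconsistent=0):
    """Share of answers rated appropriate, as a percentage.

    Assumption: every answer falls into exactly one of the three
    categories (appropriate, inappropriate, inconsistent).
    """
    appropriate = total - inappropriate - inconsistent
    return 100 * appropriate / total


TOTAL_QUESTIONS = 20  # size of the ACOG-based question set

# Counts as reported in the Results section.
chatgpt_pct = percent_appropriate(TOTAL_QUESTIONS, inappropriate=5, inconsistent=1)
perplexity_pct = percent_appropriate(TOTAL_QUESTIONS, inappropriate=1)

print(f"ChatGPT: {chatgpt_pct:.0f}% appropriate")
print(f"Perplexity: {perplexity_pct:.0f}% appropriate")
```

Under these assumptions, 14 of 20 ChatGPT answers and 19 of 20 Perplexity answers are appropriate, which matches the reported 70% and 95%.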

Conclusion

The difference between these percentages was attributed to the reliable references used by Perplexity. Its most commonly cited sources are the Centers for Disease Control and Prevention (CDC), the National Institutes of Health (NIH), and ACOG. In addition, because Perplexity’s answers contain citations, the reader can assess the objectivity of the responses. The absence of additional statements on HIV-positive and immunocompromised patient groups in ChatGPT’s answers indicates a gap in the information it provides. The images in Perplexity’s answers may also increase readers’ attention to the subject. On the other hand, Perplexity is more didactic, whereas ChatGPT answers in the form of a dialogue, which may feel more sincere to the reader and improve readability.