Is ChatGPT a math genius? – ICSE – International Centre for STEM Education

An analysis of ChatGPT’s modelling capabilities

by Dr. Oliver Straser and Chrissi Fischer

Since the release of ChatGPT in November 2022, our world has changed fundamentally: here is an artificial intelligence that apparently understands human language and can independently create texts and images on request. But is ChatGPT also a good mathematician?

Four academics associated with ICSE, Carina Spreitzer, Stephan Zehetmeier (both University of Klagenfurt), Oliver Straser and Katja Maaß (both University of Education, Freiburg), investigated this question in their recently published article “Mathematical Modelling Abilities of Artificial Intelligence Tools: The Case of ChatGPT”.

Katja Maaß and Oliver Straser, University of Education, Freiburg

Stephan Zehetmeier and Carina Spreitzer, University of Klagenfurt

Is ChatGPT a good mathematician?

Their answer to this question is: generally no. But it’s not quite that simple. ChatGPT performs exceptionally well on the math section of the SAT, the American university admissions test, scoring in the 89th percentile compared with student solutions. Nevertheless, this result should not be overestimated: for comparison, the scores of students at the elite Harvard University usually start at the 90th percentile.

Even with simple modeling tasks, ChatGPT appears to struggle. It generally does not understand the context of a task, especially when general information (e.g. how many lanes a highway has) is not given in the task text. In addition, it seems to select its solution on the basis of salient terms in the task text rather than the task content. Certain words in the task text can therefore “trigger” ChatGPT to select incorrect approaches. You can find examples of this in the full article at https://www.mdpi.com/2227-7102/14/7/698.

Using AI’s weaknesses to teach mathematics

In general, complex math problems, especially those involving reasoning and proving, are a major challenge for ChatGPT. Although newer versions of ChatGPT show significantly better mathematical skills than the older ones, all versions so far show similar error patterns. However, these weaknesses also offer opportunities for teaching: students can analyze solutions created by ChatGPT. The typical error patterns provide an opportunity not only to understand the possibilities and limitations of this technology, but also to learn how large language models work in general.

It is impossible to predict how these or similar technologies will develop in the future. OpenAI, the company behind ChatGPT, has already announced that the next iteration will have intelligence comparable to a person with a PhD. What this means in detail, however, remains to be seen.

If you want to stay in conversation with us and continue learning and evaluating what ChatGPT has to offer for STEM education, sign up for our newsletter. In different formats, like 1h4teachers, we engage with the fascinating potential of AI.

Reference

Spreitzer C, Straser O, Zehetmeier S, Maaß K. (2024). Mathematical Modelling Abilities of Artificial Intelligence Tools: The Case of ChatGPT. Education Sciences; 14(7):698. https://doi.org/10.3390/educsci14070698
