Don't Let AI Calculate Grades

Exploring an AI's mistake in grade calculation highlights the importance of human oversight in AI-assisted educational tasks.

Posted Aug 6, 2024

By Bill Hollingsworth

In our journey to understand and leverage AI for assistive computing, it’s crucial to critically examine AI’s performance in everyday tasks. This article explores an unexpected outcome when using AI to calculate student grades, highlighting the importance of human oversight in AI-assisted processes.

What is Perplexity AI?

For this experiment, I used Perplexity AI, which is an AI-powered search engine and question-answering system. It combines web search capabilities with natural language processing to provide direct answers to user queries, often with cited sources.

Stating the problem

Last week, I was calculating final grades for the Discrete Math class I taught over the summer. I had promised my students that I would drop each student's lowest homework score, so I asked Perplexity AI how to do this in a LibreOffice Calc spreadsheet.

I described the problem, with context, in the following prompt:

I have 5 homework assignments in a LibreOffice spreadsheet. Each homework assignment has all of its scores listed in a column. I want to compute the average homework score for each student dropping the lowest of the 5 homework assignments for each student. How do I do that?

The initial solution

The following is the response from Perplexity AI:

To compute the average homework score for each student while dropping the lowest of the 5 homework assignments in LibreOffice Calc, you can use a combination of functions. Here’s how to do it:

Assume your homework scores are in columns A through E, with each student’s scores in a row.
In column F, use this formula to calculate the average dropping the lowest score:

=AVERAGE(A2:E2) - MIN(A2:E2)/5

This formula works as follows:

AVERAGE(A2:E2) calculates the average of all 5 homework scores
MIN(A2:E2) finds the lowest score
Subtracting MIN(A2:E2)/5 from the average effectively removes the impact of the lowest score

Copy this formula down for all students in your spreadsheet.

Here’s a breakdown of why this works:

The AVERAGE function adds up all 5 scores and divides by 5
By subtracting MIN(A2:E2)/5, we’re removing 1/5 of the lowest score from the total average
This is mathematically equivalent to removing the lowest score entirely and averaging the remaining 4 scores

My concern

My students would have been unhappy if I had used the AI's formula to calculate their homework averages. If a student's lowest homework score were 0, the adjusted average would be unchanged. Otherwise, the new homework grade would be lower than it was before dropping the lowest score.

This is because the formula above takes the average of all 5 scores and then subtracts one-fifth of the lowest score:

\[\frac{\sum_{i=1}^{5} x_i}{5} - \frac{\min(x_1, x_2, \ldots, x_5)}{5}\]
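To see exactly how far off this is, write \(S = \sum_{i=1}^{5} x_i\) for the total and \(m = \min(x_1, \ldots, x_5)\) for the lowest score (shorthand introduced here for convenience). The AI's formula simplifies to the sum of the four highest scores divided by 5, while actually dropping the lowest score divides that same sum by 4:

\[
\underbrace{\frac{S}{5} - \frac{m}{5}}_{\text{AI's formula}} = \frac{S - m}{5},
\qquad
\underbrace{\frac{S - m}{4}}_{\text{drop lowest}} - \frac{S - m}{5} = \frac{S - m}{20} \ge 0
\]

Assuming scores are non-negative, the gap \((S - m)/20\) is zero only when a student's four highest scores are all 0. Every other student is shortchanged by one-twentieth of the points on their four best assignments.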

Visualizing the problem

To better understand the issue, let’s look at a simple example (using fabricated scores):

Homework scores	Incorrect method	Correct method
90, 85, 80, 75, 70	66	82.5
100, 90, 80, 70, 0	68	85

As we can see, the incorrect method produces a lower average in both cases. It removes the lowest score from the total but still divides by 5 instead of 4, so it never properly drops the lowest score.
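The table values can be double-checked with a short Python sketch. The function names `ai_formula` and `drop_lowest_average` are hypothetical, chosen for illustration: the first mirrors the spreadsheet formula `=AVERAGE(A2:E2) - MIN(A2:E2)/5`, and the second mirrors `=(SUM(A2:E2) - MIN(A2:E2))/4`, generalized to divide by one less than the number of scores.

```python
def ai_formula(scores):
    """Mirror of the AI's first answer: AVERAGE(A2:E2) - MIN(A2:E2)/5."""
    n = len(scores)
    return sum(scores) / n - min(scores) / n


def drop_lowest_average(scores):
    """Drop the single lowest score and average the rest: (SUM - MIN) / (n - 1)."""
    return (sum(scores) - min(scores)) / (len(scores) - 1)


# The two example rows from the table (made-up scores).
examples = [
    [90, 85, 80, 75, 70],
    [100, 90, 80, 70, 0],
]

for scores in examples:
    ai = ai_formula(scores)
    correct = drop_lowest_average(scores)
    print(f"{scores}: AI formula = {ai:.1f}, drop lowest = {correct:.1f}")
# [90, 85, 80, 75, 70]: AI formula = 66.0, drop lowest = 82.5
# [100, 90, 80, 70, 0]: AI formula = 68.0, drop lowest = 85.0
```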

The correction

When I point out questionable reasoning to an AI chatbot, I try not to lead it, to reduce the chance that it simply agrees with me. I responded with the following prompt:

I don't think that works. If the lowest score is a 0, then the new average is the same. It seems to me that what you want to do is subtract the lowest score from the total of all 5 scores and then divide the new Total by 4. Is that correct? Is that equivalent to what you did? If so, why?

Now we understand?

After reading my follow-up prompt, it recognized its mistake, corrected its solution, and felt obliged to explain why the updated solution was correct:

You are correct. The approach I initially suggested does not correctly handle the case where the lowest score is zero. Your method of subtracting the lowest score from the total and then dividing by 4 is indeed the correct approach.

To calculate the average homework score for each student, dropping the lowest of the 5 homework assignments, you can use the following formula:

Sum the Scores: Calculate the total of all 5 scores.
Subtract the Lowest Score: Subtract the lowest score from the total.
Divide by 4: Divide the new total by 4 to get the average of the remaining 4 scores.

Here is the formula to achieve this in LibreOffice Calc:

= (SUM(A2:E2) - MIN(A2:E2)) / 4

Here’s a step-by-step guide:

Assume your homework scores are in columns A through E, with each student’s scores in a row.
In column F, use the following formula to calculate the average dropping the lowest score:

= (SUM(A2:E2) - MIN(A2:E2)) / 4

Copy this formula down for all students in your spreadsheet.

Explanation:

SUM(A2:E2) calculates the total of all 5 homework scores.
MIN(A2:E2) finds the lowest score among the 5.
SUM(A2:E2) - MIN(A2:E2) subtracts the lowest score from the total.
Dividing by 4 gives the average of the remaining 4 scores.

This method ensures that the lowest score is effectively dropped, and the average is calculated based on the remaining 4 scores.
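For a whole class, the same calculation can be cross-checked outside the spreadsheet with a small Python sketch. It assumes each student's row holds exactly 5 numeric scores, with missing work already entered as 0. The student names, the function `average_dropping_lowest`, and its `drop_lowest` parameter are hypothetical. Sorting and slicing also makes it easy to drop more than one score, which the `SUM - MIN` formula does not handle directly.

```python
import math


def average_dropping_lowest(scores, drop_lowest=1):
    """Average the scores after discarding the `drop_lowest` smallest ones."""
    if not 0 <= drop_lowest < len(scores):
        raise ValueError("drop_lowest must leave at least one score")
    kept = sorted(scores)[drop_lowest:]  # ascending order, so the lowest scores come first
    return sum(kept) / len(kept)


# Hypothetical gradebook: one row of 5 homework scores per student.
gradebook = {
    "Student A": [90, 85, 80, 75, 70],
    "Student B": [100, 90, 80, 70, 0],
}

for name, scores in gradebook.items():
    hw_average = average_dropping_lowest(scores)
    # Should agree with the spreadsheet formula (SUM(A2:E2) - MIN(A2:E2)) / 4
    assert math.isclose(hw_average, (sum(scores) - min(scores)) / 4)
    print(f"{name}: {hw_average:.2f}")
# Student A: 82.50
# Student B: 85.00
```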

What went wrong: natural language vs math

Although the AI eventually agreed with the correct solution, its first response contained a fundamental error in reasoning. It described the correct approach (removing the lowest score and averaging the remaining 4) but produced a different formula, because it deemed the two to be equivalent. We can see this in its final statement:

This is mathematically equivalent to removing the lowest score entirely and averaging the remaining 4 scores
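A claim of mathematical equivalence can be tested on concrete numbers before it is trusted. The Python sketch below, built around a hypothetical helper `find_counterexample`, tries every combination of 5 scores drawn from a small set of values (0, 50, and 100, chosen arbitrarily) in a fixed order and returns the first combination where the AI's formula and the drop-the-lowest formula disagree. A single counterexample is enough to refute the claim.

```python
from itertools import product


def ai_formula(scores):
    # AVERAGE(A2:E2) - MIN(A2:E2)/5, the formula from the AI's first answer
    return sum(scores) / 5 - min(scores) / 5


def drop_lowest_average(scores):
    # (SUM(A2:E2) - MIN(A2:E2)) / 4, the corrected formula
    return (sum(scores) - min(scores)) / 4


def find_counterexample(candidate_scores=(0, 50, 100)):
    """Return the first 5-score combination where the two formulas disagree, or None."""
    for scores in product(candidate_scores, repeat=5):
        if abs(ai_formula(scores) - drop_lowest_average(scores)) > 1e-9:
            return scores
    return None


print(find_counterexample())  # (0, 0, 0, 0, 50): AI formula gives 10.0, dropping the lowest gives 12.5
```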

Generative AI chatbots are based on large language models (LLMs), which we will discuss in an upcoming article on this blog. An LLM is trained on natural language, so it is not a math engine. In later articles, we will discuss how AI models based on LLMs learn math; essentially, they are trained by reading natural-language (e.g., English) texts about math. In my Discrete Math and Theory of Computing courses, I urge students to “reason in math, not in English” because colloquial English often allows, or even favors, inaccurate interpretations of mathematical words.

Conclusion

When I give an AI assistant a detailed prompt for a task that requires complex reasoning, the response usually needs at least one follow-up prompt and some manual editing. The math and reasoning skills of AI have already improved over the summer and will continue to improve. Nonetheless, keep checking its work and pushing back.

This experience underscores the importance of critical thinking and human oversight when using AI tools, especially in educational contexts. While AI can be a powerful assistant, it’s crucial to verify its outputs, particularly for tasks involving mathematical reasoning or grade calculations that directly impact students.

As we continue to explore the intersection of AI and assistive technology, examples like this remind us of the need for a balanced approach that leverages AI’s capabilities while recognizing its current limitations.

Bill Hollingsworth

Educator, Software Developer, and Innovator in Assistive Technology