I run second-year computing for physics undergraduates. The second semester is taught as individual projects, so it is perhaps a little resistant to the problem of students simply getting ChatGPT or Google’s Bard to do the work. But the first semester includes very common basic problems like fitting to noisy data. The bad news here is that for standard simple tasks, ChatGPT and Bard will just give you answers, and so I can’t really set an assessment where students can just ask ChatGPT.
For example, I asked Bard: “How do I fit a straight line to noisy data, and get both best fit values and uncertainties in the intercept slope? I want to use Python code.” Its answer was:
“Sure, here is how you can fit a straight line to noisy data in Python and get both best fit values and uncertainties in the intercept and slope:
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

# Define the true objective function.
def objective(x, a, b):
    return a * x + b

# Generate some noisy data.
x = np.linspace(0, 10, 100)
y = objective(x, 2, 3) + np.random.normal(0, 0.5, 100)

# Fit the data to a straight line.
popt, pcov = curve_fit(objective, x, y)

# Print the best fit values and uncertainties.
print("Best fit values:")
print("a = {:.2f} +/- {:.2f}".format(popt[0], np.sqrt(pcov[0, 0])))
print("b = {:.2f} +/- {:.2f}".format(popt[1], np.sqrt(pcov[1, 1])))

# Plot the data and the fitted line.
plt.plot(x, y, "o")
plt.plot(x, objective(x, *popt))
plt.show()
This code will first define the true objective function, which is the function that we believe generated the data. Then, it will generate some noisy data using this function. Next, it will fit the data to a straight line using the curve_fit() function. This function will return the best fit values for the slope and intercept, as well as a covariance matrix that contains the uncertainties in these values. Finally, the code will print the best fit values and uncertainties, and plot the data and the fitted line.”
ChatGPT gave a functionally identical answer, although it didn’t format the output as nicely. Problems did start when I asked Bard and ChatGPT how scipy calculated the uncertainty estimates: in both cases they confidently gave (different) wrong answers.
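For the record, what scipy actually does here: with the default settings (no sigma supplied, absolute_sigma=False), curve_fit scales the covariance matrix by the reduced chi-squared of the residuals, so for a straight-line fit the reported uncertainties match the standard least-squares formula s² (AᵀA)⁻¹ with s² = RSS/(n − p). A minimal sketch checking this against a manual calculation (the variable names are my own):

```python
import numpy as np
from scipy.optimize import curve_fit

def objective(x, a, b):
    return a * x + b

rng = np.random.default_rng(42)
x = np.linspace(0, 10, 100)
y = objective(x, 2, 3) + rng.normal(0, 0.5, 100)

popt, pcov = curve_fit(objective, x, y)

# Manual least-squares covariance: s^2 * (A^T A)^-1,
# with s^2 = RSS / (n - p) the residual variance estimate.
A = np.column_stack([x, np.ones_like(x)])
coeffs, rss, *_ = np.linalg.lstsq(A, y, rcond=None)
n, p = len(x), 2
s2 = rss[0] / (n - p)
cov_manual = s2 * np.linalg.inv(A.T @ A)

print("curve_fit:", np.sqrt(np.diag(pcov)))
print("manual:   ", np.sqrt(np.diag(cov_manual)))
```

The two sets of uncertainties agree, which is the behaviour neither chatbot managed to describe correctly.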
And both made essentially the same mistake when asked to use the bootstrap method to estimate errors. I asked ChatGPT: “Please write a code to fit a straight line to noisy data in Python. It should use the bootstrap method to estimate the uncertainty in the best fit values.”* and got:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

# Generate noisy data
np.random.seed(0)
x = np.linspace(0, 10, 100)
y = 3 * x + 2 + np.random.normal(0, 1, 100)

# Define the linear function
def linear_func(x, m, c):
    return m * x + c

# Number of bootstrap samples
n_bootstrap = 1000

# Perform bootstrap fitting
bootstrap_params = []
for _ in range(n_bootstrap):
    # Generate bootstrap sample
    indices = np.random.choice(len(x), len(x), replace=True)
    x_bootstrap = x[indices]
    y_bootstrap = y[indices]

    # Fit the bootstrap sample to the linear function
    params, _ = curve_fit(linear_func, x_bootstrap, y_bootstrap)
    bootstrap_params.append(params)

# Calculate the mean and standard deviation of the fitted parameters
bootstrap_params = np.array(bootstrap_params)
mean_params = np.mean(bootstrap_params, axis=0)
std_params = np.std(bootstrap_params, axis=0)

# Extract the best fit values and uncertainties
best_fit_values = mean_params
uncertainties = std_params

# Plot the data and fitted line
plt.scatter(x, y, label='Data')
plt.plot(x, linear_func(x, *best_fit_values), color='red', label='Fit')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()

# Print the results
print("Best fit values:", best_fit_values)
print("Uncertainties:", uncertainties)
I think the estimate of the uncertainties is correct, but the best fit values should be obtained by fitting to the data itself, not by averaging over the best fit values of the resamples. Essentially you need a line like
best_fit_params, _ = curve_fit(linear_func, x, y)
to obtain the best fit values. Both ChatGPT and Bard made the same mistake, which for a large enough number of resamples will not introduce a large error, but is still wrong.
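Putting that fix together, a corrected version of the bootstrap code would look something like this: the original data are fitted once to get the best fit values, and the resamples are used only for the uncertainty estimate (I’ve trimmed the plotting for brevity):

```python
import numpy as np
from scipy.optimize import curve_fit

def linear_func(x, m, c):
    return m * x + c

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 100)
y = 3 * x + 2 + rng.normal(0, 1, 100)

# Best fit values: fit the original data itself.
best_fit_params, _ = curve_fit(linear_func, x, y)

# Bootstrap resampling: used only for the uncertainties.
n_bootstrap = 1000
bootstrap_params = []
for _ in range(n_bootstrap):
    indices = rng.choice(len(x), len(x), replace=True)
    params, _ = curve_fit(linear_func, x[indices], y[indices])
    bootstrap_params.append(params)

uncertainties = np.std(np.array(bootstrap_params), axis=0)

print("Best fit values:", best_fit_params)
print("Uncertainties:", uncertainties)
```

The difference from the AI-generated version is small in practice, as noted above, but this is the statistically correct procedure.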
So it seems clear that I can’t just ask students to write code for a standard data-analysis problem, since in a minute ChatGPT or Bard will give them either a correct code or a working but subtly wrong one. This is a start, but it does not tell me what assessment I should set. Presumably the future of programming will involve using generative AI as a co-pilot, and so the skill that will be in demand is the ability to tell whether an AI-generated code is actually right. At this moment I am not sure how to assess that, but I have a couple of months before semester starts.
* Not sure why I said please here, either automatic politeness or hedging my bets by being polite to our future AI overlords?