# Filling apparent gaps in the curriculum is harder than it looks

One of the classic mistakes you can make as a teacher is to spot what you fondly think is a small gap in the curriculum, and then commit to filling it. The not-so-small gap here is in our teaching of data analysis. Analysing data is, as I just said to our second years, a key part of doing science. As I also said to them, it is poor practice to use formulas, or software such as Excel, without knowing what those formulas are actually doing. Both of these statements are true. The problem is that data analysis is a huge subject, underpinned by a lot of maths whose details I don't know myself and will not be teaching to the students. So, by committing to one extra lecture to try and improve matters here, I bit off rather more than I could chew.

You can only do so much in one lecture and a couple of Python Jupyter notebooks. I went through a Jupyter notebook showing that if the noise in the data is Gaussian, as the standard models assume, then the textbook standard error estimates work, but if the noise has a very different form, they don't. This makes the important point that the formulas the students are using in Excel make assumptions. So far so good.
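A minimal sketch of that kind of demonstration (my own illustration here, not the actual notebook) is to generate many samples, compare the textbook standard error of the mean, $s/\sqrt{N}$, against the true scatter of the sample means, and then repeat with heavy-tailed noise, for which that formula is meaningless:

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_and_standard_error(sample):
    # Textbook standard error of the mean: s / sqrt(N)
    return sample.mean(), sample.std(ddof=1) / np.sqrt(len(sample))

n_points, n_repeats = 100, 2000

# Gaussian noise: the formula's assumption holds
gauss_means, gauss_ses = [], []
for _ in range(n_repeats):
    m, se = mean_and_standard_error(rng.normal(0.0, 1.0, n_points))
    gauss_means.append(m)
    gauss_ses.append(se)

# The quoted SE should match the actual scatter of the means
# (both come out close to 1/sqrt(100) = 0.1)
print("actual scatter:", np.std(gauss_means))
print("claimed SE:    ", np.mean(gauss_ses))

# Heavy-tailed (Cauchy) noise: the same formula is meaningless,
# because the mean of Cauchy samples does not even converge
cauchy_means = [rng.standard_cauchy(n_points).mean()
                for _ in range(n_repeats)]
print("Cauchy scatter:", np.std(cauchy_means))  # large and unstable
```

With Gaussian noise the claimed and actual uncertainties agree; with Cauchy noise the scatter of the sample means blows up, and no amount of averaging rescues the textbook formula.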

The bit that worked less well was that the consequences of data analysis being a whole branch of maths only dawned on me too late in writing the lecture. Today was some guy who knows 0.1 % of what there is to know about fitting data, talking to students who on average know about 0.01 %, and telling them they should increase this to 0.012 %, without being clear enough about why knowing 0.012 % is so much more useful than knowing 0.01 %. I did say that the new technique in one of the course notebooks, the bootstrap method, has the advantage that it can give an error estimate even for some complicated quantity whose standard error expression would probably not be known. I should be able to edit one of the Jupyter notebooks to include a simple example of this, but of course that is a bit late for this afternoon's lecture …
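A simple bootstrap example of the kind I have in mind might look like this (a sketch of my own, not the course notebook): resample the data with replacement, recompute the statistic each time, and take the spread of those recomputed values as the error estimate. The point is that the same few lines work for the interquartile range, for which no simple textbook error formula comes to mind, just as well as for the mean:

```python
import numpy as np

rng = np.random.default_rng(1)

def bootstrap_error(sample, statistic, n_resamples=2000):
    """Bootstrap standard error: resample with replacement,
    recompute the statistic, and take the spread of the results."""
    n = len(sample)
    values = [statistic(rng.choice(sample, size=n, replace=True))
              for _ in range(n_resamples)]
    return np.std(values, ddof=1)

data = rng.normal(10.0, 2.0, size=200)

# Sanity check: for the mean, the bootstrap should roughly
# reproduce the textbook formula s / sqrt(N)
print("bootstrap SE of mean:", bootstrap_error(data, np.mean))
print("textbook SE of mean: ", data.std(ddof=1) / np.sqrt(len(data)))

# The same code gives an error bar on a quantity with no
# obvious textbook formula, e.g. the interquartile range
iqr = lambda x: np.percentile(x, 75) - np.percentile(x, 25)
print("IQR:", iqr(data), "+/-", bootstrap_error(data, iqr))
```

The sanity check on the mean is what makes this persuasive in class: the students can see the bootstrap agree with the formula they already trust, before it is used on a statistic where that formula does not exist.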