The previous post was perhaps a bit of a moan about epidemiologists and medics focusing on randomised controlled trials (RCTs) and observational studies, to the exclusion of absolutely every other study. This was in the context of using masks to reduce the probability that you catch COVID-19. But moaning is not very constructive. A better approach is to say: OK so epidemiologists are not combining epidemiology data with other data, but there is nothing to stop you doing this. So what can be done if data from epidemiology studies are combined with data of other types?
The plot up top shows the result of a simple combination of data from epidemiology (the y axis) with estimates of the relative filtration efficiency of FFP2s and surgical masks (the x axis). The just-published review of Boulos and coworkers includes an RCT by Loeb and coworkers that compares the probability of infection of health care workers who wore FFP2-type masks with the probability for those who wore surgical masks. Loeb et al.'s best estimate is that the probability of infection of someone wearing an FFP2 is 0.87 times as large as if they wore a surgical mask.
Our best guess is that if someone is wearing an FFP2 versus a surgical mask, then the inhaled dose is very roughly 0.2 times as large. These two numbers, 0.2 of the dose (x axis: non-epidemiological data) and 0.87 (y axis: epidemiological data), give the single red data point in the plot above. By combining the data, we estimate that reducing the inhaled dose by a factor of five reduces the probability of being infected by around 13%. This is potentially useful. If, say, we redesign a ventilation system (in a hospital, since this is a study of health care workers) such that it reduces the amount of airborne virus by a factor of five, we now have an estimate that this should reduce the number of infections by about 13%.
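The arithmetic behind that 13% is simple enough to write down. Here is a minimal sketch; the function name and the baseline of 100 infections are made up purely for illustration, and only the 0.87 comes from the study.

```python
# Best-estimate risk ratio from Loeb and coworkers (FFP2 vs surgical mask).
RISK_RATIO = 0.87


def expected_infections(baseline_infections, risk_ratio=RISK_RATIO):
    """Infections expected after the dose is cut fivefold.

    `baseline_infections` is a hypothetical number of infections
    without the intervention; it is not a figure from the study.
    """
    return baseline_infections * risk_ratio


relative_reduction = 1 - RISK_RATIO
print(f"Relative reduction: {relative_reduction:.0%}")
print(f"Out of 100 infections, about {expected_infections(100):.0f} remain")
```

This only restates the best estimate. It says nothing about the uncertainty, which is large.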
The uncertainties in both numbers are large. The FFP2 standard requires a filtration efficiency of around 95%* as worn, but there is some uncertainty as to whether health care workers always wear them correctly. So for a rough idea I put the range of possible efficiencies of FFP2s as 90 to 95%. Surgical masks are harder to pin down: the standard that defines them* has no as-worn filtration efficiency (because it is a bad standard), so I guesstimate a filtration efficiency of between 40 and 60%, as worn. This gave me the error bar along the x axis, which runs from 0.05/0.6 = 0.08 (= fraction let through by FFP2 / fraction let through by surgical mask) to 0.1/0.4 = 0.25.
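For anyone who wants to plug in their own guesses, here is a minimal sketch of how that x-axis range is obtained. The efficiency ranges are my guesstimates from above, not measured values, and the function name is just for illustration.

```python
# Guesstimated as-worn filtration efficiencies (fractions, not percent).
FFP2_EFFICIENCY = (0.90, 0.95)
SURGICAL_EFFICIENCY = (0.40, 0.60)


def dose_ratio_range(ffp2_eff, surgical_eff):
    """Range of (dose through FFP2) / (dose through surgical mask).

    The fraction let through is 1 - efficiency. The smallest ratio pairs
    the best FFP2 with the worst surgical mask, and the largest ratio
    pairs the worst FFP2 with the best surgical mask.
    """
    ffp2_through = [1 - e for e in ffp2_eff]
    surgical_through = [1 - e for e in surgical_eff]
    low = min(ffp2_through) / max(surgical_through)
    high = max(ffp2_through) / min(surgical_through)
    return low, high


low, high = dose_ratio_range(FFP2_EFFICIENCY, SURGICAL_EFFICIENCY)
print(f"Dose ratio between {low:.2f} and {high:.2f}")
```

Taking the extreme pairings gives the widest range consistent with the guesses, which seems the honest choice when both inputs are so uncertain.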
The confidence interval for Loeb and coworkers’ study is from 0.58 to 1.32. Two points about that. The first is that the interval is broad because it is a small study, only about a thousand people. The second is that the interval includes ratios greater than one, i.e., that the probability of infection is greater while wearing an FFP2 than with a surgical mask. At first sight this makes no sense: FFP2 masks filter better, so you should inhale a lower dose while wearing an FFP2 than while wearing a surgical mask. And I would say that this is almost certainly the case. Only “almost certainly” because there could be human factors at work: if for some reason the health care workers wore the FFP2s very badly, that could give this effect. But their use of FFP2s would have to be abominable, which seems very unlikely. It is also an assumption that the probability of becoming infected always increases when the dose increases. This seems very reasonable and very likely to be true, as it is hard to see how inhaling less virus makes you more likely to become infected, but it is an assumption.
So the uncertainty interval of Loeb and coworkers is very likely to be too wide. It should stop at a value no higher than 1.0, i.e., it should not extend into the cyan-coloured region in the plot. For these studies, the error bars are obtained purely from statistics. There is no attempt to check whether these error bars make sense in the light of what the phenomenon actually is (here airborne virus transmission).
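As a rough illustration of what that constraint does to the reported interval, here is a minimal sketch that simply clips the upper end at 1.0. This is a crude stand-in, assuming only that more dose never means less risk; a more careful treatment would build the constraint into the statistical analysis itself rather than chopping the interval afterwards. The function name is made up for illustration.

```python
def truncate_interval(lower, upper, cap=1.0):
    """Clip a risk-ratio interval so that it does not extend above `cap`."""
    return min(lower, cap), min(upper, cap)


# Interval reported by Loeb and coworkers for FFP2 vs surgical mask.
lower, upper = truncate_interval(0.58, 1.32)
print(f"Physically plausible interval: {lower} to {upper}")
```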
To recap: epidemiological data is great and has huge strengths. It directly measures what we want to measure, the number of people becoming infected, as a function of something we can control, mask wearing. But it also has weaknesses. Without a model of transmission, epidemiological data for one set of circumstances, e.g., mask-wearing health care workers, cannot be applied to any other set of circumstances, e.g., improved room ventilation, or even, say, mask-wearing workers in the hospitality sector.
This means that for any new or different application of an NPI (non-pharmaceutical intervention) you need to do a new RCT. This is impractical. Human society is complex, and you can’t do RCTs for every type of employee or interaction. As it is, the few RCTs done are so small that they have huge confidence intervals, and reducing these intervals by a factor of ten means you need to study a hundred times more people, at a hundred times the cost.
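The factor of a hundred comes from the usual rule of thumb that the width of a confidence interval shrinks as one over the square root of the number of participants. Here is a minimal sketch under that assumption, with a made-up function name; real trial design involves more than this.

```python
def participants_needed(current_participants, shrink_factor):
    """Participants needed to narrow the interval by `shrink_factor`.

    Assumes interval width scales as 1 / sqrt(N), so N must grow
    as the square of the desired shrink factor.
    """
    return current_participants * shrink_factor ** 2


# A study of about a thousand people, with the interval narrowed tenfold.
print(participants_needed(1000, 10))  # 100000
```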
So we are not going to RCT ourselves to a position where we know what the best NPIs are and how well they work. The scarce RCT data needs to be supplemented with other data, for example, on mask filtration efficiency. This is what I have done above. I don’t think epidemiologists are going to do this, so maybe I should.
* FFP2 is not a (physical) mask but a mask standard; it is defined by the European standard EN 149. Similarly, the surgical mask standard is EN 14683. Brief summaries of both are here.