It is something molecular biologists and biomedical scientists often ask: do we really have to do statistics? The question implies that if only they could get on with their important, life-saving discoveries the world would be a better place.
There is some logic behind this. A former fellow student of mine, now a well-respected microbiologist, once explained it to me as follows: in the lab something either works, or it doesn't. There isn't much need for statistics in this.
But when it comes to translating a lab result into a successful, life-saving treatment, it isn't that simple. To do that, one needs to show that the treatment is safe, is without significant side effects, and works better than existing treatments. This involves large and costly clinical trials and the careful evaluation of the results. And to do that, you guessed it, you need statistics.
But where does the Guinness come in, I can hear you think, and how much good has it actually done, even for someone like me who doesn't drink it? Well, all sorts of nutritional and psychological benefits have been attributed to the beverage, but that is not what this is about. This is about the science that comes with brewing beer.
Just before the start of the twentieth century, the Guinness brewing company of Dublin was one of the largest breweries in the world. It had started to recruit the best chemistry graduates it could find and to appoint them as brewers, taking leading roles in the organisation. In 1899 it recruited William Sealy Gosset. It was a good appointment: Gosset was a capable administrator and he rose to become the head brewer of the new Guinness Brewery in London in 1935.
But Gosset wasn't just a chemist: he had graduated with a combined degree in chemistry and mathematics. And although by day he mainly used his chemistry skills, he often used his math skills to work on statistical problems in the evenings, at home.
He was fascinated by statistical problems that applied to his work. For beer, having good barley is crucial. So barley was grown in plots at different farms to find the best barley for the brewery. But the variation in these experiments was high and the results were difficult to interpret: what could they really say about the mean yield of these samples? Statistical theory of that time required large numbers of samples, and that allowed estimating the mean and the variance. But for Gosset's problems this didn't work: some of the data came from the plots on four farms, and with such a small number of plots he had to do something different, but what?
Gosset worked out an alternative method that didn't need the estimation of the variance. See him sitting at his kitchen table, working so quietly that you can hear the steady hiss of the gaslight. On the table he has stacks of pieces of cardboard with numbers written on them. He painstakingly calculates the means of small samples he draws from the stacks, and then the distributions of those means.
Gosset's thinking was that if you have a small sample, the mean will differ for every sample. But if you sample many times, the spread of all the means allows you to work out confidence intervals for the mean. Gosset did something that we now know as bootstrapping: a statistical method in which data are resampled using a computer. But there weren't any computers then; all he had was stacks of cards. To work out his alternative method he therefore had to use math. "Now let's assume that this data comes from a normal distribution ... ", he must have thought. And out came a simple result.
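The card-drawing experiment can be imitated in a few lines of Python. This is only a rough sketch, not Gosset's actual procedure: it assumes a normal population with mean 0 and standard deviation 1, a sample size of four (echoing the four farms) and an arbitrary seed, and the function name `simulate_small_samples` is made up for the illustration. Strictly speaking it draws fresh samples from a known population rather than resampling a single data set, so it is a simulation in the spirit of bootstrapping.

```python
import random
import statistics


def simulate_small_samples(n_samples=20000, sample_size=4,
                           mu=0.0, sigma=1.0, seed=1908):
    """Draw many small samples from an assumed normal population.

    Returns the sample means and, for each sample, the mean
    standardised by that sample's own estimated standard error.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    t_values = []
    for _ in range(n_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        m = statistics.fmean(sample)
        s = statistics.stdev(sample)  # sample standard deviation (n - 1)
        means.append(m)
        t_values.append((m - mu) / (s / sample_size ** 0.5))
    return means, t_values


means, t_values = simulate_small_samples()

# Spread of the sample means across all the repeated draws.
spread_of_means = statistics.stdev(means)

# Share of standardised means beyond the usual large-sample cut-off.
frac_beyond = sum(abs(t) > 1.96 for t in t_values) / len(t_values)

print(f"spread of the means: {spread_of_means:.3f}")
print(f"fraction with |t| > 1.96: {frac_beyond:.3f}")
```

With samples of four, the spread of the means comes out close to 0.5 (the population standard deviation divided by the square root of four), and a good deal more than 5% of the standardised means fall beyond ±1.96, the cut-off that large-sample theory would suggest. Those heavier tails are what Student's distribution describes.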
The results of his statistical endeavours were useful for the brewery. Gosset thought they had importance beyond that and wanted to publish them. The brewery was not keen: they had had bad experiences with publication when a Master Brewer had inadvertently revealed some of the secrets of the brewing process. So they decided that if Gosset published, he had to use a pseudonym, either 'Pupil' or 'Student'. Gosset chose Student. His paper, 'The probable error of a mean', was published in 1908.
Ronald Fisher was still at school when the paper came out. In 1912 he came across Student's paper as an undergraduate. Fisher realised the wider implications and developed Student's distribution into the test that we now know. And so, Student's t test became a workhorse of statistical analysis.
Fisher didn't stop there and kept himself busy. In 1917 he married Ruth Guinness; she was from the preaching rather than the brewing branch of the family. He developed another beverage-related statistic: in the 'lady tasting tea' experiment he challenged a female colleague to tell by taste whether the tea or the milk had been put in the cup first. The test he designed for this is Fisher's exact test; it is related to the chi-square test. As it turned out, the lady proved Fisher wrong and could indeed taste the difference (p<0.05).
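To see where a p-value like this can come from, here is a small worked example in Python. The numbers are an assumption, since they are not given above: the classic textbook account of the experiment uses eight cups, four of them with the milk poured first, and the lady has to pick out those four. The function name `p_at_least` is hypothetical.

```python
from math import comb


def p_at_least(correct, cups=8, milk_first=4):
    """Chance of picking at least `correct` milk-first cups by guessing.

    Assumes the taster selects exactly `milk_first` cups out of `cups`.
    """
    total = comb(cups, milk_first)   # all possible selections
    other = cups - milk_first        # cups with the tea poured first
    ways = sum(
        comb(milk_first, k) * comb(other, milk_first - k)
        for k in range(correct, milk_first + 1)
    )
    return ways / total


p_perfect = p_at_least(4)
p_three_or_more = p_at_least(3)

print(f"all four right: p = {p_perfect:.3f}")
print(f"three or more right: p = {p_three_or_more:.3f}")
```

Under these assumptions a perfect score has a probability of 1 in 70 (about 0.014) if she is merely guessing, which is below 0.05. Getting three or more right has a probability of 17 in 70 (about 0.243), so a single mistake would not have been convincing.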
Under the exciting title 'Studies in Crop Variation II: The manurial response of different potato varieties', Fisher first published on the method of analysis of variance (ANOVA). It has since found many applications beyond manure. Fisher's work wasn't limited to statistics either: he also worked in evolution and genetics. To this day, every textbook on statistics, evolution and genetics is likely to have whole chapters devoted to Fisher's work.
But why should medical scientists know about brewing beer, tasting tea, and the response of potato varieties to manure? Medicine has been around for thousands of years. Potions, ointments and pills have been made forever, but we just didn't know if the individual recipes of herbologists and apothecaries actually worked. What turned medicine into medical science was the scientific method: the systematic collection of evidence to support or contradict a theory. This methodology has revolutionised science since Renaissance times.
In medicine, this took a little time. It was not until the statistical methods that allowed efficient discovery of small differences were developed that modern medicine really took off. These tests are the mainstay of clinical science: in a review of nearly two thousand papers from medical journals, Student's t test, Pearson's chi-square test, Fisher's exact test and ANOVA were used in over 90%. Life expectancy at birth in the UK has risen from about 57 in 1922 to over 79 in 2017, to a large degree due to the development of novel drugs, vaccines and medical and public health technologies. Gosset and Fisher were instrumental in this, and their work saved innumerable lives. And that is why Guinness has been so good for us and why, my dear fellow student of years ago, biomedical scientists need statistics.
Vincent Jansen, March 2019
David Salsburg wrote a popular science book about the development of statistics, its main players and impact on science:
The review paper about statistical tests in the field of medicine mentioned in the blog is:
Vincent Jansen is Professor of Mathematical Biology at Royal Holloway, University of London. In his research he uses mathematics and models to understand how things work in biology, evolution and epidemiology.