Sunday, August 23, 2020

Thoughts on Smith's Standard Deviations: Makes Statistical Blunders Easy to Spot

I read Gary Smith's book Standard Deviations with great interest. The book makes the complex subject of statistical deception and error easy to understand through plain language and entertaining examples. Smith covered cases ranging from the obvious (e.g., misleading graphs) to the much more subtle (e.g., people seeing apparent clusters in random data without accounting for various confounding effects).

I especially liked the way Smith tackled the subtle examples. He went over many instances of training models and using them to mine data, where the results turn out to be highly sensitive to arbitrary choices such as how the data are binned. He also pointed out mistakes such as not properly accounting for multiple hypothesis testing and not correcting for confounders. He described one obvious confounder in detail: the population is always increasing, so quantities that grow along with it appear to be correlated with each other. For instance, sales of diapers and sales of rugs both keep rising, but each is simply tracking overall population growth rather than being related to the other.

There is also a nice discussion of survivorship bias: doing statistics only on the members of a group that survive rather than on the entire original cohort. The most famous example is the effort to identify the vulnerable parts of World War II planes by examining only the planes that returned from combat.

Overall, I found this book very easy to read, and I would recommend it to anyone wanting to avoid statistical blunders.
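To make the diaper-and-rug point concrete, here is a minimal Python sketch with made-up numbers. The population series, the per-capita sales rates, the noise levels, and the helper names (`pearson`, `residuals`, `simulate`) are all hypothetical illustrations, not data or methods from the book. Both sales series depend only on population, each with its own independent noise, yet their raw correlation comes out high; after the population trend is regressed out of each series, the remaining correlation should be close to zero.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, xs):
    """What is left of ys after an ordinary least-squares line fit on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def simulate(years=50, seed=0):
    """Hypothetical data: two sales series driven only by a growing population."""
    rng = random.Random(seed)
    # Hypothetical population (in millions) growing 1% per year.
    population = [100.0 * 1.01 ** t for t in range(years)]
    # Each product sells in proportion to population plus its own independent
    # noise; neither series depends on the other.
    diapers = [0.5 * p + rng.gauss(0, 1.0) for p in population]
    rugs = [0.2 * p + rng.gauss(0, 1.0) for p in population]
    return population, diapers, rugs


if __name__ == "__main__":
    population, diapers, rugs = simulate()
    raw = pearson(diapers, rugs)
    adjusted = pearson(residuals(diapers, population), residuals(rugs, population))
    print(f"correlation of raw sales:               {raw:.2f}")
    print(f"correlation after removing population: {adjusted:.2f}")
```

Regressing each series on population is just one simple way to adjust for a shared driver; it is used here for illustration and is not necessarily the approach Smith takes in the book.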

Standard Deviations: Flawed Assumptions, Tortured Data, and Other Ways to Lie with Statistics