I read Gary Smith's book Standard Deviations with great interest. The book makes
the complex issue of statistical deceptions and mistakes easy to understand
through simple language and entertaining examples. Smith covered statistical
cases ranging from the obvious (e.g., misleading graphs) to the much more subtle (e.g.,
mistakes where people see apparent clusters in random data without accounting
for various confounding effects). I especially liked the way Smith tackled the
subtle examples. He went over many instances, in training models and using them
to mine data, where the results become highly sensitive to parameter choices
such as binning. He also pointed out mistakes such as testing multiple
hypotheses without accounting for it and not correcting for confounders. He
described one obvious confounder in detail: the population is always
increasing, so things that grow with it, in turn, seem to be correlated with
each other. For instance, one can see ever-increasing sales of both diapers and
rugs, but each simply tracks overall population growth rather than being
directly related to the other. There is also a nice discussion of
survivorship bias: computing statistics only on those that survive rather than
on the entire original cohort. The most famous case is that of World War II
planes, whose vulnerable parts were judged only from the planes that returned
from combat. Overall, I found this book very easy to read, and I would
recommend it to anyone wanting to avoid statistical blunders.
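
To make the population confounder concrete, here is a minimal Python sketch of
the diaper and rug example. Everything in it is my own assumption for
illustration and not taken from the book: the 2% yearly growth, the per-person
purchase rates, and the names simulate and pearson are all made up. The two
per-person rates are drawn independently, so any correlation in the totals
comes only from the shared population trend.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(years=50, seed=0):
    """Hypothetical data: total sales = population * independent per-person rate."""
    rng = random.Random(seed)
    # Assumed steady 2% yearly population growth (illustrative only).
    population = [1_000_000 * 1.02 ** t for t in range(years)]
    # Per-person purchase rates fluctuate independently of each other.
    diaper_rate = [rng.gauss(2.0, 0.05) for _ in range(years)]
    rug_rate = [rng.gauss(0.1, 0.005) for _ in range(years)]
    diapers = [p * r for p, r in zip(population, diaper_rate)]
    rugs = [p * r for p, r in zip(population, rug_rate)]
    return population, diapers, rugs


population, diapers, rugs = simulate()

# Correlation of raw totals: inflated by the shared population trend.
raw_r = pearson(diapers, rugs)

# Correlation of per-person sales: the population trend is divided out.
per_capita_r = pearson(
    [d / p for d, p in zip(diapers, population)],
    [r / p for r, p in zip(rugs, population)],
)

print(f"raw totals:  r = {raw_r:.2f}")
print(f"per capita:  r = {per_capita_r:.2f}")
```

With these made-up numbers, the raw totals should come out strongly correlated
while the per-person figures should show little or no correlation, which is the
pattern the diaper and rug example describes.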
Book:
Standard Deviations: Flawed Assumptions, Tortured Data, and Other Ways to Lie with Statistics