Sunday, August 23, 2020

Thoughts on Smith's Standard Deviations: Makes Statistical Blunders Easy to Spot

I read Gary Smith's book Standard Deviations with great interest. The book makes the complex issue of statistical deceptions and mistakes easy to understand through simple language and entertaining examples. Smith covered statistical cases ranging from the obvious (e.g., misleading graphs) to the much more subtle (e.g., mistakes where people see apparent clusters in random data without accounting for various confounding effects). I especially liked the way Smith tackled the subtle examples. He went over the many instances in training models and using them to mine data where the process becomes highly sensitive to various parameters such as binning. He also pointed out mistakes such as not sufficiently correcting for multiple hypothesis testing and not correcting for confounders. He described one obvious confounder in detail: the population is always increasing, so things correlated with it seem, in turn, to be correlated with each other. For instance, one can see ever-increasing sales of both diapers and rugs, but both simply track overall population growth rather than being directly related to each other. There is also a nice discussion of survivorship bias: how one does statistics only on those that survive and not on the entire original cohort. This was most notably seen in the famous case of World War II planes, where the vulnerable parts were inferred only from the planes that returned from combat. Overall, I found this book very easy to read, and I would recommend it to anyone wanting to avoid statistical blunders.
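
To make the population confounder concrete, here is a minimal Python sketch. It is my own illustration rather than anything from the book: the population figures, purchase rates, and noise levels are all made up. Two sales series that depend only on population look strongly correlated in raw totals, and the correlation should largely vanish once each is expressed per capita.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (spread_x * spread_y)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population growing 2% a year for 50 years.
population = [1000 * 1.02 ** year for year in range(50)]

# Made-up per-person purchase rates, each with its own independent noise.
diaper_sales = [p * (0.05 + rng.gauss(0, 0.0025)) for p in population]
rug_sales = [p * (0.01 + rng.gauss(0, 0.0005)) for p in population]

# Correlation of the raw totals, which both ride on population growth.
raw_corr = pearson(diaper_sales, rug_sales)

# Correlation after dividing out the shared confounder.
per_capita_corr = pearson(
    [d / p for d, p in zip(diaper_sales, population)],
    [r / p for r, p in zip(rug_sales, population)],
)

print(f"raw totals: r = {raw_corr:.2f}")
print(f"per capita: r = {per_capita_corr:.2f}")
```

Dividing by population is only one way to remove the shared trend; it works here because the toy data were built so that population is the sole link between the two series.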

Standard Deviations: Flawed Assumptions, Tortured Data, and Other Ways to Lie with Statistics 

Sunday, August 02, 2020

Some defunct links to various tagging sites

Old bookmarks (Dec. '06 to Sep. '11), including article clips (mostly popular)

For Delicious, here are some definitions of my public tags. Notice how they are subdivided, using yet more tags, into tags applicable to bookmarks, blogs, images, etc. They also contain pointers to the tops of link clusters (overviews and centers).

Backflip bookmarks (broken): Quicklinks, current ones, sent-email-about-this

Digg history

Sunday, June 21, 2020

Thoughts on Christian & Griffiths' Algorithms to Live By: Great Intuition for Important Concepts

I read with great interest Brian Christian and Tom Griffiths' Algorithms to Live By. The book does a fantastic job of connecting abstruse computer algorithms to real-life situations and intuitions. I couldn't stop jotting down various tidbits that I wanted to remember; I advise people to read it!

The book begins with a discussion of the optimal stopping problem. It relates this to familiar situations in life, such as when people need to decide on a partner, rent a home, or otherwise gather information to make a decision. Then, the book focuses on sorting, connecting this to how people arrange books on a shelf, and also to how teams sort themselves in tournaments. The book also highlights how certain sorting algorithms are more robust to noise than others (e.g., bubble vs. merge sort).
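
As a rough illustration of optimal stopping, here is a small Python simulation of the classic "look-then-leap" strategy for the secretary problem, where looking at roughly the first 1/e (about 37%) of candidates before committing is the known optimum. This is my own sketch: the function names and the setup (100 candidates with distinct ranks, seen in random order) are assumptions for illustration.

```python
import math
import random


def look_then_leap(scores, look_fraction):
    """Skip the first look_fraction of candidates, then take the first
    one that beats everything seen so far (or settle for the last)."""
    cutoff = int(len(scores) * look_fraction)
    best_seen = max(scores[:cutoff], default=float("-inf"))
    for score in scores[cutoff:]:
        if score > best_seen:
            return score
    return scores[-1]


def success_rate(n, look_fraction, trials, rng):
    """Fraction of trials in which the single best candidate is chosen."""
    wins = 0
    for _ in range(trials):
        scores = list(range(n))  # n - 1 is the best candidate
        rng.shuffle(scores)
        if look_then_leap(scores, look_fraction) == n - 1:
            wins += 1
    return wins / trials


if __name__ == "__main__":
    rng = random.Random(0)
    for fraction in (0.05, 1 / math.e, 0.8):
        rate = success_rate(100, fraction, 5000, rng)
        print(f"look at first {fraction:.0%}: best pick {rate:.1%} of the time")
```

Looking at too few candidates means leaping before you know what "good" looks like, and looking at too many means the best one has probably already passed; the simulated success rate should peak near the 37% cutoff.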

The book discusses ways of storing information, providing useful hints on how to organize one's closet based on the LRU (least recently used) cache. The discussion of Bayesian mathematics is excellent. It explains how one uses different priors in common-sense reasoning; for instance, movie revenues follow a scale-free (power-law) prior, whereas human age is centered on a mean and calls for a Gaussian prior.
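
For readers who haven't met an LRU cache, here is a minimal Python sketch of the idea applied to a closet. The class and the closet items are hypothetical names of my own, not from the book: when space runs out, whatever was used least recently gets evicted.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity store that evicts the least recently used item."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()  # oldest use first, newest use last

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used

    def keys(self):
        """Keys ordered from least to most recently used."""
        return list(self._items)


closet = LRUCache(capacity=2)
closet.put("raincoat", "hook by the door")
closet.put("sweater", "top shelf")
closet.get("raincoat")               # wearing the raincoat refreshes it
closet.put("t-shirt", "top shelf")   # closet is full, so the sweater goes
print(closet.keys())
```

The `OrderedDict` keeps usage order for free, which is why this fits in a few lines; a real cache would also need to think about thread safety and what "use" means.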

The book also talks about various randomized algorithms, such as Monte Carlo and simulated annealing, giving a sense of how sampling randomly allows one to tackle problems that could not be addressed otherwise. Other techniques, such as constraint relaxation, also enable one to deal with intractable problems and start to get at solutions.
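
A standard toy example of the Monte Carlo idea (my own choice of illustration, not one I am attributing to the book) is estimating pi by throwing random points at a unit square and counting how many land inside the quarter circle. The function name and sample count below are arbitrary.

```python
import random


def estimate_pi(samples, rng):
    """Estimate pi from the fraction of random points in the unit square
    that fall inside the quarter circle of radius 1."""
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # Quarter-circle area is pi / 4, so scale the hit fraction by 4.
    return 4 * inside / samples


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    print(f"pi is roughly {estimate_pi(100_000, rng):.3f}")
```

The estimate improves only slowly as samples are added, but the approach needs no geometry beyond a distance check, which is the appeal of random sampling for problems that are hard to attack directly.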

The book ends with chapters on networking and game theory. The discussion on networking highlights the importance of buffers and how our inundation with messages in the modern world reflects that we can essentially keep everything in a buffer (i.e., no tail drop). One prominent omission for me was that there was no discussion of network science, the analysis of connectivity patterns in large networks (e.g., hubs and bottlenecks). The section on game theory relates Alan Turing's famous discussion of the halting problem to the intractable regress that one has in playing poker or trying to estimate a stock price, where one is thinking not only of one's own estimation but also of what one's opponent thinks, and so on.

The book concludes with the notion of computational kindness, where one tries to interact with others so as to minimize the amount of computations they have to do. Altogether, the book was a great read that I would highly recommend to others.

Algorithms to Live By: The Computer Science of Human Decisions
by Brian Christian (Author), Tom Griffiths (Author)


My quotes:

My tag (associated with the book):

Monday, June 15, 2020

Thoughts on Zuckerman's The Man Who Solved the Market: Great story about Personalities, but not much Tech Stuff

I read Gregory Zuckerman's book, The Man Who Solved the Market: How Jim Simons Launched the Quant Revolution, with great interest.

Before reading the book, my understanding of Jim Simons was that he was a person doing big data and data science before it was cool, in a wide-open "blue sea." Now, data science is popular on Wall Street, with many firms competing with each other in a "red sea." I was eager to read the book to learn more about some of Simons' seminal ideas and how they became the foundation for applying a quantitative approach to the market.

Unfortunately, I found that the book lacked substantial insight into Simons' algorithms and tactics.
Instead, the book focused more on personalities and their histories. It described how Simons started a very small operation that eventually grew to become the highly successful Renaissance Fund. Technical buzzwords were mentioned (such as "hidden Markov models" and "Baum-Welch algorithm"), and there were some hints about Renaissance's strategies, such as making trades at certain times of the day, accumulating very granular information on trading history, and looking at correlations between related stocks. However, overall the book provided very few details on Simons' overall approach. Perhaps this was intentional, given the great secrecy surrounding Simons and Renaissance.

In any case, I enjoyed the book despite the dearth of technical knowledge.

Gregory Zuckerman
The Man Who Solved the Market: How Jim Simons Launched the Quant Revolution



My quotes on goodreads:

Monday, May 11, 2020

Thoughts on Lents' Human Errors: Fascinating physiological & molecular anecdotes on our species' weaknesses, illuminating the non-obvious paths of evolution

I read Nathan Lents' book Human Errors with great interest. This book goes over a variety of physiological and molecular errors in humans that are somewhat paradoxical: they make us, in a sense, significantly more vulnerable than our immediate animal cousins. In particular, writing this review from the vantage point of mid-2020, I found it fascinating that this book was published in 2018 and cautions us about how vulnerable the human species is to a global pandemic.

The book begins by describing examples of physiological oddities, such as our prevalence of knee injuries (e.g., ACL tears) and our upside-down sinus drainage patterns, which to some degree were caused by our recently upright posture. Next, the book delves into molecular defects, looking at the large amount of supposedly junk DNA and the many pseudogenes in our genome. Lents relates the pseudogenes to vitamin deficiencies; for example, the pseudogene for the GULO enzyme is associated with vitamin C deficiency. Lents then talks about autoimmune diseases, such as Graves' disease and myasthenia gravis, that we are much more prone to than our immediate animal relatives. The book culminates with a focus on the human brain and how it, too, sometimes suffers in comparison to cognitive set-ups elsewhere in the animal kingdom. For instance, our flicker-fusion threshold is considerably lower than that of dogs and birds, meaning that we are less able to resolve things moving quickly. In addition, we tend to be easily overwhelmed by large amounts of data, despite our belief that we can reason with "big data" well.

Overall, the book is a good read. I have some small quibbles with the discussion of junk DNA, which I think is a bit exaggerated. I believe that much of this DNA does have various uses, albeit somewhat indirect. Nevertheless, Lents' illustration of how evolution doesn't always lead to the optimal endpoint is compelling.

Human Errors: A Panorama of Our Glitches, from Pointless Bones to Broken Genes
by Nathan H. Lents (Author)


My quotes on goodreads:

Tuesday, January 28, 2020

Thoughts on Strogatz's Infinite Powers: Great intuition on calculus, from a master teacher

I enjoyed Steven Strogatz's new work, Infinite Powers: How Calculus Reveals the Secrets of the Universe. The book gives an excellent overview of calculus, which permeates all branches of mathematics and so much of life. I should say at the outset that I had the great opportunity of being taught calculus in college by Dr. Strogatz. After reading this book, I feel even more fortunate for this experience because he's such a gifted communicator.

What I liked mainly about the book was the intuitive way Strogatz describes differentials and the development of calculus from Newton and Leibniz onwards. He introduces these concepts in several ways. My favorite was the way he demonstrated simply cubing the number 2 and contrasting it with cubing 2.01, where the latter can be expressed as the cube of a sum (2 + .01) and then expanded out with Pascal's triangle. From merely looking at the multiplication of these numbers, one can immediately get a sense of which terms can be neglected in this specific sum and in the whole process of differentiation.
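
Written out, the expansion looks like this (my own working of the arithmetic, using the 1, 3, 3, 1 row of Pascal's triangle):

```latex
\begin{aligned}
(2 + 0.01)^3 &= 2^3 + 3 \cdot 2^2 \cdot (0.01) + 3 \cdot 2 \cdot (0.01)^2 + (0.01)^3 \\
             &= 8 + 0.12 + 0.0006 + 0.000001 \\
             &= 8.120601
\end{aligned}
```

Almost all of the change from 8 comes from the 0.12 term; the terms with higher powers of 0.01 are negligible by comparison. Dividing 0.12 by the increment 0.01 gives 12, which is 3 times 2 squared, the derivative of x cubed at x = 2.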

Strogatz also clearly explains many classic equations in mathematics and physics, such as the heat and the wave equations. I particularly liked the way he described the development of the Fourier series and how, for the sine and cosine terms of this series, differentiating twice becomes a simple multiplication by minus one (times the square of the frequency), making them easy to deal with. I also liked how he explained how one can easily express even very angular shapes such as a triangular waveform in terms of Fourier series.
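
To see how smooth sine waves can build an angular shape, here is a short Python sketch of my own (the function names are hypothetical) that sums the standard Fourier series of a unit-amplitude triangle wave with period 2*pi, using only odd harmonics with alternating signs and coefficients falling off as 1/n^2, and compares it with the exact waveform.

```python
import math


def triangle_partial_sum(t, terms):
    """Partial Fourier sum for a triangle wave of amplitude 1, period 2*pi."""
    total = 0.0
    for k in range(terms):
        n = 2 * k + 1  # only odd harmonics contribute
        total += (-1) ** k * math.sin(n * t) / n ** 2
    return 8 / math.pi ** 2 * total


def triangle_exact(t):
    """Exact triangle wave: rises linearly from 0 at t = 0 to 1 at t = pi/2."""
    return 2 / math.pi * math.asin(math.sin(t))


if __name__ == "__main__":
    t = math.pi / 2  # the sharp peak, where convergence is slowest
    for terms in (1, 5, 50):
        approx = triangle_partial_sum(t, terms)
        print(f"{terms:>2} terms: {approx:.4f} (exact {triangle_exact(t):.4f})")
```

Because the coefficients shrink like 1/n^2, even a handful of terms gets close; the corner at the peak is where the remaining error concentrates.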

I enjoyed many of the practical examples of how we can see calculus in everyday life, ranging from the oscillations of HIV in people, as tracked by Alan Perelson and David Ho, to the development of CT scans by Hounsfield and Cormack. Strogatz gives an especially hands-on understanding of the fundamental theorem of calculus by describing it in terms of a well-known paint roller analogy and how it can link together the disparate ideas of the slope of a function and the area under a curve.

Finally, I enjoyed the discussion of many of the personalities in mathematics, such as Descartes and Fermat. I hadn't appreciated the famous feud between these two until I read the book.

Overall, a great read. I'd highly recommend it, especially for anyone studying or using calculus.

Infinite Powers: How Calculus Reveals the Secrets of the Universe
by Steven Strogatz


Sunday, December 29, 2019

Thoughts on Mukherjee's Emperor of All Maladies: Learned about the science of cancer & many of its personalities (finally know who Dana & Farber were!)

I had heard that The Emperor of All Maladies – written by award-winning oncologist Siddhartha Mukherjee – is a “must-read” about cancer. I was not disappointed. The book is fascinating, examining all aspects of the disease in the framework of a broad story arc. Mukherjee did an excellent job interspersing captivating language and vignettes – such as a quote from Susan Sontag about illness being the dark side of life – with science and history on an all-important disease.

The book provides a comprehensive history of cancer, beginning from its first identification by the Greeks (as “oncos”) to the present. The detailed descriptions of the development of chemotherapy, radiation, and surgery treatments were engrossing. I was struck by the importance of blood cancers for the development of the first chemotherapeutic agents, as well as the importance of surgeons such as William Halsted in devising various ways to remove tumors.

I was interested to learn about the early work on epidemiology and prevention in relation to lung cancer. Richard Doll and Austin Bradford Hill pioneered a new approach to epidemiological statistics to link cigarette smoking and cancer. These researchers deserve high praise for making this critical link and changing the field of epidemiology.

Mukherjee also discusses other aspects of cancer, including the development of a massive apparatus for cancer funding with institutions such as the Dana-Farber Cancer Institute and the Jimmy Fund, and how these came together successfully to raise millions of dollars to fight the disease.

The only thing I felt the book was missing was a section describing how recent developments in cancer immunotherapy fit into the whole discussion. Nonetheless, I whole-heartedly recommend the book.

The Emperor of All Maladies: A Biography of Cancer
by Siddhartha Mukherjee