Sunday, November 17, 2019

Thoughts on O'Neil's Weapons of Math Destruction: What happens when learning for machines makes judgements on people

We read Cathy O’Neil’s book, Weapons of Math Destruction, with great interest. O’Neil discusses many of the biases and unfair situations that arise in modern data science from mathematical models and algorithms. An avid mathematician and former academic who moved to industry to “carry mathematics from abstract theory into practice,” O’Neil realized that her favorite field was part of a cycle of co-production with society: the design of numerous mathematical models both reflected and exacerbated social problems.

In a sense, it is obvious that an algorithm trained on biased data will be biased. An algorithm can be efficient in the sense of minimizing the number of false positives and false negatives, yet each of these “mistakes” might still lead to an unjust and unfair outcome when applied to an individual person or a group. The book makes this point with many clear case studies. Whether built for loan approval, college ranking, law enforcement, or business optimization, models aimed at improving efficiency or boosting profits backfired. Such models created feedback loops that widened gaps within society because their limited designs proved to be oblivious to broader professional and social contexts.

A few recurring themes stand out from the book. The two most important, we think, are the emphasis on balanced objectives and the heightened awareness needed before building a “Big Data” model. O’Neil strongly makes the case for broadening a model’s objective, understanding its strengths and limitations, and being fully aware of how human biases can diffuse into collected (and uncollected) data. We particularly like the discussion of the widespread worry about falling standardized test performance in the United States, and how this worry turned out to be totally erroneous. In reality, SAT scores in each subgroup were increasing, but more disadvantaged kids were taking the test, so the overall average fell. This mistaken conclusion is an instance of the well-known statistical phenomenon called Simpson's paradox.
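To make the Simpson's paradox point concrete, here is a minimal sketch in Python. The subgroup names, counts, and scores are hypothetical numbers invented for illustration; they are not real SAT data and do not come from the book. The sketch shows every subgroup's mean rising while the overall mean falls, simply because the mix of test takers changes.

```python
# Hypothetical numbers, invented purely for illustration (not real SAT data).
# Each entry maps a subgroup to (number of test takers, mean score).
year_1 = {"advantaged": (900, 1100.0), "disadvantaged": (100, 800.0)}
year_2 = {"advantaged": (900, 1110.0), "disadvantaged": (600, 820.0)}


def overall_mean(groups):
    """Mean score across all test takers, weighting each subgroup by its size."""
    total_takers = sum(n for n, _ in groups.values())
    total_points = sum(n * mean for n, mean in groups.values())
    return total_points / total_takers


# Every subgroup improves from year 1 to year 2 ...
for name in year_1:
    print(name, year_1[name][1], "->", year_2[name][1])

# ... yet the overall mean drops, because the lower-scoring subgroup
# makes up a much larger share of test takers in year 2.
print("overall", overall_mean(year_1), "->", overall_mean(year_2))
```

With these made-up figures the overall mean drops from 1070 to 994 even though both subgroups score higher, which is why an aggregate average alone can be misleading.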

The book also touches on the proprietary nature of many commercial algorithms. In this regard, the book praises the well-known FICO score, which is viewed as a model of transparency compared to more closed types of rating systems. We also like the way the book goes through a lot of the jargon of modern commercial data science, such as proxies (which are often features that stand in for a different feature that can’t be as easily measured or is not appropriate to measure) and micro-targeting. Pointing out the limitations of proxies is especially resonant today in light of several studies that followed the book’s publication in 2016, the most recent of which appeared in Science last month on the inherent bias in a widely used algorithm that inaccurately used health costs as a proxy for health needs.
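The limitation of a proxy can be sketched with a toy example. Everything below is a hypothetical illustration: the patient records, the two groups, and the assumption that group B incurs 60% of group A's cost at the same level of need are invented, and this is not the data or method of the Science study. The sketch ranks patients for extra care first by cost (the proxy) and then by need (the quantity we actually care about).

```python
from collections import Counter

# Hypothetical patients, invented for illustration: (group, true_need, observed_cost).
# Assumption: at equal need, group B incurs 60% of group A's cost
# (for example, because of barriers to accessing care).
patients = [
    ("A", 8, 8000), ("A", 6, 6000), ("A", 4, 4000), ("A", 2, 2000),
    ("B", 8, 4800), ("B", 6, 3600), ("B", 4, 2400), ("B", 2, 1200),
]


def top_k_groups(records, key_index, k=4):
    """Count how many of the k highest-ranked records belong to each group."""
    ranked = sorted(records, key=lambda r: r[key_index], reverse=True)
    return Counter(r[0] for r in ranked[:k])


selected_by_cost = top_k_groups(patients, key_index=2)  # rank on the proxy
selected_by_need = top_k_groups(patients, key_index=1)  # rank on the real target

print("ranked by cost:", dict(selected_by_cost))
print("ranked by need:", dict(selected_by_need))
```

In this toy setup, ranking by need selects two patients from each group, while ranking by cost selects three from group A and only one from group B, even though the two groups have identical needs.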

The book is human-centered, no doubt. O’Neil calls for measures in favor of the protection and collective betterment of everyone’s lives. She acknowledges the potential utility of “Big Data,” thoroughly demonstrates that good intentions are not enough, and chooses to raise alarming issues at the heart of this rapidly unfolding field. Altogether, we found this book to be a fun read that we would recommend to anyone interested in large-scale data science that involves actual people as opposed to inanimate objects.

M Gerstein & H Mohsen

Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy
by Cathy O’Neil