Sunday, March 12, 2017

Thoughts on Harari's Sapiens: An encompassing history of humankind, with non-intuitive insights (eg on our post-Neolithic quality of life)

I read Yuval Noah Harari's Sapiens: A Brief History of Humankind with great interest. The book covers the broad arc of human history, all the way from prehistory and prehumans to the present day. It gives one a good feeling for many of the things that we take for granted in organizing modern life but that are in fact essential. In particular, the book talks a lot about the importance of shared myths and ideas, such as what a company is and what money is, in structuring society. The book also has a fascinating take on many key events in history. For instance, it does not portray the agricultural revolution and the domestication of animals in an entirely positive light, pointing out that while the agricultural revolution led to a huge population boom for humans, it made the average person much more miserable, much harder working and much more susceptible to disease. Likewise, we have a very romantic view of the domestication of animals, but in a sense it is a cruel practice. While it led to a great multiplication of certain animal species, it gave individual animals of those species a potentially awful life -- e.g., think of the cooped-up chicken. The book has further interesting discussions about energy and how we have been able to extract it ever more efficiently from the environment over time, culminating in the industrial and then atomic ages. There are further interesting views on colonization and so forth. Overall, I found this a great read and would highly recommend it.

Sapiens: A Brief History of Humankind (9780062316097): Yuval Noah Harari


Sunday, February 19, 2017

Thoughts on Silver's Signal & the Noise - Easy-to-read tidbits on practical statistical prediction in many fields

I read Nate Silver's book "The Signal and the Noise" with great interest. The author is a renowned popular statistician who has become quite famous in the world of political forecasting. I was keen to see his insights on prediction more generally. The book gives a nice overview of statistical prediction in many contexts, far beyond political science or economics. In fact, I was most interested in the sections on weather forecasting and earthquake prediction. The book talks about how weather forecasting was one of the earliest places where people tried to do prediction using computers -- doing it, in fact, through simulation of physical models. Early on they found that the predictions did not work very well because of the difficulty of simulating such complicated systems; in the process, the butterfly effect was discovered. However, Silver talks about how weather forecasting has grown into a very successful tool: people now routinely make predictions that are quite useful to much of the world's population, yet the predictions are still quite statistical in nature. These powerful predictions come from fusing physical models with lots of real-world data, collected through sensors and orbiting satellites. This is in great contrast to earthquake prediction, where one has a similar situation of an underlying physical model but cannot readily observe and collect data, since most of the forces and factors involved act far underground. There are many other prediction realms that the book discusses in great detail, and overall I would highly recommend this book to anyone interested in practical statistical prediction.
The Signal and the Noise: Why So Many Predictions Fail--but Some Don't
by Nate Silver

Friday, January 20, 2017

My response to the recent NIH data-sharing RFI: Consider the dynamic between DBs & journals, the cost of datasets, privacy, &c

I was very interested in the recent NIH data-sharing RFI. In the past I have written a number of pieces about the subject; below I summarize my response and list relevant references.

My Response

(1) The dynamic between databases and journals and between traditional reading and other forms of access should be considered (Reference Collection #1).
(2) There is a substantial cost in maintaining large data sets, both in keeping up internet infrastructure (i.e., security) and in coping with the exponential scaling of data size and compute needs (ref. #2).
(3) The current journal publishing system should be updated to allow for computer parsing of papers and machine-readable standards, and to make the journal article more like a "mineable dataset" (ref. #3).
(4) Sharing private patient data is problematic; solutions may lie in the framework of a central NIH-sponsored resource and in specialized data standards (ref. #4).

Reference Collection #1

E-publishing on the Web: promises, pitfalls, and payoffs for bioinformatics.
M Gerstein (1999). Bioinformatics 15: 429-31.

Annotation of the human genome.
M Gerstein (2000). Science 288: 1590.

Blurring the boundaries between scientific 'papers' and biological databases
M Gerstein, J Junker (2002). Nature Yearbook of Science and Technology 210-212 (ed. D Butler, Palgrave Macmillan Publishers)

An analysis of the present system of scientific publishing: what's wrong and where to go from here
D Greenbaum, J Lim, M Gerstein (2003). Interdiscip Sci Rev 28:293-302

The Death of the Scientific Paper
M Seringhaus, M Gerstein (2006). The Scientist 20(9): 25.

Open access: taking full advantage of the content.
PE Bourne, JL Fink, M Gerstein (2008). PLoS Comput Biol 4: e1000037.

Reproducible Research: Addressing the need for data and code sharing in computational science
Yale Law School Roundtable on Data and Code Sharing (2010). Computing in Science & Engineering 12(5): 8-13 (Sept/Oct).

Reference Collection #2

Computer security in academia-a potential roadblock to distributed annotation of the human genome.
D Greenbaum, SM Douglas, A Smith, J Lim, M Fischer, M Schultz, M Gerstein (2004). Nat Biotechnol 22: 771-2.

Impediments to database interoperation: legal issues and security concerns.
D Greenbaum, A Smith, M Gerstein (2005). Nucleic Acids Res 33: D3-4.

Network security and data integrity in academia: an assessment and a proposal for large-scale archiving.
A Smith, D Greenbaum, SM Douglas, M Long, M Gerstein (2005). Genome Biol 6: 119.

The real cost of sequencing: scaling computation to keep pace with data generation.
P Muir, S Li, S Lou, D Wang, DJ Spakowicz, L Salichos, J Zhang, GM Weinstock, F Isaacs, J Rozowsky, M Gerstein (2016). Genome Biol 17: 53.

Reference Collection #3

Structured digital abstract makes text mining easy.
M Gerstein, M Seringhaus, S Fields (2007). Nature 447: 142.

Structured digital tables on the Semantic Web: toward a structured digital literature.
KH Cheung, M Samwald, RK Auerbach, MB Gerstein (2010). Mol Syst Biol 6: 403.

Manually structured digital abstracts: a scaffold for automatic text mining.
M Seringhaus, M Gerstein (2008). FEBS Lett 582: 1170.

Seeking a new biology through text mining.
A Rzhetsky, M Seringhaus, M Gerstein (2008). Cell 134: 9-13.

Getting started in text mining: part two.
A Rzhetsky, M Seringhaus, MB Gerstein (2009). PLoS Comput Biol 5: e1000411.

Reference Collection #4

Genomics and Privacy: Implications of the New Reality of Closed Data for the Field
D Greenbaum, A Sboner, X J Mu, M Gerstein (2011). PLoS Comput Biol 7: e1002278

The role of cloud computing in managing the deluge of potentially private genetic data.
D Greenbaum, M Gerstein (2011). Am J Bioeth 11: 39-41.

Proceed with Caution
D Greenbaum, M Gerstein (2013). The Scientist 27:26 (1 Oct.)

General Note on References

I've compiled the above various sub-collections from:

Wednesday, December 21, 2016

Thoughts on Kondo's Life-Changing Magic of Tidying - Well-written work, but for a different group than practical scientists

I read Marie Kondo's book on tidying up (The Life-Changing Magic of Tidying: A Simple, Effective Way to Banish Clutter Forever) with great interest. I am a confessed neat freak and certainly believe in the saying that cleanliness is next to godliness. I read the book looking for tips on how decluttering one's life could improve one's efficiency and overall enjoyment. Unfortunately, the book was a bit of a mismatch for me: it did not provide much practical advice but was rather a set of exhortations urging people to be tidy and telling them the good that will come of it. It does not really get into practical minutiae that I could take home. However, it is lucidly written, and for a different group it would have been interesting.

The Life-Changing Magic of Tidying Up: The Japanese Art of Decluttering and Organizing (8601421528498): Marie Kondo

Saturday, December 10, 2016

Thoughts on Rovelli’s 7 Lessons in Physics: Great intuition on tricky concepts (esp. gravity) - wish it wasn't so brief

I read Carlo Rovelli’s book Seven Lessons in Physics with great interest. I have some physics knowledge myself, but certainly none of the sophisticated particle physics that he talks about, and I found the book gave me good intuitions about how to think about the frontiers of physics. I particularly liked the way he talks about the fundamental differences between the conceptions of general relativity and those of quantum mechanics -- the eternal feud of Einstein and Bohr -- and how they give rise to very different pictures of the universe that to this day have not been fully reconciled. I also liked the way he talked about various attempted reconciliations, such as loop quantum gravity, and the simplifications and assumptions they make. The discussion about heat and the arrow of time was also good. The distinction between the statistical mechanical nature of probability and the quantum mechanical conception of it could have been discussed in a little more detail. I would also have liked a little more detail on many of the complicated modern physical ideas put forth, such as quarks and the SU(5) theory predicting proton decay, but I guess that would be out of the scope of this book. Altogether an enjoyable read. I just wish there had been more than brief lessons.

Carlo Rovelli, Seven Lessons in Physics

Saturday, December 03, 2016

Letter RE "News Outlets Wonder Where the Predictions Went Wrong," NY Times -- More thoughts on how election '16 deflated my belief in the power of mining social-science big data

I read the recent Times article about election predictions going wrong
with great interest. For me the big loser in the election was not
Clinton (or Trump), but data science -- social science data science, in
particular. The reality is that, despite all the wonders we've heard
about how one can do big-data mining on social science data
sets, finding, for instance, highly specific pockets of voters, the
polls were consistently wrong. Consistently, the Times predicted that
Clinton would easily win -- never, in fact, predicting the opposite.
The reality of election day was completely different, with Trump
easily trumping. I do not believe that there was a dramatic reversal
of opinion on the day of the election. The blunt fact is that much of the
earlier polling and associated reporting was inaccurate.

Unpublished letter in response to:

News Outlets Wonder Where the Predictions Went Wrong - The New York Times

Also see:

Saturday, November 05, 2016

Letter RE "My Distraction Sickness, & Yours," NY Mag. -- A post on all the skills becoming obsolete in the iPhone age

Andrew Sullivan's article about decoupling from gadgets and the web
alludes to the perpetual cycle of disruptive innovation followed by
the atrophying of heretofore basic skills. Just as the industrial
revolution resulted in much of the population losing their skills of
sustainable farming, many in the smartphone age are forgetting, or not
even learning, map-reading skills (replaced by Waze and Google Maps),
longhand (replaced by microblogging), or research skills (replaced
by search engines). It’s not all doom and gloom, however:
Matrix-like, the incredibly interconnected world accessed by our
smartphones that beguiled Mr. Sullivan also allows us to effectively
download previously hard-to-master skills by immediately calling up
the relevant apps for stargazing, foreign-language translation,
knot tying, origami, chess, drink mixing and myriad other abilities,
in addition to the actually relevant and actionable health-related

Mark Gerstein
Dov Greenbaum

Unpublished letter in response to:

I Used to Be a Human Being

An endless bombardment of news and gossip and images has rendered us manic information addicts. It broke me. It might break you, too.

By Andrew Sullivan

Also see:

Letter RE "New Corporate Power Brokers: Passive Investors," WSJ

I read the recent article about passive investing and the influence
of index funds on corporate boards with great interest. It's
instructive to think of the extreme situation where index investors
make up a larger and larger share of the overall stock market, yet
the incentives for these investors are very much to minimize their costs,
and not to spend money retaining people to actively investigate the
companies they own. One wonders whether this extreme will lead to appropriate
oversight. It certainly will lead to more management decisions being determined by
an increasingly small fraction of shares owned by active investors.

Unpublished letter in response to:

New Corp. Power Brokers: Passive Investors, WSJ

Also relevant:

New Corporate Power Brokers: Passive Investors
Control from those w. strong incentives to spend less time on