Friday, January 20, 2017

My response to the recent NIH data-sharing RFI: Consider the dynamic between DBs & journals, the cost of datasets, privacy, &c

I was very interested in the recent NIH data-sharing RFI. In the past I have written a number of pieces on the subject; below I summarize my response and list the relevant references.

My Response

(1) The dynamic between databases and journals, and between traditional reading and other forms of access, should be considered (Reference Collection #1).
(2) There is a substantial cost in maintaining large data sets, both in keeping up internet infrastructure (i.e., security) and in handling the exponential scaling of data size and compute needs (ref. #2).
(3) The current journal publishing system should be updated to allow for computer parsing of papers, to adopt machine-readable standards, and to make the journal article more like a "mineable dataset" (ref. #3).
(4) Sharing private patient data is problematic; solutions may lie in the framework of a central NIH-sponsored resource and in specialized data standards (ref. #4).

