(1) The dynamics between databases and journals, and between traditional reading and other forms of access, should be considered (Reference Collection #1).
(2) Maintaining large data sets carries a substantial cost, both in keeping up internet infrastructure (i.e., security) and in the exponential scaling of data size and compute needs (Reference Collection #2).
(3) The current journal publishing system should be updated to allow computer parsing of papers, to adopt machine-readable standards, and to make the journal article more like a "mineable dataset" (Reference Collection #3).
(4) Sharing private patient data is problematic; solutions may lie in the framework of a central NIH-sponsored resource and in specialized data standards (Reference Collection #4).
Reference Collection #1
E-publishing on the Web: promises, pitfalls, and payoffs for bioinformatics.
M Gerstein (1999). Bioinformatics 15: 429-31.
Annotation of the human genome.
M Gerstein (2000). Science 288: 1590.
Blurring the boundaries between scientific 'papers' and biological databases.
M Gerstein, J Junker (2002). Nature Yearbook of Science and Technology 210-212 (ed. D Butler, Palgrave Macmillan Publishers).
An analysis of the present system of scientific publishing: what's wrong and where to go from here.
D Greenbaum, J Lim, M Gerstein (2003). Interdiscip Sci Rev 28: 293-302.
The Death of the Scientific Paper.
M Seringhaus, M Gerstein (2006). The Scientist 20(9): 25.
Open access: taking full advantage of the content.
PE Bourne, JL Fink, M Gerstein (2008). PLoS Comput Biol 4: e1000037.
Reproducible Research: Addressing the need for data and code sharing in computational science.
Yale Law School Roundtable on Data and Code Sharing (2010). Computing in Science & Engineering 12(5): 8-13 (Sept/Oct).
Reference Collection #2
Computer security in academia - a potential roadblock to distributed annotation of the human genome.
D Greenbaum, SM Douglas, A Smith, J Lim, M Fischer, M Schultz, M Gerstein (2004). Nat Biotechnol 22: 771-2.
Impediments to database interoperation: legal issues and security concerns.
D Greenbaum, A Smith, M Gerstein (2005). Nucleic Acids Res 33: D3-4.
Network security and data integrity in academia: an assessment and a proposal for large-scale archiving.
A Smith, D Greenbaum, SM Douglas, M Long, M Gerstein (2005). Genome Biol 6: 119.
The real cost of sequencing: scaling computation to keep pace with data generation.
P Muir, S Li, S Lou, D Wang, DJ Spakowicz, L Salichos, J Zhang, GM Weinstock, F Isaacs, J Rozowsky, M Gerstein (2016). Genome Biol 17: 53.
Reference Collection #3
Structured digital abstract makes text mining easy.
M Gerstein, M Seringhaus, S Fields (2007). Nature 447: 142.
Structured digital tables on the Semantic Web: toward a structured digital literature.
KH Cheung, M Samwald, RK Auerbach, MB Gerstein (2010). Mol Syst Biol 6: 403.
Manually structured digital abstracts: a scaffold for automatic text mining.
M Seringhaus, M Gerstein (2008). FEBS Lett 582: 1170.
Seeking a new biology through text mining.
A Rzhetsky, M Seringhaus, M Gerstein (2008). Cell 134: 9-13.
Getting started in text mining: part two.
A Rzhetsky, M Seringhaus, MB Gerstein (2009). PLoS Comput Biol 5: e1000411.
Reference Collection #4
Genomics and Privacy: Implications of the New Reality of Closed Data for the Field.
D Greenbaum, A Sboner, XJ Mu, M Gerstein (2011). PLoS Comput Biol 7: e1002278.
The role of cloud computing in managing the deluge of potentially private genetic data.
D Greenbaum, M Gerstein (2011). Am J Bioeth 11: 39-41.
Proceed with Caution.
D Greenbaum, M Gerstein (2013). The Scientist 27: 26 (1 Oct).
General Note on References
I've compiled the sub-collections above from: