Sunday, November 04, 2012

Experiments in Self Quantification and Anonymization

In this post I thought I'd explore some of the issues with publicly displaying personal health information while simultaneously attempting to analyze one's own biological and medical measurements. Like many others I've become quite interested in what can loosely be described as the "quantified self movement" -- trying to record a lot of measurements about your physical being throughout the day in an attempt to improve your health and just learn interesting things. I have procured a variety of devices such as heart rate monitors, pulse oximeters, and peak flow meters and have tried to use these systematically to record various bits of information.

The question is: what is this useful for? To find out, I have computed a number of correlations -- in particular, correlating my heart rate with my weight as well as with my peak flow. I am interested in posting and sharing this analysis, given how much effort I am putting into it, but I am also very conscious of how private this type of information is. So I have tried to make the data available while also anonymizing it to some degree.

In the following web-accessible chart and tables I have posted the data and some of the correlations, while partially obscuring aspects of the underlying measurements.

https://docs.google.com/spreadsheet/ccc?key=0AnSCkRiRBZ5KdDl6ZUwwR05HMnBKQzBIMnR6cHFxYXc#gid=0

In particular, this is what I have done:

1. For the field called “Date” I have listed the days on which I took these measurements; however, I have shifted them slightly so that the temporal ordering remains the same but you cannot see the exact date on which I performed each measurement.

2. For the fields called “Heart Rate”, “Weight” and “Peak Flow” I have rescaled the values to a population mean. I think you can do this rescaling and still keep the correlation between the quantities, because the (Pearson) correlation is invariant to linear transformations of each variable, as long as the scale factor is positive (a negative factor would flip the sign of r). In particular, I have adjusted to a population mean that would be appropriate for a person of roughly, but not exactly, my height, weight and age. So again you can get a rough sense of the numbers that I have and of how they correlate without knowing my exact weight and peak flow on a particular date.
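
The two steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the procedure actually used for the spreadsheet: the data values, the constant date offset, the target means, and the helper names (pearson, shift_dates, rescale) are all made up for the example. A constant offset is just one simple way to shift dates while keeping their order; it also keeps the gaps between measurements.

```python
from datetime import date, timedelta
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def shift_dates(dates, offset_days):
    """Shift every date by the same (secret) offset; ordering is preserved."""
    return [d + timedelta(days=offset_days) for d in dates]


def rescale(values, target_mean, scale=1.0):
    """Affine rescaling: re-center on target_mean, stretch by a positive scale."""
    if scale <= 0:
        raise ValueError("scale must be positive to preserve the sign of r")
    own_mean = sum(values) / len(values)
    return [target_mean + scale * (v - own_mean) for v in values]


# Made-up illustrative measurements (not real data).
dates = [date(2012, 8, 1), date(2012, 8, 3), date(2012, 8, 7), date(2012, 8, 8)]
heart_rate = [62.0, 70.0, 66.0, 75.0]
peak_flow = [540.0, 580.0, 555.0, 600.0]

# Hypothetical offset and target means, chosen only for this example.
shifted = shift_dates(dates, offset_days=11)
hr_pub = rescale(heart_rate, target_mean=72.0, scale=0.9)
pf_pub = rescale(peak_flow, target_mean=600.0, scale=1.1)

r_raw = pearson(heart_rate, peak_flow)
r_pub = pearson(hr_pub, pf_pub)
print(f"r before rescaling: {r_raw:.4f}")
print(f"r after rescaling:  {r_pub:.4f}")
```

The two printed values should agree up to floating-point rounding, which is the invariance property that item 2 relies on.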

At the end of the day, here's what you get for the correlations:
r(HR vs. PF) = 0.47
r(HR vs. WGT) = 0.22
So it appears that heart rate is somewhat more strongly correlated with peak flow than it is with weight.
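
One rough way to compare the two numbers is to square them: under a simple linear model, r squared is the fraction of variance in one quantity that is accounted for by the other. The short sketch below just does that arithmetic on the two values reported above; the variable names are mine, and reading r squared this way is an interpretive assumption, not something the spreadsheet computes.

```python
# Reported correlations from the post.
r_hr_pf = 0.47   # heart rate vs. peak flow
r_hr_wgt = 0.22  # heart rate vs. weight

# Squared correlation: share of variance explained under a linear fit.
shared_hr_pf = r_hr_pf ** 2
shared_hr_wgt = r_hr_wgt ** 2

print(f"HR vs. PF:  r^2 = {shared_hr_pf:.3f}")
print(f"HR vs. WGT: r^2 = {shared_hr_wgt:.3f}")
```

So, with the caveat that this is a small personal data set, peak flow accounts for roughly 22% of the variation in heart rate and weight for roughly 5%.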

I am curious to hear whether people think this type of data posting is at all useful, and whether they can glean more about me from this information than I think they can. In any case, I would welcome feedback.



Some useful links:
http://www.livestrong.com/article/344100-the-maximum-heart-rate-in-a-45-year-old/
http://www.medicinenet.com/exercise/page4.htm
http://www.users.globalnet.co.uk/~aair/asthma_PEF.htm

[My underlying files are gsheet "Merge of Health and Exercise Data" & big-merge-of-health-n-exercise-data-reworked-25aug12.xls]