Monday, August 12, 2013

Big data!

Trek Domane getting prepped for a washing.
Last week I attended the Joint Statistical Meetings (JSM) in Montreal. This is THE gathering of statisticians from around the world. “Big Data” was all the rage at the conference. What is big data, you ask? Think of it in the context of credit card transactions. Every swipe of a credit card generates at least one record in a database (envision a sheet of paper on a desk). Now consider all of the credit card transactions that occur every second around the world. That is a tremendous amount of data (envision the desk now with paper stacked to the heavens). What is more remarkable is that your credit card transactions are processed in near real time. If you appear to travel too far too fast, your card might get declined for fraud protection. We have had this happen when buying something online from a merchant in the far reaches of the world while, thanks to the modern sidekick called a smartphone, swiping the same card at a local store at the same time.
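To make the "too far too fast" idea concrete, here is a minimal sketch of that kind of check. It is not how any card issuer actually does it; the speed threshold and the function names are assumptions made up for illustration. It computes the great-circle distance between two transactions and flags the pair if the implied travel speed is implausible.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0
MAX_PLAUSIBLE_KMH = 1000.0  # assumed threshold, roughly airliner speed


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def looks_impossible(prev, curr):
    """prev and curr are (lat, lon, unix_seconds) tuples for two transactions."""
    distance = haversine_km(prev[0], prev[1], curr[0], curr[1])
    hours = abs(curr[2] - prev[2]) / 3600.0
    if hours == 0:
        # Same instant: any distance at all is suspicious.
        return distance > 0
    return distance / hours > MAX_PLAUSIBLE_KMH


# A swipe in Montreal followed ten minutes later by one in Tokyo.
montreal = (45.50, -73.57, 0)
tokyo = (35.68, 139.69, 600)
print(looks_impossible(montreal, tokyo))
```

A real system would weigh many more signals than distance and time, but the basic arithmetic is this simple.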

What does this have to do with cycling? 
Believe it or not, you can generate “big data” while biking. In fact, modern cycling computers such as the Garmin Edge 510 generate data files that sample your biking performance once per second. You can collect your precise location via GPS, along with heart rate, cadence, altitude, temperature and of course speed. Some cyclists now use power meters to quantify how much power goes into each pedal stroke. This data too can be incorporated into the data stream captured by the cycling computer. Perhaps even more remarkably, this data can be uploaded automatically to the web and viewed in real time by people around the world!
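As a rough sketch of what one of those 1-second records might look like, and how quickly they add up, consider the following. The field names and the `Sample` class are illustrative assumptions, not the actual Garmin file format.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Sample:
    """One hypothetical 1-second reading from a cycling computer."""
    elapsed_s: int
    lat: float
    lon: float
    altitude_m: float
    speed_kmh: float
    heart_rate_bpm: Optional[int] = None
    cadence_rpm: Optional[int] = None
    temperature_c: Optional[float] = None
    power_w: Optional[int] = None  # only present with a power meter


def samples_per_ride(duration_hours, sample_interval_s=1):
    """Number of records produced by a ride of the given length."""
    return int(duration_hours * 3600 / sample_interval_s)


def values_per_ride(duration_hours):
    """Total individual measurements, counting every field in every record."""
    return samples_per_ride(duration_hours) * len(fields(Sample))


print(samples_per_ride(2))  # records in a two-hour ride
print(values_per_ride(2))   # individual values in that ride
```

Multiply a single ride by a year of daily riding and the pile of paper on the desk grows quickly.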

No self-respecting statistician would stop with the canned summaries made available through default settings on websites. However, how does one begin to move the data around in cyberspace for further analysis? How does one synchronize the data across multiple apps? There are many ways of doing this, some of which involve manual data manipulation (in other words, time). Since cycling is a hobby for me and my many day jobs keep me busy, I’ve put together a rather elaborate series of connected apps to make the “big data” useful in a practical sense.

My workflow is this: I use a Garmin Edge 510 to capture the raw data generated by my bike rides. The Garmin talks to my iPhone via Bluetooth. At the end of the ride, the Garmin Connect app is used to automatically upload the data to Garmin Connect, which is the repository for all of your cardiovascular-based fitness activities such as running, cycling, swimming, etc. Garmin Connect, while very nice, doesn’t have all of the features I like to see in a website for fitness data.

One thing I like to do is easily compare repeated performances over the same route. Strava is the perfect tool for this. You can see how your own performance changes over time through the use of “segments”. Segments are essentially GPS start and stop coordinates with a defined path connecting them. They create virtual races for people. So while you can compare your own performance over time, you can also see how you stack up against others riding the same segment. Cyclists seem to have a love-hate relationship with Strava as a result. I find it fun and an interesting way to stay engaged in riding my bike. Now, I could manually upload all my rides to Strava, but there is no fun in that. There is a tool, garminsync.com, which does this automatically for you. That is my kind of tool.

My fitness fun doesn’t stop there.
I use a FitBit One to record movements throughout the day to better gauge energy expenditure. Since I ride my bike virtually every day of the year, I don’t want to have to manually upload these rides to FitBit to get credit for the calories burned. After all, I’m only really measuring calories burned to quantify how many calories I still have to eat. As you may suspect, there is another web app called fitdatasync that will automatically move my Garmin Connect activities to FitBit.

With the combination of Garmin Connect and FitBit, I have calories burned pretty much quantified in effectively real time. What I now need is to measure calories in. I use the app MyFitnessPal to record food eaten. This is a remarkable tool that has virtually any food you might eat already in its database. You can even use your phone to scan the barcode on the package and it looks up the dietary information for you. Big Data at work again! The app works by recording your calories in and using the calories burned to quantify how many calories you can still eat, which is the basis of energy balance.
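The energy balance arithmetic can be sketched in a few lines. This is a simplified illustration, not MyFitnessPal's actual calculation; the function name and the optional `target_deficit` parameter are assumptions added for the example.

```python
def calories_remaining(burned, eaten, target_deficit=0):
    """Calories still available to eat today.

    burned: total energy expenditure so far (e.g. the FitBit estimate)
    eaten: calories logged from food
    target_deficit: assumed optional shortfall to aim for when losing weight
    """
    return burned - eaten - target_deficit


# 2600 burned and 1900 eaten leaves 700 to eat at maintenance.
print(calories_remaining(2600, 1900))
# Aiming for a 500-calorie deficit leaves only 200.
print(calories_remaining(2600, 1900, target_deficit=500))
```

A negative result simply means more has been eaten than the day's expenditure allows.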

To do this, MyFitnessPal requires an estimate of energy expenditure, which is currently consolidated in the FitBit account. Fortunately, FitBit and MyFitnessPal can be linked so that they share data. Therefore, my calories burned are linked with my calories in and I can better control what I eat in a day. However, all of these calculations are heavily influenced by weight. Consistent with my intention to not lift a finger while managing the data integration, I purchased a FitBit Aria scale. This scale wirelessly uploads to my FitBit account. Fitdatasync syncs weight to Garmin Connect, so the exercise calorie estimates in Garmin Connect are more accurate. This is helpful since the calories burned during exercise get ported back to FitBit and ultimately MyFitnessPal.

So just think, one simple bike ride generates data that moves through half a dozen websites and applications, all to produce a summary that I base decisions on. This is the essence of what Big Data is about.


