Analyzing Time-Series of Individual Data
If you haven’t been living in a cave for the last couple of years, you have definitely noticed an increase in data collection, data mining and visualization: HRV tracking, jump output tracking, estimating 1RMs from velocity-load data, game statistics, performance analysis, various testing statistics, body weight, Run Keeper, Run Tracker, and the whole quantified-self movement.
Collecting data is getting easier and easier, sometimes without one even being aware of it. What is still lagging behind is making sense of all that data. For example, you might have been collecting HRV or resting HR every morning for the last couple of months, or even better, training load using session RPE and duration. How do you analyze this? How do you visualize this data? How do you make sense of it? How much does a certain statistic need to drop to represent a worthwhile change with a real-world effect?
Unfortunately, the statistics we learned in school don’t help us much. There is too much reliance on the Fisherian approach (using the p value) and too much use of statistical significance, which doesn’t mean much to a coach. Even worse, lay people with no formal education in inferential statistics misinterpret the term statistical significance as real-world significance, when it actually means a low chance [p<0.05, p<0.01, p<0.001 etc.] of obtaining a score at least that extreme if the null hypothesis is true. If this sounds confusing, it is, and unfortunately, according to Geoff Cumming (author of the excellent book Understanding the New Statistics), even researchers don’t get these concepts right.
If you are interested in these subjects you should definitely read everything ever written by Will Hopkins. As a quick start, here are the presentations one needs to read to understand the important concepts of magnitude-based statistics, SWC (Smallest Worthwhile Change) and TE (Typical Error):
How to Interpret Changes in an Athletic Performance Test [Very Important read]
Client Assessment and Other New Uses of Reliability [Very Important read]
A couple of great researchers, like Martin Buchheit (@mart1buch), are pushing the envelope in using magnitude-based statistics (SWC, TE and chances), but as far as I know a lot of journal editors are still reluctant to let go of the p value.
The Dance of p values
Anyway, as coaches we are not interested in group averages and making inferences to a population (at least we shouldn’t be, unless we are thinking about a research career). We are interested in individual response, and unfortunately there has been a lot of flawed thinking over the years: falling for the flaw of averages and assuming that all individuals will respond in a similar and predictable way. Welcome to biological complexity.
Presentation slides from WindSprint 2013
Luckily, a lot more studies now lean toward showing inter-individual variability, quantifying it and visualizing it, instead of worrying only about the group averages and whether the treatments produce a statistically significant effect.
What we need to do is start thinking in terms of individuals and their unique reactions. All training is a single-subject experiment, even if you work in team sports (a bit harder to implement, but still very important).
Taisuke Kinugasa (@umekinu) is one of the few researchers focusing on single-case research design and analysis of single-subject time series. If you are wondering what single-subject time series are, they are all the data you collect on yourself (quantified self), like HRV.
Speaking of HRV, recent papers coauthored by Martin Buchheit and other great researchers brought to light some very applicable tips that coaches can use on a daily basis. Part of that applicability comes from using SWC and TE (progressive statistics, magnitude-based approach) and single-case design (in some papers).
Evaluating Training Adaptation with Heart Rate Measures: A Methodological Comparison. Int J Sports Physiol Perform. 2013
Heart rate variability in elite triathletes, is variation in variability the key to effective training? A case comparison. Eur J Appl Physiol. 2012 Nov;112(11):3729-41
Training Adaptation and Heart Rate Variability in Elite Endurance Athletes: Opening the Door to Effective Monitoring. Sports Med. 2013
Cardiac Parasympathetic Reactivation Following Exercise: Implications for Training Prescription. Sports Med
What they showed is that using either weekly averages or 7-day rolling averages “appears to be superior method for evaluating positive adaption to training compared with assessing its value on a single isolated day”.
I have written about rolling averages and Z-scores in evaluating wellness data HERE, so I won’t go into too much detail.
Another interesting approach was to estimate a BASELINE for each athlete and the SWC of that baseline. The researchers did this by taking the first two weeks of the intervention as the baseline. This baseline and its SWC (usually 0.3 to 0.5 of the intra-individual SD) are then used to give ‘context’ to the 7-day rolling averages.
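The baseline idea above can be sketched in a few lines of Python. This is only a minimal sketch under my own assumptions: the function names, the 14-day baseline window, the 0.5 multiplier default and the HRV numbers are illustrative, not taken from the papers.

```python
from statistics import mean, stdev


def baseline_and_swc(scores, baseline_days=14, swc_factor=0.5):
    """Return (baseline mean, baseline SD, SWC) from the first `baseline_days` scores.

    Assumption: SWC = swc_factor x intra-individual SD of the baseline period.
    """
    baseline = scores[:baseline_days]
    if len(baseline) < 2:
        raise ValueError("need at least two baseline scores")
    base_mean = mean(baseline)
    base_sd = stdev(baseline)  # sample SD of the athlete's own scores
    return base_mean, base_sd, swc_factor * base_sd


def rolling_mean(scores, window=7):
    """Rolling mean of the last `window` scores; None until the window is full."""
    out = []
    for i in range(len(scores)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(mean(scores[i + 1 - window:i + 1]))
    return out


# Made-up morning HRV scores, for illustration only.
scores = [80, 82, 79, 81, 83, 80, 78, 82, 81, 79, 80, 83, 81, 80,
          77, 76, 78, 75, 77, 76, 74]

base_mean, base_sd, swc = baseline_and_swc(scores)
for day, avg in enumerate(rolling_mean(scores), start=1):
    if avg is None or day <= 14:
        continue  # only put post-baseline rolling averages in 'context'
    if avg > base_mean + swc:
        zone = "above baseline + SWC"
    elif avg < base_mean - swc:
        zone = "below baseline - SWC"
    else:
        zone = "within SWC of baseline"
    print(f"day {day}: rolling avg {avg:.1f} -> {zone}")
```

Whether to use 0.3 or 0.5 of the SD, and how long the baseline should be, remains a judgment call.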
Sometimes this approach is used in sports with a certain period of the year taken as the baseline. Another option is to make the baseline a ‘rolling’ average as well, which might cover a longer time frame than the 7-day rolling average. Again, there are pros and cons to each approach, and analyzing time series is more an art than a science. I am not sure there is one right way to go about it.
The idea is to get the baseline and SWC, and then to use rolling averages and TE (it is beyond me how this is calculated, other than by using the rolling 7-day SD) to get the chances of beneficial/trivial/harmful changes (see the links above from Will Hopkins).
The simplest approach might be to use the percent change between the last score and the rolling average (or a longer baseline). Unfortunately, this approach doesn’t take individual variability into consideration (see more HERE).
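To see why percent change alone can mislead, here is a tiny Python sketch. The two athletes and their numbers are hypothetical, invented only to illustrate the point.

```python
def percent_change(last_score, reference):
    """Percent difference between the last score and a reference (e.g. rolling average)."""
    return 100.0 * (last_score - reference) / reference


# Two hypothetical athletes with the same rolling average and the same last score,
# but very different day-to-day variability (rolling SD).
athletes = {
    "stable athlete": {"rolling_avg": 80.0, "rolling_sd": 2.0, "last": 76.0},
    "variable athlete": {"rolling_avg": 80.0, "rolling_sd": 8.0, "last": 76.0},
}

for name, a in athletes.items():
    change = percent_change(a["last"], a["rolling_avg"])
    sds_away = (a["last"] - a["rolling_avg"]) / a["rolling_sd"]
    print(f"{name}: {change:.1f}% change, {sds_away:.1f} SDs from rolling average")
```

Both athletes show the same percent drop, yet for the stable athlete the score sits much further outside normal variability than it does for the variable one.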
Another approach that takes this into account is to compute a daily Z-Score, which is the number of rolling 7-day SDs by which the last score differs from the rolling average [Z-Score = (Last_Score – Rolling_AVG) / Rolling_SD]. I believe that this is the approach behind the iThlete HRV coding system. If you are outside your normal variability, you get a flag.
What we want to achieve with all these approaches is ‘flags’: what is a normal score and what is abnormal. Again, this is more art than science, but I believe the right analysis is a must; one just needs to put it in the right context.
Long story short, I have created an Excel workbook that analyzes time series using some of the approaches above. I want to thank Andrew Flatt (@andrew_flatt) for providing me with his HRV data and Andrew Murray (not the tennis player - @cudgie) for giving me the idea of using effect sizes for comparing the baseline and rolling average (same as the daily Z-Score).
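A rough Python sketch of the daily Z-Score follows. Two things here are my assumptions rather than anything confirmed: the rolling window is taken as the 7 days before the last score (so the last score is not part of its own average), and the flag threshold of 1 SD is a placeholder. This is not a description of how any particular app does it.

```python
from statistics import mean, stdev


def daily_z_scores(scores, window=7):
    """Z-Score = (Last_Score - Rolling_AVG) / Rolling_SD for each day.

    Assumption: the rolling average and SD use the `window` days BEFORE the
    current score. Days without a full window (or with zero SD) get None.
    """
    z = []
    for i, score in enumerate(scores):
        if i < window:
            z.append(None)
            continue
        previous = scores[i - window:i]
        sd = stdev(previous)
        z.append((score - mean(previous)) / sd if sd > 0 else None)
    return z


def flags(z_scores, threshold=1.0):
    """True when a score is more than `threshold` rolling SDs from the rolling average.

    The threshold is a hypothetical default; pick one that fits your context.
    """
    return [z is not None and abs(z) > threshold for z in z_scores]


# Made-up scores, for illustration only.
scores = [80, 82, 79, 81, 83, 80, 78, 82, 74, 81]
for day, (z, flagged) in enumerate(zip(daily_z_scores(scores), flags(daily_z_scores(scores))), start=1):
    if z is not None:
        print(f"day {day}: z = {z:+.2f}{'  <- flag' if flagged else ''}")
```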
Here is the video of me demonstrating the software and below you can find a link for downloading the Excel workbook.
Click HERE to download Excel workbook
Great work once again Mladen!
I have a couple of thoughts. I've been using the SWC myself with HRV data for a little while now and find it a great tool. But I'm still not 100% sure whether this can really be applied to time-series monitoring data, as the SWC statistic is meant for changes in athletic performance. I would be interested to hear your thoughts.
Secondly, with regard to applying meaningful inferences to this data (beneficial/trivial/harmful), this is something I have been trying to do myself. The equation used to do this in Excel is very complicated but involves:
TDIST((SWC-Δdata)/SE, df, tails)
To my knowledge, as this involves degrees of freedom, it would be impossible to apply this when N=1.
Let me know if you come up with a solution
Keep up the great work!
John
Hi John,
Thanks for the feedback.
I have/had the same thoughts/questions as you. I believe SWC is a 'concept' that could be applied everywhere. When applied to team testing, SWC is 0.2 x SD between individuals. When applied to a single athlete over multiple tests (or competitions), SWC is 0.3 to 0.5 of the individual SD. I pointed this out in the video, and like you I am not sure exactly how to calculate it for time series.
In the papers on HRV, they used 0.5 x SD of the baseline (first two weeks). So I guess the concept is still very valid, but the calculation of it might be more of an art and maybe even a coach's estimate. (Bayesian approach anyone? :) )
As for the other question - no need for the T distribution, since you are not making an inference to a population from group scores/averages. Use the normal distribution IMHO. But here is the "problem" - if
SWC = 0.3 x Baseline SD
TE = 7-daysRollingAvg SD
Then the 90% CI for this TE is ±1.645 x TE (if I remember correctly), and that is basically the 90% spread of the scores in the 7-day rolling average.
Using TE this way and calculating beneficial/trivial/harmful gives you the percentages of data points in the rolling average that fall in the beneficial/trivial/harmful zones based on the baseline SWC.
Thus, if your 'chance' of harmful is more than x%, then flag it. Same for beneficial.
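A minimal Python sketch of that idea, using the normal distribution instead of the T distribution. The function and parameter names are made up for illustration, and treating "higher than baseline" as beneficial is an assumption that depends on the metric (fine for HRV perhaps, the opposite for resting HR).

```python
from statistics import NormalDist


def chances(rolling_avg, te, baseline, swc, higher_is_better=True):
    """Chances that a score falls in the beneficial/trivial/harmful zones.

    Assumptions: scores are normally distributed around the rolling average
    with SD = TE (here the 7-day rolling SD, which must be > 0), and the zones
    are defined by the baseline +/- SWC.
    """
    dist = NormalDist(mu=rolling_avg, sigma=te)
    above = 1.0 - dist.cdf(baseline + swc)  # share of scores above baseline + SWC
    below = dist.cdf(baseline - swc)        # share of scores below baseline - SWC
    trivial = 1.0 - above - below
    beneficial, harmful = (above, below) if higher_is_better else (below, above)
    return {"beneficial": beneficial, "trivial": trivial, "harmful": harmful}


def is_flagged(chance, x):
    """Flag when a chance (0 to 1) exceeds the chosen threshold x (also 0 to 1)."""
    return chance > x


# Made-up numbers: baseline 80 with SD 4 (so SWC = 0.3 x 4 = 1.2),
# current 7-day rolling average 77 with rolling SD (TE) of 3.
result = chances(rolling_avg=77.0, te=3.0, baseline=80.0, swc=0.3 * 4.0)
for zone, p in result.items():
    print(f"{zone}: {100 * p:.0f}%")
```

What x should be is still up to the coach; the sketch deliberately leaves it as a parameter.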
I might do another work-book and play with this concept.
Hope this helps