Tuesday, November 26, 2013

No-Holds-Barred Interview with Dan Baker 2013




It has been more than two years since I interviewed Dan Baker (click HERE for the interview). That interview was one of my favorites among those I have done and read, if not the very best, mainly due to Dan's honest, extensive and personal answers. I said back then that Dan is my role model, and he still is – now more than ever.

I decided to interview Dan again, and I hope this becomes a regular practice (every 2-3 years :)). I am more than thankful to Dan for his time and goodwill to answer all of my questions extensively, honestly, personally and insightfully, as he always does.

The goal of this interview was to cut the crap with all that monitoring mumbo-jumbo and get back to reality ~ to the building of a performance culture, winning habits and hard/smart work.  I hope that both Dan and I managed to convey this message.

Enjoy this very insightful interview from one of the leaders in strength and conditioning.


The man...the legend... Dan Baker



Mladen: It has been a while since our last interview and I am sure a lot of things have changed. There is no need to introduce yourself, because I am pretty sure all of my readers know very well who you are. What has changed in the last two years? Give us an update, Dan.

Dan:  Hi Mladen, thanks for the opportunity again.  The biggest change, I suppose, is the fact that I no longer work for the Brisbane Broncos NRL club.  I worked for them from Sept. 1995 to November 2013.  It is a pretty full-on life at elite pro level, there is continual stress and performance scrutiny.  Now I can relax a little more and focus on coach education with ASCA courses and workshops, some occasional lecturing at Edith Cowan University and other private things that I do and plan to do.  So anything I talk about below is what took place in the last few years or from other times when I was there or from my other coaching experiences before the Broncos ~ it may not be happening at the Broncos now or in the future.


Mladen: My questions in this interview will not involve many “sets & reps” questions, but rather some other profound things that need to be addressed but always lack focus. In my opinion that is the team/club CULTURE and everything that goes with it and surrounds it (goal setting, communication, trust, responsibility, commitment, accountability, motivation and conflict management). What are your thoughts about it?

Dan:  Culture is set by previous traditions and practices at a club and how the current coach (& staff) and senior players see those traditions and practices in relation to the present.  If there has been a culture of winning by training hard, not complaining, doing what needs to be done etc, then generally new, younger players will buy into it.  If a previous culture was not to train too hard, lack discipline, aim for the occasional good performance, then will it (this culture) take you to the big games? 

If your senior players set the tone, then who are the younger, less accomplished players to reject it?  So the senior players and coach need to come up with what they want from the team, then have overall meetings so that the larger playing group feel included about what the team culture is in regards to all these things. 

From my experience, me complaining to a player about his non-adherence to anything (eg. diet, hydration, fitness levels, adherence to rehab protocols etc) is much less effective than if a senior player says to him “Can you get your act together over this.  We need you in good shape but you are not displaying to us that you are as committed as the rest of us.  It is like you are saying that your comfort levels are more important than the team’s success”.  This has a huge effect.

At the Broncos, we always had good senior players who set good examples with regards to training and discipline.  I believe the problem for all sports/teams now is “Do archetypal Gen Y want to give up something (effort/comfort) for the good of the team?” Many people from many sports are questioning whether they do, so that will be the problem facing coaches in the immediate future, with regards to adherence to team culture.  A sense of entitlement exists with many younger athletes.  But yet again, some younger athletes are magnificent. 
So culture is sometimes “Monkey see, monkey do”.  

Despite all of the above, we still had financial fines at the Broncos for small breaches of discipline (being 1-minute late for training, not taking your protein or vitamins etc) but only young players ever fell foul.  But we should not have needed them, if younger players understood that PROFESSIONALISM IS NOT ABOUT A PAY PACKET, IT IS ABOUT A STATE OF MIND. 

Shannon Turley described it at the NSCA 2013 conference as the difference between a professional (does things necessary to do the basics of their job) and a technician (does things to do their job as perfectly as possible).

I am not sure if financial fines allow us to see if a player is a technician, professional, semi-pro or amateur in their mindset.  I never gave a player a fine.

Motivation is easy in one sense, it is about goal setting and reinforcement, but it is also difficult.  It is really behaviour modification we are often concerned with ~ modifying the behavior of amateurs into semi-pros into professionals and then into technicians.  Making behaviours equate to the goals is difficult.  Everyone has the goal of winning the championship but are their behaviours (eg. eating junk food every day) going to get them there?


The Professional vs The Technician. See more by Coach Turley HERE



Mladen: It seems that monitoring and technology are on the rise lately.  What are your thoughts on monitoring sRPE (session Rating of Perceived Exertion) and other subjective indicators, like Wellness Questionnaires, POMS and others? How can one use them in real life? How can you modify training loads without being manipulated by players who lack commitment? How do you avoid boredom and lack of trust in those measures? What should happen if an athlete reports feeling tired? What is the right message to send to the players?

Dan:  Subjective measures like RPE, Wellness etc rely on trust and education. I have seen RPE’s abused by athletes who gave false scores, looking to get an unload session or unload week.  I think all monitoring needs to take into account some objective data (total meters, GPS, pitches thrown, contacts in jumps, HR impulse data etc ~ whatever is appropriate to the sport/athlete) and some subjective data (RPE, feeling).  Take a look at the photo of the poster that used to hang in our training center.  It gives a pretty clear message – you have a certain responsibility to do things to aid your recovery.  If you are not doing them, why would we change our program to offset your lack of being a Technician, your lack of respect for the efforts of your team-mates?  But if you are doing everything right and are not recovering, then we will do something (unload you etc).

Boredom?  I have squatted just about every week of my life for 30+ years and I am not bored with it.  Training is always a challenge.  In saying that, sure things are manipulated to avoid early adaptation.  Subtle changes, dramatic changes, loading weeks, unloading weeks, strain, monotony etc.  But the reality is, athletes still need to train hard and get uncomfortable on a regular basis.  If they don’t want to do that, they need to take a big long hard look in the mirror ~ they no longer possess what is necessary to improve.
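For readers who have not worked with the terms mentioned above (RPE, strain, monotony), here is a minimal R sketch of how session-RPE load is commonly computed: session load is the RPE score multiplied by session duration, monotony is the mean daily load divided by its standard deviation, and strain is weekly load multiplied by monotony. All numbers and variable names below are made up for illustration; this is not the Broncos' system or data.

```r
# Illustrative sketch only: the numbers below are hypothetical, not real team data.
# Session load (arbitrary units) = session RPE (0-10 scale) x duration (min).
week <- data.frame(
  day         = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"),
  rpe         = c(7, 5, 8, 3, 9, 0, 0),      # hypothetical session RPE scores
  durationMin = c(60, 45, 75, 30, 80, 0, 0)  # hypothetical durations; 0 = day off
)

week$load <- week$rpe * week$durationMin

weeklyLoad <- sum(week$load)                   # total load for the week
monotony   <- mean(week$load) / sd(week$load)  # little day-to-day variation -> high monotony
strain     <- weeklyLoad * monotony            # big and monotonous weeks -> high strain

round(c(weeklyLoad = weeklyLoad, monotony = monotony, strain = strain), 2)
```

As Dan points out, these numbers are only as honest as the RPE scores athletes report, so they are best read alongside objective data.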


Figure 1. So the message is DO YOUR JOB AS A TECHNICIAN ATHLETE FIRST!! 



Mladen:  How do you juggle the different “functional” groups in the squad? By functional I mean the starting line-up, traveling guys, reserves and injured guys. Do injured guys train harder than the rest? How do you deal with reserves – do they train harder/more than the starters, and how do you keep up their motivation to do so? Were there any squad rotation rules, so the reserves knew they had a chance to start if they worked hard(er)?

Dan:  I will talk about in-season stuff here, as that is what I think you are alluding to.  The NRL is different to soccer in that all clubs must name their teams by 12 pm Wednesday each week for the weekend’s games, there is no cat-and-mouse about the starting line-up till 2-hrs beforehand like in soccer.  We would typically play on a Friday night, so we would know most of our team by Tuesday even (sure, sometimes you are waiting on blokes’ injuries to heal or not, but the majority of the team is known).  So those not in the NRL team play State League (SRL), on a Saturday or Sunday.  So we would have three groups – NRL players, State League players and injured players for that week.  The injured players follow the same schedule as the NRL players, the difference being that on game day they had their hardest energy system training session of the week (typically a cross-training session if they are injured and incapable of running).  Injured players don’t travel, they stay home and train with me.  The State League players’ schedule is typically 1 or 2 days off-set from NRL.  So if NRL lift on Monday and Wednesday, the SRL lift on Tuesday and Thursday.  So players follow the schedule based upon where they played the previous weekend (NRL or SRL) to start the week and by Tuesday or Wednesday at the latest they follow the schedule for their group for the following weekend.  Pretty straight-forward, the only glitch is when we are waiting on an NRL player to recover from injury: if he doesn’t make it, the SRL player gets called up to NRL late and usually misses one lifting session.

Injured blokes get trained brutally hard.  It is an opportunity to improve energy system fitness without having to save their bodies for the intense blunt force trauma that rugby league contains.  We call the injured players “Rehab” and we would often play or sing the song “Rehab” ~ “they are trying to make me go to rehab, I say No, No, No”.  No one ever faked an injury because the option was, for example, light skill and tactical session with the team under the head coach or get mercilessly trained by myself and the other S & C staff in our cross-training shed (the Shed of Dread) looking at the rest of the team having fun doing ball-work. 


Figure 2.  The Shed of Dread for the cross-training of injured players who can’t run, tackle or wrestle.  Contains Concept 2 rowers, Concept wall mounted ski ergos, Watt bikes, spin bikes, boxing gear, battling ropes and more.  The poster on the wall, partially obscured by the glare, says “We are only as strong as our weakest”  You can’t escape from energy system conditioning, even if injured!



If anything players try to fake recovery from injury to stop being in Rehab!

So Rehab group did their strength work the same (as best they could, depending on their injury) but every time the team did skill/tactics, they would do cross-training for energy system fitness (http://www.danbakerstrength.com/free-articles/some-cross-training-workouts-to-enhance-your-energy-system-fitness/).  They knew that by doing this they will easily be able to fulfill the game fitness requirements once they return from injury.  Also reserves/bench players with limited game time get a top-up to their fitness (say 1-2 sets of 4-8 mins) of some of the drills I detailed here (http://www.danbakerstrength.com/free-articles/recent-trends-in-high-intensity-aerobic-training/), typically at the completion of the training sessions on a Monday and/or Tuesday.


"If anything players try to fake recovery from injury to stop being in Rehab!" ... With "Rehab" like this players might think twice before lying on being tired. Interesting strategy!


Mladen: How is training adapted to older players? What do they do more and what do they do less? How is this coordinated with the head coach? When the new guys arrive, how do you introduce them to the training system?

Dan:  Most of our older blokes don’t want to do anything different to the rest of the team.  We have tried in the past to protect some older players from high training loads, but the NRL is super-tough and the blokes playing it are mentally tough as well.  They don’t want easy or soft options “Soft options make you soft” or “Easy options makes you easy to beat” are two mottos bandied around. 

If we see an older bloke is close to breaking down, we order him down, don’t let it appear he is being given an option, otherwise he loses respect from the other tough blokes.  So if they were close to breaking down, they might be ordered to do the skill/tactical session with the team on the field and resort to the Shed of Dread for off-feet conditioning.  This may deload their running volume by 10-20% for the week, not huge but maybe enough to prevent something bad.  Again, they normally didn’t want to do this, they would see it as a sign of weakness, of softness.  But we would make them.  It was pretty rare anyway, once or twice per season for 1-4 players at most during in-season but a bit more during pre-season when loads are double or more.

We also try and protect the younger blokes from the greater total training load and contact.  Same thing, substitute some off-feet cross-training for some of their running conditioning and/or defensive tackling work.  It is hard, head coaches want them toughened up and team-mates want to see them doing the same intense training as they are and so on.  This is very difficult to manage, just as I explain in the HRV section ~ how do you keep a key player out of important tactical training sessions and how often?  How does a younger player win respect from experienced players when he is doing modified training loads? 

New blokes to the club, they get about a two-minute introduction and induction and away we go ~ obviously they would have a physio/medical screening beforehand and their problems discussed, but apart from that, full steam ahead!!  No need to explain too much except to say “See that cone 75 m away – run there in 15 s!”  But I would typically put new guys in a small training group with players experienced at our club, to help guide them through.  And our senior players would help guide them through – that is leadership.




Mladen: How do you keep players accountable using GPS? Do they have to meet a certain norm during practices, and how does this affect tactical behavior on the field – do rules like these affect running in practices/games in a bad way? How do you know the necessary dose that needs to be fulfilled by each player and/or position?

Dan:  Playing small-sided games, each game would have a running demand, say “x” m’s in 5-mins for “soccer rugby”, which is the rules of soccer but passing the ball with the hands ~ a conditioning game designed for more running content and no contact. Failure to attain your meters meant extra penalty conditioning drills at the end of the session.  For skill and tactical sessions, it would depend on the nature of the session.  If it was a light skills (learning) session, then there would be no real intensity to be met.  If it was intense skills session, then a certain % of the session should have been above “x” m/s (I can’t reveal those figures).  Not always penalties, more education, because sometimes the flow of the game/session results in you not achieving those scores.  But if one player in a similar position did the prescribed intensities during skills training and another didn’t, we would point out that fact. 

But certain intense skill sessions may also include the alternating of traditional conditioning sets with small-sided game sets with intense tactical sets, so the intensity of the dose is typically fulfilled, because we plan it in detail.  If a player does not make their intensities and everyone else does, sometimes penalties at the end.  Do the intensity or do extras!  Another motto.



Mladen: What are your thoughts on HRV technology? Also, which one do you prefer – objective or subjective monitoring and why?

Dan:  I have not used HRV.  It would be very difficult in a team environment.  For example, if a star player in a key role has a young baby keeping him up at night, destroying his sleep and recovery and he shows up less than optimal, do we rest him and the whole team cannot train properly tactically?  This scenario goes on for weeks and months on some occasions.  What about for big games, championships, there is a lot of stress – performance anxiety, media stress, travel stress, stress from family and friends ringing up and looking for free tickets and autographed memorabilia (you would not believe how much this happens and stresses players).  If a player was not used to training under some stresses, how do we know if they can compete and win under such stressful situations?  I am not discounting HRV, I need to see how it could be implemented in a large team environment.  I can see its benefits to an individual athlete, especially a technician with a propensity for over-training (save them from themselves), but I can see many more problems with a team with varied personalities and with a coach who wants everyone training, all the time.

So I can see its value but I can see implementation problems, especially in team sports.  Also there are times when we want players over-reached, so that they rebound in a few weeks’ time ~ what does HRV do about that?  As female MMA champion Ronda Rousey’s mother said “I wasn’t training you to win the world championship on the day you were feeling good, I was training you to make you good enough to win on your worst day, with injuries and feeling bad”.  I just need to see it in use, how a winning team uses it.  Anyone, including perpetual losers, can use HRV and claim stuff and I don’t discount those experiences, but how are winners using it?  How did it change their management of training?  That is what I need to see, a team’s use of it or a rowing “Eight”, the data they got back, how they managed the team training and how it changed the team’s performance.  So far I have only seen individual athletes (runners, MMA etc) claiming stuff.


Time to slow down with all that "monitoring" in team sports....



Mladen: I believe most of the physical tests you use are prescriptive rather than descriptive/benchmark (e.g. MAS, 1RMs). In the case of benchmark tests, how often do you test/monitor players and how do you modify the program based on the score? Are there any incentives/punishments for lousy scores, especially after the off-season?

Dan:  Firstly on return to training after off-season.  Typically we would allow for a certain % decline in MAS scores for every week of off-season.  So with a typical 7-wk off-season a player would have to return with a MAS score within certain % of his previous season best.  So if the previous PB was 1200m in 5-mins, then report back to training and run a time within that certain pre-determined % decline.  Simple.  Only lazy or dumb fuckers can’t do that, come back at a known % decline of their best.  By that I mean once per week for 2-wks before returning you could do a 5-min run, see where you are at in preparation for the return to training test, take some action if needed.  Then we would aim to get everyone back to 100% of the MAS in a progressive fashion within 4-or so weeks (Gen Prep).  Players who fail the return test are given an extra session of cross-training on Saturday morning (typically a day off).  This is the punishment, but it is also because we need them to improve.  At times during that 4-wks, if they feel they could retest and escape Saturday morning, they can try.  We don’t want them in on Saturday, they put themselves in by being lazy or dumb fuckers!  We know that the MAS score correlates with meters covered in small-sided games and competition games. 
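As an aside, the arithmetic of the return-to-training test described above can be sketched in a few lines of R. The 1200 m in 5 min example and the 7-week off-season come from the answer; the allowed decline per week is NOT disclosed in the interview, so the 1% used here is purely a hypothetical placeholder, as are all variable and function names.

```r
# Sketch with HYPOTHETICAL numbers: the interview does not disclose the allowed % decline.
previousBestM      <- 1200    # PB distance in the 5-min run (example from the text)
testDurationS      <- 5 * 60  # 5-min test, in seconds
offSeasonWeeks     <- 7       # typical off-season length mentioned in the text
allowedDropPerWeek <- 0.01    # ASSUMPTION: 1% per week, purely illustrative

# MAS (maximal aerobic speed) estimated as distance / time, in m/s
previousMAS <- previousBestM / testDurationS

# Minimum distance (and MAS) a player must hit on return
minReturnM   <- previousBestM * (1 - allowedDropPerWeek * offSeasonWeeks)
minReturnMAS <- minReturnM / testDurationS

# Hypothetical helper: TRUE if the return-test distance is good enough
passesReturnTest <- function(distanceM) distanceM >= minReturnM

passesReturnTest(c(1150, 1100))
```

With these placeholder values the 1200 m PB corresponds to a MAS of 4 m/s, and a player could check himself against the threshold with a trial 5-min run in the weeks before reporting back.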

I don’t test strength on return to training.  I assume every male would do some upper body training, lighter hypertrophy-oriented sets and reps (and they do), so generally that is maintained pretty close to before, again within a certain %.  Squats and lower body are trained lighter in an off-season, if at all and I am ok with that, the legs cop a pounding and need a rest.  So leg training starts off with much lower training %1RM than upper body.  

Again we build up so that by wk 4-7, they should be hitting their previous PB for squats (whether judged as 1 or 3 or 5RM) and equaling or beating their PB’s for upper body. 
So using this first 4-7 wks to reclaim everything and then using the next block (3-6 wks) to build new capacity levels.

I keep detailed records of each player’s 1, 3 and 5 RM under different conditions in the key exercises.  For example, 3RM normal, with bands, with chains. So we are regularly working up to these capacities (as a 3 or 5RM), after we are “in shape.”  The key thing is, once a player “is in shape” and technically stable, he can really only miss equaling or beating his previous PB due to 1.  A niggling or major injury  2.  Under-recovering/fatigue  3.  Lack of desire.  Either way we know something is wrong.  From regular “testing” (ie. analyzing the training Max Effort sets, not every week, but about half of the weeks during the in-season) we have a monitoring system.  Any negative changes, we assume 1 (injury) or 2 (fatigue) from above, work out which one, determine what we need to do (or what they need to do).

Mladen: How do you deal with a player wanting to train outside of the club – e.g. hiring a personal strength or speed coach?

Dan:  It is not allowed or doesn’t happen in our situation, for the following reasons.  1.  If that outside trainer was any good, they would be working for a pro team or government institute of sport, because they pay the best for the best people.  So he is most likely a bum or charlatan.  2.  Outside trainers, having no responsibility to the team, what are they doing to that player, how is it going to affect the team?  If there is someone independent that a player wants (and I have never seen it), then see the next point.  3.  Most importantly, all people having contact with/training players must be signed by the club and be cleared by the Integrity Unit for that pro sport.  This is to stop drug peddler trainers, “supplement gurus”, match-fixer trainers, charlatans, the whole gamut of shady characters on the periphery of pro sports who are trying to get in but are not good enough or have no integrity, worming their way into a player’s life and causing irreparable damage to the sport.  So even if a player wanted “their own” trainer, they would still have to be hired by the club and be cleared by the integrity unit.  So saying you wanted “your own trainer”, it wouldn’t work in my sport anyway, as saying you wanted your own trainer is to say “I am a prima donna, I have special needs beyond the team” ~ the other tough blokes in the NRL squad would beat the shit out of you (in contact sessions) and verbally mock you for thinking you are better than them, that you need separate special training/mollycoddling.

I would bet that those athletes who want their trainer would use one who trained them less volume and/or less intensity.  Are these athletes coming up to you and saying ”I need my own trainer because we as a team are not training hard enough, intense enough, we are not doing the compound exercises, we are not supra-max MAS running and I feel I could play better by training so much harder”, I bet not.  They are looking for easy options, soft options, (see my mottos above) mollycoddling.  “Mate you are not special, despite what your parents told you” So this is not an Australian problem, but I know it exists in Europe and the USA, but they have different systems.



Unfortunately, outside the NRL, sport is full of prima donnas...


Mladen: I would like to see a study that compares a group that believes concurrent training is the best way to train with a group that believes concurrent training is the worst thing one can do, along with a neutral/control group. It seems that beliefs influence things a lot. How do you get players to buy into the program, and do you believe that with a concurrent/mixed approach one needs to develop work capacity first to sustain mixed and increased loads? How does one achieve that, and what are your thoughts now on concurrent/mixed vs. block/sequential approaches compared to two years ago?

Dan:  I don’t know where this belief that you can’t concurrent train at all came from.  We are in a sport (rugby league, the NRL) that requires strength/power and energy system fitness as well as brutal unpadded collisions, so concurrent training is the norm and has been for >100 yrs, since the sport existed.  It is the norm since kids are 7 years old, we deal with it easy, mentally, it just has to be done.  When I was 7-years old, we ran, wrestled, did situps, pushups, jumps and so on as well as the skills of the game and that was 41-years ago (yes that’s right, Crossfit didn’t invent concurrent training).  And we could go back to the Ancient Greek Olympics, to the Pankration fighters and their 4-day training cycle called the tetrad, mixing technique days, conditioning days, full contact sparring days and so on.  Concurrent training has always existed, but due to the time-out system in US sports, some people have lost sight of this and the fact that we can easily do it if we train appropriately.

Will NRL players and other concurrent trainers be as strong as lifters, NO, will they be as aerobically fit as triathletes, NO, will they be as fast as sprinters, NO, but they need to be reasonably good at all those things.  So they need to train all things.  I can’t think of a time when any NRL athlete trained just one quality. 

So we train conjugate/concurrent within a week, but in a sequential block manner, with one block building upon another and leading to the overload that we want. Not much or anything has changed in my beliefs compared to 2-years ago.  We believe in concurrent training for these types of mixed sports, but it does not mean I used it when I coached powerlifters or divers!  Every sport has their specific demands as well.



Mladen: And now a couple more practical questions. Can you briefly describe your in-season approach to training (e.g. strength, power, speed, conditioning)? How do you avoid boredom in the long in-season? How do you avoid soreness with too much drill/exercise rotation and variety?

Dan:  Briefly? No, I could write books on it, so I can’t go into detail here.  Details will be revealed when I do seminars, lectures and workshops. 

But in saying that, we always train hard but in a cycle with different objectives, in the strength work for example, some weeks are max effort, some are hard effort, some are medium or medium-hard to unload the neural and adrenal systems.  Some days are strength and muscle training days, some days are power/dynamic effort days.  In-season, the soreness does not come from exercises, it comes from the brutal contact in the game.  So soreness is normal, just a case of managing the week as best we can.

Boredom?  What do athletes want, to be entertained or to win?  Do they want jugglers and clowns at training (work) to be entertained?  When I worked construction, I didn’t say to the boss “This is boring, can you change my work to entertain me”.   Sense of entitlement shit!!!  I vary training to avoid adaptation, to keep progression in training happening, not to entertain blokes with short attention spans and a lack of desire to work hard.



Athletes are here to TRAIN, to get better and to win, not to have fun activity. Coaches are not here to be liked by players, but to make players better. (Although these concepts are not mutually exclusive, there is a sweet spot IMHO)


Mladen: What is the role of combat training for rugby players (e.g. wrestling, BJJ, boxing)? How is that implemented into rugby training? What about various strongman implements like sandbags (or boxing bags), sleds (prowlers), farmers walks, tires, etc?

Dan:  NRL is combat, the structure of the game is based upon warfare.  So almost all forms of combat have some application.  Every team does some form of grappling because it was always how we put someone on the ground and controlled them.  Every team uses boxing drills as well.  I don’t really like strongman for a team due to its imprecise overload when you have large numbers of athletes with disparate body sizes and strengths (see this article,  Baker, D.  Strongman training for large groups of athletes.  Journal of Australian Strength & Conditioning. 16(1):33-34.  2008.).  One-on-one or for a very small group some strongman stuff is OK, especially during late rehab when a player is close to resuming playing.  I would rather players wrestle rather than do strongman, but sometimes a player can’t do wrestling, so that is where we would use modified strongman stuff (more controlled environment).


Do more wrestling.... less tire flipping 



Mladen: Thanks for the insights, Dan. What are your plans for the future? When can we expect the Dan Baker Training System book?

Dan:  There will be no book.  I will do lectures, symposiums, conference presentations, consulting, whatever, but no book.  I am a people person, better to hear and see me talk if you want to learn more.




Tuesday, November 19, 2013

Understanding inferential statistics using correlation example


Introduction

In the following R and knitr experiment/blog post I will be documenting my play with correlation and inferences. I am currently reading Discovering Statistics Using R by Andy Field and I am trying to code some stuff from the book, plus experiment and see how inferential statistics work.

Simulations are a great way to learn statistics in my opinion and in the opinion of Will Hopkins. I hope that someone might find this blog post interesting and learn a thing or two.

As I have pointed out in previous blog posts, sport coaches are not interested in inferential statistics but rather in individual reactions/effects, yet most if not all research utilizes inferential statistics. Why is that? Because in research we are interested in overall (average) effects in a given population, not in a single individual or sample. In research, subjects are just vehicles, a way to get numbers/estimates or observations, while in sport they are what matters the most.

Since it is very hard to measure the whole population, we need to make inferences from a smaller sample to the bigger population. To do this we use the Central Limit Theorem and the estimated standard error (it is beyond me why the standard error is not called sampling error, which would convey much more meaning).
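To make the idea of standard error as "sampling error" concrete, here is a small sketch using the mean (the same logic applies to correlation). It draws many samples from an imaginary population and compares the spread of the sample means with the usual formula sd / sqrt(n). The population, the seed and all variable names are assumptions made up for this illustration.

```r
set.seed(1)  # arbitrary seed, only so the sketch is repeatable

# Hypothetical population, loosely mirroring a squat 1RM distribution
pop <- rnorm(10000, mean = 150, sd = 10)
n <- 50

# Draw many samples and keep the mean of each one
sampleMeans <- replicate(2000, mean(sample(pop, n)))

empiricalSE   <- sd(sampleMeans)    # spread of the sample means across samples
theoreticalSE <- sd(pop) / sqrt(n)  # standard error formula for the mean

round(c(empirical = empiricalSE, theoretical = theoreticalSE), 2)
```

The two values should come out close to each other, and calling hist(sampleMeans) should show a roughly bell-shaped sampling distribution, which is what the Central Limit Theorem leads us to expect.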

Understanding this is of crucial importance for understanding statistics, and I have struggled with it mainly because most books don't put many pages/much emphasis on getting it and jump to ANOVAs and all that fancy stuff too soon.

Enough of my rant - I hope that this blog post might shed some light on population/sample inferences for students. I will use correlation as the estimate we are interested in (it could be the mean, SD, Cohen's effect size, whatever - the idea is the same).

Population correlation

Creating a population with two variables that correlate - in this case squat and vertical jump in athletes (NOTE: All data are imaginary for the sake of an example)

populationSize <- 10000

# Simulate vertical jump and squat estimates in population
randomError <- 8
populationSquatKG <- rnorm(populationSize, mean = 150, sd = 10)
populationVerticalJumpCM <- populationSquatKG * 0.45 - 20 + rnorm(populationSize, 
    mean = 0, sd = randomError)

# Graph the populations and scatter
par(mfrow = c(1, 3))

hist(populationSquatKG, 30, col = "blue", xlab = "kg", main = "Squat 1RM in kg")

hist(populationVerticalJumpCM, 30, col = "yellow", xlab = "cm", main = "Vertical Jump Height in cm")

plot(populationSquatKG, populationVerticalJumpCM, col = "grey", main = "Scatterplot between Squat \nand Vertical Jump", 
    xlab = "Squat 1RM in kg", ylab = "Vertical Jump Height in cm")

# Add Text (r=) on the graph
text(min(populationSquatKG) * 1.1, max(populationVerticalJumpCM) * 0.9, paste("r=", 
    as.character(round(cor(populationSquatKG, populationVerticalJumpCM), 2)), 
    sep = ""), cex = 1.5)

plot of chunk unnamed-chunk-1

In the population above r=0.49 between vertical jump and squat. Let's see what happens to the correlation when we modify the random error parameter.

par(mfrow = c(3, 4))

for (randomError in seq(from = 0, to = 30, length.out = 12)) {

    popVerticalJumpCM <- populationSquatKG * 0.45 - 20 + rnorm(populationSize, 
        mean = 0, sd = randomError)

    # Graph the scatter
    plot(populationSquatKG, popVerticalJumpCM, col = "grey", xlab = "Squat 1RM in kg", 
        ylab = "Vertical Jump Height in cm", main = paste("r= ", as.character(round(cor(populationSquatKG, 
            popVerticalJumpCM), 2)), "\nRandom Error= ", as.character(round(randomError, 
            2)), sep = ""))

}

plot of chunk unnamed-chunk-2

As can be seen from the graphs above the higher the random error the lower the correlation.
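The same pattern can be checked without simulation. For a linear model y = slope * x + error, where the error is independent of x, the expected correlation is slope * SD(x) / sqrt(slope^2 * SD(x)^2 + SD(error)^2). The sketch below is an illustration only: the helper name theoreticalCorrelation is made up for this example, and the slope (0.45) and squat SD (10) are the values used in the simulation above.

```r
# Expected correlation for y = slope * x + error (error independent of x)
# NOTE: theoreticalCorrelation is a hypothetical helper, not part of any package
theoreticalCorrelation <- function(slope, sdX, sdError) {
    slope * sdX / sqrt(slope^2 * sdX^2 + sdError^2)
}

# Random error used for the original population
originalR <- theoreticalCorrelation(slope = 0.45, sdX = 10, sdError = 8)
print(round(originalR, 2))

# Same grid of random errors as in the 12 scatterplots
randomErrors <- seq(from = 0, to = 30, length.out = 12)
expectedR <- theoreticalCorrelation(0.45, 10, randomErrors)
print(round(expectedR, 2))
```

With a random error of 8 the formula gives about 0.49, in line with the simulated population, and the expected correlation drops steadily as the random error grows.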

Sample correlation

Let's get back to our original population where r=0.49 between vertical jump and squat. Because we can't measure the whole population (practically), we need to take a small random sample from the population (N=50).

# Put our population squat and vertical jump into data frame for easier
# sampling
population <- data.frame(squat = populationSquatKG, verticalJump = populationVerticalJumpCM)

# Take a sample from the population
sampleSize <- 50

sample <- population[sample(nrow(population), sampleSize), ]

# Graph the sample histogram and scatter
par(mfrow = c(1, 3))

hist(sample$squat, 10, col = "blue", xlab = "kg", main = "Sample Squat 1RM in kg")

hist(sample$verticalJump, 10, col = "yellow", xlab = "cm", main = "Sample Vertical Jump Height in cm")

plot(sample$squat, sample$verticalJump, col = "grey", main = "Scatterplot between sample Squat \nand Vertical Jump", 
    xlab = "Squat 1RM in kg", ylab = "Vertical Jump Height in cm")

# Add Text (r=) on the graph
text(min(sample$squat) * 1.1, max(sample$verticalJump) * 0.9, paste("r=", as.character(round(cor(sample$squat, 
    sample$verticalJump), 2)), sep = ""), cex = 1.5)

plot of chunk unnamed-chunk-3

What you will notice is that the correlation in the whole population (N=10,000), r=0.49, differs from the correlation in the sample (N=50), r=0.58. What we are interested in is making inferences about the whole population from a small sample. As you can see they are not the same, so how can we be certain what the correlation in the population is from drawing a small sample?

Sampling distribution

What would happen if we repeated the sampling a lot of times? Let's try it out and plot the sampling distribution of the sample correlation (N=50).

samplingNumber <- 5000

sampleCorrelation <- rep(0, samplingNumber)

for (i in seq(from = 1, to = samplingNumber)) {
    # take sample from population
    sample <- population[sample(nrow(population), sampleSize), ]

    # remember the correlation
    sampleCorrelation[i] = cor(sample$squat, sample$verticalJump)
}

# Plot the histogram of sampling correlations
hist(sampleCorrelation, 50, col = "blue", main = "Sampling distribution of the sample correlation", 
    xlab = "Correlation", xlim = c(0, 1.2))

# Plot the line representing correlation in the population
abline(v = cor(populationSquatKG, populationVerticalJumpCM), col = "red", lwd = 3)

plot of chunk unnamed-chunk-4

What is noticeable is that when we repeat the sampling, most of our sample correlations are distributed around the population correlation (red line). The “spread” around the mean (the correlation in the population) is expressed as a standard deviation, and it is called the standard error (SE).

Interestingly enough, standard error (or standard deviation of sampling distribution of the sample correlation; yeah a mouthful) depends on the sample size. The larger the sample size (N) the lower the standard error (SE) and vice versa. Let's simulate it and plot it.


samplingNumber <- 5000


sampleCorrelation <- rep(0, samplingNumber)

par(mfrow = c(4, 3))
for (sampleSize in seq(from = 10, to = 120, length.out = 12)) {

    for (i in seq(from = 1, to = samplingNumber)) {
        # take sample from population
        sample <- population[sample(nrow(population), sampleSize), ]

        # remember the correlation
        sampleCorrelation[i] = cor(sample$squat, sample$verticalJump)
    }
    # Plot the histogram of sampling correlations
    hist(sampleCorrelation, 50, col = "blue", main = paste("N=", sampleSize, 
        "\nSE=", round(sd(sampleCorrelation), 2)), xlab = "Correlation", xlim = c(0, 
        1.2))

    # Plot the line representing correlation in the population
    abline(v = cor(populationSquatKG, populationVerticalJumpCM), col = "red", 
        lwd = 3)
}

plot of chunk unnamed-chunk-5

As can be seen from the graph above the larger the sample size the smaller the standard error (SE).

Adjusted correlation coefficient

Before we can proceed with how to use SE to make inferences, we must deal with one assumption that makes this work. All inferential tests in statistics rely on a couple of assumptions, and one of the most important is the assumption of normality. One problem with the sampling distribution of the sample correlation is that it is not normally distributed. Hence, the correlation coefficient has to be adjusted (this adjustment is known as Fisher's z-transformation). I am not going to go into the formula here, but I will compare the sampling distributions of the unadjusted and adjusted correlation using sample size N=50.

samplingNumber <- 5000
sampleSize <- 50

sampleCorrelation <- rep(0, samplingNumber)
sampleAdjustedCorrelation <- rep(0, samplingNumber)

for (i in seq(from = 1, to = samplingNumber)) {
    # take sample from population
    sample <- population[sample(nrow(population), sampleSize), ]

    # remember the correlation
    sampleCorrelation[i] = cor(sample$squat, sample$verticalJump)
    sampleAdjustedCorrelation[i] = 0.5 * log((1 + sampleCorrelation[i])/(1 - 
        sampleCorrelation[i]))

}

# Plot the histogram of sampling correlations
par(mfrow = c(2, 2))

h <- hist(sampleCorrelation, 50, col = "grey", main = "Sampling distribution of \nthe sample correlation", 
    xlab = "Correlation", xlim = c(0, 1.2))

# Plot the line representing correlation in the population
abline(v = cor(populationSquatKG, populationVerticalJumpCM), col = "red", lwd = 3)

# Plot the line representing normal distribution
i <- seq(from = 0, to = 1.2, length = 50)
normalCurve <- dnorm(i, mean = mean(sampleCorrelation), sd = sd(sampleCorrelation)) * 
    samplingNumber * diff(h$mids[1:2])
lines(i, normalCurve, col = "blue", lwd = 3)

# Draw QQ Plot
qqnorm(sampleCorrelation, col = "blue", main = "Q-Q plot of Sampling distribution of \nthe sample correlation")
qqline(sampleCorrelation)

h <- hist(sampleAdjustedCorrelation, 50, col = "grey", main = "Sampling distribution of \nthe adjusted sample correlation", 
    xlab = "Correlation", xlim = c(0, 1.2))

# Plot the line representing correlation in the population
populationCorrelation <- cor(populationSquatKG, populationVerticalJumpCM)

adjustedPopulationCorrelation <- 0.5 * log((1 + populationCorrelation)/(1 - 
    populationCorrelation))

abline(v = adjustedPopulationCorrelation, col = "red", lwd = 3)

# Plot the line representing normal distribution
i <- seq(from = 0, to = 1.2, length = 50)
normalCurve <- dnorm(i, mean = mean(sampleAdjustedCorrelation), sd = sd(sampleAdjustedCorrelation)) * 
    samplingNumber * diff(h$mids[1:2])
lines(i, normalCurve, col = "blue", lwd = 3)

# Draw QQ Plot
qqnorm(sampleAdjustedCorrelation, col = "red", main = "Q-Q plot of Sampling distribution of \nthe adjusted sample correlation")
qqline(sampleAdjustedCorrelation)

plot of chunk unnamed-chunk-6

Red vertical lines represent the regular and adjusted correlation coefficients in the population. Using the Q-Q plot we can visually inspect deviations from the normal curve (the adjusted correlation deviates very little from normal).

With the adjusted correlation coefficient we can calculate SE from a formula, and because it is normally distributed we can calculate probabilities from the normal distribution using Z-scores.

Formula for standard error (SE) for adjusted correlation coefficient is

SE = 1 / SQRT( N - 3 )

According to this formula, SE is 0.15 when N=50. The standard deviation of the simulated distribution of the adjusted sample correlation is 0.14, which is very close, so the formula agrees with the simulation.

Once we are done with the statistical tests or the calculation of confidence intervals (more in a minute) we can convert the adjusted coefficient back to the regular one. In other words, we use the adjusted correlation coefficient because of its known SE and normal distribution, perform the statistical tests, and then convert the result back to the original scale. Make sense?
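As a small sketch of that round trip: the adjustment used in the code above, 0.5 * log((1 + r) / (1 - r)), is the same thing as base R's atanh(), and tanh() converts the adjusted value back to a regular correlation. The values r = 0.49 and N = 50 are taken from the example above, and the variable names are only illustrative.

```r
r <- 0.49  # population correlation from the example
n <- 50    # sample size from the example

# Adjust the correlation: manual formula and base R shortcut
manualAdjusted <- 0.5 * log((1 + r) / (1 - r))
adjusted <- atanh(r)

# Convert the adjusted coefficient back to a regular correlation
backTransformed <- tanh(adjusted)

# Standard error of the adjusted correlation from the formula
standardError <- 1 / sqrt(n - 3)

print(c(adjusted = adjusted, backTransformed = backTransformed, SE = standardError))
```

The manual formula and atanh() give the same number, the back-transformed value returns to 0.49, and the SE comes out at roughly 0.15.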

Statistical significance and Null Hypothesis Testing

Knowing the standard error (SE) in the population we can calculate the probability of acquiring a certain sample result. The only problem is that we don't know the population SE. The only option we have is to use the sample SE.

What we have in real life/study (without knowing the whole population) is

  • Adjusted Correlation in the sample
  • Standard error of the sample

We know that the adjusted sample correlation is normally distributed around a mean equal to the (adjusted) population correlation, with a standard deviation equal to the standard error (SE).

plot of chunk unnamed-chunk-7

We can check if the sample adjusted correlation is statistically significant by utilizing Null Hypothesis Testing (NHT) and calculating p-value. How does this work?

We assume that there is no correlation in the population (r=0). This is called the Null Hypothesis. We take the standard error under the null hypothesis (r=0) to be equal to the standard error of our sample.

What we are interested in is the probability of acquiring a score (sample correlation) at least as extreme as the one we got, assuming that the null hypothesis is true (population r=0).

In the following picture, the Null Hypothesis (r=0) is depicted in blue, and the sample correlation we might get is depicted as a red line.

plot of chunk unnamed-chunk-8

Using a two-tailed test, we can calculate the probability of acquiring a score at least as extreme as the one we got, assuming the Null is true. This is called the p-value. Since we know the distribution and mean of the Null, we can calculate the p-value using a simple Z-Test (that gives us a Z-Statistic or Z-Number). Please note that probability is the area under the curve.

Let's take one sample and calculate p-value

sampleSize <- 10

sample <- population[sample(nrow(population), sampleSize), ]

sampleCorrelation <- cor(sample$squat, sample$verticalJump)

sampleAdjustedCorrelation <- 0.5 * log((1 + sampleCorrelation)/(1 - sampleCorrelation))

standardError <- 1/sqrt(sampleSize - 3)

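Z.Statistic <- sampleAdjustedCorrelation/standardError
# Z.Statistic is already expressed in standard error units, so compare it
# with the standard normal distribution (mean = 0, sd = 1)
pValue <- (1 - pnorm(abs(Z.Statistic), mean = 0, sd = 1)) * 2

This is what we got

  • r= 0.38
  • Adjusted r= 0.4
  • SE= 0.38
  • p-value= 0.28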

If the p-value is reported as zero, that means it is so small that it is below R's numerical precision.

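Researchers usually take 0.05 as the alpha level for statistical significance. So if p-value < 0.05 we reject the null hypothesis and say that there is a statistically significant effect in the population (in this case a correlation). Note that this is not the same as being 95% certain that the effect exists: the p-value only tells us how likely a result at least this extreme would be if the null hypothesis were true.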

BTW, in the calculation above I should have used a T-Test and T-Statistic instead of a Z-Test, but for the sake of simplicity I took the Z-score. The calculation using the T-Test is a bit different. One could use the R function cor.test

correlationTest <- cor.test(sample$squat, sample$verticalJump)

print(correlationTest)
## 
##  Pearson's product-moment correlation
## 
## data:  sample$squat and sample$verticalJump
## t = 1.176, df = 8, p-value = 0.2733
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.3239  0.8163
## sample estimates:
##   cor 
## 0.384

Using cor.test we get p-value of 0.2733.
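For the curious, the numbers printed by cor.test can be approximately reproduced by hand from just the sample correlation and the sample size. The sketch below assumes the rounded values from the output above (r = 0.384, N = 10), so the results may differ from the printout in the last decimal. It uses the usual t-statistic for a Pearson correlation, t = r * sqrt(N - 2) / sqrt(1 - r^2), and builds the confidence interval on the adjusted scale before converting it back.

```r
r <- 0.384  # sample correlation (rounded, from the cor.test output)
n <- 10     # sample size

# T-Statistic and two-tailed p-value with n - 2 degrees of freedom
tStatistic <- r * sqrt(n - 2) / sqrt(1 - r^2)
pValueT <- 2 * pt(-abs(tStatistic), df = n - 2)

# 95% confidence interval: build it on the adjusted scale, then convert back
zCritical <- qnorm(0.975)
confidenceInterval <- tanh(atanh(r) + c(-1, 1) * zCritical / sqrt(n - 3))

print(round(c(t = tStatistic, p = pValueT), 4))
print(round(confidenceInterval, 4))
```

This should land very close to the t, p-value and 95 percent confidence interval reported by cor.test.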

Dance of p-values

What happens with p-value if we repeat the sampling numerous times for different sample sizes? Let's find out.

samplingNumber <- 5000


samplePValue <- rep(0, samplingNumber)

par(mfrow = c(3, 3))
for (sampleSize in c(3, 5, 10, 15, 20, 25, 30, 35, 40)) {

    for (i in seq(from = 1, to = samplingNumber)) {
        # take sample from population
        sample <- population[sample(nrow(population), sampleSize), ]

        # remember the p-value
        samplePValue[i] <- cor.test(sample$squat, sample$verticalJump)$p.value
    }
    # Plot the p-value distribution
    hist(samplePValue, 50, col = "blue", main = paste("N=", round(sampleSize, 
        0)), xlab = "p-value")
    abline(v = 0.05, col = "red", lwd = 3)
}

plot of chunk unnamed-chunk-11

The red line represents the 0.05 cut-off (alpha) for statistical significance. As you can see, the distribution of p-values depends a lot on sample size: the bigger the sample, the higher the chance of getting a statistically significant result. You can find more about it in this video by Geoff Cumming.
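The histograms can also be boiled down to a single number per sample size: the share of samples with p < 0.05. Here is a minimal, self-contained sketch. It rebuilds an imaginary population with the same parameters as above, the helper name proportionSignificant is made up for this example, and the seed, the sample sizes and the 1,000 repetitions are arbitrary choices.

```r
set.seed(42)

# Rebuild an imaginary population with the same parameters as above
populationSize <- 10000
squat <- rnorm(populationSize, mean = 150, sd = 10)
verticalJump <- squat * 0.45 - 20 + rnorm(populationSize, mean = 0, sd = 8)

# Share of samples of a given size that reach p < 0.05
# NOTE: proportionSignificant is a hypothetical helper for this example
proportionSignificant <- function(sampleSize, samplingNumber = 1000) {
    pValues <- replicate(samplingNumber, {
        rows <- sample(populationSize, sampleSize)
        cor.test(squat[rows], verticalJump[rows])$p.value
    })
    mean(pValues < 0.05)
}

sampleSizes <- c(5, 10, 20, 40)
significantShare <- sapply(sampleSizes, proportionSignificant)
names(significantShare) <- paste0("N=", sampleSizes)

print(round(significantShare, 2))
```

The share of significant results should climb as N grows, even though the correlation in the population never changes.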

Another way to represent inferences to a population is with confidence intervals, which in my opinion, and the opinion of many others, are a much better way than p-values. I might cover confidence intervals in some future posts.