Tuesday, November 12, 2013

Quick correction in simulating Typical Error

Quick correction in simulating Typical Error

Quick correction in simulating Typical Error

I am awaiting review and opinions/critiques from statistics wizzard Will Hopkins for my How to visualize test change scores for coaches, but even on his first look he pointed to one small error I did in simulating typical error for vertical jump. It is not huge error, but it is an error in representing typical error in measurement.

What I did wrong?

This is what I did wrong (for this example I will generate 1000 athletes/samples)

# Load library for random names generation for athlete
library(randomNames)

# Simulate vertical jump scores for 1000 players, with an average of 55cm
# and SD of 6cm We are going to repeat the assessment on two days in the
# same conditionins and add some 'noise'
set.seed(505)

reliabilityTest1CM_old <- rnorm(1000, mean = 55, sd = 6)
reliabilityTest2CM_old <- reliabilityTest1CM_old + runif(1000, min = -2, max = 2)

# Let's name those 1000 athletes
reliabilityPlayerNames_old <- randomNames(1000)

# Put the data in data frame
reliabilityData_old <- data.frame(Athlete = reliabilityPlayerNames_old, Test1 = reliabilityTest1CM_old, 
    Test2 = reliabilityTest2CM_old)

# Calcualte the differences between tests
reliabilityData_old <- within(reliabilityData_old, Diff <- Test2 - Test1)

Here are our first 10 athletes from the table

Athlete Test1 Test2 Diff
1 O'Connor, Joshua 48.27 49.04 0.76
2 Matthews, Tisa 47.31 46.94 -0.37
3 Obi, Victoria 42.76 42.88 0.12
4 Leal, Alexandra 49.37 48.09 -1.28
5 Salazar, Ana 51.94 53.61 1.68
6 Nunez-Medina, Makeyla 52.52 53.17 0.65
7 Shin, Nicholas 49.54 49.81 0.28
8 Smith, Bruce 55.84 57.57 1.73
9 Bayarsaikhan, Gabriel 58.39 58.10 -0.29
10 Yarangsy, Miles 52.41 52.13 -0.27

What I did wrong is that I have added uniform noise to the first test

reliabilityTest1CM_old <- rnorm(1000, mean=55, sd=6)

reliabilityTest2CM_old <- reliabilityTest1CM_old + runif(1000, min=-2, max=2)

What I should have done to simulate this more methodically and correctly is to create real test scores and add normal “noise” (TE) to BOTH of them. Here is the right code

set.seed(505)

# Create the 'REAL' scores in vertical jump with mean=55 and SD=6 for 1000
# athletes
realVerticalJumpCM <- rnorm(1000, mean = 55, sd = 6)

# Now add 'noise' (TE) to both reliability tests. In this case noise has
# mean=0 and SD=1. If mean of the TE would be >0 then that would be a
# constant error.

reliabilityTest1CM <- realVerticalJumpCM + rnorm(1000, mean = 0, sd = 1)
reliabilityTest2CM <- realVerticalJumpCM + rnorm(1000, mean = 0, sd = 1)

# Let's name those 1000 athletes
reliabilityPlayerNames <- randomNames(1000)

# Put the data in data frame
reliabilityData <- data.frame(Athlete = reliabilityPlayerNames, Test1 = reliabilityTest1CM, 
    Test2 = reliabilityTest2CM)

# Calcualte the differences between tests
reliabilityData <- within(reliabilityData, Diff <- Test2 - Test1)
Athlete Test1 Test2 Diff
1 Martinez, Kuan-Hsuen 48.77 47.65 -1.12
2 Earvin, Colton 47.38 46.84 -0.54
3 Vigil, Analee 44.16 43.63 -0.53
4 Bruce, Victoria 49.55 49.44 -0.11
5 Slechter, Joshua 51.76 52.26 0.50
6 Kaeser, Lnay 52.35 54.24 1.89
7 Shibles, Kevin 50.82 49.11 -1.70
8 Russell, Jonathan 54.21 56.85 2.64
9 Chavez, Selena 58.96 58.21 -0.75
10 Saldivar, Ebonnie 51.09 52.33 1.24

Here is the histogram of difference scores between test1 and test2 in both methods

par(mfrow = c(1, 2))

hist(reliabilityData_old$Diff, 30, col = "red", main = "Old method with uniform error", 
    xlab = "Vertical Jump difference [cm]")

hist(reliabilityData$Diff, 30, col = "blue", main = "Correct method wih normal error", 
    xlab = "Vertical Jump difference [cm]")

plot of chunk unnamed-chunk-5

Let's calculate TE for both simulations. I need to copy paste our function for TE calculus first.

typicalError <- function(data, test1 = "Test1", test2 = "Test2", na.rm = TRUE) {
    # Remove the missing values
    if (na.rm) 
        data <- na.omit(data)

    # Calculate the difference from named collums
    diff <- data[[test2]] - data[[test1]]

    return(sd(diff)/sqrt(2))
}

# Calculate Typical Errors for both simulations
TE_old <- typicalError(reliabilityData_old)
TE_correct <- typicalError(reliabilityData)



cat("Simulation #1 with uniform error\nTE=", round(TE_old, 3), "\n")
## Simulation #1 with uniform error
## TE= 0.822
cat("\nSimulation #2 with normal error\nTE=", round(TE_correct, 3), "\n")
## 
## Simulation #2 with normal error
## TE= 0.987

As you can see, TE in correct method of simulation (TE=0.987) is VERY close to SD=1 of our normally distributed noise we have added to the “real” vertical jump.

I hope I haven't made any errors now… But Will will notice :)

No comments:

Post a Comment