“Power” to detect <strong><em>statistically significant</em></strong> effects based on sample size and magnitudes of effects

I was going through the magnitude-based inference materials by Will Hopkins and playing with simulations in R. I wanted to see how often I am able to detect statistically significant effects (p < 0.05) depending on the magnitude of the effect (expressed as Cohen's d, using Will Hopkins' thresholds) and on the sample size.

I created a baseline group (mean = 100, SD = 30) and five more groups whose means differ from the baseline by a Trivial, Small, Moderate, Large or Very Large amount (0.2, 0.6, 1.2, 2 and 4 SDs, i.e. Cohen's d), and repeated this for different numbers of subjects per group. Because the differences are expressed in SD units, the choice of SD does not affect the results. For each number of subjects I ran a t test between the baseline group and each of the five other groups. I then repeated this whole process 1000 times and counted the significant effects (p < 0.05).

The result is a table showing the percentage of those 1000 simulations in which I was able to detect a statistically significant effect, depending on the number of subjects per group and on the magnitude of the difference from the baseline group.
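
The procedure above can be condensed into a small function that estimates power for a single sample size and effect size. This is a minimal sketch, not the code used for the table: `sim.power` is a hypothetical helper, and it simulates standardized data (mean 0, SD 1), which gives the same p values as mean 100 and SD 30 because the t statistic does not change when both groups are shifted and rescaled together.

```r
# Hypothetical helper (not from the original post): percentage of simulated
# experiments in which a two-sample t test gives p < alpha.
sim.power <- function(n, d, reps = 1000, alpha = 0.05) {
  significant <- replicate(reps, {
    # Standardized data: baseline ~ N(0, 1), comparison group shifted by d SDs
    baseline <- rnorm(n, mean = 0, sd = 1)
    shifted  <- rnorm(n, mean = d, sd = 1)
    # t.test() runs Welch's two-sample test by default
    t.test(baseline, shifted)$p.value < alpha
  })
  mean(significant) * 100
}

set.seed(1)
sim.power(n = 45, d = 0.6)  # "Small" effect with 45 subjects per group
```

Calling it once per cell of a sample-size by effect-size grid reproduces the structure of the table below; results vary slightly from run to run unless the seed is fixed.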

Here is the code and the resulting table:

# Effect sizes in SD units (Cohen's d); "Baseline" is the reference group
effect.magnitudes <- c(0, 0.2, 0.6, 1.2, 2, 4)
effect.names <- c("Baseline", "Trivial", "Small", "Moderate", "Large", "Very.Large")
subjects.list <- seq(from = 5, to = 200, by = 10)  # subjects per group: 5, 15, ..., 195

alpha <- 0.05             # significance threshold
re.sampling <- 1000       # number of simulated experiments
standard.deviation <- 30  # SD shared by all groups
sample.mean <- 100        # baseline mean

# p values of the current simulation (rows: subjects per group, columns: effect)
p.value <- matrix(0, nrow = length(subjects.list), ncol = length(effect.names) - 1,
    dimnames = list(subjects.list, effect.names[-1]))
# running count of significant results, same layout as p.value
significant.effects <- p.value

for (k in 1:re.sampling) {
    for (j in seq_along(subjects.list)) {
        subjects <- subjects.list[j]

        # One column per group: mean shifted by effect * SD, same SD for all groups
        dataSamples <- matrix(0, nrow = subjects, ncol = length(effect.magnitudes),
            dimnames = list(NULL, effect.names))
        for (i in seq_along(effect.magnitudes)) {
            dataSamples[, i] <- rnorm(n = subjects,
                mean = sample.mean + standard.deviation * effect.magnitudes[i],
                sd = standard.deviation)
        }

        # Welch two-sample t test (t.test default) of each group against the baseline
        for (effect in effect.names[-1]) {
            p.value[j, effect] <- t.test(dataSamples[, "Baseline"], dataSamples[, effect])$p.value
        }
    }

    # add 1 to every cell whose comparison was significant in this simulation
    significant.effects <- significant.effects + (p.value < alpha)
}

# convert counts to percentages of the simulated experiments
significant.effects <- significant.effects/re.sampling * 100

Subjects	Trivial	Small	Moderate	Large	Very.Large
5	6	12	37	79	100
15	8	34	90	100	100
25	10	55	98	100	100
35	13	69	100	100	100
45	15	80	100	100	100
55	17	87	100	100	100
65	20	92	100	100	100
75	23	95	100	100	100
85	23	97	100	100	100
95	31	99	100	100	100
105	32	99	100	100	100
115	34	99	100	100	100
125	37	100	100	100	100
135	37	100	100	100	100
145	40	100	100	100	100
155	43	100	100	100	100
165	40	100	100	100	100
175	44	100	100	100	100
185	48	100	100	100	100
195	51	100	100	100	100
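
The smallest simulated sample size that reaches 80% power can be read off the table programmatically. A minimal sketch, with the percentages copied from the table above into a hypothetical `power.table` matrix; because the grid steps by 10 subjects, this only brackets the requirement (Large, for example, is at 79% with 5 subjects and 100% with 15, so it reports 15).

```r
# Simulated power (%) copied from the table above; rows are subjects per group
subjects.list <- seq(from = 5, to = 195, by = 10)
power.table <- cbind(
  Trivial    = c(6, 8, 10, 13, 15, 17, 20, 23, 23, 31, 32, 34, 37, 37, 40, 43, 40, 44, 48, 51),
  Small      = c(12, 34, 55, 69, 80, 87, 92, 95, 97, 99, 99, 99, rep(100, 8)),
  Moderate   = c(37, 90, 98, rep(100, 17)),
  Large      = c(79, rep(100, 19)),
  Very.Large = rep(100, 20)
)
rownames(power.table) <- subjects.list

# For each effect, the first sample size in the grid with power >= 80%
# (NA when no sample size in the grid reaches 80%)
first.n.80 <- apply(power.table >= 80, 2,
                    function(reached) subjects.list[which(reached)[1]])
first.n.80
# Trivial NA, Small 45, Moderate 15, Large 15, Very.Large 5
```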

As can be seen from the table, the number of subjects per group needed to get over 80% statistical power (the chance of detecting a real effect) is well above 300 for Trivial effects (I used regression to estimate it), around 45 to 50 for Small, probably around 10 for Moderate and just over 5 for Large, while Very Large effects are always detected, even with the minimum of 5 subjects. Someone please correct me if I am wrong here.
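
These sample sizes can be cross-checked analytically with `power.t.test()` from base R's stats package, which solves for whichever one of `n`, `delta`, `sd`, `power` and `sig.level` is passed as `NULL`. A minimal sketch, assuming Cohen's d is used as `delta` with `sd = 1`; `n.needed` is a hypothetical name.

```r
# Sample size per group for 80% power in a two-sided, two-sample t test at
# alpha = 0.05; effects are in SD units, so sd = 1 and delta = Cohen's d
effect.magnitudes <- c(Trivial = 0.2, Small = 0.6, Moderate = 1.2,
                       Large = 2, Very.Large = 4)

n.needed <- sapply(effect.magnitudes, function(d) {
  power.t.test(delta = d, sd = 1, sig.level = 0.05, power = 0.80,
               type = "two.sample", alternative = "two.sided")$n
})

round(n.needed, 1)
```

`power.t.test()` assumes equal variances (Student's t test), while `t.test()` defaults to Welch's test, so the analytical values can differ slightly from the simulated percentages. It does, however, give an analytical value for the Trivial case instead of one extrapolated by regression.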

To plot this table with ggplot2, we first need to reshape it into long format:

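library(reshape2)
library(ggplot2)

# Turn the matrix into a data frame with the sample size as an explicit column
significant.effects <- as.data.frame(significant.effects)
significant.effects <- data.frame(sample.size = as.numeric(rownames(significant.effects)),
    significant.effects)
rownames(significant.effects) <- NULL

# Wide (one column per effect) to long (one row per sample size x effect)
significant.effects.long <- melt(significant.effects, id.vars = "sample.size",
    value.name = "Power", variable.name = "Effect.Size")

# Power curves with a dotted reference line at 80% power
gg <- ggplot(significant.effects.long, aes(x = sample.size, y = Power, color = Effect.Size))
gg <- gg + geom_line()
gg <- gg + geom_hline(yintercept = 80, linetype = "dotted", linewidth = 1)  # size = 1 on ggplot2 < 3.4.0
gg <- gg + labs(x = "Subjects per group", y = "Power (% of simulations with p < 0.05)")
gg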

[Figure: simulated power (%) against number of subjects per group, one line per effect size, with a dotted line at 80%]
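
For readers without reshape2, the same wide-to-long reshape can be done in base R. A minimal sketch using the first two rows of the Small and Large columns from the table above; `wide`, `effect.cols` and `long` are hypothetical names, and the rows come out in the same column-by-column order that `melt()` produces.

```r
# Wide format: one column per effect size (values from the table above)
wide <- data.frame(sample.size = c(5, 15),
                   Small = c(12, 34),
                   Large = c(79, 100))

# Long format: one row per sample size x effect size
effect.cols <- setdiff(names(wide), "sample.size")
long <- data.frame(
  sample.size = rep(wide$sample.size, times = length(effect.cols)),
  Effect.Size = factor(rep(effect.cols, each = nrow(wide)), levels = effect.cols),
  Power       = unlist(wide[effect.cols], use.names = FALSE)
)
long
```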

Please refer to the work of Will Hopkins on how to calculate the chances that an effect is trivial, beneficial or harmful (magnitude-based inferences). Next time I may create a table of the mean trivial, beneficial and harmful chances using the same approach.

Wednesday, February 12, 2014
