Kapitel 76 Mixed Models
76.1 To do:
- Unabhängiger T-Test mit einem gemischten Modell (was zwar keinen Sinn macht, aber zum Verständnis hilft)
- Abhängiger T-Test mit einem gemischten Modell (auch wieder nur zur Illustration)
76.3 Simulated Examples
=sample(c("Women", "Men"),size=100, rep=TRUE, prob=c(0.5,0.5))
independent_variable<-1:100
id<-ifelse(independent_variable=="Women", rnorm(length(independent_variable=="Women"), 50, 3), rnorm(length(independent_variable=="Men"), 53, 3))
dependent_variable<-data.frame(id, dependent_variable, independent_variable)
data::pairs.panels(data) psych
::describeBy(dependent_variable~independent_variable, data=data) psych
##
## Descriptive statistics by group
## independent_variable: Men
## vars n mean sd median trimmed mad min max range skew
## dependent_variable 1 49 53.27 3.16 53.61 53.23 3.32 46.2 60.53 14.33 0.06
## kurtosis se
## dependent_variable -0.44 0.45
## ------------------------------------------------------------
## independent_variable: Women
## vars n mean sd median trimmed mad min max range skew
## dependent_variable 1 51 50.59 3.08 51.09 50.52 2.88 45.1 59.9 14.8 0.3
## kurtosis se
## dependent_variable 0.09 0.43
76.3.1 Independent T-Test
Hier klicken für ein Tutorial zum R-Test in R
<-t.test(dependent_variable~independent_variable, data=data, var.equal=TRUE)
res.t.test res.t.test
##
## Two Sample t-test
##
## data: dependent_variable by independent_variable
## t = 4.2979, df = 98, p-value = 4.072e-05
## alternative hypothesis: true difference in means between group Men and group Women is not equal to 0
## 95 percent confidence interval:
## 1.443087 3.918901
## sample estimates:
## mean in group Men mean in group Women
## 53.26886 50.58787
$estimate[2]-res.t.test$estimate[1] res.t.test
## mean in group Women
## -2.680994
76.3.2 lineare Regression
Der Koeffizient der Variable independent_variable für die Antwortsoption “Women* ist die Differenz der Frauen zu den Männern, die Frauen haben im durchschnittlich -2.6986 Punkte. Die P-Wert sind nur gleich, wenn wir beim T-Test annehmen, dass die Varianzen in beiden Gruppen gleich gross sind (var.equal=TRUE). Siehe dazu auch (hier klicken)
<-lm(dependent_variable~independent_variable, data=data)
lm.fitsummary(lm.fit)
##
## Call:
## lm(formula = dependent_variable ~ independent_variable, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.0724 -2.1195 0.3659 2.1521 9.3109
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 53.2689 0.4455 119.576 < 2e-16 ***
## independent_variableWomen -2.6810 0.6238 -4.298 4.07e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.118 on 98 degrees of freedom
## Multiple R-squared: 0.1586, Adjusted R-squared: 0.15
## F-statistic: 18.47 on 1 and 98 DF, p-value: 4.072e-05
76.4 Repeated (Longitudinal) Data
set.seed(12345)
<-rnorm(100, 50, 5)
indep_t0<-indep_t0+rnorm(100, 2, 5)
indep_t1=rnorm(100, 0, 3)+indep_t0*rnorm(100, 1, 0.1)
dep_t0=rnorm(100, 0, 3)+indep_t1*rnorm(100, 1, 0.1)
dep_t1<-1:100
id<-data.frame(id, dep_t0,indep_t0, dep_t1, indep_t1)
data::pairs.panels(data) psych
76.4.1 Mit einem Mixed Model
Dazu müssen wir die Daten in das Long-Format transformieren.
<-data %>%
data_longpivot_longer(
cols = -id,
names_to = c(".value", "time"),
names_sep = '_') %>%
mutate(time_num=ifelse(time=="t0", 0,1))
76.4.2 Time as only variable in the model
= lmerTest::lmer(dep ~ time_num+ (1|id), data=data_long)
lmeModel summary(lmeModel)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: dep ~ time_num + (1 | id)
## Data: data_long
##
## REML criterion at convergence: 1430.7
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.98142 -0.63209 0.00474 0.50888 2.62599
##
## Random effects:
## Groups Name Variance Std.Dev.
## id (Intercept) 42.48 6.518
## Residual 45.31 6.731
## Number of obs: 200, groups: id, 100
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 52.2654 0.9369 160.4347 55.783 <2e-16 ***
## time_num 1.4718 0.9519 99.0000 1.546 0.125
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## time_num -0.508
anova(lmeModel)
## Type III Analysis of Variance Table with Satterthwaite's method
## Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
## time_num 108.31 108.31 1 99 2.3905 0.1253
options(scipen=999)
76.4.3 Time as factor
= lmerTest::lmer(dep ~ indep*factor(time)+ (1|id), data=data_long)
lmeModel summary(lmeModel)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: dep ~ indep * factor(time) + (1 | id)
## Data: data_long
##
## REML criterion at convergence: 1279.4
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.4145 -0.5878 -0.0447 0.6383 2.6971
##
## Random effects:
## Groups Name Variance Std.Dev.
## id (Intercept) 1.565 1.251
## Residual 33.542 5.792
## Number of obs: 200, groups: id, 100
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) -4.99685 5.50285 195.98521 -0.908 0.365
## indep 1.11784 0.10680 195.98660 10.467 <0.0000000000000002
## factor(time)t1 3.34771 6.72926 124.92118 0.497 0.620
## indep:factor(time)t1 -0.08165 0.12854 126.63111 -0.635 0.526
##
## (Intercept)
## indep ***
## factor(time)t1
## indep:factor(time)t1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) indep fct()1
## indep -0.994
## factr(tm)t1 -0.797 0.792
## indp:fct()1 0.806 -0.811 -0.992
anova(lmeModel)
## Type III Analysis of Variance Table with Satterthwaite's method
## Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
## indep 8828.3 8828.3 1 130.35 263.2037 <0.0000000000000002 ***
## factor(time) 8.3 8.3 1 124.92 0.2475 0.6197
## indep:factor(time) 13.5 13.5 1 126.63 0.4035 0.5264
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
options(scipen=999)
76.4.4 Time as numeric 0 / 1 variable
= lmerTest::lmer(dep ~ indep*time_num+ (1|id), data=data_long)
lmeModel summary(lmeModel)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: dep ~ indep * time_num + (1 | id)
## Data: data_long
##
## REML criterion at convergence: 1279.4
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.4145 -0.5878 -0.0447 0.6383 2.6971
##
## Random effects:
## Groups Name Variance Std.Dev.
## id (Intercept) 1.565 1.251
## Residual 33.542 5.792
## Number of obs: 200, groups: id, 100
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) -4.99685 5.50285 195.98521 -0.908 0.365
## indep 1.11784 0.10680 195.98660 10.467 <0.0000000000000002 ***
## time_num 3.34771 6.72926 124.92118 0.497 0.620
## indep:time_num -0.08165 0.12854 126.63111 -0.635 0.526
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) indep tim_nm
## indep -0.994
## time_num -0.797 0.792
## indep:tm_nm 0.806 -0.811 -0.992
anova(lmeModel)
## Type III Analysis of Variance Table with Satterthwaite's method
## Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
## indep 3674.6 3674.6 1 195.99 109.5532 <0.0000000000000002 ***
## time_num 8.3 8.3 1 124.92 0.2475 0.6197
## indep:time_num 13.5 13.5 1 126.63 0.4035 0.5264
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
options(scipen=999)
76.5 Vergleich mit drei linearen Modellen
<-lm(dep_t0~indep_t0, data=data)
lm.fitsummary(lm.fit)
##
## Call:
## lm(formula = dep_t0 ~ indep_t0, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.2703 -2.3703 -0.5878 3.2629 14.1012
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -5.00544 4.48699 -1.116 0.267
## indep_t0 1.11800 0.08708 12.838 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.829 on 98 degrees of freedom
## Multiple R-squared: 0.6271, Adjusted R-squared: 0.6233
## F-statistic: 164.8 on 1 and 98 DF, p-value: < 0.00000000000000022
<-lm(dep_t1~indep_t1, data=data)
lm.fit2summary(lm.fit2)
##
## Call:
## lm(formula = dep_t1 ~ indep_t1, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -20.5999 -4.2983 0.1411 4.4989 16.3423
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.70092 4.70321 -0.362 0.718
## indep_t1 1.03715 0.08705 11.914 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.848 on 98 degrees of freedom
## Multiple R-squared: 0.5916, Adjusted R-squared: 0.5874
## F-statistic: 141.9 on 1 and 98 DF, p-value: < 0.00000000000000022
<-lm(dep_t1~dep_t0,data=data)
lm.fit3summary(lm.fit3)
##
## Call:
## lm(formula = dep_t1 ~ dep_t0, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -20.674 -6.520 0.861 5.884 25.470
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.881 6.237 2.867 0.00507 **
## dep_t0 0.686 0.118 5.813 0.0000000766 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.24 on 98 degrees of freedom
## Multiple R-squared: 0.2564, Adjusted R-squared: 0.2488
## F-statistic: 33.79 on 1 and 98 DF, p-value: 0.0000000766
mean(data$dep_t1-data$dep_t0, na.rm=TRUE)
## [1] 1.471782
t.test(dep_t1,dep_t0,data=data, paired=TRUE)
##
## Paired t-test
##
## data: dep_t1 and dep_t0
## t = 1.5461, df = 99, p-value = 0.1253
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## -0.4170253 3.3605891
## sample estimates:
## mean difference
## 1.471782