Kapitel 76 Mixed Models

76.1 To do:

Unabhängiger T-Test mit einem gemischten Modell (was zwar keinen Sinn macht, aber zum Verständnis hilft)
Abhängiger T-Test mit einem gemischten Modell (auch wieder nur zur Illustration)

76.1.1 Pakete für dieses Kapitel

Wir benötigen folgende Pakete:

library(tidyverse)
library(ggplot2)
library(BlandAltmanLeh)
library(emo) # nur für ein paar smileys
library(lme4)
library(lmerTest)

76.2 Links zu Tutorials

76.3 Simulated Examples

independent_variable=sample(c("Women", "Men"),size=100, rep=TRUE, prob=c(0.5,0.5))
id<-1:100
dependent_variable<-ifelse(independent_variable=="Women", rnorm(length(independent_variable=="Women"), 50, 3), rnorm(length(independent_variable=="Men"), 53, 3))
data<-data.frame(id, dependent_variable, independent_variable)
psych::pairs.panels(data)

psych::describeBy(dependent_variable~independent_variable, data=data)

## 
##  Descriptive statistics by group 
## independent_variable: Men
##                    vars  n  mean   sd median trimmed  mad  min   max range skew
## dependent_variable    1 49 53.27 3.16  53.61   53.23 3.32 46.2 60.53 14.33 0.06
##                    kurtosis   se
## dependent_variable    -0.44 0.45
## ------------------------------------------------------------ 
## independent_variable: Women
##                    vars  n  mean   sd median trimmed  mad  min  max range skew
## dependent_variable    1 51 50.59 3.08  51.09   50.52 2.88 45.1 59.9  14.8  0.3
##                    kurtosis   se
## dependent_variable     0.09 0.43

76.3.1 Independent T-Test

Hier klicken für ein Tutorial zum R-Test in R

res.t.test<-t.test(dependent_variable~independent_variable, data=data, var.equal=TRUE)
res.t.test

## 
##  Two Sample t-test
## 
## data:  dependent_variable by independent_variable
## t = 4.2979, df = 98, p-value = 4.072e-05
## alternative hypothesis: true difference in means between group Men and group Women is not equal to 0
## 95 percent confidence interval:
##  1.443087 3.918901
## sample estimates:
##   mean in group Men mean in group Women 
##            53.26886            50.58787

res.t.test$estimate[2]-res.t.test$estimate[1]

## mean in group Women 
##           -2.680994

76.3.2 lineare Regression

Der Koeffizient der Variable independent_variable für die Antwortsoption “Women* ist die Differenz der Frauen zu den Männern, die Frauen haben im durchschnittlich -2.6986 Punkte. Die P-Wert sind nur gleich, wenn wir beim T-Test annehmen, dass die Varianzen in beiden Gruppen gleich gross sind (var.equal=TRUE). Siehe dazu auch (hier klicken)

lm.fit<-lm(dependent_variable~independent_variable, data=data)
summary(lm.fit)

## 
## Call:
## lm(formula = dependent_variable ~ independent_variable, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.0724 -2.1195  0.3659  2.1521  9.3109 
## 
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                53.2689     0.4455 119.576  < 2e-16 ***
## independent_variableWomen  -2.6810     0.6238  -4.298 4.07e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.118 on 98 degrees of freedom
## Multiple R-squared:  0.1586, Adjusted R-squared:   0.15 
## F-statistic: 18.47 on 1 and 98 DF,  p-value: 4.072e-05

76.4 Repeated (Longitudinal) Data

set.seed(12345)
indep_t0<-rnorm(100, 50, 5)
indep_t1<-indep_t0+rnorm(100, 2, 5)
dep_t0=rnorm(100, 0, 3)+indep_t0*rnorm(100, 1, 0.1)
dep_t1=rnorm(100, 0, 3)+indep_t1*rnorm(100, 1, 0.1)
id<-1:100
data<-data.frame(id, dep_t0,indep_t0, dep_t1, indep_t1)
psych::pairs.panels(data)

76.4.1 Mit einem Mixed Model

Dazu müssen wir die Daten in das Long-Format transformieren.

data_long<-data %>% 
  pivot_longer(
        cols = -id,
        names_to = c(".value", "time"),
        names_sep = '_') %>% 
  mutate(time_num=ifelse(time=="t0", 0,1))

76.4.2 Time as only variable in the model

lmeModel = lmerTest::lmer(dep ~ time_num+ (1|id), data=data_long)
summary(lmeModel)

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: dep ~ time_num + (1 | id)
##    Data: data_long
## 
## REML criterion at convergence: 1430.7
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -1.98142 -0.63209  0.00474  0.50888  2.62599 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  id       (Intercept) 42.48    6.518   
##  Residual             45.31    6.731   
## Number of obs: 200, groups:  id, 100
## 
## Fixed effects:
##             Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept)  52.2654     0.9369 160.4347  55.783   <2e-16 ***
## time_num      1.4718     0.9519  99.0000   1.546    0.125    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##          (Intr)
## time_num -0.508

anova(lmeModel)

## Type III Analysis of Variance Table with Satterthwaite's method
##          Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
## time_num 108.31  108.31     1    99  2.3905 0.1253

options(scipen=999)

76.4.3 Time as factor

lmeModel = lmerTest::lmer(dep ~ indep*factor(time)+ (1|id), data=data_long)
summary(lmeModel)

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: dep ~ indep * factor(time) + (1 | id)
##    Data: data_long
## 
## REML criterion at convergence: 1279.4
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.4145 -0.5878 -0.0447  0.6383  2.6971 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  id       (Intercept)  1.565   1.251   
##  Residual             33.542   5.792   
## Number of obs: 200, groups:  id, 100
## 
## Fixed effects:
##                       Estimate Std. Error        df t value            Pr(>|t|)
## (Intercept)           -4.99685    5.50285 195.98521  -0.908               0.365
## indep                  1.11784    0.10680 195.98660  10.467 <0.0000000000000002
## factor(time)t1         3.34771    6.72926 124.92118   0.497               0.620
## indep:factor(time)t1  -0.08165    0.12854 126.63111  -0.635               0.526
##                         
## (Intercept)             
## indep                ***
## factor(time)t1          
## indep:factor(time)t1    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) indep  fct()1
## indep       -0.994              
## factr(tm)t1 -0.797  0.792       
## indp:fct()1  0.806 -0.811 -0.992

anova(lmeModel)

## Type III Analysis of Variance Table with Satterthwaite's method
##                    Sum Sq Mean Sq NumDF  DenDF  F value              Pr(>F)    
## indep              8828.3  8828.3     1 130.35 263.2037 <0.0000000000000002 ***
## factor(time)          8.3     8.3     1 124.92   0.2475              0.6197    
## indep:factor(time)   13.5    13.5     1 126.63   0.4035              0.5264    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

options(scipen=999)

76.4.4 Time as numeric 0 / 1 variable

lmeModel = lmerTest::lmer(dep ~ indep*time_num+ (1|id), data=data_long)
summary(lmeModel)

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: dep ~ indep * time_num + (1 | id)
##    Data: data_long
## 
## REML criterion at convergence: 1279.4
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.4145 -0.5878 -0.0447  0.6383  2.6971 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  id       (Intercept)  1.565   1.251   
##  Residual             33.542   5.792   
## Number of obs: 200, groups:  id, 100
## 
## Fixed effects:
##                 Estimate Std. Error        df t value            Pr(>|t|)    
## (Intercept)     -4.99685    5.50285 195.98521  -0.908               0.365    
## indep            1.11784    0.10680 195.98660  10.467 <0.0000000000000002 ***
## time_num         3.34771    6.72926 124.92118   0.497               0.620    
## indep:time_num  -0.08165    0.12854 126.63111  -0.635               0.526    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) indep  tim_nm
## indep       -0.994              
## time_num    -0.797  0.792       
## indep:tm_nm  0.806 -0.811 -0.992

anova(lmeModel)

## Type III Analysis of Variance Table with Satterthwaite's method
##                Sum Sq Mean Sq NumDF  DenDF  F value              Pr(>F)    
## indep          3674.6  3674.6     1 195.99 109.5532 <0.0000000000000002 ***
## time_num          8.3     8.3     1 124.92   0.2475              0.6197    
## indep:time_num   13.5    13.5     1 126.63   0.4035              0.5264    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

options(scipen=999)

76.5 Vergleich mit drei linearen Modellen

lm.fit<-lm(dep_t0~indep_t0, data=data)
summary(lm.fit)

## 
## Call:
## lm(formula = dep_t0 ~ indep_t0, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -13.2703  -2.3703  -0.5878   3.2629  14.1012 
## 
## Coefficients:
##             Estimate Std. Error t value            Pr(>|t|)    
## (Intercept) -5.00544    4.48699  -1.116               0.267    
## indep_t0     1.11800    0.08708  12.838 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.829 on 98 degrees of freedom
## Multiple R-squared:  0.6271, Adjusted R-squared:  0.6233 
## F-statistic: 164.8 on 1 and 98 DF,  p-value: < 0.00000000000000022

lm.fit2<-lm(dep_t1~indep_t1, data=data)
summary(lm.fit2)

## 
## Call:
## lm(formula = dep_t1 ~ indep_t1, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -20.5999  -4.2983   0.1411   4.4989  16.3423 
## 
## Coefficients:
##             Estimate Std. Error t value            Pr(>|t|)    
## (Intercept) -1.70092    4.70321  -0.362               0.718    
## indep_t1     1.03715    0.08705  11.914 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.848 on 98 degrees of freedom
## Multiple R-squared:  0.5916, Adjusted R-squared:  0.5874 
## F-statistic: 141.9 on 1 and 98 DF,  p-value: < 0.00000000000000022

lm.fit3<-lm(dep_t1~dep_t0,data=data)
summary(lm.fit3)

## 
## Call:
## lm(formula = dep_t1 ~ dep_t0, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -20.674  -6.520   0.861   5.884  25.470 
## 
## Coefficients:
##             Estimate Std. Error t value     Pr(>|t|)    
## (Intercept)   17.881      6.237   2.867      0.00507 ** 
## dep_t0         0.686      0.118   5.813 0.0000000766 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.24 on 98 degrees of freedom
## Multiple R-squared:  0.2564, Adjusted R-squared:  0.2488 
## F-statistic: 33.79 on 1 and 98 DF,  p-value: 0.0000000766

mean(data$dep_t1-data$dep_t0, na.rm=TRUE)

## [1] 1.471782

t.test(dep_t1,dep_t0,data=data, paired=TRUE)

## 
##  Paired t-test
## 
## data:  dep_t1 and dep_t0
## t = 1.5461, df = 99, p-value = 0.1253
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -0.4170253  3.3605891
## sample estimates:
## mean difference 
##        1.471782