# Model Testing in R

Model testing is one of the main tasks of any econometric analysis. This post illustrates how to calculate basic test statistics in R. It also covers the calculation of heteroskedasticity-robust standard errors.

## Data

To illustrate the calculation of test statistics in R, let’s use the *wage1* dataset from the `wooldridge` package and estimate a basic Mincer earnings function. This standard specification of earnings models explains the natural log of average hourly earnings `lwage` by years of education `educ` and experience `exper`. The standard specification also includes the squared values of experience `expersq` to take into account potential decreasing marginal effects.
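Written out, the specification described above corresponds to the regression equation below. The coefficient symbols $\beta_0, \dots, \beta_3$ and the error term $u$ are notation assumed here for illustration; they do not appear in the dataset.

$$
\text{lwage} = \beta_0 + \beta_1\,\text{educ} + \beta_2\,\text{exper} + \beta_3\,\text{expersq} + u
$$

In this notation, decreasing marginal effects of experience would show up as $\beta_2 > 0$ together with $\beta_3 < 0$.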

```
# Load dataset
library(wooldridge)
data("wage1")
# Estimate a model
model <- lm(lwage ~ educ + exper + expersq, data = wage1)
```

## t test

t tests are used to assess the statistical significance of single variables. In R, the t values for each variable in a regression model are usually already calculated by the `summary` function.

`summary(model)`

```
##
## Call:
## lm(formula = lwage ~ educ + exper + expersq, data = wage1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.96387 -0.29375 -0.04009 0.29497 1.30216
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.1279975 0.1059323 1.208 0.227
## educ 0.0903658 0.0074680 12.100 < 2e-16 ***
## exper 0.0410089 0.0051965 7.892 1.77e-14 ***
## expersq -0.0007136 0.0001158 -6.164 1.42e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4459 on 522 degrees of freedom
## Multiple R-squared: 0.3003, Adjusted R-squared: 0.2963
## F-statistic: 74.67 on 3 and 522 DF, p-value: < 2.2e-16
```

The t values for our benchmark model indicate that, except for the constant, all variables are statistically significant. You might think about dropping the intercept term at this point, but let’s forget about this for the moment.
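To make clear where the numbers in the `t value` and `Pr(>|t|)` columns come from, the following sketch recomputes them by hand from the estimates and their standard errors. The object names `coefs`, `t_manual` and `p_manual` are chosen here for illustration only, and the block re-estimates the model so that it runs on its own.

```r
# Load the data and re-estimate the benchmark model from above
library(wooldridge)
data("wage1")
model <- lm(lwage ~ educ + exper + expersq, data = wage1)

# Extract the coefficient table as a numeric matrix
coefs <- coef(summary(model))

# A t value is the estimate divided by its standard error
t_manual <- coefs[, "Estimate"] / coefs[, "Std. Error"]

# Two-sided p value from the t distribution with the residual degrees of freedom
p_manual <- 2 * pt(abs(t_manual), df = df.residual(model), lower.tail = FALSE)

# Show the hand-calculated values next to each other
cbind(t_manual, p_manual)
```

The resulting columns should agree with the `t value` and `Pr(>|t|)` columns of the `summary` output.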

## F test

F tests can be used to check whether one or multiple variables in a model are statistically significant. Basically, the test compares two models with each other, where one model is a special case of the other. This means that we compare a model with more variables, the so-called *unrestricted* model, to a model with fewer but otherwise the same variables, i.e. the *restricted* or nested model. If the additional explanatory power of the unrestricted model is sufficiently high, the additional variables are jointly significant.
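For reference, the comparison described above is commonly expressed with the textbook F statistic below. The notation is assumed here and not taken from the post: $RSS_r$ and $RSS_{ur}$ are the residual sums of squares of the restricted and unrestricted model, $q$ is the number of restrictions (the number of additional variables), $n$ is the number of observations and $k$ is the number of regressors in the unrestricted model.

$$
F = \frac{(RSS_r - RSS_{ur}) / q}{RSS_{ur} / (n - k - 1)}
$$

Under the null hypothesis that the additional coefficients are all zero, and given the classical linear model assumptions, this statistic follows an $F$ distribution with $q$ and $n - k - 1$ degrees of freedom.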

In our example, we add the `tenure` variable and its square `tenursq` to the model equation. This is the unrestricted model, which we have to estimate before we can calculate the F test.

```
# Estimate unrestricted model
model_unres <- lm(lwage ~ educ + exper + expersq + tenure + tenursq, data = wage1)
summary(model_unres)
```

```
##
## Call:
## lm(formula = lwage ~ educ + exper + expersq + tenure + tenursq,
## data = wage1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.96984 -0.25313 -0.03204 0.27141 1.28302
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.2015715 0.1014697 1.987 0.0475 *
## educ 0.0845258 0.0071614 11.803 < 2e-16 ***
## exper 0.0293010 0.0052885 5.540 4.80e-08 ***
## expersq -0.0005918 0.0001141 -5.189 3.04e-07 ***
## tenure 0.0371222 0.0072432 5.125 4.20e-07 ***
## tenursq -0.0006156 0.0002495 -2.468 0.0139 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.425 on 520 degrees of freedom
## Multiple R-squared: 0.3669, Adjusted R-squared: 0.3608
## F-statistic: 60.26 on 5 and 520 DF, p-value: < 2.2e-16
```

The t values of the new model indicate that all variables, including the constant, are statistically significant and that `tenure` provides additional explanatory power to the model. Since `tenure` also enters in its squared form, we are interested in the joint significance of the tenure terms. This can be checked with an F test. In R, we can use the `anova` function for this, where we provide the two estimated models as arguments.

`anova(model, model_unres)`

```
## Analysis of Variance Table
##
## Model 1: lwage ~ educ + exper + expersq
## Model 2: lwage ~ educ + exper + expersq + tenure + tenursq
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 522 103.790
## 2 520 93.911 2 9.8791 27.351 5.079e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

As you can see from the output, the tenure terms are jointly significant: the F statistic is 27.351 with a p value far below 0.001.
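The F statistic reported by `anova` can also be reproduced by hand from the residual sums of squares of the two models. The following is a minimal sketch with illustrative object names (`rss_r`, `rss_ur`, `q`, `df_ur`, `f_manual`, `p_manual`). It relies on `deviance()` returning the residual sum of squares for `lm` objects, and it re-estimates both models so that it runs on its own.

```r
# Load the data and re-estimate both models from above
library(wooldridge)
data("wage1")
model <- lm(lwage ~ educ + exper + expersq, data = wage1)
model_unres <- lm(lwage ~ educ + exper + expersq + tenure + tenursq, data = wage1)

# Residual sums of squares of the restricted and unrestricted model
rss_r <- deviance(model)
rss_ur <- deviance(model_unres)

# Number of restrictions and residual degrees of freedom of the unrestricted model
q <- df.residual(model) - df.residual(model_unres)
df_ur <- df.residual(model_unres)

# F statistic: relative reduction in RSS per restriction
f_manual <- ((rss_r - rss_ur) / q) / (rss_ur / df_ur)

# p value from the upper tail of the F distribution
p_manual <- pf(f_manual, df1 = q, df2 = df_ur, lower.tail = FALSE)

c(F = f_manual, p = p_manual)
```

The values should match the `F` and `Pr(>F)` entries in the second row of the `anova` table.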

## Literature

Kennedy, P. (2014). *A Guide to Econometrics* (6th ed.). Malden (Mass.): Blackwell Publishing.