Model-Based Inference

Since Matérn (1960) presented his influential work on model-based inference in forest surveys, there has been debate over whether classical design-based inference can be replaced by model-based inference. The assumption underlying model-based inference is that a model generates the values of the population elements as random variables, here the linear model \boldsymbol{y}=\boldsymbol{X\beta}+\boldsymbol{\epsilon}. Once the model parameters are estimated, the estimated model, \widehat{\boldsymbol{y}}=\boldsymbol{X}\widehat{\boldsymbol{\beta}}, can be used to predict the population quantities of interest from the auxiliary data, which in standard cases are assumed to be available for all population elements. Introducing \boldsymbol{1} as an N \times 1 vector of ones, the random population total \tau=\boldsymbol{1}^T\boldsymbol{y}=\boldsymbol{1}^T\boldsymbol{X\beta}+\boldsymbol{1}^T\boldsymbol{\epsilon} may be predicted as

\hat{\tau} = \boldsymbol{1}^T\boldsymbol{X}\widehat{\boldsymbol{\beta}}

The model is often called a superpopulation model, of which the actual population is one realization. Since the individual values of the population elements are random variables, so is the population total or mean. Estimators (sometimes termed predictors in model-based inference) are random variables even if the sample is selected by non-random principles. The variance of the predictor \hat{\tau} is comparatively simple to derive because it involves no residual terms; uncertainty enters only through the estimation of the model parameters.
The variance of the estimator is

V(\hat{\tau}) = \boldsymbol{1}^T\boldsymbol{X}cov(\widehat{\boldsymbol{\beta}})\boldsymbol{X}^T\boldsymbol{1}

The matrix cov(\widehat{\boldsymbol{\beta}}) is the variance-covariance matrix of the model parameter estimates.
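The prediction of the total and its variance can be illustrated numerically. The following is a minimal sketch, not a definitive implementation. It assumes that the model is fitted by ordinary least squares with independent, homoscedastic errors, so that cov(\widehat{\boldsymbol{\beta}}) is estimated by \hat{\sigma}^2(\boldsymbol{X}_s^T\boldsymbol{X}_s)^{-1}, where \boldsymbol{X}_s holds the auxiliary data of the sampled elements. The text above does not prescribe this fitting method, and the function name predict_total, its arguments, and the simulated data are hypothetical.

```python
import numpy as np


def predict_total(X_sample, y_sample, X_pop):
    """Predict the population total 1^T X beta_hat and its variance.

    Assumption (not from the text): OLS fit with homoscedastic,
    independent errors, so cov(beta_hat) = sigma^2 (X_s^T X_s)^{-1}.
    """
    n, p = X_sample.shape
    xtx_inv = np.linalg.inv(X_sample.T @ X_sample)
    beta_hat = xtx_inv @ X_sample.T @ y_sample

    # Unbiased residual variance estimate under the assumed model.
    resid = y_sample - X_sample @ beta_hat
    sigma2_hat = resid @ resid / (n - p)
    cov_beta = sigma2_hat * xtx_inv

    # 1^T X is the vector of column totals of the population design matrix.
    col_totals = X_pop.sum(axis=0)
    tau_hat = col_totals @ beta_hat
    var_tau = col_totals @ cov_beta @ col_totals
    return tau_hat, var_tau


# Simulated illustration: N = 1000 elements, intercept plus one auxiliary variable.
rng = np.random.default_rng(0)
N, n = 1000, 50
X_pop = np.column_stack([np.ones(N), rng.uniform(0.0, 10.0, size=N)])
y_pop = X_pop @ np.array([2.0, 1.5]) + rng.normal(0.0, 1.0, size=N)

# The sample need not be random under model-based inference; here the first n elements.
tau_hat, var_tau = predict_total(X_pop[:n], y_pop[:n], X_pop)
print(tau_hat, var_tau)
```

Only the column totals \boldsymbol{1}^T\boldsymbol{X} of the auxiliary data enter the computation, so the variance is a quadratic form in those totals and the N \times N matrix \boldsymbol{X}cov(\widehat{\boldsymbol{\beta}})\boldsymbol{X}^T never has to be formed.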