Mar 16, 2010

Dummy Variable, Fixed and Random Effects

Dummy variables are sometimes used in the context of panel (or longitudinal) data: observations on a cross-section of individuals or firms, say, over time. In this context it is often assumed that the intercept varies across the N cross-sectional units and/or across the T time periods. In the general case (N-1)+(T-1) dummies can be used for this, with computational short-cuts available to avoid having to run a regression with all these extra variables. This way of analyzing panel data is called the Fixed Effects Model. The dummy variable coefficients reflect ignorance: they are inserted merely for the purpose of measuring shifts in the regression line arising from unknown variables. Some researchers feel that this type of ignorance should be treated in a fashion similar to the general ignorance represented by the error term, and have accordingly proposed the Random Effects, Variance Components, or Error Components model.
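
To make the computational short-cut concrete, here is a minimal sketch on simulated data. The panel sizes, the true slope of 2.0, and all variable names are assumptions made up for illustration. It compares the regression with (N-1)+(T-1) dummies against a regression on data from which unit means and time means have been subtracted; for a balanced panel the two slopes should coincide.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 100, 5                           # hypothetical balanced panel
unit = np.repeat(np.arange(N), T)       # unit index of each observation
time = np.tile(np.arange(T), N)         # period index of each observation
alpha = rng.normal(size=N)              # unit-specific intercept shifts
gamma = rng.normal(size=T)              # period-specific intercept shifts
x = rng.normal(size=N * T) + alpha[unit] + gamma[time]
y = 2.0 * x + alpha[unit] + gamma[time] + rng.normal(size=N * T)

# (a) Dummy variable regression: intercept, x, N-1 unit dummies, T-1 time dummies.
unit_dummies = (unit[:, None] == np.arange(1, N)).astype(float)
time_dummies = (time[:, None] == np.arange(1, T)).astype(float)
X = np.column_stack([np.ones(N * T), x, unit_dummies, time_dummies])
beta_lsdv = np.linalg.lstsq(X, y, rcond=None)[0][1]

# (b) Short-cut: remove unit and time means, then run a one-variable regression.
def two_way_demean(v):
    unit_mean = np.bincount(unit, weights=v) / T
    time_mean = np.bincount(time, weights=v) / N
    return v - unit_mean[unit] - time_mean[time] + v.mean()

x_w, y_w = two_way_demean(x), two_way_demean(y)
beta_within = (x_w @ y_w) / (x_w @ x_w)

print(beta_lsdv, beta_within)
```

The double-demeaning formula relies on the panel being balanced; with missing observations the dummy regression is still valid but this particular short-cut is not.
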
Which of the fixed effects and the random effects models is better? This depends on the context of the data and on what the results are to be used for. If the data exhaust the population (say observations on all firms producing automobiles), then the fixed effects approach, which produces results conditional on the units in the dataset, is reasonable. If the data are a drawing of observations from a large population (say a thousand individuals in a city many times that size), and we wish to draw inferences regarding other members of that population, the fixed effects model is no longer reasonable; in this context, use of the random effects model has the advantage that it saves a lot of degrees of freedom, since it estimates only the variance of the unit effects rather than a separate intercept for every unit.

The random effects model has a major drawback, however: it assumes that the random error associated with each cross-section unit is uncorrelated with the regressors, something that is not likely to be the case. Suppose, for example, that wages are being regressed on schooling for a large set of individuals, and that a missing variable, ability, is thought to affect the intercept; since schooling and ability are likely to be correlated, modeling this as a random effect will create correlation between the error and the regressor schooling (whereas modeling it as a fixed effect will not). The result is bias in the coefficient estimates from the random effects model. This may explain why the slope estimates from the fixed and random effects models are often so different.
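
The argument can be written out in a few lines. The notation is assumed here for illustration and does not come from the post: y is the wage, x is schooling, alpha_i is the unobserved ability effect, and epsilon is an idiosyncratic error taken to be uncorrelated with x.

```latex
% Error-components model (notation assumed for illustration)
y_{it} = \beta x_{it} + \alpha_i + \varepsilon_{it},
  \qquad i = 1,\dots,N,\; t = 1,\dots,T

% Random effects folds \alpha_i into a composite error
u_{it} = \alpha_i + \varepsilon_{it}

% With \operatorname{Cov}(x_{it}, \varepsilon_{it}) = 0, correlation between
% schooling and ability carries straight into the composite error
\operatorname{Cov}(x_{it}, u_{it}) = \operatorname{Cov}(x_{it}, \alpha_i) \neq 0

% Fixed effects subtracts unit means, which removes \alpha_i altogether
y_{it} - \bar{y}_i = \beta \,(x_{it} - \bar{x}_i) + (\varepsilon_{it} - \bar{\varepsilon}_i)
```

The last line also shows the price of the fixed effects approach: only within-unit variation in x is used, so a regressor that never changes for a given unit drops out.
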
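A Hausman test for correlation between the error and the regressors can be used to check whether the random effects model is appropriate. Under the null hypothesis of no correlation between the error and the regressors, the random effects model is applicable and its estimated GLS estimator is consistent and efficient. The fixed effects estimator is consistent under both the null and the alternative, though it is inefficient under the null. The test therefore compares the two sets of estimates: a difference too large to be attributed to sampling error is evidence against the random effects model.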
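
As a rough sketch of how such a test can be computed, the code below simulates the wage and schooling example and calculates the Hausman statistic for a single regressor. Everything in it is an assumption made for illustration: the data-generating process (including schooling that varies over time), the function names, and the particular way the variance components are estimated (from the within and between regressions). It is not a substitute for a tested econometrics package.

```python
import numpy as np

CHI2_1_CRIT_5PCT = 3.841  # 5% critical value of chi-squared with 1 degree of freedom

def simulate(corr, seed=0, N=500, T=5, beta=1.0):
    """Hypothetical balanced panel; `corr` links schooling to unobserved ability."""
    rng = np.random.default_rng(seed)
    unit = np.repeat(np.arange(N), T)
    ability = rng.normal(size=N)
    schooling = corr * ability[unit] + rng.normal(size=N * T)
    wage = beta * schooling + ability[unit] + rng.normal(size=N * T)
    return wage, schooling, unit

def fe_re_hausman(y, x, unit):
    """FE slope, RE slope and Hausman statistic (one regressor, balanced panel)."""
    N = unit.max() + 1
    T = len(y) // N
    counts = np.bincount(unit)
    xbar = np.bincount(unit, weights=x) / counts   # unit means of x
    ybar = np.bincount(unit, weights=y) / counts   # unit means of y

    # Fixed effects (within) estimator and its variance.
    xw, yw = x - xbar[unit], y - ybar[unit]
    b_fe = (xw @ yw) / (xw @ xw)
    s2_e = np.sum((yw - b_fe * xw) ** 2) / (N * (T - 1) - 1)
    var_fe = s2_e / (xw @ xw)

    # Between regression on unit means; its residual variance estimates
    # sigma_alpha^2 + sigma_e^2 / T.
    Xb = np.column_stack([np.ones(N), xbar])
    resid_b = ybar - Xb @ np.linalg.lstsq(Xb, ybar, rcond=None)[0]
    s2_b = resid_b @ resid_b / (N - 2)

    # Random effects by feasible GLS, implemented as OLS on quasi-demeaned data.
    theta = 1.0 - np.sqrt(min(1.0, s2_e / (T * s2_b)))
    Xq = np.column_stack([np.full(len(y), 1.0 - theta), x - theta * xbar[unit]])
    yq = y - theta * ybar[unit]
    b_re = np.linalg.lstsq(Xq, yq, rcond=None)[0][1]
    var_re = s2_e * np.linalg.inv(Xq.T @ Xq)[1, 1]

    # Hausman statistic: squared difference scaled by the difference in variances.
    hausman = (b_fe - b_re) ** 2 / (var_fe - var_re)
    return b_fe, b_re, hausman

for corr in (0.0, 1.0):
    wage, schooling, unit = simulate(corr)
    b_fe, b_re, h = fe_re_hausman(wage, schooling, unit)
    print(f"corr={corr}: FE={b_fe:.3f}  RE={b_re:.3f}  "
          f"Hausman={h:.2f}  reject RE at 5%: {h > CHI2_1_CRIT_5PCT}")
```

With several regressors the same idea applies, but the statistic becomes a quadratic form in the vector of coefficient differences, and the degrees of freedom equal the number of coefficients compared.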

1 comment:

Unknown said...

“the dummy variable coefficients reflect ignorance” This opinion is funny.
