Feb 17, 2010

The Story of Maximum Likelihood

The theory of maximum likelihood is very beautiful indeed: a conceptually simple approach to an amazingly broad collection of problems. This theory provides a simple recipe that purports to lead to the optimum solution for all parametric problems and beyond, and not only promises an optimum estimate, but also a simple all-purpose assessment of its accuracy. And all this comes with no need for the specification of a priori probabilities, and no complicated derivation of distributions. Furthermore, it is capable of being automated in modern computers and extended to any number of dimensions. Maximum-likelihood estimation was recommended, analyzed and vastly popularized by R. A. Fisher between 1912 and 1922 (although it had been used earlier by Gauss, Laplace, Thiele, and F. Y. Edgeworth). Reviews of the development of maximum likelihood have been provided by a number of authors.


When we fit an analysis of variance or linear regression model, we typically estimate its parameters using the principle of least squares. The idea of least squares is that we choose parameter estimates that minimize the average squared difference between observed and predicted values. That is, we maximize the fit of the model to the data by choosing the model that is closest, on average, to the data.
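To make this concrete, here is a minimal sketch of least-squares estimation for a simple straight-line model, written in Python with NumPy; the x and y values are invented purely for illustration.

import numpy as np

# Invented data for illustration: y is roughly 2*x plus noise.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Design matrix with an intercept column.
X = np.column_stack([np.ones_like(x), x])

# Least squares: choose the coefficients that minimize the sum of
# squared differences between observed and predicted values.
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

residuals = y - X @ b_hat
print("estimated intercept and slope:", b_hat)
print("residual sum of squares:", np.sum(residuals ** 2))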

For many other procedures such as logistic, Poisson, and proportional hazards regression, least squares usually cannot be used as an estimation method. Instead, most often we turn to the method of maximum likelihood. In maximum likelihood estimation, we search over all possible sets of parameter values for a specified model to find the set of values for which the observed sample was most likely. That is, we find the set of parameter values that, given a model, were most likely to have given us the data that we have in hand.
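As a rough sketch of that search, the following snippet estimates a single Poisson rate by evaluating the log-likelihood over a grid of candidate values and keeping the value under which the observed counts were most likely; the counts and the grid are invented for illustration.

import numpy as np
from scipy.special import gammaln

# Invented counts for illustration.
counts = np.array([3, 5, 2, 4, 6, 3, 4])

def poisson_loglik(lam, y):
    # Poisson log-likelihood: sum over observations of
    # y*log(lam) - lam - log(y!).
    return np.sum(y * np.log(lam) - lam - gammaln(y + 1))

# Search over candidate rate values and keep the one that makes
# the observed counts most likely.
candidates = np.linspace(0.1, 10.0, 1000)
logliks = np.array([poisson_loglik(lam, counts) for lam in candidates])
lam_hat = candidates[np.argmax(logliks)]

print("maximum likelihood estimate of the rate:", lam_hat)
print("sample mean (the exact answer for this model):", counts.mean())

In practice the maximization is done with a numerical optimizer rather than a grid, but the idea of searching for the most likely parameter values is the same.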

By way of analogy, imagine that you are on a jury for a civil trial. Four things are presented to you in the course of the trial: 1) charges that specify the purpose of the trial, 2) the plaintiff's version of the truth, 3) the defendant's version of the truth, and 4) evidence. Your task on the jury is to decide, in the context of the specified charges and given the evidence presented, which of the two versions of the truth most likely occurred. You are asked to choose which version of the truth was most likely to have resulted in the evidence that was observed and presented.

Analogously, in statistical analysis with maximum likelihood, we are given: 1) a specified conceptual, mathematical, and statistical model, 2) one set of values for the parameters of the model, 3) another set of values for the parameters of the model, and 4) observed data. We want to find the set of values for the parameters of the model that are most likely to have resulted in the data that were actually observed. (We do this by searching over all possible sets of values for the parameters, not just two sets.)
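Continuing the analogy in code, this sketch compares the same invented counts from the snippet above under just two candidate values of the Poisson rate and keeps whichever one makes the data more likely; both candidate values are assumptions of this example.

import numpy as np
from scipy.special import gammaln

counts = np.array([3, 5, 2, 4, 6, 3, 4])  # same invented counts as above

def poisson_loglik(lam, y):
    return np.sum(y * np.log(lam) - lam - gammaln(y + 1))

# Two "versions of the truth": candidate rates 2.0 and 4.0.
ll_a = poisson_loglik(2.0, counts)
ll_b = poisson_loglik(4.0, counts)
print("log-likelihood at rate 2.0:", ll_a)
print("log-likelihood at rate 4.0:", ll_b)
print("more likely candidate rate:", 2.0 if ll_a > ll_b else 4.0)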

In analysis of variance or linear regression, we measure the fit of the model to the data using the regression sum of squares. With maximum likelihood, the likelihood itself measures the fit of the model to the data; therefore, we want to choose parameter values that maximize the likelihood. In analysis of variance or linear regression, if we want to compare the fit of two models, we form the ratio of two mean squares to yield an F-test. With maximum likelihood, we compare two models by forming the ratio of two likelihoods to yield a chi-square test.
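As a rough sketch of that chi-square comparison, the snippet below carries out a likelihood-ratio test on invented Poisson counts from two groups, asking whether one common rate (the simpler model) fits as well as separate rates for each group (the fuller model); the data and the single degree of freedom are assumptions of this example, not part of the discussion above.

import numpy as np
from scipy.special import gammaln
from scipy.stats import chi2

# Invented counts from two groups, for illustration only.
group_a = np.array([3, 5, 2, 4, 6])
group_b = np.array([7, 9, 6, 8, 10])

def poisson_loglik(lam, y):
    return np.sum(y * np.log(lam) - lam - gammaln(y + 1))

# Simpler model: one common rate; its maximum likelihood estimate is the overall mean.
all_counts = np.concatenate([group_a, group_b])
ll_null = poisson_loglik(all_counts.mean(), all_counts)

# Fuller model: a separate rate per group (one extra parameter).
ll_alt = poisson_loglik(group_a.mean(), group_a) + poisson_loglik(group_b.mean(), group_b)

# Likelihood-ratio statistic: twice the gain in log-likelihood,
# compared against a chi-square distribution with 1 degree of freedom.
lr_stat = 2.0 * (ll_alt - ll_null)
p_value = chi2.sf(lr_stat, df=1)
print("likelihood-ratio statistic:", lr_stat, "p-value:", p_value)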
