Feb 28, 2010

Why the Generalized Least Squares Estimator?

It is known that heteroskedasticity affects the properties of the OLS estimator: it remains unbiased, but it is less efficient (it has a larger variance). When you draw a scatter plot of the raw data, larger absolute residuals toward the right of the graph indicate a positive relationship between the error variance and the independent variable. With this kind of error pattern, a few additional large positive errors near the right of the graph would tilt (make something move into a position with one side or end higher than the other) the OLS regression line considerably, and a few additional large negative errors would tilt it considerably in the opposite direction. In repeated sampling these unusual cases average out, leaving the OLS estimator unbiased, but the variation of the OLS regression line around its mean will be greater, i.e., the variance of βOLS will be greater. The Generalized Least Squares (GLS) technique pays less attention to the residuals associated with high-variance observations (by assigning them a low weight in the weighted sum of squared residuals it minimizes), since these observations give a less precise indication of where the true regression line lies. This avoids those large tilts, making the variance of βGLS smaller than that of βOLS.
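As an illustration of that weighting idea, here is a minimal simulation sketch, assuming numpy and statsmodels are available (the data-generating process, coefficients, and weights are invented for the example). When the error variance grows with the regressor, weighting each observation by the inverse of its error variance, as weighted least squares does, mirrors the GLS logic and yields a smaller standard error for the slope than OLS.

# Minimal sketch: weighted least squares (a simple GLS case) vs. OLS
# under heteroskedasticity. Assumes numpy and statsmodels are installed.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(1, 10, n)
# Error variance grows with x (standard deviation proportional to x).
e = rng.normal(0, 0.5 * x)
y = 2.0 + 3.0 * x + e

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()
# GLS/WLS down-weights high-variance observations: weight = 1 / variance.
wls = sm.WLS(y, X, weights=1.0 / x**2).fit()

print("OLS slope:", ols.params[1], "std err:", ols.bse[1])
print("WLS slope:", wls.params[1], "std err:", wls.bse[1])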

When the Durbin-Watson test indicates autocorrelated errors, it is typically concluded that estimation via feasible GLS is called for. This is not always appropriate, however: a significant value of the Durbin-Watson statistic could result from an omitted explanatory variable, an incorrect functional form, or a dynamic misspecification. Only if a researcher is satisfied that none of these phenomena is responsible for the significant Durbin-Watson value should estimation via feasible GLS proceed.
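A minimal sketch of that workflow, assuming numpy and statsmodels are available (the AR(1) error process and coefficients are invented): compute the Durbin-Watson statistic on the OLS residuals, and treat a low value as a prompt to re-examine the specification before reaching for feasible GLS.

# Sketch: Durbin-Watson check on OLS residuals before considering feasible GLS.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(1)
n = 100
x = rng.normal(size=n)
# AR(1) errors to mimic autocorrelation.
e = np.zeros(n)
for i in range(1, n):
    e[i] = 0.8 * e[i - 1] + rng.normal()
y = 1.0 + 2.0 * x + e

res = sm.OLS(y, sm.add_constant(x)).fit()
print("Durbin-Watson:", durbin_watson(res.resid))  # well below 2 suggests positive autocorrelation
# Only after ruling out omitted variables, an incorrect functional form, or
# dynamic misspecification would one move on to feasible GLS (e.g., sm.GLSAR).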

Feb 27, 2010

Two Nonparametrics

In the world of econometrics, the term nonparametric basically refers to a flexible functional form for the regression curve. However, there are other notions of "nonparametric statistics" which refer mostly to distribution-free methods. In the econometric context, generally, neither the error distribution nor the functional form of the mean function is prespecified.
The question of whether to take a parametric or a nonparametric approach to data analysis was a key issue in a bitter fight between Pearson and Fisher in the 1920s. Fisher pointed out that the nonparametric approach generally gives poor efficiency, whereas Pearson was more concerned about the specification question. Both viewpoints are interesting in their own right. Pearson pointed out that the price we pay for purely parametric fitting is the possibility of gross misspecification, resulting in too high a model bias. Fisher, on the other hand, was concerned that relying too heavily on parameter-free models may result in more variable estimates, especially for small sample sizes n.

Orthogonality in Econometrics

In mathematics, two vectors are orthogonal if they are perpendicular, i.e., they form a right angle.

In linear algebra, an orthogonal matrix is a square matrix with real entries whose columns (or rows) are orthogonal unit vectors (i.e., orthonormal). Because the columns are unit vectors in addition to being orthogonal, some people use the term orthonormal to describe such matrices.
Equivalently, a matrix Q is orthogonal if its transpose is equal to its inverse:

Q^T Q = Q Q^T = I,   or alternatively,   Q^T = Q^{-1}.
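As a quick numerical check of this property (a sketch assuming numpy; the 2-by-2 rotation matrix is just one convenient example of an orthogonal matrix):

# Quick check that a rotation matrix is orthogonal: Q^T Q = I and Q^T = Q^{-1}.
import numpy as np

theta = 0.7
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.allclose(Q.T @ Q, np.eye(2)))     # True
print(np.allclose(Q.T, np.linalg.inv(Q)))  # True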

The concept of orthogonality tends to be very important in econometrics, since almost all of our methods and rules are built on a matrix foundation. For example, if a relevant independent variable is omitted, the OLS estimator of the coefficients of the remaining variables is, in general, biased. If the omitted variable is orthogonal to the included variables, the slope coefficient estimators will be unbiased; the intercept estimator will retain its bias unless the mean of the observations on the omitted variable is zero.
In the case of inclusion of an irrelevant variable, unless the irrelevant variable is orthogonal to the other independent variables, the variance-covariance matrix of βOLS becomes larger, so the OLS estimator is not as efficient. Thus in this case the MSE of the estimator is unequivocally raised.
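A small simulation sketch of the omitted-variable point above, assuming numpy (the coefficients and the 0.8 correlation are arbitrary): omitting a regressor that is orthogonal to the included one leaves the slope estimator essentially unbiased, while omitting a correlated regressor biases it.

# Sketch: omitted-variable bias and orthogonality.
import numpy as np

rng = np.random.default_rng(2)
n, reps = 500, 2000
slopes_orth, slopes_corr = [], []

for _ in range(reps):
    x1 = rng.normal(size=n)
    z_orth = rng.normal(size=n)              # orthogonal to x1 (in expectation)
    z_corr = 0.8 * x1 + rng.normal(size=n)   # correlated with x1
    for z, store in ((z_orth, slopes_orth), (z_corr, slopes_corr)):
        y = 1.0 + 2.0 * x1 + 1.5 * z + rng.normal(size=n)
        # Regress y on x1 only, omitting z.
        store.append(np.polyfit(x1, y, 1)[0])

print("mean slope, orthogonal omitted var:", np.mean(slopes_orth))  # close to the true 2.0
print("mean slope, correlated omitted var:", np.mean(slopes_corr))  # biased (about 2 + 1.5*0.8)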

Feb 26, 2010

Borrow 500 Years of Life from the Heaven

Lyrics:     Junyi Zhang, Xiaobin Fan
Composition: Ke Fu
Translation: Haoying Wang

♣ Along the gentle waviness of rising and subsiding territory


♣ Galloping on the beloved land, beloved plateau and Yangtze South


♣ In the face of ice blade and sword, accompanied by attaching wind and rain


♣ Being cherished of my golden life from heaven


♣ And full of fraternity all along


♣ Being afraid of nothing


♣ And full of lofty sentiments all along


♣ Life is always of half pain and half enjoyment


♣ But with distinct cut between good and evil


♣ All come true in the dream for future


♣ Clanking iron heel, Never stops on the vast beloved land


♣ Standing on the top of surge, and holding


♣ The movement of universe


♣ Praying for the world of mortals


♣ Full of peace and bliss


♣ And another 500 Years from the Heaven for me


♣ Another 500 Years from the Heaven for me

Feb 25, 2010

Specification Problems and Empirical Study

Peter Kennedy wrote: Econometric textbooks are mainly devoted to the exposition of econometrics for estimation and inference in the context of a given model for the data-generating process. The more important problem of specification of this model is not given much attention, for three main reasons: (1) specification is not easy; (2) most econometricians would agree that specification is an innovative/imaginative process that cannot be taught; (3) there is no accepted "best" way of going about finding a correct specification. (Of course, this is why we can always contribute something here; it is too hard to find a best and perfect way of specification.)

So the issue becomes: how much trust do we have in econometrics? Different people express it in different ways:
All models are wrong, but some are useful. - George Box
Models are to be used, but not to be believed. - Henri Theil


Here is what Edward E. Leamer contributed to the discussion:
When an inference is suspected to depend crucially on a doubtful assumption, two kinds of actions can be taken to alleviate the consequent doubt about the inferences. Both require a list of alternative assumptions. The first approach is statistical estimation, which uses the data to select from the list of alternative assumptions and then makes suitable adjustments to the inferences to allow for doubt about the assumptions. The second approach is a sensitivity analysis that uses the alternative assumptions one at a time, thereby demonstrating either that all the alternatives lead to essentially the same inferences or that minor changes in the assumptions make major changes in the inferences. For example, a doubtful variable can simply be included in the equation (estimation), or two different equations can be estimated, one with and one without the doubtful variable (sensitivity analysis).
Simplification is a third approach. The intent of simplification is to find a simple model that works well for a class of decisions. A specification search can be used for simplification, as well as for estimation and sensitivity analysis. The very prevalent confusion among these three kinds of searches ought to be eliminated, since the rules for a search and the measures of its success will properly depend on its intent.

Again, Peter Kennedy gave the following summary:
♣ Models whose residuals do not test as insignificantly different from white noise (random errors) should be initially viewed as containing a misspecification, not as needing a special estimation procedure.
♣ "Testing down" is more suitable than "Testing up"; one should begin with a general, unrestricted model and then systematically simplify it in light of the sample evidence.
♣ Tests of misspecification are better undertaken by testing simultaneously for several misspecifications rather than testing for these misspecifications one by one.

Likelihood Ratio, Wald, Lagrange Multiplier Tests

The F test is applicable whenever we are testing linear restrictions in the classical normal linear regression model. However, if (1) the restrictions are nonlinear, (2) the model is nonlinear in the parameters, or (3) the errors are distributed non-normally, then we need other, asymptotically equivalent tests.

Suppose the restriction being tested is written as g(β) = 0, satisfied at the value βMLE-R where the function g(β) cuts the horizontal axis (please refer to the graph at the bottom). Then we have three asymptotically equivalent tests available to carry out the test and draw inferences, all of them distributed asymptotically as chi-square with degrees of freedom equal to the number of restrictions being tested.

(1) The Likelihood Ratio test: if the restriction is true, then ln(LR), the maximized value of ln(L) imposing the restriction, should not be significantly less than ln(Lmax), the unrestricted maximum value of ln(L). The Likelihood Ratio test tests whether [ln(LR)-ln(Lmax)] is significantly different from zero (a small numerical sketch appears after the graph reference below).

(2) Wald Test: if the restriction g(β)=0 is true, then g(βMLE) should not be significantly different from zero. The Wald test tests whether βMLE (the unrestricted estimate of β) violates the restriction by a significant amount.

(3) Lagrange Multiplier Test: the log-likelihood function ln(L) is maximized at point A, where the slope of ln(L) with respect to β is zero. If the restriction is true, then the slope of ln(L) at point B should not be significantly different from zero. The Lagrange Multiplier test tests whether the slope of ln(L), evaluated at the restricted estimate, is significantly different from zero.

Graph for reference:
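A minimal numerical sketch of the likelihood ratio idea, assuming numpy and scipy are available (the normal-mean example with known variance is invented for illustration):

# Likelihood ratio test sketch: restricted vs. unrestricted log-likelihood
# for the mean of a normal sample, with the restriction mu = 0.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.normal(loc=0.3, scale=1.0, size=100)

sigma = 1.0  # treat the variance as known to keep the sketch simple
def loglik(mu):
    return np.sum(stats.norm.logpdf(x, loc=mu, scale=sigma))

ll_unrestricted = loglik(x.mean())   # MLE of mu is the sample mean
ll_restricted = loglik(0.0)          # impose the restriction mu = 0

lr_stat = 2.0 * (ll_unrestricted - ll_restricted)
p_value = stats.chi2.sf(lr_stat, df=1)  # one restriction -> 1 degree of freedom
print(lr_stat, p_value)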

Feb 24, 2010

Future and Complexity

--For the Understanding of Environmental Economics and Studies Concerned


I believe that man has the power, the intelligence, and the imagination to extricate himself from the serious predicament that now confronts him. The necessary first step toward wise action in the future is to obtain an understanding of the problems that exist. This in turn necessitates an understanding of the relationships between man, his natural environment, and his technology.
                                                            -Ocho Rios, Jamaica, April 1953.

In principle, the vast knowledge we have accumulated during the last 150 years makes it possible for us to look into the future with considerably more accuracy than could Malthus. But in actual fact we are dealing with an extremely complex problem which cuts across all of our major fields of inquiry and which, because of this, is difficult to unravel (to explain something that is difficult to understand or is mysterious) in all of its interlocking aspects. The complexity of the problem, our confusion, and our prejudices, have combined to form a dense fog that has obscured the most important features of the problem from our view - a fog which is in certain respects even more dense than that which existed in Malthus’ time. As a result, the basic factors that are determining the future are not generally known or appreciated.

In spite of the complexity of the problem which confronts us, its overwhelming importance, both to ourselves and to our descendants, warrants our dissecting it as objectively as possible. In doing so we must put aside our hatreds, desires, and prejudices, and look calmly upon the past and present. If we are successful in lifting ourselves from the morass (an unpleasant and complicated situation that is difficult to escape from) of irrelevant fact and opinion and in divorcing ourselves from our preconceived ideas, we will be able to see mankind both in perspective and in relation to his environment. In turn we will be able to appreciate something of the fundamental physical limitations to man’s future development and of the hazards which will confront him in the years and centuries ahead.

Feb 23, 2010

Rejection From Yale

2/23/2010

Dear Mr. Wang:

Thank you very much for applying to the Graduate School of Arts and Sciences at Yale University. I regret to inform you that we are unable to offer you admission. As you know, the very high number of extraordinary candidates among our 10,400 applicants far exceeds the number of places we have in each program, and we are not able to admit many excellent candidates.

We are using this system of electronic notification to communicate with you five to ten days more rapidly than we could by letter and, therefore, help applicants plan their futures quickly and effectively. We wish you every success in all your endeavors.

Sincerely,

Jon Butler
Dean of the Graduate School

Why Student's T-test? (Part 2)

An approximate answer to the right question is worth a great deal more than a precise answer to the wrong question. 
--The first golden rule of mathematics, sometimes attributed to John Tukey

With many calculations, one can win; with few one cannot. How much less chance of victory has one who makes none at all! 
--Sun Tzu 'Art of War'

The T-test may be used to compare the means of a criterion variable for two independent samples or for two dependent samples (ex., before-after studies, matched-pairs studies), or between a sample mean and a known mean (one-sample t-test). In regression analysis, a T-test can be used to test any single linear constraint. Nonlinear constraints are usually tested by using a W, LR or LM test, but sometimes an "asymptotic" T-test is encountered: the nonlinear constraint is written with its right-hand side equal to zero, and the left-hand side is estimated and then divided by the square root of an estimate of its asymptotic variance to produce the asymptotic T statistic.

For example, here is the formula to test mean difference for the case of equal sample sizes, n, in both groups:

Let E be the experimental condition and let C be the control condition. Let m be the means, s the standard deviations, and n the sample size. Then
t = (mE - mC) / SQRT[ (sE^2 + sC^2) / n ]
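A quick numerical check of this formula, assuming numpy and scipy are available (the simulated scores are arbitrary): the hand-computed statistic matches scipy's pooled-variance independent-samples test.

# Compute the equal-n, equal-variance two-sample t statistic from the formula
# above and compare with scipy's built-in test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 25
experimental = rng.normal(loc=5.5, scale=1.0, size=n)
control = rng.normal(loc=5.0, scale=1.0, size=n)

t_manual = (experimental.mean() - control.mean()) / np.sqrt(
    (experimental.var(ddof=1) + control.var(ddof=1)) / n
)
t_scipy, p = stats.ttest_ind(experimental, control, equal_var=True)
print(t_manual, t_scipy, p)  # the two t values agree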

Three Different Types of T-test:

(1) One-sample T-tests test whether the mean of one variable differs from a constant (ex., does the mean grade of 72 for a sample of students differ significantly from the passing grade of 70?). When p<.05 the researcher concludes the group mean is significantly different from the constant.

(2) Independent sample T-tests are used to compare the means of two independently sampled groups (ex., do those working in high noise differ on a performance variable compared to those working in low noise, where individuals are randomly assigned to the high-noise or low-noise groups?). When p<.05 the researcher concludes the two groups are significantly different in their means. This test is often used to compare the means of two groups in the same sample (ex., men vs. women) even though individuals are not (in the case of gender, cannot be) assigned randomly to the two groups (to "men" and to "women"). Random assignment would have controlled for unmeasured variables. This opens up the possibility that other variables either mask or enhance any apparent significant difference in means. That is, the independent sample t-test tests the uncontrolled difference in means between two groups. If a significant difference is found, it may be due not just to gender; control variables may be at work. The researcher will wish to introduce control variables, as in any multivariate analysis.

(3) Paired sample T-tests compare means where the two groups are correlated, as in before-after, repeated measures, matched-pairs, or case-control studies (ex., mean candidate evaluations before and after hearing a speech by the candidate). The algorithm applied to the data is different from the independent sample t-test, but interpretation of output is otherwise the same. (A code sketch of all three types follows this list.)
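Here is a short sketch of all three types using scipy, assuming numpy and scipy are available (the data arrays are simulated stand-ins for the examples above):

# Sketch of the three t-test types described above.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

# (1) One-sample: do the grades differ from the passing grade of 70?
grades = rng.normal(loc=72, scale=8, size=30)
print(stats.ttest_1samp(grades, popmean=70))

# (2) Independent samples: high-noise vs. low-noise performance.
high_noise = rng.normal(loc=48, scale=10, size=30)
low_noise = rng.normal(loc=55, scale=10, size=30)
print(stats.ttest_ind(high_noise, low_noise))

# (3) Paired samples: candidate evaluations before and after a speech.
before = rng.normal(loc=5.0, scale=1.0, size=30)
after = before + rng.normal(loc=0.4, scale=0.5, size=30)
print(stats.ttest_rel(before, after))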

Associated Assumptions:

(1) Approximately Normal Distribution of the measure in the two groups is assumed. There are tests for normality. The t-test may be unreliable when the two samples come from widely different shaped distributions (see Gardner, 1975). Moore (1995) suggests that data for t-tests should be normally distributed for sample sizes less than 15, and should be approximately normal and without outliers for samples between 15 and 40; data may be markedly skewed when the sample size is greater than 40.

(2) Roughly Similar Variances: There is a test for homogeneity of variance, also called a test of homoscedasticity. In SPSS homogeneity of variances is tested by "Levene's Test for Equality of Variances", with F value and corresponding significance. There are also other tests for homogeneity of variances. The T-test may be unreliable when the two samples are unequal in size and also have unequal variances (see Gardner, 1975). 

(3) Dependent/Independent Samples. The samples may be independent or dependent (ex., before-after, matched pairs). However, the calculation of T differs accordingly. In the one-sample test, it is assumed that the observations are independent. 

One last note: don't confuse a T-test with the analysis of a contingency table (Fisher's exact or chi-square test). Use a T-test to compare a continuous variable (e.g., blood pressure or weight). Use a contingency table to compare a categorical variable (e.g., pass vs. fail, viable vs. not viable).

Reference:
Gardner, P. L. (1975). Scales and statistics. Review of Educational Research, 45: 43-57. Discusses assumptions of the t-test.
Moore, D. S. (1995). The Basic Practice of Statistics. NY: Freeman and Co. 

Feb 22, 2010

Why Student's T-test? (Part 1)

Here I am trying to answer two questions for myself:

1. What is the difference between Z-test and T-test?
2. Why we need student's T-test?

First, let's be clear on the Z-test vs. the T-test. A rule of thumb is that the Z-test is used when the sample size is more than 30, while the T-test is used for sample sizes less than 30. Now let's get back to the story behind them:

Sometimes, measuring every single item is just not practical. That is why we developed and use statistical methods to solve problems. The most practical way to do this is to measure just a sample of the population. Some methods test hypotheses by comparison. Two of the better-known statistical hypothesis tests are the T-test and the Z-test. Let's try to break down the two.

Strictly speaking, the Z-test is a test for populations rather than samples. In the real world, though, either test will give you a pretty close answer. Using the T-test is more accurate because the sample standard deviation is specific to the sample you are studying. When using a T-test of significance, it is assumed that the observations come from a population that follows a normal distribution. This is often true for data influenced by random fluctuations in environmental conditions or random measurement errors. The T-distribution is essentially a corrected version of the normal distribution for the case in which the population variance is unknown and hence is estimated by the sample standard deviation.

There are various T-tests, and the two most commonly applied are the one-sample and paired-sample T-tests. One-sample T-tests are used to compare a sample mean with a known population mean. Two-sample T-tests, on the other hand, are used to compare either independent samples or dependent samples.

As mentioned above, the T-test is best applied, at least in theory, if you have a limited sample size (n < 30), as long as the variables are approximately normally distributed and the variation of values in the two groups is not reliably different. It is also appropriate if you do not know the population's standard deviation. If the standard deviation is known, then it would be best to use another type of statistical test, the Z-test. The Z-test is also applied to compare sample and population means to see whether there is a significant difference between them. Z-tests always use the normal distribution and are ideally applied when the standard deviation is known. Z-tests are often applied when certain conditions are met; otherwise, other statistical tests such as T-tests are applied instead. Z-tests are often applied to large samples (n > 30). When the T-test is used in large samples, it becomes very similar to the Z-test. There are fluctuations that may occur in T-test sample variances that do not exist in Z-tests, and because of this there are differences in the two tests' results.

Summary:


1. Z-test is a statistical hypothesis test that follows a normal distribution while T-test follows a Student’s T-distribution.
2. A T-test is appropriate when you are handling small samples (n < 30) while a Z-test is appropriate when you are handling moderate to large samples (n > 30); see the sketch after this list.
3. T-test is more adaptable than Z-test since Z-test will often require certain conditions to be reliable. Additionally, T-test has many methods that will suit any need.
4. T-tests are more commonly used than Z-tests.
5. Z-tests are preferred to T-tests when population standard deviations are known.
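As a small numerical illustration of the n > 30 rule of thumb, assuming scipy is available: the t critical value approaches the z critical value as the sample size grows, which is why the two tests give similar answers in large samples.

# t vs. z critical values (two-sided 5% level) as the sample size grows.
from scipy import stats

z_crit = stats.norm.ppf(0.975)
for n in (5, 15, 30, 100, 1000):
    t_crit = stats.t.ppf(0.975, df=n - 1)
    print(f"n={n:5d}  t critical={t_crit:.3f}  z critical={z_crit:.3f}")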

Feb 21, 2010

Maybe We Just Need a New Word: Gadget

"Samsung has just announced at Barcelona a new cell phone, the Beam, that they expect to have on the market this summer. Its special feature is a built-in pico projector, making it a combination cell phone and (very wimpy) video projector. A cute gadget, although not one that I am likely to have much use for. I do, however, have one suggestion for improving it."

Reading through this news, I happened to be interested in the word gadget. Two similar explanations can easily be found in a dictionary:
(1) an often small mechanical or electronic device with a practical use but often thought of as a novelty;
(2) any object that is interesting for its ingenuity or novelty rather than for its practical use.

So a question comes to me: have we been proposing and digging up gadgets in econometrics and economics? This could happen to be a 'gadget' question, but it is definitely not a 'gadget' issue. Too many people are publishing papers whose author(s) will probably be the only and last careful readers. So why do we spend one or two years, even three, to invent such a "gadget"? For tenure, for promotion, or just for fun (self-understanding of the subject)? Maybe it is just for a popular social demand of vanity, maybe it is just an indispensable part of the system, who knows?



Monte Carlo Studies

Monte Carlo methods have been used for centuries, but only in the past several decades has the technique gained the status of a full-fledged numerical method capable of addressing the most complex applications. The Monte Carlo method may be thought of as similar to a political poll, where a carefully selected statistical sample is used to predict the behavior or characteristics of a large group.

Enrico Fermi in the 1930's used Monte Carlo in the calculation of neutron diffusion, and later designed the Fermiac, a Monte Carlo mechanical device used in the calculation of criticality (the point at which a nuclear reaction is self-sustaining) in nuclear reactors.

In the 1940's, a formal foundation for the Monte Carlo method was developed by von Neumann, who established the mathematical basis for probability density functions (PDFs), inverse cumulative distribution functions (CDFs), and pseudorandom number generators. The work was done in collaboration with Stanislaw Ulam, who realized the importance of the digital computer in the implementation of the approach.

Before digital computers were available to the labs, "computer" was a job title. Parallel computing was done by rows and columns of mathematicians. The applications, which arose mostly from the Manhattan Project, included design of shielding for reactors.

Uses of Monte Carlo methods have been many and varied since that time. In the late 1950's and 1960's, the method was tested in a variety of engineering fields. At that time, even simple problems were compute-bound. Many complex problems remained intractable through the seventies. With the advent of high-speed supercomputers, the field has received increased attention, particularly with parallel algorithms which have much higher execution rates.

In econometrics, the general idea behind a Monte Carlo study is to (1) model the data-generating process, (2) generate several sets of artificial data, (3) employ these data and an estimator to create several estimates, and (4) use these estimates to gauge the sampling distribution properties of that estimator.
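A minimal sketch of these four steps for the OLS slope estimator, assuming numpy is available (the data-generating process is invented for the example):

# Monte Carlo study of the sampling distribution of the OLS slope estimator.
import numpy as np

rng = np.random.default_rng(6)
n, reps = 50, 5000
true_beta = 2.0
slopes = np.empty(reps)

for r in range(reps):
    # (1)-(2) model the data-generating process and generate artificial data
    x = rng.uniform(0, 10, n)
    y = 1.0 + true_beta * x + rng.normal(0, 2, n)
    # (3) apply the estimator to each artificial sample
    slopes[r] = np.polyfit(x, y, 1)[0]

# (4) use the estimates to gauge the estimator's sampling distribution
print("mean of estimates:", slopes.mean())  # close to true_beta -> unbiased
print("std of estimates :", slopes.std())   # Monte Carlo estimate of the standard error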

A useful reference is the paper:

Design and analysis of Monte Carlo experiments
Written by Kleijnen, J.P.C. (Tilburg University, Center for Economic Research)

Mean Square Error (MSE) and Variance

The difference between the variance of an estimator and its MSE is that the variance measures the dispersion of the estimator around its mean, whereas the MSE measures its dispersion around the true value of the parameter being estimated. For unbiased estimators they are identical.

Biased estimators with smaller variances than unbiased estimators are easy to find. The minimum-MSE estimator has not been as popular as the best unbiased estimator because of the mathematical difficulties in its derivation. Furthermore, when it can be derived, its formula often involves unknown coefficients (the value of beta), making its application impossible. Monte Carlo studies have shown that approximating the estimator by using OLS estimates of the unknown parameters can sometimes circumvent this problem (a little confused here: using approximated OLS estimates to substitute for the real beta?).

Note: the Weighted Square(d) Error Criterion can be a very interesting topic to explore!
Peter Kennedy: When the weights are equal, the criterion is the popular mean square error (MSE) criterion. It happens that the expected value of a loss function consisting of the square of the difference between beta and its estimate (i.e. the square of the estimation error) is the same as the sum of the variance and the squared bias. 
Please refer to the following derivation:
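The derivation graphic from the original post is not reproduced here; the standard decomposition it refers to can be written out as follows (with \hat{\beta} the estimator and \beta the true parameter):

\begin{aligned}
E\big[(\hat{\beta}-\beta)^2\big]
  &= E\big[(\hat{\beta}-E\hat{\beta}+E\hat{\beta}-\beta)^2\big] \\
  &= E\big[(\hat{\beta}-E\hat{\beta})^2\big]
     + 2\,(E\hat{\beta}-\beta)\,E\big[\hat{\beta}-E\hat{\beta}\big]
     + (E\hat{\beta}-\beta)^2 \\
  &= \operatorname{Var}(\hat{\beta}) + \big[\operatorname{Bias}(\hat{\beta})\big]^2 ,
\end{aligned}

where the cross term vanishes because E[\hat{\beta}-E\hat{\beta}] = 0.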



OLS: It is not the case that the OLS estimator is the minimum mean square error estimator in the classical linear regression model. Even among linear estimators, it is possible that a substantial reduction in variance can be obtained by adopting a slightly biased estimator.
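A small simulation sketch of that point, assuming numpy is available (the shrinkage factor of 0.9 is arbitrary): scaling the OLS slope toward zero introduces a little bias but removes enough variance to lower the MSE in this noisy, small-sample setup.

# A slightly biased shrinkage estimator can beat OLS on MSE.
import numpy as np

rng = np.random.default_rng(7)
n, reps, true_beta = 20, 5000, 1.0
ols_est, shrunk_est = np.empty(reps), np.empty(reps)

for r in range(reps):
    x = rng.normal(size=n)
    y = true_beta * x + rng.normal(0, 3, n)  # noisy, small sample
    b_ols = x @ y / (x @ x)                  # OLS slope (no intercept)
    ols_est[r] = b_ols
    shrunk_est[r] = 0.9 * b_ols              # biased toward zero, lower variance

def mse(est):
    return np.mean((est - true_beta) ** 2)

print("MSE of OLS     :", mse(ols_est))
print("MSE of shrunken:", mse(shrunk_est))   # smaller despite the bias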

Feb 17, 2010

Toronto Milk Producers


The Toronto Sunday World, Mar 23, 1914.

A meeting of the Toronto Milk and Cream Producers' Association will be held at the Labor Temple on Thursday, commencing at 2 p.m. The meeting is called to discuss and decide on prices of milk and cream for the ensuing season, May to October, and any other business in the interest of the association. In the evening a banquet will be held at the Grand Union Hotel.

The Story of Maximum Likelihood

The theory of maximum likelihood is very beautiful indeed: a conceptually simple approach to an amazingly broad collection of problems. This theory provides a simple recipe that purports to lead to the optimum solution for all parametric problems and beyond, and not only promises an optimum estimate, but also a simple all-purpose assessment of its accuracy. And all this comes with no need for the specification of a priori probabilities, and no complicated derivation of distributions. Furthermore, it is capable of being automated in modern computers and extended to any number of dimensions. Maximum-likelihood estimation was recommended, analyzed and vastly popularized by R. A. Fisher between 1912 and 1922 (although it had been used earlier by Gauss, Laplace, Thiele, and F. Y. Edgeworth). Reviews of the development of maximum likelihood have been provided by a number of authors.


When we analyze an analysis of variance or linear regression, typically we estimate parameters for the model using the principle of least squares. The idea of least squares is that we choose parameter estimates that minimize the average squared difference between observed and predicted values. That is, we maximize the fit of the model to the data by choosing the model that is closest, on average, to the data.

For many other procedures such as logistic, Poisson, and proportional hazards regression, least squares usually cannot be used as an estimation method. Instead, most often we turn to the method of maximum likelihood. In maximum likelihood estimation, we search over all possible sets of parameter values for a specified model to find the set of values for which the observed sample was most likely. That is, we find the set of parameter values that, given a model, were most likely to have given us the data that we have in hand.

By way of analogy, imagine that you are in a jury for a civil trial. Four things are presented to you in the course of the trial: 1) charges that specify the purpose of the trial, 2) prosecution's version of the truth, 3) defendant's version of the truth, and 4) evidence. Your task on the jury is to decide, in the context of the specified charges and given the evidence presented, which of the two versions of the truth most likely occurred. You are asked to choose which version of the truth was most likely to have resulted in the evidence that was observed and presented.

Analogously, in statistical analysis with maximum likelihood, we are given: 1) a specified conceptual, mathematical, and statistical model, 2) one set of values for the parameters of the model, 3) another set of values for the parameters of the model, and 4) observed data. We want to find the set of values for the parameters of the model that are most likely to have resulted in the data that were actually observed. (We do this by searching over all possible sets of values for the parameters, not just two sets.)

In analysis of variance or linear regression, we measure the fit of the model to the data using the regression sum of squares. With maximum likelihood, the likelihood measures the fit of the model to the data; therefore, we want to choose parameter values that maximize the likelihood. In analysis of variance or linear regression, if we want to compare the fit of two models, we form the ratio of two mean squares to yield an F-test. With maximum likelihood, we do this by forming the ratio of two likelihoods to yield a chi-square test.
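A minimal sketch of maximum likelihood in practice, assuming numpy and scipy are available (the Poisson example is invented): numerically maximize the log-likelihood, i.e., minimize its negative, and check that the result matches the analytic MLE, the sample mean.

# Maximum likelihood for a Poisson rate via numerical optimization.
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(8)
data = rng.poisson(lam=3.0, size=200)

def neg_loglik(params):
    lam = params[0]
    if lam <= 0:
        return np.inf
    return -np.sum(stats.poisson.logpmf(data, mu=lam))

result = optimize.minimize(neg_loglik, x0=[1.0], method="Nelder-Mead")
print("MLE of lambda:", result.x[0])  # should be close to the sample mean
print("sample mean  :", data.mean())  # the analytic MLE for a Poisson rate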

Asymptotic Properties

Since econometricians quite often must work with small samples, choosing estimators on the basis of their asymptotic properties is legitimate only if estimators with desirable asymptotic properties have more desirable small-sample properties than estimators without them.

Feb 16, 2010

Rejection From Rice



We regret having to inform you that Rice University cannot offer you admission for graduate study. You can be assured that your application received very careful consideration. Our decision is based on high standards of selectivity and on the constraints of space and faculty. For these reasons, we must limit the number of admissions in all departments. The other members of the departmental graduate committee join me in wishing you success in your future endeavors.
Yours sincerely,
Simon Grant, Director
Economics Graduate Program
 