This editor added a few paragraphs which seem to reflect a complete misunderstanding of the topic, perhaps due to confusion with a margin of error. I have moved them here in case anyone wants to discuss them further. The additions follow below. -- Avenue 11:56, 19 July 2006 (UTC)
See Student's t-distribution, where one finds this:
The numerator has a normal distribution with standard deviation σ. The denominator is distributed as σ times a chi-distributed random variable with n − 1 degrees of freedom. So the standard deviation σ cancels, and the probability distribution of the expression above does not depend on σ. Michael Hardy 21:39, 24 July 2006 (UTC)
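The cancellation Michael Hardy describes can be checked numerically: writing each observation as x = μ + σz, both the numerator and the denominator of the t statistic pick up a factor of σ, so the statistic is unchanged. A minimal sketch in Python (the sample values and the choices μ = 10, σ = 5 are made up for illustration):

```python
import math
import statistics

def t_stat(sample, mu):
    """One-sample t statistic: (mean - mu) / (s / sqrt(n))."""
    n = len(sample)
    return (statistics.mean(sample) - mu) / (statistics.stdev(sample) / math.sqrt(n))

# "Errors" drawn around a true mean of 0 (values invented for the demo)
z = [0.3, -1.2, 0.8, 1.5, -0.4, 0.1]
t1 = t_stat(z, 0.0)

# The same errors rescaled and shifted: true mean 10, sigma 5
x = [10 + 5 * zi for zi in z]
t2 = t_stat(x, 10.0)

# t1 and t2 agree up to floating-point error: sigma cancels
```

This only illustrates the algebraic cancellation for one sample; the distributional claim (that the statistic follows a t-distribution regardless of σ) follows because it holds sample by sample.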
The formula introduces two new symbols which are not defined anywhere in the article: and . Can someone please clarify what is meant by these symbols? -- Dbmercer ( talk) 14:46, 3 February 2010 (UTC)
We could insert the following sentence right after the formula. Please review.
I see on the article itself that it is tagged for cleanup, but I don't see any tags here. Nor, frankly, do I see what needs cleaning (the article looks good to me). I am new here; what am I missing? Plf515 11:46, 24 November 2006 (UTC)
Is the following correct?
As the article stands, it doesn't include residual outside of statistics (such as in approximation). If the above concise definition is correct, I'd like to add it near the top of this article. —Ben FrantzDale 21:31, 30 November 2006 (UTC)
I think section 1 needs to be more precise in explaining what n and N denote. maye 18:58, 8 March 2007 (UTC)
References:
Glass & Hopkins (1984) is a widely known and successful university textbook, in which I cannot find any reference to the concepts explained in the introduction, which seem to be based on Cook (1982).
A "residual" exists if you make a prediction (an estimate) based on a mathematical model, such as a regression equation (or the cost function of an optimization algorithm). A residual is the difference between an observed value and the estimated value.
The term "residual" and the expression "error of estimate" are used as synonyms in the above-mentioned textbook (page 121, par. 8.7). By the way, I slightly disagree with Glass & Hopkins, and I believe that the error of estimate should have the opposite sign with respect to the residual: the error is in the estimate, rather than in the observed value.
The difference between the observation and the mean is a "deviation". Of course, if you like you can think of the mean as a regression equation with null regression coefficient (null slope) and constant output. Thus, the deviation from the mean can be regarded both as an error of estimate and a residual. Contrary to what is stated in the first paragraph of the introduction, there's an error even when we refer to the sample mean (not only when we refer to the population mean).
Notice that the error of estimate depends on the method of estimate. If we use a (linear or non-linear, simple or multiple) regression based on population data, rather than just a population mean, we obtain a better estimate and a smaller (RMS) error. How can we justify the concept that the error is in the observation, rather than in the expected value (see second paragraph of introduction)?
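The claim above that a deviation from the mean can be regarded as a residual (the mean being a regression with null slope and constant output) is easy to verify directly. A small sketch, with invented data values:

```python
import statistics

y = [2.0, 3.5, 4.0, 6.5]  # observed values (made up for illustration)
mean = statistics.mean(y)

# Deviations: observation minus sample mean
deviations = [yi - mean for yi in y]

# Intercept-only "regression": the predicted value is the mean for every point
predictions = [mean] * len(y)
residuals = [yi - pi for yi, pi in zip(y, predictions)]

# The two lists are identical, and the residuals sum to zero,
# as residuals of a least-squares fit with an intercept must.
```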
Based on these concepts, I believe that the introduction of this article is highly questionable. The reference included in the article (Cook, 1982) may give a non-conventional definition of the above-mentioned terminology. Unless you have other reliable and widely accepted references supporting that interpretation, I suggest rewriting the introduction. Paolo.dL ( talk) 14:29, 17 December 2007
I am not sure about what you mean when you say "means estimated by least squares". Of course, the simple arithmetic mean has the property to minimize the sum of squared deviations. Also, as I wrote, a mean can be regarded as the simplest form of regression. As for the word "prediction", I used it as a synonym of "estimate", or "inference", without reference to the future. I believe that this (improper) generalization of the word is quite common. Paolo.dL ( talk) 15:33, 17 December 2007 (UTC)
Summary: My main point is that the distinction between "residual" and "error" introduced in the article is questionable. A deviation from the mean can be regarded both as an error of estimate and a residual. As far as I know (cf. Glass & Hopkins, 1984), the very specific and strict definition given in the article of the generic word "error" is not standard. Moreover, it is not even valid for more specific concepts such as "random error", "error of estimate", "standard error of the mean"... Paolo.dL ( talk) 18:33, 21 December 2007 (UTC)
Is there a reference to support the article's definition on error and residual? mezzaninelounge ( talk) 06:43, 25 December 2007 (UTC)
Thank you, Michael. This reference uses the specific expression "statistical error" to indicate the concept that is generically referred to as "error" in the article. It also states that a residual is an error as well, namely a "fitting error". We all know about other uses of the generic word "error" in statistics (error of estimate, error of the mean). Thus, I substituted "statistical error" for "error" in the article. Notice that Wikipedia redirects the expression "statistical error" to this article. I am still not sure that the definition of "statistical error" is widely accepted, but I am not a statistician and I cannot give a final answer on this topic. Paolo.dL ( talk) 20:47, 27 December 2007 (UTC)
In my opinion, we still don't have a satisfactory answer to this question: how can we justify the concept that the statistical error is in the observation, rather than in the expected value? The explanation in the 2nd paragraph of the introduction does not convince me. I think everybody agrees that an error is something we subtract from the "wrong" value in order to obtain the "correct" value. For instance, I would say that a "fitting error" is the opposite of a residual (see previous section). Residual means "what we add to the predicted value to get the correct value or perfect fit", while "fitting error" means "what we should subtract from the predicted value to obtain the perfect fit". A statistical error is properly an error only if an estimate based on a mathematical model (including linear regression) is supposed to be perfect. But any mathematical model is known to be imperfect, by definition! Why should our terminology imply that an estimate is more correct than the true value?
We do know that some statisticians describe the residual as an error (see references in previous section). Is there a good rationale? Is it just an improper (but standard) use of the word "error" in statistics, conflicting with the meaning of the word in all other contexts, including current language? Is this questionable terminological convention accepted by all statisticians?
Notice that, even in statistics, "error of the mean" is a proper use of the word "error" (something we subtract from the sample mean in order to get the population mean)... Paolo.dL ( talk) 21:00, 27 December 2007 (UTC)
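The sign convention being argued over in this thread comes down to two subtractions. A sketch with made-up numbers; note that "fitting error" as the opposite of the residual is the commenter's proposal, not standard statistical usage:

```python
observed = 7.2    # the measured data point (invented for illustration)
predicted = 6.5   # output of some model, e.g. a regression equation

# Residual: what we add to the prediction to recover the observation
residual = observed - predicted

# Proposed "fitting error": what we subtract from the prediction
# to obtain a perfect fit, i.e. the residual with opposite sign
fitting_error = predicted - observed
```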
Good article. Thanks to the authors. -- landroni ( talk) 18:44, 25 March 2009 (UTC)
In Econometrics I often come across the terms disturbances or discrepancies. I've created a redirect from Disturbance (statistics) to here, but I am not positively sure whether 'disturbances' point to errors or to residuals. According to a book I have, "disturbances u's are unobservable", so I would assume that it points to errors; it'd be better, however, for someone more informed than I am to do the editing (say, errors (also known as disturbances)). -- landroni ( talk) 17:59, 14 April 2009 (UTC)
That example shows how to set up an equation to solve for the residual, but it is done poorly because it does not define what all the variables stand for. How is anyone supposed to know what to plug in where by reading this article? It also isn't really an example, because it doesn't actually put in numbers and then come up with a solution. —Preceding unsigned comment added by 71.82.67.87 ( talk) 08:56, 4 June 2009 (UTC)
I see that someone recently added a "fact" tag. I think this can probably be found in the writings of Carl Gauss. I will look for it. Michael Hardy ( talk) 19:41, 25 January 2010 (UTC)
respectfully, the intro is way too complex; ok for later, but words like univariate, almost by definition, should not be used in a general encyclopedia — Preceding unsigned comment added by 68.236.121.54 ( talk) 20:50, 21 September 2012 (UTC)
The result of the move request was: page moved. ( non-admin closure) Calidum T| C 04:45, 30 May 2015 (UTC)
Errors and residuals in statistics → Errors and residuals – shorter and unambiguous; the longer version only exists because of disambiguation suffixes, a common practice, e.g., Errors (statistics) and Residuals (statistics). --Relisted. George Ho ( talk) 23:00, 23 May 2015 (UTC) – Fgnievinski ( talk) 03:48, 15 May 2015 (UTC)
Agree with the nomination -- there are no obvious conflicts with any other article titles, nor common concepts (as far as I know). The idea of someone looking for an article about "errors and residues [sic]" in the context of cooking is ... implausible. -- JBL ( talk) 23:09, 23 May 2015 (UTC)