Range (statistics)

From Wikipedia

In descriptive statistics, the range of a set of data is the size or width of the narrowest interval which contains all the data. It is calculated as the difference between the largest and smallest values (also known as the sample maximum and minimum).[1] It is expressed in the same units as the data.
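
As a minimal illustration of the definition, the range of a small sample can be computed from its maximum and minimum. The function name `sample_range` and the data values are made up for this sketch.

```python
def sample_range(data):
    """Return max(data) - min(data) for a non-empty collection of numbers."""
    values = list(data)
    if not values:
        raise ValueError("the range is undefined for an empty data set")
    return max(values) - min(values)


heights_cm = [162, 175, 158, 181, 169]  # hypothetical measurements
print(sample_range(heights_cm))  # 181 - 158 = 23, in centimetres
```

The result carries the same units as the data, here centimetres.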

The range provides an indication of statistical dispersion. Because it depends only on the two extreme observations, it is sensitive to outliers; more robust range-based measures include the interdecile range and the interquartile range.
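
The difference in robustness can be seen on a small example. The sketch below uses `statistics.quantiles` from the Python standard library with its default "exclusive" method, which is only one of several quantile conventions; the helper names and the data are assumptions made for illustration.

```python
import statistics


def interquartile_range(data):
    """Q3 - Q1, using the quartile cut points from statistics.quantiles."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return q3 - q1


def interdecile_range(data):
    """D9 - D1, using the decile cut points from statistics.quantiles."""
    deciles = statistics.quantiles(data, n=10)
    return deciles[-1] - deciles[0]


clean = list(range(1, 21))            # 1, 2, ..., 20
contaminated = clean[:-1] + [200]     # same data with one extreme value

for name, data in (("clean", clean), ("contaminated", contaminated)):
    full_range = max(data) - min(data)
    print(name, full_range, interquartile_range(data), interdecile_range(data))
```

In this example a single extreme value changes the range by an order of magnitude, while the interquartile and interdecile ranges are unaffected because the affected observation lies beyond the quantiles they use.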

Range of continuous IID random variables

For n independent and identically distributed continuous random variables X1, X2, ..., Xn with cumulative distribution function G(x) and probability density function g(x), let T denote their range, that is, T = max(X1, X2, ..., Xn) - min(X1, X2, ..., Xn).

Distribution

The range, T, has the cumulative distribution function[2][3]

$$F(t) = n \int_{-\infty}^{\infty} g(x) \left[ G(x+t) - G(x) \right]^{n-1} \, \text{d}x.$$

Gumbel notes that the "beauty of this formula is completely marred by the facts that, in general, we cannot express G(x + t) by G(x), and that the numerical integration is lengthy and tiresome."[2]: 385 
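
On a modern computer the numerical integration is far less tiresome. The following is a minimal sketch that approximates F(t) with the midpoint rule; the function `range_cdf`, its parameters and the step count are assumptions of this example, not part of the cited sources. It is checked against the uniform distribution on [0, 1], for which the integral evaluates in closed form to n t^(n-1) - (n-1) t^n for 0 ≤ t ≤ 1.

```python
def range_cdf(t, n, pdf, cdf, lo, hi, steps=20_000):
    """Approximate F(t) = n * integral of g(x) [G(x+t) - G(x)]^(n-1) dx.

    pdf and cdf are g and G; [lo, hi] should cover the support of g
    (or enough of it that the neglected tails are negligible).
    """
    if t <= 0:
        return 0.0
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h  # midpoint of the k-th subinterval
        total += pdf(x) * (cdf(x + t) - cdf(x)) ** (n - 1)
    return n * total * h


def uniform_pdf(x):
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


def uniform_cdf(x):
    return min(1.0, max(0.0, x))


n, t = 5, 0.6
numeric = range_cdf(t, n, uniform_pdf, uniform_cdf, lo=0.0, hi=1.0)
exact = n * t ** (n - 1) - (n - 1) * t ** n
print(numeric, exact)  # the two values should agree to several decimal places
```

For distributions with unbounded support, such as the normal distribution, `lo` and `hi` have to be chosen wide enough by the caller.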

If the distribution of each Xi is limited to the right (or left) then the asymptotic distribution of the range is equal to the asymptotic distribution of the largest (smallest) value. For more general distributions the asymptotic distribution can be expressed as a Bessel function.[2]

Moments

The mean range is given by[4]

$$n \int_0^1 x(G) \left[ G^{n-1} - (1-G)^{n-1} \right] \, \text{d}G,$$

where x(G) is the inverse function of G. In the case where each of the Xi has a standard normal distribution, the mean range is given by[5]

$$\int_{-\infty}^{\infty} \left( 1 - (1-\Phi(x))^n - \Phi(x)^n \right) \, \text{d}x.$$
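
The integral for the standard normal case is easy to evaluate numerically. The sketch below uses the midpoint rule on a truncated interval; the function names, the interval [-10, 10] and the step count are choices made for this example. For n = 2 and n = 3 the integrand reduces to nΦ(x)(1 - Φ(x)), so the results can be compared with the exact values 2/√π and 3/√π.

```python
import math


def std_normal_cdf(x):
    """Phi(x), the standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def mean_range_normal(n, lo=-10.0, hi=10.0, steps=20_000):
    """Midpoint-rule approximation of the mean range of n standard normals."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        p = std_normal_cdf(x)
        total += 1.0 - (1.0 - p) ** n - p ** n
    return total * h


for n in (2, 3, 5, 10):
    print(n, mean_range_normal(n))

print(2 / math.sqrt(math.pi), 3 / math.sqrt(math.pi))  # exact values for n = 2, 3
```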

Derivation of the distribution

The following is an informal derivation of the result; it treats probability densities loosely as probabilities.

Let $m$ and $M$ denote, respectively, the minimum and maximum of the random variables $X_1, \dots, X_n$.

The event that the range is smaller than $t$ can be decomposed into smaller events according to:

  • the index of the minimum value
  • and the value $x$ of the minimum.

For a given index $i$ and minimum value $x$, the probability of the joint event:

  1. $X_i$ is the minimum,
  2. and $X_i = x$,
  3. and the range is smaller than $t$,

is:

$$g(x) \left[ G(x+t) - G(x) \right]^{n-1},$$

because each of the other $n-1$ variables must lie between $x$ and $x+t$. Summing over the $n$ indices and integrating over $x$ yields the total probability of the event "the range is smaller than $t$", which is exactly the cumulative distribution function of the range:

$$F(t) = n \int_{-\infty}^{\infty} g(x) \left[ G(x+t) - G(x) \right]^{n-1} \, \text{d}x,$$

which completes the derivation.
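
The formula can be sanity-checked by simulation. For the exponential distribution with rate 1, g(x) = e^(-x) and G(x) = 1 - e^(-x) for x ≥ 0, and the integral evaluates to F(t) = (1 - e^(-t))^(n-1). The sketch below compares this closed form with a Monte Carlo estimate; the function name, the seed, the sample size and the chosen n and t are arbitrary assumptions of this example.

```python
import math
import random


def empirical_range_cdf(t, n, sampler, trials, rng):
    """Fraction of simulated samples of size n whose range is at most t."""
    hits = 0
    for _ in range(trials):
        xs = [sampler(rng) for _ in range(n)]
        if max(xs) - min(xs) <= t:
            hits += 1
    return hits / trials


rng = random.Random(12345)  # fixed seed so the run is reproducible
n, t = 4, 1.5
simulated = empirical_range_cdf(t, n, lambda r: r.expovariate(1.0), 50_000, rng)
closed_form = (1.0 - math.exp(-t)) ** (n - 1)
print(simulated, closed_form)  # should agree up to Monte Carlo error
```

With 50,000 trials the standard error of the estimate is roughly 0.002, so agreement to about two decimal places is what one would expect.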

The range in other models

Outside of the IID case with continuous random variables, other cases have explicit formulas. These cases are of marginal interest.

  • Non-IID continuous random variables.[3]
  • Discrete variables supported on $\mathbb{N}$.[6][7] A key difficulty for discrete variables is that the range is itself discrete, so deriving the formula requires combinatorics.
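
For a small discrete population the exact distribution of the range can also be obtained by brute-force enumeration, which makes the combinatorial nature of the problem concrete. The sketch below is not the method of the cited papers; it simply enumerates every outcome of n IID draws. The dictionary representation of the probability mass function and the name `range_pmf_bruteforce` are assumptions of this example.

```python
from fractions import Fraction
from itertools import product


def range_pmf_bruteforce(pmf, n):
    """Exact PMF of the range of n IID draws.

    pmf maps each integer value to its probability (as a Fraction).
    The cost grows as len(pmf) ** n, so this only suits small cases.
    """
    result = {}
    for combo in product(pmf.items(), repeat=n):
        values = [value for value, _ in combo]
        prob = Fraction(1)
        for _, p in combo:
            prob *= p
        r = max(values) - min(values)
        result[r] = result.get(r, Fraction(0)) + prob
    return result


die = {face: Fraction(1, 6) for face in range(1, 7)}  # a fair six-sided die
pmf_two_dice = range_pmf_bruteforce(die, 2)
for r in sorted(pmf_two_dice):
    print(r, pmf_two_dice[r])
```

For two fair dice the range is 0 with probability 6/36, and 1, 2, 3, 4, 5 with probabilities 10/36, 8/36, 6/36, 4/36 and 2/36 respectively.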


The range is a specific example of order statistics. In particular, the range is a linear function of order statistics, which brings it into the scope of L-estimation.
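
To make the connection concrete: an L-estimator is a weighted sum of the sorted observations, and the range corresponds to weight -1 on the smallest value, +1 on the largest, and 0 elsewhere. The function names below are hypothetical and the sketch is only meant to illustrate that structure.

```python
def l_estimate(data, weights):
    """Weighted sum of the order statistics of data."""
    ordered = sorted(data)
    if len(ordered) != len(weights):
        raise ValueError("need exactly one weight per observation")
    return sum(w * x for w, x in zip(weights, ordered))


def range_weights(n):
    """Weights that turn l_estimate into the sample range."""
    if n < 1:
        raise ValueError("need at least one observation")
    if n == 1:
        return [0]  # a single observation has range 0
    return [-1] + [0] * (n - 2) + [1]


data = [7, 2, 9, 4]
print(l_estimate(data, range_weights(len(data))))  # 9 - 2 = 7
```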

References

  1. George Woodbury (2001). An Introduction to Statistics. Cengage Learning. p. 74. ISBN 0534377556.
  2. E. J. Gumbel (1947). "The Distribution of the Range". The Annals of Mathematical Statistics. 18 (3): 384–412. doi:10.1214/aoms/1177730387. JSTOR 2235736.
  3. Tsimashenka, I.; Knottenbelt, W.; Harrison, P. (2012). "Controlling Variability in Split-Merge Systems". Analytical and Stochastic Modeling Techniques and Applications. Lecture Notes in Computer Science. 7314. p. 165. doi:10.1007/978-3-642-30782-9_12. ISBN 978-3-642-30781-2.
  4. H. O. Hartley; H. A. David (1954). "Universal Bounds for Mean Range and Extreme Observation". The Annals of Mathematical Statistics. 25 (1): 85–99. doi:10.1214/aoms/1177728848. JSTOR 2236514.
  5. L. H. C. Tippett (1925). "On the Extreme Individuals and the Range of Samples Taken from a Normal Population". Biometrika. 17 (3/4): 364–387. doi:10.1093/biomet/17.3-4.364. JSTOR 2332087.
  6. Evans, D. L.; Leemis, L. M.; Drew, J. H. (2006). "The Distribution of Order Statistics for Discrete Random Variables with Applications to Bootstrapping". INFORMS Journal on Computing. 18: 19–30. doi:10.1287/ijoc.1040.0105.
  7. Irving W. Burr (1955). "Calculation of Exact Sampling Distribution of Ranges from a Discrete Population". The Annals of Mathematical Statistics. 26 (3): 530–532. doi:10.1214/aoms/1177728500. JSTOR 2236482.
