An upper bound on |mode(X)-E(X)|


This page states and proves an upper bound on the difference between the mode and the mean of a random variable with a weakly unimodal probability density function or a weakly unimodal probability distribution. It also discusses when the bound is achieved or approached, what constraints there are on the distribution on one side of the mode, how the bound relates to Gauss's inequality, a speculative equivalent result for the median, and the minimising properties of a uniform distribution.

|mode(X)-E(X)| <= sqrt(3).sd(X)

Contents:

Step 0: Initial assumptions
Step 1: Dealing with X >= mode(X) > E(X)
Step 2: Dealing with X < mode(X)
Step 3: Result for continuous case
Step 4: Result for discrete case

Note A: Combining continuous and discrete elements
Note B: Achieving equality
Note C: A one-tailed inequality for P(X >= mode(X))
Note D: Weakening and simplifying Gauss's inequality
Note E: Speculation on |median(X)-E(X)| for unimodal distributions
Note F: Minimising properties of the uniform distribution


written by Henry Bottomley     

To prove that for a weakly unimodal random variable X:
|mode(X)-E(X)| <= sqrt(3).sd(X)
or equivalently (mode(X)-E(X))^2 <= 3.Var(X)
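
As a quick numerical sanity check before the detailed steps (this sketch is not part of the proof; the three distributions are chosen purely for illustration, since their mode, mean and variance have closed forms), the bound can be verified in Python:

    from math import sqrt

    # Spot-check |mode(X)-E(X)| <= sqrt(3).sd(X) on distributions with
    # closed-form moments: (name, mode, mean, variance).
    cases = [
        ("uniform on [0,1], mode taken at 1", 1.0, 1/2, 1/12),
        ("triangular on [0,1], mode at 1",    1.0, 2/3, 1/18),
        ("exponential with rate 1",           0.0, 1.0, 1.0),
    ]

    for name, mode, mean, var in cases:
        ratio = abs(mode - mean) / sqrt(var)
        assert ratio <= sqrt(3) + 1e-12
        print(f"{name}: |mode-E|/sd = {ratio:.6f} (bound {sqrt(3):.6f})")

The uniform case attains the bound exactly (see Note B); the other two fall strictly below it.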

Step 0: Initial assumptions

Step 1: Dealing with X >= mode(X) > E(X)

Step 2: Dealing with X < mode(X)

Step 3: Result for continuous case

Step 4: Result for discrete case

[The detailed derivations for Steps 0 to 4 are not reproduced here; the notes below refer to quantities introduced in them, such as f(x) and u(x) in Step 2, a and m in Step 3, and h in Step 4.]


Note A: Random variables combining continuous and discrete elements

Steps 0 to 3 dealt with continuous random variables, while Step 4 dealt with discrete random variables. The remaining question is that of combinations of the two.

We can exclude random variables that combine two or more points of positive probability with continuous elements, since they will fail a suitable definition of unimodality similar to that in Step 4.

That leaves random variables that combine a single point of positive probability together with continuous elements. For this to be unimodal, the point of positive probability needs to be the mode of the random variable, and the continuous elements need to be weakly monotonically increasing up to that point and weakly monotonically decreasing down from that point. In such a case the calculations in Steps 0 to 3 still apply, and so the result applies.


Note B: Achieving equality |mode(X)-E(X)| = sqrt(3).sd(X)

The equality |mode(X)-E(X)| = sqrt(3).sd(X) is achieved for any uniform distribution if the mode is taken as being at one end.

If this is not seen as sufficiently unimodal, then equality is not achieved for a continuous random variable except in the trivial case where the distribution is a single atom, since equality requires a=0 in Step 3, and requires f(x)=u(x) in Step 2, possibly except at x=0 and x=2.m/(1-a).

But consider a continuous random variable Xd with probability density function pd(x),
where for some d, i and j with j>0 and 0<d<=1/j
pd(x)=0 if x<=i, pd(x)=(1+2.d.x-2.d.i-d.j)/j if i<x<=i+j, pd(x)=0 if i+j<x.
Then mode(Xd) = i+j
E(Xd) = i + j/2 + d.j^2/6
Var(Xd) = j^2/12 - d^2.j^4/36
(mode(Xd)-E(Xd))^2/Var(Xd) = (3-d.j)^2/(3-d^2.j^2)
= 3 - 2.d.j + d^2.j^2.(4-2.d.j)/(3-d^2.j^2)

So as d tends to 0 from above, Xd approaches a uniform distribution
and (mode(Xd)-E(Xd))^2/Var(Xd) tends to 3 from below.

By comparison,
for a triangular distribution, d=1/j and (mode(Xd)-E(Xd))^2/Var(Xd) = 2,
while for an exponential distribution (mode(X)-E(X))^2/Var(X) = 1.
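
A short numerical check of the Xd example (illustrative only; the choices i=0 and j=1 are arbitrary) confirms both the closed-form ratio and its limit:

    # Density pd(x) = (1 + 2.d.(x-i) - d.j)/j on (i, i+j], zero elsewhere;
    # mode(Xd) = i+j, and (mode-E)^2/Var should equal (3-d.j)^2/(3-d^2.j^2).
    i, j = 0.0, 1.0

    def moments(d, n=100000):
        h = j / n
        xs = [i + (k + 0.5) * h for k in range(n)]   # midpoint rule
        ps = [(1 + 2*d*(x - i) - d*j) / j for x in xs]
        mean = sum(p * x for p, x in zip(ps, xs)) * h
        var = sum(p * (x - mean)**2 for p, x in zip(ps, xs)) * h
        return mean, var

    for d in [1.0, 0.1, 0.01, 0.001]:
        mean, var = moments(d)
        ratio = (i + j - mean)**2 / var              # mode is at i+j
        exact = (3 - d*j)**2 / (3 - d**2 * j**2)
        print(f"d={d}: numerical {ratio:.6f}, formula {exact:.6f}")

As d falls, the printed ratio climbs towards 3; at d=1 (the triangular case with j=1) it is 2, as stated above.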

 

For a discrete random variable, equality is not achieved except in the trivial case where the distribution is a single atom, since equality in the inequality mode(X)-E(X) >= h/2 in Step 4 is only achieved if mode(Y)-E(Y) = 0.

But consider a discrete random variable Yn with probability function P(Yn=y),
where for some h and i, and some integer n, with h>0 and n>0,
P(Yn=i+j.h)=1/(n+1) if j is an integer with 0<=j<=n, and P(Yn=i+j.h)=0 otherwise.
Then choose mode(Yn) = i+n.h
E(Yn) = i + n.h/2
Var(Yn) = n.(n+2).h^2/12
(mode(Yn)-E(Yn))^2/Var(Yn) = 3.n/(n+2)
= 3 - 6/(n+2)

So as n tends to infinity,
(mode(Yn)-E(Yn))^2/Var(Yn) tends to 3 from below,
and in a sense the cumulative distribution function of Yn approaches that of a uniform distribution, particularly if h is made proportional to 1/n.

Yn is only weakly unimodal, but it would be easy to construct a similar sequence of strictly unimodal random variables with the same property.
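
The Yn example can be checked directly (again illustrative, with i=0 and h=1 chosen arbitrarily):

    # Yn is uniform on {i, i+h, ..., i+n.h}; mode taken at i+n.h.
    # The ratio (mode-E)^2/Var should equal 3.n/(n+2).
    i, h = 0.0, 1.0
    for n in [1, 2, 10, 100, 1000]:
        ys = [i + k * h for k in range(n + 1)]
        mean = sum(ys) / (n + 1)
        var = sum((y - mean)**2 for y in ys) / (n + 1)
        ratio = (i + n*h - mean)**2 / var
        print(f"n={n}: ratio {ratio:.6f}, 3.n/(n+2) = {3*n/(n+2):.6f}")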

 

For a random variable which combines discrete and continuous elements, equality is not achieved, since it requires a=0 in Step 3.


Note C: A one-tailed inequality for P(X>=mode(X))

Since in Step 3 above
Var(X) >= (mode(X)-E(X))^2.(1+3.P(X>=mode(X)))/(3.(1-P(X>=mode(X))))
we get the result:
If mode(X) >= E(X)
then P(X>=mode(X)) <= (3.Var(X)-(mode(X)-E(X))^2)/(3.(Var(X)+(mode(X)-E(X))^2))
i.e. P(X>=mode(X)) <= 1 - 4.(mode(X)-E(X))^2/(3.(Var(X)+(mode(X)-E(X))^2))

From this we have
P(X<=mode(X)) >= 4.(mode(X)-E(X))^2/(3.(Var(X)+(mode(X)-E(X))^2))
and by considering Y = -X we get the result:
If mode(Y) <= E(Y)
then P(Y>=mode(Y)) >= 4.(E(Y)-mode(Y))^2/(3.(Var(Y)+(E(Y)-mode(Y))^2))
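
These one-tailed bounds can also be illustrated numerically. Here is a sketch using a triangular distribution on [0,3] with mode 2, an arbitrary example satisfying mode(X) > E(X), whose moments and tail probability have simple closed forms:

    # Triangular distribution on [a,b] with mode c: mean = (a+b+c)/3,
    # Var = (a^2+b^2+c^2-a.b-a.c-b.c)/18, and P(X >= c) = (b-c)/(b-a).
    a, b, c = 0.0, 3.0, 2.0
    mean = (a + b + c) / 3
    var = (a*a + b*b + c*c - a*b - a*c - b*c) / 18
    M = c - mean                        # mode(X) - E(X), here positive
    p_tail = (b - c) / (b - a)          # P(X >= mode(X))
    bound = (3*var - M**2) / (3*(var + M**2))
    print(f"P(X >= mode) = {p_tail:.4f} <= bound = {bound:.4f}")
    assert p_tail <= bound

Here P(X >= mode(X)) = 1/3 while the bound evaluates to 19/27, so the inequality holds with room to spare.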


Note D: Weakening and simplifying Gauss's inequality

The terms in these results have slight similarities with Gauss's inequality of 1821 for a unimodal distribution:
P(|X-mode(X)| >= g.sqrt(Var(X)+(mode(X)-E(X))^2)) <= 4/(9.g^2)

Since (mode(X)-E(X))^2 <= 3.Var(X) implies Var(X)+(mode(X)-E(X))^2 <= 4.Var(X), this could become
P(|X-mode(X)| >= 2.g.sd(X)) <= 16/(9.(2.g)^2)
i.e. P(|X-mode(X)| >= k.sd(X)) <= 16/(9.k^2)
or P(|X-mode(X)| >= t) <= Var(X).16/(9.t^2)
which is weaker but simpler than Gauss's inequality,
and is not that far away in form from Chebyshev's inequality:
P(|X-E(X)| > k.sd(X)) <= 1/k^2
or P(|X-E(X)| > t) <= Var(X)/t^2
though in a unimodal case Chebyshev's inequality could be considerably tightened.

In the extreme case of the uniform distribution where the mode is taken as being at one end,
P(|X-mode(X)| >= k.sd(X)) = 16/(9.k^2) for only one value of k:
k = 4/sqrt(3), i.e. about 2.309401, when P(|X-mode(X)| >= k.sd(X)) = 1/3.
For all other unimodal distributions or values of k, equality is not achieved.
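
A small Python sketch (illustrative only) of this extreme case, with X uniform on [0,1] and the mode taken at 0, so that P(|X-mode(X)| >= k.sd(X)) = 1 - k/sqrt(12):

    from math import sqrt

    # Compare the exact tail probability of a uniform distribution on [0,1]
    # (mode taken at 0) with the weakened Gauss bound 16/(9.k^2).
    sd = 1 / sqrt(12)
    for k in [1.5, 2.0, 4 / sqrt(3), 2.5, 3.0]:
        p = max(0.0, 1 - k * sd)        # P(X >= k.sd) since mode = 0
        bound = 16 / (9 * k * k)
        print(f"k = {k:.6f}: tail = {p:.6f}, bound = {bound:.6f}")

Only at k = 4/sqrt(3) do the two printed columns agree (both equal 1/3).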


Note E: Speculation on |median(X)-E(X)| for unimodal distributions

The result above is |mode(X)-E(X)| <= sqrt(3).sd(X)
or equivalently (mode(X)-E(X))^2 <= 3.Var(X)

For a general random variable the equivalent result for the median is well known:
|median(X)-E(X)| <= sd(X)
or equivalently (median(X)-E(X))^2 <= Var(X)

I suspect that for a continuous random variable with a unimodal distribution the equivalent result for the median is
|median(X)-E(X)| <= sqrt(3/5).sd(X)
where sqrt(3/5) = 0.77459..
or equivalently 5.(median(X)-E(X))^2 <= 3.Var(X)
derived from speculation on a one-tailed version of Chebyshev's inequality for unimodal distributions, with equality approached for distributions with about half their probability very close to a single point and the remainder virtually uniformly distributed on one side of that point.

This last result is tighter than the case for discrete random variables.
For example, if P(Y=0) = 1/2-d and P(Y=1) = 1/2+d for some small d>0,
then Y meets the definition of "weakly unimodal" in Step 4,
but |median(Y)-E(Y)| >= (1-2.d).sd(Y), which can be very close to sd(Y).
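
This two-point example is easy to verify numerically (a sketch, with a few small illustrative values of d):

    from math import sqrt

    # P(Y=0) = 1/2-d, P(Y=1) = 1/2+d, so median(Y) = 1, and
    # |median-E|/sd = sqrt((1-2.d)/(1+2.d)), which tends to 1 as d -> 0+.
    for d in [0.1, 0.01, 0.001]:
        mean = 0.5 + d                               # E(Y)
        var = (0.5 - d) * mean**2 + (0.5 + d) * (1 - mean)**2
        ratio = abs(1 - mean) / sqrt(var)
        print(f"d={d}: |median-E|/sd = {ratio:.6f} (sqrt(3/5) = {sqrt(3/5):.6f})")

The printed ratio exceeds sqrt(3/5) and approaches 1, showing that the continuous speculation cannot extend to discrete distributions.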


Note F: Among positive monotonically decreasing continuous random variables, a uniform distribution minimises key descriptive statistics

Step 2 above showed that for a positive random variable with a known mean, an unknown finite variance and a monotonically decreasing probability density function, the second moment E(X^2) is bounded below by that of a uniform distribution with a minimum of zero and the same mean.

Similar methods can be used to show directly that this uniform distribution provides a lower bound for both the variance and the maximum.

Similar methods can also show that for a positive random variable with an unknown finite mean, an unknown finite variance and a monotonically decreasing probability density function bounded above by a known value (at or near zero), the first moment E(X), the second moment E(X^2), the variance and the maximum are all bounded below by those of a uniform distribution with a minimum of zero whose probability density function attains the same bound.

The ability of either a uniform distribution (as in Step 2), or a combination of a point of positive probability and a uniform distribution (as in Step 3 and Note A), to be the limiting case leads to speculation about possible further results.

For example, it might be the case that for a continuous random variable X with a unimodal probability density function and with values over a finite range:
maximum(X)-minimum(X) >= 3.sd(X)
with equality approached for distributions with about a third of their probability very close to a single point and the remainder virtually uniformly distributed on one side of that point;
maximum(X)-minimum(X) >= 4.|median(X)-E(X)|
with equality approached for distributions with about half their probability very close to a single point and the remainder virtually uniformly distributed on one side of that point;
maximum(X)-minimum(X) >= 2.|mode(X)-E(X)|
with equality approached for distributions virtually uniformly distributed and with the mode at one end.

The results for discrete random variables with unimodal probability distributions (as defined in Step 4) can be very different, perhaps
maximum(Y)-minimum(Y) >= 2.sd(Y)
and at first glance it seems likely that this result may in fact apply to all bounded random variables, unimodal or not, whether they are discrete, continuous or mixed (it does: this is Popoviciu's inequality on variances, Var(Y) <= (maximum(Y)-minimum(Y))^2/4).
We then have the slightly broader result
maximum(Y)-minimum(Y) >= 2.sd(Y) >= 2.|median(Y)-E(Y)|
with the possibility of achieving or at least approaching equality.
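
The two limiting shapes suggested above can be checked in closed form with a sketch (illustrative only: an atom of probability q at 0, plus a uniform density carrying the remaining 1-q on (0,1]):

    from math import sqrt

    # Mixture: P(Y=0) = q, plus uniform density (1-q) on (0,1].
    # E = (1-q)/2, E(Y^2) = (1-q)/3, range = 1; cdf F(x) = q + (1-q).x.
    for q in [1/3, 1/2]:
        mean = (1 - q) / 2
        var = (1 - q) / 3 - mean**2
        median = 0.0 if q >= 0.5 else (0.5 - q) / (1 - q)
        print(f"q={q:.3f}: range/sd = {1/sqrt(var):.4f}, "
              f"range/|median-E| = {1/abs(median - mean):.4f}")

With q = 1/3 the ratio range/sd is exactly 3, and with q = 1/2 the ratio range/|median-E| is exactly 4, matching the first two speculated inequalities.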


Go to some Statistics Jokes or look at a one-tailed version of Chebyshev's inequality and further discussion or the mean, median, mode and standard deviation relationship or see Henry Bottomley's home page

Copyright December 1999 Henry Bottomley. All rights reserved.