An upper bound on |mode(X)-E(X)|


This page states and proves an upper bound on the difference between the mode and the mean of a random variable with a weakly unimodal probability density function or a weakly unimodal probability distribution. It also discusses when the bound is achieved or approached, what constraints there are on the distribution on one side of the mode, how the bound relates to Gauss's inequality, a speculative equivalent result for the median, and the minimising properties of a uniform distribution.

|mode(X)-E(X)| <= sqrt(3).sd(X)

Contents:

Step 0: Initial assumptions
Step 1: Dealing with X >= mode(X) > E(X)
Step 2: Dealing with X < mode(X)
Step 3: Result for continuous case
Step 4: Result for discrete case

Note A: Combining continuous and discrete elements
Note B: Achieving equality
Note C: A one-tailed inequality for P(X >= mode(X))
Note D: Weakening and simplifying Gauss's inequality
Note E: Speculation on |median(X)-E(X)| for unimodal distributions
Note F: Minimising properties of the uniform distribution


written by Henry Bottomley     

To prove that for a weakly unimodal random variable X:
|mode(X)-E(X)| <= sqrt(3).sd(X)
or equivalently (mode(X)-E(X))^2 <= 3.Var(X)
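
As a quick numerical sanity check before the detailed steps (this sketch is not part of the proof; the three distributions are chosen purely for illustration, since their mode, mean and variance have closed forms), the bound can be verified in Python:

    from math import sqrt

    # Spot-check |mode(X)-E(X)| <= sqrt(3).sd(X) on distributions with
    # closed-form moments: (name, mode, mean, variance).
    cases = [
        ("uniform on [0,1], mode taken at 1", 1.0, 1/2, 1/12),
        ("triangular on [0,1], mode at 1",    1.0, 2/3, 1/18),
        ("exponential with rate 1",           0.0, 1.0, 1.0),
    ]

    for name, mode, mean, var in cases:
        ratio = abs(mode - mean) / sqrt(var)
        assert ratio <= sqrt(3) + 1e-12
        print(f"{name}: |mode-E|/sd = {ratio:.6f} (bound {sqrt(3):.6f})")

The uniform case attains the bound exactly (see Note B); the other two fall strictly below it.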

Step 0: Initial assumptions

Step 1: Dealing with X >= mode(X) > E(X)

Step 2: Dealing with X < mode(X)

Step 3: Result for continuous case

Step 4: Result for discrete case

[The detailed derivations for Steps 0 to 4 are not reproduced here; the notes below refer to quantities introduced in them, such as f(x) and u(x) in Step 2, a and m in Step 3, and h in Step 4.]


Note A: Random variables combining continuous and discrete elements

Steps 0 to 3 dealt with continuous random variables, while Step 4 dealt with discrete random variables. The remaining question is that of combinations of the two.

We can exclude random variables that combine two or more points of positive probability with continuous elements, since they will fail a suitable definition of unimodality similar to that in Step 4.

That leaves random variables that combine a single point of positive probability together with continuous elements. For this to be unimodal, the point of positive probability needs to be the mode of the random variable, and the continuous elements need to be weakly monotonically increasing up to that point and weakly monotonically decreasing down from that point. In such a case the calculations in Steps 0 to 3 still apply, and so the result applies.


Note B: Achieving equality |mode(X)-E(X)| = sqrt(3).sd(X)

The equality |mode(X)-E(X)| = sqrt(3).sd(X) is achieved for any uniform distribution if the mode is taken as being at one end.

If this is not seen as sufficiently unimodal, then equality is not achieved for a continuous random variable except in the trivial case where the distribution is a single atom, since equality requires a=0 in Step 3, and requires f(x)=u(x) in Step 2, possibly except at x=0 and x=2.m/(1-a).

But consider a continuous random variable Xd with probability density function pd(x),
where for some d, i and j with j>0 and 0<d<=1/j
pd(x)=0 if x<=i, pd(x)=(1+2.d.x-2.d.i-d.j)/j if i<x<=i+j, pd(x)=0 if i+j<x.
Then mode(Xd) = i+j
E(Xd) = i + j/2 + d.j^2/6
Var(Xd) = j^2/12 - d^2.j^4/36
(mode(Xd)-E(Xd))^2/Var(Xd) = (3-d.j)^2/(3-d^2.j^2)
= 3 - 2.d.j + d^2.j^2.(4-2.d.j)/(3-d^2.j^2)

So as d tends to 0 from above, Xd approaches a uniform distribution
and (mode(Xd)-E(Xd))^2/Var(Xd) tends to 3 from below.

By comparison,
for a triangular distribution, d=1/j and (mode(Xd)-E(Xd))^2/Var(Xd) = 2,
while for an exponential distribution (mode(X)-E(X))^2/Var(X) = 1.
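
A short numerical check of the Xd example (illustrative only; the choices i=0 and j=1 are arbitrary) confirms both the closed-form ratio and its limit:

    # Density pd(x) = (1 + 2.d.(x-i) - d.j)/j on (i, i+j], zero elsewhere;
    # mode(Xd) = i+j, and (mode-E)^2/Var should equal (3-d.j)^2/(3-d^2.j^2).
    i, j = 0.0, 1.0

    def moments(d, n=100000):
        h = j / n
        xs = [i + (k + 0.5) * h for k in range(n)]   # midpoint rule
        ps = [(1 + 2*d*(x - i) - d*j) / j for x in xs]
        mean = sum(p * x for p, x in zip(ps, xs)) * h
        var = sum(p * (x - mean)**2 for p, x in zip(ps, xs)) * h
        return mean, var

    for d in [1.0, 0.1, 0.01, 0.001]:
        mean, var = moments(d)
        ratio = (i + j - mean)**2 / var              # mode is at i+j
        exact = (3 - d*j)**2 / (3 - d**2 * j**2)
        print(f"d={d}: numerical {ratio:.6f}, formula {exact:.6f}")

As d falls, the printed ratio climbs towards 3; at d=1 (the triangular case with j=1) it is 2, as stated above.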

 

For a discrete random variable, equality is not achieved except in the trivial case where the distribution is a single atom, since equality in the inequality mode(X)-E(X) >= h/2 in Step 4 is only achieved if mode(Y)-E(Y) = 0.

But consider a discrete random variable Yn with probability function P(Yn=y),
where for some h and i, and some integer n, with h>0 and n>0,
P(Yn=i+j.h)=1/(n+1) if j is an integer with 0<=j<=n, and P(Yn=i+j.h)=0 otherwise.
Then choose mode(Yn) = i+n.h
E(Yn) = i + n.h/2
Var(Yn) = n.(n+2).h^2/12
(mode(Yn)-E(Yn))^2/Var(Yn) = 3.n/(n+2)
= 3 - 6/(n+2)

So as n tends to infinity,
(mode(Yn)-E(Yn))^2/Var(Yn) tends to 3 from below,
and in a sense the cumulative distribution function of Yn approaches that of a uniform distribution, particularly if h is made proportional to 1/n.

Yn is only weakly unimodal, but it would be easy to construct a similar sequence of strictly unimodal random variables with the same property.
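
The Yn example can be checked directly (again illustrative, with i=0 and h=1 chosen arbitrarily):

    # Yn is uniform on {i, i+h, ..., i+n.h}; mode taken at i+n.h.
    # The ratio (mode-E)^2/Var should equal 3.n/(n+2).
    i, h = 0.0, 1.0
    for n in [1, 2, 10, 100, 1000]:
        ys = [i + k * h for k in range(n + 1)]
        mean = sum(ys) / (n + 1)
        var = sum((y - mean)**2 for y in ys) / (n + 1)
        ratio = (i + n*h - mean)**2 / var
        print(f"n={n}: ratio {ratio:.6f}, 3.n/(n+2) = {3*n/(n+2):.6f}")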

 

For a random variable which combines discrete and continuous elements, equality is not achieved, since it requires a=0 in Step 3.


Note C: A one-tailed inequality for P(X>=mode(X))

Since in Step 3 above
Var(X) >= (mode(X)-E(X))^2.(1+3.P(X>=mode(X)))/(3.(1-P(X>=mode(X))))
we get the result:
If mode(X) >= E(X)
then P(X>=mode(X)) <= (3.Var(X)-(mode(X)-E(X))^2)/(3.(Var(X)+(mode(X)-E(X))^2))
i.e. P(X>=mode(X)) <= 1 - 4.(mode(X)-E(X))^2/(3.(Var(X)+(mode(X)-E(X))^2))

From this we have
P(X<=mode(X)) >= 4.(mode(X)-E(X))^2/(3.(Var(X)+(mode(X)-E(X))^2))
and by considering Y = -X we get the result:
If mode(Y) <= E(Y)
then P(Y>=mode(Y)) >= 4.(E(Y)-mode(Y))^2/(3.(Var(Y)+(E(Y)-mode(Y))^2))
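
These one-tailed bounds can also be illustrated numerically. Here is a sketch using a triangular distribution on [0,3] with mode 2, an arbitrary example satisfying mode(X) > E(X), whose moments and tail probability have simple closed forms:

    # Triangular distribution on [a,b] with mode c: mean = (a+b+c)/3,
    # Var = (a^2+b^2+c^2-a.b-a.c-b.c)/18, and P(X >= c) = (b-c)/(b-a).
    a, b, c = 0.0, 3.0, 2.0
    mean = (a + b + c) / 3
    var = (a*a + b*b + c*c - a*b - a*c - b*c) / 18
    M = c - mean                        # mode(X) - E(X), here positive
    p_tail = (b - c) / (b - a)          # P(X >= mode(X))
    bound = (3*var - M**2) / (3*(var + M**2))
    print(f"P(X >= mode) = {p_tail:.4f} <= bound = {bound:.4f}")
    assert p_tail <= bound

Here P(X >= mode(X)) = 1/3 while the bound evaluates to 19/27, so the inequality holds with room to spare.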


Note D: Weakening and simplifying Gauss's inequality

The terms in these results have slight similarities with Gauss's inequality of 1821 for a unimodal distribution:
P(|X-mode(X)| >= g.sqrt(Var(X)+(mode(X)-E(X))^2)) <= 4/(9.g^2)

Since (mode(X)-E(X))^2 <= 3.Var(X) implies Var(X)+(mode(X)-E(X))^2 <= 4.Var(X), this could become
P(|X-mode(X)| >= 2.g.sd(X)) <= 16/(9.(2.g)^2)
i.e. P(|X-mode(X)| >= k.sd(X)) <= 16/(9.k^2)
or P(|X-mode(X)| >= t) <= Var(X).16/(9.t^2)
which is weaker but simpler than Gauss's inequality,
and is not that far away in form from Chebyshev's inequality:
P(|X-E(X)| > k.sd(X)) <= 1/k^2
or P(|X-E(X)| > t) <= Var(X)/t^2
though in a unimodal case Chebyshev's inequality could be considerably tightened.

In the extreme case of the uniform distribution where the mode is taken as being at one end,
P(|X-mode(X)| >= k.sd(X)) = 16/(9.k^2) for only one value of k:
k = 4/sqrt(3), i.e. about 2.309401, when P(|X-mode(X)| >= k.sd(X)) = 1/3.
For all other unimodal distributions or values of k, equality is not achieved.
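
A small Python sketch (illustrative only) of this extreme case, with X uniform on [0,1] and the mode taken at 0, so that P(|X-mode(X)| >= k.sd(X)) = 1 - k/sqrt(12):

    from math import sqrt

    # Compare the exact tail probability of a uniform distribution on [0,1]
    # (mode taken at 0) with the weakened Gauss bound 16/(9.k^2).
    sd = 1 / sqrt(12)
    for k in [1.5, 2.0, 4 / sqrt(3), 2.5, 3.0]:
        p = max(0.0, 1 - k * sd)        # P(X >= k.sd) since mode = 0
        bound = 16 / (9 * k * k)
        print(f"k = {k:.6f}: tail = {p:.6f}, bound = {bound:.6f}")

Only at k = 4/sqrt(3) do the two printed columns agree (both equal 1/3).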


Note E: Speculation on |median(X)-E(X)| for unimodal distributions

The result above is |mode(X)-E(X)| <= sqrt(3).sd(X)
or equivalently (mode(X)-E(X))^2 <= 3.Var(X)

For a general random variable the equivalent result for the median is well known:
|median(X)-E(X)| <= sd(X)
or equivalently (median(X)-E(X))^2 <= Var(X)

I suspect that for a continuous random variable with a unimodal distribution the equivalent result for the median is
|median(X)-E(X)| <= sqrt(3/5).sd(X)
where sqrt(3/5) = 0.77459..
or equivalently 5.(median(X)-E(X))^2 <= 3.Var(X)
derived from speculation on a one-tailed version of Chebyshev's inequality for unimodal distributions, with equality approached for distributions with about half their probability very close to a single point and the remainder virtually uniformly distributed on one side of that point.

This last result is tighter than the case for discrete random variables.
For example, if P(Y=0) = 1/2-d and P(Y=1) = 1/2+d for some small d>0,
then Y meets the definition of "weakly unimodal" in Step 4,
but |median(Y)-E(Y)| >= (1-2.d).sd(Y), which can be very close to sd(Y).
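
This two-point example is easy to verify numerically (a sketch, with a few small illustrative values of d):

    from math import sqrt

    # P(Y=0) = 1/2-d, P(Y=1) = 1/2+d, so median(Y) = 1, and
    # |median-E|/sd = sqrt((1-2.d)/(1+2.d)), which tends to 1 as d -> 0+.
    for d in [0.1, 0.01, 0.001]:
        mean = 0.5 + d                               # E(Y)
        var = (0.5 - d) * mean**2 + (0.5 + d) * (1 - mean)**2
        ratio = abs(1 - mean) / sqrt(var)
        print(f"d={d}: |median-E|/sd = {ratio:.6f} (sqrt(3/5) = {sqrt(3/5):.6f})")

The printed ratio exceeds sqrt(3/5) and approaches 1, showing that the continuous speculation cannot extend to discrete distributions.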


Note F: Among positive monotonically decreasing continuous random variables, a uniform distribution minimises key descriptive statistics

Step 2 above showed that for a positive random variable with a known mean, an unknown finite variance and a monotonically decreasing probability density function, the second moment E(X^2) is bounded below by that of a uniform distribution with a minimum of zero and the same mean.

Similar methods can be used to show directly that this uniform distribution provides a lower bound for both the variance and the maximum.

Similar methods can also show that for a positive random variable with an unknown finite mean, an unknown finite variance and a monotonically decreasing probability density function bounded above by a known value (at or near zero), the first moment E(X), the second moment E(X^2), the variance and the maximum are all bounded below by those of a uniform distribution with a minimum of zero whose probability density function attains the same bound.

The ability of either a uniform distribution (as in Step 2), or a combination of a point of positive probability and a uniform distribution (as in Step 3 and Note A), to be the limiting case leads to speculation about possible further results.

For example, it might be the case that for a continuous random variable X with a unimodal probability density function and with values over a finite range:
maximum(X)-minimum(X) >= 3.sd(X)
with equality approached for distributions with about a third of their probability very close to a single point and the remainder virtually uniformly distributed on one side of that point;
maximum(X)-minimum(X) >= 4.|median(X)-E(X)|
with equality approached for distributions with about half their probability very close to a single point and the remainder virtually uniformly distributed on one side of that point;
maximum(X)-minimum(X) >= 2.|mode(X)-E(X)|
with equality approached for distributions virtually uniformly distributed and with the mode at one end.

The results for discrete random variables with unimodal probability distributions (as defined in Step 4) can be very different, perhaps
maximum(Y)-minimum(Y) >= 2.sd(Y)
and at first glance it seems likely that this result may in fact apply to all bounded random variables, unimodal or not, whether they are discrete, continuous or mixed (it does: this is Popoviciu's inequality on variances, Var(Y) <= (maximum(Y)-minimum(Y))^2/4).
We then have the slightly broader result
maximum(Y)-minimum(Y) >= 2.sd(Y) >= 2.|median(Y)-E(Y)|
with the possibility of achieving or at least approaching equality.
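
The two limiting shapes suggested above can be checked in closed form with a sketch (illustrative only: an atom of probability q at 0, plus a uniform density carrying the remaining 1-q on (0,1]):

    from math import sqrt

    # Mixture: P(Y=0) = q, plus uniform density (1-q) on (0,1].
    # E = (1-q)/2, E(Y^2) = (1-q)/3, range = 1; cdf F(x) = q + (1-q).x.
    for q in [1/3, 1/2]:
        mean = (1 - q) / 2
        var = (1 - q) / 3 - mean**2
        median = 0.0 if q >= 0.5 else (0.5 - q) / (1 - q)
        print(f"q={q:.3f}: range/sd = {1/sqrt(var):.4f}, "
              f"range/|median-E| = {1/abs(median - mean):.4f}")

With q = 1/3 the ratio range/sd is exactly 3, and with q = 1/2 the ratio range/|median-E| is exactly 4, matching the first two speculated inequalities.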


Go to some Statistics Jokes or look at a one-tailed version of Chebyshev's inequality and further discussion or the mean, median, mode and standard deviation relationship or see Henry Bottomley's home page

Copyright December 1999 Henry Bottomley. All rights reserved.