## An upper bound on |mode(X)-E(X)|

This page states and proves an upper bound on the difference between the mode and the mean of a random variable with a weakly unimodal probability density function or a weakly unimodal probability distribution. It also discusses when the bound is achieved or approached, what constraints there are on the distribution on one side of the mode, how this relates to Gauss's inequality, an equivalent speculative result for the median, and the minimising properties of a uniform distribution.


To prove that for a weakly unimodal random variable X:
|mode(X)-E(X)|<=sqrt(3).sd(X)
or equivalently (mode(X)-E(X))^2<=3.Var(X)

Step 0

• Look first at a continuous random variable X
with a weakly unimodal probability density function
(The discrete case comes in Step 4)
• "Weakly unimodal probability density function" means
a probability density function p(x) for X
with a value of x called mode(X)
where for all y and z where z<y<mode(X) or mode(X)<y<z
p(z)<=p(y)<=p(mode(X))
i.e. p(x) is weakly monotonically increasing up to mode(X)
and p(x) is weakly monotonically decreasing down from mode(X)
• Note that in cases where p(x) achieves its maximum over a range
mode(X) may not be uniquely defined;
in such cases the selection of a value for mode(X) is discretionary
• If mode(X)=E(X) then |mode(X)-E(X)|=0<=sqrt(3).sd(X)
• So assume mode(X)>E(X)
(otherwise consider Y= -X and mode(Y)>E(Y) )
• Let m=mode(X)-E(X)
and so from assumption m>0
• Let a=P(X>=mode(X))
note a<1
• Let b=E(X-E(X)|X>=mode(X))
note b>=m>0
• Let c=E((X-E(X))^2|X>=mode(X))
note c>=m^2>0 and a.c<Var(X)
• then E(X-E(X)|X<mode(X))= -a.b/(1-a)
• and E((X-E(X))^2|X<mode(X))=(Var(X)-a.c)/(1-a)
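The last two bullets are just the law of total expectation, so they can be checked exactly on a small discrete stand-in (the pmf below is an assumed example, not taken from the text):

```python
from fractions import Fraction as F

# An assumed weakly unimodal pmf: the two conditional-moment identities at
# the end of Step 0 are distribution-free, so exact rational arithmetic on
# a discrete example suffices to check the algebra.
pmf = {0: F(1, 10), 1: F(2, 10), 2: F(4, 10), 3: F(2, 10), 4: F(1, 10)}
mode = 2

mean = sum(x * p for x, p in pmf.items())
var = sum((x - mean) ** 2 * p for x, p in pmf.items())

a = sum(p for x, p in pmf.items() if x >= mode)                     # P(X>=mode)
b = sum((x - mean) * p for x, p in pmf.items() if x >= mode) / a
c = sum((x - mean) ** 2 * p for x, p in pmf.items() if x >= mode) / a

lower_mean = sum((x - mean) * p for x, p in pmf.items() if x < mode) / (1 - a)
lower_2nd = sum((x - mean) ** 2 * p for x, p in pmf.items() if x < mode) / (1 - a)

assert lower_mean == -a * b / (1 - a)           # E(X-E(X) | X<mode(X))
assert lower_2nd == (var - a * c) / (1 - a)     # E((X-E(X))^2 | X<mode(X))
```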

Step 1

• Define M by M|(X>=mode(X))=m
and M|(X<mode(X))=0
• Define S by S|(X>=mode(X))=0
and S|(X<mode(X))=m.(a.b-a.m+(X-E(X)))/(a.b-a.m+m)
• M.S=0
• E(M)=a.m
• E(S)=(1-a).m.(a.b-a.m-a.b/(1-a))/(a.b-a.m+m)
= -a.m
• so E(M+S)=0
• E(M^2)=a.m^2<=a.c
• E(S^2)=(1-a).E(S^2|X<mode(X))
<=(1-a).(m^2/(a.b-a.m+m)^2).(Var(X)-a.c)/(1-a)
<=Var(X)-a.c
(the first inequality holds because the cross term in E(S^2|X<mode(X)) is non-positive, since 0<=a.b-a.m<=2.a.b/(1-a))
• so Var(M+S)=E(M^2)+E(S^2)
<=Var(X)
• and S|(X<mode(X)) is weakly monotonically increasing up to m
with mean -a.m/(1-a) and variance<=(Var(X)-a.c)/(1-a)

Step 2

• (m-S)|(X<mode(X)) is a positive continuous random variable
with a weakly monotonically decreasing probability density function,
mean m/(1-a) and finite variance bounded above by (Var(X)-a.c)/(1-a)
• let f(x) be probability density function of (m-S)|(X<mode(X))
• and let u(x) be probability density function of U(0,2.m/(1-a))
also a positive random variable with the same mean m/(1-a)
• f and u both integrate to 1 and have the same mean m/(1-a)
so int[0 to infinity]((f(x)-u(x)).dx)=0
and int[0 to infinity](x.(f(x)-u(x)).dx)=0
• f(x)>=0=u(x) if x>2.m/(1-a)
while on [0,2.m/(1-a)] f is weakly decreasing and u is constant
so f(x)-u(x) changes sign at most twice, in the pattern +,-,+
• let x1<=x2 be the points of sign change
(taking x1=x2=0 if there are fewer)
then (x-x1).(x-x2) follows the same sign pattern +,-,+ as f(x)-u(x)
so int[0 to infinity]((x-x1).(x-x2).(f(x)-u(x)).dx)>=0
• expanding the product and using the two zero integrals above
int[0 to infinity](x^2.(f(x)-u(x)).dx)>=0
• so E((m-S)^2|(X<mode(X)))=int[0 to infinity](x^2.f(x).dx)
>=int[0 to infinity](x^2.u(x).dx)=E(U^2)
=(4/3).(m/(1-a))^2
• so E((m-S)^2)
>=a.m^2+(4/3).m^2/(1-a)
=m^2.(3.a-3.a^2+4)/(3.(1-a))
• But E((m-S)^2)=m^2-2.m.E(S)+E(S^2)
=m^2.(1+2.a)+E(S^2)
• so E(S^2)
>=m^2.(3.a-3.a^2+4)/(3.(1-a))-m^2.(1+2.a)
=m^2.(1+3.a^2)/(3.(1-a))
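The inequality at the heart of Step 2 says that among positive random variables with a weakly decreasing density and a given mean, the uniform distribution minimises the second moment. A quick exact check on two closed-form densities (both assumed test cases, not taken from the text):

```python
from fractions import Fraction as F

# Claim from Step 2: a positive random variable with a weakly decreasing
# density and mean mu has E(X^2) >= (4/3)*mu^2, with equality for U(0, 2*mu).

# 1. Decreasing triangular on [0, L]: p(x) = 2*(L-x)/L^2,
#    mean = L/3, E(X^2) = L^2/6.
L = F(3)
mu, second = L / 3, L * L / 6
assert second >= F(4, 3) * mu ** 2          # 3/2 >= 4/3

# 2. Uniform U(0, 2*mu): E(X^2) = (2*mu)^2/3 = (4/3)*mu^2 exactly.
mu = F(5)
assert (2 * mu) ** 2 / 3 == F(4, 3) * mu ** 2
```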

Step 3

• Define M by M|(X>=mode(X))=m
and M|(X<mode(X))=0
• Define R by R|(X>=mode(X))=0
and R|(X<mode(X))~U(-m.(1+a)/(1-a),m)
• E(M)=a.m
and E(M^2)=a.m^2
• E(R)= -a.m
and E(R^2)=m^2.(1+3.a^2)/(3.(1-a))
• Var(X)>=E(M^2)+E(S^2)
>=E(M^2)+E(R^2)
=m^2.(1+3.a)/(3.(1-a))
>=m^2/3
• which implies Var(X)>=m^2/3=(mode(X)-E(X))^2/3
• i.e. |mode(X)-E(X)|<=sqrt(3).sd(X)
• The Pearson mode skewness is (E(X)-mode(X))/sd(X)
• and so |Pearson mode skewness|<=sqrt(3)
for a unimodal continuous random variable

Step 4

• Now consider a discrete random variable with a weakly unimodal probability distribution
• It is easy to create a discrete distribution for which the inequality apparently does not apply. For example with:
P(Y=0)=0.18, P(Y=1)=0.19, P(Y=2)=0.20, P(Y=3)=0.21, P(Y=10)=0.22
mode(Y)=10, E(Y)=3.42, Var(Y)=13.1836
(mode(Y)-E(Y))^2/Var(Y)=3.2841..>3
and higher figures can easily be obtained in similar ways
• This can be seen as not being properly unimodal
since P(Y=3)>P(Y=4)<P(Y=10)
but if we restrict unimodal probability distributions to those whose possible values are equally spaced
then we can achieve the result
• "Weakly unimodal probability distribution" means
a probability distribution P(Y=y) for Y
with a possible value of Y called mode(Y) and a spacing h>0
where for all integers n1 and n2 with n2<n1<0 or 0<n1<n2
P(Y=mode(Y)+n2.h)<=P(Y=mode(Y)+n1.h)<=P(Y=mode(Y))
and where if y<>mode(Y)+n.h for all integers n then P(Y=y)=0
• Note that in cases where P(Y=y) achieves its maximum over a range
mode(Y) may not be uniquely defined;
in such cases the selection of a value for mode(Y) is discretionary
• Suppose Y is such a discrete random variable with a weakly unimodal probability distribution
If mode(Y)=E(Y) then |mode(Y)-E(Y)|=0<=sqrt(3).sd(Y),
otherwise assume mode(Y)>E(Y) (or consider Z= -Y)
• Construct from Y a continuous random variable X
with a probability density function p(x)
where if for some integer n
mode(Y)+(n-1/2).h<x<=mode(Y)+(n+1/2).h
then p(x)=P(Y=mode(Y)+n.h)/h
• We then have
E(X)=E(Y)
Var(X)=Var(Y)+h^2/12
mode(X) is in the range (mode(Y)-h/2,mode(Y)+h/2]
• Select mode(X)=mode(Y)+h/2
so mode(X)-E(X)>=h/2
• p(x) is a weakly unimodal probability density function as in Step 0
so from Step 3 (mode(X)-E(X))^2/3<=Var(X)
• (mode(Y)-E(Y))^2
=(mode(X)-h/2-E(X))^2
=(mode(X)-E(X))^2-(mode(X)-E(X)).h+h^2/4
<=(mode(X)-E(X))^2-h^2/2+h^2/4
<=3.Var(X)-3.h^2/12
=3.Var(Y)
• so |mode(Y)-E(Y)|<=sqrt(3).sd(Y)
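Both the counterexample figures above and the atom-smearing construction can be verified with exact rational arithmetic (the second pmf is an assumed weakly unimodal example):

```python
from fractions import Fraction as F

def mean_var(pmf):
    m = sum(y * p for y, p in pmf.items())
    v = sum((y - m) ** 2 * p for y, p in pmf.items())
    return m, v

# 1. The counterexample pmf from the text: the ratio exceeds 3.
pmf = {0: F(18, 100), 1: F(19, 100), 2: F(20, 100),
       3: F(21, 100), 10: F(22, 100)}
mean, var = mean_var(pmf)
assert mean == F(342, 100) and var == F(131836, 10000)
assert (10 - mean) ** 2 / var > 3          # approx. 3.2841

# 2. The histogram construction: an atom of mass p at y becomes a uniform
#    slab on (y-h/2, y+h/2], contributing p*y to E(X) and p*(y^2 + h^2/12)
#    to E(X^2). The pmf here is an assumed weakly unimodal example.
h = F(1, 2)
pmf2 = {0 * h: F(1, 10), 1 * h: F(2, 10), 2 * h: F(4, 10),
        3 * h: F(2, 10), 4 * h: F(1, 10)}
ey, var_y = mean_var(pmf2)
ex = sum(y * p for y, p in pmf2.items())
ex2 = sum((y ** 2 + h ** 2 / 12) * p for y, p in pmf2.items())
assert ex == ey                              # E(X) = E(Y)
assert ex2 - ex ** 2 == var_y + h ** 2 / 12  # Var(X) = Var(Y) + h^2/12
```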

Note A: Random variables combining continuous and discrete elements

Steps 0 to 3 dealt with continuous random variables while Step 4 dealt with discrete random variables. The remaining question is that of combinations of the two.

We can exclude random variables with two or more points of positive probability together with continuous elements, since they will fail a suitable definition of being unimodal similar to that in Step 4.

That leaves random variables that combine a single point of positive probability together with continuous elements. For this to be unimodal, the point of positive probability needs to be the mode of the random variable, and the continuous elements need to be weakly monotonically increasing up to that point and weakly monotonically decreasing down from that point. In such a case the calculations in Steps 0 to 3 still apply, and so the result applies.

Note B: Achieving equality |mode(X)-E(X)|=sqrt(3).sd(X)

The equality |mode(X)-E(X)|=sqrt(3).sd(X) is achieved for any uniform distribution if the mode is taken as being at one end.

If this is not seen as being sufficiently unimodal, then equality is not achieved for a continuous random variable except in the trivial case where the distribution is a single atom: equality requires a=0 in Step 3, and requires f(x)=u(x) (except possibly at x=0 and x=2.m/(1-a)) in Step 2.

But consider a continuous random variable Xd with probability density function pd(x)
where for some d, i and j with j>0 and 0<d<=1/j
pd(x)=0 if x<=i, pd(x)=(1+2.d.x-2.d.i-d.j)/j if i<x<=i+j, pd(x)=0 if i+j<x,
then mode(Xd)=i+j
E(Xd)=i+j/2+d.j^2/6
Var(Xd)=j^2/12-d^2.j^4/36
(mode(Xd)-E(Xd))^2/Var(Xd)=(3-d.j)^2/(3-d^2.j^2)
=3-2.d.j+d^2.j^2.(4-2.d.j)/(3-d^2.j^2)

So as d tends to 0 from above, Xd approaches a uniform distribution
and (mode(Xd)-E(Xd))^2/Var(Xd) tends to 3 from below

By comparison,
for a triangular distribution, d=1/j and (mode(Xd)-E(Xd))^2/Var(Xd)=2
while for an exponential distribution (mode(X)-E(X))^2/Var(X)=1
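These closed forms can be reproduced by exact integration of the linear density pd (i=0 and j=1 below are arbitrary choices):

```python
from fractions import Fraction as F

def stats(i, j, d):
    # k-th moment of Xd about i: integral over [0, j] of
    # t**k * (1 + 2*d*t - d*j)/j dt = (1-d*j)*j**k/(k+1) + 2*d*j**(k+1)/(k+2)
    def m(k):
        return (1 - d * j) * j ** k / (k + 1) + 2 * d * j ** (k + 1) / (k + 2)
    mean = i + m(1)
    var = m(2) - m(1) ** 2
    return mean, var

i, j = F(0), F(1)
for d in (F(1), F(1, 2), F(1, 10), F(1, 1000)):  # d = 1/j is the triangular case
    mean, var = stats(i, j, d)
    mode = i + j
    ratio = (mode - mean) ** 2 / var
    assert ratio == (3 - d * j) ** 2 / (3 - (d * j) ** 2)
    assert ratio < 3                             # tends to 3 as d -> 0

assert stats(i, j, F(1))[1] == F(1, 18)          # triangular: Var = j^2/18
```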

For a discrete random variable equality is not achieved except in the trivial case where the distribution is a single atom, since equality in the inequality mode(X)-E(X)>=h/2 in Step 4 is only achieved if mode(Y)-E(Y)=0

But consider a discrete random variable Yn with probability distribution P(Yn=y)
where for some h, i and integer n with h>0 and n>0
P(Yn=i+j.h)=1/(n+1) if j is an integer with 0<=j<=n and P(Yn=i+j.h)=0 otherwise
then choose mode(Yn)=i+n.h
E(Yn)=i+n.h/2
Var(Yn)=n.(n+2).h^2/12
(mode(Yn)-E(Yn))^2/Var(Yn)=3.n/(n+2)
=3-6/(n+2)

So as n tends to infinity
(mode(Yn)-E(Yn))^2/Var(Yn) tends to 3 from below
and in a sense the cumulative probability function of Yn approaches that of a uniform distribution, particularly if h is made proportional to 1/n

Yn is only weakly unimodal, but it would be easy to construct a similar sequence of strictly unimodal random variables with the same property
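The moments of this equally spaced family can be confirmed exactly (the values of i and h below are arbitrary choices):

```python
from fractions import Fraction as F

# Exact moments of the discrete uniform family on i, i+h, ..., i+n*h.
h, i = F(1, 3), F(2)
for n in (1, 2, 10, 100):
    pts = [i + j * h for j in range(n + 1)]
    p = F(1, n + 1)
    mean = sum(p * y for y in pts)
    var = sum(p * (y - mean) ** 2 for y in pts)
    mode = i + n * h                     # mode taken at the top end
    assert mean == i + n * h / 2
    assert var == n * (n + 2) * h ** 2 / 12
    ratio = (mode - mean) ** 2 / var
    assert ratio == F(3 * n, n + 2)      # = 3 - 6/(n+2), tends to 3 from below
    assert ratio < 3
```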

For a random variable which combines discrete and continuous elements equality is not achieved, since it requires a=0 in Step 3.

Note C: A one-tailed inequality for P(X>=mode(X))

Since in Step 3 above
Var(X)>=(mode(X)-E(X))^2.(1+3.P(X>=mode(X)))/(3.(1-P(X>=mode(X))))
we get the result:
If mode(X)>=E(X)
then P(X>=mode(X))<=(3.Var(X)-(mode(X)-E(X))^2)/(3.(Var(X)+(mode(X)-E(X))^2))
i.e. P(X>=mode(X))<=1-4.(mode(X)-E(X))^2/(3.(Var(X)+(mode(X)-E(X))^2))

From this we have
P(X<=mode(X))>=4.(mode(X)-E(X))^2/(3.(Var(X)+(mode(X)-E(X))^2))
and by considering Y= -X we get the result:
If mode(Y)<=E(Y)
then P(Y>=mode(Y))>=4.(E(Y)-mode(Y))^2/(3.(Var(Y)+(E(Y)-mode(Y))^2))
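A spot-check of the one-tailed bound on an assumed test case, a triangular density on [0,1] with its peak at c=3/4 (so the mode exceeds the mean and P(X>=mode(X))=1/4):

```python
from fractions import Fraction as F

# Triangular density on [0,1] with peak at c (an assumed example):
# mean = (1+c)/3, Var = (1 + c*c - c)/18, P(X >= c) = 1 - c.
# With c >= 1/2 the mode is at least the mean, so the Note C bound applies.
c = F(3, 4)
mean = (1 + c) / 3
var = (1 + c * c - c) / 18
m = c - mean                            # mode(X) - E(X) = 1/6 here
p_upper = 1 - c                         # P(X >= mode(X)) = 1/4

bound = (3 * var - m ** 2) / (3 * (var + m ** 2))
assert bound == 1 - 4 * m ** 2 / (3 * (var + m ** 2))   # the two forms agree
assert p_upper <= bound                                 # 1/4 <= 31/63
```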

Note D: Weakening and simplifying Gauss's inequality

The terms in these results have slight similarities with Gauss's inequality of 1821 for a unimodal distribution:
P(|X-mode(X)|>=g.sqrt(Var(X)+(mode(X)-E(X))^2))<=4/(9.g^2)

This could become P(|X-mode(X)|>2.g.sd(X))<=16/(9.(2.g)^2)
i.e. P(|X-mode(X)|>=k.sd(X))<=16/(9.k^2)
or P(|X-mode(X)|>=t)<=Var(X).16/(9.t^2)
which is weaker but simpler than Gauss's inequality
and is not that far away in form from Chebyshev's inequality:
P(|X-E(X)|>k.sd(X))<=1/k^2
or P(|X-E(X)|>t)<=Var(X)/t^2
though in a unimodal case, Chebyshev's inequality could be considerably tightened

In the extreme case of the uniform distribution where the mode is taken as being at one end,
P(|X-mode(X)|>=k.sd(X))=16/(9.k^2) for only one value of k,
namely k=4/sqrt(3), i.e. about 2.309401.., where P(|X-mode(X)|>=k.sd(X))=1/3
For all other unimodal distributions or values of k, equality is not achieved
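For this extreme case the tail probability has a simple closed form, so the single point of equality can be checked numerically:

```python
import math

# Uniform on [0,1] with the mode taken at the end 0: sd = 1/sqrt(12) and
# P(|X - mode(X)| >= k*sd) = 1 - k/sqrt(12) for 0 <= k <= sqrt(12).
sd = 1 / math.sqrt(12)

def tail(k):
    return max(0.0, 1 - k * sd)

def bound(k):
    return 16 / (9 * k * k)

k_eq = 4 / math.sqrt(3)                       # about 2.309401
assert math.isclose(tail(k_eq), 1 / 3)        # equality holds here...
assert math.isclose(bound(k_eq), 1 / 3)
for k in (1.0, 1.5, 2.0, 3.0):
    assert tail(k) < bound(k)                 # ...and is strict elsewhere
```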

Note E: Speculation on |median(X)-E(X)| for unimodal distributions

The result above is |mode(X)-E(X)|<=sqrt(3).sd(X)
or equivalently (mode(X)-E(X))^2<=3.Var(X)

For a general random variable the equivalent result for the median is well known as
|median(X)-E(X)|<=sd(X)
or equivalently (median(X)-E(X))^2<=Var(X)

I suspect that for a continuous random variable with a unimodal distribution the equivalent result for the median is
|median(X)-E(X)|<=sqrt(3/5).sd(X)
where sqrt(3/5)=0.77459..
or equivalently 5.(median(X)-E(X))^2<=3.Var(X)
derived from speculation on a one-tailed version of Chebyshev's inequality for unimodal distributions, with equality approached for distributions with about half their probability very close to a single point and the remainder virtually uniformly distributed on one side of that point

This last result is tighter than the case for discrete random variables
for example if P(Y=0)=1/2-d and P(Y=1)=1/2+d for some small d>0
then this meets the definition of "weakly unimodal" in Step 4
but |median(Y)-E(Y)|>=(1-2.d).sd(Y) which can be very close to sd(Y)
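The two-point example can be checked exactly, squaring both sides of the inequality to stay in rational arithmetic:

```python
from fractions import Fraction as F
import math

# The two-point example from the text: P(Y=0) = 1/2-d, P(Y=1) = 1/2+d.
d = F(1, 100)
mean = F(1, 2) + d                    # E(Y)
var = (F(1, 2) + d) * (F(1, 2) - d)   # Bernoulli variance p*(1-p)
median = 1                            # since P(Y<=0) = 1/2-d < 1/2
gap = median - mean                   # |median(Y)-E(Y)| = 1/2-d

# gap >= (1-2d)*sd(Y), checked exactly by squaring both (positive) sides
assert gap ** 2 >= (1 - 2 * d) ** 2 * var

ratio = float(gap) / math.sqrt(float(var))
assert 0.98 < ratio < 1               # close to sd(Y) for small d
```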

Note F: Among positive monotonically decreasing continuous random variables, a uniform distribution minimises key descriptive statistics

Step 2 above showed that for a positive random variable with a known mean, a finite variance and a weakly monotonically decreasing probability density function, the second moment E(X^2) is bounded below by that of a uniform distribution with a minimum of zero and the same mean.

Similar methods can be used to show directly that this uniform distribution provides a lower bound for both the variance and the maximum

Similar methods can also show that for a positive random variable with an unknown finite mean, an unknown finite variance and a monotonically decreasing probability density function bounded above by a known value (attained at or near zero), the first moment E(X), the second moment E(X^2), the variance and the maximum are all bounded below by those of a uniform distribution with a minimum of zero whose constant probability density function equals that known bound.

This ability of either a uniform distribution (as in Step 2), or a combination of a point of positive probability and a uniform distribution (as in Step 3 and Note A), to be the limiting case can lead to speculation about possible further results.

For example, it might be the case that for a continuous random variable X with a unimodal probability density function and with values over a finite range:
maximum(X)-minimum(X)>=3.sd(X)
with equality approached for distributions with about a third of their probability very close to a single point and the remainder virtually uniformly distributed on one side of that point
maximum(X)-minimum(X)>=4.|median(X)-E(X)|
with equality approached for distributions with about half their probability very close to a single point and the remainder virtually uniformly distributed on one side of that point
maximum(X)-minimum(X)>=2.|mode(X)-E(X)|
with equality approached for distributions virtually uniformly distributed and with the mode at one end

The results for discrete random variables with unimodal probability distributions (as defined in Step 4) can be very different, perhaps
maximum(Y)-minimum(Y)>=2.sd(Y)
and at first glance it seems likely that this result may in fact apply to all bounded random variables, unimodal or not, whether they are discrete, continuous or mixed
If so, then we have the slightly broader result
maximum(Y)-minimum(Y)>=2.sd(Y)>=2.|median(Y)-E(Y)|
with the possibility of achieving or at least approaching equality