This week’s Fiddler is about rounding!
Let $\text{round}(x)$ be the value of $x$ rounded to the nearest integer. Suppose $x_1,\dots,x_n$ are independent uniformly distributed random variables in $[0,1]$. Find the probability that
\[
\text{round}(x_1+\cdots+x_n) = \text{round}(x_1)+\cdots+\text{round}(x_n)
\]
My solution:
[Show Solution]
Let’s call the probability we seek $p(n)$. The values of the $x_i$ determine what sums are even possible, so let’s consider some cases based on the possible values of $\text{round}(x_1)+\cdots+\text{round}(x_n)$. Suppose that $k$ of the variables are in in the interval $[\tfrac{1}{2},1]$ and the remaining $n-k$ variables are in $[0,\tfrac{1}{2}]$. The probability of this occurring is $\tfrac{1}{2^n}\binom{n}{k}$. In this case,
\[
\sum_{i=1}^n \text{round}(x_i) = k
\]Note: We can ignore the issue of whether $\tfrac{1}{2}$ rounds to $0$ or $1$ since the probability of $\tfrac{1}{2}$ occurring is zero. Define a set of rescaled variables $y_i$ in $[0,1]$ as follows:
\[
y_i = \begin{cases}
2x_i-1 & \text{if }i=1,\dots,k\\
2x_i & \text{if }i=k+1,\dots,n
\end{cases}
\]Now let’s calculate the probability that the sum of the $x_i$ also rounds to $k$ and express it in terms of the $y_i$:
\begin{align*}
\mathrm{Prob}\left( \text{round}\biggl(\sum_{i=1}^n x_i\biggr) = k\right)
&= \mathrm{Prob}\left( k-\tfrac{1}{2} \leq \sum_{i=1}^n x_i \leq k+\tfrac{1}{2} \right) \\
&= \mathrm{Prob}\left( k-1 \leq \sum_{i=1}^n y_i \leq k+1 \right)
\end{align*}The sum of the $y_i$, which is a sum of $n$ independent variables in $[0,1]$ follows a so-called Irwin-Hall distribution, whose CDF is given by:
\[
F(x) = \text{Prob}\bigl( y_1+\cdots+y_n \leq x \bigr) = \frac{1}{n!}\sum_{k=0}^{\lfloor x \rfloor} (-1)^k \binom{n}{k}(x-k)^n
\]Therefore, the probability we seek is $F(k+1)-F(k-1)$, or:
\[
\frac{1}{n!}\sum_{m=0}^{k+1} (-1)^m \binom{n}{m}(k+1-m)^n
-\frac{1}{n!}\sum_{m=0}^{k-1} (-1)^m \binom{n}{m}(k-1-m)^n
\]Rearranging terms a bit, we obtain:
\[
\frac{(-1)^{n+k}}{n!} \binom{n}{k}+
\frac{1}{n!}\sum_{m=0}^{k} (-1)^m \binom{n}{m}\bigl( (k+1-m)^n-(k-1-m)^n\bigr)
\]By writing out the terms carefully, we can group like $n^\text{th}$ powers and we obtain a simpler equivalent expression:
\[
\frac{1}{n!}\sum_{m=0}^k(-1)^m\left[ \binom{n+1}{m}-\binom{n+1}{m-1}\right](k+1-m)^n
\]Remember this was the probability that the two roundings are the same when $k$ of the $x_i$’s round to $1$. Multiplying each such probability by its prior probability and summing, we get the total probability that the two roundings are the same:
\[
p(n) = \frac{1}{2^n\,n!}\sum_{k=0}^n \binom{n}{k}\sum_{m=0}^k(-1)^m\left[ \binom{n+1}{m}-\binom{n+1}{m-1}\right](k+1-m)^n
\]This can be further simplified by grouping $n^\text{th}$ powers once again and observing that all odd terms cancel out. After quite a bit of algebra, we obtain the simplest expression possible, which is:
$\displaystyle
p(n) = \frac{1}{2^n\, n!}\sum _{k=0}^{\left\lfloor \frac{n}{2}\right\rfloor } (-1)^k \binom{n+1}{k} (n+1-2k)^n
$
We can evaluate this for different values of $n$ and we obtain:
\[
\begin{array}{c|cccccccccc}
n & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10\\ \hline
p(n) & 1 & \frac{3}{4} & \frac{2}{3} & \frac{115}{192} & \frac{11}{20} & \frac{5887}{11520} & \frac{151}{315} & \frac{259723}{573440} & \frac{15619}{36288} & \frac{381773117}{928972800}
\end{array}
\]
The sequence $2^n \, n!\, p(n)$ is actually a well-known sequence. It is A261398 in OEIS (the Online Encyclopedia of Integer Sequences), and according to OEIS it has no known closed-form formula and the one above is the simplest one available. Good enough for me!
Visualization in 2D and beyond
We can visualize the probabilities for the case $n=2$ by coloring in the set of points $(x_1,x_2)$ for which the sum of the rounded values equals the rounded value of the sum. This yields:
We can check the shaded region is $\frac{3}{4}$ of the total area, as anticipated. We can do the same for $n=3$. This time, we plot the set of points $(x_1,x_2,x_3)$ and the probability is the shaded volume:
Again, we can check the volume is $\frac{2}{3}$. Unfortunately, we can’t visualize the case $n=4$, but we can try… This should be a 4D hypercube. In the same way that each slice of a 3D cube is a square, each slice of a 4D hypercube is a 3D cube. Here is an animation showing what happens as we vary $x_4 \in [0,1]$ and we plot the cube resulting from the valid choices of $(x_1,x_2,x_3)$ as a function of $x_4$. The abrupt change occurs when $x_4=\tfrac{1}{2}$ as this causes the sum of rounded values to jump by $1$.
So now, if you can imagine, the probability we’re looking for is the “hyper-volume” of this 4D shape, which is given by integrating the volume above as we vary $x_4$, and as it turns out,
\[
p(4) = \int_{0}^1 \text{Volume}(x_4)\,\mathrm{d}x_4 = \frac{115}{192}.
\]
Approximation
Recall that the solution we found above is given by
\[
p(n) = \sum_{k=0}^n \frac{1}{2^n}\binom{n}{k}\bigl( F(k+1)-F(k-1) \bigr)
\]where $F(x)$ is the CDF of the Irwin-Hall distribution. The formula we found for this is quite messy, so let’s see if we can approximate it.
First, the Irwin-Hall distribution can be well-approximated by a normal distribution. This makes sense, since summing a large number of identically distributed random variables tends to a normal distribution by the central limit theorem. In this case, the limiting distribution has mean $\mu=\frac{n}{2}$ and variance $\sigma^2=\frac{n}{12}$. Therefore, we can approximate:
\begin{align*}
F(k+1)-F(k-1) &\approx 2F'(k) \\
&\approx \frac{2}{\sqrt{2\pi \sigma^2}} \exp\biggl( -\frac{(x-\mu)^2}{2\sigma^2} \biggr) \\
&= \frac{1}{\sqrt{\pi n/24}}\exp\biggl( -\frac{(k-\tfrac{n}{2})^2}{n/6} \biggr)
\end{align*}Next, the weighted binomial sum is actually a binomial distribution, which we can also approximate using a normal distribution. In general, we have $B(n,p) \approx \mathcal{N}(\mu,\sigma^2)$, with $\mu=np$ and $\sigma^2=np(1-p)$. Therefore, if $g$ is any function, we can write the binomial sum as an expectation:
\begin{align*}
\sum_{k=0}^n \frac{1}{2^n}\binom{n}{k} g(k)
&= \mathbb{E}_{k \sim B(n,\tfrac{1}{2})}\bigl( g(k) \bigr) \\
&\approx \mathbb{E}_{k \sim \mathcal{N}(\tfrac{n}{2},\tfrac{n}{4})}\bigl( g(k) \bigr) \\
&=\frac{1}{\sqrt{\pi n/2}} \int_{-\infty}^\infty \exp\biggl(-\frac{(k-\tfrac{n}{2})^2}{n/2} \biggr)g(k)\,\mathrm{d}k
\end{align*}Setting $g(k)=F(k+1)-F(k-1)$ and substituting our previous approximation, we find:
\begin{align*}
p(n) &\approx \frac{1}{\sqrt{\pi n/2}}\frac{1}{\sqrt{\pi n/24}}\int_{-\infty}^\infty \exp\biggl( -\frac{(k-\tfrac{n}{2})^2}{n/6} \biggr)\exp\biggl(-\frac{(x-\tfrac{n}{2})^2}{n/2} \biggr)\,\mathrm{d}k \\
&= \frac{\sqrt{48}}{\pi n} \int_{-\infty}^\infty \exp\biggl(-\frac{(k-\tfrac{n}{2})^2}{n/8} \biggr) \,\mathrm{d}k \\
&= \frac{\sqrt{48}}{\pi n} \sqrt{\pi n/8} \\
\end{align*}Therefore, we have:
$\displaystyle
p(n) \approx \sqrt{\frac{6}{\pi n}}
$
To check our approximation, we can plot it together with the true values:
Not bad! So in conclusion, we have an exact formula and an asymptotically exact approximation for the probability that
\[
\text{round}(x_1+\cdots+x_n) = \text{round}(x_1)+\cdots+\text{round}(x_n)
\]They are given by:
$\displaystyle
p(n) = \frac{1}{2^n\, n!}\sum _{k=0}^{\left\lfloor \frac{n}{2}\right\rfloor } (-1)^k \binom{n+1}{k} (n+1-2k)^n
\approx \sqrt{\frac{6}{\pi n}}
$