Many people have noticed situations where the party forecast to win a seat does not have the largest probability of winning it. This paradox looks like an error in the calculations, but it is actually real.
We can understand what causes the paradox by looking at the example seat of Sheffield Hallam. On 26 May 2017, the seat prediction was as shown here:
Sheffield Hallam

Party         2015 Votes  2015 Share  Predicted Votes
LIB           22,215      40.0%       39.4%
LAB           19,862      35.8%       38.5%
CON           7,544       13.6%       19.7%
UKIP          3,575       6.4%        0.3%
Green         1,772       3.2%        1.8%
OTH           513         0.9%        0.2%
LIB Majority  2,353       4.2%        Pred Maj 0.9%
[Chart: chance of winning by party: LIB, LAB, CON, UKIP, Green, OTH]

As we see, the Liberal Democrats were predicted to win the seat narrowly over Labour, but they were not the most likely party to win the seat. Labour had a higher chance of winning, and indeed its chance was more than 50pc. What accounts for this apparent discrepancy?
A key to unravelling the mystery is to understand how the prediction model works. The Strong Transition Model (STM) is more advanced than the Uniform National Swing (UNS) model, which assumes that all seats behave the same. With STM, voters can switch between parties using random distributions which are calibrated to the opinion polls. Additionally, voters are classified as either "strong" (or sticky) voters, who do not usually change party, or floating voters, who are more likely to change party.
But this means there is a complex relationship between a party's national vote share and its projected vote share in a particular seat. We can see this for Sheffield Hallam:
The graph shows the vote share for the Lib Dems in Sheffield Hallam against their national vote share, plotted for 1,000 random simulations.
At the general election in 2015, the Lib Dems received 8.1pc of the national vote, but 40pc of the votes in Sheffield Hallam. Note that the graph passes through that point: when the national vote share is around 8pc, the seat vote share is around 40pc.
The graph is curved, which is important and necessary. If the graph were a straight line, then worse paradoxes would arise. There are two relevant possibilities:
The first is a straight line with gradient one. This corresponds to the basic UNS model, and means that for every 1pc increase (or decrease) in national support, the seat support also increases (or decreases) by 1pc.
The obvious problem with this model is when the Lib Dem national support is exactly zero, but the model still implies a seat support of 32pc although no votes were cast for the Lib Dems at all. The model also implies negative Lib Dem support in many seats to balance that. This is what mathematicians call an "infeasible" solution, and everyone else calls plain daft. It is not a good model, especially for declining parties.
The second is a straight line through the origin. This corresponds to a multiplicative model. If the national support doubles (or halves), then the seat support also doubles (or halves).
The problem with this model is twofold. Firstly, it is very harsh on declining parties and tends to predict they will get zero seats. Secondly, it creates crazy results for gaining parties. If the Lib Dems rose much above 20pc in the national polls, then the model predicts they would win more than 100pc of the votes in Sheffield Hallam. That is nonsense too.
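The two straight-line models above can be sketched in a few lines of Python. This is only an illustration using the 2015 figures quoted in the text (8.1pc nationally, 40pc in the seat); the function names are hypothetical and are not taken from the actual prediction model.

```python
# Illustrative sketch of the two straight-line models, using the 2015 figures
# quoted in the text. Function names are hypothetical, not from the real model.

NATIONAL_2015 = 8.1   # Lib Dem national vote share in 2015 (pc)
SEAT_2015 = 40.0      # Lib Dem vote share in Sheffield Hallam in 2015 (pc)


def additive_seat_share(national: float) -> float:
    """Gradient-one line (UNS): seat share moves one-for-one with national share."""
    return SEAT_2015 + (national - NATIONAL_2015)


def multiplicative_seat_share(national: float) -> float:
    """Line through the origin: seat share scales in proportion to national share."""
    return SEAT_2015 * national / NATIONAL_2015


def multiplicative_breakpoint() -> float:
    """National share at which the multiplicative model reaches 100pc in the seat."""
    return 100.0 * NATIONAL_2015 / SEAT_2015


if __name__ == "__main__":
    for national in (0.0, 4.0, 8.1, 16.0, 24.0):
        print(national,
              round(additive_seat_share(national), 1),
              round(multiplicative_seat_share(national), 1))
    print("multiplicative model reaches 100pc at",
          round(multiplicative_breakpoint(), 2), "pc nationally")
```

Running it shows both failures side by side: the additive line still gives the Lib Dems roughly 32pc in the seat when their national share is zero, while the multiplicative line passes 100pc in the seat once the national share goes a little beyond 20pc.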
Importantly, Labour's graph of votes in Sheffield Hallam is different.
On the whole, Labour's curve is a straight line. Because Labour is a major party, the UNS model works less badly for it, and the STM fairly closely approximates a straight line.
The other crucial fact about the Lib Dem curve is that it is concave. This means basically that it has downward-sloping curvature (Wikipedia: concave function).
And here is the crucial mathematical fact:
If one party has a concave response function, and another party has a straight line response function, and they have equal predicted vote shares, then the concave party has a smaller chance of winning the seat.
(This is a mathematical statement which is proved in the appendix, for those who are interested.)
This is the driver for the paradox. Because the Lib Dems have a concave response function, which is sublinear, they lose out on average to the Labour party, which has a linear response function.
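The effect can be checked numerically with a small simulation. The sketch below is not the STM itself: the concave curve, the means and the spreads are all hypothetical values chosen so that the two parties tie in the seat at their mean national shares, and the two national shares are drawn as independent normals (a special case of joint normality).

```python
import math
import random

# Hypothetical parameters, chosen only for illustration.
MU_X, SIGMA_X = 8.7, 3.0    # concave party: mean and spread of national share (pc)
MU_Y, SIGMA_Y = 32.5, 3.0   # straight-line party: mean and spread of national share (pc)


def concave_response(x: float) -> float:
    """An assumed concave curve; gives about 40pc in the seat at 8.7pc nationally."""
    return 100.0 * (1.0 - math.exp(-x / 17.0))


def linear_response(y: float) -> float:
    """Gradient-one line, balanced so both parties tie at their mean national shares."""
    return concave_response(MU_X) + (y - MU_Y)


def concave_win_chance(trials: int = 100_000, seed: int = 1) -> float:
    """Estimate the chance that the concave party beats the straight-line party."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        x = rng.gauss(MU_X, SIGMA_X)
        y = rng.gauss(MU_Y, SIGMA_Y)
        if concave_response(x) > linear_response(y):
            wins += 1
    return wins / trials


if __name__ == "__main__":
    print(f"concave party win chance: {concave_win_chance():.3f}")
```

Although the two parties have equal seat shares at the central forecast, the estimated win chance for the concave party comes out below one half, because its upside is dampened by the curve while its downside is not.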
Labour  

Lib  20.0  21.5  23.0  24.5  26.0  27.5  29.0  30.5  32.0  33.5  35.0  36.5  38.0  39.5  41.0  42.5  44.0  45.5  47.0  48.5  50.0 
0.0  13  15  18  20  22  23  24  24  24  23  21  20  18  16  14  12  10  8  7  5  15 
1.5  32  38  44  49  54  57  59  59  58  56  53  49  44  39  34  29  24  20  16  13  37 
3.0  40  48  56  63  68  72  74  75  74  71  67  62  56  50  43  37  31  26  21  16  46 
4.5  42  50  58  65  71  75  78  78  77  74  70  64  58  52  45  39  32  27  22  17  48 
6.0  39  47  55  62  67  71  73  74  72  70  66  61  55  49  43  36  31  25  20  16  46 
7.5  35  42  49  55  59  63  65  65  64  62  58  54  49  43  38  32  27  22  18  14  41 
9.0  30  36  41  46  51  54  55  56  55  53  50  46  41  37  32  27  23  19  15  12  34 
10.5  25  29  34  38  42  44  45  46  45  43  41  38  34  30  26  23  19  16  13  10  28 
12.0  20  24  27  31  33  35  36  37  36  35  33  30  27  24  21  18  15  13  10  8  23 
13.5  15  19  22  24  26  28  29  29  28  27  26  24  22  19  17  14  12  10  8  6  18 
15.0  12  14  17  19  20  21  22  22  22  21  20  18  17  15  13  11  9  8  6  5  14 
16.5  9  11  13  14  15  16  17  17  17  16  15  14  13  11  10  8  7  6  5  4  10 
18.0  7  8  9  11  11  12  13  13  12  12  11  10  9  8  7  6  5  4  3  3  8 
19.5  5  6  7  8  8  9  9  9  9  9  8  8  7  6  5  5  4  3  3  2  6 
21.0  4  4  5  6  6  7  7  7  7  6  6  6  5  4  4  3  3  2  2  1  4 
22.5  3  3  4  4  4  5  5  5  5  5  4  4  4  3  3  2  2  2  1  1  3 
24.0  2  2  3  3  3  3  3  3  3  3  3  3  3  2  2  2  1  1  1  1  2 
25.5  1  2  2  2  2  2  2  2  2  2  2  2  2  2  1  1  1  1  1  1  1 
27.0  1  1  1  1  2  2  2  2  2  2  1  1  1  1  1  1  1  1  0  0  1 
28.5  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  0  0  0  0  1 
30.0  1  1  2  2  2  2  2  2  2  2  2  2  2  1  1  1  1  1  1  0  1 
The table above shows this result in the specific case of Sheffield Hallam. The x-axis along the top row represents Labour's national vote share, which runs from 20pc to 50pc. The y-axis down the left-hand column has the Liberal Democrat national vote share, which can be between 0pc and 30pc. Each cell is coloured red or orange depending on whether the Sheffield Hallam seat is won by Labour or the Liberal Democrats. The number in the cell is the chance of it happening, given in hundredths of a percent. So a value of 50 means 0.5pc.
The total sum of the red cells is 4867, so Labour has a 48.7pc chance of winning. The total sum of the orange cells is 4283, so the Lib Dems have a 42.8pc chance of winning. This is an approximate calculation, but it shows that Labour has a higher chance of winning the seat. The predicted national vote shares were 32.5pc for Labour and 8.7pc for the Lib Dems. Using the table to look up this combination confirms that the Lib Dems would win the seat if that is what happened. But the chance of that happening is less than the chance of Labour winning.
So the model is behaving correctly. Since it is not possible for the Lib Dems to have a sensible linear response function, their concavity is natural, and the low win chance follows from that.
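Let us start with some notation. Let X and Y be the national vote shares of party #1 and party #2, and let f and g be their response functions, so that f (X) and g (Y) are the two parties' vote shares in the seat. We take f to be concave and g to be affine.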
Under the assumption that X and Y are jointly normal random variables, and that the seat is balanced so that f (E(X)) = g (E(Y)), then
P( f (X) > g(Y) ) < 0.5.
Let's define μ_{X} to be the expectation of X, so μ_{X} = E(X), and α_{X} to be the gradient of f at μ_{X}, with α_{X} = f '(μ_{X}). (If f is not strictly differentiable, then use the right-derivative.) Similarly, define μ_{Y} = E(Y) and α_{Y} = g '(μ_{Y}).
Since f is concave, it lies beneath its tangent at μ_{X}, so that
f (X) ≤ f (μ_{X}) + α_{X} ( X − μ_{X} ),
with a positive chance of the inequality being strict. As g is affine, it equals its tangent at μ_{Y}, so that
g (Y) = g (μ_{Y}) + α_{Y} ( Y − μ_{Y} ).
Thus the probability that party #1 wins the seat is
P( f (X) > g(Y) ) < P( α_{X} ( X − μ_{X} ) > α_{Y} ( Y − μ_{Y} ) ) = 0.5,
since f (μ_{X}) = g (μ_{Y}), and X and Y are joint normals centred on μ_{X} and μ_{Y} respectively.
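The final equality can be spelled out as follows. The symbol W is introduced here purely for illustration, and the step assumes that W is not degenerate, that is, that its variance is strictly positive.

```latex
W = \alpha_X (X - \mu_X) - \alpha_Y (Y - \mu_Y), \qquad E(W) = 0,

\operatorname{Var}(W) = \alpha_X^2 \operatorname{Var}(X)
  - 2 \alpha_X \alpha_Y \operatorname{Cov}(X, Y)
  + \alpha_Y^2 \operatorname{Var}(Y) > 0,

P\bigl( \alpha_X (X - \mu_X) > \alpha_Y (Y - \mu_Y) \bigr) = P(W > 0) = \tfrac{1}{2}.
```

As a linear combination of jointly normal variables, W is itself normal with mean zero, so its distribution is symmetric about zero and it is positive exactly half the time.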
Another factor which drives the paradox is the skew in the national vote share distribution. In practice this cannot be the normal distribution, because the vote share must lie between 0 and 100pc. As described in this article, the random distribution used is a multivariate beta distribution.
For the minor parties such as the Lib Dems and Greens, the beta distribution is skewed. That is, its mean is larger than its median. The mean is used for the central forecast, but the median is the driver of the win chances.
The graph shows the probability density function for the Lib Dem national vote share, along with the median and the mean of that distribution.
The fact that the median is less than the mean is another contribution to the lower win chances for Lib Dems.
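The gap between mean and median can be illustrated with a simple simulation. The sketch below uses a single univariate beta distribution rather than the multivariate one used by the model, and the shape parameters are hypothetical, picked only so that the mean is close to the 8.7pc central forecast quoted earlier.

```python
import random
import statistics

# Hypothetical shape parameters: Beta(2, 21) has mean 2 / 23, about 8.7pc.
ALPHA, BETA = 2.0, 21.0


def sample_shares(trials: int = 100_000, seed: int = 1) -> list[float]:
    """Draw national vote shares (in pc) from the assumed beta distribution."""
    rng = random.Random(seed)
    return [100.0 * rng.betavariate(ALPHA, BETA) for _ in range(trials)]


def mean_and_median(shares: list[float]) -> tuple[float, float]:
    """Return the sample mean and sample median of the simulated shares."""
    return statistics.mean(shares), statistics.median(shares)


def chance_below_mean(shares: list[float]) -> float:
    """Fraction of simulations in which the share falls short of its own mean."""
    mean = statistics.mean(shares)
    return sum(s < mean for s in shares) / len(shares)


if __name__ == "__main__":
    shares = sample_shares()
    mean, median = mean_and_median(shares)
    print(f"mean {mean:.2f}pc, median {median:.2f}pc")
    print(f"chance of falling below the mean: {chance_below_mean(shares):.3f}")
```

With a right-skewed distribution like this one, the party falls short of its central (mean) forecast in more than half of the simulations, which is the sense in which the median rather than the mean drives the win chances.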