ibbly.com

Betting using rankings

July 2009

The story so far...

We've looked at simple betting patterns for English football. This showed some patterns in the data (home teams and favourites do better than other simple bets) but not enough to overcome the take.

We've looked at a ranking system for English football that lets us give teams a skill level based on their results.

Those skill levels give a predicted probability for the result of a match. So can we use that ranking system to come up with a profitable betting strategy?

Proof of concept

We'll start with a simple test. We saw elsewhere that betting in line with the probabilities implied by the odds gave a certain loss of around 7%.

Strategy          Season  Winnings    Bets      Net  Net(%)
-----------------------------------------------------------
Implied probs     2004      344.84  380.00   -35.16   -9.3%
Implied probs     2005      347.00  380.00   -33.00   -8.7%
Implied probs     2006      347.69  380.00   -32.31   -8.5%
Implied probs     2007      351.57  380.00   -28.43   -7.5%
Implied probs     2008      355.10  380.00   -24.90   -6.6%
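Why is the loss certain? If we split a 1-unit stake across the three outcomes in proportion to the implied probabilities, each stake is proportional to 1/odds, so the payout is the same whichever result occurs. Here is a minimal sketch in Python. It assumes decimal odds, and it assumes the implied probabilities are the inverse odds rescaled to sum to 1. The function names and the example odds are made up for illustration.

```python
def implied_probs(odds):
    """Rescale inverse decimal odds so the probabilities sum to 1 (assumed definition)."""
    inverse = [1.0 / o for o in odds]
    total = sum(inverse)  # greater than 1 when the bookmaker takes a margin
    return [x / total for x in inverse]


def payout_if(odds, result):
    """Payout from staking each implied probability (1 unit in total) when `result` occurs."""
    stakes = implied_probs(odds)
    return stakes[result] * odds[result]


odds = (2.0, 3.3, 3.6)  # hypothetical (home, draw, away) decimal odds
for result, label in enumerate(("home", "draw", "away")):
    print(label, round(payout_if(odds, result), 4))
```

The payout works out as 1 divided by the sum of the inverse odds, so the shortfall below 1 is the take on that match.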

Next we'll look at betting on the 2008-09 season based on our rankings. We have an unfair advantage, since the actual results have been used to generate our rankings. In practice it will be harder to predict outcomes based only on earlier results. But if this easier first attempt doesn't work then we should give up.

First, what if we bet on every match, according to the probabilities produced by our rankings:

Strategy          Season  Winnings    Bets      Net  Net(%)
-----------------------------------------------------------
Rankings 0.0      2004      365.01  380.00   -14.99   -3.9%
Rankings 0.0      2005      374.94  380.00    -5.06   -1.3%
Rankings 0.0      2006      363.70  380.00   -16.30   -4.3%
Rankings 0.0      2007      373.27  380.00    -6.73   -1.8%
Rankings 0.0      2008      376.61  380.00    -3.39   -0.9%

This is promising. We've done better than the implied probs every season. So there's something useful in the ranking data.

So far we've bet on every match. Now we restrict ourselves to cases where we think we have an advantage.

We've come up with probabilities (pH,pD,pA) for (home-win,draw,away-win) respectively. And we know the corresponding odds (oH,oD,oA).

If we bet 1 unit on a home-win then we expect to get back an amount oH from a proportion pH of the bets. So our expected return is pH × oH.

We can calculate (mH,mD,mA) where mH = pH × oH, mD = pD × oD, mA = pA × oA as a measure of how good a bet is. If m=1 then we think we'll break even in the long run. If m>1 we think we'll make money and if m<1 we think we'll lose.
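As a small illustration of the m measure, here is a sketch in Python. The probabilities and odds are invented for the example, and the helper names `expected_returns` and `value_bets` are hypothetical.

```python
def expected_returns(probs, odds):
    """Return (mH, mD, mA): predicted probability times decimal odds for each outcome."""
    return tuple(p * o for p, o in zip(probs, odds))


def value_bets(probs, odds, threshold=1.0):
    """List the outcomes whose expected return m exceeds the threshold."""
    labels = ("H", "D", "A")
    ms = expected_returns(probs, odds)
    return [(label, m) for label, m in zip(labels, ms) if m > threshold]


probs = (0.55, 0.25, 0.20)  # hypothetical (pH, pD, pA) from a ranking model
odds = (2.10, 3.40, 3.60)   # hypothetical (oH, oD, oA) decimal odds
print(expected_returns(probs, odds))
print(value_bets(probs, odds))
```

With these made-up numbers only the home win has m above 1, so it is the only outcome we would consider backing.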

We can look at what happens if we only bet on matches where m>1. In each case we bet an amount equal to the probability of the event (as for the implied probs strategy).

Strategy          Season  Winnings    Bets      Net  Net(%)
-----------------------------------------------------------
Rankings 1.0      2004      158.95  133.90   +25.05  +18.7%
Rankings 1.0      2005      215.72  180.95   +34.77  +19.2%
Rankings 1.0      2006      158.86  129.33   +29.53  +22.8%
Rankings 1.0      2007      206.28  185.49   +20.79  +11.2%
Rankings 1.0      2008      213.89  178.32   +35.56  +19.9%

Success! This looks like a pretty decent profit.
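The filter-and-stake rule behind these tables can be sketched as a small backtest. This is a minimal sketch, not the code that produced the figures above: the `backtest` function, the tuple layout and the two example matches are invented for illustration, and we're assuming the number in the strategy name (such as 0.0 or 1.0) is the threshold on m.

```python
def backtest(matches, threshold=1.0):
    """Stake p on every outcome with p * o > threshold.

    `matches` is an iterable of (probs, odds, result), where `result` is the
    index of the actual outcome: 0 = home win, 1 = draw, 2 = away win.
    Returns (winnings, bets, net, net as a percentage of bets).
    """
    winnings = 0.0
    bets = 0.0
    for probs, odds, result in matches:
        for outcome, (p, o) in enumerate(zip(probs, odds)):
            if p * o > threshold:
                bets += p                # stake equals the predicted probability
                if outcome == result:
                    winnings += p * o    # stake times decimal odds
    net = winnings - bets
    pct = 100.0 * net / bets if bets else 0.0
    return winnings, bets, net, pct


# Two hypothetical matches: (predicted probs, decimal odds, actual result)
matches = [
    ((0.60, 0.25, 0.15), (2.00, 3.40, 4.00), 0),
    ((0.30, 0.30, 0.40), (2.50, 3.20, 3.00), 1),
]
print(backtest(matches, threshold=1.0))
print(backtest(matches, threshold=0.0))  # bets on every outcome, 1 unit per match
```

With a threshold of 0 the stakes on each match sum to 1, which matches the 380.00 in the Bets column when every match in a season is bet on.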

If we bet only when there's a good margin, say when m>1.2, then:

Strategy          Season  Winnings    Bets      Net  Net(%)
-----------------------------------------------------------
Rankings 1.2      2004       46.70   31.74   +14.96  +47.1%
Rankings 1.2      2005       91.31   64.60   +26.71  +41.4%
Rankings 1.2      2006       36.87   25.86   +11.02  +42.6%
Rankings 1.2      2007       63.70   51.52   +12.18  +23.6%
Rankings 1.2      2008       71.15   54.81   +16.34  +29.8%

Even better. This is good news since it shows that our ranking system is doing sensible things. But bear in mind that we're betting based on rankings at the end of each season. So we're doing something we couldn't do in reality - we're cheating.

But there's still something interesting here. It shows that a set of rankings and our simple model is summarising meaningful information about likely results.

Betting "as it happens"

The real test will be to bet "as it happens". If we generate rankings based only on information available at the time of the match, can we still make a profit? Is there enough stability in the quality of teams to give us profitable predictions?

This time we'll fit the rankings based on the previous season, but test on this season. This gives:

Strategy          Season  Winnings    Bets      Net  Net(%)
-----------------------------------------------------------
Rankings 1.2      2004       52.54   64.67   -12.13  -18.8%
Rankings 1.2      2005       47.80   56.32    -8.52  -15.1%
Rankings 1.2      2006       56.56   64.06    -7.50  -11.7%
Rankings 1.2      2007       39.96   66.25   -26.29  -39.7%
Rankings 1.2      2008       74.02   71.78    +2.24   +3.1%

Disappointing. Results are worse than a simple "bet everything" or "bet home" strategy.

We should be able to do better than this. This time we start with rankings based on the previous season, and then update match by match during the season. So we're as up to date as possible, but never using information from the future.

Strategy          Season  Winnings    Bets      Net  Net(%)
-----------------------------------------------------------
Rankings 1.2      2004       20.42   22.99    -2.57  -11.2%
Rankings 1.2      2005       42.07   36.41    +5.67  +15.6%
Rankings 1.2      2006       14.55   19.09    -4.54  -23.8%
Rankings 1.2      2007       18.56   24.10    -5.54  -23.0%
Rankings 1.2      2008       47.11   45.23    +1.87   +4.1%
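The important detail in the match-by-match approach is the order of operations: predict first, using only ratings built from earlier matches, and update afterwards. The sketch below shows that loop in Python. The model inside it is only a stand-in: a toy logistic prediction with an Elo-style update, which ignores draws and is not the fitting procedure used for the rankings here. All names and parameter values are hypothetical.

```python
import math


def predict_home_win(ratings, home, away, home_adv=0.3, k=1.0):
    """Toy stand-in model: logistic in the rating difference (assumed, ignores draws)."""
    x = ratings.get(home, 0.0) + home_adv - ratings.get(away, 0.0)
    return 1.0 / (1.0 + math.exp(-k * x))


def update(ratings, home, away, home_won, step=0.1):
    """Toy Elo-style update, standing in for a proper refit of the rankings."""
    p = predict_home_win(ratings, home, away)
    delta = step * ((1.0 if home_won else 0.0) - p)
    ratings[home] = ratings.get(home, 0.0) + delta
    ratings[away] = ratings.get(away, 0.0) - delta


def walk_forward(start_ratings, season):
    """Predict each match from the ratings before kick-off, then update them.

    `season` must be in date order: a list of (home, away, home_won) tuples.
    """
    ratings = dict(start_ratings)  # copy, so last season's ratings are untouched
    predictions = []
    for home, away, home_won in season:
        predictions.append(predict_home_win(ratings, home, away))  # past data only
        update(ratings, home, away, home_won)                      # then learn
    return predictions, ratings


start = {"Team A": 0.5, "Team B": 0.0}  # hypothetical ratings from the previous season
season = [("Team A", "Team B", True), ("Team A", "Team B", True)]
print(walk_forward(start, season))
```

Because the prediction is stored before the update runs, no match's own result can leak into its prediction.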

Beating the odds

This is frustrating. We have a system that can turn match results into rankings and hence probabilities but they aren't good enough to beat the odds.

To show this, suppose the odds of every match were (2.12,3.64,3.94) corresponding to probabilities of (47%,28%,25%), the average across the last five seasons.

Then a strategy of betting randomly gives roughly break-even results:

Strategy          Season  Winnings    Bets      Net  Net(%)
-----------------------------------------------------------
Random            2004      377.94  380.00    -2.06   -0.5%
Random            2005      372.76  380.00    -7.24   -1.9%
Random            2006      389.76  380.00    +9.76   +2.6%
Random            2007      364.00  380.00   -16.00   -4.2%
Random            2008      426.12  380.00   +46.12  +12.1%

And betting based on our rankings would give big profits:

Strategy          Season  Winnings    Bets      Net  Net(%)
-----------------------------------------------------------
Rankings 1.2      2004      228.42  146.47   +81.95  +55.9%
Rankings 1.2      2005      274.00  156.40  +117.60  +75.2%
Rankings 1.2      2006      225.01  146.98   +78.03  +53.1%
Rankings 1.2      2007      303.97  172.56  +131.41  +76.1%
Rankings 1.2      2008      269.83  161.07  +108.76  +67.5%

Another way of showing this is to plot the "response curve" for our predictions against the actual outcomes. The chart below shows this for predictions of a home win. We calculate our predicted probabilities for 3,800 matches over 10 seasons, and sort by predicted probability into 10 equally-sized groups. For each group we calculate the average predicted probability and also the proportion of those matches that actually resulted in a home win.

[Figure: response curve for home wins]

If our predictions were useless then we'd expect the red dots and line to be horizontal (about 40-50% home wins whatever our prediction). If our predictions were perfect and there were enough games to remove random variations then the response curve should lie on the dotted line. The actual curve shows that our predictions aren't too bad.
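The grouping behind a response curve is easy to sketch. The Python below is a minimal version, assuming `predicted` is a list of home-win probabilities and `outcomes` is a matching list of 1 (home win) or 0 (anything else). The function name and the tiny data set are hypothetical.

```python
def response_curve(predicted, outcomes, groups=10):
    """Sort matches by predicted probability, split them into equal-sized groups,
    and return (mean predicted probability, observed proportion) for each group."""
    pairs = sorted(zip(predicted, outcomes))
    n = len(pairs)
    curve = []
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue  # fewer matches than groups
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        curve.append((mean_pred, observed))
    return curve


# Tiny invented example: ten matches, two groups
predicted = [0.1] * 5 + [0.9] * 5
outcomes = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
for mean_pred, observed in response_curve(predicted, outcomes, groups=2):
    print(round(mean_pred, 2), round(observed, 2))
```

Plotting the observed proportion against the mean prediction for each group gives the dots on the chart. Points near the diagonal mean the predictions are well calibrated.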

The catch is that the odds take account of the same information that we're using (and more, such as information on individual players). We can think of three stages of successful prediction: (i) better than random; (ii) better than the predictions implied by the odds; (iii) sufficiently better than the implied predictions to beat the take. Only (iii) is profitable.

Betting amounts

So far we've been betting an amount equal to the probability of the outcome (if we think the bet will be profitable).

That is, we've been betting p whenever m>1.2. Across ten seasons this gives:

Strategy          Season  Winnings    Bets      Net  Net(%)
-----------------------------------------------------------
Rankings 1.2      1999       42.89   41.72    +1.17   +2.8%
Rankings 1.2      2000       25.52   28.27    -2.75   -9.7%
Rankings 1.2      2001       15.44   17.60    -2.16  -12.3%
Rankings 1.2      2002       24.50   28.89    -4.39  -15.2%
Rankings 1.2      2003       12.91   10.70    +2.21  +20.6%
Rankings 1.2      2004       20.42   22.99    -2.57  -11.2%
Rankings 1.2      2005       42.07   36.41    +5.67  +15.6%
Rankings 1.2      2006       14.55   19.09    -4.54  -23.8%
Rankings 1.2      2007       18.56   24.10    -5.54  -23.0%
Rankings 1.2      2008       47.11   45.23    +1.87   +4.1%
Rankings 1.2      ALL       263.97  275.00   -11.02   -4.0%

Different betting amounts, even for the same predicted probabilities, will give different results. Betting 1 whenever m>1.2 gives a worse outcome:

Strategy          Season  Winnings    Bets      Net  Net(%)
-----------------------------------------------------------
Rankings 1.2      1999       84.73   86.00    -1.27   -1.5%
Rankings 1.2      2000       55.21   60.00    -4.79   -8.0%
Rankings 1.2      2001       39.42   45.00    -5.58  -12.4%
Rankings 1.2      2002       41.42   53.00   -11.58  -21.8%
Rankings 1.2      2003       22.14   18.00    +4.14  +23.0%
Rankings 1.2      2004       37.91   48.00   -10.09  -21.0%
Rankings 1.2      2005       70.82   69.00    +1.82   +2.6%
Rankings 1.2      2006       41.15   47.00    -5.85  -12.4%
Rankings 1.2      2007       36.32   54.00   -17.68  -32.7%
Rankings 1.2      2008       95.58   97.00    -1.42   -1.5%
Rankings 1.2      ALL       524.70  577.00   -52.30   -9.1%

The Kelly criterion suggests that we should bet a proportion (m-1)/(o-1) of our funds whenever this is positive, where o is the odds for the outcome and m is our expected return on it. This needs to be modified because we aren't sure of the probabilities; they're only our estimates. So instead we'll treat (m-1)/(o-1) as an amount to bet rather than a proportion. This gives us:

Strategy          Season  Winnings    Bets      Net  Net(%)
-----------------------------------------------------------
Rankings Kelly    1999       43.05   38.32    +4.74  +12.4%
Rankings Kelly    2000       27.00   28.89    -1.89   -6.5%
Rankings Kelly    2001       13.07   16.59    -3.52  -21.2%
Rankings Kelly    2002       31.22   31.79    -0.57   -1.8%
Rankings Kelly    2003       21.75   21.98    -0.23   -1.1%
Rankings Kelly    2004       27.28   28.65    -1.36   -4.8%
Rankings Kelly    2005       45.53   42.11    +3.42   +8.1%
Rankings Kelly    2006       19.93   23.63    -3.70  -15.7%
Rankings Kelly    2007       31.61   33.30    -1.69   -5.1%
Rankings Kelly    2008       42.87   42.13    +0.74   +1.8%
Rankings Kelly    ALL       303.32  307.39    -4.07   -1.3%

So it looks like a better way to decide how much to bet. But overall we still make a loss. And even if the overall figure was +1.3% rather than -1.3% it could easily be a lucky result.
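For reference, the Kelly amount is a one-line calculation. Here is a minimal sketch in Python, assuming decimal odds o and our own estimate p of the probability. The function name and example numbers are invented. With b = o-1 and q = 1-p, the expression (m-1)/(o-1) is the same as the textbook Kelly fraction (b*p - q)/b.

```python
def kelly_stake(p, o):
    """Kelly amount (m - 1) / (o - 1) with m = p * o, or 0 when that is not positive."""
    if o <= 1.0:
        return 0.0  # odds of 1 or less can never pay a profit
    m = p * o
    return max(0.0, (m - 1.0) / (o - 1.0))


print(kelly_stake(0.55, 2.0))  # m = 1.1, so we bet a small amount
print(kelly_stake(0.30, 3.0))  # m = 0.9, so we don't bet
```

Unlike the fixed m>1.2 rule, the stake shrinks smoothly towards zero as our edge disappears, and it is smaller at long odds for the same m.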

Home favourites

So far our two most promising strategies are:
(i) only bet on home favourites; and
(ii) bet using rankings and Kelly.
Can we tweak (ii) based on our knowledge of (i)?

Across ten seasons here are the results of betting Kelly based on the rankings (the -1.3% above) and variants: bet only on home wins; bet only on the favourite; bet only on a home-favourite.

Strategy          Season  Winnings    Bets      Net  Net(%)
-----------------------------------------------------------
Ranks Kelly All   ALL       303.32  307.39    -4.07   -1.3%
Ranks Kelly H     ALL       180.88  178.84    +2.03   +1.1%
Ranks Kelly F     ALL       285.62  283.91    +1.71   +0.6%
Ranks Kelly H-F   ALL       177.11  174.64    +2.47   +1.4%

This suggests that blending the "home favourite" and "rankings Kelly" approaches might give better results. (Although we'd want to test over more data.)

Looking at restricting "rankings Kelly" to each of home, draw, away results shows:

Strategy          Season  Winnings    Bets      Net  Net(%)
-----------------------------------------------------------
Ranks Kelly All   ALL       303.32  307.39    -4.07   -1.3%
Ranks Kelly H     ALL       180.88  178.84    +2.03   +1.1%
Ranks Kelly D     ALL         0.86    1.35    -0.49  -36.3%
Ranks Kelly A     ALL       121.58  127.20    -5.61   -4.4%

There's something odd here. Very few bets are made on the draw. Across 3800 games "rankings Kelly" makes 2731 bets: 1374 home, 83 draw, 1274 away. And the bet sizes for draws are much smaller than for home or away bets.

Bet    Number   Total    Mean     Max     Min
All      2731  307.39  0.1126  0.7982  0.0001
Home     1374  178.84  0.1302  0.7982  0.0001
Draw       83    1.35  0.0163  0.1205  0.0003
Away     1274  127.20  0.0998  0.7543  0.0001

In practice we might set a limit on the minimum bet. Setting it to 0.1 gives:

Strategy            Winnings    Bets      Net  Net(%)
-------------------------------------------------------
Rankings Kelly ALL    244.82  243.72    +1.11   +0.5%
Rankings Kelly H      151.76  148.20    +3.56   +2.4%
Rankings Kelly D        0.00    0.12    -0.12 -100.0%
Rankings Kelly A       93.07   95.40    -2.33   -2.4%
Rankings Kelly F      242.11  239.03    +3.08   +1.3%
Rankings Kelly H-F    150.49  147.24    +3.25   +2.2%

and

Bet    Number   Total    Mean     Max     Min
All      1125  243.72  0.2166  0.7982  0.1001
Home      663  148.20  0.2235  0.7982  0.1001
Draw        1    0.12  0.1205  0.1205  0.1205
Away      461   95.40  0.2069  0.7543  0.1004
Fav      1090  239.03  0.2193  0.7982  0.1001
H-Fav     656  147.24  0.2245  0.7982  0.1001

So suppose we'd followed the approach of calculating the rankings and betting only on home teams, staking the Kelly amount subject to a minimum bet of 0.1. If that 0.1 meant £10, then over ten seasons we would have made 663 bets totalling £14,820 for a net profit of £356, or 2.4% of the stakes.
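The conversion to pounds is just a change of units: if a stake of 0.1 is £10 then one unit is £100. A short Python check of the arithmetic, using the "Rankings Kelly H" figures from the table above, is below. The helper names are hypothetical, and dropping bets under the minimum (rather than rounding them up) is our reading of the bet counts.

```python
def filter_small_bets(stakes, minimum=0.1):
    """Drop any stake below the minimum bet (assumed behaviour, not rounding up)."""
    return [s for s in stakes if s >= minimum]


def to_pounds(units, pounds_per_unit=100.0):
    """Convert betting units to pounds, with 0.1 units = 10 pounds."""
    return units * pounds_per_unit


total_staked, net = 148.20, 3.56  # home bets over ten seasons, in units

print(round(to_pounds(total_staked)))        # total stakes in pounds
print(round(to_pounds(net)))                 # net profit in pounds
print(round(100.0 * net / total_staked, 1))  # profit as a percentage of stakes
print(filter_small_bets([0.02, 0.15, 0.0999, 0.31]))
```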

But along the way we would have had quite a variation in good and bad seasons:

Strategy          Season  Winnings    Bets      Net  Net(%)
-----------------------------------------------------------
Rankings Kelly H  1999       21.51   16.81    +4.70  +28.0%
Rankings Kelly H  2000       11.04   11.59    -0.55   -4.8%
Rankings Kelly H  2001        5.10    5.65    -0.55   -9.7%
Rankings Kelly H  2002       14.65   14.85    -0.19   -1.3%
Rankings Kelly H  2003       10.02    9.56    +0.46   +4.9%
Rankings Kelly H  2004       14.73   14.86    -0.12   -0.8%
Rankings Kelly H  2005       25.49   22.90    +2.60  +11.3%
Rankings Kelly H  2006        9.95   11.29    -1.34  -11.9%
Rankings Kelly H  2007       17.98   17.04    +0.94   +5.5%
Rankings Kelly H  2008       21.29   23.67    -2.38  -10.1%
Rankings Kelly H  ALL       151.76  148.20    +3.56   +2.4%

This doesn't look appealing.

Trouble with draws

We saw that there was only one bet on a draw that qualified above the threshold.

Bet    Number   Total
All      1125  243.72
Home      663  148.20
Draw        1    0.12
Away      461   95.40

To investigate this we'll look at how our predicted probabilities compare to those implied by the odds. The next chart plots the implied probability of a draw (y-axis) against our prediction (x-axis). The data is for 3800 matches across 10 seasons.

[Figure: implied probabilities against predicted probabilities]

The dotted diagonal line connects points where the predicted and implied probabilities are equal. Matches below the line are ones we think we should bet on: we think there's a higher chance of a draw than the bookmaker does. Because of the take we'd want the match to be some way below the line in order to bet on it.

We can see that there aren't many points below the line; hence few bets made on a draw. The lowest probability of a draw implied by the odds is 9.6%. But our predictions get as low as 1.6%. There may be a case for modifying our model, so that draws are never predicted to be so unlikely.

Modifying our prediction model

There are various ways we could change our model. Currently the probability of a home win is given by f(H+h-A-d,k) where H and A are skill levels, h and d are constants to deal with home-advantage and allow for draws, and f(x,k) is the logistic function 1/(1+exp(-kx)) with k an arbitrary constant.
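The current formula for a home win is short enough to write out. This is a minimal sketch in Python, and the parameter values in the example are made up rather than fitted. It covers only the home-win probability, since that is the only formula given here.

```python
import math


def logistic(x, k):
    """f(x, k) = 1 / (1 + exp(-k * x))."""
    return 1.0 / (1.0 + math.exp(-k * x))


def home_win_prob(H, A, h, d, k):
    """Probability of a home win: f(H + h - A - d, k).

    H and A are the skill levels of the home and away teams, h is the
    home-advantage constant and d is the constant that allows for draws.
    """
    return logistic(H + h - A - d, k)


# Hypothetical values: evenly matched teams, then a much stronger home team
print(home_win_prob(H=1.0, A=1.0, h=0.4, d=0.3, k=1.0))
print(home_win_prob(H=3.0, A=0.0, h=0.4, d=0.3, k=1.0))
```

Increasing h pushes the home-win probability up for every match, while increasing d pulls it down.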

We could replace the logistic function with something else, perhaps a weighted average of two or more logistic functions with different values of k. The chart below shows the current approach (blue) and a couple of tweaks. They both give a higher chance of a draw when two teams of very different strength meet.

[Figure: variations on the logistic function]
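One of the tweaks, a weighted average of logistic functions with different k, can be sketched as follows in Python. The weights and k values are invented for illustration and are not the ones used in the chart.

```python
import math


def logistic(x, k):
    """f(x, k) = 1 / (1 + exp(-k * x))."""
    return 1.0 / (1.0 + math.exp(-k * x))


def blended(x, ks=(1.0, 0.25), weights=(0.7, 0.3)):
    """Weighted average of logistic curves; the weights should sum to 1."""
    return sum(w * logistic(x, k) for w, k in zip(weights, ks))


# Compare the single curve with the blend as the skill gap x grows
for x in (0.0, 2.0, 4.0, 6.0):
    print(x, round(logistic(x, 1.0), 4), round(blended(x), 4))
```

The blend still passes through 0.5 at x=0, but it approaches 0 and 1 more slowly in the tails. So when the skill gap is large the favourite's win probability is less extreme, which leaves more room for the other results.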

Also we could look at changing how h and d interact with H and A in determining the effective difference in skill. Instead of H+h-A-d we could consider

H+h-A-d
H+h-Ad
Hd+h-A
(H+h)d-A
(H+h-A)d1+d2
Hh-A-D
(H-A)h-D
(H-A-D)h
Hh1+h2-A-D
...

and other variations. To examine these properly we would need to recalibrate optimal values of the parameters h,d for each model.

What next?

We've talked about modifying the model. There are other things we might consider (if we're so inclined):

Testing across more matches (eg lower divisions) to see if our results so far are genuine or lucky over-fitting.

Looking at different countries. Are the home/draw biases different? How does having 2 points for a win rather than 3 affect things? Does it look as if getting a good model for one country allows us to bet in many?

Looking at different sports. Is the model reasonable for eg cricket or rugby? These might be nice to try since the chance of a draw is much lower, and draws seem troublesome.

Extend the model so it gives probabilities of not just home-win/draw/away-win, but how many goals the home team wins by (...,4,3,2,1,0,-1,-2,-3,-4). This could help when we calibrate the skill levels, since knowing that A beat B by five goals would seem to give more information about the strengths of the teams than just knowing that A won.

Incorporate information about players and details of the teams. I want to shy away from this in the hope that a good model would work for lots of sports and countries without needing specific knowledge. But maybe it's necessary for a good system.

