Geek Culture

A new tech publication by Start it up (https://medium.com/swlh).

ANALYTICS MADE EASY

Quantitative Betting With Python: How To Backtest a Value Bet Strategy

octosport.io
Published in Geek Culture
9 min read · Jan 3, 2022

Quantitative betting consists of using mathematics and algorithms to place bets, as is done in finance. To achieve this, there are two important steps: first, you need predictions; second, you need to backtest your strategy. A backtest is simply a simulation on historical data that helps you understand the profitability of a strategy, but also its risk.

This article will focus on backtesting a strategy given probabilities. For that, we will use data and probabilities provided by Sportmonks.com. In the first part, we will explain in detail how you can backtest a trading strategy on soccer with a focus on value betting. The second part will focus on the data cleaning, the common mistakes, and a concrete example.

How to backtest a trading strategy?

Backtesting is at the core of quantitative trading and betting. Using historical data such as odds or probabilities, you can test hypotheses. For instance, you can test a value bet system on a specific market, try to find the breakeven odds, decide how to manage your bankroll, or even limit your market impact.

Backtesting a hypothesis is fairly simple. We just have to recreate the market conditions at a specific time in the past without using information that was not available at that time. This means you must not use odds or probabilities that you could not have had at the time of the bet, so preferably use timestamped data.

Probabilities are the key

As we mentioned, the starting point of backtesting is gathering data. For instance, take all matches of the English Premier League for 2 years with associated odds, results, and probabilities. Odds and results are easy to find and often well timestamped. But finding correct probabilities could be harder.

There are two ways of getting probabilities: build a model yourself, or use one or several external providers. In both cases, you must understand the following points.

Probabilities must not rely on odds: you need to make sure that odds are not used to compute the probabilities, especially for a value bet model. For instance, do not use odds as a feature in a machine-learning algorithm that predicts probabilities. There is no point in trying to beat the bookmakers using their own odds.

Probabilities must not use future data: that seems obvious, but it is a common mistake. For instance, do not make a prediction for October 2020 using data from after that date. You will obtain overly optimistic predictions.

While it is relatively easy to stick to these rules when you build a model yourself, it is not the case if you use external predictions, so you should choose them very carefully. Indeed, it is easy for a data provider to modify probabilities afterward, or to use in-sample predictions, which again yields optimistic results.
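As a minimal sketch of this rule, assuming the match history sits in a pandas DataFrame with a date column, we can restrict the training data to what was known at prediction time. The column names, the dates, and the helper training_window are hypothetical and only illustrate the idea.

```python
import pandas as pd

# Hypothetical match history: dates and scores are illustrative only.
matches = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2020-08-15", "2020-09-12", "2020-10-03", "2020-10-24", "2020-11-07"]
        ),
        "home_goals": [2, 0, 1, 3, 1],
        "away_goals": [1, 0, 2, 1, 1],
    }
)


def training_window(history: pd.DataFrame, prediction_time: pd.Timestamp) -> pd.DataFrame:
    """Return only the matches already played at prediction_time."""
    return history[history["date"] < prediction_time]


# A prediction made on 10 October 2020 may only be fitted on earlier matches.
cutoff = pd.Timestamp("2020-10-10")
train = training_window(matches, cutoff)
```

Any model fitted on train cannot see the matches of 24 October or 7 November, which is exactly the situation we would have faced at the cutoff date.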

Backtesting opens the door

Now that we have the probabilities, the odds, and the match results of the past season, let’s backtest a football value bet strategy, focusing on the 1x2 market. Value betting is a very common strategy that essentially takes advantage of odds mispriced by the bookmaker. We won’t detail the strategy, as there are multiple resources online, but we will go over the main steps.

Given the probabilities for each outcome, 1 (home team wins), X (draw), and 2 (away team wins), and the associated odds, a value is detected for outcome i if:

$$\text{value}_i = p_i \times o_i - 1 > 0$$

where p_i is the probability of outcome i and o_i its decimal odds.

The value detection

If a value is found, we want to make the bet. For the sake of simplicity, we will bet 1 unit of currency, for instance, 1€.

A value can be detected for one, two, or all three outcomes at the same time. In that case, you won’t bet on all of them: you can pick the highest value or the highest probability. It is a hypothesis you can backtest.

Say that for a specific match we found that “home team to win” is a value bet. Since we have the odds, we can calculate the amount of money we would win or lose by taking the bet, depending on the outcome.

  • If the home team wins, we receive the odds times our stake and subtract our initial bet of 1€. If the odds were 3.2, our profit and loss (P&L) is 2.2€.
  • If the home team loses or draws, we receive nothing and lose our initial bet of 1€. In this case, our P&L is -1€, which is also the maximum loss we can suffer on one bet.
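These two cases also show why a positive value matters. As a short derivation, assuming a stake of 1€ on a single outcome with probability p and decimal odds o (symbols introduced here for illustration), the expected P&L is:

```latex
\mathbb{E}[\text{P\&L}] = p\,(o - 1) - (1 - p) = p\,o - 1
```

A positive value is therefore the same thing as a positive expected P&L, provided the probability is accurate. With odds of 3.2, the bet breaks even when p = 1/3.2 = 0.3125, so any probability above that level makes it a value bet.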

That’s it. We repeat these steps for all matches in our sample and we can calculate the performance of the strategy. Using Python, a sketch of the main backtest, assuming we always take the maximum value, is:

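import pandas as pd

# One record per match; odds and probabilities are indexed by outcome ('1', 'X', '2').
match_data = [
    {
        "odds": pd.Series({"1": 1.45, "2": 2.87, "X": 2.1}),
        "probabilities": pd.Series({"1": 0.35, "2": 0.1, "X": 0.55}),
        "result": "X",
    },
]

cumulated_wealth = 0.0
for a_match in match_data:
    odds = a_match["odds"]
    probabilities = a_match["probabilities"]
    result = a_match["result"]
    value = odds * probabilities - 1  # expected profit per 1€ staked
    is_value = value * (value > 0.0)  # value bet detection: keep positive values only
    if is_value.max() == 0:
        continue  # no value on this match
    selection = is_value.idxmax()  # outcome with the highest value
    is_win_bet = result == selection
    pnl = odds[selection] * is_win_bet - 1  # P&L calculation for a 1€ stake
    cumulated_wealth += pnl

# Example (the single match above)
# result: 'X'
# is_value: pd.Series({'1': 0, '2': 0, 'X': 0.155})
# pnl: 1.1€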

We now have the tools to test strategies, but also bankroll management, bet size, and much more. At the end of this journey, you can even build a trading bot that automatically places the bets given the strategy rules.

One danger of backtesting is overfitting. When you add too many rules to your strategy, there is a good chance that it will not work as expected in the future. For instance, you could be tempted to test a strategy on all leagues and select the best one to bet on. There is clearly a risk of overfitting: nothing tells you that a strategy that works on one league this year will continue to work next year.

Backtesting a value bet strategy

The first step is getting the data. For that, we will use the Sportmonks API. From it, we can get probabilities, but also results and odds for multiple markets. Today we will focus on backtesting a 1x2 value bet strategy.

The second step is cleaning the odds data. Indeed, we need to make sure that the odds were available at the time of the bet. Say we place our bet in the last 15 minutes before kick-off. As an example, let’s use the Ekstraliga Women league on 30 October 2021. The match is UJ Krakow-Tarnovia Tarnów, which kicks off at 11:00 AM UTC.

The odds are available for 10 bookmakers: 888Sport, Bet-At-Home, BetClic, Betfair, Cashpoint, CloudBet, Dafabet, Pncl, Unibet, and Bwin. The next table shows the timestamp of each bookmaker's odds.

UTC timestamp of odds for UJ Krakow-Tarnovia Tarnów, 30 of October 2021

As you can see, not all odds are usable. For instance, the Betfair timestamp is 16 seconds after kick-off. The Cashpoint odds have not been updated since the 29th; they may still have been valid, but it is safer to discard them as well. In this match, only two bookmakers meet our requirement: BetClic and CloudBet.

Now that we know how to clean the odds data, we can build two sets of odds from the remaining bookmakers to calculate the P&L. First, for each outcome, we take the maximum odds. Indeed, we always want the best odds, but this set might be a bit overoptimistic. The second set we could use is the average of the odds. Both sets are shown in the next table.
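The cleaning rule can be sketched as a simple timestamp filter. The bookmaker names, timestamps, and odds below are made up to illustrate the three situations (stale, usable, too late); only the kick-off time and the 15-minute window come from the example above.

```python
import pandas as pd

kickoff = pd.Timestamp("2021-10-30 11:00:00", tz="UTC")
window = pd.Timedelta(minutes=15)

# Hypothetical snapshot: placeholder bookmakers, made-up timestamps and odds.
odds = pd.DataFrame(
    {
        "bookmaker": ["book_a", "book_b", "book_c", "book_d"],
        "timestamp": pd.to_datetime(
            [
                "2021-10-29 18:02:11",  # stale: last update the day before
                "2021-10-30 10:50:40",  # inside the 15-minute window
                "2021-10-30 10:58:05",  # inside the 15-minute window
                "2021-10-30 11:00:16",  # after the kick-off
            ],
            utc=True,
        ),
        "1": [2.10, 2.05, 2.20, 2.00],
        "X": [3.30, 3.40, 3.25, 3.50],
        "2": [3.10, 3.20, 3.00, 3.30],
    }
)

# Keep odds updated in the last 15 minutes before kick-off, and never after it.
is_usable = (odds["timestamp"] >= kickoff - window) & (odds["timestamp"] < kickoff)
usable_odds = odds[is_usable]
```

Only book_b and book_c survive the filter here. The width of the window is itself a hypothesis: a wider window keeps more bookmakers but assumes older odds were still tradable.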

The final set of odds

Obviously, the “average” set is not investable, but it can be used as a proxy: you can always find odds above or equal to the average, so your strategy P&L is underestimated rather than overestimated.

At the time of writing, we have 16,114 matches between October 2021 and December 2021, distributed over 643 leagues. Using the value bet detector described above, we found 779 value bets, which is about 5% of all available matches. Yes, value bets are rare. Now let’s see the P&L. We will compare two strategies:
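Building the two sets is a one-liner each once the usable odds sit in a table with one row per bookmaker. The bookmaker labels and odds below are hypothetical.

```python
import pandas as pd

# Hypothetical cleaned odds for one match, one row per remaining bookmaker.
usable_odds = pd.DataFrame(
    {"1": [2.05, 2.20], "X": [3.40, 3.25], "2": [3.20, 3.00]},
    index=["book_b", "book_c"],
)

# Best price per outcome; each outcome may come from a different bookmaker.
best_odds = usable_odds.max()

# Average price per outcome, used as a conservative proxy.
average_odds = usable_odds.mean()
```

By construction, the best odds are never below the average for any outcome, which is why the two sets can serve as an optimistic and a conservative estimate of the P&L.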

  • If multiple values are detected, we choose the outcome with the highest value (strategies in blue) and bet 1€
  • If multiple values are detected, we choose the outcome with the highest probability (strategies in red) and bet 1€
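The two selection rules differ only in how they break ties between value bets. A minimal sketch, where the function pick_outcome and the example odds and probabilities are hypothetical:

```python
import pandas as pd


def pick_outcome(odds: pd.Series, probabilities: pd.Series, rule: str = "value"):
    """Return the outcome to bet on, or None when no value is detected."""
    value = odds * probabilities - 1
    candidates = value[value > 0]  # outcomes that are value bets
    if candidates.empty:
        return None
    if rule == "value":
        return candidates.idxmax()
    if rule == "probability":
        # Most likely outcome among the value bets only.
        return probabilities[candidates.index].idxmax()
    raise ValueError(f"unknown rule: {rule}")


# Hypothetical match where both '1' and '2' are value bets.
odds = pd.Series({"1": 2.30, "X": 3.60, "2": 4.50})
probabilities = pd.Series({"1": 0.45, "X": 0.25, "2": 0.30})

by_value = pick_outcome(odds, probabilities, rule="value")
by_probability = pick_outcome(odds, probabilities, rule="probability")
```

In this example the highest-value rule picks the away win, while the highest-probability rule picks the home win, so the two strategies place different bets on the same match.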

Both strategies are run on the best odds (dark) and on the average odds (light). The bets are time-ordered, so we start from the oldest matches and move to the most recent ones. The next figure shows the cumulative P&L in €.

Cumulative P&L for strategies
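A curve like this one can be computed by sorting the settled bets by kick-off time and accumulating the P&L. The bets below are hypothetical and are not the data behind the figure.

```python
import pandas as pd

# Hypothetical settled bets: one row per value bet, stake of 1€ each.
bets = pd.DataFrame(
    {
        "kickoff": pd.to_datetime(
            ["2021-10-09", "2021-10-02", "2021-10-16", "2021-10-23"]
        ),
        "odds": [2.1, 3.2, 1.9, 2.6],
        "won": [True, False, False, True],
    }
)

# Order the bets in time before accumulating, as in the backtest.
bets = bets.sort_values("kickoff").reset_index(drop=True)
bets["pnl"] = bets["odds"] * bets["won"] - 1  # odds - 1 on a win, -1 on a loss
bets["cumulative_pnl"] = bets["pnl"].cumsum()
```

Plotting cumulative_pnl against kickoff gives one line of the figure; repeating the computation for each selection rule and each set of odds gives the others.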

Several observations can be made. First, on this sample, it seems clear that the probabilities help to detect value bets: all strategies have a positive P&L. Second, we wanted to know whether it is better to use the highest-probability rule or the highest-value rule when multiple values are detected. It seems that the highest-value rule does a better job, outperforming by 40€ after 779 bets.

Third, let’s talk about the odds. As you can see, betting at the best odds yields a total wealth of 125.6€, while betting at the average odds gives 47.2€. That is a reduction of more than 60%.

The “true” final wealth is probably in between the average odd strategy that gives you a lower bound and the best odd strategy that gives you an upper bound.

Indeed, you can always get odds at least as good as the average, but you will not always get the best odds. The reason could be that you do not have access to the bookmaker, that the odds vanished before you could trade, that the odds are moving, and so on.

Beyond the fact that the value bet strategy is working, it is interesting to note that, depending on your hypotheses, you can end up with a P&L of 125.6€ or 8.5€, which is a large gap. That is why it is important to keep in mind that the variance of the P&L can be high depending on your assumptions, and that it grows as you add more hypotheses and tests.

Conclusion

In this article, we described how a value bet strategy can be tested and validated in practice. Using data, we showed how to backtest the historical performance of the strategy using Python.

While getting data on odds, probabilities, and match results has become easier in the last few years, it does not mean that the job is done. Indeed, we have seen that cleaning the data is an important step, as is testing hypotheses. Validating a strategy and the rules attached to it is the starting point of quantitative and algorithmic betting.

However, we need to be aware of the multiple risks we can face: overfitting, look-ahead bias, survivorship bias, and selection bias, among others.

Backtesting is a great tool that opens the door to discovering quantitative strategies where no human decisions are made and where you let the algorithms pick the right bets for you.

All data used in this article can be found in the Sportmonks API or is available upon request.

Disclaimer

This article is meant to show you how to backtest a value bet strategy. It is not betting advice and does not encourage you to use probabilities data for betting in any manner. The results presented in this article are simulations.

None of the information contained here constitutes an offer (or solicitation of an offer) to bet, make any investment, or participate in any particular betting strategy. Betting involves real risks of loss and the author is not responsible for any related loss. Past performance is not indicative of future performance; the material enclosed herein is for educational purposes only.


Written by octosport.io

I am a data scientist writing about machine learning for football prediction at octosport.io.
