Building an intuition for the Kelly criterion

Simulating an effective bankroll management strategy
information theory
Author

Carl Colglazier

Published

November 3, 2023

Motivating Example

In an experiment, participants got a bankroll of $25 and a coin, weighted such that it landed on heads 60% of the time (Haghani and Dewey 2017). Over the course of the next 30 minutes, they could bet up to their entire bankroll on coin flips.

Despite the odds being heavily in their favor, more than a quarter of participants lost their initial bankroll and more than a third lost money. How? Most players ended up playing very poor strategies. Many even bet their entire bankroll on a single flip!

You can try out this scenario in the interactive below. The slider controls the percentage of your bankroll you bet on each flip.
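The same scenario can also be sketched in a few lines of Python. This is only a rough stand-in for the experiment: the function name `play_coin_game`, the even-money payout, the 100 flips per session, and the absence of any cap on winnings are all assumptions made for illustration, not details taken from the study.

```python
import random


def play_coin_game(fraction, n_flips=100, p_heads=0.6, bankroll=25.0, seed=0):
    """Bet a fixed fraction of the current bankroll on heads for every flip.

    Assumptions (not from the study): even-money payout, 100 flips, no cap.
    """
    rng = random.Random(seed)
    for _ in range(n_flips):
        stake = fraction * bankroll
        if rng.random() < p_heads:
            bankroll += stake  # win: gain an amount equal to the stake
        else:
            bankroll -= stake  # loss: the stake is gone
    return bankroll


# Compare a few fixed betting fractions over 1,000 simulated sessions each.
for fraction in (0.05, 0.2, 0.5, 1.0):
    finals = sorted(play_coin_game(fraction, seed=s) for s in range(1000))
    median = finals[len(finals) // 2]
    print(f"bet {fraction:.0%} per flip: median final bankroll ${median:,.2f}")
```

Changing `fraction` plays the role of the slider: small fractions grow slowly, while betting everything on each flip ends the session at the first tails.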

Defining the Kelly Criterion

The Kelly criterion describes a bet-sizing strategy that maximizes the expected growth rate of wealth (by optimizing the expected value of the logarithm of wealth). It is the optimal strategy in the long run (i.e. as the number of bets approaches infinity).

\[ f\ast = p - \frac{1-p}{b} \]

where

  • \(f\ast\) is the fraction to bet
  • \(p\) is the probability the event occurs (and \(1 - p\) is the probability it does not occur), and
  • \(b\) is the odds, or the proportion of the stake gained if the bet wins.

The Kelly criterion tells us to bet only when \(b > \frac{1-p}{p}\), that is, when the payout is greater than the odds against the event occurring.
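As a minimal sketch, the formula translates directly into a small function. The name `kelly_fraction` is made up for this example, and clamping the result to zero when there is no edge is an assumption of the sketch (the formula itself simply goes negative).

```python
def kelly_fraction(p, b):
    """Kelly fraction f* = p - (1 - p) / b for win probability p and odds b."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    if b <= 0.0:
        raise ValueError("b must be positive")
    f = p - (1.0 - p) / b
    # Assumption of this sketch: a non-positive f* means "do not bet".
    return max(f, 0.0)


print(kelly_fraction(0.6, 1.0))  # the 60% coin at even money
print(kelly_fraction(0.5, 1.0))  # a fair coin at even money: no edge, no bet
```

For the weighted coin from the motivating example (\(p = 0.6\), \(b = 1\)) the formula gives a stake of about 20% of the bankroll per flip.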

Origins

Kelly (1956) describes his criterion through the lens of information theory. In his original paper, he presents an example of a gambler with a private wire which gives them insight into the results of a series of baseball games between evenly matched teams. The wire is noisy, and the gambler can only correctly predict the outcome of a game with probability \(p\). If the wire were perfect, the gambler could simply bet their entire bankroll each time and grow it over \(N\) bets to \(2^N\) times the original bankroll; however, because the wire is noisy, the gambler must bet less than their entire bankroll to avoid going bust (Kelly 1956, 918–19). How much should they bet?

Kelly suggests that we maximize the expected value of the logarithm of wealth. We can express this as the growth rate \(r\) using the same notation. A winning bet multiplies the bankroll by \(1 + fb\), and a losing bet costs the stake, multiplying it by \(1 - f\):

\[r = (1 + fb)^p \cdot (1 - f)^{(1-p)}\]

Figure 1: Growth rate for a bet with +100 odds and p=0.6

To gain an intuition for the problem, we can plot out all the possible growth rates for a bet with +100 odds and \(p=0.55\) as we have in Figure 1.

To optimize \(r\), it is easiest to take the derivative, but first we can get rid of the exponents by taking the log of both sides:

\[\log(r) = p\log(1 + fb) + (1-p)\log(1 - f)\]

When the derivative of this expression is zero, we have found the maximum of the logarithm of the growth rate (see the Wikipedia page on the Kelly criterion for the full proof). The use of the logarithm as the value function is somewhat arbitrary and likely has a lot to do with the criterion’s origins in information theory. Kelly (1956) himself notes: “The reason has nothing to do with the value function which he attached to his money, but merely with the fact that it is the logarithm which is additive in repeated bets and to which the law of large numbers applies” (Kelly 1956, 925–26). Kelly also describes when a gambler might deviate from his criterion: a gambler with a different value function could use a different strategy.
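The maximization step can be worked through in a few lines. This is a sketch that assumes the log growth rate is written as \(\log(r) = p\log(1 + fb) + (1-p)\log(1 - f)\), so that a win adds \(fb\) to the bankroll and a loss costs the stake \(f\). Setting the derivative with respect to \(f\) to zero and solving for \(f\):

\[ \frac{d}{df}\log(r) = \frac{pb}{1 + fb} - \frac{1-p}{1 - f} = 0 \]

\[ pb(1 - f) = (1-p)(1 + fb) \]

\[ pb - (1-p) = fb \]

\[ f\ast = p - \frac{1-p}{b} \]

The second derivative, \(-\frac{pb^2}{(1 + fb)^2} - \frac{1-p}{(1 - f)^2}\), is negative for \(0 \le f < 1\), so this stationary point is a maximum.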

I should note that the Kelly criterion rests on a lot of assumptions. Among them:

  1. The gambler knows the true probability of the event occurring.
  2. The gambler can repeat the bet indefinitely.
  3. The gambler’s only goal is to maximize their bankroll.
  4. The gambler can bet as much or as little as they want every time.
  5. Opportunity costs are unimportant.

The Kelly criterion is a useful heuristic, but few of these assumptions hold up in real life.

Probability and bet sizes

Figure 2: Edge needed to bet 20% of bankroll

The logarithmic properties of the Kelly criterion lead to some desirable outcomes. For instance, given even odds (+100), the criterion tells us we need a 60% chance of winning (20% EV) to bet 20% of our bankroll, but at longer odds like +400, we would need to win 36% of the time (80% EV). Thus the Kelly criterion demands a larger edge before we stake the same amount on a bet with long odds. With -300 odds, however, we’d need to win 80% of the time (about 6.7% EV) to bet 20% of our bankroll: a bet with a high probability of winning requires a much smaller edge.

Bet size   True probability   Payout   Edge     Implied probability
20%        30%                +700     140.0%   12%
20%        40%                +300     60.0%    25%
20%        50%                +167     33.3%    37%
20%        60%                +100     20.0%    50%
20%        70%                -167     12.0%    62%
20%        80%                -300     6.7%     75%
20%        90%                -700     2.9%     88%
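The rows above can be reproduced by solving the Kelly formula for \(p\), which gives \(p = \frac{fb + 1}{b + 1}\). The sketch below does this for a few payouts; the helper names `net_odds`, `required_probability`, and `edge` are invented for this example, and payouts are assumed to be quoted as American odds.

```python
def net_odds(american):
    """Convert American odds to b, the profit per unit staked on a win."""
    if american == 0:
        raise ValueError("American odds cannot be zero")
    return american / 100.0 if american > 0 else 100.0 / -american


def required_probability(f, b):
    """Win probability at which the Kelly fraction equals f.

    Obtained by solving f = p - (1 - p) / b for p.
    """
    return (f * b + 1.0) / (b + 1.0)


def edge(p, b):
    """Expected profit per unit staked (the EV quoted in the text)."""
    return p * b - (1.0 - p)


for american in (700, 300, 100, -300, -700):
    b = net_odds(american)
    p = required_probability(0.20, b)
    implied = 1.0 / (b + 1.0)  # break-even probability at these odds
    print(f"{american:+5d}  p = {p:.1%}  edge = {edge(p, b):.1%}  implied = {implied:.1%}")
```

The implied probability is the break-even win rate at the quoted odds, so the edge is zero whenever the true probability equals it.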

A typical gambler may not only want to maximize bankroll size, but also minimize the chance of losing all their money by going bust. Here, the relationship between probability and the size of bets remains important.

Let us say we know we have a consistent, unwavering 2% edge on a repeated set of bets. Table 1 shows the bet sizes for a set of gamblers each betting with this edge at different odds. Note that the bet size increases with the probability of the bets winning.

Payout   Implied probability   True probability   Bet size
+900     10.0%                 10.2%              0.2%
+400     20.0%                 20.4%              0.5%
+233     30.0%                 30.6%              0.9%
+150     40.0%                 40.8%              1.3%
+100     50.0%                 51.0%              2.0%
-150     60.0%                 61.2%              3.0%
-233     70.0%                 71.4%              4.7%
-400     80.0%                 81.6%              8.0%
-900     90.0%                 91.8%              18.0%
Table 1: Bet sizes for gamblers using the Kelly criterion each betting with a 2% edge with different odds
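The bet sizes in Table 1 follow from the Kelly formula once the true probability is set to 1.02 times the implied probability. The sketch below recomputes them; `table_row` is a hypothetical helper, and reading the 2% edge as a multiplier on the implied probability is an interpretation that matches the table's numbers.

```python
def table_row(implied, edge=0.02):
    """Return (b, true probability, Kelly bet size) for one bettor.

    implied: break-even probability at the offered odds.
    edge: expected profit per unit staked, assumed to be 2% as in the text.
    """
    b = (1.0 - implied) / implied   # odds that make `implied` break even
    p = implied * (1.0 + edge)      # true probability with a 2% edge
    f = p - (1.0 - p) / b           # Kelly fraction
    return b, p, f


for tenths in range(1, 10):
    implied = tenths / 10.0
    b, p, f = table_row(implied)
    print(f"implied {implied:.1%}  true {p:.1%}  b = {b:.3f}  bet size {f:.1%}")
```

With a fixed edge the Kelly fraction simplifies to the edge divided by \(b\), which is why the bet size climbs as the odds shorten.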

Imagine all of these bettors became extremely unlucky and lost every one of their bets. How much they lose is a function of how much they bet. After \(n\) straight losses, the fraction of the original bankroll that remains is

\[ (1-f\ast)^n \]

where \(n\) is the number of bets.
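To make the losing-streak expression concrete, the sketch below counts how many consecutive losses it takes to halve a bankroll at a given bet size. The function names and the 50% threshold are choices made for this illustration only.

```python
def remaining_after_losses(f, n):
    """Fraction of the bankroll left after n straight losses at bet size f."""
    return (1.0 - f) ** n


def losses_until(f, target=0.5):
    """Number of straight losses needed to fall to `target` of the bankroll."""
    if not 0.0 < f < 1.0:
        raise ValueError("f must be strictly between 0 and 1")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    n, bankroll = 0, 1.0
    while bankroll > target:
        bankroll *= 1.0 - f
        n += 1
    return n


# Bet sizes for three of the Table 1 bettors (+900, +100, and -900 odds).
for label, f in (("+900", 0.02 / 9.0), ("+100", 0.02), ("-900", 0.18)):
    print(f"{label}: bankroll halved after {losses_until(f)} straight losses")
```

The bettor staking 18% per bet halves their bankroll within a handful of losses, while the long-shot bettor needs hundreds.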

We can simulate the scenario for each of the bettors in Table 1. As seen in Figure 3, given the same edge, the bettors with higher odds of winning can potentially lose their money the fastest. This is because they are betting more of their bankroll each time.

While the worst-case scenario is within the bounds of the possible, it’s not exactly likely. To give an idea of the range of outcomes, we can simulate how the bettors in Table 1 might fare. To reduce the effects of random chance, we can aggregate over 1,000 simulations. Here, the same pattern emerges: the bets on higher probability events can lose money the quickest, but the worst simulated outcomes rarely approach the theoretical worst case over time (losing 2,000 bets in a row is unlikely even if each bet has only a 10% chance of winning).
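A simulation along these lines might look like the sketch below. It is not the code behind the figures: the function name `simulate_final_bankrolls`, the seed, and the reduced run sizes (500 bets and 200 runs, to keep pure Python fast) are all assumptions for illustration.

```python
import random
import statistics


def simulate_final_bankrolls(p, b, f, n_bets, n_sims, seed=0):
    """Final bankrolls (starting from 1.0) for n_sims independent runs."""
    rng = random.Random(seed)
    finals = []
    for _ in range(n_sims):
        bankroll = 1.0
        for _ in range(n_bets):
            if rng.random() < p:
                bankroll *= 1.0 + f * b  # win: gain b per unit staked
            else:
                bankroll *= 1.0 - f      # loss: the stake is gone
        finals.append(bankroll)
    return finals


# Two of the Table 1 bettors: the +900 long shot and the -900 favorite.
bettors = (("+900", 0.102, 9.0, 0.02 / 9.0), ("-900", 0.918, 1.0 / 9.0, 0.18))
for label, p, b, f in bettors:
    finals = simulate_final_bankrolls(p, b, f, n_bets=500, n_sims=200)
    print(
        f"{label}: min {min(finals):.3f}  "
        f"median {statistics.median(finals):.3f}  max {max(finals):.3f}"
    )
```

Raising `n_bets` and `n_sims` toward the values used in the text gives smoother aggregates at the cost of a longer run.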

Figure 3
Figure 4
Figure 5: On average, bets with higher odds make more money given the same percentage of expected value

If there is so much risk involved with betting a large share of the bankroll on higher probability events, why does Kelly tell us to do it? As it turns out, it is very profitable. Figure 5 shows that the big bets by far return the highest profit over time. I had to change the scale to \(\log_{10}\) because the difference is so dramatic.

And it’s not just the median. Over the same number of bets, those betting on higher probability events are more likely to make a profit. It pays to bet big! And over time, many of the unlucky bettors get less unlucky and end up making a profit.

There are two things going on in these simulations:

  1. Higher probability bets make more profit in aggregate.
  2. Higher probability bets create higher variance in their outcomes, so when they lose, they lose a lot.
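Both effects can be checked without simulation by looking at a single bet. The sketch below computes the mean and variance of the per-bet log return for each Table 1 bettor; using log returns as a proxy for profit and bankroll variance is an assumption of this example, and `log_return_stats` is a hypothetical helper.

```python
import math


def log_return_stats(implied, edge=0.02):
    """Mean and variance of the per-bet log return at the Kelly bet size."""
    b = (1.0 - implied) / implied      # odds implied by the break-even rate
    p = implied * (1.0 + edge)         # true win probability
    f = p - (1.0 - p) / b              # Kelly fraction
    win = math.log(1.0 + f * b)        # log return on a winning bet
    loss = math.log(1.0 - f)           # log return on a losing bet
    mean = p * win + (1.0 - p) * loss
    variance = p * (1.0 - p) * (win - loss) ** 2  # two-outcome variance
    return mean, variance


for tenths in range(1, 10):
    implied = tenths / 10.0
    mean, variance = log_return_stats(implied)
    print(f"implied {implied:.0%}: mean {mean:.6f}  variance {variance:.6f}")
```

Under these assumptions both numbers rise with the implied probability, which matches the two points above: more growth per bet, and wider swings around it.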

The variance of bankrolls increases with the number of bets

Final Thoughts

The Kelly criterion is effective in cases where your goals align with optimizing the logarithm of the growth rate of wealth (which seems true in many cases) and where the assumptions don’t seem too outlandish given your information.

References

Haghani, Victor, and Richard Dewey. 2017. “Rational Decision Making Under Uncertainty: Observed Betting Patterns on a Biased Coin.” Journal of Portfolio Management 43 (3): 2–8. https://doi.org/10.3905/jpm.2017.43.3.002.
Kelly, John L. 1956. “A New Interpretation of Information Rate.” The Bell System Technical Journal 35 (4): 917–26.