
Pokemon GO shiny rates: a Bayesian perspective

The Silph Road is the largest online and in-person network of Pokemon GO players and researchers. We investigate how accurate their proposed shiny rates are by putting on our Bayesian hat, setting the "consensus" shiny rates as our priors, and using Silph field studies as observed data.

Background: Silph, shinies, and statistics

The Silph Road organizes regional groups of Pokemon GO players, sets up in-person tournaments, and conducts field studies to learn about game mechanics of Pokemon GO. Of particular interest to us here is the shiny rate, which is the probability that a Pokemon found in the wild will be shiny (for non-Pokemon players, this just means it's rare and specially coloured; it's like a trophy). Though not publicized by the game developer Niantic, this rate has been of great interest to Pokemon GO players (after all, shinies are not too far off from loot boxes).

Silph publishes field studies to determine shiny rates, and these studies have resulted in two consensus rates: one "standard" rate of 1/450 (used for the vast majority of Pokemon), and one "boosted" rate of 1/150 (used during certain events). Recently, however, those rates have been called into question on the Silph subreddit, with posters arguing that they are not consistent with the collected data. I am going to re-examine these findings from a Bayesian perspective.

Methodology

I went through the Silph archives looking for their shiny rate publications posted this year, and gathered them into a file rates.csv. The null rows in this file are the result of Silph not reporting their exact numbers (e.g., see Spoink ("over 16,500 Spoink") and Adventure Week ("over 30,000 encounters each")). I chose to keep these in the dataset in case someone asks "what happened?" Additionally, the Gligar event has two rows because the shiny rate apparently changed after ~24 hours, which I am taking to be fact.

In [1]:
import pandas as pd

rates = pd.read_csv("rates.csv")
rates.sample(5)
Out[1]:
url date name n_encounters n_shiny
18 https://thesilphroad.com/science/quick-discove... 20190629 geodude_a 6724.0 19.0
1 https://thesilphroad.com/science/quick-discove... 20190928 zangoose_seviper 8977.0 24.0
7 https://thesilphroad.com/science/quick-discove... 20190903 gligar_later 4234.0 33.0
14 https://thesilphroad.com/science/quick-discove... 20190727 zubat 844.0 3.0
9 https://thesilphroad.com/science/quick-discove... 20190824 carvanha 10386.0 20.0

Let's compute the "rarity", defined as n_encounters / n_shiny. A rarity of R means that we saw shinies at a rate of 1 in R.

In [2]:
rates["rarity"] = rates["n_encounters"] / rates["n_shiny"]
rates = rates.dropna()
rates.sample(5)
Out[2]:
url date name n_encounters n_shiny rarity
20 https://thesilphroad.com/science/go-fest-weeke... 20190613 horsea 7303.0 64.0 114.109375
4 https://thesilphroad.com/science/quick-discove... 20190919 lillipup 4567.0 14.0 326.214286
3 https://thesilphroad.com/science/quick-discove... 20190919 patrat 4479.0 16.0 279.937500
0 https://thesilphroad.com/science/oddish-shiny-... 20191004 oddish 10988.0 94.0 116.893617
26 https://thesilphroad.com/science/shiny-meltan-... 20190225 meltan 7850.0 128.0 61.328125

Domain knowledge tells us that there are three classes of shiny rates here: a highly boosted one (around 1 in 60, for Alolan Exeggutor and Meltan), a boosted one (which Silph claims to be 1 in 150), and a normal one (which Silph claims to be 1 in 450). We can use this to partition the dataset manually, discarding the highly boosted samples because they're not relevant to this debate.

In [3]:
boosted = rates[rates["rarity"].between(70, 200)].sort_values("date").reset_index(drop=True)
unboosted = rates[rates["rarity"] > 200].sort_values("date").reset_index(drop=True)
In [4]:
boosted
Out[4]:
url date name n_encounters n_shiny rarity
0 https://thesilphroad.com/science/extraordinary... 20190522 bronzor 2479.0 15.0 165.266667
1 https://thesilphroad.com/science/go-fest-weeke... 20190613 horsea 7303.0 64.0 114.109375
2 https://thesilphroad.com/science/quick-discove... 20190704 nidoran_m 5722.0 53.0 107.962264
3 https://thesilphroad.com/science/quick-discove... 20190727 sneasel 1588.0 13.0 122.153846
4 https://thesilphroad.com/science/quick-discove... 20190806 poliwag 5627.0 40.0 140.675000
5 https://thesilphroad.com/science/quick-discove... 20190903 gligar_later 4234.0 33.0 128.303030
6 https://thesilphroad.com/science/quick-discove... 20190921 yanma 4052.0 37.0 109.513514
7 https://thesilphroad.com/science/oddish-shiny-... 20191004 oddish 10988.0 94.0 116.893617
In [5]:
unboosted
Out[5]:
url date name n_encounters n_shiny rarity
0 https://thesilphroad.com/science/quick-discove... 20190629 diglett_a 5373.0 11.0 488.454545
1 https://thesilphroad.com/science/quick-discove... 20190629 geodude_a 6724.0 19.0 353.894737
2 https://thesilphroad.com/science/quick-discove... 20190629 rattata_a 5179.0 14.0 369.928571
3 https://thesilphroad.com/science/quick-discove... 20190727 koffing 24902.0 65.0 383.107692
4 https://thesilphroad.com/science/quick-discove... 20190727 ekans 13018.0 37.0 351.837838
5 https://thesilphroad.com/science/quick-discove... 20190727 zubat 844.0 3.0 281.333333
6 https://thesilphroad.com/science/quick-discove... 20190824 barboach 9958.0 23.0 432.956522
7 https://thesilphroad.com/science/quick-discove... 20190824 carvanha 10386.0 20.0 519.300000
8 https://thesilphroad.com/science/quick-discove... 20190903 sentret 19297.0 54.0 357.351852
9 https://thesilphroad.com/science/quick-discove... 20190903 gligar_early 1971.0 3.0 657.000000
10 https://thesilphroad.com/science/quick-discove... 20190919 patrat 4479.0 16.0 279.937500
11 https://thesilphroad.com/science/quick-discove... 20190919 lillipup 4567.0 14.0 326.214286
12 https://thesilphroad.com/science/quick-discove... 20190928 zangoose_seviper 8977.0 24.0 374.041667

Let's start with the proposed boosted shiny rate of 1 in 150. We'll come back to the standard one later.

The boosted shiny rate: the Bayesian way

Frequentist statistics would construct a confidence interval on these rates--it's a simple proportions test--and call it a day. Indeed, that's what both Silph (see every publication they put out) and critics of Silph have done. After constructing this confidence interval, we simply check if 1/150 lies within it.
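For reference, here is roughly what that frequentist check looks like. This is a sketch rather than Silph's or the critics' exact code, and it assumes statsmodels is available; it pools the boosted counts, builds a Wilson confidence interval for the shiny probability, and asks whether 1/150 falls inside it.

# A sketch of the frequentist check (not Silph's exact code), assuming
# statsmodels is available: pool the boosted counts, build a 95% Wilson
# confidence interval for p, and check whether 1/150 lies inside it.
from statsmodels.stats.proportion import proportion_confint

total_shiny = boosted["n_shiny"].sum()
total_encounters = boosted["n_encounters"].sum()

low, high = proportion_confint(total_shiny, total_encounters, alpha=0.05, method="wilson")
print(f"95% CI for p: ({low:.5f}, {high:.5f})")
print("1/150 inside CI:", low <= 1 / 150 <= high)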

But we can do better than this yes/no response. Given that we believe that the boosted shiny rate is 1 in 150, the Bayesian way of thinking provides us with a natural way of incorporating this into our analysis: as a prior.

In [6]:
import arviz as az
import pymc3 as pm

az.style.use("fivethirtyeight")

Setting priors

Let's use a Beta prior over p, since a Beta can be used as a distribution over probabilities. Using the success rate interpretation of the Beta, our prior will be fairly weak: equivalent to having seen 10 shinies in 1500 encounters. Put otherwise, our prior is that anything between 1 in 100 and 1 in 300 is plausible.

We'll add a second variable, rarity, which is 1 / p as defined before. This makes it easier to use phrases like "1 in 150" or "1 in N," and is more intuitive when talking about extremely small probabilities. Through the rest of this document, we'll mostly focus on the plots of the rarity.
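As a quick sanity check on that prior (a side sketch, not one of the notebook's cells), we can ask scipy for the central 90% interval of a Beta(10, 1490) and translate it into rarity terms; it should land roughly in the 1-in-100 to 1-in-300 range described above.

# Sanity-check the Beta(10, 1490) prior with scipy (assumed available):
# its central 90% interval, expressed as a rarity, should sit roughly
# between 1 in 100 and 1 in 300.
from scipy import stats

p_low, p_high = stats.beta(a=10, b=1490).interval(0.90)
print(f"90% prior interval for p: ({p_low:.5f}, {p_high:.5f})")
print(f"... as rarities: 1 in {1 / p_high:.0f} to 1 in {1 / p_low:.0f}")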

In [7]:
with pm.Model() as model:
    p = pm.Beta("p", alpha=10, beta=1490)
    rarity = pm.Deterministic("rarity", 1. / p)

prior_samples = pm.sample_prior_predictive(samples=10000, model=model)
In [8]:
axes = az.plot_density(
    prior_samples,
    var_names=["p", "rarity"],
    point_estimate=None,
    credible_interval=0.99,
    shade=0.5,
    figsize=(12, 4),
)

From this, we can see that while 1/150 is at the center of our prior beliefs, we wouldn't be surprised by a rarity of 1 in 100 or 1 in 200 either. This is before having collected any data: if all we had heard was "the shiny rate is 1 in 150," without being entirely sure about that 150 number, this plot would represent the plausible range of values.

Adding data

One advantage of the Bayesian approach is that it lets us add as much or as little data as we have. We will demonstrate how our beliefs in the shiny rate change over time as we show our model more data (i.e., as we progress through time and have more shinies released).

In [9]:
from typing import Tuple

def encounters_and_shiny(df: pd.DataFrame, species_name: str) -> Tuple[float, float]:
    """Given a species name, retrieve the number of encounters and number of shinies"""

    row = df[df.name == species_name].iloc[0]
    return (row["n_encounters"], row["n_shiny"])

assert encounters_and_shiny(boosted, "sneasel") == (1588, 13)
assert encounters_and_shiny(unboosted, "sentret") == (19297, 54)

Because each encounter is independently shiny with probability p, a binomial distribution is appropriate for modeling the number of shinies we see. We will use Markov chain Monte Carlo (MCMC) to learn the likely distributions over our parameters (shiny rate and rarity). In lay terms, we will infer a distribution of the most probable values for those parameters, updating it little by little as we see more data. We'll start with just Bronzor.
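As a side note (a sketch, not one of the notebook's cells): because the Beta prior is conjugate to the binomial likelihood, the posterior after a single study also has a closed form, Beta(alpha + shinies, beta + non-shiny encounters), which makes a handy cross-check on the MCMC results below. This assumes scipy is available.

# Conjugate cross-check: exact posterior after conditioning on Bronzor only.
from scipy import stats

n_enc, n_sh = encounters_and_shiny(boosted, "bronzor")
posterior = stats.beta(a=10 + n_sh, b=1490 + n_enc - n_sh)
print(f"posterior mean of p: {posterior.mean():.5f} (about 1 in {1 / posterior.mean():.0f})")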

In [10]:
with model:
    n_encounters, n_shiny = encounters_and_shiny(boosted, "bronzor")
    bronzor = pm.Binomial("bronzor", n=n_encounters, p=p, observed=n_shiny)

    trace = pm.sample(1000, chains=4)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 2 jobs)
NUTS: [p]
Sampling 4 chains: 100%|██████████| 6000/6000 [00:03<00:00, 1821.65draws/s]
In [11]:
_ = az.plot_trace(trace)

This plot represents what we might have believed in May 2019, after seeing 15 shinies out of 2479 Bronzor encounters. The left curves represent the likely ranges for the shiny rate p and the rarity 1-in-N. For those unfamiliar with MCMC, ignore the fuzzy-caterpillar-like plots on the right; for those familiar with it, this model exhibits excellent sampling behavior.

Notice that these distributions are already a bit tighter than the prior. We now see virtually no support for a rate of 1 in 300, whereas under the prior we did. Meanwhile, 1 in 150 remains a highly likely shiny rate given our limited data.
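If you'd rather quantify that tightening than eyeball it, a one-line summary of the posterior draws does the job. This cell isn't in the original notebook, but az.summary is standard ArviZ.

# Numeric summary of the posterior draws; the interval columns give a
# quantitative version of "how tight" the curves above are.
az.summary(trace, var_names=["p", "rarity"])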

Let's add the next Pokemon we had an event for, Horsea.

In [12]:
with model:
    n_encounters, n_shiny = encounters_and_shiny(boosted, "horsea")
    horsea = pm.Binomial("horsea", n=n_encounters, p=p, observed=n_shiny)

    trace = pm.sample(1000, chains=4)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 2 jobs)
NUTS: [p]
Sampling 4 chains: 100%|██████████| 6000/6000 [00:03<00:00, 1835.00draws/s]
In [13]:
_ = az.plot_trace(trace)

Because we observed a rate of 1 in 114 for Horsea, the likelihood for the rarity has now shifted much further left. It is now almost entirely implausible for the shiny rate to be any rarer than 1 in 200, and even 1 in 150 is starting to look unlikely.

The next shiny released was Nidoran M.

In [14]:
with model:
    n_encounters, n_shiny = encounters_and_shiny(boosted, "nidoran_m")
    nidoran_m = pm.Binomial("nidoran_m", n=n_encounters, p=p, observed=n_shiny)

    trace = pm.sample(1000, chains=4)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 2 jobs)
NUTS: [p]
Sampling 4 chains: 100%|██████████| 6000/6000 [00:03<00:00, 1651.98draws/s]
In [15]:
_ = az.plot_trace(trace)
">

Nidoran's observed rarity was 1 in 107 over roughly 5,700 encounters, shifting our rarity curve even further left; it's becoming clearer that 1 in 150 is a pretty unlikely shiny rate. Let's do this one more time for Sneasel.

In [16]:
with model:
    n_encounters, n_shiny = encounters_and_shiny(boosted, "sneasel")
    sneasel = pm.Binomial("sneasel", n=n_encounters, p=p, observed=n_shiny)

    trace = pm.sample(1000, chains=4)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 2 jobs)
NUTS: [p]
Sampling 4 chains: 100%|██████████| 6000/6000 [00:03<00:00, 1844.41draws/s]
The acceptance probability does not match the target. It is 0.9023306786521638, but should be close to 0.8. Try to increase the number of tuning steps.
In [17]:
_ = az.plot_trace(trace)

At this point (perhaps earlier) I would feel confident saying that the shiny rate, whatever it is, is not 1 in 150. The Sneasel event happened in July 2019, and I'm writing this in October, so clearly that wasn't enough for the Pokemon GO community. Fortunately, four more events happened between then and now, and we can pass them all at once.

In [18]:
with model:
    n_encounters, n_shiny = encounters_and_shiny(boosted, "poliwag")
    poliwag = pm.Binomial("poliwag", n=n_encounters, p=p, observed=n_shiny)

    n_encounters, n_shiny = encounters_and_shiny(boosted, "gligar_later")
    gligar = pm.Binomial("gligar", n=n_encounters, p=p, observed=n_shiny)

    n_encounters, n_shiny = encounters_and_shiny(boosted, "yanma")
    yanma = pm.Binomial("yanma", n=n_encounters, p=p, observed=n_shiny)

    n_encounters, n_shiny = encounters_and_shiny(boosted, "oddish")
    oddish = pm.Binomial("oddish", n=n_encounters, p=p, observed=n_shiny)

    trace = pm.sample(1000, chains=4)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 2 jobs)
NUTS: [p]
Sampling 4 chains: 100%|██████████| 6000/6000 [00:03<00:00, 1762.18draws/s]
In [19]:
_ = az.plot_trace(trace)

We can confidently say that it is extremely unlikely that the boosted shiny rate is 1 in 150. It is much more plausible that the rate is in the neighborhood of 1 in 120; 1 in 150 barely even registers on our posterior plot of the rarity.
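One way to put a number on that claim (again, a sketch rather than a cell from the original notebook): compute the share of posterior draws in which the rarity is 150 or higher. If 1 in 150 were still plausible, this fraction would not be vanishingly small.

# Fraction of posterior draws with rarity >= 150, as a rough measure of how
# much posterior support the 1-in-150 hypothesis retains.
rarity_draws = trace["rarity"]
print("P(rarity >= 150 | data) ~", (rarity_draws >= 150).mean())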

Notice how natural a fit the Bayesian way of thinking was: we have some prior beliefs (that the rate is 1 in 150), and some data (the Silph studies), and we can marry the two together to get a posterior (the plot we see above). It's clear that the data do not support our prior beliefs, but that's okay; we're researchers, and that's how this is supposed to work.

The normal shiny rate (supposedly 1 in 450)

Let's look next at the normal shiny rate, which is supposedly 1 in 450. For brevity's sake, I won't take us through the step-by-step process again, but rather pass all the data at once.

In [20]:
with pm.Model() as model:
    p = pm.Beta("p", alpha=10, beta=4490)
    rarity = pm.Deterministic("rarity", 1. / p)

prior_samples = pm.sample_prior_predictive(samples=10000, model=model)
In [21]:
axes = az.plot_density(
    prior_samples,
    var_names=["p", "rarity"],
    point_estimate=None,
    credible_interval=0.99,
    shade=0.5,
    figsize=(12, 4),
)
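From here the same machinery applies. As a sketch of how the unboosted studies could all be conditioned on at once (mirroring the pattern used for the boosted events above, and not necessarily the notebook's exact next cell), we can loop over the rows rather than naming each Pokemon:

# A sketch: condition on every unboosted study at once by looping over the
# rows of the unboosted DataFrame, then sample as before.
with model:
    for _, row in unboosted.iterrows():
        pm.Binomial(
            row["name"],                      # one observed node per species/event
            n=int(row["n_encounters"]),
            p=p,
            observed=int(row["n_shiny"]),
        )

    trace = pm.sample(1000, chains=4)

_ = az.plot_trace(trace)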