NBA Referee Bias: What the Academic Research Tells Bettors



The papers most punters never read

The case for using academic research as a betting input is not romantic. The papers are dense. The methods sections are punishing. Most of them sit behind paywalls or in repositories that take three clicks to find. And yet the NBA refereeing literature is the most useful body of work I have read in nine years in this niche, because it is the only place outside the league’s own files where someone has tested the question every bettor instinctively asks: is the calling fair, and if it is not, how unfair is it?

The Pelechrinis paper alone analysed 7,498 personal foul calls. The Price and Wolfers paper covered more than a decade of seasons. The McDermott study at North Carolina ran more than 16,000 plays through a logistic regression model. The Belasen paper put the betting market itself at the centre of the question. Each of these works tells you something a sportsbook will never tell you, and each one points at where the price is wrong in ways a disciplined punter can act on. The goal of this guide is to walk you through the literature the way a working bettor reads it: for usable input, not for academic credit.

What “referee bias” actually means in the literature

The first useful thing to clarify is what the academic word “bias” is doing in these papers, because it is narrower than the everyday version.

In casual conversation a biased referee is one who decides outcomes on purpose. The academic literature is essentially not interested in that question, because the data could not answer it cleanly even if researchers wanted to ask. What the papers measure is statistical bias: systematic deviation from a counterfactual in which officiating decisions are independent of the identity and circumstances of the teams. If every marginal foul call were made independently of who was favoured, who was at home, and who held the lead, the data would show no bias. The data does not show that. It shows directional patterns that hold up across many thousands of plays and across multiple research teams using different methods.

That distinction matters for how you use the findings. A bettor is not in a position to allege misconduct. A bettor is in a position to identify and price systematic deviations. Every paper covered below is documenting a deviation. None of them is claiming a referee chose the deviation deliberately, and most of them are explicit that the most likely mechanism is implicit — pressure from a crowd, anchoring to expectations, the human tendency to soften the marginal call against a team that is already behind. Implicit bias is bookable. Conscious misconduct is mostly a distraction.

The second clarification: “referees” in this literature usually means the officials working a specific play, not the entire league or any single named individual. Aggregate findings about NBA refereeing are findings about the staff as a whole, weighted by their actual workloads. They do not tell you what Scott Foster or Tony Brothers or Natalie Sago will do on a particular Tuesday. They tell you the distribution from which an individual game is drawn. For a bettor that is exactly the level of generality a model needs, because individual splits are noisy and need aggregate priors to be readable.

The Belasen 2025 paper and the betting-line finding

The Journal of Sports Economics paper by Belasen, Belasen and Olbrecht is the most directly betting-relevant piece of work in the literature, and I would put it on a short list of papers every UK NBA punter should read at least once.

The team built a regression model on Last Two Minute Report plays in close-spread games and asked a direct question: does officiating accuracy systematically differ depending on which side of the betting market a team is on? The headline numbers were striking. In matches with a narrow betting spread, officials made 23% fewer erroneous decisions against road underdogs than against favourites. Against home underdogs the figure was 42% fewer. Both effects survived a battery of controls — score margin, time remaining, official identity, team strength, season effects — and both were well outside the noise band of the model.
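To make the two headline percentages concrete, here is a minimal arithmetic sketch. It assumes the 23% and 42% figures can be read as relative reductions in an error rate, which is a simplification of a regression result, and the 10% baseline is a hypothetical number chosen only for illustration. It does not come from the paper.

```python
# Relative reductions quoted in the text (errors against underdogs vs favourites).
ROAD_UNDERDOG_REDUCTION = 0.23
HOME_UNDERDOG_REDUCTION = 0.42

# HYPOTHETICAL baseline: share of late-game decisions that go wrongly against
# the favourite. This number is an assumption for illustration only.
BASELINE_ERROR_RATE_VS_FAVOURITE = 0.10


def implied_error_rate(baseline: float, relative_reduction: float) -> float:
    """Error rate after applying a relative reduction to a baseline rate."""
    return baseline * (1.0 - relative_reduction)


road_dog_rate = implied_error_rate(
    BASELINE_ERROR_RATE_VS_FAVOURITE, ROAD_UNDERDOG_REDUCTION
)
home_dog_rate = implied_error_rate(
    BASELINE_ERROR_RATE_VS_FAVOURITE, HOME_UNDERDOG_REDUCTION
)

print(f"vs favourite:     {BASELINE_ERROR_RATE_VS_FAVOURITE:.3f}")
print(f"vs road underdog: {road_dog_rate:.3f}")
print(f"vs home underdog: {home_dog_rate:.3f}")
```

Whatever baseline you plug in, the ordering stays the same: favourites absorb the most erroneous calls, road underdogs fewer, home underdogs fewest.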

What I like about this paper is how directly the authors frame the stakes. “The National Basketball Association publishes Last Two Minute Reports of referee calls to encourage accountability and a consistent application of the rules,” they wrote in the introduction. “However, recent partnerships with gaming operators have brought referee impartiality into question.” That is a remarkably plain sentence for an academic journal, and it tells you the team understood the betting context the data was about to be read in. The paper is not a polemic, but it does not pretend the stakes are purely scholarly either.

The mechanism the authors lean toward, without overclaiming, is that officials in tight, late-game situations face an asymmetric cost function. A perceived error that settles a close game against the trailing team is publicly costly in a way that an error in the other direction is not. The marginal call gets softened in the direction of “do not let me be seen to decide this game by a whistle.” That is implicit, not intentional, and it produces exactly the directional pattern the data shows. For a UK bettor, the league’s late-game data implies that the underdog in close-spread games is being mispriced by a small but consistent margin, and the implication carries through into how you should weight a closing-window position. The practical follow-through, how to convert this into actual stakes on a UK coupon, sits in our piece on NBA referee handicapping for UK punters.

The Pelechrinis paper and the bubble experiment

If Belasen is the betting-line paper, Pelechrinis is the home court paper.

The 2023 paper in Scientific Reports analysed 7,498 personal foul calls drawn from L2M reports and built a model to test for implicit bias in refereeing decisions, with NBA officials as the testbed. The size of the dataset is the first thing to note: by the standards of refereeing research it is enormous, and the statistical power of the analysis is correspondingly strong. The headline finding was a persistent home bias in officiating decisions across the period of study. Not a dramatic one, but consistent enough to clear the conventional significance thresholds.

The more interesting part of the paper for a bettor is the home court advantage benchmark. Using Sagarin home court ratings as an external anchor, the team documented that the NBA’s home edge dropped from roughly 2.74 points per game in the 2015 to 2019 seasons to about 1.75 points in the 2020 to 2022 seasons. The bubble year, played without fans in Orlando, sat at the bottom of that range. The post-bubble seasons partially recovered but never returned to the pre-pandemic level.

Read together, the two findings — persistent referee bias plus shrinking home edge — give us the cleanest natural experiment we are ever going to get on this question. If the home court edge had been driven primarily by player effects, it would have rebounded fully when crowds returned. It did not. That tells us that a meaningful share of the historical NBA home edge was crowd-driven through the officials, not through the players. A bettor who is still pricing home court in the NBA at the pre-pandemic two and a half points is using a number that no longer reflects the data. The right number is closer to one and three quarters, and on close lines that distinction is regularly the difference between a cover and a push.
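The size of that re-pricing is simple arithmetic, sketched below. The two home-edge constants are the Sagarin-based figures quoted above. The neutral-court gap and the function name are hypothetical, introduced only to show how the stale and updated numbers move a line.

```python
# Home-court figures quoted in the text (Sagarin-based, points per game).
PRE_PANDEMIC_HOME_EDGE = 2.74   # 2015 to 2019 seasons
POST_PANDEMIC_HOME_EDGE = 1.75  # 2020 to 2022 seasons


def fair_home_margin(neutral_margin: float, home_edge: float) -> float:
    """Expected home-team margin: neutral-court rating gap plus home edge."""
    return neutral_margin + home_edge


# HYPOTHETICAL matchup: the home side rates 0.5 points better on a neutral floor.
neutral_gap = 0.5

stale_line = fair_home_margin(neutral_gap, PRE_PANDEMIC_HOME_EDGE)
updated_line = fair_home_margin(neutral_gap, POST_PANDEMIC_HOME_EDGE)
shift = stale_line - updated_line

print(f"stale home margin:   {stale_line:.2f}")
print(f"updated home margin: {updated_line:.2f}")
print(f"shift toward the road side: {shift:.2f} points")
```

The shift is the same 0.99 points for any matchup, because only the home-edge term changes.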

The McDermott study and the clock effect

The McDermott paper, produced at the University of North Carolina through the Office for Undergraduate Research in 2023, is less famous than the Belasen or Pelechrinis work, but it answered a specific question with unusual rigour.

The study analysed more than 16,000 L2M calls across the 2017 to 2022 seasons. The central question was whether officiating accuracy varied as a function of time remaining on the clock within the L2M window itself. The answer, after a logistic regression with appropriate controls, was a clear yes — accuracy degrades as the clock runs down. Calls made in the last 30 seconds of a one-possession game are measurably less accurate than calls made earlier in the same closing window.

The size of the effect is not dramatic in absolute terms, but it points at something the other papers do not quite capture. The Belasen paper treats the L2M window as a single regime. The Pelechrinis paper does the same. McDermott shows that the window is actually two regimes — a relatively clean early-L2M period and a degraded late-L2M period — and that any model treating them identically is mixing signal.

For a bettor the implication is twofold. First, the magnitude of the Belasen bias is probably understated in the late-L2M period and overstated in the early-L2M period, because the underlying accuracy is lower in the late window. Second, totals priced for tight finishes are working off a noisier official-decision generator than totals priced for moderately close finishes that end without the last 30 seconds being meaningful. Both implications push toward a small but real adjustment in how you weight late-game scoring expectations, and they explain why ATS variance is structurally higher in one-possession games than in three-possession finishes.
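The two-regime idea can be sketched as a simple bucketing step. The records below are made-up rows in an L2M-like shape, not data from any paper, and the 30-second cutoff is taken from the discussion above. The sketch shows only the mechanics of splitting the window, not the size of the real effect.

```python
from collections import defaultdict

# HYPOTHETICAL L2M-style records: (seconds_remaining, call_was_correct).
# These rows are illustrative only and are not taken from any study.
calls = [
    (118, True), (95, True), (80, False), (61, True), (44, True),
    (29, False), (21, True), (12, False), (4, True),
]

LATE_CUTOFF_SECONDS = 30  # the "last 30 seconds" boundary discussed above


def regime(seconds_remaining: float) -> str:
    """Label a call as belonging to the early or late part of the window."""
    return "late" if seconds_remaining <= LATE_CUTOFF_SECONDS else "early"


def accuracy_by_regime(records):
    """Return {regime: share of correct calls} for the given records."""
    totals = defaultdict(lambda: [0, 0])  # regime -> [correct, total]
    for seconds, correct in records:
        bucket = totals[regime(seconds)]
        bucket[0] += int(correct)
        bucket[1] += 1
    return {name: correct / total for name, (correct, total) in totals.items()}


rates = accuracy_by_regime(calls)
for name in ("early", "late"):
    print(f"{name}: {rates[name]:.2f}")
```

A model that pools both buckets reports one blended accuracy and hides the gap between them, which is the mixing problem described above.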

The Price and Wolfers 2010 paper

Before any of the L2M-based research existed, the Quarterly Journal of Economics published a paper in 2010 by Joseph Price and Justin Wolfers that asked a stark question: do NBA officials call fewer fouls on players of their own race?

Their dataset covered the 1991-92 to 2004-05 seasons, predating the L2M and predating any of the integrity infrastructure the league has since built. The methodology controlled for player skill, position, foul tendency, team, opponent and a long list of game-state variables. The finding was a small but statistically robust own-race bias. White officials called marginally fewer fouls on white players than on black players, and the inverse pattern held for the smaller sample of black officials. The size of the effect, importantly, was large enough that a strategy of betting against teams whose racial composition mismatched the officiating crew would have produced a positive ROI across the period studied.

The paper became one of the most controversial pieces of refereeing research ever published, and the league initially pushed back. Subsequent independent replications generally confirmed the existence of the effect across that period, though with smaller effect sizes than the original estimate. For betting purposes the paper matters less as a current-day input than as a baseline. It established that referee identity carries measurable directional information, that the information is bookable as line value, and that the league’s officiating staff is not, in fact, identically interchangeable. Every subsequent paper on NBA bias is in some sense replying to the 2010 Price and Wolfers result.

The methodological contribution of the paper is worth a sentence on its own. Earlier research on officiating relied heavily on box-score aggregates that lost most of the directional information by averaging across game contexts. Price and Wolfers worked at the level of the individual call within a specific game state, controlling for the players on the floor at the moment of the whistle. That move — going from aggregate to play-level — is what made later L2M-based research possible. The Pelechrinis paper and the Belasen paper are both methodologically descended from this 2010 work, even though their data sources are different. The intellectual lineage is unbroken.

The 2018 follow-up and what it told us about awareness

The most useful update to the 2010 paper appeared eight years later in Management Science, written by Devin Pope, Joseph Price and Justin Wolfers: the two original authors, now joined by a behavioural economist with deep work on attention and incentives.

The 2018 paper repeated the original analysis on a new sample drawn from the 2007 to 2010 seasons, after the original paper had been published and widely reported. The result was striking. The own-race bias documented in the 1991-2005 data essentially disappeared in the 2007-2010 sample. The officials had not changed. The training had not been radically overhauled. What had changed was that everyone in the league office and everyone in the officiating staff now knew the research existed, knew it was being read by the public, and knew similar analyses could be run at any time on any future season’s data.

The conclusion the team drew was that awareness reduces racial bias. Once an implicit pattern is exposed and the people producing it know it is being measured, they correct it — not consciously, in most cases, but through the kind of attentional adjustment that public scrutiny tends to produce. For a bettor that finding has two consequences. The narrow one is that any historical edge tied to own-race bias is gone and should not appear in a modern model. The broader one is that publicly documented biases tend to decay once the documentation is public. By extension, the Belasen finding from 2025 has a half-life. Every season that paper sits in the literature is a season during which the underlying behaviour is more likely to have been corrected. The edge is real today, and it will not last forever.

That decay curve is one of the more uncomfortable truths about betting on academic research. The papers are public — that is why we can read them — and the same publicity that makes the finding readable also makes the underlying pattern correctable. Sportsbooks read the same papers bettors do. The league’s officiating training reads the same papers bettors do. A finding worth acting on today is worth less every year that follows. The practical response is to act on the current numbers without overcommitting capital to any single directional thesis, and to recheck the empirical pattern at the start of every season to see whether the effect size has narrowed. Most edges in this niche have half-lives measured in seasons rather than decades.
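One way to plan for that decay is to model it explicitly. The sketch below assumes simple exponential decay, which is a modelling choice of convenience rather than anything the papers establish. Both the starting edge and the half-life are hypothetical placeholders to be replaced with your own estimates.

```python
def remaining_edge(
    initial_edge: float, half_life_seasons: float, seasons_elapsed: float
) -> float:
    """Edge left after some seasons, assuming exponential decay."""
    if half_life_seasons <= 0:
        raise ValueError("half_life_seasons must be positive")
    return initial_edge * 0.5 ** (seasons_elapsed / half_life_seasons)


# HYPOTHETICAL inputs: neither number comes from the literature.
INITIAL_EDGE_POINTS = 0.20  # assumed spread adjustment when the paper appears
HALF_LIFE_SEASONS = 3.0     # assumed number of seasons for the edge to halve

for season in range(7):
    edge = remaining_edge(INITIAL_EDGE_POINTS, HALF_LIFE_SEASONS, season)
    print(f"season {season}: {edge:.3f} points")
```

Rechecking the empirical effect size each season amounts to re-estimating these two inputs instead of trusting last year's values.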

Counter-evidence and what the papers do not establish

The literature on refereeing bias is unusually consistent in finding directional effects, but it is not unanimous, and a disciplined bettor needs to know where the limits live.

Several studies using older methodologies — particularly ones that worked from box scores and aggregate per-game foul counts rather than play-by-play data — have failed to replicate the bias findings the L2M-based papers report. The disagreement is mostly methodological. Aggregate data washes out the directional signal because it averages across game contexts that the L2M-based studies treat separately. The newer work is almost certainly more reliable, but readers should be aware that the picture is not uniformly settled.

Predictive modelling work tells us where the absolute ceiling on accuracy sits. Models built on referee data combined with play-by-play features routinely land in the mid-60s for accuracy, with the strongest models hitting around 66% on average and topping out near 78% in their best subsets. Those are the upper bounds on what is achievable with public information. A bettor who claims to be hitting 70% on referee-driven NBA bets across a full season is, with very high probability, either lucky or fooling themselves. The Belasen and Pelechrinis findings are real, and they improve a baseline model. They do not turn an NBA coupon into a vending machine.
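The "lucky or fooling themselves" point can be checked with a binomial tail probability. The sketch below assumes independent bets with a fixed true hit rate, and the 55% rate and the sample sizes are hypothetical. The comparison between samples matters more than any single number.

```python
from math import comb


def prob_at_least(wins: int, bets: int, true_rate: float) -> float:
    """P(at least `wins` winners in `bets` independent bets) under a binomial model."""
    return sum(
        comb(bets, k) * true_rate**k * (1.0 - true_rate) ** (bets - k)
        for k in range(wins, bets + 1)
    )


# HYPOTHETICAL long-run hit rate for a bettor with a genuine but modest edge.
TRUE_RATE = 0.55

# A 70% record over a short run versus over a season-sized run.
small_sample = prob_at_least(14, 20, TRUE_RATE)    # 14 winners from 20 bets
large_sample = prob_at_least(210, 300, TRUE_RATE)  # 210 winners from 300 bets

print(f"P(>=70% over 20 bets):  {small_sample:.4f}")
print(f"P(>=70% over 300 bets): {large_sample:.2e}")
```

A 70% record over a couple of dozen bets is unremarkable under these assumptions, while the same record over hundreds of bets is far harder to reach by luck, so the sample size behind any claimed hit rate is the first thing to ask about.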

The other thing the papers do not establish is how any individual official behaves. A staff-wide finding of bias in favour of underdogs in close-spread games does not tell you that any particular crew chief on a particular night will produce that bias. The aggregate distribution is the prior. The specific game adds noise that the model cannot remove without much more data than the public has access to.

One final caveat worth flagging: the literature is essentially silent on coordinated effects across a crew. The papers treat each call as a decision by a single named official, but in practice a three-person crew shares information and develops a calling rhythm across a game. The aggregate bias findings are net of any crew-coordination effects, which means a particular crew might produce a larger or smaller bias than the staff-wide number suggests depending on how the three officials interact. There is no good public dataset for this question. It is on the to-do list of every researcher I know in this space and the practical answer is the same as the answer to many questions in applied statistics — collect more data, average across more games, and do not pretend the noise on a single night is the signal.

How I use the literature on a working coupon

The papers are not a betting system on their own. They are inputs that adjust priors in a system you have to build yourself.

What I take from Belasen into the workflow: in close-spread games, the L2M data suggests the underdog is slightly underpriced, particularly the road underdog. In practice that means I trim the half-point line tax on dogs in tight games and I treat the underdog moneyline in projected close finishes as a marginally more attractive bet than the simple matchup suggests. The size of the adjustment is small, at most a fifth of a point on the spread, but it is consistent across enough games that it compounds across a season.

What I take from Pelechrinis: the home court adjustment in any NBA model needs to be lower than the pre-pandemic standard. One and three quarters is closer to right than two and a half. Any UK book still pricing home edge at the older number is offering you marginal value on the road side of close games. This shows up especially in regular-season Sunday afternoon coupons where line movement is thinner and books rely more heavily on default home-court assumptions.

What I take from McDermott: weight late-L2M variance higher when assessing totals and ATS for projected one-possession games. The officiating itself is noisier in the last 30 seconds, which makes the closing margin less predictable. A bettor concentrating action on projected single-digit games is paying for that noise whether they know it or not.

What I take from Price and Wolfers, and from the Pope follow-up: the lesson is structural rather than directional. Every documented bias in this literature has a decay curve. Bet the current findings, but plan for the decay, and recheck the data each year for signs that the effect has narrowed.

The final discipline is to remember why the literature exists in the first place. The papers are not written for bettors. They are written for academic readers and for league administrators interested in officiating quality. A betting framework that uses them well treats the findings as priors that get layered on top of game-specific information rather than as instructions. The reading list above is closer to a textbook than to a tip sheet, and that is the only reason it has any long-term value at all in a market that arbitrages obvious tips into oblivion within weeks.

Did the COVID bubble change what academic research says about home court advantage?

Yes, and the change has not fully reversed. The Pelechrinis paper documented that the NBA’s Sagarin home court edge dropped from roughly 2.74 points per game in 2015-2019 to about 1.75 points in 2020-2022. The bubble year sat at the floor, and the post-bubble seasons recovered only partially. The implication is that a substantial share of historical NBA home edge was crowd-driven through officiating, not through player performance, and that any model pricing home court at the old number is using a stale input.

Is the academic evidence strong enough to act on as a UK bettor?

It is strong enough to adjust priors but not strong enough to replace a matchup-based model. The effect sizes in the papers are real but small: typically a few percentage points on call accuracy, a fraction of a point on spread expectations. Bettors who treat the literature as a marginal input that refines a broader system tend to extract value. Bettors who treat the literature as a system on its own tend to overbet and underdeliver.

Why did own-race bias disappear after the Wolfers paper went public?

The Pope, Price and Wolfers follow-up in Management Science argued that awareness itself reduces implicit bias. Once the 2010 paper was published and the officiating staff knew similar analyses could be run on future seasons, the directional pattern essentially vanished from the 2007-2010 data. The mechanism is not conscious correction so much as attentional adjustment under scrutiny: once you know you are being measured, the marginal call moves toward the neutral point.


Published by the nbarefbettin team.