Paper Draft - Final

Published on January 2017 | Categories: Documents | Downloads: 39 | Comments: 0 | Views: 495


Washington University in St. Louis
Olin Business School
The NFL Gambling Market:
Testing efficiency and the late
season bias
Author:
Daniel Sear
Supervisor:
Dr. Dirk Nitzsche
May 21, 2010
Abstract
This paper analyzes the NFL gambling market for inefficiencies. Testing game data
from 2006 to 2009 we find that betting on late season home underdogs can be
profitable, representing a market inefficiency. Next we find that, unlike other recent
research, the weather and climate in which the game is played does not represent a
mis-pricing. Finally, we develop both a Binary and OLS Base Model regression and
find that the Binary Base Model regression can be utilized out-of-sample to form a
profitable strategy on all games late in the season.
Acknowledgements
I would like to thank Professor Dirk Nitzsche for his guidance through the
construction of this paper. Also thanks to Brian Burke of AdvancedNFLStats.com for
pointing me in the right direction on some interesting research. Finally, I would like
to thank Professor Richard Borghesi for his helpful clarifications on the finer points
of his research.
Declaration
I declare that this dissertation is the result of my own work and includes nothing
which is the outcome of work done in collaboration. It is not substantially the same
as any which I have submitted for a degree, diploma, or other qualification at any
other university. Additionally, no part of this dissertation has already been, or is
currently being, submitted for any such degree, diploma, or other qualification.
(Daniel Sear)
Contents
1 Introduction
1.1 NFL Background Information
1.2 Betting in the NFL
1.3 Necessity for New Research
2 Literature Review
2.1 OLS Models
2.2 Binary Models
2.3 Other variables
3 Analysis
3.1 Data
3.2 Methodology
3.3 Time variation
4 Results
4.1 Statistical analysis
4.2 Regression Analysis
4.3 In-sample predictability
4.4 Out-of-sample predictability
5 Conclusion
Appendices
Bibliography
List of Tables
1.1 Illegal betting by sport in the United States
3.1 Summary of Data
4.1 NFL home team summary statistics by week
4.2 NFL home underdog summary statistics by week
4.3 Persistence of biases in the NFL
4.4 Weather effects
4.5 Nevada football betting
4.6 Success rates of simple betting rules in the NFL
4.7 NFL in-sample predictability
4.8 NFL out-of-sample predictability using the base binary model
1 Climate of NFL teams
Chapter 1
Introduction
Gambling has become an integral aspect of modern sports. A motivated person can
place a bet on virtually anything; take the Super Bowl, for instance. Bets can be
placed on every aspect of that game from simply the winner or loser to the coin toss
(heads or tails), even the length of time it takes for the national anthem to be sung
(an over/under). These bets not only drive interest in the sports on which wagering
occurs, but they are also huge business, attracting an estimated 268 to 325 billion dollars
each year in the United States alone (see Table 1.1), and that is only illegal gambling. Due to
the large amounts of money that flow through this market each year, an inefficiency
could result in large profits for a bettor who can exploit the mis-pricing properly.
This paper aims to determine if the market for NFL bets displays any biases that
could be exploited or if it is an efficient market with regard to the variables we test.
1.1 NFL Background Information
The National Football League was formed in 1920 and has risen from small beginnings
to become the highest attended American sport by a wide margin (MacCambridge,
2005).[1] The league consists of 32 teams organized into two conferences, each with four
divisions of four teams. Following a four week preseason, teams play 16 regular season
games over a 17 week season, allowing one ‘bye week’ per team per season. The bye
week generally comes between week 4 and week 10 of the regular season. The regular
season currently begins the Thursday evening after Labor Day (the first Monday
in September) and ends the last week of December. Games are played primarily
on Sunday with a weekly primetime game on Monday (the famous Monday Night
Football); however, as the season progresses weekly games are added on Thursday
and Saturday. Six teams from each conference make the playoffs, which consist of
four rounds: the Wild Card, Divisional, Conference Championship, and Super Bowl.
[1] The average attendance for an NFL game is 67,509.
Table 1.1: Illegal betting by sport in the United States
League/Event Total Wagers ($)
National Football League 80 − 100 billion
Super Bowl 6 − 10 billion
College Football 60 − 70 billion
College Basketball 50 billion
NCAA Basketball Tournament 6 − 12 billion
National Basketball Association 35 − 40 billion
Major League Baseball 30 − 40 billion
Hockey, Golf, NASCAR, Boxing, and Other Sports 1 − 3 billion
Soccer Nominal
Total 268 − 325 billion
Notes: This table shows the estimated amount of illegal gambling conducted in the
United States each year by league or event. The information comes from a study
done by CNBC and does not factor in any legal gambling conducted in casinos or
other authorized sports books.
The Super Bowl falls in the first week of February of the year following the season (so
the 2009 season’s Super Bowl was played in February 2010).
The NFL has a ‘hard’ salary cap and revenue sharing, which foster parity
between teams.[2] This also allows smaller market teams such as the Green Bay Packers to
be on level footing with big market teams like the New York Giants. The NFL’s main
demographic is males aged roughly 16 to 49. This is also the main demographic of
sports bettors, which helps to explain the extreme volume of wagering that is centered
around the NFL.
1.2 Betting in the NFL
In standard American football betting (both legal and illegal) a point spread system
is used. In this system a spread is set for the weekend’s games early in the week,
normally Monday. Bets are taken up until kickoff when the betting ends and the
‘closing line’ is established. An example of a line would be ‘Chicago minus five at
Minnesota’. This would mean that Chicago is expected to beat Minnesota by five
points. If a bettor believes one team is undervalued compared to their opponent
they bet on that team with a book maker (bookie). In this case, if Chicago outscores
Minnesota by more than five points, bets on Chicago win. If Minnesota loses by four
points or fewer (including scenarios where they win outright), bets on Minnesota win.
Finally, if Chicago wins by exactly five points a ‘push’ is declared and the money is
simply returned to the bettor.[3]
[2] A ‘hard’ cap is one that a team cannot exceed. In contrast, the NBA has a ‘soft’ cap which
means a team must pay a fine for any salary that is over the cap, but they can still have a payroll
exceeding the cap. Revenue sharing means that most revenues the league generates are pooled
together and distributed evenly among the 32 teams. For instance, the NFL signs television rights
contracts and gives the Dallas Cowboys 1/32nd of the money, the Chicago Bears 1/32nd of the
money and so on.
In point spread betting the bookie acts much like a stock exchange specialist.
The bookie, like the stock exchange specialist, charges a fee for setting up sellers and
buyers (in this case bettors wagering on both sides of the line). The bookie makes
money in two ways. First, for each wager a bookie pays out $10 in winnings for
each bet of $11. This means that if in the previous example a bet was placed on
Chicago for $11 and Chicago won by seven points the bettor would be given $21 by
the bookie, not $22; this is known as the vigorish, or vig.[4] Second, the bets that
lose are collected by the bookie and what is not given to other bettors in winnings is
their profit.
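The ‘$11 to win $10’ pricing described above fixes both the win rate a bettor needs to break even and the share of the money a bookie keeps when bets on the two sides offset. The following is a minimal sketch of that arithmetic; the function names are hypothetical and the balanced-book case is an illustrative assumption, not a description of any particular sports book.

```python
from fractions import Fraction


def break_even_win_rate(risk: int = 11, win: int = 10) -> Fraction:
    """Win probability p that solves p * win - (1 - p) * risk = 0."""
    return Fraction(risk, risk + win)


def balanced_book_hold(risk: int = 11, win: int = 10) -> Fraction:
    """Share of all money wagered that the bookie keeps when one bettor
    risks `risk` on each side (assumed perfectly balanced book)."""
    collected = 2 * risk          # both bettors pay in
    paid_out = risk + win         # the winner gets the stake back plus winnings
    return Fraction(collected - paid_out, collected)


rate = break_even_win_rate()
hold = balanced_book_hold()
print(f"break-even win rate: {rate} = {float(rate):.4f}")
print(f"bookie hold on a balanced book: {hold} = {float(hold):.4f}")
```

Exact fractions are used so that the break-even rate is not affected by rounding.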
Like the stock exchange specialist, a bookie would like to avoid ending up in
a naked position. Therefore, if many bettors are wagering on one side of a bet the
bookie will adjust the line to encourage betting on the other side. For example, if
there are many bets placed on Chicago to win by five the bookie might change the
line from ‘Chicago minus five at Minnesota’ to ‘Chicago minus five and a half at
Minnesota’. This will encourage more betting on Minnesota as that bet would win
if Minnesota loses by five where previously they would have pushed. The bookie will
continue to adjust the line so all bets offset, reducing his exposure. However, it does
not matter to a bettor what the line does after their bet is placed. If a bet was placed
on Chicago when the line was minus five and the line subsequently moved to five and
a half that person’s bet would still win or lose based on a line of five; the line is
locked in for each bet when it is made. Therefore, since betting lines move according
to the dollar amount of wagers on each side of the bet, closing spreads should reflect
all public and private information as well as any biases of market participants.
[3] Sometimes no favorite is declared and the line is zero; this is commonly referred to as a ‘pick-’em’
game.
[4] Other betting systems exist in other sports. In baseball and hockey a possible bet would look
like (−175, +190). In this case a bettor could win $1 by betting $1.75 on a favorite or $1.90 by
betting $1 on an underdog. Due to the nature of this system baseball and hockey bettors are not
concerned with point spreads, only who wins the game. Another system exists in horse racing called
pari-mutuel betting. As opposed to the odds and point-spread betting systems where the payoff is
locked in at the time of the bet, the winning bettors in a pari-mutuel system divide the total amount
bet after commissions are deducted. This means that if a bettor places a $1 bet on a 20-to-1 horse
they might only receive 15-to-1 odds after all the bets are placed. In this system transaction costs
are normally a percentage of the bet amount.
Illegal betting, which we have seen is equally if not more popular than the
legal variety, can be conducted through the point-spread system or a variety of other
bets, known as ‘prop bets.’ Prop bets exist in legal casinos but are much more
popular in the illegal gambling market. Prop bets are bets not on the outcome of the
game, but on any range of ancillary factors pertaining to the contest. For instance,
we previously mentioned two popular prop bets for the Super Bowl: heads or tails
for the opening coin flip and the length of time it takes for the National Anthem to
be sung. In addition to bets like this, illegal prop bets can range from guessing if a
questionable player will make the starting lineup, to details as inane as what a coach
will be wearing on the sidelines. The illegal gambling market is generally regarded as
a place where any bet can be made if a bettor has the money to support the wager.
Because of this, it is a much less structured market that is difficult to gather hard
details on; we will be focusing this research on the more defined market of Las Vegas
NFL gambling.[5]
1.3 Necessity for New Research
New research is warranted despite the existing literature on betting market
efficiency because the area is constantly evolving. The
idea of efficiency in markets is predicated upon the belief that when an inefficiency is
identified in liquid markets, arbitrageurs will exploit the inefficiency until it disappears
and all assets are appropriately priced. In this study, we examine an inefficiency
identified by Borghesi (2007) that led to a mis-pricing of late season NFL games,
specifically home underdogs. If the idea holds that markets correct themselves back
to an efficient state after time, this mis-pricing should either be gone or at least
dissipating. Borghesi’s study was done using data through 2000; we examine
data from 2006 to 2009.
[5] The information for this section has been taken from the CNBC study cited above.
We believe that we will find the mis-pricing has disappeared. The mis-pricing
identified was large enough that a savvy bettor or group of bettors would have come
into the market with large amounts of capital and exploited the bias until it
disappeared. Furthermore, we do not think a model can be created that is accurate
enough to overcome transaction costs such as the vigorish. Such a model would have to be
correct more than about 52.4% of the time to clear that hurdle. In short, we believe that the
NFL gambling market has returned to an efficient state.
Chapter 2
Literature Review
2.1 OLS Models
There have been numerous studies on the efficiency of the NFL gambling market over
the years. Most repeat or build on the first study done in 1985 by Zuber et al. In that
study, the authors built off of research done into the efficiency of racetrack gambling
to analyze the NFL. They first tested for weak form efficiency in the market. They
thought of the betting lines on NFL games as forward prices on stocks. The way
to test for market efficiency in that situation is to use the forward prices to predict
the future spot prices. Using this analogy they tested for market efficiency in NFL
betting lines using a simple regression of the form
$$DP_{i,t} = \alpha + \beta \, PS_{i,t} + \varepsilon_{i,t}$$
where $DP_{i,t}$, the dependent variable, is the
actual difference in points between the teams playing in the $i$th game in week $t$ and
$PS_{i,t}$, the independent variable, is the final point spread. They note that $PS$ is
not the book maker’s opinion of the outcome. The first line that is set in the week
represents the book maker’s (expert) assessment of the game; however, as bets come
in he adjusts the line to avoid excessive risk. Therefore, by the end of the week the
line will reflect all public knowledge about the game.[1] It follows that if estimating their equation
leads to rejection of the null hypothesis that α = 0 and β = 1, the market is not efficient.
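The weak form test described above can be sketched in a few lines. The example below regresses point differences on closing spreads and computes the F statistic for the joint null α = 0, β = 1. The data are synthetic and generated under the null purely for illustration; the function name, the seed, the spread range, and the noise level of 13 points are all assumptions rather than values from any of the studies discussed.

```python
import random


def weak_form_test(ps, dp):
    """OLS of dp on ps, plus the F statistic for the joint null
    alpha = 0 and beta = 1 (two restrictions, n - 2 residual df)."""
    n = len(ps)
    mean_ps = sum(ps) / n
    mean_dp = sum(dp) / n
    sxx = sum((x - mean_ps) ** 2 for x in ps)
    sxy = sum((x - mean_ps) * (y - mean_dp) for x, y in zip(ps, dp))
    beta = sxy / sxx
    alpha = mean_dp - beta * mean_ps
    # Unrestricted residual sum of squares (alpha and beta estimated).
    rss_u = sum((y - alpha - beta * x) ** 2 for x, y in zip(ps, dp))
    # Restricted residual sum of squares (alpha = 0, beta = 1 imposed).
    rss_r = sum((y - x) ** 2 for x, y in zip(ps, dp))
    f_stat = ((rss_r - rss_u) / 2) / (rss_u / (n - 2))
    return alpha, beta, f_stat


# Synthetic season of 224 games in which the spread is unbiased by construction.
rng = random.Random(0)
spreads = [rng.uniform(-10, 10) for _ in range(224)]
margins = [s + rng.gauss(0, 13) for s in spreads]

alpha, beta, f_stat = weak_form_test(spreads, margins)
print(f"alpha = {alpha:.3f}, beta = {beta:.3f}, F = {f_stat:.3f}")
```

Under the null the statistic follows an F distribution with 2 and n − 2 degrees of freedom, so with n = 224 a value well above roughly 3.0 would reject efficiency at the 5 percent level.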
They found that based on the 1983 season they could not reject the null for
13 of the 16 weeks. They decided to test the complete opposite as well, the null
hypothesis that α = β = 0, meaning that the point spread was entirely unrelated to
the outcome. This also could not be rejected for 15 of the 16 weeks. Based on that
result they determined that the weak form test was not a sufficient indicator of the
efficiency of the market.
[1] This assumes that there is a high level of activity on the betting lines. While actual total figures
on the amount bet on each game are unavailable due to the illegal placement of many bets, it is safe
to assume that there is sufficient activity given that Nevada casinos alone take in over $66.6 billion
each year in football bets.
Their next approach was to define an efficient market as one in which no
player could develop a consistently profitable strategy. In this case, that would be
developing a strategy that involves winning greater than 52.40 percent of the time.[2]
In an attempt to find a strategy that would win more than 52.40 percent
of the time they developed another regression equation of the form
$$DP_{i,t} = B'(X^{h}_{i,t} - X^{v}_{i,t}) + \varepsilon_{i,t}$$
where $DP_{i,t}$
again represents the actual difference in points between the teams, $B'$ is a vector of
coefficients to be estimated and $X^{h}_{i,t}$ and $X^{v}_{i,t}$, the independent portion of the equation,
are matrices of observable variables for the home and away teams respectively. Each
of the game and team variables in the matrix $X_{i,t}$ is defined in the same way as
$DP_{i,t}$, meaning each consists of the difference between the value of the variable for
the home team and its value for the visiting team. The explanatory variables used
in this equation are game statistics like yards rushed, proportion of passing plays
attempted to total offensive plays, and number of rookies for a team. The model was
found significant to explain the actual difference in points between the two teams.
This model was used to predict the actual difference in points, ($DPP_{i,t}$), and several
betting methods were determined.[3] They would bet on a game if $|DPP_{i,t} - PS_{i,t}| \geq \lambda$
where λ was 0.5, 1, 2, and 3 in different scenarios. The team to bet on is determined
by the sign of $(DPP_{i,t} - PS_{i,t})$: if this expression is positive, the gamble is made on
the home team; if negative, the gamble is made on the visiting team. The bet is won
if the team gambled on beats the spread. This requires that the sign of $(DP_{i,t} - PS_{i,t})$
coincide with the sign of $(DPP_{i,t} - PS_{i,t})$.
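The filter rule just described can be written out directly. This is a minimal sketch assuming that all three quantities are measured as home minus visitor; the function names are hypothetical, and treating an exact tie with the spread as a push is an added assumption, since the description above does not cover that case.

```python
def choose_side(dpp, ps, lam):
    """Apply the filter |DPP - PS| >= lambda.

    dpp: predicted point difference (home minus visitor)
    ps:  closing point spread on the same home-minus-visitor scale
    Returns 'home', 'visitor', or None when no bet is placed.
    """
    edge = dpp - ps
    if abs(edge) < lam:
        return None
    return "home" if edge > 0 else "visitor"


def bet_wins(dp, ps, side):
    """True if the chosen side beat the spread, False if it did not,
    None on a push (assumption: ties with the spread are refunded)."""
    margin = dp - ps
    if margin == 0:
        return None
    return (margin > 0) == (side == "home")


# Model expects the home team to win by 7; the line says 3; filter lambda = 2.
side = choose_side(dpp=7.0, ps=3.0, lam=2.0)
print(side, bet_wins(dp=10, ps=3.0, side=side))
```

A larger λ places fewer bets but only where the model disagrees with the line more strongly.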
This model found all λ to produce a winning percentage above the required
52.40 percent. They concluded that there was a way to develop a winning strategy in
this market, but reasoned that an inefficiency did not necessarily exist. This is due
to other factors which must be considered as costs like the time and effort it takes
to form all of these analyses. They reasoned that this opportunity existed because
no one valued the gain from exploiting it more than they valued the other ways in
which they could spend their time.
[2] Gambling on an NFL game is carried out on the ‘11 for 10’ rule: the gambler must lay out $11
for every $10 he or she wishes to win. The percentage of winning bets ($WP$) necessary to break
even, 52.40 percent, is obtained by setting the expected value of the gamble,
$WP(10) + (1 - WP)(-11)$, equal to zero.
[3] The explanatory variables were produced each week based on statistics from week 1 to week
$(t - 1)$.
There are two main criticisms of this initial study. First, the authors chose to
conduct it only over one season, 1983. This means that the inefficiencies they
found may have been present only in that season, and the inclusion of other seasons
might have eliminated them. Second, they did not test their model on any other
seasons. It was shown by Sauer
et al. that when their strategy was applied to other seasons, games outside of the
sample, the strategy was not consistently profitable.
In response to these criticisms, Sauer et al. (1988) repeated Zuber et al.’s
study making refinements so that the results would be more accurate. To start, they
increased the sample size from 14 to 224 to improve the accuracy of the weak form
test, Zuber et al.’s first equation.[4,5] With this new approach, Sauer et al. arrived
at the same result as the previous paper of not being able to reject the null when
testing for weak form efficiency. However, unlike Zuber et al. previously, they were
able to reject the null on the extreme alternative test where α = β = 0.
Sauer et al. go on to take issue with Zuber et al.’s second test which used
past game statistics to predict the point spread. They argued that the game statistics
would be widely known by the bookies and the bettors and would be included in the
point spread.[6] This was tested using a modified version of Zuber et al.’s second
equation
$$DP_{i,t} - PS_{i,t} = B' \cdot (\hat{X}^{h}_{i,t} - \hat{X}^{v}_{i,t}) + \varepsilon_{i,t} \qquad (2.1)$$
where
$$\hat{X}^{h}_{i,t} = \frac{1}{t-1} \cdot \sum_{s=1}^{t-1} X^{h}_{s} \qquad (2.2)$$
and similarly defined for $\hat{X}^{v}_{i,t}$. This allows the authors to regress the Zuber et al.
variables against both the actual difference in points as well as the betting point
spread, with the stipulation that the coefficient on the betting point spread is equal to
1.0. Using this model Sauer et al. find that there is virtually no information added
to the existing betting line by the Zuber et al. variables. Sauer et al. recreate the
Zuber et al. model for the 1984 season and find that for all values of λ the strategy would
end up losing money.[7] Sauer et al. conclude by stating they have found no evidence
to support the previous claim of inefficiency and they believe the market for NFL
betting is efficient.
[4] Zuber et al. broke up the season into 16 samples of 14 games each, making their n = 14 whereas
Sauer et al. included all the games of the season giving them n = 224.
[5] It has been shown that the variance of a least-squares estimator is inversely related to sample
size, so the smaller sample size for Zuber et al. increased the likelihood that they would fail to reject the
null hypothesis.
[6] They did not contend that Zuber et al.’s strategy would not have made money in certain seasons,
but they argued that it does not represent a market inefficiency.
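Equation (2.2) replaces each game statistic with its average over the weeks already played, so that only information available before kickoff enters the regression. A minimal sketch of that construction for a single team and a single statistic follows; the function name and the sample yardage figures are made up for illustration.

```python
def prior_week_averages(weekly_values):
    """For week t, return the mean of the values from weeks 1..t-1.

    weekly_values[s - 1] holds the statistic observed in week s.
    Week 1 has no history, so its entry is None.
    """
    averages = []
    running_total = 0.0
    for week, value in enumerate(weekly_values, start=1):
        if week == 1:
            averages.append(None)
        else:
            averages.append(running_total / (week - 1))
        # Add the current week only after its own average is recorded,
        # so week t never sees its own outcome.
        running_total += value
    return averages


rushing_yards = [100, 120, 80]  # hypothetical weeks 1 to 3
print(prior_week_averages(rushing_yards))
```

The same function would be applied to the home and visiting team separately before taking the difference that enters equation (2.1).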
2.2 Binary Models
Other authors such as Gray and Gray (1997), Dare and Holland (2004), and Borghesi
(2007) have continued the work on the subject of NFL gambling market efficiency.
However these authors began to introduce the binary (probit) model. Gray and Gray
looked at long-term biases in the market. They were the first researchers to use a
probit model in their analysis. They replaced the left side of the model with a binary
variable to account for the fact that bettors do not care by what magnitude they win
their bets, only if the bet is won. They examine both in-sample as well as out-of-
sample performance of multiple betting strategies. As with other studies, Gray and
Gray’s in-sample strategies perform very well. They devise multiple simple strategies
generating positive returns. For instance, the simple strategy of betting on all home
underdogs achieves an accuracy of nearly 57%, around 4 percentage points greater
than the break even point. This specification was statistically significant for their
sample but they note it has begun to dissipate over recent years. They noticed that over
the last 11 seasons of their sample (1984-1994) only three seasons generated accuracy
greater than the break even point.
Gray and Gray further use their probit model out-of-sample. In these tests,
they set up two scenarios: (1) bet on games where the model predicts a team has a
greater than 50% chance to win the bet and (2) where the model predicts a team has
a greater than 57.5% chance to win the bet. Under these specifications the authors
generate out-of-sample returns of 6.93 percent and 16.67 percent, respectively. Other
probit strategies they developed which generate positive in-sample returns did not
generate those same positive returns when they were applied outside of the sample.
They go on to declare that some of the returns generated with various probit samples
were negative (though not significant) and that not all probit filters can generate
positive returns. They use this fact to give further weight to their claim that the
inefficiencies they have noted are dissipating over time and will soon be gone.
[7] The losses were as much as $2,920 for λ = 0.5.
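The two probability filters can be illustrated with a short sketch that selects bets by predicted probability and computes the return per dollar risked at ‘11 for 10’ pricing. The function name, the toy inputs, and the return definition (profit divided by total amount risked, pushes ignored) are assumptions made for illustration; they are not taken from Gray and Gray’s paper.

```python
def filtered_return(probs, covered, threshold, risk=11.0, win=10.0):
    """Bet only when the model probability exceeds `threshold`.

    probs[i]:   model probability that the chosen side covers in game i
    covered[i]: True if that side actually covered
    Returns (number of bets, profit per dollar risked), or (0, None)
    when the filter selects no games.
    """
    outcomes = [c for p, c in zip(probs, covered) if p > threshold]
    if not outcomes:
        return 0, None
    profit = sum(win if c else -risk for c in outcomes)
    return len(outcomes), profit / (risk * len(outcomes))


# Hypothetical predictions and results for four games.
probs = [0.52, 0.60, 0.58, 0.49]
covered = [True, True, False, True]

for threshold in (0.50, 0.575):
    n_bets, ret = filtered_return(probs, covered, threshold)
    print(threshold, n_bets, ret)
```

Raising the threshold trades a smaller number of bets for a higher required confidence, which is why returns under the stricter filter rest on fewer games.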
They came to the conclusion that their models indicated an overreaction by
bettors to a team’s recent performance, ignoring the long-term success of the team.
They proposed a strategy of identifying teams that were performing well over the
course of the season but poorly in recent weeks and betting on them or, conversely,
betting against teams that are on a hot streak but are poor performers overall. This
is essentially the contrarian strategy found in financial markets. At the end of their
paper, Gray and Gray suggest that the probit model should be used for all further
research (instead of the widely used OLS model) and that this model should be tested
using other variables such as weather and performance variables.
Dare and Holland (2004) worked to refine and consolidate the work done by
Sauer et al. and Zuber et al. They took the most mathematically based approach
of any researcher to date and analyzed the regression models used by Dare and
McDonald (1996) and Gray and Gray (1997). They point out that characteristics
used by Gray and Gray such as whether a team is favored, underdog, home, or visiting
are correlated and show that the methods used by Gray and Gray would likely lead to
biased estimates and false rejection of the null hypothesis. They note that Dare and
McDonald noticed this issue as well and accounted for it by imposing restrictions on
the estimates that account for the correlation. This restricted the home team bias to
be the opposite of the visiting team bias and the favorite team bias to be the opposite
of the underdog bias. Dare and McDonald, using this model, found minimal to no
evidence of inefficiency in the market.
Dare and Holland criticize the Dare and McDonald specification on the point
that venue and favorite are deemed to be unrelated in the Dare and McDonald model.
On the contrary, Dare and Holland find that a home team is almost twice as likely to
be a betting favorite as a visiting team. They then mathematically work through
the Dare and McDonald equation so they can compare derivatives of the coefficients.
They find the Dare and McDonald model to be overly restrictive and therefore not to
allow for the proper pricing of games.[8] They then work through the model backwards,
beginning with the misspecified portion of the Dare and McDonald model and adding in
Gray and Gray’s binary process to come up with a model that correctly specifies all
of the biases in the market. This model was subsequently used by Borghesi (2007)
to examine the persistence of the late season bias.
Borghesi published multiple papers looking at specific situations that he
believed might represent an inefficiency. The first of those papers identified an
inefficiency in the betting market with regard to temperature. He found that game day
temperature has a great effect on the outcome of the game and is not represented in
the betting line. In another paper he identifies a bias in the final few weeks of each
season. Specifically, the home-underdog effect is not properly accounted for and he
shows that a profitable strategy can be produced from exploiting this bias.[9] This
paper also tests for overall market efficiency. However, he uses a different equation
to model that efficiency. Using Gray and Gray’s idea that bettors have no interest
in the actual difference in points, only if they win or lose the bet, Borghesi used the
equation proposed by Dare and Holland (2004) where $W_i$ is 1 if the favorite covers
and 0 otherwise, $HF_i = 1$ if the home team is the favorite; $HF_i = 0$ otherwise,
$VF_i = 1$ if the visiting team is the favorite; $VF_i = 0$ otherwise and $CL_i$ is the
absolute value of the closing line. This formula allows for a discrete analysis to take
place. He then augments this model to change $W_i$ to $DP_i$, a variable that reflects the
outcome. This shift establishes an OLS model that takes the magnitude of a bettor’s
win into account. While the OLS model had been discredited by previous studies
(most notably Gray and Gray (1997)), Borghesi included it along with the binary
model to back up Gray and Gray’s assertion that it is flawed.
[8] Specifically, Dare and Holland find the Dare and McDonald model assumes that there are as
many home underdogs as home favorites. This is an assumption that is almost never
true and thus requires a reworking of the model.
[9] The home-underdog effect is when home-underdogs are generally undervalued by the betting
market, so it becomes profitable to wager on their side of the spread.
He also looks at time variation in the model. He examines three separate
variables relating to time: (1) the possibility that market participants might need
more than the one week between games to properly process all the information
generated from the week of games, (2) whether bettors value different characteristics
depending on the period of the season,[10] and (3) whether there is a momentum effect
present in the form of over or under-valuing recent wins and losses, recent offensive
and defensive performance, or any other performance measure.
The findings are quite interesting. He finds that home teams strongly
outperform the betting line during the last four weeks of the season, significant at the
1 percent level. In the playoffs, the mis-pricing is even stronger with home teams
favored to win by 5.75 points winning by an average of 8.60 points over the 20 season
sample. This means late in the season bettors routinely place too many bets on
away teams. The author was unable to find any evidence of mis-pricing of underdogs
overall, but he did discover mis-pricing in the smaller subgroup of home-underdogs.
The spread on these games favors the away team by 4.65 points on average and the
home underdogs lose by an average of only 3.13 points, a significant result at the 1
percent level. In the playoffs this effect is even greater with home underdogs winning
outright on average. The author also shows that the effect is entirely coming from
the last four weeks of the season and has been growing over the 20 year sample, not
slowing as bettors became more aware of the phenomenon.
He reasons that the presence of the late season bias is because of the weather
for the home team.[11] The correlation between weather and outcome is strong in the
last four weeks (when it is cold in the northern cities) in games involving home underdogs from
cold weather cities like Chicago, Buffalo, and New York among others. The author
reasons that this may also be the result of bettors who are down money for the season
overall choosing to alter their strategies from rational ones to irrational ones in the
hope that a change will improve their fortunes. Another reason the author
presents is that bettors might become more risk tolerant as the season progresses,
resulting in irrational movement of the spread. The author is unable to test these
hypotheses with his data, however. Instead, he reasons in support of his claim saying
that as the NFL season progresses it gets increasing media coverage resulting in
more casual bettors. This swings the proportion from the rational, informed bettors
dominating the market to the irrational, casual bettors having a significant influence
on the market. Borghesi finishes his paper by devising regression models to exploit the
aforementioned inefficiencies. Using 1-month, 1-year, and 5-year Base Binary Models
he was able to pinpoint a short-term bias (using the 1-month model specifically) to
generate a late-season success rate of 53.25 percent.
[10] His financial example is small cap vs. large cap in the 1980s and the football example is putting
less value in performance measures earlier in the season because teams are still trying to figure out
how good they are.
[11] Weather has been a factor in almost every study done on this topic; however, Borghesi was the
first to look at it from a time oriented perspective.
2.3 Other variables
Over the last twenty years there have been many papers written on the topic of NFL
gambling market efficiency. Most of the recent papers have focused on taking the
models that Zuber et al., Sauer et al. or Dare and Holland derived and adding new
information or performance variables. Some of these studies find inefficiencies that
can be exploited by a bettor as Borghesi does with the late season home underdog;
however, most find these new variables provide no new information. In 2006 Boulier,
Stekler and Amundson published a study on the efficiency of the NFL betting market
modeled on the Zuber et al. study from 1985. In that study, Zuber et al. used the
weak form efficiency test combined with performance variables like yards of offense to
create a predictive model for a game’s outcome. Boulier, Stekler and Amundson used
the same model and instead of on-field performance measures used off-field variables.
They examined the explanatory power of the New York Times NFL Power Rankings,
whether the home team plays in a complex with a dome, and whether the home team
has artificial turf or not.[12] The authors looked at a modern set of seasons, 1994-2000.
It can be assumed that they chose to start in 1994 due to the establishment of the salary
cap and free agency in 1993, though that is never explicitly stated in the paper.[13] They
found that for these seasons the test for weak form efficiency could not reject the
null, meaning they could not declare it an inefficient market with that test. Their
tests using their new variables also showed very little evidence of mis-pricings in the
market. They found the dummy variable testing for differences in the opponent's
home playing surface to be significant in-sample; however, when tested out-of-sample
it lost significance. They concluded by stating that these new variables have been
properly factored into the prices of the market and do not represent any bias that
can be exploited by a gambler.
Another study by Borghesi, published in 2006, investigated the effect of the weather
and whether it gives home teams an advantage that is not reflected in
the spread. The type of weather he analyzed in this paper was game day temperature,
different from the general climate distinctions he uses in his later paper on late season
biases. Using data from 1981 to 2000, he is able to locate another mis-pricing in
the NFL gambling market using this analysis. He first shows that forecast errors
in the NFL point spread betting market are biased and provides a link from that
error to the game day temperature conditions. He finds that not only absolute
temperature but also relative temperature acclimatization affects the performance of
players. He finds that the worst mis-pricing occurs when home teams play in the
coldest temperatures.[14] He then adjusts his model to account for the home underdog
bias, and still finds a significant mis-pricing in these cold temperature games. This
finding lends credibility to the argument that the NFL gambling market experienced
a mis-pricing in the later part of the 20th century; however, most of the studies that
discover this mis-pricing indicate that the effect is dissipating over time. This leads us
to believe that the mis-pricings have been resolved by the seasons our data is from,
2006-2009.
[12] They believed that teams with a dome play a unique style that would put them at an advantage
against non-dome teams and that teams with artificial turf also played a unique style that would
put them at an advantage against natural grass teams.
[13] The establishment of the hard cap and free agency significantly altered the way the game was
played and managed and is oddly never brought up in any of these studies. Our sample comes well
after the effects of these two policies had been realized, an advantage that other samples do not
have.
[14] He uses this finding again in his paper on late season bias.
Chapter 3
Analysis
3.1 Data
The analysis presented in this paper is predicated upon the presence of statistical
anomalies in the outcomes of NFL games. To locate these anomalies we compile
a time-series data set. This data set consists of all non-exhibition NFL contests
between the start of the 2006 season and the end of the 2009 season.[1] The data
include home/away team, scores, closing lines (point spreads), and statistics (time
of possession, yards, etc.). Seven games have been removed from the sample as
they were played at neutral sites, so home field cannot be considered an advantage
for either team. Three of these were games played in London: New England Patriots
versus Tampa Bay Buccaneers in week seven, 2009; San Diego Chargers versus New
Orleans Saints in week eight, 2008; New York Giants versus Miami Dolphins in week
eight, 2007. Additionally, the Super Bowl, the NFL championship game, is played at
a neutral site each year, so those four games were left out. During the sample, the Buffalo
Bills played two games in Toronto. These contests were left in the sample because
Toronto is close to Buffalo and the Bills have tremendous fan support in the area.
Because of this they still benefited from any home field advantage they would have
had in Buffalo, if not greater support, given that the Toronto fans only got to see the
team once a year and were more excited and enthusiastic than the fans in
Buffalo. This results in 255 regular season and 10 playoff games examined for all
seasons except for 2006, in which we examine 256 regular season games and 10 playoff
games, yielding an entire sample of 1061 games.
Table 3.1 displays a breakdown of the point spreads in our sample. The
regular season has an average spread of 5.98 points. Interestingly, this is slightly
greater than the average playoff spread, 5.70. Common thinking is that the playoff
format puts a weaker team on the road against a higher seeded team, which would
cause the predicted score to heavily favor the stronger, home team. Underdogs are
expected to lose by 6.01 points on average while home underdogs are expected to
lose by only 4.99 points.[2] Data was gathered from NFL.com (scores and statistics)
and FootballLocks.com, a betting strategies website (point spreads).
Table 3.1: NFL point-spread summary statistics
                        Closing line
Category         N      Median  Mean
Regular season   1021   5.00    5.98
Playoffs         40     4.25    5.70
Underdogs        1053   5.00    6.01
Home underdogs   353    3.50    4.99
Pick-'ems        8      0.00    0.00
Pushes           25     3.00    4.64
All Games        1061   5.00    5.97
Notes: This table contains summary statistics describing the closing spreads of all
NFL games played between 2006 and 2009. Regular season excludes games in the
pre-season, playoffs, and three regular season games played in London (one each year
beginning in 2007). Playoffs include only games in the post-season, excluding the
Super Bowl each year as well. Underdogs include only games in which there is a
nonzero point spread. Home underdogs include only games in which the home team
is assigned a positive point spread. Pick-'ems include only games in which there is a
closing point spread of zero. Pushes include only games in which the final outcome
exactly matches the closing point spread. Closing line describes the predicted
difference in the number of points scored by the two teams in each category (absolute
value of home minus away score).
[1] The start of the NFL season is in early September and it ends with the Super Bowl, normally in
the first week of February of the next year. Therefore the end of the 2009 season was actually in
early 2010. For simplicity we refer to each season by the year in which it started, ignoring the
overlap into the early stages of the next year.
[2] Not shown in the table are the most common point spreads, which are three points (N=115)
and seven points (N=43). Predictably, the most common outcomes of games are also three points
(N=94) and seven points (N=58).
3.2 Methodology
We start our examination with an analysis to determine if there is a statistically
significant difference between the point spread and the actual outcome of the games. In the
past (Zuber et al., 1985; Sauer et al., 1988) this has been tested using the equation
\[ DP_i = \alpha + \beta PS_i + \epsilon_i \tag{3.1} \]
where $DP_i$ is the actual difference in points between the teams playing in the $i$th
game, $PS_i$ is the final point spread, and $\epsilon_i$ is the error term. If expectations of
efficiency hold, $\alpha = 0$ and $\beta = 1$. However, using this equation proves problematic
for multiple reasons. First, this model only measures biases that are present over an
entire sample. For example, if home teams cover by three points during the first half
of a sample and away teams cover by three points during the second half of a sample,
this test would show $\alpha = 0$ and $\beta = 1$, ignoring the significant trends present within
the sample and not identifying the strong biases. Second, the way the data is defined
can restrict the power of the model. Since the dependent variable, $DP_i$, is only based
on one factor, measurement with respect to the underdog will only identify underdog
bias and measurement with respect to the home team will only measure home team
bias. A new model is required that takes these facts into account.
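
The first problem can be made concrete with a small numerical sketch. The spreads and outcomes below are hypothetical values chosen for illustration, not observations from our sample, and `ols_simple` is simply our own name for a textbook least squares fit of equation 3.1. The same set of spreads appears in both halves of the toy sample, with outcomes beating the spread by three points in the first half and falling short by three points in the second.

```python
from statistics import mean


def ols_simple(x, y):
    """Return (alpha, beta) from a least squares fit of y = alpha + beta * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    return alpha, beta


# Hypothetical closing spreads (PS), repeated in both halves of the sample.
spreads = [-7.0, -3.0, 3.0, 7.0]
ps = spreads + spreads

# Hypothetical outcomes (DP): +3 relative to the spread in the first half,
# -3 relative to the spread in the second half.
dp = [s + 3.0 for s in spreads] + [s - 3.0 for s in spreads]

alpha, beta = ols_simple(ps, dp)

# Average forecast error within each half of the toy sample.
first_half_error = mean(d - p for d, p in zip(dp[:4], ps[:4]))
second_half_error = mean(d - p for d, p in zip(dp[4:], ps[4:]))

print(alpha, beta, first_half_error, second_half_error)
```

Each half carries a three point forecast error, yet the fitted values are $\alpha = 0$ and $\beta = 1$, so the full-sample test would flag neither bias.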
Home teams have an inherent advantage over their opponents. Over the four
sampled seasons 66.73% of favored teams were playing at home. Thus, we need a
model that will take this into account, something that equation 3.1 does not. To
differentiate between the two factors (home and underdog effects), isolate them and
determine their effect, Gray and Gray (1997) propose the model
\[ Y_i = \alpha + \beta_1 HOME_i + \beta_2 FAV_i + \epsilon_i \tag{3.2} \]
This model fails to allow for other, unexpected relationships, however, and it
needs to contain proper restrictions on estimates. For instance, the model needs to
be restricted so the home effect is the negative of the away effect. Dare and Holland
(2004) developed a specification that correctly isolates the venue and spread explanatory
variables:
\[ D_i = a_{HF} HF_i + a_{VF} VF_i + (\beta - 1) CL_i + \epsilon_i \tag{3.3} \]
In this case, $D_i$ is the outcome (the difference in points scored between the favorite
and the underdog) minus the closing line; $HF_i = 1$ if the home team is the favorite and
$HF_i = 0$ otherwise; $VF_i = 1$ if the visiting team is the favorite and $VF_i = 0$ otherwise;
and $CL_i$ is the absolute value of the closing line. This is the model we will use and
it will be referred to as the 'Base Model.'
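
As a minimal sketch of how the Base Model variables could be built from a single game record, consider the function below. The function name and its arguments are hypothetical and are not part of our data files; we assume the spread is quoted relative to the home team, with a negative value meaning the home team is the favorite, and we assume pick-'em games (a spread of zero) are skipped because no favorite is defined.

```python
def base_model_row(home_points, away_points, home_spread):
    """Return (D, HF, VF, CL) for one game, or None for a pick-'em.

    home_spread is the closing line relative to the home team
    (negative means the home team is the favorite).
    """
    if home_spread == 0:
        return None  # assumption: no favorite, so the game is skipped

    home_favored = home_spread < 0
    cl = abs(home_spread)  # CL: absolute value of the closing line

    # Margin of victory measured from the favorite's point of view.
    if home_favored:
        margin = home_points - away_points
    else:
        margin = away_points - home_points

    d = margin - cl  # D: outcome minus closing line
    hf = 1 if home_favored else 0
    vf = 1 - hf
    return d, hf, vf, cl


# Home team favored by 3 and wins 24-17: the favorite beats the line by 4.
print(base_model_row(24, 17, -3.0))
# Visiting team favored by 6.5 and wins 20-10: the favorite beats the line by 3.5.
print(base_model_row(10, 20, 6.5))
```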
However, this model would seem to indicate that the magnitude by which a
bet is won is of import to the bettor. We have seen from Gray and Gray (1997) that
it is unclear how much more information is provided by the magnitude of a win. If
a bettor only views their bet in terms of success or failure, the magnitude of that
success or failure is irrelevant. This leads to a binary state, win or loss. Equation 3.3
can be augmented to reflect this binary view.[3] The model is the same as equation
3.3; however, the left-hand-side variable is now $W_i$, which is 1 if the favorite covers and
0 otherwise:
\[ W_i = a_{HF} HF_i + a_{VF} VF_i + (\beta - 1) CL_i + \epsilon_i \tag{3.4} \]
If these two equations produce substantially different results it would indicate
that the magnitude by which a bet wins (loses) is of importance to bettors.
However, we begin our analysis assuming that bettors are principally concerned with
the success or failure of a bet. Bettors learn to price the underlying asset (the game
in this case) based on repeated success and failure against the spread. Therefore, we
will use a discrete choice regression as the basis for our analysis.
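
The binary variable in equation 3.4 can be sketched as follows. The helper names and the sample rows are hypothetical, and we assume pushes (where the outcome equals the closing line exactly) are dropped rather than counted as losses. Ignoring the closing line term, the share of favorites that cover within each venue group is what a no-intercept linear fit on the two dummies alone would return as its coefficients, so the group rates are a quick first look at the venue effect.

```python
def favorite_covers(d):
    """W_i: 1 if the favorite beats the closing line (D_i > 0), otherwise 0."""
    return 1 if d > 0 else 0


def cover_rates(rows):
    """Share of favorites covering, split by home favorites (HF) and visiting favorites (VF).

    rows is an iterable of (d, hf) pairs; pushes (d == 0) are dropped by assumption.
    """
    wins = {1: 0, 0: 0}
    counts = {1: 0, 0: 0}
    for d, hf in rows:
        if d == 0:
            continue  # push: neither a win nor a loss
        counts[hf] += 1
        wins[hf] += favorite_covers(d)
    return {("HF" if hf else "VF"): wins[hf] / counts[hf] for hf in (1, 0) if counts[hf]}


# Hypothetical (D_i, HF_i) pairs for illustration only.
sample = [(4.0, 1), (-2.5, 1), (0.0, 1), (3.5, 0), (-1.0, 0), (-6.0, 0), (-2.0, 0)]
rates = cover_rates(sample)
print(rates)
```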
[3] Indeed, Dare and Holland (2004) included this second model in their paper as well.
3.3 Time variation
The last aspect of valuation that we consider for the data is time variation. It is
very important to account for time variation using subsamples of various lengths for
at least three reasons. First, it is quite possible that market participants do not
immediately process all of the information necessary for the formation of a rational
price. In equities markets, prices are continuously evolving and changing, which
causes problems when judging how long investors need to properly account for new
information. However, the NFL betting market is much more discrete. In this market
almost all relevant information is released six days before final prices are set.[4] If
bettors can rationally process all of this information in that time period, the closing
point spread should be an unbiased measure including all available information. On
the other hand, if bettors cannot react in six days and multiple betting periods pass
before the new information is included in the point spread, inefficiencies can exist.
We examine this rate of information processing by observing how many weeks are
necessary for all prior information to be incorporated into the point spread.
Second, investors may value certain characteristics more than others depending
on the time period; for example, small cap vs. large cap in the 1980s. In sports
betting this phenomenon presents itself largely in bettors placing more emphasis on
venue (home vs. away) early in the season and factoring in on-field performance
increasingly as the season progresses. The reason for this is that teams go through many
changes each off-season, like new players, coaches and tactics, and the true strength
of a team is less apparent during the early stages of the season when all of these
changes are still taking effect. We analyze whether bettors shift their perceptions of
the value of certain information over time.
Lastly, as in equities markets, where price momentum and reversals can
be very important to investors, there has been a suggestion that streaks are important
to bettors as well (Camerer, 1989). Streaks of this nature involve the mis-valuation of
recent wins and losses or other factors like recent offensive or defensive performance.
While it is unclear if perceived streaks do exist, the value they would have to market
participants depends on the size and length of the streaks. Any shift from rational
values has to be first identified before it can be exploited by an investor and must
be a great enough mis-pricing to recoup transaction costs for the investor. We will
test for a range of time periods because the length of these streaks is unknown before
they occur.
[4] Other information relating to the outcome, such as injuries to players, suspensions, etc., that
might happen mid-week is very limited and we will assume for the sake of simplicity that it has no
overall effect on the process we are examining here.
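
Testing over a range of time periods amounts to re-estimating a model on trailing subsamples of different lengths. The sketch below shows one way such subsamples could be selected. The record layout (a running `period` counter that increases by one each week across the sample) and the window lengths of 4 and 17 weeks, standing in for roughly one month and one regular season, are our own illustrative assumptions.

```python
def trailing_window(games, target_period, length):
    """Return the games played in the `length` periods immediately before target_period."""
    start = target_period - length
    return [g for g in games if start <= g["period"] < target_period]


# Hypothetical records: one game per weekly period, with a 0/1 cover flag.
games = [{"period": p, "favorite_covered": p % 2} for p in range(1, 31)]

# Subsamples that would be available when pricing the games of period 20.
for length in (4, 17):
    window = trailing_window(games, 20, length)
    print(length, [g["period"] for g in window])
```

Only games strictly before the target period are included, so a model fitted on the window never sees the outcomes it is asked to predict.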
Chapter 4
Results
4.1 Statistical analysis
The crucial aspect of our analysis is the difference between point spreads and actual
outcomes. We assume the distribution of these differences is nonnormal, so we use a
Wilcoxon signed rank test to test their significance. The summary of this statistical
analysis can be found in table 4.1. Overall, home teams are predicted to win by an
average of 2.64 points and actually win by 2.24 points, with this difference falling short
of significance. This table is in stark contrast to the findings of Borghesi
(2007). In that study he found statistically significant differences as the season wore
on, with weeks 15, 16, and 17 significant at the 10%, 5%, and 2% level, respectively.
Our data shows only two weeks, four and ten, where the data would suggest a
mis-pricing exists. In the later weeks our more recent data shows mostly a negative
median difference, with a median of -0.25 for weeks 14-17, but it does not come close
to significantly differing from zero. In fact, the mean and the median have different
signs, indicating that the true average of the data is around zero.
Our data shows the greatest indication of mis-pricing in the early weeks, the
opposite of Borghesi's findings. Week 4 has a median difference of 6.50 points, a
number significant at the 2% level; week 10 has a median difference of -5.00 points,
a number significant at the 5% level. These two weeks have very little in common,
however, and do not appear to be representative of a large mis-pricing given that the
differences not only are six weeks apart, but also have differing signs. In this data,
as the season progresses the market seems to get increasingly accurate. This could
be a symptom of the time variation idea that a team's performance is uncertain at
the beginning of the season due to the large number of changes that occur in the
off-season. As the season progresses and a team's true ability is revealed, the market
becomes more accurate at pricing the assets.
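
For readers unfamiliar with the signed rank test, the following is a minimal sketch of the calculation using the large-sample normal approximation. It is an illustration under our own assumptions rather than the exact routine behind the reported p-values: zero differences are dropped, tied absolute values share their average rank, no continuity correction is applied, and the differences in the example are hypothetical.

```python
import math


def wilcoxon_signed_rank(diffs):
    """Two-sided Wilcoxon signed rank test via the normal approximation.

    Returns (w_plus, w_minus, z, p_value). Zero differences are dropped.
    """
    nonzero = [d for d in diffs if d != 0]
    n = len(nonzero)
    if n == 0:
        raise ValueError("all differences are zero")

    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(nonzero[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(nonzero[order[j + 1]]) == abs(nonzero[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, nonzero) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, nonzero) if d < 0)

    # Mean and tie-corrected variance of W+ under the null of a zero median.
    mean_w = n * (n + 1) / 4
    var_w = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean_w) / math.sqrt(var_w)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return w_plus, w_minus, z, p_value


# Hypothetical differences between closing spread and outcome for one week.
example = [6.5, -2.0, 3.0, 0.0, -1.5, 4.0, 7.5, -0.5]
w_plus, w_minus, z, p_value = wilcoxon_signed_rank(example)
print(w_plus, w_minus)
```

With only a handful of games the normal approximation is rough, and an exact reference distribution would be preferable for the smaller weekly subsamples.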
Table 4.1: NFL home team summary statistics by week
Week         Games  Mean PS  Mean outcome  Median outcome  Mean difference  Median difference  p-value
1            64     -2.39    0.23          -2.00           -2.63            -2.25              0.133
2            63     -2.81    -1.90         -3.00           -0.90            -1.00              0.632
3            62     -2.87    -2.18         -3.00           -0.69            0.00               0.795
4            55     -1.75    -5.98         -7.00           4.24             6.50               0.013
5            56     -3.10    -4.86         -3.00           1.76             0.75               0.428
6            54     -2.80    -4.06         -2.50           1.26             0.25               0.856
7            53     -0.53    0.00          -3.00           -0.53            0.00               0.833
8            52     -4.63    -2.65         -4.00           -1.97            -3.50              0.247
9            55     -2.75    -1.47         -3.00           -1.27            -1.00              0.407
10           59     -2.95    -0.27         2.00            -2.68            -5.00              0.043
11           64     -1.91    -0.19         -2.50           -1.73            -1.25              0.333
12           64     -3.44    -2.81         -3.00           -0.63            -1.00              0.766
13           64     -1.22    0.27          0.00            -1.48            0.75               0.579
14           64     -2.48    -5.50         -6.50           3.02             4.50               0.093
15           64     -2.21    -1.22         -3.00           -0.99            -1.00              0.530
16           64     -2.82    -0.81         -0.00           -2.01            -2.00              0.259
17           64     -2.99    -3.86         -3.50           0.87             0.50               0.734
Weeks 1-13   765    -2.54    -1.92         -3.00           -0.62            -0.50              0.148
Weeks 14-17  256    -2.63    -2.85         -3.00           0.22             -0.25              0.927
Playoffs     40     -4.78    -4.43         -3.50           -0.35            -1.00              0.742
All games    1061   -2.64    -2.24         -3.00           -0.41            -0.50              0.216
Notes: This table contains summary statistics for all NFL home games played between 2006
and 2009. Games that have no home team (Super Bowls) are omitted. Week refers to regular
season games only. Playoff games are summarized separately near the bottom of the table.
Mean PS is the average value of the closing line (point spread) relative to the home team
(negative indicates that the home team is the favorite). Mean (median) outcome is the mean
(median) difference in points scored between the away team and the home team (negative
indicates that the home team wins). Mean (median) difference is the mean (median) difference
between the closing spread and the actual outcome (negative indicates that the home team
covers). p-value indicates the likelihood that median difference is significantly different
from zero using a signed rank test.
The results for the subgroup of home underdogs appear in Table 4.2. Within
this subgroup, teams are predicted to lose by an average of 5.11 points and actually
lose by 5.54 points. There are no weeks with a statistically significant difference from
zero and only two weeks, four and twelve, even approach a 10% significance level.
Again, this differs greatly from the findings of Borghesi (2007). In his research, he
finds a large mis-pricing of home underdog games with statistically significant values
for all games, playoffs, the aggregate late season weeks, and individual weeks 15, 16,
and 17. In particular, he finds that home underdogs in the playoffs not only lose
by less than the spread, but win outright by almost nine points.[1] That is a very
striking contrast with our data, and according to this simple analysis the mis-pricing
Borghesi found has been corrected by the market. We will engage in a more focused
analysis of this later, but this data does not show that bettors are unable to rationally
value late season data. It would appear, in fact, that they process information better
late in the season than early in the season, when the true skill of a team is still
relatively unknown (weeks 14-17 have a higher p-value than weeks 1-13 in both tables).
In their research on the topic Gray and Gray (1997) found a home underdog
bias present from 1976 to 1994; however, they found that this bias was getting smaller
as time progressed. Our data indicates that the bias has hit a plateau
and leveled off. Looking at table 4.3, the home underdog is correctly priced in our
sample, with a bet winning around 50% of the time. For late season home underdogs,
however, the results are more interesting. Over the four seasons examined a bet
on the late season home underdog won 54.03% of the time. This is high enough to suggest a
strategy could be implemented to make money from this mis-pricing, though the lack
of significance seen in table 4.1 and table 4.2 might weaken this ability. Mis-pricing
within a season can occur for many reasons and can lead to irrational betting.
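
Whether a 54.03% success rate is "high enough" depends on the price of a bet, which a short calculation can illustrate. The sketch below assumes the conventional arrangement in which a bettor risks 11 units to win 10; this price is our assumption and is not taken from our data. The count of 67 wins in 124 bets is inferred from the percentages in table 4.3, assuming the 124 games exclude pushes. The function names are hypothetical.

```python
from math import comb


def break_even_rate(risk=11.0, win=10.0):
    """Win rate at which a bet risking `risk` to win `win` has zero expected profit."""
    return risk / (risk + win)


def expected_profit_per_unit(p, risk=11.0, win=10.0):
    """Expected profit per unit risked when the bet wins with probability p."""
    return (p * win - (1 - p) * risk) / risk


def binomial_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


wins, bets = 67, 124  # late season home underdogs, inferred from table 4.3
rate = wins / bets

print(break_even_rate())               # win rate needed to break even at 11-to-10
print(expected_profit_per_unit(rate))  # expected profit per unit risked at the observed rate
print(binomial_upper_tail(wins, bets)) # chance of at least this many wins if p = 0.5
```

Under these assumptions the observed rate sits above the break-even rate of 11/21, so the strategy has positive expected profit in-sample, while the binomial tail probability indicates how easily a fair coin could produce the same record over 124 bets.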
[1] This is a striking discovery that Borghesi found in his data. He claims that it is not driven
by outliers; however, we have found no evidence of any mis-pricing in the post-season, let alone
something this extreme. If it was not driven by outliers, we would conclude it was simply an oddity
that occurred over the time period he analyzed.
Table 4.2: NFL home underdog summary statistics by week
Week         Games  Mean PS  Mean outcome  Median outcome  Mean difference  Median difference  p-value
1            22     4.11     9.18          8.00            -5.07            -5.25              0.131
2            19     4.32     4.84          4.00            -0.53            -1.00              1.000
3            17     4.88     6.71          6.00            -1.82            -2.50              0.421
4            22     4.80     0.73          -2.50           4.07             6.00               0.108
5            18     4.69     8.50          3.50            -3.81            0.25               0.586
6            17     5.53     2.82          -1.00           2.71             5.00               0.344
7            23     6.24     6.70          10.00           -0.46            -1.50              0.988
8            10     4.20     9.30          9.50            -5.10            -6.50              0.221
9            15     5.17     3.20          3.00            1.97             6.00               0.589
10           16     5.44     5.44          6.00            0.00             -0.50              0.910
11           23     5.67     9.30          4.00            -3.63            -1.00              0.254
12           20     4.65     9.75          10.50           -5.10            -5.75              0.107
13           25     5.50     6.08          3.00            -0.58            3.00               0.788
14           26     4.67     4.08          6.00            0.60             -2.75              0.914
15           25     6.08     2.88          3.00            3.20             4.50               0.284
16           20     5.88     1.90          2.00            3.98             3.00               0.204
17           21     4.88     5.38          6.00            -0.50            -3.00              0.575
Weeks 1-13   247    5.06     6.35          4.00            -1.29            -0.50              0.273
Weeks 14-17  92     5.36     3.58          4.50            1.79             1.00               0.291
Playoffs     6      3.08     2.17          -2.00           0.92             4.50               0.826
All games    345    5.11     5.54          4.00            -0.43            0.00               0.751
Notes: This table summarizes all NFL games from 2006 to 2009 in which the home team is the
underdog. Week refers to regular season games only. Playoff games are summarized separately
near the bottom of the table. Mean PS is the average value of the closing line relative to
the home team (negative indicates that the home team is the favorite). Mean (median) outcome
is the mean (median) difference in points scored between the away team and the home team
(negative indicates that the home team wins). Mean (median) difference is the mean (median)
difference between the closing spread and the actual outcome (negative indicates that the
home team covers). p-value indicates the likelihood that median difference is significantly
different from zero using a signed rank test.
Table 4.3: Persistence of biases in the NFL
              Home underdog      Late home underdog
Season        Games   Win (%)    Games   Win (%)
2006          81      58.02      25      48.00
2007          94      50.00      35      57.14
2008          87      43.68      31      58.06
2009          91      48.35      33      51.52
All Seasons   353     49.86      124     54.03
Notes: This table shows the success rate (omitting pushes) of betting on all home
underdogs during the regular season and post-season. Win is the proportion of home
underdogs that cover the spread. Home underdogs include all regular season and
playoff games (except Super Bowls) in which the home team is assigned a positive
point spread. Late home underdogs exclude games played before week 13.
One potential source of persistent seasonal biases is bettors not properly
accounting for factors which affect team performance from one point to another in a
season. The most pertinent of these that differs from the beginning to the end of
a season is the weather. People around the game (coaches, players, sports writers,
etc.) consistently argue that when a team from a mild climate like San Diego has to
play in an open air stadium in a harsh climate late in the season, such as Chicago,
the mild climate team is at a significant disadvantage. Teams based in harsh
climates regularly practice and play in this harsh weather, making them more adept at
handling the adversity that the weather presents. In an efficient market this climate
factor should be fully reflected in the closing point spread. It is possible that late
season spreads do not account for this situation, and if the late season mis-pricing
is a result of this it would indicate that bettors are not properly factoring historical
results into their analysis. If they were, they would account for the effect the weather
can have late in the season and the bias would be expected and incorporated into
the prices.
Table 4.4: Weather effects
Panel A: Cold weather advantage by season
Season             Games  Mean PS  Mean outcome  Median outcome  Mean difference  Median difference  p-value
2006               14     -6.54    -4.00         -3.00           -2.54            -4.00              0.551
2007               12     -6.75    -7.50         -8.50           0.75             0.00               0.937
2008               10     -1.60    -12.70        -9.50           11.10            6.50               0.044
2009               14     -6.57    -4.57         -6.00           -2.00            2.00               0.834
All Seasons        50     -5.61    -6.74         -7.00           1.13             1.25               0.528
Panel B: Cold weather advantage by month
Month(s)           Games  Mean PS  Mean outcome  Median outcome  Mean difference  Median difference  p-value
September          51     -3.14    -4.82         -5.00           1.69             2.50               0.347
October            48     -5.61    -8.35         -7.00           2.74             2.25               0.350
November           71     -3.50    -2.07         -3.00           -1.43            -1.00              0.358
December, January  61     -5.97    -7.70         -7.00           1.74             1.50               0.315
Notes: Panel A summarizes all NFL games from 2006 to 2009 in which the home team has a cold
weather advantage. This advantage is defined to occur when a visiting team is traveling from
a mild climate to play in Buffalo, Chicago, Cincinnati, Cleveland, Denver, Green Bay, New
England, New York, Philadelphia or Pittsburgh in week 15 or later. Mean PS is the average
value of the closing line (point spread) relative to the home team (negative indicates that
the home team is the favorite). Mean (median) outcome is the mean (median) difference in
points scored between the away team and the home team (negative indicates that the home team
wins). Mean (median) difference is the mean (median) difference between the closing spread
and the actual outcome (negative indicates that the home team covers). p-value indicates the
likelihood that mean difference is significantly different from zero using a signed rank test.
The relationship between weather and outcome in games played in weeks 15
and later is shown in table 4.4, Panel A. This subsample does not show evidence of
consistent mis-pricing in games involving visiting teams from mild climates traveling
to play games in cold climates.[2] The mean closing line indicates that home teams are
predicted to win by 5.61 points, but home teams win by 6.74 points in this sample.
The mean difference of 1.13 points is not significantly different from zero, but it is
possible that if the same mean closing line and mean outcome held over a slightly larger
sample, it would reach significance. Panel B shows that there is not a significant difference
when the games are broken down over a broader, seasonal time period. The p-values in Panel
B do not corroborate the previous findings by Borghesi, who found significance in
September, November and December/January.[3]
[2] A list of teams with their location and climate information can be found in the appendix. Cold
climate games are games that were played after week 14.
Another possible explanation for the high home underdog late season winning
percentage we found is behavioral. In the NFL point-spread betting market, as in
almost all betting markets, the bettor has a negative expected value due to the
vigorish paid to a bookie. As a season progresses and a bettor begins to lose money,
they may abandon the system that has caused them to lose that money. Bettors hope
that any change in strategy will reverse their misfortunes, which allows them to
justify the shift from rational bets to irrational bets. This phenomenon has also
been identified in horse racetrack betting (Rachlin, 1990; Ritter, 1994). Rachlin
argues that the more bettors lose, the higher their risk tolerance becomes. This
results in placing more bets on long shots: prior losses cause bettors to over-bet
on outcomes that are less likely to occur. In other systems of betting a bettor
would receive better odds for these bets, but in the point spread market this
behavior results in irrational movements of the spread. This is a theory
that we cannot test with our data. Furthermore, it is a behavior that is likely to
manifest itself randomly during the sample based on the disposition of each bettor
and thus would not be directly related to the venue of a game, the variable we are
working to explain.
While we can assume that bettors do act irrationally on occasion that fact does
not explain why there is a higher winning proportion of late season home underdogs.
The fact that the winning percentage is around 50% for the entire season indicates
3
Borghesi found the reverse of the cold weather factor happened in September. That is, the
cold climate teams were at a disadvantage when playing a mild climate team at home early in the
season.
Table 4.5: Nevada football betting
Month CFB games NFL games Total games Total bet/game
September 964 208 1172 $6,976,349
October 925 232 1157 $8,745,662
November 715 260 975 $9,145,005
December 148 256 404 $22,224,760
January, February 56 104 160 $190,447,832
Notes: This table shows the number of football games and bet volume by month from September
2006 to February 2010. The number of games is defined as the number of college football (CFB)
and National Football League games listed in Las Vegas. Because Nevada pools its CFB and NFL
betting data for accounting purposes, the value of football bets cannot be broken down into college
vs. professional games.
that there is a group of informed bettors pushing the betting line to a rational spot.
Therefore, there is something about the end of the season that draws out more
irrational, uninformed bettors who influence the market in a way that can potentially
be exploited.
As the season nears completion, media interest in the NFL increases. The
final weeks of the regular season coincide with the conclusion of the baseball World
Series and the very early stages of the NBA season, so most media outlets cover the
NFL and college football at great length over this time. As Table 4.5 shows, the
amount bet per football game increases steadily towards the end of the season,
culminating with the playoffs and Super Bowl in January and February.
4
Since the
informed bettors have limited wealth it is reasonable to assume that the new entrants
who come in late in the season drown out the effect of the rational bettors causing
arbitrage opportunities, such as the home underdog effect we have identified.
To further investigate what we have seen we develop four simple betting strategies
and test their effectiveness over the course of our sample. These results can
be seen in Table 4.6. The first two columns show the results of two common betting
strategies.
5
While neither ‘bet on all home teams’ nor ‘bet on all home underdogs’
wins often enough over the whole sample to cover the vigorish, ‘bet on all home
underdogs’ when used over the last four weeks of the regular season and the
playoffs yields a success rate of 56.52% and
4
The data in Table 4.5 have been taken from the Nevada Gaming Control Board, which keeps records
dating back to 1998. It does not differentiate between college and professional football.
5
Amoako-Adu et al. (1985) and Vergin and Scriabin (1978) show that simple strategies can be
effective.
Table 4.6: Success rates of simple betting rules in the NFL
Strategy: Home; Home underdogs; 2+ Home underdogs; 8+ Home underdogs
Week N Accuracy (%) N Accuracy (%) N Accuracy (%) N Accuracy (%)
1 63 44.44 22 40.91 22 40.91 1 100.00
2 60 45.00 20 45.00 16 56.25 2 0.00
3 59 49.15 17 47.06 17 47.06 0 N/A
4 55 67.27 22 45.00 20 70.00 6 16.67
5 54 55.56 18 47.06 16 56.25 3 66.67
6 53 50.94 17 63.64 16 62.50 3 66.67
7 51 49.02 24 50.00 22 45.45 7 14.29
8 52 40.38 11 27.27 9 22.22 1 100.00
9 55 47.27 16 68.75 15 66.67 2 100.00
10 57 36.84 16 43.75 14 42.86 4 50.00
11 62 41.94 23 34.78 21 33.33 7 71.43
12 62 45.16 23 34.78 20 25.00 3 0.00
13 64 53.13 26 61.54 20 65.00 4 75.00
14 63 55.56 26 46.15 22 45.45 6 66.67
15 62 45.16 25 60.00 25 60.00 7 100.00
16 62 40.32 20 65.00 20 65.00 4 75.00
17 62 53.23 21 57.14 20 60.00 4 75.00
Weeks 1-13 747 48.06 255 49.02 228 49.12 43 46.51
Weeks 14-17 249 48.59 92 56.52 87 57.47 21 80.95
Playoffs 40 47.50 6 66.67 6 66.67 0 N/A
All games 1036 48.17 353 51.27 321 51.71 64 57.81
Notes: This table shows the success rate of four simple betting rules for NFL games from 2006 to
2009. N is the number of games in which the simple rule criteria is met. Accuracy is the success
rate of each strategy. Games resulting in a push are excluded.
66.67%, respectively. This is more than enough to cover the vigorish and any other
transaction costs. The final columns of Table 4.6 show that bets on moderate and
extreme underdogs are even more accurate.
6
While only ‘bet on 8+ home underdogs’ wins often enough over the whole sample to
make up for the vigorish, both ‘bet on 2+ home underdogs’ and ‘bet on 8+ home
underdogs’ work very well over the late season and into the playoffs. In fact, in the
small late season sample of 21 games, ‘bet on 8+ home underdogs’ won almost 81% of
the time. These findings show that the most profitable betting strategies rely not
only on the venue and spread but also on the precise
timing of the bet.
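The N and Accuracy columns of Table 4.6 can be reproduced from game-level data with a few lines of code. The sketch below is only an illustration: the `Game` record, the `rule_accuracy` function, the sign convention for `home_spread` (positive when the home team is the underdog), and the sample games are all hypothetical and are not taken from the data set used in this paper.

```python
from dataclasses import dataclass

@dataclass
class Game:
    week: int           # week of the regular season
    home_spread: float  # points the home team receives; positive for a home underdog
    home_margin: int    # home points minus visitor points

def rule_accuracy(games, min_points=None, weeks=None):
    """Return (N, accuracy in %) for betting the home team in qualifying games.

    min_points=None bets on every home team, min_points=0 on every home
    underdog, and min_points=2 or 8 on moderate or extreme home underdogs.
    Pushes are excluded from N, as in the notes to Table 4.6.
    """
    wins = losses = 0
    for g in games:
        if weeks is not None and g.week not in weeks:
            continue
        if min_points is not None and not (
            g.home_spread > 0 and g.home_spread >= min_points
        ):
            continue
        result = g.home_margin + g.home_spread  # > 0 means the home team covers
        if result > 0:
            wins += 1
        elif result < 0:
            losses += 1
        # result == 0 is a push and is not counted
    n = wins + losses
    accuracy = 100.0 * wins / n if n else None
    return n, accuracy

# Hypothetical games, for illustration only.
sample_games = [
    Game(15, 3.0, -1),    # home underdog loses by 1 but covers
    Game(15, 9.5, -14),   # extreme home underdog fails to cover
    Game(16, -6.5, 10),   # home favorite covers
    Game(16, 3.0, -3),    # push, excluded
    Game(3, 8.0, 2),      # extreme home underdog wins outright
]
late_season = {14, 15, 16, 17}
print(rule_accuracy(sample_games, min_points=2, weeks=late_season))
```

Restricting `weeks` is what separates the whole-sample rows from the late season rows, which is where the strategies above diverge.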
4.2 Regression Analysis
In the previous section we demonstrated that, in certain instances, bettors
systematically misvalue bets. In this section we present the results of a
regression-based betting system. Such strategies are designed to take advantage of
any mis-pricings that exist in the available bets. Initially, we will examine
in-sample predictability and compare the predictive accuracy of binary models
versus OLS models. Then, we
will augment these models to include and momentum effects.
Since the classification as either favorite or underdog is not independent of
home or visitor status, we start by examining the results of the Binary Base Model
to find clarification on the conditional variables. We estimate equation 3.4 using a
pooled regression and find the coefficients (p-values) for the home favorite, visiting
favorite, and closing line to be 0.4721 (0.0000), 0.5025 (0.0000), and -0.0025 (0.5543),
respectively, which shows significant p-values for both the HF and VF terms. Since
both terms are positive, this says that favorites, both home and away, are more
likely to cover the spread. Since the coefficient on VF is greater than the coefficient
on HF this regression indicates that it is more likely for a visiting favorite to cover
6
Moderate and extreme spreads are defined as two points or greater and eight points or greater,
respectively. While ‘bet on all home teams’ and ‘bet on all home underdogs’ are two commonly
used simple betting strategies, bets on moderate and extreme home underdogs are not. However,
Borghesi (2007) discovered that as the spread increases, bets on home underdogs are more likely to
cover.
than a home favorite. A pooled regression of the late season games augments these
results. In this case the coefficients are 0.4621 (0.0000), 0.45885 (0.0000), and -0.0018
(0.8136). The highly significant, positive coefficients on both dummy variables again
indicate that favorites have a higher chance of covering, regardless of venue. However,
there is a switch wherein the coefficient on HF is now greater than the coefficient
on VF. This indicates that late in the season a home favorite is more likely to cover
than a visiting favorite.
7
To check for the climate’s role in mis-pricing Borghesi has developed the model
$$\lambda_i = a_{HFM}\, HF \cdot M_i + a_{VFM}\, VF \cdot M_i + (\beta - 1)\, CL_i + a_{HFC}\, HF \cdot C_i + a_{VFC}\, VF \cdot C_i \qquad (4.1)$$
where $M_i = 1$ if the game is played in a moderate climate, or if the game is played in
a cold climate and the visiting team is from a cold climate, and $M_i = 0$ otherwise;
$C_i = 1$ if the game is played in a cold climate and the visiting team is from a moderate
climate, and $C_i = 0$ otherwise.
8
Results (not shown) give only weak evidence that the impact of weather
on late-season games is not fully reflected in the closing line. The parameter for
$HF \cdot C_i$ is not significant, while the coefficient for $VF \cdot C_i$ is negative and
significant only at the 10% level (coefficient = −0.0663, p-value = 0.0610). Taken at
face value, this means that weather is not correctly factored into the price when
visiting favorites from a mild climate play on the road in a harsh climate, so that
home underdogs with a climate advantage are undervalued. While this agrees with
Borghesi (2007), the coefficient is very small, almost one-sixth of what he found,
and at a 5% level of significance it is rejected altogether.
This does not unequivocally establish that there is a mis-pricing.
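The two climate indicators in the model above are mutually exclusive, which is easier to see when the definitions are written out as code. The following is a minimal sketch under stated assumptions: the function name `climate_dummies` is hypothetical, "mild" stands for the moderate climate, and each game is assumed to be played in the home team's climate.

```python
def climate_dummies(home_climate, visitor_climate):
    """Return (M, C) for one game, following the definitions under equation 4.1.

    Climates are the strings "cold" or "mild". The game is assumed to be
    played in the home team's climate.
    """
    for climate in (home_climate, visitor_climate):
        if climate not in ("cold", "mild"):
            raise ValueError(f"unknown climate: {climate!r}")

    # C = 1 only when a mild-climate visitor plays in a cold climate.
    c = 1 if (home_climate == "cold" and visitor_climate == "mild") else 0

    # M = 1 for games in a mild climate, or cold-climate games
    # in which the visitor is also from a cold climate.
    m = 1 if (home_climate == "mild"
              or (home_climate == "cold" and visitor_climate == "cold")) else 0
    return m, c
```

With only two climate classes, exactly one of the two indicators equals 1 for every game, so the interaction terms split the favorite dummies into a "no climate mismatch" group and a "cold home team versus mild visitor" group.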
7
Both of these results contrast with the findings of Borghesi (2007). He found only VF late in
the season to be significant. This coefficient was negative indicating home underdogs are more likely
to cover the spread late in the season.
8
Though our previous analysis showed few significant p-values for weather influence, it is
important to confirm that this is because no such influence exists and not because of the
somewhat small sample size.
Table 4.7: NFL in-sample predictability
Base model
Week N OLS accuracy (%) Binary accuracy (%)
1 64 50.00 53.13
2 62 46.77 56.45
3 62 48.39 50.00
4 55 54.55 30.91
5 56 50.00 46.43
6 53 41.51 52.83
7 53 49.06 52.83
8 51 52.94 54.90
9 54 40.74 70.37
10 59 37.29 57.63
11 62 53.23 67.74
12 62 59.68 59.68
13 64 45.31 53.13
14 64 62.50 45.31
15 64 32.81 59.38
16 64 28.13 65.63
17 58 63.79 51.72
Weeks 1-13 757 48.48 54.43
Weeks 14-17 250 46.40 55.60
Playoffs 30 56.67 70.00
All Games 1037 48.22 55.16
Notes: This table shows the success rate of two in-sample regression models used to predict outcomes
of NFL games from 2006 to 2009. Accuracy is the proportion of outcomes that are correctly predicted
in-sample. The Base Model is $\lambda_i = \alpha_{HF} HF_i + \alpha_{VF} VF_i + (\beta - 1) CL_i + \varepsilon_i$,
where $HF_i = 1$ if the home team is the favorite and $HF_i = 0$ otherwise, $VF_i = 1$ if the visiting
team is the favorite and $VF_i = 0$ otherwise, and $CL_i$ is the absolute value of the closing point
spread. In the OLS model $\lambda_i$ is the outcome (the difference in points scored between the
favorite and the underdog) minus the closing point spread. In the binary model $\lambda_i = 1$ if the
favorite covers the spread and $\lambda_i = 0$ otherwise. The estimators are calculated each season and
used to predict outcomes in that same season. Games with a point spread of zero (‘Pick ’em’ games)
have been omitted from the sample.
4.3 In-sample predictability
To better quantify the value we can derive from imperfect information processing we
first design a series of models to predict within our sample. Using all of the outcomes
in a season we estimate parameters for a model that predicts the outcomes within
that same season. The success of these models is shown in Table 4.7. The two accuracy
columns show that the overall accuracy is 48.22% for the OLS Base Model and 55.16%
for the Binary Base Model. This means that if a gambler ex ante had perfect information
about the upcoming season he could devise an objective strategy to win bets at a
rate of 55.16%.
It is also worth noting the difference between the OLS and Binary models’
accuracy. The Binary model is far more accurate in predicting in-sample outcomes.
9
This lends credence to the belief that bettors are unconcerned with the magnitude of
a won or lost bet and care only whether they win or lose it. For instance, a bettor
does not care whether a favorite they bet on covers by 1 point or 10 points; they win
their bet either way.
The superiority of the binary model is consistent with this.
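To make the difference between the two dependent variables concrete, the sketch below codes a single game into the Base Model variables described in the notes to Table 4.7. The function name `base_model_variables` and the sign convention for `home_spread` (negative when the home team is favored) are assumptions made for this illustration only.

```python
def base_model_variables(home_spread, home_margin):
    """Code one game for the Base Model.

    home_spread: closing spread from the home team's side; negative when
        the home team is favored (an assumed sign convention).
    home_margin: home points minus visitor points.
    Returns HF, VF, CL, the OLS dependent variable and the binary
    dependent variable (None when the game is a push).
    """
    if home_spread == 0:
        raise ValueError("pick 'em games are omitted from the sample")
    hf = 1 if home_spread < 0 else 0
    vf = 1 - hf
    cl = abs(home_spread)
    # Margin from the favorite's point of view.
    favorite_margin = home_margin if hf else -home_margin
    lambda_ols = favorite_margin - cl
    if lambda_ols == 0:
        lambda_binary = None  # push: the bet is neither won nor lost
    else:
        lambda_binary = 1 if lambda_ols > 0 else 0
    return {
        "HF": hf,
        "VF": vf,
        "CL": cl,
        "lambda_ols": lambda_ols,
        "lambda_binary": lambda_binary,
    }
```

A favorite that covers by half a point and one that covers by ten points receive the same binary value of 1 but very different OLS values, which is the distinction drawn in the paragraph above.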
4.4 Out-of-sample predictability
The idea that a gambler would have perfect information is grossly flawed, however.
If a gambler did have perfect information before the season started he could have a
winning percentage of 100%, not just 55.16%. Many studies have found an objective
betting method that succeeds against the vigorish in-sample (Zuber et al. (1985),
for instance), but not many have been able to show that this objective advantage holds
outside of the sample of data the model is based on.
To isolate and identify the length and severity of biases we establish two
variants of the Base Binary Model first proposed by Borghesi (2007).
10
The first is
the 1-Month Base Binary Model in which we regress the previous four weeks of data
to predict the next four weeks of results. Estimators for the first few weeks of each
season are derived from the final weeks of the previous season, except for the first
weeks of our sample, weeks 1-4 in 2006, which are omitted. The second is the 1-Year
Base Binary Model in which we regress the previous season’s data to predict the
following season’s results. The first season in our sample, 2006, has been omitted
from this model. The first model is used to identify and exploit short term biases
while the second model is designed to identify and exploit long term biases.
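The two variants differ only in how the estimation and prediction periods are paired. The sketch below shows one possible reading of that pairing. The function names `one_month_windows` and `one_year_windows` are hypothetical, and the way four-week blocks are aligned with season boundaries and the playoffs is an assumption, since it is not spelled out above.

```python
def one_month_windows(weeks, block=4):
    """Pair each block of `block` weeks with the block that follows it.

    `weeks` is a chronologically ordered list of (season, week) labels.
    Returns a list of (estimation_weeks, prediction_weeks) tuples. The
    first block is never predicted because nothing precedes it.
    """
    blocks = [weeks[k:k + block] for k in range(0, len(weeks), block)]
    return [(blocks[k - 1], blocks[k]) for k in range(1, len(blocks))]

def one_year_windows(seasons):
    """Pair each season with the season that follows it."""
    return [(seasons[k - 1], seasons[k]) for k in range(1, len(seasons))]

# Regular-season week labels for two seasons, for illustration.
weeks = [(season, week) for season in (2006, 2007) for week in range(1, 18)]
month_pairs = one_month_windows(weeks)
year_pairs = one_year_windows([2006, 2007, 2008, 2009])
```

In this reading the first four weeks of 2006 and the whole 2006 season serve only as estimation data, which matches the omissions described above.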
Table 4.8 shows the results of these models. Over all games, neither model achieves
the necessary 52.40% accuracy that has been the proposed break-even point in previous
9
Conducting a similar study, Borghesi (2007) found increased accuracy in the binary model as
well; however, his OLS model was accurate to 52.98%, a high enough accuracy to beat the vigorish.
10
We stop using the OLS model because we have shown the binary model to be a much more
accurate predictor.
Table 4.8: NFL out-of-sample predictability using the base binary model
Base binary model variant
1-month 1-year
Week N Accuracy (%) N Accuracy (%)
1 48 54.17 48 58.33
2 46 50.00 46 54.35
3 46 54.35 46 65.22
4 41 43.90 41 48.78
5 56 42.86 42 47.62
6 53 45.28 39 53.85
7 53 56.00 40 55.00
8 51 47.06 39 46.15
9 54 53.70 41 53.66
10 59 45.76 44 52.27
11 62 58.06 46 54.35
12 62 46.77 47 36.17
13 64 64.06 48 45.83
14 64 46.88 48 45.83
15 64 54.69 48 52.08
16 64 60.94 48 43.75
17 58 51.72 41 46.34
Weeks 1-13 695 51.22 567 51.68
Weeks 14-17 250 53.60 185 47.03
Playoffs 40 52.50 30 70.00
All Games 985 51.88 782 50.51
Notes: This table shows the success rate of each time variant of the Base Binary Model at predicting
outcomes of NFL games from 2006 to 2009. Accuracy is the proportion of outcomes that are correctly
predicted out-of-sample. The Base Model is $W_i = \alpha_{HF} HF_i + \alpha_{VF} VF_i + (\beta - 1) CL_i + \varepsilon_i$,
where $W_i = 1$ if the favorite covers the spread and $W_i = 0$ otherwise, $HF_i = 1$ if the home team
is the favorite and $HF_i = 0$ otherwise, $VF_i = 1$ if the visiting team is the favorite and $VF_i = 0$
otherwise, and $CL_i$ is the absolute value of the closing point spread. The 1-month variant estimates
parameters in four week blocks and predicts the following four weeks’ outcomes. The 1-year variant
estimates parameters over a 1-year period and predicts the following season’s outcomes. As a result
of these methods the first four weeks of the 2006 season have been omitted from the 1-month variant
and the entire 2006 season has been omitted from the 1-year variant. Games with a point spread of
zero (‘Pick ’em’ games) have also been omitted from the sample.
studies. However, the end of the season shows inefficient prices. The 1-Month Base
Model, used to identify short-term biases, yields a late season success rate of 53.60%.
This result indicates that late in the season prices do not take into
account all the available information, which creates mis-pricings.
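The break-even threshold quoted above can be checked with a short calculation. The sketch below assumes the conventional pricing in which a bettor risks 11 units to win 10; that pricing, and the function names `break_even_rate` and `expected_profit_per_bet`, are assumptions made for this illustration. Under that assumption the break-even rate is 11/21, or about 52.38%, which is close to the 52.40% figure used in previous studies.

```python
def break_even_rate(risk=11.0, win=10.0):
    """Win rate at which a bettor risking `risk` to win `win` breaks even."""
    return risk / (risk + win)

def expected_profit_per_bet(win_rate, risk=11.0, win=10.0):
    """Expected profit of one bet, in the same units as `risk`."""
    return win_rate * win - (1.0 - win_rate) * risk

threshold = break_even_rate()
late_season_edge = expected_profit_per_bet(0.5360)   # 1-month model, weeks 14-17
all_games_edge = expected_profit_per_bet(0.5188)     # 1-month model, all games
print(f"break-even rate: {100 * threshold:.2f}%")
print(f"late season expected profit per bet: {late_season_edge:+.3f}")
print(f"all games expected profit per bet: {all_games_edge:+.3f}")
```

The sign of the expected profit flips between the all-games accuracy and the late season accuracy from Table 4.8, although the late season margin over the threshold is narrow.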
This finding is consistent with previous research done by Borghesi (2007). In
his study, the comparable statistic is 53.25%, slightly less than the accuracy of our
model. This would indicate that the mis-pricing he discovered in 1981-2000 data has not
been corrected. There could be many reasons for this mis-pricing to persist. First,
as previously mentioned, it is possible that a few informed bettors are aware of the
misvaluation of late games by the market and are exploiting it. However, the small
number and limited capital of these individuals are not enough to move the market
towards the correct prices. This is analogous to the idea in equities markets that
arbitrageurs generally correct biased prices, but may be overwhelmed in the short-term
by noise traders. Second, this mis-pricing might persist because of a lack of knowledge
of its existence. It is very possible that this, admittedly small, mis-pricing has only
been noted by academics in academically published work. The average NFL bettor
does not read Applied Economics and might not be aware of Professor Borghesi’s
research and findings. In this case, the mis-pricing will persist until enough bettors
are made aware of it to move the market back to fair prices.
In addition to the late season bias, Table 4.8 shows evidence that the previous
season’s aggregate biases can be seen in the early stages of a season.
11
Early season
for the 1-Year Binary Base Model shows much higher accuracy than late season
indicating that by the end of a season bettors focus on the current season’s variables
and begin to disregard the last season’s results.
12
This same phenomenon can be seen
in the 1-Month Binary Base Model. The first four weeks of that model are based
on previous seasons’ results and those first four weeks begin with relatively high
accuracy: week 1 is 54.17% accurate, compared to 51.22% accuracy for weeks 1-13,
11
A graphical representation of these data can be found in the appendix (Figure 1).
12
This finding is the opposite of that of the similar study done by Borghesi (2007). In his study he
finds the 1-year model to have the same trend as the 1-month model, except that the 1-year model is
less accurate.
and drops to 43.90% accuracy by week 4. This indicates that the biases present at the
end of the previous season influence the bettors through the first few weeks of the
new season until there is current data on which they can base their assessments. The
bettors’ inclusion of the previous season’s data in their early season pricing and
exclusion of current season data from their late season pricing indicates that the data
processing time for these bettors is longer than the month that our model affords.
Chapter 5
Conclusion
In this paper we demonstrate that the NFL betting market exhibits small pricing
misvaluations. We cannot conclude that it is a totally inefficient market as Borghesi
(2007) does, however. Using the same tests as Borghesi on a more recent sample
returns some similar but many differing results. The main differences come in the
univariate tests. In these tests we look for a significant difference between the point
spread of a game and its actual outcome. The measure used is a p-value derived from a
Wilcoxon Signed Rank Test. While Borghesi found a clear pattern of statistically
significant differences late in the season and into the playoffs with home underdogs,
we find no such pattern. Instead, when testing for both home and home underdog bias
we find no significant differences.
We then look at two simple betting strategies over our four year sample: betting
on home underdogs throughout the season and betting on late season home
underdogs. We find, unlike Borghesi, that the strategy of betting on all home
underdogs is not accurate enough to realize profits. However, we do find that betting
on home underdogs late in the season can be a profitable strategy; in our sample,
consisting of 124 observations, there was a 54.03% accuracy rate. This is large enough
to cover transaction costs such as the vigorish and make a profit. This finding is in
agreement with Borghesi’s results. It has not been tested out of sample, but the
mis-pricing exists both in our recent data and in Borghesi’s historical data,
which allows us to state that there is a misvaluation of late season home underdogs
that has yet to be corrected.
Next, we examined the role of climate and weather in the pricing of NFL
betting lines. We did not find any significant differences between the point spread and
actual outcomes for games in which the home team held a weather advantage (a cold
weather home team playing a visiting team from a mild climate) when broken down
by year. When examined by month the data again yielded no statistically significant
results, showing no pricing bias. In his study, Borghesi found a statistically
significant effect in the aggregate of his sample as well as in each month
except October. Overall, Borghesi’s data show much more evidence of mis-pricing
in the univariate p-value tests than our data do.
We then began regression analysis. The purpose of the regression analysis was
to try to determine an objective strategy that could be implemented in real world
situations to consistently make money. Using regression models proposed by Dare and
Holland (2004) we find statistically significant, positive coefficients for both the home
favorite (HF) and visiting favorite (VF) variables. This indicates that favorites, both
home and away, are more likely to cover than underdogs. For the entire regular season
visiting favorites are more likely to cover than home favorites, but at the end of
the regular season home favorites become more likely to cover. This finding
is easily linked to the idea of a climate advantage for home teams, which we examine
using an augmented Dare and Holland equation proposed by Borghesi (2007). We
find some evidence that weather is not correctly factored into the price when visiting
favorites from a mild climate play on the road in a harsh climate, which would imply
that home underdogs with a climate advantage are undervalued. Our coefficient is
smaller and less significant than Borghesi’s, which could mean that this phenomenon
is declining over time and might dissipate entirely over the coming seasons.
Next, we use two models from Dare and Holland (2004) to predict results in-
sample. This means the same season we use to create the model is the season we test
the model on. We test both a binary form and an OLS form of our Base Model. The
Binary Base Model proves to be a much better predictor than the OLS Base Model
and could be used to predict with 55.16% accuracy in-sample. In-sample predictions
have little real world consequence however, because they rely on data that is only
available ex post, and therefore could not be implemented during a season. We then
use two variants of the Binary Base Model to predict outcomes out-of-sample. The
first model used is a 1-month specification in which the previous month’s results are
used to forecast the next month’s games. This model proves to be 53.60% accurate
over the late regular season, a finding that agrees with Borghesi’s results and suggests
that a small bias exists for the time it takes bettors to process all available information.
The long term model, a 1-year sample used to predict the next year’s games, proves
to be less accurate overall and is not accurate enough over the regular season to
compensate for the vigorish. Both models confirm a bias towards the previous season’s
results during a season’s early weeks. That is, bettors are initially heavily biased
towards the end of the previous season when predicting the current season’s games.
Overall, we show that the NFL betting market is largely efficient. In the few
instances where we identify a mis-pricing, the mis-pricing is small and confined to a
limited number of weeks. The reason these mis-pricings have not been corrected by
arbitrageurs is thought to be their limited capital. As the season progresses, more
bettors come into the market, leaving the informed bettors with less ability to
influence the market. Therefore, the mis-pricings continue to exist in the market.
The methods and ideas presented in this paper can have applications beyond
the sports betting market. As we have seen, sports betting markets exhibit many
of the features of an equities market; however, they have one crucial difference that
makes them unique: the true value of the underlying is easily observed. While a
stock’s true value is rarely if ever revealed, the weekend’s results clearly assign a
value to a potential bet. The results of tests in the NFL point spread betting market
can then have direct implications about price inefficiency in equities markets. We
observe that the increase in betting volume leads to more inefficiency, an observation
that may provide an interesting opportunity for future arbitrage research in financial
markets.
Appendices
Table 1: Climate of NFL teams
Team Climate Team Climate
AFC East NFC East
Buffalo Bills Cold Dallas Cowboys Mild
Miami Dolphins Mild New York Giants Cold
New England Patriots Cold Philadelphia Eagles Cold
New York Jets Cold Washington Redskins Mild
AFC North NFC North
Baltimore Ravens Mild Chicago Bears Cold
Cincinnati Bengals Cold Detroit Lions Mild
Cleveland Browns Cold Green Bay Packers Cold
Pittsburgh Steelers Cold Minnesota Vikings Mild
AFC South NFC South
Houston Texans Mild Atlanta Falcons Mild
Indianapolis Colts Mild Carolina Panthers Mild
Jacksonville Jaguars Mild New Orleans Saints Mild
Tennessee Titans Mild Tampa Bay Buccaneers Mild
AFC West NFC West
Denver Broncos Cold Arizona Cardinals Mild
Kansas City Chiefs Mild St. Louis Rams Mild
Oakland Raiders Mild San Francisco 49ers Mild
San Diego Chargers Mild Seattle Seahawks Mild
Notes: This table shows the 32 NFL teams organized by division and the climate in which they
play their home games. Teams with ‘cold’ climates have been factored into the weather analysis
portion of this paper
Figure 1: Accuracy of out-of-sample Binary Base Model variants
Bibliography
[Amoako-Adu et al., 1985] Amoako-Adu, B., Marmer, H., and Yagil, J. (1985). The
efficiency of certain speculative markets and gambler behavior. Journal of
Economics and Business, 37(4):365–378.
[Avery and Chevalier, 1999] Avery, C. and Chevalier, J. (1999). Identifying investor
sentiment from price paths: The case of football betting. The Journal of Business,
72:493–521.
[Borghesi, 2006] Borghesi, R. (2006). The home team weather advantage and biases
in the NFL betting market. Journal of Economics and Business, 59:340–354.
[Borghesi, 2007] Borghesi, R. (2007). The late-season bias: explaining the NFL’s
home-underdog effect. Applied Economics.
[Boulierk, 2006] Boulierk, B. L. (2006). Testing the efficiency of the National Football
League betting market. Applied Economics.
[Camerer, 1989] Camerer, C. F. (1989). Does the basketball market believe in the
‘hot hand’ ? The American Economic Review, 79:1257–1261.
[CNBC, 2009] CNBC (2009). Top sports for illegal wagering.
[Dare and Holland, 2004] Dare, W. H. and Holland, A. S. (2004). Efficiency in the
NFL betting market: modifying and consolidating research methods. Applied
Economics, 36:9–15.
[databasesports.com, 2008] databasesports.com (2008). NFL Yearly Schedules.
[FootballLocks.com, 2010] FootballLocks.com (2010). Historical NFL football
spreads.
[Glickman and Stern, 1998] Glickman, M. E. and Stern, H. S. (1998). A state-space
model for National Football League scores. Journal of the American Statistical
Association, 93:25–35.
[Gray and Gray, 1997] Gray, P. K. and Gray, S. F. (1997). Testing market efficiency:
Evidence from the NFL sports betting market. The Journal of Finance,
52:1725–1737.
[Kassis and Dole, 2008] Kassis, M. M. and Dole, C. (2008). Wage dispersion and
team performance in the NFL. Southern Business and Economic Journal.
[MacCambridge, 2005] MacCambridge, M. (2005). America’s Game: The epic story
of how pro football captured a nation. First Anchor Books.
[Nevada Gaming Commission, 2010] Nevada Gaming Commission (2010). Gaming
revenue reports.
[Paul et al., 2003] Paul, R. J., Weinbach, A. P., and Wilson, M. (2003). Efficient
markets, fair bets, and profitability in NBA totals 1995-96 to 2001-02. The Quarterly
Review of Economics and Finance, 44:624–632.
[Rachlin, 1990] Rachlin, H. (1990). Why do people gamble and keep gambling despite
heavy losses? Psychological Science, 1:294–297.
[Ritter, 1994] Ritter, J. R. (1994). Racetrack Betting - An example of a market with
efficient arbitrage, chapter 5, pages 431–444. World Scientific Publishing Co. Pte.
Ltd., 2008 edition.
[Vergin and Scriabin, 1978] Vergin, R. C. and Scriabin, M. (1978). Winning
strategies for wagering on National Football League Games. Management Science,
24(8):809–818.
[Zuber et al., 1985] Zuber, R. A., Gandar, J. M., and Bowers, B. D. (1985). Beating
the spread: Testing the efficiency of the gambling market for national football.
The Journal of Political Economy.