I hereby declare that I am the sole author of this thesis.
I authorize Princeton University to lend this thesis to other institutions or individuals for the purpose of scholarly research.
Andrei C. Grecu
I further authorize Princeton University to reproduce this thesis by photocopying or by other means, in total or in part, at a t the request of other institutions or individuals for the purpose of scholarly research.
Andrei C. Grecu
2
To my grandfather, Nicolae Dumitrescu
3
I am especially grateful to Professor Mulvey for his guidance in writing this thesis, his advice and his unyielding support. I would also want to thank Professor Vanderbei, Professor Powell, and Professor Lord for teaching me a lot of the material used in this thesis. In addition, I am indebt to the whole Princeton community for challenging my mind over the last four years.
Dear family, thank you for your incredible support from thousands of miles away, in Romania. I would have been lost without you in this process.
4
CHAPTER 1: CHAPTER 2: 2.1
(16-24)
2.2
(25-30)
2.3
(31-41)
CHAPTER 3: CHAPTER 4: CHAPTER 5: APPENDIX A: APPENDIX B:
5
Chapter 1: Introduction
Andrei C. Grecu
Chapter1:
“ Some people believe soccer is a matter of life and death. I am very disappointed with that attitude. I can assure you it is much, much more than that. “ Bill Shankly, President of F.C. Liverpool
The Soccer World Cup tournament, held every four years, is the most watched sports event in the world, surpassing even the Olympic Games. Moreover, enthusiasm for soccer is relatively uniformly spread across the world, both in term of spectators and actual participants. For example, 194 national tea ms, essentially a team for every country in the world, competed in qualifying games for a place in the next World Cup final tournament, which will take place in Germany, in June-July 2006. Given the international appeal of soccer and of the World Cup in particular, there is a huge interest in assessing the relative strengths of the teams that take pa rt in the tournament and implicitly predicting the overall winner. The intrinsic complexity of the World Cup tournament makes the job of ranking participating teams both challenging and mathematically stirring. In the first phase of the tournament, over a period of about ab out two years, approximately 200 national teams compete in qualifying games on five continents, with the highest ranked teams in each
6
Chapter 1: Introduction
Andrei C. Grecu
geographical region advancing to the final tournament. Subsequently, the 32 qualified teams travel to the country that organizes the World Cup and embark upon a marathon of games. Initially, the 32 teams are divided into e ight groups of four, with each group playing six round robin matches. Following the initial games, the top two teams from each group advance to the second round of the final tournament, which which consists of a simple 16-team knockout tournament. Thus, after approximately one month and 64 matches played, the final of the World Cup determines the World Cup Champion, “the best” national team in the world for the next four years. While the World Cup effectively compares teams by having them play against one another on the soccer field, any attempt to rank the teams before the start of the competition faces a number of challenges. challenges. First, by its very nature soccer is is a highly unpredictable game- only a few goals are scored during the ninety minutes of a normal game, and as opposed to baseball or football, few statistics are recorded for each game. Second, participating teams come from various regions of the world, in which soccer is played at different levelslevels- even though the winner of a geographic region might be much weaker than the winner of another an other region, the format of the World Cup guarantees that teams from all regions take part in the final tou rnament. Third, the relative rarity of games between national teams makes it difficult to find sources sources for comparison- a soccer national team only plays an average of ten games every year, so there are still national teams that have never played one another. Despite the challenges discussed here, a number of papers have tried to model soccer scores and simulate the World Cup tournament.
7
Chapter 1: Introduction
Andrei C. Grecu
Literature Review Kuonen (1997) fits a logistic regression for the probability of winning a soccer game, based on seeding coefficients computed from the outcome of previous games played by the two teams. Karlis and Ntzoufras (1998) e xamine the choice of a Poisson distribution for modeling goals in soccer games and build a Poisson log-linear model for scores in the Greek national league. Dyte and Clarke (2000) also assume a Poisson distribution of goals and model the 2002 World Cup based on the (controversial) national team rankings provided by the Federation of International Football Association (FIFA). The Palomino, Rigotti, and Rustichini (1998) model, inspired from g ame theory, looks at a soccer game in continuous time and examines the effect of three factors- team’s ability (performance record), passion (home-field advantage), and strategy (reaction to current score), in determining the outcome of a soccer game. Finally, Koning (2001) simulates soccer championships using a Poisson fit to predict game outcomes based on historical scoring intensities of the two teams involved in the game. Simulating the soccer soccer World Cup final tournament is even more challenging than simulating the games in a national league. Teams that compete in the World Cup final tournament come from different qualifying tournaments, in which the socc er played can have different characteristics. For example, national teams in South America tend to score on average more goals when they play each other than teams in Europe do. Therefore, the average number of goals scored per game might not be the best variable to predict the result of an encounter between a team in South America and a team in Europe. In all the models mentioned in the previous paragraph, the variables used in fitting the regressions are derived from aggregate statistics of the previous performance of the two
8
Chapter 1: Introduction
Andrei C. Grecu
teams. This approach might be reasonable for teams competing against the same opponents in a league, but in the case of the World Cup we need to adjust the variables to account for the different backgrounds from which participating teams come from. Regarding the soccer World Cup as a competition between teams coming from different regional tournaments brings to mind the problem faced every year by the Bowl Championship Series (BCS) in ranking American college football teams. The two problems- ranking college football teams and ranking World Cup soccer teams, are similar in that: i) the number of games played by every team is relatively small, ii) teams play many more games within their regional league than against teams in other leagues, and iii) crucially, the quality of the opponents of different teams varies from region to region. Thus, the techniques used to rank college football teams can be tailored to rank the teams that participate in the soccer World Cup. The theory behind ranking teams in uneven paired competitions has strong and diverse mathematical foundations. Keener (1993) formulates the ranking problem as a linear eigenvalue problem and solves it by using the result of the Perron-Frobenius theorem. Goddard (1983), Stob (1985) and Ali, Cook, and Kress (1986) develop algorithms for rankings that satisfy the so-called minimum violations ranking (MVR) criterion, which minimizes the instances in which lower ranked teams defeat higher ranked teams. Wilson (1995) builds a neural ne twork based on previous interactions between teams and looks for the equilibrium values of the network. Finally, Thomson (1975) and Reid (2003) design least lea st squares and maximum likelihood methods for ranking teams. Even though some these rankings have strong mathematical foundations,
9
Chapter 1: Introduction
Andrei C. Grecu
all ranking methods are intrinsically subjective by means of the v ariables chosen to explain outcomes, the parameters of the model, and the final interpretation of the results.
Overview This thesis looks at the soccer World Cup and a nd uses mathematical tools to determine the relative strengths of the participating teams and simulate the structure of games within the tournament. Given the various backgrounds of the qualified teams, and the complexity of the World Cup tournament, I use a three-step approach to assessing the relative strengths of the national soccer teams that participate in the soccer World Cup. First, in Chapter 2 I implement three algorithms to rank the participating teams, method takes as input previous before the start of the tournament: i) a matrix-based method takes interactions among teams and returns an eigenv ector with the relative value assigned to each team. ii) a random walker algorithm looks at the steady-state macroscopic solution of a setting in which a number n umber of vacillating voters perpetually change their mind regarding their favorite team, thus executing random walks on a network defined by the participating teams (nodes) and their previous interactions (edges). iii) a neural network algorithm looks at a neural network whose nodes are the participating teams and whose connections are determined by previous interactions among the teams, and uses a soccerintuitive transfer function to update the value o f each node (team) until a steady state solution is reached. Second, in Chapter 3 I use the rankings presented above to make predictions regarding the outcome of separate games ga mes in the World Cup. I start by assuming a Poisson distribution of goals scored in a soccer game and I fit a nonlinear regression using the
10
Chapter 1: Introduction
Andrei C. Grecu
number of goals scored by each team in the games from the 2002 2 002 World Cup. Using the regression results, I compute the probability of winning assigned to both teams competing in a World Cup game. However, given the complex structure of the soccer World Cup, it is difficult to explicitly calculate the conditional probabilities for each team to win the tournament, so I decide dec ide to simulate the games instead. Third, the simulation of the World Cup games performed in Chapter 4 allows me to determine the probability of winning the World Cup assigned to each team. In order to determine the manner in which teams are favored or disadvantaged by the World Cup draw, I also simulate a round robin competition between all the teams qualified for the World Cup. Even though such full competition is not possible in practice because of the great number of games involved, it is considered intuitively the fairest way of determining the best team. Therefore, by comparing the winning probabilities from the World Cup simulation with the winning probabilities from the round robin simulation, I am able to determine which teams had a “lucky draw” for the World Cup games. Finally, I conclude in Chapter 5 by examining the accuracy of my predictions for the 2002 World Cup, making predictions for the up-coming 2006 World Cup, and drawing an analogy between sports betting and the financial markets.
11
Chapter 2: Ranking Algorithms
Andrei C. Grecu
Chapter 2:
Overview This chapter presents three algorithms used for ranking the 32 soccer national teams that qualified for the 2006 Soccer World Cup. By looking at historical games played between soccer national teams around the world, each algorithm returns the relative values of the teams prior to the start of the tournament. Whilst past performance is not always a good indicator of the present value of a team, I will focus on those historical games that are most relevant to the real value of the teams. First, I look at games that took place in conditions somehow similar to the World Cup, namely games played on neutral ne utral field as part of a relevant, competitive championship. Luckily, independent of the World Cup, national soccer teams also compete for regional supremacy on each continent. Thus, every two years, African countries compete in the African Nations Cup. Every four years, national teams in Asia compete for the Asian Cup. Held every two years, the CONCACAF Gold Cup reunites teams in North America. Every three years, teams in South America take part in Copa America. Finally, every four years the best teams in Europe participate in the European Championship. This plethora of regional tournaments generates a significant number of interactions between teams on the same continent. co ntinent. Conveniently, the World Cup, which takes place every four years, reunites teams from all over the world, thus allowing comparisons between teams on different continents.
12
Chapter 2: Ranking Algorithms
Andrei C. Grecu
Second, I am interested in historical games played b etween two teams of comparable strength. For ranking purposes, the fact that Australia beat Cook Islands 16-0 is far less relevant than a tight game between b etween two teams of comparable strengths, such as Germany and Argentina. Therefore, I will focus my analysis only on g ames in which both teams qualified for at least one of the previous three World Cups. This totals 55 competitive teams, with 26 teams teams from Europe, 9 from Africa, 8 from South America, 7 from Asia, and 5 from North America:
North America (5)
Europe (29) Asia (7)
South America (8)
Africa (8)
Figure 1: The five qualifying regional tournaments with the number of teams that qualified to at least one World Cup since 1994 in parentheses
Even though only 32 soccer teams qualified for the 2006 World Cup, I choose to look at games played among 55 teams of comparable strength. By looking at more games and ranking 55 instead of only 32 teams, I increase the precision of the algorithms in determining the values of the 32 teams that did qualify for the World World Cup. For example, even though Cameroon did not qualify for the World Cup this year, its previous games
13
Chapter 2: Ranking Algorithms
Andrei C. Grecu
against competitive teams around the world allows us to better rank other African teams. For instance, Togo qualified for the first time to the World Cup this year, so it does not have direct previous interactions with teams on other continents. However, its games against Cameroon and other African countries coun tries allow us to better determine Togo’s overall strength. Finally, I consider games played between soccer national teams over the last 12 years. Although this is a long period of time, each game is also assigned a weight that decreases with the number of years since the game took place. Thus, a game that took place ten years ago is approximately ten times less significant in de termining the current value of the team than a game that took place this year is. In addition, a lot of skilled soccer players start playing for their national teams when they are very young and they do play for ten or twelve years before they retire. Also, it is not unusual for a national team to form around a nucleus of talented players who will play for their country for approximately a decade. Consequently, putting heavy weights on recent games, but also decreasing weights on past encounters, effectively captures the relative strengths of the teams over time. By comparison, the Federation of International Football Association (FIFA) also uses results from the previous eight years in computing its coefficients for each country. To sum up, I look at a t historical games between competitive teams as part of a competitive tournament, in which the teams play their best players at full potential. In particular, I look at 477 games among 55 teams at the following competitions:
14
Chapter 2: Ranking Algorithms
Andrei C. Grecu
1. World World Cup: Cup: South South Korea Korea/Ja /Japan pan 2002, 2002, Fran France ce 1998, 1998, USA USA 1994 2. African African Nations Nations Cup: Egypt Egypt 2006, 2006, Tunisi Tunisiaa 2004, 2004, Mali Mali 2002, 2002, Ghana 2000, Burkina Faso 1998, South Africa 1996, Tunisia 1994 3. Asian Asian Cup: Cup: China China 2004, 2004, Lebanon Lebanon 2000, 2000, Unite United d Arab Emir Emirate atess 1996 4. CONCAC CONCACAF AF Gold Gold Cup: Cup: USA 2005 2005,, Mexico Mexico 2003, 2003, USA USA 2002, 2002, USA USA 2000, 2000, USA 1998, USA 1996 5. Copa Copa America America:: Peru Peru 2004, Colom Colombia bia 2001, 2001, Parag Paraguay uay 1999, 1999, Bolivi Boliviaa 1997, Uruguay 1995 6. European European Champio Championship: nship: Portugal Portugal 2004, Belgium/Net Belgium/Netherla herlands nds 2000, England 1996
Acknowledgements The first algorithm is inspired from the matrix-based algorithms for ranking American college football teams, as discussed in Keener (1993), Boginski, Butenko, and Pardalos (2004), and Martinich (2003). The second algorithm is a variation of the random walker algorithm discussed in Callaghan, Mucha, an d Porter (2005). The third algorithm is a neural network method somehow similar to the approach presented in Wilson (1995). However, I modify each of these algorithms in at least two important ways. First, First, I implement soccer-intuitive functions and parameters in each algorithm, so that they deal with results of soccer rather than football games. Second, since I look at games that took place over a longer period of time, time, I discount the importance assigned to each game, so that recent games have a much greater impact on the rankings than older games do.
15
Chapter 2: Ranking Algorithms
Andrei C. Grecu
The Intuition “Spain beat Italy, and Italy beat Brazil, therefore Spain should win against Brazil ” or more intricate “ Brazil tied against Germany, Germany beat Japan, and Brazil lost against Italy, therefore Italy should definitely win against Japan”Japan ”- sport fans often make such kinds of conjectures regarding the outcome of future games based on previous results. Even though such predictions often turn ou t to be wrong, the deduction de duction process is not entirely speculative, as previous games do contain information regarding the relative strengths of the teams. For example, consider the following h istory of games between soccer national teams represented by a directed graph- an arrow from BRA to GER indicates that team BRA beat team GER: BRA
GER
TUN ITA SPA
JAP Figure 2: Directed 2: Directed graph showing a hypothetical history of games In this scenario, JAP clearly looks like the weakest of the six teams depicted, but it is less intuitive how the other teams are ranked, especially since different teams have
16
Chapter 2: Ranking Algorithms
Andrei C. Grecu
played a different number of games. A starting starting point for comparison might be the winning percentage of each team, winning% =
number of games won total number of games
:
BRA
GER
ITA
SPA
TUN
JAP
0.500
0.667
0.500
0.667
0.500
0.000
Using winning percentage as the ranking rank ing criterion, GER and SPA are tied for first place. However, such a ranking does not take into account the quality of teams defeated by each team Should GER and an d SPA get the same credit for defeating ITA and TUN, respectively? In order to account for the strength of schedule of each team, we calculate updated winning percentages using the defeated team’s winning percentage instead of a 1, to account for a victory,
(winning% of defeated teams) . For example, the updated total number of games
winning percentage for GER is
0.500 0.667 3
0.389 , since GER played a total of three
games and beat ITA and SPA, whose winning percentages were 0.50, and 0.67, respectively. Similarly, the updated winning percentage for SPA, which used to be tied with GER, is now
(0.500 0.500) 3
0.333 . With the updated winning percentages, GER
is ranked ahead of everybody else since it beat better opponents:
BRA
GE R
ITA
SPA
TUN
JAP
0.333
0.389
0.125
0.333
0.000
0.000
17
Chapter 2: Ranking Algorithms
Andrei C. Grecu
However, if we update the winning percentages once again, we obtain an even better ranking, based upon already updated winning percentages for each team. With the new percentages, BRA is now ranked in first place! However, if we keep updating the winning percentages following the process described above, the rankings eventually stabilize to the following:
BRA
GER
ITA
SPA
TUN
JAP
0.344
0.290
0.204
0.162
0.000
0.000
In other words, BRA gets the first place bec ause its victory came against a very strong GER, while GER’s two victories came against relatively weaker ITA an d SPA. Also, even though ITA and SPA have the same number of victories as GER, half their victories came against the weak teams TUN and JAP, which in turn lowered their own ranking. Thus, using a reasonably intuitive algorithm, we were able to rank the teams tea ms according to their winning percentages adjusted ad justed for the relative strength of their schedules.
The Mathematical Model Consider a competition in which N participants play an uneven paired schedule, meaning that not all teams play each other. Let ni be the number of games played by participant i, and let aij be some nonnegative number assigned to each team, depending on the outcome of the game between participant i and participant j. Now, if we assume that r
18
Chapter 2: Ranking Algorithms
Andrei C. Grecu th
is a vector of ranking values, with positive component r j indicating the strength of the j participant, then we define the overall ove rall score for participant i as: si
1
ni
N
*
a
ij
* r j
j 1
The division by ni prevents teams from accumulating a large score just by playing extra games. Also, the matrix A with entries aij is often called a preference or dominance matrix, since it contains the scores assigned to each team based on its previous interactions with other teams. If we further assume that the rank of a team is proportional to its score, then the ranking vector vec tor r has to be a positive eigenvector eigen vector of the positive matrix A:
A * r * r
An example of a scheme that assigns scores aij as a function of the outcome of a separate game between team i and team j, is the following:
1, if team i beat team j aij = 0.5, if team i and team j tied 0, if team i lost against team j With the choice of aij described above and letting r0 be the column vector with j entries, it is easy to check that A*r0 is the column vector corresponding to the winning 2
percentages of each team, A *r0 gives the average winning percentage of all defeated n
teams, and so on. The solution that we are looking for is the limit, lim (A *r0). However, n
n
the product A *r0 gets very small as n tends to infinity, so we use the power method to
19
Chapter 2: Ranking Algorithms
Andrei C. Grecu
find the ranking vector r, which is the eigenvector corresponding to the largest eigenvalue of A: r lim n
An * r 0 | An * r0 |
The Perron-Frobenius theorem tells us when this limit exists and gives a un ique, positive solution for the ranking vector r:
Theorem: If Theorem: If matrix A has nonnegative entries, then there exists an eigenvector r with nonnegative entries, corresponding to a positive eigenvalue λ. Furthermore, if the matrix A is irreducible, the eigenvector r has strictly positive entries, is unique and simple, and the corresponding eigenvalue if the largest eigenvalue of A in absolute value.
For paired competitions, the matrix A is irreducible if there is no partition of the teams into two sets S and T such that no team in S has played any team in T, or every eve ry game between a team from S and a team from T resulted in a victory for the team in S. In other words, we need all teams to be connected by previous games, and we need a cycle in the directed graph- we need a sequence of distinct teams teams t1, t2, t3, …, tk such that t1 beat t2, t2 beat t3, …, and tk beat t1. In particular, for the preference matrix to be irreducible, there can be no winless teams. When these conditions are satisfied, the limit r lim n
An * r 0 | An * r 0 |
converges to the unique positive eigenvector of A, which gives a
positive ranking of teams.
20
Chapter 2: Ranking Algorithms
Andrei C. Grecu
Using the preference matrix A to find the ranking vector of teams has sound mathematical foundations and effectively takes into account the strength of schedule of each team when rankings the teams. However, the weakness of this method comes from the subjective choice of the score aij for team i, following a game between team i and team j. A scheme that assigns a score of 1 for a victory, 0.5 for a tie, and 0 for a loss, is the natural choice, since in this case A*r 0 gives the winning percentages of the teams, and 2
A *r 0 gives the average winning percentage of all defeated teams, thus containing information regarding the strength of schedule. In particular, this scheme works well when teams play each other frequently, thus making aij a better indicator of the comparative strength of the two teams. However, when interactions between teams are relatively rare or even reduced to one game- as is often the case with soccer national teams, this simple scheme ineffectually gives all of the credit for the win to the winner, while the loser gets a score of zero, regardless of the degree of its defeat. In order to avoid lopsided splits of the merit for a victory between the two teams, I implement a formula that assigns both teams a score of 0.5 before the start of the game, and then updates the scores, as a function of the number of goals scored by each team: aij
Ycf Gij 2 * Ycf Gij G ji
Where: o f goals scored by team i against team j Gij is the number of Y is a year year coefficie coefficient nt equal equal to number number of year year since the the game took took place cf
21
Chapter 2: Ranking Algorithms
Andrei C. Grecu
In case of a tie, Gij= G ji, this scheme assigns each team a score of 0.5 . Then, the more convincing the victory, the closer the winner’s score gets to 1 and the loser’s score gets to 0, without actually equaling 1 or 0, respectively. Furthermore, by means of the year coefficient Ycf , this scheme puts a lot of emphasis on games played in recent years and decreasing weights on games played a long time ago. For example, a 3-1 victory of team i over team j brings team i aij=
but only aij=
12 3 2 *12 3 1
1 3 2 3 1
0.667 points for a game played in 2006,
1 994. By assigning the 0.536 points for a games back in 1994.
winning team two thirds of the one point per game in 2006, but only slightly more than the initial half-half split in 1994, this scheme clearly rewards current success more than past results.
Application to the 2006 Soccer World Cup In order to rank the teams that qualified to the the 2006 soccer World Cup, I applied the algorithm described above to 477 games among the 55 national teams that took part in at least one World Cup between 1994 and 2006. Importantly, all 477 games took place during a major tournament, under conditions con ditions somehow similar to the World Cup. First, teams had a great incentive to win the game and advance to the next phase of the tournament, thus playing its best available players at full capacity. Second, games took place on neutral territory, so that no teams (other than the nation organizing the competition) enjoyed the home-field advantage. Plugging the 477 games in Excel, and manipulating the data to assign a score to all previous interactions using the scheme discussed in the previous section, I formed the
22
Chapter 2: Ranking Algorithms
Andrei C. Grecu
dominance matrix A. Using MATLAB to find the eigenvector r lim n
An * r 0 | An * r 0 |
,I
obtained the ranking of the 55 teams considered. Within the ranking of the 55 teams, I identify the teams that qualified for the 2006 World Cup in Germany, and I scale the 1
results on a 100-point scale for the following final rankings :
1
Australia and Ukraine did not participate in any of the 27 major tournaments considered, so they were not among the 55 teams ranked. Consequently, Consequently, I arbitrarily assumed that Australia and Ukraine take the value of the last competitive team that they defeated: Australia defeated Uruguay in the playoffs for the 2006 nd World Cup, and Ukraine defeated Turkey in the 2 European qualifying group. Therefore, I assigned Australia the value of Uruguay, and Ukraine the value of Turkey.
23
Chapter 2: Ranking Algorithms
Andrei C. Grecu
2006 EIGENVECTOR RANKING
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
BRAZIL 100.00 ITALY 87.52 GERMANY 84.65 MEXICO 81.89 NETHERL AND 77.87 ARGENTINA 73.33 FRANCE 72.42 ENGLAND 68.29 SPAIN 65.46 PARAGUAY 64.55 SWEDEN 60.96 CROATIA 60.78 PORTUGAL 58.53 USA 57.15 SOUTH KOREA 52.21 AUSTRALIA 48.15 UKRAINE 43.76 SAUDI ARABIA 42.09 COSTA RICA 40.89 ECUADOR 39.44 TUNISIA 36.79 CZECH REPUBLIC 35.81 JAPAN 35.63 SWITZERLAND 33.49 SERB SERBIA IA AND MONTE MONTENE NEGR GRO O 23.1 23.11 1 IRAN 20.97 GHANA 17.78 IVORY COAST 15.20 TRINIDAD AND TO TOB BAGO 13.68 POLAND 10.09 ANGOLA 8.16 TOGO 7.33
24
Chapter 2: Ranking Algorithms
Andrei C. Grecu
The Intuition Soccer fans are renowned for the fanatic fana tic love and loyalty to their team. However, for the sake of this algorithm, we need to consider a set of N perpetually vacillating soccer supporters. Each of these N supporters gets a single vote to designate his or her he r favorite team out of the 32 teams tea ms qualified for the soccer World Cup. However, over time each supporter changes her favorite team according to the following rule: 1. She recalls recalls the the win-loss win-loss outcome outcome of a single single game played played by her her favorite favorite team 2. She flips flips a weighted weighted coin coin that that is more likely likely to come come up Heads Heads 3. Completely Completely ignoring ignoring her current current favorite, favorite, she she goes with with the the winner of the game if Heads, and with the loser if Tails 4. Retu Return rnss to to ste step p1
By constantly looking at historical historical games between teams and flipping weighted coins to decide which team to vote for, supporters will perpetually change their mind regarding their favorite team. Indeed, at the microscopic level voters act as perpetual random walkers on a network whose nodes are defined by soccer teams, and whose connections are defined by previous games between the teams. However, even though each individual voter is a random walker, at a macroscopic level the total number of votes cast for each team stabilizes over time. Thereafter, we are able to rank the teams according to fraction of soccer supporters who vo te for each team. The main appeal of this ranking scheme is its simplicity. The only subjective input for the algorithm is the weight of the coin flipped by the soccer fansfans- the probability
25
Chapter 2: Ranking Algorithms
Andrei C. Grecu
with which they go with the winner of o f the game they consider. In order to make the algorithm reasonable, this probability must lie within the interval (0.5, 1), so that the supporter’s vote goes for the winner, rather than with with the loser of the game she consider. In particular, a coin weight close to the p= 0.5 limit does not guarantee a vote in case of a victory, but rather rewards the strength of schedule of ea ch team. On the other hand, a coin weight close to the p= 1 limit almost guarantees a vote in case of a victory, therefore therefore favoring teams with an undefeated record. With these considerations in mind, I will let voters in my model go with the winner of the game with the mid-value probability p= 0.75, so that the algorithm equally emphasizes empha sizes the strength of schedule and the winning record of the teams.
The Mathematical Model Consider a competition in which N teams play an uneven paired schedule, meaning that not all teams play each other. For each team i, let ni be the total number of games games played played,, wi the number of wins, and li the number of losses. Because soccer allows tied games, we treat a tie as half a win and half a loss, so that ni = wi + li stays true. Also, consider V voters, with vi voters casting their single vote to team i, so that even though voters change their preferences, the total numbe r of voters remains constant,
v
i
V .
i
At the beginning of the algorithm each voter is randomly assigned a favorite team out of the N teams to be ranked. Then each of the N voters randomly picks a previous game played by his favorite team and, completely ignoring his current team preference, he casts his vote to the winner of that game with probability p, and goes with the loser with probabi probability lity 1- p. 26
Chapter 2: Ranking Algorithms
Andrei C. Grecu
Additionally, in order to put more emphasis on recent games, I make the probability of choosing a given game inversely proportional to the number of years since it took place. In order to do this I assign each game the value
1 Y k
, where Yk is the number
of years since the game took place, and I recalculate the weighted sums for the number of games, wins, and losses for each team. For example, assuming that team i played n games, won w of them, lost l of them, and tied in t of them, I calculate the discounted number of games, ni, wins, wi, and loses, li, for team i by summing the discounted value of each separate game and an d by treating ties as half-wins and half-losses, so that I maintain the equality ni = wi + li:
ni
n
k 1
wi
w
k
t
1
0.5*
k
k 1
Y k 1
l i
1
Y
l
1
0.5*
k 1
k
k 1
Y
t
1
Y k 1
Y k
Where: years since game k took place Y k is the number of years discount ounted ed numbe numberr of games, games, wins, wins, and and loss losses es n i , w i , and li are the disc n, w, and l are the real numer of games, wins, and losses for team i
In order to avoid rewarding teams for playing more games, I set the rate at which a voter considers a game played by his favored team i to be independent of the number ni of games played by the team. With this choice of constant rates, the expected rate rate of
27
Chapter 2: Ranking Algorithms
Andrei C. Grecu
change for the number of votes cast for each team in the random walk is characterized by the following homogenous system of linear differential equations:
V’ = G * V Where:
V is the column vector of the number of votes vi cast for each of the N teams G is a square matrix whose entries are derived from previous games such that:
G i p * li (1 p) * wi Gij 0.5 * Nij
2 * p 1 2
* Aij , i j
Where: number of games played played between between i and j Nij is the number A is the number number of times times i beat j minus minus the number number of times times j beat i ij
With this choice of matrix G, which encompasses enc ompasses all the outcomes of previous games between all teams, I am interested in the steady-state equilibrium that gives the expected number of voters vi who prefer each team:
G*V=O
This equilibrium equation has a unique solution V for any given p in the (0.5,1) interval. Indeed, with p strictly greater than 0.5, p > 0.5, the off-diagonal elements Gij 0.5 * N ij
2 * p 1 2
* Aij , i j of the matrix G are non-negative. If, in addition, the
28
Chapter 2: Ranking Algorithms
Andrei C. Grecu
underlying graph representing games played between teams consists of a single connected component, then V is the unique equilibrium solution, lies in the null-space of G and therefore is the eigenvector associated with the zero eigenvalue. Even though the equilibrium V does not imply constant flows of voters switching their preference from team i to team j, it guarantees a constant number of random walkers voting for each team at any time. Therefore, by reducing the matrix G to its row-reduced row-reduced echelon form, we can easily compute the expected number of voters supporting each team, and implicitly rank the teams according to the overall percentage of votes received.
Application to the 2006 Soccer World Cup In order to rank the teams that qualified for the 2006 soccer World Cup, I applied the algorithm described above to the 477 games among the 55 national teams that took part in at least one World Cup between 1994 and 2006. Using Excel to compute the discounted number of games, wins, and loses for each team, I constructed the matrix G used to find the steady-state equilibrium. Then using MATLAB to find the row-reduced echelon form of G, I determined the relative strength of each of the 55 teams considered. Choosing the 32 teams qualified for the 2006 World Cup out of the 55 teams, and scaling the results on a 100-point scale, I obtained o btained the following ranking of teams:
29
Chapter 2: Ranking Algorithms
Andrei C. Grecu
2006 WORLD CUP PARTICIPANTS
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
CZECH REPUBLIC FRANCE BRAZIL PORTUGAL ENGLAND GERMANY ITALY ARGENTINA NETHERLANDS SPAIN MEXICO SWEDEN AUSTRALIA CROATIA PARAGUAY USA SERB SERBIA IA AND MONTE MONTENE NEGR GRO O UKRAINE JAPAN IRAN SOUTH KOREA TUNISIA POLAND GHANA SWITZERLAND COSTA RICA IVORY COAST TOGO ECUADOR SAUDI ARABIA TRINIDAD AND TO TOB BAGO ANGOLA
30
100.00 99.97 98.44 88.17 82.51 82.05 80.78 80.61 77.64 77.45 72.23 69.22 68.97 64.09 60.73 60.46 59.9 59.99 9 56.60 53.03 52.84 48.27 47.59 45.40 41.04 40.50 38.50 37.10 35.53 34.25 33.17 29.12 27.08
Chapter 2: Ranking Algorithms
Andrei C. Grecu
Artificial neutral networks are biologically inspired algorithms whose patternmatching and learning capabilities address problems otherwise difficult to solve using standard computational and statistical methods. By mathematically modeling the processes that take place in the human brain, artificial neural networks “learn” from a given data set, thus being able to develop a solution without needing a predefined solution algorithm. However, despite their ability to explain complex relationships between inputs and outputs, artificial neural networks are based on a simple model of biological neural networks, which makes them intuitively appealing. Technically speaking, an artificial neural network is characterized by three elements: i) its processing its processing elements analogous to the neurons in the neural networks, ii) its connections among processing elements and their corresponding weights, weights, which algorithm, which contain the information stored in the network, and iii) its evolution algorithm, gives the manner in which the neural network searches and converges to a solution. In an attempt to model the way in which the human brain processes visual data and learns to recognize objects, psychologist p sychologist Frank Rosenblatt designed the first artificial neural network in 1958. Since then, artificial neural network have been applied in a variety of applications, including bonds rating, target marketing, evaluating loan applications, predicting stock movements, and certifying signatures. Another application of neural networks is the rating of sport teams based on previous results, which I will discuss in this chapter.
31
Chapter 2: Ranking Algorithms
Andrei C. Grecu
The Intuition Wilson (1995) discusses the use of a neural network to rank the NCAA Division 1-A college football teams. In his approach, the processing elements correspond to the football teams to be ranked, with the value of each neuron representing the relative value of one team. Also, connections between be tween processing elements correspond to games played between teams, with the weights of the connections being derived from the score of previous encounters. Finally, the evolution of the neu ral network is characterized by the transfer and summation functions: On one hand, the transfer function look at one game at a time, takes as inputs the value of the opponent and the weight of the connection between the two teams, and returns an updated value of the team as a result of that game. On the other hand, the summation the summation function averages the results results of the transfer function function for each game, thus giving the updated value of the team. Eventually, the algorithm stops when the updated values of all neurons stabilize over time, when a new iteration does not change the relative value of each neuron. Using the same setting, I look at the soccer World Cup and design an artificial neural network that ranks the qualified teams based on their previous results. In order to do this, I will change the approach a pproach presented in Wilson (1995) in two important wa ys. First, in assigning weights to the connections corresponding to previous games, I will factor in both the score and the date of the game, thus putting decreasing emphasis on games that took place longer time ago. Second, I will implement a socc er-intuitive transfer function, which updates the values of p rocessing elements (teams) based on the weight of their connection (score and date of previous game).
32
Chapter 2: Ranking Algorithms
Andrei C. Grecu
Consider the following history of games between soccer national teams, in which the arrow from GER to USA means that team GER beat team USA, and WGER,USA is a weight that increases with the difference in goals scored by GER and USA, and decreases with the number of years since the game took place:
GER
WGER,USA
WSPA,GER
USA
WUSA,JAP WUSA,SPA
SPA
WGER,JAP JAP
Figure 3: Neural 3: Neural network setting with 4 neurons representing teams, and 5 connections representing previous games
Taking the processing elements and connections as inputs, an evolution algorithm updates the value assigned to each team at time t based on the team values at time t-1, as well as the connection weights, which remain constant throughout the algorithm. For the situation pictured in Figure 3, the following updates are performed at each iteration, until the value assigned to each team does not change between incremental units of time:
33
Chapter 2: Ranking Algorithms
V t , GER
V t , USA
Andrei C. Grecu
FT (Vt 1, USA, WGER , USA) FT (Vt 1, JAP, WGER, JAP) F T (Vt 1, SPA, W SPA, GE R) 3 FT (Vt 1, GER, WGER, USA) FT (Vt 1, JAP, WUSA, JAP) F T (Vt 1, SPA, WUSA, SP A) 3 V t , SPA
V t , JAP
FT (Vt 1, GER, WSPA, GER) FT (Vt 1, USA, WUSA, SP A) 2 FT (Vt 1, GER, WGER, JAP) FT (Vt 1, USA, WUSA, JA P) 2
Where: transfer er funct function ion with with input inputss Vt-1 and W FT is the transf the val value of tea team ABC ABC at time time t Vt,ABC is the the wei weigh ghtt of of the the conne onnect ctio ion n bet betwe ween en tea team ABC ABC and and te team XYZ WABC,XYZ is the
Intuitively, the transfer function FT should i) increase the value of a team following a victory, ii) assign more value to a victory against a strong opponent than against a weaker opponent, iii) assign more value to a recent victory at a high score than to a close victory a while ago; iv) decrease the value of a team following a defeat, v) decrease more value following a defeat against a weak opponent than against a stronger opponent, and finally vi) decrease more value for a recent defeat at a high score than for a close defeat a while ago. Also intuitively, the initial value assigned to each team should not influence the final rankings. Thus, in order to avoid any bias resulting from the initial value assigned to each team, I assign all teams the same initial value and let the connection weights be the only factors determining the relative ranking of teams. With
34
Chapter 2: Ranking Algorithms
Andrei C. Grecu
these considerations in mind, I implement the following mathematical mode l for a neural network that ranks the teams qualified for the World Cup.
The Mathematical Model Consider N teams with a history of games among themselves, such that every team is connected to all other teams either through direct games or through a succession of games involving its direct opponents. We c onsider a corresponding artificial neural network with N neurons, such that each team is assigned a neuron in the network. Using the results of previous previous encounters among the N teams over the last 12 years, we compute the weights of all connections between team i and team j, such that clear, recent results are weighted more heavily than tight games, which took place longer ago:
Wij g ij * Yij Where: difference between the number of goals scored scored by team i and team j gij is the difference Y is a year coefficient coefficient equal to (12 - number of years years since the game took place) place) ij
Given the artificial neural network and the weights described above, we need to assign each team an initial value, before starting the evolution algorithm. As discussed in the previous section, we want the the relative values of the teams to be derived solely from the connection weights, rather than being influenced by the initial values assigned to the teams. Therefore, in order to reduce the bias resulting from arbitrary initial values, we
35
Chapter 2: Ranking Algorithms
Andrei C. Grecu
assign all teams an initial value of 50, and we implement an evolution algorithm that uses the connection weights as inputs to rank the teams on a 0 to 100 scale. At the heart of the evolution algorithm is the transfer function, which updates the value of each team based on the value of the opponent and the weight of the connection between the two teams. Intuitively, we want the transfer function to always increase the value of a team following a victory and decrease its value following a defeat. However, we want these changes to vary according to the strength of the opponent, the magnitude of the victory, and the number of years since the game took place. An intuitive transfer function that accomplishes the objectives discussed above is presented below:
0
Vi,t
maxV maxVi,t+1’
maxV maxVi,t+1 100
Figure 4: The value of team i can increase more following a victory against a strong opponent than a victory against a weaker opponent, maxV i,t+1 i,t+1 > maxV i,t+1 i,t+1 ’
Consider the case in which team i beat team j, so that the value of team i rises from Vi,t, at time t, to V i,t+1, at time t+1. At this point we model a transfer function that performs two steps in updating the value of team i: i) first, it computes the range of maxVi,t+1), as a function of the opponent’s possible updated values of team i, (min (minV Vi,t+1, maxV value V j,t, and then ii) it determines the exact updated value Vi,t+1 (minV minVi,t+1, maxV maxVi,t+1), by looking at the connection weight Wij. Such a transfer function generates the greatest increase in a team’s value following a recent, clear victory against a strong opponent, and
36
Chapter 2: Ranking Algorithms
Andrei C. Grecu
causes the smallest increase in value following a tight victory against a weak opponent, twelve years ago. First, in determining (min (minV Vi, t+1, maxV maxVi, t+1), assume that minV minVi, t+1 = Vi, t, or the “worst” a team can do in case of a victory is keep its previous value. va lue. Meanwhile, assume maxVi,t+1, or the maximum possible increase in the value of team i, is linearly that maxV dependent on the value of the opponent team, V j,t. Since the team values are bound to the interval [0,100], consider the two limit cases. On one hand, if team i wins against a 100value team, its updated value Vi, t+1 should be allowed to potentially increase all the way to
3 4
of the distance between Vi, t and 100, namely to the point Vi +
3 4
* (10 (100 0 – Vi). On
the other hand, if team i wins against a 0-value team, regardless of the score and date, its update value Vi, t+1 should not be allowed to increase by more than only
between Vi, t and 100, or the point Vi +
1 4
1 4
of the distance
* (10 (100 0 – Vi). Using these two values as
reference, and assuming a linear relationship between the maximum possible increase in maxVi,t+1, and the value of team j, V j,t, we derive the formula for the the value of team i, maxV maximum possible updated value of team i, for a given V j,t:
max Vivictory ,t 1
10 0 V ,t 20 0
* V j ,t
10 0 3 * Vi ,t
4
maxVi,t+1), or the interval of potential potential Second, once we compute (min (minV Vi,t+1, maxV updated values for team i, we look at the connection weight Wij to determine the exact updated value for team i within the interval. For large values of Wij, corresponding to recent, categorical victories, we want the transfer function to assign a big increase in the 37
Chapter 2: Ranking Algorithms
Andrei C. Grecu
value of team i. Conversely, for small values of Wij, corresponding to close victories that took place a while ago, we want a smaller increase in the value of team i within the interval (min (minV Vi,t+1, maxV maxVi,t+1). Therefore, assume that the transfer function returns the maximum updated value for team i, maxV maxVi,t+1, following a game with weight Wmax, the highest weight in the entire neural network. For all other connection weights smaller than Wmax, the transfer function function returns an update value for team i in the interval (minV minVi,t+1, maxV maxVi,t+1), according to the following formula:
Viv,tict1ory V , t
W ij W max
* (maxViv,tict1ory Vi , t )
Similarly, in a game in which team i lost to team j we want the value va lue of team i to decrease as a function of both the value of the opponent team V j,t and the connection weight Wij. Now, the “best” a loosing team can do is keep its current value, maxV maxVi, t+1 = Vi,t, in case it lost in a tight game, against a very strong team, and a long time ago. In the general case, following a defeat, the updated value of team i will lie in the interval whose upper limit is the previous value of the team, Vi,i, and whose lower limit is a linear function of the opponent’s value:
min Vi ,t 1 defeat
V ,t 20 0
* V j ,t
Vi ,t
4
Having computed the interval of possible updated values for team i, (min (minV Vi,t+1, maxV maxVi,t+1), we look at the connection weight Wij to determine the exact updated value for team i within the interval. For large values of Wij, corresponding to recent, categorical defeats, we want the transfer function to assign a big decrease in the value of team i.
38
Chapter 2: Ranking Algorithms
Andrei C. Grecu
Conversely, for small values of Wij, corresponding to close defeats that took place a while ago, we want a smaller decrease in the value of team i within the interval (min (minV Vi,t+1, maxV maxVi, t+1 ). Therefore, the transfer function, which returns the exact updated value of team I following a defeat, is a linearly dependent on the connection weight Wij:
Vid,tefe1at V , t
W ij W max
* (minVid,tefe1at Vi , t )
Finally, in case of a tie, we want to penalize the team with the higher value and reward the team with the lower value, since the former was expected to win, while the latter produced the surprise. However, since ties are on ly half-upsets, we only award half the increase computed in case of a victory, and subtract half the value computed in case of a defeat. Also, in a tie team i and team j score the same number of goals, so the connection weight degenerates to Wij = 0. Therefore, I choose a transfer function that updates the team values as a function of the number or years since the game against team j, Yij. Assuming that team i has a greater value than team j at time t, and a nd supposing that we are taking into account games at most 12 years ago, the transfer function increases the value of team j and decreases the value of team i as follows:
tie j ,t 1
V
tie i ,t 1
V
Vj,t Vi,t
( maxVjvictory ,t+1 -2*Vj,t ) 24 (2 * minVidefeat ,t+1 -Vi,t ) 12
* Yji
* Yij
Using the transfer functions presented here, the evolution algorithm looks at all the games played by team i, computes the updated value of team i corresponding to each
39
Chapter 2: Ranking Algorithms
Andrei C. Grecu
game played, and then averages the separate values to return the overall update value of team i. This process in repeated until the values of the teams stabilize between incremental units of time. Finally, in order to avoid having very large or negative values assigned to certain teams, we normalize the values assigned to each team after a fter each iteration, such that the highest ranked team has a value of 100 and all other teams have a proportional value within the interval (0,100).
Application to the 2006 Soccer World Cup In order to rank the teams that participate in the 2006 World Cup I formed a neural network whose neurons correspond to the 55 soccer national teams that qualified for at least one World Cup between 1994 and 2006. Next, I determined the weights of the intra-neural connections by examining 477 previous games among the 55 teams, played either in previous World Cups or in competitive regional tournaments that replicate the World World Cup environmentenvironment- neutral neutral field, strong strong motivation. motivation. Using the C++ program included in Appendix A, I implemented the transfer functions explained in the previous section to update the value of each team based on the value of the opponent and the connection weight corresponding to each game, I obtained the convergent solution for the 2
value of each team after 178 iterations :
2
Included in Appendix B is a graph that shows how the algorithm converges to stable values for each team
40
Chapter 2: Ranking Algorithms
Andrei C. Grecu
WORLD CUP 2006 PARTICIPANTS
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
BRAZIL ITALY FRANCE ARGENTINA ENGLAND GERMANY PORTUGAL NETHERL AND SPAIN SWEDEN CZECH REPUBLIC MEXIC AUSTRALIA PARAGUAY JAPAN IRAN IVORY COAST YUGOSLAVIA POLAND TOGO CROATIA SOUTH KOREA COSTA RICA UKRAINE GHANA USA TUNISIA SAUDI ARABIA TRIN TRINID IDAD AD AND TOBAG TOBAGO O SWITZERLAND ECUADOR ANGOLA
41
100.00 93.32 85.65 84.22 82.09 81.55 80.24 79.88 79.47 77.87 77.66 76.20 69.20 64.69 64.23 63.48 61.98 61.19 61.08 55.92 53.64 52.17 51.40 50.18 46.83 45.23 44.96 43.93 28.5 28.58 8 25.51 19.72 17.39
Chapter 3: A Poisson Model for World Cup Games
Andrei C. Grecu
Chapter 3: Chapter 2 presents three algorithms for ranking soccer na tional teams qualified for the World Cup, based on historical games among the teams prior to the start of the competition. Next, this chapter assesses the predictive power of the rankings developed in the previous chapter by looking at the relationship relationship between initial rankings and game results in the case of the 2002 World Cup in South Korea and Japan. Thus, in order to assess the predicting accuracy of the rankings, I will employ the ranking algorithms from Chapter 2 using historical games available at the beginning of the 2002 World Cup. Using the 2002 rankings, I will then look at the games from the 2002 2 002 World Cup and examine how the value assigned to each team predicts the number of goals scored by that team during the tournament. Implicitly, for every game I calcu late the probability of winning assigned to each team and a nd I compare my predictions with the actual ac tual results from South Korea and Japan. Finally, I explain how the structure of games from the World Cup makes it difficult to compute conditional proba bilities for each team to win entire tournament, which is why I choose to run the simulations of the World Cup in Chapter Cha pter 4.
Rankings at the start of the 2002 World Cup Before looking for a probability distribution for the number the goals scored in the 2002 World Cup games, we need to determine the relative rankings of the teams as of June 2002, the start date for the tournament. In order to rank the teams at the start of the World Cup, we apply the algorithms discussed in Chapter 2 to a history of games played
42
Chapter 3: A Poisson Model for World Cup Games
Andrei C. Grecu
among soccer national teams between 1990 and 2002. Thus, i) we put decreasing emphasis on games that took place longer time ago, ii) we only consider con sider competitive soccer national teams that qualified for at least one World Cup between 1990 and 2002, and iii) we only look at games that were part of a competitive tournament, in which teams play their best players at full potential. More precisely, the dataset consists of 473 games played between 55 teams at the following tournaments: 1. World World Cup: Cup: France France 1998, 1998, USA USA 1994, 1994, Ital Italy y 1990 1990 2. African African Nations Nations Cup: Cup: Mali 2002, Ghana Ghana 2000, Burkina Burkina Faso 1998, South South Africa 1996, Tunisia 1994, Senegal 1992, Algeria 1990 3. Asian Cup: Lebanon Lebanon 2000, 2000, United United Arab Arab Emirate Emiratess 1996, Japan 1992 1992 4. CONCACAF CONCACAF Gold Gold Cup: Cup: USA USA 2002, USA 2000, 2000, USA USA 1998, USA 1996, 1996, Mexico 1993, USA 1991 5. Copa America America:: Colombia Colombia 2001, 2001, Paraguay Paraguay 1999, Bolivia Bolivia 1997, Uruguay Uruguay 1995, Ecuador 1993, Chile 1991 6. European European Championshi Championship: p: Belgium/N Belgium/Netherl etherlands ands 2000, 2000, England England 1996, Sweden 1992
By applying the three ranking algorithms algorithms to this set of games, with their corresponding dates and scores, I obtained the following rankings of soccer national teams at the beginning of the 2002 World Cup:
43
Chapter 3: A Poisson Model for World Cup Games
Andrei C. Grecu
2002 WORLD CUP EIGENVECTOR RANKING
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
ITALY 100.00 ARGENTINA 95.66 GERMANY 92.36 BRAZIL 90.81 MEXICO 82.73 USA 76.02 ENGLAND 74.46 FRANCE 64.60 URUGUAY 62.42 CAMEROON 61.64 SPAIN 61.27 SWEDEN 60.79 PARAGUAY 54.63 BELGIUM 49.56 RUSSIA 47.92 DENMARK 46.81 COSTA RICA 45.92 NIGERIA 44.96 ECUADOR 44.48 SOUTH KOR ORE EA 42.81 CROATIA 39.88 PORTUGAL 34.62 IRELAND 31.84 SAUDI ARAB RABIA 31.02 TUNISIA 28.65 SOU SOUTH AFRICA ICA 25. 25.76 JAPAN 23.17 TURKEY 21.79 SENEGAL 12.75 SLOVENIA 8.78 CHINA 7.78 POLAND 7.00
44
Chapter 3: A Poisson Model for World Cup Games
2002 RANDOM WALKERS RANKING
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
FRANCE 100.00 PORTUGAL 82.90 ITALY 75.27 CROATIA 63.80 BRAZIL 59.04 GERMANY 54.69 ENGLAND 51.22 ARGENTINA 48.08 SPAIN 46.93 IRELAND 45.06 MEXIC 42.30 DENMARK 40.77 NIGERIA 40.63 CAMEROON 39.67 SENEGAL 39.42 USA 36.58 BELGIUM 36.44 SWEDEN 34.58 PARAGUAY 33.33 URUGUAY 32.84 TURKEY 32.37 RUSSIA 32.19 SLOVENIA 31.16 SOU SOUTH AFRICA ICA 30. 30.81 JAPAN 26.20 ECUADOR 22.84 COSTA RICA 22.60 TUNISIA 21.40 SAUDI ARAB RABIA 20.82 SOUTH KOR ORE EA 19.13 CHINA 10.53 POLAND 10.00
45
Andrei C. Grecu
Chapter 3: A Poisson Model for World Cup Games
2002 NEURAL NETWORK RANKING
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
ITALY 100.00 PORTUGAL 96.20 FRANCE 94.56 BRAZIL 84.97 CROATIA 82.69 DENMARK 47.03 ENGLAND 46.11 NIGERIA 45.64 GERMANY 45.56 SPAIN 44.91 ARGENTINA 44.91 SOU SOUTH AFRICA ICA 37. 37.88 CAMEROON 33.92 MEXIC 31.44 PARAGUAY 30.82 SLOVENIA 29.60 SWEDEN 28.87 TURKEY 26.81 URUGUAY 26.37 TUNISIA 25.26 SENEGAL 24.97 BELGIUM 23.87 JAPAN 23.41 RUSSIA 19.60 USA 19.56 IRELAND 19.16 ECUADOR 19.05 COSTA RICA 18.74 SOUTH KOR ORE EA 18.33 SAUDI ARAB RABIA 18.22 CHINA 2.57 POLAND 2.00
46
Andrei C. Grecu
Chapter 3: A Poisson Model for World Cup Games
Andrei C. Grecu
A Poisson Distribution of Goals 3
In a typical soccer game two teams play in continuous time for 90 minutes , with the team that scores more goals winning the game. In order to score a goal, the two teams alternate possession of the ball, which allows them to attack the opponent team’s goal and perhaps score. If attacks are assumed a ssumed to be independent, then the distribution of goals scored is negative binomial, as claimed in Pollard (1977), Moroney (1951), and Reed, Pollard, and Benjamin (1971). However, the probability p that any attack will result in a goal is relatively small. For example, a total of 161 goals were scored in the 64 games from the 2002 World Cup, for an average of approximately 2.5 goals per game or one goal every 40 minutes. Consequently, even thought there are a lot of attacks in a game, there is a small probability p of scoring in any given attack. Therefore, given the small probabilities of success p, the Poisson approximation is a good fit for the number of goals scored in a soccer game, as showed in Lee (1997), Maher (1982), and Dyte and Clarke (2000). Using an approach similar to Dyte and Clarke (2000), my model for the number of goals scored by each team in a given game relies on two assumptions. First, it assumes that the number of goals scored by each team in a game is Poisson distributed. Second, it assumes that the number of goals scored by a team is independent of the number of goals scored by the opposing team. Given these two assumptions, I am interested in measuring the rankings’ predictive power with respect to the number of goals scored by team i in a game against team j. In particular, I am looking for the following relationship:
3
59 out of the 64 games played in the 2002 World Cup ended after 90 minutes. Thus, even though some games from the knockout stage of the World Cup go into extra time to allow teams to score the winning goal, I will focus my analysis on results after 90 minutes of play.
47
Chapter 3: A Poisson Model for World Cup Games
ln(
j
Andrei C. Grecu
) x0 x1 * Ri x2 * R j
Where: goa ls scored by team i in a game against a gainst team j g ij is the number of goals R is the rating of team i at the time of the game i
Having agreed on the model, I looked at the results from the games played in the 2002 World Cup. Even though only 64 games were played in the tournament, they correspond to 128 observations, since for every game we are interested in predicting the number of goals scored by each team. Using MATLAB to fit fit a Poisson regression to the data, I obtained the following sets of coefficients, corresponding to each of the three rankings developed:
1. Eige Eigenve nvect ctor or Rank Rankin ing: g: Value
t value
X0 0.3532 X1 0.0071 X2 -0.0 -0.010 106 6 Deviance 140.9387 Dispersion 1.0134
1.5902 2.4309 -3.4 -3.453 533 3
p value 0.1118 0.0151 0.00 0.000 06
2. Rando Random m Wal Walker kerss Ran Ranki king: ng: Value
t value
p value
X0
0.6224
2.279
0.0227
X1
0.0046
1.1241
0.261
X2
-0.0 -0.015 153 3
-3.0 -3.081 812 2
0.00 0.002 21
Deviance 145.7578 Dispersion 1.0237
48
Chapter 3: A Poisson Model for World Cup Games
Andrei C. Grecu
3. Neur Neural al Netw Networ ork k Rank Rankin ing: g: Value
t value
p value
X0 X1
0.3976
1.9985
0.0457
0.0039
1.3371
0.1812
X2
-0.0 -0.009 091 1
-2.4 -2.477 773 3
0.01 0.013 32
Deviance Dispersion
148.7348 1.0429
All three regressions fit the data well, and give a positive coefficient in front of R i, the rating of team i, and negative coefficient in front of R j, the rating of the opponent team j. These results confirm the basic intuition, which says that the higher the rating of team i, and the lower the ranking of team j, the higher the expected number of goals scored by team i. Also, the negative coefficients in front of R j are significantly larger in absolute terms than the positive coefficients in front of R i for all three regressions, suggesting that in determining the number of goals scored by a team, the rating of the opponent is more important than the team’s own ranking, at least in the 2002 World Cup. Using the regression coefficients to compute the expec ted number of goals scored by a team in a certain game and then using that expectation as a mean, we can compute marginal probabilities for each team’s Poisson distribution of goals scored in a certain game. If we further assume that the number numbe r of goals scored by team i is independent indepe ndent of the number of goals scored by team j, we can compute the probability of any specific result for a given game:
i
exp( x0 x1 * Ri x2 * R j )
j
exp( 0 x1 * R j x2 * Ri )
49
Chapter 3: A Poisson Model for World Cup Games g ij
P( gij & g ji )
i
Andrei C. Grecu
*e
g ji
i
*
g ij !
j
*e
j
g ji !
As an example, consider the game between Argentina and Nigeria from the 2002 World Cup. The Eigenvector ratings computed at the beginning of the tournament for the two teams are R ARG ARG = 95.66 and R NIG = 61.64. Using these ratings, we compute the expected number of goals scored by each team in their encounter: GER
exp( x0 x1 * RARG x2 * RNIG ) exp( exp(0 0.35 .3532 0.00 .0071*95 71*95.6 .66 6 0.01 .0106*61.6 6*61.64 4) 1.46
NIG
exp( x0 x1 * RNIG x2 * RARG ) exp( exp(0 0.35 .3532 0.00 .0071*61 71*61.6 .64 4 0.0 0.0106*95 106*95.6 .66 6) 0.79
Thus, before the game we expected Argentina to score 1.46 goals and Nigeria only 0.79 goals in their direct game. Using these expectations as means, we can compute the probability assigned to each score in the game between Argentina and Nigeria, as long as we assume that the number n umber of goals scored by each team is Poisson distributed and the number of goals scored by Argentina is independent of the number of goals scored by Nigeria:
P (1-0 Argentina win)
g
ARARGG
*e
ARG
g ARG !
g
*
NINGIG
*e
NIG
g NIG !
1.46 * e1.46 * e 0.79 0.149 This is the probability of a 1 - 0 Argentina victory over Nigeria. Nigeria. Performing similar computations, we can compute the proba bility assigned to any score in the game
50
Chapter 3: A Poisson Model for World Cup Games
Andrei C. Grecu
between Argentina and Nigeria. The table below contains the probabilities assigned to any result in which both teams score between 0 and 9 goals:
NIG ARG 0 1 2 3 4 5 6 7 8 9
0
1
2
3
4
5
6
7
8
9
0.0853 0.0607 0.1493 0.1062 0.1306 0.0929 0.0762 0.0542 0.0333 0.0237 0.0117 0.0083 0.0034 0.0024 0.0009 0.0006 0.0002 0.0001 0.0000 0.0000
0.0216 0.0378 0.0331 0.0193 0.0084 0.0030 0.0009 0.0002 0.0000 0.0000
0.0051 0.0090 0.0078 0.0046 0.0020 0.0007 0.0002 0.0001 0.0000 0.0000
0.0009 0.0016 0.0014 0.0008 0.0004 0.0001 0.0000 0.0000 0.0000 0.0000
0.0001 0.0002 0.0002 0.0001 0.0001 0.0000 0.0000 0.0000 0.0000 0.0000
0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
Table 1: Probability 1: Probability distribution of scores for Argentina vs. Nigeria
In this example, it happens to be that the score with the highest h ighest probability (14.93%), 1-0 for Argentina, was the real result of the game between the two teams. However, given the relatively small probability assigned to each separate result, it does not happen very often that the result with the highest probability is the real result of the game. A more meaningful prediction regarding the game between Argentina and Nigeria would be the probability assigned to a victory of Argentina against Nigeria, at any score. We are able to compute this probability by summing all the probabilities below the main diagonal- the probabilities assigned to all outcomes in which Argentina scores scores at least one goal more than Nigeria. Similarly, we compute the probability of a Nigeria victory by summing the probabilities above the main diagon al. Finally, the sum of the probabilities on the main diagonal gives the probability of a draw. In the case of the game between
51
Chapter 3: A Poisson Model for World Cup Games
Andrei C. Grecu
Argentina and Nigeria, these aggregate probabilities p robabilities were: Argentina wins 62.30%, Nigeria wins 14.75%, game ends in a tie 22.95%.
Predictive Accuracy for the 2002 World Cup Games Using the methodology discussed above, I compute c ompute three sets of predictions for the 2002 World Cup games, each corresponding to one of the three rankings presented in this thesis: the Eigenvector ranking, the Random Walker ranking, and the Neural Network ranking. Then, by looking at a t the actual results of the games from the 2002 tournament I check whether my predictions turned out to be correct correct or not. However, given the relative frequency of unexpected unexpe cted results in the soccer World Cup, any initial ranking will inherently generate a relatively large number of wrong predictions. Therefore, a more meaningful assessment of the predictive p ower of my rankings would be a comparison with the predictions of another party interested in correctly forecasting the outcome of World Cup games. Naturally, I decided to compare my predictions with the predictions implied by the odds for winning the World Cup assigned to each team by the sports bookies. More exactly, I assumed that whene ver a team with higher odds for the World Cup plays a team with lower such odds, the former team is the bookies’ favorite. Below are the odds for winning the 2002 World Cup assigned to each team by the st
London bookmakers, as published in The London Times on the 31 of May, 2002, the day before the start of the tournament:
52
Chapter 3: A Poisson Model for World Cup Games
THE LONDON TIMES RANKING
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
$ FOR $1 BET P (WIN THE WC) FRANCE 3.33 20.54% ARGENTINA 4.5 15.20% ITALY 4.5 15.20% BRAZIL 6 11.40% SPAIN 8 8.55% ENGLAND 12 5.70% PORTUGAL 14 4.89% GERMANY 16 4.28% CAMEROON 33 2.07% RUSSIA 66 1.04% JAPAN 66 1.04% PARAGUAY 80 0.86% SWEDE 80 0.86% CROATIA 80 0.86% POLAND 80 0.86% NIGERIA 100 0.68% URUGUAY 100 0.68% TURKEY 100 0.68% MEXIC 125 0.55% BELGIUM 125 0.55% IRELAND 125 0.55% DENMARK 125 0.55% ECUADOR 150 0.46% SOUTH KOREA 150 0.46% SENEGAL 200 0.34% SLOVENIA 300 0.23% USA 300 0.23% SOUTH AFRICA 300 0.23% COSTA RICA 400 0.17% TUNISIA 500 0.14% SAUDI ARABIA 750 0.09% CHINA 750 0.09%
53
Andrei C. Grecu
Chapter 3: A Poisson Model for World Cup Games
Andrei C. Grecu
In order to compare my predictions with the London Times predictions, I look at the 64 games from the 2002 World Cup and I reward the predictions p redictions that turned out to be correct, while I penalize the predictions that turned out to be wrong. Specifically, I assign a score of –1 to any an y game in which a team with smaller winning probability defeated a team with higher winning probability, since this disagrees with the predictions. Likewise, I assign a score of +1 to any a ny game won by the team with higher winning probability, since this agrees with the predictions. Intuitively, a tie is worth 0, since the team with higher winning probability failed to win the game, but it did not actually lose to the underdog, either. Also, we define the predictive the predictive score of a set of predictions as the sum of the separate scores assigned to each game, given that set of predictions. The higher h igher the predictive score, the more accurate the predictions are, since we have decided to award a point for every correct forecast and subtract one point for every erroneous prediction. For example, using the predictions derived from the London Times rankings and looking at the 2002 World Cup results, we conclude conc lude that the favorite team actually won the game in 32 out of the 64 games, the underdog produced the surprise and won in 18 games, while 14 games ended in a tie. Thus, with 32 correct predictions, 18 wrong ones, and 14 ties, the London Times set of predictions gets a predictive score equal to 32*1 – 18*(-1) + 14*0 = 14. Similarly, we compute the predictive score of the three sets of predictions derived from my rankings. Using the Eigenvector set of predictions we are able to correctly predict the winner of 35 games, while we are wrong for 15 games, g ames, for a predictive score of 35*1 + 15*(-1) = 20. Using the Random Walkers set of predictions we again correctly
54
Chapter 3: A Poisson Model for World Cup Games
Andrei C. Grecu
predict the winner for 35 of the games, while we are wrong in 15 cases, for a predictive score of 35*1 + 15*(-1) = 20. Finally, with the Neural Network set of predictions we are right in predicting the winner of 33 games, while we are mistaken in our prediction for 17 games, for an aggregate predictive score of 33*1 + 17*(-1) = 16. It turns out that my predictions unanimously outperform the London Times predictions in picking the winner of separate World Cup games. In other words, if I had chosen the winner of each game according to my rankings, I would have done better than if I had chosen the winner of each game by looking in the London Times and picking the team with the higher assigned odds for winning the World Cup.
55
Chapter 3: A Poisson Model for World Cup Games
Andrei C. Grecu
Table 2: For 2: For each of the 64 games from the 2002 World Cup, the three sets of predictions derived from each of the three rankings presented in Chapter 2 (a Score of 1 means the prediction was right , a –1 means it was wrong; 0 is a tie )
PREDICTIONS DERIVED FROM THE RANKINGS: Eigenvector Random Walkers ers Neural Netwo worrk Times 64 WC Games Winner: Winner: Winner: A B Result A B Tie Score A B Tie Score A B Tie Score Score FRA SEN 0 1 65% 14% 14% 21% -1 65% 11% 24% -1 62% 15% 23 23% -1 -1 -1 URU DE D EN 1 2 44% 28% 27% -1 32% 41% 27% 1 29% 45% 27% 1 -1 FRA URU 1 2 37% 35% 35% 28% -1 70% 9% 21% -1 61% 15% 24 24% -1 -1 -1 SEN DEN 1 2 2 0% 0% 56% 24% 1 35% 37% 27% 1 28% 45% 27% 1 1 FRA DEN 1 2 45% 27% 27% 27% -1 64% 11% 24% -1 52% 20% 27 27% -1 -1 -1 SEN URU 3 3 15% 64% 21% 0 41% 33% 27% 0 36% 37% 26% 0 0 PAR SAF 2 2 52% 23% 25% 0 38% 35% 26% 0 34% 39% 27% 0 0 SPA SLO 3 1 66% 14% 20% 1 46% 28% 27% 1 42% 31% 27 27% 1 1 SPA PAR 3 1 39% 33% 28% 1 44% 29% 27% 1 42% 31% 27 27% 1 1 SLO SAF 0 1 28 2 8% 47% 25% 1 37% 37% 26% -1 33% 40% 27% 1 -1 SLO PAR 1 3 16 1 6% 62% 22% 1 36% 38% 26% 1 36% 37% 27% 1 1 SPA SAF 3 2 56% 20% 24% 1 46% 27% 26% 1 39% 34% 28 28% 1 1 BRA TUR 2 1 73% 10% 18% 1 52% 22% 26% 1 58% 17% 25 25% 1 1 CHN CRI 0 2 19 19% 58% 23% 1 30% 46% 23% 1 31% 44% 25% 1 1 BRA CHN 4 0 80% 6% 14% 1 67% 13% 20% 1 69% 12% 20 20% 1 1 TUR COS 1 1 25% 50% 25% 0 44% 31% 25% 0 40% 34% 26 26% 0 0 BRA CRI 5 2 59% 17% 24% 1 58% 18% 24% 1 62% 15% 23 23% 1 1 TUR CHN 3 0 45% 30% 25% 1 53% 25% 23% 1 48% 28% 25 25% 1 1 SKO POL 2 0 57% 20% 23% 1 45% 32% 23% 1 44% 31% 25 25% 1 -1 USA PTG 3 2 58% 18% 24% 1 15% 60% 25% -1 13% 65% 22% -1 -1 POL PTG 0 4 23 2 3% 53% 24% 1 7% 78% 15% 1 10% 72% 18% 1 1 SKO USA 1 1 2 1% 1% 54% 26% 0 27% 49% 24% 0 37% 38% 26% 0 0 POL USA 3 1 9% 74% 17% -1 22% 55% 22% -1 30% 45% 25% -1 1 SKO PTG 1 0 41% 32% 27% 1 9% 72% 19% -1 13% 65% 22% -1 -1 GER SA S AU 8 0 68% 12% 20% 1 58% 19% 24% 1 48% 27% 26 26% 1 1 IRE CAM 1 1 22 2 2% 53% 25% 0 39% 33% 28% 0 31% 43% 26% 0 0 GER IRE 1 1 68% 12% 20% 0 41% 30% 29% 0 47% 27% 26 26% 0 0 SAU CAM 0 1 22% 53% 25% 1 26% 49% 25% 1 31% 43% 26% 1 1 GER CA CAM 2 0 51% 22% 27% 1 44% 28% 28% 1 41% 32% 27 27% 1 1 SAU IRE 0 3 36 36% 37% 27% 1 23% 52% 24% 1 37% 37% 26% 1 1 ARG NIG 1 0 62% 15% 23% 1 40% 32% 28% 1 36% 36% 28% -1 1 ENG SW SW E 1 1 43% 29% 28% 0 46% 27% 27% 0 43% 30% 27 27% 0 0 ARG EN E NG 0 1 46% 26% 29% -1 34% 37% 29% 1 36% 36% 28% 1 -1 NIG SW E 1 2 2 8% 8% 44% 27% 1 40% 33% 27% -1 43% 30% 27 27% -1 1
56
Chapter 3: A Poisson Model for World Cup Games ARG NIG CRO ITA ITA ECU ECU ITA JAP RUS JAP BEL BEL JAP GER DEN SW E SPA MEX BRA JAP SKO ENG GER SKO SEN SKO BRA SKO BRA
SW SW E ENG M EX EX ECU CRO MEX CR C RO MEX BEL TUN RUS TUN RUS TUN PA P AR ENG SEN IRE US U SA BEL TUR ITA BRA US U SA SPA TUR GER TUR TUR GER
1 0 0 2 1 1 1 1 2 2 1 1 3 2 1 0 1 1 0 2 0 2 1 1 0 0 0 1 2 2
1 0 1 0 2 2 0 1 2 0 0 1 2 0 0 3 2 1 2 0 1 1 2 0 0 1 1 0 3 0
53% 20% 22 2 2% 52% 17% 59% 65% 13% 67% 12% 12% 19% 56% 39% 34% 43% 27% 24 24% 51% 47% 27% 24 2 4% 50% 48% 26% 37% 35% 34 3 4% 40% 55% 19% 23% 51% 63% 15% 52% 22% 39% 32% 32% 57% 18% 38% 36% 13 13% 66% 2 8% 8% 43% 43% 28% 27 2 7% 46% 3 2% 2% 42% 15% 62% 73% 10% 48% 26% 34% 36% SCORES:
26% 26% 24% 22% 21% 25% 27% 29% 25% 26% 25% 26% 28% 26% 26% 26% 21% 25% 29% 25% 26% 22% 29% 29% 27% 26% 23% 18% 26% 30%
0 0 1 1 -1 1 1 0 0 1 -1 0 1 -1 1 1 -1 0 -1 1 -1 -1 1 1 0 1 1 1 -1 -1
44% 29% 27% 30% 42% 28% 47% 24% 28% 66% 12% 21% 39% 28% 33% 26% 49% 25% 16% 61% 23% 53% 19% 28% 31% 43% 26% 44% 31% 25% 34% 41% 26% 47% 28% 25% 39% 34% 26% 41% 35% 25% 49% 25% 27% 30% 42% 28% 34% 39% 27% 37% 35% 29% 40% 33% 27% 49% 24% 27% 33% 41% 26% 11% 69% 20% 31% 39% 30% 46% 26% 27% 22% 55% 24% 41% 32% 27% 18% 59% 23% 52% 22% 26% 29% 46% 25% 37% 32% 31%
20
Andrei C. Grecu 0 0 -1 1 -1 1 -1 0 0 1 -1 0 1 1 1 1 1 0 -1 1 1 -1 1 1 0 -1 1 1 1 1
20
57
43% 30% 27 27% 36% 36% 28% 55% 19% 26 26% 66% 12% 21 21% 40% 28% 32 32% 32% 42% 26% 16% 61% 23% 61% 15% 24 24% 37% 37% 26% 35% 39% 26% 39% 35% 26 26% 36% 37% 26% 39% 35% 26 26% 36% 38% 26% 42% 31% 27 27% 36% 36% 28 28% 38% 35% 26 26% 47% 27% 26 26% 42% 32% 26 26% 59% 17% 24 24% 35% 38% 26% 12% 67% 21% 23% 50% 28% 47% 27% 26 26% 27% 47% 26% 36% 38% 26% 27% 48% 26% 58% 17% 25 25% 34% 40% 26% 50% 22% 28 28%
0 0 -1 -1 1 -1 -1 1 -1 0 0 -1 1 0 1 -1 1 -1 -1 -1 0 -1 -1 1 1 -1 1 1 0 1 1 1 1 1
0 0 -1 1 -1 1 -1 0 0 1 -1 0 -1 1 1 1 -1 0 -1 1 -1 -1 1 1 0 1 1 1 1 1
16
14
Chapter 4: Simulating the Soccer World Cup
Andrei C. Grecu
Chapter 4:
Chapter 3 showed how the results of the ranking algorithms from Chapter 2 could be used to make predictions regarding the outcome of separate games from from the World Cup. Moreover, we have seen that for the 2002 World Cup games, picking winners using the three ranking methods outperformed the strategy of a lways going with the bookmakers’ favorite in each separate game. In this chapter we are going to shift our focus from separate World Cup games to the overall picture and determine the probability of winning the Cup assigned to each participating team. In particular, how is the probability of winning assigned to each team influenced by the particular schedule of games from the World Cup tournament? How does the initial draw advantage some teams, while disadvantaging other participants? Finally, which teams had a lucky draw for the World Cup final tournament and which teams were assigned the most difficult schedule? Given the structure of games in the World Cup, it turns out to be easier e asier to answer these questions by performing a simulation of the games from the tournament than by explicitly calculating conditional probabilities of certain events happening.
The Structure of the World Cup Tournament Tournament The soccer World Cup has a complex, two-phase structure. Initially, the 32 qualified teams are divided into eight groups of four. Even though this allocation is based b ased upon a draw, teams are also divided into four value-pools according to their previous
58
Chapter 4: Simulating the Soccer World Cup
Andrei C. Grecu
international results, and each group is restricted to exactly one team from each value pool. Theoretically, this structure advantages the highest ranked teams, which are seeded such that they play lower ranked teams in the group stage. However, teams that participate in the World Cup are of comparable value and during the six round robin games played in each group, teams have three games to demonstrate their superiority over the other teams from the group. At the end of the group games, the first criterion for ranking the teams is the number of points achieved by each team in the three group games; in case of a tie, the next criterion used to differentiate between teams is the g oal 4
difference record of each team. Given the final ranking, the top two teams from each group advance to the second round of the competition, which consists of a simple 16team knockout arrangement. Moreover, the entire structure of games of the knockout stage is predetermined, such that the winner of group A plays the runner up of group B, the runner up of group C plays the winner of group D, and so on all the way to the final. This particular structure essentially splits the teams in two different halves, such that teams from one half will not play teams from the o ther half until the final of the World Cup, assuming that both teams reach that phase. The tree structure that describes the schedule o f games from the knockout stage of the World Cup tournament is depicted below. Since 48 games are played in the group stage, the figure labels games G49 to G62, leading all the way to the final:
4
It very rarely happens that these two criteria are not sufficient sufficient for ranking the teams, but other criteria for differentiating exist, such as the number of red/yellow cards received by each team, or the official international ranking of the teams at the time of the play.
59
Chapter 4: Simulating the Soccer World Cup
Andrei C. Grecu
WORLD CUP GAME SCHEDULE
Last 16
WA
Quarter-Finals Semi-finals FINAL Semi-finals Quarter-Finals
G49
G51 W49
RB
G57 WC
W51
RA
W59
G50
RD
WB
G59
W57
G52 W50
Last 16
G62
W62
W52
WD RC
FINAL G53 RF
G58
G54 RH
W53
G61
W61
W58
W55 W60
G55 RE
G60
G56
W54
W56
RG
Figure 5: The tree structure describing the games from the World Cup tournament. E.g.: The winner of group group A (WA) plays the runner-up runner-up of group B (RB) in game G49 G49
In theory, we could explicitly calculate conditional probabilities of winning the Cup for each participating team. However, even if we calculated conditional probabilities involving winning probabilities in games against all other 31 teams from the World Cup, we would still run into difficulties when ranking teams after the group stage games, since goal difference required for tie-breaking make the exact score a requirement. Therefore, I choose to simulate the games from the World Cup using a method inspired from Winston, and Albright (2001).
60
Chapter 4: Simulating the Soccer World Cup
Andrei C. Grecu
The Simulation Model Assume that at the beginning of the World Cup the value of team i is Vi, the corresponding value from the Eigenvector rankings. This simulation models the outcome of a game between team i and team j by generating two numbers from two normal 2
distributions with equal variance σ : the first number Pi is drawn from the distribution 2
N(Vi, σ ) and characterizes team i’s performance in the game; the second number P j is 2
drawn from the distribution N(V j, σ ) and characterize team j’s performance in the game. 2
2
Given these two numbers generated from N(Vi, σ ) and N(V j, σ ), the score Sij of the game between team i and team j is defined as:
Sij Pi Pj = N(Vi , ) N(Vj , )
The equal variance σ of the two distributions determines how often will the sample from the population with the larger mean be lower than the sample from the population with the smaller mean. This probability is particularly important for our interpretation of the model, since a negative nega tive score Sij = Pi – P j < 0 for Vi > V j translates into a “surprise”, or a game in which the team with lower value V j defeated the team with higher value Vi. Therefore, for all the 48 games g ames from the group stage, I look at the 48 values of the “favorite” team in each game, Vf , and I look at the 48 values of the “underdog” team in each game, Vu. The Central Limit Theorem tells us that V Vu has approximately a normal distribution regardless of the underlying population distributions. Therefore, I compute the mean and variance of the two sample sizes, the “favorites” and 2
the “underdogs”, and I determine the variance σ required to make P (Vf – Vu > 0) = 0.75, assuming that 25% of the games in the World Cup are upsets.
61
Chapter 4: Simulating the Soccer World Cup
Andrei C. Grecu
Sample Sample Size Sample Sample Mean Sample Sample Variance Variance Favorites 48.00 63.67 380.51 Underdogs 48 48.00 28.31 295.17
Table 3: Mean and variance variance for the groups of teams with higher value value and lower value, respectively, before the 48 games from the group stage of the World Cup
2
Thus for σ for σ = 58, this simulation simulation model will produce an “upset”- a situation in which given Vi > V j, the returned sample Pi P j , with probability 0.25.
Simulating the 2002 World Cup Using the model and the parameters described above, I simulate the games from the 2002 World Cup. First, I simulate the 6 games from each group and for each team I aggregate the three scores corresponding to its games. Then I rank the teams according to their aggregate scores and, having obtained the top two teams in each group, I simulate the predefined schedule of games until u ntil I get the winner of the World Cup. Cu p. I repeat the simulation 10,000 times and I count the number of times each team won the World Cup, thus determining the probability of winning the World Cup for each team. Having computed the probability for winning the World Cup for each team, we want to determine which teams were advantaged, and which teams were disadvantaged by the World Cup draw. It is generally agreed upon the fact that a Round Robin tournament, in which each team plays every other team, gives a fair ranking of the teams. Therefore, I also simulate a Round Robin alternative to the World Cup. I only run the
62
Chapter 4: Simulating the Soccer World Cup
Andrei C. Grecu
Round Robin simulation 1000 times, since every tournament takes 992 games, as opposed to only 64 in the case of the World Cup, and I compute the probability of winning the Round Robin tournament for each team. Then by comparing the results of the World Cup simulatiuon with the results of the Round Robin simulation, I can determine which teams were mostly mostly advantages or disadvantaged by the World Cup draw. Here are the results results of the 2002 World Cup and Round Robin simulations- apart from Italy, Brazil shows the greatest World Cup advantage, or difference between its winning probabilities for the World Cup and for the Round Robin tournament, which might partially explain their successful advancement all the way to the final and to winning the Cup:
Team Teams s ITA ARG GER BRA MEX USA ENG FRA URU CAM SPA SW E PAR BEL RUS DEN CRI NIG ECU SKO CRO PTG IRE
2002 SIMULATION RESULTS: Round ound Robin obin Worl orld Cup 6.89% 6.69% 6.34% 6.25% 5.68% 5.13% 4.97% 4.29% 4.10% 4.08% 4.01% 3.97% 3.49% 3.22% 3.07% 2.96% 2.86% 2.80% 2.79% 2.63% 2.41% 1.99% 1.76%
15.32% 12.86% 9.90% 13.44% 8.06% 7.20% 4.50% 3.94% 3.62% 2.50% 3.76% 1.44% 2.28% 1.66% 1.40% 1.34% 1.58% 0.64% 0.66% 1.08% 0.46% 0.52% 0.32%
63
Worrld Cup Advan Wo dvanttage age 8.43% 6.17% 3.56% 7.19% 2.38% 2.07% -0.47% -0.35% -0.48% -1.58% -0.25% -2.53% -1.21% -1.56% -1.67% -1.62% -1.28% -2.16% -2.13% -1.55% -1.95% -1.47% -1.44%
Chapter 4: Simulating the Soccer World Cup SAU TUN SAF JAP TUR SEN SLO CHN POL
1.75% 1.64% 1.34% 1.19% 1.11% 0.41% 0.13% 0.05% 0.00%
Andrei C. Grecu
0.24% 0.34% 0.34% 0.18% 0.20% 0.10% 0.04% 0.02% 0.06%
-1.51% -1.30% -1.00% -1.01% -0.91% -0.31% -0.09% -0.03% 0.06%
Predictions for the 2006 World Cup Running the simulation for the 2006 World Cup shows Brazil as the big favorite to win the World Cup, having the highest rating in both tournaments, as well as the highest World Cup advantage, or increase from the Round Robin to the World Cup winning percentages, indicating a “lucky draw”:
2002 SIMULATION RESULTS: Teams BRAZIL ITALY GERMANY MEXIC NETHERLAND ARGTENTINA FRANCE ENGLAND SPAIN PARAGUAY SW EDEN CROATIA PORTUGAL USA SOUTH KOREA SAUDI ARABIA COSTA RICA ECUADOR TUNISIA CZECH REPUBLIC
Round Robin
World Cup
World Cup Advantage
6.89% 6.69% 6.34% 6.25% 5.68% 5.13% 4.97% 4.29% 4.10% 4.08% 4.01% 3.97% 3.49% 3.22% 3.07% 2.96% 2.86% 2.80% 2.79% 2.63%
17.58% 11.04% 10.24% 8.42% 6.96% 6.20% 6.32% 4.54% 2.36% 3.56% 4.16% 2.80% 0.00% 0.30% 0.00% 1.00% 0.00% 0.98% 0.02% 0.48%
10.69% 4.35% 3.90% 2.17% 1.28% 1.07% 1.35% 0.25% -1.74% -0.52% 0.15% -1.17% -3.49% -2.92% -3.07% -1.96% -2.86% -1.82% -2.77% -2.15%
64
Chapter 4: Simulating the Soccer World Cu JAPAN SW ITZERLAND UKRAINE YUGOSLAVIA IRAN AUSTRALIA GHANA IVORY COAST TRINIDAD & TOBAGO POLAND ANGOLA TOGO
2.41% 1.99% 1.76% 1.75% 1.64% 1.34% 1.19% 1.11% 0.41% 0.13% 0.05% 0.00%
Andrei C. Grecu 0.38% 3.16% 0.94% 2.34% 0.22% 0.20% 0.10% 0.10% 0.04% 0.08% 0.16% 0.80%
65
-2.03% 1.17% -0.82% 0.59% -1.42% -1.14% -1.09% -1.01% -0.37% -0.05% 0.11% 0.80%
Chapter 5: Conclusion
Andrei C. Grecu
Chapter 5: This thesis proposes a new approach to ranking the teams that qualify for the soccer World Cup. In essence, it claims that the quality of soccer teams is different on different continents and therefore results in regional qualifying tournaments, against less competitive teams, are not so relevant to the present value of the teams. Instead, it looks at historical games between competitive teams on different continents and implements three algorithms that take into account the strengths of the opponents encountered in the past when determining the present value of the teams. Chapter 2 lays the foundation of the thesis by presenting and then implementing three algorithms for ranking teams in uneven p aired competitions. The three algorithms use different mathematical tools to rank the teams. First, a matrix-based method of ranking takes as input previous interactions among teams and returns an eigenvector with the relative value assigned to each team. Second, a random walker algorithm looks at the steady-state solution of a setting in which a number o f voters continuously change their mind regarding their favorite team, thus executing random walks on a network defined by the participating teams and their previous interactions. Third, a biologically inspired neural network algorithm constructs a neural network whose nodes are the participating teams and whose connections are determined by previous interactions among the teams, and uses a soccer-intuitive transfer function to update the value of each node until a steady state solution is reached. Although very different in their approach , all three algorithms do a good job in ranking the teams, as tested in Chapter Cha pter 3.
66
Chapter 5: Conclusion
Andrei C. Grecu
Chapter 3 contains the most meaningful results of this thesis. Looking four years back at the 2002 World Cup in South Korea and Japan, and assuming a Poisson distribution of goals in a World Cup game, a nonlinear regression for the number of goals g oals scored in each separate game from the 2002 World Cup is fit, using the rankings presented above as predictive variables. Using the result of the regression, we derive predictions regarding the outcome of each game from the 2002 World Cup. When comparing predictions with results, it turns out that a strategy that consistently picks the winner of World Cup games using my predictions is more accurate that a strategy that always picks the favorite implied by the odds offered by the London bookies. Finally, Chapter 4 determines determines the probabilities for winning the World Cup for each team by simulating the predefined structure of games from the World Cup. Also, by comparing the results of the World Cup simulation to the results of a Round Robin tournament simulation, the teams with a “lucky draw” are identified.
As far as I know, this is the fist work that draws a parallel between the challenge of ranking the teams that participate in the soccer World Cup, and the challenge ch allenge faced every year by the Bowl Championship Series (BCS) in ranking American c ollege football. The two tasks are similar in that the nu mber of games played by every team is relatively small, teams play many more games within their regional league than against teams in other leagues, and the quality of the opponents varies from region to region. This thesis modified some of the methods currently used to rank football teams, so that they rank the soccer teams from the World Cup instead. It is my hope that further
67
Chapter 5: Conclusion
Andrei C. Grecu
research will look at other mathematically rigorous BCS ranking methods and adapt them to rank the soccer teams that qualify for the soccer World Cup. Another idea for future research might focus on changing the methods discussed in this thesis, so that games that take place in the World Cup are immediately incorporated into the model. Since there is no better indication of the relative value of a team than its last results, updating the dataset of historical game during the tournament could make the models even more accurate.
Finally, whenever people are predicting the results of future future events, there are betting implication involved. Since my algorithms were proven to be better predictors of individual game results than the bookies for the 2002 World Cup games, there might be “arbitrage opportunities” in the case of the upc oming soccer World Cup games. Of course, my predictions still have a relatively high rate of failure, so there is no obvious moneymaking betting strategy. However, here are my predictions for the 2006 World Cup group games, as well as the simulation results for the probabilities for winning the 5
World Cup assigned to each team . Bet at your own risk!
1 2 3 4 5 6 7 8
TEAM A
TEAM B
A W INS B W INS
TIE
Germany Poland Germany Ecuador Ecuador Costa Rica England Tr Trinidad &Tobago
Cost Rica Ecuador Poland Costa Rica Germany Poland Paraguay Sweden
47.75% 24.58% 22.45% 53.45% 75.79% 8.19% 8.19% 25.87% 47.56% 16.52% 59.87% 64.74% 14.46% 37.53% 33.81% 15.61% 62.75%
27.67% 24.10% 16.00% 16.00% 26.58% 23.61% 20.80% 28.66% 21.64%
5
Probability of Winning the Cup Brazil Italy Germany Portugal Netherlands France Argentina England
In making the predictions I used the team value from the Eigenvector rankings, which were the most accurate in 2002.
68
17.58% 11.04% 10.24% 8.42% 6.96% 6.32% 6.20% 4.54%
Chapter 5: Conclusion 9 England Trinidad & Tobago 10 Sweden Paraguay 11 Sweden England 12 Paraguay Trinidad & Tobago 13 Argentina Cote d’Ivoire 14 Serb Serbia ia & Mont Monten eneg egro ro Neth Nether erla land nds s 15 Argentina Serbia & Monte ne negro 16 Netherlands Cote d’Ivoire 17 Netherlands Argentina 18 Cote Cote d’Iv d’Ivoi oire re Serb Serbia ia & Mont Monten eneg egro ro 19 Mexico Iran 20 Angola Portugal 21 Mexico Angola 22 Portugal Iran 23 Portugal Mexico 24 Iran Angola 25 Italy Ghana 26 USA Czech Republic 27 Italy USA 28 Czech Republic Ghana 29 Czech Republic Italy 30 Ghana USA 31 Brazil Croatia 32 Australia Japan 33 Brazil Australia 34 Japan Croatia 35 Japan Brazil 36 Croatia Australia 37 France Switzerland 38 Korea Republic Togo 39 France Korea Republic 40 Togo Switzerland 41 Togo France 42 Switzerland Korea Republic 43 Spain Ukraine 44 Tunisia Saudi Arabia 45 Spain Tunisia 46 Saudi Arabia Ukraine 47 Saudi Arabia Spain 48 Ukraine Tunisia
Andrei C. Grecu 66.41% 13.29% 33.98% 37.58% 32.12% 39.43% 64.56% 14.44% 68.00% 12.29% 13.3 13.36% 6% 65.77% 63.54% 14.75% 70.14% 11.06% 37.61% 33.19% 11.0 11.06% 6% 70.14% 68.88% 11.60% 14.54% 64.71% 75.60% 8.34% 8.34% 57.20% 19.15% 24.82% 47.61% 44.36% 30.30% 73.12% 9.37% 9.37% 47.74% 25.90% 51.23% 21.96% 46.83% 27.59% 14.41% 63.33% 18.45% 58.36% 55.41% 18.66% 43.08% 30.06% 62.48% 14.48% 24.21% 49.76% 10.91% 69.30% 42.62% 29.80% 57.10% 18.69% 61.96% 16.37% 46.33% 26.25% 23.86% 51.85% 10.32% 71.87% 27.11% 46.47% 47.53% 25.67% 33.69% 39.25% 51.55% 22.71% 35.47% 37.21% 24.95% 48.48% 40.12% 32.83%
69
20.29% 28.44% 28.45% 20.99% 19.70% 20.86% 21.70% 18.80% 29.20% 18.80% 19.52% 20.75% 16.05% 16.05% 23.65% 27.57% 25.34% 17.50% 17.50% 26.36% 26.81% 25.58% 22.25% 23.19% 25.92% 26.86% 23.04% 26.02% 19.79% 27.58% 24.21% 21.67% 27.42% 24.30% 17.80% 26.42% 26.80% 27.06% 25.73% 27.32% 26.57% 27.05%
Sweden Paraguay Switzer zerland Croatia Spain Serbia Serbia & Montenegro Montenegro Saudi Arabia Ecuador Ukraine Mexico Czec Czech h Repu Republ blic ic Japan USA Iran Australia Angola Ghana Cote d’Ivoire Poland Trinid Trinidad ad & Tobago Tobago Tunisia T og o Kore Korea a Repu Republ blic ic Costa Rica
4.16% 3.56% 3.16% 2.80% 2.36% 2.34% 1.00% 0.98% 0.94% 0.80% 0.48 0.48% % 0.38% 0.30% 0.22% 0.20% 0.16% 0.10% 0.10% 0.08% 0.10% 0.10% 0.04% 0.00% 0.00 0.00% % 0.00%
Appendix A: Neural Network C++ Code
Andrei C. Grecu
APPENDIX A: #include #include #include #include #include #include
#define NMAX 100 #define INFI 0x3f3f3f3f #define S 50 using namespace std; cons co nst t do doub uble le epsilon = 6.0; /* prevteam[i] = value of team i at time t-1 */ stat st atic ic do doub uble le prev[NMAX]; /* currteam[i] = value of team i at time t */ stat st atic ic do doub uble le curr[NMAX]; /* team names */ vector teams; /* number of teams */ int N; int adj[60][60][15]; double update(int i, int j, int k) { if (i == j || adj[i][j][k] == 100) return INFI; int dif = adj[i][j][k]; if (prev[i] == prev[j]) { /* win */ if (dif > 0) { double maxgain = (0.5-0.005*prev[i]) *prev[j]+ 25+0.75*prev[i]; return (maxga (maxgain in - prev[i prev[i])/5 ])/55.0 5.0 * abs(dif*k) + prev[i]; } /* lose */ if (dif < 0)
70
Appendix A: Neural Network C++ Code
Andrei C. Grecu
{ double maxloss = 0.005 * prev[i] * prev[j] + 0.25 * prev[i]; return (prev[ (prev[i] i] - maxlos maxloss)/5 s)/55.0 5.0 * abs(dif*k) + maxloss; } /* tie */ return prev[i]; } if (prev[i] > prev[j]) { /* win */ if (dif > 0) { double maxgain = (0.5-0.005*prev[i]) *prev[j]+ 25+0.75*prev[i]; return (maxga (maxgain in - prev[i prev[i])/5 ])/55.0 5.0 * abs(dif*k) + prev[i]; } /* lose */ if (dif < 0) { double maxloss = 0.005 * prev[i] * prev[j] + 0.25 * prev[i]; return (prev[ (prev[i] i] - maxlos maxloss)/5 s)/55.0 5.0 * abs(dif*k) + maxloss; } return 0.005 * prev[i] * prev[j] + 0.5 * prev[i]; /* tie */ double maxloss = 0.005 * prev[i] * prev[j] + 0.25 * prev[i]; return (2.0*maxloss-prev[i])/12.0*k+prev[i](2.0*maxloss-prev[i])/12.0; } if(prev[i] < prev[j]) if(prev[i] { /* win */ if(dif if (dif > 0) { double maxgain = (0.5-0.005*prev[i]) *prev[j]+ 25+0.75*prev[i]; return (maxga (maxgain in - prev[i prev[i])/5 ])/55.0 5.0 * abs(dif*k) + prev[i]; } /* lose */ if (dif < 0) { double maxloss = 0.005 * prev[i] * prev[j] + 0.25 * prev[i]; return (prev[ (prev[i] i] - maxlos maxloss)/5 s)/55.0 5.0 * abs(dif*k) + maxloss; }
71
Appendix A: Neural Network C++ Code
Andrei C. Grecu
/* tie */ double maxgain = (0.5-0.005*prev[i]) *prev[j]+ 25+0.75*prev[i]; return (maxgain-prev[i])/24.0*k + (25.0*prev[i]maxgain)/24.0; } } /* update the team value at time t */ void team_value() { for (int i = 0; i < N; ++i) { double val = 0.0; int cnt = 0; for (int j = 0; j < N; ++j) for (int k = 1; k <= 13; ++k) { double tmp = update(i, j, k); if (tmp != INFI) { val += tmp; ++cnt; } } if (cnt != 0) curr[i] = (double (double) ) val/cnt; else curr[i] = prev[i]; } /* scaling */
double oldmax = -INFI; double oldmin = INFI; for (int i = 0; i < N; ++i) { oldmax = max(oldmax, curr[i]); oldmin = min(oldmin, curr[i]); } double a = 100.0/(oldmax-oldmin); double b = - 100.0*o 100.0*oldmi ldmin/(o n/(oldma ldmax-o x-oldmi ldmin); n); for (int i = 0; i < N; ++i) curr[i] = a*curr[i] + b; }
72
Appendix A: Neural Network C++ Code
Andrei C. Grecu
bool finish() { for (int i = 0; i < N; ++i) if (abs(prev[ (prev[i] i] - curr[i] curr[i]) ) >= epsilo epsilon) n) retu re turn rn fa fals lse e; retu re turn rn tr true ue; ; } void stable() { int iter = 0; while (true true) ) { if (iter == 1000) break; break ; ++ iter; //cerr << ",,,,,,," << endl;
team_value();
if (finish()) break; break ;
/* backup computed values */ for (int i = 0; i < N; ++i) prev[i] = curr[i]; /* print current iteration */ if(iter<=1){ if (iter<=1){ cout << "Begin Iteration # = " << iter << endl; for (int i = 0; i < N; ++i) cout << teams[i] << " " << curr[i] << endl; cout << "End Iteration" << endl; } } cout << "# of iterations: = " << iter << endl; } int team_to_int(string s) { for (int i = 0; i < N; ++i) ++i) if (teams[i].compare(s) == 0) return i;
73
Appendix A: Neural Network C++ Code
Andrei C. Grecu
return -1; } void read_data() { /* read the # of teams */ cin >> N; /* read all the teams */ for (int i = 0; i < N; ++i) { string s; cin >> s; teams. push_back(s); } /* read the games */ while (true true) ) { string a, b; int score; int year; year = -1; cin >> a >> b >> score >> year; if (year == -1) break; break ; //
cerr << a << " " << b << " " << score << " " << year << endl; int na = team_to_int(a); int nb = team_to_int(b);
assert(na != -1); assert(nb != -1); year -= 1993; adj[na][nb][year] = score; } } int main(void void) ) { for (int i = 0; i < 60; ++i) for (int j = 0; j < 60; ++j) for (int k = 0; k < 15; ++k) adj[i][j][k] = 100;
74
Appendix A: Neural Network C++ Code
Andrei C. Grecu
read_data(); for (int i = 0; i < N; ++i) prev[i] = S;
stable(); /* print final results */ cout << "FINAL RESULTS" << endl; for (int i = 0; i < N; ++i) cout << teams[i] << " " << curr[i] << endl; cout << "DONE!" << endl; return 0; }
75
Appendix B: Neural Network Evolution
Andrei C. Grecu
APPENDIX B:
After about 50 iterations, the neural network converges to constant values for every team:
BRA
Neural Network Evolution
ITA FRA
120
ARG ENG
100
GER PTG
e 80 u l a V 60 m a e T 40
NET SPA SWE SW E CZE MEX PAR
20
JAP 0
IRA 0
20
40
60
Iterations
76
80
100
IVO YUG
Bibliography
Andrei C. Grecu
Callaghan, Thomas; Muncha, Peter; and Porter Manson. “Random Walker Ranking for NCAA Division I-A Football”, Georgia Institute of Technology, June 15, 2005 Callaghan, Thomas; Muncha, Peter; and Porter Manson. “The Bowl Championship Series: A Mathematical Review”, Notices of the American Mathematical Society, September, 2004 Cassady, Richard; Maillart, Lisa; Salman, Sinan. “Ranking Sports Teams: A Customizable Quadratic Assignment Approach”, Interfaces, Vol. 35, No. 6, NovemberDecember 2005 Coleman, Jay. “Minimizing Game Score Violations in College Football Rankings”, Interfaces, Vol. 35, No. 6, November- December 2005 Colley, Wesley; Colley’s Bias Free College Football Ranking Method, Ph.D. dissertation, Princeton University th
Devore, Jay. Probability and Statistics for Engineering and Statistics, 6 Edition. Belmont: Belmont: Brooks/ColeBrooks/Cole- Thomson Thomson learning, learning, 2004. Dixon, Mark and Coles, Stuart. “Modeling Association Football Scores and Inefficiencies in the Football Betting Market”, Applied Statistics (1997), 46, No. 2 Dobson, A.J. An Introduction to Generalized Linear Models, Chapman and Hall, London 2001 Dyte D. and Clarke S.R. “A ratings based Poisson model for World Cup soccer simulation”, Journal of the Operational Research Society (2000) 51, 993-998 nd
Fleiss, Joseph L. Statistical Methods for Rates and Proportions. 2 Edition, New York: John Wiley & Sons, 1981. Goddard, Stephen. “Ranking in Tournaments and Group Decision Making”, Management Science, Vol. 29, No.12, Dec. 1983 Sharpe, Graham. Gambling on Goals: A Century of Football Betting, Mainstream Publishing, Edinburgh and London 1997 Jech, Thomas. “The Ranking of Incomplete Tournaments: A Mathematician’s Guide to Popular Sports”, Department of Mathematics, Pennsylvania State University Karlis, Dimitris and Ntzoufras, Ioannis. “Statistical Modelling for Socce r Games: The Greek League”, Department of Statistics, Athens University of Economics and Bu siness
77
Bibliography
Andrei C. Grecu
Keener, James. “The Perron-Frobenius Theorem and the Rank ing of Football Teams”, Society for Industrial and Applied Mathematics Review, Vol.35, No.1, pp. 80-93, March 1993 Koning, Ruud. “Balance in Competition Compe tition in Dutch Soccer”, The Statistician, 49, Part 3, p p 419-431 Martinich, Joseph. “College Football Rankings: Do Computers Know Best?”, College of Business Administration, Administration, University of MissouriMissouri- St. Louis, May 2002 McCullagh, Peter and Nelder, J.A. Generalized Linea r Models, Chapman and Hall, London 1989 McGarry T. and Schutz RW. “Efficacy of Traditional Sport Tournament Structures”, Journal of the Operational Research Society (1997), 48, 65-74 Palomino, Frederic; Rigotti, Luca; Rustichini, Also. “Skill, Strategy, and Passion: an Empirical Analysis of Soccer”, in progress, Department of Econometrics, Tilburg University Pollard R, Benjamin B, and Reep C. “Sports and the Negative Binomial Distribution”, Optimal Strategies in Sports, 1977 Wilson, Rick. “Ranking College Football Teams: A Neural Network Approach”, Interfaces 25, July-August 1995 (pp. 44-59) Winston, Albright. Practical Management Science, Brooks/Cole, 2001
78