Predicting Football Matches using Neural Networks in MATLAB®
Andrade, Pablo; Cisneros, Jorge; Suárez, Francisco
Escuela Politécnica Nacional, Faculty of Mechanical Engineering, Quito, Ecuador
Mechatronics
Abstract:
The purpose of this project is to anticipate the outcome of a football match of a local team (LDU) using various types of neural networks in MATLAB®. To achieve this objective, input data about the team in question were collected from the records of past matches against different teams. With the relevant data and the target defined, three neural networks were trained (Perceptron, Feed-Forward and Cascade) and then simulated with the latest match played by the home team to see whether each network could accurately predict the outcome of that match. The best results were achieved with the feed-forward neural network. These results, as well as the results from the other types of networks, are thoroughly discussed in this project.
1 INTRODUCTION
There are many methods to predict the outcome of a football match. It can be predicted via a statistical model, using an ordered probit regression model. This particular method was used to predict English league football matches [1]. In the statistical model, a wide range of variables was taken into account in addition to the teams' past match results. These variables are the significance of each match for championship, promotion or relegation issues; the involvement of the teams in cup competition; the geographical distance between the teams' home towns; and a 'big team' effect [1]. Since these results serve as a starting point for establishing the prices and awards for betting in the sports industry, the efficiency of such prices is also analyzed using empirical results [1]. A limited but increasing number of academic researchers have attempted to model match results data for football, and different distributions are used for this purpose, such as the Poisson and the negative binomial distributions [1].
The statistical approach to predicting football matches is widely used to increase the betting chances of the user; however, the algorithm also requires training the machine. A database collected over the past years provides the sample for training and for validation. The bigger the percentage of the data that is used for training, the better the system will fare, simply because the analysis can use more data. On the other hand, the bigger the percentage of the data that is used for testing, the more statistically reliable the test will be. To split the data, Weka offers a very good solution to this problem, namely ten-fold cross validation. It splits the data into ten equal-sized portions and uses nine of the ten portions as training data and the last one as testing data. It repeats the process ten times, each time choosing a different portion as the testing data [2].
The selection of the relevant features is an important step, since an accurate set makes it a lot easier to predict the outcomes of matches. Features are characteristics of recent matches of the teams involved, but how far back in history do we need to go in order to get the best predictions? To answer this question we set up a very basic set of features, and then each time we changed the amount of history looked at and compared the results. This initial set included the following features:
Andrade, Pablo: Mechanical engineering student
Cisneros, Jorge: Mechanical engineering student
Suárez, Francisco: Mechanical engineering student
Goals scored by home team in its latest x matches
Goals scored by away team in its latest x matches
Goals conceded by home team in its latest x matches
Goals conceded by away team in its latest x matches
Average number of points gained by home team in its latest x matches
Average number of points gained by away team in its latest x matches
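As an illustration, the six features listed above could be derived from a chronological list of past results as in the following Python sketch. The `Match` record, the function names and the sample scores are hypothetical and are not taken from ref. [2]; the sketch assumes the usual league scoring of 3 points for a win, 1 for a draw and 0 for a loss.

```python
from dataclasses import dataclass


@dataclass
class Match:
    home: str
    away: str
    home_goals: int
    away_goals: int


def team_form(history, team, x):
    """Goals scored, goals conceded and average points over the team's latest x matches."""
    recent = [m for m in history if team in (m.home, m.away)][-x:]
    scored = conceded = points = 0
    for m in recent:
        if m.home == team:
            gf, ga = m.home_goals, m.away_goals
        else:
            gf, ga = m.away_goals, m.home_goals
        scored += gf
        conceded += ga
        points += 3 if gf > ga else 1 if gf == ga else 0
    avg_points = points / len(recent) if recent else 0.0
    return scored, conceded, avg_points


def match_features(history, home, away, x):
    """Feature vector in the order of the list above (home/away interleaved)."""
    h_scored, h_conceded, h_points = team_form(history, home, x)
    a_scored, a_conceded, a_points = team_form(history, away, x)
    return [h_scored, a_scored, h_conceded, a_conceded, h_points, a_points]


# Hypothetical results, oldest first; team names are placeholders.
history = [
    Match("A", "B", 2, 1),
    Match("B", "C", 0, 0),
    Match("C", "A", 1, 1),
    Match("A", "B", 0, 3),
]
print(match_features(history, "A", "B", x=2))
```

Changing `x` in the call is all that is needed to repeat the comparison for a different amount of history.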
The x stands for the (variable) number of matches looked at. The first four features are pretty straightforward; the last two describe the points the home and away teams gained in their latest matches. These are calculated as in the football competition itself, namely 3 points for a win, 1 for a draw and 0 for a loss, and the average over the latest x matches is taken. By importing the features into Weka and letting several machine learning algorithms classify the data as described in Section 1.3, a percentage of correctly predicted instances is obtained.
Now that an optimal number of matches to be considered has been found, we can move on to selecting the best possible classifier (machine learning algorithm). By means of a certain machine learning algorithm, a classifier labels all matches as home wins, draws or away wins, depending on the features belonging to each match. During the previous test round a selection has already been made. Below is a list of seven classifiers including a short description of each one:
ClassificationViaRegression – This algorithm uses linear regression in order to predict the right class.
MultiClassClassifier – This algorithm is a lot like ClassificationViaRegression, except that it uses logistic regression instead of linear regression.
RotationForest – This algorithm uses a decision tree to predict the right class.
LogitBoost – This is a boosting algorithm that also uses logistic regression.
BayesNet – This algorithm uses Bayesian networks to predict the right class.
NaiveBayes – This algorithm resembles BayesNet, except ...
Home wins – This algorithm will, regardless of the feature set, always predict a home win.
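The ten-fold cross validation described earlier and the "Home wins" reference classifier from the list above can be combined in a short Python sketch. This is not Weka's implementation: the fold assignment, the function names and the label counts (46 home wins, 27 draws, 27 away wins) are assumptions made only for illustration, and the data set is assumed to contain at least ten matches.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def home_wins_classifier(training_data):
    """'Home wins' reference: ignores both the training data and the features."""
    return lambda features: "H"


def cross_validate(data, train_fn, k=10, seed=0):
    """Mean accuracy over k folds; every fold serves as the test set exactly once."""
    folds = k_fold_indices(len(data), k, seed)
    accuracies = []
    for i, test_idx in enumerate(folds):
        train = [data[j] for f, fold in enumerate(folds) if f != i for j in fold]
        predict = train_fn(train)
        correct = sum(predict(data[j][0]) == data[j][1] for j in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)


# Hypothetical data set: (feature vector, label) with H = home win, D = draw, A = away win.
data = [([0.0] * 6, label) for label in ["H"] * 46 + ["D"] * 27 + ["A"] * 27]
print(round(cross_validate(data, home_wins_classifier), 2))
```

Any other classifier can be evaluated the same way by passing a different `train_fn` that returns a prediction function; the reference score is simply the share of home wins in the data.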
In the previous section we have already seen that the first two perform best using the given simple feature set. We now expand our feature set by a few more features and make several selections of them to see which classifier is best. Please note that the "home wins" classifier is used merely as a reference. It can immediately be seen that this classifier performs worse than all the others.
A Bayesian network was used to predict the results of the Barcelona FC team in the Spanish League [3]. During the last decade, Bayesian networks (and probabilistic graphical models in general) have become very popular in artificial intelligence. Bayesian networks (BNs) are graphical models for reasoning under uncertainty, where the nodes represent variables (discrete or continuous) and arcs represent direct connections between them. These direct connections are often causal connections. In addition, BNs model the quantitative strength of the connections between variables, allowing probabilistic beliefs about them to be updated automatically as new information becomes available. A Bayesian network for a set of variables X = {X1, ..., Xn} consists of:
1. A network structure S that encodes a set of conditional independence assertions about variables in X,
2. A set P of local probability distributions associated with each variable.
Together, these components define the joint probability distribution for X. The network structure S is a directed acyclic graph. The BN used in the research of ref. [3] is as follows:
A neural network approach can also be used to predict the results of football matches, as in ref. [4]. In that work, the input and output variables were known; however, the hidden layer and weight distributions were not known.
As another way of obtaining the desired results, a compound approach can be adopted, as explained in ref. [5]. The authors designed FRES (Football Result Expectation System), which consists of two major components: a rule-based reasoner and a Bayesian network component. This approach is a compound one in the sense that two different methods cooperate in predicting the result of a football match. The reasoning can be divided into two stages, strategy-making and result-calculating. Strategies include overlapping, man-marking, pressing, position, and passing. The results from Bayesian networks form the bases for these decisions. Each team is assumed to have its own particular characteristics, such as work rate, aggressiveness, pass length, etc. Jess takes all these facets into consideration to determine a strategy. As well as play-making strategies, the system also reasons about higher-level decisions such as substitutions and formation changes. The result-calculating part models the actual flow of a match. It models such aspects as the effect of goals on morale, and the effect of reputations, relative scores, and locations on the state of the players. The state changes throughout the match: for example, a team's morale may be very good at one moment, but if nothing special happens for a long time their morale can be expected to converge to normal [5].
A Bayesian network (also called a Bayes network, belief network, Bayes(ian) model or probabilistic directed acyclic graphical model) is a probabilistic graphical model (a type of statistical model) that represents a set of random variables and their conditional dependencies via a directed acyclic graph (DAG). For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases [6].
WHAT
The project intends to create an artificial neural network capable of predicting, within a reasonable margin of error, the outcome of a football match during a specific season, based on statistical data from past seasons and on performance ratings of the players and of the team as a whole when playing against other teams from the same league.
WHY
Mathematical and statistical challenge
The process needed to train an artificial neural network can be implemented in other similar applications
Advancing the artificial intelligence field
Betting
2 METHODOLOGY
The team to be analysed will be LIGA DE QUITO, this being the last winner of the stage in the Ecuadorian Cup. A neural network will be established for each team, taking into account the statistics from 15 matches of the last season. These statistics are taken from http://www.futbolmetrics.com [7].
2.1 Inputs
1. Shooting ratings
2. Effectivity ratings
3. Goalkeeper saves
4. Team defensive challenges won
5. Goals in favor
6. Goals against
2.2 Outputs
1. Winning the match.
2. Drawing the match.
3. Losing the match.
The neural network methodology consists in establishing three different types of network:
Cascade
Feed forward
Perceptron
These networks will be defined using the NNTOOL toolbox of MATLAB. The results of these simulations are shown in the next section.
2.3 Simulation
The simulation process consists in adding the statistics of the last match and comparing the simulation with the actual result.
3 RESULTS
The results of the different networks are presented first for LIGA DE QUITO.
3.1 LIGA DE QUITO
3.1.1 Perceptron
3.1.2 Feed forward
3.1.3 Cascade
3.2 SIMULATION
The statistics of the match taken into account are those of the second match of the second season of 2015, as shown below [7].
These simulations will be done in each neural network. The combined results of these simulations are shown below.
The expected result is a draw, i.e. a matrix of the form: [0; 0; 1]
The simulation that best suits the result is that of the Feed Forward network: [. ; .; .]
4 DISCUSSION AND APPLICATIONS
4.1 Perceptron Network
The perceptron network is the simplest kind of network and it offers a clearer visual way of comparing the results, since it shows only values of 1, 0 or -1. The training stage is also easier; however, the results did not converge, and the network always reached the maximum epoch without a conclusive result. The error in predicting a draw is large. However, the error of this network is null for predicting losses and wins. In the simulation, this network did not accurately predict the outcome of the test match: it shows a winning score.
4.2 Feed forward
This network was implemented with 3 layers, with 10 neurons in the first and second layers. The feed forward network begins with a large error, but the training process reduces the error dramatically. The error in the last training was in the order of 4 ∗ 10 − . The results of the training proved to be very accurate compared with the target. There were no values that differed from the expected results.
In the simulation process, it is the only network that accurately predicted the outcome: it predicted a draw (very close to 1).
4.3 Cascade
This network showed a good training process, with a reduced error in each training. The error in the learning process turned out to be small for the last training, in the order of 4 ∗ 10 − .
The simulation result of this neural network was not conclusive, since it did not predict any outcome: the values for drawing, losing and winning were all 0.
4.4 Applications
This work can be applied, with further refinement of the input variables, to predict the outcome of a football match. Another application of this project can be in other sports.
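The behaviour reported for the perceptron in Section 4.1, training that always runs to the maximum epoch, can be illustrated with a minimal Python sketch of the perceptron learning rule. It is not the NNTOOL implementation: the hard-limit transfer function with outputs 0 and 1, the function names and the toy data are assumptions. The sketch shows one common cause of non-convergence, classes that are not linearly separable; whether this is what happened with the match data is not established here.

```python
def hardlim(v):
    """Hard-limit transfer function: 1 if v >= 0, otherwise 0."""
    return 1 if v >= 0 else 0


def train_perceptron(samples, max_epochs=100):
    """Perceptron learning rule; returns (weights, bias, epochs used, converged)."""
    n_inputs = len(samples[0][0])
    w, b = [0.0] * n_inputs, 0.0
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for x, target in samples:
            out = hardlim(sum(wi * xi for wi, xi in zip(w, x)) + b)
            e = target - out
            if e != 0:
                errors += 1
                w = [wi + e * xi for wi, xi in zip(w, x)]
                b += e
        if errors == 0:  # a full pass without mistakes: training has converged
            return w, b, epoch, True
    return w, b, max_epochs, False


# Toy data (not match statistics): AND is linearly separable, XOR is not.
and_data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
xor_data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

print(train_perceptron(and_data)[3])  # converged flag for the separable case
print(train_perceptron(xor_data)[3])  # converged flag for the non-separable case
```

A single-layer perceptron can only draw one linear boundary per output, so a class that sits between two others in input space, as a draw sits between a win and a loss, is a plausible candidate for this kind of failure.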
5 CONCLUSIONS AND RECOMMENDATIONS
The best suited neural network for this project is the feed-forward network, since it was the one that learnt that scoring more goals than the team concedes translates into winning the match. The perceptron network is not suited for this kind of project, since it does not cope well with draws. The cascade network is not good for this project, since it did not predict any outcome. The current network does not truly predict the outcome, since it needs the goals scored as inputs. Further variables are needed in order to remove the goals from the inputs.
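The remark about the goals can be made concrete with a short Python sketch. With "goals in favor" and "goals against" among the six inputs of Section 2.1, the match outcome is a fixed function of those two inputs alone. The function name and the sample rows below are hypothetical and are not taken from the data of ref. [7].

```python
def outcome_from_goals(goals_for, goals_against):
    """Match outcome as implied by the two goal counts alone."""
    if goals_for > goals_against:
        return "win"
    if goals_for == goals_against:
        return "draw"
    return "loss"


# Hypothetical rows in the input order of Section 2.1:
# (shooting, effectivity, goalkeeper saves, challenges won, goals in favor, goals against)
rows = [
    (0.55, 0.30, 4, 12, 2, 1),
    (0.40, 0.10, 6, 9, 0, 0),
    (0.62, 0.25, 2, 15, 1, 3),
]
labels = [outcome_from_goals(r[4], r[5]) for r in rows]

# Zeroing the first four inputs leaves every label unchanged, so those
# inputs are not needed to reproduce the target.
stripped = [(0, 0, 0, 0, r[4], r[5]) for r in rows]
assert [outcome_from_goals(r[4], r[5]) for r in stripped] == labels
print(labels)
```

A network trained on such inputs can reach a very low training error simply by comparing the two goal counts, which are only known after the match; this is why the goals would have to be replaced by variables available beforehand.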
6 References
[1] J. Goddard, Modelling football match results and the efficiency of fixed-odds betting, Swansea: University of Wales.
[2] D. Buursma, Predicting sports events from past results, Twente: University of Twente.
[3] Farzin Owramipur, P. E. and F. S. M., "Football Result Prediction with Bayesian Network in Spanish League-Barcelona Team," vol. 5, no. 5, 2013.
[4] [Online]. Available: http://neuroph.sourceforge.net/tutorials/SportsPrediction/Premier%20League%20Prediction.html.
[5] Byungho Min, C. C. and R. I. (. M., "A Compound Approach for Football Result Prediction," Seoul National University, Seoul.
[6] "Bayesian network," [Online]. Available: https://en.wikipedia.org/wiki/Bayesian_network.
[7] "http://www.futbolmetrics.com/," [Online].