How To Build A Monte Carlo Simulation
What is a Monte Carlo simulation? How can it help you project end of season points totals and finishing positions? Today on the blog Zach Slaton introduces Monte Carlo simulations and shows us how to develop one.
By Zach Slaton. Published: $th June 201) Updated: 2+th February 201-
This is the fourth post in Zach Slaton’s series explaining how to use simple-but-effective statistical concepts that can help provide a richer understanding of the data already at your fingertips. The first post in the series dealt with how linear regression prediction intervals can yield deeper insights, the second post explained how to use exponential regression to quantify rare events like goal scoring totals, and the third post explained how ordered logistic regression can be used to forecast individual match outcomes. Today Zach explains how individual match outcome likelihoods can be used to simulate the outcome of all the remaining fixtures in a season.
In my last post in this series I explained how an ordered logistic regression could be built to explain soccer match outcomes, and even provided several examples of the types of inputs I’ve included in the ordered logistic regression models I have built over time. These models are highly useful in understanding the potential impact statistically significant predictors may have on the likelihood of a match ending in a win, tie, or loss.
But how can those individual building blocks be assembled to form a comprehensive forecast for how all of the teams in a league may sit relative to each other over the next week, next month, or at the end of the season? There appears to be a nearly infinite number of point combinations that could be realised given there are 380 matches in a 20-team league’s season, each match could end in a loss, tie, or win for each team, and no match has the odds of each outcome evenly split into thirds. How can an analyst make sense of such a range of possible outcomes?
Introducing Monte Carlo Simulation
One answer to this complexity is Monte Carlo simulation. As the name implies, Monte Carlo simulation is essentially a “model of chance.” Wikipedia describes it as:
“…a broad class of computational algorithms that rely on a repeated random sampling to obtain numerical results, i.e. by running simulations many times over in order to calculate those same probabilities heuristically just like actually playing and recording your results in a real casino situation… Monte Carlo methods are mainly used for three distinct problems: optimisation, numerical integration, and generation of samples from a probability distribution.”
The repeated random simulations of individual inputs can thus project the likelihood of an aggregate outcome if one has the probability of outcome(s) for each event. Such an approach may sound intimidating, but a solution can be found in the much-maligned-but-infinitely-useful Microsoft Excel.
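To make the idea concrete outside Excel, here is a minimal Python sketch (not the author’s workbook). It assumes three made-up matches with hypothetical loss/draw/win probabilities, and compares the exact chance of earning at least four points (found by enumerating every combination) with the estimate obtained by repeated random sampling. All names and numbers are illustrative assumptions.

```python
import random
from itertools import product

# Hypothetical (loss, draw, win) probabilities for three matches; illustrative only.
MATCHES = [(0.30, 0.30, 0.40), (0.50, 0.25, 0.25), (0.20, 0.30, 0.50)]
POINTS = (0, 1, 3)  # points earned for a loss, draw, win


def exact_prob_at_least(target):
    """Enumerate all 3**n outcome combinations and sum the qualifying probabilities."""
    total = 0.0
    for combo in product(range(3), repeat=len(MATCHES)):
        prob = 1.0
        pts = 0
        for probs, outcome in zip(MATCHES, combo):
            prob *= probs[outcome]
            pts += POINTS[outcome]
        if pts >= target:
            total += prob
    return total


def simulated_prob_at_least(target, runs, rng):
    """Estimate the same probability by repeated random sampling."""
    hits = 0
    for _ in range(runs):
        pts = sum(rng.choices(POINTS, weights=probs)[0] for probs in MATCHES)
        if pts >= target:
            hits += 1
    return hits / runs


rng = random.Random(42)
exact = exact_prob_at_least(4)
estimate = simulated_prob_at_least(4, 20_000, rng)
print(f"exact={exact:.4f} simulated={estimate:.4f}")
```

Enumeration is feasible for three matches (27 combinations) but not for a full season, which is why the sampling route is the practical one.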
Simulating Individual Match Results
To start, assume that the analyst interested in the aggregate outcome has created a model in their statistical tool of choice. In this case, it’s a model that projects the likelihood of winning, tying (drawing), or losing a match. The model is applied to each match in a league season, in this case Major League Soccer in the United States.
The first order of business is to create a random outcome for each match, and the method used within this example is Excel’s RAND function that creates a random number between 0 and 1. The output of the RAND function is then compared to the match outcomes using the following logic:
IF RAND < Probability of Loss
    THEN match outcome is a loss
ELSE
    IF RAND < (Probability of Loss + Probability of Tie/Draw)
        THEN match outcome = tie/draw
        ELSE match outcome = win
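The same logic can be sketched in Python. This is an illustration under stated assumptions rather than the workbook itself: the function name `sample_outcome` and the probabilities are hypothetical. Note that a single random draw is compared against both thresholds; in Excel that would mean storing the RAND value in its own cell, because each call to RAND returns a fresh number.

```python
import random


def sample_outcome(p_loss, p_draw, rand_value):
    """Map one uniform draw in [0, 1) to 'loss', 'draw' or 'win'."""
    if rand_value < p_loss:
        return "loss"
    if rand_value < p_loss + p_draw:
        return "draw"
    return "win"


rng = random.Random(7)
p_loss, p_draw = 0.25, 0.30  # hypothetical; the win probability is the remaining 0.45
counts = {"loss": 0, "draw": 0, "win": 0}
for _ in range(10_000):
    counts[sample_outcome(p_loss, p_draw, rng.random())] += 1
print(counts)
```

Over many draws the share of each outcome should settle close to the probabilities fed in, which is a quick sanity check on the thresholds.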
A screenshot of just such a setup is provided below.
Now that the analyst has a random outcome assigned to every match in a season, how should they go about creating a Monte Carlo simulation and how many random simulations of the season should they run?
Last things first: the answer is that “it depends”.
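One thing it plausibly depends on is the precision required. As a rough guide (a standard statistical result, not something taken from the workbook), a probability estimated from independent simulated seasons has a standard error of sqrt(p(1-p)/n). The sketch below, with a hypothetical 10% outcome, shows how the uncertainty shrinks as the number of runs grows.

```python
import math


def standard_error(p, runs):
    """Standard error of a proportion p estimated from `runs` independent runs."""
    return math.sqrt(p * (1.0 - p) / runs)


# Hypothetical: an outcome that happens in about 10% of simulated seasons.
for runs in (100, 1_000, 10_000, 100_000):
    se = standard_error(0.10, runs)
    print(f"{runs:>7} runs: estimate good to roughly +/- {2 * se:.3%}")
```

Each tenfold increase in runs narrows the band by a factor of about three, so extra simulations bring diminishing returns.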
Utilising Pivot Tables to Roll Up Match Results
Now first things last: Microsoft Excel offers a solution for running those 10,000 simulations. Pivot table functionality within Excel is the perfect way to roll up the results from the individual matches in point total, goal differential, and win/draw/loss outcome count. These totals are achieved by creating pivot tables with “team/club” on the rows and either match outcome or points on the columns. In either case, the values within the pivot table are the sums of either match outcome or points. See the example below.
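For readers working outside Excel, the same roll-up can be sketched in plain Python. The fixture list and the `roll_up` helper are hypothetical, and goal differential is left out to keep the example short.

```python
from collections import defaultdict

POINTS = {"win": 3, "draw": 1, "loss": 0}
FLIP = {"win": "loss", "draw": "draw", "loss": "win"}  # the away side's view

# Hypothetical simulated results: (home team, away team, outcome for the home team).
results = [
    ("Seattle", "Portland", "win"),
    ("Portland", "Dallas", "draw"),
    ("Dallas", "Seattle", "loss"),
]


def roll_up(results):
    """Return {team: {'win': n, 'draw': n, 'loss': n, 'points': n}}."""
    table = defaultdict(lambda: {"win": 0, "draw": 0, "loss": 0, "points": 0})
    for home, away, home_outcome in results:
        for team, outcome in ((home, home_outcome), (away, FLIP[home_outcome])):
            table[team][outcome] += 1
            table[team]["points"] += POINTS[outcome]
    return dict(table)


table = roll_up(results)
for team, row in sorted(table.items()):
    print(team, row)
```

Teams play the role of the pivot table’s rows, and the outcome counts and points are its summed values.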
The other benefit of using a pivot table is that refreshing it is a “calculation” within Excel, and the RAND function re-calculates each time there is a calculation elsewhere in an Excel workbook. This means that 10,000 simulated seasons can be created with the RAND function, a few linked pivot tables, and fewer than twenty lines of Visual Basic code that could be learned in a first-level computer science course and consists of do/while loops of copy/paste commands of the projected table of each simulated season.
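The loop itself is simple in any language. As a hedged Python analogue of the refresh-and-paste macro (the fixtures, probabilities and function names are all hypothetical), each pass simulates every remaining fixture once and appends one row per team, tagged with the run number.

```python
import random

POINTS = {"win": 3, "draw": 1, "loss": 0}
FLIP = {"win": "loss", "draw": "draw", "loss": "win"}

# Hypothetical remaining fixtures: (home, away, P(home loss), P(draw)).
FIXTURES = [
    ("Seattle", "Portland", 0.25, 0.30),
    ("Portland", "Dallas", 0.30, 0.30),
    ("Dallas", "Seattle", 0.35, 0.25),
]


def simulate_season(fixtures, rng):
    """Simulate every fixture once and return {team: points}."""
    points = {}
    for home, away, p_loss, p_draw in fixtures:
        r = rng.random()
        outcome = "loss" if r < p_loss else "draw" if r < p_loss + p_draw else "win"
        points[home] = points.get(home, 0) + POINTS[outcome]
        points[away] = points.get(away, 0) + POINTS[FLIP[outcome]]
    return points


def run_simulations(fixtures, runs, seed=0):
    """Stack one row per (run, team), like pasting each season below the last."""
    rng = random.Random(seed)
    rows = []
    for run in range(1, runs + 1):
        for team, pts in simulate_season(fixtures, rng).items():
            rows.append((run, team, pts))
    return rows


rows = run_simulations(FIXTURES, runs=10_000)
print(len(rows), rows[:3])
```

Keeping the run number on every row is what later allows results to be sorted and ranked within each simulated season.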
Doing so should produce results that look like this:
The 10,000 simulations of the remaining fixtures now must be added to the point totals, match outcomes, and goal differential to date. This can be done via Excel’s VLOOKUP command referencing another pivot table built using the results to date, and adding the returned value to the value for the same attribute in the projected results. Auto-filling the columns with VLOOKUP commands provides projected values for all of the variables, and all that’s left to do is sort the results by run, then point total, then by the league’s tie breakers.
Doing this sort ensures data stays within the respective run in which it was generated, and it provides projected table positions within each season.
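A rough Python equivalent of the lookup-and-sort step is sketched below. The points to date and simulated rows are made up, a dictionary lookup stands in for VLOOKUP, and alphabetical team order is only a placeholder for a league’s real tie breakers.

```python
# Hypothetical points already earned (the "results to date" table).
POINTS_TO_DATE = {"Seattle": 10, "Portland": 12, "Dallas": 12}

# Hypothetical simulated points from the remaining fixtures: (run, team, points).
simulated = [
    (1, "Seattle", 6), (1, "Portland", 1), (1, "Dallas", 1),
    (2, "Seattle", 0), (2, "Portland", 4), (2, "Dallas", 3),
]


def project_positions(simulated, to_date):
    """Return (run, position, team, total points) rows, ranked within each run."""
    totals = [(run, team, to_date[team] + pts) for run, team, pts in simulated]
    # Sort by run, then points descending; team name is a stand-in tie breaker.
    totals.sort(key=lambda row: (row[0], -row[2], row[1]))
    ranked = []
    position = 0
    current_run = None
    for run, team, total in totals:
        position = 1 if run != current_run else position + 1
        current_run = run
        ranked.append((run, position, team, total))
    return ranked


ranked = project_positions(simulated, POINTS_TO_DATE)
for row in ranked:
    print(row)
```

Sorting on the run number first is what keeps each simulated season’s rows together before positions are assigned.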
All that’s left to generate is a likelihood of each team’s finish position, and another pivot table of table position versus team can do this. In this case the pivot table plots teams on the rows and table position in the columns and values. The pivot table’s values will need to be changed to a “count” rather than a sum (the model is measuring how many times a team is projected to finish in a table position), and the “Show data as:” field should be marked as “% of row”.
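The count and “% of row” calculation can be sketched in Python as follows. The four simulated seasons here are invented purely to keep the arithmetic easy to follow, and `finish_likelihoods` is a hypothetical helper name.

```python
from collections import Counter, defaultdict

# Hypothetical ranked output: (run, position, team) over four simulated seasons.
ranked = [
    (1, 1, "Seattle"), (1, 2, "Dallas"), (1, 3, "Portland"),
    (2, 1, "Portland"), (2, 2, "Dallas"), (2, 3, "Seattle"),
    (3, 1, "Seattle"), (3, 2, "Portland"), (3, 3, "Dallas"),
    (4, 1, "Seattle"), (4, 2, "Dallas"), (4, 3, "Portland"),
]


def finish_likelihoods(ranked):
    """Count finishes per team and position, then express each count as a share of its row."""
    counts = defaultdict(Counter)
    for _run, position, team in ranked:
        counts[team][position] += 1
    likelihoods = {}
    for team, by_position in counts.items():
        row_total = sum(by_position.values())
        likelihoods[team] = {
            pos: n / row_total for pos, n in sorted(by_position.items())
        }
    return likelihoods


likelihoods = finish_likelihoods(ranked)
for team, row in sorted(likelihoods.items()):
    print(team, row)
```

Each team’s row sums to 1, mirroring a pivot table whose values are shown as a percentage of the row total.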
The resultant pivot table should look like this:
That’s it. That is all that is required to build a Monte Carlo simulation. Users of the simulation can now update its inputs (matches played versus upcoming fixtures) as frequently as they like, run “what if” studies for the next week’s matches, and any other variety of forecasts. The process can become highly automated and take only minutes a week to update if special attention is paid to the Excel workbook’s construction. A person can automate even the process of combining prior matches and future fixtures with VLOOKUP and sort functions with even the most basic programming skills via Excel’s “record macro” function.
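A “what if” study can be sketched by fixing the result of one match and re-running the simulation. The Python below is an illustration only: one team’s three hypothetical fixtures, made-up probabilities, and a `forced` argument that is an assumption of this sketch rather than a feature of the workbook.

```python
import random

POINTS = {"loss": 0, "draw": 1, "win": 3}

# Hypothetical remaining fixtures for one team: (opponent, P(loss), P(draw)).
FIXTURES = [("Portland", 0.25, 0.30), ("Dallas", 0.35, 0.25), ("Houston", 0.30, 0.30)]


def prob_of_reaching(target, fixtures, runs, seed, forced=None):
    """Estimate P(points >= target); `forced` maps opponent -> fixed outcome."""
    forced = forced or {}
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        total = 0
        for opponent, p_loss, p_draw in fixtures:
            r = rng.random()  # always drawn, so every study sees the same random stream
            if opponent in forced:
                outcome = forced[opponent]
            elif r < p_loss:
                outcome = "loss"
            elif r < p_loss + p_draw:
                outcome = "draw"
            else:
                outcome = "win"
            total += POINTS[outcome]
        if total >= target:
            hits += 1
    return hits / runs


baseline = prob_of_reaching(6, FIXTURES, 10_000, seed=1)
if_win = prob_of_reaching(6, FIXTURES, 10_000, seed=1, forced={"Portland": "win"})
if_loss = prob_of_reaching(6, FIXTURES, 10_000, seed=1, forced={"Portland": "loss"})
print(f"baseline={baseline:.3f} if_win={if_win:.3f} if_loss={if_loss:.3f}")
```

Reusing the same seed for each scenario means the gap between the estimates reflects the forced result rather than simulation noise.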
Applications of Monte Carlo Simulation
Here are some examples of how this very basic approach can be utilised in competition forecasting.
Transfer Price Index Simulations of the English Premier League Season
Transfer Price Index’s mSq£ model, which utilises venue and relative squad costs as inputs, was used to forecast the most likely final table positions of each club on a weekly basis. This model quantified individual match outcomes’ impacts on each team’s likely finish position (it wasn’t just Manchester United’s win over City in October that swung the title their way), as well as just how much of an advantage a club might have surrendered along the way (see Tottenham’s 6-%7 likelihood of a Top 4 after beating Arsenal in early March and how much it fell away over the final two-and-a-half months of the season).
MLS Eastwood Index
Blogger Martin Eastwood created the Eastwood Index as a way to know where teams stand relative to each other, how results against clubs with various levels of quality impact a team’s rating, and how the ratings difference between two clubs can help predict future match outcomes.
This model has been applied to MLS, and the Monte Carlo simulations have been used to quantify things like the impact the Seattle Sounders’ poor start had on the danger (or lack thereof) of not making the league playoffs.
CONCACAF World Cup Qualification
Finally, Monte Carlo simulations can even be used to run a “post-mortem what if” using others’ forecast match outcomes after the matches are completed. One such source for such match forecasts is bookmaker odds. Bookmakers are looking to maximise their profit, so they often do [...] World Cup qualifying.
While everyone knows Mexico has struggled from match-to-match, it turns out that bookmakers only foresaw Mexico’s current three points or less in -% of the aggregate outcomes contained in their forecasts. Meanwhile, the United States’ four points puts them squarely within bookmaker expectations.
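The post does not say how the bookmaker odds were turned into match probabilities. One common and simple approach, shown here purely as an assumption, is to invert decimal odds and rescale so the three probabilities sum to one, which removes the bookmaker’s margin proportionally. The odds below are hypothetical.

```python
def implied_probabilities(decimal_odds):
    """Convert decimal odds for (win, draw, loss) into probabilities summing to 1."""
    raw = [1.0 / odds for odds in decimal_odds]
    overround = sum(raw)  # exceeds 1 by the bookmaker's built-in margin
    return [p / overround for p in raw], overround


# Hypothetical decimal odds for a home win, a draw and an away win.
probs, overround = implied_probabilities([1.80, 3.60, 4.50])
print([round(p, 3) for p in probs], round(overround, 3))
```

The resulting probabilities can then feed the same match-by-match simulation as a statistical model’s output would. Other ways of removing the margin exist and can give slightly different numbers.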
Conclusion
Using Monte Carlo simulation methods allows analysts to properly measure and model discrete events like soccer matches, and then roll the results of those discrete events up to a bigger forecast over a season or more.
More importantly, Monte Carlo simulation methods provide a probabilistic outlook to such forecasts, allowing the analyst to express their level of statistical certainty (or uncertainty) in the forecast. This is key to thinking in a noisy, uncertain sport like soccer, and as this post has attempted to explain it’s not too complex an analysis to set up. All that’s needed is a probabilistic model, a tool like Microsoft Excel for storing results, and a bare minimum of programming capability.