Modeling Profitability Instead of Default

Modeling Profitability Level as a Continuous Output (Instead of Binary Classification Default/No Default)

Introduction

Both your own model and the forecast based on Eggertopia scores are binary classifications: they forecast one of just two outcomes, "Default" or "No Default." Your boss is interested in the idea that it might be preferable instead to model and forecast profits and losses as continuous values, using a multivariate linear regression model on the same six input variables. This idea has arisen because the bank has been reviewing individual profit and loss numbers for each customer over the three-year period and has made an interesting discovery: some defaulting customers carried so much debt for so long, and paid so much interest on it, that they were profitable for the bank even though they defaulted. Many customers who seem to have risky spending behaviors are also among the most profitable for a lending business. At the opposite extreme, customers who always paid off their cards in full each month never defaulted but were not very profitable: the bank barely broke even, or even lost money, on its "safest" borrowers.

Your boss asks you to forecast each applicant's expected profitability in dollars before deciding whether or not to issue them a credit card. He wants to know how reliable this type of forecast would be: what is the range above and below the point estimate that will be correct 012 of the time?

Although it might be possible to combine the six inputs in other ways, in the interests of time and of focusing on the key learning objectives we will use only a simple linear combination of the six input variables for Part 4 of this Project. (You should not include the Eggertopia Scores as an input variable.)

Question 1 is about the coefficients, or "betas," used to combine the standardized inputs to get the best-fit line on standardized outputs on the Training Set. We then use those fixed betas to measure the observed residual error of the model on the Test Set. Questions 2 through 7 concern the forecasts on the Test Set. Questions 8 through 11 look at the Training Set results so that they can be compared (for possible over-fitting) against the Test Set results. Questions 12 through 14 are about the uncertainty that remains in a new individual forecast of profitability.

Use the Excel LINEST function on the six inputs and the profitability output of the :11 Training Set applicants to calculate the coefficients (the "betas") that result in the best-fit line.

Question: Do you feel prepared to take this quiz? Yes
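A minimal Python sketch of the same fit, assuming NumPy is available and that inputs and output are both standardized with the sample standard deviation (ddof=1). The names `standardize` and `fit_betas` are illustrative, not part of the workbook. It fits the same ordinary-least-squares model with an intercept that LINEST fits, but returns the betas in input-column order, whereas LINEST lists the slopes in reverse column order followed by the intercept.

```python
import numpy as np

def standardize(values):
    """Return z-scores plus the means and sample SDs used (column-wise for 2-D input)."""
    means = values.mean(axis=0)
    sds = values.std(axis=0, ddof=1)  # sample SD; assumed to match the spreadsheet's choice
    return (values - means) / sds, means, sds

def fit_betas(x_std, y_std):
    """Ordinary least squares of y_std on x_std with an intercept.

    Returns (betas, intercept); betas are in input-column order.
    """
    design = np.column_stack([x_std, np.ones(len(y_std))])
    coef, *_ = np.linalg.lstsq(design, y_std, rcond=None)
    return coef[:-1], coef[-1]

# Synthetic example: six noise-free inputs with known raw coefficients (not workbook data).
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 6))
y = 3.0 + x @ np.array([0.3, -0.2, 0.1, 0.5, -0.4, 0.05])
x_std, x_means, x_sds = standardize(x)
y_std, y_mean, y_sd = standardize(y)
betas, intercept = fit_betas(x_std, y_std)
print(np.round(betas, 3), round(float(intercept), 6))
```

Because inputs and output are rescaled by the same kind of standard deviation, using population SDs (ddof=0) for both would leave the standardized betas unchanged, and the intercept on the Training Set is zero up to rounding.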
Question: What are your values for each "beta" on the Training Set?
Age
Years at current employer
Years at current address
Income over the past year
Current credit card debt
Current automobile debt
&17 &70 ,&1< &;4 ,&1; 1
@or tis ?uestion use te Liner =egression @orecasting eplanation and #cel spreadseet& 6uestion$ 9at is te root,"ean,s?uare residual (te standard deviation of "odel error) on 5tandardi8ed output for te *est 5et3 &A:1 1&;<1 r
For this question, use the Linear Regression Forecasting Explanation and Spreadsheet. Question: What is the observed correlation R on the Test Set?
&<01 &
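A sketch of Pearson's correlation between the model's forecasts and the actual outputs, which is the quantity Excel's CORREL computes; the helper itself is an illustration, not the course spreadsheet's formula. On the Training Set, where the betas were fitted with an intercept, R squared equals the R² that LINEST reports; on the Test Set the two need not agree.

```python
import math

def pearson_r(forecasts, actuals):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(forecasts)
    mean_f = sum(forecasts) / n
    mean_a = sum(actuals) / n
    cov = sum((f - mean_f) * (a - mean_a) for f, a in zip(forecasts, actuals))
    var_f = sum((f - mean_f) ** 2 for f in forecasts)
    var_a = sum((a - mean_a) ** 2 for a in actuals)
    return cov / math.sqrt(var_f * var_a)
```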
For this question, use the Linear Regression Forecasting explanation and Excel spreadsheet. Question: What is the standard deviation of model error in dollars for the Test Set?
AA<0&A; A&74
For this question, use the Linear Regression Forecasting explanation and Excel spreadsheet. Question: What is the 012 confidence interval in dollars for the Test Set?
& above the point estimate and & below the point estimate
;<4&7< above the point estimate and ;<4&7< below the point estimate
;77&07 above the point estimate and ;77&07 below the point estimate
Question: What is the Percentage Information Gain (P.I.G.) on the Test Set?
A<&:2 7&02 :;&42 :<&<2
For this question, use the Linear Regression Forecasting explanation and Excel spreadsheet. Question: What is the Correlation R of your model on the Training Set?
&<1 &<1 &10
For this question, use the Linear Regression Forecasting explanation and Excel spreadsheet. You need to quantify the uncertainty in a regression model forecast of applicants' future profitability. Assume that both the forecast profits and the errors have a Gaussian distribution. You will calculate the standard deviation of model error on standardized data, the standard deviation of the model error in dollars, and the 012 confidence interval for profitability estimates. Question: What is the standard deviation of your model error on the standardized Training Set output?
&<
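The chain of calculations described in this question can be sketched in Python under the same Gaussian assumption. The standard deviation of model error in dollars is the standardized error SD multiplied by the profitability SD, and a two-sided interval at confidence c extends z times that dollar SD on each side, where z is the (1 + c)/2 quantile of the standard normal distribution. The function names and the numbers in the example line are hypothetical placeholders, not workbook values.

```python
from statistics import NormalDist

def error_sd_dollars(error_sd_std, profit_sd):
    """Convert a standardized error SD back to dollars.

    A standardized output is measured in units of the profitability SD,
    so multiplying by that SD restores dollar units.
    """
    return error_sd_std * profit_sd

def two_sided_z(confidence):
    """z such that a standard normal lies within +/- z with probability `confidence`."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

def interval_half_width(error_sd_std, profit_sd, confidence):
    """Distance above and below the point estimate for the given confidence level."""
    return two_sided_z(confidence) * error_sd_dollars(error_sd_std, profit_sd)

# Placeholder inputs: standardized error SD 0.5, profitability SD $1,000.
print(round(interval_half_width(0.5, 1000.0, 0.95), 2))  # prints 979.98
```

At 95% confidence z is about 1.96, which is where the familiar "roughly two standard deviations" rule comes from.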
For this question, use the Linear Regression Forecasting explanation and Excel spreadsheet. Question: What is the standard deviation of model error in dollars on the Training Set? (This may seem similar to an earlier question, but that one refers to the Test Set.)
4A7:&07 AA<0&A;
For this question, use the Linear Regression Forecasting explanation and Excel spreadsheet. Question: What is the 012 confidence interval in dollars on the Training Set? (This may seem similar to an earlier question, but that one refers to the Test Set.)
A:&0A &
For this question, use the Linear Regression Forecasting explanation and Excel spreadsheet. Question: What is the Percentage Information Gain (P.I.G.) on the Training Set? (This may seem similar to an earlier question, but that one refers to the Test Set.)
47&42 A:&42 A;&2 A<&2
Questions 7A through 7 use the same example applicant. The following data are known about the sample applicant:
Age: 4:&11
Years at Employer: 7:&44
Years at Address: 1&0
Income: 7:7411
CC debt: ,A4::
Auto debt: ,:A477
To convert the above inputs to standardized form, locate the Training Set Spreadsheet (first bottom tab of the workbook) in the Data for Final Project Workbook (Data_for_Final_Project.xls). Use the input means in Cells C:1<$:1< and the standard deviations in Cells C:10$:10. Use the Training Set profitability mean H701&7 and standard deviation H<&07 from the Profit and Loss Spreadsheet (last bottom tab). Use the Test Set standard deviation of error on standardized outputs, &;<1.
Question: What is the point estimate of profitability in dollars?
,71;A&;7 71;A&;7
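A minimal sketch of the point-estimate calculation, assuming the betas were fitted on standardized Training Set data as in Question 1 and that the applicant's raw inputs are standardized with the Training Set means and standard deviations, as instructed above. Raw inputs should keep the same sign conventions as the Training Set data. The function name and the toy two-input numbers are hypothetical, not the sample applicant's data.

```python
def point_estimate_dollars(raw_inputs, input_means, input_sds, betas,
                           profit_mean, profit_sd, intercept=0.0):
    """Forecast profitability in dollars for one applicant.

    1. Standardize each raw input with the Training Set mean and SD.
    2. Combine the z-scores with the standardized betas (plus intercept).
    3. Convert the standardized forecast back to dollars.
    """
    z_inputs = [(x - m) / s for x, m, s in zip(raw_inputs, input_means, input_sds)]
    z_profit = intercept + sum(b * z for b, z in zip(betas, z_inputs))
    return profit_mean + z_profit * profit_sd

# Toy two-input example, not workbook values.
print(point_estimate_dollars([12.0, 30.0], [10.0, 20.0], [2.0, 5.0],
                             [0.5, 0.25], profit_mean=100.0, profit_sd=40.0))  # prints 140.0
```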
Use the same sample applicant data, Training Set input means and standard deviations, Training Set profitability mean and standard deviation, and Test Set standard deviation of error on standardized outputs as in the previous question.
Question: With 12 confidence, what is the range of profitability?
Range from 7:0;:&;7 to 71;A&;7
Range from 77:A&: to 04A&04
Range from 7AA14&7; to 1;A&1;.
Use the same sample applicant data, Training Set input means and standard deviations, Training Set profitability mean and standard deviation, and Test Set standard deviation of error on standardized outputs as above.
Question: With 99% confidence, what is the range of profitability?
Range from :1;07&A: to ;<&01.
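Both range questions apply the same rule: the point estimate plus and minus z times the dollar standard deviation of model error, with z chosen from the confidence level under the Gaussian assumption. A sketch, assuming the Test Set standardized error SD and the Training Set profitability SD are the inputs the instructions name; the function name and the placeholder numbers are hypothetical.

```python
from statistics import NormalDist

def profit_range(point_estimate, error_sd_std, profit_sd, confidence):
    """Two-sided Gaussian range (low, high) around a dollar point estimate."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * error_sd_std * profit_sd
    return point_estimate - half_width, point_estimate + half_width

# Placeholder numbers only: point estimate $100, standardized error SD 0.5, profit SD $40.
for level in (0.50, 0.95, 0.99):
    low, high = profit_range(100.0, 0.5, 40.0, level)
    print(f"{level:.0%}: {low:.2f} to {high:.2f}")
```

Raising the confidence level only increases z, so the range widens while staying centered on the point estimate.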
Comparing Test Set and Training Set Performance

Question 1: Between the Training Set and the Test Set, the dollar value of the standard deviation of model error:
Decreased by about 72, which suggests a very strong model on Test Set data.
Increased by more than :2, which suggests possible model over-fitting.
Increased by less than :12, which suggests minimal model over-fitting.
Increased by more than 12, which leads to the conclusion of model over-fitting.
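To answer this from your own results, express the change in the dollar standard deviation of model error as a percentage of the Training Set value. A minimal sketch; the helper name and the example values are hypothetical. A positive change means the error grew on data the model was not fitted to, which is the symptom the answer options associate with over-fitting.

```python
def percent_change(train_value, test_value):
    """Percentage change going from the Training Set value to the Test Set value."""
    return 100.0 * (test_value - train_value) / train_value

# Placeholder dollar SDs of model error, not workbook values.
change = percent_change(train_value=1000.0, test_value=1150.0)
print(f"{change:+.1f}%")  # prints +15.0%
```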