Stats:

1- You have two populations and a sample from each. You need to check the difference in proportions between the two populations.
   Ans: Use a z-test for the difference of two population proportions.

2- How to test the correlation between two discrete variables?
   a. Chi-square test, Cramér's V: a measure of intercorrelation of two discrete variables [2]; it may be used with variables having two or more levels.
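A minimal sketch of the two tests named above, using only the Python standard library. The function names `two_proportion_z` and `cramers_v` and the sample counts are illustrative assumptions, not part of the original notes; the z-test shown is the pooled version, which assumes large samples.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic and two-sided p-value for H0: p1 == p2.

    x1, x2 are success counts; n1, n2 are sample sizes (hypothetical names).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0]))  # smaller of (#rows, #columns)
    return math.sqrt(chi2 / (n * (k - 1)))

# Made-up counts: 45 of 100 successes in sample 1, 30 of 100 in sample 2.
z, p_value = two_proportion_z(45, 100, 30, 100)

# Made-up 2x2 table in which the two variables agree perfectly.
v = cramers_v([[10, 0], [0, 10]])
```

With these made-up counts the z statistic is about 2.19, and the perfectly associated table gives V = 1. A table with identical rows would give V = 0.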
3- What is multicollinearity? How to test for it?
   a. When there is strong correlation between independent variables, it affects the regression model. It can increase the variance of the coefficient estimates and make the estimates very sensitive to model changes, which results in unstable estimates and can cause them to switch signs.
   b. The Variance Inflation Factor (VIF) is used to detect multicollinearity. VIF measures how much the variance of a coefficient estimate is inflated due to multicollinearity. VIF = 1/(1 - R-square), where R-square comes from regressing that independent variable on the other independent variables.
   c. The square root of VIF tells how much larger the standard error is, compared with what it would be in the absence of multicollinearity.
   d. VIF = 1 means no inflation; as a rule of thumb, a VIF above 5 or 10 is considered high.
   e. To reduce VIF, standardize the variables (centering helps mainly when the collinearity comes from interaction or polynomial terms).

4- What is ROC curve?
   a. ROC: Receiver Operating Characteristic curve analysis.
   b. It is used for model validation (goodness of fit).
   c. It is the curve of sensitivity (true positive rate) against (1 - specificity).

5- What is Gini index?
   a. Ratio of the area between the line of equality and the Lorenz curve (ROC curve) to the area of the triangle bounded by the line of equality. For a ROC curve this equals 2 * AUC - 1.

6- What to do when the variables are not normal in regression?
   a. Transformation (log, exp, etc.)
   b. Box-Cox transformation
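The VIF formula above can be sketched for the simplest case of two independent variables, where the R-square of regressing one on the other is just their squared Pearson correlation. This is an illustration under that assumption only; the helper names and the data are made up, and with more than two predictors R-square would have to come from a full regression.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF = 1 / (1 - R^2); with only two predictors, R^2 = r(x1, x2)^2."""
    r_squared = pearson_r(x1, x2) ** 2
    return 1.0 / (1.0 - r_squared)

# Made-up predictor values with a correlation of about 0.77.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 5, 4, 5]

vif = vif_two_predictors(x1, x2)   # R^2 = 0.6, so VIF = 2.5
se_inflation = math.sqrt(vif)      # standard error is about 1.58 times larger
```

Uncorrelated predictors give a VIF of exactly 1, and the value grows without bound as the correlation approaches 1 in absolute value.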
7- What is AIC?
   a. Akaike information criterion.
   b. Used to estimate the quality of each model relative to the others.
   c. A relative estimate of the information lost when a given model is used to represent the data.
   d. AIC = 2k - 2ln(L)
   e. k = number of parameters, L = maximized value of the likelihood function.
   f. The model with the minimum AIC is the best model.
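As a small worked example of the AIC formula, the sketch below compares two candidate models. The model names, parameter counts and log-likelihood values are invented for illustration; the function takes ln(L) directly, since fitted models usually report the log-likelihood rather than L itself.

```python
def aic(k, log_likelihood):
    """AIC = 2k - 2 ln(L), where log_likelihood is ln(L) at its maximum."""
    return 2 * k - 2 * log_likelihood

# Hypothetical candidates: name -> (number of parameters k, maximized ln(L)).
candidates = {
    "model_a": (3, -120.5),
    "model_b": (5, -119.8),
}

scores = {name: aic(k, ll) for name, (k, ll) in candidates.items()}

# The candidate with the smallest AIC is preferred.
best = min(scores, key=scores.get)
```

Here model_b fits slightly better (higher log-likelihood), but its two extra parameters cost more than the gain, so model_a has the lower AIC (247.0 against about 249.6).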
8- Concordance matrix?
   a. Pairs are formed from one actual event (1) and one actual non-event (0).
      Percent Concordant = (Number of concordant pairs)/Total number of pairs
      Percent Discordant = (Number of discordant pairs)/Total number of pairs
      Percent Tied = (Number of tied pairs)/Total number of pairs
      Area under curve (c statistic) = Percent Concordant + 0.5 * Percent Tied
   b. Concordant: when 1 is predicted 1 and 0 is predicted 0 (the predicted probability for the event is higher than for the non-event).
   c. Discordant: when 1 is predicted 0 and 0 is predicted 1 (the predicted probability for the non-event is higher than for the event).
   d. Tied: when the predicted probabilities of the 1 and the 0 are the same.

9- What is discriminant analysis?
   a. Used when the dependent variable is categorical and the independent variables are continuous.

10- Difference between logistic regression and discriminant analysis
   a. Unlike discriminant analysis, logistic regression does not require the independent variables to be normally distributed, linearly related, or of equal variance within each group.
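The concordance counts described above can be sketched as follows. The function name `concordance`, the dictionary keys and the sample data are assumptions made for illustration; the sketch compares every (event, non-event) pair directly, which is fine for small data but quadratic in the number of observations.

```python
def concordance(actual, predicted_prob):
    """Percent concordant, discordant, tied, and the c statistic.

    actual holds 0/1 outcomes; predicted_prob holds the predicted
    probability of the event (1) for each observation.
    """
    events = [p for a, p in zip(actual, predicted_prob) if a == 1]
    non_events = [p for a, p in zip(actual, predicted_prob) if a == 0]

    concordant = discordant = tied = 0
    for p_event in events:
        for p_non_event in non_events:
            if p_event > p_non_event:
                concordant += 1
            elif p_event < p_non_event:
                discordant += 1
            else:
                tied += 1

    total = len(events) * len(non_events)
    pct_concordant = concordant / total
    pct_discordant = discordant / total
    pct_tied = tied / total
    return {
        "concordant": pct_concordant,
        "discordant": pct_discordant,
        "tied": pct_tied,
        "c_statistic": pct_concordant + 0.5 * pct_tied,
    }

# Made-up outcomes and predicted probabilities.
result = concordance([1, 1, 0, 0], [0.9, 0.4, 0.4, 0.1])
```

Of the four pairs in this example, three are concordant and one is tied, so the c statistic is 0.75 + 0.5 * 0.25 = 0.875.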
Machine Learning

1- What is the difference between supervised and unsupervised learning? Example
   a. In supervised learning we have a response variable or target variable (classification, regression).
   b. In unsupervised learning we don't have a target variable (clustering).

2- Is the IRIS data set (in R) an example of supervised or unsupervised learning?
   a. It depends on how it is used: clustering the flowers on petal length and width without the species label is unsupervised. Because the data set also carries a species label, predicting that label is a supervised classification problem.

3- What is random forest algo?
   a. Random forest algo builds multiple decision tree models (like CHAID, CART) on random samples of the data, and the decision is taken by voting or averaging across all the models.
   b. Random sampling can be done on attributes or on rows.

4- What is a feature vector?
   a. An N-dimensional vector of features that represents an object.

5- What is CHAID algo?
   a. Chi-square automatic interaction detector.
   b. Tree-based decision algo.
   c. Algo to create non-binary decision trees for classification (when the dependent variable is categorical), based on the chi-square test, and for regression-type problems (when the dependent variable is continuous), using the F test.

6- CART?
   a. Classification and regression tree analysis.
   b. Only binary trees can be created, unlike CHAID, where a node can split into more than two branches.

7- What is precision and recall?
   a. Precision = true pos/(true pos + false pos)
   b. Recall = true pos/(true pos + false neg)

8- Explain Central Limit theorem.

9- What is the difference between SVD and PCA?

10- When do you use factor analysis and when PCA? What is the difference between them?

11- What is the support vector machine algo, and when do you use it?

12- What is collaborative filtering?

13- What is a perceptron?

14- When do you use ANOVA, and where can ANOVA not be applied?

15- Why can we not use pairs of t-tests instead of ANOVA?
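The precision and recall definitions given earlier in this section can be sketched as below. The function name `precision_recall` and the labels are made up for illustration; returning 0.0 when a denominator is zero is a convention chosen for this sketch, not something stated in the notes.

```python
def precision_recall(actual, predicted):
    """Precision and recall for binary labels, with 1 as the positive class."""
    pairs = list(zip(actual, predicted))
    true_pos = sum(1 for a, p in pairs if a == 1 and p == 1)
    false_pos = sum(1 for a, p in pairs if a == 0 and p == 1)
    false_neg = sum(1 for a, p in pairs if a == 1 and p == 0)

    # Assumed convention: report 0.0 when the denominator is zero.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall

# Made-up labels: 2 true positives, 1 false positive, 2 false negatives.
actual = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 1, 0, 0, 0]

precision, recall = precision_recall(actual, predicted)
```

In this example precision is 2/3 (two of the three predicted positives are correct) and recall is 1/2 (two of the four actual positives are found).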