EEOS 601, Prob & Stats Handout 2 Revised: 1/24/11 EDG Homepage © E. D. Gallagher 2011
STATISTICAL DEFINITIONS
TABLE OF CONTENTS
List of Figures
Mathematical Symbols
Definitions
References
Index
List of Figures
Figure 1. A graphical display showing how to identify the Box-Cox transformation parameter, λ, with data from Draper & Smith. The likelihood function is plotted vs. λ. A horizontal line is drawn 1.92 units below the maximum likelihood value to find the lower and upper confidence limits for λ (0.01 and 1.05 here). λ = 0.5 indicates that a √Y transform is appropriate, but the 95% CI includes λ = 1, indicating no transformation of Y. The 95% CI does not include λ = 0, indicating the ln transform is not appropriate. This analysis was performed with an SPSS macro on benthic infauna data from MA Bay fit to an equal means general linear model.
Figure 2. SPSS boxplots from Application guide Figure 2.7
Figure 3. Craps table with odds, from Wikipedia. Note that odds for rolling seven on the next roll, 5 for 1, equals 4-to-1 odds.
Figure 4. Table 30.09 showing the ANOVA mean squares corresponding to the model shown in the previous equation.
Figure 5. R A Fisher from BMJ
Figure 6. A. A. Markov
Figure 7. Matlab Ezplot of normal distribution, shown in above equation.
Figure 8. From the UCLA portraits of statisticians site
Figure 9. The regression ellipse from p. 248 in Galton (1886), posted at the UCLA statistics history site: http://www.stat.ucla.edu/history/regression_ellipse.gif
Figure 10. Plate X from Galton (1886), posted at http://www.stat.ucla.edu/history/regression.gif
Figure 11. Galton's regression to the mean, from Freedman et al. (1998)
Figure 12. Relation between the odds ratio and relative risk from Zhang & Yu (1978, Fig 1). Odds ratio overestimates relative risk, especially if the event is common.
Figure 13. ROC curve from Hosmer & Lemeshow
Figure 14. ROC curve for PSA antigen test from Thompson et al. (2005). 1-specificity is the false positive rate.
Figure 15. Stem & leaf diagram from Statistical Sleuth
Handout 2  Intro Prob & Statistics  Terms  P. 2 of 68
Mathematical Symbols
See http://en.wikipedia.org/wiki/Table_of_mathematical_symbols
A'  transpose of a matrix or vector A or, more rarely, the complement of an event A
A^c  the complement of event A; P(A^c) = 1 - P(A)
$\binom{n}{k}$  binomial coefficient
exp(x)  e^x, where e is the base of natural logarithms
iid  identically independently distributed
ln(x)  Natural logarithm of x, to the base e. It may also be represented as log(x), but log(x) can be a logarithm to any base. Base 10 and 2 are also common.
∈  'is a member of', 'in'. See membership
∩  intersection
∧  Logical conjunction; the statement A ∧ B is true if A and B are both true; else it is false.
∨  Logical disjunction; the statement A ∨ B is true if A or B (or both) are true; if both are false, the statement is false.
∅  the null set
∪  union
∴  therefore
DEFINITIONS
A priori contrast  A planned comparison, specified before the experiment was conducted, with implications for the interpretation of tests in ANOVA and other statistical tests.
A posteriori contrast  Any unplanned comparison carried out after collecting and examining patterns in the data. These statistical tests usually require an adjustment of the alpha level for the test decision. See multiple comparison tests, family-wise error rate.
Absorbing Markov chain  Roberts (1976, Theorem 5.3) A chain is absorbing if and only if it has at least one absorbing state, and from every nonabsorbing (transient) state it is possible to reach some absorbing state. cf., Markov chain, fundamental matrix.
Accuracy  refers to the difference between the measured or computed value and the true value; it is also called the systematic error (cf. precision)
ACE  Abundance-based coverage estimators, a species richness method reviewed by Colwell & Coddington (1994) and Hughes et al. (2001).
Adjusted R-squared  cf., R-squared
Akaike Information Criterion (AIC)  A measure of goodness of fit of regression models with a strong penalty for the number of parameters in the model. See Ripley on model choice: http://www.stats.ox.ac.uk/~ripley/ModelChoice.pdf AIC is intended for nested models (with one model a proper subset of another). cf., BIC
Algorithm  A set of well-defined steps designed to produce an outcome.
Alpha level  The probability of Type I error. An alpha level of 0.05 is the pragmatic cutoff adopted by both Fisher and Neyman & Pearson to decide whether a result is significant
or not, but the significant/not significant dichotomy is not recommended in current statistical parlance.
Alternative or alternate hypothesis  A hypothesis that is often complementary to the null hypothesis. For example, the null hypothesis might be that a parameter equals a specified value, and the two-tailed (= two-sided) alternate hypothesis might be that the parameter differs from that value. There might also be a one-tailed (one-sided) alternative hypothesis that the parameter is greater (or less) than that value. The alternate hypothesis usually must be specified to calculate Type II error (β) and the power (1-β) of a statistical test.
Analysis of covariance (ANCOVA)
ANOVA  Analysis of Variance. Invented by Fisher. A partitioning of sums of squared deviations from means that allows tests for differences in means and differences in variance. ANOVA is a form of the general linear model with explanatory variables (formerly called independent variables) or factors that are categorical. Most ANOVA problems can also be analyzed as regression problems with categories coded as indicator or dummy variables, but regression also allows continuous explanatory variables to be included in the design.
    Assumptions  A) Equal variances among subgroups (also called homogeneity of variance or homoscedasticity) B) Normally, identically independently distributed errors. For specific ANOVA models, there are further assumptions. For example, in an unreplicated randomized block design, the test of the main effects is based on the assumption that block and treatment effects are additive (i.e., no interaction).
    Which assumptions matter?  Unequal variance is a major problem for ANOVA, but the results can be robust if sample sizes are equal. Winer et al. (1991, Table 3.8, p 102) provide an example of why equal sample size is important. The table, adapted from Glass et al. (1972), states that with equal sample sizes alpha levels are unaffected: 'Effect on α: Very slight effect on α, which is seldom distorted by more than a few hundredths. Actual α seems always to be slightly increased over the nominal α'. [But Wilcox and others have documented that this is not the case.] With unequal n's, the alpha levels can be affected, with α increased if the smaller group has the larger variance. Sleuth Display 5.13 (top row) indicates that with unequal variance Type I error can be much higher than nominal (7.1% vs. the nominal 5%) or much less than the nominal level (0.4%). Quinn & Keough (2002, p. 193) review a study by Wilcox documenting that unequal variances can affect Type I error, and the problem is much worse with unequal sample sizes. If the smaller group has the larger variance, the probability of Type I error could be one in four or larger.
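As a rough illustration of the partitioning of sums of squares described under ANOVA, the following minimal Python sketch computes the between-group and within-group sums of squares and the F ratio for a one-way fixed-effects layout. The function name `one_way_anova` and the example data are hypothetical, chosen for illustration only, and the sketch does not check the equal-variance or normality assumptions discussed above.

```python
def one_way_anova(groups):
    """Return (ss_between, ss_within, f_stat) for a list of groups of numbers.

    Illustrative sketch only: a one-way, fixed-effects partition of the
    total sum of squared deviations from the grand mean.
    """
    all_values = [y for g in groups for y in g]
    n_total = len(all_values)
    grand_mean = sum(all_values) / n_total

    ss_between = 0.0  # variation of group means around the grand mean
    ss_within = 0.0   # variation of observations around their group mean
    for g in groups:
        group_mean = sum(g) / len(g)
        ss_between += len(g) * (group_mean - grand_mean) ** 2
        ss_within += sum((y - group_mean) ** 2 for y in g)

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f_stat


# Hypothetical data: three groups of three observations each.
example_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
ss_b, ss_w, f = one_way_anova(example_groups)
print(ss_b, ss_w, f)
```

The two sums of squares add up to the total sum of squared deviations from the grand mean, which is the partition the F test relies on.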
    Blocked ANOVA (randomized block ANOVA)
    Model I  (Model 1) Fixed effects model. Each level of each explanatory factor is assumed to add a fixed amount to the mean.
    Model II  (Model 2) Random-effects model. ANOVA is used to assess whether different levels of a factor contribute to the variance. For example, in assessing plant height, there could be among-plant variance in height within a local patch, an additional variance component due to patch-to-patch variability within an area, and finally an additional variance component due to different areas. In factorial models, the appropriate statistical tests depend on whether the factors are fixed or
random. Ramsey & Schafer 1997 p. 130 "(1) Is inference desired to a larger set from which these groups are a sample, in which case one must also be concerned about (2) are the groups (operators) truly a random sample from the larger set? A yes to (1) would indicate that a random effects model should be used, but could only be justified if the answer to (2) was yes." See also random effects
    Mixed model  (Model III) A model including both fixed and random effects. A nested or hierarchical ANOVA can be an example of a mixed model, with treatment effects being fixed and the variability among experimental or survey units being a random factor. Mixed models can be treated as general linear models, assuming normally and independently distributed errors, with model parameters estimated through minimization of sums of squares. Mixed models can also be analyzed using generalized linear models, which usually estimate parameters through maximum likelihood. See http://www.statsoft.com/textbook/stvarcom.html for a brief discussion of generalized linear modeling approaches to mixed models. SPSS's GLM (UNIANOVA) can be used for a general linear mixed model, and SPSS's program 'mixed' can be used for a generalized linear model. The generalized linear model allows a number of different ways of handling the variance-covariance estimates.
    Nested (Hierarchical)  Involves more than one observation per experimental unit. The degrees of freedom must be partitioned into error and 'experimental unit within treatment' sources of variation. Note that some nested ANOVA models treat both experimental or survey units and treatment levels as fixed factors. A mixed model nested ANOVA treats units as random factors and treatment levels as fixed factors. The main effect of the fixed factor is tested over the experimental unit within treatment mean square.
    One-way  One explanatory factor or category
    Two-way  Two explanatory factors or categories
    Factorial  Two or more explanatory categories. A full factorial model is sometimes called a crossed ANOVA.
    Randomized block  Each level of treatment is randomly allocated within each block. To test the full randomized block model, including block x treatment interaction, requires that there be replicates of each treatment within each block.
    Repeated measures  The same experimental units (e.g., patients, quadrats) are sampled more than once (e.g., clinical trials in which a patient is given a placebo and a test drug). Student's paired t test would be appropriate if there were just two variables measured on each subject.
    Split plot  Multiple treatment levels are nested within a larger treatment level. For example, an entire field could receive a given level of fertilizer, and different watering levels could be used on different portions of the field. Or, different greenhouses could be used to control temperature for a large number of trays of plants, and then different watering levels and fertilizer levels could be used within different areas or blocks of each greenhouse. The ANOVA table is often split, with tests of the main plot being based on a partition of the degrees of freedom of the main plots (e.g., fields or greenhouses), whereas the factors being assessed in the subplots (e.g., water or fertilizer level) can be assessed with error terms incorporating a much larger number of degrees of freedom. Cochran & Cox (1957, p. 296-297) compare split plot and randomized blocks designs with A being the main factor and B being the split-plot factor:
        1) B and AB effects are estimated more precisely than A effects in the split-plot design,
        2) Overall experimental error is the same between designs: increased precision on B and AB effects is at the expense of precision for tests of A effects,
        3) The chief advantage of the split plot over the factorial is combining factors that are expensive to create (the A or main plot factors) with relatively inexpensive subplot factors.
    Consider the use of a split plot design when B and AB effects are of more interest than A, or if the A effects cannot be fully replicated with small amounts of resources.
Analytical error  In measurement, there is usually sampling error, and there is also analytical error. Even if a sample had a known value for a variable (sampling error is zero), some analytical methods introduce error. The expected value of this analytical error, if the instrument is properly calibrated, should be zero so that precision is affected but not accuracy (c.f., systematic error). [Note added 5/15/09: I just did a web search on analytical error and found that in chemical analysis, Total analytical error (TAE) is defined as the sum of both the random and systematic error, so that TAE affects both accuracy and precision].
Arcsine square root transformation  For some frequency data, arcsin(√x), when x ranges between 0 and 1, will sometimes expand truncated tails in a distribution. The residual vs. predicted value plot indicating the need for a transform looks football-shaped: thick in the middle, thin at the tails. The logit transform often works better and is easier to interpret.
ARIMA  autoregressive integrated moving-average cf., CAR, SAR, SARIMA
Asymptotic relative efficiency [Pitman efficiency]  "Suppose that, ..., the sample size m = n has been determined for which the Wilcoxon test will achieve a specified power ... One would then wish to know what sample size m' = n' is required by the t-test to achieve the same power against the same alternative. The ratio n'/n is called the efficiency of the Wilcoxon test relative to the t-test ... the limiting efficiency, which turns out to be independent not only of Π [power] but also of α is called the Pitman efficiency (or asymptotic relative efficiency) of the Wilcoxon test to the
t-test." Lehmann (2006, p. 78-80). The asymptotic relative efficiency of the sign test is 62% relative to the t test and 66% relative to the Wilcoxon signed rank test. The Wilcoxon signed rank test has a 94% asymptotic relative efficiency relative to the t test. Lehmann (2006, p. 172).
Axiomatic probability  see probability
Bartlett's test  A test for homoscedasticity or equality of variances, not used much now since it is sensitive to departures from normality. Levene's test and graphical methods are preferred.
Bayes, Thomas  (1702(?)-1761). Protestant minister who described Bayes theorem.
Bayes theorem  A theorem named after the English minister Thomas Bayes, published posthumously in 1763. The first explicit statement of the theorem is due to Laplace.
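In standard textbook notation, for events A and B with P(B) > 0, the theorem can be written as below. The symbols A and B are assumptions made for this sketch and are not necessarily those used in the original handout.

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
            = \frac{P(B \mid A)\, P(A)}{P(B \mid A)\, P(A) + P(B \mid A^{c})\, P(A^{c})}
```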
Or from Robert & Casella (1999):
Bayesian inference  A school of statistics based on Bayes theorem. Every analyst has a prior belief about the probability of a given hypothesis & its alternatives. After evaluating data, these prior probabilities can be combined with the data to produce posterior probabilities. Bayesian probability estimates usually converge with p values from statistical tests used in the frequentist school of statistics. Bayesians argue that their methods are more general, and that Bayesian methods are more suitable for evaluating one-shot events, where long run probabilities have little meaning. cf., probability
Bayesian information criterion (BIC)  BIC statistic used to choose a parsimonious multiple regression equation cf., Mallow's Cp, Akaike information criterion
Behrens-Fisher problem  Testing the difference between means or central tendency of populations with unequal variances. Cf., Welch's t test, Satterthwaite approximation, Fligner-Policello test
Bernoulli trial  Hogg & Tanis (1977, p. 66) A Bernoulli experiment is a random experiment, the outcome of which can be classified in but one of two mutually exclusive and exhaustive ways, say success or failure … A sequence of Bernoulli trials occurs when a Bernoulli experiment is performed several independent times so that the probability of success remains the same from trial to trial.
Beta distribution  http://mathworld.wolfram.com/BetaDistribution.html
Bias  The difference between the expected value of an estimator and the true value of the parameter cf., unbiased estimator
BIC  Bayesian information criterion
Binomial coefficient  Used in the binomial expansion and in calculating the number of combinations
Binomial distribution  (Larsen & Marx 2001 Theorem 3.3.2, p. 136) Consider a series of n independent trials, each resulting in one of two possible outcomes, "success" or "failure." Let p = P(success occurs at any given trial) and assume that p remains constant from trial to trial. Let the variable X denote the total number of successes in the n trials. Then X is said to have a binomial distribution and the binomial mass function is
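In standard notation, with n, p and X as defined in the theorem, the mass function takes the following form. The index symbol k is an assumption of this sketch.

```latex
p_X(k) = P(X = k) = \binom{n}{k}\, p^{k} (1 - p)^{n - k}, \qquad k = 0, 1, \ldots, n
```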
See also the Poisson approximation to the binomial
Binomial expansion
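The expansion is usually written as below. The symbols x, y, n and k are assumptions of this sketch, and n is taken to be a nonnegative integer.

```latex
(x + y)^{n} = \sum_{k=0}^{n} \binom{n}{k}\, x^{k}\, y^{\,n-k}
```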
Binomial test  One-sample binomial test: http://www.math.bcit.ca/faculty/david_sabo/apples/math2441/section9/singpoppropsht/singpoppropht.htm
Binomial theorem  Invented by Newton
Binomial variable
Biometry
Birthday problem  http://www.math.uah.edu/stat/urn/urn7.html
Bivariate normal distribution
Blocking  Experimental design involves assigning treatments to experimental units. When groups of experimental units may be more similar than others, the experimenter often creates
blocks of similar experimental units with replicates of treatments applied within each block. A common example might be the agricultural experiment in which the experimental units are agricultural plots, arrayed in space. Blocks can be created based on spatial location, and treatments allocated to plots within spatial blocks.
Bonferroni  A conservative multiple comparisons test: test p value = Experimentwise alpha / number of tests.
Bootstrap  A Monte Carlo simulation in which n samples are drawn, with replacement, from a finite set of samples a large number of times cf., jackknife
Box-Cox family of transformations  Box & Cox (1964) developed a maximum likelihood method to estimate which transformation of the response variable, Y, provided the best fit to the linear model W = Xβ + ε, given that ε ~ N(0, Iσ²). The major transformations (square root, log, inverse) can be specified by one parameter, λ, in the following transformation equation:
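A commonly used form of this one-parameter family is W = (Y^λ - 1)/λ for λ ≠ 0 and W = ln(Y) for λ = 0; that form is assumed in the minimal Python sketch below and may differ from the scaled variant used in some texts. The function name `box_cox` is hypothetical.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform of a single positive value y.

    Assumed form: (y**lam - 1) / lam for lam != 0, and ln(y) for lam == 0.
    """
    if y <= 0:
        raise ValueError("Box-Cox transformation requires y > 0")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


# lam = 0.5 corresponds to a square-root-type transform,
# lam = 0 to the natural log, lam = -1 to an inverse-type transform.
for lam in (-1, 0, 0.5, 1):
    print(lam, box_cox(4.0, lam))
```

Dividing by λ keeps the family continuous in λ, so the log transform appears as the limiting case as λ approaches 0.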
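To perform the Box-Cox transformation, values of λ are chosen in the range -1 to 1 and the value of the likelihood function, $L(\lambda)$, is plotted vs. λ. The maximum likelihood estimate of λ, $\hat{\lambda}$, is found. Following Draper & Smith (1998), an approximate 100(1-α)% confidence interval for λ consists of those values of λ which satisfy the inequality:
$$L(\hat{\lambda}) - L(\lambda) \le \tfrac{1}{2}\,\chi^2_{1}(1-\alpha)$$
where $\chi^2_{1}(1-\alpha)$ is the percentage point of the chi-squared distribution with 1 df (3.84 for the 95% CI). One half this value (1.92) can be used graphically in a plot of $L(\lambda)$ vs. λ to find the upper and lower 95% confidence limits for λ, as shown in Figure 1.
Box's M  A test of homogeneity of variance-covariance matrices.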
Boxplot  Invented by Tukey and displaying an approximate interquartile range, median, range and extreme data points. A box marks the interquartile range (IQR) with lower and upper limits approximately equal to the 1st and 3rd quartiles. Tukey didn't define the boxes in terms of quartiles, but used the term hinges, to eliminate ambiguity. There are a number of different ways of defining the 1st and 3rd quartiles, which mark the 25th and 75th % of the cumulative frequency distribution. Hinges are simply the medians of the lower and upper half of the data points. Whiskers extend to the adjacent values, which are actual data outside the IQR but within 1.5 IQR's from the box. Points more than 1.5 IQR's from the IQR are outliers. Points more than 3 IQR's from the box are extreme outliers. See also http://mathworld.wolfram.com/Box-and-WhiskerPlot.html
Brown-Forsythe test  A test for equal variance using an ANOVA on the absolute deviation from group medians (Available in SPSS Oneway). Cf., Levene's test.
Buffon's needle  A problem in geometric probability (http://www.mste.uiuc.edu/reese/buffon/buffon.html)

Figure 1. A graphical display showing how to identify the Box-Cox transformation parameter, λ, with data from Draper & Smith. The likelihood function is plotted vs. λ. A horizontal line is drawn 1.92 units below the maximum likelihood value to find the lower and upper confidence limits for λ (0.01 and 1.05 here). λ = 0.5 indicates that a √Y transform is appropriate, but the 95% CI includes λ = 1, indicating no transformation of Y. The 95% CI does not include λ = 0, indicating the ln transform is not appropriate. This analysis was performed with an SPSS macro on benthic infauna data from MA Bay fit to an equal means general linear model.
Figure 2. SPSS boxplots from Application guide Figure 2.7
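The hinge, whisker and outlier rules in the Boxplot entry can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `boxplot_summary` is hypothetical, the middle value is included in both halves when the number of points is odd (one common convention for hinges), and distances are measured from the edges of the box in units of the hinge spread.

```python
from statistics import median


def boxplot_summary(data):
    """Return hinges, whisker ends, outliers and extreme outliers."""
    x = sorted(data)
    n = len(x)
    half = (n + 1) // 2                 # size of each half (shares the middle if n is odd)
    lower_hinge = median(x[:half])      # median of the lower half
    upper_hinge = median(x[n - half:])  # median of the upper half
    spread = upper_hinge - lower_hinge  # box length, approximately the IQR

    inner = (lower_hinge - 1.5 * spread, upper_hinge + 1.5 * spread)
    outer = (lower_hinge - 3.0 * spread, upper_hinge + 3.0 * spread)

    # Whiskers end at the most extreme data values still inside the inner limits.
    whiskers = (min(v for v in x if v >= inner[0]),
                max(v for v in x if v <= inner[1]))
    outliers = [v for v in x
                if (v < inner[0] or v > inner[1]) and outer[0] <= v <= outer[1]]
    extreme = [v for v in x if v < outer[0] or v > outer[1]]
    return {"median": median(x), "hinges": (lower_hinge, upper_hinge),
            "whiskers": whiskers, "outliers": outliers, "extreme": extreme}


# Hypothetical data with one outlier and one extreme outlier.
print(boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 20, 50]))
```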
Canonical correlation analysis
Canonical correspondence analysis  cf., redundancy analysis
Capture-recapture experiment
CAR  Conditional autoregression model cf., SAR
Cauchy distribution
Causation
Census  cf., quota sampling, survey design
Central Limit Theorem  Discovered by Laplace (1811) [see Stigler 1986, p. 146] See this brief synopsis: http://mathworld.wolfram.com/CentralLimitTheorem.html
Chain  Suppose G = (V, E) is a graph: A chain in G is a sequence $u_1, e_1, u_2, e_2, \ldots, u_t, e_t, u_{t+1}$, where $t > 0$, so that each $u_i$ is a member of V and each $e_i$ is a member of E and $e_i$ is always the edge $\{u_i, u_{i+1}\}$. The chain is usually written $u_1, u_2, \ldots, u_t, u_{t+1}$.
Change score analysis  As Campbell & Kenny (1999) discuss, there are several ways to measure the effect of an intervention, say a change in test scores as the result of a change in teaching method: 1) The outcomes can be compared directly, 2) Change score analysis in which the pretest is subtracted from the posttest, 3) Regressing the posttest score on the pretest score (this can create an artifact).
Chao1  A diversity index to estimate species richness. Reviewed by Hughes et al. (2001) and Colwell & Coddington (1994) Cf., ACE
Chebyshev's inequality  (Hogg & Tanis 1997) If the random variable X has a finite mean μ and finite variance σ², then for every k ≥ 1,
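With μ, σ and k as just defined, the inequality is usually stated in the following form (given here as the standard textbook statement):

```latex
P\left( |X - \mu| \ge k\sigma \right) \le \frac{1}{k^{2}}
```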
Chi square distribution
Chi-squared statistic  invented by Pearson in 1900 (Stigler 1986, p. 348)
Climate field reconstruction (CFR)  Approach to reconstructing a target large-scale climate field from predictors employing multivariate regression methods. CFR methods have been applied both to filling spatial gaps in early instrumental climate data sets, and to the problem of reconstructing past climate patterns from 'climate proxy' data. http://www.realclimate.org/index.php?p=29
Cluster effect  Replicate samples are not independent due to samples being collected in subgroups such as pigs in a litter (Ramsey & Schafer 2002 p. 62)
Cluster sampling
Multistage cluster sampling
Coefficient of determination R²  See R squared
Coefficient of multiple determination  the amount of variation in a response variable explained by a regression with more than one explanatory variable
Coefficient of variation  The standard deviation, s, divided by the mean.
Collinearity  see multicollinearity
Combinations  The number of combinations of n objects taken r at a time is
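The count is usually given as n! / (r! (n - r)!), the binomial coefficient. A minimal Python sketch of that formula follows; the function name `n_choose_r` is hypothetical.

```python
import math


def n_choose_r(n, r):
    """Number of combinations of n objects taken r at a time: n! / (r! (n-r)!)."""
    if not 0 <= r <= n:
        raise ValueError("requires 0 <= r <= n")
    # Integer division is exact here because the quotient is always an integer.
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))


# Example: number of distinct 5-card hands from a 52-card deck.
print(n_choose_r(52, 5))
```

Python's standard library also offers `math.comb(n, r)`, which returns the same count without forming the large factorials explicitly.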
Combinatorics
Complement  Let A be any event defined on a sample space S. The complement of A, written A^c or A', is the event consisting of all the outcomes of S other than those contained in A. (Larsen & Marx 2001, Definition 2.2.10)
Concordant
Conditional independence
Conditional probability  The symbol P(A|B), read "the probability of A given B", is used to denote a conditional probability. Specifically, P(A|B) refers to the probability that A will occur given that B has already occurred.
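In the usual textbook form, and assuming P(B) > 0, the conditional probability is defined as:

```latex
P(A \mid B) = \frac{P(A \cap B)}{P(B)}
```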
Confidence interval  Kendall & Stuart (1979, p. 199) state that the ideas of confidence interval estimation are due to Neyman, especially Neyman (1937).
Confidence limits for a proportion
Confounding variables  "A variable related both to group membership and to the outcome. Its presence makes it hard to establish the outcome as being a direct consequence of group membership." Ramsey & Schafer 1997. A confounding variable has no relation to the response, but an effect modifier does.
Consistent estimator  see estimators
Contingency table
Cook's D  A diagnostic statistic for outliers that matter in regression. Essentially, the change in regression parameters resulting from the deletion of individual cases.
Corner test
Correlation  Introduced by Galton (1888) (Stigler, 1986, p. 297) The correlation is a standardized form of covariance obtained by dividing the covariance of two variables by the product of the standard deviations of x and y. [cf., Pearson's r, Spearman's ρ, Kendall's τ]
    biserial correlation coefficient  the correlation between an artificial dichotomy (made by imposing a cut-point on a "continuous" variable) and a "continuous" variable [Burrill on sci.stat.edu]
    part correlation  From SPSS regression user's guide. The correlation between the dependent variable and an independent variable when the linear effects of the other independent variables in the model have been removed from the independent variable. It is related to the change in R-squared when a variable is added to an equation. Sometimes called the semipartial correlation.
    partial correlation  From SPSS regression user's guide. The correlation that remains between two variables after removing the correlation that is due to their mutual association with the other variables. The correlation between the dependent variable and an independent variable when the linear effects of the other independent variables in the model have been removed from both.
    point biserial correlation  the correlation between a dichotomy and a quasi-continuous variable [Burrill], or "The product-moment correlation between a dichotomous correlation and a continuous (scale) variable." Cohen et al. (2003)
    polychoric correlation  "This measure of association is based on the assumption that the ordered, categorical variables of the frequency table have an underlying bivariate normal distribution. For 2×2 tables, the polychoric correlation is also known as the tetrachoric correlation. ... the polychoric correlation coefficient is the maximum likelihood estimate of the product-moment correlation between the normal variables, estimating thresholds from the observed table frequencies. The range of the polychoric correlation is from -1 to 1." http://www.id.unizh.ch/software/unix/statmath/sas/sasdoc/stat/chap28/sect20.htm The tetrachoric correlation is a special case of the polychoric. http://ourworld.compuserve.com/homepages/jsuebersax/tetra.htm
    Phi coefficient  correlation between two dichotomies
    tetrachoric correlation coefficient  Used when both variables are dichotomies which are assumed to represent underlying bivariate normal distributions http://www2.chass.ncsu.edu/garson/pa765/correl.htm#tetrachoric and http://ourworld.compuserve.com/homepages/jsuebersax/tetra.htm
Correspondence analysis  also known as reciprocal averaging. A form of principal components analysis designed to partition and display the variation of a chi-square metric. There are at least 5 different ways of scaling the displays (see Greenacre 1984, Legendre & Gallagher 2001 [especially notes to Gallagher's Matlab programs that accompany the paper])
Countably infinite  (Larsen & Marx 2001, p 37 footnote). A set of outcomes is countably infinite if it can be put in one-to-one correspondence with the positive integers.
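As the Correlation entry notes, the correlation is the covariance divided by the product of the two standard deviations. The minimal Python sketch below illustrates that relationship. The function names are hypothetical, and the n - 1 divisor for the sample covariance is an assumption; the choice of divisor cancels in the correlation.

```python
import math


def covariance(x, y):
    """Sample covariance: sum of cross products of centered data over n - 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross_products = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return cross_products / (n - 1)


def correlation(x, y):
    """Pearson's r: covariance standardized by the two standard deviations."""
    sd_x = math.sqrt(covariance(x, x))  # variance is the covariance of x with itself
    sd_y = math.sqrt(covariance(y, y))
    return covariance(x, y) / (sd_x * sd_y)


# Hypothetical data: y is an exact linear function of x.
x = [1, 2, 3, 4]
y = [2, 4, 6, 8]
print(covariance(x, y), correlation(x, y))
```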
Covariance  a measure of association between two variables; covariance is the mean of the cross products of the centered data. It can also be defined as the expected value of the sum of cross products between two variables expressed as deviations from their respective means. The covariance between z-transformed variables is also known as the correlation.
Cox proportional hazard model  See Cox regression
Cox regression  "Cox regression offers the possibility of a multivariate comparison of hazard rates (Hazard ratios). However, this procedure does not estimate a "baseline rate"; it only provides information whether this 'unknown' rate is influenced in a positive or a negative way by the independent variable(s) (or covariates)." http://www.lrz-muenchen.de/~wlm/wlmscox.htm
From the SPSS help file: "Like Life Tables and Kaplan-Meier survival analysis, Cox Regression is a method for modeling time-to-event data in the presence of censored cases. However, Cox Regression allows you to include predictor variables (covariates) in your models. For example, you could construct a model of length of employment based on educational level and job category. Cox Regression will handle the censored cases correctly, and it will provide estimated coefficients for each of the covariates, allowing you to assess the impact of multiple covariates in the same model. You can also use Cox Regression to examine the effect of continuous covariates." See also: http://www.statsoft.com/textbook/stsurvan.html
Craps  The most popular game played only with dice. The shooter makes a bet, called the center bet, and other players 'fade' the bet or bet against the shooter. The shooter rolls a pair of dice. If the sum of the dice is 7 or 11, called a natural, the shooter wins immediately; if 2, 3, or 12 is rolled, called craps, the shooter immediately loses. If the shooter rolls a 4, 5, 6, 8, 9, or 10, that number becomes his point. He rolls the dice again until he shoots the same number again, called making one's point, or he rolls a seven (craps out or sevens out). There are dozens of side bets that can be made on the eventual outcome or the outcome of a single roll. In a casino, all bets are against the house with bets being placed on a craps table (See Fig. 3) c.f. odds
Figure 3. Craps table with odds, from Wikipedia. Note that odds for rolling seven on the next roll, 5 for 1, equals 4-to-1 odds.
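The rules in the Craps entry are enough to work out the shooter's chance of winning the basic bet. The minimal Python sketch below enumerates the 36 equally likely rolls of two fair dice and uses exact fractions. The function names are hypothetical, and side bets and house payoffs are ignored.

```python
from fractions import Fraction
from itertools import product


def sum_probabilities():
    """Exact probability of each sum (2..12) for a pair of fair dice."""
    counts = {}
    for a, b in product(range(1, 7), repeat=2):
        counts[a + b] = counts.get(a + b, 0) + 1
    return {total: Fraction(c, 36) for total, c in counts.items()}


def shooter_win_probability():
    """Probability the shooter wins: a natural, or a point made before a seven."""
    p = sum_probabilities()
    win = p[7] + p[11]  # natural on the first roll
    for point in (4, 5, 6, 8, 9, 10):
        # Once a point is set, only that point or a seven ends the round,
        # so the point is made with probability p[point] / (p[point] + p[7]).
        win += p[point] * p[point] / (p[point] + p[7])
    return win


print(shooter_win_probability(), float(shooter_win_probability()))
```

The result is slightly under one half, which is why the bet against the shooter is mildly favorable before any house adjustments.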
Critical region  Hogg & Tanis (1977, p. 255). The critical region C is the set of points in the sample space which leads to the rejection of the null hypothesis $H_0$. The rejection region for a null hypothesis is called the critical region and the cutoff is called the critical value. These concepts are associated with the Neyman-Pearson school of statistical inference cf., test statistic
Crossover design  Each subject receives more than one treatment level, and the order of the treatment is usually randomly assigned. Crossover designs can be analyzed with either univariate or multivariate repeated measures analyses with treatment order as a between subject factor. Cochran & Cox (1957, p. 127-142) describe modifications of Latin Square analysis appropriate for several different types of crossover design (using cow
milk production as the response and diet as the treatment factor). Neter et al. (1996, p. 1225) refer to crossover designs as latin square changeover designs, 'often useful when a latin square is to be used in a repeated measures study to balance the order positions of treatments, yet more subjects are required than called for by a single latin square.' Neter et al. (1996) provide the model and expected mean square ANOVA table.
    ... a relatively simple model can be developed ... $\rho_i$ denotes the effect of the ith treatment order pattern, $\kappa_j$ denotes the effect of the jth order position, $\tau_k$ denotes the effect of the kth treatment, and $\eta_{m(i)}$ denotes the effect of subject m which is nested within the ith treatment order pattern:
    $$Y_{ijkm} = \mu + \rho_i + \kappa_j + \tau_k + \eta_{m(i)} + \varepsilon_{ijkm}$$
Neter et al. (1996, p. 1226) show how to test a crossover design with an ANOVA model to distinguish treatment, order and pattern effects (see Figure 4).

Figure 4. Table 30.09 showing the ANOVA mean squares corresponding to the model shown in the previous equation.

Dallal (http://www.tufts.edu/~gdallal/crossovr.htm) reviews the strengths and limitations of crossover designs. The increased precision of crossover design, due to each subject serving as its own control, is vitiated by the need to have subjects participate a longer period of time and the need to account for carryover effects. Mead (1988, p 197), like Neter et al. (1996), describes crossover experiments as a form
of Latin square with time as a blocking factor. Crossover designs allow a tremendous gain in precision by reducing the effects of among patient variability while also requiring fewer subjects than completely randomized designs to attain similar relative power efficiency. Mead (1988, p 198) discusses the difficulty in relevance,
    The difficulty with the cross-over design is that the conclusions are appropriate to units similar to those in the experiment; that is, to subjects for a short time period in the context of a sequence of different treatments. We have to ask if the observed difference between two treatments would be expected to be the same if a treatment is applied consistently to each subject to which it is allocated. This is a problem of interpretation of results from experiment to subsequent use, and it is a problem which must be considered in all experiments. It is particularly acute in cross-over designs, because the experiment is so different from subsequent use. After all, no farmer is going to continually swap the diets for his cattle!
Data mining  Looking for pattern in gigantic data sets
Degrees of freedom  The number of true replicates minus the number of model parameters that must be estimated from the data. Stigler (1986, p. 348) states that the term degrees of freedom was not formally introduced until 1922, when Fisher introduced the term. Here is a pdf of Walker (1940), describing the history and geometric interpretation: http://courses.ncssm.edu/math/Stat_Inst/PDFS/DFWalker.pdf
DeMorgan's Laws  (Larsen & Marx 2001, p. 29) Let A and B be any two events. The complement of their intersection is the union of their complements: $(A \cap B)^c = A^c \cup B^c$;
the complement of their union is the intersection of their complements: $(A \cup B)^c = A^c \cap B^c$.
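Both laws can be checked on a small finite sample space with Python sets. This is a minimal sketch: the sample space S, the events A and B, and the function names are all hypothetical choices for illustration.

```python
def complement(event, sample_space):
    """All outcomes in the sample space that are not in the event."""
    return sample_space - event


def de_morgan_holds(a, b, sample_space):
    """Check both of DeMorgan's laws for events a and b."""
    first = complement(a & b, sample_space) == (
        complement(a, sample_space) | complement(b, sample_space))
    second = complement(a | b, sample_space) == (
        complement(a, sample_space) & complement(b, sample_space))
    return first and second


S = set(range(1, 11))  # hypothetical sample space: outcomes 1..10
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
print(de_morgan_holds(A, B, S))
```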
Deviance  Calculated from the log likelihood statistic in generalized linear models. Change in deviance can be used to test the goodness of fit of a generalized linear model (with the chi-square distribution) and the change in deviance permits a test between full & reduced hierarchical generalized linear models. Agresti (1996, p 96): Let $L_M$ denote the maximized log-likelihood value for the model of interest. Let $L_S$ denote the maximized log-likelihood value for the most complex model, which has a separate parameter at each explanatory setting: that model is said to be saturated. The deviance of a model is defined to be: Deviance = $-2(L_M - L_S)$.
DFFITS
Discriminant analysis
Disjoint
Distributions
    beta
    binomial
    bivariate normal
    Cauchy
    chi-square
    empirical
    exponential
    F
    gamma
    geometric
    Gompertz
    hypergeometric
    lognormal
    multinomial
    negative binomial
    normal
    Poisson
    posterior
    Student's t
    Weibull
Doubly multivariate designs  A form of profile analysis in which several different response variables are measured at several different times (Tabachnick & Fidell 2001, p 423)
Duncan's test  A multiple comparisons test
Dummy variables  Also called indicator variables. Variables made up of zeros and ones. Dummy variables play a key role in ANOVA analysis using regression. A discrete (or categorical) variable with 8 levels can be coded for with 8 dummy variables. In least squares regression, one of these dummy variables is left out of the regression equation and becomes the reference level. There are two common ways to code dummy variables for regression, the first using 0's and 1's and the second using 0's, 1's and -1's. The former approach is the most common.
Dunn's test  A multiple comparison procedure [MCP] "The Dunn multiple comparison procedure is based on the use of the t distribution with C comparisons that are planned. Not only do you know the number of comparisons before the research is done, you also know which comparisons will be computed." Toothaker (1993, p. 31)
Dunnett's t test  A posteriori comparison of control vs. treatments.
Durbin-Watson test  A test for serial correlation
e (mathematical constant)  http://www.answers.com/topic/e-mathematical-constant, cf., natural logarithms
Ecological fallacy [Ecological inference problem]  Error in predicting individual behavior from aggregate data. Introduced by Robinson (1950) and perhaps solved by King (1997). King describes the problem as often involving trying to estimate the cell frequencies of an r x c contingency table, knowing only the marginal totals. cf., Simpson's paradox
Edge  In a graph, the line connecting two vertices. It can be represented as an unordered pair of vertices {u, v}. If G is the matrix representation of the graph, there is an edge connecting two vertices u and v if the $G_{uv}$ and $G_{vu}$ elements are 1.
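The 0/1 coding described in the Dummy variables entry can be sketched in Python as follows. This is a minimal illustration: the function name `dummy_code`, the example factor levels and the choice of reference level are hypothetical.

```python
def dummy_code(values, reference):
    """Code a categorical variable as 0/1 indicator columns.

    One column is produced per level except the reference level, which is
    represented by a row of all zeros.
    """
    levels = sorted(set(values))
    if reference not in levels:
        raise ValueError("reference level not found in the data")
    kept = [level for level in levels if level != reference]
    rows = [[1 if v == level else 0 for level in kept] for v in values]
    return kept, rows


# Hypothetical factor with three levels; 'control' is the reference level.
treatment = ["control", "low", "high", "control", "high"]
columns, design = dummy_code(treatment, reference="control")
print(columns)
for row in design:
    print(row)
```

Leaving out the reference column avoids an exact linear dependence with the intercept, so each remaining coefficient is read as a difference from the reference level.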
$E(S_n)$ (Sanders 1968, Hurlbert 1971) Hurlbert-Sanders expected number of species $E(S_n)$. Hurlbert, using formulae for the hypergeometric probability distribution, corrected the algorithm described by Sanders for estimating the number of species found in a random subsample of size n from a sample.
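The hypergeometric calculation behind the expected number of species can be sketched as follows. This is a minimal illustration, assuming the commonly cited form $E(S_n) = \sum_i \left[1 - \binom{N - N_i}{n} / \binom{N}{n}\right]$, where $N_i$ is the abundance of species i and N the total number of individuals; the function name and the input format are hypothetical.

```python
from math import comb


def expected_species(counts, n):
    """Expected number of species in a random subsample of n individuals.

    counts: number of individuals of each species in the full sample
            (hypothetical input format chosen for this sketch).
    """
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("n must lie between 0 and the total number of individuals")
    # For each species, 1 - P(species is absent from the subsample);
    # absence is a hypergeometric probability.
    return sum(1 - comb(total - c, n) / comb(total, n) for c in counts)


# Hypothetical sample: three species with 5, 3 and 2 individuals.
e_s2 = expected_species([5, 3, 2], 2)
```

Drawing a single individual always yields exactly one species, and drawing the whole sample returns the observed species count, which gives two quick sanity checks on the function.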
Effect modification A factor, Z, is said to be an effect modifier of a relationship between a risk factor, X, and an outcome measure, Y, if the strength of the relationship between the risk factor, X, and the outcome, Y, varies among the levels of Z. A factor, Z, is said to confound a relationship between a risk factor, X, and an outcome, Y, if it is not an effect modifier and the unadjusted strength of the relationship between X and Y differs from the common strength of the relationship between X and Y for each level of Z. More complicated definitions allow for a factor to be both an effect modifier and a confounder. If Z is an effect modifier, then it is important to report the strength of the X-Y relationship for specific values of Z. If the strength of the X-Y relationship does not vary greatly among the levels of Z, it may not be important to account for the effect modification. If Z is a confounder, then it is common to report both the strength of the unadjusted X-Y relationship and the strength of the adjusted X-Y relationship. If the adjusted and unadjusted strengths do not differ greatly, then it may not be important to report both. ... “Effect modification: ... The effect of High dose cyclosporin (cs) on transplant failure is modified by type of transplant.” “Confounding: The effect of treatment on patient survival is confounded by age.” http://www-personal.umich.edu/~bobwolfe/560/review/kkm13confoundeffectmodify.txt Cf., mediation

Efficient estimator see Estimators

EM algorithm expectation-maximization (EM) algorithm, cf., maximum likelihood http://www.mathdaily.com/lessons/Expectation-maximization_algorithm

EMAP EPA’s Environmental Monitoring and Assessment Program
Empirical distribution function Hogg & Tanis (1977, p. 86) Let $x_1, x_2, \ldots, x_n$ denote the observed values of the random sample $X_1, X_2, \ldots, X_n$ from a distribution. Let $N(\{x_i : x_i \le x\})$ equal the number of these observed values that are less than or equal to x. Then the function
$$F_n(x) = \frac{N(\{x_i : x_i \le x\})}{n},$$
defined for each real number x, is called the empirical distribution function.

Empirical orthogonal function analysis (EOF) A modification of principal components analysis that is widely used in physical oceanography & meteorology. Ramsey & Schafer (2002, p. 519-520) provide a too-brief description. Spatial pattern tied to a particular mode of time/space variance in a spatiotemporal data set (see also Principal Components Analysis). http://www.realclimate.org/index.php?p=25

Empirical rule Larsen & Marx

Ergodic Markov chain A Markov chain is called ergodic if its transition digraph is strongly connected (i.e., every state can reach every other state). The chain is a regular ergodic Markov chain if there is a number k such that every state can reach every other state in exactly k steps (Roberts 1976, p. 289-290). A Markov chain is regular if and only if it is possible to be in any state after some number N of steps, no matter what the starting state, that is, if and only if $P^N$ has no zero entries for some N (Kemeny & Snell 1976). Gondran & Minoux (1984, p. 20) provide a graph-theoretic definition. Each Markov chain can be associated with a transition graph which consists of N vertices corresponding to the states, two vertices i and j being linked by an arc (i,j) if and only if $P_{ij} > 0$. The transition graph is not periodic if the largest common factor of the lengths of all the circuits passing through a vertex equals 1. If $G_R$ is the reduced graph and a strong component has outdegree greater than zero, then that component is a transient subset of the graph. If the outdegree of a strong component is 0, then that strong component is a recurrent (ergodic) subset. If the graph is not periodic and contains only 1 recurrent (ergodic) class (i.e., the reduced graph is strongly connected), then the system is completely ergodic.

Error

Estimate A statistic used as a guess for the value of a parameter. Estimates can be calculated, but parameters remain unknown (Ramsey & Schafer 1997, p. 20).
Estimators
This section is from Harman (1976).
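consistent estimator An estimator $\hat{\theta}$ is said to be consistent if it converges (in a probabilistic sense) to the true parameter as the sample increases without limit, i.e., $\operatorname{plim}_{n \to \infty} \hat{\theta} = \theta$.
Hogg & Tanis (1977, Definition 7.5-2) The statistic $Y = u(X_1, X_2, \ldots, X_n)$ is a consistent estimator of θ if, for each positive number ε,
$$\lim_{n \to \infty} P(|Y - \theta| \ge \varepsilon) = 0.$$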
efficient estimator An estimator is said to be efficient if it has the smallest limiting variance. When an estimator is efficient it is also consistent.

minimum variance unbiased estimator Given a choice between two unbiased estimators, the one with minimum variance is preferred. For example, Draper & Smith (1998) note that while OLS and WLS regression provide unbiased estimators of the regression parameters, the variance of the WLS estimators will be lower if the regression errors are heteroscedastic.

sufficient estimator An estimator is said to be sufficient if it utilizes all the information in the sample concerning the parameter.

unbiased estimator If the expected value of the estimator is the true parameter, i.e., $E(\hat{\theta}) = \theta$, then the estimator is unbiased.

“While it is of some advantage to devise an unbiased estimate, it is not a very critical requirement. The method of maximum likelihood is a well established and popular statistical method for estimating the unknown population parameters because such estimators satisfy the first three of the above standards. Not all parameters have sufficient estimators, but if one exists the maximum likelihood estimator is such a sufficient estimator (Mood and Graybill 1963, p. 185). However, a maximum likelihood estimator will generally not be unbiased. (By getting the expected value of such an estimator, an unbiased statistic can be derived). This method yields values of the estimators which maximize the likelihood function of a sample.” Harman (1967, p. 212-213)

Expected value Hogg & Tanis (1977, p. 53) If f(x) is the probability density function of the random variable X of the discrete type with space R and if the summation
$$\sum_{x \in R} u(x) f(x)$$
exists, then the sum is called the mathematical expectation or the expected value of the function u(X), and it is denoted by E[u(X)]. That is,
$$E[u(X)] = \sum_{x \in R} u(x) f(x).$$
Theorem When it exists, mathematical expectation E satisfies the following properties: i) If c is a constant, E(c) = c; ii) If c is a constant and u is a function, E[c u(X)] = c E[u(X)]; iii) If $c_1$ and $c_2$ are constants and $u_1$ and $u_2$ are functions, then $E[c_1 u_1(X) + c_2 u_2(X)] = c_1 E[u_1(X)] + c_2 E[u_2(X)]$.
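The three properties of expectation can be checked numerically for a discrete distribution. The sketch below assumes a fair six-sided die as the distribution; the constants and the functions u1 and u2 are arbitrary illustrations, not taken from Hogg & Tanis.

```python
def expectation(u, pmf):
    """E[u(X)]: sum of u(x) * f(x) over the space of X, with pmf given as {x: f(x)}."""
    return sum(u(x) * p for x, p in pmf.items())


# Hypothetical example distribution: a fair six-sided die.
die = {x: 1 / 6 for x in range(1, 7)}

c1, c2 = 2.0, -3.0          # arbitrary constants


def u1(x):
    return x                # u1(X) = X


def u2(x):
    return x * x            # u2(X) = X squared


# Property iii: expectation of a linear combination of functions of X.
lhs = expectation(lambda x: c1 * u1(x) + c2 * u2(x), die)
rhs = c1 * expectation(u1, die) + c2 * expectation(u2, die)

# Property i: the expectation of a constant is that constant.
e_const = expectation(lambda x: 5.0, die)
```

The values lhs and rhs agree up to floating-point rounding, which is all that property iii asserts.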
Experiment The fundamental difference between a survey (observational study, census) and an experiment is that the sampling units in an experiment can be regarded as being drawn from an infinite population: “The distinction between the design of experiments and the design of sample surveys is fairly clear-cut, and may be expressed by saying that in surveys we make observations on a sample taken from a finite population of individuals, whereas in experiments we make observations which are in principle generated by a hypothetical infinite population, in exactly the same way that the tosses of a coin are. Of course, we may sometimes experiment on the members of a sample resulting from a survey, or even make a sample survey of the results of an (extensive) experiment, but the essential distinction between the two fields should be clear.” Kendall & Stuart 1979

“By experiment we will mean any procedure that (1) can be repeated, theoretically, an infinite number of times; and (2) has a well-defined set of possible outcomes” (Larsen & Marx 2001, p. 21)

“A cornerstone of the scientific process is the experiment. Ecologists in particular use a wide variety of types of experiments. We use the term “experiment” here in its broadest sense: a test of an idea. Ecological experiments can be classified into three broad types: manipulative, natural, and observational. Manipulative, or controlled, experiments are what most of us think of as experiments: A person manipulates the world in some way and looks for a pattern in the response. .... Natural experiments are “manipulations” caused by some natural occurrence. .... Observational experiments consist of the systematic study of natural variation.” Gurevitch, J., S. M. Scheiner and G. A. Fox. 2002. The Ecology of Plants. Sinauer Associates, Sunderland, Massachusetts.
Experimental design cf., orthogonal arrays

Experimentwise error (or experiment-wise or family-wise error) The error associated with rejecting one or more true null hypotheses in an experiment. If alpha [α] is the probability of Type I error for a single test and n independent tests are performed on the results of the experiment, then the experimentwise error is $1 - (1 - \alpha)^n$. For example, the experimentwise error level if each of 10 independent tests is performed at alpha = 0.05 is 40.1%. Various multiple comparisons tests have been designed, some of which control for experimentwise alpha level. cf., Family-wise error rate.

Explanatory variable A variable used to predict the value of a response variable, usually in a regression model. Sometimes called an independent variable, but this is a poor term, since these variables are rarely independent of the response variable or other explanatory variables. cf., response variable

Exponential distribution Waiting times for a process that has Poisson distributed rates http://mathworld.wolfram.com/ExponentialDistribution.html
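The 40.1% figure quoted for experimentwise error can be reproduced directly from $1 - (1 - \alpha)^n$. A minimal sketch, assuming the n tests are independent; the function name is hypothetical.

```python
def experimentwise_error(alpha, n_tests):
    """Probability of at least one Type I error in n_tests independent tests,
    each performed at level alpha."""
    return 1 - (1 - alpha) ** n_tests


# Ten independent tests, each at alpha = 0.05.
rate = experimentwise_error(0.05, 10)
print(f"{rate:.1%}")
```

With a single test the function returns alpha itself, and the rate climbs quickly as tests are added, which is the motivation for multiple comparison procedures.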
Extra sum of squares principle http://www.tufts.edu/~gdallal/extra.htm

False positive See sensitivity

F distribution named by George Snedecor in honor of R. A. Fisher

F-test A ratio of variances or mean squares with expected value of unity under the null hypothesis, tested with the F distribution given numerator and denominator degrees of freedom.

Factor Analysis (FA) A term coined by Spearman (1904). The goal of PCA is to account for as much variance in the data as possible, whereas the goal of FA is to account for the covariance between descriptors (variables). Factor analysis assumes that the observed descriptors are linear combinations of hypothetical underlying variables (or factors). Factor analysis can be divided into two types: Exploratory factor analysis and Confirmatory factor analysis. Factor Analysis is primarily directed towards the analysis of covariation among descriptors, so that with most models, the relative positions of samples cannot be readily determined, but methods do exist to estimate the orientation of samples (termed factor scales or factor scores) in factor space. In confirmatory factor analysis, specific expectations about the number of factors and their loadings can be tested.

Confirmatory Factor Analysis factor analysis in which specific expectations regarding the number of factors and their loadings are tested on sample data (Kim & Mueller 1978, Legendre & Legendre 1998).

Exploratory Factor Analysis no a priori specification of the number of factors or loadings

Factor score R mode: the estimate for a case on an underlying factor formed by a linear combination of observed variables. Q mode: the estimate for a variable on an underlying factor formed by a linear combination of observed cases.

Factor loadings The elements of the eigenvectors are also the weights or loadings of the various original descriptors. If the eigenvectors have been normalized to unit length (i.e., the sum of the squared loadings for a variable across factors equals 1.0), then the elements of the eigenvector matrix (the loadings) are direction cosines of the angles between the original descriptors and the principal axes. So if the element of the U vector (the loading for a variable) is .8944, the angle is $\cos^{-1}(.8944) = \arccos(.8944) \approx 26.6°$ (Legendre and Legendre 1983). The principal component axis is rotated about 26.6° from the original axis. For this reason, the factor loadings are sometimes called direction cosines.

oblique factor rotation In orthogonal rotations the causal underlying factors are not permitted to be correlated, while in oblique rotations the factors can be correlated. The geometric relationships among variables in ordination 2-space are greatly altered with oblique rotations. For example, one can no longer assume that descriptors plotted at right angles relative to the origin are uncorrelated. The main virtue of oblique rotations is in naming axes or factors and using factors as explanations rather than as descriptions. Legendre & Legendre (1983; Fig 8.13; p. 308) describe the relationship between oblique factor rotations and path analysis. Oblique rotations may be needed when common factors are correlated.

orthogonal factors factors that are not correlated with each other.

Factorial n! is pronounced “n factorial”
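Because n! grows so quickly, it is often evaluated on the log scale through the identity n! = Γ(n + 1). The sketch below uses Python’s `math.lgamma`, which plays the same role as Matlab’s `gammaln`; the helper name `log_factorial` is hypothetical.

```python
from math import exp, factorial, lgamma, log


def log_factorial(n):
    """Natural log of n!, computed as ln Gamma(n + 1)."""
    return lgamma(n + 1)


# For small n the factorial can be recovered by exponentiating.
ten_factorial = exp(log_factorial(10))

# For large n, stay on the log scale: 1000! is far too large for a float,
# but its logarithm is an ordinary number.
log_1000_factorial = log_factorial(1000)
```

Ratios of factorials, such as binomial coefficients, can then be formed by subtracting log factorials before exponentiating.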
In calculations with large n, the natural log of the gamma function (Γ) is usually used: n factorial = exp(gammaln(n+1))

Family-wise error The error associated with rejecting one or more true null hypotheses in an observational study or experiment. If alpha (α) is the probability of Type I error for a single test and n independent tests are performed on the results of the experiment, then the family-wise error rate is $1 - (1 - \alpha)^n$. For example, the experimentwise error level if each of 10 independent tests is performed at alpha = 0.05 is 40.1%. Various multiple comparisons tests have been designed, some of which control for family-wise or experimentwise alpha level. Synonymous with Experimentwise error rate.

Fixed effects cf., random effects

Fixed point probability vector (= stationary vector, limiting vector) The left eigenvector (if the transition matrix is in “from rows to columns form”) associated with the dominant eigenvalue of an ergodic Markov chain process. The dominant eigenvalue is 1.0 for ergodic Markov chains.

Pierre de Fermat (1601-1665) With Pascal, the father of the mathematical theory of probability (Bell 1937, p. 87).

Fisher, Sir Ronald A., discoverer of maximum likelihood, discriminant analysis (with Burt), and ANOVA. His The Genetical Theory of Natural Selection laid the foundation for quantitative population genetics. His The Design of Experiments laid the foundation for experimental design. Fisher is one of the fathers of the frequentist school of statistics, the others being Jerzy Neyman and Egon Pearson. Fisher introduced many of the test statistics and advocated the use of p values in judging the significance of results. Neyman & Pearson introduced critical values and confidence limits. The frequentist philosophy of statistics differs from the Bayesian philosophy of statistics.

Figure 5. R A Fisher, from BMJ

Fisher’s exact test A test for 2 x 2 tables, designed for hypergeometric distributions, but widely applicable to other 2 x 2 problems.

Fisher’s sign test A distribution-free test for paired data, analogous to the paired t test. The number of positive (or negative) differences is compared to expectations from the binomial distribution.
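The comparison of the number of positive differences with the binomial distribution in the sign test can be sketched as follows. This is a minimal illustration, assuming a two-sided test with success probability 1/2 under the null hypothesis and assuming that zero differences are dropped; the function name and example data are hypothetical.

```python
from math import comb


def sign_test_p_value(differences):
    """Two-sided sign test p-value for a list of paired differences."""
    nonzero = [d for d in differences if d != 0]   # ties carry no sign
    n = len(nonzero)
    n_pos = sum(1 for d in nonzero if d > 0)
    k = min(n_pos, n - n_pos)
    # P(X <= k) for X ~ Binomial(n, 1/2), doubled for two sides, capped at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical paired differences: nine positive, one negative.
p = sign_test_p_value([1.2, 0.4, 2.1, 0.3, 0.9, 1.5, 0.7, 0.2, 1.1, -0.6])
```

Only the signs of the differences enter the calculation, which is why the test needs no assumption about the shape of the distribution of differences.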
Fligner-Policello test A rank-based test for differences in central tendency for two independent samples with unequal variances. Note that the Wilcoxon rank sum test assumes equal variances. On Matlab Central file exchange, Trujillo-Ortiz et al. have posted fptest.m, which implements the Fligner & Policello (1981) test, which is described in Hollander & Wolfe (1999). Cf., Behrens-Fisher problem, Wilcoxon rank-sum test

Forward selection One of many automatic selection procedures in multiple regression. The explanatory variable with the highest correlation with the response variable is entered in
the equation first, and the explanatory variable with the highest partial correlation with the response variable is entered next, and so on.

Friedman’s test A non-parametric 2-way ANOVA, specifically designed for repeated measures problems. cf., Kruskal-Wallis ANOVA

Frequentist theory of statistical inference This is the traditional model of statistical inference, developed in the 20th century by R A Fisher and Neyman & Pearson. Statistical tests are performed with an assumed probability model. The p value is the probability that an observed event or one more extreme would have been observed if the assumed probability model and associated null hypothesis were true. Neyman & Pearson introduced the use of critical values and confidence limits for describing the results of statistical tests. cf., Bayesian inference

Fundamental matrix For an absorbing Markov chain, N is the fundamental matrix and is found as the inverse of the identity matrix minus the transitions among the non-absorbing states (the Q submatrix): $N = (I - Q)^{-1}$. See Kemeny & Snell 1976

Galton, Francis (1822-1911) Darwin’s half-cousin who coined the term regression in the context of describing regression to the mean http://en.wikipedia.org/wiki/Francis_Galton

Gamma A measure of association. “The estimator of gamma is based only on the number of concordant and discordant pairs of observations. It ignores tied pairs (that is, pairs of observations that have equal values of X or equal values of Y). Gamma is appropriate only when both variables lie on an ordinal scale. It has the range $-1 \le \Gamma \le 1$. If the two variables are independent, then the estimator of gamma tends to be close to zero.” http://www.id.unizh.ch/software/unix/statmath/sas/sasdoc/stat/chap28/sect20.htm

Gamma distribution

Gamma function
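The gamma measure of association described above can be sketched by counting concordant and discordant pairs. This illustration assumes the usual estimator (C - D) / (C + D), where C and D are the numbers of concordant and discordant pairs; the function name and example data are hypothetical.

```python
def gamma_association(x, y):
    """Gamma from paired ordinal observations: (C - D) / (C + D), ignoring tied pairs."""
    concordant = discordant = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1      # pair ordered the same way on X and Y
            elif s < 0:
                discordant += 1      # pair ordered in opposite ways
            # s == 0: tied on X or on Y, so the pair is ignored
    if concordant + discordant == 0:
        raise ValueError("gamma is undefined when every pair is tied")
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical ordinal scores for three cases: two concordant pairs, one discordant.
g = gamma_association([1, 2, 3], [1, 3, 2])
```

Because tied pairs are dropped from both numerator and denominator, gamma can reach 1 or -1 even when many observations share a value.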
GAMS Generalized additive models

Gauss, Carl Friedrich Arguably the most influential mathematician of all time, but Stigler argues that he should not be given credit for the normal curve, which is sometimes called the Gaussian curve. cf., Stigler’s law of eponymy

Gaussian curve see normal curve

Generalized additive models (GAMS) According to Leathwick & Austin (2001, p. 2562), GAMS are an extension of generalized linear models ‘which offer a more realistic approach to the analysis of ecological data in that complex relationships between predictor and response variables can be accommodated in a nonparametric manner through use of scatter-plot smoothers, rather than using more inflexible parametric terms as in GLM’s.’
Generalized least squares (GLS) The general linear model is
$$y = X\beta + \varepsilon$$
where y is an n x 1 vector of observations, β is a p x 1 vector of regression parameters, X is an n x p matrix of explanatory variables, and ε is an n x 1 vector of residuals. Ordinary least squares modeling assumes that the errors are independently, identically, normally distributed ($\varepsilon \sim N(0, \sigma^2 I)$), leading to the normal equation solution for β:
$$\hat{\beta} = (X'X)^{-1} X'y$$
Generalized least squares permits a broader array of variance-covariance matrices for the error ($\varepsilon \sim N(0, \Sigma)$), where Σ is a symmetric, positive-definite variance-covariance matrix:
$$\hat{\beta} = (X'\Sigma^{-1}X)^{-1} X'\Sigma^{-1}y$$
One simple form of generalized least squares analysis is weighted least squares regression.

General linear model Regression, especially regression using indicator or ‘dummy’ variables for categorical explanatory variables, is one form of the general linear model. McCulloch & Searle (2001, p. 1) describe the essence of the general linear model, “...the mean of each datum [is] taken as a linear combination of unknown parameters ..., and the data [are] deemed to have come from a normal distribution ... The model is linear in the parameters, so ‘linearity’ also includes being of the form ... where the xs are known and there can be (and often are) more than two of them.” ANOVA is a subset of the general linear model, and almost all ANOVA problems can be analyzed as regression problems, although some problems (such as Model II ANOVA or mixed model ANOVA) are better handled as ANOVA problems since it is often difficult to determine the appropriate error term for testing hypotheses regarding random effects in a regression context.

Generalized linear model “A Generalized Linear Model (GLM) is a probability model in which the mean of a response variable, or a function of the mean, is related to explanatory variables through a regression equation.” Ramsey & Schafer 2002, p. 584 (or 1997, p. 568) The data are not necessarily assumed to be normally distributed. Probit analysis (appropriate for types of survival data), tobit analysis (for censored data), logistic regression (binary and binomial), and Poisson log-linear regression are regarded as types of generalized linear model.
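Geometric distribution http://mathworld.wolfram.com/GeometricDistribution.html The probability function is
$$p_X(k) = (1 - p)^{k-1} p, \quad k = 1, 2, \ldots$$
for the number of trials up to and including the first success, where p is the probability of success on each trial. Some references count the failures before the first success instead, giving $p(1 - p)^k$, $k = 0, 1, \ldots$

Geometric series Sum of a geometric series, where $0 < x < 1$:
$$\sum_{k=0}^{\infty} x^k = \frac{1}{1 - x}$$
or (from Abramowitz & Stegun, 1965), for the first n terms of a series with first term a and common ratio r,
$$s_n = a + ar + ar^2 + \cdots + ar^{n-1} = \frac{a(1 - r^n)}{1 - r}, \qquad \lim_{n \to \infty} s_n = \frac{a}{1 - r} \quad (-1 < r < 1)$$
Note, Nahin (2002) solves several problems in probability, including ‘The Duelling Idiots’ problem of the title, using the sum of convergent geometric series. Often, absorbing Markov chains can be used to model these problems.

Gompertz distribution

Goodness of fit

Gosset, W. S. Developed Student’s t distribution in 1908

GLM Generalized linear model

GLS Generalized least squares

Hazard rate, hazard ratios http://www.weibull.com/AccelTestWeb/proportional_hazards_model.htm

Heteroscedasticity Unequal variance or unequal spread cf., homoscedasticity

Homoscedasticity Equal variance or equal spread cf., heteroscedasticity

Hotelling’s $T^2$ A multivariate test for the difference in location. It is a generalization of the univariate Student’s t test. SPSS prints Hotelling’s trace, which is $T^2/(N-1)$.

Hurlbert’s $E(S_n)$ The expected number of species from a random draw of n individuals from a sample (Hurlbert 1971)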
Hypergeometric distribution

Hypothetico-deductive method

ICA Independent components analysis, used to solve ‘the cocktail party problem’ http://www.cis.hut.fi/projects/ica/cocktail/cocktail_en.cgi http://www.cis.hut.fi/projects/ica/fastica/ See http://ica2001.ucsd.edu/index_files/pdfs/115-hundley.pdf

Independence (Larsen & Marx 2001, p. 70, Definition 2.7.1) Two events are said to be independent if and only if $P(A \cap B) = P(A) \cdot P(B)$; otherwise A and B are dependent events. For more than two events: (Larsen & Marx Definition 2.7.2) Events $A_1, A_2, \ldots, A_n$ are said to be independent if for every set of indices $i_1, i_2, \ldots, i_k$ between 1 and n, inclusive,
$$P(A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_k}) = P(A_{i_1}) \cdot P(A_{i_2}) \cdots P(A_{i_k})$$
Theorem from Hogg & Tanis (1977, p. 42) If A and B are independent events, then the following pairs of events are also independent: (i) A and B’, (ii) A’ and B, (iii) A’ and B’. [A’ is the complement of A]

Independent trials process The outcome of the process is unaffected by earlier events. cf., Bernoulli trial.

Inference An inference is a conclusion that patterns in the data are present in some broader context. A statistical inference is an inference justified by a probability model linking the data to the broader context. Ramsey & Schafer (1997)

Inter-quartile range See boxplots

Intersection (Definition 2.2.1 from Larsen & Marx 2001, p. 24) Let A and B be any two events defined over the same sample space S. Then
• The intersection of A and B, written $A \cap B$, is the event whose outcomes belong to both A and B.
• The union of A and B, written $A \cup B$, is the event whose outcomes belong to either A or B or both.
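Intersection, union, and the product rule for independence can be illustrated by enumerating a small sample space. The sketch below assumes two fair dice with equally likely outcomes; the events A, B, and C are hypothetical examples.

```python
from fractions import Fraction
from itertools import product

# Sample space S: all ordered outcomes of rolling two fair dice.
sample_space = set(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of an event (a subset of S) when outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))


A = {s for s in sample_space if s[0] % 2 == 0}    # first die is even
B = {s for s in sample_space if s[1] == 6}        # second die shows 6
C = {s for s in sample_space if sum(s) >= 11}     # total is at least 11

intersection_AB = A & B     # outcomes in both A and B
union_AB = A | B            # outcomes in A or B or both

# Independence: P(A and B) equals P(A) * P(B).
independent_AB = prob(intersection_AB) == prob(A) * prob(B)
independent_BC = prob(B & C) == prob(B) * prob(C)
```

Here A and B concern different dice and satisfy the product rule, whereas B and C do not, because knowing the second die shows 6 changes the chance of a large total.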
Jackknife In a delete-one jackknife, 1 observation is dropped from a sample of n and the statistical test repeated. A sample of n observations produces n jackknifed samples. Usually, the value of the test statistic is subtracted from the test statistic based on all n observations (with a scaling for sample size) to produce the Tukey jackknife pseudovalue. cf., bootstrap

Kaplan-Meier survival analysis Available as an SPSS advanced model (analyze\survival\kaplan-meier). From the SPSS help file: “There are many situations in which you would want to examine the distribution of times between two events, such as length of employment (time between being hired and leaving the company). However, this kind of data usually includes some censored cases. Censored cases are cases for which the second event isn’t recorded (for example, people still working for the company at the end of the study). The Kaplan-Meier procedure is a method of estimating time-to-event models in the presence of censored cases. The Kaplan-Meier model is based on estimating conditional probabilities at each time point when an event occurs and taking the product limit of those probabilities to estimate the survival rate at each point in time.” See also: http://www.statsoft.com/textbook/stsurvan.html http://www.cmh.edu/stats/model/survival/kaplan.asp cf., Cox regression

Kendall’s τ (Kendall’s tau) The most non-parametric of correlation coefficients. The sign of the difference for every pair of observations in one data vector is compared with the sign of the difference for the observations holding the same positions in the 2nd data vector. If the signs in both sets are the same, the pair is concordant. Kendall’s tau is the ratio of {concordant - discordant pairs} to total possible pairs. Using Kendall’s triangle, exact p values can be calculated if there are no tied ranks. Cf., Spearman’s ρ

Kendall’s tau-b Kendall’s tau-b is similar to gamma except that tau-b uses a correction for ties. Tau-b is appropriate only when both variables lie on an ordinal scale. http://www.id.unizh.ch/software/unix/statmath/sas/sasdoc/stat/chap28/sect20.htm

Stuart’s tau-c Stuart’s tau-c makes an adjustment for table size in addition to a correction for ties. Tau-c is appropriate only when both variables lie on an ordinal scale. http://www.id.unizh.ch/software/unix/statmath/sas/sasdoc/stat/chap28/sect20.htm

Kolmogorov-Smirnov test A test of whether one cumulative frequency distribution (cfd) differs from another. A one-sample test compares a known cfd with an observed cfd. A two-sample test compares two cfds. The test statistic is the maximum difference between cfds.
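The two-sample Kolmogorov-Smirnov statistic, the maximum difference between two cumulative frequency distributions, can be sketched as follows. Only the statistic is computed here, not its p value; the function name and the example data are hypothetical.

```python
from bisect import bisect_right


def ks_two_sample_statistic(sample1, sample2):
    """Maximum absolute difference between two empirical cumulative distributions."""
    if not sample1 or not sample2:
        raise ValueError("both samples must be non-empty")
    s1, s2 = sorted(sample1), sorted(sample2)
    n1, n2 = len(s1), len(s2)
    d = 0.0
    # Both cfds are step functions, so the largest gap occurs at an observed value.
    for x in s1 + s2:
        f1 = bisect_right(s1, x) / n1    # proportion of sample1 <= x
        f2 = bisect_right(s2, x) / n2    # proportion of sample2 <= x
        d = max(d, abs(f1 - f2))
    return d


# Hypothetical samples that overlap in part.
d_stat = ks_two_sample_statistic([1, 2, 3, 4], [3, 4, 5, 6])
```

Identical samples give a statistic of 0, and samples with no overlap give 1, the two extremes of the possible range.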
Kruskal-Wallis test Nonparametric one-way ANOVA. This is a k-independent samples extension of the Wilcoxon rank sum test. cf., Friedman’s ANOVA

Kurtosis The fourth moment about the mean (Larsen & Marx 2001, p. 233). The peakedness of a distribution. A flat pdf is called platykurtic, while a peaked distribution is called leptokurtic. cf., skewness

Lack of fit In a regression analysis with one explanatory factor, if there are true replicate observations at one or more values of the explanatory variable, then the residual variation from a simple least squares regression can be partitioned into pure error and lack of fit components (Ramsey & Schafer 1997, p. 212):
$$SS_{\text{Residual}} = SS_{\text{Pure error}} + SS_{\text{Lack of fit}} \qquad (46)$$
The pure error sum of squares can be obtained by performing a one-way ANOVA to test for differences in means among replicated groups. This one-way ANOVA will also produce the among replicated means sum of squares. If there are n groups of replicated means, there are n-1 df for this among means SS. The lack of fit SS is the among means sum of squares minus the regression sum of squares (with 1 df). Therefore, the lack of fit MS has n-2 df, where n is the number of replicated groups. The Lack of Fit F-test uses the ratio of the Lack of Fit MS over the Pure error MS. The former measures the departure of the mean of replicated observations from the line and the latter the within group variation. Draper & Smith (1998) recommend performing a lack of fit F test and only pooling the two sources of residual variation into the regression error sum of squares if the lack of fit test is not significant (p > 0.05).

Laplace, Pierre Simon (1749-1827) Described the central limit theorem

Latent class analysis http://ourworld.compuserve.com/homepages/jsuebersax/index.htm or http://www2.chass.ncsu.edu/garson/pa765/latclass.htm

Latent trait models for rater agreement http://ourworld.compuserve.com/homepages/jsuebersax/ltrait.htm

Latent variables Unmeasured variables or factors estimated from measured variables and used in structural equation models

Latin hypercube sampling A Monte Carlo method used to assess the variance of model predictions. cf., Monte Carlo method

Latin squares A method of arranging experimental units in a 2-factor (or more rarely a 3- or 4-factor) ANOVA

Least significant difference LSD Sometimes called Tukey’s LSD. A pair of means is tested using the ANOVA error mean square and df for testing differences among means. Contrasts tested with the LSD must be established a priori, because the test offers no protection against the inflation of Type I error due to multiple hypothesis testing. Indeed the LSD test is more powerful than the independent samples t test if there are more than 2 groups.

Least squares Adrien Marie Legendre (1805) clearly described the method of least squares (Stigler 1986, p. 13), an improvement of Laplace’s earlier work in minimizing the sum of absolute deviations:

On the method of least squares
“In most investigations where the object is to deduce the most accurate possible result from observational measurements, we are led to a system of equations of the form: E = a + bx + cy + fz + &c., in which a, b, c, f, &c. are known coefficients varying from one equation to the other, and x, y, z, &c. are unknown quantities, to be determined by the condition that each value of E is reduced either to zero or to a very small quantity … Of all the principles that can be proposed for this purpose, I think there is none more general, more exact, or easier to apply, than that which we have used in this work; it consists of making the sum of the squares of the errors a minimum. By this method, a kind of equilibrium is established among the errors which, since it prevents the extremes from dominating, is appropriate for revealing the state of the system which most nearly approaches the truth.
… We see therefore, that the method of least squares reveals, in a manner of speaking, the center around which the results of observations arrange themselves, so that the deviations from the center are as small as possible.” (Adrien Marie Legendre, 1805). Legendre’s paper can be read in translation at: http://www.stat.ucla.edu/history/legendre.pdf

Least squares regression Solving the regression model by minimizing the sum of squares of residuals of the response variable to the regression line. cf., regression

Leibniz, Gottfried Wilhelm (1646-1716) (Larsen & Marx 2001, p. 86) The 1666 treatise, “Dissertatio de arte combinatoria” was perhaps the first monograph written on combinatorics

Levene’s test There are at least 4 different versions of Levene’s test for equality of variance, an assumption of general linear models (e.g., ANOVA, regression, 2-sample t tests). They are all ANOVAs of deviations from a measure of center (squared deviation from mean, squared deviation from median, absolute deviation from the mean or absolute deviation from the median). All test the equal-variance assumption in ANOVA or regression. SPSS calculates the Levene’s test based on absolute deviations from the group mean. A Brown-Forsythe test for equal variance performs an ANOVA on the absolute deviation from group medians. Cf., homoscedasticity, heteroscedasticity

Leverage The deviation of an individual case from the range of explanatory variables. Cases with high leverage have the potential for being outliers, with outliers being more readily detected by Cook’s D. See http://www.j.org/v02/i05/pirls/node15.html

Likelihood from Mathworld: Likelihood is the hypothetical probability that an event that has already occurred would yield a specific outcome. The concept differs from that of a probability in that a probability refers to the occurrence of future events, while a likelihood refers to past events with known outcomes. Cf., Maximum likelihood

Likelihood function invented by Fisher 1922. A definition from Mathworld: A likelihood function L(a) is the probability or probability density for the occurrence of a sample configuration $x_1, \ldots, x_n$, given that the probability density $f(x; a)$ with parameter a is known
Likelihood ratio Hogg & Tanis 1977, p. 394 The likelihood ratio is the quotient
$$\lambda = \frac{L(\hat{\omega})}{L(\hat{\Omega})}$$
where $L(\hat{\omega})$ is the maximum of the likelihood function with respect to θ when $\theta \in \omega$ and $L(\hat{\Omega})$ is the maximum of the likelihood function with respect to θ when $\theta \in \Omega$. Cf., Maximum likelihood

Linear combination An estimated value obtained by a linear equation in which the coefficients need not add to zero. If the sum of coefficients is zero, then the linear combination is, by definition, a linear contrast. The variance of a linear combination and contrast can be readily calculated using formulae for the propagation of error. See: http://www.itl.nist.gov/div898/handbook/prc/section4/prc426.htm
Linear contrast A contrast is a linear combination of 2 or more factor level means with coefficients that sum to zero. Cf., linear combination, orthogonal contrast http://www.itl.nist.gov/div898/handbook/prc/section4/prc426.htm

Log-linear model “Log-linear modeling is an analog to multiple regression for categorical variables. When used in contrast to log-linear regression models like logit and logistic regression, log-linear modeling refers to analysis of tables without necessarily specifying a dependent. Rather the focus is in accounting for the observed frequencies.” http://www2.chass.ncsu.edu/garson/pa765/logit.htm

Logistic regression A form of generalized linear model, with the logit link function,
$$\operatorname{logit}(p) = \ln\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$$
There are several forms of logistic regression, including binary logistic regression (= bivariate logistic regression), with the response variable taking only two states (e.g., dead or alive), and binomial logistic regression with the response variable taking on discrete values along the interval (0,1). See http://www2.chass.ncsu.edu/garson/pa765/logistic.htm

logit is the log odds function ln(p/(1-p)). Logits can be converted to probabilities, frequencies or proportions using p = 1 - 1/(1 + exp(logit)), or p = exp(logit)/(1 + exp(logit))

logit link function is used to convert probabilities or frequency data in logistic regression.

logit transform If x ranges between 0 & 1, then log[x/(1-x)] often expands the tail of a distribution. Ramsey & Schafer (1997, 2002) apply this transform to percentage cover data (scaled to range from 0 to 1) and call it the regeneration ratio.

Lognormal distribution

Longitudinal data The same subjects or experimental units are followed through time. Often analyzed with repeated measures designs

LSD Tukey’s Least significant difference is a test of means using the error mean square from the overall ANOVA as the estimate of pooled standard error. It does not protect against inflation of the experimentwise error. It is one of the least conservative of 20 or more multiple comparison tests.

Mallows’ Cp cf., Bayesian information criterion, AIC

Mann-Whitney U Test Algebraically identical to Wilcoxon rank sum test

Markov, Andrei Andreevich (1856-1922) http://www-history.mcs.st-andrews.ac.uk/Mathematicians/Markov.html

Figure 6. A. A. Markov.

Markov chain A stochastic model in which the future state of the system can be predicted from the probability matrix and the state of the system on the previous time step (see also absorbing Markov chain and ergodic Markov chain).

Markov chain Monte Carlo (MCMC) A search method used to estimate model parameters.

Markov property (process) A Markov process is defined as a stochastic process with the property that for any set of n successive times (i.e., $t_1 < t_2 < \cdots < t_n$) one has
$$P_{1|n-1}(y_n, t_n \mid y_1, t_1; \ldots; y_{n-1}, t_{n-1}) = P_{1|1}(y_n, t_n \mid y_{n-1}, t_{n-1}) \qquad (1.1)$$
In other words, the conditional probability density at $t_n$, given the value $y_{n-1}$ at $t_{n-1}$, is uniquely determined and is not affected by any knowledge of the values at earlier times. $P_{1|1}$ is called the transition probability. Van Kampen (1981, p. 76)

Maximum likelihood “The maximum likelihood estimate of a parameter is defined to be the parameter value for which the probability of the observed data takes its greatest value.” Agresti 1996, p. 9 From Mathworld: Maximum likelihood, also called the maximum likelihood method, is the procedure of finding the value of one or more parameters for a given statistic which makes the known likelihood distribution a maximum. The maximum likelihood estimate for a parameter μ is denoted $\hat{\mu}$. See: http://www.mathdaily.com/lessons/Maximum_likelihood and http://socserv.socsci.mcmaster.ca/jfox/Courses/SPIDA/MLE-basic-ideas.pdf

Maximum likelihood estimator From Mathworld: A maximum likelihood estimator is a value of the parameter such that the likelihood function is a maximum. see estimators

McNemar’s test A test of proportions in a 2 x 2 classification for repeated measures (paired) data. http://www.amstat.org/publications/jse/secure/v8n2/levin.cfm

MDS Multidimensional scaling (sometimes called NMDS, for nonmetric multidimensional scaling). See http://forrest.psych.unc.edu/teaching/p208a/mds/mds.html

Measurement scales Variables can be classified into nominal, ordinal, interval and ratio scales of measurement. Stevens (1951) and Roberts (1976) show that some mathematical operations require at least an interval or even ratio scale of measurement. For example, temperature on the Fahrenheit scale is an interval measure and the ratio of interval measurements is meaningless. Some procedures, like Factor Analysis, assume at least an interval scale of measurement. Velleman & Wilkinson (1993) review 40 years of research indicating that Stevens’ proscriptions may have been too severe, e.g., it is valid to calculate average GPA from an ordinal measure.

Median The median is the sample item that is in the middle in magnitude, or the average of the two middle items if the number of items is even.

Mediation http://davidakenny.net/cm/mediate.htm Consider a variable X that is assumed to affect another variable Y. The variable X is called the initial variable and the variable that it causes, Y, is called the outcome. In diagrammatic form, the unmediated model is

X → Y (path c)
The effect of X on Y may be mediated by a process or mediating variable M, and the variable X may still affect Y. The mediated model is

    X -> M (path a), M -> Y (path b), X -> Y (path c’)
The mediator has been called an intervening or process variable. Complete mediation is the case in which variable X no longer affects Y after M has been controlled and so path c’ is zero. Partial mediation is the case in which the path from X to Y is reduced in absolute size but is still different from zero when the mediator is controlled. When a mediational model involves latent constructs, structural equation modeling or SEM provides the basic data analysis strategy. If the mediational model involves only measured variables, however, the basic analysis approach is multiple regression or OLS. Regardless of which data analytic method is used, the steps necessary for testing mediation are the same.
• Step 1: Show that the initial variable is correlated with the outcome. Use Y as the criterion variable in a regression equation and X as a predictor (estimate and test path c). This step establishes that there is an effect that may be mediated.
• Step 2: Show that the initial variable is correlated with the mediator. Use M as the criterion variable in the regression equation and X as a predictor (estimate and test path a). This step essentially involves treating the mediator as if it were an outcome variable.
• Step 3: Show that the mediator affects the outcome variable. Use Y as the criterion variable in a regression equation and X and M as predictors (estimate and test path b). It is not sufficient just to correlate the mediator with the outcome; the mediator and the outcome may be correlated because they are both caused by the initial variable X. Thus, the initial variable must be controlled in establishing the effect of the mediator on the outcome.
• Step 4: To establish that M completely mediates the X-Y relationship, the effect of X on Y controlling for M should be zero (estimate and test path c’). The effects in both Steps 3 and 4 are estimated in the same regression equation.
See also http://www.public.asu.edu/~davidpm/ripl/mediate.htm
Meta analysis  From Wikipedia: A meta-analysis is a statistical practice of combining the results of a number of studies that address a set of related research hypotheses. The first meta-analysis was performed by Karl Pearson in 1904, in an attempt to overcome the problem of reduced statistical power in studies with small sample sizes; analyzing the results from a group of studies can allow more accurate estimation of effects... Modern meta-analysis does more than just combine the effect sizes of a set of studies. It tests if the studies’ outcomes show more variation than the variation that is expected because of sampling different research participants. If that is the case, study characteristics such as measurement instrument used, population sampled, or aspects of the studies’ design are coded. These characteristics are then used as predictor variables to analyze the excess variation in the effect sizes.
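The four regression steps listed under Mediation can be sketched numerically. The Python below is only an illustration under stated assumptions: the simulated data, the helper names `centered_cross` and `mediation_paths`, and the closed-form least-squares solution are hypothetical choices, not part of Kenny’s procedure, and the sketch estimates the paths without the significance tests that each step also calls for.

```python
import random


def centered_cross(u, v):
    """Sum of cross-products of deviations from the means of u and v."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))


def mediation_paths(x, m, y):
    """Least-squares estimates of paths a, b, c and c' (hypothetical helper)."""
    sxx = centered_cross(x, x)
    smm = centered_cross(m, m)
    sxm = centered_cross(x, m)
    sxy = centered_cross(x, y)
    smy = centered_cross(m, y)
    c = sxy / sxx                      # Step 1: Y regressed on X
    a = sxm / sxx                      # Step 2: M regressed on X
    det = sxx * smm - sxm ** 2         # Steps 3 and 4: Y regressed on X and M
    b = (sxx * smy - sxm * sxy) / det
    c_prime = (smm * sxy - sxm * smy) / det
    return {"a": a, "b": b, "c": c, "c_prime": c_prime}


# Simulated example (assumed values): X affects Y only through M.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(500)]
m = [0.8 * xi + rng.gauss(0, 1) for xi in x]
y = [0.5 * mi + rng.gauss(0, 1) for mi in m]
paths = mediation_paths(x, m, y)
print(paths)
```

For least-squares fits the total effect decomposes exactly as c = c’ + ab, so an estimate of c’ near zero alongside a nonzero ab is the pattern expected under complete mediation.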
Metric  A dissimilarity measure obeying the following four axioms, the last being the triangular inequality axiom (Legendre & Legendre 1983, p. 193):
1) if a = b, D(a,b) = 0
2) if a ≠ b, D(a,b) > 0
3) D(a,b) = D(b,a)
4) D(a,b) + D(b,c) ≥ D(a,c), as the sum of 2 sides of a triangle is necessarily equal to or larger than the third side (triangle inequality axiom)
see also semimetric, Triangular inequality & ultrametric.
Mill’s canon of the difference  J.S. Mill’s (1843) fifth canon of experimental enquiry (the canon of difference): “Whatever phenomenon varies in any manner whenever another phenomenon varies in some particular manner is either a cause or an effect of that phenomenon, or is connected with it through some fact of causation.” Kendall & Stuart (1979) find two major problems with basing an experimental or sampling design on Mill’s 5th canon: 1) the one-phenomenon (factor)-at-a-time approach does not work, and 2) “We can never be quite sure that all the important, or even the most important, causal factors have been incorporated in the structure of the experiment. Some may be quite unknown; others although known, may wrongly be considered to be of minor importance and deliberately neglected. We always need to guard against the perversion of the inferences within an experiment by adventitious outside effects.”
Mixed model (linear mixed model)  Mixed models are either general linear models or generalized linear models which contain both fixed and random factors. Generalized mixed models are a subset of generalized linear models, which include logistic, probit & log-linear models, in which random and fixed effects can affect the response variable. Linear mixed models are particularly useful for longitudinal data, in which the same subjects are followed through time. All such repeated measures designs, including paired t-tests, can be regarded as a subset of mixed models. The standard mixed model is of the form

$$Y_i = X_i \beta + Z_i b_i + \varepsilon_i$$

where $\beta$ contains population parameters describing average responses to external variables, $b_i$ contains subject-specific parameters describing how the i’th subject deviates from the average population, and $\varepsilon_i$ is a vector of error components. The matrices $X_i$ and $Z_i$ are covariates. Mixed models, including those in SPSS, allow a variety of error structures, including analyses of autocorrelated errors and other features of repeated measures designs. There are a number of different methods for estimating model parameters, including restricted maximum likelihood, penalized likelihood, Bayesian techniques, and simulated maximum likelihood. Assessing the adequacy of models involves likelihood tests, with penalties for fitted parameters.
Minimum noise fraction (MNF) (or Maximum Noise Fraction)  An eigendecomposition method used in satellite remote sensing. Examples: http://www.earthsat.com/geo/oil&gas/hydrocarbon_MNF.html or http://www.eoc.csiro.au/hswww/oz_pi/svt_hilo/burke.pdf
modus tollens  The logical syllogism: “If A then B, not B implies not A.” The modus tollens is the basis of Popper’s method of falsificationism.
Monte Carlo method  Any method, including bootstrap sampling, that uses randomly selected subsets of the data to estimate model parameters. cf., bootstrap, permutation analysis
Monty Hall Problem  A contestant must choose one of three doors. Behind one door is a desirable prize and the other two contain goats. The contestant picks one door, and Monty Hall immediately opens one of the two remaining doors, revealing a goat. Monty then offers the remaining unopened door for the door you’ve chosen. Should you switch?
Multicollinearity (Collinearity)  In multiple regression, if there is a strong correlation among explanatory variables, neither the sign nor the magnitude of the coefficient can be trusted. Draper & Smith (1998, p. 369) provide the following description of multicollinearity:
    Suppose we wish to fit the model $Y = X\beta + \varepsilon$. The solution $b = (X'X)^{-1}X'Y$ would usually be sought [b = X\Y in Matlab]. However, if X’X is singular, we cannot perform the inversion and the normal equations do not have a unique solution. (An infinity of solutions exists instead). … at least one column of X is linearly dependent on (i.e., is a linear combination of) the other columns. We would say that collinearity (or multicollinearity) exists among the columns of X.
Multicollinearity is tested with the variance inflation factor (VIF) or the tolerance. VIF = 1/tolerance. Tolerance = $1 - R_k^2$, where $R_k^2$ is the proportion of variation in one explanatory variable explained by the other explanatory variables.
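The tolerance and VIF relation given under Multicollinearity can be worked through for the simplest case. This is a minimal Python sketch that assumes only two explanatory variables, so that $R_k^2$ reduces to the squared Pearson correlation between them; the data values and function names are hypothetical.

```python
import math


def pearson_r(u, v):
    """Pearson product-moment correlation between two equal-length lists."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    suv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    suu = sum((a - mean_u) ** 2 for a in u)
    svv = sum((b - mean_v) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def vif_two_predictors(x1, x2):
    """VIF for either predictor when the model has exactly two predictors."""
    r_squared = pearson_r(x1, x2) ** 2   # R_k^2 with a single other predictor
    tolerance = 1.0 - r_squared
    return 1.0 / tolerance


# Hypothetical predictor values with a correlation of 0.8.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
print(vif_two_predictors(x1, x2))
```

With more than two predictors, $R_k^2$ would instead come from regressing each explanatory variable on all of the others.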
Other links: UCLA Statistics page (Note that the implication that VIFs less than 20 aren’t cause for concern is a dubious bit of advice. VIFs as low as 3 or 4 could create problems in a regression model.) http://www.ats.ucla.edu/stat/stata/modules/reg/multico.htm
Multilevel models  See Singer & Willett (2003) and http://www2.chass.ncsu.edu/garson/pa765/multilevel.htm
Multinomial distribution
Multiple comparison procedures (multiple hypothesis tests, a posteriori tests, post hoc tests)  After an ANOVA indicates that not all means are equal, an investigator may wish to know which pairs or groups of means differ. A variety of multiple comparisons tests, also known as a posteriori contrasts, have been developed to test for differences among means while considering the overall or experiment-wise error. These tests include Bonferroni, Duncan’s, Dunn’s, Dunnett’s (for comparisons with a control group), Dunnett’s C (for unequal variances), Dunnett’s T3 (for unequal variances), Gabriel, Games-Howell (conservative for unequal n and unequal variance), Hochberg’s GT2, Least significant difference or LSD (not conservative), Scheffé’s, Sidak, Student-Newman-Keuls (SNK), Tamhane’s T2 (for unequal variances), Tukey’s HSD, Tukey-Kramer (a general form of Tukey’s HSD), and Waller (Bayesian approach for equal n). There are at least 20 such tests (see Sokal & Rohlf (1995) for a thorough discussion). The most important are perhaps the Bonferroni, Tukey-Kramer, and Scheffé tests, which adjust for the number of a posteriori contrasts. The Bonferroni adjustment for experimentwise alpha level can be used in any test. Hotelling’s $T^2$ can be used to adjust for multiple correlation (cf., multiple hypothesis testing)
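The Bonferroni adjustment mentioned above is simple enough to show as arithmetic. The sketch below assumes all pairwise contrasts among k group means are tested and divides the experimentwise alpha equally among them; the function names and the choice of five groups are hypothetical.

```python
def n_pairwise_contrasts(k):
    """Number of distinct pairs among k group means."""
    return k * (k - 1) // 2


def bonferroni_alpha(experimentwise_alpha, n_tests):
    """Per-test alpha level that keeps the experimentwise rate at or below the target."""
    return experimentwise_alpha / n_tests


n_tests = n_pairwise_contrasts(5)            # five groups give ten pairs
per_test_alpha = bonferroni_alpha(0.05, n_tests)
print(n_tests, per_test_alpha)
```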
Multiple regression  A regression with more than one explanatory variable.
Multiplication rule of combinatorics  http://www.math.uah.edu/stat/comb/comb1.html#Multiplication
Multivariate hypergeometric distribution  http://www.math.uah.edu/stat/urn/urn4.html
Mutually exclusive  Events A and B defined over the same sample space are said to be mutually exclusive if they have no outcomes in common, that is, if $A \cap B = \emptyset$, where $\emptyset$ is the null set (Larsen & Marx 2001, Definition 2.2.2)
MVUE  Minimum variance unbiased estimator
Natural logarithms [often indicated with ln(x)]  Logarithms were invented by the Scot John Napier in 1614 and natural logarithms are logarithms to the base e, but Napier didn’t use the exponential function, another case of Stigler’s law of eponymy.
Negative binomial distribution  The random variable X is said to have a negative binomial distribution if

$$P(X = k) = \frac{\Gamma(k + r)}{k!\,\Gamma(r)} \left(\frac{r}{r + \lambda}\right)^{r} \left(\frac{\lambda}{r + \lambda}\right)^{k}, \quad k = 0, 1, 2, \ldots$$

The expected value E(X), like that of the Poisson distribution, is $\lambda$, but Var(X) = $\lambda + \lambda^2/r$, where r is called the dispersion parameter. See http://ehs.sph.berkeley.edu/hubbard/longdata/webfiles/poissonmarch2005.pdf
Negative binomial regression  In the analysis of count data, Poisson regression assumes the variance equals the mean. Overdispersion, or the variance exceeding the mean, may indicate the need for a negative binomial regression. http://www.uky.edu/ComputingCenter/SSTARS/P_NB_3.htm
Nested ANOVA  An analysis of variance in which the experimental units are a subset of treatment levels.
Newton-Raphson method (Newton’s method)  Used to estimate the parameters of generalized linear models, as described by McCullagh & Nelder (1989, Ch 2, p. 40-41). http://mathworld.wolfram.com/NewtonsMethod.html
Neyman-Pearson school  A frequentist statistical research program led by Jerzy Neyman and Egon Pearson. They introduced critical values and confidence limits.
NIPALS Algorithm (“Nonlinear Iterative Partial Least Squares”)  In addition to solving partial least squares problems, can be used to find dominant eigenvalues & eigenvectors of a square matrix.
Nonparametric statistics  An underlying parametric distribution, such as the normal distribution, is not assumed for the data. Normal and chi-square distributions are often used for calculating p values.
Nonsense correlation  cf., spurious correlation
Normal distribution [Gaussian distribution, Normal curve, error function]  Stigler (1986, p. 284) traces this distribution to Abraham De Moivre (1733). Figure 7 shows a Matlab ezplot of a normal distribution with mean 5.3 and sd 1.3.
Figure 7. Matlab ezplot of normal distribution, shown in above equation.
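The curve in Figure 7 can be evaluated directly from the normal density. The Python below is a sketch, not the Matlab code used for the figure: it takes the mean 5.3 and sd 1.3 quoted in the text as defaults, and the function names and the trapezoid-rule check are assumptions added for illustration.

```python
import math


def normal_pdf(y, mu=5.3, sigma=1.3):
    """Normal density with mean mu and standard deviation sigma."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def trapezoid_area(lo, hi, steps=2000):
    """Trapezoid-rule area under normal_pdf between lo and hi."""
    width = (hi - lo) / steps
    total = 0.5 * (normal_pdf(lo) + normal_pdf(hi))
    total += sum(normal_pdf(lo + i * width) for i in range(1, steps))
    return total * width


# Area within 8 standard deviations of the mean should be very close to 1.
area = trapezoid_area(5.3 - 8 * 1.3, 5.3 + 8 * 1.3)
print(normal_pdf(5.3), area)
```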
(59)
(60)
(61)
Normal equations  According to Stigler (1999, p. 415-420), a term introduced by Gauss (1822) (“Normalgleichungen”) to describe how the least squares method could be applied. Stigler (1986, p. 14) argues that the concept of the normal equations was used by Legendre (1805). Cf., least squares, regression
Null hypothesis  In the frequentist school of statistics, the null hypothesis is the hypothesis that statistical tests are designed to reject (cf., modus tollens, Type I error, Type II error).
Observational study  The sampling units are inherently finite. cf., experiment
Odds  The odds are a different way of expressing the probability of an event occurring. If you know the probability of an event, then you can calculate the odds. If the probability of an event is p, then

$$\text{odds} = \frac{p}{1 - p}$$

If the probability of rain today is 50% or 0.5, then the odds of rain are 0.5/(1-0.5) = 1/1 = 1:1 or 1 to 1. This is also expressed as saying the odds of rain today are even. The odds of getting a six when rolling a die (the singular of dice) are

$$\frac{1/6}{1 - 1/6} = \frac{1/6}{5/6} = \frac{1}{5}$$

This is sometimes expressed as saying the odds are 5 to 1 against getting a six. The odds of getting the King of Hearts when drawing a single card from a 52-card deck are

$$\frac{1/52}{1 - 1/52} = \frac{1/52}{51/52} = \frac{1}{51}$$

This could be expressed as saying the odds are 51 to 1 against drawing a King of Hearts. Bets in horse races are set by the odds. If the probability of the favorite horse winning a race is 60%, then the odds are

$$\frac{0.6}{1 - 0.6} = \frac{0.6}{0.4} = \frac{3}{2}$$

This horse would be listed as a 3:2 favorite. In order to win $2, you’d have to bet $3. If you bet $3, you would get $5 back if the horse won. A long shot in a horse race might have a probability of winning of 1%. The odds of that horse winning would be

$$\frac{0.01}{1 - 0.01} = \frac{0.01}{0.99} = \frac{1}{99}$$

This would be expressed as saying the odds are 99 to 1 against winning. If you bet $1 on this horse and it won, you’d win $99.
As shown at right, some casinos express odds using the notation ‘10 for 1’. This means that a $1 bet at 9 to 1 odds returns $10. Richard Frey (1970, p 269) in his edition of ‘According to Hoyle’ describes the convention of reporting odds with ‘for’:
    “On many [craps] layouts the actual odds being offered are disguised by the use of the word “for.” If the house, for example, pays 4-to-1 odds, the winner of a bet receives his $1 bet back together with $4 paid by the house, a total of $5. Some houses quote these odds by offering “five-for-one,” meaning that for every $1 the bettor puts up, he receives, when he wins, $5, including his own $1. This is equivalent to odds of 4-to-1.”
The odds, $\omega$, can be converted to a probability by using the relation that if the odds of yes are $\omega$, P(yes) = $\omega/(\omega + 1)$. Odds of 4-to-1 against an event are odds of $\omega$ = 1/4 in its favor, so the probability is (1/4)/(1/4 + 1) = 1/(4+1) or 0.2. Odds reported as 5 for 1 would likewise correspond to a probability of 0.2. [cf., craps, odds ratio]
Odds ratio  The ratio of two odds. If the odds of a person getting a cold taking vitamin C are 3 and the odds of a person getting a cold taking a placebo are 4.5, then the estimated odds ratio is 1.5. One can say that the odds of getting a cold are 50% greater if one doesn’t
take vitamin C. Ramsey & Schafer (2002, p. 540) prefer reporting the odds ratio to differences in proportions because: 1) the odds ratio tends to remain more nearly constant over levels of confounding variables, 2) the odds ratio is the only parameter that can describe the binary responses of two groups from a retrospective study, and 3) the comparison of odds extends nicely to logistic regression analysis.
OLS  Ordinary least squares. cf., WLS
One-sided test  Also called one-tailed test. cf., two-sided test
orthogonal  At right angles.
Orthogonal arrays  In experimental design, you might want to test 1000 drugs on a mammalian cell culture, including the 2- and 3-way interactions. How can that be done with a relatively small number of experimental units? Orthogonal arrays and factorial designs provide ways of constructing choices of factors and levels of factor and the analyses that can be performed on them. See http://support.sas.com/techsup/technote/ts723.html or http://support.sas.com/techsup/tnote/tnote_stat.html#market. Sleuth Chapter 24 provides a concise introduction to the concepts. Cf., experimental design
Orthogonal contrast  Two contrasts are orthogonal if the sum of the products of corresponding coefficients (i.e., coefficients for the same means) adds to zero. Cf., linear contrast http://www.itl.nist.gov/div898/handbook/prc/section4/prc426.htm
orthonormal basis  http://mathworld.wolfram.com/OrthonormalBasis.html
Overdispersion  In fitting binomial logistic and Poisson regression models, the variance of the observed data is greater than the variance predicted by the binomial or Poisson model. “The terms extra-binomial variation and overdispersion describe the inadequacy of the binomial model in these situations.” Ramsey & Schafer (2002, p. 621).
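The conversions described under Odds and Odds ratio can be checked with exact fractions. This Python sketch reuses the numbers from the die, horse race, and vitamin C examples; the function names are hypothetical.

```python
from fractions import Fraction


def odds_from_probability(p):
    """Odds in favor of an event with probability p."""
    return p / (1 - p)


def probability_from_odds(odds):
    """Probability of an event whose odds in favor are `odds`."""
    return odds / (odds + 1)


def odds_ratio(odds_a, odds_b):
    """Ratio of two odds."""
    return odds_a / odds_b


die_odds = odds_from_probability(Fraction(1, 6))        # rolling a six
favorite_odds = odds_from_probability(Fraction(3, 5))   # 60% favorite
p_four_to_one_against = probability_from_odds(Fraction(1, 4))
cold_odds_ratio = odds_ratio(Fraction(9, 2), Fraction(3))  # placebo vs. vitamin C
print(die_odds, favorite_odds, p_four_to_one_against, cold_odds_ratio)
```

Working in `Fraction` rather than floating point keeps results such as 1/5 and 3/2 exact, which matches the way betting odds are quoted.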
Overfitting  “When a [regression] model is fitted that is too complex, that is, it has too many free parameters to estimate for the amount of information in the data, the worth of the model (e.g., $R^2$) will be exaggerated and future observed values will not agree with the predicted values. In this situation, overfitting is said to be present, and some of the findings of the analysis come from fitting noise or finding spurious associations between X and Y.” Harrell (2002, p. 60)
p value  (from K Wuensch, edstat, 3/19/03) The probability of obtaining data as or more discrepant with the null hypothesis than are those in the present sample, assuming that the null hypothesis is absolutely correct. Jerry Dallal has a lengthy discussion on his web site: http://www.tufts.edu/~gdallal/pval.htm
Paired t test  A form of Student’s t test in which observations are paired and the null hypothesis is that the difference between paired observations is equal to some value (usually zero). This is a form of repeated measures design.
Parameter  An unknown numerical value describing a feature of a probability model. Parameters are indicated by Greek letters (Ramsey & Schafer 1997, p. 19). cf., statistic
Partial Least Squares (PLS)  “In partial least squares regression, prediction functions are represented by factors extracted from the Y’XX’Y matrix. The number of such prediction functions that can be extracted typically will exceed the maximum of the number of Y and X variables. In short, partial least squares regression is probably the least restrictive of the various multivariate extensions of the multiple linear regression model. This flexibility allows it to be used in situations where the use of traditional
multivariate methods is severely limited, such as when there are fewer observations than predictor variables. Furthermore, partial least squares regression can be used as an exploratory analysis tool to select suitable predictor variables and to identify outliers before classical linear regression. Partial least squares regression has been used in various disciplines such as chemistry, economics, medicine, psychology, and pharmaceutical science where predictive linear modeling, especially with a large number of predictors, is necessary. Especially in chemometrics, partial least squares regression has become a standard tool for modeling linear relations between multivariate measurements (de Jong, 1993).” http://www.statsoft.com/textbook/stpls.html, http://www.vcclab.org/lab/pls/m_description.html. See also NIPALS, SIMPLS, WA-PLS
Pascal, Blaise (b. 6/19/1623, Clermont, Auvergne, France; d. 1662)  With Fermat, the father of mathematical probability theory. (Bell 1937, p. 86)
Path analysis  Sewall Wright invented this technique in 1921 in the paper “Causation and Correlation” to explain the causal basis of a set of partial correlations among a set of variables. Sometimes referred to as causal analysis. Path analysis is now a subset of structural equation modeling.
pdf  Acronym for probability density function
Pearson, Karl  1857-1936
Pearson’s r  Pearson’s product-moment correlation coefficient
Permutations  (Larsen & Marx 2001, p. 92) Theorem 2.9.1 The number of permutations of length k that can be formed from a set of n distinct elements, repetitions not allowed, is denoted by the symbol ${}_nP_k$, where

$${}_nP_k = n(n-1)(n-2)\cdots(n-k+1) = \frac{n!}{(n-k)!}$$
Corollary  The number of ways to permute an entire set of n distinct objects is n!. The symbol n! is called n factorial.
Phi  Phi is a chi-square based measure of association, sometimes called Pearson’s coefficient of mean-square contingency, though sometimes this term is applied to Pearson’s contingency coefficient, discussed below, which is a modification of phi. With dichotomized continuous data, tetrachoric correlation is preferred. phi = (bc-ad)/sqrt[(a+b)(c+d)(a+c)(b+d)], http://www2.chass.ncsu.edu/garson/pa765/assocnominal.htm
Poisson, Siméon Denis (1781-1840)
Poisson distribution  (Larsen & Marx 2001 Theorem 4.2.2, p. 251) First described by Poisson as a limit theorem and then used in 1898 by Professor Ladislaus von Bortkiewicz to model the number of Prussian cavalry officers kicked to death by their horses. The random variable X is said to have a Poisson distribution if

$$p_X(k) = P(X = k) = \frac{e^{-\lambda}\lambda^{k}}{k!}, \quad k = 0, 1, 2, \ldots$$
The expected value E(X) and variance Var(X) are both $\lambda$.
Poisson limit theorem  (Larsen & Marx 2001 Theorem 4.2.2, p. 251) If $n \to \infty$ and $p \to 0$ in such a way that $\lambda = np$ remains constant, then for any nonnegative integer k,

$$\lim_{n \to \infty} \binom{n}{k} p^{k} (1 - p)^{n - k} = \frac{e^{-np}(np)^{k}}{k!}$$

The Poisson limit theorem justifies the use of the Poisson distribution to approximate the binomial distribution. The Poisson approximation to the binomial is quite accurate if $n \ge 20$ and $p \le 0.05$ and is very good if $n \ge 100$ and $np \le 10$.
Poisson process  (Hogg & Tanis 1977, p 78) Let the number of changes that occur in a given continuous interval be counted. We have an approximate Poisson process with parameter $\lambda > 0$ if the following are satisfied: (i) the numbers of changes occurring in nonoverlapping intervals are independent, (ii) the probability of exactly one change in a sufficiently short interval of length h is approximately $h\lambda$, and (iii) the probability of two or more changes in a sufficiently short interval is essentially zero.
Poisson regression  A generalized linear model for the analysis of count data which meet the assumption that the variance of the counts equals the mean.
Popper, Sir Karl R.  Austrian philosopher of science, who spent most of his career at the London School of Economics, and who proposed his method of “conjectures and refutations” in his magnum opus Logik der Forschung, translated as The Logic of Scientific Discovery in 1959. His scientific method is founded on the demarcation principle of falsificationism. Science is distinguished from pseudoscience because scientific principles are subject to falsification. His long career is profiled in the wonderful 2003 book, “Wittgenstein’s Poker.”
Posterior distribution  See Bayes theorem
Power  The probability that a null hypothesis will be rejected, given that it is false. Power = 1 - P(Type II error) = $1 - \beta$. In order to calculate the statistical power, the alternate hypothesis must be specified. Bill Trochim’s web page has a very nice discussion of statistical power (http://trochim.human.cornell.edu/kb/power.htm)
Power function
Precision  Indicates the random or chance variability about the mean of repeated observations (cf. accuracy)
PRESS  Prediction error sum of squares.
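The rule of thumb for the Poisson approximation to the binomial can be examined numerically. This Python sketch compares the two probability functions at every k for two hypothetical cases that share $\lambda = np = 5$; the function names and the choice of cases are assumptions made for illustration.

```python
import math


def binomial_pmf(k, n, p):
    """Binomial probability of k successes in n trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """Poisson probability of k events with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def max_abs_difference(n, p):
    """Largest pointwise gap between binomial(n, p) and Poisson(np)."""
    lam = n * p
    return max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam))
               for k in range(n + 1))


# Same lambda = 5, but only the first case meets n >= 100 with small p.
gap_large_n = max_abs_difference(100, 0.05)
gap_small_n = max_abs_difference(20, 0.25)
print(gap_large_n, gap_small_n)
```

The gap shrinks as n grows and p falls with np held fixed, which is the behavior the limit theorem describes.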
principal component method = Principal Components Analysis (PCA)  Developed by Hotelling (1933). PCA is simply the rotation of the original system of axes in the multidimensional space. The principal axes are orthogonal and the eigenvalues measure the amount of variance associated with each principal axis. PCA is used to summarize in a few important dimensions the greatest part of the variability of a dispersion matrix of a large number of descriptors (R-mode) or cases (Q-mode). cf., EOF
principal component scores  The value of a principal component for individual points, hence the new coordinates of data points measured along axes created by the principal component
method. A principal component score can be regarded as an additional variable for each case; this variable is a linear function of the original variables.
Probability density function
Probit analysis  A maximum likelihood regression procedure to estimate the proportion of a population that will be affected by a given treatment level. The method was pioneered by Bliss to analyze bioassay data from toxicology experiments. For example, a toxicologist might want to find the lethal dose required to kill 50% of a population of invertebrates in a beaker. Probit analysis is now regarded as one of many methods included among the generalized linear models. These generalized linear models are usually fit using the principle of maximum likelihood. In practice, the logistic regression modeling procedure often gives very similar results. See http://www2.chass.ncsu.edu/garson/pa765/logit.htm
Probit link function  Agresti (1996, p 79) writes, “The probit link applied to a probability $\pi(x)$ transforms it to the standard normal z-score at which the left-tail probability equals $\pi(x)$. For instance, probit(.05) = -1.645, probit(0.50) = 0, probit(.95) = 1.645, and probit(.975) = 1.96. The probit model is a GLM with a random component and a probit link.”
Probability  Larsen & Marx (2001) provide four distinctly different definitions of probability:
• Classical probability, Pascal & Fermat
    • “Imagine an experiment, or game, having n possible outcomes, and suppose that those outcomes are equally likely. If some event A were satisfied by m out of those n, the probability of A [written P(A)] should be set equal to m/n.” This is the classical or a priori definition of probability.
• Empirical probability (attributed to von Mises, but can be found at least a century earlier)
    • “Consider a sample space S, and any event A, defined on S. If our experiment were performed one time, either A or $A^C$ would be the outcome. If it were performed n times, the resulting set of sample outcomes would be members of A on m occasions, m being some integer between 0 and n, inclusive. Hypothetically, we could continue the process an infinite number of times. As n gets large, the ratio m/n will fluctuate less and less. The number that m/n converges to is called the empirical probability of A,” that is, $P(A) = \lim_{n \to \infty} m/n$.
• Axiomatic probability, Andrei Kolmogorov
    • If S has a finite number of members, Kolmogorov showed that as few as three axioms are necessary and sufficient for characterizing the probability function P.
        • Axiom 1. Let A be any event defined over S. Then $P(A) \ge 0$.
        • Axiom 2. P(S) = 1.
        • Axiom 3. Let A and B be any two mutually exclusive events defined over S. Then $P(A \cup B) = P(A) + P(B)$.
    • When S has an infinite number of members, a fourth axiom is needed.
        • Axiom 4. Let $A_1, A_2, \ldots$ be events defined over S. If $A_i \cap A_j = \emptyset$ for each $i \ne j$, then $P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$.
    • From these simple statements, all other properties of the probability function can be derived.
• Subjective probability
    • What is a person’s measure of belief that an event will occur?
Humorous definitions  1) “probability” = long-run fraction having this characteristic. 2) “probability” = degree of believability. 3) A frequentist is a person whose lifetime ambition is to be wrong 5% of the time. 4) A Bayesian is one who, vaguely expecting a horse, and catching a glimpse of a donkey, strongly believes he has seen a mule. http://www.statisticalengineering.com/frequentists_and_bayesians.htm
Probability function  There are two fundamentally different types of probability functions (Larsen & Marx 2001).
A discrete probability function is a function defined for a process with a finite or countably infinite number of outcomes. Suppose that the sample space for a given experiment is either finite or countably infinite. Then P is a discrete probability function if a) $0 \le P(s)$ for all $s \in S$ and b) $\sum_{s \in S} P(s) = 1$. The probability of an event A is the sum of the probabilities associated with the outcomes in A:

$$P(A) = \sum_{s \in A} P(s)$$
A continuous probability function is defined for a process with an uncountably infinite number of outcomes. If S is a sample space with an uncountable number of outcomes and if f is a real-valued function defined on S, then f is said to be a continuous probability function if a) $0 \le f(y)$ for all $y \in S$, and b) $\int_S f(y)\,dy = 1$. Furthermore, if A is any event defined on S, it must be true that $P(A) = \int_A f(y)\,dy$.
Probability density function  Larsen & Marx (2001, p. 121, 126)
Describing the variation of a discrete random variable
Associated with each discrete random variable X is a probability density function (or pdf), $p_X(k)$. By definition $p_X(k)$ is the sum of all the probabilities associated with outcomes in sample space S that get mapped into k by the random variable X. That is,

$$p_X(k) = P(\{s \in S \mid X(s) = k\})$$

Conceptually, $p_X(k)$ describes the probability structure induced on the real line by the random variable X.
Describing the variation of a continuous random variable
Associated with each continuous random variable Y is also a pdf, $f_Y(y)$, but $f_Y(y)$ in this case is not the probability that the random variable Y takes on the value y. Rather, $f_Y(y)$ is a function having the property that for all a and b,

$$P(a \le Y \le b) = \int_a^b f_Y(y)\,dy$$
Profile Analysis  Or, ‘the multivariate approach to repeated measures,’ which does not require sphericity as an assumption. Tabachnick & Fidell (2001, p 422) describe profile analysis as an alternative to traditional repeated measures designs, ‘a special application of multivariate analysis of variance (MANOVA) to a situation when there are several [response variables], all measured on the same scale.’ Profile analysis requires more cases than dependent variables in the smallest group. Morrison (1976, p 153) describes profile analysis involving $T^2$ tests of parallel profiles followed by tests of different levels among groups. Profile analysis is available in SPSS MANOVA and SPSS GLM/Repeated measures as described by Tabachnick & Fidell (2001, p 391).
Propagation of error  Variables estimated from data usually have an associated error which should be incorporated in calculations involving those parameter estimates. For example, Larsen & Marx (2001, p. 222-223) present the propagation of error formula for the variance of linear combinations:
Calculating the variance of a linear combination. Theorem 3.13.1 Let W be any random variable, discrete or continuous, and let a and b be any two constants. Then

$$\mathrm{Var}(aW + b) = a^2\,\mathrm{Var}(W)$$

Calculating the variance of a sum of random variables. Theorem 3.13.2 Let $W_1, W_2, \ldots, W_n$ be a set of independent random variables for which $E(W_i^2)$ is finite for all i. Then

$$\mathrm{Var}(W_1 + W_2 + \cdots + W_n) = \mathrm{Var}(W_1) + \mathrm{Var}(W_2) + \cdots + \mathrm{Var}(W_n)$$
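Both variance theorems can be checked by simulation, in the spirit of the Monte Carlo approaches to error propagation. The Python below is a sketch under assumed inputs: the two normal distributions, the constants a and b, the sample size, and the seed are all hypothetical.

```python
import random
import statistics

rng = random.Random(1)
n = 10000

# Two independent simulated variables (assumed means and standard deviations).
w1 = [rng.gauss(10.0, 2.0) for _ in range(n)]
w2 = [rng.gauss(-3.0, 3.0) for _ in range(n)]

a, b = 3.0, 7.0
var_w1 = statistics.pvariance(w1)
var_w2 = statistics.pvariance(w2)

# Variance of a linear combination: the shift b drops out, the scale a is squared.
var_linear = statistics.pvariance([a * w + b for w in w1])

# Variance of a sum: close to the sum of variances because w1 and w2 are independent.
var_sum = statistics.pvariance([u + v for u, v in zip(w1, w2)])

print(var_linear, a ** 2 * var_w1)
print(var_sum, var_w1 + var_w2)
```

The first identity holds exactly for the sample variances, while the second holds only approximately in any finite sample because the sample covariance of the two simulated variables is not exactly zero.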
The formulae and Monte Carlo approaches used to propagate error are covered well in Bevington & Robinson (1992) and Taylor (1997).
Pseudoreplication  A term coined by Hurlbert (1984) for a concept called model misspecification by Underwood (1997). It specifically refers to the use of an inappropriate statistical model, especially one with inflated degrees of freedom used to estimate the error variance.
Q-mode, R-mode  Legendre & Legendre (1983, p. 172). The measurement of dependence between two descriptors (variables) is achieved by means of coefficients like Pearson’s product-moment correlation, r. The study of the correlation or variance-covariance matrices is therefore called an R analysis. In contrast, a study of an ecological data matrix based upon the relationship between objects is called Q analysis. Cattell (1966)
also defined O-, P-, S-, and T-modes. n.b., many authors (Pielou 1984) reverse this conventional usage. Occasionally, the terms normal mode and inverse mode are used instead of Q and R mode, but these terms should be avoided due to the overlap with the corresponding statistical terms.
Quadratic equation  Any equation of the form $ax^2 + bx + c = 0$. “In mathematics, a quadratic function is a polynomial function of the form $f(x) = ax^2 + bx + c$, where a is nonzero. It takes its name from the Latin quadratus for square, because quadratic functions arise in the calculation of areas of squares. In the case where the domain and codomain are R (the real numbers), the graph of such a function is a parabola. If the quadratic function is set to be equal to zero, then the result is a quadratic equation.” http://en.wikipedia.org/wiki/Quadratic
Quadratic term  Any term raised to a power of 2.
Quantile  http://mathworld.wolfram.com/Quantile.html
Quartile  http://mathworld.wolfram.com/Quartile.html See also Tukey hinges
Quetelet, Adolphe (1796-1874)  From the Columbia Encyclopedia: Belgian statistician and astronomer. He was the first director (1828) of the Royal Observatory at Brussels. As supervisor of statistics for Belgium (from 1830), he developed many of the rules governing modern census taking and stimulated statistical activity in other countries. Applying statistics to social phenomena, he developed the concept of the “average man” and established the theoretical foundations for the use of statistics in social physics or, as it is now known, sociology. Thus, he is considered by many to be the founder of modern quantitative social science. A Treatise on Man (1835; tr., 1842) is his best-known work.
Figure 8. From the UCLA portraits of statisticians site.
Quota sampling  Gave rise to “Dewey beats Truman.” cf., census, probabilistic sampling
R squared [coefficient of determination, $R^2$]  Percentage of the total response variation explained by the regression with the explanatory variables (Ramsey & Schafer 1997, p. 213; Larsen & Marx 2006, p 309-310)
Adjusted R squared  $R^2$ adjusted for the number of terms used to fit the model, cf., PRESS
Random effects  In ANOVA, effects are modeled as fixed or random. The appropriate denominator for F tests in factorial ANOVAs differs depending on whether the main effects are fixed or random. A random effect model is one in which the levels are chosen as if they were random samples from a probability distribution. McCulloch & Searle (2001, p. 17) discuss whether an effect is fixed or random: “In endeavoring to decide
whether a set of effects is fixed or random, the context of the data, the manner in which they were gathered and the environment from which they came are the determining factors. In considering these points the important question is: are the levels of the factor going to be considered a random sample from a population of values which have a distribution? If “yes” then the effects are to be considered as random effects; if “no” then, in contrast to randomness, we think of the effects as fixed constants and so the effects are considered as fixed effects. Thus when inferences will be made about a distribution of effects from which those in the data are considered to be a random sample, the effects are considered as random; and when inferences are going to be confined to the effects in the model, the effects are considered fixed. Another way of putting it is to ask the questions: ‘do the levels of a factor come from a probability distribution?’ and ‘Is there enough information about a factor to decide that the levels of it in the data are like a random sample?’ Negative answers to these questions mean that one treats the factor as a fixed effects factor and estimates the effects of the levels; and treating the factor as fixed indicates a more limited scope of inference. On the other hand, affirmative answers mean treating the factor as a random effects factor and estimating the variance component due to that factor. In that case, when there is also interest in the realized values of those random effects that occur in the data, then one can use a prediction procedure for those values.” cf., fixed effects
Random variable
Rank sum test  See Wilcoxon rank sum test
Rao-Blackwell theorem  (Hogg & Tanis 1977, p. 404) Let V and Y be two random variables such that V has mean $E(V) = \theta$ and positive finite variance. Let $E(V \mid Y = y) = w(y)$. Then the random variable W = w(Y) is such that $E(W) = \theta$ and $\mathrm{Var}(W) \le \mathrm{Var}(V)$. This theorem means that if a sufficient statistic for $\theta$ exists, say Y, we may limit our search for a minimum variance unbiased estimator to functions of Y.
Recurrent groups analysis  A method to graphically display species associations, introduced by Fager (1957)
Reference level  To analyze discretely distributed variables in a regression model, they are usually coded as 0,1 dummy variables. One of the levels must be left out, and the one level that is left out is called the reference level (see Ramsey & Schafer 1997, p. 237)
Regression  A term coined by Francis Galton in 1879 and 1886 to explain the bivariate association between filial and parental heights. Yule (1897) was the first to use least squares to fit a regression of Y on X by minimizing the squared residuals between the regression line and Y. The term regression was later applied to the entire field of fitting linear models with least-squares methods. Ramsey & Schafer (1997) state “regression refers to the mean of a response variable as a function of an explanatory variable. A regression model is a function used to describe the regression. The simple linear regression model is a particular regression in which the regression is a straight-line function of a single explanatory variable.”
These least squares methods used in regression can be traced back at least to Legendre (1805) and the normal equations to Gauss (1822).
The regression phenomenon, also called regression to mediocrity, regression to the mean, or the regression artifact, is expressed mathematically as (Stigler 1999, p. 176):
Handout2 IntroProb&Statistics TermsP.46of68
[equation not preserved in this copy] (83)
Galton was the discoverer of regression to the mean, which Stigler (1999, p. 6) regards as one of the most original discoveries in the last two centuries: “… regression to the mean, one of the trickiest concepts in all of statistics. Galton’s completion of his discovery of this phenomenon in the 1880's should rank with the greatest individual events in the history of science – at a level with William Harvey’s discovery of the circulation of blood and with Isaac Newton’s of the separation of light. In all three cases the discovery is apparently of such an elementary character that it could have been made at least a thousand years earlier, but the fact that it wasn’t and the problems the discoverer had in communicating it convincingly to the world hint at the profound difficulty involved. In all three cases the consequences were immense and far-reaching.” Regression to the mean is described in William Trochim’s database: http://trochim.human.cornell.edu/kb/regrmean.htm
Figure 9. The regression ellipse from p. 248 in Galton (1886), posted at the UCLA statistics history site: http://www.stat.ucla.edu/history/regression_ellipse.gif
Figure 10. Plate X from Galton (1886), posted at http://www.stat.ucla.edu/history/regression.gif
OLS Regression Fitting data using ordinary least squares. The assumptions that matter are that there are no outliers which significantly affect the regression fits or statistics, detected by Cook’s D for example. You don’t want to see pattern in the plot of residuals vs. predicted values, nor should there be pattern between the residuals and the order in which samples were taken (in space or time). The former problem could indicate the need for transformation or for a higher order regression, and the latter could indicate lack of independence among the errors.
Regression: Model II Model II regression is called for when both the X and Y variables are measured with considerable error. Legendre & Legendre (1998) provide a thorough discussion of methods for Model II regression, including principal components regression.
Regression diagnostics See Jerry Dallal’s page: http://www.tufts.edu/~gdallal/diagnose.htm for descriptions of
    Cook’s distance: differences between the predicted responses from the model constructed from all of the data and the predicted responses from the model constructed by setting the i-th observation aside.
    DFITS_i: scaled difference between the predicted responses from the model constructed from all of the data and the predicted responses from the model constructed by setting the i-th observation aside.
    DFBETA_i: when the i-th observation is included or excluded, DFBETAS looks at the change in each regression coefficient.
    See also: studentized residuals
Figure 11. Galton’s regression to the mean, from Freedman et al. (1998)
Relative power efficiency
Relative risk see risk ratio
Repeated measures design When 2 or more variables are measured from the same experimental units (often subjects or patients in drug trials). A paired t test is a repeated measures design (actually a repeated measures ANOVA with two levels of 1 ‘within subjects’ factor and no ‘between subjects’ factors).
Residuals Observed minus predicted value
PRESS residuals Prediction residual error sum of squares. Residuals obtained from regression coefficients derived after the effect of each case is removed.
Studentized residual The raw residual for a case, standardized or ‘studentized’ by scaling with the standard error estimated after that case is deleted. This can be done by adjusting the mean square error for a regression by that case’s leverage.
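The case-deletion diagnostics described above do not require refitting the regression n times; for least squares they follow from each case’s residual and leverage. The sketch below is a minimal illustration for simple linear regression only. The function names and the data are hypothetical, and the formulas used are the standard textbook identities rather than anything taken from the handout.

```python
import math

def fit_line(x, y):
    """Ordinary least squares fit of y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b1 * xbar, b1

def diagnostics(x, y):
    """Per-case leverage, PRESS residual, studentized residual and Cook's D."""
    n, p = len(x), 2                      # p = number of fitted coefficients
    b0, b1 = fit_line(x, y)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    mse = sum(e * e for e in resid) / (n - p)
    rows = []
    for xi, e in zip(x, resid):
        h = 1 / n + (xi - xbar) ** 2 / sxx             # leverage of this case
        press = e / (1 - h)                            # residual with the case set aside
        mse_deleted = ((n - p) * mse - e * e / (1 - h)) / (n - p - 1)
        t = e / math.sqrt(mse_deleted * (1 - h))       # externally studentized residual
        cook = e * e * h / (p * mse * (1 - h) ** 2)    # Cook's distance
        rows.append({"leverage": h, "press": press, "studentized": t, "cook": cook})
    return rows

# Hypothetical data: the last case is an outlier at a high-leverage x value.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.1, 2.3, 2.9, 4.2, 4.8, 6.1, 7.2, 12.0]
rows = diagnostics(x, y)
for i, r in enumerate(rows, start=1):
    print(i, {k: round(v, 3) for k, v in r.items()})
```

The PRESS residual printed for each case equals the prediction error obtained by actually refitting the line without that case, which is a convenient way to check the leverage shortcut.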
Response variable The variable that is being modeled in a regression model. Sometimes called the dependent variable. cf., explanatory variable
Retrospective studies. Ramsey & Schafer 1997 p. 529
Ridge regression Hoerl & Kennard (1970), quoted in Kendall & Stuart 1979, p. 92. Ridge regression is one method for coping with collinearity among explanatory variables. If X is the matrix of explanatory variables, then the normal equations are solved by adding a small constant to the main diagonal of the sum of squares and cross products matrix before inversion, with inv(X’X+lambda*I) instead of inv(X’X). The effect of adding small amounts to the main diagonal is assessed with a ridge-trace plot. There is an SPSS macro available from Raynald Levesque to carry out ridge regression.
Risk ratio or Relative Risk “The risk ratio is a ratio of probabilities, which are themselves ratios. The numerator of a probability is the number of cases with the outcome, and the denominator is the total number of cases. The risk ratio lends itself to direct intuitive interpretation. For example, if the risk ratio equals X, then the outcome is X-fold more likely to occur in the group with the factor compared with group lacking the factor.” Holcomb et al. 2001. Zhang & Yu (1978) show the relation between risk ratio and odds ratio:
As shown by Zhang & Yu (1978, Fig 1) the odds ratio overestimates the risk ratio if the event is common. See also http://www.childrens-mercy.org/stats/journal/oddsratio.asp
Robust [robustness] P values remain accurate despite slight violations of assumptions.
Figure 12. Relation between the odds ratio and relative risk from Zhang & Yu (1978, Fig 1). Odds ratio overestimates relative risk, especially if the event is common.
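The gap between the odds ratio and the risk ratio can be seen with a few lines of arithmetic. The sketch below is an illustration with hypothetical probabilities. The conversion RR = OR / ((1 − p0) + p0 × OR), where p0 is the risk in the group lacking the factor, is the form usually attributed to Zhang & Yu; the handout’s own equation is not preserved in this copy, so treat that attribution as an assumption.

```python
def risk_ratio(p1, p0):
    """Ratio of the outcome probability with the factor (p1) to that without it (p0)."""
    return p1 / p0

def odds_ratio(p1, p0):
    """Ratio of the odds p/(1-p) in the two groups."""
    return (p1 / (1 - p1)) / (p0 / (1 - p0))

def risk_ratio_from_odds_ratio(or_value, p0):
    """Convert an odds ratio to a risk ratio given the baseline risk p0."""
    return or_value / ((1 - p0) + p0 * or_value)

# Hypothetical risks: the risk ratio is 2 in both scenarios.
for label, p1, p0 in [("rare event", 0.02, 0.01), ("common event", 0.60, 0.30)]:
    rr = risk_ratio(p1, p0)
    or_value = odds_ratio(p1, p0)
    back = risk_ratio_from_odds_ratio(or_value, p0)
    print(f"{label}: RR = {rr:.2f}, OR = {or_value:.2f}, RR recovered from OR = {back:.2f}")
```

For the rare event the two measures nearly coincide, while for the common event the odds ratio is well above the risk ratio.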
ROC curve Receiver Operating Characteristic curve. A curve from signal detection theory which describes the classification of a signal in the presence of noise (Hosmer & Lemeshow 2000, p. 160 & Figure 5.2). It is used extensively in evaluating diagnostic tests, such as screening tests for cancer. See also http://www.anaesthetist.com/mnm/stats/roc/, specificity, sensitivity
Figure 13. ROC curve from Hosmer & Lemeshow
Rotation Legendre & Legendre (1983, p. 309). Transformations of the axes used to portray data. Both orthogonal (rigid rotations preserving Euclidean distances among data [see VARIMAX]) and oblique rotations have been used in FA. The purpose of rotations is not to improve the degree of fit between the observed data and the factors... the purpose is to achieve simple structure.
Runs A consecutive series of like events. +++00 represents two runs and ++00+ represents three. There are a variety of runs tests available; several are described in Larsen & Marx (2001)
Sample
SAR Simultaneous autoregressive model cf., CAR
SARIMA seasonal autoregressive integrated moving-average, cf., ARIMA
Sample outcome “Each of the potential eventualities of an experiment is referred to as a sample outcome, s, and their totality is referred to as a sample space, S. To signify the membership of s in S, we write s ∈ S. Any designated collection of sample outcomes, including individual outcomes, the entire sample space, and the null set, constitutes an event. The latter is said to occur if the outcome of the experiment is one of the members of the event.” (Larsen & Marx 2001, p. 21-22)
Satterthwaite approximation Satterthwaite’s (1946) approximation of the df for the Welch’s t test cf., Behrens-Fisher problem
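As a hedged illustration of the Satterthwaite approximation, the sketch below computes Welch’s t statistic and the approximate degrees of freedom from two samples. The function name and the data are hypothetical; the df expression is the usual textbook Welch-Satterthwaite form, not a formula quoted from the handout.

```python
import math
import statistics

def welch_satterthwaite(a, b):
    """Return Welch's t statistic and the Satterthwaite approximate df."""
    na, nb = len(a), len(b)
    va = statistics.variance(a) / na      # squared standard error of mean(a)
    vb = statistics.variance(b) / nb      # squared standard error of mean(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Hypothetical samples with clearly unequal spread.
group_a = [12.1, 11.8, 12.4, 12.0, 11.9, 12.2]
group_b = [10.2, 14.9, 8.7, 13.5, 11.1]
t_stat, df = welch_satterthwaite(group_a, group_b)
print(f"t = {t_stat:.3f}, approximate df = {df:.2f}")
```

The approximate df always lies between the smaller of (n − 1) for the two groups and the pooled value n_a + n_b − 2; it approaches the lower bound when the noisier group dominates the standard error.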
Scheffé’s test A very conservative multiple comparison test designed to produce an alpha level appropriate for testing linear combinations of the data (e.g., group A+B vs. Group C+D+E).
Scree test (described by Kim & Mueller, 1978, p. 44) Cattell (1966) described the elbow on the log(eigenvalue) vs. dimension plot as the point beyond which we are in the “factorial litter” or scree (where scree is the geological term for the debris which collects on the lower part of a rocky slope). The scree test retains only those factors at or to the left of the elbow for interpretation or further rotation. Jackson (1993) reviewed various stopping rules for PCA, concluding that the scree test overestimated the “true” number of factors by 1.
SEM Structural equation modeling. From a 4/5/04 post from H. Rubin on sci.stat.edu “The formal idea of structural equations modeling, as far as I know, originated in biology in 1919 by Sewall Wright [This would be Wright’s path analysis]. The idea is that one does not simply have regressions with independent and dependent variables, but a structure is used to describe the probability model for all dependent variables. It was used in psychometrics without formal realization even earlier in multiple factor analysis, and it was heavily used in econometrics as soon as it was realized that regression led to bias. I do not know when the name “Structural Equation Modeling” was formally introduced, but it was clearly understood in the mathematical economics of the 1940s.”
Sensitivity In a diagnostic test and ROC curve, the sensitivity is defined as the proportion of cases (e.g., patients with prostate cancer) with a test value (e.g., PSA antigen level) exceeding a given cutoff value. The specificity is the proportion of noncases (cancer-free patients) with a test value (PSA antigen level) equal to or below the cutoff value. The false-positive rate is (1-specificity) (see Thompson et al. 2005) see ROC curve
Shrinkage From Harrell (2002): There are two related meanings. First in regression, when one dataset is used for calibration & prediction, the slope will be 1 (by definition). When however parameter estimates are derived from one dataset and applied to another, overfitting will cause the calibration plot to have a slope less than 1, a result of regression to the mean. Typically low predictions will be too low & high predictions too high. Second, shrinkage can refer to pre-shrinking regression plots so that the calibration plot will be more accurate with future data.
Figure 14. ROC curve for PSA antigen test from Thompson et al. (2005). 1-specificity is the false positive rate.
Simple structure Thurstone’s (1947) term for a factor solution with certain properties: Each variable should have factor loadings on as few common factors as possible, and each common factor should have significant loadings on some variables and no loadings on others. (Kim & Mueller, 1978, p. 86) “The principle of simple structure: Once a set of k factors has been found that account for intercorrelations of the variables, these may be transformed to any other set of k factors that account equally well for the correlations.... Thurstone (1947) put forward the idea that only those factors for which the variables have a very simple representation are meaningful, which is to say that the matrix of factor loadings should have as many zero elements as possible... a variable should not depend on all common factors but only on a small part of them. Moreover, the same factor should only be connected with a small portion of the variables. Such a matrix is considered to yield simple structure.” Reyment & Jöreskog (1993, p. 87)
SIMPLS Algorithm “An alternative estimation method for partial least squares regression components is the SIMPLS algorithm (de Jong, 1993)” http://www.statsoft.com/textbook/stathome.html
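The Sensitivity entry above defines sensitivity, specificity and the false-positive rate in terms of a cutoff on the test value; sweeping that cutoff traces out the ROC curve. The sketch below is a minimal illustration with hypothetical test values (they are not data from Thompson et al.), and the function names are assumptions.

```python
def sensitivity_specificity(cases, noncases, cutoff):
    """Sensitivity: share of cases above the cutoff; specificity: share of noncases at or below it."""
    sensitivity = sum(v > cutoff for v in cases) / len(cases)
    specificity = sum(v <= cutoff for v in noncases) / len(noncases)
    return sensitivity, specificity

def roc_points(cases, noncases):
    """(false-positive rate, sensitivity) pairs, one per candidate cutoff."""
    points = [(1.0, 1.0)]   # cutoff below every value: everything is called positive
    for cutoff in sorted(set(cases) | set(noncases)):
        sens = sum(v > cutoff for v in cases) / len(cases)
        fpr = sum(v > cutoff for v in noncases) / len(noncases)   # 1 - specificity
        points.append((fpr, sens))
    return sorted(points)

def area_under_curve(points):
    """Trapezoidal area under the ROC points."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Hypothetical test values for five cases and five noncases.
cases = [4.1, 5.0, 6.3, 7.8, 9.2]
noncases = [1.0, 2.2, 3.1, 4.5, 5.5]
print(sensitivity_specificity(cases, noncases, cutoff=4.0))
print(round(area_under_curve(roc_points(cases, noncases)), 2))
```

With no tied values, the area under the curve equals the proportion of case-noncase pairs in which the case has the higher test value.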
Serial correlation Regression analysis typically assumes that observational errors are pairwise uncorrelated. In serial correlation, there is a correlation between errors a fixed number of steps apart (Draper & Smith 1998, p. 179). Certain types of serial correlation are tested with the Durbin-Watson test.
Simple random sampling (SRS): A simple random sample of size n from a population is a subset of the population consisting of n members selected in such a way that every subset of size n is afforded the same chance of being selected. Ramsey & Schafer (1997)
Simpson’s diversity Identical to -1 + Hurlbert’s E(S_n) at n = 2 (Smith & Grassle 1977). Simpson (1949) from Pielou (1969):
“Suppose two individuals are drawn at random and without replacement from an S-species collection containing N individuals of which N_j belong to the jth species (j = 1:s; Σ_j N_j = N). If the probability is great that both individuals belong to the same species, we can say that the diversity of the collection is low. This probability is [equation not preserved in this copy], and so we may use: [equation not preserved in this copy]
This assumes a random sample of a population. The biased form of Simpson’s diversity is: [equation not preserved in this copy]
Advantages and properties:
• For n = 2, E(S_n) - 1 = Simpson’s unbiased diversity index (Smith & Grassle 1977). Simpson’s index is an unbiased estimator.
Problems:
• ignores species occurring only once. Pays little attention to rare species.
• cannot be decomposed into hierarchical diversity.
Simpson’s paradox P{A|C} > P{B|C} and P{A|¬C} > P{B|¬C} but P(A) < P(B). Agresti (1996) presents the example of capital punishment in Florida in which the percentage of black capital defendants being given the death penalty (A) is lower than the percentage of white capital case defendants being given the death penalty (B). But, when the cases are partitioned into those with a white victim and those with a black victim, the percentage of blacks given the death penalty is higher than whites. cf. ecological fallacy, http://plato.stanford.edu/entries/paradox-simpson/ and http://www.cawtech.freeserve.co.uk/simpsons.2.html
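Simpson’s paradox is easiest to see with counts. The sketch below uses invented counts, chosen only so that the comparison between groups A and B reverses when the two strata are pooled; they are not the Florida data discussed by Agresti, and the variable names are hypothetical.

```python
# Hypothetical (events, total) counts for groups A and B within two strata.
counts = {
    "C":     {"A": (20, 100), "B": (60, 500)},
    "not C": {"A": (12, 400), "B": (1, 50)},
}

def rate(events, total):
    return events / total

def pooled_rate(group):
    """Rate for a group after adding the strata together."""
    events = sum(counts[stratum][group][0] for stratum in counts)
    total = sum(counts[stratum][group][1] for stratum in counts)
    return events / total

for stratum, groups in counts.items():
    print(stratum, {g: round(rate(*groups[g]), 3) for g in groups})
print("pooled", {g: round(pooled_rate(g), 3) for g in ("A", "B")})
```

Within each stratum group A has the higher rate, yet after pooling group A has the lower rate, because most of group A falls in the low-rate stratum and most of group B in the high-rate stratum.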
Singular value decomposition (SVD) All matrices can be decomposed as the product of three component matrices: U*S*V’. S is a diagonal matrix of singular values (i.e., the off-diagonal elements are 0s). The columns of U and the columns of V (the rows of V’) are orthonormal. The ratio of the largest to the smallest singular value is the condition number of the matrix. Ill conditioned matrices have large condition numbers (usually in the thousands) and are close to being not of full rank. For a symmetric positive semidefinite matrix, whose left and right singular vectors coincide, SVD is the method of choice for creating the powers of the matrix. If P is such a matrix and Q*D*R’ is the singular value decomposition of P (with Q = R), then Q*D^10*R’ = P^10. The k’th power of the diagonal matrix D is computed by raising each diagonal element to the kth power. The best low dimensional display of a matrix in a least squares sense is created using the SVD. This is the basis of Eckart & Young’s (1936) theorem. See: http://mathworld.wolfram.com/SingularValueDecomposition.html
Sink species Where individuals of a species use a habitat where their carrying capacity is less than zero, that species is a sink species (Rosenzweig 1995, p. 260, Pulliam 1988)
Skewness the skewness of a pdf is the 3rd moment about the mean. A symmetric pdf has a skewness of 0. Lognormal pdfs are skewed to the right http://mathworld.wolfram.com/Skewness.html cf., kurtosis
Snedecor, George W 1882-1974. Described the F distribution and named it for Fisher. Author of a famous statistics textbook (with Cochran).
Somers’ D(C|R) and D(R|C) Somers’ D(C|R) and Somers’ D(R|C) are asymmetric modifications of Kendall’s tau-b. C|R denotes that the row variable X is regarded as an independent variable, while the column variable Y is regarded as dependent. http://www.id.unizh.ch/software/unix/statmath/sas/sasdoc/stat/chap28/sect20.htm
Spearman’s ρ A distribution-free, rank-based correlation coefficient. Equivalent to Pearson’s r after data converted to ranks. Cf., Kendall’s τ
Specificity See also ROC curve
Sphericity A form of variance-covariance matrix assumed by repeated measures ANOVA. Departures from sphericity are assessed using ε, and F test statistics are assessed with numerator and denominator df adjusted in direct proportion to ε. Huynh-Feldt and Greenhouse-Geisser are two adjustments of df for departures from sphericity, with the latter being judged conservative. Prof. William Ware provided this post on the sphericity assumption to sci.stat.edu (8/26/96) “Said assumption is relevant in “within subject” designs, either randomized block or repeated measures. Most statistical procedures assume that the errors are independent. In “independent groups” designs, this reduces to no association between the observations in the groups. But of course, in “dependent” samples designs, it is the correlations among the observations that we are employing to reduce the error terms... However, if the correlations are assumed to arise from the “subject” effects, then it implies that all of the pair-wise covariances between treatments should be equal to one common value. Thus, the assumption of sphericity is met if the variance/covariance matrix is consistent with the data having been drawn from a population in which all of the variances are equal to one another, and all of the covariances are equal to one another. If you have multiple groups, then the “group” variance/covariance matrices are tested for equality prior to pooling them. The pooled matrix is then tested for sphericity.” See Sphericity and Compound Symmetry in the ANOVA/MANOVA chapter at http://name.math.univ-rennes1.fr/bernard.delyon/textbook/stathome.html
Spearman’s ρ The Pearson product moment correlation coefficient after the data have been converted to ranks cf., Kendall’s τ.
Split-plot design see ANOVA, split plot
Spurious correlation A term introduced by Pearson (1897) as noted by Schlager et al. (1998, p. 548):
    In a classical paper, Pearson (1897) pointed to a particular property of compound variables, such as ratios, in correlation. He showed that two variables that have no correlation between themselves become correlated when divided by a third uncorrelated variable. Pearson (1897) introduced the term ‘spurious correlation’ for the ‘amount of correlation which would still exist between the indices, were the variables on which they depend distributed at random’
Another definition from the web: “A situation in which measures of two or more variables are statistically related (they covary) but are not in fact causally linked – usually because the statistical relation is caused by a third variable. When the effects of the third variable are taken into account, the relationship between the first and second variable disappears.” [http://www.autobox.com/spur2.html] [c.f. nonsense correlation]
SSCP the sum-of-squares-and-cross-products matrix. The SSCP matrix for sites is formed by premultiplying a site x variable matrix by its transpose. The (i,i)th element of the symmetric SSCP matrix is the sum of squares for the ith variable across sites. The (h,i)th element is the sum of cross-products of the hth and ith variables.
Standard deviation The typical distance between a single number and the set’s average (Ramsey & Schafer 2002); the square root of the variance
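[equation not preserved in this copy] (93)
for a proportion: [equation not preserved in this copy]
Standard error and coefficient of variation The standard error of the sample mean is the sample standard deviation divided by the square root of sample size. With the finite population correction, (1-n/N), with n being the sample size and N the population size, the standard error of the sample mean and the coefficient of variation of the sample mean are: [equations not preserved in this copy]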
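A numerical sketch may help with the standard error and the finite population correction. It assumes the common textbook form in which (1 - n/N) multiplies the variance of the sample mean, so that its square root multiplies the standard error; the handout’s own equations are not preserved in this copy, so this form, the function names and the data are all assumptions.

```python
import math
import statistics

def se_mean(sample, population_size=None):
    """Standard error of the sample mean, optionally with the finite population correction."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    if population_size is not None:
        se *= math.sqrt(1 - n / population_size)   # assumed form of the correction
    return se

def cv_mean(sample, population_size=None):
    """Coefficient of variation of the sample mean: its standard error divided by the mean."""
    return se_mean(sample, population_size) / statistics.mean(sample)

# Hypothetical sample of n = 8 drawn from a population of N = 16.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(round(se_mean(sample), 4), round(se_mean(sample, population_size=16), 4))
print(round(cv_mean(sample, population_size=16), 4))
```

When half of the population is sampled the standard error shrinks by a factor of sqrt(0.5), and it falls to zero when the whole population is sampled.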
Statistics Statistics is defined by Sokal & Rohlf (1995) as the scientific study of numerical data based on variation in nature. Statistics can be used in another valid sense as the plural of the noun statistic, any quantity that can be calculated from observed data (e.g., the sample standard deviation s). Observations are used in the calculation of sample statistics (e.g., the sample mean x̄ and standard deviation s), which are estimates of population statistics or parameters (μ, σ). Statistics are usually represented by Roman letters, whereas parameters are represented by Greek letters (Ramsey & Schafer 1997, p. 20) cf., parameter
Stem-and-leaf plot A quick graphical display method invented by John Tukey. See http://mathworld.wolfram.com/Stem-and-LeafDiagram.html
Figure 15. Stem & leaf diagram from Statistical Sleuth
Stigler’s law of eponymy “No scientific discovery is ever named after its original discoverer.” Stigler (1999, p. 277)
Stochastic variable
Stopping rules There are two meanings for stopping rules. Jackson (1993) reviews tests used to decide how many dimensions to retain in a factor analysis or a PCA before rotation. Stopping rules also play a role in sequential medical trials (Armitage 1975), in which an investigator performs statistical tests on a small number of subjects, and then sequentially adds subjects if the initial test was deemed inadequate to distinguish between null and alternate hypotheses. In frequentist statistics, an adjustment to experiment-wise error rates must be made to take into account the nature of the stopping rule employed. Mayo (1996) strongly criticizes Bayesian statistical inference for its inability to account for stopping rules.
Structural equation modeling (SEM) A technique to create models to explain patterns of covariation among variables. The parameters of an SEM are usually fit by the method of maximum likelihood cf., factor analysis, path analysis, regression
Student’s t distribution A distribution that is similar to the normal distribution but accounts for the increased dispersion caused by having to estimate the standard deviation from the sample. Developed by William Gosset, who published under the nom de plume of Student
Student’s t test A test for the difference between two means when the variances are unknown and must be estimated from the data. These variances are assumed to be equal in estimating the pooled variance. There are two forms of test: the independent samples t test and the paired t test, for paired data. The probability that the observed results are compatible with a null hypothesis of no difference is assessed using Student’s t distribution. The problem of performing a test of mean difference with unequal variance is called the Fisher-Behrens problem and the Welch’s t test was developed as a replacement for the independent samples t test for that purpose.
Sum of Squares Type I, II, III & IV SS as used in SPSS GLM are defined at (login as guest with password guest): http://www.spss.com/tech/stat/algorithms/7.5/ap11smsq.pdf
Suppressor variable “In the two-predictor situation … traditional and negative suppressors increase the predictive value of a standard predictor beyond that suggested by the predictor's zero order validity.” Conger (1974)
Survey design [Sample survey design] A survey is an observational study of a finite statistical population, not an experiment. Hurlbert (1984) referred to one type of survey design, measuring a response variable at different levels of a covariate, as a mensurative experiment. A survey design describes the goals of the survey, usually to estimate population parameters, and determines the number and manner of sampling the population or populations of interest. Survey design involves the allocation of sampling units, such as quadrat samples, with locations determined by systematic or random sampling. Transect sampling is one form of systematic sampling, often including a random component, such as random directions or starting points for the transect or random positions along the transect. Cluster sampling involves sampling groups of individuals, sometimes by choice but usually by necessity. For example a quadrat sample of area provides a cluster sample of individuals within that area. Often the statistical population is divided into strata to allow more precise estimates of population parameters for a given sampling effort. As noted by Hayek & Buzas (1996), data from survey designs involving cluster sampling or systematic sampling can’t be pooled to estimate means and variances as if the observations were simple random samples; often the variances of cluster or transect samples differ from those calculated assuming simple random sampling. SAS has new procedures that will incorporate survey designs in the estimation of population parameters: http://support.sas.com/rnd/app/papers/survey.pdf Cf., Kendall & Stuart distinction between experiment & survey
t-test Student’s t test
test statistic Hogg & Tanis (1977, p. 255) A statistic used to define the critical region is called a test statistic. The critical region C is often defined as a set of values of the test statistic that leads to the rejection of the null hypothesis H0.
Time-series analysis In modeling time series through regression, the independence assumption of ordinary least squares regression is often violated. There is positive serial correlation in regression residuals, with nearby points in time being more similar than expected by chance. The consequence of this lack of independence due to positive temporal serial correlation, also called positive temporal autocorrelation, is that the standard errors of the estimates are too small. Tests based on these standard errors will have inflated Type I errors relative to nominal errors. There are two major solutions: adjusting the standard error to account for the serial correlation or using filtering to adjust both the response and explanatory variables in regression. Most time-series analysis packages will have routines to fit time series, adjusting for autocorrelation, using maximum likelihood estimation.
Tobit Analysis Tobit analysis is a form of generalized linear modeling, appropriate for censored data, e.g., data containing a large number of zeros. Tobit modeling will fit the non-zero data.
Two-sided test also called two-tailed test cf., one-sided test
Tukey-Kramer test An a posteriori test based on the studentized range statistic. It is an extension of Tukey’s HSD, or honestly significant difference, for unequal sample sizes.
Type I error The error made when rejecting a true null hypothesis. The probability of Type I error, called the alpha level or significance level of the test, is set in advance in the Neyman-Pearson school of statistical inference.
Type II error The error made when accepting a false null hypothesis. The probability of Type II error is called β and 1-β is the power of the test.
Union See intersection
Uniqueness the extent to which the common factors fail to account for the total variance of a variable.
Variable From Mathworld: “A variable is a symbol on whose value a function, polynomial, etc., depends. For example, the variables in the function f(x,y) are x and y. A function having a single variable is said to be univariate, one having two variables is said to be bivariate, and one having two or more variables is said to be multivariate. In a polynomial, the variables correspond to the base symbols themselves stripped of coefficients and any powers or products.”
Variance a measure of the dispersion of a variable; defined as the sum of squared deviations from the mean divided by the number of cases or entities. cf., standard deviation
[equation not preserved in this copy] (99)
Variance inflation factor A diagnostic test for multicollinearity in multiple regression
Venn diagrams A graphical display, usually consisting of circles on a square background which are used to display the intersection, union and complement of events.
Vertex (vertices) The points or nodes in a graph. Vertices may be connected with edges.
Wald statistic Agresti (1996, p. 88): any statistic that divides a parameter estimate by its standard error and squares it is called a Wald statistic. In generalized linear models, parameter estimates z = (parameter estimate)/(standard error) are evaluated with the standard normal distribution or equivalently z² has a chi-square distribution with df = 1; the p value is the right-tail probability of the chi-square distribution.
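The equivalence between the two ways of evaluating a Wald statistic can be sketched in a few lines. The function name and the input values below are hypothetical. The code relies on the fact that the right-tail probability of a chi-square distribution with 1 df at z² equals the two-sided standard normal tail probability at |z|, which the standard library can compute with the complementary error function.

```python
import math

def wald_test(estimate, standard_error):
    """Return z, the Wald chi-square statistic z**2, and its p value (df = 1)."""
    z = estimate / standard_error
    chi_square = z * z
    # P(chi-square with 1 df > z**2) = P(|Z| > |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, chi_square, p_value

# Hypothetical coefficient and standard error from a fitted model.
z, chi_square, p_value = wald_test(estimate=0.84, standard_error=0.35)
print(f"z = {z:.2f}, Wald chi-square = {chi_square:.2f}, p = {p_value:.4f}")
```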
WA-PLS Weighted Average Partial Least Squares see ter Braak & Juggins
Weibull distribution
Welch’s t test A modification of Student’s t test to test for differences in means with samples drawn from populations with different variances. The degrees of freedom for the test are adjusted by the Satterthwaite approximation cf., Behrens-Fisher problem, Fligner-Policello test http://www.id.unizh.ch/software/unix/statmath/sas/sasdoc/stat/chap67/sect16.htm
[equation not preserved in this copy] (101)
Wilcoxon rank sum test A test for difference in location or central tendency between two samples. The nonparametric equivalent of the independent samples Student’s t test. The test assumes that the samples are drawn from distributions with equal spread, equivalent to the equal variance assumption of Student’s t test. The p values are identical to those calculated from the Mann-Whitney U test, and the Mann-Whitney U and Wilcoxon’s rank sum test statistics can be converted exactly. The Fligner-Policello test is a rank-based equivalent for samples with unequal spread, but it probably is only appropriate for large sample sizes. Salsburg (2001) profiles Wilcoxon, a chemical engineer.
Wilcoxon signed rank test The nonparametric equivalent of the paired-samples Student’s t test.
WLS Weighted least squares is a generalized least squares regression in which the equal variance assumption is relaxed. Draper & Smith (1998) is an excellent reference on WLS regression, which can be performed readily with SPSS.
Wright, Sewall Founder of quantitative population genetics with Fisher and J.B.S. Haldane. Invented path analysis early in his career and used this method to analyze patterns of inheritance. His major contribution was describing genetic drift and his shifting balance model of evolution. Cf., SEM See http://books.nap.edu/books/0309049784/html/438.html
Yule, George Udny (1871-1951) Student of Pearson. Yule (1897) adapted Gauss’s normal equation approach to estimate the slope of a regression line (Stigler, 1986, p. 350). Our modern approach of estimating the slope and y-intercept of a least-squares regression can be traced to Yule (1897). Yule coined the term nonsense correlation.
z transform The standard normal distribution is often called the z distribution. The z transform – subtract the mean and divide by the standard deviation – produces a transformed variate with zero mean and unit standard deviation. The z-score is the cutpoint of the standard normal distribution (e.g., a z-score of -1.96 corresponds to p = 0.025 on the cumulative normal probability distribution, z-score = 0 corresponds to p = 0.5 on the cumulative normal probability distribution and z-score = 1.96 corresponds to p = 0.975 on the cumulative normal probability distribution).
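The z transform and the cumulative probabilities attached to z-scores can be reproduced with Python’s standard library. This is a small sketch with hypothetical data; the helper name `z_transform` is an assumption, while `statistics.NormalDist` is the standard-library normal distribution class.

```python
from statistics import NormalDist, mean, stdev

def z_transform(values):
    """Subtract the mean and divide by the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical data: the transformed values have mean 0 and standard deviation 1.
scores = z_transform([2, 4, 4, 4, 5, 5, 7, 9])
print([round(z, 2) for z in scores])

# Cumulative standard normal probability at selected z-scores.
standard_normal = NormalDist(mu=0.0, sigma=1.0)
for z in (-1.96, 0.0, 1.96):
    print(z, round(standard_normal.cdf(z), 3))
```

The cumulative probability of 0.5 falls at z = 0, the center of the standard normal distribution.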
References
Abramowitz, M. and I. A. Stegun, Eds. 1965. Handbook of mathematical functions with formulas, graphs, and mathematical tables. Dover Publications, New York. 1045 pp. [25]
Agresti, A. 1996. An introduction to categorical data analysis. Wiley, New York. [15, 31, 41, 51, 56]
Armitage, P. 1975. Sequential medical trials, 2nd edition. John Wiley & Sons, New York. [54]
Bell, E. T. 1937. Men of mathematics. Simon & Schuster, New York. [22, 39]
Boesch, D. F. 1977. Application of numerical classification in ecological investigation of water pollution. Environmental Protection Agency, Ecological Research Series EPA-600/3-77-033. Corvallis, Oregon. 115 pp. [?]
Box, G. E. P. and D. R. Cox. 1964. An analysis of transformations. J. Roy. Statist. Soc. B-26, 211-243, discussion 244-252. As cited in Draper & Smith (1998) [8]
Campbell, D. T. and D. A. Kenny. 1999. A primer on regression artifacts. The Guilford Press, New York. [10]
Cochran, W. G. and G. M. Cox. 1957. Experimental designs. John Wiley & Sons, New York. 611 p and tables. [5, 14]
Cohen, J. et al. 2003. Applied multiple regression/correlation analysis for the behavioral sciences, third edition. Lawrence Erlbaum Associates, Mahwah, NJ. [12]
Conger, A. J. 1974. A revised definition for suppressor variables: a guide to their identification and interpretation. Educational and Psychological Measurement 34: 35-46. [?]
Draper, N. R. and H. Smith. 1998. Applied Regression Analysis, 3rd Edition. John Wiley & Sons, New York. 706 p, with data diskette. [8, 19, 28, 34, 51, 57, 58]
Eckart, C. and G. Young. 1936. The approximation of one matrix by another of lower rank. Psychometrika 1: 211-218. [52]
Fager, E. W. 1957. Determination and analysis of recurrent groups. Ecology 38: 586-595. [45]
Freedman, D., R. Pisani and R. Purves. 1998. Statistics, 3rd edition. Norton, New York. [This is a wonderful introduction to probability and statistics. It is very elementary though, and the authors’ avoidance of any equations really limits the book’s usefulness] [47]
Galton, F. 1877. Typical laws of heredity. Nature 15: 492-495. [?]
Galton, F. 1886. Family likeness in stature. Proc. Roy. Soc. London 40: 42-73. [45]
Galton, F. 1888. Co-relations and their measurement, chiefly from anthropological data. Proc. Roy. Soc. London 45: 133-145. [12]
Gauss, C. F. 1822. Anwendung der Wahrscheinlichkeitsrechnung auf eine Aufgabe der practischen Geometrie. Astronomische Nachrichten, vol. 1/6, cols 81-86. [Full citation in Stigler 1999, p. 445] [36, 45]
Golumbic, M. C. 1980. Algorithmic graph theory and perfect graphs. Academic Press, New York. [36, 45]
Gondran, M. and M. Minoux. 1984. Graphs and algorithms. John Wiley and Sons, New York. [18]
Gower, J. C. 1966. Some distance properties of latent root vector methods used in multivariate analysis. Biometrika 53: 325-338. [?]
Greenacre, M. 1984. Theory and Application of correspondence analysis. Academic Press, Orlando. [12]
Harman, H. H. 1967. Modern Factor Analysis. Univ. Chicago Press, Chicago & London. 474 pp. [19]
Hogg, R. V. and E. A. Tanis. 1977. Probability and statistical inference. MacMillan Publishing, New York. 450 pp. [7, 10, 14, 18, 19, 26, 29, 40, 45, 55]
Holcomb, W. L., T. Chaiworapongsa, D. A. Luke and K. D. Burgdorf. 2001. An Odd Measure of Risk: Use and Misuse of the Odds Ratio. Obstetrics & Gynecology 98: 685-688. [48]
Hosmer, D. W. and S. Lemeshow. 2000. Applied logistic regression, 2nd Edition. John Wiley & Sons, New York. 373 pp. [49]
Hurlbert, S. H. 1971. The non-concept of species diversity: a critique and alternative parameters. Ecology 52: 577-586. [17, 25]
Jackson, D. A. 1993. Stopping rules in principal components analysis: a comparison of heuristical and statistical approaches. Ecology 74: 2204-2214. [49]
Jardine, N. and R. Sibson. 1968. The construction of hierarchic and nonhierarchic classifications. Computer J. 11: 177-184. [?]
Kemeny, J. G. and J. L. Snell. 1976. Finite Markov chains. Springer-Verlag, New York, New York, U.S.A. [18, 23]
Kendall, D. G. 1969. Some problems and methods in statistical archaeology. World Archaeology 1: 68-76. [?]
Kendall, M. G. and A. Stuart. 1979. The Advanced Theory of Statistics, Vol. 2. Hafner, New York. [11, 20, 33, 48]
King, G. 1997. A solution to the ecological inference problem: reconstructing individual behavior from aggregate data. Princeton University Press, Princeton NJ. 342 pp. [16]
Larsen, R. J. and M. L. Marx. 2001. An introduction to mathematical statistics and its applications, 3rd edition. Prentice Hall, Upper Saddle River, NJ. [12, 20, 41, 43]
Larsen, R. J. and M. L. Marx. 2006. An introduction to mathematical statistics and its applications, 4th edition. Prentice Hall, Upper Saddle River, NJ. 920 p. [44]
Leathwick, J. R. and M. P. Austin. 2001. Competitive interactions between tree species in New Zealand’s old-growth indigenous forests. Ecology 2560-2573. [23]
Legendre, A. M. 1805. Nouvelles méthodes pour la détermination des orbites des comètes. Paris: Courcier [See full citation in Stigler 1986, p. 388] [28, 36, 45]
Legendre, P. and L. Legendre. 1998. Numerical Ecology, 2nd English Edition. Elsevier, Amsterdam. 853 pp. [21, 47]
Legendre, P. and E. Gallagher. 2001. Ecologically meaningful transformations for ordination of species data. Oecologia 129: 271-280. [12]
Lehmann, E. L. 2006. Nonparametrics. Statistical methods based on ranks, Revised First Edition. Springer, New York. 463 pp. [6]
Mayo, D. G. 1996. Error and the growth of experimental knowledge. University of Chicago Press, Chicago & London. 493 pp. [54]
McCulloch, C. E. and S. R. Searle. 2001. Generalized, linear, and mixed models. John Wiley & Sons, New York. 325 pp. [24, 44]
Mead, R. 1988. The design of experiments. Cambridge University Press, Cambridge. 620 p. [14, 15]
Nahin, P. J. 2002. Duelling idiots and other probability puzzlers. Princeton University Press, Princeton N.J. [25]
Neter, J., M. H. Kutner, C. J. Nachtsheim and W. Wasserman. 1996. Applied linear statistical models. Irwin, Chicago. 1408 pp. with data diskette. [14]
Pearson, K. 1897. On a form of spurious correlations which may arise when indices are used in the measurement of organs. Proc. Roy. Soc. London 60: 489-502. {Cited by Schlager et al. (1998)} [53]
Pielou, E. C. 1969. An introduction to mathematical ecology. Wiley-Interscience, New York. [51]
Pielou, E. C. 1984. The interpretation of ecological data: a primer on classification and ordination. John Wiley & Sons, New York. Read pp. 13-81 [44]
Popper, K. R. 1959. The Logic of Scientific Discovery. Hutchinson & Co., London. [40]
Pulliam, H. R. 1988. Sources, sinks, and population regulation. Amer. Natur. 132: 652-661. [p. 52]
Ramsey, F. L. and D. W. Schafer. 1997. The statistical sleuth: a course in methods of data analysis. Duxbury Press, Belmont CA. 742 pp. [4, 11, 18, 24, 26, 27, 38, 44, 45, 48, 51, 54]
Ramsey, F. L. and D. W. Schafer. 2002. The statistical sleuth: a course in methods of data analysis, 2nd Edition. Duxbury Press, Pacific Grove CA. 742 pp. [3, 4, 10, 11, 18, 24, 26, 27, 30, 38, 44, 45, 48, 51, 53, 54]
Robert, C. P. and G. Casella. 1999. Monte Carlo statistical methods. Springer-Verlag, New York. 507 pp. [6]
Roberts, F. S. 1976. Discrete mathematical models with applications to social, biological, and environmental problems. Prentice-Hall, Englewood Cliffs, New Jersey. [2, 31]
Robinson, W. S. 1950. Ecological correlation and the behavior of individuals. American Sociological Review 15: 351-357. [16]
Rosenzweig, M. L. 1995. Species diversity in space and time. Cambridge University Press, Cambridge. [p. 52]
Salsburg, D. 2001. The lady tasting tea: how statistics revolutionized science in the twentieth century. W. H. Freeman & Co., New York. 340 pp. [57]
Schlager, W., D. Marsal, P. A. G. van der Geest, and A. Sprenger. 1998. Sedimentation rates, observation span, and the problem of spurious correlation. Mathematical Geology 30: 547-556. [p. 53, 60]
Shmida, A. and S. Ellner. 1984. Coexistence of plant species with similar niches. Vegetatio 58: 29-55. [p. ?]
Shmida, A. and M. V. Whittaker. 1981. Pattern and biological microsite effects in two shrub communities, southern California. Ecology 62: 234-251. [p. ?]
Shmida, A. and M. V. Wilson. 1985. Biological determinants of species diversity. J. Biogeography 12: 1-20. [p. ?]
Smith, W. and J. F. Grassle. 1977. Sampling properties of a family of diversity measures. Biometrics 33: [51]
Sokal, R. R. and F. J. Rohlf. 1995. Biometry, 3rd Edition. W. H. Freeman & Co., New York. 887 pp. [A top-notch guide to statistics with many biological examples. This text does a particularly good job with one-way ANOVA and multiple-comparison tests.] [34, 54]
Stevens, S. S. 1951. Mathematics, measurement, and psychophysics. Pp. 21-30 in S. S. Stevens, ed. Handbook of Experimental Psychology. Wiley, New York. [31]
Stigler, S. M. 1986. The history of statistics: the measurement of uncertainty before 1900. Belknap Press, Cambridge. [10, 12, 15, 28, 35, 36, 57]
Stigler, S. M. 1999. Statistics on the Table. Belknap Press, Cambridge. [36, 45, 46, 54]
Tabachnick, B. G. and L. S. Fidell. 2001. Using multivariate statistics, 4th Ed. Allyn & Bacon, Boston. 966 pp. [16, 43]
Thompson, I. M., D. P. Ankerst, C. Chi, M. S. Lucia, P. J. Goodman, J. J. Crowley, H. L. Parnes, C. A. Coltiman. 2005. Operating characteristics of prostate-specific antigen in men with an initial PSA level of 3.0 ng/ml or lower. J. Amer. Med. Assoc. 294: 66-70. [50]
Toothaker, L. E. 1993. Multiple comparison procedures. Sage Publications, Newbury Park, CA. 96 pp. [16]
Torgerson, W. S. 1952. Multidimensional scaling: I. Theory and method. Psychometrika 17: 401-419. [?]
Van Kampen, N. G. 1981. Stochastic process in physics and chemistry. North Holland, Amsterdam. [31]
Vellman, P. F. and L. Wilkinson. 1993. Nominal, ordinal, interval and ratio typologies are misleading. The American Statistician 47(1): 65-72. [?]
Yule, G. U. 1897. On the theory of correlation. J. Roy. Stat. Soc. 60: 812-854. [45, 57]
Index
'at least one' ..... 2, 34
accuracy ..... 2, 5, 40
adjusted R-squared ..... 2
alpha level ..... 2, 20, 22, 34, 49, 56
alternative hypothesis ..... 3
ANCOVA ..... 3
ANOVA ..... 2-5, 9, 14, 16, 22-24, 27-30, 34, 35, 43, 44, 47, 52, 53, 62
    hierarchical ..... 4
    Model II ..... 24
    nested ..... 4, 35
    one way ..... 27, 28, 62
autocorrelation ..... 56
Axiomatic probability ..... 6, 41
Bartlett's test ..... 6
Bayes theorem ..... 6, 40
Bayesian inference ..... 6, 23
Behrens Fisher ..... 7, 22, 49, 57
Bernoulli trial ..... 7, 26
Beta distribution ..... 7
bias ..... 7
Binomial distribution ..... 7, 22, 35, 40
Binomial theorem ..... 7
Biometry ..... 7, 62
Birthday problem ..... 7
bivariate ..... 7, 12, 15, 30, 45, 56
Bonferroni multiple comparison procedure ..... 8, 27, 34
Bootstrap ..... 8, 34
Boxplot ..... 9
Box-Cox transformation ..... 8, 9
canonical correlation ..... 10
categorical data ..... 58
Cauchy distribution ..... 10
causation ..... 10
censored data ..... 24, 56
census ..... 10, 20, 44
Central limit theorem ..... 10
Chebyshev's inequality ..... 10
Chi square distribution ..... 10
Classical probability ..... 41
cluster effect ..... 10
cluster sampling ..... 10, 55
Coefficient of determination ..... 11
combinations ..... 7, 11, 21, 27, 43, 49
combinatorics ..... 11, 29, 35
complement ..... 2, 11, 15, 26, 56
Conditional probability ..... 11, 31
Confidence interval ..... 8, 11
Confounding variables ..... 11, 38
Consistent estimator ..... 11, 19
contingency table ..... 11, 16
Continuous probability function ..... 42
Cook's distance ..... 47
Corner test ..... 11
correlation ..... 10, 12, 13, 16, 22, 23, 27, 34, 35, 39, 43, 51-53, 55-58, 61, 62
    spurious ..... 35, 38, 53, 60, 61
correspondence analysis ..... 10, 12, 59
covariance ..... 3, 4, 8, 12, 13, 21, 24, 43, 52
covariate ..... 55
critical region ..... 14, 55
critical value ..... 14
Data mining ..... 15
degrees of freedom ..... 4, 5, 8, 15, 21, 28, 43, 49, 52, 56, 57
deviance ..... 15
DFFITS ..... 15
Discrete probability function ..... 42
discriminant analysis ..... 15, 22
Distributions
    bivariate normal ..... 7, 12
    empirical ..... 41
    exponential ..... 20
    F ..... 21, 52
    gamma ..... 22, 23
    geometric ..... 25
    Gompertz ..... 25
    hypergeometric ..... 26, 35
    lognormal ..... 30
    negative binomial ..... 35
    normal ..... 7, 12, 24, 35, 36, 55-57
    Poisson ..... 35, 39, 40
    posterior ..... 40
    Weibull distribution ..... 57
Duncan's test ..... 16
Durbin-Watson test ..... 16, 51
Ecological fallacy ..... 16, 51
Efficiency ..... 5, 6, 15, 47
efficient estimator ..... 17, 19
empirical distribution ..... 18
Empirical rule ..... 18
error sum of squares ..... 28, 40, 47
Estimator
    maximum likelihood ..... 19, 31
Expected value ..... 5, 7, 13, 19, 21, 35, 40
Experiment ..... 2, 7, 8, 10, 15, 20, 22, 33, 34, 36, 41, 42, 49, 54, 55
experimental design ..... 7, 20, 22, 38
experimental unit ..... 4
extra-binomial variation ..... 38
F-test ..... 21, 28
    extra-sum-of-squares ..... 21
    Lack-of-fit ..... 27, 28
factor analysis ..... 21, 31, 50, 54, 59
Fermat ..... 22, 39, 41
Fisher ..... 2, 3, 7, 15, 21-23, 29, 49, 52, 55, 57
Fisher's exact test ..... 22
forward selection ..... 22
Friedman's test ..... 23
Galton ..... 12, 23, 45, 46, 58
Gauss ..... 23, 36, 45, 59
Gaussian curve ..... 23
General linear model ..... 4, 24, 25, 41, 43, 55
generalized linear model ..... 4, 15, 24, 25, 30, 40
Geometric series ..... 25
GLS ..... 24, 25
Goodness of fit ..... 2, 15, 25
Gosset ..... 25, 55
heteroscedasticity ..... 25, 29
homogeneity of variance ..... 6, 8
honest significant difference ..... 34, 35
Hotelling's T-squared test ..... 25, 34
independence ..... 11, 26, 47, 55, 56
independent trials ..... 7, 26
intersection ..... 2, 15, 26, 56
inter-quartile range ..... 26
IQR ..... 9
Jackknife ..... 8, 27
Kruskal Wallis ..... 23, 27
kurtosis ..... 27, 52
Laplace ..... 6, 10, 28
Latin Squares ..... 28
Least significant difference ..... 28, 30, 34
least squares ..... 16, 24, 25, 27-29, 35, 36, 38, 39, 45, 47, 50, 52, 55, 57
Legendre ..... 12, 21, 28, 29, 33, 36, 43, 45, 47, 49, 60
level of significance ..... 56
Levene's test ..... 6, 9, 29
leverage ..... 29, 48
likelihood ..... 4, 8, 9, 12, 15, 17, 19, 22, 29, 31, 33, 41, 54, 56
Likelihood function ..... 8, 19, 29, 31
Likelihood ratio ..... 29
linear combination ..... 21, 24, 29, 30, 34, 43
linear contrast ..... 29, 30, 38
Linear model ..... 3, 4, 8, 9, 15, 24, 25, 30, 40
Linear regression ..... 30, 38, 39, 45
Logistic regression ..... 24, 30, 38, 41, 59
logit ..... 5, 30, 41
Mallow's Cp ..... 6, 30
Mann-Whitney U Test ..... 30, 57
Markov chain ..... 2, 18, 22, 23, 30
    absorbing ..... 2, 23, 25, 30
    ergodic ..... 18, 22, 30
Markov process ..... 30
    definitions ..... 30
Matlab ..... 12, 22, 35, 36
Maximum likelihood ..... 4, 8, 9, 12, 17, 19, 22, 29, 31, 33, 41, 54, 56
MDS ..... 31
mean square ..... 4, 14, 28, 30, 48
Median ..... 9, 29, 31
Method of least squares ..... 28, 29
Mill's canons ..... 33
Mixed model ..... 4, 24, 33
Mode ..... 18, 21, 40, 43, 44
Modus tollens ..... 34, 36
Monty Hall problem ..... 34
Multicollinearity
    variance inflation factor ..... 34, 56
multinomial distribution ..... 34
Multiple comparison tests ..... 2, 30
Multiple correlation ..... 34
Multiple regression ..... 6, 30, 32, 34, 35, 56, 58
multiplication rule ..... 35
multivariate ..... 10, 13, 14, 16, 22, 25, 35, 38, 39, 43, 56, 59, 62
Multivariate hypergeometric distribution ..... 35
mutually exclusive ..... 7, 35, 41
nonparametric ..... 23, 27, 35, 57
nonsense correlation ..... 35, 57
normal curve ..... 23, 35
normal equations ..... 34, 36, 45, 48
normality ..... 6
null hypothesis ..... 3, 14, 23, 36, 38, 40, 55, 56
Observational study ..... 20, 36, 55
Odds ..... 13, 30, 37, 38, 48, 59
Odds ratio ..... 37, 38, 48, 59
OLS (Ordinary Least Squares) ..... 19, 32, 38, 47
overdispersion ..... 35, 38
P-Value ..... 8, 23, 37, 38, 56
paired data ..... 22, 55
Parameter ..... 7-9, 15, 18, 19, 29, 31, 35, 38, 40, 43, 50, 54, 56
Partial correlation ..... 12, 23
Pascal ..... 22, 39, 41
path analysis ..... 21, 39, 54, 57
Pearson ..... 2, 10, 14, 22, 23, 32, 35, 39, 53, 56, 57, 60
placebo ..... 4, 37
Poisson ..... 7, 16, 20, 24, 35, 38-40
Poisson limit theorem ..... 40
population ..... 19, 20, 22, 32, 33, 41, 45, 51-55, 57, 61
Power ..... 3, 5, 15, 32, 40, 44, 47, 52, 56
Precision ..... 2, 5, 14, 15, 40
Probability ..... 2, 3, 6, 7, 9, 11, 17, 19, 20, 22-26, 29-31, 37-45, 48, 50, 51, 55-60
    subjective ..... 42
Probability function ..... 25, 41, 42
Propagation of error ..... 29, 43
Quadratic equation ..... 44
Quadratic term ..... 44
Quota sampling ..... 10, 44
random sample ..... 4, 18, 45, 51
Random variable ..... 10, 19, 35, 39, 42, 43, 45
randomized block design ..... 3, 45
Reference level ..... 16
Regression ..... 2, 3, 6, 10-13, 16, 19, 20, 22-24, 27-30, 32, 34-36, 38-41, 44-48, 50, 51, 54-59
    Model II ..... 47
    multicollinearity ..... 11, 34, 56
    Suppressor variable ..... 55
Regression to the mean ..... 23, 45-47, 50
    to mediocrity ..... 45
Regular ergodic chain
    definitions ..... 18
Relative efficiency ..... 5, 6
Relative risk ..... 47, 48
Repeated measures ..... 4, 14, 23, 30, 31, 33, 38, 43, 47, 52
Residuals ..... 24, 29, 45, 47, 55
Ridge regression ..... 48
Runs ..... 49
sample ..... 3-5, 7, 11, 14, 17-21, 25-27, 29, 31, 32, 35, 38, 41, 42, 45, 49, 51, 53-57
Scheffe's test ..... 49
Schwarz's Bayesian Information Criterion ..... 2, 6, 7
Sign test ..... 6, 22
Simpson's diversity
    definitions ..... 51
Simpson's paradox ..... 16, 51
Skewness ..... 27, 52
square root transformation ..... 5
Standard deviation ..... 11, 53-57
Standard error ..... 30, 48, 53, 56
Statistic ..... 6, 10, 11, 14, 15, 18, 19, 27, 31, 38, 45, 54-56
Student's t distribution ..... 16, 25, 55
Studentized range ..... 56
Studentized residual ..... 48
Student's t ..... 16, 25, 38, 55, 57
Sufficient estimator ..... 19
Sums of squares ..... 4
Test statistic ..... 14, 22, 27, 52, 55, 57
Tukey-Kramer test ..... 34, 56
Type I error ..... 2, 3, 20, 22, 28, 36, 56
Type II error ..... 3, 36, 40, 56
t test ..... 5, 6, 55
unbiased estimator ..... 7, 19, 35, 45, 51
union ..... 2, 15, 26, 56
univariate ..... 14, 25, 56
variable ..... 5, 7, 8, 10-13, 16, 19-24, 27, 29-35, 39, 41-43, 45, 48, 50, 52-56
variance ..... 3, 4, 8-10, 18, 19, 21, 24, 25, 28, 29, 34, 35, 38, 40, 43, 45, 52, 53, 55-57
Wilcoxon rank sum test ..... 22, 27, 30, 45, 57
Wilcoxon signed rank test ..... 6, 57
Yule ..... 45, 57, 62