STATISTICAL RELATIONSHIP BETWEEN INCOME AND EXPENDITURES (EXPENDITURE = DEPENDENT VARIABLE & INCOME = INDEPENDENT VARIABLE)
A Project Presented By
Rehan Ehsan Contact# +92 321 8880397
[email protected] To Dr. Naheed Sultana In partial fulfillment of the requirements for course completion of ECONOMETRICS
M.PHIL (FINANCE) (SEMESTER ONE)
LAHORE SCHOOL OF ACCOUNTING & FINANCE The University of Lahore
Acknowledgement

To say this project is "by Rehan Ehsan" overstates the case. Without the significant contributions made by other people this project would certainly not exist. I would like to thank the members of the general public who completed questionnaires about their income and expenses. Thanks to their cooperation, and thanks to my colleagues as well, who helped me complete this project.
ABSTRACT
We found that monthly expenditures are dependent on total monthly income, and that the contribution of other factors is very low in this regard. A person who earns an income both spends and saves the surplus amount, so total monthly income breaks down into expenditures and savings.
TABLE OF CONTENTS

Introduction
Data table
Descriptive statistics
Frequency table
Histogram
Simple linear regression function
Regression analysis
Problems of regression analysis
Ordinary least square method
Test of regression estimates
F-Test
ANOVA
Reliability
Models of ANOVA
    I.   Fixed effect model
    II.  Random effect model
    III. Mixed effect model
Assumptions
Means
Goodness of fit
Chi-square goodness of fit
Correlation
Correlation coefficient
Classical normal linear regression model
Assumptions of CNLRM
    I.   Critical assumptions
    II.  Detailed assumptions
T-Test
Uses of T-Test
Types of T-Test
Summary
Conclusion
INTRODUCTION:

I surveyed members of the general public and asked them about their income and expenses. From the data gathered I rounded the income figures to the range 5,000 to 150,000 and matched each expenditure figure to its nearest value as per my research. This project shows the relationship between monthly income and monthly expenditures.

DATA TABLE:
Sr#    Income      Expenditure
1      5,000       5,000
2      10,000      9,500
3      15,000      14,500
4      20,000      18,500
5      25,000      19,000
6      30,000      27,000
7      35,000      30,500
8      40,000      35,000
9      45,000      39,000
10     50,000      45,500
11     55,000      49,500
12     60,000      52,000
13     65,000      55,000
14     70,000      59,000
15     75,000      64,000
16     80,000      69,500
17     85,000      73,000
18     90,000      78,500
19     95,000      81,000
20     100,000     84,700
21     105,000     90,000
22     110,000     90,000
23     115,000     90,500
24     120,000     93,000
25     125,000     94,800
26     130,000     95,750
27     135,000     98,000
28     140,000     100,000
29     145,000     104,590
30     150,000     110,000
Total  2,325,000   1,876,340
DESCRIPTIVE STATISTICS:

Descriptive Statistics

                    INCOME          EXPENDITURE
N                   30              30
Range               145000          105000
Minimum             5000            5000
Maximum             150000          110000
Sum                 2325000         1876340
Mean                77500.00        62544.67
Std. Error of Mean  8036.376        5852.542
Std. Deviation      44017.042       32055.690
Variance            1937500000.000  1027567267.126
Valid N (listwise)  30
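For readers who want to reproduce these figures outside SPSS, the descriptive statistics above can be checked with a short pure-Python sketch. The data lists are taken from the project's data table; the helper name `describe` is illustrative, not part of the original analysis.

```python
import math

# Survey data from the data table (monthly figures).
income = [5000 * i for i in range(1, 31)]          # 5,000 ... 150,000
expenditure = [5000, 9500, 14500, 18500, 19000, 27000, 30500, 35000,
               39000, 45500, 49500, 52000, 55000, 59000, 64000, 69500,
               73000, 78500, 81000, 84700, 90000, 90000, 90500, 93000,
               94800, 95750, 98000, 100000, 104590, 110000]

def describe(xs):
    """Return n, mean, sample standard deviation, and range of a series."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)   # sample variance
    return n, mean, math.sqrt(var), max(xs) - min(xs)

print(describe(income))       # mean 77500, SD about 44017.04, range 145000
print(describe(expenditure))  # mean about 62544.67, SD about 32055.69
```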
FREQUENCY TABLE:

INCOME: each of the 30 values (5,000; 10,000; ...; 150,000) occurs once, i.e. a frequency of 1 (3.3% each), with the cumulative percent rising in steps of 3.3 from 3.3% at 5,000 to 100% at 150,000.

EXPENDITURE: there are 29 distinct values. The value 90,000 occurs twice (frequency 2, 6.7%); every other value (5,000; 9,500; 14,500; 18,500; 19,000; 27,000; 30,500; 35,000; 39,000; 45,500; 49,500; 52,000; 55,000; 59,000; 64,000; 69,500; 73,000; 78,500; 81,000; 84,700; 90,500; 93,000; 94,800; 95,750; 98,000; 100,000; 104,590; 110,000) occurs once (3.3% each), with the cumulative percent reaching 100% at 110,000.

Total N = 30 for both variables with no missing cases, so Percent and Valid Percent coincide.
Statistics

                        INCOME          EXPENDITURE
N        Valid          30              30
         Missing        0               0
Mean                    77500.00        62544.67
Std. Error of Mean      8036.376        5852.542
Median                  77500.00(a)     66750.00(a)
Mode                    5000(b)         90000
Std. Deviation          44017.042       32055.690
Variance                1937500000.000  1027567267.126
Skewness                .000            -.310
Std. Error of Skewness  .427            .427
Range                   145000          105000
Minimum                 5000            5000
Maximum                 150000          110000
Sum                     2325000         1876340
Percentiles  25         40000.00(c)     35000.00(c)
             50         77500.00        66750.00
             75         115000.00       90500.00

a) Calculated from grouped data.
b) Multiple modes exist. The smallest value is shown.
c) Percentiles are calculated from grouped data.
Ratio Statistics for INCOME / EXPENDITURE

Mean                                        1.197
95% Confidence Interval for Mean
    Lower Bound                             1.156
    Upper Bound                             1.238
Median                                      1.169
95% Confidence Interval for Median(a)
    Lower Bound                             1.148
    Upper Bound                             1.222
    Actual Coverage                         95.7%
Weighted Mean                               1.239
95% Confidence Interval for Weighted Mean
    Lower Bound                             1.196
    Upper Bound                             1.282
Minimum                                     1.000
Maximum                                     1.400
Range                                       .400
Std. Deviation                              .110
Price Related Differential                  .966
Coefficient of Dispersion                   .071
Coefficient of Variation (Median Centered)  9.7%

a) The confidence interval for the median is constructed without any distribution assumptions. The actual coverage level may be greater than the specified level. Other confidence intervals are constructed by assuming a Normal distribution for the ratios.
HISTOGRAM WITH NORMAL CURVE:

[Histogram of INCOME with fitted normal curve. Mean = 77500, Std. Dev. = 44017.042, N = 30.]

[Histogram of EXPENDITURE with fitted normal curve. Mean = 62544.67, Std. Dev. = 32055.69, N = 30.]
SIMPLE REGRESSION FUNCTION:

In statistics, simple linear regression is the least squares estimator of a linear regression model with a single predictor variable. In other words, simple linear regression fits a straight line through the set of n points in such a way that the sum of squared residuals of the model (that is, the vertical distances between the points of the data set and the fitted line) is as small as possible.

REGRESSION ANALYSIS:

Regression analysis includes any techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables. More specifically, regression analysis helps one understand how the typical value of the dependent variable changes when any one of the independent variables is varied, while the other independent variables are held fixed. Most commonly, regression analysis estimates the conditional expectation of the dependent variable given the independent variables, that is, the average value of the dependent variable when the independent variables are held fixed. Less commonly, the focus is on a quantile, or other location parameter, of the conditional distribution of the dependent variable given the independent variables. In all cases, the estimation target is a function of the independent variables called the regression function. In regression analysis, it is also of interest to characterize the variation of the dependent variable around the regression function, which can be described by a probability distribution.

Regression analysis is widely used for prediction and forecasting, where its use has substantial overlap with the field of machine learning. Regression analysis is also used to understand which among the independent variables are related to the dependent variable, and to explore the forms of these relationships. In restricted circumstances, regression analysis can be used to infer causal relationships between the independent and dependent variables.

PROBLEMS IN REGRESSION ANALYSIS:
MULTICOLLINEARITY

Multicollinearity is a statistical phenomenon in which two or more predictor variables in a multiple regression model are highly correlated. In this situation the coefficient estimates may change erratically in response to small changes in the model or the data. Multicollinearity does not reduce the predictive power or reliability of the model as a whole, at least within the sample data themselves; it only affects calculations regarding individual predictors. That is, a multiple regression model with correlated predictors can indicate how well the entire bundle of predictors predicts the outcome variable, but it may not give valid results about any individual predictor, or about which predictors are redundant with respect to others.

HETEROSCEDASTICITY

In statistics, a sequence of random variables is heteroscedastic if the random variables have different variances. The term means "differing variance" and comes from the Greek "hetero" ('different') and "skedasis" ('dispersion'). In contrast, a sequence of random variables is called homoscedastic if it has constant variance.

ORDINARY LEAST SQUARE METHOD

Ordinary least squares (OLS), or linear least squares, is a method for estimating the unknown parameters in a linear regression model. The method minimizes the sum of squared vertical distances between the observed responses in the dataset and the responses predicted by the linear approximation. The resulting estimator can be expressed by a simple formula, especially in the case of a single regressor on the right-hand side.

The OLS estimator is consistent when the regressors are exogenous and there is no multicollinearity, and optimal in the class of linear unbiased estimators when the errors are homoscedastic and serially uncorrelated. Under these conditions, the method of OLS provides minimum-variance mean-unbiased estimation when the errors have finite variances. Under the additional assumption that the errors are normally distributed, OLS is the maximum likelihood estimator. OLS is used in economics (econometrics) and electrical engineering (control theory and signal processing), among many other areas of application.

TEST OF REGRESSION ESTIMATES:

To test whether one variable significantly predicts another, we need only test whether the correlation between the two variables is significantly different from zero. In regression, a significant prediction means a significant proportion of the variability in the predicted variable can be accounted for by (or "attributed to", or "explained by", or "associated with") the predictor variable.
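The simple regression described above can be sketched with the textbook closed-form OLS solution, slope = Sxy / Sxx and intercept = ybar - slope * xbar. This is a pure-Python illustration using the project's data, not the SPSS output itself.

```python
# Closed-form OLS for EXPENDITURE regressed on INCOME.
income = [5000 * i for i in range(1, 31)]          # 5,000 ... 150,000
expenditure = [5000, 9500, 14500, 18500, 19000, 27000, 30500, 35000,
               39000, 45500, 49500, 52000, 55000, 59000, 64000, 69500,
               73000, 78500, 81000, 84700, 90000, 90000, 90500, 93000,
               94800, 95750, 98000, 100000, 104590, 110000]

n = len(income)
xbar = sum(income) / n
ybar = sum(expenditure) / n
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(income, expenditure))
sxx = sum((x - xbar) ** 2 for x in income)
slope = sxy / sxx                  # b = Sxy / Sxx
intercept = ybar - slope * xbar    # a = ybar - b * xbar
print(f"EXPENDITURE = {intercept:.2f} + {slope:.4f} * INCOME")
```

On this data the fitted line comes out to roughly EXPENDITURE = 6645.8 + 0.7213 * INCOME: each additional rupee of income is associated with about 0.72 rupees of additional spending.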
Descriptive Statistics

                    N   Mean      Std. Deviation
INCOME              30  77500.00  44017.042
EXPENDITURE         30  62544.67  32055.690
Valid N (listwise)  30

Model Fit

Fit Statistic         Value
Stationary R-squared  .428
R-squared             .997
RMSE                  1882.231
MAPE                  3.282
MaxAPE                16.348
MAE                   1439.577
MaxAE                 4395.076
Normalized BIC        15.307

(With a single model fitted, the SE and percentile columns of the SPSS Model Fit table all repeat the mean value, so only one column is shown.)
ANOVA(b)

Model 1     Sum of Squares   df  Mean Square      F         Sig.
Regression  29230939495.261  1   29230939495.261  1439.666  .000(a)
Residual    568511251.405    28  20303973.264
Total       29799450746.667  29

a) Predictors: (Constant), INCOME
b) Dependent Variable: EXPENDITURE
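The F value in the ANOVA table above can be reproduced from the least-squares fit itself, since F = (SS_regression / 1) / (SS_residual / (n - 2)) for a single predictor. A pure-Python sketch using the project's data:

```python
income = [5000 * i for i in range(1, 31)]
expenditure = [5000, 9500, 14500, 18500, 19000, 27000, 30500, 35000,
               39000, 45500, 49500, 52000, 55000, 59000, 64000, 69500,
               73000, 78500, 81000, 84700, 90000, 90000, 90500, 93000,
               94800, 95750, 98000, 100000, 104590, 110000]

n = len(income)
xbar = sum(income) / n
ybar = sum(expenditure) / n
slope = (sum((x - xbar) * (y - ybar) for x, y in zip(income, expenditure))
         / sum((x - xbar) ** 2 for x in income))
intercept = ybar - slope * xbar

fitted = [intercept + slope * x for x in income]
ss_reg = sum((f - ybar) ** 2 for f in fitted)                     # explained SS
ss_res = sum((y - f) ** 2 for y, f in zip(expenditure, fitted))   # residual SS
f_stat = (ss_reg / 1) / (ss_res / (n - 2))                        # df = 1 and 28
print(round(f_stat, 3))   # about 1439.67, matching the table
```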
F-TEST

An F-test is any statistical test in which the test statistic has an F-distribution under the null hypothesis. It is most often used when comparing statistical models that have been fitted to a data set, in order to identify the model that best fits the population from which the data were sampled.

ANOVA

Analysis of variance (ANOVA) is a collection of statistical models, and their associated procedures, in which the observed variance in a particular variable is partitioned into components attributable to different sources of variation. In its simplest form ANOVA provides a statistical test of whether or not the means of several groups are all equal, and therefore generalizes the t-test to more than two groups. Doing multiple two-sample t-tests would result in an increased chance of committing a type I error. For this reason, ANOVAs are useful in comparing two, three or more means.

RELIABILITY:

Case Processing Summary

                    N   %
Cases  Valid        30  100.0
       Excluded(a)  0   .0
       Total        30  100.0

a) Listwise deletion based on all variables in the procedure.
Reliability Statistics

Cronbach's Alpha  Cronbach's Alpha Based on Standardized Items  N of Items
.970              .995                                          2
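For two items, Cronbach's alpha has the simple form alpha = k/(k-1) * (1 - (sum of the item variances) / (variance of the summed score)), with k = 2 here. The following pure-Python check against the .970 above is illustrative only; SPSS was the tool actually used.

```python
income = [5000 * i for i in range(1, 31)]
expenditure = [5000, 9500, 14500, 18500, 19000, 27000, 30500, 35000,
               39000, 45500, 49500, 52000, 55000, 59000, 64000, 69500,
               73000, 78500, 81000, 84700, 90000, 90000, 90500, 93000,
               94800, 95750, 98000, 100000, 104590, 110000]

def sample_var(xs):
    """Sample variance (n - 1 denominator), as SPSS reports it."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

total = [x + y for x, y in zip(income, expenditure)]   # summed scale score
k = 2
alpha = (k / (k - 1)) * (1 - (sample_var(income) + sample_var(expenditure))
                         / sample_var(total))
print(round(alpha, 3))   # about 0.970
```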
Inter-Item Covariance Matrix

             INCOME          EXPENDITURE
INCOME       1937500000.000  1397472413.793
EXPENDITURE  1397472413.793  1027567267.126

Inter-Item Correlation Matrix

             INCOME  EXPENDITURE
INCOME       1.000   .990
EXPENDITURE  .990    1.000
Summary Item Statistics

                         Mean            Minimum         Maximum         Range          Maximum / Minimum  Variance                N of Items
Item Means               70022.333       62544.667       77500.000       14955.333      1.239              111830997.556           2
Item Variances           1482533633.563  1027567267.126  1937500000.000  909932732.874  1.886              413988789177375200.000  2
Inter-Item Covariances   1397472413.793  1397472413.793  1397472413.793  .000           1.000              .000                    2
Inter-Item Correlations  .990            .990            .990            .000           1.000              .000                    2
Item-Total Statistics

          Scale Mean if  Scale Variance   Corrected Item-    Squared Multiple  Cronbach's Alpha
          Item Deleted   if Item Deleted  Total Correlation  Correlation       if Item Deleted
VAR00001  1.1186         2.439            .302               .091              .(a)
VAR00002  .6402          .176             .302               .091              .(a)

a) The value is negative due to a negative average covariance among items. This violates reliability model assumptions. You may want to check item codings.

Scale Statistics

Mean       Variance        Std. Deviation  N of Items
140044.67  5760012094.713  75894.744       2
ANOVA

                              Sum of Squares   df  Mean Square      F       Sig
Between People                83520175373.333  29  2880006047.356
Within People  Between Items  3354929926.667   1   3354929926.667   39.441  .000
               Residual       2466775373.333   29  85061219.770
               Total          5821705300.000   30  194056843.333
Total                         89341880673.333  59  1514269163.955

Grand Mean = 70022.33
MODELS:

(Model 1) FIXED EFFECTS MODEL

The fixed-effects model of analysis of variance applies to situations in which the experimenter applies one or more treatments to the subjects of the experiment to see if the response variable values change. This allows the experimenter to estimate the ranges of response variable values that the treatment would generate in the population as a whole.

(Model 2) RANDOM EFFECTS MODEL

Random effects models are used when the treatments are not fixed. This occurs when the various factor levels are sampled from a larger population. Because the levels themselves are random variables, some assumptions and the method of contrasting the treatments differ from ANOVA model 1.

(Model 3) MIXED EFFECTS MODEL

A mixed-effects model contains experimental factors of both fixed and random-effects types, with appropriately different interpretations and analysis for the two types. Most random-effects or mixed-effects models are not concerned with making inferences concerning the particular values of the random effects that happen to have been sampled. For example, consider a large manufacturing plant in which many machines produce the same product. The statistician studying this plant would have very little interest in comparing the particular machines to each other. Rather, inferences that can be made for all machines are of interest, such as their variability and the mean. However, if one is interested in the realized value of the random effect, best linear unbiased prediction can be used to obtain a "prediction" for the value.

ASSUMPTIONS OF ANOVA

The analysis of variance has been studied from several approaches, the most common of which use a linear model that relates the response to the treatments and blocks. Even when the statistical model is nonlinear, it can be approximated by a linear model for which an analysis of variance may be appropriate.

Independence of cases: this is an assumption of the model that simplifies the statistical analysis.
Normality: the distributions of the residuals are normal.
Equality (or "homogeneity") of variances, called homoscedasticity: the variance of data in groups should be the same. Model-based approaches usually assume that the variance is constant. The constant-variance property also appears in the randomization (design-based) analysis of randomized experiments, where it is a necessary consequence of the randomized design.
MEANS:

Case Processing Summary

                      Cases
                      Included     Excluded    Total
                      N   Percent  N  Percent  N   Percent
EXPENDITURE * INCOME  30  100.0%   0  .0%      30  100.0%
GOODNESS OF FIT:

The goodness of fit of a statistical model describes how well it fits a set of observations. Measures of goodness of fit typically summarize the discrepancy between observed values and the values expected under the model in question. Such measures can be used in statistical hypothesis testing.

CHI-SQUARE AS GOODNESS OF FIT

When an analyst attempts to fit a statistical model to observed data, he or she may wonder how well the model actually reflects the data. How "close" are the observed values to those which would be expected under the fitted model? One statistical test that addresses this issue is the chi-square goodness of fit test.

Test Statistics

             INCOME   EXPENDITURE
Chi-Square   .000(a)  .933(b)
df           29       28
Asymp. Sig.  1.000    1.000

a) 30 cells (100.0%) have expected frequencies less than 5. The minimum expected cell frequency is 1.0.
b) 29 cells (100.0%) have expected frequencies less than 5. The minimum expected cell frequency is 1.0.
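The .933 chi-square value for EXPENDITURE can be reproduced directly: with 29 distinct observed values, the test compares the observed counts against equal expected frequencies of 30/29 per category. A pure-Python sketch:

```python
from collections import Counter

expenditure = [5000, 9500, 14500, 18500, 19000, 27000, 30500, 35000,
               39000, 45500, 49500, 52000, 55000, 59000, 64000, 69500,
               73000, 78500, 81000, 84700, 90000, 90000, 90500, 93000,
               94800, 95750, 98000, 100000, 104590, 110000]

obs = list(Counter(expenditure).values())    # 29 categories; 90,000 occurs twice
expected = sum(obs) / len(obs)               # uniform expected count, 30/29
chi2 = sum((o - expected) ** 2 / expected for o in obs)
df = len(obs) - 1
print(df, round(chi2, 3))   # 28 0.933
```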
Observed and expected frequencies (chi-square goodness of fit):

INCOME: each of the 30 observed values (5,000 through 150,000) has Observed N = 1, Expected N = 1 and Residual = 0; Total N = 30.

EXPENDITURES: the value 90,000 has Observed N = 2, Expected N = 1 and Residual = 1; each of the other 28 observed values has Observed N = 1, Expected N = 1 and Residual = 0; Total N = 30.
CORRELATION:

Dependence refers to any statistical relationship between two random variables or two sets of data. Correlation refers to any of a broad class of statistical relationships involving dependence. Familiar examples of dependent phenomena include the correlation between the physical statures of parents and their offspring, and the correlation between the demand for a product and its price.

Correlations are useful because they can indicate a predictive relationship that can be exploited in practice. For example, an electrical utility may produce less power on a mild day based on the correlation between electricity demand and weather. In this example there is a causal relationship.
Descriptive Statistics

             Mean      Std. Deviation  N
INCOME       77500.00  44017.042       30
EXPENDITURE  62544.67  32055.690       30

Correlations

                                          INCOME           EXPENDITURE
INCOME       Pearson Correlation          1                .990(**)
             Sig. (2-tailed)                               .000
             Sum of Squares and
             Cross-products               56187500000.000  40526700000.000
             Covariance                   1937500000.000   1397472413.793
             N                            30               30
EXPENDITURE  Pearson Correlation          .990(**)         1
             Sig. (2-tailed)              .000
             Sum of Squares and
             Cross-products               40526700000.000  29799450746.667
             Covariance                   1397472413.793   1027567267.126
             N                            30               30

** Correlation is significant at the 0.01 level (2-tailed).
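The .990 Pearson correlation follows from the sums of squares and cross-products shown in the table, since r = Sxy / sqrt(Sxx * Syy). A pure-Python check on the project's data:

```python
import math

income = [5000 * i for i in range(1, 31)]
expenditure = [5000, 9500, 14500, 18500, 19000, 27000, 30500, 35000,
               39000, 45500, 49500, 52000, 55000, 59000, 64000, 69500,
               73000, 78500, 81000, 84700, 90000, 90000, 90500, 93000,
               94800, 95750, 98000, 100000, 104590, 110000]

n = len(income)
xbar = sum(income) / n
ybar = sum(expenditure) / n
sxx = sum((x - xbar) ** 2 for x in income)
syy = sum((y - ybar) ** 2 for y in expenditure)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(income, expenditure))
r = sxy / math.sqrt(sxx * syy)   # Pearson product-moment correlation
print(round(r, 3))
```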
CORRELATION COEFFICIENT:

Correlation coefficient may refer to:

Pearson product-moment correlation coefficient, also known as r, R, or Pearson's r: a measure of the strength of the linear relationship between two variables, defined as the (sample) covariance of the variables divided by the product of their (sample) standard deviations.

Correlation and dependence: a broad class of statistical relationships between two or more random variables or observed data values.

Goodness of fit: any of several measures that assess how well a statistical model fits observations by summarizing the discrepancy between observed values and the values expected under the model in question.

Coefficient of determination: a measure of the proportion of variability in a data set that is accounted for by a statistical model; often called R2; equal, in a single-variable linear regression, to the square of Pearson's product-moment correlation coefficient.
Coefficient Correlations(a)

Model 1                INCOME
Correlations  INCOME   1.000
Covariances   INCOME   .000

a) Dependent Variable: EXPENDITURE

Collinearity Diagnostics(a)

Model 1                                     Variance Proportions
Dimension  Eigenvalue  Condition Index      (Constant)  INCOME
1          1.873       1.000                .06         .06
2          .127        3.842                .94         .94

a) Dependent Variable: EXPENDITURE
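The eigenvalues and condition indices above can be reproduced by hand: SPSS scales each column of X = [1, INCOME] to unit length and takes the eigenvalues of the resulting 2 x 2 cross-product matrix, which has ones on the diagonal and a single off-diagonal element c, giving eigenvalues 1 + c and 1 - c. A pure-Python sketch under that assumption:

```python
import math

income = [5000 * i for i in range(1, 31)]
n = len(income)

# Off-diagonal element of the scaled cross-product matrix [[1, c], [c, 1]]:
# inner product of the scaled constant column and the scaled INCOME column.
c = sum(income) / (math.sqrt(n) * math.sqrt(sum(x * x for x in income)))
eigenvalues = [1 + c, 1 - c]                 # closed form for [[1, c], [c, 1]]
cond_index = [math.sqrt(eigenvalues[0] / e) for e in eigenvalues]
print([round(e, 3) for e in eigenvalues])    # about [1.873, 0.127]
print([round(v, 3) for v in cond_index])     # about [1.0, 3.842]
```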
RESIDUALS:

Residuals Statistics(a)

                      Minimum    Maximum    Mean      Std. Deviation  N
Predicted Value       10252.15   114837.18  62544.67  31748.440       30
Residual              -7624.422  7620.241   .000      4427.622        30
Std. Predicted Value  -1.647     1.647      .000      1.000           30
Std. Residual         -1.692     1.691      .000      .983            30

a) Dependent Variable: EXPENDITURE
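The residual row of the table can be verified by refitting the line and summarizing the residuals. In this pure-Python sketch the residual mean is zero by construction of OLS, and the sample standard deviation matches the 4427.622 reported above.

```python
import math

income = [5000 * i for i in range(1, 31)]
expenditure = [5000, 9500, 14500, 18500, 19000, 27000, 30500, 35000,
               39000, 45500, 49500, 52000, 55000, 59000, 64000, 69500,
               73000, 78500, 81000, 84700, 90000, 90000, 90500, 93000,
               94800, 95750, 98000, 100000, 104590, 110000]

n = len(income)
xbar = sum(income) / n
ybar = sum(expenditure) / n
slope = (sum((x - xbar) * (y - ybar) for x, y in zip(income, expenditure))
         / sum((x - xbar) ** 2 for x in income))
intercept = ybar - slope * xbar

residuals = [y - (intercept + slope * x) for x, y in zip(income, expenditure)]
mean_resid = sum(residuals) / n                      # zero up to rounding
sd_resid = math.sqrt(sum((e - mean_resid) ** 2 for e in residuals) / (n - 1))
print(round(mean_resid, 6), round(sd_resid, 3))
```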
CHARTS:

[Histogram of the regression standardized residuals (Dependent Variable: EXPENDITURE). Mean = -1.04E-16, Std. Dev. = 0.983, N = 30.]

[Normal P-P plot of the regression standardized residuals (Dependent Variable: EXPENDITURE): expected cumulative probability against observed cumulative probability.]

[Normal P-P plot of INCOME (transform: natural log): expected cumulative probability against observed cumulative probability.]

[Drop-line plot of EXPENDITURE against INCOME; dots/lines show modes.]
CLASSICAL NORMAL LINEAR REGRESSION MODEL:

Econometrics is all about causality. Economics is full of theories of how one thing causes another: increases in prices cause demand to decrease, better education causes people to become richer, etc. To be able to test such theories, economists find data (such as the price and quantity of a good, or observations on a population's education and wealth levels). Raw data always comes out looking like a cloud, and without proper techniques it is impossible to determine whether this cloud gives any useful information. Econometrics is a tool to establish correlation and, hopefully later, causality, using collected data points. We do this by creating an explanatory function from the data. The function is a linear model and is estimated by minimizing the squared distance from the data to the line. The distance is treated as an error term. This is the process of linear regression.

ASSUMPTIONS UNDERLYING THE CLASSICAL NORMAL LINEAR REGRESSION MODEL

There are 5 critical assumptions relating to the CLRM. These assumptions are required to show that the estimation technique, Ordinary Least Squares (OLS), has a number of desirable properties, and also so that hypothesis tests regarding the coefficient estimates can validly be conducted.

CRITICAL ASSUMPTIONS:

The errors have zero mean.
The variance of the errors is constant and finite over all values of X.
The errors are statistically independent of one another.
There is no relationship between the error and the corresponding X.
The error term is normally distributed.

DETAILED ASSUMPTIONS

The regression model is linear in parameters.
The values of the regressors, the X's (independent variables), are fixed in repeated samples.
For given values of the X's, the mean value of the errors equals zero.
For given values of the X's, the variance of the errors is constant.
For given values of the X's, there is no autocorrelation.
If the X's are stochastic, the errors and the X's are not correlated.
The number of observations is greater than the number of independent variables.
There is sufficient variability in the values of the X's.
The regression model is correctly specified.
There is no perfect multicollinearity.
The error term is normally distributed.
T-TEST:

A t-test is any statistical hypothesis test in which the test statistic follows a Student's t distribution if the null hypothesis is supported. It is most commonly applied when the test statistic would follow a normal distribution if the value of a scaling term in the test statistic were known.
One-Sample Statistics

             N   Mean      Std. Deviation  Std. Error Mean
INCOME       30  77500.00  44017.042       8036.376
EXPENDITURE  30  62544.67  32055.690       5852.542

One-Sample Test (Test Value = 0)

                                                95% Confidence Interval
                                                of the Difference
             t       df  Sig.        Mean       Lower     Upper
                         (2-tailed)  Difference
INCOME       9.644   29  .000        77500.000  61063.77  93936.23
EXPENDITURE  10.687  29  .000        62544.667  50574.88  74514.46
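The one-sample t values follow from t = mean / (SD / sqrt(n)) with df = n - 1. A pure-Python check against the table (the helper name `one_sample_t` is illustrative):

```python
import math

income = [5000 * i for i in range(1, 31)]
expenditure = [5000, 9500, 14500, 18500, 19000, 27000, 30500, 35000,
               39000, 45500, 49500, 52000, 55000, 59000, 64000, 69500,
               73000, 78500, 81000, 84700, 90000, 90000, 90500, 93000,
               94800, 95750, 98000, 100000, 104590, 110000]

def one_sample_t(xs):
    """t statistic and df for H0: population mean = 0."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return mean / (sd / math.sqrt(n)), n - 1

t_inc, df = one_sample_t(income)
t_exp, _ = one_sample_t(expenditure)
print(round(t_inc, 3), round(t_exp, 3), df)   # about 9.644, 10.687, 29
```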
ANOVA (EXPENDITURE)

                                        Sum of Squares   df  Mean Square      F  Sig.
Between Groups  (Combined)              29799450746.667  29  1027567267.126   .  .
                Linear Term  Contrast   29230939495.261  1   29230939495.261  .  .
                             Deviation  568511251.405    28  20303973.264     .  .
Within Groups                           .000             0   .
Total                                   29799450746.667  29

(With every income value defining its own group there are no within-group degrees of freedom, so F and its significance cannot be computed here.)
USES:

Among the most frequently used t-tests are:

• A one-sample location test of whether the mean of a normally distributed population has a value specified in a null hypothesis.

• A two-sample location test of the null hypothesis that the means of two normally distributed populations are equal. All such tests are usually called Student's t-tests, though strictly speaking that name should only be used if the variances of the two populations are also assumed to be equal; the form of the test used when this assumption is dropped is sometimes called Welch's t-test. These tests are often referred to as "unpaired" or "independent samples" t-tests, as they are typically applied when the statistical units underlying the two samples being compared are non-overlapping.

• A test of the null hypothesis that the difference between two responses measured on the same statistical unit has a mean value of zero. For example, suppose we measure the size of a cancer patient's tumor before and after a treatment. If the treatment is effective, we expect the tumor size for many of the patients to be smaller following the treatment. This is often referred to as the "paired" or "repeated measures" t-test.

• A test of whether the slope of a regression line differs significantly from 0.
TYPES: UNPAIRED & PAIRED TWO-SAMPLE T-TESTS

Two-sample t-tests for a difference in mean can be either unpaired or paired. Paired t-tests are a form of blocking, and have greater power than unpaired tests when the paired units are similar with respect to "noise factors" that are independent of membership in the two groups being compared. In a different context, paired t-tests can be used to reduce the effects of confounding factors in an observational study.

The unpaired, or "independent samples", t-test is used when two separate sets of independent and identically distributed samples are obtained, one from each of the two populations being compared. For example, suppose we are evaluating the effect of a medical treatment, and we enroll 100 subjects into our study, then randomize 50 subjects to the treatment group and 50 subjects to the control group. In this case, we have two independent samples and would use the unpaired form of the t-test. The randomization is not essential here: if we contacted 100 people by phone, obtained each person's age and gender, and then used a two-sample t-test to see whether the mean ages differ by gender, this would also be an independent-samples t-test, even though the data are observational.

Dependent-samples (or "paired") t-tests typically consist of a sample of matched pairs of similar units, or one group of units that has been tested twice (a "repeated measures" t-test). A typical example of the repeated-measures t-test would be where subjects are tested prior to a treatment, say for high blood pressure, and the same subjects are tested again after treatment with a blood-pressure-lowering medication.

A dependent t-test based on a "matched-pairs sample" results from an unpaired sample that is subsequently used to form a paired sample, by using additional variables that were measured along with the variable of interest. The matching is carried out by identifying pairs of values consisting of one observation from each of the two samples, where the pair is similar in terms of other measured variables. This approach is often used in observational studies to reduce or eliminate the effects of confounding factors.
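The two forms of the test described above can be sketched by computing the t statistics by hand. This is an illustrative example (the data below are hypothetical, not from this project): the unpaired statistic pools the two sample variances, while the paired statistic is just a one-sample test on the within-pair differences.

```python
# Illustrative sketch: unpaired (pooled-variance) and paired t statistics,
# computed with the standard library only. Data are made up for the example.
import math
import statistics

def unpaired_t(a, b):
    """Student's t for two independent samples, assuming equal variances."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)  # pooled variance
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))

def paired_t(before, after):
    """Matched-pairs t: a one-sample test on the differences."""
    d = [x - y for x, y in zip(before, after)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))

# Hypothetical blood-pressure readings for the repeated-measures example
before = [150, 160, 155, 148, 162]
after = [140, 150, 150, 145, 155]
print(paired_t(before, after))  # positive: pressure fell after treatment
```

Note how the paired version discards the between-subject variation entirely, which is the source of its extra power when pairs are well matched.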
SUMMARY:
Case Processing Summary (a)

              Included        Excluded        Total
              N    Percent    N    Percent    N    Percent
INCOME        30   100.0%     0    .0%        30   100.0%
EXPENDITURE   30   100.0%     0    .0%        30   100.0%

a) Limited to first 100 cases.
Case Summaries (a)

Case Number   INCOME    EXPENDITURE
1             5000      5000
2             10000     9500
3             15000     14500
4             20000     18500
5             25000     19000
6             30000     27000
7             35000     30500
8             40000     35000
9             45000     39000
10            50000     45500
11            55000     49500
12            60000     52000
13            65000     55000
14            70000     59000
15            75000     64000
16            80000     69500
17            85000     73000
18            90000     78500
19            95000     81000
20            100000    84700
21            105000    90000
22            110000    90000
23            115000    90500
24            120000    93000
25            125000    94800
26            130000    95750
27            135000    98000
28            140000    100000
29            145000    104590
30            150000    110000

Total   Mean       77500.00          62544.67
        Minimum    5000              5000
        Maximum    150000            110000
        Range      145000            105000
        Variance   1937500000.000    1027567267.126
        N          30                30

a) Limited to first 100 cases.
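Because the incomes in the sample form an arithmetic sequence (5000, 10000, ..., 150000), the income column of the total row can be verified directly. A short sketch using Python's statistics module (for the sequence 5000*i with i = 1..30, the sample variance is 5000 squared times the variance of 1..30, which is n(n+1)/12 = 77.5):

```python
# Verify the income column of the Case Summaries total row.
import statistics

income = [5000 * i for i in range(1, 31)]

print(statistics.mean(income))      # matches the reported mean of 77500.00
print(min(income), max(income))     # reported minimum 5000, maximum 150000
print(max(income) - min(income))    # reported range of 145000
print(statistics.variance(income))  # reported variance of 1937500000.000
```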
CONCLUSION: From all of the above discussion, we find that monthly expenditures are dependent on total monthly income, and the variation left unexplained by income is very low. Since a person who earns both makes expenditures and saves the surplus amount, total monthly income breaks up into expenditures and savings.