STATISTICAL RELATIONSHIP BETWEEN INCOME AND EXPENDITURES (EXPENDITURE = DEPENDENT VARIABLE & INCOME = INDEPENDENT VARIABLE)
A Project Presented By
Rehan Ehsan Contact# +92 321 8880397
[email protected] To Dr. Naheed Sultana In partial fulfillment of the requirements for course completion of ECONOMETRICS
M.PHIL (FINANCE) (SEMESTER ONE)
LAHORE SCHOOL OF ACCOUNTING & FINANCE The University of Lahore
Acknowledgement

To say this project is "by Rehan Ehsan" overstates the case. Without the significant contributions made by other people this project would certainly not exist. I would like to thank the members of the general public who completed questionnaires about their income and expenses. Thanks to their cooperation, and thanks to my colleagues as well, who helped me complete this project.
ABSTRACT
We found that monthly expenditures are dependent on total monthly income, and that the contribution of other factors is very low in this regard. A person who earns an income both spends and saves the surplus amount, so total monthly income breaks down into expenditures and savings.
TABLE OF CONTENTS

Introduction
Data table
Descriptive statistics
Frequency table
Histogram
Simple linear regression function
Regression analysis
Problems of regression analysis
Ordinary least square method
Test of regression estimates
F-Test
ANOVA
Reliability
Models of ANOVA
    I.   Fixed effect model
    II.  Random effect model
    III. Mixed effect model
Assumptions
Means
Goodness of fit
Chi-square goodness of fit
Correlation
Correlation coefficient
Classical normal linear regression model
Assumptions of CNLRM
    I.   Critical assumptions
    II.  Detailed assumptions
T-Test
Uses of T-Test
Types of T-Test
Summary
Conclusion
INTRODUCTION:

I surveyed members of the general public and asked them about their income and expenses. From the data gathered I rounded the income figures to the range 5,000 to 150,000 and matched each expenditure figure to its nearest value as per my research. This project shows the relationship between monthly income and monthly expenditures.

DATA TABLE:
Sr#    Income      Expenditure
1      5,000       5,000
2      10,000      9,500
3      15,000      14,500
4      20,000      18,500
5      25,000      19,000
6      30,000      27,000
7      35,000      30,500
8      40,000      35,000
9      45,000      39,000
10     50,000      45,500
11     55,000      49,500
12     60,000      52,000
13     65,000      55,000
14     70,000      59,000
15     75,000      64,000
16     80,000      69,500
17     85,000      73,000
18     90,000      78,500
19     95,000      81,000
20     100,000     84,700
21     105,000     90,000
22     110,000     90,000
23     115,000     90,500
24     120,000     93,000
25     125,000     94,800
26     130,000     95,750
27     135,000     98,000
28     140,000     100,000
29     145,000     104,590
30     150,000     110,000
Total  2,325,000   1,876,340
DESCRIPTIVE STATISTICS:

Descriptive Statistics

                    INCOME          EXPENDITURE
N                   30              30
Range               145000          105000
Minimum             5000            5000
Maximum             150000          110000
Sum                 2325000         1876340
Mean                77500.00        62544.67
Std. Error of Mean  8036.376        5852.542
Std. Deviation      44017.042       32055.690
Variance            1937500000.000  1027567267.126
Valid N (listwise)  30
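For readers who want to reproduce these figures outside SPSS, the descriptive statistics above can be checked with a short pure-Python sketch. The data lists are taken from the project's data table; the helper name `describe` is illustrative, not part of the original analysis.

```python
import math

# Survey data from the data table (monthly figures).
income = [5000 * i for i in range(1, 31)]          # 5,000 ... 150,000
expenditure = [5000, 9500, 14500, 18500, 19000, 27000, 30500, 35000,
               39000, 45500, 49500, 52000, 55000, 59000, 64000, 69500,
               73000, 78500, 81000, 84700, 90000, 90000, 90500, 93000,
               94800, 95750, 98000, 100000, 104590, 110000]

def describe(xs):
    """Return n, mean, sample standard deviation, and range of a series."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)   # sample variance
    return n, mean, math.sqrt(var), max(xs) - min(xs)

print(describe(income))       # mean 77500, SD about 44017.04, range 145000
print(describe(expenditure))  # mean about 62544.67, SD about 32055.69
```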
FREQUENCY TABLE:

INCOME: each of the 30 values (5,000; 10,000; ...; 150,000) occurs once, i.e. a frequency of 1 (3.3% each), with the cumulative percent rising in steps of 3.3 from 3.3% at 5,000 to 100% at 150,000.

EXPENDITURE: there are 29 distinct values. The value 90,000 occurs twice (frequency 2, 6.7%); every other value (5,000; 9,500; 14,500; 18,500; 19,000; 27,000; 30,500; 35,000; 39,000; 45,500; 49,500; 52,000; 55,000; 59,000; 64,000; 69,500; 73,000; 78,500; 81,000; 84,700; 90,500; 93,000; 94,800; 95,750; 98,000; 100,000; 104,590; 110,000) occurs once (3.3% each), with the cumulative percent reaching 100% at 110,000.

Total N = 30 for both variables with no missing cases, so Percent and Valid Percent coincide.
Statistics

                        INCOME          EXPENDITURE
N        Valid          30              30
         Missing        0               0
Mean                    77500.00        62544.67
Std. Error of Mean      8036.376        5852.542
Median                  77500.00(a)     66750.00(a)
Mode                    5000(b)         90000
Std. Deviation          44017.042       32055.690
Variance                1937500000.000  1027567267.126
Skewness                .000            -.310
Std. Error of Skewness  .427            .427
Range                   145000          105000
Minimum                 5000            5000
Maximum                 150000          110000
Sum                     2325000         1876340
Percentiles  25         40000.00(c)     35000.00(c)
             50         77500.00        66750.00
             75         115000.00       90500.00

a) Calculated from grouped data.
b) Multiple modes exist. The smallest value is shown.
c) Percentiles are calculated from grouped data.
Ratio Statistics for INCOME / EXPENDITURE

Mean                                        1.197
95% Confidence Interval for Mean
    Lower Bound                             1.156
    Upper Bound                             1.238
Median                                      1.169
95% Confidence Interval for Median(a)
    Lower Bound                             1.148
    Upper Bound                             1.222
    Actual Coverage                         95.7%
Weighted Mean                               1.239
95% Confidence Interval for Weighted Mean
    Lower Bound                             1.196
    Upper Bound                             1.282
Minimum                                     1.000
Maximum                                     1.400
Range                                       .400
Std. Deviation                              .110
Price Related Differential                  .966
Coefficient of Dispersion                   .071
Coefficient of Variation (Median Centered)  9.7%

a) The confidence interval for the median is constructed without any distribution assumptions. The actual coverage level may be greater than the specified level. Other confidence intervals are constructed by assuming a Normal distribution for the ratios.
HISTOGRAM WITH NORMAL CURVE:

[Histogram of INCOME with fitted normal curve. Mean = 77500, Std. Dev. = 44017.042, N = 30.]

[Histogram of EXPENDITURE with fitted normal curve. Mean = 62544.67, Std. Dev. = 32055.69, N = 30.]
SIMPLE REGRESSION FUNCTION:

In statistics, simple linear regression is the least squares estimator of a linear regression model with a single predictor variable. In other words, simple linear regression fits a straight line through the set of n points in such a way that the sum of squared residuals of the model (that is, the vertical distances between the points of the data set and the fitted line) is as small as possible.

REGRESSION ANALYSIS:

Regression analysis includes any techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables. More specifically, regression analysis helps one understand how the typical value of the dependent variable changes when any one of the independent variables is varied, while the other independent variables are held fixed. Most commonly, regression analysis estimates the conditional expectation of the dependent variable given the independent variables, that is, the average value of the dependent variable when the independent variables are held fixed. Less commonly, the focus is on a quantile, or other location parameter, of the conditional distribution of the dependent variable given the independent variables. In all cases, the estimation target is a function of the independent variables called the regression function. In regression analysis, it is also of interest to characterize the variation of the dependent variable around the regression function, which can be described by a probability distribution.

Regression analysis is widely used for prediction and forecasting, where its use has substantial overlap with the field of machine learning. Regression analysis is also used to understand which among the independent variables are related to the dependent variable, and to explore the forms of these relationships. In restricted circumstances, regression analysis can be used to infer causal relationships between the independent and dependent variables.

PROBLEMS IN REGRESSION ANALYSIS:
MULTICOLLINEARITY

Multicollinearity is a statistical phenomenon in which two or more predictor variables in a multiple regression model are highly correlated. In this situation the coefficient estimates may change erratically in response to small changes in the model or the data. Multicollinearity does not reduce the predictive power or reliability of the model as a whole, at least within the sample data themselves; it only affects calculations regarding individual predictors. That is, a multiple regression model with correlated predictors can indicate how well the entire bundle of predictors predicts the outcome variable, but it may not give valid results about any individual predictor, or about which predictors are redundant with respect to others.

HETEROSCEDASTICITY

In statistics, a sequence of random variables is heteroscedastic if the random variables have different variances. The term means "differing variance" and comes from the Greek "hetero" ('different') and "skedasis" ('dispersion'). In contrast, a sequence of random variables is called homoscedastic if it has constant variance.

ORDINARY LEAST SQUARE METHOD

Ordinary least squares (OLS), or linear least squares, is a method for estimating the unknown parameters in a linear regression model. The method minimizes the sum of squared vertical distances between the observed responses in the dataset and the responses predicted by the linear approximation. The resulting estimator can be expressed by a simple formula, especially in the case of a single regressor on the right-hand side.

The OLS estimator is consistent when the regressors are exogenous and there is no multicollinearity, and optimal in the class of linear unbiased estimators when the errors are homoscedastic and serially uncorrelated. Under these conditions, the method of OLS provides minimum-variance mean-unbiased estimation when the errors have finite variances. Under the additional assumption that the errors are normally distributed, OLS is the maximum likelihood estimator. OLS is used in economics (econometrics) and electrical engineering (control theory and signal processing), among many other areas of application.

TEST OF REGRESSION ESTIMATES:

To test whether one variable significantly predicts another, we need only test whether the correlation between the two variables is significantly different from zero. In regression, a significant prediction means a significant proportion of the variability in the predicted variable can be accounted for by (or "attributed to", or "explained by", or "associated with") the predictor variable.
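The simple regression described above can be sketched with the textbook closed-form OLS solution, slope = Sxy / Sxx and intercept = ybar - slope * xbar. This is a pure-Python illustration using the project's data, not the SPSS output itself.

```python
# Closed-form OLS for EXPENDITURE regressed on INCOME.
income = [5000 * i for i in range(1, 31)]          # 5,000 ... 150,000
expenditure = [5000, 9500, 14500, 18500, 19000, 27000, 30500, 35000,
               39000, 45500, 49500, 52000, 55000, 59000, 64000, 69500,
               73000, 78500, 81000, 84700, 90000, 90000, 90500, 93000,
               94800, 95750, 98000, 100000, 104590, 110000]

n = len(income)
xbar = sum(income) / n
ybar = sum(expenditure) / n
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(income, expenditure))
sxx = sum((x - xbar) ** 2 for x in income)
slope = sxy / sxx                  # b = Sxy / Sxx
intercept = ybar - slope * xbar    # a = ybar - b * xbar
print(f"EXPENDITURE = {intercept:.2f} + {slope:.4f} * INCOME")
```

On this data the fitted line comes out to roughly EXPENDITURE = 6645.8 + 0.7213 * INCOME: each additional rupee of income is associated with about 0.72 rupees of additional spending.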
Descriptive Statistics

                    N   Mean      Std. Deviation
INCOME              30  77500.00  44017.042
EXPENDITURE         30  62544.67  32055.690
Valid N (listwise)  30

Model Fit

Fit Statistic         Value
Stationary R-squared  .428
R-squared             .997
RMSE                  1882.231
MAPE                  3.282
MaxAPE                16.348
MAE                   1439.577
MaxAE                 4395.076
Normalized BIC        15.307

(With a single model fitted, the SE and percentile columns of the SPSS Model Fit table all repeat the mean value, so only one column is shown.)
ANOVA(b)

Model 1     Sum of Squares   df  Mean Square      F         Sig.
Regression  29230939495.261  1   29230939495.261  1439.666  .000(a)
Residual    568511251.405    28  20303973.264
Total       29799450746.667  29

a) Predictors: (Constant), INCOME
b) Dependent Variable: EXPENDITURE
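The F value in the ANOVA table above can be reproduced from the least-squares fit itself, since F = (SS_regression / 1) / (SS_residual / (n - 2)) for a single predictor. A pure-Python sketch using the project's data:

```python
income = [5000 * i for i in range(1, 31)]
expenditure = [5000, 9500, 14500, 18500, 19000, 27000, 30500, 35000,
               39000, 45500, 49500, 52000, 55000, 59000, 64000, 69500,
               73000, 78500, 81000, 84700, 90000, 90000, 90500, 93000,
               94800, 95750, 98000, 100000, 104590, 110000]

n = len(income)
xbar = sum(income) / n
ybar = sum(expenditure) / n
slope = (sum((x - xbar) * (y - ybar) for x, y in zip(income, expenditure))
         / sum((x - xbar) ** 2 for x in income))
intercept = ybar - slope * xbar

fitted = [intercept + slope * x for x in income]
ss_reg = sum((f - ybar) ** 2 for f in fitted)                     # explained SS
ss_res = sum((y - f) ** 2 for y, f in zip(expenditure, fitted))   # residual SS
f_stat = (ss_reg / 1) / (ss_res / (n - 2))                        # df = 1 and 28
print(round(f_stat, 3))   # about 1439.67, matching the table
```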
F-TEST

An F-test is any statistical test in which the test statistic has an F-distribution under the null hypothesis. It is most often used when comparing statistical models that have been fitted to a data set, in order to identify the model that best fits the population from which the data were sampled.

ANOVA

Analysis of variance (ANOVA) is a collection of statistical models, and their associated procedures, in which the observed variance in a particular variable is partitioned into components attributable to different sources of variation. In its simplest form ANOVA provides a statistical test of whether or not the means of several groups are all equal, and therefore generalizes the t-test to more than two groups. Doing multiple two-sample t-tests would result in an increased chance of committing a type I error. For this reason, ANOVAs are useful in comparing two, three or more means.

RELIABILITY:

Case Processing Summary

                    N   %
Cases  Valid        30  100.0
       Excluded(a)  0   .0
       Total        30  100.0

a) Listwise deletion based on all variables in the procedure.
Reliability Statistics

Cronbach's Alpha  Cronbach's Alpha Based on Standardized Items  N of Items
.970              .995                                          2
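For two items, Cronbach's alpha has the simple form alpha = k/(k-1) * (1 - (sum of the item variances) / (variance of the summed score)), with k = 2 here. The following pure-Python check against the .970 above is illustrative only; SPSS was the tool actually used.

```python
income = [5000 * i for i in range(1, 31)]
expenditure = [5000, 9500, 14500, 18500, 19000, 27000, 30500, 35000,
               39000, 45500, 49500, 52000, 55000, 59000, 64000, 69500,
               73000, 78500, 81000, 84700, 90000, 90000, 90500, 93000,
               94800, 95750, 98000, 100000, 104590, 110000]

def sample_var(xs):
    """Sample variance (n - 1 denominator), as SPSS reports it."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

total = [x + y for x, y in zip(income, expenditure)]   # summed scale score
k = 2
alpha = (k / (k - 1)) * (1 - (sample_var(income) + sample_var(expenditure))
                         / sample_var(total))
print(round(alpha, 3))   # about 0.970
```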
Inter-Item Covariance Matrix

             INCOME          EXPENDITURE
INCOME       1937500000.000  1397472413.793
EXPENDITURE  1397472413.793  1027567267.126

Inter-Item Correlation Matrix

             INCOME  EXPENDITURE
INCOME       1.000   .990
EXPENDITURE  .990    1.000
Summary Item Statistics

                         Mean            Minimum         Maximum         Range          Maximum / Minimum  Variance                N of Items
Item Means               70022.333       62544.667       77500.000       14955.333      1.239              111830997.556           2
Item Variances           1482533633.563  1027567267.126  1937500000.000  909932732.874  1.886              413988789177375200.000  2
Inter-Item Covariances   1397472413.793  1397472413.793  1397472413.793  .000           1.000              .000                    2
Inter-Item Correlations  .990            .990            .990            .000           1.000              .000                    2
Item-Total Statistics

          Scale Mean if  Scale Variance   Corrected Item-    Squared Multiple  Cronbach's Alpha
          Item Deleted   if Item Deleted  Total Correlation  Correlation       if Item Deleted
VAR00001  1.1186         2.439            .302               .091              .(a)
VAR00002  .6402          .176             .302               .091              .(a)

a) The value is negative due to a negative average covariance among items. This violates reliability model assumptions. You may want to check item codings.

Scale Statistics

Mean       Variance        Std. Deviation  N of Items
140044.67  5760012094.713  75894.744       2
ANOVA

                              Sum of Squares   df  Mean Square      F       Sig
Between People                83520175373.333  29  2880006047.356
Within People  Between Items  3354929926.667   1   3354929926.667   39.441  .000
               Residual       2466775373.333   29  85061219.770
               Total          5821705300.000   30  194056843.333
Total                         89341880673.333  59  1514269163.955

Grand Mean = 70022.33
MODELS:

(Model 1) FIXED EFFECTS MODEL

The fixed-effects model of analysis of variance applies to situations in which the experimenter applies one or more treatments to the subjects of the experiment to see if the response variable values change. This allows the experimenter to estimate the ranges of response variable values that the treatment would generate in the population as a whole.

(Model 2) RANDOM EFFECTS MODEL

Random effects models are used when the treatments are not fixed. This occurs when the various factor levels are sampled from a larger population. Because the levels themselves are random variables, some assumptions and the method of contrasting the treatments differ from ANOVA model 1.

(Model 3) MIXED EFFECTS MODEL

A mixed-effects model contains experimental factors of both fixed and random-effects types, with appropriately different interpretations and analysis for the two types. Most random-effects or mixed-effects models are not concerned with making inferences concerning the particular values of the random effects that happen to have been sampled. For example, consider a large manufacturing plant in which many machines produce the same product. The statistician studying this plant would have very little interest in comparing the particular machines to each other. Rather, inferences that can be made for all machines are of interest, such as their variability and the mean. However, if one is interested in the realized value of the random effect, best linear unbiased prediction can be used to obtain a "prediction" for the value.

ASSUMPTIONS OF ANOVA

The analysis of variance has been studied from several approaches, the most common of which use a linear model that relates the response to the treatments and blocks. Even when the statistical model is nonlinear, it can be approximated by a linear model for which an analysis of variance may be appropriate.

Independence of cases: this is an assumption of the model that simplifies the statistical analysis.
Normality: the distributions of the residuals are normal.
Equality (or "homogeneity") of variances, called homoscedasticity: the variance of data in groups should be the same. Model-based approaches usually assume that the variance is constant. The constant-variance property also appears in the randomization (design-based) analysis of randomized experiments, where it is a necessary consequence of the randomized design.
MEANS:

Case Processing Summary

                      Cases
                      Included     Excluded    Total
                      N   Percent  N  Percent  N   Percent
EXPENDITURE * INCOME  30  100.0%   0  .0%      30  100.0%
GOODNESS OF FIT:

The goodness of fit of a statistical model describes how well it fits a set of observations. Measures of goodness of fit typically summarize the discrepancy between observed values and the values expected under the model in question. Such measures can be used in statistical hypothesis testing.

CHI-SQUARE AS GOODNESS OF FIT

When an analyst attempts to fit a statistical model to observed data, he or she may wonder how well the model actually reflects the data. How "close" are the observed values to those which would be expected under the fitted model? One statistical test that addresses this issue is the chi-square goodness of fit test.

Test Statistics

             INCOME   EXPENDITURE
Chi-Square   .000(a)  .933(b)
df           29       28
Asymp. Sig.  1.000    1.000

a) 30 cells (100.0%) have expected frequencies less than 5. The minimum expected cell frequency is 1.0.
b) 29 cells (100.0%) have expected frequencies less than 5. The minimum expected cell frequency is 1.0.
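The .933 chi-square value for EXPENDITURE can be reproduced directly: with 29 distinct observed values, the test compares the observed counts against equal expected frequencies of 30/29 per category. A pure-Python sketch:

```python
from collections import Counter

expenditure = [5000, 9500, 14500, 18500, 19000, 27000, 30500, 35000,
               39000, 45500, 49500, 52000, 55000, 59000, 64000, 69500,
               73000, 78500, 81000, 84700, 90000, 90000, 90500, 93000,
               94800, 95750, 98000, 100000, 104590, 110000]

obs = list(Counter(expenditure).values())    # 29 categories; 90,000 occurs twice
expected = sum(obs) / len(obs)               # uniform expected count, 30/29
chi2 = sum((o - expected) ** 2 / expected for o in obs)
df = len(obs) - 1
print(df, round(chi2, 3))   # 28 0.933
```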
Observed and expected frequencies (chi-square goodness of fit):

INCOME: each of the 30 observed values (5,000 through 150,000) has Observed N = 1, Expected N = 1 and Residual = 0; Total N = 30.

EXPENDITURES: the value 90,000 has Observed N = 2, Expected N = 1 and Residual = 1; each of the other 28 observed values has Observed N = 1, Expected N = 1 and Residual = 0; Total N = 30.
CORRELATION:

Dependence refers to any statistical relationship between two random variables or two sets of data. Correlation refers to any of a broad class of statistical relationships involving dependence. Familiar examples of dependent phenomena include the correlation between the physical statures of parents and their offspring, and the correlation between the demand for a product and its price.

Correlations are useful because they can indicate a predictive relationship that can be exploited in practice. For example, an electrical utility may produce less power on a mild day based on the correlation between electricity demand and weather. In this example there is a causal relationship.
Descriptive Statistics

             Mean      Std. Deviation  N
INCOME       77500.00  44017.042       30
EXPENDITURE  62544.67  32055.690       30

Correlations

                                          INCOME           EXPENDITURE
INCOME       Pearson Correlation          1                .990(**)
             Sig. (2-tailed)                               .000
             Sum of Squares and
             Cross-products               56187500000.000  40526700000.000
             Covariance                   1937500000.000   1397472413.793
             N                            30               30
EXPENDITURE  Pearson Correlation          .990(**)         1
             Sig. (2-tailed)              .000
             Sum of Squares and
             Cross-products               40526700000.000  29799450746.667
             Covariance                   1397472413.793   1027567267.126
             N                            30               30

** Correlation is significant at the 0.01 level (2-tailed).
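The .990 Pearson correlation follows from the sums of squares and cross-products shown in the table, since r = Sxy / sqrt(Sxx * Syy). A pure-Python check on the project's data:

```python
import math

income = [5000 * i for i in range(1, 31)]
expenditure = [5000, 9500, 14500, 18500, 19000, 27000, 30500, 35000,
               39000, 45500, 49500, 52000, 55000, 59000, 64000, 69500,
               73000, 78500, 81000, 84700, 90000, 90000, 90500, 93000,
               94800, 95750, 98000, 100000, 104590, 110000]

n = len(income)
xbar = sum(income) / n
ybar = sum(expenditure) / n
sxx = sum((x - xbar) ** 2 for x in income)
syy = sum((y - ybar) ** 2 for y in expenditure)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(income, expenditure))
r = sxy / math.sqrt(sxx * syy)   # Pearson product-moment correlation
print(round(r, 3))
```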
CORRELATION COEFFICIENT:

Correlation coefficient may refer to:

Pearson product-moment correlation coefficient, also known as r, R, or Pearson's r: a measure of the strength of the linear relationship between two variables, defined as the (sample) covariance of the variables divided by the product of their (sample) standard deviations.

Correlation and dependence: a broad class of statistical relationships between two or more random variables or observed data values.

Goodness of fit: any of several measures that assess how well a statistical model fits observations by summarizing the discrepancy between observed values and the values expected under the model in question.

Coefficient of determination: a measure of the proportion of variability in a data set that is accounted for by a statistical model; often called R2; equal, in a single-variable linear regression, to the square of Pearson's product-moment correlation coefficient.
Coefficient Correlations(a)

Model 1                INCOME
Correlations  INCOME   1.000
Covariances   INCOME   .000

a) Dependent Variable: EXPENDITURE

Collinearity Diagnostics(a)

Model 1                                     Variance Proportions
Dimension  Eigenvalue  Condition Index      (Constant)  INCOME
1          1.873       1.000                .06         .06
2          .127        3.842                .94         .94

a) Dependent Variable: EXPENDITURE
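The eigenvalues and condition indices above can be reproduced by hand: SPSS scales each column of X = [1, INCOME] to unit length and takes the eigenvalues of the resulting 2 x 2 cross-product matrix, which has ones on the diagonal and a single off-diagonal element c, giving eigenvalues 1 + c and 1 - c. A pure-Python sketch under that assumption:

```python
import math

income = [5000 * i for i in range(1, 31)]
n = len(income)

# Off-diagonal element of the scaled cross-product matrix [[1, c], [c, 1]]:
# inner product of the scaled constant column and the scaled INCOME column.
c = sum(income) / (math.sqrt(n) * math.sqrt(sum(x * x for x in income)))
eigenvalues = [1 + c, 1 - c]                 # closed form for [[1, c], [c, 1]]
cond_index = [math.sqrt(eigenvalues[0] / e) for e in eigenvalues]
print([round(e, 3) for e in eigenvalues])    # about [1.873, 0.127]
print([round(v, 3) for v in cond_index])     # about [1.0, 3.842]
```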
RESIDUALS:

Residuals Statistics(a)

                      Minimum    Maximum    Mean      Std. Deviation  N
Predicted Value       10252.15   114837.18  62544.67  31748.440       30
Residual              -7624.422  7620.241   .000      4427.622        30
Std. Predicted Value  -1.647     1.647      .000      1.000           30
Std. Residual         -1.692     1.691      .000      .983            30

a) Dependent Variable: EXPENDITURE
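The residual row of the table can be verified by refitting the line and summarizing the residuals. In this pure-Python sketch the residual mean is zero by construction of OLS, and the sample standard deviation matches the 4427.622 reported above.

```python
import math

income = [5000 * i for i in range(1, 31)]
expenditure = [5000, 9500, 14500, 18500, 19000, 27000, 30500, 35000,
               39000, 45500, 49500, 52000, 55000, 59000, 64000, 69500,
               73000, 78500, 81000, 84700, 90000, 90000, 90500, 93000,
               94800, 95750, 98000, 100000, 104590, 110000]

n = len(income)
xbar = sum(income) / n
ybar = sum(expenditure) / n
slope = (sum((x - xbar) * (y - ybar) for x, y in zip(income, expenditure))
         / sum((x - xbar) ** 2 for x in income))
intercept = ybar - slope * xbar

residuals = [y - (intercept + slope * x) for x, y in zip(income, expenditure)]
mean_resid = sum(residuals) / n                      # zero up to rounding
sd_resid = math.sqrt(sum((e - mean_resid) ** 2 for e in residuals) / (n - 1))
print(round(mean_resid, 6), round(sd_resid, 3))
```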
CHARTS:

[Histogram of the regression standardized residuals (Dependent Variable: EXPENDITURE). Mean = -1.04E-16, Std. Dev. = 0.983, N = 30.]

[Normal P-P plot of the regression standardized residuals (Dependent Variable: EXPENDITURE): expected cumulative probability against observed cumulative probability.]

[Normal P-P plot of INCOME (transform: natural log): expected cumulative probability against observed cumulative probability.]

[Drop-line plot of EXPENDITURE against INCOME; dots/lines show modes.]
CLASSICAL NORMAL LINEAR REGRESSION MODEL:

Econometrics is all about causality. Economics is full of theories of how one thing causes another: increases in prices cause demand to decrease, better education causes people to become richer, etc. To be able to test such theories, economists find data (such as the price and quantity of a good, or observations on a population's education and wealth levels). Raw data always comes out looking like a cloud, and without proper techniques it is impossible to determine whether this cloud gives any useful information. Econometrics is a tool to establish correlation and, hopefully later, causality, using collected data points. We do this by creating an explanatory function from the data. The function is a linear model and is estimated by minimizing the squared distance from the data to the line. The distance is treated as an error term. This is the process of linear regression.

ASSUMPTIONS UNDERLYING THE CLASSICAL NORMAL LINEAR REGRESSION MODEL

There are 5 critical assumptions relating to the CLRM. These assumptions are required to show that the estimation technique, Ordinary Least Squares (OLS), has a number of desirable properties, and also so that hypothesis tests regarding the coefficient estimates can validly be conducted.

CRITICAL ASSUMPTIONS:

The errors have zero mean.
The variance of the errors is constant and finite over all values of X.
The errors are statistically independent of one another.
There is no relationship between the error and the corresponding X.
The error term is normally distributed.

DETAILED ASSUMPTIONS

The regression model is linear in parameters.
The values of the regressors, the X's (independent variables), are fixed in repeated samples.
For given values of the X's, the mean value of the errors equals zero.
For given values of the X's, the variance of the errors is constant.
For given values of the X's, there is no autocorrelation.
If the X's are stochastic, the errors and the X's are not correlated.
The number of observations is greater than the number of independent variables.
There is sufficient variability in the values of the X's.
The regression model is correctly specified.
There is no perfect multicollinearity.
The error term is normally distributed.
T-TEST:

A t-test is any statistical hypothesis test in which the test statistic follows a Student's t distribution if the null hypothesis is supported. It is most commonly applied when the test statistic would follow a normal distribution if the value of a scaling term in the test statistic were known.
One-Sample Statistics

             N   Mean      Std. Deviation  Std. Error Mean
INCOME       30  77500.00  44017.042       8036.376
EXPENDITURE  30  62544.67  32055.690       5852.542

One-Sample Test (Test Value = 0)

                                                95% Confidence Interval
                                                of the Difference
             t       df  Sig.        Mean       Lower     Upper
                         (2-tailed)  Difference
INCOME       9.644   29  .000        77500.000  61063.77  93936.23
EXPENDITURE  10.687  29  .000        62544.667  50574.88  74514.46
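The one-sample t values follow from t = mean / (SD / sqrt(n)) with df = n - 1. A pure-Python check against the table (the helper name `one_sample_t` is illustrative):

```python
import math

income = [5000 * i for i in range(1, 31)]
expenditure = [5000, 9500, 14500, 18500, 19000, 27000, 30500, 35000,
               39000, 45500, 49500, 52000, 55000, 59000, 64000, 69500,
               73000, 78500, 81000, 84700, 90000, 90000, 90500, 93000,
               94800, 95750, 98000, 100000, 104590, 110000]

def one_sample_t(xs):
    """t statistic and df for H0: population mean = 0."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return mean / (sd / math.sqrt(n)), n - 1

t_inc, df = one_sample_t(income)
t_exp, _ = one_sample_t(expenditure)
print(round(t_inc, 3), round(t_exp, 3), df)   # about 9.644, 10.687, 29
```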
ANOVA (EXPENDITURE)

                                        Sum of Squares   df  Mean Square      F  Sig.
Between Groups  (Combined)              29799450746.667  29  1027567267.126   .  .
                Linear Term  Contrast   29230939495.261  1   29230939495.261  .  .
                             Deviation  568511251.405    28  20303973.264     .  .
Within Groups                           .000             0   .
Total                                   29799450746.667  29

(With every income value defining its own group there are no within-group degrees of freedom, so F and its significance cannot be computed here.)
USES:

Among the most frequently used t-tests are:

• A one-sample location test of whether the mean of a normally distributed population has a value specified in a null hypothesis.

• A two-sample location test of the null hypothesis that the means of two normally distributed populations are equal. All such tests are usually called Student's t-tests, though strictly speaking that name should only be used if the variances of the two populations are also assumed to be equal; the form of the test used when this assumption is dropped is sometimes called Welch's t-test. These tests are often referred to as "unpaired" or "independent samples" t-tests, as they are typically applied when the statistical units underlying the two samples being compared are non-overlapping.

• A test of the null hypothesis that the difference between two responses measured on the same statistical unit has a mean value of zero. For example, suppose we measure the size of a cancer patient's tumor before and after a treatment. If the treatment is effective, we expect the tumor size for many of the patients to be smaller following the treatment. This is often referred to as the "paired" or "repeated measures" t-test.

• A test of whether the slope of a regression line differs significantly from 0.
TYPES: UNPAIRED & PAIRED TWO-SAMPLE T-TESTS

Two-sample t-tests for a difference in mean can be either unpaired or paired. Paired t-tests are a form of blocking, and have greater power than unpaired tests when the paired units are similar with respect to "noise factors" that are independent of membership in the two groups being compared. In a different context, paired t-tests can be used to reduce the effects of confounding factors in an observational study.

The unpaired, or "independent samples", t-test is used when two separate sets of independent and identically distributed samples are obtained, one from each of the two populations being compared. For example, suppose we are evaluating the effect of a medical treatment, and we enroll 100 subjects into our study, then randomize 50 subjects to the treatment group and 50 subjects to the control group. In this case, we have two independent samples and would use the unpaired form of the t-test. The randomization is not essential here: if we contacted 100 people by phone, obtained each person's age and gender, and then used a two-sample t-test to see whether the mean ages differ by gender, this would also be an independent-samples t-test, even though the data are observational.

Dependent-samples (or "paired") t-tests typically consist of a sample of matched pairs of similar units, or one group of units that has been tested twice (a "repeated measures" t-test). A typical example of the repeated-measures t-test would be where subjects are tested prior to a treatment, say for high blood pressure, and the same subjects are tested again after treatment with a blood-pressure-lowering medication.

A dependent t-test based on a "matched-pairs sample" results from an unpaired sample that is subsequently used to form a paired sample, by using additional variables that were measured along with the variable of interest. The matching is carried out by identifying pairs of values consisting of one observation from each of the two samples, where the pair is similar in terms of other measured variables. This approach is often used in observational studies to reduce or eliminate the effects of confounding factors.
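The two forms of the test described above can be sketched by computing the t statistics by hand. This is an illustrative example (the data below are hypothetical, not from this project): the unpaired statistic pools the two sample variances, while the paired statistic is just a one-sample test on the within-pair differences.

```python
# Illustrative sketch: unpaired (pooled-variance) and paired t statistics,
# computed with the standard library only. Data are made up for the example.
import math
import statistics

def unpaired_t(a, b):
    """Student's t for two independent samples, assuming equal variances."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)  # pooled variance
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))

def paired_t(before, after):
    """Matched-pairs t: a one-sample test on the differences."""
    d = [x - y for x, y in zip(before, after)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))

# Hypothetical blood-pressure readings for the repeated-measures example
before = [150, 160, 155, 148, 162]
after = [140, 150, 150, 145, 155]
print(paired_t(before, after))  # positive: pressure fell after treatment
```

Note how the paired version discards the between-subject variation entirely, which is the source of its extra power when pairs are well matched.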
SUMMARY:
Case Processing Summary (a)

              Included        Excluded        Total
              N    Percent    N    Percent    N    Percent
INCOME        30   100.0%     0    .0%        30   100.0%
EXPENDITURE   30   100.0%     0    .0%        30   100.0%

a) Limited to first 100 cases.
Case Summaries (a)

Case Number   INCOME    EXPENDITURE
1             5000      5000
2             10000     9500
3             15000     14500
4             20000     18500
5             25000     19000
6             30000     27000
7             35000     30500
8             40000     35000
9             45000     39000
10            50000     45500
11            55000     49500
12            60000     52000
13            65000     55000
14            70000     59000
15            75000     64000
16            80000     69500
17            85000     73000
18            90000     78500
19            95000     81000
20            100000    84700
21            105000    90000
22            110000    90000
23            115000    90500
24            120000    93000
25            125000    94800
26            130000    95750
27            135000    98000
28            140000    100000
29            145000    104590
30            150000    110000

Total   Mean       77500.00          62544.67
        Minimum    5000              5000
        Maximum    150000            110000
        Range      145000            105000
        Variance   1937500000.000    1027567267.126
        N          30                30

a) Limited to first 100 cases.
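Because the incomes in the sample form an arithmetic sequence (5000, 10000, ..., 150000), the income column of the total row can be verified directly. A short sketch using Python's statistics module (for the sequence 5000*i with i = 1..30, the sample variance is 5000 squared times the variance of 1..30, which is n(n+1)/12 = 77.5):

```python
# Verify the income column of the Case Summaries total row.
import statistics

income = [5000 * i for i in range(1, 31)]

print(statistics.mean(income))      # matches the reported mean of 77500.00
print(min(income), max(income))     # reported minimum 5000, maximum 150000
print(max(income) - min(income))    # reported range of 145000
print(statistics.variance(income))  # reported variance of 1937500000.000
```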
CONCLUSION: From all of the above discussion, we find that monthly expenditures are dependent on total monthly income, and the variation left unexplained by income is very low. Since a person who earns both makes expenditures and saves the surplus amount, total monthly income breaks up into expenditures and savings.