Group Assignment
Subject Name: Statistics for Business Decisions
Subject Code: HI6007
Lecturer Name: Dr Serguei Mikhailitchenko
Prepared By:
1 SHOUGANA G CHEN (DC5003)
2 MAULIK THAKAR (EGU8594)
3 KIRANDEEP KAUR (EMV8687)
Task 1
The data for Task 1 in the data file for the assignment represent the starting costs, in thousands of dollars, for different kinds of business.
1. Descriptive statistics
Statistic | X1 | X2 | X3 | X4 | X5
Mean | 83 | 92.090909 | 72.3 | 87 | 51.625
Standard Error | 9.46722 | 11.726779 | 9.918613 | 11.3539 | 6.76872
Median | 80 | 87 | 70 | 97.5 | 49
Mode | 35 | #N/A | #N/A | 100 | 30
Standard Deviation | 34.1345 | 38.893327 | 31.36541 | 35.9042 | 27.0749
Sample Variance | 1165.17 | 1512.6909 | 983.7889 | 1289.11 | 733.05
Kurtosis | 1.04192 | 0.4369227 | -0.95897 | -0.4857 | 0.47673
Skewness | 0.13297 | 0.5098441 | 0.546078 | 0.07729 | 0.63311
Range | 105 | 120 | 90 | 115 | 90
Minimum | 35 | 40 | 35 | 35 | 20
Maximum | 140 | 160 | 125 | 150 | 110
Sum | 1079 | 1013 | 723 | 870 | 826
Count | 13 | 11 | 10 | 10 | 16
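As a quick consistency check on the descriptive statistics, the standard error of each mean should equal the standard deviation divided by the square root of the count. The sketch below recomputes it from the reported values; the assignment of each set of figures to X1 through X5 is an assumption based on the order in which they appear, and the names `SUMMARY` and `standard_error` are illustrative only.

```python
import math

# (standard deviation, count, reported standard error) as reported above.
# Assumption: the five sets of figures correspond to X1..X5 in this order.
SUMMARY = {
    "X1": (34.1345, 13, 9.46722),
    "X2": (38.893327, 11, 11.726779),
    "X3": (31.36541, 10, 9.918613),
    "X4": (35.9042, 10, 11.3539),
    "X5": (27.0749, 16, 6.76872),
}


def standard_error(std_dev, count):
    """Standard error of the mean: s / sqrt(n)."""
    return std_dev / math.sqrt(count)


for name, (std_dev, count, reported) in SUMMARY.items():
    recomputed = standard_error(std_dev, count)
    print(f"{name}: recomputed {recomputed:.4f}, reported {reported}")
```

If the recomputed and reported values agree to a few decimal places, the standard deviation, count and standard error rows are mutually consistent.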
2.
a) Frequency and relative frequency distributions
b) Relative frequency histograms
For X1: [Figure: Relative Frequency Histogram of X1; horizontal axis: Range (30, 60, 90, 120, 150, 180); vertical axis: Relative Frequency]
For X2: [Figure: Relative Frequency Histogram of X2; horizontal axis: Range (30, 60, 90, 120, 150, 180); vertical axis: Relative Frequency]
For X3: [Figure: Relative Frequency Histogram of X3; horizontal axis: Range (30, 60, 90, 120, 150, 180); vertical axis: Relative Frequency]
For X4: [Figure: Relative Frequency Histogram of X4; horizontal axis: Range (30, 60, 90, 120, 150, 180); vertical axis: Relative Frequency]
For X5: [Figure: Relative Frequency Histogram of X5; horizontal axis: Range (30, 60, 90, 120, 150, 180); vertical axis: Relative Frequency]
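The frequency and relative frequency distributions behind histograms like these can be tabulated as in the following minimal sketch. It assumes Excel-style bins, where each axis value (30, 60, ..., 180) is an inclusive upper limit; the function name `relative_frequencies` and the sample values are hypothetical and are not the assignment data.

```python
from bisect import bisect_left

# Bin upper limits taken from the histogram axis (assumed inclusive, as in
# Excel's Histogram tool).
UPPER_LIMITS = [30, 60, 90, 120, 150, 180]


def relative_frequencies(values, upper_limits=UPPER_LIMITS):
    """Return (counts, relative frequencies) per bin."""
    counts = [0] * len(upper_limits)
    for value in values:
        index = bisect_left(upper_limits, value)  # first limit >= value
        if index == len(upper_limits):
            raise ValueError(f"{value} exceeds the largest bin limit")
        counts[index] += 1
    total = len(values)
    return counts, [count / total for count in counts]


# Hypothetical start-up costs in $'000, for illustration only.
sample_costs = [35, 58, 80, 97, 125, 140]
counts, shares = relative_frequencies(sample_costs)
for limit, count, share in zip(UPPER_LIMITS, counts, shares):
    print(f"<= {limit}: frequency {count}, relative frequency {share:.3f}")
```

The relative frequencies always sum to 1, which is a useful check before plotting them.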
3. Results obtained in parts 1 and 2
Parts 1 and 2 give the descriptive statistics and the frequency distributions of the given data set. From the part 1 results, we can determine that the mean start-up cost is highest for baker/donuts (X2) and lowest for pet stores (X5). The range is also highest for baker/donuts (X2), which implies that the mean of this data set may not be representative of the data. By contrast, the range for shoe stores (X3) and pet stores (X5) is lower, as there is less difference between the individual scores. In addition, the higher variance and standard deviation of the start-up costs for baker/donuts (X2) indicate wide dispersion and the possible existence of outliers (Newbold et al. 2012). On the other hand, these values are lowest for pet stores (X5), showing low variability in the data set in relation to the mean.
At the same time, the outcomes of part 2 show the frequency and relative frequency of the given data set. They suggest that the start-up costs for baker/donuts (X2) contain outliers and show higher variability in relation to the average value. The distribution is right-skewed (the skewness in part 1 is positive, about 0.51) because of the presence of outliers in the data set (Weiers, 2010). By contrast, the distribution for pet stores (X5) appears closer to a normal distribution, with low variability in relation to the mean.
4. Test whether there is a significant difference in the starting costs for these types of business.
H0: There is no difference in the starting costs for these types of business.
H1: There is a significant difference in the starting costs for these types of business.
Test: one-way ANOVA (F test)
Results: F value > F critical; p-value (0.018) < significance level (0.05)
The null hypothesis is rejected, which means there is a significant difference in the starting costs for these types of business.
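The F value for this test can be recovered from the group counts, means and sample variances reported in part 1, without the raw data. The sketch below is one way to do this; it assumes the five sets of summary figures map to X1 through X5 in the order reported, and the function name `anova_from_summary` is illustrative.

```python
# (count, mean, sample variance) per business type, from part 1.
# Assumption: the figures correspond to X1..X5 in this order.
GROUPS = {
    "X1": (13, 83.0, 1165.17),
    "X2": (11, 92.090909, 1512.6909),
    "X3": (10, 72.3, 983.7889),
    "X4": (10, 87.0, 1289.11),
    "X5": (16, 51.625, 733.05),
}


def anova_from_summary(groups):
    """One-way ANOVA F statistic from per-group summary statistics."""
    total_n = sum(n for n, _, _ in groups.values())
    grand_mean = sum(n * mean for n, mean, _ in groups.values()) / total_n
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups.values())
    # Within-group sum of squares: (n - 1) * sample variance for each group.
    ss_within = sum((n - 1) * var for n, _, var in groups.values())
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


f_value, df_between, df_within = anova_from_summary(GROUPS)
print(f"F = {f_value:.3f} with ({df_between}, {df_within}) degrees of freedom")
```

The resulting F value can then be compared with the critical value from an F table at the 0.05 level for the same degrees of freedom.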
Task 2
The data for Task 2 in the data file for the assignment represent the following variables for franchisees of All Greens Pty Ltd: annual sales ($'000), floor area (sq. ft. '000), inventory ($'000), advertising expenditure ($'000), the size of the area where the business operates (number of families, '000) and the number of competitors in the area.
Data: Table 1 contains the data for Task 2 (All Greens Franchise).
The data (X1, X2, X3, X4, X5, X6) are for each franchise store.
X1 = annual net sales/$1000
X2 = number sq. ft./1000
X3 = inventory/$1000
X4 = amount spent on advertising/$1000
X5 = size of sales district/1000 families
X6 = number of competing stores in district
X1 | X2 | X3 | X4 | X5 | X6
231 | 3 | 294 | 8.2 | 8.2 | 11
156 | 2.2 | 232 | 6.9 | 4.1 | 12
10 | 0.5 | 149 | 3 | 4.3 | 15
519 | 5.5 | 600 | 12 | 16.1 | 1
437 | 4.4 | 567 | 10.6 | 14.1 | 5
487 | 4.8 | 571 | 11.8 | 12.7 | 4
299 | 3.1 | 512 | 8.1 | 10.1 | 10
195 | 2.5 | 347 | 7.7 | 8.4 | 12
20 | 1.2 | 212 | 3.3 | 2.1 | 15
68 | 0.6 | 102 | 4.9 | 4.7 | 8
570 | 5.4 | 788 | 17.4 | 12.3 | 1
428 | 4.2 | 577 | 10.5 | 14 | 7
464 | 4.7 | 535 | 11.3 | 15 | 3
15 | 0.6 | 163 | 2.5 | 2.5 | 14
65 | 1.2 | 168 | 4.7 | 3.3 | 11
98 | 1.6 | 151 | 4.6 | 2.7 | 10
398 | 4.3 | 342 | 5.5 | 16 | 4
161 | 2.6 | 196 | 7.2 | 6.3 | 13
397 | 3.8 | 453 | 10.4 | 13.9 | 7
497 | 5.3 | 518 | 11.5 | 16.3 | 1
528 | 5.6 | 615 | 12.3 | 16 | 0
99 | 0.8 | 278 | 2.8 | 6.5 | 14
0.5 | 1.1 | 142 | 3.1 | 1.6 | 12
347 | 3.6 | 461 | 9.6 | 11.3 | 6
341 | 3.5 | 382 | 9.8 | 11.5 | 5
507 | 5.1 | 590 | 12 | 15.7 | 0
400 | 8.6 | 517 | 7 | 12 | 8
Table 1: Original data (source listed at the end of the document)
We have one dependent variable y, which is annual net sales, and five independent variables: floor area (number of sq. ft.), inventory, amount spent on advertising, size of sales district and number of competing stores in the district. A multiple regression model is therefore appropriate for this task.
1. Table 2 is the Microsoft Excel regression output for annual net sales (y) and number sq. ft. (x2), inventory (x3), amount spent on advertising (x4), size of sales district (x5), and number of competing stores in district (x6).
SUMMARY OUTPUT
Regression Statistics
Multiple R | 0.996583914
R Square | 0.993179497
Adjusted R Square | 0.991555568
Standard Error | 17.64924165
Observations | 27
ANOVA
Source | df | SS | MS | F | Significance F
Regression | 5 | 952538.9415 | 190507.7883 | 611.5903672 | 5.39731E-22
Residual | 21 | 6541.410344 | 311.4957306 | |
Total | 26 | 959080.3519 | | |
Coefficients
Variable | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
Intercept | -18.85941416 | 30.15022791 | -0.625514812 | 0.538372333 | -81.56024554 | 43.84141723
X2 | 16.20157356 | 3.544437306 | 4.570986073 | 0.000165985 | 8.830512669 | 23.57263445
X3 | 0.174635154 | 0.057606068 | 3.031540961 | 0.006346793 | 0.054836778 | 0.294433531
X4 | 11.52626903 | 2.5321033 | 4.55205324 | 0.000173652 | 6.260471952 | 16.79206611
X5 | 13.5803129 | 1.770456609 | 7.670514392 | 1.60543E-07 | 9.898446822 | 17.26217897
X6 | -5.31097141 | 1.70542654 | -3.114160174 | 0.005248873 | -8.857600053 | -1.764342766
Table 2: MS Excel regression output for annual net sales
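Several figures in Table 2 are linked by simple formulas, so they can be cross-checked against each other. The sketch below recomputes Adjusted R Square, the F statistic, and each t Stat and 95% interval from the other reported values. The critical value `T_CRITICAL` (two-tailed, 5%, 21 degrees of freedom) is an assumption taken from a standard t table, not from the Excel output, and the function names are illustrative.

```python
# Values copied from Table 2.
N_OBS = 27  # observations
K = 5       # independent variables
R_SQUARE = 0.993179497
SS_REGRESSION, DF_REGRESSION = 952538.9415, 5
SS_RESIDUAL, DF_RESIDUAL = 6541.410344, 21

# name: (coefficient, standard error)
COEFFICIENTS = {
    "Intercept": (-18.85941416, 30.15022791),
    "X2": (16.20157356, 3.544437306),
    "X3": (0.174635154, 0.057606068),
    "X4": (11.52626903, 2.5321033),
    "X5": (13.5803129, 1.770456609),
    "X6": (-5.31097141, 1.70542654),
}

# Assumption: t critical value for 21 degrees of freedom from a t table.
T_CRITICAL = 2.0796


def adjusted_r_square(r_square, n_obs, k):
    """Adjusted R Square: penalises R Square for the number of predictors."""
    return 1 - (1 - r_square) * (n_obs - 1) / (n_obs - k - 1)


def f_statistic(ss_reg, df_reg, ss_res, df_res):
    """F = mean square regression / mean square residual."""
    return (ss_reg / df_reg) / (ss_res / df_res)


def t_stat_and_interval(coefficient, std_error, t_critical=T_CRITICAL):
    """Return (t Stat, lower bound, upper bound) for one coefficient."""
    margin = t_critical * std_error
    return coefficient / std_error, coefficient - margin, coefficient + margin


print(f"Adjusted R Square: {adjusted_r_square(R_SQUARE, N_OBS, K):.6f}")
print(f"F: {f_statistic(SS_REGRESSION, DF_REGRESSION, SS_RESIDUAL, DF_RESIDUAL):.2f}")
for name, (coefficient, std_error) in COEFFICIENTS.items():
    t_stat, low, high = t_stat_and_interval(coefficient, std_error)
    print(f"{name}: t = {t_stat:.3f}, 95% interval = [{low:.3f}, {high:.3f}]")
```

Small differences from the Excel figures in the last decimal places are expected, because the critical value is rounded to four decimals.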
From the Excel output we can get the estimated regression equation, which is:
$$\hat{y} = -18.86 + 16.20x_2 + 0.17x_3 + 11.53x_4 + 13.58x_5 - 5.31x_6$$
2. R Square ($R^2$), the coefficient of determination, tells us what proportion of the variation in y is explained by the regression. Because we have more than one x variable, we should use Adjusted R Square. Our Adjusted R Square is 0.99, which means that about 99% of the variation of the y-values around their mean is explained by the x-values. Therefore, the model fits the data very well.
3. Annual net sales (y) is the dependent variable, and the others are independent variables.
1) $H_0: \beta_2 = \beta_3 = \beta_4 = \beta_5 = \beta_6 = 0$
$H_1$: at least one $\beta_i \neq 0$
2) $\alpha = 0.05$
3) P-value (Significance F) = 5.39731E-22 ≈ 0 < $\alpha$ = 0.05
4) Conclusion: Reject $H_0$ at $\alpha$ = 0.05; there is not sufficient evidence to support $\beta_2 = \beta_3 = \beta_4 = \beta_5 = \beta_6 = 0$. Therefore, there is a significant relationship between the dependent variable and at least one of the independent variables.
Alternatively, Table 3 shows the p-value test and results for each independent variable.
Dependent | Independent | p-value | Test | Relationship
Annual sales | Area | 0.000 | p-value < 0.05 | Significant relationship exists
Annual sales | Inventory | 0.006 | p-value < 0.05 | Significant relationship exists
Annual sales | Advertising spending | 0.000 | p-value < 0.05 | Significant relationship exists
Annual sales | Size of sales district | 0.000 | p-value < 0.05 | Significant relationship exists
Annual sales | Number of competing stores | 0.005 | p-value < 0.05 | Significant relationship exists
Table 3: P-value test and results
4. Table 4 shows the interpretation of individual slope coefficients.
Variables | Coefficients | Interpretation
Annual net sales (intercept) | -18.86 | Setting x2 = x3 = x4 = x5 = x6 = 0 gives y = -18.86, which would mean that with no store area, inventory, advertising expense, sales district or competing stores in the district, sales would be -18.86 thousand dollars. Obviously, this is meaningless.
Number sq. ft. | 16.20 | Each additional 1,000 sq. ft. of store area increases sales by 16.20 thousand dollars, holding the other variables constant.
Inventory | 0.17 | Each additional 1 thousand dollars of inventory increases sales by 170 dollars.
Advertising expense | 11.53 | Each additional 1 thousand dollars of advertising expense increases sales by 11.53 thousand dollars.
Size of sales district | 13.58 | If the sales district grows by 1,000 families, sales increase by 13.58 thousand dollars.
Number of competing stores in district | -5.31 | Each additional competing store in the sales district decreases sales by 5.31 thousand dollars.
Table 4: Interpretation of individual slope coefficients
5. Table 5 shows the 95% interval for the slope coefficients of individual variables.
Variables | Lowest slope coefficient | Highest slope coefficient
Annual net sales (intercept) | -81.56024554 | 43.84141723
Store area | 8.830512669 | 23.57263445
Inventory | 0.054836778 | 0.294433531
Advertising expense | 6.260471952 | 16.79206611
Size of sales district | 9.898446822 | 17.26217897
Number of competing stores in district | -8.857600053 | -1.764342766
Table 5: The interval for slope coefficients
6.
Variables | t-stat | t-critical | Decision on H0 (reject H0 if t-stat > t-critical, otherwise accept it) | Result
Area | 7.7354 | 2.0555 | Rejected | Statistically significant
Inventory | -8.2877 | 2.0555 | Accepted | Not significant
Advertising spending | 7.6716 | 2.0555 | Rejected | Statistically significant
Size of sales market | 7.6869 | 2.0555 | Rejected | Statistically significant
Number of competing stores | 7.3719 | 2.0555 | Rejected | Statistically significant
Table 6: Test of the estimated slope coefficients for individual variables
Table 6 shows the results of testing the estimated slope coefficients for individual variables. From the table we can see that the estimated slope coefficients for all variables except inventory are significant.
7. From Table 6, inventory is not significant, so we remove it and get the result below in Table 7.
SUMMARY OUTPUT
Regression Statistics
Multiple R | 0.995085241
R Square | 0.990194637
Adjusted R Square | 0.988411844
Standard Error | 20.67511795
Observations | 27
ANOVA
Source | df | SS | MS | F | Significance F
Regression | 4 | 949676.2208 | 237419.0552 | 555.4175271 | 9.57989E-22
Residual | 22 | 9404.131054 | 427.4605024 | |
Total | 26 | 959080.3519 | | |
Coefficients
Variable | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
Intercept | -39.460022 | 34.41055873 | -1.14674168 | 0.263807827 | -110.8231531 | 31.90310896
X2 | 20.44388672 | 3.814801407 | 5.359095937 | 2.21824E-05 | 12.53247282 | 28.35530062
X4 | 16.96614275 | 2.092787626 | 8.10695865 | 4.73185E-08 | 12.62596685 | 21.30631864
X5 | 15.67296189 | 1.90985556 | 8.206359798 | 3.85791E-08 | 11.71216388 | 19.6337599
X6 | -4.04330128 | 1.936828415 | -2.08758879 | 0.048629066 | -8.060037571 | -0.026565
Table 7: MS Excel regression output after removing inventory
From the Excel output we can get the estimated regression equation, which is:
$$\hat{y} = -39.46 + 20.44x_2 + 16.97x_4 + 15.67x_5 - 4.04x_6$$
8.
$$\hat{y} = -39.46 + 20.44x_2 + 16.97x_4 + 15.67x_5 - 4.04x_6 = 183.59146$$
In thousands of dollars, this is a predicted annual net sales figure of $183,591.46.
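A prediction from the reduced model is a direct substitution into the equation from step 7. The sketch below shows the arithmetic using the rounded coefficients; the inputs are the first store in Table 1, chosen only for illustration, and are not the values used in step 8. The names `REDUCED_MODEL` and `predict_sales` are hypothetical.

```python
# Rounded coefficients of the reduced model (inventory removed).
REDUCED_MODEL = {
    "intercept": -39.46,
    "x2": 20.44,   # floor area, '000 sq. ft.
    "x4": 16.97,   # advertising, $'000
    "x5": 15.67,   # sales district size, '000 families
    "x6": -4.04,   # number of competing stores
}


def predict_sales(x2, x4, x5, x6, model=REDUCED_MODEL):
    """Predicted annual net sales in $'000 for one store."""
    return (
        model["intercept"]
        + model["x2"] * x2
        + model["x4"] * x4
        + model["x5"] * x5
        + model["x6"] * x6
    )


# First row of Table 1: X2 = 3, X4 = 8.2, X5 = 8.2, X6 = 11 (actual X1 = 231).
predicted = predict_sales(x2=3.0, x4=8.2, x5=8.2, x6=11)
print(f"Predicted annual net sales: {predicted:.3f} thousand dollars")
```

Comparing the prediction with the observed X1 for the same store gives the residual for that observation.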
Black, K., 2009. Business statistics: Contemporary decision making. USA: John Wiley & Sons.
Groebner, D.F., Shannon, P.W., Fry, P.C. and Smith, K.D., 2011. Business statistics: A decision making approach. UK: Prentice Hall/Pearson.
Newbold, P., Carlson, W. and Thorne, B., 2012. Statistics for business and economics. UK: Pearson.
Weiers, R.M., 2010. Introduction to business statistics. USA: Cengage Learning.
Source: http://college.cengage.com/mathematics/brase/understandable_statistics/7e/students/datasets/owan/frames/frame.html