Group Assignment HI6007 Statistics

Group Assignment
Subject Name: Statistics for Business Decisions
Subject Code: HI6007

Lecturer Name: Dr Serguei Mikhailitchenko

Prepared By:
1. SHOUGANG CHEN (DC5003)
2. MAULIK THAKAR (EGU8594)
3. KIRANDEEP KAUR (EMV8687)

Task 1

The data for Task 1 in the assignment data file represent the starting costs (in thousands of dollars) for different kinds of business.

1. Descriptive statistics of the starting costs ($'000)

Statistic             X1          X2           X3          X4          X5
Mean                  83          92.090909    72.3        87          51.625
Standard Error        9.46722     11.726779    9.918613    11.3539     6.76872
Median                80          87           70          97.5        49
Mode                  35          #N/A         #N/A        100         30
Standard Deviation    34.1345     38.893327    31.36541    35.9042     27.0749
Sample Variance       1165.17     1512.6909    983.7889    1289.11     733.05
Kurtosis              1.04192     0.4369227    -0.95897    -0.4857     0.47673
Skewness              0.13297     0.5098441    0.546078    0.07729     0.63311
Range                 105         120          90          115         90
Minimum               35          40           35          35          20
Maximum               140         160          125         150         110
Sum                   1079        1013         723         870         826
Count                 13          11           10          10          16
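Because Excel's summary statistics are linked by simple identities, a quick consistency check can catch transcription errors in the table above: mean = sum/count, standard error = SD/√count, sample variance = SD², and range = maximum − minimum. The sketch below is a minimal illustration; the names `SUMMARY` and `inconsistencies` are our own, and the relative tolerance of 1e-4 is an assumption chosen to allow for the rounding in the displayed values.

```python
from math import isclose, sqrt

# Values copied from the descriptive statistics table, per business type:
# (mean, standard error, standard deviation, sample variance, range, minimum, maximum, sum, count)
SUMMARY = {
    "X1": (83, 9.46722, 34.1345, 1165.17, 105, 35, 140, 1079, 13),
    "X2": (92.090909, 11.726779, 38.893327, 1512.6909, 120, 40, 160, 1013, 11),
    "X3": (72.3, 9.918613, 31.36541, 983.7889, 90, 35, 125, 723, 10),
    "X4": (87, 11.3539, 35.9042, 1289.11, 115, 35, 150, 870, 10),
    "X5": (51.625, 6.76872, 27.0749, 733.05, 90, 20, 110, 826, 16),
}


def inconsistencies(summary, rel_tol=1e-4):
    """Return (column, rule) pairs for every identity that does not hold."""
    problems = []
    for name, (mean, se, sd, var, rng, lo, hi, total, n) in summary.items():
        checks = {
            "mean = sum / count": isclose(mean, total / n, rel_tol=rel_tol),
            "SE = SD / sqrt(count)": isclose(se, sd / sqrt(n), rel_tol=rel_tol),
            "variance = SD^2": isclose(var, sd ** 2, rel_tol=rel_tol),
            "range = max - min": rng == hi - lo,
        }
        problems += [(name, rule) for rule, ok in checks.items() if not ok]
    return problems


print(inconsistencies(SUMMARY) or "all summary statistics are internally consistent")
```

For the five columns above every identity holds, so the reconstructed table is internally consistent.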

2.

a) Frequency and relative frequency distributions

b) Relative frequency histograms

For X1: [Figure: Relative Frequency Histogram of X1; x-axis: Range, 30 to 180; y-axis: Relative Frequency]
For X2: [Figure: Relative Frequency Histogram of X2; x-axis: Range, 30 to 180; y-axis: Relative Frequency]
For X3: [Figure: Relative Frequency Histogram of X3; x-axis: Range, 30 to 180; y-axis: Relative Frequency]
For X4: [Figure: Relative Frequency Histogram of X4; x-axis: Range, 30 to 180; y-axis: Relative Frequency]
For X5: [Figure: Relative Frequency Histogram of X5; x-axis: Range, 30 to 180; y-axis: Relative Frequency]
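The frequency tables behind these histograms are not reproduced in this section. As a hedged sketch, the helper below builds frequency and relative frequency distributions under the assumption that the axis labels 30, 60, ..., 180 ($'000) are Excel-style upper class limits: a value is counted in the first class whose limit it does not exceed. This reading is consistent with X5's minimum of 20 falling into the first class. `frequency_table` is a hypothetical name, and values above 180 go to an extra "More" class, as in Excel's Histogram tool.

```python
def frequency_table(values, upper_limits=(30, 60, 90, 120, 150, 180)):
    """Frequency and relative frequency per class, using Excel-style upper class limits.

    A value v falls in the first class whose upper limit satisfies v <= limit;
    values above the last limit are collected in an extra "More" class.
    """
    counts = [0] * (len(upper_limits) + 1)
    for v in values:
        for i, limit in enumerate(upper_limits):
            if v <= limit:
                counts[i] += 1
                break
        else:  # no break: v exceeds every limit
            counts[-1] += 1
    n = len(values)
    labels = [str(limit) for limit in upper_limits] + ["More"]
    return [(label, count, count / n) for label, count in zip(labels, counts)]
```

Calling `frequency_table` on each column of the Task 1 data (for example the 13 values of X1) gives the counts and relative frequencies that the histograms plot; the relative frequencies always sum to 1.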

3. Results obtained in parts 1 and 2

Parts 1 and 2 present the descriptive statistics and the frequency distributions of the data set. From the Part 1 results, the mean start-up cost is highest for baker/donuts (X2, 92.09 thousand dollars) and lowest for pet stores (X5, 51.63 thousand dollars). The range is also highest for baker/donuts (X2, 120), which suggests that the mean is less representative of the individual start-up costs for this business. By contrast, the ranges for shoe stores (X3) and pet stores (X5) are lower (90 each), as the individual values are closer together. In addition, the higher variance and standard deviation of the start-up costs for baker/donuts (X2) may indicate the presence of outliers (Newbold et al., 2012). On the other hand, these values are lowest for pet stores (X5, standard deviation 27.07), showing the least absolute variability in start-up costs.

The outcomes of Part 2 show the frequency and relative frequency of the given data set. They suggest that the start-up costs for baker/donuts (X2) may contain outliers, giving higher variability around the average value. The distribution is right-skewed (skewness 0.51), which is consistent with a few unusually high start-up costs in the upper tail (Weiers, 2010). The distribution for pet stores (X5) has lower absolute variability, although its skewness (0.63) shows that it is also mildly right-skewed rather than exactly normal.
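Standard deviation measures absolute spread, so comparing variability relative to the mean calls for the coefficient of variation, CV = s / x̄. The short sketch below computes it from the Part 1 figures; the dictionary name `SD_MEAN` is illustrative.

```python
# (sample standard deviation, mean) for each business type, from the Part 1 table ($'000)
SD_MEAN = {
    "X1": (34.1345, 83.0),
    "X2": (38.893327, 92.090909),
    "X3": (31.36541, 72.3),
    "X4": (35.9042, 87.0),
    "X5": (27.0749, 51.625),
}

# Coefficient of variation: spread measured relative to the mean
cv = {name: sd / mean for name, (sd, mean) in SD_MEAN.items()}

# Print from least to most relative variability
for name in sorted(cv, key=cv.get):
    sd, mean = SD_MEAN[name]
    print(f"{name}: SD = {sd:.2f}, mean = {mean:.2f}, CV = {cv[name]:.1%}")
```

On these figures X5 has the smallest standard deviation but the largest CV (about 52%), so pet-store start-up costs vary least in absolute terms but most relative to their mean.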

4. Test whether there is a significant difference in the starting costs for these types of business.

H0: There is no difference in the mean starting costs for these types of business.
H1: At least one type of business has a different mean starting cost.
Test: one-way ANOVA (F-test)

Results: F-value > F-critical, and p-value (0.018) < significance level (0.05).

The null hypothesis is rejected, which means there is a significant difference in the starting costs for these types of business.
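Assuming the test was Excel's single-factor ANOVA (the F-value/F-critical wording points that way), the F statistic can be reproduced from the Part 1 summary statistics alone, since it only needs each group's count, mean and sample variance. The stdlib-only sketch below does this; the function names are hypothetical. For the p-value it uses the closed form of the F upper tail, P(F > f) = x^a · Σ_{j<df1/2} [(a)_j / j!] (1 − x)^j with x = df2/(df2 + df1·f) and a = df2/2, which holds only when the numerator degrees of freedom are even, as here (k − 1 = 4).

```python
# (count, mean, sample variance) per business type, from the Part 1 descriptive statistics
GROUPS = {
    "X1": (13, 83.0, 1165.17),
    "X2": (11, 92.090909, 1512.6909),
    "X3": (10, 72.3, 983.7889),
    "X4": (10, 87.0, 1289.11),
    "X5": (16, 51.625, 733.05),
}


def one_way_anova(groups):
    """Return (F, df_between, df_within) from per-group summary statistics."""
    n_total = sum(n for n, _, _ in groups.values())
    k = len(groups)
    grand_mean = sum(n * mean for n, mean, _ in groups.values()) / n_total
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups.values())
    ss_within = sum((n - 1) * var for n, _, var in groups.values())
    df_between, df_within = k - 1, n_total - k
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


def f_upper_tail_even_df1(f_stat, df1, df2):
    """P(F > f_stat) for F(df1, df2); this closed form is valid only for even df1."""
    if df1 % 2:
        raise ValueError("numerator degrees of freedom must be even")
    x = df2 / (df2 + df1 * f_stat)
    a = df2 / 2
    term = total = 1.0
    for j in range(1, df1 // 2):
        term *= (a + j - 1) / j * (1 - x)  # next term of the finite series
        total += term
    return x ** a * total


def f_critical_even_df1(alpha, df1, df2):
    """Upper-tail critical value, found by bisection on the p-value function."""
    lo, hi = 0.0, 100.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if f_upper_tail_even_df1(mid, df1, df2) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


f_stat, df1, df2 = one_way_anova(GROUPS)
p_value = f_upper_tail_even_df1(f_stat, df1, df2)
f_crit = f_critical_even_df1(0.05, df1, df2)
print(f"F = {f_stat:.3f} on ({df1}, {df2}) df, p = {p_value:.3f}, F-critical = {f_crit:.3f}")
```

With these inputs the sketch gives F ≈ 3.25 on (4, 55) degrees of freedom, F-critical ≈ 2.54 and p ≈ 0.018, in line with the reported result.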

Task 2

The data for Task 2 in the assignment data file represent the following variables for franchisees of All Greens Pty Ltd: annual sales ($’000), the floor area (sq. ft. ’000), inventory ($’000), advertising expenditure ($’000), the size of the area where the business operates (number of families, ’000) and the number of competitors in the area.

Data: Table 1 gives the data for Task 2 (All Greens Franchise).

The data (X1, X2, X3, X4, X5, X6) are for each franchise store:
X1 = annual net sales/$1000
X2 = number of sq. ft./1000
X3 = inventory/$1000
X4 = amount spent on advertising/$1000
X5 = size of sales district/1000 families
X6 = number of competing stores in district

X1     X2    X3    X4     X5     X6
231    3     294   8.2    8.2    11
156    2.2   232   6.9    4.1    12
10     0.5   149   3      4.3    15
519    5.5   600   12     16.1   1
437    4.4   567   10.6   14.1   5
487    4.8   571   11.8   12.7   4
299    3.1   512   8.1    10.1   10
195    2.5   347   7.7    8.4    12
20     1.2   212   3.3    2.1    15
68     0.6   102   4.9    4.7    8
570    5.4   788   17.4   12.3   1
428    4.2   577   10.5   14     7
464    4.7   535   11.3   15     3
15     0.6   163   2.5    2.5    14
65     1.2   168   4.7    3.3    11
98     1.6   151   4.6    2.7    10
398    4.3   342   5.5    16     4
161    2.6   196   7.2    6.3    13
397    3.8   453   10.4   13.9   7
497    5.3   518   11.5   16.3   1
528    5.6   615   12.3   16     0
99     0.8   278   2.8    6.5    14
0.5    1.1   142   3.1    1.6    12
347    3.6   461   9.6    11.3   6
341    3.5   382   9.8    11.5   5
507    5.1   590   12     15.7   0
400    8.6   517   7      12     8

Table 1: Original data


We have one dependent variable, y (annual net sales), and five independent variables: number of sq. ft., inventory, amount spent on advertising, size of sales district and number of competing stores in district. We therefore use a multiple regression model for this task.
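A minimal sketch, in plain Python with no external libraries, of fitting this multiple regression model by ordinary least squares: it forms the normal equations (X′X)b = X′y and solves them by Gaussian elimination with partial pivoting. The helper names `solve` and `ols` are illustrative, not from the assignment, and the data are the 27 rows of Table 1 with the floating-point noise of the data file rounded off (e.g. 8.199999809 becomes 8.2). Its output can be compared with the Excel SUMMARY OUTPUT in Table 2.

```python
# Table 1: (x1 sales, x2 sq. ft., x3 inventory, x4 advertising, x5 district size, x6 competitors)
DATA = [
    (231, 3.0, 294, 8.2, 8.2, 11), (156, 2.2, 232, 6.9, 4.1, 12), (10, 0.5, 149, 3.0, 4.3, 15),
    (519, 5.5, 600, 12.0, 16.1, 1), (437, 4.4, 567, 10.6, 14.1, 5), (487, 4.8, 571, 11.8, 12.7, 4),
    (299, 3.1, 512, 8.1, 10.1, 10), (195, 2.5, 347, 7.7, 8.4, 12), (20, 1.2, 212, 3.3, 2.1, 15),
    (68, 0.6, 102, 4.9, 4.7, 8), (570, 5.4, 788, 17.4, 12.3, 1), (428, 4.2, 577, 10.5, 14.0, 7),
    (464, 4.7, 535, 11.3, 15.0, 3), (15, 0.6, 163, 2.5, 2.5, 14), (65, 1.2, 168, 4.7, 3.3, 11),
    (98, 1.6, 151, 4.6, 2.7, 10), (398, 4.3, 342, 5.5, 16.0, 4), (161, 2.6, 196, 7.2, 6.3, 13),
    (397, 3.8, 453, 10.4, 13.9, 7), (497, 5.3, 518, 11.5, 16.3, 1), (528, 5.6, 615, 12.3, 16.0, 0),
    (99, 0.8, 278, 2.8, 6.5, 14), (0.5, 1.1, 142, 3.1, 1.6, 12), (347, 3.6, 461, 9.6, 11.3, 6),
    (341, 3.5, 382, 9.8, 11.5, 5), (507, 5.1, 590, 12.0, 15.7, 0), (400, 8.6, 517, 7.0, 12.0, 8),
]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, predictors):
    """Least-squares fit of column 0 on the given predictor columns plus an intercept."""
    y = [row[0] for row in rows]
    X = [[1.0] + [row[j] for j in predictors] for row in rows]
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    n, k = len(y), p - 1
    y_bar = sum(y) / n
    sse = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(X, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1 - sse / sst
    return {
        "beta": beta,
        "r2": r2,
        "adj_r2": 1 - (1 - r2) * (n - 1) / (n - k - 1),
        "std_error": (sse / (n - k - 1)) ** 0.5,
        "f_stat": ((sst - sse) / k) / (sse / (n - k - 1)),
    }


fit = ols(DATA, predictors=(1, 2, 3, 4, 5))
print("b0..b5:", [round(b, 4) for b in fit["beta"]])
print("R^2 = {r2:.4f}, adjusted R^2 = {adj_r2:.4f}, s = {std_error:.2f}, F = {f_stat:.1f}".format(**fit))
```

The fitted coefficients agree with Table 2 (−18.86, 16.20, 0.17, 11.53, 13.58, −5.31) to the precision shown, as do R² ≈ 0.9932, adjusted R² ≈ 0.9916 and s ≈ 17.65. Solving the normal equations directly is fine at this size; for larger or more collinear problems a QR-based solver is the usual choice.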

1.

Table 2 is the Microsoft Excel regression output for annual net sales (y) on number of sq. ft. (X2), inventory (X3), amount spent on advertising (X4), size of sales district (X5) and number of competing stores in district (X6).

SUMMARY OUTPUT

Regression Statistics
Multiple R            0.996583914
R Square              0.993179497
Adjusted R Square     0.991555568
Standard Error        17.64924165
Observations          27

ANOVA
              df    SS             MS             F              Significance F
Regression    5     952538.9415    190507.7883    611.5903672    5.39731E-22
Residual      21    6541.410344    311.4957306
Total         26    959080.3519

              Coefficients    Standard Error    t Stat          P-value        Lower 95%       Upper 95%
Intercept     -18.85941416    30.15022791       -0.625514812    0.538372333    -81.56024554    43.84141723
X2            16.20157356     3.544437306       4.570986073     0.000165985    8.830512669     23.57263445
X3            0.174635154     0.057606068       3.031540961     0.006346793    0.054836778     0.294433531
X4            11.52626903     2.5321033         4.55205324      0.000173652    6.260471952     16.79206611
X5            13.5803129      1.770456609       7.670514392     1.60543E-07    9.898446822     17.26217897
X6            -5.31097141     1.70542654        -3.114160174    0.005248873    -8.857600053    -1.764342766

Table 2: MS Excel regression output for annual net sales
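The Significance F in Table 2 is the upper-tail probability P(F > F_obs) of an F distribution with 5 and 21 degrees of freedom, where F_obs = MS Regression / MS Residual. The stdlib-only sketch below evaluates it through the regularized incomplete beta function, P(F > f) = I_x(df2/2, df1/2) with x = df2/(df2 + df1·f), using the standard continued-fraction (modified Lentz) algorithm as described in Numerical Recipes. The function names are our own, not an Excel or library API.

```python
from math import exp, lgamma, log, log1p


def _beta_cf(a, b, x, max_iter=300, eps=1e-15):
    """Modified Lentz evaluation of the continued fraction for I_x(a, b)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (
            m * (b - m) * x / ((a + m2 - 1.0) * (a + m2)),  # even step
            -(a + m) * (a + b + m) * x / ((a + m2) * (a + m2 + 1.0)),  # odd step
        ):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_incomplete_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log1p(-x)
    if x < (a + 1.0) / (a + b + 2.0):
        return exp(log_front) * _beta_cf(a, b, x) / a
    return 1.0 - exp(log_front) * _beta_cf(b, a, 1.0 - x) / b  # symmetry relation


def f_upper_tail(f_stat, df1, df2):
    """P(F > f_stat) for an F(df1, df2) distribution."""
    return reg_incomplete_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f_stat))


ms_regression = 190507.7883  # MS for Regression, Table 2
ms_residual = 311.4957306  # MS for Residual, Table 2
f_stat = ms_regression / ms_residual
p_value = f_upper_tail(f_stat, 5, 21)
print(f"F = {f_stat:.4f}, Significance F = {p_value:.5e}")
```

It reproduces F ≈ 611.59 and a p-value of about 5.4 × 10⁻²², matching the Excel output to the precision shown.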

From the Excel output, the estimated regression equation is:

$$\hat{y} = -18.86 + 16.20x_2 + 0.17x_3 + 11.53x_4 + 13.58x_5 - 5.31x_6$$

2. R Square ($R^2$), the coefficient of determination, measures the proportion of the variation of the y-values around their mean that is explained by the regression model. Because the model has more than one x variable, we use Adjusted R Square, which adjusts $R^2$ for the number of independent variables. The Adjusted R Square is 0.9916, which means that about 99% of the variation in annual net sales around its mean is explained by the five independent variables. Therefore, the model fits the data very well.

3.

Annual net sales (y) is the dependent variable, and the others are the independent variables.

1) Hypotheses:
$$H_0: \beta_2 = \beta_3 = \beta_4 = \beta_5 = \beta_6 = 0$$
$$H_1: \beta_j \neq 0 \text{ for at least one } j \in \{2, \dots, 6\}$$
2) Significance level: $\alpha = 0.05$
3) P-value = 5.39731E-22 ≈ 0 < $\alpha$ = 0.05
4) Conclusion: Reject $H_0$ at $\alpha = 0.05$; the data do not support $\beta_2 = \beta_3 = \beta_4 = \beta_5 = \beta_6 = 0$. Therefore, we conclude that there is a significant relationship between annual net sales and at least one of the independent variables.

Alternatively, Table 3 shows the p-value test and results for each independent variable.

Dependent      Independent                   p-value   Test              Relationship
Annual sales   Area                          0.000     p-value < 0.05    Significant relationship exists
Annual sales   Inventory                     0.006     p-value < 0.05    Significant relationship exists
Annual sales   Advertising spending          0.000     p-value < 0.05    Significant relationship exists
Annual sales   Size of sales district        0.000     p-value < 0.05    Significant relationship exists
Annual sales   Number of competing stores    0.005     p-value < 0.05    Significant relationship exists

Table 3: P-value test and results

4. Table 4 shows the interpretation of the individual slope coefficients.

Variable (coefficient): interpretation

Intercept (-18.86): Setting $x_2 = x_3 = x_4 = x_5 = x_6 = 0$ gives $\hat{y} = -18.86$, which would mean that a store with no floor area, no inventory, no advertising expense, no sales district and no competing stores has annual net sales of -18.86 thousand dollars. Obviously, this is meaningless: negative sales are impossible, and this combination of values lies outside the range of the data.

Number of sq. ft. (16.20): Holding the other variables constant, each additional 1,000 sq. ft. of store area is associated with an increase of about 16.20 thousand dollars in annual net sales.

Inventory (0.17): Holding the other variables constant, each additional 1 thousand dollars of inventory is associated with an increase of about 170 dollars in annual net sales.

Advertising expense (11.53): Holding the other variables constant, each additional 1 thousand dollars of advertising expense is associated with an increase of about 11.53 thousand dollars in annual net sales.

Size of sales district (13.58): Holding the other variables constant, each additional 1,000 families in the sales district is associated with an increase of about 13.58 thousand dollars in annual net sales.

Number of competing stores in district (-5.31): Holding the other variables constant, each additional competing store in the district is associated with a decrease of about 5.31 thousand dollars in annual net sales.

Table 4: Interpretation of individual slope coefficients

5. Table 5 shows the 95% confidence intervals for the coefficients of the individual variables.

Variable                                  Lower 95%       Upper 95%
Intercept                                 -81.56024554    43.84141723
Store area                                8.830512669     23.57263445
Inventory                                 0.054836778     0.294433531
Advertising expense                       6.260471952     16.79206611
Size of sales district                    9.898446822     17.26217897
Number of competing stores in district    -8.857600053    -1.764342766

Table 5: Interval for slope coefficients

6.

Variables                     t-stat     t-critical   Decision on H0   Result
Area                          7.7354     2.0555       Rejected         Statistically significant
Inventory                     -8.2877    2.0555       Not rejected     Not significant
Advertising spending          7.6716     2.0555       Rejected         Statistically significant
Size of sales market          7.6869     2.0555       Rejected         Statistically significant
Number of competing stores    7.3719     2.0555       Rejected         Statistically significant

Criterion: reject H0 if t-stat > t-critical; otherwise H0 is not rejected.

Table 6: Test of the estimated slope coefficients for individual variables

Table 6 shows the results of testing the estimated slope coefficients for the individual variables. From the table, the estimated slope coefficients of all variables except inventory are significant.
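As a cross-check on the intervals in Table 5, each Excel interval has the form b ± t × SE, so the midpoint of an interval should equal the coefficient, and its half-width divided by the standard error should recover the t multiplier. With n − k − 1 = 27 − 5 − 1 = 21 residual degrees of freedom, the two-tailed 5% critical value is t ≈ 2.080. A minimal sketch using the Table 2 values (`implied_multiplier` is a hypothetical helper):

```python
# (coefficient, standard error, lower 95%, upper 95%) copied from Table 2
TABLE2 = {
    "Intercept": (-18.85941416, 30.15022791, -81.56024554, 43.84141723),
    "X2": (16.20157356, 3.544437306, 8.830512669, 23.57263445),
    "X3": (0.174635154, 0.057606068, 0.054836778, 0.294433531),
    "X4": (11.52626903, 2.5321033, 6.260471952, 16.79206611),
    "X5": (13.5803129, 1.770456609, 9.898446822, 17.26217897),
    "X6": (-5.31097141, 1.70542654, -8.857600053, -1.764342766),
}


def implied_multiplier(coef, se, lower, upper):
    """Return (midpoint of the interval, half-width / SE) for an interval b +/- t*SE."""
    return (lower + upper) / 2, (upper - lower) / (2 * se)


for name, (coef, se, lower, upper) in TABLE2.items():
    centre, t_mult = implied_multiplier(coef, se, lower, upper)
    print(f"{name:9s} midpoint = {centre:.6f} (b = {coef:.6f}), t multiplier = {t_mult:.4f}")
```

Every row gives a midpoint equal to the coefficient and a multiplier of about 2.0796, consistent with intervals built on the 21 residual degrees of freedom.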

7. From Table 6, inventory is not significant, so we remove it and obtain the results in Table 7.

SUMMARY OUTPUT

Regression Statistics
Multiple R            0.995085241
R Square              0.990194637
Adjusted R Square     0.988411844
Standard Error        20.67511795
Observations          27

ANOVA
              df    SS             MS             F              Significance F
Regression    4     949676.2208    237419.0552    555.4175271    9.57989E-22
Residual      22    9404.131054    427.4605024
Total         26    959080.3519

              Coefficients    Standard Error    t Stat          P-value        Lower 95%       Upper 95%
Intercept     -39.460022      34.41055873       -1.14674168     0.263807827    -110.8231531    31.90310896
X2            20.44388672     3.814801407       5.359095937     2.21824E-05    12.53247282     28.35530062
X4            16.96614275     2.092787626       8.10695865      4.73185E-08    12.62596685     21.30631864
X5            15.67296189     1.90985556        8.206359798     3.85791E-08    11.71216388     19.6337599
X6            -4.04330128     1.936828415       -2.08758879     0.048629066    -8.060037571    -0.026565

Table 7: MS Excel regression output for annual net sales without inventory

From the Excel output, the estimated regression equation is:

$$\hat{y} = -39.46 + 20.44x_2 + 16.97x_4 + 15.67x_5 - 4.04x_6$$
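A short sketch comparing the two Excel fits and evaluating the reduced equation. Adjusted R² and the standard error of the estimate are recomputed from the reported R Square and residual SS, using adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1) and s = √(SSE/(n − k − 1)). `predict_sales_dollars` is a hypothetical helper that takes inputs in the units of Table 1 and converts the prediction from $'000 to dollars; the store characteristics used for the Part 8 prediction are not reproduced here, so no specific inputs are assumed.

```python
# Reported Excel figures: (R Square, residual SS, observations, number of predictors)
MODELS = {
    "full model (Table 2)": (0.993179497, 6541.410344, 27, 5),
    "without inventory (Table 7)": (0.990194637, 9404.131054, 27, 4),
}

# Reduced-model coefficients from Table 7: intercept, X2, X4, X5, X6
REDUCED = (-39.460022, 20.44388672, 16.96614275, 15.67296189, -4.04330128)


def adjusted_r2(r2, n, k):
    """R^2 adjusted for the number of predictors k."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def standard_error(sse, n, k):
    """Standard error of the estimate, sqrt(SSE / (n - k - 1))."""
    return (sse / (n - k - 1)) ** 0.5


def predict_sales_dollars(x2, x4, x5, x6):
    """Evaluate the reduced equation (inputs in Table 1 units) and convert $'000 to $."""
    b0, b2, b4, b5, b6 = REDUCED
    return 1000 * (b0 + b2 * x2 + b4 * x4 + b5 * x5 + b6 * x6)


for name, (r2, sse, n, k) in MODELS.items():
    print(f"{name}: adjusted R^2 = {adjusted_r2(r2, n, k):.4f}, s = {standard_error(sse, n, k):.2f}")
```

With the reported figures, adjusted R² is 0.9916 for the full model and 0.9884 without inventory, and s rises from 17.65 to 20.68.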

8.

$$\hat{y} = -39.46 + 20.44x_2 + 16.97x_4 + 15.67x_5 - 4.04x_6 = \$183{,}591.46$$

References

Black, K., 2009. Business statistics: Contemporary decision making. USA: John Wiley & Sons.
Groebner, D.F., Shannon, P.W., Fry, P.C. and Smith, K.D., 2011. Business statistics: A decision making approach. UK: Prentice Hall/Pearson.
Newbold, P., Carlson, W. and Thorne, B., 2012. Statistics for business and economics. UK: Pearson.
Weiers, R.M., 2010. Introduction to business statistics. USA: Cengage Learning.

Source: http://college.cengage.com/mathematics/brase/understandable_statistics/7e/students/datasets/owan/frames/frame.html
