FINAL PRACTICE EXAM P (SOURCE: PRE AUT 07)
Last updated 22nd October 2010
Cover Type B
TO BE RETURNED AT THE END OF THE EXAMINATION. THIS PAPER MUST NOT BE REMOVED FROM THE EXAM CENTRE.
SURNAME: __________________________________
FIRST NAME: __________________________________
STUDENT NUMBER: __________________________________
COURSE: __________________________________
__________________________________________________________________________________
FINAL EXAMINATION PRACTICE PAPER ‘P’
SOURCE: Past practice paper written Pre-Autumn 2007
SUBJECT NAME and NUMBER:
BUSINESS INFORMATION ANALYSIS 26133
BUSINESS STATISTICS 26134
TIME ALLOWED:
3 Hours plus 10 minutes reading time
INSTRUCTIONS TO CANDIDATES:
• This question paper MUST NOT be removed from the Examination Centre
• This is an OPEN BOOK EXAMINATION (any materials allowed, including Lecture Notes)
• Calculators (including Programmable Calculators) are allowed
• Record your student name and number carefully on the multiple choice answer sheet provided
• Answer all questions on the multiple choice answer sheet provided
• Use the exam question booklet for any working. All exam question booklets will also be collected and MUST NOT be removed from the Examination Centre
Question P1: The owner of Fortee Bakery is interested in determining the difference between mean purchase amounts per customer at two locations. To estimate the difference, samples of 500 customer receipts from each location are examined. The following sample statistics are produced:

                      Location 1   Location 2
Mean                  $5.26        $5.66
Standard Deviation    $0.89        $1.05

The owner asks you to examine whether there is a significant difference in the average purchase amount per customer at the two locations. You determine that: (a) Using an independent samples t-test, the mean purchase amounts at each location are not significantly different at the 99% level of significance (b) Using an independent samples t-test, the mean purchase amounts at each location are significantly different at the 99% level of significance (c) Using a paired samples t-test, the mean purchase amounts at each location are not significantly different at the 99% level of significance (d) Using a paired samples t-test, the mean purchase amounts at each location are significantly different at the 99% level of significance
Answer: B (independent samples; purchase amounts significantly different)
Justification: Ho: μ1 – μ2 = 0; Ha: μ1 – μ2 ≠ 0
Test method: independent samples, because we only have aggregate information about each location. There is no way of matching the receipts at the two locations at the individual observation level.
Standard error: SE = sqrt(s1^2/n1 + s2^2/n2) = sqrt(.89^2/500 + 1.05^2/500) = .061556
99% confidence means that α = 1%. Since the test is two-tailed, the area in each tail is α/2 = ½% = .005.
There are n1 = 500 receipts and n2 = 500 receipts, so df = 500 + 500 – 2 = 998.
We use the t-distribution since σ is unknown. With 998 df, t(.005, 998) ≈ t(.005, ∞) = 2.576 (table available in exam).
Critical values = 0 ± t*SE = ±$0.158569
Evidence (point estimate): xbar1 – xbar2 = 5.26 – 5.66 = –$0.40
Test statistic = –0.40 / SE = –0.40 / .061556 = –6.50
Since both the evidence in dollars (–$0.40 < –$0.159) and the test statistic (–6.50 < –2.576) fall in the rejection region, reject Ho.
Conclude that the mean purchase amounts at the two locations are significantly different at the 99% level.

Question P2: As part of an internal auditing process, a firm wishes to estimate the proportion of its credit card holders having accounts that are overdue. The auditors have stipulated that the margin of error must be no greater than 3% and they wish to use a 95% confidence interval estimate. What size sample should they select if they have no idea of the percentage of accounts that are overdue but still wish the sample size to be large enough to meet their requirements? (a) 17 (b) 30 (c) 204 (d) 1068
Answer: D = 1068
Justification: n = (critical value)^2(sigma)^2 / (margin of error)^2, NOT n = (critical value)^2(standard error)^2 / (margin of error)^2.
For a proportion, sigma^2 = p(1 – p), so n = z^2 * p(1 – p) / E^2. Since you do not know what p is, you maximise p(1 – p) by assuming p = 0.5:
n = 1.96^2 * .5(1 – .5) / .03^2 ≈ 1067.1, which rounds up to 1068.
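The P1 and P2 arithmetic can be checked with a short Python sketch (standard library only, Python 3.8+). The helper names are illustrative, not part of the course material. Because the standard library has no t quantile function, the sketch hardcodes the table value 2.576 quoted in the P1 solution; P2 takes its z value from `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def two_sample_test_stat(xbar1, s1, n1, xbar2, s2, n2):
    """Return (standard error, test statistic) for Ho: mu1 - mu2 = 0 (independent samples)."""
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return se, (xbar1 - xbar2) / se


def sample_size_proportion(margin, confidence=0.95, p=0.5):
    """Smallest n with z * sqrt(p(1 - p) / n) <= margin; p = 0.5 is the worst case."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


# Question P1: Fortee Bakery receipts
se, t_stat = two_sample_test_stat(5.26, 0.89, 500, 5.66, 1.05, 500)
critical = 2.576  # table value quoted in the solution (df = 998, alpha/2 = .005)
reject_h0 = abs(t_stat) > critical
print(f"SE = {se:.6f}, t = {t_stat:.2f}, reject Ho: {reject_h0}")

# Question P2: 3% margin of error, 95% confidence, p unknown
print(sample_size_proportion(0.03))
```

Using the unrounded z (1.959964) gives n ≈ 1067.07 rather than 1067.11, but both round up to 1068.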
Question P3: 60% of people have a broadband connection. 70% of people use Artsel One as their carrier of choice. What is the expected probability that a person does not have a broadband connection and chooses a carrier other than Artsel One, assuming that the choice to have broadband is independent of the choice of carrier and vice versa? (a) 12% (b) 18% (c) 42% (d) 58%
Answer: A = 12%
Justification: Let A = uses Artsel One and B = has broadband. Given independence, P(A^C and B^C) = P(A^C) * P(B^C).
P(A^C) = 1 – P(A) = 1 – .70 = .30 = probability of not using Artsel One
P(B^C) = 1 – P(B) = 1 – .60 = .40 = probability of not having broadband
P(A^C and B^C) = .30 * .40 = .12

Question P4: A listing of advertisements indicates that five advertisements choose to list a website first (coded as one), twenty list a telephone number first (coded as two) and thirty list neither (coded as zero). The variable, type of listing in advertisement, would be considered to have at best (in terms of ability to conduct statistical analysis) which measurement properties: (a) Nominal only (b) Nominal and Ordinal (c) Nominal, Ordinal and Interval (d) Nominal, Ordinal, Interval and Ratio
Answer: A = Nominal only
Justification: The codes only name (nominate) which type of listing an advertisement has. You cannot order the outcomes: {web first; telephone first; neither listed} could equivalently be written as {telephone first; neither listed; web first}, so the codes 0, 1 and 2 carry no ordinal, interval or ratio meaning.

Question P5: The financial manager of a magazine has compiled the following table from a regression analysis used to make predictions about sales (in dollars) per issue:

                 Intercept   X1         X2
Coefficients     6.97        2055       35.236
Standard Error   0.584       10637.5    762.235
t Stat           11.9287     0.19318    0.046227
P-value          5.22E-29    0.846893   0.963148

where X1 = the number of pages in the issue; X2 = a dummy variable taking the value 1 if a well known celebrity is shown on the front cover and 0 otherwise. You predict the level of sales for an issue in which there are 30 pages and a well known celebrity appears on the cover to be: (a) $61,650 (b) $61,685 (c) $61,692 (d) More than $62,000
Answer: C = $61,692
Justification: Sales = 6.97 + 2055(X1) + 35.236(X2), with X1 = 30 and X2 = 1 (celebrity appears).
Sales = 6.97 + 2055*30 + 35.236*1 = 6.97 + 61650 + 35.236 = 61692.21
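As a sketch, the P5 prediction is just the fitted equation evaluated at the chosen X values. The dictionary keys `pages` and `celebrity` are hypothetical labels for X1 and X2; the coefficients are copied from the table.

```python
# Coefficients copied from the Question P5 regression table
# ("pages" = X1, "celebrity" = X2; the key names are illustrative labels)
coefficients = {"intercept": 6.97, "pages": 2055, "celebrity": 35.236}


def predict_sales(pages, celebrity):
    """Predicted sales ($) = b0 + b1*X1 + b2*X2, with X2 a 0/1 dummy."""
    return (coefficients["intercept"]
            + coefficients["pages"] * pages
            + coefficients["celebrity"] * celebrity)


print(f"{predict_sales(30, 1):,.2f}")  # 30 pages, celebrity on cover
```

Setting `celebrity` to 0 removes the 35.236 dummy effect, which is how the dummy shifts the intercept for covers with a celebrity.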
Question P6: A pricing study examines the linear relationship between sales of Cracker Nuts and factors such as its price; its competitor's prices; a dummy variable representing whether Cracker Nuts had a promotional offer or not; and a dummy variable representing whether its closest competitor, Sirius Nuts, had a promotional offer or not. The data revealed that Cracker Nuts runs promotional offers only when Sirius Nuts happens to be running a promotional offer as well. The regression output is very peculiar. Based on this information you would suspect that the major concern is: (a) non-linear effects (b) multi-collinearity (c) outliers (d) too many variables
Answer: B = Multi-collinearity
Justification: Since the two companies run promotional offers at the same time, the two promotion dummy variables will be highly correlated with each other. This is multi-collinearity.

Question P7: An Excel spreadsheet displays a variable that is coded with values representing the area in which an online panel member is employed. The coding scheme used is: 1 = telecommunications; 2 = finance and banking; 3 = retailing; 4 = other industry. You calculate the mode, median and mean on the data, revealing the values 2, 3 and 3, respectively. Which statement is correct: (a) On average, panel members appear to be employed in retailing (b) Most panel members appear to be employed in retailing (c) 50% of members appear to be employed in retailing (d) None of the above statements are correct
Answer: D = None of the above
Justification: The data has nominal properties only, so only the mode is interpretable; the median (3) and mean (3) of the codes are meaningless. The mode tells us that the most common industry category is 2, which is finance and banking, so none of statements (a) to (c) is supported.

Question P8: The weight of goods being transported by an airline for each passenger is observed. A sample of one hundred passengers reveals an average weight of goods to be 17.7kg. The population of weights for all passengers is known to follow a normal distribution with a standard deviation of 10kg. What is the probability that the sample mean of weights observed is within +/- 2kg of the population mean weight? (a) .0456 (b) .1586 (c) .8414 (d) .9544
Answer: D = .9544
Justification: n = 100; N = unknown (infinite); σ = 10kg; s(xbar) = SE = σ / sqrt(n) = 10 / sqrt(100) = 10 / 10 = 1kg.
Taking E(xbar) = 17.7kg as the centre of the window (any centre gives the same result, since only the window width relative to SE matters):
P(15.7kg < xbar < 19.7kg) = P(xbar < 19.7) – P(xbar < 15.7)
= P(z < (19.7 – 17.7)/SE) – P(z < (15.7 – 17.7)/SE)
= P(z < +2/1) – P(z < –2/1)
= P(z<+2) – [1 - P(z<+2)] = 2*P(z<+2) – 1 = 2*.9772 – 1 = .9544 Note: depending on how they have been taught, some students may use the following: P(xbar – j ≤ E(xbar) ≤ xbar + j) = 2*P(z ≤ j / s(xbar)) – 1; where E(xbar) = 17.7 ; j = 2kg.
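A minimal Python sketch of the P8 calculation, assuming the population is normal and infinite (no finite correction). The helper name `prob_mean_within` is illustrative; it uses `statistics.NormalDist` (Python 3.8+) in place of the z table.

```python
import math
from statistics import NormalDist


def prob_mean_within(margin, sigma, n):
    """P(|xbar - mu| < margin) when xbar ~ Normal(mu, sigma / sqrt(n))."""
    se = sigma / math.sqrt(n)
    z = margin / se
    return 2 * NormalDist().cdf(z) - 1


print(round(prob_mean_within(2, 10, 100), 4))
```

The exact CDF gives about .9545; the table answer .9544 comes from rounding P(z < 2) to .9772.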
Question P9: AGI Sales are informed by their car supplier that carbon monoxide (CO) emissions for a certain kind of car follow a normal distribution, with a mean of 2.9g/mi. AGI Sales believe the emissions, on average, may be significantly higher than that suggested by their supplier. Sampling ten of their vehicles, AGI Sales finds that, on average, the emissions are 3.1g/mi with a standard deviation of 0.4g/mi. Using the sample, you test the concerns (using alpha = .05) of AGI Sales and conclude: (a) On average, the emissions are significantly higher than the 2.9g/mi claimed (b) On average, the emissions are not significantly higher than the 2.9g/mi claimed (c) On average, the emissions are not significantly higher than the 3.1g/mi claimed (d) With only ten vehicles tested, one cannot make any of the above conclusions
Answer: B = On average, the emissions are not significantly higher than the 2.9g/mi claimed
Justification: Ho: μ ≤ 2.9g/mi (status quo); Ha: μ > 2.9g/mi (claim)
Although the sample size is small (n = 10), the population of emissions follows a normal distribution, so the test can still be used. Since the population standard deviation is unknown, we use the sample standard deviation s = .4g/mi in its place, and the test statistic follows a t-distribution with n – 1 = 9 df.
Rejection region/critical value: t(.05, 9) = 1.833 (one-tailed test, upper tail).
From the sample: xbar = 3.1g/mi. We assume μ = 2.9g/mi until we find evidence to support a contradictory view.
Test statistic = (xbar – μ)/(s/sqrt(n)) = (3.1 – 2.9)/(.4/sqrt(10)) = 1.581139
Since the test statistic (1.58) < t(.05, 9) = 1.833, we cannot reject the null hypothesis. The mean emissions are not significantly higher than the 2.9g/mi claimed.
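The P9 test statistic can be sketched in Python as follows. The critical value 1.833 is hardcoded from the t table, as in the solution, since the standard library does not provide t quantiles; `one_sample_t` is an illustrative name.

```python
import math


def one_sample_t(xbar, mu0, s, n):
    """t statistic for Ho: mu <= mu0 against Ha: mu > mu0."""
    return (xbar - mu0) / (s / math.sqrt(n))


t_stat = one_sample_t(3.1, 2.9, 0.4, 10)
t_crit = 1.833  # t table, df = 9, upper-tail area .05 (as quoted in the solution)
print(f"t = {t_stat:.4f}; reject Ho: {t_stat > t_crit}")
```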
Question P10: Past studies show that the previous mean time to prepare a home-cooked meal was 40 minutes. A new study claims that the mean time to prepare a home-cooked meal has dropped to be significantly lower than this amount. Suppose that a study is designed to test this by sampling 100 home-owners. What should the null hypothesis be to test the claims made in the new study? (a) μ ≥ 40 minutes (b) μ > 40 minutes (c) μ < 40 minutes (d) μ ≤ 40 minutes
Answer: A = μ ≥ 40 minutes
Justification: One claim or hypothesis is that μ < 40 minutes (significantly lower than 40 minutes). The other claim is its complement, namely μ ≥ 40 minutes. As this contains the equality, it becomes the null. That is: Ho: μ ≥ 40; Ha: μ < 40.
Question P11: A production manager of Lotzafun Toys needs to estimate the average time taken to assemble products using a new manufacturing technique. It is believed that the population standard deviation is 15 seconds. How large a sample of assembly times should be taken to estimate the mean assembly time to within 2 seconds, with 95% confidence? (a) 153 (b) 216 (c) 217 (d) 865
Answer: C = 217
Justification: n = (critical value)^2(sigma)^2 / (margin of error)^2, NOT n = (critical value)^2(standard error)^2 / (margin of error)^2.
n = 1.96^2 * 15^2 / 2^2 = 216.09. Round up to 217 (always round up, since rounding down would give a margin of error larger than 2 seconds).
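A short Python sketch of the P11 formula, taking z from `statistics.NormalDist` (Python 3.8+) rather than the table; `sample_size_mean` is an illustrative name.

```python
import math
from statistics import NormalDist


def sample_size_mean(sigma, margin, confidence=0.95):
    """Smallest n with z * sigma / sqrt(n) <= margin (always round up)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / margin) ** 2)


print(sample_size_mean(15, 2))
```

`math.ceil` implements the "always round up" rule, so the result is 217 whether z is taken as 1.96 or 1.959964.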
Question P12: The mean, a measure of central tendency, is sensitive to outliers because it relies on which measurement property of the data to be calculated? (a) Integer (b) Nominal (c) Ordinal (d) Quantitative
Answer: D = Quantitative
Justification: The mean relies on the numerical value or quantitative properties of the data because it sums the data.
Question P13: Eleven supermarkets introduced a promotional display for Grand Baked Beans, promoting the product on the basis of its low fat content. Another eleven stores were identified, this time promoting the product on the basis of its energy producing benefits. These eleven pairs of supermarkets (22 in total) were selected because the stores in each pair were similar in terms of geographic location, size and product sales. The difference in units sold was calculated for each pair. The sample mean difference was found to be 4200 units with a standard deviation of 8800 units. Examining this evidence (with α = .05), you can conclude: (a) The average number of units sold was not significantly different across the two sets of stores (b) The average number of units sold was significantly different across the two sets of stores (c) A conclusion about differences in average units sold cannot be made since we do not know how each supermarket performed (d) People appeared to really enjoy the promotion involving low fat content
Answer: A = No significant difference in the number of units sold
Justification: The stores are matched in pairs, so this is a paired samples test on the differences: Ho: μd = 0; Ha: μd ≠ 0. Hypothesised mean difference = 0.
Table critical value = t(α/2, n–1) = t(.05/2, 11–1) = t(.025, 10) = 2.228 (note that it is the number of pairs, 11, that forms the number of observations).
Observed (in thousands of units): dbar = 4.2; sd = 8.8
Test statistic: t = (dbar – μd)/(sd/sqrt(n)) = (4.2 – 0)/(8.8/sqrt(11)) = 1.583
Since t = 1.583 < 2.228, we cannot reject Ho at α = .05.
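The P13 paired test can be sketched in Python using the raw units, which also shows that rescaling to thousands (4.2 and 8.8) leaves t unchanged. The critical value 2.228 is hardcoded from the t table, as in the solution.

```python
import math

n_pairs = 11
d_bar = 4200    # mean difference in units sold
s_d = 8800      # standard deviation of the differences
t_crit = 2.228  # t table: df = n_pairs - 1 = 10, alpha/2 = .025

t_stat = (d_bar - 0) / (s_d / math.sqrt(n_pairs))
print(f"t = {t_stat:.3f}; reject Ho: {abs(t_stat) > t_crit}")
```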
Question P14: The premium amounts of all 2500 insurance payers (i.e., the population) follow a normal distribution with a standard deviation of $15. An internal audit selects a sample of 250 premiums, the sample revealing an average premium of $550. What is the probability that this average premium is within +/- $1.50 of the population mean premium amount? (Hint: consider the issue of finite correction) (a) 9.5% (b) 11.4% (c) 88.6% (d) 90.5%
Answer: D = 90.5%
Justification: n = 250; N = 2500 (finite); σ = 15. Hence n/N = 250 / 2500 = .1, which is NOT ≤ .05, so the finite population correction factor is required.
s(xbar) = [σ / sqrt(n)][sqrt(N – n) / sqrt(N – 1)]
s(xbar) = [15 / sqrt(250)][sqrt(2500 – 250) / sqrt(2500 – 1)] = [.948683][.948873] = .90018 (NOT .948683)
See the Question P8 solution for the full step-by-step theoretical approach. The following formula was provided to students in previous semesters to save time, but you do not have to learn it:
P(xbar – j ≤ E(xbar) ≤ xbar + j) = 2*P(z ≤ j / s(xbar)) – 1; where E(xbar) = 550; j = $1.50.
P(550 – 1.5 ≤ E(xbar) ≤ 550 + 1.5) = 2*P(z ≤ 1.5 / .90018) – 1
P(548.50 ≤ E(xbar) ≤ 551.50) = 2*P(z ≤ 1.666333) – 1 = 2*.9525 – 1 = .905 (using z ≈ 1.67 in the table)
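A minimal Python sketch of P14, applying the finite population correction under the n/N > .05 rule used in the solution. `se_mean` is an illustrative helper, and `statistics.NormalDist` (Python 3.8+) replaces the z table.

```python
import math
from statistics import NormalDist


def se_mean(sigma, n, N=None):
    """Standard error of xbar, applying the finite population correction when n/N > .05."""
    se = sigma / math.sqrt(n)
    if N is not None and n / N > 0.05:
        se *= math.sqrt((N - n) / (N - 1))
    return se


se = se_mean(15, 250, N=2500)
prob = 2 * NormalDist().cdf(1.50 / se) - 1
print(f"SE = {se:.5f}; P = {prob:.3f}")
```

With the unrounded z = 1.6663 the probability is about .904; the table answer .905 comes from rounding z to 1.67, and both point to option (d).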
Question P15: A real estate agent believes a regression will be useful to predict auction prices by including various factors. The following regression output is produced:

ANOVA
             df    SS         MS          F          Significance F
Regression   5     18.69241   3.738482    44.05016   4.21E-38
Residual     539   45.74425   0.084869
Total        544   64.43666

Estimates             Coefficients   Standard Error   t Stat     P-value
Intercept             60.39093       0.012512         4826.769   0
Square Metres         0.068209       0.021616         3.155526   0.001692
Distance to Schools   -0.05025       0.021086         -2.38312   0.017512
Distance to Shops     -0.23617       0.021768         -10.8495   6.11E-25
Bathrooms             0.033752       0.021957         1.537164   0.12484
Bedrooms              0.187133       0.021467         8.71703    3.53E-17

Using the F-statistic (at the 95% confidence level, α = .05) provided in the ANOVA table only, which statement is correct: (a) Both distance to shops and the number of bedrooms in a dwelling are significant in predicting auction sale prices, better than chance. (b) Only distance to shops is significant in predicting auction sale prices, better than chance. (c) Only the number of bedrooms in a dwelling is significant in predicting auction sale prices, better than chance. (d) None of the above can be established using the statistics listed in the ANOVA table
Answer: D = None of the above are correct.
Justification: The F-statistic in the ANOVA table only tells us whether all of the slope coefficients are zero (Ho) or at least one of them is significantly different from zero (Ha); it does not tell us which one(s). Statements (a) to (c) would require the individual t-statistics or p-values, which are not part of the ANOVA table.
Question P16: A town planner obtains a sample listing the actual heights of buildings (in metres) in the local central business district. The town planner determines that 20% of buildings are above the height approved by the town planning committee. The town planner also categorises buildings as being within close proximity (less than 2km) to the river running through the city. 30% of buildings fall into the category of being in close proximity to the river. Randomly selecting one building for further investigation, what is the observed conditional probability that the town planner selects a building that is built too tall, given it is built within close proximity to the river? Assume that the excessive height of a building relative to its approved height is independent of its proximity to the river. (a) 6% (b) 20% (c) 30% (d) Unable to be determined
Answer: B = 20%
Justification: P(TT) = P(Too Tall) = 0.20; P(R) = P(River) = 0.30
P(TT | R) = P(TT and R) / P(R). P(TT and R) is not observed, but since we are told to assume independence, it can be determined: P(TT and R) = P(TT)*P(R) under independence.
P(TT | R) = P(TT)*P(R) / P(R) = P(TT) = .20 = 20%
Note that under independence we have shown that P(TT | R) = P(TT). This should make theoretical sense to students: we are saying that whether a building is too tall does not depend on whether it is near the river or not.

Question P17: A sports manufacturer tests the durability of five different soles. Durability is assessed based on wear and tear, where higher ratings indicate greater wear and tear. The following results were reported testing the null hypothesis that mean wear and tear for all five soles is equal.

ANOVA
Source of Variation   SS         df     MS         F         P-value    F crit
Between Groups        1020.782   4      255.1954   1.19364   0.311512   2.375494
Within Groups         533420.8   2495   213.7959
Total                 534441.6   2499
What would you conclude at the α = .05 level based on the Analysis of Variance (ANOVA) output: (a) The soles are not significantly different in terms of mean wear and tear. (b) The soles are significantly different in terms of mean wear and tear. (c) The soles are not significantly different in terms of the amount of variability exhibited in wear and tear. (d) The soles are significantly different in terms of the amount of variability exhibited in wear and tear.
Answer: A = The soles are not significantly different in terms of mean wear and tear.
Justification: Ho: μ1 = μ2 = … = μ5 = μ*; Ha: at least one of the means is significantly different.
Using the p-value: p-value = .311 > α = .05, hence we do not rewind, we do not rej Ho. (Equivalently, F = 1.194 < F crit = 2.375.) Ho refers to the equality of the means, not the variances, so options (c) and (d) are not what is being tested.
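To see where the P17 F statistic comes from, here is a minimal Python sketch that rebuilds the mean squares and F from the SS and df columns of the table. The F crit value is copied from the output rather than computed, since the standard library has no F quantile function.

```python
# Sums of squares and degrees of freedom copied from the Question P17 ANOVA table
ss_between, df_between = 1020.782, 4
ss_within, df_within = 533420.8, 2495

ms_between = ss_between / df_between  # MS = SS / df
ms_within = ss_within / df_within
f_stat = ms_between / ms_within       # F = MS(between) / MS(within)
f_crit = 2.375494                     # F crit from the table (alpha = .05)

print(f"MSB = {ms_between:.4f}, MSW = {ms_within:.4f}, F = {f_stat:.5f}")
print("reject Ho" if f_stat > f_crit else "do not reject Ho")
```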
Question P18: The regression output below represents the perceptions of private hospitals, measured on a scale of 1 to 7, where 1 = Poor and 7 = Excellent. The dependent variable is an index of performance.

Factor                       Coefficients   Standard Error   t Stat    P-value
Intercept                    2.2917         0.0428           53.5840   0.0000
Empathy of staff             0.5432         0.4756           1.1421    0.2562
Expertise of medical staff   2.6029         0.1816           14.3301   0.0000
Administrative efficiency    2.8014         0.2058           13.6139   0.0000
Trustworthiness of staff     0.7331         0.1794           4.0871    0.0001
Cleanliness                  1.4829         0.2205           6.7238    0.0000
Quality of facilities        2.4685         0.8181           3.0172    0.0032

Edyr Hospital has the following ratings: empathy (3); expertise (4); administration (2); trustworthiness (6); cleanliness (2) and quality (5). Suppose the Edyr Hospital aims to improve itself on perceptions regarding empathy and cleanliness, hoping to obtain ratings of 5 and 6 respectively. What will be the impact on their overall index of performance as a result of their endeavour: (a) The index will improve by 5 points, all else constant. (b) The index will improve by 6 points, all else constant. (c) The index will improve by 7 points, all else constant. (d) The index will improve by 8 points, all else constant.
Answer: C = The index will improve by 7 points, all else constant.
Justification: Only the changed ratings matter, since each coefficient multiplies its own rating. The impact of each change is coefficient * change in rating, as seen in the last two columns:

Factor                       Coefficients   Current   Improve   Change   Impact of change
Intercept                    2.2917         1         1         0
Empathy of staff             0.5432         3         5         2        1.0864
Expertise of medical staff   2.6029         4         4         0
Administrative efficiency    2.8014         2         2         0
Trustworthiness of staff     0.7331         6         6         0
Cleanliness                  1.4829         2         6         4        5.9316
Quality of facilities        2.4685         5         5         0
INDEX                                       39.6426   46.6606   7.018    7.018

So the index improves by 2*0.5432 + 4*1.4829 = 7.018 ≈ 7 points.
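The P18 index table can be reproduced with a short Python sketch: the index is the sum of coefficient * rating, with a leading 1 standing in for the intercept. The function name `performance_index` is illustrative.

```python
# Coefficients from the Question P18 regression output (intercept first)
coefficients = [2.2917, 0.5432, 2.6029, 2.8014, 0.7331, 1.4829, 2.4685]
# Ratings with a leading 1 for the intercept, then:
# empathy, expertise, administration, trustworthiness, cleanliness, quality
current = [1, 3, 4, 2, 6, 2, 5]
improved = [1, 5, 4, 2, 6, 6, 5]


def performance_index(ratings):
    """Predicted performance index = sum of coefficient * rating."""
    return sum(b * x for b, x in zip(coefficients, ratings))


change = performance_index(improved) - performance_index(current)
print(f"{performance_index(current):.4f} -> {performance_index(improved):.4f} (change {change:.3f})")
```

The unchanged ratings cancel in the subtraction, which is why only the empathy and cleanliness terms contribute to the 7.018.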
Question P19: A charity was attempting to determine if the number of donations being made to its foundation (dependent variable) was somehow related to the characteristics of donors.

                            Coefficients   Standard Error   t Stat    P-value
Intercept                   49.8416        5.1156           9.7431    0.0000
Age                         -0.0457        0.0882           -0.5176   0.6050
Income                      0.0010         0.0004           2.3912    0.0172
Number of Children          -0.0461        0.0886           -0.5202   0.6032
Sole Parent (dummy coded)   -0.1427        0.0915           -1.5602   0.1193

Examining this regression output above, the charity should consider that the following characteristics are significant (at the α = .05 level) in explaining donation behaviour: (a) None of the variables (b) Income only (c) All of the variables, except income (d) All of the variables
Answer: B = Income only.
Justification: Income has the smallest coefficient in magnitude, most likely because of the units in which it is measured; its standard error is correspondingly small, so its t statistic is still large. When testing at α = .05, ONLY Income is significant, since .0172 < .05; hence rewind, rej Ho that bi = 0. All other variables are insignificant at the α = .05 level since their p-values are all > .05; hence we cannot rewind, we cannot rej Ho.
Question P20: An investment company, ABI, determines from its market share that 10% of investors choose ABI to manage their portfolios. A recent industry survey reveals that 20% of people invest in mining, given they choose ABI to manage their portfolio. The survey revealed that only 5% of people invest in mining, given they used an alternative firm to manage their portfolio. To assess the potential attraction of people to ABI because of its successful portfolio management in mining investments, the firm wishes to determine the likelihood that a person will choose ABI Investments, given they have chosen to invest in mining. You determine this probability to be: (a) 2% (b) 4.5% (c) 30.77% (d) 69.23%
Answer: C = 30.77%
Justification: NOTE: The following solution utilises Bayes' Theorem. Whilst one can avoid using Bayes' Theorem to come up with an answer, it is useful to see its role. Students in some semesters will not be exposed to Bayes' Theorem, so would not be expected to see such a difficult question on their final exams. Please do not ask about this on the discussion board.
An investment company, ABI, determines from its market share that 10% of investors choose ABI to manage their portfolios: P(A) = .10; P(A^C) = .90
A recent industry survey reveals that 20% of people invest in mining, given they choose ABI to manage their portfolio: P(M | A) = .20
The survey revealed that only 5% of people invest in mining, given they used an alternative firm to manage their portfolio: P(M | A^C) = .05
To assess the potential attraction of people to ABI because of its successful portfolio management in mining investments, the firm wishes to determine the likelihood a person will choose ABI Investments, given they have chosen to invest in mining: P(A | M) = ?
This probability is found using Bayes' Theorem:
P(A | M) = P(A).P(M | A) / [P(A).P(M | A) + P(A^C).P(M | A^C)] = (.10)(.20) / [(.10)(.20) + (.90)(.05)] = .02 / (.02 + .045) = .307692
Even without knowledge of Bayes' Theorem, one can see that:
P(M | A) = P(M and A)/P(A); rearranging gives P(M and A) = P(M | A)*P(A) = .20*.10 = 0.02
P(M | A^C) = P(M and A^C)/P(A^C); rearranging gives P(M and A^C) = P(M | A^C)*P(A^C) = .05*.90 = 0.045
P(M) = P(M and A) + P(M and A^C) = .02 + .045 = 0.065
Finally, P(A | M) = P(A and M) / P(M) = 0.02 / 0.065 = .307692
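The P20 Bayes calculation, sketched in Python for a two-outcome partition (A and A^C). The function name `bayes_posterior` is illustrative; each line mirrors one step of the manual working.

```python
def bayes_posterior(prior, likelihood, likelihood_other):
    """P(A | M) = P(A)P(M|A) / [P(A)P(M|A) + P(A^C)P(M|A^C)]."""
    joint = prior * likelihood                    # P(M and A)
    joint_other = (1 - prior) * likelihood_other  # P(M and A^C)
    return joint / (joint + joint_other)          # denominator is P(M)


print(round(bayes_posterior(0.10, 0.20, 0.05), 6))
```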
Question P21: A survey reveals a breakdown of the number of visits households made to a National Park in the previous year:

Number of Visits   # Respondents
None               800
1 only             100
2 only             70
3 only             30
4 only             0

Ignoring outcomes relating to those who visit five times a year or more, the average (expected) number of visits to a National Park that a visitor will make will be? (a) No visits per year (b) 0.14 visits per year (c) 0.33 visits per year (d) 1 visit per year
Answer: C = .33 visits per year.
Justification: Looking only at the none, 1, 2 and 3 only categories (the 4 only category has no respondents) examines 800 + 100 + 70 + 30 = 1000 respondents. Hence, the probabilities are:

x   p(x)   x.p(x)
0   .80    0
1   .10    .10
2   .07    .14
3   .03    .09

Total: sum of x.p(x) = 0 + .10 + .14 + .09 = .33

Question P22: A sub-set of responses from an observational survey of people at a supermarket reveals that each person spent the following amount of time (in seconds) waiting at the check-out: 70; 120; 20; 100; 80; 40; 30; 220. The average and standard deviation of check-out waiting time for this sample are: (a) 85 seconds and 60.4 seconds, respectively. (b) 85 seconds and 64.6 seconds, respectively. (c) 85 seconds and 4171.4 seconds, respectively. (d) 85 seconds and 3650 seconds, respectively.
Answer: B = 85 seconds and 64.6 seconds, respectively.
Justification: These data are a sample, so we must use the sample mean and the sample standard deviation (dividing by n – 1). Excel output:
N        8
Average  85            (sample mean)
Stdev    64.58659746   (sample standard deviation; option b)
Var      4171.428571   (sample variance; option c)
Stdevp   60.41522987   (population standard deviation; option a)
Varp     3650          (population variance; option d)
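Both P21 and P22 can be checked with Python's standard `statistics` module, sketched below. `statistics.stdev` divides by n – 1 (Excel's Stdev) while `statistics.pstdev` divides by n (Excel's Stdevp), which is exactly the distinction between options (b) and (a).

```python
import statistics

# Question P21: respondents by number of visits (the "4 only" category is empty)
counts = {0: 800, 1: 100, 2: 70, 3: 30, 4: 0}
total = sum(counts.values())
expected_visits = sum(x * c / total for x, c in counts.items())
print(f"E(X) = {expected_visits:.2f}")

# Question P22: check-out waiting times (seconds)
times = [70, 120, 20, 100, 80, 40, 30, 220]
print(statistics.mean(times))    # sample mean
print(statistics.stdev(times))   # sample SD, divides by n - 1 (option b)
print(statistics.pstdev(times))  # population SD, divides by n (option a)
```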
Question P23: Based on historical attendance records, the number of people attending an IKG business seminar is assessed to be normally distributed with a mean of 50 people and a standard deviation of 10 people. Caterer rates increase substantially when numbers exceed 60 people. What is the probability that this will occur? (a) 16% (b) 20% (c) 34% (d) 84%
Answer: A = 16%
Justification: P(x > 60) = P(z > (60 – 50)/10) = P(z > 10/10) = P(z > 1) = 1 – P(z ≤ 1) = 1 – .8413 = .1587

Question P24: A listing of stock reveals that twenty stereos were sold last month while a total of fifty DVD players were sold. Using last month's stock listings, the probability that a stereo will be sold this upcoming month is closest to: (a) 29% (b) 40% (c) 60% (d) 71%
Answer: A = 29%
Justification: 20 stereos + 50 DVD players = 70 items sold. Hence, P(stereo) = 20 / 70 = .285714.

Question P25: You have asked your administrative assistant, Cindy, to run a regression of your weekly departmental expenditure over several years against various tasks that have been completed but that have no direct expenditure amount. Cindy runs regression 1 and reports the first part of the results in the table below. Cindy then runs a second regression in which she includes some more variables that she had initially forgotten to include.

                    Regression 1   Regression 2
Multiple R          0.965577       0.96543
R Square            0.932338       0.932054
Adjusted R Square   0.931925       0.931778
Standard Error      0.286021       0.286329
Observations        495            495

You are immediately suspicious of Cindy's report because: (a) Multiple R-squared went down (b) The adjusted R-squared went down (c) Both (a) and (b) (d) The number of observations did not change (e) Cindy is a psychopath and shouldn't be trusted
Answer: A = Multiple R-squared went down.
Justification: If you add variables to a regression fitted to the same observations, R-squared can only stay the same or go up, never down. The adjusted R-squared penalises extra variables, so it can go either up or down; its fall is not by itself suspicious.
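The Question P23 tail probability can be sketched in Python with `statistics.NormalDist` (Python 3.8+) in place of the z table.

```python
from statistics import NormalDist

attendance = NormalDist(mu=50, sigma=10)
p_over_60 = 1 - attendance.cdf(60)  # P(X > 60) = 1 - P(X <= 60)
print(f"{p_over_60:.4f}")
```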
Question P26: Lisa is a university researcher examining several variables and their effects on average prices in consumer markets. She is interested in predicting the movement in general inflation indicators such as the cost of a loaf of bread or a litre of milk. Lisa hypothesises that average prices are determined by oil prices: higher oil prices will drive average prices up. Also, she believes that interest rate rises will see average prices fall in most consumer markets. She argues that any increase in importation taxes will see the supply of produce decrease, and so prices impacted by such taxation increases will skyrocket. Lisa formalises her theory into a regression framework, examining the following variables: X1 = oil, cost per barrel; X2 = interest rates; X3 = importation taxes. If Y is the average price in the consumer market Lisa is examining and she estimates the regression function Y = b0 + b1X1 + b2X2 + b3X3 + error, then the expected signs of b1, b2 and b3 respectively are: (a) positive, positive, and positive (b) positive, negative, and positive. (c) positive, negative and negative. (d) negative, negative and positive.
Answer: B = positive, negative, positive.
Justification: Higher [oil] prices will drive average prices up: hence, b1 is positive. Interest rate rises will see average prices fall, a negative effect: hence, b2 should be negative. As importation taxes increase, the supply of goods will decrease, forcing prices up: hence, b3 is positive. In summary, the answer is positive, negative and positive.
Question P27: A manager wishes to quickly assess the percentage of employees who take up to a certain amount of time to travel to work. A survey asks employees how long they have taken that day and their answers are recorded in minutes. Which method would be the most appropriate summary technique to describe the data and achieve the manager's objectives? (a) Correlation Coefficient (b) Frequency Histogram (c) Ogive (d) Scatter-plot
Answer: C = Ogive
Justification:
Correlation coefficient: a numerical measure that examines the relationship between TWO QUANTITATIVE variables.
Frequency histogram: a graph showing how often quantitative values fall in each class; it does not directly show cumulative percentages.
Ogive: a graph of the cumulative percentage of observations at or below each value of x, so the percentage of employees taking up to a given travel time can be read directly.
Scatter-plot: a graphical measure that examines the relationship between TWO QUANTITATIVE variables.
Question P28: Brett, the owner of a small business, Drywall Plumbing, has been told that his investment funds will take between 5 and 10 days to transfer to his day-to-day account. He has a bill due in 7 days. What is the probability that the funds will be available in time to pay the bill? (a) 30% (b) 40% (c) 60% (d) 70%
Answer: B = 40%
Justification: Since we have no other information, assume a uniform distribution between a = 5 and b = 10 days. The height of the density is 1 / (b – a) = 1 / (10 – 5) = 1/5 = .2
P(x ≤ 7) = (7 – 5)*.2 = 2*.2 = .4
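A minimal Python sketch of the uniform CDF used in P28, under the same assumption that transfer time is uniform between 5 and 10 days. `uniform_cdf` is an illustrative name.

```python
def uniform_cdf(x, a, b):
    """P(X <= x) for X ~ Uniform(a, b)."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)  # area of the rectangle of height 1 / (b - a)


print(uniform_cdf(7, 5, 10))  # funds arrive within 7 days
```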
Question P29: Suppose the weight gain over a 12 month period that is induced as a side-effect of a new drug for treating patients is normally distributed. To estimate the mean weight gain, a sample of 51 patients was drawn and the sample mean was found to be 25 kilograms gained over 12 months, with a standard deviation of 6kg determined from the same sample. Constructing a 90% confidence interval estimate of the mean 12-month weight gain for all patients, the upper limit of your confidence interval will be approximately: (a) 24 kg (b) 26 kg (c) 31 kg (d) 37 kg
Answer: B = 26 kg
Justification: Confidence interval = point estimate ± (critical value)(standard error), where:
point estimate = sample mean = 25 kg;
critical value: sourced from the t-distribution since σ is unknown. 90% confidence implies an area of 0.05 in the upper tail, with n – 1 = 51 – 1 = 50 df, giving t = 1.676 (based on t-tables, so some element of inaccuracy);
standard error = standard error of the sample mean = s / sqrt(n) = 6 / sqrt(51) = 6 / 7.1414 = .840168.
Hence the 90% confidence interval is: 25 ± (1.676)(.840168) = 25 ± 1.408122 = approx. 23.59 to 26.41 kg, with 26.41 kg the upper limit.
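The P29 interval, sketched in Python. As in the solution, the t critical value 1.676 is taken from the table and passed in, since the standard library does not compute t quantiles; `t_interval` is an illustrative name.

```python
import math


def t_interval(xbar, s, n, t_crit):
    """Return (lower, upper) = xbar -/+ t_crit * s / sqrt(n)."""
    margin = t_crit * s / math.sqrt(n)
    return xbar - margin, xbar + margin


# t_crit = 1.676 is the table value quoted in the solution (df = 50, upper tail .05)
lower, upper = t_interval(25, 6, 51, 1.676)
print(f"{lower:.2f} to {upper:.2f} kg")
```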
Question P30: Karen found that she could predict whether a car was speeding from the make and model of the vehicle, although not without some error. On Thursday she over-predicts the number of cars speeding by 2 cars. On Friday she under-predicts the number of speeding cars by 4 cars. Using the philosophy of regression, which day is associated with more error in making predictions?
(a) Thursday (b) Friday (c) It cannot be determined because we don't know how many cars she predicted would be speeding. (d) It cannot be determined because we don't know how many cars she observed were actually speeding. (e) Karen is always speeding and should slow down
Answer: B = Friday
Justification: It doesn't matter whether we are over- or under-predicting; what matters is by how much, as measured by the squared residual (squared error): (-2)^2 = 4 on Thursday versus 4^2 = 16 on Friday. We don't need to know the observed or predicted values to work this out.

Question P31: A sample consisting of fifty observations is taken to examine the quality of a production line. The population of quality is known to follow a normal distribution. The sample reveals a mean quality rating of 95 and standard deviation of 2. What is the standard error of the sample mean quality rating?
(a) 0.04 (b) 0.28 (c) 2.00 (d) 7.07
Answer: B = .2828
Justification: n = 50; s = 2; the population size N is unknown (treated as infinite), so no finite population correction is needed. Hence, s(xbar) = s / sqrt(n) = 2 / sqrt(50) = 2 / 7.071068 = .2828
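A short sketch checking P30 and P31 above. For P30 it assumes the usual convention residual = observed - predicted, so over-predicting by 2 gives -2 and under-predicting by 4 gives +4; the squared residual, not the sign, measures the error. For P31 it computes s / sqrt(n).

```python
import math

# P30: residual = observed - predicted (assumed sign convention).
residuals = {"Thursday": -2, "Friday": 4}
squared = {day: r ** 2 for day, r in residuals.items()}
worse_day = max(squared, key=squared.get)  # day with the larger squared error
print(squared, "->", worse_day)

# P31: standard error of the sample mean.
n, s = 50, 2.0
se_mean = s / math.sqrt(n)
print(f"SE = {se_mean:.4f}")
```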
Question P32: A solicitor wishes to assess the likely interest in running a series of seminars on property law in the upcoming month among the executive clients who used the solicitor's services in the last year. The solicitor sends an email to a random selection of thirty executive clients to gauge interest in the potential seminar series. Each email asks the recipient whether they would attend the seminar series if it were organised. This is an example of a:
(a) Cross-sectional survey using the population (b) Cross-sectional survey using a sample (c) Time-series survey using the population (d) Time-series survey using a sample
Answer: B = Cross-sectional survey using a sample
Justification: Cross-sectional: while the seminars may run as a series, only a one-off survey at a single point in time is needed. Only a random selection of clients was contacted, hence this represents a sample.
Question P33: A concreting company wishes to compare additives used in “batches” of concrete. In particular, the company wishes to assess how long concrete with each additive takes to set, on average, and which additive “is best”. There are two sets of machinery each running different batches, one with an old style additive and one with a new style additive. The old style additive is used in 140 batches and reveals that, on average, concrete would set in 17.2 hours with a standard deviation of 2.5 hours. Using a new style additive in 140 batches also, the concrete sets with a mean of 15.9 hours and a standard deviation of 1.8 hours. Using only the information given, a researcher must now advise which additive to use. The researcher is best advised to consider testing:
(a) the difference between means of two populations, and assume independent samples (b) the difference between means of two populations using a matched samples approach (c) the difference between two population proportions using independent samples (d) the difference between two population proportions using a matched samples approach
Answer: A = the difference between means of two populations, assuming independent samples
Justification: We are comparing average setting time, a continuous variable, so we look at the MEAN. We cannot create “pairs” of observations even though both samples have the same number of observations, because there is no clear basis for matching one batch to another. Hence, we are forced to use independent samples.

Question P34: A student researching residents' attitudes to a new building proposal for a local shopping centre decides to visit the existing shopping centre. The researcher stops people at random as they walk through the centre. The sampling method being used is best described as:
(a) cluster sampling (b) convenience sampling (c) random sampling (d) justified sampling
Answer: B = Convenience Sampling
Justification: Please note that in previous semesters students have been assessed on this topic. Your lecture notes will guide you as to whether this topic is assessable or not, so please do not ask on the UTSOnline discussion board if it is or is not in a given semester. The sample is NOT random because we do not have a list of residents and are not drawing randomly from such a list. The student is simply obtaining responses from the residents who happen to be most conveniently located in the vicinity. It is not a cluster sample because cluster sampling is a type of random sampling. There is no such thing as "justified sampling".
Question P35: A clothing manufacturer has had some bad experiences in the past when taking actions that deviated from what it normally does. For instance, it decided to print t-shirts using a new dye because the rate of defects from several experiments appeared to be significantly lower than the defect rates using previous dyes. Upon implementation, the new dye system turned out to be a major mistake. When considering future decisions to deviate from current strategies, the company should consider:
(a) maximising critical values (b) minimising statistical risk (c) minimising Type I errors (d) minimising Type II errors
Answer: C = minimising Type I errors
Justification: The company wants to decrease the likelihood that it decides to reject the null hypothesis (the status quo) when the null hypothesis is in fact true. For instance, concluding Ha: new defect rate < current defect rate when in fact H0: new defect rate >= current defect rate holds. That is, minimise the probability of a Type I error. Minimising Type II errors instead reduces the likelihood of deciding NOT to reject the null hypothesis when it is false, i.e. of wrongly staying with the status quo, which is not the company's concern here.

Question P36: A furniture manufacturer receives instructions that the commercial panels it produces must conform to an average thickness of .75 centimetres. Each hour, 50 panels are selected at random and precisely measured. After 20 hours, a total of 1000 panels have been measured, with an average thickness of 0.753cm and a standard deviation of .034cm. Based on the sample data, what should the company conclude about its product meeting the thickness specification?
(a) The average thickness is not significantly different to the desired level (b) The average thickness is significantly different to the desired level (c) The sample mean is not normal so we cannot make any conclusion about meeting standards, but would instead use a binomial distribution (d) One would need to observe the entire population of panels to be confident that production meets the desired standard
Answer: B = The average thickness is significantly different to the desired level.
Justification: Adapted from Groebner p325.
Ho: μ = .75cm (status quo); Ha: μ ≠ .75cm (claim); two-tailed.
α is not given, so assume α = .05. σ is unknown, so estimate it with s and use the t-distribution with n - 1 = 999 df. The sample size is large (n = 1000 > 30).
xbar = 0.753; s = 0.034; n = 1000; SE = s/sqrt(n) = 0.034/sqrt(1000) = .001075
Critical value = t with 0.025 in the upper tail and 999 df ≈ 1.96 (∞ row of the t table).
Critical values in cm = μ +/- t*SE = .75 +/- 1.96(.034/sqrt(1000)) = .747893cm and .752107cm
Evidence: xbar = 0.753cm lies in the rejection region.
Confirming via the test statistic: (0.753 - 0.75)/(0.034/sqrt(1000)) = 2.790245, which again lies in the rejection region since it exceeds the critical value of 1.96.
We reject Ho and adopt the view suggested by the alternative: the average thickness is significantly different from the desired level of .75cm.
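A minimal sketch of the two-tailed test above. As in the solution, the critical value 1.96 (the ∞ row of the t table) stands in for t(0.025, 999), since the standard library has no t quantile function.

```python
import math

mu0 = 0.75     # specified mean thickness (cm)
xbar = 0.753   # sample mean (cm)
s = 0.034      # sample standard deviation (cm)
n = 1000       # panels measured
crit = 1.96    # approximates t(0.025, 999 df), per the solution

se = s / math.sqrt(n)
t_stat = (xbar - mu0) / se
lower, upper = mu0 - crit * se, mu0 + crit * se  # non-rejection region in cm
print(f"SE = {se:.6f}, t = {t_stat:.4f}")
print(f"Do not reject Ho if xbar lies in [{lower:.6f}, {upper:.6f}] cm")
print("Reject Ho" if abs(t_stat) > crit else "Do not reject Ho")
```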
Question P37: The owner of Wild Club and Grill, a nightclub, wishes to construct a confidence interval estimate for the mean number of customers coming into the club. The owner monitors customer numbers on several randomly selected evenings. The width of the confidence interval estimate will be narrower:
(a) if one decreases the number of evenings that are monitored; (b) if the mean number of visitors decreases; (c) if the mean number of visitors increases; (d) if the variation in visitor numbers over all evenings decreases.
Answer: D = if the actual variation in visitor numbers over all evenings decreases.
Justification: Confidence Interval = point estimate +/- (critical value)(standard error). In this example, CI = sample mean visitors +/- (z(α/2) or t value)(s or σ)/sqrt(n).
Statement (a): decreasing the number of evenings monitored means n decreases, which increases the standard error and hence widens the CI.
Statements (b) and (c): a decrease or increase in the actual mean number of visitors shifts the point estimate, but does not by itself change the standard error. Hence the width of the CI is unaffected.
Statement (d): decreasing variation means σ (and, in the sample, s) decreases, whether or not we observe it directly. This decreases the standard error and hence the width of the CI.

Question P38: Out of 1000 patents filed, 400 patents were filed with an accelerated request being made. Assuming patent filings are independent of one another, a random selection of six is made for a complete audit of the patent office's decisions. What is the probability that exactly two of those selected for the audit were filed with an accelerated request?
(a) 4.6% (b) 18.66% (c) 31.1% (d) 54.36%
Answer: C = 31.1%
Justification: Use the BINOMIAL distribution since the trials are independent. There are n = 6 selections (trials), and p = 400/1000 = .40. P(x=2) = 6C2(.4^2)(.6^4) = 15(.16)(.1296) = .31104

Question P39: The following table summarises workers' roles within a particular firm with 50 employees:
Department    Frequency (Number of Employees)    Cumulative Frequency
Management    10                                 10
Production    10                                 20
Marketing     15                                 35
Accounting    15                                 50
Which statement is correct?
(a) 30% of employees are employed in production (b) 30% of employees are employed in production or less (c) 30% of employees are employed in marketing (d) 30% of employees are employed in marketing or less
Answer: C = 30% of employees are employed in marketing
Justification: Total freq = 10 + 10 + 15 + 15 = 50. There are 10 / 50 = 20% of employees in production and 15 / 50 = 30% of employees in marketing. Students should recognise that the cumulative frequency is meaningless here, because departments are categorical (qualitative) data with no natural order, so "production or less" and "marketing or less" have no meaning. Hence, options (b) and (d) are incorrect.
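The binomial probability in P38 and the department percentages in P39 can be checked with a few lines of Python. This is a sketch; `math.comb` requires Python 3.8 or later.

```python
from math import comb

# P38: X ~ Binomial(n=6, p=0.4); probability of exactly two accelerated requests.
n, p, k = 6, 400 / 1000, 2
p_two = comb(n, k) * p**k * (1 - p)**(n - k)
print(f"P(X = 2) = {p_two:.5f}")

# P39: percentage of employees in each department (categorical data, no ordering).
staff = {"Management": 10, "Production": 10, "Marketing": 15, "Accounting": 15}
total = sum(staff.values())
shares = {dept: 100 * count / total for dept, count in staff.items()}
print(total, shares)
```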
Question P40: A new car technology means that a specially developed vehicle will require special charging stations, similar in concept to petrol stations. The proposed number of charging stations encountered on a 50km journey follows a Poisson distribution, with one station encountered, on average. The new vehicle has a maximum range of 200km. What is the probability that at least one station will be encountered within a maximum-range journey?
(a) 2% (b) 63% (c) 47% (d) 98%
Answer: D = 98%
Justification: The Poisson distribution has mean mu = 1 station / 50 km, so mu = 4 stations / 200 km (multiplying both by 4). We want at least one station, i.e., P(x>=1) = 1 - P(x=0).
P(x) = (mu^x)(exp(-mu))/(x!)
P(x=0) = (4^0)(exp(-4))/(0!) = 1*exp(-4)/1 = exp(-4) = .018316
P(x>=1) = 1 - .018316 = .981684
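A minimal sketch of the Poisson calculation above, scaling the rate from 50 km to the 200 km range before applying the complement rule.

```python
import math

def poisson_pmf(x, mu):
    """P(X = x) for a Poisson random variable with mean mu."""
    return mu**x * math.exp(-mu) / math.factorial(x)

# Rate of 1 station per 50 km scales to 4 stations per 200 km.
mu = 1 * (200 / 50)
p_none = poisson_pmf(0, mu)
p_at_least_one = 1 - p_none  # complement rule: P(X >= 1) = 1 - P(X = 0)
print(f"P(X = 0) = {p_none:.6f}, P(X >= 1) = {p_at_least_one:.6f}")
```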
THIS COMPLETES ALL THE QUESTIONS IN YOUR FINAL EXAM
CONGRATULATIONS ☺
Cumulative Probabilities for the Standard Normal Distribution. Entries in the table give the area under the normal probability distribution to the left of the z value. For example, for z = 1.25 the cumulative probability is .8944.
z     0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
0.0   0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1   0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2   0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3   0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.4   0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.5   0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.6   0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.7   0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
0.8   0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
0.9   0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1.0   0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.1   0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
1.2   0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1.3   0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.4   0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
1.5   0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
1.6   0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545
1.7   0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633
1.8   0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706
1.9   0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767
2.0   0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817
2.1   0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857
2.2   0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890
2.3   0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916
2.4   0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936
2.5   0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952
2.6   0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964
2.7   0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974
2.8   0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981
2.9   0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986
3.0   0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990
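Entries in the standard normal table can be reproduced, to four decimal places, from the error function using the identity Φ(z) = ½(1 + erf(z/√2)). This is a sketch for checking table lookups; the function name `std_normal_cdf` is illustrative only.

```python
import math

def std_normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Compare with table entries, e.g. z = 1.25 -> .8944
for z in (0.0, 1.25, 1.96, 2.576):
    print(f"z = {z}: {std_normal_cdf(z):.4f}")
```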
t Distribution
Entries in the table give t values for an area or probability in the upper tail of the t distribution. With 10 degrees of freedom and .05 area in the upper tail, t.05 = 1.812.

Degrees of    Area in Upper Tail
Freedom       0.10    0.05    0.025   0.01    0.005
1             3.078   6.314   12.706  31.821  63.656
2             1.886   2.920   4.303   6.965   9.925
3             1.638   2.353   3.182   4.541   5.841
4             1.533   2.132   2.776   3.747   4.604
5             1.476   2.015   2.571   3.365   4.032
6             1.440   1.943   2.447   3.143   3.707
7             1.415   1.895   2.365   2.998   3.499
8             1.397   1.860   2.306   2.896   3.355
9             1.383   1.833   2.262   2.821   3.250
10            1.372   1.812   2.228   2.764   3.169
11            1.363   1.796   2.201   2.718   3.106
12            1.356   1.782   2.179   2.681   3.055
13            1.350   1.771   2.160   2.650   3.012
14            1.345   1.761   2.145   2.624   2.977
15            1.341   1.753   2.131   2.602   2.947
16            1.337   1.746   2.120   2.583   2.921
17            1.333   1.740   2.110   2.567   2.898
18            1.330   1.734   2.101   2.552   2.878
19            1.328   1.729   2.093   2.539   2.861
20            1.325   1.725   2.086   2.528   2.845
21            1.323   1.721   2.080   2.518   2.831
22            1.321   1.717   2.074   2.508   2.819
23            1.319   1.714   2.069   2.500   2.807
24            1.318   1.711   2.064   2.492   2.797
25            1.316   1.708   2.060   2.485   2.787
26            1.315   1.706   2.056   2.479   2.779
27            1.314   1.703   2.052   2.473   2.771
28            1.313   1.701   2.048   2.467   2.763
29            1.311   1.699   2.045   2.462   2.756
30            1.310   1.697   2.042   2.457   2.750
40            1.303   1.684   2.021   2.423   2.704
60            1.296   1.671   2.000   2.390   2.660
120           1.289   1.658   1.980   2.358   2.617
∞             1.282   1.645   1.960   2.326   2.576