NAME: Pooja Dutt
ROLL NO: 1411008323
1 Statistics plays a vital role in almost every facet of human life. Describe the functions of Statistics. Explain the applications of statistics.

Meaning of statistics
Functions of statistics
Applications of statistics

Answer:
According to Seligman, “Statistics is a science which deals with the method of collecting, classifying, presenting, comparing and interpreting the numerical data to throw light on enquiry”.

Functions of Statistics
To represent facts in the form of numerical data: The first function of statistics is to present a given problem in terms of numerical figures. Numerical presentation gives a better understanding of the nature of the problem.
To condense and summarize a mass of data: Generally, the problem to be investigated is represented by a large mass of numerical figures which are very difficult to understand and remember. Using various statistical methods, this large mass of data can be reduced to totals, averages, percentages, etc., and presented either graphically or diagrammatically.
To facilitate comparison of data: Absolute figures on their own often do not convey any significant meaning. Many times, the purpose of a statistical analysis is therefore to compare various phenomena by computing one or more measures such as the mean, variance, ratios, percentages and various types of coefficients. Comparing these measures helps us to draw conclusions.
To formulate and test hypotheses for the purpose of correlation: A hypothesis is a statement about some characteristic of a population.
To forecast future trends: The success of planning by the Government or by an organization depends to a large extent upon the accuracy of its forecasts. Statistical methods provide a scientific basis for making such forecasts.

Application of Statistics
In the field of medicine, statistical tools like t-tests are used to test the efficacy of a new drug or medicine.
In the field of economics, statistical tools such as index numbers, estimation theory and time series analysis are used in solving economic problems related to wages, price, production and distribution of income.
In the field of agriculture, analysis of variance (ANOVA) is used in agricultural experiments to test the significance of differences among sample means.
Insurance companies decide on insurance premiums based on the age composition of the population and the mortality rates. Actuarial science is used for the calculation of insurance premiums and dividends.
Statistics is a part of Economics, Commerce and Business. Statistical analysis of the variations in price, demand and production is helpful to both businessmen and economists. Cost of living index numbers help governments in economic planning and fixation of wages. A government’s administrative system is fully dependent on production statistics, income statistics, labour statistics, and economic indices of cost and price. Economic planning of any nation is entirely based on statistical facts. Cost of living index numbers are also used to estimate the value of money.
In business activities, analysis of demand, price, production cost and inventory costs helps in decision making. Management of limited resources and labour needs statistical methods to maximize profit.
2 a) Explain the approaches to define probability. b) State the addition and multiplication rules of probability giving an example of each case.

a) Explanation of the approaches to define probability
b) Addition and multiplication rules of probability giving an example of each

Answer:
a) There are four approaches to define probability. They are:

1) Classical / mathematical / priori approach
Under this approach the probability of an event is known before conducting the experiment. Each of the possible outcomes is associated with an equal probability of occurrence, and the number of outcomes favorable to the concerned event is known. Let a random experiment have n equally likely, mutually exclusive and exhaustive outcomes. Let m of these outcomes be favorable to an event A. Then the probability of A is:
P(A) = Number of favorable outcomes / Total number of outcomes = m / n
where ‘m’ is the number of favorable outcomes and ‘n’ is the total number of outcomes of the experiment.

2) Statistical / relative frequency / empirical / posteriori approach
Under this approach the probability of an event is arrived at after conducting an experiment. If we want to know the probability that a particular household in an area will have two earning members, then we have to gather data on all households in that area and then arrive at the probability. The greater the number of households surveyed, the greater the accuracy of the probability arrived at.

3) Subjective approach
Under this approach the investigator or researcher assigns probability to the events either from experience or from past records. It is more suitable when the sample size is ten or fewer, so that the investigator has full knowledge of the characteristics of each and every individual. However, there is a chance of personal bias being introduced in such a probability.
4) Axiomatic approach
Let S be a sample space consisting of all events of a random experiment and let A ⊆ S. Then the probability of an event A is a set function satisfying the following axioms:
i) Axiom of positivity: P(A) ≥ 0
ii) Axiom of certainty: P(S) = 1
iii) Axiom of additivity: if A and B are mutually exclusive events, then P(A ∪ B) = P(A) + P(B)

b) Addition rule
The addition rule of probability states that:
i) If ‘A’ and ‘B’ are any two events then the probability of the occurrence of either ‘A’ or ‘B’ is given by:
P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
Example: when one card is drawn from a well-shuffled pack of 52 cards, P(king or heart) = 4/52 + 13/52 - 1/52 = 16/52 = 4/13.
ii) If ‘A’ and ‘B’ are two mutually exclusive events then the probability of occurrence of either ‘A’ or ‘B’ is given by:
P(A ∪ B) = P(A) + P(B)
Example: when a fair die is thrown once, P(2 or 5) = 1/6 + 1/6 = 1/3.
iii) If ‘A’, ‘B’ and ‘C’ are any three events then the probability of occurrence of either ‘A’ or ‘B’ or ‘C’ is given by:
P(A ∪ B ∪ C) = P(A) + P(B) + P(C) - P(A ∩ B) - P(B ∩ C) - P(A ∩ C) + P(A ∩ B ∩ C)

Multiplication rule
If ‘A’ and ‘B’ are two independent events then the probability of occurrence of ‘A’ and ‘B’ is given by:
P(A ∩ B) = P(A) × P(B)
Example: when a fair coin is tossed twice, P(head on both tosses) = 1/2 × 1/2 = 1/4.

3 a) The procedure of testing hypothesis requires a researcher to adopt several steps. Describe in brief all such steps. b) Explain the components of time series.

a) Hypothesis testing procedure
b) Components of time series

Answer:
a) Procedure of testing hypothesis

Null and Alternate hypothesis
In hypothesis testing, we must state the assumed or hypothesized value of the population parameter before we begin sampling. The assumption we wish to test is called the null hypothesis and is symbolized by ‘H0’. The conclusion we accept when the sample data lead us to reject H0 is called the alternate hypothesis and is symbolized by ‘H1’.

Interpreting the level of significance
The purpose of hypothesis testing is not to question the computed value of the sample statistic but to make a judgment about the difference between that sample statistic and a hypothesised value of the population parameter.

Hypotheses are accepted, not proved
Even if our sample statistic does fall in the non-shaded region (the acceptance region), this does not prove that our null hypothesis (H0) is true; it simply does not provide statistical evidence to reject it. Why? The only way in which the hypothesis can be accepted with certainty is for us to know the population parameter; unfortunately, this is not possible. Therefore, whenever we say that we accept the null hypothesis, we actually mean that there is not sufficient statistical evidence to reject it. Use of the term accept, instead of do not reject, has become standard practice. It means that when the sample data do not lead us to reject a null hypothesis, we proceed as if the hypothesis is true.

b) Components of time series
i) Long term trend or secular trend: It can be defined as “a consistent long term change in the average level of the forecast variable per unit of time”. The steady increase in the population of India recorded by the census department is an example of secular trend.
ii) Seasonal variations: Seasonal variations are caused by the influence of the seasons (spring, summer, autumn and winter) on business and economic activities. They involve patterns of change within a year that tend to be repeated from year to year. For example, the hotel industry can expect a substantial increase in the number of tourists during spring and autumn every year. Similarly, a physician can expect an increase in the number of flu cases during the winter. As these are regular patterns, they are useful in forecasting the future.
iii) Cyclic variations: The third type of variation is cyclic fluctuation, in which the values of the variable under study tend to rise and fall in line with the fluctuations of the business cycle.
iv) Random variations: This is the fourth type of change in time series analysis. In many situations the value of a variable may be completely unpredictable, changing in a random manner. Irregular variations describe such movements. They can occur due to strikes, breakdown of plants, non-seasonal illness, bad weather, etc. Such variations may fall sharply or rise abruptly to a peak.

4 a) What is a Chi-square test? Point out its applications. Under what conditions is this test applicable? b) Discuss the types of measurement scales with examples.

a) Meaning, applications and conditions
b) Types of measurement scales with examples

Answer:
a) Meaning of Chi-square test
The Chi-square test is one of the most commonly used non-parametric tests in statistical work. The Greek letter χ² is used to denote this test. χ² describes the magnitude of the discrepancy between the observed and the expected frequencies. The value of χ² is calculated as:
χ² = ∑ (O - E)^2 / E
where O is the observed frequency and E is the expected frequency of each item or cell.
Applications of Chi-square test
Tests for independence of attributes: In the test for independence, the null hypothesis is that the row and column variables are independent of each other. The hypothesis testing is done under the assumption that the null hypothesis is true.
Test of goodness of fit: The test of goodness of fit of a statistical model measures how well the model fits a set of observations. This test measures and summarises the differences, if any, between the observed values and the values expected under the statistical model. The test results help to determine whether the samples are drawn from identical distributions or not.
Test for comparing variance: When we have to use χ² as a test of population variance, then
χ² = (n - 1) s^2 / σ^2
where
s^2 = variance of the sample
σ^2 = variance of the population
(n - 1) = degrees of freedom, n being the number of items in the sample.

Conditions for applying the Chi-Square test
1. The frequencies used in the Chi-Square test must be absolute and not in relative terms.
2. The total number of observations collected for this test must be large.
3. The observations which make up the sample for this test must be independent of each other.
4. As the χ² test is based wholly on sample data, no assumption is made concerning the population distribution. In other words, it is a non-parametric test.
5. The χ² test is wholly dependent on degrees of freedom. As the degrees of freedom increase, the Chi-Square distribution curve becomes more symmetrical.
6. The expected frequency of any item or cell must not be less than 5. If it is, the frequencies of adjacent items or cells should be pooled together so that the pooled expected frequency is 5 or more.
7. The data should be expressed in original units for convenience of comparison, and the given distribution should not be replaced by relative frequencies or proportions.
8. This test is used only for drawing inferences through tests of hypotheses, so it cannot be used for estimation of a parameter value.
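The goodness-of-fit application and the conditions listed above can be illustrated with a short calculation. The following is only a minimal sketch in plain Python: the observed counts are hypothetical (60 throws of a die, invented for illustration), the statistic is the usual sum of (O - E)^2 / E over all cells, and 11.07 is the standard table value of χ² for 5 degrees of freedom at the 5% level.

```python
# Goodness-of-fit sketch: is a die fair? The observed counts are hypothetical.
observed = [8, 12, 9, 11, 10, 10]      # hypothetical counts from 60 throws
total = sum(observed)

# Under H0 (fair die) every face has the same expected frequency.
expected = [total / len(observed)] * len(observed)

# Condition 6: no expected frequency should be less than 5.
assert all(e >= 5 for e in expected)

# chi-square = sum of (O - E)^2 / E over all cells
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
degrees_of_freedom = len(observed) - 1

# Table value of chi-square for 5 degrees of freedom at the 5% level
critical_value = 11.07

print(f"chi-square = {chi_square:.2f}, df = {degrees_of_freedom}")
if chi_square > critical_value:
    print("Reject H0: the die does not appear to be fair")
else:
    print("Do not reject H0: the data are consistent with a fair die")
```

Absolute counts are used rather than proportions, in line with conditions 1 and 7.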
b) Types of measurement scales

Qualitative (categorical) data
Qualitative data, also known as categorical data, cannot be measured on a numerical scale (quantified). Examples of categorical variables are gender (male or female) and size of T-shirt (XXS, XS, S, M, L, XL and XXL). These two variables differ in one respect: the first is said to be nominal or purely categorical, whereas the second is ordinal.

Nominal (purely categorical) data
Nominal variables allow for only qualitative classification. They can be measured only in terms of whether the individual items belong to some distinctively different categories; we cannot quantify or even rank order these categories. For example, if two individuals differ in terms of a certain variable (for example, they are of different race), we cannot say which one has more of the quality represented by the variable. Typical examples of nominal variables are gender, race, colour, city, marital status, etc.

Ordinal data
Ordinal variables allow us to rank order the items we measure in terms of which has less and which has more of the quality represented by the variable; however, they do not allow us to say how much more. A typical example of an ordinal variable is the socioeconomic status of families. We know that upper-middle is higher than middle, but we cannot say that it is, for example, 18% higher. The distinction between nominal, ordinal and interval scales is itself a good example of an ordinal variable: we can say that nominal measurement provides less information than ordinal measurement, but we cannot say how much less, or how this difference compares to the difference between ordinal and interval scales.

Quantitative (numerical) data
Quantitative data can be easily measured on a numerical scale; variables which can be quantified in terms of units are all quantitative. Examples of quantitative variables are the number of students per class and height (measured in centimetres). Again, these two variables differ in their nature: the first is said to be discrete whereas the second is continuous.
Discrete data: Discrete data occur as definite and separate values; a discrete variable assumes values which are countable, so that there are gaps between its successive values. For example, when counting the number of children in a class, we use whole numbers (0, 1, 2, …, n).
Continuous data: Continuous data occur as the whole set of real numbers or a subset of it. In other words, there are no gaps between successive values, so a continuous variable assumes all the values (including all the decimals) between given boundaries. Temperature is a good example of a continuous variable: though thermometer readings are recorded to the nearest tenth of a degree (Centigrade or Fahrenheit), temperature does not ‘jump’ from, for example, 17.1 °C to 17.2 °C. It passes through all the real numbers between these two values. Height, weight and speed are also continuous variables. Continuous data can be measured on the interval scale and the ratio scale.

5 Business forecasting acquires an important place in every field of the economy. Explain the objectives and theories of Business forecasting.

Meaning of Business forecasting
Objectives of Business forecasting
Theories of Business forecasting

Answer:
Business forecasting refers to the analysis of past and present economic conditions with the object of drawing inferences about probable future business conditions. The process of making definite estimates of the future course of events is referred to as forecasting, and the figure or statement obtained from the process is known as a ‘forecast’. The future course of events is rarely known.
An organised system of forecasting helps in gaining some assurance about the coming course of events.

Objectives of Business forecasting
To a very large extent, success or failure in business depends upon the ability to forecast the future course of events successfully. Without some element of continuity between past, present and future, there would be little possibility of successful prediction. History is not likely to repeat itself exactly, and we would hardly expect economic conditions next year or over the next 10 years to follow a clear-cut pattern. Yet past patterns prevail sufficiently to justify using the past as a basis for predicting the future.
A businessman cannot afford to base his decisions on guesses. Forecasting helps a businessman to reduce the areas of uncertainty that surround management decision making with respect to costs, sales, production, profits, capital investment, pricing, expansion of production, extension of credit, development of markets, increase of inventories and curtailment of loans. These decisions are to be based on present indications of future conditions.
However, it is impossible to forecast the future precisely; some range of error in the forecast is always possible. Statistical forecasting methods use the mathematical theory of probability to measure the risk of error in predictions.
Theories of Business Forecasting
Sequence or time-lag theory: This is the most important theory of business forecasting. It is based on the assumption that most business data have lag and lead relationships, that is, changes in business are successive and not simultaneous. There is a time-lag between different movements.
Action and reaction theory: This theory is based on the following two assumptions:
i) Every action has a reaction.
ii) The magnitude of the original action influences the reaction.
Economic rhythm theory: The basic assumption of this theory is that history repeats itself, and hence that all economic and business events behave in a rhythmic order. According to this theory, the speed and time of all business cycles are more or less the same, and by using statistical and mathematical methods a trend is obtained which represents a long term tendency of growth or decline. This rests on the assumption that the trend line denotes the normal growth or decline of business events.
Specific historical analogy: That history repeats itself is the main foundation of this theory. If conditions are the same, whatever happened in the past under a set of circumstances is likely to happen in future also. A time series relating to the data in question is thoroughly scrutinised, and a period is selected in which conditions were similar to those prevailing at the time of making the forecast. However, this theory depends largely on past data.
Cross-cut analysis theory: This theory proceeds on the analysis of the interplay of current economic forces. In this method, the combined effects of various factors are not studied; the effect of each factor is studied independently. Under this theory, forecasting is made on the basis of analysis and interpretation of present conditions, because past events are taken to have no relevance to present conditions.
6 a) What is analysis of variance? What are the assumptions of this technique? b) Three samples below have been obtained from normal populations with equal variances. Test the hypothesis at 5% level that the population means are equal.

A     B     C
8     7     12
10    5     9
7     10    13
14    9     12
11    9     14

[The table value of F at 5% level of significance for ν1 = 2 and ν2 = 12 is 3.88]
a) Meaning and Assumptions
b) Formulas/Calculation/Solution to the problem

Answer:
a) Meaning of Analysis of Variance
Analysis of Variance (ANOVA) is useful in such situations as comparing the mileage achieved by five different brands of gasoline, testing which of four different training methods produces the fastest learning record, or comparing the first-year earnings of the graduates of half a dozen different business schools. In each of these cases, we would compare the means of more than two samples. Hence the concept of ANOVA is used in most fields, such as agriculture, medicine, finance, banking, insurance, education, etc. In statistical terms, variance measures how widely a set of observations is spread about its mean; ANOVA compares the variation between samples with the variation within samples.

Assumptions of Analysis of Variance
The samples are simple random samples.
The samples are independent of each other.
The parent populations from which they are drawn are normally distributed.
The parent populations have equal variances.

b) Let H0: There is no significant difference between the means of the three populations.

The three samples
       X1    X2    X3
       8     7     12
       10    5     9
       7     10    13
       14    9     12
       11    9     14
∑X     50    40    60

T = Sum of all observations = 150
Correction factor = (T^2)/N = (150^2)/15 = 1500
SST (Total sum of squares) = Sum of squares of all observations - (T^2)/N
= 8^2 + 7^2 + 12^2 + 10^2 + .......... + 14^2 - 1500 = 1600 - 1500 = 100
Sum of squares between the columns (samples):
SSC = (50^2)/5 + (40^2)/5 + (60^2)/5 - 1500 = 1540 - 1500 = 40
Sum of squares of the error within columns (samples):
SSE = SST - SSC = 100 - 40 = 60
Variance between samples: MSC = SSC/(k - 1) = 40/(3 - 1) = 20
Variance within samples: MSE = SSE/(n - k) = 60/(15 - 3) = 5
The degrees of freedom = (k - 1, n - k) = (2, 12). [k is the number of columns and n is the total number of observations]
ANOVA Table
Source of variation    Sum of squares    Df    Mean square    F-value
Between                SSC = 40          2     MSC = 20       Fcal = 20/5 = 4
Within                 SSE = 60          12    MSE = 5
Total                  SST = 100         14

The F table value for degrees of freedom (2, 12) [ν1 = 2, ν2 = 12] at the 5% level of significance is 3.88. Since the calculated F value (4) is greater than the F table value (3.88), we reject the null hypothesis and conclude that the population means are not all equal.
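
The calculation above can be checked with a short script. This is a minimal sketch in plain Python that follows the same correction-factor steps; the function name one_way_anova and the dictionary keys are illustrative choices, not part of the original solution, and the table value 3.88 is the one quoted in the question.

```python
def one_way_anova(samples):
    """Return the one-way ANOVA quantities for a list of samples (lists of numbers)."""
    all_values = [x for sample in samples for x in sample]
    n = len(all_values)                      # total number of observations
    k = len(samples)                         # number of samples (columns)
    correction = sum(all_values) ** 2 / n    # correction factor T^2 / N

    sst = sum(x ** 2 for x in all_values) - correction              # total sum of squares
    ssc = sum(sum(s) ** 2 / len(s) for s in samples) - correction   # between samples
    sse = sst - ssc                                                 # within samples

    msc = ssc / (k - 1)                      # variance between samples
    mse = sse / (n - k)                      # variance within samples
    return {"SST": sst, "SSC": ssc, "SSE": sse,
            "MSC": msc, "MSE": mse, "F": msc / mse, "df": (k - 1, n - k)}


# Data from the question
a = [8, 10, 7, 14, 11]
b = [7, 5, 10, 9, 9]
c = [12, 9, 13, 12, 14]

result = one_way_anova([a, b, c])
f_table = 3.88   # table value of F for (2, 12) df at the 5% level, as quoted in the question

print(result)
print("Reject H0" if result["F"] > f_table else "Do not reject H0")
```

With the data from the question, the script reproduces SST = 100, SSC = 40, SSE = 60 and F = 4, which exceeds 3.88, so H0 is rejected.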