STATISTICS 4040
GCSE Statistics Revision notes

Collecting data

Sample – This is when data is collected from part of the population. There are different methods of sampling: random sampling, stratified sampling, systematic sampling, cluster sampling, quota sampling and convenience sampling.

Random sample – Where each piece of data has an equal chance of being picked.

Methods

Random number table – Tables of random numbers can be used. Here is an extract from a table of random numbers:
36015 37672 90153 67480 26237 10635 34269 01638
Split the numbers into two-digit numbers:
36 01 53 76 72 90 15 36 74 80 26 23 71 06 35 34 26 90 16 38
Then start from 36 and select the numbers between 0 and 50, leaving out any numbers above 50. The first five selected are:
36 01 15 36 26

Calculator – Use the RAN button on your calculator. For numbers from 0 to 100, type 100 Shift RAN and repeat until you have enough numbers.

Numbers in a bag – List the numbers from 1 to 100, put them in a bag and select the required number of them at random.

Stratified sample – Where the data sampled is in proportion to the population.

Example
The table shows the number of students in a school.

Year     Students
7        120
8        100
9        115
10       125
Total    460

A stratified sample of size 30 is to be taken. How many Year 7s will be picked?

Solution
The fraction of Year 7 students in the school is 120/460.
120/460 x 30 = 7.82…
Approximately 8 Year 7 students will be picked.
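The stratified sample calculation above can be repeated for every year group. The following is a minimal sketch in Python, using the student numbers from the table; the variable names are illustrative only.

```python
# Year-group sizes from the table above.
year_sizes = {7: 120, 8: 100, 9: 115, 10: 125}
sample_size = 30

total_students = sum(year_sizes.values())  # 460

# Each year group's share of the sample, in proportion to its size.
shares = {
    year: size * sample_size / total_students
    for year, size in year_sizes.items()
}

for year, share in shares.items():
    print(f"Year {year}: {share:.2f}, so about {round(share)} students")
```

Because each share is rounded separately, the rounded numbers may not add up to exactly 30. In that case one group's number is usually adjusted by one.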
Other sampling techniques

Convenience sample – The first so many pieces of data in the list are sampled. (Quick but unlikely to be representative.)
Quota sample – The amount, or quota, to be sampled from each group is given, e.g. 100 women were sampled.
Systematic sampling – Data is chosen at regular intervals, e.g. every 10th person.
Cluster sampling – The population is divided into groups (clusters) and then a group is chosen at random.
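As an illustration of systematic sampling, here is a short Python sketch that takes every 10th person from a list. The population of 100 numbered people, the function name and the random starting point are assumptions made for the example, not part of the notes.

```python
import random

def systematic_sample(population, interval, rng=random):
    """Pick a random starting point, then take every `interval`-th item."""
    start = rng.randrange(interval)  # random start within the first interval
    return population[start::interval]

# Hypothetical population: people numbered 1 to 100.
people = list(range(1, 101))
sample = systematic_sample(people, 10, random.Random(2024))
print(sample)
```

Starting at a random position within the first interval is one common way to avoid always picking the same people.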
Khodabocus Aihjaaz Ahmad
Census – This is when all of the data in the population is taken. For example, a census of the entire population of the UK is taken every 10 years.

Sample
Advantages: cheaper; less time consuming; less data to be analysed.
Disadvantages: not completely representative; possibly biased.

Census
Advantages: unbiased; accurate; takes account of the whole population.
Disadvantages: time consuming; expensive; difficult to ensure the whole population is surveyed.
Types of Data

Secondary data – This is data that has been collected by someone else.
Advantage – No need to collect it; ready to analyse.
Disadvantage – Could be unreliable.

Primary data – This is data collected by the person doing the analysis.
Advantage – Should be reliable.
Disadvantage – Collecting it is time consuming.

Continuous data – This is data that is on a continuous scale (lengths, heights, weights, measurements).
Discrete data – This is data that consists of separate numbers (shoe sizes, number of people, money).
Quantitative data – This is data that has numerical values (time, heights, weights, number of people).
Qualitative data – This is data that is not numerical (colour, type).

Questionnaires

Open questions – Have no suggested answers and give people the chance to reply as they wish.
Advantage – Allows for a range of answers.
Disadvantage – The range of responses may be too broad, making them hard to analyse.

Closed questions – Give a set of answers for the person to choose from.
Advantage – Restricts responses, making them easy to analyse.
Disadvantage – Will not necessarily cover all possible responses.

Pilot survey (pre-test) – A small-scale replica of the survey to be carried out. Used to ensure that the method of data collection, the questionnaire and the data required are suitable for the bigger survey.

Leading question – Avoid questions that suggest an opinion, such as “Smoking is bad for you. Do you agree?”

Other sampling methods – See page 16.
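Calculations

Means from frequency distributions
Multiply each value by its frequency, sum the results and then divide by the total frequency.
Example: (table not reproduced)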
Means from grouped data
Find the mid-point of each group and multiply it by the group's frequency. Sum these products and then divide by the total frequency.
Example: (table not reproduced)
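The mid-point method for grouped data can be sketched in a few lines of Python. The three groups and their frequencies below are hypothetical, chosen only to keep the arithmetic simple.

```python
# Hypothetical grouped data: (lower bound, upper bound, frequency).
groups = [(0, 10, 2), (10, 20, 5), (20, 30, 3)]

total_frequency = sum(freq for _, _, freq in groups)

# Mid-point of each group multiplied by its frequency, then summed.
sum_fx = sum((low + high) / 2 * freq for low, high, freq in groups)

grouped_mean = sum_fx / total_frequency
print(grouped_mean)  # 16.0
```

The result is an estimate of the mean, because every value in a group is treated as if it sat at the mid-point.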
Standard Deviation
Variance is a measure of the spread of a distribution of data about its mean. The square root of the variance is the standard deviation.
Example 1: (working not reproduced)
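A minimal Python sketch of the calculation, assuming the form of the variance that divides by the number of values (mean of the squares minus the square of the mean). The data set is hypothetical.

```python
import math

# Hypothetical data set.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)

mean = sum(data) / n
# Variance: mean of the squares minus the square of the mean.
variance = sum(x * x for x in data) / n - mean ** 2
standard_deviation = math.sqrt(variance)

print(mean, variance, standard_deviation)  # 5.0 4.0 2.0
```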
Example 2: if the data is grouped. (The mean for this example was found at the top of this page.)
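For grouped data the same idea applies, with each mid-point weighted by its frequency. This is a sketch with hypothetical mid-points and frequencies, again assuming the divide-by-total-frequency form of the variance.

```python
import math

# Hypothetical grouped data: (mid-point, frequency).
groups = [(5, 2), (15, 5), (25, 3)]

total_frequency = sum(f for _, f in groups)
mean = sum(x * f for x, f in groups) / total_frequency
mean_of_squares = sum(x * x * f for x, f in groups) / total_frequency

variance = mean_of_squares - mean ** 2
standard_deviation = math.sqrt(variance)
print(mean, standard_deviation)  # 16.0 7.0
```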
Standardised Scores
This is used to compare values from different sets of data. For example, how do you compare your score in a maths mock exam with your score in an English exam? Here's how:

Standardised score = (score – mean) / standard deviation

Example
Sam takes an exam in maths and another in English. His marks, along with the mean marks for the year and the standard deviations, are shown below. (table not reproduced)
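The comparison can be sketched in Python as follows. The marks, means and standard deviations here are hypothetical and are not Sam's actual figures.

```python
def standardised_score(score, mean, standard_deviation):
    """How many standard deviations a score lies above (or below) the mean."""
    return (score - mean) / standard_deviation

# Hypothetical marks, year means and standard deviations.
maths = standardised_score(score=70, mean=60, standard_deviation=8)
english = standardised_score(score=65, mean=55, standard_deviation=5)
print(maths, english)  # 1.25 2.0
```

With these made-up figures the English mark is the better performance relative to the year group, even though the raw maths mark is higher.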
Normal Distribution
Standard deviation is used to describe the normal distribution. The normal distribution appears when large amounts of data, such as the heights of people, are collected. When put into a histogram, the data forms a bell shape.
Scatter Diagrams
To find the equation of a line of best fit, use y = ax + b, where a is the gradient of the line and b is the intercept on the y-axis.

Causal Relationship
When a change in one variable causes a change in another variable, there is said to be a causal relationship between the two. For example:
The size of a car engine and the amount of petrol the car uses.
Sales of computers and sales of software.
Not a causal relationship: sales of chocolates and sales of clothes.

Spearman's Rank
Spearman’s rank correlation coefficient is a numerical measure of the correlation between two sets of data.
-1 is a perfect negative correlation
+1 is a perfect positive correlation
0 means no correlation
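The notes do not state the formula, so the sketch below assumes the standard one for ranks without ties: 1 - 6 x (sum of d squared) / (n(n squared - 1)), where d is the difference between the two ranks of each item. The two lists of ranks are hypothetical.

```python
def spearman_rank(ranks_a, ranks_b):
    """Spearman's rank correlation for two lists of ranks with no ties."""
    n = len(ranks_a)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))

# Hypothetical ranks given to five items by two judges.
judge_1 = [1, 2, 3, 4, 5]
judge_2 = [2, 1, 4, 3, 5]
print(spearman_rank(judge_1, judge_2))
```

For these ranks the coefficient comes out at about 0.8, a fairly strong positive correlation. Identical rankings give +1 and exactly reversed rankings give -1.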
Geometric mean
To work out the geometric mean of n numbers, multiply the numbers together and then take the nth root of the product.
Geometric mean of 3, 7, 4, 8:
Geometric mean = 4th root of (3 x 7 x 4 x 8) = 5.09
In percentage change problems the geometric mean tells us the average percentage change over a period of time.

Index numbers
An index number shows the rate of change in the quantity, value or price of an item over a period of time.
Index number = (quantity / quantity in base year) x 100
Example: (table not reproduced)
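Both formulas above are easy to check in Python. The geometric mean uses the numbers from the notes; the prices passed to the index number function are hypothetical.

```python
import math

def geometric_mean(values):
    """nth root of the product of n numbers."""
    return math.prod(values) ** (1 / len(values))

def index_number(quantity, base_quantity):
    """Quantity expressed as a percentage of the base-year quantity."""
    return quantity / base_quantity * 100

print(round(geometric_mean([3, 7, 4, 8]), 2))  # 5.09
print(index_number(150, 120))  # hypothetical prices, base year = 120
```

An index number above 100 means the quantity has risen since the base year; below 100 means it has fallen.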
Chain base index numbers
A chain base index number tells you the annual percentage change. It is found by using the previous year as the base year and then working out the relative value of an item.
Example (using the data above for the antique)
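A short Python sketch of the chain base calculation. The yearly values are hypothetical and are not the antique data referred to in the example.

```python
def chain_base_indices(values):
    """Index of each value relative to the one before it (previous year = 100)."""
    return [current / previous * 100
            for previous, current in zip(values, values[1:])]

yearly_values = [200, 220, 231]  # hypothetical values for three years
print([round(i, 1) for i in chain_base_indices(yearly_values)])  # [110.0, 105.0]
```

Here the value rose by 10% in the first year and by 5% in the second.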
Weighted Means
In a GCSE course 40% of the mark is for paper 1, 40% is for paper 2, 10% is for coursework task 1 and 10% is for coursework task 2. If a student scores the following marks, we can work out the weighted mean.

Paper 1         62%
Paper 2         38%
Coursework 1    58%
Coursework 2    29%

Weighted mean = (40 x paper 1 + 40 x paper 2 + 10 x coursework 1 + 10 x coursework 2) / (40 + 40 + 10 + 10)
= (40 x 62 + 40 x 38 + 10 x 58 + 10 x 29) / 100
= 4870 / 100
= 48.7%

Time series and moving averages
A time series graph shows how values change over a period of time (days, weeks, months, quarters or years). The moving average gives an idea of how the values are changing.
To find the 3-point moving averages of 23, 22, 24, 25, 26, 29, 28: average 23, 22, 24, then average 22, 24, 25, then average 24, 25, 26, and so on.
Once you have calculated the moving averages you will need to plot them. Then draw a line of best fit through the moving averages to get a trend line.

Quality assurance
Quality assurance checks are used in commercial production. For example, a packet of crisps should have a weight of 50g. Samples of packets are taken at regular intervals and the mean weight is calculated. Upper and lower warning and action limits are set. If the sample mean is above the upper warning limit or below the lower warning limit, another sample should be taken immediately. If the sample mean is above the upper action limit or below the lower action limit, production should be stopped and the machines reset.
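The weighted mean and the 3-point moving averages from the sections above can be checked with a few lines of Python. The marks, weights and time series are the ones in the notes; the function names are illustrative.

```python
def weighted_mean(values, weights):
    """Sum of value x weight, divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def moving_averages(values, points=3):
    """Mean of each run of `points` consecutive values."""
    return [sum(values[i:i + points]) / points
            for i in range(len(values) - points + 1)]

marks = [62, 38, 58, 29]      # paper 1, paper 2, coursework 1, coursework 2
weights = [40, 40, 10, 10]
print(weighted_mean(marks, weights))  # 48.7

series = [23, 22, 24, 25, 26, 29, 28]
print([round(m, 2) for m in moving_averages(series)])
# [23.0, 23.67, 25.0, 26.67, 27.67]
```

Seven values give five 3-point moving averages, each plotted at the middle of the three points it covers.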
Quality control chart for ranges. Samples are taken and the range found. If the range is too large then production should be stopped.
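The decision rules for sample means and ranges could be sketched like this in Python. The warning limit, action limit and maximum range are hypothetical values chosen for illustration; real limits would come from the production process.

```python
def check_sample(weights, target=50.0, warning=1.0, action=2.0, max_range=4.0):
    """Classify a sample of packet weights against hypothetical limits (grams)."""
    mean = sum(weights) / len(weights)
    sample_range = max(weights) - min(weights)
    deviation = abs(mean - target)
    # Outside the action limits, or range too large: stop production.
    if deviation > action or sample_range > max_range:
        return "stop production"
    # Outside the warning limits only: take another sample straight away.
    if deviation > warning:
        return "take another sample"
    return "ok"

print(check_sample([50.2, 49.8, 50.1, 49.9]))
print(check_sample([51.4, 51.6, 51.5, 51.3]))
print(check_sample([52.5, 52.7, 52.4, 52.6]))
```

The three samples print "ok", "take another sample" and "stop production" in turn.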
Charts and Graphs

Box plots
A box plot marks five values, in order: lowest value, lower quartile, median, upper quartile, highest value.
Outliers
Any value more than 1.5 x IQR above the UQ or below the LQ is considered to be an outlier.

Cumulative frequency
The frequency of a distribution is accumulated. For example:

Mark    Frequency    Cumulative frequency
0-1     4            4
1-2     5            4 + 5 = 9
2-3     2            4 + 5 + 2 = 11
3-4     6            4 + 5 + 2 + 6 = 17
4-5     2            4 + 5 + 2 + 6 + 2 = 19
5-6     3            4 + 5 + 2 + 6 + 2 + 3 = 22
6-7     1            4 + 5 + 2 + 6 + 2 + 3 + 1 = 23

The values of the cumulative frequency are then plotted at the top value of each group and connected either by straight lines or by a curve.
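The running totals in the table, and the outlier rule, can be reproduced in Python. The frequencies are taken from the table above; the quartiles passed to the outlier function are hypothetical.

```python
from itertools import accumulate

frequencies = [4, 5, 2, 6, 2, 3, 1]  # from the mark table above
cumulative = list(accumulate(frequencies))
print(cumulative)  # [4, 9, 11, 17, 19, 22, 23]

def outlier_limits(lower_quartile, upper_quartile):
    """Values below the first limit or above the second count as outliers."""
    iqr = upper_quartile - lower_quartile
    return lower_quartile - 1.5 * iqr, upper_quartile + 1.5 * iqr

print(outlier_limits(20, 30))  # hypothetical quartiles -> (5.0, 45.0)
```

The last cumulative frequency always equals the total frequency, which is a quick check on the arithmetic.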
Histograms
The area of the bar represents the frequency and the height of the bar is the frequency density.
Frequency density = frequency / class width
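A brief Python sketch of the frequency density calculation for classes of unequal width. The classes and frequencies are hypothetical.

```python
# Hypothetical classes: (lower bound, upper bound, frequency).
classes = [(0, 10, 5), (10, 30, 12), (30, 35, 4)]

# Frequency density = frequency / class width.
densities = [freq / (high - low) for low, high, freq in classes]
print(densities)  # [0.5, 0.6, 0.8]
```

Although the middle class has the highest frequency, the narrow last class has the tallest bar because its frequency is packed into a smaller width.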
Stem and Leaf diagrams
This is a chart to help order data. For example, 68, 72, 56, 52, 78, 53, 64, 73 can be represented in a stem and leaf diagram:

5 | 2 3 6
6 | 4 8
7 | 2 3 8

Key: 5 | 2 = 52

Comparative Pie charts
When comparing two sets of data using pie charts we need to take the total frequencies into account. The areas of the two circles should be in the same ratio as the two total frequencies, so the larger pie chart represents the bigger total frequency.
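Since area depends on the square of the radius, the radius of the second pie chart scales with the square root of the ratio of the total frequencies. A minimal Python sketch, with a hypothetical radius and hypothetical totals:

```python
import math

def second_radius(first_radius, first_total, second_total):
    """Radius that makes the two areas match the ratio of the totals."""
    return first_radius * math.sqrt(second_total / first_total)

print(second_radius(3, 100, 400))  # hypothetical totals -> 6.0
```

Four times the total frequency needs four times the area, which means only twice the radius.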
Compound Bar Charts See page 53
Population pyramids This allows you to compare percentages of populations by age and gender.
Choropleth maps – Used to show population distributions
Probability

Odds
The ratio failures : successes is the odds against an event happening.
The ratio successes : failures is the odds on an event happening.
If the odds are 7:2 against, what is the probability of success?
Answer: There are 7 chances of failure to every 2 chances of success, so in (7 + 2) = 9 attempts there will be 2 successes. The probability of a success is 2/9.

Mutually exclusive events – Events that cannot happen at the same time.
Independent events – The probability of one event is not affected by whether the other event happens.
Exhaustive events – A set of events is exhaustive if the set contains all possible outcomes.

Rules of probability
P(a or b) = P(a) + P(b), when a and b are mutually exclusive
P(a and b) = P(a) x P(b), when a and b are independent

Tree diagrams
When completing a tree diagram, remember that the probabilities on each pair of branches must add up to 1. As you travel along the branches to find a possible outcome you multiply the probabilities. If there is more than one possible outcome, sum them.
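The odds example and the tree diagram rules can be checked with exact fractions in Python. The 7:2 odds come from the notes; the two-stage tree with a 1/4 chance of winning at each stage is a hypothetical illustration.

```python
from fractions import Fraction

def probability_from_odds_against(failures, successes):
    """Convert odds against (failures : successes) into P(success)."""
    return Fraction(successes, failures + successes)

print(probability_from_odds_against(7, 2))  # 2/9

# Hypothetical tree diagram: two independent goes, each a win or a lose.
p_win = Fraction(1, 4)
p_lose = 1 - p_win  # each pair of branches adds up to 1

# Multiply along the branches, then add the routes giving exactly one win.
p_exactly_one_win = p_win * p_lose + p_lose * p_win
print(p_exactly_one_win)  # 3/8
```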
Venn Diagrams

Discrete uniform distribution
A discrete uniform distribution has n distinct outcomes. Each outcome is equally likely, with probability equal to 1/n.
For example, a fair six-sided dice is rolled. The possible outcomes would be written as a probability distribution:

x:       1      2      3      4      5      6
p(x):    1/6    1/6    1/6    1/6    1/6    1/6
Binomial distribution
Suppose each trial has two possible outcomes, a success with probability p and a failure with probability q (so p + q = 1), and the trials are independent. If n trials are carried out, the probabilities are found by expanding (p + q)^n.

Example: p (success) = 0.2, q (failure) = 0.8, and 5 trials are carried out.
The probability distribution is
(p + q)^5 = p^5 + 5p^4q + 10p^3q^2 + 10p^2q^3 + 5pq^4 + q^5
Probability of two successes: use
10p^2q^3 = 10 x (0.2)^2 x (0.8)^3 = 0.2048
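Each term of the expansion can also be computed directly. This Python sketch uses the p, q and number of trials from the example; the function name is illustrative.

```python
from math import comb

def binomial_probability(successes, trials, p):
    """Term of (p + q)^trials with the given number of successes."""
    q = 1 - p
    return comb(trials, successes) * p ** successes * q ** (trials - successes)

p_two = binomial_probability(2, 5, 0.2)
print(round(p_two, 4))  # 0.2048

# All six terms of the expansion together should add up to 1.
distribution = [binomial_probability(k, 5, 0.2) for k in range(6)]
print(round(sum(distribution), 10))  # 1.0
```

The coefficient 10 in 10p^2q^3 is the number of ways of choosing which 2 of the 5 trials are the successes.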