Exploring Statistical Analysis Using JASP
Frequentist and Bayesian Approaches
Christopher P. Halter
JASP Guide
Copyright © 2018 by Christopher P. Halter. All rights reserved. No part of this document may be reproduced or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without prior written permission of the author.

Cover Art by Gerd Altmann, Freiburg/Deutschland, CC0 Creative Commons

ISBN: 171702601X
ISBN-13: 978-1717026019
CONTENTS
SECTION I: AN INTRODUCTION

Chapter 1  The Guide
    Notes about the statistics guide
    The Philosophy Behind This Book and the Open Source Community
    Notes about the data

Chapter 2  Overview of Frequentist Statistical Analysis in Social Science
    Why use statistics in Social Science research?
    What is Continuous and Categorical Data?
    Parametric versus Non-Parametric Data
    Confidence Intervals (CI)
    P-Value
    Effect Size

Chapter 3  Overview of Bayesian Statistical Analysis in Social Science
    Who was Thomas Bayes?
    The Bayesian Statistical Approach
    An illustrated example of the Bayesian model
    Prior Distributions
    Interpreting the Bayes Factor
    Statistical Assumptions

Chapter 4  Getting Started with JASP
    Preparing the Data and Making Decisions
    Creating Your Variable Codebook
    Data Types and Analysis Methods
    Data in JASP
    Using .SAV in JASP
    Spreadsheets in JASP

Chapter 5  Hypothesis Building
    The Alternative Hypothesis
    Hypothesis Setting
    Contingency Table Hypothesis
    Relationship Hypothesis
    Association Hypothesis
SECTION II: DESCRIPTIVES AND DATA VISUALIZATION

Chapter 6  Descriptive Statistics
    What are descriptive statistics?
    Creating Descriptive Statistics in JASP for Categorical Data
    Descriptive Statistics for Continuous Data
SECTION III: FREQUENTIST APPROACHES
Chapter 7  Relationship Analysis with Contingency Tables
    What is a Contingency Table?
    Chi-Square Analysis (Categorical Differences)
    Using the Contingency Tables in JASP
    Contingency Table Analysis

Chapter 8  Relationship Analysis with t-Test
    t-Test Analysis (Continuous Differences, two groups)
    One Sample t-Test using JASP
    Interpreting the Results Tables for One Sample t-Test
    Independent Samples t-Test using JASP
    Interpreting the Results Tables for Independent Samples t-Test
    Paired Samples t-Test using JASP
    Interpreting the Results Tables for Paired Samples t-Test

Chapter 9  Relationship Analysis with ANOVA
    Analysis of Variance (ANOVA)
    Using Univariate Analysis for One-Way ANOVA
    Interpreting Results Tables: One-Way ANOVA
    Polynomial Trends and What They Look Like

Chapter 10  Two-Way ANOVA (Factorial ANOVA)
    Using Univariate Analysis for Two-Way (Factorial) ANOVA
    Interpreting Results Tables: Two-Way ANOVA

Chapter 11  Relationship Analysis with ANCOVA
    Analysis of Covariance (ANCOVA)
    Using Univariate Analysis for Two-Way ANCOVA
    Interpreting Results Tables: ANCOVA

Chapter 12  Associations with Correlation
    Correlation Analysis with JASP
    Interpreting Results Tables: Correlation Matrix

Chapter 13  Associations with Regression (Linear)
    Regression Analysis with JASP
    Interpreting the Results Tables: Linear Regression
First Edition
Chapter 14  Associations with Regression (Binomial Logistic)
    Binomial Logistic Regression
    Using JASP for Binomial Logistic Regression
    Interpreting Results Tables: Logistic Regression

Chapter 15  Reliability
    Reliability Using JASP for Agreement
    Interpreting Results Tables: Reliability (Agreement)
    Reliability Using JASP for Accuracy
    Interpreting Results Table: Reliability (Accuracy)

Chapter 16  Factor Analysis
    What is Factor Analysis?
    Determining the Number of Factors to Extract
    Conducting Factor Analysis with JASP
SECTION IV: BAYESIAN APPROACHES

Chapter 17  Relationship Analysis with Bayesian Contingency Tables
    What is a Contingency Table?
    Chi-Square Analysis (Categorical Differences)
    Using the Bayesian Contingency Tables in JASP
    Bayesian Contingency Table Analysis

Chapter 18  Relationship Analysis with Bayesian t-Test
    Bayesian t-Test Analysis (Continuous Differences, two groups)
    Bayesian One Sample t-Test using JASP
    Interpreting the Results Tables for Bayesian One Sample t-Test
    Bayesian Independent Samples t-Test using JASP
    Interpreting the Results Tables for Bayesian Independent Samples t-Test
    Bayesian Paired Samples t-Test using JASP
    Interpreting the Results Tables for Bayesian Paired Samples t-Test

Chapter 19  Relationship Analysis with Bayesian ANOVA
    Analysis of Variance (ANOVA)
    Using Univariate Analysis for Bayesian One-Way ANOVA
    Interpreting Results Tables: One-Way ANOVA

Chapter 20  Bayesian Two-Way ANOVA (Factorial ANOVA)
    Using Univariate Analysis for Bayesian Two-Way (Factorial) ANOVA
    Interpreting Results Tables: Two-Way ANOVA

Chapter 21  Relationship Analysis with Bayesian ANCOVA
    Bayesian Analysis of Covariance (ANCOVA)
    Using Univariate Analysis for Two-Way ANCOVA
    Interpreting Results Tables: ANCOVA

Chapter 22  Associations with Bayesian Correlation Matrix
    Bayesian Correlation Analysis with JASP
    Interpreting Results Tables: Correlation Matrix

Chapter 23  Concluding Thoughts
SECTION V: RESOURCES

Analysis Memos

Data Sets
    Sample Student Data (SSD) Codebook
    Sample Student Data (SSD) Set
    Reliability for Agreement Codebook
    Reliability for Agreement Dataset
    Reliability for Accuracy Codebook
    Reliability for Accuracy Dataset
    Test Scores Codebook
    Test Scores Dataset
    College Admission Data Codebook
    College Admissions Dataset
    Favorite Class Data Codebook
    Favorite Class Dataset

Effect Size Tables
Reporting Frequentist Statistics
Reporting Bayesian Statistics

References

Index

ABOUT THE AUTHOR
THANK YOU.
Section I: An Introduction
Chapter 1 The Guide
Notes about the statistics guide

So let's get this out of the way right from the start. This is NOT a math book. The JASP Guide's purpose is to assist the novice social science and education researcher in interpreting statistical output using the JASP statistical analysis application. Through the examples and guidance, you will be able to select the statistical test that is appropriate for your data, apply the inferential test to your data, and interpret a statistical test's results table. The Guide covers the uses of some of the most common statistical tests and discusses some of their limitations: Chi-square using Contingency Tables, t-Test, ANOVA, ANCOVA, Correlation, and Regression (Linear and Binomial Logistic). The ANOVA description includes procedures for conducting the One-Way ANOVA, as well as the General Linear Model (GLM) for other types of ANOVA analysis.
Exploratory Factor Analysis has been included in this guide as a valuable procedure for data reduction. Reliability tests will be discussed as a way to verify the reliability of coding data between researchers. The focus of this guide is the typical statistical analysis tools that may be useful for the novice or beginning researcher. Only a subset of the tools currently available in JASP will be covered; the ones viewed as most common and most readily useful to the novice researcher are included. The JASP application supports both Frequentist and Bayesian procedures, and each is explored in its own section. The sample window views and output tables shown in this guide were mainly created with JASP 0.8.6.
The Philosophy Behind This Book and the Open Source Community

This book began as my own attempt to find a practical way to teach introductory statistical analysis to doctoral students in the field of social sciences. So began my search for an alternative that would be useful in learning basic analysis skills and capable of performing basic statistical analysis tests. This brought me to JASP. Developed at the University of Amsterdam, this powerful software package is effective and easy to use. Another key feature of the open source project is that the software is distributed free of charge.

This guide is not intended to be a course on statistics or the mathematics behind statistical analysis. With the advent of statistical analysis applications, anyone with a computer can run statistical analysis on any dataset. The intention of this guide is to provide the novice researcher with a step-by-step guide to using these powerful analysis tools and the confidence to read and interpret results tables in order to guide their own research.
Notes about the data

Data shown in this guide are for demonstration purposes and do not represent actual research that has been previously conducted. Specific data types will be used to show the steps and procedures for various analysis methods, e.g. t-Test, ANOVA, Reliability, etc. This sample data should not be used to make assumptions or claims about the populations it represents.
The data codebook and data tables can be found in the appendix. Below is a short description of each data set that will be used in the examples.

Sample Student Data (SSD)

The Sample Student Data (SSD) is based on data from the High School & Beyond study commissioned by the Center on Education Policy (CEP) and conducted by researcher Harold Wenglinsky. The study was based on a nationally representative, longitudinal database of students and schools from the National Educational Longitudinal Study of 1988-2000 (NELS). The study focused on a sample of low-income students from inner-city high schools. The study compared achievement and other education-related outcomes for students in different types of public and private schools, including comprehensive public high schools (the typical model for the traditional high school); public magnet schools and "schools of choice;" various types of Catholic parochial schools and other religious schools; and independent, secular private schools. The High School and Beyond (HS&B) study included two cohorts: the 1980 senior class, and the 1980 sophomore class. The majority of the SSD data points were drawn from the larger HS&B data, with 200 of the records drawn for use in this guide. The variables of gender, race, socioeconomic status, school type attended, and program type enrolled, as well as assessment scores for reading, writing, mathematics, science, and social studies, have been drawn from the HS&B data. The SSD data points concerning first generation college attendance and attending college after high school are fictitious data based on known information about similar populations and on the student demographics of the HS&B data.

Test Scores (TestScores) Data

The Test Scores (TestScores) data contains fictitious assessment scores from an advanced mathematics course. The data points include the student ID, a pre-test assessment score, and a post-test assessment score.
Reliability for Agreement (Reliable1) Data

The Reliability for Agreement (Reliable1) data is an exercise in how well a group of assessors agree on applying a rubric assessment score. The data points include the item number and a value for whether or not the 5 assessment scorers applied the same rubric value.
Reliability for Accuracy (Reliable2) Data

The Reliability for Accuracy (Reliable2) data is an exercise in how well a group of assessors applied a rubric score that contained 4 performance levels. The data points include the item number and the assessment scores applied by each of the 5 scorers.

College Admission (CollegeAdmit) Data

The College Admission (CollegeAdmit) data file is based on fictitious college acceptance decisions for a sample of high school students. The file includes 397 participants with demographic information, the college admission decision, student SAT test scores, overall high school GPA, and the rubric score for their college essays.

Favorite Class (FavClass) Data

The Favorite Class (FavClass) data file is a survey asking students to rate their classes on a scale from 1-4, with 1 meaning they hate the class and 4 meaning they love the class. The classes being rated are Biology, Geology, Chemistry, Algebra, Calculus, and Statistics.

To download these sample datasets and run your own analysis, please visit https://tinyurl.com/JASP-sampledata
JASP Application Screenshots

The guide uses screenshots from the actual JASP application to demonstrate the pull-down menus and settings dialogue boxes used by JASP. Unfortunately, these screenshots tend to be lower quality than the other graphics; they are presented in the best possible quality for a computer screenshot.
Chapter 2 Overview of Frequentist Statistical Analysis in Social Science
Why use statistics in Social Science research?

Everyone loves a good story. Rich narratives, interesting characters, and the unfolding of events draw us into the story, anticipating how it ends. Qualitative research methods are well suited to uncovering these stories. However, we should not ignore the power of quantitative methods. One of the assumptions made about quantitative research methods is that they merely deal with numbers. And let's face it, to many of us numbers are quite boring. A well-constructed data table or beautifully drawn graph does not capture the imagination of most readers. But appropriately used quantitative methods can uncover subtle differences, point out inconsistencies, or bring up more questions that beg to be answered. In short, thoughtful quantitative methods can help guide and shape the qualitative story. This union of rich narratives and statistical evidence is at the heart of any good mixed methods study. The researcher uses the data to guide the narrative. Together these methods can reveal a more complete and complex account of the topic.
What is Continuous and Categorical Data?

Within statistical analysis we often talk about data as being either continuous or categorical. This distinction is important since it guides us towards the methods that are appropriate for each kind of data set. Depending on the kind of data you have, there are specific statistical techniques available for you to use with that data.

Mark Twain (1835-1910) is often credited with describing the three types of lies as "lies, damned lies, and statistics", a phrase he himself attributed to Benjamin Disraeli. This phrase is still used in association with our view of statistics. This may be because statistical analysis can be manipulated to give whatever outcome is being sought. Poor statistics have also been used to support weak or inconsistent claims. This does not mean that the statistics are at fault, but rather the researcher who used statistical methods inappropriately. As researchers we must take great care in employing proper methods with our data.

Continuous data can be thought of as "scaled data". The numbers may have been obtained from some sort of assessment or from some measurement activity. A common example of continuous data is test scores. Some examples of continuous data include:

• The time it takes to complete a task
• A student's test scores
• The time of day that someone goes to bed
• The weight or height of a 2nd grade student

All of these examples can be thought of as rational numbers. For those of us who have not been in an Algebra class for a number of years, rational numbers can be represented as fractions, which in turn can be represented as decimals; whole numbers are rational numbers as well.

A subset of this sort of data is called discrete data. Discrete data is obtained from counting things and is represented by whole numbers. Some examples of discrete data include:

• The number of courses a student takes each year
• The number of people living in a household
• The number of languages spoken by someone
• The number of turns taken by an individual
Categorical data is another important type of statistical data, and one that is often used in social science research. As the name implies, categorical data is comprised of categories, or the names of the things that we are interested in studying. In working with statistical methods we often transform the names of our categories into numbers.
An example of this process is when we collect information about the primary languages spoken at home by students in our class. We may have found that the primary home languages are English, Spanish, French, and Cantonese. We may convert this data into numbers for analysis.

Language Spoken    Code
English            1
Spanish            2
French             3
Cantonese          4

Primary Home Language Code Book
In the above example the numbers assigned to the categories do not signify any differences or order among the languages. The numbers used here are "nominal", used to represent names.

Another example of categorical data could be the grade level of a high school student. In this case we may be interested in high school freshmen, sophomores, juniors, and seniors. Assigning a numerical label to these categories may make our analysis simpler.

Grade Level    Code
Freshman       1
Sophomore      2
Junior         3
Senior         4

High School Grade Levels example

In this example the numbers again just represent the names of the high school levels; however, they do have an order. "Freshman" comes before "Sophomore", which comes before "Junior" and then "Senior". This sort of categorical data can be described as ordinal, or representing an order.
It is important to recognize the sort of data that is being used in the research analysis process. A researcher should ask:

• Does my data represent information that is continuous (a rational number) or is it categorical (names and labels)?
• Does this data represent test scores or evaluations?
• Does this data represent something that has been counted?
• Is the interval between the data points a regular, measured interval?
• Does this data represent the names of something?
• Do the data points represent the order of objects?
• Are the data points opinions?
• Are there irregular differences between the data points?
Depending on whether a researcher is using categorical or continuous data, there are specific statistical methods available. Below are some of the most common statistical methods in social science research and the data associated with each method.

Common Statistical Methods

Descriptive Statistics
    Normal Distribution: Graphs
    Central Tendencies: Mean, Median, Mode
    Variance: Standard Deviation
    Charts and Graphs: Histogram, Pie Chart, Stem & Leaf Plots, Scatterplots

Inferential Statistics
    Chi-square: Differences or relationships between categorical data
    t-Test: Differences or relationships between continuous data with two groups
    ANOVA: Differences or relationships between continuous data with more than 2 groups
    Correlation: Associations between continuous data
    Regression: Modeling associations between continuous data
    Factor Analysis: Factor grouping within categorical or continuous data; data reduction
Parametric versus Non-Parametric Data

Our data can also be classified as either parametric or non-parametric. This term refers to the distribution of data points. Parametric data will have a "normal distribution" that is generally shaped like the typical bell curve. Non-parametric data does not have this normal distribution curve.

[Figure: Normal Distribution Curve of parametric data]

[Figure: Nonparametric data, by Carl Boettiger on Flickr, https://www.flickr.com/photos/cboettig/9019872976]
Depending on the distribution of your data, various statistical analysis techniques are available to use. Some methods are designed for parametric data while other methods are better suited for non-parametric data distributions.

Sample Statistics based on Data Distribution

                               Parametric                       Non-Parametric
Data Distribution              Normal                           Any
Variance within Data           Homogenous                       Any
Typical Data Type              Continuous (Ratio or Interval)   Continuous or Categorical (Ordinal or Nominal)
Benefits of the data           More powerful, able to draw      Simpler to use
                               stronger conclusions

Statistical Tests
Correlations                   Pearson                          Spearman
Relationships with 2 groups    t-Test                           Mann-Whitney or Wilcoxon Test
Relationships with >2 groups   ANOVA                            Kruskal-Wallis or Friedman's Test
In choosing a statistical method we must consider both the character of the data as well as the distribution of our data. The character, or data type, can be described as nominal, ordinal, ratio, or interval. The distribution can be described as parametric or non-parametric. These data features will lead us to selecting the most appropriate statistical method for our analysis of the data. Throughout this guide the examples will come from our sample data set that contains both categorical and continuous data.
Confidence Intervals (CI)

Much of inferential statistics is based on measurements and manipulations of means. When we have sample data, the measures of central tendency, such as mean, median, and mode, are very simple to calculate and compare. When these measures are compared across categories, factors, or groupings we begin to find differences in their means. Often we take sample data to represent some larger population. We can calculate the sample mean with certainty, but the true mean of the population being represented cannot be known for certain through the sample data. This is when confidence intervals come into consideration.
The confidence interval is a calculated range of values for the true mean. We can know with a certain amount of “confidence”, typically at the 95% confidence level, that the true mean will fall within the specified confidence interval, or range. For example, we may find that the mean of our sample is 52.65 for some measure. The calculated confidence interval could be from 51.34 to 53.95 for the general population. Therefore, given that the sample mean is 52.65, we can state with 95% confidence that the true mean lies somewhere between 51.34 and 53.95 for the population.
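As a sketch of where such an interval comes from (the sample below is invented for illustration and is not the guide's data), here is a normal-approximation 95% CI computed with Python's standard library:

```python
import math
import statistics

# Invented sample of 30 measurements, for illustration only.
data = [52 + 0.5 * (i % 7) for i in range(30)]

mean = statistics.mean(data)
sem = statistics.stdev(data) / math.sqrt(len(data))  # standard error of the mean

z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% interval
lower, upper = mean - z * sem, mean + z * sem
print(f"mean = {mean:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

The interval is centered on the sample mean and widens as the data gets noisier or the sample gets smaller, which matches the interpretation given above.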
P-Value

What is a P-value? In statistical analysis, the way we measure the significance of any given test is with the P-value. This value indicates the probability of obtaining a test result at least as extreme as the one observed purely by chance. Our calculated p-values are compared against some predetermined significance level. The most common significance levels are the 95% Significance Level, represented by a p-value of 0.05, and the 99% Significance Level, represented by a p-value of 0.01. A significance level of 95%, or 0.05, indicates that we are accepting the risk of being wrong 1 out of every 20 times. A significance level of 99%, or 0.01, indicates that we risk being wrong only 1 out of every 100 times. The most common significance level used in the Social Sciences is 95%, so we are looking for p-values < 0.05 in our test results.

However, in statistical analysis we are not looking to prove our test hypothesis with the p-value. We are often trying to reject the Null Hypothesis. What is the Null Hypothesis? In statistical testing the results always compare two competing hypotheses. The null hypothesis is often the dull, boring hypothesis stating that there is no association or relationship between the test populations or conditions. The null hypothesis tells us that whatever phenomenon we were observing had no or very little impact. On the other hand, we have the alternative, or researcher's, hypothesis. This is the hypothesis that we are rooting for, the one that we want to accept in many cases. It is the result we often want to find since it indicates that there are associations and relationships between populations or conditions. Then we can take that next step to explain or examine them more closely.

When we perform a statistical test, the p-value helps determine the significance of the test and the validity of the claim being made. The claim that is always "on trial" here is the null hypothesis. When the p-value is found to be statistically significant (p < 0.05), or highly statistically significant (p < 0.01), then we can conclude
that the relationships or associations found in the observed data are very unlikely to occur by chance if the null hypothesis is actually true. Therefore, the researcher can "reject the null hypothesis". If you reject the null hypothesis, then the alternative hypothesis must be accepted. And this is often what we want as researchers. The only question the p-value addresses is whether or not the experiment or data provides enough evidence to reasonably reject the null hypothesis. Put another way, the p-value is the calculated probability of seeing data at least as extreme as ours if the null hypothesis were actually true; rejecting the null hypothesis on a small p-value still carries a risk of being wrong, and that risk is capped by our predetermined significance level, in most cases the 95% level or p < 0.05.

Let's look at an example. Suppose your school purchases an SAT prep curriculum in the hopes that this will raise the SAT test scores of your students. Some students are enrolled in the prep course while others are not. At the end of the course all your students take the SAT test and their resulting test scores are compared. In this example our null hypothesis would be that "the SAT prep curriculum had no impact on student test scores". This result would be bad news considering how much time, effort, and money was invested in the test prep. The alternative hypothesis is that the prep curriculum did have an impact on the test scores, and hopefully the impact was to raise those scores. Our predetermined significance level is 95%. After running a statistical test, suppose we find a p-value of 0.02, which is indeed less than 0.05. We can reject the null hypothesis. Now that we have rejected the null hypothesis, the only other option is to accept the alternative hypothesis, specifically that the scores are significantly different.
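To make the SAT example concrete, here is a small sketch of one way a p-value can be computed, a permutation test on two invented groups of scores (the numbers are made up for illustration and are not the guide's data, and JASP would use a t-test rather than this hand-rolled procedure):

```python
import random
import statistics

# Invented SAT scores, for illustration only.
prep    = [1180, 1250, 1230, 1300, 1210, 1280, 1260, 1240]
no_prep = [1120, 1190, 1150, 1230, 1140, 1200, 1170, 1160]

observed = statistics.mean(prep) - statistics.mean(no_prep)

# Permutation test: shuffle the group labels many times and count how
# often a difference at least as large as the observed one arises by chance.
random.seed(0)
combined = prep + no_prep
count = 0
trials = 10_000
for _ in range(trials):
    random.shuffle(combined)
    diff = statistics.mean(combined[:8]) - statistics.mean(combined[8:])
    if abs(diff) >= abs(observed):
        count += 1

p_value = count / trials
print(f"observed difference = {observed:.2f}, p = {p_value:.4f}")
```

Because label-shuffling almost never reproduces a difference as large as the observed one, the p-value comes out well below 0.05 and the null hypothesis would be rejected, just as in the narrative above.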
This result does NOT imply a "meaningful" or "important" difference in the data. That conclusion is for you to reach when considering the real-world relevance of your result. So again, statistical analysis is not the end point in research, but a meaningful beginning point that helps the researcher identify important and fruitful directions suggested by the data.

It has been suggested that the idea of "rejecting the null hypothesis" has very little meaning for social science research. The null hypothesis always states that there are "no differences" to be found within your data. Can we really find NO DIFFERENCES in the data? Are the results that we find between two groups ever going to be identical to one another? The practical answer to these questions is "No". There will always be differences present in our data. What we are really asking is whether or not those differences have any statistical significance. As we discussed previously, our statistical tests are aimed at producing the p-value that indicates the likelihood of having the differences
occur purely by chance. And the significance level of p = 0.05 is just an agreed-upon value among many social scientists as the acceptable level to consider statistically significant. A finding that the differences within the data are statistically significant may just be a product of having a large enough sample size. A small p-value (typically ≤ 0.05) indicates strong evidence against the null hypothesis, so you reject the null hypothesis. A large p-value (> 0.05) indicates weak evidence against the null hypothesis, so you fail to reject the null hypothesis. P-values very close to the cutoff (0.05) are considered marginal, so you could go either way. But keep in mind that the choice of significance level is arbitrary. We have selected a significance level of 95% because of the conventions used in most Social Science research. I could have easily selected a significance level of 80%, but then no one would take my results very seriously. Relying on the p-value alone can give you a false sense of security. The p-value is also very sensitive to sample size. If a given sample size yields a p-value that is close to the significance level, increasing the sample size can often shift the p-value in a favorable direction, i.e. make the resulting value smaller. So how can we use p-values and still have a sense of the magnitude of the differences? This is where Effect Size can help.
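The sample-size sensitivity just described can be sketched with a quick back-of-the-envelope two-sample z-test: the group difference and standard deviation are held fixed (invented numbers), and only the group size n changes:

```python
import math
from statistics import NormalDist

def two_sample_z_p(diff, sd, n):
    """Two-sided p-value for a two-sample z-test with equal n and equal sd."""
    z = diff / (sd * math.sqrt(2 / n))
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Identical effect (a difference of half a standard deviation),
# different sample sizes.
p_small = two_sample_z_p(diff=0.5, sd=1.0, n=20)
p_large = two_sample_z_p(diff=0.5, sd=1.0, n=200)
print(f"n = 20:  p = {p_small:.3f}")   # not significant at 0.05
print(f"n = 200: p = {p_large:.6f}")  # far below 0.05
```

The underlying difference is exactly the same in both cases; only the amount of data changed, which is why effect size is needed alongside the p-value.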
Effect Size

Whereas statistical tests of significance tell us the likelihood that experimental results differ from chance expectations, effect-size measurements tell us the relative magnitude of those differences found within the data. Effect sizes are especially important because they allow us to compare the magnitude of results from one population or sample to the next. Effect size is not as sensitive to sample size since it relies on the standard deviation in its calculation. Effect size also allows us to move beyond the simple questions of "does this work or not?" or "is there a difference or not?" and to ask "how well does some intervention work within the given context?"

Let's take a look at an example that could happen, and has happened, to many of us when conducting statistical analysis. When we compare two data sets, perhaps we are looking at SAT assessment scores between a group of students who enrolled in an SAT prep course and another group of students who did not enroll in the prep course. Suppose that the statistical test revealed a p-value of 0.043. We should be quite pleased since this value would be below our significance level of 0.05 and we could
report a statistical difference exists between the group of test takers enrolled in the prep course and those who were not enrolled in the course. But what if the calculated p-value was 0.057. Does this mean that the prep course is any less effective? So here is the bottom-line. The p-value calculation will help us decide if a difference or association has some significance that should be explored further. The effect size will give us a sense of the magnitude of any differences to help us decide if those differences have any practical meaning and are worth exploring. So both the p-value and the effect size can be used to assist the researcher in making meaningful judgments about the differences found within our data. Determining the Magnitude of Effect Size Once we have calculated the effect size value we must determine if this value represents a small, medium, or large effect. Jacob Cohen (1988) suggested various effect size calculations and magnitudes in his text Statistical Power Analysis for the Behavioral Sciences. The values in the effect size magnitude chart can be thought of as a range of values with the numbers in each column representing the midpoint of that particular range. For example, the effect size chart for Phi suggests a small, medium, and large effect size for the values of 0.1, 0.3, and 0.5 respectively. We could think of these as ranges with the small effect for Phi ranging from 0.0 to approximately 0.2, the medium effect size ranging from approximately 0.2 to 0.4, and the large effect size ranging from approximately 0.4 and higher. Suggested Effect Size Magnitude Chart Statistics Test
Small Effect
Medium Effect
Large Effect
Chi Squared
0.1
0.3
0.5
Cohen’s d
t-Test (Paired & Independent)
0.2
0.5
0.8
Eta Squared
ANOVA
0.01
0.06
0.14
Correlation
0.1
0.3
0.5
Correlation and t-Test (Independent)
0.01
0.09
0.25
Effect Size Calculation Phi or Cramer’s Phi
r r2
Values from Cohen (1988) Statistical Power Analysis for the Behavioral Sciences
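As an illustration only (not a JASP feature), Cohen's benchmarks can be encoded in a small Python helper. The `COHEN_BENCHMARKS` dictionary and `effect_magnitude` function are hypothetical names introduced here; the midpoint-based cut-offs follow the ranges described above.

```python
# Cohen's (1988) suggested benchmarks, keyed by effect size statistic.
# Each tuple is (small, medium, large), read as midpoints of ranges.
COHEN_BENCHMARKS = {
    "phi":      (0.1, 0.3, 0.5),     # Chi Squared (Phi or Cramer's Phi)
    "cohens_d": (0.2, 0.5, 0.8),     # t-Test (paired & independent)
    "eta_sq":   (0.01, 0.06, 0.14),  # ANOVA (Eta squared)
    "r":        (0.1, 0.3, 0.5),     # Correlation
    "r_sq":     (0.01, 0.09, 0.25),  # Correlation / t-Test (independent)
}

def effect_magnitude(statistic: str, value: float) -> str:
    """Classify an effect size as small/medium/large, treating each
    benchmark as the midpoint of its range, so the boundary between
    'small' and 'medium' lies halfway between those two benchmarks."""
    small, medium, large = COHEN_BENCHMARKS[statistic]
    if value < (small + medium) / 2:
        return "small"
    if value < (medium + large) / 2:
        return "medium"
    return "large"
```

For example, a Phi of 0.15 falls in the small range (below 0.2), while a Cohen's d of 0.6 falls in the medium range.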
The importance of effect size is perhaps best summed up by Gene Glass, as cited in Kline's Beyond Significance Testing: Reforming Data Analysis Methods in Behavioral Research (Washington, DC: American Psychological Association, 2004, p. 95):

"Statistical significance is the least interesting thing about the results. You should describe the results in terms of measures of magnitude: not just, does a treatment affect people, but how much does it affect them." -Gene V. Glass
Chapter 3 Overview of Bayesian Statistical Analysis in Social Science
Who was Thomas Bayes?

Thomas Bayes (1701-1761) was a philosopher and Presbyterian minister, as well as a mathematician. Although he never published any mathematical formulas or theories during his lifetime, Reverend Bayes was fascinated with mathematical puzzles and unsolved problems, and he often corresponded with the mathematicians of his day. After his death, the writings of Thomas Bayes were discovered by one of his mathematician friends, Richard Price. Price published Bayes' work and theorems, sparking the birth of Bayesian statistics.

Bayes' Theorem can be written as:

P(A|B) = P(B|A) P(A) / P(B)

In this equation P(A) and P(B) are the marginal probabilities of events A and B, with P(B) ≠ 0.
• P(A|B) represents the probability of event A given event B.
• P(B|A) represents the probability of event B given event A.
In Bayes' argument, the probability of some event A (call it our hypothesis) given the evidence of event B equals the probability of event B given A, multiplied by our prior knowledge of the probability of event A, all divided by the probability of event B occurring. In simpler terms, Bayes claims that if we know the likelihood of one event, and have some speculation or evidence about the likelihood of a related event, we can predict the likelihood of the unknown event from what we know about the known event. Seems pretty clear, right? Let's take a look at a more concrete example.
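A minimal sketch of Bayes' Theorem in Python; the `bayes` function name is ours, purely for illustration:

```python
def bayes(p_b_given_a: float, p_a: float, p_b: float) -> float:
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B), with P(B) != 0."""
    if p_b == 0:
        raise ValueError("P(B) must be non-zero")
    return p_b_given_a * p_a / p_b
```

For the doorbell example discussed below, `bayes(0.95, 0.02, 0.12)` returns roughly 0.158, which rounds to the 16% figure.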
The Bayesian Statistical Approach

In Bayesian statistical analysis, we begin with an estimate of the prior distribution, often simply called a "prior". This prior is the expected effect size distribution for our data, based on some rationale. As more data are collected and additional analyses are conducted, we can update our prior distributions. JASP will conduct calculations with our data and produce two main values of interest: the posterior distribution and the Bayes Factor. The posterior distribution is an updated estimate of the effect size distribution based on the current data. The Bayes Factor gives an estimate of the likelihood of one hypothesis compared to the other. This is, of course, a very simplified view of Bayes.
An illustrated example of the Bayesian model

Let's suppose that you hear your doorbell ring and you wonder whether it means that your package has arrived: P(Package | Doorbell). Now let's say that you do not receive packages very often, maybe 2% of the time, but your doorbell ringing is a little more common, with solicitors or the local girl scouts selling cookies, about 12% of the time. We can also say that the package delivery service rings the doorbell 95% of the time when they deliver.
This scenario gives us:

P(Package|Doorbell) = P(Doorbell|Package) P(Package) / P(Doorbell)
P(Package|Doorbell) = (0.95 × 0.02) / 0.12
P(Package|Doorbell) ≈ 0.16

So there is about a 16% probability, or likelihood, that if your doorbell rings it will be a package delivery.

Let's look at a more complex example. Suppose that a group of researchers recently announced that they had devised a test that could detect whether a person will become allergic to chocolate cake later in life. This new test could save countless birthday parties and celebrations. The test has only been around a short time, but it has an impressive 95% accuracy rate in correctly identifying people who will develop the allergy. It gives a false positive 10% of the time, identifying those who will not develop the allergy as being at risk. From previous data released by the research group, we know that 1% of the population could develop this rare allergy. So we want to know: what is the likelihood that you will develop the allergy if you get a positive result from this new test? Is it the case that you have a 95% chance of being at risk? Maybe not.
P(Allergy|PosTest) = P(PosTest|Allergy) P(Allergy) / P(PosTest)

• P(Allergy) denotes developing the allergy (1%).
• P(PosTest|Allergy) denotes getting a positive test result if you will develop the allergy (95%).
• P(PosTest) denotes the test being positive for anyone. This we do not yet know.
Fortunately, we can calculate the probability of a positive test for anyone by adding up those who get a positive result and will develop the allergy with those who get a positive result and will not develop the allergy.

• 1% will have the allergy, and 95% of them will get a positive test.
• 99% will not get the allergy, and 10% of them will get a positive test.

This gives us the probability of a positive test result for anyone as:

P(PosTest) = (0.01 × 0.95) + (0.99 × 0.10) ≈ 0.108, or about 10.8%

Now we can calculate the likelihood that we will develop the allergy given a positive test result.

P(PosTest|Allergy) = 0.95
P(Allergy) = 0.01
P(PosTest) = 0.108

P(Allergy|PosTest) = (0.95 × 0.01) / 0.108 = 0.088, or 8.8%
So in this example, if you test positive for developing the chocolate cake allergy, there is actually only an 8.8% probability, or likelihood, that you will in fact develop this rare allergy. If you have ever wondered how search engines can show advertisements related to your previous searches or suggest websites based on your specific search terms, you can thank Thomas Bayes. An even greater contribution of Reverend Bayes to our everyday lives is providing the world with the algorithms and methods to detect spam before it reaches your inbox. These are the ideas at the heart of Bayes' Theorem.
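The full allergy calculation can be reproduced in a few lines of Python; the variable names here are our own:

```python
# Chocolate-cake-allergy example: 1% base rate, 95% sensitivity,
# 10% false-positive rate among non-allergic people.
p_allergy = 0.01
p_pos_given_allergy = 0.95
p_pos_given_no_allergy = 0.10

# Law of total probability:
# P(PosTest) = P(pos|allergy)P(allergy) + P(pos|no allergy)P(no allergy)
p_pos = (p_pos_given_allergy * p_allergy
         + p_pos_given_no_allergy * (1 - p_allergy))

# Bayes' theorem: P(Allergy|PosTest)
p_allergy_given_pos = p_pos_given_allergy * p_allergy / p_pos
```

`p_pos` works out to 0.1085 (about 10.8%), and `p_allergy_given_pos` to roughly 0.088, matching the 8.8% result above.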
Prior Distributions

Bayesian statistical analysis allows us to use previous, or prior, knowledge about some event in the calculation; as our knowledge is updated, we can update the statistical model. This prior knowledge is called the "prior distribution". There are two basic types of priors in Bayesian statistics: non-informative priors and informative priors. Non-informative priors are used when we do not have much information about the data. Informative priors can be used as evidence is collected and we have more specific expectations about the data.
Several distributions are commonly used as prior distributions, most notably the Cauchy, Student's t, and normal probability distributions.

Cauchy probability distribution

The Cauchy distribution is an example of a distribution that has no defined mean, variance, or higher moments. Its mode and median are well defined.
By Skbkekas - CC BY 3.0, https://commons.wikimedia.org/w/index.php?curid=9649146
Student's t probability distribution

The t-distribution is symmetric and bell-shaped, like the normal distribution, but has heavier tails, meaning that it is more prone to producing values that fall far from its mean. This makes it useful for understanding the statistical behavior of certain ratios of random quantities, in which variation in the denominator is amplified and may produce outlying values when the denominator of the ratio falls close to zero.
By Skbkekas - CC BY 3.0, https://commons.wikimedia.org/w/index.php?curid=9546828
Normal probability distribution Normal distributions are important in statistics and are often used in the natural and social sciences to represent real-valued random variables whose distributions are not known.
By Inductiveload - Mathematica, Inkscape, Public Domain, https://commons.wikimedia.org/w/index.php?curid=3817954
Uniform probability distribution

The continuous uniform distribution, or rectangular distribution, is a family of symmetric probability distributions in which all intervals of equal length within the distribution's support are equally likely.
General Uniform Distribution
When selecting a prior to use in Bayesian analysis, think of it as a distribution of the effect sizes you might expect. For example, if we select a Cauchy prior with a scale of 0.707 (the default prior used in the JASP Bayesian t-Test), we expect about half of the probability mass for the effect size to fall inside the interval -0.707 to 0.707, with the distribution centered at zero.
Cauchy’s Prior Distribution
As more data is collected and additional information is gained, the researcher can begin to move beyond the non-informative priors and select informative priors that specify some known effect size ranges that may be centered around values other than zero.
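To see what a Cauchy prior with scale 0.707 implies, we can compute its cumulative probabilities directly. This is a sketch using only the standard library; the function names are ours:

```python
import math

def cauchy_cdf(x: float, scale: float) -> float:
    """CDF of a Cauchy distribution centered at zero."""
    return 0.5 + math.atan(x / scale) / math.pi

def mass_within(scale: float, half_width: float) -> float:
    """Probability mass of Cauchy(0, scale) inside [-half_width, half_width]."""
    return cauchy_cdf(half_width, scale) - cauchy_cdf(-half_width, scale)
```

Running `mass_within(0.707, 0.707)` returns exactly 0.5: a Cauchy prior places half of its mass inside plus-or-minus one scale unit, whatever the scale.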
Interpreting the Bayes Factor

Within Bayesian statistical analysis, we often talk about comparing the probability of the null hypothesis (H0) against the probability of the alternative hypothesis (H1). In the model selection we can decide to test whether the null and the alternative hypothesis are simply "not equal" to one another:

H0 ≠ H1

On the other hand, we can also test for directional differences, with the null hypothesis (H0) predicting greater or smaller differences than the alternative hypothesis (H1):

H0 > H1 or H0 < H1

When reporting the Bayes Factor, we can report either the likelihood of the null hypothesis compared to the alternative hypothesis (BF01) or the likelihood of the alternative hypothesis compared to the null hypothesis (BF10). Several authors (Jeffreys, 1961; Raftery, 1995) have published guidelines for reporting Bayes Factors and the terms to use when describing the strength of evidence for a hypothesis.
The table below illustrates these suggested guidelines.

Factors derived from BF10

Bayes Factor in Favor of H0   Bayes Factor in Favor of H1   Raftery       Jeffreys
1 - 0.33                      1 - 3                         Weak          Anecdotal
0.33 - 0.10                   3 - 10                        Positive      Substantial
0.10 - 0.05                   10 - 20                       Positive      Strong
0.05 - 0.03                   20 - 30                       Strong        Strong
0.03 - 0.01                   30 - 100                      Strong        Very Strong
0.01 - 0.0067                 100 - 150                     Strong        Decisive
< 0.0067                      > 150                         Very Strong   Decisive
Another scale, provided by Lee and Wagenmakers (2013) and shown in the following table, was adjusted from Jeffreys (1961).

Bayes factor    Evidence category
> 100           Extreme evidence for H1
30 - 100        Very strong evidence for H1
10 - 30         Strong evidence for H1
3 - 10          Moderate evidence for H1
1 - 3           Anecdotal evidence for H1
1               No evidence
0.3 - 1         Anecdotal evidence for H0
0.1 - 0.3       Moderate evidence for H0
0.03 - 0.1      Strong evidence for H0
0.01 - 0.03     Very strong evidence for H0
< 0.01          Extreme evidence for H0
We can now take p-values, effect sizes, and Bayes factors into account when determining the strength of our evidence for either the alternative hypothesis (H1) or the null hypothesis (H0). The table below was suggested by Wetzels et al. (2011).

Statistic                   Interpretation

p-value
< 0.001                     Decisive evidence against H0
0.001 - 0.01                Substantive evidence against H0
0.01 - 0.05                 Positive evidence against H0
> 0.05                      No evidence against H0

Effect Size (Cohen's d)
< 0.2                       Small effect size
0.2 - 0.5                   Small to medium effect size
0.5 - 0.8                   Medium to large effect size
> 0.8                       Large to very large effect size

Bayes Factor (BF)
> 100                       Decisive evidence for H1
30 - 100                    Very strong evidence for H1
10 - 30                     Strong evidence for H1
3 - 10                      Substantial evidence for H1
1 - 3                       Anecdotal evidence for H1
1                           No evidence
1/3 - 1                     Anecdotal evidence for H0
1/10 - 1/3                  Substantial evidence for H0
1/30 - 1/10                 Strong evidence for H0
1/100 - 1/30                Very strong evidence for H0
< 1/100                     Decisive evidence for H0

Evidence categories for p-values (adapted from Wasserman, 2004, p. 157), for effect sizes (as proposed by Cohen, 1988), and for Bayes Factor BF10 (Jeffreys, 1961).
Statistical Assumptions

The mathematics behind both Bayesian and Frequentist statistical analysis are based on certain assumptions about the data. These assumptions allow the methods to "work" across a wide array of scenarios while still performing reliably and providing results that accurately describe the statistical model.
JASP Guide
Two of the most common assumptions concern the normality of the data and the homogeneity of its variance.
Example Bell Curve
Real-world data is often quite messy and rarely conforms to this perfectly shaped distribution. The goal of normality is not to have data that mirrors the bell curve exactly, but to have data that approximates this distribution as closely as possible. Homogeneity assumes that our sample populations have equal variation, or variation equal enough that when the expected and observed variances are graphed on a scatterplot they form something that resembles a line.
Example Variance Between Two Samples
We should keep in mind that both normality and homogeneity are assumptions of the statistical models, not strict requirements. In the end it is up to the researcher to use good judgment when considering the statistical assumptions; a novice researcher should adhere to these basic assumptions as closely as possible.
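As a rough, informal check of homogeneity (not a substitute for Levene's test or similar formal tests), one can compare sample variances directly. This rule-of-thumb helper is our own illustration:

```python
import statistics

def variance_ratio(sample_a, sample_b) -> float:
    """Rough homogeneity check: ratio of the larger sample variance to
    the smaller. A common rule of thumb treats ratios under roughly 3
    or 4 as acceptably homogeneous; this is a heuristic, not a test."""
    va = statistics.variance(sample_a)
    vb = statistics.variance(sample_b)
    return max(va, vb) / min(va, vb)
```

For instance, `variance_ratio([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])` returns 4.0, since doubling every value quadruples the variance.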
Chapter 4 Getting Started with JASP
Preparing the Data and Making Decisions

Now that we have collected data from our study, it is time to perform some analysis to address the research questions posed. One of the first choices concerns entering your data into an application so that you can actually do the analysis. Depending on the kind of data you have collected, there are many choices available. We will focus on using continuous and categorical data sets with the JASP statistical analysis application.
Creating Your Variable Codebook Whether you plan to perform data entry into a spreadsheet first or not, you will need to create a codebook for your data. The codebook is used as a planning tool and a quick reference guide to your data. There are some questions that must be addressed prior to data entry.
Continuous Data
• How large is your data set?
• Will all the data be manually entered into the spreadsheet?
• How many decimal places are required for your data?
• How will you "name" the data for easy reference?
• Are there any outliers in the data?
• How will you handle outliers?

Categorical Data
• What are the value names for each data item?
• How will you represent each value name with an integer value?
• Is your data nominal or ordinal? How will this guide the decision for selecting values?
Data Types and Analysis Methods

With any categorical data, we begin by converting the labels, such as male/female or private school/public school, into numerical values that can be manipulated in the analysis application. For our "Sample Student Data" (SSD.sav) sample dataset, the codebook is represented in the table below.

CODEBOOK for High School and Beyond Data:

Variable Name   Variable Type   Variable Label             Variable Values
gender          Categorical     Gender                     0=male, 1=female
race            Categorical     Race                       1=Hispanic, 2=Asian, 3=African American, 4=White
ses             Categorical     Socioeconomic status       1=low, 2=middle, 3=high
schtyp          Categorical     School type                1=public, 2=private
prog            Categorical     Program type               1=general, 2=academic, 3=vocational
read            Continuous      Reading score
write           Continuous      Writing score
math            Continuous      Math score
science         Continuous      Science score
socst           Continuous      Social Studies score
firstgen        Categorical     First Generation College   0=parents did not attend college, 1=parents attended college
attcollege      Categorical     Attended College           0=no college after HS, 1=college after HS
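Outside of JASP, a codebook like this can also be kept in machine-readable form. Here is an illustrative Python sketch (the names are ours) mapping a few of the SSD variables' integer codes to their labels:

```python
# A slice of the SSD codebook: each categorical variable maps its
# stored integer codes to human-readable labels.
CODEBOOK = {
    "gender": {0: "male", 1: "female"},
    "ses": {1: "low", 2: "middle", 3: "high"},
    "schtyp": {1: "public", 2: "private"},
    "prog": {1: "general", 2: "academic", 3: "vocational"},
}

def decode(variable: str, value: int) -> str:
    """Look up the label for a stored numeric code."""
    return CODEBOOK[variable][value]
```

For example, `decode("ses", 2)` returns "middle", matching the codebook table above.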
Some of the data in our sample set are nominal in nature: there is no order to the labels, and an order should not be implied from the values. For example, socioeconomic status has been listed as "low, middle, and high" with values of "1, 2, and 3" assigned respectively. This does not imply that "low" is first, "middle" is second, and "high" is third; these labels could have been placed in any order and assigned any values. On the other hand, if our sample data had involved grade level or degrees attained, then we could assign values based on an order, and this would represent ordinal data.

Sample CODEBOOK for Schooling Data:

Variable Name   Variable Type   Variable Label    Variable Values
id              Categorical     Student ID
gender          Categorical     Gender            0=male, 1=female
grade           Categorical     GradeLevel        1=First grade, 2=Second grade, 3=Third grade, 4=Fourth grade
degree          Categorical     Degree Attained   1=Not a High School Graduate, 2=High School Diploma, 3=Bachelors degree, 4=Graduate degree
Data in JASP

JASP is able to handle and distinguish between four variable (data) types:

1. Nominal Text, also known as String data
2. Nominal: categorical data representing "names"
3. Ordinal: categorical data representing "names in an order"
4. Continuous: real numbers such as integers, whole numbers, and decimals

Nominal Text variables are typically used to identify aspects of the data that will not be part of the statistical analysis. This data merely contains descriptions, key words, or some other type of text information.

Nominal variables are categorical variables that are represented by numeric values. For example, a variable "Gender" may have levels "0" and "1" representing males and females respectively, as we find in the SSD example data used in this guide. Even though these are numbers, they do not imply an order, and the distance between them is not meaningful.

Ordinal variables are categorical variables with an inherent order. An ordinal variable could represent some sort of order, such as a grade level with 1 = Freshman, 2 = Sophomore, 3 = Junior, and 4 = Senior. Note that the distance between the numbers is still not meaningful. JASP assumes that all ordinal variables have been assigned numeric values.

Continuous variables are variables whose values allow a meaningful comparison of distance. Examples include money, distance, and test scores. We often make the assumption that rubric scores are continuous variables, since the rubric levels of 1, 2, 3, and 4 should represent some meaningful difference between them.

Variable types in JASP are enforced. This means that JASP will not allow a categorical variable analysis to be performed on continuous data, or continuous variable calculations to be performed on categorical (nominal and ordinal) data.
Using .SAV in JASP

The JASP statistical application is able to natively open the ".sav" file format. This means that files created in SPSS, or any other compatible application, can be opened with JASP without having to convert the file. Click on the "File" icon in the top left corner of the screen, then use the dialogue windows to select the .sav data file from your computer.
JASP Open File icon
JASP Open Dialogue Box
The file will open with the variable names and data.
JASP Data View window with Data
JASP can also open files in other formats such as .csv (comma-separated values), .txt (plain text), and .ods (OpenDocument Spreadsheet).
Spreadsheets in JASP

Setting Up the Spreadsheet

JASP is able to use OpenDocument Spreadsheets (.ods). Spreadsheet files need to have a header row that contains names for each of the columns, or variables. Missing values can either just be missing (i.e., an empty cell) or be denoted by "NaN", "." (period), or " " (space). Getting the spreadsheet set up and the data entered is a simple process. There are a few key points to keep in mind:

• In the spreadsheet, the columns contain each variable, or data type, and the rows represent each case in the study. This is similar to the way JASP displays the data.
• The first row of the spreadsheet should be the variable names, with row 2 containing the first case's data.
• Variable names should be short, but meaningful to you.
• Categorical data must be entered as its numerical value and not the name. The codebook you created will come in handy for this process.
• Enter all the data.
OpenOffice Spreadsheet with data labels shown as numerical values rather than names
Opening the Spreadsheet

To open the .ods file in JASP, click on the File tab at the top of the screen. Using the dialogue box, navigate to the .ods file to select and open it. JASP will use the entered data to determine the type of data in each column, i.e., Nominal, Ordinal, or Continuous. The data type will be represented by an icon next to the column header.
The ruler icon represents Continuous data, the bar graph icon represents Categorical Ordinal data, and the Venn diagram icon represents Categorical Nominal data. If JASP assigned an incorrect data type to a column, this can be corrected by clicking on the icon to activate a pull-down menu and selecting the desired type.
If the data type is Categorical, you can assign labels to the values by clicking on the column header. This will show the Label window. When the .ods file is opened by JASP, the values and the labels will initially be the same.
To change the label, click in the Label field and enter the desired text.
Chapter 5 Hypothesis Building
The Alternative Hypothesis

Let's talk about the alternative hypothesis (H1). As we enter variables and factors into the JASP dialogue windows, we are actually creating the alternative hypothesis (H1). All of the results tables will be based on this hypothesis to uncover any associations or relationships in the data. As discussed previously, there are two competing hypotheses at work in statistical analysis: the alternative hypothesis (H1), as set up or expressed by the researcher, and the null hypothesis (H0), which states that there are no associations or relationships within the data.
Hypothesis Setting

JASP allows us to set the hypothesis type. There are three types that can be selected: simple differences, positive differences, and negative differences.
JASP Hypothesis settings; Two-Tailed
The simple differences setting is represented by "Group 1 ≠ Group 2". In statistical terms this would be a two-tailed analysis. The "two tails" refers to the fact that our analysis is measuring differences at both ends of the distribution curve, beyond the critical values determined by the pre-set significance level (α). Recall that in Frequentist statistics we often use the 95% confidence level, giving us a significance level of α = 0.05.
Two-Tailed Distribution
In the figure above, we see the tails represented by the shaded regions under the curve. Both of these regions lie outside the 95% confidence region. The analysis would be interested in differences that fall within either shaded region.
JASP Hypothesis settings; One-Tailed (positive)
A positive difference setting is represented by "Group 1 > Group 2". This would be considered a one-tailed analysis. The "one tail" in this case refers to the fact that we are interested in only one of the critical regions under the curve, namely the shaded region in the upper (right) tail.
One-Tailed Distribution positive direction
In the figure above, our one-tailed test is represented by the shaded region to the right of the upper critical value. The analysis would only be interested in differences that fell in this region.
JASP Hypothesis settings; One-Tailed (negative)
A negative difference setting is represented by "Group 1 < Group 2". This would also be considered a one-tailed analysis. The "one tail" in this case refers to the fact that we are interested in only one of the regions under the curve, namely the shaded region in the lower (left) tail.
One-Tailed Distribution negative direction
In the figure above, our one-tailed test is represented by the shaded region to the left of the lower critical value. The analysis would only be interested in differences that fell in this region.

To summarize: when we are building a hypothesis, a two-tailed analysis indicates that we are simply interested in differences that may be present in the data, regardless of direction. In a one-tailed positive analysis, we hypothesize that the differences are greater than the expected mean. In a one-tailed negative analysis, we hypothesize that the differences are less than the expected mean. A one-tailed hypothesis represents a stronger, directional claim: the researcher commits to a direction in advance, and differences in the opposite direction are not counted as evidence.
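The tail choices above can be made concrete with a small sketch. For simplicity this uses the standard normal distribution rather than the t distributions JASP actually uses, and the function names are ours:

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def p_value(z: float, tail: str = "two") -> float:
    """p-value for a z statistic under a two-tailed, right-tailed
    ('greater'), or left-tailed ('less') alternative hypothesis."""
    if tail == "two":
        return 2 * (1 - normal_cdf(abs(z)))
    if tail == "greater":
        return 1 - normal_cdf(z)
    if tail == "less":
        return normal_cdf(z)
    raise ValueError("tail must be 'two', 'greater', or 'less'")
```

Note how `p_value(1.96, "two")` is about 0.05 while `p_value(1.96, "greater")` is about 0.025: the same statistic counts only one shaded region in a directional test.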
Contingency Table Hypothesis

The Contingency Table is useful when analyzing categorical data, such as nominal or ordinal data. The Chi Square analysis method can determine whether the counts, frequencies, or percentages are approximately the same as the expected values. In many social science and educational research questions we are interested in whether there is a relationship between two categorical variables. This analysis method is referred to as a Chi Square Test of Independence.
The symbols in each dialogue box indicate the data types that may be entered for analysis.
Contingency Table Hypothesis
The Contingency Table hypothesis building window allows us to enter categorical data into rows and columns. The resulting table will display these as percentages and counts to be analyzed. One way to think of the rows and columns is for the rows to represent an independent variable, or factor, and for the columns to represent a dependent variable, or factor. In this sense our hypothesis would be that the row factor has some impact on differences in the column factor.
Differences in Program enrollment based on SES group
In the example above, the row contains data about each student's SES grouping while the column contains data about the school program enrolled in, such as vocational education, general education, or college prep. The alternative hypothesis for this example could be stated as our belief that a student's SES grouping has some impact on the school program choices of students. This model would explore differences in program type enrollment based on the student's SES group.
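Under the hood, a Chi Square Test of Independence compares observed cell counts to the counts expected if the row and column factors were unrelated. A sketch of that calculation with our own helper names (JASP performs all of this for you):

```python
def expected_counts(observed):
    """Expected cell counts under independence:
    E[i][j] = (row_total_i * col_total_j) / grand_total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]

def chi_square(observed):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over all cells."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
```

A perfectly balanced table such as `[[10, 10], [10, 10]]` yields a statistic of 0, while departures from the expected counts increase it.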
Relationship Hypothesis

Relationships, or differences within the data, are typically investigated with either a t-Test or Analysis of Variance (ANOVA). With both of these methods we will build a hypothesis about differences within some continuous data, such as an assessment measure or test score, based on a grouping factor, such as school type or SES group.
t-Test Hypothesis window
The t-Test hypothesis window allows us to enter the dependent variable(s) and some grouping factor. The dependent variable is the continuous measure, as indicated by the ruler icon in the window. Any number of dependent variables can be entered into this window, creating multiple analysis results. The grouping variable will be the categorical data that we believe has an impact on the differences within the dependent variable.
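The t-Test itself reduces to a single statistic: the difference between group means divided by a pooled estimate of variability. A minimal Student's t sketch (the function name is ours; JASP also reports the p-value and effect size):

```python
import math
import statistics

def independent_t(group1, group2) -> float:
    """Student's independent-samples t statistic using pooled variance."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled * (1 / n1 + 1 / n2))
```

Identical groups give t = 0, and swapping the two groups simply flips the sign, which is why the hypothesis direction (one-tailed vs. two-tailed) matters.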
Hypothesis of science scores and school type
In this example the alternative hypothesis can be stated as our belief that a student's science test scores will differ based on the school type attended, either public or private. The ANOVA analysis is similar to the t-Test in that there is a continuous dependent variable, but differences are measured across a categorical factor with three or more groups, referred to in JASP as Fixed Factors.
ANOVA Hypothesis Building
When multiple Fixed Factors are entered we can test for interaction effects between these factors.
Hypothesis of reading and program type
In this example the alternative hypothesis can be stated as our belief that student reading scores will differ based on their program type enrollment.
Association Hypothesis Association testing is concerned with the predictive nature of one variable with respect to another variable. Regression analysis is typically used for this sort of question.
Regression Hypothesis Building
The regression hypothesis building window has Dependent Variables and Covariates. The dependent variable is some measure that we believe can be predicted based on the covariate. Both of these variables are continuous data.
Hypothesis of reading scores and math scores
In this example the alternative hypothesis can be stated as our belief that a student’s reading test scores can be predicted from their math test scores.
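Simple linear regression of this kind fits the line y = a + b·x by least squares. A small illustrative sketch (the function name is ours):

```python
import statistics

def least_squares(x, y):
    """Intercept and slope for simple linear regression y = a + b*x,
    minimizing the sum of squared residuals."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx          # slope
    a = my - b * mx        # intercept passes through the means
    return a, b
```

For a perfectly proportional pattern such as x = [1, 2, 3] and y = [2, 4, 6], the fit returns an intercept of 0 and a slope of 2; real reading-versus-math data would of course show scatter around the fitted line.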
Section II: Descriptives and Data Visualization
Chapter 6 Descriptive Statistics
What are descriptive statistics?

Descriptive statistics are used to characterize the data in order to make decisions about its nature and tendencies, to choose appropriate inferential statistics for analyzing the data, or simply to make initial analysis decisions. In descriptive statistics we look at measures such as the mean, median, mode, and standard deviation.
Creating Descriptive Statistics in JASP for Categorical Data

Categorical data is best described by exploring the frequencies within the data. The frequencies will display the counts and percentages of each category within the data set. The following examples use the Sample Student Data (SSD) dataset; see Resources for the codebook and data values.
Click on the Descriptives pull-down menu and select "Descriptive Statistics".
JASP Descriptives Tab
Descriptive Statistics for Categorical (Nominal and Ordinal) Data

When considering categorical data, it is often helpful to investigate the data from a descriptive standpoint. Here we are interested in frequencies, counts, comparisons between categories, and so on. The Descriptive Statistics dialogue window, on the left side of the JASP window, allows us to select the measures and visualizations needed to get a clear sense of our data. As these measures are selected, the results will appear in the JASP window on the right side.
JASP Descriptive Statistics Dialogue Window
Begin by moving a Categorical item into the Variables window. This can be accomplished by either clicking and dragging the item or by selecting the item and clicking the arrow between the two boxes.
Descriptive Statistics for SES
Once the categorical data item is moved into the Variables box, the Descriptives results table will update with the new information. The next step is considering which measures make sense for the data you are using. Checking "Frequency Tables" below the Variables window will create a table in the Results window that displays the frequencies of each item in the category. In our example we can see the counts and frequencies for the low, middle, and high SES groups contained within the data.
Now we can move on to the Plots settings to decide which graphical visualizations make sense for our data. In this case we will select only "Distribution Plots" for our categorical data, since neither Correlation plots nor Boxplots would provide useful information about the SES category.
Distribution Plot for SES category
Next we can select settings from the "Statistics" section. For our categorical data the only truly useful setting is "Mode". As we deselect settings that are not needed and select others to view, the Descriptive Statistics table in the right side window will automatically update.
Statistics Settings in JASP
After all the Descriptive settings have been selected, click the “OK” button. If you need to go back and make changes to any of the previously selected settings, simply click in the output section of the Results window and the settings will be displayed in the left side window. Within the JASP Results window, we can include our own thoughts, notes, ideas, and points of interest about the data. For each output section of the Results window you will find a small pull-down arrow when the pointer hovers over the output.
Clicking the arrow will bring up the dialogue menu to “Add Note”.
The researcher’s note can then be entered into a new section above the output table.
One of the powerful features of JASP is that the output tables and graphics can easily be imported into a word processing document. The JASP Results section also displays the tables in APA format. Using the same pull-down arrow, you can select "Copy" to place the table onto your clipboard, allowing it to be pasted into a different application or document.
When the pull-down for graphs is selected, we are presented with the “Copy” option as well as a “Save Image As” option, allowing us to export the graph as a .png image.
Save or Copy Graphs
JASP has the capability to create graphs that are “Split” by some other variable in your data. For example, we could examine the SES groups and how they are split, or represented, within the different school types (private or public school). To create these split graphs, move the main variable into the Variable box and move the organizing variable into the “Split” box. Be sure to select “Distribution Plots” as well.
The JASP Results window will produce two separate graphs of this data.
Private School Enrollment by SES
Public School Enrollment by SES
By exploring the descriptive statistics for our data, questions may begin to emerge and suggest directions for further investigation.
Descriptive Statistics for Continuous Data When considering Continuous data, it is often helpful to investigate that data from a descriptive standpoint. These descriptive statistics should include frequencies, counts, mean, median, mode, standard deviation, etc. The Descriptive Statistics dialogue window, on the left side of the JASP window, allows us to select the measures and visualizations that are needed to get a clear sense of our data. As these measures are selected, the output will appear in the JASP Results window on the right side.
JASP Descriptive Statistics Dialogue Window
Begin by moving a Continuous item into the Variables window. This can be accomplished by either clicking and dragging the item or by selecting the item and clicking the arrow between the two boxes.
Descriptive Statistics for Writing
Once the Continuous data item is moved into the Variables box, the Descriptives results table will update with the new information. The next step will be considering which measures make sense for the data you are using. We can look at the Plots settings to decide which graphical visualizations make sense for our data. In this case we will be selecting “Distribution Plots” for our Continuous data as well as the Box plots to provide us with useful information.
Distribution Plot for the Writing assessment
The Distribution plot will display a histogram of the data with a curve superimposed over the graph. This curve will help us determine if the data has a “somewhat” normal distribution. Box plots can also help us visualize how the data is spread out across the quartiles. Each section of the Box plot contains 25%, or one-quarter, of the data. In our example data set of 200 students, each section contains 50 student scores.
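The quartile boundaries a Box plot draws can also be computed directly. Here is a minimal sketch using NumPy's percentile function on hypothetical, randomly generated writing scores (not the book's actual data):

```python
import numpy as np

rng = np.random.default_rng(0)
scores = rng.normal(52, 10, size=200)  # hypothetical writing scores

# Q1, median, and Q3 mark the edges of the box; each quarter of
# the data (50 of 200 scores here) falls between adjacent cuts.
q1, q2, q3 = np.percentile(scores, [25, 50, 75])
iqr = q3 - q1  # the length of the box itself
print(f"Q1={q1:.1f}  median={q2:.1f}  Q3={q3:.1f}  IQR={iqr:.1f}")
```

Comparing the spread below Q1 with the spread above Q3 gives a numerical version of the "lower quartile is more spread out" observation made from the plot.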
In the case of our Writing assessment scores, we can notice from the Box plot that the lowest quartile of scores is more spread out than the upper quartile of scores. This could be something to consider as we begin analyzing the data. Next we can select settings from the “Statistics” section. In the case of our Continuous data, most of these settings will provide us with meaningful information as we examine the measures and begin to make sense of the data. As we select settings that are needed, the Descriptive Statistics table on the right side window will automatically update.
Statistics Settings in JASP
After all the Descriptive settings have been selected, click the "OK" button. If you need to go back and make changes to any of the previously selected settings, simply click in the output section of the Results window and the settings will be displayed in the left side window. There are a few terms that may be useful:

• Standard Deviation (Std Dev column) gives a measure of the variation of our test scores from the mean. A score can be described as 1, 2, or 3 standard deviations from the mean. The standard deviations follow the 68-95-99.7 rule in statistics: 68% of the data falls within one standard deviation of the mean, 95% of the data falls within two standard deviations, and 99.7% of the data falls within three standard deviations.

Standard Deviation Diagram

• Kurtosis describes the "peakedness" of the data. A kurtosis value of zero represents data that resembles a normally distributed data set. Positive values represent data with a leptokurtic distribution, or very high peaks, and negative values represent data with a platykurtic distribution, or one that is more flat.

• Skewness gives us information about the distribution of data around the mean. A skewness value of zero indicates data evenly distributed and balanced around the mean. A positive skewness value indicates data weighted more heavily to the right of the mean, and a negative skewness value indicates data weighted to the left of the mean.

As we saw in the previous examples, JASP has the capability to create graphs that are "Split" by some other variable in your data. For example, in this case we could examine the Writing assessment scores and how they are split, or represented, within gender to observe any differences between the scores of girls and boys. To create these split graphs, move the main variable into the Variable box and move the organizing variable into the "Split" box. Be sure to select "Correlation Plots" as well as Boxplots.
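The three measures described above can be computed outside JASP as a cross-check. The sketch below uses SciPy on hypothetical, randomly generated scores; note that `scipy.stats.kurtosis` defaults to the Fisher (excess) definition, which matches the "zero means normal" convention used here:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
writing = rng.normal(52, 10, size=200)  # hypothetical writing scores

print("std dev :", np.std(writing, ddof=1))  # sample standard deviation
# Fisher (excess) kurtosis: 0.0 corresponds to a normal distribution.
print("kurtosis:", stats.kurtosis(writing))
# Skewness: 0.0 corresponds to a symmetric distribution.
print("skewness:", stats.skew(writing))
```

For normally distributed data such as this simulated sample, both kurtosis and skewness should land near zero.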
In the case of Continuous data that has been split by some selected category (here, the Writing scores split across gender), the Box plot may contain some useful clues about the data.
Box Plot Writing Score by Gender
When exploring continuous data “split” by some categorical data, the correlation plot may be a useful visualization. In the Plots section, check the Correlation Plot box.
JASP Plots section
The Correlation Plot will produce a density graph for each of the categories in the "Split" variable.

Density Plots for Writing Scores split by Gender (Male and Female panels)
By exploring the descriptive statistics for our data, questions may begin to emerge and suggest directions for further investigation. In the case of our writing scores, we may notice that the median score for girls is higher than the boys' median score. We can also see that the writing assessment for boys seems to have a greater spread and variability. Once the data's descriptive statistics have been explored, patterns may have emerged and questions may have arisen based on the data. We are now ready to begin inferential statistical analysis.
Section III: Frequentist Approaches
Chapter 8 Relationship Analysis with t-Test
t-Test Analysis (Continuous Differences, two groups) The t-Test (also called Student's t-Test) compares two averages, or means, and indicates whether they differ from one another. The t-Test also tells you the significance of the differences; in other words, the t-Test lets you know if those differences could have occurred by chance. There are three t-Test methods available in JASP:
1. One Sample t-Test: Compares the mean score of a sample to a known mean score. The known mean is typically referred to as the population mean.
2. Independent Samples t-Test: Compares the mean scores of two groups on a given variable.
3. Paired Samples t-Test: Compares the means of two variables. This is commonly used for groups with pre- and post-test measures.
One Sample t-Test using JASP The following examples will use the Sample Student Data (SSD) dataset; see Resources for the Codebook and data values. The question we are asking of this data is whether the students' science scores differ from a "hypothetical" national average science score. Using the t-Test tab, select One Sample t-Test. With the One Sample t-Test we will investigate the differences between one group's performance on a measure and some known average on the SAME measure.
JASP One Sample t-Test Menu
When using the One Sample t-Test dialogue window, select the test variable. Be sure to move the measure to be compared into the test variable window. Move the variable by either highlighting the variable and clicking on the arrow next to the variable window or by dragging the variable into the window. In this example we are testing the science assessment compared to the “National average” scores for the science assessment. This comparison is a hypothetical example.
One Sample t-Test window
We will also enter the known average for this measure. The known average would come from a source outside of your own data collection, such as a norm referenced test or some national assessment measure that has a published average. In this example we are using a score of “50” to represent the known average for this assessment.
One Sample t-Test with known average of “50”
In the Tests settings, we have checked both the Student's t-Test and the Wilcoxon signed-rank test. The Student's t-Test should be used if the data meets our assumption of "Normality", while the Wilcoxon signed-rank test is designed to be used with data that does not meet this "Normality" assumption. For this example we are testing whether the sample population's science assessment score is "different" than the known average. This can be found in the Hypothesis setting with "≠ Test Value". You could also select one of the directional hypotheses if there was a reason to test for directional differences. The Assumption Checks setting should be used to verify normality within our data. As stated above, if the data meets the normality assumption, we can use the Student's t-Test. If the data does not meet the normality assumption, we can use the Wilcoxon signed-rank test. We will also select the effect size and descriptives tables for the Results output. With many of the Frequentist analysis methods, there is an "assumption" that the data exhibits a normal distribution. There are two ways to check for normality within your data: mathematically or graphically.
JASP includes the mathematical normality check as an option in the analysis settings, as seen in the figure above. If the data does not "pass" this mathematical normality check, then the results table will contain a warning. However, if the sample size is either too small or too large, the normality check may incorrectly measure the normality of the data. The graphical normality check involves the researcher inspecting the distribution plot of the data and determining if it appears to have normal distribution characteristics.
Distribution plot of Science assessment
The Assumptions Checks in the JASP settings will give us a mathematical measure of normality.
Interpreting the Results Tables for One Sample t-Test We can begin the analysis process by checking the assumptions.
Assumption Checks
Test of Normality (Shapiro-Wilk)
              W       p
science     0.99    0.03
Note. Significant results suggest a deviation from normality.

The results indicate that there is a statistically significant deviation from the normal distribution within our data (p = 0.03). We should note that the actual p-value is fairly close to the critical value of p = 0.05. This result is consistent with the distribution graph of our data. The resulting output table will show the significance level (p-value), along with the mean difference and the confidence interval for the mean difference. In this case we find that the p-value for our measure, when compared to the known mean, is statistically significant.

One Sample T-Test
                      Statistic     df      p     Effect Size
science   Student        2.64      199    0.01       0.19
          Wilcoxon     9835.50            0.01      -0.02
Note. For the Student t-test, effect size is given by Cohen's d; for the Wilcoxon test, effect size is given by the matched rank biserial correlation.

Descriptives
            N     Mean     SD     SE
science    200   51.85    9.90   0.70
Output table for Science score compared to known mean of 50
In this analysis we will use the Wilcoxon test since the data does deviate from the normality assumption. The One Sample T-Test results table indicates that the mean score in our sample population is statistically different than the known average of "50" (p = 0.01). We also find a very small effect size (rank biserial correlation = -0.02).
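Outside of JASP, the same pair of tests can be run with SciPy. The sketch below uses hypothetical, randomly generated science scores (the book's actual data values are not reproduced here) and a test value of 50:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
science = rng.normal(51.85, 9.9, size=200)  # hypothetical science scores
national_mean = 50  # the known "population" average

# Student's one-sample t-test against the known mean.
t_stat, t_p = stats.ttest_1samp(science, national_mean)
# Wilcoxon signed-rank test on the differences from the test value,
# for data that fails the normality assumption.
w_stat, w_p = stats.wilcoxon(science - national_mean)
print(f"Student t = {t_stat:.2f}, p = {t_p:.3f}")
print(f"Wilcoxon W = {w_stat:.1f}, p = {w_p:.3f}")
```

As in JASP, the two tests answer the same question with different assumptions, so report whichever one matches your normality check.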
Independent Samples t-Test using JASP The following examples will use the Sample Student Data (SSD) dataset; see Resources for the Codebook and data values. The question we are asking of this data is whether there are differences in student science scores based on gender. Using the t-Test tab, select "Independent Samples T-Test" from the pull-down menu. With Independent Samples we will investigate the differences between TWO groups on the SAME measure.
JASP Independent Samples t-Test Menu
When using the Independent Samples t-Test select the test variable for the dependent variable window and the groups or factor for the grouping variable window. Be sure that the grouping variable only contains two grouping factors. In this example we will be looking for differences in the science assessment based on a student’s gender (boys and girls).
Independent t-Test Sample Window
If you select a grouping variable that contains more than two groups, the JASP results window will give an error message.
JASP Grouping Variable error message
We will also use the menu below the variable window to select the settings needed for this analysis. In this example we are using the Student t-Test.
Independent t-Test Settings Window
The hypothesis that we will use in this example is that the science assessment scores for boys and girls are different. Our hypothesis does not indicate a direction for the differences or assume which group might outperform the other. We will also check the settings to calculate the effect size and to produce a descriptives table for the science assessment. The Student's t-Test has two assumptions: 1) the data has a generally normal distribution, and 2) the two groups have roughly equal variances. The assumption checks section of the settings allows us to verify these assumptions.
JASP Assumption Checks Settings
Recall that the mathematical assumption check for normality can be influenced by sample size, so a researcher should check the normality of the data both mathematically and graphically. But what if our data violates these assumption checks and either does not have a normal distribution or exhibits too much variation within the data?
t-Test Selection
Well, luckily for us, the JASP Independent Samples t-Test includes two variations of the test that do not rely on these assumptions: the Welch test and the Mann-Whitney test.
Interpreting the Results Tables for Independent Samples t-Test The results tables below were produced using the Sample Student Data (SSD) found in the Resources section. These tables represent the Independent Samples t-Test for differences in science assessment scores between boys and girls in the sample population. Reviewing the Results window should begin with the Assumption Checks. Recall that the Student's t-Test assumes that your data generally has a normal distribution and that the group variances are roughly equal.
Assumption Checks
Test of Normality (Shapiro-Wilk)
                     W       p
science   male     0.97    0.06
          female   0.99    0.27
Note. Significant results suggest a deviation from normality.

Test of Equality of Variances (Levene's)
             F     df      p
science    3.61    1     0.06
Assumption Checks Results Tables

The Test of Normality is not statistically significant for either the boys' science scores (p = 0.06) or the girls' science scores (p = 0.27). This result indicates that the data does approximate the normal distribution. Levene's Test of Equality of Variances is NOT statistically significant (p = 0.06), therefore we can assume that the variances are equal.
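Both assumption checks are available in SciPy as well. This sketch runs a Shapiro-Wilk test per group and a Levene test across groups, on hypothetical, randomly generated scores rather than the book's dataset:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
boys = rng.normal(51, 10, size=91)   # hypothetical science scores, boys
girls = rng.normal(53, 9, size=109)  # hypothetical science scores, girls

# Normality check per group (Shapiro-Wilk): small p suggests
# a deviation from the normal distribution.
for name, grp in [("boys", boys), ("girls", girls)]:
    w, p = stats.shapiro(grp)
    print(f"{name:>5}: W = {w:.3f}, p = {p:.3f}")

# Equality-of-variances check (Levene's test) across the two groups.
f, p = stats.levene(boys, girls)
print(f"Levene: F = {f:.2f}, p = {p:.3f}")
```

If both checks come back non-significant, the Student's t-Test is appropriate; otherwise fall back to Welch or Mann-Whitney as described below.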
With these assumptions met, we can move on to the Student's t-test.

Independent Samples T-Test
             t        df       p     Cohen's d
science    1.81    198.00    0.07      0.26
Note. Student's t-test.

The results of the Student's t-Test indicate that the differences in science assessment scores are not statistically significant, since the p-value is greater than 0.05 (p = 0.07).
Interpreting the Results Tables for Independent Samples t-Test (Pt2) Let's take a second look at interpreting the results tables produced from an Independent Samples t-Test. The question we are asking of this data is whether there are differences in a student's math score based on whether or not the student would be a first generation college student, i.e. whether or not their parents attended college. In this scenario we will be looking for differences in the math assessment scores between students whose parents attended college and those whose parents did not attend college. The same settings have been selected in the settings window as shown in the previous example. Again, we will begin with the assumption checks for the data.

Test of Normality (Shapiro-Wilk)
                               W        p
math   Parents no college     0.92   < .001
       Parents have college   0.98    0.11
Note. Significant results suggest a deviation from normality.
The mathematical test of normality indicates that some of the data deviates from the expected normal distribution. The p-value for the first generation college student groups is less than 0.001, making the deviation statistically significant. We also check Levene’s Test of Equality of Variances. The results show that the differences in the variances are not statistically significant (p = 0.50), therefore we can assume equal variances.
Test of Equality of Variances (Levene's)
          F     df      p
math    0.46    1     0.50
Levene's Test for the math assessment

Given that one assumption needed for the Student's t-Test was violated, we will use Welch's t-Test, since it does not rely on the assumptions of a normal distribution and equal variances within the data.

Independent Samples T-Test
                 Statistic      df         p      Cohen's d
math   Student     -9.39      198.00    < .001      -1.33
       Welch       -9.43      197.53    < .001      -1.33
Independent t-Test Results Table

Using Welch's t-test we can see that the math assessment scores are significantly different between the groups (p < .001), with a large effect size (d = -1.33).
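In SciPy, the only difference between the Student and Welch versions is one flag. The sketch below uses hypothetical, randomly generated math scores matched loosely to the group descriptives discussed here:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
no_college = rng.normal(47.1, 7.5, size=94)    # hypothetical math scores
has_college = rng.normal(57.5, 8.1, size=106)  # hypothetical math scores

# Student's t-test assumes equal variances (the default).
t_student, p_student = stats.ttest_ind(no_college, has_college)
# Welch's t-test drops the equal-variance assumption.
t_welch, p_welch = stats.ttest_ind(no_college, has_college, equal_var=False)
print(f"Student: t = {t_student:.2f}, p = {p_student:.2e}")
print(f"Welch  : t = {t_welch:.2f}, p = {p_welch:.2e}")
```

With a group difference this large relative to the spread, both versions agree that the difference is highly significant, just as in the JASP output.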
Effect Size Calculation
Statistic    Test                            Small Effect   Medium Effect   Large Effect
Cohen's d    t-Test (Paired & Independent)       0.2            0.5            0.8

The descriptives for this sample can give us more information about the differences.

Group Descriptives
       Group                   N     Mean     SD     SE
math   Parents no college      94    47.14   7.51   0.77
       Parents have college   106    57.53   8.07   0.78
Math Assessment Descriptives
The mean math assessment score for the students whose parents went to college was over 10 points higher than the mean score for the group whose parents did not attend college.
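Cohen's d can be recomputed directly from the group descriptives in the table above, using the pooled standard deviation. This is a sketch of the standard formula, fed with the published means, SDs, and group sizes:

```python
import math

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d for two independent groups, using the
    pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)

# Group descriptives from the math-assessment example above.
d = cohens_d(47.14, 7.51, 94, 57.53, 8.07, 106)
print(f"Cohen's d = {d:.2f}")  # → Cohen's d = -1.33
```

The result matches the d = -1.33 reported in the Independent t-Test Results Table, and its magnitude is well past the 0.8 "large effect" threshold.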
Paired Samples t-Test using JASP The following examples will use the TestScores dataset; see Resources for the Codebook and data values. The question we are asking of this data is whether there are differences in student achievement from the pre-test assessment to the post-test assessment. Using the t-Test tab, select Paired Samples t-Test. With Paired (Dependent) Samples we will investigate the differences for ONE group on TWO different measures, such as two assessments or a pre-/post-assessment.
JASP t-Test Menu
When using the Paired Samples t-Test, in the dialogue box select the TWO variables. Use the arrow to move each variable into “Test Pair(s) window” or drag the desired variables into the window. In this example we will be looking at a mathematics pre- and post-test.
Paired Samples t-Test Window
The test settings and requirements are similar to the other t-Tests, such as the Independent Samples t-Test and the One Sample t-Test.
Paired Samples t-Test Settings
Here we have selected both the Student's t-Test, used for data that conforms to our assumption rules of normality, as well as the Wilcoxon signed-rank test, used for data that does not exhibit a normal distribution.
Interpreting the Results Tables for Paired Samples t-Test We will begin with the assumption check for normality.

Test of Normality (Shapiro-Wilk)
                           W       p
Before_TI - After_TI     0.97    0.72
Note. Significant results suggest a deviation from normality.

The results indicate that the Test of Normality is not significant (p = 0.72), therefore we can assume that the data has a generally normal distribution.

Paired Samples T-Test
                           t      df      p     Cohen's d
Before_TI - After_TI    -3.23    19     0.00     -0.72
Note. Student's t-test.

The Paired Samples t-Test table indicates that the differences between the pre- and post-test are statistically significant (p < 0.01), with a moderate-to-large effect size (d = -0.72).
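A paired t-test with a paired-samples effect size can be sketched in SciPy as follows; the pre-/post-test scores here are hypothetical, randomly generated values standing in for the TestScores dataset:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
before = rng.normal(18.4, 3.15, size=20)         # hypothetical pre-test
after = before + rng.normal(2.0, 2.5, size=20)   # post-test, shifted upward

# Paired (dependent) samples t-test on the same 20 students.
t_stat, p_val = stats.ttest_rel(before, after)

# Cohen's d for paired samples: mean of the differences divided
# by the standard deviation of the differences.
diff = before - after
d = diff.mean() / diff.std(ddof=1)
print(f"t = {t_stat:.2f}, p = {p_val:.3f}, d = {d:.2f}")
```

Note the algebraic link between the two numbers: for paired samples, d equals t divided by the square root of the sample size.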
Section IV: Bayesian Approaches
Chapter 18 Relationship Analysis with Bayesian t-Test
Bayesian t-Test Analysis (Continuous Differences, two groups) The t-Test (also called Student's t-Test) compares two averages, or means, and indicates whether they differ from each other. The t-Test also suggests how significant the differences are; in other words, it lets you know if those differences could have happened by chance. There are three Bayesian t-Test methods available in JASP:
1. Bayesian One Sample t-Test: Compares the mean score of a sample to a known mean score. The known mean is typically referred to as the population mean.
2. Bayesian Independent Samples t-Test: Compares the mean scores of two groups on a given variable.
3. Bayesian Paired Samples t-Test: Compares the means of two variables. This is commonly used for groups with pre- and post-test measures.
Bayesian One Sample t-Test using JASP The following examples will use the Sample Student Data (SSD) dataset; see Resources for the Codebook and data values. The question we are asking of this data is whether the students' science scores differ from a "hypothetical" national average science score. Using the t-Test tab, select Bayesian One Sample t-Test. With the Bayesian One Sample t-Test we will investigate the differences between one group's performance on a measure and some known average on the SAME measure.
JASP Bayesian One Sample t-Test Menu
When using the One Sample t-Test dialogue window, select the test variable. Be sure to move the measure to be compared into the test variable window. Move the variable by either highlighting the variable and clicking on the arrow next to the variable window or by dragging the variable into the window. In this example we are testing the science assessment compared to the “National average” scores for the science assessment. This comparison is a hypothetical example.
One Sample t-Test window
We will also enter the known average for this measure. The known average would come from a source outside of your own data collection, such as a norm referenced test or some national assessment measure that has a published average for the entire test group. In this example we are using a score of “50” to represent the known average for this assessment.
One Sample t-Test with known average of “50”
For this example, we are testing whether the sample population's science assessment score is "different" than the known average. This can be found in the Hypothesis setting with "≠ Test Value". You could also select one of the directional hypotheses if there was a reason to test for directional differences. The Bayes Factor of BF10 is selected to produce the probability or likelihood of the Alternative Hypothesis (H1) compared to the Null Hypothesis (H0). We could have also selected the inverse of this hypothesis, namely the likelihood of the Null Hypothesis (H0) compared to the Alternative Hypothesis (H1). In some cases the Bayes Factor will be a small decimal value, which may be more difficult to interpret. In such cases this can be remedied by selecting the other Bayes Factor argument; the resulting ratio may be more meaningful or clearly interpreted. In the Plots settings we are interested in the Prior and Posterior plots with additional information, as well as the Bayes factor robustness check with additional information.
Prior setting
The default prior width of 0.707 is set by JASP. Unless there is some other evidence to suggest an alternative or updated prior width, the default should be used for the analysis. It should be noted that under the effect size conventions of Cohen (1988), 0.8 corresponds to a large effect for a t-Test.
Interpreting the Results Tables for Bayesian One Sample t-Test The resulting output table will show the Bayes Factor for our data.

Bayesian One Sample T-Test
           BF₁₀     error %
science    2.33    7.16e-5
Note. Test hypothesis is population mean is different from 50

Descriptives
            N     Mean     SD     SE
science    200   51.85    9.90   0.70
Output table for Science score compared to known mean of 50
The Bayesian One Sample T-Test results table indicates that the mean score in our sample population is likely different from the known average: the Alternative Hypothesis that the scores differ is 2.33 times more likely than the Null Hypothesis of no difference in scores, as described by the Bayes Factor. The Bayes Factor (BF10) = 2.33 can be interpreted as "anecdotal evidence" in favor of the alternative hypothesis or, as some authors state, "barely worth mentioning".
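The default Bayes Factor JASP computes here is the JZS Bayes factor of Rouder et al. (2009), which places a Cauchy prior (scale r, JASP's "prior width") on the effect size. Under that assumption, it can be sketched as a one-dimensional numerical integral; feeding in the t = 2.64 and n = 200 from the frequentist science-score example should land near the BF10 = 2.33 that JASP reports:

```python
import math
from scipy import integrate

def jzs_bf10(t, n, r=0.707):
    """One-sample JZS Bayes factor (Rouder et al., 2009) for a
    Cauchy(0, r) prior on effect size, written as a scale mixture
    over g ~ InverseGamma(1/2, r^2/2)."""
    v = n - 1  # degrees of freedom

    def integrand(g):
        # Marginal likelihood of the data under H1 for a given g,
        # weighted by the inverse-gamma density of g.
        return ((1 + n * g) ** -0.5
                * (1 + t**2 / ((1 + n * g) * v)) ** (-(v + 1) / 2)
                * r / math.sqrt(2 * math.pi) * g**-1.5
                * math.exp(-(r**2) / (2 * g)))

    marginal_h1, _ = integrate.quad(integrand, 0, math.inf)
    marginal_h0 = (1 + t**2 / v) ** (-(v + 1) / 2)
    return marginal_h1 / marginal_h0

# t = 2.64, n = 200: the science-score example above.
print(f"BF10 = {jzs_bf10(2.64, 200):.2f}")
```

Varying `r` in this function reproduces, in miniature, what the robustness check plot below does across a range of prior widths.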
When we examine the plots that were selected, the Prior and Posterior plot as well as the Bayes Factor robustness check, we can get a sense of the strength of the evidence.
Prior and Posterior
The Prior and Posterior plot gives a visual representation of how the posterior distribution changes with respect to the prior, given the evidence. Here we see the Bayes Factor for both the Alternative Hypothesis compared to the Null (BF10 = 2.33) and the Null Hypothesis compared to the Alternative (BF01 = 0.43). We are also shown the median of the posterior effect size distribution, here given as median = 0.184; the prior is centered on an effect size of zero. The 95% credible interval is also given for the posterior distribution [0.014 to 0.323]. The circle on the posterior graph being lower than the circle on the prior graph indicates that the evidence is in favor of the alternative hypothesis. Since the circles are so close together on the graph, this also suggests that the evidence is weak.
Bayes Robustness check
The Bayes robustness check gives the researcher a sense of how the Bayes factor would change given different prior distributions. In the case of our example above, the highest BF was achieved with a very small Cauchy prior width of less than 0.25. Our prior width of 0.707 had similar BFs to the wide and ultrawide Cauchy prior widths, all falling within the "anecdotal" range.
Bayesian Independent Samples t-Test using JASP The following examples will use the Sample Student Data (SSD) dataset; see Resources for the Codebook and data values. The question we are asking of this data is whether there are differences in student math scores based on whether or not the student attended college after high school graduation. Using the t-Test tab, select Bayesian Independent Samples t-Test from the pull-down menu. With Independent Samples we will investigate the differences between TWO groups on the SAME measure.
JASP Independent Samples t-Test Menu
When using the Bayesian Independent Samples t-Test select the test variable for the dependent variable window and the groups or factor for the grouping variable window. Be sure that the grouping variable only contains two grouping factors. In this example we will be looking for differences in the math assessment based on after high school college attendance.
Independent t-Test Sample Window
If you select a grouping variable that contains more than two groups, the JASP results window will give an error message.
JASP Grouping Variable error message
We will also use the menu below the variable window to select the settings needed for this analysis.
Bayesian Independent t-Test Settings Window
The settings for the Bayesian Independent Samples t-Test are very similar to the previous example. Our hypothesis states that the group means are not equal (Group 1 ≠ Group 2). We are interested in the Bayes Factor BF10 and the plots to determine the strength of our outcomes.
Prior setting
The default prior width of 0.707 is used.
Interpreting the Results Tables for Bayesian Independent Samples t-Test The results tables below were produced using the Sample Student Data (SSD) found in the Resources section. These tables represent the Bayesian Independent Samples t-Test for differences in math assessment scores between students who attended college after high school graduation and those who did not.

Bayesian Independent Samples T-Test
          BF₁₀       error %
math    4.67e+30    2.25e-36

Group Descriptives
       Group                  N     Mean     SD     SE
math   No college after HS    67    43.07   3.93   0.48
       College after HS      133    57.47   7.39   0.64
Bayesian Independent Samples Results
The Bayes Factor (BF10) indicates that it is much more likely that the math scores will be different for those students who later attend college when compared to those who did not attend college. This result would be classified as “extreme” evidence in favor of the alternative hypothesis.
Prior and Posterior Plots
The Prior and Posterior plot gives a visual representation of how the posterior distribution changes with respect to the prior, given the evidence. Here we see the Bayes Factor for both the Alternative Hypothesis compared to the Null (BF10 = 4.67 × 10³⁰) and the Null Hypothesis compared to the Alternative (BF01 = 2.14 × 10⁻³¹). We are also shown the median of the posterior effect size distribution, here given as median = -2.203; the prior was centered on an effect size of zero. The 95% credible interval is also given for the posterior distribution [-2.566 to -1.851]. The circle on the posterior graph being lower than the circle on the prior graph indicates that the evidence is in favor of the alternative hypothesis. Since the circles are far apart on the graph, this also suggests that the evidence is strong, or in this case "extreme" evidence.
Bayes factor Robustness Check graph
The Bayes robustness check gives the researcher a sense of how the Bayes factor would change given different prior distributions. In the case of our example above, all of the Cauchy prior widths produced BF results indicating "extreme" evidence for the Alternative Hypothesis. Our prior width of 0.707 had very similar BFs to the wide, ultrawide, and maximum Cauchy prior widths, all falling within the "extreme" range.
Bayesian Paired Samples t-Test using JASP The following examples will use the TestScores2 dataset; see Resources for the Codebook and data values. The question we are asking of this data is whether there are differences in student achievement from the pre-test assessment to the post-test assessment. Using the t-Test tab, select Bayesian Paired Samples t-Test. With Paired (Dependent) Samples we will investigate the differences for ONE group on TWO different measures, such as two assessments or a pre-/post-test.
JASP t-Test Menu
When using the Bayesian Paired Samples t-Test, in the dialogue box you will need to select the TWO variables. Use the arrow to move each variable into the "Test Pair(s)" window or drag the desired variables into the window. In this example we will be looking at a Before_TI pre-test and the After_TI post-test.
Bayesian Paired Samples t-Test Window
Here we are interested in possible changes to a student's test score after a testing intervention. The window above pairs the Before Testing Intervention score with the After Testing Intervention score. The test settings and requirements are similar to those of the other t-Tests, such as the Independent Samples t-Test and the One Sample t-Test.
Bayesian Paired Samples t-Test Settings
Here we have the Hypothesis setting, along with settings for the Bayes Factor, Descriptives, and the Plots.
Prior setting
The Cauchy prior width of 0.707 is selected as the default setting.
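To make this prior concrete, here is a minimal sketch (standard-library Python only; the function name is our own) of the Cauchy density that JASP places over the effect size:

```python
import math

def cauchy_prior(delta, width=0.707):
    """Density of a Cauchy prior on effect size delta (JASP's default width)."""
    return width / (math.pi * (width ** 2 + delta ** 2))

# The prior is tallest at delta = 0 (no effect) and has heavy tails,
# so large effect sizes are not ruled out in advance:
print(round(cauchy_prior(0.0), 3))  # 0.45
print(round(cauchy_prior(2.0), 3))  # 0.05
```

Widening the prior (e.g., the "wide" setting of 1.0) spreads this density out, placing relatively more weight on larger effect sizes.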
Interpreting the Results Tables for the Bayesian Paired Samples t-Test

The results tables below were produced using the TestScores2 dataset found in the Resources section. These tables represent the Bayesian Paired Samples t-Test for differences in assessment scores before and after a testing intervention.

Bayesian Paired Samples T-Test

  Before_TI - After_TI:  BF₁₀ = 10.23, error % = 5.66e-4

Descriptives

  Before_TI:  N = 20, Mean = 18.40, SD = 3.15, SE = 0.70
  After_TI:   N = 20, Mean = 20.45, SD = 4.06, SE = 0.91

Bayesian Paired Samples Results
The Bayes Factor (BF10) indicates that the alternative hypothesis is over 10 times more likely (BF10 = 10.23) than the null hypothesis. This result would be classified as "strong" evidence in favor of the alternative hypothesis that the test scores changed after the testing intervention.
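The relationship between BF10 and BF01 is simple arithmetic. This short sketch (our own illustration, not JASP output) shows how one is the reciprocal of the other, and how a BF converts to a posterior probability when both hypotheses start out equally likely:

```python
bf10 = 10.23            # evidence for H1 over H0, from the table above
bf01 = 1 / bf10         # evidence for H0 over H1
# Posterior probability of H1, assuming equal prior odds for H1 and H0:
posterior_h1 = bf10 / (1 + bf10)

print(round(bf01, 3))          # 0.098
print(round(posterior_h1, 2))  # 0.91
```

In words: under equal prior odds, a BF10 of 10.23 means roughly a 91% posterior probability for the alternative hypothesis.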
Prior and Posterior Plots
The Prior and Posterior plot gives a visual representation of how the posterior distribution changes with respect to the prior, given the evidence. Here we see the Bayes Factor for both the Alternative Hypothesis compared to the Null (BF10 = 10.23) and the Null Hypothesis compared to the Alternative (BF01 = 0.098). We are also shown the posterior median for the effect size, here given as median = −0.878. The 95% credible interval for the posterior distribution is also given [−1.566 to −0.274]. The circle on the posterior curve sitting lower than the circle on the prior curve indicates that the evidence favors the alternative hypothesis. Since the circles are far apart on the graph, this also suggests strong evidence in favor of the alternative hypothesis.
Bayes factor Robustness Check graph
The Bayes Factor robustness check gives the researcher a sense of how the Bayes factor would change under different prior distributions. In our example above, the highest BF was achieved with a small Cauchy prior width of about 0.5. Our prior width of 0.707 produced BFs similar to those of the wide and ultrawide Cauchy prior widths, all falling near the border between "moderate" and "strong" evidence.
Chapter 23 Concluding Thoughts
Hopefully you have found this guide to be a useful tour through statistical analysis. There are many more methods of analysis available to us through statistics. It is up to the researcher to understand the appropriate use of these methods and to select the tools needed to answer their questions.

With descriptive statistics we are able to get a sense of the make-up and characteristics of our data. We gain information about the population demographics as well as any variation within the sample. Graphic representations, such as pie charts, histograms, and bar charts, allow us to visually inspect the data.

In our tests for differences within data we explored the use of Chi Square analysis when dealing with categorical data, such as population descriptors. The t-Test uncovers differences in continuous data, such as test scores, between two groups. If we need to explore differences in continuous data between more than two groups, ANOVA can provide this information.

We are also able to determine associations between two data sets. Correlations, along with scatterplots, can point out both positive and negative associations, and the strength of an association can also be measured. Once we know that an association exists,
regression analysis can provide mathematical models so that we can make predictions about one variable when another is known.

The true power of statistical analysis comes when we use it to test models and uncover subtle differences within our data. Thoughtful consideration of your statistical results can lead to rich questions about the world in which we live.
Section V: Resources
Analysis Memos

The purpose of writing an analysis memo is to keep all of your analyses organized and to have written documentation of every analysis you do. This way you will know what paths you went down, which ones led to interesting places, and you will have writing ready to include in your dissertation or paper when needed.

I. Question. Expressed in terms that can be answered with our data (e.g., Are there gender differences in responses to questions X, Y, Z?).

II. Method (e.g., summation of frequency counts, correlation, ...). Include a rationale for using this type of analysis if necessary.

III. Results. Include tables and/or charts with results that could be input into formal writing if needed.

IV. Discussion. Thoughts or reactions to the results. Can be formal or informal writing.
Sample Analysis Memo

I. Question: Are there SES differences in school types?

II. Method: A chi-square test of group differences was conducted on SES, with three categories, by school type, with two categories.

III. Results: The chi-square test of group differences was significant (χ²(2) = 6.33, p = .04), indicating that there are statistically significant group differences in SES by type of school.

Type of School by SES

  School Type, N (% within School Type)
  Public:   Low SES 45 (26.8%),  Middle SES 76 (45.2%),  High SES 47 (28.0%)
  Private:  Low SES 2 (6.3%),    Middle SES 19 (59.4%),  High SES 11 (34.4%)
IV. Discussion: Although the majority of students from all three SES groups attend public school, fewer low-income students attend private school than any other SES group, while more middle-income students attend private school than any other SES group. This is an interesting finding, as one might assume that students from high-SES backgrounds would be more likely to attend private schools. I wonder if this could be due to higher-SES families living in school districts with better public schools, while middle-income families may not have access to the best public schools but do have the financial means to send their children to private schools.
Effect Size Tables

Effect Size Magnitude Table

  Effect Size Calculation   Statistics Test                        Small   Medium   Large
  Phi or Cramer's Phi       Chi Squared                            0.1     0.3      0.5
  Cohen's d                 t-Test (Paired & Independent)          0.2     0.5      0.8
  Eta Squared               ANOVA                                  0.01    0.06     0.14
  r                         Correlation                            0.1     0.3      0.5
  r²                        Correlation and t-Test (Independent)   0.01    0.09     0.25

Values from Cohen (1988), Statistical Power Analysis for the Behavioral Sciences.

Effect Size Calculation Equations

Phi (ϕ), for Chi Squared 2 × 2:

  ϕ = √(χ² / N)

  N is the total number of observations.

Cramer's Phi (ϕc), for Chi Squared greater than 2 × 2:

  ϕc = √(χ² / (N(k − 1)))

  N is the total number of observations and k is the lesser of rows or columns.

Cohen's d, for t-Test (Paired):

  d = (Mean₁ − Mean₂) / standard deviation (SD)

Cohen's d, for t-Test (Independent):

  d = (Mean₁ − Mean₂) / SDpooled, where SDpooled = √(((SDgroup1)² + (SDgroup2)²) / 2)

Eta Squared (η²), for ANOVA:

  η² = SSbetween groups / SStotal

r, for Correlation and t-Test (Independent):

  r = √(t² / (t² + df))

  Correlation output tables will show the r-value.

r², for Correlation and t-Test (Independent):

  r² = t² / (t² + df)

  Correlation output tables will show the r² value.
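As a quick way to apply the magnitude table above, a small helper (our own sketch, using Cohen's 1988 cutoffs for d) might look like:

```python
def d_magnitude(d):
    """Label a Cohen's d using the magnitude table above (Cohen, 1988)."""
    d = abs(d)  # sign only indicates direction, not size
    if d >= 0.8:
        return "large"
    if d >= 0.5:
        return "medium"
    if d >= 0.2:
        return "small"
    return "below small"

print(d_magnitude(0.53))   # medium
print(d_magnitude(-0.85))  # large
```

The same pattern works for the other rows of the table by swapping in the appropriate cutoffs.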
Calculating Effect Size for Chi Square

The Chi Square analysis has two main calculations for effect size, Phi (ϕ) or Cramer's Phi (ϕc). For crosstabs tables that are 2 × 2 we use Phi. A crosstabs table described as 2 × 2 has exactly 2 rows and 2 columns. With crosstabs tables that are greater than 2 × 2 we use Cramer's Phi; this means the output table has 3 or more rows, 3 or more columns, or both. In the current version of JASP both Phi and Cramer's Phi are produced by the Crosstabs command when you select "Phi" in the statistics selection.

Crosstabs Table Equal to 2 × 2: Phi (ϕ)

  ϕ = √(χ² / N)

In the Phi formula, χ² is the Chi Square value produced by JASP and N is the total number of observations in the sample.

Crosstabs Table Greater than 2 × 2: Cramer's Phi (ϕc)

  ϕc = √(χ² / (N(k − 1)))

In the Cramer's Phi formula, χ² is the Chi Square value produced by JASP, N is the total number of observations in the sample, and k is the lesser of the number of rows or the number of columns.
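The two formulas above are straightforward to compute by hand; a minimal sketch (standard-library Python, our own function names) applied to the chi-square result from the sample analysis memo:

```python
import math

def phi(chi2, n):
    """Phi for a 2 x 2 crosstabs table."""
    return math.sqrt(chi2 / n)

def cramers_phi(chi2, n, rows, cols):
    """Cramer's Phi for tables larger than 2 x 2 (k = lesser of rows/cols)."""
    k = min(rows, cols)
    return math.sqrt(chi2 / (n * (k - 1)))

# The sample analysis memo's test: chi2(2) = 6.33, N = 200, on a 2 x 3 table
print(round(cramers_phi(6.33, 200, rows=2, cols=3), 3))  # 0.178
print(round(phi(20.0, 80), 2))  # 0.5, an illustrative 2 x 2 value
```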
Calculating Effect Size for t-Test (Paired Samples)

The Paired Samples t-Test effect size is calculated using Cohen's d.

Cohen's d (d)

  d = mean difference / standard deviation (SD)

or

  d = (Mean₁ − Mean₂) / standard deviation (SD)
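Applying the paired formula to the Before_TI/After_TI descriptives earlier in the chapter (the SD of the paired differences below is a hypothetical value chosen for illustration; JASP computes it from the raw pairs):

```python
mean_before, mean_after = 18.40, 20.45  # from the Descriptives table
sd_of_differences = 2.85                # hypothetical; derived from the raw pairs in practice

d = (mean_after - mean_before) / sd_of_differences
print(round(d, 2))  # 0.72
```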
Calculating Effect Size for t-Test (Independent Samples)

The Independent Samples t-Test effect size can be calculated with either Cohen's d (in some cases referred to as Hedges' g) or the r² value. Since we have standard deviations for two separate (independent) data sets, we must use a "pooled standard deviation" value in the equation.

Cohen's d (d)

  d = mean difference / SDpooled

or

  d = (Mean₁ − Mean₂) / SDpooled

The equation to calculate the pooled standard deviation (SDpooled) uses the standard deviation from each group:

  SDpooled = √(((SDgroup1)² + (SDgroup2)²) / 2)

The means and standard deviation values are taken from the t-Test (Independent Samples) output table. Another option for calculating the effect size of a t-Test (Independent Samples) is the r² calculation.

r² value

  r² = t² / (t² + df)
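A brief sketch of the independent-samples calculations (standard-library Python; function names are our own, and the input values are illustrative):

```python
import math

def pooled_sd(sd1, sd2):
    """Pooled standard deviation for two independent groups."""
    return math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)

def cohens_d_independent(mean1, mean2, sd1, sd2):
    """Cohen's d for an independent-samples t-Test."""
    return (mean1 - mean2) / pooled_sd(sd1, sd2)

def r_squared_from_t(t, df):
    """r-squared effect size from a t-value and its degrees of freedom."""
    return t ** 2 / (t ** 2 + df)

print(round(pooled_sd(3.0, 4.0), 3))          # 3.536
print(round(r_squared_from_t(3.73, 198), 3))  # 0.066
```

Note that the r² value from t(198) = 3.73 lands in the "small-to-medium" range of the magnitude table.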
In the r² formula, t is the t-value from the JASP output table and df is the degrees of freedom from the JASP output table.

Calculating Effect Size for One-Way ANOVA

The effect size for a one-way ANOVA test can be calculated with eta squared (η²).

Eta Squared (η²)

  η² = Sum of Squares(between groups) / Sum of Squares(total)

or

  η² = SSbetween groups / SStotal

The Sum of Squares values are taken from the JASP ANOVA output table.

Calculating Effect Size for Correlations

The effect size for correlations can be found with either r or r². Both of these values are produced in the JASP Correlation output table.
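The eta squared calculation is a single division of the two Sum of Squares values from the ANOVA output table; a one-line sketch with hypothetical numbers:

```python
def eta_squared(ss_between, ss_total):
    """Eta squared from the ANOVA output table's Sum of Squares values."""
    return ss_between / ss_total

# Hypothetical sums of squares, for illustration only:
print(round(eta_squared(150.0, 2400.0), 3))  # 0.062, a medium effect by the table above
```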
Reporting Frequentist Statistics

Chi Square, Crosstabs, Contingency Tables
  Data: Only categorical data, i.e., nominal and ordinal.
  Question: Are there differences in school type enrollment based on a student's SES group?
  Reporting: The difference in percentage of program enrollment based on SES was statistically significant, χ²(1, N = 200) = 15.5, p = .02, with a large effect (φ = .50).
  Template: χ²(df, N = sample size) = Chi² value, p = actual p-value unless < .001, with effect size (φ = Phi or Cramer's Phi value).

t-Test; One Sample
  Data: Only continuous data, i.e., interval measures or test scores.
  Question: Are my students' writing scores different than the district-wide average of 68?
  Reporting: Student writing scores were significantly different than the district average of 68 with a medium effect size, t(199) = 7.12, p < .001 (d = 0.5).
  Template: t(df) = t-value, p < .001 (d = Cohen's d value).

t-Test; Paired Samples
  Data: Only continuous data, i.e., interval measures or test scores.
  Question: Did my students perform better on their post-test compared to their pre-test?
  Reporting: Students scored significantly better on the post-test compared to the pre-test with a large effect size, t(19) = 3.23, p < .001 (d = 0.72).
  Template: t(df) = t-value, p < .001 (d = Cohen's d value).
t-Test; Independent Samples
  Data: Continuous dependent variable with a categorical factor with only 2 groups.
  Question: Are student reading scores different based on their gender?
  Reporting: There was a significant effect for gender on reading scores with a medium effect size, t(198) = 3.73, p < .001 (d = 0.53), with women receiving higher scores than men.
  Template: t(df) = t-value, p < .001 (d = Cohen's d value).

ANOVA; One-Way
  Data: Continuous dependent variable with a categorical factor with 3 or more groups.
  Question: Are student writing scores different based on their program type enrollment?
  Reporting: There was a significant difference in writing scores for the main effect of program type with a medium effect size, F(2, 197) = 4.97, p = .01 (η² = 0.05).
  Template: F(factor df, residual df) = F-value, p = p-value (η² = eta squared value).

ANOVA; Two-Way (Factorial)
  Data: Continuous dependent variable with two categorical factors with 3 or more groups.
  Question: Are student math scores different from an interaction effect of SES status and race?
  Reporting: There was a significant difference in math scores for the main effect of race with a medium effect size, F(3, 188) = 4.97, p < .001 (η² = 0.07). There was not a significant difference for the interaction of race and SES, F(6, 188) = 0.48, p = .08.
ANCOVA
  Data: Continuous dependent variable with two categorical factors with 3 or more groups, and a categorical or continuous covariate.
  Question: Are student social studies scores different from an interaction effect of SES status and program type enrollment when controlling for race?
  Reporting: There was a significant difference in social studies scores for the main effect of SES with a medium effect size, F(2, 190) = 6.39, p < .001 (η² = 0.05), and for program type with a large effect size, F(2, 190) = 22.06, p < .001 (η² = 0.17), controlling for race. There was not a significant difference for the interaction of SES and program type, F(4, 190) = 4.45, p = .12.

Correlation
  Data: Continuous data, interval or discrete.
  Question: Is there a correlation between the students' reading scores and math scores?
  Reporting: The students' reading and math scores showed a strong positive correlation, r(198) = 0.66, p < .001.
  Template: r(N − 2) = Pearson's r-value, p < .001.
Reporting Bayesian Statistics

Bayesian statistics does not yet have well-standardized reporting conventions, although these techniques are finding their way into more and more journals. In some ways reporting a Bayesian result is simpler than reporting other statistics, since the Bayesian methods all use the Bayes Factor (BF) with the same interpretation of the results. One could report Bayes results similarly to the examples below:

• The Bayes factor (BF01 = 11.5) suggested strong evidence that the data were 11.5 times more likely under the null hypothesis than under the alternative hypothesis.

• The Bayes factor (BF01 = 12.2) suggested strong evidence that the data were 12 times more likely in favor of the null hypothesis.

Or even:

• Statistical analyses were conducted using the free software JASP with default priors (JASP Team, JASP Version 0.8.6 [Computer software]). We report Bayes factors expressing the likelihood of the data given H1 relative to H0, assuming that H0 and H1 are equally likely a priori. The resulting BF10 = 7.5 suggests moderate evidence for the alternative hypothesis over the null hypothesis.

There are several key components that should be included in the write-up of an empirical paper implementing Bayesian estimation methods. These include:

• The statistical program used for the analysis
• A discussion of the priors
• Thorough detail and justification of all prior distributions, even if default priors were used
• Bayes Factor robustness check results
• Sequential analysis check results
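When drafting such a write-up, the evidence labels used throughout this guide can be applied mechanically. A hedged sketch (our own helper; the cutoffs follow the Jeffreys-style categories presented in Chapter 3):

```python
def bf10_evidence_label(bf10):
    """Map a BF10 to the evidence category used in this guide."""
    if bf10 < 1:
        return "evidence favors the null (report BF01 = 1/BF10)"
    for cutoff, label in [(100, "extreme"), (30, "very strong"),
                          (10, "strong"), (3, "moderate"), (1, "anecdotal")]:
        if bf10 > cutoff:
            return label + " evidence for the alternative"
    return "no evidence either way"

print(bf10_evidence_label(10.23))  # strong evidence for the alternative
print(bf10_evidence_label(7.5))    # moderate evidence for the alternative
```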
References

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: L. Erlbaum Associates.

Eagle, E., & Carroll, C. D. (1988). High school and beyond national longitudinal study: Postsecondary enrollment, persistence, and attainment for 1972, 1980, and 1982 high school graduates. Washington, DC: National Center for Education Statistics, U.S. Dept. of Education, Office of Educational Research and Improvement.

Fisher, R. A. (1935). The design of experiments. Edinburgh: Oliver and Boyd.

Howell, D. C. (1982). Statistical methods for psychology. Boston, MA: Duxbury Press.

Jeffreys, H. (1961). Theory of probability (3rd ed.). Oxford, UK: Oxford University Press.

Kline, R. B. (2004). Beyond significance testing: Reforming data analysis methods in behavioral research (1st ed.). Washington, DC: American Psychological Association.

Lee, M. D., & Wagenmakers, E.-J. (2013). Bayesian modeling for cognitive science: A practical course. Cambridge: Cambridge University Press.

National Governors Association Center for Best Practices, Council of Chief State School Officers. (2010). Common Core State Standards. Washington, DC: Author.

Raftery, A. E. (1995). Bayesian model selection in social research. In P. V. Marsden (Ed.), Sociological methodology 1995 (pp. 111–196). Cambridge, MA: Blackwell.

Wasserman, L. (2004). All of statistics: A concise course in statistical inference. New York: Springer.

Wetzels, R., Matzke, D., Lee, M. D., Rouder, J. N., Iverson, G. J., & Wagenmakers, E.-J. (2011). Statistical evidence in experimental psychology: An empirical comparison using 855 t tests. Perspectives on Psychological Science, 6(3), 291–298. https://doi.org/10.1177/1745691611406923
Index

A
Analysis of Covariance (ANCOVA), 100
ANOVA, 81, 166
Assumption Checks, 94

B
Bayes Factor, 18, 23
Bayes' Theorem, 17
binomial logistic regression, 117
Bivariate Correlation, 109, 186

C
categorical, 7
categorical control variable, 102
Categorical data, 7, 28
Cauchy distribution, 21
Center on Education Policy, 4
Chi Square, 60, 142
Codebook, 27
coefficients, 116
Contingency tables, 60, 142
Continuous, 30
Continuous data, 7, 28, 82, 101, 109, 167, 180, 186
correlation, 109, 186
covariate, 100
Cronbach's Alpha, 126, 128

D
dependent, 114
dependent variable, 81, 82, 101, 166, 167, 180
Descriptive statistics, 46
discrete data, 7

E
eigenvalue, 132
Equality of Variance, 87

F
Factor Analysis, 131
frequencies, 46

H
Harold Wenglinsky, 4
Homogeneity, 26
hypergeometric, 145

I
independent factor, 81, 100, 166, 179
independent multinomial, 145
Independent samples t-Test, 70, 74, 152, 157
independent variable, 82, 167
inter-rater reliability, 130

J
joint multinomial, 145

K
Kendall's Tau, 186
kurtosis, 55

L
leptokurtic, 55
Levene's statistic, 87

M
Mark Twain, 7
mean, 46
median, 46

N
Negative correlation, 111, 188
nominal, 8, 29, 30
Normal distributions, 22
Normality, 26
numerical values, 28

O
One-Way ANOVA, 81, 100, 166
ordinal, 8, 29, 30

P
Pearson R-value, 115
Pearson's Correlation, 186
Pearson's Rho, 187
platykurtic, 55
Poisson, 145
Polynomial Contrast, 90
positive correlation, 111, 188
Post Hoc, 88, 96
Posterior plot, 155
Prior Distribution, 20, 22
P-value, 111

Q
Q-Q plot of residuals, 87, 106
Qualitative, 6
quantitative, 6
quantitative methods, 6
quantitative research, 6

R
R Squared, 115
rational numbers, 7
regression analysis, 113
Reliability, 126
Residual plots, 120

S
scaled data, 7
scree plot, 134
scree test, 133
Sensitivity, 124
skewness, 55
Specificity, 124
spreadsheet, 32
SPSS, 31
Standard Deviation, 55
standard deviations, 46
statistical methods, 9

T
t-distribution, 21
Thomas Bayes, 17
Two-Way (Factorial) ANOVA, 92, 171

U
uniform distribution, 22
ABOUT THE AUTHOR Christopher P. Halter, Ed.D., is a faculty member at the University of California San Diego's Department of Education Studies. He teaches courses in mathematics education, secondary mathematics methods, research methodology, emerging technologies, and statistical analysis. His research includes teacher development, new teacher assessment, digital storytelling, and video analysis. He also teaches online courses in creating online collaborative communities, middle school science strategies, and blended & synchronous learning design.