Data Analysis Project: Determining the Relationship between Blood Pressure and Smoking
Mariah Delaire
COH 602 Biostatistics
National University
June 5, 2015
The question addressed in this data analysis is, “Is there a relationship between smoking and blood pressure?” The extraneous variables considered in addressing this question were weight status and gender. It is hypothesized that there is a higher proportion of overweight male smokers than overweight female smokers, causing males to have higher blood pressure.

This project used the data set HEART, which is a subset of variables from the Framingham Heart Study. PROC CONTENTS was run in SAS to identify the variables to be used for the project. For descriptive statistics, frequency distributions were computed with PROC FREQ for all of the categorical variables: blood pressure status, smoking status, weight status and gender. The frequencies showed no obvious outliers skewing the data, so all variables were retained. PROC GCHART was then used to graph blood pressure status, smoking status, weight status and gender so that their distributions could be visualized. The data were then sorted by gender with PROC SORT, after which PROC FREQ was run again to obtain the frequencies by gender and better assess how the distributions differed. PROC GCHART was run once more on the sorted data to visualize the distributions by gender.

Inferential statistics were conducted using the chi-squared procedure to assess the relationships among blood pressure status, smoking status and weight status. The first step was to set aside the extraneous variable, gender, and assess the relationships among these variables in the unstratified data. The chi-squared test was the most appropriate because all variables used in this study were categorical; t tests and ANOVA were not suitable because none of the variables analyzed were continuous.
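The sort-then-tabulate step described above can be illustrated outside SAS. The following is a minimal Python sketch, not the SAS code used in the project: the records are invented for illustration, and the function name `crosstab_by_stratum` is hypothetical. It groups records by gender and counts each combination of blood pressure status and smoking status within each group, which is roughly what running PROC FREQ on gender-sorted data produces.

```python
from collections import Counter, defaultdict

def crosstab_by_stratum(records, stratum, row, col):
    """Count (row, col) combinations separately within each stratum value."""
    tables = defaultdict(Counter)
    for rec in records:
        # Skip records with a missing value, as PROC FREQ does by default.
        if None in (rec[stratum], rec[row], rec[col]):
            continue
        tables[rec[stratum]][(rec[row], rec[col])] += 1
    return dict(tables)

# Invented example records; these are NOT values from the HEART data set.
records = [
    {"Sex": "Female", "BP_Status": "High", "Smoking_Status": "Non-smoker"},
    {"Sex": "Female", "BP_Status": "Normal", "Smoking_Status": "Non-smoker"},
    {"Sex": "Female", "BP_Status": "High", "Smoking_Status": "Non-smoker"},
    {"Sex": "Male", "BP_Status": "High", "Smoking_Status": "Heavy (16-25)"},
    {"Sex": "Male", "BP_Status": "Optimal", "Smoking_Status": None},
]

tables = crosstab_by_stratum(records, "Sex", "BP_Status", "Smoking_Status")
for sex, counts in sorted(tables.items()):
    for (bp, smoking), n in sorted(counts.items()):
        print(sex, bp, smoking, n)
```

Each per-gender table of counts produced this way is the input a chi-squared test of independence would need.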
The chi-squared procedure was first used to assess the relationship between blood pressure status and smoking status. After that procedure was run, the same was done for blood pressure status and weight status. These were tested separately to assess the relationship of each variable to the outcome variable, blood pressure status. The last procedure run without the extraneous variable was a chi-squared test using all three categorical variables: blood pressure status, smoking status and weight status.

Once these relationships had been assessed, the extraneous variable, gender, was added to stratify the data and test the research question. The data were sorted by gender with PROC SORT so that the chi-squared procedure could be run separately for males and females. The chi-squared procedure was then used to assess the relationship between blood pressure status and smoking status within each gender. The same was done for blood pressure status and weight status. The last procedure was a chi-squared test of blood pressure status, smoking status and weight status within each gender.

Hypothesis testing was done to determine whether the results were significant. The null hypothesis is that blood pressure status, smoking status and weight status are independent, whether or not the extraneous variable is taken into account. The alternative hypothesis is that there is a relationship among all of the variables and that they affect blood pressure status. The test statistic was the chi-squared statistic, χ² = Σ(O − E)²/E, where O is the observed count and E is the expected count in each cell. To decide whether to reject the null hypothesis, the critical value for each test was determined from its degrees of freedom and the table of the chi-squared distribution; the values used correspond to a significance level of 0.05. For blood pressure status * smoking status, the critical value was 18.31, so the result is considered significant if χ² ≥ 18.31. For blood pressure status * weight status, the critical value was 12.59, so the result is significant if χ² ≥ 12.59. The critical values for the remaining tests are listed in Table 3.
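The test statistic and decision rule can be made concrete with a small example. This is a minimal Python sketch under stated assumptions, not the SAS procedure used in the project: the 2x2 table of counts is hypothetical, the function name `pearson_chi_square` is made up for illustration, and 3.84 is the standard 5% critical value for one degree of freedom.

```python
def pearson_chi_square(table):
    """Return (statistic, degrees of freedom) for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / N.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    # For a two-way table, df = (rows - 1) * (columns - 1).
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

# Hypothetical 2x2 table of counts (not from the HEART data set).
table = [[10, 20],
         [20, 10]]
chi2, df = pearson_chi_square(table)

# Standard 5% critical value of the chi-squared distribution for df = 1.
CRITICAL_VALUE_DF1 = 3.84
reject_null = chi2 >= CRITICAL_VALUE_DF1
print(f"chi2 = {chi2:.3f}, df = {df}, reject null: {reject_null}")
```

Here every expected count is 15, so each of the four cells contributes 25/15 and the statistic is about 6.667, which exceeds 3.84.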
VARIABLE NAME     VARIABLE LABEL          RESPONSE VALUES
BP_STATUS         Blood Pressure Status   1 = High; 2 = Normal; 3 = Optimal
SEX                                       1 = Female; 2 = Male
SMOKING_STATUS    Smoking Status          1 = Heavy (16-25); 2 = Light (1-5); 3 = Moderate (6-15); 4 = Non-smoker; 5 = Very Heavy (>25)
WEIGHT_STATUS     Weight Status           1 = Normal; 2 = Overweight; 3 = Underweight

Table 1: Variables used in Data Analysis
As seen in Table 1, the variables used for the data analysis were blood pressure status, gender, smoking status, and weight status. Table 2 shows the frequencies of all the categorical variables by gender. As expected, males had a higher percentage of heavy smokers than females, along with a higher percentage of overweight individuals. Heavy smokers made up 11.87 percent of females and 30.51 percent of males, and 66.47 percent of females were overweight compared with 70.39 percent of males. The percentage of females with high blood pressure was 41.28, while the percentage of males was 46.28. These results are consistent with the idea that males have higher blood pressure due to their weight and smoking status.
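The percent and cumulative columns of a one-way frequency table follow directly from the counts. As a minimal Python sketch (not the PROC FREQ code used in the project; the function name `frequency_table` is hypothetical), the female smoking-status counts reported in Table 2 can be turned back into the same four columns:

```python
def frequency_table(counts):
    """Return rows of (level, frequency, percent, cumulative frequency, cumulative percent)."""
    total = sum(counts.values())
    rows = []
    running = 0
    for level, freq in counts.items():
        running += freq
        rows.append((level, freq, round(100 * freq / total, 2),
                     running, round(100 * running / total, 2)))
    return rows

# Female smoking-status counts as reported in Table 2 (missing values excluded).
female_smoking = {
    "Heavy (16-25)": 339,
    "Light (1-5)": 422,
    "Moderate (6-15)": 340,
    "Non-smoker": 1682,
    "Very Heavy (>25)": 73,
}

for row in frequency_table(female_smoking):
    print(row)
```

The percentages are taken over the 2856 non-missing responses, which is why the 17 missing values do not appear in the denominator.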
Table 2: Frequencies for categorical variables between genders

Smoking Status (females)
Smoking_Status      Frequency   Percent   Cumulative Frequency   Cumulative Percent
Heavy (16-25)       339         11.87     339                    11.87
Light (1-5)         422         14.78     761                    26.65
Moderate (6-15)     340         11.90     1101                   38.55
Non-smoker          1682        58.89     2783                   97.44
Very Heavy (>25)    73          2.56      2856                   100.00
Frequency Missing = 17

Weight Status (females)
Weight_Status       Frequency   Percent   Cumulative Frequency   Cumulative Percent
Normal              846         29.49     846                    29.49
Overweight          1907        66.47     2753                   95.96
Underweight         116         4.04      2869                   100.00
Frequency Missing = 4

Blood Pressure Status (females)
BP_Status           Frequency   Percent   Cumulative Frequency   Cumulative Percent
High                1186        41.28     1186                   41.28
Normal              1166        40.58     2352                   81.87
Optimal             521         18.13     2873                   100.00

Smoking Status (males)
Smoking_Status      Frequency   Percent   Cumulative Frequency   Cumulative Percent
Heavy (16-25)       707         30.51     707                    30.51
Light (1-5)         157         6.78      864                    37.29
Moderate (6-15)     236         10.19     1100                   47.48
Non-smoker          819         35.35     1919                   82.82
Very Heavy (>25)    398         17.18     2317                   100.00
Frequency Missing = 19

Weight Status (males)
Weight_Status       Frequency   Percent   Cumulative Frequency   Cumulative Percent
Normal              626         26.82     626                    26.82
Overweight          1643        70.39     2269                   97.22
Underweight         65          2.78      2334                   100.00
Frequency Missing = 2

Blood Pressure Status (males)
BP_Status           Frequency   Percent   Cumulative Frequency   Cumulative Percent
High                1081        46.28     1081                   46.28
Normal              977         41.82     2058                   88.10
Optimal             278         11.90     2336                   100.00
Inferential statistics were used to determine the relationship between blood pressure and smoking, with weight and gender as extraneous variables. Table 3 shows the results of the chi-squared procedure in SAS. The first tests were run without stratifying the data by gender. The relationship between blood pressure status and smoking status was significant because 89.94 > 18.31, with a p-value of <0.0001. The same was true for blood pressure status and weight status, with a test statistic of 386.67 > 12.59 and a p-value of <0.0001. When all three variables were tested together, the result was again significant: the test statistic for blood pressure status, smoking status and weight status was 58.21 > 25.00, with a p-value of <0.0001.

Table 3: Chi-Squared Procedure Results

Test                                      DF   χ² Value   P-Value   Critical Value
BPStat*SmokingStat                        10   89.935     <0.0001   18.31
BPStat*WeightStat                         6    386.667    <0.0001   12.59
BPStat*WeightStat*SmokingStat             15   58.206     <0.0001   25.00
BPStat*SmokingStat (female)               10   102.387    <0.0001   18.31
BPStat*SmokingStat (male)                 10   26.774     0.0028    18.31
BPStat*WeightStat (female)                6    247.180    <0.0001   12.59
BPStat*WeightStat (male)                  6    138.147    <0.0001   12.59
BPStat*WeightStat*SmokingStat (female)    15   65.452     <0.0001   25.00
BPStat*WeightStat*SmokingStat (male)      15   21.774     0.1139    25.00
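The decision rule behind Table 3 can be written out explicitly. The following is a minimal Python sketch under stated assumptions, not output from SAS: the statistics and critical values are copied from Table 3, and the helper `chi2_sf_even_df` is a hypothetical name for the closed-form upper-tail probability of the chi-squared distribution, which holds only for even degrees of freedom. The p-values of the 15-degree-of-freedom tests are therefore not recomputed here.

```python
import math

# (test, df, chi-squared value, critical value) as reported in Table 3.
TABLE_3 = [
    ("BPStat*SmokingStat", 10, 89.935, 18.31),
    ("BPStat*WeightStat", 6, 386.667, 12.59),
    ("BPStat*WeightStat*SmokingStat", 15, 58.206, 25.00),
    ("BPStat*SmokingStat (female)", 10, 102.387, 18.31),
    ("BPStat*SmokingStat (male)", 10, 26.774, 18.31),
    ("BPStat*WeightStat (female)", 6, 247.180, 12.59),
    ("BPStat*WeightStat (male)", 6, 138.147, 12.59),
    ("BPStat*WeightStat*SmokingStat (female)", 15, 65.452, 25.00),
    ("BPStat*WeightStat*SmokingStat (male)", 15, 21.774, 25.00),
]

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable with even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form applies only to positive even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    # Sum (x/2)^i / i! for i = 0 .. df/2 - 1, building each term from the last.
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

# Reject the null hypothesis when the statistic reaches the critical value.
for name, df, chi2, critical in TABLE_3:
    decision = "significant" if chi2 >= critical else "not significant"
    print(f"{name}: {chi2} vs {critical} -> {decision}")

not_significant = [name for name, df, chi2, critical in TABLE_3 if chi2 < critical]

# Cross-check one reported p-value: BPStat*SmokingStat (male), df = 10.
p_male_smoking = chi2_sf_even_df(26.774, 10)
print(f"p-value for BPStat*SmokingStat (male): {p_male_smoking:.4f}")
```

Under this rule only the three-way test for males falls below its critical value, and the recomputed p-value for the male smoking test should land close to the reported 0.0028.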
After the data were sorted by gender, the results did not change substantially. For blood pressure status * smoking status in females, the test statistic was 102.39 > 18.31 with a p-value of <0.0001, indicating a significant relationship between the variables. The same test for males gave a test statistic of 26.77 > 18.31 with a p-value of 0.003, so there was also a significant relationship between blood pressure and smoking in males. The next test, blood pressure status * weight status in females, yielded a test statistic of 247.18 > 12.59 with a p-value of <0.0001, which is significant as well. The same test for males resulted in a test statistic of 138.15 > 12.59 with a p-value of <0.0001, showing a significant relationship between blood pressure and weight in males. The last chi-squared procedure tested blood pressure status, smoking status, and weight status within each gender. For females, the test statistic was 65.45 > 25.00 with a p-value of <0.0001, indicating a significant relationship among these variables. For males, the test statistic was 21.77 < 25.00 with a p-value of 0.114, so the relationship was not significant.

In conclusion, the unstratified data produced more significant results, which agreed more closely with the alternative hypothesis. As seen in Table 3, all of the unstratified tests had test statistics above their critical values and p-values below 0.0001, indicating a significant relationship among blood pressure status, smoking status and weight status. After the data were sorted, the associations were stronger in females than in males: the test statistics for females were much larger, while those for males were much smaller and included one test that was not significant.

As a result, males do not appear to have higher blood pressure due to weight and smoking in comparison to females. The results actually show the reverse: females show a higher level of significance across all of the tests. For the test of blood pressure status, smoking status and weight status in males, we fail to reject the null hypothesis because 21.77 < 25.00 with a p-value of 0.114. We do not have statistically significant evidence that blood pressure is higher in males than in females due to smoking and weight. However, the relationships are significant when the data are not stratified and when each variable is assessed individually against blood pressure status.