The visual display of quantitative information: The use of graphs in research and manuscripts
David L. Schriger, MD, MPH
Richelle J. Cooper, MD, MSHS
For a copy of these slides and a bibliography, e-mail:
[email protected]
Put “graphing lecture” on the subject line. Any comments about the lecture would be appreciated. Mr Mikulich will anonymously forward these to us (yeah right). Much of this material can be found in the January 2001 issue of Annals of Emergency Medicine.
Goals of the session • Importance and advantages of graphical data display • Master use of basic features of graphs • Learn advanced techniques • Gain ability to critique graphs
Importance of graphs: • Exploratory data analysis • Presentation of data – Values of data elements – Relationship of data elements
Importance of graphs: • Exploratory data analysis – Picture worth 978 words – How the investigator learns about the data – Seeing is believing – If we only had n dimensions – Make multiple slices – Advanced computer methods - “Fantastic voyage” – Can’t cover in this hour
-1.79 -1.7 -1.68 -1.57 -1.55 -1.44 -1.36 -1.34 -1.29 -1.27 -1.23 -1.23 -1.18 -1.12 -1.09
-.97 -.95 -.89 -.87 -.85 -.85 -.8 -.8 -.78 -.76 -.75 -.75 -.73 -.66 -.66
-.64 -.63 -.63 -.59 -.59 -.57 -.55 -.55 -.38 -.35 -.33 -.33 -.32 -.31 -.29
-.23 -.22 -.22 -.2 -.19 -.12 -.1 -.06 -.04 .01 .05 .1 .14 .14 .15
.16 .2 .21 .24 .27 .28 .3 .31 .32 .33 .37 .46 .48 .52 .55
.57 .57 .59 .64 .71 .82 .87 .87 .9 .96 1 1.08 1.13 1.13 1.14
1.25 1.25 1.26 1.27 1.53 1.62 1.85 1.9 1.95 1.98
. summ z

Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
       z |     100      -.0692   .9138869      -1.79       1.98

. summ z,d

                              z
-------------------------------------------------------------
      Percentiles      Smallest
 1%       -1.745          -1.79
 5%       -1.495           -1.7
10%        -1.25          -1.68       Obs                 100
25%        -.755          -1.57       Sum of Wgt.         100

50%        -.155                      Mean             -.0692
                        Largest       Std. Dev.      .9138869
75%          .56           1.85
90%        1.195            1.9       Variance       .8351893
95%        1.575           1.95       Skewness       .2684074
99%        1.965           1.98       Kurtosis       2.395736
[Figure: plot of z with the median (-.155) marked; axis from -1.787513 to 1.983905]
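The summary figures above can be reproduced from the listed values. The following is a minimal Python sketch, assuming the 100 values printed on the slide are the complete data for z and that the reported standard deviation uses the usual n - 1 denominator; the variable names are illustrative only.

```python
import statistics

# The 100 values of z as listed on the slide (already in ascending order).
z = [
    -1.79, -1.7, -1.68, -1.57, -1.55, -1.44, -1.36, -1.34, -1.29, -1.27,
    -1.23, -1.23, -1.18, -1.12, -1.09, -.97, -.95, -.89, -.87, -.85,
    -.85, -.8, -.8, -.78, -.76, -.75, -.75, -.73, -.66, -.66,
    -.64, -.63, -.63, -.59, -.59, -.57, -.55, -.55, -.38, -.35,
    -.33, -.33, -.32, -.31, -.29, -.23, -.22, -.22, -.2, -.19,
    -.12, -.1, -.06, -.04, .01, .05, .1, .14, .14, .15,
    .16, .2, .21, .24, .27, .28, .3, .31, .32, .33,
    .37, .46, .48, .52, .55, .57, .57, .59, .64, .71,
    .82, .87, .87, .9, .96, 1, 1.08, 1.13, 1.13, 1.14,
    1.25, 1.25, 1.26, 1.27, 1.53, 1.62, 1.85, 1.9, 1.95, 1.98,
]

mean = statistics.mean(z)
sd = statistics.stdev(z)          # sample SD (n - 1 denominator)
median = statistics.median(z)     # average of the 50th and 51st values

print(f"n = {len(z)}, mean = {mean:.4f}, sd = {sd:.7f}, median = {median:.3f}")
print(f"min = {min(z)}, max = {max(z)}")
```

Even with every number reproduced, the summary says little about the shape of the distribution, which is the point the following slides make.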
Stem-and-leaf plot for x
plot in units of .01

 -1** | 79,70,68
 -1** | 57,55,44
 -1** | 36,34,29,27,23,23
 -1** | 18,12,09
 -0** | 97,95,89,87,85,85,80,80
 -0** | 78,76,75,75,73,66,66,64,63,63
 -0** | 59,59,57,55,55
 -0** | 38,35,33,33,32,31,29,23,22,22,20
 -0** | 19,12,10,06,04
  0** | 01,05,10,14,14,15,16
  0** | 20,21,24,27,28,30,31,32,33,37
  0** | 46,48,52,55,57,57,59
  0** | 64,71
  0** | 82,87,87,90,96
  1** | 00,08,13,13,14
  1** | 25,25,26,27
  1** | 53
  1** | 62
  1** | 85,90,95,98
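A stem-and-leaf plot is simple enough to build by hand or in a few lines of code. The sketch below is a simplified Python version that handles only non-negative integers (stem = tens, leaf = units), not the signed two-decimal layout Stata produced above; the function name and the scores are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return stem-and-leaf rows for non-negative integers (stem = tens, leaf = units)."""
    rows = defaultdict(list)
    for v in sorted(values):
        rows[v // 10].append(v % 10)
    lines = []
    # Walk every stem between the smallest and largest so gaps stay visible.
    for stem in range(min(rows), max(rows) + 1):
        leaves = "".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    return lines


# Hypothetical VAS scores (mm), for illustration only.
scores = [71, 89, 64, 92, 77, 85, 89, 58, 93, 80, 77, 91]
for line in stem_and_leaf(scores):
    print(line)
```

Every original value can be read back from the display, so the plot works as a histogram without discarding the data.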
[Figure: the stem-and-leaf plot rotated 90 degrees, shown with a histogram of z (y-axis: Fraction, 0 to .1; x-axis: z, -1.79 to 1.98)]
Importance of graphs:
•Why not just use summary statistics?
Summary statistics for selected variables

Variable |     Obs        Mean   Std. Dev.
---------+--------------------------------
       z |     100      -.0692   .9138869
       a |     100      -.0692   .9136105
       u |     100      -.0692   .9131416
[Figure: histograms of Z, A, and U (y-axis: Fraction)]
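To see how variables with matching summary statistics can still look nothing alike, here is a minimal Python sketch. It does not use the slide's actual a and u data (which are not listed); instead it rescales three hypothetical shapes so that each has the same mean and standard deviation as z.

```python
import statistics


def rescale(values, target_mean, target_sd):
    """Linearly rescale values to the given mean and sample SD (shape is unchanged)."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [target_mean + target_sd * (v - m) / s for v in values]


n = 100
# Three hypothetical shapes, chosen only for illustration.
shapes = {
    "uniform": [i / (n - 1) for i in range(n)],        # evenly spread
    "bimodal": [0.0] * (n // 2) + [1.0] * (n // 2),    # two clumps, nothing between
    "skewed": [(i / (n - 1)) ** 4 for i in range(n)],  # long right tail
}

samples = {name: rescale(vals, -0.0692, 0.9139) for name, vals in shapes.items()}

for name, vals in samples.items():
    print(
        f"{name:8s} mean={statistics.mean(vals):.4f} "
        f"sd={statistics.stdev(vals):.4f} "
        f"median={statistics.median(vals):.3f} max={max(vals):.2f}"
    )
```

The mean and SD columns agree across all three rows, while the median and maximum do not; a histogram of each sample would make the difference obvious at a glance.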
Importance of graphs:
•Why not just use analytic statistics?
. regress o n

Number of obs =     100
F(  1,    98) =  411.93
Prob > F      =  0.0000
R-squared     =  0.8078
Adj R-squared =  0.8059
Root MSE      =  .40336

------------------------------------------------------------------------------
       o |      Coef.   Std. Err.        t    P>|t|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
       n |   2.818721   .1388798    20.296    0.000      2.543118    3.094323
   _cons |   -1.41951   .0816387   -17.388    0.000     -1.581519     -1.2575
------------------------------------------------------------------------------
[Figure: scatterplot of o (-2.10937 to 2.46622) against n (.005723 to .984407)]
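A regression table can look convincing even when a straight line is the wrong description of the data. The Python sketch below uses hypothetical data (a pure step function, not the o and n values from the slide) and an ordinary least-squares fit written out by hand; the function name is illustrative.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical data: y jumps from -1 to +1 at x = 0.5 and is flat elsewhere.
x = [(i + 0.5) / 100 for i in range(100)]
y = [-1.0 if xi < 0.5 else 1.0 for xi in x]

slope, intercept, r_squared = fit_line(x, y)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, R-squared = {r_squared:.3f}")
```

The fit reports a steep slope and an R-squared of about 0.75, yet y does not change with x anywhere except at the single jump. Only a plot of the points reveals that.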
Principles of graphing: • Graph the data, not statistics • Exploit the dimensionality of the graphic format • Depict the unit of analysis • Stratify on confounders • Show the trees and the forest
[Figure: bar graph of mean VAS (mm) by treatment group (A = 77, B = 82), y-axis truncated to 75 to 82; p < .05]
[Figure: the same bar graph with the y-axis running from 0 to 100; N = 200 in each group; p < .05]
• The mean VAS in group A was 77 (SD 30, N = 200) and in group B was 82 (SD 7, N = 200) (p < .05).
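The p-value in that sentence can be checked from the summary figures alone. The Python sketch below assumes an unequal-variance (Welch) two-sample comparison with a large-sample normal approximation for the p-value; the slides do not say which test was actually used.

```python
import math


def welch_from_summary(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch t statistic from group summaries, with a normal-approximation p-value."""
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    t = (mean_b - mean_a) / se
    # Two-sided p from the standard normal; reasonable here because N is large.
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, se, p


t, se, p = welch_from_summary(77, 30, 200, 82, 7, 200)
print(f"difference = 5 mm, SE = {se:.2f}, t = {t:.2f}, p = {p:.3f}")
```

Under these assumptions the difference is statistically significant, but the calculation never looks at the shape of either distribution. SDs of 30 and 7 already hint that the two groups are distributed very differently.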
[Figure: back-to-back histograms of VAS score (mm, 0 to 100) by number of subjects. Treatment A: N = 200, mean 77, median 89. Treatment B: N = 200, mean 82, median 83]
[Figure: Group A by treating physician. Histograms of VAS (mm) for a low pain group and a high pain group, with cases plotted as digits; y-axis: Number of cases]
Principles of graphing: • Graph the data, not statistics ✓ • Depict the unit of analysis ✓ • Stratify on confounders ✓ • Show the trees and the forest ✓
[Figure: the preceding VAS graphs repeated side by side]
Importance of showing data: • No assumptions needed • Efficiency • Empowers readers to: – make their own conclusions – determine whether authors’ analyses are appropriate – do their own analyses
Elements of the graph: Title • Title should state what is being shown or compared. • Focus reader on what they are about to see NOT - “Change in Respiratory Function” BUT - “Change in FEV1 by group and baseline FEV1”
Elements of the graph: Legend • Makes the figure self-explanatory. • Defines abbreviations, symbols, and methods – any regression line, p-value, or other symbol based on calculations should be explained
• Defines sample size if not shown in graph
Elements of the graph:
•Axes
Elements of the graph: Axes - scale • Appropriate boundaries: Do not overly compress or expand the data. • Uniformity: Distance along axis must retain consistent interpretation throughout graph.
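One way to judge whether an axis expands the data too much is to compare the ratio of the drawn bar heights with the ratio of the values themselves. The Python sketch below applies this to the mean VAS values of 77 and 82 mm from the treatment-group example, assuming the truncated bar graph starts its axis at 75; the function name is hypothetical.

```python
def bar_height_ratio(value_a, value_b, axis_min):
    """Ratio of drawn bar heights (B relative to A) when the axis starts at axis_min."""
    return (value_b - axis_min) / (value_a - axis_min)


true_ratio = bar_height_ratio(77, 82, axis_min=0)    # axis starts at zero
drawn_ratio = bar_height_ratio(77, 82, axis_min=75)  # axis truncated at 75

print(f"ratio of values:            {true_ratio:.2f}")
print(f"ratio of truncated bars:    {drawn_ratio:.2f}")
print(f"exaggeration of the effect: {drawn_ratio / true_ratio:.1f}x")
```

With the axis starting at 75, bar B is drawn 3.5 times as tall as bar A although its mean is only about 6% larger. A zero-based axis avoids that, at the cost of compressing the difference into a small part of the plot.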
[Figure: mean pain score (VAS, mm) by post-op day, days 1 to 9; y-axis ticks from 57 to 73]
Elements of the graph: Axes – tick marks and labels • Avoid clutter • Align ticks, labels and data points • Consider specific label for first and last point of the data set
Elements of the graph:
•Data points
[Figure: three versions of a scatterplot of b against a. The first two label the axis extremes as .99277, .074068, -.007491, and 3.27872; the third rounds them to .99, .07, -.01, and 3.28]
Elements of the graph: Data points • Is it the pattern or the individual points that you want readers to see? • Consider using jitter or alter graph dimensions to avoid clutter. • Consider symbols to further differentiate strata in the data.
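Jitter means adding a small random offset to each plotted position so that identical values no longer sit exactly on top of each other. A minimal Python sketch follows, assuming uniform noise and a fixed seed so the figure is reproducible; the function name, the width, and the scores are illustrative choices.

```python
import random


def jitter(values, width, seed=0):
    """Add uniform noise in [-width, +width] to each value for plotting only."""
    rng = random.Random(seed)  # fixed seed: the same plot every time
    return [v + rng.uniform(-width, width) for v in values]


# Hypothetical integer scores with many ties.
scores = [3, 3, 3, 4, 4, 5, 5, 5, 5]
plot_positions = jitter(scores, width=0.2)

for original, shifted in zip(scores, plot_positions):
    print(f"{original} -> {shifted:.3f}")
```

Keep the width well below the spacing between distinct values so that readers can still tell which value each point belongs to, and use the jittered positions only for display, never for analysis.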
Elements of the graph: • Chartjunk! • Any ink that does not show or explain data
[Figure: the bar graph of mean VAS (mm) by treatment group (A = 77, B = 82), y-axis 75 to 82, p < .05, shown again]
Elements of the graph: Background and Shading • Efficiency is the key to a good graphic. • Avoid - background shadings, background grid lines, or unnecessary axes - moiré patterns - 3-D effects
Elements of the graph: Other pitfalls to avoid
• Avoid redundancy within the graphic • Check for errors
Elements of the graph: Matching the graph with the text • Avoid redundancy. • Use a graph to provide information beyond what can be conveyed tersely in text. • Be sure text and figures are congruent. • Work with copy editor - get the graphic on the same page as the relevant text.
Choosing the type of graph • Choice depends on the type of data collected, number of observations, and the message you wish to convey. – Numeric details or overall pattern? – Number of subjects or measurements? – Data form a distribution? – Data paired?
Graph type - examples • Univariate simple display – pie chart, bar graph, point graph • Univariate distribution – one-way, histogram, stem-and-leaf plot, box-and-whisker, survival curve • Bivariate display – scatterplot and its variations (ROC, Bland-Altman)
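As an example of a scatterplot variation, a Bland-Altman plot compares two measurement methods by plotting, for each subject, the difference between the two readings against their mean. The Python sketch below computes the plotted points together with the bias and the conventional 95% limits of agreement (bias ± 1.96 SD of the differences); the readings are hypothetical.

```python
import statistics


def bland_altman(method_1, method_2):
    """Return (mean, difference) points, the bias, and 95% limits of agreement."""
    means = [(a + b) / 2 for a, b in zip(method_1, method_2)]
    diffs = [a - b for a, b in zip(method_1, method_2)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    return list(zip(means, diffs)), bias, limits


# Hypothetical paired readings from two methods on the same five subjects.
method_1 = [10, 12, 14, 16, 18]
method_2 = [9, 13, 13, 17, 17]

points, bias, limits = bland_altman(method_1, method_2)
print(f"bias = {bias:.2f}, limits of agreement = {limits[0]:.2f} to {limits[1]:.2f}")
```

The points list is what goes on the plot (x = mean, y = difference), with horizontal lines drawn at the bias and at the two limits.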
Simple Univariate Display
[Figure: pie chart with six categories (4%, 12%, 12%, 10%, 26%, 36%) and the bar graph of mean VAS (mm) by treatment group (N = 200 each, p < .05)]
Univariate distributions
[Figure: the stem-and-leaf plot of z shown earlier, and the back-to-back histograms of VAS score (mm) for Treatment A (N = 200, mean 77, median 89) and Treatment B (N = 200, mean 82, median 83)]
[Figure: PEFR (L/min, 0 to 600), Pre and Post, for Drug S and Drug N]
[Figure 6: PEFR (L/min, 0 to 600), Pre and Post, in four panels. Figure 6a - Drug S; Figure 6b - Drug N, v.1; Figure 6c - Drug N, v.2; Figure 6d - Drug N, v.3]
Figure 7 - Change in PEFR by subject
[Figure: PEFR (L/min, 100 to 600) by subject, Not on Steroids versus On Steroids; N = 360, N = 180]
[Figure: change in PEFR (L/min, -110 to 470) against initial PEFR (L/min, 110 to 510), not on steroids versus on steroids]
Special Features • Allow extra detail or strata to be portrayed. • Convey complex relationships simply • Increase information content while maintaining visual clarity
Examples of special features • Illustration of pairing • Symbolic dimensionality • Small multiples • Layering of two graphic types to convey detail and summary measures (e.g., scatterplot with box-and-whisker plots)
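Layering needs two things per group: the raw points for the detail layer and a five-number summary for the box-and-whisker layer. The Python sketch below prepares both, assuming quartiles computed with the "inclusive" method of Python's `statistics.quantiles` (other packages use slightly different rules); the group names echo the later figure, but the satisfaction scores are hypothetical.

```python
import statistics


def box_summary(values):
    """Five-number summary used to draw a box-and-whisker layer."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median, "q3": q3, "max": max(values)}


# Hypothetical patient satisfaction scores (1 to 10) by disposition.
groups = {
    "Discharged": [9, 8, 8, 7, 6, 5, 4],
    "Admitted": [10, 9, 9, 8, 8, 7, 7],
}

# One entry per group: raw points for the detail layer, box for the summary layer.
layers = {
    name: {"points": sorted(values), "box": box_summary(values)}
    for name, values in groups.items()
}

for name, layer in layers.items():
    print(name, layer["box"], f"(n = {len(layer['points'])})")
```

Drawing the points on top of the box lets the reader see both the forest (median and spread) and the trees (each subject) in one graphic.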
[Figure: patient satisfaction (1 to 10) against length of ED stay (0 to 8 hours), in two panels: Linear Regression and Lowess Regression]
[Figure: patient satisfaction against length of ED stay (hours), with points marked as Discharged or Admitted]
[Figure: small multiples of patient satisfaction (1 to 10) against length of stay (0 to 8 hours), one panel per complaint: ankle injury, laceration, wrist/hand fracture, back pain, bronchospasm, vomiting, don't feel well, weakness, headache]