UNIVERSITY OF BELGRADE
FACULTY OF PHILOLOGY
Aleksandar Belić
ACOUSTIC ANALYSIS OF SOME AMERICAN ENGLISH VOWELS
Master's thesis
Supervisor: Prof. Dr. Biljana Čubrović
Belgrade, 2012
ACOUSTIC ANALYSIS OF SOME AMERICAN ENGLISH VOWELS
ABSTRACT
The aim of this paper was to measure and compare the vowel formant frequencies of three native speakers of American English. The speakers were recorded while reading from a list of randomly selected words, and the acoustic analysis was carried out afterwards with the help of a computer program, measuring the frequencies of the first three formants. The formant frequencies were measured either at the point where the formants reach a steady state, or in the middle of the articulation if no steady state was visible. Vowel charts were made to illustrate the vowel positions graphically. The data showed that certain vowels exhibit slightly different properties from speaker to speaker. Some of the observed differences were explained as variants within the speakers’ regional dialects, and some as their individual idiolects. The analysis also showed that a certain degree of diphthongization is present with some speakers when the vowel in question precedes certain consonants in word-final position. Nevertheless, most vowels show no significant differences in quality between the speakers.
Key words: acoustic phonetics, vowel formants, American English vowels, formant frequency, acoustic analysis, regional dialects
ACOUSTIC ANALYSIS OF SOME AMERICAN ENGLISH VOWELS
ABSTRACT
The objective of this paper is to measure and compare the frequencies of vowel formants of three native speakers of American English. The speakers were recorded while reading from a list of randomly chosen words, and afterwards the acoustic analysis was conducted with the help of a computer program, measuring the frequencies of the first three formants. The formant frequencies were measured either at a point where formants reach their steady state, or in the middle of the articulation if the steady state was not visible. Vowel charts were made to illustrate the vowel positions graphically. The data showed that certain vowels exhibit slightly different qualities from speaker to speaker. Some of the differences observed were explained as being varieties within the speakers’ regional dialects, and some as their individual idiolects. The analysis also showed that a certain amount of diphthongization is present with certain speakers when the vowel in question precedes certain consonants in word-final position. However, the majority of the vowels showed no significant difference in quality between the speakers.
Key words: acoustic phonetics, vowel formants, American English vowels, formant frequency, acoustic analysis, regional dialects
CONTENTS
1. INTRODUCTION
1.1. METHOD
2. WASHINGTON STATE
2.1. FRONT VOWELS (/i/ as in bleed, /ɪ/ as in tip, /ɛ/ as in bed and /æ/ as in trap)
2.2. BACK VOWELS (/u/ as in goose, /ʊ/ as in took, /ɑ/ as in top, and /ɔ/ as in war)
2.3. CENTRAL VOWELS (/ʌ/ as in run, /ɝ/ as in first and /ə/ as in cannon)
3. GEORGIA
3.1. FRONT VOWELS (/i/ as in bleed, /ɪ/ as in tip, /ɛ/ as in bed and /æ/ as in trap)
3.2. BACK VOWELS (/u/ as in goose, /ʊ/ as in took, /ɑ/ as in top, and /ɔ/ as in war)
3.3. CENTRAL VOWELS (/ʌ/ as in run, /ɝ/ as in first and /ə/ as in cannon)
4. ALABAMA
4.1. FRONT VOWELS (/i/ as in bleed, /ɪ/ as in tip, /ɛ/ as in bed and /æ/ as in trap)
4.2. BACK VOWELS (/u/ as in goose, /ʊ/ as in took, /ɑ/ as in top, and /ɔ/ as in war)
4.3. CENTRAL VOWELS (/ʌ/ as in run, /ɝ/ as in first and /ə/ as in cannon)
5. CONCLUSION
REFERENCES
1. INTRODUCTION
We can arguably say that vowel sounds are the backbone of every language in the world. In fact, there is a strong consensus among linguists that languages without vowels are not only non-existent, but also impossible. This is only logical, because vowels are considered the least marked sounds, and there is therefore no reason why at least some vowels would not be incorporated into the sound system of a particular language. The number of vowels, however, may vary. The most common vowel system has five vowels, but there are languages with three, or even fewer vowels, although they are very rare (O’Grady/Dobrovolsky/Katamba 1997:375).
The number of vowels in English (not including diphthongs), and more importantly, their quality, can vary, depending on the country and the accent spoken in that particular region. American English, for example, can have a wide variety of vowels, some of which can be regarded as being only variants of the same vowel, although certain authors regard them as individual phonemes. The conclusion is that every dialect has a separate vowel system. However, even when trying to describe all the vowels in all of the American dialects, not all authors operate with the same number of vowels. Kenyon lists sixteen vowels (1964:28-29), Wells lists eleven (1999:472)¹, and Thomas lists seventeen (1958:128). On the other hand, Ladefoged and Johnson (2010:90) mention only nine, while Olive, Greenwood and Coleman (1993:20) operate with twelve. These examples show that there is no definite way of determining the exact number of vowels in American English, since different authors have a different understanding of whether certain sounds ought to be classified as one vowel or more. It would be difficult to determine the exact status of each of the sounds without going deeper into the study of the dialectal origins of the differences that caused them.
¹ But in his Longman Pronunciation Dictionary (2000), he lists 12 vowels (diphthongs excluded).
This paper will operate with a system of 11 monophthongs, dutifully recognizing differences that some vowel variations exhibit in certain contexts. For example, Wells (1999) regards the vowel in the word sport as being of a different quality than the vowel in the word short. While this is undeniably true, it makes analysis more complicated, which this paper will try to avoid. Having this in mind, the symbols used in this paper should be understood as symbols for a particular “group” of vowels, where each symbol represents a group of possible vowel variations found in different accents. Thus, the symbol /ɔ/ will represent both vowel variations found in sport and short, respectively.
The traditional view on the analysis of vowel sounds recognizes two distinct methods: articulatory and acoustic. The former is, perhaps, more “anatomical”, for it deals with the actual position and/or movement of the articulatory organs within the vocal tract. The latter, on the other hand, is founded in physics, and is primarily concerned with the acoustic properties of sounds, which may or may not coincide with the articulatory descriptions. The problem with the traditional distinctive feature framework, as Olive, Greenwood and Coleman (1993:28-32) suggest, is its inability to provide precise descriptions when more subtle differences between vowels are in question. For example, in the traditional binary classification, vowels are regarded as being high, low, back, round and tense, where the presence of a particular feature marks the sound as +feature, and its absence as -feature. However, certain vowels are neither high nor low, but somewhere in between, making them difficult to describe using only this system.
Acoustic analysis, on the other hand, provides a more precise method of description, where more subtle changes caused by the movement of the articulators are visible, more easily tracked, measured, and therefore described. With the use of a spectrogram, minute differences in the quality of the sound can be analyzed, and also graphically presented, which is difficult to do using the traditional binary classification. The principal component that needs to be taken into account when analyzing vowels is the frequency of their formants. Formants can be defined as “resonances of the vocal tract that have a specific frequency expressed in hertz (Hz).
In most cases, the first two formants are sufficient to characterize speech sounds, but occasionally the third formant is also useful for description” (Olive/Greenwood/Coleman 1993:80).
Before the analysis itself, a certain geographical identification of speech varieties needs to be made. Since the speakers whose speech will be analyzed in this paper come from different states (Alabama, Georgia, and Washington), some kind of geographical labeling needs to be established. In order to place the speakers into established groups, in this case speech areas, one needs to determine them exactly. The literature on this matter offers a wide variety of solutions, and maps of the USA that portray America’s three major speech areas existed even before WWII. From this simple 3-way division (Eastern, Southern, General American) to the more complex 8-way divisions from the 1950s (Thomas 1958:232) and the 1970s (Wood 1972 as cited in Wells 1999:528), the general understanding that American pronunciation is in no way uniform in all parts of the United States has been evident from the start. The issue becomes even more complicated in modern times, when considerable accent and population shifting have taken place. This has led to further fragmentation of speech areas, which has made precise dialect division more difficult to determine. Although certain general characteristics of local speech that differentiate it from other areas still very much exist, they are not as evident and clear-cut today as they were in the past. The majority of dialectologists, however, would agree to place Alabama and Georgia speech into the Southern area, and Washington into the Western, or more precisely, the Pacific Northwest area.
Fig. 1: Major dialect areas of the United States (Thomas 1958)
Having all this in mind, the intention of this paper is to analyze and compare the vowel articulation of three speakers of American English, measure the frequencies of the first three formants in all vowels, and draw general conclusions on whether these dialects differ in the way their vowels are pronounced, and to what extent. Therefore, this paper will use acoustic analysis with the help of a computer program to describe the vowel articulation of three native speakers of American English. The previously determined “target” values found in textbooks and other sources will serve as a reference point, but only to some degree, for it would be misleading to take these values as “absolute”. It must be noted that “formant values vary across speakers and depend on many variables. Even for a given speaker, formants may change according to phonetic contexts, manner of speaking and rate of speech. In fact, it should be stressed that there are no absolutely rigid descriptions of phonemes” (Olive/Greenwood/Coleman 1993:81).
1.1. METHOD
Three female speakers each recorded a total of 77 words containing 11 vowels of American English in various positions and phonetic contexts. The words were chosen randomly, making sure that at least two phonetic contexts were present. The recording was made in one afternoon using a Shure PG47 microphone and a laptop computer. The analysis was done with the help of the computer program “Praat”. The maximum formant frequency setting was changed when and where necessary; in most cases, the default setting of 4500 Hz was used. The formants were measured in the usual way, at the place where all three formants exhibit a “steady state”, or, in cases when this was not possible, in the middle of the vowel articulation. The speakers are named according to the states they come from; therefore, the terms “the Alabama speaker” or “the Washington speaker” will be used throughout this paper.
It is important to emphasize that not all phonetic contexts have been taken into account, due to restrictions on the length and volume of this paper. Nevertheless, even without analyzing all the possible phonetic contexts, combinations and changes a vowel might manifest when influenced by a neighboring sound, it is still possible to draw a general conclusion on how different (or similar) certain vowels are in terms of their formant frequencies.
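The measurement rule described in this section (take the formant values where all three formants are steady, otherwise in the middle of the vowel) can be sketched in code. The Python fragment below is only an illustration under stated assumptions, not the procedure used in Praat or in this study: the function name `measurement_point`, the three-frame window and the 50 Hz tolerance are hypothetical choices, and the formant track is invented sample data.

```python
from statistics import pstdev


def measurement_point(track, window=3, tolerance_hz=50.0):
    """Return the (time, F1, F2, F3) frame at which a vowel is measured.

    track: list of (time_s, f1_hz, f2_hz, f3_hz) frames across one vowel.
    A window counts as a "steady state" when the standard deviation of
    every formant inside it stays within tolerance_hz (an assumed
    threshold, not a value taken from the paper).
    """
    best = None
    for start in range(len(track) - window + 1):
        frames = track[start:start + window]
        # Largest spread among F1, F2 and F3 inside this window.
        spread = max(pstdev([frame[k] for frame in frames]) for k in (1, 2, 3))
        if spread <= tolerance_hz and (best is None or spread < best[0]):
            best = (spread, frames[window // 2])
    if best is not None:
        return best[1]
    # No steady state visible: fall back to the temporal midpoint.
    return track[len(track) // 2]


# Invented formant track (time in s, F1-F3 in Hz), for illustration only.
track = [
    (0.00, 300, 1800, 2900),
    (0.01, 420, 1950, 2950),
    (0.02, 450, 2180, 3030),
    (0.03, 455, 2190, 3035),
    (0.04, 452, 2185, 3030),
    (0.05, 400, 2000, 2900),
]
print(measurement_point(track))
```

In this sketch the steadiest window is preferred over the first acceptable one, which is one possible reading of "steady state"; a visual inspection of the spectrogram, as done in this paper, may of course lead to a different choice.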
2. WASHINGTON STATE
2.1. FRONT VOWELS (/i/ as in bleed, /ɪ/ as in tip, /ɛ/ as in bed and /æ/ as in trap)
The speaker from Washington is a female in her mid-30s, born in Arlington, WA. She works in education and has a college degree. Since she has never lived outside Washington, the possible influence of other regional dialects on her own is minimal.
The first thing that is immediately noticeable in the spectrographic representation of her articulation is the relative steadiness of the formants for the articulation of /i/, especially of the first formant. The values for the first formant are relatively close to the values suggested by Olive, Greenwood and Coleman (1993:104), with an average of about 300 Hz. The frequency of the first formant seems to have little or no variation throughout the articulation, regardless of the phonetic environment. Even in the instances when F2 and F3 move as a result of co-articulation, F1 retains its approximate value. In the words bleed and fleet, F2 rises significantly from the target value for /l/ and almost merges with the third formant. This is especially visible in the former, where the values of F2 and F3 differ by only 45 Hz. In other examples, neighboring sounds influence the frequency of F2 in the expected manner. After fricative sounds, F2 has a slight rising movement, and the same is visible in instances where the preceding sound is the bilabial stop /b/. Voicing, however, seems to have no influence on the frequency of F1-F3, since the relative values of F1-F3 are the same for /i/ in both deep and peak. In all of the examples, F2 remains significantly high in comparison to the data provided by Olive, Greenwood and Coleman (1993:104). However, data from Hillenbrand et al. (1995:3103) suggest that F2 values are somewhat higher than stated by Olive, Greenwood and Coleman (1993:104), probably because Olive, Greenwood and Coleman never stated the sex of their speakers. Stevens (1998:288) gives data from both male and female speakers, and the relative values of F2 for female speakers greatly resemble the data in this paper.
Although F3 is usually not essential for vowel identification, it was measured nevertheless. Spectrograms show that F3 is the least prominent of the three formants, often hardly even noticeable, and rarely with the steadiness in frequency found in F1. With an average value of around 3,440 Hz, it seems to be consistent with the data by Stevens (1998:288) and Hillenbrand et al. (1995:3103).
Fig. 2: /i/ in bleed
Fig. 3: /i/ in deep
For front vowels, F1 becomes lower when the constriction in the oral cavity increases. /i/ is the most constricted vowel. F1 increases as the tongue position gets lower. In addition, /i/ has the highest F2 and /æ/ has the lowest F2 (Chen/Wang 2012). Consequently, /ɪ/ is expected to have a higher F1 value than /i/, and the data confirm it. The average value of F1 for the vowel /ɪ/ is around 450 Hz, the highest being measured in the word kit (510 Hz), and the lowest in the word rip (380 Hz). There are no significant formant variations in any of the examples. The articulation is short, between 60 and 70 milliseconds. The strongest signal appears to be in the word rip, where the frequency of the first formant resembles the F1 of /r/, and the signal becomes darker as the articulation of the vowel begins.
What is also noticeable from the spectrogram is the rise of F2 and F3 in the word rip. F2 starts at around 1,200 Hz at the beginning of /r/, and immediately starts to rise until reaching its steady state at around 1,870 Hz. In instances when a velar sound precedes the vowel, F2 and F3 are close at the beginning of the articulation, and then start to move away from each other, as can be seen in the word kit. This is the result of a velar pinch, which is characterized by the coming together of F2 and F3 during the articulation of a velar consonant (Olive/Greenwood/Coleman 1993:85). In addition, F2 and F3 exhibit a slight rising movement in instances when /ɪ/ is preceded by a nasal sound. In other phonetic contexts, namely when preceded by the fricative /s/ or the voiceless stop /p/, F2 and F3 seem to have a steady frequency throughout the articulation, with little or no variation. The average value of 2,190 Hz for F2 is consistent with the data from Stevens (1998:288), and the average value for F3 (3,030 Hz) is almost identical with the findings of Hillenbrand et al. (1995:3103).
Fig. 3: /ɪ/ in rip
Fig. 4: /ɪ/ in kit
The vowel /ɛ/ is more back and also lower than /i/ or /ɪ/, as suggested by Ladefoged and Johnson (2010:90). As a result, F1 will be higher, and F2 lower in comparison to /i/ or /ɪ/. All of the examples show the steadiness of F1 during most of the articulation. A slight rising movement is visible in instances when the voiced bilabial /b/ precedes the vowel, and F1 falls if the sound following it is also a voiced stop. In the words red, let and led, F1 retains its frequency throughout the articulation, while F2 and F3 move upwards to reach their target values. The average value for F1 is around 620 Hz. All three formants are usually visible in the spectrogram, F3 being the least prominent. F2 appears to be the least stable one, often having a rising or falling movement because of the phonetic context. Its average frequency is 2,080 Hz. The average duration of the vowel seems to be somewhat longer than for /ɪ/, often being more than 100 ms.
Fig. 5: /ɛ/ in red
Fig. 6: /ɛ/ in bet
The maximum separation (for the front vowels) between F1 and F2 occurs with the highest vowel, and is the smallest with the lowest (Olive/Greenwood/Coleman 1993:102). This is clearly noticeable from the data in this research. While the separation between F1 and F2, i.e. the difference in their frequencies, was around 2,500 Hz for /i/, for /æ/ it was only around 1,200 Hz. Not all possible phonetic contexts were taken into account for /æ/. The focus was the influence of nasal sounds on the vowel in instances when they follow the vowel in question. The preceding sound in these examples is usually a stop, voiced or voiceless; in one instance, the voiceless sound is in an unaccented position to show the influence of aspiration (or the lack of it) on the visibility and movement of the formants. In the word trap, F1 starts to rise immediately after becoming visible in the spectrogram at the onset of /r/, and quickly reaches its steady state at around 950 Hz. The F1 value measured in the middle of the articulation was 970 Hz. In other examples, F1 seems to be rather stable throughout the articulation, with the exception of the word stamp, where F1 seems to be rising at the beginning of the articulation, possibly as a result of the transition from an unaspirated /t/ to /æ/. F3 is hardly even noticeable in trap, and its projected value of 2,850 Hz is, to some extent, disputable. What is also typical of F2 is its fall before nasal sounds, in words such as candle, stamp, or sand.
Fig. 7: the word trap
Fig. 8: /æ/ in stamp
        /i/     /ɪ/     /ɛ/     /æ/
F1      300     454     618     848
F2     2840    2187    2083    2027
F3     3447    3033    2689    3028
Table 1: Average formant frequencies (in Hz) for front vowels (Washington speaker)
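The F1-F2 separation discussed above can be recomputed directly from the averages in Table 1. The short Python sketch below is only an illustration: it assumes that the four columns of the table follow the order of the section heading (bleed, tip, bed, trap), and the names `front_vowels` and `f1_f2_separation` are hypothetical.

```python
# Average formant frequencies (Hz) copied from Table 1 (Washington speaker).
# Assumption: columns are ordered as in the section heading
# (bleed, tip, bed, trap).
front_vowels = {
    "i": (300, 2840, 3447),
    "ɪ": (454, 2187, 3033),
    "ɛ": (618, 2083, 2689),
    "æ": (848, 2027, 3028),
}


def f1_f2_separation(formants):
    """Return F2 - F1 in Hz for one (F1, F2, F3) tuple."""
    f1, f2, _f3 = formants
    return f2 - f1


separations = {v: f1_f2_separation(f) for v, f in front_vowels.items()}
for vowel, separation in separations.items():
    print(f"/{vowel}/: F2 - F1 = {separation} Hz")
```

Under this assumption the separation shrinks steadily from the highest to the lowest front vowel, which is the pattern described by Olive, Greenwood and Coleman.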
2.2. BACK VOWELS (/u/ as in goose, /ʊ/ as in took, /ɑ/ as in top, and /ɔ/ as in war)
The back vowels differ from the front vowels in that F2 is much lower and closer to F1 for the back vowels than for the front (Olive/Greenwood/Coleman 1993:103). This is evident from the spectrograms for this speaker as well. In words like fool and pool, F1 and F2 are especially close to each other, with a difference in frequency of some 400 Hz. In goose, this is not the case, since F2 has a falling movement from a high position after the velar pinch. This speaker pronounces the word new as [nju], with a clear distinction between /j/ and /u/, and not as [nu], which is also a common pronunciation of this word in American English. This kind of pronunciation influences the shape of F2, since /j/ normally has a higher F2 than what is usual for /u/ (Olive/Greenwood/Coleman 1993:118). This results in a downward movement of F2 towards its target value, which, in this case, is around 970 Hz. The sounds /l/ and /ʃ/ create a similar result in clue and shoe, where F2 first rises for /l/ and starts off high for /ʃ/, but then gradually falls. F3 usually retains its initial value throughout the articulation, although some rising movement is noticeable in rude and shoe.
Fig. 9: the word pool
Fig. 9: the word shoe
For /ʊ/, no significant changes to the formants can be seen in most examples. After /w/ in would and woman, F2 rises rapidly, although this rise is more apparent in would. All articulations are short, usually around 50 ms long. The average frequency of F1 is around 400 Hz. F1 in most cases retains its stable position and does not exhibit any significant movements, regardless of the environment. F2 is close to F1, although not as close as with /u/. In addition, no noticeable diphthongization occurs in any of the articulations for this sound.
Fig. 10: /ʊ/ in would
In words such as rot, lot, top or dot, where in RP the sound /ɒ/ is predominantly found, in American English the sound /ɑ/ is more common (Cruttenden 2008:84). /ɑ/ has the highest F1 of the back vowels; for this speaker, the average value measured was 870 Hz. F1 and F2 are close to each other, and mostly hold their frequencies steadily in the phonetic contexts examined in this paper. In rot, F3 exhibits a sharp rise in frequency at the beginning of the articulation, after being very low and close to F1 and F2 through most of the articulation of /r/. In most words, F3 has rather weak energy and is often barely visible in the spectrogram. In addition, no diphthongization was found in the articulation of /ɑ/ for this speaker.
Fig. 11: /ɑ/ in lot
Fig. 12: /ɑ/ in top
The vowel /ɔ/ does not usually appear in American English in contexts without the sound /r/ following it. The whole issue involving /ɔ/ and other sounds that may be pronounced in its place is rather more complex, and it varies from speaker to speaker. For some speakers, there is a difference in vowel quality between the words force and north (Wells 1999:483). For the purposes of this paper, we will consider both words as having the same vowel /ɔ/. Since /r/ is involved in all contexts for /ɔ/, a great deal of rhotic coloring (Olive/Greenwood/Coleman 1993:220) is present in all examples. In fact, all the words show a similar pattern, and what is said for one word can easily apply to other words as well. F1 has an average value of 524 Hz. It is stable during the articulation of the vowel, but it can have a slight rise near the transition towards /r/. F2 usually starts off low and close to the first formant, but then gradually rises, while F3 falls. The duration of the vowel is not long, although it is not short either. In four and score, it is around 150 ms long. F3 is high, often not entirely distinguishable.
Fig. 13: /ɔ/ in four
Fig. 14: /ɔ/ in score
        /u/     /ʊ/     /ɑ/     /ɔ/
F1      341     402     871     524
F2     1040    1194    1251    1037
F3     2646    2731    2727    3021
Table 2: Average formant frequencies (in Hz) for back vowels (Washington speaker)
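The claim that F2 is much lower and closer to F1 for the back vowels than for the front vowels can be checked against the averages in Tables 1 and 2. The Python sketch below is illustrative only: it assumes that the table columns follow the order of the section headings, and the helper name `mean_separation` is hypothetical.

```python
# Average (F1, F2, F3) values in Hz from Tables 1 and 2 (Washington speaker).
# Assumption: columns follow the order of the section headings.
front = {
    "i": (300, 2840, 3447),
    "ɪ": (454, 2187, 3033),
    "ɛ": (618, 2083, 2689),
    "æ": (848, 2027, 3028),
}
back = {
    "u": (341, 1040, 2646),
    "ʊ": (402, 1194, 2731),
    "ɑ": (871, 1251, 2727),
    "ɔ": (524, 1037, 3021),
}


def mean_separation(vowels):
    """Mean F2 - F1 distance (Hz) over a group of vowels."""
    return sum(f2 - f1 for f1, f2, _f3 in vowels.values()) / len(vowels)


front_mean = mean_separation(front)
back_mean = mean_separation(back)
highest_back_f2 = max(f2 for _f1, f2, _f3 in back.values())
lowest_front_f2 = min(f2 for _f1, f2, _f3 in front.values())

print(f"mean F2 - F1, front vowels: {front_mean:.0f} Hz")
print(f"mean F2 - F1, back vowels:  {back_mean:.0f} Hz")
print(f"highest back F2 ({highest_back_f2} Hz) < "
      f"lowest front F2 ({lowest_front_f2} Hz): "
      f"{highest_back_f2 < lowest_front_f2}")
```

For this speaker the two groups do not overlap in F2 at all, so even a simple threshold between the highest back F2 and the lowest front F2 would separate them.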
2.3. CENTRAL VOWELS (/ʌ/ as in run, /ɝ/ as in first and /ə/ as in cannon)
According to Olive, Greenwood, and Coleman, “the most central vowel is /ʌ/, the vowel in bud. This vowel is recognized by having formant values that most resemble the values of a neutral vocal tract; the first three formants are at approximately equal intervals” (Olive/Greenwood/Coleman 1993:103-104). These statements, as can be seen from the table, are not entirely consistent with the data from this measurement. Although it is true that the vowel /ʌ/ is central, it is not “the most central”, since both /ɝ/ and /ə/ appear to be closer to that relative position (around 1,500 Hz for the second formant). Even if we disregard the difference of around 100 Hz, which is admittedly not big, we still cannot claim that /ʌ/, in this case, is “the most central vowel”, since two more vowels occupy the same approximate position. All this, naturally, applies to this speaker only, and may not be true for the other speakers.
The average value of F1 for /ʌ/ is around 820 Hz, which is significantly higher than the data from Olive, Greenwood and Coleman (1993:104) and Yao et al. (2010:87), and somewhat higher than the data from Hillenbrand et al. (1995:3103) and Peterson and Barney (1952:183). This value of F1 suggests a somewhat lower position of /ʌ/, which is in fact almost as low as this speaker’s low vowels. Since all sample words except hut include a final alveolar nasal, all spectrograms have similar-looking patterns. All of the usual formant movements triggered by the preceding sounds are present: F2 and F3 moving away from each other after an initial velar consonant, the rise of F2 and F3 after the liquid /r/, and the obvious nasalization of the vowel characterized by the presence of the nasal formant. There are no indications that any form of diphthongization has taken place.
Fig. 15: /ʌ/ in hut
Fig. 16: /ʌ/ in gun
In /ɝ/, there is a large amount of r-coloring, as can be expected from a rhotic accent of English. For this speaker, the articulation of the vowel is not long, and the pronunciation is systematic, with no apparent diphthongization. The average value of F1 is around 520 Hz, which is within the usual range. This vowel is mid-central, and its approximate position is very close to that of /ə/. F3 is close to F2, sometimes even merging with it, as in church. In fact, /ɝ/ looks and sounds like a reversed variant of , there are no distinct areas within the spectrogram that might be characterized as being pure sound. This is probably why many classifications of American English vowels do not list /ɝ/ as a distinct vowel. However, “there is no acoustically distinct consonant area in the region of , and, therefore, in a strictly concatenative-segmental analysis, we must consider this sound as part of the American English vowel system” (Olive/Greenwood/Coleman 1993:104).
Fig. 17: /ɝ/ in church
Fig. 18: /ɝ/ in first
Sample words containing the /ə/ sound all include contexts in which /ə/ is found in an unstressed position. In such a position, the articulation is very short, usually around 40 ms, with up to 70 ms in Canada, cannon and comma.
Even in such short articulations, the formants are systematic; in around, the formants even start to fall in anticipation of /r/, however short and barely visible this movement may be. The average value of F1 is around 630 Hz, and for F2 it is 1,470 Hz. Since the lowest and the highest second formants measured for this speaker were 1,040 Hz for /u/ and 2,840 Hz for /i/, /ə/ is placed in the mid-central area.
Fig. 19: /ə/ in Canada
Fig. 20: /ə/ in appear
        /ʌ/     /ɝ/     /ə/
F1      819     523     632
F2     1395    1452    1472
F3     2628    1832    2560
Table 3: Average formant frequencies (in Hz) for central vowels (Washington speaker)
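The argument about which vowel is "the most central" rests on how far each average F2 lies from the central position of around 1,500 Hz. The Python sketch below recomputes these distances from Table 3. It is an illustration only: the column order is assumed to follow the section heading (run, first, cannon), the 1,500 Hz reference is the approximate figure used in the text, and the names `central_vowels` and `REFERENCE_F2_HZ` are hypothetical.

```python
# Average (F1, F2, F3) values in Hz from Table 3 (Washington speaker).
# Assumption: columns follow the order of the section heading
# (run, first, cannon).
central_vowels = {
    "ʌ": (819, 1395, 2628),
    "ɝ": (523, 1452, 1832),
    "ə": (632, 1472, 2560),
}

# Approximate central F2 position mentioned in the text.
REFERENCE_F2_HZ = 1500

distance_from_center = {
    vowel: abs(f2 - REFERENCE_F2_HZ)
    for vowel, (_f1, f2, _f3) in central_vowels.items()
}
most_central = min(distance_from_center, key=distance_from_center.get)

for vowel, distance in distance_from_center.items():
    print(f"/{vowel}/: {distance} Hz from {REFERENCE_F2_HZ} Hz")
print(f"closest to the central F2 position: /{most_central}/")
```

This comparison uses F2 only; a fuller measure of centrality would also take F1 into account, which is not attempted here.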
3. GEORGIA
3.1. FRONT VOWELS (/i/ as in bleed, /ɪ/ as in tip, /ɛ/ as in bed and /æ/ as in trap)
The speaker from Georgia is a female in her mid-forties, from Atlanta. What is immediately noticeable is that, in conversation, she does not have the accent typical of someone coming from the South; she says that she has lost it through education and by moving around the USA and abroad. Her regional accent, therefore, might be influenced by other regional accents to an extent that cannot be easily determined.
For /i/, F1 does not have any notable movement, either up or down. With an average frequency of around 320 Hz, it is slightly higher than the data from Olive, Greenwood and Coleman (1993:104), but also slightly lower than the measurements conducted by Hillenbrand et al. (1995:3103) and Yao et al. (2010:87). In bleed and fleet, as in the case of the previous speaker, F2 and F3 exhibit a rising movement after the articulation of /l/. On the other hand, both formants fall before /m/ in seem, and /k/ in peak. The relative duration of the articulation is long, usually longer than 200 ms. There is no indication of diphthongization of this vowel in any of the examples.
Fig. 21: /i/ in fleet
Fig. 22: /i/ in peak
In instances where a nasal sound both precedes and follows /ɪ/, as in the word nymph, F1 first rises to reach its target frequency, and immediately after that falls in anticipation of the nasal sound /m/. In this example, the values for all three formants resemble those of another vowel more than those of a typical /ɪ/. This similarity is even audible. In the word tip, there is a noticeable diphthongization of /ɪ/, where towards the end of the articulation a more centralized sound resembling /ə/ is heard, resulting in the pronunciation [tɪəp]. This allophone is mentioned by Wells (1999:485) as being present mostly in the southern parts of the USA, although he found it only in environments where the following final sound is a voiced consonant. F2 and F3, if not found in front of a nasal sound, usually retain their value throughout the articulation, with small variations, depending on the phonetic context. The average value of F2 is around 1,860 Hz and the frequency of F3 is around 2,940 Hz.
Fig. 23: /ɪ/ in nymph
Fig. 24: /ɪ/ in tip
/ɛ/ has a larger value of F1 and only a slightly lower value of F2 compared with /ɪ/. The sample words chosen for this research include a limited number of phonetic contexts for /ɛ/, with only /t/ and /d/ being in syllable-final position, while the preceding sounds include /p/, /t/, /b/, /f/, /l/, and /r/. The previously mentioned phenomenon of inserting /ə/ after certain vowels does not seem to be present in the articulation of /ɛ/, except perhaps in the word bed. It is possible that this allophone was more frequently present in the pronunciation of this speaker in the past, but because of the influence of other accents is now present only in certain words. The average F1 value of 645 Hz is consistent with the findings of Yao et al. (2010:87). F1 seems to be rather stable in all of the examples, without any noticeable variations in frequency regardless of the phonetic environment. In instances where /l/ precedes /ɛ/, F1 rises in order to attain its target value, and this rise is almost instantaneous. F2 has an expected rise in instances when /r/ precedes the sound, and F3 is clearly visible in all examples.
Fig. 25: /ɛ/ in bed
Fig. 26: /ɛ/ in led
The average value of F1 for is around 703 Hz, which is the largest value of F1 for all front vowels. This makes the lowest of the four on the traditional articulatory vowel chart. Its shape seems to be uniform and its frequency steady throughout the articulation. In the environment where a final d follows the vowel, a slight falling movement of F1 is visible. F2 is stable in the words trap and bad , while in the words with a nasal sound following the vowel F2 usually has a falling movement. In these examples, the influence of nasalization is clearly visible in the presence of a nasal formant, which is characterized by a prominent low frequency F1. (Olive/Greenwood/Coleman 1993:97)
Centralization of /æ/ towards the end of its articulation is also noticeable, and audible upon closer inspection. Centralized forms of band and stamp seem to occur regularly. Another allophonic variation noticed by Wells, involving an assimilatory off-glide (Wells 1999:486), is also present with this speaker in the pronunciation of the word tank. This feature is most certainly an attribute of this speaker's regional phonetic heritage.
Fig. 27: /æ/ in band
Fig. 28: /æ/ in tank
        /i/     /ɪ/     /ɛ/     /æ/
F1      317     525     645     703
F2     2470    1865    1841    2036
F3     2997    2939    2917    2727
Table 4: Average formant frequencies (in Hz) for front vowels (Georgia speaker)
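The relation between F1 and vowel height described for the front vowels can be illustrated with a minimal sketch. It assumes the four columns of Table 4 correspond, in order, to the vowels of the keywords bleed, tip, bed and trap; the dictionary and function names are hypothetical and are not part of the original analysis.

```python
# Average formant values (Hz) copied from Table 4 (Georgia speaker).
# Assumption: the keys are keywords standing in for the four front vowels,
# in the column order of the table.
FRONT_VOWELS = {
    "bleed": {"F1": 317, "F2": 2470, "F3": 2997},
    "tip": {"F1": 525, "F2": 1865, "F3": 2939},
    "bed": {"F1": 645, "F2": 1841, "F3": 2917},
    "trap": {"F1": 703, "F2": 2036, "F3": 2727},
}


def order_by_height(vowels):
    """Return the keywords from the highest vowel to the lowest.

    A lower F1 corresponds to a higher position on the vowel chart,
    so sorting by ascending F1 gives the high-to-low order.
    """
    return sorted(vowels, key=lambda word: vowels[word]["F1"])


height_order = order_by_height(FRONT_VOWELS)
print(height_order)
```

With these averages the trap vowel comes last in the ordering, which matches the statement that it is the lowest of the four front vowels for this speaker.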
3.2. BACK VOWELS (/u/ as in goose, /ʊ/ as in took, /ɑ/ as in top, and /ɔ/ as in war)
F1 and F2, expectedly, are close in the articulation of /u/, as is the case with all other back vowels. F1 is low, similar to that of /i/, only slightly higher. It holds a steady frequency of around 340 Hz on average throughout the articulation, and in all of the sample words. Depending on the environment, F2 can have a larger separation from F1, namely in words in which certain consonants precede the vowel. In goose, F2 starts from a high position as a result of the preceding /g/ and the velar pinch associated with its production. F3 rises in goose and rude, and falls in new. Similar to the Washington speaker, this speaker also pronounces new with a glide, which then has the same effect on the formants as previously stated. When such a sound follows the vowel, it does not seem to have the same effect on F2 as it does when it precedes it. There is no apparent movement of F2 and, in fact, all three formants have a rather steady frequency.
Fig. 29: /u/ in goose
Fig. 30: /u/ in rude
In crooked, F2 and F3 are close, but the vowel itself is very short: the formants are visible for only 40 ms before fading out quickly and completely. F1 and F2 are generally not especially close, although in full they almost merge. There is almost no movement of formants when the following sound is . The influence of /r/ in the word crooked seems to be minimal in the area where the vowel has already started its articulation. The duration of the articulation is generally short, with the exception of /ʊ/ in could, which is over 200 ms long. It cannot be said, however, that this vowel has become long in this context, since the durations reported by Hillenbrand et al. (1995:3103) and Yao et al. (2010:87), to name but a few, are even longer. The average frequency of F1 is around 430 Hz, and that of F2 around 1,250 Hz.
Fig. 31: /ʊ/ in crooked
Fig. 32: /ʊ/ in could
For /ɑ/, F1 is usually around 780 Hz and steady, with no significant movement. There is, however, a small rising movement at the onset of the vowel when it is preceded by /dʒ/, as in the word jot. Here, F1 and F2 start away from each other and then move closer to reach their target values. A similar situation is visible in dot. There is a gradual rise of F2 in situations where a liquid precedes /ɑ/, as is evident from the spectrograms of lot and rot. The average value of F2 is around 1,260 Hz, which is lower than the data from Hillenbrand et al. (1995:3103) and Yao et al. (2010:87), but consistent with the data from Peterson and Barney (1952:183). F3 is weak, but usually steady in its frequency, except after a liquid, when a sharp rising movement is visible. There are no indications of diphthongization in the articulation of this vowel for this speaker.
Fig. 33: /ɑ/ in jot
Fig. 34: /ɑ/ in dot
In /ɔ/, F1 is low for this speaker; in fact, it is the lowest among all three speakers. The average frequency of F1 is only 366 Hz, which is significantly lower than the data from Hillenbrand et al. (1995:3103), Yao et al. (2010:87), and Olive, Greenwood and Coleman (1993:104). The following /r/ always has a similar effect on the formants, usually increasing the value of F2 and decreasing the value of F3. In many sample words, the formants do not seem to be particularly steady, often showing rapid movements up or down. F1 and F2 are close for most of the articulation, usually with a difference of between 400 and 500 Hz. F3 is the least prominent of the three formants, with the least amount of energy.
Fig. 35: the word more
Fig. 36: the word war
        /u/     /ʊ/     /ɑ/     /ɔ/
F1      340     434     781     366
F2     1288    1252    1267     785
F3     2614    2609    2696    2420
Table 5: Average formant frequencies (in Hz) for back vowels (Georgia speaker)
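The closeness of F1 and F2 discussed for the back vowels can be made explicit by subtracting the averages in Table 5. The following is a minimal sketch, assuming the four columns correspond, in order, to the vowels of goose, took, top and war as listed in the section heading; the names used are hypothetical.

```python
# Average F1 and F2 values (Hz) copied from Table 5 (Georgia speaker).
# Assumption: keywords stand in for the four back vowels, in table order.
BACK_VOWELS = {
    "goose": (340, 1288),
    "took": (434, 1252),
    "top": (781, 1267),
    "war": (366, 785),
}


def f1_f2_separation(vowels):
    """Map each keyword to the F2 - F1 difference in Hz."""
    return {word: f2 - f1 for word, (f1, f2) in vowels.items()}


separation = f1_f2_separation(BACK_VOWELS)
for word, gap in separation.items():
    print(f"{word}: {gap} Hz")
```

For the war vowel the averaged difference is 419 Hz, which falls within the 400 to 500 Hz range mentioned in the text. These are differences of averages, so individual tokens may of course deviate from them.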
3.3. CENTRAL VOWELS (/ʌ/ as in run, /ɝ/ as in first and /ə/ as in cannon)
The average value of F1 measured for /ʌ/ is 740 Hz, which places this vowel rather low, almost to the level of . The vowel is in the central position in the vowel chart, with an average F2 value of around 1,400 Hz. The articulations are usually not long, which is normal since /ʌ/ is considered a lax vowel (O'Grady/Dobrovolsky/Katamba 1997:42). Apart from its being slightly lower than usual, there are no other noticeable differences between this vowel and current phonetic descriptions of this sound.
Fig. 37: /ʌ/ in fun
Fig. 38: /ʌ/ in hut
As with the Washington speaker, the Georgia speaker also has a rhotacized /ɝ/, evident from the large amount of r-coloring. Apart from its being slightly more front, there are no other significant differences between the articulations of /ɝ/ for these two speakers. The average value of F1 is around 550 Hz, which is consistent with both Hillenbrand et al. (1995:3103) and Peterson and Barney (1952:183). The second formant's average value is around 1,380 Hz, which is somewhat lower than the data from the previously mentioned sources. Characteristically, F2 and F3 are very close, sometimes barely distinguishable from one another.
Fig. 39: /ɝ/ in curse
Fig. 40: /ɝ/ in journey
The sound /ə/ is short for this speaker: in initial position, in words like approve, above etc., it is around 50 ms long. It appears that this sound is somewhat longer in medial or final position. In the words Canada, cannon, and comma, the articulation is between 70 and 80 ms long, depending on the particular word. The average value of F1 is around 650 Hz, which is a little higher than for , and thus a little lower on the vowel chart. F2 is around 1,360 Hz.
Fig. 41: /ə/ in above
Fig. 42: /ə/ in cannon
        /ʌ/     /ɝ/     /ə/
F1      738     556     651
F2     1397    1379    1362
F3     2569    1828    2745
Table 6: Average formant frequencies (in Hz) for central vowels (Georgia speaker)
4. ALABAMA
4.1. FRONT VOWELS (/i/ as in bleed, /ɪ/ as in tip, /ɛ/ as in bed and /æ/ as in trap)
The speaker from Alabama is a female in her late forties, college educated, who has lived in Mobile, Alabama, her whole life. Her accent is noticeably Southern, with all of its typical features. She admits that in certain parts of the USA speaking in a Southern accent still carries a level of social stigma but, by her own account, she has never tried to change it. This, among other obvious reasons, makes this speaker a relevant representative of the Southern/Alabama accent.
The formants are not always easily visible in /i/. They often show discontinuities and sometimes disappear completely. In those instances it was necessary to rely only on the computer program in measuring them, so certain incorrect measurements are unfortunately possible. Another difficulty stems from the relative closeness of F2 and F3. In words like beat or deep, F2 and F3 are especially close to one another, making them more difficult to differentiate and thus to measure properly. All articulations of this vowel are long, much longer than for the other two speakers. In bleed, for example, the vowel articulation is around 350 ms long. The F1 frequency is low, which is typical of this vowel. The average value of F1 is around 320 Hz, which corresponds to the values from Peterson and Barney (1952:183) and Hillenbrand et al. (1995:3103). F2 exhibits a rising movement in bleed and fleet, but also in beat. F3 usually does not vary much; however, it is sometimes difficult to distinguish. The largest separation among the first three formants is between F1 and F2, with F2 being especially high, but not higher than for the Washington speaker.
Fig. 43: /i/ in beat
Fig. 44: /i/ in deep
In nymph, F1 has a somewhat lower measured frequency, possibly because the computer was unable to measure it properly: the nasal formant is present throughout the articulation, suggesting strong nasalization. Because the nasal formant seems to have a similar but somewhat lower value, it has possibly influenced the measurement of F1 by the computer, which was apparently unable to distinguish between F1 and the nasal formant. The spectrogram itself is also not conclusive, but visual analysis seems to suggest an F1 value of around 525 Hz. In myth and rip, F2 and F3 show a sharp rise in frequency as a result of a nasal sound preceding the vowel in the former, and a liquid in the latter. In almost all words, a mild diphthongization is audible, usually involving a glide towards the sound /ə/. In other sample words, the formants seem to have a more or less steady frequency throughout the articulation, with minor adaptations in order to reach their target values.
Fig. 45: /ɪ/ in nymph
Fig. 46: /ɪ/ in myth
In most cases, /ə/ is heard at the end of the articulation of the sound /ɛ/. This is most evident in words like bed, red or Ted, i.e. before final voiced consonants. F1 is high, the highest of all three speakers, with an average value of around 704 Hz. The formants are not always clearly visible: F1 is not especially prominent in let, nor are F2 and F3 in bet. The duration of the vowel articulation is long, usually between 250 and 300 ms, but sometimes even longer.
Fig. 47: /ɛ/ in bed
Fig. 48: /ɛ/ in let
This speaker exhibits the largest amount of diphthongization of the vowel /æ/ in comparison with the other two speakers. In words like trap, bad or stamp, the vowel sounds diphthongized rather than like the monophthong in the pronunciations provided by Wells in his pronunciation dictionary (Wells 2000:793, 2000:61 and 2000:729, respectively). Kenyon (1964:156) describes [] as a common substandard substitution for /æ/. In most sample words, /æ/ is found preceding a nasal consonant, so F2 and F3 often move up or down in response to the phonetic environment. In stamp, F1 is masked by the presence of the nasal formant, i.e. it is almost indistinguishable in the spectrogram. The computer measured it at around 285 Hz, but this is highly doubtful, for it is probably the frequency of the nasal formant that the computer measured, and not the F1 of /æ/. Visual analysis seems to suggest an F1 of around 900 Hz, but this is open to some debate since the formant is only faintly visible, mostly at the beginning of the articulation. The same problem arose in all other words where a nasal followed /æ/, so the computer measurement could not be taken as reliable. In those cases, visual analysis was the primary means of measurement.
Fig. 49: /æ/ in bad
Fig. 50: /æ/ in stamp
        /i/     /ɪ/     /ɛ/     /æ/
F1      323     500     704     814
F2     2626    2157    2049    2026
F3     3164    2935    2947    2718
Table 7: Average formant frequencies (in Hz) for front vowels (Alabama speaker)
4.2. BACK VOWELS (/u/ as in goose, /ʊ/ as in took, /ɑ/ as in top, and /ɔ/ as in war)
In goose, formants F2 and F3 move rapidly away from each other and quickly attain their target values. F2 is high, and it never comes close to F1, which would normally be expected in back vowels. In fool and pool, a centring off-glide into /ə/ is heard in the second half of the articulation, resulting in [] and [], respectively. This pronunciation is typical for some speakers, especially in Midwestern speech (Wells 1999:487), but is found throughout the country as well (Kenyon 1964:172). In clue and rude, F2 rises after /l/ and /r/, and new is pronounced [], which is why the formants seem to have a much more stable and steady frequency than was visible for the other two speakers. The average value of F1 is around 425 Hz, and that of F2 is around 1,490 Hz. In instances where neighboring sounds do not have a significant influence on F2, F1 and F2 are close. The duration of the vowel is longer than with the other two speakers, often more than 400 ms.
Fig. 51: the word fool
Fig. 52: the word new
In full, the vowel pronounced is closer to than to /ʊ/, the sound that would normally be expected in American English (Wells 2000:311). In could, an off-glide into is evident, resulting in []. In this situation, the vowel is pronounced longer than usual, more than 300 ms in this particular example. F1 and F2 are close, with little or no movement, except in sugar, where F2 starts high and then falls in order to attain its target value. The average frequency of F1 is around 470 Hz, which is normal for a female speaker. F3 usually has weak energy, and sometimes it is barely visible in the spectrogram.
Fig. 53: the word full
Fig. 54: the word could
The average value of F1 in /ɑ/ is 990 Hz. F2 is very close at 1,340 Hz. In lot, F1 and F2 are so close that they merge, making them indistinguishable from one another. F3 is only barely visible in almost all sample words; only in rot can a slightly stronger F3 be seen, with a rising movement as a result of the liquid preceding the vowel. The first two formants have a steady value throughout the articulation, except in those instances where preceding sounds cause a movement, as for example in jot and dot, or in the already mentioned liquid-to-vowel sequences.
Fig. 55: /ɑ/ in dot
Fig. 56: /ɑ/ in jot
In most of the sample words, but most notably in four and score, this speaker produces an off-glide to the area, resulting in [] and [], respectively. In north, F2 and F3 start the articulation at almost the same height and then separate, with F2 moving down and F3 moving up. The average F1 frequency is 430 Hz, which is low, but can be explained by the influence of /r/ on the preceding vowel. F2 is close to F1, usually with a difference of about 400 Hz. The duration of the vowel is predictably long, with only the vowel in boring being shorter than 200 ms.
Fig. 57: [] in four
Fig. 57: [] in score
        /u/     /ʊ/     /ɑ/     /ɔ/
F1      424     470     992     432
F2     1489    1200    1340     848
F3     2276    2374    1897    2127
Table 8: Average formant frequencies (in Hz) for back vowels (Alabama speaker)
4.3. CENTRAL VOWELS (/ʌ/ as in run, /ɝ/ as in first and /ə/ as in cannon)
For this speaker, /ʌ/ is articulated rather long, which seems to be a general feature of this speaker, observed for almost every vowel. The rate of speaking is obviously slower, so every vowel appears to be much longer compared with the other two speakers' pronunciation. This speaker, however, retains the normal long vs. short vowel distinction, with longer articulations than usual for short vowels, and even longer ones for long vowels. The duration of /ʌ/ for this speaker ranges from 150 ms to 250 ms. Surprisingly, in hut, the articulation is around 250 ms long, which is much longer than for the other two speakers, whose articulation in hut is 50 ms (the Washington speaker) and 80 ms (the Georgia speaker). The signal for the first formant is not prominent throughout the articulation. In many cases it almost disappears or becomes faintly visible at best, making it more difficult to measure. The average formant value is around 715 Hz for F1, and 1,415 Hz for F2. This confirms /ʌ/ as a central vowel, being somewhat "more front" than .
Fig. 58: /ʌ/ in hut
Fig. 59: /ʌ/ in sun
This speaker, like the rest, produces a large amount of r-coloring of the vowel when pronouncing words like worse, church etc. The average value of F1 for the vowel /ɝ/ is around 485 Hz, which is consistent with the data from Hillenbrand et al. (1995:3103) and Peterson and Barney (1952:183). F2 is around 1,530 Hz, which places this vowel firmly in the central area of the vowel chart. F2 and F3 are almost merged, and there is no obvious diphthongization of this sound visible in the spectrogram.
Fig. 60: /ɝ/ in worse
Fig. 61: /ɝ/ in church
The sound /ə/ is pronounced short, its length being approximately the same as for the other two speakers, which is not always the case with other short vowels. The average value of F1 is around 500 Hz, which is similar to the value of . However, these two vowels do not occupy the same position within the vowel chart, since the lower F2 in /ə/ places it somewhat behind . Although the final sound in Canada was not part of the original measurement, it is interesting to note that the Georgia and the Alabama speakers pronounce it as what is best transcribed as [ ], while the Washington speaker uses , thus pronouncing [].
Fig. 62: /ə/ in Canada
Fig. 63: /ə/ in appear
        /ʌ/     /ɝ/     /ə/
F1      715     484     497
F2     1415    1533    1284
F3     2286    1806    1862
Table 9: Average formant frequencies (in Hz) for central vowels (Alabama speaker)
5. CONCLUSION
By looking at the data presented so far, it is possible to draw general conclusions about the personal and/or regional characteristics of the speech of the analyzed speakers. Because of limitations in length and volume, we will not look at all the noticeable differences visible for each speaker. Hence, it must be noted that this paper is not a complete analysis and that it does not deal with all the regional and individual characteristics that the regional dialects of Alabama, Georgia and Washington normally exhibit.
Fig. 64: The Alabama speaker vowel chart
One general conclusion can be made by looking at the data: voicing seems to have no influence on the frequencies of F1-F3. Measurements showed similar values in both voiced and voiceless environments. It is also noticeable that F1 is the lowest for high front vowels and the highest for low front vowels. F2 behaves in the opposite manner: its value decreases as the tongue moves lower in the mouth. The separation between F1 and F2 is the largest with high vowels, and decreases towards low positions. As far as individual words are concerned, a difference in the pronunciation of new and Canada was observed. The speakers from Washington and Georgia pronounced the word new as [], while the Alabama speaker pronounced it []. In Canada, the speakers from Alabama and Georgia seem to have an [] sound at the end of the word, pronouncing it [], while the Washington speaker pronounced it [].
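The tendencies stated in this paragraph (F1 rising, F2 falling and the F1-F2 separation shrinking from high to low front vowels) can be checked against the reported averages. The sketch below is only an illustration using the Alabama values from Table 7, with keywords standing in for the four front vowels; the helper names are hypothetical.

```python
# Average F1 and F2 values (Hz) copied from Table 7 (Alabama speaker),
# ordered from the highest front vowel to the lowest.
# Assumption: keywords stand in for the vowels, in table column order.
FRONT_VOWELS = [
    ("bleed", 323, 2626),
    ("tip", 500, 2157),
    ("bed", 704, 2049),
    ("trap", 814, 2026),
]


def strictly_increasing(values):
    """True if every value is larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


def strictly_decreasing(values):
    """True if every value is smaller than the one before it."""
    return all(a > b for a, b in zip(values, values[1:]))


f1 = [row[1] for row in FRONT_VOWELS]
f2 = [row[2] for row in FRONT_VOWELS]
separation = [high - low for low, high in zip(f1, f2)]

f1_rises = strictly_increasing(f1)
f2_falls = strictly_decreasing(f2)
separation_shrinks = strictly_decreasing(separation)
print(f1_rises, f2_falls, separation_shrinks)
```

All three checks hold for the Alabama averages. The same F2 check would not pass on the Georgia values in Table 4, where the F2 in the fourth column (2,036 Hz) is higher than that in the third (1,841 Hz), so the pattern is best read as a general tendency rather than a strict rule.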
One common thing observed for all three speakers is the amount of r-coloring for sounds and , which is usual and normal for most speakers of American English. Further analysis shows that even though the articulation of some short vowels was rather long in the case of the Alabama speaker, the long vs. short relation between the vowels was still preserved. The long vowels simply had an even longer articulation. In addition, the length of the vowel did not affect the behavior of the formants. The articulation of was very short for all three speakers; however, the formants still moved in anticipation of neighboring sounds, like . Upon examination of the vowel charts created from each individual speaker's first and second formant values, it is noticeable that, in the case of certain vowels, the relative position of the vowel differs from speaker to speaker. /i/, for example, is similar, and its position as the highest and also the most front vowel for all three speakers is, therefore, confirmed. On the other hand, some vowels seem to be pronounced at completely different positions. The most obvious difference is the relative position of articulation for the vowel /u/. We can see that, for the Alabama speaker, /u/ is heavily centralized, and not very far from the central vowels. The centralization of /u/ is noticeable for the Georgia speaker as well, although not in such an extreme way. Its short and lax counterpart, the vowel /ʊ/, seems to be at the same position for all three speakers.
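The centralization described above can be made concrete by measuring distances between vowels in the F1/F2 plane. The following is a rough sketch using the Alabama averages from Tables 8 and 9. It assumes that the centralized vowel under discussion is the goose vowel, uses keywords in place of phonetic symbols, and treats a plain Euclidean distance in Hz as an acceptable simplification of position on a vowel chart; none of this is part of the original method.

```python
import math

# Average (F1, F2) values in Hz copied from Tables 8 and 9 (Alabama speaker).
# Assumption: keywords stand in for the vowels, in table column order.
ALABAMA = {
    "goose": (424, 1489),
    "took": (470, 1200),
    "top": (992, 1340),
    "war": (432, 848),
    "run": (715, 1415),
    "first": (484, 1533),
    "cannon": (497, 1284),
}


def chart_distance(a, b, vowels=ALABAMA):
    """Euclidean distance in Hz between two vowels in the F1/F2 plane."""
    (f1_a, f2_a), (f1_b, f2_b) = vowels[a], vowels[b]
    return math.hypot(f1_a - f1_b, f2_a - f2_b)


def nearest_neighbour(target, vowels=ALABAMA):
    """Return the keyword of the vowel closest to `target` on the chart."""
    others = (word for word in vowels if word != target)
    return min(others, key=lambda word: chart_distance(target, word, vowels))


closest = nearest_neighbour("goose")
print(closest, round(chart_distance("goose", closest)))
```

Under these assumptions the nearest neighbour of the goose vowel for the Alabama speaker is a central vowel rather than another back vowel, which is in line with the description of heavy centralization. A perceptual scale such as Bark or mel would weight F1 and F2 differently, so the raw Hz distance should be taken only as a first approximation.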
Fig. 65: The Georgia speaker vowel chart
The most peculiar-looking vowel chart is that of the Alabama speaker. This speaker seems to have a number of vowels grouped together, not very far from each other, all in the central, mid-high area of the vowel chart. What is surprising is the fact that, unlike for the other two speakers, for this speaker seems to be not as central as the vowel , which might suggest that this speaker clearly differentiates between these two vowels.
Fig. 66: The Washington speaker vowel chart
The only vowel that is truly back for the two speakers belonging to the Southern dialect is the vowel /ɔ/. For the Washington speaker, alongside /ɔ/, the vowel also exhibits the same degree of backness. remains the lowest vowel for all three speakers, although the relative position of is somewhat different for the Alabama speaker, being somewhat higher. Front vowels and are close in the vowel chart for the two speakers from the South. This suggests that in instances where vowel is normally found, these speakers show an inclination towards pronouncing a sound similar to instead. As mentioned previously, the Alabama speaker exhibits strong diphthongization of , resulting in [], which is a variant associated with eastern New England and the South (Wells 1999:477). Other differences, usually involving off-gliding into other sounds, are presented in the main discussion for each individual speaker and need not be repeated here. It can be concluded that the vowel systems of these speakers, for the most part, do not differ significantly. The differences that were found may be explained either as individual idiolects or as instances of regional variation.