A Profile of Arabic Script Languages
Bushra Zawaydeh, Ph.D., Senior Linguist
June 7, 2007
Proprietary Information of Basis Technology Corp.
History of the Arabic Script
Derived from the Nabataean script, which was used in Petra in the 2nd century BC.
History (continued)
The Nabataean script is an offshoot of the Aramaic script.
The Aramaic script developed from the Phoenician script.
The Phoenician script was the model the Greeks used to develop the Greek writing system (around 1000 B.C.), on which English and all other Western alphabets are based.
Development of Phoenician Script
Development of Arabic Script
Arabic inscriptions became widespread after the birth of Islam.
The Quran was revealed to the Prophet Mohammad in the year A.D. 612 (Khan, 2001).
History
Before the revelation of the Quran, Arabic was primarily an oral language.
Arabic is considered a holy language because it is the language of the Quran. Hence, it is the primary prayer language for Muslims.
Arabic spread with the spread of Islam. By the 11th century, Arabic had become the common medium of expression from China to France.
Types of Arabic Calligraphy: Kufic
The earliest manuscripts of the Quran (8th to 10th centuries) were written in the Kufic style of Arabic writing (Campbell, 1997).
Kufic script is angular; this was most likely a result of inscribing on hard surfaces such as wood or stone.
Types of Arabic Calligraphy: Naskhi
From the 11th century onward, the cursive style known as Naskhi developed.
Arabic Abjad
There are different writing systems that languages use, such as:
Alphabet – denotes both consonants and vowels.
Ex: English.
Abjad – denotes consonants.
Ex: Arabic, Hebrew.
Syllabary – characters denote syllables.
Ex: Japanese Hiragana.
Spread of Arabic
The Muslim Arab civilization flourished in the Arabian Peninsula and was embraced by the Turks, Iranians, Afghans, Indians, North Africans, and Spanish Andalusians.
Arabic became the language of art, science, and technology.
Islamic calligraphy became a noble art, valued above every other art form.
Samples of Arabic Calligraphy
Cursive Arabic Calligraphy
Features of the Arabic Script
The Arabic alphabet contains 28 letters.
Arabic is a complex text language because its script is bidirectional: it is written right to left, except that numbers and Latin words are written left to right.
Many letters change their form depending on whether they appear alone or at the beginning, middle, or end of a word.
Letters that change form are always joined, in both handwritten and printed Arabic. Hence, the script is cursive, like English handwriting.
Only 3 long vowels are written.
Diacritics indicate things like short vowels and gemination.
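To make the bidirectionality concrete: Unicode stores Arabic in logical (reading) order, and a renderer uses each character's bidirectional class to decide display direction. The sketch below assumes Python's standard `unicodedata` module; `bidi_classes` is a hypothetical helper that only reports the classes for a mixed string and does not implement the full Unicode Bidirectional Algorithm.

```python
import unicodedata


def bidi_classes(text):
    """Hypothetical helper: (character, bidi class) pairs for every non-space character."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text if not ch.isspace()]


# "salaam", a Latin word, European digits, and Arabic-Indic digits, in logical order
sample = "\u0633\u0644\u0627\u0645 Unicode 2007 \u0662\u0660\u0660\u0667"

for ch, cls in bidi_classes(sample):
    print(f"U+{ord(ch):04X} {cls}")
```

Arabic letters report `AL` and are laid out right to left, Latin letters report `L`, and the digit classes `EN` and `AN` are what keep numbers reading left to right inside Arabic text.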
Arabic Abjad
Arabic Letters in Different Positions
Letters in Different Positions
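In Unicode text each letter is normally stored once, as its nominal code point, and the font or shaping engine chooses the positional form. Separate presentation-form code points for each position also exist for compatibility and often turn up in text extracted from PDFs. A minimal sketch, assuming Python's `unicodedata`; `contextual_forms` is a hypothetical helper that lists the four forms of beh and shows NFKC normalization folding each one back to the base letter.

```python
import unicodedata

BEH = "\u0628"  # ARABIC LETTER BEH, the code point stored in ordinary text


def contextual_forms(first_cp, count=4):
    """Hypothetical helper: (name, NFKC base letter) for consecutive presentation forms."""
    forms = []
    for cp in range(first_cp, first_cp + count):
        ch = chr(cp)
        base = unicodedata.normalize("NFKC", ch)
        forms.append((unicodedata.name(ch), base))
    return forms


# U+FE8F..U+FE92: isolated, final, initial, and medial forms of beh
for name, base in contextual_forms(0xFE8F):
    print(f"{name} -> U+{ord(base):04X}")
```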
Arabic Diacritics
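The diacritics are encoded as separate combining characters that follow the letter they modify, so removing them leaves the abjad spelling (consonants and long vowels). A minimal sketch with Python's `unicodedata`: it lists five common diacritics (the three short vowels, shadda for gemination, and sukun for "no vowel") and strips them from a vowelled word. `strip_diacritics` is a hypothetical helper, and filtering every nonspacing mark (`Mn`) is a blunt assumption that would also remove other combining marks, such as a separately encoded hamza.

```python
import unicodedata

# Fatha, damma, kasra (short vowels), shadda (gemination), sukun (no vowel)
DIACRITICS = ["\u064E", "\u064F", "\u0650", "\u0651", "\u0652"]


def strip_diacritics(text):
    """Hypothetical helper: drop nonspacing combining marks (category Mn)."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


for mark in DIACRITICS:
    print(f"U+{ord(mark):04X} {unicodedata.name(mark)}")

vocalized = "\u0643\u064E\u062A\u064E\u0628\u064E"  # kataba "he wrote", fully vowelled
print(strip_diacritics(vocalized))  # prints كتب, the unvowelled spelling
```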
More Features of the Arabic Script
Lack of capital letters.
Lack of word division (hyphenation) at the end of a line.
Unlike many other alphabetic scripts, it has high phonetic accuracy when diacritics are added.
Arabic Ligatures
Arabic script uses ligatures. A compulsory one is lam followed by aleph: لا
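In Unicode text the compulsory lam-aleph ligature is normally not stored as one character. The text holds lam followed by aleph, and the shaping engine draws the ligature. A legacy compatibility code point for the ligature does exist, and NFKC normalization expands it back into the two letters. A small sketch using Python's `unicodedata`:

```python
import unicodedata

LAM_ALEF = "\u0644\u0627"  # what ordinary text stores: LAM followed by ALEF
LIGATURE = "\uFEFB"  # legacy compatibility code point for the isolated lam-aleph ligature

print(unicodedata.name(LIGATURE))
# NFKC decomposes the ligature back into the two underlying letters
print(unicodedata.normalize("NFKC", LIGATURE) == LAM_ALEF)  # True
```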
Ligatures
Optional/stylistic
Arabic Language
Arabic is a Semitic language.
221 million speakers.
Countries it is spoken in:
Afghanistan, Algeria, Bahrain, Chad, Cyprus, Djibouti, Egypt, Eritrea, Iran, Iraq, Israel, Jordan, Kenya, Kuwait, Lebanon, Libya, Mali, Mauritania, Morocco, Niger, Oman, Palestinian West Bank & Gaza, Qatar, Saudi Arabia, Somalia, Sudan, Syria, Tajikistan, Tanzania, Tunisia, Turkey, UAE, Uzbekistan and Yemen.
Worldwide Use of the Arabic Abjad
•Dark green → Countries where the Arabic script is the only official orthography.
•Light green → Countries where the Arabic script is used alongside other orthographies.
Arabic Abjad Usage in Other Languages
The Arabic abjad is used in a large number of languages other than Arabic.
The abjad spread through the world with the Islamic conquests (7th–8th centuries).
It is the second most widespread script in the world.
Writing Systems of the World Today
Languages Presently Using the Arabic Script
1. Arabic
2. Persian/Dari
3. Urdu
4. Pashto
5. Baluchi
6. Kurdish
7. Lahnda
8. Kashmiri
9. Sindhi
10. Uyghur
11. Berber languages
12. Moplah (dialect of Malayalam)
13. Malagasy
14. Sulu
Languages that Abandoned the Arabic Script
Languages now using Roman script
Indonesian (Malay)
Hausa
Somali
Sudanese
Swahili
Turkish
Caucasian languages now using Cyrillic
Chechen
Kabardian
Lak
Avar
Lezgi
Adoption of the Arabic Script
When the Arabic abjad was adopted, it was augmented to fit the phonologies of the non-Semitic languages.
The alphabet was extended by the different languages. The 28 basic Arabic letters were extended to more than 100 letters (Esfahbod, 2004).
Method of Adoption
All the Arabic letters are borrowed directly to preserve the Arabic orthography.
In Arabic loanwords, the pronunciation depends on the phonology of the borrowing language.
Arabic-specific sounds that are not present in the borrowing language are pronounced as a sound that is present in that language. Ex: the Arabic gutturals and interdentals.
Arabic Gutturals
Sounds produced with a constriction in the back part of the vocal tract (Zawaydeh, 1999).
Emphatics (T, D, S, Z): ط، ض، ص، ظ
Uvulars (q, X): ق، خ
Pharyngeals (H, Eiyn): ح، ع
Laryngeals (glottal stop, h): ء، ه
Rendition of Arabic Gutturals and Interdentals
The Arabic emphatics are not pronounced as uvularized, but rather as plain, non-uvularized sounds.
Persian:
The pharyngeal ع sound is pronounced as a glottal stop. The pharyngeal ح sound is pronounced as [h].
Persian phonetic redundancies:
Persian /z/ is rendered as ز، ذ، ض، ظ
Persian /s/ is rendered as س، ص، ث
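These mergers mean that several letters spell one Persian sound. The sketch below is a minimal Python lookup table covering only the cases on this slide; pairing ع with ء and ح with ه (the native letters they merge with) is an added assumption, and `PERSIAN_SOUND` and `letters_for` are hypothetical names.

```python
# Persian pronunciation of Arabic letters whose distinct Arabic sounds merged in Persian.
# Limited to this slide's cases plus the native letters they merged with (an assumption).
PERSIAN_SOUND = {
    "\u0632": "z",  # ZAIN
    "\u0630": "z",  # THAL (Arabic interdental)
    "\u0636": "z",  # DAD (Arabic emphatic)
    "\u0638": "z",  # ZAH (Arabic emphatic)
    "\u0633": "s",  # SEEN
    "\u0635": "s",  # SAD (Arabic emphatic)
    "\u062B": "s",  # THEH (Arabic interdental)
    "\u0639": "ʔ",  # AIN (Arabic pharyngeal) pronounced as a glottal stop
    "\u0621": "ʔ",  # HAMZA
    "\u062D": "h",  # HAH (Arabic pharyngeal) pronounced as [h]
    "\u0647": "h",  # HEH
}


def letters_for(sound):
    """All letters that spell the given Persian sound, in code point order."""
    return sorted(ch for ch, s in PERSIAN_SOUND.items() if s == sound)


for sound in ("z", "s", "ʔ", "h"):
    print(sound, " ".join(letters_for(sound)))
```

A table like this is one way a spelling-tolerant matcher could treat, for example, ذ and ز as equivalent when comparing Persian words.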
Nastaliq Script
A writing style used, with extra letters, to write:
Farsi
Urdu
Pashto
Kashmiri
Sindhi
Turkish (under the Ottoman Empire, before 1920)
Nastaliq Samples
Persian
Locally called:
Farsi in Iran
Dari in Afghanistan
Tajiki in Central Asia (former Soviet Union countries)
Dialects:
Lari (in Iran)
Hazaragi (in Afghanistan)
Darwazi (in Afghanistan and Tajikistan)
Persian Language Map
Status of Languages in Iran
Main languages:
Persian and its dialects 58%
Azeri and other Turkic languages 26%
Kurdish 9%
Balochi 1%
Arabic 1%
The official language is Persian.
Ethnologue reports 71 languages!
Strategies for Modifying Arabic Script: Persian
Basic Strategy:
Add more dots to certain letters to create new letters.
Persian added 4 more letters.
Persian /p/ is پ, while Arabic /b/ is ب.
Persian /ʧ/ is چ.
Persian /ʒ/ is ژ (while ج is /ʤ/).
Persian /g/ is گ; this letter originally had three dots.
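Each of the four added letters has its own Unicode code point, so their presence is an easy textual cue that a string is Persian (or a language using the Persian alphabet) rather than Arabic. A minimal Python sketch using `unicodedata`; pairing each new letter with the Arabic letter it extends is based on letter shape, and `persian_additions_in` is a hypothetical helper, not a complete language identifier.

```python
import unicodedata

# The four letters Persian added, each paired with the Arabic letter whose shape it extends
PERSIAN_ADDITIONS = {
    "\u067E": "\u0628",  # PEH   /p/  from BEH
    "\u0686": "\u062C",  # TCHEH /ʧ/  from JEEM
    "\u0698": "\u0632",  # JEH   /ʒ/  from ZAIN
    "\u06AF": "\u0643",  # GAF   /g/  from KAF
}


def persian_additions_in(text):
    """Hypothetical helper: which of the four Persian-only letters occur in text."""
    return {ch for ch in text if ch in PERSIAN_ADDITIONS}


for new, base in PERSIAN_ADDITIONS.items():
    print(f"{unicodedata.name(new)} <- {unicodedata.name(base)}")

print(persian_additions_in("\u067E\u062F\u0631"))  # pedar "father" contains PEH
```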
Persian/Dari Alphabet
32 letters. The additional Persian letters are shown in red.
Persian vs. Arabic
Used for Izafet compounds.
Persian Kaf and Ya
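Persian kaf and ya look like their Arabic counterparts but are separate Unicode characters (KEHEH U+06A9 and FARSI YEH U+06CC versus Arabic KAF U+0643 and YEH U+064A), so a word typed on an Arabic keyboard will not match the same word typed on a Persian one. A minimal Python sketch of one possible normalization; `to_persian_forms` is hypothetical, and mapping toward the Persian forms (rather than the reverse) is an arbitrary choice.

```python
# Hypothetical normalization: map the Arabic code points to the Persian ones
ARABIC_TO_PERSIAN = str.maketrans({
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
})


def to_persian_forms(text):
    """Rewrite Arabic kaf and yeh as their Persian counterparts."""
    return text.translate(ARABIC_TO_PERSIAN)


arabic_typed = "\u0643\u062A\u0627\u0628"  # ketab "book", typed with Arabic KAF
print(to_persian_forms(arabic_typed) == "\u06A9\u062A\u0627\u0628")  # True
```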
Other Persian Orthographic Modifications
إ is written as ا
ة is written as ه or ت
Arabic words with hamza may be spelled in various ways; for example, مسؤول is spelled as مسئول.
Damma is pronounced as [o], not [u] as in Arabic.
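Spelling variation of this kind matters for search and name matching: مسؤول and مسئول differ only in the seat of the hamza. The Python sketch below is a hypothetical folding step, assuming a matcher that only needs a comparison key; the `FOLD` table mirrors the variations on this slide and is not a standard or complete mapping. NFKC runs first so that presentation-form characters (common in PDF-extracted text) become ordinary letters before folding.

```python
import unicodedata

# Hypothetical folding table for matching Persian spellings of Arabic loanwords
FOLD = str.maketrans({
    "\u0623": "\u0627",  # ALEF WITH HAMZA ABOVE -> ALEF
    "\u0625": "\u0627",  # ALEF WITH HAMZA BELOW -> ALEF
    "\u0624": "\u0621",  # WAW WITH HAMZA ABOVE -> HAMZA (seat ignored)
    "\u0626": "\u0621",  # YEH WITH HAMZA ABOVE -> HAMZA (seat ignored)
    "\u0629": "\u0647",  # TEH MARBUTA -> HEH
})


def fold_spelling(word):
    """NFKC first (presentation forms -> base letters), then fold spelling variants."""
    return unicodedata.normalize("NFKC", word).translate(FOLD)


arabic_spelling = "\u0645\u0633\u0624\u0648\u0644"   # masʔul, hamza on a waw seat
persian_spelling = "\u0645\u0633\u0626\u0648\u0644"  # same word, hamza on a yeh seat
print(fold_spelling(arabic_spelling) == fold_spelling(persian_spelling))  # True
```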
Languages Extending the Persian Alphabet
Some languages used the Persian alphabet (itself based on Arabic) as a base and added letters that are found in neither Persian nor Arabic.
Examples:
Urdu
Pashto
Sindhi
Status of Languages in Pakistan
Major languages in Pakistan are:
Punjabi, Saraiki, Sindhi, Pashto, Urdu, Balochi, Hindko, and Brahui.
The official language is English. The national language is Urdu.
Language Distribution
Punjabi 44%
Pashto 15%
Sindhi 14%
Saraiki 11%
Urdu 8%
Balochi 4%
Others 4%
Languages in Pakistan
Status of Languages in Pakistan
Urdu and Sindhi have standardized spellings. Speakers of the other languages who need to write their language use either Urdu or Sindhi spelling conventions.
In Pakistan, the classical spelling standard of Pashto is not always followed. There is a tendency to use the Urdu forms of letters instead of the Pashto forms (UCLA Language Materials Project).
Urdu Alphabet
Persian letters are shown in red.
Urdu letters are shown in blue.
Urdu Alphabet
Uses a small emphatic ط above a letter to mark the retroflex sounds “d”, “t”, and “r”.
Uses the shape of the Arabic nun ن without the dot (ں) to indicate nasalized vowels: ماں maː̃ “mother”.
For aspirated consonants, an h letter (ھ) follows the consonant.
Urdu [h] appears in the following forms:
Distinguishes between [i] and [e, ɛ] sounds word-finally: لڑکی laɽki “girl” vs. لڑکے laɽke “boys”.
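Each Urdu-specific letter on this slide is its own Unicode character, which is what lets software tell, for example, the final ی of laɽki from the final ے of laɽke. A minimal Python sketch using `unicodedata`; `final_vowel` is a hypothetical helper that reads only the last letter and ignores every other spelling pattern.

```python
import unicodedata

# Urdu letters discussed on this slide
URDU_LETTERS = [
    "\u0679",  # retroflex t
    "\u0688",  # retroflex d
    "\u0691",  # retroflex r
    "\u06BA",  # dotless nun marking a nasalized vowel
    "\u06BE",  # h written after a consonant to mark aspiration
    "\u06CC",  # word-final [i]
    "\u06D2",  # word-final [e, ɛ]
]

for ch in URDU_LETTERS:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")


def final_vowel(word):
    """Hypothetical helper: guess the word-final vowel from the last letter only."""
    return {"\u06CC": "i", "\u06D2": "e"}.get(word[-1])


print(final_vowel("\u0644\u0691\u06A9\u06CC"))  # laɽki "girl"
print(final_vowel("\u0644\u0691\u06A9\u06D2"))  # laɽke "boys"
```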
Status of Languages in Afghanistan
Official languages are Pashto and Dari (Afghan Persian).
Turkic languages (Uzbek and Turkmen).
Other languages: Baluchi, Pashai, Nuristani, etc.
Pashto
Uses a modified form of the Perso-Arabic script, extended with letters that do not appear in any other script.
Uses the 4 Persian letters and adds 8 more:
4 retroflex consonants /t/, /d/, /r/, /n/, written with “pandak”, “gharwandah”, or “skarraen”: ټ ډ ړ ڼ
2 dental affricates: /dz/ ځ and /ts/ څ
The letters “ge” and “xin”: ږ ښ
[g] is written either in the Persian style (گ) or as ګ.
ا ب پ ت ټ ث ج ځ چ څ ح خ د ډ ذ ر ړ ز ژ ږ س ش ښ ص ض ط ظ ع غ ف ق ک ګ ل م ن ڼ ه ۀ و ؤ ى ئ ي ې ۍ
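Because Pashto's added letters are distinct Unicode characters, their presence is a strong textual cue for separating Pashto from Persian. The Python sketch below collects the letters listed on this slide and reports which ones occur in a string; `pashto_specific` is a hypothetical helper, and treating any single match as evidence for Pashto is an assumption, not a complete language identifier.

```python
import unicodedata

# Pashto letters listed on this slide that Persian does not use
RETROFLEX = "\u067C\u0689\u0693\u06BC"  # retroflex /t/, /d/, /r/, /n/
AFFRICATES = "\u0681\u0685"             # /dz/, /ts/
GE_XIN = "\u0696\u069A"                 # "ge" and "xin"
PASHTO_ONLY = set(RETROFLEX + AFFRICATES + GE_XIN + "\u06AB")  # plus the Pashto-style g


def pashto_specific(text):
    """Hypothetical language-ID cue: the Pashto-only letters present in text."""
    return {ch for ch in text if ch in PASHTO_ONLY}


for ch in sorted(PASHTO_ONLY):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

print(pashto_specific("\u067E\u069A\u062A\u0648"))  # the word "Pashto" contains xin
```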
Pashto Zwarakay
Pashto has a 4th vowel diacritic, which looks like a horizontal line.
Pashto diacritics
Arabic Numbers
The decimal numbering system originated in India.
It was adopted by the Arab world.
The Europeans then adopted the Arabic numerals.
Arabic Numbers
The numbers 4, 5, 6, and 7 have various forms in the languages of Iran, Pakistan, and India.
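Unicode reflects this with two digit sets: Arabic-Indic digits (U+0660 to U+0669) and Extended Arabic-Indic digits (U+06F0 to U+06F9), the latter used for Persian and Urdu. A short Python sketch using `unicodedata` that compares the code points for 4 through 7 and shows that both sets parse as ordinary numbers.

```python
import unicodedata

ARABIC_INDIC = "".join(chr(0x0660 + d) for d in range(10))  # Arabic-Indic digits 0-9
EXTENDED = "".join(chr(0x06F0 + d) for d in range(10))      # Extended Arabic-Indic (Persian, Urdu)

# Digits 4-7 are the ones whose glyphs differ between the two sets
for arabic, extended in zip(ARABIC_INDIC[4:8], EXTENDED[4:8]):
    print(f"{unicodedata.name(arabic)} | {unicodedata.name(extended)}")

# int() accepts any Unicode decimal digits, so both sets parse to the same value
print(int("\u0662\u0660\u0660\u0667"), int("\u06F2\u06F0\u06F0\u06F7"))  # 2007 2007
```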
Basis Technology Products Handling Arabic Script
Arabic
Base Linguistics
Arabic Chatroom Reverse Transliterator
Entity Extractor
Name Matching
Name Translation
Arabic Editor
Transliteration Assistant
Digital Forensics
Language Identification
Persian
Base Linguistics
Entity Extractor
Transliteration Assistant
Name Matching
Name Translation
Digital Forensics
Language Identification
Urdu
Base Linguistics
Entity Extractor
Name Matching
Name Translation
Language Identification
Pashto
Transliteration Assistant
Name Matching
Name Translation
Language Identification
References
Afghan Transitional Islamic Administration, Ministry of Communications; United Nations Development Program. Computer Local Requirements for Afghanistan.
Bhurghi, Abdul-Majid. Enabling Pakistani Languages through Unicode. (Written for Microsoft.)
Campbell, George. 1997. Handbook of Scripts and Alphabets. New York: Routledge.
Eid, Mushira, et al. 2006. Encyclopedia of Arabic Language and Linguistics. Volume I.
Ishida, Richard. 2004. Urdu script notes [Draft]. http://people.w3.org/rishida/scripts/urdu/urdu-in-unicode.html
Kew, Jonathan. 2005. Notes on some Unicode Arabic characters: recommendations for usage. Draft 2.
Khan, Gabriel Mandel. 2001. Arabic Script. New York: Abbeville Press.
Milo, Thomas. 2002. Authentic Arabic: A Case Study. 20th International Unicode Conference. Washington, DC.
Salloum, Habeeb. The Odyssey of the Arabic Language and its Script. http://www.alhewar.com/habeeb_salloum_arabic_language.htm
UZT 1.01 & Unicode Mapping for Urdu. Center for Research in Urdu Language Processing. National University of Computer and Emerging Sciences.
Unicode Standard 4.0.
Zawaydeh, Bushra. 1999. The Phonetics and Phonology of Gutturals in Arabic. Ph.D. Dissertation. Indiana University.