Abstract
This chapter presents an overview of Turkish morphology followed by the architecture of a state-of-the-art wide coverage morphological analyzer for Turkish implemented using the Xerox Finite State Tools. It covers the morphophonological and morphographemic phenomena in Turkish such as vowel harmony, the morphotactics of words, and issues that one encounters when processing real text with myriads of phenomena: numbers, foreign words with Turkish inflections, unknown words, and multi-word constructs. The chapter presents ample illustrations of phenomena and provides many examples for sometimes ambiguous morphological interpretations.
Notes
- 1.
Literally, “(the thing existing) at the time we caused (something) to become strong.” Obviously this is not a word that one would use every day. Turkish words (excluding non-inflecting high-frequency words such as conjunctions, clitics, etc.) found in typical running text average about 10 letters in length. The average number of bound morphemes in such words is about 2.
- 2.
For phonological representations we employ the SAMPA representation. The Speech Assessment Methods Phonetic Alphabet (SAMPA) is a computer-readable phonetic script using 7-bit printable ASCII characters, based on the International Phonetic Alphabet (IPA) (see http://en.wikipedia.org/wiki/Speech_Assessment_Methods_Phonetic_Alphabet (Accessed Sept. 14, 2017) and www.phon.ucl.ac.uk/home/sampa/ (Accessed Sept. 14, 2017)). The Turkish SAMPA encoding convention can be found at www.phon.ucl.ac.uk/home/sampa/turkish.htm (Accessed Sept. 14, 2017).
- 3.
In this chapter, we use - to denote syllable boundaries and + to denote morpheme boundaries wherever appropriate.
- 4.
For example, Xerox Finite State Tools, available at http://www.fsmbook.com (Accessed Sept. 14, 2017), FOMA, available at http://fomafst.github.io/ (Accessed Sept. 14, 2017), HFST available at http://hfst.sf.net (Accessed Sept. 14, 2017) or OpenFST available at http://www.openfst.org (Accessed Sept. 14, 2017).
- 5.
Note that we also explicitly show the morpheme boundary symbol because, in the implementation, it serves as an explicit context marker that constrains where changes occur.
- 6.
There are also very special forms denoting families of relatives, where the number and possessive morphemes will swap positions to mean something slightly different: e.g., teyze+ler+im “my aunts” vs. teyze+m+ler “the family of my aunt.”
- 7.
An example below when we discuss derivation will show a full deconstruction of a complex verb to highlight these features.
- 8.
Obviously the first two are applicable to a smaller set of (usually) transitive verbs.
- 9.
We present the surface morpheme segmentations highlighting the relevant derivational morpheme with italics.
- 10.
So the next time you are up on a cliff looking down and momentarily lose your balance and then recover, you can describe the experience with the single verb düşeyazdım.
- 11.
Where meaningful, we also give the segmentation of the word form into surface morphemes in italics.
- 12.
Users of such words have the bizarre presumption that readers know how to pronounce those words in English!
- 13.
In every group we first list the morphological features of all the tokens, one on every line and then provide the morphological features of the multiword construct followed by a gloss and a literal meaning.
- 14.
Here we just show the roots of the verb with - denoting the rest of the suffixes for any inflectional and derivational markers.
- 15.
The question and emphasis clitics, which are written as separate tokens, can occasionally intervene between the components of a semi-lexicalized collocation. We omit the details of these.
References
Beesley KR, Karttunen L (2003) Finite state morphology. CSLI Publications, Stanford University, Stanford, CA
Clements GN, Sezer E (1982) Vowel and consonant disharmony in Turkish. In: van der Hulst H, Smith N (eds) The structure of phonological representations. Foris, Dordrecht, pp 213–255
Karttunen L (1993) Finite-state lexicon compiler. Technical report, Xerox PARC, Palo Alto, CA
Karttunen L, Beesley KR (1992) Two-level rule compiler. Technical report, Xerox PARC, Palo Alto, CA
Karttunen L, Chanod JP, Grefenstette G, Schiller A (1996) Regular expressions for language engineering. Nat Lang Eng 2(4):305–328
Kornfilt J (1997) Turkish. Routledge, London
Koskenniemi K (1983) Two-level morphology: a general computational model for word-form recognition and production. PhD thesis, University of Helsinki, Helsinki
Oflazer K (1994) Two-level description of Turkish morphology. Lit Linguist Comput 9(2):137–148
Oflazer K (2003) Lenient morphological analysis. Nat Lang Eng 9:87–99
Oflazer K, Inkelas S (2006) The architecture and the implementation of a finite state pronunciation lexicon for Turkish. Comput Speech Lang 20:80–106
Oflazer K, Çetinoğlu Ö, Say B (2004) Integrating morphology with multi-word expression processing in Turkish. In: Proceedings of the ACL workshop on multiword expressions: integrating processing, Barcelona, pp 64–71
Sproat RW (1992) Morphology and computation. MIT Press, Cambridge, MA
van der Hulst H, van de Weijer J (1991) Topics in Turkish phonology. In: Boeschoten H, Verhoeven L (eds) Turkish linguistics today. Brill, Leiden
Appendix: Turkish Morphological Features
In this appendix we present an overview of the morphological features that the morphological analyzer produces. The general format of an analysis is as given in Sect. 2.4.1: any derivations are indicated by ^DB. The first symbol following a ^DB is the part of speech of the derived form, and the next feature symbol is usually a semantic marker that indicates the semantic nature of the derivation. If the second symbol is +Zero, it indicates an implied covert derivation without any overt morphemes.
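The format just described can be illustrated with a small parser. This is only a sketch, not the analyzer itself: the function name `parse_analysis`, the `Group` record, and the sample analysis strings are assumptions made for illustration, and the code treats the second symbol after every ^DB as the semantic marker even though the text says this holds only "usually."

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

DB = "^DB"  # derivation boundary symbol


@dataclass
class Group:
    pos: str                      # part of speech of the root or derived form
    marker: Optional[str] = None  # symbol right after the POS of a derived form
    covert: bool = False          # True when that symbol is Zero (no overt morpheme)
    features: List[str] = field(default_factory=list)


def parse_analysis(analysis: str) -> Tuple[str, List[Group]]:
    """Split 'root+Feat+...^DB+Pos+Marker+...' into the root and its groups."""
    first, *derived = analysis.split(DB)
    root, *root_feats = first.split("+")
    if not root_feats:
        raise ValueError("analysis has no part-of-speech feature")
    groups = [Group(pos=root_feats[0], features=root_feats[1:])]
    for segment in derived:
        feats = [f for f in segment.split("+") if f]  # drop the empty leading item
        if not feats:
            raise ValueError("empty derivation segment")
        marker = feats[1] if len(feats) > 1 else None
        groups.append(Group(pos=feats[0], marker=marker,
                            covert=(marker == "Zero"), features=feats[2:]))
    return root, groups


# Illustrative (assumed) analysis for gittiğim "[the place] I went"
root, groups = parse_analysis("git+Verb+Pos^DB+Adj+PastPart+P1sg")
print(root, [(g.pos, g.marker, g.features) for g in groups])
```

The final group gives the part of speech of the whole word, which is why the root part of speech alone is not enough when derivations are present.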
1. Major Root Parts of Speech: These mark the part-of-speech category of the root word. This is not necessarily the part of speech of the final word if the word involves one or more derivations.

| Feature | Indicates |
| --- | --- |
| +Noun | Noun |
| +Adj | Adjective/modifier |
| +Adverb | Adverb |
| +Verb | Verb |
| +Pron | Pronoun |
| +Postp | Postposition |
| +Num | Number |
| +Conj | Conjunction |
| +Det | Determiner |
| +Interj | Interjection |
| +Ques | Question clitic |
| +Punc | Punctuation |
| +Dup | Onomatopoeia words |
2. Minor Parts of Speech: These follow one of the part-of-speech category symbols above and denote either a further subdivision that is morphosyntactically relevant or a semantic marker that indicates the nature of the derivation.
(a) After +Noun

| Feature | Indicates | Example |
| --- | --- | --- |
| +Prop | Proper noun | Çağla, Mahkemesi’nde |
(b) After +Pron

| Feature | Indicates | Example |
| --- | --- | --- |
| +Demons | Demonstrative pronoun | bu “this” |
| +Ques | Interrogative pronoun | kim “who” |
| +Reflex | Reflexive pronoun | kendim “myself” |
| +Pers | Personal pronoun | biz “we” |
| +Quant | Quantifying pronoun | hepimiz “all of us” |
(c) After +Num

| Feature | Indicates | Example |
| --- | --- | --- |
| +Card | Cardinal number | iki “two” |
| +Ord | Ordinal number | ikinci “second” |
| +Dist | Distributive number | ikişer “two each” |
(d) After ^DB+Noun

| Feature | Indicates | Example |
| --- | --- | --- |
| +Inf1 | Infinitive | gitmek “to go” |
| +Inf2 | Infinitive | gitme “going”, gitmem “my going” |
| +Inf3 | Infinitive | gidiş “going” |
| +PastPart | Past participle | gittiği “the fact that (he) went” |
| +FutPart | Future participle | gideceği “the fact that he will go” |
| +FeelLike | “the state of feeling like” | gidesim “(the state of) me feeling like going” |
(e) After ^DB+Adj: These are markers that indicate the equivalent of subject, object, or adjunct extracted relative clauses.

| Feature | Indicates | Example |
| --- | --- | --- |
| +PastPart | Past participle | gittiğim [yer] “[the place] I went” |
| +FutPart | Future participle | gideceğim [yer] “[the place] I will be going” |
| +PresPart | Present participle | giden [adam] “[the man] who is going” |
| +NarrPart | Evidential participle | gitmiş [adam] “[the man] who (is rumored) to have gone” |
| +AorPart | Aorist participle | geçer [not] “passing [grade]”, dayanılmaz [sıcak] “unbearable [heat]” |
3. Nominal forms (Nouns, Derived Nouns, Derived Nominals, and Pronouns) get the following inflectional markers. Not all combinations are valid in all cases:
(a) Number/Person Agreement

| Feature | Indicates | Example |
| --- | --- | --- |
| +A1sg | 1st person singular | ben “I” |
| +A2sg | 2nd person singular | sen “you” |
| +A3sg | 3rd person singular | o “he/she/it”, all singular nouns |
| +A1pl | 1st person plural | biz “we” |
| +A2pl | 2nd person plural | siz “you” |
| +A3pl | 3rd person plural | onlar “they”, all plural nouns |
(b) Possessive Agreement

| Feature | Indicates | Example |
| --- | --- | --- |
| +P1sg | 1st person singular possessive | kalemim “my pencil” |
| +P2sg | 2nd person singular possessive | kalemin “your pencil” |
| +P3sg | 3rd person singular possessive | kalemi “his/her/its pencil” |
| +P1pl | 1st person plural possessive | kalemimiz “our pencil” |
| +P2pl | 2nd person plural possessive | kaleminiz “your pencil” |
| +P3pl | 3rd person plural possessive | kalemleri “their pencil(s)” |
| +Pnon | No possessive | kalem “pencil” |
(c) Case

| Feature | Indicates | Example |
| --- | --- | --- |
| +Nom | Nominative | çocuk “child” |
| +Acc | Accusative | çocuğu “child as definite object” |
| +Dat | Dative | çocuğa “to the child” |
| +Abl | Ablative | çocuktan “from the child” |
| +Loc | Locative | çocukta “on the child” |
| +Gen | Genitive | çocuğun “of the child” |
| +Ins | Instrumental/accompanier | kalemle “with a pencil”, çocukla “with the child” |
| +Equ | Equative (by object) | bizce “by us” |
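The three nominal marker classes (agreement, possessive, case) can be picked out of a feature list with plain set lookups. The sketch below assumes that features arrive as a "+"-separated string and that a nominal carries at most one marker from each class. The function name `nominal_slots` and the sample analysis for kalemim are hypothetical, and the code does not check which combinations are actually valid.

```python
from typing import Iterable, Optional, Tuple

AGREEMENT = {"A1sg", "A2sg", "A3sg", "A1pl", "A2pl", "A3pl"}
POSSESSIVE = {"P1sg", "P2sg", "P3sg", "P1pl", "P2pl", "P3pl", "Pnon"}
CASE = {"Nom", "Acc", "Dat", "Abl", "Loc", "Gen", "Ins", "Equ"}


def nominal_slots(features: Iterable[str]) -> Tuple[Optional[str], Optional[str], Optional[str]]:
    """Return (agreement, possessive, case); None marks an absent slot."""
    feats = list(features)

    def pick(inventory):
        found = [f for f in feats if f in inventory]
        if len(found) > 1:
            raise ValueError(f"more than one marker from the same class: {found}")
        return found[0] if found else None

    return pick(AGREEMENT), pick(POSSESSIVE), pick(CASE)


# Assumed analysis for kalemim "my pencil"
analysis = "kalem+Noun+A3sg+P1sg+Nom"
print(nominal_slots(analysis.split("+")[1:]))
```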
4. Adjectives do not take any inflectional markers. However, forms with ^DB+Adj+PastPart and ^DB+Adj+FutPart carry a possessive marker (one of the first six of the seven listed above) to mark subject agreement with the verb that is derived into the modifier participle. For example, gittiğim [yer] “[the place] (that) I went” will have …^DB+Adj+PastPart+P1sg, and gittiğimiz [yer] “[the place] (that) we went” will have …^DB+Adj+PastPart+P1pl.
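Reading the subject off a participle modifier then amounts to looking up the possessive marker in the final derivation group. The following is a minimal sketch under that reading. `participle_subject` is a hypothetical helper, and the part of the sample analysis before the last ^DB is an assumption, since the text elides it.

```python
from typing import Optional

SUBJECT_OF = {
    "P1sg": "1st person singular",
    "P2sg": "2nd person singular",
    "P3sg": "3rd person singular",
    "P1pl": "1st person plural",
    "P2pl": "2nd person plural",
    "P3pl": "3rd person plural",
}
PARTICIPLES = {"PastPart", "FutPart"}


def participle_subject(analysis: str) -> Optional[str]:
    """Return the subject marked on a final ^DB+Adj+PastPart/FutPart group, if any."""
    feats = [f for f in analysis.split("^DB")[-1].split("+") if f]
    if len(feats) < 2 or feats[0] != "Adj" or feats[1] not in PARTICIPLES:
        return None  # not a participle modifier
    for f in feats[2:]:
        if f in SUBJECT_OF:
            return SUBJECT_OF[f]
    return None


# The prefix "git+Verb+Pos" is assumed for illustration
print(participle_subject("git+Verb+Pos^DB+Adj+PastPart+P1sg"))
```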
5. Verbs will have multiple classes of markers:
(a) Valency changing voice suffixes are treated as derivations. These voice markers follow ^DB+Verb. A verb may have multiple causative markers.

| Feature | Indicates | Example |
| --- | --- | --- |
| +Pass | Passive | yıkandı “it was washed” |
| +Caus | Causative | yıkattı “he had it washed” |
| +Reflex | Reflexive | yıkandı “he washed himself” |
| +Recip | Reciprocal/Collective | selamlaştık “we greeted each other”, gülüştük “we all giggled” |
(b) The following markers marking compounding and/or modality are treated as deriving new verbs with a semantic twist. These markers also follow ^DB+Verb. All except the first have quite limited applicability.

| Feature | Indicates | Example |
| --- | --- | --- |
| +Able | Able to verb | okuyabilir “[s/he] can read” |
| +Repeat | Verb repeatedly | yapadurdum “I kept on doing [it]” |
| +Hastily | Verb hastily | siliverdim “I quickly wiped [it]” |
| +EverSince | Have been verbing ever since | bilegeldiğimiz “that we knew ever since” |
| +Almost | Almost verbed but did not | düşeyazdım “I almost fell” |
| +Stay | Stayed/frozen while verbing | uyuyakaldılar “they fell asleep” |
| +Start | Start verbing immediately | pişirekoydum “I got on cooking [it]” |
(c) Verbal polarity attaches to a verb, or to the last verbal derivation if there is one, unless that last verbal derivation is a zero derivation from a +Noun or +Adj.

| Feature | Indicates | Example |
| --- | --- | --- |
| +Pos | Positive polarity | okudum “I read” |
| +Neg | Negative polarity | okumadım “I did not read” |
(d) Verbs may have one or two tense, aspect or mood markers. However, not all combinations are possible.

| Feature | Indicates | Example |
| --- | --- | --- |
| +Past | Past tense | okudum “I read” |
| +Narr | Evidential past tense | okumuşum “it is rumored that I read” |
| +Fut | Future tense | okuyacağım “I will read” |
| +Prog1 | Present continuous tense (process) | okuyorum “I am reading” |
| +Prog2 | Present continuous tense (state) | okumaktayım “I am in a state of reading” |
| +Aor | Aorist mood | okur “he reads” |
| +Desr | Desiderative mood | okusam “wish I could read” |
| +Cond | Conditional aspect | okuyorsam “if I am reading” |
| +Neces | Necessitative aspect | okumalı “he must read” |
| +Opt | Optative aspect | okuyalım “let’s read” |
| +Imp | Imperative aspect | oku “read!” |
(e) Verbs also have Person/Number Agreement markers (see above). Occasionally, finite verbs will have a copula marker, +Cop.
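The verbal marker classes above can be read off a finite verb analysis in the same way as the nominal ones. The sketch below only enforces the "one or two tense, aspect or mood markers" constraint and does not model which combinations are possible. The name `summarize_verb` and the sample analysis strings are assumptions, and the function expects the final derivation group to be verbal.

```python
TAM = {"Past", "Narr", "Fut", "Prog1", "Prog2", "Aor",
       "Desr", "Cond", "Neces", "Opt", "Imp"}
POLARITY = {"Pos", "Neg"}
AGREEMENT = {"A1sg", "A2sg", "A3sg", "A1pl", "A2pl", "A3pl"}


def summarize_verb(analysis: str) -> dict:
    """Collect polarity, tense/aspect/mood and agreement from the last group."""
    feats = analysis.split("^DB")[-1].split("+")
    tam = [f for f in feats if f in TAM]
    if not 1 <= len(tam) <= 2:
        raise ValueError(f"expected one or two TAM markers, found {len(tam)}")
    return {
        "polarity": next((f for f in feats if f in POLARITY), None),
        "tam": tam,
        "agreement": next((f for f in feats if f in AGREEMENT), None),
        "copula": "Cop" in feats,
    }


# Assumed analysis for okuyorsam "if I am reading": two TAM markers
print(summarize_verb("oku+Verb+Pos+Prog1+Cond+A1sg"))
```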
6. Semantic markers for derivations
(a) The following markers mark adverbial derivations from a verb; hence they appear after ^DB+Adverb.

| Feature | Indicates | Example |
| --- | --- | --- |
| +AfterDoingSo | After having verbed | okuyup “after having read” |
| +SinceDoingSo | Since having verbed | okuyalı “since having read” |
| +As | As …verbs | okudukça “as he reads” |
| +When | When …is done verbing | okuyunca “when he is done reading” |
| +ByDoingSo | By verbing | okuyarak “by reading” |
| +AsIf | As if verbing | okurcasına “as if he is reading” |
| +WithoutHavingDoneSo | Without having verbed | okumadan “without having read”, okumaksızın “without reading” |
(b) +Ly marks manner adverbs derived from an adjective: yavaş “slow” derives yavaşça “slowly.”

(c) +Since marks temporal adverbs derived from a temporal noun: aylar “months” derives aylardır “since/for months.”

(d) +With and +Without mark modifiers derived from nouns: renk “color” derives renkli “with color” and renksiz “without color.”

(e) +Ness marks a noun derived from an adjective with semantics akin to -ness in English: kırmızı “red” derives kırmızılık “redness,” uzun “long” derives uzunluk “length.”

(f) +Become and +Acquire mark verbs productively derived from nouns with the semantics of becoming like the noun or acquiring the noun: taş “stone” derives the verb stem taşlaş “become a stone/petrify”; para “money” derives the verb stem paralan “acquire money.”

(g) +Dim derives a diminutive form of a noun: kitap “book” derives kitapçık “little book/booklet.”

(h) +Agt marks a noun derived from another noun and involved in some way with the original noun; the actual additional semantics is not predictable in general but depends on the stem noun: kitap derives kitapçı “bookseller,” gazete “newspaper” derives gazeteci “journalist,” fotoğraf derives fotoğrafçı “photographer.”
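The surface forms of the +With and +Without derivations show the vowel harmony mentioned in the abstract: the suffix vowel agrees with the last vowel of the stem. The sketch below generates these two forms for regular, lowercase stems. It is a plain-Python illustration and not the finite-state rules of the analyzer. The function names are hypothetical, and stems with exceptional harmony (some loanwords) are not handled.

```python
# Four-way harmony: the high suffix vowel agrees in backness and rounding
# with the last vowel of the stem.
HIGH_VOWEL = {"a": "ı", "ı": "ı", "e": "i", "i": "i",
              "o": "u", "u": "u", "ö": "ü", "ü": "ü"}


def harmonic_high_vowel(stem: str) -> str:
    """Return the high vowel that harmonizes with the last vowel of a lowercase stem."""
    for ch in reversed(stem):
        if ch in HIGH_VOWEL:
            return HIGH_VOWEL[ch]
    raise ValueError(f"no vowel found in stem {stem!r}")


def with_form(noun: str) -> str:
    """Surface form of the +With derivation for a regular stem."""
    return noun + "l" + harmonic_high_vowel(noun)


def without_form(noun: str) -> str:
    """Surface form of the +Without derivation for a regular stem."""
    return noun + "s" + harmonic_high_vowel(noun) + "z"


for noun in ["renk", "tuz", "göz", "tat"]:
    print(noun, with_form(noun), without_form(noun))
```

Input is assumed to be lowercase already, because Python's default case conversion does not follow the Turkish dotted and dotless i distinction.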
7. The following markers follow a postposition to indicate the case of the preceding nominal that it subcategorizes for. This case requirement is not morphologically marked on the postposition; the feature is generated to help with parsing or morphological disambiguation. Its only use is to disambiguate the case of the preceding noun if that noun has multiple morphological interpretations.

- +PCAbl
- +PCAcc
- +PCDat
- +PCGen
- +PCIns
- +PCNom
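The disambiguation use of these markers can be sketched as a filter over the candidate analyses of the preceding noun. The helper names and all analysis strings below are illustrative assumptions, including the ambiguity of evin and the marker shown on için. The filter falls back to the full candidate list when no candidate matches, so it never discards every reading.

```python
from typing import List, Optional


def required_case(postp_analysis: str) -> Optional[str]:
    """Return the case named by a +PC... marker on a postposition, if present."""
    for feat in postp_analysis.split("+")[1:]:  # skip the root
        if feat.startswith("PC"):
            return feat[2:]  # e.g. PCNom -> Nom
    return None


def filter_by_postposition(noun_analyses: List[str], postp_analysis: str) -> List[str]:
    """Keep the noun analyses whose final group carries the required case."""
    case = required_case(postp_analysis)
    if case is None:
        return list(noun_analyses)
    kept = [a for a in noun_analyses
            if case in a.split("^DB")[-1].split("+")]
    return kept or list(noun_analyses)


# Assumed candidates for evin: "of the house" vs. "your house"
candidates = ["ev+Noun+A3sg+Pnon+Gen", "ev+Noun+A3sg+P2sg+Nom"]
print(filter_by_postposition(candidates, "için+Postp+PCNom"))
```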
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this chapter
Cite this chapter
Oflazer, K. (2018). Morphological Processing for Turkish. In: Oflazer, K., Saraçlar, M. (eds) Turkish Natural Language Processing. Theory and Applications of Natural Language Processing. Springer, Cham. https://doi.org/10.1007/978-3-319-90165-7_2
DOI: https://doi.org/10.1007/978-3-319-90165-7_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-90163-3
Online ISBN: 978-3-319-90165-7
eBook Packages: Computer Science, Computer Science (R0)