Morphological Processing for Turkish

Turkish Natural Language Processing

Abstract

This chapter presents an overview of Turkish morphology, followed by the architecture of a state-of-the-art wide-coverage morphological analyzer for Turkish implemented using the Xerox Finite State Tools. It covers the morphophonological and morphographemic phenomena in Turkish such as vowel harmony, the morphotactics of words, and the issues that one encounters when processing real text with its many additional phenomena: numbers, foreign words with Turkish inflections, unknown words, and multi-word constructs. The chapter presents ample illustrations of these phenomena and provides many examples of sometimes ambiguous morphological interpretations.

Notes

  1. Literally, “(the thing existing) at the time we caused (something) to become strong.” Obviously this is not a word that one would use every day. Turkish words (excluding non-inflecting high-frequency words such as conjunctions, clitics, etc.) found in typical running text average about 10 letters in length. The average number of bound morphemes in such words is about 2.

  2. For phonological representations we employ the SAMPA representation. The Speech Assessment Methods Phonetic Alphabet (SAMPA) is a computer-readable phonetic script using 7-bit printable ASCII characters, based on the International Phonetic Alphabet (IPA) (see http://en.wikipedia.org/wiki/Speech_Assessment_Methods_Phonetic_Alphabet (Accessed Sept. 14, 2017) and www.phon.ucl.ac.uk/home/sampa/ (Accessed Sept. 14, 2017)). The Turkish SAMPA encoding convention can be found at www.phon.ucl.ac.uk/home/sampa/turkish.htm (Accessed Sept. 14, 2017).

  3. In this chapter, we use - to denote syllable boundaries and + to denote morpheme boundaries wherever appropriate.

  4. For example, Xerox Finite State Tools, available at http://www.fsmbook.com (Accessed Sept. 14, 2017), FOMA, available at http://fomafst.github.io/ (Accessed Sept. 14, 2017), HFST, available at http://hfst.sf.net (Accessed Sept. 14, 2017), or OpenFST, available at http://www.openfst.org (Accessed Sept. 14, 2017).

  5. Note that we also explicitly show the morpheme boundary symbol, since in the implementation it serves as an explicit context marker to constrain where changes occur.

  6. There are also very special forms denoting families of relatives, where the number and possessive morphemes swap positions to mean something slightly different: e.g., teyze+ler+im “my aunts” vs. teyze+m+ler “the family of my aunt.”

  7. An example given below, when we discuss derivation, shows a full deconstruction of a complex verb to highlight these features.

  8. Obviously the first two are applicable to a smaller set of (usually) transitive verbs.

  9. We present the surface morpheme segmentations, highlighting the relevant derivational morpheme with italics.

  10. So the next time you are up on a cliff looking down and momentarily lose your balance and then recover, you can describe the experience with the single verb düşeyazdım.

  11. Where meaningful, we also give the segmentation of the word form into surface morphemes in italics.

  12. Users of such words have the bizarre presumption that readers know how to pronounce those words in English!

  13. In every group we first list the morphological features of all the tokens, one on every line, and then provide the morphological features of the multiword construct followed by a gloss and a literal meaning.

  14. Here we just show the roots of the verb, with - denoting the rest of the suffixes for any inflectional and derivational markers.

  15. The question and the emphasis clitics, which are written as separate tokens, can occasionally intervene between the components of a semi-lexicalized collocation. We omit the details of these.

Corresponding author

Correspondence to Kemal Oflazer.

Appendix: Turkish Morphological Features

In this appendix we present an overview of the morphological features that the morphological analyzer produces. The general format of an analysis is as given in Sect. 2.4.1: any derivations are indicated by ^DB. The first symbol following a ^DB is the part-of-speech of the derived form, and the next feature symbol is usually a semantic marker that indicates the semantic nature of the derivation. If the second symbol is +Zero, the derivation is an implied, covert one without any overt morpheme.
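The format described above can be illustrated with a minimal Python sketch that cuts an analysis into the stretches between derivation boundaries. Since Sect. 2.4.1 is not reproduced here, the exact string layout (a root followed by `+`-prefixed symbols, with `^DB` between groups) is an assumption, and the names `Group` and `split_analysis` as well as the sample string are hypothetical.

```python
from dataclasses import dataclass
from typing import List

DB = "^DB"  # derivation boundary marker (layout assumed, see lead-in)


@dataclass
class Group:
    """One stretch of symbols between derivation boundaries."""
    root: str            # non-empty only for the first group
    pos: str             # part of speech of the root or of the derived form
    features: List[str]  # remaining symbols, without the leading '+'

    @property
    def is_zero_derivation(self) -> bool:
        # A derived group whose second symbol is Zero has no overt morpheme.
        return self.root == "" and self.features[:1] == ["Zero"]


def split_analysis(analysis: str) -> List[Group]:
    """Split 'root+Pos+F...^DB+Pos+F...' into groups at each ^DB."""
    groups = []
    for i, chunk in enumerate(analysis.split(DB)):
        symbols = chunk.split("+")
        # The first chunk starts with the root; later chunks start with '+',
        # so their first element after splitting is the empty string.
        root = symbols[0] if i == 0 else ""
        groups.append(Group(root, symbols[1], symbols[2:]))
    return groups


# Hypothetical analysis string, used only for illustration.
example = "git+Verb+Pos^DB+Adj+PastPart+P1sg"
for group in split_analysis(example):
    print(group.pos, group.features)
```

Keeping each group as a small record makes it easy to ask for the part of speech of the final word, which is the `pos` of the last group rather than that of the root.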

  1. Major Root Parts of Speech: These mark the part-of-speech category of the root word. This is not necessarily the part-of-speech of the final word if the word involves one or more derivations.

    Feature    Indicates
    +Noun      Noun
    +Adj       Adjective/modifier
    +Adverb    Adverb
    +Verb      Verb
    +Pron      Pronoun
    +Postp     Postposition
    +Num       Number
    +Conj      Conjunction
    +Det       Determiner
    +Interj    Interjection
    +Ques      Question clitic
    +Punc      Punctuation
    +Dup       Onomatopoeia words

      
  2. Minor Parts of Speech: These follow one of the part-of-speech category symbols above and denote either a further subdivision that is morphosyntactically relevant or a semantic marker that indicates the nature of the derivation.

    (a) After +Noun

      Feature    Indicates      Example
      +Prop      Proper noun    Çağla, Mahkemesi’nde

    (b) After +Pron

      Feature    Indicates                Example
      +Demons    Demonstrative pronoun    bu “this”
      +Ques      Interrogative pronoun    kim “who”
      +Reflex    Reflexive pronoun        kendim “myself”
      +Pers      Personal pronoun         biz “we”
      +Quant     Quantifying pronoun      hepimiz “all of us”

    (c) After +Num

      Feature    Indicates              Example
      +Card      Cardinal number        iki “two”
      +Ord       Ordinal number         ikinci “second”
      +Dist      Distributive number    ikişer “two each”

    (d) After ^DB+Noun

      Feature      Indicates                      Example
      +Inf1        Infinitive                     gitmek “to go”
      +Inf2        Infinitive                     gitme “going”, gitmem “my going”
      +Inf3        Infinitive                     gidiş “going”
      +PastPart    Past participle                gittiği “the fact that (he) went”
      +FutPart     Future participle              gideceği “the fact that he will go”
      +FeelLike    “the state of feeling like”    gidesim “(the state of) me feeling like going”

    (e) After ^DB+Adj: These are markers that indicate the equivalent of subject, object, or adjunct extracted relative clauses.

      Feature      Indicates                Example
      +PastPart    Past participle          gittiğim [yer] “[the place] (that) I went”
      +FutPart     Future participle        gideceğim [yer] “[the place] I will be going”
      +PresPart    Present participle       giden [adam] “[the man] who is going”
      +NarrPart    Evidential participle    gitmiş [adam] “[the man] who (is rumored) to have gone”
      +AorPart     Aorist participle        geçer [not] “passing [grade]”, dayanılmaz [sıcak] “unbearable [heat]”

  3. Nominal forms (Nouns, Derived Nouns, Derived Nominals, and Pronouns) get the following inflectional markers. Not all combinations are valid in all cases:

    (a) Number/Person Agreement

      Feature    Indicates              Example
      +A1sg      1st person singular    ben “I”
      +A2sg      2nd person singular    sen “you”
      +A3sg      3rd person singular    o “he/she/it”, all singular nouns
      +A1pl      1st person plural      biz “we”
      +A2pl      2nd person plural      siz “you”
      +A3pl      3rd person plural      onlar “they”, all plural nouns

    (b) Possessive Agreement

      Feature    Indicates                         Example
      +P1sg      1st person singular possessive    kalemim “my pencil”
      +P2sg      2nd person singular possessive    kalemin “your pencil”
      +P3sg      3rd person singular possessive    kalemi “his/her/its pencil”
      +P1pl      1st person plural possessive      kalemimiz “our pencil”
      +P2pl      2nd person plural possessive      kaleminiz “your pencil”
      +P3pl      3rd person plural possessive      kalemleri “their pencil(s)”
      +Pnon      No possessive                     kalem “pencil”

    (c) Case

      Feature    Indicates                   Example
      +Nom       Nominative                  çocuk “child”
      +Acc       Accusative                  çocuğu “child as definite object”
      +Dat       Dative                      çocuğa “to the child”
      +Abl       Ablative                    çocuktan “from the child”
      +Loc       Locative                    çocukta “on the child”
      +Gen       Genitive                    çocuğun “of the child”
      +Ins       Instrumental/accompanier    kalemle “with a pencil”, çocukla “with the child”
      +Equ       Equative (by object)        bizce “by us”

  4. Adjectives do not take any inflectional markers. However, the cases ^DB+Adj+PastPart and ^DB+Adj+FutPart will have a possessive marker (one of the first six of the seven above) to mark subject agreement with the verb that is derived into the modifier participle. For example, gittiğim [yer] “[the place] (that) I went” will have …^DB+Adj+PastPart+P1sg, and gittiğimiz [yer] “[the place] (that) we went” will have …^DB+Adj+PastPart+P1pl.
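The rule in item 4 can be checked mechanically. The sketch below is one reading of that rule, not the analyzer's own logic: it assumes a group is passed as a string such as `+Adj+PastPart+P1sg`, and the function name `adjective_group_is_well_formed` is hypothetical.

```python
# The six overt possessive markers; +Pnon ("no possessive") is excluded.
PARTICIPLE_POSSESSIVES = {"P1sg", "P2sg", "P3sg", "P1pl", "P2pl", "P3pl"}
AGREEING_PARTICIPLES = {"PastPart", "FutPart"}


def adjective_group_is_well_formed(group: str) -> bool:
    """Check one '+Adj...' group, e.g. '+Adj+PastPart+P1sg' (assumed layout)."""
    symbols = group.lstrip("+").split("+")
    if symbols[0] != "Adj":
        raise ValueError(f"not an adjective group: {group!r}")
    rest = symbols[1:]
    if rest and rest[0] in AGREEING_PARTICIPLES:
        # Past and future participles carry exactly one possessive marker.
        return len(rest) == 2 and rest[1] in PARTICIPLE_POSSESSIVES
    # Any other adjective carries no possessive marker at all.
    return not any(s in PARTICIPLE_POSSESSIVES or s == "Pnon" for s in rest)


print(adjective_group_is_well_formed("+Adj+PastPart+P1sg"))
print(adjective_group_is_well_formed("+Adj+PresPart+P1sg"))
```

A check of this kind is mainly useful as a sanity test on analyzer output or on hand-annotated data, where a stray possessive on a plain adjective usually signals an annotation slip.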

  5. Verbs will have multiple classes of markers:

    (a) Valency changing voice suffixes are treated as derivations. These voice markers follow ^DB+Verb. A verb may have multiple causative markers.

      Feature    Indicates                Example
      +Pass      Passive                  yıkandı “it was washed”
      +Caus      Causative                yıkattı “he had it washed”
      +Reflex    Reflexive                yıkandı “he washed himself”
      +Recip     Reciprocal/Collective    selamlaştık “we greeted each other”, gülüştük “we all giggled”

    (b) The following markers marking compounding and/or modality are treated as deriving new verbs with a semantic twist. These markers also follow ^DB+Verb. All except the first have quite limited applicability.

      Feature       Indicates                       Example
      +Able         Able to verb                    okuyabilir “[s/he] can read”
      +Repeat       verb repeatedly                 yapadurdum “I kept on doing [it]”
      +Hastily      verb hastily                    siliverdim “I quickly wiped [it]”
      +EverSince    have been verbing ever since    bilegeldiğimiz “that we knew ever since”
      +Almost       Almost verbed but did not       düşeyazdım “I almost fell”
      +Stay         Stayed/frozen while verbing     uyuyakaldılar “they fell asleep”
      +Start        Start verbing immediately       pişirekoydum “I got on cooking [it]”

    (c) Verbal polarity attaches to a verb, or to the last verbal derivation if there is one, unless that last verbal derivation is a zero derivation from a +Noun or +Adj.

      Feature    Indicates            Example
      +Pos       Positive polarity    okudum “I read”
      +Neg       Negative polarity    okumadım “I did not read”

    (d) Verbs may have one or two tense, aspect, or mood markers. However, not all combinations are possible.

      Feature    Indicates                            Example
      +Past      Past tense                           okudum “I read”
      +Narr      Evidential past tense                okumuşum “it is rumored that I read”
      +Fut       Future tense                         okuyacağım “I will read”
      +Prog1     Present continuous tense, process    okuyorum “I am reading”
      +Prog2     Present continuous tense, state      okumaktayım “I am in a state of reading”
      +Aor       Aorist mood                          okur “he reads”
      +Desr      Desiderative mood                    okusam “wish I could read”
      +Cond      Conditional aspect                   okuyorsam “if I am reading”
      +Neces     Necessitative aspect                 okumalı “he must read”
      +Opt       Optative aspect                      okuyalım “let’s read”
      +Imp       Imperative aspect                    oku “read!”

    (e) Verbs also have Person/Number Agreement markers (see item 3(a) above). Occasionally finite verbs will have a copula marker, +Cop.
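The verb marker classes in item 5 can be sorted out with a few lookup sets. The following Python sketch groups the symbols of a finite verb by class and enforces only the "one or two tense, aspect, or mood markers" constraint; it does not know which specific combinations are valid, since those are not listed here. The function name and the sample feature sequence are hypothetical.

```python
POLARITY = {"Pos", "Neg"}
TAM = {"Past", "Narr", "Fut", "Prog1", "Prog2", "Aor",
       "Desr", "Cond", "Neces", "Opt", "Imp"}
AGREEMENT = {"A1sg", "A2sg", "A3sg", "A1pl", "A2pl", "A3pl"}


def summarize_verb_features(features):
    """Sort the feature symbols of a finite verb into the classes above."""
    summary = {"polarity": None, "tam": [], "agreement": None, "copula": False}
    for f in features:
        if f in POLARITY:
            summary["polarity"] = f
        elif f in TAM:
            summary["tam"].append(f)
        elif f in AGREEMENT:
            summary["agreement"] = f
        elif f == "Cop":
            summary["copula"] = True
        else:
            raise ValueError(f"unexpected verb feature: {f}")
    # Only the count is checked; valid combinations are not modeled.
    if not 1 <= len(summary["tam"]) <= 2:
        raise ValueError("expected one or two tense/aspect/mood markers")
    return summary


# Hypothetical feature sequence for okuyorsam "if I am reading".
print(summarize_verb_features(["Pos", "Prog1", "Cond", "A1sg"]))
```

Voice and modality markers are deliberately left out, because the appendix treats them as derivations that open a new ^DB+Verb group rather than as inflection on the same group.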

  6. Semantic markers for derivations

    (a) The following markers mark adverbial derivations from a verb; hence they appear after ^DB+Adverb.

      Feature                 Indicates                 Example
      +AfterDoingSo           After having verbed       okuyup “after having read”
      +SinceDoingSo           Since having verbed       okuyalı “since having read”
      +As                     As … verbs                okudukça “as he reads”
      +When                   When … is done verbing    okuyunca “when he is done reading”
      +ByDoingSo              By verbing                okuyarak “by reading”
      +AsIf                   As if verbing             okurcasına “as if he is reading”
      +WithoutHavingDoneSo    Without having verbed     okumadan “without having read”, okumaksızın “without reading”

    (b) +Ly marks manner adverbs derived from an adjective: yavaş “slow” derives yavaşça “slowly.”

    (c) +Since marks temporal adverbs derived from a temporal noun: aylar “months” derives aylardır “since/for months.”

    (d) +With and +Without mark modifiers derived from nouns: renk “color” derives renkli “with color” and renksiz “without color.”

    (e) +Ness marks a noun derived from an adjective with semantics akin to -ness in English: kırmızı “red” derives kırmızılık “redness,” uzun “long” derives uzunluk “length.”

    (f) +Become and +Acquire mark verbs productively derived from nouns with the semantics of becoming like the noun or acquiring the noun: taş “stone” derives the verb stem taşlaş “become a stone/petrify”; para “money” derives the verb stem paralan “acquire money.”

    (g) +Dim marks a diminutive form derived from a noun: kitap “book” derives kitapçık “little book/booklet.”

    (h) +Agt marks a noun derived from another noun and involved in some way with the original noun; the actual additional semantics is not predictable in general but depends on the stem noun: kitap derives kitapçı “bookseller,” gazete “newspaper” derives gazeteci “journalist,” fotoğraf derives fotoğrafçı “photographer.”

  7. The following features follow a postposition to indicate the case of the preceding nominal that it subcategorizes for. This requirement is not morphologically marked on the postposition; the feature is generated to help with parsing or morphological disambiguation. Its only use is to disambiguate the case of the preceding noun if that noun has multiple morphological interpretations.

    • +PCAbl

    • +PCAcc

    • +PCDat

    • +PCGen

    • +PCIn

    • +PCNom
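The disambiguating use of these features can be sketched as a simple filter over the candidate analyses of the preceding word. This is an illustrative Python sketch, not the procedure used in the chapter: the function name, the analysis strings, and the choice of example words are assumptions, and the mapping covers only the five tags whose case counterpart in item 3(c) is unambiguous.

```python
# Postposition subcategorization tag -> case feature of the preceding nominal.
REQUIRED_CASE = {
    "PCAbl": "Abl",
    "PCAcc": "Acc",
    "PCDat": "Dat",
    "PCGen": "Gen",
    "PCNom": "Nom",
}


def filter_by_postposition(noun_analyses, postposition_analysis):
    """Keep analyses of the preceding word whose case matches the postposition."""
    post_symbols = postposition_analysis.split("+")
    required = [REQUIRED_CASE[s] for s in post_symbols if s in REQUIRED_CASE]
    if not required:
        return list(noun_analyses)  # nothing to constrain
    case = required[0]
    # Case is read from the last group, i.e. after the final ^DB if any.
    kept = [a for a in noun_analyses if case in a.split("^DB")[-1].split("+")]
    # Fall back to the full set rather than discard every reading.
    return kept or list(noun_analyses)


# Hypothetical candidate analyses for an ambiguous word form.
candidates = ["ev+Noun+A3sg+P2sg+Nom", "ev+Noun+A3sg+Pnon+Gen"]
print(filter_by_postposition(candidates, "gibi+Postp+PCNom"))
```

Returning the unfiltered list when nothing matches is a deliberate choice in this sketch: it keeps the filter from silently deleting all readings when the analyzer and the postposition lexicon disagree.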

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this chapter

Cite this chapter

Oflazer, K. (2018). Morphological Processing for Turkish. In: Oflazer, K., Saraçlar, M. (eds) Turkish Natural Language Processing. Theory and Applications of Natural Language Processing. Springer, Cham. https://doi.org/10.1007/978-3-319-90165-7_2

  • DOI: https://doi.org/10.1007/978-3-319-90165-7_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-90163-3

  • Online ISBN: 978-3-319-90165-7

  • eBook Packages: Computer Science (R0)
