Indo-European languages

The Indo-European languages comprise a family of several hundred languages and dialects (443 according to the SIL estimate), including most of the major languages of Europe, as well as many in Southwest Asia, Central Asia and Southern Asia. Contemporary languages in this family include Hindi, Bengali, German, English, Portuguese, Russian and Spanish (each with more than 100 million native speakers), as well as numerous smaller national or minority languages. Indo-European has the largest number of speakers of any recognised language family in the world today, with its languages spoken by approximately 3 billion native speakers (the Sino-Tibetan family has the second-largest number of speakers). Some researchers have (controversially) proposed grouping it with other families into larger superfamilies.



Alternative name: Indo-Germanic (obsolete)

Geographic distribution: before the 15th century, Europe and South and Southwest Asia; today worldwide.

Classification: one of the world's major language families; although some have proposed links with other families, none of these has received mainstream acceptance.








The various subgroups of the Indo-European language family include (in historical order of their first attestation):

  • Anatolian languages, earliest attested branch, from the 18th century BC; extinct, most notably including the language of the Hittites.
  • Indo-Iranian languages, descending from a common ancestor, Proto-Indo-Iranian
    • Indo-Aryan languages, including Sanskrit, attested from the 2nd millennium BC
    • Iranian languages, attested from roughly 1000 BC, including Avestan, Kurdish and Persian
    • Dardic languages
    • Nuristani languages
  • Greek language, fragmentary records in Mycenaean from the 14th century BC; Homeric traditions date to the 8th century BC. See Proto-Greek language, History of the Greek language.
  • Italic languages, including Latin and its descendants (the Romance languages), attested from the 1st millennium BC.
  • Celtic languages; Gaulish inscriptions date from as early as the 6th century BC, Old Irish texts from the 6th century AD. See Proto-Celtic language.
  • Germanic languages (including Old English and English), earliest testimonies in runic inscriptions from around the 2nd century AD, earliest coherent texts in Gothic, 4th century AD. See Proto-Germanic language.
  • Armenian language, attested from the 5th century AD.
  • Tocharian languages, extinct tongues of the Tocharians, known in two dialects and attested from roughly the 6th century AD.
  • Balto-Slavic languages, believed by many Indo-Europeanists to derive from a common proto-language later than Proto-Indo-European, while skeptical Indo-Europeanists regard Baltic and Slavic as no more closely related than any other two branches of Indo-European.
    • Slavic languages, attested from the 9th century, earliest texts in Old Church Slavonic.
    • Baltic languages, attested from the 14th century; despite this late attestation, they retain unusually many archaic features attributed to Proto-Indo-European.
  • Albanian language, attested from the 13th century (1210); relations with Illyrian, Dacian, or Thracian have been proposed.

In addition to the classical ten branches listed above, several extinct and little-known languages have existed:

  • Illyrian languages — possibly related to Messapian or Venetic; relation to Albanian also proposed.
  • Venetic language — close to Italic.
  • Liburnian language — apparently grouped with Venetic.
  • Messapian language — not conclusively deciphered.
  • Phrygian language — language of ancient Phrygia, possibly close to Greek, Thracian, or Armenian.
  • Paionian language — extinct language once spoken north of Macedon.
  • Thracian language — possibly close to Dacian.
  • Dacian language — possibly close to Thracian and Albanian.
  • Ancient Macedonian language — probably related to Greek; some propose relationships to Illyrian, Thracian or Phrygian.
  • Ligurian language — possibly not Indo-European; possibly close to or part of Celtic

No doubt other Indo-European languages once existed which have now vanished without leaving a trace. Scholars cannot classify the fragmentary Raetian language with any certainty.

Specialists have postulated the existence of further subfamilies, among them Italo-Celtic and Graeco-Aryan. Neither of these has achieved wide acceptance. Indo-Hittite refers to the hypothesis that a significant separation occurred to split Anatolian from all the remaining groups.

Satem and Centum languages

Many scholars classify the Indo-European sub-branches into a Satem group and a Centum group. This terminology comes from the varying treatments of the three original velar rows. Satem languages lost the distinction between labiovelar and pure velar sounds, and at the same time assibilated the palatal velars. The Centum languages, on the other hand, lost the distinction between palatal velars and pure velars. Geographically, the "eastern" languages belong in the Satem group: Indo-Iranian and Balto-Slavic (but not including Tocharian and Anatolian); and the "western" languages represent the Centum group: Germanic, Italic, and Celtic. The Satem-Centum isogloss runs right between the Greek (Centum) and Armenian (Satem) languages (which a number of scholars regard as closely related), with Greek exhibiting some marginal Satem features. Some scholars hold that certain languages (Anatolian, Tocharian, and possibly Albanian) belong to neither the Satem nor the Centum group. Note that the grouping does not imply a claim of monophyly: we do not need to postulate the existence of a "proto-Centum" or of a "proto-Satem". Areal contact among already distinct post-PIE languages (say, during the 3rd millennium BC) may have spread the sound changes involved.
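The two mergers described above can be pictured as simple mappings from the three velar rows to their outcomes. The following Python sketch is only an illustration: the ASCII labels ("k'" for the palatal velar, "k" for the pure velar, "kw" for the labiovelar), the use of "s" as a stand-in for any assibilated outcome, and the helper name `merged_rows` are assumptions made for this example, not standard notation or an established tool.

```python
# Illustrative ASCII stand-ins (assumed for this sketch, not standard notation):
#   "k'" = palatal velar, "k" = pure velar, "kw" = labiovelar.
# "s" stands in for whatever sibilant an assibilated palatal yields.

# Satem treatment: palatals assibilate; labiovelars merge with pure velars.
SATEM = {"k'": "s", "k": "k", "kw": "k"}

# Centum treatment: palatals merge with pure velars; labiovelars stay distinct.
CENTUM = {"k'": "k", "k": "k", "kw": "kw"}


def merged_rows(mapping):
    """Group the original velar rows by the outcome they share."""
    groups = {}
    for original, outcome in mapping.items():
        groups.setdefault(outcome, []).append(original)
    return groups


if __name__ == "__main__":
    print("Satem: ", merged_rows(SATEM))
    print("Centum:", merged_rows(CENTUM))
```

In both treatments three original rows collapse into two outcomes; what differs is which pair merges, and that difference is what the Satem and Centum labels capture.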

Suggested superfamilies

Some linguists propose that Indo-European languages form part of a hypothetical Nostratic language superfamily, and attempt to relate Indo-European to other language families, such as South Caucasian languages, Altaic languages, Uralic languages, Dravidian languages, and Afro-Asiatic languages. This theory remains controversial, like the similar Eurasiatic theory of Joseph Greenberg, and the Proto-Pontic postulation of John Colarusso.


History of the idea of Indo-European

The first proposal of the possibility of common origin for some of these languages came from Marcus Zuerius van Boxhorn in 1647. Van Boxhorn suggested their derivation from "Scythian". However, the suggestions of van Boxhorn did not become widely known and did not stimulate further research.

The hypothesis re-appeared in 1786 when Sir William Jones first lectured on similarities between four of the oldest languages known in his time: Latin, Greek, Sanskrit, and Persian. Systematic comparison of these and other old languages conducted by Franz Bopp supported this theory, and Bopp's Comparative Grammar, appearing between 1833 and 1852, counts as the starting-point of Indo-European studies as an academic discipline.

Reconstructions and hypotheses

Scholars have dubbed the common ancestral (reconstructed) language Proto-Indo-European (PIE). They disagree as to the geographic location where it originated (the so-called "Urheimat" or "original homeland"). Two main candidates exist:

  1. the steppes north of the Black Sea and the Caspian Sea (see Kurgan)
  2. Anatolia (see Colin Renfrew).

Proponents of the Kurgan hypothesis tend to date the proto-language to ca. 4000 BC, while proponents of Anatolian origin usually date it several millennia earlier, associating the spread of Indo-European languages with the Neolithic spread of farming (see Indo-Hittite).

The Kurgan hypothesis

Marija Gimbutas originally suggested the Kurgan hypothesis in the 1950s. According to this hypothesis, Chalcolithic steppe cultures of the 5th millennium BC between the Black Sea and the Volga spoke early PIE.

Kurgan hypothesis timeline (all dates BC):

  • 4500 - 4000: Early PIE. Sredny Stog, Dnieper-Donets and Samara cultures, domestication of the horse.
  • 4000 - 3500: The Yamna culture (prototypical kurgan-building) emerges in the steppe, and the Maykop culture in the northern Caucasus. Indo-Hittite models postulate the separation of Proto-Anatolian before this time.
  • 3500 - 3000: Middle PIE. The Yamna culture reaches its peak: it represents the classical reconstructed Proto-Indo-European society, with stone idols, early two-wheeled proto-chariots, predominantly practising animal husbandry, but also with permanent settlements and hillforts, subsisting on agriculture and fishing, along rivers. Contact of the Yamna culture with late Neolithic European cultures results in the "kurganized" Globular Amphora and Baden cultures. The Maykop culture shows the earliest evidence of the early Bronze Age, and bronze weapons and artefacts enter Yamna territory. Probable early Satemization.
  • 3000 - 2500: Late PIE. The Yamna culture extends over the entire Pontic steppe. The Corded Ware culture extends from the Rhine to the Volga, corresponding to the latest phase of Indo-European unity, the vast "kurganized" area disintegrating into various independent languages and cultures, but still in loose contact and thus enabling the spread of technology and early loans between the groups (except for the Anatolian and Tocharian branches, already isolated from these processes). The Centum-Satem division has probably run its course, but the phonetic trends of Satemization remain active.
  • 2500 - 2000: The breakup into the proto-languages of the attested dialects is complete. Speakers of Proto-Greek live in the Balkans, speakers of Proto-Indo-Iranian north of the Caspian in the Sintashta-Petrovka culture. The Bronze Age reaches Central Europe with the Beaker culture, whose people probably use various Centum dialects. Proto-Balto-Slavic speakers (or alternatively, Proto-Slavic and Proto-Baltic communities in close contact) emerge in north-eastern Europe. The Tarim mummies possibly correspond to proto-Tocharians.
  • 2000 - 1500: Invention of the chariot, which leads to the split and rapid spread of Iranian and Indo-Aryan from the Andronovo culture and the Bactria-Margiana Archaeological Complex over much of Central Asia, Northern India, Iran and Eastern Anatolia. Proto-Anatolian splits into Hittite and Luwian. The pre-Proto-Celtic Unetice culture has an active metal industry (Nebra skydisk).
  • 1500 - 1000: The Nordic Bronze Age develops (pre-)Proto-Germanic, and the (pre-)Proto-Celtic Urnfield and Hallstatt cultures emerge in Central Europe, introducing the Iron Age. Proto-Italic migration into the Italian peninsula. Redaction of the Rigveda and rise of the Vedic civilization in the Punjab. Flourishing and decline of the Hittite Empire. The Mycenaean civilization gives way to the Greek Dark Ages.
  • 1000 BC - 500 BC: The Celtic languages spread over Central and Western Europe. Northern Europe enters the Pre-Roman Iron Age, the formative phase of Proto-Germanic. Homer initiates Greek literature and early Classical Antiquity. The Vedic civilization gives way to the Mahajanapadas. Zoroaster composes the Gathas; rise of the Achaemenid Empire, replacing the Elamites and Babylonia. The Scythians supplant the Cimmerians (Srubna culture) in the Pontic steppe. Armenians succeed the Urartu culture. Separation of Proto-Italic into Osco-Umbrian and Latin-Faliscan, and foundation of Rome. Genesis of the Greek and Old Italic alphabets. A variety of Paleo-Balkan languages have speakers in Southern Europe. The Anatolian languages suffer extinction.

A strength of the Kurgan hypothesis lies in the fact that part of its proposed mode of spread (military conquest by horsemen) agrees with historical reports about the spread of early Greek and early Indo-Aryan peoples.

The Anatolian hypothesis

Colin Renfrew in 1987 suggested an association between the spread of Indo-European and the Neolithic revolution, spreading peacefully into Europe from Asia Minor (Anatolia) from around 7000 BC with the advance of farming (wave of advance). Accordingly, all the inhabitants of Neolithic Europe would have spoken Indo-European tongues, and the Kurgan migrations would at best have replaced Indo-European dialects with other Indo-European dialects.

According to Renfrew, the spread of Indo-European proceeded in the following steps:

  • Around 6500 BC: Pre-Proto-Indo-European, located in Anatolia, splits into Anatolian and Archaic Proto-Indo-European, the language of those Pre-Proto-Indo-European farmers that migrate to Europe in the initial farming dispersal. Archaic Proto-Indo-European languages occur in the Balkans (Starčevo-Körös-Cris culture), in the Danube valley (Linear Pottery culture), and possibly in the Bug-Dniestr area (Eastern Linear pottery culture).
  • Around 5000 BC: Archaic Proto-Indo-European splits into Northwestern Indo-European (the ancestor of Italic, Celtic, and Germanic), located in the Danube valley, Balkan Proto-Indo-European (corresponding to Gimbutas' Old European culture), and Early Steppe Proto-Indo-European (the ancestor of Tocharian).
  • After 3000 BC: The individual families of Indo-European develop; except for the ones already mentioned, they all derive from Balkan Proto-Indo-European. Proto-Greek speakers move southward into Greece; Proto-Indo-Iranian moves northeast into the steppe area.

The main strength of the farming hypothesis lies in its linking of the spread of Indo-European languages with an archeologically known event that likely involved major population shifts: the spread of farming (though the validity of basing a linguistics theory on archeological evidence remains disputed).

While the Anatolian theory enjoyed brief support when first proposed, the linguistic community in general now rejects it. A major problem lies in its postulating a much earlier date for Proto-Indo-European than linguistic evidence suggests. If PIE broke up in the 7th millennium BC, one cannot postulate a common Indo-European word for "wheel" (invented in the 5th millennium BC), incidentally one of the most solidly reconstructed Indo-European lexemes. While the spread of farming undisputedly constituted an important event, Renfrew's critics see no case to connect it with Indo-Europeans in particular, seeing that terms for animal husbandry tend to have much better reconstructions than terms related to agriculture.

Other hypotheses

