Studying the Uralic proto-language
(27. January 2006)
[This article is a free translation of my recent Finnish article "Uralilaisen
kantakielen tutkiminen", published in Tieteessä tapahtuu 1 / 2006. Unfortunately, not all
of my sources are available in English.]
In Tieteessä tapahtuu 7 / 2005 Kalevi Wiik presented a recent study on the genes
of the English population. At the end of his text Wiik repeated his belief in the theory
according to which the very first post-glacial population of England spoke the Uralic
proto-language. I would like to clarify for readers a few points about Proto-Uralic and
the means of studying it (nowadays "Finno-Ugric", too, often refers to the Uralic
language family as a whole). I will try to keep my presentation as clear and understandable
as possible, so that even readers unfamiliar with the discipline concerned can follow it.
One of the basic principles of science is that every object is studied by the methods
proper and relevant for that particular object. Consequently, language is studied by
linguistics, material culture is studied by archaeology and genes are studied by genetics.
Because language, for example, is not connected to any particular gene, we cannot study
the language by the methods of genetics: there is no gene that would determine the
language we speak.
This is probably not a surprise. And yet there are scholars who think they can do
otherwise. Kalevi Wiik, for example, thinks that because we cannot reach the most
distant times by the methods of linguistics, we must turn to the methods of other
disciplines - such as genetics and archaeology - in order to study the linguistic
situation in the distant past (Wiik 2002: 23). Later in this paper we will see that
material culture and genes have a few similarities in the way they are inherited,
so I bundle them together under one and the same method, just as Wiik himself does.
The basic question now is whether it is possible to get reliable information about the
linguistic past by the method Wiik uses, namely by following genetic and/or
archaeological continuity back in time.
Reliability of the method
Wiik's only argument goes like this: "The population of Finland descends from the
earliest post-glacial inhabitants. They came from the south, from (Northern) Central Europe.
The archaeological data show an evident continuity from the earliest
inhabitants to the historical era. Thus the earliest inhabitants of Finland must have spoken
a Uralic language, the predecessor of present-day Finnish. Because those people arrived from
Central Europe, the original Proto-Uralic area must evidently have been situated there."
This sounds logical so far, doesn't it? Even though the conclusions about language have
been drawn by means of disciplines other than linguistics. But let's see what other
results have been obtained with the same method.
By appealing to archaeological and/or genetic continuity, the original area of
Proto-Indo-European has been "proved" to lie in India, the Caucasus, Central Asia, Anatolia,
Ukraine and Central Europe (see Mallory 1989: 143-185). Likewise, the same method has
been used to "prove" that the original Proto-Uralic area must be located in Siberia
(Kosinskaja 2001), the Upper Volga (Carpelan 2000) and Central Europe (Wiik 2002).
Naturally all these claims cannot be true, because the original area of every
proto-language has been narrow (I'll return to this later). Not only the place but also
the time is contradictory: Indo-European continuity in Central Europe has been
"proved" to reach back to the Neolithic (Renfrew 1987) and to the Palaeolithic (Makkay 2001),
and Uralic continuity in Finland has been "proved" to reach back to the Neolithic
(Meinander 1984) and to the Mesolithic (Nuñez 1987).
And above all, the results obtained by this method are also contradictory concerning
linguistic identity: the Late Palaeolithic population of Central Europe has been "proved"
to be both Indo-European (Makkay 2001) and Uralic (Wiik 2002).
In short: this method (drawing conclusions about language by means of disciplines other
than linguistics) is most unreliable and thus totally worthless. But why is the method so
unreliable?
First we must understand that archaeologically perceivable continuity is evident almost
everywhere (Mallory 2001: 357) - continuity doesn't mean that there is no external
influence, but that the external influence is, as archaeologists see it, too weak
to have brought about a language shift.
Genetic continuity is also evident everywhere; the only exception would be an area where
the earlier people had disappeared completely before the arrival of new inhabitants.
Only then would there be a clear discontinuity in the archaeological and genetic data
(provided there were any later remains to compare with).
A single archaeological culture can have many roots, with influences flowing in from
different directions (one item type from here, another from there), and similarly the
number of a person's genetic roots in theory doubles with every generation back (with the
exception of the paternal and maternal lineages, which I shall discuss later on).
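The doubling mentioned above can be illustrated with a minimal sketch. The function
names are hypothetical, and the figures are only the theoretical number of positions in
a family tree: in real populations the same individuals recur in many positions, so the
number of distinct ancestors is smaller.

```python
def ancestral_slots(generations_back: int) -> int:
    """Theoretical number of ancestor positions n generations back (2**n)."""
    if generations_back < 0:
        raise ValueError("generations_back must be non-negative")
    return 2 ** generations_back


def single_line_slots(generations_back: int) -> int:
    """A strictly paternal or strictly maternal line never branches."""
    if generations_back < 0:
        raise ValueError("generations_back must be non-negative")
    return 1


# Compare the branching tree with a single unbranching lineage.
for g in (1, 5, 10, 20):
    print(g, ancestral_slots(g), single_line_slots(g))
```

Ten generations back there are already 1024 theoretical positions, whereas a purely
paternal or maternal line still has exactly one.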
Language is, however, a different case: language is always one-rooted. This means that a
child adopts one of the languages spoken around him as his mother tongue. This language
has always only one root: the root of Finnish leads to Proto-Uralic, and the root of
Swedish leads to Proto-Indo-European. Foreign features acquired later cannot change the
genetic identity of a language. Even though Finnish has plenty of words and structures
in common with Swedish, it is still a Uralic language. (Laakso 1995.)
A language is "born" when enough changes accumulate in a certain area to
differentiate one vernacular from another spoken by its neighbours. The result is seldom
a sharp boundary between two areally close vernaculars, because people are
in contact with each other and may adopt certain features from their neighbours.
"Languagezation" rather occurs via the disappearance of intermediary dialects. If, for
example, dialects 1 and 2 unite (adopt from each other all the features which used to
separate them), the process results in a growing difference between dialects 2 and 3 and
thus in a sharper linguistic boundary between them. (Salminen 1999: 14; 2001: 385)
It follows that a language is always "born" in a narrow area: the wider the area is,
the less likely a sharp boundary is to arise, because the distributions of individual
features do not coincide as easily as in a narrow area. Those who suggest
that the Proto-Uralic area was wide ignore this linguistic law: Proto-Uralic must
have been "born" in a narrow area (Janhunen 1999: 34). And those who suggest that
Proto-Uralic was a mixed language born as a result of intensive areal contacts ignore
this very same law: contact languages, too, are born in a narrow area.
The prehistory of Finland is full of influences from different directions at different
times. All these have left traces in local cultures: some more, some less. And still the
Finnish language has only one root, which leads to Proto-Uralic.
Thus it follows that when we try to work out which cultural or genetic wave of influence
the Uralic language was connected with, we are at the mercy of chance. As we have
seen, one scholar thinks the Uralic language spread to Finland along with the original
inhabitants, while another thinks it is connected with the Neolithic Comb Ware. It is
simply impossible to get any reliable information about language merely by the methods of
archaeology or genetics.
Even the one-rooted paternal and maternal lineages cannot help us. There is no single
Finnish lineage: not all Finns descend from the same ancestors. The Finnish
paternal lineages point in different directions, and so do the maternal lineages. There is
no way to find out with which lineage the Uralic language spread here, so the name of
the game here is, again, lottery.
Probability of success
It follows from the one-rootedness of language compared to the multi-rootedness of culture
or genome, that the wider the area of language, the less the continuity in archaeological
or genetic data can actually tell us.
Let's suppose that the Uralic language family consists of about 30 speech areas. Proto-Uralic
was spoken in one of these areas (unless it was located outside the present-day Uralic
area) - to all the other areas Uralic language spread later. Because Proto-Uralic is
much later than the end of the Ice Age and it surely didn't spread to empty areas, in
all the other 29 presently Uralic areas a language shift must have occurred: the
earlier inhabitants abandoned their original languages and adopted a Uralic language.
In all these 30 areas archaeologically perceivable continuity is evident: it has also
been used as an argument for locating the original Uralic area, as we saw at the beginning
of this article. It follows that archaeological continuity corresponds to linguistic
continuity in only one area, whereas in all the other 29 areas archaeological continuity
corresponds to linguistic discontinuity and language shift. So, if each area is treated as
an equally likely candidate, the probability of success when trying to locate the original
Proto-Uralic area on the basis of archaeological results is 1/30, that is about 3.33 %.
This leaves a 96.67 % chance of failure.
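The arithmetic above can be checked with a minimal sketch. It assumes, as the argument
does, that every area showing archaeological continuity is an equally likely candidate;
the function name and the figure of 10 extra areas are hypothetical and serve only to
illustrate how the odds worsen when candidates outside the present Uralic zone are added.

```python
from fractions import Fraction


def success_probability(candidate_areas: int) -> Fraction:
    """Chance of picking the true homeland when all candidate areas
    show continuity and are therefore treated as equally plausible."""
    if candidate_areas < 1:
        raise ValueError("need at least one candidate area")
    return Fraction(1, candidate_areas)


# 30 speech areas is the article's own round figure.
p = success_probability(30)
print(f"success {float(p):.2%}, failure {float(1 - p):.2%}")

# Hypothetical: 10 further candidate areas outside the present Uralic zone.
p_wider = success_probability(30 + 10)
print(f"success {float(p_wider):.2%}, failure {float(1 - p_wider):.2%}")
```

With 30 areas this prints a success rate of 3.33% and a failure rate of 96.67%; with the
hypothetical 40 candidates the success rate falls to 2.50%.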
No wonder, then, that the results obtained by this method are contradictory. And in the case
of Wiik, who is searching for the Proto-Uralic area even outside the known Uralic area, the
chance of failure is even greater.
What if we nevertheless tried linguistics, even though Wiik believes it cannot tell us
anything about times so distant? Perhaps Wiik just isn't aware of all the possibilities
available in linguistics. For example, there is no law that determines how far into the past
the linguistic method can reach. It depends on the language in question: the areal extent
of the language family, the intensity of its contacts and the extent of the contact language
family all allow us to follow the language farther into the absolute past. As far as
relative past is concerned, the Uralic languages can be traced to the very beginning,
that is Proto-Uralic, and for the absolute past this means about 6 000 years (see the
Because Wiik wants to reach Proto-Uralic and because Proto-Uralic can be reached by the
methods of comparative linguistics, we should obviously take the linguistic results
into consideration. After all, it seems reasonable that the linguistic past is best
reached by the methods of linguistics - at least the object and the methods would now
correspond to each other. We surely wouldn't study Neolithic remains by linguistic
methods, any more than the atmosphere of Venus by the methods of dentistry.
For example, linguistics can tell us what kind of language was spoken in Central
Europe before the Indo-Europeanization of the area. Features, both
phonological and lexical, have been found from the aboriginal languages, which point to influence of
Wiik has also presented such substrate features in Proto-Germanic, Proto-Baltic and
Proto-Slavic; he assumes that these features are due to Uralic influence originating
from the process when the originally Uralic-speaking inhabitants learned Proto-Indo-European
through the filter of their own language system (Wiik 1999).
However, such phonetic, prosodic and structural features may well be due to the internal
development of the language; and even if they were substrate features, nothing could prove
that they were due to Uralic influence. Furthermore, many of the features presented by
Wiik are too late to be Uralic. The hypothesis of Uralic substrate features in
Germanic was disproved years ago (Kallio 1997b; Kallio, Koivulehto & Parpola 1998).
Lexical evidence is a more reliable indicator of the identity of a substrate language:
if those non-Indo-European words were similar to Uralic words, this would truly be
strong evidence for the Uralic identity of the language(s). Especially so because some words
are always very stable, and the relationship would be perceivable even after a very long
divergence: for example, the Finnish word "kala" 'fish' has a cognate in Forest
Nenets, a Samoyedic language spoken in Siberia: "kal'a" 'fish' - although the languages
concerned separated several thousand years ago.
This is to say that if the aboriginal languages of Central Europe had been
Uralic-related, these substrate words would be identifiable when compared with the
present-day Uralic languages and the Proto-Uralic reconstructed from them.
It has been found that in a language-shift situation like the one Wiik supposes,
the vocabulary concerning local nature and geographical features is particularly prone
to being borrowed (Saarikivi 2000). As it happens, it has become clear that these ancient
languages of Central Europe do not resemble Proto-Uralic in the least - neither phonologically
nor lexically (Kallio 1997a; 1997b; Schrijver 2001).
Thus linguistics has proved that Uralic-related languages were not present in Central
Europe before the Indo-European expansion; the local aboriginal languages were totally
distinct. The original area of Proto-Uralic was not in Central Europe, nor did it ever
reach there. Linguistics can also help us to determine the temporal and spatial location
of Proto-Uralic more accurately; a very comprehensive and clear guide to the
subject is "Suomalaisten esihistoria kielitieteen valossa" by Kaisa Häkkinen (Häkkinen 1996).
However, this is not the place to go into the subject in detail - those interested in
the question may study it for themselves. It is enough to sum up that the linguistic
evidence clearly points to an eastern origin. Scholars are only arguing about whether the
original area was west or east of the Ural Mountains (Salminen 2001: 391; Janhunen 2000: 63).
In addition, both Proto-Uralic and Proto-Indo-European seem to be much later languages than
Wiik supposes, dated no earlier than the fourth millennium BC (Kallio 1997a; Carpelan & Parpola 2001).
On the study of origins
What have we learned about the study of origins? At least that we can't get reliable
information about language by any methods other than linguistic ones. It has also become
clear that there is absolutely no basis whatsoever for locating Proto-Uralic in Central
Europe, not to mention Britain.
The method adopted by Wiik and many other scholars - to ignore the best-argued linguistic
evidence and instead rely on archaeology and/or genetics - has proved most
unreliable. In the scientific study of origins we must always respect the autonomy of
disciplines: if we study material culture, the results of archaeology must form the
very basis; if we study language, the results of linguistics must form the basis.
In practice, applied to Uralic studies, this means that once we have located
Proto-Uralic in time and space by linguistic methods, we may bring in archaeology.
This is done by finding an archaeological culture which matches the
proto-language concerned in its time, place and direction of expansion. In short, we
no longer resort to the lottery method in search of the matching culture. In this way
Proto-Indo-European has been located on the Pontic steppes in the fourth
millennium BC (Carpelan & Parpola 2001 with further references).
If it isn't done this way, but conclusions about language are drawn while ignoring the
results of linguistics, we are not talking about the scientific study of origins. Two
possibilities then remain: if linguistics cannot tell us
anything about the particular question, it is just a matter of guessing, with a probability
of success of no more than a few per cent. If, on the other hand, the results of linguistics
tell us a lot about the subject (as in the Proto-Uralic case) but are still
ignored, it is merely a leap outside science, into the world of fantasy (Saarikivi 2003).
There is no single interdisciplinary method which could magically solve the problems of
linguistic, genetic and cultural origin all at once. Every component of the origin must be
studied by the methods matching the object, and only after this can all the independent
results be combined into an interdisciplinary summary.
Consequently, the origin of the Finns is not a coherent entity of which we could proclaim,
after finding the origin of one or two components, that the Origin is now resolved. Such a
case would be possible only in a world where genes, culture and language were always
inherited from one and the same "homeland". In such a world, populations would be born as
"ready" packages in their original homeland, and they wouldn't receive any genetic, cultural
or linguistic influence during their migration. Populations would all live in a vacuum of
their own, inbreeding and lacking contacts with other populations. Naturally this is not
the case in our world.
The origin of a people is rather a multi-levelled and constantly changing puzzle, where the
object of study is not identical at the genetic, cultural or linguistic level with
the "same" people a thousand years ago. The genetic roots of the Finns lead in many different
directions, and the same goes for their cultural roots. Yet our language is quite a late
newcomer from the east.
There is no contradiction in such a view, because the components of the origin are not
interdependent: they function at totally different levels and thus cannot even contradict
each other. That someone has dark skin does not automatically mean that his mother
tongue could not be Finnish. Language, genes and culture do not actually meet at any
level - they meet only in the artificial concept of "people" we use.
This is the very reason why every scholar who understands the origin of a certain people
as one coherent object of study is automatically misled. There are many origins and
they are totally independent. There is no way to solve the absolute origin of a people,
because there is no absolute origin at all. By linguistic study we reach only the
linguistic origin, by genetic study only the origin of a certain genetic feature and
by archaeological study only the origin of a certain feature of material culture.
In the study of origins, appeal has sometimes been made to different schools, as if
different views could justify the contradictory results. Whether such schools really
exist or not, it remains a fact that some methods are more reliable than others. A
school applying an unreliable method is scientifically less worthy than a school applying
a more reliable method. An unreliable method does not become any more reliable, no matter
how long a list of scholars using it is presented.
Wiik sees the key question to be: "How has a situation arisen in which some of the
peoples linguistically related to the Finns are not genetically related to them? How has
a situation arisen in which some of the peoples genetically related to the Finns
speak a language not related to Finnish?" (Wiik 2002: 28; my translation.)
Wiik answers, relying on a method which is unmatched in its unreliability and which
ignores all the plausible results of linguistics, that all those peoples in Central
Europe who are genetically related to the Finns earlier spoke a Uralic language but
later exchanged it for an Indo-European one.
I, on the other hand, can answer just as anyone else studying origins scientifically
would answer: the first inhabitants of Finland after the last Ice Age arrived
mainly from the south, but later they shifted their language to the Uralic one, which was
spreading from the east. Traces of the Palaeo-European languages earlier spoken in Central
Europe have been found, and those languages were definitely not Uralic.
I believe the reader is now, after this article, able to assess which of these
answers is based on the more reliable method and is thus scientifically more plausible.
[The arguments concerning the unreliability of the "continuity" method are of course
relevant in Indo-European studies as well; thus we can consider erroneous any urheimat theory
based on archaeological and/or genetic continuity and contradicting the most plausible
linguistic evidence. This includes such recent theories as those of Colin Renfrew (1987),
János Makkay (2001) and Mario Alinei (http://www.continuitas.com/intro.html).
I'm sure the list would become very long if someone were patient enough to collect all such theories.
I also recommend to all those interested in the subject a critical article by J. P. Mallory
concerning the "continuity card" argumentation (Mallory 2001) - it has been a major
inspiration for this text.]
Carpelan, Christian 2000: "Essay on archaeology and languages in the western end of the
Uralic zone". Congressus nonus internationalis Fenno-ugristarum 7.-13.8.2000. Tartu 2000.
Häkkinen, Kaisa 1996: Suomalaisten esihistoria kielitieteen valossa.
Tietolipas 147. Suomalaisen kirjallisuuden seura, Helsinki 1996.
Janhunen, Juha 1999: "Euraasian alkukodit". Pohjan poluilla. Suomalaisten juuret
nykytutkimuksen mukaan. Ed. Paul Fogelberg. Bidrag till kännedom av Finlands natur och
folk, 153. Helsinki 1999.
Janhunen, Juha 2000: "Reconstructing Pre-Proto-Uralic typology spanning the
millennia of linguistic evolution". Congressus Nonus Internationalis Fenno-Ugristarum
Kallio, Petri 1997a: "Uralilaisten alkuperä indoeuropeistisesta näkökulmasta".
Virittäjä 101, Helsinki 1997.
Kallio, Petri 1997b: "Uralic substrate features in Germanic?" SUSA 87, Helsinki 1997.
Kallio, Petri - Koivulehto, Jorma - Parpola, Asko 1998: "Kantagermaanin
suomalais-ugrilainen substraatti: edelleen perusteeton hypoteesi". Tieteessä tapahtuu
3 / 1998, Helsinki.
Kosinskaja, L. L. 2001: "The Neolithic period of north-western Siberia: The question
of southern connections". Early Contacts between Uralic and Indo-European: Linguistic
and Archaeological Considerations (eds. Carpelan et al.). SUST 242, Helsinki 2001.
Laakso, Johanna 1995: "A spade is always a spade". Itämerensuomalainen kulttuurialue.
Ed. Seppo Suhonen. Castrenianumin toimitteita 49. Helsinki 1995.
Makkay, János 2001: "The earliest Proto-Indo-European-Proto-Uralic contacts: An
Upper Palaeolithic model". Early Contacts between Uralic and Indo-European: Linguistic
and Archaeological Considerations (eds. Carpelan et al.). SUST 242, Helsinki 2001.
Mallory, J. P. 1989: In Search of the Indo-Europeans. Language, Archaeology and
Myth. Thames and Hudson, London / England 1989.
Mallory, J. P. 2001: "Uralics and Indo-Europeans: Problems of time and space".
Early Contacts between Uralic and Indo-European: Linguistic and Archaeological
Considerations (eds. Carpelan et al.). SUST 242, Helsinki 2001.
Meinander, C. F. 1984: "Kivikautemme väestöhistoria". Suomen väestön
esihistorialliset juuret. Bidrag till kännedom av Finlands natur och folk, 131.
Nuñez, Milton G. 1987: "A Model for the Early Settlement of Finland".
Fennoscandia Archaeologica 4. Helsinki 1987.
Parpola, Asko 1999: "Varhaisten indoeurooppalaiskontaktien ajoitus ja paikannus
kielellisen ja arkeologisen aineiston perusteella". Pohjan poluilla. Suomalaisten
juuret nykytutkimuksen mukaan. Ed. Paul Fogelberg. Bidrag till kännedom av
Finlands natur och folk, 153. Helsinki 1999.
Renfrew, Colin 1987: Archaeology and Language. The Puzzle of Indo-European Origins.
Penguin Books Ltd., Harmondsworth, Middlesex, England 1987.
Saarikivi, Janne 2000: "Kontaktilähtöinen kielenmuutos, substraatti ja
substraattinimistö". Virittäjä 104, Helsinki 2000.
Saarikivi, Janne 2003: "Fiktiivistä tiedettä?" Hiidenkivi 1 / 2003.
Salminen, Tapani 1999: "Euroopan kielet muinoin ja nykyisin". Pohjan poluilla.
Suomalaisten juuret nykytutkimuksen mukaan. Ed. Paul Fogelberg. Bidrag till
kännedom av Finlands natur och folk, 153. Helsinki 1999.
Schrijver, Peter 2001: "Lost languages in northern Europe". Early Contacts
between Uralic and Indo-European: Linguistic and Archaeological Considerations (eds.
Carpelan et al.). SUST 242, Helsinki 2001.
Wiik, Kalevi 1999: "Pohjois-Euroopan indoeurooppalaisten kielten suomalais-ugrilainen
substraatti". Pohjan poluilla. Suomalaisten juuret nykytutkimuksen mukaan. Ed.
Paul Fogelberg. Bidrag till kännedom av Finlands natur och folk, 153. Helsinki 1999.
Wiik, Kalevi 2002: Eurooppalaisten juuret. Atena, Jyväskylä 2002.
SUSA = Suomalais-Ugrilaisen Seuran aikakauskirja
SUST = Suomalais-Ugrilaisen Seuran toimituksia