Talk:Proto-language

This article is within the scope of WikiProject Linguistics, a collaborative effort to improve the coverage of linguistics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has not yet received a rating on the project's importance scale.
	This article is supported by Theoretical Linguistics Task Force.

Languages

This article is within the scope of WikiProject Languages, a collaborative effort to improve the coverage of languages on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has not yet received a rating on the project's importance scale.

Wiki Education Foundation-supported course assignment

This article is or was the subject of a Wiki Education Foundation-supported course assignment. Further details are available on the course page. Student editor(s): Malrey.

Above undated message substituted from Template:Dashboard.wikiedu.org assignment by PrimeBOT (talk) 07:22, 17 January 2022 (UTC)

[Untitled]

A proto-language is not necessarily reconstructed. The point, however, is that if it is attested, it will not actually be referred to as proto-something, because it will have its own name. For example, Latin is also Proto-Romance. It should be made clear that it's the *last* common ancestor of a group of related languages (e.g. Proto-Italic is not Proto-Romance, even though all Romance languages derive from it). Dbachmann 11:02, 10 Aug 2004 (UTC)

A quite important point which I've added to the article. Moreover, it is important to note, I think, that the term proto-language is misleading: we think of languages as including many subvarieties, but reconstructions essentially lead to proto-dialects without any variation. In part this lack of internal differentiation is certainly an artifact of the method, but it should also be remembered that a language such as Latin was originally confined to a single settlement (Rome in this case; even the surrounding rural dialects were different) while surrounded by more or less closely related idioms, and it is only realistic to imagine language families as arising from the spread of a single local dialect which was perceived as privileged (initially likely acting as a lingua franca), absorbing various closely related, more remotely related and even entirely unrelated languages and dialects. The direct last common ancestor of Romance is not the whole Italic dialect continuum, but only the dialect of Rome. Languages such as Gaulish or Hispano-Celtic didn't converge with Italic/Latin to become French or Spanish; they were simply replaced entirely. Therefore there is absolutely no compelling reason to assume that the apparent uniformity of proto-languages is only an illusion and that they really descend from heterogeneous dialect continua spread over extensive regions. Sure enough, even local dialects have some internal differentiation, for example along social class lines or between registers, but even here there have been attempts, for example, to reconstruct aspects of "Poetic Indo-European" by comparing poetic texts in the oldest Indo-European languages, so this internal variation may not be entirely lost to us. --Florian Blaschke (talk) 12:21, 31 December 2013 (UTC)

I've also added the point that a proto-language is not necessarily the ancestor of a language family to the article: Proto-Basque, Proto-Albanian, Proto-Gaelic, Proto-Chinese, Proto-Japonic etc. are useful concepts and in actual use even though it is controversial (for various reasons) whether all of these groups of idioms should be described as language families, which implies at least two different languages within the group. The language-dialect issue is fortunately not directly relevant to historical linguistics: a researcher can remain entirely agnostic about it. In fact, in older usage it was not uncommon to speak of dialects (rather than branches) of a family in general, even when the family was as large as Indo-European. Hence, you would say that the dialects of Indo-European are Indo-Iranian, Baltic, Slavic, Albanian, Greek, Phrygian, Armenian, Anatolian, Tocharian, Italic, Celtic, Germanic etc., while each of these branches is again composed of dialects. Scrapping the concept of "individual language" (except perhaps to refer to a written standard dialect) sounds like a good idea to me, given how ill-defined, redundant and unnecessary it is, and what an ideology-charged can of worms it is in practice. Any subdivision of a dialect continuum into languages is arbitrary, and I give no credence to any statement "there are X languages spoken in the world" because there is no way to get a reasonably objective figure here. --Florian Blaschke (talk) 12:41, 31 December 2013 (UTC)

"In a protolanguage, each sentence consists of one two-word phrase"

Where did it come from? Any source or links? =)

-- Vassili Nikolaev

user:Anonymous56789 added it. I would be very cautious with his edits. For example, he created University of Berlin. To my knowledge there are 3 universities in Berlin, but none with that definitive name. I asked him on the article's talk page (and on his user talk page) to back this up, and he has not responded so far. If you know better, then go ahead and remove the above sentence; I think I'm removing the content from University of Berlin as well. Cheers --snoyes 01:05 Mar 1, 2003 (UTC) (P.S. Just put two dashes before signing; otherwise it makes a horizontal rule. I corrected it.)

The majority of the citations and sources used throughout the article are not from credible sources. When clicked on, they take you to another Wikipedia page on the topic they are discussing, such as "tree model". The problem with this lies especially in the fact that when you visit these cited Wikipedia articles, there are warnings at the top. For example, the article "Tree Model" ^[1] has a problem with biased opinions and does not have enough sources cited to verify its information. The article "Dialect Continuum" ^[2] does not have enough sources to verify its information either. "Wave Model" ^[3] is the only citation provided in the whole third paragraph under the subtitle "Definition and verification" and also has credibility issues according to the site, further demonstrating the lack of reliable sources cited in the "Proto-language" article. Ultimately, I suggest fact-checking this article and including much more reliable sources to support these facts. Getting information from articles with unreliable information is a sure way to deplete the credibility of your own article. Malrey (talk) 22:13, 25 September 2017 (UTC)

A proto-language is indeed necessarily reconstructed. Latin, especially Classical Latin, is not the same thing as Proto-Romance, but it is very close to it. As far as I can tell, there is no difference between the neologism proposed language and the established term proto-language, which is why I'm recommending the former be merged here. See also my comments at Category talk:Proposed languages. --Angr/tɔk tə mi 20:13, 22 Jun 2005 (UTC)

The crucial point here is that Latin is not the same as Classical Latin. Latin is a broad and not strictly defined term. In this sense it is true to say that Proto-Romance is a form of Latin, just a (possibly) unattested one.

Proto-Romance must have been more conservative (quite conceivably considerably more archaic) than any attested Romance dialect, and in view of the fact that several attested (marginal or medieval) Romance dialects have preserved various archaisms and (even in non-marginal modern varieties) at least traces of older categories, it is misleading to equate some form of "Common Italo-Western Romance" with Proto-Romance, which must have still had at least three living cases (nominative, accusative and genitive), three living genders, no palatalisation of velars, long and short vowels (or at the very least nine vowel qualities), final stops, many lexical archaisms, etc. Considering the early colonisation of Sardinia, and what is known about the phonetic development of Latin, it is entirely possible that the divergence between Sardinian and the rest of Romance (which was probably the earliest split within Romance) already started at the end of the Old Latin period (between 150 and 50 BC) in the spoken language, I think; for example, the merger of /eː/ < ei with /iː/ had already happened, as had the monophthongisation of ae and oe, the loss of h, and the loss of final m except in monosyllables and of n before fricatives, all of which are reflected in Romance, so providing a likely terminus post quem (it is true that much Old/Classical Latin lexicon is nowhere preserved in Romance or even as loanwords in Insular Celtic, Albanian or Basque, even common words and particles, but parallel or common development and wave-like convergence within already differentiated dialect continua means that the real proto-stages were probably always more archaic than our reconstructions: independent or common loss is always a very real possibility), while Vulgärarchaismen (*potēre could in principle be such a preserved archaism from Old Latin, although like *volēre it could equally be a later analogical re-formation after the u-perfect) are positive indications that Proto-Romance may be this old.
(In fact, however, I suspect that Sardinian is only the sole remainder of a whole group of "Southern Romance" dialects and that the divergence actually happened in Southern Italy already, starting with the divergent vowel collapses which were quite possibly influenced by Greek and Sabellic vocalism respectively.)

--Florian Blaschke (talk) 11:21, 31 December 2013 (UTC)

Proto-Romance has no /x/, has only nine vowels and one diphthong. It's not a form of spoken Latin. Comparative method is inherently lossy, unless the daughters split perfectly and there are plenty of them. --Ivan Štambuk (talk) 20:23, 31 December 2013 (UTC)

How do you reconstruct the numeral for "eight" then, for example? As I said, the term Latin is so broad that you cannot define Proto-Romance as "not Latin", and the monophthongisation of ae is attested already in late republican Latin, i.e., Old Latin (as early as the second century BC), and was complete in urban Latin by the time of the Pompeian inscriptions in the 1st century AD at the latest. Many Classical Latin pronunciations appear to be artificial archaisms, spelling pronunciations, so Classical Latin norms must be disregarded because they do not reflect the spoken language; errors and deviant spellings are much more telling. Pompeian inscriptions of the 1st century AD already record loss/assimilation of final -t, which Western Romance, Sardinian and Lausberg zone dialects preserve. I agree that the method is lossy, that was my point! (How well can Midland Early Modern English be reconstructed from post-Shakespearean Midland English dialects?) --Florian Blaschke (talk) 00:10, 3 January 2014 (UTC)

Also, already in Imperial inscriptions you occasionally find Z for /j/, B for /w/ and V for /b/, indicating that /j/ had already become an affricate like /dʒ/, /w/ had become a fricative /β/ and fallen together with /b/, which had obviously become lenited to /β/ after vowels. None of these developments are reconstructed for Proto-Romance as far as I am aware; for example, the development /j/ > /dʒ/ cannot even be reconstructed for Proto-Western-Romance as it is not general in Castilian Spanish (only word-initially before back vowels), and seems to postdate the development /g/ > /j/ before front vowels, implying that the language spoken in Italy could already be described as Italo-Romance specifically.

Ringe dates the breakup of Romance (with Sardinian and African Romance probably having split off earliest) to the first century BC. That would indeed make Proto-Romance contemporary with Old/Classical Latin. Are you saying Romance does not derive from Old Latin ultimately? Because if you agree that it does, I see no way around the conclusion that Proto-Romance is essentially a form of Latin. Unless you try to argue against this apparently inescapable conclusion by playing semantic games.

But if that example isn't clear enough for you, how about the Goidelic languages? As per Jackson's quote on Talk:Primitive Irish/Archive 1#From 'Primitive Irish' to 'Primitive Irish Language' (on the very bottom), there is evidence for differences between Irish and Scottish Gaelic starting as early as the 10th century, there are traces of specifically Scottish Gaelic traits in the 12th century Middle Irish marginalia of the Book of Deer, and by the 13th century, the emerging dialects were certainly already distinct.

Hence, as per Old Irish, the well-attested language Old Irish is a direct predecessor of all modern Goidelic languages. Even under a strict definition referring to the most recent common ancestor directly prior to the breakup, the language between the 10th and 12th centuries, known as Middle Irish, is even better attested (but less uniform). I can see no way to escape the conclusion that Proto-Goidelic is for all intents and purposes identical with an attested language, namely (Late) Old or Middle Irish. (For practical purposes, Old Irish is treated as Proto-Goidelic.) Irish has been attested since about the 4th century, and from Primitive Irish through Old Irish and at least part of Middle Irish, all stages can be treated as ancestral to all the modern Goidelic languages. This is the most unassailable case of an attested proto-language I can think of. And I don't know how you could dispute it. --Florian Blaschke (talk) 23:42, 15 February 2014 (UTC)

If it's attested it's not reconstructed, so it's not a protolanguage. All this interchangeability of terms between protolanguage and language, and natural descendants from attested forms vis-à-vis reflexes from protoforms is very annoying, because the two are not the same. Both are derivations: the first from real words (sets of sounds), the other from abstract formulae which are themselves derived deductively (so speaking of reflexes of protoforms is in fact circular reasoning). I am appalled at how many historical linguists take the approach "let's ignore everyone else's theories on proto-X, this is how the language was really "spoken" [note the leap from protolanguage->language and reconstructed protoforms->spoken sets of sounds], and let's ignore the inherent deficiencies of the comparative method such as lossiness and conflating chronologically different layers, and let us pretend that there was absolutely no variation, as if the language was spoken by 300 people in a single village". I mean, Proto-Slavic is the "youngest" of all IE branches and it's arguably full of **** (from fantasy phonemes such as the opposition of palatalized vs. unpalatalized syllabic /l/ and /r/ to other inconsistencies that we've discussed elsewhere, such as Holzer and loanwords, Old Novgorod etc.), and I'm sure that you can pile up just as many unsolvable inconsistencies for any other IE branch that goes back 1000-2000 years further than Proto-Slavic. Treating protolanguages as ancestors and presenting them as resembling normal languages (either according to a single author, or according to a "consensus" among linguists which never seems to have been written down, only WP:SYNTH-ed by wiki editors) is one giant exercise in POV-pushing which needs to be corrected (not that I volunteer to do it...). --Ivan Štambuk (talk) 10:23, 16 February 2014 (UTC)

Erm, nobody is claiming that any scholar's personal reconstruction is "the truth" and guaranteed to be identical with the historically existing language. Each reconstruction is only an approximation of an ideal, which is necessarily never attainable due to the incompleteness of our knowledge. But in principle, there is no reason to deny the reality of Proto-Slavic or Proto-Indo-European as a language much like the dialect of Rome ca. 100 BC, Athens ca. 400 BC or London ca. 1600 AD. Note that each was a settlement that had far more than 300 inhabitants, yet still (at least excluding suburbs) considerably fewer than a modern megacity, indeed quite on the level where a local dialect generally has no appreciable regional (as opposed to social) diversity.

You also miss the fact that there is no clear distinction between a reconstructed and an attested language. In fact, it is a continuum, and there is no cut-off point whereby a language can be considered attested and not reconstructed in the slightest. Old Irish is an example of a functional proto-language that is attested but only quite incompletely, being reconstructed to a good deal – to a significant extent on the evidence of later forms of Irish – so it is in fact close to a hybrid. Old Persian is perhaps even more squarely in the middle of the continuum, and many of the forms assumed for it are indeed not attested but reconstructed. Gothic and Old Prussian are similar to Old Persian in that they are close to the attested end but with large gaps in our knowledge (Kleinkorpussprache), while Venetic and Phrygian belong to the better-attested Trümmersprachen. Several Iranian, Celtic and Anatolian languages are even more in-betweenish, if they are not outright Trümmersprachen. Classical Armenian and Old Church Slavonic – also both functionally proto-languages – are other examples of languages that are well-attested but still there are gaps in our knowledge that are filled by reconstruction. Similar things can be said about the Tocharian languages. And there are numerous early stages of languages that are just barely attested, such as Old British or early stages of Romance languages, where reconstruction attempts to fill the gaps. Early Norse is hardly better attested than Venetic and Phrygian, but does that mean that it is a reconstructed language? How about all the conflicting theories of what the pronunciation of Shakespeare's English was like?

As Baad points out below, every language is essentially a reconstructed language to some extent. Even Standard American English is not perfectly known. A protolanguage is simply the ancestral form out of which the members of a group of languages with a common origin (a language family) have developed through an unbroken line of generational descent and transmission, by definition, because the proto-language is the common origin. There's nothing more to it. How well that language is attested is completely irrelevant. Proto-Tyrsenian or Proto-Hurro-Urartian constitute models with a high probability of having existed historically, even if they consist only of a few fragments.

I don't see anything wrong with comparing protoforms, for example. It makes no real difference if I present a long list of all the Germanic cognates of some Gothic word in order to compare it with putative cognates in other I-E branches, or whether I present the reconstruction right away (as long as either I or someone else I cite has done the legwork of justifying that reconstruction by data and analysis), like Kroonen does in his dictionary. In fact, I argue that it is significantly better to do this instead of skipping the reconstructive step. If you skip it, you leave the reconstruction of the in-between stages to the reader, which invites all kinds of trouble and makes your argument significantly weaker. After all, your reader could come to a quite different conclusion about what that reconstruction could be like. Scholars have recognised the utility of Zwischenursprachen and make liberal use of reconstructions of in-between stages now, for the sake of the clarity of the argument. As long as you acknowledge that your Proto-Germanic (etc.) reconstruction is not a cold, hard fact, but only preliminary, and less certain than most attested forms, you are justified in treating reconstructions like attested forms for the sake of the argument. Ultimately, no scientific knowledge is plain, absolutely indisputable fact that is 100% certain; it is preliminary and only "probable bordering on certain" at best. Even forms generally accepted as being securely attested can be subject to some doubt, as when evidence calling the authenticity of an inscription containing them into doubt turns up.

The common ancestor species of a group of animals is usually as unattested as most proto-languages, but that does not mean the existence or reality of such an ancestor species is in any way open to doubt. Reconstructed animals are not mere abstract formulae; they are models of an underlying historical reality. Just as with species for which we have some evidence, like various dinosaur species, however, we can by necessity never know them as well as an extant species.

Yes, we should not confuse a reconstruction with reality. It is also lazy to present one's reconstruction as fact and ignore diverging opinions about the language, regardless of how well or poorly it is attested. But that does not mean that reconstructed proto-languages do not resemble natural languages, because they clearly do: they have phonological and grammatical structures just like natural languages, even if many concrete details (that are irrelevant to the general structure) will never be known. Moreover, they are not invented out of thin air, like constructed languages. So for all intents and purposes, proto-languages, even if 100% reconstructed like PIE, can be treated as reflecting some historical reality, even if all individual reconstructions may be grossly inaccurate and inadequate. Hell, every language system is an idealisation; linguists usually deal with competence, not performance like the idiolectal, idiosyncratic way I speak right now because I'm drunk. If anything, they're interested in generalisations like the precise ways people's speech changes when they're drunk and if that may have any effects on language change. So all the pontification about reconstructed proto-languages only being abstract formulae is a red herring. Written language is all abstract formulae, but that doesn't mean it's not the real reflection of a real (albeit idealised, abstract) language system. It's a false, naive dichotomy much like "chemical vs. natural/organic". --Florian Blaschke (talk) 19:10, 1 September 2014 (UTC)

Well, if you are drawing a distinction between a language as recorded and a language as actually spoken, every language needs to be 'reconstructed', since nobody has the full set of utterances of any language available. I would rather recommend that we split the 'absolute' sense (almost-a-language) from the 'relative' sense (common predecessor). These are two essentially unrelated meanings lumped together at present. Maybe we can treat the 'relative' sense over at proposed language, while the absolute sense can remain here. Baad 13:32, 29 October 2005 (UTC)

We can't create meanings here; that's not what an encyclopedia is for. A proto-language is commonly defined as a common ancestor language reconstructed through the comparative method of historical linguistics. I don't know of any common definition of "proposed language"; it seems to be a neologism and therefore inappropriate for inclusion in Wikipedia. --Angr/tɔk tə mi 13:44, 29 October 2005 (UTC)

References

^ "Tree Model". Wikipedia. Retrieved September 25, 2017.
^ "Dialect continuum". Wikipedia. 22 September 2017. Retrieved September 25, 2017.
^ "Wave model". Wikipedia. 1 December 2016. Retrieved September 25, 2017.

Names of split articles

If we split the page into two separate articles and redirect it to a disambig page (which I wholeheartedly support), there's the question of what the two articles should be named. In my opinion, just Proto-language (historical linguistics) and Proto-language (glottogony), maybe? What do others think? Take care, --Miskwito 02:45, 2 March 2007 (UTC)

Diagram

Tree model of historical linguistics. The proto-languages stand at the branch points, or nodes: 15, 6, 20 and 7. The leaf languages, or end points, are 2, 5, 9 and 31. The root language is 15. By convention, the Proto-languages are named Proto-5-9, Proto-2-5-9 and Proto-31, or Common 5-9, etc. The overall Ursprache has a proto name reflecting the ordinary name of the entire family, such as Germanic, Italic, etc. The links between nodes indicate descent or genetic descent. All the languages in the tree are related. Nodes 6 and 20 are the daughters of 15, their parent. Nodes 6 and 20 are cognates or sister languages, etc. The leaf languages must be attested by some sort of documentation, even a lexical list of a few words. All the proto-languages are hypothetical, or reconstructed languages; however sometimes documentation is found that supports their former existence.

@Megaman en m: the subject here is proto-language, and an abstract tree diagram with numbers does little to explain it. If anywhere, it belongs to tree model, although that one is better illustrated with some real-world examples. Further, it badly fails MOS:CAPSUCCINCT. In fact, I can't find a single sentence in the caption that does not state the obvious (the proto-language of X-Y-Z is named Proto-X-Y-Z, duh!). The overall value of such an image and caption is negative, as it presents a visual distraction without adding any useful information. No such user (talk) 10:37, 20 November 2020 (UTC)

Is the Kazanas quote DUE?

Nicholas Kazanas is a fringe author and the quote was added by an IP. Ioe bidome (talk) 01:16, 21 March 2024 (UTC)