TLN sent me this ages ago
I have been deep-diving on Proto-Indo-European linguistics for the last year for Conlang Reasons, which has filled me with useless knowledge that makes me very fun at parties. Now, after many false starts and nigh-constantly running up against one wall or another, I have returned from the depths with treasures. And / or eyes on the inside.
0. Prelude to the Introduction
The first thing to get out of the way is this: PIE studies are bullshit (slightly affectionate) for reasons you will hear a lot about; the second thing is that this rabbit hole is so deep I am bumping elbows with Tsathoggua, which makes for blogposts bogged down in tedious explanations of each and every weird thing in a desperate attempt to make any of this chicanery make any sense at all.
So as to avoid dumping everything on y’all all at once, for this post I’m only going to do an introduction with my goals & design principles, and then go through an example word sound-by-sound to show how I got from the abstracted reconstructed version to the conlang proper.
Third and potentially most important thing to get out of the way first: there is no “true” PIE; there was a dialect continuum spoken by some Eurasian steppe nomads, and that dialect continuum drifted and fragmented over thousands of years as its speakers spread out across west-central Asia, India, and Europe. Reconstructions of PIE are an abstraction used to describe a language we have no attestation for; they’re closer to algebraic formulas than an actual language, and they are algebraic formulas composed with limited data, bias, best guesses, academic dogmatism, outright crankery, occasional bits of insight, and every other skew you can possibly think of. Ceci n'est pas une *h₁éḱwos.
This is made infinitely more frustrating by PIE reconstructioneers (this is the official technical term) and lay linguists alike using “Proto-Indo-European” to describe the reconstructed language-abstraction and the real historical language(s) interchangeably, despite the former being a work of artifice set in amber outside of time and the latter being three goblins and a horse in a trenchcoat. I will be guilty of this myself, so to head it off: when I say how something works in PIE, I am talking about how linguists think real!PIE worked according to how they have built reconstructed!PIE like some sort of word-demiurge, not how it actually worked in reality.
General time periods will be named according to a schema of my own devising.
- Homsar Hol - Prior to the divergence of my as-yet unnamed language family; “Pre-Indo-European” or “Pre-Indo-Anatolian”
- Strongbadian PIE - Prior to the divergence of the Anatolian languages from the core continuity; “Proto-Indo-Anatolian”, “Proto-Indo-Hittite”, or “Early PIE”.
- Strongmadian PIE - An era of significant differentiation between core and fringe speakers; “Middle PIE”
- Strongsadian PIE - Total dissolution of the core speaking community, dialectal continuity completely lost by this stage; “Late PIE”.
Now with all that out of the way, let’s get inside baseball.
1. Introduction
Emboldened by the flame of ambition, this project began with two must-haves:
- It was going to belong to its own branch of the greater Indo-European family (and likely become very weird because of it)
- I wanted to retain the infamous mystery-consonant laryngeals in some form.
Point 2 immediately gave me no shortage of issues, because outside of a few edge cases in Iranian and Armenian the only descendants to retain laryngeals are the Anatolian languages; Anatolian languages are so divergent from all the others that you need to consider pre- and post- Anatolian split as wholly different stages of the PIE continuum deserving of wholly different reconstructions (Strongbadian vs Strongmadian PIE), which the mainstream reconstruction doesn’t bother to do. So I ended up having to trudge through material that is bogged down in features that didn’t exist at the time I wanted to split my language off from the whole.
(Granted, good data for this sort of thing is even harder to come by than usual, and historical linguistics is an extremely slow-to-adapt field.)
It does not help, and this one is entirely on me, that I was using Wikipedia for most of my research: Wikipedia’s PIE pages are abysmal. Outdated, contradictory, poorly written and inadequately explained, they will teach you the wrong things and then you will have to waste a considerable amount of time unlearning all the horseshit. Don’t do what I did.
1.2 Brief List of Sources
I’ll have a longer writeup later on down the line: the bulk of my inspiration thus far has come from the work of Martin Kümmel, Andrew Byrd (he made Wenja for Far Cry Primal), Paul Kiparsky (primarily for his Compositional Theory), the blogs Paleoglot (Glen Gordon), PhDniX’s Blog (PhoeniX), and protouralic.wordpress ( ), some random bullshit I found on reddit, and The Oxford Introduction to Proto-Indo-European and the Proto-Indo-European World by Mallory & Adams. I'll try to update this as I continue.
1.3 Core Premise & Design Principles
I am for the time being painting the speakers in broad strokes, and will have to be content with worldbuilding as I go.
- The speakers diverged from the main language continuity extremely early, leading to a language that doesn’t include many of the features later PIE stages are known for, and retains several major parts that were lost.
- The point of divergence was sometime between █████ and ████ BC.
- The speakers’ culture runs orthogonal to the patriarchal horse-based murder that characterized much of the later PIE culture group(s), and they may or may not have even remained in this world.
- I am not going to worry overmuch about where the vocabulary comes from, at least not at the moment; while I will try to stick to words with more solid / widespread attestation, some might time travel to before their invention.
- The rules are made up and the points don’t matter.
The early point of divergence gives me some guidelines to follow with the actual content of the grammar: it’s going to retain features that were lost entirely in other branches (or only survived as scattered and unproductive archaisms), and it’s going to sidestep the development of some signature features of later stages of the family. As of right now, that’s going to include:
- The laryngeal consonants (which are not actually laryngeals) are retained.
- Active-stative alignment rather than nominative-accusative alignment
- No grammatical gender; animate / inanimate distinction is only semantic at this point, not morphological.
- No thematic endings (or at least not in the way they are typically reconstructed).
- Pre-syncope - This language is set before the stress-based syncope obliterated most of the vowels in the Great Dying / Vowel Mass Extinction / Schwapocalypse / the Fuckening.
- No ablaut - or at the very least, I’m going with Paul Kiparsky’s Compositional Theory if I need to, because I can grok it much more easily.
- No vowels in hiatus - This is pretty well established in PIE: vowels that are next to each other either merge into a long vowel, have a glide inserted between them, or turn into a glide.
- No geminate consonants - Also well established in PIE.
- Everything's got to start with a consonant, but that includes the glottal stop so it's basically cheating - this is in line with modern ideas of root constraints, though for my purposes roots don't also have to end with a consonant.
Don’t worry if you don’t have a fucking clue what I am talking about here, we’ll get to it eventually.
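If it helps to see the last three constraints as actual rules, here’s a rough Python sketch of a checker. The vowel inventory, the helper names `segments` and `violations`, and the choice to glue ’, ʷ, ʰ and combining marks onto the preceding consonant are all my own assumptions for illustration, not part of any standard PIE toolkit.

```python
import unicodedata

# Hypothetical inventory: just enough to check the example words.
VOWELS = set("aeiouəɑ\u00e1\u00e9\u00ed\u00f3\u00fa")  # plain vowels plus á é í ó ú
MODIFIERS = {"’", "\u02b7", "\u02b0"}  # ejective mark, ʷ, ʰ attach to the previous consonant


def segments(word: str) -> list[str]:
    """Split a word into segments, gluing modifier letters and combining marks onto their base."""
    segs: list[str] = []
    for ch in word:
        if segs and (ch in MODIFIERS or unicodedata.combining(ch)):
            segs[-1] += ch
        else:
            segs.append(ch)
    return segs


def is_vowel(seg: str) -> bool:
    return seg[0] in VOWELS


def violations(word: str) -> list[str]:
    """Return the shape rules a word breaks; an empty list means it passes."""
    segs = segments(word)
    problems = []
    # Rule: everything starts with a consonant (the glottal stop ʔ counts).
    if not segs or is_vowel(segs[0]):
        problems.append("must start with a consonant (ʔ counts)")
    for a, b in zip(segs, segs[1:]):
        if is_vowel(a) and is_vowel(b):
            problems.append(f"vowels in hiatus: {a}{b}")  # no two vowels side by side
        elif a == b:
            problems.append(f"geminate: {a}{b}")  # no doubled consonants
    return problems
```

For example, `violations("ʔank’uní")` comes back empty, while a made-up `"ʔaikka"` gets flagged for both hiatus and gemination.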
Principle 0: Art over accuracy
As I said in part 0, accuracy is a mug’s game in historical linguistics. I’ve aimed for the sweet spot of “coherent enough to make sense” and “I personally like it”, with the latter taking priority in cases where I need to decide. This is an art project based on half-baked amateur linguistics and the power of Pattern Recognition, nothing more. (To be honest, I had to enforce this principle on myself just to stop the what-if rabbit-holing and decision paralysis.)
Principle 1: The Two-Step Plan
I want to do two distinct stages of this language: the one I’m going to be describing here is the older of the two, which is intended to be pretty close to the core Strongbadian PIE dialect. The later one is where we go off the rails.
Principle 2: Areal features yes, macrofamilies no
There was absolutely cultural and linguistic exchange going on between the peoples of east Europe / west Asia: that does not mean that their languages are connected via descent from a shared origin. I will pillage loan words and grammatical features from Proto-Kartvelian, Proto-Caucasian, Proto-Uralic, and Proto-Semitic if I think there’s something neat, but I’m not designing this conlang to align with the Indo-Uralic, Pontic, or Nostratic theories (note: these are all varying levels of fringe, and I only recommend looting them for creative purposes.)
Principle 3: Crank Credits
Sometimes the cranks have a good idea or two; these will be called out with the cashing in of a Crank Credit™. I don’t expect many of them, because the reasonable ideas are few and far between and the entertainingly bonkers ones are somehow even rarer. (I did find one guy who somehow managed to turn recon!PIE into Earthsea magic, though it is not as cool or useful as you would hope.)
(Fun fact: there is absolutely no requirement whatsoever for anything uploaded to Academia.edu to have any connection to an institution of higher learning or a peer-reviewed publication.)
Principle 4: Moderation in laryngeals
I am going into this project with the presumption that zealous reconstructioneers overuse laryngeals as an inconsistency-solving tool. *h₂ cannot possibly be that common and be a single phoneme, that’d make what is probably a uvular fricative the second or third most common consonant in the language and that is bonkers.
Principle 5: *a definitely existed
We’ll get to this one.
Principle 6: Symbol Usage
I will be marking normal reconstructions with the usual asterisk like *so, and I will mark my own bespoke Pre-PIE reconstructions with two **asterisks. This will be strictly in reference to explaining how things from my version of recon!PIE changed to get to traditional recon!PIE.
X < Y means “X derived from Y”; X > Y means “X turns into Y”
I promise this is the end of the set up.
2. *h₁n̥gʷnís
First and most important thing (I'm really saying that a lot, aren't I): this isn’t actually one of the PIE words for fire; this is a formula representing one of the PIE words for fire. Think of every letter here as an algebraic variable that, if you apply the right sequence of functions to it, will become words like Sanskrit agni and Latin ignis. If you had a time machine, popped into a settlement of Eurasian steppe nomads, pointed at the campfire and said “*h₁n̥gʷnís!”, you’d get some very strange looks, but if the reconstruction is solid the confused steppe nomads would probably figure out that you meant “fire” and correct your gods-awful pronunciation. (Granted, that’s dependent on them being from one of the dialects that used *h₁n̥gʷnís in the first place: it’s the less common of the two.)
2.1.1: Basic Structure
*h₁n̥gʷnís can be broken down into component parts:
- h₁engʷ- ; a root with a general meaning of “to burn” or “fire”.
- -n- ; An extension added to the root of entirely unknown function: it’s here in *h₁n̥gʷnís, but missing from the related *h₁óngʷl̥ (“charcoal” or “embers”). It might be part of the suffix?
- -i- ; A suffix that makes animate nouns out of verbs or adjectives.
- -s ; The nominative singular case ending
2.1.2: h₁
Starting off strong we get one of the mystery laryngeal consonants: sounds (that are not actually laryngeals) that were lost in all IE languages (save the Anatolian languages, a few edge cases in Iranian and Albanian, and this weird thing called the Triple Reflex in Greek), but that we know were there because they influenced nearby vowels (and sometimes consonants). There are normally three laryngeals reconstructed, sometimes four, but some people have gone as low as 1 (highly unlikely) and as high as 10 or 12 (also unlikely, but less unlikely when framed as h₁, h₂, and h₃ encompassing multiple sounds each).
h₁ is the easy one, because it doesn’t have much going for it: it lengthened vowels, it didn’t have any coloring effect on *e (the others did, more on that eventually), it sometimes turned into e in Greek, and it vanished in all descendants (including Anatolian). Nearly all reconstructioneers plug it in as either *h or *ʔ, since those sounds fit all the criteria: I’m going with the glottal stop ʔ (for the time being, stick a pin in that).
Word Progress: ʔ-
2.1.3: n̥
That little circle under the n means that this is a syllabic resonant - a consonant that can serve as the nucleus of a syllable in place of a vowel. English has them all over the place (it’s why “little” is two syllables) and they’re not particularly difficult to wrangle. Syllabic consonants are almost always the result of a nearby vowel being reduced and / or deleted, and we can clearly see that the root h₁engʷ- has a vowel in it: this is an example of ablaut, which is when a vowel changes and the change carries a difference in meaning (English sing-sang-sung is an example of ablaut).
In this case, since the stress is on the *í, the *e got reduced/deleted because there was a resonant to pick up the slack. But since my language doesn’t have stress-based deletion as part of ablaut, it’s going to stay as **en.
Vowels in PIE reconstructions are a 50-gallon drum of worms that I am going to save for another time: for now, I am going to say that *e isn’t actually /e/ most of the time, and was probably closer to ɛ, ə, ɐ, or æ - something weakly pronounced and a bit forward in the mouth. I’ll just be representing it with <a> for now because I’ll need a separate schwa in the next step and haven't fully decided on how the low vowels will pan out.
Word progress: ʔan-
2.1.4: gʷ
This one is going to be a tricky one, despite looking relatively normal. It’s got two prominent distinctive features, but they’re a lot more questionable than what’s come before. As reconstructed, *gʷ is:
- Voiced, contrasting with unvoiced *kʷ and breathy voiced *gʷʰ
- Labialized velar (pronounced with rounded lips), contrasting with plain *g and palatovelar *ǵ.
To wit: the three-way voiceless-voiced-breathy voiced (*T, *D, *Dʰ series) distinction in the stop consonants is so rare in the modern day that the number of comparable languages is in the single digits. This has led some reconstructioneers to theorize that the *D series was something else entirely, usually some kind of glottalized voiceless consonant (this is called Glottalic Theory), to account for why they are so infrequent in the corpus, why they never appear twice in the same root when *DʰeDʰ is extremely common, and why there is basically no *b at all except weird edge cases that might be errors or loanwords.
I’m going to cash in one of my Crank Credits™ and go with Allan Bomhard’s version of Glottalic Theory: the traditional *D series behaved similarly to glottalic consonants in Coast Tsimshian / Sm'algya̱x. Glottalization occurs on whatever side the vowel is on (leaning towards stressed vowel if between two), and is unreleased word-finally.
(Bomhard, as a rule, is not a reliable source: his whole deal is trying to reconstruct a protolanguage macrofamily ("Nostratic") that encompasses basically every language in Eurasia, and you can probably see the issue with making a reconstruction based on other reconstructions and claiming that it’s reflective of reality. His work is impressively thorough, methodologically whack, and would be better served if it was an elaborate art project. That said, in his efforts to make a Grand Unified Theory he cites basically everything anyone has ever written about the subject and entertains basically any idea that could even tangentially fit.)
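Bomhard’s placement rule is fiddly enough that it might read more clearly as code. This is a rough sketch under my own assumptions: words come pre-split into segments, stress is marked with an acute, and when a glottalic sits between two vowels and the preceding one isn’t stressed, it defaults to the following vowel (the rule as I’ve stated it doesn’t cover the unstressed-unstressed case). The function name is hypothetical.

```python
from typing import Optional

# Hypothetical conventions: plain vowels, and stressed vowels marked with an acute.
PLAIN_VOWELS = set("aeiouəɑ")
STRESSED_VOWELS = set("\u00e1\u00e9\u00ed\u00f3\u00fa")  # á é í ó ú


def is_vowel(seg: str) -> bool:
    return seg[0] in PLAIN_VOWELS or seg[0] in STRESSED_VOWELS


def glottalization_side(segs: list[str], i: int) -> Optional[str]:
    """Where the glottalization of the stop at segs[i] surfaces.

    Returns "before", "after", "unreleased" (word-final), or None when no
    vowel is adjacent and the rule as stated doesn't decide.
    """
    if i == len(segs) - 1:
        return "unreleased"  # unreleased word-finally
    left = i > 0 and is_vowel(segs[i - 1])
    right = is_vowel(segs[i + 1])
    if left and right:
        # Between two vowels: lean toward the stressed one.
        # Assumption: if the preceding vowel isn't stressed, go with the following one.
        return "before" if segs[i - 1][0] in STRESSED_VOWELS else "after"
    if left:
        return "before"  # glottalization sits on the vowel side
    if right:
        return "after"
    return None
```

The `None` branch is the awkward case: it’s what you get for the **k’ʷ in **ʔank’ʷn-, stuck between two consonants with no vowel to lean on.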
So instead of *gʷ, I’m going with **k’ʷ, but this leads us to a second problem: it’s pretty weird for a labialized ejective to be stuck between two other consonants. (*-n̥-, since it acts like a vowel, is less weird in this regard, but since I’m working with a stage that doesn’t have *-n̥-, that’s not an option.)
Here’s where saying “fuck it, we ball” is very handy. Labialized consonants are pronounced with lip-rounding, and they are typically formed when a rounded vowel like /o/ or /u/ carries over to the preceding consonant. This is the extremely common process of assimilation which boils down to “brain makes one sound closer to a nearby sound to make it easier to say.”
Tugging on that thread (we are outside of normal reconstruction and fully into the art project weeds now), I’m going to stick a schwa in there, representing an unstressed **u that got reduced during the Vowel Mass Extinction and then obliterated in the Schwapocalypse (also called syncope) but left behind its roundedness on the **k’.
(This theory I am pulling primarily from the long-abandoned Paleoglot blog by Glen Gordon and his “Diachrony of Pre-PIE” document, which was saved from oblivion by an automated Scribd web trawler. It has some significant issues that I have already run into while prepping the next post, but I’m including it here anyway because I like it and I can make it work for the time being - we’ll see how it turns out in the future.)
Now we are fully into the weeds and have three different versions of the word: pre-Extinction (before unstressed high vowels reduce to schwa), and then pre- and post-Schwapocalypse (before and after schwa deletion).
Word progress (Pre-Extinction): ʔank’u-
Word progress (Pre-Schwapocalypse): ʔank’ʷə-
Word progress (Post-Schwapocalypse): ʔank’ʷ-
2.1.5: n
After all that mess, *n is just **n. Nasal consonants are anomalously well-behaved in reconstructed!PIE. There’s no indication of what this *-n- extension might have meant, if it meant anything at all, though there are other instances of *n getting slapped onto the end of words, so maybe later we’ll see something that can give us a clue.
Word progress (Pre-Extinction): ʔank’un-
Word progress (Pre-Schwapocalypse): ʔank’ʷən-
Word progress (Post-Schwapocalypse): ʔank’ʷn-
2.1.6: í
*i and *u are weird, because reconstructioneers treat them as syllabic versions of *y and *w, working the same way as *-n̥- did above. They normally get written as *ey and *ew when stressed, *i and *u / *y and *w when unstressed, but as you’ve probably noticed by now, this here is a stressed *i. Exceptions to rules are everywhere, especially in old words, but that actually works in our favor.
While the “*i is just syllabic *y and the unstressed form of *ey” rule works for the background formula level of PIE chicanery, for my purposes there is a much simpler function I want to use: at some point in the history of PIE, stressed high vowels (**i and **u) broke into the diphthongs **ay and **aw (or **əy and **əw - I’ll figure that out when we get there), and then when ablaut stress changes were applied we ended up with the syllabic *y and *w.
This is way too many words to say “*i is just **i for the purposes of this conlang”.
Word progress (Pre-Extinction): ʔank’uni-
Word progress (Pre-Schwapocalypse): ʔank’ʷəni-
Word progress (Post-Schwapocalypse): ʔank’ʷni-
2.1.7: s
This was probably pronounced closer to /z/, since /s/ commonly voices after stressed vowels or voiced consonants (again, super common in English), but there didn’t seem to be a meaningful distinction between the two in recon!PIE: it’s just *s, nothing weird there.
Except there is something weird, it’s just grammatical instead of phonological: *s appears all over noun endings in PIE, to the point of being weird, but this post has gone on long enough already, so I’ll save the full digression on why I think this happened for later. To bullet-point it:
- It’s typologically unusual for nominative-accusative languages to explicitly mark the subject of a sentence, but you do find this sort of thing in languages that make a distinction between the subject of an intransitive verb and the agent of a transitive verb.
- PIE neuter nouns use the accusative case ending (*-m) for the nominative, which is another indication that we’re dealing with something that descended from an older system that cared about agency / animacy: since a rock isn’t animate it would never be the agent, and thus it would always use the ending for the patient of a verb, and this carried over through the switch to NOM-ACC.
- The singular nominative demonstrative pronoun, *so, (“this, that”) is weirdly out of place - every other form in its declension table (all the non-NOM cases and every single plural form) begins with *t, not *s.
All put together we get a theory (that I did not make up myself) that the NOM.sing ending *-s is the leftovers of **sə, which is the reduced form of *so (which for vowel reasons I will write as **sɑ for now and explain later), which was originally **tɑ.
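The animacy argument in the second bullet is easier to see as a toy model. This sketch assumes a bare two-way split (agents get **-s, everything else gets *-m) and simplifies “animate” to “usually shows up as an agent”; real active-stative systems are messier, and both function names are made up for the example.

```python
def active_stative_ending(role: str) -> str:
    """Hypothetical older system: agents take **-s (from **tɑ), everything else takes *-m."""
    return "-s" if role == "agent" else "-m"


def subject_ending_after_shift(animate: bool) -> str:
    """When the system flips to nominative-accusative, each noun keeps the ending it
    used to wear as a subject. Inanimates were never agents, so that was always -m."""
    usual_subject_role = "agent" if animate else "patient"
    return active_stative_ending(usual_subject_role)
```

Run the flip and animates come out with -s while inanimates keep -m, which is the neuter-nominative pattern from the bullet points.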
Now, to get all that working properly I have to add several more steps to our sequence and rename the ones we have. Here I’m going to shift over into directly describing
- Starting Point: ʔank’uni tɑ
- High Vowel Collapse + Labialization: ʔank’ʷəni tɑ
- Agglutination Dance: ʔank’ʷənitɑ
- Final vowel reduction: ʔank’ʷənítə
- Schwapocalypse: ʔank’ʷnit
- Spirantization of final *t: ʔank’ʷnis
- a > e shift: ʔenk’ʷnis
- Ablaut Deletions: ʔn̥k’ʷnís
- Glottalized > creaky voice: ʔn̥g̰ʷnís
- Creaky voice to plain voiced: ʔn̥gʷnís
- Laryngeal Loss: n̥gʷnís
And bing-bang boom we have a timeline of (hypothetical) changes from Early PIE to Late PIE that I can add to and adjust as I need to later on. I had to run my functions backwards in time, which is a bit awkward, but now I can just pick a stage and say “here’s where my language broke off”. Then I can just apply all those steps in reverse to any reconstructed word and add more granularity and more steps as needed.
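Since the whole point is to treat these as ordered functions, here’s a minimal Python sketch of the sequence above. Each rule is written only as general as it needs to be to reproduce this one word; the penultimate-stress rule in `final_vowel_reduction` is my assumption (it happens to give ʔank’ʷənítə), the `stop_after` parameter is a hypothetical stand-in for “pick a stage and break off here”, and the code keeps the acute on í through every stage even where the list above drops it. It only runs the rules forward: going backwards from a reconstructed word needs hand-picked inverses, since steps like schwa deletion throw information away.

```python
import re
from typing import Optional

# Transcription follows the post; the specific codepoints are my assumption.
EJECTIVE = "’"        # ejective mark as written in the post (k’)
LAB = "\u02b7"        # ʷ, labialization
SYLLABIC = "\u0325"   # combining ring below, as in n̥
CREAKY = "\u0330"     # combining tilde below, as in g̰
STRESSED = {"a": "\u00e1", "e": "\u00e9", "i": "\u00ed", "u": "\u00fa"}
PLAIN_VOWELS = "aeiuəɑ"


def high_vowel_collapse(w: str) -> str:
    # Unstressed **u after a velar reduces to schwa, leaving its rounding on the velar.
    return re.sub("([kg]" + EJECTIVE + "?)u", r"\1" + LAB + "ə", w)


def agglutinate(w: str) -> str:
    return w.replace(" ", "")  # the clitic **tɑ fuses onto the noun


def final_vowel_reduction(w: str) -> str:
    # Assumption: stress lands on the penultimate vowel; word-final ɑ reduces to schwa.
    idx = [i for i, c in enumerate(w) if c in PLAIN_VOWELS]
    if len(idx) >= 2:
        p = idx[-2]
        w = w[:p] + STRESSED.get(w[p], w[p]) + w[p + 1:]
    return re.sub("ɑ$", "ə", w)


def schwapocalypse(w: str) -> str:
    return w.replace("ə", "")  # every schwa is deleted


def spirantize_final_t(w: str) -> str:
    return re.sub("t$", "s", w)  # word-final t becomes s


def a_to_e(w: str) -> str:
    return w.replace("a", "e")


def ablaut_deletion(w: str) -> str:
    # Unstressed e before a resonant drops and the resonant turns syllabic.
    return re.sub("e([mnrl])", r"\1" + SYLLABIC, w)


def glottalized_to_creaky(w: str) -> str:
    return w.replace("k" + EJECTIVE, "g" + CREAKY)


def creaky_to_voiced(w: str) -> str:
    return w.replace(CREAKY, "")


def laryngeal_loss(w: str) -> str:
    return w.replace("ʔ", "")


PIPELINE = [
    ("High Vowel Collapse + Labialization", high_vowel_collapse),
    ("Agglutination Dance", agglutinate),
    ("Final vowel reduction", final_vowel_reduction),
    ("Schwapocalypse", schwapocalypse),
    ("Spirantization of final *t", spirantize_final_t),
    ("a > e shift", a_to_e),
    ("Ablaut Deletions", ablaut_deletion),
    ("Glottalized > creaky voice", glottalized_to_creaky),
    ("Creaky voice to plain voiced", creaky_to_voiced),
    ("Laryngeal Loss", laryngeal_loss),
]


def run(word: str, stop_after: Optional[str] = None) -> list[tuple[str, str]]:
    """Apply the rules in order, recording each stage; stop_after names a split point."""
    stages = [("Starting Point", word)]
    for name, rule in PIPELINE:
        word = rule(word)
        stages.append((name, word))
        if name == stop_after:
            break
    return stages


if __name__ == "__main__":
    for name, form in run("ʔank’uni tɑ"):
        print(f"{name}: {form}")
```

Calling `run("ʔank’uni tɑ", stop_after="High Vowel Collapse + Labialization")` stops at ʔank’ʷəni tɑ, which is roughly the kind of early split point I’m after.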
Going forward, I think I am going to split off shortly after the High Vowel Collapse, to get all those fun labialized consonants. More on that later.
3. Dictionary Entry
- ʔan.k’u.ní (AN): wildfire; uncontrolled blaze; a fire that is particularly intense, destructive, uncontrollable, or fast-spreading.
This is where my brain has been for the last year. I believe I might be coming down with a case of the classical madness.