Talk:ISO 639

WikiProject Computing (Rated C-class, Mid-importance)
This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
WikiProject Languages (Rated C-class, Top-importance)
This article is within the scope of WikiProject Languages, a collaborative effort to improve the coverage of languages on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.

To look up codes and to allow easy referencing from outside, some redirects have been implemented. The URLs are then in the form: More Category:Redirects from ISO 639.


What about dialects such as en-us? Are those part of this standard? -- AdamRaizen 14:15, 2003 Sep 8 (UTC)

No, I believe that's just in the Internet RFCs (combination of the ISO 639 and ISO 3166 codes).
Yes, RFC 3066. But it's not only about country codes (ISO 3166). It can be anything that identifies a language/script variant (zh-HK-HanT = Chinese - Hongkong - Traditional Han ideographs; en-scouse) -- 19:03, 11 August 2005 (UTC)
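To make the "combination of codes" idea concrete, here is a minimal sketch that splits a hyphen-separated tag such as the ones mentioned above into its parts. The function name split_subtags is made up for this illustration; it only separates the pieces and does not validate them against ISO 639, ISO 3166, or any RFC.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits a hyphen-separated language tag into subtags.
// It performs no validation; "en-us" simply becomes {"en", "us"}.
std::vector<std::string> split_subtags(const std::string& tag) {
  std::vector<std::string> subtags;
  std::string::size_type start = 0;
  while (true) {
    std::string::size_type dash = tag.find('-', start);
    if (dash == std::string::npos) {
      // No further hyphen: the rest of the tag is the last subtag.
      subtags.push_back(tag.substr(start));
      break;
    }
    subtags.push_back(tag.substr(start, dash - start));
    start = dash + 1;
  }
  return subtags;
}
```

Under this sketch the first subtag is the language code (the part ISO 639 is concerned with), and whatever follows comes from other sources such as country or script codes.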


Someone added "scy" as a code for Scanian; however, I wasn't able to find that code or language in [1] or [2]. The site appears to me to be normative, so I'm removing it.

If you have newer information (e.g. a mailing list post from a standardisation authority), please provide a source for this new code. -- pne 10:49, 13 Jul 2004 (UTC)

ISO 639 sources

The Ethnologue, the ISO-recognized authority for standard 639-3, is my first go-to reference for language codes, and usually the last one I need. Plugging "scy" into the Ethnologue language-by-code URL format tells us:
Invalid language code
scy is not a language code used in the Ethnologue, 16th edition, nor is it a valid ISO 639-3 code.
And that page says under "Comments":
The language has had no recognition since Sweden obtained Scania from Denmark in 1658. It is called 'Southern Swedish' in Sweden, and 'Eastern Danish' in Denmark. Today it is heavily influenced by Swedish in Sweden.
ISO 639-6 assigns Scanian the 4-alpha code scyr, and under that "Scanian Spoken" scys. I've updated Scanian dialects#Status with the recent history (2009-2010) of that change. --Thnidu (talk) 17:12, 21 November 2012 (UTC)

Eskimo languages

I see "esk" is listed as a code for "Eskimo languages" (a better term I guess would be Yupik languages), apparently ever since the page has existed. For the same reasons given above for Scanian, I am wondering if this is a legitimate ISO 639 code. Let me know if you have a source for this code. --Iceager 10:47, 18 Aug 2004 (UTC)

Ethnologue (the authority for ISO 639-3; see #ISO_639_sources) lists 10 Eskimo languages, including Northwest Alaska Inupiatun, which has the 639-3 code "esk" (probably based on the US Census listing it as “Eskimo”). The Eskimo group has two branches, Inuit and Yupik; Northwest Alaska Inupiatun is in the Inuit branch. --Thnidu (talk) 17:54, 21 November 2012 (UTC)

The same ISO 639-2 and ISO 639-3 code?

I think the Alpha-3 code space paragraph should mention that languages have the same ISO 639-2 and 639-3 codes (in the case of 639-2, at least the "form for TERMINOLOGICAL applications"). Am I assuming correctly? Could you please correct my assumptions? --Preceding unsigned comment added by (talk) 23:06, 14 February 2008 (UTC)

Not universally correct. See #ISO_639_sources. --Thnidu (talk) 17:19, 21 November 2012 (UTC)

What is bibliographic? terminological?

This sentence won't be clear for the average reader: "In these cases, the first code is bibliographic (ISO 639-1/B), and the second code is for terminological use (ISO 639-2/T)." Bibliographical? For use in a bibliography in a book if you use books from another language, maybe? For use in a library? And terminological? What's that? For use in a dictionary, maybe? So if you have the history of how the word came into existence, you can use the code for Middle English? A clarification, please.

AFAIK these denominations, just as the whole mess with 3 different code sets, exist only for historical reasons. You're right, the sentence "For these languages, the first three-letter code is for bibliographic use (ISO 639-2/B), and the second three-letter code is for terminological use (ISO 639-2/T)" is quite obscure. "Bibliographic" codes are those traditionally used by US-American libraries, based on Library of Congress's MARC standards. They are derived from the English names of languages, which is not so cool (read: anglocentric). B codes are deprecated. "Terminological" codes are mostly based on self-denomination of languages, and they cover more languages. Those should be used. If a 2-letter code exists, it should be preferred over the 3-letter code. The table should have separate columns for B and T codes and show T codes first, as they're the preferred ones. -- 18:49, 11 August 2005 (UTC)
There are not more T than B codes.
B should lead, because this is common; see the official reference.
IMO separate columns are not needed; only a few codes have B/T.
Tobias Conradi (Talk) 18:36, 17 October 2005 (UTC)
The current guideline is IETF's BCP 47 (replaces RFC 3066). It states on page 8 that the shortest code should be used, that the ISO 639-2/T code should be used when no ISO 639-1 code exists, and that a divergent B code should not be used:
  Note: For languages that have both an ISO 639-1 two-character code
  and an ISO 639-2 three-character code, only the ISO 639-1 two-
  character code is defined in the IANA registry.
  Note: For languages that have no ISO 639-1 two-character code and for
  which the ISO 639-2/T (Terminology) code and the ISO 639-2/B
  (Bibliographic) codes differ, only the Terminology code is defined in
  the IANA registry.  At the time this document was created, all
  languages that had both kinds of three-character code were also
  assigned a two-character code; it is not expected that future
  assignments of this nature will occur.
So B codes are in fact deprecated.-- (talk) 19:46, 30 November 2008 (UTC)
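The selection rule in the two notes quoted above can be sketched in a few lines of C++. This is only an illustration under stated assumptions: the struct LanguageCodes and the function preferred_code are hypothetical names, not part of BCP 47 or of any library, and an empty string stands for "no such code assigned".

```cpp
#include <cassert>
#include <string>

// Hypothetical record of the ISO 639 codes known for one language.
// An empty string means that no code of that kind is assigned.
struct LanguageCodes {
  std::string alpha2;    // ISO 639-1 two-letter code, if any
  std::string alpha3_t;  // ISO 639-2/T (terminological) code
  std::string alpha3_b;  // ISO 639-2/B (bibliographic) code, if it differs
};

// Follows the quoted notes: the two-letter code wins when it exists,
// otherwise the T code is used. The B code is never returned.
std::string preferred_code(const LanguageCodes& codes) {
  if (!codes.alpha2.empty()) return codes.alpha2;
  // Assumption of this sketch: every record carries at least a T code.
  assert(!codes.alpha3_t.empty() && "expected at least a T code");
  return codes.alpha3_t;
}
```

For German (de, deu, ger) this returns "de"; for Achinese, which has only the three-letter code ace, it returns "ace".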

Table conversion

Since uniform data like ISO 639 codes ought to be presented in a tabular format, I wrote a quick program to do the conversion:

// File:    convert-iso639.cpp
// License: Public domain
// Author:  Ardonik
#include <fstream>
#include <iostream>
#include <string>
using namespace std;

void generate(istream& in, ostream& out) {
  string line;
  while (getline(in, line)) {
    if (line.length() < 5) continue; // Blank line
    if (line.substr(0, 2) == "==" && line.substr(3, 2) == "==") {
      // New section.
      // End old table, if applicable.
      if (line != "==A==") out << "|}\n";
      // Start a new table.
      out << line << "\n";
      out << "{| border=\"1px\" cellspacing=\"0\" cellpadding=\"2px\"\n";
      out << "|- style=\"background-color: #a0d0ff;\"\n";
      out << "!Alpha-3!!Alpha-2!!Language name\n";
      out << "|-\n";
    } else {
      // Just another entry in the current table.
      // Skip lines too short to hold the fixed-width columns read below;
      // substr() would otherwise throw std::out_of_range.
      if (line.length() < 16) continue;
      string alpha3 = line.substr(1, line[4] == '/' ? 7 : 3);
      string alpha2 = line.substr(10, 2); if (alpha2=="  ") alpha2 = " ";
      string language = line.substr(16);
      out << "|" << alpha3 << "||" << alpha2 << "||" << language << "\n";
      out << "|-\n";
    }
  }
  out << "|}\n"; // Close last table.
  // Reaching end-of-file also sets failbit, so only report other failures.
  if (in.fail() && !in.eof()) cout << "Could not read from input\n";
  if (out.fail()) cout << "Could not write to output\n";
}

int main(int argc, char* argv[]) {
  if (argc != 3) {
    cout << "Usage: " << argv[0] << " [infile] [outfile]\n";
    cout << "  If infile is \"-\", input will be read from stdin.\n";
    cout << "  If outfile is \"-\", output will be written to stdout.\n";
    return 1; // Signal incorrect usage.
  }
  string infile = argv[1], outfile = argv[2];
  if (infile == "-" && outfile == "-") {
    generate(cin, cout);
  } else if (infile == "-") {
    ofstream out(outfile.c_str());
    generate(cin, out);
  } else if (outfile == "-") {
    ifstream in(infile.c_str());
    generate(in, cout);
  } else {
    ifstream in(infile.c_str());
    ofstream out(outfile.c_str());
    generate(in, out);
  }
  return 0;
}

To operate the program, cut the data (headings included) from the old version of the page and paste it into a text file such as old.txt. Running convert-iso639 old.txt new.txt will give you the tabled version in new.txt, which you can copy and paste into the article. --Ardonik 01:19, Aug 12, 2004 (UTC)

Serbo-Croatian, Serbian, Croatian

  • The three-letter codes "scr" and "scc" come from Serbo-Croatian and differ by alphabet (scr for Latin script and scc for Cyrillic script). But both Serbian and Croatian texts from the time of the Serbo-Croatian standard could be written in either alphabet (especially Serbian, whose texts are split roughly 50/50 between the Latin and Cyrillic alphabets). In this table "scr" refers only to Croatian and "scc" refers only to Serbian. The question is: is this an ISO mistake (because of this possibility I didn't change the codes) or a Wikipedia mistake? --Millosh 07:15, 10 Nov 2004 (UTC)
As you said, "scr" was not just for Croatian but for Serbo-Croatian in Latin script, and then was also used as a legacy code to refer to Serbian (in Latin). There is no error on Wikipedia's part: "scr" always had to be qualified, but ISO 639 offered no way to qualify a script, unlike BCP 47, which allows "scr-Latn" or "scr-Cyrl". Because of that, anyone who wanted to refer to Serbo-Croatian (including Croatian!) written specifically in Cyrillic had to use "scc", but there was NO code in ISO 639 to refer ONLY to Croatian or only to Serbian in Latin, as "scr" was used for all variants. Just like "sh", it was a collective code, and even "scc" was collective because it could equally represent Croatian in Cyrillic, Bosnian in Cyrillic, and in fact other former Yugoslav languages written in Cyrillic. ISO 639 has always been a mess! BCP 47 did a better job by separating the script from the language code. What this means is that even with BCP 47, "scc-Cyrl" and "scr-Cyrl" are completely equivalent to "sh-Cyrl", and "scr-Latn" is completely equivalent to "sh-Latn" (and BCP 47 does not indicate which script is used in "sr", "hr", "bs"; they can ALL refer to their Latin or Cyrillic variants).
So "scc" does not refer only to Serbian in Cyrillic ("sr-Cyrl"), as it also referred to Croatian in Cyrillic ("hr-Cyrl"), Macedonian in Cyrillic ("mk-Cyrl"), or Bosnian in Cyrillic ("bs-Cyrl").
And "scr" does not refer only to Croatian in Latin script ("hr-Latn"), nor even just to Croatian ("hr") in any script ("hr-Latn" or "hr-Cyrl"), as it also refers to Bosnian ("bs") in any script ("bs-Latn" or "bs-Cyrl"), Serbian in any script, and Macedonian in any script (there have even been cases where it referred to Slovenian and Megleno-Romanian, and sometimes even Ukrainian, Albanian, and some local forms of Greek used in the former Yugoslavia). All these codes were a mess and completely broken, even if many linguists consider that Serbo-Croatian is not dead and is a valid "macrolanguage" encompassing only Serbian, Croatian, and Bosnian, indistinctly and independently of the script used for their written form. ISO 639 is not just for written languages; it also covers spoken languages, where the script difference is not relevant at all but regional phonetic variants are, and those are not encoded by these codes. Regional phonetic variants also exist in Serbian, Croatian, and Bosnian, and there are minor grammatical and lexical variants in the spoken form of all these branches of the Serbo-Croatian macrolanguage, but here again ISO 639 does not cover these regional variants of grammar or lexicon, which it considers only as "dialects".
"sh" is fully equivalent to "scr", both codes have been deprecated in ISO 639 (their equivalence is kept in BCP 47); "scc" (fully equivalent to "sh-Cyrl", "scc-Cyrl" and "scr-Cyrl" in BCP 47) was also deprecated at the same time in ISO 639. None of these 3 legacy ISO 639 codes "sh", "scr" or "scc" (and none of the derived BCP 47 codes with a script suffix) distinguish precisely either Serbian, or Croatian, or Bosnian (or even Macedonian or Slovenian). verdy_p (talk) 11:48, 5 November 2018 (UTC)
verdy_p, are you sure "scc" can refer to Bulgarian-ish Macedonian in Cyrillic ("mk-Cyrl"), or did you mean Montenegrin in Cyrillic ("cnr-Cyrl") instead? Love —LiliCharlie (talk) 12:09, 5 November 2018 (UTC)
I can speak about both. Montenegrin is a minor variant of Serbian, but Macedonian was part of the former "Yugoslav" lingua franca, even if it was not part of Serbo-Croatian, and we've seen past references to Macedonian being coded with "sh", which could then be used to mean "Yugoslav" as well, just as it was for Slovenian; so the former Yugoslav publications, which could include a mix of these languages, were just referenced with "sh". There are other variants in Serbia, including a former "Albanian Serbian" dialect for Kosovo. Croatian also has its variants (difficult to distinguish from Bosnian), and Bosnian has its variants as well: today's most common use of Serbian in Cyrillic to the exclusion of Latin is found in the autonomous Serbian Republic in Bosnia and, depending on the people there, confusion still occurs: is it Serbian or Bosnian? verdy_p (talk) 13:24, 5 November 2018 (UTC)

Including native names in table

Although the English name for a language is important, the native name is equally if not more important. It is arguably preferable to display native names on webpages that are trying to alert speakers of a language that content is available in their language. For example, the "In other languages" field uses native names, not English ones. I think it would be a worthwhile addition to include a native names column in the ISO 639 table. Many of the native names are already available from their respective language articles.

An example of what I'm thinking:

Cleanup needed

I looked at the article and was unable to understand most of it. IMO, the entire text needs to be rewritten so that it is accessible to people who don't already know what it's about. --Smack (talk) 21:42, 28 August 2005 (UTC)

It also needs to be checked for accuracy. I just removed Banyumasan from the list, because it's not listed here [3], but there are probably other languages which should be removed too. (I also added Ainu, which is on the list of updates [4], but not the main alphabetical list yet, so please don't delete it.) --Chamdarae 00:32, 30 August 2005 (UTC)

I took a stab at clarifying the discussion of Alpha-x spaces, but a lot more could be done.--A12n 14:33, 26 November 2006 (UTC)

with you inserted a false statement. And IIRC in mathematics we call it a "bound", not a "limit". It is ONE upper bound, not THE upper bound. There are zillions of upper bounds. Tobias Conradi4 14:36, 31 October 2007 (UTC)

Hey, at they have these very nice pictures which I think would go some way to improving clarity, and shouldn't be hard to translate: and -- note the delightful bilingual summary below. Both images are creative commons attribution, both are Inkscape SVG so even with notepad, the desperate could edit them. Right now it's almost seven AM, I've been up all night and shouldn't be on addictive Wikipedia at all and my overheating monitor is flashing in my face making me nauseous ;) but yeah. Although I believe that content is far more useful than images in the long run, I think that converting these images would give a lot of bang for the buck. MIGHT have a crack at it later. Probably not but hey. Anyway correct me if I'm wrong on any of these counts -I'm known to be wrong often. (talk) 17:49, 8 March 2008 (UTC)

There's a number of broken links on this page that should point to Bluethailand (talk) 03:35, 8 November 2017 (UTC)

New RFC

RFC 3066 has been replaced by RFC 4646. -- Preceding unsigned comment added by (talk) 07:51, September 13, 2006

List of ISO 639 codes

I think List of ISO 639-3 codes should be renamed to List of ISO 639 alpha-3 codes or simply moved to List of ISO 639 codes. The same set of codes is used not just in ISO 639-3, but also in ISO 639-2 and ISO 639-5.

Many codes that were in Part 2 (i.e., 639-2) have been removed from Part 3. See #ISO_639_sources. --Thnidu (talk) 17:20, 21 November 2012 (UTC)

Furthermore, there is a lot of information about "native names" in the articles List of ISO 639-1 codes and List of ISO 639-2 codes. However, these native names are not included in the ISO standard; therefore I think a better way is to move this part into this article (List of languages by name, or its sub-lists), leaving only the ISO 639 codes, English names and French names (French names are part of ISO 639).

My plan is to:

  1. Copy the "native names" column inside List of ISO 639-1 codes and List of ISO 639-2 codes to → List of languages by name
  2. Merge contents inside List of ISO 639-1 codes (which is a relatively shorter list) to → ISO 639-1
  3. Deprecate / delete List of ISO 639-1 codes and List of ISO 639-2 codes
  4. Move List of ISO 639-3 codes to → List of ISO 639 codes
  5. Add ISO 639-2 and ISO 639-5 codes into List of ISO 639 codes

-- Hello World! 08:48, 17 July 2008 (UTC)

See Talk:lists of ISO 639 codes TalkChat (talk) 18:37, 11 November 2008 (UTC)

Template for Ladin needed

Hello. Could you create a template for Ladin, which has official status as a minority language in the Province of Bolzano-Bozen and the Province of Trento, Italy? Please also implement this template at Commons, where it would be of much use, since many of the mountains in the Dolomites actually have Ladin names. Regards Gun Powder Ma (talk) 14:59, 17 February 2009 (UTC)


If this was withdrawn, then what is the new standard? This needs clarification. It writes "it was withdrawn" and then it stops. It's natural to ask "then what is in place of it now?" Qorilla (talk) 21:04, 30 June 2009 (UTC)

Personal request

Apologies for off-topic content, but does anyone know how to contact the maintainers of ISO 639? Have tried their website, but they list a postal address and a phone number, but no email address. Thanks, reply to my talk page please. Mglovesfun (talk) 12:55, 17 December 2009 (UTC)

The different parts of 639 have different Registration Authorities. 639-3's is SIL, the maintainers of the Ethnologue; 639-6's is Geolang. --Thnidu (talk) 17:16, 21 November 2012 (UTC)

What are the # numbers?

The table in the middle of the article has a column called # that isn't explained at all, and further # numbers appear throughout the article. Can someone explain what those are? (talk) 17:08, 12 December 2014 (UTC) (lKj)

How many codes in ISO 639-3 Comment

The table near the beginning of the article indicates that ISO 639-3 has 7704 codes in it. However, I just downloaded the code table from the ISO 639-3 registrar, and it contains 7865 codes. Is the 7704 figure just old and needing to be corrected, or does it represent some subset of the complete set? AlbertBickford (talk) 21:36, 27 October 2015 (UTC)

Dunno. If the figure of 7704 has no ref, then I say delete it. ISO does currently have 7865 codes. Ethnologue 18 has about 400 fewer; the difference is due to historical languages, conlangs, things that went extinct before ca. 1950, maybe some other stuff. — kwami (talk) 00:51, 28 October 2015 (UTC)
Right, this number is supposed to be the number of code elements in the standard. But, I wanted to make sure that someone wasn't excluding macrolanguages or anything like that. (Still, if they did, that should have been documented.) I'll make the change. AlbertBickford (talk) 00:57, 28 October 2015 (UTC)
Other WPs (Kazakh, Korean) have the same number. Looks like the figure might be from 2007. — kwami (talk) 00:59, 28 October 2015 (UTC)
That would be an increase of 161 codes in eight years, which strikes me as a little low, but it really doesn't matter. In my changes, I included the date, and updated Part 2 also, with references. The Part 1 number here is 184, and likewise the number listed in List_of_ISO_639-1_codes, but on the ISO 639-1 page it says only 136. So, I fixed that too, but I'd prefer to do so on the basis of an official list--I wasn't able to find one. Do you know if the official list is available on the web, or only in ISO print publications? AlbertBickford (talk) 01:25, 28 October 2015 (UTC)
The official list is downloadable and can be imported easily into Excel. The instructions are on one of the ISO 639-3 pages. --Taivo (talk) 01:46, 28 October 2015 (UTC)
That's the ISO 639-3 list, and that's how I got the 7865 number. I was wondering about the 2-letter codes in the ISO 639-1 list. I couldn't find any links from Wikipedia to any external source for those 184 (?) codes. Would it be safe to use the list from ISO 639-2 and just count the number of 639-1 codes it references? AlbertBickford (talk) 02:44, 28 October 2015 (UTC)
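The count suggested above could be done along these lines. This is a rough sketch, not a vetted tool: it assumes the ISO 639-2 list is available as a text file with one language per line and '|'-separated fields in the order B code, T code, alpha-2 code, English name, French name (adjust the field position if the actual layout differs). The function name count_alpha2_codes is made up for the example.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Counts the lines whose third '|'-separated field is non-empty, i.e. the
// entries that reference a two-letter code under the assumed file layout.
int count_alpha2_codes(std::istream& in) {
  int count = 0;
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream fields(line);
    std::string alpha3_b, alpha3_t, alpha2;
    std::getline(fields, alpha3_b, '|');
    std::getline(fields, alpha3_t, '|');
    // The third read fails for lines with fewer than three fields.
    if (std::getline(fields, alpha2, '|') && !alpha2.empty()) ++count;
  }
  return count;
}
```

Opening the downloaded file with a std::ifstream and passing it to this function would give a figure to compare against the 184 quoted in the article, provided the layout assumption holds.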


I'm wondering what criteria the ISO uses to sort languages into 639-1, 639-2, and 639-3 respectively. From what I've gathered, 639-1 consists of official languages and lingua francas, 639-3 is a comprehensive listing of almost all languages, while 639-2 is somewhere in the middle. However, it would be helpful to know what standards they use to determine whether a language is given a code in a certain category or not. Xcalibur (talk) 23:43, 22 September 2017 (UTC)

The short answer is that it is a consequence of historical developments, which can only be partially explained by the nature of the languages involved. The 2-letter codes in 639-1 were based on major languages, and if I remember right, were the first to be adopted. 639-2 came later, as an outgrowth of library cataloguing standards. At the time it was proposed, equivalences between 639-1 and 639-2 were established (for languages that were included in both). Considerably later, 639-3 was proposed to greatly expand the number of languages covered by the standard, and to provide a system whereby the code set could be amended in an orderly way. ISO turned to Ethnologue, which had a set of 3-letter codes that provided the coverage they wanted, but there were some cases where Ethnologue's codes didn't match 639-2--and the desire was that the new 639-3 codes for individual languages would not negate or conflict with 639-2 codes. So, about 10% of the Ethnologue codes were changed, and the result became the first version of 639-3. Now that 639-3 is established, essentially all new codes for languages are added to 639-3, since if a language isn't already in 639-1 or 639-2, it wouldn't be big enough or have enough established literature to qualify for either of those two. So, in essence, 639-3 provides a way to expand the code set provided by 639-2 without negating the earlier standard (which, when we're talking about standards, is something that you generally don't want to do--it goes against the meaning of what it is to be a standard). I agree that it would be nice to spell this out in the article, but I chose to summarize things here because I'm doing this from memory and don't have citable sources at hand. So, I'd encourage anyone who does to make the change. AlbertBickford (talk) 02:01, 23 September 2017 (UTC)
I see, thanks. So it was an organic growth process, rather than a cut-and-dry rule (eg, 639-1 is for languages with official status, 639-2 is for languages with some sort of distinction, 639-3 is for all languages). That makes it a little more complicated, but I still think the article should cover the system of classification, however logical it is or is not. Xcalibur (talk) 02:44, 23 September 2017 (UTC)
Yes, that's a good way to put it: an organic growth process rather than a cut-and-dry rule. Because of the history, the three standards embrace different types of languages, reflecting what each was trying to include at the time it was established. So, it seems like there should be a rationale for what's in each standard, and there was, sort of. There are lots of cases where such decisions could be questioned after the fact, but because they are standards, nobody is going to change them. AlbertBickford (talk) 04:06, 23 September 2017 (UTC)

External links modified

Hello fellow Wikipedians,

I have just modified one external link on ISO 639. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

As of February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{sourcecheck}} (last update: 15 July 2018).

  • If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
  • If you found an error with any archives or the URLs themselves, you can fix them with this tool.

Cheers.—InternetArchiveBot (Report bug) 11:06, 10 November 2017 (UTC)