Talk:Double-byte character set

From Wikipedia, the free encyclopedia

Since Unicode supports all the major languages in East Asia, unlike many other codepages, it is generally easier to enable and maintain software that uses Unicode.
Does this mean there are some other codepages that do? --Frungi 03:17, 11 July 2005 (UTC)

It's no help to redirect to a nonexistent page.

Character set / Encoding

I feel confused when I read that UTF-8 would be a character set while it is in fact a character encoding, a way to represent the characters (code points) of the Unicode planes. Is DBCS misnamed? Should it have been named "double-byte character encoding" instead, or does it really represent a set of symbols (characters)? Teuxe (talk) 18:16, 31 August 2010 (UTC)

That depends on whether you're asking whether the people who coined the term "double-byte character set" should have called it a "double-byte character encoding" (I would say "yes, they should have, to make it clearer what they're talking about", although I don't know whether, at that time, the "character set" vs. "character encoding" distinction was being properly drawn) or whether the page should be named "double-byte character encoding" rather than "double-byte character set" (I'd say that, if DBCS is the common term, it shouldn't be).
The page should note that it's an encoding; I've changed the first paragraph to use "character encoding" rather than "character set". Guy Harris (talk) 23:23, 25 January 2013 (UTC)
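
As a rough illustration of the set-versus-encoding distinction discussed above, here is a minimal Python sketch. The sample character and the list of encodings are my own choices for illustration, not anything taken from the article: one abstract character (one code point) is turned into different byte sequences by different encodings.

```python
# One abstract character, several byte-level encodings of it.
# The character and the encoding list are illustrative assumptions.
ch = "\u3042"  # HIRAGANA LETTER A

# The code point identifies the character within the character set.
code_point = ord(ch)

# Each encoding maps that same code point to a different byte sequence.
encodings = ["utf-8", "utf-16-be", "shift_jis", "euc_jp"]
encoded = {name: ch.encode(name) for name in encodings}

print(f"U+{code_point:04X}")
for name, data in encoded.items():
    print(f"{name:10} {data.hex()}  ({len(data)} bytes)")
```

The code point stays the same throughout; only the byte representation changes, which is why "encoding" is arguably the more precise word for what DBCS describes.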

DBCS/MBCS in Windows

In Microsoft Windows, MBCS denotes encodings that use a mixture of 1 and 2 bytes per character. In C and C++ using Microsoft's "generic-text mapping" this is enabled via the macro _MBCS. The documentation states that MBCS is DBCS, so in Windows DBCS also refers to 1/2 byte encodings.

Ref: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/unicode_90c3.asp http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vclib/html/_crt_using_generic.2d.text_mappings.asp

Perhaps get this into the main text?

Cheers,

- Alf
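
The mixture of 1 and 2 bytes per character described above can be sketched in Python, assuming Windows code page 932 (Shift_JIS) as the example encoding. The helpers `is_lead_byte` and `split_chars` are hypothetical names written for this illustration; they are not Windows or C runtime APIs, and the lead-byte ranges are the ones commonly documented for code page 932.

```python
def is_lead_byte(b):
    """Return True if b starts a 2-byte character in code page 932.

    Assumed lead-byte ranges for cp932: 0x81-0x9F and 0xE0-0xFC.
    """
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def split_chars(data):
    """Split a cp932 byte string into per-character byte chunks."""
    chars = []
    i = 0
    while i < len(data):
        # A lead byte consumes the following trail byte as well.
        width = 2 if is_lead_byte(data[i]) and i + 1 < len(data) else 1
        chars.append(data[i:i + width])
        i += width
    return chars


text = "A\u30421"              # Latin letter, hiragana letter, digit
data = text.encode("cp932")    # 1 byte + 2 bytes + 1 byte
units = split_chars(data)

for unit in units:
    print(unit.hex(), len(unit), "byte(s)")
```

The point of the sketch is that a byte count is not a character count in such an encoding, so code has to inspect lead bytes rather than index bytes directly.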

Always in East Asia?

Why are almost all double-byte character sets from East Asia? --84.61.7.180 16:11, 3 June 2006 (UTC)

Probably because most other cultures either use the Roman alphabet for writing, and thus mainly just need some accented versions of Roman-alphabet letters (thus requiring no more than 256 or so code points, so they can continue to use one byte), or use another small alphabet (thus also requiring only one byte); Chinese, Japanese, and Korean all use logograms or syllabaries, which require a lot more code points, thus requiring more than one byte. Guy Harris (talk) 23:28, 25 January 2013 (UTC)
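
The arithmetic behind that answer can be sketched as follows. This is only a back-of-the-envelope illustration: the choice of the CJK Unified Ideographs block (U+4E00 to U+9FFF) as the yardstick is my own assumption, and real double-byte encodings use fewer than all 65,536 two-byte values because some byte ranges are reserved.

```python
# Upper bounds on how many distinct values fit in one or two bytes.
ONE_BYTE = 2 ** 8     # 256
TWO_BYTES = 2 ** 16   # 65536

# Size of the basic CJK Unified Ideographs block, U+4E00..U+9FFF,
# used here only as a rough measure of how many characters are needed.
cjk_block = 0x9FFF - 0x4E00 + 1

# A small alphabet plus accents fits in one byte: Latin-1 assigns
# a distinct character to every one of the 256 byte values.
latin1_chars = {bytes([b]).decode("latin-1") for b in range(256)}

print("one byte holds at most ", ONE_BYTE)
print("two bytes hold at most ", TWO_BYTES)
print("basic CJK ideograph block:", cjk_block)
print("fits in one byte? ", cjk_block <= ONE_BYTE)
print("fits in two bytes?", cjk_block <= TWO_BYTES)
```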

DBCS on System i not terribly controversial

I work for a software company that builds software for the IBM System i (formerly AS/400 and iSeries). DBCS is certainly a complex topic, but not one that I would describe as particularly controversial for users of this platform. Poorly understood and hard to comprehend, perhaps. Also, using the term DBCS-enabled with other IBM System i users would not be ambiguous. Most applications that run on the IBM System i today use DBCS rather than Unicode, as Unicode is a rather late comer to this platform and has at least one major restriction on the System i platform that prevents its rapid adoption. That should be clarified. If DBCS is controversial and non-deterministic on other platforms, I would suggest a separate section to talk about DBCS on a per-platform basis. I'm new here, so I did not want to go nuts editing this article without feedback or guidance.

Marty Acks 00:41, 17 July 2007 (UTC)

Perhaps the article used to say DBCS on System i was controversial, but it no longer does so. Guy Harris (talk) 23:31, 25 January 2013 (UTC)

IBM DBCS

IBM supported a true two-byte DBCS encoding, based on EBCDIC, back in the 1990s. (For example, the code X'4040' was the DBCS encoding for a space character, corresponding to the single-byte EBCDIC X'40' character code, and to ASCII X'20' and Unicode U+0020.) IBM COBOL (VS II) supported it with the PIC G(n) picture clause specifier, where G presumably stood for a 16-bit "graphic" character, as well as the IS DBCS class condition expression. Based on some of the documents I have for it, this character set was intended mainly for Japanese/Asian applications. Here are some online references: 1, 2, 3, and 4. -- Loadmaster (talk) 19:00, 27 November 2013 (UTC)
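
The single-byte half of the space example above can be checked in Python, assuming code page 037 as a representative EBCDIC code page (my choice; the comment does not name one). Python's standard library has no codec for the IBM two-byte DBCS encoding itself, so the X'4040' value below is simply built by hand from the comment's description and is not decoded by any library.

```python
# Single-byte space in EBCDIC (code page 037 assumed) and in ASCII.
ebcdic_space = bytes.fromhex("40")
ascii_space = bytes.fromhex("20")

# Both decode to the same Unicode character, U+0020.
from_ebcdic = ebcdic_space.decode("cp037")
from_ascii = ascii_space.decode("ascii")

# DBCS space as described in the comment: X'4040', i.e. the EBCDIC
# space byte doubled. Built by hand; no stdlib codec is involved.
dbcs_space = ebcdic_space * 2

print(f"EBCDIC X'{ebcdic_space.hex().upper()}' -> U+{ord(from_ebcdic):04X}")
print(f"ASCII  X'{ascii_space.hex().upper()}' -> U+{ord(from_ascii):04X}")
print(f"DBCS   X'{dbcs_space.hex().upper()}' ({len(dbcs_space)} bytes)")
```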