Linux won't edit Jonsson umlauted name
Defender:
I have a general question related to source encodings.
Is it legal to have non-English (non-ASCII) single-byte characters in a C source file as a string literal? Does the compiler recognize multibyte characters in the source (for example, UTF-8 string literals)?
Thanks: Defender
thomas:
--- Quote from: Defender on May 18, 2006, 04:53:40 pm ---I have a general question related to source encodings.
Is it legal to have non-English (non-ASCII) single-byte characters in a C source file as a string literal? Does the compiler recognize multibyte characters in the source (for example, UTF-8 string literals)?
Thanks: Defender
--- End quote ---
Yes and no. It is legal as long as the compiler knows the input character encoding. Not all compilers let you specify it, but gcc, for example, does (gcc fully supports UTF-8-encoded sources, and on the majority of systems that is even the default).
You don't need to worry much about this: if you do have to specify the encoding, you will find out, because your source will immediately fail to compile with the message "Illegal byte sequence" :)
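To make the byte-versus-character distinction concrete, here is a minimal sketch in C. It spells the o-umlaut with explicit UTF-8 bytes, so the example does not depend on the encoding of the source file itself. The names umlaut_name and utf8_length are made up for this illustration. With gcc, the input encoding of a real source file can be stated with -finput-charset=<charset>; whether other compilers offer an equivalent is something to check in their documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* "Joensson" with a real o-umlaut, written as explicit UTF-8 bytes
   (0xC3 0xB6) so that this file stays pure ASCII. Hypothetical name. */
static const char umlaut_name[] = "J\xC3\xB6nsson";

/* Count code points in a NUL-terminated UTF-8 string by skipping
   continuation bytes (bit pattern 10xxxxxx). Assumes the input is
   already well-formed UTF-8; it does not validate anything. */
size_t utf8_length(const char *s)
{
    size_t count = 0;

    assert(s != NULL);
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For umlaut_name, strlen() reports 8 bytes while utf8_length() reports 7 characters, because the umlaut occupies two bytes in UTF-8.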
Defender:
Thanks for the info, thomas :)
thomas:
--- Quote from: MortenMacFly on May 18, 2006, 02:57:12 pm ---I guess it's even worse. As far as I understand variable names could be in unicode for some compilers, too.
--- End quote ---
Luckily, it says "A valid identifier is a sequence of one or more letters, digits or underscore characters", and "letter" is defined as [A-Za-z]. Phew... :)
You're right though, it is still difficult enough.
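As a rough illustration of the identifier rule quoted above, here is a small C sketch that accepts only [A-Za-z], digits and underscores. The function name is_plain_identifier is invented for this example, and the "must not start with a digit" check is my own addition based on the usual C rule, not part of the quoted sentence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Explicit range checks instead of isalpha(), which is locale-dependent.
   Assumes an ASCII-compatible execution character set, where A-Z and
   a-z are contiguous. */
static bool is_ascii_letter(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

static bool is_ascii_digit(unsigned char c)
{
    return c >= '0' && c <= '9';
}

/* Returns true if s is one or more letters, digits or underscores and
   does not start with a digit (the latter is an added assumption). */
bool is_plain_identifier(const char *s)
{
    assert(s != NULL);
    if (*s == '\0' || is_ascii_digit((unsigned char)*s))
        return false;

    for (; *s != '\0'; ++s) {
        unsigned char c = (unsigned char)*s;
        if (!is_ascii_letter(c) && !is_ascii_digit(c) && c != '_')
            return false;
    }
    return true;
}
```

Under this rule any byte above 0x7F, such as the two bytes of a UTF-8 umlaut, makes the name invalid, so identifiers are not affected by the encoding problem as long as the encoding is ASCII-compatible.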
--- Quote from: MortenMacFly on May 18, 2006, 02:57:12 pm ---Isn't it better to support a limited number of file types and try to determine when a file is of unknown type and request the user to convert it into something "general"? So rather than to try to be very smart be very strict?
--- End quote ---
That was the secret "Plan B" :) Though not the best solution, it would probably work well enough.
Possibly we'll have to settle for something like that (or a hybrid solution) in the end, as it is really not trivial, and it is quite possible that we don't find a better solution.
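For what the "be strict" approach could look like, here is a sketch in C of a check that a buffer is well-formed UTF-8, so that anything else can be rejected and the user asked to convert it. This is only an illustration under my own assumptions, not what Code::Blocks actually does; the name looks_like_utf8 is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if buf[0..len) is well-formed UTF-8. Rejects stray
   continuation bytes, truncated sequences, overlong encodings,
   surrogates and code points above U+10FFFF. */
bool looks_like_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    assert(buf != NULL || len == 0);
    while (i < len) {
        unsigned char c = buf[i];
        size_t extra;        /* number of continuation bytes expected */
        unsigned long cp;    /* decoded code point */
        unsigned long min;   /* smallest code point for this length */

        if (c < 0x80) {      /* plain ASCII byte */
            ++i;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            extra = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            extra = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            extra = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return false;    /* stray continuation or invalid lead byte */
        }

        if (len - i <= extra)
            return false;    /* sequence is cut off at end of buffer */

        for (size_t k = 1; k <= extra; ++k) {
            unsigned char t = buf[i + k];
            if ((t & 0xC0) != 0x80)
                return false;
            cp = (cp << 6) | (unsigned long)(t & 0x3F);
        }

        if (cp < min || cp > 0x10FFFFUL || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;

        i += extra + 1;
    }
    return true;
}
```

Pure ASCII passes trivially, while text in a single-byte encoding such as Latin-1 usually fails as soon as it contains an umlaut followed by an ordinary letter. A file that fails could then be treated as "unknown type". Note that passing the check does not prove the file was meant as UTF-8; it only shows that the bytes are consistent with it.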
MortenMacFly:
--- Quote from: thomas on May 18, 2006, 06:22:49 pm ---Luckily, it says "A valid identifier is a sequence of one or more letters, digits or underscore characters", and "letter" is defined as [A-Za-z]. Phew... :)
--- End quote ---
Yes, I got that part wrong. I was referring to an article about the C# language (compiler), but what was meant was the content of a variable, not the name... sorry. :oops: