Question on c == 178 || c == 179 || c == 185 in tokenizer

Developer forums (C::B DEVELOPMENT STRICTLY!) > CodeCompletion redesign


MortenMacFly:
Probably I am missing something, but I don't see what the problem is.

The C++ standard in theory allows Unicode characters in variable names (if you encode the file properly and use some voodoo command-line switches with GCC, for example). So what wxIsalpha does is correct. With wxIsdigit it's different, because "²" and the like are really not digits; hence the work-around.
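The condition in the thread title, c == 178 || c == 179 || c == 185, matches the code points U+00B2 (²), U+00B3 (³) and U+00B9 (¹), the Latin-1 superscript digits. The sketch below shows the general shape of such a work-around. It is not the actual Code::Blocks tokenizer code: wxIsdigit is replaced by a hypothetical stand-in, UnicodeIsDigitStandIn, that models only what this thread says a Unicode-aware digit test reports for these characters, and IsTokenizerDigit is likewise a hypothetical name.

```cpp
// Hypothetical stand-in for a Unicode-aware digit test such as wxIsdigit.
// It models only the Latin-1 range: ASCII '0'-'9' plus the superscript
// digits U+00B2 (178), U+00B3 (179) and U+00B9 (185), which Unicode
// classifies as numbers.
constexpr bool UnicodeIsDigitStandIn(wchar_t c)
{
    return (c >= L'0' && c <= L'9') || c == 178 || c == 179 || c == 185;
}

// Shape of the work-around: reject the superscripts that the Unicode-aware
// test accepts, because C++ never treats them as digits.
constexpr bool IsTokenizerDigit(wchar_t c)
{
    return !(c == 178 || c == 179 || c == 185) && UnicodeIsDigitStandIn(c);
}

// Compile-time usage examples.
static_assert(IsTokenizerDigit(L'7'), "ASCII digits are digits");
static_assert(UnicodeIsDigitStandIn(L'\u00B3'), "the stand-in accepts superscript three");
static_assert(!IsTokenizerDigit(L'\u00B3'), "the work-around rejects superscript three");
```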

If a user tries to compile a file with strange variable names and hasn't set everything up properly, the compiler will complain anyway.

So what bug are you trying to fix? Is there a combination / source code that does not work properly? Can you provide a test case then?

ollydbg:

--- Quote from: MortenMacFly on March 29, 2012, 09:32:14 am ---So what bug are you trying to fix?

--- End quote ---
I'm reading through the code, and I think we can remove this #ifdef snippet and make the source code easier to read and understand.

--- Quote ---Is there a combination / source code that does not work properly? Can you provide a test case then?

--- End quote ---
The change I suggest does not fix any bug; it is just a bit of refactoring.

thomas:

--- Quote from: MortenMacFly on March 29, 2012, 09:32:14 am ---With wxIsdigit its different because "²" and stuff are really no digits, thus the work-around.
--- End quote ---
Unfortunately, this is not a bug; wxWidgets is correct for once.

Unicode is admittedly absurd in many places, and this is one of them. But it is pointless to discuss whether it makes sense, or whether it's "correct": Unicode is the standard, and it defines it that way, so it is correct by definition. It's total bull, and it doesn't even make sense, but it is correct.

For example, ³ is U+00B3 SUPERSCRIPT THREE, categorized as "Number, other" (No) and assigned the numeric value 3. See here for a nice tabular breakdown.

MortenMacFly:

--- Quote from: thomas on March 29, 2012, 12:04:43 pm ---
--- Quote from: MortenMacFly on March 29, 2012, 09:32:14 am ---With wxIsdigit its different because "²" and stuff are really no digits, thus the work-around.
--- End quote ---
Unluckily, this is no bug, wxWidgets is correct for once.

--- End quote ---
I didn't say it's a bug in wxWidgets (it's not!). I said it's wrong in our case, because an assignment like:

--- Code: ---int i = ³;
--- End code ---
...and a variable like:

--- Code: ---int i³ = 5;
--- End code ---
are not going to work.

thomas:
Yes, but according to Unicode, both are perfectly legitimate. And it is correct for wxIsdigit to say that it's a digit, because it is. Unfortunately, that's not what we're interested in.

In C++, even your second snippet is strictly legitimate (believe it or not!), because universal-character-name (without further explanation!) is allowed in identifiers, as are "other implementation-defined characters" (whatever those may be).

Funnily enough, the standard defines exactly what digits (0-9) and nondigits (a-z, A-Z, and _) are, but the specification text later talks about letters and digits without specifying what "letter" refers to, or what the difference is between "letter and nondigit", or between "digits, nondigits, and pretty much every character" and "just every character". And there is no mention of universal-character-name in the text, either.
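The two character classes that the standard does spell out can be written down directly. This is a minimal, ASCII-only sketch (it assumes an ASCII-compatible character set so that 'a'-'z' and 'A'-'Z' are contiguous) that deliberately ignores universal-character-names and "other implementation-defined characters"; all function names are hypothetical.

```cpp
// digit: 0-9 (the standard guarantees these are contiguous).
constexpr bool IsBasicDigit(char c)
{
    return c >= '0' && c <= '9';
}

// nondigit: a-z, A-Z and underscore (contiguity of letters assumes ASCII).
constexpr bool IsBasicNondigit(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
}

// Remaining characters of an identifier: digits or nondigits up to '\0'.
constexpr bool IsBasicIdentifierTail(const char* s)
{
    return *s == '\0' || ((IsBasicDigit(*s) || IsBasicNondigit(*s)) && IsBasicIdentifierTail(s + 1));
}

// identifier: a nondigit, followed by any mix of digits and nondigits.
constexpr bool IsBasicIdentifier(const char* s)
{
    return IsBasicNondigit(*s) && IsBasicIdentifierTail(s + 1);
}

// Compile-time usage examples.
static_assert(IsBasicIdentifier("i3"), "a nondigit followed by a digit");
static_assert(!IsBasicIdentifier("3i"), "identifiers cannot start with a digit");
static_assert(!IsBasicIdentifier("i\xB3"), "a Latin-1 superscript three is outside the basic classes");
```

Everything the standard leaves vague (universal-character-names, implementation-defined characters) falls outside this sketch, which is exactly the gap described above.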

On the other hand, for integer literals C++ very clearly defines what can go into the literal, and ² and ³ are not in the list (although they are digits).

Which, I agree, is all in all totally absurd. Here we have, once again, a proof of concept for "internationalization is shit".

We might actually be better off using find_first_of("xX0123456789ABCDEFabcdef"), because that much more closely matches what C/C++ understands as a number (and the same set with A-Z, a-z, and underscore added for identifiers).
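The suggestion above can be sketched with std::wstring::find_first_of. This minimal version assumes the tokenizer works on wchar_t characters; IsNumberChar and IsNumberToken are hypothetical helper names, not part of Code::Blocks.

```cpp
#include <cassert>
#include <string>

// The ASCII-only set proposed in the thread. Suffixes, binary literals and
// C++14 digit separators are deliberately not included.
const std::wstring kNumberChars = L"xX0123456789ABCDEFabcdef";

// True if c is one of the characters above. Superscripts such as
// U+00B2/U+00B3/U+00B9 are rejected automatically, so no special-casing
// of 178/179/185 is needed.
bool IsNumberChar(wchar_t c)
{
    return kNumberChars.find_first_of(c) != std::wstring::npos;
}

// True if token is non-empty and every character is in kNumberChars.
bool IsNumberToken(const std::wstring& token)
{
    return !token.empty() && token.find_first_not_of(kNumberChars) == std::wstring::npos;
}

// Usage (for example inside main):
//   assert(IsNumberChar(L'F'));
//   assert(!IsNumberChar(L'\u00B3'));
//   assert(IsNumberToken(L"0x1F"));
```

This is a coarse filter rather than a literal validator: it also accepts strings such as "xx" that are not valid literals. Its advantage is that it never depends on what the platform's Unicode tables consider a digit.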

Actually, why hasn't anyone reported problems with Ogham and Klingon numbers yet?
