Author Topic: CB and ubuntu (Read 19186 times)

thomas · « **Reply #15 on:** April 26, 2006, 11:57:05 am »

toupper and tolower are used because they offer an easy way to distinguish between nodes and leaves in the configuration, and the configuration can be accessed the fastest possible way.

You can store your configuration for example like this:
<path name="foo"> <path name="bar"> <key name="x" value="y" /> </path> </path>

In fact, most projects use XML in this "standard" way, too, as the structure is clear, you can design a DTD, almost every web browser will display the file just fine, and it is easy to edit nodes by hand.
However, this is not desirable at all. Many people see the fact that something uses XML as a clear invitation to regularly edit the file in Notepad. We had many false bug reports and troubleshooting issues in the past because someone edited project files by hand and forgot a closing tag. That is unnecessary grief which would not happen if people did not say "why, I can edit it, so it is meant to be edited". If we were using SQLite or Berkeley DB as a storage backend, nobody would ever think about this.

Remember, we are not writing a hypertext document or something else that needs to be displayed in a browser, nor anything a human needs to read or understand at all. We don't care about DTDs or anything of that kind. All we need is a structured, flexible data storage.

Another major issue is speed. The configuration is accessed many thousand times (sometimes 50-60 times per second), so it can become a major bottleneck if care is not taken.
Following the above "standard" scheme, you have to iterate recursively through the path to find the route to a key, each time asking the XML engine for a node pointer, and compare its name attribute. Also, you have to visit each and every key node in a subpath sequentially and compare its name.

On the other hand, what if a path node had no name attribute, but its tag were its name? Then you could just ask the XML engine for the first child node of type "name", no need to iterate anything. The same would work for keys, but you need a way to somehow distinguish keys and path nodes. That would turn O(n) into O(1) for accessing a value¹.
This led to the scheme used in the Code::Blocks configuration. Path nodes are lowercase, keys are uppercase, and the tag is the node's/key's name:
<foo> <bar> <X value="y" /> </bar> </foo>
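
To make the naming rule concrete, here is a minimal sketch of how a slash-separated path could be turned into tag names for this scheme. The function name toConfigTags and the "/foo/bar/x" path syntax are assumptions made for illustration; this is not the actual ConfigManager code.

```cpp
#include <string>
#include <vector>

// ASCII-only case helpers (assumption: names are plain English words).
char lowerAscii(char c) { return (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c; }
char upperAscii(char c) { return (c >= 'a' && c <= 'z') ? static_cast<char>(c - 'a' + 'A') : c; }

// Hypothetical helper: split "/foo/bar/x" at '/' and return the tag names,
// lowercase for path nodes and uppercase for the final key.
std::vector<std::string> toConfigTags(const std::string& path)
{
    std::vector<std::string> tags;
    std::string current;
    for (char c : path) {
        if (c == '/') {
            if (!current.empty()) {
                tags.push_back(current);
                current.clear();
            }
        } else {
            current += c;
        }
    }
    if (!current.empty())
        tags.push_back(current);

    // The last segment is the key, everything before it is a path node.
    for (std::size_t i = 0; i < tags.size(); ++i) {
        const bool isKey = (i + 1 == tags.size());
        for (char& c : tags[i])
            c = isKey ? upperAscii(c) : lowerAscii(c);
    }
    return tags;
}
```

With this sketch, toConfigTags("/foo/bar/x") yields the tags foo, bar and X, which correspond to the nested elements shown above.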

Although this looks weird and clearly makes things like a DTD impossible, it is nevertheless valid, and an order of magnitude faster than the other solution. Also, it allows for a couple of other good things.

Of course, for this to work, only pathnames and keys that constitute valid xml tags can be used, but that is normally not a problem. For practical reasons (all devs have to be able to read the names) we only use English names, anyway.

The problem with the Turkish locale, in my opinion, is that it does something that is actually not right. It silently transforms ANSI characters to "strange Unicode chars" one way.
Very well, you could still argue "but that is how we use them in Turkish". However, it does not do the backwards conversion "correctly" ("correctly" means "non-Turkish" :lol:), and that is really bad. toupper(tolower(x)) does not give you the same as toupper(x)! :shock:
I mean, maybe that is really how it should be, but to me it seems quite wrong. It is like (5 + 3) - 3 != 5.

Anyway, let's not get philosophic on how localisation should be :lol:
Both toupper and tolower are now custom locale-unaware functions, so that problem should be solved.
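
For readers wondering what a locale-unaware conversion can look like, here is a minimal sketch. The names asciiToLower, asciiToUpper, asciiLowerCopy and asciiUpperCopy are made up for this example and are not the functions actually used in Code::Blocks.

```cpp
#include <string>

// Only 'A'..'Z' and 'a'..'z' are converted. Every other byte passes through
// unchanged, so the current locale cannot influence the result.
constexpr char asciiToLower(char c)
{
    return (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c;
}

constexpr char asciiToUpper(char c)
{
    return (c >= 'a' && c <= 'z') ? static_cast<char>(c - 'a' + 'A') : c;
}

// The round trip holds by construction for these functions.
static_assert(asciiToUpper(asciiToLower('I')) == asciiToUpper('I'),
              "round trip must be stable");

// Whole-string variants (take the string by value and return the copy).
std::string asciiLowerCopy(std::string s)
{
    for (char& c : s)
        c = asciiToLower(c);
    return s;
}

std::string asciiUpperCopy(std::string s)
{
    for (char& c : s)
        c = asciiToUpper(c);
    return s;
}
```

Because no locale is consulted, 'i' always maps to 'I' and back, whatever language the system is set to.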

¹ In reality, things are a lot more complicated. tinyXML indeed does a linear search internally, so we do actually have a complexity of O(n). However, with the "standard" scheme we would have to add another linear search on top of that, giving us a total O(n²). So the correct figures would be O(n²) versus O(n). The principle is the same; I did not want to make it more complicated than necessary.
However, if tinyXML were optimized one day to do a map lookup instead of a linear search, we would have O(n²) versus O(log(n)) which would really be *a lot* more favourable.

Pecan · « **Reply #16 on:** April 26, 2006, 01:42:12 pm »

Nice explanation. Something new learned.

thanks
pecan

bluekid · « **Reply #17 on:** April 26, 2006, 02:21:00 pm »

excuse me i dont understand

Quote

The problem with the Turkish locale, in my opinion, is that it does something that is actually not right. It silently transforms ANSI characters to "strange Unicode chars" one way.

i dont understand XML but i think there must be a method to force a language
am i wrong ?

thomas · « **Reply #18 on:** April 26, 2006, 03:37:44 pm »

Generally, XML is able to store any kind of text or binary data that you can think of.
However, the format of a tag name is a lot more restrictive.

Quote from: xml standard

A Name is a token beginning with a letter or one of a few punctuation characters, and continuing with letters, digits, hyphens, underscores, colons, or full stops, together known as name characters
[...]
Name ::= (Letter | '_' | ':') (NameChar)*
NameChar ::= Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender

tinyXML is particularly pedantic when it comes to that. If you use any illegal character in a tag name, it will refuse to parse the entire document.
And more, tinyXML even treats a few characters that are actually legal as illegal (for example the colon) :lol:
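
As an illustration of how restrictive that is in practice, here is a small sketch of a check for "safe" tag names. It is a deliberately simplified, ASCII-only subset of the Name production quoted above, and it rejects the colon because tinyXML does not accept it. The function isSafeTagName is hypothetical and not part of tinyXML or Code::Blocks.

```cpp
#include <string>

bool isAsciiLetter(char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

bool isAsciiDigit(char c)
{
    return c >= '0' && c <= '9';
}

// Simplified check: the first character must be an ASCII letter or '_',
// the rest may be ASCII letters, digits, '.', '-' or '_'.
// Anything else, including ':' and any non-ASCII byte, is rejected.
bool isSafeTagName(const std::string& name)
{
    if (name.empty())
        return false;
    if (!isAsciiLetter(name[0]) && name[0] != '_')
        return false;
    for (char c : name) {
        const bool ok = isAsciiLetter(c) || isAsciiDigit(c)
                     || c == '.' || c == '-' || c == '_';
        if (!ok)
            return false;
    }
    return true;
}
```

A name such as "foo_bar" passes, while "1abc", "ns:tag" or a name containing a non-ASCII byte does not.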

We use plain English words and underscores as path/key names for the configuration. In addition to this, ConfigManager silently replaces a range of commonly appearing characters that are illegal or that might cause trouble with an underscore (any occurrence of " -:.,;!\"$%&()[]<>{}?*+|#" is replaced). This is also a safety measure for a contributor or plugin developer who might not know anything about that (or against someone who is being deliberately hostile).
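
The replacement described above can be sketched in a few lines. The character set is copied from the list in this post; the function name sanitizeName is an assumption for the example, and the real ConfigManager code may differ in detail.

```cpp
#include <string>

// Replace every character from the "troublesome" set with an underscore.
std::string sanitizeName(std::string name)
{
    // Character set as listed in the post above.
    static const std::string kReplaced = " -:.,;!\"$%&()[]<>{}?*+|#";
    for (char& c : name) {
        if (kReplaced.find(c) != std::string::npos)
            c = '_';
    }
    return name;
}
```

For example, sanitizeName("my key.name") gives "my_key_name", and a name that contains none of the listed characters is returned unchanged.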

Another safety measure is the exception that you have seen. It may seem harsh to terminate the application with an exception, but there is a good reason for this. The exception prevents you from making the entire configuration file invalid. If you encounter such a situation, this is not simply a "condition" but a serious design mistake. Therefore, we don't just display a warning, but we stop you before you can do any actual damage.

But now back to the problem: the functions toupper and tolower turn "normal English" into "not so normal English" in Turkish locale by introducing characters that are well outside the ANSI range. So suddenly those tag names become illegal. It is not XML that is having a problem, it is the locale. The exception that you see is only the symptom, not the cause.

bluekid · « **Reply #19 on:** April 27, 2006, 07:27:11 am »

so what is the solution
what can i do

thomas · « **Reply #20 on:** April 27, 2006, 09:33:15 am »

Nothing, it should work now.

bluekid · « **Reply #21 on:** May 03, 2006, 10:30:50 am »

i re-installed ubuntu with english language and now i use CodeBlocks, there is no problem for me
but how can i advise people in my country to use CodeBlocks ?

in my opinion the problem can be solved with path/key names without the letter 'i' or 'I'

bluekid · « **Reply #22 on:** May 09, 2006, 08:29:01 am »

i re-installed ubuntu in turkish and, using the 03 May 2006 build, installed C::B without a problem

thanks
