The 27 July 2009 build (5716) is out.

Jenna:
@vix:
could you please send me example files that fail, if possible?
see: http://forums.codeblocks.org/index.php/topic,10191.msg70572.html#msg70572
Files with extremely few (non-ASCII) characters are always hard to decode.

It might help to switch to a multibyte encoding (utf-8), if that is a viable alternative.

vix:

--- Quote from: jens on August 04, 2009, 01:59:57 am ---could you please send me example files that fail, if possible?

--- End quote ---

just sent

Jenna:

--- Quote from: vix on August 04, 2009, 08:24:01 am ---
--- Quote from: jens on August 04, 2009, 01:59:57 am ---could you please send me example files that fail, if possible?

--- End quote ---

just sent

--- End quote ---
The reason is quite simple: I decided to also search for the latin-2 encoding, which is needed for languages like Bosnian, Croatian, Czech, Hungarian, Polish, Romanian, Serbian (when in the Latin script), Slovak, Slovenian, Upper Sorbian, and Lower Sorbian. Because latin-1 and latin-2 are so similar, this breaks the detection of some latin-1 encoded texts, though not all and not always (it depends on the characters used).
Firefox 3.5 still does not automatically detect these languages correctly (at least not in all cases).

I think I will add a configuration option so that the user can decide whether he/she wants to try to detect latin-2 as well (with a hint that it might break latin-1 detection).

I don't know of a better solution; the alternative would be to either not detect latin-2 at all or fail to detect some latin-1 encoded files.

By the way: I did not stumble over the problem myself, because if you use German umlauts (äöü) or the German sharp s (ß), even single-byte texts are detected correctly. I use utf-8 in almost all cases anyway (except at work, where I have to use Windows tools that do not all work correctly with Unicode).
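
The overlap between the two code pages can be shown in a few lines. This is only a minimal Python sketch using the standard `latin-1` and `iso8859-2` codecs; it is not Code::Blocks code, and the sample words are made up for illustration.

```python
# Illustration only: the same bytes are valid in both latin-1 and latin-2.

# The German characters sit at the same byte positions in both encodings,
# so a German text reads correctly whichever of the two is guessed.
german = "äöüß"
assert german.encode("latin-1") == german.encode("iso8859-2")

# Other accented letters share a byte value but not a meaning.
french = "très"                       # hypothetical latin-1 sample
raw = french.encode("latin-1")        # b'tr\xe8s'
as_latin1 = raw.decode("latin-1")     # the intended text
as_latin2 = raw.decode("iso8859-2")   # same bytes, wrong guess

print(as_latin1)
print(as_latin2)
```

Neither decode raises an error, so a detector that merely checks validity cannot tell the two apart. It has to judge which result looks more plausible, and with only a few non-ASCII characters there is little to judge by.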

vix:

--- Quote from: jens on August 04, 2009, 09:09:54 am ---The reason is quite simple, I decided to also search for latin-2 encoding needed for languages like Bosnian, Croatian, Czech, Hungarian, Polish, Romanian, Serbian (when in the Latin script), Slovak, Slovenian, Upper Sorbian, and Lower Sorbian, but it breaks some latin-1 encoded texts due to their similarity, not all and not always (depending on the characters used).

--- End quote ---

If I understood correctly, the encoding is not saved in the file, so C::B tries to guess the right one when the file is opened. So when you added the latin-2 encoding, some latin-1 files started to be seen as latin-2, and some characters are displayed incorrectly.
So I changed the option Settings >> Editor >> General Settings >> Use This Encoding to "As default encoding", so that C::B's auto-detection is bypassed, and this solved my problem.

If this is the situation, I think that your idea of a configuration option is the easiest one: I'm afraid that distinguishing latin-1 from latin-2 is not easy...

Biplab:

--- Quote from: vix on August 04, 2009, 11:54:40 am ---If I understood, the encoding is not saved into the file, so C::B tries to guess the right one when the file is opened. So when you add the latin-2 encoding, some latin-1 files are seen as latin-2, so some characters are displayed in a wrong way.
So I change the option Settings >> Editor >> General Settings >> Use This Encoding to "As default encoding", so the C::B's auto-detection is bypassed, and this solved my problem.

If this is the situation, I think that your idea of the configuration-option is the easiest one: I'm afraid that detecting latin-1 from latin-2 is not easy...

--- End quote ---

A file encoding is never saved to a file. At least I'm unaware of any widely popular method of storing file encoding data in a file. The reason is pretty simple: one needs to strip that data (the encoding detection data) before feeding the file to another program, or the other program has to know how to strip it.

Most known software performs encoding detection by sampling data from the file. There is the BOM for UTF-encoded files, but for other encodings, sampling and measuring the frequency of encoded characters is the only way to detect an encoding. Mozilla's encoding detection is an example of this.
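
The BOM check is the cheap part and can be sketched briefly. The following is a minimal Python illustration, not the Code::Blocks or Mozilla implementation; the function name `sniff_bom` is made up, and only the BOM constants from Python's standard `codecs` module are used.

```python
import codecs

# Byte order marks, with UTF-32 first: the UTF-32 LE mark begins with the
# same two bytes as the UTF-16 LE mark, so it has to be tested earlier.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

def sniff_bom(data):
    """Return a Python codec name if data starts with a known BOM, else None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None

with_bom = codecs.BOM_UTF8 + "für".encode("utf-8")
without_bom = "für".encode("latin-1")

print(sniff_bom(with_bom))      # utf-8-sig
print(sniff_bom(without_bom))   # None
```

A single-byte file such as the latin-1 sample carries no marker at all, which is why everything beyond this check has to rely on sampling the content.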

A large project with files encoded in a less popular encoding scheme will surely degrade Code::Blocks' performance. This was precisely the reason Yiannis objected (long ago) to the inclusion of Mozilla's encoding detection code in trunk. IMHO it should be turned on as an option only.

IMHO the solution should be to give the user three options:
1) Use a simple encoding detection scheme (to detect most of the popular encodings).
2) Use Mozilla's code to detect encoding (we should make it very clear that this option may affect performance in some cases).
3) Don't detect encoding.

And for each of the above options, the Fallback Encoding options shall be:
a) Use System encoding.
b) Use User-provided encoding.

Code::Blocks is an IDE, and IMO we should try not to turn it into a text editor or a browser.
