ISO-8859-1 detection problems

rickg22:
Hi there. I'm having a problem with encoding detection on certain files. The files in question are written in ISO-8859-1, but when I open them, C::B claims they were written in ISO-8859-7.

The lines in question are:

--- Code: --- if(this.fundamentos_legales[j].descripcion.indexOf(this.nombre_legislaciones[k].legislacion) != -1
|| this.fundamentos_legales[j].descripcion.indexOf(this.nombre_legislaciones[k].legislacion.replace(/[áéíóúñ]/gi,'')) != -1
|| this.fundamentos_legales[j].descripcion.indexOf(this.nombre_legislaciones[k].legislacioncorto) != -1
|| this.fundamentos_legales[j].descripcion.indexOf(this.nombre_legislaciones[k].legislacioncorto.replace(/[áéíóúñ]/gi,'')) != -1)
{

--- End code ---

See the accented characters there? They throw off the autodetection (C::B displays them as "αινσϊρ"). I can't tell C::B to use ISO-8859-1 exclusively because I have UTF-8 files elsewhere. How can I tell C::B to use either ISO-8859-1/Windows-1252 *OR* UTF-8?

Please help! :(

EDIT: Bug reported in https://developer.berlios.de/bugs/?func=detailbug&bug_id=18316&group_id=5358

I have an idea of what C::B should do. You could specify in the project settings (or the global settings, maybe both) which encodings can be autodetected. If an opened file is detected as some other encoding, a confirmation dialog should appear.

"This file was detected as ISO-8859-7, but we could be mistaken. Do you wish to use ISO-8859-7 as the encoding, or open it with another encoding?"

Then you choose the other encoding, with the option to [ ] Always open as _______ (encoding goes here).

Jenna:
The problem with encoding detection is that it works better the more text it has to test; the biggest problem is isolated single characters.
It should work better if you add some Spanish (or other ISO-8859-1) comments.

We could also give Latin-1 detection precedence over the other detectors, but that would most likely break the detection of other encodings.
The Mozilla developers deliberately lowered the confidence of the Latin-1 prober to make detection more accurate:

--- Code: --- // lower the confidence of latin1 so that other more accurate detector
// can take priority.
confidence *= 0.50f;

--- End code ---

With the following comment (Italian, as far as I know), which I took from a file used to test encoding detection, your sample is detected correctly here:

--- Code: ---//L'albero è sul comò e perciò chissà perché non sarà più lì!!
--- End code ---

rickg22:
Thanks, but wouldn't that be a workaround more than a solution?

I just want to be able to specify which encoding this file uses, without having to modify it (it's part of a team project, and I shouldn't modify the file unless it's absolutely necessary).

Jenna:

--- Quote from: rickg22 on August 19, 2011, 10:55:27 pm ---Thanks, but wouldn't that be a workaround more than a solution?

I just want to be able to specify which encoding this file uses, without having to modify it (it's part of a team project, and I shouldn't modify the file unless it's absolutely necessary).

--- End quote ---
Yes, it's just a workaround.

Another solution would be to force the encoding at file level, but we would have to store this information somewhere.
The correct place would be the file's properties in the project file.
I have not looked into it, but it should probably not be too hard to implement.

But before working on it, I would like to see the opinion of other devs and users.

MortenMacFly:

--- Quote from: jens on August 20, 2011, 09:07:52 am ---But before working on it, I would like to see the opinion of other devs and users.

--- End quote ---
I wonder how this is handled in other IDEs (CodeLite or VS, for example). Does anyone know?
I don't recall ever seeing such flags in project files, so there might be a "smarter" way.
Rick: How would you do this in VS?
