Code::Blocks Forums

User forums => Using Code::Blocks => Topic started by: edison on October 15, 2014, 04:58:56 am

Title: Wrong file encode detected.
Post by: edison on October 15, 2014, 04:58:56 am
This is a bug that has existed for a long time.
Test code:
Code
#include <stdio.h>

int main(void)
{
    printf("Hello World! 测试");

    return 0;
}

Title: Re: Wrong file encode detected.
Post by: stahta01 on October 15, 2014, 05:36:42 am
I suggest posting a link to the file or attaching the file.

Also state the correct encoding and the wrong encoding value detected.

NOTE: If this is a program run-time issue, search for the solution, because it is NOT a CB issue.

It is posted somewhere on this board.

Tim S.
Title: Re: Wrong file encode detected.
Post by: edison on October 15, 2014, 05:48:50 am
Quote from: stahta01
I suggest posting a link to the file or attaching the file.
Also state the correct encoding and the wrong encoding value detected.
NOTE: If this is a program run-time issue, search for the solution, because it is NOT a CB issue.
It is posted somewhere on this board.
Tim S.

?
I have uploaded a screenshot that shows Notepad++ and CB opening the same file. The correct one is Notepad++.
Choosing to bypass the encoding detection is not a good solution.
Title: Re: Wrong file encode detected.
Post by: MortenMacFly on October 15, 2014, 08:52:39 am
Quote from: edison
This is a bug that has existed for a long time.

Sorry, but I can't reproduce it. I created a new file "main.c", copied and pasted your code snippet into it, and it looks exactly like it does in the forums and in Notepad...?!
My settings are:
- Encoding: Windows 1252
- Use this encoding "as fallback"
- Try to detect...: OFF
- If conversion fails... : ON

However, are you sure you've saved your file in a proper file format like UTF-8?
Title: Re: Wrong file encode detected.
Post by: edison on October 15, 2014, 11:13:02 am
I have created a video to demonstrate this issue:
https://vimeo.com/108988215

CB was run with default settings.

You can reproduce this problem by adding a language in the Windows Control Panel; here it is Simplified Chinese (the code page should be Windows-936, also known as GBK or cp936).
Title: Re: Wrong file encode detected.
Post by: MortenMacFly on October 16, 2014, 08:33:49 am
Quote from: edison
I have created a video to demonstrate this issue:

I've seen this video. I am asking again:

Quote from: MortenMacFly
However, are you sure you've saved your file in a proper file format like UTF-8?

From your video it seems not. It is also strange that you are not being warned about that issue. Usually C::B does so.
Title: Re: Wrong file encode detected.
Post by: edison on October 17, 2014, 06:27:23 am
Quote from: MortenMacFly
From your video it seems not. It is also strange that you are not being warned about that issue. Usually C::B does so.

I have uploaded another video, which shows that CB cannot correctly detect a UTF-8 file that it saved itself:
https://vimeo.com/109202854
Title: Re: Wrong file encode detected.
Post by: MortenMacFly on October 17, 2014, 07:45:26 am
Quote from: edison
I have uploaded another video, which shows that CB cannot correctly detect a UTF-8 file that it saved itself:
https://vimeo.com/109202854

Well, what happens is perfectly OK. Since you created a UTF-8 file without a BOM and have set up windows-936 as the default encoding, that encoding is used when the file is opened. There is no way to distinguish exactly between UTF-8 and windows-936 if the file contains only ANSI characters.

So either use UTF-8 with a BOM, or just start typing your Korean (or whatever) text into the file. :)
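
To illustrate why such a file is ambiguous: bytes below 0x80 mean the same thing in UTF-8 and in windows-936, so a file that contains nothing else is valid in both. A minimal sketch in C follows; `is_pure_ascii` is a hypothetical helper made up for this example and is not part of Code::Blocks.

```c
#include <stddef.h>

/* Hypothetical helper for illustration only (not Code::Blocks code).
   Returns 1 if every byte is below 0x80, i.e. the data is plain ASCII
   and therefore reads identically as UTF-8 and as windows-936. */
int is_pure_ascii(const unsigned char *data, size_t len)
{
    for (size_t i = 0; i < len; ++i) {
        if (data[i] >= 0x80) {
            return 0; /* a non-ASCII byte: the encodings may now differ */
        }
    }
    return 1;
}
```

When a check like this returns 1, the bytes alone give no hint about the intended encoding, so the configured default is as good a guess as any.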
Title: Re: Wrong file encode detected.
Post by: edison on October 17, 2014, 08:28:17 am
Quote from: MortenMacFly
Quote from: edison
I have uploaded another video, which shows that CB cannot correctly detect a UTF-8 file that it saved itself:
https://vimeo.com/109202854
Well, what happens is perfectly OK. Since you created a UTF-8 file without a BOM and have set up windows-936 as the default encoding, that encoding is used when the file is opened. There is no way to distinguish exactly between UTF-8 and windows-936 if the file contains only ANSI characters.

So either use UTF-8 with a BOM, or just start typing your Korean (or whatever) text into the file. :)

But why, if I save the file with the default encoding (windows-936), does CB detect it as a different encoding? Is that normal? Why don't other editors (for example Notepad++) have this problem?
Title: Re:
Post by: MortenMacFly on October 21, 2014, 10:48:19 pm
Because with the content you have in the file, there are multiple options for a valid encoding. There is no single solution, and editors handle that differently. That's why I said to enter some characters that make it easier for the detection engine to identify your language. We are using the same mechanism Mozilla uses, btw...
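
A rough sketch of why additional non-ASCII characters help: once such bytes are present, some candidate encodings can be ruled out. The function below is only a UTF-8 well-formedness check, not the Mozilla-style detector mentioned above; `is_valid_utf8` is a hypothetical name, and the windows-936 bytes B2 E2 CA D4 for 测试 are an assumption in the usage note.

```c
#include <stddef.h>

/* Hypothetical helper for illustration only.
   Returns 1 if the buffer is well-formed UTF-8, else 0. Rejects overlong
   forms, surrogates and code points above U+10FFFF. */
int is_valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t extra;      /* continuation bytes expected after the lead byte */
        unsigned long cp;  /* code point being decoded */
        unsigned long min; /* smallest code point allowed for this length */

        if (c < 0x80) { i++; continue; } /* plain ASCII byte */
        else if ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; min = 0x10000; }
        else return 0; /* stray continuation byte or invalid lead byte */

        if (len - i <= extra) return 0; /* sequence is cut off */
        for (size_t k = 1; k <= extra; ++k) {
            if ((s[i + k] & 0xC0) != 0x80) return 0; /* not 10xxxxxx */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;
        i += extra + 1;
    }
    return 1;
}
```

The UTF-8 bytes of 测试 (E6 B5 8B E8 AF 95) pass this check, while the assumed windows-936 bytes (B2 E2 CA D4) fail it. The reverse direction is harder, because UTF-8 bytes can also be read as legal windows-936 byte pairs, which is one reason detectors have to guess and editors can disagree.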
Title: Re:
Post by: MortenMacFly on October 21, 2014, 10:50:15 pm
...not to forget that another perfect solution is to use a file with a BOM, if the target compiler supports this.
Title: Re:
Post by: edison on October 22, 2014, 04:58:56 am
Quote from: MortenMacFly
...not to forget that another perfect solution is to use a file with a BOM, if the target compiler supports this.

But I encountered a problem when using UTF-8 with a BOM:
There are some unreadable characters in the first line (for example, the first line should be #include xxxx, but with UTF-8 with a BOM it is shown as ("??")#include xxxx in the CB editor).
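
For reference, the UTF-8 BOM is the three-byte sequence EF BB BF at the very start of the file. A reader that does not recognise and skip it may show those bytes as stray characters in front of the first line, which could be what is happening here, though that is only a guess. A minimal sketch of the check in C; `utf8_bom_length` is a hypothetical helper, not a Code::Blocks function.

```c
#include <stddef.h>

/* Hypothetical helper for illustration only.
   Returns 3 if the buffer starts with the UTF-8 BOM (EF BB BF), so the
   caller knows how many bytes to skip; returns 0 otherwise. */
size_t utf8_bom_length(const unsigned char *data, size_t len)
{
    if (len >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF) {
        return 3;
    }
    return 0;
}
```

A reader would skip the returned number of bytes and treat the rest of the file as UTF-8.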
Title: Re: Wrong file encode detected.
Post by: MortenMacFly on October 22, 2014, 07:34:52 am
I don't know what exactly you are doing wrong, but it works perfectly here:

Steps:
- Create a new file
- Enable the use of BOM
- Save as UTF-8
- Close the file
- Re-open the file
-> Result: UTF-8, regardless of whether I added ANSI or Unicode characters from your example.