Topic: encoding/reading problem

Offline Khorne

  • Single posting newcomer
  • *
  • Posts: 5
encoding/reading problem
« on: September 25, 2017, 08:57:54 pm »
Hello,
I need help with an encoding problem. When I type non-English letters (Latin letters with Slavic diacritics), I get weird symbols. I tried changing the encoding to ISO 8859-1, ISO 8859-2, UTF-8, Windows-1250 and the default, and the only thing that changed was which weird symbols were displayed in the console. Picture below.
Help is badly needed!

Offline oBFusCATed

  • Developer
  • Lives here!
  • *****
  • Posts: 10279
Re: encoding/reading problem
« Reply #1 on: September 25, 2017, 09:03:08 pm »
I think you need to tell the compiler that the encoding of your source files is not ASCII.

To find the option you need to pass, you have to search the manual of your compiler...

Offline BlueHazzard

  • Lives here!
  • ****
  • Posts: 1520
Re: encoding/reading problem
« Reply #2 on: September 25, 2017, 09:18:09 pm »

Offline Khorne

Re: encoding/reading problem
« Reply #3 on: September 25, 2017, 09:41:42 pm »
Quote
I think you need to tell the compiler that the encoding of your source files is not ASCII.

To find the option you need to pass, you have to search the manual of your compiler...
What manual? I have the MinGW compiler.

Offline Khorne

Re: encoding/reading problem
« Reply #4 on: September 26, 2017, 03:51:55 pm »
Quote
Unicode and Windows are a bit difficult:
http://forums.codeblocks.org/index.php/topic,18803.msg128791.html#msg128791
Is there an even simpler tutorial/manual somewhere, with the steps described more clearly in simple English?
« Last Edit: September 26, 2017, 03:54:54 pm by Khorne »

Offline BlueHazzard

Re: encoding/reading problem
« Reply #5 on: September 26, 2017, 05:23:10 pm »
OK, I will try...

Steps to do:
1) Put the line
Code: [Select]
system("chcp 65001  > nul");
as the first line in your main:
Code: [Select]
#include <stdio.h>
#include <windows.h>
int main()
{
    system("chcp 65001  > nul");        // Important code line executed before any printf call
    // your code here...
    printf("На берегу пустынных волн\n");  // some Russian
    printf("Я можу їсти скло, і воно мені не зашкодить\n");  // some Ukrainian
    printf("Mohu jíst sklo, neublíží mi. \n"); // some  Czech
    printf("Môžem jesť sklo. Nezraní ma. \n"); // some  Slovak
    printf("Mogę jeść szkło i mi nie szkodzi. \n"); // some  Polish
    return 0;
}

2) Save the file as UTF-8. Do this through the following menu entries in Code::Blocks:
Edit->File encoding->UTF-8
Save the file:
File->Save file
Every code file you use should be set to UTF-8 encoding.

3) Compile and run. This will probably show gibberish (white squares) in the console... To change this you have to set the correct font:
3.1) After you compile and run, the console will open (the black window)
3.2) In this window, right-click on the top left corner: a menu will open. Select "Properties"
3.3) In the window that opens, select the "Font" tab at the top
3.4) Select Lucida Console in the font list
3.5) Hit "OK"

Now you should see all characters correctly.
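
As an alternative to the system("chcp 65001  > nul"); line in step 1, the code page can probably also be switched directly through the Windows console API. The following is only a minimal sketch: the function name enable_utf8_console is made up for this example, and it assumes that changing the output code page is enough for your program (it does not touch console input). The #ifdef guard is there so the same file still compiles on Linux or Mac.

```c
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper (not from the posts above): switch the console
   output code page to UTF-8 (65001).
   Returns 1 on success and 0 on failure. On non-Windows systems there
   is nothing to switch, so it simply reports success. */
int enable_utf8_console(void)
{
#ifdef _WIN32
    /* SetConsoleOutputCP returns nonzero on success. */
    return SetConsoleOutputCP(CP_UTF8) != 0;
#else
    return 1;
#endif
}
```

Call enable_utf8_console(); as the first statement in main, before any printf. The source file still has to be saved as UTF-8 and the console font still has to be changed as described in steps 2 and 3.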

Limitations:
These limitations come from Windows, not Code::Blocks:
1) You can only use "printf"; "std::cout" will not work
2) You have to set the console font on every machine you run this code on, or you will always get this gibberish

Advantages of this approach:
1) Your code runs on Linux, Windows and Mac (if you remove the system line and replace it with the appropriate line)
2) Your code runs with any language UTF-8 supports (as far as the font supports it; you don't have to switch encoding/codepage or have some mixed handling like UTF-16)
3) You don't have to deal with wide character strings like "wchar_t" and "wcout"

Disadvantages:
1) See Limitations
2) With this approach you are using UTF-8 encoded strings [1][2]. String functions don't work as expected [3].
For example, this
Code: [Select]
printf("string length: %i (24 expected)\n", (int)strlen("На берегу пустынных волн"));
prints this:
Code: [Select]
string length: 45 (24 expected)
because strlen counts bytes, and each Cyrillic letter takes two bytes in UTF-8. And for example:
Code: [Select]
char test[25] = "На берегу пустынных волн";
won't compile cleanly, because the string actually needs 46 bytes (45 bytes of UTF-8 plus the terminating null),
but this will work:
Code: [Select]
char test[] = "На берегу пустынных волн";
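
If you need the number of characters rather than the number of bytes, you have to count them yourself. Here is a small sketch of one way to do it; the name utf8_length is invented for this example, and it assumes the string is valid UTF-8 (it does no error checking). It counts every byte that is not a UTF-8 continuation byte.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count the Unicode code points in a
   null-terminated UTF-8 string.
   Assumption: the input is valid UTF-8. Continuation bytes have the
   bit pattern 10xxxxxx, so every other byte starts a new code point. */
size_t utf8_length(const char *s)
{
    size_t count = 0;

    assert(s != NULL);
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

With the Russian line from above saved in a UTF-8 source file, strlen gives 45 and utf8_length gives 24. Note that this counts code points, which is not always what a reader sees as one letter (combining accents are counted separately).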
But this is all too complicated... Is there not an easier way?
Yes, there is an easier way: use the default Windows code page of your system. I have never used this, because I don't want to handle Windows code pages, which are different for every language out there. If you are Korean you have to use a different code page than someone from Japan or Russia. There are surely other websites out there that can explain this to you, but I won't ;)

I use some Asian language and have done all your steps, but I still see only small white rectangles instead of the right symbols?
You have hit the limit of the Lucida Console font. You have to change to some other font like "Courier New". How do you do this? DuckDuckGo and "lucida console japanese" are your friends

hope this helps...

[1] https://en.wikipedia.org/wiki/UTF-8
[2] http://utf8everywhere.org/
[3] http://www.zedwood.com/article/cpp-utf8-strlen-function
« Last Edit: September 26, 2017, 05:26:25 pm by BlueHazzard »

Offline Khorne

Re: encoding/reading problem
« Reply #6 on: September 26, 2017, 06:11:18 pm »
Huge thanks for the help, it's finally working.
Questions:
1) Why did I have to do this manually? I mean, why did others (for example my classmates) not have this problem and not have to fix anything, because it simply worked for them?
2) With this method I will have to put the system("chcp 65001  > nul"); line in every time. Isn't there something "permanent"?
EDIT 3) Why did you use windows.h instead of stdlib.h?
« Last Edit: September 26, 2017, 06:16:37 pm by Khorne »

Offline BlueHazzard

Re: encoding/reading problem
« Reply #7 on: September 26, 2017, 06:36:04 pm »
Quote
1) Why did I have to do this manually?
Because Windows decided a long time ago to use this Unicode approach (UTF-16 and different code pages), and time showed that this was the wrong way and made everything more complicated... As I described above, I go the UTF-8 way. There are a lot of other ways that don't need any changes and work out of the box, but I did not invest time in them...

Quote
I mean, why did others (for example my classmates) not have this problem and not have to fix anything, because it simply worked for them?
I do not fully understand this. Many roads lead to Rome... If this was working on one of your classmates' PCs, he probably uses the native-language code page of Windows. I don't know, but it could be that he uses Windows in his native language and you are using the English version of Windows? So he uses the native code page and can therefore print his native language's "ASCII" glyphs. Or he uses the wide string approach (every string literal has to be written as L""). This is quite complicated and I cannot describe it in easy words. Search for "code page" and "Unicode" on Wikipedia...
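
One way to check the code page guess is to print the code pages on both machines and compare them. This is only a sketch: the two function names are invented for this example, while GetConsoleOutputCP and GetACP are the Windows API calls that report the console's current output code page and the system's ANSI code page. On non-Windows systems the helpers just return 0.

```c
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: code page the console currently uses for output.
   Returns 0 on non-Windows systems. */
unsigned console_output_codepage(void)
{
#ifdef _WIN32
    return (unsigned)GetConsoleOutputCP();
#else
    return 0;
#endif
}

/* Hypothetical helper: the system's ANSI code page, which depends on
   the Windows language/region settings.
   Returns 0 on non-Windows systems. */
unsigned system_ansi_codepage(void)
{
#ifdef _WIN32
    return (unsigned)GetACP();
#else
    return 0;
#endif
}
```

Printing both values with printf("%u %u\n", console_output_codepage(), system_ansi_codepage()); on your PC and on a classmate's PC should show whether the two machines really use different code pages.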

Quote
2) With this method I will have to put the system("chcp 65001  > nul"); line in every time. Isn't there something "permanent"?
https://duckduckgo.com/?q=windows+cmd+set+codepage+in+registry&t=ffsb

Quote
3) Why did you use windows.h instead of stdlib.h?
I thought "system()" was declared in windows.h; I was too lazy to look it up and it worked ;) Now I know it is in stdlib.h

Offline Khorne

Re: encoding/reading problem
« Reply #8 on: September 27, 2017, 12:35:41 pm »
Thanks for the help.
I found this solution for a permanent change (registry editing):
1) Open a cmd console.
2) Type/paste this -> REG ADD HKCU\Console /v CodePage /t REG_DWORD /d 0xfde9
3) Press Enter and you are done.
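
In case the value looks arbitrary: 0xfde9 is just 65001 (the UTF-8 code page from the chcp line) written in hexadecimal. A small sketch to check such values, with a made-up function name parse_codepage, assuming the text is either a plain decimal number or a hex number with a 0x prefix:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: parse a code page number written in decimal
   ("65001") or in hexadecimal with a 0x prefix ("0xfde9").
   Base 0 tells strtoul to pick the base from the prefix. */
unsigned long parse_codepage(const char *text)
{
    assert(text != NULL);
    return strtoul(text, NULL, 0);
}
```

parse_codepage("0xfde9") and parse_codepage("65001") both give 65001, so the registry command stores the same code page that chcp 65001 selects.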