Author Topic: encoding/reading problem  (Read 21566 times)

Offline Khorne

  • Single posting newcomer
  • *
  • Posts: 5
encoding/reading problem
« on: September 25, 2017, 08:57:54 pm »
Hello,
I need help with an encoding problem. When I type non-English letters (Latin letters with Slavic diacritics), I get weird symbols. I tried changing the encoding to ISO 8859-1, ISO 8859-2, UTF-8, Windows-1250 and (default), and the only thing that changed was which weird symbols were displayed in the console. Picture below.
Help is badly needed!

Offline oBFusCATed

  • Developer
  • Lives here!
  • *****
  • Posts: 13413
Re: encoding/reading problem
« Reply #1 on: September 25, 2017, 09:03:08 pm »
I think you need to tell the compiler that the encoding of your source files is not ascii.

To find the option you need to pass you have to search the manual of your compiler...

Offline BlueHazzard

  • Developer
  • Lives here!
  • *****
  • Posts: 3353
Re: encoding/reading problem
« Reply #2 on: September 25, 2017, 09:18:09 pm »

Offline Khorne

  • Single posting newcomer
  • *
  • Posts: 5
Re: encoding/reading problem
« Reply #3 on: September 25, 2017, 09:41:42 pm »
Quote
I think you need to tell the compiler that the encoding of your source files is not ascii.

To find the option you need to pass you have to search the manual of your compiler...
What manual? I have the MinGW compiler.

Offline Khorne

  • Single posting newcomer
  • *
  • Posts: 5
Re: encoding/reading problem
« Reply #4 on: September 26, 2017, 03:51:55 pm »
Quote
unicode and windows are a bit difficult:
http://forums.codeblocks.org/index.php/topic,18803.msg128791.html#msg128791
Is there an even simpler tutorial or manual somewhere, with the steps described more clearly in simple English?
« Last Edit: September 26, 2017, 03:54:54 pm by Khorne »

Offline BlueHazzard

  • Developer
  • Lives here!
  • *****
  • Posts: 3353
Re: encoding/reading problem
« Reply #5 on: September 26, 2017, 05:23:10 pm »
OK, I will try...

[EDIT 31.01.2018] Update this:

Steps to do:
1) Put the line
Code
system("chcp 65001  > nul");
as the first line of your main:
Code
#include <stdio.h>
#include <windows.h>
int main()
{
    system("chcp 65001  > nul");        // Important code line executed before any printf call
    // your code here...
    printf("На берегу пустынных волн\n");  // some Russian
    printf("Я можу їсти скло, і воно мені не зашкодить\n");  // some Ukrainian
    printf("Mohu jíst sklo, neublíží mi. \n"); // some  Czech
    printf("Môžem jesť sklo. Nezraní ma. \n"); // some  Slovak
    printf("Mogę jeść szkło i mi nie szkodzi. \n"); // some  Polish
    return 0;
}

2) Save the file as UTF-8. In Code::Blocks, set the encoding with this menu entry:
Edit->File encoding->UTF-8
Then save the file:
File->Save file
Every source file you use should be set to UTF-8 encoding.

3) Compile and run. This will probably show gibberish (white squares) in the console. To fix this, you have to set the correct font:
3.1) After compiling and running, the console will open (the black window).
3.2) Right-click the top left corner of this window. A menu will open. Select "Properties".
3.3) In the window that opens, select the "Font" tab at the top.
3.4) Select Lucida Console in the font list.
3.5) Hit "OK".

Now you should see all characters correctly.

Limitations:
These limitations come from Windows, not Code::Blocks:
1) You can only use "printf"; "std::cout" will not work.
[EDIT:] You can use std::cout with this method. If it does not work, you have to update your compiler or use the UTF-8 literals supported by C++11.

2) You have to set the console font on every machine you run this code on, or you will always get this gibberish.

Advantages of this approach:
1) Your code runs on Linux, Windows and Mac (if you remove the system line and replace it with the appropriate line).
2) Your code works with any language that UTF-8 supports (as long as the font supports it; you don't have to switch encoding/codepage or deal with mixed handling like UTF-16).
3) You don't have to deal with wide character strings like "wchar_t" and "wcout".
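
One possible way to keep the same source file building on Linux, Windows and Mac is to hide the console setup behind a preprocessor check. The sketch below is only an illustration: the helper name enable_utf8_console is made up, and it swaps the system("chcp 65001") call from the steps above for the Win32 function SetConsoleOutputCP, which changes only the output code page. On other systems it assumes the terminal already expects UTF-8.

```c
#ifdef _WIN32
#include <windows.h>   /* SetConsoleOutputCP, CP_UTF8 */
#endif

/* Hypothetical helper (not part of the original post):
   switch the console output code page to UTF-8 on Windows
   and do nothing on other systems.
   Returns nonzero on success, 0 on failure. */
int enable_utf8_console(void)
{
#ifdef _WIN32
    /* CP_UTF8 is code page 65001, the same number passed to chcp. */
    return SetConsoleOutputCP(CP_UTF8) != 0;
#else
    /* Assumption: Linux and Mac terminals already use UTF-8. */
    return 1;
#endif
}
```

Calling enable_utf8_console(); as the first statement of main would then take the place of the system line. The console font still has to be set by hand as described in step 3.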

Disadvantages:
1) See Limitations
2) With this approach you are using UTF-8 encoded strings [1][2]. String functions don't work as expected [3]:
For example, this
Code
printf("string length: %i (24 expected)\n", (int)strlen("На берегу пустынных волн"));
prints this:
Code
string length: 45 (24 expected)
because strlen counts bytes, and each Cyrillic letter takes two bytes in UTF-8. And for example:
Code
char test[25] = "На берегу пустынных волн";
won't compile cleanly (GCC reports the initializer string as too long), because the string actually needs 46 bytes (45 bytes of text plus the terminating null),
but this will work:
Code
char test[] = "На берегу пустынных волн";
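
If you need the number of characters rather than the number of bytes, one common trick is to count every byte that is not a UTF-8 continuation byte (bit pattern 10xxxxxx). The following is a minimal sketch under the assumption that the input is valid UTF-8; utf8_length is a made-up name, not a standard library function, and it counts code points, so combining accents would be counted separately.

```c
#include <stddef.h>
#include <string.h>   /* strlen, for comparing byte count and character count */

/* Hypothetical helper: count the code points in a valid,
   null-terminated UTF-8 string. Continuation bytes have the
   top two bits set to 10, so they are skipped. */
size_t utf8_length(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            ++count;   /* this byte starts a new code point */
        }
    }
    return count;
}
```

For the Russian line from the example, strlen reports 45 bytes while utf8_length reports 24 code points, provided the source file is saved as UTF-8.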

But this is all too complicated... Is there not an easier way?
Yes, there is an easier way: you use the default Windows codepage of your system. I have never used this, because I don't want to handle Windows code pages, which are different for every language out there. If you are Korean you have to use a different code page than someone from Japan or from Russia. There are surely other websites out there that can explain this to you, but I won't ;)

I use an Asian language and did all your steps, but I still see only small white rectangles instead of the right symbols?
You have hit the limit of the Lucida Console font. You have to change to some other font like "Courier New". How do you do this? DuckDuckGo and "lucida console japanese" are your friends.

hope this helps...

[1] https://en.wikipedia.org/wiki/UTF-8
[2] http://utf8everywhere.org/
[3] http://www.zedwood.com/article/cpp-utf8-strlen-function
« Last Edit: January 31, 2018, 03:06:00 am by BlueHazzard »

Offline Khorne

  • Single posting newcomer
  • *
  • Posts: 5
Re: encoding/reading problem
« Reply #6 on: September 26, 2017, 06:11:18 pm »
Huge thanks for the help, it is finally working.
Questions:
1) Why did I have to do this manually? I mean, why did other people not have this problem and not have to fix anything, because it worked naturally (for example my classmates)?
2) With this method I will have to put that system("chcp 65001  > nul"); line in every time. Isn't there something "permanent"?
EDIT 3) Why did you use windows.h instead of stdlib.h?
« Last Edit: September 26, 2017, 06:16:37 pm by Khorne »

Offline BlueHazzard

  • Developer
  • Lives here!
  • *****
  • Posts: 3353
Re: encoding/reading problem
« Reply #7 on: September 26, 2017, 06:36:04 pm »
Quote
1) Why did I have to do this manually?
Because Windows decided a long time ago to use this Unicode approach (UTF-16 and different code pages), and time has shown that this was the wrong way and made everything more complicated... As I described above, I go the UTF-8 way. There are a lot of other ways that don't require changing anything and work out of the box, but I did not invest time in them...

Quote
I mean, why did other people not have this problem and not have to fix anything, because it worked naturally (for example my classmates)?
I do not fully understand this. Many roads lead to Rome... If this worked on one of your classmates' PCs, he probably uses the native-language codepage of Windows. I don't know, but it is possible that he uses Windows in his native language and you are using the English version of Windows? So he uses the native codepage and can therefore print his native-language "ASCII" glyphs. Or he uses the wide string approach (every string has to be written as L""). This is quite complicated and I cannot describe it in easy words. Search for codepage and Unicode on Wikipedia.
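
To make the difference between the two string types a bit more concrete, here is a small sketch that stores the same text once as a normal UTF-8 string and once as a wide string with the L prefix. It assumes GCC or MinGW and a source file saved as UTF-8; the names narrow_units and wide_units are made up for this illustration. It only compares lengths. Actually printing wide strings on Windows needs extra console setup that is not covered here.

```c
#include <stddef.h>
#include <string.h>   /* strlen */
#include <wchar.h>    /* wcslen */

/* The same text, once as UTF-8 bytes and once as wide characters. */
static const char    narrow_text[] = "На берегу пустынных волн";
static const wchar_t wide_text[]   = L"На берегу пустынных волн";

/* Number of char units (bytes) in the UTF-8 version. */
size_t narrow_units(void)
{
    return strlen(narrow_text);
}

/* Number of wchar_t units in the wide version. */
size_t wide_units(void)
{
    return wcslen(wide_text);
}
```

Under those assumptions the UTF-8 version is 45 units long and the wide version 24, because every character of this text fits into a single wchar_t.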

Quote
2) With this method I will have to put that system("chcp 65001  > nul"); line in every time. Isn't there something "permanent"?
https://duckduckgo.com/?q=windows+cmd+set+codepage+in+registry&t=ffsb

Quote
3) Why did you use windows.h instead of stdlib.h?
I thought "system()" was defined in windows.h. I was too lazy to look it up and it worked ;) Now I know it is in stdlib.h.

Offline Khorne

  • Single posting newcomer
  • *
  • Posts: 5
Re: encoding/reading problem
« Reply #8 on: September 27, 2017, 12:35:41 pm »
Thanks for the help.
I found this solution for a permanent change (registry editing):
1) Open a cmd console.
2) Type or paste this -> REG ADD HKCU\Console /v CodePage /t REG_DWORD /d 0xfde9
3) Press Enter and you are done.
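
The value 0xfde9 in that command is just the code page number 65001 written in hexadecimal, the same number that chcp takes. A tiny sketch to check the conversion, using a made-up helper name parse_hex_dword:

```c
#include <stdlib.h>   /* strtol */

/* Hypothetical helper: convert a hexadecimal value as typed in
   the REG ADD command (for example "0xfde9") to a number.
   With base 16, strtol accepts an optional 0x prefix. */
long parse_hex_dword(const char *text)
{
    return strtol(text, NULL, 16);
}
```

parse_hex_dword("0xfde9") gives 65001.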