Should the text output from the GCC compiler be treated as UTF-8 by default?

ollydbg:
Hi, I have a .cpp source file encoded as UTF-8 that contains CJK characters in its comments. When I build, the build log displays those comments as garbled text.


See the screenshot named: 2022-10-29-utf8.png

As a test, I changed the source file's encoding to GB2312. The build log then shows the CJK characters correctly; see the screenshot named: 2022-10-29-GB2312.png

So, my question is: should we assume that the source file is UTF-8 by default?

ollydbg:
I have a patch to fix this issue:


--- Code: ---From 90a6a42d30abea13cf4b23cc47868e2acc569aeb Mon Sep 17 00:00:00 2001
From: asmwarrior <a@b.com>
Date: Sat, 29 Oct 2022 11:21:56 +0800
Subject: correctly encoding convert from the GCC's message to wxString


diff --git a/src/plugins/compilergcc/compilergcc.cpp b/src/plugins/compilergcc/compilergcc.cpp
index d47dcf5a..edcec90d 100644
--- a/src/plugins/compilergcc/compilergcc.cpp
+++ b/src/plugins/compilergcc/compilergcc.cpp
@@ -3615,7 +3615,11 @@ void CompilerGCC::OnGCCError(CodeBlocksEvent& event)
 {
     wxString msg = event.GetString();
     if (!msg.IsEmpty())
-        AddOutputLine(msg);
+    {
+        wxString msg1 = wxString::FromUTF8(msg.c_str());
+        AddOutputLine(msg1);
+    }
+
 }
 
 void CompilerGCC::OnGCCTerminated(CodeBlocksEvent& event)


--- End code ---

The result looks good; see the screenshot named: 2022-10-29-utf8-fix.png

My question is: can we expect the text that GCC returns (which is in fact in the source file's encoding) to be UTF-8?

Miguel Gimenez:
The encoding of GCC's output will be the same as that of the source file, not necessarily UTF-8. I would check whether msg1 is empty, which indicates invalid UTF-8, and use the original string in that case:


--- Code: ---    {
        wxString msg1 = wxString::FromUTF8(msg.c_str());
        AddOutputLine(msg1.empty() ? msg : msg1);
    }

--- End code ---
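
For reference, the validity check that this fallback relies on can be sketched in standard C++ without wxWidgets. This is only an illustration: `IsValidUtf8` is a hypothetical helper, not part of Code::Blocks or wxWidgets, and it merely approximates the situation in which `wxString::FromUTF8` returns an empty string.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a Code::Blocks or wxWidgets function).
// Returns true if `bytes` is well-formed UTF-8. Overlong forms, UTF-16
// surrogates and code points above U+10FFFF are rejected.
bool IsValidUtf8(const std::string& bytes)
{
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n)
    {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        if (lead < 0x80) // plain ASCII byte
        {
            ++i;
            continue;
        }

        std::size_t len = 0;
        unsigned long cp = 0;
        if ((lead & 0xE0) == 0xC0)      { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else
            return false; // stray continuation byte or invalid lead byte

        if (i + len > n)
            return false; // sequence is truncated

        for (std::size_t k = 1; k < len; ++k)
        {
            const unsigned char cont = static_cast<unsigned char>(bytes[i + k]);
            if ((cont & 0xC0) != 0x80)
                return false; // not a 10xxxxxx continuation byte
            cp = (cp << 6) | (cont & 0x3F);
        }

        // Reject overlong encodings, surrogates and out-of-range values.
        if ((len == 2 && cp < 0x80) || (len == 3 && cp < 0x800) ||
            (len == 4 && cp < 0x10000) || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return false;

        i += len;
    }
    return true;
}
```

Note that such a check is a heuristic for guessing the encoding. Some GB2312 byte pairs (lead byte 0xC2 to 0xDF followed by a trail byte 0xA1 to 0xBF) are also well-formed two-byte UTF-8 sequences, so a GB2312 line can occasionally pass the check and be decoded as the wrong characters.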

ollydbg:

Hi, Miguel Gimenez, thanks for the reply.

Indeed, the output line has the same encoding as the input source file.
Your suggested method is much more robust, and better.

So, shall we commit this fix to our code repository? At least it will fix the garbled characters in the build log in my case.

My original idea was that we would have to detect the encoding of the input source file, but that is far more complex than the approach you suggested.
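
To give a sense of why detection is more involved, here is a minimal sketch of only its simplest piece: checking whether a file's contents start with the UTF-8 byte order mark (EF BB BF). `HasUtf8Bom` is a hypothetical helper written for this post, not an existing Code::Blocks function. Many UTF-8 files have no BOM at all, so a real detector would still need further heuristics.

```cpp
#include <string>

// Hypothetical helper: returns true if `data` begins with the
// UTF-8 byte order mark (bytes EF BB BF).
bool HasUtf8Bom(const std::string& data)
{
    return data.size() >= 3 &&
           static_cast<unsigned char>(data[0]) == 0xEF &&
           static_cast<unsigned char>(data[1]) == 0xBB &&
           static_cast<unsigned char>(data[2]) == 0xBF;
}
```

A missing BOM says nothing about the encoding, which is one reason the fallback approach is simpler.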

Miguel Gimenez:
IIRC the message also contains the file path. I hope this change will not alter the path if it contains non-ASCII characters.
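
One way to narrow down this concern: ASCII bytes mean the same thing in UTF-8 and in ASCII-compatible legacy encodings such as GB2312, so only lines that contain bytes of 0x80 or above can be affected by the conversion. A minimal sketch for locating the first such byte in a log line follows. `FirstNonAsciiOffset` is a hypothetical helper for experimenting, not existing Code::Blocks code.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the offset of the first byte >= 0x80 in
// `line`, or std::string::npos if the line is pure ASCII.
std::size_t FirstNonAsciiOffset(const std::string& line)
{
    for (std::size_t i = 0; i < line.size(); ++i)
    {
        if (static_cast<unsigned char>(line[i]) >= 0x80)
            return i;
    }
    return std::string::npos;
}
```

If the returned offset falls inside the path portion of the message, that line is one where the path itself depends on which decoding is chosen.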
