I have a patch to fix this issue:
From 90a6a42d30abea13cf4b23cc47868e2acc569aeb Mon Sep 17 00:00:00 2001
From: asmwarrior <a@b.com>
Date: Sat, 29 Oct 2022 11:21:56 +0800
Subject: correctly encoding convert from the GCC's message to wxString
diff --git a/src/plugins/compilergcc/compilergcc.cpp b/src/plugins/compilergcc/compilergcc.cpp
index d47dcf5a..edcec90d 100644
--- a/src/plugins/compilergcc/compilergcc.cpp
+++ b/src/plugins/compilergcc/compilergcc.cpp
@@ -3615,7 +3615,11 @@ void CompilerGCC::OnGCCError(CodeBlocksEvent& event)
{
wxString msg = event.GetString();
if (!msg.IsEmpty())
- AddOutputLine(msg);
+ {
+ wxString msg1 = wxString::FromUTF8(msg.c_str());
+ AddOutputLine(msg1);
+ }
+
}
void CompilerGCC::OnGCCTerminated(CodeBlocksEvent& event)
The result looks good; see the screenshot named 2022-10-29-utf8-fix.png
My question is: can we expect GCC's output text (which in fact uses the source file's encoding) to be UTF-8?
The encoding of GCC's output will be the same as that of the source file, not necessarily UTF-8. I would check whether msg1 is empty, which indicates invalid UTF-8, and use the original string in that case:
{
    wxString msg1 = wxString::FromUTF8(msg.c_str());
    AddOutputLine(msg1.empty() ? msg : msg1);
}
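The decision behind that empty() check can be sketched without wxWidgets. The following is a minimal illustration in standard C++, not the code used in C::B: isValidUtf8 and guessEncoding are hypothetical names, and the validator only approximates what a strict UTF-8 decoder accepts.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `bytes` is well-formed UTF-8.
// Rejects stray continuation bytes, truncated sequences, overlong forms,
// surrogates, and code points above U+10FFFF.
bool isValidUtf8(const std::string& bytes)
{
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n)
    {
        const unsigned char b0 = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;   // number of continuation bytes expected
        unsigned long cp = 0;    // code point decoded so far
        if (b0 < 0x80)                { extra = 0; cp = b0; }
        else if ((b0 & 0xE0) == 0xC0) { extra = 1; cp = b0 & 0x1F; }
        else if ((b0 & 0xF0) == 0xE0) { extra = 2; cp = b0 & 0x0F; }
        else if ((b0 & 0xF8) == 0xF0) { extra = 3; cp = b0 & 0x07; }
        else return false;            // invalid lead byte

        if (extra > n - i - 1)
            return false;             // sequence is cut off at the end
        for (std::size_t k = 1; k <= extra; ++k)
        {
            const unsigned char b = static_cast<unsigned char>(bytes[i + k]);
            if ((b & 0xC0) != 0x80)
                return false;         // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        // Smallest code point that legitimately needs `extra` continuation bytes.
        static const unsigned long minForLen[] = {0x0, 0x80, 0x800, 0x10000};
        if (cp < minForLen[extra]) return false;          // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;   // surrogate range
        if (cp > 0x10FFFF) return false;                  // beyond Unicode
        i += extra + 1;
    }
    return true;
}

enum class GuessedEncoding { Utf8, Local };

// Hypothetical helper: treat the line as UTF-8 only if it validates,
// otherwise leave it to the local (ANSI) conversion.
GuessedEncoding guessEncoding(const std::string& bytes)
{
    return isValidUtf8(bytes) ? GuessedEncoding::Utf8 : GuessedEncoding::Local;
}
```

With this sketch, a line holding the GB2312 bytes D6 D0 CE C4 (the two CJK characters from a path) is reported as Local, while pure ASCII and UTF-8 source text are reported as Utf8. Unlike the empty() test, an empty line counts as valid UTF-8 here, which makes no practical difference to the output.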
IIRC the message also contains the file path; I hope this change will not alter the path if it contains non-ASCII characters.
Hi, thanks. That's another issue.
I just checked a file path that contains CJK characters:
D:\code\test-crash-中文\main.cpp
Without the empty() check you suggested, some build log lines are missing (empty).
With the empty() check, it works OK.
See the screenshot below:
EDIT:
I also checked another file path that contains a character such as Latin small letter e with grave
(sorry, our forum does not allow posting such non-ASCII characters, so I am adding another screenshot instead).
It also works OK.
Things are more complex than I thought.
First, it looks like in C::B the compiler plugin sends the compile command to GCC with the file path in Unicode format (under my Win7, it is GB2312), and in GCC's output text the file path is still GB2312-encoded. However, when GCC emits a diagnostic message, for example one that reports an error position, it uses byte positions.
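To illustrate why byte positions matter, here is a small sketch in standard C++ that maps a 1-based byte column to a 1-based character column. It assumes the line is valid UTF-8 and that the reported column counts bytes; byteColumnToCharColumn is a hypothetical helper, not something from C::B or GCC.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: converts a 1-based byte column into a 1-based
// character (code point) column, assuming `line` holds valid UTF-8.
// Continuation bytes (bit pattern 10xxxxxx) never start a new character,
// so only the other bytes are counted.
std::size_t byteColumnToCharColumn(const std::string& line, std::size_t byteColumn)
{
    std::size_t chars = 0;
    for (std::size_t i = 0; i < line.size() && i < byteColumn; ++i)
    {
        const unsigned char b = static_cast<unsigned char>(line[i]);
        if ((b & 0xC0) != 0x80)
            ++chars;
    }
    return chars;
}
```

For a line such as `/* X */ int abc;` where X is one CJK character stored as three UTF-8 bytes, the `i` of `int` sits at byte column 11 but at character column 9, so a caret placed by byte count would drift to the right.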
Meanwhile, when C::B handles the stdout and stderr pipes, it uses a converter:
// The following class is created to override wxTextStream::ReadLine()
class cbTextInputStream : public wxTextInputStream
{
    protected:
        bool m_allowMBconversion;
    public:
#if wxUSE_UNICODE
        cbTextInputStream(wxInputStream& s, const wxString &sep=wxT(" \t"), wxMBConv& conv = wxConvLocal )
            : wxTextInputStream(s, sep, conv),
            m_allowMBconversion(true)
        {
            memset((void*)m_lastBytes, 0, 10);
        }
Here the converter is wxConvLocal, the local one, which means the byte stream is expected to be GB2312-encoded. Suppose GCC prints a diagnostic like this:
    5 | int abc; ///< 串口号
      |     ^~~
If the source content is in UTF-8, its bytes are wrongly converted as if they were GB2312.
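A rough sketch of why that conversion goes wrong only partially: the nine bytes E4 B8 B2 E5 8F A3 E5 8F B7 are the UTF-8 encoding of the three characters in the comment above. The code below is standard C++ and checks byte ranges only. It assumes the EUC-CN style GB2312 layout (lead byte 0xA1-0xF7, trail byte 0xA1-0xFE), which is narrower than what the actual Windows code page accepts, and both function names are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Structural test only (assumed ranges, see the note above): could
// (lead, trail) be a GB2312 double-byte character?
bool looksLikeGb2312Pair(unsigned char lead, unsigned char trail)
{
    return lead >= 0xA1 && lead <= 0xF7 && trail >= 0xA1 && trail <= 0xFE;
}

// Returns how many leading bytes a GB2312-style decoder would accept,
// as ASCII singles or double-byte pairs, before it meets a byte it
// cannot place.
std::size_t acceptedAsGb2312(const std::string& bytes)
{
    std::size_t i = 0;
    while (i < bytes.size())
    {
        const unsigned char b = static_cast<unsigned char>(bytes[i]);
        if (b < 0x80)
        {
            ++i;                      // plain ASCII byte
            continue;
        }
        if (i + 1 < bytes.size() &&
            looksLikeGb2312Pair(b, static_cast<unsigned char>(bytes[i + 1])))
        {
            i += 2;                   // consumed as one double-byte character
            continue;
        }
        break;                        // decoder would fail here
    }
    return i;
}
```

For the nine UTF-8 bytes, the first four are accepted as two unrelated double-byte characters (E4 B8 and B2 E5) before the decoder stops at 8F, so part of the text is silently turned into wrong characters instead of failing cleanly.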
This is the code that fetches each byte from the input pipe stream and converts it with the wxConvLocal converter:
// The following function was copied verbatim from wxTextStream::NextChar()
// The only change, is the removal of the MB2WC function
// With PipedProcess we work with compilers/debuggers which (usually) don't
// send us unicode (at least GDB).
wxChar NextChar()
{
#if wxUSE_UNICODE
    wxChar wbuf[2];
    memset((void*)m_lastBytes, 0, 10);
    for (size_t inlen = 0; inlen < 9; inlen++)
    {
        // actually read the next character byte
        m_lastBytes[inlen] = m_input.GetC();
        if (m_input.LastRead() <= 0)
            return wxEOT;
        // inlen is the byte index we get copied from the input byte stream
        if (m_allowMBconversion)
        {
            int retlen = (int) m_conv->MB2WC(wbuf, m_lastBytes, 2); // returns -1 for failure
            if (retlen >= 0) // res == 0 could happen for '\0' char
                return wbuf[0];
        }
        else
            return m_lastBytes[inlen]; // C::B fix (?)
    }
    // there should be no encoding which requires more than nine bytes for one character...
    return wxEOT;
#else
    m_lastBytes[0] = m_input.GetC();
    if (m_input.LastRead() <= 0)
        return wxEOT;
    return m_lastBytes[0];
#endif
}
The trick here is: if the m_conv->MB2WC(wbuf, m_lastBytes, 2) call partially succeeds on bytes that come from the file content, we get a wrong diagnostic wxString.
Only if the MB2WC() call fails are the raw bytes returned, and then we have a later chance to convert them with this code:
{
    wxString msg1 = wxString::FromUTF8(msg.c_str());
    AddOutputLine(msg1.empty() ? msg : msg1);
}
I have new code to solve this issue in the file sdk\pipedprocess.cpp.
I think we no longer need to call the function wxChar NextChar().
wxString ReadLine()
{
    wxString line;
    std::string lineBytes;
    while ( m_input.CanRead() && !m_input.Eof() )
    {
        char c = m_input.GetC();
        if (m_input.LastRead() <= 0)
            break;
        if ( !m_input )
            break;
        if (EatEOL(c))
            break;
        lineBytes += c;
    }
    // the compiler output can contain both file content and file paths
    // the file content could be in any encoding, most often UTF-8
    // the file path usually uses the legacy MBCS encoding (ANSI string)
    // so we first try to convert from UTF-8 and, if that fails, fall back to wxConvLocal
    line = wxString::FromUTF8(lineBytes.c_str());
    if (line.empty())
    {
        line = wxString(lineBytes.c_str()); // use the wxConvLocal
    }
    return line;
}
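The byte-collecting half of this ReadLine() can be tried outside C::B with a standard C++ stream. This is only a sketch under assumptions: readRawLine and splitRawLines are hypothetical names, std::istream stands in for the wxInputStream pipe, and the end-of-line handling (LF, CR, or CR LF) is my guess at what EatEOL does.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the loop above: collects the raw bytes of one
// line without any character conversion. Returns false only when nothing
// at all could be read (end of stream).
bool readRawLine(std::istream& in, std::string& lineBytes)
{
    lineBytes.clear();
    bool readAny = false;
    char c;
    while (in.get(c))
    {
        readAny = true;
        if (c == '\n')
            break;                    // LF ends the line
        if (c == '\r')
        {
            if (in.peek() == '\n')
                in.get(c);            // swallow the LF of a CR LF pair
            break;
        }
        lineBytes += c;               // keep the byte exactly as received
    }
    return readAny;
}

// Convenience wrapper for experiments: splits a whole buffer into raw lines.
std::vector<std::string> splitRawLines(const std::string& buffer)
{
    std::istringstream in(buffer);
    std::vector<std::string> lines;
    std::string line;
    while (readRawLine(in, line))
        lines.push_back(line);
    return lines;
}
```

Because the bytes are kept untouched until the whole line is available, the choice between UTF-8 and the local encoding can be made once per line instead of once per character.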