Things are more complex than I thought.
First, it looks like the compiler plugin in C::B sends the compile command to GCC with the file path converted from Unicode to the local encoding, which on my Win7 system is GB2312. In the text GCC returns, the file path is still GB2312-encoded. When GCC prints a diagnostic message, for example to report an error position, it uses byte positions.
Meanwhile, when C::B handles the stdout and stderr pipes, it uses a converter:
// The following class is created to override wxTextStream::ReadLine()
class cbTextInputStream : public wxTextInputStream
{
    protected:
        bool m_allowMBconversion;
    public:
#if wxUSE_UNICODE
        cbTextInputStream(wxInputStream& s, const wxString &sep=wxT(" \t"), wxMBConv& conv = wxConvLocal )
            : wxTextInputStream(s, sep, conv),
            m_allowMBconversion(true)
        {
            memset((void*)m_lastBytes, 0, 10);
        }
Here the default converter is wxConvLocal, the local one, which means the byte stream is expected to be GB2312-encoded. Now suppose GCC echoes a source line like this:
5 | int abc; ///< 串口号
| ^~~
If the source file content is UTF-8, the converter wrongly decodes those bytes as if they were GB2312.
This is the code that fetches each byte from the input pipe stream and converts it with the wxConvLocal converter:
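To make the problem concrete, here is a small standalone sketch (not C::B or wxWidgets code) that walks the UTF-8 bytes of the comment text 串口号 using the GBK lead/trail byte ranges. The names GbkScan and scanAsGbk are made up for this illustration, and the check only looks at byte ranges; a real converter such as the one behind wxConvLocal may also reject unassigned pairs, so treat it as an approximation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper type: result of scanning bytes under GBK pairing rules.
struct GbkScan
{
    std::size_t pairs;     // byte pairs that look like double-byte characters
    std::size_t consumed;  // bytes consumed before the scan had to stop
};

// GBK byte ranges (range check only, unassigned code points are not excluded).
static bool isGbkLead(unsigned char b)  { return b >= 0x81 && b <= 0xFE; }
static bool isGbkTrail(unsigned char b) { return b >= 0x40 && b <= 0xFE && b != 0x7F; }

// Walk the bytes the way a local multibyte decoder would pair them up.
GbkScan scanAsGbk(const std::string& bytes)
{
    GbkScan result{0, 0};
    std::size_t i = 0;
    while (i < bytes.size())
    {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        if (lead < 0x80) // plain ASCII, one byte
        {
            ++i;
            continue;
        }
        // Stop on a bad lead byte, a missing trail byte, or a bad trail byte.
        if (!isGbkLead(lead) || i + 1 >= bytes.size())
            break;
        const unsigned char trail = static_cast<unsigned char>(bytes[i + 1]);
        if (!isGbkTrail(trail))
            break;
        ++result.pairs;
        i += 2;
    }
    result.consumed = i;
    return result;
}
```

The UTF-8 encoding of 串口号 is the nine bytes E4 B8 B2 E5 8F A3 E5 8F B7. Under the ranges above, the first eight bytes pair up as four plausible double-byte characters and the last byte is left dangling, so a local decoder can accept most of the UTF-8 text and produce the wrong characters instead of failing.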
// The following function was copied verbatim from wxTextStream::NextChar()
// The only change, is the removal of the MB2WC function
// With PipedProcess we work with compilers/debuggers which (usually) don't
// send us unicode (at least GDB).
wxChar NextChar()
{
#if wxUSE_UNICODE
    wxChar wbuf[2];
    memset((void*)m_lastBytes, 0, 10);
    for (size_t inlen = 0; inlen < 9; inlen++)
    {
        // actually read the next character byte
        m_lastBytes[inlen] = m_input.GetC();
        if (m_input.LastRead() <= 0)
            return wxEOT;
        // inlen is the byte index we get copied from the input byte stream
        if (m_allowMBconversion)
        {
            int retlen = (int) m_conv->MB2WC(wbuf, m_lastBytes, 2); // returns -1 for failure
            if (retlen >= 0) // res == 0 could happen for '\0' char
                return wbuf[0];
        }
        else
            return m_lastBytes[inlen]; // C::B fix (?)
    }
    // there should be no encoding which requires more than nine bytes for one character...
    return wxEOT;
#else
    m_lastBytes[0] = m_input.GetC();
    if (m_input.LastRead() <= 0)
        return wxEOT;
    return m_lastBytes[0];
#endif
}
The catch is this: if the m_conv->MB2WC(wbuf, m_lastBytes, 2) call partially succeeds on the file content, we end up with a wrong diagnostic wxString.
Only if the MB2WC() call fails are the raw bytes returned, and then we get a later chance to convert them with this code:
{
    wxString msg1 = wxString::FromUTF8(msg.c_str());
    AddOutputLine(msg1.empty() ? msg : msg1);
}
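The fallback above only helps when the bytes reach it unconverted, because the UTF-8 attempt needs the raw bytes to decide. As a rough standalone sketch of that decision (not wxWidgets code; isValidUtf8, Decoding and chooseDecoding are names invented for this example), a strict UTF-8 check can tell the two cases apart:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: strict UTF-8 check that rejects stray continuation
// bytes, truncated sequences, overlong forms, surrogates and values > U+10FFFF.
bool isValidUtf8(const std::string& s)
{
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n)
    {
        const unsigned char b0 = static_cast<unsigned char>(s[i]);
        if (b0 < 0x80) // ASCII
        {
            ++i;
            continue;
        }
        std::size_t len = 0;
        unsigned long cp = 0;
        unsigned long minValue = 0; // smallest code point allowed for this length
        if ((b0 & 0xE0) == 0xC0)      { len = 2; cp = b0 & 0x1F; minValue = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; minValue = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; minValue = 0x10000; }
        else
            return false; // continuation byte or invalid lead byte
        if (i + len > n)
            return false; // truncated sequence
        for (std::size_t k = 1; k < len; ++k)
        {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if ((b & 0xC0) != 0x80)
                return false; // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < minValue || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += len;
    }
    return true;
}

// Hypothetical stand-in for the fallback: which decoding would be picked.
enum class Decoding { Utf8, Local };

Decoding chooseDecoding(const std::string& raw)
{
    return isValidUtf8(raw) ? Decoding::Utf8 : Decoding::Local;
}
```

With raw bytes as input, UTF-8 source text is recognized as UTF-8, while typical GB2312 text fails the check and falls through to the local encoding, because its lead bytes are often not valid UTF-8 lead bytes. Once NextChar() has already turned the bytes into wide characters through wxConvLocal, that information is gone.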