I just ran through a few debug sessions, and it seems the bottleneck is in the encoder, specifically line 93 of EncodingDetector.cpp (see the line marked *** CHOKES HERE *** below):
bool EncodingDetector::ConvertToWxStr(const wxByte* buffer, size_t size)
{
    if (!buffer || size == 0)
        return false;

    // Skip the byte-order mark, if any (the original loop did "*buffer++;",
    // whose dereference was unused; a single pointer adjustment is equivalent)
    if (m_BOMSizeInBytes > 0)
        buffer += m_BOMSizeInBytes;

    size_t outlen = 0;
    wxCSConv conv(m_Encoding);
    /* NOTE (Biplab#5#): FileManager returns a buffer with 4 extra NULL chars appended.
       But the buffer size is returned sans the NULL chars */
    wxWCharBuffer wideBuff = conv.cMB2WC((const char*)buffer, size + 4 - m_BOMSizeInBytes, &outlen); // *** CHOKES HERE ***
    m_ConvStr = wxString(wideBuff);
    // ...
Strangely, this code gets called twice on every file open (and it's slow each time).
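For illustration, the BOM-skipping step can be written in standard C++ with the detection folded in. This is a minimal sketch that assumes a UTF-8 BOM (EF BB BF) only; `SkipUtf8Bom` is a hypothetical helper, not part of Code::Blocks or wxWidgets, and `unsigned char` stands in for wxByte. In EncodingDetector itself the BOM size is already known from `m_BOMSizeInBytes`.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: if `buffer` starts with a UTF-8 BOM, advance past it
// and shrink `size` accordingly. Returns the number of BOM bytes skipped.
std::size_t SkipUtf8Bom(const unsigned char*& buffer, std::size_t& size)
{
    static const unsigned char bom[] = {0xEF, 0xBB, 0xBF};
    const std::size_t bomSize = sizeof(bom);
    if (buffer && size >= bomSize && std::memcmp(buffer, bom, bomSize) == 0)
    {
        buffer += bomSize; // one pointer adjustment, no per-byte loop
        size -= bomSize;
        return bomSize;
    }
    return 0; // no BOM: buffer and size are left untouched
}
```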
Unrelated observation: when I remove breakpoints during an active debug session, the breakpoints remain (this is a long-standing issue with the debugger plugin in Code::Blocks).
Yes, I can confirm it's the call to mbsrtowcs that's causing the slowdown. Unfortunately, I'm not sure we can do anything about that... :(
Unfortunately, mbsrtowcs (declared in wchar.h) is not the cause. It's mbstowcs (declared in stdlib.h) that is causing this slowdown, though I don't know what the difference between the two is.
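As a reference point for the two functions: `mbstowcs` takes the source pointer by value and always starts from the initial shift state, while `mbsrtowcs` (the "restartable" variant) takes an explicit `mbstate_t*` and advances the caller's source pointer, setting it to null once the terminating NUL has been converted. Both return the number of wide characters written (excluding the terminator), or `(size_t)-1` on an invalid sequence. The wrapper names below are hypothetical; the sketch converts plain ASCII in the default "C" locale and says nothing about which one is faster.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cwchar>

// mbstowcs: source passed by value, conversion starts in the initial shift state.
std::size_t ConvertWithMbstowcs(const char* src, wchar_t* dst, std::size_t n)
{
    return std::mbstowcs(dst, src, n);
}

// mbsrtowcs: explicit conversion state, and `src` is advanced by the call.
// After a complete conversion (terminating NUL reached) `src` becomes nullptr.
std::size_t ConvertWithMbsrtowcs(const char*& src, wchar_t* dst, std::size_t n)
{
    std::mbstate_t state{}; // zero-initialized means "initial shift state"
    return std::mbsrtowcs(dst, &src, n, &state);
}
```

Keeping an `mbstate_t` alive across calls is what lets the restartable form resume a conversion split across buffers; in this sketch it is local, so each call starts fresh.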
In <wx/wxcrtbase.h> (line 580):
#ifdef wxNEED_WX_MBSTOWCS
/* even though they are defined and "implemented", they are bad and just
stubs so we need our own - we need these even in ANSI builds!! */
WXDLLIMPEXP_BASE size_t wxMbstowcs(wchar_t *, const char *, size_t);
WXDLLIMPEXP_BASE size_t wxWcstombs(char *, const wchar_t *, size_t);
#else
#define wxMbstowcs mbstowcs
#define wxWcstombs wcstombs
#endif
In src/common/wxcrt.cpp (line 88):
#ifdef HAVE_WCSRTOMBS
return mbsrtowcs(buf, &psz, n, &mbstate);
#else
return wxMbstowcs(buf, psz, n);
#endif
And the HAVE_WCSRTOMBS macro is defined only for the Metrowerks compiler on Mac.
I'm not sure why they use mbsrtowcs() only on Mac when wchar.h is available on other platforms as well. :?
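Since mbsrtowcs() is standard C, the HAVE_WCSRTOMBS branch above does not depend on anything Mac-specific. A minimal sketch of that branch without the `#ifdef`, using a hypothetical `PortableMbstowcs` (not a wxWidgets function). It also shows the standard length query: when the destination is null, the length argument is ignored and the return value is the number of wide characters the conversion needs.

```cpp
#include <cstddef>
#include <cwchar>

// Hypothetical wrapper mirroring "return mbsrtowcs(buf, &psz, n, &mbstate);"
// from wxcrt.cpp, without the platform check. `psz` is a local copy, so the
// caller's pointer is not modified.
std::size_t PortableMbstowcs(wchar_t* buf, const char* psz, std::size_t n)
{
    std::mbstate_t mbstate{}; // fresh initial shift state on every call
    return std::mbsrtowcs(buf, &psz, n, &mbstate);
}
```

A typical two-pass use is to call it once with a null buffer to size the destination, then again to fill it.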
It probably doesn't make any difference, but why do we use
wxWCharBuffer wideBuff = conv.cMB2WC((char*)buffer, size + 4 - m_BOMSizeInBytes, &outlen);
m_ConvStr = wxString(wideBuff);
instead of
m_ConvStr = wxString((char*)buffer, conv);
outlen = m_ConvStr.Len();
I don't remember exactly why I used such a workaround; probably because of some crash. :)
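For comparison outside wxWidgets, the one-liner style (convert straight into the destination string instead of going through an intermediate buffer) looks roughly like this in standard C++. This is only an analogue: `ToWide` is hypothetical, it uses `std::mbstowcs` in the current C locale, and it says nothing about how wxCSConv or the wxString constructor behave internally. Passing a null destination to `mbstowcs` to query the length is specified by POSIX.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper: convert a NUL-terminated multibyte string (current C
// locale) directly into a std::wstring. Returns an empty string on failure.
std::wstring ToWide(const char* src)
{
    const std::size_t len = std::mbstowcs(nullptr, src, 0); // length query
    if (len == static_cast<std::size_t>(-1))
        return std::wstring(); // invalid multibyte sequence
    std::wstring out(len, L'\0');
    if (len > 0)
        std::mbstowcs(&out[0], src, len); // fills exactly len wide chars
    return out;
}
```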
I've found a better solution for this problem. :D
With an iconv-based implementation, the conversion routine becomes faster. Just apply the following patch and see the difference:
Index: src/sdk/encodingdetector.cpp
===================================================================
--- src/sdk/encodingdetector.cpp (revision 5040)
+++ src/sdk/encodingdetector.cpp (working copy)
@@ -19,6 +19,8 @@
#include "encodingdetector.h"
#include "filemanager.h"
+#include <wx/stopwatch.h>
+#include <iconv.h>
EncodingDetector::EncodingDetector(const wxString& filename)
@@ -106,9 +108,25 @@
/* NOTE (Biplab#5#): FileManager returns a buffer with 4 extra NULL chars appended.
But the buffer size is returned sans the NULL chars */
+ /*wxStopWatch sw;
+ sw.Start();
wxWCharBuffer wideBuff = conv.cMB2WC((char*)buffer, size + 4 - m_BOMSizeInBytes, &outlen);
m_ConvStr = wxString(wideBuff);
+ sw.Pause();
+ Manager::Get()->GetLogManager()->DebugLog(wxString::Format(_T("Time taken: %ld milliseconds\n"), sw.Time() ));*/
+ wxStopWatch sw;
+ iconv_t cd = iconv_open("ISO-8859-1", "UTF-8");
+ size_t inbytesleft = 0, outbytesleft = 0;
+ char* outbuf = new char[size + 4 - m_BOMSizeInBytes];
+ sw.Start();
+ iconv(cd, (char **)&buffer, &inbytesleft, &outbuf, &outbytesleft);
+ sw.Pause();
+ m_ConvStr = wxString((wchar_t *)outbuf);
+ delete [] outbuf;
+ iconv_close(cd);
+ Manager::Get()->GetLogManager()->DebugLog(wxString::Format(_T("Time taken: %ld milliseconds\n"), sw.Time() ));
+
if (outlen == 0)
{
// Possibly the conversion has failed. Let's try with System-default encoding
Using the wx-based implementation:
Time taken: 6201 milliseconds
Using the iconv-based implementation:
Time taken: 0 milliseconds
The patch keeps the old code, commented out, so you can easily compare the performance by switching which of the two blocks is commented out.
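To make the same comparison outside Code::Blocks, a std::chrono-based helper can stand in for wxStopWatch. A minimal sketch; `TimeMs` is a hypothetical name, and the numbers it reports will depend on the machine and the input file.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: run `f` once and return the elapsed time in
// milliseconds (the unit wxStopWatch::Time() reports), using a monotonic clock.
template <typename F>
long long TimeMs(F&& f)
{
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(f)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}
```

Wrapping each conversion variant in a lambda and feeding both the same buffer keeps the comparison fair.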
Please note: the iconv-based implementation is NOT generic; it works only with the log file.
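For anyone adapting the iconv approach, a few details of the POSIX API matter: iconv() converts only as many bytes as `*inbytesleft` says and writes at most `*outbytesleft` bytes (in the patch as posted both start at 0, so the call has nothing to convert); iconv_open() takes the target encoding first, then the source; and iconv() advances the buffer pointers, so the pointer returned by new[] must be kept separately for delete[]. A minimal sketch of a fuller conversion, assuming the glibc/POSIX `char**` signature; `IconvConvert` is a hypothetical helper that converts between byte encodings into a std::string, not code from the patch.

```cpp
#include <iconv.h>
#include <cerrno>
#include <cstddef>
#include <string>

// Hypothetical helper: convert `in` from `fromcode` to `tocode` with POSIX
// iconv. Returns false if the converter cannot be opened or the input is invalid.
bool IconvConvert(const char* tocode, const char* fromcode,
                  const std::string& in, std::string& out)
{
    iconv_t cd = iconv_open(tocode, fromcode); // target encoding comes first
    if (cd == (iconv_t)-1)
        return false;

    out.clear();
    char* inbuf = const_cast<char*>(in.data()); // iconv() reads but never writes it
    std::size_t inbytesleft = in.size();        // must be the real input length
    char chunk[256];
    bool ok = true;

    while (inbytesleft > 0)
    {
        char* outbuf = chunk;                   // iconv() advances this copy
        std::size_t outbytesleft = sizeof(chunk);
        const std::size_t rc = iconv(cd, &inbuf, &inbytesleft, &outbuf, &outbytesleft);
        const int err = errno;                  // capture before anything else runs
        out.append(chunk, static_cast<std::size_t>(outbuf - chunk));
        if (rc == (std::size_t)-1 && err != E2BIG)
        {
            ok = false;                         // EILSEQ/EINVAL: invalid or truncated input
            break;
        }
        // E2BIG just means the chunk is full: loop and continue converting
    }

    if (ok)
    {
        // Flush any pending shift sequence (a no-op for stateless encodings).
        char* outbuf = chunk;
        std::size_t outbytesleft = sizeof(chunk);
        iconv(cd, nullptr, nullptr, &outbuf, &outbytesleft);
        out.append(chunk, static_cast<std::size_t>(outbuf - chunk));
    }

    iconv_close(cd);
    return ok;
}
```

Because output is collected in fixed-size chunks, this makes no assumption that the converted text fits in a buffer the size of the input.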
PS: Please bear with my late replies. :)