Author Topic: The 30 December 2018 build (11543) is out.  (Read 46294 times)

Offline killerbot

  • Administrator
  • Lives here!
  • *****
  • Posts: 5490
Re: The 30 December 2018 build (11543) is out.
« Reply #30 on: January 01, 2019, 09:28:51 pm »
related to what has been mentioned in a previous comment above : http://forums.codeblocks.org/index.php/topic,22936.0.html
==> the assert was as such also already present before, so that is not due to stepping to gcc810

Offline riban

  • Multiple posting newcomer
  • *
  • Posts: 26
Re: The 30 December 2018 build (11543) is out.
« Reply #31 on: January 03, 2019, 12:50:19 am »
@oBFusCATed the crash happens with the simplest of project, e.g. new console project (hello world) when the source code editor is opened. The assertion only happens occasionally so I have not figured out how to trigger that.

Offline BlueHazzard

  • Developer
  • Lives here!
  • *****
  • Posts: 3353
Re: The 30 December 2018 build (11543) is out.
« Reply #32 on: January 03, 2019, 02:22:40 pm »
What is the value of j and next?

funny:

Code
> info locals
next = <error reading variable next (Cannot access memory at address 0x0)>
index = 0
i = 0
j = 0

some heavy compiler optimizing going on here xD

compiled with -g3 and no optimization turned on...

Offline oBFusCATed

  • Developer
  • Lives here!
  • *****
  • Posts: 13413
    • Travis build status
Re: The 30 December 2018 build (11543) is out.
« Reply #33 on: January 03, 2019, 03:18:40 pm »
some heavy compiler optimizing going on here xD
I hope this is sarcasm, because the bug happens even with -O0...
And I'm sure it is some kind of buffer overflow, but I'll have to dig a bit more to know the details...
(most of the time I ignore long posts)
[strangers don't send me private messages, I'll ignore them; post a topic in the forum, but first read the rules!]

Offline oBFusCATed

  • Developer
  • Lives here!
  • *****
  • Posts: 13413
    • Travis build status
Re: The 30 December 2018 build (11543) is out.
« Reply #34 on: January 03, 2019, 03:42:30 pm »
Here is patch which fixes one of the issues with this function....

Code
diff --git a/src/plugins/codecompletion/parser/tokenizer.cpp b/src/plugins/codecompletion/parser/tokenizer.cpp
index 34e85c0..9b72cef 100644
--- a/src/plugins/codecompletion/parser/tokenizer.cpp
+++ b/src/plugins/codecompletion/parser/tokenizer.cpp
@@ -1668,7 +1668,7 @@ bool Tokenizer::ReplaceMacroUsage(const Token* tk)
     return false;
 }

-void Tokenizer::KMP_GetNextVal(const wxChar* pattern, int next[])
+void Tokenizer::KMP_GetNextVal(const wxChar* pattern, int next[], const int patternLen)
 {
     int j = 0, k = -1;
     next[0] = -1;
@@ -1677,6 +1677,8 @@ void Tokenizer::KMP_GetNextVal(const wxChar* pattern, int next[])
         if (k == -1 || pattern[j] == pattern[k])
         {
             ++j;
+            if (j >= patternLen + 1)
+                abort();
             ++k;
             if (pattern[j] != pattern[k])
                 next[j] = k;
@@ -1704,8 +1706,8 @@ int Tokenizer::KMP_Find(const wxChar* text, const wxChar* pattern, const int pat
         }
     }

-    int next[patternLen];
-    KMP_GetNextVal(pattern, next);
+    int next[patternLen + 1];
+    KMP_GetNextVal(pattern, next, patternLen);

     int index = 0, i = 0, j = 0;
     while (text[i] != _T('\0') && pattern[j] != _T('\0'))
diff --git a/src/plugins/codecompletion/parser/tokenizer.h b/src/plugins/codecompletion/parser/tokenizer.h
index a5ea3bf..911835f 100644
--- a/src/plugins/codecompletion/parser/tokenizer.h
+++ b/src/plugins/codecompletion/parser/tokenizer.h
@@ -489,7 +489,7 @@ private:
     bool GetMacroExpandedText(const Token* tk, wxString& expandedText);

     /** used in the KMP find function */
-    void KMP_GetNextVal(const wxChar* pattern, int next[]);
+    void KMP_GetNextVal(const wxChar* pattern, int next[], const int patternLen);

     /** Tokenizer options specify the token reading option */
     TokenizerOptions     m_TokenizerOptions;

To find the offenders remove the two +1...

There are at least there more:
1. access to next[k] is not guaranteed to be within bounds
2. the use of dynamic arrays is a GCC extension which I am not sure how well works in C++ and also it is not really portable. It should be replaced with an alloca or static sized array. The size is guarded already, so I don't see a reason to use a dynamic size here. It should be possible to statically size the array.
3. These functions should probably be made global functions and not part of the class.

I won't make any changes, because I don't understand the code here. @ollydbg would you be able to fix the issue now that we know what is causing it?
(most of the time I ignore long posts)
[strangers don't send me private messages, I'll ignore them; post a topic in the forum, but first read the rules!]

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 5910
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
Re: The 30 December 2018 build (11543) is out.
« Reply #35 on: January 03, 2019, 03:53:39 pm »
Here is patch which fixes one of the issues with this function....
...
I won't make any changes, because I don't understand the code here. @ollydbg would you be able to fix the issue now that we know what is causing it?

The KMP search related codes are added long times ago by Loaden, which try to locate an argument when we do macro expansion.
For example: If we have:
Code
#define M(a,b)  2*a+b
Then we have a usage like:
Code
M(x,y)
We have to search to search "a" in the string "2*a+b", and replace "a" with actual argument "x", so it becomes "2*x+b".
I hope I will take some time on this issue this weekend. Thanks.
« Last Edit: January 05, 2019, 06:33:44 am by ollydbg »
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 5910
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
Re: The 30 December 2018 build (11543) is out.
« Reply #36 on: January 05, 2019, 06:31:53 am »
funny:

Code
> info locals
next = <error reading variable next (Cannot access memory at address 0x0)>
index = 0
i = 0
j = 0

I just build the wx3.1.1 under MinGW-W64's 64bit gcc8.1 compiler(mingw-builds/8.1.0/threads-posix/seh/x86_64-8.1.0-release-posix-seh-rt_v6-rev0.7z). The final log-release.txt is more than 20M, too many warnings with the below command.
Code
mingw32-make -f makefile.gcc USE_XRC=1 SHARED=1 MONOLITHIC=1 BUILD=release UNICODE=1 USE_OPENGL=1 VENDOR=cb CXXFLAGS="-Wno-unused-local-typedefs -Wno-deprecated-declarations -fno-keep-inline-dllexport" >log-release.txt 2>&1
Also, I have to this patch applied:Fix invalid cast in wxMSW AutoHANDLE::InvalidHandle() ยท wxWidgets/wxWidgets@424f64f, otherwise, the build will fall.

Then build C::B, I see the same crash.
This means the "int next[patternLen];" is corrupt.  :(
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 5910
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
Re: The 30 December 2018 build (11543) is out.
« Reply #37 on: January 05, 2019, 08:04:38 am »
run the CCTest project on the file "cc_macro_expansion_stringize.cpp"
(note, you need to copy this source file, and rename to ccc_macro_expansion_stringize.cpp, so that CCTest only run the test for this single file)

Set two breakpoints here:

Code
int Tokenizer::KMP_Find(const wxChar* text, const wxChar* pattern, const int patternLen)
{
    if (!text || !pattern || pattern[0] == _T('\0') || text[0] == _T('\0'))
        return -1;

    if (patternLen > 1024)
    {
        if (patternLen < 5012)
            TRACE(_T("KMP_Find() : %s - %s"), text, pattern);
        else
        {
            TRACE(_T("KMP_Find: The plan buffer is too big, %d"), patternLen);
            return -2;
        }
    }

    int next[patternLen];
    KMP_GetNextVal(pattern, next);  //bp1

    int index = 0, i = 0, j = 0;    //bp2
   
This the the value before we call the function KMP_GetNextVal
Code
> info locals

[debug]> info locals
[debug]next = {2283936, 0, 1875784749, 0}
[debug]index = 0
[debug]i = 14
[debug]j = 0
[debug]>>>>>>cb_gdb:

next = {2283936, 0, 1875784749, 0}
index = 0
i = 14
j = 0

And after the function call

Code
> info locals

[debug]> info locals
[debug]next = <error reading variable next (Cannot access memory at address 0x1)>
[debug]index = 0
[debug]i = 14
[debug]j = 0
[debug]>>>>>>cb_gdb:

next = <error reading variable next (Cannot access memory at address 0x1)>
index = 0
i = 14
j = 0

This means this function has some errors.
Note, the "next" array is the lps array stated in
https://www.geeksforgeeks.org/kmp-algorithm-for-pattern-searching/
or
KMP Algorithm | Searching for Patterns | GeeksforGeeks - YouTube

EDIT1:
For a pattern "text", when step into the function:
Code
void Tokenizer::KMP_GetNextVal(const wxChar* pattern, int next[])
{
    int j = 0, k = -1;
    next[0] = -1;
    while (pattern[j] != _T('\0'))
    {
        if (k == -1 || pattern[j] == pattern[k])
        {
            ++j;
            ++k;
            if (pattern[j] != pattern[k])
                next[j] = k;  // error
            else
                next[j] = next[k];
        }
        else
            k = next[k];
    }
}
I do see that the line "//error", has j=4, which means next[j] is beyond the next (since next array only have four elements). :(
But I still need some time to see how the KMP algorithm works.

EDIT2:
To simplify the issue, you only need to debug this function:
Code
int Tokenizer::GetFirstTokenPosition(const wxChar* buffer, const size_t bufferLen,
                                     const wxChar* key, const size_t keyLen)
{
    int pos = -1;
    wxChar* p = const_cast<wxChar*>(buffer);
    const wxChar* endBuffer = buffer + bufferLen;
    for (;;)
    {
        const int ret = KMP_Find(p, key, keyLen);
        if (ret == -1)
            break;

        // check previous char
        p += ret;
        if (p > buffer)
        {
            const wxChar ch = *(p - 1);
            if (ch == _T('_') || wxIsalnum(ch))
            {
                p += keyLen;
                continue;
            }
        }

        // check next char
        p += keyLen;
        if (p < endBuffer)
        {
            const wxChar ch = *p;
            if (ch == _T('_') || wxIsalnum(ch))
                continue;
        }

        // got it
        pos = p - buffer - keyLen;
        break;
    }

    return pos;
}
Where, the arguments are:
Code
[debug]> info args
[debug]this = 0x41d4200
[debug]buffer = 0x41d7bf0 L"text ## line"
[debug]bufferLen = 12
[debug]key = 0x41d7868 L"text"
[debug]keyLen = 4
[debug]>>>>>>cb_gdb:

this = 0x41d4200
buffer = 0x41d7bf0 L"text ## line"
bufferLen = 12
key = 0x41d7868 L"text"
keyLen = 4
« Last Edit: January 05, 2019, 08:25:24 am by ollydbg »
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

Offline Miguel Gimenez

  • Developer
  • Lives here!
  • *****
  • Posts: 1553
Re: The 30 December 2018 build (11543) is out.
« Reply #38 on: January 05, 2019, 11:46:45 am »
You can check this code adapted from your link:

Code
void Tokenizer::KMP_GetNextVal(const wxChar* pattern, int next[])
{
    int len = 0;

    next[0] = 0;  // CB code uses -1

    int i = 1;
    while (pattern[i] != _T('\0'))
    {
        if (pattern[i] == pattern[len])
        {
            len++;
            next[i] = len;
            i++;
        }
        else
        {
            if (len)
            {
                len = next[len - 1];
            }
            else
            {
                next[i] = 0;
                i++;
            }
        }
    }
}

Also, I would try using a std::vector or new[] instead of a dynamic array for next.
« Last Edit: January 05, 2019, 01:56:40 pm by Miguel Gimenez »

Offline oBFusCATed

  • Developer
  • Lives here!
  • *****
  • Posts: 13413
    • Travis build status
Re: The 30 December 2018 build (11543) is out.
« Reply #39 on: January 05, 2019, 11:52:36 am »
Also, I would try using a std::vector or new[] instead of a dynamic array for next.
And you'll make the parser slow... There is already a known limit on the number of elements in this array. It can be used. But this won't solve the problem.
(most of the time I ignore long posts)
[strangers don't send me private messages, I'll ignore them; post a topic in the forum, but first read the rules!]

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 5910
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
Re: The 30 December 2018 build (11543) is out.
« Reply #40 on: January 06, 2019, 05:00:50 pm »
The crash issue of the KMP algorithm is fixed in trunk now. The "next" array should have the same length as the "pattern" array, so access beyond the array is not allowed.
@Miguel Gimenez  @oBFusCATed
Thanks for the help.

EDIT:
However, there are still some performance issue. For example, for a fixed "pattern" string, we will generate the "next" array many times. :(
« Last Edit: January 06, 2019, 05:03:33 pm by ollydbg »
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

Offline Miguel Gimenez

  • Developer
  • Lives here!
  • *****
  • Posts: 1553
Re: The 30 December 2018 build (11543) is out.
« Reply #41 on: January 09, 2019, 01:49:01 pm »
There is a new crash in KMP_Find() with revision 11547 and wxwidgets 3.1.1. To reproduce, just open CodeBlocks_wx31.workspace and wait (it happens sometimes, specially when you start compiling just after loading).

Code
codeblocks.exe caused an Access Violation at location 0CAC57FA in module codecompletion.dll Reading from location 2A13C000.

Registers:
eax=2a13c000 ebx=2523b6f0 ecx=00000000 edx=00001390 esi=2523c740 edi=77de22b0
eip=0cac57fa esp=2523b6e0 ebp=2523b728 iopl=0         nv up ei pl nz na po nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010206

AddrPC   Params
0CAC57FA 2A13AC70 00001022 2A17F5D4  codecompletion.dll!KMP_Find  [G:/Codeblocks/src/plugins/codecompletion/parser/tokenizer.cpp @ 1716]
  1714:     while ( i < textLen && j < patternLen)
  1715:     {
> 1716:         if (j == -1 || text[i] == pattern[j])
  1717:         {
  1718:             ++i;
0CAC61CB 2A139C94 00001022 2A17F5D4  codecompletion.dll!GetFirstTokenPosition  [G:/Codeblocks/src/plugins/codecompletion/parser/tokenizer.cpp @ 1928]
  1926:     for (;;)
  1927:     {
> 1928:         const int ret = KMP_Find(p, bufferLen, key, keyLen);
  1929:         if (ret == -1)
  1930:             break;
0CAC5C30 25F51558 2523B92C 2523B8F8  codecompletion.dll!GetMacroExpandedText  [G:/Codeblocks/src/plugins/codecompletion/parser/tokenizer.cpp @ 1843]
  1841:         {
  1842:             // find the first exist of formal arg from data to dataEnd
> 1843:             const int pos = GetFirstTokenPosition(data, dataEnd - data, key, keyLen);
  1844:             if (pos != -1)
  1845:             {

Full report is attached.

Windows 7 64 bits

EDIT: running with GDB (previous attempts didn't crash) I get:

text = L")"
textLen = 43
pattern = L"T"
patternLen = 1

textLen seems incorrect.

Code
0x0c7c57fa in Tokenizer::KMP_Find (this=0x228ce570, text=0x2bfc5fc2 L")", textLen=43, pattern=0x2bfc56ec L"T", patternLen=1)
    at G:\Codeblocks\src\plugins\codecompletion\parser\tokenizer.cpp:1716
1716            if (j == -1 || text[i] == pattern[j])
(gdb) bt
#0  0x0c7c57fa in Tokenizer::KMP_Find (this=0x228ce570, text=0x2bfc5fc2 L")", textLen=43, pattern=0x2bfc56ec L"T", patternLen=1)
    at G:\Codeblocks\src\plugins\codecompletion\parser\tokenizer.cpp:1716
#1  0x0c7c61cb in Tokenizer::GetFirstTokenPosition (this=0x228ce570,
    buffer=0x2bfc5f6e L", name, base, class wxARRAY_DEFAULT_EXPORT)", bufferLen=43, key=0x2bfc56ec L"T", keyLen=1)
    at G:\Codeblocks\src\plugins\codecompletion\parser\tokenizer.cpp:1928
#2  0x0c7c5c30 in Tokenizer::GetMacroExpandedText (this=0x228ce570, tk=0x2569fc30, expandedText=...)
    at G:\Codeblocks\src\plugins\codecompletion\parser\tokenizer.cpp:1843
#3  0x0c7c55a6 in Tokenizer::ReplaceMacroUsage (this=0x228ce570, tk=0x2569fc30)
    at G:\Codeblocks\src\plugins\codecompletion\parser\tokenizer.cpp:1665
#4  0x0c7c38ee in Tokenizer::CheckMacroUsageAndReplace (this=0x228ce570)
    at G:\Codeblocks\src\plugins\codecompletion\parser\tokenizer.cpp:1089
#5  0x0c7c30c2 in Tokenizer::DoGetToken (this=0x228ce570) at G:\Codeblocks\src\plugins\codecompletion\parser\tokenizer.cpp:966
#6  0x0c7c2e88 in Tokenizer::PeekToken (this=0x228ce570) at G:\Codeblocks\src\plugins\codecompletion\parser\tokenizer.cpp:904
#7  0x0c7b060d in ParserThread::HandleFunction (this=0x228ce568, name=..., isOperator=false, isPointer=false)
    at G:\Codeblocks\src\plugins\codecompletion\parser\parserthread.cpp:2463
#8  0x0c7aa705 in ParserThread::DoParse (this=0x228ce568) at G:\Codeblocks\src\plugins\codecompletion\parser\parserthread.cpp:1188
#9  0x0c7ad146 in ParserThread::HandleNamespace (this=0x228ce568)
    at G:\Codeblocks\src\plugins\codecompletion\parser\parserthread.cpp:1755
#10 0x0c7aa0aa in ParserThread::DoParse (this=0x228ce568) at G:\Codeblocks\src\plugins\codecompletion\parser\parserthread.cpp:1043
#11 0x0c7a8295 in ParserThread::Parse (this=0x228ce568) at G:\Codeblocks\src\plugins\codecompletion\parser\parserthread.cpp:533
#12 0x0c7d37e2 in ParserThread::Execute (this=0x228ce568) at G:/Codeblocks/src/plugins/codecompletion/parser/parserthread.h:192
#13 0x020264d8 in cbThreadPool::cbWorkerThread::Entry (this=0x226f3e88) at G:\Codeblocks\src\sdk\cbthreadpool.cpp:228
#14 0x645d34d0 in wxThread::CallEntry() () from C:\Windows\system32\wxmsw311u_gcc_custom.dll
#15 0x645d4cc2 in wxThreadInternal::DoThreadStart(wxThread*) () from C:\Windows\system32\wxmsw311u_gcc_custom.dll
#16 0x645d54c3 in wxThreadInternal::WinThreadStart(void*)@4 () from C:\Windows\system32\wxmsw311u_gcc_custom.dll
#17 0x00000000 in ?? ()
« Last Edit: January 09, 2019, 02:46:02 pm by Miguel Gimenez »

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 5910
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
Re: The 30 December 2018 build (11543) is out.
« Reply #42 on: January 09, 2019, 02:54:06 pm »
There is a new crash in KMP_Find() with revision 11547 and wxwidgets 3.1.1. To reproduce, just open CodeBlocks_wx31.workspace and wait (it happens sometimes, specially when you start compiling just after loading).
...
Thanks for the report, I will try to reproduce this crash tonight.
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 5910
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
Re: The 30 December 2018 build (11543) is out.
« Reply #43 on: January 09, 2019, 04:30:42 pm »
There is a new crash in KMP_Find() with revision 11547 and wxwidgets 3.1.1. To reproduce, just open CodeBlocks_wx31.workspace and wait (it happens sometimes, specially when you start compiling just after loading).
...
Thanks for the report, I will try to reproduce this crash tonight.
OK, I fixed this bug in trunk r11548. The bug is introduced in rev11545. Though I can't reproduce a crash in my PC, but I think the fix is quite obvious.
You can see the commit message of r11548. Thanks!
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

Offline Miguel Gimenez

  • Developer
  • Lives here!
  • *****
  • Posts: 1553
Re: The 30 December 2018 build (11543) is out.
« Reply #44 on: January 09, 2019, 04:56:59 pm »
It works now, thank you for the quick fix.