Code::Blocks Forums

Developer forums (C::B DEVELOPMENT STRICTLY!) => Development => CodeCompletion redesign => Topic started by: ollydbg on September 17, 2024, 04:51:01 pm

Title: use PPToken for preprocessor in native CC
Post by: ollydbg on September 17, 2024, 04:51:01 pm
I have made some improvements to the preprocessor in our legacy code completion plugin; see:

https://github.com/asmwarrior/codeblocks_sfmirror/tree/master

My goal is to use "id compare" instead of "string compare" in the high-level parser.

Comments are welcome, thanks.
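
For readers who have not opened the commit, here is a rough sketch of what "id compare" versus "string compare" can look like. It is not the code from the branch: std::string stands in for wxString, and the enumerators, ClassifyLexeme, MakeToken and IsIfKeyword are hypothetical names chosen for illustration. The idea is that the tokenizer classifies each lexeme once, and the high-level parser then only compares small integer-like ids.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical token kinds; the real enumerators in the commit may differ.
enum class PPTokenKind { Identifier, KeywordIf, KeywordElse, EndOfFile };

struct PPToken
{
    std::string m_Lexeme;                          // wxString in the plugin
    PPTokenKind m_Kind = PPTokenKind::Identifier;  // never left uninitialized
};

// Classify a lexeme once, in the tokenizer (the only string lookup).
inline PPTokenKind ClassifyLexeme(const std::string& lexeme)
{
    static const std::unordered_map<std::string, PPTokenKind> keywords = {
        {"if", PPTokenKind::KeywordIf},
        {"else", PPTokenKind::KeywordElse},
    };
    const auto it = keywords.find(lexeme);
    return it != keywords.end() ? it->second : PPTokenKind::Identifier;
}

// Build a token whose kind is already resolved.
inline PPToken MakeToken(const std::string& lexeme)
{
    PPToken token;
    token.m_Lexeme = lexeme;
    token.m_Kind = ClassifyLexeme(lexeme);
    return token;
}

// The high-level parser compares ids, not strings.
inline bool IsIfKeyword(const PPToken& token)
{
    return token.m_Kind == PPTokenKind::KeywordIf;
}
```

With this shape, a check such as IsIfKeyword(MakeToken("if")) is a single enum comparison in the parser, and the string lookup happens only once per token.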
Title: Re: use PPToken for preprocessor in native CC
Post by: Miguel Gimenez on September 17, 2024, 05:58:45 pm
I have been looking at this commit (https://github.com/asmwarrior/codeblocks_sfmirror/commit/9dc5777ddf05aeb599c72dc8bbfd8d03e14b68f0) (the exact location of the changes was not specified).

PPToken looks to me like just a way to pack token information, so I assume the real benefit comes from using the deque afterwards.

I just suggest changing
Code
if (m_PPTokenStream.size() > 0)
to this
Code
if (!m_PPTokenStream.empty())
and removing this part
Code
    else
        ;// peekToken.Clear();
Title: Re: use PPToken for preprocessor in native CC
Post by: ollydbg on September 18, 2024, 12:10:55 am
Quote
I have been looking at this commit (https://github.com/asmwarrior/codeblocks_sfmirror/commit/9dc5777ddf05aeb599c72dc8bbfd8d03e14b68f0) (the exact location of the changes was not specified).

PPToken looks to me like just a way to pack token information, so I assume the real benefit comes from using the deque afterwards.

Thanks for the comment.

The "deque" here is used to move the token cursor forward or backward. We have interfaces to "peek token" (look ahead) and "undo token" (move the cursor backward), so I think a "deque" is a good structure to use.
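
To make the cursor idea concrete, here is a minimal sketch under my own assumptions. It is not the tokenizer from the branch: TokenCursor and all of its member names are hypothetical, and plain std::string lexemes stand in for PPToken. New tokens are appended at the back, the cursor is an index into the buffer, and consumed tokens can be dropped cheaply from the front.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

// Hypothetical stand-in for the PPToken stream; names are illustrative only.
class TokenCursor
{
public:
    // Append a freshly lexed token at the back of the buffer.
    void Push(std::string lexeme) { m_Tokens.push_back(std::move(lexeme)); }

    // True when the cursor has consumed every buffered token.
    bool AtEnd() const { return m_Cursor >= m_Tokens.size(); }

    // "Peek token": look at the next token without consuming it.
    const std::string& Peek() const
    {
        assert(!AtEnd());
        return m_Tokens[m_Cursor];
    }

    // Consume the next token and move the cursor forward.
    const std::string& Get()
    {
        assert(!AtEnd());
        return m_Tokens[m_Cursor++];
    }

    // "Undo token": move the cursor one token backward.
    void Unget()
    {
        assert(m_Cursor > 0);
        --m_Cursor;
    }

    // Drop tokens behind the cursor that will not be revisited.
    void DiscardConsumed()
    {
        while (m_Cursor > 0)
        {
            m_Tokens.pop_front();
            --m_Cursor;
        }
    }

private:
    std::deque<std::string> m_Tokens;
    std::size_t m_Cursor = 0;
};
```

The reason a deque fits here is that both push_back and pop_front are constant time, while indexed access for the cursor is still available.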



Quote

I just suggest changing
Code
if (m_PPTokenStream.size() > 0)
to this
Code
if (!m_PPTokenStream.empty())

Thanks, but what's the difference? Maybe the "empty()" function runs much faster?
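
For reference: with std::deque both size() and empty() run in constant time, so no speed difference should be expected. The usual argument for empty() is that it states the intent directly and is available on every standard container, including std::forward_list, which has no size() at all. A small sketch, where HasPendingTokens is a hypothetical helper name and not something from the plugin:

```cpp
#include <deque>
#include <forward_list>

// Works for any standard container, because all of them provide empty().
// A version written with size() > 0 would not compile for std::forward_list.
template <typename Container>
bool HasPendingTokens(const Container& tokens)
{
    return !tokens.empty();
}
```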

Quote
and removing this part
Code
    else
        ;// peekToken.Clear();

I will read this part of the code later.

Thanks.
Title: Re: use PPToken for preprocessor in native CC
Post by: blauzahn on September 18, 2024, 09:08:10 am
I very briefly glanced across the commit pointed to by the link.

findings:
Cheers

Title: Re: use PPToken for preprocessor in native CC
Post by: ollydbg on September 18, 2024, 10:28:52 am
Quote
I very briefly glanced across the commit pointed to by the link.

findings:

Thanks for the comment.

Quote
class PToken: data member m_Kind is left uninitialized when PToken is default-initialized and when it is initialized via the ctor with 4 args (while having 5 data members).
  The call to the latter is also a little error-prone. It might easily be used incorrectly because it has 3 int args. I'd consider member initializers for the 4 integral data
  members and deleting the default ctor.

Oh, yes, I should initialize the m_Kind member variable in the default constructor and in the other constructors.
About the constructor "PPToken(wxString lexeme, int charIndex, int lineIndex, int nestLevel)": I really don't know where "charIndex" comes from; I will look into it.


Quote
The cctor of PToken is unnecessary and breaks the rule of 0 without need. It might also copy m_Kind from
  an uninitialized data member. That is undefined behaviour. IMHO the cctor should be removed.

My initial thought was that PPToken needs a copy constructor because elements are sometimes copied when they are placed into the deque. Am I wrong?

Oh, you are correct:

Quote
In C++, the "Rule of Zero" is a guideline that suggests avoiding writing custom constructors, destructors, or copy/move assignment operators if the default compiler-generated versions will suffice. The rule states that if a class does not need custom resource management (like dynamic memory allocation), it can rely on the compiler-generated special member functions.

So, the copy constructor is not needed here, because the compiler will generate the same one if I remove it.


Quote
The compound statement after  if (IsEOF()) is repeated. It sets 2 data-members of PToken and should be delegated to PToken.

Do you mean that the

Code
    /** Check whether the Tokenizer reaches the end of the buffer (file) */
    bool IsEOF() const
    {
        return m_TokenIndex >= m_BufferLen;
    }

should be removed from the high-level parser, and that we can instead return a PPToken whose m_Kind field is set to "EOF"?

Thanks.

Title: Re: use PPToken for preprocessor in native CC
Post by: blauzahn on September 18, 2024, 04:47:04 pm
No, I mean this:
Code
if (IsEOF())
{
    m_Lex = wxEmptyString;
    m_Lex.m_Lexeme = wxEmptyString;
    m_Lex.m_Kind = PPTokenKind::EndOfFile;
I haven't looked into the context, but setting several data members is usually none of the caller's business. In addition, the data member m_Lexeme was unnecessarily set twice: once through the implicit operator and once directly. Although I mostly avoid setters, a primitive one here might look like:
Code
class PToken
{
  // ...
  void setEof()
  {
    m_Lexeme = wxEmptyString;
    m_Kind = PPTokenKind::EndOfFile;
  }
  // ...
};

if (IsEOF())
{
  m_Lex.setEof();
  return false;
}
It's PToken's responsibility to decide what to do with its data members when it is tagged as EOF. Granted, they are public, so it cannot maintain an invariant anyway.
 
Title: Re: use PPToken for preprocessor in native CC
Post by: blauzahn on September 18, 2024, 04:49:27 pm
btw: Have you tried to use the gcc/clang sanitizer? It should be able to spot the ub.
Title: Re: use PPToken for preprocessor in native CC
Post by: ollydbg on September 19, 2024, 12:23:40 am
Quote
No, I mean this:
...
...
It's PToken's responsibility to decide what to do with its data members when it is tagged as EOF. Granted, they are public, so it cannot maintain an invariant anyway.

Thanks, I understand your idea now.

Quote
btw: Have you tried to use the gcc/clang sanitizer? It should be able to spot the ub.

I think under windows, there is no such tool under the msys2/gcc environment. Am I correct?

All I know is a tool like: ssbssa/heob: Detects buffer overruns and memory leaks. (https://github.com/ssbssa/heob)

But it is also hard to read its log output, because the log is always long.
Title: Re: use PPToken for preprocessor in native CC
Post by: blauzahn on September 19, 2024, 09:45:40 am
Quote
I think under windows, there is no such tool under the msys2/gcc environment. Am I correct?
I do not know. Look for (lib)asan. A quick search gave links like:

https://github.com/msys2/MSYS2-packages/discussions/3020

Title: Re: use PPToken for preprocessor in native CC
Post by: ollydbg on September 19, 2024, 02:51:49 pm
Quote
I think under windows, there is no such tool under the msys2/gcc environment. Am I correct?
I do not know. Look for (lib)asan. A quick search gave links like:

https://github.com/msys2/MSYS2-packages/discussions/3020

Thanks, but after reading that discussion, I think the feature is not implemented yet, at least for the mingw64/gcc platform in msys2.  :(
Title: Re: use PPToken for preprocessor in native CC
Post by: ollydbg on September 20, 2024, 01:02:10 pm
FYI:

I have added the fix commits to the branch:  https://github.com/asmwarrior/codeblocks_sfmirror/tree/master
The GitHub Actions build of my master branch (Windows 64-bit version) is now available: main64 (https://github.com/asmwarrior/x86-codeblocks-builds/actions/runs/10955598063)

Happy coding.  ;)
Title: Re: use PPToken for preprocessor in native CC
Post by: ollydbg on September 21, 2024, 04:27:31 am
I found a more detailed answer about how AddressSanitizer-like tools work under Windows, but sadly mingw64/gcc is not included.

See here:
Compilers that support sanitizers (address, UB etc.) on Windows (https://stackoverflow.com/questions/55480333/clang-8-with-mingw-w64-how-do-i-use-address-ub-sanitizers)