Author Topic: use PPToken for preprocessor in native CC  (Read 2224 times)

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 6023
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
use PPToken for preprocessor in native CC
« on: September 17, 2024, 04:51:01 pm »
I have made some improvements to the preprocessor in our legacy Code completion plugin; see

https://github.com/asmwarrior/codeblocks_sfmirror/tree/master

I would like the high level parser to use "id compare" instead of "string compare".
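
To illustrate the idea, here is a minimal sketch. It is not the code from the commit: the enum values, the SketchToken struct and the classify() helper are made up for this example, and std::string stands in for wxString. The tokenizer pays for the string compare once, and the parser then compares ids only.

```cpp
#include <cassert>
#include <string>

// Hypothetical token kinds; the real PPTokenKind in the commit may differ.
enum class SketchKind { Identifier, KeywordIf, KeywordElse, EndOfFile };

// Hypothetical token: the lexeme plus an id assigned once by the tokenizer.
struct SketchToken
{
    std::string lexeme;
    SketchKind  kind = SketchKind::Identifier;
};

// The tokenizer does the string compare a single time per token ...
inline SketchToken classify(const std::string& text)
{
    SketchToken tok;
    tok.lexeme = text;
    if (text == "if")
        tok.kind = SketchKind::KeywordIf;
    else if (text == "else")
        tok.kind = SketchKind::KeywordElse;
    return tok;
}

// ... so the high level parser only needs an id (integer) compare.
inline bool isIfKeyword(const SketchToken& tok)
{
    return tok.kind == SketchKind::KeywordIf;
}

// Usage example.
inline void idCompareExample()
{
    assert(isIfKeyword(classify("if")));
    assert(!isIfKeyword(classify("iffy")));
}
```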

Comments are welcome, thanks.
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

Offline Miguel Gimenez

  • Developer
  • Lives here!
  • *****
  • Posts: 1632
Re: use PPToken for preprocessor in native CC
« Reply #1 on: September 17, 2024, 05:58:45 pm »
I have been looking at this commit (the exact location of the changes was not specified).

PPToken looks to me just a way to pack token information, so I assume the real benefit is using the deque afterwards.

I just suggest changing
Code
if (m_PPTokenStream.size() > 0)
to this
Code
if (!m_PPTokenStream.empty())
and removing this part
Code
    else
        ;// peekToken.Clear();

Offline ollydbg

Re: use PPToken for preprocessor in native CC
« Reply #2 on: September 18, 2024, 12:10:55 am »
Quote
I have been looking at this commit (the exact location of the changes was not specified).

PPToken looks to me just a way to pack token information, so I assume the real benefit is using the deque afterwards.

Thanks for the comment.

The "deque" here is used to move the token cursor forward or backward: we have interfaces to "peek token" (look ahead) and "undo token" (move the cursor backward), so I think a "deque" is a good structure to use.
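
A rough sketch of that cursor idea follows. It is one possible layout only and not necessarily how the commit does it: the class SketchTokenStream and its member names are hypothetical, and plain std::string tokens are used instead of PPToken.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

// Hypothetical stream: tokens are buffered in a deque, a cursor marks the next one.
class SketchTokenStream
{
public:
    explicit SketchTokenStream(std::deque<std::string> tokens)
        : m_Tokens(std::move(tokens)) {}

    // Look ahead without moving the cursor; an empty string means end of input.
    std::string peek() const
    {
        return m_Cursor < m_Tokens.size() ? m_Tokens[m_Cursor] : std::string();
    }

    // Return the current token and move the cursor forward.
    std::string get()
    {
        std::string tok = peek();
        if (m_Cursor < m_Tokens.size())
            ++m_Cursor;
        return tok;
    }

    // "Undo token": move the cursor one token backward, if possible.
    void unget()
    {
        if (m_Cursor > 0)
            --m_Cursor;
    }

private:
    std::deque<std::string> m_Tokens;
    std::size_t             m_Cursor = 0;
};

// Usage example.
inline void tokenStreamExample()
{
    std::deque<std::string> toks = {"#if", "FOO", "#endif"};
    SketchTokenStream stream(toks);
    assert(stream.peek() == "#if");   // peeking does not consume
    assert(stream.get() == "#if");
    assert(stream.get() == "FOO");
    stream.unget();                   // step back one token
    assert(stream.peek() == "FOO");
}
```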



Quote

I just suggest changing
Code
if (m_PPTokenStream.size() > 0)
to this
Code
if (!m_PPTokenStream.empty())

Thanks, but what's the difference? Maybe the "empty()" function runs much faster?
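
As a side note on this question: assuming m_PPTokenStream is a std::deque, both size() and empty() run in constant time, so speed is not the point here. The suggestion is mostly about readability, since empty() states the intent directly, and some checkers (for example clang-tidy's readability-container-size-empty) flag the size() form. A small sketch with made-up helper names shows that both checks agree:

```cpp
#include <cassert>
#include <deque>

// Both helpers are hypothetical; they only wrap the two spellings of the check.
inline bool hasTokensBySize(const std::deque<int>& stream)
{
    return stream.size() > 0;   // works, but reads as a size comparison
}

inline bool hasTokensByEmpty(const std::deque<int>& stream)
{
    return !stream.empty();     // same result, states the intent directly
}

// Usage example.
inline void emptyCheckExample()
{
    std::deque<int> stream;
    assert(hasTokensBySize(stream) == hasTokensByEmpty(stream));
    stream.push_back(42);
    assert(hasTokensBySize(stream) == hasTokensByEmpty(stream));
}
```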

Quote
and removing this part
Code
    else
        ;// peekToken.Clear();

I will read this part of the code later.

Thanks.

Offline blauzahn

  • Almost regular
  • **
  • Posts: 179
Re: use PPToken for preprocessor in native CC
« Reply #3 on: September 18, 2024, 09:08:10 am »
I very briefly glanced across the commit pointed to by the link.

findings:
  • class PPToken: data-member m_Kind is left uninitialized when PPToken is default initialized and when initialized via the ctor with 4 args (while having 5 data-members).
      The call to the latter one is also a little error-prone. It might easily be used incorrectly because it has 3 int args. I'd consider member-initializers for the 4 integral data
      members and delete the default-ctor.
  • The cctor of PPToken is unnecessary and breaks the rule-of-0 without need. It might also copy m_Kind from
      an uninitialized data-member. That is undefined behaviour. IMHO the cctor should be removed.
  • The compound statement after  if (IsEOF()) is repeated. It sets 2 data-members of PPToken and should be delegated to PPToken.
Cheers
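
The first finding could be addressed roughly as in the sketch below. This is an assumption-laden illustration, not the class from the commit: the type names SketchPPKind and SketchPPToken, the member names other than m_Lexeme and m_Kind, and the Unknown enumerator are invented, and std::string replaces wxString. With default member initializers no constructor has to be written at all, and m_Kind can never be read uninitialized.

```cpp
#include <cassert>
#include <string>

// Hypothetical kinds; "Unknown" is an assumed default value.
enum class SketchPPKind { Unknown, Identifier, EndOfFile };

// No user-written constructors: every integral member has a default member
// initializer, so a default-initialized token is always in a defined state.
struct SketchPPToken
{
    std::string  m_Lexeme;                          // default-constructs to ""
    SketchPPKind m_Kind      = SketchPPKind::Unknown;
    int          m_CharIndex = 0;
    int          m_LineIndex = 0;
    int          m_NestLevel = 0;
};

// Usage example: members are set by name, so the three ints cannot be swapped
// by accident the way positional constructor arguments can.
inline void defaultInitExample()
{
    SketchPPToken tok;
    assert(tok.m_Kind == SketchPPKind::Unknown);
    tok.m_Lexeme    = "FOO";
    tok.m_Kind      = SketchPPKind::Identifier;
    tok.m_LineIndex = 12;
    assert(tok.m_NestLevel == 0);
}
```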


Offline ollydbg

Re: use PPToken for preprocessor in native CC
« Reply #4 on: September 18, 2024, 10:28:52 am »
Quote
I very briefly glanced across the commit pointed to by the link.

findings:

Thanks for the comment.

Quote
class PPToken: data-member m_Kind is left uninitialized when PPToken is default initialized and when initialized via the ctor with 4 args (while having 5 data-members).
  The call to the latter one is also a little error-prone. It might easily be used incorrectly because it has 3 int args. I'd consider member-initializers for the 4 integral data
  members and delete the default-ctor.

Oh, yes, I should initialize the m_Kind member variable in the default constructor and in the other constructors.
About the arguments of "PPToken(wxString lexeme, int charIndex, int lineIndex, int nestLevel)": I really don't know where "charIndex" comes from; I will look into it.


Quote
The cctor of PPToken is unnecessary and breaks the rule-of-0 without need. It might also copy m_Kind from
  an uninitialized data-member. That is undefined behaviour. IMHO the cctor should be removed.

My initial thought was that PPToken's copy constructor is needed to construct the elements in the deque, since in some cases a PPToken gets copied into the deque. Am I wrong?

Oh, you are correct.

Quote
In C++, the "Rule of Zero" is a guideline that suggests avoiding writing custom constructors, destructors, or copy/move assignment operators if the default compiler-generated versions will suffice. The rule states that if a class does not need custom resource management (like dynamic memory allocation), it can rely on the compiler-generated special member functions.

So, the copy constructor is not needed here, because the compiler will generate the same one if I remove it.
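
A small sketch of this point, using a hypothetical SketchCopyToken with std::string in place of wxString: the deque can still copy the token even though no copy constructor is written. One extra detail that is certain from the language rules: a user-declared copy constructor suppresses the implicitly generated move constructor, so removing it also lets the container move tokens.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <type_traits>

// Hypothetical token with no user-written special member functions.
struct SketchCopyToken
{
    std::string lexeme;
    int         kind = 0;
};

// The compiler-generated copy and move constructors are available.
static_assert(std::is_copy_constructible<SketchCopyToken>::value,
              "implicit copy constructor exists");
static_assert(std::is_move_constructible<SketchCopyToken>::value,
              "implicit move constructor exists");

// Builds a one-element stream; push_back copies tok with the implicit cctor.
inline std::deque<SketchCopyToken> makeStream()
{
    std::deque<SketchCopyToken> stream;
    SketchCopyToken tok;
    tok.lexeme = "#define";
    tok.kind   = 1;
    stream.push_back(tok);
    return stream;
}

// Usage example.
inline void copyExample()
{
    const std::deque<SketchCopyToken> stream = makeStream();
    assert(stream.size() == 1);
    assert(stream.front().lexeme == "#define");
}
```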


Quote
The compound statement after  if (IsEOF()) is repeated. It sets 2 data-members of PPToken and should be delegated to PPToken.

Do you mean that the

Code
    /** Check whether the Tokenizer reaches the end of the buffer (file) */
    bool IsEOF() const
    {
        return m_TokenIndex >= m_BufferLen;
    }

should be removed from the high level parser, and that we instead return a PPToken whose m_Kind field is set to "EOF"?

Thanks.


Offline blauzahn

Re: use PPToken for preprocessor in native CC
« Reply #5 on: September 18, 2024, 04:47:04 pm »
No, I mean this:
Code
        if (IsEOF())
        {
            m_Lex = wxEmptyString;
            m_Lex.m_Lexeme = wxEmptyString;
            m_Lex.m_Kind = PPTokenKind::EndOfFile;
I haven't looked into the context, but setting several data-members is usually none of the caller's business. In addition, the data-member m_Lexeme was unnecessarily set twice, once through the implicit operator and once directly. Although I mostly avoid setters, a primitive one here might look like:
Code
class PPToken
{
  // ...
  void setEof()
  {
    m_Lexeme = wxEmptyString;
    m_Kind = PPTokenKind::EndOfFile;
  }
  // ...
};

if (IsEOF())
{
  m_Lex.setEof();
  return false;
}
It's PPToken's responsibility to decide what to do with its data-members when tagged as eof. Granted, they are public, so it cannot maintain an invariant anyway.
 

Offline blauzahn

Re: use PPToken for preprocessor in native CC
« Reply #6 on: September 18, 2024, 04:49:27 pm »
btw: Have you tried to use the gcc/clang sanitizer? It should be able to spot the ub.

Offline ollydbg

Re: use PPToken for preprocessor in native CC
« Reply #7 on: September 19, 2024, 12:23:40 am »
Quote
No, I mean this:
...
...
It's PPToken's responsibility to decide what to do with its data-members when tagged as eof. Granted, they are public, so it cannot maintain an invariant anyway.

Thanks, I understand your idea now.

Quote
btw: Have you tried to use the gcc/clang sanitizer? It should be able to spot the ub.

I think that on Windows there is no such tool in the msys2/gcc environment. Am I correct?

The only tool I know of is ssbssa/heob, which detects buffer overruns and memory leaks.

But its log output is hard to read, because the log is always long.

Offline blauzahn

Re: use PPToken for preprocessor in native CC
« Reply #8 on: September 19, 2024, 09:45:40 am »
Quote
I think under windows, there is no such tool under the msys2/gcc environment. Am I correct?
I do not know. Look for (lib)asan. A quick search gave links like:

https://github.com/msys2/MSYS2-packages/discussions/3020


Offline ollydbg

Re: use PPToken for preprocessor in native CC
« Reply #9 on: September 19, 2024, 02:51:49 pm »
Quote
I think under windows, there is no such tool under the msys2/gcc environment. Am I correct?
I do not know. Look for (lib)asan. A quick search gave links like:

https://github.com/msys2/MSYS2-packages/discussions/3020

Thanks, but after reading that discussion, I think that feature is not implemented yet, at least for the mingw64/gcc platform in msys2.  :(

Offline ollydbg

Re: use PPToken for preprocessor in native CC
« Reply #10 on: September 20, 2024, 01:02:10 pm »
FYI:

I have added the fix commits to the branch:  https://github.com/asmwarrior/codeblocks_sfmirror/tree/master
The GitHub Actions build of my master branch (Windows 64-bit version) is now available at: main64

Happy coding.  ;)

Offline ollydbg

Re: use PPToken for preprocessor in native CC
« Reply #11 on: September 21, 2024, 04:27:31 am »
I found a more detailed answer about how address-sanitizer-like tools work on Windows, but sadly mingw64/gcc is not included.

See here:
Compilers that support sanitizers (address, UB etc.) on Windows

