Also, I have moved the "handling conditional preprocessor" code to the Tokenizer class.
I think it is better to move the code for conditional preprocessor directives from ParserThread to the Tokenizer class.
At the moment this is done in ParserThread, so we need special code to check whether there is an #if. It gets even worse in a case like:
void function( A a, B b
#ifdef XXXX
, C c)
#else
, C c, D d)
#endif
Then
( A a, B b #ifdef XXXX, C c)
will be returned as a single token. The Tokenizer returns everything between a pair of parentheses as one token, so the conditional directives end up unbalanced.
For example: CC fails to parse parts of boost (http://forums.codeblocks.org/index.php/topic,11909.0.html)
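A small model shows why this breaks. NaiveParenBlock below is a hypothetical helper, not CC code. It mimics "return a whole ( ... ) group as one token" by counting parentheses and treating preprocessor lines as plain text. For the declaration above, the returned token contains "#ifdef XXXX" but not the matching "#else"/"#endif".

```cpp
#include <cassert>
#include <string>

// Hypothetical model of returning a whole "( ... )" group as one token:
// start at the first '(' and stop at the matching ')'. Preprocessor
// lines are treated as ordinary text, which is exactly the problem.
std::string NaiveParenBlock(const std::string& text)
{
    std::string::size_type start = text.find('(');
    if (start == std::string::npos)
        return std::string();
    int depth = 0;
    for (std::string::size_type i = start; i < text.size(); ++i)
    {
        if (text[i] == '(')
            ++depth;
        else if (text[i] == ')' && --depth == 0)
            return text.substr(start, i - start + 1);
    }
    return text.substr(start); // unbalanced: runs to the end of the input
}

// True if 'what' occurs somewhere in 's'.
bool Contains(const std::string& s, const std::string& what)
{
    return s.find(what) != std::string::npos;
}
```

On the declaration above, the token closes at the first ")", inside the #ifdef branch. The #else and #endif are left behind for the parser to meet on their own.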
As another example, when parserthread.cpp handles a class, it has to check for #if statements like:
// handle preprocessor directives in class definition, e.g.
class MyClass
#ifdef FOO
: public MyClass1
, public MyClass2
#endif
{}
This modification lets the Tokenizer feed ParserThread in a transparent and consistent way.
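Here is one way to picture what moving this into the Tokenizer buys us. If the conditional lines are filtered out before tokens are produced, ParserThread never sees them. The sketch below is a hypothetical line-based filter; KeepFirstBranch is an illustrative name, not a CC function. It assumes the policy "keep the #if/#ifdef/#ifndef branch, drop the #elif/#else branches". The real Tokenizer works character by character and may choose branches differently.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sketch: strip conditional-preprocessor lines before
// tokenizing. Assumed policy: keep the first branch of every #if,
// #ifdef or #ifndef, and drop #elif/#else branches.
std::string KeepFirstBranch(const std::string& source)
{
    std::istringstream in(source);
    std::string line, out;
    // One entry per open #if: true while still inside its first branch.
    std::vector<bool> inFirstBranch;
    while (std::getline(in, line))
    {
        // Extract the directive name, e.g. "ifdef" from "#ifdef FOO".
        std::string directive;
        std::string::size_type p = line.find_first_not_of(" \t");
        if (p != std::string::npos && line[p] == '#')
        {
            std::string::size_type q = line.find_first_not_of(" \t", p + 1);
            if (q != std::string::npos)
            {
                std::string::size_type e = line.find_first_of(" \t(", q);
                directive = line.substr(q, e == std::string::npos ? std::string::npos : e - q);
            }
        }

        if (directive == "if" || directive == "ifdef" || directive == "ifndef")
            inFirstBranch.push_back(true);
        else if (directive == "elif" || directive == "else")
        {
            if (!inFirstBranch.empty())
                inFirstBranch.back() = false;
        }
        else if (directive == "endif")
        {
            if (!inFirstBranch.empty())
                inFirstBranch.pop_back();
        }
        else
        {
            // Emit the line only if every enclosing #if is in its first branch.
            bool active = true;
            for (bool b : inFirstBranch)
                active = active && b;
            if (active)
                out += line + "\n";
        }
    }
    return out;
}
```

Applied to the class example, the result is plain "class MyClass : public MyClass1 , public MyClass2 {}" spread over lines, with no directives left for ParserThread to special-case.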
Another improvement to the parser: I have fixed two bugs in parsing the VC 2005/2008 header files. (Both the STL and the Windows API functions are now auto-completed correctly.)
Bug one: the Visual C++ header for std::string contains a statement like this:
enum
{ // length of internal buffer, [1, 16]
_BUF_SIZE = 16 / sizeof (_Elem) < 1 ? 1
: 16 / sizeof(_Elem)};
Note the "<". The parser tries to skip a "<...>" block, but there is no ">" here, so the parser skips all the way to EOF.
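The sketch below reproduces the failure. NaiveSkipAngle scans for the matching ">" and, if none exists, reports the end of the input. GuardedSkipAngle shows one possible guard; it is an assumption and not necessarily what the patch does. A template argument list in ordinary header code is very unlikely to contain ";", "{" or "}", so the scan gives up there and returns npos. The caller can then treat "<" as a less-than operator. Both function names are hypothetical.

```cpp
#include <cassert>
#include <string>

// Returns the position just past the '>' matching the '<' at 'pos',
// or text.size() if there is none: the "skip to EOF" behaviour.
std::string::size_type NaiveSkipAngle(const std::string& text, std::string::size_type pos)
{
    int depth = 0;
    for (std::string::size_type i = pos; i < text.size(); ++i)
    {
        if (text[i] == '<')
            ++depth;
        else if (text[i] == '>' && --depth == 0)
            return i + 1;
    }
    return text.size();
}

// Hypothetical guard: stop at characters that should not appear inside a
// template argument list and report failure with npos instead.
std::string::size_type GuardedSkipAngle(const std::string& text, std::string::size_type pos)
{
    int depth = 0;
    for (std::string::size_type i = pos; i < text.size(); ++i)
    {
        const char c = text[i];
        if (c == '<')
            ++depth;
        else if (c == '>' && --depth == 0)
            return i + 1;
        else if (c == ';' || c == '{' || c == '}')
            return std::string::npos; // '<' was probably "less than"
    }
    return std::string::npos;
}
```

On the _BUF_SIZE enum, the naive version reaches the end of the text, while the guarded one stops at the "}" that closes the enum. A real template such as std::vector<std::pair<int, int> > is still skipped as one block.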
Bug two:
The Windows API header files contain code like this:
__inline
BOOL
GetMessage(
LPMSG lpMsg,
HWND hWnd,
UINT wMsgFilterMin,
UINT wMsgFilterMax
)
{
#ifdef UNICODE
return GetMessageW(
#else
return GetMessageA(
#endif
lpMsg,
hWnd,
wMsgFilterMin,
wMsgFilterMax
);
}
The parser failed because there are #if directives inside the parentheses of the function call.
Both bugs were reported by Loaden.
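Here is the arithmetic behind bug two. Suppose the directive lines are ignored but both branches are kept. The call then contains two "(" (from "GetMessageW(" and "GetMessageA(") and only one ")", so the argument list never closes. NetParenDepth below is a hypothetical helper that counts the net depth to make this concrete.

```cpp
#include <cassert>
#include <string>

// Net parenthesis depth of a piece of code: +1 for each '(' and
// -1 for each ')'. Zero means the parentheses are balanced.
int NetParenDepth(const std::string& code)
{
    int depth = 0;
    for (char c : code)
    {
        if (c == '(')
            ++depth;
        else if (c == ')')
            --depth;
    }
    return depth;
}
```

With both branches kept the depth stays at 1. Once the #else branch is removed it returns to 0.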
@morten:
Sorry for my poor English explanation. I haven't included blueshake's patch in my package.
@JGM:
Thanks for the encouragement.
The aim of this parser tester project is to redirect all the "TRACE" logs to a simple standalone app instead of the "Debug Log" panel in the C::B host program. So, when you want to check whether parsing (which involves the ParserThread and Tokenizer classes) works correctly, you can check these log messages.
parsertest.cbp contains several source files taken directly from CC's source. In fact, I only added two extra source files. One is "parsertest.cpp", which creates a simple GUI window; the other is "parser2.cpp", which is a minimal mimic of "parser.cpp". All the header files come from the CC source.
1, Download the ParsertesterV1.zip package, extract the files to the "src/plugin/codecompletion/parser" folder, then open parsertest.cbp.
2, Build the project. You also need to supply a file named "test.cpp" as the source file to be parsed.
3, Run the generated app. All the "TRACE" messages will be shown in the ParserTester main frame.
4, A log.txt file will be saved after testing; you can examine it.
Note:
To let parsertest.cbp build successfully, I have changed the macros at the beginning of parserthread.cpp and tokenizer.cpp. For example, in parserthread.cpp:
#ifdef PARSER_TEST
extern void ParserTrace(const wxChar* format, ...);
#define TRACE(format, args...)\
ParserTrace(format , ## args)
#else
#if PARSERTHREAD_DEBUG_OUTPUT
#define TRACE(format, args...)\
Manager::Get()->GetLogManager()->DebugLog(F( format , ## args))
#else
#define TRACE(format, args...)
#endif
#endif
So TRACE is replaced by the "ParserTrace" routine, which does the hack. This modification doesn't break the build of the normal CodeCompletion plugin, so the interference with CC's source is minimal.
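Note that the "format, args..." parameter and the ", ## args" comma trick in these macros are GCC extensions. A portable spelling uses the standard __VA_ARGS__. The sketch below shows that form together with a minimal, hypothetical stand-in for ParserTrace(). It uses plain char and an in-memory vector (g_TraceLog is an illustrative name), whereas the real routine takes const wxChar* and shows the messages in the ParserTester frame.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical message store; the real tester shows messages in its
// main frame and writes them to log.txt.
std::vector<std::string> g_TraceLog;

// printf-style stand-in for ParserTrace(): formats the message and stores it.
void ParserTrace(const char* format, ...)
{
    char buffer[1024];
    va_list args;
    va_start(args, format);
    std::vsnprintf(buffer, sizeof(buffer), format, args); // truncates long messages
    va_end(args);
    g_TraceLog.push_back(buffer);
}

// Normally supplied by the build (e.g. -DPARSER_TEST); defined here so
// the sketch is self-contained.
#define PARSER_TEST

#ifdef PARSER_TEST
    // The format string is part of __VA_ARGS__, so no comma needs swallowing.
    #define TRACE(...) ParserTrace(__VA_ARGS__)
#else
    #define TRACE(...)
#endif
```

For example, TRACE("Restoring nesting level: %d (was %d)", 1, 2) stores "Restoring nesting level: 1 (was 2)", and TRACE("done") works without any extra arguments.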
Also, in this package I have made other improvements to the Tokenizer class. (Even without these improvements, you can still compile and build parsertest.cbp.)
One improvement is that I added "handling conditional preprocessor" to the Tokenizer::SkipUnwanted function. The first thing we check there is whether CurrentChar() is "#"; if it is, we handle "#if", "#else", "#elif" and "#endif". I commented out this code in Tokenizer::DoGetToken():
// else if (c == '(')
// {
// m_IsOperator = false;
//
// // skip blocks () []
// if (!SkipBlock(CurrentChar()))
// return wxEmptyString;
//
// str = FixArgument(m_Buffer.Mid(start, m_TokenIndex - start));
// CompactSpaces(str);
// }
The reason is that this code snippet never considers #if-like statements inside function arguments.
Instead, I added another function, Tokenizer::DoAdvanceGetToken:
wxString Tokenizer::DoAdvanceGetToken()
{
wxString str = DoGetToken();
if ( str == _T("(") )
{
int braceLevel = 1;
wxString temp;
TokenizerState undoTokenizerState;
undoTokenizerState = m_State;
m_State = tsSkipNone;
do
{
temp = DoGetToken();
str << temp;
if ( temp == _T("(") )
{
braceLevel++;
}
else if (temp == _T(")"))
{
braceLevel--;
}
}
while( braceLevel>0 && (!temp.IsEmpty()) );
m_State = undoTokenizerState;
}
return str;
}
Since "(" and ")" should always be matched, we still return "(XXXX,YYYY)" as a single token, but the conditional preprocessor code inside the parentheses is stripped.
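The loop in DoAdvanceGetToken can be modelled outside the Tokenizer. In the sketch below, AdvanceGetToken is a hypothetical stand-in and the "tokens" vector plays the role of successive DoGetToken() results after the conditional lines have been skipped. As in the original, tokens are appended without separators, and the loop stops when the brace level returns to zero or the input runs out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of DoAdvanceGetToken(): returns tokens[pos] and
// advances pos; if that token is "(", keeps appending tokens until the
// matching ")" or the end of input (DoGetToken() signals that with an
// empty string).
std::string AdvanceGetToken(const std::vector<std::string>& tokens, std::size_t& pos)
{
    if (pos >= tokens.size())
        return std::string();
    std::string str = tokens[pos++];
    if (str == "(")
    {
        int braceLevel = 1;
        while (braceLevel > 0 && pos < tokens.size())
        {
            const std::string& temp = tokens[pos++];
            str += temp;
            if (temp == "(")
                ++braceLevel;
            else if (temp == ")")
                --braceLevel;
        }
    }
    return str;
}
```

For the GetMessage call, the tokens "GetMessageW", "(", "lpMsg", ",", ... , ")", ";" come back as "GetMessageW", then "(lpMsg,hWnd,wMsgFilterMin,wMsgFilterMax)", then ";". Nested groups such as "(A(B))" stay in one piece.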
Here is another improvement. It is only needed to parse the VC 2005/2008 header files correctly, as I stated in my previous posts.
Edit:
We have only tested this "parsertest" project under Windows.
Important! To test the VC compiler header files, you also need to add these two macro replacement rules:
<_STD_BEGIN>
<![CDATA[-namespace std {]]>
</_STD_BEGIN>
<_STD_END>
<![CDATA[}]]>
</_STD_END>
In ParserTest.cpp, if you would like to add the replacement tokens, use these statements:
void Start()
{
Parser client(NULL);
wxString fileName = _T("test.cpp");
FileLoader* loader = new FileLoader(fileName);
(*loader)();
TokensTree* tree = new TokensTree();
Tokenizer::SetReplacementString(_T("_GLIBCXX_STD"), _T("std"));
Tokenizer::SetReplacementString(_T("_GLIBCXX_BEGIN_NESTED_NAMESPACE"), _T("+namespace"));
Tokenizer::SetReplacementString(_T("_GLIBCXX_END_NESTED_NAMESPACE"), _T("}"));
Tokenizer::SetReplacementString(_T("_GLIBCXX_BEGIN_NAMESPACE"), _T("+namespace"));
Tokenizer::SetReplacementString(_T("_GLIBCXX_END_NAMESPACE"), _T("}"));
Tokenizer::SetReplacementString(_T("_GLIBCXX_END_NAMESPACE_TR1"), _T("}"));
Tokenizer::SetReplacementString(_T("_GLIBCXX_BEGIN_NAMESPACE_TR1"), _T("-namespace tr1 {"));
// for VC2005/2008
Tokenizer::SetReplacementString(_T("_STD_BEGIN"), _T("-namespace std {"));
Tokenizer::SetReplacementString(_T("_STD_END"), _T("}"));
ParserThreadOptions opts;
opts.wantPreprocessor = false;
opts.useBuffer = false;
opts.bufferSkipBlocks = false;
opts.bufferSkipOuterBlocks = false;
opts.followLocalIncludes = false;
opts.followGlobalIncludes = false;
opts.loader = loader;
ParserThread* ph = new ParserThread(&client, fileName, true, opts, tree);
bool b = ph->Parse();
delete ph;
ShowLog();
}
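Conceptually, these calls fill a lookup table that the tokenizer consults for each identifier it reads. The class below is a simplified, hypothetical model; ReplacementTable, Set and Apply are illustrative names, not CC's API. It does not model the special meaning that CC attaches to a leading "+" or "-" in the replacement strings.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical model of a token replacement table: an identifier that
// has an entry is replaced by its text; anything else passes through.
class ReplacementTable
{
public:
    void Set(const std::string& from, const std::string& to) { m_Map[from] = to; }

    // Returns the replacement for 'token', or 'token' itself if none is set.
    std::string Apply(const std::string& token) const
    {
        std::map<std::string, std::string>::const_iterator it = m_Map.find(token);
        return it == m_Map.end() ? token : it->second;
    }

private:
    std::map<std::string, std::string> m_Map;
};
```

With the two VC rules registered, "_STD_BEGIN" is read as "-namespace std {" and "_STD_END" as "}", which is what lets the parser see the std namespace in the VC headers.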
Here is an improved ParserTester, version 2.
Loaden made a lot of GUI improvements and added a modeless "Find" dialog.
Also, we have moved the project sources to a separate folder instead of "/plugins/codecompletion/parser".
So, download the 7z package and unzip it. It should contain a "ptest" folder; copy this folder (including all the files) into "codecompletion/parser", so that all the source files end up in "codecompletion/parser/ptest".
You also need to apply these patches to let the "parserTester" project compile successfully. (This patch does not affect building the standard CC.)
This patch also fixes this problem: Insert all class method without implementation question (http://forums.codeblocks.org/index.php/topic,12091.0.html)
Comments are welcome!!!
By the way, I didn't change this code snippet in tokenizer.cpp, but TortoiseSVN always thinks I changed it...
#ifdef __WXMSW__ // This is a Windows only bug!
else if (c == 178 || c == 179 || c == 185) // fetch ² and ³
{
str = c;
MoveToNextChar();
}
#endif
I'm not sure why.
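Here is one plausible explanation, which is only a guess. The comment contains the non-ASCII characters ² and ³ (the values 178, 179 and 185 are the Latin-1 codes of ², ³ and ¹). If an editor saves the file in a different encoding, for example UTF-8 instead of Latin-1, each such character changes from one byte to two. The text looks identical, but SVN compares bytes and reports a change. EncodeUtf8 below is a hypothetical helper that shows the byte difference.

```cpp
#include <cassert>
#include <string>

// Encodes one code point below U+0800 as UTF-8. That range covers all
// Latin-1 characters such as the superscripts discussed here.
std::string EncodeUtf8(unsigned int cp)
{
    assert(cp < 0x800); // precondition of this simplified encoder
    std::string out;
    if (cp < 0x80)
        out += static_cast<char>(cp);                  // ASCII: one byte
    else
    {
        out += static_cast<char>(0xC0 | (cp >> 6));    // lead byte
        out += static_cast<char>(0x80 | (cp & 0x3F));  // continuation byte
    }
    return out;
}
```

In Latin-1, ² is the single byte 0xB2. In UTF-8 it becomes the two bytes 0xC2 0xB2, so a silent re-encoding of the file shows up as a diff on exactly these lines.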
This is a pure patch that lets parserthread.cpp and tokenizer.cpp support the ParserTester project.
Index: parserthread.cpp
===================================================================
--- parserthread.cpp (revision 6218)
+++ parserthread.cpp (working copy)
@@ -21,12 +21,18 @@
#define PARSERTHREAD_DEBUG_OUTPUT 0
+#ifdef PARSER_TEST
+ extern void ParserTrace(const wxChar* format, ...);
+ #define TRACE(format, args...)\
+ ParserTrace(format , ## args)
+#else
#if PARSERTHREAD_DEBUG_OUTPUT
- #define TRACE(format, args...)\
- Manager::Get()->GetLogManager()->DebugLog(F( format , ## args))
+ #define TRACE(format, args...)\
+ Manager::Get()->GetLogManager()->DebugLog(F( format , ## args))
#else
- #define TRACE(format, args...)
+ #define TRACE(format, args...)
#endif
+#endif
int THREAD_START = wxNewId();
int THREAD_END = wxNewId();
@@ -1184,11 +1190,13 @@
while (!token.IsEmpty() && token != ParserConsts::kw_endif)
token = m_Tokenizer.GetToken();
--m_PreprocessorIfCount;
-#if PARSERTHREAD_DEBUG_OUTPUT
+#if PARSERTHREAD_DEBUG_OUTPUT || defined PARSER_TEST
int l = m_Tokenizer.GetNestingLevel();
#endif
m_Tokenizer.RestoreNestingLevel();
+#if PARSERTHREAD_DEBUG_OUTPUT || defined PARSER_TEST
TRACE(_T("HandlePreprocessorBlocks() : Restoring nesting level: %d (was %d)"), m_Tokenizer.GetNestingLevel(), l);
+#endif
}
else if (preproc==ParserConsts::kw_endif) // #endif
--m_PreprocessorIfCount;
Index: tokenizer.cpp
===================================================================
--- tokenizer.cpp (revision 6218)
+++ tokenizer.cpp (working copy)
@@ -15,15 +15,22 @@
#include "manager.h"
#include <cctype>
#include <globals.h>
+#include "logmanager.h"
#define TOKENIZER_DEBUG_OUTPUT 0
+#ifdef PARSER_TEST
+ extern void ParserTrace(const wxChar* format, ...);
+ #define TRACE(format, args...)\
+ ParserTrace(format , ## args)
+#else
#if TOKENIZER_DEBUG_OUTPUT
#define TRACE(format, args...)\
Manager::Get()->GetLogManager()->DebugLog(F( format , ## args))
#else
#define TRACE(format, args...)
#endif
+#endif
namespace TokenizerConsts
{